Analysis of Quantile Regression as an Alternative to Ordinary Least Squares Regression


ABSTRACT

In this thesis, we consider an alternative to ordinary least squares (OLS) regression, namely the quantile regression (QR) model, using analytical solutions computed in the Statgraphics software. We also present a goodness-of-fit statistic called the quantile regression coefficient of determination, as well as heteroskedasticity test statistics for the parameters. The procedure is presented, illustrated and validated with a numerical example based on a publicly available dataset on fuel consumption in miles per gallon in highway driving. The results obtained in this thesis suggest that OLS estimates can sometimes be misleading about the true relationship between the response variable and a covariate, because the effects can be very different for different subsections of the sample. Quantile regression therefore gives a better and more complete view of the relationship among random variables.

 

 

TABLE OF CONTENTS

DECLARATION
CERTIFICATION
DEDICATION
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
List of Tables
List of Figures
Appendix
CHAPTER ONE
INTRODUCTION
1.0 Background of the Study
1.1 Limitations for the Conditional Mean
1.2 Motivation of the Study
1.3 Statement of the Problem
1.4 Scope of the study
1.5 Significance of the study
1.6 Aim and Objective(s) of the Study
1.7 Limitation of the Study
1.8 Statement of Hypotheses
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Multiple Linear Regression
2.2 Quantile Regression
CHAPTER THREE
METHODOLOGY
3.0 Introduction
3.1 Data Collection
3.2 Classical Linear Regression
3.3 Estimation of the Parameters in Linear Regression Models
3.4 Estimating $\sigma^2$
3.5 Properties of the Estimators
3.6 Coefficient of Multiple Determination $R^2$
3.7 Stepwise Selection
3.8 Quantile Regression
3.8.1 Computation of Quantile Regression
3.8.2 Least Absolute Deviation Regression Goodness of Fit
3.8.3 Quantile Regression Goodness of Fit
CHAPTER FOUR
RESULTS AND DISCUSSION
4.0 Introduction
4.1 Numerical Illustration and Discussion of the result
4.2 Ordinary least squares regression
4.3 Heteroskedasticity Test: White
4.4 Comparison of OLS and QR as the number of variables increases
CHAPTER FIVE
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
5.0 Introduction
5.1 Summary
5.2 Conclusion
5.3 Recommendations
5.4 Contribution to knowledge
5.5 Further research
REFERENCES
APPENDIX A

 

 

CHAPTER ONE

INTRODUCTION
1.0 Background of the Study
In most regression problems, interest lies in studying the relationship between two or more variables; studying the relationship between varying quantities or events is an important aspect of the philosophy of science. The purpose of regression analysis is to expose the relationship between a response variable and predictor variables. In real applications, the response variable cannot be predicted exactly from the predictor variables. Instead, the response for a fixed value of each predictor variable is a random variable. For this reason, we often summarize the behavior of the response for fixed values of the predictors using measures of central tendency. Typical measures of central tendency are the average value (mean), the middle value (median), or the most likely value (mode). Traditional regression analysis is focused on the mean; that is, we summarize the relationship between the response variable and predictor variables by describing the mean of the response for each fixed value of the predictors, using a function we refer to as the conditional mean of the response. The idea of modeling and fitting the conditional-mean function is at the core of a broad family of regression-modeling approaches, including the familiar simple linear-regression model, multiple regression, models with heteroscedastic errors fitted by weighted least squares, and nonlinear regression models.
Conditional-mean models have certain attractive properties. Under ideal conditions, they are capable of providing a complete and parsimonious description of the relationship between the covariates and the response distribution. In addition, using conditional-mean models leads to estimators (least squares and maximum likelihood) that possess attractive statistical properties, are easy to calculate, and are straightforward to interpret. Such models have been generalized in various ways to allow for heteroscedastic errors so that, given the predictors, the conditional mean and the conditional scale of the response can be modeled simultaneously. Conditional-mean modeling has been applied widely in the social sciences, particularly in the past half century, and regression modeling of the relationship between a continuous response and covariates via least squares and its generalizations is now seen as an essential tool. More recently, models for binary response data, such as logistic and probit regression, and Poisson regression models for count data have become increasingly popular in social-science research. These approaches fit naturally within the conditional-mean modeling framework. While quantitative social-science researchers have applied advanced methods to relax some basic modeling assumptions under the conditional-mean framework, this framework itself is seldom questioned.
The conditional-mean framework has inherent limitations.
1.1 Limitations for the Conditional Mean
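The classical linear model underlying this framework rests on the following assumptions about the errors $\varepsilon_i$, and conclusions drawn from the conditional mean can be unreliable when they fail:
• The mean of the errors is zero, i.e. $E(\varepsilon_i) = 0$.
• The errors have equal variance, that is, $\operatorname{var}(\varepsilon_i) = \sigma^2$ for all values of $X = x$.
• The errors for different values of the regressors are independent; denoting the errors at different values by $\varepsilon_i$ and $\varepsilon_j$, $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
• The errors are normally distributed for all values of $x$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$.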
An alternative to conditional-mean modeling has roots that can be traced to the mid-18th century. This approach can be referred to as conditional-median modeling, or simply median regression. It addresses some of the issues mentioned above regarding the choice of a measure of central tendency. The method replaces least-squares estimation with least-absolute-distance estimation. While the least-squares method is simple to implement without high-powered computing capabilities, least-absolute-distance estimation demands significantly greater computing power. It was not until the late 1970s, when computing technology was combined with algorithmic developments such as linear programming, that median-regression modeling via least-absolute-distance estimation became practical. The median-regression model can be used to achieve the same goal as conditional-mean regression modeling, namely to represent the relationship between the central location of the response and a set of covariates. However, when the distribution is highly skewed, the mean can be challenging to interpret while the median remains highly informative. As a consequence, conditional-median modeling has the potential to be more useful. The median is a special quantile, which describes the central location of a distribution. Conditional-median regression is a special case of quantile regression in which the conditional 0.5 quantile is modeled as a function of covariates. More generally, other quantiles can be used to describe non-central positions of a distribution. The quantile notion generalizes specific terms like quartile, quintile, decile, and percentile. The pth quantile denotes the value of the response below which a proportion p of the population lies. Thus, quantiles can specify any position of a distribution. For example, 2.5% of the population lies below the 0.025 quantile. Koenker and Bassett (1978) introduced quantile regression, which models conditional quantiles as functions of predictors. The quantile-regression model is a natural extension of the linear-regression model. While the linear-regression model specifies the change in the conditional mean of the dependent variable associated with a change in the covariates, the quantile-regression model specifies changes in the conditional quantile. Since any quantile can be used, it is possible to model any predetermined position of the distribution.
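The definition can be made concrete with a minimal Python sketch. It is not part of the original study. It computes the pth sample quantile as the inverse of the empirical distribution function, $y_p = \min\{y : F_n(y) \ge p\}$. This is only one of several conventions: many statistical packages interpolate between order statistics instead, so their output can differ slightly. The function name `empirical_quantile` and the sample are hypothetical.

```python
import math


def empirical_quantile(values, p):
    """Return the smallest observation y with F_n(y) >= p, for 0 < p <= 1.

    F_n is the empirical distribution function of `values`, so the result is
    the k-th smallest observation with k = ceil(n * p).
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(values)
    # Subtract a tiny tolerance so that floating-point round-off in n * p
    # (a product that should be a whole number coming out slightly above it)
    # does not push k one position too high.
    k = max(1, math.ceil(len(ordered) * p - 1e-9))
    return ordered[k - 1]


sample = [7, 1, 5, 3, 8, 2, 6, 4]  # hypothetical data
print([empirical_quantile(sample, p) for p in (0.25, 0.5, 0.75)])  # [2, 4, 6]
```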
Quantile-regression models can be fitted by minimizing a generalized measure of distance using algorithms based on linear programming. As a result, quantile regression is now a practical tool for researchers. Software packages familiar to statistical scientists offer readily accessible commands for fitting quantile-regression models.
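One standard way to write this as a linear program is sketched here for the $q$-th quantile ($0 < q < 1$). It assumes an $n \times k$ design matrix with rows $x_i^{\top}$ and splits each residual into nonnegative parts: $u_i$ for underprediction and $v_i$ for overprediction. The notation is introduced for illustration and need not match the methodology chapter:

```latex
\min_{\beta \in \mathbb{R}^{k},\; u \in \mathbb{R}^{n}_{\ge 0},\; v \in \mathbb{R}^{n}_{\ge 0}}
  \; q \sum_{i=1}^{n} u_i + (1 - q) \sum_{i=1}^{n} v_i
\qquad \text{subject to} \qquad y_i = x_i^{\top}\beta + u_i - v_i, \quad i = 1, \dots, n.
```

At an optimum, at most one of $u_i$ and $v_i$ is positive; otherwise both could be reduced and the objective lowered. The objective therefore equals the sum of asymmetrically weighted absolute residuals. Setting $q = 0.5$ gives one half of the least-absolute-deviations criterion. When the design matrix has full column rank, an optimal solution exists with at least $k$ residuals equal to zero, which is the property that simplex-type methods exploit.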
This research work aims to investigate the robustness of quantile regression as an alternative to least squares regression, especially as the number of regressors in the model increases. It is intended for a broad audience of social scientists who are interested in modeling both the location and the shape of the distributions they wish to study. The comparison between the two methods is carried out in two parts.
1.2 Motivation of the Study
Standard linear regression techniques summarize the average relationship between a set of regressors and the outcome variable based on the conditional mean function $E(y \mid x)$. This provides only a partial view of the relationship, as we might be interested in describing the relationship at different points in the conditional distribution of $y$. Quantile regression provides that capability. Analogous to the conditional mean function of linear regression, we may consider the relationship between the regressors and outcome using the conditional median function $Q_q(y \mid x)$, where the median is the 50th percentile, or quantile $q$, of the empirical distribution. The quantile $q \in (0, 1)$ is that $y$ which splits the data into proportions $q$ below and $1 - q$ above: $F(y_q) = q$ and $y_q = F^{-1}(q)$; for the median, $q = 0.5$. If $\varepsilon_i$ is the model prediction error, OLS minimizes $\sum_i \varepsilon_i^2$. Median regression, also known as least-absolute-deviations (LAD) regression, minimizes $\sum_i |\varepsilon_i|$. Quantile regression minimizes a sum that gives asymmetric penalties $(1 - q)|\varepsilon_i|$ for overprediction and $q|\varepsilon_i|$ for underprediction.
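The three criteria can be compared directly in the simplest setting, a model containing only a constant. The minimal Python sketch below uses hypothetical function names and a hypothetical five-point sample with one outlier. It minimizes the asymmetric loss just described over a constant and compares the result with the minimizer of the squared-error criterion, which is the sample mean. The asymmetric loss is minimized by a sample quantile, which is the median when q = 0.5. Because the loss is convex and piecewise linear with kinks at the observations, searching over the observed values is enough to find a minimizer.

```python
def check_loss(u, q):
    """Asymmetric loss for residual u = y - prediction at quantile q.

    Underprediction (u >= 0) costs q * |u|; overprediction (u < 0) costs (1 - q) * |u|.
    """
    return q * u if u >= 0 else (q - 1) * u


def best_constant(y, q):
    """Constant c minimizing the total check loss sum(check_loss(y_i - c, q)).

    The objective is convex and piecewise linear with kinks at the data points,
    so a minimizer is always found among the observed values.
    """
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, q) for yi in y))


y = [1, 2, 3, 4, 100]         # hypothetical sample with one large outlier
print(sum(y) / len(y))        # 22.0: the mean, which minimizes the sum of squared errors
print(best_constant(y, 0.5))  # 3: the median (LAD criterion)
print(best_constant(y, 0.3))  # 2: the 0.3 quantile
```

The single outlier pulls the mean up to 22, while the median stays at 3. This illustrates why median regression is described as more robust to outliers than least squares.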
Although its computation requires linear programming methods, the quantile regression estimator is asymptotically normally distributed. Median regression is more robust to outliers than least squares regression, and it is semiparametric, as it avoids assumptions about the parametric distribution of the error process. Just as regression models conditional moments, such as predictions of the conditional mean function, we may use quantile regression to model conditional quantiles of the joint distribution of $y$ and $x$.
1.3 Statement of the Problem
Regression analysis is widely applicable across many kinds of research, especially when provisions are made to deal with heteroskedasticity. Heteroskedasticity arises when the OLS assumption of constant error variance, $\operatorname{var}(\varepsilon_i) = \sigma^2$, is violated, i.e. $\operatorname{var}(\varepsilon_i) = \sigma_i^2$ where the $\sigma_i^2$ are not all equal. What is the implication of heteroskedasticity? The ordinary least squares (OLS) estimators and the regression predictions based on them remain unbiased and consistent. However, the OLS estimators are no longer BLUE (best linear unbiased estimators) because they are no longer efficient, and as a result regression predictions are inefficient as well. Because the usual estimator of the covariance matrix of the estimated regression coefficients is inconsistent, the tests of hypotheses, that is, t-tests and F-tests, are no longer valid.
The purpose of this study is to propose a model that handles the problem of heteroskedasticity. The model considered is quantile regression, a robust alternative to ordinary least squares regression when these assumptions fail to hold.
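A small, fully hypothetical illustration shows why quantile regression is attractive under heteroskedasticity. In the data below, the spread of y grows with x. The OLS line summarizes only the center, while the regression quantiles at q = 0.1, 0.5 and 0.9 have different slopes that trace the widening distribution. The quantile fits use exact vertex enumeration. This relies on the standard result that the quantile-regression linear program has an optimal solution fitting as many observations exactly as there are coefficients (here two). The method is practical only for very small samples, and the function names are illustrative rather than any package's API.

```python
from itertools import combinations


def check_loss(u, q):
    # q * |u| for underprediction (u >= 0), (1 - q) * |u| for overprediction (u < 0)
    return q * u if u >= 0 else (q - 1) * u


def fit_quantile_line(x, y, q):
    """Exact q-th regression quantile (b0, b1) for y = b0 + b1 * x.

    Tries every line through two observations with distinct x values and keeps
    the one with the smallest total check loss (O(n^3); small samples only).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x do not determine a finite slope
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(check_loss(yk - (b0 + b1 * xk), q) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]


def fit_ols_line(x, y):
    """Closed-form least-squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical heteroskedastic data: at each x the responses are 1, 1 + x and 1 + 2x,
# so the center grows like 1 + x while the spread grows with x.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [1, 2, 3, 1, 3, 5, 1, 4, 7, 1, 5, 9]

print("OLS", fit_ols_line(x, y))          # OLS (1.0, 1.0)
for q in (0.1, 0.5, 0.9):
    print(q, fit_quantile_line(x, y, q))  # (1.0, 0.0), then (1.0, 1.0), then (1.0, 2.0)
```

OLS reports a single slope of 1. The regression quantiles report slopes of 0, 1 and 2, which reveals that the effect of x differs across the conditional distribution of y.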
1.4 Scope of the study
The study investigates the behavior of the two regression procedures as the number of predictor (explanatory) variables increases, keeping in mind that quantile regression does not depend on whether the classical OLS assumptions hold. The two techniques will be discussed, along with various significance tests of the importance of the independent variables, the reliability of the models, and the interpretation of results under each technique. The study also involves an empirical analysis using the two techniques, and the results will be compared to establish any discrepancies between them.
1.5 Significance of the study
The study is considered worthy of research because QR has been found to give a more holistic view of the effect of the explanatory variables on the response variable at different quantiles. The main significance of the study is that it strongly proposes the use of QR, rather than the OLS regression used in the past, for examining heteroscedastic data, and therefore recommends QR to researchers studying heteroscedasticity. Most research estimates the relationship between the independent variables and the dependent variable on average. With QR, that relationship can instead be estimated and examined at each quantile of the dependent variable. In this research, our purpose is to revisit least absolute deviation estimation in regression analysis, consider some of its theoretical properties, and consider its implementation from a computational, mathematical-programming point of view. We also consider goodness-of-fit statistics as well as approximate distributions of the associated test statistics for the parameters. Furthermore, we suggest a new goodness-of-fit statistic, called the quantile coefficient of determination, which is adapted to the metric used in LAD estimation. Finally, some examples are provided to illustrate the behavior of the procedures on data that include outliers. Growing recognition of the need for a more flexible and more complete analysis is a driving force behind the use of QR in the literature.
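The following Python sketch is a hedged illustration only. It assumes the quantile coefficient of determination takes the common form $R^1(q) = 1 - \hat V(q)/\tilde V(q)$. Here $\hat V(q)$ is the minimized total check loss of the fitted model and $\tilde V(q)$ is that of an intercept-only model, in direct analogy with $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. The statistic defined in the methodology chapter may differ in detail, and the data and function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, q):
    # q * |u| for underprediction (u >= 0), (1 - q) * |u| for overprediction (u < 0)
    return q * u if u >= 0 else (q - 1) * u


def min_loss_line(x, y, q):
    """Minimized total check loss for the model y = b0 + b1 * x.

    Exact for small samples: an optimal line passes through two observations,
    so every such line is tried.
    """
    best = float("inf")
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no finite slope through two points with equal x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        best = min(best, sum(check_loss(yk - (b0 + b1 * xk), q) for xk, yk in zip(x, y)))
    return best


def min_loss_constant(y, q):
    """Minimized total check loss for the intercept-only model (optimum at a data point)."""
    return min(sum(check_loss(yi - c, q) for yi in y) for c in y)


def quantile_r1(x, y, q):
    """Assumed form of the quantile coefficient of determination: 1 - V_hat / V_tilde."""
    return 1.0 - min_loss_line(x, y, q) / min_loss_constant(y, q)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [3, 5, 6, 9, 11]
print(quantile_r1(x, y, 0.5))  # about 0.9167, i.e. 1 - 0.5 / 6
```

Like $R^2$, this statistic lies between 0 and 1, because the intercept-only model is a special case of the fitted model. Unlike $R^2$, it measures fit with the same asymmetric absolute loss that defines the regression quantile.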
1.6 Aim and Objective(s) of the Study
The main aim of the study is to investigate quantile regression as an alternative to least squares regression, especially as the number of regressors increases. The specific objectives are:
1. To examine quantile regression and least squares regression.
2. To compare the models in terms of goodness-of-fit statistics.
3. To recommend a suitable model for regression analysis.
1.7 Limitation of the Study
The study could have covered both conditional and unconditional means and quantiles, but because of time constraints it focuses only on the conditional mean and conditional quantiles.
1.8 Statement of Hypotheses
t-test for significance of one coefficient
The t-test is used to determine whether the relationship between $y$ and $x_j$ is significant:
H0: $\beta_j = 0$ (i.e. the coefficient $\beta_j$ is not significantly different from zero).
H1: $\beta_j \neq 0$ (i.e. the coefficient is significantly different from zero).
F-test for overall significance of all coefficients
The F-test is used to determine whether the relationship between $y$ and all the $x$ variables is significant:
H0: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$.
H1: at least one $\beta_j$ is different from zero.
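In the simple-regression case (a single regressor, so p = 1), both statistics can be computed from closed-form least-squares quantities. The t statistic is $t = \hat\beta_1/\operatorname{se}(\hat\beta_1)$ with $\operatorname{se}(\hat\beta_1) = \sqrt{\mathrm{MSE}/S_{xx}}$, and the F statistic is $F = (\mathrm{SSR}/p)/(\mathrm{SSE}/(n-p-1))$. The Python sketch below uses hypothetical data and a hypothetical function name. The resulting values would be compared with tabulated $t_{n-2}$ and $F_{1,n-2}$ critical values, which the Python standard library does not provide. With one regressor the two tests are equivalent, since $F = t^2$.

```python
import math


def simple_regression_tests(x, y):
    """OLS fit of y = b0 + b1 * x, the t statistic for H0: b1 = 0, and the overall F statistic."""
    n = len(x)
    p = 1  # number of slope coefficients (regressors)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total sum of squares
    mse = sse / (n - p - 1)                                        # estimate of sigma^2
    t_stat = b1 / math.sqrt(mse / sxx)
    f_stat = ((sst - sse) / p) / mse                               # (SSR / p) / MSE
    return b0, b1, t_stat, f_stat


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [3, 5, 6, 9, 11]
b0, b1, t_stat, f_stat = simple_regression_tests(x, y)
print(round(b0, 4), round(b1, 4), round(t_stat, 4), round(f_stat, 4))  # 0.8 2.0 12.2474 150.0
```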

 
