6 things to keep in mind for your Linear Regressions to function superbly

John V. Krompas
Published in DataDrivenInvestor
5 min read · Jun 14, 2022

Linear regression is perhaps the most widely used tool in the applied sciences. Ordinary Least Squares (OLS) in particular is used to model relationships between variables, produce forecasts, and explain variation. A properly specified OLS model can outperform many advanced multi-equation models. Reaching that well-specified model can be tedious, though, because an OLS estimation has to meet some well-known and some lesser-known requirements to work properly. This short article goes through an OLS checklist. If your model passes it, you can be reasonably confident that you have a well-specified model in your hands. Let’s go:

1. Homoscedasticity and non-autocorrelation in the residuals: These are perhaps the first things any modeler checks when estimating an OLS equation. If your residuals violate these assumptions, the usual standard errors are wrong and OLS is no longer efficient, so the model is statistically weak. Keep in mind, however, that the coefficient estimates themselves remain unbiased when heteroscedasticity and autocorrelation are present (with one exception, mentioned in #2). So if you simply want a value that connects X to Y you are covered, but you cannot rely on the standard t and F tests for inference. To correct for these problems, you can use a robust variance-covariance estimator (heteroscedasticity-consistent ones such as White or Huber-White, or heteroscedasticity and autocorrelation consistent ones such as Newey-West), include ARMA and ARCH terms if you are modeling time series, or try another set of explanatory variables.
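
As a rough illustration of the point above, the sketch below simulates heteroscedastic data and compares textbook standard errors with White (HC0) ones, then computes a Durbin-Watson statistic on the residuals. Everything here is an assumption made for the example: the simulated data, the helper names (ols, classical_se, white_se, durbin_watson) and the use of plain NumPy instead of an econometric package.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals; X already contains the constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def classical_se(X, resid):
    """Textbook standard errors, valid only under homoscedasticity."""
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

def white_se(X, resid):
    """White (HC0) sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

def durbin_watson(resid):
    """Close to 2 when there is no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Hypothetical data: the error spread grows with |x| (heteroscedasticity).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, resid = ols(X, y)
se_classical = classical_se(X, resid)
se_white = white_se(X, resid)
dw = durbin_watson(resid)

print("slope estimate:", beta[1])           # still centered on the true value 2
print("classical SE  :", se_classical[1])   # too small in this setup
print("White SE      :", se_white[1])
print("Durbin-Watson :", dw)
```

In this setup the slope estimate stays close to 2 while the classical standard error understates the uncertainty, which is the situation described in the item above. The sketch does not cover Newey-West; that estimator adds weighted autocovariance terms to the same sandwich form.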

2. Autocorrelation in the presence of a lagged dependent variable: If you use a lag of the dependent variable as an explanatory variable and the residuals are autocorrelated, your estimates are biased and inconsistent. Past values of the error affect both the lagged dependent variable and the current error, so the regressor is correlated with the error term (endogeneity). In the presence of endogeneity the model coefficients are biased even in large samples, and your model is misspecified.
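
A small simulation can make this concrete. The sketch below generates y(t) = 0.5·y(t-1) + u(t) twice, once with independent errors and once with AR(1) errors, and regresses y on its own lag. The parameter values, the function names (simulate, lag_coefficient) and the zero-mean, no-intercept setup are assumptions chosen for the example.

```python
import numpy as np

def simulate(alpha, rho, n, rng):
    """y[t] = alpha*y[t-1] + u[t], with AR(1) errors u[t] = rho*u[t-1] + e[t]."""
    e = rng.normal(size=n)
    u = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
        y[t] = alpha * y[t - 1] + u[t]
    return y

def lag_coefficient(y):
    """OLS slope of y[t] on y[t-1] (no intercept, the series has mean zero)."""
    y_lag, y_now = y[:-1], y[1:]
    return (y_lag @ y_now) / (y_lag @ y_lag)

rng = np.random.default_rng(1)
n_obs = 20000
est_iid = lag_coefficient(simulate(alpha=0.5, rho=0.0, n=n_obs, rng=rng))
est_ar = lag_coefficient(simulate(alpha=0.5, rho=0.5, n=n_obs, rng=rng))

print("true coefficient          : 0.5")
print("estimate, independent u   :", est_iid)
print("estimate, autocorrelated u:", est_ar)
```

With autocorrelated errors the estimate settles well above 0.5 no matter how long the sample is, because the lagged dependent variable is correlated with the error term.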

3. Covariance vs Correlation: When you estimate an OLS model, you do not want explanatory variables that are very highly linearly correlated with one another. This causes multicollinearity, which inflates the standard errors and makes the coefficient estimates unstable. On the other hand, do not drop a relevant variable just because it covaries with the other regressors: OLS accounts for that covariance when it separates the effect of each variable. In fact, omitting a crucial variable that is correlated with the included ones results in biased estimates for the included variables (there are relevant statistical tests in most econometric packages).
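
The two sides of this trade-off can be sketched with simulated data: a variance inflation factor (VIF) to gauge multicollinearity, and a comparison of the full regression with one that omits a relevant, correlated regressor. The data-generating process, the coefficient values and the helper names (fit, vif) are assumptions made for the illustration.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients and R^2; X already contains the constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

def vif(regressors, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    others = np.delete(regressors, j, axis=1)
    Z = np.column_stack([np.ones(len(regressors)), others])
    _, r2 = fit(Z, regressors[:, j])
    return 1.0 / (1.0 - r2)

# Hypothetical data: x1 and x2 have correlation of about 0.7, both affect y.
rng = np.random.default_rng(2)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1.0 - 0.7 ** 2) * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

regressors = np.column_stack([x1, x2])
ones = np.ones(n)
beta_full, _ = fit(np.column_stack([ones, x1, x2]), y)
beta_short, _ = fit(np.column_stack([ones, x1]), y)   # x2 omitted
vif_x1 = vif(regressors, 0)

print("VIF of x1               :", vif_x1)
print("x1 coefficient, full    :", beta_full[1])    # near the true 2
print("x1 coefficient, x2 left out:", beta_short[1])  # absorbs part of x2's effect
```

Here the VIF is modest, so keeping both variables costs little precision, while dropping x2 pushes the x1 coefficient far from its true value of 2.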

4. Normality of the residuals: Normally distributed residuals are good news for your model, because exact small-sample t and F tests rely on normality (OLS does not need it to be unbiased). If they are not normal, you should be careful: with a large enough sample (a common rule of thumb is more than 30 observations), the central limit theorem applies and the coefficient estimates are approximately normally distributed, so the usual tests remain approximately valid. There are exceptions to this rule, however: some time-series techniques rely on the normality assumption more heavily, so check what the method you use requires.

5. High level of significance in a very large sample: Econometrics and statistics were founded to draw conclusions from limited sets of data. Nowadays, however, we apply these methods to enormous datasets. If you have many thousands of observations, do not rely on a 10% or 5% level of significance to decide whether the true value of a coefficient is zero. In datasets that large the standard errors become tiny and the estimate sits very close to its true value, so even an economically negligible effect is rejected at conventional levels. Use a much stricter significance level and look at the size of the coefficient: a coefficient that matters should lead you to reject the null at practically any level of significance, and otherwise you fail to reject it.
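
To see why conventional thresholds lose their meaning, consider a simulated regression with one million observations and a true slope of only 0.01. The sample size, the slope and the helper name slope_tstat_r2 are assumptions picked for the illustration.

```python
import numpy as np

def slope_tstat_r2(x, y):
    """Simple regression of y on x: slope, its t-statistic, and R^2."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    resid = yc - slope * xc
    se = np.sqrt(resid @ resid / (x.size - 2) / (xc @ xc))
    r2 = 1.0 - resid @ resid / (yc @ yc)
    return slope, slope / se, r2

rng = np.random.default_rng(4)
n_big = 1_000_000
x_big = rng.normal(size=n_big)
y_big = 0.01 * x_big + rng.normal(size=n_big)   # tiny true effect

slope_big, t_big, r2_big = slope_tstat_r2(x_big, y_big)
print("slope      :", slope_big)
print("t-statistic:", t_big)    # far beyond the 5% cutoff of about 1.96
print("R^2        :", r2_big)   # x explains almost none of the variation
```

The coefficient is overwhelmingly "significant" at the 5% level even though x explains almost none of the variation in y, which is why the size of the coefficient deserves as much attention as the p-value.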

6. Stationarity and Cointegration: When modeling time series, the variables must be stationary in order to draw statistical conclusions; otherwise the regression might be spurious. There is one exception to that rule: if the variables are cointegrated*, OLS is super-consistent, meaning the estimates converge to their true values at rate T rather than the usual √T. So if your variables are cointegrated you can estimate the regression in levels and obtain an estimate of their true long-run relationship. However, you still should not perform the standard statistical tests on that model, because the test statistics do not follow their usual distributions. To overcome this problem, you should use alternative methods, such as FMOLS.

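* To determine whether two variables are cointegrated, you can either perform a Johansen test or check whether the residuals of the model are stationary (the Engle-Granger approach, which uses its own critical values rather than the standard unit-root ones).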
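
The residual-based check can be sketched as follows: regress one series on the other in levels, then compute a Dickey-Fuller type t-statistic on the residuals. This is a simplified version with no lag augmentation; the simulated series and the helper names (fit_line, df_tstat) are assumptions. The statistic has to be compared with Engle-Granger critical values (roughly -3.34 at the 5% level for two variables with a constant, quoted here as an approximate asymptotic value), not with standard t tables.

```python
import numpy as np

def fit_line(x, y):
    """Levels regression of y on x with a constant: slope and residuals."""
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    return slope, y - intercept - slope * x

def df_tstat(e):
    """t-statistic on g in: diff(e)[t] = g * e[t-1] + error (no lags, no constant)."""
    de, lag = np.diff(e), e[:-1]
    g = (lag @ de) / (lag @ lag)
    u = de - g * lag
    se = np.sqrt(u @ u / (u.size - 1) / (lag @ lag))
    return g / se

rng = np.random.default_rng(5)
T = 1000
x_rw = np.cumsum(rng.normal(size=T))           # random walk
y_coint = 2.0 * x_rw + rng.normal(size=T)      # shares the trend of x_rw
y_spur = np.cumsum(rng.normal(size=T))         # unrelated random walk

slope_coint, resid_coint = fit_line(x_rw, y_coint)
slope_spur, resid_spur = fit_line(x_rw, y_spur)
stat_coint = df_tstat(resid_coint)
stat_spur = df_tstat(resid_spur)

print("cointegrated: slope", slope_coint, "residual DF stat", stat_coint)
print("spurious    : slope", slope_spur, "residual DF stat", stat_spur)
```

For the cointegrated pair the slope lands very close to the true value of 2 and the residual statistic is strongly negative, while the unrelated random walks leave residuals that still look like a unit-root process. In practice a packaged routine (for example statsmodels' coint function) would replace the hand-rolled statistic and supply proper critical values.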


Private Sector Economist, MPhil Economics, MSc Applied Economics & Management