If we have domain knowledge, we know that it’s not necessary to exclude both from our regression model. We know that highway MPG and city MPG have a high VIF value.Let’s take a look at the example of our use case why domain knowledge would be helpful in this case: To remove collinearity, we can exclude independent variables that have a high VIF value from our regression model. This is the most straightforward solution to remove collinearity and oftentimes, domain knowledge would be extremely helpful to achieve the best solution. Next, we build a regression model and below is the summary statistics of our model. To predict it, we have independent variables such as the car’s city MPG, highway MPG, horsepower, engine size, stroke, width, peak RPM, and compression ratio. Let’s imagine we want to predict the price of a car. To make it more clear why collinearity is such a problem, let’s take a look at the following use case. This in turn will reduce the reliability of our model and we shouldn’t trust the p-Value that our model showed us to judge whether an independent variable is statistically significant for our model or not. Collinearity will inflate the variance and standard error of coefficient estimates.This makes it difficult for us to understand the influence of each independent variable. Let’s say we want to remove or add one independent variable, the coefficient estimates then will fluctuate massively. The coefficient estimates of independent variables would be very sensitive to the change in the model, even for a tiny change.There are several things how collinearity would affect our model, which are: In this post, we are going to see why collinearity becomes such a problem for our regression model, how we can detect it, how it affects our model, and what we can do to remove collinearity. Now if we have collinearity, the key point above is no longer valid, as if we change the value of one independent variable, the other independent variables that are correlated will also change. This way, we can interpret the fitted coefficient of each independent variable as the mean change in the dependent variable for each 1 unit change in an independent variable while keeping the other independent variables constant. In regression analysis, we want to isolate the influence of each independent variable to our dependent variable. If collinearity exists between independent variables, the key point of regression analysis is violated. It shouldn’t have any correlation with other independent variables. This is problematic because as the name suggests, an independent variable should be independent. Collinearity occurs because independent variables that we use to build a regression model are correlated with each other.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |