Author: Daoqin Tong, Xueting Jin
Date Created: 11/23/2021
This notebook provides the code for running a Segmented Linear Regression Model to examine visits to grocery stores in six large metropolitan statistical areas (MSAs).
import numpy as np
import pandas as pd
import statsmodels.api as sm
food_access = pd.read_excel('Regression_Data.xlsx')
print(food_access.head(5))
GeoID Visits AWDist intD TripChain_B TripChain_D \ 0 170310101002 1756.663795 9.074393 0 606.279159 0.0 1 170310101003 2197.925421 7.045779 0 698.586606 0.0 2 170310102011 3897.613786 6.069707 0 1080.965566 0.0 3 170310102012 4683.447118 7.227351 0 1486.399091 0.0 4 170310102013 1573.585389 6.101111 0 355.207317 0.0 OtherR_Visits Fast_Visits Dist_Convenience Dist_LargeRetailer ... \ 0 3943.300326 1228.570038 0.292435 2.669511 ... 1 5118.357246 1541.180055 0.180847 2.426438 ... 2 4035.986211 1685.464689 0.155907 2.066019 ... 3 8024.780492 3173.090726 0.284614 2.050767 ... 4 2730.858675 1244.316341 0.207635 2.176084 ... No_Vehicle_B No_Vehicle_D High_Educated_B High_Educated_D May \ 0 45.825427 0.0 37.233310 0.0 0 1 54.147251 0.0 60.928571 0.0 0 2 23.735955 0.0 40.533778 0.0 0 3 34.691011 0.0 30.654545 0.0 0 4 27.986907 0.0 27.089073 0.0 0 PCT_Case POPDensityKsqmi Commute PCT_PUBLIC MSA 0 0.0 0.9935 34.4 12.36 Chicago 1 0.0 0.9935 34.4 12.36 Chicago 2 0.0 0.9935 34.4 12.36 Chicago 3 0.0 0.9935 34.4 12.36 Chicago 4 0.0 0.9935 34.4 12.36 Chicago [5 rows x 28 columns]
y1 = food_access.iloc[:,1]
x1 = food_access.iloc[:,3:27]
x1 = sm.add_constant(x1)
y2 = food_access.iloc[:,2]
x2 = food_access.iloc[:,[3,4,5,8,9,10,11,12,14,15,16,17,18,19,20,21,22,23,24,25,26]]
x2 = sm.add_constant(x2)
model1 = sm.OLS(y1,x1)
result1 = model1.fit()
print(result1.summary())
OLS Regression Results ============================================================================== Dep. Variable: Visits R-squared: 0.887 Model: OLS Adj. R-squared: 0.887 Method: Least Squares F-statistic: 4.526e+04 Date: Wed, 24 Nov 2021 Prob (F-statistic): 0.00 Time: 17:58:02 Log-Likelihood: -1.1144e+06 No. Observations: 137991 AIC: 2.229e+06 Df Residuals: 137966 BIC: 2.229e+06 Df Model: 24 Covariance Type: nonrobust ====================================================================================== coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------------- const 651.9254 40.300 16.177 0.000 572.938 730.913 intD 245.3612 11.289 21.734 0.000 223.235 267.488 TripChain_B 2.3147 0.004 535.185 0.000 2.306 2.323 TripChain_D 2.2311 0.006 394.149 0.000 2.220 2.242 OtherR_Visits -0.0439 0.002 -21.235 0.000 -0.048 -0.040 Fast_Visits 0.3696 0.004 96.972 0.000 0.362 0.377 Dist_Convenience 108.0727 3.734 28.945 0.000 100.755 115.391 Dist_LargeRetailer -17.4920 0.816 -21.446 0.000 -19.091 -15.893 Dist_Specialty 13.0355 0.616 21.159 0.000 11.828 14.243 Dist_Supermarket 4.5200 2.172 2.081 0.037 0.263 8.777 Dist_Warehouse -5.0634 0.620 -8.167 0.000 -6.279 -3.848 Population 0.2511 0.004 65.698 0.000 0.244 0.259 PCT_17 4.4685 0.294 15.189 0.000 3.892 5.045 PCT_65 5.7481 0.243 23.677 0.000 5.272 6.224 PCT_race -2.6667 0.094 -28.470 0.000 -2.850 -2.483 Income 0.9692 0.083 11.710 0.000 0.807 1.131 No_Vehicle_B 0.1023 0.167 0.611 0.541 -0.226 0.430 No_Vehicle_D -4.0589 0.180 -22.608 0.000 -4.411 -3.707 High_Educated_B -0.7727 0.181 -4.268 0.000 -1.127 -0.418 High_Educated_D 2.1330 0.186 11.448 0.000 1.768 2.498 May -83.9182 4.464 -18.798 0.000 -92.668 -75.168 PCT_Case -22.1611 0.902 -24.578 0.000 -23.928 -20.394 POPDensityKsqmi 59.1955 4.282 13.823 0.000 50.802 67.589 Commute -39.0944 1.364 -28.672 0.000 -41.767 -36.422 PCT_PUBLIC 12.6462 0.352 35.898 0.000 11.956 13.337 ============================================================================== Omnibus: 92918.776 Durbin-Watson: 0.976 Prob(Omnibus): 0.000 Jarque-Bera (JB): 17547130.540 Skew: 2.241 Prob(JB): 0.00 Kurtosis: 58.062 Cond. No. 6.90e+04 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 6.9e+04. This might indicate that there are strong multicollinearity or other numerical problems.
model2 = sm.OLS(y2,x2)
result2 = model2.fit()
print(result2.summary())
OLS Regression Results ============================================================================== Dep. Variable: AWDist R-squared: 0.235 Model: OLS Adj. R-squared: 0.234 Method: Least Squares F-statistic: 2013. Date: Wed, 24 Nov 2021 Prob (F-statistic): 0.00 Time: 17:58:03 Log-Likelihood: -3.9549e+05 No. Observations: 137991 AIC: 7.910e+05 Df Residuals: 137969 BIC: 7.912e+05 Df Model: 21 Covariance Type: nonrobust ====================================================================================== coef std err t P>|t| [0.025 0.975] -------------------------------------------------------------------------------------- const 6.6923 0.220 30.474 0.000 6.262 7.123 intD -1.3596 0.061 -22.288 0.000 -1.479 -1.240 TripChain_B 4.647e-05 2.06e-05 2.251 0.024 6.01e-06 8.69e-05 TripChain_D 0.0004 2.56e-05 15.561 0.000 0.000 0.000 Dist_Convenience 0.6984 0.020 34.375 0.000 0.659 0.738 Dist_LargeRetailer 0.0791 0.004 17.777 0.000 0.070 0.088 Dist_Specialty 0.1280 0.003 38.156 0.000 0.121 0.135 Dist_Supermarket 0.4054 0.012 34.171 0.000 0.382 0.429 Dist_Warehouse 0.1198 0.003 35.364 0.000 0.113 0.126 PCT_17 -0.0284 0.002 -17.846 0.000 -0.031 -0.025 PCT_65 -0.0188 0.001 -14.256 0.000 -0.021 -0.016 PCT_race -0.0027 0.001 -5.265 0.000 -0.004 -0.002 Income -0.0056 0.000 -12.521 0.000 -0.007 -0.005 No_Vehicle_B -0.0099 0.001 -10.881 0.000 -0.012 -0.008 No_Vehicle_D 0.0082 0.001 8.476 0.000 0.006 0.010 High_Educated_B 0.0164 0.001 16.738 0.000 0.014 0.018 High_Educated_D 0.0088 0.001 8.746 0.000 0.007 0.011 May 0.3634 0.024 14.970 0.000 0.316 0.411 PCT_Case 0.0869 0.005 17.746 0.000 0.077 0.096 POPDensityKsqmi -0.3249 0.023 -14.042 0.000 -0.370 -0.280 Commute 0.0383 0.007 5.158 0.000 0.024 0.053 PCT_PUBLIC -0.0047 0.002 -2.447 0.014 -0.008 -0.001 ============================================================================== Omnibus: 139901.766 Durbin-Watson: 1.631 Prob(Omnibus): 0.000 Jarque-Bera (JB): 32497106.534 Skew: 4.534 Prob(JB): 0.00 Kurtosis: 77.631 Cond. No. 1.49e+04 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. [2] The condition number is large, 1.49e+04. This might indicate that there are strong multicollinearity or other numerical problems.