Author: Naomi W. Lazarus, PhD
Date Created: 6-14-21
This notebook provides the code for running a Geographically Weighted Regression (GWR) using COVID-19 incidence rates as the dependent variable and independent variables representing age and underlying conditions. It is an exploratory analysis of spatial relationships between COVID-19, age demographics, and comorbidities. It is not recommended for predictive purposes.
The notebook consists of two sections. The first uses COVID-19 incidence rates for peak period 1 (03/01/20 - 04/30/20). The second uses COVID-19 incidence rates for peak period 2 (06/01/20 - 07/31/20). The following links provide information on file descriptions and metadata.
# Install mgwr only if it is not already available in the environment
try:
    from mgwr.gwr import GWR
except ImportError:
    print('Installing MGWR')
    ! pip install -U mgwr
import numpy as np
import pandas as pd
import libpysal as ps
from spreg import OLS
from mgwr.gwr import GWR, MGWR
from mgwr.sel_bw import Sel_BW
from mgwr.utils import compare_surfaces, truncate_colormap
import geopandas as gp
import matplotlib.pyplot as plt
import matplotlib as mpl
covid_IR1 = gp.read_file('/home/jovyan/shared_data/data/geospatialfellows21/lazarus_data/Data_Files1/Layer_IR1_1.shp')
covid_IR1.head()
# Run Ordinary Least Squares Regression
y = covid_IR1['IR1_log'].values.reshape((-1, 1))
X = covid_IR1[['PCT_50to74', 'PCT_over75', 'DIAB_PCT', 'CARDIO_MR', 'OBESE_PCT']].values
ols = OLS(y, X)
print(ols.summary)
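spreg's `OLS` adds the constant term itself, so the summary reports an intercept plus one coefficient per predictor. As a rough illustration of what that global fit computes, the sketch below solves the same least-squares problem with plain NumPy on synthetic data. The helper name `fit_ols` and the arrays `X_demo` and `y_demo` are made up for this example and are not part of the notebook's data.

```python
import numpy as np

def fit_ols(y, X):
    """Global least-squares fit with the intercept in row 0 (hypothetical helper)."""
    n = X.shape[0]
    X_design = np.hstack([np.ones((n, 1)), X])  # prepend the constant column
    betas, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ betas
    rss = float((resid ** 2).sum())
    tss = float(((y - y.mean()) ** 2).sum())
    return betas, 1.0 - rss / tss

# Synthetic data (assumption): 50 observations, 2 predictors, known coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = (1.5 + 2.0 * X_demo[:, [0]] - 0.5 * X_demo[:, [1]]
          + rng.normal(scale=0.01, size=(50, 1)))

betas_demo, r2_demo = fit_ols(y_demo, X_demo)
print(betas_demo.ravel())  # intercept first, then one slope per predictor
print(r2_demo)
```

Because this fit uses a single set of coefficients for every county, it serves as the baseline against which the locally varying GWR estimates can be read.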
# Defining variables and coordinates
g_y = covid_IR1['IR1_log'].values.reshape((-1, 1))
g_X = covid_IR1[['PCT_50to74', 'PCT_over75', 'DIAB_PCT', 'CARDIO_MR', 'OBESE_PCT']].values
u = covid_IR1['X']
v = covid_IR1['Y']
g_coords = list(zip(u, v))
# Inspecting the data contents
print('g_y:\n', g_y[:5])
print('\ng_X:\n', g_X[:5])
print('\nu:\n', list(u[:5]))
print('\nv:\n', list(v[:5]))
print('\ng_coords:\n', g_coords[:5], "\n")
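# Testing suitable bandwidth prior to specifying the model
# Note: no kernel is passed to Sel_BW here, so the search uses the library's default kernel,
# while the model below is fitted with kernel='gaussian'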
gwr_selector = Sel_BW(g_coords, g_y, g_X)
gwr_bw = gwr_selector.search()
print(gwr_bw)
# Specifying the GWR model - bandwidth set at 102 neighbors using an adaptive Gaussian kernel function
gwr_model = GWR(g_coords, g_y, g_X, bw=102, fixed=False, kernel='gaussian')
gwr_results = gwr_model.fit()
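With `fixed=False`, the bandwidth is a neighbor count rather than a distance: each location gets its own distance bandwidth, and observations are down-weighted as they get farther away. The sketch below illustrates that idea for a single location using NumPy only. It assumes a Gaussian weight of the form exp(-0.5 (d/h)^2) with h set to the distance of the k-th nearest point, which is one common convention and may differ in detail from mgwr's internals. The function names and the synthetic grid are hypothetical.

```python
import numpy as np

def adaptive_gaussian_weights(coords, i, k):
    """Weights for location i; bandwidth h = distance to its k-th nearest point."""
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    h = np.sort(d)[k - 1]  # d includes the zero distance from i to itself
    return np.exp(-0.5 * (d / h) ** 2)

def local_fit(coords, y, X, i, k):
    """Weighted least squares centered on location i (intercept in row 0)."""
    w = adaptive_gaussian_weights(coords, i, k)
    X_design = np.hstack([np.ones((X.shape[0], 1)), X])
    XtW = X_design.T * w  # scales each observation by its weight
    return np.linalg.solve(XtW @ X_design, XtW @ y)

# Synthetic 10 x 10 grid (assumption) where the true slope grows from west to east
rng = np.random.default_rng(1)
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords_demo = np.column_stack([gx.ravel(), gy.ravel()])
X_demo = rng.normal(size=(100, 1))
true_slope = coords_demo[:, [0]] / 9.0  # 0 at the west edge, 1 at the east edge
y_demo = 1.0 + true_slope * X_demo

west = local_fit(coords_demo, y_demo, X_demo, i=0, k=15)  # corner at x = 0
east = local_fit(coords_demo, y_demo, X_demo, i=9, k=15)  # corner at x = 9
print(west.ravel(), east.ravel())
```

Repeating the local fit at every location yields one coefficient vector per observation, which is the kind of output that `gwr_results.params` holds.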
# Print model-level diagnostics: residual sum of squares, AIC, R2, and adjusted R2
print(gwr_results.resid_ss)
print(gwr_results.aic)
print(gwr_results.R2)
print(gwr_results.adj_R2)
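To make the printed diagnostics easier to read, the sketch below shows how R2 follows from the residual sum of squares, and one way an adjusted R2 can be formed. It assumes the adjustment uses an effective number of parameters (`enp`) in place of the usual predictor count; that formula, the helper names, and the toy numbers are assumptions for illustration rather than a statement of how mgwr computes its values.

```python
import numpy as np

def r2_from_residuals(y, resid):
    """R2 = 1 - RSS / TSS."""
    rss = float((resid ** 2).sum())
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

def adjusted_r2(r2, n, enp):
    """Penalize R2 by the effective number of parameters (assumed form)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - enp - 1)

# Toy values (assumption), not taken from the COVID-19 data
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
resid_demo = np.array([0.1, -0.1, 0.2, -0.2, 0.0])

r2_demo = r2_from_residuals(y_demo, resid_demo)
adj_demo = adjusted_r2(r2_demo, n=5, enp=1.5)
print(r2_demo, adj_demo)
```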
# Create a column to store R2 values in the dataframe
covid_IR1['R2'] = gwr_results.localR2
covid_IR1.head()
# Visualizing local R2 values on a map
covid_IR1['R2'] = gwr_results.localR2
covid_IR1.plot('R2', legend = True)
ax = plt.gca()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
# Create a column to store standardized residual values in the dataframe
covid_IR1['SR'] = gwr_results.std_res
covid_IR1.head()
# Visualizing standardized residuals on a map
covid_IR1['SR'] = gwr_results.std_res
covid_IR1.plot('SR', legend = True)
ax = plt.gca()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
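Besides mapping the standardized residuals, it can help to count how many observations fall far from zero. The sketch below flags values whose absolute size exceeds a cutoff. The cutoff of 2.5 is an assumed rule of thumb, and `flag_large_residuals` and `sr_demo` are hypothetical names used only for this example.

```python
import numpy as np

def flag_large_residuals(std_res, threshold=2.5):
    """Boolean mask of standardized residuals beyond +/- threshold (assumed cutoff)."""
    std_res = np.asarray(std_res).ravel()  # accepts (n,) or (n, 1) input
    return np.abs(std_res) > threshold

# Toy residuals (assumption), standing in for gwr_results.std_res
sr_demo = np.array([[0.3], [-2.9], [1.1], [2.6], [-0.4]])
mask_demo = flag_large_residuals(sr_demo)
print(mask_demo.sum(), np.flatnonzero(mask_demo))
```

In the notebook, the same mask could be built from `covid_IR1['SR']` and used to select the flagged rows for closer inspection.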
# Reviewing the local parameter estimates: one row per location, one column per coefficient.
# Column 0 is the constant; the remaining columns follow the order of the predictors listed in the OLS code
print(gwr_results.params)
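Printing the raw array makes it hard to tell which column belongs to which predictor. The sketch below pairs each column with a label and reports its minimum, median, and maximum. It uses a random stand-in array in place of `gwr_results.params`, and `summarize_params` is a hypothetical helper; the label order assumes the constant comes first, followed by the predictors in the order passed to the model.

```python
import numpy as np

predictors = ['PCT_50to74', 'PCT_over75', 'DIAB_PCT', 'CARDIO_MR', 'OBESE_PCT']
labels = ['Intercept'] + predictors

def summarize_params(params, labels):
    """Map each coefficient label to (min, median, max) across locations."""
    params = np.asarray(params)
    if params.shape[1] != len(labels):
        raise ValueError('number of labels must match the number of columns')
    return {name: (float(col.min()), float(np.median(col)), float(col.max()))
            for name, col in zip(labels, params.T)}

# Stand-in array (assumption): 8 locations x 6 coefficients
params_demo = np.random.default_rng(2).normal(size=(8, 6))
summary_demo = summarize_params(params_demo, labels)
for name, (low, med, high) in summary_demo.items():
    print(f'{name:>10}: min={low:.3f} median={med:.3f} max={high:.3f}')
```

In the notebook, the call would be `summarize_params(gwr_results.params, labels)`; a wide spread between the minimum and maximum of a coefficient suggests that its relationship with incidence varies across space.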
covid_IR2 = gp.read_file('/home/jovyan/shared_data/data/geospatialfellows21/lazarus_data/Data_Files1/Layer_IR2_1.shp')
covid_IR2.head()
# Run Ordinary Least Squares Regression
y = covid_IR2['IR2_log'].values.reshape((-1, 1))
X = covid_IR2[['PCT_50to74', 'PCT_over75', 'DIAB_PCT', 'CARDIO_MR', 'OBESE_PCT']].values
ols = OLS(y, X)
print(ols.summary)
# Defining variables and coordinates
g_y = covid_IR2['IR2_log'].values.reshape((-1, 1))
g_X = covid_IR2[['PCT_50to74', 'PCT_over75', 'DIAB_PCT', 'CARDIO_MR', 'OBESE_PCT']].values
u = covid_IR2['X']
v = covid_IR2['Y']
g_coords = list(zip(u, v))
# Inspecting the data contents
print('g_y:\n', g_y[:5])
print('\ng_X:\n', g_X[:5])
print('\nu:\n', list(u[:5]))
print('\nv:\n', list(v[:5]))
print('\ng_coords:\n', g_coords[:5], "\n")
# Testing suitable bandwidth prior to specifying the model
gwr_selector = Sel_BW(g_coords, g_y, g_X)
gwr_bw = gwr_selector.search()
print(gwr_bw)
# Specifying the GWR model - bandwidth set at 111 neighbors using an adaptive Gaussian kernel function
gwr_model = GWR(g_coords, g_y, g_X, bw=111, fixed=False, kernel='gaussian')
gwr_results = gwr_model.fit()
# Print model-level diagnostics: residual sum of squares, AIC, R2, and adjusted R2
print(gwr_results.resid_ss)
print(gwr_results.aic)
print(gwr_results.R2)
print(gwr_results.adj_R2)
# Create a column to store R2 values in the dataframe
covid_IR2['R2'] = gwr_results.localR2
covid_IR2.head()
# Visualizing local R2 values on a map
covid_IR2['R2'] = gwr_results.localR2
covid_IR2.plot('R2', legend = True)
ax = plt.gca()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
# Create a column to store standardized residual values in the dataframe
covid_IR2['SR'] = gwr_results.std_res
covid_IR2.head()
# Visualizing standardized residuals on a map
covid_IR2['SR'] = gwr_results.std_res
covid_IR2.plot('SR', legend = True)
ax = plt.gca()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
# Reviewing the local parameter estimates: one row per location, one column per coefficient.
# Column 0 is the constant; the remaining columns follow the order of the predictors listed in the OLS code
print(gwr_results.params)
Environmental Systems Research Institute (ESRI). (2021). How Geographically Weighted Regression (GWR) works. https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-geographicallyweightedregression-works.htm#GUID-A5307DAE-12AF-41C2-831B-10192ECB6CE4. Accessed Jun 15th, 2021
Fotheringham, A.S., Brunsdon, C., and Charlton, M. (2002). Geographically Weighted Regression. West Sussex, U.K.: John Wiley & Sons Ltd.
Multiscale Geographically Weighted Regression (MGWR). (2018). https://mgwr.readthedocs.io/en/latest/generated/mgwr.gwr.GWRResults.html#mgwr.gwr.GWRResults. Accessed Jun 15th, 2021
Ndiath, M.M., Cisse, B., Ndiaye, J.L., Gomis, J.F., Bathiery, O., Dia, A.T., Gaye, O., and Faye, B. (2015). Application of geographically-weighted regression analysis to assess risk factors for malaria hotspots in Keur Soce health and demographic surveillance site. Malaria Journal, 14:463. doi:10.1186/s12936-015-0976-9
Oshan, T.M., Li, Z., Kang, W., Wolf, L.J., and Fotheringham, A.S. (2019). MGWR: A Python Implementation of Multiscale Geographically Weighted Regression for Investigating Process Spatial Heterogeneity and Scale. International Journal of Geo-Information, 8:269. doi:10.3390/ijgi8060269
Spatial Regression Models (spreg). (2018). https://pysal.org/spreg/generated/spreg.OLS.html#spreg.OLS. Accessed Jun 15th, 2021