Getting Started with Interactive Mapping and Data Visualization¶

Prepared for 2021 GIS Day Workshop by Zhiyu Li¶

Programming with CyberGIS-Jupyter¶

Cell
- Add a cell: single click on a cell --> Menu --> Insert
- Delete a cell: single click on the cell to delete --> Menu --> Edit --> Delete Cells
- Reorder a cell: click on a cell --> up arrow or down arrow
- Change cell type: single click on a cell --> Menu --> Cell --> Cell Type
- Edit cell
  - Markdown: double click (basic syntax reference)
  - Code: single click
- Run a cell: single click on a cell --> Run (or Shift + Enter)
- Run all cells: Menu --> Cell --> Run All
- Clear all cell output: Menu --> Cell --> All Output --> Clear
Kernel
- Change kernel: Menu --> Kernel --> Change kernel
  - Choose a versioned kernel (e.g., XXXXX-0.8.0) before sharing
- Restart kernel: Menu --> Kernel --> Restart (& Clear output)
See all notebooks (Tree View) on CyberGIS-Jupyter
- On CyberGIS-Hub page: click on "Launch CyberGISX" at the upper-right corner
- On an opened notebook page: remove notebook filename (XXXX.ipynb) from browser address bar
Troubleshooting
- Restart kernel
- Restart CyberGIS-Jupyter (save notebook first!): Control Panel --> Stop My Server --> Start Server
- Bug Report button
- Announcement area (maintenance plan, release notes)
More Info
- "Getting Started" series notebooks on CyberGISX
- "Jupyter Notebooks" section in Knowledge Base
Try it out now

Task 1: Add one new code cell after this cell¶

Hint: Single click on this cell --> go to Menu --> Insert --> Insert Cell Below

Task 2: Change the new cell's type to Markdown, and write something in it

Hint: Single click on the new cell --> go to Menu --> Cell --> Cell Type --> Markdown; single click on the new cell --> write some Markdown (see basic syntax reference); click on the "Run" button on the toolbar.

Task 3: Uncomment the Python code in the cell below and run it¶

Hint: Single click on the cell below --> remove the pound sign ('#') --> press Shift + Enter together (or click on the "Run" button on the toolbar)

#print("hello world")

Task 4: Clear all output of this notebook¶

Hint: go to Menu --> Cell --> All Output --> Clear

------------- Below is the main section of the Mapping and Visualization notebook -----------------¶

Introduction¶

This notebook walks you through some basic techniques for interactive mapping and data visualization in the CyberGIS-Jupyter environment. We will retrieve the latest COVID-19 data from the Illinois Department of Public Health (IDPH) website, examine and preprocess the data, plot daily new cases by county using matplotlib, create an interactive map with ipyleaflet to visualize the weekly change of new cases across the state, and finally link the map and the plot together.

After finishing this notebook, you will have an "app-style" notebook like the screenshot below. You are encouraged to tweak the code a little to visualize other COVID-19 indices, such as deaths and tests.

Notebook Outline:¶

Introduction
Setup
Data retrieval and preprocessing
- Download data from IDPH website
- Examine raw data
Plot time series data for a county
Visualize weekly change of new cases at county level across the state
App-style Interactive Map
Using this Metric Alone May Mislead You

Setup¶

This cell imports the required modules and libraries. A brief description of the purpose of each library can be found below:

json - standard Python module for JSON format I/O operations
wget - for downloading files from URLs
numpy - for handling N-dimensional arrays and numerical computing
pandas - for tabular data analysis and manipulation
geopandas - for reading and inspecting geospatial data such as GeoJSON
ipyleaflet - for interactive mapping in Jupyter Notebook environment
branca - for dealing with colormaps
matplotlib - for creating plots and figures

import json
import wget
import numpy as np
import pandas as pd
import geopandas as gpd
import ipyleaflet
from branca import colormap
import matplotlib.pyplot as plt

# for interactive charting
%matplotlib notebook

Data retrieval and preprocessing¶

The Illinois Department of Public Health has a COVID-19 data portal that provides different metrics and data, including "county-level historical cases, deaths and tested", "hospitalization data", "vaccine administration data", "zipcode-level cases and tested data" and others. As noted on the website, data availability, update frequency, data format and the metrics reported are subject to change.

Since IDPH may stop updating the data on their website or change the format in the future, we downloaded the data as of Nov 16, 2021 and saved it (filename: idph_counties_Nov16_2021.csv) alongside the notebook as an archive. You can also programmatically download the latest "county-level historical cases, deaths and tested" data on the fly with the download code in the cell below.

Download data from IDPH website¶

# Archived copy of the data downloaded from IDPH as of Nov 16, 2021
idph_counties_csv = "./data/idph_counties_Nov16_2021.csv"

## The 2 lines below download the latest data from IDPH on the fly;
## comment them out to use the archived Nov 16, 2021 copy instead.
## IDPH may have removed the data or changed its format after this notebook was developed,
## in which case you would need to change the code accordingly.
## see: https://dph.illinois.gov/covid19/data/data-portal/all-county-historical-snapshot.html

idph_counties_url = "https://idph.illinois.gov/DPHPublicInformation/api/COVIDExport/GetSnapshotHistorical?format=csv"
idph_counties_csv = wget.download(idph_counties_url, out="./idph_counties.csv")

print("Using data at {}".format(idph_counties_csv))

Using data at ./idph_counties.csv
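
Because the IDPH endpoint may change or disappear, it can help to fall back to the archived copy when the download fails. Below is a minimal sketch of that idea, not part of the original notebook. The helper name resolve_csv and the two fake downloaders are hypothetical; in the notebook you would pass wget.download as the download argument.

```python
def resolve_csv(download, url, out_path, archived_csv):
    """Return the path of the CSV to use: a fresh download if it works, else the archive.

    `download` is any callable with the shape download(url, out=path) -> path,
    which is how this notebook calls wget.download.
    """
    try:
        return download(url, out=out_path)
    except Exception as exc:  # network error, HTTP error, changed endpoint, ...
        print("Download failed ({}); using archived copy".format(exc))
        return archived_csv


# Hypothetical stand-ins for wget.download, used only to exercise both paths
def fake_download_ok(url, out):
    return out


def fake_download_fail(url, out):
    raise IOError("endpoint unavailable")


archived = "./data/idph_counties_Nov16_2021.csv"
fresh = resolve_csv(fake_download_ok, "https://example.org/data.csv", "./idph_counties.csv", archived)
fallback = resolve_csv(fake_download_fail, "https://example.org/data.csv", "./idph_counties.csv", archived)
print(fresh)
print(fallback)
```

Passing the downloader in as an argument keeps the fallback logic testable without network access.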

Examine raw data¶

The raw data is a CSV file. We load it into a pandas dataframe (the first row is skipped because it is a title line).

df = pd.read_csv(idph_counties_csv, skiprows=1, parse_dates=['ReportDate'])
df

Each row includes metrics of a specific county reported on a specific day. We can see the earliest data is from 2020-03-17 and the rows at the bottom are the most recent data.
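
To see what skiprows=1 and parse_dates do, the sketch below reads a tiny made-up CSV with the same layout (a title line followed by a header row). The sample text and its values are invented for illustration; only the column names come from the real file.

```python
import io

import pandas as pd

# Made-up sample: line 1 is a title line, line 2 is the real header
sample_csv = io.StringIO(
    "Some title line that is not part of the table\n"
    "CountyName,CumulativeCases,CasesChange,ReportDate\n"
    "Adams,0,0,2020-03-17\n"
    "Bond,0,0,2020-03-17\n"
    "Adams,2,2,2020-03-18\n"
    "Bond,1,1,2020-03-18\n"
)

# skiprows=1 drops the title line; parse_dates converts ReportDate to datetimes
toy = pd.read_csv(sample_csv, skiprows=1, parse_dates=["ReportDate"])

# With a datetime column, min/max give the date range covered by the data
first_day, last_day = toy["ReportDate"].min(), toy["ReportDate"].max()
rows_per_county = toy.groupby("CountyName").size()
print(first_day.date(), last_day.date())
print(rows_per_county)
```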


List all county names. Note that IDPH separates the Chicago city area from Cook county in this dataset and treats Chicago as a standalone county. The list also contains an "Illinois" entry, which appears to hold statewide totals.

df.CountyName.unique()

array(['Adams', 'Alexander', 'Bond', 'Boone', 'Brown', 'Bureau',
       'Calhoun', 'Carroll', 'Cass', 'Champaign', 'Chicago', 'Christian',
       'Clark', 'Clay', 'Clinton', 'Coles', 'Cook', 'Crawford',
       'Cumberland', 'De Witt', 'DeKalb', 'Douglas', 'DuPage', 'Edgar',
       'Edwards', 'Effingham', 'Fayette', 'Ford', 'Franklin', 'Fulton',
       'Gallatin', 'Greene', 'Grundy', 'Hamilton', 'Hancock', 'Hardin',
       'Henderson', 'Henry', 'Illinois', 'Iroquois', 'Jackson', 'Jasper',
       'Jefferson', 'Jersey', 'Jo Daviess', 'Johnson', 'Kane', 'Kankakee',
       'Kendall', 'Knox', 'Lake', 'LaSalle', 'Lawrence', 'Lee',
       'Livingston', 'Logan', 'Macon', 'Macoupin', 'Madison', 'Marion',
       'Marshall', 'Mason', 'Massac', 'McDonough', 'McHenry', 'McLean',
       'Menard', 'Mercer', 'Monroe', 'Montgomery', 'Morgan', 'Moultrie',
       'Ogle', 'Peoria', 'Perry', 'Piatt', 'Pike', 'Pope', 'Pulaski',
       'Putnam', 'Randolph', 'Richland', 'Rock Island', 'Saline',
       'Sangamon', 'Schuyler', 'Scott', 'Shelby', 'St. Clair', 'Stark',
       'Stephenson', 'Tazewell', 'Union', 'Vermilion', 'Wabash', 'Warren',
       'Washington', 'Wayne', 'White', 'Whiteside', 'Will', 'Williamson',
       'Winnebago', 'Woodford'], dtype=object)

Plot time series data for a county¶

We will plot the time series of daily new cases (metric/column name: "CasesChange") for a county you select. You may change the county name or the metric/column name and re-run the following cells to visualize a different plot of interest.

county_name = "Chicago" # pick a county from above and put it here
metric_name = "CasesChange" # which metric/column to plot, see the headers of the original dataframe

Here we extract data for the county selected.

one_county = df[df["CountyName"]==county_name].set_index("ReportDate")
one_county

Plot the selected metric/column ("CasesChange" by default). The plot is created with matplotlib. Pandas has built-in support for matplotlib, so we can make a plot directly from a pandas dataframe.

fig1, ax1 = plt.subplots(1,1, figsize=(8,4))
title = 'COVID-19 {} - {}'.format(metric_name, county_name)
one_county[metric_name].plot(ax=ax1, title=title)

<AxesSubplot:title={'center':'COVID-19 CasesChange - Chicago'}, xlabel='ReportDate'>
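
Daily counts can be noisy, so a 7-day rolling mean may make the trend easier to read. This is an optional extension rather than part of the original notebook, and the toy numbers below are made up.

```python
import pandas as pd

# Made-up daily new cases for 10 consecutive days
dates = pd.date_range("2021-11-01", periods=10, freq="D")
daily = pd.Series([10, 12, 3, 2, 15, 14, 14, 10, 12, 3], index=dates, name="CasesChange")

# Mean over a trailing 7-day window; the first 6 days have no full window and are NaN
smoothed = daily.rolling(window=7).mean()
print(smoothed)

# In the notebook this could be drawn on the same axes, e.g.:
# one_county[metric_name].rolling(window=7).mean().plot(ax=ax1)
```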

Visualize weekly change of new cases at county level across the state¶

In this section, we will create a choropleth map to visualize the weekly change of new cases at the county level across the state. We pick a date (day0) as the starting point, look back 1 week, and calculate the number of new cases reported for that week (cases_current_week). We then look back another week and calculate the same metric (cases_last_week). The weekly change rate is (cases_current_week - cases_last_week) / cases_last_week * 100%, calculated county by county. If the resulting number is positive (or negative), we are getting more (or fewer) cases than last week. The magnitude shows how fast the situation is changing, and we will use different colors to represent it on the map.

There are 2 edge cases:

a) If there are new cases for the current week (cases_current_week is not 0) but no new cases were reported for the last week (cases_last_week = 0), the change rate calculated by pandas becomes positive infinity (numpy.inf) because the denominator is 0. This case is fine, as we will handle it in data classification;

b) If cases_current_week and cases_last_week are both 0, the pandas calculation gives numpy.nan (Not a Number). We need to replace it with 0 before data classification.
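
The formula and both edge cases can be checked on a few made-up numbers. This is only an illustrative sketch; the county labels A to D and their counts are invented.

```python
import numpy as np
import pandas as pd

# Made-up weekly new-case counts for four hypothetical counties
cases_current_week = pd.Series({"A": 30, "B": 4, "C": 0, "D": 5})
cases_last_week = pd.Series({"A": 20, "B": 0, "C": 0, "D": 10})

# (current - last) / last, computed county by county
raw_rate = cases_current_week.sub(cases_last_week).div(cases_last_week)
# A: (30-20)/20 = 0.5
# B: 4/0  -> inf  (edge case a)
# C: 0/0  -> NaN  (edge case b)
# D: (5-10)/10 = -0.5

# Replace NaN with 0 (no cases in either week is treated as no change); inf is left as is
rate = raw_rate.fillna(0)
print(rate)
```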

Note: This metric is selected for demonstration purposes only. Using a single metric alone may mislead you.

Weekly Change Rate

Calculate weekly change rate¶

Here we pick a date as day0 and extract "CumulativeCases" on that day for all counties. Be sure the selected date is within the range of the data and at least 2 weeks after the earliest reporting date (2020-03-17).

Here we calculate the number of new cases reported for the current week (cases_current_week) and the last week (cases_last_week).

day0 = np.datetime64('2021-11-16') # pick a date in the format of YYYY-MM-DD

# calculate new cases for this week
cum_cases_day0 = df[df["ReportDate"]==day0].set_index("CountyName").CumulativeCases
day_1week = day0 - np.timedelta64(1,'W')
cum_cases_1week = df[df["ReportDate"]==day_1week].set_index("CountyName").CumulativeCases
cases_current_week = cum_cases_day0.sub(cum_cases_1week)
# calculate new cases for last week
day_2week = day0 - np.timedelta64(2,'W')
cum_cases_2week = df[df["ReportDate"]==day_2week].set_index("CountyName").CumulativeCases
cases_last_week = cum_cases_1week.sub(cum_cases_2week)

pd.DataFrame({'Current Week': cases_current_week,'Last Week':cases_last_week}).sort_values(by="Last Week")
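
Each weekly count is a difference of cumulative totals on two dates, so a date with no rows in the data produces an empty selection and NaN results instead of an error. The sketch below wraps the same steps in a hypothetical helper, weekly_new_cases, that fails early when a required date is absent. The toy dataframe is made up.

```python
import numpy as np
import pandas as pd


def weekly_new_cases(df, day0):
    """Return (current_week, last_week) new cases per county from cumulative totals."""
    cumulative = []
    for weeks_back in (0, 1, 2):
        day = day0 - np.timedelta64(weeks_back, "W")
        rows = df[df["ReportDate"] == day]
        if rows.empty:
            raise ValueError("No data reported on {}".format(day))
        cumulative.append(rows.set_index("CountyName")["CumulativeCases"])
    current_week = cumulative[0].sub(cumulative[1])
    last_week = cumulative[1].sub(cumulative[2])
    return current_week, last_week


# Made-up cumulative totals for two hypothetical counties on three weekly dates
toy = pd.DataFrame({
    "CountyName": ["A", "B"] * 3,
    "ReportDate": pd.to_datetime(["2021-11-02"] * 2 + ["2021-11-09"] * 2 + ["2021-11-16"] * 2),
    "CumulativeCases": [100, 50, 120, 50, 150, 54],
})

current_week, last_week = weekly_new_cases(toy, np.datetime64("2021-11-16"))
print(pd.DataFrame({"Current Week": current_week, "Last Week": last_week}))
```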

Calculate the weekly change rate for all counties and sort the counties by rate. We replace any numpy.nan with 0 and leave any numpy.inf as is. You can check whether any county has positive infinity (numpy.inf).

weekly_change_rate = cases_current_week.sub(cases_last_week).div(cases_last_week).fillna(0)
weekly_change_rate.sort_values().to_frame("Change_Rate")
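
To check for counties with an infinite change rate, numpy.isinf can be applied to the Series. The sketch below uses a made-up Series as a stand-in for weekly_change_rate.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for weekly_change_rate
toy_rate = pd.Series({"A": 0.5, "B": np.inf, "C": 0.0, "D": -0.5})

# Boolean mask that is True where the rate is +inf or -inf
is_infinite = np.isinf(toy_rate)
infinite_counties = toy_rate[is_infinite].index.tolist()
print(infinite_counties)  # ['B']
```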

Data classification and Colormap¶

We have calculated the weekly change rate for every county. To visualize the rates on the map, we need to classify this metric into several classes and assign each class a different color.

Here we classify the weekly change rate into 5 classes. Note that we are doing manual data classification and the breaks we chose are fairly arbitrary. How to classify mainly depends on the data itself and how you want to represent it on the map. Sometimes trial and error is needed to find the "best" classification. There are also more advanced methods and tools. See Data Classification For Choropleth Maps and MapClassify (https://github.com/pysal/mapclassify) for details.

  > +50%      Class 1
+10% to +50%  Class 2   
-10% to +10%  Class 3            
-10% to -50%  Class 4       
  < -50%      Class 5

# the function that maps weekly change rate values to class 1-5
def classify(v):
    if v > 0.5:            
        color_index = 1    #  class 1 - change rate > 50%
    elif v > 0.1:
        color_index = 2    #  class 2 - change rate 10% to 50%
    elif v > -0.1:
        color_index = 3    #  class 3 - change rate -10% to +10%
    elif v > -0.5:
        color_index = 4    #  class 4 - change rate -50% to -10%
    else: # < -0.5
        color_index = 5    #  class 5 - change rate < -50%
    return color_index

Apply the above classify() function to every county in the "weekly change rate" Series. The result lists county names and their assigned class (color) indices.

weekly_change_rate_class = weekly_change_rate.apply(classify)
weekly_change_rate_class.to_frame("Class")
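
The same breaks can also be applied in one vectorized call with pandas.cut, shown here as an alternative to classify() rather than what the notebook uses. With right=True each bin is open on the left and closed on the right, which matches the strict > comparisons in classify(). The sample values are made up.

```python
import numpy as np
import pandas as pd


def classify(v):
    # Same thresholds as the notebook's classify() function
    if v > 0.5:
        return 1
    elif v > 0.1:
        return 2
    elif v > -0.1:
        return 3
    elif v > -0.5:
        return 4
    return 5


# Made-up rates, including values that sit exactly on a break and +inf
toy_rate = pd.Series({"A": np.inf, "B": 0.5, "C": 0.3, "D": 0.0, "E": -0.1, "F": -0.5, "G": -0.8})

bins = [-np.inf, -0.5, -0.1, 0.1, 0.5, np.inf]  # bin edges, ascending
labels = [5, 4, 3, 2, 1]                        # class index of each bin, lowest bin first
by_cut = pd.cut(toy_rate, bins=bins, labels=labels, right=True).astype(int)
by_apply = toy_rate.apply(classify)
print(pd.DataFrame({"cut": by_cut, "apply": by_apply}))
```

Keeping the breaks in a single list makes it easier to experiment with different classifications.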

Once the data is classified, we can pick a color for every class. The library branca provides a large collection of prebuilt color ramps. Here we have picked a Red-Yellow-Green color ramp. You may uncomment the following cell to see more available color ramps.

#colormap.linear

cm_linear = colormap.linear.RdYlGn_08
cm_linear

We create 5 discrete color steps and assign class indices to them:

N_color_steps = 5  # How many discrete color steps
# The "cm" object (color map) is a callable that takes an index and returns a color code.
cm = cm_linear.to_step(index=range(N_color_steps+1), round_method="int")

def display_colormap(cm_func, vlist):
    from IPython.display import HTML
    n = len(vlist)
    s = '<svg height="40" width="{}">'.format(n*40) \
    + "".join(['<circle cx="{}" cy="20" r="20" fill="{}"/><text x="{}" y="25">{}</text>'.format(i*40+20, cm_func(vlist[i]), i*40+15, vlist[i]) for i in range(len(vlist))]) \
    +'</svg>'
    return HTML(s)
display_colormap(cm, range(1, N_color_steps+1))

Red           Class 1      > +50% 
Orange        Class 2      +10% to +50%
Yellow        Class 3      -10% to +10%
Light Green   Class 4      -10% to -50%
Green         Class 5      < -50%

Create a choropleth map with ipyleaflet¶

Here we use ipyleaflet for mapping, a Jupyter extension that brings Leaflet features to the notebook environment. (Note that there are also other tools available for creating maps in a notebook, such as folium, plotly, carto, mapbox and arcgis.)

We first create a "map" object, center it on Illinois, and add some basic controls to the map, including a scale bar and a layer control. For more map control options, see the ipyleaflet documentation.

# create a ipyleaflet map obj, centering at Illinois
map = ipyleaflet.Map(center=[40.6, -89.6],  zoom = 6)
# add a layer control at the topright
map.add_control(ipyleaflet.LayersControl(position='topright'))
# add a scale bar at the bottomleft
map.add_control(ipyleaflet.ScaleControl(position='bottomleft'))
map

In the "data" folder, there is a GeoJSON file ("idph_geometry.geojson") that contains the geometry (polygon) of every Illinois county. A "Chicago county" was added to make it compatible with the IDPH data. We can use GeoPandas to take a quick look at it.

# geojson file has county geometry (polygon)
county_geomoetry_geojson = "data/idph_geometry.geojson"
gpd.read_file(county_geomoetry_geojson)[["id", "geometry"]]

We put everything together using the ipyleaflet.Choropleth class. There are 3 parameters we need to pay attention to: "geo_data" is the geometry (county polygons) to plot; "choro_data" is a dictionary that maps each geometry (county) to a class index (classified weekly change rate); "colormap" is the colormap function that converts class indices into colors.

ipyleaflet.Choropleth

with open(county_geomoetry_geojson, 'r') as f:
    layer = ipyleaflet.Choropleth(
                name="Weekly Change of New COVID-19 Cases",
                geo_data=json.load(f),                          # County geometry (geojson file)
                choro_data=weekly_change_rate_class.to_dict(),  # Geometry ID --> Class Index
                colormap=cm,                                    # Class Index --> Color
                style={'fillOpacity': 0.8})                     
map.add_layer(layer)
map
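
Since choro_data maps geometry ids to class indices, it is worth confirming that the two sets of keys line up before building the layer; a feature whose id has no entry in choro_data may cause an error. The sketch below is a hypothetical helper, not part of ipyleaflet. It assumes every GeoJSON feature carries a top-level "id", and the toy inputs are made up.

```python
def compare_keys(geo_data, choro_data):
    """Return (feature ids missing from choro_data, choro_data keys without a feature)."""
    feature_ids = {feature["id"] for feature in geo_data["features"]}
    missing = sorted(feature_ids - set(choro_data))
    unused = sorted(set(choro_data) - feature_ids)
    return missing, unused


# Made-up inputs: two features, and a data dict with an extra statewide key
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Adams", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "Bond", "properties": {}, "geometry": None},
    ],
}
toy_data = {"Adams": 2, "Illinois": 1}

missing, unused = compare_keys(toy_geo, toy_data)
print(missing)  # ['Bond']
print(unused)   # ['Illinois']
```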

Add a legend to the map

legend = ipyleaflet.LegendControl({"> +50%": cm(1),
                                   "+10% to +50%": cm(2),
                                   "Steady (-10% to +10%)": cm(3),
                                   "-10% to -50%": cm(4),
                                   "< -50%": cm(5)},
                                   name="Weekly Change of New Cases",
                                   position="bottomright")
map.add_control(legend)
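
The class breaks appear in several places (classify(), the colormap, and the legend), which makes them easy to get out of sync. One possible way to keep the legend consistent is to generate it from a single table of labels. In this sketch, color_for is a hypothetical stand-in for the notebook's cm object, and the hex codes are placeholders rather than branca's actual colors.

```python
# Legend label for each class index
CLASS_LABELS = {
    1: "> +50%",
    2: "+10% to +50%",
    3: "Steady (-10% to +10%)",
    4: "-10% to -50%",
    5: "< -50%",
}

# Placeholder colors (assumption): in the notebook, cm(class_index) would supply these
PLACEHOLDER_COLORS = {1: "#d73027", 2: "#fc8d59", 3: "#ffffbf", 4: "#91cf60", 5: "#1a9850"}


def color_for(class_index):
    return PLACEHOLDER_COLORS[class_index]


def build_legend(labels, color_func):
    """Map each legend label to the color of its class, in class order."""
    return {label: color_func(index) for index, label in sorted(labels.items())}


legend_dict = build_legend(CLASS_LABELS, color_for)
print(legend_dict)
# In the notebook, build_legend(CLASS_LABELS, cm) could replace the hand-written dictionary.
```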

App-style Interactive Map¶

ipyleaflet is built on ipywidgets, which lets you monitor user actions on the map and respond to them.

In this case, when a county is clicked, we catch the county id (name) and plot the daily new cases time series as we did above, making it an app-style interactive map.

def layer_on_click(**kwargs):
    global ax2
    ax2.cla() # clear previous plot
    county_name=kwargs["id"] # get the id (name) of the clicked county
    one_county = df[df["CountyName"]==county_name].set_index("ReportDate") # extract data for this county
    one_county.CasesChange.plot(ax=ax2, title='COVID-19 Daily Cases - {}'.format(county_name)) # plot time series

fig2, ax2 = plt.subplots(1,1, figsize=(9,4))
fig2.suptitle("(Click on the Map to view COVID-19 Daily Cases)")
layer.on_click(layer_on_click) # monitor mouse click event on the layer
map
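
Separating the data lookup from the plotting makes the click handler easier to test without a map. The sketch below assumes, as the cell above does, that the callback receives the clicked county's name under the "id" keyword. The names make_click_handler and on_series are hypothetical, and the toy dataframe is made up.

```python
import pandas as pd


def make_click_handler(df, on_series, metric="CasesChange"):
    """Build a callback that looks up one county's time series and passes it to on_series."""
    def handler(**kwargs):
        county_name = kwargs.get("id")
        one_county = df[df["CountyName"] == county_name].set_index("ReportDate")
        if one_county.empty:
            return  # unknown id: nothing to plot
        on_series(county_name, one_county[metric])
    return handler


# Made-up data for two hypothetical click targets
toy = pd.DataFrame({
    "CountyName": ["Adams", "Bond", "Adams", "Bond"],
    "ReportDate": pd.to_datetime(["2021-11-15", "2021-11-15", "2021-11-16", "2021-11-16"]),
    "CasesChange": [3, 0, 5, 1],
})

received = []
handler = make_click_handler(toy, lambda name, series: received.append((name, series.tolist())))
handler(id="Adams")    # simulated click on Adams
handler(id="Nowhere")  # unknown id is ignored
print(received)  # [('Adams', [3, 5])]
```

In the notebook, on_series would be a function that clears ax2 and plots the series it receives.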

Using this Metric Alone May Mislead You¶

This metric is selected for demonstration purposes only. Using this single metric alone may mislead you! For example, say a county's COVID-19 cases have been high and at the top of the curve for a long time. The weekly change rate could still be a relatively small value, because the numbers of new cases in consecutive weeks are close.
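
A small made-up example shows the effect: a county with a steady, high case count gets a change rate of 0, while a county going from 1 to 3 cases gets +200%. The county labels and numbers are invented for illustration.

```python
import pandas as pd

# Made-up weekly new cases for two hypothetical counties
weekly = pd.DataFrame(
    {"Last Week": [5000, 1], "Current Week": [5000, 3]},
    index=["Large steady county", "Small county"],
)
weekly["Change_Rate"] = (weekly["Current Week"] - weekly["Last Week"]) / weekly["Last Week"]
print(weekly)
```

With the class breaks used above, the first county would be mapped as "steady" and the second as "> +50%", even though the first has far more cases.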

Output of df (raw data from the "Examine raw data" cell):
	CountyName	CumulativeCases	CasesChange	TotalTested	TotalTestedChange	Deaths	DeathsChange	ReportDate
0	Adams	0	0	0	0	0	0	2020-03-17
1	Alexander	0	0	0	0	0	0	2020-03-17
2	Bond	0	0	0	0	0	0	2020-03-17
3	Boone	0	0	0	0	0	0	2020-03-17
4	Brown	0	0	0	0	0	0	2020-03-17
...	...	...	...	...	...	...	...	...
78827	Whiteside	15376	1	191353	272	225	0	2022-04-14
78828	Will	162799	180	2371004	3814	1530	0	2022-04-14
78829	Williamson	22583	7	263413	360	249	0	2022-04-14
78830	Winnebago	76629	32	973502	1261	819	0	2022-04-14
78831	Woodford	10397	11	157757	179	111	0	2022-04-14

Output of one_county (data for the selected county):
	CountyName	CumulativeCases	CasesChange	TotalTested	TotalTestedChange	Deaths	DeathsChange
ReportDate
2020-03-17	Chicago	0	0	0	0	0	0
2020-03-18	Chicago	0	0	0	0	0	0
2020-03-19	Chicago	0	0	0	0	0	0
2020-03-20	Chicago	0	0	0	0	0	0
2020-03-21	Chicago	0	0	0	0	0	0
...	...	...	...	...	...	...	...
2022-04-10	Chicago	569847	234	11600646	7311	7338	0
2022-04-11	Chicago	570277	430	11602904	2258	7338	0
2022-04-12	Chicago	570680	403	11611165	8261	7338	0
2022-04-13	Chicago	571214	534	11621370	10205	7340	2
2022-04-14	Chicago	571988	774	11634853	13483	7340	0

Output of the county geometry inspection with GeoPandas:
	id	geometry
0	Chicago	MULTIPOLYGON (((-87.93514 42.00089, -87.93521 ...
1	McHenry	POLYGON ((-88.70742 42.49352, -88.70741 42.493...
2	Boone	POLYGON ((-88.70742 42.49352, -88.70750 42.493...
3	Ogle	POLYGON ((-89.68809 42.19950, -89.68807 42.184...
4	Will	POLYGON ((-88.26146 41.72439, -88.26103 41.708...
...	...	...
98	Lawrence	POLYGON ((-87.90806 38.85013, -87.90819 38.835...
99	Marion	POLYGON ((-89.13844 38.73633, -89.13847 38.721...
100	Union	POLYGON ((-89.04143 37.59650, -89.06017 37.597...
101	Pope	POLYGON ((-88.70860 37.59926, -88.70876 37.584...
102	Cook	MULTIPOLYGON (((-87.93993 41.99344, -87.94006 ...

Output of the current week vs. last week comparison:
	Current Week	Last Week
CountyName
Scott	4	0
Pope	1	3
Gallatin	3	3
Pulaski	5	5
Stark	15	6
...	...	...
Will	1510	1068
DuPage	1712	1371
Chicago	3682	2717
Cook	4199	2986
Illinois	24923	18559

Output of weekly_change_rate, sorted:
	Change_Rate
CountyName
Brown	-0.833333
Hamilton	-0.666667
Pope	-0.666667
Henderson	-0.416667
Mason	-0.391304
...	...
Menard	1.259259
Stark	1.500000
Pike	1.526316
McDonough	1.760000
Scott	inf