Interactive Mapping using Python and Bokeh

Author: Rebecca Vandewalle rcv3@illinois.edu
Created: 5-31-21

This notebook demonstrates how to create interactive maps using the bokeh library (https://docs.bokeh.org/en/latest/index.html).

Introduction

The goals of this notebook are:

  • Learn how to map shapefiles
  • Learn how edit and create new dataframes and geodataframes
  • Learn how to add interactive components to maps and link data

Installing Bokeh

This tutorial uses Bokeh 2.3 or higher. To install, remove the comment on the next line. Once it is installed, add the comment back so you are not re-installing the library each time you run the notebook.

In [1]:
#pip install --upgrade bokeh==2.3
In [2]:
# check bokeh version - expected '2.3.0'
import bokeh
bokeh.__version__
Out[2]:
'2.3.0'

Data used in the Notebook

The city of Chicago has an open data portal with a variety of geospatial and non-geospatial datasets at https://data.cityofchicago.org/browse?q=zip%20codes&sortBy=relevance.

Data for this notebook has been downloaded from this portal, renamed for convenience, and is stored in the root folder.

The raw data for the chicago_parks shapefile (in the chicago_parks folder) can be found here: https://data.cityofchicago.org/Parks-Recreation/Parks-Chicago-Park-District-Facilities-current-/5yyk-qt9y . This dataset contains general information and point locations for facilities within parks in Chicago.

The raw data for the chicago_zip shapefile (in the chicago_zip folder) can be found here: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-ZIP-Codes/gdcf-axmw . This dataset contains zip code boundaries for Chicago.

Next, movies_2018.csv contains movies that were show in Chicago parks during summer 2018. The raw data can be found here: https://data.cityofchicago.org/Events/Chicago-Park-District-Movies-in-the-Parks-2018/e2v8-k3us .

Finally, an example image, Back_to_the_Future.jpeg, was downloaded from https://en.wikipedia.org/wiki/Back_to_the_Future#/media/File:Back_to_the_Future.jpg .

Load Libraries and setup Bokeh

The following libraries will be used in this notebook. Run the next cell to import them.

In [3]:
# import libraries

import numpy as np
from datetime import date, datetime
import pandas as pd
import geopandas as gpd

from bokeh.plotting import save, figure
from bokeh.io import show, output_notebook, push_notebook
from bokeh.models import GeoJSONDataSource, ColumnDataSource, CustomJS
from bokeh.models import LinearColorMapper, ColorBar
from bokeh.models import DataTable, DateFormatter, TableColumn
from bokeh.models import HoverTool, TapTool, Div
from bokeh.models import DateRangeSlider, Dropdown
from bokeh.palettes import brewer
from bokeh.events import Tap
from bokeh.tile_providers import Vendors, get_provider
from bokeh.layouts import gridplot, layout, column, row

The output_notebook function is needed for the interactions to work properly.

In [4]:
# run to properly interact with bokeh

output_notebook();
Loading BokehJS ...

Map a Shapefile

This first example will demonstrate how to load shapefiles, and display two linked maps.

Load Shapefiles

It is first necessary to load shapefiles. These will be read with geopandas (gpd) and loaded as a geodataframe.

In [5]:
# load chicago zip codes dataset

chicago_zip_fp = r'chicago_zip/chicago_zip.shp'
chi_zip_codes = gpd.read_file(chicago_zip_fp)
In [6]:
# load chicago zip codes dataset

chicago_parks_fp = r'chicago_parks/chicago_parks.shp'
chi_parks_gdf = gpd.read_file(chicago_parks_fp)
In [7]:
# display dataset types
print(type(chi_zip_codes))
print(type(chi_parks_gdf))
<class 'geopandas.geodataframe.GeoDataFrame'>
<class 'geopandas.geodataframe.GeoDataFrame'>

It is very useful to check the first few rows of a loaded dataset and possibly quickly plot the data to see what the data look like.

In [8]:
# view first few rows of chicago zip codes dataset

chi_zip_codes.head()
Out[8]:
objectid shape_area shape_len zip geometry
0 33.0 1.060523e+08 42720.044406 60647 POLYGON ((-87.67762 41.91776, -87.67761 41.917...
1 34.0 1.274761e+08 48103.782721 60639 POLYGON ((-87.72683 41.92265, -87.72693 41.922...
2 35.0 4.506904e+07 27288.609612 60707 POLYGON ((-87.78500 41.90915, -87.78531 41.909...
3 36.0 7.085383e+07 42527.989679 60622 POLYGON ((-87.66707 41.88885, -87.66707 41.888...
4 37.0 9.903962e+07 47970.140153 60651 POLYGON ((-87.70656 41.89555, -87.70672 41.895...
In [9]:
# view first few rows of chicago parks dataset

chi_parks_gdf.head()
Out[9]:
facility_n facility_t gisobjid objectid park park_no x_coord y_coord geometry
0 CULTURAL CENTER SPECIAL 2494.0 1066.0 HAMILTON (ALEXANDER) 9.0 -87.637698 41.762999 POINT (-87.63770 41.76300)
1 GYMNASIUM INDOOR 2495.0 1067.0 HAMILTON (ALEXANDER) 9.0 -87.637929 41.762817 POINT (-87.63793 41.76282)
2 BASEBALL JR/SOFTBALL OUTDOOR 2496.0 1068.0 HAMILTON (ALEXANDER) 9.0 -87.636914 41.760849 POINT (-87.63691 41.76085)
3 BASEBALL JR/SOFTBALL OUTDOOR 2497.0 1069.0 HAMILTON (ALEXANDER) 9.0 -87.638320 41.762005 POINT (-87.63832 41.76201)
4 BASEBALL JR/SOFTBALL OUTDOOR 2498.0 1070.0 HAMILTON (ALEXANDER) 9.0 -87.638059 41.760474 POINT (-87.63806 41.76047)
In [10]:
# view simple plot of chicago zip codes dataset

chi_zip_codes.plot();
In [11]:
# view simple plot of chicago zip codes dataset

chi_parks_gdf.plot();

Edit Shapefiles for Plotting

Next we will make some adjustments to the shapefiles to prepare to plot. We will make sure the spatial information is in the needed format and add some columns to allow for plotting.

Adjust coordinate reference

For everything to plot correctly, we need the coordinate reference system (CRS) for each geospatial data source to be the same. Additionally, since we will be plotting over external map tiles, we need the CRS to be the Web Mercator projection (epsg: 3857). Common reference systems can be found here.

In [12]:
# display current CRS

zip_CRS = chi_zip_codes.crs
park_CRS = chi_parks_gdf.crs

print(zip_CRS)
print(park_CRS)
epsg:4326
epsg:4326
In [13]:
# convert to web mercator (epsg=3857) for tile mapping

chi_zip_codes = chi_zip_codes.to_crs(epsg=3857)
chi_parks_gdf = chi_parks_gdf.to_crs(epsg=3857)

zip_CRS = chi_zip_codes.crs
park_CRS = chi_parks_gdf.crs

print(zip_CRS)
print(park_CRS)
epsg:3857
epsg:3857

Add a point color column to the parks dataset

We will add a column that has a different color to the parks dataset depending on whether the park location is indoor or outdoor. np.where is a filtering function that returns data depending on a condition. In this case, the function returns red if the facility type is OUTDOOR, or blue for anything else.

For more info refer to the np.where docs here.

In [14]:
# create an array of color values based on facility types

np.where(chi_parks_gdf['facility_t'] == 'OUTDOOR', 'red', 'blue')
Out[14]:
array(['blue', 'blue', 'red', ..., 'red', 'red', 'blue'], dtype='<U4')

The above code created an array but didn't add it to the parks geodataframe. The next cell assigns the array to the chi_parks_gdf dataframe, creating a new column called pt_color.

In [15]:
# color outdoor parks red, everything else blue

chi_parks_gdf['pt_color'] = np.where(chi_parks_gdf['facility_t']
                                     == 'OUTDOOR', 'red', 'blue')
In [16]:
# view the updated dataframe

chi_parks_gdf.head()
Out[16]:
facility_n facility_t gisobjid objectid park park_no x_coord y_coord geometry pt_color
0 CULTURAL CENTER SPECIAL 2494.0 1066.0 HAMILTON (ALEXANDER) 9.0 -87.637698 41.762999 POINT (-9755783.874 5125543.723) blue
1 GYMNASIUM INDOOR 2495.0 1067.0 HAMILTON (ALEXANDER) 9.0 -87.637929 41.762817 POINT (-9755809.634 5125516.459) blue
2 BASEBALL JR/SOFTBALL OUTDOOR 2496.0 1068.0 HAMILTON (ALEXANDER) 9.0 -87.636914 41.760849 POINT (-9755696.597 5125222.888) red
3 BASEBALL JR/SOFTBALL OUTDOOR 2497.0 1069.0 HAMILTON (ALEXANDER) 9.0 -87.638320 41.762005 POINT (-9755853.171 5125395.401) red
4 BASEBALL JR/SOFTBALL OUTDOOR 2498.0 1070.0 HAMILTON (ALEXANDER) 9.0 -87.638059 41.760474 POINT (-9755824.121 5125166.843) red

Get a count of the parks in each zip code

It is often useful to summarize one spatial data file by how it is spatially connected to another. In this case, we want to know how many park facilities are in each zip code. We can do this using a spatial join (gpd.sjoin). More information about merging data based on space and attributes can be found here

In [17]:
# create a temporary geodataframe with spatial join

points_in_polygon = gpd.sjoin(chi_parks_gdf, chi_zip_codes, how="inner", op='intersects')
points_in_polygon['count']=1

Notice the resulting dataframe has attributes from both the parks and the zip codes datasets, but only one geometry column.

In [18]:
# preview data frame

points_in_polygon.head()
Out[18]:
facility_n facility_t gisobjid objectid_left park park_no x_coord y_coord geometry pt_color index_right objectid_right shape_area shape_len zip count
0 CULTURAL CENTER SPECIAL 2494.0 1066.0 HAMILTON (ALEXANDER) 9.0 -87.637698 41.762999 POINT (-9755783.874 5125543.723) blue 10 9.0 1.047468e+08 42299.920391 60621 1
1 GYMNASIUM INDOOR 2495.0 1067.0 HAMILTON (ALEXANDER) 9.0 -87.637929 41.762817 POINT (-9755809.634 5125516.459) blue 10 9.0 1.047468e+08 42299.920391 60621 1
2 BASEBALL JR/SOFTBALL OUTDOOR 2496.0 1068.0 HAMILTON (ALEXANDER) 9.0 -87.636914 41.760849 POINT (-9755696.597 5125222.888) red 10 9.0 1.047468e+08 42299.920391 60621 1
3 BASEBALL JR/SOFTBALL OUTDOOR 2497.0 1069.0 HAMILTON (ALEXANDER) 9.0 -87.638320 41.762005 POINT (-9755853.171 5125395.401) red 10 9.0 1.047468e+08 42299.920391 60621 1
4 BASEBALL JR/SOFTBALL OUTDOOR 2498.0 1070.0 HAMILTON (ALEXANDER) 9.0 -87.638059 41.760474 POINT (-9755824.121 5125166.843) red 10 9.0 1.047468e+08 42299.920391 60621 1

Now you can group all the points by zip code.

In [19]:
# create a temporary geodataframe with spatial join grouped by zip code

new_df = gpd.GeoDataFrame(points_in_polygon.groupby(['zip']).sum()['count'])
In [20]:
# preview data frame

new_df.head()
Out[20]:
count
zip
60601 4
60602 11
60603 1
60604 1
60605 50

Finally, we can join the mini new_df to the existing zip codes data set.

In [21]:
# combine dataframe with counts with zip codes data frame

zip_with_pk_ct = chi_zip_codes.merge(new_df, on='zip')

Now each of the zip code rows contains a count of how many park facilities are within the zip code.

In [22]:
# preview updated data frame

zip_with_pk_ct.head()
Out[22]:
objectid shape_area shape_len zip geometry count
0 33.0 1.060523e+08 42720.044406 60647 POLYGON ((-9760228.181 5148667.912, -9760226.5... 72
1 34.0 1.274761e+08 48103.782721 60639 POLYGON ((-9765706.326 5149399.264, -9765717.1... 103
2 35.0 4.506904e+07 27288.609612 60707 POLYGON ((-9772181.764 5147379.934, -9772215.9... 11
3 51.0 3.450671e+06 7909.890407 60707 POLYGON ((-9774588.623 5151174.579, -9774614.5... 11
4 36.0 7.085383e+07 42527.989679 60622 POLYGON ((-9759053.446 5144344.509, -9759053.4... 94

Set Up the Map

Now we can start setting up the first map. We first need to set the two geodataframes as sources for Bokeh.

In [23]:
# set bokeh data sources

parks_source = GeoJSONDataSource(geojson=chi_parks_gdf.to_json())
zips_source = GeoJSONDataSource(geojson=zip_with_pk_ct.to_json())

Now we will set some graphics settings. The tile_provider determines what tile set is used for the background. Brewer is a handy collection of color palettes.

In [24]:
# set graphics settings

tile_provider = get_provider(Vendors.OSM)  # for tile background

palette = list(brewer['OrRd'][8]) # for zip code colors
palette.reverse()
vals = zip_with_pk_ct['count']
color_mapper = LinearColorMapper(palette = palette,
                                       low = vals.min(), high = vals.max())
color_bar = ColorBar(color_mapper=color_mapper,
                           label_standoff=8, width=500, height=20,
                           location=(0,0), orientation='horizontal')

Now we will set up the first map window.

In [25]:
# Set up first sub map

s1 = figure(title="Park Map", plot_width=500, 
            x_range=(-9800000, -9740000), y_range=(5100000, 5170000))

# add zip code boundaries
s1.patches('xs', 'ys', source=zips_source, legend_label='zip codes',
           fill_alpha=0.0, line_color='black')

# add park points
s1.circle(x='x', y='y', source=parks_source, legend_label='parks', size=5, 
          color='pt_color', name="park_pts")

# add a tooltip for parks
hover = HoverTool(tooltips=[("park type", "@facility_n")], names=["park_pts"])
s1.add_tools(hover)

s1.legend.location = "top_right"
s1.legend.click_policy="hide" # hide the contents if you click the legend

Now we will set up the second map window.

In [26]:
# Set up second sub map

# the x_range and y_range make so that panning and zooming
# the maps will show the same extent
s2 = figure(title="Park Count Map", plot_width=500, x_range=s1.x_range, y_range=s1.y_range, 
            tooltips=[("zip code", "@zip"),("count", "@count")])

# add background tiles
s2.add_tile(tile_provider)

# add zip codes
s2.patches('xs', 'ys', source=zips_source, legend_label='zip codes',
           fill_alpha=0.8, line_color='grey',
           fill_color={'field' :'count' , 'transform': color_mapper})
s2.legend.location = "top_right"
In [27]:
# display map

p = layout(row(s1, s2), sizing_mode="stretch_width")
show(p)