Author: Rebecca Vandewalle rcv3@illinois.edu
Created: 5-31-21
This notebook demonstrates how to create interactive maps using the bokeh library (https://docs.bokeh.org/en/latest/index.html).
The goals of this notebook are:
This tutorial uses Bokeh 2.3 or higher. To install, remove the comment on the next line. Once it is installed, add the comment back so you are not re-installing the library each time you run the notebook.
#pip install --upgrade bokeh==2.3
# check bokeh version - expected '2.3.0'
import bokeh
bokeh.__version__
The city of Chicago has an open data portal with a variety of geospatial and non-geospatial datasets at https://data.cityofchicago.org/browse?q=zip%20codes&sortBy=relevance.
Data for this notebook has been downloaded from this portal, renamed for convenience, and is stored in the root folder.
The raw data for the chicago_parks
shapefile (in the chicago_parks
folder) can be found here: https://data.cityofchicago.org/Parks-Recreation/Parks-Chicago-Park-District-Facilities-current-/5yyk-qt9y . This dataset contains general information and point locations for facilities within parks in Chicago.
The raw data for the chicago_zip
shapefile (in the chicago_zip
folder) can be found here: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-ZIP-Codes/gdcf-axmw . This dataset contains zip code boundaries for Chicago.
Next, movies_2018.csv
contains movies that were show in Chicago parks during summer 2018. The raw data can be found here: https://data.cityofchicago.org/Events/Chicago-Park-District-Movies-in-the-Parks-2018/e2v8-k3us .
Finally, an example image, Back_to_the_Future.jpeg
, was downloaded from https://en.wikipedia.org/wiki/Back_to_the_Future#/media/File:Back_to_the_Future.jpg .
The following libraries will be used in this notebook. Run the next cell to import them.
# import libraries
import numpy as np
from datetime import date, datetime
import pandas as pd
import geopandas as gpd
from bokeh.plotting import save, figure
from bokeh.io import show, output_notebook, push_notebook
from bokeh.models import GeoJSONDataSource, ColumnDataSource, CustomJS
from bokeh.models import LinearColorMapper, ColorBar
from bokeh.models import DataTable, DateFormatter, TableColumn
from bokeh.models import HoverTool, TapTool, Div
from bokeh.models import DateRangeSlider, Dropdown
from bokeh.palettes import brewer
from bokeh.events import Tap
from bokeh.tile_providers import Vendors, get_provider
from bokeh.layouts import gridplot, layout, column, row
The output_notebook
function is needed for the interactions to work properly.
# run to properly interact with bokeh
output_notebook();
This first example will demonstrate how to load shapefiles, and display two linked maps.
It is first necessary to load shapefiles. These will be read with geopandas (gpd) and loaded as a geodataframe.
# load chicago zip codes dataset
chicago_zip_fp = r'chicago_zip/chicago_zip.shp'
chi_zip_codes = gpd.read_file(chicago_zip_fp)
# load chicago zip codes dataset
chicago_parks_fp = r'chicago_parks/chicago_parks.shp'
chi_parks_gdf = gpd.read_file(chicago_parks_fp)
# display dataset types
print(type(chi_zip_codes))
print(type(chi_parks_gdf))
It is very useful to check the first few rows of a loaded dataset and possibly quickly plot the data to see what the data look like.
# view first few rows of chicago zip codes dataset
chi_zip_codes.head()
# view first few rows of chicago parks dataset
chi_parks_gdf.head()
# view simple plot of chicago zip codes dataset
chi_zip_codes.plot();
# view simple plot of chicago zip codes dataset
chi_parks_gdf.plot();
Next we will make some adjustments to the shapefiles to prepare to plot. We will make sure the spatial information is in the needed format and add some columns to allow for plotting.
For everything to plot correctly, we need the coordinate reference system (CRS) for each geospatial data source to be the same. Additionally, since we will be plotting over external map tiles, we need the CRS to be the Web Mercator projection (epsg: 3857). Common reference systems can be found here.
# display current CRS
zip_CRS = chi_zip_codes.crs
park_CRS = chi_parks_gdf.crs
print(zip_CRS)
print(park_CRS)
# convert to web mercator (epsg=3857) for tile mapping
chi_zip_codes = chi_zip_codes.to_crs(epsg=3857)
chi_parks_gdf = chi_parks_gdf.to_crs(epsg=3857)
zip_CRS = chi_zip_codes.crs
park_CRS = chi_parks_gdf.crs
print(zip_CRS)
print(park_CRS)
We will add a column that has a different color to the parks dataset depending on whether the park location is indoor or outdoor. np.where
is a filtering function that returns data depending on a condition. In this case, the function returns red
if the facility type is OUTDOOR
, or blue
for anything else.
For more info refer to the np.where
docs here.
# create an array of color values based on facility types
np.where(chi_parks_gdf['facility_t'] == 'OUTDOOR', 'red', 'blue')
The above code created an array but didn't add it to the parks geodataframe. The next cell assigns the array to the chi_parks_gdf
dataframe, creating a new column called pt_color
.
# color outdoor parks red, everything else blue
chi_parks_gdf['pt_color'] = np.where(chi_parks_gdf['facility_t']
== 'OUTDOOR', 'red', 'blue')
# view the updated dataframe
chi_parks_gdf.head()
It is often useful to summarize one spatial data file by how it is spatially connected to another. In this case, we want to know how many park facilities are in each zip code. We can do this using a spatial join (gpd.sjoin). More information about merging data based on space and attributes can be found here
# create a temporary geodataframe with spatial join
points_in_polygon = gpd.sjoin(chi_parks_gdf, chi_zip_codes, how="inner", op='intersects')
points_in_polygon['count']=1
Notice the resulting dataframe has attributes from both the parks and the zip codes datasets, but only one geometry
column.
# preview data frame
points_in_polygon.head()
Now you can group all the points by zip code.
# create a temporary geodataframe with spatial join grouped by zip code
new_df = gpd.GeoDataFrame(points_in_polygon.groupby(['zip']).sum()['count'])
# preview data frame
new_df.head()
Finally, we can join the mini new_df
to the existing zip codes data set.
# combine dataframe with counts with zip codes data frame
zip_with_pk_ct = chi_zip_codes.merge(new_df, on='zip')
Now each of the zip code rows contains a count of how many park facilities are within the zip code.
# preview updated data frame
zip_with_pk_ct.head()
Now we can start setting up the first map. We first need to set the two geodataframes as sources for Bokeh.
# set bokeh data sources
parks_source = GeoJSONDataSource(geojson=chi_parks_gdf.to_json())
zips_source = GeoJSONDataSource(geojson=zip_with_pk_ct.to_json())
Now we will set some graphics settings. The tile_provider
determines what tile set is used for the background. Brewer
is a handy collection of color palettes.
# set graphics settings
tile_provider = get_provider(Vendors.OSM) # for tile background
palette = list(brewer['OrRd'][8]) # for zip code colors
palette.reverse()
vals = zip_with_pk_ct['count']
color_mapper = LinearColorMapper(palette = palette,
low = vals.min(), high = vals.max())
color_bar = ColorBar(color_mapper=color_mapper,
label_standoff=8, width=500, height=20,
location=(0,0), orientation='horizontal')
Now we will set up the first map window.
# Set up first sub map
s1 = figure(title="Park Map", plot_width=500,
x_range=(-9800000, -9740000), y_range=(5100000, 5170000))
# add zip code boundaries
s1.patches('xs', 'ys', source=zips_source, legend_label='zip codes',
fill_alpha=0.0, line_color='black')
# add park points
s1.circle(x='x', y='y', source=parks_source, legend_label='parks', size=5,
color='pt_color', name="park_pts")
# add a tooltip for parks
hover = HoverTool(tooltips=[("park type", "@facility_n")], names=["park_pts"])
s1.add_tools(hover)
s1.legend.location = "top_right"
s1.legend.click_policy="hide" # hide the contents if you click the legend
Now we will set up the second map window.
# Set up second sub map
# the x_range and y_range make so that panning and zooming
# the maps will show the same extent
s2 = figure(title="Park Count Map", plot_width=500, x_range=s1.x_range, y_range=s1.y_range,
tooltips=[("zip code", "@zip"),("count", "@count")])
# add background tiles
s2.add_tile(tile_provider)
# add zip codes
s2.patches('xs', 'ys', source=zips_source, legend_label='zip codes',
fill_alpha=0.8, line_color='grey',
fill_color={'field' :'count' , 'transform': color_mapper})
s2.legend.location = "top_right"
# display map
p = layout(row(s1, s2), sizing_mode="stretch_width")
show(p)