Author: Tianci Guo, Anand Padmanabhan
Affiliation: Department of Geography & GIS, University of Illinois Urbana-Champaign
Course: GGIS 407
In this notebook, you'll learn:
How to use OSMnx to download OSM data
OpenStreetMap (OSM) is a collaborative project to create a free, editable map of the world. It contains data about streets, buildings, services, and land use, among other things.
Key concepts:
Tags: key=value pairs attached to features (e.g., highway=residential, amenity=cafe).
We will not edit OSM in this course, but we will consume OSM data for analysis and visualization. You can also sign up as a contributor if you want to edit the map. More details about OpenStreetMap and its contents are available in the OpenStreetMap Wiki.
OSM data is described using tags, which are simple key = value pairs.
| Category | Key | Example Values | Meaning |
|---|---|---|---|
| Roads | highway | primary, residential, footway | Type of road or path |
| Buildings | building | yes, house, school | Building footprint |
| Land Use | landuse | residential, commercial | Land use type |
| POIs | amenity | cafe, restaurant, library | Common services |
| Transport | public_transport | platform, stop_position | Transit features |
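Because every tag is just a key and a value, a feature's tags map naturally onto a Python dictionary. The following is a minimal sketch of that idea; `parse_tags` is a hypothetical helper written for illustration (it is not part of OSMnx), and the cafe name is made up.

```python
def parse_tags(tag_strings):
    """Turn 'key=value' strings into a dict of OSM-style tags."""
    tags = {}
    for item in tag_strings:
        # Split on the first '=' only, since values may themselves contain '='
        key, _, value = item.partition("=")
        tags[key.strip()] = value.strip()
    return tags

# Hypothetical feature described by three tags
cafe = parse_tags(["amenity=cafe", "name=Example Cafe", "cuisine=coffee_shop"])
print(cafe["amenity"])
```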
OSMnx is a Python library for:
Reference: Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks.
Run the cell below to import OSMnx, GeoPandas, and Matplotlib.
import osmnx as ox
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import contextily as cx
ox.settings.use_cache = True
ox.settings.log_console = True
print('OSMnx version:', ox.__version__)
osmnx.geocode_to_gdf() can transform a place name, ZIP code, or administrative region into its corresponding boundary polygon.
This is useful for defining areas of interest before extracting networks.
# Get the city boundaries and campus polygon
# Each geocode_to_gdf() call returns a GeoDataFrame containing the boundary polygon
champaign = ox.geocode_to_gdf('Champaign, Illinois, USA')
urbana = ox.geocode_to_gdf('Urbana, Illinois, USA')
uiuc = ox.geocode_to_gdf('University of Illinois Urbana-Champaign')
display(champaign, urbana, uiuc)
Let's plot Champaign, Urbana, and the UIUC campus together.
# Unify the GeoDataFrames for plotting and to get the combined extent
combined = gpd.GeoDataFrame(pd.concat([champaign, urbana, uiuc], ignore_index=True),
crs=champaign.crs)
fig, ax = plt.subplots(figsize=(8, 8))
# Plot the boundaries
champaign.boundary.plot(ax=ax, label='Champaign', linewidth=1)
urbana.boundary.plot(ax=ax, label='Urbana', linewidth=1, linestyle='--')
uiuc.boundary.plot(ax=ax, label='UIUC Campus', linewidth=2)
# Add a basemap
cx.add_basemap(ax=ax, crs=combined.crs.to_string(), source=cx.providers.CartoDB.Positron)
ax.set_title('Champaign, Urbana, and UIUC Campus')
ax.legend()
plt.tight_layout()
plt.show()
import contextily as cx
# List of place names at different geographic scales
queries = [
"Denver, CO, USA", # city
"Champaign County, Illinois, USA", # county
"Texas, USA", # state
"United States" # country
]
# Loop through each place name
for q in queries:
    # Retrieve the boundary polygon for the place using OSM geocoding.
    # The result is returned as a GeoDataFrame.
    tmp = ox.geocode_to_gdf(q)
    # Plot the boundary geometry
    ax = tmp.plot()
    # Add a basemap underneath the boundary.
    # contextily needs the CRS to correctly align the basemap.
    cx.add_basemap(ax=ax, crs=tmp.crs.to_string())
    # Add a descriptive title showing which place is being plotted
    ax.set_title(q)
    # Display the figure
    plt.show()
OSMnx can download a street network graph for a specified place.
graph_from_place() returns a NetworkX MultiDiGraph, representing a street network:
This object is not a GeoDataFrame — it is a graph designed for:
We will work with:
# Define the places we want to download the driving network for:
# Champaign and Urbana together form the metro area around UIUC.
places_cu = ["Champaign, Illinois, USA", "Urbana, Illinois, USA"]
# Download the street network for driving for the combined area.
# network_type="drive" keeps only drivable roads (no footpaths, no alleys).
G_drive = ox.graph_from_place(places_cu, network_type="drive")
# Download the walking network specifically for the UIUC campus area.
# network_type="walk" includes footpaths, sidewalks, pedestrian areas, and other walkable connections used by pedestrians.
G_walk_uiuc = ox.graph_from_place(
'University of Illinois Urbana-Champaign',
network_type='walk'
)
# Display the two graphs (useful for confirming they were created successfully)
G_drive, G_walk_uiuc
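To see why a street network is stored as a directed multigraph, here is a small sketch that uses plain Python dictionaries rather than NetworkX. The node IDs, coordinates, and lengths are made up for illustration. Edges are keyed by `(u, v, key)`, so direction matters and two nodes can be joined by more than one segment.

```python
# Hypothetical nodes: id -> coordinates (x = longitude, y = latitude)
nodes = {
    1: {"x": -88.2300, "y": 40.1100},
    2: {"x": -88.2280, "y": 40.1100},
}

# Hypothetical edges keyed by (u, v, key), mirroring how a MultiDiGraph
# identifies edges: u -> v is directed, and key separates parallel edges.
edges = {
    (1, 2, 0): {"highway": "residential", "length": 170.0},
    (2, 1, 0): {"highway": "residential", "length": 170.0},  # reverse direction
    (1, 2, 1): {"highway": "footway", "length": 185.0},      # parallel edge
}

def edges_between(u, v):
    """Return all directed edges from u to v, including parallel ones."""
    return {k: d for k, d in edges.items() if k[0] == u and k[1] == v}

print(len(edges_between(1, 2)))  # parallel edges from node 1 to node 2
print(len(edges_between(2, 1)))  # edges in the opposite direction
```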
# Project the graph to Web Mercator (required for basemaps)
G_drive_proj = ox.project_graph(G_drive, to_crs="EPSG:3857")
fig, ax = ox.plot_graph(G_drive_proj, bgcolor='white', node_size=0,
edge_linewidth=0.5,edge_color='blue',
show=False, close=False)
# Add basemap
cx.add_basemap(ax, crs="EPSG:3857")
plt.show()
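Projecting to EPSG:3857 converts longitude/latitude in degrees into meters on the spherical Web Mercator plane. The sketch below applies the standard spherical Mercator formulas by hand to show what the reprojection does to a single point; in practice `ox.project_graph` or `GeoDataFrame.to_crs` handles this. The sample coordinate is an approximate campus location chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis)

def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Approximate location on the UIUC campus (assumed for illustration)
x, y = lonlat_to_web_mercator(-88.2272, 40.1020)
print(round(x), round(y))
```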
# Project the graph to Web Mercator (required for basemaps)
G_walk_uiuc_proj = ox.project_graph(G_walk_uiuc, to_crs="EPSG:3857")
# Plot the graph but keep the axis open for adding layers
fig, ax = ox.plot_graph(G_walk_uiuc_proj, bgcolor='white', node_size=0,
edge_linewidth=0.5,edge_color='blue',
show=False, close=False)
# Add basemap
cx.add_basemap(ax, crs="EPSG:3857")
plt.show()
OSMnx can convert a network graph into node and edge GeoDataFrames with ox.graph_to_gdfs(), which we can then inspect and filter.
Let's do this for the UIUC walking network.
# Convert the walking network graph into GeoDataFrames:
# - nodes_uiuc: contains node locations (points)
# - edges_uiuc: contains street segments connecting nodes (lines)
nodes_uiuc, edges_uiuc = ox.graph_to_gdfs(G_walk_uiuc)
# Display the first few rows of each GeoDataFrame
# This helps verify the structure and attributes of nodes and edges
display(nodes_uiuc.head(), edges_uiuc.head())
# Project to Web Mercator for basemap compatibility
nodes_uiuc_proj = nodes_uiuc.to_crs(epsg=3857)
edges_uiuc_proj = edges_uiuc.to_crs(epsg=3857)
# Plot edges and nodes
fig, ax = plt.subplots(figsize=(8, 8))
edges_uiuc_proj.plot(ax=ax, linewidth=0.4)
nodes_uiuc_proj.plot(ax=ax, markersize=1, color='red')
# Add basemap
cx.add_basemap(ax, crs=edges_uiuc_proj.crs)
ax.set_title('UIUC Campus Walking Network: Edges and Nodes')
ax.set_axis_off()
plt.show()
OSMnx’s geometries_from_place and geometries_from_bbox functions retrieve any OSM features that match a dictionary of tags. (In OSMnx 2.0 and later, these functions are named features_from_place and features_from_bbox.)
Features are defined by tags, such as:
{'building': True}
{'amenity': 'cafe'}
{'highway': 'bus_stop'}
Use geometries_from_bbox when you want data for a rectangle defined by latitude/longitude.
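How a tag dictionary selects features can be sketched in plain Python. The hypothetical `matches` function below mirrors the documented meaning of the values (`True` for any value, a string for one value, a list for several values) and the fact that a feature is returned if it matches at least one of the keys. It illustrates the semantics only and is not OSMnx's implementation.

```python
def matches(feature_tags, query):
    """Return True if the feature satisfies at least one key in the query."""
    for key, wanted in query.items():
        if key not in feature_tags:
            continue
        value = feature_tags[key]
        if wanted is True:                       # any value of this key
            return True
        if isinstance(wanted, str) and value == wanted:
            return True
        if isinstance(wanted, list) and value in wanted:
            return True
    return False

# Hypothetical feature: a library mapped as a building
library = {"amenity": "library", "building": "yes"}
print(matches(library, {"building": True}))                 # key present, any value
print(matches(library, {"amenity": "cafe"}))                # value differs
print(matches(library, {"amenity": ["cafe", "library"]}))   # one of several values
```

Because keys are combined with "or", a query such as `{'amenity': 'cafe', 'building': True}` would also select this library through its `building` tag.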
We will focus on features around the UIUC campus.
# Define a bounding box around the UIUC campus using latitude and longitude
north = 40.1205
south = 40.0950
east = -88.2100
west = -88.2450
# Retrieve all building footprints within the bounding box from OpenStreetMap
# "tags={"building": True}" means we only want features with a "building" tag
buildings_uiuc = ox.geometries_from_bbox(
north, south, east, west,
tags={"building": True}
)
# Display the first few rows of the GeoDataFrame
buildings_uiuc.head()
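A bounding box is only meaningful if north is greater than south and east is greater than west (for areas that do not cross the antimeridian). As a quick sanity check before sending a query, one could validate the values and test whether a known point falls inside. The helper names and the sample point below are assumptions for illustration.

```python
north, south, east, west = 40.1205, 40.0950, -88.2100, -88.2450

def validate_bbox(north, south, east, west):
    """Raise ValueError if the box is inverted or outside valid lat/lon ranges."""
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    if not (-180 <= west < east <= 180):
        raise ValueError("longitudes must satisfy -180 <= west < east <= 180")

def in_bbox(lat, lon):
    """Check whether a latitude/longitude point lies inside the box."""
    return south <= lat <= north and west <= lon <= east

validate_bbox(north, south, east, west)
print(in_bbox(40.1020, -88.2272))  # approximate campus location
```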
To understand the variety of OSM tags, you can print unique values.
# List all building types, dropping missing (not available) values
buildings_uiuc['building'].dropna().unique()
# Project to Web Mercator (required for basemap)
buildings_uiuc_proj = buildings_uiuc.to_crs(epsg=3857)
# Plot
fig, ax = plt.subplots(figsize=(8, 8))
# Plot buildings
buildings_uiuc_proj.plot(ax=ax, facecolor='lightgray', edgecolor='red', linewidth=0.3)
# Add basemap
cx.add_basemap(ax, crs=buildings_uiuc_proj.crs)
# Title and layout
ax.set_title('UIUC Campus Buildings')
plt.tight_layout()
plt.show()
When you retrieve buildings using:
buildings_uiuc = ox.geometries_from_place(..., {"building": True})
OpenStreetMap returns all features that contain the tag building=*.
These features may have different geometry types:
Some OpenStreetMap contributors map buildings only as points, typically when no detailed footprint is available.
OSMnx preserves all geometry types exactly as they exist in the OSM database.
Therefore, when you plot the resulting GeoDataFrame:
If you want to display only building footprints, you can filter polygons using:
# Keep only Polygon or MultiPolygon types
buildings_poly = buildings_uiuc[buildings_uiuc.geometry.type.isin(["Polygon", "MultiPolygon"])]
# Reproject to Web Mercator for basemap
buildings_poly_proj = buildings_poly.to_crs(epsg=3857)
# Plot
fig, ax = plt.subplots(figsize=(8, 8))
buildings_poly_proj.plot(ax=ax, facecolor='lightgray', edgecolor='blue', linewidth=0.3)
# Add basemap
cx.add_basemap(ax, crs=buildings_poly_proj.crs)
Use geometries_from_place when you want OSM data for an administrative region or named location.
We can query what amenities are present.
# Define the place for which we want to extract amenities
campus_place = 'University of Illinois Urbana-Champaign'
# Download all geometries from OSM within the campus boundary
# that have an "amenity" tag (True means: return any value of 'amenity')
amenity_values = ox.geometries_from_place(
campus_place,
{"amenity": True}
)
# Extract the "amenity" column, remove missing values,
# and list all unique amenity types found on campus
amenity_values['amenity'].dropna().unique()
We can query for cafes and coffee-related amenities near Green Street / Campustown.
# Define the OSM tags we want to extract:
# - amenity = cafe or fast_food
# - cuisine = coffee_shop
# This helps capture cafés and places serving coffee on campus.
tags_cafes = {'amenity': ['cafe', 'fast_food'], 'cuisine': 'coffee_shop'}
# Query all OSM geometries within the campus boundary that match the tags above (café-type amenities)
cafes_uiuc = ox.geometries_from_place(campus_place, tags_cafes)
# Display the first few rows, showing:
# - name: the café or shop name (if available)
# - amenity: the amenity category returned by OSM
# - geometry: the spatial location (Point or Polygon)
cafes_uiuc[['name', 'amenity', 'geometry']].head()
fig, ax = plt.subplots(figsize=(8, 8))
# Plot cafes
cafes_uiuc.plot(ax=ax, color='darkred', markersize=20)
# Add basemap
cx.add_basemap(ax, crs=cafes_uiuc.crs)
ax.set_title('UIUC Campus: Cafes and Coffee Shops')
plt.tight_layout()
plt.show()
We can filter edges based on the highway tag.
Now, we have a driving network for Champaign–Urbana named G_drive. First, let's take a look at what highway types appear in the network.
# Convert the graph into GeoDataFrames for nodes and edges
nodes_cu, edges_cu = ox.graph_to_gdfs(G_drive)
# Extract the "highway" column and drop rows where the value is missing (NaN)
highway_vals = edges_cu['highway'].dropna()
# Find all unique highway types in the dataset
unique_highways = (
highway_vals
# Each "highway" value might be a string or a list (OSM can store multiple tags)
# → Convert strings to single-item lists so we can handle them uniformly
.apply(lambda x: x if isinstance(x, list) else [x])
# Flatten all lists into one long series of individual highway types
.explode()
# Sort for easier reading
.sort_values()
# Get unique values
.unique()
)
# Display all unique highway categories found in the graph
unique_highways
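The same normalization can be followed step by step on a small list. The sample values below are made up to mimic an edge table in which some edges carry a single `highway` value and others carry a list.

```python
# Hypothetical 'highway' values: plain strings mixed with lists
sample = ["residential", ["primary", "secondary"], "tertiary", ["residential", "tertiary"]]

unique_types = sorted({
    value
    for item in sample
    # Wrap plain strings so every item can be iterated the same way
    for value in (item if isinstance(item, list) else [item])
})
print(unique_types)
```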
Then, let's show only primary and secondary roads around Champaign–Urbana.
# Download the Champaign–Urbana network, keeping only primary and secondary roads.
# - custom_filter='["highway"~"primary|secondary"]' selects ways whose highway value matches the pattern.
# - When custom_filter is given, OSMnx uses it instead of the network_type preset.
# - retain_all=True keeps all disconnected subgraphs instead of just the largest.
G_primary = ox.graph_from_place(
places_cu,
network_type='drive',
retain_all=True,
custom_filter='["highway"~"primary|secondary"]'
)
fig, ax = ox.plot_graph(G_primary, bgcolor='white', node_size=0, edge_linewidth=1)
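One detail worth checking is how the regular expression in `custom_filter` is matched. Assuming the pattern is applied as a search anywhere in the tag value (which is why anchors such as `^` and `$` often appear in Overpass queries), `primary|secondary` would also match link roads such as `primary_link`. The sketch below uses Python's `re` module to imitate that behavior on made-up values; it does not call Overpass.

```python
import re

highway_values = ["primary", "primary_link", "secondary", "secondary_link", "residential"]

unanchored = re.compile(r"primary|secondary")
anchored = re.compile(r"^(primary|secondary)$")

# Search semantics: the pattern may match any part of the value
loose = [v for v in highway_values if unanchored.search(v)]
# Anchors force the whole value to match
strict = [v for v in highway_values if anchored.search(v)]

print(loose)
print(strict)
```

If link roads are not wanted, an anchored filter such as `'["highway"~"^(primary|secondary)$"]'` could be tried instead.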
Now let’s assume we want to access this data outside of Python, or keep a permanent copy of the building footprints for the UIUC campus.
Although these objects are already geopandas.GeoDataFrame objects, we can’t always write OSM GeoDataFrames directly to disk because they may contain field types (such as lists) that can’t be saved to formats such as .shp or .geojson. Instead, let's isolate only the attributes we are interested in, including geometry, which is required:
# Select only the columns that we are interested in keeping from the GeoDataFrame
# - 'addr:' selects all columns containing address information (e.g., addr:housenumber, addr:street)
# - 'geometry' ensures we keep the spatial geometry of each building, which is required for GIS operations
buildings_uiuc_exp = buildings_uiuc.loc[:,buildings_uiuc.columns.str.contains('addr:|geometry')]
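`str.contains` treats its argument as a regular expression by default, so `'addr:|geometry'` keeps any column whose name contains either `addr:` or `geometry`. The effect on a list of column names can be previewed with the standard `re` module; the column names here are typical OSM keys chosen for illustration.

```python
import re

# Hypothetical column names from an OSM buildings table
columns = ["name", "building", "addr:street", "addr:housenumber", "amenity", "geometry"]

pattern = re.compile(r"addr:|geometry")
kept = [c for c in columns if pattern.search(c)]
print(kept)
```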
OSM data often mixes geometry types, such as points and polygons, in a single table. This is a problem when we try to write it to disk.
We therefore also need to isolate the geometry type we are looking for (e.g., MultiPolygon, Polygon, Point). Since we want building footprints here, we keep only Polygon geometries (this filter also drops any MultiPolygon features).
buildings_uiuc_exp = buildings_uiuc_exp.loc[buildings_uiuc_exp.geometry.type=='Polygon']
Now, finally, we can write it to disk.
# Save building footprints to a Shapefile
buildings_uiuc_exp.to_file('uiuc_buildings.shp')
# Alternatively, save in an open, more flexible format such as GeoJSON
buildings_uiuc_exp.to_file('uiuc_buildings.geojson', driver='GeoJSON')
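One caveat when choosing Shapefile: the format's attribute table limits field names to 10 characters, so long OSM keys such as `addr:housenumber` are shortened on export. The sketch below applies a naive 10-character truncation to show which names would collide. The column list is illustrative, and the names actually written depend on the driver, which may rename colliding fields differently.

```python
from collections import defaultdict

# Hypothetical address columns from an OSM buildings table
columns = ["addr:street", "addr:housenumber", "addr:housename", "addr:city"]

truncated = defaultdict(list)
for col in columns:
    # Shapefile (DBF) field names hold at most 10 characters
    truncated[col[:10]].append(col)

# Keep only shortened names shared by more than one original column
collisions = {short: names for short, names in truncated.items() if len(names) > 1}
print(collisions)
```

Formats such as GeoJSON and GeoPackage do not impose this 10-character limit, which is one reason to prefer them for OSM attributes.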
During the earlier exercises in this notebook, several patterns emerged when working with OSM queries. For example, building extractions included points and polygons together, street attributes sometimes contained lists, and many feature tables included a large number of rarely used or empty columns. These issues are common in crowdsourced datasets like OSM and must be addressed before performing spatial analysis, visualization, or exporting results to GIS formats.
This section summarizes the most important cleaning and preprocessing steps that help ensure consistent and usable OSM datasets.
OSM queries often return heterogeneous geometries in a single GeoDataFrame:
For example, buildings retrieved with {"building": True} may appear as points or polygons.
To work with building footprints, keep only polygon-based geometries:
# Keep only Polygon and MultiPolygon geometries.
# This removes points and lines that are not building footprints.
buildings_uiuc_clean = buildings_uiuc[buildings_uiuc.geometry.type.isin(["Polygon", "MultiPolygon"])].copy()
This avoids errors when saving data to shapefiles and ensures consistent geometry types.
OSM-derived GeoDataFrames often contain dozens or even hundreds of attribute fields because OpenStreetMap supports an open tagging system. Each contributor may add different tags to a feature (e.g., building:levels, addr:street, amenity, operator, opening_hours, etc.). Therefore, the first step is to inspect which attributes are present and evaluate what is relevant for your analysis or export format.
You can check the available fields by examining the columns of the GeoDataFrame:
# Display all attribute names in the dataset
list(buildings_uiuc.columns)
To quickly preview non-empty columns:
# Show only columns containing at least one non-null value
buildings_uiuc.loc[:, buildings_uiuc.notna().any()].head()
You can also drop entirely empty columns:
# Make a copy of the original cafes dataset to avoid modifying it directly
gdf = cafes_uiuc.copy()
gdf_clean = gdf.dropna(axis=1, how="all")
gdf_clean.head()
The decision of which attributes to retain depends on your analytical goals and GIS export constraints.
In practice, attribute selection typically follows the criteria below.
geometry, which is always required.
Identifying fields (e.g., name) if they will be used for labeling or spatial selection.
For buildings: building, building:levels, address-related tags.
For points of interest: amenity, shop, cuisine, etc.
Using these rules, we select only the fields that will actually be used:
# Example: keep only relevant, simple attributes for cafes
cols_keep = ["name", "amenity", "geometry"]
gdf_clean = cafes_uiuc[cols_keep].copy()
Some attributes contain lists (e.g., multiple highway categories).
GIS formats such as shapefiles do not support list fields, so we convert them to strings.
Step 1. Identify which columns contain lists
gdf_clean = buildings_uiuc.copy()
# Find all columns where at least one value is a list
list_cols = [
col for col in gdf_clean.columns
if gdf_clean[col].apply(lambda x: isinstance(x, list)).any()
]
list_cols
Step 2. Flatten only those list-type columns
# Convert list values into semicolon-separated strings
for col in list_cols:
    gdf_clean[col] = gdf_clean[col].apply(
        lambda x: ";".join([str(v) for v in x]) if isinstance(x, list) else x
    )
After this step, all list-type attributes become simple strings, making the GeoDataFrame compatible with common GIS export formats.
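Flattening is reversible as long as the separator does not appear inside any value. A minimal sketch with made-up values, using the same `;` separator (the function names are hypothetical):

```python
def flatten(value, sep=";"):
    """Join list values into one string; leave other values unchanged."""
    return sep.join(str(v) for v in value) if isinstance(value, list) else value

def unflatten(value, sep=";"):
    """Split a joined string back into a list when it contains the separator."""
    if isinstance(value, str) and sep in value:
        return value.split(sep)
    return value

original = ["residential", "tertiary"]
text = flatten(original)
print(text)
print(unflatten(text) == original)
```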
Duplicate geometries may appear in some OSM extracts (e.g., overlapping POIs or multi-part features). Removing duplicates ensures cleaner outputs and prevents double counting.
# Remove duplicate entries based on identical geometry
# This keeps only one instance of each spatial feature.
gdf_clean = gdf_clean.drop_duplicates(subset=["geometry"])
| Category | Function | What It Returns | Main Use Case |
|---|---|---|---|
| Geocoding & Boundaries | ox.geocode_to_gdf(place_name) | GeoDataFrame (polygon boundary) | Get boundaries for a city, county, campus, or region |
| Geocoding & Boundaries | ox.geocode(place_name) | (lat, lon) coordinates | Convert a place name to coordinates |
| Street Networks | ox.graph_from_place(place, network_type='drive') | NetworkX MultiDiGraph | Download street network (drive/walk/bike/all) for a named place |
| Street Networks | ox.graph_from_bbox(north, south, east, west, network_type) | MultiDiGraph | Extract network within bounding box |
| Street Networks | ox.graph_from_point((lat, lon), dist, network_type) | MultiDiGraph | Get network within radius distance of a coordinate |
| Street Networks | ox.graph_from_address(address, dist, network_type) | MultiDiGraph | Get network around a street address |
| Network Conversion | ox.graph_to_gdfs(G) | (nodes_gdf, edges_gdf) | Convert a graph into GeoDataFrames for nodes & edges |
| Network Conversion | ox.save_graph_geopackage(G, filepath) | Writes a .gpkg file | Save a graph to GIS-readable format |
| Network Plotting | ox.plot_graph(G) | Matplotlib figure | Quick visualization of a network |
| Network Plotting | ox.plot_graph_route(G, route) | Figure | Plot a specific path on the network |
| Geometries & Features (POIs, land use, buildings) | ox.geometries_from_place(place, tags={...}) | GeoDataFrame | Retrieve buildings, POIs, or land-use polygons by place name |
| Geometries & Features (POIs, land use, buildings) | ox.geometries_from_bbox(north, south, east, west, tags={...}) | GeoDataFrame | Retrieve OSM features within a bounding box |
This notebook introduced the concepts of OpenStreetMap and demonstrated how to access, analyze, and visualize OSM data using OSMnx. We covered:
OSM data provides a flexible and open foundation for many geospatial tasks, from routing to urban analysis and mapping. With OSMnx, these datasets become accessible directly through Python, enabling reproducible workflows that integrate seamlessly with GeoPandas, NetworkX, and other geospatial libraries.
You may now explore the data further, integrate it into analyses, or build applications such as routing tools, accessibility measurements, land-use visualizations, or campus-scale mapping projects.