This project analyzes heatwave patterns across Chicago over the past five years (2020.10.1 to 2025.10.1). The research focuses on identifying spatial variation in the frequency and intensity of heatwave events using CyberGISX. The ultimate goal is to create interactive visualizations that communicate heat risk patterns clearly to support urban planning and public health preparedness.

The primary data will be sourced from the National Oceanic and Atmospheric Administration (NOAA) through its Global Historical Climatology Network (GHCN). I use the daily maximum temperature (TMAX) recorded at each station as the main variable to analyze.

Because NOAA limits the size of each data request, I will make multiple requests and combine the records from the same observation station into one dataset before further analysis. I will use Python libraries including folium, matplotlib, pandas, and geopandas to analyze and compare the observation stations, and finally to visualize the patterns and intensities of heatwaves.
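
As a minimal sketch of the combining step, the snippet below stacks two downloads and keeps one row per station and date. The two in-memory CSVs, their values, and the helper name `combine_requests` are hypothetical stand-ins for real NOAA exports; only the column names STATION, DATE, and TMAX come from this notebook.

```python
import io
import pandas as pd

# Two hypothetical NOAA exports covering adjacent periods (toy values, not real data).
part_a = io.StringIO(
    "STATION,DATE,TMAX\n"
    "USC00111577,2023-08-31,88\n"
    "USC00111577,2023-09-01,91\n"
)
part_b = io.StringIO(
    "STATION,DATE,TMAX\n"
    "USC00111577,2023-09-01,91\n"  # same day as the last row of the first request
    "USC00111577,2023-09-02,93\n"
)

def combine_requests(sources):
    """Stack several downloads and keep one row per station and date."""
    frames = [pd.read_csv(src, parse_dates=["DATE"]) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    # Requests whose date ranges touch can repeat a day, so drop the repeats.
    combined = combined.drop_duplicates(subset=["STATION", "DATE"])
    return combined.sort_values(["STATION", "DATE"]).reset_index(drop=True)

tmax_all = combine_requests([part_a, part_b])
print(tmax_all)
```

Dropping duplicates on STATION and DATE matters when the requested periods share a boundary day, which would otherwise be counted twice.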

So far, I have requested data from NOAA once, which gives 2 years of data, from 2023.9.1 to 2025.9.1. I select one station as an example of the planned workflow. I use 90°F as the heatwave threshold and plot the daily maximum temperature. I also draw a map of Chicago using folium.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import folium
In [2]:
df = pd.read_csv('weather_data.csv')
In [3]:
df_clean = df[df.STATION == "USC00111577"].reset_index()
In [4]:
plt.figure(figsize=(14, 6))
Out[4]:
<Figure size 1400x600 with 0 Axes>
In [5]:
plt.plot(df_clean.index, df_clean['TMAX'], 'b-', linewidth=1.0, alpha=0.8, label='Daily Max Temperature (°F)')
plt.axhline(y=90, color='red', linestyle='--', linewidth=2, alpha=0.8, label='Heatwave Threshold (90°F)')
plt.xlabel('Days from 2023.9.1')
plt.ylabel('Temperature (°F)')
plt.title('Maximum Temperature (TMAX) with Heatwave Threshold in Chicago')
plt.legend()
plt.grid(True, alpha=0.3)
In [55]:
def plot_chicago():
    chicago_center = [41.8781, -87.6298]
    m = folium.Map(location=chicago_center, zoom_start=10, tiles='OpenStreetMap')
    folium.Marker(
        chicago_center,
        popup='Chicago Downtown',
        tooltip='Click for more info',
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
    return m
In [56]:
plot_chicago()
Out[56]:
[Interactive folium map of Chicago with a marker at downtown]

With the variables and scope decided, the next step is to calculate the heatwave frequency. As mentioned, I use 90°F as the heatwave threshold: if the maximum temperature exceeds 90°F for 2 or more consecutive days, those days are counted as heatwave days. Since I can only get about 2 years of maximum temperature data per request, I request the data for 2020-2021, 2021-2023, and 2023-2025 separately. I then write a function that counts the heatwave days at each station with valid data; the results are shown below.
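
To make the counting rule concrete, here is a small pure-Python sketch on a made-up temperature series. The helper name `heatwave_days` and the toy values are illustrative assumptions and not part of the analysis itself.

```python
from itertools import groupby

def heatwave_days(tmax_values, threshold=90, min_run=2):
    """Count days that fall in runs of at least min_run consecutive days above threshold."""
    total = 0
    # groupby splits the series into consecutive runs of hot / not-hot days.
    for is_hot, run in groupby(tmax_values, key=lambda t: t > threshold):
        run_length = len(list(run))
        if is_hot and run_length >= min_run:
            total += run_length
    return total

toy_tmax = [88, 91, 92, 89, 95, 90, 93, 94, 96]
# The runs (91, 92) and (93, 94, 96) count; the lone 95 does not, and 90 is not above 90.
print(heatwave_days(toy_tmax))  # 5
```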

In [38]:
df_01 = pd.read_csv('Tmax01.csv', na_values = ["NaN"])
df_01_clean = df_01.dropna()  # keep only rows with complete records
df_01_s = df_01_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)
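def count_hot_days(sub_df, threshold=90, min_run=2):
    """Count days in runs of at least min_run consecutive rows with TMAX above threshold."""
    mask = sub_df["TMAX"] > threshold
    # Start a new group id every time the hot / not-hot state changes.
    groups = (mask != mask.shift()).cumsum()
    # Summing the boolean mask gives the run length for hot runs and 0 otherwise.
    streak_lengths = mask.groupby(groups).sum()
    # Keep only the runs that are long enough and add up their days.
    return streak_lengths[streak_lengths >= min_run].sum()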
result_01 = df_01_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_01)
       STATION  NumDays
0  USC00111550        2
1  USC00111577       18
2  USC00115097        7
3  USC00115110        0
4  USC00116616        7
5  USC00117457        4
6  USW00004838       16
7  USW00014819       18
8  USW00094846       10
In [37]:
df_13 = pd.read_csv('Tmax13.csv', na_values = ["NaN"])
df_13_clean = df_13.dropna()  # keep only rows with complete records
df_13_s = df_13_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)
result_13 = df_13_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_13)
       STATION  NumDays
0  USC00111550        5
1  USC00111577       30
2  USC00115097       11
3  USC00116616       14
4  USC00117457       12
5  USW00004838       15
6  USW00014819       30
7  USW00094846       20
In [33]:
df_35 = pd.read_csv('Tmax35.csv', na_values = ["NaN"])
df_35_clean = df_35.dropna()  # keep only rows with complete records
df_35_s = df_35_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)
result_35 = df_35_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_35)
       STATION  NumDays
0  USC00111550       13
1  USC00111577       38
2  USC00115097       14
3  USC00116616       22
4  USC00117457       21
5  USW00004838       28
6  USW00014819       38
7  USW00094846       34
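
One point that may be worth checking: because rows with missing values are dropped before counting, two rows that sit next to each other are not necessarily consecutive calendar days. The sketch below is a possible date-aware variant, not the method used for the results above. The function name `count_hot_days_by_date` and the toy data are assumptions for illustration.

```python
import pandas as pd

def count_hot_days_by_date(sub_df, threshold=90, min_run=2):
    """Count days in runs of at least min_run consecutive calendar days with TMAX above threshold."""
    s = sub_df.sort_values("DATE")
    dates = pd.to_datetime(s["DATE"])
    hot = s["TMAX"] > threshold
    # A new run starts when the hot / not-hot state flips or the date jumps by more than one day.
    new_run = (hot != hot.shift()) | (dates.diff() != pd.Timedelta(days=1))
    run_id = new_run.cumsum()
    run_lengths = hot.groupby(run_id).sum()
    return int(run_lengths[run_lengths >= min_run].sum())

# Toy data: 2024-07-02 is missing, as if it had been removed by dropna().
toy = pd.DataFrame({
    "DATE": ["2024-07-01", "2024-07-03", "2024-07-04", "2024-07-05"],
    "TMAX": [95, 96, 97, 85],
})
print(count_hot_days_by_date(toy))  # 2
```

A row-based count treats the three hot rows in this toy data as one run of 3 days, while the date-aware version counts only the 2 days that are truly adjacent.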

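After getting the results for 2020-2021, 2021-2023, and 2023-2025, I add them together to get the total number of heatwave days at each station over the 5 years. Station USC00115110 appears only in the 2020-2021 result, so I drop it and keep the 8 stations that have data in all three periods.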

In [40]:
combined = pd.concat([result_01, result_13, result_35])
# Exclude USC00115110 by its exact ID; a substring match on "110" could also hit other station IDs.
combined = combined[combined["STATION"] != "USC00115110"]
final = combined.groupby("STATION", as_index=False)["NumDays"].sum()
print(final)
       STATION  NumDays
0  USC00111550       20
1  USC00111577       86
2  USC00115097       32
3  USC00116616       43
4  USC00117457       37
5  USW00004838       59
6  USW00014819       86
7  USW00094846       64

After that, I use the original dataframe to look up the latitude and longitude of these stations and merge them into the result.

In [49]:
coords = df[["STATION", "LATITUDE", "LONGITUDE"]].drop_duplicates().reset_index(drop=True)
final_coords = coords[coords["STATION"].isin(final["STATION"])].reset_index(drop=True)
final_coords
Out[49]:
       STATION  LATITUDE  LONGITUDE
0  USC00115097  41.81271  -88.07275
1  USC00116616  41.49453  -87.67951
2  USC00111550  41.85580  -87.60940
3  USW00004838  42.12076  -87.90479
4  USW00094846  41.96017  -87.93164
5  USC00117457  41.60413  -88.08497
6  USW00014819  41.78412  -87.75514
7  USC00111577  41.73727  -87.77734

The heatwave index is the number of heatwave days (the heatwave frequency) divided by 1826, the number of days from 2020.10.1 to 2025.10.1.
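
As a quick check of the arithmetic, the denominator can be derived from the two dates with the standard library. The variable names below are only illustrative.

```python
from datetime import date

start = date(2020, 10, 1)
end = date(2025, 10, 1)

# Five years including one leap day (2024-02-29): 5 * 365 + 1.
n_days = (end - start).days
print(n_days)  # 1826

# Example: a station with 86 heatwave days in the period.
heatwave_index = 86 / n_days
print(round(heatwave_index, 6))  # 0.047097
```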

In [53]:
final_with_coords = final.merge(final_coords[["STATION", "LATITUDE", "LONGITUDE"]], on="STATION", how="left")
final_with_coords["heatwave_index"] = final_with_coords["NumDays"] / 1826
final_with_coords
Out[53]:
       STATION  NumDays  LATITUDE  LONGITUDE  heatwave_index
0  USC00111550       20  41.85580  -87.60940        0.010953
1  USC00111577       86  41.73727  -87.77734        0.047097
2  USC00115097       32  41.81271  -88.07275        0.017525
3  USC00116616       43  41.49453  -87.67951        0.023549
4  USC00117457       37  41.60413  -88.08497        0.020263
5  USW00004838       59  42.12076  -87.90479        0.032311
6  USW00014819       86  41.78412  -87.75514        0.047097
7  USW00094846       64  41.96017  -87.93164        0.035049

Finally, I plot the results with folium. Each station is shown on the map as a separate point, and clicking a point shows the heatwave frequency and heatwave index of that station for 2020-2025.

In [54]:
import folium

def plot_chicago(final_with_coords):
    chicago_center = [41.8781, -87.6298]
    m = folium.Map(location=chicago_center, zoom_start=10, tiles='OpenStreetMap')
    folium.Marker(
        chicago_center,
        popup='Chicago Downtown',
        tooltip='Click for more info',
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
    for _, row in final_with_coords.iterrows():
        folium.CircleMarker(
            location=[row["LATITUDE"], row["LONGITUDE"]],
            radius=8,
            color='blue',
            fill=True,
            fill_color='orange',
            fill_opacity=0.7,
            popup=(
                f"<b>Station:</b> {row['STATION']}<br>"
                f"<b>Heatwave Index:</b> {row['heatwave_index']:.4f}<br>"
                f"<b>NumDays:</b> {row['NumDays']}"
            ),
            tooltip=f"{row['STATION']}"
        ).add_to(m)
    return m
m = plot_chicago(final_with_coords)
m
Out[54]:
[Interactive folium map of Chicago with one circle marker per station]
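
All station markers above share one color, so the index is only visible after clicking. One possible refinement is to color each marker by its heatwave index. The helper `index_to_hex` below is a hypothetical sketch using only the standard library, with a yellow-to-red ramp chosen arbitrarily.

```python
def index_to_hex(value, vmin, vmax):
    """Map a value in [vmin, vmax] to a hex color from yellow (low) to red (high)."""
    if vmax == vmin:
        frac = 0.0
    else:
        # Clamp to [0, 1] so out-of-range values still give a valid color.
        frac = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    green = round(255 * (1 - frac))
    return f"#ff{green:02x}00"

# Heatwave index values taken from the table above.
indices = [0.010953, 0.047097, 0.017525, 0.023549, 0.020263, 0.032311, 0.047097, 0.035049]
lo, hi = min(indices), max(indices)
for v in indices:
    print(v, index_to_hex(v, lo, hi))
```

The returned string could then be passed as `fill_color` in the `folium.CircleMarker` call inside `plot_chicago`, in place of the fixed `'orange'`.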

The stations closer to downtown Chicago tend to have a higher heatwave frequency, while the stations in suburban areas tend to have a lower one. Station USC00111550 is an exception, possibly because it is very close to Lake Michigan.
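
To look at this pattern more systematically, the distance from each station to the downtown point used on the map can be listed next to its heatwave days. This is a rough sketch: the coordinates and counts are copied from the tables above, and the haversine helper and the mean Earth radius of 6371 km are my own assumptions.

```python
from math import asin, cos, radians, sin, sqrt

DOWNTOWN = (41.8781, -87.6298)  # same center point as in plot_chicago

# station: (latitude, longitude, heatwave days 2020-2025), from the tables above
STATIONS = {
    "USC00111550": (41.85580, -87.60940, 20),
    "USC00111577": (41.73727, -87.77734, 86),
    "USC00115097": (41.81271, -88.07275, 32),
    "USC00116616": (41.49453, -87.67951, 43),
    "USC00117457": (41.60413, -88.08497, 37),
    "USW00004838": (42.12076, -87.90479, 59),
    "USW00014819": (41.78412, -87.75514, 86),
    "USW00094846": (41.96017, -87.93164, 64),
}

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Sort stations from nearest to farthest from downtown.
rows = sorted(
    (haversine_km(*DOWNTOWN, lat, lon), station, days)
    for station, (lat, lon, days) in STATIONS.items()
)
for dist, station, days in rows:
    print(f"{station}  {dist:5.1f} km  {days} heatwave days")
```

With only 8 stations, a listing like this is a descriptive check and not a statistical test.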