This project aims to analyze heatwave patterns across Chicago over the past five years (2020-10-01 to 2025-10-01). The research will focus on identifying spatial variations and intensity patterns of heatwave events using CyberGISX. The ultimate goal is to create interactive visualizations that clearly communicate heat-risk patterns to support urban planning and public health preparedness.

The primary data come from the National Oceanic and Atmospheric Administration (NOAA) through its Global Historical Climatology Network (GHCN). The main variable analyzed is the daily maximum temperature (TMAX) recorded at multiple weather stations.

Because NOAA limits the size of each data request, I will make multiple requests and combine the records from each observation station into a single series before further analysis. I will use Python tools including pandas, geopandas, matplotlib, and folium to analyze and compare the observation stations and, finally, to visualize the patterns and intensities of heatwaves.
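A minimal sketch of how several request exports could be combined into one series per station, assuming each export is a CSV with at least `STATION`, `DATE`, and `TMAX` columns (as in the files used below). The helper name `combine_requests`, the station label `STATION_A`, and the temperature values are hypothetical; `io.StringIO` stands in for the downloaded files.

```python
import io

import pandas as pd


def combine_requests(sources):
    """Stack several request exports and keep one row per station and date."""
    frames = [pd.read_csv(src, parse_dates=["DATE"]) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    # Adjacent request windows may share a boundary date; keep a single copy
    combined = combined.drop_duplicates(subset=["STATION", "DATE"])
    # Order each station's records chronologically for run-based counting
    return combined.sort_values(["STATION", "DATE"]).reset_index(drop=True)


# Two hypothetical exports that overlap on 2021-10-01 (toy values, not NOAA data)
request_a = io.StringIO(
    "STATION,DATE,TMAX\n"
    "STATION_A,2021-09-30,82\n"
    "STATION_A,2021-10-01,85\n"
)
request_b = io.StringIO(
    "STATION,DATE,TMAX\n"
    "STATION_A,2021-10-01,85\n"
    "STATION_A,2021-10-02,88\n"
)

daily = combine_requests([request_a, request_b])
print(daily)
```

With real files the same call would take paths, for example `combine_requests(['Tmax01.csv', 'Tmax13.csv', 'Tmax35.csv'])`, so that heatwave runs can be counted on one continuous series per station.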

As a first step, I made a single request to NOAA, which returned two years of data, from 2023-09-01 to 2025-09-01. I select one station (USC00111577) as an example of the planned workflow: using 90°F as the heatwave threshold, I plot its daily maximum temperature, and I draw a map of Chicago with folium.

import pandas as pd
import matplotlib.pyplot as plt
import folium

# Load the 2023-09-01 to 2025-09-01 request and parse DATE as real dates
df = pd.read_csv('weather_data.csv', parse_dates=['DATE'])

# Keep one example station, ordered by date
df_clean = (df[df.STATION == "USC00111577"]
            .sort_values('DATE')
            .reset_index(drop=True))

# Create the figure in the same cell as the plot so the figsize is applied
plt.figure(figsize=(14, 6))
plt.plot(df_clean['DATE'], df_clean['TMAX'], 'b-', linewidth=1.0, alpha=0.8, label='Daily Max Temperature (°F)')
plt.axhline(y=90, color='red', linestyle='--', linewidth=2, alpha=0.8, label='Heatwave Threshold (90°F)')
plt.xlabel('Date')
plt.ylabel('Temperature (°F)')
plt.title('Maximum Temperature (TMAX) with Heatwave Threshold, Station USC00111577, Chicago')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

def plot_chicago():
    chicago_center = [41.8781, -87.6298]
    m = folium.Map(location=chicago_center, zoom_start=10, tiles='OpenStreetMap')
    folium.Marker(
        chicago_center,
        popup='Chicago Downtown',
        tooltip='Click for more info',
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
    return m

plot_chicago()

With the variable and scope defined, the next step is to calculate heatwave frequency. As mentioned, I use 90°F as the heatwave threshold: when the daily maximum temperature exceeds 90°F on at least 2 consecutive days, each of those days is counted as a heatwave day. Because NOAA returns at most two years of maximum-temperature data per request, I made three separate requests covering 2020-2021, 2021-2023, and 2023-2025. I then wrote a function that counts the heatwave days at each station with valid data; the results are shown below.
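To make the counting rule concrete, here is the run-labeling idea on a short made-up series of daily maxima (the values are illustrative, not station data). Each switch between "above 90°F" and "not above 90°F" starts a new run id, and only runs containing at least two hot days contribute to the total.

```python
import pandas as pd

# Ten made-up daily maxima (°F) for a single station
tmax = pd.Series([88, 91, 93, 89, 92, 90, 95, 96, 97, 85])

hot = tmax > 90                          # strictly above 90°F; 90 itself does not count
run_id = (hot != hot.shift()).cumsum()   # new id each time hot/not-hot flips
run_lengths = hot.groupby(run_id).sum()  # hot days in each run (0 for cool runs)
heatwave_days = int(run_lengths[run_lengths >= 2].sum())

print(heatwave_days)  # 5
```

The runs 91-93 (2 days) and 95-97 (3 days) qualify, giving 5; the isolated 92°F day and the 90°F day are not counted.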

# 2020-2021 request: drop rows with missing values, then order each station's records by date
df_01 = pd.read_csv('Tmax01.csv', na_values=["NaN"])
df_01_clean = df_01.dropna()
df_01_s = df_01_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)

def count_hot_days(sub_df, threshold=90, min_run=2):
    """Count the days that belong to runs of at least `min_run` consecutive
    records with TMAX above `threshold` (°F) for a single station."""
    hot = sub_df["TMAX"] > threshold
    # Give each run of identical True/False values its own id
    run_id = (hot != hot.shift()).cumsum()
    # Hot days per run (runs of non-hot days sum to 0)
    run_lengths = hot.groupby(run_id).sum()
    # Keep only runs long enough to qualify as a heatwave and add up their days
    return run_lengths[run_lengths >= min_run].sum()

result_01 = df_01_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_01)

       STATION  NumDays
0  USC00111550        2
1  USC00111577       18
2  USC00115097        7
3  USC00115110        0
4  USC00116616        7
5  USC00117457        4
6  USW00004838       16
7  USW00014819       18
8  USW00094846       10
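`count_hot_days` treats consecutive rows as consecutive days. Because `dropna()` removes rows with missing values, two rows that are adjacent after cleaning may be several days apart, so a run could bridge a gap. A hedged sketch of a date-aware variant, assuming `DATE` has been parsed to datetimes; the name `count_heatwave_days_by_date` is hypothetical and this is not the method used for the tables in this notebook.

```python
import pandas as pd


def count_heatwave_days_by_date(sub_df, threshold=90, min_run=2):
    """Hypothetical variant of count_hot_days that also ends a run at missing dates.

    Expects one station's records with a datetime64 DATE column and a TMAX column.
    """
    sub_df = sub_df.sort_values("DATE")
    hot = sub_df["TMAX"] > threshold
    # True where the previous record is not exactly one calendar day earlier
    gap = sub_df["DATE"].diff() != pd.Timedelta(days=1)
    # Start a new run when the hot/not-hot state flips or when a date is missing
    run_id = ((hot != hot.shift()) | gap).cumsum()
    run_lengths = hot.groupby(run_id).sum()
    return int(run_lengths[run_lengths >= min_run].sum())


# Toy records: July 3 and July 6 are missing, every recorded day is above 90°F
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-04",
                            "2024-07-05", "2024-07-07"]),
    "TMAX": [92, 91, 95, 93, 94],
})
print(count_heatwave_days_by_date(toy))  # 4
```

A row-based count would treat all five records as one 5-day run. Because the extra gap check only splits runs and never merges them, this variant can only keep or lower a station's count. On the real data it could be applied as `df_01_s.assign(DATE=pd.to_datetime(df_01_s["DATE"])).groupby("STATION").apply(count_heatwave_days_by_date)`.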

# 2021-2023 request, processed the same way
df_13 = pd.read_csv('Tmax13.csv', na_values=["NaN"])
df_13_clean = df_13.dropna()
df_13_s = df_13_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)
result_13 = df_13_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_13)

       STATION  NumDays
0  USC00111550        5
1  USC00111577       30
2  USC00115097       11
3  USC00116616       14
4  USC00117457       12
5  USW00004838       15
6  USW00014819       30
7  USW00094846       20

# 2023-2025 request, processed the same way
df_35 = pd.read_csv('Tmax35.csv', na_values=["NaN"])
df_35_clean = df_35.dropna()
df_35_s = df_35_clean.sort_values(["STATION", "DATE"]).reset_index(drop=True)
result_35 = df_35_s.groupby("STATION").apply(count_hot_days).reset_index(name="NumDays")
print(result_35)

       STATION  NumDays
0  USC00111550       13
1  USC00111577       38
2  USC00115097       14
3  USC00116616       22
4  USC00117457       21
5  USW00004838       28
6  USW00014819       38
7  USW00094846       34

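After computing the results for 2020-2021, 2021-2023, and 2023-2025, I add them together to get each station's total number of heatwave days over the five years. Station USC00115110 appears only in the 2020-2021 results, so I exclude it and keep the 8 stations that have data in all three periods.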
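Rather than naming the station to drop, the rule "keep stations that report in all three periods" could be applied directly. A minimal sketch with a hypothetical helper `sum_over_common_stations`, checked against a few rows copied from the per-period tables above; on the full results it would be called as `sum_over_common_stations([result_01, result_13, result_35])`.

```python
import pandas as pd


def sum_over_common_stations(period_results):
    """Total NumDays per station, keeping only stations present in every period."""
    common = set.intersection(*(set(r["STATION"]) for r in period_results))
    combined = pd.concat(period_results, ignore_index=True)
    combined = combined[combined["STATION"].isin(common)]
    return combined.groupby("STATION", as_index=False)["NumDays"].sum()


# Three stations' counts copied from the per-period tables above
p_01 = pd.DataFrame({"STATION": ["USC00111550", "USC00111577", "USC00115110"],
                     "NumDays": [2, 18, 0]})
p_13 = pd.DataFrame({"STATION": ["USC00111550", "USC00111577"], "NumDays": [5, 30]})
p_35 = pd.DataFrame({"STATION": ["USC00111550", "USC00111577"], "NumDays": [13, 38]})

station_totals = sum_over_common_stations([p_01, p_13, p_35])
print(station_totals)
```

USC00115110 drops out automatically, and the two remaining totals (20 and 86) match the five-year table.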

combined = pd.concat([result_01, result_13, result_35])
# Exclude USC00115110, which only has data for 2020-2021 (exact match, not a substring test)
combined = combined[combined["STATION"] != "USC00115110"]
final = combined.groupby("STATION", as_index=False)["NumDays"].sum()
print(final)

       STATION  NumDays
0  USC00111550       20
1  USC00111577       86
2  USC00115097       32
3  USC00116616       43
4  USC00117457       37
5  USW00004838       59
6  USW00014819       86
7  USW00094846       64

Next, I look up the latitude and longitude of these stations in the original dataframe (`df`, loaded from weather_data.csv) and then merge the coordinates with the station totals.

# Distinct station/coordinate pairs, restricted to the 8 stations kept above
coords = df[["STATION", "LATITUDE", "LONGITUDE"]].drop_duplicates().reset_index(drop=True)
final_coords = coords[coords["STATION"].isin(final["STATION"])].reset_index(drop=True)
final_coords
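`drop_duplicates()` on the three columns keeps one row per distinct station/coordinate pair, so a station whose reported location changed during the period would appear twice and be duplicated by the later merge. A hedged sketch that keeps the first reported location per station and lets pandas check the join; the helper name `attach_coordinates` is hypothetical, and the two coordinate pairs are taken from the station table in this notebook.

```python
import pandas as pd


def attach_coordinates(totals, observations):
    """Attach one LATITUDE/LONGITUDE pair per station to a per-station totals table."""
    coords = (observations[["STATION", "LATITUDE", "LONGITUDE"]]
              .drop_duplicates(subset="STATION")  # first reported location wins
              .reset_index(drop=True))
    # validate= raises pandas.errors.MergeError if STATION repeats on either side
    return totals.merge(coords, on="STATION", how="left", validate="one_to_one")


# Daily rows repeat the coordinates once per observation
obs = pd.DataFrame({
    "STATION": ["USC00111550", "USC00111550", "USW00014819", "USW00014819"],
    "LATITUDE": [41.85580, 41.85580, 41.78412, 41.78412],
    "LONGITUDE": [-87.60940, -87.60940, -87.75514, -87.75514],
})
totals_2 = pd.DataFrame({"STATION": ["USC00111550", "USW00014819"], "NumDays": [20, 86]})

with_coords = attach_coordinates(totals_2, obs)
print(with_coords)
```

Keeping the first location is one possible choice; if a station really did move, it may be better to treat the two locations as separate stations.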

The heatwave index is the number of heatwave days (heatwave frequency) divided by 1826, the number of days from 2020-10-01 to 2025-10-01, so it is the fraction of all days in the period that were heatwave days.
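The divisor can be derived rather than typed in. A small check with Python's standard `datetime` module, using the 86 heatwave days of USC00111577 from the five-year totals:

```python
from datetime import date

# Days from 2020-10-01 up to, but not including, 2025-10-01
n_days = (date(2025, 10, 1) - date(2020, 10, 1)).days
print(n_days)  # 1826: five 365-day years plus the leap day 2024-02-29

# Heatwave index = heatwave days / days in the period (USC00111577: 86 days)
index_111577 = 86 / n_days
print(f"{index_111577:.6f}")  # 0.047097
```

Using `n_days` in place of the literal 1826 keeps the index correct if the study period is later changed.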

# Attach coordinates, then express heatwave days as a fraction of the 1826-day period
final_with_coords = final.merge(final_coords[["STATION", "LATITUDE", "LONGITUDE"]], on="STATION", how="left")
final_with_coords["heatwave_index"] = final_with_coords["NumDays"] / 1826
final_with_coords

Finally, I map the results with folium. Each station appears as a separate point, and clicking a point shows its heatwave frequency (number of heatwave days) and heatwave index for 2020-2025.

import folium

def plot_heatwave_map(final_with_coords):
    """Base map of Chicago plus one clickable marker per station."""
    chicago_center = [41.8781, -87.6298]
    m = folium.Map(location=chicago_center, zoom_start=10, tiles='OpenStreetMap')
    # Reference marker for downtown Chicago
    folium.Marker(
        chicago_center,
        popup='Chicago Downtown',
        tooltip='Click for more info',
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)
    # One circle per station; the popup shows its 2020-2025 heatwave statistics
    for _, row in final_with_coords.iterrows():
        folium.CircleMarker(
            location=[row["LATITUDE"], row["LONGITUDE"]],
            radius=8,
            color='blue',
            fill=True,
            fill_color='orange',
            fill_opacity=0.7,
            popup=(
                f"<b>Station:</b> {row['STATION']}<br>"
                f"<b>Heatwave Index:</b> {row['heatwave_index']:.4f}<br>"
                f"<b>NumDays:</b> {row['NumDays']}"
            ),
            tooltip=f"{row['STATION']}"
        ).add_to(m)
    return m

m = plot_heatwave_map(final_with_coords)
m

We can see that stations closer to downtown Chicago tend to have more heatwave days, while stations in suburban areas have fewer. Station USC00111550 is an exception: although it is close to downtown, it recorded the fewest heatwave days (20), possibly because of the cooling influence of nearby Lake Michigan.

Output of final_coords (station coordinates):

       STATION  LATITUDE  LONGITUDE
0  USC00115097  41.81271  -88.07275
1  USC00116616  41.49453  -87.67951
2  USC00111550  41.85580  -87.60940
3  USW00004838  42.12076  -87.90479
4  USW00094846  41.96017  -87.93164
5  USC00117457  41.60413  -88.08497
6  USW00014819  41.78412  -87.75514
7  USC00111577  41.73727  -87.77734

Output of final_with_coords (heatwave days and heatwave index, 2020-10-01 to 2025-10-01):

       STATION  NumDays  LATITUDE  LONGITUDE  heatwave_index
0  USC00111550       20  41.85580  -87.60940        0.010953
1  USC00111577       86  41.73727  -87.77734        0.047097
2  USC00115097       32  41.81271  -88.07275        0.017525
3  USC00116616       43  41.49453  -87.67951        0.023549
4  USC00117457       37  41.60413  -88.08497        0.020263
5  USW00004838       59  42.12076  -87.90479        0.032311
6  USW00014819       86  41.78412  -87.75514        0.047097
7  USW00094846       64  41.96017  -87.93164        0.035049
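The observation that stations nearer downtown see more heatwave days can be checked numerically. The sketch below computes each station's great-circle (haversine) distance from the downtown point used for the maps, then a Spearman rank correlation between distance and heatwave days, computed as the Pearson correlation of average ranks. The values are copied from the table above; the 6371 km Earth radius and the choice of Spearman's rho are assumptions of this sketch, and with only eight stations the result is descriptive rather than a significance test.

```python
import numpy as np
import pandas as pd

# Station coordinates and 2020-2025 heatwave-day totals from the table above
stations = pd.DataFrame({
    "STATION": ["USC00111550", "USC00111577", "USC00115097", "USC00116616",
                "USC00117457", "USW00004838", "USW00014819", "USW00094846"],
    "NumDays": [20, 86, 32, 43, 37, 59, 86, 64],
    "LATITUDE": [41.85580, 41.73727, 41.81271, 41.49453,
                 41.60413, 42.12076, 41.78412, 41.96017],
    "LONGITUDE": [-87.60940, -87.77734, -88.07275, -87.67951,
                  -88.08497, -87.90479, -87.75514, -87.93164],
})
DOWNTOWN = (41.8781, -87.6298)  # same map center used for the folium maps


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def rank_corr(frame):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return frame["dist_km"].rank().corr(frame["NumDays"].rank())


stations["dist_km"] = haversine_km(stations["LATITUDE"], stations["LONGITUDE"], *DOWNTOWN)
rho_all = rank_corr(stations)
rho_no_lake = rank_corr(stations[stations["STATION"] != "USC00111550"])

print(stations.sort_values("dist_km")[["STATION", "dist_km", "NumDays"]])
print(f"Spearman rho, all stations: {rho_all:.2f}")
print(f"Spearman rho, without USC00111550: {rho_no_lake:.2f}")
```

With these values the correlation is negative, and it becomes much stronger when the lakeside station USC00111550 is left out, which is consistent with the pattern described for the map.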