Generate Time Series News Reporting for Earthquake Events via GDELT

This repo will show you how to generate time series news reporting for earthquake events via GDELT. The GDELT Project is a real-time network diagram and database of global human society for open research. GDELT monitors the world's news media from nearly every corner of every country in print, broadcast, and web formats, in over 100 languages, every moment of every day.

Eathquake event data is collected from The International Disaster Database (EM-DAT). To reduce the data size, this notebook only focus on sample data containing 3 earthquake events.

In [1]:
# Data structure:  
# .  
# ├── bigquery.json // your own Google BigQuery API key  
# ├── data  
# │   ├── countryinfo2.csv // country information  
# │   ├── earthquake_sample.csv // sample data containing 3 earthquake events 
# │   ├── earthquake_gdelt.csv // sample gdelt bigquery data containing 3 earthquake events 
# │   └── sourcesbycountry2018.csv // news sources by country  
# ├── EQNews_TS_generation.ipynb // Jupyter notebook  
# └── README.md  
In [2]:
import os
import pandas as pd
import numpy as np
from datetime import timedelta

Step 0: Generate FIPS for Earthquake Events

In [3]:
# read earthquake event sample data
df_event = pd.read_csv('data/earthquake_sample.csv')
df_event
Out[3]:
Event_wiki_page UTC Country Location_iso3 Latitude Longitude Magnitude Depth (km) MMI Death_wiki Location_fips Event_id UTC_round
0 2024 Noto earthquake 2024-01-01 07:10:09 Japan JPN 37.495 137.265 7.5 10.0 XI (Extreme) 339 JA 240101_Japan 2024-01-01 07:15:00
1 2024 Hualien earthquake 2024-04-02 23:58:11 Taiwan TWN 23.819 121.562 7.4 40.0 VIII (Severe) 18 TW 240402_Taiwan 2024-04-03 00:00:00
2 2023 Al Haouz earthquake 2023-09-08 22:11:01 Morocco MAR 31.058 -8.385 6.8 19.0 IX (Violent) 2960 MO 230908_Morocco 2023-09-08 22:15:00
In [4]:
# read country information
country_info = pd.read_csv('data/countryinfo2.csv')
# construct a iso3 to fips mapping
iso3_to_fips = pd.Series(country_info.fips.values, index=country_info.iso3).to_dict()
In [5]:
# Earthquake affected counrties, join by '|'
print(df_event.Location_fips.str.split(',').explode().unique().tolist())
['JA', 'TW', 'MO']

Step 1: BigQuery GDELT data

In [6]:
!pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
!pip install --upgrade google-cloud-storage
Requirement already satisfied: google-cloud-bigquery[bqstorage,pandas] in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (3.25.0)
Collecting google-cloud-bigquery[bqstorage,pandas]
  Downloading google_cloud_bigquery-3.30.0-py2.py3-none-any.whl.metadata (7.9 kB)
Requirement already satisfied: google-api-core<3.0.0dev,>=2.11.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core[grpc]<3.0.0dev,>=2.11.1->google-cloud-bigquery[bqstorage,pandas]) (2.19.1)
Requirement already satisfied: google-auth<3.0.0dev,>=2.14.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.27.0)
Requirement already satisfied: google-cloud-core<3.0.0dev,>=2.4.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.4.1)
Requirement already satisfied: google-resumable-media<3.0dev,>=2.0.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.7.2)
Requirement already satisfied: packaging>=20.0.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (23.2)
Requirement already satisfied: python-dateutil<3.0dev,>=2.7.3 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.8.2)
Requirement already satisfied: requests<3.0.0dev,>=2.21.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.31.0)
Requirement already satisfied: google-cloud-bigquery-storage<3.0.0dev,>=2.6.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.22.0)
Requirement already satisfied: grpcio<2.0dev,>=1.47.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (1.59.3)
Requirement already satisfied: pyarrow>=3.0.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (14.0.1)
Requirement already satisfied: pandas>=1.1.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (2.1.3)
Requirement already satisfied: db-dtypes<2.0.0dev,>=0.3.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-bigquery[bqstorage,pandas]) (1.1.1)
Collecting pandas-gbq>=0.26.1 (from google-cloud-bigquery[bqstorage,pandas])
  Downloading pandas_gbq-0.28.0-py2.py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: numpy>=1.16.6 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from db-dtypes<2.0.0dev,>=0.3.0->google-cloud-bigquery[bqstorage,pandas]) (1.25.2)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.11.1->google-api-core[grpc]<3.0.0dev,>=2.11.1->google-cloud-bigquery[bqstorage,pandas]) (1.61.0)
Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.11.1->google-api-core[grpc]<3.0.0dev,>=2.11.1->google-cloud-bigquery[bqstorage,pandas]) (4.25.1)
Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.3 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.11.1->google-api-core[grpc]<3.0.0dev,>=2.11.1->google-cloud-bigquery[bqstorage,pandas]) (1.22.3)
Requirement already satisfied: grpcio-status<2.0.dev0,>=1.33.2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core[grpc]<3.0.0dev,>=2.11.1->google-cloud-bigquery[bqstorage,pandas]) (1.59.3)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0.0dev,>=2.14.1->google-cloud-bigquery[bqstorage,pandas]) (5.3.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0.0dev,>=2.14.1->google-cloud-bigquery[bqstorage,pandas]) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0.0dev,>=2.14.1->google-cloud-bigquery[bqstorage,pandas]) (4.9)
Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-resumable-media<3.0dev,>=2.0.0->google-cloud-bigquery[bqstorage,pandas]) (1.5.0)
Requirement already satisfied: pytz>=2020.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from pandas>=1.1.0->google-cloud-bigquery[bqstorage,pandas]) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from pandas>=1.1.0->google-cloud-bigquery[bqstorage,pandas]) (2023.3)
Requirement already satisfied: setuptools in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from pandas-gbq>=0.26.1->google-cloud-bigquery[bqstorage,pandas]) (68.0.0)
Collecting pydata-google-auth>=1.5.0 (from pandas-gbq>=0.26.1->google-cloud-bigquery[bqstorage,pandas])
  Downloading pydata_google_auth-1.9.1-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting google-auth-oauthlib>=0.7.0 (from pandas-gbq>=0.26.1->google-cloud-bigquery[bqstorage,pandas])
  Downloading google_auth_oauthlib-1.2.1-py2.py3-none-any.whl.metadata (2.7 kB)
Requirement already satisfied: six>=1.5 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from python-dateutil<3.0dev,>=2.7.3->google-cloud-bigquery[bqstorage,pandas]) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.21.0->google-cloud-bigquery[bqstorage,pandas]) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.21.0->google-cloud-bigquery[bqstorage,pandas]) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.21.0->google-cloud-bigquery[bqstorage,pandas]) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.21.0->google-cloud-bigquery[bqstorage,pandas]) (2023.7.22)
Collecting requests-oauthlib>=0.7.0 (from google-auth-oauthlib>=0.7.0->pandas-gbq>=0.26.1->google-cloud-bigquery[bqstorage,pandas])
  Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.0dev,>=2.14.1->google-cloud-bigquery[bqstorage,pandas]) (0.5.0)
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib>=0.7.0->pandas-gbq>=0.26.1->google-cloud-bigquery[bqstorage,pandas])
  Downloading oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
Downloading pandas_gbq-0.28.0-py2.py3-none-any.whl (37 kB)
Downloading google_cloud_bigquery-3.30.0-py2.py3-none-any.whl (247 kB)
Downloading google_auth_oauthlib-1.2.1-py2.py3-none-any.whl (24 kB)
Downloading pydata_google_auth-1.9.1-py2.py3-none-any.whl (15 kB)
Downloading requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)
Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
Installing collected packages: oauthlib, requests-oauthlib, google-auth-oauthlib, pydata-google-auth, google-cloud-bigquery, pandas-gbq
  Attempting uninstall: google-cloud-bigquery
    Found existing installation: google-cloud-bigquery 3.25.0
    Uninstalling google-cloud-bigquery-3.25.0:
      Successfully uninstalled google-cloud-bigquery-3.25.0
Successfully installed google-auth-oauthlib-1.2.1 google-cloud-bigquery-3.30.0 oauthlib-3.2.2 pandas-gbq-0.28.0 pydata-google-auth-1.9.1 requests-oauthlib-2.0.0
Requirement already satisfied: google-cloud-storage in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (2.18.2)
Collecting google-cloud-storage
  Downloading google_cloud_storage-3.1.0-py2.py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: google-auth<3.0dev,>=2.26.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-storage) (2.27.0)
Requirement already satisfied: google-api-core<3.0.0dev,>=2.15.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-storage) (2.19.1)
Collecting google-cloud-core<3.0dev,>=2.4.2 (from google-cloud-storage)
  Downloading google_cloud_core-2.4.3-py2.py3-none-any.whl.metadata (2.7 kB)
Requirement already satisfied: google-resumable-media>=2.7.2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-storage) (2.7.2)
Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-storage) (2.31.0)
Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-cloud-storage) (1.5.0)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage) (1.61.0)
Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0.dev0,>=3.19.5 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage) (4.25.1)
Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.3 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-api-core<3.0.0dev,>=2.15.0->google-cloud-storage) (1.22.3)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0dev,>=2.26.1->google-cloud-storage) (5.3.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0dev,>=2.26.1->google-cloud-storage) (0.3.0)
Requirement already satisfied: rsa<5,>=3.1.4 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from google-auth<3.0dev,>=2.26.1->google-cloud-storage) (4.9)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2023.7.22)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /home/weih9/miniconda3/envs/metagdelt/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=2.26.1->google-cloud-storage) (0.5.0)
Downloading google_cloud_storage-3.1.0-py2.py3-none-any.whl (174 kB)
Downloading google_cloud_core-2.4.3-py2.py3-none-any.whl (29 kB)
Installing collected packages: google-cloud-core, google-cloud-storage
  Attempting uninstall: google-cloud-core
    Found existing installation: google-cloud-core 2.4.1
    Uninstalling google-cloud-core-2.4.1:
      Successfully uninstalled google-cloud-core-2.4.1
  Attempting uninstall: google-cloud-storage
    Found existing installation: google-cloud-storage 2.18.2
    Uninstalling google-cloud-storage-2.18.2:
      Successfully uninstalled google-cloud-storage-2.18.2
Successfully installed google-cloud-core-2.4.3 google-cloud-storage-3.1.0
In [7]:
from google.cloud import bigquery
from google.cloud import storage

# Setting Google application credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "bigquery.json"
In [8]:
# Initialize BigQuery client
bq_client = bigquery.Client()

# Test: Run a simple query
query = """
SELECT
    gkg.GKGRECORDID,
    gkg.DATE,
    gkg.SourceCommonName,
    ARRAY_TO_STRING(
        ARRAY(
            SELECT
                DISTINCT UPPER(TRIM(SPLIT(location, '#')[OFFSET(2)]))  -- Extract and uppercase the FIPS code
            FROM
                UNNEST(SPLIT(gkg.V2Locations, ';')) AS location
            WHERE
                UPPER(TRIM(SPLIT(location, '#')[OFFSET(2)])) IN ['JA', 'TW', 'MO']
        ), ','
    ) AS V2Locations_FIPS
FROM
    `gdelt-bq.gdeltv2.gkg_partitioned` gkg
WHERE
    _PARTITIONTIME BETWEEN TIMESTAMP("2023-09-07") AND TIMESTAMP("2024-08-17")
    AND gkg.DATE >= 20230907000000
    AND REGEXP_CONTAINS(gkg.V2Themes, r'(?i)EARTHQUAKE.*EARTHQUAKE')
    AND ARRAY_LENGTH(
        ARRAY(
            SELECT
                DISTINCT UPPER(TRIM(SPLIT(location, '#')[OFFSET(2)]))  -- Ensure FIPS codes match case-insensitively
            FROM
                UNNEST(SPLIT(gkg.V2Locations, ';')) AS location
            WHERE
                UPPER(TRIM(SPLIT(location, '#')[OFFSET(2)])) IN ['JA', 'TW', 'MO']
        )
    ) > 0;
"""
In [9]:
# Run the query once
# Uncomment to run
# query_job = bq_client.query(query)
# results = query_job.to_dataframe()
# results.to_csv('data/earthquake_gdelt.csv', index=False, sep=',')

Step 2: Match Sources Country by sourcesbycountry2018

In this step, we show how to match sources country by sourcesbycountry2018. The sourcesbycountry2018 table is a list of all known news sources in the GDELT Global Knowledge Graph, along with their country of focus. The table is generated based on assumption that the country of focus of a news source is the country in which it is headquartered.

In [10]:
df_2018 = pd.read_csv('data/sourcesbycountry2018.csv', sep='\t')
results = pd.read_csv('data/earthquake_gdelt.csv')
In [11]:
# add column names sources_FIPS to results join by df_2018 Domain on SourceCommonName
results['sources_FIPS'] = results['SourceCommonName'].map(df_2018.set_index('Domain')['FIPS'])

# filter out rows with sources_FIPS is null
results = results[results['sources_FIPS'].notnull()]
results_count = len(results)

# drop SourceCommonName
results.drop(columns=['SourceCommonName'], inplace=True)

# print step 2 statistics
print(f'News records dropped with invalid or null FIPS: {results_count - len(results)} (% of total: {len(results) / results_count * 100:.2f}%)')
News records dropped with invalid or null FIPS: 0 (% of total: 100.00%)
In [12]:
results.to_csv('data/earthquake_gdelt_addsourcefips.csv', index=False, sep=',')

Step 3: Generate Time Series based on GDELT data

In this step, we show how to generate time series news reporting for earthquake events via GDELT. And we also show how quickly the earthquake events can be reported. The temporal resolution of the time series is 15 mins, and time series can be generate at resolution of an integer times 15 mins, such as 30 mins, 1 hour, 1 day, etc.

In [13]:
df_gdelts = pd.read_csv('data/earthquake_gdelt_addsourcefips.csv')
df_event = pd.read_csv('data/earthquake_sample.csv')
In [14]:
df_event['UTC'] = pd.to_datetime(df_event['UTC'])
df_event['UTC_round'] = pd.to_datetime(df_event['UTC_round'])
df_gdelts['DATE'] = pd.to_datetime(df_gdelts['DATE'], format='%Y%m%d%H%M%S')
In [15]:
df_gdelts.head()
Out[15]:
GKGRECORDID DATE V2Locations_FIPS sources_FIPS
0 20240609034500-T479 2024-06-09 03:45:00 TU,TW CH
1 20231010010000-1055 2023-10-10 01:00:00 SY,TU,JA PK
2 20231010030000-148 2023-10-10 03:00:00 MO,SY,TU US
3 20240204233000-T1042 2024-02-04 23:30:00 TW,TU CH
4 20240204184500-T1719 2024-02-04 18:45:00 JA,SY,TU EG
In [16]:
# check if V2Locations_FIPS is NaN
df_gdelts['V2Locations_FIPS'].isnull().sum()
Out[16]:
0
In [17]:
df_event.head()
Out[17]:
Event_wiki_page UTC Country Location_iso3 Latitude Longitude Magnitude Depth (km) MMI Death_wiki Location_fips Event_id UTC_round
0 2024 Noto earthquake 2024-01-01 07:10:09 Japan JPN 37.495 137.265 7.5 10.0 XI (Extreme) 339 JA 240101_Japan 2024-01-01 07:15:00
1 2024 Hualien earthquake 2024-04-02 23:58:11 Taiwan TWN 23.819 121.562 7.4 40.0 VIII (Severe) 18 TW 240402_Taiwan 2024-04-03 00:00:00
2 2023 Al Haouz earthquake 2023-09-08 22:11:01 Morocco MAR 31.058 -8.385 6.8 19.0 IX (Violent) 2960 MO 230908_Morocco 2023-09-08 22:15:00
In [ ]:
def generate_ts_df(df_event, df_gdelts, start_shift='1D', end_shift='21D', ts_interval='1D', quick_report=['1h', '3h']):
    # Initialize an empty list to store results
    results = []

    # Create timedelta objects for shifts and intervals
    ts_interval_timedelta = pd.to_timedelta(ts_interval)
    start_shift_timedelta = pd.to_timedelta(start_shift)
    end_shift_timedelta = pd.to_timedelta(end_shift)

    # Iterate over each event in df_event
    for _, event in df_event.iterrows():
        event_id = event['Event_id']
        event_fips = set(event['Location_fips'].split(','))

        # Filter df_gdelts for the full time series generation (including first report time)
        ts_gdelts = df_gdelts[
            (df_gdelts['DATE'] >= event['UTC_round'] - start_shift_timedelta) & 
            (df_gdelts['DATE'] <= event['UTC_round'] + end_shift_timedelta)
        ]

        # Further filter based on relevant FIPS codes and ensure FIPS codes are valid
        ts_gdelts = ts_gdelts[
            ts_gdelts['V2Locations_FIPS'].apply(lambda x: bool(event_fips.intersection(set(x.split(',')))))
        ].copy()

        # Skip to the next event if there are no reports in the current window
        if ts_gdelts.empty:
            continue

        # Calculate the first report time for each reporting country
        # the first report time is after event['UTC_round'] 
        first_report_times = ts_gdelts[ts_gdelts['DATE'] > event['UTC_round']].groupby('sources_FIPS')['DATE'].min().reset_index()
        first_report_times.columns = ['report_country_FIP', 'first_report_time']

        # Initialize a dictionary to store quick report counts for different intervals
        quick_report_counts_dict = {}

        # Process each quick_report interval
        for quick_interval in quick_report:
            quick_report_timedelta = pd.to_timedelta(quick_interval)

            # Filter df_gdelts based on the quick report date range
            quick_report_gdelts = ts_gdelts[
                (ts_gdelts['DATE'] >= event['UTC_round']) & 
                (ts_gdelts['DATE'] < event['UTC_round'] + quick_report_timedelta)
            ]

            # Calculate the quick report count for this event and interval
            quick_report_counts = quick_report_gdelts.groupby('sources_FIPS')['DATE'].count().reset_index()
            quick_report_counts.columns = ['report_country_FIP', f'report_{quick_interval}']

            # Store the quick report counts for the current interval
            quick_report_counts_dict[quick_interval] = quick_report_counts

        # Group by the reporting country for time series
        grouped_gdelts = ts_gdelts.groupby('sources_FIPS')

        for report_country, group in grouped_gdelts:
            # Find the first report time for this report_country
            first_report_time = first_report_times[first_report_times['report_country_FIP'] == report_country]
            if not first_report_time.empty:
                first_report_time = first_report_time['first_report_time'].values[0]
            else:
                continue  # Skip this country if there's no valid first report time

            ts_start = event['UTC_round'] - start_shift_timedelta
            ts_end = event['UTC_round'] + end_shift_timedelta
            
            # Create a time series range
            time_range = pd.date_range(start=ts_start, end=ts_end, freq=ts_interval_timedelta)

            # Efficiently calculate reports within each time interval
            # [start, end)
            ts_array = group['DATE'].groupby(pd.cut(group['DATE'], time_range, right=False, include_lowest=True), observed=False).size().tolist()

            # Initialize quick report counts for this report_country
            quick_report_results = {}
            for quick_interval in quick_report:
                quick_report_count = quick_report_counts_dict[quick_interval]
                quick_report_count_for_country = quick_report_count[quick_report_count['report_country_FIP'] == report_country]
                
                if not quick_report_count_for_country.empty:
                    quick_report_results[f'report_{quick_interval}'] = quick_report_count_for_country[f'report_{quick_interval}'].values[0]
                else:
                    quick_report_results[f'report_{quick_interval}'] = 0

            # Add results for this country and event to the final result list
            result = {
                'Event_id': event_id,
                'Location_fips': event['Location_fips'],
                'UTC': event['UTC'],
                'UTC_round': event['UTC_round'],
                'report_country_FIP': report_country,
                'first_report_time': first_report_time,
                'TS_start': ts_start,
                'TS_end': ts_end,
                'TS_interval': ts_interval,
                'TS_array': ts_array
            }

            # Add quick report results to the result dictionary
            result.update(quick_report_results)

            results.append(result)

        print(f"Event {event_id} processed.")
    
    # Convert results list to DataFrame
    final_df = pd.DataFrame(results)
    return final_df
In [19]:
result_df = generate_ts_df(df_event, df_gdelts)
result_df.to_csv('data/earthquake_ts_1D_21D_1D.csv', index=False, sep=',')
Event 240101_Japan processed.
Event 240402_Taiwan processed.
Event 230908_Morocco processed.