GGIS 527

Lab 7 Analyzing Geo-Text Data with Natural Language Processing (NLP) Tools

Developed by Zhaonan Wang in Fall 2023

In this lab, you will walk through the data wrangling process for two types of geo-text data: text with explicit geotags and text with implicit location mentions.

  • Explicit geo-text dataset: 2,145 businesses located on the Illinois side of St. Louis, derived from the Yelp Academic Dataset. Each record also comes with a user review in plain text. Refer here for the data format of the business and review data used. Your task is to perform sentiment analysis on each review and plot the polarity scores on a map.
  • Implicit geo-text dataset: news reports usually mention various locations, such as countries, states, and even local toponyms (place names). In this notebook, you will work with a toy corpus containing three chunks of online news about dam failure events. Your task is to extract the location mentions buried in the unstructured text.

Explicit Geo-Text Data Analysis

In [1]:
import pandas as pd

# read prepared yelp data
yelp_data = pd.read_csv('./data/yelp_STL_IL.csv')

# check data
print(yelp_data.shape)
yelp_data.head()

# we are mainly interested in the 'text' column and the geotag columns ['latitude', 'longitude']
(2145, 25)
Out[1]:
Unnamed: 0 Unnamed: 0_x business_id name address city state postal_code latitude longitude ... hours Unnamed: 0_y review_id user_id stars_y useful funny cool text date
0 0 38 LcAozWCMLGjwRbokaJAKMg Edwardsville Children's Museum 722 Holyoake Rd Edwardsville IL 62025 38.804395 -89.949733 ... {'Monday': '10:0-15:0', 'Tuesday': '9:30-14:0'... 313 LfsU2lVUr1-pC802v0o32A mRgAqvxz9jHYpm8ccIjZUQ 5.0 0 0 0 Place rocks excellent children's activities an... 2016-07-04 20:56:17
1 13 41 ljxNT9p0y7YMPx0fcNBGig Tony's Restaurant & 3rd Street Cafe 312 Piasa St Alton IL 62002 38.896563 -90.186203 ... {'Monday': '0:0-0:0', 'Tuesday': '16:0-21:30',... 20 uiqzlDEsUN_y1awEw_HHDA qmQPWMV_YYmwV2DyvmIDYQ 5.0 0 0 0 We had been driving around for some time, on a... 2018-07-17 01:07:49
2 118 48 bCBPXIVfVzBZBEpFu29dcg All In Shipping 5343 Belleville Crossing St Belleville IL 62226 38.517586 -90.021929 ... NaN 1378 oZqb2LRrJFaEjTz9ETzpPA BHrWZS0J0FuJuLqeNk6J7w 5.0 0 0 0 I love this little local business. They have e... 2017-01-20 14:13:47
3 123 86 sE6jSnvMts_MAn-b4OkMAw K-9 Groom Room 820 Industrial Dr Troy IL 62294 38.716244 -89.885830 ... {'Monday': '8:0-16:0', 'Tuesday': '8:0-16:0', ... 194 UjBwlySBW4iPpFWGOw5Xkw SE85OT0FKxeL28izk-5POg 4.0 3 0 0 This is another great local business. Our two... 2011-03-25 17:36:39
4 128 102 EuRGgOwJ0g1vTj2R04j37Q Crafty Crab 51 Ludwig Dr Fairview Heights IL 62208 38.601298 -89.989683 ... {'Monday': '12:0-22:0', 'Tuesday': '12:0-22:0'... 3261 DrWMCBMRweRydBEk-OLKYg h3o-SqWjDeMI2fCJI63-jg 1.0 0 0 0 Waiter was absolutely terrible ordered our foo... 2021-11-06 02:07:15

5 rows × 25 columns

Introduction to spaCy and Sentiment Analysis

We will use spaCy, a free, open-source, and easy-to-use Python library for fundamental NLP tasks such as pre-processing, information extraction, and natural language understanding. Specifically, we will add spacytextblob, a pipeline component that exposes TextBlob's sentiment analysis, to a pre-trained spaCy pipeline. For each review, it returns a sentiment polarity score between -1 and 1. Negative values suggest that the reviewer disliked the business and positive values suggest that the reviewer liked it, with the magnitude indicating the strength of the sentiment.
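To build intuition for what a polarity score in [-1, 1] means, here is a minimal sketch of a lexicon-based scorer. It is a toy illustration only: the mini-lexicon and the function name `toy_polarity` are made up for this example, and the averaging rule is a simplification rather than the method TextBlob actually uses.

```python
# Toy lexicon-based polarity scorer (illustration only; not TextBlob's algorithm).
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
TOY_LEXICON = {
    "excellent": 1.0,
    "great": 0.8,
    "love": 0.5,
    "terrible": -1.0,
    "rude": -0.6,
}

def toy_polarity(text):
    """Average the polarity of known words; return 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("Great staff and excellent food"))  # positive review
print(toy_polarity("The waiter was terrible"))         # negative review
print(toy_polarity("We arrived at noon"))              # no sentiment words
```

Because words outside the lexicon are ignored, a review with no sentiment-bearing words scores 0.0, so scores near zero can mean either a neutral review or mixed opinions that cancel out.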

In [2]:
# install required libraries
# spacy
!pip install -U pip setuptools wheel
!pip install -U spacy
!python -m spacy download en_core_web_sm

# spacytextblob
!pip install spacytextblob
!python -m textblob.download_corpora
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pip in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (23.3.1)
Requirement already satisfied: setuptools in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (69.0.2)
Requirement already satisfied: wheel in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (0.42.0)
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: spacy in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (3.7.2)
Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 15.0 MB/s eta 0:00:00
Requirement already satisfied: spacy<3.8.0,>=3.7.2 in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (from en-core-web-sm==3.7.1) (3.7.2)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: spacytextblob in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (4.0.0)
Requirement already satisfied: spacy<4.0,>=3.0 in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (from spacytextblob) (3.7.2)
Requirement already satisfied: textblob<0.16.0,>=0.15.3 in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (from spacytextblob) (0.15.3)
[nltk_data] Downloading package brown to /home/jovyan/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to /home/jovyan/nltk_data...
[nltk_data]   Package conll2000 is already up-to-date!
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!
Finished.
In [3]:
# import required libraries
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# load pipelines
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
Out[3]:
<spacytextblob.spacytextblob.SpacyTextBlob at 0x7f73caf88d60>
In [4]:
# define a function to be applied on each row of pandas dataframe
def sentiment_score(text):
    doc = nlp(text)
    return doc._.blob.polarity
In [5]:
%%time

# apply sentiment analysis to each row
yelp_data['sentiment'] = yelp_data['text'].apply(sentiment_score)
# It will take ~1 min to run through
CPU times: user 1min 2s, sys: 0 ns, total: 1min 2s
Wall time: 1min 2s
In [6]:
# check the derived column
print(yelp_data['sentiment'].min(), yelp_data['sentiment'].max())
yelp_data.head()
-1.0 1.0
Out[6]:
Unnamed: 0 Unnamed: 0_x business_id name address city state postal_code latitude longitude ... Unnamed: 0_y review_id user_id stars_y useful funny cool text date sentiment
0 0 38 LcAozWCMLGjwRbokaJAKMg Edwardsville Children's Museum 722 Holyoake Rd Edwardsville IL 62025 38.804395 -89.949733 ... 313 LfsU2lVUr1-pC802v0o32A mRgAqvxz9jHYpm8ccIjZUQ 5.0 0 0 0 Place rocks excellent children's activities an... 2016-07-04 20:56:17 0.436623
1 13 41 ljxNT9p0y7YMPx0fcNBGig Tony's Restaurant & 3rd Street Cafe 312 Piasa St Alton IL 62002 38.896563 -90.186203 ... 20 uiqzlDEsUN_y1awEw_HHDA qmQPWMV_YYmwV2DyvmIDYQ 5.0 0 0 0 We had been driving around for some time, on a... 2018-07-17 01:07:49 0.200250
2 118 48 bCBPXIVfVzBZBEpFu29dcg All In Shipping 5343 Belleville Crossing St Belleville IL 62226 38.517586 -90.021929 ... 1378 oZqb2LRrJFaEjTz9ETzpPA BHrWZS0J0FuJuLqeNk6J7w 5.0 0 0 0 I love this little local business. They have e... 2017-01-20 14:13:47 0.266146
3 123 86 sE6jSnvMts_MAn-b4OkMAw K-9 Groom Room 820 Industrial Dr Troy IL 62294 38.716244 -89.885830 ... 194 UjBwlySBW4iPpFWGOw5Xkw SE85OT0FKxeL28izk-5POg 4.0 3 0 0 This is another great local business. Our two... 2011-03-25 17:36:39 0.481250
4 128 102 EuRGgOwJ0g1vTj2R04j37Q Crafty Crab 51 Ludwig Dr Fairview Heights IL 62208 38.601298 -89.989683 ... 3261 DrWMCBMRweRydBEk-OLKYg h3o-SqWjDeMI2fCJI63-jg 1.0 0 0 0 Waiter was absolutely terrible ordered our foo... 2021-11-06 02:07:15 -0.124603

5 rows × 26 columns

Visualization of Explicit Geo-Text Data

We will use folium, a Python library that builds interactive maps on top of Leaflet.js.

In [7]:
# install folium
!pip install folium
# alternative conda install
# conda install -c conda-forge folium
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: folium in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (0.12.1.post1)
In [8]:
# import libraries
import folium
import branca.colormap as cm
from branca.element import Figure
In [9]:
# first, filter the dataset to a selected neighborhood (here, the city of Edwardsville)
select_neighbor = yelp_data[yelp_data['city']=='Edwardsville']

print(select_neighbor.shape)
# there are 274 businesses after filtering
(274, 26)
In [10]:
# build a color map to visualize sentiment polarity
rainbow = cm.StepColormap(['purple', 'lightblue', 'lightgreen', 'yellow', 'orange', 'red'], vmin=-1, vmax=1)
rainbow
Out[10]:
[Color bar: six-step colormap ranging from -1 (purple) to 1 (red)]
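To see how a sentiment score ends up as a marker color, the sketch below bins a score into the six colors used above. It assumes six equal-width bins over [-1, 1], which is the layout one would expect when no explicit bin edges are passed to the colormap. The helper `color_for` is hypothetical and is not branca's implementation.

```python
# Sketch: equal-width binning of a polarity score into six colors.
# Assumption: six equal-width bins over [-1, 1]; this is not branca's source code.
COLORS = ['purple', 'lightblue', 'lightgreen', 'yellow', 'orange', 'red']
VMIN, VMAX = -1.0, 1.0

def color_for(score):
    """Return the color name of the bin that contains `score`."""
    clipped = min(max(score, VMIN), VMAX)       # keep the score inside [VMIN, VMAX]
    width = (VMAX - VMIN) / len(COLORS)         # each bin spans one third of a unit
    index = int((clipped - VMIN) / width)       # zero-based bin number
    return COLORS[min(index, len(COLORS) - 1)]  # the top edge falls in the last bin

# Scores taken from the first rows of the sentiment column, plus both extremes.
for s in (-1.0, -0.124603, 0.20025, 0.48125, 1.0):
    print(s, color_for(s))
```

Under this assumption, mildly negative and mildly positive reviews land in the middle green and yellow bins, while purple and red are reserved for strongly polarized reviews.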
In [11]:
# Create a map instance with a frame
fig = Figure(width=800, height=500)
m = folium.Map(location=[38.8039, -89.9583], zoom_start=11)
fig.add_child(m)

# iterate over the businesses and add a marker for each one to the basemap
for index, row in select_neighbor.iterrows():
    iframe = folium.IFrame(row['text'])
    folium.Marker([row['latitude'], row['longitude']],
                  popup=folium.Popup(iframe, min_width=300, max_width=300),
                  icon=folium.Icon(color='lightgray', icon_color=rainbow(row['sentiment']))).add_to(m)

m
# Do you observe any spatial pattern in how sentiment is distributed?
Out[11]:

You can experiment with the map by visualizing another column, e.g., the star rating, or by filtering to a different neighborhood. You are also welcome to explore other regions for your course project or out of personal interest. Please feel free to reach out to me (znwang@illinois.edu) about the data or your cool project.
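If you switch the visualized attribute to star ratings, keep in mind that the colormap above is defined on [-1, 1], while Yelp stars range from 1 to 5. One option is to rescale the rating before looking up a color. The sketch below shows such a linear rescaling; the helper name `stars_to_unit_scale` is hypothetical, and the column name `stars_y` in the comment is assumed from the table preview.

```python
def stars_to_unit_scale(stars, lo=1.0, hi=5.0):
    """Linearly map a star rating in [lo, hi] onto [-1, 1]."""
    return 2.0 * (stars - lo) / (hi - lo) - 1.0

# Possible use inside the marker loop (assumes the rating column is 'stars_y'):
#     icon_color = rainbow(stars_to_unit_scale(row['stars_y']))
for stars in (1, 3, 4, 5):
    print(stars, stars_to_unit_scale(stars))
```

An alternative is to define a second colormap with vmin=1 and vmax=5, which avoids the rescaling step and keeps the legend in star units.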

Implicit Geo-Text Data Analysis

According to Twitter, only 1-2% of Tweets are geotagged, while 30-40% of Tweets contain some location information. Similarly, this GIScience'21 paper confirms that over 10% of Tweets contain location references in their content. Thus, it is important to use text mining to extract this implicit geographic information from unstructured text data.

Recently, researchers have been using advanced NLP techniques for this task, which can be treated as a sub-task of Named Entity Recognition (NER). Rather than all named entities, such as person names and time expressions, we focus mainly on geospatial named entities, such as geopolitical entities and local organizations. We will use spaCy again, this time as a general-purpose NER tool to recognize geo-entities in text data.
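The core idea of keeping only geospatial entities can be sketched without running a model. In the example below, the (text, label) pairs are hand-written stand-ins for NER output, not actual model predictions, and `count_geo_entities` is a hypothetical helper. The labels GPE, LOC, EVENT, and DATE follow the entity scheme used by spaCy's English pipelines.

```python
from collections import Counter

# Entity labels treated as geospatial in this lab.
GEO_LABELS = {'GPE', 'LOC'}

# Hand-written (text, label) pairs standing in for NER output;
# these are NOT actual model predictions.
sample_entities = [
    ('Hurricane Ida', 'EVENT'),
    ('Wednesday', 'DATE'),
    ('Johnstown', 'GPE'),
    ('Pennsylvania', 'GPE'),
    ('Maryland', 'GPE'),
    ('the Chesapeake Bay', 'LOC'),
    ('Maryland', 'GPE'),
]

def count_geo_entities(entities, labels=GEO_LABELS):
    """Count (label, text) pairs whose label is in `labels`."""
    return Counter((label, text) for text, label in entities if label in labels)

counts = count_geo_entities(sample_entities)
for (label, text), n in counts.most_common():
    print(label, text, n)
```

Counting repeated mentions gives a rough signal of which places a news item is mainly about, which can help when deciding which toponyms to geocode first.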

In [12]:
# import required libraries
import json
import spacy
from spacy import displacy    # visualizer
from collections import defaultdict
from tqdm import tqdm
In [13]:
# read text corpus and save into a list
data_list = []
with open('./data/news_samples.txt', encoding='utf-8') as f:
    readin = f.readlines()
    for line in tqdm(readin):
        data_list.append(line.strip())

print(f'Length of data_list: {len(data_list)}')
for text in data_list:
    print(text)
100%|██████████| 3/3 [00:00<00:00, 5236.33it/s]
Length of data_list: 3
Dozens, if not more than a hundred, Midland-area residents gathered to seek refuge within the walls of Midland High School Tuesday night after the Edenville Dam failed to hold back a deluge of water. Midland officials warned residents living near the Tittabawassee River to evacuate. They are concerned the Sanford Dam, located a few miles northwest of the city and downstream of the Edenville Dam, will also fail. Some drove to the school at 1301 Eastlawn Drive to seek shelter. Others were brought in by bus.
Videos and images captured by witnesses show just how much water was unleashed when Michigan's Edenville Dam failed. Officials had been warning nearby residents to evacuate all day Tuesday because of fears the hydroelectric dam holding back Wixom Lake would break. It was announced on Facebook around 6 p.m. Tuesday that the dam had failed -- and a torrent of water was rushing down the Tittabawassee River. The water's unrelenting flow continued overnight and daylight on Wednesday showed how little was left of the lake. An aerial image taken by a drone shows the Edenville dam breach on Wednesday.
Soaking rains from the remnants of Hurricane Ida prompted the evacuations of thousands of people Wednesday after water reached dangerous levels at a dam near Johnstown, PA. The storm moved east in the evening, with the National Weather Service confirming at least one tornado and social media posts showing homes blown to rubble and roofs torn from buildings in a southern New Jersey county just outside Philadelphia. Pennsylvania was blanketed with rain after high water drove some from their homes in Maryland and Virginia. The storm killed one person, two people were not accounted for, and a tornado was believed to have touched down along the Chesapeake Bay in Maryland. Ida caused countless school and business closures in Pennsylvania. About 150 roadways maintained by the Pennsylvania Department of Transportation were closed and many smaller roadways also were impassable.

In [14]:
# load spacy pipeline
nlp = spacy.load('en_core_web_sm')

# iterate through the news
for i, text in enumerate(tqdm(data_list)):
    doc = nlp(text)
    
    entity_dict = defaultdict(int)
    for entity in doc.ents:
        if entity.label_ in ['LOC', 'GPE']:    # LOCation, GeoPolitical Entity (i.e. countries, cities, states)
            entity_dict[entity.label_ + '_' + entity.text] += 1
    
    # visualize NER results
    displacy.render(doc, style='ent', options={"ents": ['LOC', 'GPE']}, jupyter=True)
    
    # save recognized entities into json
    with open(f'./data/NER_{i}.txt', 'w') as fout:
        fout.write(json.dumps(entity_dict) + '\n')
  0%|          | 0/3 [00:00<?, ?it/s]
Dozens, if not more than a hundred, Midland GPE -area residents gathered to seek refuge within the walls of Midland High School Tuesday night after the Edenville Dam LOC failed to hold back a deluge of water. Midland GPE officials warned residents living near the Tittabawassee River LOC to evacuate. They are concerned the Sanford Dam LOC , located a few miles northwest of the city and downstream of the Edenville Dam LOC , will also fail. Some drove to the school at 1301 Eastlawn Drive LOC to seek shelter. Others were brought in by bus.
Videos GPE and images captured by witnesses show just how much water was unleashed when Michigan GPE 's Edenville Dam LOC failed. Officials had been warning nearby residents to evacuate all day Tuesday because of fears the hydroelectric dam holding back Wixom Lake would break. It was announced on Facebook around 6 p.m. Tuesday that the dam had failed -- and a torrent of water was rushing down the Tittabawassee River LOC . The water's unrelenting flow continued overnight and daylight on Wednesday showed how little was left of the lake. An aerial image taken by a drone shows the Edenville dam breach on Wednesday.
 67%|██████▋   | 2/3 [00:00<00:00, 10.84it/s]
Soaking rains from the remnants of Hurricane Ida prompted the evacuations of thousands of people Wednesday after water reached dangerous levels at a dam near Johnstown GPE , PA GPE . The storm moved east in the evening, with the National Weather Service confirming at least one tornado and social media posts showing homes blown to rubble and roofs torn from buildings in a southern New Jersey GPE county just outside Philadelphia GPE . Pennsylvania GPE was blanketed with rain after high water drove some from their homes in Maryland GPE and Virginia GPE . The storm killed one person, two people were not accounted for, and a tornado was believed to have touched down along the Chesapeake Bay LOC in Maryland GPE . Ida caused countless school and business closures in Pennsylvania GPE . About 150 roadways maintained by the Pennsylvania Department of Transportation were closed and many smaller roadways also were impassable.
100%|██████████| 3/3 [00:00<00:00,  9.67it/s]
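
The cell above writes one dictionary per news item, with keys of the form LABEL_text. As a minimal sketch of how those per-document counts could be combined into one corpus-level tally, the block below parses the keys back into (label, text) pairs and sums the counts. The helper names `split_key` and `merge_counts` and the sample records are hypothetical; they only mimic the shape of the NER_{i}.txt files and are not actual model output.

```python
import json
from collections import Counter


def split_key(key):
    """Split 'LABEL_text' into (label, text) at the first underscore only."""
    label, text = key.split('_', 1)
    return label, text


def merge_counts(json_lines):
    """Sum entity counts over several JSON-encoded dictionaries."""
    totals = Counter()
    for line in json_lines:
        for key, count in json.loads(line).items():
            totals[split_key(key)] += count
    return totals


# hypothetical records in the same shape as the saved NER files
sample_lines = [
    '{"GPE_Midland": 2, "LOC_Tittabawassee River": 1}',
    '{"GPE_Michigan": 1, "LOC_Tittabawassee River": 1}',
]

totals = merge_counts(sample_lines)
for (label, text), count in totals.most_common():
    print(label, text, count)
```

Splitting at the first underscore only keeps a place name intact even if the name itself contains an underscore.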

Optional: Visualization of Implicit Geo-Text Data (Geocoding Service Required)

In [ ]:
# import libraries
import requests
import folium
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
# load target news' NER results (index of the news item: 0, 1, or 2)
target_news = 1
with open(f'./data/NER_{target_news}.txt') as f:
    ner = f.read()
ner_list = ner.split('\n')
ner_num = ner_list[0]
ner_js = json.loads(ner_num)
ner_js
In [ ]:
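# prepare one empty dictionary per entity class (e.g. 'LOC', 'GPE')
ner_class = {}
for key in ner_js.keys():
    class_ = key.split('_', 1)[0]    # label is the part before the first underscore
    if class_ not in ner_class.keys():
        ner_class[class_] = {}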
In [ ]:
# need Google Maps API key
my_Google_Maps_API_key = 'your_Google_Maps_API_key'
geocode_url = 'https://maps.googleapis.com/maps/api/geocode/json'
for key in ner_js.keys():
    # split at the first underscore only, so place names containing '_' stay intact
    class_, place_name = key.split('_', 1)
    if place_name not in ner_class[class_].keys():
        # pass the query through params so that requests URL-encodes the place name
        response = requests.get(geocode_url, params={'address': place_name, 'key': my_Google_Maps_API_key})
        results = response.json()['results']
        if results:
            ner_class[class_][place_name] = results[0]['geometry']['location']
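
Place names such as "Johnstown, PA" contain spaces and commas, so they should be URL-encoded before being sent to the geocoding service. The sketch below shows one way to build the request URL and to read the first result, using only the standard library. The helper names `build_geocode_url` and `first_location` are hypothetical, and the sample payload uses placeholder coordinates that only mimic the results[0]['geometry']['location'] structure read in the cell above; no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'


def build_geocode_url(place_name, api_key):
    """Return the request URL with the place name and key URL-encoded."""
    query = urlencode({'address': place_name, 'key': api_key})
    return f'{GEOCODE_ENDPOINT}?{query}'


def first_location(payload):
    """Return the first result's {'lat': ..., 'lng': ...} dict, or None if there are no results."""
    results = payload.get('results') or []
    if not results:
        return None
    return results[0]['geometry']['location']


url = build_geocode_url('Johnstown, PA', 'your_Google_Maps_API_key')
print(url)

# placeholder payload (made-up coordinates), shaped like the response parsed above
sample_payload = {'results': [{'geometry': {'location': {'lat': 1.0, 'lng': 2.0}}}]}
print(first_location(sample_payload))
print(first_location({'results': []}))
```

Returning None for an empty result list makes it easy to skip place names the service cannot resolve.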
In [ ]:
# Create a map instance with a frame
from branca.element import Figure    # Figure comes from branca, which is installed with folium

fig = Figure(width=800, height=500)
m = folium.Map(location=[38, -97], tiles="cartodbpositron", zoom_start=6)
fig.add_child(m)

# LOC
for key in ner_class['LOC']:
    lat, lon = ner_class['LOC'][key]['lat'], ner_class['LOC'][key]['lng']
    folium.Marker([lat, lon], popup=key, icon=folium.Icon(color='red'),).add_to(m)
# GPE
for key in ner_class['GPE']:
    lat, lon = ner_class['GPE'][key]['lat'], ner_class['GPE'][key]['lng']
    folium.Marker([lat, lon], popup=key, icon=folium.Icon(color='blue'),).add_to(m)

m
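
The map above is centered on a fixed location, so the markers may fall outside the initial view. As a rough sketch, the bounding box and the mean center of the geocoded points can be computed from a dictionary shaped like ner_class. The helper names `bounding_box` and `centroid` are hypothetical, and the sample coordinates are placeholders, not real geocoding results.

```python
def bounding_box(points):
    """Return [[min_lat, min_lng], [max_lat, max_lng]], or None for an empty list."""
    if not points:
        return None
    lats = [p['lat'] for p in points]
    lngs = [p['lng'] for p in points]
    return [[min(lats), min(lngs)], [max(lats), max(lngs)]]


def centroid(points):
    """Return the mean [lat, lng] of the points, or None for an empty list."""
    if not points:
        return None
    n = len(points)
    return [sum(p['lat'] for p in points) / n, sum(p['lng'] for p in points) / n]


# placeholder data in the same shape as ner_class (made-up coordinates)
ner_class_sample = {
    'LOC': {'Place A': {'lat': 43.0, 'lng': -84.0}},
    'GPE': {'Place B': {'lat': 44.0, 'lng': -85.0},
            'Place C': {'lat': 45.0, 'lng': -83.0}},
}

# flatten all classes into one list of {'lat': ..., 'lng': ...} dicts
points = [loc for places in ner_class_sample.values() for loc in places.values()]
bounds = bounding_box(points)
center = centroid(points)
print(bounds)
print(center)
```

With folium, the center could be used as the location argument of folium.Map, and the bounds could be passed to the map's fit_bounds method so that the view covers all markers.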