Choropleth Map Classification Examples

In this notebook, you will learn how to create choropleth maps using different classification methods available in the mapclassify package of the Python Spatial Analysis Library (PySAL).

The examples in this notebook are adapted from the electronic book "Geographic Data Science with Python" by Sergio Rey, Dani Arribas-Bel, and Levi Wolf, which can be found at https://geographicdata.science/book/intro.html. Specifically, we will focus on the section titled "Choropleth Mapping" under "PART II - Spatial Data Analysis."

Our goal is to explore regional income data for the 32 Mexican states used in the paper by [RSastreGutierrez10]. The variable we will be analyzing is per capita gross domestic product for 1940 (PCGDP1940).

By the end of this notebook, you should have a better understanding of how to create choropleth maps using different classification methods and be able to apply this knowledge to your own spatial data analyses.

In [2]:
import seaborn
import pandas as pd
import geopandas
import pysal
import numpy
import matplotlib.pyplot as plt
In [3]:
mx = geopandas.read_file("data/mexico/mexicojoin.shp")
mx.head()
Out[3]:
POLY_ID AREA CODE NAME PERIMETER ACRES HECTARES PCGDP1940 PCGDP1950 PCGDP1960 ... GR9000 LPCGDP40 LPCGDP50 LPCGDP60 LPCGDP70 LPCGDP80 LPCGDP90 LPCGDP00 TEST geometry
0 1 7.252751e+10 MX02 Baja California Norte 2040312.385 1.792187e+07 7252751.376 22361.0 20977.0 17865.0 ... 0.05 4.35 4.32 4.25 4.40 4.47 4.43 4.48 1.0 MULTIPOLYGON (((-113.13972 29.01778, -113.2405...
1 2 7.225988e+10 MX03 Baja California Sur 2912880.772 1.785573e+07 7225987.769 9573.0 16013.0 16707.0 ... 0.00 3.98 4.20 4.22 4.39 4.46 4.41 4.42 2.0 MULTIPOLYGON (((-111.20612 25.80278, -111.2302...
2 3 2.731957e+10 MX18 Nayarit 1034770.341 6.750785e+06 2731956.859 4836.0 7515.0 7621.0 ... -0.05 3.68 3.88 3.88 4.04 4.13 4.11 4.06 3.0 MULTIPOLYGON (((-106.62108 21.56531, -106.6475...
3 4 7.961008e+10 MX14 Jalisco 2324727.436 1.967200e+07 7961008.285 5309.0 8232.0 9953.0 ... 0.03 3.73 3.92 4.00 4.21 4.32 4.30 4.33 4.0 POLYGON ((-101.52490 21.85664, -101.58830 21.7...
4 5 5.467030e+09 MX01 Aguascalientes 313895.530 1.350927e+06 546702.985 10384.0 6234.0 8714.0 ... 0.13 4.02 3.79 3.94 4.21 4.32 4.32 4.44 5.0 POLYGON ((-101.84620 22.01176, -101.96530 21.8...

5 rows × 35 columns

In [4]:
selected = "PCGDP1940"      #"%_manufacturing"
mx[["NAME", selected]]  
# Plot histogram
ax = seaborn.histplot(mx[selected], bins=20)
# Add rug on horizontal axis
seaborn.rugplot(mx[selected], height=0.05, color="red", ax=ax);
In [5]:
selected ="PCGDP1940"
mx[selected].describe()
Out[5]:
count       32.000000
mean      7230.531250
std       5204.952883
min       1892.000000
25%       3701.750000
50%       5256.000000
75%       8701.750000
max      22361.000000
Name: PCGDP1940, dtype: float64
In [6]:
import mapclassify

Choropleth Maps will be created by using the following methods

  • Equal Intervals (EqualInterval)
  • Quantiles (Quantiles)
  • Mean-Standard Deviation (StdMean)
  • Maximum Breaks (MaximumBreaks)
  • Head/Tail Breaks (HeadTailBreaks)
  • Optimal Method
  • The Jenks-Caspall Algorithm (JenksCaspall)
  • The Fisher-Jenks Algorithm (FisherJenks)
  • Max-p (MaxP)
In [7]:
ei5 = mapclassify.EqualInterval(mx[selected], k=5)
ei5
Out[7]:
EqualInterval               

      Interval         Count
----------------------------
[ 1892.00,  5985.80] |    17
( 5985.80, 10079.60] |     9
(10079.60, 14173.40] |     3
(14173.40, 18267.20] |     1
(18267.20, 22361.00] |     2
In [8]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="EqualInterval",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [9]:
q5 = mapclassify.Quantiles(mx[selected], k=5)
q5
Out[9]:
Quantiles                   

      Interval         Count
----------------------------
[ 1892.00,  3576.20] |     7
( 3576.20,  4582.80] |     6
( 4582.80,  6925.20] |     6
( 6925.20,  9473.00] |     6
( 9473.00, 22361.00] |     7
In [10]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="Quantiles",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [11]:
msd = mapclassify.StdMean(mx[selected])
msd
Out[11]:
StdMean                     

      Interval         Count
----------------------------
(    -inf, -3179.37] |     0
(-3179.37,  2025.58] |     1
( 2025.58, 12435.48] |    28
(12435.48, 17640.44] |     0
(17640.44, 22361.00] |     3
In [12]:
ax = mx.plot(
    figsize=(10,10),    
    column=selected,  # Data to plot
    scheme="StdMean",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [13]:
mb5 = mapclassify.MaximumBreaks(mx[selected], k=5)
mb5
Out[13]:
MaximumBreaks               

      Interval         Count
----------------------------
[ 1892.00,  5854.00] |    17
( 5854.00, 11574.00] |    11
(11574.00, 14974.00] |     1
(14974.00, 19890.50] |     1
(19890.50, 22361.00] |     2
In [14]:
ax = mx.plot(
    figsize=(10,10),    
    column=selected,  # Data to plot
    scheme="MaximumBreaks",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [15]:
ht = mapclassify.HeadTailBreaks(mx[selected])
ht
Out[15]:
HeadTailBreaks              

      Interval         Count
----------------------------
[ 1892.00,  7230.53] |    20
( 7230.53, 12244.42] |     9
(12244.42, 20714.00] |     1
(20714.00, 22163.00] |     1
(22163.00, 22361.00] |     1
In [16]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="HeadTailBreaks",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [17]:
numpy.random.seed(12345)
jc5 = mapclassify.JenksCaspall(mx[selected], k=5)
jc5
Out[17]:
JenksCaspall                

      Interval         Count
----------------------------
[ 1892.00,  2934.00] |     4
( 2934.00,  4414.00] |     9
( 4414.00,  6399.00] |     5
( 6399.00, 12132.00] |    11
(12132.00, 22361.00] |     3
In [18]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="JenksCaspall",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [19]:
numpy.random.seed(12345)
fj5 = mapclassify.FisherJenks(mx[selected], k=5)
fj5
Out[19]:
FisherJenks                 

      Interval         Count
----------------------------
[ 1892.00,  5309.00] |    17
( 5309.00,  9073.00] |     8
( 9073.00, 12132.00] |     4
(12132.00, 17816.00] |     1
(17816.00, 22361.00] |     2
In [20]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="FisherJenks",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();
In [21]:
mp5 = mapclassify.MaxP(mx[selected], k=5)
mp5
Out[21]:
MaxP                        

      Interval         Count
----------------------------
[ 1892.00,  3569.00] |     7
( 3569.00,  5309.00] |    10
( 5309.00,  7990.00] |     5
( 7990.00, 10384.00] |     5
(10384.00, 22361.00] |     5
In [22]:
ax = mx.plot(
    figsize=(10,10),
    column=selected,  # Data to plot
    scheme="MaxP",  # Classification scheme
    cmap="YlGn",  # Color palette
    legend=True,  # Add legend
)
ax.set_axis_off();

Computing the Sum of Absolute Deviations about Class Medians (ADCM)

In [23]:
# Bunch classifier objects
class5 = ei5, q5, msd, mb5, ht, jc5, fj5, mp5
# Collect ADCM for each classifier
fits = numpy.array([c.adcm for c in class5])
# Convert ADCM scores to a DataFrame
adcms = pd.DataFrame(fits)
# Add classifier names
adcms["classifier"] = [c.name for c in class5]
# Add column names to the ADCM
adcms.columns = ["ADCM", "Classifier"]
ax = seaborn.barplot(
    y="Classifier", x="ADCM", data=adcms, palette="Pastel1"
)