Project Assignment 3

Introduction

The overall goal of my project is to analyze the Social Vulnerability Index (SVI) for U.S. counties using the 2022 SVI dataset. The SVI measures how vulnerable each community is to disasters, economic stress, and public health problems. By analyzing this dataset, I can identify patterns in vulnerability across counties and find locations that might need more resources and support. Understanding these patterns can help policymakers and emergency planners improve disaster readiness and community resilience.

Dataset Description

The dataset I used for this project is the 2022 Social Vulnerability Index dataset at the county level. It was created by the CDC and contains vulnerability scores for counties across the U.S. (3,144 counties in this file). Each county is given a score from 0 to 1, with higher values indicating greater vulnerability. The main variable I analyze is RPL_THEMES, which represents the overall social vulnerability score.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
svi = pd.read_csv("SVI_2022_US_county.csv")
svi.head()
Out[2]:
ST STATE ST_ABBR STCNTY COUNTY FIPS LOCATION AREA_SQMI E_TOTPOP M_TOTPOP ... EP_ASIAN MP_ASIAN EP_AIAN MP_AIAN EP_NHPI MP_NHPI EP_TWOMORE MP_TWOMORE EP_OTHERRACE MP_OTHERRACE
0 1 Alabama AL 1001 Autauga County 1001 Autauga County, Alabama 594.454786 58761 0 ... 1.1 0.4 0.1 0.1 0.0 0.1 3.3 1.0 0.2 0.3
1 1 Alabama AL 1003 Baldwin County 1003 Baldwin County, Alabama 1589.861817 233420 0 ... 0.9 0.1 0.2 0.1 0.0 0.1 3.1 0.4 0.4 0.3
2 1 Alabama AL 1005 Barbour County 1005 Barbour County, Alabama 885.007619 24877 0 ... 0.5 0.1 0.3 0.1 0.0 0.1 1.8 0.7 1.2 0.8
3 1 Alabama AL 1007 Bibb County 1007 Bibb County, Alabama 622.469286 22251 0 ... 0.3 0.4 0.1 0.1 0.0 0.2 1.7 1.0 0.1 0.1
4 1 Alabama AL 1009 Blount County 1009 Blount County, Alabama 644.890376 59077 0 ... 0.2 0.2 0.1 0.1 0.2 0.2 2.8 0.7 0.1 0.1

5 rows × 158 columns
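The head() output shows that FIPS was read as an integer, so Autauga County appears as 1001 rather than the standard five-digit county code 01001 (two-digit state code plus three-digit county code). This does not affect the analysis in this notebook, but it matters if the data are later joined to county boundary files for spatial analysis. A minimal sketch of restoring the five-digit codes, using a small stand-in DataFrame that holds only FIPS values seen in the outputs above:

```python
import pandas as pd

# Stand-in for the SVI table: only the FIPS column, with codes taken from the outputs above
svi_demo = pd.DataFrame({"FIPS": [1001, 1003, 56045]})

# County FIPS codes have 5 digits (2-digit state + 3-digit county), so pad with leading zeros
svi_demo["FIPS5"] = svi_demo["FIPS"].astype(str).str.zfill(5)
print(svi_demo)
```

If the CSV itself stores the codes with leading zeros, passing dtype={"FIPS": str} to pd.read_csv would preserve them at load time instead.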

Inspecting the Dataset

In [3]:
svi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3144 entries, 0 to 3143
Columns: 158 entries, ST to MP_OTHERRACE
dtypes: float64(75), int64(79), object(4)
memory usage: 3.8+ MB
In [4]:
svi.describe()
Out[4]:
ST STCNTY FIPS AREA_SQMI E_TOTPOP M_TOTPOP E_HU M_HU E_HH M_HH ... EP_ASIAN MP_ASIAN EP_AIAN MP_AIAN EP_NHPI MP_NHPI EP_TWOMORE MP_TWOMORE EP_OTHERRACE MP_OTHERRACE
count 3144.000000 3144.000000 3144.000000 3144.000000 3.144000e+03 3144.000000 3.144000e+03 3144.000000 3.144000e+03 3144.000000 ... 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000 3144.000000
mean 30.264313 30368.187023 30368.187023 1123.812856 1.053109e+05 5.687023 4.482939e+04 66.367366 3.999248e+04 378.694975 ... 1.409606 0.387182 1.709192 0.413518 0.091985 0.356934 3.105821 0.874491 0.264695 0.423282
std 15.152668 15170.427484 15170.427484 3592.773606 3.337924e+05 29.924606 1.320087e+05 64.258691 1.214871e+05 367.587221 ... 2.889120 0.766913 7.489594 1.119786 0.472447 1.142309 1.968423 1.203668 0.384044 1.354489
min 1.000000 1001.000000 1001.000000 2.046442 5.000000e+01 0.000000 5.500000e+01 11.000000 3.200000e+01 20.000000 ... 0.000000 0.100000 0.000000 0.100000 0.000000 0.100000 0.000000 0.100000 0.000000 0.100000
25% 18.000000 18174.500000 18174.500000 430.968649 1.083575e+04 0.000000 5.238250e+03 30.000000 4.139750e+03 173.000000 ... 0.300000 0.100000 0.100000 0.100000 0.000000 0.100000 2.000000 0.400000 0.000000 0.100000
50% 29.000000 29174.000000 29174.000000 615.625786 2.578450e+04 0.000000 1.231600e+04 47.000000 9.944000e+03 276.000000 ... 0.600000 0.200000 0.200000 0.100000 0.000000 0.100000 2.800000 0.600000 0.200000 0.200000
75% 45.000000 45079.500000 45079.500000 924.245980 6.807975e+04 0.000000 3.153475e+04 77.000000 2.643475e+04 449.000000 ... 1.300000 0.400000 0.500000 0.400000 0.100000 0.300000 3.700000 1.000000 0.400000 0.400000
max 56.000000 56045.000000 56045.000000 145575.491748 9.936690e+06 328.000000 3.599561e+06 931.000000 3.363093e+06 4811.000000 ... 41.600000 20.500000 90.900000 37.500000 12.000000 37.500000 23.400000 37.500000 9.200000 41.900000

8 rows × 154 columns

Checking for NA Values and Exploring Important Variables

In [5]:
svi.isnull().sum()
Out[5]:
ST              0
STATE           0
ST_ABBR         0
STCNTY          0
COUNTY          0
               ..
MP_NHPI         0
EP_TWOMORE      0
MP_TWOMORE      0
EP_OTHERRACE    0
MP_OTHERRACE    0
Length: 158, dtype: int64
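isnull() reports no missing values, but the CDC's SVI documentation describes missing or unavailable estimates as coded -999 rather than left blank, so a zero count here does not by itself guarantee complete data. A minimal sketch of a check for that sentinel, assuming -999 is the only missing-value code, shown on made-up values rather than the real file:

```python
import numpy as np
import pandas as pd

MISSING_CODE = -999  # sentinel for unavailable values per the SVI documentation (assumption to verify)

def sentinel_counts(df: pd.DataFrame, code: float = MISSING_CODE) -> pd.Series:
    """Count cells equal to the sentinel in each numeric column; keep only columns that have any."""
    numeric = df.select_dtypes(include="number")
    counts = (numeric == code).sum()
    return counts[counts > 0]

# Made-up example: one county has an unavailable overall score
demo = pd.DataFrame({
    "COUNTY": ["A", "B", "C"],
    "RPL_THEMES": [0.25, -999.0, 0.90],
    "E_TOTPOP": [1000, 2000, 3000],
})
print(sentinel_counts(demo))

# Convert the sentinel to NaN so isnull(), mean(), and plots treat it as missing
clean = demo.replace(MISSING_CODE, np.nan)
print(clean["RPL_THEMES"].isnull().sum())
```

On the real data, sentinel_counts(svi) would list any affected columns; applying svi.replace(-999, np.nan) before the later cells would keep such values out of the histogram and state averages.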
In [6]:
svi[['STATE','COUNTY','RPL_THEMES']].head()
Out[6]:
STATE COUNTY RPL_THEMES
0 Alabama Autauga County 0.2663
1 Alabama Baldwin County 0.3487
2 Alabama Barbour County 0.9927
3 Alabama Bibb County 0.8451
4 Alabama Blount County 0.6166

Distribution of Social Vulnerability Scores

In [7]:
plt.figure(figsize=(10,5))
plt.hist(svi['RPL_THEMES'], bins=10, edgecolor='black')
plt.title("Distribution of Social Vulnerability Index Scores")
plt.xlabel("SVI Score")
plt.ylabel("Number of Counties")

plt.show()

Because RPL_THEMES is a percentile ranking, its values are distributed almost evenly from 0 to 1. When data follow a uniform distribution, equal-width bins each contain roughly the same number of observations; with 3,144 counties and 10 bins, that is about 314 counties per bin.
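The even spread follows from how percentile ranks are built: ranking any set of values, however skewed, and rescaling the ranks to the range 0 to 1 produces evenly spaced numbers. A minimal sketch, assuming a (rank - 1) / (N - 1) formula (the form described in the CDC's SVI documentation) and using made-up skewed data rather than the real file:

```python
import numpy as np
import pandas as pd

def percentile_rank(values: pd.Series) -> pd.Series:
    """Percentile rank in [0, 1] computed as (rank - 1) / (N - 1)."""
    ranks = values.rank(method="min")  # 1 = smallest value
    return (ranks - 1) / (len(values) - 1)

rng = np.random.default_rng(0)
# Made-up, strongly right-skewed raw scores for 3,144 hypothetical counties (not SVI data)
raw = pd.Series(rng.exponential(scale=1.0, size=3144))
pct = percentile_rank(raw)

# Same 10 equal-width bins as the histogram above
counts, edges = np.histogram(pct, bins=10, range=(0, 1))
print(counts)
```

Even though the raw values are heavily skewed, every bin of the ranked values holds about 314 observations, which matches the flat histogram of RPL_THEMES.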

Identifying Counties with the Highest Vulnerability and Visualization

This sorts the dataset so that I can find the counties with the highest vulnerability scores.

In [8]:
top_counties = svi.sort_values(by='RPL_THEMES', ascending=False)
top_counties[['STATE','COUNTY','RPL_THEMES']].head(10)
Out[8]:
STATE COUNTY RPL_THEMES
1147 Louisiana Madison Parish 1.0000
1813 New Mexico Luna County 0.9997
2588 Texas Dimmit County 0.9994
1134 Louisiana Evangeline Parish 0.9990
120 Arkansas Chicot County 0.9987
1945 North Carolina Lenoir County 0.9984
334 Florida DeSoto County 0.9981
111 Arizona Yuma County 0.9978
1484 Mississippi Yazoo County 0.9975
1478 Mississippi Washington County 0.9971
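Sorting the whole table and taking head(10) works; pandas also provides DataFrame.nlargest, which returns the top rows directly and, with keep="all", also keeps any rows tied with the last score. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; counties B and D tie at 0.90
demo = pd.DataFrame({
    "STATE": ["S1", "S1", "S2", "S2", "S3"],
    "COUNTY": ["A", "B", "C", "D", "E"],
    "RPL_THEMES": [0.99, 0.90, 0.95, 0.90, 0.10],
})

# Equivalent to sort_values("RPL_THEMES", ascending=False).head(3), apart from tie order
top3 = demo.nlargest(3, "RPL_THEMES")

# keep="all" returns every row tied with the last one kept, so this can exceed 3 rows
top3_with_ties = demo.nlargest(3, "RPL_THEMES", keep="all")

print(top3[["STATE", "COUNTY", "RPL_THEMES"]])
print(top3_with_ties[["STATE", "COUNTY", "RPL_THEMES"]])
```

On the real data, svi.nlargest(10, "RPL_THEMES") should return the same ten counties as the sort above unless another county ties with the tenth score.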

Because the top ten counties all have scores very close to 1, the x-axis is zoomed in to make the differences between them visible. The chart shows that Madison Parish in Louisiana has the highest vulnerability score, with the remaining counties listed below it in descending order.

In [13]:
plt.figure(figsize=(10,6))
plt.barh(top_counties.head(10)['COUNTY'], top_counties.head(10)['RPL_THEMES'], color = 'steelblue')
plt.title("Top 10 Counties with Highest Social Vulnerability")
plt.xlabel("SVI Score")
plt.xlim(0.995, 1.001)
plt.gca().invert_yaxis()

plt.show()
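Two small refinements could make this chart more robust. County names such as Washington County occur in many states, so adding the state abbreviation to each bar label avoids ambiguity. Computing the x-axis limits from the data also avoids hiding a bar if a score ever falls below the hard-coded lower limit of 0.995. A minimal sketch of both helpers; the 0.001 padding is an arbitrary choice, and the example rows are copied from the top-ten table:

```python
import pandas as pd

def county_labels(df: pd.DataFrame) -> list[str]:
    """Combine county name and state abbreviation, e.g. 'Madison Parish, LA'."""
    return (df["COUNTY"] + ", " + df["ST_ABBR"]).tolist()

def zoomed_xlim(scores: pd.Series, pad: float = 0.001) -> tuple[float, float]:
    """Axis limits that bracket all scores with a small margin on each side."""
    return scores.min() - pad, scores.max() + pad

# Two rows taken from the top-ten table above
demo = pd.DataFrame({
    "COUNTY": ["Madison Parish", "Washington County"],
    "ST_ABBR": ["LA", "MS"],
    "RPL_THEMES": [1.0000, 0.9971],
})
labels = county_labels(demo)
limits = zoomed_xlim(demo["RPL_THEMES"])
print(labels, limits)
```

In the plotting cell, with top10 = top_counties.head(10), these would be used as plt.barh(county_labels(top10), top10["RPL_THEMES"]) and plt.xlim(*zoomed_xlim(top10["RPL_THEMES"])).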

State-Level Vulnerability Patterns and Visualization

This groups the counties by state and computes the average vulnerability score for each state.

In [11]:
state_avg = svi.groupby('STATE')['RPL_THEMES'].mean().sort_values(ascending=False)
state_avg.head(10)
Out[11]:
STATE
Arizona           0.829213
Mississippi       0.806721
New Mexico        0.794094
Louisiana         0.772586
South Carolina    0.763543
Florida           0.708306
Texas             0.706484
Georgia           0.706448
California        0.690778
Arkansas          0.690255
Name: RPL_THEMES, dtype: float64
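The state averages above are simple means, so a sparsely populated county counts as much as a large metropolitan one, and a state with few counties can move up or down the list because of one or two values. A minimal sketch of reporting the county count and a population-weighted mean (weighting by E_TOTPOP) next to the simple mean, shown on made-up numbers; whether weighting is appropriate depends on the question being asked:

```python
import pandas as pd

def state_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-state county count, simple mean, and population-weighted mean of RPL_THEMES."""
    tmp = df.assign(_weighted=df["RPL_THEMES"] * df["E_TOTPOP"])
    grouped = tmp.groupby("STATE")
    summary = pd.DataFrame({
        "n_counties": grouped.size(),
        "mean_svi": grouped["RPL_THEMES"].mean(),
        # sum(score * population) / sum(population) within each state
        "pop_weighted_svi": grouped["_weighted"].sum() / grouped["E_TOTPOP"].sum(),
    })
    return summary.sort_values("mean_svi", ascending=False)

# Made-up example: in state X the small county is highly vulnerable, the large one is not
demo = pd.DataFrame({
    "STATE": ["X", "X", "Y"],
    "RPL_THEMES": [0.9, 0.1, 0.6],
    "E_TOTPOP": [1_000, 9_000, 5_000],
})
summary = state_summary(demo)
print(summary)
```

Here state X has a simple mean of 0.5 but a population-weighted mean of 0.18. Running state_summary(svi).head(10) would show whether the same states lead under both measures.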

The bar chart shows which states have the highest average county-level vulnerability scores.

In [12]:
plt.figure(figsize=(10,6))
state_avg.head(10).plot(kind='bar')
plt.title("States with Highest Average Social Vulnerability")
plt.ylabel("Average SVI Score")

plt.show()

Results

The first graph that I made shows the distribution of SVI scores across U.S. counties. The values range from 0 to 1, and the bars are almost the same height across the whole graph. This is because the SVI score is based on percentile rankings, which spread the counties evenly across the range of values.

The second graph that I made shows the ten counties with the highest social vulnerability scores. Madison Parish in Louisiana has the highest score, followed by Luna County in New Mexico and Dimmit County in Texas. All ten counties have scores very close to 1, with the lowest still above 0.997, which means they rank in roughly the top 0.3% of the 3,144 counties in the dataset. These counties may face challenges such as poverty, limited access to health care, and weak infrastructure.
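The overall score alone does not show which challenges apply to a given county. The CDC builds RPL_THEMES from four theme-level percentile rankings, which the 2022 data label as socioeconomic status, household characteristics, racial and ethnic minority status, and housing type and transportation. A minimal sketch, assuming the file stores these as RPL_THEME1 through RPL_THEME4, that reports each county's highest-ranked theme; the two counties and their values are made up:

```python
import pandas as pd

# Theme columns and labels as described in the CDC SVI 2022 documentation (assumed present in the file)
THEMES = {
    "RPL_THEME1": "Socioeconomic status",
    "RPL_THEME2": "Household characteristics",
    "RPL_THEME3": "Racial and ethnic minority status",
    "RPL_THEME4": "Housing type and transportation",
}

def leading_theme(df: pd.DataFrame) -> pd.Series:
    """Label of the theme with the highest percentile ranking in each row."""
    return df[list(THEMES)].idxmax(axis=1).map(THEMES)

# Made-up theme scores for two hypothetical counties
demo = pd.DataFrame({
    "COUNTY": ["County A", "County B"],
    "RPL_THEME1": [0.99, 0.70],
    "RPL_THEME2": [0.80, 0.65],
    "RPL_THEME3": [0.60, 0.98],
    "RPL_THEME4": [0.95, 0.90],
})
demo["LEADING_THEME"] = leading_theme(demo)
print(demo[["COUNTY", "LEADING_THEME"]])
```

Applied to top_counties.head(10), this would show whether a single theme, such as socioeconomic status, drives most of the top ten scores or whether the drivers differ from county to county.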

The last graph that I made shows the states with the highest average social vulnerability scores. Arizona has the highest average score, followed by Mississippi and New Mexico. Other states in the top ten include Louisiana, South Carolina, Florida, and Texas. This suggests that high vulnerability may be concentrated in certain regions of the country, particularly the South and Southwest.

Overall, the visualizations and analysis show that while SVI scores are spread evenly across counties, certain counties and states stand out among the most vulnerable.
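One way to connect the county and state results is to check how many states containing a top-ten county also appear among the ten highest state averages. A minimal sketch; the two lists are copied from the top_counties and state_avg tables above:

```python
# States of the ten most vulnerable counties (from the top_counties table)
top_county_states = [
    "Louisiana", "New Mexico", "Texas", "Louisiana", "Arkansas",
    "North Carolina", "Florida", "Arizona", "Mississippi", "Mississippi",
]
# Ten states with the highest average county score (from state_avg.head(10))
top_avg_states = [
    "Arizona", "Mississippi", "New Mexico", "Louisiana", "South Carolina",
    "Florida", "Texas", "Georgia", "California", "Arkansas",
]

shared = sorted(set(top_county_states) & set(top_avg_states))
only_counties = sorted(set(top_county_states) - set(top_avg_states))
print("In both lists:", shared)
print("Has a top-ten county but not a top-ten average:", only_counties)
```

With these lists, seven of the eight states that contain a top-ten county also rank in the top ten by average, and North Carolina is the only exception. In the notebook the same comparison could be computed directly as set(top_counties.head(10)["STATE"]) & set(state_avg.head(10).index).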

Conclusion

In conclusion, for this project I analyzed the Social Vulnerability Index (SVI) for U.S. counties using the 2022 dataset. The graphs I made helped show how vulnerability scores are distributed across counties and which counties have the highest scores. Counties like Madison Parish in Louisiana, Luna County in New Mexico, and Dimmit County in Texas had the highest vulnerability scores.

The state-level results showed that certain states, like Arizona, Mississippi, and New Mexico, have the highest average vulnerability scores. This suggests that vulnerability may be more common in certain regions of the country. In geography, places that are close to one another tend to have similar characteristics, so neighboring counties may show similar vulnerability patterns. Examining these spatial patterns could help identify regions where communities may need more support and resources.
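One common way to test whether nearby counties have similar scores is global Moran's I, which compares each location's deviation from the mean with its neighbors' deviations. Positive values mean similar values cluster together, negative values mean neighbors tend to differ, and values near -1/(N - 1) suggest no spatial pattern. This dataset has no neighbor information, so the sketch below assumes a hypothetical adjacency matrix (in practice it would be built from county boundary files) and uses made-up scores:

```python
import numpy as np

def morans_i(values: np.ndarray, weights: np.ndarray) -> float:
    """Global Moran's I for N values and an N x N spatial weights matrix."""
    z = values - values.mean()        # deviations from the mean
    n = len(values)
    total_weight = weights.sum()      # W, the sum of all weights
    numerator = z @ weights @ z       # sum over i, j of w_ij * z_i * z_j
    denominator = (z ** 2).sum()
    return (n / total_weight) * numerator / denominator

# Hypothetical example: four counties in a row, each adjacent only to its immediate neighbors
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

gradient = np.array([0.2, 0.4, 0.6, 0.8])     # scores rise smoothly across space
alternating = np.array([0.2, 0.8, 0.2, 0.8])  # neighbors always differ

print(morans_i(gradient, adjacency))     # positive: similar values sit next to each other
print(morans_i(alternating, adjacency))  # negative: neighbors have opposite values
```

For a real analysis, the PySAL esda package provides a Moran class that also reports permutation-based significance, which this sketch does not attempt.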
