The overall goal of my project is to analyze the Social Vulnerability Index (SVI) for U.S. counties using the 2022 SVI dataset. The SVI measures how vulnerable each community is to disasters, economic stress, and public health problems. By analyzing this dataset, I can find patterns in vulnerability across counties and identify locations that might need more resources and support. Understanding these patterns can help policymakers and emergency planners improve disaster readiness and community resilience.
The dataset I used for this project is the 2022 Social Vulnerability Index dataset at the county level. It was created by the CDC and contains vulnerability scores for counties across the U.S. Each county is given a score from 0 to 1, with higher values meaning greater vulnerability. The main variable I analyze is RPL_THEMES, which represents the overall social vulnerability score.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
svi = pd.read_csv("SVI_2022_US_county.csv")
svi.head()
svi.info()
svi.describe()
Checking for NA values and exploring important variables
svi.isnull().sum()
svi[['STATE','COUNTY','RPL_THEMES']].head()
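One caveat about the missing-value check: as far as I understand the CDC's SVI documentation, missing values are coded as -999 rather than left blank, so `isnull()` on its own may report zero missing values even when some exist. The sketch below uses a small made-up table (not the real data) to show how the sentinel could be counted and converted to NaN. Whether any county in the 2022 file actually has -999 in RPL_THEMES is something to verify, not something this example shows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the county table; the values are made up for illustration.
toy = pd.DataFrame({
    "COUNTY": ["A", "B", "C", "D"],
    "RPL_THEMES": [0.25, -999.0, 0.90, 0.55],
})

# isnull() finds nothing here because -999 is an ordinary number.
print(toy["RPL_THEMES"].isnull().sum())

# Count the sentinel explicitly, then convert it to NaN.
n_sentinel = int((toy["RPL_THEMES"] == -999).sum())
cleaned = toy.replace(-999, np.nan)
print(n_sentinel, cleaned["RPL_THEMES"].isnull().sum())
```

Applied to the real table, the same `replace(-999, np.nan)` call would keep sentinel values out of the histogram and the state averages.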
plt.figure(figsize=(10,5))
plt.hist(svi['RPL_THEMES'], bins=10, edgecolor='black')
plt.title("Distribution of Social Vulnerability Index Scores")
plt.xlabel("SVI Score")
plt.ylabel("Number of Counties")
plt.show()
Because the SVI variable is a percentile ranking, the values are distributed almost evenly from 0 to 1. When data follow a uniform distribution, each bin contains roughly the same number of observations.
Sorting the dataset lets me find the counties with the highest vulnerability scores.
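To illustrate why a percentile ranking produces a flat histogram, here is a minimal sketch with simulated data. It starts from a strongly skewed variable and converts it to percentile ranks with pandas `rank(pct=True)`. That call is only a stand-in for the idea of ranking; it is not necessarily the exact formula the CDC uses, and the simulated values have nothing to do with the real counties.

```python
import numpy as np
import pandas as pd

# Simulated, heavily right-skewed "raw" values (not real SVI inputs).
rng = np.random.default_rng(0)
raw = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=3000))

# Convert each value to its percentile rank, a number between 0 and 1.
pct_rank = raw.rank(pct=True)

# Compare bin counts before and after ranking.
raw_counts, _ = np.histogram(raw, bins=10)
rank_counts, _ = np.histogram(pct_rank, bins=10, range=(0, 1))
print("raw counts: ", raw_counts)
print("rank counts:", rank_counts)
```

The raw values pile up in the first bin, while the ranked values fill every bin almost equally. The flat histogram therefore says more about how the score is constructed than about how vulnerability itself is distributed.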
top_counties = svi.sort_values(by='RPL_THEMES', ascending=False)
top_counties[['STATE','COUNTY','RPL_THEMES']].head(10)
Since the top ten counties all have very similar scores close to 1, the x-axis is zoomed in to show the differences between them. The chart shows that Madison Parish in Louisiana has the highest vulnerability score, with the remaining counties listed below it in descending order.
top10 = top_counties.head(10)  # ten counties with the highest scores
plt.figure(figsize=(10,6))
plt.barh(top10['COUNTY'], top10['RPL_THEMES'], color='steelblue')
plt.title("Top 10 Counties with Highest Social Vulnerability")
plt.xlabel("SVI Score")
plt.xlim(0.995, 1.001)  # zoom in because all ten scores are close to 1
plt.gca().invert_yaxis()  # put the highest score at the top
plt.show()
This groups the counties by state and computes the average vulnerability score for each state.
state_avg = svi.groupby('STATE')['RPL_THEMES'].mean().sort_values(ascending=False)
state_avg.head(10)
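A state's average is easier to interpret alongside the number of counties behind it, since a state with only a few counties can reach a high average more easily than a state with many. The sketch below shows one way to report both, using a made-up table with hypothetical states "X" and "Y". The names `mean_svi` and `n_counties` are labels chosen for this example only.

```python
import pandas as pd

# Toy data: state X has four counties, state Y has two (values are made up).
toy = pd.DataFrame({
    "STATE": ["X", "X", "X", "X", "Y", "Y"],
    "RPL_THEMES": [0.2, 0.4, 0.6, 0.8, 0.9, 0.95],
})

# Report the mean score and the county count for each state together.
state_summary = (
    toy.groupby("STATE")["RPL_THEMES"]
    .agg(mean_svi="mean", n_counties="count")
    .sort_values("mean_svi", ascending=False)
)
print(state_summary)
```

Note that this is an unweighted average, so a small rural county counts as much as a large urban one. That matches what `groupby(...).mean()` does in the analysis.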
The visualization shows which states have the highest average vulnerability across their counties.
plt.figure(figsize=(10,6))
state_avg.head(10).plot(kind='bar')
plt.title("States with Highest Average Social Vulnerability")
plt.ylabel("Average SVI Score")
plt.show()
The first graph shows the distribution of SVI scores for counties in the U.S. The values range from 0 to 1, and the bars are almost the same height across the whole graph. This is because the SVI score is based on percentile rankings, which spread the counties evenly across the range of values.
The second graph shows the ten counties with the highest social vulnerability scores. Madison Parish in Louisiana has the highest score, followed by Luna County in New Mexico and Dimmit County in Texas. All of the top ten counties have scores very close to 1, with the lowest being above 0.997, which means they all rank among the most vulnerable counties in the country. These counties may face challenges such as poverty, limited healthcare, and weak infrastructure.
The last graph shows the states with the highest average social vulnerability scores. Arizona has the highest average score, followed by Mississippi and New Mexico. Other states in the top ten include Louisiana, South Carolina, Florida, and Texas. This suggests that vulnerability could be more common in certain areas of the country.
Overall, the visualizations and analysis show that while SVI scores are evenly spread across counties, certain counties and states stand out as the most vulnerable.
In conclusion, for this project I analyzed the Social Vulnerability Index (SVI) for counties in the U.S. using the 2022 dataset. The graphs I made helped show how vulnerability scores are distributed across counties and which counties have the highest scores. Madison Parish in Louisiana, Luna County in New Mexico, and Dimmit County in Texas had the highest vulnerability scores.
The state-level results showed that certain states, like Arizona, Mississippi, and New Mexico, have the highest average vulnerability levels. This suggests that vulnerability could be more common in certain regions of the country. In geography, nearby places tend to have similar characteristics, so counties that are close to one another may show similar vulnerability patterns. Looking at these spatial patterns could help identify regions where communities may need more support and resources.
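The idea that nearby places have similar scores can be measured with a spatial autocorrelation statistic such as Moran's I. It was not computed in this project. The sketch below only illustrates the statistic on six made-up locations arranged in a line, where each location's neighbors are the ones directly next to it. Applying it to real counties would need a county adjacency matrix, which the SVI CSV does not provide.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for a 1-D array of values and a square spatial weights matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()  # deviations from the mean
    w = np.asarray(weights, dtype=float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

# Six hypothetical locations in a row; each is a neighbor of the adjacent ones.
n = 6
w_line = np.zeros((n, n))
for i in range(n - 1):
    w_line[i, i + 1] = w_line[i + 1, i] = 1.0

clustered = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]    # similar values sit together
alternating = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9]  # neighbors are dissimilar

i_clustered = morans_i(clustered, w_line)
i_alternating = morans_i(alternating, w_line)
print(i_clustered, i_alternating)
```

A positive value means neighbors tend to have similar scores, and a negative value means they tend to differ. If county SVI scores cluster regionally, as the state averages suggest, a positive value would be expected, but that has not been tested here.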