Q1. Write a brief description of the project you would like to pursue on CyberGISX. Specify the data sources you would be using for the project. Describe the analysis/modeling/simulation scenarios you hope to achieve this semester using CyberGISX.

I am interested in comparing which states have underserved pediatric patients compared to others. The following links contain the pediatric census data andP general pediatricians employed within their respective states. Pediatric data: https://data.census.gov/table?q=Children%20by%20state Pediatrician data: https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas

I will have to use skills learned in previous classes to merge information and calculate statistics but i will then use follium to generate maps by state that are interactive.

Q2. Using techniques, we have learned in Module 3 please develop a visualization of any of the components of the datasets you have chosen.

In [1]:
# Import the pandas library for data manipulation
import pandas as pd

# Import the numpy library for numerical operations (not used in this specific code block)
import numpy as np

# Import the folium library for creating web maps
import folium

# Setting the data path (location of your data files)
data_path = "/home/jovyan/shared_data/data/geog407/assignment3/"

# Identify the GeoJSON file containing US state boundaries
state_geo = f"{data_path}/us-states.json"

# Read the pediatric data from a CSV file using pandas
population_data = pd.read_csv('Pediatric_data.csv')#, dtype={'column1': int, 'column2': float})

# Read the doctor data from a CSV file using pandas
#   - skiprows=5: Skips the first 5 rows of the CSV (assuming they are headers)
#   - usecols=[0,1]: Selects only the first two columns (adjust if needed)
doctor_data = pd.read_csv('Physician_Report.csv', skiprows=5, usecols=[0, 1])

# Display the doctor data (optional)
print(doctor_data)  # This line will print the contents of the doctor_data DataFrame
                                            Area Name Employment(1)
0                                    Alabama(0100000)           450
1                                     Alaska(0200000)            90
2                                    Arizona(0400000)          1160
3                                   Arkansas(0500000)           240
4                                 California(0600000)          4350
5                                   Colorado(0800000)           420
6                                Connecticut(0900000)           410
7                                   Delaware(1000000)           270
8                                    Florida(1200000)          1400
9                                    Georgia(1300000)             -
10                                    Hawaii(1500000)           480
11                                  Illinois(1700000)          1790
12                                   Indiana(1800000)           740
13                                      Iowa(1900000)           220
14                                    Kansas(2000000)             -
15                                  Kentucky(2100000)           280
16                                 Louisiana(2200000)           170
17                                     Maine(2300000)           130
18                                  Maryland(2400000)           750
19                             Massachusetts(2500000)           950
20                                  Michigan(2600000)           970
21                                 Minnesota(2700000)           740
22                               Mississippi(2800000)            70
23                                  Missouri(2900000)           170
24                                   Montana(3000000)            40
25                                  Nebraska(3100000)           210
26                             New Hampshire(3300000)           170
27                                New Jersey(3400000)           860
28                                New Mexico(3500000)           180
29                                  New York(3600000)          4560
30                            North Carolina(3700000)          1020
31                                      Ohio(3900000)          1360
32                                  Oklahoma(4000000)             -
33                                    Oregon(4100000)           680
34                              Pennsylvania(4200000)          1760
35                               Puerto Rico(7200000)             -
36                              Rhode Island(4400000)           300
37                            South Carolina(4500000)           250
38                              South Dakota(4600000)            50
39                                 Tennessee(4700000)           690
40                                     Texas(4800000)          2530
41                                      Utah(4900000)           200
42                                   Vermont(5000000)           120
43                                  Virginia(5100000)           890
44                                Washington(5300000)           550
45                             West Virginia(5400000)           110
46                                 Wisconsin(5500000)           510
47                                   Wyoming(5600000)            70
48                                                NaN           NaN
49  (1)Estimates for detailed occupations do not s...           NaN
50                          (8)Estimate not released.           NaN
51                                                NaN           NaN
52  SOC code: Standard Occupational Classification...           NaN
53                    Date extracted on :Jun 23, 2024           NaN
In [2]:
doctor_data = doctor_data.head(len(doctor_data)-6)
In [3]:
# Extract the state name from the 'Area Name' column
doctor_data['state'] = doctor_data['Area Name'].str.split("(").str.get(0)

# Print the column names after adding the 'state' column (optional)
print(doctor_data.columns)

# Rename the 'Employment(1)' column for clarity
doctor_data['number of doctors'] = doctor_data['Employment(1)']

# Select only the 'state' and 'number of doctors' columns
doctor_data = doctor_data[['state', 'number of doctors']]
Index(['Area Name', 'Employment(1)', 'state'], dtype='object')
In [4]:
# Print the DataFrame dimensions (number of rows and columns)
print(doctor_data.shape)

# Extract unique state names from the 'state' column
doctor_state_names = sorted(list(set(doctor_data['state'])))

# Print the number of unique state names
print(len(doctor_state_names))
(48, 2)
48
In [5]:
# Extract column names (excluding the first column) as strings
column_names_string = [str(column) for column in population_data.columns[1:]]

# Extract state names from column names (assuming they're separated by "!!")
ped_data_state_names = [column.split("!!")[0] for column in column_names_string]

# Find unique state names (remove duplicates) and sort them alphabetically (optional)
unique_state_names = sorted(list(set(ped_data_state_names)))

# Calculate the number of unique state names
num_unique_state_names = len(unique_state_names)

# Calculate the number of data columns (excluding the first column)
num_columns = len(population_data.columns[1:])

# Calculate the average number of columns per state
average_columns_per_state = num_columns / num_unique_state_names

# Print the average number of columns per state
print(average_columns_per_state)
12.0
In [6]:
# Define the keep list containing indices to include (second and every 12th column)
keep_list = [i for i in range(1, len(population_data.columns), 12)]
#add_to_list = [str(population_data.columns[0])]
# Create a new sorted list containing elements from the keep list
#wanted_columns =add_to_list + sorted([population_data.columns[i] for i in keep_list])
wanted_columns = sorted([population_data.columns[i] for i in keep_list])
# row 23 is the <18 yo people in the respective state/ teritory
scrubbed_population_ds = population_data[wanted_columns].iloc[23,:]
scrubbed_population_ds = scrubbed_population_ds.rename("pediatric population estimate")
scrubbed_population_ds.index = scrubbed_population_ds.index.str.split("!!").str.get(0)
scrubbed_population_df = scrubbed_population_ds.to_frame(name=scrubbed_population_ds.index.name)
new_column_names = {"index": "state", scrubbed_population_df.columns[0]: "pediatric population estimate"}
scrubbed_population_df = scrubbed_population_df.rename(columns=new_column_names)
In [7]:
missing_states = [state for state in unique_state_names if state not in doctor_state_names]
missing_states #states that are missing and will need to be removed from
Out[7]:
['District of Columbia', 'Idaho', 'Nevada', 'North Dakota']
In [8]:
joined_data = pd.merge(scrubbed_population_df, doctor_data, left_index=True, right_on='state', how='inner')

joined_data = joined_data[['state','number of doctors','pediatric population estimate']]
joined_data = pd.merge(scrubbed_population_df, doctor_data, left_index=True, right_on='state', how='left')
joined_data['state'].fillna('Unknown', inplace=True)  # Replace NaNs with 'Unknown'
joined_data = joined_data.reset_index()  # Modifies the DataFrame itself
joined_data = joined_data[['state', 'pediatric population estimate', 'number of doctors']]
print(joined_data)
                   state pediatric population estimate number of doctors
0                Alabama                     1,111,028               450
1                 Alaska                       175,745                90
2                Arizona                     1,587,603              1160
3               Arkansas                       695,832               240
4             California                     8,499,006              4350
5               Colorado                     1,211,095               420
6            Connecticut                       730,424               410
7               Delaware                       208,227               270
8   District of Columbia                       124,244               NaN
9                Florida                     4,294,894              1400
10               Georgia                     2,505,668                 -
11                Hawaii                       296,866               480
12                 Idaho                       464,620               NaN
13              Illinois                     2,716,206              1790
14               Indiana                     1,569,374               740
15                  Iowa                       716,748               220
16                Kansas                       688,212                 -
17              Kentucky                     1,004,880               280
18             Louisiana                     1,058,860               170
19                 Maine                       245,491               130
20              Maryland                     1,346,731               750
21         Massachusetts                     1,335,988               950
22              Michigan                     2,106,228               970
23             Minnesota                     1,288,213               740
24           Mississippi                       676,085                70
25              Missouri                     1,362,791               170
26               Montana                       234,182                40
27              Nebraska                       474,697               210
28                Nevada                       689,653               NaN
29         New Hampshire                       251,693               170
30            New Jersey                     1,990,177               860
31            New Mexico                       455,475               180
32              New York                     3,981,238              4560
33        North Carolina                     2,292,928              1020
34          North Dakota                       180,965               NaN
35                  Ohio                     2,558,156              1360
36              Oklahoma                       953,422                 -
37                Oregon                       832,367               680
38          Pennsylvania                     2,622,158              1760
39           Puerto Rico                       518,336                 -
40          Rhode Island                       203,146               300
41        South Carolina                     1,117,142               250
42          South Dakota                       218,981                50
43             Tennessee                     1,535,133               690
44                 Texas                     7,439,663              2530
45                  Utah                       930,593               200
46               Vermont                       113,734               120
47              Virginia                     1,862,087               890
48            Washington                     1,644,027               550
49         West Virginia                       351,543               110
50             Wisconsin                     1,241,780               510
51               Wyoming                       129,603                70
In [9]:
#cleaned_docs = joined_data.dropna(subset=['number of doctors'])
num_docs_list = joined_data['number of doctors'].to_list()
new_num_docs_list = pd.to_numeric(num_docs_list, errors='coerce')
new_num_docs_list
Out[9]:
array([ 450.,   90., 1160.,  240., 4350.,  420.,  410.,  270.,   nan,
       1400.,   nan,  480.,   nan, 1790.,  740.,  220.,   nan,  280.,
        170.,  130.,  750.,  950.,  970.,  740.,   70.,  170.,   40.,
        210.,   nan,  170.,  860.,  180., 4560., 1020.,   nan, 1360.,
         nan,  680., 1760.,   nan,  300.,  250.,   50.,  690., 2530.,
        200.,  120.,  890.,  550.,  110.,  510.,   70.])
In [10]:
#cleaned_patients = joined_data.dropna(subset=['number of doctors'])
num_patients_list = joined_data['pediatric population estimate'].to_list()
#new_num_docs_list = pd.to_numeric(num_docs_list, errors='coerce')
num_patients_list
clean_num_pat_list = []
for number in num_patients_list:
    solid_number = number.replace(",", "")
    clean_num_pat_list.append(int(solid_number))
clean_num_pat_list   
Out[10]:
[1111028,
 175745,
 1587603,
 695832,
 8499006,
 1211095,
 730424,
 208227,
 124244,
 4294894,
 2505668,
 296866,
 464620,
 2716206,
 1569374,
 716748,
 688212,
 1004880,
 1058860,
 245491,
 1346731,
 1335988,
 2106228,
 1288213,
 676085,
 1362791,
 234182,
 474697,
 689653,
 251693,
 1990177,
 455475,
 3981238,
 2292928,
 180965,
 2558156,
 953422,
 832367,
 2622158,
 518336,
 203146,
 1117142,
 218981,
 1535133,
 7439663,
 930593,
 113734,
 1862087,
 1644027,
 351543,
 1241780,
 129603]
In [11]:
state_names = joined_data['state']
cleaned_data = pd.DataFrame({
  "State": state_names,
  "Number of Doctors": new_num_docs_list,
  "Patient Estimate": clean_num_pat_list
})
cleaned_data['Doc to Patient Ratio'] = cleaned_data['Number of Doctors']/cleaned_data['Patient Estimate']
cleaned_data['Patient to doc Ratio'] = cleaned_data['Patient Estimate']/cleaned_data['Number of Doctors']
cleaned_data
Out[11]:
State Number of Doctors Patient Estimate Doc to Patient Ratio Patient to doc Ratio
0 Alabama 450.0 1111028 0.000405 2468.951111
1 Alaska 90.0 175745 0.000512 1952.722222
2 Arizona 1160.0 1587603 0.000731 1368.623276
3 Arkansas 240.0 695832 0.000345 2899.300000
4 California 4350.0 8499006 0.000512 1953.794483
5 Colorado 420.0 1211095 0.000347 2883.559524
6 Connecticut 410.0 730424 0.000561 1781.521951
7 Delaware 270.0 208227 0.001297 771.211111
8 District of Columbia NaN 124244 NaN NaN
9 Florida 1400.0 4294894 0.000326 3067.781429
10 Georgia NaN 2505668 NaN NaN
11 Hawaii 480.0 296866 0.001617 618.470833
12 Idaho NaN 464620 NaN NaN
13 Illinois 1790.0 2716206 0.000659 1517.433520
14 Indiana 740.0 1569374 0.000472 2120.775676
15 Iowa 220.0 716748 0.000307 3257.945455
16 Kansas NaN 688212 NaN NaN
17 Kentucky 280.0 1004880 0.000279 3588.857143
18 Louisiana 170.0 1058860 0.000161 6228.588235
19 Maine 130.0 245491 0.000530 1888.392308
20 Maryland 750.0 1346731 0.000557 1795.641333
21 Massachusetts 950.0 1335988 0.000711 1406.303158
22 Michigan 970.0 2106228 0.000461 2171.369072
23 Minnesota 740.0 1288213 0.000574 1740.828378
24 Mississippi 70.0 676085 0.000104 9658.357143
25 Missouri 170.0 1362791 0.000125 8016.417647
26 Montana 40.0 234182 0.000171 5854.550000
27 Nebraska 210.0 474697 0.000442 2260.461905
28 Nevada NaN 689653 NaN NaN
29 New Hampshire 170.0 251693 0.000675 1480.547059
30 New Jersey 860.0 1990177 0.000432 2314.159302
31 New Mexico 180.0 455475 0.000395 2530.416667
32 New York 4560.0 3981238 0.001145 873.078509
33 North Carolina 1020.0 2292928 0.000445 2247.968627
34 North Dakota NaN 180965 NaN NaN
35 Ohio 1360.0 2558156 0.000532 1880.997059
36 Oklahoma NaN 953422 NaN NaN
37 Oregon 680.0 832367 0.000817 1224.069118
38 Pennsylvania 1760.0 2622158 0.000671 1489.862500
39 Puerto Rico NaN 518336 NaN NaN
40 Rhode Island 300.0 203146 0.001477 677.153333
41 South Carolina 250.0 1117142 0.000224 4468.568000
42 South Dakota 50.0 218981 0.000228 4379.620000
43 Tennessee 690.0 1535133 0.000449 2224.830435
44 Texas 2530.0 7439663 0.000340 2940.578261
45 Utah 200.0 930593 0.000215 4652.965000
46 Vermont 120.0 113734 0.001055 947.783333
47 Virginia 890.0 1862087 0.000478 2092.232584
48 Washington 550.0 1644027 0.000335 2989.140000
49 West Virginia 110.0 351543 0.000313 3195.845455
50 Wisconsin 510.0 1241780 0.000411 2434.862745
51 Wyoming 70.0 129603 0.000540 1851.471429
In [12]:
#Setting the data path
data_path="/home/jovyan/shared_data/data/geog407/lab3/"

#Identifying the spatial data to be used
state_geo = f"{data_path}/us-states.json"
#Create a Folium Map Object identifying the central latitude, longitude and starting Zoom
m = folium.Map(location=[48, -102], zoom_start=3)



# Call the Choropleth function
# state_geo variable is identified as geographic data 
# UnemploymentRateJuly2021 variable is identified as tabular data
# Columns to be used in data and what to key on are identified
folium.Choropleth(
    geo_data=state_geo, #setting spatial data
    name="Patient to Doctor Ratio",
    data=cleaned_data, #setting tabular data
    columns=["State", "Patient to doc Ratio"],
    key_on="feature.properties.name",
    fill_color="YlGnBu", #setting fill color
    fill_opacity=0.7, # setting opacity
    line_opacity=0.2,
    legend_name="Patient to Doctor Ratio", #setting legend
).add_to(m)

#Adds Layer control
folium.LayerControl().add_to(m)

# Finally show the map
m
Out[12]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [ ]: