Q1. Write a brief description of the project you would like to pursue on CyberGISX. Specify the data sources you would be using for the project. Describe the analysis/modeling/simulation scenarios you hope to achieve this semester using CyberGISX.

I am interested in comparing which states have underserved pediatric patients compared to others. The following links contain the pediatric census data andP general pediatricians employed within their respective states. Pediatric data: https://data.census.gov/table?q=Children%20by%20state Pediatrician data: https://data.bls.gov/oes/#/occGeo/One%20occupation%20for%20multiple%20geographical%20areas

I will have to use skills learned in previous classes to merge information and calculate statistics but i will then use follium to generate maps by state that are interactive.

Q2. Using techniques, we have learned in Module 3 please develop a visualization of any of the components of the datasets you have chosen.

# Import the pandas library for data manipulation
import pandas as pd

# Import the numpy library for numerical operations (not used in this specific code block)
import numpy as np

# Import the folium library for creating web maps
import folium

# Setting the data path (location of your data files)
data_path = "/home/jovyan/shared_data/data/geog407/assignment3/"

# Identify the GeoJSON file containing US state boundaries
state_geo = f"{data_path}/us-states.json"

# Read the pediatric data from a CSV file using pandas
population_data = pd.read_csv('Pediatric_data.csv')#, dtype={'column1': int, 'column2': float})

# Read the doctor data from a CSV file using pandas
#   - skiprows=5: Skips the first 5 rows of the CSV (assuming they are headers)
#   - usecols=[0,1]: Selects only the first two columns (adjust if needed)
doctor_data = pd.read_csv('Physician_Report.csv', skiprows=5, usecols=[0, 1])

# Display the doctor data (optional)
print(doctor_data)  # This line will print the contents of the doctor_data DataFrame

                                            Area Name Employment(1)
0                                    Alabama(0100000)           450
1                                     Alaska(0200000)            90
2                                    Arizona(0400000)          1160
3                                   Arkansas(0500000)           240
4                                 California(0600000)          4350
5                                   Colorado(0800000)           420
6                                Connecticut(0900000)           410
7                                   Delaware(1000000)           270
8                                    Florida(1200000)          1400
9                                    Georgia(1300000)             -
10                                    Hawaii(1500000)           480
11                                  Illinois(1700000)          1790
12                                   Indiana(1800000)           740
13                                      Iowa(1900000)           220
14                                    Kansas(2000000)             -
15                                  Kentucky(2100000)           280
16                                 Louisiana(2200000)           170
17                                     Maine(2300000)           130
18                                  Maryland(2400000)           750
19                             Massachusetts(2500000)           950
20                                  Michigan(2600000)           970
21                                 Minnesota(2700000)           740
22                               Mississippi(2800000)            70
23                                  Missouri(2900000)           170
24                                   Montana(3000000)            40
25                                  Nebraska(3100000)           210
26                             New Hampshire(3300000)           170
27                                New Jersey(3400000)           860
28                                New Mexico(3500000)           180
29                                  New York(3600000)          4560
30                            North Carolina(3700000)          1020
31                                      Ohio(3900000)          1360
32                                  Oklahoma(4000000)             -
33                                    Oregon(4100000)           680
34                              Pennsylvania(4200000)          1760
35                               Puerto Rico(7200000)             -
36                              Rhode Island(4400000)           300
37                            South Carolina(4500000)           250
38                              South Dakota(4600000)            50
39                                 Tennessee(4700000)           690
40                                     Texas(4800000)          2530
41                                      Utah(4900000)           200
42                                   Vermont(5000000)           120
43                                  Virginia(5100000)           890
44                                Washington(5300000)           550
45                             West Virginia(5400000)           110
46                                 Wisconsin(5500000)           510
47                                   Wyoming(5600000)            70
48                                                NaN           NaN
49  (1)Estimates for detailed occupations do not s...           NaN
50                          (8)Estimate not released.           NaN
51                                                NaN           NaN
52  SOC code: Standard Occupational Classification...           NaN
53                    Date extracted on :Jun 23, 2024           NaN

doctor_data = doctor_data.head(len(doctor_data)-6)

# Extract the state name from the 'Area Name' column
doctor_data['state'] = doctor_data['Area Name'].str.split("(").str.get(0)

# Print the column names after adding the 'state' column (optional)
print(doctor_data.columns)

# Rename the 'Employment(1)' column for clarity
doctor_data['number of doctors'] = doctor_data['Employment(1)']

# Select only the 'state' and 'number of doctors' columns
doctor_data = doctor_data[['state', 'number of doctors']]

Index(['Area Name', 'Employment(1)', 'state'], dtype='object')

# Print the DataFrame dimensions (number of rows and columns)
print(doctor_data.shape)

# Extract unique state names from the 'state' column
doctor_state_names = sorted(list(set(doctor_data['state'])))

# Print the number of unique state names
print(len(doctor_state_names))

(48, 2)
48

# Extract column names (excluding the first column) as strings
column_names_string = [str(column) for column in population_data.columns[1:]]

# Extract state names from column names (assuming they're separated by "!!")
ped_data_state_names = [column.split("!!")[0] for column in column_names_string]

# Find unique state names (remove duplicates) and sort them alphabetically (optional)
unique_state_names = sorted(list(set(ped_data_state_names)))

# Calculate the number of unique state names
num_unique_state_names = len(unique_state_names)

# Calculate the number of data columns (excluding the first column)
num_columns = len(population_data.columns[1:])

# Calculate the average number of columns per state
average_columns_per_state = num_columns / num_unique_state_names

# Print the average number of columns per state
print(average_columns_per_state)

12.0

# Define the keep list containing indices to include (second and every 12th column)
keep_list = [i for i in range(1, len(population_data.columns), 12)]
#add_to_list = [str(population_data.columns[0])]
# Create a new sorted list containing elements from the keep list
#wanted_columns =add_to_list + sorted([population_data.columns[i] for i in keep_list])
wanted_columns = sorted([population_data.columns[i] for i in keep_list])
# row 23 is the <18 yo people in the respective state/ teritory
scrubbed_population_ds = population_data[wanted_columns].iloc[23,:]
scrubbed_population_ds = scrubbed_population_ds.rename("pediatric population estimate")
scrubbed_population_ds.index = scrubbed_population_ds.index.str.split("!!").str.get(0)
scrubbed_population_df = scrubbed_population_ds.to_frame(name=scrubbed_population_ds.index.name)
new_column_names = {"index": "state", scrubbed_population_df.columns[0]: "pediatric population estimate"}
scrubbed_population_df = scrubbed_population_df.rename(columns=new_column_names)

missing_states = [state for state in unique_state_names if state not in doctor_state_names]
missing_states #states that are missing and will need to be removed from

['District of Columbia', 'Idaho', 'Nevada', 'North Dakota']

joined_data = pd.merge(scrubbed_population_df, doctor_data, left_index=True, right_on='state', how='inner')

joined_data = joined_data[['state','number of doctors','pediatric population estimate']]
joined_data = pd.merge(scrubbed_population_df, doctor_data, left_index=True, right_on='state', how='left')
joined_data['state'].fillna('Unknown', inplace=True)  # Replace NaNs with 'Unknown'
joined_data = joined_data.reset_index()  # Modifies the DataFrame itself
joined_data = joined_data[['state', 'pediatric population estimate', 'number of doctors']]
print(joined_data)

                   state pediatric population estimate number of doctors
0                Alabama                     1,111,028               450
1                 Alaska                       175,745                90
2                Arizona                     1,587,603              1160
3               Arkansas                       695,832               240
4             California                     8,499,006              4350
5               Colorado                     1,211,095               420
6            Connecticut                       730,424               410
7               Delaware                       208,227               270
8   District of Columbia                       124,244               NaN
9                Florida                     4,294,894              1400
10               Georgia                     2,505,668                 -
11                Hawaii                       296,866               480
12                 Idaho                       464,620               NaN
13              Illinois                     2,716,206              1790
14               Indiana                     1,569,374               740
15                  Iowa                       716,748               220
16                Kansas                       688,212                 -
17              Kentucky                     1,004,880               280
18             Louisiana                     1,058,860               170
19                 Maine                       245,491               130
20              Maryland                     1,346,731               750
21         Massachusetts                     1,335,988               950
22              Michigan                     2,106,228               970
23             Minnesota                     1,288,213               740
24           Mississippi                       676,085                70
25              Missouri                     1,362,791               170
26               Montana                       234,182                40
27              Nebraska                       474,697               210
28                Nevada                       689,653               NaN
29         New Hampshire                       251,693               170
30            New Jersey                     1,990,177               860
31            New Mexico                       455,475               180
32              New York                     3,981,238              4560
33        North Carolina                     2,292,928              1020
34          North Dakota                       180,965               NaN
35                  Ohio                     2,558,156              1360
36              Oklahoma                       953,422                 -
37                Oregon                       832,367               680
38          Pennsylvania                     2,622,158              1760
39           Puerto Rico                       518,336                 -
40          Rhode Island                       203,146               300
41        South Carolina                     1,117,142               250
42          South Dakota                       218,981                50
43             Tennessee                     1,535,133               690
44                 Texas                     7,439,663              2530
45                  Utah                       930,593               200
46               Vermont                       113,734               120
47              Virginia                     1,862,087               890
48            Washington                     1,644,027               550
49         West Virginia                       351,543               110
50             Wisconsin                     1,241,780               510
51               Wyoming                       129,603                70

#cleaned_docs = joined_data.dropna(subset=['number of doctors'])
num_docs_list = joined_data['number of doctors'].to_list()
new_num_docs_list = pd.to_numeric(num_docs_list, errors='coerce')
new_num_docs_list

array([ 450.,   90., 1160.,  240., 4350.,  420.,  410.,  270.,   nan,
       1400.,   nan,  480.,   nan, 1790.,  740.,  220.,   nan,  280.,
        170.,  130.,  750.,  950.,  970.,  740.,   70.,  170.,   40.,
        210.,   nan,  170.,  860.,  180., 4560., 1020.,   nan, 1360.,
         nan,  680., 1760.,   nan,  300.,  250.,   50.,  690., 2530.,
        200.,  120.,  890.,  550.,  110.,  510.,   70.])

#cleaned_patients = joined_data.dropna(subset=['number of doctors'])
num_patients_list = joined_data['pediatric population estimate'].to_list()
#new_num_docs_list = pd.to_numeric(num_docs_list, errors='coerce')
num_patients_list
clean_num_pat_list = []
for number in num_patients_list:
    solid_number = number.replace(",", "")
    clean_num_pat_list.append(int(solid_number))
clean_num_pat_list

[1111028,
 175745,
 1587603,
 695832,
 8499006,
 1211095,
 730424,
 208227,
 124244,
 4294894,
 2505668,
 296866,
 464620,
 2716206,
 1569374,
 716748,
 688212,
 1004880,
 1058860,
 245491,
 1346731,
 1335988,
 2106228,
 1288213,
 676085,
 1362791,
 234182,
 474697,
 689653,
 251693,
 1990177,
 455475,
 3981238,
 2292928,
 180965,
 2558156,
 953422,
 832367,
 2622158,
 518336,
 203146,
 1117142,
 218981,
 1535133,
 7439663,
 930593,
 113734,
 1862087,
 1644027,
 351543,
 1241780,
 129603]

state_names = joined_data['state']
cleaned_data = pd.DataFrame({
  "State": state_names,
  "Number of Doctors": new_num_docs_list,
  "Patient Estimate": clean_num_pat_list
})
cleaned_data['Doc to Patient Ratio'] = cleaned_data['Number of Doctors']/cleaned_data['Patient Estimate']
cleaned_data['Patient to doc Ratio'] = cleaned_data['Patient Estimate']/cleaned_data['Number of Doctors']
cleaned_data

#Setting the data path
data_path="/home/jovyan/shared_data/data/geog407/lab3/"

#Identifying the spatial data to be used
state_geo = f"{data_path}/us-states.json"
#Create a Folium Map Object identifying the central latitude, longitude and starting Zoom
m = folium.Map(location=[48, -102], zoom_start=3)



# Call the Choropleth function
# state_geo variable is identified as geographic data 
# UnemploymentRateJuly2021 variable is identified as tabular data
# Columns to be used in data and what to key on are identified
folium.Choropleth(
    geo_data=state_geo, #setting spatial data
    name="Patient to Doctor Ratio",
    data=cleaned_data, #setting tabular data
    columns=["State", "Patient to doc Ratio"],
    key_on="feature.properties.name",
    fill_color="YlGnBu", #setting fill color
    fill_opacity=0.7, # setting opacity
    line_opacity=0.2,
    legend_name="Patient to Doctor Ratio", #setting legend
).add_to(m)

#Adds Layer control
folium.LayerControl().add_to(m)

# Finally show the map
m

	State	Number of Doctors	Patient Estimate	Doc to Patient Ratio	Patient to doc Ratio
0	Alabama	450.0	1111028	0.000405	2468.951111
1	Alaska	90.0	175745	0.000512	1952.722222
2	Arizona	1160.0	1587603	0.000731	1368.623276
3	Arkansas	240.0	695832	0.000345	2899.300000
4	California	4350.0	8499006	0.000512	1953.794483
5	Colorado	420.0	1211095	0.000347	2883.559524
6	Connecticut	410.0	730424	0.000561	1781.521951
7	Delaware	270.0	208227	0.001297	771.211111
8	District of Columbia	NaN	124244	NaN	NaN
9	Florida	1400.0	4294894	0.000326	3067.781429
10	Georgia	NaN	2505668	NaN	NaN
11	Hawaii	480.0	296866	0.001617	618.470833
12	Idaho	NaN	464620	NaN	NaN
13	Illinois	1790.0	2716206	0.000659	1517.433520
14	Indiana	740.0	1569374	0.000472	2120.775676
15	Iowa	220.0	716748	0.000307	3257.945455
16	Kansas	NaN	688212	NaN	NaN
17	Kentucky	280.0	1004880	0.000279	3588.857143
18	Louisiana	170.0	1058860	0.000161	6228.588235
19	Maine	130.0	245491	0.000530	1888.392308
20	Maryland	750.0	1346731	0.000557	1795.641333
21	Massachusetts	950.0	1335988	0.000711	1406.303158
22	Michigan	970.0	2106228	0.000461	2171.369072
23	Minnesota	740.0	1288213	0.000574	1740.828378
24	Mississippi	70.0	676085	0.000104	9658.357143
25	Missouri	170.0	1362791	0.000125	8016.417647
26	Montana	40.0	234182	0.000171	5854.550000
27	Nebraska	210.0	474697	0.000442	2260.461905
28	Nevada	NaN	689653	NaN	NaN
29	New Hampshire	170.0	251693	0.000675	1480.547059
30	New Jersey	860.0	1990177	0.000432	2314.159302
31	New Mexico	180.0	455475	0.000395	2530.416667
32	New York	4560.0	3981238	0.001145	873.078509
33	North Carolina	1020.0	2292928	0.000445	2247.968627
34	North Dakota	NaN	180965	NaN	NaN
35	Ohio	1360.0	2558156	0.000532	1880.997059
36	Oklahoma	NaN	953422	NaN	NaN
37	Oregon	680.0	832367	0.000817	1224.069118
38	Pennsylvania	1760.0	2622158	0.000671	1489.862500
39	Puerto Rico	NaN	518336	NaN	NaN
40	Rhode Island	300.0	203146	0.001477	677.153333
41	South Carolina	250.0	1117142	0.000224	4468.568000
42	South Dakota	50.0	218981	0.000228	4379.620000
43	Tennessee	690.0	1535133	0.000449	2224.830435
44	Texas	2530.0	7439663	0.000340	2940.578261
45	Utah	200.0	930593	0.000215	4652.965000
46	Vermont	120.0	113734	0.001055	947.783333
47	Virginia	890.0	1862087	0.000478	2092.232584
48	Washington	550.0	1644027	0.000335	2989.140000
49	West Virginia	110.0	351543	0.000313	3195.845455
50	Wisconsin	510.0	1241780	0.000411	2434.862745
51	Wyoming	70.0	129603	0.000540	1851.471429