Climate change is one of the greatest threats humanity will face in the 21st century and beyond. A great deal of effort has been expended to document and forecast exactly how and when the damage will occur, but these analyses do not by themselves create climate solutions. A variety of low-carbon energy sources exist, such as wind, solar, geothermal, nuclear, and hydropower. All of these will play a role in the coming decades as a piecemeal replacement for fossil fuels. However, each of these energy sources brings with it its own constraints:
Many other factors outside the scope of this project would, if taken into account, make the analysis conducted here more accurate. They are listed below:
Data description and sources:
Project goals:
%matplotlib inline
import geopandas
import pandas
import os
import matplotlib.pyplot # imported explicitly so matplotlib.pyplot.title() etc. work below
import shapely
import rasterio
from rasterio.plot import show
import random
# file = 'USA_power-density_100m.geojson' # this file is just the outline of the us, no wind power data.
# os.path.exists(file)
# open(file)
# data = geopandas.read_file(file)
# data
# ax = data.plot(figsize=(20, 20), color='white', edgecolor='black', legend=True)
# type(ax)
# the source of this land value data is Nolte, Christoph (2020), Data for: High-resolution land value maps reveal underestimation of conservation costs in the United States, Dryad, Dataset, https://doi.org/10.5061/dryad.np5hqbzq9
land_value = rasterio.open('land_values_epsg4326.tif')
# upperLeft = dataset.transform * (0, 0)
# lowerRight = dataset.transform * (dataset.width, dataset.height)
# lowerRight
# land_value.crs
type(land_value)
print(f"the raster has {land_value.height * land_value.width} values")
# dataset.bounds
band1 = land_value.read(1)
band1
# regular indexing allows me to access land values from any cell
band1[land_value.height // 4, land_value.width // 10 * 9]
# spatial indexing -- lat and long in gives me regular indices out
# x, y = (land_value.bounds.left + 10, land_value.bounds.top - 5)
# row, col = land_value.index(x, y)
# band1[row, col]
# get spatial coordinates of the last pixel as (x, y) coords (row/col indices are zero-based, hence the -1):
lonlat = land_value.xy(land_value.height - 1, land_value.width - 1)
lonlat
show(land_value, cmap='rainbow')
show(band1, cmap='rainbow')
# the following code was used to reproject raster data from epsg 5070 to 4326
# import rioxarray
# rds = rioxarray.open_rasterio('places_fmv_all.tif')
# rds_4326 = rds.rio.reproject("EPSG:4326")
# rds_4326.rio.to_raster("file.tif")
# Next, we convert the land value data into a geodataframe so it can be compared / analyzed with the other datasets.
# this is a very lengthy process due to the size of the data file, so a counter variable is used to signal progress once every million cells.
landDict = {'land_value':[], 'geometry':[]}
counter = 0
for y in range(land_value.height):
    for x in range(land_value.width):
        lv = band1[y, x]  # get the land value at that location
        # there is far too much data, so we extract 1% of it:
        if random.randint(0, 99) == 0 and lv != 0:  # only add to the table if land value isn't zero, since null values default to zero for this dataset
            landDict['land_value'].append(lv)  # add it to the dictionary
            lonlat = land_value.xy(y, x)  # what is generated is a 2-item tuple in this order: lon, then lat
            # landDict['latitude'].append(lonlat[1])
            # landDict['longitude'].append(lonlat[0])
            landDict['geometry'].append(shapely.geometry.Point(lonlat))
        counter += 1
        if counter % 1000000 == 0:
            print(counter)  # show how far along the loop has come
landValueGdf = geopandas.GeoDataFrame(landDict, crs="EPSG:4326")
landValueGdf # seen below, the gdf of land values has been created successfully.
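The nested loop above can take a long time on a raster of this size. A vectorized sketch of the same 1% sampling with NumPy is shown below, using a small toy array and an illustrative affine transform (the corner coordinates and pixel size are made-up stand-ins for the real raster's metadata, not values from the dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for band1: zeros play the role of null land values.
band = rng.integers(0, 5, size=(200, 300)).astype(float)

# Illustrative georeferencing (NOT the real raster's transform):
# upper-left corner at (-127, 50), square 0.01-degree pixels.
ulx, uly, pix = -127.0, 50.0, 0.01

rows, cols = np.nonzero(band)                # drop zero (null) cells up front
keep = rng.random(rows.size) < 0.01          # Bernoulli 1% sample, like randint(0, 99) == 0
rows, cols = rows[keep], cols[keep]

lons = ulx + (cols + 0.5) * pix              # pixel-center longitudes
lats = uly - (rows + 0.5) * pix              # pixel-center latitudes
values = band[rows, cols]
print(values.size, "cells sampled")
```

The sampled `lons`, `lats`, and `values` arrays could then be fed to a single `geopandas.GeoDataFrame` constructor rather than appending point by point.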
# Plotting the land value gdf shows that we have sufficient data density for analysis with the other datasets.
landValueGdf.plot(figsize=(20, 20), column='land_value', cmap='rainbow')
landValueGdf.to_file('land_values_trimmed')
# to avoid having to regenerate the geodataframe above in each session, this line can be used to pull it from file.
landValueGdf = geopandas.read_file('land_values_trimmed/land_values_trimmed.shp')
pv_data_path = 'pv_open_2020.csv' # data source: https://www.nrel.gov/gis/solar-supply-curves.html.
gdf = geopandas.read_file(pv_data_path, crs='EPSG:4326')
gdf
# by default, the downloaded csv does not contain geometry values. The code below generates that from the latitude and longitude columns.
for record in gdf.index:
lat = float(gdf.loc[record, 'latitude'])
lon = float(gdf.loc[record, 'longitude'])
gdf.loc[record, 'geometry'] = shapely.geometry.Point(lon, lat)
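The row-by-row loop above works, but geopandas provides a vectorized helper for exactly this conversion. A sketch with a small made-up frame (column names matching the cell above) looks like:

```python
import pandas
import geopandas

# Toy stand-in for the NREL csv; latitude/longitude arrive as strings there too.
df = pandas.DataFrame({'latitude': ['40.0', '35.5'],
                       'longitude': ['-105.0', '-101.2']})

# points_from_xy builds the whole geometry column in one call (x = lon, y = lat).
geometry = geopandas.points_from_xy(df['longitude'].astype(float),
                                    df['latitude'].astype(float))
gdf = geopandas.GeoDataFrame(df, geometry=geometry, crs='EPSG:4326')
print(gdf.geometry.iloc[0])
```

On a table with hundreds of thousands of rows, this avoids the per-row `.loc` assignments entirely.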
# save to shapefile so above calculation does not have to be performed repeatedly
# saving to shapefile truncates column names to 10 characters
gdf.to_file('USA_GHI')
gdf = geopandas.read_file('USA_GHI/USA_GHI.shp', crs='EPSG:4326')
# Plotting the insolation data reveals that the geometries were successfully generated from the longitude and latitude strings.
ax = gdf.plot(figsize=(20, 20), column='global_hor', cmap='rainbow')
# data source: https://maps.nrel.gov/?da=wind-prospector#/?aL=wbDr04%255Bv%255D%3Dt&bL=groad&cE=0&lR=0&mC=41.80407814427234%2C-95.361328125&zL=4
# I unzipped this folder to access the 110m hub height near-future data, which is a mostly complete map of the US.
wind_table = geopandas.read_file('110m_near_future/110_Meter_Hub_Height_Near_Future_Technology_.shp')
wind_table
wind_table.plot(column='AREA WI_02', figsize=(20,20))
# convert polygons into points for later combination with other data. This is necessary because later we will be combining
# all of the point data into one raster, which is easier if the wind data is in Point form rather than Polygon form.
wind_table['geometry'] = wind_table['geometry'].centroid
wind_table.plot(column='AREA WI_02', figsize=(20,20)) # make sure centroid calculation worked
# because saving to file truncates column names, I am renaming the column titles to include the part that was truncated.
# "GCF" stands for "gross capacity factor" which is a measure of actual electrical energy output over a given period of time compared to the theoretical maximum electrical energy output over that period.
wind_table.rename(columns={'AREA WITH ':'GCF > 30', 'AREA WI_01':'GCF > 35', 'AREA WI_02':'GCF > 40'}, inplace=True)
wind_table = wind_table[['GCF > 30', 'GCF > 35', 'GCF > 40', 'geometry']]
wind_table.to_file('wind_data_processed')
# pull out the electricity prices file and add it to a geodataframe
# data source: https://www.eia.gov/electricity/data.php
# data was found under sales, Average retail price of electricity to ultimate customers, Annual retail price, By state and utility, All sectors, file name: Electricity prices by state
elec = geopandas.read_file('Electricity prices by state.csv')
# next, we find the average electricity prices for each state.
elec['Sales (Megawatthours)'] = elec['Sales (Megawatthours)'].str.replace(",", "") # strip the thousands separators so the numbers can be parsed
elec['Revenues (Thousands Dollars)'] = elec['Revenues (Thousands Dollars)'].str.replace(",", "")
elec['Sales (Megawatthours)'] = pandas.to_numeric(elec['Sales (Megawatthours)']) # convert strings to numbers
elec['Revenues (Thousands Dollars)'] = pandas.to_numeric(elec['Revenues (Thousands Dollars)'])
elec = elec[['State', 'Sales (Megawatthours)', 'Revenues (Thousands Dollars)', 'geometry']] # trim away the unneeded columns
elec = elec.dissolve(by='State', aggfunc='sum') # consolidate rows
elec['Avg price ($/kwh)'] = elec['Revenues (Thousands Dollars)'] / elec['Sales (Megawatthours)']
elec # display to make sure we have the desired output
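The $/kWh units in the price column above work out because two factors of 1,000 cancel: revenues are reported in thousands of dollars and sales in megawatt-hours (thousands of kWh). A quick check with made-up numbers:

```python
# Unit check: (thousands of $) / MWh == $ / kWh, since 1 MWh = 1000 kWh.
revenue_thousand_dollars = 5_000          # i.e. $5,000,000 of revenue
sales_mwh = 40_000                        # i.e. 40,000,000 kWh sold
price = revenue_thousand_dollars / sales_mwh
full_units = (revenue_thousand_dollars * 1000) / (sales_mwh * 1000)
print(price, full_units)   # 0.125 0.125  -> $0.125/kWh either way
```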
# In this cell, the state boundary geometry is pulled from file and pared down to the contiguous 48 states.
state_boundaries = geopandas.read_file('states_boundaries/cb_2018_us_state_500k.shp')
non_contiguous = ['Alaska', 'Hawaii', 'Puerto Rico', 'American Samoa', 'Guam',
                  'United States Virgin Islands', 'Commonwealth of the Northern Mariana Islands']
state_boundaries = state_boundaries[~state_boundaries['NAME'].isin(non_contiguous)]
# state_boundaries.plot(figsize=(20,20)) # check to make sure it looks good
state_boundaries = state_boundaries[['STUSPS', 'geometry']]
state_boundaries.rename(columns={'STUSPS': 'State'}, inplace=True) # make title more readable
state_boundaries # review the data
state_boundaries.plot(figsize=(20, 20), color='white', edgecolor='black') # check to make sure the state boundaries look good
# Now, we attach the state geometries to the electricity prices table. This is the beginning of the process to consolidate ALL of the project data
# into one geodataframe.
elec_and_geo = elec.merge(state_boundaries, on='State', how='inner')
elec_and_geo.rename(columns={'geometry_y': 'geometry'}, inplace=True)
elec_and_geo = elec_and_geo[['State', 'Sales (Megawatthours)', 'Revenues (Thousands Dollars)', 'Avg price ($/kwh)', 'geometry']]
elecGdf = geopandas.GeoDataFrame(elec_and_geo)
elecGdf.to_crs("EPSG:4326", inplace=True)
#type(elec_and_geo.loc[1, 'geometry'])
# Below, I am manually creating a raster of rectangles across the contiguous United States. The points from each dataset will be tested to
# see which pixel they intersect with, then will be added to that row (and averaged if necessary). This process should allow for the data
# to be compared to one another without it mattering how densely distributed they are -- a pixel can have 1 solar data point, for example, and
# it can also have 30 land value data points as well -- because the land_value data points can all be averaged to give one land value datum
# for that pixel. Then that single solar value and single land price value can be analyzed together.
min_x = -127
max_x = -65
min_y = 23
max_y = 50
numRastersPerDegree = 8 # resolution of the grid is 8 cells per degree, 64 cells per square degree.
cellSize = 1 / numRastersPerDegree
shapes = {'geometry':[]}
for x in range(min_x * numRastersPerDegree, max_x * numRastersPerDegree): # step along horizontally in cellSize-degree increments
    lon = x / numRastersPerDegree
    for y in range(min_y * numRastersPerDegree, max_y * numRastersPerDegree):
        lat = y / numRastersPerDegree
        # Define the vertices of the cell in counterclockwise order.
        # The cell width must equal the step size (cellSize); a wider cell would overlap its neighbors.
        vertices = [(lon, lat), (lon + cellSize, lat), (lon + cellSize, lat + cellSize), (lon, lat + cellSize)]
        polygon = shapely.geometry.Polygon(vertices)
        shapes['geometry'].append(polygon)
print(len(shapes['geometry']))
manual_raster = geopandas.GeoDataFrame(shapes, crs="EPSG:4326")
manual_raster.plot(figsize=(20,20), color = 'white', edgecolor='black') # check to make sure raster was generated properly
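The same kind of grid can be written more compactly with `shapely.geometry.box`. The sketch below uses a toy 2x2-degree window rather than the full CONUS extent so it runs quickly:

```python
import shapely.geometry
import geopandas

step = 1 / 8                                        # 8 cells per degree, as above
min_x, max_x, min_y, max_y = -106, -104, 39, 41     # toy window (NOT the full US)
cells = [shapely.geometry.box(x * step, y * step, (x + 1) * step, (y + 1) * step)
         for x in range(int(min_x / step), int(max_x / step))
         for y in range(int(min_y / step), int(max_y / step))]
grid = geopandas.GeoDataFrame({'geometry': cells}, crs='EPSG:4326')
print(len(grid))   # 16 columns * 16 rows = 256 cells
```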
# perform a spatial join to get the points contained by the same raster polygon to
# be ready to be averaged together.
landValueGdf = geopandas.read_file('land_values_trimmed/land_values_trimmed.shp')
# add land values to the raster grid I created based on where the points are contained by the raster polygons.
compiledData1 = manual_raster.sjoin(landValueGdf, how='inner', predicate='contains')
compiledData1 = compiledData1[['geometry', 'land_value']]
compiledData1 # make sure it looks ok. There are far too many rows, which is fixed in the next cell.
#compiledData1.loc[6471]
compiledData1 = compiledData1.dissolve(by=compiledData1.index, aggfunc='mean') # consolidate rows -- many polygons were duplicated.
# Importantly, the aggfunc was 'mean' so that the land values in the same polygon all contribute to that pixel's final value.
compiledData1 # the number of rows has decreased back to its proper size
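The sjoin-then-dissolve pattern used above is easiest to see on a toy example: two grid cells and three points, where the two points falling in the first cell are averaged into a single value for that cell:

```python
import shapely.geometry
import geopandas

# Two grid cells standing in for the manual raster polygons.
cells = geopandas.GeoDataFrame(
    {'geometry': [shapely.geometry.box(0, 0, 1, 1),
                  shapely.geometry.box(1, 0, 2, 1)]}, crs='EPSG:4326')

# Three point records standing in for the sampled land values.
pts = geopandas.GeoDataFrame(
    {'land_value': [10.0, 20.0, 7.0],
     'geometry': [shapely.geometry.Point(0.2, 0.5),
                  shapely.geometry.Point(0.8, 0.5),
                  shapely.geometry.Point(1.5, 0.5)]}, crs='EPSG:4326')

# One joined row per (cell, point) pair; the first cell's index appears twice.
joined = cells.sjoin(pts, how='inner', predicate='contains')
# Grouping by the cell index collapses those duplicates, averaging land_value.
averaged = joined[['geometry', 'land_value']].dissolve(by=joined.index, aggfunc='mean')
print(averaged['land_value'].tolist())   # [15.0, 7.0]
```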
# plotting the geodataframe shows that so far, we have successfully mapped point data from the land values table
# onto the empty raster grid I created.
compiledData1.plot(figsize=(20, 20), column='land_value', cmap='rainbow')
compiledData1.to_file('compiledData1')
# Next, we will add global horizontal irradiance to the table. This process is similar to the steps needed for land value data.
solarGdf = geopandas.read_file('USA_GHI/USA_GHI.shp') # get from file
solarGdf.rename(columns={'global_hor': 'GHI'}, inplace=True)
solarGdf = solarGdf[['GHI', 'geometry']] # exclude unneeded columns
solarGdf['GHI'] = pandas.to_numeric(solarGdf['GHI']) # convert strings to numbers
compiledData1 = geopandas.read_file('compiledData1/compiledData1.shp')
# perform a spatial join to get the points contained by the raster polygons
compiledData2 = compiledData1.sjoin(solarGdf, how='left', predicate='contains')
compiledData2 = compiledData2[['geometry', 'land_value', 'GHI']] # remove unneeded rows
compiledData2 = compiledData2.dissolve(by=compiledData2.index, aggfunc='mean') # consolidate rows with the same geometry
compiledData2 # make sure the data looks ok.
# plotting the geodataframe shows that we have successfully mapped point data from the solar table
# onto the empty raster grid I created.
compiledData2.plot(figsize=(20, 20), column='GHI', cmap='rainbow')
compiledData2.to_file('compiledData2')
# Next, we do the same thing for wind.
windGdf = geopandas.read_file('wind_data_processed/wind_data_processed.shp') # get from file
compiledData2 = geopandas.read_file('compiledData2/compiledData2.shp')
# perform a spatial join to get the points contained by the same manual raster polygons to
# be ready to be averaged together.
compiledData3 = compiledData2.sjoin(windGdf, how='left', predicate='contains')
compiledData3 = compiledData3[['geometry', 'land_value', 'GHI', 'GCF > 30', 'GCF > 35', 'GCF > 40']]
compiledData3 = compiledData3.dissolve(by=compiledData3.index, aggfunc='mean') # consolidate rows with the same geometry
compiledData3
compiledData3.plot(column='GCF > 30', figsize=(20, 20)) # wind data has been successfully transferred to compiled data gdf.
compiledData3.to_file('compiledData3')
# Next, we add in electricity prices by state, performing a spatial join to map electricity prices onto pixels.
compiledData3 = geopandas.read_file('compiledData3/compiledData3.shp')
compiledData4 = elecGdf.sjoin(compiledData3, how='right', predicate='intersects')
compiledData4.rename(columns={'Sales (Megawatthours)': 'Sales (mwh)', 'Revenues (Thousands Dollars)': 'Revenue k$', 'Avg price ($/kwh)':'Price$/kwh'}, inplace=True)
compiledData4 = compiledData4.dissolve(by=compiledData4.index, aggfunc='first') # consolidate pixels duplicated along state borders. The result must be assigned back, and aggfunc='first' keeps the non-numeric State column.
compiledData4
compiledData4.plot(figsize=(20,8), cmap='rainbow', column='Price$/kwh', vmin=0, legend=True)
matplotlib.pyplot.title('Electricity Prices by State')
matplotlib.pyplot.show() # plotting data shows that electricity prices have now been added to the compiled data gdf.
# Now we pare down the columns to only include the ones that are relevant for analysis.
compiledData5 = compiledData4[['State', 'Price$/kwh', 'land_value', 'GHI', 'GCF > 30', 'GCF > 35', 'GCF > 40', 'geometry']]
compiledData5.to_file('compiledData5')
# Pull data from file
compiledData5 = geopandas.read_file('compiledData5/compiledData5.shp')
# Finally, we can perform analysis! The first metric we will calculate is solar energy viability by location.
# Each raster has global horizontal irradiance, land value, and electricity prices associated with it.
# High GHI and high electricity prices are good for solar, while high land prices are bad. Hence the following math:
compiledData5['Solar!'] = compiledData5['GHI'] * compiledData5['Price$/kwh'] / compiledData5['land_value']
# We can also calculate a wind viability metric. Since wind requires so much less land than solar, land values
# are discounted and the only components of the analysis are capacity factor and electricity prices.
compiledData5['Wind!'] = compiledData5['GCF > 30'] * compiledData5['Price$/kwh']
# Finally, we can generate a column that combines these metrics. This column takes into account electricity
# prices, wind capacity factor, insolation, and land value all in the same metric.
compiledData5['Combined!'] = compiledData5['GHI'] * compiledData5['Price$/kwh'] * compiledData5['GCF > 30'] / compiledData5['land_value']
# A quick overview of the data table:
# Price$/kwh describes electricity prices for each pixel on the grid, under the assumption that prices are uniform within each state. units: $/kwh
# land_value describes the average value of land for each pixel on the grid. Unit: ln($/ha)
# GHI describes Global Horizontal Irradiance, a measure of the intensity of solar radiation hitting the ground
# GCF describes gross capacity factor, a ratio of the amount of power produced by wind to the amount of power theoretically possible for 110m turbines.
# Solar! describes my calculated measure of solar viability for each pixel. Its calculation is described in the cell above.
# Wind! describes my calculated measure of wind viability for each pixel. Its calculation is described in the cell above.
# Combined! describes my calculated measure of the combined solar and wind viability for each pixel. Its calculation is described in the cell above.
compiledData5.describe()
compiledData5.to_file('compiledData5')
compiledData5 = geopandas.read_file('compiledData5/compiledData5.shp')
The plot below demonstrates the outsized influence that electricity prices have on the viability of solar. California is a clear winner.
compiledData5.plot(figsize=(20, 8), cmap='plasma', column='Solar!', vmin=0, legend=True)
matplotlib.pyplot.title('Renewable Energy Viability for Solar Only')
matplotlib.pyplot.show()
The map below demonstrates that the middle of the country is uniformly good for wind projects, while certain hotspots stand out in the northeast in New York and Maine.
compiledData5.plot(figsize=(20, 8), cmap='inferno', column='Wind!', vmin=0, legend=True)
matplotlib.pyplot.title('Renewable Energy Viability for Wind Only')
matplotlib.pyplot.show()
Below, we see the map incorporating all the data -- wind gross capacity factor, insolation, electricity prices, and land values. The corridor just east of the Rocky Mountains is well positioned for both wind and solar, particularly eastern Colorado. Southern California has a positive outlook, and New York and Maine do as well. Certain areas, such as the Pacific Northwest, much of the Rocky Mountains, and the Appalachians, will be heavily reliant on imported electricity in the future.
compiledData5.plot(figsize=(20, 8), cmap='viridis', column='Combined!', vmin=0, legend=True)
matplotlib.pyplot.title('Renewable Energy Viability for Solar and Wind')
matplotlib.pyplot.show()