CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) is a large-sample hydrometeorological dataset that provides catchment attributes, forcings, and GIS data for 671 small- to medium-sized basins across the CONUS (continental United States). HydroShare hosts a copy of CAMELS and exposes it through several public data-access protocols (WMS, WFS, and OPeNDAP) for easy visualization, retrieval, and subsetting of the dataset in community modeling research. This notebook demonstrates how to set up SUMMA models with the CAMELS dataset from HydroShare using various tools integrated in the CyberGIS-Jupyter for Water (CJW) environment, and how to execute ensemble model runs on a High Performance Computing (HPC) resource through the CyberGIS-Compute Service.
The CAMELS dataset is currently stored in two HydroShare resources.
In HydroShare, shapefiles are represented as GeographicFeatureContentType and exposed through OGC WMS and WFS services using GeoServer; NetCDF files are represented as MultidimensionalContentType and exposed through the OPeNDAP protocol using the Hyrax Data Server.
Programmatically accessing or subsetting the data in a Jupyter Notebook environment usually requires one or more client tools compatible with the above protocols, which will be introduced in later sections. For now, we simply take advantage of the web interfaces built into GeoServer and the Hyrax Data Server for a quick data preview.
In this section, we will set up a SUMMA model for a user-picked CAMELS basin. Several steps are required:
Select a CAMELS basin (hru_id) and simulation period (start_datetime and end_datetime);
Subset NLDAS hourly forcing data;
Subset basin attribute and parameter files;
Create initial conditions;
Run the single model on the Jupyter server;
Build an ensemble model and run it on HPC through the CyberGIS-Compute Service.
You may set the "hru_id" of the CAMELS basin you are interested in below if you already know it. Otherwise, we will interactively select a basin on the map. For the simulation period, start_datetime and end_datetime should be in "YYYY-MM-DD HH:MM" format and fall within "1980-01-01 00:00" to "2018-12-31 23:00" (the temporal coverage of the NLDAS forcings).
# id of CAMELS basin
hru_id = 13313000
# simulation period ("YYYY-MM-DD HH:MM") within "1980-01-01 00:00" to "2018-12-31 23:00" (39 years)
start_datetime = "1991-01-01 00:00"
end_datetime = "2000-12-31 23:00"
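Before requesting any data, the chosen period can be sanity-checked against the NLDAS coverage stated above. A minimal sketch (the helper name `validate_period` is ours, not part of the notebook's toolchain):

```python
import pandas as pd

# Temporal coverage of the NLDAS forcings (from the text above)
NLDAS_START = pd.Timestamp("1980-01-01 00:00")
NLDAS_END = pd.Timestamp("2018-12-31 23:00")

def validate_period(start_datetime, end_datetime):
    """Parse a simulation period and check it falls within NLDAS coverage."""
    start = pd.to_datetime(start_datetime, format="%Y-%m-%d %H:%M")
    end = pd.to_datetime(end_datetime, format="%Y-%m-%d %H:%M")
    if start >= end:
        raise ValueError("start_datetime must precede end_datetime")
    if start < NLDAS_START or end > NLDAS_END:
        raise ValueError("period is outside NLDAS coverage (1980-2018)")
    return start, end
```

A malformed or out-of-range period then fails fast with a clear message instead of producing an empty forcing subset later on.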
Here we provide an interactive map for you to view the 671 CAMELS basins. By taking advantage of the OGC WFS service that HydroShare has set up on top of the basin shapefile "HCDN_nhru_final_671.shp", we can retrieve the basin geometry in GeoJSON and visualize it using ipyleaflet.
You may hover over a basin to check its hru_id, shown in the bottom-right corner. You may click on a basin to select it for modeling; a popup with more info about the basin (hru_id, lat, lon, area, elevation, and perimeter) will show up in the upper-left corner. NCAR provides high-resolution basin images and elevation maps for some basins on the West Coast. For those basins, the popup shows thumbnails that are clickable and linked to the originals.
Note: Clicking on a basin selects it for modeling, and this action will overwrite and update the "hru_id" variable above.
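Rather than hard-coding the long WFS request URL, the same GetFeature query can be composed from its parts with the standard library. A sketch (same endpoint and layer as the cell below):

```python
from urllib.parse import urlencode

# WFS endpoint of the HydroShare-provisioned GeoServer for this resource
base_url = ("https://geoserver.hydroshare.org/geoserver/"
            "HS-a28685d2dd584fe5885fc368cb76ff2a/wfs")

# GetFeature query parameters for the CAMELS basin layer
params = {
    "service": "wfs",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "HS-a28685d2dd584fe5885fc368cb76ff2a:HCDN_nhru_final_671",
    "outputFormat": "JSON",
}
wfs_url = base_url + "?" + urlencode(params)
```

Keeping the parameters in a dict makes it easy to, say, swap `outputFormat` or add a `bbox` filter later.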
import json
import random
import requests
from ipywidgets import HTML
from ipyleaflet import Map, ZoomControl, GeoJSON, Popup, WidgetControl
from shapely.geometry import shape, GeometryCollection
# A WFS request made to HydroShare-provisioned GeoServer to retrieve basin geometry in geojson format
url = "https://geoserver.hydroshare.org/geoserver/HS-a28685d2dd584fe5885fc368cb76ff2a/wfs?service=wfs&version=2.0.0&request=GetFeature&typeNames=HS-a28685d2dd584fe5885fc368cb76ff2a:HCDN_nhru_final_671&outputFormat=JSON"
r = requests.get(url)
geojson_str = r.content.decode("utf-8")
m = Map(center=(39, -104), zoom=4, zoom_control=False)
m.add_control(ZoomControl(position='topright'))
geo_json = GeoJSON(
data=json.loads(geojson_str),
style={'opacity': 0, 'fillOpacity': 0.6, 'weight': 1},
hover_style={'color': 'cyan', 'opacity': 1, 'weight': 3, 'fillOpacity': 0},
style_callback=lambda feature: {'color': 'white',
'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),}
)
geo_json_clicked = None
def geojson_onclick_handler(event=None, id=None, properties=None, feature=None):
global geo_json_clicked
global hru_id
if geo_json_clicked is not None:
geo_json_clicked.on_hover(basin_hover_html, remove=True)
m.remove_layer(geo_json_clicked)
_hru_id = feature["properties"]["hru_id"]
geom = GeometryCollection([shape(feature["geometry"])])
bounds = geom.bounds
swne = [(bounds[1], bounds[0]), (bounds[3], bounds[2])]
m.fit_bounds(swne)
geo_json_clicked = GeoJSON(
data=feature,
style={'color': 'black', 'opacity': 1, 'fillColor': 'cyan', 'fillOpacity': 1},
)
m.add_layer(geo_json_clicked)
geo_json_clicked.on_hover(basin_hover_html)
basin_click_html(feature)
# update global hru_id
hru_id = _hru_id
geo_json.on_click(geojson_onclick_handler)
m.add_layer(geo_json)
html_click = HTML('''Click on a basin to start modeling''')
html_click.layout.margin = '0px 20px 20px 20px'
control_click = WidgetControl(widget=html_click, position='topleft')
m.add_control(control_click)
html_hover = HTML('''Hover over to check hru_id''')
html_hover.layout.margin = '0px 20px 0px 20px'
control_hover = WidgetControl(widget=html_hover, position='bottomright')
m.add_control(control_hover)
def get_feature_by(geojson_layer, key, value, first=False):
out = []
for fea in geojson_layer.data['features']:
if fea["properties"][key] == value:
if first == True:
return fea
out.append(fea)
return out
def check_url(url):
# check if a url is reachable
return requests.head(url).status_code == 200
def img_html(url, alt, link=None):
img_value = '<img src="{url}" alt="{alt}" width="100" height="600">'.format(url=url, alt=alt)
if link is not None:
img_value = '<a href="{link}" target="_blank"> {img_value} </a>'.format(link=link, img_value=img_value)
return img_value
def basin_click_html(feature, **kwargs):
hru_id = feature['properties']["hru_id"]
watershed_url = "https://ral.ucar.edu/staff/wood/watersheds/basin_figs/{hru_id}.watershed.png".format(hru_id=hru_id)
dem_url = "https://ral.ucar.edu/staff/wood/watersheds/dem_figs//{hru_id}.dem.png".format(hru_id=hru_id)
fields = ["hru_id", "lon_cen", "lat_cen", "AREA", "elev_mean", "Perimeter"]
table_row_tmpl = '''<tr><th scope="row">{}</th><td>{}</td></tr>'''
tbody_value = ''
for field in fields:
row = table_row_tmpl.format(field, feature['properties'][field])
tbody_value = tbody_value + row
html_value = '''<h4><b>Selected Basin</b></h4>
<table class="table table-striped"><tbody>{tbody_value}</tbody></table>'''.format(tbody_value=tbody_value)
if check_url(watershed_url):
html_value = html_value + img_html(watershed_url, "Watershed Boundary", link=watershed_url)
if check_url(dem_url):
html_value = html_value + img_html(dem_url, "DEM", link=dem_url)
html_click.value = html_value
def basin_hover_html(feature, **kwargs):
hru_id = feature['properties']["hru_id"]
html_value = '''<h5><b>{hru_id}</b></h5>'''.format(hru_id=hru_id)
html_hover.value = html_value
geo_json.on_hover(basin_hover_html)
m
# if user didn't click on a basin, use default basin and zoom to it on map
if geo_json_clicked is None:
hru_fea = get_feature_by(geo_json, "hru_id", hru_id, first=True)
geojson_onclick_handler(feature=hru_fea)
# the selected basin
hru_id_selected = hru_id
# display selected hru_id and simulation period as header as a confirmation
from IPython.display import Markdown as md
md("## Prepare NLDAS forcings for CAMELS Basin '{hru_id}' over {start_datetime} to {end_datetime}".format(hru_id=hru_id_selected, start_datetime=start_datetime, end_datetime=end_datetime))
In this section, we will subset the NLDAS forcing data to the user-selected basin over the requested simulation period. The original forcing file, "nldasForcing1980to2018.nc", contains data for all 671 CAMELS basins from 1980 to 2018 (39 years) in a single 6 GB NetCDF file. HydroShare exposes the NetCDF file through the OPeNDAP protocol using Hyrax, which lets users retrieve a portion of the data without downloading the whole file to the Jupyter environment. The public OPeNDAP access URL for a NetCDF file hosted on HydroShare follows this pattern:
http://hyrax.hydroshare.org/opendap/hyrax/
{RESOURCE_ID}/data/contents/
{NETCDF_FILE_NAME}
In the case of NLDAS CAMELS forcing file "nldasForcing1980to2018.nc", the access url is:
http://hyrax.hydroshare.org/opendap/hyrax/a28685d2dd584fe5885fc368cb76ff2a/data/contents/nldasForcing1980to2018.nc
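The pattern above can be captured in a small helper so other resources and files can be addressed the same way (`hyrax_opendap_url` is a hypothetical name, not a HydroShare API):

```python
# Compose the Hyrax OPeNDAP access URL for a NetCDF file in a HydroShare
# resource, following the {RESOURCE_ID}/data/contents/{FILE_NAME} pattern.
def hyrax_opendap_url(resource_id, netcdf_file_name):
    return ("http://hyrax.hydroshare.org/opendap/hyrax/"
            f"{resource_id}/data/contents/{netcdf_file_name}")
```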
We need an OPeNDAP-compliant client tool to open the URL (opening it directly in a browser only produces a warning message). Here we use xarray for this purpose.
The following cell shows the metadata of the remote NLDAS NetCDF file through OPeNDAP. Under the Dimensions tab, you can see it has 671 basins and 341880 timesteps (14245 days * 24 timesteps/day).
import os
import sys
import time
import pandas as pd
import numpy as np
import xarray as xr
# Hyrax OPeNDAP access URL for the forcing file
opendap_access_url = 'http://hyrax.hydroshare.org/opendap/hyrax/a28685d2dd584fe5885fc368cb76ff2a/data/contents/nldasForcing1980to2018.nc'
# use the THREDDS OPeNDAP URL for the same file instead (overwrites the line above)
opendap_access_url = 'https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/a28685d2dd584fe5885fc368cb76ff2a/data/contents/nldasForcing1980to2018.nc'
forcing_all = xr.open_dataset(opendap_access_url)
forcing_all
Here we use hru_id, start_datetime, and end_datetime to subset the NLDAS forcing NetCDF. The new Dimensions should show only 1 basin and however many timesteps match the selected simulation period (number of days * 24 timesteps/day). Note that we also tweak the resulting file slightly (setting the variable "data_step" and removing the attribute "_NCProperties") to make it compatible with the SUMMA model.
# subset by basin
the_hru = np.array([hru_id_selected])
forcing = forcing_all.sel(hru=the_hru)
# subset by simulation period
forcing = forcing.loc[dict(time=slice(start_datetime, end_datetime))]
# tweak the resulting netcdf for summa model
forcing['data_step'] = 3600
#del forcing.attrs['_NCProperties']
forcing
top_folder = os.path.join(os.getcwd(), 'summa_camels')
settings_folder = os.path.join(top_folder, 'settings')
output_folder = os.path.join(top_folder, 'output')
%%time
truth = forcing
t0 = truth['time'].values[0]
tl = truth['time'].values[-1]
t0_s = pd.to_datetime(str(t0))
t0_sf = t0_s.strftime('%Y%m%d')
tl_s = pd.to_datetime(str(tl))
tl_sf = tl_s.strftime('%Y%m%d')
!mkdir -p {top_folder}/data/forcing
ffname ='NLDAS_' + str(hru_id_selected) + "_" + t0_sf +'-' + tl_sf +'.nc'
truth.to_netcdf(top_folder+'/data/forcing/'+ffname)
truth.close()
fflistname = settings_folder + '/forcingFileList.txt'
with open(fflistname, "w") as f:
    f.write(ffname)
import matplotlib.pyplot as plt
%matplotlib inline
#Plot hourly
constant_vars=['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd']
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(20, 20))
axes = axes.flatten()
axes[0].set_title('Hourly')
# units in the same order as constant_vars: airpres, airtemp, LWRadAtm, pptrate, spechum, SWRadAtm, windspd
unit_str = ['($Pa$)', '($^oK$)', '($W/m^2$)', '($kg/m^2/s$)', '($g/g$)', '($W/m^2$)', '($m/s$)']
forcing_plt= forcing
for idx, var in enumerate(constant_vars):
forcing_plt[var].plot(ax=axes[idx],label='NLDAS')
axes[idx].set_title('')
axes[idx].set_ylabel('{} {}'.format(var, unit_str[idx]))
axes[idx].set_xlabel('Date')
plt.tight_layout()
plt.legend()
SUMMA uses a number of files to specify model attributes and parameters. Although SUMMA's distinction between attributes and parameters is somewhat arbitrary, attributes generally describe characteristics of the model domain that are time-invariant during the simulation, such as GRU and HRU identifiers, spatial organization, and topography. The important point for understanding the organization of the SUMMA input files is that the values specified in the local attributes file do not overlap with those in the various parameter files. Thus, these values do not overwrite any attributes specified elsewhere. In contrast, the various parameter files are read in sequence (as explained in the next paragraph), and parameter values read from later input files successively overwrite values specified earlier.
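The read-in-sequence, overwrite-later behavior can be pictured as successive dict updates. This is an illustration only, not SUMMA code; the parameter names and values are made up:

```python
# Each "layer" stands for one parameter source read by SUMMA, in order.
defaults = {"aquiferBaseflowExp": 2.0, "qSurfScale": 50.0}
noahmp_params = {"qSurfScale": 20.0}          # overrides the default
trial_params = {"aquiferBaseflowExp": 5.0}    # overrides again, read last

# Values read later win, exactly like repeated dict.update calls.
effective = {}
for layer in (defaults, noahmp_params, trial_params):
    effective.update(layer)
```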
Since the CAMELS basin attribute NetCDF file attributes.camels.v2.nc is quite small (70 KB), a copy is included in the SUMMA model folder. We just need to subset it to the selected basin.
# remove existing attributes.nc if any
! rm -rf {settings_folder}/attributes.nc
# Attributes
the_gru = the_hru
attrib_orig = xr.open_dataset(settings_folder+'/attributes.camels.v2.nc')
attrib = attrib_orig.copy()
attrib_orig.close()
attrib = attrib.assign_coords(hru=attrib['hruId'])
attrib = attrib.assign_coords(gru=attrib['gruId'])
gg = attrib['gruId'] # save because gruId was missing from the parameter file
attrib = attrib.sel(hru=the_hru)
attrib = attrib.sel(gru=the_gru)
attrib = attrib.drop(['hru','gru']) #summa doesn't like these to have coordinates
attrib.to_netcdf(settings_folder+'/attributes.nc')
attrib.close()
The trial parameters file is a NetCDF file that specifies model parameters for GRUs and individual HRUs. This enables the user to overwrite the default and/or Noah-MP parameter values with basin-specific ones.
Since the CAMELS parameter NetCDF file trialParams.camels.Oct2020.nc is quite small (250 KB), a copy is included in the SUMMA model folder. We just need to subset it to the selected basin.
# remove existing parameters.nc if any
! rm -rf {settings_folder}/parameters.nc
# Parameters
param_orig = xr.open_dataset(settings_folder+'/trialParams.camels.Oct2020.nc')
param = param_orig.copy()
param_orig.close()
param = param.assign_coords(hru=param['hruId'])
param = param.assign_coords(gru=gg) # there should be a gruId in here, but there wasn't
param = param.sel(hru=the_hru)
param = param.sel(gru=the_gru)
param = param.drop(['hru','gru']) #summa doesn't like these to have coordinates
param.to_netcdf(settings_folder+'/parameters.nc')
param.close()
!cd {settings_folder}; rm -rf init_cond.nc; {sys.executable} gen_coldstate.py attributes.nc init_cond.nc int
! cd {top_folder}; chmod +x installTestCases_local.sh; ./installTestCases_local.sh
# get full path to summa executable
executable = os.popen('which summa.exe').read().split("\n")[0]
file_manager = top_folder+'/settings/file_manager.txt'
!mkdir -p {output_folder}
import pysumma as ps
camels_summa = ps.Simulation(executable, file_manager)
camels_summa.manager['simStartTime'] = start_datetime
camels_summa.manager['simEndTime'] = end_datetime
camels_summa.manager.write()
print(camels_summa.manager)
# remove old output files if any
! rm -rf {output_folder}/*.*
%%time
camels_summa.run('local')
print(camels_summa.stdout)
camels_summa.output
output_variable = ['pptrate', 'airtemp', 'spechum', 'windspd', 'SWRadAtm', 'LWRadAtm', 'airpres', 'scalarCanopyWat', 'scalarSWE',
'scalarTotalSoilWat', 'scalarSenHeatTotal', 'scalarLatHeatTotal', 'scalarSnowSublimation', 'scalarRainPlusMelt',
'scalarInfiltration', 'scalarSurfaceRunoff', 'scalarSoilDrainage', 'scalarAquiferBaseflow', 'scalarTotalET',
'scalarTotalRunoff', 'scalarNetRadiation' ]
fig = plt.figure(figsize=(18,18))
for i in range(len(output_variable)):
fig.add_subplot(7, 3, i+1)
plt.plot(camels_summa.output['time'], camels_summa.output[output_variable[i]], label=output_variable[i])
plt.legend(fontsize=11, loc=2)
camels_summa.output.close()
decisions = {
'stomResist': ['Jarvis', 'BallBerry', 'simpleResistance'],
}
parameters = {
'aquiferBaseflowExp': [1.0, 5.0, 10.0],
'qSurfScale': [1.0, 100.0],
}
config = ps.ensemble.total_product(dec_conf=decisions, param_trial_conf=parameters)
print(len(config))
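Conceptually, `total_product` enumerates the Cartesian product of every decision and parameter option (3 x 3 x 2 = 18 runs here). A rough stand-alone sketch with itertools, for intuition only; this is not pysumma's implementation:

```python
import itertools

# All ensemble axes and their candidate values (same as the cells above)
options = {
    'stomResist': ['Jarvis', 'BallBerry', 'simpleResistance'],
    'aquiferBaseflowExp': [1.0, 5.0, 10.0],
    'qSurfScale': [1.0, 100.0],
}

# One dict per ensemble member, pairing each axis with one chosen value
combos = [dict(zip(options, values))
          for values in itertools.product(*options.values())]
```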
import tempfile
import shutil, os
workspace_dir = os.path.join(os.getcwd(), 'workspace')
!mkdir -p {workspace_dir}
unzip_dir = tempfile.mkdtemp(dir=workspace_dir)
model_folder_name = "summa_camels"
model_folder = os.path.join(unzip_dir, model_folder_name)
shutil.make_archive(model_folder_name, 'zip', os.getcwd()+"/summa_camels")
!unzip -o {model_folder_name}.zip -d {model_folder}
shutil.rmtree(os.path.join(model_folder, "output"))
!mkdir -p {os.path.join(model_folder, "output")}
import json
with open(os.path.join(model_folder, 'summa_options.json'), 'w') as outfile:
json.dump(config, outfile)
# check ensemble parameters
print("Number of ensemble runs: {}".format(len(config)))
print(json.dumps(config, indent=4, sort_keys=True)[:800])
print("...")
from job_supervisor_client import *
communitySummaSession = Session('summa', isJupyter=True)
communitySummaJob = communitySummaSession.job() # create new job
communitySummaJob.upload(model_folder)
communitySummaJob.submit(payload={
"node": 18,
"machine": "keeling",
"file_manager_rel_path": "settings/file_manager.txt"
})
%%time
communitySummaJob.events(liveOutput=True)
%%time
job_dir = os.path.join(model_folder, "{}".format(communitySummaJob.id))
!mkdir -p {job_dir}/output
communitySummaJob.download(job_dir)
!cd {job_dir} && unzip *.zip -d output
# check output directory
output_path = os.path.join(job_dir, "output")
# check SUMMA output file
name_list = os.listdir(output_path)
full_list = [os.path.join(output_path,i) for i in name_list if i.endswith(".nc")]
sorted_list = sorted(full_list)
sorted_list = sorted(sorted_list, key=lambda v: v.upper())
for f in sorted_list:
print(f)
print("Number of NC files: {}".format(len(sorted_list)))
import matplotlib.pyplot as plt
%matplotlib inline
def plot_ensemble_output_var(nc_list, var_name):
fig = plt.figure(figsize=(18, 10))
for i in nc_list:
ds = xr.open_dataset(i)
plt.plot(ds.time.values, ds[var_name].values, label=i.split("_")[-2])
plt.legend()
ds.close()
plot_ensemble_output_var(sorted_list, "scalarTotalRunoff")
plot_ensemble_output_var(sorted_list, "scalarInfiltration")
plot_ensemble_output_var(sorted_list, "scalarAquiferBaseflow")
plot_ensemble_output_var(sorted_list, "scalarTotalSoilWat")