CyberTraining Workshop 2024: Emergency Evacuation Simulation with CyberGIS-Compute

Author: Rebecca Vandewalle rcv3@illinois.edu
Created: 6-15-24

This notebook provides an example of running an emergency evacuation simulation on a remote computing resource using CyberGIS-Compute. While this example will run on Virtual Roger (Keeling), a supercomputer at the University of Illinois, CyberGIS-Compute supports additional computing resources such as ACCESS (formerly XSEDE).

Overview: Running Code on CyberGIS-Compute

CyberGIS-Compute is a service for running High Performance Computing (HPC) jobs from a Jupyter Notebook. In this example, the emergency evacuation simulation is run twice, with each run handled by a separate task on the supercomputer. This small example demonstrates how to run a serial script with no built-in parallelization multiple times on CyberGIS-Compute, how to pass parameters from a notebook to CyberGIS-Compute, how to access standard HPC variables (such as node_ids) from within a CyberGIS-Compute job, and how to specify the correct working and results directories for running the job script and downloading the results. The goal of this example is to demonstrate how to use CyberGIS-Compute with few or no adjustments to the original serial script. The custom job in this notebook uses this repository: https://github.com/cybergis/cybergis-compute-fireabm.git .

Learning Objectives

  • Learn how to run an emergency evacuation simulation using Python from within a Jupyter Notebook
  • Learn how to use the CyberGIS-Compute graphic user interface to run code remotely on a High Performance Computing (HPC) resource

By the end of this session, you should be able to use remote HPC resources to run an emergency evacuation simulation and understand how this capability can work for your own code.

Introduction

The evacuation simulation code base is flexible and can serve a variety of purposes. Broadly, the code models the process of evacuation on a road network in which roads are progressively closed by wildfire spread. Each household, represented by a vehicle, must navigate out of the danger zone and re-route if the road it is currently on becomes blocked by the wildfire.

Import Libraries and Setup Simulation

You will need to install legacy software versions (for backwards compatibility). The first two lines of the next cell install older versions of networkx and osmnx; uncomment the third line if the older version of Shapely also needs to be installed.

Expected output:

Shapely version is 1.8.5.post1
Networkx version is 2.5.1
OSMnx version is 1.0.1
In [3]:
# install key libraries
!pip install --upgrade networkx==2.5.1 --quiet
!pip install --upgrade osmnx==1.0.1 --quiet
#!pip install --upgrade shapely==1.8.5.post1 --quiet

import shapely
print("Shapely version is", shapely.__version__)
import networkx
print("Networkx version is", networkx.__version__)
import osmnx
print("OSMnx version is", osmnx.__version__)
Shapely version is 1.8.5.post1
Networkx version is 2.5.1
OSMnx version is 1.0.1

In the next several cells, additional libraries and modules used in this notebook are imported and a few helper variables and functions are defined.

In [4]:
# import libraries

import glob
import IPython
from datetime import datetime
from shapely.ops import unary_union
import pytz
from pathlib import Path
import os
In [8]:
# set notebook parameters

out_path = Path(os.getcwd())
time_zone = pytz.timezone('America/Chicago') 
In [9]:
# import functions and text of FireABM_opt.py

from FireABM_opt import *
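# read each script inside a with block so the file handles are closed afterwards
with open('FireABM_opt.py') as fabm_file:
    fabm_lines = fabm_file.readlines()

with open('run_fireabm.py') as run_fabm_file:
    run_fabm_lines = run_fabm_file.readlines()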
In [10]:
# define helper function for displaying code
# start and stop are 0-indexed, line numbers are 1-indexed

def display_code_txt(start, stop, lines_obj):
    return IPython.display.Code(data="".join([(str(i+start+1)+" "+x) 
                                              for i, x in enumerate(lines_obj[start:stop])]), language='py3')

Evacuation Simulation Overview

The code models the process of evacuation on a road network in which roads are progressively closed by wildfire spread.

There are two input datasets needed:

  1. Household distribution information
  2. A set of wildfire perimeters with timestamps

The code has two stages: init and run.

  • In the init stage, a boundary for the evacuation zone is set. The initial positions of evacuee vehicles will be generated within the boundary according to the density of households. The position of the wildfire at timestamp 0 will be used to close intersecting road segments and no vehicles will be positioned on closed roads.
  • In the run stage, the wildfire position is updated at set intervals and newly intersected roads are closed. Vehicles attempt to exit the danger zone and re-route if the road they are on becomes blocked by the wildfire.
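
To make the two stages concrete, here is a minimal toy sketch of the init/run pattern for a single vehicle. It is not the FireABM implementation: the names `find_path`, `run_toy_evacuation`, and `TOY_ROADS` are hypothetical, the road network is a hand-made dictionary, fire spread is reduced to a schedule of road closures per step, and routing uses a fewest-hops breadth-first search rather than weighted shortest paths.

```python
from collections import deque


def find_path(adjacency, closed, start, exits):
    """Breadth-first search for a fewest-hops path from start to any exit,
    skipping closed road segments. Returns a list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in exits:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen and (node, neighbor) not in closed:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def run_toy_evacuation(adjacency, exits, start, closures, max_steps=20):
    """Move one vehicle toward an exit; closures maps step -> closed segments.
    Returns (nodes visited, True if the vehicle reached an exit)."""
    # init stage: close roads hit by the fire at step 0, then plan a route
    closed = set(closures.get(0, ()))
    path = find_path(adjacency, closed, start, exits)
    visited = [start]
    # run stage: update closures, re-route if needed, advance one segment
    for step in range(1, max_steps + 1):
        if path is None:
            return visited, False  # no open route: the vehicle is trapped
        if len(path) == 1:
            return visited, True   # the vehicle is already at an exit
        closed.update(closures.get(step, ()))
        if any(edge in closed for edge in zip(path, path[1:])):
            path = find_path(adjacency, closed, path[0], exits)
            if path is None:
                return visited, False
        path = path[1:]
        visited.append(path[0])
    return visited, visited[-1] in exits


# hypothetical four-road network with a single exit
TOY_ROADS = {"home": ["a", "b"], "a": ["exit", "b"], "b": ["c"], "c": ["exit"]}
```

Calling `run_toy_evacuation(TOY_ROADS, {"exit"}, "home", {2: [("a", "exit")]})` closes the segment from `a` to `exit` at step 2, so the vehicle detours through `b` and `c` instead of taking the direct route.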

Running sample code

First, we will run a small sample simulation to showcase working code. Before running the code, we create an output directory to save generated results.

In [11]:
# determine current path and make output directory for quick start

import os
out_path = os.getcwd() # save current path 
if not os.path.isdir('demo_quick_start'): # create quick start output directory
    os.mkdir('demo_quick_start')

The following line of code runs the wildfire agent-based model evacuation simulation one time and saves the results in the demo_quick_start folder.

Important parameters (these will be described in more depth later on):

  • -nv: number of vehicles to include in the simulation
  • -sd: seed, this number is used to set the randomization of the initial vehicle positions
  • -epath: current path
  • -ofd: name of the output directory used to store results
  • -strat: driving strategy used (quickest, dist, or dist+road_type_weight)
  • -rg: formatted OSMnx road graph stored as a pickle file
  • -exdsc: experiment description (a tag to help keep track of runs)
In [12]:
# run a small simulation

!python run_fireabm.py -nv 10 -sd 1 -epath "$out_path" \
-ofd demo_quick_start -strat dist -rg Sta_Rosa_2000.pkl \
-exdsc 'demo_run' -strd 1.0
/cvmfs/iguide.purdue.edu/software/conda/iguide/lib/python3.8/site-packages/geopandas/_compat.py:124: UserWarning: The Shapely GEOS version (3.10.3-CAPI-1.16.1) is incompatible with the GEOS version PyGEOS was compiled with (3.11.2-CAPI-1.17.2). Conversions between both will be slow.
  warnings.warn(

!! starting file parse at: 15:22:04

!! checking input parameters
!! input parameters OK

!! starting full run at 15:22:04 

!! run simulation

run params: 100% shortest distance i: 0 j: 1 SEED: 1 strat_perc 1.0
	END init
	fire spreads.... 2.0
	Finished at frame_number 129

success! no: 1 run_time: 0:00:21.525367 timestamp: 15:22:25

bad_seeds []

!! ending full run at 15:22:25, elapsed time: 0:00:21.525600
!! runs completed: 1 / 1
!! Full simulation block complete!

If you have run the previous cell with the default parameters, you will find three types of results are saved in subfolders for each run:

  • in folder 1files: A file of run information, which stores things like the number of vehicles, seeds, driving strategies, clearance times etc.
  • in folder 1trajs: A file of trajectories taken by each vehicle
  • in folder 1videos: A video of the completed simulation run

Note that the '1' at the start of the folder names is the experiment number, a flag that helps keep track of different groups of simulation runs. It is set by default in run_fireabm.py.
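
A quick way to check what a run produced is to list the contents of the three result subfolders. The helper below is a small sketch that assumes only the folder naming described above (experiment number followed by files, trajs, or videos); the function name `list_run_outputs` is hypothetical and not part of the FireABM code.

```python
from pathlib import Path


def list_run_outputs(out_dir, exp_no=1):
    """Return {subfolder name: sorted file names} for one experiment number.
    Missing subfolders are reported with an empty list."""
    out_dir = Path(out_dir)
    outputs = {}
    for kind in ("files", "trajs", "videos"):
        folder = out_dir / f"{exp_no}{kind}"
        if folder.is_dir():
            outputs[folder.name] = sorted(
                p.name for p in folder.iterdir() if p.is_file()
            )
        else:
            outputs[folder.name] = []
    return outputs
```

For the quick start run, `list_run_outputs('demo_quick_start')` would return a dictionary keyed by 1files, 1trajs, and 1videos.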

How the Evacuation Simulation Works

In this section, we'll look at how the evacuation simulation code works in more detail.

Running the simulation model takes two steps. The simulation object needs to be called twice, once to set up the simulation by calling the __init__ function, and once to actually run the simulation using the run function.

Here the simulation __init__ function in FireABM_opt.py is shown. Relevant parameters are defined below.

In [13]:
# show the simulation init section

display_code_txt(1164, 1165, fabm_lines)
Out[13]:
1165     def __init__(self, g, n, bbox=None, fire_perim=None, fire_ignit_time=None, fire_act_ts_min=60, fire_des_ts_sec=10, sim_type="main", sim_number=0, reset_interval=False, start_vehicle_positons=None, nav_weight_list=None, placement_prob=None, init_strategies=None):

NetABM __init__ input parameter description:

  • g - road graph: the input OSMnx road graph
  • n - number of vehicles: the total number of vehicles used in the simulation run
  • bbox - bounding box: the bounding box of the evacuation zone, lbbox is created by the create_bboxes function with the following buffer [2.1, 4.5, 3, 3]
  • fire_perim - fire perimeter: shapefile of fire perimeters
  • fire_ignit_time - fire ignition time: 'SimTime' value for the first perimeter used in the simulation, can be adjusted to start with a later perimeter, 60 is used because it is the first output fire perimeter
  • fire_des_ts_sec - fire update time intervals to seconds: translates intervals between fire times to seconds, used to speed up or slow down a fire spread from the input shapefile, for these experiments 100 is used, so that the fire expands every 100 timesteps (seconds)
  • reset_interval - reset fire interval: flag used to indicate the fire time has been translated
  • placement_prob - vehicle placement probability: value name containing the placement probabilities per initial vehicle placements
  • init_strategies - initial vehicle strategies: dictionary indicating which driving strategies are used and the percentage of vehicles that should be assigned to each strategy

Here the simulation run function in FireABM_opt.py is shown. Relevant parameters are defined below.

In [14]:
# show the simulation run section

display_code_txt(1442, 1443, fabm_lines)
Out[14]:
1443     def run(self, nsteps=None, mutate_rate=0.005, update_interval=100, save_args=None, opt_interval=35, strategies=['dist', 'dist+speed'], opt_reps=1, mix_strat=True, congest_time=25):

NetABM run input parameter description:

  • mutate_rate - the road segment closure rate: ignored if a fire shapefile is given, if no fire shapefile it is the rate at which road segments will randomly become blocked at an update interval
  • update_interval - the road closure interval: used to determine how often the mutate rate is applied to check whether roads should be closed
  • save_args - save file arguments: used to create simulation animation, contains fig, ax created by the setup_sim function, the result file name, the video file name, the folder name for storing the results, i and j which can be used to keep track of iterating through seeds and driving strategies, the seed used, the short tag describing the treatment, the short text describing the experiment, the experiment number and notebook number, and which road graph file is used. More details on these can be found in the 'Simulation output structure and explanation' help document
  • congest_time - congestion time: interval at which congestion is recorded

Note that nsteps, opt_interval, strategies, opt_reps, and mix_strat are not used for the current set of experiments.
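
The display_code_txt calls above depend on hard-coded line numbers, which stop matching if FireABM_opt.py is edited. As an alternative sketch, Python's standard inspect module can list parameter names and defaults directly from a function object. The `ToyModel` class below is a hypothetical stand-in with a shortened signature so the example runs on its own; in the notebook one would presumably pass `NetABM.__init__` or `NetABM.run` instead.

```python
import inspect

REQUIRED = "<required>"  # marker for parameters that have no default value


def describe_parameters(func):
    """Return {parameter name: default value} for a function, skipping self."""
    described = {}
    for name, parameter in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        if parameter.default is inspect.Parameter.empty:
            described[name] = REQUIRED
        else:
            described[name] = parameter.default
    return described


class ToyModel:
    """Hypothetical stand-in that borrows a few parameter names from NetABM."""

    def __init__(self, g, n, bbox=None, fire_des_ts_sec=10):
        self.g, self.n = g, n

    def run(self, nsteps=None, mutate_rate=0.005, update_interval=100):
        return None
```

Here `describe_parameters(ToyModel.run)` returns the three keyword names with their defaults, which makes it easy to spot values that a calling script overrides.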

Running an Evacuation Simulation

In this section, we will run an example simulation on the full road graph.

Calling the Simulation

run_fireabm.py calls both the __init__ and run functions as seen below.

In [15]:
# show the simulation __init__ section of run_fireabm.py

display_code_txt(164, 167, run_fabm_lines)
Out[15]:
165                 simulation = NetABM(road_graph, args.num_veh, bbox=lbbox, fire_perim=sr_fire, fire_ignit_time=args.start_fire_time,
166                                     fire_des_ts_sec=100, reset_interval=True, placement_prob='Pct_HH_Cpd',
167                                     init_strategies={major_strat: strat_perc})
In [16]:
# show the simulation run section of run_fireabm.py

display_code_txt(168, 171, run_fabm_lines)
Out[16]:
169                 simulation.run(save_args=(fig, ax, args.rslt_file_name, args.vid_file_name, args.out_folder,
170                                i, j, seed, treat_desc, args.exp_desc, args.exp_no, args.nb_no, args.road_graph_pkl), mutate_rate=0.005, update_interval=100)
171                 run_count += 1

Running the Simulation on a Full Road Graph

Note! Running the code in the following two cells will take approximately 15-20 minutes to complete.

The following code demonstrates running a larger simulation using the full road graph used for manuscript experiments (Sta_Rosa_8000.pkl). Only 200 vehicles are used so that the runtime is ~20 minutes instead of multiple hours. However, since this takes a while, switch run_long_full_sim to True in order to run the simulation.

In [14]:
# the example full simulation takes ~15-20 minutes
# change this variable to True to run

run_long_full_sim = False
In [15]:
# run full example (shortest distance driving strategy, 200 vehicles)

if run_long_full_sim:
    if not os.path.isdir('demo_full_example'): # create full example output directory
        os.mkdir('demo_full_example')
    if os.path.isdir(os.path.join("demo_full_example", "1files")):
        print("You have likely already ran this cell: the results will be the same!")
    else:
        !python run_fireabm.py -nv 200 -sd 2 -epath "$out_path" -ofd demo_full_example -strat dist \
        -rg Sta_Rosa_8000.pkl -exdsc 'Demo quickest strat comp to mjrds and dist' -strd 1.0 -rfn 'demo_result' \
        -vfn 'demo_output'
else:
    print("Change 'run_long_full_sim' to True to run this simulation!")
Change 'run_long_full_sim' to True to run this simulation!

Once this has finished running, you can view the output data in the demo_full_example folder.
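
When several runs are needed (for example, a few seeds for each driving strategy), the command lines can be generated rather than typed by hand. The sketch below only assembles argument lists from the flags already used in this notebook (-nv, -sd, -epath, -ofd, -strat, -rg, -exdsc, -strd); it does not run anything. The function names `build_fireabm_command` and `build_batch` are hypothetical, and the -strd value is passed through unchanged because its meaning is not described in the parameter list above.

```python
from itertools import product


def build_fireabm_command(num_vehicles, seed, out_path, out_dir, strategy,
                          road_graph, description, strd=1.0):
    """Assemble one run_fireabm.py call as an argument list."""
    return [
        "python", "run_fireabm.py",
        "-nv", str(num_vehicles),
        "-sd", str(seed),
        "-epath", str(out_path),
        "-ofd", out_dir,
        "-strat", strategy,
        "-rg", road_graph,
        "-exdsc", description,
        "-strd", str(strd),
    ]


def build_batch(seeds, strategies, **common):
    """Build one command per (seed, strategy) combination."""
    return [
        build_fireabm_command(seed=seed, strategy=strategy, **common)
        for seed, strategy in product(seeds, strategies)
    ]


commands = build_batch(
    seeds=[1, 2],
    strategies=["dist", "quickest"],
    num_vehicles=10,
    out_path=".",
    out_dir="demo_quick_start",
    road_graph="Sta_Rosa_2000.pkl",
    description="demo_run",
)
```

Each entry in `commands` is a plain list of strings, so it could be handed to `subprocess.run` one at a time or written into a job script.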

Driving Strategies used in the Evacuation Simulation

In this evacuation simulation, there are three driving strategies evacuees can choose from. In this section, the strategies are explained in more detail and demonstrated on a small road graph.

  1. Quickest path:

    • this driving strategy is commonly used in evacuation models

      How to select the path:

      • Select the shortest path from the vehicle position to the nearest exit using Dijkstra's algorithm weighted by both road segment length and road segment speed limit
      • As the simulation progresses, at each time step, for each road segment that has at least one vehicle on it, estimate the current road segment speed by the average speed of all of the vehicles currently on the road segment
      • Each time at least one road segment's estimated speed differs from its usual speed (by more than a small tolerance that allows for rounding), each vehicle redetermines its path using the estimated speed instead of the usual speed for each road segment containing vehicles
      • There typically needs to be a relatively large amount of congestion for it to make sense to take a detour that is longer distance-wise
      • A new path, using the same method as described above, will also be selected if a road segment on the current path is closed due to the wildfire spread
  2. Shortest path:

    • this driving strategy is also commonly used in evacuation models

      How to select the path:

      • Select the shortest path from the vehicle position to the nearest exit using Dijkstra's algorithm weighted by road segment length
      • A new path will only be selected if a road segment on the current path is closed due to the wildfire spread
  3. Major roads:

    • this driving strategy is used in an attempt to more realistically model traffic behavior observed in evacuations

      How to determine the path:

      • Select the shortest path from the vehicle position to the nearest exit using Dijkstra's algorithm weighted by road type weight multiplied by road segment length. Road type weights are determined according to the table below. Road type is a value found in OpenStreetMap's 'highway' column.
      • A new path will only be selected if a road segment on the current path is closed due to the wildfire spread
OSM road type | Weight
motorway, motorway_link, trunk, trunk_link | 1
primary, primary_link | 5
secondary, secondary_link | 10
tertiary, tertiary_link | 15
unclassified, residential, (other value) | 20
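
The three strategies differ mainly in the edge cost handed to Dijkstra's algorithm. The following sketch illustrates that idea with a plain-Python Dijkstra over a hand-made toy network; it is an illustration under stated assumptions, not the FireABM routing code. The names `edge_cost`, `best_path`, and `TOY_EDGES` are hypothetical, the toy lengths and speeds are invented, and the quickest strategy is shown only with free-flow travel time (length divided by speed), without the congestion-based speed updates described above.

```python
import heapq

# road type weights from the table above
ROAD_TYPE_WEIGHTS = {
    "motorway": 1, "motorway_link": 1, "trunk": 1, "trunk_link": 1,
    "primary": 5, "primary_link": 5,
    "secondary": 10, "secondary_link": 10,
    "tertiary": 15, "tertiary_link": 15,
}
DEFAULT_ROAD_TYPE_WEIGHT = 20  # unclassified, residential, (other value)


def edge_cost(attrs, strategy):
    """Cost of one road segment under a given driving strategy."""
    if strategy == "dist":
        return attrs["length"]
    if strategy == "quickest":
        return attrs["length"] / attrs["speed"]  # free-flow travel time only
    if strategy == "dist+road_type_weight":
        weight = ROAD_TYPE_WEIGHTS.get(attrs["highway"], DEFAULT_ROAD_TYPE_WEIGHT)
        return weight * attrs["length"]
    raise ValueError(f"unknown strategy: {strategy}")


def best_path(edges, source, target, strategy):
    """Dijkstra's algorithm over directed edges {(u, v): attrs}."""
    adjacency = {}
    for (u, v), attrs in edges.items():
        adjacency.setdefault(u, []).append((v, edge_cost(attrs, strategy)))
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor, step_cost in adjacency.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


# hypothetical network: a short residential route and a longer motorway route
TOY_EDGES = {
    ("start", "a"): {"length": 100.0, "speed": 10.0, "highway": "residential"},
    ("a", "exit"): {"length": 100.0, "speed": 10.0, "highway": "residential"},
    ("start", "b"): {"length": 150.0, "speed": 30.0, "highway": "motorway"},
    ("b", "exit"): {"length": 150.0, "speed": 30.0, "highway": "motorway"},
}
```

On this toy network the dist strategy picks the shorter residential route through `a`, while both quickest and dist+road_type_weight pick the motorway route through `b`, for different reasons: the motorway is faster, and its road type weight of 1 makes its weighted length much smaller.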

Road Strategy Demonstration

First we need to import the road graph. This particular graph only has two exits, both at the bottom left corner, in order to help visualize differences between paths chosen according to each driving strategy.

In [17]:
# import demo road graph used for driving strategy demo

road_graph_pkl = 'demo_road_graph.pkl'
road_graph = load_road_graph(road_graph_pkl)
gdf_nodes, gdf_edges = get_node_edge_gdf(road_graph)
(bbox, lbbox, poly, x, y) = create_bboxes(gdf_nodes, 0.1, buff_adj=[-1, -1, 0.5, -1])

The next cell displays the road graph and the bounding box.

In [18]:
# display road graph

check_graphs(gdf_edges, x, y);

As described above, road segment speeds and road types are important for the different routing strategies, so in the next two cells we view speeds and the road types found in this small road graph. Each segment has the same speed limit, which makes it easier to see differences between driving strategies.

In [19]:
# view speeds

view_edge_attrib(road_graph, 'speed', show_val=True, val=[1, 5, 10, 15, 20]);
Attribute: speed, Type: int64

Only one street has the designation of "motorway".

In [20]:
# view road types

view_edge_attrib(road_graph, 'highway')
Attribute: highway, Type: object
Out[20]:
(<Figure size 800x800 with 1 Axes>, <Axes: >)

Now we will run each driving strategy on this small graph with the following code blocks.

In [21]:
# make sure output directory exists

if not os.path.isdir('demo_driving_compare'):
    os.mkdir('demo_driving_compare')
In [22]:
# set run parameters

seed = 2
j = 0
exp_no, nb_no = 0, 0
strats = ['quickest', 'dist', 'dist+road_type_weight']
treat_desc = ['100% quickest', '100% shortest distance', '100% major roads']
exp_desc = 'demo compare driving strats'
out_path = Path(os.getcwd())
vid_path = out_path / "demo_driving_compare" / "0videos"
In [22]:
# run simulations

if os.path.isdir(os.path.join("demo_driving_compare", "0files")):
    print("You have likely already ran this cell: the results will be the same!")
else:
    # start timing before the loop so the elapsed time covers all runs
    start_full_run_time = datetime.now(time_zone)
    for i in range(len(strats)):
        print('Starting simulation run for', strats[i])

        road_graph_pkl = 'demo_road_graph.pkl'
        road_graph = load_road_graph(road_graph_pkl)
        gdf_nodes, gdf_edges = get_node_edge_gdf(road_graph)
        (bbox, lbbox, poly, x, y) = create_bboxes(gdf_nodes, 0.1, buff_adj=[-1, -1, 0.5, -1])

        fig, ax = setup_sim(road_graph, seed)
        simulation = NetABM(road_graph, 200, bbox=lbbox, fire_perim=None, fire_ignit_time=None, 
        fire_des_ts_sec=100, reset_interval=False, placement_prob=None, 
        init_strategies={strats[i]:1.0})

        simulation.run(save_args=(fig, ax, 'demo_driving_out', 'demo_driving_vid', 'demo_driving_compare', 
                 i, j, seed, treat_desc[i], exp_desc, exp_no, nb_no, 'demo_road_graph.pkl'), 
                       mutate_rate=0.000, update_interval=100)

    end_full_run_time = datetime.now(time_zone) 
    print('Run complete at', end_full_run_time.strftime("%H:%M:%S")+',', 'elapsed time:', 
          (end_full_run_time-start_full_run_time))
You have likely already ran this cell: the results will be the same!

Now download and view the result videos in the demo_driving_compare folder. We'll start with the simplest driving strategy, shortest distance. Notice that all the vehicles on and to the right of the motorway use the rightmost exit.

Now view the quickest driving strategy. Watch the vehicles that start at the top center closely: many of them switch to the leftmost exit because of the build-up of congestion on the right one.

Finally, view the major roads simulation. Here very few vehicles take the leftmost exit because the highway is preferred.

Working With and Customizing Simulation Input Data

In order to run the full evacuation simulation in a different location, you will need to create a new road graph, use a new households file, and import a new wildfire perimeter shapefile. More details about these data and the actions needed to create them for a given study area are discussed below.

A road graph created with the OSMnx Python library is a form of a Networkx graph. This graph data structure contains nodes, edges, and data associated with nodes and edges.

The road graph is specifically a MultiDiGraph, which is a directed graph (i.e. an edge connecting node A to node B is considered different from an edge connecting node B to node A) that can contain self loops (a node can be connected to itself) and parallel edges (there can be multiple edges in the same direction between two nodes).

In [23]:
# view road_graph type (MultiDiGraph)

type(road_graph)
Out[23]:
networkx.classes.multidigraph.MultiDiGraph

Viewing Road Graph Structure with NetworkX

Networkx graph methods can be used directly to interact with the road graph and its data. For example, you can list the nodes and edges, along with the data attached to each.

In [24]:
# list first nodes in graph 

list(road_graph.nodes)[:5]
Out[24]:
[38027658, 37995105, 38126875, 38077723, 38126878]
In [25]:
# list first edges in graph 

list(road_graph.edges)[:5]
Out[25]:
[(38126875, 38062557, 0),
 (38126875, 37982610, 0),
 (38126875, 38126878, 0),
 (38126875, 38123182, 0),
 (38077723, 38039364, 0)]
In [26]:
# list nodes and data

list(road_graph.nodes(data=True))[:2]
Out[26]:
[(38027658,
  {'y': 4442613.982683914,
   'x': 395159.05469646316,
   'osmid': 38027658,
   'highway': nan,
   'lon': -88.2304975,
   'lat': 40.1273174}),
 (37995105,
  {'y': 4442611.456380954,
   'x': 395482.3892753059,
   'osmid': 37995105,
   'highway': nan,
   'lon': -88.226703,
   'lat': 40.1273349})]
In [27]:
# list edges and data

list(road_graph.edges(data=True))[:2]
Out[27]:
[(38126875,
  38062557,
  {'osmid': 5341699,
   'name': 'East Eureka Street',
   'highway': 'residential',
   'oneway': False,
   'length': 141.2340907837702,
   'geometry': <shapely.geometry.linestring.LineString at 0x7f5a27fcac70>,
   'seg_time': 9.415606052251347,
   'speed': 15,
   'ett': 9.415606052251347,
   'length_n': 0.8249213332852637,
   'rt_weight': 20,
   'rt_weighted_len': 2824.681815675404,
   'rt_wght_len_n': 0.8658169133484331}),
 (38126875,
  37982610,
  {'osmid': 5341699,
   'name': 'East Eureka Street',
   'highway': 'residential',
   'oneway': False,
   'length': 140.7986940773007,
   'geometry': <shapely.geometry.linestring.LineString at 0x7f5a27fcac40>,
   'seg_time': 9.38657960515338,
   'speed': 15,
   'ett': 9.38657960515338,
   'length_n': 0.8213591096762171,
   'rt_weight': 20,
   'rt_weighted_len': 2815.973881546014,
   'rt_wght_len_n': 0.8630867682760824})]
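
The derived edge attributes printed above appear to follow two simple relationships: seg_time equals length divided by speed, and rt_weighted_len equals rt_weight multiplied by length. This is inferred from the printed values rather than from the preprocessing code, so treat it as an observation. The short check below copies the two edges' values from the output and tests both relationships; the function name `check_derived_attributes` is hypothetical.

```python
import math

# values copied from the two edges printed above
EDGE_SAMPLES = [
    {"length": 141.2340907837702, "speed": 15, "seg_time": 9.415606052251347,
     "rt_weight": 20, "rt_weighted_len": 2824.681815675404},
    {"length": 140.7986940773007, "speed": 15, "seg_time": 9.38657960515338,
     "rt_weight": 20, "rt_weighted_len": 2815.973881546014},
]


def check_derived_attributes(edge, rel_tol=1e-9):
    """Report whether each derived attribute matches its assumed formula."""
    return {
        "seg_time": math.isclose(
            edge["length"] / edge["speed"], edge["seg_time"], rel_tol=rel_tol
        ),
        "rt_weighted_len": math.isclose(
            edge["rt_weight"] * edge["length"], edge["rt_weighted_len"], rel_tol=rel_tol
        ),
    }


results = [check_derived_attributes(edge) for edge in EDGE_SAMPLES]
```

Both sample edges pass both checks. The rt_weight of 20 also agrees with the road type weight listed for residential roads.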

Viewing Road Graph Structure as a Geographic Data File

While direct access to the Networkx structure is powerful, working with the road graph as GeoDataFrames (the gdf in gdf_nodes and gdf_edges) is a useful way to inspect data attributes for nodes and edges. The tabular view can make it easier to sort and filter data.

Here it is easy to inspect road graph node attributes. Each node has an OSM ID, x and y coordinates, a highway designation value, longitude and latitude coordinates, and a point geometry column.

In [28]:
# inspect the first values of the nodes file

gdf_nodes.head()
Out[28]:
osmid (index) | y | x | osmid | highway | lon | lat | geometry
38027658 | 4.442614e+06 | 395159.054696 | 38027658 | NaN | -88.230497 | 40.127317 | POINT (395159.055 4442613.983)
37995105 | 4.442611e+06 | 395482.389275 | 37995105 | NaN | -88.226703 | 40.127335 | POINT (395482.389 4442611.456)
38126875 | 4.442376e+06 | 395016.213277 | 38126875 | NaN | -88.232135 | 40.125160 | POINT (395016.213 4442376.450)
38077723 | 4.441719e+06 | 395552.668665 | 38077723 | NaN | -88.225734 | 40.119309 | POINT (395552.669 4441719.467)
38126878 | 4.442496e+06 | 395017.693742 | 38126878 | NaN | -88.232137 | 40.126233 | POINT (395017.694 4442495.553)

Edges connect nodes. Each edge has a starting and ending node, which are designated as u and v nodes. Key values are used to differentiate between two edges that connect the same two nodes in the same direction (i.e. parallel edges). Edges also have an id and an OSMid. They can have a street name, have a road type (highway), have an indication of direction, length, maxspeed, number of lanes, and geometry. Additional columns are created during preprocessing to use in the simulation.

In [29]:
# inspect the first values of the edges file

gdf_edges.head()
Out[29]:
u | v | key | osmid | name | highway | oneway | length | geometry | seg_time | speed | ett | length_n | rt_weight | rt_weighted_len | rt_wght_len_n | lanes | maxspeed | ref
38126875 | 38062557 | 0 | 5341699 | East Eureka Street | residential | False | 141.234091 | LINESTRING (395016.213 4442376.450, 395037.851... | 9.415606 | 15 | 9.415606 | 0.824921 | 20 | 2824.681816 | 0.865817 | NaN | NaN | NaN
38126875 | 37982610 | 0 | 5341699 | East Eureka Street | residential | False | 140.798694 | LINESTRING (395016.213 4442376.450, 394882.417... | 9.386580 | 15 | 9.386580 | 0.821359 | 20 | 2815.973882 | 0.863087 | NaN | NaN | NaN
38126875 | 38126878 | 0 | 238681201 | North Fifth Street | motorway | False | 119.111384 | LINESTRING (395016.213 4442376.450, 395016.313... | 7.940759 | 15 | 7.940759 | 0.643923 | 1 | 119.111384 | 0.017556 | NaN | NaN | NaN
38126875 | 38123182 | 0 | 238681201 | North Fifth Street | motorway | False | 111.791100 | LINESTRING (395016.213 4442376.450, 395016.183... | 7.452740 | 15 | 7.452740 | 0.584032 | 1 | 111.791100 | 0.015261 | NaN | NaN | NaN
38077723 | 38039364 | 0 | 5337865 | North Mathews Avenue | residential | False | 106.460513 | LINESTRING (395552.669 4441719.467, 395552.811... | 7.097368 | 15 | 7.097368 | 0.540419 | 20 | 2129.210268 | 0.647770 | NaN | NaN | NaN

Working with Wildfire Input Data

Refer to the FlamMap Documentation for creating simulated wildfires. For this manuscript, the fire tutorial was used to generate a series of output perimeters. These were resized and relocated to fit in the study area. Although the resulting fire spread is not realistic for the location, it demonstrates how the simulation code can use results generated in FlamMap.

Looking at the columns of the wildfire shapefile, we can see that FlamMap has generated attributes for the fire. The two most important columns used in the simulation are the geometry column, which contains a polygon that makes up part of the fire perimeter at a specific point in time, and the SimTime column, which contains the simulated time in minutes that each row belongs to.

In [30]:
# inspect fire

fire_file = load_shpfile(road_graph, ("fire_input",'santa_rosa_fire.shp'))
fire_file.head()
Out[30]:
(index) | ENTITY | Fire_Type | Month | Day | Hour | SimTime | Acres | geometry
0 | 0.0 | Enclave Fire | 7 | 21 | 2000 | 2160.0 | 0.0 | POLYGON ((-2652855.321 4915293.918, -2652857.5...
1 | 1.0 | Enclave Fire | 7 | 21 | 2000 | 2160.0 | 0.0 | POLYGON ((-2653303.602 4916407.759, -2653338.8...
2 | 2.0 | Enclave Fire | 7 | 21 | 2000 | 2160.0 | 0.0 | POLYGON ((-2650361.820 4918183.587, -2650351.8...
3 | 3.0 | Enclave Fire | 7 | 21 | 2000 | 2160.0 | 0.0 | POLYGON ((-2650316.917 4918163.170, -2650315.1...
4 | 4.0 | Enclave Fire | 7 | 21 | 2000 | 2160.0 | 0.0 | POLYGON ((-2653587.593 4916708.431, -2653535.8...

The following code shows the fire perimeters used in the simulation colored by 'SimTime', i.e. the number of minutes elapsed since the start of the simulation. Two different color portions can be seen because in this case the fire does not spread during the night.

In [31]:
# display fire

fire_file.plot(column='SimTime', legend=True)
Out[31]:
<Axes: >

Working with Households Input Data

Household data is used to initially place vehicles in proportion to households within census tracts. Household data have been gathered from the US Census Bureau's 2014-2018 American Community Survey 5-Year Estimates, Table S1101. You can download a CSV of household data from the Census Bureau and join it to census tract shapefiles, which can also be downloaded from the Census Bureau. Although this simulation code expects census tracts, it could be modified to use other geographic areas.

This shapefile contains basic information about the census tract from the Census Bureau and the number of households per census tract has been joined to the shapefile. Important columns here are Tot_Est_HH, which contains the ACS estimate of total households per census tract from the above table, and geometry with the census tract geometry.

In [32]:
# inspect households

hh_tract = load_shpfile(road_graph, ("households", "Santa_Rosa_tracts_hh.shp"))
hh_tract.head()
Out[32]:
(index) | STATEFP | COUNTYFP | TRACTCE | GEOID | NAME | NAMELSAD | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | Santa_Rosa | Santa_Ro_1 | Santa_Ro_2 | Tot_Est_HH | geometry
0 | 06 | 055 | 201300 | 06055201300 | 2013 | Census Tract 2013 | G5020 | S | 3825479 | 6876 | +38.3935118 | -122.3655355 | NaN | NaN | NaN | NaN | POLYGON ((-2625602.592 4895396.940, -2624873.5...
1 | 06 | 055 | 202000 | 06055202000 | 2020 | Census Tract 2020 | G5020 | S | 6672855 | 46827 | +38.5814083 | -122.5826961 | NaN | NaN | NaN | NaN | POLYGON ((-2635488.687 4925562.735, -2635376.7...
2 | 06 | 097 | 151602 | 06097151602 | 1516.02 | Census Tract 1516.02 | G5020 | S | 33113703 | 228016 | +38.4218634 | -122.5985465 | 1400000US06097151602 | Census Tract 1516.02, Sonoma County, California | 1852.0 | 1852.0 | POLYGON ((-2646299.185 4912693.434, -2646244.1...
3 | 06 | 097 | 150607 | 06097150607 | 1506.07 | Census Tract 1506.07 | G5020 | S | 11093822 | 0 | +38.2766586 | -122.6437522 | 1400000US06097150607 | Census Tract 1506.07, Sonoma County, California | 1719.0 | 1719.0 | POLYGON ((-2657204.756 4892702.043, -2657201.8...
4 | 06 | 097 | 150610 | 06097150610 | 1506.10 | Census Tract 1506.10 | G5020 | S | 1349445 | 0 | +38.2654848 | -122.6401777 | NaN | NaN | NaN | NaN | POLYGON ((-2655973.729 4891745.883, -2655959.1...

The following cell shows the estimated number of households per census tract in the Santa Rosa area.

In [33]:
# display households

hh_tract.plot(column="Tot_Est_HH", legend=True)
Out[33]:
<Axes: >
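
Placing vehicles in proportion to households can be sketched as a weighted random draw over census tracts. The example below is a simplified illustration, not the placement routine in FireABM_opt.py: the function names are hypothetical, tracts with a missing (NaN) Tot_Est_HH are assumed to receive no vehicles, and each vehicle is assigned only to a tract rather than to a position on a road. The three household values are taken from rows 0, 2, and 3 of the table above.

```python
import math
import random


def placement_probabilities(households):
    """Convert household counts per tract into placement probabilities.
    Missing counts (None or NaN) are treated as zero households."""
    cleaned = {}
    for tract, count in households.items():
        missing = count is None or math.isnan(count)
        cleaned[tract] = 0.0 if missing else float(count)
    total = sum(cleaned.values())
    if total <= 0:
        raise ValueError("no households available for placing vehicles")
    return {tract: count / total for tract, count in cleaned.items()}


def assign_vehicles(households, n_vehicles, seed):
    """Draw a tract for each vehicle, weighted by household share."""
    probabilities = placement_probabilities(households)
    candidates = [tract for tract, p in probabilities.items() if p > 0]
    weights = [probabilities[tract] for tract in candidates]
    rng = random.Random(seed)  # seeded so the placement is repeatable
    counts = {tract: 0 for tract in probabilities}
    for tract in rng.choices(candidates, weights=weights, k=n_vehicles):
        counts[tract] += 1
    return counts


# GEOID -> Tot_Est_HH, copied from the table above
TRACT_HOUSEHOLDS = {
    "06055201300": float("nan"),
    "06097151602": 1852.0,
    "06097150607": 1719.0,
}
```

Using a seeded generator mirrors the role of the -sd parameter described earlier: the same seed gives the same initial placement, so runs with different driving strategies can be compared on equal footing.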

Scaling Up with CyberGIS-Compute

Because one full simulation run using the quickest driving strategy can take approximately 5 hours, remote HPC resources can be very useful for obtaining simulation results. CyberGIS-Compute is a service for running High Performance Computing (HPC) jobs from a Jupyter Notebook within the I-GUIDE platform. In this example, the FireABM simulation script is run twice, with each run handled by a separate task.

This small example demonstrates how to run a serial script with no built-in parallelization multiple times on CyberGIS-Compute, how to pass parameters from a notebook to CyberGIS-Compute, how to access standard HPC variables (such as node_ids) from within a CyberGIS-Compute job, and how to specify the correct working and results directories for running the job script and downloading the results. The goal of this example is to demonstrate how to use CyberGIS-Compute with few or no adjustments to the original serial code. The CyberGIS-Compute job in this section uses this repository: https://github.com/cybergis/cybergis-compute-fireabm.git .

Load the CyberGIS-Compute Client

The CyberGIS-Compute client is the middleware that makes it possible to access High Performance Computing (HPC) resources from within a CyberGISX Jupyter Notebook. The first cell loads the client.

In [34]:
# load cybergis-compute client
from cybergis_compute_client import CyberGISCompute

Next it is necessary to create an object for the compute client.

In [35]:
# create cybergis-compute object
cybergis = CyberGISCompute(url="cgjobsup.cigi.illinois.edu", 
                           isJupyter=True, protocol="HTTPS", port=443, suffix="v2")

CyberGIS-Compute works by pulling data and information about the job from trusted GitHub repositories. Each job requires a GitHub repository to be created and specified when the job is created. After the GitHub repository is created, the CyberGISX team must be contacted to review the repository and, if it is approved, add it to the repositories that can be used with CyberGIS-Compute. You can see which repositories are supported using the list_git function. The repository we will be using is linked under fireabm.

In [36]:
# list available repositories for jobs
cybergis.list_git()
link | name | container | repository | commit
git://wrfhydro-5.x | WRFHydro | wrfhydro-5.x | https://github.com/cybergis/cybergis-compute-v2-wrfhydro.git | NONE
git://WRFHydro_Postprocess | WRFHydro Post Processing | wrfhydro-postprocess | https://github.com/I-GUIDE/cybergis-compute-wrfhydro-postprocessing | NONE
git://Watershed_DEM_Raster_Connector | Watershed DEM Raster Connector | deminput-connector | https://github.com/I-GUIDE/cybergis-compute-deminput-connector.git | NONE
git://Watershed_DEM_Processing | Watershed DEM Processing | demprocessing | https://github.com/I-GUIDE/cybergis-compute-demprocessing.git | NONE
git://three-examples | three-examples | cybergisx-0.4 | https://github.com/alexandermichels/cybergis-compute-examples.git | NONE
git://summa3 | SUMMA | summa-3.0.3 | https://github.com/cybergis/cybergis-compute-v2-summa.git | NONE
git://Subset_AORC_Forcing_Data_Processor | SubsetAORCForcing Data Processor | subsetaorcforcingdata-processor | https://github.com/I-GUIDE/cybergis-compute-subsetaorcforcingdata-processor.git | NONE
git://SimpleDataProc_Processor | SimpleDataProc Processor | simpledataprocess | https://github.com/I-GUIDE/cybergis-compute-SimpleDataProc | NONE
git://SimpleDataClean_Processor | SimpleDataClean Processor | simpledataclean | https://github.com/I-GUIDE/cybergis-compute-SimpleDataClean | NONE
git://pysal-access | Pysal Access Example | pysal-access | https://github.com/cybergis/pysal-access-compute-example.git | NONE
git://population_vulnerable_to_dam_failure | Populations Vulnerable to Dam Failure | extractinundationcensustracts-processor | https://github.com/cybergis/population_vulnerable_to_dam_failure_compute.git | NONE
git://mpi-test | MPI Hello World | mpich | https://github.com/cybergis/cybergis-compute-mpi-helloworld.git | NONE
git://hello_world | hello world | python | https://github.com/cybergis/cybergis-compute-hello-world.git | NONE
git://fireabm | hello FireABM | cybergisx-0.4 | https://github.com/cybergis/cybergis-compute-fireabm | NONE
git://Extract_Inundation_Census_Tracts_Processor | ExtractInundationCensusTracts Data Processor | extractinundationcensustracts-processor | https://github.com/I-GUIDE/cybergis-compute-extractinundationcensustracts-processor.git | NONE
git://ERA5_Connector | ERA5 Data Connector | era5input-connector | https://github.com/I-GUIDE/cybergis-compute-era5input-connector.git | NONE
git://DEM_Raster_Reprojection_Processor | DEM Raster Reprojection Processor | demreproject-processor | https://github.com/I-GUIDE/cybergis-compute-demreproject-processor.git | NONE
git://DEM_Raster_Merging_Processor | DEM Raster Merging Processor | demmerge-processor | https://github.com/I-GUIDE/cybergis-compute-demmerge-processor.git | NONE
git://DEM_Raster_Clipping_Processor | DEM Raster Clipping Processor | demclip-processor | https://github.com/I-GUIDE/cybergis-compute-demclip-processor.git | NONE
git://data_fusion | data fusion | datafusion | https://github.com/cybergis/data_fusion.git | NONE
git://Dam_Flood_Inundation_Map_Connector | Dam Flood Inundation Map Connector | damfiminput-connector | https://github.com/I-GUIDE/cybergis-compute-damfiminput-connector.git | NONE
git://Customized_Resilience_Inference_Measurement_Framework | CRIM | gearlab-r | https://github.com/rohan-debayan/CustomizedResilienceInferenceMeasurementFramework.git | NONE
git://CUAHSI_Subsetter_Connector | CUAHSISubsetterInput Data Connector | cuahsisubsetterinput-connector | https://github.com/I-GUIDE/cybergis-compute-cuahsisubsetterinput-connector.git | NONE
git://covid-access | COVID-19 spatial accessibility | cybergisx-0.4 | https://github.com/cybergis/cybergis-compute-spatial-access-covid-19.git | NONE
git://ACESTest | CRIM | gearlab-r | https://github.com/alexandermichels/CustomizedResilienceInferenceMeasurementFramework.git | NONE

Review the FireABM GitHub Repository

(back to Table of Contents)

The custom repository used in this example is https://github.com/cybergis/cybergis-compute-fireabm.git .

This repo contains the following key files:

  • manifest.json: a file that controls how the CyberGIS-Compute job is run
  • runjobs.sh: a shell script that creates needed directories and runs run_fireabm.py
  • run_fireabm.py: the top-level Python script that runs the simulation
  • other files and directories: contain data and functions needed to run the simulation

manifest.json (https://github.com/cybergis/cybergis-compute-fireabm/blob/main/manifest.json) is a mandatory file. It must be a JSON file named manifest.json and must contain a JSON object of key-value pairs that are used by CyberGIS-Compute. In particular, the "name" value must be set, the "container" must be set ("cybergisx-0.4" contains the same modules as a CyberGISX notebook at the time this tutorial notebook was created), and the "execution_stage" must be set. In this case, "bash ./runjobs.sh" tells CyberGIS-Compute to run the shell script runjobs.sh when the job runs.
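
As a minimal sketch of the required keys described above, the following Python snippet builds a manifest-like dictionary and checks it before serializing it to JSON. The "container" and "execution_stage" values are the ones quoted in this notebook; the "name" value is a placeholder and the helper missing_keys is hypothetical. The real manifest.json in the repository may contain additional keys.

```python
import json

# Keys the text above describes as required in manifest.json.
REQUIRED_KEYS = ("name", "container", "execution_stage")

# Illustrative values only: "name" is a placeholder, not the repository's
# actual job name.
manifest = {
    "name": "my-example-job",
    "container": "cybergisx-0.4",
    "execution_stage": "bash ./runjobs.sh",
}


def missing_keys(candidate):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not candidate.get(key)]


manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
print("missing keys:", missing_keys(manifest))
```

A check like this can be run locally before pushing a repository for review, so that an incomplete manifest is caught early.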

runjobs.sh (https://github.com/cybergis/cybergis-compute-fireabm/blob/main/runjobs.sh) is a shell script that runs when a CyberGIS-Compute Job is run. This script does the following actions:

  • sets a $SEED variable value based on the $param_start_value (a value set when the job is constructed within this Notebook) and $SLURM_PROCID (the task ID, a built-in variable populated when the job runs on HPC)
  • creates a directory in the $result_folder (a path set by the CyberGIS-Compute Client when the job is created)
  • on one task only: copies files to the $result_folder
  • runs the Python script run_fireabm.py (the serial starting script), passing in the $SEED value and the $result_folder value
  • on one task only: after the script is run, removes data files from the $result_folder (note that for real examples, this task is better done in the post_processing_stage)
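
The seed logic in the first bullet can be sketched in Python as follows. This is not the contents of runjobs.sh: it assumes the seed is the sum of the start value and the task ID, and that both are available as environment variables named param_start_value and SLURM_PROCID. The function name compute_seed and the start value of 20 are hypothetical.

```python
import os


def compute_seed(env=None):
    """Derive a per-task seed from the start value and the Slurm task ID.

    Assumption: seed = param_start_value + SLURM_PROCID. The real runjobs.sh
    may combine the two values differently.
    """
    env = os.environ if env is None else env
    start_value = int(env.get("param_start_value", "0"))
    task_id = int(env.get("SLURM_PROCID", "0"))
    return start_value + task_id


# Simulate two tasks of the same job; each one gets a distinct seed.
for task_id in ("0", "1"):
    fake_env = {"param_start_value": "20", "SLURM_PROCID": task_id}
    print("task", task_id, "-> seed", compute_seed(fake_env))
```

Because each task sees a different task ID, the same serial script can run once per task with a different seed and without changes to the script itself.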

Variables: This shell script uses variables and directories set in a few different places. The $SEED variable is created in runjobs.sh. The $param_start_value is a value that is passed to the CyberGIS-Compute client from a notebook. This value is set in the param array within the .set() function in the next section of this notebook. $SLURM_PROCID is a built-in variable set on the HPC (other available variables can be found here: https://slurm.schedmd.com/srun.html#lbAJ).

Directories: The CyberGIS-Compute client uses two primary directories, which are set when the job is created. The paths to these directories can be accessed through environment variables. Although scripts are run in the $executable_folder, results should be written to the $result_folder. These folders are not in the same location, so you might need to adjust your primary script if it writes result files to the script's own folder by default. In this example, the $result_folder variable is passed to the Python script, which requires an output path for writing results.
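
A minimal sketch of how a script could pick up the results directory is shown below. It assumes the path is exposed in an environment variable named result_folder (the spelling used in the runjobs.sh description above) and falls back to the current directory when the variable is absent. The function resolve_output_dir, the demo_output subdirectory, and summary.txt are hypothetical names used only for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_output_dir(env=None, subdir="demo_output"):
    """Return a directory for results, creating it if needed.

    Assumption: the job environment exposes the results path in a variable
    named result_folder. Outside a job, fall back to the current directory.
    """
    env = os.environ if env is None else env
    base = Path(env.get("result_folder", "."))
    out_dir = base / subdir
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


# A temporary directory stands in for the path provided at run time.
with tempfile.TemporaryDirectory() as fake_results:
    out_dir = resolve_output_dir({"result_folder": fake_results})
    (out_dir / "summary.txt").write_text("simulation finished\n")
    print(sorted(p.name for p in out_dir.iterdir()))
```

Taking the output path as an argument or environment value, rather than hard-coding it relative to the script, is what lets a serial script run unchanged both locally and inside a job.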

Execution Stages: The CyberGIS-Compute client supports three stages: "pre_processing_stage", "execution_stage", and "post_processing_stage". Each of these is a key in the manifest.json file whose value is the command to run. An example of a manifest.json file that uses all three stages can be found here: https://github.com/cybergis/cybergis-compute-hello-world/blob/main/manifest.json . Ideally, cleanup tasks should be performed in the "post_processing_stage" to ensure that all tasks in the execution stage have finished before cleanup begins.
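
To make the stage keys concrete, the sketch below lists the commands a manifest would run, assuming the stages run in the order their names suggest (pre-processing, execution, post-processing). The manifest shown is hypothetical: only "bash ./runjobs.sh" comes from this notebook, and cleanup.sh is an invented script name. The helper planned_commands is also hypothetical and is not part of CyberGIS-Compute.

```python
# Stage keys named in the text, in the order they are assumed to run.
STAGE_ORDER = ("pre_processing_stage", "execution_stage", "post_processing_stage")


def planned_commands(manifest):
    """List (stage, command) pairs for the stages a manifest defines."""
    return [(stage, manifest[stage]) for stage in STAGE_ORDER if stage in manifest]


# Hypothetical manifest that moves cleanup into the post-processing stage.
example_manifest = {
    "name": "my-example-job",
    "container": "cybergisx-0.4",
    "execution_stage": "bash ./runjobs.sh",
    "post_processing_stage": "bash ./cleanup.sh",
}

for stage, command in planned_commands(example_manifest):
    print(f"{stage}: {command}")
```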

Other files and directories in the repo: The FireABM simulation needs some small input data files and a specific input directory structure. These files and directories are included in the GitHub repo and will be copied to the $executable_folder by the CyberGIS-Compute Client.

How to Setup and Run the CyberGIS-Compute Job

(back to Table of Contents)

Using the show_ui() function, you can open the user interface for CyberGIS-Compute, an interactive component like the one shown in the image below.

Job Logs

First, you will need to select a template. The template loads information from the git repository we mentioned above and uses it to populate settings for the job. The template selector can be seen in the image below.

Job Logs

Now you will need to select job-specific parameters, such as which HPC resource to run on, how the job will run on that resource, and any input values needed by the code. These parameters differ for each job template. You can select a tab to expand the parameters within it and then adjust them. For this demo, use the already selected parameters.

Job Logs

Optionally you can give the job a custom name and provide an email address to get job status updates. Now the job can be submitted using the submit button at the bottom of the component.

Job Logs

Once the job has been submitted, you can view live updates through the logs.

Job Logs

You can also check out the status on the Your Job Status tab, illustrated in the image below.

Job Logs

Once the job is complete, you will see a notification that it has finished.

Job Logs

Finally, you can go to the Download Job Results tab to download the output produced when the job ran, including any result files the job created. For this example, select the folder demo_quick_start21 (the number might be different if you changed the start_value parameter). The results will download to a globus_download_{JOBID} folder in the root directory of your Jupyter environment, where JOBID is the ID of your CyberGIS-Compute job.

Job Logs
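
After a download finishes, a short helper like the one below can list what arrived. This is a sketch that assumes only the globus_download_{JOBID} folder naming described above; the function list_downloaded_files, the job ID EXAMPLEJOB, and the file output.txt are made up for the demonstration, which runs against a temporary directory rather than a real download.

```python
import tempfile
from pathlib import Path


def list_downloaded_files(root, job_id):
    """Return relative paths of files under globus_download_{job_id} in root."""
    download_dir = Path(root) / f"globus_download_{job_id}"
    if not download_dir.is_dir():
        return []
    return sorted(
        p.relative_to(download_dir).as_posix()
        for p in download_dir.rglob("*")
        if p.is_file()
    )


# Demonstration with a temporary directory and a made-up job ID and file name.
with tempfile.TemporaryDirectory() as root:
    fake = Path(root) / "globus_download_EXAMPLEJOB" / "demo_quick_start21"
    fake.mkdir(parents=True)
    (fake / "output.txt").write_text("placeholder\n")
    print(list_downloaded_files(root, "EXAMPLEJOB"))
```

In a real session, root would be the root directory of your Jupyter environment and job_id the ID shown in the CyberGIS-Compute interface.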

Run the CyberGIS-Compute Job

(back to Table of Contents)

Using the above instructions in the How to Setup and Run the CyberGIS-Compute Job section, run this next cell to bring up the CyberGIS-Compute user interface. Set the parameters for the job and run the job. When the job has finished, download the results.

In [7]:
cybergis.show_ui()
📃 Found "cybergis_compute_user.json! NOTE: if you want to login as another user, please remove this file
🎯 Logged in as rcv3@illinois.edu@jupyter.iguide.illinois.edu

Next Steps: Creating your own Custom Job

(back to Table of Contents)

If you are interested in using CyberGIS-Compute, documentation can be found at https://cybergis.github.io/cybergis-compute-python-sdk/index.html.

You can also learn more about setting up your own code to run on CyberGIS-Compute. This will involve setting up a GitHub repo with your code and a manifest.json file to direct how the code will run on a Slurm system.