Tutorial and Recipes

In this tutorial we walk through each important class and function, and explain each one’s purpose and use with an example. At the end of the tutorial in Example: Parameter sensitivity we show a more advanced workflow involving parameter sensitivity over dual parameter space that uses several of the previously described classes and functions.

Data

The Data Class class loads a PRMS data file into a Pandas DataFrame and allows for easy modification and writing of PRMS data files.

A PRMS data file holds time series variables that are used as input for running PRMS, e.g. daily air temperature and precipitation. Being tabular and date-indexed the data file is well represented and managed as a pandas.DataFrame. A common paractice in hydrologic modeling is to evaluate hydrologic response to multiple climate change scenarios. The Data class offers function-based modification of time series variable/s so that the user can quickly create new climate inputs for PRMS. The user can visualize the data file variables from easily using Pandas, matplotlib or other plotting libraries. After modification of climatic variables the Data.write method will save the data to disk in the PRMS ascii format.

from prms_python import Data
d = Data('test/data/data')

# modify daily temperature by adding 2 degrees to each input
def f(x):
  return (x + 2)

# apply function to daily temperature input
d.modify(f, ['tmax','tmin'])

# write new modified data file to disk
d.write('test/data/temp_plus2_data')

The Data.data_frame property is the pandas.DataFrame representation of hydro-climatic variables found in a PRMS data file. As such it can also be assigned within Python. However in order to create a data file from scratch using the Data object one must also assign the Data.metadata property. For an example of what is stored in the metadata attribute please refer to the data examples Jupyter notebook.

The write method of data has dual functions depending on the status of a Data instance– if the method is called before the Data.data_frame is accessed then the original file will be simply copied to the path given to write, on the other hand if the data_frame has been accessed within Python then the write method writes the current state of the data in the data_frame from memory. The first function is useful in reducing memory use and computational cost when using the Data class in more advanced workflows.

The Scenario and ScenarioSeries will soon incorporate the functionality to modify the climatic data within either a single Scenario or a series of Scenarios.

Parameters

The Parameters Class provides a NumPy-backed representation of a PRMS parameters file that allows the user to select, modify, and save PRMS parameters files. It can be used similarly to a Pandas DataFrame.

The PRMS parameters file contains data arrays of varying dimensionality, which is why we can’t simply use a DataFrame to do these manipulations. The implementation of the Parameters class is loosely based on the netCDF data structure, where metadata about each parameter is kept separately. Parameters are read into memory only if the user selects or modifies a particular parameter. This allows for memory-efficient processing of Parameter files.

Below is an example of reading a parameter file, reading a particular variable from a parameter file, replacing that parameter data with other data (in this case an array of all zeros), then saving the modified parameters to a new Parameters file.

from prms_python import Parameters
p = Parameters('test/data/parameter')

# select PRMS parameter by name, raising KeyError if DNE
snow_adj = p['snow_adj']
assert snow_adj.shape == (12, 16)

# assign values to PRMS parameter
import numpy as np
z = np.zeros(snow_adj.shape)
p['snow_adj'] = z  # now p['snow_adj'] is 12x16 matrix of zeros

# write modified parameters to file
p.write('newparameters')

The Scenario and ScenarioSeries use this functionality (via the prms_python.modify_params function) to implement either a single Scenario or a series of Scenarios.

Simulation & SimulationSeries

The Simulation class provides a simple wrapper around running the PRMS model. It encourages standardization of input file names by requiring the three PRMS inputs to be named data, parameters, and control. In order to add some natural metadata to the inputs, the user should use a memorable name for the directory that holds these three files. The Simulation and SimulationSeries however are more useful as building blocks for more advanced workflows or for new PRMS-Python submodules and routines, e.g. new optimization routines.

After the user prepares their input files, say into a directory called prms-sim-example, they can run the following in either a Python script or a Python REPL

from prms_python import Simulation
sim = Simulation('prms-sim-example')
sim.run()

This will run PRMS assuming the PRMS executable is on the system path and is called prms. In this usage, the outputs will go to the prms-sim-dir directory. If the user wishes to use a different executable name or provide the path to it explicitly, they can do so by replacing

sim.run()

with

sim.run(prms_executable='path/to/myPRMSExecutable')

Another available option is to specify a different directory to use as the “simulation directory,” which can be useful if you want to separate a directory with only input data from directories where both input and output model run data will be stored. You can do this by specifying an additional keyword argument in the Simulation constructor, like so

sim = Simulation('prms-sim-example', simulation_dir='sim-dir-1')
sim.run()

Additional examples of Simulation including the file structure can be found in the API prms_python.Simulation.run() and the class method from_data which allows for initialization from PRMS-Python Data and Parameter objects can be found in the API at prms_python.Simulation.from_data().

SimulationSeries

The SimulationSeries class offers the same functionality as Simulation for an arbitrary number of simulations, with the added function of running PRMS in parralel. The example below is taken from the PRMS-Python API.

Lets say you have already created a series of PRMS models by modifying the input climatic forcing data, e.g. you have 100 data files and you want to run each using the same control and parameters file. For simplicity lets say there is a directory that contains all 100 data files e.g. data1, data2, … or whatever they are named and nothing else. This example also assumes that you want each simulation to be run and stored in directories named after the data files as shown.

data_dir = 'dir_that_contains_all_data_files'
params = Parameters('path_to_parameter_file')
control_path = 'path_to_control'
# a list comprehension to make multiple simulations with
# different data files, alternatively you could use a for loop
sims = [
        Simulation.from_data
          (
            Data(data_file),
            params,
            control_path,
            simulation_dir='sim_{}'.format(data_file)
          )
        for data_file in os.listdir(data_dir)
        ]

Next we can use SimulationSeries to run all of these simulations in parrallel. For example we may use 8 logical cores on a common desktop computer.

sim_series = SimulationSeries(sims)
sim_series.run(nprocs=8)

The SimulationSeries.run() method will run all 100 simulations where chunks of 8 at a time will be run in parrallel. Inputs and outputs of each simulation will be sent to each simulation’s simulation_dir following the file structure of Simulation.run().

Scenario & ScenarioSeries

The Scenario class implements data management on top of the Simulation class, enforcing the user to separate base input data and simulation input and output data, plus simple, optional metadata. Let’s dive in with an example, assuming there are properly-formed files called data, control, and parameters, in a directory called base-inputs. We’ll use a simulation directory called sim-dir and further provide a title and description for the Scenario. If sim-dir exists it will be overwritten and if it does not exist it will be created. It’s up to the user to make sure data doesn’t get overwritten.

Both Scenarios and ScenarioSeries have a three-step process for set-up and run. First the Scenario or ScenarioSeries must be initialized with the base and simulation paths, plus, optionally, a title and description. Next, the Scenario(Series) must be “built”. This means defining which/how parameters should be modified.

Scenario

First, let’s see how we implement these three steps for a single Scenario. We’ll just increase one parameter, jh_coef, by 10%, or multiply by a scaling factor of 1.10.

sc = Scenario('base-inputs', 'sim-dir',
              title='Example Scenario',
              description='''
For the case of documentation we are including some example code.
Unless you actually have some inputs in the base-inputs directory used above
this will fail in an interpreter.
''')
def scale_1p1(x):
    return x * 1.1
sc.build({'jh_coeff': scale_1p1})
sc.run()

ScenarioSeries

Now let’s build and run a series of scenarios. Each Scenario in the series is specified by a dictionary that needs to have the title of the scenario and a key-value pair of parameter-function for every parameter that should be modified. In this example, we’ll still just scale jh_coef, but now over a range of values from 0.5 to 1.5, in increments of 0.1.

base_dir = '../models/lbcd/'
simulation_dir = 'example-sim-series-dir'
title = 'Jensen-Hays and Radiative Transfer Function Sensitivity Analysis'
description = '''
Use title of \'"jh_coef":{jh factor value}\' so later
we can easily generate a dictionary of these param/function combinations.
'''
sc_series = ScenarioSeries(base_dir, simulation_dir, title, description)

# define the scenario_list used to build the ScenarioSeries;
# build series in three steps:

#  1) define fun to return a function that scales a value by an amount
def _scale_fun(scale_val):
    def scale(x):
        return x * scale_val

    return scale
#  2) use the function generator `_scale_fun` in scenario_list comprehension
scenario_list = [
    {
        'title': '"jh_coef":{0:.1f}'.format(jh_val),
        'jh_coef': _scale_fun(jh_val),
    }
    for jh_val in np.arange(0.5, 1.5, 0.1)
]
#  3) "build" the series, meaning create scenario inputs and scenario dirs
sc_series.build(scenario_list)

sc_series.run()  # could provide nproc, ex: sc_series.run(nproc=10)

If, for example, we wanted to co-vary jh_coef with scalings of rad_trncf (or any other parameter) we can use the following as a recipe. Just add one more key/value pair to the dictionaries generated in the list comprehension that build the scenario_list.

scenario_list = [
    {
        'title': '"jh_coef":{0:.1f}|"rad_trncf":{1:.1f}'.format(jh_val, rad_val),
        'jh_coef': _scale_fun(jh_val),
        'rad_trncf': _scale_fun(rad_val)
    }
    for jh_val in np.arange(0.5, 1.5, 0.1)
    for rad_val in np.arange(0.5, 1.5, 0.1)
]

Note that this will square the number of scenarios to be done.

The title might look strange, but it is useful as part of the metadata to recover information about the individual Scenarios in the data analysis steps shown below in Example: Parameter sensitivity. Alternatively, if the title is omitted the subdirectory names of each scenario will not be intuitively matched to the unique universal identifiers that are assigned automatically by ScenarioSeries.build. However metadata for each scenario’s simulation will be stored in its respective directory and could later be used to refer which parameter(s) were modified and how because the metadata file contains a text representation of the Python functions that were used to modify the parameter(s).

Additional explanations and examples including the file structures and metadata created by the Scenario and ScenarioSeries are found in the API prms_python.Scenario and prms_python.ScenarioSeries.

Optimizer & OptimizationResult

The Optimizer class holds routines for PRMS parameter optimization or calibration, and sensitivity.uncertainty analysis. Currently the Optimizer.monte_carlo method offers a parameter resampling routine that can automate the resampling or an arbitrary number of PRMS parameters, conduct simulations for each set of resampled parameters, and self-generate metadata for each. The routine uses the stand-alone function prms_python.optimizer.resample_param which utilizes the uniform and normal distributions with added functionalities for parameters of varying dimensions. In other words there are different rules for parameter resampling for spatial parameters or parameters of large dimension than those of single value or monthly dimensions.

The OptimizationResult class is designed to aid management and analysis of output from a single optimization stage.

Note, this section is currently under development, please refer to the example Jupyter notebook here for detailed documentation of the monte_carlo parameter resampling routine. And the notebook here for explanations and examples for OptimizationResult.

load_data & load_statvar

Among other uses, if we want to compare the performance of our model to historical data for the purposes of parameterization or analyzing climate change scenarios, we will have to load the input and output hydrographs. The two functions prms_python.load_data and prms_python.load_statvar read the data and statvar files into a Pandas DataFrame, which allows for streamlined plotting and analysis.

Here is a simple example of how to use these functions to generate a plot like (not identical to) the one shown in Comparison of observed and modeled flow.

import matplotlib.pyplot as plt

from prms_python import load_data, load_statvar

data_df = load_data('path/to/data')
data_df.runoff_1.plot(label='observed')

statvar_df = load_statvar('path/to/statvar.dat')
statvar_df.basin_cfs_1.plot(label='modeled')

plt.legend()
plt.show()

Example: Parameter sensitivity

This is a full example of how the tools outlined above can be used together to build a parameter sensitivity analysis and goodness-of-fit. We’ll be modifying two parameters, the monthly jh_coef and the HRU scale rad_trncf. We will create a list of scenario definitions to “build” the prms_python.ScenarioSeries. We’ll then use the parallelized prms_python.ScenarioSeries.run() method to execute all requested scenarios.

This is adapted from the scenario_series.ipynb, viewable on GitHub. There are some details on customizing the plots that can be viewed there.

See inline comments for more details.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
import itertools
import matplotlib.pyplot as plt
import numpy as np

from prms_python import (
    ScenarioSeries, load_data, load_statvar, nash_sutcliffe
)

# define some ScenarioSeries metadata and initialize the series
base_dir = '../models/lbcd/'
simulation_dir = 'example-sim-series-dir'
title = 'Jensen-Hays and Radiative Transfer Function Sensitivity Analysis'
description = '''
Use title of \'"jh_coef":{jh factor value}|"rad_trncf":{rad factor value}\' so later
we can easily generate a dictionary of these factor value combinations.
'''
sc_series = ScenarioSeries(base_dir, simulation_dir, title, description)

# define the scenario_list used to build the ScenarioSeries;
# build series in three steps:

#  1) define fun to return a function that scales a value by an amount
def _scale_fun(scale_val):
    def scale(x):
        return x * scale_val

    return scale
#  2) use the function generator `_scale_fun` in scenario_list comprehension
scenario_list = [
    {
        'title': '"jh_coef":{0:.1f}|"rad_trncf":{1:.1f}'.format(jh_val, rad_val),
        'jh_coef': _scale_fun(jh_val),
        'rad_trncf': _scale_fun(rad_val)
    }
    for jh_val in np.arange(0.7, 1.0, 0.1)
    for rad_val in np.arange(0.7, 1.0, 0.1)
]
#  3) "build" the series, meaning create scenario inputs and scenario dirs
sc_series.build(scenario_list)

sc_series.run()  # could provide nproc, ex: sc_series.run(nproc=10)

# now we want to analyze the results by plotting the model efficiency matrix
# for the two parameters we varied, in three steps:
#  1) Load basin_cfs_1 streamflow timeseries for every scenario
metadata = json.loads(
    open(os.path.join(simulation_dir, 'series_metadata.json')).read()
)

def _build_statvar_path(uu):
    'Given a scenario UUID, build the path to the statvar file'
    return os.path.join(simulation_dir, uu, 'outputs', 'statvar.dat')

modeled_flows = {
    title: load_statvar(_build_statvar_path(uu)).basin_cfs_1
    for uu in metadata['uuid_title_map'].iteritems()
}

#  2) load the data file which contains the original streamflow
data_path = os.path.join(base_dir, 'data')
data_df = load_data_file(data_path)
observed = data_df.runoff_1

#  3) check model sensitivity via the Nash-Sutcliffe goodness of fit
# define index lookup for scaling labels
idx_lookup = {
    '{:.1f}'.format(val): idx
    for idx, val in enumerate(np.arange(0.7, 1.0, 0.1))
}
# initialize the Nash-Sutcliffe matrix with all zeros
nash_sutcliffe_mat = np.zeros((4, 4))
# build nash_sutcliffe_mat
for title, hydrograph in modeled_flows.iteritems():

    param_scalings = eval('{' + title.replace('|', ',') + '}')
    coord = (
        idx_lookup[str(param_scalings['jh_coef'])],
        idx_lookup[str(param_scalings['rad_trncf'])]
    )

    nash_sutcliffe_mat[coord] = nash_sutcliffe(observed, hydrograph)

# Finally let's visualize these results. First just a comparison of
# one of the modeled flows and the observed streamflow; Figure 1 below.
observed.plot(label='observed')

ex_uuid, ex_title = metadata['uuid_title_map'].iteritems().pop()
ex_modeled_flow = load_statvar(_build_statvar_path(ex_uuid)).basin_cfs_1
ex_modeled_flow.plot(label=ex_title.replace('"', '').replace('|', ', '))

# now let's plot the Nash-Sutcliffe Matrix, Figure 2 below
plt.ylabel('Streamflow (cfs)')
plt.legend()
plt.show()

fig, ax = plt.subplots()

cax = ax.matshow(nash_sutcliffe_mat, cmap='viridis')
tix = [0.7, 0.8, 0.9, 1.0]
plt.xticks(range(4), tix)
plt.yticks(range(4), tix)


ax.xaxis.set_ticks_position('bottom')
plt.ylabel('jh_coef factor')
plt.xlabel('rad_trncf factor')

for i, j in itertools.product(range(4), range(4)):
    plt.text(j, i, "%.2f" % nash_sutcliffe_mat[i, j],
             horizontalalignment="center",
             color="w" if nash_sutcliffe_mat[i, j] < .61 else "k")

plt.title('Nash-Sutcliffe Matrix')
plt.grid(b=False)
cbar = fig.colorbar(cax)

The resulting plots from the end of the example script are shown below

comparison of observed and modeled flow

Comparison of observed and modeled flow

nash-sutcliffe matrix

Nash-Sutcliffe Matrix of model efficiencies