15 Adding to PEcAn

Case studies
- Adding a model
- Adding new species, PFTs, and traits from a new site
  - Add a site
  - Add some species
  - Add PFT
  - Add trait data
- Adding a benchmark
- Adding a met driver
Reference (How to change tables)
- Models
- Species
- PFTs
- Traits
- Inputs
- DB files
- Variables
- Formats
- (Link each section to relevant Bety tables)

15.1 Case studies

Adding a model
Adding new species, PFTs, and traits from a new site
- Add a site
- Add some species
- Add PFT
- Add trait data
Adding a benchmark
Adding a met driver

15.2 Adding An Ecosystem Model

Adding a model to PEcAn involves two activities:

Updating the PEcAn database to register the model
Writing the interface modules between the model and PEcAn

Note that coupling a model to PEcAn should not require any changes to the model code itself. A key aspect of our design philosophy is that we want it to be easy to add models to the system and we want to using the working version of the code that is used by all other model users, not a special branch (which would rapidly end up out-of-date).

15.2.0.1 PEcAn Database

To run a model within PEcAn requires that the PEcAn database know about the model – this includes a MODEL_TYPE designation, the types of inputs the model requires, the location of the model executable, and the plant functional types used by the model. The instructions below assume that you will be specifying this information using the BETYdb web-based interface. This can be done either on your local VM (localhost:3280/bety or localhost:6480/bety) or on a server installation of BETYdb, though in either case we’d encourage you to set up your PEcAn instance to support database syncs so that these changes can be shared and backed-up across the PEcAn network.

The figure below summarizes the relevant database tables that need to be updated to add a new model and the primary variables that define each table.

15.2.0.2 Define MODEL_TYPE

The first step to adding a model is to create a new MODEL_TYPE, which defines the abstract model class which we will then use to specify input requirements, define plant functional types, and keep track of different model versions. A MODEL_TYPE is created by selecting Runs > Model Type and then clicking on New Model Type. The MODEL_TYPE name should be identical to the MODEL package name (see Interface Module below) and is case sensitive.

15.2.0.3 MACHINE

The PEcAn design acknowledges that the same model executables and input files may exist on multiple computers. Therefore, we need to define the machine that that we are using. If you are running on the VM then the local machine is already defined as pecan32 or pecan64 for the 32-bit and 64-bit versions respectively. Otherwise, you will need to select Runs > Machines, click New Machine, and enter the URL of your server (e.g. pecan2.bu.edu).

15.2.0.4 MODEL

Next we are going to tell PEcAn where the model executable is. Select Runs > Files, and click ADD. Use the pull down menu to specify the machine you just defined above and fill in the path and name for the executable. For example, if SIPNET is installed at /usr/local/bin/sipnet then the path is /usr/local/bin/ and the file (executable) is sipnet.

Now we will create the model record and associate this with the File we just registered. The first time you do this select Runs > Models and click New Model. Specify a descriptive name of the model (which doesn’t have to be the same as MODEL_TYPE), select the MODEL_TYPE from the pull down, and provide a revision identifier for the model (e.g. v3.2.1). Once the record is created select it from the Models table and click EDIT RECORD. Click on “View Related Files” and when the search window appears search for the model executable you just added (if you are unsure which file to choose you can go back to the Files menu and look up the unique ID number). You can then associate this Model record with the File by clicking on the +/- symbol. By contrast, clicking on the name itself will take you to the File record.

In the future, if you set up the SAME MODEL VERSION on a different computer you can add that Machine and File to PEcAn and then associate this new File with this same Model record. A single version of a model should only be entered into PEcAn once.

If a new version of the model is developed that is derived from the current version you should add this as a new Model record but with the same MODEL_TYPE as the original. Furthermore, you should set the previous version of the model as Parent of this new version.

15.2.0.5 FORMATS

The PEcAn database keep track of all the input files passed to models, as well as any data used in model validation or data assimilation. Before we start to register these files with PEcAn we need to define the format these files will be in. To create a new format see Formats Documentation.

15.2.0.5.1 MODEL_TYPE -> Formats

For each of the input formats you specify for your model, you will need to edit your MODEL_TYPE record to add an association between the format and the MODEL_TYPE. Go to Runs > Model Type, select your record and click on the Edit button. Next, click on “Edit Associated Formats” and choose the Format you just defined from the pull down menu. If the Input box is checked then all matching Input records will be displayed in the PEcAn site run selection page when you are defining a model run. In other words, the set of model inputs available through the PEcAn web interface is model-specific and dynamically generated from the associations between MODEL_TYPEs and Formats. If you also check the Required box, then the Input will be treated as required and PEcAn will not run the model if that input is not available. Furthermore, on the site selection webpage, PEcAn will filter the available sites and only display pins on the Google Map for sites that have a full set of required inputs (or where those inputs could be generated using PEcAn’s workflows). Similarly, to make a site appear on the Google Map, all you need to do is specify Inputs, as described in the next section, and the point should automatically appear on the map.

15.2.0.6 INPUTS

After a file Format has been created then input files can be registered with the database. Creating Inputs can be found under How to insert new Input data.

15.2.0.7 PFTS (Plant Functional Types)

Since many of the PEcAn tools are designed to keep track of parameter uncertainties and assimilate data into models, to use PEcAn with a model it is important to define Plant Functional Types for the sites or regions that you will be running the model. PFTs are MODEL_TYPE specific, so when you create a new PFT entry (Data > PFTs; New PFT) you will want to choose your MODEL_TYPE from the pull down and then give the PFT a descriptive name (e.g. temperate deciduous).

15.2.0.7.1 Species

Within PEcAn there are no predefined PFTs and user can create new PFTs very easily at whatever taxonomic level is most appropriate, from PFTs for individual species up to one PFT for all plants globally. To allow PEcAn to query its trait database for information about a PFT, you will want to associate species with the PFT record by choosing Edit and then “View Related Species”. Species can be searched for by common or scientific name and then added to a PFT using the +/- button.

15.2.0.7.2 Cultivars

You can also define PFTs whose members are cultivars instead of species. This is designed for analyses where you want to want to perform meta-analysis on within-species comparisons (e.g. cultivar evaluation in an agricultural model) but may be useful for other cases when you want to specify different priors for some member of a species. You cannot associate both species and cultivars with the same PFT, but the cultivars in a cultivar PFT may come from different species, potentially including all known cultivars from some of the species, if you wish to and have thought about how to interpret the results.

It is not yet possible to add a cultivar PFT through the BETYdb web interface. See this GithHub comment for an example of how to define one manually in PostgreSQL.

15.2.0.8 PRIORS

In addition to adding species, a PFT is defined in PEcAn by the list of variables associated with the PFT. PEcAn takes a fundamentally Bayesian approach to representing model parameters, so variables are not entered as fixed constants but as Prior probability distributions (see below). Once Priors are defined for each model variable then you Edit the PFT and use “View Related Priors” to search for and add Prior distributions for each model parameter. It is important to note that the priors are defined for the variable name and units as specified in the Variables table. If the variable name or units is different within the model it is the responsibility of write.configs.MODEL function to handle name and unit conversions (see Interface Modules below). This can also include common but nonlinear transformations, such as converting SLA to LMA or changing the reference temperature for respiration rates.

There are a wide variety of priors already defined in the PEcAn database that often range from very diffuse and generic to very informative priors for specific PFTs. If the current set of Priors for a variable are inadequate, or if a prior needs to be specified for a new variable, this can be done under Data > Priors then “New Prior”. After using the pull-down menu to select the Variable you want to generate a prior for, the prior is defined by choosing a probability distribution and specifying values for that distribution’s parameters. These are labeled Parameter a & b but their exact meaning depends upon the distribution chosen. For example, for the Normal distribution a and b are the mean and standard deviation while for the Uniform they are the minimum and maximum. All parameters are defined based on their standard parameterization in the R language. If the prior is based on observed data (independent of data in the PEcAn database) then you can also specify the prior sample size, N. The Phylogeny variable allows one to specify what taxonomic grouping the prior is defined for, at it is important to note that this is just for reference and doesn’t have to be specified in any standard way nor does it have to be monophyletic (i.e. it can be a functional grouping). Finally, the Citation is a required variable that provides a reference for how the prior was defined. That said, there are a number of unpublished Citations in current use that simply state the expert opinion of an individual.

Additional information on adding PFTs, Species, and Priors can be found under [[Choosing PFTs]]

15.2.0.9 Interface Modules

15.2.0.9.1 Setting up the module directory (required)

PEcAn assumes that the interface modules are available as an R package in the models directory named after the model in question. The simplest way to get started on that R package is to make a copy the template directory in the pecan/models folder and re-name it to the name of your model. In the code, filenames, and examples below you will want to substitute the word MODEL for the name of your model (note: R is case-sensitive).

If you do not want to write the interface modules in R then it is fairly simple to set up the R functions describe below to just call the script you want to run using R’s system command. Scripts that are not R functions should be placed in the inst folder and R can look up the location of these files using the function system.file which takes as arguments the local path of the file within the package folder and the name of the package (typically PEcAn.MODEL). For example

## Example met conversion wrapper function
met2model.MODEL <- function(in.path, in.prefix, outfolder, start_date, end_date){
   myMetScript <- system.file("inst/met2model.MODEL.sh", "PEcAn.MODEL")
   system(paste(myMetScript, file.path(in.path, in.prefix), outfolder, start_date, end_date))
}

would execute the following at the Linux command line

inst/met2model.MODEL.sh in.path/in.prefix outfolder start_date end_date    `

15.2.0.9.2 DESCRIPTION

Within the module folder open the DESCRIPTION file and change the package name to PEcAn.MODEL. Fill out other fields such as Title, Author, Maintainer, and Date.

15.2.0.9.3 NAMESPACE

Open the NAMESPACE file and change all instances of MODEL to the name of your model. If you are not going to implement one of the optional modules (described below) at this time then you will want to comment those out using the pound sign #. For a complete description of R NAMESPACE files see here. If you create additional functions in your R package that you want to be used make sure you include them in the NAMESPACE as well (internal functions don’t need to be declared)

15.2.0.9.4 Building the package

Once the package is defined you will then need to add it to the PEcAn build scripts. From the root of the pecan directory, go into the scripts folder and open the file build.sh. Within the section of code that includes PACKAGES= add model/MODEL to the list of packages to compile. If, in writing your module, you add any other R packages to the system you will want to make sure those are listed in the DESCRIPTION and in the script scripts/install.dependencies.R. Next, from the root pecan directory open all/DESCRIPTION and add your model package to the Suggests: list.

At any point, if you want to check if PEcAn can build your MODEL package successfully, just go to the linux command prompt and run scripts/build.sh. You will need to do this before the system can use these packages.

15.2.0.9.5 write.config.MODEL (required)

This module performs two primary tasks. The first is to take the list of parameter values and model input files that it receives as inputs and write those out in whatever format(s) the MODEL reads (e.g. a settings file). The second is to write out a shell script, jobs.sh, which, when run, will start your model run and convert its output to the PEcAn standard (netCDF with metadata currently equivalent to the MsTMIP standard). Within the MODEL directory take a close look at inst/template.job and the example write.config.MODEL to see an example of how this is done. It is important that this script writes or moves outputs to the correct location so that PEcAn can find them. The example function also shows an example of writing a model-specific settings/config file, also by using a template.

You are encouraged to read the section above on defining PFTs before writing write.config.MODEL so that you understand what model parameters PEcAn will be passing you, how they will be named, and what units they will be in. Also note that the (optional) PEcAn input/driver processing scripts are called by separate workflows, so the paths to any required inputs (e.g. meteorology) will already be in the model-specific format by the time write.config.MODEL receives that info.

15.2.0.9.6 Output Conversions

The module model2netcdf.MODEL converts model output into the PEcAn standard (netCDF with metadata currently equivalent to the MsTMIP standard). This function was previously required, but now that the conversion is called within jobs.sh it may be easier for you to convert outputs using other approaches (or to just directly write outputs in the standard).

Whether you implement this function or convert outputs some other way, please note that PEcAn expects all outputs to be broken up into ANNUAL files with the year number as the file name (i.e. YEAR.nc), though these files may contain any number of scalars, vectors, matrices, or arrays of model outputs, such as time-series of each output variable at the model’s native timestep.

Note: PEcAn reads all variable names from the files themselves so it is possible to add additional variables that are not part of the MsTMIP standard. Similarly, there are no REQUIRED output variables, though time is highly encouraged. We are shortly going establish a canonical list of PEcAn variables so that if users add additional output variables they become part of the standard. We don’t want two different models to call the same output with two different names or different units as this would prohibit the multi-model syntheses and comparisons that PEcAn is designed to facilitate.

15.2.0.9.7 met2model.MODEL

met2model.MODEL(in.path, in.prefix, outfolder, start_date, end_date)

Converts meteorology input files from the PEcAn standard (netCDF, CF metadata) to the format required by the model. This file is optional if you want to load all of your met files into the Inputs table as described in How to insert new Input data, which is often the easiest way to get up and running quickly. However, this function is required if you want to benefit from PEcAn’s meteorology workflows and model run cloning. You’ll want to take a close look at [Adding-an-Input-Converter] to see the exact variable names and units that PEcAn will be providing. Also note that PEcAn splits all meteorology up into ANNUAL files, with the year number explicitly included in the file name, and thus what PEcAn will actually be providing is in.path, the input path to the folder where multiple met files may stored, and in.prefix, the start of the filename that precedes the year (i.e. an individual file will be named <in.prefix>.YEAR.nc). It is valid for in.prefix to be blank. The additional REQUIRED arguments to met2model.MODEL are outfolder, the output folder where PEcAn wants you to write your meteorology, and start_date and end_date, the time range the user has asked the meteorology to be processed for.

15.2.0.9.8 Commit changes

Once the MODEL modules are written, you should follow the Using-Git instructions on how to commit your changes to your local git repository, verify that PEcAn compiles using scripts/build.sh, push these changes to Github, and submit a pull request so that your model module is added to the PEcAn system. It is important to note that while we encourage users to make their models open, adding the PEcAn interface module to the Github repository in no way requires that the model code itself be made public. It does, however, allow anyone who already has a copy of the model code to use PEcAn so we strongly encourage that any new model modules be committed to Github.