45 Introduction to the PEcAn R API
The PEcAn API package (pecanapi) is designed to allow users to submit PEcAn workflows directly from an R session.
The basic idea is that users build the PEcAn settings object via an R script (manually, or using the included helper functions) and then use the RabbitMQ API to send this object to a Dockerized PEcAn instance running on a local or remote machine.
pecanapi is specifically designed to only depend on CRAN packages, and not on any PEcAn internal packages.
This makes it easy to install, and allows it to be used without needing to download and install PEcAn itself (which is large and has many complex R package and system dependencies).
It can be installed directly from GitHub as follows:
devtools::install_github("pecanproject/pecan", subdir = "api")
library(pecanapi)
This vignette covers the following major sections:
- Initial setup goes over the configuration, both inside and outside R, required to make pecanapi work.
- Registering a workflow goes over how to register a PEcAn workflow with the PEcAn database, including searching for the required site and model IDs.
- Building a settings object covers how to configure a PEcAn workflow using the PEcAn settings list.
- Finally, submitting a run covers how to submit the complete settings object for execution.
45.2 Initial setup
This tutorial assumes you are running a Dockerized instance of PEcAn on your local machine (hostname localhost, port 8000).
To check this, open a browser and try to access that hostname and port.
If you are trying to access a remote instance of PEcAn, you will need to substitute the hostname and port accordingly.
To perform database operations, you will also need to have read access to the PEcAn database.
Note that the PEcAn database Docker container (postgres) does not provide this by default, so you will need to open port 5432 (the PostgreSQL default) to that container.
You can do this by creating a docker-compose.override.yml file with the following contents in the root directory of the PEcAn source code:
version: "3"
services:
  postgres:
    ports:
      - 5432:5432
Here, the first port is the one used to access the database (can be any open port; most PostgreSQL applications assume 5432 by default), and the second is the port the database is actually running on (which will always be 5432).
After making this change, reload the postgres container by running docker-compose up -d.
To check that this works, open an R session and try to create a database connection object to the PEcAn database.
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  user = "bety",
  password = "bety",
  host = "localhost",
  port = 5432
)
DBI::dbListTables(con)[1:5]
This code should print out five table names from the PEcAn database. If it throws an error, you have a problem with your database connection.
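If the connection fails, it can help to separate "cannot connect at all" from "connected, but the database looks empty". The following is a minimal sketch of such a check, not part of pecanapi: check_connection is a hypothetical helper name, and the credentials are the same defaults used above.

```r
# Hypothetical helper (not part of pecanapi): returns TRUE if a connection
# can be opened and at least one table is visible, FALSE otherwise.
check_connection <- function(host = "localhost", port = 5432) {
  con <- tryCatch(
    DBI::dbConnect(
      RPostgres::Postgres(),
      user = "bety",
      password = "bety",
      host = host,
      port = port
    ),
    error = function(e) {
      # Report the underlying error instead of stopping the session.
      message("Could not connect: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(con)) return(FALSE)
  # Always close the temporary connection when the function exits.
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  length(DBI::dbListTables(con)) > 0
}

ok <- check_connection()
```

A FALSE result together with a "Could not connect" message usually points at the port mapping or the container state rather than at R itself.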
The rest of this tutorial assumes that you are using this same database connection object (con).
In addition, any API operations that modify the database will not work unless a user ID is set.
To avoid having to manually specify the ID each time, we can set it via
options(pecanapi.user_id = 99000000002)
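As a small illustration of how such an option behaves in base R (only the option name pecanapi.user_id comes from this tutorial; the second option name is made up for the example):

```r
# Set the option once per session, then read it back wherever it is needed.
options(pecanapi.user_id = 99000000002)
user_id <- getOption("pecanapi.user_id")

# The ID is larger than R's maximum integer (about 2.1e9), so it is stored
# as a double. Format it explicitly to avoid scientific notation.
user_id_chr <- format(user_id, scientific = FALSE)
print(user_id_chr)

# getOption() accepts a fallback for options that have not been set.
# "pecanapi.not_a_real_option" is an invented name used only for illustration.
fallback <- getOption("pecanapi.not_a_real_option", default = NA)
```

Options set this way last only for the current R session; placing the options() call in a startup file such as .Rprofile is one way to make it persistent.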
The pecanapi package has many other options that it uses for its default configuration, including the Docker server and RabbitMQ hostname and credentials.
To learn more about them, see the package documentation.
45.3 Registering a workflow with the database
For the PEcAn workflow to work, it needs to be registered with the PEcAn database.
In pecanapi, this is done via the insert_new_workflow function.
Building a workflow requires two important pieces of information: the model and site IDs.
If you know these for your site and model, you can pass them directly into insert_new_workflow.
However, chances are you may have to look them up in the database first.
pecanapi provides several search_* utilities to make this easier.
First, let’s pick a model.
To list all models, we can run search_models with no arguments (other than the database connection object, con).
models <- search_models(con)
We can narrow down our search by model name, revision, or “type”.
search_models(con, "ED")
search_models(con, "sipnet")
search_models(con, "ED", revision = "git")
Note that the search is case-insensitive by default and matches the input string anywhere in the field (that is, it allows arbitrary text before and after it).
See ?search_models to learn how to toggle this behavior.
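To make the matching rule concrete, here is a toy sketch in base R. It does not query the database and does not reproduce how search_models is implemented; toy_models, toy_search, and the exact and ignore.case arguments are all invented for illustration.

```r
# Toy stand-in for a models table; these rows are invented for illustration.
toy_models <- data.frame(
  id = 1:4,
  model_name = c("ED2", "ED2.2", "SIPNET", "BioCro"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of pecanapi): partial, case-insensitive
# matching by default, with switches for exact and case-sensitive matching.
toy_search <- function(data, name, exact = FALSE, ignore.case = TRUE) {
  column <- data[["model_name"]]
  if (ignore.case) {
    column <- tolower(column)
    name <- tolower(name)
  }
  keep <- if (exact) column == name else grepl(name, column, fixed = TRUE)
  data[keep, , drop = FALSE]
}

partial <- toy_search(toy_models, "ed")                      # partial match
exact <- toy_search(toy_models, "sipnet", exact = TRUE)      # whole-name match
strict <- toy_search(toy_models, "ed", ignore.case = FALSE)  # case-sensitive
```

With the defaults, "ed" matches both ED2 rows; with case-sensitive matching it matches nothing, which is why the permissive default is convenient for exploration.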
For the purposes of this tutorial, let’s use the SIPNET model because it has low input requirements and runs very quickly.
Specifically, let’s use revision 136.
We could grab the model ID from the search results, but pecanapi also provides an additional helper function for retrieving model IDs if you know the exact name and revision.
model_id <- get_model_id(con, "SIPNET", "136")
model_id
We can repeat this process for sites with the search_sites function (though there is currently no site counterpart to get_model_id).
Note the use of % as a wildcard (matches zero or more of any character, equivalent to the regular expression .*).
The two sites in the search below are largely identical, so we’ll use the one with more site information (i.e. where mat is not NA).
all_umbs <- search_sites(con, "umbs%disturbance")
all_umbs
site_id <- subset(all_umbs, !is.na(mat))[["id"]]
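The wildcard behaviour can be tried out without a database. The sketch below translates a % pattern into a regular expression and applies it to a few invented site names; like_to_regex and toy_sites are hypothetical and only illustrate the matching rule, not how search_sites works internally.

```r
# Invented site names used only to illustrate the matching rule.
toy_sites <- c(
  "UMBS Disturbance",
  "University of Michigan Biological Station (UMBS) disturbance plot",
  "UMBS",
  "Harvard Forest"
)

# Hypothetical helper: translate "%" into ".*".
# It assumes the pattern contains no other regular expression metacharacters.
like_to_regex <- function(pattern) gsub("%", ".*", pattern, fixed = TRUE)

pattern <- like_to_regex("umbs%disturbance")
matches <- toy_sites[grepl(pattern, toy_sites, ignore.case = TRUE)]
print(matches)
```

Only names that contain "umbs" followed, at any distance, by "disturbance" are kept, so the bare "UMBS" entry is excluded.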
With site and model IDs in hand, we are ready to create a workflow.
workflow <- insert_new_workflow(con, site_id, model_id, start_date = "2004-01-01", end_date = "2004-12-31")
workflow
The insert_new_workflow function inserts the workflow into the database and returns a data.frame containing the row that was inserted.
45.4 Building a settings object
Now that we have a workflow registered, we need to configure it via the PEcAn settings list.
The PEcAn settings list is a nested list providing parameters for the various actions performed by the PEcAn workflow, including the trait meta-analysis, processing input files, and running models.
It can be created manually with a bunch of nested list calls.
However, this is tedious and error-prone, so pecanapi provides several utilities that facilitate this process.
We start with a blank list.
settings <- list()
Let’s start by adding the workflow we created in the previous section to this list.
This is done via the add_workflow function, which takes as input a workflow data.frame and adds the relevant fields to the right places in the settings list.
settings <- add_workflow(settings, workflow)
The add_* functions work by incrementally adding to an input settings object and returning a new, modified settings object.
The first argument of these functions is always the settings list, which gives these functions a consistent syntax and makes it easy to string multiple settings modifications together using the magrittr pipe (%>%), similar to tidyverse tabular data manipulations.
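The design pattern itself is easy to reproduce in a few lines. The sketch below uses invented toy_add_* functions and invented field names, so it says nothing about the actual structure pecanapi produces; it only shows why "settings first, settings returned" makes calls chainable. It uses the native pipe (|>), which requires R 4.1 or later; magrittr's %>% behaves the same way here.

```r
# Hypothetical stand-ins for add_* helpers; the field names are invented.
toy_add_database <- function(settings, host = "postgres", user = "bety") {
  # modifyList merges the new branch into the existing settings list.
  modifyList(settings, list(database = list(host = host, user = user)))
}

toy_add_pft <- function(settings, name) {
  # Append one more "pft" entry to whatever is already stored.
  settings[["pfts"]] <- c(settings[["pfts"]], list(pft = list(name = name)))
  settings
}

# Because the settings list is always the first argument and the return
# value, the calls chain without intermediate variables.
settings_demo <- list() |>
  toy_add_database(host = "localhost") |>
  toy_add_pft("temperate.deciduous") |>
  toy_add_pft("temperate.coniferous")

str(settings_demo)
```

Each function leaves the parts of the list it does not own untouched, which is what allows the steps to be reordered or dropped independently.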
Let’s continue by adding a basic database configuration to this settings list.
settings <- add_database(settings)
The add_database function adds a sensible default configuration for the PEcAn database in the right place with the right names in the settings file.
These defaults can, of course, be modified in the function call (see ?add_database), or, better yet, by setting package options, which is where most add_* functions get their defaults.
Similarly, add_rabbitmq automatically adds the RabbitMQ configuration to the settings object.
Like add_database, it takes all of its defaults from package options.
settings <- add_rabbitmq(settings)
PFTs are added to the settings object with the add_pft function.
To search for PFTs, use the search_pfts function, which can take optional arguments for PFT name (name), description of its definition (definition), and model type (modeltype).
search_pfts(con, name = "deciduous", modeltype = "sipnet")
search_pfts(con, name = "tundra", modeltype = "ED")
Like search_sites, these functions are case-insensitive and do partial matching by default.
The add_pft function adds individual PFTs by name.
settings <- add_pft(settings, "temperate.deciduous")
This adds the temperate.deciduous PFT to the appropriate spot in the settings hierarchy.
Whereas add_pft adds a single PFT to the settings, add_pft_list can add a vector of PFTs.
settings <- add_pft_list(settings, c("temperate.coniferous", "miscanthus"))
Both add_pft and add_pft_list can also take arbitrary additional configuration arguments via their ... argument.
For add_pft, such arguments are passed only to that PFT, while for add_pft_list, they are shared between all PFTs.
For more details, see the documentation for these functions.
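The difference between per-PFT and shared arguments can be sketched with two toy functions. Everything here is hypothetical: toy_add_pft, toy_add_pft_list, and the tag and shared_tag fields are invented to show how extra arguments could be forwarded, and they are not the pecanapi implementations.

```r
# Hypothetical single-PFT helper: extra arguments are attached to this PFT only.
toy_add_pft <- function(settings, name, ...) {
  pft <- c(list(name = name), list(...))
  settings[["pfts"]] <- c(settings[["pfts"]], list(pft = pft))
  settings
}

# Hypothetical vector helper: the same extra arguments go to every PFT.
toy_add_pft_list <- function(settings, pfts, ...) {
  for (name in pfts) {
    settings <- toy_add_pft(settings, name, ...)
  }
  settings
}

pft_demo <- toy_add_pft(list(), "temperate.deciduous", tag = "only-this-pft")
pft_demo <- toy_add_pft_list(
  pft_demo,
  c("temperate.coniferous", "miscanthus"),
  shared_tag = "all-in-this-call"
)

str(pft_demo[["pfts"]])
```

In this sketch the first PFT carries only tag, while the two PFTs added together both carry the same shared_tag value.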
One final note is that, because the settings object is just a list, you can make arbitrary modifications to it via base R’s modifyList function (indeed, many of the pecanapi::add_* functions use modifyList under the hood).
customization <- list(
  meta.analysis = list(iter = 3000, random.effects = FALSE),
  run = list(
    inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))
  )
)
settings <- modifyList(settings, customization)
modifyList operates recursively on nested lists, which makes it easy to modify settings at different levels of the list hierarchy.
For instance, below, we modify the previous settings object to make random.effects = TRUE, and change the download method of the inputs to OpenDAP, but keep all the other settings the same.
settings <- modifyList(settings, list(
  meta.analysis = list(random.effects = TRUE),
  run = list(inputs = list(met = list(method = "opendap")))
))
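The recursive behaviour can be verified on a small standalone list. This sketch uses only base R; the list contents mirror the example above but are otherwise arbitrary.

```r
# A small nested list standing in for part of a settings object.
base <- list(
  meta.analysis = list(iter = 3000, random.effects = FALSE),
  run = list(inputs = list(met = list(source = "CRUNCEP", method = "ncss")))
)

# Only the named leaves are replaced; their siblings are preserved.
updated <- modifyList(base, list(
  meta.analysis = list(random.effects = TRUE),
  run = list(inputs = list(met = list(method = "opendap")))
))
str(updated)

# By contrast, plain assignment replaces the whole branch, dropping "source".
overwritten <- base
overwritten[["run"]] <- list(inputs = list(met = list(method = "opendap")))

# Setting an element to NULL in the update removes it from the result.
trimmed <- modifyList(updated, list(meta.analysis = list(iter = NULL)))
```

This is why modifyList is usually the safer choice for small tweaks: an update only needs to spell out the path to the values that change.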
All of these steps can be chained together via magrittr pipes (%>%).
library(magrittr)
settings <- list() %>%
  add_workflow(workflow) %>%
  add_database() %>%
  add_rabbitmq() %>%
  add_pft("temperate.deciduous") %>%
  add_pft("temperate.coniferous") %>%
  modifyList(list(
    meta.analysis = list(iter = 3000, random.effects = FALSE),
    run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
    host = list(rabbitmq = list(
      uri = "amqp://guest:guest@rabbitmq:5672/%2F",
      queue = "SIPNET_136"
    ))
  ))
45.5 Submitting a run
Now that we have all the pieces, let’s put them together into a single settings object.
settings <- list() %>%
  add_workflow(workflow) %>%
  add_database() %>%
  add_pft("temperate.deciduous") %>%
  modifyList(list(
    meta.analysis = list(iter = 3000, random.effects = FALSE),
    run = list(inputs = list(met = list(source = "CRUNCEP", output = "SIPNET", method = "ncss"))),
    host = list(rabbitmq = list(
      uri = "amqp://guest:guest@rabbitmq:5672/%2F",
      queue = "SIPNET_136"
    ))
  ))
We can then submit these settings as a run via the submit_workflow function.
This function has only one required input (the settings list) but a number of optional arguments for specifying how to connect to the RabbitMQ API (see ?submit_workflow for details).
If the workflow was submitted successfully, this will return the HTTP response routed = TRUE as a named list.
Note that this only means that the RabbitMQ message was posted; the workflow can still crash for various reasons.
To see the status of the workflow, look at docker-compose logs executor or use the Portainer interface.
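In a script, it can be useful to fail early when the message was not routed. The following is a minimal sketch, assuming only that the response is a named list with a routed element as described above; check_routed is a hypothetical helper, and the response object here is a hand-built stand-in rather than a real reply.

```r
# Hypothetical helper: stop early if the message was not routed to a queue.
check_routed <- function(response) {
  if (!isTRUE(response[["routed"]])) {
    stop("Message was not routed; check the queue name and RabbitMQ settings.")
  }
  invisible(TRUE)
}

# Stand-in for the named list returned after a successful submission.
response <- list(routed = TRUE)
check_routed(response)
```

A successful check still only confirms delivery to the queue, so the executor logs remain the place to look for errors in the run itself.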
45.6 Processing output
All of PEcAn’s outputs as well as its database files (dbfiles) can be accessed remotely via the THREDDS data server.
You can explore these files by browsing to localhost:8000/thredds/ in a browser (substituting hostname and port accordingly).
All files, regardless of file type, can be downloaded directly (via HTTP) through THREDDS.
In pecanapi, URLs for these files can be easily constructed via output_url for any workflow output and run_url for run-specific outputs.
For instance, to read the workflow.Rout file from the workflow we created earlier, you can do the following:
workflow_id <- workflow[["id"]]
readLines(output_url(workflow_id, "workflow.Rout"))
Outputs in NetCDF format can also be accessed via the OpenDAP service, which allows remote variable selection and subsetting (meaning you can download only the outputs you need, without downloading the entire file).
These URLs are created via thredds_dap_url (for a generic URL) or run_dap (to access outputs from a specific model run).
sipnet_out <- ncdf4::nc_open(run_dap(workflow_id, "2004.nc"))
gpp <- ncdf4::ncvar_get(sipnet_out, "GPP")
time <- ncdf4::ncvar_get(sipnet_out, "time")
ncdf4::nc_close(sipnet_out)
plot(time, gpp, type = "l")
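The time variable is a plain numeric vector, so the x-axis above is in whatever units the file declares. The sketch below shows one way to turn such values into date-times. It assumes the units have the form "days since <date>"; both the units string and the time values here are synthetic, and in a real session the units could be read with ncdf4::ncatt_get(sipnet_out, "time", "units") before the file is closed.

```r
# Assumed units string and synthetic time values, for illustration only.
time_units <- "days since 2004-01-01 00:00:00"
time_demo <- c(0, 0.5, 1, 365.5)

# Strip the "days since " prefix to recover the origin timestamp.
origin_txt <- sub("^days since ", "", time_units)
origin <- as.POSIXct(origin_txt, tz = "UTC")

# POSIXct arithmetic is in seconds, so convert days to seconds.
datetime <- origin + time_demo * 86400
print(datetime)
```

Plotting against datetime instead of the raw values gives a calendar axis, which is usually easier to read for a full-year run.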