11.2 Advanced features

11.2.1 ensemble: Ensemble Runs

As with meta.analysis, if this section is missing, then PEcAn will not do an ensemble analysis.

  <ensemble>
    <size>1</size>
    <variable>NPP</variable>
    <samplingspace>
      <parameters>
        <method>uniform</method>
      </parameters>
      <met>
        <method>sampling</method>
      </met>
    </samplingspace>
  </ensemble>

An alternative configuration is as follows:

  <ensemble>
    <size>5</size>
    <variable>GPP</variable>
    <start.year>1995</start.year>
    <end.year>1999</end.year>
    <samplingspace>
      <parameters>
        <method>lhc</method>
      </parameters>
      <met>
        <method>sampling</method>
      </met>
    </samplingspace>
  </ensemble>

Tags in this block can be broken down into two categories: Those used for setup (which determine how the ensemble analysis runs) and those used for post-hoc analysis and visualization (i.e. which do not affect how the ensemble is generated).

Tags related to ensemble setup are:

  • size : (required) the number of runs in the ensemble.
  • samplingspace: (optional) Contains tags for defining how the ensembles will be generated.

Each piece of the sampling space can have a method tag and a parent tag. Method refers to the sampling method, while parent is used when the samples of two components need to be linked. If no tag is defined for a component, a single sample is generated and used for all ensemble members; this makes it possible to partition and study different sources of uncertainty. For example, if no met tag is defined, one met path is used for all ensemble members, so the output uncertainty comes entirely from the variability in the parameters. At the moment no sampling method is implemented for soil and vegetation. Available sampling methods for parameters can be found in the documentation of the PEcAn.utils::get.ensemble.samples function.

For cases where simulations need to run with a predefined set of parameters, met, and initial conditions, the restart argument can be used. Restart needs to be a list with the named elements runid, inputs, new.params (parameters), new.state (initial condition), ensemble.id (ensemble ids), start.time, and stop.time.

The restart functionality relies on model-specific functions called write_restart.modelname. Before using it, make sure that this function exists for your model of interest.
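
For illustration only, a restart list with the named elements above might be assembled roughly as in the following sketch. All values are placeholders, and the exact structure of each element (particularly new.params and new.state) depends on the model; check PEcAn.uncertainty::write.ensemble.configs and the relevant write_restart.modelname function before relying on it.

  # Hypothetical sketch of the restart list described above; every value is a placeholder.
  new.params <- list(data.frame(SLA = 15, Vcmax = 60))      # one parameter set per member
  new.state  <- matrix(c(5, 120), nrow = 1,
                       dimnames = list(NULL, c("AGB.pft", "TotSoilCarb")))
  restart <- list(
    runid       = "1002042001",                             # id(s) of the run(s) to restart
    inputs      = list(met = "/path/to/member_1.clim"),     # inputs for each ensemble member
    new.params  = new.params,                               # predefined parameters
    new.state   = new.state,                                # predefined initial condition
    ensemble.id = 1000000001,                               # ensemble id
    start.time  = as.POSIXct("2004-01-01", tz = "UTC"),
    stop.time   = as.POSIXct("2006-12-31", tz = "UTC")
  )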

Note: if the ensemble size is set to 1, PEcAn will select the posterior median parameter values rather than taking a single random draw from the posterior.

Tags related to post-hoc analysis and visualization are:

  • variable: (optional) name of one (or more) variables the analysis should be run for. If not specified, the sensitivity.analysis variable is used; if that is also missing, the default is GPP (Gross Primary Productivity).

(NOTE: This static visualization functionality will soon be deprecated as PEcAn moves towards interactive visualization tools based on Shiny and htmlwidgets).

This information is currently used by the following PEcAn workflow functions:

  • PEcAn.<MODEL>::write.config.<MODEL> - See above.
  • PEcAn.uncertainty::write.ensemble.configs - Write configuration files for ensemble analysis
  • PEcAn.uncertainty::run.ensemble.analysis - Run ensemble analysis

11.2.2 sensitivity.analysis: Sensitivity analysis

A sensitivity analysis is only performed if this section is defined. It will contain <quantile> or <sigma> nodes. If neither is given, the default is to use the median +/- [1 2 3] x sigma (i.e. the 0.00135, 0.0228, 0.159, 0.5, 0.841, 0.977, 0.999 quantiles). If the 0.5 (median) quantile is omitted, it will be added in the code.

<sensitivity.analysis>
  <quantiles>
    <sigma>-3</sigma>
    <sigma>-2</sigma>
    <sigma>-1</sigma>
    <sigma>1</sigma>
    <sigma>2</sigma>
    <sigma>3</sigma>
  </quantiles>
  <variable>GPP</variable>
  <perpft>TRUE</perpft>
  <start.year>2004</start.year>
  <end.year>2006</end.year>
</sensitivity.analysis>
  • quantiles/sigma : [optional] The number of standard deviations relative to the standard normal (i.e. “Z-score”) for which to perform the sensitivity analysis. For instance, <sigma>1</sigma> corresponds to the quantile associated with 1 standard deviation above the mean (i.e. 0.841). Use a separate <sigma> tag, all under the <quantiles> tag, to specify multiple quantiles (see the short R illustration after this list). Note that the quantile associated with -sigma is not added automatically – i.e. if you want +/- 1 standard deviation, you must include both <sigma>1</sigma> and <sigma>-1</sigma>.
  • start.date : [required?] start date of the sensitivity analysis (in YYYY/MM/DD format)
  • end.date : [required?] end date of the sensitivity analysis (in YYYY/MM/DD format)
    • NOTE: start.date and end.date are distinct from values set in the run tag because this analysis can be done over a subset of the run.
  • variable : [optional] name of one (or more) variables the analysis should be run for. If not specified, the default is GPP.
  • perpft : [optional] if TRUE a sensitivity analysis on PFT-specific outputs will be run. This is only possible if your model provides PFT-specific outputs for the variable requested. This tag only affects the output processing, not the number of samples proposed for the analysis nor the model execution.
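
The mapping between <sigma> values and quantiles is simply the cumulative distribution function of the standard normal, which you can verify in R; the values below are the same quantiles listed in the defaults above.

  # Quantiles of the standard normal corresponding to sigma = -3 ... 3
  round(pnorm(c(-3, -2, -1, 0, 1, 2, 3)), 5)
  # 0.00135 0.02275 0.15866 0.50000 0.84134 0.97725 0.99865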

This information is currently used by the following PEcAn workflow functions:

  • PEcAn.<MODEL>::write.config.<MODEL> – See above
  • PEcAn.uncertainty::run.sensitivity.analysis – Executes the uncertainty analysis

11.2.3 Parameter Data Assimilation

The following tags can be used for parameter data assimilation. More detailed information can be found here: Parameter Data Assimilation Documentation

11.2.4 Multi-Settings

Multi-settings allows you to do multiple runs across different sites. It can leverage site group distinctions to speed up the setup: it takes your settings and applies them to each site, changing only the site-level tags.

To start, add the multisettings tag within the <run></run> section of your xml:

<multisettings>
  <multisettings>run</multisettings>
</multisettings>

Additional tags exist for this section; the full set is shown here:

 <multisettings>
  <multisettings>assim.batch</multisettings>
  <multisettings>ensemble</multisettings>
  <multisettings>sensitivity.analysis</multisettings>
  <multisettings>run</multisettings>
 </multisettings>

These tags correspond to the different PEcAn analyses that need to know that multiple settings will be read in.

Next, add the following tags to denote the group of sites you want to use. This leverages site groups, which are defined in BETY.

 <sitegroup>
   <id>1000000022</id>
 </sitegroup>

If you add this tag, you must remove the <site> </site> tags from the <run> tag portion of your xml. The id of your sitegroup can be found by looking up your site group within BETY.

You do not have to use the sitegroup tag. You can manually add multiple sites using the structure in the example below.

Lastly, change the top-level tag to <pecan.multi>, meaning the top and bottom of your xml should look like this:

<?xml version="1.0"?>
<pecan.multi>
...
</pecan.multi>

Once you have defined these tags, you can run PEcAn, but there may be further specifications needed if you know that different data sources have different dates available.

Run workflow.R up until

# Write pecan.CHECKED.xml
PEcAn.settings::write.settings(settings, outputfile = "pecan.CHECKED.xml")

Once this section is run, you’ll need to open pecan.CHECKED.xml. You will notice that it has expanded from your original pecan.xml.

 <run>
  <settings.1>
   <site>
    <id>796</id>
    <met.start>2005/01/01</met.start>
    <met.end>2011/12/31</met.end>
    <name>Bartlett Experimental Forest (US-Bar)</name>
    <lat>44.06464</lat>
    <lon>-71.288077</lon>
   </site>
   <start.date>2005/01/01</start.date>
   <end.date>2011/12/31</end.date>
   <inputs>
    <met>
     <path>/fs/data1/pecan.data/dbfiles/AmerifluxLBL_SIPNET_site_0-796/AMF_US-Bar_BASE_HH_4-1.2005-01-01.2011-12-31.clim</path>
    </met>
   </inputs>
  </settings.1>
  <settings.2>
   <site>
    <id>767</id>
    <met.start>2001/01/01</met.start>
    <met.end>2014/12/31</met.end>
    <name>Morgan Monroe State Forest (US-MMS)</name>
    <lat>39.3231</lat>
    <lon>-86.4131</lon>
   </site>
   <start.date>2001/01/01</start.date>
   <end.date>2014/12/31</end.date>
   <inputs>
    <met>
     <path>/fs/data1/pecan.data/dbfiles/AmerifluxLBL_SIPNET_site_0-767/AMF_US-MMS_BASE_HR_8-1.2001-01-01.2014-12-31.clim</path>
    </met>
   </inputs>
  </settings.2>
....
</run>
  • The ... replaces the rest of the site settings for however many sites are within the site group.

Looking at the example above, take a close look at the <met.start></met.start> and <met.end></met.end> tags. You will notice that the dates differ between the two sites. In this example they were edited by hand to match the dates available for each site and source, so you must know the date range of your source in advance; only the CRUNCEP source has a check that will tell you if your dates are outside the available range. PEcAn will automatically populate these dates across sites according to the original start and end date settings.

In addition, you will notice that the <path></path> section contains the model-specific meteorological data file. You can add it by hand, or you can leave the normal tags that the met process workflow will use to convert the data into your model-specific format:

<met>
  <source>AmerifluxLBL</source>
  <output>SIPNET</output>
  <username>pecan</username>
</met>
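
After hand-editing pecan.CHECKED.xml (met dates and/or paths), one way to continue is to reload the edited file before running the remaining workflow steps. This is a minimal sketch, assuming the rest of your workflow.R operates on the settings object:

  # Reload the hand-edited multi-site settings before continuing the workflow
  settings <- PEcAn.settings::read.settings("pecan.CHECKED.xml")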

11.2.5 (experimental) State Data Assimilation

The following tags can be used for state data assimilation. More detailed information can be found here: State Data Assimilation Documentation

<state.data.assimilation>
  <process.variance>FALSE</process.variance>
  <sample.parameters>FALSE</sample.parameters>
  <state.variables>
    <variable>AGB.pft</variable>
    <variable>TotSoilCarb</variable>
  </state.variables>
  <spin.up>
    <start.date>2004/01/01</start.date>
    <end.date>2006/12/31</end.date>
  </spin.up>
  <forecast.time.step>1</forecast.time.step>
  <start.date>2004/01/01</start.date>
  <end.date>2006/12/31</end.date>
</state.data.assimilation>
  • process.variance : [optional] TRUE/FALSE flag for whether process variance should be estimated (TRUE) or not (FALSE). If TRUE, a generalized ensemble filter will be used; if FALSE, an ensemble Kalman filter will be used. Default is FALSE.
  • sample.parameters : [optional] TRUE/FALSE flag for whether parameters should be sampled for each ensemble member. This allows for more spread in the initial conditions of the forecast.
  • NOTE: If TRUE, you must also assign a vector of trait names to pick.trait.params within the sda.enkf function.
  • state.variables : [required] State variable(s) to be assimilated, each listed in a <variable> child tag (in PEcAn standard format). Default is “AGB” - Above Ground Biomass.
  • spin.up : [required] start.date and end.date for model spin up.
  • NOTE: start.date and end.date are distinct from values set in the run tag because spin up can be done over a subset of the run.
  • forecast.time.step : [optional] time step of the forecast, i.e. how often the model is stopped so the state can be updated.
  • start.date : [required?] start date of the state data assimilation (in YYYY/MM/DD format)
  • end.date : [required?] end date of the state data assimilation (in YYYY/MM/DD format)
  • NOTE: start.date and end.date are distinct from values set in the run tag because this analysis can be done over a subset of the run.

11.2.6 (experimental) Brown Dog

This section describes how to connect to Brown Dog, which facilitates data processing and conversion.

  <browndog>
    <url>...</url>
    <username>...</username>
    <password>...</password>
  </browndog>
  • url: (required) endpoint for Brown Dog to be used.
  • username: (optional) username to be used with the endpoint for Brown Dog.
  • password: (optional) password to be used with the endpoint for Brown Dog.

This information is currently used by the following R functions:

  • PEcAn.data.atmosphere::met.process – Generic function for processing meteorological input data.
  • PEcAn.benchmark::load_data – Generic, versatile function for loading data in various formats.

11.2.7 (experimental) Benchmarking

Coming soon…

11.2.8 Remote data module

This section describes the tags required for configuring remote_process.

  <remotedata>
    <out_get_data>...</out_get_data>
    <source>...</source>
    <collection>...</collection>
    <scale>...</scale>
    <projection>...</projection>
    <qc>...</qc>
    <algorithm>...</algorithm>
    <credfile>...</credfile>
    <out_process_data>...</out_process_data>
    <overwrite>...</overwrite>
  </remotedata>
  • out_get_data: (required) type of raw output requested, e.g., bands, smap
  • source: (required) source of remote data, e.g., gee or appeears
  • collection: (required) dataset or product name as it is provided on the source, e.g. “COPERNICUS/S2_SR” for gee or “SPL3SMP_E.003” for appeears
  • scale: (optional) pixel resolution required for some gee collections, recommended to use 10 for Sentinel 2
  • projection: (optional) type of projection. Only required for appeears polygon AOI type
  • qc: (optional) quality control parameter, required for some gee collections
  • overwrite: (optional) if TRUE, database checks will be skipped and existing data of the same type will be replaced entirely. When processed data is requested, the raw data required for creating it will also be replaced. Default is FALSE.

These tags are only required if processed data is requested:

  • out_process_data: (optional) type of processed output requested, e.g., LAI
  • algorithm: (optional) algorithm used for processing data, currently only SNAP is implemented to estimate LAI from Sentinel-2 bands
  • credfile: (optional) absolute path to JSON file containing Earthdata username and password, only required for AppEEARS
  • pro_mimetype: (optional) MIME type of the processed file
  • pro_formatname: (optional) format name of the processed file
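
Putting these tags together, a hypothetical configuration requesting Sentinel-2 bands from Google Earth Engine with SNAP-derived LAI as the processed output might look like the sketch below. Tag values (in particular qc and algorithm) are assumptions and should be checked against the registration files described next.

  <remotedata>
    <out_get_data>bands</out_get_data>
    <source>gee</source>
    <collection>COPERNICUS/S2_SR</collection>
    <scale>10</scale>
    <qc>1</qc> <!-- placeholder; see the registration files -->
    <out_process_data>LAI</out_process_data>
    <algorithm>snap</algorithm>
  </remotedata>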

Additional information for the module is taken from the registration files located at data.remote/inst/registration.

The output data from the module are returned in the following tags:

  • raw_id: input id of the raw file
  • raw_path: absolute path to the raw file
  • pro_id: input id of the processed file
  • pro_path: absolute path to the processed file