20.3 Adding input data

20.3.1 Input records in BETY

All model input data or data used for model calibration/validation must be registered in the BETY database.

Before creating a new Input record, you must make sure that the format type of your data is registered in the database. If you need to make a new format record, see Creating a new format record in BETY.

20.3.2 Create a database file record for the input data

An input record contains all the metadata required to identify the data. However, this record does not include the location of the data file. Since the same data may be stored in multiple places, every file has its own dbfile record.

From your BETY interface:

  • Create a DBFILES entry for the path to the file
    • From the menu click RUNS then FILES
    • Click “New File”
    • Select the machine your file is located on
    • Fill in the File Path with the folder (directory) where your file is located, NOT including the name of the file itself
    • Fill in the File Name with the name of the file itself. Note that some types of input records refer to ALL the files in a directory, so File Name can be left blank
    • Click Update
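
If you only have the full path to a file, the split between the File Path and File Name fields can be sketched in base R as follows. The path used here is a made-up example, not a real PEcAn location.

```r
# Hypothetical full path to a raw data file; replace with your own.
full_path <- "/data/met/US-NR1/US-NR1_2004.csv"

file_path <- dirname(full_path)   # value for the File Path field
file_name <- basename(full_path)  # value for the File Name field

print(file_path)
print(file_name)
```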

20.3.3 Creating a new Input record in BETY

From your BETY interface:

  • Create an INPUT entry for your data
    • From the menu click RUNS then INPUTS
    • Click “New Input”
    • Select the SITE that the input data set is associated with
    • Other required fields are a unique name for the input, the start and end dates of the data set, and the format of the data. If the data is not in a currently known format you will need to create a NEW FORMAT and possibly a new input converter. Instructions on how to add a converter can be found in Input conversion. Instructions on how to add a format record can be found in Creating a new format record in BETY
    • Parent ID is an optional field used to indicate that one dataset was derived from another
    • Click “Create”
  • Associate the DBFILE with the INPUT
    • In the RUNS -> INPUTS table, search and find the input record you just created
    • Click on the EDIT icon
    • Select “View related Files”
    • In the Search window, search for the DBFILE you just created
    • Once you have found the DBFILE, click on the “+” icon to add the file
    • Click on “Update” at the bottom when you are done

20.3.4 Adding a new input converter

Three types of data conversion are discussed below: meteorological data, vegetation data, and soil data. Each section provides instructions on how to convert data from its raw format into a PEcAn standard format, whether the data comes from a database or you already have the raw data in hand.

Also, see PEcAn standard formats.

20.3.4.1 Meteorological Data

20.3.4.1.1 Adding a function to PEcAn to convert a met data source

In general, you will need to write a function to download the raw met data and one to convert it to the PEcAn standard.

Functions that download raw data are named download.<source>.R. These functions are stored within the PEcAn directory: /modules/data.atmosphere/R.

Functions that convert raw data to the standard are named met2CF.<source>.R. These functions are stored within the PEcAn directory: /modules/data.atmosphere/R.
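
As a rough illustration of the download side, the skeleton below shows the general shape such a function might take. It is a sketch only: "MySource" is a placeholder name, the argument names are borrowed from the download.AmerifluxLBL call shown later in this section, and every column of the returned data frame other than dbfile.name is an assumption rather than a documented requirement. No data is actually downloaded; the function only writes an empty placeholder file.

```r
# Hypothetical skeleton of a download function; "MySource" is a placeholder.
download.MySource <- function(sitename, outfolder, start_date, end_date) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  stopifnot(end_date >= start_date)
  dir.create(outfolder, showWarnings = FALSE, recursive = TRUE)

  # File name prefix; later passed to the conversion step as in.prefix
  dbfile.name <- paste("MySource", sitename, sep = ".")
  outfile <- file.path(outfolder, paste0(dbfile.name, ".csv"))

  # A real function would fetch data from the provider here.
  # This sketch writes an empty table with made-up column names.
  placeholder <- data.frame(time = character(0), airT = numeric(0))
  utils::write.csv(placeholder, outfile, row.names = FALSE)

  # Describe the file that was written (column names are assumptions)
  data.frame(file = outfile,
             dbfile.name = dbfile.name,
             startdate = start_date,
             enddate = end_date,
             stringsAsFactors = FALSE)
}

raw.file <- download.MySource(sitename = "XX-Abc",
                              outfolder = file.path(tempdir(), "mysource"),
                              start_date = "2004-01-01",
                              end_date = "2004-12-31")
print(raw.file$dbfile.name)
```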

Current Meteorological products that are coupled to PEcAn can be found in our Available Meteorological Drivers page.

Note: Unless you are also adding a new model, you will not need to write a script to convert from PEcAn standard to PEcAn models. Those conversion scripts are written when a model is added and can be found within each model’s PEcAn directory.

Standard dimensions, names, and units can be found here: Input Standards

20.3.4.1.2 Adding Single-Site Specific Meteorological Data

Perhaps you have meteorological data specific to one site, with a unique format that you would like to add to PEcAn. Your steps would be to:

  1. Write a script or function to convert your files into the netCDF PEcAn standard
  2. Insert that file as an input record for your site following these instructions
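
Much of step 1 is typically renaming variables and converting units. The sketch below shows that part only, in base R, for a made-up table of half-hourly observations. It assumes the standard uses the CF-style names air_temperature (in K) and precipitation_flux (in kg m-2 s-1); check the Input Standards page for the authoritative names and units. The input column names are hypothetical, and writing the netCDF file itself is not shown.

```r
# Hypothetical half-hourly site observations in local units.
raw <- data.frame(
  air_temp_degC = c(-5.0, 0.0, 12.5),
  precip_mm     = c(0.0, 1.8, 0.9)  # accumulated over each 30-minute step
)
step_seconds <- 30 * 60

# Convert to the assumed standard names and units:
#   degrees Celsius -> Kelvin
#   mm per time step -> kg m-2 s-1 (1 mm of water over 1 m2 weighs 1 kg)
cf <- data.frame(
  air_temperature    = raw$air_temp_degC + 273.15,
  precipitation_flux = raw$precip_mm / step_seconds
)

print(cf)
```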

20.3.4.1.3 Processing Met data outside of the workflow using PEcAn functions

Perhaps you would like to obtain data from one of the sources coupled to PEcAn without running the full workflow. To do so, you can call the PEcAn functions directly.

20.3.4.1.3.1 Example 1: Processing data from a database

Download AmerifluxLBL data for Niwot Ridge for the year 2004:

raw.file <- PEcAn.data.atmosphere::download.AmerifluxLBL(sitename = "US-NR1",
                                                         outfolder = ".",
                                                         start_date = "2004-01-01",
                                                         end_date = "2004-12-31")

Using the information returned as the object raw.file you will then convert the raw files into a standard file.

Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.


con <- PEcAn.DB::db.open(
  params = list(
    driver = RPostgres::Postgres(),
    dbname   = 'bety',
    host ='localhost',
    user     = "bety",
    password = "bety")
)

Next, set up the arguments for the function:

in.path <- '.'
in.prefix <- raw.file$dbfile.name
outfolder <- '.'
format.id <- 5000000002
format <- PEcAn.DB::query.format.vars(format.id=format.id,bety = con)
lon <- -105.54
lat <- 40.03
format$time_zone <- "America/Chicago"

Note: The format.id can be pulled from the BETY database if you know the format of the raw data.

Once these arguments are defined you can execute the met2CF.csv function

# Convert the raw file to the PEcAn standard, covering the same
# period that was downloaded above
PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                                  in.prefix = in.prefix,
                                  outfolder = outfolder,
                                  start_date = "2004-01-01",
                                  end_date = "2004-12-31",
                                  lat = lat,
                                  lon = lon,
                                  format = format)

20.3.4.1.3.2 Example 2: Processing data from data already in hand

If you have met data already in hand and you would like to convert it into the PEcAn standard, follow these instructions.

Update BETY with a file record, format record and input record according to this page: How to Insert new Input Data.

If your data is in a csv format you can use the met2CF.csv function to convert your data into a PEcAn standard file.

Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.

con <- PEcAn.DB::db.open(
  params = list(
    driver = RPostgres::Postgres(),
    dbname = 'bety',
    host ='localhost',
    user = "bety",
    password = "bety")
)

Prepare the arguments you need to execute the met2CF.csv function. Every value below is a placeholder that you must replace with your own:

in.path <- 'path/where/the/raw/file/lives'
in.prefix <- 'prefix_of_the_raw_file'
outfolder <- 'path/where/you/want/the/output/written'
format.id <- NA                       # id of the format record you created
format <- PEcAn.DB::query.format.vars(format.id = format.id, bety = con)
lon <- NA                             # longitude of your site
lat <- NA                             # latitude of your site
format$time_zone <- "your/time_zone"  # time zone of your site
start_date <- "YYYY-MM-DD"            # start date of your data
end_date <- "YYYY-MM-DD"              # end date of your data

Next you can execute the function:

PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                                  in.prefix = in.prefix,
                                  outfolder = outfolder,
                                  start_date = start_date,
                                  end_date = end_date,
                                  lat = lat,
                                  lon = lon,
                                  format = format)
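
Because the site coordinates, dates and time zone are typed in by hand, it can help to sanity-check them before running the conversion. The helper below is a minimal base R sketch; check_met_args is a hypothetical name and is not part of PEcAn. It assumes longitudes are given in the range -180 to 180 and relies on OlsonNames(), which needs a time zone database to be installed on the machine.

```r
# Hypothetical helper (not a PEcAn function): returns a character vector
# describing any problems found; an empty vector means nothing was flagged.
check_met_args <- function(lat, lon, start_date, end_date, time_zone) {
  problems <- character(0)
  if (!is.numeric(lat) || is.na(lat) || abs(lat) > 90) {
    problems <- c(problems, "lat must be a number between -90 and 90")
  }
  if (!is.numeric(lon) || is.na(lon) || abs(lon) > 180) {
    problems <- c(problems, "lon must be a number between -180 and 180")
  }
  dates <- as.Date(c(start_date, end_date), format = "%Y-%m-%d")
  if (anyNA(dates)) {
    problems <- c(problems, "dates must be given as YYYY-MM-DD")
  } else if (dates[2] < dates[1]) {
    problems <- c(problems, "end_date is earlier than start_date")
  }
  if (!time_zone %in% OlsonNames()) {
    problems <- c(problems, "time_zone is not a known Olson time zone name")
  }
  problems
}

# Latitude and longitude accidentally swapped, and dates in the wrong order
swapped <- check_met_args(lat = -105.54, lon = 40.03,
                          start_date = "2004-12-31", end_date = "2004-01-01",
                          time_zone = "America/Chicago")
print(swapped)
```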

20.3.4.2 Vegetation Data

Vegetation data will be required to parameterize your model. In these examples we will go over how to produce a standard initial condition file.

The main function to process cohort data is the ic_process.R function. As of now, however, if you require pool data you will need to run a separate function, pool_ic_list2netcdf.R.

20.3.4.2.0.1 Example 1: Processing Veg data from data in hand.

In the following example we will process vegetation data that you have in hand using PEcAn.

First, you’ll need to create an input record in BETY that has a file record and a format record reflecting the location and format of your file. Instructions can be found in our How to Insert new Input Data page.

Once you have created an input record you must take note of the input id of your record. An easy way to take note of this is in the URL of the BETY webpage that shows your input record. In this example we use an input record with the id 1000013064 which can be found at this url: https://psql-pecan.bu.edu/bety/inputs/1000013064# . Note that this is the Boston University BETY database. If you are on a different machine, your url will be different.
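
If you prefer to pull the id out of the URL programmatically, a base R regular expression is enough. This is only a convenience sketch and assumes the URL follows the .../inputs/<id> pattern shown above.

```r
# URL of the input record page, as given in the example above
url <- "https://psql-pecan.bu.edu/bety/inputs/1000013064#"

# Keep only the digits that follow "/inputs/". as.numeric is used rather
# than as.integer because some BETY ids exceed the 32-bit integer range.
input_id <- as.numeric(sub("^.*/inputs/([0-9]+).*$", "\\1", url))

print(format(input_id, scientific = FALSE))
```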

With the input id in hand you can now edit a pecan XML so that the PEcAn function ic_process will know where to look in order to process your data. As of now ic_process is set up to work with the ED2 model, so we will use ED2 settings and then grab the intermediary Rds data file that is created as the standard PEcAn file. In the inputs section of your pecan XML, enter your input id in the <id> tag of every block that carries the useic flag. The inputs section will look like this:

<inputs>
      <css>
        <source>FFT</source>
        <output>css</output>
        <username>pecan</username>
        <id>1000013064</id>
        <useic>TRUE</useic>
        <metadata>
          <trk>1</trk>
          <age>70</age>
          <area>400</area>
        </metadata>
      </css>
      <pss>
        <source>FFT</source>
        <output>pss</output>
        <username>pecan</username>
        <id>1000013064</id>
        <useic>TRUE</useic>
      </pss>
      <site>
        <source>FFT</source>
        <output>site</output>
        <username>pecan</username>
        <id>1000013064</id>
        <useic>TRUE</useic>
      </site>
      <met>
        <source>CRUNCEP</source>
        <output>ED2</output>
      </met>
      <lu>
        <id>294</id>
      </lu>
      <soil>
        <id>297</id>
      </soil>
      <thsum>
        <id>295</id>
      </thsum>
      <veg>
        <id>296</id>
      </veg>
    </inputs>

This IC workflow also supports generating ensembles of initial conditions from posterior estimates of DBH. To do this, the tags below can be inserted into the pecan.xml:

       <css>
        <source>PalEON</source>
        <output>css</output>
        <id>1000015682</id>
        <useic>TRUE</useic>
        <ensemble>20</ensemble>
        <metadata>
          <area>1256.637</area>
          <n.patch>3</n.patch>
        </metadata>
      </css>

Here the id should point to a file that has MCMC samples to generate the ensemble from. The number inside the <ensemble> tag defines the number of ensemble members requested. The workflow will populate the settings list run$inputs tag with ensemble member information. E.g.:

  <inputs>
   <css>
    <path1>...</path1>
    <path2>...</path2>
    <path3>...</path3>
    ...
    <pathN>...</pathN>
   </css>
   <pss>
    <path>
     <path1>...</path1>
     <path2>...</path2>
     <path3>...</path3>
      ...
     <pathN>...</pathN>
    </path>
   </pss>
   <site>
    <path>
     <path1>...</path1>
     <path2>...</path2>
     <path3>...</path3>
      ...
     <pathN>...</pathN>
    </path>
   </site>
   <met>...</met>
   <lu>...</lu>
   <soil>...</soil>
   <thsum>...</thsum>
   <veg>...</veg>
  </inputs>
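
Read into R, the populated structure above becomes a nested list with one named path entry per ensemble member. The sketch below builds a list of the same shape by hand, purely to illustrate the naming pattern. The file names are invented, make_paths is a hypothetical helper, and the list is not produced by the workflow itself.

```r
# Hypothetical helper: a list of n file names, named path1 ... pathN
make_paths <- function(prefix, ext, n) {
  paths <- as.list(sprintf("%s_%d.%s", prefix, seq_len(n), ext))
  names(paths) <- paste0("path", seq_len(n))
  paths
}

n_ens <- 3  # number of ensemble members in this illustration

# Mirrors the layout shown above: css holds the paths directly,
# while pss and site wrap them in a <path> element
inputs <- list(
  css  = make_paths("ens", "css", n_ens),
  pss  = list(path = make_paths("ens", "pss", n_ens)),
  site = list(path = make_paths("ens", "site", n_ens))
)

str(inputs)
```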

Once you have edited your pecan.xml you can then create a settings object using PEcAn functions. Your pecan.xml must be in your working directory.

settings <- PEcAn.settings::read.settings("pecan.xml")
settings <- PEcAn.settings::prepare.settings(settings, force=FALSE)

You can then execute the ic_process function to convert data into a standard Rds file:

input <- settings$run$inputs
dir <- "."
PEcAn.data.land::ic_process(settings, input, dir, overwrite = FALSE)

Note that the argument dir is set to the current directory. You will find the final ED2 file there. More importantly though you will find the .Rds file within the same directory.

20.3.4.2.0.2 Example 2: Pool Initial Condition files

If you have pool vegetation data, you’ll need the pool_ic_list2netcdf.R function to convert the pool data into the PEcAn standard.

The function stands alone and requires that you provide a named list of netcdf dimensions and values, and a named list of variables and values. Names and units need to match the standard_vars.csv table found here.

# Create a list object with the necessary dimensions for your site
# (latitude must lie between -90 and 90, longitude between -180 and 180)
input <- list()
dims <- list(lat = 45, lon = -115, time = 1)
variables <- list(SoilResp = 8, TotLivBiom = 295)
input$dims <- dims
input$vals <- variables

Once this is done, set outdir to the directory where you’d like the file to be written, and set a siteid. Here the siteid is used as a file name identifier. Once this function is part of the automated workflow, siteid will reflect the site id within the BETY database.

outdir <- "."
siteid <- 772
PEcAn.data.land::pool_ic_list2netcdf(input = input, outdir = outdir, siteid = siteid)

You should now have a netcdf file with initial conditions.

20.3.4.3 Soil Data

20.3.4.3.0.1 Example 1: Converting Data in hand

Local data that has the correct names and units can easily be written out in PEcAn standard using the function soil2netcdf.

soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3,0.4,0.5),
                  volume_fraction_of_clay_in_soil = c(0.3,0.3,0.3),
                  soil_depth = c(0.2,0.5,1.0))
                         
PEcAn.data.land::soil2netcdf(soil.data, "soil.nc")

At the moment this file would need to be inserted into Inputs manually. By default, this function also calls soil_params, which will estimate a number of hydraulic and thermal parameters from texture. Be aware that at the moment not all model couplers are yet set up to read this file and/or convert it to model-specific formats.
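
Before writing the file, a quick consistency check on the list can catch common mistakes such as layers of unequal length or texture fractions that sum to more than one. The function below is a base R sketch; check_soil_layers is a hypothetical name, not a PEcAn function, and the checks reflect general expectations about fractional data rather than documented requirements of soil2netcdf.

```r
# Hypothetical helper (not a PEcAn function): stops with an error if the
# soil list looks inconsistent, otherwise returns TRUE.
check_soil_layers <- function(soil) {
  sand <- soil$volume_fraction_of_sand_in_soil
  clay <- soil$volume_fraction_of_clay_in_soil
  if (length(unique(lengths(soil))) != 1) {
    stop("every variable needs one value per soil layer")
  }
  if (any(sand < 0 | sand > 1) || any(clay < 0 | clay > 1)) {
    stop("sand and clay fractions must lie between 0 and 1")
  }
  if (any(sand + clay > 1)) {
    stop("sand and clay fractions sum to more than 1 in some layer")
  }
  TRUE
}

soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3, 0.4, 0.5),
                  volume_fraction_of_clay_in_soil = c(0.3, 0.3, 0.3),
                  soil_depth = c(0.2, 0.5, 1.0))

print(check_soil_layers(soil.data))
```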

20.3.4.3.0.2 Example 2: Converting PalEON data

In addition to location-specific soil data, PEcAn can extract soil texture information from the PalEON regional soil product, which itself is a subset of the MsTMIP Unified North American Soil Map. If this product is installed on your machine, the appropriate step in the do_conversions workflow is enabled by adding the following tag under <inputs> in your pecan.xml:

   <soil>
     <id>1000012896</id>
   </soil>

In the future we aim to extend this extraction to a wider range of soil products.

20.3.4.3.0.3 Example 3: Extracting soil properties from the gSSURGO database

In addition to location-specific soil data, PEcAn can extract soil texture information from the gSSURGO data product. This product needs no installation, and it provides soil properties for the lower 48 states of the U.S. To let PEcAn know that you are planning to use gSSURGO, add the following XML tag under <inputs> in your pecan.xml file:

<inputs>
   <soil>
     <source>gSSURGO</source>
   </soil>
</inputs>