17 Creating a new Format record in BETY

If the Format you are looking for is not available, you will need to create a new record. Before entering information into the database, you need to be able to answer the following questions about your data:

  • What is the file MIME type?
    • We have a suite of functions for loading data in open formats such as CSV, txt, netCDF, etc.
    • PEcAn has partnered with the NCSA BrownDog project to create a service that can read and convert as many data formats as possible. If your file type is less common or a proprietary type, you can use the BrownDog DAP to convert it to a format that can be used with PEcAn.
    • If BrownDog cannot convert your data, you will need to contact us about writing a data-specific load function.
  • What variables does the file contain?
    • What are the variables named?
    • What are the variable units?
    • How do the variable names and units in the data map to PEcAn variables in the BETY database? See below for an example. It is most likely that you will NOT need to add variables to BETY. However, identifying the appropriate variable matches in the database may require some work. We are always available to help answer your questions.
  • Is there a timestamp on the data?
    • What are the units of time?

Here is an example using a fake dataset:

example_data

This data started out as an Excel document, but was saved as a CSV file.

To create a Formats record for this data, in the web interface of BETY, select Runs > Formats and click New Format.

You will need to fill out the following fields:

  • MIME type: File type (you can search for other formats in the text field)
  • Name: The name of your format (this can be whatever you want)
  • Header: Boolean that denotes whether or not your data contains a header as the first line of the data. (1 = TRUE, 0 = FALSE)
  • Skip: The number of lines above the data that should be skipped. For example, metadata that should not be included when reading in the data or blank spaces.
  • Notes: Any additional information about the data such as sources and citations.
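As a rough picture of what Header and Skip mean, the sketch below reads a made-up file with base R's read.csv. The file contents and column names are invented for illustration, and this is an analogy rather than the exact call PEcAn makes when loading data.

```r
# Hypothetical file: two metadata lines, then a header row, then the data.
path <- tempfile(fileext = ".csv")
writeLines(c("Site: Example Forest",
             "Collected: 2017",
             "date,NPP,LAI",
             "2017-01-01,1.2,3.4",
             "2017-01-02,1.5,3.6"), path)

# Skip = 2 and Header = 1 in the Format record correspond to:
dat <- read.csv(path, skip = 2, header = TRUE)

names(dat)  # column names taken from the header row
nrow(dat)   # number of data rows left after skipping the metadata
```

If the two metadata lines were not skipped, the first of them would be read as the header and the column names would be wrong.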

Here is the Formats record for the example data:

format_record_1

When you have finished this section, click Create. The final record will be displayed on the screen.

17.1 Formats -> Variables

After a Format entry has been created, you are encouraged to edit the entry to add relationships between the file’s variables and the Variables table in PEcAn. Not only do these relationships provide meta-data describing the file format, but they also allow PEcAn to search and (for some MIME types) read files.

To enter this data, select Edit Record and on the edit screen select View Related Variable.

Here is the record for the example data after adding related variables:

format_record_2

17.1.1 Name and Unit

For each variable in the file you will want at a minimum to specify the NAME of the variable within your file and match that to the equivalent Variable in the pulldown.

Make sure to search for your variables under Data > Variables before suggesting that we create a new variable record. This may not always be a straightforward process.

For example, BETY contains a record for Net Primary Productivity:

var_record

This record does not have the same variable name or the same units as NPP in the example data. You may have to do some reading to confirm that they are the same variable. In this case:

  • Both the data and the record are for Net Primary Productivity (the notes section provides additional resources for interpreting the variable).
  • The units of the data can be converted to those of the variable record (this can be checked by running udunits2::ud.are.convertible("g C m-2 yr-1", "Mg C ha-1 yr-1")).

Differences between the data and the variable record can be accounted for in the data Formats record.

  • Under Variable, select the variable as it is recorded in BETY.
  • Under Name, write the name the variable has in your data file.
  • Under Unit, write the units the variable has in your data file.

NOTE: All units must be written in a udunits-compliant format. To check that your units can be read by udunits, in R, load the udunits2 package and run udunits2::ud.is.parseable("g C m-2 yr-1").

If the name or the units in your file match the BETY record, you can leave the corresponding Name or Unit field blank. This can be seen with the variable LAI.
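The unit difference in the NPP example can also be worked out by hand, which is a useful sanity check on what the Unit field implies. The sketch below uses only base R arithmetic; the NPP values are made up.

```r
# 1 Mg = 1e6 g and 1 ha = 1e4 m2, so
# 1 g m-2 = (1 / 1e6) Mg per (1 / 1e4) ha = 0.01 Mg ha-1.
g_per_Mg  <- 1e6
m2_per_ha <- 1e4

npp_g_m2_yr  <- c(250, 500, 1200)                    # made-up values, g C m-2 yr-1
npp_Mg_ha_yr <- npp_g_m2_yr * m2_per_ha / g_per_Mg   # same fluxes in Mg C ha-1 yr-1

npp_Mg_ha_yr
```

Because the two units differ only by a constant factor, writing the file's own units in the Unit field is enough for the values to be rescaled to the units of the variable record.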

17.1.2 Storage Type

Storage Type only needs to be specified if the variable is stored in a format other than what would be expected (e.g. if numeric values are stored as quoted character strings).

One such example is time variables.

PEcAn converts all dates into POSIX format using R functions such as strptime. These functions require that the user specify the format in which the date is written.

The default is "%Y-%m-%d %H:%M:%S", which would look like "2017-01-01 00:00:00".

A list of date formats can be found in the R documentation for the function strptime.

Below are some commonly used codes:

%d Day of the month as decimal number (01–31).
%D Date format such as %m/%d/%y.
%H Hours as decimal number (00–23).
%m Month as decimal number (01–12).
%M Minute as decimal number (00–59).
%S Second as integer (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
%T Equivalent to %H:%M:%S.
%y Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y Year with century.
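A short sketch of how these codes combine when parsing timestamps with base R. The second timestamp layout is a hypothetical example of a non-default format that would need to be entered as the Storage Type; UTC is used only to keep the example reproducible.

```r
# The default format.
t1 <- as.POSIXct(strptime("2017-01-01 00:00:00",
                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# A day/month/year timestamp without seconds (hypothetical layout).
t2 <- as.POSIXct(strptime("31/12/2016 23:30",
                          format = "%d/%m/%Y %H:%M", tz = "UTC"))

# Two-digit years: 00-68 are read as 20xx, 69-99 as 19xx.
y68 <- format(as.Date("68-06-01", format = "%y-%m-%d"), "%Y")
y69 <- format(as.Date("69-06-01", format = "%y-%m-%d"), "%Y")

# Both timestamps are now POSIXct and can be compared directly.
difftime(t1, t2, units = "mins")
```

The two-digit-year behaviour is a common source of silent errors in historical data, so a four-digit %Y is the safer choice whenever the file allows it.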

17.1.3 Column Number

If your data is in text format with variables in a standard order, then you can specify the Column Number for the variable. This is required for text files that lack headers.
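The idea behind Column Number can be sketched with a headerless file read in base R. The file contents and the column assignments below are invented for illustration and do not show PEcAn's own loading code.

```r
# Hypothetical headerless file whose columns are, in order: date, NPP, LAI.
path <- tempfile(fileext = ".txt")
writeLines(c("2017-01-01 1.2 3.4",
             "2017-01-02 1.5 3.6"), path)

raw <- read.table(path, header = FALSE)

# Column Number entries as they might be recorded in the Format record.
column_number <- c(date = 1, NPP = 2, LAI = 3)

# Select the columns by position and label them with the variable names.
dat <- raw[, column_number]
names(dat) <- names(column_number)

dat$NPP
```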

17.2 Retrieving Format Information

To acquire Format information from a Format record, use the R function query.format.vars.

17.2.1 Inputs

  • bety: connection to BETY
  • input.id=NA and/or format.id=NA: Input or Format record ID from BETY
  • At least one must be specified. Defaults to format.id if both provided.
  • var.ids=NA: optional vector of variable IDs. If provided, limits results to these variables.

17.2.2 Output

  • R list object containing many things. Fill this in.

17.3 Input records in BETY

All model input data or data used for model calibration/validation must be registered in the BETY database.

Before creating a new Input record, you must make sure that the format type of your data is registered in the database. If you need to make a new format record, see Creating a new format record in BETY.

17.4 Create a database file record for the input data

An input record contains all the metadata required to identify the data. However, this record does not include the location of the data file. Since the same data may be stored in multiple places, every file has its own dbfile record.

From your BETY interface:

  • Create a DBFILES entry for the path to the file
    • From the menu click RUNS then FILES
    • Click “New File”
    • Select the machine your file is located at
    • Fill in the File Path where your file is located (aka folder or directory) NOT including the name of the file itself
    • Fill in the File Name with the name of the file itself. Note that some types of input records will refer to ALL the files in a directory, and thus File Name can be blank
    • Click Update
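The split between File Path and File Name matches what base R's dirname and basename return. The path below is a made-up example.

```r
full_path <- "/data/met/US-NR1/US-NR1_2004.csv"  # hypothetical file location

file_path <- dirname(full_path)   # value for the File Path field
file_name <- basename(full_path)  # value for the File Name field

file_path
file_name
```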

17.5 Creating a new Input record in BETY

From your BETY interface:

  • Create an INPUT entry for your data
    • From the menu click RUNS then INPUTS
    • Click “New Input”
    • Select the SITE that the input data set is associated with
    • Other required fields are a unique name for the input, the start and end dates of the data set, and the format of the data. If the data is not in a currently known format, you will need to create a NEW FORMAT and possibly a new input converter. Instructions on how to add a converter can be found in Input Conversion. Instructions on how to add a format record can be found in Creating a new Format record in BETY.
    • Parent ID is an optional variable to indicate that one dataset was derived from another.
    • Click “Create”
  • Associate the DBFILE with the INPUT
    • In the RUNS -> INPUTS table, search and find the input record you just created
    • Click on the EDIT icon
    • Select “View related Files”
    • In the Search window, search for the DBFILE you just created
    • Once you have found the DBFILE, click on the “+” icon to add the file
    • Click on “Update” at the bottom when you are done.

17.5.1 Input Conversion

Three types of data conversion are discussed below: meteorological data, vegetation data, and soil data. Each section provides instructions on how to convert data from its raw format into a PEcAn standard format, whether the data comes from a database or you have raw data in hand.

Also, see PEcAn standard formats.

17.5.1.1 Meteorological Data conversion

17.5.1.1.1 Adding a function to PEcAn to convert a met data source

In general, you will need to write a function to download the raw met data and one to convert it to the PEcAn standard.

Functions that download raw data are named download.<source>.R. These functions are stored within the PEcAn directory: /modules/data.atmosphere/R.

Functions that convert raw data to the standard are named met2CF.<source>.R. These functions are stored within the PEcAn directory: /modules/data.atmosphere/R.

Current Meteorological products that are coupled to PEcAn can be found in our Available Meteorological Drivers page.

Note: Unless you are also adding a new model, you will not need to write a script to convert from PEcAn standard to PEcAn models. Those conversion scripts are written when a model is added and can be found within each model’s PEcAn directory.

17.5.1.1.2 Dimensions:

CF standard-name    units
time                days since 1700-01-01 00:00:00 UTC
longitude           degrees_east
latitude            degrees_north

General Note: dates in the database should be date-time (preferably with timezone), and datetime passed around in PEcAn should be of type POSIXct.
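As an illustration of the time dimension, the sketch below converts POSIXct timestamps to "days since 1700-01-01 00:00:00 UTC" and back, using base R only. The timestamps are arbitrary examples.

```r
origin <- as.POSIXct("1700-01-01 00:00:00", tz = "UTC")

stamps <- as.POSIXct(c("2004-01-01 00:00:00", "2004-01-01 12:00:00"),
                     tz = "UTC")

# POSIXct -> numeric days since the origin (fractions encode time of day).
days_since <- as.numeric(difftime(stamps, origin, units = "days"))

# Numeric days -> POSIXct (POSIXct arithmetic is in seconds).
restored <- origin + days_since * 86400

days_since
restored
```

Doing the arithmetic in UTC avoids daylight-saving shifts creeping into the day counts.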

17.5.1.1.3 The variable names should be standard_name
CF standard-name units bety isimip cruncep narr ameriflux
air_temperature K airT tasAdjust tair air TA (C)
air_temperature_max K tasmaxAdjust NA tmax
air_temperature_min K tasminAdjust NA tmin
air_pressure Pa air_pressure PRESS (KPa)
mole_fraction_of_carbon_dioxide_in_air mol/mol CO2
moisture_content_of_soil_layer kg m-2
soil_temperature K soilT TS1 (NOT DONE)
relative_humidity % relative_humidity rhurs NA rhum RH
specific_humidity 1 specific_humidity NA qair shum CALC(RH)
water_vapor_saturation_deficit Pa VPD VPD (NOT DONE)
surface_downwelling_longwave_flux_in_air W m-2 same rldsAdjust lwdown dlwrf Rgl
surface_downwelling_shortwave_flux_in_air W m-2 solar_radiation rsdsAdjust swdown dswrf Rg
surface_downwelling_photosynthetic_photon_flux_in_air mol m-2 s-1 PAR PAR (NOT DONE)
precipitation_flux kg m-2 s-1 cccc prAdjust rain acpc PREC (mm/s)
degrees wind_direction WD
wind_speed m/s Wspd WS
eastward_wind m/s eastward_wind CALC(WS+WD)
northward_wind m/s northward_wind CALC(WS+WD)
  • preferred variables indicated in bold
  • wind_direction has no CF equivalent and should not be converted; instead, the met2CF functions should convert wind_direction and wind_speed to eastward_wind and northward_wind
  • standard_name is CF-convention standard names
  • units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs)
  • soil moisture for the full column, rather than a layer, is soil_moisture_content
  • A full list of PEcAn standard variable names, units and dimensions can be found here: https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv
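The wind conversion mentioned in the notes above can be sketched as follows. This assumes wind_direction follows the usual meteorological convention (degrees clockwise from north, giving the direction the wind blows from); the helper name wind_to_components is hypothetical and is not a PEcAn function.

```r
# Hypothetical helper: split speed and direction into CF wind components.
# Assumes direction is in degrees clockwise from north and is the
# direction the wind is coming FROM.
wind_to_components <- function(wind_speed, wind_direction) {
  theta <- wind_direction * pi / 180
  list(eastward_wind  = -wind_speed * sin(theta),
       northward_wind = -wind_speed * cos(theta))
}

# A 10 m/s wind from the west and a 5 m/s wind from the north.
uv <- wind_to_components(wind_speed = c(10, 5), wind_direction = c(270, 0))

uv$eastward_wind
uv$northward_wind
```

A wind from the west moves air eastward, so its eastward component is positive; a wind from the north has a negative northward component. If a data source reports the direction the wind blows toward instead, the signs flip.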

For example, in the MsTMIP-CRUNCEP data, the variable rain should be precipitation_flux. We want to standardize the units as well, as part of the met2CF.<product> step. I believe we want to use the CF “canonical” units but retain the MsTMIP units any time CF is ambiguous about the units.
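To make the unit standardization concrete, here is a minimal sketch that renames and rescales three Ameriflux-style columns using the units listed in the table above (TA in C, PRESS in kPa, PREC in mm/s). The data values are made up, and the precipitation step assumes liquid water at 1000 kg m-3, so that 1 mm of depth over 1 m2 weighs 1 kg.

```r
# Made-up raw values using the Ameriflux names and units from the table.
ameriflux <- data.frame(TA    = c(-5.0, 12.5),   # degrees C
                        PRESS = c(70.1, 70.4),   # kPa
                        PREC  = c(0, 0.002))     # mm s-1

# Rename to the CF standard names and convert to the standard units.
standard <- data.frame(
  air_temperature    = ameriflux$TA + 273.15,   # C -> K
  air_pressure       = ameriflux$PRESS * 1000,  # kPa -> Pa
  precipitation_flux = ameriflux$PREC * 1       # mm s-1 -> kg m-2 s-1
)

standard
```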

The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function.

17.5.1.1.4 Adding Single-Site Specific Meteorological Data

Perhaps you have meteorological data specific to one site, with a unique format that you would like to add to PEcAn. Your steps would be to:

  1. write a script or function to convert your files into the netCDF PEcAn standard
  2. insert that file as an input record for your site following these instructions

17.5.1.1.5 Processing Met data outside of the workflow using PEcAn functions

Perhaps you would like to obtain data from one of the sources coupled to PEcAn on its own. To do so you can run PEcAn functions on their own.

17.5.1.1.5.1 Example 1: Processing data from a database

Download AmerifluxLBL data from Niwot Ridge for the year 2004:

raw.file <- PEcAn.data.atmosphere::download.AmerifluxLBL(sitename   = "US-NR1",
                                                         outfolder  = ".",
                                                         start_date = "2004-01-01",
                                                         end_date   = "2004-12-31")

Using the information returned as the object raw.file you will then convert the raw files into a standard file.

Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.

bety <- dplyr::src_postgres(dbname   = "bety",
                            host     = "localhost",
                            user     = "bety",
                            password = "bety")

con <- bety$con

Next you will set up the arguments for the function:

in.path   <- "."
in.prefix <- raw.file$dbfile.name
outfolder <- "."
format.id <- 5000000002
format    <- PEcAn.DB::query.format.vars(format.id = format.id, bety = bety)
lon       <- -105.54
lat       <- 40.03
# Niwot Ridge is in Colorado, which uses US Mountain time
format$time_zone <- "America/Denver"

Note: The format.id can be pulled from the BETY database if you know the format of the raw data.

Once these arguments are defined, you can execute the met2CF.csv function:

PEcAn.data.atmosphere::met2CF.csv(in.path    = in.path,
                                  in.prefix  = in.prefix,
                                  outfolder  = outfolder,
                                  start_date = "2004-01-01",
                                  end_date   = "2004-12-31",
                                  lat        = lat,
                                  lon        = lon,
                                  format     = format)

17.5.1.1.5.2 Example 2: Processing data from data already in hand

If you have met data already in hand and you would like to convert it into the PEcAn standard, follow these instructions.

Update BETY with a file record, format record, and input record according to the page How to Insert new Input Data.

If your data is in CSV format, you can use the met2CF.csv function to convert your data into a PEcAn standard file.

Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.

bety <- dplyr::src_postgres(dbname   = 'bety', 
                            host ='localhost', 
                            user     = "bety", 
                            password = "bety")
                            
con <- bety$con

Prepare the arguments you need to execute the met2CF.csv function. The values below are placeholders; replace each one with the value for your own data:

in.path    <- "path/where/the/raw/file/lives"
in.prefix  <- "prefix_of_the_raw_file"
outfolder  <- "path/to/where/you/want/to/output/thecsv/"
format.id  <- 0              # placeholder: ID of the format record you created
format     <- PEcAn.DB::query.format.vars(format.id = format.id, bety = bety)
lon        <- 0              # placeholder: longitude of your site
lat        <- 0              # placeholder: latitude of your site
format$time_zone <- "UTC"    # placeholder: time zone of your site
start_date <- "YYYY-MM-DD"   # placeholder: start date of your data
end_date   <- "YYYY-MM-DD"   # placeholder: end date of your data

Next you can execute the function:

PEcAn.data.atmosphere::met2CF.csv(in.path    = in.path,
                                  in.prefix  = in.prefix,
                                  outfolder  = outfolder,
                                  start_date = start_date,
                                  end_date   = end_date,
                                  lat        = lat,
                                  lon        = lon,
                                  format     = format)

17.5.1.2 Vegetation Data

Vegetation data will be required to parameterize your model. In these examples we will go over how to produce a standard initial condition file.

The main function to process cohort data is the ic.process.R function. As of now however, if you require pool data you will run a separate function, pool_ic_list2netcdf.R.

17.5.1.2.0.1 Example 1: Processing Veg data from data in hand.

In the following example we will process vegetation data that you have in hand using PEcAn.

First, you’ll need to create an input record in BETY that will have a file record and format record reflecting the location and format of your file. Instructions can be found in our How to Insert new Input Data page.

Once you have created an input record you must take note of the input id of your record. An easy way to take note of this is in the URL of the BETY webpage that shows your input record. In this example we use an input record with the id 1000013064 which can be found at this url: https://psql-pecan.bu.edu/bety/inputs/1000013064# . Note that this is the Boston University BETY database. If you are on a different machine, your url will be different.
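If you prefer to pull the id out of the URL programmatically, a regular expression is enough. This is only a convenience sketch in base R using the example URL above.

```r
url <- "https://psql-pecan.bu.edu/bety/inputs/1000013064#"

# Capture the run of digits that follows "/inputs/".
input_id <- sub("^.*/inputs/([0-9]+).*$", "\\1", url)

# Stored as numeric rather than integer because some BETY ids
# (for example 5000000002) exceed R's 32-bit integer range.
input_id <- as.numeric(input_id)

input_id
```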

With the input id in hand, you can now edit a pecan XML so that the PEcAn function ic.process will know where to look in order to process your data. As of now, ic.process is set up to work with the ED2 model, so we will use ED2 settings and then grab the intermediary Rds data file that is created as the standard PEcAn file. In the inputs section of your pecan XML, enter your input id wherever you see the source.id tag. The inputs section will look like this:

<inputs>
      <css>
        <source>FFT</source>
        <output>css</output>
        <username>pecan</username>
        <source.id>1000013064</source.id>
        <useic>TRUE</useic>
        <meta>
          <trk>1</trk>
          <age>70</age>
        </meta>
      </css>
      <pss>
        <source>FFT</source>
        <output>pss</output>
        <username>pecan</username>
        <source.id>1000013064</source.id>
        <useic>TRUE</useic>
      </pss>
      <site>
        <source>FFT</source>
        <output>site</output>
        <username>pecan</username>
        <source.id>1000013064</source.id>
        <useic>TRUE</useic>
      </site>
      <met>
        <source>CRUNCEP</source>
        <output>ED2</output>
      </met>
      <lu>
        <id>294</id>
      </lu>
      <soil>
        <id>297</id>
      </soil>
      <thsum>
        <id>295</id>
      </thsum>
      <veg>
        <id>296</id>
      </veg>
</inputs>

Once you edit your pecan.xml, you can then create a settings object using PEcAn functions. Your pecan.xml must be in your working directory.

settings <- PEcAn.settings::read.settings("pecan.xml")
settings <- PEcAn.settings::prepare.settings(settings, force=FALSE)

You can then execute the ic.process function to convert data into a standard Rds file:

input <- settings$run$inputs
dir <- "."
ic.process(settings, input, dir, overwrite = FALSE)

Note that the argument dir is set to the current directory. You will find the final ED2 file there. More importantly though you will find the .Rds file within the same directory.

17.5.1.2.0.2 Example 2: Pool Initial Condition files

If you have pool vegetation data, you’ll need the pool_ic_list2netcdf.R function to convert the pool data into PEcAn standard.

The function stands alone and requires that you provide a named list of netCDF dimensions and values, and a named list of variables and values. Names and units need to match the standard_vars.csv table found at https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv.

# Create a list object with the necessary dimensions for your site
input <- list()
dims <- list(lat = 45, lon = -115, time = 1)
variables <- list(SoilResp = 8, TotLivBiom = 295)
input$dims <- dims
input$vals <- variables

Once this is done, set outdir to where you’d like the file to be written and choose a siteid. Here, siteid is used as a file name identifier. Once this step is part of the automated workflow, siteid will reflect the site id within the BETY database.

outdir  <- "."
siteid <- 772
pool_ic_list2netcdf(input = input, outdir = outdir, siteid = siteid)

You should now have a netcdf file with initial conditions.
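Because the dimensions are passed as a plain named list, it is easy to swap latitude and longitude by accident. A small check run before calling pool_ic_list2netcdf can catch this. The helper check_dims below is hypothetical and not part of PEcAn, and it assumes longitudes are expressed in the -180 to 180 range.

```r
# Hypothetical helper, not part of PEcAn: returns TRUE when the dims list
# has the expected names and lat/lon fall within valid ranges.
check_dims <- function(dims) {
  stopifnot(is.list(dims), all(c("lat", "lon", "time") %in% names(dims)))
  ok_lat <- dims$lat >= -90  && dims$lat <= 90
  ok_lon <- dims$lon >= -180 && dims$lon <= 180   # assumes -180..180 longitudes
  ok_lat && ok_lon
}

good <- check_dims(list(lat = 45, lon = -115, time = 1))
bad  <- check_dims(list(lat = -115, lon = 45, time = 1))  # lat and lon swapped

good
bad
```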

17.5.1.3 Soil Data

17.5.1.3.0.1 Example 1: Converting Data in hand

Local data that has the correct names and units can easily be written out in PEcAn standard using the function soil2netcdf.

soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3,0.4,0.5),
                  volume_fraction_of_clay_in_soil = c(0.3,0.3,0.3),
                  soil_depth = c(0.2,0.5,1.0))
                         
soil2netcdf(soil.data,"soil.nc")

At the moment this file would need to be inserted into Inputs manually. By default, this function also calls soil_params, which will estimate a number of hydraulic and thermal parameters from texture. Be aware that at the moment not all model couplers are yet set up to read this file and/or convert it to model-specific formats.
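Before writing the file, it may be worth checking that the list is internally consistent. The sketch below uses the same example values as above and base R only. The checks themselves (equal-length vectors, fractions between 0 and 1) are suggestions rather than requirements documented for soil2netcdf, and the silt line assumes sand, clay and silt sum to 1.

```r
soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3, 0.4, 0.5),
                  volume_fraction_of_clay_in_soil = c(0.3, 0.3, 0.3),
                  soil_depth = c(0.2, 0.5, 1.0))

# Every variable should have one value per soil layer.
same_length <- length(unique(lengths(soil.data))) == 1

sand <- soil.data$volume_fraction_of_sand_in_soil
clay <- soil.data$volume_fraction_of_clay_in_soil

# Texture fractions should be non-negative and should not exceed 1 in total.
fractions_ok <- all(sand >= 0, clay >= 0, sand + clay <= 1)

# Remainder per layer, assuming sand + clay + silt = 1.
silt <- 1 - sand - clay

same_length
fractions_ok
silt
```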

17.5.1.3.0.2 Example 2: Converting PalEON data

In addition to location-specific soil data, PEcAn can extract soil texture information from the PalEON regional soil product, which itself is a subset of the MsTMIP Unified North American Soil Map. If this product is installed on your machine, the appropriate step in the do_conversions workflow is enabled by adding the following tag under <inputs> in your pecan.xml:

   <soil>
     <id>1000012896</id>
   </soil>

In the future we aim to extend this extraction to a wider range of soil products.