17 Creating a new Format record in BETY
If the Format you are looking for is not available, you will need to create a new record. Before entering information into the database, you need to be able to answer the following questions about your data:
- What is the file MIME type?
  - We have a suite of functions for loading data in open formats such as CSV, txt, netCDF, etc.
  - PEcAn has partnered with the NCSA BrownDog project to create a service that can read and convert as many data formats as possible. If your file type is less common or a proprietary type, you can use the BrownDog DAP to convert it to a format that can be used with PEcAn.
  - If BrownDog cannot convert your data, you will need to contact us about writing a data-specific load function.
- What variables does the file contain?
  - What are the variables named?
  - What are the variable units?
  - How do the variable names and units in the data map to PEcAn variables in the BETY database? See below for an example. It is most likely that you will NOT need to add variables to BETY. However, identifying the appropriate variable matches in the database may require some work. We are always available to help answer your questions.
- Is there a timestamp on the data?
  - What are the units of time?
Here is an example using a fake dataset:
This data started out as an Excel document, but was saved as a CSV file.
To create a Formats record for this data, in the web interface of BETY, select Runs > Formats and click New Format.
You will need to fill out the following fields:
- MIME type: File type (you can search for other formats in the text field)
- Name: The name of your format (this can be whatever you want)
- Header: Boolean that denotes whether or not your data contains a header as the first line of the data. (1 = TRUE, 0 = FALSE)
- Skip: The number of lines above the data that should be skipped when reading the file, such as metadata lines or blank lines.
- Notes: Any additional information about the data such as sources and citations.
Here is the Formats record for the example data:
When you have finished this section, hit Create. The final record will be displayed on the screen.
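As a rough illustration of what the Header and Skip fields describe, the sketch below reads a small made-up file with base R. The file contents and column names are invented for the example, and `read.csv` stands in for whatever loader PEcAn actually uses for a given MIME type.

```r
# Write a small made-up file: two metadata lines, a header row, then data.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# Site: example plot (hypothetical)",
  "# Collected: 2017",
  "year,NPP,LAI",
  "2016,550,3.1",
  "2017,610,3.4"
), path)

# Skip = 2 (two metadata lines above the data),
# Header = 1 (the first remaining line holds the column names).
dat <- read.csv(path, skip = 2, header = TRUE)
print(dat)
```

If Skip were left at 0 here, the metadata lines would be read as if they were the header and data, so it is worth opening the raw file and counting the lines before filling in the record.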
17.1 Formats -> Variables
After a Format entry has been created, you are encouraged to edit the entry to add relationships between the file’s variables and the Variables table in PEcAn. Not only do these relationships provide metadata describing the file format, but they also allow PEcAn to search and (for some MIME types) read files.
To enter this data, select Edit Record and on the edit screen select View Related Variable.
Here is the record for the example data after adding related variables:
17.1.1 Name and Unit
For each variable in the file you will want at a minimum to specify the NAME of the variable within your file and match that to the equivalent Variable in the pulldown.
Make sure to search for your variables under Data > Variables before suggesting that we create a new variable record. This may not always be a straightforward process.
For example, BETY contains a record for Net Primary Productivity:
This record does not have the same variable name or the same units as NPP in the example data. You may have to do some reading to confirm that they are the same variable. In this case:
- Both the data and the record are for Net Primary Productivity (the notes section provides additional resources for interpreting the variable).
- The units of the data can be converted to those of the variable record (this can be checked by running udunits2::ud.are.convertible("g C m-2 yr-1", "Mg C ha-1 yr-1")).
Differences between the data and the variable record can be accounted for in the data Formats record.
- Under Variable, select the variable as it is recorded in BETY.
- Under Name, write the name the variable has in your data file.
- Under Unit, write the units the variable has in your data file.
NOTE: All units must be written in a udunits compliant format. To check that your units can be read by udunits, in R, load the udunits2 package and run udunits2::ud.is.parseable("g C m-2 yr-1").
If the name or the units are the same, you can leave the Name and Unit fields blank. This can be seen with the variable LAI.
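To make the unit relationship concrete, here is a minimal sketch that works out the NPP conversion by hand in base R. The NPP values are made up; the commented `ud.convert` call assumes the udunits2 package is installed and is shown only for comparison.

```r
# Convert NPP from g C m-2 yr-1 to Mg C ha-1 yr-1 by hand.
g_per_Mg  <- 1e6   # 1 Mg = 1,000,000 g
m2_per_ha <- 1e4   # 1 ha = 10,000 m2

npp_g_m2  <- c(550, 610)                        # made-up values in g C m-2 yr-1
npp_Mg_ha <- npp_g_m2 * m2_per_ha / g_per_Mg    # net factor of 0.01
print(npp_Mg_ha)

# With udunits2 installed, the same conversion could be written as:
# udunits2::ud.convert(npp_g_m2, "g C m-2 yr-1", "Mg C ha-1 yr-1")
```

Because the two unit strings differ only by a constant factor, recording the file's own units in the Unit field is enough for the conversion to be done later.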
17.1.2 Storage Type
Storage Type only needs to be specified if the variable is stored in a format other than what would be expected (e.g. if numeric values are stored as quoted character strings).
One such example is time variables.
PEcAn converts all dates into POSIX format using R functions such as strptime. These functions require that the user specify the format in which the date is written.
The default is "%Y-%m-%d %H:%M:%S", which would look like "2017-01-01 00:00:00".
A list of date formats can be found in the R documentation for the function strptime.
Below are some commonly used codes:
Code | Meaning |
---|---|
%d | Day of the month as decimal number (01–31). |
%D | Date format such as %m/%d/%y. |
%H | Hours as decimal number (00–23). |
%m | Month as decimal number (01–12). |
%M | Minute as decimal number (00–59). |
%S | Second as integer (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds). |
%T | Equivalent to %H:%M:%S. |
%y | Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’. |
%Y | Year with century. |
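The following sketch shows how a few of these codes combine into format strings with base R's `strptime`. The example timestamps are made up, and `tz = "UTC"` is an assumption chosen only to keep the results independent of the local machine.

```r
# Default format: year-month-day hour:minute:second
default_time <- strptime("2017-01-01 00:00:00",
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Month/day/two-digit-year with hours and minutes
us_style <- strptime("01/31/17 13:45",
                     format = "%m/%d/%y %H:%M", tz = "UTC")

# Day-first date with a four-digit year
day_first <- strptime("31-01-2017", format = "%d-%m-%Y", tz = "UTC")

# A format string that does not match the text yields NA rather than an error
mismatch <- strptime("2017/01/31", format = "%Y-%m-%d", tz = "UTC")

print(format(us_style, "%Y-%m-%d %H:%M:%S"))
print(is.na(mismatch))
```

The silent NA in the last case is the usual symptom of a wrong Storage Type string, so it is worth testing the format on a few rows of your own file first.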
17.1.3 Column Number
If your data is in text format with variables in a standard order then you can specify the Column Number for the variable. This is required for text files that lack headers.
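A minimal sketch of the idea, using base R on a made-up headerless file: the mapping from variable name to column number plays the role of the Column Number field. The `column_number` vector and the file contents are hypothetical and are not taken from PEcAn.

```r
# A headerless, whitespace-delimited file with made-up values.
path <- tempfile(fileext = ".txt")
writeLines(c("2016 550 3.1",
             "2017 610 3.4"), path)

# Hypothetical mapping: which column holds which variable.
column_number <- c(year = 1, NPP = 2, LAI = 3)

# Without a header, columns can only be identified by position.
dat <- read.table(path, header = FALSE)
names(dat)[column_number] <- names(column_number)
print(dat)
```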
17.2 Retrieving Format Information
To acquire Format information from a Format record, use the R function query.format.vars.
17.2.1 Inputs
- bety: connection to BETY
- input.id=NA and/or format.id=NA: Input or Format record ID from BETY
  - At least one must be specified. Defaults to format.id if both provided.
- var.ids=NA: optional vector of variable IDs. If provided, limits results to these variables.
17.2.2 Output
- R list object containing many things. Fill this in.
17.3 Input records in BETY
All model input data or data used for model calibration/validation must be registered in the BETY database.
Before creating a new Input record, you must make sure that the format type of your data is registered in the database. If you need to make a new format record, see Creating a new format record in BETY.
17.4 Create a database file record for the input data
An input record contains all the metadata required to identify the data. However, this record does not include the location of the data file. Since the same data may be stored in multiple places, every file has its own dbfile record.
From your BETY interface:
- Create a DBFILES entry for the path to the file
  - From the menu click RUNS then FILES
  - Click “New File”
  - Select the machine your file is located at
  - Fill in the File Path where your file is located (aka folder or directory) NOT including the name of the file itself
  - Fill in the File Name with the name of the file itself. Note that some types of input records will refer to ALL the files in a directory and thus File Name can be blank
  - Click Update
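The split between File Path and File Name can be checked with base R before filling in the form. This is only a sketch; the path below is a hypothetical example and not a location PEcAn expects.

```r
# Hypothetical full location of a data file on the machine.
full_path <- "/data/met/US-NR1/AMF_USNR1_2004.csv"

file_path <- dirname(full_path)    # goes in the File Path field
file_name <- basename(full_path)   # goes in the File Name field

print(file_path)
print(file_name)
```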
17.5 Creating a new Input record in BETY
From your BETY interface:
- Create an INPUT entry for your data
  - From the menu click RUNS then INPUTS
  - Click “New Input”
  - Select the SITE that the input data set is associated with
  - Other required fields are a unique name for the input, the start and end dates of the data set, and the format of the data. If the data is not in a currently known format you will need to create a NEW FORMAT and possibly a new input converter. Instructions on how to add a converter can be found in Input conversion. Instructions on how to add a format record can be found in Creating a new Format record in BETY.
  - Parent ID is an optional variable to indicate that one dataset was derived from another.
  - Click “Create”
- Associate the DBFILE with the INPUT
  - In the RUNS -> INPUTS table, search for the input record you just created
  - Click on the EDIT icon
  - Select “View related Files”
  - In the Search window, search for the DBFILE you just created
  - Once you have found the DBFILE, click on the “+” icon to add the file
  - Click on “Update” at the bottom when you are done.
17.5.1 Input Conversion
Three types of data conversion are discussed below: meteorological data, vegetation data, and soil data. Each section provides instructions on how to convert data from its raw format into a PEcAn standard format, whether the data comes from a database or you have raw data in hand.
Also, see PEcAn standard formats.
17.5.1.1 Meteorological Data conversion
17.5.1.1.1 Adding a function to PEcAn to convert a met data source
In general, you will need to write a function to download the raw met data and one to convert it to the PEcAn standard.
Functions that download raw data are named download.<source>.R. These functions are stored within the PEcAn directory: /modules/data.atmosphere/R.
Functions that convert raw data to the PEcAn standard are named met2CF.<source>.R. These functions are stored within the same PEcAn directory: /modules/data.atmosphere/R.
Current Meteorological products that are coupled to PEcAn can be found in our Available Meteorological Drivers page.
Note: Unless you are also adding a new model, you will not need to write a script to convert from PEcAn standard to PEcAn models. Those conversion scripts are written when a model is added and can be found within each model’s PEcAn directory.
17.5.1.1.2 Dimensions:
CF standard-name | units |
---|---|
time | days since 1700-01-01 00:00:00 UTC |
longitude | degrees_east |
latitude | degrees_north |
General Note: dates in the database should be date-time (preferably with timezone), and datetime passed around in PEcAn should be of type POSIXct.
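As an illustration of the time dimension, the sketch below converts POSIXct timestamps to "days since 1700-01-01 00:00:00 UTC" and back using base R. The timestamps are made up, and this is not necessarily how the met2CF functions compute the time axis.

```r
# Reference time used by the standard time dimension.
origin <- as.POSIXct("1700-01-01 00:00:00", tz = "UTC")

# Two made-up timestamps, stored as POSIXct in UTC.
stamps <- as.POSIXct(c("2004-01-01 00:00:00", "2004-01-01 12:00:00"),
                     tz = "UTC")

# POSIXct -> fractional days since the origin
days_since <- as.numeric(difftime(stamps, origin, units = "days"))
print(days_since)

# ... and back again: days -> seconds -> POSIXct
round_trip <- origin + days_since * 86400
print(round_trip)
```

Working in UTC avoids daylight saving shifts, which would otherwise make some days 23 or 25 hours long and distort the day count.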
17.5.1.1.3 The variable names should be standard_name
CF standard-name | units | bety | isimip | cruncep | narr | ameriflux |
---|---|---|---|---|---|---|
air_temperature | K | airT | tasAdjust | tair | air | TA (C) |
air_temperature_max | K | tasmaxAdjust | NA | tmax | ||
air_temperature_min | K | tasminAdjust | NA | tmin | ||
air_pressure | Pa | air_pressure | PRESS (KPa) | |||
mole_fraction_of_carbon_dioxide_in_air | mol/mol | CO2 | ||||
moisture_content_of_soil_layer | kg m-2 | |||||
soil_temperature | K | soilT | TS1 (NOT DONE) | |||
relative_humidity | % | relative_humidity | rhurs | NA | rhum | RH |
specific_humidity | 1 | specific_humidity | NA | qair | shum | CALC(RH) |
water_vapor_saturation_deficit | Pa | VPD | VPD (NOT DONE) | |||
surface_downwelling_longwave_flux_in_air | W m-2 | same | rldsAdjust | lwdown | dlwrf | Rgl |
surface_downwelling_shortwave_flux_in_air | W m-2 | solar_radiation | rsdsAdjust | swdown | dswrf | Rg |
surface_downwelling_photosynthetic_photon_flux_in_air | mol m-2 s-1 | PAR | PAR (NOT DONE) | |||
precipitation_flux | kg m-2 s-1 | cccc | prAdjust | rain | acpc | PREC (mm/s) |
(no CF equivalent) | degrees | wind_direction | WD | ||||
wind_speed | m/s | Wspd | WS | |||
eastward_wind | m/s | eastward_wind | CALC(WS+WD) | |||
northward_wind | m/s | northward_wind | CALC(WS+WD) |
- preferred variables indicated in bold
- wind_direction has no CF equivalent and should not be converted, instead the met2CF functions should convert wind_direction and wind_speed to eastward_wind and northward_wind
- standard_name is CF-convention standard names
- units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs)
- soil moisture for the full column, rather than a layer, is soil_moisture_content
- A full list of PEcAn standard variable names, units and dimensions can be found here: https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv
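The wind conversion mentioned in the notes above can be sketched as follows. This assumes the common meteorological convention, where wind_direction is the direction the wind blows from, in degrees clockwise from north; the helper name `wind_to_components` is hypothetical, and the actual met2CF functions may handle this differently.

```r
# Hypothetical helper: split wind speed and direction into components.
# Assumes direction is where the wind comes FROM, degrees clockwise from north.
wind_to_components <- function(speed, direction_deg) {
  theta <- direction_deg * pi / 180
  list(
    eastward_wind  = -speed * sin(theta),
    northward_wind = -speed * cos(theta)
  )
}

# A 5 m/s wind from the north, and a 5 m/s wind from the west.
uv <- wind_to_components(speed = c(5, 5), direction_deg = c(0, 270))
print(uv)
```

Under this convention a wind from the north has a negative northward component (the air moves south), and a wind from the west has a positive eastward component.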
For example, in the MsTMIP-CRUNCEP data, the variable rain should be precipitation_flux. We want to standardize the units as well, as part of the met2CF.<product> step. The intent is to use the CF “canonical” units but retain the MsTMIP units any time CF is ambiguous about the units.
The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function.
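A minimal sketch of this kind of standardization, using the Ameriflux column names and units listed in the table above (TA in C, PRESS in kPa, PREC in mm/s). The values are made up, and the conversions are written by hand in base R for illustration rather than taken from a met2CF function.

```r
# Made-up raw values using Ameriflux-style names and units.
raw <- data.frame(
  TA    = c(20, -5),        # air temperature, degrees C
  PRESS = c(101.3, 70.2),   # air pressure, kPa
  PREC  = c(0, 0.002)       # precipitation, mm/s
)

# Rename to CF standard names and convert to the standard units.
std <- data.frame(
  air_temperature    = raw$TA + 273.15,   # C -> K
  air_pressure       = raw$PRESS * 1000,  # kPa -> Pa
  precipitation_flux = raw$PREC           # 1 mm of water over 1 m2 is 1 kg
)
print(std)
```

Once every source is reduced to the same names and units like this, downstream steps never need to know which product the data came from.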
17.5.1.1.4 Adding Single-Site Specific Meteorological Data
Perhaps you have meteorological data specific to one site, with a unique format that you would like to add to PEcAn. Your steps would be to:
1. Write a script or function to convert your files into the netCDF PEcAn standard.
2. Insert that file as an input record for your site following these instructions.
17.5.1.1.5 Processing Met data outside of the workflow using PEcAn functions
Perhaps you would like to obtain data from one of the sources coupled to PEcAn without running the full workflow. To do so, you can run the PEcAn functions on their own.
17.5.1.1.5.1 Example 1: Processing data from a database
Download AmerifluxLBL data from Niwot Ridge for the year 2004:
raw.file <- PEcAn.data.atmosphere::download.AmerifluxLBL(sitename = "US-NR1",
                                                         outfolder = ".",
                                                         start_date = "2004-01-01",
                                                         end_date = "2004-12-31")
Using the information returned in the object raw.file, you will then convert the raw files into a standard file.
Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.
bety <- dplyr::src_postgres(dbname = 'bety',
                            host = 'localhost',
                            user = "bety",
                            password = "bety")
con <- bety$con
Next you will set up the arguments for the function:
in.path <- '.'
in.prefix <- raw.file$dbfile.name
outfolder <- '.'
format.id <- 5000000002
format <- PEcAn.DB::query.format.vars(format.id=format.id,bety = bety)
lon <- -105.54
lat <- 40.03
format$time_zone <- "America/Denver"
Note: The format.id can be pulled from the BETY database if you know the format of the raw data.
Once these arguments are defined you can execute the met2CF.csv function:
PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                                  in.prefix = in.prefix,
                                  outfolder = outfolder,
                                  start_date = "2004-01-01",
                                  end_date = "2004-12-31",
                                  lat = lat,
                                  lon = lon,
                                  format = format)
17.5.1.1.5.2 Example 2: Processing data already in hand
If you have met data already in hand and would like to convert it into the PEcAn standard, follow these instructions.
Update BETY with a file record, format record, and input record as described in How to Insert new Input Data.
If your data is in a csv format you can use the met2CF.csv function to convert your data into a PEcAn standard file.
Open a connection with BETY. You may need to change the host name depending on which machine hosts BETY. You can find the hostname listed in the machines table of BETY.
bety <- dplyr::src_postgres(dbname = 'bety',
                            host = 'localhost',
                            user = "bety",
                            password = "bety")
con <- bety$con
Prepare the arguments you need to execute the met2CF.csv function:
# Every value below is a placeholder: replace it with the value for your own data.
in.path <- 'path/where/the/raw/file/lives'
in.prefix <- 'prefix_of_the_raw_file'
outfolder <- 'path/to/where/you/want/to/output/thecsv/'
format.id <- NA   # ID of the format record you created
format <- PEcAn.DB::query.format.vars(format.id = format.id, bety = bety)
lon <- NA         # longitude of your site
lat <- NA         # latitude of your site
format$time_zone <- 'time_zone_of_your_site'
start_date <- 'YYYY-MM-DD'   # start date of your data
end_date <- 'YYYY-MM-DD'     # end date of your data
Next you can execute the function:
PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                                  in.prefix = in.prefix,
                                  outfolder = outfolder,
                                  start_date = start_date,
                                  end_date = end_date,
                                  lat = lat,
                                  lon = lon,
                                  format = format)
17.5.1.2 Vegetation Data
Vegetation data will be required to parameterize your model. In these examples we will go over how to produce a standard initial condition file.
The main function to process cohort data is the ic.process.R function. As of now, however, if you require pool data you will run a separate function, pool_ic_list2netcdf.R.
17.5.1.2.0.1 Example 1: Processing Veg data from data in hand.
In the following example we will process vegetation data that you have in hand using PEcAn.
First, you’ll need to create an input record in BETY that will have a file record and format record reflecting the location and format of your file. Instructions can be found in our How to Insert new Input Data page.
Once you have created an input record you must take note of the input id of your record. An easy way to find it is in the URL of the BETY webpage that shows your input record. In this example we use an input record with the id 1000013064, which can be found at this url: https://psql-pecan.bu.edu/bety/inputs/1000013064# . Note that this is the Boston University BETY database. If you are on a different machine, your url will be different.
With the input id in hand you can now edit a pecan XML so that the PEcAn function ic.process will know where to look in order to process your data. The inputs section of your pecan XML will look like this. As of now ic.process is set up to work with the ED2 model, so we will use ED2 settings and then grab the intermediary Rds data file that is created as the standard PEcAn file. In your inputs section, enter your input id wherever you see the source.id tag.
<inputs>
  <css>
    <source>FFT</source>
    <output>css</output>
    <username>pecan</username>
    <source.id>1000013064</source.id>
    <useic>TRUE</useic>
    <meta>
      <trk>1</trk>
      <age>70</age>
    </meta>
  </css>
  <pss>
    <source>FFT</source>
    <output>pss</output>
    <username>pecan</username>
    <source.id>1000013064</source.id>
    <useic>TRUE</useic>
  </pss>
  <site>
    <source>FFT</source>
    <output>site</output>
    <username>pecan</username>
    <source.id>1000013064</source.id>
    <useic>TRUE</useic>
  </site>
  <met>
    <source>CRUNCEP</source>
    <output>ED2</output>
  </met>
  <lu>
    <id>294</id>
  </lu>
  <soil>
    <id>297</id>
  </soil>
  <thsum>
    <id>295</id>
  </thsum>
  <veg>
    <id>296</id>
  </veg>
</inputs>
Once you have edited your pecan.xml you can then create a settings object using PEcAn functions. Your pecan.xml must be in your working directory.
settings <- PEcAn.settings::read.settings("pecan.xml")
settings <- PEcAn.settings::prepare.settings(settings, force=FALSE)
You can then execute the ic.process function to convert data into a standard Rds file:
input <- settings$run$inputs
dir <- "."
ic.process(settings, input, dir, overwrite = FALSE)
Note that the argument dir is set to the current directory. You will find the final ED2 file there. More importantly, though, you will find the .Rds file within the same directory.
17.5.1.2.0.2 Example 2: Pool Initial Condition files
If you have pool vegetation data, you’ll need the pool_ic_list2netcdf.R function to convert the pool data into the PEcAn standard.
The function stands alone and requires that you provide a named list of netcdf dimensions and values, and a named list of variables and values. Names and units need to match the standard_vars.csv table found here.
# Create a list object with the necessary dimensions for your site
input <- list()
dims <- list(lat = 45, lon = -115, time = 1)   # latitude must lie between -90 and 90
variables <- list(SoilResp = 8, TotLivBiom = 295)
input$dims <- dims
input$vals <- variables
Once this is done, set outdir to where you’d like the file to be written and set a siteid. Here siteid is used as a file name identifier. Once this is part of the automated workflow, siteid will reflect the site id within the BETY database.
outdir <- "."
siteid <- 772
pool_ic_list2netcdf(input = input, outdir = outdir, siteid = siteid)
You should now have a netcdf file with initial conditions.
17.5.1.3 Soil Data
17.5.1.3.0.1 Example 1: Converting Data in hand
Local data that has the correct names and units can easily be written out in PEcAn standard using the function soil2netcdf.
soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3, 0.4, 0.5),
                  volume_fraction_of_clay_in_soil = c(0.3, 0.3, 0.3),
                  soil_depth = c(0.2, 0.5, 1.0))
soil2netcdf(soil.data, "soil.nc")
At the moment this file would need to be inserted into Inputs manually. By default, this function also calls soil_params, which will estimate a number of hydraulic and thermal parameters from texture. Be aware that at the moment not all model couplers are yet set up to read this file and/or convert it to model-specific formats.
17.5.1.3.0.2 Example 2: Converting PalEON data
In addition to location-specific soil data, PEcAn can extract soil texture information from the PalEON regional soil product, which itself is a subset of the MsTMIP Unified North American Soil Map. If this product is installed on your machine, the appropriate step in the do_conversions workflow is enabled by adding the following tag under <inputs> in your pecan.xml:
<soil>
<id>1000012896</id>
</soil>
In the future we aim to extend this extraction to a wider range of soil products.