Convert between formats, reusing existing files where possible
Source: R/convert_input.R
convert_input is a relatively generic function that applies the function fcn and inserts a record of its output into the database. It is primarily designed for converting meteorological data between formats and can be used on observed data, forecasts, and ensembles of forecasts.
To minimize downloading and storing duplicate data, it first checks whether a given file is already in the database before applying fcn.
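The check-before-convert idea can be illustrated with a small in-memory cache. This is a minimal sketch, not the PEcAn implementation: the environment db stands in for the BETY database, convert_if_missing and fake_convert are hypothetical names, and a record is matched only on site and dates (the real function also considers format, machine, and other fields).

```r
# Hypothetical in-memory stand-in for the database: one row per converted file.
db <- new.env()
db$records <- data.frame(site.id = integer(), start_date = character(),
                         end_date = character(), file = character(),
                         stringsAsFactors = FALSE)

# Return an existing record if one matches; otherwise run the converter and
# record its output. `convert` plays the role of `fcn` in this sketch.
convert_if_missing <- function(db, site.id, start_date, end_date, convert) {
  hit <- db$records$site.id == site.id &
    db$records$start_date == start_date &
    db$records$end_date == end_date
  if (any(hit)) {
    return(list(file = db$records$file[hit][1], reused = TRUE))
  }
  file <- convert(site.id, start_date, end_date)
  db$records <- rbind(db$records,
                      data.frame(site.id = site.id, start_date = start_date,
                                 end_date = end_date, file = file,
                                 stringsAsFactors = FALSE))
  list(file = file, reused = FALSE)
}

# Hypothetical converter that counts how often it actually runs.
calls <- 0
fake_convert <- function(site.id, start_date, end_date) {
  calls <<- calls + 1
  sprintf("site%d_%s_%s.nc", site.id, start_date, end_date)
}

# The second identical request is served from the stored record.
first  <- convert_if_missing(db, 772, "2004-01-01", "2008-12-31", fake_convert)
second <- convert_if_missing(db, 772, "2004-01-01", "2008-12-31", fake_convert)
```

Because the converter runs only on a cache miss, repeated requests for the same data cost a lookup rather than a new download.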
Usage
convert_input(
input.id,
outfolder,
formatname,
mimetype,
site.id,
start_date,
end_date,
pkg,
fcn,
con = con,
host,
write = TRUE,
format.vars,
overwrite = FALSE,
exact.dates = FALSE,
allow.conflicting.dates = TRUE,
insert.new.file = FALSE,
pattern = NULL,
forecast = FALSE,
ensemble = FALSE,
ensemble_name = NULL,
dbparms = NULL,
...
)
Arguments
- input.id
The database id of the input file of the parent of the file being processed here. The parent will have the same data, but in a different format.
- outfolder
The directory where files generated by functions called by convert_input will be placed
- formatname
Data-product-specific format name
- mimetype
Data-product-specific file format (MIME type)
- site.id
The id of the site
- start_date
Start date of the data being requested or processed
- end_date
End date of the data being requested or processed
- pkg
The package that the function being executed is in (as a string)
- fcn
The function to be executed if records of the output file aren't found in the database. (as a string)
- con
Database connection object
- host
Named list identifying the machine where conversion should be performed. Currently only host$name and host$Rbinary are used by convert_input, but the whole list is passed to other functions.
- write
Logical: Write new file records to the database?
- format.vars
Passed on as arguments to fcn
- overwrite
Logical: If a file already exists, create a fresh copy? Passed along to fcn.
- exact.dates
Ignore time-span appending and enforce exact start and end dates on the database input file? (logical)
- allow.conflicting.dates
Should overlapping years ignore time-span appending and exist as separate input files? (logical)
- insert.new.file
Logical: force creation of a new database record even if one already exists?
- pattern
A regular expression, passed to dbfile.input.check, used to match the name of the input file.
- forecast
Logical: Is the data product a forecast?
- ensemble
An integer giving the number of ensemble members, or FALSE if the data product is not an ensemble.
- ensemble_name
If convert_input is being called iteratively for each ensemble, ensemble_name contains the identifying name/number for that ensemble.
- dbparms
List of parameters to use for opening a database connection
- ...
Additional arguments, passed unchanged to fcn
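The pattern argument is an ordinary regular expression. How dbfile.input.check applies it internally is not described here; this sketch only shows the matching step itself with base R's grepl, using hypothetical file names for a single site.

```r
# Hypothetical file names as they might be stored for one site.
file_names <- c("AMF_US-WCr_BASE_HH_2004.nc",
                "AMF_US-WCr_BASE_HH_2005.nc",
                "CRUNCEP.2004.nc")

# A regular expression of the kind one might pass as `pattern`:
# keep only names that start with the AmeriFlux site prefix.
pattern <- "^AMF_US-WCr"

matched <- file_names[grepl(pattern, file_names)]
```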
Value
A list of two BETY IDs (input.id, dbfile.id) identifying a pre-existing file if one was available, or a newly created file if not. Each id may be a vector of ids if the function is processing an entire ensemble at once.
Executing the function
convert_input executes the function fcn in package pkg via PEcAn.remote::remote.execute.R. All additional arguments passed to convert_input (...) are in turn passed along to fcn as arguments. In addition, several named arguments to convert_input are passed along to fcn. The command to execute fcn is built as a string.
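Building the call as a string can be sketched as follows. This is an illustration under stated assumptions, not PEcAn's code: build_fcn_call is a hypothetical helper, base::list stands in for a real conversion function so the example runs anywhere, and the string is evaluated locally with eval(parse()) instead of being handed to PEcAn.remote::remote.execute.R.

```r
# Build a call string of the form "pkg::fcn(name = value, ...)",
# deparsing each argument value so it survives as R source text.
build_fcn_call <- function(pkg, fcn, args) {
  arg_strings <- vapply(names(args), function(nm) {
    paste0(nm, " = ", paste(deparse(args[[nm]]), collapse = ""))
  }, character(1))
  paste0(pkg, "::", fcn, "(", paste(arg_strings, collapse = ", "), ")")
}

# Named arguments forwarded by the caller, plus extras standing in for `...`.
named <- list(outfolder = "/tmp/met", start_date = "2006-01-01")
extra <- list(overwrite = FALSE)

cmd <- build_fcn_call("base", "list", c(named, extra))

# Local evaluation in place of remote execution.
result <- eval(parse(text = cmd))
```

Passing a string rather than a function object is what makes remote execution possible: the same text can be run by an R binary on another machine.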
Database files
There are two kinds of database records (in different tables) that represent a given data file in the file system. An input file contains information about the contents of the data file. A dbfile contains machine-specific information for a given input file, such as the file path. Because copies of the data files for a given input can exist on several different machines, there can be more than one dbfile for a given input file.
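The one-to-many relationship between input files and dbfiles can be pictured with two small tables. This is a simplified sketch: the column names, machine names, and paths are hypothetical and do not reproduce the full BETY schema, and copies_of and copy_on are illustrative helpers.

```r
# Simplified, hypothetical tables: column names are illustrative only.
inputs <- data.frame(id = c(1L, 2L),
                     name = c("met_2004_2008", "met_2009_2010"),
                     start_date = c("2004-01-01", "2009-01-01"),
                     end_date = c("2008-12-31", "2010-12-31"),
                     stringsAsFactors = FALSE)

# Input 1 has copies on two machines; input 2 exists on only one.
dbfiles <- data.frame(id = c(10L, 11L, 12L),
                      input_id = c(1L, 1L, 2L),
                      machine = c("server-a", "server-b", "server-a"),
                      file_path = c("/data/met", "/scratch/met", "/data/met"),
                      stringsAsFactors = FALSE)

# All physical copies of one input.
copies_of <- function(input_id) dbfiles[dbfiles$input_id == input_id, ]

# The copy of an input on a particular machine (zero rows if absent).
copy_on <- function(input_id, machine) {
  dbfiles[dbfiles$input_id == input_id & dbfiles$machine == machine, ]
}
```

An input with no dbfile on the current machine is exactly the case where data must be fetched or copied even though the input record already exists.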
Time-span appending
By default, convert_input tries to minimize downloads by fetching only the years of data not already present on the current machine. For example, if files for 2004-2008 already exist for a given data product on this machine and the user requests 2006-2010, the function will only download data for 2009 and 2010. In year-long data files, each year is stored as a separate file, and the database input file records the bounds of the range covered by those years. This optimization can be turned off by overriding the default values of exact.dates and allow.conflicting.dates.
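The year arithmetic in the example above can be sketched with a set difference. This assumes whole-year files and a single contiguous existing range; missing_years is a hypothetical helper, not part of PEcAn.

```r
# Years requested but not already covered by the existing files.
missing_years <- function(existing_start, existing_end, req_start, req_end) {
  requested <- seq(req_start, req_end)
  existing  <- seq(existing_start, existing_end)
  setdiff(requested, existing)
}

missing_years(2004, 2008, 2006, 2010)
# 2009 2010
```

A request entirely inside the existing range yields an empty result, meaning nothing needs to be downloaded.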
Forecast data
If the flag forecast is TRUE, convert_input treats data as if it were forecast data. Forecast data do not undergo time span appending.
Ensembles
convert_input can handle ensembles of met data. If ensemble is an integer > 1, convert_input checks the database for records of all ensemble members and calls fcn if at least one is missing. convert_input assumes that fcn will return records for all ensemble members. convert_input can also be called iteratively for each ensemble member; in this case ensemble_name contains the unique identifying name/number of the member to be processed.
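The "call fcn if at least one member is missing" rule can be sketched as below. This assumes ensemble members are identified by the numbers 1 to n; needs_conversion and missing_members are hypothetical helpers, and the vector of recorded members would in practice come from a database query.

```r
# TRUE if any of the n expected members has no record yet.
needs_conversion <- function(n, recorded_members) {
  any(!seq_len(n) %in% recorded_members)
}

# Which members are missing; useful when processing members one at a time.
missing_members <- function(n, recorded_members) {
  setdiff(seq_len(n), recorded_members)
}

complete <- c(1, 2, 3)
partial  <- c(1, 3)
```

With the whole-ensemble approach a single missing member triggers one call to fcn for all members; with the iterative approach only the members returned by missing_members would need processing.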