8.3 Coding Practices
8.3.1 Coding Style
Consistent coding style improves readability and reduces errors in shared code.
Unless otherwise noted, PEcAn follows the Tidyverse style guide, so please familiarize yourself with it before contributing. In addition, note the following:
- Document all functions using `roxygen2`. See Roxygen2 for more details.
- Put your name on things. Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation. It is also often a good idea to add your name to extended comments describing particularly complex or strange code.
- Write unit tests with `testthat`. Tests are a complement to documentation: they define what a function is (and is not) expected to do. Not all functions necessarily need unit tests, but the more tests we have, the more confident we can be that changes don’t break existing code. Whenever you discover and fix a bug, it is a good idea to write a unit test that makes sure the same bug won’t happen again. See Unit_Testing for instructions, and Advanced R: Tests.
- Do not use abbreviations. Always write out `TRUE` and `FALSE` (i.e. do not use `T` or `F`). Do not rely on partial argument matching; write out all argument names in full.
- Avoid dots in function names. R’s S3 methods system uses dots to denote object methods (e.g. `print.matrix` is the `print` method for objects of class `matrix`), which can cause confusion. Use underscores instead (e.g. `do_analysis` instead of `do.analysis`). (NOTE that many old PEcAn functions violate this convention. The plan is to deprecate those in PEcAn 2.0. See GitHub issue #392).
- Use informative file names with consistent extensions. Standard file extensions are `.R` for R scripts, `.rds` for individual objects (via the `saveRDS` function), and `.RData` (note: capital D!) for multiple objects (via the `save` function). For function source code, prefer multiple files with fewer functions in each to large files with lots of functions (though it may be a good idea to group closely related functions in a single file). File names should match, or at least closely reflect, their contents (e.g. function `do_analysis` should be defined in a file called `do_analysis.R`). Do not use spaces in file names; use dashes (`-`) or underscores (`_`).
- For using external packages, add the package to `Imports:` and call the corresponding function with `package::function`. Do not use `@importFrom package function` or, worse yet, `@import package`. (The exception is infix operators like `magrittr::%>%` or `ggplot2::%+%`, which can be imported via roxygen2 documentation like `@importFrom magrittr %>%`). Do not add packages to `Depends`. In general, try to avoid adding new dependencies (especially ones that depend on system libraries) unless they are necessary or already widely used in PEcAn (e.g. GDAL, NetCDF, XML, JAGS, `dplyr`). For a more thorough and nuanced discussion, see the package dependencies appendix.
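Several of the rules above can be seen together in one short function. The following is only an illustrative sketch: `summarize_trait` and its arguments are hypothetical names, not part of PEcAn, and `stats` (which ships with R) stands in for any package that would be listed under `Imports:`.

```r
# Hypothetical helper: underscores in the name, no dots.
summarize_trait <- function(values, remove_missing = TRUE) {
  # stats ships with R; it stands in here for any package listed under
  # Imports: and is called explicitly as package::function.
  c(
    mean = mean(values, na.rm = remove_missing),
    sd = stats::sd(values, na.rm = remove_missing)
  )
}

# Argument names are written out in full, and TRUE is spelled out
# (not `rem = T`, which relies on partial matching and an abbreviation).
trait_summary <- summarize_trait(values = c(1, 2, 3, NA), remove_missing = TRUE)
```

Following the file naming rule, a function like this would live in a file called `summarize_trait.R`.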
8.3.2 Logging
During development we often add print statements to check how the code is doing: what is happening, what the intermediate results are, and so on. Once development is done, it would be nice to turn this additional output off while keeping the ability to turn it back on quickly if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example, when working on the code that creates graphs, there is no need to see debugging information about the SQL commands being sent; when trying to figure out what goes wrong during a SQL statement, however, it is useful to show the SQL statements without adding any additional code.
PEcAn provides a set of `logger.*` functions that should be used in place of base R’s `stop`, `warning`, `print`, and similar functions. The `logger` functions make it easier to print to a system log file and to control the level of output produced by PEcAn.
- The file test.logger.R provides descriptive examples
- This query provides a current overview of functions that use logging
- Logger functions and their corresponding levels (in order of increasing level):
  - `logger.debug` (`"DEBUG"`): Low-level diagnostic messages that are hidden by default. Good examples of this are expanded file paths and raw results from database queries or other analyses.
  - `logger.info` (`"INFO"`): Informational messages that regular users may want to see, but which do not indicate anything unexpected. Good examples of this are progress updates for long-running processes, or brief summaries of queries or analyses.
  - `logger.warn` (`"WARN"`): Warning messages about issues that may lead to unexpected but valid results. Good examples of this are interactions between arguments that lead to some arguments being ignored, or removal of missing or extreme values.
  - `logger.error` (`"ERROR"`): Error messages from which PEcAn has some capacity to recover. Unless you have a very good reason, we recommend avoiding this in favor of either `logger.severe` to actually stop execution or `logger.warn` to more explicitly indicate that the problem is not fatal.
  - `logger.severe`: Catastrophic errors that warrant immediate termination of the workflow. This is the only function that actually stops R’s execution (via `stop`).
- The `logger.setLevel` function sets the level at which a message will be printed. For instance, `logger.setLevel("WARN")` will suppress `logger.info` and `logger.debug` messages, but will print `logger.warn` and `logger.error` messages. `logger.setLevel("OFF")` suppresses all logger messages.
- To print all messages to console, use `logger.setUseConsole(TRUE)`
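The level filtering described above can be pictured with a few lines of base R. This is a minimal sketch of the idea only, not PEcAn's implementation: `log_levels` and `should_print` are hypothetical names introduced for illustration, and the sketch covers just the four levels that the list above names explicitly.

```r
# Levels in order of increasing severity, as listed above.
log_levels <- c("DEBUG", "INFO", "WARN", "ERROR")

# Hypothetical helper: TRUE if a message logged at `message_level` would be
# shown when the threshold has been set to `current_level`.
should_print <- function(message_level, current_level) {
  if (current_level == "OFF") {
    return(FALSE)
  }
  match(message_level, log_levels) >= match(current_level, log_levels)
}

# Which levels are shown once the threshold is "WARN"?
shown_at_warn <- vapply(log_levels, should_print, logical(1),
                        current_level = "WARN")
```

With the threshold at `"WARN"`, the `"DEBUG"` and `"INFO"` entries are `FALSE` and the `"WARN"` and `"ERROR"` entries are `TRUE`, which matches the behaviour described for `logger.setLevel("WARN")`.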
8.3.3 Package Data
8.3.3.1 Summary:
Files with the following extensions will be read by R as data:
- plain R code in .R and .r files is sourced using `source()`
- text tables in .tab, .txt, .csv files are read using `read.table()`
- objects in R image files (.RData, .rda) are loaded using `load()`
  - capitalization matters
  - all objects in foo.RData are loaded into the environment
  - pro: easiest way to store objects in R format
  - con: format is application (R) specific

Details are in `?data`, which is mostly a copy of the Data section of Writing R Extensions.
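The point that all objects in an `.RData` file are loaded together can be checked with a temporary file. This is a small sketch using only base R; the object names and the temporary path are made up for illustration.

```r
# Save two objects into a single .RData file (note the capital D).
rdata_path <- file.path(tempdir(), "foo.RData")
trait_names <- c("SLA", "Vcmax")
trait_values <- c(12.5, 60)
save(trait_names, trait_values, file = rdata_path)

# load() restores every object in the file into the target environment
# and returns the names of the objects it created.
target_env <- new.env()
loaded_names <- load(rdata_path, envir = target_env)
```

Loading into a fresh environment, as here, avoids silently overwriting objects of the same name in the calling environment.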
8.3.3.2 Accessing data
Data in the `data` directory can be accessed in the following ways:
- efficient way (especially for large data sets): using the `data` function:
data(foo) # accesses data with, e.g., load("foo.RData"), read.table("foo.csv"), or source("foo.R")
- easy way: by adding the following line to the package DESCRIPTION (note: this should be used with caution, or it can cause difficulty, as discussed in redmine issue #1118):
LazyData: TRUE
From the R help page:
Currently, a limited number of data formats can be accessed using the `data` function by placing one of the following file types in a package’s `data` directory:
* files ending `.R` or `.r` are `source()`d in, with the R working directory changed temporarily to the directory containing the respective file. (`data` ensures that the `utils` package is attached, in case it had been run via `utils::data`.)
* files ending `.RData` or `.rda` are `load()`ed.
* files ending `.tab`, `.txt` or `.TXT` are read using `read.table(..., header = TRUE)`, and hence result in a data frame.
* files ending `.csv` or `.CSV` are read using `read.table(..., header = TRUE, sep = ';')`, and also result in a data frame.
If your data does not fall into one of those four categories, you can use the `system.file` function to get access to the data:
system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"
The arguments are the folder, the file name(s), and then the package. It returns the fully qualified path to a file in a package; in this case it points to the trait data. This is almost the same as the `data` function, except that we can now use any function to read the file, such as `read.csv` instead of `read.csv2`, which seems to be the default of `data`. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
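The difference between the two readers is easy to see on a small comma-separated file. The sketch below writes a made-up file to a temporary path rather than using a real PEcAn data file; inside a package, the path would instead come from `system.file()`.

```r
# A small comma-separated file with made-up contents.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("trait,value", "SLA,12.5", "Vcmax,60"), csv_path)

# How data() reads .csv files according to the help text quoted above:
# semicolon-separated, so each comma-separated line stays in one column.
as_data_would <- utils::read.table(csv_path, header = TRUE, sep = ";")

# Reading the same path explicitly as comma-separated.
as_csv <- utils::read.csv(csv_path)

ncol(as_data_would)
ncol(as_csv)
```

The semicolon-based read returns a single column, while `read.csv` splits the file into the intended `trait` and `value` columns.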
8.3.3.2.1 Examples of data in PEcAn packages
- outputs: [/modules/uncertainties/data/output.RData]
- parameter samples [/modules/uncertainties/data/samples.RData]
8.3.4 Documenting functions using roxygen2
This is the standard method for documenting R functions in PEcAn. For detailed instructions, see one of the following resources:
- `roxygen2` package documentation
  - Formatting overview
  - Markdown formatting
  - Namespaces (e.g. when to use `@export`)
- From “R packages” by Hadley Wickham

Below is a complete template for a Roxygen documentation block. Note that roxygen lines start with `#'`:
#' Function title, in a few words
#'
#' Function description, in 2-3 sentences.
#'
#' (Optional) Package details.
#'
#' @param argument_1 A description of the argument
#' @param argument_2 Another argument to the function
#' @return A description of what the function returns.
#'
#' @author Your name <your_email@email.com>
#' @examples
#' \dontrun{
#' # This example will NOT be run by R CMD check.
#' # Useful for long-running functions, or functions that
#' # depend on files or values that may not be accessible to R CMD check.
#' my_function("~/user/my_file")
#'}
#' # This example WILL be run by R CMD check
#' my_function(1:10, argument_2 = 5)
## ^^ A few examples of the function's usage
#' @export
# ^^ Whether or not the function will be "exported" (made available) to the user.
# If omitted, the function can only be used inside the package.
my_function <- function(argument_1, argument_2) {...}
Here is a complete example from the `PEcAn.utils::days_in_year()` function:
#' Number of days in a year
#'
#' Calculate number of days in a year based on whether it is a leap year or not.
#'
#' @param year Numeric year (can be a vector)
#' @param leap_year Default = TRUE. If set to FALSE will always return 365
#'
#' @author Alexey Shiklomanov
#' @return integer vector, all either 365 or 366
#' @export
#' @examples
#' days_in_year(2010) # Not a leap year -- returns 365
#' days_in_year(2012) # Leap year -- returns 366
#' days_in_year(2000:2008) # Function is vectorized over years
days_in_year <- function(year, leap_year = TRUE) {...}
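For a version that can be pasted into an R session and run as-is, here is a small sketch with the function body included. `celsius_to_kelvin` is a hypothetical function written for this illustration, not part of PEcAn; the roxygen block follows the template above.

```r
#' Convert temperature from Celsius to Kelvin
#'
#' Adds 273.15 to each input value.
#'
#' @param temperature_celsius Numeric vector of temperatures in degrees Celsius
#' @return Numeric vector of temperatures in Kelvin
#'
#' @author Your name <your_email@email.com>
#' @examples
#' celsius_to_kelvin(c(0, 25))
#' @export
celsius_to_kelvin <- function(temperature_celsius) {
  # Vectorized: the offset is added to every element of the input
  temperature_celsius + 273.15
}

kelvin <- celsius_to_kelvin(temperature_celsius = c(0, 25))
```

Because the `@examples` entry is not wrapped in `\dontrun{}`, it would be run by R CMD check once the function is part of a package.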
To update documentation throughout PEcAn, run `make document` in the PEcAn root directory. Make sure you do this before opening a pull request: PEcAn’s automated testing (Travis) checks whether any documentation is out of date and, if so, throws an error like the following:
These files were changed by the build process:
{...}