8.3 Coding Practices

8.3.1 Coding Style

Consistent coding style improves readability and reduces errors in shared code.

Unless otherwise noted, PEcAn follows the Tidyverse style guide, so please familiarize yourself with it before contributing. In addition, note the following:

  • Document all functions using roxygen2. See Roxygen2 for more details.
  • Put your name on things. Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation. It is also often a good idea to add your name to extended comments describing particularly complex or strange code.
  • Write unit tests with testthat. Tests are a complement to documentation - they define what a function is (and is not) expected to do. Not all functions necessarily need unit tests, but the more tests we have, the more confident we can be that changes don’t break existing code. Whenever you discover and fix a bug, it is a good idea to write a unit test that makes sure the same bug won’t happen again. See Unit_Testing for instructions, and Advanced R: Tests.
  • Do not use abbreviations. Always write out TRUE and FALSE (i.e. do not use T or F). Do not rely on partial argument matching – write out all arguments in full.
  • Avoid dots in function names. R’s S3 methods system uses dots to denote object methods (e.g. print.matrix is the print method for objects of class matrix), which can cause confusion. Use underscores instead (e.g. do_analysis instead of do.analysis). (NOTE that many old PEcAn functions violate this convention. The plan is to deprecate those in PEcAn 2.0. See GitHub issue #392).
  • Use informative file names with consistent extensions. Standard file extensions are .R for R scripts, .rds for individual objects (via saveRDS function), and .RData (note: capital D!) for multiple objects (via the save function). For function source code, prefer multiple files with fewer functions in each to large files with lots of files (though it may be a good idea to group closely related functions in a single file). File names should match, or at least closely reflect, their files (e.g. function do_analysis should be defined in a file called do_analysis.R). Do not use spaces in file names – use dashes (-) or underscores (_).
  • For using external packages, add the package to Imports: and call the corresponding function with package::function. Do not use @importFrom package function or, worse yet, @import package. (The exception is infix operators like magrittr::%>% or ggplot2::%+%, which can be imported via roxygen2 documentation like @importFrom magrittr %>%). Do not add packages to Depends. In general, try to avoid adding new dependencies (especially ones that depend on system libraries) unless they are necessary or already widely used in PEcAn (e.g. GDAL, NetCDF, XML, JAGS, dplyr). For a more thorough and nuanced discussion, see the package dependencies appendix.

8.3.2 Logging

During development we often add many print statements to check to see how the code is doing, what is happening, what intermediate results there are etc. When done with the development it would be nice to turn this additional code off, but have the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example when I am working on the code to create graphs, I do not have to see any debugging information about the SQL command being sent, however trying to figure out what goes wrong during a SQL statement it would be nice to show the SQL statements without adding any additional code.

PEcAn provides a set of logger.* functions that should be used in place of base R’s stop, warn, print, and similar functions. The logger functions make it easier to print to a system log file, and to control the level of output produced by PEcAn.

  • The file test.logger.R provides descriptive examples
  • This query provides an current overview of functions that use logging
  • Logger functions and their corresponding levels (in order of increasing level):
  • logger.debug ("DEBUG") – Low-level diagnostic messages that are hidden by default. Good examples of this are expanded file paths and raw results from database queries or other analyses.
  • logger.info ("INFO") – Informational messages that regular users may want to see, but which do not indicate anything unexpected. Good examples of this are progress updates updates for long-running processes, or brief summaries of queries or analyses.
  • logger.warn ("WARN") – Warning messages about issues that may lead to unexpected but valid results. Good examples of this are interactions between arguments that lead to some arguments being ignored or removal of missing or extreme values.
  • logger.error ("ERROR") – Error messages from which PEcAn has some capacity to recover. Unless you have a very good reason, we recommend avoiding this in favor of either logger.severe to actually stop execution or logger.warn to more explicitly indicate that the problem is not fatal.
  • logger.severe – Catastrophic errors that warrant immediate termination of the workflow. This is the only function that actually stops R’s execution (via stop).
  • The logger.setLevel function sets the level at which a message will be printed. For instance, logger.setLevel("WARN") will suppress logger.info and logger.debug messages, but will print logger.warn and logger.error messages. logger.setLevel("OFF") suppresses all logger messages.
  • To print all messages to console, use logger.setUseConsole(TRUE)

8.3.3 Package Data

8.3.3.1 Summary:

Files with the following extensions will be read by R as data:

  • plain R code in .R and .r files are sourced using source()
  • text tables in .tab, .txt, .csv files are read using read() ** objects in R image files: .RData, .rda are loaded using load()
  • capitalization matters
  • all objects in foo.RData are loaded into environment
  • pro: easiset way to store objects in R format
  • con: format is application (R) specific

Details are in ?data, which is mostly a copy of Data section of Writing R Extensions.

8.3.3.2 Accessing data

Data in the [data] directory will be accessed in the following ways,

  • efficient way: (especially for large data sets) using the data function:
data(foo) # accesses data with, e.g. load(foo.RData), read(foo.csv), or source(foo.R) 
  • easy way: by adding the following line to the package DESCRIPTION: note: this should be used with caution or it can cause difficulty as discussed in redmine issue #1118
LazyData: TRUE

From the R help page:

Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a packages’ data directory: * files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.) * files ending .RData or .rda are load()ed. * files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame. * files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.

If your data does not fall in those 4 categories, or you can use the system.file function to get access to the data:

system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"

The arguments are folder, filename(s) and then package. It will return the fully qualified path name to a file in a package, in this case it points to the trait data. This is almost the same as the data function, however we can now use any function to read the file, such as read.csv instead of read.csv2 which seems to be the default of data. This also allows us to store arbitrary files in the data folder, such as the the bug file and load it when we need it.

8.3.3.2.1 Examples of data in PEcAn packages
  • outputs: [/modules/uncertainties/data/output.RData]
  • parameter samples [/modules/uncertainties/data/samples.RData]

8.3.4 Documenting functions using roxygen2

This is the standard method for documenting R functions in PEcAn. For detailed instructions, see one of the following resources:

Below is a complete template for a Roxygen documentation block. Note that roxygen lines start with #':

#' Function title, in a few words
#'
#' Function description, in 2-3 sentences.
#'
#' (Optional) Package details.
#'
#' @param argument_1 A description of the argument
#' @param argument_2 Another argument to the function
#' @return A description of what the function returns.
#'
#' @author Your name <your_email@email.com>
#' @examples
#' \dontrun{
#'   # This example will NOT be run by R CMD check.
#'   # Useful for long-running functions, or functions that
#'   # depend on files or values that may not be accessible to R CMD check.
#'   my_function("~/user/my_file")
#'}
# # This example WILL be run by R CMD check
#' my_function(1:10, argument_2 = 5)
## ^^ A few examples of the function's usage
#' @export
# ^^ Whether or not the function will be "exported" (made available) to the user.
# If omitted, the function can only be used inside the package.
my_function <- function(argument_1, argument_2) {...}

Here is a complete example from the PEcAn.utils::days_in_year() function:

#' Number of days in a year
#'
#' Calculate number of days in a year based on whether it is a leap year or not.
#'
#' @param year Numeric year (can be a vector)
#' @param leap_year Default = TRUE. If set to FALSE will always return 365
#'
#' @author Alexey Shiklomanov
#' @return integer vector, all either 365 or 366
#' @export
#' @examples
#' days_in_year(2010)  # Not a leap year -- returns 365
#' days_in_year(2012)  # Leap year -- returns 366
#' days_in_year(2000:2008)  # Function is vectorized over years
days_in_year <- function(year, leap_year = TRUE) {...}

To update documentation throughout PEcAn, run make document in the PEcAn root directory. Make sure you do this before opening a pull request – PEcAn’s automated testing (Travis) will check if any documentation is out of date and will throw an error like the following if it is:

These files were changed by the build process:
{...}