9 dplyr

When running devtools::check(), a common warning for dplyr code (reported as a NOTE by R CMD check) is “no visible binding for global variable ‘x’”.

In data-masking situations (e.g. mutate(), filter()), you can eliminate this warning by using the .data pronoun. For example, instead of df %>% mutate(newvar = oldvar + 2), use df %>% mutate(newvar = .data$oldvar + 2).

In tidy-select situations (e.g. select(), rename()), you can eliminate this warning by using strings instead of bare column names. For example, instead of df %>% select(y), use df %>% select("y"). Using .data inside select() is deprecated as of tidyselect v1.2.0.
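A minimal sketch of both fixes inside a package-style function, assuming dplyr is installed. The function add_two() and the columns oldvar and y are hypothetical. In a real package you would also need to import the pronoun, for example with the roxygen tag @importFrom rlang .data.

```r
library(dplyr)

# Hypothetical package function. Inside a package, import the pronoun,
# e.g. with the roxygen tag: #' @importFrom rlang .data
add_two <- function(df) {
  df %>%
    # Data masking: refer to the column through the .data pronoun
    mutate(newvar = .data$oldvar + 2) %>%
    # Tidy selection: refer to columns by string
    select("oldvar", "newvar")
}

df <- data.frame(oldvar = 1:3, y = c("a", "b", "c"))
add_two(df)
```

Because neither oldvar nor newvar appears as a bare symbol, R CMD check has no undefined global variables to report for this function.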

9.0.1 Logging

During development we often add many print statements to see how the code is behaving, what is happening, and what the intermediate results are. Once development is done, it would be nice to turn this extra output off while keeping the ability to quickly turn it back on if we discover a problem. This is where logging comes in. Logging lets us use “rules” to decide what information is shown. For example, when I am working on the code that creates graphs, I do not need to see any debugging information about the SQL commands being sent; however, when I am trying to figure out what goes wrong in a SQL statement, it would be nice to show those statements without adding any extra code.

PEcAn provides a set of logger.* functions that should be used in place of base R’s stop(), warning(), print(), and similar functions. The logger functions make it easier to write to a system log file and to control the level of output produced by PEcAn.

  • The file test.logger.R provides descriptive examples
  • This query provides a current overview of functions that use logging
  • Logger functions and their corresponding levels (in order of increasing level):
    • logger.debug ("DEBUG") – Low-level diagnostic messages that are hidden by default. Good examples of this are expanded file paths and raw results from database queries or other analyses.
    • logger.info ("INFO") – Informational messages that regular users may want to see, but which do not indicate anything unexpected. Good examples of this are progress updates for long-running processes, or brief summaries of queries or analyses.
    • logger.warn ("WARN") – Warning messages about issues that may lead to unexpected but valid results. Good examples of this are interactions between arguments that lead to some arguments being ignored, or removal of missing or extreme values.
    • logger.error ("ERROR") – Error messages from which PEcAn has some capacity to recover. Unless you have a very good reason, we recommend avoiding this in favor of either logger.severe to actually stop execution or logger.warn to more explicitly indicate that the problem is not fatal.
    • logger.severe – Catastrophic errors that warrant immediate termination of the workflow. This is the only function that actually stops R’s execution (via stop).
  • The logger.setLevel function sets the level at which a message will be printed. For instance, logger.setLevel("WARN") will suppress logger.info and logger.debug messages, but will print logger.warn and logger.error messages. logger.setLevel("OFF") suppresses all logger messages.
  • To print all messages to console, use logger.setUseConsole(TRUE)
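To make the level rules concrete, here is a toy base-R illustration of threshold-based logging. It is NOT how PEcAn.logger is implemented; the names log_levels, set_level() and log_msg() are hypothetical. In PEcAn code you would instead call PEcAn.logger::logger.setLevel() and the logger.* functions listed above.

```r
# Toy illustration of threshold-based logging (NOT PEcAn.logger's code).
# Higher numbers mean more severe; "OFF" is above every real level.
log_levels <- c(DEBUG = 10, INFO = 20, WARN = 30, ERROR = 40, OFF = 100)
log_state <- new.env()
log_state$level <- "INFO"

set_level <- function(level) {
  stopifnot(level %in% names(log_levels))
  log_state$level <- level
}

log_msg <- function(level, ...) {
  # Print only if the message level is at or above the current threshold
  show <- log_levels[[level]] >= log_levels[[log_state$level]]
  if (show) cat(paste0(level, ": ", paste(...), "\n"))
  invisible(show)
}

set_level("WARN")
log_msg("INFO", "step 2 of 5 done")        # suppressed
log_msg("WARN", "missing values removed")  # printed
```

As with logger.setLevel("WARN"), INFO and DEBUG messages are dropped while WARN and ERROR messages still appear; setting the level to "OFF" silences everything.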

9.0.2 Package Data

9.0.2.1 Summary:

Files with the following extensions will be read by R as data:

  • plain R code in .R and .r files is sourced using source()
  • text tables in .tab, .txt, and .csv files are read using read.table()
  • objects in R image files (.RData, .rda) are loaded using load()
  • capitalization matters
  • all objects in foo.RData are loaded into the environment
  • pro: easiest way to store objects in R format
  • con: format is application (R) specific
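The point that all objects in an image file are loaded can be checked with a short sketch; the objects x and y and the file name foo.RData are illustrative only, and the file is written to a temporary directory.

```r
# Illustrative objects and file name, written to a temporary directory
x <- 1:3
y <- c("a", "b")
path <- file.path(tempdir(), "foo.RData")
save(x, y, file = path)

# load() restores every object stored in the file and returns their names;
# loading into a fresh environment avoids overwriting existing variables.
env <- new.env()
loaded <- load(path, envir = env)
sort(loaded)  # "x" "y"
```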

Details are in ?data, which is mostly a copy of the Data section of Writing R Extensions.

9.0.2.2 Accessing data

Data in a package’s data directory can be accessed in the following ways:

  • efficient way (especially for large data sets): using the data function:
data(foo) # loads data with, e.g., load(foo.RData), read.table(foo.csv), or source(foo.R)
  • easy way: by adding the following line to the package DESCRIPTION (note: use this with caution, as it can cause difficulties, as discussed in Redmine issue #1118):
LazyData: TRUE
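As a sketch of the data() route, the following loads a data set from base R’s datasets package (used only so the example runs anywhere) into a dedicated environment instead of the global one. For a PEcAn package you would pass that package’s name and data set name instead.

```r
# Load a named data set from an installed package into its own environment
data_env <- new.env()
loaded <- data("mtcars", package = "datasets", envir = data_env)

loaded                  # the name(s) of the data set(s) requested
head(data_env$mtcars, 3)
```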

From the R help page:

Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:

  • files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
  • files ending .RData or .rda are load()ed.
  • files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
  • files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.

If your data does not fall into one of those four categories, you can use the system.file function to get access to the data:

system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"

The unnamed arguments are the path components (here the folder and the file name(s)), followed by the package. system.file returns the fully qualified path to a file in an installed package; in this case it points to the trait data. This is almost the same as the data function, except that we can now read the file with any function we like, such as read.csv instead of the semicolon-separated read.table call that data uses for .csv files. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
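A small sketch of the system.file workflow, using the DESCRIPTION file of base R’s stats package so that it runs without PEcAn installed. It also shows that a missing file yields an empty string unless mustWork = TRUE is set.

```r
# Locate a file shipped with an installed package. For PEcAn you would pass,
# e.g., "data", "ed.trait.dictionary.csv", package = "PEcAn.utils".
desc_path <- system.file("DESCRIPTION", package = "stats")
desc <- read.dcf(desc_path)   # any reader can be used on the returned path
desc[1, "Package"]

# A missing file yields "" rather than an error...
system.file("no_such_file.csv", package = "stats")
# ...unless mustWork = TRUE, which turns it into an error.
try(system.file("no_such_file.csv", package = "stats", mustWork = TRUE))
```

Checking for "" (or using mustWork = TRUE) is worthwhile, because passing an empty path to a reader produces a less obvious error later.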

9.0.2.2.1 Examples of data in PEcAn packages
  • outputs: /modules/uncertainties/data/output.RData
  • parameter samples: /modules/uncertainties/data/samples.RData

9.0.3 Documenting functions using roxygen2

This is the standard method for documenting R functions in PEcAn. For detailed instructions, see one of the following resources:

Below is a complete template for a Roxygen documentation block. Note that roxygen lines start with #':

#' Function title, in a few words
#'
#' Function description, in 2-3 sentences.
#'
#' (Optional) Package details.
#'
#' @param argument_1 A description of the argument
#' @param argument_2 Another argument to the function
#' @return A description of what the function returns.
#'
#' @author Your name <your_email@email.com>
#' @examples
#' \dontrun{
#'   # This example will NOT be run by R CMD check.
#'   # Useful for long-running functions, or functions that
#'   # depend on files or values that may not be accessible to R CMD check.
#'   my_function("~/user/my_file")
#' }
#' # This example WILL be run by R CMD check
#' my_function(1:10, argument_2 = 5)
# ^^ A few examples of the function's usage
#' @export
# ^^ Whether or not the function will be "exported" (made available) to the user.
# If omitted, the function can only be used inside the package.
my_function <- function(argument_1, argument_2) {...}

Here is a complete example from the PEcAn.utils::days_in_year() function:

#' Number of days in a year
#'
#' Calculate number of days in a year based on whether it is a leap year or not.
#'
#' @param year Numeric year (can be a vector)
#' @param leap_year Default = TRUE. If set to FALSE will always return 365
#'
#' @author Alexey Shiklomanov
#' @return integer vector, all either 365 or 366
#' @export
#' @examples
#' days_in_year(2010)  # Not a leap year -- returns 365
#' days_in_year(2012)  # Leap year -- returns 366
#' days_in_year(2000:2008)  # Function is vectorized over years
days_in_year <- function(year, leap_year = TRUE) {...}
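For illustration, here is a base-R sketch of a function body that satisfies the documentation above, using the Gregorian leap-year rule. It is an assumption for this guide; the actual PEcAn.utils implementation may differ.

```r
# Sketch of a body matching the roxygen block above (not PEcAn's source).
days_in_year <- function(year, leap_year = TRUE) {
  # Gregorian rule: divisible by 4, except centuries not divisible by 400
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  ifelse(leap_year & is_leap, 366L, 365L)
}

days_in_year(2010)                     # 365
days_in_year(2012)                     # 366
days_in_year(2000:2008)                # 366 365 365 365 366 365 365 365 366
days_in_year(2012, leap_year = FALSE)  # 365
```

Returning 365L and 366L keeps the result an integer vector, as the @return tag promises.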

To update documentation throughout PEcAn, run make document in the PEcAn root directory. Make sure you do this before opening a pull request – PEcAn’s automated testing (Travis) will check if any documentation is out of date and will throw an error like the following if it is:

These files were changed by the build process:
{...}

9.0.3.1 Updating to a new Roxygen version

For consistency across packages and machines, all PEcAn developers need to compile documentation with the same version of Roxygen. Roxygen itself will check for this and refuse to rebuild a package that was last touched by a newer version of Roxygen, but the warning it gives is very quiet and easy to miss. We take a louder approach by hardcoding the expected Roxygen version into PEcAn’s Makefile and throwing a build failure if the installed Roxygen is not an exact match.
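The idea behind the Makefile check, which fails loudly unless the installed version matches exactly, can be sketched in R as below. The helper check_exact_version() is hypothetical and is not part of PEcAn’s Makefile, which remains the authoritative check.

```r
# Hypothetical R analogue of the exact-version check: stop with a loud error
# unless the installed version of `pkg` equals `expected` exactly.
check_exact_version <- function(pkg, expected) {
  installed <- as.character(utils::packageVersion(pkg))
  if (!identical(installed, expected)) {
    stop("Expected ", pkg, " ", expected, " but found ", installed,
         call. = FALSE)
  }
  invisible(TRUE)
}

# e.g. check_exact_version("roxygen2", "7.3.1")
```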

When it is time for everyone to update to a newer Roxygen, follow the same procedure we used when updating from 7.2.3 to 7.3.1, replacing version strings as appropriate:

  • Before starting, work with the team to merge/close as many existing PRs as feasible – this process touches a lot of files and is likely to create merge conflicts in other PRs.
  • Edit the Makefile to change EXPECTED_ROXYGEN_VERSION := 7.2.3 to EXPECTED_ROXYGEN_VERSION := 7.3.1.
  • Run make clean && make document to be sure Roxygen has been run on all packages.
  • Check the console output for warnings from Roxygen, and fix them as needed. New versions often get pickier about formatting issues that used to be considered minor.
  • Run ./scripts/generate_dependencies.R to update the version of Roxygen recorded as a Docker dependency.
  • Grep the PEcAn folder for the string 7.2.3 to make sure no references were missed.
    • e.g. this time I found a remaining RoxygenNote: 7.2.3 in models/cable/DESCRIPTION – Make currently skips cable, so I redocumented it manually.
  • Review all changes.
    • The changes should mostly just consist of updated RoxygenNote: lines in all the DESCRIPTION files.
    • In all cases but extra-double-specially if any NAMESPACE files change, make sure you understand what happened rather than blindly committing the changes. Usually the new version is an improvement, but this is the time to check.
  • Once all looks good, commit and push.
  • Make a loud announcement, e.g. on Slack, to tell all developers to update roxygen2 on their machines as soon as the PR is merged.
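As a complement to (not a replacement for) the grep step, which also catches references outside DESCRIPTION files, a sketch like the following lists DESCRIPTION files whose RoxygenNote field still holds the old version. The helper find_stale_roxygen_notes() is hypothetical.

```r
# Hypothetical helper: list DESCRIPTION files under `root` whose RoxygenNote
# field still equals `old_version`.
find_stale_roxygen_notes <- function(root, old_version) {
  desc_files <- list.files(root, pattern = "^DESCRIPTION$",
                           recursive = TRUE, full.names = TRUE)
  is_stale <- vapply(desc_files, function(f) {
    # read.dcf() returns NA for a requested field that is absent
    note <- read.dcf(f, fields = "RoxygenNote")[1, 1]
    !is.na(note) && note == old_version
  }, logical(1), USE.NAMES = FALSE)
  desc_files[is_stale]
}

# Usage from the PEcAn root directory, after make document:
# find_stale_roxygen_notes(".", "7.2.3")
```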