9 dplyr
When running devtools::check(), a common warning relating to dplyr code is “no visible binding for global variable ‘x’”.
In data-masking situations (e.g. mutate(), filter()), you can eliminate this warning by using the .data pronoun. For example, instead of df %>% mutate(newvar = oldvar + 2), use df %>% mutate(newvar = .data$oldvar + 2).
In tidy-select situations (e.g. select(), rename()), you can eliminate this warning by using strings instead of naked column names. For example, instead of df %>% select(y), use df %>% select("y"). Using .data inside of select() is deprecated as of tidyselect v1.2.0.
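As a minimal sketch of both patterns together, the function below uses the .data pronoun inside mutate() and strings inside select(). The function name add_two and the example data frame are hypothetical and exist only for illustration. Inside a package you would typically also import the pronoun (for example with a roxygen `@importFrom rlang .data` tag) so that R CMD check can see it.

```r
library(dplyr)

# Hypothetical helper (not part of PEcAn): adds 2 to `oldvar`
# and keeps only the old and new columns.
add_two <- function(df) {
  df %>%
    # Data masking: refer to the column through the .data pronoun
    dplyr::mutate(newvar = .data$oldvar + 2) %>%
    # Tidy selection: refer to columns by string
    dplyr::select("oldvar", "newvar")
}

# Made-up example data
example_df <- data.frame(oldvar = c(1, 2, 3), other = c("a", "b", "c"))
result <- add_two(example_df)
print(result)
```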
9.0.1 Logging
During development we often add many print statements to see how the code is doing, what is happening, and what intermediate results it produces. Once development is done, it would be nice to turn this additional output off while keeping the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging lets us use “rules” to say what information should be shown. For example, when I am working on the code that creates graphs, I do not need to see debugging information about the SQL commands being sent; but when I am trying to figure out what goes wrong during a SQL statement, it would be nice to show the SQL statements without adding any additional code.
PEcAn provides a set of logger.* functions that should be used in place of base R’s stop, warning, print, and similar functions. The logger functions make it easier to print to a system log file, and to control the level of output produced by PEcAn.
- The file test.logger.R provides descriptive examples
- This query provides a current overview of functions that use logging
- Logger functions and their corresponding levels (in order of increasing level):
  - logger.debug ("DEBUG") – Low-level diagnostic messages that are hidden by default. Good examples of this are expanded file paths and raw results from database queries or other analyses.
  - logger.info ("INFO") – Informational messages that regular users may want to see, but which do not indicate anything unexpected. Good examples of this are progress updates for long-running processes, or brief summaries of queries or analyses.
  - logger.warn ("WARN") – Warning messages about issues that may lead to unexpected but valid results. Good examples of this are interactions between arguments that lead to some arguments being ignored, or removal of missing or extreme values.
  - logger.error ("ERROR") – Error messages from which PEcAn has some capacity to recover. Unless you have a very good reason, we recommend avoiding this in favor of either logger.severe to actually stop execution or logger.warn to more explicitly indicate that the problem is not fatal.
  - logger.severe – Catastrophic errors that warrant immediate termination of the workflow. This is the only function that actually stops R’s execution (via stop).
- The logger.setLevel function sets the level at which a message will be printed. For instance, logger.setLevel("WARN") will suppress logger.info and logger.debug messages, but will print logger.warn and logger.error messages. logger.setLevel("OFF") suppresses all logger messages.
- To print all messages to console, use logger.setUseConsole(TRUE)
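To illustrate the idea of a level threshold described above, here is a small self-contained sketch in base R. It is not PEcAn's implementation: level_rank and make_logger are hypothetical names, and the numeric ranks are an assumption chosen only to preserve the ordering given in the list (DEBUG < INFO < WARN < ERROR, with OFF above everything).

```r
# Illustrative sketch only, NOT the PEcAn logger.
# Assumed ranks that preserve the ordering of the levels listed above.
level_rank <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4, OFF = 5)

# Returns a logging function that only prints messages whose level
# is at or above `threshold`.
make_logger <- function(threshold = "INFO") {
  stopifnot(threshold %in% names(level_rank))
  function(level, msg) {
    stopifnot(level %in% names(level_rank), level != "OFF")
    shown <- level_rank[[level]] >= level_rank[[threshold]]
    if (shown) message(sprintf("[%s] %s", level, msg))
    invisible(shown)
  }
}

# With a "WARN" threshold, INFO is suppressed and WARN is printed
log_at_warn <- make_logger("WARN")
shown_info <- log_at_warn("INFO", "Loaded 10 rows")
shown_warn <- log_at_warn("WARN", "Dropped 2 missing values")

# With an "OFF" threshold, nothing is printed
log_off <- make_logger("OFF")
shown_error <- log_off("ERROR", "This message is suppressed")
```

In real PEcAn code you would call the logger.* functions directly rather than building your own filter; the sketch only shows why raising the level hides lower-level messages without touching the calling code.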
9.0.2 Package Data
9.0.2.1 Summary:
Files with the following extensions will be read by R as data:
- plain R code in .R and .r files is sourced using source()
- text tables in .tab, .txt, .csv files are read using read.table()
- objects in R image files (.RData, .rda) are loaded using load()
  - capitalization matters
  - all objects in foo.RData are loaded into the environment
  - pro: easiest way to store objects in R format
  - con: format is application (R) specific
Details are in ?data, which is mostly a copy of the Data section of Writing R Extensions.
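The point that all objects in an R image file are loaded together can be seen with a short base R sketch. The object names foo and bar and the temporary file location are made up for illustration; loading into a fresh environment is a choice made here to avoid overwriting objects in the workspace.

```r
# Two made-up objects to store in one R image file
foo <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
bar <- "a second object stored in the same file"

# Save both objects into a single .RData file in a temporary directory
rdata_path <- file.path(tempdir(), "foo.RData")
save(foo, bar, file = rdata_path)

# load() restores every object in the file; here into a separate environment
loaded_env <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)

# load() returns the names of the objects it created
print(loaded_names)
```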
9.0.2.2 Accessing data
Data in the data directory can be accessed in the following ways:
- efficient way (especially for large data sets): using the data function:
data(foo) # accesses data with, e.g. load(foo.RData), read.table(foo.csv), or source(foo.R)
- easy way: by adding the following line to the package DESCRIPTION (note: this should be used with caution, or it can cause difficulty, as discussed in redmine issue #1118):
LazyData: TRUE
From the R help page:
Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:
* files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
* files ending .RData or .rda are load()ed.
* files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
* files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.
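One practical consequence of the semicolon separator is that a comma-separated .csv file is not split into columns when read that way. The sketch below shows this with a made-up two-column file written to a temporary directory; the file name and contents are hypothetical.

```r
# Write a small comma-separated file (made-up contents)
csv_path <- file.path(tempdir(), "traits.csv")
writeLines(c("trait,units", "SLA,m2 kg-1", "Vcmax,umol m-2 s-1"), csv_path)

# Reading with the semicolon separator described above: commas are not split
read_with_semicolon <- read.table(csv_path, header = TRUE, sep = ";")

# Reading with read.csv(), which expects commas
read_with_comma <- read.csv(csv_path)

print(ncol(read_with_semicolon))
print(ncol(read_with_comma))
```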
If your data does not fall into one of those 4 categories, you can use the system.file function to get access to the data:
system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"
The arguments are folder, filename(s) and then package. It returns the fully qualified path name to a file in a package; in this case it points to the trait data. This is almost the same as the data function, but we can now use any function to read the file, such as read.csv instead of the semicolon-separated read.table call (similar to read.csv2) that data uses for .csv files. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
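The same lookup can be tried without PEcAn installed. The sketch below uses the stats package, which ships with R, as a stand-in; the missing file name is made up to show that system.file returns an empty string when nothing matches.

```r
# Locate a file that every installed package has
desc_path <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist yields an empty string rather than an error
missing_path <- system.file("data", "no_such_file.csv", package = "stats")

# Once we have the path, any reader can be used; DESCRIPTION files are in
# DCF format, so read.dcf() is the appropriate reader here
pkg_name <- read.dcf(desc_path, fields = "Package")[1, 1]

print(desc_path)
print(nzchar(missing_path))
```

Checking nzchar() on the result before reading is a cheap way to fail early when a file was not installed with the package.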
9.0.3 Documenting functions using roxygen2
This is the standard method for documenting R functions in PEcAn. For detailed instructions, see one of the following resources:
- roxygen2 package documentation
  - Formatting overview
  - Markdown formatting
  - Namespaces (e.g. when to use @export)
- “R packages” by Hadley Wickham
Below is a complete template for a Roxygen documentation block. Note that roxygen lines start with #':
#' Function title, in a few words
#'
#' Function description, in 2-3 sentences.
#'
#' (Optional) Package details.
#'
#' @param argument_1 A description of the argument
#' @param argument_2 Another argument to the function
#' @return A description of what the function returns.
#'
#' @author Your name <your_email@email.com>
#' @examples
#' \dontrun{
#' # This example will NOT be run by R CMD check.
#' # Useful for long-running functions, or functions that
#' # depend on files or values that may not be accessible to R CMD check.
#' my_function("~/user/my_file")
#' }
#' # This example WILL be run by R CMD check
#' my_function(1:10, argument_2 = 5)
## ^^ A few examples of the function's usage
#' @export
# ^^ Whether or not the function will be "exported" (made available) to the user.
# If omitted, the function can only be used inside the package.
my_function <- function(argument_1, argument_2) {...}
Here is a complete example from the PEcAn.utils::days_in_year() function:
#' Number of days in a year
#'
#' Calculate number of days in a year based on whether it is a leap year or not.
#'
#' @param year Numeric year (can be a vector)
#' @param leap_year Default = TRUE. If set to FALSE will always return 365
#'
#' @author Alexey Shiklomanov
#' @return integer vector, all either 365 or 366
#' @export
#' @examples
#' days_in_year(2010) # Not a leap year -- returns 365
#' days_in_year(2012) # Leap year -- returns 366
#' days_in_year(2000:2008) # Function is vectorized over years
days_in_year <- function(year, leap_year = TRUE) {...}
To update documentation throughout PEcAn, run make document in the PEcAn root directory.
Make sure you do this before opening a pull request: PEcAn’s automated testing (Travis) will check if any documentation is out of date and will throw an error like the following if it is:
These files were changed by the build process:
{...}
9.0.3.1 Updating to a new Roxygen version
For consistency across packages and machines, all PEcAn developers need to compile documentation with the same version of Roxygen. Roxygen itself will check for this and refuse to rebuild a package that was last touched by a newer version of Roxygen, but the warning it gives is very quiet and easy to miss. We take a louder approach by hardcoding the expected Roxygen version into PEcAn’s Makefile and throwing a build failure if the installed Roxygen is not an exact match.
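The exact-match rule can be sketched in R as follows. This is only an illustration of the comparison, not the check in PEcAn's Makefile; check_roxygen_version is a hypothetical name, and the versions are passed in as strings so the sketch runs without roxygen2 installed.

```r
# Hypothetical helper: stop unless the installed version exactly matches
# the expected one.
check_roxygen_version <- function(installed, expected) {
  if (package_version(installed) != package_version(expected)) {
    stop(
      sprintf("roxygen2 version is %s, but %s is expected", installed, expected),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching versions pass silently
ok <- check_roxygen_version("7.3.1", "7.3.1")

# A mismatch raises an error; capture its message instead of stopping here
mismatch <- tryCatch(
  check_roxygen_version("7.2.3", "7.3.1"),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

On a real machine the installed version could be supplied with as.character(utils::packageVersion("roxygen2")).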
When it is time for everyone to update to a newer Roxygen, follow the same procedure we used when updating from 7.2.3 to 7.3.1, replacing version strings as appropriate:
- Before starting, work with the team to merge/close as many existing PRs as feasible – this process touches a lot of files and is likely to create merge conflicts in other PRs.
- Edit the Makefile to change EXPECTED_ROXYGEN_VERSION := 7.2.3 to EXPECTED_ROXYGEN_VERSION := 7.3.1.
- Run make clean && make document to be sure Roxygen has been run on all packages.
- Check the console output for warnings from Roxygen, and fix them as needed. New versions often get pickier about formatting issues that used to be considered minor.
- Run ./scripts/generate_dependencies.R to update the version of Roxygen recorded as a Docker dependency.
- Grep the PEcAn folder for the string 7.2.3 to make sure no references were missed.
  - e.g. this time I found a remaining RoxygenNote: 7.2.3 in models/cable/DESCRIPTION – Make currently skips cable, so I redocumented it manually.
- Review all changes.
  - The changes should mostly just consist of updated RoxygenNote: lines in all the DESCRIPTION files.
  - In all cases but extra-double-specially if any NAMESPACE files change, make sure you understand what happened rather than blindly committing the changes. Usually the new version is an improvement, but this is the time to check.
- Once all looks good, commit and push.
- Make a loud announcement, e.g. on Slack, to tell all developers to update roxygen2 on their machines as soon as the PR is merged.
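As a complement to the grep step in the procedure above, the following sketch looks specifically for DESCRIPTION files whose RoxygenNote field still carries the old version. It is an assumption-laden illustration: find_stale_roxygen_notes is a hypothetical helper, the demo directory and package names are made up, and unlike a plain grep it does not catch references outside DESCRIPTION files.

```r
# Hypothetical helper: list DESCRIPTION files under `root` whose
# RoxygenNote field equals `old_version`.
find_stale_roxygen_notes <- function(root, old_version) {
  desc_files <- list.files(
    root, pattern = "^DESCRIPTION$", recursive = TRUE, full.names = TRUE
  )
  notes <- vapply(
    desc_files,
    function(f) read.dcf(f, fields = "RoxygenNote")[1, 1],
    character(1)
  )
  # Files without a RoxygenNote field give NA and are ignored
  unname(desc_files[!is.na(notes) & notes == old_version])
}

# Demo on a throwaway directory tree with two made-up packages
root <- file.path(tempdir(), "roxygen_demo")
dir.create(file.path(root, "pkgA"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "pkgB"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("Package: pkgA", "RoxygenNote: 7.3.1"),
           file.path(root, "pkgA", "DESCRIPTION"))
writeLines(c("Package: pkgB", "RoxygenNote: 7.2.3"),
           file.path(root, "pkgB", "DESCRIPTION"))

stale <- find_stale_roxygen_notes(root, "7.2.3")
print(stale)
```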