27 Coding Practices
27.1 Coding Style
Consistent coding style improves readability and reduces errors in shared code.
R does not have an official style guide, but Hadley Wickham provides one that is well thought out and widely adopted: Advanced R: Coding Style.
Both the Wickham text and this page are derived from Google’s R Style Guide.
27.1.1 Use Roxygen2 documentation
This is the standard method of documentation used in PEcAn development; it provides inline documentation similar to doxygen. Even trivial functions should be documented.
See Roxygen2 Wiki page
27.1.2 Write your name at the top
Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation.
27.1.3 Use testthat testing package
See Unit_Testing wiki for instructions, and Advanced R: Tests.
- tests provide support for documentation - they define what a function is (and is not) expected to do
- all functions need tests to ensure basic functionality is maintained during development.
- all bugs should have a test that reproduces the bug, and the test should pass before the bug is closed
27.1.4 Don’t use shortcuts
R provides many shortcuts that are useful when coding interactively, or for writing scripts. However, these can make code more difficult to read and can cause problems when written into packages.
27.1.4.1 Function Names (verb.noun)
Following the convention established in PEcAn 0.1, we use all lowercase with periods to separate words. Function names should generally have a verb.noun format, such as query.traits, get.samples, etc.
27.1.4.2 File Names
File names should end in .R, .Rdata, or .Rscript and should be meaningful, e.g. named after the primary functions that they contain. There should be a separate file for each major high-level function to aid in identifying the contents of files in a directory.
27.1.4.3 Use “<-” as an assignment operator
- Because most R code uses <- (except where = is required), we will use <-
- “=” is used for function arguments
27.1.4.4 Use Spaces
- around all binary operators (=, +, -, <-, etc.).
- after but not before a comma
27.1.4.5 Use curly braces
The option to omit curly braces is another shortcut that makes code easier to write but harder to read and more prone to error.
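The conventions in this section can be seen together in a short sketch. The function name convert.temperature and its arguments are hypothetical, chosen only to illustrate the verb.noun pattern, `<-` assignment, spacing, and explicit braces.

```r
# Hypothetical example: the name and arguments are illustrative, not part of PEcAn
convert.temperature <- function(celsius, to = "kelvin") {
  # braces are written out even though each branch is a single statement
  if (to == "kelvin") {
    result <- celsius + 273.15
  } else {
    result <- celsius * 9 / 5 + 32
  }
  return(result)
}

convert.temperature(25, to = "kelvin")
```

Note that `=` appears only where an argument is named, while `<-` is used for every assignment.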
27.1.5 Package Dependencies:
27.1.5.1 library vs require
When another package is required by a function or script, it can be called in the following ways:
(As a package dependency loads with the package, these should be the default approaches when writing functions in a package. There can be some exceptions, such as when a rarely-used or non-essential function requires an esoteric package.)
1. When using library, if the dependency is not met, it will print an error and stop
2. When using require, it will print a warning and continue (but will throw an error when a function from the required package is called)
Reference: Stack Overflow “What is the difference between require and library?”
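The difference between the two calls can be sketched as follows. The package name "notARealPackage" is made up and assumed not to be installed; tryCatch is used here only so that the error from library does not stop the example.

```r
# "notARealPackage" is a made-up name, assumed not to be installed
missing.pkg <- "notARealPackage"

# require() returns FALSE (with a warning) and execution continues
loaded <- suppressWarnings(
  require(missing.pkg, character.only = TRUE, quietly = TRUE)
)
print(loaded)

# library() signals an error, which would stop a script unless it is caught
result <- tryCatch({
  library(missing.pkg, character.only = TRUE)
  "loaded"
}, error = function(e) {
  "error"
})
print(result)
```

Because require returns a logical value, its result can be checked before calling functions from the package.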
27.1.5.2 DEPENDS, SUGGESTS, IMPORTS
It is considered best practice to use DEPENDS and SUGGESTS in DESCRIPTION; SUGGESTS should be used for packages that are called infrequently, or only in examples and vignettes; suggested packages are called by require inside a function.
Consider using IMPORTS instead of DEPENDS in the DESCRIPTION files. This will make loading packages faster by allowing a package to have functions available without loading the hierarchy of dependencies, dependencies of dependencies, ad infinitum … From p. 6 of the “R extensions manual” (http://cran.r-project.org/doc/manuals/R-exts.html):
The Suggests field uses the same syntax as Depends and lists packages that are not necessarily needed. This includes packages used only in examples, tests or vignettes (see Section 1.4 [Writing package vignettes], page 26), and packages loaded in the body of functions. E.g., suppose an example from package foo uses a dataset from package bar. Then it is not necessary to have bar use foo unless one wants to execute all the examples/tests/vignettes: it is useful to have bar, but not necessary. Version requirements can be specified, and will be used by R CMD check.
27.2 Logging
During development we often add many print statements to check how the code is doing, what is happening, and what the intermediate results are. When development is done it would be nice to turn this additional code off, but keep the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example, when I am working on the code to create graphs, I do not have to see any debugging information about the SQL command being sent; however, when trying to figure out what goes wrong during a SQL statement, it would be nice to show the SQL statements without adding any additional code.
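The idea of such rules can be sketched in a few lines of base R. This is only an illustration of level-based filtering; the names log.levels, current.level, and write.log are hypothetical and are not the PEcAn logging functions.

```r
# Minimal sketch of level-based logging; all names here are hypothetical
log.levels <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4)
current.level <- "INFO"

write.log <- function(level, ...) {
  # show the message only if its level is at or above the current threshold
  if (log.levels[[level]] >= log.levels[[current.level]]) {
    message(level, ": ", ...)
    return(invisible(TRUE))
  }
  return(invisible(FALSE))
}

write.log("DEBUG", "SQL: SELECT * FROM traits")  # suppressed at INFO
write.log("INFO", "creating graphs")             # shown

# turn the debugging output back on without touching the calls above
current.level <- "DEBUG"
write.log("DEBUG", "SQL: SELECT * FROM traits")  # now shown
```

The calls stay in the code permanently; only the threshold changes.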
27.2.1 PEcAn logging functions
The logger family of functions is more sophisticated, and can be used in place of stop, warning, print, and similar functions. The logger functions make it easier to print to a system log file.
27.2.1.1 Examples
- The file test.logger.R provides descriptive examples
- This query provides a current overview of functions that use logging
- logger functions (in order of increasing level):
  - logger.debug
  - logger.info
  - logger.warn
  - logger.error
- the logger.setLevel function sets the level at which a message will be printed
  - logger.setLevel("DEBUG") will print messages from all logger functions
  - logger.setLevel("ERROR") will only print messages from logger.error
  - logger.setLevel("INFO") and logger.setLevel("WARN") show messages from the logger function of that level and higher, e.g. logger.setLevel("WARN") shows messages from logger.warn and logger.error
  - logger.setLevel("OFF") suppresses all logger messages
- To print all messages to console, use logger.setUseConsole(TRUE)
27.2.2 Other R logging packages
- This section is for reference. These functions should not be used in PEcAn, as they are redundant with the logger.* functions described above
R does provide a basic logging capability using stop, warning and message. These print a message (and, in the case of stop, halt execution). However, there is no easy method to redirect the logging information to a file, or to turn the logging information on and off. This is where one of the following packages comes into play. The packages themselves are very similar since they try to emulate log4j.
Both of the following packages use hierarchical loggers, meaning that if you change the displayed level of logging at one level, all levels below it will update their logging.
27.2.2.1 logging
The logging development is done at http://logging.r-forge.r-project.org/ and more information is located at http://cran.r-project.org/web/packages/logging/index.html . To install, use the following command:
install.packages("logging", repos="http://R-Forge.R-project.org")
This is my preference, purely based on documentation.
27.2.2.2 futile
The second logging package is http://cran.r-project.org/web/packages/futile.logger/ and is eerily similar to logging (as a matter of fact logging is based on futile).
27.2.3 Example Usage
To be able to use the loggers, some initialization needs to be done. Neither package can read its setup from a configuration file, so we might want to use the pecan.xml file to set it up. The setup will always be roughly the same:
# load library
library(logging)
logReset()
# add handlers, responsible for actually printing/saving the messages
addHandler(writeToConsole)
addHandler(writeToFile, file="file.log")
# setup root logger with INFO
setLevel('INFO')
# make all of PEcAn print debug messages
setLevel('DEBUG', getLogger('PEcAn'))
# only print info and above for the SQL part of PEcAn
setLevel('INFO', getLogger('PEcAn.SQL'))
To now use logging in the code you can use the following code:
pl <- getLogger('PEcAn.MetaAnalysis.function1')
pl$info("This is an INFO message.")
pl$debug("The value for x=%d", x)
pl$error("Something bad happened and I am scared now.")
27.3 Package Data
27.3.1 Summary:
Files with the following extensions will be read by R as data:
- plain R code in .R and .r files is sourced using source()
- text tables in .tab, .txt, .csv files are read using read.table()
- objects in R image files (.RData, .rda) are loaded using load()
  - capitalization matters
  - all objects in foo.RData are loaded into the environment
  - pro: easiest way to store objects in R format
  - con: format is application (R) specific (discussed in #318)
Details are in ?data, which is mostly a copy of the Data section of Writing R Extensions.
27.3.2 Accessing data
Data in the [data] directory will be accessed in the following ways:
- efficient way (especially for large data sets): using the data function:
- easy way: by adding the following line to the package DESCRIPTION:
  note: this should be used with caution or it can cause difficulty, as discussed in redmine issue #1118
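As a rough illustration of the data function, the sketch below loads a single dataset by name. The iris dataset from the datasets package stands in for a PEcAn dataset, and loading into a separate environment (data.env, a name chosen here) is just one way to keep the workspace clean.

```r
# load one dataset by name into a fresh environment;
# "iris" from the datasets package stands in for a PEcAn dataset
data.env <- new.env()
data("iris", package = "datasets", envir = data.env)

# only the requested object has been loaded
ls(data.env)
head(data.env$iris)
```

Without the envir argument, data loads the object into the calling environment.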
From the R help page:
Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:
* files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
* files ending .RData or .rda are load()ed.
* files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
* files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.
If your data does not fall into those four categories, you can use the system.file function to get access to the data:
system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"
The arguments are folder, filename(s) and then package. It will return the fully qualified path name to a file in a package; in this case it points to the trait data. This is almost the same as the data function; however, we can now use any function to read the file, such as read.csv instead of read.csv2, which seems to be the default of data. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
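The same pattern can be tried without PEcAn installed. The sketch below uses the DESCRIPTION file of the utils package as a stand-in, because it exists in every R installation; the file name "no.such.file.csv" is made up to show what happens when nothing is found.

```r
# system.file() returns the full path to a file installed with a package;
# the utils DESCRIPTION file is a stand-in for a file in a PEcAn data folder
path <- system.file("DESCRIPTION", package = "utils")
file.exists(path)

# any reader can then be applied to the path; read.dcf() suits this file,
# while read.csv(path) would be the choice for a .csv file
pkg.name <- read.dcf(path, fields = "Package")[1, 1]
print(pkg.name)

# system.file() returns an empty string when the file is not found
system.file("data", "no.such.file.csv", package = "utils")
```

Checking for the empty string before reading gives a clearer failure than an error from the reader.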
27.3.2.0.1 Examples of data in PEcAn packages
- Redmine issue #1060 added time constants in
source:utils/data/time.constants.RData
- outputs: [/modules/uncertainties/data/output.RData]
- parameter samples [/modules/uncertainties/data/samples.RData]
27.4 Packages used in development
27.4.0.1 roxygen2
Used to document code. See instructions under Coding Style.
27.4.0.2 devtools
Provides functions to simplify development
Documentation: The R devtools package
Other tips for devtools (from the documentation):
- Adding the following to your ~/.Rprofile will load devtools when running R in interactive mode:
- Adding the following to your .Rpackages will allow devtools to recognize a package by folder name, rather than directory path
# in this example, devhome is the pecan trunk directory
devhome <- "/home/dlebauer/R-dev/pecandev/"
# each list element must be followed by a comma, except the last
list(
  default = function(x) {
    file.path(devhome, x, x)
  },
  "utils" = paste(devhome, "pecandev/utils", sep = ""),
  "common" = paste(devhome, "pecandev/common", sep = ""),
  "all" = paste(devhome, "pecandev/all", sep = ""),
  "ed" = paste(devhome, "pecandev/models/ed", sep = ""),
  "uncertainty" = paste(devhome, "modules/uncertainty", sep = ""),
  "meta.analysis" = paste(devhome, "modules/meta.analysis", sep = ""),
  "db" = paste(devhome, "db", sep = "")
)
Now, devtools can take pkg as an argument instead of /path/to/pkg/, e.g. so you can use build("pkg") instead of build("/path/to/pkg/")
27.5 Roxygen2
This is the standard method of documentation used in PEcAn development; it provides inline documentation similar to doxygen.
27.5.1 Canonical references:
- Must Read: R package development by Hadley Wickham:
- Object Documentation
- Package Metadata
- Roxygen2 Documentation
- Roxygen2 Package Documentation
- GitHub
27.5.2 Basic Roxygen2 instructions:
Section headers link to “Writing R extensions” which provides in-depth documentation. This is provided as an overview and quick reference.
27.5.2.2 Text markup
27.5.2.2.1 Formatting
- \bold{} bold
- \emph{} italics
27.5.2.2.2 Links
- \url{www.url.com} or \href{url}{text} for links
- \code{\link{thisfn}} links to function “thisfn” in the same package
- \code{\link[foo]{thatfn}} links to function “thatfn” in package “foo”
- \pkg{package_name}
27.5.2.2.3 Math
- \eqn{a+b=c} uses LaTeX to format an inline equation
- \deqn{a+b=c} uses LaTeX to format a displayed equation
- \deqn{latex}{ascii} and \eqn{latex}{ascii} can be used to provide different versions in LaTeX and ASCII.
27.5.2.2.4 Lists
\enumerate{
\item A database consists of one or more records, each with one or
more named fields.
\item Regular lines start with a non-whitespace character.
\item Records are separated by one or more empty lines.
}
\itemize and \enumerate commands may be nested.
27.5.2.2.5 Tables
See http://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables
\tabular{rlll}{
[,1] \tab Ozone \tab numeric \tab Ozone (ppb)\cr
[,2] \tab Solar.R \tab numeric \tab Solar R (lang)\cr
[,3] \tab Wind \tab numeric \tab Wind (mph)\cr
[,4] \tab Temp \tab numeric \tab Temperature (degrees F)\cr
[,5] \tab Month \tab numeric \tab Month (1--12)\cr
[,6] \tab Day \tab numeric \tab Day of month (1--31)
}
27.5.3 Example
Here is an example documented function, myfun:
##' My function adds three numbers
##'
##' A great function for demonstrating Roxygen documentation
##' @param a numeric
##' @param b numeric
##' @param c numeric
##' @return d, numeric sum of a + b + c
##' @export
##' @author David LeBauer
##' @examples
##' myfun(1,2,3)
##' \dontrun{myfun(NULL)}
myfun <- function(a, b, c){
d <- a + b + c
return(d)
}
In emacs, with the cursor inside the function, the keybinding C-x O will generate an outline or update the Roxygen2 documentation.
27.5.4 Updating documentation
- After adding documentation, run the following command (replacing common with the name of the folder you want to update):
  - In R, using devtools to call roxygenize:
27.6 Testing
PEcAn uses the testthat package developed by Hadley Wickham. Hadley has written instructions for using this package in his Testing chapter.
27.6.1 Rationale
- makes development easier
- provides working documentation of expected functionality
- saves time by allowing computer to take over error checking once a test has been made
- improves code quality
- Further reading: Aruliah et al 2012 Best Practices for Scientific Computing
27.6.2 Tests make development easier and less error prone
Testing makes development easier by organizing what you are already doing anyway and integrating it into the tests and documentation. With a codebase like PEcAn, it is often difficult to get started. You have to figure out
- what was I doing yesterday?
- what do I want to do today?
- what existing functions do I need to edit?
- what are the arguments to these functions (and what are examples of valid arguments)
- what packages are affected
- where is a logical place to put files used in testing
27.6.3 Quick Start:
- decide what you want to do today
- identify the issue in GitHub (if none exists, create one)
- to work on issue 99, create a new branch called “github99” or some descriptive name… Today we will enable an existing function, make.cheas, to make goat.cheddar. We will know that we are done by the color and taste.
git branch goat-cheddar
git checkout goat-cheddar
- open an existing (or create a new) file in inst/tests/. If working on code in “myfunction” or a set of functions in “R/myfile.R”, the file should be named accordingly, e.g. “inst/tests/test.myfile.R”
- if you are lucky, the function has already been tested and has some examples.
- if not, you may need to create a minimal example, often requiring a settings file. The default settings file can be obtained in this way:
- write what you want to do
test_that("make.cheas can make cheese", {
  goat.cheddar <- make.cheas(source = 'goat', style = 'cheddar')
  expect_equal(color(goat.cheddar), "orange")
  expect_is(object = goat.cheddar, class = "cheese")
  expect_true(all(c("sharp", "creamy") %in% taste(goat.cheddar)))
})
- now edit the make.cheas function until it makes savory, creamy, orange cheese.
- commit often
- update documentation and test
- commit again
- when complete, merge, and push
27.6.4 Test files
Many of PEcAn’s functions require inputs that are provided as data. These can be in the /data or the /inst/extdata folders of a package. Data that are not package specific should be placed in the PEcAn.all or PEcAn.utils files.
Some useful conventions:
27.6.4.1 Settings
- Generic settings can be found in the PEcAn.all package
settings.xml <- system.file("pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)
- database settings can be specified, and tests run only if a connection is available
We currently use the following database to run tests against; tests that require access to a database should check db.exists() and be skipped if it returns FALSE to avoid failed tests on systems that do not have the database installed.
settings$database <- list(userid = "bety",
                          passwd = "bety",
                          name = "bety",       # database name
                          host = "localhost")  # server name
test_that(..., {
skip_if_not(db.exists(settings$database))
## write tests here
})
- instructions for installing this are available on the VM creation wiki
- examples can be found in the PEcAn.DB package (base/db/tests/testthat/).
- Model specific settings can go in the model-specific module, for example:
settings.xml <- system.file("extdata/pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)
- test-specific settings:
- settings text can be specified inline:
settings.text <- "
<pecan>
  <nocheck>nope</nocheck> ## allows bypass of checks in the read.settings functions
  <pfts>
    <pft>
      <name>ebifarm.pavi</name>
      <outdir>test/</outdir>
    </pft>
  </pfts>
  <outdir>test/</outdir>
  <database>
    <userid>bety</userid>
    <passwd>bety</passwd>
    <location>localhost</location>
    <name>bety</name>
  </database>
</pecan>"
settings <- read.settings(settings.text)
- values in settings can be updated:
27.6.4.2 Helper functions created to make testing easier
- tryl returns FALSE if function gives error
- temp.settings creates temporary settings file
- test.remote returns TRUE if remote connection is available
- db.exists returns TRUE if connection to database is available
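To show what a helper like tryl does, here is a hypothetical re-implementation in base R. The name tryl.sketch and its body are assumptions made for illustration; the actual PEcAn helper may be written differently.

```r
# Hypothetical re-implementation for illustration; the PEcAn helper may differ
tryl.sketch <- function(expr) {
  # evaluate the expression and report whether it completed without an error
  result <- tryCatch({
    expr
    TRUE
  }, error = function(e) {
    FALSE
  })
  return(result)
}

tryl.sketch(log(1))
tryl.sketch(stop("something went wrong"))
```

A logical result like this fits naturally inside expect_true or a skip condition.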
27.6.4.3 When should I test?
A test should be written for each of the following situations:
- Each bug should get a regression test.
  - The first step in handling a bug is to write code that reproduces the error
  - This code becomes the test
  - most important when error could re-appear
  - essential when error silently produces invalid results
- Every time a (non-trivial) function is created or edited
  - Write tests that indicate how the function should perform
    - example: expect_equal(sum(1,1), 2) indicates that the sum function should take the sum of its arguments
  - Write tests for cases under which the function should throw an error
    - example: expect_error(sum("foo"))
    - better (fixed = TRUE keeps the parentheses from being read as a regular expression group): expect_error(sum("foo"), "invalid 'type' (character)", fixed = TRUE)
27.6.4.4 What types of testing are important to understand?
27.6.4.5 Unit Testing / Test Driven Development
Tests are only as good as the test
- write test
- write code
27.6.4.6 Regression Testing
When a bug is found,
- write a test that finds the bug (the minimum test required to make the test fail)
- fix the bug
- bug is fixed when test passes
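The regression workflow above can be sketched with testthat. The function summarize.height and the bug it is said to have had are invented for this example; only the pattern of pinning a fixed bug with a test is the point.

```r
library(testthat)

# Hypothetical function used only for illustration.
# The (made-up) bug: an earlier version returned NA whenever x contained NA.
summarize.height <- function(x) {
  mean(x, na.rm = TRUE)  # the fix: ignore missing values
}

test_that("summarize.height ignores missing values (regression test)", {
  # this expectation failed before the fix and passes after it
  expect_equal(summarize.height(c(1, 2, NA)), 1.5)
})
```

Keeping the test in the package afterwards guards against the bug re-appearing.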
27.6.4.7 How should I test in R? The testthat package.
Tests are found in ~/pecan/<packagename>/inst/tests, for example utils/inst/tests/
See attached file and http://r-pkgs.had.co.nz/tests.html for details on how to use the testthat package.
27.6.4.7.1 List of Expectations
Full | Abbreviation |
---|---|
expect_that(x, is_true()) | expect_true(x) |
expect_that(x, is_false()) | expect_false(x) |
expect_that(x, is_a(y)) | expect_is(x, y) |
expect_that(x, equals(y)) | expect_equal(x, y) |
expect_that(x, is_equivalent_to(y)) | expect_equivalent(x, y) |
expect_that(x, is_identical_to(y)) | expect_identical(x, y) |
expect_that(x, matches(y)) | expect_match(x, y) |
expect_that(x, prints_text(y)) | expect_output(x, y) |
expect_that(x, shows_message(y)) | expect_message(x, y) |
expect_that(x, gives_warning(y)) | expect_warning(x, y) |
expect_that(x, throws_error(y)) | expect_error(x, y) |
27.6.4.7.2 How to run tests
add the following to “pecan/tests/testthat.R”
27.6.4.8 basic use of the testthat package
Here is an example of tests (these should be placed in <packagename>/tests/testthat/test-<sourcefilename>.R):
test_that("mathematical operators plus and minus work as expected",{
  expect_equal(sum(1,1), 2)
  expect_equal(sum(-1,-1), -2)
  # sum(1, NA) is a numeric NA, so compare against NA_real_
  expect_equal(sum(1,NA), NA_real_)
  expect_error(sum("cat"))
  set.seed(0)
  expect_equal(sum(matrix(1:100)), sum(data.frame(1:100)))
})
test_that("different testing functions work, giving excuse to demonstrate",{
  expect_identical(1, 1)
  # a double and an integer are equivalent but not identical
  expect_false(identical(numeric(1), integer(1)))
  expect_equivalent(numeric(1), integer(1))
  expect_warning(mean('1'))
  expect_that(mean('1'), gives_warning("argument is not numeric or logical: returning NA"))
  expect_warning(mean('1'), "argument is not numeric or logical: returning NA")
  expect_message(message("a"), "a")
})
27.6.4.8.1 Script testing
It is useful to add tests to a script during development. This allows you to test that the code is doing what you expect it to do.
* here is a fake script using the iris data set
test_that("the iris data set has the same basic features as before",{
expect_equal(dim(iris), c(150,5))
expect_that(iris$Sepal.Length, is_a("numeric"))
expect_is(iris$Sepal.Length, "numeric")#equivalent to prev. line
expect_is(iris$Species, "factor")
})
iris.color <- data.frame(Species = c("setosa", "versicolor", "virginica"),
color = c("pink", "blue", "orange"))
newiris <- merge(iris, iris.color)
iris.model <- lm(Petal.Length ~ color, data = newiris)
test_that("changes to Iris code occurred as expected",{
expect_that(dim(newiris), equals(c(150, 6)))
expect_that(unique(newiris$color),
is_identical_to(unique(iris.color$color)))
expect_equivalent(iris.model$coefficients["(Intercept)"], 4.26)
})
27.6.4.8.2 Function testing
Testing of a new function, as.sequence. The function and documentation are in source:R/utils.R and the tests are in source:tests/test.utils.R.
Recently, I made the function as.sequence to turn any vector into a sequence, with custom handling of NA’s:
as.sequence <- function(x, na.rm = TRUE){
  # number the values in order of first appearance
  x2 <- as.integer(factor(x, unique(x)))
  if(all(is.na(x2))){
    x2 <- rep(1, length(x2))
  }
  if(na.rm == TRUE){
    # NA values get their own index, one past the largest
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1
  }
  return(x2)
}
The next step was to add documentation and test. Many people find it more efficient to write tests before writing the function. This is true, but it also requires more discipline. I wrote these tests to handle the variety of cases that I had observed.
As currently used, the function is exposed to a fairly restricted set of options - results of downloads from the database and transformations.
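Tests for as.sequence might look like the following sketch. These are not the tests from the PEcAn source; the cases are chosen here to cover repeated values, NA handling, and an all-NA input, and the function is repeated so the block runs on its own.

```r
library(testthat)

# as.sequence as defined in the text, repeated so the sketch is self-contained
as.sequence <- function(x, na.rm = TRUE) {
  x2 <- as.integer(factor(x, unique(x)))
  if (all(is.na(x2))) {
    x2 <- rep(1, length(x2))
  }
  if (na.rm == TRUE) {
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1
  }
  return(x2)
}

test_that("as.sequence numbers values in order of first appearance", {
  expect_equal(as.sequence(c("b", "a", "b")), c(1, 2, 1))
})

test_that("as.sequence handles NA values", {
  # an all-NA input becomes a vector of ones
  expect_equal(as.sequence(c(NA, NA)), c(1, 1))
  # with na.rm = TRUE, NA gets the next unused index
  expect_equal(as.sequence(c(10, NA, 30)), c(1, 3, 2))
  # with na.rm = FALSE, NA is kept
  expect_equal(as.sequence(c(10, NA, 30), na.rm = FALSE), c(1, NA, 2))
})
```

Each expectation documents one behavior, so a later change that breaks NA handling fails a specific, named test.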
27.6.5 Testing the Shiny Server
Shiny can be difficult to debug because, when run as a web service, the R output is hidden in system log files that are hard to find and read. One useful approach to debugging is to use port forwarding, as follows.
First, on the remote machine (including the VM), make sure R’s working directory is set to the directory of the Shiny app (e.g., setwd("/path/to/pecan/shiny/WorkflowPlots"), or just open the app as an RStudio project).
Then, in the R console, run the app as:
shiny::runApp(port = XXXX)
# E.g. shiny::runApp(port = 5638)
Then, on your local machine, open a terminal and run the following command, matching XXXX to the port above and YYYY to any unused port on your local machine (any 4-digit number should work).
ssh -L YYYY:localhost:XXXX <remote connection>
# E.g., for the PEcAn VM, given the above port:
# ssh -L 5639:localhost:5638 carya@localhost -p 6422
Now, in a web browser on your local machine, browse to localhost:YYYY (e.g., localhost:5639) to run whatever app you started with shiny::runApp in the previous step.
All of the output should display in the R console where the shiny::runApp command was executed.
Note that this includes any print, message, logger.*, etc. statements in your Shiny app.
If the Shiny app hits an R error, the backtrace should include a line like Hit error at of server.R#LXX – that XX being a line number that you can use to track down the error.
To return from the error to a normal R prompt, hit <Control>-C (alternatively, the “Stop” button in RStudio).
To restart the app, run shiny::runApp(port = XXXX) again (keeping the same port).
Note that Shiny runs any code in the pecan/shiny/<app> directory at the moment the app is launched.
So, any changes you make to the code in server.R and ui.R or scripts loaded therein will take effect the next time the app is started.
If for whatever reason this doesn’t work with RStudio, you can always run R from the command line.
Also, note that the ability to forward ports (ssh -L) may depend on the ssh configuration of your remote machine.
These instructions have been tested on the PEcAn VM (v.1.5.2+).