27 Coding Practices

27.1 Coding Style

Consistent coding style improves readability and reduces errors in shared code.

R does not have an official style guide, but Hadley Wickham provides one that is well thought out and widely adopted; see Advanced R: Coding Style.

Both the Wickham text and this page are derived from Google’s R Style Guide.

27.1.1 Use Roxygen2 documentation

Roxygen2 is the standard method of documentation used in PEcAn development. It provides inline documentation similar to doxygen. Even trivial functions should be documented.

See Roxygen2 Wiki page

27.1.2 Write your name at the top

Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation.

27.1.3 Use testthat testing package

See Unit_Testing wiki for instructions, and Advanced R: Tests.

Tests support documentation: they define what a function is (and is not) expected to do.
All functions need tests to ensure basic functionality is maintained during development.
Every bug should have a test that reproduces it, and the test should pass before the bug is closed.

27.1.4 Don’t use shortcuts

R provides many shortcuts that are useful when coding interactively, or for writing scripts. However, these can make code more difficult to read and can cause problems when written into packages.

27.1.4.1 Function Names (`verb.noun`)

Following the convention established in PEcAn 0.1, function names are all lowercase with periods separating words. They should generally have a verb.noun format, such as query.traits, get.samples, etc.

27.1.4.2 File Names

File names should end in .R, .Rdata, or .Rscript and should be meaningful, e.g. named after the primary functions that they contain. There should be a separate file for each major high-level function to aid in identifying the contents of files in a directory.

27.1.4.3 Use “<-” as an assignment operator

Because most R code uses <- (except where = is required), we will use <-
“=” is used for function arguments

27.1.4.4 Use Spaces

around all binary operators (=, +, -, <-, etc.).
after but not before a comma

27.1.4.5 Use curly braces

The option to omit curly braces is another shortcut that makes code easier to write but harder to read and more prone to error.
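
The conventions in this section can be seen together in one short function. This is only an illustrative sketch: convert.temperature and its arguments are hypothetical names, not part of PEcAn.

```r
# Hypothetical helper illustrating the style rules above:
# verb.noun name, `<-` for assignment, `=` only for arguments,
# spaces around binary operators and after commas, explicit braces.
convert.temperature <- function(celsius, to.kelvin = FALSE) {
  if (to.kelvin) {
    result <- celsius + 273.15
  } else {
    result <- celsius * 9 / 5 + 32
  }
  return(result)
}

convert.temperature(100)                  # Celsius to Fahrenheit
convert.temperature(0, to.kelvin = TRUE)  # Celsius to Kelvin
```

Writing the braces on both branches costs two lines but makes it obvious which statements belong to each condition when the function is edited later.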

27.1.5 Package Dependencies:

27.1.5.1 library vs require

When another package is required by a function or script, it can be loaded in the following ways:

1. When using library, if the dependency is not met, it will print an error and stop.
2. When using require, it will print a warning and continue (but will throw an error when a function from the required package is called).

Because a package dependency loads with the package, declaring dependencies should be the default approach when writing functions in a package. There can be some exceptions, such as when a rarely-used or non-essential function requires an esoteric package.

Reference: Stack Overflow “What is the difference between require and library?”
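
The difference can be checked directly with a package that is not installed. In this sketch, "notARealPackage" is a placeholder name assumed not to exist on your system.

```r
# "notARealPackage" is a placeholder for a dependency that is not installed.
missing.pkg <- "notARealPackage"

# require() returns FALSE (with a warning) and lets execution continue.
loaded <- suppressWarnings(require(missing.pkg, character.only = TRUE))
print(loaded)

# library() signals an error, which stops execution unless it is caught.
result <- tryCatch({
  library(missing.pkg, character.only = TRUE)
  "loaded"
}, error = function(e) {
  "library() stopped with an error"
})
print(result)
```

Because require reports failure through its return value, code that uses it should check that value rather than assume the package was attached.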

27.1.5.2 DEPENDS, SUGGESTS, IMPORTS

It is considered best practice to use DEPENDS and SUGGESTS in DESCRIPTION; SUGGESTS should be used for packages that are called infrequently, or only in examples and vignettes; suggested packages are called by require inside a function.

Consider using IMPORTS instead of DEPENDS in the DESCRIPTION files. This makes loading packages faster, because functions become available without loading the whole hierarchy of dependencies, dependencies of dependencies, ad infinitum. From p. 6 of the “R extensions manual” (http://cran.r-project.org/doc/manuals/R-exts.html):

The Suggests field uses the same syntax as Depends and lists packages that are not necessarily needed. This includes packages used only in examples, tests or vignettes (see Section 1.4 [Writing package vignettes], page 26), and packages loaded in the body of functions. E.g., suppose an example from package foo uses a dataset from package bar. Then it is not necessary to have bar use foo unless one wants to execute all the examples/tests/vignettes: it is useful to have bar, but not necessary. Version requirements can be specified, and will be used by R CMD check.
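
A function that relies on a suggested package should check for it at run time. The sketch below uses requireNamespace, which tests availability without attaching the package; this is a substitution for the require call mentioned above, chosen because it leaves the search path unchanged. The function summarize.values and the package name "notARealPackage" are hypothetical.

```r
# Hypothetical function whose optional feature needs a package listed
# under Suggests. "notARealPackage" is a placeholder name.
summarize.values <- function(x, optional.pkg = "notARealPackage") {
  if (!requireNamespace(optional.pkg, quietly = TRUE)) {
    # Suggested package is absent: fall back to base R instead of failing.
    return(list(mean = mean(x), enhanced = FALSE))
  }
  # The optional, package-dependent code would go here.
  list(mean = mean(x), enhanced = TRUE)
}

summarize.values(c(1, 2, 3))
```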

27.2 Logging

During development we often add many print statements to check how the code is doing, what is happening, what intermediate results there are, etc. When development is done, it would be nice to turn this additional code off but keep the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example, when I am working on the code to create graphs, I do not need to see any debugging information about the SQL commands being sent; however, when trying to figure out what goes wrong during a SQL statement, it would be nice to show the SQL statements without adding any additional code.

27.2.1 PEcAn logging functions

The logger family of functions is more sophisticated than the base R messaging functions, and can be used in place of stop, warning, print, and similar functions. The logger functions also make it easier to print to a system log file.

27.2.1.1 Examples

The file test.logger.R provides descriptive examples
This query provides a current overview of functions that use logging
logger functions (in order of increasing level):
logger.debug
logger.info
logger.warn
logger.error
the logger.setLevel function sets the level at which a message will be printed
logger.setLevel("DEBUG") will print messages from all logger functions
logger.setLevel("ERROR") will only print messages from logger.error
logger.setLevel("INFO") and logger.setLevel("WARN") show messages from the logger function of that level and higher, e.g. logger.setLevel("WARN") shows messages from logger.warn and logger.error
logger.setLevel("OFF") suppresses all logger messages
To print all messages to console, use logger.setUseConsole(TRUE)
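
The level rules listed above can be illustrated with a few lines of base R. This is a minimal sketch of level-based filtering and not the PEcAn implementation; all names in it (log.levels, log.state, set.level, log.message) are hypothetical.

```r
# Minimal base-R sketch of level-based filtering; NOT the PEcAn code.
# All names below are hypothetical.
log.levels <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4, OFF = 5)
log.state <- new.env()
log.state$level <- "INFO"

set.level <- function(level) {
  stopifnot(level %in% names(log.levels))
  log.state$level <- level
  invisible(level)
}

# Print the message only if its level is at or above the current threshold.
# Returns TRUE if the message was shown, FALSE if it was suppressed.
log.message <- function(level, ...) {
  show <- log.levels[[level]] >= log.levels[[log.state$level]]
  if (show) {
    message(level, ": ", ...)
  }
  invisible(show)
}

set.level("WARN")
log.message("INFO", "not shown at WARN")
log.message("ERROR", "shown at WARN")
set.level("OFF")
log.message("ERROR", "suppressed when logging is OFF")
```

Ordering the levels numerically is what makes a single threshold sufficient: raising the threshold silences every lower level at once.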

27.2.2 Other R logging packages

This section is for reference - these functions should not be used in PEcAn, as they are redundant with the logger.* functions described above

R does provide a basic logging capability using stop, warning and message. These print a message (and, in the case of stop, halt execution). However, there is no easy way to redirect the logging information to a file, or to turn it on and off. This is where one of the following packages comes into play. The packages themselves are very similar since they both try to emulate log4j.

Both of the following packages use hierarchical loggers, meaning that if you change the logging level at one level of the hierarchy, all levels below it will update their logging.

27.2.2.1 logging

The logging development is done at http://logging.r-forge.r-project.org/ and more information is located at http://cran.r-project.org/web/packages/logging/index.html . To install use the following command:

install.packages("logging", repos="http://R-Forge.R-project.org")

This is my preference, purely based on its documentation.

27.2.2.2 futile

The second logging package is http://cran.r-project.org/web/packages/futile.logger/ and is eerily similar to logging (as a matter of fact logging is based on futile).

27.2.3 Example Usage

To be able to use the loggers, some initialization needs to be done. Neither package can read its configuration from a file, so we might want to use the pecan.xml file to set it up. The setup will always be somewhat the same:

# load library
library(logging)
logReset()

# add handlers, responsible for actually printing/saving the messages
addHandler(writeToConsole)
addHandler(writeToFile, file="file.log")

# setup root logger with INFO
setLevel('INFO')

# make all of PEcAn print debug messages
setLevel('DEBUG', getLogger('PEcAn'))

# only print info and above for the SQL part of PEcAn
setLevel('INFO', getLogger('PEcAn.SQL'))

To use logging in the code, use either of the following two equivalent forms:

pl <- getLogger('PEcAn.MetaAnalysis.function1')
pl$info("This is an INFO message.")
pl$debug("The value for x=%d", x)
pl$error("Something bad happened and I am scared now.")

loginfo("This is an INFO message.", logger="PEcAn.MetaAnalysis.function1")
logdebug("The value for x=%d", x, logger="PEcAn.MetaAnalysis.function1")
logerror("Something bad happened and I am scared now.", logger="PEcAn.MetaAnalysis.function1")

27.3 Package Data

27.3.1 Summary:

Files with the following extensions will be read by R as data:

plain R code in .R and .r files is sourced using source()
text tables in .tab, .txt, .csv files are read using read.table()
objects in R image files (.RData, .rda) are loaded using load()
capitalization matters
all objects in foo.RData are loaded into the environment
pro: easiest way to store objects in R format
con: format is application (R) specific (discussed in #318)

Details are in ?data, which is mostly a copy of Data section of Writing R Extensions.
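
The point that load() restores every object in an image file can be checked with a temporary file. The object names below (time.constant, site.names) are made up for this sketch.

```r
# Save two objects into one .RData file in a temporary directory.
time.constant <- 86400
site.names <- c("a", "b")
rdata.file <- file.path(tempdir(), "foo.RData")
save(time.constant, site.names, file = rdata.file)

# load() restores every object in the file; loading into a fresh
# environment avoids overwriting objects in the workspace.
data.env <- new.env()
loaded.names <- load(rdata.file, envir = data.env)
print(sort(loaded.names))
print(data.env$time.constant)
```

load() returns the names of the objects it created, which is a convenient way to find out what a file contains.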

27.3.2 Accessing data

Data in the data directory can be accessed in the following ways:

efficient way: (especially for large data sets) using the data function:

data(foo) # accesses data with, e.g. load("foo.RData"), read.table("foo.csv"), or source("foo.R")

easy way: by adding the following line to the package DESCRIPTION (note: this should be used with caution, as it can cause difficulty as discussed in redmine issue #1118):

LazyData: TRUE

From the R help page:

Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:

* files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
* files ending .RData or .rda are load()ed.
* files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
* files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.

If your data does not fall into one of those 4 categories, you can use the system.file function to get access to the data:

system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"

The arguments are folder, filename(s) and then package. It returns the fully qualified path name to a file in a package; in this case it points to the trait data. This is almost the same as the data function, but we can now use any function to read the file, such as read.csv instead of read.csv2, which seems to be the default of data. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
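
The same pattern works for any installed package. The sketch below uses the DESCRIPTION file of the base stats package only because that file is always present; the file name no.such.file.csv is deliberately fictitious.

```r
# Locate a file shipped with an installed package. The DESCRIPTION file
# of the base "stats" package is used only because it is always present.
description.path <- system.file("DESCRIPTION", package = "stats")

# system.file() returns "" when the file is not found, so check first.
if (nzchar(description.path)) {
  # Any reader can now be used; read.dcf() suits this file format.
  description <- read.dcf(description.path)
  print(description[1, "Package"])
}

# A file that does not exist yields an empty string rather than an error.
missing.path <- system.file("data", "no.such.file.csv", package = "stats")
print(missing.path)
```

Checking for the empty string before reading gives a clearer failure than the error a reader function would raise on an empty path.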

27.3.2.0.1 Examples of data in PEcAn packages

Redmine issue #1060 added time constants in source:utils/data/time.constants.RData
outputs: [/modules/uncertainties/data/output.RData]
parameter samples [/modules/uncertainties/data/samples.RData]

27.4 Packages used in development

27.4.0.1 roxygen2

Used to document code. See instructions under Coding Style.

27.4.0.2 devtools

Provides functions to simplify development

Documentation: The R devtools package

load_all("pkg")
document("pkg")
test("pkg")
install("pkg")
build("pkg")

other tips for devtools (from the documentation):

Adding the following to your ~/.Rprofile will load devtools when running R in interactive mode:

# load devtools by default
if (interactive()) {
  suppressMessages(require(devtools))
}

Adding the following to your .Rpackages will allow devtools to recognize package by folder name, rather than directory path

# in this example, devhome is the pecan trunk directory
devhome <- "/home/dlebauer/R-dev/pecandev/"
list(
  # fallback used for any package name not listed below
  default = function(x) {
    file.path(devhome, x, x)
  },
  "utils" = paste(devhome, "pecandev/utils", sep = ""),
  "common" = paste(devhome, "pecandev/common", sep = ""),
  "all" = paste(devhome, "pecandev/all", sep = ""),
  "ed" = paste(devhome, "pecandev/models/ed", sep = ""),
  "uncertainty" = paste(devhome, "modules/uncertainty", sep = ""),
  "meta.analysis" = paste(devhome, "modules/meta.analysis", sep = ""),
  "db" = paste(devhome, "db", sep = "")
)

Now, devtools can take pkg as an argument instead of /path/to/pkg/, so you can use build("pkg") instead of build("/path/to/pkg/").

27.5 Roxygen2

Roxygen2 is the standard method of documentation used in PEcAn development. It provides inline documentation similar to doxygen.

27.5.1 Canonical references:

Must Read: R package development by Hadley Wickham:
Object Documentation
Package Metadata
Roxygen2 Documentation
Roxygen2 Package Documentation
GitHub

27.5.2 Basic Roxygen2 instructions:

Section headers link to “Writing R extensions” which provides in-depth documentation. This is provided as an overview and quick reference.

27.5.2.1 Tags

tags are preceded by ##'
tags required by R:
  title: the tag is required, along with the actual title
  param: one for each parameter, which should be defined
  return: must state what the function returns (or nothing, if something occurs as a side effect)
tags strongly suggested for most functions:
  author
  examples: can be similar to test cases
optional tags:
  export: required if the function is used by another package
  import: can import a required function from another package (if the package is not loaded or the other function is not exported)
  seealso: suggests related functions. These can be linked using \code{\link{}}

27.5.2.2 Text markup

27.5.2.2.1 Formatting

\bold{}
\emph{} italics

27.5.2.2.2 Links

\url{www.url.com} or \href{url}{text} for links
\code{\link{thisfn}} links to function “thisfn” in the same package
\code{\link[foo]{thatfn}} links to function “thatfn” in package “foo”
\pkg{package_name}

27.5.2.2.3 Math

\eqn{a+b=c} uses LaTeX to format an inline equation
\deqn{a+b=c} uses LaTeX to format a displayed equation
\deqn{latex}{ascii} and \eqn{latex}{ascii} can be used to provide different versions in LaTeX and ASCII.

27.5.2.2.4 Lists

\enumerate{
\item A database consists of one or more records, each with one or
more named fields.
\item Regular lines start with a non-whitespace character.
\item Records are separated by one or more empty lines.
}
\itemize and \enumerate commands may be nested.

27.5.2.2.5 Tables (http://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables)

\tabular{rlll}{
[,1] \tab Ozone \tab numeric \tab Ozone (ppb)\cr
[,2] \tab Solar.R \tab numeric \tab Solar R (lang)\cr
[,3] \tab Wind \tab numeric \tab Wind (mph)\cr
[,4] \tab Temp \tab numeric \tab Temperature (degrees F)\cr
[,5] \tab Month \tab numeric \tab Month (1--12)\cr
[,6] \tab Day \tab numeric \tab Day of month (1--31)
}
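
The markup commands above can be combined in a single Roxygen block. The function below, annualize.flux, is hypothetical and exists only to show \deqn, \itemize, \emph, and \code{\link{}} in context; the author field is a placeholder.

```r
##' Convert a per-second flux to an annual total
##'
##' Multiplies a flux by the number of seconds in a 365-day year:
##' \deqn{total = flux \times 365 \times 86400}{total = flux * 365 * 86400}
##' The input must satisfy:
##' \itemize{
##'   \item \code{flux} is \emph{numeric}
##'   \item units are per second
##' }
##' @param flux numeric, flux per second
##' @return numeric, flux summed over a 365-day year
##' @seealso \code{\link{sum}}
##' @author (your name here)
annualize.flux <- function(flux) {
  return(flux * 365 * 86400)
}

annualize.flux(1)
```

The two-argument form of \deqn supplies a plain-text fallback, so the equation stays readable in the text help pages as well as in the PDF manual.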

27.5.3 Example

Here is an example documented function, myfun

##' My function adds three numbers
##'
##' A great function for demonstrating Roxygen documentation
##' @param a numeric
##' @param b numeric
##' @param c numeric
##' @return d, numeric sum of a + b + c
##' @export
##' @author David LeBauer
##' @examples
##' myfun(1,2,3)
##' \dontrun{myfun(NULL)}
myfun <- function(a, b, c){
  d <- a + b + c
  return(d)
}

In emacs, with the cursor inside the function, the keybinding C-x O will generate an outline or update the Roxygen2 documentation.

27.5.4 Updating documentation

After adding documentation, run the following commands (replacing common with the name of the folder you want to update). In R, use devtools to call roxygenize:

require(devtools)
document("common")

27.6 Testing

PEcAn uses the testthat package developed by Hadley Wickham. Hadley has written instructions for using this package in his Testing chapter.

27.6.1 Rationale

makes development easier
provides working documentation of expected functionality
saves time by allowing computer to take over error checking once a test has been made
improves code quality
Further reading: Aruliah et al 2012 Best Practices for Scientific Computing

27.6.2 Tests make development easier and less error prone

Testing makes development easier by organizing everything you are already doing anyway and integrating it into testing and documentation. With a codebase like PEcAn, it is often difficult to get started. You have to figure out:

what was I doing yesterday?
what do I want to do today?
what existing functions do I need to edit?
what are the arguments to these functions (and what are examples of valid arguments)
what packages are affected
where is a logical place to put files used in testing

27.6.3 Quick Start:

decide what you want to do today
identify the issue in github (if none exists, create one)
to work on issue 99, create a new branch called “github99” or some descriptive name. Today we will enable an existing function, make.cheas, to make goat.cheddar. We will know that we are done by the color and taste.
```
git branch goat-cheddar
git checkout goat-cheddar
```
open existing (or create new) file in inst/tests/. If working on code in “myfunction” or a set of functions in “R/myfile.R”, the file should be named accordingly, e.g. “inst/tests/test.myfile.R”
if you are lucky, the function has already been tested and has some examples.
if not, you may need to create a minimal example, often requiring a settings file. The default settings file can be obtained in this way:
```
settings <- read.settings(system.file("extdata/test.settings.xml", package = "PEcAn.utils"))
```

write what you want to do

test_that("make.cheas can make cheese",{
  goat.cheddar <- make.cheas(source = 'goat', style = 'cheddar')
  expect_equal(color(goat.cheddar), "orange")
  expect_is(object = goat.cheddar, class = "cheese")
  expect_true(all(c("sharp", "creamy") %in% taste(goat.cheddar)))
})

now edit the make.cheas function until it makes savory, creamy, orange cheese.
commit often

update documentation and test

library(devtools)
document("mypkg")
test("mypkg")

commit again

when complete, merge, and push

git commit -m "make.cheas makes goat.cheddar now"
git checkout master
git merge goat-cheddar
git push

27.6.4 Test files

Many of PEcAn’s functions require inputs that are provided as data. These can be in the /data or the /inst/extdata folders of a package. Data that are not package specific should be placed in the PEcAn.all or PEcAn.utils packages.

Some useful conventions:

27.6.4.1 Settings

A generic settings file can be found in the PEcAn.all package

settings.xml <- system.file("pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)

database settings can be specified, and tests run only if a connection is available

We currently use the following database to run tests against; tests that require access to a database should check db.exists() and be skipped if it returns FALSE to avoid failed tests on systems that do not have the database installed.

settings$database <- list(userid = "bety",
                          passwd = "bety",
                          name = "bety",      # database name
                          host = "localhost") # server name
test_that(..., {
  skip_if_not(db.exists(settings$database))
  ## write tests here
})

instructions for installing this are available on the VM creation wiki
examples can be found in the PEcAn.DB package (base/db/tests/testthat/).
Model specific settings can go in the model-specific module, for example:

settings.xml <- system.file("extdata/pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)

test-specific settings:

settings text can be specified inline:

settings.text <- "
  <pecan>
    <nocheck>nope</nocheck> <!-- allows bypass of checks in the read.settings functions -->
    <pfts>
      <pft>
        <name>ebifarm.pavi</name>
        <outdir>test/</outdir>
      </pft>
    </pfts>
    <outdir>test/</outdir>
    <database>
      <userid>bety</userid>
      <passwd>bety</passwd>
      <location>localhost</location>
      <name>bety</name>
    </database>
  </pecan>"
settings <- read.settings(settings.text)

values in settings can be updated:

settings <- read.settings(settings.text)
settings$outdir <- "/tmp" ## or any other settings

27.6.4.2 Helper functions created to make testing easier

tryl returns FALSE if function gives error
temp.settings creates temporary settings file
test.remote returns TRUE if remote connection is available
db.exists returns TRUE if connection to database is available
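
To make the idea behind a helper such as tryl concrete, here is one way an error-to-logical wrapper could be written in base R. This is a guess at the concept, not PEcAn's code; the name runs.without.error is hypothetical, and the real tryl may be implemented differently.

```r
# Hypothetical sketch of an error-to-logical helper; PEcAn's own tryl
# may be implemented differently.
runs.without.error <- function(expr) {
  # expr is evaluated lazily inside tryCatch, so any error is caught here.
  tryCatch({
    expr
    TRUE
  }, error = function(e) {
    FALSE
  })
}

runs.without.error(log(10))       # evaluates without error
runs.without.error(stop("boom"))  # the error is caught
```

A wrapper like this lets a test assert that code runs at all, for example with expect_true, before more specific expectations are added.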

27.6.4.3 When should I test?

A test should be written for each of the following situations:

Each bug should get a regression test.

The first step in handling a bug is to write code that reproduces the error
This code becomes the test
most important when error could re-appear
essential when error silently produces invalid results

Every time a (non-trivial) function is created or edited

Write tests that indicate how the function should perform
- example: expect_equal(sum(1,1), 2) indicates that the sum function should take the sum of its arguments
Write tests for cases under which the function should throw an error
example: expect_error(sum("foo"))
better : expect_error(sum("foo"), "invalid 'type' (character)", fixed = TRUE)
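
One caveat when matching error messages: the expected message is treated as a regular expression by default, so characters such as parentheses are not matched literally. The base R sketch below shows the effect with grepl, assuming English-language error messages; in testthat, passing fixed = TRUE to expect_error requests literal matching in the same way.

```r
# Assume English messages so the text below matches.
Sys.setenv(LANGUAGE = "en")

# Capture the message produced by sum("foo").
msg <- tryCatch(sum("foo"), error = function(e) conditionMessage(e))

# As a regular expression, "(character)" is a group that matches the bare
# word "character", so the literal parentheses in the message do not match.
regex.match <- grepl("invalid 'type' (character)", msg)

# With fixed = TRUE the pattern is compared as a literal string.
fixed.match <- grepl("invalid 'type' (character)", msg, fixed = TRUE)

print(c(regex = regex.match, fixed = fixed.match))
```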

27.6.4.4 What types of testing are important to understand?

27.6.4.5 Unit Testing / Test Driven Development

Tests are only as good as the test

write test
write code

27.6.4.6 Regression Testing

When a bug is found,

write a test that finds the bug (the minimum test required to make the test fail)
fix the bug
bug is fixed when test passes
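
The three steps above can be sketched as follows. The function calculate.mean and its bug are invented for illustration; testthat is assumed to be installed, and the guard simply skips the test when it is not.

```r
# Hypothetical function with the bug already fixed: the first version
# omitted na.rm = TRUE and returned NA whenever the input contained NA.
calculate.mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# The regression test is the smallest input that reproduced the bug.
# It failed before the fix and passes after it.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("calculate.mean ignores missing values", {
    testthat::expect_equal(calculate.mean(c(1, NA, 3)), 2)
  })
}
```

Keeping the test in the package after the fix is what guards against the bug re-appearing.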

27.6.4.7 How should I test in R? The testthat package.

tests are found in ~/pecan/<packagename>/inst/tests, for example utils/inst/tests/

See attached file and http://r-pkgs.had.co.nz/tests.html for details on how to use the testthat package.

27.6.4.7.1 List of Expectations

Full	Abbreviation
expect_that(x, is_true())	expect_true(x)
expect_that(x, is_false())	expect_false(x)
expect_that(x, is_a(y))	expect_is(x, y)
expect_that(x, equals(y))	expect_equal(x, y)
expect_that(x, is_equivalent_to(y))	expect_equivalent(x, y)
expect_that(x, is_identical_to(y))	expect_identical(x, y)
expect_that(x, matches(y))	expect_match(x, y)
expect_that(x, prints_text(y))	expect_output(x, y)
expect_that(x, shows_message(y))	expect_message(x, y)
expect_that(x, gives_warning(y))	expect_warning(x, y)
expect_that(x, throws_error(y))	expect_error(x, y)

27.6.4.7.2 How to run tests

add the following to “pecan/tests/testthat.R”

library(testthat)
library(mypackage)

test_check("mypackage")

27.6.4.8 basic use of the testthat package

Here is an example of tests (these should be placed in <packagename>/tests/testthat/test-<sourcefilename>.R):

test_that("mathematical operators plus and minus work as expected",{
  expect_equal(sum(1,1), 2)
  expect_equal(sum(-1,-1), -2)
  expect_equal(sum(1,NA), NA_real_) # sum() of numerics returns a numeric NA
  expect_error(sum("cat"))
  set.seed(0)
  expect_equal(sum(matrix(1:100)), sum(data.frame(1:100)))
})

test_that("different testing functions work, giving excuse to demonstrate",{
  expect_identical(1, 1)
  expect_false(identical(numeric(1), integer(1))) # double and integer are not identical
  expect_equivalent(numeric(1), integer(1))
  expect_warning(mean('1'))
  expect_that(mean('1'), gives_warning("argument is not numeric or logical: returning NA"))
  expect_warning(mean('1'), "argument is not numeric or logical: returning NA")
  expect_message(message("a"), "a")
})

27.6.4.8.1 Script testing

It is useful to add tests to a script during development. This allows you to test that the code is doing what you expect it to do.

* here is a fake script using the iris data set

test_that("the iris data set has the same basic features as before",{
  expect_equal(dim(iris), c(150,5))
  expect_that(iris$Sepal.Length, is_a("numeric"))
  expect_is(iris$Sepal.Length, "numeric")#equivalent to prev. line
  expect_is(iris$Species, "factor")
})

iris.color <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                         color = c("pink", "blue", "orange"))

newiris <- merge(iris, iris.color)
iris.model <- lm(Petal.Length ~ color, data = newiris)

test_that("changes to Iris code occurred as expected",{
  expect_that(dim(newiris), equals(c(150, 6)))
  expect_that(unique(newiris$color),
              is_identical_to(unique(iris.color$color)))
  expect_equivalent(iris.model$coefficients["(Intercept)"], 4.26)
})

27.6.4.8.2 Function testing

Testing of a new function, as.sequence. The function and documentation are in source:R/utils.R and the tests are in source:tests/test.utils.R.

Recently, I made the function as.sequence to turn any vector into a sequence, with custom handling of NA’s:

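as.sequence <- function(x, na.rm = TRUE){
  # map each unique value to an integer index, in order of first appearance
  x2 <- as.integer(factor(x, unique(x)))
  if(all(is.na(x2))){
    x2 <- rep(1L, length(x2))
  }
  if(na.rm == TRUE){
    # NAs get their own index, one past the largest; 1L keeps x2 integer
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1L
  }
  return(x2)
}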

The next step was to add documentation and test. Many people find it more efficient to write tests before writing the function. This is true, but it also requires more discipline. I wrote these tests to handle the variety of cases that I had observed.

As currently used, the function is exposed to a fairly restricted set of options - results of downloads from the database and transformations.

test_that("as.sequence works", {
  expect_identical(as.sequence(c("a", "b")), 1:2)
  expect_identical(as.sequence(c("a", NA)), 1:2)
  expect_equal(as.sequence(c("a", NA), na.rm = FALSE), c(1, NA))
  expect_equal(as.sequence(c(NA, NA)), c(1, 1))
})

27.6.5 Testing the Shiny Server

Shiny can be difficult to debug because, when run as a web service, the R output is hidden in system log files that are hard to find and read. One useful approach to debugging is to use port forwarding, as follows.

First, on the remote machine (including the VM), make sure R’s working directory is set to the directory of the Shiny app (e.g., setwd("/path/to/pecan/shiny/WorkflowPlots"), or just open the app as an RStudio project). Then, in the R console, run the app as:

shiny::runApp(port = XXXX)
# E.g. shiny::runApp(port = 5638)

Then, on your local machine, open a terminal and run the following command, matching XXXX to the port above and YYYY to any unused port on your local machine (any 4-digit number should work).

ssh -L YYYY:localhost:XXXX <remote connection>
# E.g., for the PEcAn VM, given the above port:
# ssh -L 5639:localhost:5638 carya@localhost -p 6422

Now, in a web browser on your local machine, browse to localhost:YYYY (e.g., localhost:5639) to run whatever app you started with shiny::runApp in the previous step. All of the output should display in the R console where the shiny::runApp command was executed. Note that this includes any print, message, logger.*, etc. statements in your Shiny app.

If the Shiny app hits an R error, the backtrace should include a line like Hit error at of server.R#LXX – that XX being a line number that you can use to track down the error. To return from the error to a normal R prompt, hit <Control>-C (alternatively, the “Stop” button in RStudio). To restart the app, run shiny::runApp(port = XXXX) again (keeping the same port).

Note that Shiny runs any code in the pecan/shiny/<app> directory at the moment the app is launched. So, any changes you make to the code in server.R and ui.R or scripts loaded therein will take effect the next time the app is started.

If for whatever reason this doesn’t work with RStudio, you can always run R from the command line. Also, note that the ability to forward ports (ssh -L) may depend on the ssh configuration of your remote machine. These instructions have been tested on the PEcAn VM (v.1.5.2+).