8 Developer guide
8.1 Updating PEcAn Code and Bety Database
Release notes for all releases can be found here.
This page lists only the steps needed to upgrade an existing system. When updating PEcAn, you are strongly encouraged to update BETY as well. Instructions for doing so, and for updating the database, are in the Updating BETYdb gitbook page.
8.1.1 Updating PEcAn
The latest version of PEcAn code can be obtained from the PEcAn repository on GitHub:
The PEcAn build system is based on GNU Make.
The simplest way to install is to run make from inside the PEcAn directory.
This will update the documentation for all packages and install them, as well as all required dependencies.
For more control, the following make commands are available:
- make document – Use devtools::document to update the documentation for all packages. Under the hood, this uses the roxygen2 documentation system.
- make install – Install all packages and their dependencies using devtools::install. By default, this only installs packages that have had their code changed and any dependent packages.
- make check – Perform a rigorous check of packages using devtools::check
- make test – Run all unit tests (based on the testthat package) for all packages, using devtools::test
- make clean – Remove the make build cache, which is used to track which packages have changed. Cache files are stored in the .doc, .install, .check, and .test subdirectories in the PEcAn main directory. Running make clean will force the next invocation of make commands to operate on all PEcAn packages, regardless of changes.
The following are some additional make tricks that may be useful:
- Install, check, document, or test a specific package – make .<cmd>/<pkg-dir>; e.g. make .install/utils or make .check/modules/rtm
- Force make to run, even if package has not changed – make -B <command>
- Run make commands in parallel – make -j<ncores>; e.g. make -j4 install to install packages using four parallel processes.
All instructions for the make build system are contained in the Makefile in the PEcAn root directory.
For full documentation on make, see the man pages by running man make from a terminal.
8.2 Git and GitHub Workflow
8.2.1 Using Git
This document describes the steps required to download PEcAn, make changes to code, and submit your changes.
- If you are new to GitHub or to PEcAn, start with the one-time set-up instructions under Before any work is done. Also see the excellent tutorials and references in the Git section right below this list and at the bottom in References.
- To make trivial changes, see Quick and Easy.
- To make a few changes to the code, start with the Basic Workflow.
- To make substantial changes and/or if you plan to contribute over time, see Recommended Workflow: A new branch for each change.
8.2.1.1 Git
Git is a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Every Git clone is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server. Branching and merging are fast and easy to do.
A good place to start is the GitHub 5 minute illustrated tutorial. In addition, there are three fun tutorials for learning git:
- Learn Git is a great web-based interactive tutorial.
- LearnGitBranching
- TryGit.
URLs: The rest of this document uses specific URLs to clone the code. There are a few URLs you can use to clone a project, using https, ssh and git. You can use either https or ssh to clone a repository and write to it. The git protocol is read-only.
8.2.1.2 PEcAn Project and Github
- Organization Repository: https://github.com/organizations/PecanProject
- PEcAn source code: https://github.com/PecanProject/pecan.git
- BETYdb source code: https://github.com/PecanProject/bety.git
These instructions apply to other repositories too.
8.2.1.3 PEcAn Project Branches
We follow branch organization laid out on this page.
In short, there are three main branches you must be aware of:
- develop - Main Branch containing the latest code. This is the main branch you will make changes to.
- master - Branch containing the latest stable code. DO NOT MAKE CHANGES TO THIS BRANCH.
- release/vX.X.X - Named branches containing code specific to a release. Only make changes to this branch if you are fixing a bug on a release branch.
8.2.1.4 Milestones, Issues, Tasks
Milestones, issues, and tasks can be used to organize specific features or research projects. In general, there is a hierarchy:
- milestones (Big picture, “Epic”): contains many issues, organized by release.
- issues (Specific features / bugs, “Story”): may contain a list of tasks; represent
- task list (to do list, “Tasks”): list of steps required to close an issue, e.g.:
* [ ] first do this
* [ ] then this
* [ ] completed when x and y
8.2.1.5 Quick and Easy
The easiest approach is to use GitHub’s browser-based workflow. This is useful when your change is a few lines, if you are editing a wiki, or if the edit is trivial (and won’t break the code). The GitHub documentation is here, but it is simple: find the page or file you want to edit and click “edit”; the GitHub web application will then automatically fork and branch, and allow you to submit a pull request. Note, however, that unless you are a member of the PEcAn project the “edit” button will not be active, and you will want to follow the workflow described below for forking and then submitting a pull request.
8.2.1.6 Recommended Git Workflow
Each feature should be in its own branch (for example each issue is a branch, names of branches are often the issue in a bug tracking system).
Commit and Push Frequency: On your branch, commit at minimum once a day before you push changes. Even better: commit every time you reach a stopping point and move to a new issue. Best: commit any time that you have done work that you do not want to re-do. Remember, pushing changes to your branch is like saving a draft. Submit a pull request when you are done.
8.2.1.7 Before any work is done
The first step below only needs to be done once when you first start working on the PEcAn code. The steps below that need to be done to set up PEcAn on your computer, and would need to be repeated if you move to a new computer. If you are working from the PEcAn VM, you can skip the “git clone” since the PEcAn code is already installed.
Most people will not be able to work in the PEcAn repository directly and will need to create a fork of the PEcAn source code in their own GitHub space (GitHub help: “fork a repo”). This forked repository will allow you to create branches, commit changes back to GitHub, and create pull requests to the develop branch of PEcAn.
The forked repository is the only way for external people to commit code back to PEcAn and BETY. The pull request will start a review process that will eventually result in the code being merged into the main copy of the codebase. See https://help.github.com/articles/fork-a-repo for more information, especially on how to keep your fork up to date with respect to the original. (Rstudio users should also see Git + Rstudio, below)
You can set up SSH keys to make it easier to commit code back to GitHub. This is especially helpful if you are working from a cluster; see set up ssh keys.
- Introduce yourself to GIT
git config --global user.name "FULLNAME"
git config --global user.email you@yourdomain.example.com
- Fork PEcAn on GitHub. Go to the PEcAn source code and click on the Fork button in the upper right. This will create a copy of PEcAn in your personal space.
- Clone to your local machine via command line
git clone git@github.com:<username>/pecan.git
If this does not work, try the https method
git clone https://github.com/<username>/pecan.git
- Define upstream repository
cd pecan
git remote add upstream git@github.com:PecanProject/pecan.git
8.2.1.8 During development:
- commit often;
- each commit can address 0 or 1 issue; many commits can reference an issue
- ensure that all tests are passing before anything is pushed into develop.
8.2.1.9 Basic Workflow
This workflow is for educational purposes only. Please use the Recommended Workflow if you plan on contributing to PEcAn. This workflow does not include creating branches, a feature we would like you to use.
- Get the latest code from the main repository
git pull upstream develop
- Do some coding
- Commit after each chunk of code (multiple times a day)
git commit -m "<some descriptive information about what was done; references/fixes gh-X>"
- Push to YOUR Github (when a feature is working, a set of bugs are fixed, or you need to share progress with others)
git push origin develop
- Before submitting code back to the main repository, make sure that code compiles from the main directory.
make
- submit pull request with a reference to related issue;
- also see github documentation
8.2.1.10 Recommended Workflow: A new branch for each change
- Make sure you start in develop
git checkout develop
- Make sure develop is up to date
git pull upstream develop
- Run the PEcAn MAKEFILE to compile code from the main directory.
make
- Create a branch and switch to it
git checkout -b <branchname>
- Work/commit/etc
git add <file_that_was_changed.R>
git commit -m "<some descriptive information about what was done>"
- Make sure that the code compiles and the documentation is updated. The make document command will run roxygenise.
make document
make
- Push this branch to your github space
git push origin <branchname>
- submit pull request with a reference to the related issue (see Link commits to issues);
8.2.1.11 After pull request is merged
- Make sure you start in develop
git checkout develop
- delete branch remotely
git push origin --delete <branchname>
- delete branch locally
git branch -D <branchname>
8.2.1.12 Fixing a release Branch
If you would like to make changes to a release branch, you must follow a different workflow, as the release branch will not contain the latest code on develop and must remain separate.
- Fetch upstream remote branches
git fetch upstream
- Checkout the correct release branch
git checkout -b release/vX.Y.Z upstream/release/vX.Y.Z
- Compile Code with make
make
- Make changes and commit them
git add <changed_file.R>
git commit -m "Describe changes"
- Compile and make roxygen changes
make
make document
- Commit and push any files that were changed by make document
- Make a pull request. It is essential that you compare your pull request to the remote release branch, NOT the develop branch.
8.2.1.13 Link commits to issues
You can reference and close issues from comments, pull requests, and commit messages. This should be done when you commit code that is related to or will close/fix an existing issue.
There are two ways to do this. One easy way is to include the following text in your commit message:
- Github
- to close: “closes gh-xxx” (or syn. close, closed, fixes, fix, fixed)
- to reference: just the issue number (e.g. “gh-xxx”)
8.2.1.14 Other Useful Git Commands:
- GIT encourages branching “early and often”
- First pull from develop
- Branch before working on feature
- One branch per feature
- You can switch easily between branches
- Merge feature into main line when branch done
If during above process you want to work on something else, commit all your code, create a new branch, and work on new branch.
- Delete a branch:
git branch -d <name of branch>
- To push a branch:
git push -u origin <name of branch>
- To check out a branch:
git fetch origin
git checkout --track origin/<name of branch>
- Show graph of commits:
git log --graph --oneline --all
8.2.1.16 Git + Rstudio
Rstudio is nicely integrated with many development tools, including git and GitHub. It is quite easy to check out source code from within the Rstudio program or browser. The Rstudio documentation includes useful overviews of version control and R package development.
Once you have git installed on your computer (see the Rstudio version control documentation for instructions), you can use the following steps to install the PEcAn source code in Rstudio.
8.2.1.17 Creating a Read-only version:
This is a fast way to clone the repository that does not support contributing new changes (this can be done with further modification).
- install Rstudio (www.rstudio.com)
- click (upper right) project
- create project
- version control
- Git - clone a project from a Git Repository
- paste https://www.github.com/PecanProject/pecan
- choose working dir. for repo
8.2.1.18 For development:
- create account on github
- create a fork of the PEcAn repository to your own account https://www.github.com/pecanproject/pecan
- install Rstudio (www.rstudio.com)
- generate an ssh key
- in Rstudio:
Tools -> Options -> Git/SVN -> "create RSA key"
View public key -> ctrl+C to copy
- in GitHub
- go to ssh settings
-> 'add ssh key' -> ctrl+V to paste -> 'add key'
- Create project in Rstudio
project (upper right) -> create project -> version control -> Git - clone a project from a Git Repository
- paste repository url
git@github.com:<username>/pecan.git
- choose working dir. for repository
8.2.1.19 References
8.2.1.20 Git Documentation
- Scott Chacon, ‘Pro Git book’, http://git-scm.com/book
- GitHub help pages, https://help.github.com/
- Main GIT page http://git-scm.com/documentation
- Another set of pages about branching, http://sandofsky.com/blog/git-workflow.html
- Stackoverflow highest voted questions tagged “git”
8.2.1.21 GitHub Documentation
When in doubt, the first step is to click the “Help” button at the top of the page.
- GitHub Flow by Scott Chacon (Git evangelist and Ruby developer working on GitHub.com)
- GitHub FAQ
- Using Pull Requests
- SSH Keys
8.2.2 GitHub use with PEcAn
In this section, development topics are introduced and discussed. PEcAn code lives within the If you are looking for an issue to work on, take a look through issues labeled “good first issue”. To get started you will want to review
We use GitHub to track development.
To learn about GitHub, it is worth taking some time to read through the FAQ. When in doubt, the first step is to click the “Help” button at the top of the page.
- To address specific people, use a github feature called @mentions e.g. write @dlebauer, @robkooper, @mdietze, or @serbinsh … in the issue to alert the user as described in the GitHub documentation on notifications
8.2.2.1 Bugs, Issues, Features, etc.
8.2.2.2 Reporting a bug
- (For developers) work through debugging.
- Once you have identified a problem that you cannot resolve, you can write a bug report
- Write a bug report
- submit the bug report
- If you do find the answer, explain the resolution (in the issue) and close the issue
8.2.2.3 Required content
Note:
- a bug is only a bug if it is reproducible
- clear bug reports save time
- Clear, specific title
- Description -
- What you did
- What you expected to happen
- What actually happened
- What does work, under what conditions does it fail?
- Reproduction steps - minimum steps required to reproduce the bug
- additional materials that could help identify the cause:
- screen shots
- stack traces, logs, scripts, output
- specific code and data / settings / configuration files required to reproduce the bug
- environment (operating system, browser, hardware)
8.2.2.4 Requesting a feature
(from The Pragmatic Programmer, available as ebook through UI libraries, hardcopy on David’s bookshelf)
- focus on “user stories”, e.g. specific use cases
Be as specific as possible,
Here is an example:
- Bob is at www.mysite.edu/maps
- map of the region (based on user location, e.g. US, Asia, etc)
- option to “use current location” is provided, if clicked, map zooms in to, e.g. state or county level
- for site run:
- option to select existing site or specify point by lat/lon
- option to specify a bounding box and grid resolution in either lat/lon or polar stereographic.
- asked to specify start and end times in terms of year, month, day, hour, minute. Time is recorded in UTC not local time, this should be indicated.
8.2.2.5 Closing an issue
- Definition of “Done”
- test
- documentation
- when issue is resolved:
- status is changed to “resolved”
- assignee is changed to original author
- if original author agrees that issue has been resolved
- original author changes status to “closed”
- except for trivial issues, issues are only closed by the author
8.2.2.6 When to submit an issue?
Ideally, non-trivial code changes will be linked to an issue and a commit.
This requires creating issues for each task, making small commits, and referencing the issue within your commit message. Issues can be created on GitHub. These issues can be linked to commits by adding text such as fixes gh-5 to the commit message.
Rationale: This workflow is a small upfront investment that reduces errors and the time spent re-creating and debugging them. Associating issues and commits makes it easier to identify why a change was made, and which potential bugs could arise when the code is changed. In addition, knowing which issue you are working on clarifies the scope and objectives of your current task.
8.3 Coding Practices
8.3.1 Coding Style
Consistent coding style improves readability and reduces errors in shared code.
R does not have an official style guide, but Hadley Wickham provides one that is well thought out and widely adopted. Advanced R: Coding Style.
Both the Wickham text and this page are derived from Google’s R Style Guide.
8.3.1.1 Use Roxygen2 documentation
This is the standard method of documentation used in PEcAn development, it provides inline documentation similar to doxygen. Even trivial functions should be documented.
See Roxygen2.
8.3.1.2 Write your name at the top
Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation.
8.3.1.3 Use testthat testing package
See Unit_Testing for instructions, and Advanced R: Tests.
- tests provide support for documentation - they define what a function is (and is not) expected to do
- all functions need tests to ensure basic functionality is maintained during development.
- all bugs should have a test that reproduces the bug, and the test should pass before bug is closed
8.3.1.4 Don’t use shortcuts
R provides many shortcuts that are useful when coding interactively, or for writing scripts. However, these can make code more difficult to read and can cause problems when written into packages.
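The guide does not enumerate the shortcuts it has in mind, so the following is only an illustrative sketch of two commonly cited ones (our assumption, not a list from the PEcAn developers): the T/F aliases for TRUE/FALSE and partial matching of argument names. All variable names are hypothetical.

```r
# T and F are ordinary variables, not reserved words, so they can be
# silently overwritten; TRUE and FALSE cannot.
T <- 0
shortcut.result <- isTRUE(T)     # T now holds 0, so this is not TRUE
rm(T)                            # remove our copy; the built-in alias remains
explicit.result <- isTRUE(TRUE)

# Partial argument matching: `dec` works only because it uniquely
# abbreviates `decreasing`. The full name is clearer and keeps working
# if another argument starting with "dec" is ever added.
x <- c(3, 1, 2)
sorted.short <- sort(x, dec = TRUE)
sorted.full  <- sort(x, decreasing = TRUE)
```

Both calls to sort give the same result here; the point is only that the second form is easier to read and less fragile inside a package.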
8.3.1.5 Function Names (verb.noun)
Following the convention established in PEcAn 0.1, we use all lowercase with periods to separate words. Function names should generally have a verb.noun format, such as query.traits, get.samples, etc.
8.3.1.6 File Names
File names should end in .R, .Rdata, or .rds (as appropriate) and should be meaningful, e.g. named after the primary functions that they contain. There should be a separate file for each major high-level function to aid in identifying the contents of files in a directory.
8.3.1.7 Use “<-” as an assignment operator
Because most R code uses <- (except where = is required), we will use <- for assignment; = is reserved for function arguments.
8.3.1.8 Use Spaces
- around all binary operators (=, +, -, <-, etc.).
- after but not before a comma
8.3.1.9 Use curly braces
The option to omit curly braces is another shortcut that makes code easier to write but harder to read and more prone to error.
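As a small sketch of the style points above (<- for assignment, spaces around operators, braces on every branch), consider the following. The function describe.sign is hypothetical and exists only for illustration.

```r
# Hypothetical helper, named in the verb.noun style used by PEcAn.
describe.sign <- function(x) {
  # Braces make the extent of each branch explicit, so adding a second
  # statement to a branch later cannot silently fall outside the `if`.
  if (x < 0) {
    label <- "negative"
  } else if (x == 0) {
    label <- "zero"
  } else {
    label <- "positive"
  }
  return(label)
}

sign.labels <- c(describe.sign(-2), describe.sign(0), describe.sign(5))
```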
8.3.1.10 Package Dependencies
In the source code for PEcAn functions, all functions that are not from base R or the current package must be called with explicit namespacing; i.e. package::function (e.g. ncdf4::nc_open(...), dplyr::select(), PEcAn.logger::logger.warn()).
This is intended to maximize clarity for current and future developers (including yourself), and to make it easier to quickly identify (and possibly remove) external dependencies.
In addition, it may be a good idea to call some base R functions with known, common namespace conflicts this way as well.
For instance, if you want to use base R’s filter function, it’s a good idea to write it as stats::filter to avoid unintentional conflicts with dplyr::filter.
The one exception to this rule is infix operators (e.g. magrittr::"%>%") which cannot be conveniently namespaced.
These functions should be imported using the Roxygen @importFrom tag.
For example:
#' My function
#'
#' @param a First param
#' @param b Second param
#' @returns Something
#' @importFrom magrittr %>%
#' @export
myfunction <- function(a, b) {
  something(a) %>% something_else(b)
}
Never use library or require inside package functions.
Any package dependencies added in this way should be added to the Imports: list in the package DESCRIPTION file.
Do not use Depends: unless you have a very good reason.
The Imports list should be sorted alphabetically, with each package on its own line.
It is also a good idea to include version requirements in the Imports list (e.g. dplyr (>=0.7)).
External packages that do not provide essential functionality can be relegated to Suggests instead of Imports.
In particular, consider this for packages that are large, difficult to install, and/or bring in a large number of their own dependencies.
Functions using these kinds of dependencies should check for their availability with requireNamespace and fail informatively in their absence.
For example:
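A minimal sketch of such a check is shown here. The function check.suggested and the package name passed to it are hypothetical. To keep the sketch self-contained it fails with base R's stop; PEcAn code would more likely report the problem through its own logger functions.

```r
# Hypothetical helper: `pkg` stands in for a package listed under Suggests.
check.suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function to work. ",
         "Please install it.", call. = FALSE)
  }
  invisible(TRUE)
}

# `stats` ships with R, so this check passes.
check.suggested("stats")

# A package that is assumed not to be installed triggers the informative error;
# tryCatch captures the message so the script keeps running.
result <- tryCatch(check.suggested("aPackageThatDoesNotExist"),
                   error = function(e) conditionMessage(e))
```

Once the check has passed, functions from the suggested package are still called with explicit namespacing (package::function), as described above.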
8.3.2 Logging
During development we often add many print statements to check to see how the code is doing, what is happening, what intermediate results there are etc. When done with the development it would be nice to turn this additional code off, but have the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example when I am working on the code to create graphs, I do not have to see any debugging information about the SQL command being sent, however trying to figure out what goes wrong during a SQL statement it would be nice to show the SQL statements without adding any additional code.
8.3.2.1 PEcAn logging functions
The logger family of functions is more sophisticated, and can be used in place of stop, warning, print, and similar functions. The logger functions make it easier to print to a system log file.
- The file test.logger.R provides descriptive examples
- This query provides a current overview of functions that use logging
- logger functions (in order of increasing level):
  - logger.debug
  - logger.info
  - logger.warn
  - logger.error
- the logger.setLevel function sets the level at which a message will be printed
  - logger.setLevel("DEBUG") will print messages from all logger functions
  - logger.setLevel("ERROR") will only print messages from logger.error
  - logger.setLevel("INFO") and logger.setLevel("WARN") show messages from logger.<level> and higher functions, e.g. logger.setLevel("WARN") shows messages from logger.warn and logger.error
  - logger.setLevel("OFF") suppresses all logger messages
- To print all messages to console, use logger.setUseConsole(TRUE)
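To make the level rules above concrete, here is a toy re-implementation of level filtering in base R. It is not the PEcAn.logger source: every name prefixed with toy. is hypothetical, and the sketch only mirrors the behaviour described in the list (a message is shown when its level is at or above the level that has been set).

```r
# Toy level filter; NOT the PEcAn.logger implementation.
toy.levels <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4, OFF = 5)
toy.state <- new.env()
assign("level", toy.levels[["INFO"]], envir = toy.state)

# Analogue of logger.setLevel(): store the threshold.
toy.setLevel <- function(level) {
  assign("level", toy.levels[[level]], envir = toy.state)
}

# Analogue of logger.<level>(): print only if the message level reaches
# the threshold. Returns TRUE if the message was shown, FALSE otherwise.
toy.log <- function(level, msg) {
  shown <- toy.levels[[level]] >= get("level", envir = toy.state)
  if (shown) {
    message(level, ": ", msg)
  }
  invisible(shown)
}

toy.setLevel("WARN")
shown.info  <- toy.log("INFO", "filtered out at WARN")
shown.error <- toy.log("ERROR", "shown at WARN")
toy.setLevel("OFF")
shown.off <- toy.log("ERROR", "suppressed when OFF")
```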
8.3.2.2 Other R logging packages
- This section is for reference - these functions should not be used in PEcAn, as they are redundant with the logger.* functions described above
R does provide a basic logging capability using stop, warning and message. These allow you to print a message (and, in the case of stop, halt execution). However, there is no easy method to redirect the logging information to a file, or to turn the logging information on and off. This is where one of the following packages comes into play. The packages themselves are very similar since they try to emulate log4j.
Both of the following packages use hierarchical loggers, meaning that if you change the logging level of one logger, all loggers below it will update their level as well.
8.3.2.2.1 logging
The logging development is done at http://logging.r-forge.r-project.org/ and more information is located at http://cran.r-project.org/web/packages/logging/index.html . To install use the following command:
This is my preference, purely based on documentation.
8.3.2.3 futile.logger
The second logging package is http://cran.r-project.org/web/packages/futile.logger/ and is eerily similar to logging (as a matter of fact logging is based on futile).
8.3.2.3.1 Example Usage
To be able to use the loggers, some initialization needs to be done. Neither package can read its setup from a configuration file, so we might want to use the pecan.xml file to set it up. The setup will always be roughly the same:
# load library
library(logging)
logReset()
# add handlers, responsible for actually printing/saving the messages
addHandler(writeToConsole)
addHandler(writeToFile, file="file.log")
# setup root logger with INFO
setLevel('INFO')
# make all of PEcAn print debug messages
setLevel('DEBUG', getLogger('PEcAn'))
# only print info and above for the SQL part of PEcAn
setLevel('INFO', getLogger('PEcAn.SQL'))
To now use logging in the code you can use the following code:
pl <- getLogger('PEcAn.MetaAnalysis.function1')
pl$info("This is an INFO message.")
pl$debug("The value for x=%d", x)
pl$error("Something bad happened and I am scared now.")
or
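The alternative form did not survive in this copy of the guide. One plausible candidate, offered only as an assumption, is the function-style interface of the logging package, where the logger name is passed as an argument instead of fetching a logger object first. The sketch is guarded so it still runs where logging is not installed; the value of x is made up for illustration.

```r
# Guarded: does nothing if the `logging` package is not installed.
logging.available <- requireNamespace("logging", quietly = TRUE)

if (logging.available) {
  logging::basicConfig()  # attach a console handler to the root logger
  x <- 42L                # made-up value for the %d placeholder
  logging::loginfo("This is an INFO message.",
                   logger = "PEcAn.MetaAnalysis.function1")
  logging::logdebug("The value for x=%d", x,
                    logger = "PEcAn.MetaAnalysis.function1")
  logging::logerror("Something bad happened and I am scared now.",
                    logger = "PEcAn.MetaAnalysis.function1")
}
```

With the default configuration the debug message is filtered out, because the root logger starts at the INFO level.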
8.3.3 Package Data
8.3.3.1 Summary:
Files with the following extensions will be read by R as data:
- plain R code in .R and .r files are sourced using source()
- text tables in .tab, .txt, .csv files are read using read.table()
- objects in R image files: .RData, .rda are loaded using load()
  - capitalization matters
  - all objects in foo.RData are loaded into environment
  - pro: easiest way to store objects in R format
  - con: format is application (R) specific
Details are in ?data, which is mostly a copy of Data section of Writing R Extensions.
8.3.3.2 Accessing data
Data in the [data] directory will be accessed in the following ways:
- efficient way: (especially for large data sets) using the data function:
- easy way: by adding the following line to the package DESCRIPTION: note: this should be used with caution or it can cause difficulty as discussed in redmine issue #1118
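The code for both options is missing from this copy of the guide, so the following is a hedged sketch, not the original example. For the data function it uses mtcars from the datasets package that ships with R as a stand-in for a PEcAn data set. For the DESCRIPTION option, the line meant is presumably the standard LazyData field, but that is our assumption.

```r
# "Efficient way": load one named data set on demand.
# `mtcars` from `datasets` stands in for a PEcAn data set.
data.env <- new.env()
data("mtcars", package = "datasets", envir = data.env)
loaded <- ls(data.env)

# "Easy way" (assumption): a DESCRIPTION entry such as
#   LazyData: true
# makes a package's data sets available without calling data().
```

Loading into a dedicated environment, as done here with envir, avoids overwriting objects of the same name in the workspace.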
From the R help page:
Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:
* files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
* files ending .RData or .rda are load()ed.
* files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
* files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.
If your data does not fall into those 4 categories, you can use the system.file function to get access to the data:
system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"
The arguments are folder, filename(s) and then package. It returns the fully qualified path name to a file in a package; in this case it points to the trait data. This is almost the same as the data function, but we can now use any function to read the file, such as read.csv instead of read.csv2, which seems to be the default of data. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
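The same pattern can be tried without PEcAn installed. In this sketch the DESCRIPTION file of the stats package, which ships with R, stands in for a package data file; the variable names are hypothetical.

```r
# Locate a file inside an installed package, then read it with a
# function of our own choosing (here read.dcf, since DESCRIPTION is DCF).
path <- system.file("DESCRIPTION", package = "stats")
pkg.name <- read.dcf(path, fields = "Package")[1, 1]

# A file that does not exist yields an empty string rather than an error,
# so it is worth checking the result before reading.
missing.path <- system.file("data", "no-such-file.csv", package = "stats")
file.found <- nzchar(missing.path)
```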
8.3.3.2.1 Examples of data in PEcAn packages
- outputs: [/modules/uncertainties/data/output.RData]
- parameter samples [/modules/uncertainties/data/samples.RData]
8.3.4 Roxygen2
This is the standard method of documentation used in PEcAn development, it provides inline documentation similar to doxygen.
8.3.4.1 Canonical references:
- Must Read: R package development by Hadley Wickham:
- Object Documentation
- Package Metadata
- Roxygen2 Documentation
- Roxygen2 Package Documentation
- GitHub
8.3.4.2 Basic Roxygen2 instructions:
Section headers link to “Writing R extensions” which provides in-depth documentation. This is provided as an overview and quick reference.
8.3.4.4 Text markup
8.3.4.4.1 Formatting
- \bold{} bold
- \emph{} italics
8.3.4.4.2 Links
- \url{www.url.com} or \href{url}{text} for links
- \code{\link{thisfn}} links to function “thisfn” in the same package
- \code{\link[foo]{thatfn}} links to function “thatfn” in package “foo”
- \pkg{package_name}
8.3.4.4.3 Math
- \eqn{a+b=c} uses LaTeX to format an inline equation
- \deqn{a+b=c} uses LaTeX to format displayed equation
- \deqn{latex}{ascii} and \eqn{latex}{ascii} can be used to provide different versions in latex and ascii.
8.3.4.4.4 Lists
\enumerate{
\item A database consists of one or more records, each with one or
more named fields.
\item Regular lines start with a non-whitespace character.
\item Records are separated by one or more empty lines.
}
\itemize and \enumerate commands may be nested.
8.3.4.4.5 Tables (http://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables)
\tabular{rlll}{
[,1] \tab Ozone \tab numeric \tab Ozone (ppb)\cr
[,2] \tab Solar.R \tab numeric \tab Solar R (lang)\cr
[,3] \tab Wind \tab numeric \tab Wind (mph)\cr
[,4] \tab Temp \tab numeric \tab Temperature (degrees F)\cr
[,5] \tab Month \tab numeric \tab Month (1--12)\cr
[,6] \tab Day \tab numeric \tab Day of month (1--31)
}
8.3.4.5 Example
Here is an example documented function, myfun
##' My function adds three numbers
##'
##' A great function for demonstrating Roxygen documentation
##' @param a numeric
##' @param b numeric
##' @param c numeric
##' @return d, numeric sum of a + b + c
##' @export
##' @author David LeBauer
##' @examples
##' myfun(1,2,3)
##' \dontrun{myfun(NULL)}
myfun <- function(a, b, c){
d <- a + b + c
return(d)
}
In emacs, with the cursor inside the function, the keybinding C-x O will generate an outline or update the Roxygen2 documentation.
8.3.4.6 Updating documentation
- After adding documentation run the following command (replacing common with the name of the folder you want to update):
- In R using devtools to call roxygenize:
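The command itself is missing from this copy of the guide. A minimal sketch, assuming the intended call is devtools::document on the package folder, might look like this. The folder name common comes from the text above, and the guard is our addition so that the snippet does nothing harmful outside a PEcAn checkout.

```r
pkg.dir <- "common"  # replace with the folder of the package to update

# Only run if devtools is installed and the folder looks like an R package.
can.document <- requireNamespace("devtools", quietly = TRUE) &&
  file.exists(file.path(pkg.dir, "DESCRIPTION"))

if (can.document) {
  devtools::document(pkg.dir)  # regenerates man/*.Rd and NAMESPACE via roxygen2
} else {
  message("Skipping: devtools or ", pkg.dir, "/DESCRIPTION not found")
}
```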
8.3.5 Testing
PEcAn uses the testthat package developed by Hadley Wickham. Hadley has written instructions for using this package in his Testing chapter.
8.3.5.1 Rationale
- makes development easier
- provides working documentation of expected functionality
- saves time by allowing computer to take over error checking once a test has been made
- improves code quality
- Further reading: Aruliah et al 2012 Best Practices for Scientific Computing
8.3.5.2 Tests make development easier and less error prone
Testing makes it easier to develop by organizing everything you are already doing anyway - but integrating it into the testing and documentation. With a codebase like PEcAn, it is often difficult to get started. You have to figure out
- what was I doing yesterday?
- what do I want to do today?
- what existing functions do I need to edit?
- what are the arguments to these functions (and what are examples of valid arguments)
- what packages are affected
- where is a logical place to put files used in testing
8.3.5.3 Quick Start:
- decide what you want to do today
- identify the issue in github (if none exists, create one)
to work on issue 99, create a new branch called “github99” or some descriptive name… Today we will enable an existing function,
make.cheas
to makegoat.cheddar
. We will know that we are done by the color and taste.git branch goat-cheddar git checkout goat-cheddar
- open an existing (or create a new) file in inst/tests/. If working on code in “myfunction” or a set of functions in “R/myfile.R”, the file should be named accordingly, e.g. “inst/tests/test.myfile.R”
- if you are lucky, the function has already been tested and has some examples.
- if not, you may need to create a minimal example, often requiring a settings file. The default settings file can be obtained as shown in the Settings section below.
- write what you want to do
test_that("make.cheas can make cheese", {
  goat.cheddar <- make.cheas(source = 'goat', style = 'cheddar')
  expect_equal(color(goat.cheddar), "orange")
  expect_is(object = goat.cheddar, class = "cheese")
  expect_true(all(c("sharp", "creamy") %in% taste(goat.cheddar)))
})
- now edit the make.cheas function until it makes savory, creamy, orange cheese.
- commit often
- update documentation and test
- commit again
- when complete, merge, and push
8.3.5.4 Test files
Many of PEcAn’s functions require inputs that are provided as data. These can be in the /data or the /inst/extdata folders of a package. Data that are not package specific should be placed in the PEcAn.all or PEcAn.utils packages.
Some useful conventions:
8.3.5.5 Settings
- A generic settings file can be found in the PEcAn.all package
settings.xml <- system.file("pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)
- database settings can be specified, and tests run only if a connection is available
We currently use the following database to run tests against; tests that require access to a database should check db.exists() and be skipped if it returns FALSE, to avoid failed tests on systems that do not have the database installed.
settings$database <- list(userid = "bety",
                          passwd = "bety",
                          name = "bety",      # database name
                          host = "localhost") # server name
test_that(..., {
skip_if_not(db.exists(settings$database))
## write tests here
})
- instructions for installing this are available on the VM creation wiki
- examples can be found in the PEcAn.DB package (base/db/tests/testthat/).
- Model specific settings can go in the model-specific module, for example:
settings.xml <- system.file("extdata/pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)
- test-specific settings:
settings text can be specified inline:
settings.text <- "<pecan>
  <nocheck>nope</nocheck> <!-- allows bypass of checks in the read.settings functions -->
  <pfts>
    <pft>
      <name>ebifarm.pavi</name>
      <outdir>test/</outdir>
    </pft>
  </pfts>
  <outdir>test/</outdir>
  <database>
    <userid>bety</userid>
    <passwd>bety</passwd>
    <location>localhost</location>
    <name>bety</name>
  </database>
</pecan>"
settings <- read.settings(settings.text)
values in settings can be updated:
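As a rough sketch of such an update: the block below uses a plain nested list as a stand-in for the object returned by read.settings (an assumption made so that it runs without PEcAn installed), and the replacement values are made up for illustration.

```r
# Stand-in for the object returned by read.settings(); a plain nested list
# is used here so the sketch runs without PEcAn installed (assumption).
settings <- list(
  outdir = "test/",
  database = list(userid = "bety", passwd = "bety",
                  host = "localhost", name = "bety")
)

# Overwrite single values with `$<-` ...
settings$outdir <- file.path(tempdir(), "pecan-test")
settings$database$host <- "127.0.0.1"

# ... or merge several changes at once; modifyList() recurses into nested
# lists, so the other database entries are kept.
settings <- modifyList(settings, list(database = list(name = "bety_test")))

str(settings)
```

Updating a copy of the settings inside each test keeps one test from leaking changes into the next.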
8.3.5.6 Helper functions created to make testing easier
- tryl returns FALSE if function gives error
- temp.settings creates temporary settings file
- test.remote returns TRUE if remote connection is available
- db.exists returns TRUE if connection to database is available
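To illustrate the idea behind the first helper, here is a hypothetical re-implementation in base R. The name tryl_sketch is invented for this example; the real tryl in PEcAn may differ in detail.

```r
# Hypothetical re-implementation of the idea behind tryl(); the real helper
# lives in PEcAn and may differ in detail.
tryl_sketch <- function(expr) {
  # try() forces the lazily passed expression and returns an object of
  # class "try-error" if evaluation fails.
  result <- try(expr, silent = TRUE)
  !inherits(result, "try-error")
}

tryl_sketch(log(10))       # TRUE: no error
tryl_sketch(stop("boom"))  # FALSE: the error is caught
```

A helper of this shape lets a test assert that a call merely runs, e.g. expect_true(tryl_sketch(some_call())), without caring about its return value.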
8.3.5.7 When should I test?
A test should be written for each of the following situations:
- Each bug should get a regression test.
  - The first step in handling a bug is to write code that reproduces the error
  - This code becomes the test
  - most important when error could re-appear
  - essential when error silently produces invalid results
- Every time a (non-trivial) function is created or edited
  - Write tests that indicate how the function should perform
    - example: expect_equal(sum(1,1), 2) indicates that the sum function should take the sum of its arguments
  - Write tests for cases under which the function should throw an error
    - example: expect_error(sum("foo"))
    - better: expect_error(sum("foo"), "invalid 'type' (character)", fixed = TRUE), which also checks the error message; fixed = TRUE is needed because the parentheses would otherwise be read as a regular-expression group
8.3.5.8 What types of testing are important to understand?
8.3.5.9 Unit Testing / Test Driven Development
A test suite is only as good as the tests in it
- write test
- write code
8.3.5.10 Regression Testing
When a bug is found,
- write a test that finds the bug (the minimum test required to make the test fail)
- fix the bug
- bug is fixed when test passes
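The cycle above can be sketched with testthat as follows. The function mean_height and its bug are invented for illustration; they are not part of PEcAn.

```r
library(testthat)

# Hypothetical function under repair. The buggy version computed
# sum(x) / length(x), which silently returned NA whenever x contained NA.
mean_height <- function(x) {
  x <- x[!is.na(x)]  # the fix: drop missing values first
  if (length(x) == 0) {
    stop("no non-missing heights supplied")
  }
  sum(x) / length(x)
}

# Regression tests: written to fail against the buggy version, and kept so
# that the bug cannot quietly return.
test_that("mean_height ignores missing values", {
  expect_equal(mean_height(c(1, 2, NA, 3)), 2)
})

test_that("mean_height fails loudly when every value is missing", {
  expect_error(mean_height(c(NA, NA)), "no non-missing heights", fixed = TRUE)
})
```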
8.3.5.11 How should I test in R? The testthat package.
Tests are found in ~/pecan/<packagename>/inst/tests, for example utils/inst/tests/.
See http://r-pkgs.had.co.nz/tests.html for details on how to use the testthat package.
8.3.5.11.1 List of Expectations
Full | Abbreviation |
---|---|
expect_that(x, is_true()) | expect_true(x) |
expect_that(x, is_false()) | expect_false(x) |
expect_that(x, is_a(y)) | expect_is(x, y) |
expect_that(x, equals(y)) | expect_equal(x, y) |
expect_that(x, is_equivalent_to(y)) | expect_equivalent(x, y) |
expect_that(x, is_identical_to(y)) | expect_identical(x, y) |
expect_that(x, matches(y)) | expect_match(x, y) |
expect_that(x, prints_text(y)) | expect_output(x, y) |
expect_that(x, shows_message(y)) | expect_message(x, y) |
expect_that(x, gives_warning(y)) | expect_warning(x, y) |
expect_that(x, throws_error(y)) | expect_error(x, y) |
8.3.5.11.2 How to run tests
add the following to “pecan/tests/testthat.R”
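The original snippet is not reproduced here. By convention, a package's tests/testthat.R loads testthat and calls test_check() with the package name, which only works for an installed package. As a self-contained alternative, the sketch below runs a directory of test files with testthat::test_dir(); the temporary directory and the test file it contains are made up for the example.

```r
library(testthat)

# Build a throwaway test directory so the sketch is self-contained
# (in a real package this would be <packagename>/tests/testthat/).
test_dir_path <- file.path(tempdir(), "testthat-demo")
dir.create(test_dir_path, showWarnings = FALSE)
writeLines(
  c('test_that("sum adds its arguments", {',
    '  expect_equal(sum(1, 1), 2)',
    '})'),
  file.path(test_dir_path, "test-sum.R")
)

# Run every test*.R file in the directory and collect the results as a
# data frame with one row per test.
results <- as.data.frame(
  test_dir(test_dir_path, reporter = "silent", stop_on_failure = FALSE)
)
n_failed <- sum(results$failed)
n_failed
```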
8.3.5.12 Basic use of the testthat package
Here is an example of tests (these should be placed in <packagename>/tests/testthat/test-<sourcefilename>.R):
test_that("mathematical operators plus and minus work as expected", {
  expect_equal(sum(1, 1), 2)
  expect_equal(sum(-1, -1), -2)
  expect_equal(sum(1, NA), NA_real_) # NA propagates; the result is a numeric NA
  expect_error(sum("cat"))
  set.seed(0)
  expect_equal(sum(matrix(1:100)), sum(data.frame(1:100)))
})
test_that("different testing functions work, giving excuse to demonstrate", {
  expect_identical(1, 1)
  # numeric(1) is a double and integer(1) is an integer, so they are not identical ...
  expect_false(identical(numeric(1), integer(1)))
  # ... but they are equivalent
  expect_equivalent(numeric(1), integer(1))
  expect_warning(mean('1'))
  expect_that(mean('1'), gives_warning("argument is not numeric or logical: returning NA"))
  expect_warning(mean('1'), "argument is not numeric or logical: returning NA")
  expect_message(message("a"), "a")
})
8.3.5.12.1 Script testing
It is useful to add tests to a script during development. This allows you to test that the code is doing what you expect it to do.
* here is a fake script using the iris data set
test_that("the iris data set has the same basic features as before",{
expect_equal(dim(iris), c(150,5))
expect_that(iris$Sepal.Length, is_a("numeric"))
expect_is(iris$Sepal.Length, "numeric")#equivalent to prev. line
expect_is(iris$Species, "factor")
})
iris.color <- data.frame(Species = c("setosa", "versicolor", "virginica"),
color = c("pink", "blue", "orange"))
newiris <- merge(iris, iris.color)
iris.model <- lm(Petal.Length ~ color, data = newiris)
test_that("changes to Iris code occurred as expected",{
expect_that(dim(newiris), equals(c(150, 6)))
expect_that(unique(newiris$color),
is_identical_to(unique(iris.color$color)))
expect_equivalent(iris.model$coefficients["(Intercept)"], 4.26)
})
8.3.5.12.2 Function testing
Testing of a new function, as.sequence. The function and documentation are in R/utils.R and the tests are in tests/test.utils.R.
Recently, I made the function as.sequence to turn any vector into a sequence, with custom handling of NA’s:
as.sequence <- function(x, na.rm = TRUE){
  # number the values by order of first appearance
  x2 <- as.integer(factor(x, unique(x)))
  # a vector of only NA's becomes a vector of 1's
  if(all(is.na(x2))){
    x2 <- rep(1, length(x2))
  }
  # remaining NA's get the next unused index
  if(na.rm == TRUE){
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1
  }
  return(x2)
}
The next step was to add documentation and test. Many people find it more efficient to write tests before writing the function. This is true, but it also requires more discipline. I wrote these tests to handle the variety of cases that I had observed.
As currently used, the function is exposed to a fairly restricted set of options - results of downloads from the database and transformations.
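The tests themselves are not shown above. The following is a sketch of what tests for these cases might look like, not the original test file. The function definition is copied from the text so that the block runs on its own, and the test cases were chosen for illustration.

```r
library(testthat)

# Copied from the text above so that the block runs on its own.
as.sequence <- function(x, na.rm = TRUE) {
  x2 <- as.integer(factor(x, unique(x)))
  if (all(is.na(x2))) {
    x2 <- rep(1, length(x2))
  }
  if (na.rm == TRUE) {
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1
  }
  return(x2)
}

test_that("as.sequence numbers values by order of first appearance", {
  expect_equal(as.sequence(c("b", "a", "b")), c(1, 2, 1))
  expect_equal(as.sequence(c(10, 30, 10, 20)), c(1, 2, 1, 3))
})

test_that("as.sequence handles NA's", {
  # with na.rm = TRUE, NA's get the next unused index
  expect_equal(as.sequence(c("a", NA, "b")), c(1, 3, 2))
  # with na.rm = FALSE, NA's are kept
  expect_equal(as.sequence(c("a", NA, "b"), na.rm = FALSE), c(1, NA, 2))
  # a vector of only NA's becomes a vector of 1's
  expect_equal(as.sequence(c(NA, NA)), c(1, 1))
})
```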
8.3.5.13 Testing the Shiny Server
Shiny can be difficult to debug because, when run as a web service, the R output is hidden in system log files that are hard to find and read. One useful approach to debugging is to use port forwarding, as follows.
First, on the remote machine (including the VM), make sure R’s working directory is set to the directory of the Shiny app (e.g., setwd("/path/to/pecan/shiny/WorkflowPlots"), or just open the app as an RStudio project).
Then, in the R console, run the app as:
shiny::runApp(port = XXXX)
# E.g. shiny::runApp(port = 5638)
Then, on your local machine, open a terminal and run the following command, matching XXXX to the port above and YYYY to any unused port on your local machine (any 4-digit number should work).
ssh -L YYYY:localhost:XXXX <remote connection>
# E.g., for the PEcAn VM, given the above port:
# ssh -L 5639:localhost:5638 carya@localhost -p 6422
Now, in a web browser on your local machine, browse to localhost:YYYY (e.g., localhost:5639) to run whatever app you started with shiny::runApp in the previous step. All of the output should display in the R console where the shiny::runApp command was executed. Note that this includes any print, message, logger.*, etc. statements in your Shiny app.
If the Shiny app hits an R error, the backtrace should include a line like Hit error at of server.R#LXX, with XX being a line number that you can use to track down the error. To return from the error to a normal R prompt, hit <Control>-C (alternatively, the “Stop” button in RStudio). To restart the app, run shiny::runApp(port = XXXX) again (keeping the same port).
Note that Shiny runs any code in the pecan/shiny/<app> directory at the moment the app is launched. So, any changes you make to the code in server.R and ui.R or scripts loaded therein will take effect the next time the app is started.
If for whatever reason this doesn’t work with RStudio, you can always run R from the command line. Also, note that the ability to forward ports (ssh -L) may depend on the ssh configuration of your remote machine.
These instructions have been tested on the PEcAn VM (v.1.5.2+).
8.3.6 devtools package
Provides functions to simplify development
Documentation: The R devtools package
Other tips for devtools (from the documentation):
- Adding the following to your ~/.Rprofile will load devtools when running R in interactive mode:
- Adding the following to your .Rpackages will allow devtools to recognize packages by folder name, rather than by directory path:
# in this example, devhome is the pecan trunk directory
devhome <- "/home/dlebauer/R-dev/pecandev/"
list(
  default = function(x) {
    file.path(devhome, x, x)
  },
  "utils" = paste(devhome, "pecandev/utils", sep = ""),
  "common" = paste(devhome, "pecandev/common", sep = ""),
  "all" = paste(devhome, "pecandev/all", sep = ""),
  "ed" = paste(devhome, "pecandev/models/ed", sep = ""),
  "uncertainty" = paste(devhome, "modules/uncertainty", sep = ""),
  "meta.analysis" = paste(devhome, "modules/meta.analysis", sep = ""),
  "db" = paste(devhome, "db", sep = "")
)
Now, devtools can take pkg as an argument instead of /path/to/pkg/, so you can use build("pkg") instead of build("/path/to/pkg/").
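For the ~/.Rprofile tip above, a commonly used form is sketched below. It assumes devtools is installed and is not necessarily the exact snippet that the original page showed.

```r
# Candidate contents for ~/.Rprofile (assumption: devtools is installed).
# interactive() is FALSE under Rscript and R CMD BATCH, so scripts and
# batch jobs are unaffected.
if (interactive()) {
  suppressMessages(require(devtools))
}
```

Restricting the call to interactive sessions keeps scripts from silently depending on a package that other users may not load.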
8.4 Download and Compile PEcAn
Set R_LIBS_USER
# point R to personal lib folder
echo 'export R_LIBS_USER=${HOME}/R/library' >> ~/.profile
source ~/.profile
mkdir -p ${R_LIBS_USER}
8.4.1 Download, compile and install PEcAn from GitHub
# download pecan
cd
git clone https://github.com/PecanProject/pecan.git
# compile pecan
cd pecan
make
For more information on the capabilities of the PEcAn Makefile, check out our section on Updating PEcAn.
The following will run a small script that sets up hooks to prevent people from using the pecan demo user account to check in any code.
8.4.2 PEcAn Testrun
Run the test workflow. This assumes you have installed the BETY database, the sites tar file, and SIPNET.
# create folder
cd
mkdir testrun.pecan
cd testrun.pecan
# copy example of pecan workflow and configuration file
cp ../pecan/tests/pecan32.sipnet.xml pecan.xml
cp ../pecan/scripts/workflow.R workflow.R
# execute workflow
rm -rf pecan
./workflow.R pecan.xml
NB: pecan.xml is configured for the virtual machine, you will need to change the
8.5 Directory structure
8.5.1 Overview of PEcAn repository as of PEcAn 1.5.3
pecan/
+- base/ # Core functions
+- all # Dummy package to load all PEcAn R packages
+- db # Modules for querying the database
+- logger # Report warnings without killing workflows
+- qaqc # Model skill testing and integration testing
+- remote # Communicate with and execute models on local and remote hosts
+- settings # Functions to read and manipulate PEcAn settings files
+- utils # Misc. utility functions
+- visualization # Advanced PEcAn visualization module
+- workflow # functions to coordinate analysis steps
+- book_source/ # Main documentation and developer's guide
+- CHANGELOG.md # Running log of changes in each version of PEcAn
+- docker/ # Experimental replacement for PEcAn virtual machine
+- documentation # index_vm.html, references, other misc.
+- models/ # Wrappers to run models within PEcAn
+- ed/ # Wrapper scripts for running ED within PEcAn
+- sipnet/ # Wrapper scripts for running SIPNET within PEcAn
+- ... # Wrapper scripts for running [...] within PEcAn
+- template/ # Sample wrappers to copy and modify when adding a new model
+- modules # Core modules
+- allometry
+- data.atmosphere
+- data.hydrology
+- data.land
+- meta.analysis
+- priors
+- rtm
+- uncertainty
+- ...
+- scripts # R and Shell scripts for use with PEcAn
+- shiny/ # Interactive visualization of model results
+- tests/ # Settings files for host-specific integration tests
+- web # Main PEcAn website files
8.5.2 Generic R package structure:
see the R development wiki for more information on writing code and adding data.
+- DESCRIPTION # short description of the PEcAn library
+- R/ # location of R source code
+- man/ # Documentation (automatically compiled by Roxygen)
+- inst/ # files to be installed with package that aren't R functions
+- extdata/ # misc. data files (in misc. formats)
+- data/ # data used in testing and examples (saved as *.RData or *.rda files)
+- NAMESPACE # declaration of package imports and exports (automatically compiled by Roxygen)
+- tests/ # PEcAn testing scripts
+- testthat/ # nearly all tests should use the testthat framework and live here