8 Developer guide

8.1 Updating PEcAn Code and Bety Database

Release notes for all releases can be found here.

This page lists only the steps required to upgrade an existing system. When updating PEcAn, you are strongly encouraged to update BETY as well. Instructions for doing this, and for updating the database, are in the Updating BETYdb gitbook page.

8.1.1 Updating PEcAn

The latest version of PEcAn code can be obtained from the PEcAn repository on GitHub:

cd pecan        # If you are not already in the PEcAn directory
git pull

The PEcAn build system is based on GNU Make. The simplest way to install is to run make from inside the PEcAn directory. This will update the documentation for all packages and install them, as well as all required dependencies.

For more control, the following make commands are available:

make document – Use devtools::document to update the documentation for all packages. Under the hood, this uses the roxygen2 documentation system.
make install – Install all packages and their dependencies using devtools::install. By default, this only installs packages that have had their code changed and any dependent packages.
make check – Perform a rigorous check of packages using devtools::check
make test – Run all unit tests (based on testthat package) for all packages, using devtools::test
make clean – Remove the make build cache, which is used to track which packages have changed. Cache files are stored in the .doc, .install, .check, and .test subdirectories in the PEcAn main directory. Running make clean will force the next invocation of make commands to operate on all PEcAn packages, regardless of changes.

The following are some additional make tricks that may be useful:

Install, check, document, or test a specific package – make .<cmd>/<pkg-dir>; e.g. make .install/utils or make .check/modules/rtm
Force make to run, even if package has not changed – make -B <command>
Run make commands in parallel – make -j<ncores>; e.g. make -j4 install to install packages using four parallel processes.

All instructions for the make build system are contained in the Makefile in the PEcAn root directory. For full documentation on make, see the man pages by running man make from a terminal.

8.2 Git and GitHub Workflow

8.2.1 Using Git

This document describes the steps required to download PEcAn, make changes to code, and submit your changes.

If you are new to GitHub or to PEcAn, start with the one-time set-up instructions under Before any work is done. Also see the excellent tutorials and references in the Git section right below this list and at the bottom in References.
To make trivial changes, see Quick and Easy.
To make a few changes to the code, start with the Basic Workflow.
To make substantial changes and/or if you plan to contribute over time, see Recommended Workflow: A new branch for each change.

8.2.1.1 Git

Git is a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Every Git clone is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server. Branching and merging are fast and easy to do.

A good place to start is the GitHub 5 minute illustrated tutorial. In addition, there are three fun tutorials for learning git:

Learn Git is a great web-based interactive tutorial.
LearnGitBranching
TryGit.

URLs

The rest of this document uses specific URLs to clone the code. There are a few URLs you can use to clone a project, using https, ssh, or git. You can use either https or ssh to clone a repository and write to it. The git protocol is read-only.

8.2.1.2 PEcAn Project and Github

Organization Repository: https://github.com/organizations/PecanProject
PEcAn source code: https://github.com/PecanProject/pecan.git
BETYdb source code: https://github.com/PecanProject/bety.git

These instructions apply to other repositories too.

8.2.1.3 PEcAn Project Branches

We follow branch organization laid out on this page.

In short, there are three main branches you must be aware of:

develop - Main Branch containing the latest code. This is the main branch you will make changes to.
master - Branch containing the latest stable code. DO NOT MAKE CHANGES TO THIS BRANCH.
release/vX.X.X - Named branches containing code specific to a release. Only make changes to this branch if you are fixing a bug on a release branch.

8.2.1.4 Milestones, Issues, Tasks

Milestones, issues, and tasks can be used to organize specific features or research projects. In general, there is a hierarchy:

milestones (Big picture, “Epic”): contains many issues, organized by release.
issues (Specific features / bugs, “Story”): may contain a list of tasks; represent
task list (to do list, “Tasks”): list of steps required to close an issue, e.g.:

* [ ] first do this

* [ ] then this

* [ ] completed when x and y

8.2.1.5 Quick and Easy

The easiest approach is to use GitHub’s browser-based workflow. This is useful when your change is a few lines, if you are editing a wiki, or if the edit is trivial (and won’t break the code). The GitHub documentation is here, but it is simple: find the page or file you want to edit and click “edit”; the GitHub web application will then automatically fork and branch, and allow you to submit a pull request. Note, however, that unless you are a member of the PEcAn project the “edit” button will not be active, and you will need to follow the workflow described below for forking and then submitting a pull request.

8.2.1.6 Recommended Git Workflow

Each feature should be in its own branch (for example each issue is a branch, names of branches are often the issue in a bug tracking system).

Commit and Push Frequency

On your branch, commit at minimum once a day before you push changes. Even better: commit every time you reach a stopping point and move to a new issue. Best: commit any time that you have done work that you do not want to re-do. Remember, pushing changes to your branch is like saving a draft. Submit a pull request when you are done.

8.2.1.7 Before any work is done

The first step below only needs to be done once when you first start working on the PEcAn code. The steps below that need to be done to set up PEcAn on your computer, and would need to be repeated if you move to a new computer. If you are working from the PEcAn VM, you can skip the “git clone” since the PEcAn code is already installed.

Most people will not be able to work in the PEcAn repository directly and will need to create a fork of the PEcAn source code in their own folder. To do so, fork PEcAn into your own GitHub space (GitHub help: “fork a repo”). This forked repository will allow you to create branches, commit changes back to GitHub, and create pull requests to the develop branch of PEcAn.

The forked repository is the only way for external people to commit code back to PEcAn and BETY. The pull request will start a review process that will eventually result in the code being merged into the main copy of the codebase. See https://help.github.com/articles/fork-a-repo for more information, especially on how to keep your fork up to date with respect to the original. (Rstudio users should also see Git + Rstudio, below)

You can set up SSH keys to make it easier to commit code back to GitHub. This is especially useful if you are working from a cluster; see set up ssh keys.

Introduce yourself to Git

git config --global user.name "FULLNAME"
git config --global user.email you@yourdomain.example.com

Fork PEcAn on GitHub. Go to the PEcAn source code and click on the Fork button in the upper right. This will create a copy of PEcAn in your personal space.
Clone to your local machine via command line

git clone git@github.com:<username>/pecan.git

If this does not work, try the https method, again pointing at your fork

git clone https://github.com/<username>/pecan.git

Define upstream repository

cd pecan
git remote add upstream git@github.com:PecanProject/pecan.git

8.2.1.8 During development:

commit often;
each commit can address 0 or 1 issue; many commits can reference an issue
ensure that all tests are passing before anything is pushed into develop.

8.2.1.9 Basic Workflow

This workflow is for educational purposes only. Please use the Recommended Workflow if you plan on contributing to PEcAn. This workflow does not include creating branches, a feature we would like you to use.

Get the latest code from the main repository

git pull upstream develop

Do some coding
Commit after each chunk of code (multiple times a day)

git commit -m "<some descriptive information about what was done; references/fixes gh-X>"

Push to YOUR Github (when a feature is working, a set of bugs are fixed, or you need to share progress with others)

git push origin develop

Before submitting code back to the main repository, make sure that code compiles from the main directory.

make

submit pull request with a reference to related issue;

also see github documentation

8.2.1.10 Recommended Workflow: A new branch for each change

Make sure you start in develop

git checkout develop

Make sure develop is up to date

git pull upstream develop

Run the PEcAn MAKEFILE to compile code from the main directory.

make

Create a branch and switch to it

git checkout -b <branchname>

Work/commit/etc

git add <file_that_was_changed.R>

git commit -m "<some descriptive information about what was done>"

Make sure that the code compiles and the documentation is updated. The make document command will run roxygenise.

make document
make

Push this branch to your github space

git push origin <branchname>

submit pull request and link commits to issues (see Link commits to issues);

also see explanation in this PecanProject/bety issue and github documentation

8.2.1.11 After pull request is merged

Make sure you start in develop

git checkout develop

delete branch remotely

git push origin --delete <branchname>

delete branch locally

git branch -D <branchname>

8.2.1.12 Fixing a release Branch

If you would like to make changes to a release branch, you must follow a different workflow, as the release branch will not contain the latest code on develop and must remain separate.

Fetch upstream remote branches

git fetch upstream

Checkout the correct release branch

git checkout -b release/vX.Y.Z upstream/release/vX.Y.Z

Compile Code with make

make

Make changes and commit them

git add <changed_file.R>
git commit -m "Describe changes"

Compile and make roxygen changes

make
make document

Commit and push any files that were changed by make document
Make a pull request. It is essential that you compare your pull request to the remote release branch, NOT the develop branch.

8.2.1.13 Link commits to issues

You can reference and close issues from comments, pull requests, and commit messages. This should be done when you commit code that is related to or will close/fix an existing issue.

There are two ways to do this. One easy way is to include the following text in your commit message:

Github
to close: “closes gh-xxx” (or syn. close, closed, fixes, fix, fixed)
to reference: just the issue number (e.g. “gh-xxx”)

8.2.1.14 Other Useful Git Commands:

GIT encourages branching “early and often”
First pull from develop
Branch before working on feature
One branch per feature
You can switch easily between branches
Merge feature into main line when branch done

If during above process you want to work on something else, commit all your code, create a new branch, and work on new branch.

Delete a branch: git branch -d <name of branch>
To push a branch: git push -u origin <name of branch>
To check out a branch:

git fetch origin
git checkout --track origin/<name of branch>

Show graph of commits:

git log --graph --oneline --all

8.2.1.15 Tags

Git supports two types of tags: lightweight and annotated. For more information see the Tagging Chapter in the Git documentation.

Lightweight tags are useful, but here we discuss the annotated tags that are used for marking stable versions, major releases, and versions associated with published results.

The basic command is git tag. The -a flag means ‘annotated’ and -m is used before a message. Here is an example:

git tag -a v0.6 -m "stable version with foo and bar features, used in the foobar publication by Bob"

Adding a tag to a remote repository must be done explicitly with a push that names the remote, e.g.

git push origin v0.6

To use a tagged version, just checkout:

git checkout v0.6

To tag an earlier commit, just append the commit SHA to the command, e.g.

git tag -a v0.99 -m "last version before 1.0" 9fceb02

Using GitHub

The easiest way to get working with GitHub is by installing the GitHub client. For instructions for your specific OS and download of the GitHub client, see https://help.github.com/articles/set-up-git. This will help you set up an SSH key to push code back to GitHub. To check out a project you do not need an SSH key; you can use the https or git URL to check out the code.

8.2.1.16 Git + Rstudio

Rstudio is nicely integrated with many development tools, including git and GitHub. It is quite easy to check out source code from within the Rstudio program or browser. The Rstudio documentation includes useful overviews of version control and R package development.

Once you have git installed on your computer (see the Rstudio version control documentation for instructions), you can use the following steps to install the PEcAn source code in Rstudio.

8.2.1.17 Creating a Read-only version:

This is a fast way to clone the repository that does not support contributing new changes (this can be done with further modification).

install Rstudio (www.rstudio.com)
click (upper right) project

create project
version control
Git - clone a project from a Git Repository
paste https://www.github.com/PecanProject/pecan
choose working dir. for repo

8.2.1.18 For development:

create account on github
create a fork of the PEcAn repository to your own account https://www.github.com/pecanproject/pecan
install Rstudio (www.rstudio.com)
generate an ssh key

in Rstudio:
- Tools -> Options -> Git/SVN -> "create RSA key"
View public key -> ctrl+C to copy
in GitHub
go to ssh settings
-> 'add ssh key' -> ctrl+V to paste -> 'add key'

Create project in Rstudio

project (upper right) -> create project -> version control -> Git - clone a project from a Git Repository
paste repository url git@github.com:<username>/pecan.git
choose working dir. for repository

8.2.1.19 References

8.2.1.20 Git Documentation

Scott Chacon, ‘Pro Git book’, http://git-scm.com/book
GitHub help pages, https://help.github.com/
Main GIT page http://git-scm.com/documentation
Another set of pages about branching, http://sandofsky.com/blog/git-workflow.html
Stackoverflow highest voted questions tagged “git”

8.2.1.21 GitHub Documentation

When in doubt, the first step is to click the “Help” button at the top of the page.

GitHub Flow by Scott Chacon (Git evangelist and Ruby developer working on GitHub.com)
GitHub FAQ
Using Pull Requests
SSH Keys

8.2.2 GitHub use with PEcAn

In this section, development topics are introduced and discussed. PEcAn code lives within the PecanProject organization on GitHub. If you are looking for an issue to work on, take a look through issues labeled “good first issue”. To get started you will want to review

We use GitHub to track development.

To learn about GitHub, it is worth taking some time to read through the FAQ. When in doubt, the first step is to click the “Help” button at the top of the page.

To address specific people, use a github feature called @mentions e.g. write @dlebauer, @robkooper, @mdietze, or @serbinsh … in the issue to alert the user as described in the GitHub documentation on notifications

8.2.2.1 Bugs, Issues, Features, etc.

8.2.2.2 Reporting a bug

(For developers) work through debugging.
Once you have identified a problem, that you can not resolve, you can write a bug report
Write a bug report
submit the bug report
If you do find the answer, explain the resolution (in the issue) and close the issue

8.2.2.3 Required content

Note:

a bug is only a bug if it is reproducible
clear bug reports save time

Clear, specific title
Description -

What you did
What you expected to happen
What actually happened
What does work, under what conditions does it fail?
Reproduction steps - minimum steps required to reproduce the bug

additional materials that could help identify the cause:

screen shots
stack traces, logs, scripts, output
specific code and data / settings / configuration files required to reproduce the bug
environment (operating system, browser, hardware)

8.2.2.4 Requesting a feature

(from The Pragmatic Programmer, available as ebook through UI libraries, hardcopy on David’s bookshelf)

focus on “user stories”, e.g. specific use cases
Be as specific as possible,
Here is an example:

Bob is at www.mysite.edu/maps
map of the region (based on user location, e.g. US, Asia, etc)
option to “use current location” is provided, if clicked, map zooms in to, e.g. state or county level
for site run:
1. option to select existing site or specify point by lat/lon
2. option to specify a bounding box and grid resolution in either lat/lon or polar stereographic.
asked to specify start and end times in terms of year, month, day, hour, minute. Time is recorded in UTC not local time, this should be indicated.

8.2.2.5 Closing an issue

Definition of “Done”

test
documentation

when issue is resolved:

status is changed to “resolved”
assignee is changed to original author

if original author agrees that issue has been resolved

original author changes status to “closed”

except for trivial issues, issues are only closed by the author

8.2.2.6 When to submit an issue?

Ideally, non-trivial code changes will be linked to an issue and a commit.

This requires creating issues for each task, making small commits, and referencing the issue within your commit message. Issues can be created on GitHub. These issues can be linked to commits by adding text such as fixes gh-5.

Rationale: This workflow is a small upfront investment that reduces errors and the time spent re-creating and debugging them. Associating issues and commits makes it easier to identify why a change was made, and which potential bugs could arise when the code is changed. In addition, knowing which issue you are working on clarifies the scope and objectives of your current task.

8.3 Coding Practices

8.3.1 Coding Style

Consistent coding style improves readability and reduces errors in shared code.

R does not have an official style guide, but Hadley Wickham provides one that is well thought out and widely adopted. Advanced R: Coding Style.

Both the Wickham text and this page are derived from Google’s R Style Guide.

8.3.1.1 Use Roxygen2 documentation

This is the standard method of documentation used in PEcAn development, it provides inline documentation similar to doxygen. Even trivial functions should be documented.

See Roxygen2.

8.3.1.2 Write your name at the top

Any function that you create or make a meaningful contribution to should have your name listed after the author tag in the function documentation.

8.3.1.3 Use testthat testing package

See Unit_Testing for instructions, and Advanced R: Tests.

tests provide support for documentation - they define what a function is (and is not) expected to do
all functions need tests to ensure basic functionality is maintained during development.
all bugs should have a test that reproduces the bug, and the test should pass before bug is closed
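
As a minimal sketch of the last point, a regression test for a bug might look like the following. The function convert.depth and the bug it is imagined to have had are hypothetical, invented only for illustration; the testthat calls are the real ones. The test is guarded so that the snippet still runs where testthat is not installed.

```r
# Hypothetical function: converts a depth in cm to m.
# Suppose a past bug report said it failed for a depth of zero.
convert.depth <- function(depth_cm) {
  if (!is.numeric(depth_cm)) {
    stop("depth_cm must be numeric")
  }
  return(depth_cm / 100)
}

# Regression test that reproduces the (hypothetical) bug report.
# Guarded so the snippet also runs where testthat is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("convert.depth handles a depth of zero", {
    testthat::expect_equal(convert.depth(0), 0)
    testthat::expect_equal(convert.depth(250), 2.5)
    testthat::expect_error(convert.depth("deep"))
  })
}
```

Keeping such a test in the package means the bug cannot quietly return in a later change.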

8.3.1.4 Don’t use shortcuts

R provides many shortcuts that are useful when coding interactively, or for writing scripts. However, these can make code more difficult to read and can cause problems when written into packages.
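
The text above does not name specific shortcuts, so the two shown here are assumptions about the kind of thing meant: the T/F abbreviations and partial matching of list names with $. The settings list and its outdir element are hypothetical.

```r
# Shortcut 1: `T` and `F` are ordinary variables, so they can be overwritten;
# `TRUE` and `FALSE` are reserved words and cannot.
T <- 0
shortcut_flag <- isTRUE(T)     # no longer behaves like TRUE
explicit_flag <- isTRUE(TRUE)
rm(T)                          # restore the default meaning of `T`

# Shortcut 2: `$` partially matches list names; `[[` requires an exact match.
settings <- list(outdir = "/tmp/pecan_out")
partial_value <- settings$outd       # silently matches `outdir`
exact_value <- settings[["outd"]]    # NULL, which exposes the misspelling
```

In package code the explicit forms (TRUE, and [[ with the full name) fail loudly rather than silently doing something unintended.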

8.3.1.5 Function Names (`verb.noun`)

Following the convention established in PEcAn 0.1, function names are all lowercase with periods separating words. They should generally have a verb.noun format, such as query.traits, get.samples, etc.

8.3.1.6 File Names

File names should end in .R, .Rdata, or .rds (as appropriate) and should be meaningful, e.g. named after the primary functions that they contain. There should be a separate file for each major high-level function to aid in identifying the contents of files in a directory.

8.3.1.7 Use “<-” as an assignment operator

Because most R code uses <- (except where = is required), we will use <- for assignment; = is reserved for function arguments.

8.3.1.8 Use Spaces

around all binary operators (=, +, -, <-, etc.).
after but not before a comma
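
A short sketch of the assignment and spacing rules together; the variable names are made up for illustration.

```r
# Preferred: `<-` assigns, `=` only names function arguments,
# with spaces around binary operators and after commas.
heights <- c(1.2, 3.4, 5.6)
mean_height <- mean(x = heights, na.rm = TRUE)
centered <- (heights - mean_height) / 2

# Discouraged: the same computation, harder to scan.
centered_compact=(heights-mean_height)/2
```

Both forms compute the same values; only the first follows the style described here.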

8.3.1.9 Use curly braces

The option to omit curly braces is another shortcut that makes code easier to write but harder to read and more prone to error.
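
One way the brace-less form goes wrong is sketched below: indentation suggests that two statements belong to the if, but only the first one does. The counter names are illustrative only.

```r
# Without braces only the first statement belongs to the `if`;
# the second line always runs, whatever the indentation suggests.
count_without_braces <- 0
if (FALSE)
  count_without_braces <- count_without_braces + 1
  count_without_braces <- count_without_braces + 10

# With braces the block is explicit and neither statement runs.
count_with_braces <- 0
if (FALSE) {
  count_with_braces <- count_with_braces + 1
  count_with_braces <- count_with_braces + 10
}
```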

8.3.1.10 Package Dependencies

In the source code for PEcAn functions, all functions that are not from base R or the current package must be called with explicit namespacing; i.e. package::function (e.g. ncdf4::nc_open(...), dplyr::select(), PEcAn.logger::logger.warn()). This is intended to maximize clarity for current and future developers (including yourself), and to make it easier to quickly identify (and possibly remove) external dependencies.

In addition, it may be a good idea to call some base R functions with known, common namespace conflicts this way as well. For instance, if you want to use base R’s filter function, it’s a good idea to write it as stats::filter to avoid unintentional conflicts with dplyr::filter.
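
As an illustration of the stats::filter case, the sketch below computes a three-point moving average. Because the call is namespaced, it behaves the same whether or not dplyr happens to be attached. The input vector is arbitrary.

```r
# Three-point centered moving average using base R's filter.
# The explicit `stats::` prefix avoids picking up dplyr::filter by accident.
x <- c(1, 2, 3, 4, 5)
moving_average <- stats::filter(x, filter = rep(1 / 3, 3))

# stats::filter returns a time series; the end points have no full window
# and are therefore NA.
smoothed <- as.numeric(moving_average)
```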

The one exception to this rule is infix operators (e.g. magrittr::"%>%") which cannot be conveniently namespaced. These functions should be imported using the Roxygen @importFrom tag. For example:

#' My function
#'
#' @param a First param
#' @param b Second param
#' @returns Something
#' @importFrom magrittr %>%
#' @export
myfunction <- function(a, b) {
  # `something` and `something_else` are placeholders for real functions
  something(a) %>% something_else(b)
}

Never use library or require inside package functions.

Any package dependencies added in this way should be added to the Imports: list in the package DESCRIPTION file. Do not use Depends: unless you have a very good reason. The Imports list should be sorted alphabetically, with each package on its own line. It is also a good idea to include version requirements in the Imports list (e.g. dplyr (>=0.7)).

External packages that do not provide essential functionality can be relegated to Suggests instead of Imports. In particular, consider this for packages that are large, difficult to install, and/or bring in a large number of their own dependencies. Functions using these kinds of dependencies should check for their availability with requireNamespace and fail informatively in their absence. For example:

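g <- function(...) {
  # Fail informatively if the suggested package is not installed
  if (!requireNamespace("BayesianTools", quietly = TRUE)) {
    PEcAn.logger::logger.severe(
      "`BayesianTools` package required but not found.",
      "Please make sure it is installed before using `g`.")
  }
  # `do_stuff` is a placeholder for whichever BayesianTools function is needed
  BayesianTools::do_stuff(...)
}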

8.3.2 Logging

During development we often add many print statements to check how the code is doing, what is happening, what intermediate results there are, etc. When done with the development it would be nice to turn this additional code off, but keep the ability to quickly turn it back on if we discover a problem. This is where logging comes into play. Logging allows us to use “rules” to say what information should be shown. For example, when I am working on the code to create graphs, I do not need to see any debugging information about the SQL commands being sent; however, when trying to figure out what goes wrong during a SQL statement, it would be nice to show the SQL statements without adding any additional code.

8.3.2.1 PEcAn logging functions

This logger family of functions is more sophisticated, and can be used in place of stop, warning, print, and similar functions. The logger functions make it easier to print to a system log file.

The file test.logger.R provides descriptive examples
This query provides a current overview of functions that use logging
logger functions (in order of increasing level):
logger.debug
logger.info
logger.warn
logger.error
the logger.setLevel function sets the level at which a message will be printed
logger.setLevel("DEBUG") will print messages from all logger functions
logger.setLevel("ERROR") will only print messages from logger.error
logger.setLevel("INFO") and logger.setLevel("WARN") shows messages from logger.<level> and higher functions, e.g. logger.setLevel("WARN") shows messages from logger.warn and logger.error
logger.setLevel("OFF") suppresses all logger messages
To print all messages to console, use logger.setUseConsole(TRUE)
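
The level rules above can be tried out with a sketch like the following. It assumes the PEcAn.logger package is installed and is skipped otherwise. The message texts are made up, and restoring the level to "INFO" at the end is an assumption about what the previous level was.

```r
# Only run the demonstration when PEcAn.logger is available.
logger_available <- requireNamespace("PEcAn.logger", quietly = TRUE)

if (logger_available) {
  # Show only warnings and errors
  PEcAn.logger::logger.setLevel("WARN")
  PEcAn.logger::logger.info("Hidden: INFO is below the WARN threshold")
  PEcAn.logger::logger.warn("Shown: WARN meets the threshold")
  PEcAn.logger::logger.error("Shown: ERROR is above the threshold")

  # Assumed previous level; adjust to whatever your session was using
  PEcAn.logger::logger.setLevel("INFO")
}
```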

8.3.2.2 Other R logging packages

This section is for reference - these functions should not be used in PEcAn, as they are redundant with the logger.* functions described above

R does provide a basic logging capability using stop, warning and message. These allow you to print messages (and to stop execution, in the case of stop). However, there is no easy method to redirect the logging information to a file, or to turn the logging information on and off. This is where one of the following packages comes into play. The packages themselves are very similar since they try to emulate log4j.

Both of the following packages use hierarchical loggers, meaning that if you change the displayed level of logging at one level, all levels below it will update their logging.

8.3.2.2.1 `logging`

The logging development is done at http://logging.r-forge.r-project.org/ and more information is located at http://cran.r-project.org/web/packages/logging/index.html . To install use the following command:

install.packages("logging", repos="http://R-Forge.R-project.org")

This has my preference, purely based on documentation.

8.3.2.3 `futile.logger`

The second logging package is http://cran.r-project.org/web/packages/futile.logger/ and is eerily similar to logging (as a matter of fact logging is based on futile).

8.3.2.3.1 Example Usage

To be able to use the loggers, some initialization needs to be done. Neither package can read its setup from a configuration file, so we might want to use the pecan.xml file to set it up. The setup will always be somewhat the same:

# load library
library(logging)
logReset()

# add handlers, responsible for actually printing/saving the messages
addHandler(writeToConsole)
addHandler(writeToFile, file="file.log")

# setup root logger with INFO
setLevel('INFO')

# make all of PEcAn print debug messages
setLevel('DEBUG', getLogger('PEcAn'))

# only print info and above for the SQL part of PEcAn
setLevel('INFO', getLogger('PEcAn.SQL'))

To now use logging in the code you can use the following code:

pl <- getLogger('PEcAn.MetaAnalysis.function1')
pl$info("This is an INFO message.")
pl$debug("The value for x=%d", x)
pl$error("Something bad happened and I am scared now.")

loginfo("This is an INFO message.", logger="PEcAn.MetaAnalysis.function1")
logdebug("The value for x=%d", x, logger="PEcAn.MetaAnalysis.function1")
logerror("Something bad happened and I am scared now.", logger="PEcAn.MetaAnalysis.function1")

8.3.3 Package Data

8.3.3.1 Summary:

Files with the following extensions will be read by R as data:

plain R code in .R and .r files is sourced using source()
text tables in .tab, .txt, .csv files are read using read.table()
objects in R image files (.RData, .rda) are loaded using load()
capitalization matters
all objects in foo.RData are loaded into the environment
pro: easiest way to store objects in R format
con: format is application (R) specific

Details are in ?data, which is mostly a copy of Data section of Writing R Extensions.

8.3.3.2 Accessing data

Data in the [data] directory can be accessed in the following ways:

efficient way (especially for large data sets): using the data function:

data(foo) # accesses data with, e.g. load(foo.RData), read.table(foo.csv), or source(foo.R)

easy way: by adding the following line to the package DESCRIPTION. Note: this should be used with caution or it can cause difficulty, as discussed in redmine issue #1118.

LazyData: TRUE
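
As a runnable illustration of the data function, the sketch below loads the airquality data set from the datasets package that ships with R, rather than a PEcAn data set. It loads into a separate environment so the global workspace is left untouched. The environment name is arbitrary.

```r
# Load a packaged data set into its own environment instead of the workspace
data_env <- new.env()
utils::data("airquality", package = "datasets", envir = data_env)

# Inspect what was loaded
loaded_names <- ls(data_env)
n_columns <- ncol(data_env$airquality)
```

The same call pattern applies to a PEcAn package by changing the data set name and the package argument.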

From the R help page:

Currently, a limited number of data formats can be accessed using the data function by placing one of the following filetypes in a package’s data directory:

files ending .R or .r are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
files ending .RData or .rda are load()ed.
files ending .tab, .txt or .TXT are read using read.table(..., header = TRUE), and hence result in a data frame.
files ending .csv or .CSV are read using read.table(..., header = TRUE, sep = ';'), and also result in a data frame.

If your data does not fall into those 4 categories, you can use the system.file function to get access to the data:

system.file("data", "ed.trait.dictionary.csv", package="PEcAn.utils")
[1] "/home/kooper/R/x86_64-pc-linux-gnu-library/2.15/PEcAn.utils/data/ed.trait.dictionary.csv"

The arguments are folder, filename(s) and then package. It returns the fully qualified path name to a file in a package; in this case it points to the trait data. This is almost the same as the data function, but we can now use any function to read the file, such as read.csv instead of read.csv2, which seems to be the default of data. This also allows us to store arbitrary files in the data folder, such as the bug file, and load them when we need them.
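
The same pattern can be tried without PEcAn installed. The sketch below assumes that the utils package ships an example file misc/exDIF.csv, as used in R's own help examples, and guards against it being absent. It also shows that system.file returns an empty string when the requested file does not exist, which is worth checking before reading.

```r
# Locate a CSV file shipped inside an installed package
# (assumed to exist in utils; guarded below in case it does not)
csv_path <- system.file("misc", "exDIF.csv", package = "utils")

if (nzchar(csv_path)) {
  # Read it with the reader of our choice
  example_table <- utils::read.csv(csv_path, header = TRUE)
}

# A file that is not there yields "" rather than an error
missing_path <- system.file("data", "no_such_file.csv", package = "utils")
```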

8.3.3.2.1 Examples of data in PEcAn packages

outputs: [/modules/uncertainties/data/output.RData]
parameter samples [/modules/uncertainties/data/samples.RData]

8.3.4 Roxygen2

This is the standard method of documentation used in PEcAn development, it provides inline documentation similar to doxygen.

8.3.4.1 Canonical references:

Must Read: R package development by Hadley Wickham:
Object Documentation
Package Metadata
Roxygen2 Documentation
Roxygen2 Package Documentation
GitHub

8.3.4.2 Basic Roxygen2 instructions:

Section headers link to “Writing R extensions” which provides in-depth documentation. This is provided as an overview and quick reference.

8.3.4.3 Tags

tags are preceded by ##'
tags required by R:
title tag is required, along with actual title
param one for each parameter, should be defined
return must state what function returns (or nothing, if something occurs as a side effect)
tags strongly suggested for most functions:
author
examples can be similar to test cases.
optional tags:
export required if function is used by another package
import can import a required function from another package (if package is not loaded or other function is not exported)
seealso suggests related functions. These can be linked using \code{\link{}}

8.3.4.4 Text markup

8.3.4.4.1 Formatting

\bold{}
\emph{} italics

8.3.4.4.2 Links

\url{www.url.com} or \href{url}{text} for links
\code{\link{thisfn}} links to function “thisfn” in the same package
\code{\link[foo]{thatfn}} links to function “thatfn” in package “foo”
\pkg{package_name}

8.3.4.4.3 Math

\eqn{a+b=c} uses LaTeX to format an inline equation
\deqn{a+b=c} uses LaTeX to format a displayed equation
\deqn{latex}{ascii} and \eqn{latex}{ascii} can be used to provide different versions in LaTeX and ASCII.

8.3.4.4.4 Lists

\enumerate{
\item A database consists of one or more records, each with one or
more named fields.
\item Regular lines start with a non-whitespace character.
\item Records are separated by one or more empty lines.
}
\itemize and \enumerate commands may be nested.

8.3.4.4.5 Tables

See http://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables

\tabular{rlll}{
[,1] \tab Ozone \tab numeric \tab Ozone (ppb)\cr
[,2] \tab Solar.R \tab numeric \tab Solar R (lang)\cr
[,3] \tab Wind \tab numeric \tab Wind (mph)\cr
[,4] \tab Temp \tab numeric \tab Temperature (degrees F)\cr
[,5] \tab Month \tab numeric \tab Month (1--12)\cr
[,6] \tab Day \tab numeric \tab Day of month (1--31)
}

8.3.4.5 Example

Here is an example documented function, myfun

##' My function adds three numbers
##'
##' A great function for demonstrating Roxygen documentation
##' @param a numeric
##' @param b numeric
##' @param c numeric
##' @return d, numeric sum of a + b + c
##' @export
##' @author David LeBauer
##' @examples
##' myfun(1,2,3)
##' \dontrun{myfun(NULL)}
myfun <- function(a, b, c){
  d <- a + b + c
  return(d)
}

In emacs, with the cursor inside the function, the keybinding C-x O will generate an outline or update the Roxygen2 documentation.

8.3.4.6 Updating documentation

After adding documentation, run the following command in R, which uses devtools to call roxygenize (replace common with the name of the folder you want to update):

require(devtools)
document("common")

8.3.5 Testing

PEcAn uses the testthat package developed by Hadley Wickham. Hadley has written instructions for using this package in his Testing chapter.

8.3.5.1 Rationale

makes development easier
provides working documentation of expected functionality
saves time by allowing computer to take over error checking once a test has been made
improves code quality
Further reading: Aruliah et al 2012 Best Practices for Scientific Computing

8.3.5.2 Tests make development easier and less error prone

Testing makes development easier by organizing work you are already doing anyway and integrating it into the tests and documentation. With a codebase like PEcAn, it is often difficult to get started. You have to figure out

what was I doing yesterday?
what do I want to do today?
what existing functions do I need to edit?
what are the arguments to these functions (and what are examples of valid arguments)
what packages are affected
where is a logical place to put files used in testing

8.3.5.3 Quick Start:

decide what you want to do today
identify the issue in github (if none exists, create one)
to work on issue 99, create a new branch called “github99” or some descriptive name… Today we will enable an existing function, make.cheas to make goat.cheddar. We will know that we are done by the color and taste.
```
git branch goat-cheddar
git checkout goat-cheddar
```
open existing (or create new) file in tests/testthat/. If working on code in “myfunction” or a set of functions in “R/myfile.R”, the file should be named accordingly, e.g. “tests/testthat/test-myfile.R”
if you are lucky, the function has already been tested and has some examples.
if not, you may need to create a minimal example, often requiring a settings file. The default settings file can be obtained in this way:
```
settings <- read.settings(system.file("extdata/test.settings.xml", package = "PEcAn.utils"))
```

write what you want to do

test_that("make.cheas can make cheese",{
  goat.cheddar <- make.cheas(source = 'goat', style = 'cheddar')
  expect_equal(color(goat.cheddar), "orange")
  expect_is(object = goat.cheddar, class = "cheese")
  expect_true(all(c("sharp", "creamy") %in% taste(goat.cheddar)))
})

now edit the make.cheas function until it makes savory, creamy, orange cheese.
commit often
update documentation and test
commit again
when complete, merge, and push

8.3.5.4 Test files

Many of PEcAn’s functions require inputs that are provided as data. These can be in the /data or the /inst/extdata folders of a package. Data that are not package specific should be placed in the PEcAn.all or PEcAn.utils packages.

Some useful conventions:

8.3.5.5 Settings

A generic settings file can be found in the PEcAn.all package

settings.xml <- system.file("pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)

database settings can be specified, and tests run only if a connection is available

We currently use the following database to run tests against; tests that require access to a database should check db.exists() and be skipped if it returns FALSE to avoid failed tests on systems that do not have the database installed.

settings$database <- list(userid = "bety",
                          passwd = "bety",
                          name = "bety",      # database name
                          host = "localhost") # server name
test_that("a feature that needs the database works", {
  skip_if_not(db.exists(settings$database))
  ## write tests here
})

instructions for installing this are available on the VM creation wiki
examples can be found in the PEcAn.DB package (base/db/tests/testthat/).
Model specific settings can go in the model-specific module, for example:

settings.xml <- system.file("extdata/pecan.biocro.xml", package = "PEcAn.BIOCRO")
settings <- read.settings(settings.xml)

test-specific settings:

settings text can be specified inline:

settings.text <- "
  <pecan>
    <nocheck>nope</nocheck> <!-- allows bypass of checks in the read.settings functions -->
    <pfts>
      <pft>
        <name>ebifarm.pavi</name>
        <outdir>test/</outdir>
      </pft>
    </pfts>
    <outdir>test/</outdir>
    <database>
      <userid>bety</userid>
      <passwd>bety</passwd>
      <location>localhost</location>
      <name>bety</name>
    </database>
  </pecan>"
settings <- read.settings(settings.text)

values in settings can be updated:

settings <- read.settings(settings.text)
settings$outdir <- "/tmp" ## or any other settings
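
When several values need to change at once, they can be merged in a single step. The following is a minimal sketch that assumes the settings object behaves like a nested R list; a plain list stands in for the result of read.settings so the example runs without PEcAn installed, and the host name db.example.org is a made-up placeholder.

```r
# Stand-in for the object returned by read.settings(); a plain nested list
# is an assumption made so this sketch runs without PEcAn installed.
settings <- list(
  outdir = "test/",
  database = list(userid = "bety", passwd = "bety",
                  name = "bety", host = "localhost")
)

# Override several values at once. Nested lists are merged recursively,
# so database fields not mentioned here are kept.
overrides <- list(outdir = tempdir(),
                  database = list(host = "db.example.org"))
settings <- utils::modifyList(settings, overrides)

str(settings)
```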

8.3.5.6 Helper functions created to make testing easier

tryl returns FALSE if function gives error
temp.settings creates temporary settings file
test.remote returns TRUE if remote connection is available
db.exists returns TRUE if connection to database is available
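
To illustrate the idea behind a helper such as tryl, here is a minimal sketch written with base R only. The name tryl_sketch is hypothetical, and the helper shipped with PEcAn may differ in arguments and behavior.

```r
# Hypothetical re-implementation for illustration only; the helper shipped
# with PEcAn may differ in name, arguments, and behavior.
tryl_sketch <- function(expr) {
  # expr is evaluated lazily, so forcing it inside tryCatch means any
  # error it signals is caught here and turned into FALSE.
  tryCatch({
    force(expr)
    TRUE
  }, error = function(e) FALSE)
}

tryl_sketch(log(1))      # completes without error
tryl_sketch(stop("no"))  # the error is caught
```

A logical result like this is convenient inside expect_true() or skip_if_not(), where a test only needs to know whether a call succeeds.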

8.3.5.7 When should I test?

A test should be written for each of the following situations:

Each bug should get a regression test.

The first step in handling a bug is to write code that reproduces the error
This code becomes the test
most important when error could re-appear
essential when error silently produces invalid results

Every time a (non-trivial) function is created or edited

Write tests that indicate how the function should perform
example: expect_equal(sum(1,1), 2) indicates that the sum function should take the sum of its arguments
Write tests for cases under which the function should throw an error
example: expect_error(sum("foo"))
better : expect_error(sum("foo"), "invalid 'type' (character)", fixed = TRUE)

8.3.5.8 What types of testing are important to understand?

8.3.5.9 Unit Testing / Test Driven Development

Tests are only as good as the test

write test
write code

8.3.5.10 Regression Testing

When a bug is found,

write a test that finds the bug (the minimum test required to make the test fail)
fix the bug
bug is fixed when test passes
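
The steps above can be sketched as follows. The function mean_height and its bug are hypothetical, chosen only to show the shape of a regression test; the example assumes the testthat package is installed.

```r
library(testthat)

# Hypothetical function, for illustration only. An earlier version called
# mean(x) directly and so returned NA whenever an input was missing.
mean_height <- function(x) {
  mean(x, na.rm = TRUE)  # the fix: ignore missing values
}

# Regression test: the smallest input that reproduced the bug.
# It failed before the fix and passes after it.
test_that("mean_height ignores missing values", {
  expect_equal(mean_height(c(1, NA, 3)), 2)
})
```

Keeping this test in the package means the bug is caught immediately if a later change reintroduces it.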

8.3.5.11 How should I test in R? The testthat package.

tests are found in ~/pecan/<packagename>/tests/testthat, for example base/utils/tests/testthat/

See http://r-pkgs.had.co.nz/tests.html for details on how to use the testthat package.

8.3.5.11.1 List of Expectations

Full	Abbreviation
expect_that(x, is_true())	expect_true(x)
expect_that(x, is_false())	expect_false(x)
expect_that(x, is_a(y))	expect_is(x, y)
expect_that(x, equals(y))	expect_equal(x, y)
expect_that(x, is_equivalent_to(y))	expect_equivalent(x, y)
expect_that(x, is_identical_to(y))	expect_identical(x, y)
expect_that(x, matches(y))	expect_match(x, y)
expect_that(x, prints_text(y))	expect_output(x, y)
expect_that(x, shows_message(y))	expect_message(x, y)
expect_that(x, gives_warning(y))	expect_warning(x, y)
expect_that(x, throws_error(y))	expect_error(x, y)

8.3.5.11.2 How to run tests

add the following to “<packagename>/tests/testthat.R”, replacing mypackage with the name of the package

library(testthat)
library(mypackage)

test_check("mypackage")
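
During development it is often quicker to run a single test file than the whole suite. The sketch below assumes the testthat package is installed; it writes a throwaway test file to a temporary directory so that it is self-contained, whereas in a real package the file would already exist under tests/testthat/.

```r
library(testthat)

# Write a throwaway test file so the sketch is self-contained; in a real
# package this file would already exist under tests/testthat/.
example_file <- file.path(tempdir(), "test-example.R")
writeLines(c(
  'test_that("addition works", {',
  '  expect_equal(1 + 1, 2)',
  '})'
), example_file)

# Run only this file and keep the results for inspection
results <- test_file(example_file)
```

To run every test file in a directory, test_dir() can be used in the same way.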

8.3.5.12 basic use of the testthat package

Here is an example of tests (these should be placed in <packagename>/tests/testthat/test-<sourcefilename>.R):

test_that("mathematical operators plus and minus work as expected",{
  expect_equal(sum(1,1), 2)
  expect_equal(sum(-1,-1), -2)
  expect_equal(sum(1,NA), NA_real_)
  expect_error(sum("cat"))
  set.seed(0)
  expect_equal(sum(matrix(1:100)), sum(data.frame(1:100)))
})

test_that("different testing functions work, giving excuse to demonstrate",{
  expect_identical(1, 1)
  expect_false(identical(numeric(1), integer(1))) # double vs. integer: not identical
  expect_equivalent(numeric(1), integer(1))
  expect_warning(mean('1'))
  expect_that(mean('1'), gives_warning("argument is not numeric or logical: returning NA"))
  expect_warning(mean('1'), "argument is not numeric or logical: returning NA")
  expect_message(message("a"), "a")
})

8.3.5.12.1 Script testing

It is useful to add tests to a script during development. This allows you to test that the code is doing what you expect it to do.

Here is a fake script using the iris data set:

test_that("the iris data set has the same basic features as before",{
  expect_equal(dim(iris), c(150,5))
  expect_that(iris$Sepal.Length, is_a("numeric"))
  expect_is(iris$Sepal.Length, "numeric")#equivalent to prev. line
  expect_is(iris$Species, "factor")
})

iris.color <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                         color = c("pink", "blue", "orange"))

newiris <- merge(iris, iris.color)
iris.model <- lm(Petal.Length ~ color, data = newiris)

test_that("changes to Iris code occurred as expected",{
  expect_that(dim(newiris), equals(c(150, 6)))
  expect_that(unique(newiris$color),
              is_identical_to(unique(iris.color$color)))
  expect_equivalent(iris.model$coefficients["(Intercept)"], 4.26)
})

8.3.5.12.2 Function testing

Testing of a new function, as.sequence. The function and documentation are in R/utils.R and the tests are in tests/test.utils.R.

Recently, I made the function as.sequence to turn any vector into a sequence, with custom handling of NA’s:

as.sequence <- function(x, na.rm = TRUE){
  x2 <- as.integer(factor(x, unique(x)))
  if(all(is.na(x2))){
    x2 <- rep(1, length(x2))
  }
  if(na.rm == TRUE){
    x2[is.na(x2)] <- max(x2, na.rm = TRUE) + 1
  }
  return(x2)
}

The next step was to add documentation and test. Many people find it more efficient to write tests before writing the function. This is true, but it also requires more discipline. I wrote these tests to handle the variety of cases that I had observed.

As currently used, the function is exposed to a fairly restricted set of options - results of downloads from the database and transformations.

test_that("as.sequence works", {
  ## expect_equal rather than expect_identical: filling in NAs can coerce the result from integer to double
  expect_equal(as.sequence(c("a", "b")), 1:2)
  expect_equal(as.sequence(c("a", NA)), 1:2)
  expect_equal(as.sequence(c("a", NA), na.rm = FALSE), c(1, NA))
  expect_equal(as.sequence(c(NA, NA)), c(1, 1))
})

8.3.5.13 Testing the Shiny Server

Shiny can be difficult to debug because, when run as a web service, the R output is hidden in system log files that are hard to find and read. One useful approach to debugging is to use port forwarding, as follows.

First, on the remote machine (including the VM), make sure R’s working directory is set to the directory of the Shiny app (e.g., setwd("/path/to/pecan/shiny/WorkflowPlots"), or just open the app as an RStudio project). Then, in the R console, run the app as:

shiny::runApp(port = XXXX)
# E.g. shiny::runApp(port = 5638)

Then, on your local machine, open a terminal and run the following command, matching XXXX to the port above and YYYY to any unused port on your local machine (any 4-digit number should work).

ssh -L YYYY:localhost:XXXX <remote connection>
# E.g., for the PEcAn VM, given the above port:
# ssh -L 5639:localhost:5638 carya@localhost -p 6422

Now, in a web browser on your local machine, browse to localhost:YYYY (e.g., localhost:5639) to run whatever app you started with shiny::runApp in the previous step. All of the output should display in the R console where the shiny::runApp command was executed. Note that this includes any print, message, logger.*, etc. statements in your Shiny app.

If the Shiny app hits an R error, the backtrace should include a line like Hit error at of server.R#LXX, where XX is a line number that you can use to track down the error. To return from the error to a normal R prompt, hit <Control>-C (alternatively, the “Stop” button in RStudio). To restart the app, run shiny::runApp(port = XXXX) again (keeping the same port).

Note that Shiny runs any code in the pecan/shiny/<app> directory at the moment the app is launched. So, any changes you make to the code in server.R and ui.R or scripts loaded therein will take effect the next time the app is started.

If for whatever reason this doesn’t work with RStudio, you can always run R from the command line. Also, note that the ability to forward ports (ssh -L) may depend on the ssh configuration of your remote machine. These instructions have been tested on the PEcAn VM (v.1.5.2+).

8.3.6 `devtools` package

Provides functions to simplify development

Documentation: The R devtools package

load_all("pkg")
document("pkg")
test("pkg")
install("pkg")
build("pkg")

other tips for devtools (from the documentation):

Adding the following to your ~/.Rprofile will load devtools when running R in interactive mode:

# load devtools by default
if (interactive()) {
  suppressMessages(require(devtools))
}

Adding the following to your .Rpackages will allow devtools to recognize packages by folder name, rather than by directory path

# in this example, devhome is the pecan trunk directory
devhome <- "/home/dlebauer/R-dev/pecandev/"
list(
  default = function(x) {
    file.path(devhome, x, x)
  },
  "utils" = paste(devhome, "pecandev/utils", sep = ""),
  "common" = paste(devhome, "pecandev/common", sep = ""),
  "all" = paste(devhome, "pecandev/all", sep = ""),
  "ed" = paste(devhome, "pecandev/models/ed", sep = ""),
  "uncertainty" = paste(devhome, "modules/uncertainty", sep = ""),
  "meta.analysis" = paste(devhome, "modules/meta.analysis", sep = ""),
  "db" = paste(devhome, "db", sep = "")
)

Now, devtools can take pkg as an argument instead of /path/to/pkg/, e.g. so you can use build("pkg") instead of build("/path/to/pkg/")

8.4 Download and Compile PEcAn

Set R_LIBS_USER

CRAN Reference

# point R to personal lib folder
echo 'export R_LIBS_USER=${HOME}/R/library' >> ~/.profile
source ~/.profile
mkdir -p ${R_LIBS_USER}

8.4.1 Download, compile and install PEcAn from GitHub

# download pecan
cd
git clone https://github.com/PecanProject/pecan.git

# compile pecan
cd pecan
make

For more information on the capabilities of the PEcAn Makefile, check out our section on Updating PEcAn.

The following runs a small script that sets up hooks to prevent anyone from using the pecan demo user account to check in code.

# prevent pecan user from checking in code
./scripts/create-hooks.sh

8.4.2 PEcAn Testrun

Do the run. This assumes you have installed the BETY database, the sites tar file, and SIPNET.

# create folder
cd
mkdir testrun.pecan
cd testrun.pecan

# copy example of pecan workflow and configuration file
cp ../pecan/tests/pecan32.sipnet.xml pecan.xml
cp ../pecan/scripts/workflow.R workflow.R

# execute workflow
rm -rf pecan
./workflow.R pecan.xml

NB: pecan.xml is configured for the virtual machine; you will need to change the field from ‘/home/carya/’ to wherever you installed your ‘sites’, usually $HOME

8.5 Directory structure

8.5.1 Overview of PEcAn repository as of PEcAn 1.5.3

pecan/
 +- base/          # Core functions
    +- all         # Dummy package to load all PEcAn R packages
    +- db          # Modules for querying the database
    +- logger      # Report warnings without killing workflows
    +- qaqc        # Model skill testing and integration testing
    +- remote      # Communicate with and execute models on local and remote hosts
    +- settings    # Functions to read and manipulate PEcAn settings files
    +- utils       # Misc. utility functions
    +- visualization # Advanced PEcAn visualization module
    +- workflow    # functions to coordinate analysis steps
 +- book_source/   # Main documentation and developer's guide
 +- CHANGELOG.md   # Running log of changes in each version of PEcAn
 +- docker/        # Experimental replacement for PEcAn virtual machine
 +- documentation  # index_vm.html, references, other misc.
 +- models/        # Wrappers to run models within PEcAn
    +- ed/         # Wrapper scripts for running ED within PEcAn
    +- sipnet/     # Wrapper scripts for running SIPNET within PEcAn
    +- ...         # Wrapper scripts for running [...] within PEcAn
    +- template/   # Sample wrappers to copy and modify when adding a new model
 +- modules        # Core modules
    +- allometry
    +- data.atmosphere
    +- data.hydrology
    +- data.land
    +- meta.analysis
    +- priors
    +- rtm
    +- uncertainty
    +- ...
 +- scripts        # R and Shell scripts for use with PEcAn
 +- shiny/         # Interactive visualization of model results
 +- tests/         # Settings files for host-specific integration tests
 +- web            # Main PEcAn website files

8.5.2 Generic R package structure:

see the R development wiki for more information on writing code and adding data.

 +- DESCRIPTION    # short description of the PEcAn library
 +- R/             # location of R source code
 +- man/           # Documentation (automatically compiled by Roxygen)
 +- inst/          # files to be installed with package that aren't R functions
    +- extdata/    # misc. data files (in misc. formats)
 +- data/          # data used in testing and examples (saved as *.RData or *.rda files)
 +- NAMESPACE      # declaration of package imports and exports (automatically compiled by Roxygen)
 +- tests/         # PEcAn testing scripts
   +- testthat/    # nearly all tests should use the testthat framework and live here
