AssimBatchVignette.Rmd
Only needs to be done the first time :
library(devtools)
install_github("PecanProject/pecan", subdir="all")
Currently, there are four ways of doing Parameter Data Assimilation (PDA) in PEcAn :
Which one to use?
bruteforce : You can choose bruteforce as a method to use PEcAn’s natively implemented Metropolis-Hastings Markov Chain Monte Carlo (MCMC) algorithm. This would perform a Bayesian MCMC on model parameters by proposing one parameter value at a time, and accepting or rejecting it according to the calculated likelihood value. This algorithm also has an adaptation functionality that can be turned on (recommended) and off (see below). As each (i-th) parameter is dependent of the previous (i-1-th) one, this algorithm can only be run sequentially (but different chains can be run in parallel). Therefore, if you have just one parameter to calibrate and a relatively fast model that can be run couple of hundreds (couple of thousands would even be better) of times within an hour, it is possible to use this algorithm.
bruteforce.bs : This algorithm is basically identical to the bruteforce, but rather than proposing parameters one at a time, it proposes new values for all parameters at once (“bs” stands for “block sampling”). If you have more than one parameter to calibrate and a relatively fast model, you can use this algorithm, preferably with the adaptation functionality turned on.
emulator : When a model is slow, it is practically not possible to run it many times in order to explore the parameter space and draw enough samples to converge the target distribution with bruteforce algorithms. Instead, we can run the model for a relatively smaller number of times with parameter values that have been carefully chosen to give good coverage of parameter space. Then we can interpolate the likelihood calculated for each of those runs to get a surface that “emulates” the true likelihood and perform regular MCMC (just like the “bruteforce” approach), except instead of actually running the model on every iteration to get a likelihood, this time we will just get an approximation from the likelihood emulator.
bayesian.tools : There are other MCMC algorithms with different proposal schemes and acceptance criterion than Metropolis-Hastings. BayesianTools is an R-package that includes MCMC and SMC samplers and other tools for Bayesian parameter calibration. If you choose bayesian.tools option, PEcAn framework will hand the PDA calculations over to the BayesianTools package. Although this package includes algorithms that are designed to explore the parameter space more efficiently than the regular MH-MCMC, you would still need a relatively faster model to use these algorithms. The BayesianTools R-package itself is currently under development, once it is fully integrated with PEcAn it will be possible to run some of those algorithms in parallel.
If you want to extend your previous bruteforce and bruteforce.bs MCMC-chains, you can add an <extension>
tag within your <assim.batch>
chunk after reading your post-PDA settings (which would be saved as “pecan.pda[UNIQUE_ID].xml” in your working directory) and set it to “longer”.
...
<method>bruteforce.bs</method>
<iter>10000</iter>
<method>bruteforce</method>
<chain>3</chain>
<extension>longer</extension>
...
If you are using methods other than bruteforce and bruteforce.bs, some additional tags and different settings may apply.
emulator would use additional tags:
...
<method>emulator</method>
<n.knot>20</n.knot>
<GPpckg>GPfit</GPpckg>
<chain>3</chain>
<extension>round</extension>
<knot.par>0.75</knot.par>
...
<n.knot>
Specifies the number of locations in parameter space to be sampled by the Latin Hypercube design. These locations are where the model will actually be run. In other words the model will be run for <n.knots>
times.
<GPpckg>
Specifies which R package to use for fitting a Gaussian process to interpolate the likelihood surface in between the calculated values that are obtained from actual model runs. Current options are kernlab
and GPfit
(recommended).
<chain>
Specifies the number of MCMC chains to be run.
<extension>
Specifies the extension type of additional emulator runs, should be skipped if this a first emulator run of its own. Otherwise it is possible to extend emulator PDA runs in two ways:
<extension>longer</extension>
using the same emulator, this extension run takes the MCMC sampling from where it was left in the previous emulator run and runs a longer MCMC chain.
<extension>round</extension>
this extension run proposes new points in the parameter space in addition to the previous ones, and builds a new emulator including these additional points for a new MCMC sampling. These new points can come from both your inital PDA prior and the posterior of your first round of emulator run. You can determine the percentage of new knots coming from the posterior of your previous run in the <knot.par>
tag. If you leave it empty, 75% of your new points will be drawn from the posterior of your previous run by default.
<file>
<input.id>...</input.id>
<path>
<path>...</path>
</path>
<likelihood>multipGauss</likelihood>
<ss.positive>TRUE</ss.positive>
<variable.id>...</variable.id>
<variable.name>
<variable.name>...</variable.name>
</variable.name>
</file>
</inputs>
<ss.positive>
When using the emulator, it is important to let the algorithm know that you have a likelihood whose sufficient statistics is zero bound.bayesian.tools would look for sampler specific settings that can be passed through the pecan.xml as a block under the <bt.settings>
tag. Currently, the available samplers in the BayesianTools package are:
The name of the chosen sampler would be passed under <sampler>
tag within the <bt.settings>
block :
It is not possible to use some of the samplers in this package for univariate cases. The ones that you can use for univariate cases are: “Metropolis”, “DE”, “DEzs” and “Twalk”.
For Metropolis variations:
...
<method>bayesian.tools</method>
<bt.settings>
<iter>10000</iter>
<sampler>Metropolis</sampler>
<DRlevels>1</DRlevels> <-- set 2 for Delayed Rejection
<optimize>FALSE</optimize> <-- set TRUE for pre-optimization
<adapt>FALSE</adapt> <-- set TRUE for Adaptive Metropolis
<adaptationNotBefore>500</adaptationNotBefore> <-- set for Adaptive Metropolis
</bt.settings>
...
Some of the samplers are also restartable : “Metropolis”, “DE”, “DEzs”, “DREAM”, “DREAMzs”, “Twalk”