37 Demo 02: Sensitivity and Uncertainty Analysis
In Demo 2 we will be looking at how PEcAn can use information about parameter uncertainty to perform three automated analyses:
- Ensemble Analysis: Repeats numerous model runs, each sampling from the parameter uncertainty, to generate a probability distribution of model projections. Allows us to put a confidence interval on the model projections.
- Sensitivity Analysis: Repeats numerous model runs to assess how changes in model parameters will affect model outputs. Allows us to identify which parameters the model is most sensitive to.
- Uncertainty Analysis: Combines information about model sensitivity with information about parameter uncertainty to determine the contribution of each model parameter to the uncertainty in model outputs. Allows us to identify which parameters are driving model uncertainty.
37.1 Run Specification
Return to the main menu for the PEcAn web interface: URL > Run Models
Repeat the steps for site selection and run specification from Demo 01, but also click on “Advanced setup”, then click Next.
Because you clicked Advanced setup, PEcAn will first show an Analysis Menu, where we are going to specify new settings.
For an ensemble analysis, increase the number of runs in the ensemble, in this case set Runs to 50. In practice you would want to use a larger ensemble size (100-5000) than we are using in the demo. The ensemble analysis samples parameters from their posterior distributions to propagate this uncertainty into the model output.
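The idea behind the ensemble analysis can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not PEcAn's implementation: toy_model is a made-up stand-in for an ecosystem model, and the two posterior distributions and their values are hypothetical.

```r
# Hypothetical stand-in for an ecosystem model (NOT a PEcAn model).
toy_model <- function(Amax, leaf_respiration) {
  Amax * 0.5 - leaf_respiration * 2
}

set.seed(42)
n_runs <- 50  # same ensemble size as in the demo

# Assumed posterior distributions; the shapes and values are made up.
ensemble_samples <- data.frame(
  Amax = rnorm(n_runs, mean = 20, sd = 3),
  leaf_respiration = rlnorm(n_runs, meanlog = 0, sdlog = 0.3)
)

# One model run per row of sampled parameters.
ensemble_output <- with(ensemble_samples, toy_model(Amax, leaf_respiration))

# Summarize the resulting distribution of model projections.
ensemble_summary <- c(
  mean = mean(ensemble_output),
  quantile(ensemble_output, probs = c(0.025, 0.5, 0.975))
)
print(ensemble_summary)
```

With only 50 runs the tails of the interval are estimated from very few points, which is one reason a larger ensemble is preferable in practice.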
PEcAn’s sensitivity analysis holds all parameters at their median value and then varies each parameter one-at-a-time based on the quantiles of the posterior distribution. PEcAn also includes a handy shortcut, which is the default behavior for the web interface, that converts a specified standard deviation into its Normal quantile equivalent (e.g. -1 and 1 are converted to 0.159 and 0.841). In this example set Sensitivity to -2,-1,1,2 (the median value, 0, occurs by default).
We can also tell PEcAn which variable to run the sensitivity on. Here, set Variables to NEE, so we can compare against flux tower NEE observations.
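As a rough sketch of the two ideas in this step, the base R code below converts standard deviations to Normal quantiles with pnorm and then varies one parameter at a time around the medians. It is an illustration only, not PEcAn's code: toy_model and the posterior distributions are hypothetical.

```r
# Standard deviations requested in the web interface, plus the median (0).
sigma_levels <- c(-2, -1, 0, 1, 2)

# Convert each standard deviation to its Normal quantile equivalent.
quantile_levels <- pnorm(sigma_levels)
print(round(quantile_levels, 3))
# 0.023 0.159 0.500 0.841 0.977

# Hypothetical posteriors, expressed as quantile functions.
posterior_quantile <- list(
  Amax = function(p) qnorm(p, mean = 20, sd = 3),
  leaf_respiration = function(p) qlnorm(p, meanlog = 0, sdlog = 0.3)
)

# Hypothetical stand-in for an ecosystem model (NOT a PEcAn model).
toy_model <- function(Amax, leaf_respiration) {
  Amax * 0.5 - leaf_respiration * 2
}

# All parameters start at their posterior median.
medians <- lapply(posterior_quantile, function(q) q(0.5))

# Vary one parameter at a time across the quantiles, holding the rest fixed.
sa_output <- sapply(names(posterior_quantile), function(param) {
  sapply(quantile_levels, function(p) {
    args <- medians
    args[[param]] <- posterior_quantile[[param]](p)
    do.call(toy_model, args)
  })
})
rownames(sa_output) <- sigma_levels
print(sa_output)
```

Each column of sa_output traces the model response to a single parameter, which is the kind of curve the sensitivity plots later fit a spline through.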
Click Next
37.1.1 Additional Outputs:
The PEcAn workflow will take considerably longer to complete since we have just asked for over a hundred model runs. Once the runs are complete you will return to the output visualization page, where there will be a few new outputs to explore, as well as outputs that were present earlier that we’ll explore in greater detail:
37.1.2 1. Run ID:
While the sensitivity and ensemble analyses synthesize across runs, you can also select individual runs from the Run ID menu. You can use the Graphs menu to visualize each individual run, or open individual runs in Shiny.
37.1.3 2. Inputs:
This menu shows the contents of /run which lets you look at and download:
- A summary file (README.txt) describing each run: location, run ID, model, dates, whether it was in the sensitivity or ensemble analysis, variables modified, etc.
- The model-specific input files fed into the model
- The job.sh file used to submit the model run
37.1.4 3. Outputs:
This menu shows the contents of /out. A number of files generated by the underlying ecosystem model are archived and available for download. These include:
- Output files in the standardized netCDF format ([year].nc) that can be downloaded for visualization and analysis (R, Matlab, ncview, panoply, etc.)
- Raw model output in model-specific format (e.g. sipnet.out).
- Logfile.txt contains job.sh & model error, warning, and informational messages
37.1.5 4. PFTs:
This menu shows the contents of /pft. There is a wide array of outputs available that are related to the process of estimating the model parameters and running sensitivity/uncertainty analyses for a specific Plant Functional Type.
- TRAITS: The Rdata files trait.data.Rdata and madata.Rdata are, respectively, the available trait data extracted from the database that was used to estimate the model parameters and that same data cleaned and formatted for the statistical code. The list of variables that are queried is determined by what variables have priors associated with them in the definition of the PFTs. Priors are output into prior.distns.Rdata. Likewise, the list of species that are associated with a PFT determines what subset of data is extracted out of all data matching a given variable name. Demo 3 will demonstrate how a PFT can be created or modified. To look at these files in RStudio, click on them to load them into your workspace. You can then examine them in the Environment window or access them at the command line. For example, try typing
names(trait.data)
as this will tell you what variables were extracted,
names(trait.data$Amax)
will tell you the names of the columns in the Amax table, and
summary(trait.data$Amax)
will give you summary data about the Amax values.
- META-ANALYSIS:
  - *.bug: The evaluation of the meta-analysis is done using a Bayesian statistical software package called JAGS that is called by the R code. For each trait, the R code will generate a [trait].model.bug file that is the JAGS code for the meta-analysis itself. This code is generated on the fly, with PEcAn adding or subtracting the site, treatment, and greenhouse terms depending upon the presence of these effects in the data itself. If the tag is set to FALSE then all random effects will be turned off even if there are multiple sites.
  - meta-analysis.log contains a number of diagnostics, including the summary statistics of the model, an assessment of whether the posterior is consistent with the prior, and the status of the Gelman-Brooks-Rubin convergence statistic (which is ideally 1.0 but should be less than 1.1).
  - ma.summaryplots.*.pdf are collections of diagnostic plots produced in R after the above JAGS code is run that are useful in assessing the statistical model. Open up one of these pdfs to evaluate the shape of the posterior distributions (they should generally be unimodal), the convergence of the MCMC chains (all chains should be mixing well from the same distribution), and the autocorrelation of the samples (should be low).
  - traits.mcmc.Rdata contains the raw output from the statistical code. This includes samples from all of the parameters in the meta-analysis model, not just those that feed forward to the ecosystem model, but also the variances, fixed effects, and random effects.
  - post.distns.Rdata stores a simple table of the posterior distributions for all model parameters in terms of the name of the distribution and its parameters.
  - posteriors.pdf provides graphics showing, for each model parameter, the prior distribution, the data, the smoothed histogram of the posterior distribution (labeled post), and the best-fit analytical approximation to that smoothed histogram (labeled approx). Open posteriors.pdf and compare the posteriors to the priors and data.
- SENSITIVITY ANALYSIS
  - sensitivity.analysis.[RunID].[Variable].[StartYear].[EndYear].pdf shows the raw data points from univariate one-at-a-time analyses and spline fits through the points. Open this file to determine which parameters the model is most and least sensitive to.
- UNCERTAINTY ANALYSIS
  - variance.decomposition.[RunID].[Variable].[StartYear].[EndYear].pdf contains three columns: the coefficient of variation (normalized posterior variance), the elasticity (normalized sensitivity), and the partial standard deviation of each model parameter. Open this file for BOTH the soil and conifer PFTs and answer the following questions:
    - The Variance Decomposition graph is sorted by the parameter explaining the largest amount of variability in the model output (right hand column). From this graph identify the top-tier parameters that you would target for future constraint.
    - A parameter can be important because it is highly sensitive, because it is highly uncertain, or both. Identify parameters in your output that meet each of these criteria. Additionally, identify parameters that are highly uncertain but unimportant (due to low sensitivity) and those that are highly sensitive but unimportant (due to low uncertainty).
    - Parameter constraints could come from further literature synthesis, from direct measurement of the trait, or from data assimilation. Choose the parameter that you think provides the most efficient means of reducing model uncertainty and propose how you might best reduce uncertainty in this process. In making this choice remember that not all processes in models can be directly observed, and that the cost-per-sample for different measurements can vary tremendously (and thus the parameter you measure next is not always the one contributing the most to model variability). Also consider the role of parameter uncertainty versus model sensitivity in justifying your choice of what parameters to constrain.
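To make the three columns of the variance decomposition more concrete, here is a minimal base R sketch of how such quantities could be computed from one-at-a-time runs. It is an illustration under stated assumptions rather than PEcAn's actual code: toy_model, the parameter names, and the posterior samples are hypothetical, and the partial standard deviation is approximated here as the standard deviation of the spline-predicted output over the posterior samples of one parameter.

```r
set.seed(1)

# Hypothetical stand-in for an ecosystem model (NOT a PEcAn model).
toy_model <- function(Amax, leaf_respiration) {
  Amax * 0.5 - leaf_respiration * 2
}

# Assumed posterior samples for two parameters; values are made up.
samples <- list(
  Amax = rnorm(5000, mean = 20, sd = 3),
  leaf_respiration = rlnorm(5000, meanlog = 0, sdlog = 0.3)
)
medians <- lapply(samples, median)

decompose_one <- function(param) {
  theta <- samples[[param]]
  # One-at-a-time runs at the -2, -1, 0, 1, 2 sigma quantiles.
  q <- quantile(theta, probs = pnorm(c(-2, -1, 0, 1, 2)), names = FALSE)
  y <- sapply(q, function(v) {
    args <- medians
    args[[param]] <- v
    do.call(toy_model, args)
  })
  # Spline through the sensitivity points.
  fit <- splinefun(q, y, method = "natural")
  theta_med <- medians[[param]]
  c(
    cv = sd(theta) / mean(theta),                 # parameter uncertainty
    elasticity = fit(theta_med, deriv = 1) *      # normalized sensitivity
      theta_med / fit(theta_med),
    partial_sd = sd(fit(theta))                   # contribution to output spread
  )
}

decomposition <- t(sapply(names(samples), decompose_one))

# Sort by contribution to output variability, largest first.
decomposition <- decomposition[
  order(decomposition[, "partial_sd"], decreasing = TRUE), ]
print(round(decomposition, 3))
```

In this toy case leaf_respiration has the larger coefficient of variation but the smaller partial standard deviation, which mirrors the "highly uncertain but unimportant" situation described in the questions above.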
37.1.6 PEcAn Files:
This menu shows the contents of the root workflow folder that are not in one of the folders indicated above. It mostly contains log files from the PEcAn workflow that are useful if the workflow generates an error, and as metadata & provenance (a detailed record of how data was generated).
- STATUS gives a summary of the steps of the workflow, the time they took, and whether they were successful.
- pecan.*.xml are PEcAn settings files.
- workflow.R is the workflow script.
- workflow.Rout is the corresponding log file.
- samples.Rdata contains the parameter values used in the runs. This file contains two data objects, sa.samples and ensemble.samples, that are the parameter values for the sensitivity analysis and ensemble runs respectively.
- sensitivity.output.[RunID].[Variable].[StartYear].[EndYear].Rdata contains the object sensitivity.output, which is the model outputs corresponding to the parameter values in sa.samples.
- ENSEMBLE ANALYSIS
  - ensemble.Rdata contains the object ensemble.output, which is the model predictions at the parameter values given in ensemble.samples.
  - ensemble.analysis.[RunID].[Variable].[StartYear].[EndYear].pdf contains the ensemble prediction as both a histogram and a boxplot.
  - ensemble.ts.[RunID].[Variable].[StartYear].[EndYear].pdf contains a time-series plot of the ensemble mean, median, and 95% CI.
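If you download one of these .Rdata files, one way to inspect it without cluttering your workspace is to load it into its own environment. The sketch below is self-contained, so it first writes a stand-in samples.Rdata to a temporary folder. The object names sa.samples and ensemble.samples come from the description above, but their internal structure here (a list with a placeholder_pft matrix) is purely hypothetical.

```r
# Create a stand-in samples.Rdata in a temporary folder.
# The contents are mock data; real files will be structured differently.
workflow_dir <- tempfile("workflow_")
dir.create(workflow_dir)

mock <- new.env()
mock$sa.samples <- list(
  placeholder_pft = matrix(
    c(17, 20, 23, 0.7, 1.0, 1.3), nrow = 3,
    dimnames = list(c("15.866", "50", "84.134"),
                    c("Amax", "leaf_respiration"))
  )
)
mock$ensemble.samples <- list(
  placeholder_pft = data.frame(Amax = c(19.2, 21.5),
                               leaf_respiration = c(0.9, 1.1))
)
samples_file <- file.path(workflow_dir, "samples.Rdata")
save(list = c("sa.samples", "ensemble.samples"),
     envir = mock, file = samples_file)

# Load the file into a dedicated environment and list what it contains.
samples_env <- new.env()
loaded <- load(samples_file, envir = samples_env)
print(loaded)

# Inspect the structure of one of the loaded objects.
str(samples_env$ensemble.samples)
```

load returns the names of the objects it restored, so printing its result is a quick way to confirm which data objects a file holds before digging into them.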
37.2 Global Sensitivity: Shiny
Navigate to URL/shiny/global-sensitivity.
This app uses the output from the ENSEMBLE runs to perform a global Monte Carlo sensitivity analysis. There are three modes controlled by Output type:
- Pairwise looks at the relationship between a specific parameter (X) and output (Y)
- All parameters looks at how all parameters affect a specific output (Y)
- All variables looks at how all outputs are affected by a specific parameter (X)
In all of these analyses, the app also fits a linear regression to these scatterplots and reports a number of summary statistics. Among these, the slope is an indicator of global sensitivity and the R² is an indicator of the contribution to global uncertainty.
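The regression step can be illustrated with a short base R sketch. This is not the Shiny app's code: the ensemble below is simulated from a hypothetical toy model, and the parameter names and distributions are assumptions chosen only to show how slope and R² are read off a fit.

```r
set.seed(7)
n_runs <- 50

# Simulated ensemble: hypothetical parameters and a toy model output.
ensemble_samples <- data.frame(
  Amax = rnorm(n_runs, mean = 20, sd = 3),
  leaf_respiration = rlnorm(n_runs, meanlog = 0, sdlog = 0.3)
)
ensemble_output <- with(ensemble_samples, Amax * 0.5 - leaf_respiration * 2)

# Regress the output on each parameter in turn, using all ensemble members.
global_sensitivity <- t(sapply(names(ensemble_samples), function(param) {
  fit <- lm(ensemble_output ~ ensemble_samples[[param]])
  c(
    slope = unname(coef(fit)[2]),          # indicator of global sensitivity
    r_squared = summary(fit)$r.squared     # share of output variance explained
  )
}))
print(round(global_sensitivity, 3))
```

Because every ensemble member varies all parameters at once, the scatter around each regression line reflects the influence of the other parameters, which is what distinguishes this global view from the one-at-a-time analysis.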
37.2.1 Next Steps
The next set of tutorials will focus on the process of data assimilation and parameter estimation. The next two steps are in “.Rmd” files which can be viewed online.
37.2.2 Assimilation ‘by hand’
Explore how model error changes as a function of parameter value (i.e. data assimilation ‘by hand’)
37.2.3 MCMC Concepts
Explore Bayesian MCMC concepts using the photosynthesis module
37.2.4 More info about tools, analyses, and specific tasks…
Additional information about specific tasks (adding sites, models, data; software updates; etc.) and analyses (e.g. data assimilation) can be found in the PEcAn documentation.
If you encounter a problem with PEcAn that’s not covered in the documentation, or if PEcAn is missing functionality you need, please search known bugs and issues, submit a bug report, or ask a question in our chat room. Additional questions can be directed to the project manager.