42 Simple Model-Data Comparisons

42.0.1 Author: Istem Fer, Tess McCabe

In this tutorial we will compare model outputs to data outside of the PEcAn web interface. The goal is to demonstrate how to perform additional analyses using PEcAn’s outputs. You can either download each of the output files and analyze them with whatever software you prefer, or perform the analyses directly on the PEcAn server itself. Here we’ll analyze model outputs in R using the browser-based version of RStudio that is installed on the server.

42.1 Starting RStudio Server

  1. Open RStudio Server in a new window at URL/rstudio

  2. The username is carya and the password is illinois.

  3. To open a new R script click File > New File > R Script

  4. Use the Files browser on the lower right pane to find where your run(s) are located

  • All PEcAn outputs are stored in the output folder. Click on this to open it up.

    • Within the output folder, there will be one folder for each workflow execution. For example, click to open the folder PEcAn_99000000001 if that’s your workflow ID

    • A workflow folder will have a few log and settings files (e.g. pecan.xml) and the following subfolders

run     contains all the inputs for each run
out     contains all the outputs from each run
pft     contains the parameter information for each PFT

Within both the run and out folders there will be one folder for each unique model run, where the folder name is the run ID. Click to open the out folder. For our simple case we only did one run so there should be only one folder (e.g. 99000000001). Click to open this folder.

  • Within this folder you will find, among other things, files with the .nc extension. Each of these files contains one year of model output in the standard PEcAn netCDF format. This is the model output that we will compare to data.
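The folder layout described above can also be navigated from an R script. The following is a minimal sketch in base R; the helper name list_run_outputs and the ~/output root are assumptions for illustration, and the IDs are the example values used above.

```r
# Hypothetical helper: build the path <output_root>/PEcAn_<workflow>/out/<run>
# and return the netCDF files found there.
list_run_outputs <- function(output_root, workflow_id, run_id) {
  out_dir <- file.path(output_root, paste0("PEcAn_", workflow_id), "out", run_id)
  list.files(out_dir, pattern = "\\.nc$", full.names = TRUE)
}

# Example IDs from the text; replace them with your own workflow and run IDs.
# The "~/output" location is an assumption about where the output folder lives.
nc_files <- list_run_outputs("~/output", "99000000001", "99000000001")
print(basename(nc_files))
```

If the folder does not exist, list.files returns an empty character vector rather than an error, so an empty result usually means the path or IDs need checking.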

42.2 Read in settings from an XML file
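A workflow's settings file (e.g. pecan.xml in the workflow folder) can be read into R as a nested list. The sketch below assumes the PEcAn.settings package is available on the server and that the file sits at the path shown; both are assumptions, so the PEcAn call is guarded and skipped elsewhere. The helper run_years is hypothetical and uses only base R.

```r
# Hypothetical helper: extract the start and end year from a settings list.
run_years <- function(settings) {
  c(start = as.numeric(format(as.Date(settings$run$start.date), "%Y")),
    end   = as.numeric(format(as.Date(settings$run$end.date), "%Y")))
}

# Assumed location of the settings file for the example workflow ID.
settings_file <- "~/output/PEcAn_99000000001/pecan.xml"

# Only attempt the read where PEcAn and the file are actually present.
if (requireNamespace("PEcAn.settings", quietly = TRUE) && file.exists(settings_file)) {
  settings <- PEcAn.settings::read.settings(settings_file)
  print(settings$outdir)       # workflow output directory
  print(run_years(settings))   # start and end year of the run
}
```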

42.3 Read in model output from specific variables

The arguments to read.output are the run ID, the folder where the run is located, the start year, the end year, and the variables being read. The README file in the Input file dropdown menu of any successful run lists the run ID, the output folder, and the start and end year.
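As a hedged sketch of the call described above: the run ID, folder, and years below are placeholder values (take the real ones from the README of your run), and the variable names "time" and "NEE" are assumptions about what you want to read. The call is guarded so that it runs only where PEcAn.utils is installed and the folder exists.

```r
# Placeholder values; copy the real ones from the README of your run.
run_id     <- "99000000001"
out_dir    <- file.path("~/output/PEcAn_99000000001/out", run_id)
start_year <- 2004            # hypothetical start year
end_year   <- 2004            # hypothetical end year
model_vars <- c("time", "NEE") # variables to read (assumed)

if (requireNamespace("PEcAn.utils", quietly = TRUE) && dir.exists(out_dir)) {
  # dataframe = TRUE asks for a data frame rather than a list of vectors
  model <- PEcAn.utils::read.output(
    runid      = run_id,
    outdir     = out_dir,
    start.year = start_year,
    end.year   = end_year,
    variables  = model_vars,
    dataframe  = TRUE
  )
  str(model)
}
```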

42.4 Compare model to flux observations

First, load the observations and take a look at the contents of the file

File_Path refers to where you stored your observational data. In this example the default file path is an Ameriflux dataset from Niwot Ridge.

File_format queries the database for the format of your file. The default format ID “5000000002” is for CSV files downloaded from the Ameriflux website. You could query for different kinds of formats that exist in BETY or create your own.

Here 772 is the database site ID for Niwot Ridge Forest, which tells PEcAn where the data are from and which time zone to assign to any time data read in.

Second, apply a conservative u* filter to the observations
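A u* filter sets flux values to missing wherever friction velocity falls below a threshold, since low-turbulence periods give unreliable eddy-covariance fluxes. The sketch below uses a small synthetic data frame; the column names UST and NEE, the 0.2 m/s threshold, and the helper name apply_ustar_filter are all assumptions to adapt to your own data.

```r
# Hypothetical helper: mask flux values recorded under low friction velocity.
# Rows where u* itself is missing are left unchanged in this sketch.
apply_ustar_filter <- function(obs, threshold = 0.2, flux = "NEE", ustar = "UST") {
  low <- !is.na(obs[[ustar]]) & obs[[ustar]] < threshold
  obs[[flux]][low] <- NA
  obs
}

# Synthetic stand-in for the observations (not real Ameriflux data).
obs <- data.frame(UST = c(0.05, 0.35, 0.15, 0.60),
                  NEE = c(1.2, -3.4, 0.8, -5.1))
filtered <- apply_ustar_filter(obs)
print(filtered)
```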

Third, align model output and observations

When we aligned the data, we got a dataframe with the variables we requested in NEE.m and NEE.o format. The .o suffix is for observations, and the .m suffix is for model. The posix column allows for easy plotting along a time series.
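To illustrate the shape of an aligned data frame, the base R sketch below joins two synthetic series on a shared timestamp and applies the same .m and .o suffixes. It is a stand-in for the idea only, not the align_data implementation, and all values are invented.

```r
# Four half-hourly timestamps (synthetic).
times <- as.POSIXct("2004-07-01 00:00", tz = "UTC") + 1800 * (0:3)

# Synthetic model output and sparser synthetic observations.
model <- data.frame(posix = times, NEE = c(-1.0, -1.5, -2.0, -2.5))
obs   <- data.frame(posix = times[c(1, 2, 4)], NEE = c(-0.8, NA, -2.9))

# Keep only timestamps present in both series; suffix the shared column name.
aligned <- merge(model, obs, by = "posix", suffixes = c(".m", ".o"))
print(aligned)
```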

Fourth, plot model predictions vs. observations and compare this to a 1:1 line
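A minimal base R sketch of such a plot, using a synthetic aligned data frame with the NEE.m and NEE.o columns (the values are invented). Both axes share the same limits so that the 1:1 line runs corner to corner.

```r
# Synthetic stand-in for the aligned data frame.
aligned <- data.frame(NEE.m = c(-1.0, -1.5, -2.0, 0.5),
                      NEE.o = c(-0.8, -1.9, -2.9, 0.7))

# Common axis limits covering both series.
lims <- range(c(aligned$NEE.m, aligned$NEE.o), na.rm = TRUE)

plot(aligned$NEE.m, aligned$NEE.o,
     xlim = lims, ylim = lims,
     xlab = "Modeled NEE", ylab = "Observed NEE")
abline(0, 1, col = "red")  # 1:1 line: intercept 0, slope 1
```

Points above the red line are cases where the observation exceeds the model prediction, and points below it are the reverse.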

Fifth, calculate the Root Mean Square Error (RMSE) between the model and the data

na.rm makes sure we don’t include missing or screened values in either time series.
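RMSE is the square root of the mean squared difference between the two series. A minimal sketch in base R, with a small invented example showing how na.rm drops any pair in which either value is missing:

```r
# Root Mean Square Error between two equal-length numeric vectors.
# A pair is dropped when either value is NA, because the difference is then NA.
rmse <- function(model, obs) {
  sqrt(mean((model - obs)^2, na.rm = TRUE))
}

# Invented values: only the first two pairs are complete.
example_rmse <- rmse(c(1, 2, NA, 4), c(1, 4, 3, NA))
print(example_rmse)
```

With an aligned data frame, the call would take the form rmse(aligned$NEE.m, aligned$NEE.o), assuming those column names.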

Finally, plot time-series of both the model and data together
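One way to overlay the two series in base R is sketched below. The data are synthetic (one day of half-hourly values), and the column names posix, NEE.m, and NEE.o are assumed to match the aligned data frame.

```r
# One synthetic day of half-hourly values.
steps <- 0:47
aligned <- data.frame(
  posix = as.POSIXct("2004-07-01 00:00", tz = "UTC") + 1800 * steps,
  NEE.m = -3 * sin(pi * steps / 47)
)
aligned$NEE.o <- aligned$NEE.m + 0.5 * cos(steps)  # invented "observations"

# Shared y-axis limits so neither series is clipped.
ylims <- range(c(aligned$NEE.m, aligned$NEE.o), na.rm = TRUE)

plot(aligned$posix, aligned$NEE.o, type = "l", ylim = ylims,
     xlab = "Time", ylab = "NEE")
lines(aligned$posix, aligned$NEE.m, col = "red")
legend("topright", legend = c("Data", "Model"),
       col = c("black", "red"), lty = 1)
```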

Bonus: How would you compare aggregated data?

Try calculating RMSE against monthly NEE instead of half-hourly NEE. In this case, first average the observations up to monthly values. Then, use align_data to match the monthly timestep in model output.
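The averaging step can be sketched in base R with aggregate. The helper name monthly_mean and the column names posix and NEE are assumptions, and the data are synthetic, constructed so that each month's mean equals its month number.

```r
# Hypothetical helper: average a half-hourly series up to monthly means.
monthly_mean <- function(df, value = "NEE", time = "posix") {
  month <- format(df[[time]], "%Y-%m")  # e.g. "2004-01"
  out <- aggregate(df[[value]], by = list(month = month),
                   FUN = mean, na.rm = TRUE)
  names(out)[2] <- value
  out
}

# Three months of synthetic half-hourly values.
times <- seq(as.POSIXct("2004-01-01 00:00", tz = "UTC"),
             as.POSIXct("2004-03-31 23:30", tz = "UTC"),
             by = "30 min")
obs <- data.frame(posix = times, NEE = as.numeric(format(times, "%m")))

obs_monthly <- monthly_mean(obs)
print(obs_monthly)
```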

NOTE: align_data uses two separate alignment functions, match_timestep and mean_over_larger_timestep. match_timestep uses only the data that are present in both the model and the observations. This is helpful for sparse observations. mean_over_larger_timestep aggregates the values over the largest timestep present. If you were to look at averaged monthly data, you would use mean_over_larger_timestep.