42 Simple Model-Data Comparisons
42.1 Starting RStudio Server
Open RStudio Server in a new window at URL/rstudio
The username is carya and the password is illinois.
To open a new R script, click File > New File > R Script.
Use the Files browser in the lower right pane to find where your run(s) are located.
All PEcAn outputs are stored in the output folder. Click on this to open it up.
Within the output folder, there will be one folder for each workflow execution. For example, click to open the folder PEcAn_99000000001 if that’s your workflow ID.
A workflow folder will have a few log and settings files (e.g. pecan.xml) and the following subfolders:
run contains all the inputs for each run
out contains all the outputs from each run
pft contains the parameter information for each PFT
Within both the run and out folders there will be one folder for each unique model run, where the folder name is the run ID. Click to open the out folder. For our simple case we only did one run so there should be only one folder (e.g. 99000000001). Click to open this folder.
Within this folder you will find, among other things, files named in the format YYYY.nc, where YYYY is the year. Each of these files contains one year of model output in the standard PEcAn netCDF format. This is the model output that we will compare to data.
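The same folder layout can also be explored from an R script instead of the Files browser. The sketch below is a minimal base R illustration, not part of PEcAn: `list_run_outputs` is a hypothetical helper, and the folder tree (including the file names `2004.nc`, `2005.nc`, and `logfile.txt`) is a throwaway copy built in a temporary directory so the example runs anywhere.

```r
# Hypothetical helper (not a PEcAn function): list the .nc files
# found under <workflow_dir>/out/<run ID>/ for every run.
list_run_outputs <- function(workflow_dir) {
  out_dir <- file.path(workflow_dir, "out")
  run_ids <- list.dirs(out_dir, full.names = FALSE, recursive = FALSE)
  files <- lapply(run_ids, function(id) {
    list.files(file.path(out_dir, id), pattern = "\\.nc$")
  })
  names(files) <- run_ids
  files
}

# Build a small fake workflow folder that mimics the layout described above.
demo_dir <- file.path(tempdir(), "PEcAn_99000000001")
run_dir <- file.path(demo_dir, "out", "99000000001")
dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(run_dir, c("2004.nc", "2005.nc", "logfile.txt"))))

demo_outputs <- list_run_outputs(demo_dir)
print(demo_outputs)
```

To use it on a real run, point `list_run_outputs` at your own workflow folder instead of `demo_dir`.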
42.2 Read in settings from an XML file
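A minimal sketch of this step, assuming the workflow folder sits under `~/output` and that the settings file is the `pecan.xml` mentioned above. Both the path and the workflow ID are placeholders to replace with your own. The call is guarded so the script still runs where the `PEcAn.settings` package or the file is not available.

```r
# Assumed location of the settings file; replace with your own workflow folder.
settings_file <- path.expand(
  file.path("~", "output", "PEcAn_99000000001", "pecan.xml")
)

settings <- NULL
if (requireNamespace("PEcAn.settings", quietly = TRUE) &&
    file.exists(settings_file)) {
  # read.settings parses the XML file into a list-like settings object.
  settings <- PEcAn.settings::read.settings(settings_file)
  print(names(settings))
} else {
  message("PEcAn.settings or the settings file is not available; skipping.")
}
```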
42.3 Read in model output from specific variables
The arguments to read.output are the run ID, the folder where the run is located, the start year, the end year, and the variables being read. The README file in the Input file dropdown menu of any successful run lists the run ID, the output folder, and the start and end year.
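The call described above might look like the following sketch. The five arguments are passed in the order given in the text. The run ID, folder, years, and variable name are placeholder values (assumptions), so take the real ones from your run's README file. The call is guarded so the script still runs where `PEcAn.utils` or the output folder is missing.

```r
# Placeholder values; take the real ones from the README of your run.
run_id     <- "99000000001"
out_dir    <- path.expand(
  file.path("~", "output", "PEcAn_99000000001", "out", run_id)
)
start_year <- 2004
end_year   <- 2004
model_vars <- c("NEE")

model <- NULL
if (requireNamespace("PEcAn.utils", quietly = TRUE) && dir.exists(out_dir)) {
  # Arguments: run ID, run folder, start year, end year, variables to read.
  model <- PEcAn.utils::read.output(run_id, out_dir, start_year, end_year,
                                    model_vars)
  str(model)
} else {
  message("PEcAn.utils or the output folder is not available; skipping.")
}
```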
42.4 Compare model to flux observations
First, load the observations and take a look at the contents of the file.
File_Path refers to where you stored your observational data. In this example the default file path is an Ameriflux dataset from Niwot Ridge.
File_format queries the database for the format your file is in. The default format ID “5000000002” is for csv files downloaded from the Ameriflux website. You could query for different kinds of formats that exist in BETY or make your own.
Here 772 is the database site ID for Niwot Ridge Forest, which tells PEcAn where the data is from and what time zone to assign to any time data read in.
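Taking a look at the contents of a flux file can be sketched in base R without a database connection. The example below is an illustration only, not the PEcAn loader: it writes a tiny synthetic csv file, and the column names (`TIMESTAMP_START`, `NEE`, `UST`) and the `-9999` missing-value code are assumptions chosen to resemble a typical flux file.

```r
# Write a tiny synthetic csv so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "TIMESTAMP_START,NEE,UST",
  "200401010000,1.2,0.35",
  "200401010030,-9999,0.10",
  "200401010100,0.8,0.42"
), csv_path)

# Treat -9999 as missing and keep the timestamp as text.
observations <- read.csv(csv_path, na.strings = "-9999",
                         colClasses = c(TIMESTAMP_START = "character"))

# Inspect structure, first rows, and the range of the flux variable.
str(observations)
head(observations)
summary(observations$NEE)
```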
Second, apply a conservative u* filter to the observations
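A u* filter screens out flux measurements taken under low turbulence. The sketch below shows the idea on a synthetic data frame. The column names `NEE` and `UST` and the threshold of 0.2 are assumptions for illustration, so adjust them to your data and site. Records with a missing u* are screened as well, which is a conservative choice made here rather than something the text prescribes.

```r
# Synthetic observations: flux (NEE) and friction velocity (UST).
observations <- data.frame(
  NEE = c(1.2, -3.4, 0.8, -2.1, 0.5),
  UST = c(0.35, 0.10, 0.42, NA, 0.19)
)

ustar_threshold <- 0.2  # assumed threshold

# Flag records with low or unknown turbulence, then blank out their fluxes.
low_turbulence <- is.na(observations$UST) | observations$UST < ustar_threshold
observations$NEE[low_turbulence] <- NA

print(observations)
```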
Third, align model output and observations
When we aligned the data, we got a data frame that holds each requested variable in two columns, \(NEE.m\) and \(NEE.o\). The \(.m\) suffix marks model output and the \(.o\) suffix marks observations. The posix column holds the timestamps, which makes it easy to plot a time series.
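The shape of the aligned data frame can be illustrated with a base R `merge` on the timestamp column. This is a simplified stand-in on synthetic data, not the implementation of align_data: it only keeps timestamps present in both inputs and adds the `.m` and `.o` suffixes.

```r
# Synthetic half-hourly model output and sparser observations.
t0 <- as.POSIXct("2004-07-01 00:00:00", tz = "UTC")
model <- data.frame(
  posix = t0 + 1800 * 0:5,
  NEE   = c(-1.0, -1.5, -2.0, -2.5, -2.0, -1.5)
)
observations <- data.frame(
  posix = t0 + 1800 * c(1, 2, 4, 7),
  NEE   = c(-1.2, NA, -2.4, -0.9)
)

# Keep only timestamps found in both tables; suffix the shared NEE column.
aligned <- merge(model, observations, by = "posix", suffixes = c(".m", ".o"))
print(aligned)
```

Rows where the observation is missing are kept here, so later statistics still need `na.rm = TRUE`.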
Fourth, plot model predictions vs. observations and compare this to a 1:1 line
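A minimal base R sketch of this plot, using synthetic values in place of real aligned data. The column names `NEE.m` and `NEE.o` follow the aligned data frame described above, and the numbers themselves are made up. Using the same limits on both axes keeps the 1:1 line on the diagonal.

```r
# Synthetic aligned data: observations plus an imperfect model.
set.seed(1)
aligned <- data.frame(NEE.o = rnorm(50, mean = -2, sd = 3))
aligned$NEE.m <- 0.8 * aligned$NEE.o + rnorm(50, mean = 0, sd = 1)

# Shared axis limits so the 1:1 line runs corner to corner.
lims <- range(c(aligned$NEE.m, aligned$NEE.o), na.rm = TRUE)

plot(aligned$NEE.m, aligned$NEE.o,
     xlim = lims, ylim = lims,
     xlab = "Modeled NEE", ylab = "Observed NEE")
abline(a = 0, b = 1, col = "red")  # 1:1 line: intercept 0, slope 1
```

Points above the line are cases where the observation exceeds the model, and points below it are the reverse.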
Fifth, calculate the Root Mean Square Error (RMSE) between the model and the data
Setting na.rm = TRUE makes sure we don’t include missing or screened values from either time series.
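RMSE is the square root of the mean squared difference between model and data. The sketch below wraps that in a small helper (`rmse` is a name chosen here, not a PEcAn function) and applies it to made-up values, one of which is missing.

```r
# Root mean square error, skipping pairs where either value is missing.
rmse <- function(model, obs) {
  sqrt(mean((model - obs)^2, na.rm = TRUE))
}

# Made-up values; the third observation has been screened out.
NEE.m <- c(1, 2, 3, 4)
NEE.o <- c(1.5, 2.5, NA, 3)

nee_rmse <- rmse(NEE.m, NEE.o)
print(nee_rmse)
```

Because the difference is `NA` wherever either series is `NA`, `na.rm = TRUE` drops the whole pair, not just one side of it.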
Finally, plot time-series of both the model and data together
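One way to draw both series on a shared time axis in base R is sketched below. The data are synthetic, and the column names `posix`, `NEE.m`, and `NEE.o` follow the aligned data frame described above. Observations are drawn as points so that gaps from screening stay visible, and the model is drawn as a line.

```r
# Synthetic day of half-hourly values with two screened observations.
t0 <- as.POSIXct("2004-07-01 00:00:00", tz = "UTC")
aligned <- data.frame(posix = t0 + 1800 * 0:47)
hours <- as.numeric(difftime(aligned$posix, t0, units = "hours"))
aligned$NEE.m <- -3 * sin(pi * hours / 24)
aligned$NEE.o <- aligned$NEE.m + 0.5 * cos(hours)
aligned$NEE.o[c(5, 20)] <- NA

# Common y-range so neither series is clipped.
y_lim <- range(c(aligned$NEE.m, aligned$NEE.o), na.rm = TRUE)

plot(aligned$posix, aligned$NEE.o, type = "p", pch = 16, col = "grey40",
     ylim = y_lim, xlab = "Time", ylab = "NEE")
lines(aligned$posix, aligned$NEE.m, col = "red", lwd = 2)
legend("topright", legend = c("Observed", "Modeled"),
       col = c("grey40", "red"), pch = c(16, NA), lty = c(NA, 1))
```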
Bonus: How would you compare aggregated data?
Try computing RMSE on monthly NEE instead of half-hourly NEE. To do this, first average the observations up to monthly values. Then use align_data to match the model output to the monthly timestep.
NOTE: align_data uses two separate alignment functions, match_timestep and mean_over_larger_timestep. match_timestep uses only the data that is present in both the model and the observations. This is helpful for sparse observations. mean_over_larger_timestep aggregates the values over the largest timestep present. If you were to look at averaged monthly data, you would use mean_over_larger_timestep.
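The monthly comparison can be sketched in base R as follows. This is an illustration on four made-up half-hourly values that straddle a month boundary, not the align_data implementation: both series are averaged by calendar month with `aggregate`, merged on the month label, and then compared with RMSE.

```r
# Four half-hourly timestamps: two in June, two in July.
posix <- seq(as.POSIXct("2004-06-30 23:00:00", tz = "UTC"),
             by = "30 min", length.out = 4)
month <- format(posix, "%Y-%m", tz = "UTC")

observations <- data.frame(month = month, NEE = c(-1, -3, 2, NA))
model        <- data.frame(month = month, NEE = c(-2, -3, 1, 3))

# Monthly means; the formula interface drops rows with missing NEE.
obs_monthly   <- aggregate(NEE ~ month, data = observations, FUN = mean)
model_monthly <- aggregate(NEE ~ month, data = model, FUN = mean)

# Line the two monthly series up by month label.
monthly <- merge(model_monthly, obs_monthly, by = "month",
                 suffixes = c(".m", ".o"))

rmse_monthly <- sqrt(mean((monthly$NEE.m - monthly$NEE.o)^2, na.rm = TRUE))
print(monthly)
print(rmse_monthly)
```

Note that the July model mean still includes the timestep whose observation is missing. Averaging each series over the larger timestep, rather than matching individual timesteps, makes that trade-off.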