Working with Plant Functional Types and Priors

Author

David LeBauer and Akash B V

Published

March 1, 2026

What you will learn
  • How Plant Functional Types (PFTs) organize species for ecosystem modeling
  • How prior distributions encode parameter knowledge before Bayesian inference
  • How to visualize priors and compare them to observed trait data

Background

Plant Functional Types

Ecosystem models simulate how plants exchange carbon, water, and energy with the atmosphere. These models require parameter values for photosynthesis rates, leaf properties, allocation fractions, and other ecophysiological quantities. Rather than parameterizing every species individually, models group species into Plant Functional Types (PFTs) – categories like “temperate deciduous trees” or “C4 grasses” – and estimate shared parameters for each group.

Bayesian Parameterization

Bayesian inference is used to estimate model parameters. The workflow is:

  1. Prior distribution \(p(\theta)\): Encodes existing knowledge about a parameter \(\theta\) before looking at data (e.g. “SLA is around 20 m2/kg with considerable uncertainty”)
  2. Likelihood \(p(D|\theta)\): How probable the observed trait data \(D\) is given parameter values
  3. Posterior distribution \(p(\theta|D)\): Updated belief after combining prior and data via Bayes’ theorem:

\[ p(\theta | D) \propto p(D | \theta) \cdot p(\theta) \tag{1}\]

The tables in betydata provide the ingredients for this workflow: priors stores prior distributions, traitsview stores the observed data, and pfts/pfts_species/pfts_priors define which priors and species belong to each PFT.

Setup

library(betydata)
library(dplyr)
library(ggplot2)

Understanding PFTs

Available PFTs

nrow(pfts)
[1] 386
pfts |>
  select(id, name) |>
  head(20)

Species Membership

Each PFT contains species assigned by researchers. The pfts_species junction table maps PFTs to species:

species_per_pft <- pfts_species |>
  count(pft_id, name = "n_species")

summary(species_per_pft$n_species)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       2    1285      19   49626 
miscanthus_pft <- pfts |>
  filter(grepl("miscanthus", name, ignore.case = TRUE))

if (nrow(miscanthus_pft) > 0) {
  miscanthus_species <- pfts_species |>
    filter(pft_id %in% miscanthus_pft$id) |>
    left_join(
      species |> select(id, scientificname),
      by = c("specie_id" = "id")
    )
  miscanthus_species
}

Working with Priors

Prior Distribution Structure

Each row in the priors table specifies a probability distribution for a model parameter. The key fields are:

Field Description
variable_id Which trait/parameter this prior describes
distn Distribution family (e.g. norm, gamma, lnorm)
parama First parameter (mean, shape, etc.)
paramb Second parameter (sd, rate, etc.)
priors |>
  count(distn, sort = TRUE) |>
  knitr::kable()
Table 1: Distribution families used in BETYdb priors
distn n
unif 325
norm 239
beta 99
lnorm 94
weibull 59
gamma 55
binom 7
exp 6
pois 5
priors |>
  left_join(variables |> select(id, name), by = c("variable_id" = "id")) |>
  count(name, sort = TRUE) |>
  head(15) |>
  knitr::kable()
Table 2: Top variables with prior distributions
name n
AGEMX 21
DMAX 21
DMIN 21
Gmax 21
SPRTND 19
SPRTMX 17
FROST 15
SLTA 13
SLTB 13
MPLANT 11
SLA 11
stomatal_slope 11
HTMAX 10
Vcmax 10
c2n_leaf 10

Visualizing a Prior Distribution

Consider a prior for Specific Leaf Area (SLA). If the prior is \(\text{Gamma}(\alpha, \beta)\), then the density function is:

\[ f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0 \tag{2}\]

sla_var_id <- variables |>
  filter(name == "SLA") |>
  pull(id)

sla_priors <- priors |>
  filter(variable_id %in% sla_var_id)

eval_prior_density <- function(p, x = NULL) {
  if (is.null(x)) {
    x <- switch(p$distn,
      "gamma" = seq(0, qgamma(0.999, p$parama, p$paramb), length.out = 1000),
      "norm" = seq(p$parama - 4 * p$paramb, p$parama + 4 * p$paramb, length.out = 1000),
      "lnorm" = seq(0.001, qlnorm(0.999, p$parama, p$paramb), length.out = 1000),
      "beta" = seq(0.001, 0.999, length.out = 1000),
      "unif" = seq(p$parama, p$paramb, length.out = 1000),
      "weibull" = seq(0, qweibull(0.999, p$parama, p$paramb), length.out = 1000),
      "exp" = seq(0, qexp(0.999, p$parama), length.out = 1000),
      seq(0, 100, length.out = 1000)
    )
  }

  y <- switch(p$distn,
    "gamma" = dgamma(x, shape = p$parama, rate = p$paramb),
    "norm" = dnorm(x, mean = p$parama, sd = p$paramb),
    "lnorm" = dlnorm(x, meanlog = p$parama, sdlog = p$paramb),
    "beta" = dbeta(x, shape1 = p$parama, shape2 = p$paramb),
    "unif" = dunif(x, min = p$parama, max = p$paramb),
    "weibull" = dweibull(x, shape = p$parama, scale = p$paramb),
    "exp" = dexp(x, rate = p$parama),
    rep(0, length(x))
  )

  list(x = x, y = y)
}

if (nrow(sla_priors) > 0) {
  p <- sla_priors[1, ]
  prior_result <- eval_prior_density(p)
  prior_x <- prior_result$x
  prior_y <- prior_result$y

  ggplot(data.frame(x = prior_x, y = prior_y), aes(x, y)) +
    geom_line(linewidth = 1, color = "#2c3e50") +
    labs(
      title = paste0("Prior: ", p$distn, "(", round(p$parama, 2), ", ", round(p$paramb, 2), ")"),
      x = "SLA (m2/kg)",
      y = "Density"
    ) +
    theme_minimal(base_size = 12)
}

Figure 1: Prior distribution for Specific Leaf Area (SLA)

Linking PFTs to Priors

The pfts_priors table connects PFTs to their associated prior distributions, completing the Bayesian parameterization setup:

\[ \text{PFT} \xrightarrow{\texttt{pfts\_priors}} \text{Prior}(\theta) \xrightarrow{\texttt{variable\_id}} \text{Trait} \]

priors_per_pft <- pfts_priors |>
  count(pft_id, name = "n_priors")

summary(priors_per_pft$n_priors)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       5      14      14      20      43 
if (exists("miscanthus_pft") && nrow(miscanthus_pft) > 0) {
  pft_id_val <- miscanthus_pft$id[1]

  pft_priors <- pfts_priors |>
    filter(pft_id == pft_id_val) |>
    left_join(priors, by = c("prior_id" = "id")) |>
    left_join(variables |> select(id, name, units), by = c("variable_id" = "id"))

  pft_priors |>
    select(name, units, distn, parama, paramb) |>
    knitr::kable(digits = 3)
}
Table 3: Prior distributions for a Miscanthus PFT
name units distn parama paramb
leaf_respiration_rate_m2 umol [CO2] m-2 s-1 lnorm 0.632 0.65
cuticular_cond umol [H2O] m-2 s-1 lnorm 8.400 0.90
stomatal_slope.BB [ratio] lnorm 1.240 0.28
SLA m2 kg-1 weibull 2.060 19.00
Vcmax umol [CO2] m-2 s-1 lnorm 3.750 0.30
growth_respiration_coefficient mol [CO2] mol-1 [net assimilation] beta 26.000 48.00
Jmax umol [photons] m-2 s-1 weibull 2.000 150.00
c2n_leaf [ratio] gamma 4.180 0.13
quantum_efficiency fraction gamma 90.900 1580.00
chi_leaf [ratio] lnorm 0.180 0.43
extinction_coefficient_diffuse NA lnorm -2.300 0.55

Comparing Prior to Data

A key validation step in Bayesian parameterization: compare the prior distribution to observed data. If the prior assigns negligible probability to regions where data actually falls, the prior may need revision.

Prior-Data Conflict

When the prior and data disagree strongly, the posterior will be dominated by whichever has lower variance. A very tight prior that disagrees with data can effectively ignore the data – a clear signal to revisit prior specification.

sla_data <- traitsview |>
  filter(
    trait == "SLA",
    genus %in% c("Miscanthus", "Panicum"),
    !is.na(mean),
    checked >= 0
  ) |>
  pull(mean)

if (length(sla_data) > 10 && exists("prior_x") && exists("prior_y")) {
  ggplot() +
    geom_histogram(
      data = data.frame(sla = sla_data),
      aes(x = sla, y = after_stat(density)),
      bins = 30, fill = "steelblue", alpha = 0.6
    ) +
    geom_line(
      data = data.frame(x = prior_x, y = prior_y),
      aes(x, y),
      color = "#e74c3c", linewidth = 1, linetype = "dashed"
    ) +
    labs(
      title = "Prior vs. Observed SLA",
      subtitle = "Dashed red = prior distribution | Blue bars = observed data (Miscanthus + Panicum)",
      x = "SLA (m2/kg)",
      y = "Density"
    ) +
    xlim(0, 80) +
    theme_minimal(base_size = 12)
}

Figure 2: SLA: Prior distribution vs. observed data for bioenergy grasses

Extracting Data for a PFT

A common workflow: retrieve all trait data for species belonging to a given PFT.

get_pft_data <- function(pft_name, trait_name = NULL) {
  pft_ids <- pfts |>
    filter(grepl(pft_name, name, ignore.case = TRUE)) |>
    pull(id)

  if (length(pft_ids) == 0) {
    warning("PFT not found: ", pft_name)
    return(tibble())
  }

  spp_ids <- pfts_species |>
    filter(pft_id %in% pft_ids) |>
    pull(specie_id)

  result <- traitsview |>
    filter(species_id %in% spp_ids)

  if (!is.null(trait_name)) {
    result <- result |>
      filter(trait == trait_name)
  }

  result
}

willow_sla <- get_pft_data("salix", "SLA")
if (nrow(willow_sla) > 0) {
  summary(willow_sla$mean)
}
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7.14   10.40   11.54   12.43   13.32   26.40 

Key Concepts Summary

Concept Table(s) Description
PFT definition pfts Named groupings of ecologically similar species
Species membership pfts_species Which species belong to which PFT
Prior specification priors Distribution family + parameters for each trait
PFT-Prior link pfts_priors Which priors apply to which PFT
Observed data traitsview Actual measurements for Bayesian updating

References

  • LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database. GCB Bioenergy. doi:10.1111/gcbb.12420
  • LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133–154.