Working with Plant Functional Types and Priors

Author

David LeBauer and Akash B V

Published

March 1, 2026

What you will learn

How Plant Functional Types (PFTs) organize species for ecosystem modeling
How prior distributions encode parameter knowledge before Bayesian inference
How to visualize priors and compare them to observed trait data

Background

Plant Functional Types

Ecosystem models simulate how plants exchange carbon, water, and energy with the atmosphere. These models require parameter values for photosynthesis rates, leaf properties, allocation fractions, and other ecophysiological quantities. Rather than parameterizing every species individually, models group species into Plant Functional Types (PFTs) – categories like “temperate deciduous trees” or “C4 grasses” – and estimate shared parameters for each group.

Bayesian Parameterization

Bayesian inference is used to estimate model parameters. The workflow is:

Prior distribution \(p(\theta)\): Encodes existing knowledge about a parameter \(\theta\) before looking at data (e.g. “SLA is around 20 m2/kg with considerable uncertainty”)
Likelihood \(p(D|\theta)\): How probable the observed trait data \(D\) is given parameter values
Posterior distribution \(p(\theta|D)\): Updated belief after combining prior and data via Bayes’ theorem:

\[ p(\theta | D) \propto p(D | \theta) \cdot p(\theta) \tag{1}\]

The tables in betydata provide the ingredients for this workflow: priors stores prior distributions, traitsview stores the observed data, and pfts/pfts_species/pfts_priors define which priors and species belong to each PFT.

Setup

library(betydata)
library(dplyr)
library(ggplot2)

Understanding PFTs

Available PFTs

nrow(pfts)

[1] 386

pfts |>
  select(id, name) |>
  head(20)

Species Membership

Each PFT contains species assigned by researchers. The pfts_species junction table maps PFTs to species:

species_per_pft <- pfts_species |>
  count(pft_id, name = "n_species")

summary(species_per_pft$n_species)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       2    1285      19   49626

miscanthus_pft <- pfts |>
  filter(grepl("miscanthus", name, ignore.case = TRUE))

if (nrow(miscanthus_pft) > 0) {
  miscanthus_species <- pfts_species |>
    filter(pft_id %in% miscanthus_pft$id) |>
    left_join(
      species |> select(id, scientificname),
      by = c("specie_id" = "id")
    )
  miscanthus_species
}

Working with Priors

Prior Distribution Structure

Each row in the priors table specifies a probability distribution for a model parameter. The key fields are:

Field	Description
`variable_id`	Which trait/parameter this prior describes
`distn`	Distribution family (e.g. `norm`, `gamma`, `lnorm`)
`parama`	First parameter (mean, shape, etc.)
`paramb`	Second parameter (sd, rate, etc.)

priors |>
  count(distn, sort = TRUE) |>
  knitr::kable()

Table 1: Distribution families used in BETYdb priors
distn	n
unif	325
norm	239
beta	99
lnorm	94
weibull	59
gamma	55
binom	7
exp	6
pois	5

priors |>
  left_join(variables |> select(id, name), by = c("variable_id" = "id")) |>
  count(name, sort = TRUE) |>
  head(15) |>
  knitr::kable()

Table 2: Top variables with prior distributions
name	n
AGEMX	21
DMAX	21
DMIN	21
Gmax	21
SPRTND	19
SPRTMX	17
FROST	15
SLTA	13
SLTB	13
MPLANT	11
SLA	11
stomatal_slope	11
HTMAX	10
Vcmax	10
c2n_leaf	10

Visualizing a Prior Distribution

Consider a prior for Specific Leaf Area (SLA). If the prior is \(\text{Gamma}(\alpha, \beta)\), then the density function is:

\[ f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0 \tag{2}\]

sla_var_id <- variables |>
  filter(name == "SLA") |>
  pull(id)

sla_priors <- priors |>
  filter(variable_id %in% sla_var_id)

eval_prior_density <- function(p, x = NULL) {
  if (is.null(x)) {
    x <- switch(p$distn,
      "gamma" = seq(0, qgamma(0.999, p$parama, p$paramb), length.out = 1000),
      "norm" = seq(p$parama - 4 * p$paramb, p$parama + 4 * p$paramb, length.out = 1000),
      "lnorm" = seq(0.001, qlnorm(0.999, p$parama, p$paramb), length.out = 1000),
      "beta" = seq(0.001, 0.999, length.out = 1000),
      "unif" = seq(p$parama, p$paramb, length.out = 1000),
      "weibull" = seq(0, qweibull(0.999, p$parama, p$paramb), length.out = 1000),
      "exp" = seq(0, qexp(0.999, p$parama), length.out = 1000),
      seq(0, 100, length.out = 1000)
    )
  }

  y <- switch(p$distn,
    "gamma" = dgamma(x, shape = p$parama, rate = p$paramb),
    "norm" = dnorm(x, mean = p$parama, sd = p$paramb),
    "lnorm" = dlnorm(x, meanlog = p$parama, sdlog = p$paramb),
    "beta" = dbeta(x, shape1 = p$parama, shape2 = p$paramb),
    "unif" = dunif(x, min = p$parama, max = p$paramb),
    "weibull" = dweibull(x, shape = p$parama, scale = p$paramb),
    "exp" = dexp(x, rate = p$parama),
    rep(0, length(x))
  )

  list(x = x, y = y)
}

if (nrow(sla_priors) > 0) {
  p <- sla_priors[1, ]
  prior_result <- eval_prior_density(p)
  prior_x <- prior_result$x
  prior_y <- prior_result$y

  ggplot(data.frame(x = prior_x, y = prior_y), aes(x, y)) +
    geom_line(linewidth = 1, color = "#2c3e50") +
    labs(
      title = paste0("Prior: ", p$distn, "(", round(p$parama, 2), ", ", round(p$paramb, 2), ")"),
      x = "SLA (m2/kg)",
      y = "Density"
    ) +
    theme_minimal(base_size = 12)
}

Figure 1: Prior distribution for Specific Leaf Area (SLA)

Linking PFTs to Priors

The pfts_priors table connects PFTs to their associated prior distributions, completing the Bayesian parameterization setup:

\[ \text{PFT} \xrightarrow{\texttt{pfts\_priors}} \text{Prior}(\theta) \xrightarrow{\texttt{variable\_id}} \text{Trait} \]

priors_per_pft <- pfts_priors |>
  count(pft_id, name = "n_priors")

summary(priors_per_pft$n_priors)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       5      14      14      20      43

if (exists("miscanthus_pft") && nrow(miscanthus_pft) > 0) {
  pft_id_val <- miscanthus_pft$id[1]

  pft_priors <- pfts_priors |>
    filter(pft_id == pft_id_val) |>
    left_join(priors, by = c("prior_id" = "id")) |>
    left_join(variables |> select(id, name, units), by = c("variable_id" = "id"))

  pft_priors |>
    select(name, units, distn, parama, paramb) |>
    knitr::kable(digits = 3)
}

Table 3: Prior distributions for a Miscanthus PFT
name	units	distn	parama	paramb
leaf_respiration_rate_m2	umol [CO2] m-2 s-1	lnorm	0.632	0.65
cuticular_cond	umol [H2O] m-2 s-1	lnorm	8.400	0.90
stomatal_slope.BB	[ratio]	lnorm	1.240	0.28
SLA	m2 kg-1	weibull	2.060	19.00
Vcmax	umol [CO2] m-2 s-1	lnorm	3.750	0.30
growth_respiration_coefficient	mol [CO2] mol-1 [net assimilation]	beta	26.000	48.00
Jmax	umol [photons] m-2 s-1	weibull	2.000	150.00
c2n_leaf	[ratio]	gamma	4.180	0.13
quantum_efficiency	fraction	gamma	90.900	1580.00
chi_leaf	[ratio]	lnorm	0.180	0.43
extinction_coefficient_diffuse	NA	lnorm	-2.300	0.55

Comparing Prior to Data

A key validation step in Bayesian parameterization: compare the prior distribution to observed data. If the prior assigns negligible probability to regions where data actually falls, the prior may need revision.

Prior-Data Conflict

When the prior and data disagree strongly, the posterior will be dominated by whichever has lower variance. A very tight prior that disagrees with data can effectively ignore the data – a clear signal to revisit prior specification.

sla_data <- traitsview |>
  filter(
    trait == "SLA",
    genus %in% c("Miscanthus", "Panicum"),
    !is.na(mean),
    checked >= 0
  ) |>
  pull(mean)

if (length(sla_data) > 10 && exists("prior_x") && exists("prior_y")) {
  ggplot() +
    geom_histogram(
      data = data.frame(sla = sla_data),
      aes(x = sla, y = after_stat(density)),
      bins = 30, fill = "steelblue", alpha = 0.6
    ) +
    geom_line(
      data = data.frame(x = prior_x, y = prior_y),
      aes(x, y),
      color = "#e74c3c", linewidth = 1, linetype = "dashed"
    ) +
    labs(
      title = "Prior vs. Observed SLA",
      subtitle = "Dashed red = prior distribution | Blue bars = observed data (Miscanthus + Panicum)",
      x = "SLA (m2/kg)",
      y = "Density"
    ) +
    xlim(0, 80) +
    theme_minimal(base_size = 12)
}

Figure 2: SLA: Prior distribution vs. observed data for bioenergy grasses

Extracting Data for a PFT

A common workflow: retrieve all trait data for species belonging to a given PFT.

get_pft_data <- function(pft_name, trait_name = NULL) {
  pft_ids <- pfts |>
    filter(grepl(pft_name, name, ignore.case = TRUE)) |>
    pull(id)

  if (length(pft_ids) == 0) {
    warning("PFT not found: ", pft_name)
    return(tibble())
  }

  spp_ids <- pfts_species |>
    filter(pft_id %in% pft_ids) |>
    pull(specie_id)

  result <- traitsview |>
    filter(species_id %in% spp_ids)

  if (!is.null(trait_name)) {
    result <- result |>
      filter(trait == trait_name)
  }

  result
}

willow_sla <- get_pft_data("salix", "SLA")
if (nrow(willow_sla) > 0) {
  summary(willow_sla$mean)
}

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7.14   10.40   11.54   12.43   13.32   26.40

Key Concepts Summary

Concept	Table(s)	Description
PFT definition	`pfts`	Named groupings of ecologically similar species
Species membership	`pfts_species`	Which species belong to which PFT
Prior specification	`priors`	Distribution family + parameters for each trait
PFT-Prior link	`pfts_priors`	Which priors apply to which PFT
Observed data	`traitsview`	Actual measurements for Bayesian updating

References

LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database. GCB Bioenergy. doi:10.1111/gcbb.12420
LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133–154.