Getting Started with betydata

Author

David LeBauer and Akash B V

Published

March 1, 2026

What you will learn
  • What data is available in betydata and how the 16 tables relate to each other
  • How to explore trait and yield observations using dplyr
  • Key concepts: traits, yields, QA/QC flags, and Plant Functional Types

What is betydata?

The betydata package provides offline access to public data from BETYdb, the Biofuel Ecophysiological Traits and Yields database. BETYdb is a centralized repository of plant trait measurements and crop yield data used in ecosystem modeling and agricultural research.

A trait is a measurable characteristic of a plant – for example, Specific Leaf Area (SLA, m2/kg), maximum carboxylation rate (Vcmax, umol/m2/s), or leaf nitrogen content (%). A yield is a measure of crop production per unit area (typically Mg/ha). Together, traits and yields form the foundation of ecosystem model parameterization and agricultural research.

Loading the Package

library(betydata)
library(dplyr)

Data Architecture

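The package contains 16 tables organized in three tiers: a primary table (traitsview), metadata tables, and relationship tables. The call below lists all of them: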

data(package = "betydata")$results[, c("Item", "Title")] |>
  as.data.frame() |>
  knitr::kable()

Table 1: All tables available in betydata

Item                    Title
citations               Literature citations from BETYdb
cultivars               Plant cultivars from BETYdb
cultivars_pfts          Cultivar-PFT mapping from BETYdb
entities                Entities from BETYdb
managements             Management practices from BETYdb
managements_treatments  Management-Treatment mapping from BETYdb
methods                 Measurement methods from BETYdb
pfts                    Plant Functional Types (PFTs) from BETYdb
pfts_priors             PFT-Prior mapping from BETYdb
pfts_species            PFT-Species mapping from BETYdb
priors                  Prior distributions from BETYdb
sites                   Research sites from BETYdb
species                 Species taxonomy from BETYdb
traitsview              Traits and Yields from BETYdb
treatments              Experimental treatments from BETYdb
variables               Variable definitions from BETYdb

Data Model

The tables follow a relational structure:

  • traitsview is the primary denormalized table (pre-joined for convenience)
  • Metadata tables (species, sites, variables, citations, etc.) provide reference data
  • Relationship tables (pfts_species, pfts_priors, etc.) are many-to-many junction tables

You can use traitsview for most analyses without joining anything. The metadata and relationship tables are available when you need additional detail or custom aggregations.
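
To make "denormalized" concrete, here is a minimal sketch using two tiny made-up tables. The table names (toy_obs, toy_species), the key column species_id, and all values are hypothetical and chosen only for illustration; they are not the package's actual schema. The point is only that joining observations to a lookup table produces the wide shape that traitsview already provides.

```r
library(dplyr)

# Hypothetical miniature tables; names, keys, and values are illustrative only.
toy_species <- tibble(
  id = c(1, 2),
  genus = c("Miscanthus", "Panicum")
)
toy_obs <- tibble(
  species_id = c(1, 1, 2),
  trait = c("SLA", "Ayield", "Ayield"),
  mean = c(20, 15, 10)
)

# Joining observations to the species lookup gives one wide table in which
# every row carries its own metadata, the same idea as a pre-joined view.
toy_view <- toy_obs |>
  left_join(toy_species, by = c("species_id" = "id"))

toy_view
```

Because traitsview ships in this pre-joined form, a join like this is only needed when you want a column that the view does not already carry.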

The Primary Table: traitsview

The traitsview table is a denormalized view combining traits and yields with associated metadata. Key analytical columns are placed first for convenient interactive use:

traitsview

Key Columns

Column          Description                                 Example Values
trait           Variable name                               SLA, Vcmax, Ayield
mean            Observed value                              22.5, 38.1
units           Measurement units                           m2/kg, umol/m2/s
scientificname  Full species name                           Miscanthus x giganteus
genus           Genus                                       Miscanthus, Panicum
sitename        Research site                               Energy Farm, Urbana IL
author          Citation author                             Heaton 2008
checked         QA/QC status (0 = unchecked, 1 = verified)  0, 1

Basic Exploration

traitsview |>
  count(trait, sort = TRUE) |>
  head(15) |>
  knitr::kable()

Table 2: Top 15 most common traits in betydata

trait                     n
Ayield                    8407
leafN                     3645
SLA                       3141
c2n_leaf                  2510
leafC                     2415
SLN                       1345
Vcmax                     1115
height                    895
soil_respiration_m2       874
leaf_respiration_rate_m2  850
Jmax                      849
LAI                       744
DBH                       733
quantum_efficiency        510
leafP                     505

Data Quality: The checked Column

Quality Control

The checked column indicates data verification status:

  • 1 = Verified by an independent reviewer
  • 0 = Not yet reviewed (use with appropriate caution)
  • -1 = Flagged as incorrect (excluded from this package)

All data in this package is public (BETYdb access_level = 4).

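# Tabulate QA/QC status across all records, including any missing values
table(traitsview$checked, useNA = "ifany")

    0     1
29383 14149

# Keep only records verified by an independent reviewer
verified <- traitsview |>
  filter(checked == 1)
nrow(verified)

[1] 14149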
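
Review coverage may differ from trait to trait, so it can help to look at the verified share per trait before deciding whether to filter on checked. The sketch below defines a hypothetical helper, verified_share(), and runs it on a few made-up rows that mimic the trait and checked columns; neither the helper nor the toy values come from the package.

```r
library(dplyr)

# Hypothetical rows that mimic the trait and checked columns of traitsview.
toy_traits <- tibble(
  trait = c("SLA", "SLA", "SLA", "Vcmax", "Vcmax"),
  checked = c(1, 0, 1, 0, 0)
)

# Hypothetical helper: records per trait and the share with checked == 1.
verified_share <- function(tv) {
  tv |>
    summarise(
      n = n(),
      n_verified = sum(checked == 1, na.rm = TRUE),
      share_verified = n_verified / n,
      .by = trait
    ) |>
    arrange(desc(n))
}

qc_summary <- verified_share(toy_traits)
qc_summary
```

With betydata loaded, verified_share(traitsview) should apply the same summary to the real records, since it relies only on the trait and checked columns described above.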

Support Tables

Species Taxonomy

The species table contains 70,741 entries with full taxonomic information:

species |>
  select(id, scientificname, genus, commonname)

Variables (Trait Definitions)

The variables table documents units, descriptions, and valid ranges for each measured trait:

variables |>
  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
  select(name, units, description)
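
Definitions can also be attached to a summary by joining on the variable name. The sketch below assumes that values in the trait column match the name column of variables, as the filter above suggests; the toy tables and their placeholder descriptions are made up so that the example runs on its own.

```r
library(dplyr)

# Hypothetical stand-ins for traitsview and variables; descriptions are placeholders.
toy_traits <- tibble(trait = c("SLA", "SLA", "Vcmax"))
toy_variables <- tibble(
  name = c("SLA", "Vcmax", "Jmax"),
  units = c("m2/kg", "umol/m2/s", "umol/m2/s"),
  description = c("placeholder for SLA", "placeholder for Vcmax", "placeholder for Jmax")
)

# Count records per trait, then look up units and description by name.
trait_lookup <- toy_traits |>
  count(trait) |>
  left_join(toy_variables, by = c("trait" = "name"))

trait_lookup
```

A left join keeps every trait in the summary even when no definition is found, so gaps show up as NA rather than as silently dropped rows.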

Sites

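The sites table holds site-level metadata, including climate fields. The example below counts the sites that report both mean annual temperature (mat) and mean annual precipitation (map):

# Keep sites where both climate fields are present
sites_with_climate <- sites |>
  filter(!is.na(mat), !is.na(map))
nrow(sites_with_climate)

[1] 453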

Example: Bioenergy Crop Yields

# Genera to compare
bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Salix", "Saccharum")

# Yield records (trait == "Ayield") for those genera, dropping missing values
yields <- traitsview |>
  filter(
    trait == "Ayield",
    genus %in% bioenergy_genera,
    !is.na(mean)
  ) |>
  select(genus, mean, units, sitename, author, citation_year, lat, lon)

# Record count, mean, and standard deviation of yield per genus
yields |>
  summarise(
    n = n(),
    mean_yield = round(mean(mean, na.rm = TRUE), 1),
    sd_yield = round(sd(mean, na.rm = TRUE), 1),
    .by = genus
  ) |>
  knitr::kable(col.names = c("Genus", "N", "Mean Yield (Mg/ha)", "SD"))

Table 3: Yield summary for key bioenergy genera

Genus       N     Mean Yield (Mg/ha)  SD
Panicum     2087  10.1                5.5
Salix       528   15.8                18.3
Miscanthus  1021  12.8                10.5
Populus     841   29.3                29.0
Saccharum   3578  34.9                22.1
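
In Table 3 the standard deviation is close to or larger than the mean for some genera (Salix, Populus), which suggests skewed distributions. A possible complement, sketched here on made-up values, is to confirm that only one unit is present before pooling and then report the median and interquartile range. The toy_yields table and its numbers are hypothetical.

```r
library(dplyr)

# Hypothetical yield records; the values are invented for illustration.
toy_yields <- tibble(
  genus = c("Panicum", "Panicum", "Salix", "Salix", "Salix"),
  mean = c(8, 12, 5, 10, 45),
  units = "Mg/ha"
)

# Confirm a single unit is present before pooling values across records.
unit_counts <- toy_yields |> count(units)
stopifnot(nrow(unit_counts) == 1)

# Median and interquartile range are less sensitive to extreme records.
robust_summary <- toy_yields |>
  summarise(
    n = n(),
    median_yield = median(mean),
    iqr_yield = IQR(mean),
    .by = genus
  )

robust_summary
```

The same two steps should carry over to the yields table built above, since they use only its genus, mean, and units columns.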

Working with Plant Functional Types (PFTs)

What is a PFT?

A Plant Functional Type groups species with similar ecological characteristics for ecosystem modeling. Instead of parameterizing models for each species individually, PFTs like “temperate deciduous trees” or “C4 grasses” define shared parameter distributions. This approach is essential when species-level data is sparse and makes modeling tractable at large scales.

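# IDs of all species in the genus Miscanthus
miscanthus_sp <- species |>
  filter(genus == "Miscanthus") |>
  pull(id)

# PFTs that include any of those species
# (the junction table's key column is spelled specie_id)
pfts_species |>
  filter(specie_id %in% miscanthus_sp) |>
  left_join(pfts |> select(id, name), by = c("pft_id" = "id")) |>
  distinct(name)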
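
The lookup can also be run in the other direction, counting how many species each PFT groups together. The sketch below uses the same column names as the query above (pft_id, specie_id, id, name) on made-up rows; the PFT names and IDs are hypothetical.

```r
library(dplyr)

# Hypothetical stand-ins for pfts and pfts_species.
toy_pfts <- tibble(id = c(10, 20), name = c("toy_pft_a", "toy_pft_b"))
toy_pfts_species <- tibble(
  pft_id = c(10, 10, 20),
  specie_id = c(1, 2, 2)  # species 2 belongs to both PFTs (many-to-many)
)

# Count species per PFT, then attach the PFT name.
species_per_pft <- toy_pfts_species |>
  count(pft_id, name = "n_species") |>
  left_join(toy_pfts, by = c("pft_id" = "id")) |>
  select(name, n_species)

species_per_pft
```

Because the relationship is many-to-many, the per-PFT counts can add up to more than the number of distinct species.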

Next Steps

Vignette                      Description
vignette("common_analyses")   Common analysis patterns with dplyr
vignette("pfts-priors")       Working with PFTs and Bayesian priors
vignette("manuscript")        Reproduce analyses from LeBauer et al. (2018)

References

  • LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. GCB Bioenergy. doi:10.1111/gcbb.12420
  • LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133–154.
  • BETYdb documentation: https://betydb.org