Getting Started with betydata
- What data is available in betydata and how the 16 tables relate to each other
- How to explore trait and yield observations using dplyr
- Key concepts: traits, yields, QA/QC flags, and Plant Functional Types
What is betydata?
The betydata package provides offline access to public data from BETYdb, the Biofuel Ecophysiological Traits and Yields database. BETYdb is a centralized repository of plant trait measurements and crop yield data used in ecosystem modeling and agricultural research.
A trait is a measurable characteristic of a plant – for example, Specific Leaf Area (SLA, m2/kg), maximum carboxylation rate (Vcmax, umol/m2/s), or leaf nitrogen content (%). A yield is a measure of crop production per unit area (typically Mg/ha). Together, traits and yields form the foundation of ecosystem model parameterization and agricultural research.
Loading the Package
library(betydata)
library(dplyr)
Data Architecture
The package contains 16 tables organized in three tiers:
data(package = "betydata")$results[, c("Item", "Title")] |>
  as.data.frame() |>
  knitr::kable()

| Item | Title |
|---|---|
| citations | Literature citations from BETYdb |
| cultivars | Plant cultivars from BETYdb |
| cultivars_pfts | Cultivar-PFT mapping from BETYdb |
| entities | Entities from BETYdb |
| managements | Management practices from BETYdb |
| managements_treatments | Management-Treatment mapping from BETYdb |
| methods | Measurement methods from BETYdb |
| pfts | Plant Functional Types (PFTs) from BETYdb |
| pfts_priors | PFT-Prior mapping from BETYdb |
| pfts_species | PFT-Species mapping from BETYdb |
| priors | Prior distributions from BETYdb |
| sites | Research sites from BETYdb |
| species | Species taxonomy from BETYdb |
| traitsview | Traits and Yields from BETYdb |
| treatments | Experimental treatments from BETYdb |
| variables | Variable definitions from BETYdb |
The tables follow a relational structure:
- traitsview is the primary denormalized table (pre-joined for convenience)
- Metadata tables (species, sites, variables, citations, etc.) provide reference data
- Relationship tables (pfts_species, pfts_priors, etc.) are many-to-many junction tables
You can use traitsview for most analyses without joining anything. The metadata and relationship tables are available when you need additional detail or custom aggregations.
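To make the many-to-many idea concrete, here is a minimal sketch using small made-up stand-ins for pfts, species, and pfts_species. The `_toy` objects and all of their values are invented for illustration; only the column names (id, name, scientificname, pft_id, specie_id) follow the tables used elsewhere in this vignette.

```r
library(dplyr)

# Toy stand-ins for the package tables (values invented for illustration).
pfts_toy <- data.frame(
  id   = c(1, 2),
  name = c("C4 grasses", "perennial grasses")
)
species_toy <- data.frame(
  id             = c(10, 11, 12),
  scientificname = c("Miscanthus x giganteus", "Panicum virgatum",
                     "Populus tremuloides")
)

# Junction table: one row per (PFT, species) pairing.
# A species may appear under several PFTs, and a PFT holds several species.
pfts_species_toy <- data.frame(
  pft_id    = c(1, 1, 2, 2),
  specie_id = c(10, 11, 10, 11)
)

# Resolve both foreign keys to readable names.
pft_members <- pfts_species_toy |>
  inner_join(pfts_toy, by = c("pft_id" = "id")) |>
  inner_join(species_toy, by = c("specie_id" = "id")) |>
  select(pft = name, scientificname)

# Count how many species each PFT contains.
species_per_pft <- pft_members |>
  count(pft, name = "n_species")
```

Because the joins are inner joins, a species with no row in the junction table (Populus tremuloides in this toy data) does not appear in the result. The same pattern should carry over to the real tables, though their contents will differ.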
The Primary Table: traitsview
The traitsview table is a denormalized view combining traits and yields with associated metadata. Key analytical columns are placed first for convenient interactive use:
traitsview

Key Columns

| Column | Description | Example Values |
|---|---|---|
| trait | Variable name | SLA, Vcmax, Ayield |
| mean | Observed value | 22.5, 38.1 |
| units | Measurement units | m2/kg, umol/m2/s |
| scientificname | Full species name | Miscanthus x giganteus |
| genus | Genus | Miscanthus, Panicum |
| sitename | Research site | Energy Farm, Urbana IL |
| author | Citation author | Heaton 2008 |
| checked | QA/QC status (0 = unchecked, 1 = verified) | 0, 1 |
Basic Exploration
traitsview |>
  count(trait, sort = TRUE) |>
  head(15) |>
  knitr::kable()

| trait | n |
|---|---|
| Ayield | 8407 |
| leafN | 3645 |
| SLA | 3141 |
| c2n_leaf | 2510 |
| leafC | 2415 |
| SLN | 1345 |
| Vcmax | 1115 |
| height | 895 |
| soil_respiration_m2 | 874 |
| leaf_respiration_rate_m2 | 850 |
| Jmax | 849 |
| LAI | 744 |
| DBH | 733 |
| quantum_efficiency | 510 |
| leafP | 505 |
Data Quality: The checked Column
The checked column indicates data verification status:
- 1 = Verified by an independent reviewer
- 0 = Not yet reviewed (use with appropriate caution)
- -1 = Flagged as incorrect (excluded from this package)
All data in this package is public (BETYdb access_level = 4).
table(traitsview$checked, useNA = "ifany")

    0     1
29383 14149

verified <- traitsview |>
  filter(checked == 1)
nrow(verified)

[1] 14149
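Since only about a third of the records shown above are verified (14,149 of 43,532), it can help to see how verification varies by trait before deciding whether to filter. The following is a minimal sketch on a made-up data frame: `obs_toy` and its values are invented for illustration, and only the trait, mean, and checked column names mirror traitsview.

```r
library(dplyr)

# Toy observations (values invented for illustration).
obs_toy <- data.frame(
  trait   = c("SLA", "SLA", "SLA", "Vcmax", "Vcmax", "Ayield"),
  mean    = c(15.2, 18.9, 22.5, 38.1, 61.0, 12.4),
  checked = c(1, 0, 1, 0, 0, 1)
)

# Per-trait record counts and the fraction that passed review.
qc_summary <- obs_toy |>
  summarise(
    n_total       = n(),
    n_verified    = sum(checked == 1),
    frac_verified = n_verified / n_total,
    .by = trait
  )
```

A trait with a low verified fraction (Vcmax in this toy data has none) would lose most of its observations under a strict checked == 1 filter, so whether to filter is a judgment call that depends on the analysis.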
Support Tables
Species Taxonomy
The species table contains 70,741 entries with full taxonomic information:
species |>
  select(id, scientificname, genus, commonname)

Variables (Trait Definitions)
The variables table documents units, descriptions, and valid ranges for each measured trait:
variables |>
  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
  select(name, units, description)

Sites
sites_with_climate <- sites |>
  filter(!is.na(mat), !is.na(map))
nrow(sites_with_climate)

[1] 453
Example: Bioenergy Crop Yields
bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Salix", "Saccharum")
yields <- traitsview |>
  filter(
    trait == "Ayield",
    genus %in% bioenergy_genera,
    !is.na(mean)
  ) |>
  select(genus, mean, units, sitename, author, citation_year, lat, lon)
yields |>
  summarise(
    n = n(),
    mean_yield = round(mean(mean, na.rm = TRUE), 1),
    sd_yield = round(sd(mean, na.rm = TRUE), 1),
    .by = genus
  ) |>
  knitr::kable(col.names = c("Genus", "N", "Mean Yield (Mg/ha)", "SD"))

| Genus | N | Mean Yield (Mg/ha) | SD |
|---|---|---|---|
| Panicum | 2087 | 10.1 | 5.5 |
| Salix | 528 | 15.8 | 18.3 |
| Miscanthus | 1021 | 12.8 | 10.5 |
| Populus | 841 | 29.3 | 29.0 |
| Saccharum | 3578 | 34.9 | 22.1 |
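For several genera in the table above, the standard deviation is about as large as the mean, so a mean alone may not describe a typical yield well. One possible check, sketched here on made-up data, is to confirm that all rows share one unit and then report the median next to the mean. `yields_toy` and its values are invented for illustration; only the genus, mean, and units column names follow the yields object above.

```r
library(dplyr)

# Toy yield records (values invented for illustration).
yields_toy <- data.frame(
  genus = c("Panicum", "Panicum", "Panicum", "Salix", "Salix", "Salix"),
  mean  = c(8, 10, 12, 5, 9, 40),
  units = "Mg/ha"
)

# Averaging only makes sense if every row uses the same unit.
stopifnot(n_distinct(yields_toy$units) == 1)

# mean(mean) works because R looks up a function for the call
# and the data column for the argument.
robust_summary <- yields_toy |>
  summarise(
    mean_yield   = mean(mean),
    median_yield = median(mean),
    .by = genus
  )
```

In the toy data, one large Salix value pulls the mean (18) well above the median (9), while the Panicum mean and median agree (10). A similar gap in the real data would hint at skew or outliers, although this sketch does not establish that.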
Working with Plant Functional Types (PFTs)
A Plant Functional Type groups species with similar ecological characteristics for ecosystem modeling. Instead of parameterizing models for each species individually, PFTs like “temperate deciduous trees” or “C4 grasses” define shared parameter distributions. This approach is essential when species-level data are sparse, and it keeps modeling tractable at large scales.
miscanthus_sp <- species |>
  filter(genus == "Miscanthus") |>
  pull(id)
pfts_species |>
  filter(specie_id %in% miscanthus_sp) |>
  left_join(pfts |> select(id, name), by = c("pft_id" = "id")) |>
  distinct(name)

Next Steps
| Vignette | Description |
|---|---|
| vignette("common_analyses") | Common analysis patterns with dplyr |
| vignette("pfts-priors") | Working with PFTs and Bayesian priors |
| vignette("manuscript") | Reproduce analyses from LeBauer et al. (2018) |
References
- LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. GCB Bioenergy. doi:10.1111/gcbb.12420
- LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs, 83(2), 133–154.
- BETYdb documentation: https://betydb.org