Common Analyses with betydata

Author

David LeBauer and Akash B V

Published

March 1, 2026

What you will learn

How to extract and summarize yield data for specific genera
How to link management practices (fertilization, planting) to yield observations
Patterns for site-level aggregation, author-based queries, and variable lookups

Setup

library(betydata)
library(dplyr)

Extracting Yield Data for a Genus

A common starting point is to pull yield observations for one genus and summarize them. The Ayield trait is annual aboveground yield, recorded in Mg ha-1 yr-1 (megagrams per hectare per year).

miscanthus_yields <- traitsview |>
  filter(
    genus == "Miscanthus",
    trait == "Ayield"
  ) |>
  select(id, mean, date, sitename, scientificname)

miscanthus_yields

nrow(miscanthus_yields)

[1] 1021
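
The extraction above can be followed by a per-species summary. The sketch below is illustrative only: toy_yields is a hypothetical stand-in for miscanthus_yields with invented values, so that it runs without the package data. With betydata loaded, the same summarise() call should apply to miscanthus_yields directly.

```r
library(dplyr)

# Hypothetical stand-in for miscanthus_yields; all values are invented.
toy_yields <- tibble(
  id = 1:5,
  mean = c(10.2, 14.8, 12.0, 8.5, NA),
  scientificname = c(
    "Miscanthus x giganteus", "Miscanthus x giganteus",
    "Miscanthus sinensis", "Miscanthus sinensis", "Miscanthus sinensis"
  )
)

# One row per species: count of non-missing yields, their mean and spread.
yield_summary <- toy_yields |>
  summarise(
    n_obs = sum(!is.na(mean)),
    mean_yield = mean(mean, na.rm = TRUE),
    sd_yield = sd(mean, na.rm = TRUE),
    .by = scientificname
  )

yield_summary
```

Counting with sum(!is.na(mean)) rather than n() keeps records with a missing yield from inflating the sample size.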

Tibble Printing

All tables are tibbles, so printing one shows only the first 10 rows by default. Because the key columns (trait, mean, units, scientificname, genus) come first, the default printout is informative without head() or column subsetting.

Working with Management Practices

Management practices (planting dates, fertilization rates, harvest methods) are stored in the managements table and linked to experimental treatments through the managements_treatments junction table. Because traitsview also carries a treatment_id column, this linkage connects management details to yield observations.

mgmt_treat <- managements_treatments |>
  left_join(
    managements |> select(id, mgmttype, level, units, date),
    by = c("management_id" = "id")
  )

grass_yields <- traitsview |>
  filter(
    genus %in% c("Miscanthus", "Panicum"),
    trait == "Ayield"
  ) |>
  left_join(mgmt_treat, by = "treatment_id", relationship = "many-to-many")

grass_yields |>
  filter(!is.na(mgmttype)) |>
  count(genus, mgmttype, sort = TRUE)
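
The relationship = "many-to-many" argument in the join above means that one yield record can come back as several rows, one per management event attached to its treatment. A minimal sketch of that effect, using hypothetical toy tables (toy_yields and toy_mgmt are invented and only mimic the relevant columns):

```r
library(dplyr)

# Hypothetical data: two yield records that share one treatment,
# and two management events recorded for that treatment.
toy_yields <- tibble(
  id = c(101, 102),
  treatment_id = c(1, 1),
  mean = c(12.0, 15.0)
)
toy_mgmt <- tibble(
  treatment_id = c(1, 1),
  mgmttype = c("planting", "fertilizer_N")
)

joined <- toy_yields |>
  left_join(toy_mgmt, by = "treatment_id", relationship = "many-to-many")

nrow(toy_yields)  # 2 yield records
nrow(joined)      # 4 rows: each record appears once per management event

# Counting or averaging the joined rows counts each yield record twice here.
# Reducing to one row per record first avoids that.
joined |>
  distinct(id, mean) |>
  summarise(n_records = n(), mean_yield = mean(mean))
```

Counts taken after such a join are therefore counts of record-by-management pairs, not of yield records.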

Nitrogen Fertilization Rates

Extracting nitrogen application rates and joining them with yield data enables exploration of yield–nitrogen relationships. Nitrogen management is recorded as fertilizer_N or fertilizer_N_rate in the mgmttype column.

nitrogen_rates <- managements |>
  filter(mgmttype %in% c("fertilizer_N", "fertilizer_N_rate")) |>
  left_join(
    managements_treatments |> select(management_id, treatment_id),
    by = c("id" = "management_id")
  ) |>
  select(treatment_id, nrate = level, units)

yields_with_n <- traitsview |>
  filter(
    trait == "Ayield",
    genus %in% c("Miscanthus", "Panicum")
  ) |>
  left_join(nitrogen_rates, by = "treatment_id", relationship = "many-to-many")

yields_with_n |>
  filter(!is.na(nrate)) |>
  summarise(
    n = n(),
    mean_N = round(mean(nrate, na.rm = TRUE), 1),
    mean_yield = round(mean(mean, na.rm = TRUE), 1),
    .by = genus
  ) |>
  knitr::kable(col.names = c("Genus", "N obs", "Mean N rate", "Mean Yield (Mg/ha)"))

Genus	N obs	Mean N rate	Mean Yield (Mg/ha)
Panicum	4385	105.1	10.7
Miscanthus	1536	71.6	11.7
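
The summary above averages nrate across every matching record without looking at the units column. Whether the real data mix units is not something this sketch assumes either way; it only shows, on hypothetical records with invented unit strings, how one might check before averaging.

```r
library(dplyr)

# Hypothetical nitrogen records; values and unit strings are invented.
toy_n <- tibble(
  treatment_id = c(1, 2, 3, 4),
  nrate = c(60, 120, 0.06, 90),
  units = c("kg ha-1", "kg ha-1", "Mg ha-1", NA)
)

# How many records use each unit (including missing units)?
unit_counts <- toy_n |> count(units, sort = TRUE)
unit_counts

# Keep a single unit before computing a mean rate.
# filter() also drops the row whose unit is NA.
n_kg_ha <- toy_n |> filter(units == "kg ha-1")
mean(n_kg_ha$nrate)
```

The same count(units) call can be run on nitrogen_rates to see which units are actually present.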

Site-Level Aggregation

Aggregating trait data by site is useful for spatial analysis and mapping data density across research locations.

site_summary <- traitsview |>
  filter(!is.na(lat), !is.na(lon)) |>
  summarise(
    n_records = n(),
    n_traits = n_distinct(trait),
    n_species = n_distinct(species_id),
    .by = c(site_id, sitename, lat, lon)
  )

site_summary |>
  arrange(desc(n_records)) |>
  head(15) |>
  knitr::kable()

Table 1: Top research sites by number of records
site_id	sitename	lat	lon	n_records	n_traits	n_species
2.000e+09	Barrow Environmental Observatory (NGEE-Arctic)	71.279875	-156.60848	6723	8	12
7.600e+01	EBI Energy farm	40.063700	-88.20200	2357	12	32
2.740e+02	Ihinger Hof	48.740000	8.92400	1253	11	7
2.000e+09	Kougarok (NGEE-Arctic)	65.163451	-164.81695	745	5	5
1.226e+03	Luquillo	18.310000	-65.74000	715	5	142
3.900e+02	Macknade Research Station	-18.700000	146.20000	640	3	1
5.110e+02	Bambaroo	-18.858889	146.19139	630	25	1
2.000e+09	Santa Cruz Experimental Field Facility	9.119870	-79.70506	600	10	4
1.038e+03	Harwood Mill Farm - Mizer	-29.426000	153.24100	514	1	1
2.000e+09	PA-PNM	8.994504	-79.54296	486	7	36
2.760e+02	Gutenzell-duplicate	48.700000	9.20000	482	21	1
5.830e+02	G. and R. Zanetti’s Farm	-19.500000	147.30000	476	8	1
8.520e+02	AspenFACE	45.667000	-89.61670	417	18	3
1.041e+03	La Mercy	-29.350000	31.07000	410	1	1
6.940e+02	Centro de Tecnologia Canavieira	-22.700000	-47.55000	402	9	1
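
In Table 1 the site_id column is rendered in scientific notation, so several distinct sites all appear as 2.000e+09. A minimal sketch of one way to keep the full ID, assuming site_id is stored as an ordinary numeric column (the IDs below are hypothetical):

```r
# Hypothetical IDs of the same magnitude as those in Table 1.
site_ids <- c(76, 274, 2000000001, 2000000002)

# Scientific notation with few digits makes the two large IDs look identical.
sci_labels <- format(site_ids, scientific = TRUE, digits = 4)
sci_labels

# Fixed notation keeps every digit; trim = TRUE drops the padding spaces.
id_labels <- format(site_ids, scientific = FALSE, trim = TRUE)
id_labels
```

In the pipeline above, the equivalent step would be a mutate(site_id = format(site_id, scientific = FALSE, trim = TRUE)) placed before knitr::kable().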

Geographic Data

Site coordinates are stored in the lat and lon columns of both traitsview and the sites table. The sites table also contains mat (mean annual temperature) and map (mean annual precipitation) for sites where climate data are available.
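
A sketch of attaching those climate columns to a site-level summary. The toy tables are hypothetical, and the key column name id in the sites table is an assumption made by analogy with the managements table; check names(sites) before relying on it.

```r
library(dplyr)

# Hypothetical stand-ins; the key column `id` in the sites table is an assumption.
toy_site_summary <- tibble(site_id = c(1, 2, 3), n_records = c(40, 25, 10))
toy_sites <- tibble(
  id = c(1, 2),
  mat = c(11.2, 25.9),  # mean annual temperature (invented values)
  map = c(1040, 2300)   # mean annual precipitation (invented values)
)

site_climate <- toy_site_summary |>
  left_join(toy_sites, by = c("site_id" = "id"))

# Sites without a climate record keep their row, with NA for mat and map.
site_climate
```

A left join is used so that sites lacking climate data stay in the summary rather than being dropped.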

Finding Data by Author

Matching on the author column with grepl() finds records whose citation author contains a given name, regardless of case.

lebauer_data <- traitsview |>
  filter(grepl("LeBauer", author, ignore.case = TRUE))

lebauer_data |>
  count(trait, author, citation_year, sort = TRUE)

Most Data-Rich Citations

traitsview |>
  count(citation_id, author, citation_year, sort = TRUE) |>
  head(10) |>
  knitr::kable()

Table 2: Top 10 citations by number of records
citation_id	author	citation_year	n
2.00e+09	Alistair Rogers	2017	6723
7.61e+02	Laredo	2003	3461
1.99e+02	Feng	2010	2742
1.89e+02	Clifton-Brown	2002	1232
2.00e+09	Serbin and Rogers	2016	835
2.00e+09	Lianhong Gu	2016	816
7.82e+02	Xiaohui Feng	2015	715
2.00e+09	Slot and Winter	2017	600
3.25e+02	Inman-Bamber	2000	589
1.57e+02	Lewandowski	1998	549

Variable and Trait Lookups

The variables table provides units, descriptions, and valid ranges for each measured trait. This is useful for understanding what a trait measures and checking whether observed values are within expected bounds.

variables |>
  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
  select(name, units, description, min, max) |>
  knitr::kable()

name	units	description	min	max
SLA	m2 kg-1	Specific Leaf Area	0.1	100
Ayield	Mg ha-1 yr-1	annual aboveground yield	0	Infinity
Vcmax	umol [CO2] m-2 s-1	maximum rubisco carboxylation capacity	0	500
leaf_respiration_rate_m2	umol [CO2] m-2 s-1	Rd; leaf dark respiration	0	500
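
The bounds check mentioned above can be sketched as a join followed by a comparison. The observations below are hypothetical, and the bounds copy the SLA and Vcmax rows of the table. The sketch assumes min and max are numeric; the "Infinity" entry for Ayield suggests the real columns may need converting first.

```r
library(dplyr)

# Hypothetical observations; two of them are deliberately out of range.
toy_traits <- tibble(
  trait = c("SLA", "SLA", "Vcmax", "Vcmax"),
  mean = c(15.2, 140.0, 85.0, -3.0)
)

# Bounds copied from the SLA and Vcmax rows of the table above.
toy_variables <- tibble(
  name = c("SLA", "Vcmax"),
  min = c(0.1, 0),
  max = c(100, 500)
)

# Attach each trait's bounds, then flag values that fall outside them.
flagged <- toy_traits |>
  left_join(toy_variables, by = c("trait" = "name")) |>
  mutate(in_range = mean >= min & mean <= max)

flagged |> filter(!in_range)
```

With the package data, traitsview and variables would take the place of the toy tables, joined on trait and name in the same way.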

Performance

Because all tables are loaded into memory as R data frames, filtering and joining need no database connection or network round trip. The timing below comes from a single run and will vary by machine.

system.time({
  result <- traitsview |>
    filter(
      genus %in% c("Miscanthus", "Panicum", "Populus"),
      trait %in% c("SLA", "Vcmax", "Ayield"),
      checked == 1
    ) |>
    summarise(
      n = n(),
      mean = mean(mean, na.rm = TRUE),
      .by = c(genus, trait)
    )
})

   user  system elapsed 
  0.006   0.000   0.006

References

LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. GCB Bioenergy. doi:10.1111/gcbb.12420