Common Analyses with betydata
- How to extract and summarize yield data for specific genera
- How to link management practices (fertilization, planting) to yield observations
- Patterns for site-level aggregation, author-based queries, and variable lookups
Setup
library(betydata)
library(dplyr)
Extracting Yield Data for a Genus
A common starting point is pulling yield observations for a particular genus and summarizing them. The Ayield trait represents annual aboveground yield in Mg ha-1 yr-1.
miscanthus_yields <- traitsview |>
  filter(
    genus == "Miscanthus",
    trait == "Ayield"
  ) |>
  select(id, mean, date, sitename, scientificname)
miscanthus_yields
nrow(miscanthus_yields)
[1] 1021
All tables are tibbles, which display the first 10 rows by default. With key columns ordered first (trait, mean, units, scientificname, genus), the default output is immediately informative without needing head() or column subsetting.
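The extraction above stops at selecting columns. The summarizing step might look like the following sketch. It uses a small made-up tibble (toy_yields, with invented values) in place of miscanthus_yields so that it runs without the package data; only the column names mean and scientificname come from the example above.

```r
library(dplyr)

# Hypothetical stand-in for miscanthus_yields; the values are invented.
toy_yields <- tibble(
  scientificname = c("Miscanthus x giganteus", "Miscanthus x giganteus",
                     "Miscanthus sinensis", "Miscanthus sinensis"),
  mean = c(20, 30, 8, 12)
)

# One row per species: number of observations and basic yield statistics.
yield_summary <- toy_yields |>
  summarise(
    n_obs = n(),
    mean_yield = mean(mean, na.rm = TRUE),
    sd_yield = sd(mean, na.rm = TRUE),
    .by = scientificname
  ) |>
  arrange(desc(mean_yield))

yield_summary
```

Replacing toy_yields with miscanthus_yields should give the same kind of per-species table for the real data.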
Working with Management Practices
Management practices (planting dates, fertilization rates, harvest methods) are stored in the managements table and linked to experimental treatments through the managements_treatments junction table. This linkage connects management details to yield observations in traitsview.
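To make the linkage concrete, here is a minimal sketch on made-up tables that follow the same key layout (management_id and treatment_id in the junction table, id in the managements table). The table contents and the obs_id column are invented for illustration. It shows why an observation is repeated once per management linked to its treatment.

```r
library(dplyr)

# Toy tables with invented contents; only the key names mirror the real tables.
toy_managements <- tibble(
  id = c(1, 2, 3),
  mgmttype = c("planting", "fertilizer_N", "harvest")
)
toy_junction <- tibble(
  management_id = c(1, 2, 3),
  treatment_id = c(10, 10, 20)
)
toy_obs <- tibble(
  obs_id = c(100, 101),   # hypothetical observation identifier
  treatment_id = c(10, 20),
  mean = c(15.2, 9.8)
)

# Attach management details to each junction row, then to each observation.
toy_linked <- toy_obs |>
  left_join(
    toy_junction |>
      left_join(toy_managements, by = c("management_id" = "id")),
    by = "treatment_id",
    relationship = "many-to-many"
  )

# Observation 100 appears twice because treatment 10 has two managements.
count(toy_linked, obs_id)
```

Because of this duplication, row counts after such a join describe observation-management pairs rather than unique observations.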
# Attach management details to each management-treatment link
mgmt_treat <- managements_treatments |>
  left_join(
    managements |> select(id, mgmttype, level, units, date),
    by = c("management_id" = "id")
  )
# A treatment can have several managements, so yield rows may repeat after this join
grass_yields <- traitsview |>
  filter(
    genus %in% c("Miscanthus", "Panicum"),
    trait == "Ayield"
  ) |>
  left_join(mgmt_treat, by = "treatment_id", relationship = "many-to-many")
# Count yield-management pairs by genus and management type
grass_yields |>
  filter(!is.na(mgmttype)) |>
  count(genus, mgmttype, sort = TRUE)
Nitrogen Fertilization Rates
Extracting nitrogen application rates and joining them with yield data enables exploration of yield–nitrogen relationships. Nitrogen management is recorded as fertilizer_N or fertilizer_N_rate in the mgmttype column.
nitrogen_rates <- managements |>
  filter(mgmttype %in% c("fertilizer_N", "fertilizer_N_rate")) |>
  left_join(
    managements_treatments |> select(management_id, treatment_id),
    by = c("id" = "management_id")
  ) |>
  select(treatment_id, nrate = level, units)
yields_with_n <- traitsview |>
  filter(
    trait == "Ayield",
    genus %in% c("Miscanthus", "Panicum")
  ) |>
  left_join(nitrogen_rates, by = "treatment_id", relationship = "many-to-many")
yields_with_n |>
  filter(!is.na(nrate)) |>
  summarise(
    n = n(),
    mean_N = round(mean(nrate, na.rm = TRUE), 1),
    mean_yield = round(mean(mean, na.rm = TRUE), 1),
    .by = genus
  ) |>
  knitr::kable(col.names = c("Genus", "N obs", "Mean N rate", "Mean Yield (Mg/ha)"))
| Genus | N obs | Mean N rate | Mean Yield (Mg/ha) |
|---|---|---|---|
| Panicum | 4385 | 105.1 | 10.7 |
| Miscanthus | 1536 | 71.6 | 11.7 |
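A per-genus mean hides how yield changes with nitrogen rate. One simple way to explore that relationship is to bin the rates and summarize yield within each bin, as in the sketch below. The tibble toy_yn stands in for yields_with_n with invented values, the unit string "kg ha-1" is an assumption (the real units column should be inspected first), and the class boundaries are arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for yields_with_n; all values are invented.
toy_yn <- tibble(
  genus = c("Miscanthus", "Miscanthus", "Miscanthus",
            "Panicum", "Panicum", "Panicum"),
  mean  = c(10, 14, 16, 6, 9, 11),
  nrate = c(0, 60, 120, 0, 60, 120),
  units = "kg ha-1"   # assumed unit string
)

# Keep a single unit so rates are comparable, then bin the N rates.
yield_by_n <- toy_yn |>
  filter(!is.na(nrate), units == "kg ha-1") |>
  mutate(
    n_class = cut(nrate, breaks = c(-Inf, 0, 100, Inf),
                  labels = c("none", "low", "high"))
  ) |>
  summarise(
    n = n(),
    mean_yield = mean(mean, na.rm = TRUE),
    .by = c(genus, n_class)
  )

yield_by_n
```

Filtering on units before binning matters because rates recorded in different units would otherwise be mixed in the same class.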
Site-Level Aggregation
Aggregating trait data by site is useful for spatial analysis and mapping data density across research locations.
site_summary <- traitsview |>
  filter(!is.na(lat), !is.na(lon)) |>
  summarise(
    n_records = n(),
    n_traits = n_distinct(trait),
    n_species = n_distinct(species_id),
    .by = c(site_id, sitename, lat, lon)
  )
site_summary |>
  arrange(desc(n_records)) |>
  head(15) |>
  knitr::kable()
| site_id | sitename | lat | lon | n_records | n_traits | n_species |
|---|---|---|---|---|---|---|
| 2.000e+09 | Barrow Environmental Observatory (NGEE-Arctic) | 71.279875 | -156.60848 | 6723 | 8 | 12 |
| 7.600e+01 | EBI Energy farm | 40.063700 | -88.20200 | 2357 | 12 | 32 |
| 2.740e+02 | Ihinger Hof | 48.740000 | 8.92400 | 1253 | 11 | 7 |
| 2.000e+09 | Kougarok (NGEE-Arctic) | 65.163451 | -164.81695 | 745 | 5 | 5 |
| 1.226e+03 | Luquillo | 18.310000 | -65.74000 | 715 | 5 | 142 |
| 3.900e+02 | Macknade Research Station | -18.700000 | 146.20000 | 640 | 3 | 1 |
| 5.110e+02 | Bambaroo | -18.858889 | 146.19139 | 630 | 25 | 1 |
| 2.000e+09 | Santa Cruz Experimental Field Facility | 9.119870 | -79.70506 | 600 | 10 | 4 |
| 1.038e+03 | Harwood Mill Farm - Mizer | -29.426000 | 153.24100 | 514 | 1 | 1 |
| 2.000e+09 | PA-PNM | 8.994504 | -79.54296 | 486 | 7 | 36 |
| 2.760e+02 | Gutenzell-duplicate | 48.700000 | 9.20000 | 482 | 21 | 1 |
| 5.830e+02 | G. and R. Zanetti’s Farm | -19.500000 | 147.30000 | 476 | 8 | 1 |
| 8.520e+02 | AspenFACE | 45.667000 | -89.61670 | 417 | 18 | 3 |
| 1.041e+03 | La Mercy | -29.350000 | 31.07000 | 410 | 1 | 1 |
| 6.940e+02 | Centro de Tecnologia Canavieira | -22.700000 | -47.55000 | 402 | 9 | 1 |
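In the table above, several distinct sites share the rendered id 2.000e+09 because large ids are printed in scientific notation. One way to keep them distinguishable is to convert the ids to text before building the table. The sketch below assumes the ids are stored as ordinary doubles (a 64-bit integer class would need a different conversion), and the ids in toy_sites are invented.

```r
library(dplyr)

# Invented ids of the same magnitude as those in the table above.
toy_sites <- tibble(
  site_id = c(2000000001, 2000000002, 76),
  n_records = c(6723, 745, 2357)
)

# sprintf("%.0f", ...) prints each id in full, without an exponent.
toy_sites_chr <- toy_sites |>
  mutate(site_id = sprintf("%.0f", site_id))

toy_sites_chr
```

The same conversion could be applied to site_summary before the call to knitr::kable().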
The lat and lon columns are available in both traitsview and the sites table. The sites table additionally contains mat (mean annual temperature) and map (mean annual precipitation) for sites where climate data are available.
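A site-level summary can be combined with the climate columns through a join, sketched here on toy tables. The key name id in the sites table is an assumption, and the mat and map values are invented.

```r
library(dplyr)

# Toy stand-ins; the key name `id` in the sites table is an assumption
# and the climate values are invented.
toy_site_summary <- tibble(
  site_id = c(76, 274, 390),
  n_records = c(2357, 1253, 640)
)
toy_sites <- tibble(
  id = c(76, 274),
  mat = c(11.0, 8.5),
  map = c(1000, 700)
)

# Sites without a climate record keep NA in mat and map.
site_climate <- toy_site_summary |>
  left_join(toy_sites, by = c("site_id" = "id"))

site_climate
```

A left join keeps every summarized site, so missing climate data shows up as NA instead of silently dropping rows.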
Most Data-Rich Citations
traitsview |>
  count(citation_id, author, citation_year, sort = TRUE) |>
  head(10) |>
  knitr::kable()
| citation_id | author | citation_year | n |
|---|---|---|---|
| 2.00e+09 | Alistair Rogers | 2017 | 6723 |
| 7.61e+02 | Laredo | 2003 | 3461 |
| 1.99e+02 | Feng | 2010 | 2742 |
| 1.89e+02 | Clifton-Brown | 2002 | 1232 |
| 2.00e+09 | Serbin and Rogers | 2016 | 835 |
| 2.00e+09 | Lianhong Gu | 2016 | 816 |
| 7.82e+02 | Xiaohui Feng | 2015 | 715 |
| 2.00e+09 | Slot and Winter | 2017 | 600 |
| 3.25e+02 | Inman-Bamber | 2000 | 589 |
| 1.57e+02 | Lewandowski | 1998 | 549 |
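For an author-based query, note that the author column in the table above is not uniform (for example "Feng" and "Xiaohui Feng"), so a pattern match is likely more robust than an exact comparison. The following sketch uses an invented stand-in for traitsview with only the columns it needs.

```r
library(dplyr)

# Hypothetical stand-in for traitsview; the rows are invented.
toy_traits <- tibble(
  author = c("Clifton-Brown", "Clifton-Brown", "Lewandowski", "Feng"),
  citation_year = c(2002, 2002, 1998, 2010),
  trait = c("Ayield", "Ayield", "Ayield", "SLA")
)

# Case-insensitive match on a fragment of the author name.
by_author <- toy_traits |>
  filter(grepl("clifton", author, ignore.case = TRUE)) |>
  count(author, citation_year, trait)

by_author
```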
Variable and Trait Lookups
The variables table provides units, descriptions, and valid ranges for each measured trait. This is useful for understanding what a trait measures and checking whether observed values are within expected bounds.
variables |>
  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
  select(name, units, description, min, max) |>
  knitr::kable()
| name | units | description | min | max |
|---|---|---|---|---|
| SLA | m2 kg-1 | Specific Leaf Area | 0.1 | 100 |
| Ayield | Mg ha-1 yr-1 | annual aboveground yield | 0 | Infinity |
| Vcmax | umol [CO2] m-2 s-1 | maximum rubisco carboxylation capacity | 0 | 500 |
| leaf_respiration_rate_m2 | umol [CO2] m-2 s-1 | Rd; leaf dark respiration | 0 | 500 |
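A bounds check of the kind described above could be sketched as follows. The bounds for SLA and Ayield are copied from the table, while the observations are invented. The sketch assumes min and max are numeric; the "Infinity" entry in the table suggests the real columns may be stored as text, in which case they would need converting first.

```r
library(dplyr)

# Toy stand-ins: bounds copied from the table above, observations invented.
toy_variables <- tibble(
  name = c("SLA", "Ayield"),
  min = c(0.1, 0),
  max = c(100, Inf)   # assumes numeric bounds
)
toy_obs <- tibble(
  trait = c("SLA", "SLA", "Ayield"),
  mean = c(15, 250, 12.3)
)

# Flag observations that fall outside the documented [min, max] range.
range_check <- toy_obs |>
  left_join(toy_variables, by = c("trait" = "name")) |>
  mutate(in_range = mean >= min & mean <= max)

filter(range_check, !in_range)
```

Flagging values in a new column, instead of dropping them, leaves the decision about out-of-range records to the analyst.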
Performance
Because all tables are loaded in memory as R data frames, filtering and joining involve no network or database round trips.
system.time({
  result <- traitsview |>
    filter(
      genus %in% c("Miscanthus", "Panicum", "Populus"),
      trait %in% c("SLA", "Vcmax", "Ayield"),
      checked == 1
    ) |>
    summarise(
      n = n(),
      mean = mean(mean, na.rm = TRUE),
      .by = c(genus, trait)
    )
})
   user  system elapsed
  0.006   0.000   0.006
References
- LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. GCB Bioenergy. doi:10.1111/gcbb.12420