GSoC - PEcAn Project Ideas

PEcAn is an open-source ecosystem modeling framework integrating data, models, and uncertainty quantification. Below is a list of potential ideas where contributors can help improve and expand PEcAn. To get started contributing to PEcAn, check out this guide. Come find us on Slack to discuss. If you have questions or would like to propose your own idea, contact @kooper on Slack or join our #gsoc-2025 channel!

Project Ideas

Below is a list of project ideas. Feel free to contact the listed mentors on Slack to discuss further, or contact @kooper with new ideas and he can help connect you with mentors.

Global Sensitivity Analysis / Uncertainty Partitioning
Parallelization of Model Runs on HPC
Database and Data Improvements
Development of Notebook-based PEcAn Workflows
Refactoring Compile-time Flags to Runtime Flags in SIPNET

1. Global Sensitivity Analysis / Uncertainty Partitioning

This project would extend PEcAn's existing uncertainty partitioning routines, which are primarily one-at-a-time and focused on model parameters, to also consider ensemble-based uncertainties in other model inputs (meteorology, soils, vegetation, phenology, etc.). The project would employ Sobol' methods; some uncommitted code exists that manually prototypes how this would be done in PEcAn. The goal would be to refactor or reimplement this prototype into a reliable, automated system and apply it to some key test cases in both natural and managed ecosystems.
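
As a rough illustration of what a Sobol' analysis computes, the sketch below estimates first-order and total-order indices with the common "pick-and-freeze" sampling design (Saltelli's first-order estimator and Jansen's total-order estimator). It is not the uncommitted PEcAn prototype, which is in R. The function names, the toy model, and the uniform input distributions are all assumptions made for illustration. In a real application each "input" could be a parameter draw or an ensemble member of meteorology, soils, and so on.

```python
import random


def sobol_indices(model, draw_input, n_inputs, n_samples=5000, seed=1):
    """Estimate first-order and total-order Sobol' indices (pick-and-freeze).

    model      : function taking a list of inputs and returning a number
    draw_input : function (input_index, rng) -> one random draw of that input
    """
    rng = random.Random(seed)
    # Two independent sample matrices; each row is one ensemble member.
    A = [[draw_input(i, rng) for i in range(n_inputs)] for _ in range(n_samples)]
    B = [[draw_input(i, rng) for i in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in A]
    y_b = [model(row) for row in B]

    pooled = y_a + y_b
    mean = sum(pooled) / len(pooled)
    variance = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    first_order, total_order = [], []
    for i in range(n_inputs):
        # Rows of A with only input i swapped in from B.
        y_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli-style first-order estimator (outputs centered to cut noise).
        v_first = sum((yb - mean) * (yab - ya)
                      for ya, yb, yab in zip(y_a, y_b, y_ab)) / n_samples
        # Jansen total-order estimator.
        v_total = sum((ya - yab) ** 2
                      for ya, yab in zip(y_a, y_ab)) / (2 * n_samples)
        first_order.append(v_first / variance)
        total_order.append(v_total / variance)
    return first_order, total_order


def toy_model(x):
    """Hypothetical stand-in for a model run: additive, third input is inert."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


def draw_uniform(i, rng):
    """Hypothetical input distribution: every input is Uniform(0, 1)."""
    return rng.random()


first, total = sobol_indices(toy_model, draw_uniform, n_inputs=3)
```

For this additive toy model the exact first-order indices are 0.8, 0.2, and 0, and the total-order indices equal the first-order ones because there are no interactions. The estimates should land near those values, with sampling noise that shrinks as `n_samples` grows. Note the cost: the design needs `n_samples * (n_inputs + 2)` model runs.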

Expected outcomes:

A successful project would complete the following tasks:

Reliable, automated Sobol' sensitivity analysis and uncertainty partitioning across multiple model inputs.
Applications to test case(s) in natural and/or managed ecosystems.

Prerequisites:

Required: R (the existing workflow and prototype are in R)
Helpful: familiarity with sensitivity analyses

Contact person:

Mike @Dietze

Duration:

Flexible: can work as either a Medium (175 hr) or Large (350 hr) project

Difficulty:

Medium

2. Parallelization of Model Runs on HPC

This project would extend PEcAn's existing run mechanisms so they can run on a high-performance computing (HPC) cluster using Apptainer. For uncertainty analysis, PEcAn runs the same model thousands of times with small perturbations, which makes it a natural fit for HPC. The goal is not to submit thousands of separate jobs, but to have a single multi-node job that runs all of the ensemble members efficiently. Runs can be orchestrated using RabbitMQ, but other methods are also encouraged. The end goal is for the PEcAn system to be launched once and run the full workflow on the HPC cluster from start to finish, using as many nodes as it was given at submission.
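
To make two of the simpler dispatch options concrete, here is a minimal sketch of static round-robin assignment and of dynamic claiming through lock files. It is only an illustration, not PEcAn code: the function names and the `ens-0000` style run IDs are hypothetical, and the step that would actually launch a model run inside the container is deliberately left out.

```python
import os


def round_robin(run_ids, n_workers):
    """Static split: run k goes to worker k mod n_workers."""
    return [run_ids[w::n_workers] for w in range(n_workers)]


def try_claim(run_id, lock_dir):
    """Atomically claim a run by creating its lock file.

    Returns False if another worker already created the file.
    """
    path = os.path.join(lock_dir, f"{run_id}.lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def claim_runs(run_ids, lock_dir):
    """Dynamic split: yield only the runs this worker manages to claim."""
    for run_id in run_ids:
        if try_claim(run_id, lock_dir):
            yield run_id


# Hypothetical ensemble of 10 runs shared among 3 workers.
run_ids = [f"ens-{i:04d}" for i in range(10)]
shares = round_robin(run_ids, 3)
```

Round robin needs no coordination, but it can leave nodes idle when run times vary. Lock files let fast workers pick up more runs, but they rely on atomic exclusive file creation, which should be verified on the cluster's shared filesystem before being trusted. A message queue such as RabbitMQ serves the same purpose as the lock directory without depending on filesystem semantics.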

Expected outcomes:

A successful project would complete the following tasks:

Demonstrate different ways to launch jobs (RabbitMQ, lock files, simple round robin, etc.).
A report on the different options and how they can be enabled.

Prerequisites:

Required: R (the existing workflow and prototype are in R), Docker
Helpful: Familiarity with HPC and Apptainer

Contact person:

Rob @Kooper

Duration:

Flexible: can work as either a Medium (175 hr) or Large (350 hr) project

Difficulty:

Medium

3. Database and Data Improvements

PEcAn relies on the BETYdb database to store trait and yield data as well as model provenance information. This project aims to separate trait data from provenance tracking, ensure that PEcAn is able to run without the server currently required to run the Postgres database used by BETYdb, and enable flexible data sharing in place of a server-reliant sync mechanism. The goal is to make PEcAn workflows easier to test, deploy, and use while also making data more accessible.

Potential Directions

Minimal BETYdb Database: Create a simplified version of BETYdb for demonstrations and integration tests, which might include:
- Review the provenance information we currently log, identify components that no longer need to be tracked or that should be temporary rather than permanent records, and build tools to clean unneeded or expired records from the database.
- Design and create a freestanding version of the trait data, including choosing the format and distribution method, implementing whatever pipelines are needed to move the data over, and documenting how to use and update the result.
Non-Database Setup: Enable workflows that do not require PostgreSQL or a web front-end, potentially including:
- Identify PEcAn modules that are still DB-dependent and refactor them to allow freestanding use
- Implement mechanisms for decoupling the DB from the model pipelines in time and space while still tracking provenance. Perhaps this could involve separate prep/execution/post-logging phases, but we encourage your creative suggestions.
- Create tools that maximize interoperability with data from other sources, including from external databases or the user's own observations.
- Identify functionality from the "BETYdb network" sync system that is out of date and replace or remove it as needed.
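
One way to picture the separate prep/execution/post-logging phases suggested above is a file-based provenance log: runs append records to a local file while they execute, and a later step reads the file and loads it into a database if one is available. The sketch below is only one possible design, not an existing PEcAn mechanism. The function names, the JSON Lines format, and the record fields are all assumptions.

```python
import json


def log_event(log_path, run_id, phase, **details):
    """Append one provenance record as a JSON line; no database connection needed."""
    record = {"run_id": run_id, "phase": phase, **details}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(log_path):
    """Read every record back, for example to insert into a database later."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A run might call `log_event(path, "run-1", "execution", status="done")` from a compute node with no database access. The post-logging phase would then call `read_events(path)` wherever the database is reachable.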

Expected outcomes:

A successful project would complete a subset of the following tasks:

A lightweight, distributable demo Postgres database.
A distributable dataset of the existing trait and yield records in a maximally reusable format (that is, possibly not Postgres).
A workflow that is independent of the Postgres database.

Skills Required:

Familiarity with database concepts required
Postgres experience helpful (and required if proposing DB cleanup tasks)
R experience helpful (and required if proposing PEcAn code changes)

Contact person:

Chris Black (@infotroph)

Duration:

Suitable for a Medium (175hr) or Large (350 hr) project.

Difficulty:

Medium to Hard

4. Development of Notebook-based PEcAn Workflows

The PEcAn workflow is currently run using a web-based user interface, an API, or custom R scripts. The web-based user interface is the easiest to use but has limited functionality, whereas the custom R scripts and the API are more flexible but require more experience.

This project will focus on building Quarto notebooks that provide an interface to PEcAn that is both welcoming to new users and flexible enough to be a starting point for more advanced users. It will build on existing Pull Request 1733.

Expected outcomes:

Two or more template workflows for running the PEcAn workflow.
Written vignette and video tutorial introducing their use.

Prerequisites:

Familiarity with R.
Familiarity with RStudio and Quarto or R Markdown is a plus.

Contact person: David LeBauer @dlebauer, Nihar Sanda @koolgax99

Duration: Medium (175hr)

Difficulty: Medium

5. Refactoring Compile-time Flags to Runtime Flags in SIPNET

Project Overview

The ecosystem model SIPNET is a core component of many PEcAn analyses. SIPNET is compiled with multiple compile-time flags that control whether different features are turned on or off. Thus, as currently configured, each model structure requires a separately compiled binary.

This project will refactor these flags to be runtime-configurable via command-line arguments or a configuration file, improving usability and testing efficiency.

Expected Outcomes

Convert selected SIPNET compile-time flags to runtime options.
Develop a global configuration object for managing runtime flags.
Improve testability by enabling different configurations without recompiling.

Prerequisites

Required: C, experience with compilers and build systems.
Helpful: Understanding of simulation models.

Mentor(s)

David LeBauer (@dlebauer)
Mike Longfritz

Duration

Medium (175hr) or Large (350hr)

Difficulty

Medium to Hard
