
GSoC - PEcAn Project Ideas

Ecosystem science has many components, and so does PEcAn! You can contribute to several of those components. Below is a list of potential ideas. Feel free to contact any of the mentors on Slack, or ask questions in our #gsoc-2023 channel on Slack.

Project Ideas

The following is a list of project ideas. Use it to contact the appropriate mentors on Slack. Feel free to propose your own ideas as well; in that case, contact @kooper on Slack so he can put you in touch with the best mentors.

PEcAn packages & CRAN [R package development]

PEcAn is implemented as a set of R packages, but the user must currently download and install all the packages as a single unit. The short-term goal of this project is to fix warnings in the build process, refactor to remove unnecessary dependencies, and potentially split modules. The medium-term goal is to increase the reliability of PEcAn’s integration tests, so this year’s package development will prioritize the packages most associated with overall workflow bottlenecks (e.g., PEcAn.data.atmosphere, which downloads and processes meteorological data). The longer-term goal is to make PEcAn packages available on CRAN (the primary R package archive), which will make them easier to install, easier to find, and easier to use as standalone modules.

Expected outcome:
PEcAn packages pass checks and integration tests without warnings. Packages are made available on CRAN.
Prerequisites:
R; experience with R packages is helpful, but most of the process is covered in chapters on R package releases in the book ‘rOpenSci packages’ and the book ‘R packages’ by Hadley Wickham
Contact person:
Chris Black @infotroph, Mike Dietze @Dietze
Duration:
Size: 175 hours for proposals that focus on dependency removal, 350 hours for proposals that split modules.
Difficulty:
Easy. Multiple people can work on this project, since different individuals can focus on different PEcAn R packages.

PEcAn model coupling and development [Data Science]

PEcAn can interface with multiple ecological models. The goal of this project is to improve the coupling of existing models to PEcAn (specifically FATES) and to add new models (specifically a simple vegetation model that is under development). It is also possible to contribute to the development of the simple vegetation model, which is written in Fortran.

Expected outcome:
New or improved PEcAn model packages.
Prerequisites:
R. Fortran is an advantage.
Contact person:
Hui Tang @Hui Tang, Istem Fer @istfer
Duration:
Flexible: can work as either a Small (175 hr) or Large (350 hr) project
Difficulty:
Medium

Input Processing / Asynchronous workflow execution [Data Science]

One of the goals of PEcAn is to be able to run different ecological models (which require a range of data inputs) and compare the model outputs with actual measurements (a.k.a. data constraints). The goal of this project is twofold, depending on the specific interests of the GSoC student.
  1. The current PEcAn input processing occurs mostly within the primary runtime workflow, but numerous PEcAn applications would benefit from the ability to update near-real-time data asynchronously with model execution, handling different data streams in parallel. As part of this, we’d also like to make it easier to use PEcAn input processing modules as standalone tools. This subproject also leverages a joint effort with the Red Hat Collaboratory.
  2. Increase the number of input products supported. Students may focus on one or more of the following:
    1. Add the NMME seasonal weather forecast as a meteorological driver
    2. Add remote sensing data streams: NASA GEDI lidar; solar-induced fluorescence (e.g., NASA OCO-2, OCO-3); thermal (e.g., NASA ECOSTRESS)
    3. Extend our existing support for ingesting data from the National Ecological Observatory Network (NEON) soil moisture and soil respiration data products. This will involve integrating NEONSoils code into PEcAn, along with internal code from the Dietze lab on soil moisture gap-filling and downscaling.
Multiple people can work on this project, since it has separate parts that individuals can take on independently.
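
To make the first subproject more concrete, here is a minimal sketch of refreshing several input streams in parallel while a model run proceeds. It is written in Python purely for illustration (PEcAn itself is R), and every name in it (`fetch_stream`, `run_model`, `update_inputs_async`, the stream and site labels) is a hypothetical placeholder, not part of PEcAn.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_stream(name):
    """Hypothetical stand-in for downloading and processing one data stream."""
    # A real input module would download and convert data here.
    return {"stream": name, "status": "updated"}


def run_model(site):
    """Hypothetical stand-in for a model run that does not block on inputs."""
    return {"site": site, "status": "finished"}


def update_inputs_async(streams, site, max_workers=4):
    """Start the model run, then refresh each input stream in parallel."""
    stream_results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The model run and the stream updates share one worker pool,
        # so none of them waits for the others to start.
        model_future = pool.submit(run_model, site)
        stream_futures = {pool.submit(fetch_stream, s): s for s in streams}
        # Collect stream results in whatever order they complete.
        for future in as_completed(stream_futures):
            stream_results[stream_futures[future]] = future.result()
        model_result = model_future.result()
    return {"streams": stream_results, "model": model_result}
```

Calling `update_inputs_async(["met", "lidar"], "site-1")` returns one entry per stream plus the model result. A real design would also need to decide how a running model picks up refreshed inputs, which this sketch leaves open.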


Prerequisites:
R.
Contact person:
@Ankur Desai, Istem Fer @istfer
Duration:
1. Data workflow update: large (350 hr); 2. Individual data packages: small (175 hr) for one, large (350 hr) for 2-3 data packages
Difficulty:
1. Data workflow update: hard; 2. Individual data packages: 2.1 easy, 2.2 easy, 2.3 medium

GitHub Actions

Currently, GitHub Actions checks whether newer versions of the installed packages are available. We need to reduce these checks, since GitHub limits them. Additionally, we currently run only a simple test of SIPNET; it would be great if this could use the full Docker stack to test a full run.

In the past year we have created a dashboard that shows how tests are performing. It would be great to have an action that runs the tests using the develop stack and writes the results to a file in a special branch. As part of this task, the dashboard will need to be updated to fetch its data from that branch.
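
As a rough illustration of the results file, the sketch below serializes a list of test outcomes to JSON and reads it back the way a dashboard might. The schema (`name`, `passed`, and the summary counts) and the function names are assumptions made for this example; the existing dashboard's actual format is not described here.

```python
import json


def summarize(results):
    """Build a summary record from per-test results (hypothetical schema)."""
    passed = sum(1 for r in results if r["passed"])
    return {
        "total": len(results),
        "passed": passed,
        "failed": len(results) - passed,
        "tests": results,
    }


def to_json(results):
    """Serialize the summary; sorted keys keep diffs on the branch small."""
    return json.dumps(summarize(results), indent=2, sort_keys=True)


def from_json(text):
    """Parse the file contents, as the dashboard side would."""
    return json.loads(text)
```

A workflow step could write the output of `to_json` to a file and commit it to the results branch, and the dashboard would then call something like `from_json` on the fetched file.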


Expected outcome:
New GitHub Actions that take less time to run and can perform larger tests.
Prerequisites:
GitHub Actions, Docker
Contact person:
Rob Kooper @kooper
Duration:
Flexible: can work as either a Small (175 hr) or Large (350 hr) project
Difficulty:
Medium, Large if running and updating the integration testing dashboard

SDA Dashboard

This project focuses primarily on the interactive visualization of outputs from our carbon cycle forecast and data assimilation system. It builds on a previously developed site-level R Shiny dashboard that is no longer functional, and aims to extend it to a much larger number of sites. We also hope to integrate functionality from one of our other dashboards (which visualizes spatial interactions) and advances made by external collaborators. If time permits, we’d also like to resurrect our automated email alert system.

Expected outcome:
The aims here are:
  1. Resurrect a previously developed R Shiny dashboard for our carbon cycle forecast system, potentially integrating work done by the Ecological Forecasting Initiative on their dashboard and FMI’s Field Observatory
  2. Merge in the functionality from our data assimilation dashboard
  3. Resurrect the automated email alert system, which sent users a subset of visualizations and links to the full app for the sites they are interested in.
Prerequisites:
R, R Shiny, data visualization
Contact person:
Mike Dietze @Dietze, @HenriKajasilta
Duration:
Flexible: can work as either a Small (175 hr) or Large (350 hr) project
Difficulty:
Medium

Finish and Deploy New Website

This project includes migrating content and deploying a new version of the PEcAn website. The code hosted at https://github.com/PecanProject/web was developed during a previous GSoC.

Expected outcome:
The primary aims are to migrate content from the old website and then deploy the new PEcAn website, as described in pecanproject/web Issue 11. Additional ideas may be proposed.
Prerequisites:
Experience with JavaScript, Git, and web development preferred.
Contact person:
David LeBauer @dlebauer, Eshan Tripathi @eshan
Duration:
Small (175hr)
Difficulty:
Easy