GSoC - PEcAn Project Ideas

Ecosystem science has many components, and so does PEcAn! You can contribute to some of those components. Below is a list of potential ideas. Feel free to contact any of the mentors on Slack, or ask questions in our #gsoc-2021 channel there.


Project Ideas

 

Extend API

Last year we started building an API for PEcAn. This was an enormous success, and the scientists loved the approach. We would like to expand this API and make more functionality available through it.

Expected outcome: More functions available through the API, especially options to query the database.

Prerequisites: Knowledge of R and REST

Contact person: Rob Kooper @kooper

 

Testing Dashboard

This project covers integration testing, test-suite development, and a PEcAn "status board" (i.e. a place where we can easily see which models, inputs, etc. are currently working and which are down). The goal is to run multiple integration tests in different locations with different models and show the results of these tests. The dashboard can also show historical data on runtimes and past success rates.

Expected outcome: A Dashboard that shows all test runs and their results

Prerequisites: R and HTML.

Contact person: Mike Dietze, @dietze

 

Kubernetes

There is a Helm chart that deploys PEcAn in Kubernetes. This project would expand that chart to add autoscaling, and would split the PEcAn executor container into smaller pieces.

Expected outcome: A Helm chart that installs PEcAn in Kubernetes and scales the models up and down as needed.

Prerequisites: R, Docker, and Kubernetes.

Contact person: Rob Kooper, @kooper

 

PEcAn packages on CRAN

Currently PEcAn is not available on CRAN. This makes it harder to install, since the user must first download all the code and then install it. Adding PEcAn packages to CRAN will make them not only easier to install, but also easier to find and easier to use as standalone modules. This will require fixing warnings in the build process, removing unnecessary dependencies, and potentially splitting modules.

Expected outcome: PEcAn packages available on CRAN.

Prerequisites: R and comfort with the key steps required to release a package on CRAN. Experience with R packages is helpful, but most of the process is covered in the chapters on R package releases in the book ‘rOpenSci packages’ and the book ‘R packages’ by Hadley Wickham.

Contact person: Chris Black, @infotroph

Drought-2018 Product

Most models require both meteorological data and flux data to inform and improve their predictions and to capture the ecosystem dynamics in an area. This idea is to add the Drought-2018 ecosystem eddy covariance flux product for 52 stations to PEcAn's data streams.

Expected outcome: The Drought-2018 product is available in PEcAn.

Prerequisites: R and Python.

Contact person: Istem Fer, @istfer

 

Additional Input Data

One of the goals of PEcAn is to be able to compare different ecological models and compare their output with actual measurements. These models will be able to create better predictions when we have more diverse data available. Example input products we are looking at include: ORNL DAAC subset tool (MODIS/VIIRS/Daymet); GES DISC (LDAS, others?); NLCD; SMAP; NASA CMS; GEDI; ECOSTRESS.

Expected outcome: Additional inputs are implemented and made available through the web interface.

Prerequisites: R.

Contact person: Depends on the input data model; please check on Slack

 

Webpage updates

The current webpage for PEcAn is a mixture of auto-generated documentation from R, documentation and example code written in RMarkdown, and webpages written in HTML. We would like to have a system that takes all these different sources and automatically creates a consistent-looking webpage.

Expected outcome: A consistent-looking webpage that is automatically updated, with historical manuals available.

Prerequisites: R, HTML, and GitHub Actions.

Contact person: Prakher Prashank

 

Metadata Upload Interface

We have developed the BETYdb-YABA app to support insertion of metadata into BETYdb, including sites and experimental data. The app is built on the YABA API. The user interface provides visualizations prior to uploading to support QA/QC. Currently, the app uses an API written in Node.js for user validation and visualization. We want to extend and migrate this functionality into the YABA API Python codebase and add additional data validation steps.

Expected outcome: YABA API with functionality for data validation and visualization.

Prerequisites: Python, Flask, SQLAlchemy, and PostgreSQL.

Contact person: David LeBauer, @dlebauer, Kristina Riemer, @Kristina Riemer


Older Project Ideas

Google Earth Engine

Establishing GEE and PEcAn communication. Google Earth Engine (GEE) is a cloud service that makes remote sensing data available for Earth System science. GEE has an API through which certain geospatial analyses can be performed. We want to establish an efficient GEE-PEcAn link to ingest the outputs of such analyses into the PEcAn workflow seamlessly. The PEcAn team will provide the initial example code that runs on the GEE servers. The GSoC participant will build the link that pulls the results from GEE into PEcAn and that submits the code remotely to the GEE servers through the PEcAn workflow.

Expected outcome: An application and/or set of functions that automates the connection between GEE and the PEcAn workflow.

Prerequisites: Experience with R and Python is required; familiarity with GEE is a plus

Contact person: Istem Fer, @istfer

 

GitHub Actions

Currently PEcAn is built and tested using GitHub, Travis, Docker Hub, and some custom scripts. The goal is to replace this mixture with GitHub Actions to create a stable build environment in a single place.

Expected outcome: Conversion from the mixture of build pieces to GitHub Actions.

Prerequisites: Familiarity with building software; familiarity with GitHub Actions, Docker, and Travis is a plus

Contact person: Rob Kooper @kooper, or Chris Black @infotroph

 

Leverage FIADB

Some of our models can use initial data about trees and plants. One of the databases we have worked with in the past is the FIADB. We have code that can connect to the FIADB, get the right data, and format it for the ED model. The database is downloadable as a set of CSV files together with some schema definitions. This project would write scripts to download the data and load it into PostgreSQL, and update the code that creates the initializations for the ED model.

Expected outcome: Scripts to download the FIADB and install it into PostgreSQL.

Prerequisites: Knowledge of R and PostgreSQL

Contact person: Rob Kooper @kooper and Ankur Desai @ankur

 

Singularity

Use Singularity to run models, and convert model Docker images to Singularity images. From a Docker web image, launch a qsub job on an HPC system that runs the Singularity images there to do the actual model runs. Add the ability to leverage HTCondor to run the models in a Condor environment.

Prerequisites: Docker and Singularity; HPC experience is a plus

Contact person: Rob Kooper, @kooper

 

Data Management

PEcAn is written to allow for distributed execution. Currently the Docker model requires a single shared data folder. This piece would expand the API endpoint to securely fetch the data needed for a run and send the resulting data back. It should also be able to cache some of the data locally.

Prerequisites: R, Docker.

Contact person: Rob Kooper, @kooper

 

Data Ingestion Shiny App

Ecosystem modeling relies heavily on fusing data from multiple sources. Whether the data are used to calibrate a model or to benchmark a model result, they come from different sources that vary in their formats and naming conventions. The difference in semantics creates a bottleneck, as no central ontology exists to translate and relate the measurements from different sites, experiments, and/or databases. To alleviate this issue, this project’s goal is to improve upon the existing Shiny data ingestion app. The two main tasks will be to refine the existing interface and then to add a machine learning component to ease the process of matching formats and variables as new data is added.

Expected outcome: A Shiny app that facilitates easy data ingestion and learns to suggest existing variables in the database that match the data being uploaded.
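
As a rough illustration of the "suggest a match" step only, the sketch below ranks known variable names by plain string similarity to an uploaded column name. It is not machine learning and not PEcAn code: the variable list, the function name `suggest_matches`, and the cutoff are all hypothetical, and it is written in Python with the standard library purely for brevity (the actual app is R/Shiny).

```python
from difflib import get_close_matches

# Hypothetical variable names standing in for what the database might hold.
KNOWN_VARIABLES = ["NEE", "GPP", "LAI", "soil_moisture", "air_temperature"]


def suggest_matches(uploaded_name, known=KNOWN_VARIABLES, n=3, cutoff=0.6):
    """Return up to n known variable names that resemble uploaded_name.

    Matching is case-insensitive; results keep the original spelling
    from the known list, best match first.
    """
    # Map lowercase spelling back to the original spelling.
    lookup = {name.lower(): name for name in known}
    hits = get_close_matches(
        uploaded_name.lower(), list(lookup), n=n, cutoff=cutoff
    )
    return [lookup[h] for h in hits]


if __name__ == "__main__":
    for column in ["Soil_Moisture", "soilmoisture", "zzzz"]:
        print(column, "->", suggest_matches(column))
```

A learned matcher could replace the similarity score while keeping the same interface: take an uploaded name, return ranked candidates for the user to confirm.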

Prerequisites: Proficiency in R. Interest in data provenance, SQL (PostgreSQL preferred), and Shiny

Contact person: Check on slack

 

Scientific Visualization

Our mission is to create an ecosystem modeling toolbox that is accessible to a non-technical audience (e.g., a high school ecology classroom) while retaining sufficient power and versatility to be valuable to scientific programmers (e.g. ecosystem model developers). However, the diversity of ecosystem models and associated analyses supported by PEcAn poses logistical challenges for presentation of results, especially given the wide range of targeted users. Web-based interactive visualizations can be a powerful tool for exploring model outputs and data as well as a fun learning tool in educational environments.

Currently, PEcAn has basic support for interactive visualizations of outputs using R Shiny. We are looking for a student interested in addressing any of the following areas:

  • Improving Shiny application stability and performance, for instance through more efficient caching or lazy-loading of large outputs and data, or leveraging more efficient interactive visualization frameworks.
  • Enhancing the visual elements of our interface for starting model runs, including visualization of existing sites and input data and better UI elements for setting run options.
  • Developing novel interactive visualization tools that leverage more advanced statistical techniques, such as visualizing and applying machine learning algorithms to outputs and model-data residuals, exploring results in multivariate space.

Expected outcome: A more robust set of web-based interactive visualization tools for model simulations and user-provided data.

Prerequisites: Familiarity with R Shiny, including the ability to work with and debug these tools in a remote, Unix-based CLI environment, is a requirement. Proficiency with SQL (especially PostgreSQL), HDF/NetCDF formats, and/or advanced statistics (e.g. multivariate regression, time series analysis, information theory) is preferred. Experience with other web-based interactive visualization frameworks, such as JavaScript’s D3, is a plus.

Contact person: Hamze Dokoohaki @Hamze



Extend Analysis

PEcAn offers multiple analyses on top of a simple execution of an ecosystem model. Currently, you must write a custom script or start a run again from scratch if you would like to perform one of these analyses on an existing model run. To alleviate this problem, this project will entail creating a Shiny app that facilitates taking an existing model run and initiating analyses on it.

Expected outcome: A Shiny app that walks a user through selecting an existing workflow and then choosing from a set of analyses to apply to that workflow.

Prerequisites: Experience with R is required, and knowledge of Shiny is preferred

Contact person: Check on Slack