Ecosystem science has many components, and so does PEcAn! The areas where you can contribute include, but are not limited to, the following:
Data Assimilation

PEcAn provides a Bayesian data assimilation framework for synthesizing data with models. However, such workflows can be computationally very expensive due to the complexity of both the models and the Bayesian methods. PEcAn already has a data assimilation module implemented in R that runs locally. Our goals are to 1) make this workflow compatible with HPC environments (e.g. remote execution, parallelization) and 2) implement massively parallelizable particle filter methods using Graphics Processing Units (GPUs).
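The GPU-friendly core of a particle filter is the resampling step, where every output particle can be computed independently. A minimal pure-Python sketch of that step (the 1-D Gaussian likelihood and all variable names here are illustrative assumptions, not PEcAn's actual module):

```python
import math
import random

def normalized_weights(particles, observation, obs_sd):
    # Gaussian likelihood of a 1-D observation under each particle
    # (illustrative only; real models and likelihoods are more complex).
    w = [math.exp(-0.5 * ((p - observation) / obs_sd) ** 2) for p in particles]
    total = sum(w)
    return [wi / total for wi in w]

def systematic_resample(particles, weights, u=None):
    # Systematic resampling: each output slot i looks up its ancestor
    # independently given the cumulative weights, which is why this step
    # maps well onto GPUs (one thread per output particle).
    n = len(particles)
    if u is None:
        u = random.random() / n  # single offset in [0, 1/n)
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    resampled, j = [], 0
    for i in range(n):
        pos = u + i / n
        while cumulative[j] < pos:
            j += 1
        resampled.append(particles[j])
    return resampled

particles = [0.0, 1.0, 2.0, 3.0]
weights = normalized_weights(particles, observation=2.1, obs_sd=0.5)
print(systematic_resample(particles, weights, u=0.1))  # [2.0, 2.0, 2.0, 3.0]
```

On a GPU the cumulative sum would become a parallel prefix scan and the lookup loop a per-thread binary search, which is what makes the method massively parallelizable.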
Expected outcome: Implementation of the data assimilation module in HPC environments, and application of massively parallelizable particle filter algorithms in the data assimilation module using GPU programming.
Prerequisites: R and C/C++ experience required; experience with or willingness to learn GPU programming; a Bayesian statistics background is not required, but candidates with one will be preferred.
Contact person: Istem Fer, istfer[at]bu.edu

Database Optimization
At PEcAn, we want to build tools that make science more reproducible. A huge part of reproducibility is archiving information: we archive data, model configurations, and results, everything you’d need to recreate the experiment and then some. This means that our database needs to be efficient, manageable, and well curated. Right now, our database errs toward archiving everything and creates redundant or unused records. With your help we can create smarter standards for record creation and archival.
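One building block of such a curation tool is a query that flags redundant records. A sketch using an in-memory SQLite table (the table name, columns, and data are hypothetical stand-ins; the real BETY/PEcAn schema differs and runs on PostgreSQL):

```python
import sqlite3

# Hypothetical "inputs" table standing in for a real PEcAn table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inputs (id INTEGER PRIMARY KEY, site_id INTEGER, file_path TEXT)"
)
conn.executemany(
    "INSERT INTO inputs (site_id, file_path) VALUES (?, ?)",
    [(1, "/data/a.nc"), (1, "/data/a.nc"), (2, "/data/b.nc")],
)

# Group records that share the same payload; any group with count > 1 is a
# candidate for merging or deletion during pre-release curation.
dupes = conn.execute("""
    SELECT site_id, file_path, COUNT(*) AS n
    FROM inputs
    GROUP BY site_id, file_path
    HAVING n > 1
""").fetchall()
print(dupes)  # [(1, '/data/a.nc', 2)]
```

The same GROUP BY / HAVING pattern works unchanged in PostgreSQL; a real tool would also record provenance before deleting anything.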
Expected outcome: A set of database management tools for pre-release database curation.
Prerequisites: Proficiency in R and SQL (PostgreSQL preferred). Interest in data provenance.
Contact person: Tess McCabe, tmccabe[at]bu.edu

Scientific Visualization
Our mission is to create an ecosystem modeling toolbox that is accessible to a non-technical audience (e.g., a high school ecology classroom) while retaining sufficient power and versatility to be valuable to scientific programmers (e.g. ecosystem model developers). However, the diversity of ecosystem models and associated analyses supported by PEcAn poses logistical challenges for presentation of results, especially given the wide range of targeted users. Web-based interactive visualizations can be a powerful tool for exploring model outputs and data as well as a fun learning tool in educational environments.
Currently, PEcAn has basic support for interactive visualizations of outputs using R Shiny. We are looking for a student interested in extending and improving these visualizations.
Expected outcome: A more robust set of web-based interactive visualization tools for model simulations and user-provided data.
Contact person: Alexey Shiklomanov, ashiklom[at]bu.edu

BETY Port from Ruby to Python
The BETY database is used by multiple projects, including PEcAn and TERRA-REF. The database schema remains serviceable, but the front end is written in Ruby on Rails. While the front end has seen incremental improvements and software updates, it has remained essentially the same for the last six years. The group currently has no full-time Ruby developers, but we do have substantial Python development expertise. The goal is to convert the Ruby front end to a different language, such as Python.
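Since the schema stays fixed, the port is largely about mapping existing tables onto a new ORM layer. A stdlib-only sketch of that mapping idea (the `traits` columns and data shown are simplified, hypothetical stand-ins for the real BETY schema; in a Django port, each table would become a `models.Model` subclass and the ORM would generate the SQL written by hand here):

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical, simplified stand-in for one BETY table.
@dataclass
class Trait:
    id: int
    species: str
    variable: str
    mean: float

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traits (id INTEGER PRIMARY KEY, species TEXT,"
    " variable TEXT, mean REAL)"
)
conn.execute("INSERT INTO traits VALUES (1, 'Acer rubrum', 'SLA', 19.5)")

# Map a row back onto a typed Python object, which is the job an ORM
# (Django, SQLAlchemy, ...) automates across the whole schema.
row = conn.execute(
    "SELECT id, species, variable, mean FROM traits WHERE id = 1"
).fetchone()
trait = Trait(*row)
print(trait.species)  # Acer rubrum
```

Because the schema is shared with Ruby on Rails, the Python models must map onto the existing tables rather than generate new migrations.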
Expected outcome: The BETY front end is rewritten in another language, such as Python. The new front end should be configurable to match different projects and should be skinnable.
Prerequisites: Experience with Python and an ORM framework such as Django; Ruby experience helpful.
Related GitHub issues: see the BETY GitHub issues
Contact person: Rob Kooper, kooper[at]illinois.edu

Remote Execution
We are currently in the process of converting PEcAn to use Docker containers. We would like to use either Kubernetes or HPC resources to run and scale multiple instances of these containers. To run on HPC resources, we will need to either use Singularity or find resources that allow Docker containers, such as Google App Engine. To do this, we need to securely set up a pipeline that allows us to start and stop the right containers in HPC environments, or an easy way to scale containers on a cloud computing platform.
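The two launch paths above could share one dispatch layer that only differs in the command it emits. A hedged sketch (the deployment name, image name, and run script are hypothetical; `kubectl scale`, `sbatch --array`, and `singularity exec docker://` are the real CLI idioms, but the surrounding workflow is an assumption, and the commands are built rather than executed here):

```python
def launch_command(n_runs: int, backend: str,
                   image: str = "pecan/model:latest") -> str:
    """Build (but do not run) the shell command that would start n_runs
    model containers on the chosen backend."""
    if backend == "kubernetes":
        # Scale a pre-existing Deployment to n_runs replicas; Kubernetes
        # schedules the pods and scales back down when asked.
        return f"kubectl scale deployment pecan-model --replicas={n_runs}"
    if backend == "hpc":
        # Submit a Slurm job array; each array task runs the same Docker
        # image via Singularity, which needs no root on the cluster.
        return (f"sbatch --array=1-{n_runs} --wrap "
                f"'singularity exec docker://{image} /work/run_model.sh'")
    raise ValueError(f"unknown backend: {backend!r}")

print(launch_command(1000, "kubernetes"))
print(launch_command(1000, "hpc"))
```

Keeping the backend choice behind one function like this is what lets the same ensemble request target either a cloud platform or an HPC site.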
Expected outcome: The user can launch an ensemble of 1000 runs, and the framework will automatically either scale the cloud platform to handle that many requests (and scale down afterwards) or use Singularity to submit a job on an HPC site.
Prerequisites: Experience with Docker; Google App Engine and/or Singularity + HPC experience is a plus.
Related GitHub issues: #1391
Contact person: Rob Kooper, kooper[at]illinois.edu; Alexey Shiklomanov, ashiklom[at]bu.edu

Distributed Computing
The database that sits underneath PEcAn is built on top of PostgreSQL. Recently, we have built the capability to sync databases across multiple machines in our network using a THREDDS API. We wish to solidify this infrastructure within the context of a multi-model uncertainty analysis project that is currently underway. The goal is to run multiple models, with different sets of data from different sites, on multiple machines, and to leverage the distributed network to pull and push files between machines at different institutions. This is a critical feature of the PEcAn cyberinfrastructure, as sharing files across a network of machines is a key development goal toward a truly distributed network. Beyond solidifying the existing infrastructure, we wish to add the ability to process files and share intermediary files as well. A few features we wish to add: pulling and pushing subsets of files to grab only specific variables from results files, and requesting that calculations be performed on files on remote machines.
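Pulling only specific variables, rather than whole result files, is exactly what the THREDDS NetCDF Subset Service exposes via URL parameters. A sketch that builds such a request (the host and dataset path are hypothetical; `var`, `time_start`, `time_end`, and `accept` follow the NCSS convention, though the exact parameters a given server accepts should be checked against its documentation):

```python
from urllib.parse import urlencode

def ncss_subset_url(base: str, dataset: str, variables, start: str, end: str) -> str:
    """Build a THREDDS NetCDF Subset Service request that pulls only the
    named variables for a time window, instead of the whole results file."""
    query = urlencode({
        "var": ",".join(variables),   # comma-separated variable list
        "time_start": start,
        "time_end": end,
        "accept": "netcdf",           # ask the server for a NetCDF response
    })
    return f"{base}/thredds/ncss/{dataset}?{query}"

# Hypothetical PEcAn results file on a remote machine's THREDDS server.
url = ncss_subset_url("https://pecan.example.edu", "outputs/run42/out.nc",
                      ["GPP", "NEE"], "2004-01-01", "2004-12-31")
print(url)
```

Because the subsetting happens server-side, only the requested variables cross the network, which is the point of pulling subsets between institutions.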
Expected outcome: Execute model runs on multiple machines, use the THREDDS API to pull the result files onto one machine, and perform calculations on those files.
Prerequisites: Database, PostgreSQL, R, experience with git is a plus.
Contact person: Tony Gardella, tonygard[at]bu.edu

Linking Databases and Data Types
Ecosystem modeling relies heavily on fusing data from multiple sources. Whether the data are used to calibrate a model or to benchmark model results, they come from different sources that vary in their formats and naming conventions. These semantic differences create a bottleneck, since no central ontology exists to translate and relate measurements from different sites, experiments, and databases. To alleviate this issue, this project aims to systematize PEcAn’s framework for synthesizing data with ecosystem models. We want to cross-link databases such as DataONE, NEOTOMA, PANGAEA, and the International Tree-Ring Data Bank to be able to pull in ecological data, leveraging PEcAn’s ability to store metadata and process data on the fly in order to solidify a common ontology of ecological data that our community can use.
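At its simplest, the missing ontology is a crosswalk from each source's variable names to one shared vocabulary. A sketch of that idea (every mapping below is a made-up example for illustration; a real crosswalk would be curated by the community and stored in the database, not hard-coded):

```python
# Hypothetical crosswalk from (source database, local variable name) to a
# shared standard name. Real entries would be curated, not hard-coded.
CROSSWALK = {
    ("NEOTOMA", "pollen_count"): "pollen_abundance",
    ("ITRDB", "ring_width"): "tree_ring_width",
    ("AmeriFlux", "FC"): "net_ecosystem_exchange",
}

def to_standard_name(source: str, name: str) -> str:
    """Translate a source-specific variable name into the shared ontology,
    failing loudly when a term has not been mapped yet."""
    try:
        return CROSSWALK[(source, name)]
    except KeyError:
        raise KeyError(f"no mapping for {name!r} from {source!r}") from None

print(to_standard_name("ITRDB", "ring_width"))  # tree_ring_width
```

Failing loudly on unmapped terms matters here: silently passing unknown names through is how semantic mismatches leak into calibration and benchmarking.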
Expected outcome: A framework supporting external database connections that pull ecological data into the PEcAn workflow for calibration and validation of model runs.
Prerequisites: Experience with database management systems and R is required.
Contact person: Tony Gardella, tonygard[at]bu.edu