
PEcAn


GSoC - PEcAn Project Ideas

Ecosystem science has many components, and so does PEcAn! Some of the components you can contribute to include, but are not limited to:

  • Scientific visualization – One of PEcAn's greatest strengths is that we provide accessible tools for a broader community. Help us develop powerful visualization tools and a well-built web API for a stronger information flow.
  • High performance computing – PEcAn takes a Bayesian approach to bringing ecosystem models and data together. Help us implement and optimize our algorithms for high performance computing environments.
  • Linking databases – PEcAn develops tools to make predictions for the future, but we do not have data from the future (yet!) to validate our predictions. So it makes sense to predict the past using the same models and let data from the past inform us. Help PEcAn talk to the databases from which we can pull these paleo data.
  • Distributed computing – PEcAn is growing: we incorporate more models, more data, and more teams every day. Help us design and maintain a robust cyberinfrastructure.
  • New instance setup – We want to see more teams in our network, but we also want joining it to be as trouble-free as possible. Help us come up with a system that automates this process.
  • Dockerization – We distribute PEcAn as a fully functional virtual application that runs on a wide range of operating systems, but we want even more flexibility! Help us move to a Dockerized system.
  • Documentation/Education – Did we mention that one of PEcAn's greatest strengths is that we provide accessible tools for a broader community? Help us make it even more accessible with clean and clear documentation.
    Scientific visualization

    Our mission is to create an ecosystem modeling toolbox that is accessible to a non-technical audience (e.g., a high school ecology classroom) while retaining enough power and versatility to be valuable to scientific programmers (e.g., ecosystem model developers). However, the diversity of ecosystem models and associated analyses supported by PEcAn makes presenting results a logistical challenge, especially given the wide range of targeted users. Web-based interactive visualizations can be a powerful tool for identifying patterns in model outputs and for validating those outputs against data.

    Expected outcome: A set of web-based tools for interactive visualization of model simulations and user-provided data, including the ability to plot results as time series, do pairwise comparisons of different variables, and apply statistical tests and transformations to the results.
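    Behind any interactive plot of model output against observations sits a small set of summary statistics. The Python sketch below is illustrative only: the function name compare_series and the choice of bias, RMSE and Pearson correlation are assumptions, not an existing PEcAn API. It shows a model-data comparison on aligned time steps, with an optional transformation (e.g. a log transform) applied before the statistics are computed.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def compare_series(
    model: Sequence[float],
    observed: Sequence[float],
    transform: Optional[Callable[[float], float]] = None,
) -> Dict[str, float]:
    """Summary statistics for model output vs. observations on aligned time steps.

    Hypothetical helper: a visualization tool could show these next to a
    time-series plot or a pairwise scatter plot.
    """
    if len(model) != len(observed) or len(model) < 2:
        raise ValueError("series must be aligned and contain at least two points")
    # Optional transformation (e.g. math.log10) applied to both series first.
    if transform is not None:
        model = [transform(x) for x in model]
        observed = [transform(y) for y in observed]

    n = len(model)
    residuals = [m - o for m, o in zip(model, observed)]
    bias = sum(residuals) / n                          # mean error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Pearson correlation; undefined (NaN) if either series is constant.
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, observed))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_m * var_o) if var_m > 0 and var_o > 0 else float("nan")

    return {"bias": bias, "rmse": rmse, "pearson_r": r}
```

    For example, compare_series(model_npp, observed_npp, transform=math.log10) would compare two positive series on a log scale; the variable names here are hypothetical.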

    Prerequisites: Familiarity with web-based interactive visualization frameworks such as R’s Shiny or JavaScript’s D3 is required. Demonstrated proficiency with R, SQL (especially PostgreSQL), HDF/NetCDF formats, and/or advanced statistics (e.g., multivariate regression, time series analysis, information theory) is preferred.

    Related GitHub issues: #1245 - #921

    Contact person: Alexey Shiklomanov, ashiklom[at]bu.edu

    High performance computing

    PEcAn provides a Bayesian data assimilation framework for synthesizing data with models. However, Markov chain Monte Carlo (MCMC) methods, the backbone of Bayesian computation, are computationally expensive. PEcAn's data assimilation module already has a Metropolis-Hastings MCMC algorithm natively implemented in R, and it also supports external packages that provide other samplers. Our goal is to implement and optimize massively parallelizable MCMC algorithms on graphics processing units (GPUs) to further accelerate these computations.
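    PEcAn's own sampler is written in R; the Python sketch below only illustrates the random-walk Metropolis-Hastings idea and one reason it parallelizes well: independent chains share no state, so each could in principle be mapped to its own GPU thread or block. This is a plain CPU implementation, not GPU code, and running many independent chains is only one parallelization strategy (evaluating an expensive likelihood in parallel within each iteration is another). The names metropolis_hastings and run_chains are hypothetical.

```python
import math
import random
from typing import Callable, List


def metropolis_hastings(
    log_post: Callable[[float], float],
    x0: float,
    n_iter: int,
    step: float,
    rng: random.Random,
) -> List[float]:
    """Random-walk Metropolis-Hastings for a single scalar parameter."""
    x, lp = x0, log_post(x0)
    samples: List[float] = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the proposal
        # densities cancel because the proposal is symmetric.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples


def run_chains(
    log_post: Callable[[float], float],
    n_chains: int,
    n_iter: int,
    step: float,
    seed: int = 0,
) -> List[List[float]]:
    """Run independent chains, each with its own random-number stream.

    Nothing is shared between iterations of this loop, which is what makes
    the chains natural candidates for parallel (e.g. GPU) execution.
    """
    chains = []
    for c in range(n_chains):
        rng = random.Random(seed + c)
        x0 = rng.uniform(-5.0, 5.0)           # dispersed starting points
        chains.append(metropolis_hastings(log_post, x0, n_iter, step, rng))
    return chains
```

    With a standard normal target, run_chains(lambda x: -0.5 * x * x, n_chains=4, n_iter=5000, step=1.0) should give pooled post-burn-in samples with mean near 0 and variance near 1.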

    Expected outcome: Implementation of massively parallelizable MCMC algorithms in the data assimilation module using GPU programming.

    Prerequisites: R and C/C++/Fortran experience, experience with CUDA/OpenACC or willingness to learn. Some Bayesian statistics background is desirable but not required.

    Contact person: Istem Fer, istfer[at]bu.edu

    Linking databases

    Ecosystem models usually do a pretty good job of predicting what we observe today, but when we forecast into the future, they tend to disagree with each other. To find out which model gets it right (if any), we test them against paleo-data and make improvements accordingly. Fossil pollen and tree-ring data are particularly informative about past vegetation. PEcAn already has a framework to synthesize paleo-data with ecosystem models, but we lack a systematic way of bringing paleo-data into the PEcAn workflow. In this project we want to cross-link to databases such as NEOTOMA, PANGAEA and The International Tree-Ring Data Bank to pull in paleoecological data, using the rOpenSci packages neotoma and pangaear as clients to these databases.
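    One way to give each external database its own client while handing the workflow a single record format is an adapter per source. The Python sketch below is a hypothetical design: PaleoRecord, PaleoSource, InMemorySource, gather and the record fields are illustrative names, not part of PEcAn, neotoma or pangaear. Real adapters would wrap the rOpenSci clients (in R); here an in-memory stub stands in for them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PaleoRecord:
    """Common record format passed to the workflow (hypothetical schema)."""
    source: str
    site_id: str
    age_bp: float     # age in years before present
    variable: str     # e.g. "pollen_count" or "ring_width"
    value: float
    units: str


class PaleoSource(ABC):
    """One adapter per external database (Neotoma, PANGAEA, ITRDB, ...)."""
    name: str

    @abstractmethod
    def fetch(self, site_id: str) -> List[PaleoRecord]:
        """Return all records for one site, already in the common format."""


Row = Tuple[str, float, str, float, str]  # (site_id, age_bp, variable, value, units)


class InMemorySource(PaleoSource):
    """Stand-in for a real database client, returning canned rows."""

    def __init__(self, name: str, rows: Sequence[Row]) -> None:
        self.name = name
        self._rows = list(rows)

    def fetch(self, site_id: str) -> List[PaleoRecord]:
        return [
            PaleoRecord(self.name, sid, age, var, val, units)
            for (sid, age, var, val, units) in self._rows
            if sid == site_id
        ]


def gather(sources: Sequence[PaleoSource], site_id: str) -> List[PaleoRecord]:
    """Pull one site's records from every source and sort them by age."""
    records: List[PaleoRecord] = []
    for src in sources:
        records.extend(src.fetch(site_id))
    return sorted(records, key=lambda r: r.age_bp)
```

    Adding a new database then means writing one more PaleoSource subclass; the code that consumes PaleoRecord objects does not change.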

    Expected outcome: A framework that supports external database connections to pull paleoecological data into the PEcAn workflow for calibration and validation of model runs.

    Prerequisites: Experience with database management systems and R is required.

    Related GitHub issues: #492 - #743

    Contact person: Istem Fer, istfer[at]bu.edu

    Distributed computing

    Currently the database that sits underneath PEcAn is built on PostgreSQL. To allow any site to add new records, we have partitioned the database IDs so that each site is allocated its own range of one billion IDs. This allows us to sync parts of the database between sites. The biggest open question is how to handle changes to records that are owned by a remote site. The goal is for users to make changes to their local database and have those changes distributed to other sites. We currently envision something like git's push and pull model, but are open to other solutions. Any solution will need to work for users who are offline.
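    The ID partitioning makes ownership computable from the ID alone. The Python sketch below assumes that site k owns IDs k·10^9 + 1 through (k+1)·10^9 (the exact offsets are an assumption) and adds a hypothetical offline change log that groups pending edits by owning site before a push. Routing edits of remote-owned records to their owner is just one possible policy, and conflict resolution is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

IDS_PER_SITE = 1_000_000_000  # each site is allocated one billion IDs


def id_range(site: int) -> range:
    """IDs owned by `site`, assuming site k owns (k * 1e9, (k + 1) * 1e9]."""
    start = site * IDS_PER_SITE + 1
    return range(start, start + IDS_PER_SITE)


def owner_of(record_id: int) -> int:
    """The site that allocated `record_id` under the assumed layout."""
    return (record_id - 1) // IDS_PER_SITE


@dataclass
class ChangeLog:
    """Hypothetical queue of local edits, recorded offline and pushed later."""
    local_site: int
    pending: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, table: str, record_id: int, changes: Dict[str, Any]) -> None:
        owner = owner_of(record_id)
        self.pending.append({
            "table": table,
            "id": record_id,
            "owner": owner,
            "local": owner == self.local_site,  # edits to our own records
            "changes": changes,
        })

    def push(self) -> Dict[int, List[Dict[str, Any]]]:
        """Group pending edits by owning site and clear the local queue."""
        batches: Dict[int, List[Dict[str, Any]]] = {}
        for change in self.pending:
            batches.setdefault(change["owner"], []).append(change)
        self.pending = []
        return batches
```

    In this sketch a site that is offline simply keeps appending to pending; calling push() once it reconnects yields one batch per owning site.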

    Expected outcome: The ability for every site to modify its database and have those changes pushed to other sites' databases.

    Prerequisites: Experience with databases, especially PostgreSQL; experience with git is a plus.

    Contact person: Rob Kooper, kooper[at]illinois.edu

    New instance setup

    When a user creates a new instance of PEcAn, it starts with a clean database that has either no users or only the default user. Using the default user as an admin is a security issue for instances that are online. Instead, we envision an admin page where the user registers the new server and obtains a new ID, adds an admin user to the database, sets up database parameters, and so on. Currently most of this is done by editing config.php. Such a page could also help with backups, updates, etc.
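    The wizard's job can be thought of as collecting a few required pieces of state and refusing to finish until all of them are present. PEcAn's web UI is written in PHP; the Python sketch below only illustrates the flow, and the class InstanceSetup, its methods, and the use of a salted PBKDF2 hash for the admin password are assumptions, not existing PEcAn code. Obtaining the server ID from the other PEcAn servers is out of scope here, so the ID is simply passed in.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict

ITERATIONS = 100_000  # PBKDF2 work factor (illustrative value)


def _hash_password(password: str, salt: bytes) -> str:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS).hex()


@dataclass
class InstanceSetup:
    """State gathered by a hypothetical first-run setup wizard."""
    db_params: Dict[str, str] = field(default_factory=dict)
    server_id: int = -1
    admin: Dict[str, str] = field(default_factory=dict)

    def set_database(self, host: str, dbname: str, user: str) -> None:
        self.db_params = {"host": host, "dbname": dbname, "user": user}

    def register(self, assigned_id: int) -> None:
        # A real wizard would request this ID from the other PEcAn servers.
        if assigned_id < 0:
            raise ValueError("server ID must be non-negative")
        self.server_id = assigned_id

    def create_admin(self, login: str, password: str) -> None:
        # Store only a salted hash, so no default admin credentials ship.
        salt = secrets.token_bytes(16)
        self.admin = {"login": login, "salt": salt.hex(),
                      "hash": _hash_password(password, salt)}

    def check_admin(self, password: str) -> bool:
        if not self.admin:
            return False
        candidate = _hash_password(password, bytes.fromhex(self.admin["salt"]))
        return hmac.compare_digest(candidate, self.admin["hash"])

    def is_complete(self) -> bool:
        """The wizard may finish only when every step has been done."""
        return bool(self.db_params) and self.server_id >= 0 and bool(self.admin)
```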

    Expected outcome: A wizard that helps set up a new PEcAn system, registers it with the other PEcAn servers, and sets up a sync with those servers.

    Prerequisites: PHP is strongly recommended, since it is currently used for the web UI.

    Related GitHub issues: #1003 - #849 - #758

    Contact person: Rob Kooper, kooper[at]illinois.edu

    Dockerization

    The software stack used for PEcAn is currently distributed as a virtual machine (VM) that runs on a user's machine or on a VM server. To upgrade, the user must either download the new VM and copy the database and files from the old VM to the new one, or update the software on the existing VM. The software is already split into smaller modules, each of which could become a Docker container (database, web frontends, models). This will make it easier to deploy new versions of the software and to add new models to the system.
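    Splitting the stack into containers mostly comes down to declaring which service depends on which, so that containers can be started in a valid order and a new model becomes one more entry. In practice a Docker Compose file's depends_on key expresses this; the Python sketch below only illustrates the idea with the standard library's graphlib (Python 3.9+). The service names are hypothetical examples, not PEcAn's actual container layout.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical layout: each key would become one container, mapped to the
# services it depends on.
SERVICES: Dict[str, Set[str]] = {
    "postgres": set(),
    "web": {"postgres"},
    "model-sipnet": {"postgres"},
    "model-ed2": {"postgres"},
}


def start_order(services: Dict[str, Set[str]]) -> List[str]:
    """Order containers so each starts after everything it depends on."""
    return list(TopologicalSorter(services).static_order())


def add_model(services: Dict[str, Set[str]], name: str) -> Dict[str, Set[str]]:
    """Adding a model means adding one container that needs the database."""
    updated = dict(services)
    updated[name] = {"postgres"}
    return updated
```

    Because every model container only depends on the shared database, adding or upgrading one model does not require touching the others.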

    Expected outcome: A complete Docker stack that runs all the code currently distributed as a VM, with the ability to easily update the software and add new models.

    Prerequisites: Experience with Docker, or willingness to learn. Experience with R and Linux is a plus.

    Related GitHub issues: #1028

    Contact person: Rob Kooper, kooper[at]illinois.edu

    Documentation/Education

    Documentation is fundamental to development. PEcAn's documentation is unique in that it must serve both users and developers, with enough depth and breadth to reach an audience from a wide variety of backgrounds. In addition, because PEcAn is open source, the software is constantly evolving, which requires a flexible documentation workflow. Documenting the project will require a multifaceted approach to communicating with and training new users and developers. Beyond static text, we envision a suite of interactive tutorials, videos, and vignettes to accomplish this task.

    Expected outcome: A set of tools to train users and developers and document PEcAn. Specifically we will develop interactive tutorials, videos, and explore other options to communicate the unique products PEcAn delivers.

    Prerequisites: Experience with R and markup languages. Familiarity with graph visualization software is a plus.

    Related GitHub issues: See the Documentation tag.

    Contact person: Tony Gardella, tonygard[at]bu.edu