17.4 Sharing data
Sharing your data requires a few steps. First, before entering any data, you will need to request an ID from the PEcAn developers. Simply open an issue on GitHub and we will generate an ID for you. If possible, include the URL of your data host in the issue.
You will now need to synchronize the database again using your ID. For example, if you are given ID=42, you can use the following command:
MYID=42 REMOTEID=0 ./scripts/load.bety.sh
This will load the EBI database and set the IDs so that any data you insert will have the right ID.
To share your data, you can now run dump.bety.sh. The script is configured using environment variables; the following variables are recognized:
- DATABASE: the database where the script should write the results. The default is bety.
- PG_OPT: additional options to be added to psql (default is nothing).
- MYSITE: the ID of your site. If you have not requested an ID, use 99, which is used for all sites that do not want to share their data (i.e. VM). 99 is the default.
- LEVEL: the minimum access-protection level of the data to be dumped (0=private, 1=restricted, 2=internal collaborators, 3=external collaborators, 4=public). The default level for exported data is level 3.
- Note that currently only the traits and yields tables have restrictions on sharing. If you share data, records from all other (metadata) tables will be shared regardless of LEVEL. If you wish to extend the access_level to other tables, please submit a feature request.
- UNCHECKED: specifies whether unchecked traits and yields should be dumped. Set to YES (all caps) to dump unchecked data. The default is NO.
- ANONYMOUS: specifies whether all users should be anonymized. Set to YES (all caps) to keep the original users (INCLUDING PASSWORD) in the dump file. The default is NO.
- OUTPUT: the location on disk where the result file is written. The default is ${PWD}/dump.
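As a rough illustration of the variables above, the following sketch dumps only public, checked data for a site with ID 42. The variable names are taken from the list; the script location (`./scripts/dump.bety.sh`, assumed to sit next to `load.bety.sh`), the `DUMP_SCRIPT` variable, the `run_dump` helper, and the chosen values are assumptions made for this example only.

```shell
#!/bin/bash
# Sketch only: DUMP_SCRIPT and run_dump are hypothetical helpers,
# and the script path is an assumption, not confirmed by the text.
DUMP_SCRIPT="${DUMP_SCRIPT:-./scripts/dump.bety.sh}"

run_dump() {
  # Assignments placed before a command apply to that one invocation only.
  # MYSITE=42    : the ID you were given (42 is an example value)
  # LEVEL=4      : only dump data whose access level is public
  # UNCHECKED=NO : skip unchecked traits and yields (the documented default)
  # OUTPUT       : directory for the result file (the documented default)
  MYSITE=42 LEVEL=4 UNCHECKED=NO OUTPUT="${PWD}/dump" "$@"
}

if [ -x "${DUMP_SCRIPT}" ]; then
  run_dump "${DUMP_SCRIPT}"
else
  echo "dump script not found at ${DUMP_SCRIPT}" >&2
fi
```

Variables that are not set here (DATABASE, PG_OPT, ANONYMOUS) keep the defaults described in the list.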
NOTE: If you want your dumps to be accessible to other PEcAn servers, you need to perform the following additional steps:
- Open pecan/scripts/load.bety.sh
- In the DUMPURL section of the code, add a new record indicating where you are dumping your data. Below is the example for SITE number 1 (Boston University):
elif [ "${REMOTESITE}" == "1" ]; then
    DUMPURL="http://psql-pecan.bu.edu/sync/dump/bety.tar.gz"
- Check your Apache settings to make sure this location is public
- Commit this code and submit a Pull Request
- From the URL in the Pull Request, PEcAn administrators will update the machines table and the status map, and will notify other users to update their cron jobs (see Automation below).
Plans to simplify this process are in the works.
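To make the DUMPURL step more concrete, here is a self-contained sketch of how a lookup keyed on REMOTESITE might be extended with a new record. Only the site 1 entry comes from the text above; the `dump_url_for_site` function, site number 42, and the example.org URL are hypothetical placeholders, and the real layout of load.bety.sh may differ.

```shell
#!/bin/bash
# Sketch only: a stand-alone imitation of a DUMPURL lookup,
# not the actual code from load.bety.sh.
dump_url_for_site() {
  local REMOTESITE="$1"
  local DUMPURL=""
  if [ "${REMOTESITE}" == "1" ]; then
    # Existing record for site 1 (Boston University), as quoted in the text.
    DUMPURL="http://psql-pecan.bu.edu/sync/dump/bety.tar.gz"
  elif [ "${REMOTESITE}" == "42" ]; then
    # New record: hypothetical site ID and placeholder URL.
    DUMPURL="http://example.org/sync/dump/bety.tar.gz"
  else
    echo "unknown remote site: ${REMOTESITE}" >&2
    return 1
  fi
  echo "${DUMPURL}"
}

# Print the URL registered for site 1.
dump_url_for_site 1
```

Each new site adds one `elif` branch of the same shape, so a Pull Request for this step would normally touch only two lines.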