3 Finding and Preparing Published Data

BETYdb is designed for both previously published data and 'primary' data. Most of this documentation assumes that you have already identified a data set that you want to upload, or have a set of papers from which you would like to extract data and summary statistics.

3.1 Meta-analyses

If you are planning to do a meta-analysis, even if this is not your first time, please read "Uses and Misuses of Meta-analysis in Ecology". Many texts are available, but the recent "Handbook of Meta-analysis in Ecology and Evolution" is probably the most comprehensive and specific for plant sciences.

For a meta-analysis, the first step is to find papers that contain the target data.

The easiest approach is to use a search engine such as Web of Science, Google Scholar, or Microsoft Academic Search. Start with queries such as "scientific name + trait", and let the results guide further queries. Often, the references (particularly those of meta-analyses and reviews) and forward citations will point to other studies.

Another starting point, which also aids in documenting searches, is to submit queries programmatically. Carl Davidson wrote a Python script to search for citations based on species and trait name. In addition, the rOpenSci project has a suite of R packages for searching publications.
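As a minimal sketch of how "scientific name + trait" queries could be generated and recorded, the following builds one quoted search string per species and trait pair. It is not the script mentioned above and it does not contact any search engine; the function name `build_queries` and the example species and traits are assumptions chosen for illustration.

```python
from itertools import product


def build_queries(species, traits):
    """Return one quoted search string per (species, trait) pair."""
    return [f'"{sp}" "{tr}"' for sp, tr in product(species, traits)]


# Hypothetical example inputs; substitute the species and traits for your project.
species = ["Panicum virgatum", "Miscanthus sinensis"]
traits = ["specific leaf area", "leaf nitrogen"]

queries = build_queries(species, traits)
for query in queries:
    print(query)
```

Saving the printed list alongside the date of the search gives a simple record of which queries were run.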

3.2 Preparing Publications for Data Entry

3.2.1 Mendeley

Mendeley provides a central location for the collection, annotation, and tracking of the journal articles that we use. Features of Mendeley that are useful to us include:

  • Collaborative annotation & notes sharing
    • Text highlighter
    • Sticky notes for comments in the text
    • Notes field for text notes in the reference documentation
  • Read/unread & favorites: Papers can be marked as read or unread, and may be starred.
  • Groups
  • Tagging

Each project has two groups: "projectname" and "projectname_out" for the papers with data to be entered and for the papers with data that has been entered, respectively. Papers in the _out group may contain data for future entry (for example, traits that are not listed in Table ).

Each project manager may have one or more projects and each project should have one group. Group names should refer to plant species, plant functional types, or another project-specific name. Please make sure that David LeBauer is invited to join each project folder.

  1. Open Mendeley desktop
  2. Click Edit → New Group or press Ctrl+Shift+M
  3. Create group name following instructions above
  4. Enter group name
  5. Set Privacy Settings → Private
  6. Click Create Group
  7. Click Edit Settings
  8. Under File Synchronization, check Download attached files to group

3.2.1.1 Adding and Annotating Papers

Name groups and tag papers so that instructions for a technician need only name the folder and the tag to look for, e.g. "please enter data from projectx" or "please enter data from papers tagged y from project x". To access the full text and PDF of papers from off campus, use the UIUC VPN service. If you are managing a Mendeley folder that undergraduates are actively entering data from, please plan to spend between 15 minutes and 1 hour per week maintaining it, enough to keep up with the work that the undergraduates are doing.

3.2.1.2 Adding a reference

  • If the DOI number is available (most articles since 2000)
    1. Select project folder
    2. Right click and select Add entry manually...
    3. Paste DOI number in DOI field
    4. Select the search spyglass icon
    5. Drag and drop PDF onto the record.
  • If DOI not available:
    1. Download the paper and save as citation_key.pdf
    2. Add using the Files field
    3. The citation key should be in the format authorYYYYabc, where YYYY is the four-digit year and abc is the acronym for the first three words of the title, excluding articles (the, a, an), prepositions (on, in, from, for, to, etc.), and the conjunctions (for, and, nor, but, or, yet, so) with fewer than three letters.
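The citation key rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official tool: the function name `citation_key` is hypothetical, the key is assumed to be lowercase and built from the first author's surname, hyphenated words are split at the hyphen, and only the words listed in the rule are skipped, regardless of their length (the "etc." in the preposition list is not filled in). Surnames are assumed to use unaccented Latin letters.

```python
import re

# Words skipped when building the acronym: exactly the ones listed in the rule.
ARTICLES = {"the", "a", "an"}
PREPOSITIONS = {"on", "in", "from", "for", "to"}
CONJUNCTIONS = {"for", "and", "nor", "but", "or", "yet", "so"}
SKIP_WORDS = ARTICLES | PREPOSITIONS | CONJUNCTIONS


def citation_key(surname, year, title):
    """Build an authorYYYYabc key from a surname, a year, and a title."""
    if not 1000 <= int(year) <= 9999:
        raise ValueError("year must have four digits")
    # Assumption: keep only the letters a-z of the lowercased surname.
    author = re.sub(r"[^a-z]", "", surname.lower())
    # Split the title into lowercase alphanumeric words.
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in SKIP_WORDS]
    # Acronym: first letter of each of the first three remaining words.
    acronym = "".join(w[0] for w in kept[:3])
    return f"{author}{int(year)}{acronym}"


# Hypothetical paper, used only to show the format.
key = citation_key("Smith", 2010, "Nitrogen and water limit the growth of grasses")
print(key)  # smith2010nwl
```

When no DOI is available, the paper would then be saved as the key followed by ".pdf", as described in step 1.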

3.2.1.3 Annotating a Reference

Each week, please identify and prepare papers that you would like to be entered next by completing the following steps:

  1. Use the star label to identify the papers that you want the student to focus on next.
    • Start by keeping a minimum of 2 and a maximum of 5 papers starred at once so that students can focus on the ones that you want. Students have been entering 1-3 papers per week; once we get closer to 3-5, the minimum and maximum should change.
    • Choose papers that are the most data rich.
  2. For each paper, use comment bubbles, notes field, and highlighter to indicate:
    • Name(s) of traits to be collected
    • Methods:
      • Site name
      • Location
      • Number of replicates
      • Statistics to collect
      • Treatment(s) and control
      • Whether the study was conducted in a greenhouse, pot, or growth chamber
    • Data to collect:
      • Figure numbers and the symbols to extract data from
      • Table numbers and columns with data to collect
      • Covariates
      • Management data (for yields)
      • Units in 'to' and 'from' fields used to convert data
    • Esoteric information that other scientists or technicians might not catch and that is not otherwise recorded in the database
    • Any data that may be useful at a later date but that can be skipped for now.

Comment on or highlight the following information:

  • Sample size
  • Covariates (see table )
  • Treatments
  • Managements
  • Other information entered into the database, e.g. experimental details

3.2.1.4 Finding a citation in Mendeley

To find a citation in Mendeley, go to the project folder. By default, data entry technicians should enter data from papers that are marked with a yellow star, in the order that they were added to the list. Information and data to be collected from a paper can be found under the 'Notes' tab and in highlighted sections of the paper.
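The default ordering rule (starred papers only, oldest addition first) can be written out as a short sketch. The records and field names below are hypothetical and do not correspond to any Mendeley export format; they only illustrate the rule.

```python
from datetime import date

# Hypothetical records; the field names are assumptions for illustration.
papers = [
    {"key": "smith2010nwl", "starred": True, "added": date(2012, 3, 5)},
    {"key": "jones2008lna", "starred": False, "added": date(2012, 1, 9)},
    {"key": "lee2011gcr", "starred": True, "added": date(2012, 2, 1)},
]


def entry_queue(papers):
    """Return starred papers only, ordered by the date they were added."""
    starred = [p for p in papers if p["starred"]]
    return sorted(starred, key=lambda p: p["added"])


queue = [p["key"] for p in entry_queue(papers)]
print(queue)  # ['lee2011gcr', 'smith2010nwl']
```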

3.2.2 Recording extracted data and transformations

Google Spreadsheets are used to keep a record of any data that is not entered directly from the original publication. Please share all spreadsheets with the user betydb@gmail.com in addition to any collaborators. Record the following:

  • Any raw data that is not directly entered into the database but that is used to derive data or statistics using the equations in the table in the section "Converting Units and Adjustment to Temperature".
  • Any data extracted from figures, along with the figure number.
  • Any calculations that were made. These calculations should be included in the cells.

Each project has a Google spreadsheet with the title "project_data". In this spreadsheet, each reference should have a separate worksheet labeled with the citation key (authorYYYYabc format). Do not enter data into Excel first, as this is prone to errors, and information such as equations may be lost when uploading or copy-pasting.
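One way to catch mislabeled worksheets is to check each label against the citation key format. The sketch below assumes lowercase keys with a four-digit year, as defined for citation keys in section 3.2.1.2, followed by one to three letters or digits; the pattern, the function name `invalid_labels`, and the example labels are assumptions for illustration, and the list of labels would have to be copied from the spreadsheet by hand.

```python
import re

# Assumed pattern: lowercase author name, four-digit year, then one to three
# lowercase letters or digits for the title acronym.
KEY_PATTERN = re.compile(r"[a-z]+\d{4}[a-z0-9]{1,3}")


def invalid_labels(labels):
    """Return the worksheet labels that do not look like citation keys."""
    return [label for label in labels if not KEY_PATTERN.fullmatch(label)]


# Hypothetical worksheet labels.
labels = ["smith2010nwl", "jones201abc", "Sheet1"]
bad = invalid_labels(labels)
print(bad)  # ['jones201abc', 'Sheet1']
```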