24.3 The PEcAn docker install process in detail

24.3.1 Configure docker-compose

This section will let you download some configuration files. The documentation provides links to the latest released version (main branch in GitHub) or the develop version that we are working on (develop branch in GitHub) which will become the next release. If you cloned the PEcAn GitHub repository you can use git checkout <branch> to switch branches.

The PEcAn Docker stack is configured using a docker-compose.yml file. You can download just this file directly from GitHub latest or develop. You can also find this file in the root of cloned PEcAn GitHub repository. There is no need to edit the docker-compose.yml file. You can use either the .env file to change some of the settings, or the docker-compose.override.yml file to modify the docker-compose.yml file. This makes it easier for you to get an updated version of the docker-compose.yml file and not lose any changes you have made to it.

Some of the settings in the docker-compose.yml can be set using a .env file. You can download either the latest or the develop version. If you have cloned the GitHub repository it is also located in the docker folder. This file should be called .env and be placed in the same folder as your docker-compose.yml file. This file will allow you to set which version of PEcAn or BETY to use. See the comments in this file to control the settings. Option you might want to set are:

PECAN_VERSION : The docker images to use for PEcAn. The default is latest which is the latest released version of PEcAn. Setting this to develop will result in using the version of PEcAn which will become the next release.
PECAN_FQDN : Is the name of the server where PEcAn is running. This is what is used to register all files generated by this version of PEcAn (see also TRAEFIK_HOST).
PECAN_NAME : A short name of this PEcAn server that is shown in the pull down menu and might be easier to recognize.
BETY_VERSION : This controls the version of BETY. The default is latest which is the latest released version of BETY. Setting this to develop will result in using the version of BETY which will become the next release.
TRAEFIK_HOST : Should be the FQDN of the server, this is needed when generating a SSL certificate. For SSL certificates you will need to set TRAEFIK_ACME_ENABLE as well as TRAEFIK_ACME_EMAIL.
TRAEFIK_IPFILTER : is used to limit access to certain resources, such as RabbitMQ and the Traefik dashboard.

A final file, which is optional, is a docker-compose.override.yml. You can download a version for the latest and develop versions. If you have cloned the GitHub repository it is located in the docker folder. Use this file as an example of what you can do, only copy the pieces over that you really need. This will allow you to make changes to the docker-compose file for your local installation. You can use this to add additional containers to your stack, change the path where docker stores the data for the containers, or you can use this to open up the postgresql port.

version: "3"

services:
  # expose database to localhost for ease of access
  postgres:
    ports:
      - 5432:5432

Once you have the docker-compose.yml file as well as the optional .env and docker-compose.override.yml in a folder you can start the PEcAn stack. The following instructions assume you are in the same directory as the file (if not, cd into it).

In the rest of this section we will use a few arguments for the docker-compose application. The location of these arguments are important. The general syntax of docker-compose is docker-compose <ARGUMENTS FOR DOCKER COMPOSE> <COMMAND> <ARGUMENTS FOR COMMAND> [SERVICES]. More generally, docker-compose options are very sensitive to their location relative to other commands in the same line – that is, docker-compose -f /my/docker-compose.yml -p pecan up -d postgres is not the same as docker-compose -d postgres -p pecan up -f /my/docker-compose.yml. If expected ever don’t seem to be working, check that the arguments are in the right order.)

-f <filename> : ARGUMENTS FOR DOCKER COMPOSE : Allows you to specify a docker-compose.yml file explicitly. You can use this argument multiple times. Default is to use the docker-compose.yml and docker-compose.override.yml in your current folder.
-p <projectname> : ARGUMENTS FOR DOCKER COMPOSE : Project name, all volumes, networks, and containers will be prefixed with this argument. The default value is to use the current folder name.
-d : ARGUMENTS FOR up COMMAND : Will start all the containers in the background and return back to the command shell.

If no services as added to the docker-compose command all services possible will be started.

24.3.2 Initialize PEcAn (first time only)

Before you can start to use PEcAn for the first time you will need to initialize the database (and optionally add some data). The following two sections will first initialize the database and secondly add some data to the system.

24.3.2.1 Initialize the PEcAn database

The commands described in this section will set up the PEcAn database (BETY) and pre-load it with some common “default” data.

docker-compose -p pecan up -d postgres

# If you have a custom docker-compose file:
# docker-compose -f /path/to/my-docker-compose.yml -p pecan up -d postgres

The breakdown of this command is as follows:

-p pecan – This tells docker-compose to do all of this as part of a “project” -p we’ll call pecan. By default, the project name is set to the name of the current working directory. The project name will be used as a prefix to all containers started by this docker-compose instance (so, if we have a service called postgres, this will create a container called pecan_postgres).
up -d – up is a command that initializes the containers. Initialization involves downloading and building the target containers and any containers they depend on, and then running them. Normally, this happens in the foreground, printing logs directly to stderr/stdout (meaning you would have to interrupt it with Ctrl-C), but the -d flag forces this to happen more quietly and in the background.
postgres – This indicates that we only want to initialize the service called postgres (and its dependencies). If we omitted this, docker-compose would initialize all containers in the stack.

The end result of this command is to initialize a “blank” PostGIS container that will run in the background. This container is not connected to any data (yet), and is basically analogous to just installing and starting PostgreSQL to your system. As a side effect, the above command will also create blank data “volumes” and a “network” that containers will use to communicate with each other. Because our project is called pecan and docker-compose.yml describes a network called pecan, the resulting network is called pecan_pecan. This is relevant to the following commands, which will actually initialize and populate the BETY database.

Assuming the above has run successfully, next run the following:

docker run --rm --network pecan_pecan pecan/db

The breakdown of this command is as follows: {#docker-run-init}

docker run – This says we will be running a container.
--rm – This automatically removes the resulting container once the specified command exits, as well as any volumes associated with the container. This is useful as a general “clean-up” flag for one-off commands (like this one) to make sure you don’t leave any “zombie” containers or volumes around at the end.
--network pecan_pecan – Thsi will start the container in the same network space as the posgres container, allowing it to push data into the database.
pecan/db – This is the name of the container, this holds a copy of the database used to initialize the postgresql database.

Note that this command may throw a bunch of errors related to functions and/or operators already existing. This is normal – it just means that the PostGIS extension to PostgreSQL is already installed. The important thing is that you see output near the end like:

----------------------------------------------------------------------
Safety checks

----------------------------------------------------------------------

----------------------------------------------------------------------
Making sure user 'bety' exists.

If you do not see this output, you can look at the troubleshooting section at the end of this section for some troubleshooting tips, as well as some solutions to common problems.

Once the command has finished successfully, proceed with the next step which will load some initial data into the database and place the data in the docker volumes.

24.3.2.2 Add first user to PEcAn database

You can add an initial user to the BETY database, for example the following commands will add the guestuser account as well as the demo carya account:

# guest user
docker-compose run --rm bety user guestuser guestuser "Guest User" guestuser@example.com 4 4

# example user
docker-compose run --rm bety user carya illinois "Carya Demo User" carya@example.com 1 1

24.3.2.3 Add example data (first time only)

The following command will add some initial data to the PEcAn stack and register the data with the database.

docker run -ti --rm --network pecan_pecan --volume pecan_pecan:/data --env FQDN=docker pecan/data:develop

The breakdown of this command is as follows:

docker run – This says we will be running a specific command inside the target Docker container. See docker run --help and the Docker run reference for more information.
-ti – This is actually two flags, -t to allocate a pseudo-tty and -i to keep STDIN open even if detached. -t is necessary to ensure lower-level script commands run correctly. -i makes sure that the command output (stdin) is displayed.
--rm – This automatically removes the resulting container once the specified command exits, as well as any volumes associated with the container. This is useful as a general “clean-up” flag for one-off commands (like this one) to make sure you don’t leave any “zombie” containers or volumes around at the end.
--network pecan_pecan – This indicates that the container will use the existing pecan_pecan network. This network is what ensures communication between the postgres container (which, recall, is just a PostGIS installation with some data) and the “volumes” where the actual data are persistently stored.
pecan/data:develop – This is the name of the image in which to run the specified command, in the form repository/image:version. This is interpreted as follows:
- First, it sees if there are any images called pecan/data:develop available on your local machine. If there are, it uses that one.
- If that image version is not available locally, it will next try to find the image online. By default, it searches Docker Hub, such that pecan/data gets expanded to the container at https://hub.docker.com/r/pecan/data. For custom repositories, a full name can be given, such as hub.ncsa.illinois.edu/pecan/data:latest.
- If :version is omitted, Docker assumes :latest. NOTE that while online containers should have a :latest version, not all of them do, and if a :latest version does not exist, Docker will be unable to find the image and will throw an error.
Everything after the image name (here, pecan/data:develop) is interpreted as an argument to the image’s specified entrypoint.
--volume pecan_pecan:/data – This mounts the data from the subsequent container (pecan/data:develop) onto the current project volume, called pecan_pecan (as with the network, the project name pecan is the prefix, and the volume name also happens to be pecan as specified in the docker-compose.yml file).
--env FQDN=docker – the Fully Qualified Domain Name, this is the same value as specified in the .env file (for the web, monitor and executor containers). This will link the data files to the name in the machines table in BETY.
pecan/data:develop – As above, this is the target image to run. Since there is no argument after the image name, this command will run the default command (CMD) specified for this docker container. In this case, it is the docker/add_data.sh script from the PEcAn repository.

Under the hood, this container runs the docker/add-data.sh script, which copies a bunch of input files and registers them with the PEcAn database.

Successful execution of this command should take some time because it involves copying reasonably large amounts of data and performing a number of database operations.

24.3.2.4 Start PEcAn

If you already completed the above steps, you can start the full stack by just running the following:

docker-compose -p pecan up -d

This will build and start all containers required to run PEcAn. With the -d flag, this will run all of these containers quietly in the background, and show a nice architecture diagram with the name and status of each container while they are starting. Once this is done you have a working instance of PEcAn.

If all of the containers started successfully, you should be able to access the various components from a browser via the following URLs (if you run these commands on a remote machine replace localhost with the actual hostname).

PEcAn web interface (running models) – http://pecan.localhost/pecan/ (NOTE: The trailing backslash is necessary.)
PEcAn documentation and home page – http://pecan.localhost/
BETY web interface – http://pecan.localhost/bety/
Monitor, service that monitors models and shows all models that are online as well as how many instances are online and the number of jobs waiting. The output is in JSON – http://pecan.localhost/monitor/

24.3.2.5 Start model runs using curl

To test PEcAn you can use the following curl statement, or use the webpage to submit a request (if you run these commands on a remote machine replace localhost with the actual hostname):

curl -v -X POST \
    -F 'hostname=docker' \
    -F 'modelid=5000000002' \
    -F 'sitegroupid=1' \
    -F 'siteid=772' \
    -F 'sitename=Niwot Ridge Forest/LTER NWT1 (US-NR1)' \
    -F 'pft[]=temperate.coniferous' \
    -F 'start=2004/01/01' \
    -F 'end=2004/12/31' \
    -F 'input_met=5000000005' \
    -F 'email=' \
    -F 'notes=' \
    'http://pecan.localhost/pecan/04-runpecan.php'

This should return some text with in there Location: this is shows the workflow id, you can prepend http://pecan.localhost/pecan/ to the front of this, for example: http://pecan.localhost/pecan/05-running.php?workflowid=99000000001. Here you will be able to see the progress of the workflow.

To see what is happening behind the scenes you can use look at the log file of the specific docker containers, once of interest are pecan_executor_1 this is the container that will execute a single workflow and pecan_sipnet_1 which executes the sipnet mode. To see the logs you use docker logs pecan_executor_1 Following is an example output:

2018-06-13 15:50:37,903 [MainThread     ] INFO    : pika.adapters.base_connection - Connecting to 172.18.0.2:5672
2018-06-13 15:50:37,924 [MainThread     ] INFO    : pika.adapters.blocking_connection - Created channel=1
2018-06-13 15:50:37,941 [MainThread     ] INFO    : root -  [*] Waiting for messages. To exit press CTRL+C
2018-06-13 19:44:49,523 [MainThread     ] INFO    : root - b'{"folder": "/data/workflows/PEcAn_99000000001", "workflowid": "99000000001"}'
2018-06-13 19:44:49,524 [MainThread     ] INFO    : root - Starting job in /data/workflows/PEcAn_99000000001.
2018-06-13 19:45:15,555 [MainThread     ] INFO    : root - Finished running job.

This shows that the executor connects to RabbitMQ, waits for messages. Once it picks up a message it will print the message, and execute the workflow in the folder passed in with the message. Once the workflow (including any model executions) is finished it will print Finished. The log file for pecan_sipnet_1 is very similar, in this case it runs the job.sh in the run folder.

To run multiple executors in parallel you can duplicate the executor section in the docker-compose file and just rename it from executor to executor1 and executor2 for example. The same can be done for the models. To make this easier it helps to deploy the containers using Kubernetes allowing to easily scale up and down the containers.

24.3.3 Troubleshooting

When initializing the database, you will know you have encountered more serious errors if the command exits or hangs with output resembling the following:

LINE 1: SELECT count(*) FROM formats WHERE ...
                             ^
Error: Relation `formats` does not exist

If the above command fails, you can try to fix things interactively by first opening a shell inside the container…

docker run -ti --rm --network pecan_pecan pecan/bety:latest /bin/bash

…and then running the following commands, which emulate the functionality of the entrypoint.sh with the initialize argument.

# Create the bety role in the postgresql database
psql -h postgres -p 5432 -U postgres -c "CREATE ROLE bety WITH LOGIN CREATEDB NOSUPERUSER NOCREATEROLE PASSWORD 'bety'"

# Initialize the bety database itself, and set to be owned by role bety
psql -h postgres -p 5432 -U postgres -c "CREATE DATABASE bety WITH OWNER bety"

# If either of these fail with a "role/database bety already exists",
# that's fine. You can safely proceed to the next command.

# Load the actual bety database tables and values
./script/load.bety.sh -a "postgres" -d "bety" -p "-h postgres -p 5432" -o bety -c -u -g -m ${LOCAL_SERVER} -r 0 -w https://ebi-forecast.igb.illinois.edu/pecan/dump/all/bety.tar.gz