34 What is Docker?

Docker is a relativly new technology to encapsulate software as a service. This allow for simple distribution of software since all depedencies are installed as well. Unlike traditional VM’s, in docker each component will only run a single service/process and is build on top of existing services provided by the host OS (such as disk access, networking, memory managment etc.). To create a complex system such as PEcAn, we take multiple of these components and link them together.

A common term used in docker is that of an image or container. This refers to all the software needed to run a single service, including the service itself, bundled and installed. The difference between an image and a container is that a container is the execution of an image (in object oriented programming, an image is a class and a container is an instance of the class). When creating docker images a good mental model to follow is to have a single process per container, and have multiple containers together create an application.

For example in the case of BETY we will need at least two processes, the database and the BETY web application. When distributing a VM with the BETY application we will need to create VM with a base OS (in our case this would be Ubuntu), a database (PostgreSQL) and the BETY web-app (ruby). This requires us to install the dependencies and applications. In the case of Docker, we will need a container for the database (standard PostGIS container) as well as the custom BETY container (build on top of the existing Ruby container), both running on a host OS (coreOS for example). When starting BETY we start the PostGIS container first, and next start the BETY container telling it where it can find the PostGIS container. To upgrade we stop the BETY container, download the latest version, tell it to upgrade the database, and start the BETY container. There is no need to install new dependencies for BETY since they are all shipped as part of the container.

The containers that work together to build an application, BETY or PEcAn, is often referred to as a stack. Using docker we can have multiple of these stacks run in parallel. For example we can have two PEcAn stacks running next to each other. To separate these stacks each of them is given a unique name, and all containers are prefixed with this name. Inside these stacks the containers can talk with each other using the generic names, for example they can use postgres to talk to the postgres database. To prevent different stacks from talking to the wrong container, each stack has its own network. This will only allow those containers in the same network to talk to each other. Finally when starting the stack you can define explicitly what external port maps to what port inside the containers, or let docker pick them. In the case of PEcAn we will automatically connect all containers to the pecan network.

In the case of PEcAn we want to use this ability to ship all dependencies as part of the image to make it easier for the users to download a new model, since the image will contain everything that is needed to run the model as part of PEcAn. If two models depend on different versions of the library we no longer need to worry on how to install these models next to each other and creating issues with the libraries used. Each image will contain only a single model and all libraries needed by that model.

The next section, architecture will go in more detail on the containers that make up the PEcAn framework and how these containers interact with each other.

34.1 Working with Docker

To run an image you will use the docker command line, for example:

docker run \
    --detach \
    --rm \
    --name postgresql \
    --network pecan \
    --publish 9876:5432 \
    --volume ${PWD}/postgres:/var/lib/postgresql/data \
    mdillon/postgis:9.6-alpine

This will start for example the PostgreSQL+PostGIS container. The options used are:

--detach makes the container run in the background.
--rm removes the container when it is finished (make sure to use the volume below).
--name the name of the container, also the hostname of the container which can be used by other docker containers in the same network inside docker.
--network pecan the network that the container should be running in, this leverages of network isolation in docker and allows this container to be connected to by others using the postgresql hostname.
--publish exposes the port to the outside world, this is like ssh, and maps port 9876 to port 5432 in the docker container
--volume maps a folder on your local machine to the machine in the container. This allows you to save data on your local machine.
mdillon/postgis:9.6-alpine is the actual image that will be run, in this case it comes from the group/person mdillon, the container is postgis and the version 9.6-alpine (version 9.6 build on alpine linux).

Other options that might be used:

--link makes it such that two containers can see each other.
--tty allocate a pseudo-TTY to send stdout and stderr back to the console.
--interactive keeps stdin open so the user can interact with the application running.
--env sets environment variables, these are often used to change the behavior of the docker container.

To see a list of all running containers you can use the following command:

docker ps

To see the log files of this container you use the following command (you can either use their name or id as returned by docker ps). The -f flag will follow the stdout/stderr from the container, use Ctrl-C to stop following the stdout/stderr.

docker logs -f postgresql

To stop a running container use:

docker stop postgresql

Containers that are started without the --detach can be stopped by pressing Ctrl-C, if you have started containers in the background they will keep on running until the machine is restarted or the container is stopped using docker stop

34.2 Working with docker-compose

In the case of PEcAn we have multiple docker containers that we want to work together. To help with this we can use docker-compose. This will take a file as input that lists all the containers that make up the application and the dependencies between these containers. For example in the case of BETY we need the postgres container as well as the BETY container.

version: "3"
services:
  postgres:
    image: mdillon/postgis:9.5
  bety:
    image: pecan/bety
    depends_on:
      - postgres

This simple file allows us to bring up a full BETY application with both database and BETY application. The BETY app will not be brought up until the database container has started.

You can now start this application using:

docker-compose up

This will start the application, and you will see the log files for the 2 different containers.