10 Remote execution with PEcAn

Remote execution allows the user to leverage the power and storage of high performance computing clusters, AWS instances, or specially configured virtual machines, but without leaving their local working environment. PEcAn uses remote execution primarily to run ecosystem models.

The infrastructure for remote execution lives in the PEcAn.remote package (base/remote in the PEcAn repository).

This section describes the following:

  1. Checking that you can connect to the remote machine correctly:
  • Basics of command-line SSH
  • SSH authentication with keys and passwords
  • Basics of SSH tunnels, and how they are used in PEcAn
  2. PEcAn-related tools that control remote execution:
  • Basic remote execution R functions in PEcAn.remote
  3. Setup: configuration files and settings:
  • Remote model execution configuration in the pecan.xml and config.php
  • Additional information about preparing remote servers for execution

10.1 Basics of SSH

All of the PEcAn remote infrastructure depends on the system ssh utility, so it’s important to make sure this works before attempting the advanced remote execution functionality in PEcAn.

To connect to a remote server interactively, the command is simply:

ssh <username>@<hostname>

For instance, my connection to the BU shared computing cluster looks like:

ssh ashiklom@geo.bu.edu

It will prompt me for my BU password, and, if successful, will drop me into a login shell on the remote machine.

Instead of opening a login shell, ssh can also be used to execute an arbitrary command, whose output is returned exactly as it would be if you ran the command locally. For example, the following:

ssh ashiklom@geo.bu.edu pwd

will run the pwd command, and return the path to my home directory on the BU SCC. The more advanced example below will run some simple R code on the BU SCC and return the output as if it was run locally.

ssh ashiklom@geo.bu.edu Rscript -e "seq(1, 10)"

10.2 SSH authentication – password vs. SSH key

Because the BU server in the examples above uses passwords for authentication, each of those ssh commands will prompt me for my password.

An alternative to password authentication is using SSH keys. Under this system, the host machine (say, your laptop, or the PEcAn VM) has to generate a public and private key pair (using the ssh-keygen command). The private key (by default, a file in ~/.ssh/id_rsa) lives on the host machine, and should never be shared with anyone. The public key will be distributed to any remote machines to which you want the host to be able to connect. On each remote machine, the public key should be added to a list of authorized keys located in the ~/.ssh/authorized_keys file (on the remote machine). The authorized keys list indicates which machines (technically, which keys – a single machine, and even a single user, can have many keys) are allowed to connect to it. This is the system used by all of the PEcAn servers (pecan1, pecan2, test-pecan).
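
One common way to set this up from the host machine is sketched below (ssh-copy-id ships with most OpenSSH installations; the hostname is illustrative):

ssh-keygen                                # generate the key pair; accept the default location ~/.ssh/id_rsa
ssh-copy-id ashiklom@test-pecan.bu.edu    # append the public key to the remote ~/.ssh/authorized_keys

If ssh-copy-id is not available, you can instead manually append the contents of your local ~/.ssh/id_rsa.pub to the ~/.ssh/authorized_keys file on the remote machine.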

10.2.1 SSH tunneling

SSH authentication can be more complicated than indicated above, especially on systems that require two-factor authentication. Even simple password authentication can be tricky in scripts, since (by design) it is fairly difficult to get SSH to accept a password from anything other than raw keyboard input (i.e. SSH doesn’t let you pass passwords as input or arguments, because doing so would expose your password as plain text).

A convenient and secure way to follow SSH security protocol, while avoiding the full authentication process on every connection, is to use SSH tunnels (or “sockets”, which are effectively synonymous). Essentially, an SSH socket is a read- and write-protected file that contains all of the information about an SSH connection.

To create an SSH tunnel, use a command like the following:

ssh -n -N -f -o ControlMaster=yes -S /path/to/socket/file <username>@<hostname>

If you are using password authentication, this will prompt you for your password; it will then drop you back to the command line (thanks to the -N flag, which runs SSH without executing a command; the -f flag, which pushes SSH into the background; and the -n flag, which prevents ssh from reading any input). It will also create the socket file /path/to/socket/file.

To use this socket with another command, use the -S /path/to/file flag, pointing to the same tunnel file you just created.

ssh -S /path/to/socket/file <hostname> <optional command>

This will let you access the server without any sort of authentication step. As before, if <optional command> is blank, you will be dropped into an interactive shell on the remote, or if it’s a command, that command will be executed and the output returned.

To close a socket, use the following:

ssh -S /path/to/socket/file -O exit <hostname>

This will delete the socket file and close the connection. Alternatively, a scorched earth approach to closing the SSH tunnel if you don’t remember where you put the socket file is something like the following:

pgrep ssh   # See which processes will be killed
pkill ssh   # Kill those processes

…which will kill all user processes called ssh.

To automatically create tunnels following a specific pattern, you can add the following to your ~/.ssh/config:

Host <hostname goes here>
 ControlMaster auto
 ControlPath /tmp/%r@%h:%p
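
With this in place, the first connection to that host automatically becomes the master and creates the socket, and any connections made while it is still open reuse the socket instead of re-authenticating (the hostname below is illustrative):

ssh geo.bu.edu        # authenticates and becomes the master connection
ssh geo.bu.edu pwd    # while the first session is open, reuses its socket with no password prompt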

For more information, see man ssh.

10.2.2 SSH tunnels and PEcAn

Many of the PEcAn.remote functions assume that a tunnel is already open. If working from the web interface, the tunnel will be opened for you by some under-the-hood PHP and Bash code, but if debugging or working locally, you will have to create the tunnel yourself. The best way to do this is to create the tunnel first, outside of R, as described above. (In the following examples, I’ll use my username ashiklom connecting to the test-pecan server with a socket stored in /tmp/testpecan. To follow along, replace these with your own username and designated server, respectively).

ssh -nNf -o ControlMaster=yes -S /tmp/testpecan ashiklom@test-pecan.bu.edu

Then, in R, create a host object, which is just a list containing the elements name (hostname) and tunnel (path to tunnel file).

my_host <- list(name = "test-pecan.bu.edu", tunnel = "/tmp/testpecan")

This host object can then be used in any of the remote execution functions.

10.3 Basic remote execute functions

The PEcAn.remote::remote.execute.cmd function runs a system command on a remote server (or on the local server, if host$name == "localhost").

x <- PEcAn.remote::remote.execute.cmd(host = my_host, cmd = "echo", args = "Hello world")
x
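## Expected output, assuming the tunnel is open:
## [1] "Hello world"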

Note that remote.execute.cmd is similar to base R’s system2, in that the base command (in this case, echo) is passed separately from its arguments ("Hello world"). Note also that the output of the remote command is returned as a character vector.

For R code, there is a special wrapper around remote.execute.cmd, PEcAn.remote::remote.execute.R, which runs R code (passed as a string) on a remote machine and returns the output.

code <- "
    x <- 2:4
    y <- 3:1
    x ^ y
"
out <- PEcAn.remote::remote.execute.R(code = code, host = my_host)
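out
## Assuming the remote code runs without error, `out` holds the value of the last expression:
## [1] 8 9 4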

For additional functions related to remote file operations, see the PEcAn.remote package documentation.

10.4 Remote model execution with PEcAn

The workhorse of remote model execution is the PEcAn.remote::start.model.runs function, which distributes execution of each run in a list of runs (e.g. multiple runs in an ensemble) to the local machine or a remote server, based on the configuration in the PEcAn settings. A minimal usage sketch follows the list below.

Broadly, there are three major types of model execution:

  • Serialized (PEcAn.remote::start_serial) – This runs models one at a time, directly on the local machine or remote (i.e. same as calling the executables one at a time for each run).
  • Via a queue system, (PEcAn.remote::start_qsub) – This uses a queue management system, such as SGE (e.g. qsub, qstat) found on the BU SCC machines, to submit jobs. For computationally intensive tasks, this is the recommended way to go.
  • Via a model launcher script (PEcAn.remote::setup_modellauncher) – This is a highly customizable approach where task submission is controlled by a user-provided script (launcher.sh).
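
As a minimal sketch (assuming a complete PEcAn settings list, e.g. one read from a pecan.xml file via PEcAn.settings::read.settings, with run directories and job scripts already prepared by the workflow):

settings <- PEcAn.settings::read.settings("pecan.xml")

# Dispatch every run in the run list to the host configured in the <host> block
PEcAn.remote::start.model.runs(settings)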

10.4.1 XML configuration

The relevant section of the PEcAn XML file is the <host> block. Here is a minimal example from one of my recent runs:

<host>
    <name>geo.bu.edu</name>
    <user>ashiklom</user>
    <tunnel>/home/carya/output//PEcAn_99000000008/tunnel/tunnel</tunnel>
</host>

Breaking this down:

  • name – The hostname of the machine where the runs will be performed. Set it to localhost to run on the local machine.
  • user – Your username on the remote machine (note that this may be different from the username on your local machine).
  • tunnel – This is the tunnel file for the connection used by all remote execution functions. The tunnel is created automatically by the web interface, but must be created by the user for command line execution.

This configuration will run in serialized mode. To use qsub, the configuration is slightly more involved:

<host>
  <name>geo.bu.edu</name>
  <user>ashiklom</user>
  <qsub>qsub -V -N @NAME@ -o @STDOUT@ -e @STDERR@ -S /bin/bash</qsub>
  <qsub.jobid>Your job ([0-9]+) .*</qsub.jobid>
  <qstat>qstat -j @JOBID@ || echo DONE</qstat>
  <tunnel>/home/carya/output//PEcAn_99000000008/tunnel/tunnel</tunnel>
</host>

The additional fields are as follows:

  • qsub – The command used to submit jobs to the queue system. Despite the name, this can be any command used for any queue system. The following variables are available to be set here:
    • @NAME@ – Job name to display
    • @STDOUT@ – File to which stdout will be redirected
    • @STDERR@ – File to which stderr will be redirected
  • qsub.jobid – A regular expression from which the job ID will be determined. PEcAn parses the qsub output in R as jobid <- gsub(qsub.jobid, "\\1", output) – note that the first parenthesized capture group is taken as the job ID (see the illustration after this list).
  • qstat – The command used to check the status of a job. Internally, PEcAn will look for the DONE string at the end, so a structure like <some command indicating if any jobs are still running> || echo DONE is required. The @JOBID@ here is the job ID determined from the qsub.jobid parsing.
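
For illustration, here is how that parsing plays out in R for the qsub.jobid pattern in the XML example above, applied to a typical SGE submission message (the message text is an assumed example):

output <- 'Your job 12345 ("ed2_run") has been submitted'
qsub.jobid <- "Your job ([0-9]+) .*"
jobid <- gsub(qsub.jobid, "\\1", output)
jobid
## [1] "12345"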

Documentation for using the model launcher is currently unavailable.

10.4.2 Configuration for PEcAn web interface

The config.php file has a few variables that control where the web interface can run jobs, and how to run those jobs. It is located in the /web directory; if you have not modified it yet, it will be named config.example.php. Rename it to config.php and edit it following the directions below.

These variables are $hostlist, $qsublist, $qsuboptions, and $SSHtunnel. In the near future $hostlist, $qsublist, $qsuboptions will be combined into a single list.

$SSHtunnel : points to the script that creates an SSH tunnel. The script is located in the web folder, and the default value, dirname(__FILE__) . DIRECTORY_SEPARATOR . "sshtunnel.sh", will most likely work.

$hostlist : is an array that by default contains a single value, allowing jobs to run only on the local server. Adding other servers to this list will show them in the pull-down menu when selecting machines, and will cause the web page to prompt for a username and password for remote execution (make sure to use an HTTPS setup when asking for a password, to prevent it from being sent in the clear).

$qsublist : is an array of hosts that require qsub to be used when running the models. This list can include $fqdn to indicate that jobs on the local machine should use qsub to run the models.
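
As an example, a hypothetical config.php fragment that allows local runs plus remote execution on geo.bu.edu (with jobs on geo.bu.edu submitted through qsub) might look like:

$hostlist = array($fqdn, "geo.bu.edu");  // machines offered in the machine pull-down menu
$qsublist = array("geo.bu.edu");         // machines whose jobs are submitted via qsub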

$qsuboptions : is an array that lists options for each machine. Currently it supports the following options (see also [Run Setup] and look at the tags):

array("geo.bu.edu" =>
    array("qsub"   => "qsub -V -N @NAME@ -o @STDOUT@ -e @STDERR@ -S /bin/bash",
          "jobid"  => "Your job ([0-9]+) .*",
          "qstat"  => "qstat -j @JOBID@ || echo DONE",
          "job.sh" => "module load udunits R/R-3.0.0_gnu-4.4.6",
          "models" => array("ED2"    => "module load hdf5"))

In this list, qsub is the actual command line used to submit jobs, jobid is the regular expression used to extract the job ID from the text returned by qsub, and qstat is the command used to check whether the job is finished. job.sh and the values in models are additional lines added to the job.sh file generated to run the model; this can be used to make sure the required modules are loaded on the HPC cluster before running the actual model.

10.5 Running PEcAn code remotely

You do not need to install all of PEcAn on your remote machine. You can compile and install just the model-specific pieces of PEcAn on the cluster, without having to install the full PEcAn code base (and all of its OS dependencies). First, install the PEcAn.utils package directly from GitHub:

devtools::install_github("pecanproject/pecan", subdir = 'base/utils')

Next, install the model-specific pieces; this is done in almost the same way (example for ED2):

devtools::install_github("pecanproject/pecan", subdir = 'models/ed')

This should also install the required dependencies.

The following are some notes on how to install the model-specific packages on different HPC clusters.

10.5.1 geo.bu.edu

The following modules need to be loaded:

module load hdf5 udunits R/R-3.0.0_gnu-4.4.6

Next, the following packages need to be installed; otherwise, PEcAn will fall back on the older versions installed in the site library:

install.packages(c('udunits2', 'lubridate'), 
   configure.args=c(udunits2='--with-udunits2-lib=/project/earth/packages/udunits-2.1.24/lib --with-udunits2-include=/project/earth/packages/udunits-2.1.24/include'),
   repos='http://cran.us.r-project.org')

Finally, to install support for both ED and SIPNET:

devtools::install_github("pecanproject/pecan", subdir = 'base/utils')
devtools::install_github("pecanproject/pecan", subdir = 'models/sipnet')
devtools::install_github("pecanproject/pecan", subdir = 'models/ed')