Grid Computing Toolbox Shell Scripts and Batch Files - Maple Help


This section discusses using the Grid modes, hpc and mpi, and the shell scripts supporting them. These scripts are intended to be run from the operating system command interpreters.

Using Mode = HPC

The launcher script can be used to submit a file containing Maple code directly to a running Grid from an operating system command line.
Before running the launcher script, make sure that Grid Servers are running on the remote machines that are to be used.
The remote servers must all be configured the same way with respect to port number and licenses.

Starting and Stopping the Grid Server

There are two methods to start and stop the Grid Server:
1) interactively, by calling functions in the Grid library, which is part of the Grid Computing Toolbox
2) using script files at the operating system level.

Interactive Start/Stop of the Grid Server via the Grid Library

One method to start and stop the server is interactively, using the Grid library functions StartServer and StopServer. These functions are Maple commands and can be used when running or testing grid code interactively from within a Maple worksheet.

Shell Script and Batch File to Start/Stop the Grid Server

The shell script gridserver.sh (Linux) and the batch file gridserver.bat (Windows) start or stop an instance of the Grid Server on the local computer. These scripts are run from the operating system command interpreter.

The syntax of the gridserver script or batch file is one of the following:

gridserver [options] startmultiple [port [count]]

gridserver [options] start         [port [cpu_index]]

gridserver [options] stop          [port [cpu_index]]

where the mode is one of the following:

start           Starts a single Node on the indicated port.
startmultiple   Starts count Nodes on a single computer, beginning at the indicated port.
stop            Halts the Node associated with port. If the Node was started with startmultiple, all of those Nodes are halted.

If no mode is specified, startmultiple is assumed. The modes take the following optional parameters:

cpu_index   The index. Should be set to zero to allow AutoDiscovery with broadcast.
port        The port associated with the Node. If not specified, the port from the -p option or from the configuration file is used.
count       The number of Nodes to start. If not specified, the value of the -n option or the value from the configuration file is used.

Options may be one or more of the following:

-a address   UDP broadcast address for AutoDiscovery (e.g. 192.168.255.255). Use 255 to indicate the portion of the subnet to use; this example broadcasts to all computers with addresses of the form 192.168.*.*.
-b port      UDP broadcast port (e.g. 4400). Set to 0 to disable AutoDiscovery.
-d           Enable debug messages to the log file.
-f file      Path and filename of the log file (e.g. logs/grid.log). The path can be relative to the Grid Computing Toolbox directory.
-m path      Full path to the command-line Maple binary.
-n num       Number of Nodes to create.
-p port      TCP/IP base port for the Nodes.

Any option not specified is taken from the configuration file conf/grid.properties.

The startmultiple mode starts the specified count of Grid Servers on the same machine. The count should generally not exceed the number of CPUs on the machine; otherwise, several instances of Maple will likely share a CPU instead of being distributed across multiple CPUs. One reason to set count higher than the number of CPUs is initial testing of grid code on a single machine, for example when a network of Grid Servers is not available. To test on a single machine, the following command could be used at the operating system prompt:

gridserver.sh -a 127.0.0.255 -b 4401 startmultiple 2000 5   (Linux)

gridserver.bat -a 127.0.0.255 -b 4401 startmultiple 2000 5   (Windows)

This starts five instances of the Grid Server on the local machine, using ports 2000 through 2004.
Jobs would then be submitted with the Launch library function after calling Setup with localhost (127.0.0.1) and port 2000 as the parameters.
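Before starting several servers on one machine, it can help to check the port range and CPU count in advance. The following minimal sketch is a pair of hypothetical helpers, not part of the toolbox. It assumes, as the 2000 to 2004 example indicates, that startmultiple uses consecutive ports beginning at the given port, and it reads the number of online CPUs with getconf. It only prints information and starts nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the Grid toolbox).

# Print the ports that "gridserver.sh startmultiple <port> <count>" is expected
# to use, assuming consecutive ports starting at <port>.
startmultiple_ports() {
  local base=$1 count=$2 i
  for (( i = 0; i < count; i++ )); do
    echo $(( base + i ))
  done
}

# Return non-zero (with a warning) when <count> exceeds the number of online
# CPUs, i.e. when several Maple instances would have to share a CPU.
count_fits_cpus() {
  local count=$1 cpus
  cpus=$(getconf _NPROCESSORS_ONLN)
  if (( count > cpus )); then
    echo "warning: $count Nodes on $cpus CPUs; instances will share CPUs" >&2
    return 1
  fi
}
```

For the single-machine test above, startmultiple_ports 2000 5 prints 2000 through 2004. The CPU warning is expected in that case, because oversubscribing the CPUs is deliberate there.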

Example Code Accessing the Running Servers

The following example accesses the server running on port 2000 on the local computer.

 > with(Grid):
 > Setup("hpc", host="localhost", port=2000);
    2000

A string containing Maple code to execute on the remote nodes is passed to Launch.

 > code := "printf(\"Hello from node %a\\n\", Grid:-MyNode());"; result := Launch(code, numnodes=2);
    "printf("Hello from node %a\n", Grid:-Util:-MyNode());"
    Node 0: Hello from node 0
    Node 1: Hello from node 1
    ""

The start mode is used where AutoDiscovery is not practical (for example, when the Grid Servers are not on a common subnet). In this case, each server would be started with an operating system command similar to:

gridserver.sh -b 0 start 2000    (Linux)

gridserver.bat -b 0 start 2000   (Windows)

The same port must be used on each computer where the Grid Server is started, as well as on the client machine from which the grid code is sent. Grid Servers started with the start mode must be accessed through either the launcher script or the PBS script described below, because these scripts receive a file listing the nodes to use in the grid computations.

The stop mode terminates the Grid Server running on the local machine:

gridserver.sh stop 2000    (Linux)

gridserver.bat stop 2000   (Windows)

If the server was started with the startmultiple mode, all instances of the Grid Server on that machine are stopped. For servers started with the start mode, each instance requires a matching stop call to halt it.

Submitting Maple Code to the Grid Server via Script or Batch Files

These scripts can be used to submit a file containing Maple code directly to a running "HPC" Grid from an operating system command line interface.

The launcher can be invoked in either of the following two forms:

launcher.bat [-p port] maplefile nodefile   (Windows)
launcher.sh  [-p port] maplefile nodefile   (Unix/Linux/Mac)
or
launcher.bat [-p port] -h host -n num maplefile   (Windows)
launcher.sh  [-p port] -h host -n num maplefile   (Unix/Linux/Mac)

Specify the optional -p port option if the port of the master node differs from the default port in the grid.properties file.

launcher.bat [-p port] maplefile nodefile
launcher.sh  [-p port] maplefile nodefile

The servers that act as nodes for this configuration should be set up without AutoDiscovery.

Setup

You must start servers on all nodes of the grid; launching does not start the nodes. Set the broadcast port parameter to zero to turn off AutoDiscovery. All these servers should be configured in the same manner, using the same port numbers. The servers can be started individually by logging on to each remote host and running the supplied script

gridserver -b 0 start [port]

where the option -b 0 turns off AutoDiscovery and port is optional. The value of port specifies the TCP/IP port that the node communicates on. If port is omitted, it is determined from the grid.properties file found in the /toolbox/Grid/conf directory. All the ports must be the same and must match the port in the properties file, grid.properties, on the machine from which launcher is run. This gridserver command should be run on each individual server. The job scheduler now controls which nodes run a job.
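Instead of logging on to each host by hand, the same command can be issued over ssh. The following is a minimal sketch, not a toolbox script: it assumes passwordless ssh to every host and that GRIDSERVER (a hypothetical variable; the default path shown is only an illustration) points to gridserver.sh on each remote host. It also assumes gridserver.sh returns after starting the server; if it stays in the foreground, each ssh call would block. With DRY_RUN=1 the commands are printed instead of executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a Grid Server with AutoDiscovery disabled (-b 0)
# on every host listed in a node file, all on the same port.
GRIDSERVER=${GRIDSERVER:-/opt/maple/toolbox/Grid/bin/gridserver.sh}  # illustrative path

start_all() {
  local nodefile=$1 port=$2 host cmd
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue                 # skip blank lines
    cmd="$GRIDSERVER -b 0 start $port"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $host $cmd"                     # show what would be run
    else
      ssh -n "$host" "$cmd"                     # -n: keep ssh from reading the node file
    fi
  done < "$nodefile"
}
```

For example, DRY_RUN=1 start_all nodes.txt 2000 lists one ssh command per host, which is a safe way to review the commands before running them.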
Usage

For this form of the launcher invocation, the options are as follows:

Option      Description
port        Port that the node at host is listening on. This overrides the grid.port value specified in the grid.properties file.
maplefile   Path and filename of a file containing Maple code to be run on each of the nodes.
nodefile    Path to a file containing a list of grid servers (see below).

Create a text file listing the names of the servers that you need to use for this Grid calculation. For example, if you have three remote servers named grid1.somewhere.com, grid2.somewhere.com and grid3.somewhere.com, you would create a text file, say nodes.txt, containing the lines

grid1.somewhere.com
grid2.somewhere.com
grid3.somewhere.com

Note: the first server listed is considered to be the master node.

To submit the Maple file Simple.mpl on a Linux system to these nodes for execution, you would issue the command

launcher.sh ../samples/Simple.mpl nodes.txt

The Grid launcher script sends the Maple code contained in the file Simple.mpl to the nodes listed in the nodes.txt file. The server listed first, grid1.somewhere.com in this example, is the master node controlling the computations. Output from the grid computations is printed to the console's standard output.
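The node file is plain text, so it can also be generated by a script. The following minimal sketch uses two hypothetical helpers (not part of the toolbox): one writes one server name per line, and the other reports the first line, which is the server the launcher treats as the master node.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for preparing a launcher node file.

# Write the given server names to <file>, one per line.
write_nodefile() {
  local file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

# The first server listed in the node file acts as the master node.
master_node() {
  head -n 1 "$1"
}

# Example (commented out so the sketch runs without a Grid installation):
# write_nodefile nodes.txt grid1.somewhere.com grid2.somewhere.com grid3.somewhere.com
# echo "master: $(master_node nodes.txt)"
# ./launcher.sh ../samples/Simple.mpl nodes.txt
```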

Using Mode = MPI

The Grid Computing Toolbox (GCT) library code can utilize operating system implementations of the MPI standard (see http://www.mcs.anl.gov/research/projects/mpi/). This document outlines the basics of setting up and using MPI.

Windows

The current library supports Microsoft's MPI implementation, msmpi, found on the Windows Server 2008 HPC clusters. Other MPI implementations may work but have not been tested.

Server Setup

The default location of the Grid toolbox server installation is the toolbox\Grid subdirectory of the directory where the main Maple program is installed. Additionally, you should create the directory \shared\mpi and its subdirectories. The directory \shared\mpi\bin will contain additional scripts that are used by client workstations submitting Grid code to the cluster.

The file maple.ini contains generic settings that enable the Grid MPI procedures. The Maple command Grid[Setup]("mpi", mpidll="msmpi"); instructs the Grid toolbox to use the "mpi" interface instead of the default local implementation. The second argument, "msmpi", is the operating system basename of the MPI implementation DLL or shared library. This DLL or shared library should be found in one of the directories named in the PATH environment variable.

Notes:

The \shared\mpi directory on the head node should be shared with all the other compute nodes in the cluster, because any Maple GCT code submitted is stored in the \\headnode\shared\mpi\grid subdirectory, along with any output generated. This directory must allow read/write permission for Grid users, as their temporary files are placed there. User files are placed in a subdirectory of \shared\mpi\grid named after the user who submitted the job.

The compute nodes should also be configured so that the PATH environment variable contains the directory of the cmaple.exe program.

Running the MPIlauncher Scripts

This script can be used to submit a file containing Maple code directly to a Grid running MPI from an operating system command line.

Before running the mpilauncher script, make sure that the Grid cluster is set up as described in the previous section.

The MPI launcher is invoked with one of the following combinations of options and parameters:

<maple>\toolbox\Grid\bin\mpilauncher.bat -s  hh nn  ff
<maple>\toolbox\Grid\bin\mpilauncher.bat -q|-r|-c  hh  jj
<maple>\toolbox\Grid\bin\mpilauncher.bat -a  hh
<maple>\toolbox\Grid\bin\mpilauncher.bat -d  ss

where the script command parameters are defined as:

hh   host name of the head node of the HPC cluster
nn   number of nodes requested for the job
ff   absolute file path to the Maple code
jj   job id of the submitted job
ss   number of seconds to delay

and the script command options are defined as:

-s   submit the Maple file ff to nn nodes
-q   query the status of the running job jj
-r   retrieve the results of the completed job jj
-c   clean up (remove) the temporary files and output of the completed job jj
-d   sleep for ss seconds
-a   get the total and available number of nodes

To get the total and available number of nodes, use the command:

mpilauncher -a hh

where hh is the head node.

To pause the execution of the batch file, use the command

mpilauncher -d ss

where ss is the length of the pause in seconds.

To submit the Maple script file containing your Maple code, use the command:

mpilauncher -s hh nn ff

When you run the mpilauncher.bat script, the Maple script specified by parameter ff is submitted to the head node, hh. The head node then hands copies of this script to each of the nn nodes requested, and each node runs the Maple script.

The Maple script is copied to a directory on the head node of the form \shared\mpi\grid\<username>, where <username> is the name of the user running the mpilauncher script.

The filename has the form maple_jj.code.mpl, where jj is the job id for this submission. This job id is then echoed to the console.

Maple output is saved in maple_jj.stdout.txt and maple_jj.stderr.txt, where jj is the job id.

To check the status of a submitted job, use the command:

mpilauncher -q hh jj

where hh is the name of the head node and jj is the job id of the submission.

The command returns one of the following status messages:

Done      the job is complete
Failed    the job was submitted but failed
Aborted   the job was aborted (usually by a system administrator)
Active    the job is running on the grid
Created   the job was submitted but is not running yet

To retrieve the results of a submitted job, use the command:

mpilauncher -r hh jj

The contents of the output file are returned to the console.

To clean up results and temporary files from the head node, use the command:

mpilauncher -c hh jj

This deletes the files associated with the job id.

Linux

The current library supports the MPICH MPI implementation, MPICH2. Before you start setting up the Grid toolbox, you must ensure that MPICH2 is installed in the same location on each machine that you will use for Grid computing.

You need passwordless ssh logins for all computational nodes; otherwise, the mpiexec program used for job submission prompts you for a password for every node when a job is submitted.
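Passwordless logins are commonly set up with the OpenSSH tools ssh-keygen and ssh-copy-id. The following minimal sketch, a hypothetical helper not supplied with the toolbox, then checks each host. The ssh option BatchMode=yes makes ssh fail instead of prompting for a password, so a host that still needs a password is reported rather than stalling the check. SSH is a hypothetical override variable, useful for testing, that defaults to ssh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify passwordless ssh logins to each given host.
SSH=${SSH:-ssh}

check_passwordless() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes: never prompt; fail if a password would be required.
    if "$SSH" -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "needs setup: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, check_passwordless grid01.myserver.com grid02.myserver.com returns zero only if every listed host accepted the login without a prompt.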

Server Setup

Before you can run the mpilauncher scripts, you must set up the machines you will use for Grid computing by performing the following steps.

 1 Create an NFS network share on one machine in the network that all users can read from and write to, for example /var/mpi.

> exportfs -o rw,root_squash,sync,no_subtree_check :/var/mpi

Other machines must mount this share at the same location, and it must be readable and writable by all users. For example,

mount -t <type> grid01.myserver.com:/var/mpi /var/mpi

 2 The Grid toolbox creates user directories in this shared location to store intermediate work files, specifically maple code and output.
 3 Create a hostfile in the shared mount location, for example /var/mpi/hostfile. This hostfile should contain the hostname of each host, one per line, that MPICH2 can use as Grid nodes.

Optionally, after each hostname you can add a colon and a number to indicate how many nodes can be run on that host.

If you create this file in the shared mount location, for example, /var/mpi/hostfile, then all machines can share the same hostfile.

Example hostfile:

grid01.myserver.com    # Can handle 1 node
grid02.myserver.com:2  # Can handle 2 nodes
grid03.myserver.com:4  # Can handle 4 nodes, maybe a quadcore
grid04.myserver.com:2  # Can handle 2 nodes

 4 Install the Grid toolbox on each machine. The default location of the Grid toolbox server installation is the toolbox/Grid subdirectory of the directory where the main Maple program is installed.

Notes:

The compute nodes should also be configured so that the PATH environment variable contains the directory of the xmaple program.

The Maple command Grid[Setup]("mpi", mpidll="mpich"); instructs the Grid toolbox to use the "mpi" interface instead of the default local implementation. The second argument, "mpich" refers to the operating system basename of the MPI implementation dll or shared library. This dll or shared library should be found in one of the directories named in the PATH environment variable.
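To sanity-check a hostfile written in the format of step 3, the following minimal sketch (a hypothetical helper, not part of the Grid toolbox) lists each host with the number of nodes it can handle and prints the total. It counts a host without a ":count" suffix as one node and treats text after "#" as a comment, as in the example hostfile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise a hostfile whose lines are "host" or
# "host:count", optionally followed by a "#" comment.
hostfile_summary() {
  awk '
    { sub(/#.*/, "") }                # drop comments
    NF == 0 { next }                  # skip blank lines
    {
      n = split($1, part, ":")        # part[1] = host, part[2] = count
      count = (n > 1) ? part[2] + 0 : 1
      printf "%s %d\n", part[1], count
      total += count
    }
    END { printf "total %d\n", total }
  ' "$1"
}
```

For the example hostfile above, the last line of output is total 9 (1 + 2 + 4 + 2), which gives a rough upper bound to compare against the number of nodes nn requested from mpilauncher.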

Running the MPIlauncher Scripts

This script can be used to submit a file containing Maple code directly to a Grid running MPI from an operating system command line.

Before running the mpilauncher script, make sure that the Grid cluster is set up as described in the previous section.

The MPI launcher is invoked with one of the following combinations of options and parameters:

<maple>/toolbox/Grid/bin/mpilauncher -s  hh nn  ff
<maple>/toolbox/Grid/bin/mpilauncher -q|-r|-c  hh  jj
<maple>/toolbox/Grid/bin/mpilauncher -a  hh
<maple>/toolbox/Grid/bin/mpilauncher -d  ss

where the script command parameters are defined as:

hh   localhost
nn   number of nodes requested for the job
ff   absolute file path to the Maple code
jj   job id of the submitted job
ss   number of seconds to delay

and the script command options are defined as:

-s   submit the Maple file ff to nn nodes
-q   query the status of the running job jj
-r   retrieve the results of the completed job jj
-c   clean up (remove) the temporary files and output of the completed job jj
-d   sleep for ss seconds
-a   get the total and available number of nodes

To get the total and available number of nodes, use the command:

mpilauncher -a hh

where hh is localhost.

To pause the execution of the script, use the command

mpilauncher -d ss

where ss is the length of the pause in seconds.

To submit the Maple script file containing your Maple code, use the command:

mpilauncher -s hh nn ff

When you run the mpilauncher script, the Maple script specified by parameter ff is submitted to the mpi program and then delegated to the appropriate nodes.

The Maple script is copied to a file in the shared directory that you created in the previous section. The name of the file has the form jj.mpl, where jj is the job id for this submission.

Maple output is saved in maple_jj.stdout.txt and maple_jj.stderr.txt, where jj is the job id.

To check the status of a submitted job, use the command:

mpilauncher -q hh jj

where hh is localhost and jj is the job id of the submission.

The command returns one of the following status messages:

Done      the job is complete
Failed    the job was submitted but failed
Aborted   the job was aborted (usually by a system administrator)
Active    the job is running on the grid
Created   the job was submitted but is not running yet

To retrieve the results of a submitted job, use the command:

mpilauncher -r hh jj

The contents of the output file are returned to the console.

To clean up results and temporary files from the head node, use the command:

mpilauncher -c hh jj

This deletes the files associated with the job id.
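The query, retrieve, and cleanup commands above can be combined into one script. The following is a minimal sketch under two assumptions that this page does not confirm: that mpilauncher -q prints only the status word (Done, Failed, Aborted, Active, or Created), and that MPILAUNCHER, a hypothetical variable, names the launcher to call. Results are retrieved and cleaned up only after a successful run, so a failed job's files remain for inspection.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around mpilauncher -q / -r / -c.
MPILAUNCHER=${MPILAUNCHER:-mpilauncher}

# Poll the job every <delay> seconds until it reaches a final state.
wait_for_job() {
  local hh=$1 jj=$2 delay=${3:-10} status
  while :; do
    status=$("$MPILAUNCHER" -q "$hh" "$jj")
    status=${status//[[:space:]]/}          # drop stray whitespace or CR characters
    case $status in
      Done)           return 0 ;;
      Failed|Aborted) echo "job $jj: $status" >&2; return 1 ;;
      Active|Created) sleep "$delay" ;;     # queued or still running: poll again
      *)              echo "job $jj: unexpected status '$status'" >&2; return 2 ;;
    esac
  done
}

# Wait, then print the results (-r) and remove the job's files (-c).
run_to_completion() {
  local hh=$1 jj=$2 delay=${3:-10}
  wait_for_job "$hh" "$jj" "$delay" || return
  "$MPILAUNCHER" -r "$hh" "$jj" && "$MPILAUNCHER" -c "$hh" "$jj"
}
```

For example, run_to_completion localhost 42 30 would poll job 42 every 30 seconds. On Windows, the same loop would have to be written as a batch file around mpilauncher.bat.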

PBSLauncher

The pbslauncher.sh and pbslauncher.bat scripts are intended to be run in a PBS(tm) environment. Jobs submitted to PBS have an environment variable, PBS_NODEFILE, that points to a file containing the names of the servers running the job, one server name per line. This is the same format as the nodes.txt file shown in the Launcher section above.

Note: all servers in the PBS environment must be configured for the same port via the grid.properties file, because the port cannot be overridden from the pbslauncher script.

To submit a Maple file, example.mpl, on a Linux system to the PBS nodes for execution, you would issue the command

pbslauncher.sh example.mpl

The pbslauncher script determines the node file from the PBS_NODEFILE variable and uses it to invoke the equivalent of

launcher.sh example.mpl $PBS_NODEFILE

which sends the Maple code contained in example.mpl to the PBS nodes. Output from the grid computations is printed to the console's standard output.
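The equivalence described above can be sketched as a small function. This is not the actual pbslauncher.sh; it only illustrates the stated behavior, and LAUNCHER is a hypothetical variable holding the path to launcher.sh. It refuses to run when PBS_NODEFILE is unset or unreadable, which is the case outside a PBS job.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the pbslauncher behavior: pass the PBS node list
# to launcher.sh, whose first listed host becomes the master node.
LAUNCHER=${LAUNCHER:-./launcher.sh}

pbs_launch() {
  local maplefile=$1
  if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "$PBS_NODEFILE" ]; then
    echo "PBS_NODEFILE is not set or not readable; run this inside a PBS job" >&2
    return 1
  fi
  "$LAUNCHER" "$maplefile" "$PBS_NODEFILE"
}

# Inside a PBS job one would call, for example:
# pbs_launch example.mpl
```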