How to run jobs with PBS/Pro on cherry-creek

From NSIwiki

Here is a quick step-by-step guide to getting started running MPI jobs on cherry-creek.

We want to run an MPI job that uses a total of 64 processes (cores), and we want to limit each node to 8 processes. Limiting the per-node process count gives us the flexibility to control how the system allocates the compute cores, so that OpenMP threads or other special needs can be taken into account.

To compile a simple "hello world" mpi program (after logging into cherry-creek):

  module add intel                               # activate the Intel compiler suite
  module add intelmpi                            # activate the Intel MPI runtime

  cp /share/apps/intel/impi_latest/test/test.c test.c       # make a copy of the sample hello world program
  mpicc test.c -o testc                                     # compile the sample program

Create a file called testc.pbs with the following (starting in column 1):

  #PBS -l select=8:ncpus=8:mpiprocs=8:nmics=0:mem=60Gb
  #PBS -N test-job  
  #PBS -l cput=10:0:0                                      # limit job to 10 cpu hours (approx 98% of walltime)
  #PBS -l walltime=10:0:0
  #PBS -m abe                                              # send email on abort/begin/end of job
  #PBS -M                             # address to send email messages to...
  module add intel intelmpi
  # change allocated nodes over to use the infiniband interconnect
  sed -i 's/\.local/\.ibnet/g' $PBS_NODEFILE
  echo -n "==== job started: "
  date
  echo "=== limits ==="
  ulimit -s unlimited
  ulimit -a
  echo "=== environment ==="
  env
  echo "=== nodes ==="
  cat $PBS_NODEFILE
  /usr/bin/time mpirun -np 64 -hostfile $PBS_NODEFILE ./testc
  echo -n "=== job finished: "
  date
  exit 0
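The sed line in the script rewrites the hostnames PBS hands out so that MPI traffic goes over the InfiniBand interconnect instead of the default network. Here is a minimal sketch of what that substitution does, using a made-up node list (the hostnames are illustrative, not real cherry-creek nodes):

```shell
# Build a stand-in for $PBS_NODEFILE with hypothetical hostnames
printf 'node01.local\nnode02.local\n' > nodes.txt
# Same in-place substitution the job script performs: .local -> .ibnet
sed -i 's/\.local/\.ibnet/g' nodes.txt
cat nodes.txt
```

After the substitution every host resolves through its .ibnet name, so mpirun opens its connections over InfiniBand.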

The lines starting with #PBS tell the job scheduler how to allocate resources and set other options for your job. The options of particular interest are:

  • select=# -- allocate # separate nodes
  • ncpus=# -- on each node allocate # cpus (cores)
  • mpiprocs=# -- on each node allocate # cpus (of the ncpus allocated) to MPI
  • nmics=# -- on each node allocate # Intel Xeon PHI coprocessor cards.
  • mem=# -- on each node allocate # (kb, mb, gb) of memory.
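These values multiply out to the total rank count passed to mpirun with -np; a quick sanity check using the numbers from the example select line above:

```shell
# Values taken from the example select line: select=8:ncpus=8:mpiprocs=8
select=8       # nodes requested
mpiprocs=8     # MPI ranks placed on each node
# The product must match the -np argument given to mpirun
total=$(( select * mpiprocs ))
echo "$total"
```

If -np asks for more ranks than select x mpiprocs provides, mpirun will oversubscribe or fail, so it is worth checking this arithmetic whenever you change the select line.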

Cherry-creek has three types of compute nodes:

  • cc1 -- 48 "original" cherry-creek 1 nodes (2 Intel Xeon E5-2697v2 12-core, 128Gb ram), with 3 Intel Xeon PHI 7120P (61 core, 16Gb ram) coprocessors
  • cp -- 48 "penguin" cherry-creek 2 (Penguin Computing Relion Servers) nodes (2 Intel Xeon E5-2640v3 8-core, 128Gb ram), with 4 Intel Xeon PHI 31S1P (57 core, 16Gb ram) coprocessors
  • ci -- 24 "waterfall" cherry-creek 2 (Intel Servers) nodes (2 Intel Xeon E5-2697v2 12-core, 192Gb ram), with 2 Intel Xeon PHI 7120P (61 core, 16Gb ram) coprocessors.

Unless you restrict the node type with the "Qlist=" option, your jobs could be allocated nodes from different node types.

To limit the nodes assigned to your job to the "cp" nodes (for example, if your code expects 4 MICs per node), you would use a select statement similar to the following:

  #PBS -l select=2:ncpus=4:mpiprocs=4:nmics=4:mem=60gb:Qlist="cp"

Also, the system reserves a minimum of 4Gb of RAM for the operating system, so do not request more than 124Gb on the cc1 and cp nodes, or 188Gb on the ci nodes.

By varying the above, you can control how cpu resources are allocated. The example above allocates 64 cores, all of which are for use by MPI (8 nodes with 8 cpus on each node).
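For instance, a hybrid MPI/OpenMP job could request fewer mpiprocs than ncpus so that the remaining cores are available for threads. The following is a sketch; the OMP_NUM_THREADS value and rank count are illustrative assumptions, not values from this guide:

```shell
#PBS -l select=8:ncpus=8:mpiprocs=2:mem=60gb
# ... rest of the job script as in the example above ...
export OMP_NUM_THREADS=4                        # assumption: 2 ranks x 4 threads = 8 cores per node
mpirun -np 16 -hostfile $PBS_NODEFILE ./testc   # 8 nodes x 2 mpiprocs = 16 MPI ranks
```

Each node still has all 8 of its cores allocated, but only 2 of them carry MPI ranks; the other 6 are left for the OpenMP threads those ranks spawn.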

To submit the test job:

  qsub testc.pbs
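Once qsub prints the job id, the standard PBS client commands can be used to track or cancel the job (the job id shown here is a placeholder for whatever qsub returned):

```shell
qstat -u $USER     # list your jobs and their current states
qstat -f 1234      # full details for job 1234 (substitute your own job id)
qdel 1234          # cancel job 1234 if needed
```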

Unlike the other clusters, Eureka and Yucca, cherry-creek has only a single execution queue (called workq) for submitting jobs.