2. MPSD HPC system#
Note
Following a major upgrade in early 2023, the documentation and functionality may be suboptimal in places. Please do not hesitate to get in touch with questions and error reports.
For the Raven and Ada machines, please check Overview computing services.
2.1. Login nodes#
Login nodes are mpsd-hpc-login1.desy.de and mpsd-hpc-login2.desy.de.
If you do not yet have access to the system and are a member of the Max Planck Institute for the Structure and Dynamics of Matter (MPSD), please request access to the MPSD HPC system by emailing MPSD IT at support[at]mpsd[dot]mpg[dot]de. Please provide your DESY username in the email and send the email from your Max Planck email account.
Note
During the first login your home directory will be created. This could take up to a minute. Please be patient.
2.2. Job submission#
Job submission is via Slurm.
Example Slurm submission scripts are available below (Setting the rpath (finding libraries at runtime)).
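For orientation, a minimal batch script might look like the following sketch. The job name, resource values and ./my_program are placeholders (not site-mandated settings); the values mirror the Slurm defaults described later in this section. The script is written to a file here so that it could be submitted with sbatch job.sh:

```shell
# Sketch of a minimal Slurm batch script; all values and ./my_program
# are placeholders chosen to match the defaults described in this document.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example          # hypothetical job name
#SBATCH --partition=public          # default partition
#SBATCH --time=01:00:00             # wall-clock limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4000                  # in MB
srun ./my_program                   # placeholder for the real executable
EOF
echo "wrote job.sh; submit with: sbatch job.sh"
```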
2.2.1. Partitions#
The following partitions are available to all (partial output from sinfo):
PARTITION AVAIL TIMELIMIT NODES NODELIST
bigmem up 7-00:00:00 8 mpsd-hpc-hp-[001-008]
gpu up 7-00:00:00 2 mpsd-hpc-gpu-[001-002]
gpu-ayyer up 7-00:00:00 3 mpsd-hpc-gpu-[003-005]
public* up 7-00:00:00 49 mpsd-hpc-ibm-[001-030,035-036,043-049,053-062]
Please use the machines in the gpu partition only if your code supports NVIDIA CUDA.
Hardware resources per node:

public
- 16 physical cores (no hyperthreading, 16 CPUs in Slurm terminology)
- 64 GB RAM
- at most 2 nodes can be used for multi-node jobs (as the 10 Gb Ethernet used for MPI has a relatively high latency)
- microarchitecture: sandybridge

bigmem
- 96 physical cores (192 with hyperthreading, 192 CPUs in Slurm terminology)
- 2 TB RAM
- fast FDR InfiniBand for MPI communication
- microarchitecture: broadwell

gpu
- 16 physical cores (32 with hyperthreading, 32 CPUs in Slurm terminology)
- 1.5 TB RAM
- fast FDR InfiniBand for MPI communication
- 8 Tesla V100 GPUs
- microarchitecture: skylake_avx512

gpu-ayyer
- 16 physical cores (32 with hyperthreading, 32 CPUs in Slurm terminology)
- 372 GB RAM
- fast FDR InfiniBand for MPI communication
- 4 Tesla V100 GPUs
- microarchitecture: cascadelake
The maximum runtime for any job is 7 days. Interactive jobs are limited to 12 hours.
2.2.2. Slurm default values#
Slurm defaults are an execution time of 1 hour, one task, one CPU, 4000 MB of RAM, and the public partition.
Nodes are physical multi-core shared-memory computers and by default are shared between multiple users (i.e. use is not exclusive unless requested).
Logging onto nodes via ssh is only possible once the nodes are allocated (either via sbatch when the job starts, or using salloc for interactive use, see Interactive use of HPC nodes). This avoids accidental over-use of resources and enables energy-saving measures (such as switching compute nodes off automatically if they are not in use).
2.2.3. Slurm CPUs#
Note that a “CPU” in Slurm terminology is a computational core (or a thread if hyperthreading is configured), whereas in everyday language a (populated) CPU (socket) is the device holding many computational cores (such as 16 cores per CPU socket). We use the Slurm convention in this document, as do the Slurm commands.
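On any Linux machine, standard tools show how this terminology maps onto the hardware; a quick way to inspect it (the output differs from node to node):

```shell
# Number of processing units visible to this shell ("CPUs" in the Slurm sense)
nproc
# Sockets, cores per socket and threads per core (the everyday "CPU" view)
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core'
```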
2.2.4. Interactive use of HPC nodes#
For production computation, we typically write a batch file (see Setting the rpath (finding libraries at runtime)) and submit it using the sbatch command.
Sometimes it can be helpful to log into an HPC node, for example to compile software or run interactive tests. The command to use in this case is salloc.
For example, requesting a job with all default settings:
user@mpsd-hpc-login1:~$ salloc
salloc: Granted job allocation 1272
user@mpsd-hpc-ibm-058:~$
We can see from the prompt (user@mpsd-hpc-ibm-058:~$) that the Slurm system has allocated the requested resources on node mpsd-hpc-ibm-058 to us.
We can use the mpsd-show-job-resources command to check some details of the allocation:
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
345415 Nodes: mpsd-hpc-ibm-058
345415 Local Node: mpsd-hpc-ibm-058
345415 CPUSET: 0
345415 MEMORY: 4000 M
Here we see (CPUSET: 0) that we have been allocated one CPU (in Slurm terminology) and that this CPU has index 0. If we had requested multiple CPUs, we would find multiple numbers displayed (see below).
We can finish our interactive session by typing exit:
user@mpsd-hpc-ibm-058:~$ exit
exit
salloc: Relinquishing job allocation 1272
user@mpsd-hpc-login1:~$
Using a tmux session while working interactively is advisable, as it allows you to get back to the terminal in case you lose connection to the session (e.g., due to network issues). A quick start for tmux can be found at tmux/tmux.
If we desire exclusive use of a node (i.e. not shared with others), we can use salloc --exclusive (here we also request a session time of 120 minutes):
user@mpsd-hpc-login2:~$ salloc --exclusive --time=120
salloc: Granted job allocation 1279
user@mpsd-hpc-ibm-061:~$ mpsd-show-job-resources
65911 Nodes: mpsd-hpc-ibm-061
65911 Local Node: mpsd-hpc-ibm-061
65911 CPUSET: 0-15
65911 MEMORY: 56000 M
We can see (in the output above) that all 16 CPUs of the node are allocated to us.
Assume we need 16 CPUs and 10 GB of RAM for our interactive session (the 16 CPUs correspond to the number of OpenMP threads, see OpenMP):
user@mpsd-hpc-login1:~$ salloc --mem=10000 --cpus-per-task=16
salloc: Granted job allocation 1273
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
345446 Nodes: mpsd-hpc-ibm-058
345446 Local Node: mpsd-hpc-ibm-058
345446 CPUSET: 0-15
345446 MEMORY: 10000 M
user@mpsd-hpc-ibm-058:~$
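Inside such an allocation, OpenMP programs typically pick their thread count from the environment rather than from the Slurm allocation, so a common pattern is to derive it from the variable Slurm sets. This is a sketch; outside a job $SLURM_CPUS_PER_TASK is unset, hence the fallback to 1:

```shell
# Match the OpenMP thread count to the Slurm allocation (fallback 1 outside a job)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```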
If we execute MPI programs, we can specify the number of nodes (a node is a compute node, typically with one, two or four CPU sockets) and how many (MPI) tasks (= processes) we want to run on each node. Imagine we ask for two nodes and want to run 4 MPI processes on each:
user@mpsd-hpc-login1:~$ salloc --nodes=2 --tasks-per-node=4
salloc: Granted job allocation 1276
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
345591 Nodes: mpsd-hpc-ibm-[058-059]
345591 Local Node: mpsd-hpc-ibm-058
345591 CPUSET: 0-3
345591 MEMORY: 14000 M
user@mpsd-hpc-ibm-058:~$ srun hostname
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058
The srun command starts the execution of our (MPI) tasks. We use the hostname command above and can see that 4 instances run on each node.
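The same two-node, four-tasks-per-node request can also be made non-interactively; here is a sketch of the corresponding batch script, with hostname standing in for a real MPI program:

```shell
# Batch-script equivalent of the salloc example above (sketch)
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
srun hostname    # 8 tasks in total, 4 per node
EOF
echo "submit with: sbatch mpi_job.sh"
```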
In Slurm, jobs default to the public partition; specifying -p followed by a partition name directs them to a different partition.
user@mpsd-hpc-login1:~$ salloc --mem=1000 -p bigmem --cpus-per-task=12
salloc: Granted job allocation 1277
salloc: Waiting for resource configuration
salloc: Nodes mpsd-hpc-hp-002 are ready for job
user@mpsd-hpc-hp-002:~$ mpsd-show-job-resources
32114 Nodes: mpsd-hpc-hp-002
32114 Local Node: mpsd-hpc-hp-002
32114 CPUSET: 48-53,144-149
32114 MEMORY: 1000 M
This allocates memory from the bigmem partition for the job.
2.2.5. Finding out about my jobs#
There are multiple ways of finding out about your Slurm jobs:
- squeue --me lists only your jobs (see below for output)
- mpsd-show-job-resources can be used ‘inside’ the job (to verify the hardware allocation is as desired)
- scontrol show job JOBID provides a lot of detail
Example: We request 2 nodes, with 4 tasks (and by default one CPU per task)
user@mpsd-hpc-login1:~$ salloc --nodes=2 --tasks-per-node=4
salloc: Granted job allocation 1276
user@mpsd-hpc-login2:~$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1276 public interact user R 11:37 2 mpsd-hpc-ibm-[058-059]
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
345591 Nodes: mpsd-hpc-ibm-[058-059]
345591 Local Node: mpsd-hpc-ibm-058
345591 CPUSET: 0-3
345591 MEMORY: 14000 M
user@mpsd-hpc-login2:~$ scontrol show job 1276
JobId=1276 JobName=interactive
UserId=UserId(28479) GroupId=GroupId(3512) MCS_label=N/A
<...>
RunTime=00:14:08 TimeLimit=01:00:00 TimeMin=N/A
Partition=public AllocNode:Sid=mpsd-hpc-login1.desy.de:3116660
NodeList=mpsd-hpc-ibm-[058-059]
BatchHost=mpsd-hpc-ibm-058
NumNodes=2 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=28000M,node=2,billing=8
Socks/Node=* NtasksPerN:B:S:C=4:0:*:1 CoreSpec=*
MinCPUsNode=4 MinMemoryCPU=4000M MinTmpDiskNode=0
<...>
2.3. Storage and quotas#
The MPSD HPC system provides two file systems: /home and /scratch.

/home/$USER ($HOME)
- home file system for code and scripts
- user quota (storage limit): 100 GB
- regular backups
- users have access to the backup of their data under $HOME/.zfs/snapshots

/scratch/$USER
- scratch file system for simulation output and other temporary data
- there are no backups for /scratch: hardware error or human error can lead to data loss
- a per-user quota of (by default) 25 TB is applied; this is in place to prevent jobs that (unintentionally) write arbitrary amounts of data to /scratch from filling up the file system and blocking the system for everyone
- the following policy is applied to manage overall usage of /scratch: if /scratch fills up, the cluster becomes unusable. Should this happen, we will make space available through the following actions:
  1. purchase and installation of additional hardware to increase the storage available in /scratch (if funding and other constraints allow this)
  2. ask users to voluntarily reduce their usage of /scratch (by, for example, deleting some data, or archiving completed projects elsewhere)
  3. if 1. and 2. do not resolve the situation, a script will be started that deletes some of the files on /scratch (starting with the oldest files). Notice will be given of this procedure.
You can view your current file system usage using the mpsd-quota command. Example output:
username@mpsd-hpc-login2:~$ mpsd-quota
location used avail use%
/home/username 8.74 GB 98.63 GB 8.86%
/scratch/username 705.27 GB 25.00 TB 2.82%
Recommendations for the usage of /home/$USER and /scratch/$USER:
Put small files and important data into /home/$USER: for example source code, scripts to compile your source, compiled software, scripts to submit jobs to Slurm, post-processing scripts, and perhaps also small data files.
Put simulation output (in particular if the files are large) into /scratch/$USER. All the data in /scratch should be re-computable with reasonable effort (typically by running scripts stored somewhere in /home/$USER). This re-computation may be needed if data loss occurs on /scratch, the hardware retires, or if data needs to be deleted from /scratch because we run out of space.
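As an illustration of archiving a completed project away from /scratch, the following sketch packs a project directory into a compressed archive. The project name is hypothetical, and temporary directories stand in for /scratch/$USER and /home/$USER so the example runs anywhere:

```shell
# Stand-ins for /scratch/$USER and /home/$USER so the sketch is self-contained
scratch=$(mktemp -d)
home=$(mktemp -d)
mkdir -p "$scratch/project1"
echo "simulation output" > "$scratch/project1/out.txt"
# Pack the finished project into the (backed-up) home area ...
tar czf "$home/project1.tar.gz" -C "$scratch" project1
# ... then remove it from scratch to free space
rm -rf "$scratch/project1"
tar tzf "$home/project1.tar.gz"
```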
Note
To facilitate the joint analysis of data, the /home and /scratch directories are set up such that all users can read all directories of all other users. If you want to keep your data in subfolder DIR private, you should run a command like chmod -R og-rx DIR.
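A minimal illustration of the effect, using a throwaway directory (the name DIR_demo is arbitrary):

```shell
mkdir -p DIR_demo/sub
chmod 755 DIR_demo                 # typical default: group/others may read and enter
chmod -R og-rx DIR_demo            # withdraw read and execute from group and others
stat -c '%a %n' DIR_demo           # now 700: only the owner retains access
```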
2.4. Software#
The software on the MPSD HPC system is provided via environment modules. This facilitates providing different versions of the same software. The software is organised in a hierarchical structure.
First, you need to decide which MPSD software environment version you need. These are named according to calendar years: the first one in 2023 is 23a. In preparation of that version, there is a dev-23a which is used until 23a is completed. In the examples below, this is always dev-23a. We select that version using the mpsd-modules command, for example mpsd-modules dev-23a.
In order to use a module we first have to load a base compiler and an MPI implementation. That way we can choose between different compilers and MPI implementations for each package. More details are given below.
From a high-level perspective, the required steps to use a particular module are:
Activate the MPSD software environment version of modules
Search for the module to find available versions and required base modules
Load required base modules (such as a compiler)
Load the desired module
2.4.1. TLDR#
- Modules are organised in a hierarchical structure with compiler and MPI implementation as base modules.
- Modules may be compiled with different feature sets. Use mpsd-modules for switching:
  - a generic feature set (runs on all nodes), activated by default: mpsd-modules dev-23a
  - architecture-dependent feature sets (depending on the CPU microarchitecture $MPSD_MICROARCH of the nodes; a suitable optimised set is selected automatically when using the option native): mpsd-modules dev-23a native
  - for more options use mpsd-modules --help
- To subsequently find and load modules:
  module avail
  module spider <module-name>
  module load <module1> [<module2> ...]
- toolchain modules replicate easybuild toolchains and additionally contain all modules required to compile octopus.
- Configure wrapper scripts to compile octopus are available under /opt_mpsd/linux-debian11/dev-23a/spack-environments/octopus/. The configure wrapper script path is also available as the environment variable $MPSD_OCTOPUS_CONFIGURE after a toolchain is loaded.
- Once a module X is loaded, the environment variable $MPSD_X_ROOT provides the location of the module’s files. For example:
  ~$ module load gcc/11.3.0 gsl/2.7.1
  ~$ echo $MPSD_GSL_ROOT
  /opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/gsl-2.7.1-zp2gnohs5lxnujobwrbr3gicz2vdc5jj
- Set the rpath for dependencies; do not use LD_LIBRARY_PATH. See Setting the rpath (finding libraries at runtime).
2.4.2. Initial setup#
The MPSD HPC system consists of a heterogeneous set of compute nodes with different CPU features. This is reflected in the available software stack by providing both a generic set of modules that can be used on all nodes as well as specialised sets of modules for the different available (hardware) microarchitectures. The latter will only run on certain nodes.
A versioning scheme is used for the MPSD software environment to improve
reproducibility. Currently, all software is available in the dev-23a
release (i.e. the first release in 2023). Additional modules will be
added to this environment as long as they do not break anything.
Therefore, users should always specify the version of the modules they
use (even if only a single version is available). A new release will be
made if any addition/change would break backwards compatibility.
This heterogeneous setup makes it necessary to first add an additional path where module files can be found. To activate the different sets of modules we use the mpsd-modules command. It takes two arguments: the release number (of the MPSD software environment, mandatory) and the feature set (optional; the generic set is used by default). Calling mpsd-modules list lists all available releases, and mpsd-modules <release number> list lists all available feature sets. Calling mpsd-modules --help will show help and list available options. The microarchitecture of each node is stored in the environment variable $MPSD_MICROARCH (and can also be obtained via archspec cpu).
To demonstrate the use of mpsd-modules we activate the generic module set of the current software environment dev-23a. These modules can be used on all HPC nodes.
~$ mpsd-modules dev-23a
Now, we can list available modules. At the time of writing this produces the following output (truncated):
~$ module avail
------------ /opt_mpsd/linux-debian11/develop/sandybridge/lmod/Core -------------
gcc/10.3.0 toolchains/foss2021a-mpi
toolchains/foss2021a-cuda-mpi toolchains/foss2021a-serial (D)
----------------------- /usr/share/lmod/lmod/modulefiles -----------------------
Core/lmod Core/settarg (D)
------------------------ /usr/share/modules/modulefiles ------------------------
mathematica mathematica12p2 matlab matlab2021b
Where:
D: Default Module
We can only see a small number of modules. The reason for this is the hierarchical structure mentioned before. The majority of modules are only visible once we load a compiler (and depending on the package an MPI implementation).
We can load a compiler and again list available modules. Now many more are available:
~$ module load gcc/10.3.0
~$ module avail
--------- /opt_mpsd/linux-debian11/develop/sandybridge/lmod/gcc/10.3.0 ----------
autoconf-archive/2022.02.11 libsigsegv/2.13
autoconf/2.71 libtiff/4.4.0
automake/1.16.3 libtool/2.4.6
bdftopcf/1.0.5 libvdwxc/0.4.0
berkeley-db/18.1.40 libxc/4.3.4
berkeleygw/2.1 libxfont/1.5.2
binutils/2.36.1 libxml2/2.10.1
bison/3.8.2 lz4/1.9.4
boost/1.80.0                       m4/1.4.19
bzip2/1.0.8 meson/0.63.3
ca-certificates-mozilla/2022-10-11 metis/5.1.0
cgal/5.0.3 mkfontdir/1.0.7
cmake/3.24.3 mkfontscale/1.1.2
cuda/11.4.4 mpfr/4.1.0
curl/7.85.0 munge/0.5.15
diffutils/3.8 nasm/2.15.05
eigen/3.4.0 ncurses/6.3
expat/2.4.8 nfft/3.4.1
fftw/3.3.9 ninja/1.11.1
findutils/4.9.0 nlopt/2.7.0
flex/2.6.3 numactl/2.0.14
flexiblas/3.0.4 openmpi/4.1.1-cuda-11.4.4
font-util/1.3.2 openmpi/4.1.1 (D)
fontconfig/2.13.94 openssh/9.1p1
fontsproto/2.1.3 openssl/1.1.1s
freetype/2.11.1 pcre/8.45
gawk/5.1.1 pcre2/10.39
gdbm/1.23 perl/5.36.0
gdrcopy/2.3 pigz/2.7
gettext/0.21.1 pkgconf/1.8.0
glib/2.74.1 pmix/4.1.2-cuda-11.4.4
gmp/6.2.1 pmix/4.1.2 (D)
gperf/3.1 py-cython/0.29.32
gsl/2.7 py-docutils/0.19
hdf5/1.12.2 py-numpy/1.23.4
hwloc/2.8.0-cuda-11.4.4 py-pip/22.2.2
hwloc/2.8.0 (D) py-setuptools/59.4.0
json-c/0.16 py-wheel/0.37.1
knem/1.1.4-cuda-11.4.4 python/3.9.5
knem/1.1.4 (D) rdma-core/41.0
krb5/1.19.3 readline/8.1.2
libbsd/0.11.5 slurm/21-08-8-2
libedit/3.1-20210216 sparskit/develop
libevent/2.1.12 sqlite/3.39.4
libffi/3.4.2 swig/4.0.2
libfontenc/1.1.3 tar/1.34
libfuse/3.11.0 texinfo/6.5
libgcrypt/1.10.1 ucx/1.13.1-cuda-11.4.4
libgd/2.2.4 ucx/1.13.1 (D)
libgpg-error/1.46 util-linux-uuid/2.38.1
libiconv/1.16 util-macros/1.19.3
libjpeg-turbo/2.1.3 xproto/7.0.31
libmd/1.0.4 xtrans/1.3.5
libnl/3.3.0 xz/5.2.7
libpciaccess/0.16 zlib/1.2.13
libpng/1.6.37 zstd/1.5.2
------------ /opt_mpsd/linux-debian11/develop/sandybridge/lmod/Core -------------
gcc/10.3.0 (L) toolchains/foss2021a-mpi
toolchains/foss2021a-cuda-mpi toolchains/foss2021a-serial (D)
----------------------- /usr/share/lmod/lmod/modulefiles -----------------------
Core/lmod Core/settarg (D)
------------------------ /usr/share/modules/modulefiles ------------------------
mathematica (L) mathematica12p2 (L) matlab (L) matlab2021b (L)
Where:
L: Module is loaded
D: Default Module
We now unload all loaded modules:
~$ module purge
2.4.3. Loading specific packages#
To find a specific package we can use the module spider command. Without extra arguments this would list all modules. We can search for a specific module by adding the module name. For example, let us find the anaconda distribution:
~$ mpsd-modules dev-23a
~$ module spider anaconda
----------------------------------------------------------------------------
anaconda3: anaconda3/2022.10
----------------------------------------------------------------------------
This module can be loaded directly: module load anaconda3/2022.10
~$ module load anaconda3/2022.10
~$ python --version
Python 3.9.13
~$ which python
/opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-x86_64_v2/gcc-10.2.1/anaconda3-2022.10-2rlbkdehwu64wa4cj6emqdh2t2xbr5zc/bin/python
If you need to know the location of the files associated with a module X, you can use the MPSD_X_ROOT environment variable. For example:
~$ mpsd-modules dev-23a
~$ module load gcc/11.3.0 fftw/3.3.10
~$ echo $MPSD_FFTW_ROOT
/opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti
To get more detailed information, you can use module show X:
~$ module show fftw
----------------------------------------------------------------------------------------------------------------------------------------------
/opt_mpsd/linux-debian11/dev-23a/sandybridge/lmod/gcc/11.3.0/fftw/3.3.10.lua:
----------------------------------------------------------------------------------------------------------------------------------------------
whatis("Name : fftw")
whatis("Version : 3.3.10")
<snip>
prepend_path("LIBRARY_PATH","/opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/lib")
prepend_path("CPATH","/opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/include")
prepend_path("PATH","/opt_mpsd/linux-debian11/dev-23a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/./bin")
<snip>
2.4.4. Octopus#
As a second example of loading pre-compiled packages, let us search for octopus:
~$ mpsd-modules dev-23a
~$ module spider octopus
Multiple versions of octopus may be available. We can specify a particular version in order to get more information on how to load the module:
~$ module spider octopus/12.1
---------------------------------------------------------------------------
octopus: octopus/12.1
---------------------------------------------------------------------------
You will need to load all module(s) on any one of the lines below before
the "octopus/12.1" module is available to load.
gcc/11.3.0 openmpi/4.1.4
We can see that we have to first load gcc/11.3.0 and openmpi/4.1.4 in order to be able to load and use octopus/12.1.
Note
Sometimes module spider will suggest either loading only a compiler, or a compiler plus an MPI implementation. In that case we generally want to also load the MPI implementation, as only that version of the program will use MPI. Loading the MPI-enabled version of the desired program is crucial when running a Slurm job on multiple nodes.
We load gcc/11.3.0, openmpi/4.1.4 and finally octopus/12.1. All of this can be done in one line, as long as the packages are given in the correct order (as shown by module spider):
~$ module load gcc/11.3.0 openmpi/4.1.4 octopus/12.1
As a first simple check we display the version number of octopus:
~$ octopus --version
octopus 12.1 (git commit )
2.4.5. Octopus with CUDA support#
To use Octopus on the GPU nodes, we need CUDA support. This is provided in the microarchitecture skylake_avx512 of the GPU nodes:
~$ mpsd-modules -m skylake_avx512 dev-23a
~$ module spider octopus/12.1
---------------------------------------------------------------------------------
octopus: octopus/12.1
---------------------------------------------------------------------------------
You will need to load all module(s) on any one of the lines below before the "octopus/12.1" module is available to load.
gcc/11.3.0 openmpi/4.1.4
Thus, we can activate Octopus with CUDA support using
~$ mpsd-modules -m skylake_avx512 dev-23a
~$ module load gcc/11.3.0 openmpi/4.1.4-cuda-11.4.4 octopus/12.1-cuda-11.4.4
2.4.6. Python#
To use Python we load the anaconda3 module:
~$ module load anaconda3/2022.10
Anaconda comes with a wide variety of pre-installed Python packages such as numpy, scipy, matplotlib, etc.
Numpy example

We can execute a small demo program called hello_numpy.py. The file has the following content.

import numpy as np

print("Hello World")
print(f"numpy version: {np.__version__}")
x = np.arange(5)
y = x**2
print(y)

~$ python3 hello_numpy.py
Hello World
numpy version: 1.21.5
[ 0  1  4  9 16]
Custom conda environment

We can also create a separate conda environment if we need additional Python software or different versions. We suggest using Miniconda for this (as documented here in more detail).

First, we have to download the Miniconda installer from https://docs.conda.io/projects/miniconda/en/latest/#latest-miniconda-installer-links. Then, we can run the installer and follow the instructions.

~$ wget https://repo.anaconda.com/miniconda/Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
~$ bash Miniconda3-py310_22.11.1-1-Linux-x86_64.sh

Recommendations:
- run conda init when asked, to get access to the conda executable, and
- remove the auto-activation of the base environment to avoid potential conflicts: conda config --set auto_activate_base false (if you need to activate the base environment, use conda activate base).

As an example we now create a new environment, called my_conda_env, with an older version of Python and a specific numpy version from the conda-forge channel.

~$ conda create -n my_conda_env -c conda-forge python=3.9 numpy=1.23

We can now activate the environment and check the versions of Python and numpy.

~$ conda activate my_conda_env
~$ python --version
Python 3.9.16
~$ python -c "import numpy; print(numpy.__version__)"
1.23.5

We can deactivate and remove the environment using:

~$ conda deactivate
~$ conda env remove -n my_conda_env
Warning
The local conda executable becomes unusable if you load and unload the anaconda3 module. If you run into this problem, start a new shell.

When using conda inside a Slurm submission script (or another non-interactive shell), it is necessary to add the following line before activating an environment:

eval "$(conda shell.bash hook)"

Without this line, you may encounter errors related to conda not being able to find or activate the desired environment in your script.
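Putting the warning into practice, a submission script that uses a conda environment might be sketched as follows. The module version and the environment name my_conda_env follow the examples above; my_script.py is a placeholder, and the script is illustrative rather than a tested template:

```shell
# Sketch of a Slurm submission script that activates a conda environment
cat > conda_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
module load anaconda3/2022.10
eval "$(conda shell.bash hook)"    # required in non-interactive shells
conda activate my_conda_env
python my_script.py                # placeholder script
EOF
echo "submit with: sbatch conda_job.sh"
```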
mpi4py example

mpi4py is a Python package that allows you to use MPI from Python. This example needs installation of mpi4py in a custom (conda) environment. To be able to install and use it, we need to load the openmpi module along with the anaconda3 module:

~$ mpsd-modules dev-23a
~$ module load gcc/11.3.0 openmpi/4.1.4
~$ module load anaconda3
~$ echo $MPICC

We echo the value of the MPICC environment variable to check that it is set. This variable should be the same as the result from which mpicc. It is required for the installation of mpi4py to compile and link against the MPI library.

Anaconda does not come with mpi4py pre-installed, so we install it in an environment called my_conda_env:

~$ conda create -n my_conda_env -c conda-forge python=3.11
~$ conda activate my_conda_env
~$ pip install mpi4py

We use pip install instead of conda install because the mpi4py package from the conda-forge channel is incompatible with the openmpi module we loaded.

To quickly test the installation, we can run the hello world example:

~$ srun -n 5 python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 1 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 2 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 3 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 4 of 5 on mpsd-hpc-ibm-023.

Here is how you could replicate the same default hello world example in a Python script (hello_mpi4py.py):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print(f"Hello, World! I am process {rank} of {size} on {MPI.Get_processor_name()}.")

This can be run, as previously mentioned, using the srun command:

~$ srun -n 5 python hello_mpi4py.py
Hello, World! I am process 0 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 2 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 1 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 4 of 5 on mpsd-hpc-ibm-023.
Hello, World! I am process 3 of 5 on mpsd-hpc-ibm-023.
Recommendations
- Keep track of which conda environment was created with which version of the openmpi module: for example via the environment name, or via a suitable script that loads the right openmpi module and activates the corresponding conda environment. If you need to use a different version of openmpi, you need to create a new conda environment.
- Always use the srun command to run MPI programs. This ensures that the MPI processes are started and managed by the Slurm scheduler on the allocated nodes, and not on the login node.
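One way to implement such a helper is a short script that pairs the openmpi module with the conda environment built against it (the module versions and environment name follow the examples above and are illustrative):

```shell
# Sketch of a helper that keeps the MPI module and conda env consistent
cat > activate_mpi4py_env.sh <<'EOF'
#!/bin/bash
# Source this file to get a matching MPI + conda setup:
mpsd-modules dev-23a
module load gcc/11.3.0 openmpi/4.1.4
eval "$(conda shell.bash hook)"
conda activate my_conda_env        # env created while this openmpi module was loaded
EOF
echo "use with: source activate_mpi4py_env.sh"
```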
2.4.7. Jupyter notebooks#
You can use a Jupyter notebook on a dedicated HPC node as follows:
1. Ensure you are at MPSD or have the DESY VPN set up.
2. Log in to a login node (for example mosh mpsd-hpc-login1.desy.de; mosh is recommended over ssh to avoid losing the session in case of short connection interruptions, e.g. on WiFi).
3. Request a node for interactive use. For example, 1 node with 8 CPUs for 6 hours from the public partition:
   salloc --nodes=1 --cpus-per-task=8 --time=06:00:00 -p public
4. Install Jupyter yourself, or activate an installed version with the following commands:
   mpsd-modules dev-23a
   module load anaconda3
5. Limit numpy (and other libraries) to the available cores:
   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
6. Start the Jupyter notebook (or Jupyter lab) server on that node with:
   jupyter-notebook --no-browser --ip=${HOSTNAME}.desy.de
7. Watch the output displayed in your terminal. There is a line similar to this one:
   http://mpsd-hpc-ibm-055.desy.de:8888/?token=8814fea339b8fe7d3a52e7d03c2ce942a3f35c8c263ff0b8
   which you can paste as a URL into your browser (on your laptop/desktop), and you should be connected to the notebook server on the compute node.
2.4.8. Matlab#
To use Matlab, we load the matlab module.
~$ module load matlab
We can execute a small demo program called hello_matlab.m. The file has the following content.
% hello_matlab.m
disp('Hello, MATLAB!');
a = [1 2 3; 4 5 6; 7 8 9];
b = ones(3, 3);
result = a + b;
disp('Matrix addition result:');
disp(result);
~$ matlab -nodisplay -r "run('hello_matlab');exit;"
Hello, MATLAB!
Matrix addition result:
2 3 4
5 6 7
8 9 10
An interactive interface can be started by running matlab in the terminal.
~$ matlab
2.4.9. Loading a toolchain to compile octopus#
There is also a set of special meta-modules, called toolchains. These load groups of modules (e.g. compiler, MPI, BLAS, …). The versions of the individual packages follow the easybuild toolchains. In addition to the set of modules defined by easybuild, the toolchains on the HPC system also contain all modules required to compile octopus from source.
Here we show two examples of how to compile octopus: a serial and an MPI version. Following this guide is only recommended if you need to compile octopus from source; we also provide pre-compiled modules for octopus, as outlined in Loading specific packages above.
As mentioned before, different variants of (most) modules are available that support different CPU feature sets. So far we have only discussed the generic set that can be used on all nodes. In order to make use of all available features on a specific node, we can instead load a more optimised set of modules. The CPU microarchitecture is available in the environment variable $MPSD_MICROARCH.
First, we remove the generic module set and activate the optimised set for the current node.
~$ module purge
~$ mpsd-modules -m $MPSD_MICROARCH dev-23a
Parallel version of octopus

We can load the toolchain foss2021a-mpi to compile octopus using gcc 10.3.0 and openmpi 4.1.1:

~$ module load toolchains/foss2021a-mpi

Next, we clone octopus:

~$ git clone https://gitlab.com/octopus-code/octopus.git

(If you intend to make changes in the octopus code and push them back as a merge request later, you should use git clone git@gitlab.com:octopus-code/octopus.git to clone via ssh instead of https.)

After cloning the octopus repository, proceed with:

~$ cd octopus
~$ autoreconf -fi
The SSU Computational Science maintains a set of configure wrapper scripts that source the configure script of octopus with the right parameters for each toolchain. These can be used to compile octopus with standard feature sets. The scripts are available at /opt_mpsd/linux-debian11/dev-23a/spack-environments/octopus/. Here is an example (foss2021a-mpi-config.sh):

#!/bin/sh
export CC="mpicc"
MARCH_FLAG="-march=${GCC_ARCH:-native}"
OPTIMISATION_LEVEL="-O3"
export CFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
export CXX="mpicxx"
export CXXFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
export FC="mpif90"
export FCFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz"

# Prepare setting the RPATHs for the executable
export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`

# help configure to find the ELPA F90 modules and libraries
try_mpsd_elpa_version=`expr match "$MPSD_ELPA_ROOT" '.*/elpa-\([0-9.]\+\)'`
if [ -n "$try_mpsd_elpa_version" ] ; then
    if [ -r $MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules/elpa.mod ]; then
        export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules"
        export LIBS_ELPA="-lelpa_openmp"
    elif [ -r $MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules/elpa.mod ] ; then
        export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules"
    fi
fi
unset try_mpsd_elpa_version

# always keep options in the order listed by ``configure --help``
../configure \
    --enable-mpi \
    --enable-openmp \
    --with-libxc-prefix="$MPSD_LIBXC_ROOT" \
    --with-libvdwxc-prefix="$MPSD_LIBVDWXC_ROOT" \
    --with-blas="-L$MPSD_OPENBLAS_ROOT/lib -lopenblas" \
    --with-gsl-prefix="$MPSD_GSL_ROOT" \
    --with-fftw-prefix="$MPSD_FFTW_ROOT" \
    --with-pfft-prefix="$MPSD_PFFT_ROOT" \
    --with-nfft="$MPSD_NFFT_ROOT" \
    --with-pnfft-prefix="$MPSD_PNFFT_ROOT" \
    --with-berkeleygw-prefix="$MPSD_BERKELEYGW_ROOT" \
    --with-sparskit="$MPSD_SPARSKIT_ROOT/lib/libskit.a" \
    --with-nlopt-prefix="$MPSD_NLOPT_ROOT" \
    --with-blacs="-L$MPSD_NETLIB_SCALAPACK_ROOT/lib -lscalapack" \
    --with-scalapack="-L$MPSD_NETLIB_SCALAPACK_ROOT/lib -lscalapack" \
    --with-elpa-prefix="$MPSD_ELPA_ROOT" \
    --with-cgal="$MPSD_CGAL_ROOT" \
    --with-boost="$MPSD_BOOST_ROOT" \
    --with-metis-prefix="$MPSD_METIS_ROOT" \
    --with-parmetis-prefix="$MPSD_PARMETIS_ROOT" \
    "$@" | tee 00-configure.log 2>&1

echo "-------------------------------------------------------------------------------" >&2
echo "configure output has been saved to 00-configure.log" >&2
if [ "x${GCC_ARCH-}" = x ] ; then
    echo "Microarchitecture optimization: native (set \$GCC_ARCH to override)" >&2
else
    echo "Microarchitecture optimization: $GCC_ARCH (from override \$GCC_ARCH)" >&2
fi
echo "-------------------------------------------------------------------------------" >&2
You can either source this script manually or use the environment variable MPSD_OCTOPUS_CONFIGURE, which points to the right script for the current toolchain once the toolchain is loaded. The following example uses the environment variable to configure and compile octopus:

1 ~$ autoreconf -i
2 ~$ mkdir _build && cd _build
3 ~$ source $MPSD_OCTOPUS_CONFIGURE --prefix=`pwd`
4 ~$ make
5 ~$ make check-short
6 ~$ make install
After line 4 (make) has completed, the octopus binary is located at _build/src/octopus.

If you need to use the make install target (line 6), you can define the location to which Octopus installs its binaries (bin), libraries (lib), documentation (doc) and more using the --prefix flag in line 3. In this example, the install prefix is the current working directory (pwd) at that point, i.e. the _build directory, so that after installation the octopus executable is located at _build/bin/octopus.

Serial version of octopus
Compiling the serial version consists, in principle, of the same steps as for the parallel version; only the toolchain and the configuration script differ.
We load the serial toolchain:
~$ module load toolchains/foss2021a-serial
and use the following configure script (foss2021a-serial-config.sh):

#!/bin/sh
export CC="gcc"
MARCH_FLAG="-march=${GCC_ARCH:-native}"
OPTIMISATION_LEVEL="-O3"
export CFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
export CXX="g++"
export CXXFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
export FC="gfortran"
export FCFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz"

#HG: ugly hack to include rpath while linking
# becomes necessary for spack >= 0.19, as it does not set LD_LIBRARY_PATH anymore
export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`

# HG: help configure to find the ELPA F90 modules and libraries
try_mpsd_elpa_version=`expr match "$MPSD_ELPA_ROOT" '.*/elpa-\([0-9.]\+\)'`
if [ -n "$try_mpsd_elpa_version" ] ; then
    if [ -r $MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules/elpa.mod ]; then
        export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules"
        export LIBS_ELPA="-lelpa_openmp"
We can then follow the instructions from the previous section.
Passing additional arguments to configure

The *-config.sh scripts shown above accept additional arguments and pass them on to the configure command of Octopus. For example, foss2021a-serial-config.sh --prefix=<basedir> would pass --prefix=<basedir> on to the configure script (see the Octopus documentation).
2.4.10. Compiling custom code#
To compile other custom code we generally require a different collection
of modules than the one provided by the toolchains
meta-modules. In
these cases it might be necessary to manually load all required modules.
The same general notes about generic and optimised module sets
explained in the previous section apply.
Here, we show two different examples. The sources are available under /opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples.
Serial “Hello world” in Fortran

First, we want to compile the following “Hello world” Fortran program using gcc. We assume it is saved in a file hello.f90. The source is available in /opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/serial-fortran.

program hello
    write(*,*) "Hello world!"
end program
We have to load gcc:

module load gcc/10.3.0
Then, we can compile and execute the program:
~$ gfortran -o hello hello.f90
~$ ./hello
Hello world!
MPI-parallelised “Hello world” in C

As a second example we compile an MPI-parallelised “Hello world” C program, again using gcc. We assume the source is saved in a file hello-mpi.c (source available under /opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/mpi-c).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from rank %d out of %d on %s.\n",
           world_rank, world_size, processor_name);

    MPI_Finalize();
}
We have to load gcc and openmpi:

~$ module load gcc/10.3.0 openmpi/4.1.1
Now, we can compile and execute the test program:
~$ mpicc -o hello-mpi hello-mpi.c
~$ orterun -n 4 ./hello-mpi
Hello world from rank 2 out of 4 on mpsd-hpc-login1.
Hello world from rank 3 out of 4 on mpsd-hpc-login1.
Hello world from rank 1 out of 4 on mpsd-hpc-login1.
Hello world from rank 0 out of 4 on mpsd-hpc-login1.
Note
Inside a Slurm job, srun has to be used instead of orterun.
2.4.11. Setting the rpath
(finding libraries at runtime)#
This section is relevant if you compile your own software and need to link to libraries provided on the MPSD HPC system.
Background
At compile time (i.e. when compiling and linking an executable), we need to tell the linker where to find external libraries. This happens via -L flags and the environment variable LIBRARY_PATH, which the compiler (for example gcc) passes on to the linker.

At runtime, the dynamic linker ld.so needs to find the libraries (identified by their SONAME) that our executable was linked against by searching through one or more directories. These directories are taken from (in decreasing order of priority):

(i) the environment variable LD_LIBRARY_PATH, if set,

(ii) one or more rpath entries set in the executable,

(iii) if not found yet, the default search path defined in /etc/ld.so.conf.
Use rpath; do not set LD_LIBRARY_PATH

When we compile software on HPC systems, we generally want to use the rpath mechanism. That means

(a) we must not set the LD_LIBRARY_PATH environment variable, and

(b) we must set the rpath in the executable. To embed /PATH/TO/LIBRARY in the rpath entry in the header of the executable, we append -Wl,-rpath=/PATH/TO/LIBRARY to the call of the compiler.
Example: Linking to FFTW
Given this C program with name fftw_test.c:

#include <stdio.h>
#include <fftw3.h>

#define N 32

int main(int argc, char *argv[]) {
    fftw_complex *in, *out;
    fftw_plan p;

    in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

    p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p); /* repeat as needed */
    fftw_destroy_plan(p);

    fftw_free(in);
    fftw_free(out);
    printf("Done.\n");
    return 0;
}
we can compile it as follows:
$ mpsd-modules 23b
$ module load gcc fftw
$ gcc -lfftw3 -L$MPSD_FFTW_ROOT/lib -Wl,-rpath=$MPSD_FFTW_ROOT/lib fftw_test.c -o fftw_test
In the compile (and link) line, we have to specify the path to the relevant file libfftw3.so. For every package, the MPSD HPC system provides the relevant path to the package root in an environment variable of the form MPSD_<PACKAGE_NAME>_ROOT:

$ echo $MPSD_FFTW_ROOT
/opt_mpsd/linux-debian11/23b/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti
If we replace the variables in the compiler call, it would look as follows.
$ gcc -lfftw3 \
  -L/opt_mpsd/linux-debian11/23b/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/lib \
  -Wl,-rpath=/opt_mpsd/linux-debian11/23b/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/lib \
  fftw_test.c -o fftw_test
Users are strongly advised to use the environment variables.
When loading modules (or complete toolchains) using the module command, the MPSD HPC system also populates the variable LIBRARY_PATH, which the compiler uses in place of -L arguments if the variable exists. We can thus omit the -L flag in the call:

$ gcc -lfftw3 -Wl,-rpath=$MPSD_FFTW_ROOT/lib fftw_test.c -o fftw_test
We can use the ldd command to check which libraries the dynamic linker identifies:

$ ldd fftw_test
        linux-vdso.so.1 (0x00007f574dcdd000)
        libfftw3.so.3 => /opt_mpsd/linux-debian11/23b/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-susnnr2vitgjh6nqajmr7vfu7afumeti/lib/libfftw3.so.3 (0x00007f574dae7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f574d8ff000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f574d7bb000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f574dcdf000)
Setting rpath for multiple dependencies (useful template)

As the LIBRARY_PATH variable contains the paths to all loaded libraries, we can use it to create the relevant -Wl,-rpath=... arguments automatically, as shown in this example:

$ export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
$ gcc -lfftw3 $LDFLAGS fftw_test.c -o fftw_test
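To see what this template expands to, we can feed it a hand-crafted LIBRARY_PATH (the two paths below are made up for illustration):

```shell
# Illustrate the sed template: every ':' separator (including the one
# prepended by ${LIBRARY_PATH:+:...}) becomes ' -Wl,-rpath='.
LIBRARY_PATH="/opt/libA/lib:/opt/libB/lib"
LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
echo "$LDFLAGS"
# output: -Wl,-rpath=/opt/libA/lib -Wl,-rpath=/opt/libB/lib
```

Each directory from LIBRARY_PATH thus ends up as one -Wl,-rpath= argument, and the ${LIBRARY_PATH:+...} construct makes LDFLAGS empty when LIBRARY_PATH is unset.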
Related documentation
Related documentation (addressing the same issue and approach) from the MPCDF is available at https://docs.mpcdf.mpg.de/faq/hpc_software.html#how-do-i-set-the-rpath
2.5. Example batch scripts#
Here, we show a number of example batch scripts for different types of
jobs. All examples are available on the HPC system under
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples
together
with the example programs. One can also get the latest copy of the
scripts from the git repository
here. We use the
public
partition and the generic module set for all examples.
To test an example on the HPC system, we can copy the relevant directory into our scratch directory. If required, we can compile the code using make and then submit the job using sbatch submission-script.sh.
2.5.1. MPI#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/mpi-c
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J MPI-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
. setup-env.sh
srun ./hello-mpi
2.5.2. MPI + OpenMP#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/mpi-openmp-c
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J MPI-OpenMP-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00
. setup-env.sh
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# For pinning threads correctly:
export OMP_PLACES=cores
srun ./hello-mpi-openmp
2.5.3. OpenMP#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/openmp-c
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J OpenMP-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# For pinning threads correctly:
export OMP_PLACES=cores
srun ./hello-openmp
2.5.4. Python with numpy or multiprocessing#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/python-numpy
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J python-numpy-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00
module purge
mpsd-modules dev-23a
module load anaconda3/2022.10
eval "$(conda shell.bash hook)"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun python3 ./hello-numpy.py
2.5.5. Single-core job#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/serial-fortran
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J serial-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
srun ./hello
2.5.6. Serial Python#
The source code and submission script are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/python-serial
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J python-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
module purge
mpsd-modules dev-23a
module load anaconda3/2022.10
eval "$(conda shell.bash hook)"
export OMP_NUM_THREADS=1 # restrict numpy (and other libraries) to one core
srun python3 ./hello.py
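The contents of hello.py are not reproduced here; a minimal hypothetical stand-in, just for trying out the batch script, could be created and run like this:

```shell
# Create a hypothetical stand-in for hello.py and run it directly.
cat > hello.py <<'EOF'
print("Hello world from serial Python!")
EOF
python3 hello.py
# prints: Hello world from serial Python!
```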
2.5.7. GPU jobs#
For GPU jobs, we recommend specifying the desired hardware resources as follows. In parentheses we give the typical application (MPI, OpenMP) for guidance.

nodes - how many computers to use, for example --nodes=1

tasks-per-node - how many (MPI) processes to run per node: --tasks-per-node=4

gpus-per-task - how many GPUs per (MPI) process to use (often 1): --gpus-per-task=1

cpus-per-task - how many CPUs (OpenMP threads) to use per task: --cpus-per-task=4
Example:
user@mpsd-hpc-login1:~$ salloc --nodes=1 --tasks-per-node=4 --gpus-per-task=1 --cpus-per-task=4 --mem=128G -p gpu
user@mpsd-hpc-gpu-002:~$ mpsd-show-job-resources
9352 Nodes: mpsd-hpc-gpu-002
9352 Local Node: mpsd-hpc-gpu-002
9352 CPUSET: 0-7,16-23
9352 MEMORY: 131072 M
9352 GPUs (Interconnects, CPU Affinity, NUMA Affinity):
9352 GPU0 X NV1 NV1 NV2 SYS 0-7,16-23 0-1
9352 GPU1 NV1 X NV2 NV1 SYS 0-7,16-23 0-1
9352 GPU2 NV1 NV2 X NV2 SYS 0-7,16-23 0-1
9352 GPU3 NV2 NV1 NV2 X SYS 0-7,16-23 0-1
user@mpsd-hpc-gpu-002:~$
We can see from the output that we have one node (mpsd-hpc-gpu-002), 16 CPUs (with ids 0 to 7 and 16 to 23), 128GB (=131072MiB), and 4 GPUs allocated (GPU0 to GPU3).
The source code and submission script for the CUDA example are in
/opt_mpsd/linux-debian11/dev-23a/examples/slurm-examples/cuda
.
#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p gpu
#
# job name
#SBATCH -J CUDA-MPI-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:02:00
. setup-env.sh
srun ./hello-cuda