Documentation

Elysium is the central HPC Cluster at the Ruhr-Universität Bochum. See the overview of its resources.

To use Elysium, you need to

Please read about the basic concept of using Elysium first.

The login process combines SSH key-based authentication with web-based two-factor authentication.

After login, you can use available software modules or build your own software.

Read about submitting jobs and allocating resources in the SLURM section.

Subsections of Documentation

Basics

Elysium provides four login nodes: login1.elysium.hpc.rub.de, …, login4.elysium.hpc.rub.de. These are your entry points to the cluster.

After login, you typically use them to prepare your software and copy your data to the appropriate locations. Performing calculations on the login nodes is prohibited (see FAQ).

You can then allocate resources on the cluster using the Slurm workload manager.

After submitting your request, Slurm will grant you the resources as soon as they are free and your priority is higher than the priority of other jobs that might be waiting for some of the same resources.

Your priority depends on your waiting time and your remaining FairShare.

Login

Login to Elysium combines the common SSH key-based authentication with web-based two-factor authentication.

Circumventing these steps (e.g., opening sessions without the need for two factor authentication, sharing of accounts, keys, or login credentials) violates the IT.SERVICES Terms of Use and may result in permanent loss of access rights to the Elysium cluster and/or other services provided by IT.SERVICES.

In order to be able to authenticate during login you need submit your public SSH key via the User Access Application Form as well as enable two-factor authentication for your RUB LoginID at rub.de/login.

The additional web-based authentication is cached for 14 hours so that you typically only have to do it once per work day, per login node, and per IP address you connect from. After that, your normal key-based SSH workflow will work as expected.

In order to simplify the use of SSH keys we recommend to specify it as identity file in your SSH config. This can be done by adding the following lines to your ~/.ssh/config file:

Host login00*.elysium.hpc.rub.de login00*.elysium.hpc.ruhr-uni-bochum.de
    IdentityFile ~/.ssh/elysium
    User <loginID>

where <loginID> has to be exchanged by your RUB LoginID. If your SSH key is located in a different file the IdentityFile path needs to be adjusted accordingly.

Follow these steps:

  1. Start ssh with the correct private key, your RUB LoginID, and one of the four login hosts, e.g.
    ssh -i ~/.ssh/elysium LOGINID@login001.elysium.hpc.ruhr-uni-bochum.de, or ssh login001.elysium.hpc.rub.de if you want to use the SSH config specified above. Available login nodes are login001 to login004. Login step 1: start ssh Login step 1: start ssh

  2. Open the URL in a browser (or scan the QR code with your smartphone) to start web-based two-factor authentication. Login step 2 part 1: start web-based authentication Login step 2 part 1: start web-based authentication

  3. Enter the second factor for two-factor authentication. Login step 2 part 2: web-based LoginID / password authentication Login step 2 part 2: web-based LoginID / password authentication

  4. After successful login, you get a four-digit verification code. Login step 2 part 3: get the verification code Login step 2 part 3: get the verification code

  5. Enter this code at your ssh prompt to finish login. Login step 3: verify the SSH session Login step 3: verify the SSH session

For the next 14 hours, only step 1 (classic key-based authentication) will be necessary on the chosen login node for the IP address you connected from.

Login will fail if:

  • You use the wrong private key (“Permission denied (publickey)”)
  • You are not member of an active HPC project (“Permission denied (publickey)”)
  • You did not enable two-factor authentication for your LoginID (“Two-factor authentication is required”)
  • Web-based login fails
  • You enter the wrong verification code (“Verification failed”)
  • A timeout happens between starting the SSH session and finalizing web-based login (“session_id not found”); just start the process again to get a new session ID.

Software

We provide a basic set of toolchains and some common libraries via modules.

To build common HPC software packages, we provide a central installation of the Spack package manager. A detailed guide on how to use this installation can be found in the Spack Usage Guide.

Subsections of Software

Modules

We use the Lmod module system to provide compilers, MPI stacks, libraries, and tools.

The legacy module tree (elysium/2024) is currently the default view. To switch to the current tree, load:

module load elysium/2026

The elysium/202x modules modify the MODULEPATH. They are sticky, so they are not removed by module purge. elysium/2024 and elysium/2026 are mutually exclusive: loading one unloads the other. The legacy tree (elysium/2024) is frozen and no longer updated by admins. The 2026 tree is built with gcc@13.4.0, which provides better optimization support for Elysium’s Zen 4 CPUs than gcc@11.

Known Issues

  • Intel MPI in elysium/2024 (legacy tree) is known to be unstable in this version. It can cause random MPI deadlocks where applications stop progressing while the Slurm job is still running.
  • In the current tree (elysium/2026), Intel MPI is stable and recommended.
  • MPICH versions before 5.0.0 are not suitable for multi-node runs. Use 5.0.0 or newer for multi-node jobs.
  • If you must use MPICH < 5.0.0, restrict jobs to single-node execution.

Common module commands

ml is a short alias for module.

  • ml av: list available modules in the current module path.
  • ml load <module>: load a module. Example: ml load openmpi/5.0.9.
  • ml list: show currently loaded modules.
  • ml unload <module>: unload a module. Example: ml unload openmpi/5.0.9.
  • ml purge: unload all currently loaded modules except sticky ones.
  • ml help <module>: show module help text. Example: ml help openmpi/5.0.9.
  • ml show <module>: print the contents of the modulefile.
  • ml whatis <module>: show a short module description.
  • ml spider <name>: search all known versions. Use ml spider -A <name> to also show hidden entries.
  • ml use <path>: add a directory to your module search path.
  • ml unuse <path>: remove a directory from your module search path.

Using your own modulefiles

You can keep your own modulefiles in a personal directory and add it with ml use.

mkdir -p "$HOME/modules/mytool"
cat > "$HOME/modules/mytool/1.0.lua" <<'LUA'
help([[MyTool 1.0]])
whatis("Name: MyTool")
whatis("Version: 1.0")
depends_on("openmpi/5.0.9")
prepend_path("PATH", "/path/to/mytool/bin")
LUA

ml use "$HOME/modules"
ml load mytool/1.0

In this example, loading mytool/1.0 also loads openmpi/5.0.9 as a dependency.

To make your personal modules always visible, add this to your ~/.bashrc:

ml use "$HOME/modules"

If you need additional software or module versions, please contact support.

Spack

We provide a central Spack 1.1.1 installation in parallel to the existing 0.23.0 stack.

This separation is necessary because a database migration would render the 0.23.0 installation unusable. The new 2026 module tree is not compatible with the legacy tree, and self-built packages from the old setup must be rebuilt.

By default, you will still see the legacy module tree. The new stack is opt-in via:

module load spack/2026

If you already used the legacy setup, use this dedicated migration guide: Migration Guide for Existing Users.

Legacy documentation remains available here: Spack (Legacy 0.23.0).


Table of Contents

  1. Quick Setup (New Users)
  2. Architecture Overview
  3. Guide to Using Spack
  4. What Is New in Modern Spack
  5. Legacy Setup

Quick Setup

For a fresh setup:

rub-deploy-spack-configs-2026
module load spack/2026

After that, you can install your own packages into your home Spack tree and generate personal modulefiles. Add module load spack/2026 to your ~/.bashrc if you want this active in ever login shell.


Architecture Overview

We use a central Spack installation combined with per-user overlays:

Central Spack (read-only, maintained by HPC)
    ├─ pre-built packages
    ├─ MPI configurations
    └─ cluster-wide defaults
            ↓ (upstream)
User Spack ($HOME)
    ├─ personal installs
    └─ personal modules
            ↓
Lmod module system

This means:

  • You automatically use centrally installed packages when available.
  • Cluster-wide configurations (e.g. MPI defaults, compiler settings) are inherited.
  • Your own installations go into your personal $HOME Spack tree.
  • Only missing packages are built locally.

Guide to Using Spack

This section covers the most common Spack workflows on Elysium.


Searching and Inspecting Packages

Search for packages:

spack list <keyword>
spack list openfoam

Show details (versions, variants, dependencies):

spack info hdf5

Preview what will actually be installed:

spack spec hdf5 +mpi

Always check spack spec before installing complex packages.


Installing Packages

Basic installation:

spack install hdf5

Enable or disable variants:

spack install hdf5 +mpi +cxx ~fortran

Specify a compiler:

spack install hdf5 %gcc@13.4.0

Force rebuilding dependencies with the same compiler:

spack install --fresh hdf5 %gcc@13.4.0

Specify dependencies explicitly:

spack install hdf5 ^openmpi@5.0.9

Everything combined:

spack install hdf5@1.14.6 +mpi %gcc@13.4.0 ^openmpi@5.0.9

Virtual Providers (BLAS, MPI, FFT, etc.)

Some packages depend on virtual interfaces instead of concrete libraries. Examples include:

  • blas
  • lapack
  • mpi
  • fftw-api

To see which virtual packages exist:

spack providers

To list available providers for a specific interface:

spack providers blas
spack providers fftw-api

Example output:

Providers for blas:
    amdblis
    openblas
    intel-oneapi-mkl

You can select a specific provider during installation:

spack install gromacs ^blas=openblas

This is often preferred over manually selecting a specific concrete library, because it keeps the dependency graph clean and compatible.


Inspecting and Comparing Installations

List installed packages with variants and hashes:

spack find -vl hdf5

Inspect a specific installation:

spack spec /<hash>

Compare two installations:

spack diff /hash1 /hash2

Removing Packages

Remove a specific installation by hash:

spack uninstall /<hash>

Overriding Package Definitions

On Elysium, the central builtin repository is provided by the HPC team. If you want to override a package definition (e.g. to test changes), you can create a local repository on top of it.

Create a local repo

mkdir -p $HOME/spack/var/spack/repos/packages
cat > $HOME/spack/var/spack/repos/repo.yaml <<'EOF'
repo:
  namespace: overrides
EOF

Register it in ~/.spack/repos.yaml (place it above builtin to override):

repos:
  overrides: $HOME/spack/var/spack/repos
  builtin:
    destination: /cluster/spack/spack-packages

Check:

spack repo list

Override a package (example: ffmpeg)

mkdir -p $HOME/spack/var/spack/repos/packages/ffmpeg
cp /cluster/spack/spack-packages/repos/spack_repo/builtin/packages/ffmpeg/package.py \
   $HOME/spack/var/spack/repos/packages/ffmpeg/

Edit the copied package.py and adjust versions, dependencies, variants, etc.

Install explicitly from your namespace:

spack install overrides.ffmpeg

Verify which repository is used

For a spec (not yet installed):

spack spec -N ffmpeg

For installed packages:

spack find -N ffmpeg

The -N option shows the namespace (overrides or builtin).


Custom Changes to Packages using spack develop

If you want to modify the source code of a package (e.g. openfoam) and rebuild it locally, you can use spack develop. This allows you to work directly on a source checkout without creating tarballs or calculating checksums.

Create and activate a development environment

mkdir -p ~/openfoam-dev
cd ~/openfoam-dev
spack env create -d .
spacktivate . #shortcut for `spack env activate .`

Using a dedicated environment keeps your development work isolated from your normal Spack setup.

Add and install the package

spack add openfoam
spack install

This performs a normal installation and ensures all dependencies are available.

Switch to development mode

spack develop openfoam

This checks out the source code into the environment directory (e.g. ~/openfoam-dev/openfoam/) and registers it as the active development source.

You can verify this with:

spack find -cv openfoam

Look for dev_path=.../openfoam in the output.

Modify the source and rebuild

cd ~/openfoam-dev/openfoam
# edit source files here (e.g. with vim)

cd ~/openfoam-dev
spack install

Spack will now build openfoam from your modified local sources.

If compilation fails, Spack will print the relevant error messages and the path to the full build log.


Legacy Setup

If you need to reference the old 0.23.0 setup, use: Spack (Legacy 0.23.0).

Subsections of Spack

Vampir

Vampir is a framework for analyzing program behavior of serial and parallel software by utilizing function instrumentation via Score-p. Vampir is licensed by HPC.nrw and can be used freely on the Elysium Cluster. This site merely shows a small test case to show how Score-p can be used to generate profiling data and how Vampir can be started on Elysium. For information how to use Vampir to analyze your application, extract useful performance metrics, and identify bottlenecks, please refer to the Score-p Cheat Sheet and the official Vampir Documentation.

Compilation with Instrumented Functions

In order to generate profiling data the function calls in the application need to be instrumented. This means inserting additional special function calls that record the time, current call stack, and much more. Fortunately, this is not done manually, but can easily achieved by using the Score-p compiler wrapper. To follow along you can use this MPI Example Code.

To use the Score-p compiler wrapper, all that is needed is to prepend the compiler by the scorep command:

module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6
scorep mpicc -o mpi-test.x mpi-test.c

In the case of a Makefile, or other build systems, the compiler variable has to be adjusted accordingly.

Generating Profiling Data

Profiling data is created by running the application. Note that the profiling files can grow to enormous sizes. Thus, it is advisable to choose a small representative test case for your application and not a full production run. In its default mode Score-p collects profiling data by sampling the applications call-stack from time to time. In order to generate an accurate profile tracing needs to be enabled in your job script:

module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6

export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x

Here is a full job script for the example:

#!/bin/bash
#SBATCH --partition=cpu
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --account=<Account>
#SBATCH --time=00-00:05:00

module purge
module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6

export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x

The execution of the instrumented application will take significantly longer than usual. Thus, it should never be used for production runs, but merely for profiling. After the application is finished a new directory was created, containing the time stamp and some other information in its name e.g.: scorep-20251222_0912_1523094386395226 The file traces.otf2 contains the profiling data required by Vampir.

Visualizing With Vampir

In order to visualize the profiling data a Visualization Session has to be established. Vampir can be started with

module load vampir
vglrun +pr -fps 20 vampir ./traces.otf2

This will open the Vampir graphical user interface: Vampir Vampir

VASP

Build configuration (MKL)

On Elysium, VASP can be built with Spack using:

spack install vasp@6.4.3 +openmp +fftlib ^openmpi@5.0.5 ^fftw@3+openmp ^intel-oneapi-mkl threads=openmp +ilp64

This configuration uses:

  • Intel oneAPI MKL (ILP64) for BLAS, LAPACK and ScaLAPACK,
  • VASP’s internal FFTLIB to avoid MKL CDFT issues on AMD,
  • OpenMPI 5.0.5 as MPI implementation,
  • OpenMP enabled for hybrid parallelisation.

We choose MKL as baseline because it is the de-facto HPC standard and performs well on AMD EPYC when AVX512 code paths are enabled.


Activating AVX512

Intel’s MKL only enables AVX512 optimisations on Intel CPUs. On AMD, MKL defaults to AVX2/SSE code paths.

To unlock the faster AVX512 kernels on AMD EPYC we provide libfakeintel, which fakes Intel CPUID flags.

MKL version library to preload
≤ 2024.x /lib64/libfakeintel.so
≥ 2025.x /lib64/libfakeintel2025.so works for older versions too

⚠ Intel gives no guarantee that all AVX512 instructions work on AMD CPUs. In practice, the community has shown that not every kernel uses full AVX512 width, but the overall speed-up is still substantial.

Activate AVX512 by preloading the library in your job:

export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}

Test case 1 – Si256 (DFT / Hybrid HSE06)

This benchmark uses a 256-atom silicon supercell (Si256) with the HSE06 hybrid functional. Hybrid DFT combines FFT-heavy parts with dense BLAS/LAPACK operations and is therefore a good proxy for most large-scale electronic-structure workloads.

Baseline: MPI-only, 1 node

Configuration Time [s] Speed-up vs baseline
MKL (no AVX512) 2367 1.00×
MKL (+ AVX512) 2017 1.17×

→ Always enable AVX512. The baseline DFT case runs 17 % faster with libfakeintel,


Build configuration (AOCL)

AOCL (AMD Optimized Libraries) is AMD’s analogue to MKL, providing:

  • AMDBLIS (BLAS implementation)
  • AMDlibFLAME (LAPACK)
  • AMDScaLAPACK, AMDFFTW optimised for AMD EPYC
  • built with AOCC compiler

Build example:

spack install vasp@6.4.3 +openmp +fftlib %aocc ^amdfftw@5 ^amdblis@5 threads=openmp ^amdlibflame@5 ^amdscalapack@5 ^openmpi

AOCL detects AMD micro-architecture automatically and therefore does not require libfakeintel.

Baseline: MPI-only, 1 node

Configuration Time [s] Speed-up vs baseline
MKL (+ AVX512) 2017 1.00
AOCL (AMD BLIS / libFLAME) 1919 1.05

The AOCl build is another 5% faster than MKL with AVX512 enabled.


Hybrid parallelisation and NUMA domains

Each compute node has two EPYC 9254 CPUs with 24 cores each (48 total). Each CPU is subdivided into 4 NUMA domains with separate L3 caches and memory controllers.

  • MPI-only: 48 ranks per node (1 per core).
  • Hybrid L3: 8 MPI ranks × 6 OpenMP threads each, bound to individual L3 domains.

This L3-hybrid layout increases memory locality, because each rank mainly uses its own local memory and avoids cross-socket traffic.

Single-node hybrid results (Si256)

Configuration Time [s] Speed-up vs MPI-only
MKL (L3 hybrid) 1936 1.04×
AOCL (L3 hybrid) 1830 1.05×

Hybrid L3 adds a modest 4-5 % speed-up.


Multi-node scaling (Si256)

Configuration Nodes Time [s] Speed-up vs 1-node baseline
MKL MPI-only 2 1305 1.55×
AOCL MPI-only 2 1142 1.68×
MKL L3 hybrid 2 1147 1.69×
AOCL L3 hybrid 2 968 1.89×

Interpretation

AOCL shows the strongest scaling across nodes; MKL’s hybrid variant catches up in scaling compared to its MPI-only counterpart. The L3-hybrid layout maintains efficiency even in the multi-node regime.


Recommendations for DFT / Hybrid-DFT workloads

  • AOCL generally outperforms MKL (+AVX512) on AMD EPYC.
  • Prefer L3-Hybrid (8×6) on single-node and even multi-node jobs for FFT-heavy hybrid-DFT cases.
  • For pure MPI runs, both MKL (+AVX512) and AOCL scale well; AOCL slightly better.
  • Always preload libfakeintel2025.so if MKL is used.

Jobscript examples

AOCL – Hybrid L3 (8×6)

#!/bin/bash
#SBATCH -J vasp_aocl_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-aocl

export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export BLIS_NUM_THREADS=6

mpirun -np 8 --bind-to l3 --report-bindings vasp_std

MKL (+AVX512) – Hybrid L3 (8×6)

#!/bin/bash
#SBATCH -J vasp_mkl_avx512_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-mkl
export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}

export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export MKL_NUM_THREADS=6
export MKL_DYNAMIC=FALSE

mpirun -np 8 --bind-to l3 --report-bindings vasp_std

Test case 2 – XAS (Core-level excitation)

The XAS Mn-in-ZnO case models a core-level excitation (X-ray Absorption Spectroscopy). These workloads are not FFT-dominated; instead they involve many unoccupied bands and projector evaluations.

Single-node results (XAS)

Configuration Time [s] Relative
MKL MPI-only 897 1.00×
AOCL MPI-only 905 0.99×
MKL L3 hybrid 1202 0.75×
AOCL L3 hybrid 1137 0.79×

Multi-node scaling (XAS)

Configuration Nodes Time [s] Relative
MKL MPI-only 2 1333 0.67×
AOCL MPI-only 2 1309 0.69×
MKL L3 hybrid 2 1366 0.66×
AOCL L3 hybrid 2 1351 0.67×

Interpretation

For core-level / XAS calculations, hybrid OpenMP parallelisation is counter-productive, and scaling beyond one node deteriorates performance due to load imbalance and communication overhead.


Recommendations for XAS and similar workloads

  • Use MPI-only and single-node configuration.
  • MKL and AOCL perform identically within margin of error.
  • Hybrid modes reduce efficiency and should be avoided.
  • Set OMP_NUM_THREADS=1 to avoid unwanted OpenMP activity.

General guidance

For optimal performance on Elysium with AMD EPYC processors, we recommend using the AOCL build as the default choice for all VASP workloads. AOCL consistently outperforms or matches MKL (+AVX512) across tested scenarios (e.g., 5 % faster for Si256 single-node, up to 1.89× speedup for multi-node scaling) and does not require additional configuration like libfakeintel. However, MKL remains a robust alternative, especially for users requiring compatibility with existing workflows.

Workload type Characteristics Recommended setup
Hybrid DFT (HSE06, PBE0, etc.) FFT + dense BLAS, OpenMP beneficial AOCL L3 Hybrid (8×6)
Standard DFT (PBE, LDA) light BLAS, moderate FFT AOCL L3 Hybrid or MPI-only
Core-level / XAS / EELS many unoccupied bands, projectors AOCL MPI-only (single-node)
MD / AIMD (>100 atoms) large FFTs per step AOCL L3 Hybrid
Static small systems (<20 atoms) few bands, small matrices AOCL MPI-only

Recommendations:

  • Default to AOCL: Use the AOCL build for all workloads unless specific constraints (e.g., compatibility with Intel-based tools) require MKL.
  • AVX512 for MKL: If using MKL, always preload libfakeintel2025.so to enable AVX512 optimizations.
  • Benchmark if unsure: Test both MPI-only and L3 Hybrid on one node to determine the optimal configuration for your specific system.

SLURM

The Elysium HPC system utilizes SLURM as a resource manager, scheduler, and accountant in order to guarantee fair share of the computing resources.

If you are looking for technical details regarding the usage and underlying mechanisms of SLURM we recommend participating in the Introduction to HPC training course.

Examples of job scripts for different job types that are tailored to the Elysium cluster can be found in the Training Section.

List of Partition

All nodes in the Elysium cluster are grouped by their hardware kind, and job submission type. This way users can request specific computing hardware, and multi node jobs are guaranteed to run on nodes with the same setup.

In order to get a list of the available partitions, their current state, and available nodes, the sinfo command can be used.

 1[login_id@login001 ~]$ sinfo
 2PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
 3cpu               up 7-00:00:00      4  alloc cpu[033-034,037-038]
 4cpu               up 7-00:00:00    280   idle cpu[001-032,035-036,039-284]
 5cpu_filler        up    3:00:00      4  alloc cpu[033-034,037-038]
 6cpu_filler        up    3:00:00    280   idle cpu[001-032,035-036,039-284]
 7fat_cpu           up 2-00:00:00     13   idle fatcpu[001-013]
 8fat_cpu_filler    up    3:00:00     13   idle fatcpu[001-013]
 9gpu               up 2-00:00:00     20   idle gpu[001-020]
10gpu_filler        up    1:00:00     20   idle gpu[001-020]
11fat_gpu           up 2-00:00:00      1 drain* fatgpu005
12fat_gpu           up 2-00:00:00      5    mix fatgpu[001,003-004,006-007]
13fat_gpu           up 2-00:00:00      1   idle fatgpu002
14fat_gpu_filler    up    1:00:00      1 drain* fatgpu005
15fat_gpu_filler    up    1:00:00      5    mix fatgpu[001,003-004,006-007]
16fat_gpu_filler    up    1:00:00      1   idle fatgpu002
17vis               up 1-00:00:00      3   idle vis[001-003]

Requesting Nodes of a Partition

SLURM provides two commands to request resources. srun is used to start an interactive session.

1[login_id@login001 ~]$ srun -N 1 --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 --pty bash
2[login_id@cpu001 ~]$

sbatch is used to request resources that will execute a job script.

1[login_id@login001 ~]$ sbatch -N 1 --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 myscript.sh
2Submitted batch job 10290

For sbatch the submission flags can also be incorporated into the job script itself. More information about job scripts, and the required and some optional flags can be found in the Training/SLURM Header section.

On Elysium several flags are mandatory. sbatch and srun will refuse to queue the job and give a detailed explanation which flag is missing and how to incorporate it into your command or script.

Use spredict myscript.sh to estimate the start time of your job.

Shared Nodes

All nodes are shared by default. If a user requests fewer CPU-cores than a node provides, other users may use these resources at the same time. To ensure that the requested nodes are not shared use the --exclusive flag. If more than one node is requested the --exlusive flag is mandatory.

GPU Nodes

For requesting resources on a GPU node the --gpus=<number of GPUs> flag is required. In order to allow for fairly shared resources the number of CPUs per GPU is limited. Thus the --cpus-per-gpu=<number of CPU cores per GPU> is required as well. For multi node jobs --gpus-per-node=<number of GPUs per node> option needs to be set.

Visualization Nodes

For requesting resources on a visualization node no --gpu parameter is needed. The available GPU will automatically be shared between all jobs on the node.

List of Currently Running and Pending Jobs

If requested resources are currently not available, jobs are queued and will start as soon as the resources are available again. To check which jobs are currently running, and which ones are pending and for what reason the squeue command can be used. For privacy reasons only the user’s own jobs are displayed.

1[login_id@login001 ~]$ squeue
2             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
3             10290       cpu     test login_id  R       2:51      1 cpu001

List of Computing Resources Share

Users/Projects/Groups/Institutes are billed for computing resources used. To check how many resources a user is entitled to and how many they have already used the sshare command is used. For privacy reasons only the user’s own shares are displayed.

1[login_id@login001 ~]$ sshare
2Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
3-------------------- ---------- ---------- ----------- ----------- ------------- ----------
4testproj_0000          login_id       1000    0.166667    20450435      0.163985   0.681818

List of Project Accounts

Due to technical reasons the project names on Elysium have rather cryptic names, based on the loginID of the project manager and a number. In order to make it easier to select a project account for the --account flag for srun, or sbatch, and to check the share and usage of projects, the RUB-exclusive rub-acclist command can be used.

1[login_id@login001 ~]$ rub-acclist
2Project ID    | Project Description
3--------------+--------------------------------------------------
4testproj_0000 | The fundamental interconnectedness of all things
5testproj_0001 | The translated quaternion for optimal pivoting

Visualization

We provide Visualization via VirtualGL on the visualization nodes on Elysium.hpc.ruhr-uni-bochum.de

Requirements:

X11 server with 24-bit- or 32-bit Visuals. VirtualGL version > 3.0.2 installed.

You can check support for your Operating Sytsem at: https://virtualgl.org/Documentation/OSSupport You can download VirtualGL at: https://github.com/VirtualGL/virtualgl/releases

To use VirtualGL on Elysium, you will only need the VirtualGL client, it is not necessary to configure a VirtualGL Server.

Resource allocation:

Allocate resources in the vis partition.

salloc -p vis -N1 --time=02:00:00 --account=$ACCOUNT

This will allocate a share of one vis node for 2 hours. (For more options on node allocations see SLURM). Wait until a Slot in the vis partition is available. You can check if your resources are already allocated using the ‘squeue’ command.

Establish Virtual GL connection:

Connect directly from your computer to the visualization node via ssh with vglonnect -s Use one of the login servers login[001-004] as a jump host.

vglconnect -s $HPCUSER@vis001.elysium.hpc.rub.de -J $HPCUSER@login001.elysium.hpc.rub.de

If you are prompted for password, enter your RUB-password. If you don’t like long commands, you can configure one of the login nodes as jump host in your ~/.ssh/config for the vis[001-003] hosts. The command vglconnect -s accepts nearly the same syntax as ssh.

Run your Software:

Load a module if required. Start your application using vglrun, please remember to use useful command line options like -fps .

module load vmd
vglrun +pr -fps 60 vmd

Please remember to cancel the resource allocation once you are done with your interactive session.

scancel $jobID

Interactive Sessions

The Elysium cluster provides the possibility to work interactively to allow compute intensive preparations, post processing steps, or tests. Further, interactive sessions on the visualization nodes (vis[001-003]) allow for remote connections to run e.g. jupyter notebooks from your local machine.

Interactive Jobs

In order to reserve nodes interactively the following command should be used:

[login_id@login001 ~]$ srun -N 1 --partition=vis --job-name=interactive --time=01:00:00 --tasks-per-node=1 --account=<project_name> --pty bash

Depending on the load on the cluster it might take some time until resources can be allocated for your job. Even if resources are free the scheduler might need a few seconds to reserve the requested node. If you require a specific node for your interactive session you may utilize the -w <nodename> flag. Note that this might increase the time for your session to start until the requested node is available.

(For more options on node allocations see SLURM).

SSH From Login Node to Interactive Session

As long as you have any job running on the target node you can use ssh to connect from any login node to your target node. When the job terminates, all ssh connections will be terminated as well.

SSH From Local Machine to Interactive Session

Connection to compute nodes from local machines is only possible for the three visualization nodes (vis[001-003]). In order to allow a local machine to connect to a node with an active job, one of the login nodes needs to be defined as a jump host, to forward your ssh request.

Please make sure that the ~/.ssh/config on your local machine contains the following entries:

Host elysium-login001
    User <login_id>
    Hostname login001.elysium.hpc.rub.de
    IdentityFile ~/.ssh/elysium

Host *.elysium.hpc.rub.de *.elysium.hpc.ruhr-uni-bochum.de !login*.elysium.hpc.rub.de !login*.elysium.hpc.ruhr-uni-bochum.de
    User <login_id>
    IdentityFile ~/.ssh/elysium
    ProxyJump elysium-login001

Please ensure that the IdentityFile points to the correct (private) key. You are now able to connect to connect to one of the vis nodes with the following command:

[localuser@localmachine ~]$ ssh vis001.elysium.hpc.rub.de
[<login_id>@vis001 ~]$

Job Monitoring

With our web-based job monitoring system (ClusterCockpit), you can easily monitor and analyze the performance of your jobs on the Elysium HPC system. For a quick performance check, see Metrics to Check; for an in-depth analysis, refer to the HPC-Wiki. For details on the web interface, consult the official documentation.

Login

To access the job monitoring system, use your RUB LoginID and corresponding password as credentials. JobMon Login Page JobMon Login Page

Overview

After logging in successfully, you will see the “Clusters” overview, which displays the total number of jobs you have run and the current number of jobs running on the cluster. At present, this information includes only the Elysium cluster. You can continue from here, either by going to the total jobs overview, or the running jobs overview. Alternatively, you can click on “My Jobs” in the top left of the page, or search for job names/ids in the top right of the page. JobMon Landing Page JobMon Landing Page

My Jobs

The “My Jobs” page displays a list of your jobs, fully customizable to your requirements. Use the menus in the top left corner to sort or filter the list, and select the metrics you want to display for your jobs. Below, you’ll find a detailed table with job IDs, names, and your selected metrics.

MyJobs Page MyJobs Page

Job Details

This page is split into three sections. The first one shows general information: JobInfo, a footprint and a roofline diagram that shows how efficiently the job utilized the hardware. Note that the footprint is only updated every 10 minutes and the energy footprint is generated after the job finished.

In the next section some metrics are shown as diagrams. For some of the diagrams you can choose the scope, i.e. core, socket or node. The shown metrics and their order can be customized with the “Select Metrics” menu. This selection is saved per partition. Double-click the graph to zoom out if the scale is too small.

The last section displays selected metrics in a numerical way, lets you inspect your job script, and shows more detail about the job allocation an runtime parameters.

Job Page Job Page Job Page Job Page Job Page Job Page Job Page Job Page

Metrics

The following table shows the metrics which are available for jobs on Elysium:

Metric name Meaning Meaningful for shared jobs
CPU
cpu_load Load on the node (processes/threads requesting CPU time) No
cpu_load_core Load on CPU cores of a job (processes/threads per core) Yes
cpu_user Percentage of CPU time spent as user time for each CPU core Yes
clock Frequency of the CPU cores of the job Yes (affected by other jobs)
ipc Instructions per cycle Yes
flops_any Floating-point operations performed by CPU cores Yes
core_power Power consumption of individual CPU cores Yes
Memory
mem_bw Memory bandwidth No (full socket only)
mem_used Main memory used on the node No
job_mem_used Main memory used for the job Yes
disk_free Free disk space on the node No
GPU
nv_compute_processes Number of processes using the GPU Yes
acc_mem_used Accelerator (GPU) memory usage Yes
acc_mem_util Accelerator (GPU) memory utilization Yes
acc_power Accelerator (GPU) power usage Yes
acc_utilization Accelerator (GPU) compute utilization Yes
Filesystem
lustre_write_bw /lustre write bandwidth No
lustre_read_bw /lustre read bandwidth No
lustre_close /lustre file close requests No
lustre_open /lustre file open requests No
lustre_statfs /lustre file stat requests No
io_reads Local Disk I/O read operations/s No
io_writes Local Disk I/O write operations/s No
nfs4_close /home + /cluster file close requests No
nfs4_open /home + /cluster file open requests No
nfsio_nread /home I/O read bandwidth No
nfsio_nwrite /home I/O write bandwidth No
Network
ib_recv Omnipath receive bandwidth No
ib_xmit Omnipath transmit bandwidth No
ib_recv_pkts Omnipath received packets/s No
ib_xmit_pkts Omnipath transmitted packets/s No
net_bytes_in Ethernet incoming bandwidth No
net_bytes_out Ethernet outgoing bandwidth No
net_pkts_in Ethernet incoming packets/s No
net_pkts_out Ethernet outgoing packets/s No
NUMA Nodes
numastats_numa_hit NUMA hits/s No
numastats_numa_miss NUMA misses/s No
numastats_interleave_hit NUMA interleave hits/s No
numastats_local_node NUMA local node accesses/s No
numastats_numa_foreign NUMA foreign node accesses/s No
numastats_other_node NUMA other node accesses/s No
Node metrics
node_total_power Power consumption of the whole node No

Metrics to Check

For a quick performance analysis, here are some key metrics to review:

  • cpu_user: Should be close to 100%. Lower values indicate system processes are using some of your resources.
  • flops_any: Measures calculations per second. On Elysium, a typical CPU node averages around 400 GFLOPS.
  • cpu_load_core: Should be 1 at most for non-OpenMP jobs. Higher values suggest oversubscription.
  • ipc: Instructions executed per cycle. Higher values indicate better efficiency.
  • mem_bw: Memory bandwidth, maxing out at 350 GByte/s. Only meaningful if the node isn’t shared or your job uses a full socket.
  • acc_utilization: GPU compute utilization. Aim for high percentages (e.g., above 80%) to ensure efficient GPU usage.

Known Problems

Occasionally, an orange box labeled “No dataset returned for <metric>” may be shown instead of the graph. This occurs when the ClusterCockpit service was unable to collect the metrics during your job. Job Page Job Page Note that jobs that ran before March 12th 2025 may report missing or incorrect data in some cases.

The measurements for ipc and clock are sometimes too high. This is related to power saving features of the CPU. We are currently investigating how to solve this issue.

For jobs that ran before March 7th 2025 a bug triggered an overflow for the power usage metric resulting in unrealisticly high power consumptions. This bug is fixed, but the fix cannot be applied to older jobs that were affected by it. Job Page Job Page