Documentation

Elysium is the central HPC Cluster at the Ruhr-Universität Bochum. See the overview of its resources.

To use Elysium, you need to

Have a RUB LoginID with two-factor authentication (activate at rub.de/login)
Fill out the user access application form
Be a member of an approved HPC project belonging to an HPC project manager

Please read about the basic concept of using Elysium first.

The login process combines SSH key-based authentication with web-based two-factor authentication.

After login, you can use available software modules or build your own software.

Read about submitting jobs and allocating resources in the SLURM section.

Basics

Elysium provides four login nodes: login001.elysium.hpc.rub.de, …, login004.elysium.hpc.rub.de. These are your entry points to the cluster.

After login, you typically use them to prepare your software and copy your data to the appropriate locations. Performing calculations on the login nodes is prohibited (see FAQ).

You can then allocate resources on the cluster using the Slurm workload manager.

After submitting your request, Slurm will grant you the resources as soon as they are free and your priority is higher than the priority of other jobs that might be waiting for some of the same resources.

Your priority depends on your waiting time and your remaining FairShare.

Circumventing these steps (e.g., opening sessions without the need for two factor authentication, sharing of accounts, keys, or login credentials) violates the IT.SERVICES Terms of Use and may result in permanent loss of access rights to the Elysium cluster and/or other services provided by IT.SERVICES.

To be able to authenticate during login, you need to submit your public SSH key via the User Access Application Form and enable two-factor authentication for your RUB LoginID at rub.de/login.

The additional web-based authentication is cached for 14 hours so that you typically only have to do it once per work day, per login node, and per IP address you connect from. After that, your normal key-based SSH workflow will work as expected.

To simplify the use of SSH keys, we recommend specifying your key as the identity file in your SSH config. This can be done by adding the following lines to your ~/.ssh/config file:

Host login00*.elysium.hpc.rub.de login00*.elysium.hpc.ruhr-uni-bochum.de
    IdentityFile ~/.ssh/elysium
    User <loginID>

where <loginID> must be replaced with your RUB LoginID. If your SSH key is located in a different file, adjust the IdentityFile path accordingly.
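
If you prefer to script this step, the following is a minimal sketch that appends the host block to an SSH config file only when it is not already present. The function name add_elysium_ssh_config and its argument order are assumptions made for illustration; they are not part of any Elysium tooling.

```shell
# Hypothetical helper: append the Elysium host block to an SSH config file
# unless a matching block is already there.
add_elysium_ssh_config() {
    local login_id="$1"                      # your RUB LoginID
    local config="${2:-$HOME/.ssh/config}"   # SSH config file to edit
    local key="${3:-}"                       # private key path (optional)
    [ -n "$key" ] || key='~/.ssh/elysium'    # default taken from the text above

    mkdir -p "$(dirname "$config")"
    touch "$config"
    # Do nothing if the Elysium host pattern is already configured.
    if grep -q 'login00\*\.elysium\.hpc\.rub\.de' "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF

Host login00*.elysium.hpc.rub.de login00*.elysium.hpc.ruhr-uni-bochum.de
    IdentityFile $key
    User $login_id
EOF
    chmod 600 "$config"
}

# Usage (commented out so that nothing is modified by accident):
# add_elysium_ssh_config <loginID>
```

Calling the function a second time leaves the file unchanged, so it is safe to keep in a setup script.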

Follow these steps:

1. Start ssh with the correct private key, your RUB LoginID, and one of the four login hosts, e.g. ssh -i ~/.ssh/elysium LOGINID@login001.elysium.hpc.ruhr-uni-bochum.de, or ssh login001.elysium.hpc.rub.de if you want to use the SSH config specified above. Available login nodes are login001 to login004. The first time you connect to the cluster, you are asked to verify its identity by the SHA256 hash of its host key. The Elysium host key SHA256 hash is:

SHA256:nGNVhRC0bBD6FLkuJnQULhuOb/xJTebcxpdOhA99h8M

Verify the host by typing yes if the hash matches.
2. Open the URL in a browser (or scan the QR code with your smartphone) to start web-based two-factor authentication.
3. Enter the second factor for two-factor authentication.
4. After successful login, you get a four-digit verification code.
5. Enter this code at your ssh prompt to finish login.

For the next 14 hours, only step 1 (classic key-based authentication) will be necessary on the chosen login node for the IP address you connected from.
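
The host key check can also be done before the first connection. The sketch below compares a fingerprint string with the hash published above; the helper name check_elysium_fingerprint is hypothetical, and the source does not state which key type (for example ED25519 or RSA) the published hash belongs to, so more than one scanned fingerprint may need to be checked.

```shell
# Published Elysium host key fingerprint (copied from the text above).
ELYSIUM_HOSTKEY_SHA256='SHA256:nGNVhRC0bBD6FLkuJnQULhuOb/xJTebcxpdOhA99h8M'

# Hypothetical helper: succeed only if the given fingerprint matches.
check_elysium_fingerprint() {
    if [ "$1" = "$ELYSIUM_HOSTKEY_SHA256" ]; then
        echo "host key OK"
    else
        echo "host key MISMATCH: do not connect" >&2
        return 1
    fi
}

# Usage (needs network access, therefore commented out). ssh-keyscan fetches
# the host keys and ssh-keygen -lf - prints one fingerprint per key type:
# ssh-keyscan login001.elysium.hpc.rub.de 2>/dev/null | ssh-keygen -lf - |
#     awk '{print $2}' | while read -r fp; do check_elysium_fingerprint "$fp"; done
```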

Login will fail if:

You use the wrong private key (“Permission denied (publickey)”)
You are not a member of an active HPC project (“Permission denied (publickey)”)
You did not enable two-factor authentication for your LoginID (“Two-factor authentication is required”)
Web-based login fails
You enter the wrong verification code (“Verification failed”)
A timeout happens between starting the SSH session and finalizing web-based login (“session_id not found”); just start the process again to get a new session ID.

Software

We provide a basic set of toolchains and some common libraries via modules.

To build common HPC software packages, we provide a central installation of the Spack package manager. A detailed guide on how to use this installation can be found in the Spack Usage Guide.

Modules

We use the Lmod module system to provide compilers, MPI stacks, libraries, and tools.

The legacy module tree (elysium/2024) is currently the default view. To switch to the current tree, load:

module load elysium/2026

The elysium/202x modules modify the MODULEPATH. They are sticky, so they are not removed by module purge. elysium/2024 and elysium/2026 are mutually exclusive: loading one unloads the other. The legacy tree (elysium/2024) is frozen and no longer updated by admins. The 2026 tree is built with gcc@13.4.0, which provides better optimization support for Elysium’s Zen 4 CPUs than gcc@11.

Known Issues

Intel MPI in elysium/2024 (legacy tree) is known to be unstable in this version. It can cause random MPI deadlocks where applications stop progressing while the Slurm job is still running.
In the current tree (elysium/2026), Intel MPI is stable and recommended.
MPICH versions before 5.0.0 are not suitable for multi-node runs. Use 5.0.0 or newer for multi-node jobs.
If you must use MPICH < 5.0.0, restrict jobs to single-node execution.
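
The MPICH rule above can be checked mechanically, for example in a job script before a multi-node launch. This is a minimal sketch; the function name mpich_multinode_ok is hypothetical, and it relies on the -V (version sort) option of GNU sort.

```shell
# Hypothetical helper: succeed if the given MPICH version may be used across
# several nodes according to the rule above (5.0.0 or newer).
mpich_multinode_ok() {
    local version="$1" minimum="5.0.0"
    # sort -V orders version strings; if the minimum sorts first (or is equal),
    # the given version is new enough.
    [ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)" = "$minimum" ]
}

# Usage:
# if mpich_multinode_ok 4.3.1; then echo "multi-node OK"; else echo "single-node only"; fi
```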

Common module commands

ml is a short alias for module.

ml av: list available modules in the current module path.
ml load <module>: load a module. Example: ml load openmpi/5.0.9.
ml list: show currently loaded modules.
ml unload <module>: unload a module. Example: ml unload openmpi/5.0.9.
ml purge: unload all currently loaded modules except sticky ones.
ml help <module>: show module help text. Example: ml help openmpi/5.0.9.
ml show <module>: print the contents of the modulefile.
ml whatis <module>: show a short module description.
ml spider <name>: search all known versions. Use ml spider -A <name> to also show hidden entries.
ml use <path>: add a directory to your module search path.
ml unuse <path>: remove a directory from your module search path.

Using your own modulefiles

You can keep your own modulefiles in a personal directory and add it with ml use.

mkdir -p "$HOME/modules/mytool"
cat > "$HOME/modules/mytool/1.0.lua" <<'LUA'
help([[MyTool 1.0]])
whatis("Name: MyTool")
whatis("Version: 1.0")
depends_on("openmpi/5.0.9")
prepend_path("PATH", "/path/to/mytool/bin")
LUA

ml use "$HOME/modules"
ml load mytool/1.0

In this example, loading mytool/1.0 also loads openmpi/5.0.9 as a dependency.

To make your personal modules always visible, add this to your ~/.bashrc:

ml use "$HOME/modules"
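
If you add this line from a setup script, it is worth making sure it is not appended more than once. A minimal sketch, assuming a hypothetical helper named ensure_line:

```shell
# Hypothetical helper: append a line to a file only if the exact line
# is not already present.
ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (commented out so that nothing is modified by accident):
# ensure_line "$HOME/.bashrc" 'ml use "$HOME/modules"'
```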

If you need additional software or module versions, please contact support.

Spack

We provide a central Spack 1.1.1 installation in parallel to the existing 0.23.0 stack.

This separation is necessary because a database migration would render the 0.23.0 installation unusable. The new 2026 module tree is not compatible with the legacy tree, and self-built packages from the old setup must be rebuilt.

By default, you will still see the legacy module tree. The new stack is opt-in via:

module load spack/2026

If you already used the legacy setup, use this dedicated migration guide: Migration Guide for Existing Users.

Legacy documentation remains available here: Spack (Legacy 0.23.0).

Quick Setup

For a fresh setup:

rub-deploy-spack-configs-2026
module load spack/2026

After that, you can install your own packages into your home Spack tree and generate personal modulefiles. Add module load spack/2026 to your ~/.bashrc if you want this active in every login shell.

Architecture Overview

We use a central Spack installation combined with per-user overlays:

Central Spack (read-only, maintained by HPC)
    ├─ pre-built packages
    ├─ MPI configurations
    └─ cluster-wide defaults
            ↓ (upstream)
User Spack ($HOME)
    ├─ personal installs
    └─ personal modules
            ↓
Lmod module system

This means:

You automatically use centrally installed packages when available.
Cluster-wide configurations (e.g. MPI defaults, compiler settings) are inherited.
Your own installations go into your personal $HOME Spack tree.
Only missing packages are built locally.

Guide to Using Spack

This section covers the most common Spack workflows on Elysium.

Searching and Inspecting Packages

Search for packages:

spack list <keyword>
spack list openfoam

Show details (versions, variants, dependencies):

spack info hdf5

Preview what will actually be installed:

spack spec hdf5 +mpi

Always check spack spec before installing complex packages.

Installing Packages

Basic installation:

spack install hdf5

Enable or disable variants:

spack install hdf5 +mpi +cxx ~fortran

Specify a compiler:

spack install hdf5 %gcc@13.4.0

Force rebuilding dependencies with the same compiler:

spack install --fresh hdf5 %gcc@13.4.0

Specify dependencies explicitly:

spack install hdf5 ^openmpi@5.0.9

Everything combined:

spack install hdf5@1.14.6 +mpi %gcc@13.4.0 ^openmpi@5.0.9

Virtual Providers (BLAS, MPI, FFT, etc.)

Some packages depend on virtual interfaces instead of concrete libraries. Examples include:

blas
lapack
mpi
fftw-api

To see which virtual packages exist:

spack providers

To list available providers for a specific interface:

spack providers blas
spack providers fftw-api

Example output:

Providers for blas:
    amdblis
    openblas
    intel-oneapi-mkl

You can select a specific provider during installation:

spack install gromacs ^blas=openblas

This is often preferred over manually selecting a specific concrete library, because it keeps the dependency graph clean and compatible.

Inspecting and Comparing Installations

List installed packages with variants and hashes:

spack find -vl hdf5

Inspect a specific installation:

spack spec /<hash>

Compare two installations:

spack diff /hash1 /hash2

Removing Packages

Remove a specific installation by hash:

spack uninstall /<hash>

Overriding Package Definitions

On Elysium, the central builtin repository is provided by the HPC team. If you want to override a package definition (e.g. to test changes), you can create a local repository on top of it.

Create a local repo

mkdir -p $HOME/spack/var/spack/repos/packages
cat > $HOME/spack/var/spack/repos/repo.yaml <<'EOF'
repo:
  namespace: overrides
EOF

Register the repository in the repos section of your Spack configuration (e.g. with spack config edit repos):

repos:
  overrides: $HOME/spack/var/spack/repos
  builtin:
    destination: /cluster/spack/spack-packages

Check:

spack repo list

Override a package (example: ffmpeg)

mkdir -p $HOME/spack/var/spack/repos/packages/ffmpeg
cp /cluster/spack/spack-packages/repos/spack_repo/builtin/packages/ffmpeg/package.py \
   $HOME/spack/var/spack/repos/packages/ffmpeg/

Edit the copied package.py and adjust versions, dependencies, variants, etc.

Install explicitly from your namespace:

spack install overrides.ffmpeg

Verify which repository is used

For a spec (not yet installed):

spack spec -N ffmpeg

For installed packages:

spack find -N ffmpeg

The -N option shows the namespace (overrides or builtin).

Custom Changes to Packages using `spack develop`

If you want to modify the source code of a package (e.g. openfoam) and rebuild it locally, you can use spack develop. This allows you to work directly on a source checkout without creating tarballs or calculating checksums.

Create and activate a development environment

mkdir -p ~/openfoam-dev
cd ~/openfoam-dev
spack env create -d .
spacktivate .  # shortcut for `spack env activate .`

Using a dedicated environment keeps your development work isolated from your normal Spack setup.

Add and install the package

spack add openfoam
spack install

This performs a normal installation and ensures all dependencies are available.

Switch to development mode

spack develop openfoam

This checks out the source code into the environment directory (e.g. ~/openfoam-dev/openfoam/) and registers it as the active development source.

You can verify this with:

spack find -cv openfoam

Look for dev_path=.../openfoam in the output.

Modify the source and rebuild

cd ~/openfoam-dev/openfoam
# edit source files here (e.g. with vim)

cd ~/openfoam-dev
spack install

Spack will now build openfoam from your modified local sources.

If compilation fails, Spack will print the relevant error messages and the path to the full build log.

Legacy Setup

If you need to reference the old 0.23.0 setup, use: Spack (Legacy 0.23.0).

MATLAB

This page describes how to configure MATLAB to submit jobs to the Elysium HPC cluster, retrieve results, and debug errors.

Initial Configuration

Running MATLAB on the HPC Cluster

This setup is intended for job submission when you are logged directly into the cluster via the command line. This process only needs to be completed once per cluster.

After logging into the cluster, start MATLAB and run:

configCluster

Jobs will run across multiple nodes on the cluster rather than on the host machine.

Running MATLAB on the Desktop

This setup is intended for job submission when MATLAB is installed on your machine and jobs are run remotely on the cluster. This setup needs to be done once per cluster, per version of MATLAB installed on your machine.

Start MATLAB and run:

userpath

Download the Integration Scripts (RUB.Desktop.zip) directly from this page. Extract the ZIP contents into the folder returned by userpath.

Create a new cluster profile:

configCluster

Submission to the cluster requires SSH credentials. You will be prompted for your cluster username (LoginID).

Jobs will now run on the cluster rather than on the local machine. Before submitting jobs, log in to login002 via SSH on the command line so the two-factor authentication login is cached.

Note: To run jobs on the local machine instead of the cluster, use the Processes profile.

% Get a handle to the local resources
c = parcluster('Processes');

Configuring Jobs

Prior to submitting a job, you can assign scheduler flags such as queue, email, wall time, and more. The following properties are mandatory and must be set before you can submit a job:

AccountName
Nodes
Partition
WallTime

% Get a handle to the cluster
c = parcluster;

% REQUIRED

% Specify an account
c.AdditionalProperties.AccountName = 'account-name';

% Specify number of nodes
c.AdditionalProperties.Nodes = 1;

% Specify the partition
c.AdditionalProperties.Partition = 'partition-name';

% Specify the wall time (e.g. 1 day, 5 hours, 30 minutes)
c.AdditionalProperties.WallTime = '1-05:30';

% OPTIONAL

% Specify a constraint
c.AdditionalProperties.Constraint = 'feature-name';

% Request email notification of job status
c.AdditionalProperties.EmailAddress = 'firstname.familyname@ruhr-uni-bochum.de';

% Specify number of GPUs (default: 0)
c.AdditionalProperties.GPUsPerNode = 1;

% Specify the number of CPUs per GPU
c.AdditionalProperties.CPUsPerGPU = 1;

% Specify memory to use, per core (default: 4GB)
c.AdditionalProperties.MemPerCPU = '6GB';

% Specify cores per node (default: 0)
c.AdditionalProperties.ProcsPerNode = 4;

% Set node exclusivity (default: false)
% Note that this will automatically be set to true if using more
% than one node.
c.AdditionalProperties.RequireExclusiveNode = true;

% Specify a reservation
c.AdditionalProperties.Reservation = 'reservation-name';
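
The WallTime value in the example uses Slurm's days-hours:minutes notation. As a minimal sketch (written in shell; the function name slurm_walltime is hypothetical), such a string can be assembled from its parts and then pasted into the MATLAB property:

```shell
# Hypothetical helper: format days, hours, and minutes as D-HH:MM.
# Pass plain numbers without leading zeros (printf would read "08" as octal).
slurm_walltime() {
    printf '%d-%02d:%02d\n' "$1" "$2" "$3"
}

slurm_walltime 1 5 30   # prints 1-05:30
```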

To persist changes made to AdditionalProperties between MATLAB sessions, save the profile:

c.saveProfile

To see the values of the current configuration options, display AdditionalProperties:

c.AdditionalProperties

Unset a value when it is no longer needed:

% Turn off email notifications
c.AdditionalProperties.EmailAddress = '';

% Do not request an entire node
c.AdditionalProperties.RequireExclusiveNode = false;

Note: The instructions above cover the basics of configuring and running jobs on the cluster. For a more in-depth walkthrough of the job submission workflow, see the demo script ScalingToTheClusterDemoRemote.mlx.

Independent Batch Job - MATLAB on the HPC Cluster or Desktop

Use the batch command to submit asynchronous jobs to the cluster. The batch command returns a job object, which is used to access the output of the submitted job. See the MATLAB documentation for batch for more details.

% Get a handle to the cluster
c = parcluster;

% Submit job to query where MATLAB is running on the cluster
job = c.batch(@pwd, 1, {}, 'CurrentFolder', '.', 'AutoAddClientPath', false);

% Query job for state
job.State

% If job is finished, fetch the results
job.fetchOutputs{1}

% Delete the job after results are no longer needed
job.delete

To retrieve a list of running or completed jobs, call parcluster to return the cluster object. The cluster object stores an array of jobs that are listed as queued, running, finished, or failed.

c = parcluster;
jobs = c.Jobs

% Get a handle to the second job in the list
job2 = c.Jobs(2);

Once the job has been selected, fetch the results as previously shown.

fetchOutputs is used to retrieve function output arguments. If you call batch with a script, use load instead. Data written to disk on the cluster must be retrieved directly from the file system, for example via SFTP.

% Fetch all results from the second job in the list
job2.fetchOutputs{:}

% Alternate: load results if the job was a script instead of a function
job2.load

Parallel Batch Job - MATLAB on the HPC Cluster or Desktop

The batch command also supports parallel workflows. Save the following example as parallel_example.m.

function [sim_t, A] = parallel_example(iter)

if nargin == 0
    iter = 8;
end

disp('Start sim')

A = nan(iter, 1);
t0 = tic;
parfor idx = 1:iter
    A(idx) = idx;
    pause(2)
    idx
end
sim_t = toc(t0);

disp('Sim completed')

save RESULTS A

end

When using the batch command, specify a Pool argument:

% Get a handle to the cluster
c = parcluster;

% Submit a batch pool job using 4 workers for 16 simulations
job = c.batch(@parallel_example, 1, {16}, 'CurrentFolder', '.', 'Pool', 4, 'AutoAddClientPath', false);

% View current job status
job.State

% Fetch the results after a finished state is retrieved
job.fetchOutputs{1}

Example output:

ans =
    8.1678

The job ran in 8.17 seconds using four workers. Note that these jobs always request N + 1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs eight workers will require nine CPU cores.
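
The arithmetic behind these numbers can be sketched as follows (in shell; both function names are hypothetical). The runtime estimate is idealized: it assumes the 2-second pause of parallel_example dominates, that iterations are spread evenly over the workers, and that scheduling overhead is zero.

```shell
# Cores requested for a batch pool job: N workers plus one managing worker.
pool_job_cores() {
    echo $(( $1 + 1 ))
}

# Idealized runtime in seconds: number of rounds (iterations per worker,
# rounded up) times the 2-second pause per iteration.
ideal_runtime() {
    local iterations="$1" workers="$2"
    echo $(( (iterations + workers - 1) / workers * 2 ))
}

pool_job_cores 8      # prints 9
ideal_runtime 16 4    # prints 8
```

The measured 8.17 seconds for four workers is close to the idealized 8 seconds, which suggests little overhead for this toy example.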

Run the same simulation again but increase the pool size. This time, to retrieve the results later, keep track of the job ID.

Note: For some applications, there will be diminishing returns when allocating too many workers, as the overhead may exceed computation time.

% Get a handle to the cluster
c = parcluster;

% Submit a batch pool job using 8 workers for 16 simulations
job = c.batch(@parallel_example, 1, {16}, 'CurrentFolder', '.', 'Pool', 8, 'AutoAddClientPath', false);

% Get the job ID
id = job.ID

Example output:

id =
    4

% Clear job from workspace (as though MATLAB exited)
clear job

With a handle to the cluster, the findJob method searches for the job with the specified job ID:

% Get a handle to the cluster
c = parcluster;

% Find the old job
job = c.findJob('ID', 4);

% Retrieve the state of the job
job.State

Example output:

ans =
    finished

% Fetch the results
job.fetchOutputs{1}

Example output:

ans =
    4.1503

The job now runs in 4.15 seconds using eight workers. Run code with different numbers of workers to determine the ideal number to use. Alternatively, to retrieve job results via a graphical user interface, use the Job Monitor (Parallel > Monitor Jobs). It will take some time until the list is shown.

Debugging

If a serial job produces an error, call the getDebugLog method to view the error log file.

When submitting an independent job, specify the task:

c.getDebugLog(job.Tasks)

For pool jobs, specify only the job object:

c.getDebugLog(job)

When troubleshooting a job, the cluster administrators may request the scheduler ID of the job. You can derive it by calling getTaskSchedulerIDs:

job.getTaskSchedulerIDs()

Example output:

ans =
    4911680

Helper Functions

Function	Description	Notes
`clusterFeatures`	Lists cluster features / constraints
`clusterGpuCards`	Lists cluster GPU cards
`clusterPartitionNames`	Lists cluster partition / queue names
`disableArchiving`	Modifies file archiving to resolve file mirroring issues	Applicable only to Desktop
`fixConnection`	Reestablishes cluster connection (for example after reconnecting VPN)	Applicable only to Desktop
`seff`	Displays Slurm statistics related to the efficiency of resource usage by the job
`willRun`	Explains why a job is queued

To Learn More

To learn more about the MATLAB Parallel Computing Toolbox, see the MathWorks documentation.

Vampir

Vampir is a framework for analyzing the behavior of serial and parallel software by means of function instrumentation via Score-p. Vampir is licensed by HPC.nrw and can be used freely on the Elysium cluster. This page only walks through a small test case that shows how Score-p can be used to generate profiling data and how Vampir can be started on Elysium. For information on how to use Vampir to analyze your application, extract useful performance metrics, and identify bottlenecks, please refer to the Score-p Cheat Sheet and the official Vampir Documentation.

Compilation with Instrumented Functions

To generate profiling data, the function calls in the application need to be instrumented. This means inserting additional special function calls that record the time, the current call stack, and much more. Fortunately, this does not have to be done manually but can easily be achieved with the Score-p compiler wrapper. To follow along, you can use this MPI Example Code.

To use the Score-p compiler wrapper, prefix the compiler command with scorep:

module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6
scorep mpicc -o mpi-test.x mpi-test.c

In the case of a Makefile, or other build systems, the compiler variable has to be adjusted accordingly.

Generating Profiling Data

Profiling data is created by running the application. Note that the profiling files can grow to enormous sizes, so it is advisable to choose a small, representative test case for your application rather than a full production run. In its default mode, Score-p collects profiling data by sampling the application's call stack from time to time. To generate an accurate profile, tracing needs to be enabled in your job script:

module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6

export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x

Here is a full job script for the example:

#!/bin/bash
#SBATCH --partition=cpu
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --account=<Account>
#SBATCH --time=00-00:05:00

module purge
module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6

export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x

Execution of the instrumented application takes significantly longer than usual, so the instrumented build should never be used for production runs, only for profiling. After the application has finished, a new directory exists whose name contains the time stamp and some other information, e.g. scorep-20251222_0912_1523094386395226. The file traces.otf2 in that directory contains the profiling data required by Vampir.
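
After several profiling runs, it can be convenient to locate the most recent trace automatically. A minimal sketch, assuming a hypothetical helper named latest_scorep_trace and assuming that the directory names embed the time stamp right after the scorep- prefix, as in the example name:

```shell
# Hypothetical helper: print the path of traces.otf2 in the newest scorep-*
# directory below the given directory (default: current directory).
latest_scorep_trace() {
    local base="${1:-.}" dir
    # The time stamp follows the prefix, so a plain sort is chronological.
    dir="$(find "$base" -maxdepth 1 -type d -name 'scorep-*' | sort | tail -n 1)"
    if [ -n "$dir" ] && [ -f "$dir/traces.otf2" ]; then
        echo "$dir/traces.otf2"
    else
        echo "no traces.otf2 found below $base" >&2
        return 1
    fi
}

# Usage: pass the result to Vampir inside a visualization session, e.g.
# vglrun +pr -fps 20 vampir "$(latest_scorep_trace)"
```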

Visualizing With Vampir

In order to visualize the profiling data a Visualization Session has to be established. Vampir can be started with

module load vampir
vglrun +pr -fps 20 vampir ./traces.otf2

This will open the Vampir graphical user interface.

VASP

Build configuration (MKL)

On Elysium, VASP can be built with Spack using:

spack install vasp@6.4.3 +openmp +fftlib ^openmpi@5.0.5 ^fftw@3+openmp ^intel-oneapi-mkl threads=openmp +ilp64

This configuration uses:

Intel oneAPI MKL (ILP64) for BLAS, LAPACK and ScaLAPACK,
VASP’s internal FFTLIB to avoid MKL CDFT issues on AMD,
OpenMPI 5.0.5 as MPI implementation,
OpenMP enabled for hybrid parallelisation.

We choose MKL as baseline because it is the de-facto HPC standard and performs well on AMD EPYC when AVX512 code paths are enabled.

Activating AVX512

Intel’s MKL only enables AVX512 optimisations on Intel CPUs. On AMD, MKL defaults to AVX2/SSE code paths.

To unlock the faster AVX512 kernels on AMD EPYC we provide libfakeintel, which fakes Intel CPUID flags.

MKL version	library to preload
≤ 2024.x	`/lib64/libfakeintel.so`
≥ 2025.x	`/lib64/libfakeintel2025.so`	works for older versions too

⚠ Intel gives no guarantee that all AVX512 instructions work on AMD CPUs. In practice, the community has shown that not every kernel uses full AVX512 width, but the overall speed-up is still substantial.

Activate AVX512 by preloading the library in your job:

export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}
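
The choice of library from the table above can be scripted. This is a minimal sketch with a hypothetical helper name (fakeintel_lib); it also shows a way to extend LD_PRELOAD that does not leave a trailing colon when the variable was empty.

```shell
# Hypothetical helper: pick the preload library for a given MKL version,
# following the table above.
fakeintel_lib() {
    local major="${1%%.*}"   # "2024.2.1" -> "2024"
    if [ "$major" -ge 2025 ]; then
        echo /lib64/libfakeintel2025.so
    else
        echo /lib64/libfakeintel.so
    fi
}

# Usage (commented out: only meaningful on an Elysium compute node):
# export LD_PRELOAD="$(fakeintel_lib 2025.0)${LD_PRELOAD:+:$LD_PRELOAD}"
```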

Test case 1 – Si256 (DFT / Hybrid HSE06)

This benchmark uses a 256-atom silicon supercell (Si256) with the HSE06 hybrid functional. Hybrid DFT combines FFT-heavy parts with dense BLAS/LAPACK operations and is therefore a good proxy for most large-scale electronic-structure workloads.

Baseline: MPI-only, 1 node

Configuration	Time [s]	Speed-up vs baseline
MKL (no AVX512)	2367	1.00×
MKL (+ AVX512)	2017	1.17×

→ Always enable AVX512. The baseline DFT case runs 17 % faster with libfakeintel.
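
The speed-up column is the baseline time divided by the new time. A minimal sketch for reproducing it from your own timings (the function name speedup is an assumption made for illustration):

```shell
# Speed-up of a run relative to a baseline: baseline time / new time.
speedup() {
    LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

speedup 2367 2017   # prints 1.17
```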

Build configuration (AOCL)

AOCL (AMD Optimized Libraries) is AMD’s analogue to MKL, providing:

AMDBLIS (BLAS implementation)
AMDlibFLAME (LAPACK)
AMDScaLAPACK, AMDFFTW optimised for AMD EPYC
built with AOCC compiler

Build example:

spack install vasp@6.4.3 +openmp +fftlib %aocc ^amdfftw@5 ^amdblis@5 threads=openmp ^amdlibflame@5 ^amdscalapack@5 ^openmpi

AOCL detects AMD micro-architecture automatically and therefore does not require libfakeintel.

Baseline: MPI-only, 1 node

Configuration	Time [s]	Speed-up vs baseline
MKL (+ AVX512)	2017	1.00×
AOCL (AMD BLIS / libFLAME)	1919	1.05×

The AOCL build is another 5 % faster than MKL with AVX512 enabled.

Hybrid parallelisation and NUMA domains

Each compute node has two EPYC 9254 CPUs with 24 cores each (48 total). Each CPU is subdivided into 4 NUMA domains with separate L3 caches and memory controllers.

MPI-only: 48 ranks per node (1 per core).
Hybrid L3: 8 MPI ranks × 6 OpenMP threads each, bound to individual L3 domains.

This L3-hybrid layout increases memory locality, because each rank mainly uses its own local memory and avoids cross-socket traffic.

Single-node hybrid results (Si256)

Configuration	Time [s]	Speed-up vs MPI-only
MKL (L3 hybrid)	1936	1.04×
AOCL (L3 hybrid)	1830	1.05×

Hybrid L3 adds a modest 4-5 % speed-up.

Multi-node scaling (Si256)

Configuration	Nodes	Time [s]	Speed-up vs 1-node baseline
MKL MPI-only	2	1305	1.55×
AOCL MPI-only	2	1142	1.68×
MKL L3 hybrid	2	1147	1.69×
AOCL L3 hybrid	2	968	1.89×

Interpretation

AOCL shows the strongest scaling across nodes; MKL’s hybrid variant catches up in scaling compared to its MPI-only counterpart. The L3-hybrid layout maintains efficiency even in the multi-node regime.
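
To put a number on "efficiency", the multi-node speed-up can be divided by the node count. The sketch below assumes that each configuration is compared with its own single-node time, which is consistent with the speed-up column in the table; the function name parallel_efficiency is hypothetical.

```shell
# Parallel efficiency = (time on 1 node / time on N nodes) / N.
parallel_efficiency() {
    LC_ALL=C awk -v t1="$1" -v tn="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", (t1 / tn) / n }'
}

parallel_efficiency 2017 1305 2   # MKL MPI-only:   prints 0.77
parallel_efficiency 1830  968 2   # AOCL L3 hybrid: prints 0.95
```

A value of 1.00 would mean perfect scaling; the two calls use the single-node and two-node Si256 timings from the tables above.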

Recommendations for DFT / Hybrid-DFT workloads

AOCL generally outperforms MKL (+AVX512) on AMD EPYC.
Prefer L3-Hybrid (8×6) on single-node and even multi-node jobs for FFT-heavy hybrid-DFT cases.
For pure MPI runs, both MKL (+AVX512) and AOCL scale well; AOCL slightly better.
Always preload libfakeintel2025.so if MKL is used.

Jobscript examples

AOCL – Hybrid L3 (8×6)

#!/bin/bash
#SBATCH -J vasp_aocl_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-aocl

export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export BLIS_NUM_THREADS=6

mpirun -np 8 --bind-to l3 --report-bindings vasp_std

MKL (+AVX512) – Hybrid L3 (8×6)

#!/bin/bash
#SBATCH -J vasp_mkl_avx512_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-mkl
export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}

export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export MKL_NUM_THREADS=6
export MKL_DYNAMIC=FALSE

mpirun -np 8 --bind-to l3 --report-bindings vasp_std

Test case 2 – XAS (Core-level excitation)

The XAS Mn-in-ZnO case models a core-level excitation (X-ray Absorption Spectroscopy). These workloads are not FFT-dominated; instead they involve many unoccupied bands and projector evaluations.

Single-node results (XAS)

Configuration	Time [s]	Relative
MKL MPI-only	897	1.00×
AOCL MPI-only	905	0.99×
MKL L3 hybrid	1202	0.75×
AOCL L3 hybrid	1137	0.79×

Multi-node scaling (XAS)

Configuration	Nodes	Time [s]	Relative
MKL MPI-only	2	1333	0.67×
AOCL MPI-only	2	1309	0.69×
MKL L3 hybrid	2	1366	0.66×
AOCL L3 hybrid	2	1351	0.67×

Interpretation

For core-level / XAS calculations, hybrid OpenMP parallelisation is counter-productive, and scaling beyond one node deteriorates performance due to load imbalance and communication overhead.

Recommendations for XAS and similar workloads

Use MPI-only and single-node configuration.
MKL and AOCL perform identically within margin of error.
Hybrid modes reduce efficiency and should be avoided.
Set OMP_NUM_THREADS=1 to avoid unwanted OpenMP activity.
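
These recommendations translate into a job script along the following lines. This is a sketch under stated assumptions rather than a tested configuration: the module name, partition, and binary are taken from the hybrid examples above, 48 ranks assumes one rank per core on a 48-core node, and the helper name write_vasp_mpi_only_job is hypothetical. Mandatory flags such as the account may have to be added for your project.

```shell
# Hypothetical helper: write an MPI-only, single-node VASP job script.
write_vasp_mpi_only_job() {
    local out="${1:-vasp_mpi_only.sh}"
    cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH -J vasp_aocl_mpi
#SBATCH -N 1
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-aocl

# One thread per rank: no OpenMP activity.
export OMP_NUM_THREADS=1

mpirun -np 48 --bind-to core --report-bindings vasp_std
EOF
}

# Usage (commented out so that no file is written by accident):
# write_vasp_mpi_only_job xas_job.sh && sbatch xas_job.sh
```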

General guidance

For optimal performance on Elysium with AMD EPYC processors, we recommend using the AOCL build as the default choice for all VASP workloads. AOCL consistently outperforms or matches MKL (+AVX512) across tested scenarios (e.g., 5 % faster for Si256 single-node, up to 1.89× speedup for multi-node scaling) and does not require additional configuration like libfakeintel. However, MKL remains a robust alternative, especially for users requiring compatibility with existing workflows.

Workload type	Characteristics	Recommended setup
Hybrid DFT (HSE06, PBE0, etc.)	FFT + dense BLAS, OpenMP beneficial	AOCL L3 Hybrid (8×6)
Standard DFT (PBE, LDA)	light BLAS, moderate FFT	AOCL L3 Hybrid or MPI-only
Core-level / XAS / EELS	many unoccupied bands, projectors	AOCL MPI-only (single-node)
MD / AIMD (>100 atoms)	large FFTs per step	AOCL L3 Hybrid
Static small systems (<20 atoms)	few bands, small matrices	AOCL MPI-only

Recommendations:

Default to AOCL: Use the AOCL build for all workloads unless specific constraints (e.g., compatibility with Intel-based tools) require MKL.
AVX512 for MKL: If using MKL, always preload libfakeintel2025.so to enable AVX512 optimizations.
Benchmark if unsure: Test both MPI-only and L3 Hybrid on one node to determine the optimal configuration for your specific system.

SLURM

The Elysium HPC system utilizes SLURM as a resource manager, scheduler, and accountant in order to guarantee fair share of the computing resources.

If you are looking for technical details regarding the usage and underlying mechanisms of SLURM we recommend participating in the Introduction to HPC training course.

Examples of job scripts for different job types that are tailored to the Elysium cluster can be found in the Training Section.

List of Partitions

All nodes in the Elysium cluster are grouped into partitions by hardware type and job submission type. This way, users can request specific computing hardware, and multi-node jobs are guaranteed to run on nodes with the same setup.

To get a list of the available partitions, their current state, and available nodes, use the sinfo command.

[login_id@login001 ~]$ sinfo
PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu               up 7-00:00:00      4  alloc cpu[033-034,037-038]
cpu               up 7-00:00:00    280   idle cpu[001-032,035-036,039-284]
cpu_filler        up    3:00:00      4  alloc cpu[033-034,037-038]
cpu_filler        up    3:00:00    280   idle cpu[001-032,035-036,039-284]
fat_cpu           up 2-00:00:00     13   idle fatcpu[001-013]
fat_cpu_filler    up    3:00:00     13   idle fatcpu[001-013]
gpu               up 2-00:00:00     20   idle gpu[001-020]
gpu_filler        up    1:00:00     20   idle gpu[001-020]
fat_gpu           up 2-00:00:00      1 drain* fatgpu005
fat_gpu           up 2-00:00:00      5    mix fatgpu[001,003-004,006-007]
fat_gpu           up 2-00:00:00      1   idle fatgpu002
fat_gpu_filler    up    1:00:00      1 drain* fatgpu005
fat_gpu_filler    up    1:00:00      5    mix fatgpu[001,003-004,006-007]
fat_gpu_filler    up    1:00:00      1   idle fatgpu002
vis               up 1-00:00:00      3   idle vis[001-003]
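
To narrow the listing, sinfo accepts filters such as -p <partition> and -t <state>. The following is a minimal sketch for scripted use, assuming the default column layout shown above. The helper name idle_nodes is hypothetical, and the sample text is an excerpt of the listing above so that the snippet runs without SLURM.

```shell
#!/bin/bash
# Hypothetical helper: sum the NODES column of all "idle" rows for one partition.
# Reads text in the default sinfo layout
# (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST) from stdin.
idle_nodes() {
    local partition="$1"
    awk -v p="$partition" '$1 == p && $5 == "idle" { n += $4 } END { print n + 0 }'
}

# Excerpt of the listing above; on the cluster, pipe sinfo instead:
#   sinfo | idle_nodes fat_gpu
sample='PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu               up 7-00:00:00      4  alloc cpu[033-034,037-038]
cpu               up 7-00:00:00    280   idle cpu[001-032,035-036,039-284]
fat_gpu           up 2-00:00:00      1 drain* fatgpu005
fat_gpu           up 2-00:00:00      5    mix fatgpu[001,003-004,006-007]
fat_gpu           up 2-00:00:00      1   idle fatgpu002'

printf '%s\n' "$sample" | idle_nodes cpu       # prints 280
printf '%s\n' "$sample" | idle_nodes fat_gpu   # prints 1
```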

Requesting Nodes of a Partition

SLURM provides two commands to request resources. srun is used to start an interactive session.

[login_id@login001 ~]$ srun -N 1 --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 --pty bash
[login_id@cpu001 ~]$

sbatch is used to request resources that will execute a job script.

[login_id@login001 ~]$ sbatch -N 1 --partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 myscript.sh
Submitted batch job 10290
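
When scripting around sbatch, the job ID from the confirmation line is often needed later, for example for scancel. The following is a minimal sketch that extracts it from a line of the form shown above; the helper name job_id_from_sbatch is hypothetical. As an alternative, sbatch offers the --parsable option, which prints the job ID without the surrounding text.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's confirmation line.
# Expects a line like "Submitted batch job 10290" on stdin.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Example with the output shown above; on the cluster, pipe sbatch instead:
#   job_id=$(sbatch myscript.sh | job_id_from_sbatch)
job_id=$(echo "Submitted batch job 10290" | job_id_from_sbatch)
echo "$job_id"   # prints 10290
```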

For sbatch the submission flags can also be incorporated into the job script itself. More information about job scripts, and the required and some optional flags can be found in the Training/SLURM Header section.
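
As an illustration, the flags from the sbatch command above could be written as #SBATCH directives at the top of the job script. This is a minimal sketch, not the official Elysium template: it mirrors the flags of the example command, and the payload at the end is a placeholder.

```shell
#!/bin/bash
# Same requests as the sbatch command line above, written as #SBATCH directives.
# Plain bash treats these lines as comments; sbatch reads them at submission time.
#SBATCH --nodes=1
#SBATCH --partition=cpu
#SBATCH --job-name=test
#SBATCH --time=00:05:00
#SBATCH --account=testproj_0000

# Placeholder payload (an assumption): report where the job runs.
# SLURM_JOB_ID is set by SLURM inside a job; the fallback keeps the
# script runnable outside of one.
job_id="${SLURM_JOB_ID:-none}"
node_name="$(uname -n)"
echo "Job ${job_id} running on ${node_name}"
```

Saved as myscript.sh, this script can be submitted with sbatch myscript.sh and needs no further flags on the command line.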

On Elysium several flags are mandatory. If one of them is missing, sbatch and srun will refuse to queue the job and explain in detail which flag is missing and how to add it to your command or script.

Use spredict myscript.sh to estimate the start time of your job.

Shared Nodes

All nodes are shared by default. If a user requests fewer CPU cores than a node provides, other users may use the remaining resources at the same time. To ensure that the requested nodes are not shared, use the --exclusive flag. If more than one node is requested, the --exclusive flag is mandatory.
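
The rule for multi-node requests can be sketched as a small helper that adds --exclusive whenever more than one node is requested. The function name node_flags is hypothetical, and the snippet only prints the resulting sbatch command lines instead of submitting anything.

```shell
#!/bin/bash
# Hypothetical helper: print the node-related flags for a request of N nodes.
# Per the text above, --exclusive is mandatory as soon as more than one node
# is requested.
node_flags() {
    local nodes="$1"
    if [ "$nodes" -gt 1 ]; then
        echo "-N ${nodes} --exclusive"
    else
        echo "-N ${nodes}"
    fi
}

# Dry run: print the command lines rather than executing them.
common="--partition=cpu --job-name=test --time=00:05:00 --account=testproj_0000 myscript.sh"
echo "sbatch $(node_flags 1) ${common}"
echo "sbatch $(node_flags 4) ${common}"
```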

GPU Nodes

To request resources on a GPU node, the --gpus=<number of GPUs> flag is required. To keep resources fairly shared, the number of CPUs per GPU is limited, so the --cpus-per-gpu=<number of CPU cores per GPU> flag is required as well. For multi-node jobs, the --gpus-per-node=<number of GPUs per node> option needs to be set.
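
A single-node GPU request could look like the following sketch. The counts (1 GPU, 8 CPU cores per GPU) and the job name gputest are placeholders, not Elysium limits or defaults; the partition name gpu is taken from the sinfo listing. The snippet prints the command instead of running it.

```shell
#!/bin/bash
# Build the GPU request as an array so that each flag stays a separate argument.
# The counts below are placeholders, not Elysium limits.
gpus=1
cpus_per_gpu=8
cmd=(srun -N 1 --partition=gpu --gpus="${gpus}" --cpus-per-gpu="${cpus_per_gpu}"
     --job-name=gputest --time=00:05:00 --account=testproj_0000 --pty bash)

# Dry run: print the command. On a login node, replace this line with
# "${cmd[@]}" to actually start the interactive session.
printf '%s ' "${cmd[@]}"; echo
```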

Visualization Nodes

To request resources on a visualization node, no --gpus parameter is needed. The available GPU is automatically shared between all jobs on the node.

List of Currently Running and Pending Jobs

If the requested resources are currently not available, jobs are queued and will start as soon as the resources become available. To check which jobs are currently running, which ones are pending, and for what reason, use the squeue command. For privacy reasons only the user’s own jobs are displayed.

[login_id@login001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             10290       cpu     test login_id  R       2:51      1 cpu001
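
The ST column holds the job state code, e.g. R for running and PD for pending; for pending jobs the last column shows the reason in parentheses. The following is a minimal sketch that counts jobs per state from squeue's default output. The helper name jobs_per_state is hypothetical, and the second sample row (a pending job) is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state code (ST column) in default
# squeue output read from stdin. The header line is skipped.
jobs_per_state() {
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# First job row copied from the listing above; the pending row is invented.
sample='             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             10290       cpu     test login_id  R       2:51      1 cpu001
             10291       cpu     test login_id PD       0:00      1 (Priority)'

# On the cluster, pipe squeue instead:  squeue | jobs_per_state
printf '%s\n' "$sample" | jobs_per_state
```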

List of Computing Resources Share

Users/Projects/Groups/Institutes are billed for the computing resources they use. To check how many resources a user is entitled to and how many they have already used, use the sshare command. For privacy reasons only the user’s own shares are displayed.

[login_id@login001 ~]$ sshare
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
testproj_0000          login_id       1000    0.166667    20450435      0.163985   0.681818
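
The FairShare column is the value that enters your job priority. To compare several project accounts at a glance, the account name and FairShare value can be pulled out of the listing. This is a minimal sketch assuming the column layout shown above; the helper name fairshare_by_account is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "account fairshare" pairs from sshare output
# read from stdin. Skips the header and the dashed separator line;
# FairShare is the last column.
fairshare_by_account() {
    awk 'NR > 2 && NF >= 7 { print $1, $NF }'
}

# Sample copied from the listing above; on the cluster, pipe sshare instead:
#   sshare | fairshare_by_account
sample='Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
testproj_0000          login_id       1000    0.166667    20450435      0.163985   0.681818'

printf '%s\n' "$sample" | fairshare_by_account   # prints: testproj_0000 0.681818
```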

List of Project Accounts

For technical reasons, project accounts on Elysium have rather cryptic names, based on the LoginID of the project manager and a number. To make it easier to select a project account for the --account flag of srun or sbatch, and to check the share and usage of projects, the RUB-exclusive rub-acclist command can be used.

[login_id@login001 ~]$ rub-acclist
Project ID    | Project Description
--------------+--------------------------------------------------
testproj_0000 | The fundamental interconnectedness of all things
testproj_0001 | The translated quaternion for optimal pivoting
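
If you are a member of many projects, the project ID can be looked up by a keyword from its description. This is a minimal sketch assuming the two-column layout shown above; the helper name find_project is hypothetical and not part of rub-acclist.

```shell
#!/bin/bash
# Hypothetical helper: print the project IDs whose description contains a
# keyword (case-insensitive). Reads text in the rub-acclist layout shown
# above ("ID | description") from stdin.
find_project() {
    local keyword="$1"
    awk -F'|' -v k="$keyword" '
        NR > 2 && index(tolower($2), tolower(k)) {
            gsub(/[ \t]+/, "", $1)   # strip the padding around the ID
            print $1
        }'
}

# Sample copied from the listing above; on the cluster, pipe rub-acclist instead:
#   rub-acclist | find_project quaternion
sample='Project ID    | Project Description
--------------+--------------------------------------------------
testproj_0000 | The fundamental interconnectedness of all things
testproj_0001 | The translated quaternion for optimal pivoting'

printf '%s\n' "$sample" | find_project quaternion   # prints testproj_0001
```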

Interactive Sessions

The Elysium cluster provides the possibility to work interactively to allow compute intensive preparations, post processing steps, or tests. Further, interactive sessions on the visualization nodes (vis[001-003]) allow for remote connections to run e.g. jupyter notebooks from your local machine.

Interactive Jobs

In order to reserve nodes interactively the following command should be used:

[login_id@login001 ~]$ srun -N 1 --partition=vis --job-name=interactive --time=01:00:00 --tasks-per-node=1 --account=<project_name> --pty bash

Depending on the load on the cluster it might take some time until resources can be allocated for your job. Even if resources are free the scheduler might need a few seconds to reserve the requested node. If you require a specific node for your interactive session you may utilize the -w <nodename> flag. Note that this might increase the time for your session to start until the requested node is available.

(For more options on node allocations see SLURM).

SSH From Login Node to Interactive Session

As long as you have any job running on the target node, you can use ssh to connect from any login node to that node. When the job terminates, all ssh connections are terminated as well.

SSH From Local Machine to Interactive Session

Connection to compute nodes from local machines is only possible for the three visualization nodes (vis[001-003]). In order to allow a local machine to connect to a node with an active job, one of the login nodes needs to be defined as a jump host, to forward your ssh request.

Please make sure that the ~/.ssh/config on your local machine contains the following entries:

Host elysium-login001
    User <login_id>
    Hostname login001.elysium.hpc.rub.de
    IdentityFile ~/.ssh/elysium

Host *.elysium.hpc.rub.de *.elysium.hpc.ruhr-uni-bochum.de !login*.elysium.hpc.rub.de !login*.elysium.hpc.ruhr-uni-bochum.de
    User <login_id>
    IdentityFile ~/.ssh/elysium
    ProxyJump elysium-login001

Please ensure that the IdentityFile points to the correct (private) key. You are now able to connect to one of the vis nodes with the following command:

[localuser@localmachine ~]$ ssh vis001.elysium.hpc.rub.de
[<login_id>@vis001 ~]$
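
For remote tools such as Jupyter notebooks, the same configuration can carry a port forward. The following is a minimal sketch, assuming a service is already listening on the chosen port on the vis node. The helper name tunnel_cmd and port 8888 are assumptions, not Elysium defaults, and the snippet only prints the ssh command.

```shell
#!/bin/bash
# Hypothetical helper: print an ssh command that forwards a local port to the
# same port on a vis node.
#   -N  do not run a remote command
#   -L  local port forwarding (local_port:host:remote_port)
# The jump host is taken from the ~/.ssh/config entries shown above.
tunnel_cmd() {
    local node="$1" port="$2"
    echo "ssh -N -L ${port}:localhost:${port} ${node}.elysium.hpc.rub.de"
}

# Dry run: 8888 is an assumed notebook port.
tunnel_cmd vis001 8888
```

While the printed command is running on your local machine, the service would be reachable at localhost:8888 in your local browser.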

Visualization

We provide visualization via VirtualGL on the visualization nodes of elysium.hpc.ruhr-uni-bochum.de.

Requirements:

X11 server with 24-bit or 32-bit visuals
VirtualGL version > 3.0.2 installed

You can check support for your operating system at: https://virtualgl.org/Documentation/OSSupport
You can download VirtualGL at: https://github.com/VirtualGL/virtualgl/releases

To use VirtualGL on Elysium, you only need the VirtualGL client; it is not necessary to configure a VirtualGL server.

Resource allocation:

Allocate resources in the vis partition.

salloc -p vis -N1 --time=02:00:00 --account=testproj_0000

This allocates a share of one vis node for 2 hours (for more options on node allocations, see SLURM). Wait until a slot in the vis partition is available. You can check whether your resources have already been allocated using the squeue command.

Establish VirtualGL connection:

Connect directly from your computer to the visualization node via ssh with vglconnect -s. Use one of the login servers login[001-004] as a jump host.

vglconnect -s <loginid>@vis001.elysium.hpc.rub.de -J <loginid>@login001.elysium.hpc.rub.de

If you are prompted for a password, enter your RUB password. If you don’t like long commands, you can configure one of the login nodes as a jump host in your ~/.ssh/config for the vis[001-003] hosts. The command vglconnect -s accepts nearly the same syntax as ssh.

Run your Software:

Load a module if required. Start your application using vglrun, and remember to use helpful command-line options such as -fps.

module load vmd
vglrun +pr -fps 60 vmd

Please remember to cancel the resource allocation once you are done with your interactive session.

scancel $jobID

Job Monitoring

With our web-based job monitoring system (ClusterCockpit), you can easily monitor and analyze the performance of your jobs on the Elysium HPC system. For a quick performance check, see Metrics to Check; for an in-depth analysis, refer to the HPC-Wiki. For details on the web interface, consult the official documentation.

To access the job monitoring system, use your RUB LoginID and corresponding password as credentials.

Overview

After logging in successfully, you will see the “Clusters” overview, which displays the total number of jobs you have run and the current number of jobs running on the cluster. At present, this information includes only the Elysium cluster. You can continue from here, either by going to the total jobs overview, or the running jobs overview. Alternatively, you can click on “My Jobs” in the top left of the page, or search for job names/ids in the top right of the page.

My Jobs

The “My Jobs” page displays a list of your jobs, fully customizable to your requirements. Use the menus in the top left corner to sort or filter the list, and select the metrics you want to display for your jobs. Below, you’ll find a detailed table with job IDs, names, and your selected metrics.

Job Details

This page is split into three sections. The first one shows general information: JobInfo, a footprint, and a roofline diagram that shows how efficiently the job utilized the hardware. Note that the footprint is only updated every 10 minutes and the energy footprint is generated after the job has finished.

In the next section some metrics are shown as diagrams. For some of the diagrams you can choose the scope, i.e. core, socket or node. The shown metrics and their order can be customized with the “Select Metrics” menu. This selection is saved per partition. Double-click the graph to zoom out if the scale is too small.

The last section displays selected metrics numerically, lets you inspect your job script, and shows more detail about the job allocation and runtime parameters.

Metrics

The following table shows the metrics which are available for jobs on Elysium:

Metric name	Meaning	Meaningful for shared jobs
CPU
cpu_load	Load on the node (processes/threads requesting CPU time)	No
cpu_load_core	Load on CPU cores of a job (processes/threads per core)	Yes
cpu_user	Percentage of CPU time spent as user time for each CPU core	Yes
clock	Frequency of the CPU cores of the job	Yes (affected by other jobs)
ipc	Instructions per cycle	Yes
flops_any	Floating-point operations performed by CPU cores	Yes
core_power	Power consumption of individual CPU cores	Yes
Memory
mem_bw	Memory bandwidth	No (full socket only)
mem_used	Main memory used on the node	No
job_mem_used	Main memory used for the job	Yes
disk_free	Free disk space on the node	No
GPU
nv_compute_processes	Number of processes using the GPU	Yes
acc_mem_used	Accelerator (GPU) memory usage	Yes
acc_mem_util	Accelerator (GPU) memory utilization	Yes
acc_power	Accelerator (GPU) power usage	Yes
acc_utilization	Accelerator (GPU) compute utilization	Yes
Filesystem
lustre_write_bw	/lustre write bandwidth	No
lustre_read_bw	/lustre read bandwidth	No
lustre_close	/lustre file close requests	No
lustre_open	/lustre file open requests	No
lustre_statfs	/lustre file stat requests	No
io_reads	Local Disk I/O read operations/s	No
io_writes	Local Disk I/O write operations/s	No
nfs4_close	/home + /cluster file close requests	No
nfs4_open	/home + /cluster file open requests	No
nfsio_nread	/home I/O read bandwidth	No
nfsio_nwrite	/home I/O write bandwidth	No
Network
ib_recv	Omnipath receive bandwidth	No
ib_xmit	Omnipath transmit bandwidth	No
ib_recv_pkts	Omnipath received packets/s	No
ib_xmit_pkts	Omnipath transmitted packets/s	No
net_bytes_in	Ethernet incoming bandwidth	No
net_bytes_out	Ethernet outgoing bandwidth	No
net_pkts_in	Ethernet incoming packets/s	No
net_pkts_out	Ethernet outgoing packets/s	No
NUMA Nodes
numastats_numa_hit	NUMA hits/s	No
numastats_numa_miss	NUMA misses/s	No
numastats_interleave_hit	NUMA interleave hits/s	No
numastats_local_node	NUMA local node accesses/s	No
numastats_numa_foreign	NUMA foreign node accesses/s	No
numastats_other_node	NUMA other node accesses/s	No
Node metrics
node_total_power	Power consumption of the whole node	No

Metrics to Check

For a quick performance analysis, here are some key metrics to review:

cpu_user: Should be close to 100%. Lower values indicate system processes are using some of your resources.
flops_any: Measures calculations per second. On Elysium, a typical CPU node averages around 400 GFLOPS.
cpu_load_core: Should be 1 at most for non-OpenMP jobs. Higher values suggest oversubscription.
ipc: Instructions executed per cycle. Higher values indicate better efficiency.
mem_bw: Memory bandwidth, maxing out at 350 GByte/s. Only meaningful if the node isn’t shared or your job uses a full socket.
acc_utilization: GPU compute utilization. Aim for high percentages (e.g., above 80%) to ensure efficient GPU usage.
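
To put a measured value into perspective, it can be expressed as a percentage of the reference values listed above. This is a minimal sketch; the helper name percent_of is hypothetical and the measured values are made-up examples, not results from Elysium.

```shell
#!/bin/bash
# Hypothetical helper: express a measured value as a percentage of a
# reference value, with one decimal place.
percent_of() {
    awk -v v="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", 100 * v / ref }'
}

# Reference values are taken from the list above; measured values are invented.
percent_of 175 350   # mem_bw of 175 GByte/s vs. the 350 GByte/s maximum: prints 50.0
percent_of 100 400   # flops_any of 100 GFLOPS vs. the ~400 GFLOPS node average: prints 25.0
```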

Known Problems

Occasionally, an orange box labeled “No dataset returned for <metric>” may be shown instead of the graph. This occurs when the ClusterCockpit service was unable to collect the metrics during your job. Note that jobs that ran before March 12th 2025 may report missing or incorrect data in some cases.

The measurements for ipc and clock are sometimes too high. This is related to power saving features of the CPU. We are currently investigating how to solve this issue.

For jobs that ran before March 7th 2025, a bug triggered an overflow in the power usage metric, resulting in unrealistically high power consumption values. This bug is fixed, but the fix cannot be applied to older jobs that were affected by it.

Advanced Permissions

On the Elysium cluster every user is in the default hpcuser group. For this reason it is important that you don’t make your files readable or writable for group and others. If you actually want to share files with other people, there are two solutions.

LDAP Groups

You can request an LDAP group to be added to the cluster: https://www.it-services.ruhr-uni-bochum.de/services/ias/benutzergruppen_ldap.html.de
With such a group you can manage a set of people who are automatically added to or removed from every folder that has those group permissions.

Advanced ACLs

Linux supports ACLs which allow you to manually grant people or groups permission to read/write to shared folders. Note that this will only work on /lustre for now.

Adding permission

To grant a specific user access to a folder, you can use setfacl.
Normally you don’t want to grant other users complete access to your /home or /lustre folder, so here is an example workflow for creating a folder with extra permissions to share. In this example, alice shares the bob-share folder in /lustre with bob.

# First give general permission to bob to access subfolders
[alice@elysium ~] setfacl -m u:bob:x /lustre/alice/
# Then set the `default` permission to read, write and execute.
# Every new file/folder in `bob-share` will inherit this
[alice@elysium ~] setfacl -d -m u:bob:rwx /lustre/alice/bob-share
# If files already existed in the share directory apply those recursively
[alice@elysium ~] setfacl -d -R -m u:bob:rwx /lustre/alice/bob-share

Check permissions

[alice@elysium ~] ls -l /lustre/alice/
total 0
drwxrwxr-x+  2 alice hpcuser 4096 Feb 26 16:06 bob-share

The + shows that there are additional permissions set.

[alice@elysium ~] getfacl /lustre/alice/bob-share
getfacl: Removing leading '/' from absolute path names
user::rwx
group::r-x
other::r-x
default:user::rwx
default:user:bob:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

Remove permissions

To remove all ACLs recursively, you can use:

setfacl -b -R /lustre/alice/bob-share

More fine-grained control is of course possible; for any additional use case, please consult the man pages of getfacl and setfacl.
Please don’t open a ticket if this does not work on your home directory.
