Modules
We use the Lmod module system to provide compilers, MPI stacks, libraries, and tools.
The legacy module tree (elysium/2024) is currently the default view.
To switch to the current tree, load:
module load elysium/2026
The elysium/202x modules modify the MODULEPATH.
They are sticky, so they are not removed by module purge.
elysium/2024 and elysium/2026 are mutually exclusive: loading one unloads the other.
The legacy tree (elysium/2024) is frozen and no longer updated by admins.
The 2026 tree is built with gcc@13.4.0, which provides better optimization support for Elysium’s Zen 4 CPUs than gcc@11.
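A short session illustrates the sticky and mutually exclusive behavior described above (output omitted; module names as on Elysium):

```shell
module load elysium/2026   # loading the current tree unloads elysium/2024
module purge               # removes all loaded modules except sticky ones
module list                # elysium/2026 is still loaded
module --force purge       # a forced purge removes sticky modules as well
```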
Known Issues
- Intel MPI in elysium/2024 (legacy tree) is known to be unstable: it can cause random MPI deadlocks in which the application stops progressing while the Slurm job keeps running.
- In the current tree (elysium/2026), Intel MPI is stable and recommended.
- MPICH versions before 5.0.0 are not suitable for multi-node runs; use 5.0.0 or newer for multi-node jobs.
- If you must use MPICH < 5.0.0, restrict jobs to single-node execution.
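Whether an installed MPICH meets the 5.0.0 multi-node minimum can be checked with a small snippet; the version string would normally come from `mpichversion`, which is assumed here:

```shell
#!/bin/sh
# Compare an MPICH version string against the 5.0.0 multi-node minimum.
# Normally obtained via: mpichversion | awk '/^MPICH Version/{print $3}'
mpich_version="4.2.1"
minimum="5.0.0"

# sort -V orders version strings numerically; the smaller one comes first
lowest=$(printf '%s\n' "$minimum" "$mpich_version" | sort -V | head -n1)

if [ "$lowest" = "$minimum" ]; then
    echo "MPICH $mpich_version: multi-node runs OK"
else
    echo "MPICH $mpich_version: restrict jobs to a single node"
fi
```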
Common module commands
ml is a short alias for module.
ml av: list available modules in the current module path.
ml load <module>: load a module. Example: ml load openmpi/5.0.9.
ml list: show currently loaded modules.
ml unload <module>: unload a module. Example: ml unload openmpi/5.0.9.
ml purge: unload all currently loaded modules except sticky ones.
ml help <module>: show module help text. Example: ml help openmpi/5.0.9.
ml show <module>: print the contents of the modulefile.
ml whatis <module>: show a short module description.
ml spider <name>: search all known versions. Use ml spider -A <name> to also show hidden entries.
ml use <path>: add a directory to your module search path.
ml unuse <path>: remove a directory from your module search path.
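Combined, a typical interactive session might look like this (versions are illustrative):

```shell
ml spider openmpi                # search all known OpenMPI versions
ml av                            # what is visible in the current module path?
ml load gcc/13.4.0 openmpi/5.0.9
ml list                          # confirm what is loaded
ml purge                         # unload everything except sticky modules
```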
Using your own modulefiles
You can keep your own modulefiles in a personal directory and add it with ml use.
mkdir -p "$HOME/modules/mytool"
cat > "$HOME/modules/mytool/1.0.lua" <<'LUA'
help([[MyTool 1.0]])
whatis("Name: MyTool")
whatis("Version: 1.0")
depends_on("openmpi/5.0.9")
prepend_path("PATH", "/path/to/mytool/bin")
LUA
ml use "$HOME/modules"
ml load mytool/1.0
In this example, loading mytool/1.0 also loads openmpi/5.0.9 as a dependency.
To make your personal modules always visible, add this line to your ~/.bashrc:
module use "$HOME/modules"
If you need additional software or module versions, please contact support.
Spack
We provide a central Spack 1.1.1 installation in parallel to the existing 0.23.0 stack.
This separation is necessary because a database migration would render the 0.23.0 installation unusable.
The new 2026 module tree is not compatible with the legacy tree, and self-built packages from the old setup must be rebuilt.
By default, you will still see the legacy module tree.
The new stack is opt-in via:
module load spack/2026
If you already used the legacy setup, use this dedicated migration guide:
Migration Guide for Existing Users.
Legacy documentation remains available here:
Spack (Legacy 0.23.0).
Table of Contents
- Quick Setup (New Users)
- Architecture Overview
- Guide to Using Spack
- What Is New in Modern Spack
- Legacy Setup
Quick Setup
For a fresh setup:
rub-deploy-spack-configs-2026
module load spack/2026
After that, you can install your own packages into your home Spack tree and generate personal modulefiles.
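As a sketch, installing a personal package and regenerating your modulefiles could look like this (the package choice is illustrative; `spack module lmod refresh` rebuilds the personal modulefiles):

```shell
module load spack/2026
spack install htop           # goes into your $HOME Spack tree if not available centrally
spack module lmod refresh    # regenerate personal Lmod modulefiles
ml av htop                   # visible once your personal module path is in MODULEPATH
```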
Add module load spack/2026 to your ~/.bashrc if you want this active in every login shell.
Architecture Overview
We use a central Spack installation combined with per-user overlays:
Central Spack (read-only, maintained by HPC)
├── pre-built packages
├── MPI configurations
└── cluster-wide defaults
        ↓ (upstream)
User Spack ($HOME)
├── personal installs
└── personal modules
        ↓
Lmod module system
This means:
- You automatically use centrally installed packages when available.
- Cluster-wide configurations (e.g. MPI defaults, compiler settings) are inherited.
- Your own installations go into your personal $HOME Spack tree.
- Only missing packages are built locally.
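In practice the overlay is transparent (package name illustrative):

```shell
spack install hdf5     # reuses a matching central build instead of compiling
spack find -l hdf5     # lists the installation, whether central or personal
```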
Guide to Using Spack
This section covers the most common Spack workflows on Elysium.
Searching and Inspecting Packages
Search for packages:
spack list <keyword>
spack list openfoam
Show details (versions, variants, dependencies):
spack info <package>
Preview what will actually be installed:
spack spec <package>
Always check spack spec before installing complex packages.
Installing Packages
Basic installation:
spack install <package>
Enable or disable variants:
spack install hdf5 +mpi +cxx ~fortran
Specify a compiler:
spack install hdf5 %gcc@13.4.0
Force rebuilding dependencies with the same compiler:
spack install --fresh hdf5 %gcc@13.4.0
Specify dependencies explicitly:
spack install hdf5 ^openmpi@5.0.9
Everything combined:
spack install hdf5@1.14.6 +mpi %gcc@13.4.0 ^openmpi@5.0.9
Virtual Providers (BLAS, MPI, FFT, etc.)
Some packages depend on virtual interfaces instead of concrete libraries.
Examples include blas, lapack, mpi, and fftw-api.
To see which virtual packages exist:
spack providers
To list available providers for a specific interface:
spack providers blas
spack providers fftw-api
Example output:
Providers for blas:
amdblis
openblas
intel-oneapi-mkl
You can select a specific provider during installation:
spack install gromacs ^[virtuals=blas] openblas
This is often preferred over manually selecting a specific concrete library, because it keeps the dependency graph clean and compatible.
Inspecting and Comparing Installations
List installed packages with variants and hashes:
spack find -lv
Inspect a specific installation:
spack find -ldv <package>
Compare two installations:
spack diff /<hash1> /<hash2>
Removing Packages
Remove a specific installation by hash:
spack uninstall /<hash>
Overriding Package Definitions
On Elysium, the central builtin repository is provided by the HPC team. If you want to override a package definition (e.g. to test changes), you can create a local repository on top of it.
Create a local repo
mkdir -p $HOME/spack/var/spack/repos/packages
cat > $HOME/spack/var/spack/repos/repo.yaml <<'EOF'
repo:
  namespace: overrides
EOF
Register it in ~/.spack/repos.yaml (place it above builtin to override):
repos:
  overrides: $HOME/spack/var/spack/repos
  builtin:
    destination: /cluster/spack/spack-packages
Check:
spack repo list
Override a package (example: ffmpeg)
mkdir -p $HOME/spack/var/spack/repos/packages/ffmpeg
cp /cluster/spack/spack-packages/repos/spack_repo/builtin/packages/ffmpeg/package.py \
$HOME/spack/var/spack/repos/packages/ffmpeg/
Edit the copied package.py and adjust versions, dependencies, variants, etc.
Install explicitly from your namespace:
spack install overrides.ffmpeg
Verify which repository is used
For a spec (not yet installed):
spack spec -N <package>
For installed packages:
spack find -N <package>
The -N option shows the namespace (overrides or builtin).
Custom Changes to Packages using spack develop
If you want to modify the source code of a package (e.g. openfoam) and rebuild it locally, you can use spack develop. This allows you to work directly on a source checkout without creating tarballs or calculating checksums.
Create and activate a development environment
mkdir -p ~/openfoam-dev
cd ~/openfoam-dev
spack env create -d .
spacktivate . #shortcut for `spack env activate .`
Using a dedicated environment keeps your development work isolated from your normal Spack setup.
Add and install the package
spack add openfoam
spack install
This performs a normal installation and ensures all dependencies are available.
Switch to development mode
spack develop openfoam
This checks out the source code into the environment directory (e.g. ~/openfoam-dev/openfoam/) and registers it as the active development source.
You can verify this with:
spack spec openfoam
Look for dev_path=.../openfoam in the output.
Modify the source and rebuild
cd ~/openfoam-dev/openfoam
# edit source files here (e.g. with vim)
cd ~/openfoam-dev
spack concretize -f   # re-concretize so the dev_path is picked up
spack install
Spack will now build openfoam from your modified local sources.
If compilation fails, Spack will print the relevant error messages and the path to the full build log.
Legacy Setup
If you need to reference the old 0.23.0 setup, use:
Spack (Legacy 0.23.0).
Vampir
Vampir is a framework for analyzing the runtime behavior of serial and parallel programs using function instrumentation via Score-P.
Vampir is licensed by HPC.nrw and can be used freely on the Elysium Cluster.
This page only shows a small test case demonstrating how Score-P can be used to generate profiling data and how Vampir can be started on Elysium.
For information on how to use Vampir to analyze your application, extract useful performance metrics, and identify bottlenecks, please refer to the Score-P Cheat Sheet and the official Vampir Documentation.
Compilation with Instrumented Functions
In order to generate profiling data, the function calls in the application need to be instrumented.
This means inserting additional special function calls that record the time, the current call stack, and much more.
Fortunately, this is not done manually but can easily be achieved by using the Score-P compiler wrapper.
To follow along, you can use this MPI Example Code.
To use the Score-P compiler wrapper, all that is needed is to prepend the compiler with the scorep command:
module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6
scorep mpicc -o mpi-test.x mpi-test.c
In the case of a Makefile, or other build systems, the compiler variable has to be adjusted accordingly.
Generating Profiling Data
Profiling data is created by running the application.
Note that the profiling files can grow to enormous sizes.
It is therefore advisable to choose a small, representative test case for your application rather than a full production run.
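If traces grow too large, Score-P also provides runtime controls; SCOREP_TOTAL_MEMORY and SCOREP_FILTERING_FILE are standard Score-P variables, while the filter pattern below is illustrative:

```shell
# Cap the per-process memory Score-P uses for measurement data
export SCOREP_TOTAL_MEMORY=256M

# Exclude short, frequently called functions via a filter file
export SCOREP_FILTERING_FILE=scorep.filter
cat > scorep.filter <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    *helper*
SCOREP_REGION_NAMES_END
EOF
```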
In its default mode, Score-P collects profiling data by periodically sampling the application's call stack. To generate an accurate profile, tracing needs to be enabled in your job script:
module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6
export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x
Here is a full job script for the example:
#!/bin/bash
#SBATCH --partition=cpu
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH --account=<Account>
#SBATCH --time=00-00:05:00
module purge
module load openmpi/5.0.5-d3ii4pq
module load scorep/8.4-openmpi-5.0.5-6mtx3p6
export SCOREP_ENABLE_TRACING=true
mpirun -np 4 ./mpi-test.x
The execution of the instrumented application will take significantly longer than usual.
Thus, it should never be used for production runs, but merely for profiling.
After the application has finished, a new directory will have been created whose name contains a time stamp and other information, e.g.: scorep-20251222_0912_1523094386395226
The file traces.otf2 contains the profiling data required by Vampir.
Visualizing With Vampir
In order to visualize the profiling data, a Visualization Session has to be established.
Vampir can be started with
module load vampir
vglrun +pr -fps 20 vampir ./traces.otf2
This will open the Vampir graphical user interface:
VASP
Build configuration (MKL)
On Elysium, VASP can be built with Spack using:
spack install vasp@6.4.3 +openmp +fftlib ^openmpi@5.0.5 ^fftw@3+openmp ^intel-oneapi-mkl threads=openmp +ilp64
This configuration uses:
- Intel oneAPI MKL (ILP64) for BLAS, LAPACK and ScaLAPACK,
- VASP's internal FFTLIB to avoid MKL CDFT issues on AMD,
- OpenMPI 5.0.5 as MPI implementation,
- OpenMP enabled for hybrid parallelisation.
We choose MKL as the baseline because it is the de facto HPC standard and performs well on AMD EPYC when AVX512 code paths are enabled.
Activating AVX512
Intel's MKL only enables AVX512 optimisations on Intel CPUs.
On AMD, MKL defaults to AVX2/SSE code paths.
To unlock the faster AVX512 kernels on AMD EPYC we provide libfakeintel, which fakes Intel CPUID flags.
| MKL version | Library to preload | Notes |
| --- | --- | --- |
| ≤ 2024.x | /lib64/libfakeintel.so | |
| ≥ 2025.x | /lib64/libfakeintel2025.so | works for older versions too |
⚠ Intel gives no guarantee that all AVX512 instructions work on AMD CPUs.
In practice, the community has shown that not every kernel uses full AVX512 width, but the overall speed-up is still substantial.
Activate AVX512 by preloading the library in your job:
export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}
Test case 1 – Si256 (DFT / Hybrid HSE06)
This benchmark uses a 256-atom silicon supercell (Si256) with the HSE06 hybrid functional.
Hybrid DFT combines FFT-heavy parts with dense BLAS/LAPACK operations and is therefore a good proxy for most large-scale electronic-structure workloads.
Baseline: MPI-only, 1 node
| Configuration | Time [s] | Speed-up vs baseline |
| --- | --- | --- |
| MKL (no AVX512) | 2367 | 1.00× |
| MKL (+ AVX512) | 2017 | 1.17× |
Recommendation: always enable AVX512. The baseline DFT case runs 17 % faster with libfakeintel.
Build configuration (AOCL)
AOCL (AMD Optimized Libraries) is AMD's analogue to MKL, providing:
- AMDBLIS (BLAS implementation)
- AMDlibFLAME (LAPACK)
- AMDScaLAPACK, AMDFFTW optimised for AMD EPYC
- built with AOCC compiler
Build example:
spack install vasp@6.4.3 +openmp +fftlib %aocc ^amdfftw@5 ^amdblis@5 threads=openmp ^amdlibflame@5 ^amdscalapack@5 ^openmpi
AOCL detects AMD micro-architecture automatically and therefore does not require libfakeintel.
Baseline: MPI-only, 1 node
| Configuration | Time [s] | Speed-up vs baseline |
| --- | --- | --- |
| MKL (+ AVX512) | 2017 | 1.00× |
| AOCL (AMD BLIS / libFLAME) | 1919 | 1.05× |
The AOCL build is another 5 % faster than MKL with AVX512 enabled.
Hybrid parallelisation and NUMA domains
Each compute node has two EPYC 9254 CPUs with 24 cores each (48 total).
Each CPU is subdivided into 4 NUMA domains with separate L3 caches and memory controllers.
- MPI-only: 48 ranks per node (1 per core).
- Hybrid L3: 8 MPI ranks × 6 OpenMP threads each, bound to individual L3 domains.
This L3-hybrid layout increases memory locality, because each rank mainly uses its own local memory and avoids cross-socket traffic.
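The NUMA topology described above can be verified on a compute node with standard Linux tools:

```shell
lscpu | grep -i numa     # NUMA node count and the CPU lists per node
numactl --hardware       # per-domain CPUs and memory sizes
```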
Single-node hybrid results (Si256)
| Configuration | Time [s] | Speed-up vs MPI-only |
| --- | --- | --- |
| MKL (L3 hybrid) | 1936 | 1.04× |
| AOCL (L3 hybrid) | 1830 | 1.05× |
Hybrid L3 adds a modest 4-5 % speed-up.
Multi-node scaling (Si256)
| Configuration | Nodes | Time [s] | Speed-up vs 1-node baseline |
| --- | --- | --- | --- |
| MKL MPI-only | 2 | 1305 | 1.55× |
| AOCL MPI-only | 2 | 1142 | 1.68× |
| MKL L3 hybrid | 2 | 1147 | 1.69× |
| AOCL L3 hybrid | 2 | 968 | 1.89× |
Interpretation
AOCL shows the strongest scaling across nodes; MKL's hybrid variant catches up in scaling compared to its MPI-only counterpart.
The L3-hybrid layout maintains efficiency even in the multi-node regime.
Recommendations for DFT / Hybrid-DFT workloads
- AOCL generally outperforms MKL (+AVX512) on AMD EPYC.
- Prefer L3-Hybrid (8Γ6) on single-node and even multi-node jobs for FFT-heavy hybrid-DFT cases.
- For pure MPI runs, both MKL (+AVX512) and AOCL scale well; AOCL slightly better.
- Always preload libfakeintel2025.so if MKL is used.
Jobscript examples
AOCL – Hybrid L3 (8×6)
#!/bin/bash
#SBATCH -J vasp_aocl_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive
module purge
module load vasp-aocl
export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export BLIS_NUM_THREADS=6
mpirun -np 8 --bind-to l3 --report-bindings vasp_std
MKL (+AVX512) – Hybrid L3 (8×6)
#!/bin/bash
#SBATCH -J vasp_mkl_avx512_l3hyb
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive
module purge
module load vasp-mkl
export LD_PRELOAD=/lib64/libfakeintel2025.so:${LD_PRELOAD}
export OMP_NUM_THREADS=6
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export MKL_NUM_THREADS=6
export MKL_DYNAMIC=FALSE
mpirun -np 8 --bind-to l3 --report-bindings vasp_std
Test case 2 – XAS (Core-level excitation)
The XAS Mn-in-ZnO case models a core-level excitation (X-ray Absorption Spectroscopy).
These workloads are not FFT-dominated; instead they involve many unoccupied bands and projector evaluations.
Single-node results (XAS)
| Configuration | Time [s] | Relative |
| --- | --- | --- |
| MKL MPI-only | 897 | 1.00× |
| AOCL MPI-only | 905 | 0.99× |
| MKL L3 hybrid | 1202 | 0.75× |
| AOCL L3 hybrid | 1137 | 0.79× |
Multi-node scaling (XAS)
| Configuration | Nodes | Time [s] | Relative |
| --- | --- | --- | --- |
| MKL MPI-only | 2 | 1333 | 0.67× |
| AOCL MPI-only | 2 | 1309 | 0.69× |
| MKL L3 hybrid | 2 | 1366 | 0.66× |
| AOCL L3 hybrid | 2 | 1351 | 0.67× |
Interpretation
For core-level / XAS calculations, hybrid OpenMP parallelisation is counter-productive, and scaling beyond one node deteriorates performance due to load imbalance and communication overhead.
Recommendations for XAS and similar workloads
- Use MPI-only and single-node configuration.
- MKL and AOCL perform identically within margin of error.
- Hybrid modes reduce efficiency and should be avoided.
- Set OMP_NUM_THREADS=1 to avoid unwanted OpenMP activity.
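Mirroring the hybrid job scripts above, an MPI-only single-node script for XAS-type runs could look like this (a sketch; the vasp-aocl module name and the 48-core node layout are taken from the sections above):

```shell
#!/bin/bash
#SBATCH -J vasp_aocl_xas
#SBATCH -N 1
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH -p cpu
#SBATCH -t 48:00:00
#SBATCH --exclusive

module purge
module load vasp-aocl

# MPI-only: one rank per core, no OpenMP threading
export OMP_NUM_THREADS=1

mpirun -np 48 --bind-to core --report-bindings vasp_std
```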
General guidance
For optimal performance on Elysium with AMD EPYC processors, we recommend the AOCL build as the default choice for all VASP workloads. AOCL consistently outperforms or matches MKL (+AVX512) across the tested scenarios (e.g., 5 % faster for Si256 single-node, up to 1.89× speed-up for multi-node scaling) and does not require additional configuration like libfakeintel. However, MKL remains a robust alternative, especially for users requiring compatibility with existing workflows.
| Workload type | Characteristics | Recommended setup |
| --- | --- | --- |
| Hybrid DFT (HSE06, PBE0, etc.) | FFT + dense BLAS, OpenMP beneficial | AOCL L3 Hybrid (8×6) |
| Standard DFT (PBE, LDA) | light BLAS, moderate FFT | AOCL L3 Hybrid or MPI-only |
| Core-level / XAS / EELS | many unoccupied bands, projectors | AOCL MPI-only (single-node) |
| MD / AIMD (>100 atoms) | large FFTs per step | AOCL L3 Hybrid |
| Static small systems (<20 atoms) | few bands, small matrices | AOCL MPI-only |
Recommendations:
- Default to AOCL: Use the AOCL build for all workloads unless specific constraints (e.g., compatibility with Intel-based tools) require MKL.
- AVX512 for MKL: If using MKL, always preload libfakeintel2025.so to enable AVX512 optimizations.
- Benchmark if unsure: Test both MPI-only and L3 Hybrid on one node to determine the optimal configuration for your specific system.