Resources at RUB

HPC Cluster Elysium

Node Specifications

Type	Count	CPU	Memory	Local Usable NVMe Storage	GPU
Thin-CPU	284	2xAMD EPYC 9254 (24 core)	384 GB	810 GB	-
Fat-CPU	13	2xAMD EPYC 9454 (48 core)	2304 GB	1620 GB	-
Thin-GPU	20	2xAMD EPYC 9254 (24 core)	384 GB	1620 GB	3xNVIDIA A30 Tensor Core GPU 24 GB, 933 GB/s
Fat-GPU	7	2xAMD EPYC 9454 (48 core)	1152 GB	14000 GB	8xNVIDIA H100 SXM5 GPUs 80 GB, 3.35 TB/s, connected via NVLink
Fat-H200-GPU	1	2xAMD EPYC 9554 (64 core)	2304 GB	7000 GB	8xNVIDIA H200 SXM GPUs 141 GB, 4.8 TB/s, connected via NVLink
Thin-CPU-THINK	47	2xAMD EPYC 9645 (96 core)	1536 GB	480 GB	-
Thin-GPU-THINK	10	1xAMD EPYC 9535 (64 core)	768 GB	1920 GB	4xNVIDIA L40S PCIe 48 GB, 864 GB/s
Fat-GPU-THINK	1	2xAMD EPYC 9335 (32 core)	1536 GB	15360 GB	4xNVIDIA H200 SXM 141 GB, 4.8 TB/s, connected via NVLink

Interconnect Specifications

To allow for high data transfer rates and low latencies, all nodes and servers of Elysium are connected via a Cornelis Omni-Path network. The network topology is a 1:2 blocking fat-tree. Each node is equipped with a single-port Cornelis Omni-Path Express 100 Gb/s adapter, except for the Fat-GPU nodes, which have four of these adapters. The ping-pong latency for node-to-node communication with minimal hops is approximately 1.1 μs.
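The figures above can be turned into a rough estimate of how long a single message takes between two nodes. The following is a minimal sketch, not a benchmark: it assumes the simple model "latency plus size divided by bandwidth", treats the quoted ping-pong latency as a one-way latency, and assumes the full line rate is reachable. The function name transfer_time_s and its parameters are hypothetical and chosen for illustration only.

```python
# Idealized point-to-point transfer-time estimate (latency + size / bandwidth).
LATENCY_S = 1.1e-6       # latency quoted above, treated as one-way (assumption)
LINK_GBIT_PER_S = 100    # one Omni-Path Express adapter, in Gb/s


def transfer_time_s(message_bytes: float, adapters: int = 1) -> float:
    """Estimate the time to move one message between two nodes.

    Assumes the full line rate is reachable and that several adapters
    aggregate perfectly, which real applications rarely achieve.
    """
    bytes_per_s = adapters * LINK_GBIT_PER_S * 1e9 / 8  # Gb/s -> bytes/s
    return LATENCY_S + message_bytes / bytes_per_s


if __name__ == "__main__":
    # Small messages are latency-bound, large ones bandwidth-bound.
    for size in (8, 1_000_000, 1_000_000_000):
        print(f"{size:>13} B: {transfer_time_s(size) * 1e6:12.2f} us")
```

Protocol overhead and contention from the 1:2 blocking factor are ignored here, so measured times will be higher than these estimates.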

File Systems

The following file systems are available:

/home: For your software and scripts. High availability, but no backup. Quota: 100 GB per user.
/lustre: Parallel file system to use for your jobs. High availability, but no backup. Not for archival storage. Quotas: 4.5 TB and 1,900,000 files per user.
/tmp: Fast storage on each node for temporary data. Limited in space, except on Fat-GPU nodes, where multiple TB are available. Data is removed when the job ends. For shared jobs the quota scales with the number of reserved cores.
/think_fast: (THINK-members only) For your software and scripts. High availability, but no backup.
/think_big: (THINK-members only) Parallel file system to use for your jobs. High availability, but no backup. Not for long-term storage.
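To get a feeling for the /tmp quota of a shared job, the sketch below assumes that the quota is a plain proportional share of the node-local NVMe storage, one equal slice per core. The exact rule used on Elysium is not stated here, so treat the linear split and the helper name tmp_quota_gb as assumptions.

```python
def tmp_quota_gb(local_nvme_gb: float, cores_per_node: int, reserved_cores: int) -> float:
    """Share of node-local /tmp under an ASSUMED linear per-core split."""
    if not 0 < reserved_cores <= cores_per_node:
        raise ValueError("reserved_cores must be between 1 and cores_per_node")
    return local_nvme_gb * reserved_cores / cores_per_node


# Thin-CPU node from the table: 810 GB usable NVMe, 2 x 24 = 48 cores.
quarter_node = tmp_quota_gb(810, 48, 12)
print(f"{quarter_node:.1f} GB")  # prints "202.5 GB"
```

Under this assumption, a job reserving a quarter of a Thin-CPU node would see roughly a quarter of the local storage.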

Partition Overview

Two partitions are available for each type of compute node: the filler partitions are designed for short jobs, while the standard partitions support longer-running tasks.

Jobs in a filler partition have a lower priority and will only start if no job from the corresponding standard partition requests the resources. Running a job in a filler partition costs only a fraction of the fair share charged in the standard partition.
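The difference in charged share can be illustrated with a small calculation. This sketch assumes that the charge is simply the share-cost factor multiplied by the number of reserved cores (or GPUs) and the wall-clock hours. The factors are copied from the partition table, while the function charged_share and the resulting unit are hypothetical.

```python
# Share-cost factors copied from the partition table (per core or per GPU).
SHARE_COST = {
    "cpu": 1.000,
    "cpu_filler": 0.050,
    "gpu": 49.374,
    "gpu_filler": 12.344,
}


def charged_share(partition: str, units: int, hours: float) -> float:
    """ASSUMED charge: cost factor x reserved cores (or GPUs) x wall-clock hours."""
    return SHARE_COST[partition] * units * hours


if __name__ == "__main__":
    # A full Thin-CPU node (48 cores) for 3 hours in both partitions.
    for name in ("cpu", "cpu_filler"):
        print(f"{name:<12} {charged_share(name, units=48, hours=3):8.2f}")
```

With these factors, the same 3-hour job is charged one twentieth as much in cpu_filler as in cpu, at the price of a lower start priority.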

The vis and think_vis partitions are special since the visualization nodes are intended for interactive use only.

Partition	Time limit	Node list	Max Tasks per Node	Max Memory per CPU³	Share-Cost²
cpu	2-00:00:00¹	cpu[001-284]	48	8 GB	1.000 / core
cpu_filler	3:00:00	cpu[001-336]	48	8 GB	0.050 / core
fat_cpu	2-00:00:00	fatcpu[001-013]	96	24 GB	1.347 / core
fat_cpu_filler	3:00:00	fatcpu[001-013]	96	24 GB	0.067 / core
gpu	2-00:00:00	gpu[001-020]	48	8 GB	49.374 / GPU
gpu_filler	3:00:00	gpu[001-020]	48	8 GB	12.344 / GPU
fat_gpu	2-00:00:00	fatgpu[001-007]	96	12 GB	196.867 / GPU
fat_gpu_filler	3:00:00	fatgpu[001-007]	96	12 GB	49.217 / GPU
fat_gpu_h200	2-00:00:00	h200gpu001	128	16 GB	225.906 / GPU
fat_gpu_h200_filler	3:00:00	h200gpu001	128	16 GB	56.476 / GPU
vis	1-00:00:00	vis[001-003]	48	24 GB	5.000 / core
think_cpu⁴	2-00:00:00¹	tcpu[001-047]	192	8 GB	1.000 / core
think_cpu_filler	3:00:00	tcpu[001-047]	192	8 GB	0.050 / core
think_gpu⁴	2-00:00:00	tl40sgpu[001-010]	64	12 GB	99.675 / GPU
think_gpu_filler	3:00:00	tl40sgpu[001-010]	64	12 GB	4.984 / GPU
think_fat_gpu⁴	2-00:00:00	th200gpu001	64	24 GB	388.580 / GPU
think_fat_gpu_filler	3:00:00	th200gpu001	64	24 GB	19.429 / GPU
think_vis⁴	1-00:00:00	tvis001	128	6 GB	1.595 / core

¹ Run times of up to 7 days are possible on this partition but not recommended. Only 2 days are guaranteed; jobs running longer than that may be cancelled if important maintenance work requires it.

² Cost does not refer to money but to the factor by which computing time is added to a project's used share when job priorities are computed. The costs are based on the relative monetary costs of the underlying hardware.

³ Some of the memory is reserved for system services. Please check the output of the scontrol show partition <partition_name> command to find the amount of memory that is available for your job and that you can request via the --mem-per-cpu=<mem> submission flag.

⁴ THINK-members only
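As an illustration of footnote ³, the following sketch extracts the per-CPU memory limit from text in the style of scontrol show partition output and formats a matching --mem-per-cpu flag. The sample text and its numbers are made up and do not reflect Elysium's real limits. The sketch also assumes that the limit appears as a MaxMemPerCPU field given in megabytes, and the helper max_mem_per_cpu_mb is hypothetical.

```python
import re

# Made-up sample in the style of `scontrol show partition cpu` output;
# the numbers are placeholders, not Elysium's real limits.
SAMPLE_OUTPUT = """\
PartitionName=cpu
   MaxTime=2-00:00:00 State=UP
   DefMemPerCPU=1000 MaxMemPerCPU=7900
"""


def max_mem_per_cpu_mb(scontrol_text: str):
    """Return MaxMemPerCPU in MB, or None if it is absent or not numeric."""
    fields = dict(re.findall(r"(\w+)=(\S+)", scontrol_text))
    value = fields.get("MaxMemPerCPU", "")
    return int(value) if value.isdigit() else None


limit = max_mem_per_cpu_mb(SAMPLE_OUTPUT)
if limit is not None:
    print(f"--mem-per-cpu={limit}M")
```

On a login node, the real text could be obtained by running the scontrol command and passing its output to the same function instead of the sample string.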