Resources at RUB

HPC Cluster Elysium

Node Specifications

| Type           | Count | CPU                       | Memory  | Local Usable NVMe Storage | GPU |
|----------------|-------|---------------------------|---------|---------------------------|-----|
| Thin-CPU       | 284   | 2xAMD EPYC 9254 (24 core) | 384 GB  | 810 GB                    | - |
| Fat-CPU        | 13    | 2xAMD EPYC 9454 (48 core) | 2304 GB | 1620 GB                   | - |
| Thin-GPU       | 20    | 2xAMD EPYC 9254 (24 core) | 384 GB  | 1620 GB                   | 3x NVIDIA A30 Tensor Core GPU, 24 GB, 933 GB/s |
| Fat-GPU        | 7     | 2xAMD EPYC 9454 (48 core) | 1152 GB | 14000 GB                  | 8x NVIDIA H100 SXM5 GPUs, 80 GB, 3.35 TB/s, connected via NVLink |
| Fat-H200-GPU   | 1     | 2xAMD EPYC 9554 (64 core) | 2304 GB | 7000 GB                   | 8x NVIDIA H200 SXM GPUs, 141 GB, 4.8 TB/s, connected via NVLink |
| Thin-CPU-THINK | 47    | 2xAMD EPYC 9645 (96 core) | 1536 GB | 480 GB                    | - |
| Thin-GPU-THINK | 10    | 1xAMD EPYC 9535 (64 core) | 768 GB  | 1920 GB                   | 4x NVIDIA L40S PCIe, 48 GB, 864 GB/s |
| Fat-GPU-THINK  | 1     | 2xAMD EPYC 9335 (32 core) | 1536 GB | 15360 GB                  | 4x NVIDIA H200 SXM, 141 GB, 4.8 TB/s, connected via NVLink |

Interconnect Specifications

To allow for high data transfer rates and low latencies, all nodes and servers of Elysium are connected via a Cornelis Omni-Path network. The network topology is a 1:2 blocking fat-tree. Each node is equipped with a single-port Cornelis Omni-Path Express 100 Gb/s adapter, except for the Fat-GPU nodes, which have four of these adapters. The ping-pong latency for node-to-node communication with minimal hops is approximately 1.1 μs.
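As a rough illustration of what these figures mean for message passing, the following sketch applies the common latency-plus-bandwidth model, t = latency + size / bandwidth, to the 1.1 μs latency and 100 Gb/s link rate quoted above. The function name `transfer_time` and its `links` parameter are hypothetical, and the model assumes that a message reaches the full nominal link rate and that the quoted latency acts as a fixed per-message startup cost. Real throughput is lower because of protocol overhead, and under heavy load the 1:2 blocking factor can reduce the bandwidth available between switches.

```python
LATENCY_S = 1.1e-6          # ping-pong latency quoted above (minimal hops)
LINK_BITS_PER_S = 100e9     # nominal rate of one Omni-Path adapter


def transfer_time(message_bytes, links=1):
    """Estimate the transfer time in seconds with an idealized model.

    Assumption: bandwidth scales linearly with the number of adapters used
    (links=4 would correspond to a Fat-GPU node using all four adapters).
    """
    bandwidth_bytes_per_s = links * LINK_BITS_PER_S / 8
    return LATENCY_S + message_bytes / bandwidth_bytes_per_s


# Small messages are dominated by latency, large ones by bandwidth.
for size in (8, 1024**2, 1024**3):
    print(f"{size:>12} bytes: {transfer_time(size) * 1e6:12.2f} us")
```

The estimate is only meant to show which regime a given message size falls into; measured values on the cluster should be preferred whenever they are available.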

File Systems

The following file systems are available:

  • /home: For your software and scripts. High availability, but no backup. Quota: 100 GB per user.
  • /lustre: Parallel file system to use for your jobs. High availability, but no backup. Not for long-term storage. Quotas: 4.5 TB and 1,900,000 files per user.
  • /tmp: Fast storage on each node for temporary data. Limited in space, except for Fat-GPU nodes where multiple TB are available. Data is removed when the job ends. For shared jobs the quota scales with the number of reserved cores.
  • /think_fast: (THINK-members only) For your software and scripts. High availability, but no backup.
  • /think_big: (THINK-members only) Parallel file system to use for your jobs. High availability, but no backup. Not for long-term storage.
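Because /lustre limits both space and the number of files, it can help to check how close a directory tree is to either limit. The sketch below counts files and bytes under a directory and relates them to the /lustre quotas listed above. The helper names `usage` and `report` are hypothetical, and 1 TB is taken as 10**12 bytes, which is an assumption. The result is only an approximation: the file system's own accounting is authoritative and may count directories or allocated blocks differently. Walking a very large tree also creates metadata load, so it should be run sparingly.

```python
import os

# Quotas quoted above for /lustre; 1 TB is taken as 10**12 bytes (assumption).
LUSTRE_BYTE_QUOTA = 4.5e12
LUSTRE_FILE_QUOTA = 1_900_000


def usage(root):
    """Return (file_count, total_bytes) for all non-directory entries below root."""
    count, size = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size += os.lstat(path).st_size  # lstat: do not follow symlinks
                count += 1
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return count, size


def report(root):
    """Format the usage of root relative to the /lustre quotas."""
    count, size = usage(root)
    return (
        f"{root}: {count} files ({count / LUSTRE_FILE_QUOTA:.1%} of file quota), "
        f"{size / 1e12:.3f} TB ({size / LUSTRE_BYTE_QUOTA:.1%} of space quota)"
    )
```

Calling `report(path)` on one of your own directories returns a one-line summary that can be printed or logged at the end of a job.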

Partition Overview

Two partitions are available for each type of compute node: the regular partitions support longer-running jobs, while the filler partitions are designed for short jobs.

Jobs in a filler partition have a lower priority and will only start if no job from the corresponding regular partition requests the resources. In return, a job in a filler partition is charged only a fraction of the share cost of the regular partition.
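To make the cost difference concrete, the following sketch compares the share charged for the same job in a regular and a filler partition, using Share-Cost values from the partition table below. The function `charged_share` is hypothetical, and it assumes that the factor is applied linearly to the number of allocated cores (or GPUs) and to the wall-clock hours. The scheduler's actual accounting may differ in detail.

```python
# Share-Cost factors copied from the partition table (per core or per GPU).
SHARE_COST = {
    "cpu": 1.000,
    "cpu_filler": 0.050,
    "gpu": 49.374,
    "gpu_filler": 12.344,
}


def charged_share(partition, units, hours):
    """Share added to a project: factor x allocated cores (or GPUs) x hours.

    Assumption: linear in allocated resources and wall-clock time.
    """
    return SHARE_COST[partition] * units * hours


# Same job (one full Thin-CPU node, 3 hours) in both partitions.
regular = charged_share("cpu", units=48, hours=3)
filler = charged_share("cpu_filler", units=48, hours=3)
print(f"cpu: {regular:.1f}, cpu_filler: {filler:.1f}, ratio: {regular / filler:.0f}x")
```

Under this assumption, a short job that fits into the filler time limit consumes far less of a project's share, at the price of waiting until no regular job requests the same resources.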

The vis and think_vis partitions are special since the visualization nodes are intended for interactive use only.

| Partition            | Time limit  | Node list         | Max Tasks per Node | Max Memory per CPU³ | Share-Cost²   |
|----------------------|-------------|-------------------|--------------------|---------------------|---------------|
| cpu                  | 2-00:00:00¹ | cpu[001-284]      | 48                 | 8 GB                | 1.000 / core  |
| cpu_filler           | 3:00:00     | cpu[001-336]      | 48                 | 8 GB                | 0.050 / core  |
| fat_cpu              | 2-00:00:00  | fatcpu[001-013]   | 96                 | 24 GB               | 1.347 / core  |
| fat_cpu_filler       | 3:00:00     | fatcpu[001-013]   | 96                 | 24 GB               | 0.067 / core  |
| gpu                  | 2-00:00:00  | gpu[001-020]      | 48                 | 8 GB                | 49.374 / GPU  |
| gpu_filler           | 1:00:00     | gpu[001-020]      | 48                 | 8 GB                | 12.344 / GPU  |
| fat_gpu              | 2-00:00:00  | fatgpu[001-007]   | 96                 | 12 GB               | 196.867 / GPU |
| fat_gpu_filler       | 1:00:00     | fatgpu[001-007]   | 96                 | 12 GB               | 49.217 / GPU  |
| fat_gpu_h200         | 2-00:00:00  | h200gpu001        | 128                | 16 GB               | 225.906 / GPU |
| fat_gpu_h200_filler  | 1:00:00     | h200gpu001        | 128                | 16 GB               | 56.476 / GPU  |
| vis                  | 1-00:00:00  | vis[001-003]      | 48                 | 24 GB               | 5.000 / core  |
| think_cpu⁴           | 2-00:00:00¹ | tcpu[001-047]     | 192                | 8 GB                | 0.050 / core  |
| think_cpu_filler     | 3:00:00     | tcpu[001-047]     | 192                | 8 GB                | 0.050 / core  |
| think_gpu⁴           | 2-00:00:00  | tl40sgpu[001-010] | 64                 | 12 GB               | 4.984 / GPU   |
| think_gpu_filler     | 1:00:00     | tl40sgpu[001-010] | 64                 | 12 GB               | 4.984 / GPU   |
| think_fat_gpu⁴       | 2-00:00:00  | th200gpu001       | 64                 | 24 GB               | 19.429 / GPU  |
| think_fat_gpu_filler | 1:00:00     | th200gpu001       | 64                 | 24 GB               | 19.429 / GPU  |
| think_vis⁴           | 1-00:00:00  | tvis001           | 128                | 6 GB                | 0.080 / core  |

¹ Times of up to 7 days are possible on this partition but not recommended. Only 2 days are guaranteed; jobs running longer than that may be cancelled if this becomes necessary for important maintenance work.

² Cost does not refer to money, but to the factor by which computing time is weighted when it is added to a project's used share in order to compute job priorities. The costs are based on the relative monetary costs of the underlying hardware.

³ Some of the memory is reserved for system services. Run scontrol show partition <partition_name> to see how much memory is actually available to your job, and request it with the --mem-per-cpu=<mem> submission flag.

⁴ THINK-members only
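As a sketch of the check described in footnote ³, the following code parses the KEY=VALUE tokens that scontrol show partition prints and extracts the per-CPU memory fields. The helper names `parse_partition` and `show_partition` are hypothetical, and the sample text is made up for illustration; it assumes the partition reports DefMemPerCPU and MaxMemPerCPU fields, which depends on how the partition is configured. Real values on Elysium will differ.

```python
import subprocess


def parse_partition(text):
    """Turn the KEY=VALUE tokens of `scontrol show partition` output into a dict."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an "=" sign
            fields[key] = value
    return fields


def show_partition(name):
    """Run scontrol (requires a node with Slurm installed) and parse its output."""
    result = subprocess.run(
        ["scontrol", "show", "partition", name],
        capture_output=True, text=True, check=True,
    )
    return parse_partition(result.stdout)


# Made-up sample output for illustration only; not taken from Elysium.
SAMPLE = """PartitionName=example
   MaxTime=2-00:00:00 State=UP
   DefMemPerCPU=1000 MaxMemPerCPU=7500
"""

fields = parse_partition(SAMPLE)
print(fields["PartitionName"], fields.get("MaxMemPerCPU"), "MB per CPU")
```

On a login node, `show_partition("cpu")` would return the same kind of dictionary for a real partition. Slurm reports these memory values in megabytes, so a value read this way can be passed on as `--mem-per-cpu=<mem>` without conversion.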