Cluster description

Hardware

Compute nodes

Node | Entity | Type | Threads | Memory | GPUs
cpu-node[32-39] | RPBS | Supermicro X8DTT-H | 12 (2x Intel Xeon CPU X5650 @ 2.67GHz) | 48GB | -
cpu-node[69-72] | EDC | Dell PowerEdge FC630 | 32 (2x Intel Xeon CPU E5-2630 v3 @ 2.40GHz) | 64GB | -
cpu-node79 | RPBS | Dell PowerEdge FC630 | 56 (2x Intel Xeon CPU E5-2660 v4 @ 2.00GHz) | 128GB | -
cpu-node80 | RPBS | Dell PowerEdge FC640 | 96 (2x Intel Xeon Gold 5220R @ 2.20GHz) | 192GB | -
cpu-node[81-84] | RPBS | Dell PowerEdge FC640 | 64 (2x Intel Xeon Gold 5218 @ 2.30GHz) | 256GB | -
cpu-node110 | RPBS | Dell PowerEdge R920 | 120 (4x Intel Xeon E7-4870 v2 @ 2.30GHz) | 512GB | -
cpu-node[130-145] | iPOP-UP | Dell PowerEdge C6525 | 128 (2x AMD EPYC 7452 @ 2.35GHz) | 256GB | -
cpu-node146 | RPBS | ProLiant DL560 Gen10 | 144 (4x Intel Xeon Gold 6254 CPU @ 3.10GHz) | 1536GB | -
gpu-node[1,3] | Master BI | ProLiant DL380 Gen10 Plus | 64 (2x Intel Xeon Silver 4314 CPU @ 2.40GHz) | 256GB | 2x Tesla A30 48GB
gpu-node2 | Master ISDD | ProLiant DL380 Gen10 Plus | 64 (2x Intel Xeon Silver 4314 CPU @ 2.40GHz) | 256GB | 2x Tesla A30 48GB
gpu-node4 | CMPLI | Asus Z10PE-D8 WS | 40 (2x Intel Xeon E5-2630 v4 @ 2.20GHz) | 32GB | 2x GeForce GTX1080Ti
gpu-node[5-6] | CMPLI | Asus Z10PE-D8 WS | 40 (2x Intel Xeon E5-2630 v4 @ 2.20GHz) | 32GB | 2x GeForce RTX2080Ti
gpu-node[7-8] | CMPLI | Asus Z10PE-D8 WS | 40 (2x Intel Xeon E5-2630 v4 @ 2.20GHz) | 32GB | 2x GeForce RTX2080Ti
gpu-node9 | CMPLI | Dell PowerEdge R7525 | 64 (2x AMD EPYC 7282 @ 2.80GHz) | 128GB | 3x Quadro RTX6000
gpu-node10 | Master BI | Dell PowerEdge R7525 | 32 (2x AMD EPYC 7262 @ 3.20GHz) | 64GB | 3x Quadro RTX6000
gpu-node11 | RPBS | Dell PowerEdge T640 | 32 (2x Intel Xeon Silver 4208 @ 2.10GHz) | 96GB | 1x GeForce RTX3090 / 1x RTX4000 Ada
gpu-node13 | RPBS | Dell PowerEdge T640 | 32 (2x Intel Xeon Silver 4208 @ 2.10GHz) | 96GB | 2x GeForce RTX3080Ti
gpu-node14 | CMPLI | Dell Precision 7920 Rack | 64 (2x Intel Xeon Gold 6226R @ 2.90GHz) | 192GB | 3x RTXA6000
gpu-node[15,17-18] | RPBS | ProLiant DL385 Gen10 Plus v2 | 64 (2x AMD EPYC 7313 @ 3.0GHz) | 512GB | 2x Tesla A100 80GB
gpu-node16 | CMPLI | ProLiant DL385 Gen10 Plus v2 | 64 (2x AMD EPYC 7313 @ 3.0GHz) | 512GB | 2x Tesla A100 80GB
gpu-node19 | RPBS | Dell Precision 7960 Rack | 64 (2x Intel Xeon Gold 6426Y @ 2.60GHz) | 64GB | 2x RTXA4500
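
The node names in this table use a bracketed range notation such as cpu-node[32-39] or gpu-node[15,17-18]. As an illustration only, the following Python sketch expands that notation into individual hostnames. The function name expand_nodes is hypothetical, and the sketch assumes a single bracket group with plain, non-zero-padded integers; on the cluster itself, Slurm's own `scontrol show hostnames` command is the reference tool for this.

```python
import re


def expand_nodes(expr: str) -> list[str]:
    """Expand a bracketed node expression such as 'gpu-node[15,17-18]'.

    Hypothetical helper: handles one bracket group with plain integers only.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if match is None:
        # No bracket group: the expression already names a single node.
        return [expr]
    prefix, body = match.groups()
    names = []
    for part in body.split(","):
        start, _, end = part.partition("-")
        # A part without a dash is a single index, so the range has one element.
        last = int(end) if end else int(start)
        for index in range(int(start), last + 1):
            names.append(f"{prefix}{index}")
    return names


if __name__ == "__main__":
    print(expand_nodes("gpu-node[15,17-18]"))
    # ['gpu-node15', 'gpu-node17', 'gpu-node18']
    print(len(expand_nodes("cpu-node[130-145]")))
    # 16
```

The same helper works for the switch names in the Network section, for example sw-data-mgmt-[1-2].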

Infrastructure

Node | Entity | Type | Role
master1 | iPOP-UP | Dell PowerEdge R440 | Slurm controller, Slurm database, gateway
virtserv4 | RPBS | Dell PowerEdge R730 | Virtualization
virtserv5 | RPBS | Dell PowerEdge R440 | Virtualization
virtserv6 | iPOP-UP | HP ProLiant DL380 Gen10 | Virtualization
virtserv7 | RPBS | HP ProLiant DL380 Gen11 | Virtualization
directory | CMPLI | HP ProLiant DL160 Gen10 | LDAP server, DHCP, DNS
bastion | CMPLI | HP ProLiant DL160 Gen10 | Bastion
get-away | RPBS | Dell PowerEdge R420 | Reverse proxy, mailer

Storage

Hot storage

Node | Entity | Type | Storage | Role
hot-storage-mds1 | iPOP-UP | Dell PowerEdge R740xd | 8x 1.92TB SSDs | Metadata / management server (Lustre)
hot-storage-oss[1-2] | iPOP-UP | Dell PowerEdge R740xd | 20x 3.84TB SSDs | Object storage server (Lustre)

Mountpoint | Capacity | Backup
/shared | 125 TB | No

Cold storage

Node | Entity | Type | Storage | Role
cold-storage1 | CMPLI | Dell PowerEdge R740xd | 12x 12TB HDD | NFS server
cold-storage1 (extension) | iPOP-UP | Dell PowerVault MD1400 | 12x 12TB HDD | NFS server (extension)
cold-storage2 | CMPLI | Dell PowerEdge R740xd | 12x 12TB HDD | Backup

Mountpoint | Capacity | Backup
/cold-storage | 214 TB | Yes

Old cluster (Mobyle jobs only)

Node | Partition | Type | CPUs | Memory
node73 | mobyle | Dell PowerEdge FC630 | 32 (2x Intel Xeon CPU E5-2630 v3 @ 2.40GHz) | 64GB
cpu-node[74-76] | mobyle | Dell PowerEdge FC630 | 40 (2x Intel Xeon CPU E5-2630 v4 @ 2.20GHz) | 64GB
cpu-node[77-78] | mobyle | Dell PowerEdge FC630 | 56 (2x Intel Xeon CPU E5-2660 v4 @ 2.00GHz) | 128GB

Node | Entity | Type | Role
slurmmaster | RPBS | Dell PowerEdge R720 | Slurm controller, Slurm database, gateway, Docker registry

Node | Entity | Type | Storage | Role
joule2 | RPBS | Dell PowerEdge R730xd | 10x 4TB HDD | NFS server
backup | RPBS | Dell PowerEdge R730xd | 20x 4TB HDD | Backup
lustre-mds | RPBS | Dell PowerEdge R720xd | 10x 300GB HDD | Metadata / management server (Lustre)
lustre-oss[1-2] | RPBS | Dell PowerEdge R720xd | 20x 1TB HDD | Object storage server (Lustre)

Network

Node | Entity | Type | Description
cmpli-rtr1 | iPOP-UP | Juniper SRX1500 | 16x 1GbE + 4x 10GbE ports
cmpli-rtr2 | CMPLI | Juniper SRX1500 | 16x 1GbE + 4x 10GbE ports
cmpli-sw | CMPLI | Juniper EX4600 | 24x SFP+ 10GbE ports
sw-data-mgmt-[1-2] | RPBS | Juniper EX3300 | 24x 1000BASE-T 1GbE ports
sw-dell-10g-[1,3] | RPBS | Dell N4032F | 24x SFP+ 10GbE ports
sw-dell-10g-2 | RPBS | Dell S4128F-ON | 28x SFP+ 10GbE ports
sw-dell-100g | RPBS | Dell S5232F-ON | 32x QSFP28 100GbE ports

Software layer

The cluster is managed by Slurm (version 21.08.8-2).

Scientific software and tools are available through Environment Modules and are mainly based on Conda packages or Singularity images.
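
To make the combination of Slurm and Environment Modules concrete, here is a minimal Python sketch that renders a batch script requesting resources and loading a module before running a command. The function render_job_script, the module name samtools/1.15 and the resource values are all hypothetical examples, not taken from this cluster's configuration. Partition and account options are left out because they are not listed here.

```python
def render_job_script(job_name, module, command, cpus=4, mem_gb=8, gpus=0):
    """Return the text of a Slurm batch script (illustrative sketch only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
    ]
    if gpus > 0:
        # Generic resource request; only meaningful on the gpu-node machines.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    # Load the tool through Environment Modules, then run the command.
    lines += ["", f"module load {module}", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module name and command, for illustration.
    script = render_job_script("demo", "samtools/1.15", "samtools --version")
    print(script, end="")
```

The resulting text can be saved to a file and submitted with `sbatch`. The modules actually installed can be listed with `module avail`.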

Operating systems: CentOS 7 and Rocky Linux 9

Virtualization of the services around cluster management: Proxmox VE

Deployment and configuration are powered by SaltStack, Terraform and GitLab.