System Description

(Webpage updated as of 4/27/2026. Note that the Faculty Condo Queue Compute Node (compute202515) Overview and the information on Obtaining System Configuration Information on Other Compute Nodes appear further down this page.)

The institution operates a centrally managed high-performance computing (HPC) system providing shared CPU-only, high-memory, and GPU-accelerated compute resources for research applications. The system is heterogeneous and organized into node classes with distinct compute, memory, and accelerator capabilities to support a broad range of workloads, including high-throughput computing, large-scale MPI simulations, memory-intensive applications, and GPU-accelerated scientific computing.

All system resources are integrated into a common workload scheduler, monitoring framework, and security environment. System configuration and capacity are reported using scheduler-managed inventories, which serve as the authoritative source for resource availability and allocation enforcement.

The system is fully operational and available for immediate use.


Compute Resources

The HPC system comprises 45 compute nodes, delivering a total of 4704 scheduler-allocatable CPU cores. CPU configurations span Intel Cascade Lake and AMD EPYC architectures, providing flexibility for both legacy and modern compute workloads.

CPU core counts per node range from 36 to 288 cores, depending on node class, with multiple configurations supporting high parallelism and large shared-memory applications.


Memory Resources

System memory capacity varies by node class and is reported as scheduler-defined allocatable memory, reflecting the usable memory available to jobs after accounting for operating system overhead and system reservations.

Memory per node ranges from approximately 190 GB on general-purpose CPU nodes to approximately 2.3 TB on large GPU-accelerated systems. Dedicated high-memory nodes provide approximately 1.5 TB of RAM for applications requiring large shared-memory footprints.


GPU Resources

The system includes 36 GPUs across multiple NVIDIA architectures, supporting a wide range of accelerator-based workloads. GPUs are exposed through scheduler-managed generic resources (GRES) and are allocated in accordance with system policy.

Supported GPU architectures include:

  • NVIDIA Tesla V100
  • NVIDIA A30
  • NVIDIA RTX Pro 6000
  • NVIDIA H200 NVL

GPU-accelerated nodes provide between 2 and 4 GPUs per node, with large system memory and high-core-count CPUs for balanced CPU–GPU workloads.
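As an illustration of how such GRES-based GPU requests typically look, the sketch below assumes a Slurm scheduler (GRES is Slurm terminology); the partition name, GPU type string, and application name are site-specific placeholders, not confirmed values for this system:

    #!/bin/bash
    #SBATCH --partition=gpu        # placeholder partition name
    #SBATCH --gres=gpu:1           # request one GPU (any type)
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=64G

    # A specific GPU type can be requested if types are defined in the
    # GRES configuration, e.g. --gres=gpu:a30:1 (type name is a placeholder).
    srun ./my_gpu_app              # placeholder application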


Storage Resources

Node-Local Storage

Selected compute nodes provide node-local NVMe storage (~5 TB per node) intended for:

  • Temporary job data
  • Scratch space
  • Data staging during active computation

Local system disks are used for operating system and runtime needs. Node‑local storage is not intended for persistent data storage.
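A minimal staging sketch, assuming a Slurm job and a node-local NVMe mount point (the mount point, shared-storage path, and application name below are placeholders):

    # per-job scratch directory on node-local NVMe
    SCRATCH=/mnt/compute_data/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH"

    # stage input from shared storage, compute locally, copy results back
    cp /shared/project/input.dat "$SCRATCH/"
    cd "$SCRATCH"
    ./my_app input.dat
    cp results.out /shared/project/

    # clean up node-local scratch before the job ends
    rm -rf "$SCRATCH"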

Persistent Storage

Persistent research data are provided through shared, facility‑managed storage systems, accessible from all compute nodes and supported by institutional monitoring and backup policies.


Networking

Compute nodes are connected via high-speed Ethernet, with RoCE support on GPU-accelerated and large-memory nodes. The network configuration supports efficient MPI communication, GPU-aware workloads, and data-intensive applications.


Resource Management

All compute resources are managed through a centralized workload scheduler. Scheduler-reported CPU cores, memory, and GPU resources reflect allocatable system capacity, ensuring consistency with enforced allocation limits and operational availability.

System configuration data are derived from scheduler-managed node inventories and system topology tools, ensuring that documented resources accurately represent what users can request and utilize.
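Assuming the workload scheduler is Slurm, the scheduler-reported inventory can be inspected directly from a login node; for example (the node name is a placeholder):

    sinfo -N -o "%N %c %m %G"      # node, allocatable CPUs, memory (MB), GRES
    scontrol show node <nodename>  # full per-node configuration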


System Overview Table

HPC System Overview

Resource Category     System Summary
Total Compute Nodes   45
CPU Architectures     Intel Cascade Lake, AMD EPYC
Total CPU Cores       4704 allocatable cores (scheduler-reported)
Cores per Node        36, 48, 72, 144, 192, and 288
Memory per Node       ~190 GB to ~2.3 TB
High-Memory Nodes     Dedicated nodes with ~1.5 TB RAM
Total GPUs            36 GPUs
GPU Architectures     NVIDIA V100, A30, RTX Pro 6000, H200 NVL
GPUs per Node         2–4
Node-Local Storage    ~5 TB NVMe on selected nodes (temporary use)
Persistent Storage    Shared, facility-managed storage systems
Networking            High-speed Ethernet with RoCE support
Resource Management   Centralized workload scheduler
System Status         Fully operational; immediately available

Node Classes, Counts, and Resources

Node Class                         Node Count   CPU Cores per Node   Memory per Node   GPU Configuration    GPUs per Class
General-Purpose CPU Nodes          20           36                   ~190 GB           None                 0
High-Memory CPU Node               1            36                   ~1.5 TB           None                 0
Mid-Scale CPU Nodes                8            192                  ~1.5 TB           None                 0
Large CPU Nodes                    4            288                  ~385 GB           None                 0
GPU Nodes – NVIDIA V100            2            36                   ~384 GB           2 × V100 (16 GB)     4
GPU Nodes – NVIDIA A30             2            48/72                ~514–772 GB       2–4 × A30            6
GPU Nodes – RTX Pro 6000 (2 GPU)   3            144                  ~1.5 TB           2 × RTX Pro 6000     6
GPU Nodes – RTX Pro 6000 (4 GPU)   3            144                  ~1.5 TB           4 × RTX Pro 6000     12
GPU Nodes – NVIDIA H200 NVL        2            192                  ~2.3 TB           4 × H200 NVL         8

Summary

The HPC system provides a diverse and well-balanced mix of general-purpose CPU nodes, high-memory systems, and GPU-accelerated compute resources. With 4704 allocatable CPU cores, 36 GPUs, large per-node memory capacity, and fast local and shared storage, the system offers both capacity and capability computing to support a broad spectrum of computational research needs.

Return to the Top of the Page


Faculty Condo Queue Compute Node (compute202515)

The following information applies specifically to the Faculty Condo Queue Compute Node (compute202515) and was last updated on 4/27/2026.

Overview

This document describes a dedicated high-performance compute (HPC) node provisioned within the institutional faculty condominium (condo) queue. The system is allocated for the exclusive use of a single faculty member and their research group, while being physically and operationally integrated into the shared cluster environment. The node has been designed to support GPU-accelerated scientific computing, large-scale Monte Carlo workloads, and hybrid MPI/OpenMP/CUDA applications.

This resource is immediately available and does not require additional acquisition or development.

The configuration reflects a balance between high core-count CPUs, multiple large-memory GPUs, and NUMA-aware system architecture, enabling both tightly coupled multi-GPU jobs and concurrent independent GPU workloads.

System Configuration

The compute node compute202515 is a dual-socket AMD EPYC system equipped with two AMD EPYC 9565 processors (Zen 5 "Turin"), providing a total of 144 physical CPU cores (72 cores per socket, one thread per core) across two NUMA domains. The node includes four NVIDIA RTX Pro 6000 GPUs based on the Blackwell architecture, each with approximately 96 GB of ECC-protected device memory, for an aggregate GPU memory capacity of approximately 384 GB. GPUs are connected via PCIe and are evenly distributed across the two NUMA domains, with two GPUs local to each CPU socket. The software environment includes NVIDIA driver version 575.51.03 with CUDA 12.9, supporting modern vector and mixed-precision execution on both CPUs and GPUs.

Storage

The node provides approximately 5 TB of node-local NVMe storage, mounted at /mnt/compute_data, which is intended for temporary job data, scratch usage, and data staging during active computations. Additional local storage is used for operating system and system software installation. Persistent user data are not stored on the node and are instead provided through shared, facility-managed storage systems accessible from all compute nodes.

System Architecture

Compute Node Type

  • Node classification: Faculty-owned condo node
  • Queue: Faculty condominium queue (restricted access)
  • Usage model: Exclusive to owning faculty member and approved group users

The node is fully managed by central IT / HPC operations, including operating system provisioning, scheduler integration, monitoring, and security patching, while computational capacity is reserved for the faculty owner.

Central Processing Units (CPUs)
  • Processor model: AMD EPYC 9565 (Zen 5 / Turin)
  • Sockets: 2
  • Cores per socket: 72
  • Total physical cores: 144
  • Threads per core: 1 (SMT disabled)
  • NUMA domains: 2 (one per socket)

The absence of simultaneous multithreading improves performance predictability and avoids contention between hardware threads, which is advantageous for MPI and hybrid CPU/GPU workloads.

CPU Features

  • AVX2 and AVX‑512 instruction sets
  • AVX‑512 BF16 and VNNI support
  • Large on‑chip cache hierarchy (768 MB aggregate L3)

These features support efficient vectorized computation, mixed‑precision workflows, and memory‑intensive scientific applications.
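As a hedged example, assuming a GCC or compatible compiler is available on the node, building with the host-native target lets the compiler use these instruction sets automatically (the source and output file names are placeholders):

    gcc -O3 -march=native -fopenmp -o my_app my_app.c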

Graphics Processing Units (GPUs)
  • GPU model: NVIDIA RTX Pro 6000 (Blackwell architecture)
  • Number of GPUs: 4 per node
  • Memory per GPU: ~96 GB ECC VRAM
  • Aggregate GPU memory: ~384 GB
  • Maximum power per GPU: 600 W
  • MIG: Disabled

Each GPU is available as a full device to user jobs, enabling large on‑device problem sizes without memory partitioning.

GPU Software Environment

  • NVIDIA driver version: 575.51.03
  • CUDA version: 12.9

The software stack supports CUDA, CUDA‑aware MPI, and modern GPU programming models such as Kokkos (CUDA backend).

NUMA and PCIe Topology

The node exhibits a clean and symmetric NUMA layout:

  • NUMA node 0: GPU0 and GPU1
  • NUMA node 1: GPU2 and GPU3

All GPU–GPU connectivity occurs via PCIe within a NUMA node. No NVLink interconnect is present. This topology favors explicit GPU ownership per MPI rank and NUMA‑local process placement.

High‑speed network interfaces (Mellanox NICs) are similarly distributed, with each socket hosting network devices in close PCIe proximity, supporting efficient MPI communication and RDMA operations.

Recommended Usage Model

The node is optimized for the following execution model:

  • Four MPI ranks per node, each bound to a single GPU
  • NUMA‑aware pinning, ensuring each rank uses CPU cores local to its GPU
  • Optional OpenMP threading within each rank (typically 8–16 threads)

This configuration minimizes cross‑socket memory traffic and maximizes sustained GPU utilization.
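A minimal batch-script sketch of this layout, assuming a Slurm scheduler and an MPI application launched through srun (the walltime, thread count, and application name are placeholders):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4     # one MPI rank per GPU
    #SBATCH --cpus-per-task=16      # OpenMP threads available to each rank
    #SBATCH --gres=gpu:4
    #SBATCH --time=04:00:00

    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

    # Bind ranks to cores and to the nearest GPU so that each rank stays
    # NUMA-local to the GPU it drives.
    srun --cpu-bind=cores --gpu-bind=closest ./my_gpu_app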

Intended Workloads

This condo node is well suited for:

  • GPU‑accelerated Monte Carlo simulations
  • Parameter scans and ensemble workflows
  • Hybrid MPI + CUDA or MPI + Kokkos applications
  • Large‑memory GPU workloads requiring tens of gigabytes per device

The hardware supports both tightly coupled multi‑GPU jobs and simultaneous independent GPU workloads, depending on user preference and scheduler policy.

Role Within the Facility

As a faculty condo queue resource, this node:

  • Expands the institution’s aggregate GPU capacity
  • Provides guaranteed access for faculty‑owned research programs
  • Operates under the same scheduler, monitoring, and security framework as general‑access cluster resources

This model enables customized, high‑end computational capability for individual research programs while maintaining operational consistency across the HPC facility.

Summary

This faculty condo compute node represents a high-end, GPU-dense resource tailored for modern scientific computing. Its combination of 144 Zen 5 CPU cores, four large-memory Blackwell GPUs, and a clean NUMA-aware architecture makes it an effective and flexible platform for advanced research workloads within the shared facility environment.

Return to the Top of the Page

Obtaining System Configuration Information on Other Compute Nodes

Users can obtain similar system configuration information for other compute nodes directly from a login or compute node without administrative privileges. The following commands are commonly available on HPC systems and provide the key details needed for facilities documentation, performance tuning, or job configuration.

CPU and NUMA Configuration

To view CPU model, core counts, sockets, NUMA layout, and supported instruction sets:

    lscpu

This command reports processor model, number of sockets, cores per socket, NUMA node layout, cache sizes, and supported instruction extensions.
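For example, the fields most relevant to job configuration can be filtered from the output (exact field names may vary slightly between lscpu versions):

    lscpu | grep -E 'Model name|Socket|Core|Thread|NUMA'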

GPU Inventory and Driver/CUDA Versions

To list available GPUs and basic status information:

    nvidia-smi

This reports GPU models, memory capacity, driver version, CUDA version, and current utilization.
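A machine-readable summary of the same information can be obtained with the query interface, for example:

    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv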

To examine GPU-to-GPU and GPU-to-CPU topology (NUMA locality and PCIe connectivity):

    nvidia-smi topo -m

Local and Shared Storage

To display mounted filesystems and their sizes:

    df -h

Local node storage typically appears as devices mounted on directories such as /, /scratch, /local, or /mnt, while shared facility storage is usually mounted via network filesystems (e.g., NFS, Lustre, BeeGFS).
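Adding the filesystem type column makes it easier to distinguish local disks from network mounts, for example:

    df -hT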

To identify the underlying block devices and local disks:

    lsblk

Memory-Backed Temporary Storage

To check the size of shared memory (RAM-backed temporary storage):

    df -h /dev/shm

This space is useful for I/O-intensive workloads but is not persistent.
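As a hedged example, many applications honor the TMPDIR environment variable, so temporary files can be redirected to this RAM-backed space for the duration of a job (the directory layout shown is a placeholder):

    export TMPDIR=/dev/shm/$USER
    mkdir -p "$TMPDIR"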

Recommended Practice

For facilities documentation, users should avoid assuming hardware specifications across nodes and instead verify configuration directly on representative systems using the commands above. Reported values reflect the actual, installed hardware and software environment at the time of execution.

Return to the Top of the Page