HPC Training 

 

Explore video sessions from leading supercomputing centers! Designed to build your expertise and familiarity with HPC, these free training sessions offer practical insights into using HPC resources and keep your knowledge up to date.

Parallel Computing Concepts

Parallel computing in High-Performance Computing (HPC) involves breaking down large computational tasks into smaller parts that can be processed simultaneously across multiple processors or computing nodes. This approach accelerates problem-solving by dividing the workload, enabling the system to handle vast amounts of data and complex simulations much faster than a single processor could.

Parallel computing is fundamental in HPC because it maximizes resource efficiency, reduces computational time, and allows for solving large-scale scientific, engineering, and data-intensive problems. Techniques include data parallelism, where large datasets are split across nodes, and task parallelism, where distinct tasks are executed concurrently. Commonly, parallel computing in HPC uses frameworks like MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) to coordinate tasks across processors.
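
As a concrete illustration of data parallelism, the hedged sketch below uses mpi4py (a widely used Python binding for MPI) to split a summation across MPI ranks. The array size and the use of NumPy are illustrative assumptions for this example, not part of the training sessions themselves.

```python
# Minimal data-parallelism sketch with mpi4py.
# Run with, for example: mpirun -n 4 python sum_parallel.py
# Assumes mpi4py and numpy are installed; the workload (summing an array) is illustrative.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the communicator
size = comm.Get_size()   # total number of processes

# Rank 0 creates the full dataset and splits it into one chunk per rank.
if rank == 0:
    data = np.arange(1_000_000, dtype=np.float64)
    chunks = np.array_split(data, size)
else:
    chunks = None

# Scatter the chunks, let each rank work on its own piece, then reduce the partial results.
local = comm.scatter(chunks, root=0)
local_sum = local.sum()
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Parallel sum over {size} ranks: {total}")
```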

HPC Hardware Overview 

For anyone using advanced computing resources, it's helpful to understand a bit about the hardware so you can see what factors influence how well applications run. This session will cover some key points:

  • CPUs (Processors): Learn about what makes a CPU powerful, like the number of cores, hyperthreading (which lets one core run two threads at once), and instruction sets (the kinds of operations a CPU can execute).
  • Compute Node Anatomy: Each compute node is like a mini-computer in the cluster, with its own processors, memory, storage devices, and sometimes special accelerators like GPUs.
  • Cluster Structure: See how clusters are set up, including the different types of nodes (login and compute nodes), how they connect, and how they handle file storage.
  • Using Tools to Check Hardware: Finally, you’ll learn to use Linux tools to view hardware details, check system usage, and monitor performance, as illustrated in the sketch after this list.
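
For illustration, the hedged sketch below shells out to a few standard Linux utilities (lscpu, free, lsblk) and prints Python's own view of the CPU count; the exact output and available tools vary by distribution and cluster.

```python
# Quick look at a node's hardware from the command line, wrapped in Python for convenience.
# Assumes a Linux system with the standard lscpu, free, and lsblk utilities installed.
import os
import subprocess

print(f"Logical CPUs visible to this process: {os.cpu_count()}")

for cmd in (["lscpu"],          # CPU model, cores, threads per core, instruction-set flags
            ["free", "-h"],     # total and available memory
            ["lsblk"]):         # block storage devices attached to the node
    print(f"\n$ {' '.join(cmd)}")
    subprocess.run(cmd, check=False)
```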

Interactive Computing 

 

Interactive computing is an approach in which users provide inputs in real time and immediately see the results. In HPC, interactive computing lets users run jobs that respond immediately to inputs, such as exploring data, adjusting parameters, or visualizing results, directly on powerful HPC resources.
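
As one hedged example, on a Slurm-based cluster an interactive session is typically requested with srun; the sketch below wraps that call in Python, and the task count and time limit are placeholders you would adapt to your site's policies.

```python
# Request an interactive shell on a compute node via Slurm (sketch; site policies vary).
# The resource values below are illustrative placeholders, not recommendations.
import subprocess

subprocess.run([
    "srun",
    "--ntasks=1",        # a single task
    "--time=00:30:00",   # 30-minute interactive session
    "--pty",             # attach a pseudo-terminal so the session is interactive
    "bash",              # the shell (or any interactive program) to run on the node
])
```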

Code Migration 

This session covers how to transition computations to HPC resources, including:

  • Using pre-installed applications via Linux environment modules
  • Compiling code with compilers, libraries, and optimization flags
  • Setting up Python and R environments, including conda-based setups
  • Managing workflows and using containers with Singularity

Batch Computing: Getting Started with Batch Job Scheduling - Slurm Edition 

This video introduces the concept of a distributed batch job scheduler — what it is, why it exists, and how it works — using the Slurm Workload Manager as a reference tool and testbed. It explains how to write and submit a first job script to an HPC system using Slurm as the scheduler. The video also covers best practices for structuring batch job scripts, leveraging Slurm environment variables, and requesting resources from the scheduler to optimize task completion time.
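
To make the idea concrete, here is a hedged sketch of a first Slurm job script, written out and submitted from Python. The job name, resource requests, and the echoed SLURM_* variables are illustrative; real values depend on your cluster and workload.

```python
# Write a minimal Slurm batch script and submit it with sbatch (sketch; values are placeholders).
import subprocess
from pathlib import Path

job_script = """#!/bin/bash
#SBATCH --job-name=first_job       # name shown in the queue
#SBATCH --ntasks=1                 # number of tasks to run
#SBATCH --time=00:10:00            # wall-clock limit (HH:MM:SS)
#SBATCH --output=first_job_%j.out  # %j expands to the job ID

# Slurm exports environment variables describing the allocation.
echo "Job ID:        $SLURM_JOB_ID"
echo "Tasks granted: $SLURM_NTASKS"

srun hostname   # run the actual work; here, just print the compute node's name
"""

Path("first_job.sh").write_text(job_script)
subprocess.run(["sbatch", "first_job.sh"], check=False)
```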

Data Storage and File Systems

This session introduces common data storage and file systems on HPC systems, detailing their hardware and software architecture, capabilities, and typical use cases. Correct data handling practices are essential, as improper use can degrade system performance for all users. The session also covers Linux command-line tools for monitoring and managing storage usage, shows how to adjust configurations for specific research needs, and provides guidance on data backups, security, and file permissions. Additional topics will be addressed as time allows.
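
As a small illustration of the monitoring side, the sketch below checks free space on a file system and inspects a file's permissions using Python's standard library; the paths and file name are placeholders, and cluster-specific quota tools are not shown.

```python
# Check file-system capacity and file permissions (sketch; paths are illustrative placeholders).
import shutil
import stat
from pathlib import Path

# Disk usage of the file system containing the current directory.
usage = shutil.disk_usage(".")
print(f"Total: {usage.total / 1e12:.2f} TB, free: {usage.free / 1e12:.2f} TB")

# Permissions of a file, shown in the familiar rwx form.
target = Path("results.dat")   # hypothetical file name
if target.exists():
    mode = target.stat().st_mode
    print(f"{target}: {stat.filemode(mode)}")
```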

High-Throughput and Many-Task Computing - Slurm Edition

This session introduces high-throughput computing (HTC) and many-task computing (MTC) on HPC systems, focusing on how to harness consistent, aggregate compute power over time. HTC workloads tackle large problems by completing many smaller, independent subtasks, an approach well suited to parameter sweeps and data-analysis tasks. The session covers using the Slurm Workload Manager to set up these workflows with job arrays and dependencies, discusses common challenges in HTC/MTC setups, and explains job bundling strategies and when to apply them. Additional HTC/MTC workflow topics will be addressed as time allows.
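
To illustrate the job-array mechanism, the hedged sketch below submits a ten-element array and a follow-up job that runs only after the whole array finishes successfully. The script names, array range, and resource values are assumptions made for this example.

```python
# Submit a Slurm job array plus a dependent follow-up job (sketch; names and sizes are placeholders).
import subprocess

array_script = """#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --array=0-9              # ten independent subtasks
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Each array element sees its own index and processes one piece of the sweep.
echo "Processing parameter set $SLURM_ARRAY_TASK_ID"
"""

with open("sweep.sh", "w") as fh:
    fh.write(array_script)

# Submit the array; --parsable makes sbatch print just the job ID.
result = subprocess.run(["sbatch", "--parsable", "sweep.sh"],
                        capture_output=True, text=True, check=True)
array_job_id = result.stdout.strip()

# Run an aggregation step only after every array element completes successfully.
subprocess.run(["sbatch", f"--dependency=afterok:{array_job_id}",
                "--wrap", "echo 'all sweep tasks finished'"], check=False)
```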