A comprehensive guide to High-Performance Computing concepts, architectures, and best practices, with hands-on examples from the ETH HPC Lab.
High-Performance Computing (HPC) aggregates many computational resources to deliver far more computing power than a single machine can, which makes it possible to solve complex problems in reasonable time. Modern HPC systems, often called supercomputers, can perform quadrillions of calculations per second (the petaFLOP/s range).
# To access the Euler cluster, make sure you are on the ETH Zurich network (e.g. via VPN)
ssh <user_name>@euler.ethz.ch
This connects you to a login node, which handles basic cluster administration tasks. Compile and run only small test programs here; run large jobs on the compute nodes through the batch system.
ETH clusters provide multiple software stacks:
Switch to the new stack using:
env2lmod
# Serial job: 1 core, 1 hour wall-clock limit
bsub -W 01:00 -n 1 ./my_program
# OpenMP job: 8 threads, all 8 cores on the same node, 1 hour wall-clock limit
export OMP_NUM_THREADS=8
bsub -n 8 -R "span[ptile=8]" -W 01:00 ./omp_program
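The source of `omp_program` is not shown in this guide. As a minimal sketch of the kind of loop such a program might contain, the hypothetical function `parallel_sum` below uses an OpenMP reduction. The function name and the array-sum workload are assumptions made for illustration only. When compiled with OpenMP support (for example `gcc -fopenmp`), the number of threads is taken from `OMP_NUM_THREADS`; without it, the pragma is skipped and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n elements of x.
   With OpenMP enabled, loop iterations are divided among the threads and the
   per-thread partial sums are combined by the reduction clause.
   Without OpenMP, the guarded pragma is skipped and the loop runs serially. */
double parallel_sum(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
#ifdef _OPENMP
    #pragma omp parallel for reduction(+:sum)
#endif
    for (long i = 0; i < (long)n; i++) {
        sum += x[i];
    }
    return sum;
}
```

Requesting `-n 8` together with `span[ptile=8]` keeps all eight cores on one node, which matters because OpenMP threads share memory and cannot span nodes.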
# MPI job: 16 ranks, 4096 MB memory request, 2 hour wall-clock limit
bsub -n 16 -R "rusage[mem=4096]" -W 02:00 mpirun ./mpi_program
# List all jobs
bjobs [options] [JobID]
    -l    # Long format
    -w    # Wide format
    -p    # Show pending jobs
    -r    # Show running jobs
# Detailed resource usage
bbjobs [options] [JobID]
    -l    # Log format
    -f    # Show CPU affinity
# Connect to running job
bjob_connect <JOBID>
# Kill jobs
bkill <jobID> # Kill specific job
bkill 0 # Kill all your jobs
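Modern processors handle floating-point operations very efficiently, so the time spent moving data between main memory and the CPU often dominates the time spent on arithmetic. This makes memory access optimization critical for performance.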
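To make the effect of access order concrete, here is a small sketch, assuming a square matrix stored in row-major order as a flat `float` array. Both functions (hypothetical names `sum_row_order` and `sum_column_order`) compute the same sum, but the first walks memory contiguously while the second jumps `n` elements between consecutive accesses. On large matrices the strided version typically runs noticeably slower, although the exact gap depends on the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an n x n row-major matrix, walking along rows.
   Consecutive iterations read adjacent addresses, so every cache line
   that is loaded gets fully used. */
double sum_row_order(const float *m, size_t n) {
    assert(m != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            sum += m[i * n + j];
        }
    }
    return sum;
}

/* Same sum, walking down columns.
   Consecutive iterations read addresses n floats apart, so for large n
   almost every access touches a different cache line. */
double sum_column_order(const float *m, size_t n) {
    assert(m != NULL);
    double sum = 0.0;
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            sum += m[i * n + j];
        }
    }
    return sum;
}
```

Timing both functions on a matrix that is much larger than the last-level cache is a simple way to observe the difference on a given machine.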
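// Cache-friendly blocked implementation.
// A, B, and C are N x N matrices stored in row-major order.
// The product is accumulated into C, so C must be zero-initialized by the caller.

// C has no standard min() for ints, so define a small helper.
static inline int min_int(int a, int b) { return a < b ? a : b; }

void matrix_multiply_blocked(const float* A, const float* B, float* C, int N, int BLOCK_SIZE) {
    for (int i = 0; i < N; i += BLOCK_SIZE) {
        for (int j = 0; j < N; j += BLOCK_SIZE) {
            for (int k = 0; k < N; k += BLOCK_SIZE) {
                // Multiply one block of A by one block of B;
                // min_int() clips the last block when N is not a multiple of BLOCK_SIZE
                for (int ii = i; ii < min_int(i + BLOCK_SIZE, N); ii++) {
                    for (int jj = j; jj < min_int(j + BLOCK_SIZE, N); jj++) {
                        float sum = C[ii*N + jj];  // continue from the partial result
                        for (int kk = k; kk < min_int(k + BLOCK_SIZE, N); kk++) {
                            sum += A[ii*N + kk] * B[kk*N + jj];
                        }
                        C[ii*N + jj] = sum;
                    }
                }
            }
        }
    }
}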
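When experimenting with a blocked kernel like the one above, it helps to check its output against a straightforward reference. The following is a minimal sketch of such a check, assuming the same flat row-major layout. The names `matrix_multiply_naive` and `max_abs_diff` are hypothetical helpers introduced here, not part of the original study.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple-loop multiplication: C = A * B for N x N row-major
   matrices. Unlike an accumulating kernel, this overwrites C. */
void matrix_multiply_naive(const float *A, const float *B, float *C, int N) {
    assert(N > 0);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++) {
                sum += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
        }
    }
}

/* Largest element-wise absolute difference between two N x N matrices. */
float max_abs_diff(const float *X, const float *Y, int N) {
    assert(N > 0);
    float worst = 0.0f;
    for (int idx = 0; idx < N * N; idx++) {
        float d = X[idx] - Y[idx];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Because blocking changes the order of the floating-point additions, the two results may differ in the last few bits, so comparing `max_abs_diff` against a small tolerance is more robust than testing for exact equality.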
Key findings from our matrix multiplication study:
// Example: Cache-friendly matrix multiplication
for (int i = 0; i < N; i += BLOCK_SIZE) {
    for (int j = 0; j < N; j += BLOCK_SIZE) {
        for (int k = 0; k < N; k += BLOCK_SIZE) {
            // Block multiplication
            for (int ii = i; ii < min(i+BLOCK_SIZE, N); ii++) {
                for (int jj = j; jj < min(j+BLOCK_SIZE, N); jj++) {
                    float sum = C[ii][jj];
                    for (int kk = k; kk < min(k+BLOCK_SIZE, N); kk++) {
                        sum += A[ii][kk] * B[kk][jj];
                    }
                    C[ii][jj] = sum;
                }
            }
        }
    }
}
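The loops above leave `BLOCK_SIZE` as a free parameter. One common rule of thumb, offered here as an assumption rather than a result from this guide, is to pick the largest block for which three tiles (one each from `A`, `B`, and `C`) fit in the target cache at once. The hypothetical helper `pick_block_size` below sketches that calculation for `float` data, restricted to powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest power-of-two block size b such that three b x b tiles
   of floats (one each from A, B, and C) fit in cache_bytes.
   This is a starting point for tuning, not a guaranteed optimum. */
int pick_block_size(size_t cache_bytes) {
    assert(cache_bytes >= 3 * sizeof(float));
    int b = 1;
    /* Keep doubling while the next size up would still fit. */
    while ((size_t)3 * (2 * b) * (2 * b) * sizeof(float) <= cache_bytes) {
        b *= 2;
    }
    return b;
}
```

For example, with a 32 KiB cache (a size assumed for illustration) the helper returns 32, since three 32 x 32 float tiles occupy 12 KiB while three 64 x 64 tiles would need 48 KiB. The best value in practice is usually found by benchmarking a few candidates around this estimate.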
module avail # List available modules
module load compiler # Load specific module
module list # Show loaded modules
module purge # Unload all modules