Wilson HPC Computing Facility

GPU cluster

Contents

  1. Cluster layout
  2. Accessing GPU hosts
  3. Building Code

Disclaimer

This is not documentation about NVIDIA GPU programming or performance, but rather a set of notes for using the Fermilab GPU cluster.

1. Cluster Layout

The GPU cluster consists of five GPU host servers: gpu1, gpu2, gpu3, gpu4, and ibmpower9.

gpu1: Dual 6-core Intel 2.0GHz E5-2620 "Sandy Bridge" CPUs, 32GB of memory, dual NVIDIA Tesla Kepler K20m GPUs with 5GB of memory each (after ECC overhead)

[Figure: gpu1 hardware topology]

[Figure: gpu1 GPU communication matrix]

gpu2: Dual 8-core Intel 2.6GHz E5-2650 "Ivy Bridge" CPUs, 32GB of memory, dual NVIDIA Tesla Kepler K40m GPUs with 12GB of memory each (after ECC overhead)

[Figure: gpu2 hardware topology]

[Figure: gpu2 GPU communication matrix]

gpu3: Dual 14-core Intel 2.4GHz E5-2680 "Broadwell" CPUs, 128GB of memory, dual NVIDIA Tesla Pascal P100 GPUs with 17GB of memory each (after ECC overhead), NVLink connection between the GPUs

[Figure: gpu3 hardware topology]

[Figure: gpu3 GPU communication matrix]

gpu4: Dual 8-core Intel 1.7GHz E5-2609v4 "Broadwell" CPUs, 768GB of memory, eight NVIDIA Tesla Pascal P100 GPUs with 17GB of memory each (after ECC overhead)

[Figure: gpu4 hardware topology]

[Figure: gpu4 GPU communication matrix]

ibmpower9: Dual 16-core IBM Power9 2.6/3.0GHz CPUs, 1TB of memory, four NVIDIA Volta V100 GPUs with 17GB of memory each (after ECC overhead), NVLink connection between the GPUs

[Figure: ibmpower9 hardware topology]

[Figure: ibmpower9 GPU communication matrix]
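
Each "hardware topology" and "GPU communication matrix" figure above can be reproduced on the corresponding host. The commands below are a sketch using the standard NVIDIA and hwloc tools; whether these are exactly the tools used to generate the original figures is an assumption.

# Run on a GPU host (gpu1, gpu2, gpu3, gpu4 or ibmpower9)
nvidia-smi -L          # list the GPUs in this host
nvidia-smi topo -m     # GPU-to-GPU communication matrix (PCIe vs. NVLink)
lstopo                 # CPU/memory/PCIe hardware topology (from the hwloc package, if installed)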

  • Inter-host networking: QDR InfiniBand, connecting only these GPU hosts
  • All hosts above mount /home from the Wilson cluster head node tev.fnal.gov; /home is backed up
  • All hosts above NFS-mount /data, which is NOT backed up (a quick check of both mounts is shown below)
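
To confirm that both shared file systems are mounted on a given host, a quick check with standard Linux tools (not part of the original notes) is:

df -h /home /data      # shows the serving host and free space for each mount
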

2. Accessing GPU Hosts
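
A minimal sketch of interactive access, assuming ordinary SSH logins through the Wilson cluster head node tev.fnal.gov; the user name is a placeholder and the exact access policy is not spelled out in these notes.

ssh username@tev.fnal.gov    # Wilson cluster head node, which serves /home to the GPU hosts
ssh gpu1                     # then log in to one of the GPU hosts (gpu1-gpu4, ibmpower9)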

3. Building Code

For the NVIDIA GPUs, CUDA, a parallel computing platform and programming model invented by NVIDIA, is available on each GPU host under /usr/local/cuda. On a GPU host, run /usr/local/cuda/bin/nvcc -V to query the CUDA version and cat /proc/driver/nvidia/version to query the NVIDIA driver version.
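
As a quick end-to-end check of the toolchain, the sketch below compiles and runs a trivial CUDA program with the installed nvcc. The file name hello.cu and the test kernel are illustrative only, not part of the facility setup.

/usr/local/cuda/bin/nvcc -V        # CUDA toolkit version
cat /proc/driver/nvidia/version    # NVIDIA driver version

# hello.cu: trivial kernel that prints from a few GPU threads
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();            // launch 1 block of 4 threads
    cudaDeviceSynchronize();      // wait for the kernel (and its printf) to finish
    return 0;
}
EOF

/usr/local/cuda/bin/nvcc -o hello hello.cu
./hello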

    Contact: Amitoj Singh
    Last modified: Oct 26, 2018