Install and Test NCCL on Ubuntu 20.04
Download NCCL
Ubuntu 20.04

CUDA 12.8: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2004/x86_64/nccl-local-repo-ubuntu2004-2.26.2-cuda12.8_1.0-1_amd64.deb/
CUDA 12.4: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2004/x86_64/nccl-local-repo-ubuntu2004-2.26.2-cuda12.4_1.0-1_amd64.deb/

Ubuntu 22.04

CUDA 12.8: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2204/x86_64/nccl-local-repo-ubuntu2204-2.26.2-cuda12.8_1.0-1_amd64.deb/
CUDA 12.4: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2204/x86_64/nccl-local-repo-ubuntu2204-2.26.2-cuda12.4_1.0-1_amd64.deb/
CUDA 12.2: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2204/x86_64/nccl-local-repo-ubuntu2204-2.26.2-cuda12.2_1.0-1_amd64.deb/

Ubuntu 24.04

CUDA 12.8: https://developer.nvidia.com/downloads/compute/machine-learning/nccl/secure/2.26.2/ubuntu2404/x86_64/nccl-local-repo-ubuntu2404-2.26.2-cuda12.8_1.0-1_amd64.deb/
Install NCCL (Local installers)
For a local NCCL repository:

sudo dpkg -i nccl-repo-<version>.deb
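After registering the local repository, apt still needs its signing key before it can install the NCCL packages. A typical follow-up looks like this (the exact keyring filename is printed by dpkg after the previous step, so adjust the wildcard path if it does not match):

sudo cp /var/nccl-local-repo-*/*keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install libnccl2 libnccl-dev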
Test NCCL
Install nccl-tests
git clone https://github.com/NVIDIA/nccl-tests.git
Since I only use a single node, I build without MPI support:

cd nccl-tests
make MPI=0
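If the CUDA toolkit is not under the default /usr/local/cuda, you can point the build at it explicitly. A sketch, assuming CUDA 12.8 lives in /usr/local/cuda-12.8:

make MPI=0 CUDA_HOME=/usr/local/cuda-12.8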
Test GPU ↔ GPU Communication
Use all_reduce_perf to measure GPU-to-GPU bandwidth and latency:

./build/all_reduce_perf -b 8 -e 512M -f 2 -g 2
Options explained:
-b 8: minimum message size (bytes)
-e 512M: maximum message size
-f 2: multiply the message size by 2 at each step
-g 2: number of GPUs used by the test (the default is 1, which only exercises a single GPU)
To test specific GPUs, for example GPU 0 and GPU 1:

CUDA_VISIBLE_DEVICES=0,1 ./build/all_reduce_perf -b 8 -e 512M -f 2 -g 2

This lets you test the PCIe or NVLink path, depending on which GPUs you select.
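For instance, to compare against a pair that only shares PCIe, assuming GPU 0 and GPU 2 have no NVLink between them (check your own topology first, as described in the next section):

CUDA_VISIBLE_DEVICES=0,2 ./build/all_reduce_perf -b 8 -e 512M -f 2 -g 2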
Test CPU ↔ GPU Communication
NCCL is mainly for GPU ↔ GPU communication.
For CPU ↔ GPU bandwidth, you can write a simple benchmark using CUDA or PyTorch.
Here’s a simple way to test CPU ↔ GPU transfers with a short PyTorch script.
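A minimal sketch (the 256 MB buffer size, pinned host memory, and 20 iterations are illustrative choices, not measured recommendations):

import torch
import time

def benchmark(direction, size_mb=256, iters=20):
    n = size_mb * 1024 * 1024 // 4  # number of float32 elements
    cpu = torch.empty(n, dtype=torch.float32, pin_memory=True)  # pinned host buffer
    gpu = torch.empty(n, dtype=torch.float32, device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        if direction == "h2d":
            gpu.copy_(cpu, non_blocking=True)  # CPU -> GPU
        else:
            cpu.copy_(gpu, non_blocking=True)  # GPU -> CPU
    torch.cuda.synchronize()
    elapsed = time.time() - start
    print(f"{direction}: {size_mb * iters / 1024 / elapsed:.2f} GB/s")

benchmark("h2d")  # host-to-device
benchmark("d2h")  # device-to-host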
Check GPU Topology (to know PCIe vs. NVLink)
To understand how your GPUs are connected, use:
nvidia-smi topo -m
You’ll see output like:
GPU0    GPU1    GPU2    CPU Affinity
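Filled in, a hypothetical three-GPU machine might report something like this (link types and CPU affinities vary from system to system):

        GPU0    GPU1    GPU2    CPU Affinity
GPU0     X      NV2     PHB     0-23
GPU1    NV2      X      PHB     0-23
GPU2    PHB     PHB      X      0-23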
Legend:
NV2: NVLink connection (two NVLink links)
PHB: PCIe connection (through a CPU PCIe Host Bridge)
This helps you target the correct GPUs for testing specific links.