This simple code prints out which core each MPI rank and OpenMP thread is bound to. It also performs a dummy calculation; an increase in the execution time shows how oversubscribing cores affects performance.
Modern CPUs typically adjust the clock frequency dynamically based on the number of cores in use, so comparing the timings of a single MPI task with a single thread against a full node also gives hints about how frequency scaling affects performance.
To test possibly different OpenMP runtime behaviour, both a C and a Fortran version are provided.
Requires MPI and OpenMP.
Compile and run C version:
mpicc -o cpu_affinity cpu_affinity.c utilities.c -fopenmp -lm
export OMP_NUM_THREADS=<threads>
mpiexec -np <tasks> ./cpu_affinity
Compile and run Fortran version:
gcc -c utilities.c
mpif90 -o cpu_affinity cpu_affinity.f90 utilities.o -fopenmp
export OMP_NUM_THREADS=<threads>
mpiexec -np <tasks> ./cpu_affinity
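Thread placement can additionally be steered with the standard OpenMP 4.0+ environment variables before launching; the specific values below (4 threads, close binding to cores) are just one example setup, and the launch line mirrors the run commands above:

```shell
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close   # keep threads close to the parent thread's place
export OMP_PLACES=cores      # one place per physical core
# mpiexec -np <tasks> ./cpu_affinity
```

Changing OMP_PROC_BIND (close/spread) and rerunning makes the effect directly visible in the printed core numbers.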
NUMA information can be printed by compiling with -DNUMA -lnuma. Note that if no memory
binding options have been set, i.e. the kernel is using the default NUMA policy
(https://www.kernel.org/doc/html/latest/admin-guide/mm/numa_memory_policy.html),
only MPOL_DEFAULT is printed out. Actual NUMA information is shown only when a
non-default policy is in effect (e.g. one set via numactl --membind).