Non-uniform memory access (NUMA) systems are server platforms with more than one system bus. These platforms can utilize multiple processors on a single motherboard, and all processors can access all the memory on the board. When a processor accesses memory that does not lie within its own node (remote memory), data must be transferred over the NUMA connection at a rate that is slower than it would be when accessing local memory. Thus, memory access times are not uniform and depend on the location (proximity) of the memory and the node from which it is accessed.
Here is an example of a motherboard with two CPU sockets.
To achieve high performance, you first need to determine which CPU will run the application and ensure that the memory used is the one closest to it.
A Mellanox adapter installed on a PCIe link is connected to one of the CPUs. When performing benchmark tests, run the tests on the CPU attached to that PCIe link.
Configuration
Mapping between PCI, device driver, port and NUMA
1. How do I map between a PCI, device, port and NUMA?
The easiest way is to run "mst status -v".
Here is an example of a server with two cards installed (ConnectX-4 and ConnectX-3 Pro), each connected to a different NUMA node.
In the output below, the line for PCI address 05:00.0 shows that mlx5_0 is the device, ens785f0 is its port, and the NUMA node is 0.
# mst start
...
# mst status -v
MST modules:
------------
MST PCI module loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE             MST                           PCI       RDMA    NET                         NUMA
ConnectX4(rev:0)        /dev/mst/mt4115_pciconf0.1    05:00.1   mlx5_1  net-ens785f1                0
ConnectX4(rev:0)        /dev/mst/mt4115_pciconf0      05:00.0   mlx5_0  net-ens785f0                0
ConnectX3Pro(rev:0)     /dev/mst/mt4103_pciconf0
ConnectX3Pro(rev:0)     /dev/mst/mt4103_pci_cr0       81:00.0   mlx4_0  net-ens817d1,net-ens817     1
2. How do I map a port to a CPU (numa_node)?
On the same example, here is another way to find this information:
# ibdev2netdev
mlx4_0 port 1 ==> ens817 (Up)
mlx4_0 port 2 ==> ens817d1 (Down)
mlx5_0 port 1 ==> ens785f0 (Down)
mlx5_1 port 1 ==> ens785f1 (Up)
# cat /sys/class/net/ens785f0/device/numa_node
0
# cat /sys/class/net/ens817/device/numa_node
1
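The per-interface lookup above can be repeated for every interface in one pass. The following is a minimal sketch, not a vendor tool: the function name list_iface_numa and the SYSFS_ROOT override are hypothetical, the latter added only so the helper can be tried against a fake directory tree.

```shell
# Print "<interface> <numa_node>" for every network interface whose
# underlying device exposes a numa_node file in sysfs.
# SYSFS_ROOT is a hypothetical override (default: /sys), not a standard variable.
list_iface_numa() {
    local root=${SYSFS_ROOT:-/sys} f iface
    for f in "$root"/class/net/*/device/numa_node; do
        [ -r "$f" ] || continue          # glob matched nothing, or file unreadable
        iface=${f#"$root"/class/net/}    # strip the leading directories
        iface=${iface%%/*}               # keep only the interface name
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

# Usage:
#   list_iface_numa
```

Virtual interfaces such as lo have no device directory, so the glob skips them.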
3. How do I map the PCI (root and function) to a numa_node?
On the same example, here is another way to find this information:
# lspci -D | grep Mellanox
0000:05:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0000:05:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0000:81:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
# cat /sys/devices/pci0000\:00/0000\:00\:05.1/numa_node
0
# cat /sys/devices/pci0000\:00/0000\:00\:05.0/numa_node
0
HINT: In most cases, if the adapter is installed at a PCI address whose bus number starts with 8 (for example, 81), it is on NUMA node 1. If the bus number starts with 0 (for example, 05), it is on NUMA node 0.
Note: When the system does not support NUMA architecture, the result is expected to be -1.
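As a sketch of the same lookup keyed by the full address printed by lspci -D, the helper below reads numa_node through the /sys/bus/pci/devices directory and treats a negative value as "no NUMA information". The function name pci_numa_node and the SYSFS_ROOT override are hypothetical and exist only for illustration and testing.

```shell
# Print the NUMA node of a PCI function given its full address
# (domain:bus:device.function, as shown by "lspci -D").
# Returns 1 if the device is not found, 2 if the kernel reports -1.
# SYSFS_ROOT is a hypothetical override (default: /sys).
pci_numa_node() {
    local addr=$1 root=${SYSFS_ROOT:-/sys} node
    node=$(cat "$root/bus/pci/devices/$addr/numa_node") || return 1
    if [ "$node" -lt 0 ]; then
        echo "$addr: no NUMA affinity reported" >&2
        return 2
    fi
    echo "$node"
}

# Usage:
#   pci_numa_node 0000:05:00.0
```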
4. How do I map the CPU Cores to the NUMA node?
Each CPU core is mapped to one of the NUMA nodes. In this example, by getting the CPU list (cpulist) we can see that cores 0-13 and 28-41 are mapped to NUMA 0, while the rest are mapped to NUMA 1.
# cat /sys/devices/system/node/node0/cpulist
0-13,28-41
# cat /sys/devices/system/node/node1/cpulist
14-27,42-55
The cpumap file supplies the same information as a bitmap.
# cat /sys/devices/system/node/node0/cpumap
000003ff,f0003fff <-- 0-13 & 28-41 bits are ON
# cat /sys/devices/system/node/node1/cpumap
00fffc00,0fffc000 <-- 14-27 & 42-55 bits are ON
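To see how a cpulist and a bitmap relate, the conversion can be sketched in bash. The function name cpulist_to_mask is hypothetical, and the sketch assumes CPU numbers below 63, because bash arithmetic is 64-bit signed.

```shell
# Convert a cpulist such as "0-13,28-41" into a hexadecimal bitmask
# in which bit N is set when CPU N is in the list.
# Assumption: all CPU numbers are below 63 (bash integers are 64-bit signed).
cpulist_to_mask() {
    local list=$1 mask=0 range lo hi cpu
    local IFS=','                        # split the list on commas
    for range in $list; do
        lo=${range%-*}                   # "28-41" -> 28, "5" -> 5
        hi=${range#*-}                   # "28-41" -> 41, "5" -> 5
        for ((cpu = lo; cpu <= hi; cpu++)); do
            mask=$(( mask | (1 << cpu) ))
        done
    done
    printf '%x\n' "$mask"
}

# Usage:
#   cpulist_to_mask 0-13,28-41     # prints 3fff0003fff
#   cpulist_to_mask 14-27,42-55    # prints fffc000fffc000
```

These values are the cpumap contents shown above with the comma separator and leading zeros removed.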
Invoking Application on specific NUMA node
1. How do I run applications on a specific NUMA node?
Use the taskset application as follows:
First run ib_write_bw as a server to get the PID.
# ib_write_bw &
[1] 45118
Next, get the core affinity.
# taskset -p 45118
pid 45118's current affinity mask: ffffffffffffff
The mask shows that this task can run on all cores. Because the ConnectX-4 in our example is connected to NUMA 0, change the affinity to the list of cores that belong to NUMA 0 (0-13,28-41).
# taskset -cp 0-13,28-41 45118
pid 45118's current affinity list: 0-55
pid 45118's new affinity list: 0-13,28-41
# taskset -p 45118
pid 45118's current affinity mask: 3fff0003fff
Alternatively, spawn a new task directly on the cores of a specific NUMA node using the -c flag.
# taskset -c 0-13,28-41 ib_send_bw &
[1] 45292
#
************************************
* Waiting for client to connect... *
************************************
For more information about using taskset, run taskset -h or man taskset.
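The lookup and the pinning steps can be combined so that the core list is not typed by hand. This is a sketch under stated assumptions rather than a recommended tool: run_near_iface is a hypothetical name, and SYSFS_ROOT and TASKSET are hypothetical overrides included only so the helper can be exercised without real hardware.

```shell
# Run a command pinned to the cores of the NUMA node that hosts a
# given network interface.
# SYSFS_ROOT (default: /sys) and TASKSET (default: taskset) are
# hypothetical overrides, not standard variables.
run_near_iface() {
    local iface=$1; shift
    local root=${SYSFS_ROOT:-/sys} node cpus
    node=$(cat "$root/class/net/$iface/device/numa_node") || return 1
    if [ "$node" -lt 0 ]; then
        # -1 means the system reports no NUMA information: run unpinned.
        "$@"
        return
    fi
    cpus=$(cat "$root/devices/system/node/node$node/cpulist") || return 1
    ${TASKSET:-taskset} -c "$cpus" "$@"
}

# Usage:
#   run_near_iface ens785f0 ib_send_bw &
```

On the example server, this would be equivalent to running taskset -c 0-13,28-41 ib_send_bw.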
https://gengwg.blogspot.com/