
Understanding NUMA Node for Performance Benchmarks

 

Description

Non-uniform memory access (NUMA) systems are server platforms with more than one system bus. These platforms can utilize multiple processors on a single motherboard, and all processors can access all the memory on the board. When a processor accesses memory that does not lie within its own node (remote memory), data must be transferred over the NUMA connection at a rate that is slower than it would be when accessing local memory. Thus, memory access times are not uniform and depend on the location (proximity) of the memory and the node from which it is accessed.

 

 

 


 

Here is an example of a motherboard with two CPU sockets.

 

 

 

To achieve high performance, you first need to determine which CPU will run the application and ensure that the memory used is the one closest to it.

A Mellanox adapter installed in a PCIe slot is connected to one of the CPUs. When performing benchmark tests, run the tests on the CPU attached to that PCIe link.

 

Configuration

 

Mapping between PCI, device driver, port and NUMA

 

1. How do I map between a PCI, device, port and NUMA?

The easiest way is to run "mst status -v".

Here is an example of a server with two cards installed (ConnectX-4 and ConnectX-3 Pro), each connected to a different numa_node.

In the output below, the device at PCI address 05:00.0 is mlx5_0, its port is ens785f0, and its NUMA node is 0.

# mst start

...

# mst status -v

MST modules:

------------

MST PCI module loaded

MST PCI configuration module loaded

PCI devices:

------------

DEVICE_TYPE          MST                          PCI      RDMA    NET                      NUMA
ConnectX4(rev:0)     /dev/mst/mt4115_pciconf0.1   05:00.1  mlx5_1  net-ens785f1             0
ConnectX4(rev:0)     /dev/mst/mt4115_pciconf0     05:00.0  mlx5_0  net-ens785f0             0
ConnectX3Pro(rev:0)  /dev/mst/mt4103_pciconf0
ConnectX3Pro(rev:0)  /dev/mst/mt4103_pci_cr0      81:00.0  mlx4_0  net-ens817d1,net-ens817  1

 

2. How do I map a port to a CPU (numa_node)?

In the same example, list the ports with ibdev2netdev, then read the numa_node file of each port:

# ibdev2netdev

mlx4_0 port 1 ==> ens817 (Up)

mlx4_0 port 2 ==> ens817d1 (Down)

mlx5_0 port 1 ==> ens785f0 (Down)

mlx5_1 port 1 ==> ens785f1 (Up)

 

# cat /sys/class/net/ens785f0/device/numa_node

0

 

# cat /sys/class/net/ens817/device/numa_node

1
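The per-port lookup above can be repeated for every interface with a short loop. The following is a minimal sketch, not part of the original article: the function name list_iface_numa and its optional sysfs-root argument are hypothetical, and it assumes the standard /sys/class/net/<iface>/device/numa_node layout shown above.

```shell
# Print "<interface> <numa_node>" for every network interface backed by a device.
# The optional first argument overrides the sysfs root (assumption: used for testing only).
list_iface_numa() (
    root=${1:-/sys}
    for f in "$root"/class/net/*/device/numa_node; do
        [ -r "$f" ] || continue          # glob matched nothing, or file is unreadable
        iface=${f#"$root"/class/net/}    # strip the leading path
        iface=${iface%%/*}               # keep only the interface name
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
)
```

Calling list_iface_numa with no arguments reads from /sys. Virtual interfaces such as lo have no device directory and are skipped.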

 

3. How do I map the PCI (root and function) to a numa_node?

In the same example, find the full PCI addresses with lspci, then read the numa_node file of each PCI function:

# lspci -D | grep Mellanox

0000:05:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0000:05:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0000:81:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

 

# cat /sys/bus/pci/devices/0000\:05\:00.1/numa_node

0

 

# cat /sys/bus/pci/devices/0000\:05\:00.0/numa_node

0

 

HINT: In most cases, an adapter at a PCI address starting with 8 (for example: 81) is on NUMA 1, and an adapter at an address starting with 0 (for example: 05) is on NUMA 0. Confirm with the numa_node file rather than relying on the address alone.

 

Note: When the system does not support NUMA architecture, the result is expected to be -1.
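The PCI lookup and the -1 case can be wrapped in one helper. This is a sketch under assumptions: the function name pci_numa_node and its optional sysfs-root argument are hypothetical, and it relies on the /sys/bus/pci/devices/<address>/numa_node file, keyed by the full address that lspci -D prints.

```shell
# Print the NUMA node of a PCI function given its full address (e.g. 0000:05:00.0).
# Fails when the file is missing or when the kernel reports -1 (no NUMA affinity).
# The optional second argument overrides the sysfs root (assumption: for testing only).
pci_numa_node() (
    root=${2:-/sys}
    node=$(cat "$root/bus/pci/devices/$1/numa_node") || exit 1
    if [ "$node" -lt 0 ]; then
        echo "$1: no NUMA affinity reported" >&2
        exit 1
    fi
    echo "$node"
)
```

For the server in this example, pci_numa_node 0000:81:00.0 would be expected to print 1, matching the mst output for the ConnectX-3 Pro.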

 

4. How do I map the CPU Cores to the NUMA node?

Each CPU core is mapped to one of the NUMA nodes. In this example, by getting the CPU list (cpulist) we can see that cores 0-13 and 28-41 are mapped to NUMA 0, while the rest are mapped to NUMA 1.

# cat /sys/devices/system/node/node0/cpulist

0-13,28-41

# cat /sys/devices/system/node/node1/cpulist

14-27,42-55

The cpumap file provides the same information as a bitmap.

 

# cat /sys/devices/system/node/node0/cpumap

000003ff,f0003fff <-- 0-13 & 28-41 bits are ON

 

# cat /sys/devices/system/node/node1/cpumap

00fffc00,0fffc000 <-- 14-27 & 42-55 bits are ON
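To check that a cpulist and a bitmap describe the same cores, the list can be converted to a hex mask. The helper below is a sketch: cpulist_to_mask is a hypothetical name, and it assumes all CPU ids are below 63 so that the mask fits in one shell integer. The result is printed without the comma that cpumap places between 32-bit words.

```shell
# Convert a cpulist such as "0-13,28-41" into a hex bitmask.
# Assumption: every CPU id is below 63, so the mask fits in a shell integer.
cpulist_to_mask() (
    mask=0
    IFS=,
    for range in $1; do                  # split the list on commas
        lo=${range%-*}                   # "0-13" -> 0, "5" -> 5
        hi=${range#*-}                   # "0-13" -> 13, "5" -> 5
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            mask=$((mask | (1 << cpu)))  # set the bit for this CPU
            cpu=$((cpu + 1))
        done
    done
    printf '%x\n' "$mask"
)
```

For example, cpulist_to_mask 0-13,28-41 prints 3fff0003fff, which is the node0 cpumap value 000003ff,f0003fff with the comma and leading zeros removed.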

 

Invoking Application on specific NUMA node

 

1. How do I run applications on a specific NUMA node?

Use the taskset application as follows:

First, run ib_send_bw as a server in the background to get its PID.

# ib_send_bw &

[1] 45118

 

Next, get the Core affinity.

# taskset -p 45118

pid 45118's current affinity mask: ffffffffffffff

 

In this example, the task can run on all cores (0-55). Since the ConnectX-4 is connected to NUMA 0, change the affinity mask to the list of cores that belong to NUMA 0 (0-13,28-41).

# taskset -cp 0-13,28-41 45118

pid 45118's current affinity list: 0-55

pid 45118's new affinity list: 0-13,28-41

 

# taskset -p 45118

pid 45118's current affinity mask: 3fff0003fff

 

Alternatively, start the task directly on the cores of a specific NUMA node by passing the core list with the -c flag.

# taskset -c 0-13,28-41 ib_send_bw &

[1] 45292

#

************************************

* Waiting for client to connect... *

************************************

 

For more information about using taskset, run taskset -h or man taskset.
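The lookups from the earlier sections can be chained so that the core list is derived from the port name rather than typed by hand. This is a minimal sketch: cpus_for_iface and its optional sysfs-root argument are hypothetical, and it assumes the numa_node and cpulist files shown above.

```shell
# Print the cpulist of the NUMA node that a network interface is attached to.
# Fails when the interface reports -1 (no NUMA affinity) or a file is missing.
# The optional second argument overrides the sysfs root (assumption: for testing only).
cpus_for_iface() (
    root=${2:-/sys}
    node=$(cat "$root/class/net/$1/device/numa_node") || exit 1
    [ "$node" -ge 0 ] || exit 1
    cat "$root/devices/system/node/node$node/cpulist"
)
```

With this helper, the server side of the benchmark could be started as: taskset -c "$(cpus_for_iface ens785f0)" ib_send_bw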

 
 
