Performance Metrics

I’ve scoured the support site and most areas online looking for any way to grab performance metrics for monitoring. Is this possible?

Ideally any of the following:

  • Utilization percentage of the whole card (or individual MX3 Chips)
  • Temperature of the board and/or chips
  • Power consumption
  • Throughput
  • Latency

I’m looking to create a custom Prometheus exporter to monitor the performance of the 2 MX3 M.2 cards I have installed.

Any advice you have would be appreciated!

Update:

I found where the temperature and utilization are exposed:

| Path | Description |
|---|---|
| /sys/memx0/temperature | Current NPU temperature (°C) |
| /sys/memx0/utilization | Real-time NPU usage (%) |

I’m assuming those are all the performance metrics available at this time?

Hi there!

The utilization is at /sys/memx{0,1}/utilization. But note that this number represents the MX3 “pipeline fullness” percentage, not the percentage of compute/memory units in use on the chips; that information is determined offline by the Compiler instead.

Temperature is available at /sys/memx0/temperature (and /sys/memx1/temperature for the 2nd card). It is also exposed through the Linux hwmon subsystem, so it should show up as a sensor, like CPU temps, in commands such as sensors.
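As a rough sketch of how an exporter could read these two files from Python: this assumes each file contains a single plain-text number (possibly followed by a unit), which isn't confirmed above, so check the actual contents with cat first. The function names and the `root` parameter are just for illustration.

```python
import glob
import os
import re

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def read_metric(path):
    """Return the first number found in a file, or None if it is unavailable."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        return None
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


def read_all(root="/sys"):
    """Collect temperature and utilization for every memx* directory under root."""
    readings = {}
    for dev_dir in sorted(glob.glob(os.path.join(root, "memx[0-9]*"))):
        device = os.path.basename(dev_dir)
        readings[device] = {
            name: read_metric(os.path.join(dev_dir, name))
            for name in ("temperature", "utilization")
        }
    return readings


if __name__ == "__main__":
    for device, values in read_all().items():
        print(device, values)
```

A missing or unparseable file yields None rather than an error, so a card that is absent or still initializing doesn't take the whole exporter down.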

Power consumption unfortunately isn’t self-reported and needs to be measured externally. We do have a different M.2 module option with power measurement capability, but it’s not in production yet.

Throughput measurements, in MB/s sent to and from the module, are available once you enable a debug flag in the module parameters: fs_debug_en=1. For example, you can add the line memx_cascade_plus_pcie fs_debug_en=1 to the /etc/modules file.

Latency is measured at the application level in our case, by taking differences between timestamps, so I don’t think there would be a way to universally share this info with Prometheus.
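If you want latency numbers from your own application, the timestamp-difference approach could look roughly like this. It's only a sketch: `fn` stands in for whatever call runs one inference in your code (it is not a real API name), and the summary fields are arbitrary.

```python
import statistics
import time


def measure_latency(fn, frames):
    """Return the wall-clock latency in seconds of calling fn(frame) for each frame."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        fn(frame)  # placeholder for the application's own inference call
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few values an exporter could publish."""
    return {
        "count": len(latencies),
        "mean_s": statistics.fmean(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in workload: sleep for about 1 ms per "frame".
    samples = measure_latency(lambda _frame: time.sleep(0.001), range(10))
    print(summarize(samples))
```

Because the measurement wraps the whole call, it includes any pre- and post-processing your application does inside `fn`, not just time spent on the module.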


So in summary:

  • Temperature: yes, via Linux sensors (hwmon) or /sys/memx*/temperature
  • Utilization: yes, via /sys/memx*/utilization, but note it means “pipeline fullness” and not “compute unit activity”
  • Power: not yet
  • Throughput: yes, via /sys/memx*/throughput (needs module flag)
  • Latency: no, reported by applications
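
For the exporter itself, here is a minimal standard-library sketch that serves temperature and utilization in the Prometheus text format. The metric names (memx_temperature, memx_utilization), the device label, and port 9101 are made up for illustration, and it assumes each sysfs file starts with a plain number. Throughput is left out because the format of /sys/memx*/throughput isn't described here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(readings):
    """Format {device: {metric: value}} as Prometheus text exposition lines."""
    lines = []
    metrics = sorted({m for values in readings.values() for m in values})
    for metric in metrics:
        name = f"memx_{metric}"  # hypothetical metric naming
        lines.append(f"# TYPE {name} gauge")
        for device in sorted(readings):
            value = readings[device].get(metric)
            if value is not None:  # skip values that could not be read
                lines.append(f'{name}{{device="{device}"}} {value}')
    return "\n".join(lines) + "\n"


def read_value(path):
    """Read the first whitespace-separated token of a file as a float, or None."""
    try:
        with open(path) as f:
            return float(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def collect(devices=("memx0", "memx1")):
    """Read temperature and utilization for each device directory under /sys."""
    return {
        d: {m: read_value(f"/sys/{d}/{m}") for m in ("temperature", "utilization")}
        for d in devices
    }


def make_handler(collect_fn):
    """Build a request handler that serves fresh readings at /metrics."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(collect_fn()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(port=9101):
    """Serve metrics until interrupted."""
    HTTPServer(("", port), make_handler(collect)).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at port 9101 should be enough to try it out. Values are read on every scrape, so no background polling thread is needed.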

Thanks so much Tim for your response. Looking forward to testing and benchmarking. I’ll report back here if I have any other questions!