It would be nice to have an API to query device info for all installed MX devices, or at least to find out how many devices are available.
It would also be nice to run inference on a particular device by specifying that device in the API. As of now, the API has no notion of device selection.
When there are multiple M.2s connected to a system, you can specify which one to run on in a few ways:
C++ API: give the device (“group”) index to the connect_dfp function to use a single given M.2, or provide a vector of IDs to the same connect_dfp function if you want to automatically distribute the same DFP across multiple M.2s
Python API: give the device index to the group_id parameter of the constructor (no distributed DFP options yet)
acclBench tool: list the device index(es) in the --device_ids argument to benchmark on one or more M.2s
Let me know if you have any followup questions on this.
Thanks for the explanation. So let me confirm: the group_id parameter is actually an M.2 device index, right? If so, it would be nice to describe it clearly in the documentation. As of now it says: “GroupId of MPU this application is intended to use. group_id is defaulted to 0, but needs to be provided if using any other group”, which is not very clear.
And the next question is: how do I obtain the list of available devices, or at least the number of available devices? I diligently searched your documentation as well as the header files and could not find any public API to query such info.
And one more question, the most important one: can I use two devices simultaneously but independently? I.e. run one model on the first device and simultaneously run another model on the second device?
Yep, group_id is really more like “device_id” and we’ll be sure to clarify the API text. Thanks for finding this issue.
On the command line, the number of devices in the system can currently be found by counting the /sys/memx* entries. The C++ API also has a private function that prints the devices it found, MX::Runtime::DeviceManager.print_available_devices(), which is probably the preferable route, so we’ll be sure to wrap it in a public function on the top-level MxAccl object in the next release.
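Until that public function exists, the /sys/memx* count can be scripted. Here is a minimal Python sketch of that idea. The helper name count_memx_devices is made up for illustration and is not part of any MemryX API. It also assumes that each matching entry corresponds to one device, which is how the description above reads.

```python
import glob


def count_memx_devices(pattern="/sys/memx*"):
    """Count filesystem entries matching the given glob pattern.

    Hypothetical helper, not a MemryX API. The default pattern is the
    /sys/memx* path mentioned in this thread; one entry per device is
    an assumption.
    Returns (count, list of matching paths in lexicographic order).
    """
    entries = sorted(glob.glob(pattern))
    return len(entries), entries


if __name__ == "__main__":
    count, entries = count_memx_devices()
    print(f"{count} device entries found")
    for path in entries:
        print(f"  {path}")
```

The paths are sorted as strings, so their order is not guaranteed to match the group_id numbering. Treat the count as the reliable part.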
Lastly, regarding independent devices: yes! You can have 2 MxAccl (or AsyncAccl in Python) objects in one process, one with group_id 0 and the other with group_id 1. They can be used independently with different DFPs. Alternatively, you could have separate processes/applications, one that uses group_id 0 for its MxAccl object and another that uses group_id 1.
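As a rough illustration of the single-process option, the sketch below builds one accelerator object per group_id and drives them in parallel threads. To avoid guessing at the real constructor signature, it takes a caller-supplied factory. The names run_independently, make_accl and work are all hypothetical. In real use the factory would build an AsyncAccl (or the equivalent object) with the given group_id.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independently(make_accl, jobs):
    """Run one accelerator object per device, each in its own thread.

    make_accl(dfp_path, group_id): hypothetical factory that returns
        one accelerator object bound to the given device.
    jobs: list of (group_id, dfp_path, work) tuples, where work(accl)
        drives that object and returns a result.
    Returns a dict mapping group_id to the result of its work callable.
    """
    def run_one(job):
        group_id, dfp_path, work = job
        accl = make_accl(dfp_path, group_id)  # one object per device
        return group_id, work(accl)

    # ThreadPoolExecutor rejects max_workers=0, so use at least one worker.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return dict(pool.map(run_one, jobs))
```

The separate-process option mentioned above needs no such helper: each application simply creates its own object with a different group_id.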