Just installed the latest memx-accl 1.1.3 and tried to run my application inside a Docker container. It does not run: connect_dfp() throws a "cannot find device" error.
Spent a whole day debugging. Finally figured out that in ver. 1.1.3 the memx-accl library no longer uses the driver directly; instead it communicates with the gRPC mx-server. When memx-accl is installed with apt directly on the host, this server is started as a service. Obviously, it is not started inside the Docker container.
The workaround I found is the following: inside my application, check the list of running processes, and if it does not contain mx-server, start it by executing /usr/bin/mx-server.
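For anyone who wants to try the same workaround, here is a minimal Python sketch of the check-and-start step. It is only an illustration under some assumptions: the process is assumed to show up under the name mx-server, the binary is assumed to live at /usr/bin/mx-server (the path given later in this thread), and the helper names is_running and ensure_server are made up for this example, not part of memx-accl.

```python
import os
import subprocess


def is_running(name, proc_root="/proc"):
    """Return True if any process listed under proc_root has this command name."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # the process exited while we were scanning
    return False


def ensure_server(binary="/usr/bin/mx-server", name="mx-server", proc_root="/proc"):
    """Start the server in the background unless it is already running.

    Returns the Popen handle if this call started it, otherwise None.
    The default path and process name are assumptions taken from this thread.
    """
    if is_running(name, proc_root):
        return None
    return subprocess.Popen(
        [binary], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
```

Call ensure_server() once before connect_dfp(). The server may need a moment before it accepts connections, so a short wait or a retry around connect_dfp() could be necessary; I have not measured how long that takes.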
Leaving this post here; maybe it will help someone.
With SDK 1.1, we use a service (mx-server) running on the host to manage access to the accelerator hardware. This prevents the potential for crashes that previously existed with a file-based lock mechanism, which could have happened if multiple containers ran simultaneously.
Please see the Docker tutorial, which shows how to use mx-server on the host to mediate hardware access.
I have a question about the new local vs. shared modes of MxAccl operation. The default is local mode, but it still requires the mx-server process to be running; otherwise connect_dfp() throws an error. Please advise.
Just read your article about Docker operation. It describes how to access mx-server running on the host from inside the container. But my task is the opposite: how to run mx-server INSIDE the container. And in this case I need to arrange starting mx-server inside the container myself, one way or another.
Ah, I see, so you have the driver (memx-drivers) installed on the host, without mx-server (memx-accl package), which is installed only inside the Docker container and the container has privileged access to the host devices?
In that case, yes, you would have to start the mx-server process inside the container. You could see if systemctl start mx-server works inside the container; otherwise you would have to manually start /usr/bin/mx-server and let it run in the background.
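If the manual route is needed, one possible shape for a container entrypoint is a small launcher that starts the server in the background, runs the application, and stops the server afterwards. The Python sketch below is an assumption-laden illustration, not an official recipe: run_with_server and its startup_delay parameter are hypothetical, and the fixed sleep is only a crude stand-in for a proper readiness check.

```python
import subprocess
import time


def run_with_server(server_cmd, app_cmd, startup_delay=1.0):
    """Run app_cmd while server_cmd is alive in the background.

    Returns the application's exit code. Raises RuntimeError if the
    server process exits before the application is started.
    """
    server = subprocess.Popen(server_cmd)
    try:
        # Crude wait for the server to come up (assumed to be long enough).
        time.sleep(startup_delay)
        if server.poll() is not None:
            raise RuntimeError(
                f"server exited early with code {server.returncode}"
            )
        return subprocess.run(app_cmd).returncode
    finally:
        # Stop the server if it is still running when the app finishes.
        if server.poll() is None:
            server.terminate()
            try:
                server.wait(timeout=5)
            except subprocess.TimeoutExpired:
                server.kill()
                server.wait()
```

As an entrypoint this might be invoked as run_with_server(["/usr/bin/mx-server"], ["python3", "app.py"]), where app.py is a placeholder for the real application.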
Preferably mx-server (memx-accl) could be installed on the host in addition to memx-drivers. Is there something preventing this from working for your setup?
Ideally, we want to be able to use the memx-accl runtime in local mode without any server running. We want to distribute a containerized app with as few requirements on host configuration as possible. Yes, kernel driver installation on the host obviously cannot be skipped, but we prefer to pack everything else into the container. This way the burden of host configuration on our users will be minimal.
As of ver. 1.1.1 it was enough to install the memx-drivers package on the host and the memx-accl package in the container. Now, with 1.1.3, we have to run mx-server somewhere in order to use local inference (no idea why it is needed for local inference, though…)
The mx-server process also controls device locks. For example, if there were two Docker containers running, and both tried to use local mode on the same device without mx-server on the host, they could end up accessing the device at the same time and cause a crash.
Therefore the “safest” option from our perspective was to require mx-server on the host to guard against this scenario. But we have been getting feedback, from others as well, that applications where exactly 1 container will be running don’t need to use mx-server.
As such, we’re working to split the memx-accl package in the next SDK release: memx-accl and mxa-manager (renamed from mx-server, to be more clear about its role). Along with that, memx-accl will have an option to ignore mxa-manager’s locking, with warnings to users about the multi-container conflict potential.