MxAccl::stop() takes a very long time even when all inferences are done

Typical use case: we want to switch from one model to another. To do it gracefully, the documentation recommends calling MxAccl::wait() and then MxAccl::stop(), after which we can load a new DFP.

What we noticed is that the MxAccl::stop() execution time varies from 700 ms to 7 seconds! For YOLOv8n models with multiple output tensors it varies from 4 to 6 seconds, which is a really long time.
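
For context, here is roughly where we measure the delay (a simplified sketch; the header path and the way a new MxAccl is created from the new DFP are written from memory and may not match the current SDK exactly):

```cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <string>

#include "memx/accl/MxAccl.h"  // header path from memory, may differ

// Simplified model-switch path. MxAccl::wait()/stop() are the documented calls;
// constructing a new MxAccl directly from a DFP path is an assumption here.
void switch_model(std::unique_ptr<MX::Runtime::MxAccl>& accl,
                  const std::string& new_dfp_path)
{
    using clock = std::chrono::steady_clock;

    accl->wait();   // all queued inferences are finished at this point

    const auto t0 = clock::now();
    accl->stop();   // <-- this call alone takes anywhere from ~0.7 s to ~7 s
    const auto t1 = clock::now();

    std::cout << "MxAccl::stop() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";

    // Load the new DFP (placeholder; reconnect streams and start() as before).
    accl = std::make_unique<MX::Runtime::MxAccl>(new_dfp_path);
}
```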

Studying the code of the mx-accl library, we found a few suspicious places:

  1. //Flushing the MPU loop in lines 524-544.
    memx_stream_ofmap() is called sequentially for every output port until it returns an error. We guess the error indicates that there is no more data to read and the flush is complete. But the last read on an empty port always waits out the 100 ms timeout, so the total wait here is 100 ms times the number of ports (a simplified sketch of this pattern is below the list).

  2. output_pool->stop() in line 515 and input_pool->stop() in line 498.
    These set the m_stop flag and wait for all pool threads to join. But the pool threads are waiting for tasks inside m_task_queue.pop() (line 71) with a 50 ms timeout, so in the worst case there is a 50 ms delay per pool, 100 ms in total.
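
In simplified form, the flushing pattern from #1 looks like this (read_one_ofmap() is just a stand-in that models the timeout behaviour of memx_stream_ofmap(); the real driver call and its arguments differ):

```cpp
#include <chrono>
#include <thread>

// Stand-in for memx_stream_ofmap(): returns false when the port has no data,
// but only after the full timeout has elapsed. The real call and its
// arguments differ; this only models the behaviour described above.
static bool read_one_ofmap(int /*port*/, int timeout_ms)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(timeout_ms));
    return false;  // simulate an already-drained port
}

static void flush_output_ports(int num_ports)
{
    const int kTimeoutMs = 100;
    for (int port = 0; port < num_ports; ++port) {
        // Read until the port reports an error, i.e. it is empty.
        while (read_one_ofmap(port, kTimeoutMs)) {
            // discard a stale output frame
        }
        // Even on a fully drained pipeline, the failing read above blocks for
        // the whole timeout, so stop() pays num_ports * 100 ms here with
        // nothing to flush.
    }
}

int main() { flush_output_ports(6); }  // e.g. 6 output ports -> ~600 ms minimum
```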

There must be other places that slow down stop(), since the ones we found do not add up to a 6-second delay.

Please help.

Hi Vlad,

Yes, MxAccl::stop() is indeed slow: the long timeouts in #1 and #2 are in place to ensure that even long-latency neural network models have fully drained their pipeline before the chip is stopped.

The intent is that users with multiple models will co-map them into the same DFP (by simply giving multiple model files as input to mx_nc -m [file1] [file2]). This allows the models to run fully in parallel and avoids swapping the DFP file.

Efficiently “live swapping” DFPs is quite involved: batch sizes >1 are needed to amortize the swap time, new API functions are needed, the mx-server daemon needs to arbitrate DFP scheduling, etc.

This is one of the most important features we’re working on for the SDK, but please be patient, as it is a longer-term project (think a few more months).

Thanks,
Tim

Hi Tim,

Actually, we already implemented model switching in our app; these long waits are the only problem. According to the comments in the code, that flush is not needed at all under normal conditions, when all results are retrieved the normal way. Maybe put that code under a condition and only flush when it is actually needed?
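
Something along these lines, as a sketch of the idea only (the counter and member names are made up and do not correspond to anything in mx-accl):

```cpp
#include <atomic>

// Sketch: track how many frames were sent in but never had their results read
// back, and skip the expensive flush loop when that number is zero.
class FlushGuard {
public:
    void on_input_sent()      { outstanding_.fetch_add(1, std::memory_order_relaxed); }
    void on_output_received() { outstanding_.fetch_sub(1, std::memory_order_relaxed); }

    void flush_if_needed()
    {
        if (outstanding_.load(std::memory_order_acquire) == 0) {
            // Normal shutdown: every result was already retrieved, so the
            // 100 ms-per-port flush loop can be skipped entirely.
            return;
        }
        // Otherwise fall back to the existing flush-until-error loop.
        // flush_output_ports(...);
    }

private:
    std::atomic<int> outstanding_{0};
};
```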

For the thread pool the fix is also easy: add a poison-pill feature to your queue. When you need to terminate a worker thread, just put a poison pill in the queue; the thread consumes it immediately, recognizes it, and exits.
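
A generic sketch of the idea (not tied to your queue class; an empty std::optional works fine as the pill):

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// A queue whose pop() blocks without any timeout; an empty std::optional is
// the poison pill that tells a worker to exit immediately.
class TaskQueue {
public:
    void push(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
        cv_.notify_one();
    }
    void push_pill() {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::nullopt); }
        cv_.notify_one();
    }
    std::optional<std::function<void()>> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto item = std::move(q_.front());
        q_.pop();
        return item;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<std::function<void()>>> q_;
};

// Worker loop: no polling, exits as soon as it pops the pill.
void worker(TaskQueue& queue) {
    for (;;) {
        auto task = queue.pop();
        if (!task) break;      // poison pill
        (*task)();
    }
}

// Shutdown: one pill per worker, then join; no 50 ms poll to wait out.
void stop_pool(TaskQueue& queue, std::vector<std::thread>& workers) {
    for (std::size_t i = 0; i < workers.size(); ++i) queue.push_pill();
    for (auto& t : workers) t.join();
}
```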

Hi Vlad,

Maybe this would be best served by filing a support Jira ticket or emailing your MemryX contact. Since you’re from an existing software partner, our customer engineering team could work more closely with you that way.

Thanks,
Tim

How do I file a support Jira ticket?

Just emailed you the info. Thanks!