About USB 3, maximum number of interconnected chips, documentation

Hi,

  1. Are there any drawbacks to using the USB 3 interface rather than PCIe?
  2. Why can only up to 16 chips (96 TOPS/TFLOPS) be interconnected? What is the reason for this limit?
  3. What documentation is available for the MX3 (datasheet, reference schematics, etc.)?

Thank you

Happy Christmas, happy new year

Ali

Hi Ali,

  1. A USB3 board is coming next year (as “stackable” 2-chip sticks). In general, though, the main difference is simply bandwidth. Our PCIe bandwidth usage between the MX3 and the host depends on the neural networks being run, but generally peaks at around 1.3GB/s. USB 3.2 offers less, roughly half that bandwidth. Note, however, that since all model weights are on-chip, the only host I/O bandwidth the MX3 uses is for the initial inputs (e.g. images) and the final outputs (inference results). So the I/O bandwidth limit will usually only be hit by high-resolution models running at high FPS.
  2. In theory there is no limit to the number of chips that can be chained together in a single line, but we have set the limit to 16 in the firmware. We found this to be the “practical limit”, because NeuralCompiler may be unable to map models that large (160M). If, on the other hand, your goal is to run many small models, parallel chains (such as 4 lines of 4 chips, like having 4x M.2 modules) are much better suited to the task: compile groups of models into separate DFPs and deploy each DFP on its own chain (e.g. an M.2). Alternatively, if the goal is high FPS, you can simply connect multiple chains (M.2s) and give a list of devices to the C++ runtime API, and it will automatically load balance the DFP across all the given devices.
  3. Here’s the MX3 chip’s datasheet and the M.2 module’s datasheet. There is also a PCIe card reference design package (RDP) available by request. Let me know if you have any questions!
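To judge when the USB 3.2 link (roughly half of the ~1.3GB/s PCIe peak mentioned above) would become the bottleneck, a back-of-envelope estimate of input/output traffic is enough, since weights never cross the host link. The Python sketch below is hypothetical: the function names are made up, and the bytes-per-value and output-size parameters are assumptions you would replace with your own model's numbers. Only the 1.3GB/s and "about half" figures come from this thread, and both are approximate and model-dependent.

```python
# Back-of-envelope check of host I/O bandwidth for an MX3 accelerator.
# Only the first inputs and the final outputs cross the host link,
# because all model weights stay on-chip.

# Figures quoted in the thread (approximate, model-dependent).
PCIE_PEAK_GBPS = 1.3
USB32_GBPS = PCIE_PEAK_GBPS / 2  # "approximately half" of the PCIe figure


def io_bandwidth_gbps(width, height, channels, fps,
                      bytes_per_value=1, output_bytes_per_frame=0):
    """Return the GB/s needed to stream inputs (and outputs) at `fps`.

    bytes_per_value and output_bytes_per_frame are assumptions the caller
    supplies for their own model; they are not MX3 specifications.
    """
    input_bytes = width * height * channels * bytes_per_value
    return (input_bytes + output_bytes_per_frame) * fps / 1e9


def fits(link_gbps, needed_gbps):
    """True if the estimated traffic stays within the link's budget."""
    return needed_gbps <= link_gbps


if __name__ == "__main__":
    cases = {
        "640x640 RGB uint8 @ 30 FPS": io_bandwidth_gbps(640, 640, 3, 30),
        "1080p RGB float32 @ 30 FPS": io_bandwidth_gbps(1920, 1080, 3, 30, 4),
        "4K RGB uint8 @ 60 FPS": io_bandwidth_gbps(3840, 2160, 3, 60),
    }
    for name, need in cases.items():
        print(f"{name}: {need:.3f} GB/s, "
              f"PCIe ok={fits(PCIE_PEAK_GBPS, need)}, "
              f"USB 3.2 ok={fits(USB32_GBPS, need)}")
```

Under these assumptions the 640x640 stream (about 0.04 GB/s) fits either link, the float32 1080p stream (about 0.75 GB/s) fits the PCIe figure but not the USB one, and the 4K stream (about 1.49 GB/s) exceeds both, which matches the point that mainly high-resolution, high-FPS workloads hit the I/O limit.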
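The “many small models on parallel chains” approach is essentially a packing problem: deciding which models share a DFP so that each group fits on one chain. The sketch below uses first-fit-decreasing, a generic bin-packing heuristic, and is not how NeuralCompiler actually maps models. It assumes a budget of about 10M per chip, which is only inferred from the 16-chip / 160M figures above, and it treats model sizes as simply additive, whereas real placement depends on the network and the compiler. `PER_CHIP_M`, `chips_needed` and `group_models` are hypothetical names.

```python
import math

# Assumption: ~10M per chip, inferred only from the thread's
# "16 chips" / "160M" figures; real capacity depends on the model
# and on how NeuralCompiler maps it.
PER_CHIP_M = 10.0
MAX_CHIPS_PER_CHAIN = 16  # firmware limit quoted in the thread


def chips_needed(model_size_m):
    """Rough chip count for one model under the per-chip assumption."""
    return math.ceil(model_size_m / PER_CHIP_M)


def group_models(model_sizes_m, chips_per_chain, num_chains):
    """First-fit-decreasing packing of models into chains.

    Returns one list of model names per chain (each list would become a
    separate DFP), or raises ValueError if a model cannot be placed.
    """
    if chips_per_chain > MAX_CHIPS_PER_CHAIN:
        raise ValueError("chain longer than the 16-chip firmware limit")
    capacity = chips_per_chain * PER_CHIP_M
    groups = [[] for _ in range(num_chains)]
    used = [0.0] * num_chains
    # Place the largest models first; a heuristic, not guaranteed optimal.
    for name, size in sorted(model_sizes_m.items(), key=lambda kv: -kv[1]):
        for i in range(num_chains):
            if used[i] + size <= capacity:
                groups[i].append(name)
                used[i] += size
                break
        else:
            raise ValueError(f"{name} ({size}M) does not fit in any chain")
    return groups


if __name__ == "__main__":
    # Hypothetical model sizes (in M); replace with real figures.
    models = {"detector": 25, "classifier": 12, "segmenter": 30,
              "pose": 18, "ocr": 8}
    # 4 chains of 4 chips each, like having 4x M.2 modules.
    for i, group in enumerate(group_models(models, 4, 4)):
        print(f"chain {i}: {group}")
```

With these example sizes and 4 chains of 4 chips, the segmenter and OCR models share chain 0, the detector and classifier share chain 1, the pose model gets chain 2, and chain 3 is left free for more models.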

Thanks,
Tim


Hi Tim,

Thank you very much for the detailed information. I hope NeuralCompiler will be able to map models larger than 160M in the near future.
But of course this does not prevent us from running multiple models in parallel, each of which is limited to 160M for now.

Kind regards

Happy new year

Ali

Hi,

Will the USB3 board coming next year (as “stackable” 2-chip sticks) look like a USB memory stick or a standard board?

Can OSC_XIN be driven by a 25MHz clock oscillator with OSC_XOUT left unconnected? If so, what voltage level should the oscillator output be?

Thank you

Regards
Ali