I have an ONNX model that causes mx_nc to report a “the model must have a uniform batch dimension throughout the entire model” error. However, the model already has batch size = 1 for all layers. How can I fix this?
The model was developed internally and I’m not authorized to share it. Steps to diagnose the issue locally would be highly appreciated.
The “non-uniform batch dimension” error typically means one of two things:
There could be data organization or pre/post-processing layers in the ONNX graph that need to be cropped away. Try using the --autocrop flag and refer to this tutorial, and see if that was the cause.
Your model might have Attention / Transformer layers, such as in YOLO v10/11/26 or ViT, and in this case you’ll need to enable the relevant Compiler Extension.
When the compiler isn’t using these extensions, it looks at the ONNX nodes of such a layer individually, so it appears that the batch dim is being manipulated, when really it’s just a verbose expansion of a single layer. Extensions detect these structures and tell the rest of the compiler that a group of odd-looking ONNX nodes is actually just one layer. Because the detection patterns can sometimes overlap with each other, they need to be enabled explicitly with mx_nc’s --extensions argument.
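In practice, checking both causes looks something like this (the extension name here is only an illustration — the available names depend on your SDK version):

```shell
# Check cause 1: crop away data-organization / pre/post-processing layers
mx_nc -m model.onnx --autocrop

# Check cause 2: enable the Compiler Extension matching your architecture,
# e.g. for a ViT-style model (name is illustrative; see the SDK docs)
mx_nc -m model.onnx --extensions VitSmall
```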
The Extension library is growing with each new SDK. If your model is based on an existing open-source model, please share a link to it and we can see if an Extension for it could be possible.
Let us know if these help or if you have more questions!
Thanks! I forgot to mention that --autocrop does not work. It gives a different yet similar error:
memryx.errors.CompilerError: the model must have a uniform batch dimension throughout the entire model. Please reformulate your model to ensure the batch dimension (dim 0) is always N.
In our model, there are Einsum layers, which the compiler reports as unsupported, so I replaced them with Transpose + MatMul. As a result, dim 0 is no longer always N. Can this requirement be relaxed, or can Einsum support be added?
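For context, here is a minimal sketch (with made-up shapes, not our actual model) of the kind of rewrite I did — this particular case keeps N on dim 0, but decompositions whose Transpose permutes axis 0 are what trigger the uniform-batch error:

```python
import numpy as np

# Hypothetical einsum contracting the channel dim of two feature maps
a = np.random.rand(1, 8, 16)   # (N, L, C)
b = np.random.rand(1, 32, 16)  # (N, M, C)

ref = np.einsum('nlc,nmc->nlm', a, b)

# Transpose + MatMul rewrite: swap b's last two axes, then batched matmul
alt = a @ b.transpose(0, 2, 1)

print(ref.shape)                # (1, 8, 32)
assert np.allclose(ref, alt)    # numerically identical
```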
As a side note, our model does not use Attention. Enabling the Yolov10, ConvNext, or VitSmall extensions does not fix the issue, either.
EDIT: dropped the link to the open-source model to avoid confusion, since our model now deviates a lot from the original one
Hmmm, those layers (Einsum / Transpose + MatMul) probably won’t be supportable without a new Extension, or, depending on the axes involved, they may not be supportable on the current-generation chip at all. MX3 is mainly focused on CNNs, so Transpose/MatMul, especially on the Batch (first) or Channel dimensions, can be difficult.
How common is this layer pattern in your network? If it only appears near the beginning or end of the graph, you could try manual cropping [see the other tab on the autocrop tutorial].
Likewise, if the layer appears only once or twice in the middle of the network, cropping it out into separate sections and then co-compiling the supported sections should work. If you’d like a tutorial on this, just let me know.
Such layers appear a few times in the middle of the network, in one of several branches, and there are heavy linear layers before and after them. With manual cropping, it seems the cropped-off pre- and post-processing models are not run on the accelerator. Is there a way to run all linear layers on the accelerator? Such a tutorial would be highly appreciated.
Here’s an example where there’s 1 instance of the unsupported block in the middle of the network:
Crop the 1st supported section with mx_nc -m model.onnx --outputs "layer_just_before_the_unsupported_block". You can kill it early with Ctrl+C, since we only need the cropping to happen, not the full compile to a DFP.
You’ll get back model_crop.onnx (the supported section) and model_post.onnx (the rest of the network)
Rename model_crop.onnx to mappable_part1.onnx and model_post.onnx to the_rest.onnx
Crop the 2nd supported section with mx_nc -m the_rest.onnx --inputs "layer_just_after_the_unsupported_block", and again kill early with Ctrl+C
the_rest_crop.onnx is our 2nd mappable section, so rename it to mappable_part2.onnx
the_rest_pre.onnx is the isolated unsupported block; rename it to something like einsum_block.onnx
Compile the two supported sections together into one DFP with mx_nc -m mappable_part1.onnx mappable_part2.onnx, which gives you models.dfp
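Putting the cropping and compile steps above together as one command sequence (the layer names are placeholders for your actual node names; kill the two cropping runs with Ctrl+C once the cropped files appear):

```shell
# Split off everything up to the unsupported block (Ctrl+C after cropping)
mx_nc -m model.onnx --outputs "layer_just_before_the_unsupported_block"
mv model_crop.onnx mappable_part1.onnx
mv model_post.onnx the_rest.onnx

# Isolate the unsupported block from the 2nd supported section (Ctrl+C again)
mx_nc -m the_rest.onnx --inputs "layer_just_after_the_unsupported_block"
mv the_rest_crop.onnx mappable_part2.onnx
mv the_rest_pre.onnx einsum_block.onnx

# Co-compile the two supported sections into one DFP -> models.dfp
mx_nc -m mappable_part1.onnx mappable_part2.onnx
```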
Set the preprocessing model for model_idx=1 (mappable_part2.onnx) with accl.set_preprocessing_model("einsum_block.onnx", model_idx=1)
Define your input and output callbacks for both model_idx:
model_idx=0 should have
input callback: gets original input image [optionally pushing to a global queue if you want to draw on the original image before displaying]
output callback: gets the intermediate output feature maps and just pushes them to a queue
model_idx=1 should have
input callback: pops model 0’s output fmaps from the queue and sends them directly to the accelerator
AsyncAccl will call the einsum_block.onnx “preprocessing” model under the hood.
output callback: now you have the final model output. Do post-processing, such as drawing and displaying, here.
If there are multiple instances in the middle of the network, unfortunately this process will have to be duplicated for each section, e.g. 3 instances of the unsupported block would end up as 4 model sections running on the MX3 (model_idx 0,1,2,3) plus 3 separate “preprocessing” models attached (to model_idx 1, 2, and 3).
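The callback plumbing described above can be sketched in plain Python — the section functions below are stand-ins for the compiled model sections and the CPU-side einsum block, not the MemryX API itself; in the real pipeline, AsyncAccl invokes your callbacks and runs the attached “preprocessing” model for you:

```python
import queue

fmap_q = queue.Queue()    # carries model 0's output fmaps to model 1's input callback
result_q = queue.Queue()  # carries final outputs to your display/post-processing code

def section0(x):          # stand-in for mappable_part1 running on the MX3
    return x + 1

def einsum_block(x):      # stand-in for the unsupported block, run on CPU
    return x * 2          # as model 1's attached "preprocessing" model

def section1(x):          # stand-in for mappable_part2 running on the MX3
    return x - 3

# model_idx=0 callbacks
def input_cb_0():
    return 5              # original input (an image in the real pipeline)

def output_cb_0(fmap):
    fmap_q.put(fmap)      # just push the intermediate feature maps

# model_idx=1 callbacks
def input_cb_1():
    return fmap_q.get()   # pop model 0's fmaps; AsyncAccl then runs the
                          # einsum_block "preprocessing" model before the chip

def output_cb_1(y):
    result_q.put(y)       # final output: post-process / draw / display here

# Simulate the dataflow AsyncAccl drives under the hood:
output_cb_0(section0(input_cb_0()))
output_cb_1(section1(einsum_block(input_cb_1())))
print(result_q.get())     # (5+1)*2 - 3 = 9
```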