Found 1 instance(s) of unsupported configuration for operator 'ReduceMax'

We’re attempting to convert a custom model for benchmarking with a 2-chip configuration:

mx_nc -v -m octonet.onnx -c 2 -so

The model itself is a MobileNetV3 backbone with custom heads, plus a secondary branch that pre-processes a grayscale channel supplied as a second input.
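For context, the export call looks roughly like this (a simplified, runnable sketch only; the stub module and tensor names stand in for the real network, which has more inputs/outputs):

import torch
import torch.nn as nn

class OctonetStub(nn.Module):
    """Placeholder standing in for the real MobileNetV3 backbone + heads."""
    def forward(self, rgb, gray):
        return rgb.mean(dim=(2, 3)), gray.mean(dim=(2, 3))

rgb  = torch.zeros(1, 3, 480, 832)   # main RGB input
gray = torch.zeros(1, 1, 480, 832)   # secondary grayscale input

torch.onnx.export(OctonetStub(), (rgb, gray), "octonet.onnx",
                  opset_version=20, input_names=["rgb", "gray"])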

The ONNX model uses the following operations (according to Netron):

  • Cast
  • Concat
  • Reshape
  • Transpose
  • Greater
  • Less
  • Mul
  • Div
  • Add
  • Sub
  • Clip
  • Hardswish
  • HardSigmoid
  • Sigmoid
  • Relu
  • Softmax
  • Resize
  • Slice
  • Conv
  • AveragePool
  • MaxPool
  • GlobalAveragePool

According to the Supported Neural Network Operators page on the MemryX Developer Hub, these should all be supported. However, when converting the model we run into the following:

mx_nc -vv -m octonet.onnx -c 2 -so
Converting ONNX Model…
Found 1 instance(s) of unsupported configuration for operator 'ReduceMax'
Layer Name: '/ReduceMax' | Condition: channel dimension
memryx.errors.OperatorError: During conversion found unsupported config in nodes ReduceMax(1). Using autocrop (--autocrop) might help.

Happy to privately share the model file if it helps. We're not sure which operation in the original graph produces this ReduceMax, since it doesn't appear in Netron's operator list above.
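A quick way to locate the node in the exported graph is something like this (a minimal sketch using the onnx Python package):

import onnx

model = onnx.load("octonet.onnx")
for node in model.graph.node:
    if node.op_type == "ReduceMax":
        # Print the node's name, connectivity, and attributes (e.g. axes, keepdims)
        print(node.name, list(node.input), "->", list(node.output))
        for attr in node.attribute:
            print("  ", attr.name, "=", onnx.helper.get_attribute_value(attr))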

This model converts all the way from PyTorch → ONNX → TF SavedModel → TFLite → TFLite INT8 successfully, and it currently runs on a commercially available edge AI accelerator that is several years old. Here are some additional logs from trying to convert these other downstream formats:

TF SavedModel (.pb)

mx_nc -v -m octonet.pb

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1758347527.901986   16581 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at
[the NUMA node warning above repeats 8 more times with different timestamps]
Traceback (most recent call last):
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 92, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/yuri/mx/variables/variables
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/saved_model/load.py", line 1042, in load_partial
    loader = Loader(object_graph_proto, saved_model_proto, export_dir,
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/saved_model/load.py", line 226, in __init__
    self._restore_checkpoint()
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/saved_model/load.py", line 561, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/checkpoint/checkpoint.py", line 1456, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
    error_translator(e)
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/yuri/mx/variables/variables
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/yuri/mx/bin/mx_nc", line 7, in <module>
    sys.exit(main())
  File "memryx/neural_compiler/nc.py", line 2178, in memryx.neural_compiler.nc.main
  File "memryx/neural_compiler/nc.py", line 1420, in memryx.neural_compiler.nc.NeuralCompiler.run
  File "memryx/neural_compiler/nc.py", line 1399, in memryx.neural_compiler.nc.NeuralCompiler.run
  File "memryx/neural_compiler/nc.py", line 1464, in memryx.neural_compiler.nc.NeuralCompiler.load
  File "memryx/neural_compiler/graph/framework/model_loader_factory.py", line 454, in memryx.neural_compiler.graph.framework.model_loader_factory.ModelLoader.load
  File "memryx/neural_compiler/graph/framework/tensorflow/loader.py", line 669, in memryx.neural_compiler.graph.framework.tensorflow.loader.TFLoader.load
  File "memryx/neural_compiler/graph/framework/tensorflow/loader.py", line 678, in memryx.neural_compiler.graph.framework.tensorflow.loader.TFLoader._load_graph_def
  File "memryx/neural_compiler/graph/framework/tensorflow/loader.py", line 711, in memryx.neural_compiler.graph.framework.tensorflow.loader.TFLoader._load_saved_model_dir
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/saved_model/load.py", line 912, in load
    result = load_partial(export_dir, None, tags, options)["root"]
  File "/home/yuri/mx/lib/python3.10/site-packages/tensorflow/python/saved_model/load.py", line 1045, in load_partial
    raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/yuri/mx/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the experimental_io_device option in tf.saved_model.LoadOptions to the io_device such as '/job:localhost'.

TFLite (float32)

mx_nc -v -m octonet_float32.tflite

KeyError: 152
During handling of the above exception, another exception occurred:
memryx.errors.CompilerError: No builtin operator for: <memryx.neural_compiler.graph.framework.tflite.tflite.Operator.Operator object at 0x7eff2f7adc00>

So I noticed we're exporting at ONNX opset 20. Downgrading the model from opset 20 to 17 before conversion results in a slightly different error (downgrade sketch and compiler output below).
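Here is roughly how we did the downgrade (a minimal sketch using the onnx version converter; re-exporting from PyTorch with opset_version=17 should be equivalent):

import onnx
from onnx import version_converter

model = onnx.load("octonet.onnx")                         # original opset 20 export
downgraded = version_converter.convert_version(model, 17)
onnx.checker.check_model(downgraded)                      # sanity check
onnx.save(downgraded, "octonet_opset17.onnx")

With the opset-17 model, mx_nc gets noticeably further before failing: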

Converting ONNX Model…
Converted MXG
Model 'main_graph' Summary
Nodes : 412
Edges : 463
Inputs : [1, 480, 832, 3], [1, 1, 480, 832], [1, 8190, 2], [1, 8190, 2]
Outputs: [1, 8190, 4], [1, 8190, 3], [1, 2, 8190, 1], [1, 1, 8190, 1], [1, 120, 208, 5], [1, 1, 120, 208], [1, 1, 120, 208]
┌───────────────────┬───────┐
│ Layer Type        │ Count │
├───────────────────┼───────┤
│ ConstAdd          │ 2     │
│ ConstDivide       │ 1     │
│ ConstGreater      │ 3     │
│ ConstMultiply     │ 2     │
│ ConstSubtract     │ 1     │
│ Transpose         │ 1     │
│ MoveChannelsFirst │ 7     │
│ MoveChannelsLast  │ 6     │
│ HardSigmoid       │ 9     │
│ HardSwish         │ 20    │
│ ReLU              │ 50    │
│ Sigmoid           │ 4     │
│ Softmax           │ 1     │
│ Cast              │ 3     │
│ Clip              │ 16    │
│ Concat            │ 13    │
│ Add               │ 10    │
│ Divide            │ 2     │
│ Multiply          │ 12    │
│ Subtract          │ 1     │
│ Conv2D            │ 104   │
│ DepthwiseConv2D   │ 11    │
│ Input             │ 2     │
│ Output            │ 7     │
│ Reshape           │ 11    │
│ Resize2D          │ 5     │
│ ZeroPadding2D     │ 70    │
│ BroadcastTo       │ 10    │
│ AveragePooling2D  │ 13    │
│ MaxPooling2D      │ 3     │
│ Slice             │ 12    │
└───────────────────┴───────┘
Running Processor
ConvertCast executed 3x.
RemoveNoOpNodes executed 3x.
ConvertBroadcastTo executed 10x.
SplitGlobalPooling executed 9x.
SplitBigPoolingLayers executed 3x.
MergeZeroPadding executed 70x.
MergeClip executed 14x.
MergeReLU executed 50x.
CascadePlusAUnitMerging executed 1x.
ConvertSlice executed 2x.
ConvertConcat executed 13x.
ConvertSoftmax executed 1x.
ConvertExponential executed 1x.
ConvertBatchedDense executed 2x.
ConvertLinearInterpolation executed 5x.
ConvertReduceSum executed 1x.
ConvertBatchedDense executed 1x.
Error in 'ConvertTranspose'
memryx.errors.OperatorError: Transpose ('/box_keypoint_head/Reshape_1_to_channels_last') (input_shape=[1, 4, 8190], perm=[0, 2, 1]). Arbitrary transposition is inefficient on the MXA and the induced matrix for data manipulation in this case is prohibitively big. Using autocrop (--autocrop) might help.
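From the error, the offending piece appears to be the output path of the box/keypoint head, roughly this pattern (reconstructed from the error message; the shape below is a hypothetical stand-in):

import torch

# Stand-in for the raw head output; in the real model the flattened length is 8190.
head_out = torch.zeros(1, 4, 60, 104)

# The pattern the compiler flags: Reshape to [1, 4, N] followed by a Transpose
# with perm=[0, 2, 1] to get channels last.
boxes = head_out.reshape(1, 4, -1).permute(0, 2, 1)   # -> [1, N, 4]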

I think this is actionable on our end. Is there any information available about best practices for reshape and transpose operations?

Hi,

Sometimes exported ONNX models will have “data manipulation” layers, particularly at the end of the network. These include things like Transpose, Reshape, Where, Gather, etc., which are cheap computation-wise but not very friendly to the MX3’s instruction set. Some models will also have NMS directly in the ONNX graph, for example.

This is where --autocrop and connect_post_model come in handy.

Either via the --autocrop flag or by manually giving --input and --output crop points, the Neural Compiler can cut the model into a core section that compiles to the MX3, plus a _post.onnx (and a _pre.onnx if relevant).

Then, the runtime (both Python and C++) can “connect” these cropped model sections and run them automatically on the CPU in your application; the yolov7 tutorial shows an example.
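Conceptually, the connect step amounts to something like this on the host side (a rough illustration only, using onnxruntime and placeholder file/tensor names; the MemryX runtime does this wiring for you):

import numpy as np
import onnxruntime as ort

# Outputs produced by the compiled core running on the MX3, as returned by the
# MemryX runtime. The name and shape below are placeholders.
core_outputs = {"core_out_0": np.zeros((1, 4, 8190), dtype=np.float32)}

# Run the cropped tail of the graph (the _post.onnx section) on the CPU.
post = ort.InferenceSession("octonet_post.onnx", providers=["CPUExecutionProvider"])
final_outputs = post.run(None, core_outputs)

The same idea applies to a _pre.onnx section, which runs before the core.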

Can you please see if these steps let you run the core + cropped sections for your model? If you have issues cropping, or if it runs but too many layers end up being autocropped out, let me know and we'd be happy to take a look at the model and offer specific suggestions.

Thank you for the response! We don't include NMS in the model. However, the transpose and reshape operations are probably something we can remove in favor of grouped convolutions or similar.
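For example, one option we're considering is to keep the head output in feature-map layout and do the flatten/permute on the host after inference (a rough sketch with made-up module names, not our actual head):

import torch
import torch.nn as nn

class BoxHead(nn.Module):
    """Hypothetical head variant that skips the in-graph reshape/permute."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 4, kernel_size=1)

    def forward(self, x):
        # Previously: return self.conv(x).reshape(1, 4, -1).permute(0, 2, 1)
        return self.conv(x)  # keep [1, 4, H, W]; flatten/transpose on the host instead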

I do want to mention that --autocrop hangs on this model after unsuccessfully trying a few times to optimize it. I left it running overnight and came back to find the process still active, with nothing new printed to the console beyond the following output and stack trace:

Cores optimization: 206 (Done)
Flow optimization: 758 - 1076 (Failed)
Mapping again to resolve dataflow hazards …
Initial flow optimization: (Done) 1140 - 10545
Traceback (most recent call last):
  File "/home/yuri/mx/bin/mx_nc", line 7, in <module>
    sys.exit(main())
  File "memryx/neural_compiler/nc.py", line 2178, in memryx.neural_compiler.nc.main
  File "memryx/neural_compiler/nc.py", line 1424, in memryx.neural_compiler.nc.NeuralCompiler.run
  File "memryx/neural_compiler/nc.py", line 1660, in memryx.neural_compiler.nc.NeuralCompiler.map
  File "memryx/neural_compiler/mapper/map.py", line 974, in memryx.neural_compiler.mapper.map.MultiSweepMapper.run
  File "memryx/neural_compiler/mapper/map.py", line 891, in memryx.neural_compiler.mapper.map.run_one_sweep
  File "memryx/neural_compiler/mapper/map.py", line 331, in memryx.neural_compiler.mapper.map.Mapper.run
  File "memryx/neural_compiler/mapper/map.py", line 867, in memryx.neural_compiler.mapper.map.Mapper.__failed_exit
memryx.errors.ResourceError: Resource Mapping failed: Please try using more chips (-c , --num_chips).

We do want to optimize away the transpose / reshape ops in the box keypoint head for other reasons, so I'll try again once we've made that change. Meanwhile, if you're open to it, I can share the ONNX model with you, but I'd have to do so privately rather than posting a public link.

Hmm, that hang seems like an odd bug. Manually selecting a crop point with --inputs / --outputs might be necessary. I'll DM you about sharing the model (feel free to zero out the weights if that's important to you; we just need to look at the graph).