Hello,
I was able to assemble the heatsink on the MX3 module (thanks for the great video) and install the drivers (thanks for the great instructions).
Here is what “lspci -vk” reports:
(mx) ~/memryx$ lspci -vk
2e:00.0 Processing accelerators: Device 1fe9:0100
Subsystem: Device 1fe9:0000
Flags: bus master, fast devsel, latency 0, IRQ 46, NUMA node 0
Memory at a0000000 (32-bit, non-prefetchable) [size=256M]
Memory at b0000000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at <ignored> [disabled]
Capabilities: <access denied>
Kernel driver in use: memx_pcie_ai_chip
Kernel modules: memx_cascade_plus_pcie
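If the negotiated PCIe link is relevant here, I can also report the link status. A sketch of what I would run, assuming the device stays at address 2e:00.0 as shown above:

```shell
# Query the negotiated PCIe link speed/width for the MX3 module
# (2e:00.0 is the slot reported by lspci above)
sudo lspci -vv -s 2e:00.0 | grep -i 'LnkSta:'
```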
I was also able to install the MemryX SDK (again, thanks for the great instructions) and can successfully run the “Hello, MXA!” example:
(mx) ~/memryx$ mx_bench --hello
Hello from MXA!
Group: 0
Number of chips: 4
Interface: PCIe 3.0
When I run the “Hello, MobileNet!” example, however, the final benchmark step hangs forever:
(mx) ~/memryx$ python3 -c "import tensorflow as tf; tf.keras.applications.MobileNet().save('mobilenet.h5');"
(mx) ~/memryx$ mx_nc -v -m mobilenet.h5
2025-01-20 12:47:01.901202: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-20 12:47:01.912572: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-20 12:47:01.926487: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-20 12:47:01.930667: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-20 12:47:01.940631: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-20 12:47:02.727591: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-01-20 12:47:04.193382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1588 MB memory: -> device: 0, name: NVIDIA T400, pci bus id: 0000:21:00.0, compute capability: 7.5
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1737395225.449573 8130 gpu_backend_lib.cc:593] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-12.3
/usr/local/cuda
/home/albertabeef/mx/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
/home/albertabeef/mx/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
W0000 00:00:1737395225.725834 8130 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.727424 8132 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.730188 8129 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.731763 8134 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.734495 8131 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.737649 8133 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.739870 8128 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.741305 8127 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.751963 8130 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.753423 8132 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.756344 8129 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.757803 8134 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395225.759228 8131 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf.h5
17225924/17225924 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`.
[MemryX ASCII-art logo]
╔══════════════════════════════════════╗
║ Neural Compiler ║
║ Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝
════════════════════════════════════════
Anonymously share diagnostic data to support optimizing performance & enabling debug support (Y/N)?
Y
Selected: Yes
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1737395250.360234 8192 gpu_backend_lib.cc:593] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-12.3
/usr/local/cuda
/home/albertabeef/mx/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
/home/albertabeef/mx/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
W0000 00:00:1737395250.368492 8188 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.369851 8185 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.371372 8191 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.372728 8186 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.374126 8187 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.375482 8192 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.376827 8189 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.378235 8190 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.386866 8188 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.390006 8185 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.391396 8186 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.392748 8191 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
W0000 00:00:1737395250.394102 8187 gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
Converting Model: (Done)
Optimizing Graph: (Done)
Cores optimization: (Done)
Flow optimization: (Done)
. . . . . . . . . . . . . . . . . . . .
Ports mapping: (Done)
MPU 0 input port 0: {'model_index': 0, 'layer_name': 'input_layer', 'shape': [224, 224, 1, 3]}
MPU 3 output port 0: {'model_index': 0, 'layer_name': 'predictions', 'shape': [1, 1, 1, 1000]}
────────────────────────────────────────
Assembling DFP: (Done)
════════════════════════════════════════
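As an aside: I assume the CUDA/ptxas warnings above are harmless, since the compile finishes, but to rule them out I could re-run the compile with the GPU hidden from TensorFlow (this uses the standard CUDA environment variable, nothing MemryX-specific):

```shell
# Hide the NVIDIA T400 from TensorFlow/XLA so the compile runs CPU-only
CUDA_VISIBLE_DEVICES="" mx_nc -v -m mobilenet.h5
```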
(mx) ~/memryx$ mx_bench -v -d mobilenet.dfp -f 1000
[MemryX ASCII-art logo]
╔══════════════════════════════════════╗
║ Benchmark ║
║ Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝
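The benchmark prints the banner above and then never returns. In case kernel-side logs would help, I can also capture dmesg output after the hang; a sketch, assuming the memx driver logs under its module name:

```shell
# Look for messages from the MemryX PCIe driver around the time of the hang
sudo dmesg | grep -i memx
```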
Any idea what could be causing this?
Cheers!
Mario (AlbertaBeef)