Planning a governed multi-node edge setup around MX3
Using a bit of hardware downtime to write down the direction I'm heading next.
The goal is to move beyond "run inference and hope" toward a governed edge runtime: one where execution is explicitly controlled, outcomes are verifiable, and enough traceability is preserved to reason about mismatches or failure states after the fact.
This is still early and not production-ready. The immediate focus is defining the next proof boundary without overcommitting to the wrong topology.
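To make "governed" concrete, here's a minimal sketch of the execution boundary I have in mind, in plain Python. Everything in it is hypothetical: `governed_run`, the `trace.jsonl` log, and the `infer` callable (a stand-in for whatever the MX3 runtime actually exposes) are placeholders for the shape of the idea, not an implementation.

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict
from typing import Callable, Optional, Tuple

@dataclass
class ExecutionRecord:
    model_id: str
    input_sha256: str
    output_sha256: str
    started_at: float
    duration_s: float
    status: str

def governed_run(model_id: str,
                 infer: Callable[[bytes], bytes],
                 payload: bytes) -> Tuple[Optional[bytes], ExecutionRecord]:
    """Run one inference call under an explicit boundary: hash what went in,
    hash what came out, and append a record so the run can be audited later."""
    started = time.time()
    output: Optional[bytes] = None
    status = "ok"
    try:
        output = infer(payload)
    except Exception:
        # Swallowed here for the sketch; a real boundary would re-raise
        # or route the failure to a verification node.
        status = "failed"
    record = ExecutionRecord(
        model_id=model_id,
        input_sha256=hashlib.sha256(payload).hexdigest(),
        output_sha256=hashlib.sha256(output).hexdigest() if output is not None else "",
        started_at=started,
        duration_s=time.time() - started,
        status=status,
    )
    # Append-only trace log: enough to replay a disputed run from its input hash.
    with open("trace.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return output, record
```

The point is just that every run leaves behind an input hash, an output hash, and a status, so a mismatch or failure state can be reasoned about after the fact instead of guessed at.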
The hardware direction I’m exploring is a small multi-node edge fabric:
- MX3 as the primary accelerator path
- eventual scaling to multiple MX3 modules for device-aware scheduling and failover testing
- dual-port QSFP+/40GbE links between nodes for higher-throughput data movement
- a three-node layout where execution, verification, and state can be separated (sketched as data just below)
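Before committing to hardware, I've been writing the layout down as data so the role boundaries stay explicit. The node names, module counts, and port counts below are placeholders from my notes, not a real configuration:

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    EXECUTION = "execution"        # hosts the MX3 module(s), runs inference
    VERIFICATION = "verification"  # re-checks outputs/hashes out of band
    STATE = "state"                # holds trace logs, manifests, model versions

@dataclass
class Node:
    name: str
    role: Role
    mx3_modules: int   # 0 on nodes without an accelerator
    qsfp_ports: int    # dual-port QSFP+/40GbE per node in this sketch

TOPOLOGY = [
    Node("edge-exec-0",   Role.EXECUTION,    mx3_modules=1, qsfp_ports=2),
    Node("edge-verify-0", Role.VERIFICATION, mx3_modules=0, qsfp_ports=2),
    Node("edge-state-0",  Role.STATE,        mx3_modules=0, qsfp_ports=2),
]
```

Writing it this way also makes the scaling question concrete: a second MX3 either lands in `edge-exec-0` (density, shared PCIe lanes) or in a new execution node (isolation, more cabling), and the two answers imply different enclosure and cooling choices.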
What I’m trying to better understand is how others are approaching MX3 beyond single-host / single-device configurations.
Specifically interested in practical experience around:
- running multiple MX3 modules within a single system vs distributing across nodes
- PCIe layout and lane allocation strategies
- cooling and enclosure considerations at higher densities
- host-to-host movement of frames or inference artifacts
- maintaining observability and reproducibility in accelerator-backed workflows (see the manifest sketch after this list)
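On the last two points, the direction I'm leaning toward is content-addressing every artifact before it crosses a link, so the receiving node can verify integrity and reconstruct the conditions it was produced under. A rough sketch, with made-up node names and model version:

```python
import hashlib
import json

def artifact_manifest(frame: bytes, model_version: str, node: str) -> dict:
    """Content-address an inference artifact before it leaves the host."""
    return {
        "sha256": hashlib.sha256(frame).hexdigest(),
        "size_bytes": len(frame),
        "model_version": model_version,
        "produced_on": node,
    }

def verify_artifact(frame: bytes, manifest: dict) -> bool:
    # Receiving side: recompute the hash and compare before trusting the payload.
    return hashlib.sha256(frame).hexdigest() == manifest["sha256"]

# Sender writes the manifest alongside the frame; receiver checks it on arrival.
frame = b"\x00" * 1024  # stand-in for a real frame or tensor dump
m = artifact_manifest(frame, model_version="resnet50-int8-v3", node="edge-exec-0")
print(json.dumps(m, indent=2))
assert verify_artifact(frame, m)
```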
I’m intentionally avoiding overbuilding until I have a clearer understanding of the constraints. The goal is to make the next hardware step reinforce a clean execution boundary, not just scale capacity.
If anyone has hands-on experience with multi-MX3 or multi-node edge setups, I’d appreciate any lessons learned.