Friday, May 17, 2024

MemryX MX3: Edge AI Accelerator with 5+ TFLOPS Performance

The MemryX MX3 is a low-power AI accelerator designed specifically for edge computing.

Last month, Jean-Luc discussed the DeGirum ORCA M.2 and USB AI accelerator, and today we'll take a look at the MemryX MX3 and its modules, which can run computer vision neural networks using common frameworks like TensorFlow, TensorFlow Lite, ONNX, PyTorch, and Keras.

Unveiling the MemryX MX3 Specifications

MemryX hasn't revealed much about the chip's performance beyond stating that the MX3 offers over 5 TFLOPS. The listed specifications are:

  • Activations: Bfloat16
  • Weights: 4, 8, and 16-bit
  • Batch size: 1
  • On-die storage: ~10M parameters
  • Host interfaces: PCIe Gen 3 I/O and/or USB 2.0/3.x
  • Power consumption: ~1.0 W
  • 1-click compilation with the MX SDK for mapping multi-layer neural networks
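
To put the on-die capacity in perspective, the arithmetic below estimates how much raw storage ~10M weights occupy at each supported weight width. It is only a rough sketch: the function name is ours, and it ignores activations, metadata, and any compression the hardware may apply, none of which MemryX has detailed.

```python
def weight_storage_mib(num_params: int, bits_per_weight: int) -> float:
    """Return the raw storage needed for the weights alone, in MiB."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / (1024 * 1024)


# "~10M parameters stored on-die" from the spec list above.
ON_DIE_PARAMS = 10_000_000

# The three weight widths listed in the specifications.
for bits in (4, 8, 16):
    size = weight_storage_mib(ON_DIE_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.1f} MiB")
```

Halving the weight width halves the storage, which is why the choice between 4, 8, and 16-bit weights matters for how large a model fits on a single chip.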

What sets the MX3 apart is its MemryX Compute Engines (MCE), which are tightly coupled with at-memory computing. MemryX says this native, proprietary dataflow architecture reaches up to 70% chip utilization with 1-click compilation. By comparison, traditional CPUs, GPUs, and DSPs, which rely on legacy instruction sets and control-flow architectures, reach only 15-30% utilization even after software tuning.
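
The practical effect of utilization can be sketched with simple arithmetic: sustained throughput is roughly peak throughput multiplied by utilization. The snippet below applies the quoted percentages to the same 5 TFLOPS figure purely for illustration. Treating "over 5 TFLOPS" as a peak value, and applying the control-flow percentages to a chip with the same peak, are our assumptions, not MemryX measurements.

```python
# "over 5 TFLOPS" per MemryX; treated here as a peak figure (assumption).
PEAK_TFLOPS = 5.0


def effective_tflops(peak: float, utilization: float) -> float:
    """Estimate sustained throughput as peak throughput times utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


# Utilization figures quoted in the paragraph above.
scenarios = [
    ("dataflow, up to 70%", 0.70),
    ("control flow, 15%", 0.15),
    ("control flow, 30%", 0.30),
]

for label, utilization in scenarios:
    print(f"{label}: ~{effective_tflops(PEAK_TFLOPS, utilization):.2f} TFLOPS")
```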

MemryX MX3: More Than Just a Chip

The MX3 comes in a few forms: a bare chip die, a single chip package, or modules (mini PCIe or M.2) with one or more MemryX MX3 chips.

MX3 EVB (Evaluation Board): The evaluation board carries four MX3 chips, each in its own single-chip package. Multiple EVBs can be cascaded behind a single interface to reach the required inference performance.
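
As a back-of-the-envelope sizing aid, the sketch below estimates how many MX3 chips and four-chip EVBs a model might need. It assumes that all parameters must fit on-die and that capacity scales linearly with chip count; the source does not confirm either, and the function names are ours.

```python
PARAMS_PER_CHIP = 10_000_000  # "~10M parameters stored on-die" per MX3
CHIPS_PER_EVB = 4             # the EVB carries four MX3 chips


def chips_needed(model_params: int) -> int:
    """Smallest chip count whose combined on-die capacity holds the model."""
    if model_params <= 0:
        raise ValueError("model_params must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (model_params + PARAMS_PER_CHIP - 1) // PARAMS_PER_CHIP


def evbs_needed(model_params: int) -> int:
    """Smallest number of four-chip EVBs that provides enough chips."""
    return (chips_needed(model_params) + CHIPS_PER_EVB - 1) // CHIPS_PER_EVB


# Hypothetical model sizes, chosen only to illustrate the calculation.
for params in (8_000_000, 25_000_000, 45_000_000):
    print(f"{params:,} parameters: {chips_needed(params)} chip(s), "
          f"{evbs_needed(params)} EVB(s)")
```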

MemryX MX3: Software Development Kit (SDK)

The MX SDK assists in the simulation and deployment of trained AI models. MemryX builds its products to:

  • Deliver the best performance per watt
  • Run models trained on popular frameworks without requiring software changes or retraining
  • Provide high scalability and granularity
  • Run AI models with the same performance on any host processor regardless of system load
  • Share the same 1-click SDK (compiler software)

The developer hub of this SDK consists of a compiler (for graph processing, mapping, and assembly), utility tools (bit accuracy simulator, performance analyzer, profiler, chip helper tool, and application template), and a runtime environment with APIs, OS drivers, and runtime dataflow.

MX3 SDK and Edge Impulse: You can use the MX3 EVB with Edge Impulse deployments after installing the dependencies: Python 3.8+, the MemryX tools and drivers, and Edge Impulse for Linux. Then connect the board to Edge Impulse and verify the connection by opening your project and clicking "Devices".
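
Before connecting the board, a small preflight script can confirm that the prerequisites are in place. The sketch below checks the Python version and looks for command-line tools on the PATH. The `preflight` helper is hypothetical, and only `edge-impulse-linux` (the Edge Impulse for Linux CLI) is checked by default; the MemryX tool names are not given in this article, so add them from the MX SDK documentation.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # "Python 3.8+" from the dependency list above


def preflight(commands):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    results = {"python>=3.8": sys.version_info[:2] >= MIN_PYTHON}
    for command in commands:
        # shutil.which returns None when the command is not on the PATH.
        results[command] = shutil.which(command) is not None
    return results


if __name__ == "__main__":
    # Extend this list with the MemryX tool names from the SDK docs.
    for name, ok in preflight(["edge-impulse-linux"]).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```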

MemryX MX3: Promising Performance

While the company hasn't provided many details about the chip's performance, it has uploaded a demo video that uses a virtual camera input from AirSim, simulation software used to create datasets for autonomous vehicles and drones. The video compares a computer with an MX3 M.2 module to a computer with an NVIDIA 4060 GPU.

In the demo, latency is very low with the MX3 module but increases drastically when the workload switches to the NVIDIA 4060 GPU. The video also highlights the noise generated by the GPU's cooling fan.

Want to learn more? You can visit the official MemryX website.
