FuriosaAI’s RNGD is a new AI inference chip designed for high-performance data centers and capable of efficiently handling large language models. It delivers up to 512 TFLOPS performance with a low power consumption of 150–200W. The chip features a tensor contraction processor (TCP) for efficient operations, supported by 48GB of HBM3 memory. It is sampling now, with broader availability expected in early 2025.
What do we think? FuriosaAI is proving that there’s room for innovation in the rather congested AI hardware space with a non-MatMul, tensor contraction processor (TCP)-based architecture. But it’s not innovation that matters; it’s the trinity of performance, power efficiency, and price. The claimed numbers would make RNGD competitive in the first two of those areas; cost has yet to be disclosed. Also to be seen is the software complexity of replacing MatMul-focused solutions with the TCP, which uses a higher level of abstraction. ONNX and PyTorch are supported today through translation.
The silicon RNGD
AI hardware competition is dauntingly fierce. But every so often, a new contender enters the ring that looks like it could go a few rounds, and maybe even make it to the final bell. Enter FuriosaAI’s latest brainchild, debuted by FuriosaAI’s CEO, June Paik, at Hot Chips 2024: RNGD, an AI accelerator that seeks to be a high-performance computing champion. And yes, it’s pronounced “renegade” because, of course, it is.
RNGD is FuriosaAI’s new AI inference chip, designed for high-performance data centers. It’s poised to handle large language model (LLM) and multi-modal model inference with high efficiency. A single RNGD PCIe card delivers a claimed throughput of 2,000 to 3,000 tokens per second (depending on context length) for models with around 10 billion parameters. Peak RNGD performance is a claimed 512 TFLOPS at the typical FP8 precision (BF16, INT8, and INT4 are also supported).
The RNGD is designed to be fast. But FuriosaAI also has a penchant for sustainable computing, with a TDP in the low hundreds of watts (150–200W) compared to the thousands of watts required for equivalent GPU performance.
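To put the efficiency claim in one number, the sketch below divides the claimed peak FP8 throughput by each end of the quoted TDP range. Both inputs are vendor claims from this article, not measurements, and peak TFLOPS divided by TDP is only a rough upper bound; sustained inference efficiency will be lower and depends on the workload.

```python
# Claimed figures from the article; nothing here is measured.
PEAK_TFLOPS_FP8 = 512.0
TDP_WATTS = (150.0, 200.0)  # low and high ends of the quoted TDP range


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak throughput divided by TDP, a rough upper bound on efficiency."""
    return tflops / watts


# Map each TDP figure to the resulting peak TFLOPS per watt.
efficiency = {watts: tflops_per_watt(PEAK_TFLOPS_FP8, watts) for watts in TDP_WATTS}

for watts, value in efficiency.items():
    print(f"{watts:.0f} W -> {value:.2f} TFLOPS/W")
```

Under those assumptions, the claimed peak works out to roughly 2.6 to 3.4 TFLOPS per watt.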
The RNGD chip is born from the combined talents of engineers from AMD, Qualcomm, and Samsung. The company’s first-generation chip, which reached silicon samples in 2021, posted a claimed 113% MLPerf performance increase in its next submission through compiler enhancements. That chip is now shipping in small volume in South Korea, and it enabled the company to raise about US $60 million for RNGD development. And now, with RNGD, the company says it will smash records.
What makes RNGD the new hot thing?
- A non-MatMul, tensor contraction processor (TCP)-based architecture that enables a balance of efficiency, programmability, and performance.
- Programmability through a robust compiler, co-designed with and optimized for the TCP, that treats entire models as single fused operations.
- Efficiency, with a TDP of 150W compared to 1,000W-plus for leading GPUs.
- High performance, with 48GB of HBM3 memory delivering the ability to run models like Llama 3.1 8B efficiently on a single card.
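The single-card claim for Llama 3.1 8B can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates the weight footprint at each precision the article lists and the memory left over on a 48GB card. It assumes exactly 8 billion parameters and decimal gigabytes, and it counts weights only; the KV cache, activations, and runtime overhead would consume part of the remaining headroom.

```python
CARD_MEMORY_GB = 48.0   # HBM3 capacity quoted in the article
PARAMS_BILLIONS = 8.0   # assumption: "8B" taken as exactly 8e9 parameters

# Bytes per weight for the precisions the article lists as supported.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}


def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights only, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


footprint = {
    name: weight_footprint_gb(PARAMS_BILLIONS, size)
    for name, size in BYTES_PER_PARAM.items()
}
# Memory left for KV cache, activations, and runtime overhead.
headroom = {name: CARD_MEMORY_GB - gb for name, gb in footprint.items()}

for name in BYTES_PER_PARAM:
    print(f"{name}: weights {footprint[name]:.0f} GB, headroom {headroom[name]:.0f} GB")
```

Even at BF16, the weights take about a third of the card under these assumptions, which is consistent with the claim that an 8B-class model fits comfortably on one RNGD.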
Industry giants like Supermicro and GUC are already singing RNGD’s praises. Supermicro SVP Vik Malyala pointed out that integrating RNGD tech into their systems significantly cuts power consumption while still delivering top-tier performance. GUC’s CMO Aditya Raina called RNGD “the most efficient AI inference chips in the industry.” High praise from some serious players.
FuriosaAI says RNGD is an ideal processor for neural networks because tensor contraction is a more natural abstraction than the matrix multiplication that current solutions are built around. We don’t disagree, but FuriosaAI will be relying on its compiler to translate existing ONNX and PyTorch models to the TCP for quite some time, and results may vary. A lower-level abstraction is also available to advanced users and will surely deliver the best performance. Reportedly, over two-thirds of FuriosaAI engineers are focused on the software problem.
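For readers unfamiliar with the term, a tensor contraction sums over an index shared by two tensors; matrix multiplication is the special case where both tensors are two-dimensional. The pure-Python sketch below is only an illustration of that mathematical relationship, not FuriosaAI’s API, compiler, or hardware mapping. It contracts a hypothetical (batch, sequence, feature) activation tensor with a weight matrix directly, then shows the flatten, multiply, and reshape steps that a MatMul-centric approach would need to reach the same result. All function names and tensor shapes are made up for the example.

```python
from itertools import product


def contract(x, w):
    """Contract x[b][s][k] with w[k][n] over the shared index k -> out[b][s][n]."""
    batch, seq, inner, cols = len(x), len(x[0]), len(w), len(w[0])
    out = [[[0] * cols for _ in range(seq)] for _ in range(batch)]
    for b, s, n in product(range(batch), range(seq), range(cols)):
        out[b][s][n] = sum(x[b][s][k] * w[k][n] for k in range(inner))
    return out


def matmul(a, b):
    """Plain 2-D matrix multiply."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def contract_via_matmul(x, w):
    """Same result, but flatten (b, s) into rows first, then restore the shape."""
    seq = len(x[0])
    flat = [row for mat in x for row in mat]   # shape (b*s, k)
    prod = matmul(flat, w)                     # shape (b*s, n)
    return [prod[i:i + seq] for i in range(0, len(prod), seq)]


x = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]  # shape (2, 3, 2)
w = [[1, 0, 2], [0, 1, 3]]                                   # shape (2, 3)

direct = contract(x, w)
lowered = contract_via_matmul(x, w)
print(direct == lowered)
```

Both paths give identical numbers; the difference is that the contraction form keeps the tensor’s original shape visible, which is the kind of higher-level structure FuriosaAI says its architecture and compiler are built around.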
Early-access customers are sampling the chip now, with broader availability scheduled for early 2025. At Hot Chips, the company gave a hands-on look at the fully functioning RNGD card.