SambaNova, the Palo Alto, California-based generative AI company delivering highly efficient AI chips and fast models, today announced the availability of Meta’s Llama 4 Maverick model on SambaNova Cloud — setting a new industry benchmark with an inference speed of 655 tokens per second.
“This is the fastest output speed we have measured yet for Llama 4 Maverick, and it is several times faster than the fastest speeds achieved so far on GPUs. Llama 4 Maverick is Meta’s strongest model yet and a top choice for a wide range of workloads,” said Micah Hill-Smith, CEO & Co-Founder of Artificial Analysis.
Multimodal model
Llama 4 Maverick, Meta’s flagship multimodal model with 400 billion total parameters (17 billion active per token) across 128 experts, outperforms industry competitors such as GPT-4o and Gemini 2.0 Flash on multilingual and visual understanding benchmarks.
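The parameter figures above reflect a sparse mixture-of-experts (MoE) design: a router activates only a few experts per token, so just a fraction of the total parameters does work on each token. The toy sketch below illustrates the idea with the cited numbers; the routing function and gate values are hypothetical for illustration only, not Meta’s or SambaNova’s implementation.

```python
# Toy illustration of mixture-of-experts (MoE) sparsity, using the
# Llama 4 Maverick figures cited above (400B total, 17B active, 128 experts).
# The router below is a simplified stand-in, not the actual model code.

TOTAL_PARAMS_B = 400   # total parameters, in billions
ACTIVE_PARAMS_B = 17   # parameters active per token, in billions
NUM_EXPERTS = 128

# Only a small fraction of the model runs for any given token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active per token: {active_fraction:.1%} of total parameters")

def top_k_experts(gate_logits, k=1):
    """Pick the k highest-scoring experts for one token (toy router)."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    return ranked[:k]

# Made-up gate scores for 8 of the experts; the token is routed
# to the two experts with the highest scores.
logits = [0.1, 2.3, -0.5, 1.7, 0.0, 3.1, -1.2, 0.4]
print("Routed to experts:", top_k_experts(logits, k=2))
```

Because per-token compute scales with the active parameters rather than the total, this design is one reason such a large model can reach high token-per-second throughput on efficient hardware.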
Utilizing SambaNova’s advanced Reconfigurable Dataflow Unit (RDU) chips, Maverick on SambaNova Cloud delivers industry-leading performance with unparalleled computational efficiency.
“We’re thrilled to partner with Meta, bringing the fastest inference speeds ever recorded for Llama 4 Maverick to developers and enterprises,” stated Rodrigo Liang, CEO and co-founder of SambaNova. “This collaboration marks a significant leap forward in the efficiency and capability of multimodal AI models.”