PowerInfer-2 Runs a Mixtral 8x7B Variant at 11 Tokens per Second on a Smartphone

An update shared in a social media thread announces PowerInfer-2, a redesigned version of the previously released PowerInfer framework. Demonstrated on a OnePlus 12 with 24 GB of DRAM, it is reportedly the first system to run the TurboSparse-Mixtral-47B model on a smartphone, at approximately 11.68 tokens per second. The approach relies on TurboSparse, a ReLU-finetuned variant of Mixtral, and builds on earlier methods described in the accompanying academic papers.
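For intuition on why ReLU fine-tuning matters here, below is a minimal sketch, not PowerInfer-2's actual kernels: ReLU zeroes out most FFN activations, so the down-projection only has to touch the weight columns of the few active neurons. The dimensions, weight scale, and the negative bias used to induce sparsity are all illustrative assumptions, not the real Mixtral configuration.

```python
import numpy as np

# Illustrative sizes only, far smaller than the real Mixtral FFN.
d_model, d_ffn = 1024, 4096
rng = np.random.default_rng(0)

W_up = rng.standard_normal((d_ffn, d_model)) * 0.02
W_down = rng.standard_normal((d_model, d_ffn)) * 0.02
x = rng.standard_normal(d_model)

# A negative bias pushes most pre-activations below zero, mimicking
# the high activation sparsity that ReLU fine-tuning induces.
b_up = -1.0
h = np.maximum(W_up @ x + b_up, 0.0)

# Sparse path: find the active neurons and touch only their weights.
active = np.nonzero(h)[0]
y_sparse = W_down[:, active] @ h[active]

# Dense path for comparison: reads every weight, same result.
y_dense = W_down @ h
assert np.allclose(y_dense, y_sparse)

print(f"active neurons: {len(active)}/{d_ffn} "
      f"({len(active) / d_ffn:.1%} of down-projection weights needed)")
```

On a memory-constrained device, skipping the inactive columns means far fewer weight bytes read per token, which is where most of the speedup on a bandwidth-bound phone comes from.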

I think PowerInfer-2 is a neat step. 11 tokens/sec on a phone is impressive, even if not perfect. It's clear the tech is evolving, and future tweaks might boost both speed and efficiency. Excited to see what comes next in mobile AI!

The performance of the TurboSparse-Mixtral model on mobile hardware is an intriguing development. From my evaluation of similar mobile inference tools, achieving roughly 11 tokens per second in a real-world setting signifies a carefully balanced trade-off between resource consumption and model quality. The ReLU fine-tuning appears to offer noticeable improvements in energy efficiency and processing speed, both critical for smartphones with limited memory. This advancement points to promising directions for future iterations of mobile inference systems, where continued optimization may yield even more efficient models without compromising quality.
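To make that trade-off concrete, here is a rough back-of-envelope sketch. Every number in it (quantization width, effective DRAM bandwidth, fraction of weights touched per token) is an assumed placeholder, not a figure from the paper; the point is only that decode throughput on a bandwidth-bound phone scales with how few weight bytes each token has to read.

```python
# Back-of-envelope decode-speed estimate under assumed numbers.
params = 47e9           # TurboSparse-Mixtral-47B parameter count
bytes_per_param = 0.5   # assuming 4-bit quantized weights
weight_bytes = params * bytes_per_param

# Assumed effective DRAM bandwidth for a flagship phone SoC (bytes/s).
bandwidth = 60e9

# Dense decode upper bound: every weight read once per token.
dense_tps = bandwidth / weight_bytes

# Sparse decode: MoE routing plus ReLU activation sparsity mean only
# a fraction of weights are touched per token (fraction is a guess).
active_fraction = 0.15
sparse_tps = bandwidth / (weight_bytes * active_fraction)

print(f"dense upper bound:  {dense_tps:.1f} tok/s")   # ~2.6 tok/s
print(f"sparse upper bound: {sparse_tps:.1f} tok/s")  # ~17 tok/s
```

Under these assumptions, a dense 47B model could never reach double-digit tokens per second on phone-class bandwidth, which is why a sparsity-exploiting design is what makes the reported ~11.68 tokens/sec plausible.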

Hey, just caught this update and it's pretty neat! I'm wondering if further tweaks might push speeds even higher. Has anyone tried comparing similar mobile models? Curious how different fine-tuning might change the results. What do you all reckon?