MLPerf Client: A New Benchmark Suite for Desktop AI

Introduction

MLCommons, the organization behind the MLPerf family of machine learning benchmarks, is developing a new benchmark suite aimed squarely at desktop AI. Called MLPerf Client, the suite will evaluate the AI performance of traditional desktop PCs, workstations, and laptops. The initial version will be built around Meta's Llama 2 LLM, with the first release targeting Windows.

Goals of the MLPerf Client Benchmark

The MLPerf Client working group is focused on developing a benchmark that fits the characteristics of client PCs: appropriately sized for these devices and representative of real-world client AI workloads. The aim is to produce meaningful, actionable results that help users make informed decisions about their AI hardware and software.

Technical Details

The first version of the MLPerf Client benchmark will be based on Meta's Llama 2 large language model (LLM), which is already used in other MLPerf benchmarks. The working group has settled on the 7 billion parameter version of the model, Llama-2-7B, a size chosen to balance model capability against the memory and compute available on client PCs.
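
To put that size in perspective, a rough back-of-the-envelope calculation (mine, not MLCommons'; the precision and quantization formats below are assumptions for illustration, as the group has not said how the model will be packaged) shows why a 7B-parameter model is a plausible fit for laptops and desktops:

    # Approximate weight-storage footprint of a 7B-parameter model.
    # Formats below are illustrative assumptions, not MLPerf Client specifics.
    PARAMS = 7_000_000_000

    bytes_per_weight = {
        "FP16": 2.0,   # half precision
        "INT8": 1.0,   # 8-bit quantization
        "INT4": 0.5,   # 4-bit quantization
    }

    for fmt, size in bytes_per_weight.items():
        gib = PARAMS * size / 2**30
        print(f"{fmt}: ~{gib:.1f} GiB of weights")

    # Output: FP16 ~13.0 GiB, INT8 ~6.5 GiB, INT4 ~3.3 GiB -- quantized,
    # a 7B model fits within a typical client PC's memory budget.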

The working group is still finalizing the benchmark's specifics, starting with identifying the ML workloads most representative of client devices. Once those tasks are defined, the group will integrate them into a user-friendly graphical benchmark application.

APIs and Runtimes

The MLPerf Client benchmark will support a range of commonly used and vendor-specific backends, similar to other desktop client AI benchmarks such as UL's Procyon AI suite, which lets users choose between multiple execution backends.
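
MLCommons has not said which runtimes or APIs the benchmark will ship with. As a purely illustrative sketch of what pluggable backend selection can look like on Windows today, the snippet below uses ONNX Runtime's execution providers (DirectML with a CPU fallback); the model filename is a placeholder, not an MLPerf Client artifact:

    # Illustrative only: backend selection via ONNX Runtime execution providers.
    import onnxruntime as ort

    def create_session(model_path: str) -> ort.InferenceSession:
        # Prefer an accelerator-backed provider when present, fall back to CPU.
        preferred = ["DmlExecutionProvider", "CPUExecutionProvider"]
        available = ort.get_available_providers()
        providers = [p for p in preferred if p in available]
        return ort.InferenceSession(model_path, providers=providers)

    # "llama-2-7b.onnx" is hypothetical; MLPerf Client's packaging is still TBD.
    session = create_session("llama-2-7b.onnx")
    print("Active execution providers:", session.get_providers())

Vendor-specific backends would slot into the same pattern by adding their own providers to the preference list.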

Hardware

The hardware used to execute the benchmark is left open. While the benchmark is explicitly targeting the emerging class of NPUs (Neural Processing Units), vendors are free to use GPUs and CPUs as they see fit, keeping the benchmark relevant across a wide range of hardware configurations.

Industry Support

The MLPerf Client working group has the backing of major industry players, including Intel, AMD, NVIDIA, Arm, Qualcomm, Microsoft, and Dell. Broad vendor support of this kind was instrumental in driving the acceptance and adoption of MLPerf for servers, and it should play a similar role in establishing MLPerf Client.

Comparison with Existing Benchmarks

The MLPerf Client benchmark will join established players in the AI benchmarking arena, such as UL's Procyon AI benchmark and Primate Labs' Geekbench ML, both of which already cover Windows client AI workloads. MLCommons is betting that its collaborative, open, industry-backed approach will set MLPerf Client apart from existing tools.

Future Plans

The initial release of the MLPerf Client benchmark is only a first step. The working group intends for the benchmark to evolve over time, expanding to platforms beyond Windows and incorporating a wider range of workloads as client AI matures.

Conclusion

The MLPerf Client benchmark is a significant step for AI benchmarking on client devices. If it gains the same traction as MLPerf's server benchmarks, it will give researchers, developers, and consumers a common, vendor-neutral way to evaluate client AI performance and to make informed decisions about their AI hardware and software.