The 2024 AI HW Summit: Here’s What Caught My Attention

The Summit drew over 1000 attendees this year, with scores of presentations and hundreds of AI leaders from large companies as well as many startups.

Every September since 2019, the AI HW Summit in the Bay Area has been the focal point for new technologies around AI. While the event, hosted by UK-based Kisaco, started with semiconductors, it has continually expanded its focus to include software, models, networking, and full data center optimization. Next year it will return as the AI Infra Summit, acknowledging that AI has become a full-stack endeavor that consumes entire data centers.

It may surprise some that Nvidia did not present at the event. They don’t see the need, since everyone knows who they are and how fast their GPUs are.

Here are a few insights from the event.

A Food Fight Erupted Over the Claim, “Fastest Inference on the Planet”

It seems the battle over inference services is really heating up, with Cerebras, Groq, and SambaNova all claiming to offer the fastest tokens-as-a-service available. Now, I am fairly confident that nobody is lying here, but let’s just say each company is cherry-picking the size of the Llama 3.1 model it wants to tout. They are mostly referencing tests run by Artificial Analysis, which posts results on its website.

Here are Cerebras’ benchmark results:

Cerebras claims it is the fastest for inference using Llama 3.1 8B and 70B

Artificial Analysis

And SambaNova claims the fastest 405B-parameter Llama 3.1. There was a lot of discussion about why Groq and Cerebras have not (yet) run this model. It could be that they don’t have enough SRAM on their systems to do as well. Or they just haven’t had enough time. (Note: OctoAI is reportedly in acquisition discussions with Nvidia.)

SambaNova claims THEY are the fastest…

SambaNova

And here are Groq’s results. Groq seems to be picking up steam, and landed a $640M investment from BlackRock and others. The company’s development cloud has quickly grown to over 360,000 developers building AI on GroqCloud. Groq also landed a large deal with Aramco to build a giant data center in Saudi Arabia that could grow to some 200,000 Language Processing Units.

But in larger models like Llama 3.1 70B, Groq claims leadership

Artificial Analysis

So, I went to the AA website (no, not THAT AA, though perhaps I should) and found a very interesting chart that proclaims Cerebras the winner of 70B in performance and price per million tokens at just under 50 cents.

As you can see, Cerebras wins the 70B Instruct model performance crown and the best price per 1M tokens.

Artificial Analysis

Confused? So am I. But here’s the deal. Artificial Analysis runs a wide variety of models on whatever hardware a model service provider uses. They don’t do any tuning, and Nvidia is only represented by the providers who use an unspecified Nvidia GPU. They don’t disclose how many accelerators were used in the runs, nor which lower-level software was used.
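To make that caveat concrete, here is a minimal sketch in Python of why the undisclosed accelerator count matters. Every name and number in it is a hypothetical placeholder, not Artificial Analysis data, and dividing throughput by chip count is only a crude normalization. The point is simply that a headline tokens-per-second figure cannot be turned into a per-chip comparison when the count is missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceResult:
    """One row of a hypothetical inference leaderboard (illustrative only)."""
    provider: str                       # hypothetical provider label
    output_tokens_per_s: float          # headline output speed
    price_per_m_tokens_usd: float       # price per 1M tokens
    accelerators: Optional[int] = None  # None when the count is undisclosed


def tokens_per_s_per_accelerator(result: InferenceResult) -> Optional[float]:
    """Crude per-chip normalization; returns None if the count is undisclosed."""
    if result.accelerators is None:
        return None
    if result.accelerators <= 0:
        raise ValueError("accelerator count must be positive")
    return result.output_tokens_per_s / result.accelerators


def cost_usd(result: InferenceResult, tokens: int) -> float:
    """Cost of processing `tokens` tokens at the listed per-1M-token price."""
    return result.price_per_m_tokens_usd * tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    rows = [
        InferenceResult("provider_a", 400.0, 0.50, accelerators=8),
        InferenceResult("provider_b", 400.0, 0.50),  # count undisclosed
    ]
    for row in rows:
        per_chip = tokens_per_s_per_accelerator(row)
        shown = "undisclosed" if per_chip is None else f"{per_chip:.1f} tokens/s per chip"
        print(f"{row.provider}: {shown}, ${cost_usd(row, 2_000_000):.2f} per 2M tokens")
```

Both hypothetical providers post the same speed and price, yet only one of them can be compared on a per-chip basis.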

Inference will become a larger market than training AI, and all three of these companies have demonstrated a massive leap forward in lowering the costs of using AI in real-world applications. Nice job! Now, if you could just publish some MLPerf results, we’d all feel better. AA provides a great service, but does not replace benchmarks run by the hardware providers themselves such as MLPerf, whose benchmarks are all peer-reviewed prior to release.

Optical is the Next Big Thing

How many times have we heard that? It’s always coming “soon.” Yes, optical interconnects are widely used for rack-to-rack connectivity in modern data centers to get around the length limitations of copper and the need for retimers. But optical is rarely used within a rack, where the cable lengths are not a problem for the cheaper copper solutions.

The Celestial optical fabric can be used for system-to-HBM memory and system-to-system connectivity.

Celestial AI

But that may be about to change. Celestial AI is developing an elegant and performant design they were touting at the conference. Their approach could help solve the “memory wall” GPUs contend with today, by providing access to over 33 TB of shared HBM memory space. They claim they can lower costs by over 25 times, power by 8 times, and RDMA latency by 5 times, all while providing over 4 times better bandwidth. We will be watching these guys closely as they finish engineering their 1st generation.

Celestial AI’s Fabric can reduce costs by over 20-fold, power by 8-fold, and latency by 5-fold according to the company.

Celestial AI

What Ever Happened to Analog Computing?

There is a lot of research going on at IBM, Intel, and elsewhere to develop a performant analog in-memory compute solution. It looks great in PowerPoint, but the D-to-A converters add latency, and the size of memory is not conducive to running the LLMs that are driving billions of dollars of investment these days.

Enter Mentium, a startup out of UC Santa Barbara that is building a platform combining a digital processor with an in-memory-compute analog processor, which the company believes provides the best of both worlds.

Mentium combines an analog processor for large kernels with a digital processor for oft-used kernels for better power efficiency in Edge AI.

Mentium

As an important aside, Mentium switched from in-house EDA tool hosting to the Synopsys Cloud, hosted on Microsoft Azure. The switch saved the company months of development time and costs, while reducing the complexity it was facing with on-prem EDA tools.

Synopsys offers its cloud-based EDA development environment to help startups design chips, including Mentium.

Synopsys

The Mega-NIC From Enfabrica Is Coming Soon

One of Nvidia’s greatest assets is NVLink, which interconnects up to 512 GPUs at 100 GB/s per link and is 14 times faster than PCIe. But what about the “rest of the story”: how do you connect the GPU nodes? It takes a lot of switches.

Today’s network topology interconnecting GPUs across the data center requires PCIe switches, NICs, rail switches, and spine switches.

Enfabrica

Enfabrica came out of stealth at last year’s AI HW Summit, with backing from Nvidia and a who’s who of venture capitalists. This year, the company is closer to productization and has expanded its value proposition to include the failover features so important to AI training.

The Enfabrica Super-NIC fabric, enabled by the ACF-S silicon, eliminates the PCIe, NIC, and switches.

Enfabrica

When adoption begins in 2025, we expect Enfabrica to become a darling of the industry and to see significant uptake.

Conclusions

Whew! That was a lot of slides and four full days of companies striving for AI efficiency. And the focus has expanded far beyond the chips that power AI. For example, Meta showed failure data and a three-pronged strategy to deal with the certainty of failures: avoid failures, detect failures, and tolerate the ones that inevitably occur.

The causes of failures in Meta’s massive data centers.
