
Backgrounder: Energy inefficiency and the uneasy convenience of LLMs

  • Alan Morrison 

Earlier in September 2024, courtesy of host Kisaco Research, I had the opportunity to attend the AI Hardware and Edge AI Summit here in San Jose. This is the second year I've attended. I also attended Kisaco's Efficient Generative AI Summit, a separate "pre-day" event.

These events provided a helpful opportunity to immerse myself in what’s been happening on the generative AI front and try to assess where things stand in terms of how generative AI is delivered to users. In the course of just a couple of years, GenAI has acquired its own industry subculture and power dynamics. What I’ll be sharing in this post are some preliminary impressions and background from an outsider’s perspective. 

The booming influence of hyperscalers in 2024

In a previous life in the public sector, I was trained as a policy wonk focused on Russian and international studies. One thing I learned was that big countries like China, India, Russia, and the US dominate their neighbors due to their sheer size and resulting influence. So if you're Sri Lanka, for example, a huge amount of your internal politics is influenced by your Indian neighbor to the north.

In the tech sector, the same dynamic applies. When it comes to the generative AI landscape, the so-called “hyperscalers” are the tech sector equivalent to the biggest countries across the global political landscape. 

Hyperscalers, meaning those who run high-performance data centers at scale, together with co-location data centers now account for at least 60 to 70 percent of data center energy load in the US, according to an Electric Power Research Institute (EPRI) report published in May 2024.

The top cloud operators also own and operate most of the generative AI infrastructure: AWS, Apple, Google Cloud, IBM, Meta, Microsoft, Netflix, and Oracle in the US, and companies like Alibaba, Tencent, and Huawei in China. The top three hyperscalers in 2023 (AWS, Microsoft, and Google) accounted for 60 percent of all data center capacity, according to Synergy Research.

Accordingly, the buying power of the hyperscalers determines which infrastructure suppliers win. NVIDIA reached a $3 trillion market capitalization in June 2024, due in large part to its ability to attract and keep hyperscale customers.

NVIDIA's key strength beyond chip design is the dominance of its Compute Unified Device Architecture (CUDA) ecosystem. Developers and engineers using CUDA apparently stick with it, though the hyperscalers claim to hedge their bets with alternatives from other chip providers, AMD and Intel being the two most frequently mentioned.

The hyperscalers and their suppliers such as NVIDIA win by providing convenience. By default, my search engine is Google. When I do a search, Google generates a Gemini answer and places it at the top of the search results for my convenience, whether I want a genAI answer or not. I've found myself checking the genAI answer first in some cases, though as a trained researcher I also check the sources, which Google conveniently links to.

I'm typing this post in Google Docs, and in the top right corner is the Gemini icon. If I hover over the icon, the text "Try Gemini" appears. I have tried Gemini, though I more often use Claude.ai (Anthropic's LLM, running on the AWS cloud) for things like coding suggestions, code checking, and reviews.

When convenience wins, energy efficiency loses

Hyperscalers make sure they provide convenience to users whenever possible. Businesspeople are pragmatic, at least in one sense. From a tactical, project-by-project, day-by-day, moment-by-moment perspective, they want to save time.

What this means in practice is that businesspeople choose saving time over energy efficiency. Here in San Jose, the buses and trolleys that run daily are mostly empty, most of the time. That's because driving or riding in a car takes less time overall, even in cases where the bus or light rail is faster than a car for the home-to-work trip itself.

If a businessperson has to go into the office for meetings, taking the car means they can get their hair cut over their lunch break, schedule a doctor’s appointment between meetings, or pick up a few items at the grocery store on the way home.

The same uneasy convenience dynamic applies in situations involving the use of gen AI. Even though one ChatGPT query consumes ten times as much energy as a Google web search, according to EPRI, business users will continue to use something akin to ChatGPT when doing so seems to save their own time.
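To put that tenfold figure in perspective, here's a quick back-of-envelope sketch in Python. The per-query estimates (roughly 0.3 Wh for a traditional web search and about 2.9 Wh for a ChatGPT query) are the numbers commonly cited from the EPRI report, and the 50-queries-a-day user is a hypothetical of my own for illustration, not a measured figure.

```python
# Back-of-envelope energy comparison. Per-query estimates are the figures
# widely attributed to EPRI's May 2024 report (~0.3 Wh per web search,
# ~2.9 Wh per ChatGPT query); they are assumptions for illustration.

WEB_SEARCH_WH = 0.3      # assumed energy per traditional web search, watt-hours
CHATGPT_QUERY_WH = 2.9   # assumed energy per ChatGPT query, watt-hours
QUERIES_PER_DAY = 50     # hypothetical daily queries for one heavy business user

def annual_kwh(wh_per_query: float, queries_per_day: float) -> float:
    """Annual energy in kilowatt-hours for a given per-query cost."""
    return wh_per_query * queries_per_day * 365 / 1000

search_kwh = annual_kwh(WEB_SEARCH_WH, QUERIES_PER_DAY)
genai_kwh = annual_kwh(CHATGPT_QUERY_WH, QUERIES_PER_DAY)

print(f"Web search:  {search_kwh:.1f} kWh/year")
print(f"GenAI query: {genai_kwh:.1f} kWh/year")
print(f"Ratio: {genai_kwh / search_kwh:.1f}x")
```

For a single user the absolute numbers are small either way; the point is the ratio, which holds at any scale: the same daily habit costs roughly ten times the energy when it runs through an LLM.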

Continuing data center design impacts of LLMs and lower-energy alternatives

Last year, I talked about how server racks were being redesigned for cooling and how the Energy Management Research Center at Schneider Electric tracks the data center industry's efforts to reduce its carbon footprint (see https://www.datasciencecentral.com/how-ai-growth-has-triggered-data-center-redesign/ for more information).

This year, I learned that while a traditional data center server rack draws up to 10 kilowatts, advanced AI data center racks draw 170 kilowatts, 17 times as much power. TechTarget's Mackenzie Holland, paraphrasing Forrester's Alvin Nguyen in May 2024, wrote that "a rack for generative AI in a traditional data center uses more than 200 kW of electricity" (see https://www.techtarget.com/searchcio/news/366587217/Big-tech-invests-billions-in-AI-data-centers-globally).
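To make those rack numbers concrete, here's a rough annualization in Python. The 10 kW and 170 kW draws are the figures above; the around-the-clock utilization and the $0.10/kWh electricity price are illustrative assumptions of mine, not industry data.

```python
# Rough annual energy and cost for one server rack, assuming continuous
# operation at the stated draw. The 10 kW and 170 kW figures come from
# the article; the price and 24/7 utilization are illustrative assumptions.

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10  # assumed electricity price in USD, for illustration only

for label, kilowatts in [("Traditional rack", 10), ("Advanced AI rack", 170)]:
    kwh = kilowatts * HOURS_PER_YEAR
    print(f"{label}: {kwh:,.0f} kWh/year, ~${kwh * PRICE_PER_KWH:,.0f}/year")
```

Run continuously under these assumptions, a single AI rack consumes roughly 1.5 million kWh a year versus under 90,000 kWh for a traditional rack, which is why cooling and power delivery now dominate data center design conversations.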

The German government imposed new requirements on data center construction at the end of 2023, and in response, the German Datacenter Association (GDA) balked at a waste heat reuse requirement, succeeding in delaying enforcement until 2028. 

Meanwhile, Rich Miller in Data Center Frontier noted the following in August 2024: “This week’s earnings reports indicate the 4 major hyperscalers – AWS, Google, Microsoft and Meta – are investing about $50 billion per quarter on digital infrastructure, including GPUs for AI computing and the data centers to house them. That spending is likely to continue into next year, and may even increase, companies indicated.”

We live in interesting times. I’ll be sharing more findings soon from the conference, so stay tuned.
