For its tests, the team used setups with Nvidia's A100 and H100 GPUs, which are commonly used in data centers today, and measured how much energy they consumed running different large language models (LLMs), diffusion models that generate images or videos from text input, and many other types of AI systems.
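The article does not describe the team's own measurement tooling, but as a rough illustration of how per-request GPU energy can be read on A100/H100-class hardware, the sketch below uses NVIDIA's NVML counters through the pynvml bindings. The function and variable names, and the `run_inference_request` call standing in for serving a model, are hypothetical.

```python
# Minimal sketch: read GPU energy around one inference request via NVML.
# Assumes an A100/H100-class GPU (the total-energy counter requires Volta or
# newer) and the nvidia-ml-py (pynvml) package. run_inference_request() is a
# hypothetical stand-in for sending one prompt to a model server.
import pynvml


def run_inference_request() -> None:
    # Placeholder: in a real setup this would issue one request to the model.
    pass


def measure_request_energy(gpu_index: int = 0) -> float:
    """Return the energy (in joules) the GPU drew during one request."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        # Cumulative energy since driver load, reported in millijoules.
        start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
        run_inference_request()
        end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
        return (end_mj - start_mj) / 1000.0
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(f"Energy for one request: {measure_request_energy():.2f} J")
```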
The largest LLM on the leaderboard is Meta's Llama 3.1 405B, an open-source chat model with 405 billion parameters. It used 3,352.92 joules per request running on two H100 GPUs, which is about 0.93 watt-hours. The measurements also confirmed the improvement in hardware energy efficiency. Mixtral 8x22B was the largest LLM the team managed to run on both the Ampere and Hopper platforms. Running the model on two Ampere GPUs consumed more energy per request than running it on Hopper hardware, where a single Hopper GPU needed just 0.15 watt-hours per request.
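For reference, the watt-hour figure quoted above follows directly from the joule measurement, since one watt-hour is 3,600 joules:

```python
# Convert the reported per-request energy from joules to watt-hours.
joules_per_request = 3352.92            # Llama 3.1 405B on two H100 GPUs
watt_hours = joules_per_request / 3600  # 1 Wh = 3,600 J
print(f"{watt_hours:.2f} Wh per request")  # ~0.93 Wh
```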
What isn't known, however, is the performance of proprietary models like GPT-4, Gemini, or Grok. The ML.ENERGY Initiative team says it is very difficult for the research community to come up with solutions to the energy-efficiency problem when we don't even know exactly what we're facing. We can make estimates, but Chung insists they need to be accompanied by error analysis. Today we have nothing like that.
According to Chung and Chowdhury, the biggest problem is the lack of transparency. “Companies like Google or OpenAI have no incentive to talk about power consumption. If anything, releasing the actual numbers would harm them,” Chowdhury said. “But people should understand what is actually happening, so maybe we can somehow nudge them into releasing some of these numbers.”
Where the rubber meets the road
“Energy efficiency in data centers follows a trend similar to Moore’s Law,” he said, adding that the power drawn by a rack, the unit used in data centers that holds between 10 and 14 Nvidia GPUs, is going up, but the performance is improving.