Elon Musk is Using 100,000 GPUs for Grok-3 – But Why?


Elon Musk, the founder and CEO of xAI, said that the company’s Grok-3 model has finished pre-training and confirmed that the model will be launched in the next 3-4 weeks.

However, the model is said to have used 10 times more compute than its predecessor, Grok-2. This is a bold move at a time when experts suggest that LLM scaling may have hit a wall and that additional training compute will only yield diminishing returns.

Grok-3 was trained on 100,000 NVIDIA H100 GPUs on Colossus, which xAI claims is the world’s most powerful AI training system. Musk also said the cluster was built in 122 days from start to finish.

Moreover, this is the first time a cluster of this size has been built. “Grok 3 will resolve the question of whether or not we’re hitting a wall,” Gavin Baker, CIO and managing partner at Atreides Management, said on X.

Naturally, it consumes a mammoth amount of power. Yann Le Du, a physicist, calculated that a 100k H100 GPU training run draws roughly 7% of the power output of a typical nuclear reactor. Over a month, that amounts to ~181 trillion joules of energy, roughly 10,000 times the energy a human brain consumes over 30 years (~19 billion J at ~20 W).

“Is Grok’s capacity comparable to that?” he asked. That is what everyone wants to know: will the model outperform its competitors?
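
For readers who want to check that arithmetic, here is a minimal back-of-the-envelope sketch in Python. It assumes roughly 700 W per H100 and a ~1 GW reactor; these round figures are our assumptions for illustration, not Le Du’s exact inputs.

```python
# Back-of-the-envelope check of the power and energy figures quoted above.
# Assumptions (ours, for illustration): ~700 W per H100, ~1 GW reactor output.

NUM_GPUS = 100_000
WATTS_PER_GPU = 700                 # typical H100 board power (assumed)
REACTOR_WATTS = 1e9                 # ~1 GW for a typical nuclear reactor (assumed)

cluster_watts = NUM_GPUS * WATTS_PER_GPU          # 70 MW
reactor_fraction = cluster_watts / REACTOR_WATTS  # ~0.07, i.e. ~7%

SECONDS_PER_MONTH = 30 * 24 * 3600
month_joules = cluster_watts * SECONDS_PER_MONTH  # ~1.8e14 J (~181 trillion J)

BRAIN_WATTS = 20
SECONDS_PER_30_YEARS = 30 * 365.25 * 24 * 3600
brain_joules = BRAIN_WATTS * SECONDS_PER_30_YEARS  # ~1.9e10 J (~19 billion J)

print(f"Cluster power: {cluster_watts / 1e6:.0f} MW ({reactor_fraction:.0%} of a reactor)")
print(f"Energy over one month: {month_joules:.2e} J")
print(f"Ratio to a brain over 30 years: {month_joules / brain_joules:,.0f}x")
```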

So What is Musk’s Plan?

Recently, former OpenAI chief scientist Ilya Sutskever said that pre-training on ever-larger datasets may soon run its course, likening data to fossil fuels that will eventually be exhausted. Synthetic data could be an answer to that, he said.

One of Musk’s earlier comments in an X Spaces conversation echoes this sentiment. “You really have started running into this data problem where you have to either create synthetic data or use real-world videos,” he said. “Those are the two sources of unlimited data – synthetic data and real-world videos. Tesla has a pretty big advantage in real-world video,” he added.

Musk also revealed that the model will launch in Tesla’s vehicles soon.

Another user on X said that Grok-3 is rumoured to be the ‘most powerful base model in existence’. Unlike OpenAI’s approach to scaling, it reportedly won’t use test-time compute for reasoning. “I expect Grok 3 to be a failure. Test time scaling effectively enlarges the model by a factor of 1000-10000. If they can’t do it at xAI, they have just burnt a lot of money,” said a user on X.

So, is the company relying purely on a brute-force compute scaling approach?

AIM reached out to a few experts in the AI industry to understand what Grok-3 is possibly trying to do and what it may achieve. The model may be using 10 times more compute power to train, but will it achieve performance gains that are directly proportional?

Paras Chopra, an AI researcher and founder of Turing’s Dream, said, “Performance is often log-linear. So I’d say 10x more compute would have a ~double jump in performance over Grok 2.”
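
As a rough illustration of what ‘log-linear’ means here, the sketch below assumes the common empirical form in which a benchmark score improves by a fixed amount for every 10x increase in training compute. The constants are purely illustrative, not measured xAI numbers.

```python
import math

def loglinear_score(compute, a=50.0, b=10.0):
    """Toy log-linear scaling curve: score = a + b * log10(compute).

    a and b are illustrative constants; real scaling-law fits estimate
    them from many training runs.
    """
    return a + b * math.log10(compute)

grok2_compute = 1.0                    # arbitrary units
grok3_compute = 10.0 * grok2_compute   # the reported 10x jump

gain = loglinear_score(grok3_compute) - loglinear_score(grok2_compute)
print(f"10x more compute adds a fixed {gain:.1f}-point bump, not a 10x score")
```

Under such a curve, each additional order of magnitude of compute buys roughly the same absolute gain, which is why a noticeable but far-from-proportional jump is the expectation.
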
We also reached out to Sudipta Biswas, who has built an architecture to enhance an AI model’s ability to interact with external data sources. He suspects that Grok-3’s architecture may be noticeably different from Grok-2’s. “Hence, pretraining is required from zero,” he said.

He also suggested that Musk might be using the extra compute for parallelised training: “If you are using 10x more compute, you can complete pretraining in 10x less time.”
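
As a rough sketch of that point: under ideal data-parallel scaling, the wall-clock time to spend a fixed GPU-hour budget falls linearly with the number of GPUs. Real clusters lose some of this to communication overhead, and the budget below is illustrative, not an xAI figure.

```python
def ideal_pretraining_days(total_gpu_hours, num_gpus, utilization=1.0):
    """Wall-clock days to spend a fixed GPU-hour budget, assuming
    perfectly linear (ideal) data-parallel scaling."""
    return total_gpu_hours / (num_gpus * utilization) / 24

BUDGET_GPU_HOURS = 20_000_000  # illustrative budget, not an xAI figure

print(ideal_pretraining_days(BUDGET_GPU_HOURS, num_gpus=10_000))   # ~83 days
print(ideal_pretraining_days(BUDGET_GPU_HOURS, num_gpus=100_000))  # ~8.3 days: 10x GPUs, ~10x faster
```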

However, despite the unprecedented computing power, Musk has not delivered the model as early as he had hoped. He previously said he planned to reveal it in December last year, but it has yet to materialise.

However, if the model arrives by next month, it will settle some of the most heated debates in the AI ecosystem.

The Tale of Two Models

Speaking of models pre-trained with mammoth compute, China’s DeepSeek-V3 is hard to ignore because its approach contrasts starkly with Grok-3’s. In an earlier interview, Musk said Grok-3 would finish training in three to four months; at 100,000 GPUs running for roughly 2,160 to 2,880 hours, that works out to at least 200 million GPU-hours of training.

In contrast, DeepSeek-V3 was trained on just 2.788 million NVIDIA H800 GPU-hours. On most benchmarks, the model outperformed Meta’s 405-billion-parameter Llama 3.1, and even the closed-source Claude 3.5 Sonnet and GPT-4o in several tests. The model shows that superior performance can be achieved without excessive computational resources.
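
For a sense of scale, here is a quick comparison of the two training budgets as reported above. Grok-3’s figure is the article’s lower-bound estimate, and GPU-hours are only a rough proxy for compute, since the H100 and the export-variant H800 differ in interconnect bandwidth.

```python
# Compare the training budgets quoted in this article.
grok3_gpu_hours = 200_000_000      # lower-bound estimate: 100k H100s for ~3-4 months
deepseek_v3_gpu_hours = 2_788_000  # 2.788M NVIDIA H800 GPU-hours

ratio = grok3_gpu_hours / deepseek_v3_gpu_hours
print(f"Grok-3 used at least ~{ratio:.0f}x the GPU-hours of DeepSeek-V3")
```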

However, xAI is not alone. Meta revealed that its upcoming Llama 4 is being trained on a similar cluster size. “[This is] bigger than anything that I’ve seen reported for what others are doing,” said Meta chief Mark Zuckerberg in the company’s earnings report released in October.

“By the way, you can do some pretty neat reasoning stuff with a 200k GPU cluster,” said Eric Zelikman, a member of xAI’s technical team.

Again, the general concern is that these models should not waste compute.

“It would be [kind of] sad if Grok 3 used 20x compute of Grok 2 and was still mid [mediocre]. I really hope they spend time on fixing their pipeline too, and not just GPUs,” said a user on X. However, if the company is also innovating at the architectural level, like DeepSeek, while using a 100K GPU cluster, the results could be remarkable.

But how close will the model get to OpenAI’s most powerful o3 model? We’ll soon know.

That said, even before delivering Grok-3, Musk revealed that Grok-4 would be released later this year and Grok-5 next year. Moreover, the company is also hinting at building a reasoning model, as it is looking to hire AI engineers and researchers.

The post Elon Musk is Using 100,000 GPUs for Grok-3 – But Why? appeared first on Analytics India Magazine.
