
Microsoft Unveils Maia 200 Inference Chip to Cut AI Serving Costs
- By John K. Waters
- 02/05/26
Microsoft recently unveiled Maia 200, a custom accelerator aimed at lowering the cost of running artificial intelligence workloads at cloud scale, as major providers seek to curb soaring inference costs and reduce their dependence on Nvidia graphics processors.
The chip is designed specifically for inference, the phase in which trained models produce text, images and other outputs. As AI services move from pilots to everyday production use, the cost of generating tokens has become an increasingly significant share of overall spending. Microsoft said Maia 200 is intended to address those economics through lower-precision compute, high-bandwidth memory and networking optimized for large AI clusters.
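To make the economics concrete: decoding each token effectively requires streaming the model’s weights through memory, so cutting precision from 16 bits to 4 bits cuts that traffic roughly fourfold. Here is a back-of-the-envelope sketch; the 70-billion-parameter model size is a hypothetical for illustration, not a figure from Microsoft’s post:

```python
# Rough illustration: token decoding is dominated by reading the model's
# weights, so bytes-per-parameter is a major driver of serving cost.
def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    return params * bits_per_weight / 8 / 1e9

PARAMS = 70e9  # hypothetical 70B-parameter model, illustration only

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gigabytes(PARAMS, bits):.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB -- all fit within a 216 GB
# HBM3e pool like Maia 200's, but the 4-bit copy moves a quarter of the
# bytes per token and leaves far more headroom for KV cache.
```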
“Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to significantly improve the economics of AI token generation,” Scott Guthrie, Microsoft’s executive vice president for Cloud and AI, wrote in a blog post announcing the chip.
Maia 200 is built on TSMC’s 3-nanometer process and is designed around the lower-precision math used in modern inference workloads. Microsoft said each chip contains more than 140 billion transistors and delivers more than 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), within a 750-watt thermal envelope. The chip includes 216 gigabytes of HBM3e memory with 7 terabytes per second of bandwidth, 272 megabytes of on-chip SRAM, and data-movement engines to reduce bottlenecks that can limit real-world throughput even when raw compute is high.
“Most importantly, FLOPS aren’t the only ingredient for faster AI,” Guthrie wrote. “Feeding data is equally essential.”
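One way to see why data movement matters is a roofline-style balance point computed from the published figures: the arithmetic intensity (FLOPs per byte moved) a kernel must exceed before the chip’s compute, rather than its memory bandwidth, becomes the limit. This is a sketch from the headline numbers, not a vendor benchmark:

```python
# Back-of-the-envelope roofline math from the figures Microsoft published.
fp4_flops = 10e15   # >10 petaFLOPS at FP4, per chip
fp8_flops = 5e15    # >5 petaFLOPS at FP8
hbm_bw    = 7e12    # 7 TB/s HBM3e bandwidth

# Kernels with lower FLOPs-per-byte than these balance points are
# bandwidth-bound, no matter how much raw compute the chip has.
print(f"FP4 balance point: {fp4_flops / hbm_bw:,.0f} FLOPs/byte")  # ~1,429
print(f"FP8 balance point: {fp8_flops / hbm_bw:,.0f} FLOPs/byte")  # ~714
```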
The launch comes as Microsoft, Google, and Amazon invest heavily in custom silicon alongside Nvidia GPUs. Google’s TPU family and Amazon’s Trainium chips offer alternatives within their respective cloud services, and Microsoft has long signaled that it wants greater control over cost and capacity in its AI infrastructure. Maia 200 follows Maia 100, introduced in 2023, and the company is positioning the new chip as an inference-focused workhorse for its AI products.
Microsoft said Maia 200 will support multiple models, including “the latest GPT-5.2 models from OpenAI,” and will be used to deliver a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The company also said its Microsoft Superintelligence team plans to use Maia 200 for synthetic data generation and reinforcement learning as it develops in-house models. Guthrie wrote that, for synthetic data pipelines, Maia 200’s design can accelerate the generation and filtering of “high-quality, domain-specific data.”
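Synthetic data pipelines of the kind Guthrie describes typically pair a generator model with a judge model that filters its output, making both steps inference-heavy. A hypothetical sketch of that generate-then-filter loop; the function names and threshold are illustrative and say nothing about Microsoft’s actual pipeline:

```python
# Hypothetical generate-and-filter loop for a synthetic data pipeline.
# `generate` and `score` stand in for calls to model inference endpoints.
from typing import Callable

def synthesize(prompts: list[str],
               generate: Callable[[str], str],
               score: Callable[[str], float],
               threshold: float = 0.8) -> list[str]:
    """Generate candidate examples; keep only those a judge model rates highly."""
    kept = []
    for prompt in prompts:
        candidate = generate(prompt)       # inference-heavy step 1: generation
        if score(candidate) >= threshold:  # inference-heavy step 2: filtering
            kept.append(candidate)
    return kept
```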
The chip is also an effort to compete on headline performance with hyperscaler rivals. Guthrie wrote that Maia 200 is “the most performant, first-party silicon from any hyperscaler,” adding that it offers “3 times the FP4 performance of the third generation Amazon Trainium” and “FP8 performance above Google’s seventh generation TPU.” Comparisons like these typically hinge on vendor-provided benchmarks, and Microsoft did not, in its post, provide full test configurations for those claims.