We engineer tomorrow to build a better future.
Solutions to your liquid cooling challenges.
 
 
DANFOSS
Data Center Liquid Cooling Products
  FD83 Couplings
  UQD Quick Disconnect Couplings
  UQDB Blind-Mate Couplings
  BMQC Blind-Mate Couplings
  MQD Liquid Cooling Couplings
  MQD02 Liquid Cooling Couplings
  MQD03 Liquid Cooling Couplings
  MQD04 Liquid Cooling Couplings
  MQDB Blind-Mate Couplings
  EHW194 Liquid Cooling Hose
  EHW094 Liquid Cooling Hose
  5400 Refrigerant Couplings
  Stainless Steel 90° Swivel Couplings
  Manifolds
  Liquid Cooling System Production and Integration

Outstanding Growth, Breakthrough Performance
Performance Outstanding Award
2025 Award Recipient



 
Microsoft Unveils the Maia 200 Chip, Challenging Amazon and Google in AI
Jan 26, 2026 | Scott Guthrie - Executive Vice President, Cloud + AI

 


Microsoft has unveiled its next-generation AI chip, Maia 200, fabricated on TSMC's 3nm process with more than 100 billion transistors. The chip delivers three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance that surpasses Google's seventh-generation TPU. Microsoft says its inference efficiency is 30% better than its existing hardware, and the chip will host OpenAI's GPT-5.2 models as well as the Microsoft 365 Copilot service. Deployment begins today in Azure's US Central datacenter region.

Microsoft today announced Maia 200, its second-generation in-house AI chip. Built on TSMC's 3-nanometer process, it marks a direct challenge to rivals such as Amazon and Google.

Microsoft claims the Maia 200 AI accelerator is a major performance breakthrough, delivering three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance above Google's seventh-generation TPU. The chip integrates more than 100 billion transistors and is purpose-built for large-scale AI workloads.

Scott Guthrie, Microsoft's Executive Vice President for Cloud + AI, said: "Maia 200 can effortlessly run today's largest models, with plenty of headroom for even bigger models in the future." He added that Maia 200 is the most efficient inference system Microsoft has deployed, with 30% better performance per dollar than the latest generation of hardware in its fleet.

Microsoft plans to use Maia 200 to host OpenAI's GPT-5.2 models and to support Microsoft Foundry and Microsoft 365 Copilot. The move marks a major step forward for Microsoft's AI infrastructure and underscores its determination to differentiate itself from competitors in the cloud market.

Notably, Microsoft's posture this time differs sharply from its strategy when it first launched Maia 100 in 2023. Back then, Microsoft avoided direct comparisons of AI cloud capability with Amazon and Google; now it is actively showcasing its performance advantage. Google and Amazon, however, are also developing next-generation AI chips, and Amazon is even working with Nvidia to integrate its upcoming Trainium4 chip with NVLink 6 and Nvidia's MGX rack architecture.

Microsoft's Superintelligence team will be among the first users of the Maia 200 chip. In addition, Microsoft is inviting academic researchers, developers, AI labs and open-source model contributors to join an early preview of the Maia 200 software development kit. Starting today, Microsoft is deploying the new chips in its Azure US Central datacenter region, with additional regions to follow.

Q&A

Q1: What are the main technical advantages of the Maia 200 chip?

A: Maia 200 is built on TSMC's 3-nanometer process and integrates more than 100 billion transistors. Its FP4 performance is three times that of Amazon's third-generation Trainium, its FP8 performance exceeds Google's seventh-generation TPU, and it delivers 30% better performance per dollar than Microsoft's current latest hardware.

Q2: Which services will Microsoft's Maia 200 chip power?

A: Microsoft will use Maia 200 to host OpenAI's GPT-5.2 models and to support Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will be among its first users.

Q3: How can ordinary developers get access to Maia 200?

A: Microsoft is inviting academic researchers, developers, AI labs and open-source project contributors to join the early preview of the Maia 200 software development kit. The chip is being deployed first in the Azure US Central datacenter region, with other regions to follow.

 

Microsoft Drops Its 3nm In-House AI Chip! Over 10 PFLOPS of Compute, Blowing Past AWS and Google

216GB of HBM3e capacity and 7 TB/s of read/write bandwidth.


芯东西 reported on January 27 that Microsoft today announced Maia 200, its in-house AI inference chip, calling it "the most performant first-party silicon in any hyperscale datacenter today" and aiming to significantly improve the economics of AI token generation.


Maia 200 is fabricated on TSMC's 3nm process and packs more than 140 billion transistors. It features native FP8/FP4 tensor cores, a redesigned memory subsystem with 216GB of HBM3e (up to 7 TB/s of read/write bandwidth) and 272MB of on-chip SRAM, plus data-movement engines that keep massive models running quickly and efficiently. Designed for the latest models that use low-precision compute, each chip delivers more than 10 PFLOPS at FP4 precision and more than 5 PFLOPS at FP8, all within a 750W SoC TDP envelope. Its FP4 performance is more than three times that of AWS Trainium3, Amazon's in-house AI chip, and its FP8 performance exceeds Google's TPU v7.

Peak specifications                 | Azure Maia 200 | AWS Trainium3 | Google TPU v7
Process node                        | 3nm            | 3nm           | 3nm
FP4 TFLOPS                          | 10,145         | 2,517         | n/a
FP8 TFLOPS                          | 5,072          | 2,517         | 4,614
BF16 TFLOPS                         | 1,268          | 671           | 2,307
HBM technology                      | HBM3E          | HBM3E         | HBM3E
HBM bandwidth (TB/s)                | 7              | 4.9           | 7.4
HBM capacity (GB)                   | 216            | 144           | 192
Scale-up BW, bidirectional (TB/s)   | 2.8            | 2.2-2.56      | 1.2

▲ Peak specification comparison of Azure Maia 200, AWS Trainium3 and Google TPU v7
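
As a quick sanity check on the headline claims, the short Python sketch below simply re-derives the ratios from the peak figures in the table above (the 750W TDP comes from the text); it adds no data beyond what is already published here.

# Re-deriving the headline ratios from the peak specifications above.
# The 750W figure is the SoC TDP quoted in the text; competitor TDPs are not listed here.
maia_fp4_tflops = 10_145
trn3_fp4_tflops = 2_517
maia_fp8_tflops = 5_072
tpu7_fp8_tflops = 4_614
maia_tdp_w = 750

print(f"FP4 vs Trainium3:  {maia_fp4_tflops / trn3_fp4_tflops:.1f}x")      # ~4.0x, i.e. "more than 3x"
print(f"FP8 vs TPU v7:     {maia_fp8_tflops / tpu7_fp8_tflops:.2f}x")      # ~1.10x
print(f"Maia FP4 per watt: {maia_fp4_tflops / maia_tdp_w:.1f} TFLOPS/W")   # ~13.5 TFLOPS/W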

 

Maia 200's redesigned memory subsystem is built around narrow-precision datatypes, a dedicated DMA engine, on-chip SRAM and a specialized network-on-chip (NoC) fabric for high-bandwidth data movement, boosting token throughput.
On the interconnect side, Maia 200 provides 2.8 TB/s of dedicated, bidirectional scale-up bandwidth, higher than AWS Trainium3's 2.56 TB/s and Google TPU v7's 1.2 TB/s.
Maia 200 is also the most efficient inference system Microsoft has deployed to date, with 30% better performance per dollar than the latest generation of hardware currently in its fleet.

01. Can Run Today's Largest Models, Will Support GPT-5.2

According to Microsoft's blog post, Maia 200 can effortlessly run today's largest models, with plenty of headroom reserved for even bigger models in the future. As part of Microsoft's heterogeneous AI infrastructure, Maia 200 will serve multiple models, including OpenAI's latest GPT-5.2, bringing better performance per dollar to Microsoft Foundry and Microsoft 365 Copilot.

Maia 200 integrates seamlessly with Microsoft Azure. Microsoft is previewing the Maia software development kit (SDK), which includes a complete set of tools for building and optimizing models for Maia 200. It offers a full set of capabilities, including PyTorch integration, a Triton compiler and an optimized kernel library, as well as access to Maia's low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.
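
As a rough illustration of that porting path, here is a minimal PyTorch sketch under the assumption that the Maia SDK's PyTorch integration surfaces the accelerator as a device/backend; the "maia" device name in the comment is hypothetical, and the snippet stays on CPU so it runs anywhere.

# Minimal porting sketch via PyTorch and torch.compile.
# "maia" would be the hypothetical device/backend exposed by the Maia SDK preview;
# we stay on CPU here so the snippet runs on any machine.
import torch

device = "cpu"

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device).eval()

compiled = torch.compile(model)  # torch.compile can hand the graph to a vendor compiler backend

with torch.inference_mode():
    out = compiled(torch.randn(8, 1024, device=device))
print(out.shape)  # torch.Size([8, 1024])
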
The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models.
For synthetic data pipeline use cases, Maia 200's design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, feeding downstream training with fresher, more targeted signals.
Maia 200 is already deployed in Microsoft's US Central datacenter region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, coming next and more regions to follow.

02. 2.8 TB/s of Bidirectional Bandwidth, Interconnecting up to 6,144 Chips

At the system level, Maia 200 introduces a novel two-tier scale-up network design built on standard Ethernet. A custom transport layer and a tightly integrated NIC deliver excellent performance, strong reliability and significant cost advantages without relying on proprietary fabrics.
Each chip provides 2.8 TB/s of dedicated, bidirectional scale-up bandwidth and exposes predictable, high-performance collective operations across clusters of up to 6,144 chips.


▲ Top-down view of the Maia 200 server blade

Within each tray, four Maia chips are fully connected via direct, non-switched links, keeping high-bandwidth communication local for optimal inference efficiency.
Intra-rack and inter-rack networking use the same communication protocol, the Maia AI transport protocol, enabling seamless scaling across nodes, racks and accelerator clusters with minimal network hops.
This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity, while maintaining consistent performance and cost efficiency at cloud scale.
The architecture delivers scalable performance for dense inference clusters while reducing power consumption and total cost of ownership across Azure's global fleet.
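
A couple of simple counts fall out of these figures; the sketch below derives them directly from the published numbers (four fully connected chips per tray, clusters of up to 6,144 chips) and assumes nothing further about the topology.

# Simple counts implied by the published cluster figures.
from math import comb

chips_per_tray = 4         # four Maia chips per tray, fully connected
max_cluster_chips = 6_144  # maximum scale-up domain quoted above

intra_tray_links = comb(chips_per_tray, 2)               # 6 direct, non-switched links per tray
trays_per_cluster = max_cluster_chips // chips_per_tray  # 1,536 trays at full cluster scale

print(intra_tray_links, trays_per_cluster)  # 6 1536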

03. Halving Chip Deployment Time, Improving Performance per Dollar and per Watt

AI models were running on Maia 200 within days of the first packaged parts arriving, and the time from first silicon to first datacenter rack deployment was cut to less than half that of comparable AI infrastructure programs.
This end-to-end approach, from chip to software to datacenter, translates directly into higher utilization, faster time to production and sustained improvements in performance per dollar and per watt at cloud scale.


▲ View of the Maia 200 rack and the HXU cooling unit

This is because a core principle of Microsoft's silicon development programs is to validate as much of the end-to-end system as possible before final silicon is available.
From the earliest stages of the architecture, a sophisticated pre-silicon development environment guided Maia 200's design, modeling the computation and communication patterns of large language models with high fidelity.
This early co-development environment allowed Microsoft to optimize the silicon, networking and system software as a unified whole before the first chips existed.
Microsoft also designed Maia 200 from the outset for fast, seamless availability in the datacenter, carrying out early validation of some of the most complex system components, including the backend network and the second-generation closed-loop liquid cooling heat exchanger unit.
Native integration with the Azure control plane provides security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximizing reliability and uptime for production-critical AI workloads.

04. Conclusion: Deploying Across Global Infrastructure to Carry Future Generations of AI Systems

The era of large-scale AI is just beginning, and infrastructure will define what is possible.
As Microsoft deploys Maia 200 across its global infrastructure, it is already designing for future generations and expects each generation to keep setting new benchmarks, delivering better performance and efficiency for the most important AI workloads.
Microsoft is inviting developers, AI startups and academics to begin exploring early model and workload optimization with the new Maia 200 SDK.
The SDK includes a Triton compiler, PyTorch support, low-level programming in NPL, and a Maia simulator and cost calculator for optimizing efficiency earlier in the code lifecycle.

 

Maia 200: The AI accelerator built for inference
Jan 26, 2026 | Scott Guthrie - Executive Vice President, Cloud + AI

The Maia 200 AI accelerator chip with cables and equipment in the background.
Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. Maia 200 is an AI inference powerhouse: an accelerator built on TSMC’s 3nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216GB HBM3e at 7 TB/s and 272MB of on-chip SRAM, plus data movement engines that keep massive models fed, fast and highly utilized. This makes Maia 200 the most performant, first-party silicon from any hyperscaler, with three times the FP4 performance of the third generation Amazon Trainium, and FP8 performance above Google’s seventh generation TPU. Maia 200 is also the most efficient inference system Microsoft has ever deployed, with 30% better performance per dollar than the latest generation hardware in our fleet today.

Maia 200 is part of our heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance per dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipeline use cases, Maia 200’s unique design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, feeding downstream training with fresher, more targeted signals.

Maia 200 is deployed in our US Central datacenter region near Des Moines, Iowa, with the US West 3 datacenter region near Phoenix, Arizona, coming next and future regions to follow. Maia 200 integrates seamlessly with Azure, and we are previewing the Maia SDK with a complete set of tools to build and optimize models for Maia 200. It includes a full set of capabilities, including PyTorch integration, a Triton compiler and optimized kernel library, and access to Maia’s low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.


Engineered for AI inference
Fabricated on TSMC’s cutting-edge 3-nanometer process, each Maia 200 chip contains over 140 billion transistors and is tailored for large-scale AI workloads while also delivering efficient performance per dollar. On both fronts, Maia 200 is built to excel. It is designed for the latest models using low-precision compute, with each Maia 200 chip delivering over 10 petaFLOPS in 4-bit precision (FP4) and over 5 petaFLOPS of 8-bit (FP8) performance, all within a 750W SoC TDP envelope. In practical terms, Maia 200 can effortlessly run today’s largest models, with plenty of headroom for even bigger models in the future.
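
To put that headroom in rough numbers, the back-of-the-envelope sketch below works out how many low-precision parameters fit in 216GB of HBM3e and how long a single full sweep of that memory takes at 7 TB/s; it deliberately ignores KV caches, activations and other real-world overheads.

# Back-of-the-envelope capacity and bandwidth figures from the published specs.
hbm_bytes = 216e9      # 216 GB of HBM3e per chip
hbm_bw_bytes_s = 7e12  # 7 TB/s read/write bandwidth

fp8_params = hbm_bytes / 1.0   # 1 byte per FP8 parameter   -> ~216B parameters
fp4_params = hbm_bytes / 0.5   # 0.5 bytes per FP4 parameter -> ~432B parameters
sweep_ms = hbm_bytes / hbm_bw_bytes_s * 1e3  # one full pass over HBM -> ~31 ms

print(f"~{fp8_params / 1e9:.0f}B FP8 params, ~{fp4_params / 1e9:.0f}B FP4 params, "
      f"~{sweep_ms:.0f} ms per full HBM sweep")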

A close-up of the Maia 200 AI accelerator chip.

Crucially, FLOPS aren’t the only ingredient for faster AI. Feeding data is equally important. Maia 200 attacks this bottleneck with a redesigned memory subsystem. The Maia 200 memory subsystem is centered on narrow-precision datatypes, a specialized DMA engine, on-die SRAM and a specialized NoC fabric for high-bandwidth data movement, increasing token throughput.
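
Since the memory subsystem is organized around narrow-precision datatypes, a tiny illustration of what that buys in storage terms may help. The sketch below uses PyTorch's generic FP8 dtype purely as a stand-in: PyTorch has no FP4 dtype, and how Maia 200 exposes its native FP8/FP4 paths is not described here.

# Narrow-precision storage in PyTorch, as a stand-in illustration only.
# PyTorch ships FP8 dtypes (e.g. torch.float8_e4m3fn) but no FP4 dtype;
# the Maia SDK's own low-precision path is not shown here.
import torch

w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8 = w_bf16.to(torch.float8_e4m3fn)  # cast weights down to 1 byte per element

print(w_bf16.element_size(), "bytes/elem ->", w_fp8.element_size(), "byte/elem")  # 2 -> 1
print(f"{w_fp8.numel() * w_fp8.element_size() / 1e6:.0f} MB at FP8 "
      f"vs {w_bf16.numel() * w_bf16.element_size() / 1e6:.0f} MB at BF16")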

A table with the title “Industry-leading capability” shows peak specifications for Azure Maia 200, AWS Trainium 3 and Google TPU v7.

Optimized AI systems
At the systems level, Maia 200 introduces a novel, two-tier scale-up network design built on standard Ethernet. A custom transport layer and tightly integrated NIC unlock performance, strong reliability and significant cost advantages without relying on proprietary fabrics.

Each accelerator exposes:

- 2.8 TB/s of bidirectional, dedicated scale-up bandwidth
- Predictable, high-performance collective operations across clusters of up to 6,144 accelerators

This architecture delivers scalable performance for dense inference clusters while reducing power usage and overall TCO across Azure’s global fleet.

Within each tray, four Maia accelerators are fully connected with direct, non-switched links, keeping high-bandwidth communication local for optimal inference efficiency. The same communication protocols are used for intra-rack and inter-rack networking using the Maia AI transport protocol, enabling seamless scaling across nodes, racks and clusters of accelerators with minimal network hops. This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale.

A top-down view of the Maia 200 server blade.

A cloud-native development approach
A core principle of Microsoft’s silicon development programs is to validate as much of the end-to-end system as possible ahead of final silicon availability.

A sophisticated pre-silicon environment guided the Maia 200 architecture from its earliest stages, modeling the computation and communication patterns of LLMs with high fidelity. This early co-development environment enabled us to optimize silicon, networking and system software as a unified whole, long before first silicon.

We also designed Maia 200 for fast, seamless availability in the datacenter from the beginning, building out early validation of some of the most complex system elements, including the backend network and our second-generation, closed loop, liquid cooling Heat Exchanger Unit. Native integration with the Azure control plane delivers security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximizing reliability and uptime for production-critical AI workloads.

As a result of these investments, AI models were running on Maia 200 silicon within days of first packaged part arrival. Time from first silicon to first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs. And this end-to-end approach, from chip to software to datacenter, translates directly into higher utilization, faster time to production and sustained improvements in performance per dollar and per watt at cloud scale.

A view of the Maia 200 rack and the HXU cooling unit.

Sign up for the Maia SDK preview
The era of large-scale AI is just beginning, and infrastructure will define what’s possible. Our Maia AI accelerator program is designed to be multi-generational. As we deploy Maia 200 across our global infrastructure, we are already designing for future generations and expect each generation will continually set new benchmarks for what’s possible and deliver ever better performance and efficiency for the most important AI workloads.

Today, we’re inviting developers, AI startups and academics to begin exploring early model and workload optimization with the new Maia 200 software development kit (SDK). The SDK includes a Triton Compiler, support for PyTorch, low-level programming in NPL and a Maia simulator and cost calculator to optimize for efficiencies earlier in the code lifecycle. Sign up for the preview here.
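
For readers weighing up the SDK, the snippet below is the canonical, device-agnostic Triton element-wise kernel. The blog states only that the SDK includes a Triton compiler, so treat the idea that this exact kernel maps well onto Maia 200 as an assumption rather than something documented here; nothing in the code is Maia-specific.

# A standard Triton element-wise kernel; per the blog, the Maia SDK's Triton
# compiler is the intended path for retargeting kernels like this one.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements            # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out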

Get more photos, video and resources on our Maia 200 site and read more details.

Scott Guthrie is responsible for hyperscale cloud computing solutions and services including Azure, Microsoft’s cloud computing platform, generative AI solutions, data platforms and information and cybersecurity. These platforms and services help organizations worldwide solve urgent challenges and drive long-term transformation.

Tags: AI, Azure, datacenters

 

 

 
Beijing Hansen Fluid Technology Co., Ltd. (Hansen Fluid)
Danfoss Data center liquid cooling authorized distributor
Authorized Danfoss distributor in China ~ one-stop supplier of liquid cooling connection solutions

Address: Room 2115, Tower 1C, Wangjing SOHO, 10 Wangjing Street, Chaoyang District, Beijing 100102
Tel: 010-8428 2935, 8428 3983
Mobile: 13910962635
http://www.hansenfluid.com

E-mail: sales@cnmec.biz
Fax: 010-8428 8762

Beijing ICP Filing No. 2023024665
Beijing Public Security Network Filing No. 11010502019740

Since 2007 Strong Distribution & Powerful Partnerships