Microsoft today announced its second-generation in-house AI chip, Maia 200. Built on TSMC's 3nm process, the chip is a direct challenge to rivals such as Amazon and Google.

Microsoft claims the Maia 200 AI accelerator delivers a major leap in performance: its FP4 throughput is three times that of Amazon's third-generation Trainium chip, and its FP8 performance exceeds that of Google's seventh-generation TPU. The chip integrates more than 100 billion transistors and is purpose-built for large-scale AI workloads.

Scott Guthrie, Microsoft's executive vice president of cloud computing and AI, said: "Maia 200 can easily run today's largest models, with ample performance headroom for even larger models in the future." He also noted that Maia 200 is the most efficient inference system Microsoft has deployed, delivering 30% better performance per dollar than its current latest-generation hardware.

Microsoft plans to use Maia 200 to host OpenAI's GPT-5.2 model and to power Microsoft Foundry and Microsoft 365 Copilot. The launch marks a major step forward for Microsoft's AI infrastructure and underscores its determination to differentiate itself from competitors in the cloud market.

Notably, Microsoft's posture this time is very different from its approach when it first launched Maia 100 in 2023. Back then it avoided direct comparisons of AI cloud capability with Amazon and Google; now it is openly touting performance advantages. Google and Amazon are also pressing ahead with next-generation AI chips, and Amazon is even working with NVIDIA to integrate its upcoming Trainium4 chip with NVLink 6 and NVIDIA's MGX rack architecture.

Microsoft's Superintelligence team will be among the first users of Maia 200. The company is also inviting academic researchers, developers, AI labs and open-source model contributors to join an early preview of the Maia 200 software development kit. Starting today, Microsoft is deploying the new chips in its Azure US Central datacenter region, with more regions to follow.
Q&A

Q1: What are the main technical advantages of the Maia 200 chip?
A: Maia 200 is built on TSMC's 3nm process and integrates more than 100 billion transistors. Its FP4 performance is three times that of Amazon's third-generation Trainium, its FP8 performance exceeds Google's seventh-generation TPU, and it delivers 30% better performance per dollar than Microsoft's current latest hardware.

Q2: Which services will Microsoft's Maia 200 chip power?
A: Microsoft will use Maia 200 to host OpenAI's GPT-5.2 model and to support Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will be among the first users.

Q3: How can ordinary developers get access to Maia 200?
A: Microsoft is inviting academic researchers, developers, AI labs and open-source project contributors to join an early preview of the Maia 200 software development kit. The chip is being deployed first in the Azure US Central datacenter region, with other regions to follow.
Microsoft drops a 3nm in-house AI chip! Over 10 PFLOPS of compute, gunning for AWS and Google. 216GB of HBM3e capacity and 7TB/s of read/write bandwidth.
▲ Peak specification comparison of Azure Maia 200, AWS Trainium 3 and Google TPU v7
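To put the headline figures in perspective, here is a back-of-envelope roofline check. It is only a sketch using the roughly 10 PFLOPS and 7 TB/s numbers quoted above (peak figures; the precision they refer to and sustained values are not specified), but it shows why memory bandwidth, not raw FLOPS, is usually the limiter for LLM decoding:

```python
# Back-of-envelope roofline check using the headline figures quoted above
# (~10 PFLOPS peak compute, ~7 TB/s HBM3e read/write bandwidth). These are
# peak numbers; the precision and sustained values are not specified.

peak_flops = 10e15   # ~10 PFLOPS (assumed low-precision peak)
hbm_bw = 7e12        # ~7 TB/s memory bandwidth

# Arithmetic intensity (FLOPs per byte) a workload needs to be compute-bound
balance_point = peak_flops / hbm_bw
print(f"compute/bandwidth balance point: {balance_point:.0f} FLOPs per byte")

# Single-stream LLM decoding reads each weight roughly once per token and does
# ~2 FLOPs per weight (multiply + add), i.e. about 2 FLOPs per byte at 1-byte
# weights, far below the balance point, so decode is bandwidth-limited.
decode_intensity = 2.0
print(f"decode is compute-bound? {decode_intensity >= balance_point}")
```

That gap between the balance point and a decode workload's arithmetic intensity is why the rest of the coverage keeps returning to the memory subsystem and narrow-precision data types rather than raw FLOPS.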
Maia 200's redesigned memory subsystem is built around narrow-precision data types, a dedicated DMA engine, on-die SRAM and a specialized network-on-chip (NoC) fabric for high-bandwidth data movement, increasing token throughput.
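As a rough illustration of the narrow-precision point, the PyTorch sketch below shows how halving the bytes per element halves the data that has to be moved per token. This is generic PyTorch, not Maia SDK code, and FP4 has no native PyTorch dtype, so FP8 (e4m3) stands in for the narrow formats mentioned in the article:

```python
# Minimal PyTorch sketch of why narrow-precision data types raise token
# throughput: fewer bytes per element means fewer bytes moved per token.
# Generic PyTorch, not Maia SDK code; FP8 stands in for FP4 here because
# PyTorch has no native FP4 dtype. Requires PyTorch >= 2.1 for float8.
import torch

weights_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
weights_fp8 = weights_bf16.to(torch.float8_e4m3fn)

bytes_bf16 = weights_bf16.numel() * weights_bf16.element_size()
bytes_fp8 = weights_fp8.numel() * weights_fp8.element_size()

print(f"BF16: {bytes_bf16 / 2**20:.1f} MiB, FP8: {bytes_fp8 / 2**20:.1f} MiB")
print(f"bytes moved per weight read reduced by {bytes_bf16 / bytes_fp8:.0f}x")
```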
01. Can run today's largest models, will support GPT-5.2

According to Microsoft's blog post, Maia 200 can comfortably run today's largest models, with ample performance headroom reserved for even larger models to come.
As part of Microsoft's heterogeneous AI infrastructure, Maia 200 will serve multiple models, including OpenAI's latest GPT-5.2, bringing better performance per dollar to Microsoft Foundry and Microsoft 365 Copilot. Maia 200 integrates seamlessly with Microsoft Azure, and Microsoft is previewing the Maia software development kit (SDK), a complete set of tools for building and optimizing models for Maia 200.
The SDK includes a full set of capabilities: PyTorch integration, a Triton compiler and optimized kernel library, and access to Maia's low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.
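Because the SDK's tooling is built around Triton and PyTorch, kernels written for it would presumably look like ordinary Triton code. The sketch below is a standard open-source Triton vector-add kernel, shown only to illustrate the programming model; whether the Maia compiler accepts it unmodified is an assumption, and running it as-is requires an existing Triton backend such as a CUDA GPU:

```python
# A plain Triton kernel of the kind a Triton-based compiler stack consumes.
# Standard open-source Triton, not Maia-specific code; running it as-is
# requires a Triton-supported backend (e.g. a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements            # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)         # one program instance per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

if __name__ == "__main__":
    a = torch.randn(10_000, device="cuda")
    b = torch.randn(10_000, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```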
02. Supports 2.8TB/s of bidirectional bandwidth and a 6,144-chip interconnect

At the system level, Maia 200 introduces a new two-tier scale-up network design built on standard Ethernet. A custom transport layer and tightly integrated NICs deliver strong performance, robust reliability and significant cost advantages without relying on proprietary fabrics.
Within each tray, four Maia chips are fully connected with direct, non-switched links, keeping high-bandwidth communication local for optimal inference efficiency.
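For a sense of what 2.8TB/s of dedicated scale-up bandwidth means in practice, the sketch below gives an idealized estimate using only the figures quoted above (per-accelerator bidirectional bandwidth and four chips per tray); real collectives add protocol, synchronization and link-efficiency overheads that are not modeled here:

```python
# Back-of-envelope estimate of intra-tray transfer times at the quoted
# 2.8 TB/s bidirectional scale-up bandwidth per accelerator. Idealized:
# protocol, synchronization and link-efficiency overheads are ignored.

SCALEUP_BW = 2.8e12   # bytes/s, bidirectional, per accelerator (quoted figure)
TRAY_SIZE = 4         # Maia chips per tray, fully connected (quoted figure)

def transfer_time_ms(num_bytes: float, bw: float = SCALEUP_BW) -> float:
    """Idealized time to push num_bytes over the scale-up links, in ms."""
    return num_bytes / bw * 1e3

# Example: moving a 2 GiB activation / KV-cache shard between two chips
shard = 2 * 2**30
print(f"2 GiB point-to-point: ~{transfer_time_ms(shard):.2f} ms")

# Example: a ring all-reduce of a 1 GiB buffer across the tray moves
# roughly 2*(N-1)/N of the buffer through each chip's links
buf = 1 * 2**30
allreduce_bytes = 2 * (TRAY_SIZE - 1) / TRAY_SIZE * buf
print(f"1 GiB tray all-reduce: ~{transfer_time_ms(allreduce_bytes):.2f} ms")
```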
03. Cuts chip deployment time in half, improves performance per dollar and per watt

AI models were running on Maia 200 within days of the first packaged parts arriving, and the time from first silicon to the first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs.
This is because a core principle of Microsoft's chip development program is to validate as much of the end-to-end system as possible before final silicon arrives.

04. The era of large-scale AI is only beginning, and infrastructure will determine what is possible.
Maia 200 is part of our heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance per dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipeline use cases, Maia 200's unique design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, feeding downstream training with fresher, more targeted signals.

Maia 200 is deployed in our US Central datacenter region near Des Moines, Iowa, with the US West 3 datacenter region near Phoenix, Arizona, coming next and future regions to follow.

Maia 200 integrates seamlessly with Azure, and we are previewing the Maia SDK with a complete set of tools to build and optimize models for Maia 200. It includes a full set of capabilities, including PyTorch integration, a Triton compiler and optimized kernel library, and access to Maia's low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.

Engineered for AI inference

Crucially, FLOPS aren't the only ingredient for faster AI. Feeding data is equally important. Maia 200 attacks this bottleneck with a redesigned memory subsystem. The Maia 200 memory subsystem is centered on narrow-precision datatypes, a specialized DMA engine, on-die SRAM and a specialized NoC fabric for high-bandwidth data movement, increasing token throughput.

[Table: "Industry-leading capability", peak specifications for Azure Maia 200, AWS Trainium 3 and Google TPU v7]

Optimized AI systems

Each accelerator exposes:
- 2.8 TB/s of bidirectional, dedicated scale-up bandwidth

Within each tray, four Maia accelerators are fully connected with direct, non-switched links, keeping high-bandwidth communication local for optimal inference efficiency. The same communication protocols are used for intra-rack and inter-rack networking using the Maia AI transport protocol, enabling seamless scaling across nodes, racks and clusters of accelerators with minimal network hops. This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale.

A cloud-native development approach

A sophisticated pre-silicon environment guided the Maia 200 architecture from its earliest stages, modeling the computation and communication patterns of LLMs with high fidelity. This early co-development environment enabled us to optimize silicon, networking and system software as a unified whole, long before first silicon. We also designed Maia 200 for fast, seamless availability in the datacenter from the beginning, building out early validation of some of the most complex system elements, including the backend network and our second-generation, closed-loop, liquid-cooling Heat Exchanger Unit. Native integration with the Azure control plane delivers security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximizing reliability and uptime for production-critical AI workloads.
As a result of these investments, AI models were running on Maia 200 silicon within days of first packaged part arrival. Time from first silicon to first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs. And this end-to-end approach, from chip to software to datacenter, translates directly into higher utilization, faster time to production and sustained improvements in performance per dollar and per watt at cloud scale.

Sign up for the Maia SDK preview

Today, we're inviting developers, AI startups and academics to begin exploring early model and workload optimization with the new Maia 200 software development kit (SDK). The SDK includes a Triton compiler, support for PyTorch, low-level programming in NPL and a Maia simulator and cost calculator to optimize for efficiencies earlier in the code lifecycle. Sign up for the preview here.
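The promise of "easy model porting across heterogeneous hardware accelerators" rests on the PyTorch layer described above; in practice, porting usually comes down to swapping the device and compiler backend. The snippet below is a generic PyTorch sketch of that pattern. The "cuda" device and "inductor" backend are stand-ins; the actual device and backend names exposed by the Maia SDK are not public, so treat them as assumptions:

```python
# Generic PyTorch sketch of backend-portable model code. The device string and
# torch.compile backend are parameters, so retargeting to another accelerator
# (e.g. whatever backend the Maia SDK registers; its name is not public) is a
# one-line change rather than a model rewrite.
import torch
import torch.nn as nn

def build_and_compile(device: str = "cuda", backend: str = "inductor"):
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).to(device)
    # torch.compile lowers the model through the chosen compiler backend;
    # "inductor" is the stock backend and stands in for a vendor backend here.
    return torch.compile(model, backend=backend)

if __name__ == "__main__":
    model = build_and_compile()
    x = torch.randn(8, 1024, device="cuda")
    with torch.no_grad():
        print(model(x).shape)  # torch.Size([8, 1024])
```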