|
||||||
|
|
|
介绍 为满足AI规模数据中心的现实需求,ORW规范定义了一种开放的、双宽机架,旨在满足下一代AI系统对电力、冷却和可维护性的需求。这代表了行业向标准化、可互操作和可扩展的数据中心设计的基金会转变。 AMD很自豪地与Meta和开放计算项目社区合作,通过 Helios推进这一愿景——AMD最先进的 rack-scale 参考系统,完全基于 ORW 开放标准构建。 Helios将 AMD 从硅片到系统再到机架再到大规模集群的开放理念延伸,实现了支持 ORW 规范的开放硬件原则。 Helios:将开放标准变成机架规模现实
由AMD CDNA?架构驱动,每块MI450系列GPU提供高达432 GB的HBM4内存和19.6 TB/s的内存带宽,为渴望数据的AI模型提供行业领先的容量和带宽。在机架规模上, Helios系统搭载72块MI450系列GPU,提供高达1.4 exaFLOPS的FP8和2.9 exaFLOPS的FP4性能,总HBM4内存达31 TB,总带宽达1.4 PB/s——这是一代性的飞跃,使万亿参数训练和大规模AI推理成为可能。 Helios还提供了高达260 TB/s的垂直扩展互连带宽和43 TB/s的以太网为基础的水平扩展带宽,帮助确保GPU、节点和机架之间的无缝通信。 Helios相比上一代性能提升了高达17.9倍1,同时提供比NVIDIA的Vera Rubin系统多50%的内存容量和带宽。 Helios 机架将在 2025 年 OCP 大会上展示 这是AMD为前沿AI工作负载专门设计的第一个机架规模解决方案,为超大规模计算和企业提供了面向未来、基于开放标准的平台,该平台结合了强大的性能、灵活性和互操作性。 像 Helios这样的大规模系统对于下一代人工智能至关重要,因为性能依赖于数千个加速器之间的高效通信。AMD在开放标准方面的领导地位,如开放计算项目(OCP)、Ultra Accelerator Link(UALink?)和Ultra Ethernet Consortium(UEC),有助于确保这种扩展是通过行业合作实现的——使开放、高性能的 fabrics成为向上扩展和向外扩展人工智能集群的可能。这些努力共同定义了为人工智能时代构建的可互操作、节能基础设施的路径。
推动生态系统中的开放创新
基于Meta提交给OCP的ORW规范, Helios使OEM和ODM合作伙伴能够: 采用并扩展 Helios参考设计,加速新AI系统的上市时间。 为现代数据中心需求而设计
更高的扩展吞吐量和HBM带宽相比上一代使模型训练和推理速度更快。 实现人工智能基础设施革命的开放性
对于OEM和ODM, Helios提供了一个现成的、与OCP齐头并进的系统来构建差异化的AI基础设施。 对于客户来说,这意味着更快的部署、更低的风险以及在为AI、HPC和主权项目扩展计算能力方面的更多灵活性。
迈向2026年
基于Meta提交给开放计算项目(Open Compute Project)的ORW规范, Helios体现了AMD对开放和协作创新的承诺——塑造人工智能基础设施的下一阶段,并证明当行业共同建设时,每个人都能加速发展。 脚注
AMD “Helios’’: Advancing Open AI Infrastructure Built on Meta’s 2025 OCP Open Rack for AI Design Designed to meet the realities of AI-scale data centers, the ORW specification defines an open, double-wide rack optimized for the power, cooling, and serviceability demands of next generation AI systems. It represents a foundational shift toward standardized, interoperable, and scalable data center design across the industry. AMD is proud to align with Meta and the Open Compute Project community in advancing this vision through “Helios” — the most advanced rack-scale reference system from AMD, built fully on the ORW open standards. “Helios” extends the AMD philosophy of openness from silicon to system to rack to large-scale clusters, bringing to life the open hardware principles that underpin the ORW specification. “Helios”: Turning Open Standards into Rack-Scale Reality Powered by the AMD CDNA? architecture, each MI450 Series GPU delivers up to 432 GB of HBM4 memory and 19.6 TB/s of memory bandwidth, providing industry-leading capacity and bandwidth for data-hungry AI models. At rack scale, a “Helios” system with 72 MI450 Series GPUs delivers up to 1.4 exaFLOPS of FP8 and 2.9 exaFLOPS of FP4 performance, with 31 TB of total HBM4 memory and 1.4 PB/s of aggregate bandwidth — a generational leap that enables trillion parameter training and large scale AI inference. “Helios” also features up to 260 TB/s of scale-up interconnect bandwidth and 43 TB/s of Ethernet-based scale-out bandwidth, helping ensure seamless communication across GPUs, nodes, and racks. “Helios” delivers up to 17.9× higher performance compared to previous generations1, while offering 50% more memory capacity and bandwidth than NVIDIA’s Vera Rubin system. “Helios” Rack on Display at OCP 2025 Conference Rack-scale systems like “Helios” are essential for the next generation of AI, where performance depends on efficient communication across thousands of accelerators. AMD leadership in open standards such as the Open Compute Project (OCP), Ultra Accelerator Link (UALink?), and Ultra Ethernet Consortium (UEC) helps ensure that this scaling happens through industry collaboration — enabling open, high-performance fabrics for both scale-up and scale-out AI clusters. Together, these efforts define the path toward interoperable, energy-efficient infrastructure built for the AI era. Driving Open Innovation Across the Ecosystem Built on the ORW specification submitted by Meta to OCP, “Helios” enables OEM and ODM partners to: Adopt and extend the “Helios” reference design, accelerating time-to-market for new AI systems. Purpose-Built for Modern Data Center Realities Higher scale-out throughput and HBM bandwidth compared to previous generations enable faster model training and inference. Enabling the Openness in AI Infrastructure Revolution For OEMs and ODMs, “Helios” provides a ready-made, OCP-aligned system to build differentiated AI infrastructure. For customers, it means faster deployment, lower risk, and more flexibility in how they scale compute for AI, HPC, and sovereign initiatives. On Track for 2026 Built on the ORW specifications submitted by Meta to Open Compute Project, “Helios” embodies AMD’s commitment to open, collaborative innovation — shaping the next phase of AI infrastructure and proving that when the industry builds together, everyone accelerates. Footnotes
- “Helios”基于 Meta 贡献的新 Open Rack Wide (ORW) 标准构建,代表了开放 AI 基础设施的重大飞跃。它由 AMD Instinct GPU、EPYC CPU 和 Pensando 网络提供支持,旨在满足下一代 AI 工作负载的需求,具有无与伦比的性能和可扩展性。Meta推出了Open Rack Wide的双宽机柜标准,采用AMD GPU打了个样 Dual Rank Wide的双宽机柜还是搭有18个compute tray,每个compute tray有4个GPU,合计72个GPU;switch tray的数量变为6个,和NVL72 9个 switch tray 不一样。一直奇怪为什么要搞双宽的机柜,但还是72个GPU?和Meta的技术人员交流后发现,Dual rack wide的优势是用空间换方便,更容易更换compute tay和switch tray,直接抽拔就行;更大的空间更利于进行硬件设计,供电和散热的设计比较简单,不像单宽机柜那么紧凑,更加容易维护。 Helios机柜内的GPU纸面参数 Helios的第一个客户是Oracle,预计发货时间为2026年Q3,2027年实现量产。
- 首先,我们推出了突破性的“Helios”机架级平台。这个基于开放的人工智能参考平台建立在 Meta 为 Open Compute Project Foundation 贡献的新 Open Rack Wide (ORW) 标准之上,改变了游戏规则。Helios 由 AMD Instinct GPU、EPYC CPU 和 Pensando 网络组合提供支持,旨在提供下一代 AI 工作负载所需的性能、效率和可扩展性。 在此基础上,我们正在扩大与 甲骨文 的合作伙伴关系,后者将在基于 50,000 个 AMD Instinct MI450 系列 GPU 和 Helios 的 Oracle Cloud 基础设施 (OCI) 上部署 AI 超级集群。
关于我们 北京汉深流体技术有限公司 是丹佛斯中国数据中心签约代理商。产品包括FD83全流量双联锁液冷快换接头(互锁球阀);液冷通用快速接头UQD & UQDB;OCP ORV3盲插快换接头BMQC;EHW194 EPDM液冷软管、电磁阀、压力和温度传感器。在人工智能AI、国家数字经济、东数西算、双碳、新基建战略的交汇点,公司聚焦组建高素质、经验丰富的液冷工程师团队,为客户提供卓越的工程设计和强大的客户服务,支持全球范围内的大批量交付。 公司产品涵盖:丹佛斯液冷流体连接器、EPDM软管、电磁阀、压力和温度传感器及Manifold。 - 针对机架式服务器中Manifold/节点、CDU/主回路等应用场景,提供不同口径及锁紧方式的手动和全自动快速连接器。
About Us Beijing Hansen Fluid Technology Co., Ltd. is an authorized distributor of Danfoss China, specializing in the data center industry. Our product portfolio includes Danfoss FD83 full-flow double-interlock liquid cooling quick-disconnect couplings (equipped with interlocking ball valves); universal liquid cooling quick-disconnect couplings UQD & UQDB; OCP ORV3 blind-mate quick-disconnect couplings BMQC; EHW194 EPDM liquid cooling hoses; solenoid valves; and pressure/temperature sensors. Amid the convergence of strategic trends such as artificial intelligence (AI), China’s national digital economy, the “Eastern Data and Western Computing” initiative, the “dual carbon” goals, and new infrastructure development, we are committed to building a high-caliber, experienced team of liquid cooling engineers. We deliver exceptional engineering design, robust customer service, and support global large-scale deployment. Products: Danfoss liquid cooling fluid connectors, EPDM hoses, solenoid valves, pressure/temperature sensors, and manifolds. - We offer manual and fully automatic quick-disconnect couplings in various calibers and locking mechanisms, suitable for scenarios such as Manifold/Node and CDU/Main Circuit in rack-mounted servers.
|
|