2024 OCP Global Summit: Power and Cooling Contributions Rise to the Top
October 29, 2024 | Lucas Beran
The 2024 OCP Global Summit theme was “From Ideas to Impact,” but it could just as well have been “AI Ideas to AI Impact.” Accelerated computing infrastructure was front and center from the keynote to the exhibition hall floor and the breakout sessions. Hyperscalers and the ecosystem of suppliers that support them were eager to share what they have been working on to bring accelerated computing infrastructure and AI workloads to market at scale. As you might expect with anything AI-related, it drew a crowd: over 7,000 attendees participated in 2024, a significant increase from roughly 4,500 the year before. Throughout the crowds, sessions, and expo hall, three key themes stood out to me: power and cooling designs for NVIDIA GB200 NVL racks, an explosion of interest in liquid cooling, and sustainability's presence against the AI backdrop.
Powering and Cooling NVIDIA GB200 NVL Racks
It's well known that accelerated computing infrastructure significantly increases rack power densities. This has posed a significant challenge for traditional data center designs, where compute and physical infrastructure are developed and deployed in relative isolation. Deploying accelerated computing infrastructure has forced a rethink in which these boundaries are removed to create an optimized end-to-end system that supports next-generation “AI factories” at scale. The data center industry is acutely aware that this applies to power and cooling, as shown by notable announcements and OCP contributions from industry leaders on how they are addressing these challenges:
- Meta kicked off the keynote by announcing Catalina, a rack-scale infrastructure design based on NVIDIA GB200 compute nodes. This design increases power requirements from 12–18 kW per rack to 140 kW per system. Unsurprisingly, Catalina utilizes liquid cooling.
- NVIDIA contributed (open-sourced) elements of its GB200 NVL72 design, including a powerful 1,400-amp bus bar for distributing power in the rack, along with many liquid cooling contributions related to the manifold, blind mating, and flow rates. NVIDIA also recognized a new ecosystem of partners focused on power and cooling infrastructure, highlighting Vertiv's GB200 NVL72 reference architecture, which enables faster time to deployment, uses less space, and increases cooling energy efficiency.
- Microsoft emphasized the need for liquid cooling for AI accelerators, noting the challenges of retrofitting facilities that lack a chilled water loop. In response, it designed and contributed a custom liquid cooling heat exchanger that leverages legacy air-based data center heat rejection. This is what I would refer to as air-assisted liquid cooling (AALC), or, more specifically, an air-assisted coolant distribution unit (CDU), which is becoming increasingly common in retrofitted accelerated computing deployments.
- Microsoft also announced a collaborative power architecture effort with Meta, named Mt. Diablo, based on a 400 Vdc disaggregated power rack that will be contributed to OCP soon. Google also highlighted the potential use of 400 Vdc for future accelerated computing infrastructure (the sketch after this list puts these power, current, and flow figures in rough perspective).
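To put the numbers in these announcements in context, here is a minimal back-of-the-envelope sketch in Python. The 140 kW figure comes from Meta's Catalina announcement above; the coolant temperature rise, water properties, and the 48 Vdc comparison point are illustrative assumptions of mine, not values from the OCP contributions.

```python
# Back-of-the-envelope numbers for a 140 kW rack. All inputs besides the
# 140 kW figure (Meta's Catalina) are illustrative assumptions.

RACK_POWER_W = 140_000      # Catalina: 140 kW per system
CP_WATER = 4186             # specific heat of water, J/(kg*K)
WATER_DENSITY = 997         # kg/m^3 near 25 C
DELTA_T_K = 10.0            # assumed coolant temperature rise across the rack

# Heat balance P = m_dot * cp * dT gives the required coolant mass flow.
m_dot_kg_s = RACK_POWER_W / (CP_WATER * DELTA_T_K)
flow_l_min = m_dot_kg_s / WATER_DENSITY * 1000 * 60
print(f"Coolant flow to absorb 140 kW at a {DELTA_T_K:.0f} K rise: "
      f"~{flow_l_min:.0f} L/min")

# Ohm's-law arithmetic I = P / V shows why higher-voltage distribution
# (e.g., the 400 Vdc Mt. Diablo direction) reduces conductor current.
for volts in (48, 400):
    print(f"{RACK_POWER_W // 1000} kW at {volts} Vdc draws "
          f"{RACK_POWER_W / volts:,.0f} A")
```

The takeaway: a single 140 kW rack needs coolant flow on the order of a couple hundred liters per minute, and raising the distribution voltage from 48 Vdc to 400 Vdc cuts conductor current roughly tenfold, which is the basic appeal of the 400 Vdc direction.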
Data Center Liquid Cooling Takes Center Stage
Liquid cooling was among the most discussed topics at the summit, mentioned by nearly every keynote speaker, in addition to dozens of breakout sessions dedicated to its growing use in compute, networking, and facility designs. This attention is justified from my perspective, as Dell'Oro Group previously highlighted liquid cooling as a technology going mainstream, creating a $15 billion market opportunity over the next five years. Furthermore, the ecosystem understands that liquid cooling is not only a growing market opportunity but a critical technology for enabling accelerated computing and the growth of AI workloads at scale.
Beyond the talk, partnerships and acquisitions leading up to and during the global summit further cemented the critical role data center liquid cooling will play in the industry's future. This was highlighted in the following announcements:
- Jabil acquired Mikros Technologies: Kicking off two weeks of big announcements, Jabil's acquisition of Mikros brings together Mikros's expertise in liquid cooling cold plate technology, engineering, and design with Jabil's manufacturing scale. This appears to position Mikros's technology as a high-volume option for hyperscale end users and the greater data center industry in the near future.
- Jetcool announced a facility CDU and a Flex partnership: Jetcool, best known for its air-assisted liquid cooling infrastructure packaged in single servers, introduced a facility (liquid-to-liquid) CDU to keep pace with the market's evolution toward purpose-built AI factories. The partnership pairs a technology specialist with a contract manufacturer to deliver the scale needed to support hyperscale end users' and the greater data center industry's liquid cooling needs.
- Schneider Electric acquired Motivair: On the summit's final day, Schneider Electric announced its $1.13B acquisition of Motivair. This move, following prior partnerships and organic CDU development, expands Schneider's high-density cooling portfolio. It gives Schneider a holistic power and cooling portfolio to support large-scale accelerated computing deployments, a capability previously exclusive to Vertiv, albeit one that came at a high cost for Schneider.
Sustainability Takes a Back Seat but Is Still Very Much Part of the Conversation
While sustainability did not dominate the headlines, it remained a recurring theme throughout the summit. As AI growth drives massive infrastructure expansion, sustainability has become a critical consideration in data center designs. OCP CEO George Tchaparian characterized sustainability's role alongside AI capex investments best: “Without sustainability, it's not going to sustain.” Other highlights include:
- OCP announced a new alliance with the Net Zero Innovation Hub, an organization focused on net-zero data center innovation in Europe. Details on the alliance were sparse, but more are expected to emerge at the 2025 OCP EMEA Regional Summit.
- Google shared a collaboration with Meta, Microsoft, and Amazon on green concrete. Most impressively, the collaboration began with a roadmap around the time of last year's OCP Summit and resulted in a proof-of-concept deployment in August 2024 that reduced concrete emissions by ~40%.
- A wide range of other sustainability topics was discussed as well. Improvements in cooling efficiency, water consumption, heat reuse, clean power, lifecycle assessment, and metrics to measure and track progress on data center efficiency and sustainability were all prevalent; power usage effectiveness (PUE), sketched below, is the most familiar of those metrics.
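As a concrete example of that last point, here is a minimal sketch of how PUE, the industry's standard efficiency ratio of total facility power to IT power, is computed. The function and all input figures below are hypothetical illustrations, not numbers reported at the summit.

```python
# Power usage effectiveness (PUE) = total facility power / IT equipment power.
# A PUE of 1.0 would mean every watt reaches the IT load; the overhead terms
# below are hypothetical, for illustration only.

def pue(it_kw: float, cooling_kw: float, power_loss_kw: float,
        other_kw: float = 0.0) -> float:
    """Return PUE given the IT load and the main facility overheads (kW)."""
    return (it_kw + cooling_kw + power_loss_kw + other_kw) / it_kw

# The same hypothetical 10 MW IT load under two cooling regimes.
print(f"Air-cooled hall:        PUE = {pue(10_000, 4_000, 800):.2f}")
print(f"Liquid-cooled retrofit: PUE = {pue(10_000, 1_500, 800):.2f}")
```

Tracking a ratio like this over time is what lets operators quantify whether changes such as a liquid cooling retrofit actually reduce facility overhead rather than just shifting it.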
Conclusion: Data Center Power and Cooling Is Central to the Future of the Data Center Industry
The 2024 OCP Global Summit left me as confident as ever in the growing role data center power and cooling infrastructure plays in the data center industry. It's not only improvements to existing technologies but also the adoption of new technologies and facility architectures that have emerged. The event's theme, “From Ideas to Impact,” serves as a fitting reminder of how AI is reshaping the industry, with significant implications for the future. As we look ahead, the question isn't just how data centers will power and cool AI workloads, but how they'll do so sustainably, efficiently, and at unprecedented scale.