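Nvidia is considering a socketed design for its next-generation Blackwell B300 AI GPU, a change that would let users replace the GPUs used for AI and HPC applications themselves.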
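According to a TrendForce report, the design change may apply to the product codenamed GB300. The credibility of this information is still uncertain, but discussion within the supply chain makes the possibility worth watching. MoneyDJ reports that Nvidia may adopt a socketed design, in place of the current practice of soldering GPUs to the motherboard, because of concerns over AI GPU failure rates under high load, motherboard replacement costs, and cooling. CLSA analyst Chen Shuowen said that Nvidia may begin designing GPU sockets for its products starting with the GB200 Ultra, possibly in a four-way Nvidia GPU design paired with an Nvidia CPU. A socketed design, however, would add to power and cooling challenges rather than ease them, contrary to what some of the reports suggest. At present, Nvidia's GB200 platform uses BGA packaging for both the CPU and the GPU, and it is unclear whether the upcoming B200 Ultra refresh will change that.

Nvidia's data center lineup spans several GPU and CPU platforms, including the A100, H100, and B100/B200, as well as GH100 and GB200. How these products are designed and packaged has a major effect on their performance and serviceability. Standard CPU sockets are easy to repair and upgrade, but in servers they take up more space than BGA packages or SXM/OAM modules and impose tighter power and thermal limits.

Most of Nvidia's SXM modules (a high-performance, high-power GPU module form factor designed for data centers, connected through a dedicated SXM socket) are currently made by Foxconn. Moving from modules to sockets could lower costs but would also limit performance.

Nvidia has formally launched its B200 GPU, a high-performance GPU rated at over 1000W. The B200 is designed for GB200 boards, which come in two versions codenamed Ariel and Bianca. The Ariel board carries one Grace CPU and one Blackwell GPU, while the Bianca board carries one Grace CPU and two Blackwell GPUs. In these configurations the B200 comes in BGA form, soldered directly to the board for higher performance and efficiency. Nvidia has also introduced Umbriel GPU baseboards, which support up to eight B200 (1000W) and B100 (700W) GPUs in SXM module form.

In addition, according to SemiAnalysis, Nvidia plans to launch GB200 platforms codenamed Miranda and Oberon. These platforms are expected to offer higher performance, including support for PCIe 6.0 and 800G networking, along with a higher TDP (thermal design power), which implies more compute capability but also more complex cooling.

Nvidia has already released Hopper-based H100 and H200 add-in cards but has not announced any add-in card with a Blackwell-based GPU. Unofficial information indicates, however, that Nvidia is preparing a product codenamed B200A, built on a monolithic B102 processor connected to four HBM3E memory stacks using TSMC's CoWoS-S packaging, unlike the earlier B100/B200 designs.

The B200A may come in several form factors, including an SXM module and an add-in card, and possibly even a socketed design. That flexibility may point to the future direction of Nvidia's data center GPUs.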
NVIDIA Blackwell Ultra “B300” AI GPUs For GB300 Servers Might Utilize A Socketed Design

NVIDIA's Blackwell Ultra "B300" GPUs may introduce a socketed design on GB300 servers, which would make maintenance and upgrades easier.

With a socket-based design, NVIDIA's Blackwell Ultra "B300" AI GPUs could be used just like CPUs
However, this could be the last series to feature the onboard design, as several reports suggest that NVIDIA could move to a different design with the Blackwell B300 "Ultra" GPUs for GB300 servers. As per MoneyDJ and Economic Daily News (via TrendForce), the B300 GPUs could feature a socket-based design, which would allow users to install GPUs on, or remove them from, the motherboard. The socketed approach is said to simplify the manufacturing process for NVIDIA and could benefit several companies, especially Taiwan-based Foxconn and LOTES, which produce interconnect components and sockets.

The current Blackwell GPUs are soldered directly to the motherboard. With the transition to a socket-type design, the B300 GPUs could be removed from the motherboard just like CPUs. The transition would bring several benefits, including improved yield rates and more flexible production, as the GPU would no longer have to be soldered to the board and NVIDIA would not need to rely on Surface Mount Technology. It would also simplify maintenance and after-sales service, as the whole motherboard would not have to be replaced when a GPU-related problem occurs. As a result, downtime when GPUs are changed could be reduced, which would help companies offer more reliable servers to their customers.

However, the new socket design is expected to cost some performance because it introduces higher latency. Nonetheless, if maintenance and upgrades become easier and yields improve, the trade-off will be worth the design transition.

Another important change with the B300 is the adoption of FP4 (Floating Point 4), which benefits inference. Inference is how trained models make predictions on data, and it is a crucial aspect of AI computation. The B200 is already exceptional in AI workloads and has been deployed by various companies.
Meanwhile, the B300 "Blackwell Ultra" is expected to improve performance significantly. It would not be the first to use a socket-based design, however, as AMD already did so with its MI300A chips, launched in 2023.
By Anton Shilov, published 3 days ago

Image: GB200 Grace Blackwell Superchip

MoneyDJ reports that, considering the failure rates of AI GPUs under high loads, the replacement costs of motherboards, and cooling challenges, Nvidia and other AI GPU designers might consider using socket designs for their next generation of GPUs instead of soldering GPUs to motherboards. EDN cites Chen Shuowen, an analyst with CLSA, as saying that, based on supply chain checks, Nvidia has been designing GPU sockets for its products, possibly starting with the GB200 Ultra. Chen reportedly mentioned a 4-way Nvidia GPU design with one Nvidia CPU. Neither of the reports mentions anything called GB300, so TrendForce has added this part, possibly based on some additional chatter.

Several things about the reports should be noted. Socketed designs would add to power and cooling challenges rather than help solve them, so the first report is inaccurate. The most power-hungry GPUs usually use BGA packaging. A 4-way Blackwell GPU with one CPU motherboard does not look extraordinary, considering that with DGX servers we see an 8-way GPU baseboard and a 2-way CPU motherboard, yet such a design looks incredible.

Nvidia's data center nomenclature distinguishes between the company's GPUs (A100, H100, B100/B200) and its Grace CPU + GPU platforms (GH100, GB200). For now, GB200 platforms use BGA packaging for both CPU and GPU; we are not sure something has to change with the B200 Ultra refresh, especially with the possible GB200 Ultra refresh sometime in the second half of the year.

We all love standard CPU sockets for their easy repairs and upgradeability. But in servers, they take up more space and have more power and thermal constraints than BGA packages or SXM/OAM modules. While the modules provide repairability, the process might vary depending on the specific motherboard design, and removing an OAM/SXM module requires careful handling, so they are not as good as sockets.