Liquid Cooling for an H100/B200 Cluster — Direct-to-Chip vs Immersion

NVIDIA H100 SXM5 draws 700 W under full FP16/BF16 training load. B200 SXM follows at 1,000 W. A full GB200 NVL72 rack draws about 132 kW. These numbers are beyond what air cooling can handle: not just economically or efficiently, but physically. This article covers what the decision looks like when you're standing in front of an 8-node H100 cluster or planning a first B200 deployment, and why two-phase immersion has died over the last 18 months.

Why air cooling stops above 30 kW/rack

Air cooling works like this: heat flows from the chip into copper heatpipes → heatsink fins → fan-driven air → the CRAC unit cools the air → repeat. The governing relation is the sensible-heat equation:

Q = ṁ × cp × ΔT

Where Q is the heat to remove (W), ṁ is air mass flow (kg/s), cp is the specific heat capacity of air (1,005 J/kg·K) and ΔT is the difference between outlet and inlet air temperature.

For a 30 kW rack with ΔT = 15 K you need: ṁ = 30,000 / (1,005 × 15) ≈ 1.99 kg/s, which at an air density of ~1.2 kg/m³ is ≈ 1.66 m³/s, or roughly 6,000 m³/h of air

Moving that much air means fans in the rack + CRAC consuming 3.5–4.5 kW just for ventilation. At 60 kW per rack you'd need double the air, about 3.3 m³/s (~12,000 m³/h), which already means 90+ dB of noise and runs into physical chassis limits (there is nowhere to put more fans). The ventilation alone then consumes 8–11 kW. Realistically, PUE for pure air cooling at 60 kW per rack rises to 1.8–2.2, which is economically unmanageable.
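The airflow figures follow directly from Q = ṁ × cp × ΔT. A minimal Python sketch, assuming an air density of 1.2 kg/m³ (not given in the text) to convert mass flow into volume flow:

```python
# Sensible heat balance Q = m_dot * cp * dT, solved for the required air flow.
CP_AIR = 1005.0   # J/(kg*K), specific heat of air (value used in the text)
RHO_AIR = 1.2     # kg/m^3, assumed air density near room temperature at sea level

def air_flow(q_watts: float, delta_t_k: float) -> tuple[float, float]:
    """Return (mass flow in kg/s, volume flow in m^3/h) needed to carry q_watts."""
    m_dot = q_watts / (CP_AIR * delta_t_k)
    v_dot_m3_per_h = m_dot / RHO_AIR * 3600.0
    return m_dot, v_dot_m3_per_h

for rack_kw in (30, 60):
    m_dot, v_dot = air_flow(rack_kw * 1000, delta_t_k=15)
    print(f"{rack_kw} kW rack: {m_dot:.2f} kg/s, {v_dot:,.0f} m^3/h")
```

At ΔT = 15 K this gives about 1.99 kg/s (roughly 6,000 m³/h) for 30 kW and twice that for 60 kW. For a fixed load, a larger ΔT across the rack is the main lever for reducing airflow.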

The thermal path from chip to heatsink is another limit. H100 junction temperature must stay below 87 °C and supply air can be as warm as 27 °C, so the whole path has a ΔT budget of only 60 °C. A 700 W chip on a die of about 8 cm² (the GH100 die is 814 mm²) gives an average heat flux of roughly 86 W/cm². Air can't remove that efficiently at reasonable velocities (3–6 m/s); above about 50 W/cm², liquid cooling becomes necessary.
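The same temperature budget can be expressed as a maximum thermal resistance from junction to inlet air, and the heat flux as power over die area. A sketch using the 87 °C, 27 °C and 700 W figures plus the 814 mm² GH100 die; treating all 700 W as leaving through the die is a simplifying assumption:

```python
# Junction-to-air temperature budget and die heat flux for one H100 SXM5.
T_JUNCTION_MAX_C = 87.0      # junction limit used in the text
T_INLET_AIR_C = 27.0         # inlet air temperature used in the text
GPU_POWER_W = 700.0          # H100 SXM5 power
DIE_AREA_CM2 = 8.14          # GH100 die, 814 mm^2
AIR_FLUX_LIMIT_W_CM2 = 50.0  # practical air-cooling limit quoted in the text

def max_thermal_resistance(t_j_max: float, t_inlet: float, power_w: float) -> float:
    """Largest total junction-to-inlet thermal resistance (K/W) that keeps Tj in spec."""
    return (t_j_max - t_inlet) / power_w

def heat_flux(power_w: float, die_area_cm2: float) -> float:
    """Average heat flux (W/cm^2), assuming all power leaves through the die area."""
    return power_w / die_area_cm2

r_max = max_thermal_resistance(T_JUNCTION_MAX_C, T_INLET_AIR_C, GPU_POWER_W)
flux = heat_flux(GPU_POWER_W, DIE_AREA_CM2)
print(f"max junction-to-air resistance: {r_max:.3f} K/W")
print(f"average heat flux: {flux:.0f} W/cm^2, above air limit: {flux > AIR_FLUX_LIMIT_W_CM2}")
```

The whole path (thermal interface, heatsink and air) has to stay under about 0.086 K/W, and the average flux sits well above the 50 W/cm² level where air stops being practical.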

Practical threshold: above 30 kW/rack air cooling loses economic sense, above 50 kW/rack it loses physical sense.

Direct-to-Chip (DTC) — how it actually works

DTC = a cold plate sits directly on each CPU and GPU, and coolant (typically a propylene glycol/water mix at 25:75 or 30:70) flows through microchannels in the cold plate. The coolant picks up heat and carries it to the CDU (Coolant Distribution Unit), which transfers it through a heat exchanger to a secondary loop, typically facility water that runs to a chiller or dry cooler.

Topology in a real 8-node H100 cluster

8× DGX H100 or 8× HPE Cray EX H100 — each node draws 10.2 kW (8× H100 SXM5 + 2× Sapphire Rapids CPU + DPU + NIC + PSU losses)
Rack TDP: ~85 kW (8 nodes + 2× InfiniBand switch + storage chassis)
DTC coverage: SXM5 GPUs + CPUs. NICs + DPUs stay air-cooled (the residual 12–15% of the heat)
CDU per rack: Asetek RackCDU D2C or CoolIT CHx650, capacity 100–150 kW per CDU unit
Secondary loop: facility water 32–40 °C input → 45–55 °C output (W4 ASHRAE liquid cooling envelope)
Heat rejection: dry cooler in an EU climate (no chiller needed with 32 °C+ water), giving year-round free cooling with the right design
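A rough sketch of how the 85 kW rack splits between liquid and air, and how much facility water the liquid share needs. It assumes plain water on the facility side (cp ≈ 4,186 J/kg·K, ~0.99 kg/L) and a 10 K rise, e.g. 40 °C in and 50 °C out; none of these values come from the project itself:

```python
# Split an 85 kW DTC rack into liquid-cooled and residual air-cooled heat,
# then size the facility-water flow for the liquid share.
CP_WATER = 4186.0   # J/(kg*K), assumed plain water on the facility side
RHO_WATER = 0.99    # kg/L, assumed density around 40-50 C

def split_heat(rack_kw: float, air_fraction: float) -> tuple[float, float]:
    """Return (liquid_kw, air_kw) for a given fraction left to air cooling."""
    air_kw = rack_kw * air_fraction
    return rack_kw - air_kw, air_kw

def water_flow_l_per_min(heat_kw: float, delta_t_k: float) -> float:
    """Water flow (L/min) that absorbs heat_kw with a delta_t_k temperature rise."""
    m_dot = heat_kw * 1000.0 / (CP_WATER * delta_t_k)   # kg/s
    return m_dot / RHO_WATER * 60.0

rack_kw = 85.0
for air_fraction in (0.12, 0.15):
    liquid_kw, air_kw = split_heat(rack_kw, air_fraction)
    flow = water_flow_l_per_min(liquid_kw, delta_t_k=10.0)   # e.g. 40 C in, 50 C out
    print(f"air {air_fraction:.0%}: liquid {liquid_kw:.1f} kW, air {air_kw:.1f} kW, "
          f"water ~{flow:.0f} L/min")
```

The primary PG25/PG30 loop needs somewhat more flow than this for the same ΔT, because glycol mixes have a lower specific heat than water.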

Top DTC vendors in 2026

Asetek
- RackCDU D2C generation 4: the broadest DTC ecosystem
- Cold plates for H100, B200, GB200, Intel Xeon, AMD EPYC
- CDU capacity 80/120/200 kW
- Retrofit price: 5,800–7,200 EUR / rack for cold plates + manifold + quick disconnects
- CDU price: 18–28k EUR per 120 kW unit

CoolIT Systems
- AHx series (Asetek-style) + CHx series (server-level integrated)
- Supplies OEMs (HPE Cray, Lenovo Neptune, Dell PowerEdge XE9680L)
- Stronger OEM integration, fewer retrofit kits
- Price: typically bundled into the OEM server quote, +3–4k EUR / server vs the air-cooled variant

Submer DTC (previously CoolIT Direct-to-Chip)
- Originally an immersion vendor, now offers DTC products too
- Outdoor CDU variants with air-cooled heat rejection (no facility water required)
- Price: 6,500–8,000 EUR / rack

Motivair
- Specialised in high-density HPC retrofits
- ColdPort technology (for HPE Cray EX)
- Price: mostly project-based, 30–80k EUR per cluster

A real benchmark — 8-node DGX H100 cluster

In a project we audited (greenfield AI lab near Munich, fully deployed in Q4 2025):

Parameter	Air-cooled DGX H100	DTC Asetek retrofit
Server TDP per node	10.2 kW	10.2 kW
Cooling power per node	1.4 kW (fans + CRAC share)	0.28 kW (residual fans + CDU pump share)
PUE of the whole cluster	1.45	1.08
Annual consumption (8 nodes)	715 MWh	535 MWh
At 0.18 EUR/kWh	128,700 EUR / year	96,300 EUR / year
CAPEX delta	baseline	+52,000 EUR (8 racks DTC + CDU)
Payback	—	~19 months
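The energy, cost and payback rows can be reproduced from PUE alone. The table does not state the average IT load, so the sketch below back-solves it from the air-cooled row (715 MWh at PUE 1.45); that load is an inference, not a measured figure:

```python
# Reproduce the energy, cost and payback rows of the benchmark table from PUE.
HOURS_PER_YEAR = 8760
PRICE_EUR_PER_KWH = 0.18
CAPEX_DELTA_EUR = 52_000
NAMEPLATE_IT_KW = 8 * 10.2   # 8 nodes x 10.2 kW

# Assumption: average IT load back-solved from the air-cooled row.
AVG_IT_KW = 715_000 / 1.45 / HOURS_PER_YEAR

def annual_kwh(avg_it_kw: float, pue: float) -> float:
    """Facility energy per year: average IT load x PUE x hours."""
    return avg_it_kw * pue * HOURS_PER_YEAR

air_kwh = annual_kwh(AVG_IT_KW, 1.45)
dtc_kwh = annual_kwh(AVG_IT_KW, 1.08)
savings_eur = (air_kwh - dtc_kwh) * PRICE_EUR_PER_KWH
payback_months = CAPEX_DELTA_EUR / savings_eur * 12

print(f"implied average IT load: {AVG_IT_KW:.1f} kW ({AVG_IT_KW / NAMEPLATE_IT_KW:.0%} of nameplate)")
print(f"air: {air_kwh / 1000:.0f} MWh, DTC: {dtc_kwh / 1000:.0f} MWh")
print(f"savings: {savings_eur:,.0f} EUR/year, payback: {payback_months:.1f} months")
```

The implied load is about 56 kW (roughly 69% of nameplate). With it, the DTC row comes out at ~533 MWh (the table rounds to 535), the saving at about 32,800 EUR per year, and the payback at about 19 months.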

With B200 and B300 this benefit grows further (higher TDP → higher ratio of heat rejected through liquid vs. air).

Immersion — single-phase reality

Single-phase immersion = the entire server (without fans) submerged in a dielectric fluid (Submer SmartCoolant, ShellLubri DCT 16, Castrol DC iX). Fluid flows through the tank, enters at 35–45 °C, leaves at 45–55 °C.

Capacity and PUE

Submer SmartPodX: 100 kW per tank, footprint ~2 m²
Asperitas AIC24: 50 kW per tank
GRC ICEraQ Quad: 168 kW per quad-tank
PUE: 1.03–1.06 (the best in the industry)
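For a sense of the fluid volumes involved, the same Q = ṁ × cp × ΔT balance can size the flow through one tank. The fluid properties below are placeholders in the range of hydrocarbon-based dielectric fluids (cp ≈ 2,200 J/kg·K, ~0.8 kg/L), not figures for any product named above:

```python
# Placeholder fluid properties in the range of hydrocarbon-based dielectric fluids;
# these are assumptions, not datasheet values for any specific product.
CP_FLUID = 2200.0   # J/(kg*K)
RHO_FLUID = 0.80    # kg/L

def tank_flow_l_per_min(tank_kw: float, t_in_c: float, t_out_c: float) -> float:
    """Fluid flow (L/min) that carries tank_kw away between t_in_c and t_out_c."""
    delta_t = t_out_c - t_in_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    m_dot = tank_kw * 1000.0 / (CP_FLUID * delta_t)   # kg/s, from Q = m_dot * cp * dT
    return m_dot / RHO_FLUID * 60.0

# A 100 kW tank with fluid entering at 40 C and leaving at 50 C.
print(f"{tank_flow_l_per_min(100, 40, 50):.0f} L/min")
```

With a 10 K rise, a 100 kW tank needs on the order of 340 L/min; halving the temperature rise doubles the flow.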

Real-world limits

1. Server form factor. Not every server can be immersed. An NVIDIA HGX H100 8-GPU baseboard works, but the fans must be removed and the thermal interface reapplied with an immersion-specific gap pad (ShellLubri SC2). Some OEMs (Supermicro, Inspur) offer immersion-ready variants; HPE Cray EX does not.

2. Maintenance. Pulling a server from the tank means: power down, lift the server out with a crane (typically 30–50 kg plus 8–12 kg of fluid retained inside), let it drip off for 10–15 minutes, then move it to the service bench. The operation takes 45–90 minutes, versus about 5 minutes for an air-cooled hot swap.

3. Cabling. Optical cables with a PVC jacket degrade in some fluids, so LSZH (Low Smoke Zero Halogen) or PTFE jackets are required. Cabling costs 1.5–2× more than standard.

4. CAPEX: 25–40k EUR / rack (tank + fluid + CDU secondary loop). For an 8-rack cluster the delta is 200–320k EUR vs DTC.

When single-phase immersion wins

Extreme density. GB200 NVL72 in a greenfield build: 132 kW in a single rack. DTC would need custom CDU sizing; immersion absorbs it natively.
Edge deployment with space constraint. 200 kW IT load in a 20 ft container — air cooling doesn't fit, DTC fits but tightly packed, immersion is the most compact.
Greenfield with a 5+ year horizon. CAPEX delta amortises through OPEX (PUE 1.05 vs 1.08).
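The amortisation argument is a simple payback calculation. The sketch below uses hypothetical inputs (1 MW average IT load, a 150k EUR immersion premium over DTC, and the 0.18 EUR/kWh from the benchmark) and ignores discounting and differences in maintenance cost:

```python
HOURS_PER_YEAR = 8760

def payback_years(capex_premium_eur: float, avg_it_kw: float,
                  pue_dtc: float, pue_immersion: float, eur_per_kwh: float) -> float:
    """Simple, undiscounted payback of an immersion CAPEX premium over DTC."""
    saved_kwh_per_year = avg_it_kw * (pue_dtc - pue_immersion) * HOURS_PER_YEAR
    if saved_kwh_per_year <= 0:
        raise ValueError("immersion PUE must be lower than DTC PUE")
    return capex_premium_eur / (saved_kwh_per_year * eur_per_kwh)

# Hypothetical inputs: 1 MW average IT load, 150k EUR premium, PUE 1.08 vs 1.05.
years = payback_years(150_000, avg_it_kw=1000, pue_dtc=1.08, pue_immersion=1.05, eur_per_kwh=0.18)
print(f"payback: {years:.1f} years")
```

With these inputs the payback is about 3.2 years. It scales inversely with both the IT load and the PUE gap: at half the load, or half the gap, it doubles, which is why the case for immersion depends on a long horizon.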

For brownfield retrofit of an existing DC with air infrastructure, single-phase immersion is almost always a bad choice — form factor change + service disruption + cabling rebuild + maintenance retraining.

Two-phase immersion — why it died

Two-phase = the fluid boils on contact with the hot chip, the vapour condenses on a cooling coil above the tank, and the droplets fall back. It is the most efficient heat-transfer principle of the three, and it is passive, with no pumps in the primary loop.

In 2020–2023 two-phase was considered SOTA: PUE 1.02, capacity 200–300 kW per tank, no mechanical motion in the primary loop. 3M Novec 7100, 7500 and 649 were the flagship fluids: fluorinated, with good thermal properties, and considered environmentally "safe."

The reality, 2022–2026:
- December 2022: 3M announced it would end production of all PFAS (per- and polyfluoroalkyl substances) by the end of 2025.
- 2023: EU REACH proposal for a PFAS restriction (covering over 10,000 chemicals, including Novec). The final restriction is expected to take effect in 2026–2028.
- 2024: Novec 7100 price rose from 65 EUR/kg to 180–220 EUR/kg, with availability restricted to existing customers.
- 2025–2026: No major vendor (Submer, Asperitas, GRC) sells new two-phase systems. Existing installations are maintained, but roadmaps are predominantly single-phase.

Replacement fluids (LiquidCool LCS-CF series, Engineered Fluids ElectroCool) are in the pilot phase. For a production greenfield cluster in 2026, two-phase isn't a realistic choice given vendor support, regulatory risk and long-term fluid availability.

CDU sizing — the rule everyone fine-tunes

The CDU (Coolant Distribution Unit) is the heart of a DTC deployment. It houses the heat exchanger between the primary (server) loop and the secondary (facility water) loop, plus the pumps that drive the primary loop.

Rule of thumb

Per-rack CDU: 1× CDU per rack on 50–100 kW racks. Single point of failure per rack, but a simple architecture. Example: Asetek RackCDU D2C 50.
Per-row CDU: 1× CDU serves 4–6 racks, 200–500 kW total. Better economic scaling, but a failure hits the whole row. Example: CoolIT CHx650.
Central CDU: 1× CDU for the whole DC (1+ MW). Best economic scaling, but requires sophisticated plumbing with thousands of quick disconnects.

N+1 redundancy

For an AI training cluster running 24/7 where a lost checkpoint costs 8–24 hours of training time, CDU redundancy is mandatory. N+1 means: for a 100 kW load you have 2× 100 kW CDU in active-passive, or 3× 50 kW CDU in active-active load sharing.

CAPEX delta: +35–60% on cooling infrastructure. Payback: the first CDU pump failure (typically after ~5–7 years with baseline maintenance).
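The N+1 condition can be checked mechanically: after losing any one unit, the remaining capacity must still cover the load. A minimal sketch with hypothetical helper names, assuming identical CDUs:

```python
def survives_single_failure(load_kw: float, unit_capacity_kw: float, units: int) -> bool:
    """True if identical CDUs still cover load_kw after any one unit fails (N+1)."""
    if units < 1:
        return False
    return (units - 1) * unit_capacity_kw >= load_kw

def normal_load_per_unit_kw(load_kw: float, units: int) -> float:
    """Per-unit load when all units share the load evenly (active-active)."""
    return load_kw / units

print(survives_single_failure(100, 100, 2))  # 2 x 100 kW, active-passive
print(survives_single_failure(100, 50, 3))   # 3 x 50 kW, active-active
print(survives_single_failure(100, 50, 2))   # 2 x 50 kW: no spare capacity
print(f"{normal_load_per_unit_kw(100, 3):.1f} kW per unit")
```

In the active-active case each 50 kW unit normally carries about 33 kW (67% of its rating) and picks up 50 kW when one unit drops out.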

Leak risk and insurance

The most common client concern: "what if fluid leaks onto the servers?"

The reality after 5 years of DTC deployments (data from two insurers that shared aggregated claims data for EU AI infrastructure):
- Leak frequency: 0.3–0.8 incidents per 1,000 rack-years
- Damage per incident: typically < 5% of the equipment (quick disconnect prevents catastrophic spill)
- Mean repair time: 2–6 hours (drain, replace coupling, refill, test)
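To translate the insurer rate into project terms, a sketch that treats leaks as independent events at a constant rate (a Poisson model, which is an assumption, not something the claims data establishes), for a hypothetical 8-rack cluster over 5 years:

```python
import math

def leak_exposure(racks: int, years: float, rate_per_1000_rack_years: float) -> tuple[float, float]:
    """Return (expected incidents, probability of at least one incident).

    Assumes leaks are independent events at a constant rate (Poisson process).
    """
    expected = racks * years * rate_per_1000_rack_years / 1000.0
    return expected, 1.0 - math.exp(-expected)

# Insurer range from the text: 0.3-0.8 incidents per 1,000 rack-years.
for rate in (0.3, 0.8):
    expected, p_any = leak_exposure(racks=8, years=5, rate_per_1000_rack_years=rate)
    print(f"rate {rate}: {expected:.3f} expected incidents, P(at least one) = {p_any:.1%}")
```

That is roughly 0.012–0.032 expected incidents, or a 1–3% chance of at least one leak over the period.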

For comparison: an air-cooled DC has its own failure modes (CRAC failure, condensate leaks from evaporator coils, ventilation stoppage). Aggregate downtime over 5 years is comparable or lower on a properly designed DTC installation.

Insurance: Allianz, Munich Re, AXA have had DTC-specific policies since 2023. Premium delta vs air-cooled is ~3–8% in the EU in 2026 — sharply down from the 15–20% in 2020. Required: leak detection sensors (Aquasense, EcoFlux), automatic shut-off valves per rack, drip trays under the CDU, documented emergency response plan.

A 15-minute decision framework

1. Which GPU and what density?
   - H100 SXM5, 8 nodes per rack (64 GPUs, ~85 kW) → DTC mandatory
   - B200 8-GPU baseboard (~120 kW per rack) → DTC or immersion
   - GB200 NVL36/NVL72 (132–192 kW per rack) → DTC with high-capacity CDU or single-phase immersion
2. Brownfield retrofit or greenfield?
   - Brownfield → DTC (existing servers can be retrofitted or replaced with DTC variants), no tank rebuild
   - Greenfield with a 5+ year horizon → consider immersion if density > 100 kW/rack
3. What maintenance can the team handle? DTC maintenance is similar to air-cooled (hot swap remains). Immersion needs 6–12 months of technical upskilling.
4. What is the facility water input? If you have a source < 35 °C (dry cooler in an EU climate, or a small chiller) → DTC is ideal. If you don't, you have to budget for chiller plant CAPEX.
5. What PUE target? 1.08–1.12 → DTC. 1.03–1.06 → single-phase immersion (with a higher CAPEX uplift).
6. Two-phase immersion? Out. Come back in 2 years if non-PFAS alternatives reach production maturity.
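The six questions can be condensed into a rough rule set. The sketch below is one possible ordering of the steps, with hypothetical field names and a 130 kW cut-off standing in for the GB200 class; the framework itself is a set of questions for discussion, not a strict algorithm:

```python
from dataclasses import dataclass

@dataclass
class Project:
    rack_kw: float                   # design density per rack (step 1)
    greenfield: bool                 # step 2
    horizon_years: float             # step 2
    team_ready_for_immersion: bool   # step 3: 6-12 months of upskilling budgeted
    facility_water_c: float          # available facility-water supply, deg C (step 4)
    pue_target: float                # step 5

def recommend(p: Project) -> tuple[str, list[str]]:
    """Return (recommended cooling, caveats). Two-phase is never returned (step 6)."""
    caveats: list[str] = []
    # Steps 1-3: immersion is only a candidate for dense greenfield builds
    # with a long horizon and a team prepared to maintain it.
    immersion_candidate = (
        p.greenfield
        and p.horizon_years >= 5
        and p.rack_kw > 100
        and p.team_ready_for_immersion
    )
    # Step 5: a PUE target below DTC's 1.08-1.12 range tips the choice to immersion.
    if immersion_candidate and p.pue_target < 1.08:
        return "single-phase immersion", caveats
    if immersion_candidate:
        caveats.append("single-phase immersion is also viable")
    # Step 1: GB200-class densities need a high-capacity CDU on the DTC path.
    if p.rack_kw > 130:
        caveats.append("size a high-capacity CDU")
    # Step 4: without facility water below 35 C, budget for a chiller plant.
    if p.facility_water_c >= 35:
        caveats.append("budget for chiller plant CAPEX")
    return "DTC", caveats

print(recommend(Project(85, False, 3, False, 30, 1.10)))
print(recommend(Project(140, True, 6, True, 30, 1.05)))
```

Two-phase immersion never appears as an output, which mirrors step 6.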

A practical tip in the tender process

Demand the following in the AI cluster cooling quote:

Per-node thermal envelope: GPU junction temp budget, CPU junction temp budget, residual air cooling for NIC/DPU
CDU sizing with 30% reserve for future GPU upgrade (B300, R100)
Facility water specification: input/output temperature, flow rate, water chemistry (pH, conductivity, biofouling protection)
Service runbook: quick disconnect procedure, leak response, CDU pump failover test
Insurance + warranty: how many leak incidents the vendor warranty covers, what insurance premium the supplier recommends

In a 2025 DGX H100 deployment audit, we found a client's supplier offering an 80 kW CDU for 85 kW racks. At full load (a Llama 3.1 405B fine-tune) the CDU ran at 106% of capacity, water output temperature rose from 50 °C to 62 °C, and GPU junction temperature climbed to 84 °C, 3 °C below thermal throttle. Marginal. During a summer peak with warm facility water (39 °C input) it would have throttled. A 30% CDU sizing reserve is non-negotiable.
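The 30% rule applied to the audited case: CDU utilisation is rack load divided by rated capacity, and the minimum rating is rack load times 1.3. A short sketch with hypothetical helper names:

```python
SIZING_RESERVE = 0.30   # reserve recommended in the text

def cdu_utilisation(rack_load_kw: float, cdu_capacity_kw: float) -> float:
    """Fraction of rated CDU capacity used at a given rack heat load."""
    return rack_load_kw / cdu_capacity_kw

def min_cdu_capacity_kw(rack_load_kw: float, reserve: float = SIZING_RESERVE) -> float:
    """Smallest CDU rating that keeps the requested reserve above the rack load."""
    return rack_load_kw * (1.0 + reserve)

print(f"80 kW CDU on an 85 kW rack: {cdu_utilisation(85, 80):.0%}")
print(f"minimum rating with 30% reserve: {min_cdu_capacity_kw(85):.1f} kW")
```

The 80 kW unit comes out at about 106%, matching the measurement, while the reserve rule calls for at least ~110 kW, which points to the 120 kW class of unit rather than the 80 kW one.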

---

*We do AI cluster design + cooling architecture for 8-node and larger deployments, from H100 through B200 to GB200. If you're planning a cluster above 500 kW IT load, the first design workshop (4 hours) walks through the DTC vs immersion decision for your specific build-out with numerical PUE and CAPEX comparison.*