Comments on: CXL is Finally Coming in 2025 https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/ Server and Workstation Reviews Tue, 31 Dec 2024 00:25:01 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Tsahi Livnoni https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-597850 Tue, 31 Dec 2024 00:25:01 +0000 https://www.servethehome.com/?p=82861#comment-597850 Isn’t performance hurt by the physical distance, compared to DRAM sitting right next to the CPU and working with the CPU’s memory controller?

]]>
By: spuwho https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-596891 Wed, 25 Dec 2024 22:03:39 +0000 https://www.servethehome.com/?p=82861#comment-596891 When we buy servers, we buy max RAM and load VMware, which allocates all of the RAM and then reallocates it on demand to any VM.

For containers it is usually the same: max RAM, load RHEL, and use Kubernetes to allocate appropriately.

The servers are then tasked based on the application requirements.

Swaps or changes are done only as a part of repair, since the host is already fully populated.

There is no interest in distributed memory resourcing because they want to keep any malfunction blast radius small and local to the rack.

]]>
By: Rob https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-596365 Sat, 21 Dec 2024 22:00:49 +0000 https://www.servethehome.com/?p=82861#comment-596365 If you’re waiting for 2027, then you’re getting PCIe 7 (assuming no further delays) and the option of optical interconnect, which is useful not just for the lower latency and the (slim) chance of x32 lanes (because why would optical be x16 or less); it’s also useful for its reach across adjoining racks. MRDIMM will come in 2026, too.

]]>
By: Name https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-596247 Fri, 20 Dec 2024 22:27:13 +0000 https://www.servethehome.com/?p=82861#comment-596247 DDR4 re-use will fail on 1) speed and 2) reliability. DDR5 did so much for memory reliability, especially in large-memory servers. Re-using DDR4 en masse will just run into crashes. I don’t think any vendor with CXL 2.0 supports hot-swap for individual modules.
CXL 3.0 with PCIe 6 and some RAS features should see adoption, so I agree with the 2027/2028 timeline. Anything before that is just small-scale tinkering/proof-of-concept.

LLM training happens in HBM, not in slow, third-tier memory.
People need to understand that just because it’s in memory doesn’t mean it’s instantaneous.

Where this might make sense is small nodes with remote memory beyond the OS, shared within the rack. Exciting technology, but I disagree about the real-world usage patterns at scale.

]]>
By: Kapdelta Amber https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-596246 Fri, 20 Dec 2024 22:11:09 +0000 https://www.servethehome.com/?p=82861#comment-596246 I see LLM training as the primary driver of CXL (2.0+) adoption. As LLM datasets grow faster than the capacity limits of onboard memory, CXL memory pooling is the natural and more efficient next step.

]]>
By: emerth https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/#comment-596244 Fri, 20 Dec 2024 21:55:33 +0000 https://www.servethehome.com/?p=82861#comment-596244 @Onsigma Blue – consider the case of a compute cluster running an MPI application. You use InfiniBand or 100/200 GbE to implement scatter/gather. That amounts to a memory-to-memory copy across the network, with IB/Ethernet or PCIe as the bottleneck. Now suppose you could leave the data in place in a CXL memory shelf and just share that address space among the compute nodes. You have the same bottleneck, but half as many copy operations. Seems like a win. —- I’m also not convinced that CXL is the way forward for huge-memory single nodes, especially given that memory modules keep getting bigger. But in some use cases this seems like a good thing.
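The copy-count argument can be sketched with a toy example. This is my own illustration, nothing CXL-specific: it uses a plain POSIX shared-memory segment from Python’s standard library to stand in for a shared memory shelf, contrasting the "copy to every consumer" path with the "attach to one segment" path.

```python
# Toy sketch: copying data to each consumer vs. sharing one buffer.
# Uses Python's stdlib shared memory as a stand-in for a CXL memory shelf.
from multiprocessing import shared_memory

payload = b"simulation state" * 1024  # data produced by one "node"

# Copy path (scatter/gather style): every consumer gets its own full copy.
copies = [bytes(payload) for _ in range(4)]  # 4 separate copies made

# Shared path (memory-shelf style): one segment, many attachments, no copies.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload

# A second handle attaches to the same segment by name and sees the
# same bytes without another copy being made.
reader = shared_memory.SharedMemory(name=shm.name)
assert bytes(reader.buf[: len(payload)]) == payload

reader.close()
shm.close()
shm.unlink()
```

Of course the real win depends on the interconnect: here both "paths" are local RAM, whereas a CXL shelf sits behind PCIe, so the shared path trades copy operations for remote-access latency.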

]]>