That’s a pretty non-standard H200 configuration. In the regular HGX configuratio...

		tos1 25 days ago \| parent \| context \| favorite \| on: MegaTrain: Full Precision Training of 100B+ Parame... That’s a pretty non-standard H200 configuration. In the regular HGX configurations, a node with 8xH200 has that much CPU DRAM. That makes the title of the paper somewhat arguable imo.