MLPerf® Training V4.0:
Llama2 70B-LoRA
Result verified by MLCommons Association.

Llama2 70B-LoRA fine-tunes the Llama2 70B large language model (LLM) using Low-Rank Adaptation (LoRA). LoRA adapts the model to specific tasks or domains by training a small set of low-rank adapter weights, so the full 70-billion-parameter model does not have to be retrained. The resulting model can be used for tasks such as natural language generation, translation, and question answering, with improved performance on the target tasks or domains compared to the base model.
Our verified MLPerf® Training V4.0 submission.
| Total Energy Consumed (Joules) - AC | 12,127,904 J | 
|---|---|
| Total Energy Consumed (Joules) - DC | 11,769,962 J | 
| Total Energy Consumed (kWh) - DC | 3.27 kWh | 
| Total Time (decimal) | 29.10069 | 
| Total Time (MM:SS) | 29:06 | 
| Avg. Power Per Node (Network¹) - DC | n/a | 
| Avg. Power Per Node (Node²) - DC | 6.741 kW (DC) | 
| Avg. Power Per Node (Combined) - DC | 6.741 kW (DC) | 
| Total Energy Consumed (Joules) - AC | 20,644,275 J | 
|---|---|
| Total Energy Consumed (Joules) - DC | 20,381,549 J | 
| Total Energy Consumed (kWh) - DC | 5.66 kWh | 
| Total Time (decimal) | 5.40279 | 
| Total Time (MM:SS) | 05:24 | 
| Avg. Power Per Node (Network¹) - DC | 2.013 kW (DC) | 
| Avg. Power Per Node (Node²) - DC | 5.847 kW (DC) | 
| Avg. Power Per Node (Combined) - DC | 7.859 kW (DC) | 
| Total Energy Consumed (Joules) - AC | 46,574,813 J | 
|---|---|
| Total Energy Consumed (Joules) - DC | 45,837,402 J | 
| Total Energy Consumed (kWh) - DC | 12.73 kWh | 
| Total Time (decimal) | 2.10304 | 
| Total Time (MM:SS) | 02:06 | 
| Avg. Power Per Node (Network¹) - DC | 0.931 kW (DC) | 
| Avg. Power Per Node (Node²) - DC | 4.745 kW (DC) | 
| Avg. Power Per Node (Combined) - DC | 5.676 kW (DC) | 
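The kWh, MM:SS and per-node kW rows in these tables follow from the Joule totals and run times by unit conversion. A minimal sketch in Python, using the first table's DC figures. The function names are illustrative and not part of the submission, and the node count of 1 for that table is an assumption inferred from its n/a network overhead.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3,600,000 J


def joules_to_kwh(joules: float) -> float:
    """Convert an energy total in Joules to kilowatt-hours."""
    return joules / JOULES_PER_KWH


def average_kw_per_node(joules: float, minutes: float, nodes: int) -> float:
    """Average power per node in kW: energy / time / node count."""
    seconds = minutes * 60
    return joules / seconds / nodes / 1000


def minutes_to_mmss(minutes: float) -> str:
    """Format decimal minutes as MM:SS, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


# First table: DC energy and run time (node count of 1 is an assumption).
dc_joules = 11_769_962
run_minutes = 29.10069

print(round(joules_to_kwh(dc_joules), 2))                              # 3.27
print(round(average_kw_per_node(dc_joules, run_minutes, nodes=1), 3))  # 6.741
print(minutes_to_mmss(run_minutes))                                    # 29:06
```

Keeping the Joule total as the source of truth and deriving kWh and kW from it avoids compounding rounding errors across rows.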
Power consumption at the data center level. This is not in scope within MLPerf® Training V4.0 and has not been peer-reviewed by MLCommons members.
| Singapore 2 PUE | 1.10 | 
|---|---|
| Extrap. TTL Energy Consumed | 3.596 kWh | 
| Net CO2 emitted | 1.46 kg | 
| Singapore 2 PUE | 1.10 | 
|---|---|
| Extrap. TTL Energy Consumed | 6.228 kWh | 
| Net CO2 emitted | 2.60 kg | 
| Singapore 2 PUE | 1.10 | 
|---|---|
| Extrap. TTL Energy Consumed | 14.006 kWh | 
| Net CO2 emitted | 5.84 kg | 
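The extrapolated rows can be sketched as the measured DC energy multiplied by the facility PUE, and then by the grid carbon intensity. The snippet below is a sketch under assumptions: it uses the first result's DC energy (11,769,962 J), the 1.10 PUE, and the 0.405 kg CO₂-e/kWh grid figure quoted on this page, and it has been checked only against the first table's values. The function names are illustrative.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3,600,000 J


def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Gross up IT energy to facility level using PUE."""
    return it_energy_kwh * pue


def co2_kg(energy_kwh: float, grid_kg_per_kwh: float) -> float:
    """Emissions estimate: energy times grid carbon intensity."""
    return energy_kwh * grid_kg_per_kwh


it_kwh = 11_769_962 / JOULES_PER_KWH          # DC energy of the first result
total_kwh = facility_energy_kwh(it_kwh, pue=1.10)

print(round(total_kwh, 3))                    # 3.596
print(round(co2_kg(total_kwh, 0.405), 2))     # 1.46
```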
Footnotes: see Disclaimer & Footnotes section below.
The as-submitted results for MLPerf® Training V4.0 for H100 SXM systems.
Verified power results for MLPerf® Training V4.0 submitters. For more detail on how this was captured, refer to the sections above.
Parameters used in the benchmark run (across all nodes)
| Test Name | Llama2 70B-LoRA | 
|---|---|
| Type | Large Language Model | 
| Framework | PyTorch/NeMo | 
| Dataset | SCROLLS GovReport | 
| Submission Date | 10/5/2024 | 
| Publishing Forum | MLPerf® Training V4.0 | 
| Peer reviewed? | YES - MLCOMMONS | 
Compute node hardware specifications.
| Instance Type | H100 80GB SXM | 
|---|---|
| CPU | Xeon 8462Y+, 128vCPU | 
| Memory | 2,048 GB DDR4 | 
| Network Cards (RDMA) | 8 x ConnectX-7 | 
| RDMA | YES | 
| NVLink | 900 GB/s | 
Test location & environmental conditions present at test.
| Region | Singapore | 
|---|---|
| Availability Zone | Singapore 2 (SIN02) | 
| HyperCube Immersion | Yes | 
| Energy Grid Carbon Intensity | 0.405 kg CO₂-e/kWh (2022) | 
| HyperCube design pPUE | 1.02 | 
| Facility including HyperCube design PUE | 1.10 | 
External storage cluster used in benchmark run.
| Type | WEKA | 
|---|---|
| Disks | NVME | 
Compute and storage network details.
| Compute Fabric | InfiniBand NDR 200 | 
|---|---|
| Contention | Min. 1,600 Gbps uncontended | 
| Storage Fabric | Ethernet | 
| Contention | Peak 200 Gbps | 
Overhead allocation of power for networking to test results¹
| 1 Node | n/a | 
|---|---|
| 8 Nodes | 16.10 kW | 
| 9 Nodes | 16.10 kW | 
| 64 Nodes | 59.57 kW | 
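A sketch of how these overhead figures can be spread across nodes and over a run's duration. It assumes an even split per node and constant switch power for the whole run; the function names are illustrative, and the 15-minute run is hypothetical.

```python
def overhead_kw_per_node(total_overhead_kw: float, nodes: int) -> float:
    """Share of network overhead power attributed to each node."""
    return total_overhead_kw / nodes


def apportioned_energy_wh(overhead_w: float, run_minutes: float) -> float:
    """Network energy for a run: constant power pro-rated by run time."""
    return overhead_w * run_minutes / 60


# 64-node cluster: 59.57 kW of interconnect overhead.
print(round(overhead_kw_per_node(59.57, 64), 3))   # 0.931

# Hypothetical 15-minute run at the 8-node figure of 16,100 W.
print(apportioned_energy_wh(16_100, 15))           # 4025.0
```

Multiplying the watt-hour result by 3,600 gives the equivalent in Joules.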
Disclaimer & Validity - MLCommons
MLPerf® Training v4.0 Closed Llama2 70B-LoRA offline. Retrieved from https://mlcommons.org/benchmarks/training/ 12 June 2024. Result verified by MLCommons Association. The MLPerf® name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
Footnotes
- Network power allocation: To accurately capture the total power envelope of multi-node tests, it is appropriate to measure the power consumption of the networking equipment associated with the test. For MLPerf® Training V4.0, the members adopted a method of proportional allocation of total network power. For our submission, the following methodology was used and accepted: SMC has a 'Scalable Unit' ('SU') of 6 nodes in 3 bays, with a pair of nodes in each bay. Each SU has 2 QM9790 64-port leaf switches, Leaf A and Leaf B. Four of the eight CX-7 NICs in each of the six nodes connect to Leaf A, and the other four connect to Leaf B. 16 ports from each leaf switch connect to the 16 spine switches (Spine 01 to Spine 16), one port per spine. Power consumption per switch: 1,610 W. Port utilization for leaf switches: (16 + 4 × Num_used_nodes)/64 ports (16 upstream, up to 24 downstream). Port utilization for spine switches: 2 × number of SUs (each SU has 2 leaf switches, each of which connects to 1 port of a spine switch).
 Power consumption by cluster size: 8 nodes: 2 SUs (6 nodes in SU1, 2 in SU2), total interconnect power 16,100 W. 9 nodes: 2 SUs (6 nodes in SU1, 3 in SU2), total interconnect power 16,100 W. 64 nodes: 11 SUs (6 nodes in each of SU1-10, 4 in SU11), total interconnect power 59,570 W.
 For each cluster size, network power has been apportioned pro-rata by the time taken to complete the benchmark test. For example, a hypothetical 8-node test that took 15 minutes to complete has its network overhead apportioned as follows: (15/60) h × 16,100 W = 4,025 Wh. Measurements: 1 Watt equals 1 Joule per second. 1 kWh equals 3,600,000 Joules.
- Node power: SMC records and submits DC power via monitoring of power rectifiers that are upstream of the node. SMC nodes are immersed in HyperCube immersion tanks, arranged in 3 bays, each containing 2 H100 SXM nodes (16 GPUs). Each node is powered by a common 54V bus per bay, with each bus supporting 2 nodes. Each bus is connected to two HyperCube powershelves, equipped with 4 x 5.0 kW power rectifiers. Each rectifier reports statistics, including input/output voltage, current, power and temperature over a CAN bus. This information is available via an HTTP REST endpoint, which is polled and logged at 1 second intervals. 
 For the MLPerf® Training V4.0 results, all runs for all tests were polled and written to an SQLite database. For MLCommons members, the raw data collected was submitted along with our results and verified as accurate and true. The total Joules submitted to MLCommons and displayed on this page include a verified proportion for overhead allowance of networking power, as calculated above.
 In instances where Watt, Kilowatt or Kilowatt-hour are displayed, a relevant conversion from Joules and time to complete has been performed.
- Data Center PUE & CO₂ calculations: Data center PUE calculations are not in the scope of the MLPerf® Training V4.0 results and have not been peer-reviewed. Data relating to pPUE and PUE is based on our measurements and calculated using industry-standard methodology. HyperCube pPUE: Partial PUE considers the load within the HyperCube and excludes the supporting infrastructure of the wider data center. The observed pPUE of 1.02 reflects the system's core efficiency gains from eliminating the fans and chilled water infrastructure that would otherwise support the main HPC heat load. The partial PUE includes cooling water pumps, fluid pumps and cooling tower fan energy. Extrapolated PUE: This grossed-up calculation covers the total facility losses of operating a HyperCube data hall, including an apportionment of total facility power. An extrapolated PUE of 1.10 is calculated for the SIN02 deployment as it approaches full capacity. The real-world values were recorded during an assessment based on Green Mark standards. The estimated values are extrapolated from this data and incorporate electrical efficiencies that will be regained with the increased loading of UPS and TX.
 Carbon (CO₂) calculations: The direct energy consumed during the relevant benchmark is extrapolated to include the net energy required at the data center level, then multiplied by the energy grid carbon coefficient in effect during the test.
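The node-power footnote describes polling rectifier statistics once per second and logging them to SQLite. A minimal sketch of turning such a log into a Joule total, assuming a hypothetical table `rectifier_samples(ts, output_watts)`. The schema, names, sample values, and the trapezoidal integration are invented for illustration; they are not the submitter's actual logging format or method.

```python
import sqlite3


def total_joules(conn: sqlite3.Connection) -> float:
    """Integrate logged power over time (trapezoidal rule) to get Joules."""
    rows = conn.execute(
        "SELECT ts, output_watts FROM rectifier_samples ORDER BY ts"
    ).fetchall()
    energy = 0.0
    # Each pair of consecutive samples contributes avg power x elapsed time.
    for (t0, w0), (t1, w1) in zip(rows, rows[1:]):
        energy += (w0 + w1) / 2 * (t1 - t0)
    return energy


# Hypothetical log: a constant 5,000 W sampled at 1-second intervals for 10 s.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rectifier_samples (ts REAL, output_watts REAL)")
conn.executemany(
    "INSERT INTO rectifier_samples VALUES (?, ?)",
    [(float(t), 5000.0) for t in range(11)],
)

print(total_joules(conn))  # 50000.0
```

Because 1 Watt equals 1 Joule per second, summing power over 1-second intervals yields Joules directly; using the sample timestamps rather than a fixed step keeps the total correct if a poll is delayed.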