Accurate Power and Energy Measurement on Kepler-based Tesla GPUs
Martin Burtscher, Department of Computer Science



2 Introduction
- GPU-based accelerators
  - Quickly spreading in PCs and even handheld devices
  - Widely used in high-performance computing
- Power and energy efficiency
  - Heat dissipation is a problem
  - Electric bill and battery life are of growing concern
  - Exascale requires a 50x boost in performance per watt
- Important research area
  - Need to develop techniques to reduce power and energy
  - Have to be able to measure power/energy of programs

3 GPU Power Sensors
- Hardware
  - High-end compute GPUs include power sensors
  - For example, K20/K40 Tesla cards have a built-in sensor
  - These cards are the target of this talk
- Software
  - Can query the sensor with the NVIDIA Management Library (NVML)

4 Problems
- Power sensor data behaves strangely
  - Running the same kernel twice yields different energy
    - First launch: 114 J, second launch: 147 J (29% more energy)
  - Running a kernel 2x as long more than doubles energy
    - 1x input: 732 J, 2x input: 1579 J (8% above doubling)
- Power sensor sampling rate varies greatly
  - Sampling interval ranges from about 0.27 ms to 130 ms (3760 Hz down to 7.7 Hz)

5 Methodology
- Hardware
  - Two K20c, two K20m, two K20X, and two K40m GPUs
- Measurement
  - Query power and time in a loop on an idle CPU core
- Test code
  - Compute-intensive regular n-body kernel
  - Constant computation rate of over 2 TFlops on a K20c
  - No data dependences; vary n to adjust kernel runtime
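
The measurement loop described above can be sketched as follows. This is a minimal illustration, not the talk's actual code: `read_power_watts` is a hypothetical callable standing in for the NVML power query (NVML's `nvmlDeviceGetPowerUsage` reports milliwatts, so a real wrapper would divide by 1000), and the `clock` parameter is added here only so the loop can be exercised without a GPU.

```python
import time

def sample_power(read_power_watts, duration_s, interval_s=0.0,
                 clock=time.perf_counter):
    """Busy-poll a power source and record (elapsed_seconds, watts) pairs.

    read_power_watts: hypothetical callable returning the current power in W.
    interval_s: minimum spacing between recorded samples (0 = as fast as possible).
    """
    samples = []
    start = clock()
    next_due = start
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        if now >= next_due:
            # Store the time stamp with every reading so that later
            # processing can account for uneven sampling intervals.
            samples.append((now - start, read_power_watts()))
            next_due = now + interval_s
    return samples
```

Busy-polling keeps the time stamps tight at the cost of occupying one CPU core, which matches the "idle CPU core" setup on the slide.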

6 Expected Power Profile
[Figure: expected power over the measurement loop runtime. Annotations: kernel starts executing; kernel stops executing; GPU idle power.]

7 Measured Power Profile
- Macroscopic phenomena
[Figure: measured power profile with duration labels 5 s, 3 s, and 4 s. Annotations: power ramps up slowly; switch to step shape; power ramps down slowly; idle power reached.]

8 Energy = Area Under Power Curve
- Unclear how big the energy is
  - Missing energy?
  - Delayed energy?
  - Integrate to where?

9 Ramp-up Behavior of 2 Short Runs
[Figure annotations: ramp-down doesn't follow the curve; 2nd run starts higher but also follows the curve; short run behaves the same as the longer run.]

10 Ramp-down Behavior of Several Runs
[Figure: measured power [W] vs. shifted runtime [s], with time markers t2, t3, t4. Annotations: shape depends on power at t2; power increases after kernel is done; driver lowers power level; shape is always the same; power steps down every second.]

11 Sampling Interval Lengths
[Figure: measured power [W] and sampling interval [ms] vs. runtime [s], with time markers t1, t2, t3. Annotations: very long interval; wide range of intervals; driver activity can prevent sampling; short intervals.]

12 Sampling Interval Lengths (zoomed-in)
[Figure: measured power [W] and sampling interval [ms] vs. runtime [s]. Annotations: identical values; sampled power only ever changes after a long interval; very long interval; many short intervals.]

13 Correcting the Measurements

14 Sampling Frequency
- Eliminate redundant samples
  - Only sample once every 15 ms (66.7 Hz)
  - Cannot accurately measure kernels under ~150 ms
- Account for the variation in interval length
  - Use high-resolution time stamps
- Example: energy from t1 to t4
  - Dotted (fixed intervals): 1205 J
  - Solid (variable intervals): 1066 J
  - 13% discrepancy
[Figure: measured power [W] vs. runtime [s], with dotted (fixed-interval) and solid (variable-interval) curves.]
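
The difference between the two integration schemes can be illustrated with a short sketch. Assumptions: samples are `(time_s, power_w)` pairs, and trapezoidal integration is used; the slide does not state which integration rule the authors apply, and the function names are made up for this example.

```python
def energy_variable_intervals(samples):
    """Trapezoidal energy [J] using each sample's own time stamp."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy

def energy_fixed_intervals(samples):
    """Trapezoidal energy [J] that wrongly treats the samples as evenly spaced."""
    if len(samples) < 2:
        return 0.0
    powers = [p for _, p in samples]
    # One average step length for the whole run, ignoring the real time stamps.
    dt = (samples[-1][0] - samples[0][0]) / (len(samples) - 1)
    return sum(0.5 * (a + b) * dt for a, b in zip(powers, powers[1:]))
```

When short and long sampling intervals are mixed, the fixed-interval version gives short intervals too much weight and long intervals too little, which is the kind of discrepancy the slide reports.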

15 True Power
- Sensor hardware
  - Seems to asymptotically approach the true power
  - Reminiscent of capacitor charging
- The true instantaneous power $P_{\text{true}}$ is a function of the slope of the power profile, $dP_{\text{sensor}}/dt$, and the power measured by the sensor, $P_{\text{sensor}}$:
  $P_{\text{true}} = P_{\text{sensor}} + C \cdot \frac{dP_{\text{sensor}}}{dt}$
- The "capacitance" C of the sensor (in seconds; value missing) is the same on all tested K20 GPUs
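
The capacitor analogy can be made explicit. Assuming the sensor behaves like a first-order lag with time constant C (an assumption that is consistent with the slide's formula, not a statement about the actual hardware), the formula is a rearranged charging equation. The second line gives the sensor's response to a step from an initial reading $P_0$ at time $t_0$ to a constant true power, and the third shows one way to approximate the slope from neighboring samples $(t_i, P_i)$; the central difference is an illustrative choice.

```latex
C \,\frac{dP_{\text{sensor}}}{dt} = P_{\text{true}} - P_{\text{sensor}}
\quad\Longleftrightarrow\quad
P_{\text{true}} = P_{\text{sensor}} + C \,\frac{dP_{\text{sensor}}}{dt}

P_{\text{sensor}}(t) = P_{\text{true}} - \left(P_{\text{true}} - P_0\right) e^{-(t - t_0)/C}

P_{\text{true}}(t_i) \approx P_i + C \,\frac{P_{i+1} - P_{i-1}}{t_{i+1} - t_{i-1}}
```

The exponential in the step response is what produces the slow ramp-up and ramp-down seen in the measured profiles.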

16 Back-calculated from Expected Profile
- Minimized absolute errors to determine C
- Capacitor function matches measured values perfectly
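
One way such a fit could be done is sketched below. It simulates how a capacitor-like sensor would respond to the expected (rectangular) profile and picks the C that minimizes the sum of absolute errors against the measured samples. The first-order model, the grid search over candidate values, and all names are assumptions made for illustration; the slide does not say how the minimization was carried out.

```python
import math

def simulate_sensor(times, true_power, c, initial):
    """Response of a first-order (capacitor-like) sensor to a piecewise-constant input."""
    out = [initial]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        target = true_power[i - 1]  # input held constant over the interval
        # Exact exponential approach toward the target over dt seconds.
        out.append(target + (out[-1] - target) * math.exp(-dt / c))
    return out

def fit_capacitance(times, measured, expected, candidates):
    """Return the candidate C [s] with the smallest sum of absolute errors."""
    def total_abs_error(c):
        simulated = simulate_sensor(times, expected, c, measured[0])
        return sum(abs(s - m) for s, m in zip(simulated, measured))
    return min(candidates, key=total_abs_error)
```

A finer grid or a one-dimensional optimizer could replace the candidate list; the grid simply keeps the sketch dependency-free.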

17 Corrected Power Profile
[Figure: power [W] vs. time [s], with time markers t1, t2, t3. Annotations: wobbles due to sampling errors; corrected profile matches the expected rectangular profile; active idle power level.]

18 Correction of 2 Short Runs
[Figure: power [W] vs. time [s], with time markers t1a, t2a, t1b, t2b, t3b. Annotation: corrected power profile matches the expected profile.]

19 Second K20c GPU
[Figure: power [W] vs. time [s]. Annotation: identical to the original K20c.]

20 K20m GPU
[Figure: power [W] vs. time [s], with time markers t1, t2, t3. Annotation: similar profile but higher power level.]

21 K20X GPU
[Figure: power [W] vs. time [s], with time markers t1, t2, t4. Annotations: profile is good, no correction needed; huge 600 ms gap.]

22 K40m GPU
- K40m again requires correction

23 Application to Full CUDA Program
- Implementation of the Barnes Hut n-body algorithm
  - Taken from the LonestarGPU benchmark suite
  - Contains multiple regular and irregular kernels
  - Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
  - Main kernel is regularized (warp-based)
[Image credit: NASA/JPL-Caltech/SSC]

24 Barnes Hut Power Profile (1 Step)
[Figure annotations: slow then fast drop-off; wave in profile.]
- Original profile is hard to interpret

25 Barnes Hut Power Profile (Kernels)
[Figure annotations: slow then fast drop-off; wave in profile.]
- Original profile is hard to interpret

26 Corrected Barnes Hut Power Profile
- Corrected profile reveals important info
[Figure: power [W] vs. time [s], with regions labeled a through f. Annotations: two similar irregular kernels; one more irregular kernel; regularized main kernel; decrease due to load imbalance; very short regular kernel.]

27 K20Power Tool
- Output
  - Corrected profile and corresponding active energy
- Features
  - Computes instant power using the capacitor formula
  - Employs high-resolution time steps
  - Samples at the true frequency of 66.7 Hz
- Dissemination
  - Open source, research license

28 Marcher System
- Tool will be part of the Marcher system at Texas State
  - NSF-funded green computing infrastructure
- Marcher is a power-measurable cluster system
  - 832 general-purpose cores
  - 12,000 GPU and MIC cores
  - 1.2 TB of DDR3 with power throttling and scaling
  - 50 TB of hybrid storage with hard drives and SSDs
  - Component-level power measurement tools (e.g., CPU, DRAM, disk, GPU, Xeon Phi)

29 Summary
- Correctly measuring K20/K40 power and energy
  - Sample at 66.7 Hz and include time stamps
  - Compute true power with the presented formula
    - Use neighboring power samples to approximate the slope
  - Compute true energy by integrating true power
    - Over intervals where power is above active idle
- K20Power tool
  - Software tool that implements this methodology
  - Paper at [URL missing]
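
The summarized steps can be strung together in a short sketch. It is not the K20Power implementation: the central-difference slope, the trapezoidal integration, the rule that both endpoints of an interval must exceed the threshold, and the names `capacitance_s` and `idle_threshold_w` are all assumptions made for illustration. Samples are `(time_s, power_w)` pairs.

```python
def correct_power(samples, capacitance_s):
    """Apply P_true = P_sensor + C * dP_sensor/dt to every sample.

    The slope is approximated from the neighboring samples (one-sided
    at the first and last sample).
    """
    n = len(samples)
    corrected = []
    for i, (t, p) in enumerate(samples):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        dt = samples[hi][0] - samples[lo][0]
        slope = (samples[hi][1] - samples[lo][1]) / dt if dt > 0 else 0.0
        corrected.append((t, p + capacitance_s * slope))
    return corrected

def active_energy(samples, idle_threshold_w):
    """Integrate power [J] over intervals whose endpoints both exceed the threshold."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p0 > idle_threshold_w and p1 > idle_threshold_w:
            energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy
```

A typical use would be `active_energy(correct_power(samples, capacitance_s), idle_threshold_w)`, with the threshold set slightly above the GPU's active idle power level.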

30 Acknowledgments
- Collaborators
  - Ivan Zecena and Ziliang Zong
- U.S. National Science Foundation
  - DUE, CNS, and CNS grants
- NVIDIA Corporation
  - Grants and equipment donations
- Texas State University
  - Research Enhancement Program