VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform


1 VRCon: Dynamic Reconfiguration of Voltage Regulators in a Multicore Platform Woojoo Lee, Yanzhi Wang, and Massoud Pedram University of Southern California

2 Outline: Introduction; Preliminary (VR characteristics); Dynamic reconfiguration of the VR-to-core network; Proposed multicore platform; Reactive VRCon; Proactive VRCon; Experimental work; Conclusion. Mar

3 Introduction (1/3) Per-chip DVFS vs. per-core DVFS. (Conventional) per-chip DVFS hinders DVFS from achieving its full potential. Per-core DVFS allows excellent flexibility in controlling power, but the multiple voltage regulators (VRs) it requires bring shortcomings such as footprint, power conversion loss, and control complexity. We target multicore platforms that support per-core DVFS.

4 Introduction (2/3) We focus on the power conversion efficiency of the multiple VRs. The figure below shows traces of the VR efficiency while delivering power to a core. Around 24% of the input power is dissipated by a single VR in the high-efficiency region, but more than 53% of the input power is consumed by the VR in the low-efficiency region. [Figure: VR efficiency (%) vs. time (ms); mean efficiency 75.18% in the high-efficiency region and 46.38% in the low-efficiency region.] Summed over all VRs, this dissipation can amount to a considerable power loss.
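As a quick check of the figures quoted on this slide, the fraction of input power lost in a VR is one minus its conversion efficiency. The short sketch below applies that to the two mean efficiencies from the figure; the function and variable names are illustrative and not from the slides.

```python
def loss_fraction(efficiency_percent: float) -> float:
    """Fraction of the VR input power that is dissipated rather than delivered."""
    return 1.0 - efficiency_percent / 100.0

# Mean efficiencies quoted on the slide for the two regions.
high_loss = loss_fraction(75.18)  # high-efficiency region
low_loss = loss_fraction(46.38)   # low-efficiency region

print(f"high-efficiency region loss: {high_loss:.2%}")
print(f"low-efficiency region loss:  {low_loss:.2%}")
```

The results, roughly 24.8% and 53.6%, agree with the "around 24%" and "more than 53%" stated above.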

5 Introduction (3/3) We propose a system-level optimization technique to substantially improve the VR efficiency: VR consolidation (VRCon for short). The technique starts from the intuition that cores which require the same voltage level and draw a small load current can be combined and powered by a single VR. Why is this helpful? The reasons follow from the VR characteristics presented in the following slides. We present two VRCon techniques: a reactive and a proactive VRCon.

6 VR characteristics (1/3) We target inductive switching regulators. Inductive switching regulators achieve higher conversion efficiencies over a wide range of output loads than other types of VRs, such as low-dropout regulators and switched-capacitor regulators. Because its controller supports dynamic voltage setting with a fast transient response, the inductive switching regulator is suitable for powering processors. The circuit schematic is shown below.

7 VR characteristics (2/3) The load current condition of the VR affects the VR efficiency. The figure below shows load current vs. efficiency, simulated with the VR schematic and the 45nm PTM. The main source of power loss in Region I is the switching and controller losses; in Region II it is the conduction loss. Modern VRs exhibit high peak efficiency at a specific load current value, but their efficiency drops dramatically under adverse load current conditions.

8 VR characteristics (3/3) VRCon saves power by configuring the VR-to-core network to use a single VR instead of multiple VRs whenever possible. If some cores in a multicore processor require the same voltage level and have small load currents, their power domains can be consolidated to share a single VR. The VR that powers multiple cores then sees a relatively high load current and hence operates at higher efficiency. The VRs that are not used can be turned off to save power. Now, let's go into the details of VRCon!
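The consolidation argument can be illustrated with a toy loss model. The sketch below assumes a VR loss made of a fixed switching/controller term plus an I²R conduction term, matching the two loss sources named for Regions I and II; the parameter values (`p_fixed`, `r_cond`) and load currents are hypothetical, and the loss of the network switches is ignored.

```python
def vr_loss(i_load: float, p_fixed: float = 0.3, r_cond: float = 0.05) -> float:
    """Toy VR loss in watts: fixed switching/controller loss plus conduction loss.

    p_fixed and r_cond are hypothetical values chosen for illustration only.
    """
    return p_fixed + r_cond * i_load ** 2

def vr_efficiency(v_out: float, i_load: float) -> float:
    """Conversion efficiency = output power / (output power + loss)."""
    p_out = v_out * i_load
    return p_out / (p_out + vr_loss(i_load))

v_out = 1.0            # both cores require the same voltage level
currents = [0.5, 0.5]  # two lightly loaded cores (A)

# One VR per core: the fixed loss is paid twice.
separate_loss = sum(vr_loss(i) for i in currents)
# One shared VR: the fixed loss is paid once, at a higher load current.
merged_loss = vr_loss(sum(currents))

print(f"separate: loss={separate_loss:.3f} W, efficiency={vr_efficiency(v_out, 0.5):.1%}")
print(f"merged:   loss={merged_loss:.3f} W, efficiency={vr_efficiency(v_out, 1.0):.1%}")
```

Under these assumed numbers, sharing one VR removes one copy of the fixed loss at the cost of a slightly larger conduction loss, which is why consolidation pays off only for small load currents.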

9 Proposed platform (1/2) The proposed platform has several components. Network switches implement the reconfigurable VR-to-core network. The power manager (PM) monitors the core status (i.e., performance) reported by the hardware performance monitor (HPM). Unlike PMs in conventional multicore platforms, the PM here determines tentative voltage and frequency levels for the cores and transmits this information to the VRCon manager. The VRCon manager (VRCM) is added to ultimately control each core's frequency/voltage level, as well as the operation of the VRs and the ON/OFF states of the network switches in VRCon.

10 Proposed platform (2/2) The figure below is a conceptual diagram of the proposed multicore platform. [Figure: block diagram with the Power Manager, Hardware Performance Monitor, multi-core processor (per-core DVFS), VRCon Manager, sensing circuits, VR-to-core distribution network (switch sets), and VR groups; signal labels: DVFS opinion, DVFS setup, Dynamic Config., VR output setup.]

11 VRCon: overview (1/7) The power saving achieved by DVFS strongly depends on the frequency of the decision-making process or, equivalently, on the duration of the decision period ($T_{DVFS}$). $T_{DVFS}$ should be considered a design variable set by the PM, and it needs to be (much) longer than the voltage scaling time of the VR. Because reconfiguration only turns network switches on and off, the time to reconfigure the VR-to-core network ($T_{NS}$) is limited only by the transient response of the VR, which is in general much shorter than the voltage scaling time. We treat the DVFS setting and the network reconfiguration as the global and local power management of VRCon, respectively. $T_{DVFS}$ and $T_{NS}$ are the minimum required global and local decision epoch lengths, respectively.

12 VRCon: reactive VRCon (2/7) As a local management function, the reactive VRCon applies only to cores with the same voltage level. The figure below shows an example of applying the reactive VRCon to a dual-core platform. [Figure: supply voltage Vdd (V) and load current (A) vs. time for each of the two cores, with two marked regions.] One marked region is valid for VRCon; the other is not, because of its high load current.

13 VRCon: reactive VRCon (3/7) (cont.) In this case the VRCM performs only network switch control to minimize the total energy consumption. The total energy consumption is the sum of the energy losses of the active VRs (including the network switches) and the energy consumption of the cores during the time period $T_{DVFS}$. Algorithm for reactive VRCon: The VRCM first selects the cores that share a voltage level and whose load current is below the maximum driving capability of a single VR. The VRCM then finds the two cores whose merger maximizes the VR energy saving. The merged cores are treated as one core. The VRCM repeats this procedure until no core is available for merging.
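The greedy merging procedure described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the loss model `vr_loss`, the limit `I_MAX`, and the data layout are assumptions, VR loss power stands in for energy over a fixed $T_{DVFS}$, and network switch losses are ignored.

```python
from itertools import combinations

I_MAX = 3.0  # hypothetical maximum driving capability of one VR (A)

def vr_loss(i_load, p_fixed=0.3, r_cond=0.05):
    """Toy VR loss (W): fixed switching/controller loss plus conduction loss."""
    return p_fixed + r_cond * i_load ** 2

def reactive_vrcon(cores):
    """Greedily consolidate cores that share a voltage level.

    cores: dict core_id -> (voltage_level_index, load_current_in_amps)
    Returns a list of clusters (level, total_current, [core_ids]); each
    cluster is powered by one VR.
    """
    clusters = [(lvl, cur, [cid]) for cid, (lvl, cur) in cores.items()]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            lvl_a, cur_a, _ = clusters[a]
            lvl_b, cur_b, _ = clusters[b]
            # Only same-level clusters whose combined load one VR can drive.
            if lvl_a != lvl_b or cur_a + cur_b > I_MAX:
                continue
            saving = vr_loss(cur_a) + vr_loss(cur_b) - vr_loss(cur_a + cur_b)
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, a, b)
        if best is None:
            return clusters  # no beneficial merge remains
        _, a, b = best
        lvl, cur_a, ids_a = clusters[a]
        _, cur_b, ids_b = clusters[b]
        merged = (lvl, cur_a + cur_b, ids_a + ids_b)  # treated as one core
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]

# Cores 1-3 share level 2; core 4 runs at level 4.
example = {1: (2, 0.4), 2: (2, 0.6), 3: (2, 2.5), 4: (4, 0.5)}
result = reactive_vrcon(example)
print(sorted(sorted(ids) for _, _, ids in result))
```

In this example cores 1 and 2 end up sharing a VR, while core 3 stays alone because adding it would exceed the assumed `I_MAX`, and core 4 stays alone because its voltage level differs.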

14 VRCon: proactive VRCon (4/7) For its global power management function, the proactive VRCon exploits the DVFS technique to scale the frequency (and the corresponding voltage level). The proactive VRCon accounts for the energy consumption of both the cores and the VRs. There can be a trade-off between the energy saving from DVFS (initially determined by the PM) and the reduced energy loss from adaptively turning off VRs and using fewer VRs at higher conversion efficiencies. If the VRCM determines that the latter option is better, it will not decrease the frequency/voltage levels of some cores to the minimum possible level; instead, it will adjust the frequency/voltage levels of the cores to increase the chances of applying VRCon.

15 VRCon: proactive VRCon (5/7) The objective here is to find the frequency/voltage level of each core for each $T_{DVFS}$ period so as to minimize the total energy consumption: $\min \left( \sum_{t=1}^{T} E_{T_{DVFS},t}(V_{core,1}, V_{core,2}, \ldots, V_{core,N}) \right)$, where $E_{T_{DVFS},t}$ denotes the total energy consumption during the $t$-th time period of $T_{DVFS}$; $V_{core,i}$ is the $i$-th core's voltage level; $N$ is the total number of cores; and the index $T$ in $E_{T_{DVFS},T}$ marks the period in which all task processing is finished. Solving this problem is difficult because: changing $V_{core,\forall i}$ in time period $T_{DVFS,t}$ affects the VRCon results in period $T_{DVFS,t+1}$; and there are locking and synchronization issues among the threads of multithreaded applications on multicore processors.

16 VRCon: proactive VRCon (6/7) Therefore, starting from the initial DVFS schedule of the PM, we first divide the overall problem into subproblems, each of which concerns only how to modify the initial DVFS schedule to optimize the energy saving of the reactive VRCon in a given period $T_{DVFS,t}$. To guarantee that performance (i.e., the total execution time of the applications) is not degraded by modifying the DVFS schedule, we impose the constraint that the VRCM can only keep or increase (but not decrease) the frequency/voltage level of each core relative to the original DVFS level suggested by the PM. This can be formulated as follows: $f(V^{new}_{core,1}, V^{new}_{core,2}, \ldots, V^{new}_{core,N}) < f(V^{others}_{core,1}, V^{others}_{core,2}, \ldots, V^{others}_{core,N})$ over $T_{DVFS}$, s.t. $V^{new}_{core,i} \geq V^{PM}_{core,i}$ for $1 \leq i \leq N$.

17 VRCon: proactive VRCon (7/7) We present a clustering-based heuristic solution as follows: - We first select the cores that draw a small amount of current, so that they can be combined with others. - Next, we consolidate the two cores (and treat them as one equivalent core) whose merger yields the maximum energy saving. - The procedure is repeated until no further energy saving can be achieved by VR consolidation.
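One way to picture the clustering heuristic is the sketch below. It is only an illustration under stated assumptions, not the authors' algorithm: the level table `LEVELS` (its frequencies in particular), the core power model (activity × V² × f), the loss model `vr_loss`, and the limit `I_MAX` are all hypothetical. Merged clusters run at the higher of the two levels, so no core ever drops below the level suggested by the PM, and power is used as a proxy for energy over one $T_{DVFS}$ period.

```python
from itertools import combinations

# Hypothetical (voltage in V, frequency in GHz) pairs for five DVFS levels.
LEVELS = [(0.75, 1.0), (0.83, 1.4), (0.95, 1.8), (1.05, 2.2), (1.20, 2.6)]
I_MAX = 3.0  # hypothetical maximum driving capability of one VR (A)

def vr_loss(i_load, p_fixed=0.3, r_cond=0.05):
    """Toy VR loss (W): fixed switching/controller loss plus conduction loss."""
    return p_fixed + r_cond * i_load ** 2

def cluster_cost(level, activities):
    """Core power plus VR loss for one cluster; None if one VR cannot drive it."""
    v, f = LEVELS[level]
    p_cores = sum(a * v * v * f for a in activities)  # assumed core power model
    i_load = p_cores / v
    if i_load > I_MAX:
        return None
    return p_cores + vr_loss(i_load)

def proactive_vrcon(pm_levels, activity):
    """pm_levels: core_id -> level suggested by the PM; activity: core_id -> factor.

    Assumes every core alone is within the driving capability of one VR.
    Returns clusters as (level, [core_ids]).
    """
    def cost(cluster):
        return cluster_cost(cluster[0], [activity[c] for c in cluster[1]])

    clusters = [(lvl, [cid]) for cid, lvl in pm_levels.items()]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Merging may only raise levels (constraint: V_new >= V_PM).
            merged = (max(clusters[a][0], clusters[b][0]),
                      clusters[a][1] + clusters[b][1])
            merged_cost = cost(merged)
            if merged_cost is None:
                continue
            saving = cost(clusters[a]) + cost(clusters[b]) - merged_cost
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, a, b, merged)
        if best is None:
            return clusters  # no merge saves energy any more
        _, a, b, merged = best
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]

plan = proactive_vrcon({1: 0, 2: 1, 3: 4}, {1: 0.3, 2: 0.3, 3: 0.8})
print(sorted((lvl, sorted(ids)) for lvl, ids in plan))
```

With these assumed numbers, core 1 is raised from level 0 to level 1 so that it can share a VR with core 2: the extra core power is smaller than the VR loss saved by turning one VR off. Core 3 stays on its own VR because any merge would exceed `I_MAX`.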

18 Experimental work (1/4) Multicore processor setup. We performed the multicore processor simulations in the Sniper simulator. The platform configuration was based on the Intel Xeon Nehalem architecture; the topology is shown in the figure below. [Figure: processor topology; each core has L1-I and L1-D caches and a 256KB L2 cache, and the cores connect to 8MB L3 caches backed by DRAM.] We set the five DVFS levels as follows: [Table of the five DVFS levels.] We modified the McPAT-related code in Sniper to collect the power and timing data under per-core DVFS. Multithreaded applications from the PARSEC and SPLASH2 benchmarks were used in the simulation.

19 Experimental work (2/4) Per-core DVFS simulation. We treat the PM's DVFS recommendation as given a priori, and use an offline DVFS approach as an intermediate step toward the overall aim. We adopt an ILP-based algorithm, as follows: $\min \left( \sum_{r}^{R} \sum_{s}^{S} P_{r,s}\, x_{r,s} \right)$, s.t. $\sum_{r}^{R} \sum_{s}^{S} D_{r,s}\, x_{r,s}$ is less than a certain performance penalty, and $\sum_{r}^{R} \sum_{s}^{S} x_{r,s} = R$. $R$ is the total number of intervals, and $S$ is the number of frequency/voltage levels (five here). $P_{r,s}$ is the power consumption at the $s$-th frequency/voltage level in the $r$-th interval. Following the same notation as $P_{r,s}$, $D_{r,s}$ denotes the delay incurred under that frequency/voltage condition.
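For a tiny instance, the formulation above can be explored by exhaustive search instead of an ILP solver. The sketch below assumes that $x_{r,s}$ is binary and that exactly one level is selected per interval (which satisfies the constraint that the selections sum to $R$); the function name, the `delay_bound` parameter, and the example numbers are hypothetical.

```python
from itertools import product

def offline_dvfs(P, D, delay_bound):
    """Pick one level per interval, minimizing total power under a delay bound.

    P[r][s]: power at level s in interval r; D[r][s]: delay incurred there.
    Exhaustive search over S**R assignments, so only suitable for toy sizes.
    Returns (levels_per_interval, total_power), or (None, None) if infeasible.
    """
    R, S = len(P), len(P[0])
    best_choice, best_power = None, None
    for choice in product(range(S), repeat=R):
        total_delay = sum(D[r][s] for r, s in enumerate(choice))
        if total_delay >= delay_bound:  # constraint is a strict inequality
            continue
        total_power = sum(P[r][s] for r, s in enumerate(choice))
        if best_power is None or total_power < best_power:
            best_choice, best_power = choice, total_power
    return best_choice, best_power

# Three intervals, three levels; higher levels cost more power but add less delay.
P = [[1, 3, 6], [2, 4, 7], [1, 2, 5]]
D = [[4, 2, 0], [5, 2, 0], [3, 1, 0]]
choice, power = offline_dvfs(P, D, delay_bound=6)
print(choice, power)
```

For realistic numbers of intervals, an ILP solver would replace the enumeration; the objective and constraints stay the same.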

20 Experimental work (3/4) VR-to-core network setup. We selected the LTC3816 as the programmable VR; it can power each core in our processor setup and achieves high efficiency at the average core current level obtained from the benchmark simulations. We performed LTspice simulations to obtain the VR efficiencies for various load currents under the five output voltage levels. [Figure: VR efficiency (%) and power loss (W) vs. load current (A) at an input voltage of 12V, for output voltages of 1.2V, 1.05V, 0.95V, 0.83V, and 0.75V.] We set the number of VRs and cores in one group of the VR-to-core network to 4. We determined the width of the network switch to be 8mm based on 45nm technology.

21 Experimental work (4/4) Simulation results. We define $G_{VR}$ and $G_{total}$ as the reduction in VR energy loss and the total energy saving, respectively. When we ran Streamcluster in the 8-core simulator setup, the results showed $G_{VR}$ = 24.06% and $G_{total}$ = 9.96% with the reactive VRCon alone, and $G_{VR}$ = 35.86% and $G_{total}$ = 14.85% with both the reactive and proactive VRCon. The following shows the simulation results for various applications under the different simulator setups. (I), (II), and (III) indicate the 16-core, 8-core, and 4-core setups, respectively.
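To make the two metrics concrete, the sketch below computes them for made-up energy figures. It assumes both gains are normalized by the corresponding baseline (no VRCon) quantity, which the slide does not state explicitly; all numbers are hypothetical and are not the reported results.

```python
def gains(core_base, vr_loss_base, core_new, vr_loss_new):
    """Return (G_VR, G_total) as fractions of the baseline values (assumed definition)."""
    g_vr = (vr_loss_base - vr_loss_new) / vr_loss_base
    total_base = core_base + vr_loss_base
    total_new = core_new + vr_loss_new
    g_total = (total_base - total_new) / total_base
    return g_vr, g_total

# Hypothetical energies in joules: core energy unchanged, VR loss cut from 40 to 30.
g_vr, g_total = gains(core_base=60, vr_loss_base=40, core_new=60, vr_loss_new=30)
print(f"G_VR = {g_vr:.1%}, G_total = {g_total:.1%}")
```

Because the VR loss is only part of the total energy, $G_{total}$ is smaller than $G_{VR}$ whenever the core energy is unchanged.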

22 Conclusion We addressed the problem of power conversion efficiency in multicore platforms. Significant power is dissipated by the multiple VRs needed to support per-core DVFS. We proposed VR consolidation methods based on a configurable VR-to-core distribution network. The reactive VRCon configures the network to enhance the power conversion efficiency under predetermined DVFS levels. The proactive VRCon determines new DVFS levels to maximize system-wide energy saving without performance degradation.

23 Q&A Thank you!