Thermal Challenges for SPARC Based Microprocessors

Size: px
Start display at page:

Download "Thermal Challenges for SPARC Based Microprocessors"

Transcription

1 MEPTEC Thermal Management Symposium, February, 2006 Thermal Challenges for SPARC Based Microprocessors SP Bidyut Sen & Jim Jones Semiconductor Packaging Sun Microsystems Make Money Grow Re-enlist Champions Leverage Our Partners Simplify Our Business

2 Agenda Thermal Roadmap Issues in Various Areas CMT Solution Space Page 2

3 1T Moore's Law What is the best use of these transistors? 1T 1B 1B 1M 1K 16-bit 29K 4-bit 2.25K 8-bit 5K 1M 32-bit 275K 64-bit 3.2M 42M Transistor density of a manufactured die doubles every 24 months 1 Planar transistor Page 3

4 Thermal Microprocessor Heat Dissipation Thermal Dissipation Roadmap Microprocessor Thermal Dissipation Power (W) CPU CPU - High CPU - Low Prototype Date Page 4

5 Industry System Thermal Power Density Future Generations E15K IBM P690 Page 5

6 Processor Trends Spending transistors on complex high frequency processors is showing diminishing returns... Frequency Performance / Power Density Issues: Memory Latency Design Complexity High power & hot chips Page 6

7 Total Chip Dynamic and Static Power Dissipation Trends Based on the ITRS Michigan: Kim et al., 2003 Page 7

8 Real End Use Condition - Ranch < 3C fluctuations Idle at 45C or 49C Courtesy of Kenny Gross & Kalyan Vaidyanathan and Leesa Noujeim ron.zhang@sun.com Page 8

9 Determine Equivalent ATC based on Used Condition Assume Solder Damage by Power Cycles = Solder Damage by ATC Then TEMPERATURE CYCLE C 25 C (T ambient ) Time 112W (85 C) 95W Idle + = t 1 t 2 t 3 t 4 Time Max TEMPERATURE (C) TIME (seconds) (d big cycle x N big cycle + d mini cycle x N mini cycle ) = d ATC x N ATC ron.zhang@sun.com Page 9 d Damage per cycle N Number of cycles

10 CMT Processor Advantages Addresses Bottlenecks - Memory, Power Simpler cores operate at lower frequencies while maintaining processor performance/throughput No extra transistors (and power) for multiple-issue or out-of-order execution Time-share a simpler pipeline Share under-utilized resources (crossbar switch): Memory and I/O controllers, Second-level caches Improved power efficiency & Die Yields Active power & temperature control by scheduling or idling threads & cores Page 10

11 CMT Multiple Multithreaded Cores Core 8 Thread 4 Thread 3 Thread 2 Thread 1 Core 7 Thread 4 Thread 3 Thread 2 Thread 1 Core 6 Thread 4 Thread 3 Thread 2 Thread 1 Core 5 Thread 4 Thread 3 Thread 2 Core 4 Core 3 Thread 4 Thread 3 Thread 2 Thread 1 Core 2 Thread 4 Thread 3 Thread 2 Core 1 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Memory Latency Compute Page 11 Time

12 Processor Power Density Trends CMT Decreases Power Density without Sacrificing Performance Power Density (W/cm 2 ) CMT Page 12

13 CMT Processor Thermal Advantage Cool Threads Cool improves performance, power & reliability Uniform peak power close to average power, improves thermal management, clock distribution & reliability 107C 102C 96C 91C 85C 80C 74C 69C 63C 58C C0 C1 C2 C3 C4 C5 C6 C7 Single-core Processor (size not to scale) Page 13 CMT Processor

14 Cooling Hierarchy Package Board Level TIM2 Heat Sink Advacned Cooling Solutions Systems Facilities Page 14

15 Concept of Thermal Resistance Heat Sink Tambient Resistance Second Interface Resistance Package Resistance Heat generated from chip Tjunction Page 15

16 Solution Development Trends Thermal Budget Breakdown Current Product Rsa Future Products Rjc Rsa Rjc Vertical Systems Horizontal Systems Rcs Rcs Opportunity Areas Rjc Rsa Rcs Page 16

17 Selected DOE Solution Space Increasing lid thickness Increasing TIM2 K Rsa = 0.16 ºC/W (class A) Coolable Power Contours LID K Rsa = 0.08 ºC/W (class B) Rsa = 0.03 ºC/W (class C) TIM1 K Page 17

18 DOE Study Conclusions The DOE analysis reveals different contributions to the coolable power gains from the considered design variables Within the package TIM1 effective thermal conductivity has the most impact, followed by lid conductivity Outside the package the cooling solution, Rsa, has the most impact TIM2 has a small impact up to k 10 W/mK; and even that only with highly efficient cooling solutions Coolable power is largely independent of lid thickness for TIM2 k below ~10 W/mK. For TIM2 k above ~10 W/mK, increasing lid thickness could even be detrimental at CuW-like lid conductivity; this trend reverses at higher lid conductivity (like ScD) Improving TIM1/lid/TIM2 conductivity, alone with less efficient cooling solutions (Rsa 0.5 C/W), produces limited coolable power gains. To achieve the full potential, these changes must be made in conjunction with improvements in the cooling solution. Page 18

19 Thermal Solution Development - Rjc Improvements (Lid) Coolable Power Gain, % Lid Thickness & BTC* Impact on Coolable Power Material Needed Today 10 4 mm mm 2 mm 5 1 mm Material A B C D Lid BTC, W/mK Material Needed in months *BTC Bulk Thermal Conductivity Page 19

20 Thermal Solution Development - Rjc Improvements (TIM1's) TIM1 Effective BTC Impact on Coolable Power Rjc, deg. C/W Impact of TIM1 BLT on Rjc TIM1 BLT (micron) Coolable Power Gain, % Material Needed Today Material Needed in months TIM1 BTC, W/mK *BTC Bulk Thermal Conductivity Page 20

21 Impact of TIM2 BTC* on Rjc Rjc, C/W A B C D E Material S S TIM2 k, W/mK *BTC Bulk Thermal Conductivity Page 21

22 Impact of TIM2 BTC* on Die Thermal Gradient A B C D E S Temperature, C *BTC Bulk Thermal Conductivity S TIM2, W/mK Page 22

23 Cooling Solution Options BLT Improvements > Thinner BLT Organic TIMs > Higher effective TC > Material investigation Inorganic TIM > Materials Availble > Process validation Advanced Lid Materials > Issues with supply > unproven reliability Page 23

24 Cooling Solution Options TEC > Early stages of evaluation potential is uncertain Lidless > Heat Spreading with Air Cooling > Handling Issues Composite Die > Thinner silicon with diamond-like layer > Needs Development Page 24

25 Cooling Solution Options Liquid Cooling > Direct and Indirect Attach > Liquid Cold Plate > System Level > Data Center Level > Similar complexity to refrigeration with much shorter usable life and small performance gain over air Refrigeration > Infrastructure > Reliability Page 25

26 Wattage Microprocessor Power Projection Non-Conventional Pkg, Non- Conventional Cooling Non-Conventional Pkg, Air Cooling Conventional Pkg, Air Cooling NextGen NextGen NextGen NextGen NextGen NextGen NextGen NextGen NextGen Vertical System Horizontal System 0 Time Page 26

27 Cooling Options Coolable Power Gains 200% 190% 180% 170% 160% 150% 140% 130% 120% 110% 100% 90% 80% LL, thick LL Lidless TIM0 LD Lidded Refrigeration Liquid Cooling Air Cooling 26% 10% 85% 31% 12% 100% LD, thick TIM1 Additional Gains Over Solution Below 33% 33% 13% 13% 107% 108% LD, thin TIM1 LL, thin TIM0 Page 27 Theoretical Cases 38% 15% 122% 124% 126% LD, thick TIM1A 39% 15% LD, thin TIMA 39% 15% LD w/o TIM1 46% 18% 147% LL w/o TIM0

28 Thermal Solution Development - Rsa Improvements 0 Rsa reduction over time Rsa Reduction as % -50% Time source: Guoping Xu Page 28

29 Chip, compressor, and total power 135 Active Power = 100W Power Consumption, W Total Comp Chip Chip Junction Temperature, C Page 29

30 Data Center Cooling Options Past Future Today Tomorrow Page 30

31 FLOORPLAN What is Junction-Case Thermal Resistance (R jc )? T j T c R jc is an industry-wide accepted measure of package thermal performance R jc = T j T c Q Page 31

32 Thermal Resistance with Floorplan PkgA: 15 mm x 15 mm, 120 W : a bulk power density of 0.53 W/mm 2 as before PkgC: 10 mm x 10 mm, 25 W: a bulk power density of 0.25 W/mm 2, lower than that of PkgA PkgC resistance is 0.59 deg. C/W, higher than that of PkgA! Page 32

33 Summary Roadmap suggests continuation of increase of Chip Power unless a Silicon technology change happens Several Performance and Reliability Level Problems needs to be addressed Architecturally, Chip Multithreading (CMT) and Throughput Computing should help with Power mangement New Developments on Packaging and System Level Cooling is necessary Page 33