Overview. Design flow. Back-end process. FPGA design process. Conclusions

1 ASIC Layout

2 Overview. Design flow. Back-end process. FPGA design process. Conclusions.

3 ASIC Design Flow. Source:

4 What is Backend? Physical Design: 1. Floorplanning: Architect's job 2. Placement: Builder's job 3. Routing: Electrician's job

5 Input for Layout Tools. Input: Verilog gate-level netlist; timing constraint files for all modes (*.sdc). Libraries: physical libraries (LEF/OA) with cell boundaries, pins, and routing rules; timing libraries (*.lib). Optional input files: floorplan file, IO file, scan definition file. Optional libraries: technology files (cap tables, QRC tech file); SI libraries (*.cdb).

6 Import Design Procedure. Global definition file: File - Import Design. Verilog netlist file(s). OA flow: reference and custom libraries of standard cells, IOs, custom blocks, and RAMs; or LEF files (LEF/DEF flow). Power/ground (special) net definitions. CPF: Common Power Format (low-power design, power islands). Specify the MMMC (Multi Mode Multi Corner) view file, which links timing libraries, RC corners, and constraints per view. Commands: source <myfile>.globals; init_design

7 Structure of a Die. The silicon die is mounted inside a chip package. A die consists of a logic core inside a power ring. Special power pads are used for VDD and VSS (core and pad).

8 The Design Implementation Flow

9 Floorplanning. Floorplanning is a very important step in layout design. Important objectives: chip size, aspect ratio, placement of basic building blocks, and IO placement. The definition of chip size and aspect ratio, together with the placement of the building blocks (memories, hard macros), strongly affects chip routability and final performance. The pads should be placed so that they meet the minimum pitch requirements defined by the packaging methodology.

10 Placement and Routing. Placement defines the position of each cell from the netlist. Routing makes the connections between the cells (and IOs). Placement is performed in the defined rows, and metal lines are used for the routing. The target is to place connected cells in neighboring positions to reduce the timing penalty. The objective is to reduce the interconnection length, which reduces line capacitance and therefore interconnection delay. Routing is divided into global and local routing.

11 Back-end Design Decisions. Core-limited and pad-limited design: the design size can be defined either by the core size or by the pad size. In general, design complexity is defined by the number of gates (reflected in the core area). However, the pads are disproportionately large, so when there are many of them they can define the chip area (pad-limited design). In the opposite case, the design is core-limited. The aspect ratio of the chip has to be chosen so that it does not harm chip routability and so that it fits the packaging. An aspect ratio of 1.0 defines a square chip, which is the optimal shape with respect to placement and routing. The size of the power rings depends on the estimated power consumption of the chip. Since the power pads are usually distributed evenly on all four sides of the chip, the maximum current flow through the power rings is ¼ of the total estimated current.
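
The two decisions on this slide can be sketched numerically. This is a rough illustration only: it assumes a square die with pads split evenly over four sides, ignores corner cells and pad height, and the function and parameter names (limiting_factor, pad_pitch_um, ring_current_ma) are hypothetical, not taken from any tool.

```python
import math


def limiting_factor(core_side_um, num_pads, pad_pitch_um):
    """Classify a square die as pad-limited or core-limited.

    Assumption: pads are split evenly over four sides and each pad
    occupies one pad pitch along the die edge.
    """
    pads_per_side = math.ceil(num_pads / 4)
    pad_side_um = pads_per_side * pad_pitch_um
    return "pad-limited" if pad_side_um > core_side_um else "core-limited"


def ring_current_ma(total_current_ma, sides=4):
    """Current carried by one ring segment when power pads sit on all sides."""
    return total_current_ma / sides


if __name__ == "__main__":
    print(limiting_factor(1000.0, 100, 60.0))
    print(ring_current_ma(400.0))
```

With 100 pads at a 60 um pitch, one side needs 25 pads and is 1500 um long, which exceeds a 1000 um core, so the pads set the die size.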

12 Placement. ASIC placement is performed in rows. Routing can be performed in both directions, horizontal and vertical. The chip size strongly depends on the chosen core (row) utilization; a typical value is 75%. If the chip contains complex logic that requires excessive routing, the user should consider relaxing (lowering) the core utilization. If the chip logic is relatively simple, the user may try to tighten (raise) the utilization in order to reduce the chip size.
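
The link between utilization and chip size can be made concrete with a small sketch. It assumes that utilization is the ratio of total standard-cell area to core area and that the aspect ratio is height divided by width; the function name core_dimensions is hypothetical.

```python
import math


def core_dimensions(cell_area_um2, utilization=0.75, aspect_ratio=1.0):
    """Return (width, height) of the core in micrometres.

    Assumptions: utilization = cell area / core area,
    aspect_ratio = height / width.
    """
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


if __name__ == "__main__":
    # 0.75 mm^2 of cells at 75% utilization needs a 1 mm^2 core.
    print(core_dimensions(750_000.0))
    # Lowering utilization to 60% grows the core for the same cells.
    print(core_dimensions(750_000.0, utilization=0.60))
```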

13 Objectives of the Placement Process. Placing each individual cell in the rows. Reducing the distance between connected cells. Achieving high-density placement. Reducing timing overhead and power consumption. Addressing routing challenges (avoiding routing congestion). Timing-driven placement tries to fulfil the timing constraints while performing placement; it is coupled with trial routing and RC extraction to estimate the effects of the placement choices.

14 Placement Algorithms. Two general types of algorithm: constructive placement and iterative placement improvement. Constructive placement methods include the min-cut algorithm and the eigenvalue method; the flow starts with a constructed solution and follows it with iterative improvement. The min-cut placement method uses successive application of partitioning: cut the area into two pieces, swap cells to minimize the cost, and repeat the process, cutting smaller pieces until all the logic cells are placed. The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix. Figure (min-cut placement): (a) Divide the chip into bins using a grid. (b) Merge all connections to the center of each bin. (c) Make a cut and swap cells between bins to minimize the cost. (d) Throw out all the edges that are not inside the piece. (e) Repeat the process and continue down to the individual bins. Source: Application-Specific Integrated Circuits - Michael J. S. Smith
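
The cut, swap, and repeat loop of min-cut placement can be sketched as follows. This is a simplified illustration, not the algorithm of any particular tool: the cost is the number of nets crossing the cut, the swap step is a plain greedy pairwise exchange, and all names (cut_size, bipartition, min_cut_place) are hypothetical.

```python
from itertools import product


def cut_size(nets, left):
    """Count nets that have cells on both sides of the cut."""
    return sum(
        1
        for net in nets
        if any(c in left for c in net) and any(c not in left for c in net)
    )


def bipartition(cells, nets):
    """Cut the cells into two halves, then swap pairs while the cut shrinks."""
    half = len(cells) // 2
    left, right = set(cells[:half]), set(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_size(nets, left)
        for a, b in product(sorted(left), sorted(right)):
            trial = (left - {a}) | {b}
            if cut_size(nets, trial) < best:
                left, right = trial, (right - {b}) | {a}
                improved = True
                break  # restart the search from the improved cut
    return left, right


def min_cut_place(cells, nets, max_bin=1):
    """Recursively cut until each bin holds at most max_bin cells.

    Returns the bins in left-to-right order.
    """
    cells = list(cells)
    if len(cells) <= max_bin:
        return [cells]
    piece = set(cells)
    # Throw out the nets that are not entirely inside this piece.
    inside = [net for net in nets if all(c in piece for c in net)]
    left, right = bipartition(cells, inside)
    return (min_cut_place(sorted(left), inside, max_bin)
            + min_cut_place(sorted(right), inside, max_bin))


if __name__ == "__main__":
    nets = [["a", "c"], ["b", "d"]]
    print(min_cut_place(["a", "b", "c", "d"], nets))
```

In the example, the first cut initially splits both nets; one swap brings each net entirely onto one side, so connected cells end up in adjacent bins.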

15 Iterative Placement. Further improvements are made starting from an initial placement. Selection criteria decide which cells should be considered for a move. Measurement criteria decide whether to move the selected cells. Several exchange methods exist: pairwise interchange, force-directed interchange, force-directed relaxation, and force-directed pairwise relaxation. All of these methods are based on selecting a pair of cells to be exchanged. First the cell under examination is selected; then an exchange with each of the other candidate cells is evaluated against the cost criteria. The set of candidate partners can be limited by Manhattan distance. Figure: (a) Swapping two cells. (b) Swapping more cells gives better results but is more complex. (c) A one-neighborhood. (d) A two-neighborhood. Source: Application-Specific Integrated Circuits - Michael J. S. Smith
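
Pairwise interchange can be sketched in a few lines. The cost function used here, half-perimeter wirelength, is an assumption chosen for simplicity (the slide does not name a cost function), and the names wirelength, pairwise_interchange, and max_distance are hypothetical.

```python
def wirelength(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets (assumed cost)."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def pairwise_interchange(placement, nets, max_distance=None):
    """Swap pairs of cells while the cost drops.

    max_distance optionally limits candidate pairs by Manhattan distance,
    which corresponds to the neighborhood idea on the slide.
    """
    placement = dict(placement)
    cells = sorted(placement)
    improved = True
    while improved:
        improved = False
        for i, a in enumerate(cells):
            for b in cells[i + 1:]:
                (ax, ay), (bx, by) = placement[a], placement[b]
                if max_distance is not None and abs(ax - bx) + abs(ay - by) > max_distance:
                    continue
                before = wirelength(placement, nets)
                placement[a], placement[b] = placement[b], placement[a]
                if wirelength(placement, nets) < before:
                    improved = True
                else:
                    # Undo a swap that did not help.
                    placement[a], placement[b] = placement[b], placement[a]
    return placement


if __name__ == "__main__":
    start = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
    nets = [["a", "d"], ["b", "c"]]
    result = pairwise_interchange(start, nets)
    print(wirelength(start, nets), wirelength(result, nets))
```

The loop stops when a full pass finds no improving swap, so the result is a local optimum rather than a guaranteed global one.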

16 Clock Synthesis. A clock network needs to be implemented to drive all sink elements (flip-flops, latches, etc.) from the same source line. The clock network consists of a large number of buffers, inverters, and clock gates. The objective is to reduce the phase difference between the clock at the different clock sinks (clock skew). An additional goal is to reduce the clock latency, which depends on the clock tree complexity and the interconnection delay. The clock is a significant source of power consumption (~50% in modern designs), so reducing it is another objective. Many sinks also use the falling edge of the clock, so balancing the rise and fall times is an important objective. The clock tree is defined in the clock tree definition file.

17 Clock Trees. A path from the clock source to the clock sinks. Figure source: vlsi.pro

18 Concept of Clock Tree. Figure labels: clock pad, clock tree, subtrees.

19 Clock Skew. Clock skew is the maximum difference in the arrival time of a clock signal at two different sinks (flip-flops, latches, etc.). Clock skew can lead to a performance drop or to the need to fix hold-time violations by adding buffers, which costs additional power and area. Clock skew should therefore be minimized. Figure source: vlsi.pro
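
As a small illustration of the definition above, skew can be computed from a table of clock arrival times. The sink names and arrival values are made up for the example, and the function name clock_skew is hypothetical.

```python
def clock_skew(arrival_times_ns):
    """Maximum difference in clock arrival time between any two sinks."""
    times = list(arrival_times_ns.values())
    return max(times) - min(times)


if __name__ == "__main__":
    # Hypothetical arrival times at three sinks, in nanoseconds.
    arrivals = {"ff1": 1.20, "ff2": 1.35, "latch1": 1.10}
    print(round(clock_skew(arrivals), 3))
```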

20 Clock Gating and CTS. Clock gating is often used as a method for reducing power consumption. The clock network uses ~50% of the power budget. Switching off the parts of the network that are not needed can reduce consumption dramatically. Clock gating needs to be taken into account during clock tree synthesis (CTS). Clock gates are part of the clock tree and contribute to the skew. Clock tree balancing is required between non-gated and gated subtrees.

21 Routing. The goal of routing is to minimize the interconnect delay. Routing is performed automatically using the available metal layers. Design rules need to be fulfilled (minimum spacing, etc.). There are different types of routing (trial, clock routing, final routing) depending on the design phase. Global routing: the first phase of final routing, connecting blocks. Detailed routing: the final routing of all interblock connections.

22 Manhattan Routing Algorithm. Motivated by the streets of New York. Straight connections in the horizontal and vertical directions only. Specific metal layers are used only for the vertical or only for the horizontal direction, which avoids interconnection problems. Routing channels are defined. Manhattan distance: the sum of the distances in the X-axis and Y-axis directions. Much more advanced algorithms are available today. Figure labels: Pin A, Pin B, Pin C, Pin D, Metal 1, Metal 2.
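
The Manhattan distance and a single-bend route between two pins can be written down directly. The sketch assumes, purely for illustration, that the horizontal segment would go on one metal layer and the vertical segment on another; the function names are hypothetical.

```python
def manhattan_distance(p, q):
    """Sum of the X and Y distances between two pins."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def l_route(p, q):
    """Corner points of a one-bend route: horizontal first, then vertical."""
    return [p, (q[0], p[1]), q]


if __name__ == "__main__":
    pin_a, pin_b = (0, 0), (3, 4)
    print(manhattan_distance(pin_a, pin_b))
    print(l_route(pin_a, pin_b))
```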

23 Left-Edge Routing Algorithm. Source: Application-Specific Integrated Circuits - Michael J. S. Smith
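
The slide gives only the name of the algorithm, so the following is a sketch of the textbook left-edge idea for channel routing under simplifying assumptions: each net is a single horizontal interval (left, right), there are no vertical constraints, and the function name left_edge is hypothetical.

```python
def left_edge(intervals):
    """Assign net intervals to tracks, filling one track at a time.

    intervals maps a net name to its (left, right) column span.
    Returns a list of tracks, each a list of net names.
    """
    unassigned = sorted(intervals, key=lambda net: intervals[net][0])
    tracks = []
    while unassigned:
        track, last_right, remaining = [], None, []
        for net in unassigned:
            left, right = intervals[net]
            if last_right is None or left > last_right:
                track.append(net)       # fits to the right of the last net
                last_right = right
            else:
                remaining.append(net)   # overlaps, try the next track
        tracks.append(track)
        unassigned = remaining
    return tracks


if __name__ == "__main__":
    spans = {"A": (1, 4), "B": (2, 6), "C": (5, 8), "D": (7, 9)}
    print(left_edge(spans))
```

Nets are sorted by their left edge, and each track is packed greedily with nets that do not overlap the ones already placed on it.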

24 Verification. Timing verification. Power verification. LVS (layout versus schematic). DRC (design rule check). Report excerpt: optdesign Final Non-SI Timing Summary; setup mode and hold mode each list WNS (ns), TNS (ns), Violating Paths, and All Paths for the path groups all, reg2reg, reg2cgate, and default.
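
The summary columns can be reproduced from a list of path slacks. The sketch assumes the usual definitions: WNS is the worst (smallest) slack, TNS is the sum of all negative slacks, and a path violates when its slack is below zero. The slack values are made up and the function name timing_summary is hypothetical.

```python
def timing_summary(slacks_ns):
    """Summarize path slacks in the style of a timing report."""
    violating = [s for s in slacks_ns if s < 0]
    return {
        "WNS": min(slacks_ns),          # worst slack over all paths
        "TNS": sum(violating),          # total negative slack
        "violating_paths": len(violating),
        "all_paths": len(slacks_ns),
    }


if __name__ == "__main__":
    # Hypothetical setup slacks of four paths, in nanoseconds.
    print(timing_summary([0.5, -0.25, -0.5, 1.0]))
```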

25 Timing Verification in Backend Design. After synthesis, timing verification was possible based on the cell delay and an assumed interconnect delay (wireload model). After layout, the real interconnect delay can be estimated. Based on the routing information (length and type of the metal lines between two pins), the parasitics can be calculated. The two important parameters are R (resistance) and C (capacitance) of the line. Interconnect delay: td = R * C. Figure source: Application-Specific Integrated Circuits - Michael J. S. Smith
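
A first-order estimate of td = R * C from wire geometry might look like this. The model is an assumption for illustration: R comes from a sheet resistance times the number of squares (length / width), C from a capacitance per unit length, and the wire is treated as a single lumped RC. The parameter values are invented, not taken from any technology file.

```python
def wire_delay_ps(length_um, width_um, sheet_res_ohm_sq, cap_ff_per_um):
    """Lumped RC delay of a wire in picoseconds (td = R * C)."""
    resistance_ohm = sheet_res_ohm_sq * length_um / width_um
    capacitance_ff = cap_ff_per_um * length_um
    # ohm * fF = 1e-15 s = 1e-3 ps
    return resistance_ohm * capacitance_ff * 1e-3


if __name__ == "__main__":
    # Hypothetical 1 mm wire, 0.5 um wide.
    print(wire_delay_ps(1000.0, 0.5, 0.1, 0.2))
```

Because both R and C grow with length, the delay in this model grows with the square of the wire length, which is why reducing interconnect length matters during placement.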

26 Power Verification. Power-related issues are very important in the verification process: power consumption, IR drop, ground bounce, EMI, substrate noise, and crosstalk.

27 DRC & LVS. During the design rule check (DRC) verification step, it is verified whether all manufacturer rules have been followed. LVS extracts a schematic from the final layout and compares it with the original netlist that was the input for the layout. The expected result is a full match. A mismatch can indicate problems such as shorts, opens, or parametric mismatches.

28 Full Back-End Flow. 1. Technology and IP setup (libraries, memory/hard macro IP, PDK) 2. Loading of input data (Verilog netlist, constraints) 3. Floorplanning 4. Power planning 5. Placement 6. Initial verification and IPO 7. Clock tree insertion 8. Post-CTS verification and IPO 9. Routing 10. Post-routing verification and IPO 11. Timing closure and ECO (Engineering Change Order) 12. Power/voltage verification 13. DRC 14. LVS 15. Design for manufacturability (metal fillers, etc.)

29 Field-Programmable Gate Arrays (FPGAs). FPGAs are already-fabricated chips whose function can be fully programmed after production. Programming can be done by writing into configuration memory after power-on. Configuration memory: SRAM or Flash. FPGAs consist of configurable logic blocks (CLBs), which can be individually programmed using programmable LUTs and memory blocks. Routing (interconnect) between the CLBs is also programmable, using configurable routing elements. FPGAs are in general less power-efficient and offer lower performance than ASICs, but NRE costs are reduced to a minimum. Today's FPGAs contain specialized blocks (embedded processors, DSP) that make them more efficient.

30 Basic Architecture. The basic architecture of an FPGA contains elements that can be fully programmed: CLBs, memory, IOs, interconnect, and clocking. Example: Spartan 2. Source figure: Xilinx

31 Configurable Logic Block (CLB). CLBs enable full functional programmability: programmable lookup tables (LUTs) for arbitrary combinational functions; a selectable/programmable sequential cell for the targeted distributed memory function; multiplexers for interconnecting the correct function. Example: Spartan 6. Source figure: Xilinx
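
Why a LUT can implement an arbitrary combinational function is easy to show in software: the configuration bits are simply the truth table, and the inputs form the address. This is a behavioral sketch only; the bit ordering (first input is the least significant address bit) and the name make_lut are assumptions, not the configuration format of any device.

```python
def make_lut(init_bits):
    """Build a LUT from its truth table.

    init_bits[i] is the output for input index i, where the first input
    is taken as the least significant address bit (an assumed ordering).
    """
    def lut(*inputs):
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return init_bits[index]
    return lut


if __name__ == "__main__":
    xor2 = make_lut([0, 1, 1, 0])        # 2-input XOR
    and3 = make_lut([0] * 7 + [1])       # 3-input AND
    print([xor2(a, b) for a in (0, 1) for b in (0, 1)])
    print(and3(1, 1, 1), and3(1, 0, 1))
```

Reprogramming the function means rewriting the table; the lookup hardware itself does not change.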

32 I/O Block. IO pads in FPGAs are fully reconfigurable. They support: different IO directions (I, O, IO); single-ended or differential signalling; different interface standards (CMOS, TTL, LVDS); different power supplies (3.3V, 2.5V, 1.8V, 1.5V, 1.2V); pull-ups and pull-downs; operation with and without registering. Example: Spartan 6. Source figure: Xilinx

33 FPGA Clocking. The clock driver is routed to all relevant sinks: CLBs, memory, and IOs. Clocking in FPGAs is also programmable, based on DCMs (digital clock managers), which can be programmed in frequency and phase and aligned with other clock sources. Example: Spartan 6. Source figure: Xilinx

34 FPGA Design Flow. The design flow corresponds to the one for ASICs, but with a different implementation. Synthesis: translation of HDL into FPGA components. Place: placing the netlist into the CLBs of the FPGA. Route: programming the interconnect to execute the function. Source figure: eet.com

35 FPGA Pros and Cons. Pros: reduced NRE costs (no mask costs, reduced design costs); reduced design time (no need to wait for chip samples); easy correction (only reprogramming is needed). Cons: high unit costs (a single FPGA can cost as much as ~10k); higher power consumption; reduced performance. Today's FPGAs are much more efficient, integrating multiprocessors, DSPs, interfaces, etc. on chip.

36 Example: Xilinx Zynq UltraScale+. Example of an optimized FPGA platform. Multi-core ARM system implemented on chip. Large memory resources. Advanced connectivity (USB, PCIe, CAN, SATA, etc.). Real-time support. Combination with programmable logic. Support for high-speed serial interfaces. Source figure: Xilinx

37 Conclusions. The process of designing ASICs was analysed in detail here. The main steps include synthesis, back-end, and timing verification. During the practical part we will analyze the steps using software CAD tools. The FPGA flow is similar to the ASIC flow.