Physical Level Design using Synopsys

Jamie Bernard, Student, MS CpE, George Mason University

Abstract: Very-large-scale integration (VLSI) of digital systems is the foundation of the electronic applications used in everyday life. These applications range from specialized parts to application-specific integrated circuits (ASICs) and systems-on-chip (SoCs). The designs of these systems are so complex that manual design is not feasible. The only way to design and fabricate them is to use computers to automate portions of the design process. This paper focuses on the many aspects of the physical design process and on how those aspects are automated using computer-aided design (CAD) tools from Synopsys.

Index Terms: Computer-Aided Design, Design Automation, Physical Design

I. INTRODUCTION

The very large scale integration of transistors into integrated circuits has been under way since the 1980s. The process has evolved from the beginning, when only one transistor was on a chip, to the point where a small number of devices such as transistors, resistors, and diodes shared a chip. This made it possible to create more than one logic gate on a chip and was considered small scale integration. The next step toward VLSI was large scale integration, in which each chip held several thousand transistors. This technology has led to very large scale integration, in which millions to hundreds of millions of transistors are on a single chip such as a microprocessor. This progression continues at the pace of Moore's Law, and dual-core processors may soon reach one billion transistors [1]. In the early stages of what would eventually become VLSI design, the small number of transistors allowed manual design. As the number of transistors increased and device dimensions shrank, manual design of such systems became impractical because of performance and design time requirements.
The amount of evaluation and decision making required would overwhelm engineers and design teams. The problem addressed by this paper is therefore clear: how does one create a complex electronic design consisting of millions of transistors? The solution is to automate the design process using computer-aided design (CAD) tools. These tools are necessary for designing complex VLSI integrated circuits, for which manual design is not possible. CAD tools provide several advantages, such as the ability to evaluate complex conditions in which solving one problem creates other problems [2]. These tools can also use analytical methods to assess the cost of a decision, as well as synthesis methods to help provide a solution to the problem [2]. Applying CAD tools to propose and analyze solutions during system design allows larger problems to be solved [2].

The use of CAD tools to create complex electronic designs falls under an industry category called Electronic Design Automation (EDA). The terms can be combined, and the process is then referred to as electronic computer-aided design (ECAD). Several companies, such as Cadence Design Systems, Magma Design Automation Inc., and Synopsys, specialize in EDA software and CAD tools. Based in Mountain View, California, Synopsys is a leading provider of EDA software used to design complex ASICs, FPGAs, and SoCs from concept to product [4]. The majority of this paper centers on the physical design process and on how the EDA software created by Synopsys automates parts of it.

II. DESIGN FLOW

A. Overview

It is important to understand where the physical level design process sits in the flow of a complete system. A generic design flow is shown in Fig 1. It represents the major design milestones involved in the VLSI design flow.
From start to finish, the flow defines which steps and tasks need to be completed and in what order. Front-end design includes most of the steps in the flow prior to physical design, and everything from physical design onward is considered the back-end of the design flow. Physical design can therefore be viewed as the bridge between the front-end design flow (system specification and functional design) and the back-end design flow, which eventually leads to the fabrication of a design.

EDA software in the form of CAD tools plays a vital role in all stages of the VLSI design flow, so it is advantageous for the output of one stage to be usable as the input to the next. However, the EDA software and tools do not have to come from the same vendor. If one vendor has a better tool for functional verification but another vendor's tool is better for logic design, a set of common input and output standards allows the different tools to communicate with each other. The EDA industry has such common standards so that tools from different vendors can be used during chip design. Some of these standards are discussed in the physical design flow.

Fig 1. Physical design and layout verification are part of the final steps in the VLSI design flow of a system [3].

The physical design stage of the VLSI design flow is also known as the place and route stage. The name reflects the idea of physically placing the circuits, which form logic gates and represent a particular design, in such a way that they can be fabricated. This is followed by connecting the logic with routing (metal) so that it forms the function designed prior to physical design. For example, if the output of NAND logic is connected to the input of INVERTER logic, then the design has been routed to create AND logic. Each piece of individual logic is placed and connected so that the result performs the particular task intended by the system designer. This is a generic, high-level description of the physical design (place/route) stage.

Within the physical design stage, a complete flow is implemented as well. This flow is described in more detail below, and, as stated before, several EDA companies provide software or CAD tools for it. The Synopsys software for the physical design process is called Astro. The overall goal of this tool is to combine a gate-level netlist, a standard cell library, and timing constraints to create a placed and routed layout. This layout can then be fabricated, tested, and integrated into the overall system that the chip was designed for.

The first of the main inputs into Astro is the gate-level netlist, which can be in the form of Verilog or VHDL. This netlist is produced during logic synthesis, which takes place prior to the physical design stage as indicated by Fig 1. Logic synthesis is the combination of the functional design and logic design stages of the VLSI design flow.
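The NAND-to-INVERTER routing example given earlier in this section can be checked exhaustively. The short Python sketch below models each gate as a Boolean function. The function names are illustrative only and are not part of any Synopsys tool. The sketch confirms that routing a NAND output into an inverter input yields AND logic for every input combination.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    """Two-input NAND gate."""
    return not (a and b)


def inverter(a: bool) -> bool:
    """Single-input inverter (INV)."""
    return not a


def routed_and(a: bool, b: bool) -> bool:
    """NAND output routed (wired) to the input of an inverter."""
    return inverter(nand(a, b))


def truth_table(fn):
    """Return [(a, b, fn(a, b)), ...] for all four input combinations."""
    return [(a, b, fn(a, b)) for a, b in product([False, True], repeat=2)]


if __name__ == "__main__":
    for a, b, out in truth_table(routed_and):
        print(int(a), int(b), "->", int(out))
    # The routed NAND + INV pair matches a logical AND on every row.
    assert all(out == (a and b) for a, b, out in truth_table(routed_and))
```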
Logic synthesis combines RTL code and design constraints to produce a final gate-level netlist that the physical design tool can interpret. The RTL (Register Transfer Level) code describes the architecture or function of the design in terms of data flow between registers [5]. The data flow between the registers is implemented using combinational logic such as AND, NAND, INV, etc. Logic synthesis optimizes this combinational logic based on the other input to the synthesis tool, the design timing constraints. These constraints contain timing parameters such as clock speeds and the delays associated with the inputs and outputs of the design, and they are derived from the system specification for the design being created. The logic synthesis tool merges the function of the design, implemented through RTL code in Verilog or VHDL, with the timing constraints of the design to create an optimized gate-level netlist. The gate-level netlist is then simulated to verify the logical functionality of the design. Once the design has been verified, Astro can use the netlist to begin the physical design process. This process is shown in Fig 2, which details some of the stages outlined in the generic VLSI design flow of Fig 1.

As described previously, the physical design stage can be seen as the bridge between the front-end design flow just described and the back-end design flow. A physical design engineer assumes that the VHDL (or Verilog) code has already been synthesized to a target library and that a final gate-level netlist has been created. This initial netlist is also assumed to have been functionally simulated to prove that the netlist entering physical design performs the function given in the system specification.
The netlist is considered golden and is the starting and reference point for all stages of the physical design process and beyond. Once physical design is complete, the final netlist is functionally compared to the original netlist to ensure that the function has not changed. This final netlist contains all of the components added for timing and clocking.

The second of the main inputs into Astro is a standard cell library. This is a collection of logic functions such as OR, AND, XOR, etc. The library represents each function as the physical shapes that will be fabricated.

This layout view, or depiction of the logic function, contains the drawn mask layers required to fabricate the design properly. However, the place and route tool does not require this level of detail during physical design. It needs only key information, such as the location of metal and of the input/output pins of a particular logic function. The representation used by Astro is considered the abstract version of the layout, and the two views are compared in Fig 3. Every desired logic function in the standard cell library has both a layout view and an abstract view.

Most standard cell libraries also contain timing information about each function, such as cell delay and input pin capacitance, which is used to calculate output loads. This timing information comes from detailed parasitic analysis of the physical layout of each function at different process, voltage, and temperature (PVT) points. The data is stored within the standard cell library in a format that Astro can use, which allows Astro to perform static timing analysis during portions of the physical design process. The physical design engineer may or may not be involved in creating the standard cell library, including the layout, abstract, and timing information. However, the physical design engineer must understand what common information the libraries contain and how that information is used during physical design.

Another common property of standard cell libraries is that every cell has the same height, regardless of its function. This common height aids the placement process, since the cells can then be linked together in rows across the design [8]. This concept is explained in detail in the placement stage of physical design.

Fig 2. Detailed flow of design steps prior to physical design of system [3].

Fig 3. Comparison of layout and abstract views of a logic function. Synopsys

Standard cell libraries can be generated manually or supplied by vendors. Several vendors in the EDA industry supply standard cell libraries for a specific process node and technology, such as 0.25 µm or 0.13 µm. If the cells are generated manually, they need to be prepared for the physical design process through library preparation, a separate topic not discussed in this paper. The physical design engineer assumes that a standard cell library is available and compatible with Astro, whether it was created by a library group or supplied by a vendor. Other libraries are also needed during place and route to complete the design. These include input/output (I/O) libraries and custom cells such as RAMs and reusable IP cores.

The third of the main inputs into Astro is the set of design constraints. These constraints are identical to those used during the front-end logic synthesis stage prior to physical design, and they are derived from the system specification and implementation of the design being created. Common constraints among most designs include the clock speed of each clock in the design and any input or output delays associated with the chip's input/output signals. Astro uses the same constraints as logic synthesis so that timing is considered during each stage of place and route.

Now that the origins of the three main inputs to Astro (gate-level netlist, standard cell library, and design constraints) have been described, what does Astro do? As a place and route tool, it does exactly what was stated in the generic VLSI design flow: it places and routes. However, some other aspects need to be discussed before the details of the physical design flow through Astro.
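The following sketch makes the third input concrete. It shows how a clock period and an input delay turn into a timing budget for a path. It is a simplified setup-slack calculation that rests on several assumptions: a single ideal clock, no skew, jitter or uncertainty, and hypothetical delay values in nanoseconds. It is not how Astro itself reports timing.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Hypothetical delays (ns) along one path that ends at a register's D pin."""
    launch_delay: float  # clock-to-Q of the launching register, or the input delay for a chip input
    logic_delay: float   # total combinational cell + wire delay on the path
    setup_time: float    # setup requirement of the capturing register


def setup_slack(clock_period: float, path: PathTiming) -> float:
    """Required time minus arrival time; a negative result is a timing violation.

    Assumes an ideal clock (no skew, jitter, or uncertainty).
    """
    arrival = path.launch_delay + path.logic_delay
    required = clock_period - path.setup_time
    return required - arrival


if __name__ == "__main__":
    period = 10.0  # ns, i.e. a 100 MHz clock (hypothetical constraint)
    reg_to_reg = PathTiming(launch_delay=0.5, logic_delay=7.0, setup_time=0.3)
    in_to_reg = PathTiming(launch_delay=4.0, logic_delay=6.5, setup_time=0.3)  # 4 ns input delay
    print(f"reg-to-reg slack: {setup_slack(period, reg_to_reg):.2f} ns")
    print(f"input-to-reg slack: {setup_slack(period, in_to_reg):.2f} ns")
```

With these made-up numbers, the register-to-register path meets timing with 2.20 ns of slack. The input path fails by 0.80 ns because its 4 ns input delay consumes part of the clock period before the on-chip logic even starts.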
Once this background information has been discussed, the detailed flow can be presented and better understood. As presented previously, a standard cell library is one of the main inputs to Astro, but other libraries are also needed to make a design complete. The final placed and routed layout will probably contain macro cells such as RAM or IP blocks. It will also contain pad (input/output) cells, which allow signals to enter and exit the chip.


Before the standard cells are placed, the locations of all macros, IP blocks, and pad cells need to be defined. The tool then places the standard cells automatically based on the timing of the design, which is given by the design constraints. Along with timing, the tool also takes into account the connectivity of each standard cell, as described in the gate-level netlist, so that overall wire length (and therefore the RC effect) is reduced. The pins on the standard cells are then physically connected during the routing stage. Routing is also timing driven: timing-critical nets such as clocks should have the shortest lengths, while non-critical nets can afford to be longer. This concept is represented by Fig. 4.

Fig 4. Visualization of Place and Route. Synopsys

Timing-driven placement takes advantage of the common cell height and locates the standard cells in placement rows. Within the rows, cells that are part of a timing-critical path, as determined by the design constraints, are placed closer together so that interconnect delays are reduced. These placement rows can be either abutted or non-abutted. As shown by Fig. 5, one drawback of non-abutted rows is an increase in area due to the gap between rows. If the rows are abutted, the cells in the top row must be flipped so that the VDD rails of the two rows merge. If they are not flipped, the VSS rail of one row shorts with the VDD rail of the other. The most common approach is to use abutted rows, which reduces area and also widens the shared VDD or VSS connections.

Fig 5. Timing driven placement of standard cells on non-abutted rows. Synopsys

Now that the basic concept of placement is understood, the background of routing can be established. In many technologies, several levels of aluminum or copper metal can be used to provide the connections between all of the cells in the design.
When going from one layer to another, a via must be used to make the connection. To prevent metal from shorting during routing, each metal layer has a preferred direction, either horizontal or vertical. Typically, the first metal layer is routed horizontally. The direction then alternates from one layer to the next, so any two consecutive metal layers are always perpendicular to one another. To route standard cells together, the router follows a grid of routing tracks to get from point A to point B. The fabrication vendor (foundry) imposes design rules that require metal routes to have a certain minimum width and spacing in order to be manufactured correctly. The routing tracks are designed to guarantee these width and spacing requirements. Routing congestion occurs when there are more connections to be made than routing tracks available.

This background on placement and routing only sketches some of what is done during the physical design process. Other problems must also be addressed during the flow in order to complete a design. One is what to do in the likely case that the critical paths of the design do not meet the timing requirements of the system. Another is how to connect all of the register clock pins so that the design is synchronized correctly. The remainder of the design flow shows how Astro can be used to deal with these problems and produce a final placed and routed design that meets all timing constraints.

B. DESIGN SETUP

Before a design can be placed and routed within Astro, the environment for the design needs to be created. The goal of the design setup stage in the physical design flow is to prepare the design for floorplanning. The design setup flow is outlined in Fig 6. As shown in the figure, the first step is to create a design library. Without a design library, the physical design process using Astro will not work.
This library contains all of the logical and physical data that Astro needs, so the design library is also referred to as the design container during physical design. The technology file is one of the inputs to the design library, and it is what makes the library technology specific. It must be explicitly specified when the design library is created. The technology file contains all of the data Astro needs for a specific process node, such as 130 nm or 90 nm. This includes all of the mask layer information as well as the via definitions used to connect metal layers. The file also holds the version of the process design rules used by the tool. The place and route tool can use information such as the metal width and spacing for each layer to perform simple design rule checking. Because Astro has a graphical user interface (GUI), each layout layer can be given its own color and fill pattern, and this information is also stored in the technology file. Other critical data in the technology file are the resistance and capacitance values for each layer. This data usually comes in table look-up (TLU) format. Astro uses it to determine the resistance and capacitance of a particular route, and from those values it can calculate the delay introduced by the routing. The technology file also identifies the units for quantities such as time, distance, resistance, and capacitance, so that both Astro and the physical design engineer interpret the data correctly. When the design library is created, the technology file is processed into the library and stored within the database. Once this is done, the design library or container has been created.

Fig 6. Design Setup Flow using Astro. Synopsys

The next step is to attach reference libraries to the design library. As discussed previously, these reference libraries contain the standard cells, macro cells, pad cells, and/or reusable IP core cells being implemented in the design. They can contain several hundred cells and are referenced by pointers in the library for memory efficiency. Every cell used by the gate-level netlist must be located within a reference library. If a cell does not exist, the next step of reading the netlist into the library will fail.

Once the design library has been created and all appropriate reference libraries have been attached, the next step is to read the gate-level netlist into the library. This gate-level netlist is produced during the logic synthesis stage discussed earlier. The file can be generated in several formats, including Verilog, VHDL, or even the Electronic Design Interchange Format (EDIF). Each format has advantages and disadvantages, but the industry standard is to use either Verilog or VHDL as the gate-level input. The netlist is parsed for any syntax errors that would cause problems during the physical design process. If errors are found, the netlist must be corrected before continuing. Netlist problems do not require the first two stages to be repeated; only the read netlist portion of the flow needs to be repeated.

Since most designs are complex enough to require logical hierarchy, the design must be flattened, or made nonhierarchical, in order to work in Astro. This step is called expanding the netlist because each level of hierarchy in the design is expanded, or flattened, until the only representation is the leaf cells. During this process, Astro validates that every leaf cell has a corresponding abstract view in a reference library. As described before, the abstract view contains the layout data needed for place and route. If abstract views of leaf cells are missing from the reference libraries, the layout design is incomplete and Astro will not be able to continue. Once every leaf cell is found to have the correct abstract view in the reference libraries, the expansion process completes without error.

Now that the netlist has been read into the design library and correctly expanded, a starting cell needs to be created within the design library. This starting cell is the beginning point for place and route in Astro. Fig. 7 shows the directory structure, as seen in UNIX or Linux, for a design library named design_lib_orca.

Fig 7. Directory structure created under a design library in Astro. Synopsys

The CEL directory is where the starting cell and all subsequent cells are stored for place and route. The starting cell is typically named after the top level of hierarchy in the netlist; in this case, the starting cell view would be named ORCA. The NETL directory is created during the read netlist stage of design setup. It maintains all levels of hierarchy, along with the connectivity of all levels in the design. The EXP directory is created when the design is expanded so that all sub-blocks are flattened to the leaf cells. The expanded netlist kept in the EXP directory is the logical representation that Astro needs to perform place and route. The CEL directory, in contrast, stores the layout, or graphical representation.
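The expansion step described earlier can be illustrated with a small Python sketch. That step flattens every level of hierarchy down to leaf cells and checks that each leaf cell has an abstract view. The netlist model (a dictionary of modules and their child instances), the cell names, and the function names are hypothetical. They only mirror the concept and do not represent Astro's internal data structures or commands.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical netlist model: each module maps to its child instances as
# (instance_name, master_name) pairs. A master that is not itself a module
# is treated as a leaf cell that must come from a reference library.
Netlist = Dict[str, List[Tuple[str, str]]]


def expand(netlist: Netlist, module: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Flatten `module` into (hierarchical_instance_path, leaf_cell) pairs."""
    leaves: List[Tuple[str, str]] = []
    for inst, master in netlist[module]:
        path = f"{prefix}{inst}"
        if master in netlist:
            # Sub-block: descend one more level of hierarchy.
            leaves.extend(expand(netlist, master, path + "/"))
        else:
            leaves.append((path, master))
    return leaves


def missing_abstracts(leaves: List[Tuple[str, str]], abstracts: Set[str]) -> Set[str]:
    """Return leaf-cell names that have no abstract view in the reference libraries."""
    return {cell for _, cell in leaves if cell not in abstracts}


if __name__ == "__main__":
    netlist: Netlist = {
        "ORCA": [("u_core", "CORE"), ("u_pad0", "PADIN")],
        "CORE": [("u_alu", "ALU"), ("u_reg", "DFFX1")],
        "ALU": [("g1", "NAND2X1"), ("g2", "INVX1")],
    }
    reference_abstracts = {"NAND2X1", "INVX1", "DFFX1"}  # PADIN deliberately missing

    flat = expand(netlist, "ORCA")
    for path, cell in flat:
        print(path, cell)
    problems = missing_abstracts(flat, reference_abstracts)
    if problems:
        print("Expansion would fail; missing abstract views:", sorted(problems))
```

In this toy model, every leaf cell keeps its full instance path (for example u_core/u_alu/g1). That is one simple way to retain enough information to rebuild the hierarchy after the design has been worked on in flattened form.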
Because the logical (EXP) and graphical (CEL) views of the design are stored separately, they need to be combined to begin the place and route process. This leads to the next step, called binding the netlist to the cell, in which the expanded netlist is bound to a specific graphical cell view. This allows Astro to merge the logical and physical representations of the design. All cells referenced in the gate-level netlist are now visible within the cell (CEL) view. In other words, the cells needed to make the design have been assembled together. This includes all standard cells, pad cells, macro cells, and reusable IP cells implemented within the netlist. The positions of these cells in the starting cell view do not count as placement. They are only a graphical representation of what will be needed.

At this point the netlist has been expanded, since Astro operates only on a flattened design. This creates a problem, because the netlist that Astro produces after place and route will need to be functionally verified. Most testing and verification performed prior to place and route relied on hierarchy, so the tool needs to be able to reproduce the netlist in a hierarchical form. This is solved during the last stage of design setup, which is to preserve the hierarchy. Because the hierarchy is preserved before floorplanning, clock tree creation, and other Astro optimizations, the tool is able to create a hierarchical netlist for verification. The preserve hierarchy utility maintains the ports, and the function of the ports, at the hierarchical boundaries of the sub-blocks. This information is used to reconstruct the hierarchy from the expanded version within Astro. Once the netlist is bound to the starting cell and the hierarchy is preserved in the design library, the design setup stage is complete and the design is ready for floorplanning.

C. FLOORPLAN

The design setup prepares the netlist and design for floorplanning within Astro.
Floorplanning can be considered layout design done at the chip level [2]. This design blueprint shows the actual placement of major components in the design, such as inputs/outputs and memory elements like RAMs. Floorplanning is a form of placement that can be done manually or automatically. It helps define the layout hierarchy of the design and aids in estimating the overall area required. This is also when the aspect ratios of certain design blocks can be analyzed to determine which sizes give the best timing results [9]. There are a few approaches to floorplanning: constructive, iterative, and knowledge-based [9]. The constructive approach starts with one module and adds the other parts of the design one at a time until all major blocks are in the floorplan [9]. The iterative and knowledge-based approaches assume that an initial floorplan has already been proposed. However, using knowledge from the current and previous designs to guide the floorplan can reduce the number of iterations needed. The resulting final floorplan then has the greatest probability of meeting the design timing constraints.

For most designs, the floorplan consists of three major areas: the pad area, the core area, and the power/ground distribution area, as shown in Fig. 8.

Fig 8. The locations of the core and periphery areas, together with the power and ground grid, define the floorplan of a design. Synopsys

The pad or periphery area identifies the locations of the input/output (I/O) cells for the design. The core area defines the location of the standard cells, macro cells, and any reusable IP implemented in the design. The power/ground network distributes the power and ground needed by the core area logic as well as by the I/O in the periphery area. Several types of I/O pads can be implemented in the peripheral area.
Most of this area is used by signal pads so that signals can go into and out of the chip. Their placement is fixed and is usually based on the chip packaging requirements. The pads can still be moved during the floorplanning stage if the package requirements change. They can also be moved if their original placement causes a packaging violation, which can occur when wire bonding is used to make connections to the chip. The pad locations need to be fixed before the floorplanning stage is complete. In addition to signal pads, power and ground pads are placed to receive the external power and ground connections. These pads sit in the peripheral area like signal pads. However, their inputs are power (VDD) and ground (VSS) rather than a switching digital or analog signal.

The physical size of the I/O circuitry, together with the chip size, determines the amount of core area available for standard and macro cells. The percentage of the core area used by standard cell and macro logic in a given design is called the core utilization of the design. It is found by adding the total standard cell area to the macro cell area and dividing by the core area. To make the most efficient use of expensive silicon, the design would ideally be 100% utilized with standard and macro cells. However, dimensions keep shrinking and the density of shapes inside standard and macro cells keeps rising, so 100% core utilization would cause the design to fail routing. Once the design is placed, all of the cells need to be routed. At such high utilization, the design would contain more routes, or wires, than the fixed number of routing tracks could handle. The resulting routing congestion would be almost impossible to overcome without reducing the utilization. Many designs reach a final core utilization of 80 to 85%. However, the starting gate-level netlist should occupy only 60 to 75% of the core. This leaves the area needed for logic optimizations, clock tree cells, and other cells added during timing closure.

Floorplanning the core area consists of placing the large macro cells, such as RAM, ROM, or other IP used in the design. As shown in Fig. 8, these blocks can be placed anywhere in the core area. However, the iterative or knowledge-based floorplanning approaches mentioned previously help in placing such blocks properly. Poor placement of macros during floorplanning can cause placement and routing problems later in the physical design process. The design must then be re-floorplanned, wasting time and money. It is therefore recommended to assemble a good floorplan using the following guidelines. Large macros such as RAMs and IP should be placed along the sides or near the corners, depending on the number of macros. When placing these macros, sufficient space should be maintained between them so that large routing channels are defined. The placement of the blocks should also create large partitions for standard cell placement. Restricting standard cells to small areas between macros or in other parts of the chip may lead to timing constraint problems. Each design is different, and these guidelines should only serve as a starting point for an initial floorplan. A few quick iterations of placement and routing may reveal a better but different floorplan. Once the pads and large macros have fixed locations, the power and ground network can be created to connect power and ground to these cells.
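Core utilization, as defined earlier in this section, is a simple ratio. The Python sketch below computes it for a hypothetical design, in which all areas are made-up values in square micrometers. It also checks how much room remains before the design reaches the 85% final utilization figure quoted above.

```python
def core_utilization(std_cell_area: float, macro_area: float, core_area: float) -> float:
    """Core utilization = (standard cell area + macro area) / core area."""
    if core_area <= 0:
        raise ValueError("core area must be positive")
    return (std_cell_area + macro_area) / core_area


if __name__ == "__main__":
    # Hypothetical areas in square micrometers.
    core = 1_000_000.0
    macros = 250_000.0
    std_cells = 430_000.0
    start = core_utilization(std_cells, macros, core)
    print(f"starting utilization: {start:.0%}")  # 68%, inside the 60-75% starting range

    # Room left for timing optimization and clock tree cells before reaching 85%.
    headroom = 0.85 * core - (std_cells + macros)
    print(f"area available for added cells: {headroom:,.0f} um^2")
```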
The power and ground network also supplies the power and ground connections to the standard cells that will be placed in the design. The purpose of this grid is to take the power and ground received from the pads in the periphery and distribute them evenly across the entire core area. That way, all cells in the design receive the same power and ground levels as those applied to the power and ground pads. In reality, the power and ground levels in the core area differ from those at the pads because of resistive (IR) drop across the grid. The grid should therefore be constructed to keep this difference as small as possible. As shown in Fig. 8, power and ground rings are created around the edge of the core area. Straps are then created that connect one side of the ring to the other. The rings and straps are built from the process metal layers and run vertically or horizontally according to the preferred direction of each metal layer. Since the network uses most of the metal layers in both directions, the end result is a power and ground grid. Power and ground have separate straps. Each strap connects to the other straps and rings of the same supply, as well as to any macros in the design. Through the rings and straps, the power and ground applied to the pads reaches every cell in the design. The floorplan of the design is now complete, including pad/macro placement and power and ground distribution, so the standard cells are ready to be placed.

D. TIMING DRIVEN PLACEMENT

When discussing the fundamental steps of place and route, the timing constraints must be incorporated. These timing constraints are the requirements for a complete working design, and neglecting them will result in an expensive piece of junk. The place and route steps must therefore be timing oriented, and Astro is a timing-driven place and route tool. It optimizes, places, and routes the logic gates to meet all timing constraints, or speed goals, of a particular design.
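Timing-driven placement, introduced above, needs a fast way to estimate how long each net's wiring will be before any routing exists. Placement tools commonly use the half-perimeter wire length (HPWL) for this, though Astro may not use exactly this estimate. HPWL is half the perimeter of the smallest rectangle that encloses all of a net's pins. The Python sketch below computes it. The pin coordinates are illustrative assumptions, as is the idea of weighting timing-critical nets more heavily.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter wire length of one net: bounding-box width + height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def weighted_wirelength(nets: Dict[str, List[Point]], weights: Dict[str, float]) -> float:
    """Sum of HPWL over all nets, with timing-critical nets weighted more heavily."""
    return sum(weights.get(name, 1.0) * hpwl(pins) for name, pins in nets.items())


if __name__ == "__main__":
    # Hypothetical pin locations (um) for two nets.
    nets = {
        "clk": [(0.0, 0.0), (10.0, 5.0), (4.0, 12.0)],
        "data": [(2.0, 3.0), (8.0, 3.0)],
    }
    weights = {"clk": 5.0}  # treat the clock net as timing critical
    print(hpwl(nets["clk"]))                    # 10 + 12 = 22.0
    print(weighted_wirelength(nets, weights))  # 5 * 22 + 6 = 116.0
```

A placer that minimizes this weighted sum tends to pull the pins of heavily weighted nets together. This mirrors, in a simplified way, the goal of keeping cells on critical paths close to each other.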
As mentioned in the overview of the design flow, the timing constraints are a critical input to the place and route tool; Astro needs them to understand the timing objectives of the design. Standard constraints on most designs include the arrival times of input signals to the design and the required arrival times at the outputs of the chip. Other common constraints include the period of the system clock, as well as of other clocks if the design contains multiple clock domains. The constraint format used by Astro is called Synopsys Design Constraints (SDC). The SDC used by Astro during place and route is generated by the logic synthesis tool, which allows Astro to place and route against the same timing constraints as the synthesis tool. The timing information that Astro uses to meet these goals is based on the delays of the standard cells in the design as well as of the nets, or wiring, that connect the cells together. A standard cell's delay is a function of its input transition time and of its output load, which is the sum of the capacitance of the output wire and the input gate capacitances of all logic connected to that wire. As these quantities increase or decrease, the cell delay increases or decreases accordingly. Wire delay is similar and is a function of the resistance of the metal together with the capacitances described for the cell delay (wire capacitance + input gate capacitance). Using this cell and wire delay information, Astro is able to perform timing-driven placement. Prior to placement, however, Astro can perform a sanity check on the design constraints before continuing in the design flow. This process is called a zero-interconnect check, and its purpose is to verify that the design constraints can be achieved throughout the physical design process: Astro performs timing analysis on the design while ignoring the wiring delay.
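The dependence of cell and wire delay on transition time, resistance, and capacitance, and the idea behind the zero-interconnect check, can be illustrated with a deliberately simplified model. This is a minimal sketch under stated assumptions, not how Astro computes delay: cell delay is assumed to be linear in input transition time and output load (real cell libraries use lookup tables), wire delay uses a single-segment Elmore approximation, the input transition is held constant along the path, and all names and numbers (`Stage`, `path_delay_ns`, `zero_interconnect_ok`) are hypothetical. Units are chosen so that kOhm times pF gives ns.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One cell driving one net (hypothetical, simplified stage model)."""
    intrinsic_ns: float    # cell delay with no load and an ideal input edge
    slew_coeff: float      # extra delay per ns of input transition time
    drive_res_kohm: float  # effective output resistance of the cell
    pin_cap_pf: float      # sum of input-gate capacitances on the net
    wire_res_kohm: float   # estimated (virtual-route) wire resistance
    wire_cap_pf: float     # estimated (virtual-route) wire capacitance


def stage_delay_ns(s: Stage, in_slew_ns: float, ignore_wires: bool = False) -> float:
    """Cell delay plus wire delay for one stage; kOhm * pF = ns."""
    wire_r = 0.0 if ignore_wires else s.wire_res_kohm
    wire_c = 0.0 if ignore_wires else s.wire_cap_pf
    load_pf = wire_c + s.pin_cap_pf  # wire capacitance + input gate capacitance
    # Assumed linear cell model: grows with input transition and with output load.
    cell = s.intrinsic_ns + s.slew_coeff * in_slew_ns + s.drive_res_kohm * load_pf
    # Elmore delay of the wire as one pi segment: R * (C_wire / 2 + C_pins)
    wire = wire_r * (wire_c / 2 + s.pin_cap_pf)
    return cell + wire


def path_delay_ns(path: list[Stage], in_slew_ns: float = 0.05,
                  ignore_wires: bool = False) -> float:
    """Sum of stage delays; input slew held constant for simplicity."""
    return sum(stage_delay_ns(s, in_slew_ns, ignore_wires) for s in path)


def zero_interconnect_ok(path: list[Stage], period_ns: float) -> bool:
    """Sanity check: can the path meet the clock period with no wire delay at all?"""
    return path_delay_ns(path, ignore_wires=True) <= period_ns


path = [Stage(0.05, 0.2, 2.0, 0.01, 0.5, 0.02)] * 3
print(zero_interconnect_ok(path, period_ns=0.3))  # True  (about 0.24 ns)
print(path_delay_ns(path) <= 0.3)                 # False (about 0.39 ns)
```

In this toy example the path passes the zero-interconnect check but misses the 0.3 ns period once estimated wire delay is included, which shows why passing the check is necessary but not sufficient.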
If the design cannot meet the constraints even then, it is a good indication that after the design is routed and wiring delay is added, the design will not achieve its timing objectives. Once this check has passed and the constraints for a given netlist are considered achievable, the design can be placed. Timing-driven placement is the process of placing all standard cells onto rows in the core area, using the timing constraints as guidelines for where to place the cells. The design in Fig 9 shows all of the unplaced standard cells on the right side of the figure. The rows in this design are abutted, and cells will be placed similarly to those in Fig 5. As described in the overview, timing-driven placement attempts to place cells on the critical timing paths close together to reduce wiring resistance and capacitance. Since the design is not yet routed, Astro uses virtual routes or best
estimates to simulate the length and direction of cell connections.
Fig 9. The rows in the core area prior to placement.
Synopsys Astro can perform other steps during placement as well. These include pre-place, in-place, and post-place optimizations, which can be run separately or all concurrently. They include steps such as high fan-out net (HFN) synthesis: Astro recognizes nets with a large number of connections and inserts the proper buffering so that common library design rules, such as maximum capacitance and maximum transition time, are not exceeded. Other optimizations include inserting buffers into critical paths where standard cells drive a large output capacitance. Astro can also perform logic duplication or remapping, in which the logic is changed relative to the gate-level netlist in a way that is more conducive to meeting the timing objectives during placement. The overall function of the design is maintained during this change, as well as during the other optimizations done during placement. As stated before, the optimizations done during placement use the cell delays of the functions being implemented, while net delays are calculated using virtual, or best-estimate, routes envisioned by the placer. Once the design has been placed and the timing constraints have been met based on virtual routes, the focus of physical design shifts to the clock network of the system.
E. CLOCK TREE SYNTHESIS
All registers or flip-flops within a design have a clock input. These clock inputs are all driven by a single clock source I/O in the pad area, as shown in Fig. 11. Depending on the number of registers in the design, this clock net can have several hundred to several thousand connections. Since each register connected to the clock is expected to act upon receiving a clock signal, it is important that the clock signal arrive at the inputs of all registers at the same time. If not, data will be transferred through some registers earlier than others depending on the clock arrival, which can result in incorrect data being shifted or clocked through the design. If the clock network is allowed to originate from one source and connect directly to all registers distributed throughout the core area, problems will occur with the signal transition time and capacitance, due to the length of the net and the many connections (load) on it. The difference in clock arrival times is known as skew: if the clock reaches some registers before others, data transfer problems will result and the overall timing objectives will not be achieved. Traditional clock network topologies are used to prevent or reduce the amount of clock skew introduced into the design. These topologies are referred to as the H-Tree and X-Tree networks and are shown in Fig 10.
Fig 10. H-Tree and X-Tree clock distribution networks [12].
In the H-Tree and X-Tree topologies, the clock source is connected to the center of the network. The network then branches to the four corners of the design, the difference being the manner in which the clock is distributed to those corners (the shape of an H or an X). These four branches then provide the inputs to the next level of the H-Tree or X-Tree hierarchy. This distribution continues until all registers in the design are connected by a local clock buffer [12]. Connecting all registers in the design through either of these symmetrical topologies reduces the amount of clock skew in the network.
Fig 11. Clock network problem and solution.
Synopsys Astro handles the single clock source problem using a variation of these topologies through clock tree synthesis. Clock tree synthesis is similar to high fan-out synthesis in that the clock network has many connections from a single point and needs buffers inserted.
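The buffering that high fan-out synthesis and clock tree synthesis perform can be sized with simple arithmetic. As a minimal sketch, assume that a single hypothetical fan-out limit (`max_fanout`) stands in for the library's maximum capacitance and transition rules; the helper below (hypothetical, not an Astro command) computes how many buffers each level of a balanced tree needs. It only sizes the levels; an actual clock tree synthesis tool also places the buffers and balances wire lengths and delays.

```python
def buffer_levels(num_sinks: int, max_fanout: int) -> list[int]:
    """Buffers per level of a balanced fan-out tree, listed from the source outward.

    An empty list means the source can drive all sinks directly.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fan-out limit of 2 or more")
    levels = []
    loads = num_sinks
    # Work backward from the sinks: each level needs ceil(loads / max_fanout) drivers.
    while loads > max_fanout:
        drivers = -(-loads // max_fanout)  # integer ceiling division
        levels.append(drivers)
        loads = drivers
    return list(reversed(levels))


# Source -> 10 buffers -> 100 buffers -> 1000 registers
print(buffer_levels(1000, 10))  # [10, 100]
```

Because every register ends up the same number of buffer levels away from the source, the tree has the same structural symmetry that the H-Tree and X-Tree topologies rely on to keep skew low.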
Clock tree synthesis dynamically inserts clock drivers between the clock source and the registers and physically places those drivers in the design [14]. Its advantage lies in minimizing the skew of the network. As mentioned before, skew is the difference in the arrival times of the clock signal at the many registers in the design. Therefore, clock tree synthesis has the
capacity not only to buffer the high fan-out network and balance the loads at each stage, but also to optimize and minimize the skew within the clock network. The results before and after clock tree synthesis can be seen in Fig 11. Once the clock tree is generated, Astro can calculate the delay from the clock source in the pad area, through the clock network, to the register. This delay is the insertion delay of the clock. Prior to clock tree synthesis, information such as a target skew and/or an insertion delay target can be given to Astro. Some design constraints may specify a minimum insertion delay for the clock. If Astro determines that the insertion delay after clock tree synthesis is still less than the required minimum, the tool can add a delay line of buffers to meet it. So far, an assumption has been made, and reinforced by Fig 11, that the clock comes directly from the I/O pad area and connects directly to the registers of the design. However, many designs employ power-saving techniques such as clock gating, so that the clock can be enabled or disabled for sections of the design. During clock tree synthesis, Astro understands this clock gating logic and still builds a clock tree to all clock pins driven by the output of the gating logic. This allows logic to exist between the clock source and the registers in the design. The result of clock tree synthesis is a balanced clock tree with minimized skew: the output loads at the different tree levels are balanced and the delays to the clock inputs of all registers are matched. Clock tree generation also increases design congestion, because many clock buffers are added, and previously placed cells may have been moved to non-ideal locations when the clock buffers were inserted.
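Skew and insertion delay can both be computed from the clock arrival time at each register. The sketch below uses hypothetical arrival times in integer picoseconds (integers avoid floating-point rounding in the ceiling division). It assumes that the earliest arrival is the value a minimum insertion delay must cover, and it models the delay line as identical buffers added ahead of the tree, so every register is delayed equally and the skew does not change. The function names and numbers are illustrative, not Astro's.

```python
def clock_skew_ps(arrivals_ps: dict[str, int]) -> int:
    """Skew: spread between the latest and earliest clock arrival."""
    return max(arrivals_ps.values()) - min(arrivals_ps.values())


def delay_line_buffers(arrivals_ps: dict[str, int], min_insertion_ps: int,
                       buffer_delay_ps: int) -> int:
    """Buffers to add at the tree root so no register sees less than the minimum delay."""
    shortfall = min_insertion_ps - min(arrivals_ps.values())
    if shortfall <= 0:
        return 0
    return -(-shortfall // buffer_delay_ps)  # integer ceiling division


arrivals = {"ff1": 410, "ff2": 425, "ff3": 418}
print(clock_skew_ps(arrivals))                # 15
print(delay_line_buffers(arrivals, 500, 40))  # 3 (shortfall of 90 ps)
```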
Since these cells have moved from their original locations after placement, the timing information will change and some timing constraints may no longer be met. Therefore, after clock tree generation, some of the optimizations performed during placement may need to be rerun as post-placement optimizations. Now that the design has been placed and optimized and all clock trees have been built, the next step is to route the design.
F. ROUTING
Routing is a fundamental step in the place and route process. The virtual routes used during placement and clock tree synthesis must now become real wires for fabrication. The basic goal of routing is to create metal shapes that meet the requirements of a fabrication process; these metal shapes become the physical connections between the cells in the design. Once the cells are connected by routing, the overall timing of the design needs to be preserved: timing data such as signal transitions and clock network skew, which were used to meet the timing requirements during placement and clock tree synthesis, must be maintained. As with placement, routing is timing-driven. To maintain the overall timing, the routing of timing-critical paths is given the highest priority so that those routes are as short as possible, while non-critical nets are routed around critical areas to leave more wiring area for critical nets. The routing system used by Astro is grid-based, as shown in Fig 12. Metal traces, or routes, are created centered on routing tracks, and they must meet minimum width and spacing requirements to prevent defects during fabrication.
Fig 12. Grid-based routing systems using tracks (Synopsys).
Grid-based systems use the pitch (width + spacing) of each metal layer to determine the minimum center-to-center distance between routes on that layer. The design rule information needed to form this grid is located in the technology file for each metal layer.
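The pitch arithmetic that defines the routing grid can be shown in a few lines. This is a minimal sketch with hypothetical dimensions in integer nanometers (to avoid floating-point rounding); it assumes the first track sits half a pitch from the edge of the span, whereas real track offsets come from the technology and floorplan data. The track count is also the capacity against which routing demand is compared when judging congestion, so a hypothetical `overflow` helper is included.

```python
def track_centers(span_nm: int, width_nm: int, spacing_nm: int) -> list[int]:
    """Center coordinates of routing tracks across a span, one pitch apart.

    Assumes the first track sits half a pitch from the edge (an illustrative
    convention; real offsets come from the technology/floorplan data).
    """
    pitch = width_nm + spacing_nm  # minimum center-to-center distance
    half = pitch // 2
    return list(range(half, span_nm - half + 1, pitch))


def overflow(demand: int, capacity: int) -> int:
    """Wires that cannot get a track in a region; a positive value means congestion."""
    return max(0, demand - capacity)


tracks = track_centers(10_000, 100, 100)  # 10 um span, 0.1 um width and spacing
print(len(tracks), tracks[0], tracks[-1])         # 50 100 9900
print(overflow(demand=56, capacity=len(tracks)))  # 6
```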
There are horizontal and vertical tracks (the grid) corresponding to the preferred routing directions of the metal layers. As shown in Fig 12, the preferred direction is horizontal for metal 1 and vertical for metal 2. A major problem with routing in a grid-based system is potential congestion of the metal routes. Congestion results from there being more wires to route in a region than tracks available. Typically only small areas of the design experience congestion, and the standard cells there can be moved accordingly. If the congestion is severe, more extreme measures may be needed, such as moving macros or re-floorplanning the entire design. Astro performs several operations during the routing process: global routing, track assignment, detail routing, and search and repair. Global routing is the first step. In this step, each net receives a coarse routing plan that determines how it will be routed through the design, that is, which of the channels open for routing it will use [9]. This step outlines how all of the point As will get to their point Bs, and it lays the groundwork for the next step, track assignment. Track assignment assigns each net to specific tracks and creates the physical metal connections. Once all of the nets have corresponding tracks, track assignment will attempt to reduce the number of jogs or bends in the metal by
making the routes straighter. Its other goal is to reduce the number of vias in the design, which helps in the eventual manufacturing of the design. The design rule spacings and widths in the technology file for each metal layer are not checked during track assignment; these rules are checked during the next two steps, detail routing and search and repair. Detail routing tries to fix the design rule violations (minimum width, spacing, etc.) introduced during track assignment. The design is broken into smaller boxes (SBoxes), as shown in Fig 13.
Fig 13. Detail Route SBoxes (Synopsys).
Because the boxes are a fixed size, the detail router may not be able to fix all of the design rule violations. The next step, search and repair, is designed to resolve the remaining violations. Search and repair uses the same SBox concept; however, each pass through the design increases the size of the box to take in a larger portion of the design. Search and repair is the last step in the routing process. Once it has resolved all design rule violations, the design is considered placed and routed and is ready for verification and fabrication.
G. DESIGN FOR MANUFACTURING
Prior to verification and fabrication, the physical design engineer can use Astro to address several manufacturing yield problems. Yield improvement is typically considered a process-domain issue; however, other parts of the design process can also improve yield. One is the testability of the product, through design for test (DFT). Another is the mask preparation phase, through procedures such as reticle enhancement technologies (RET). The domain in which Astro applies is the physical design phase, through techniques described as design for manufacturability (DFM) [10].
Before a designer can actually design for manufacturability, the designer needs to know which features of the design will cause problems during the fabrication process [11]. Focusing on these problems during the design stage, prior to fabrication, helps manufacturing and ultimately the yield of the design. These problems can include gate-oxide integrity, via resistance and reliability, and metal erosion. In terms of via resistance and reliability, single-via connections throughout the design can reduce yield: one missing via on one connection results in an open connection, and the design will not function correctly. Adding extra vias therefore reduces the chance of open connections in the design. Astro can add extra vias on one or all connections (routes) within the design, trading the area consumed by the added vias for increased yield. Metal erosion is a result of attempts to planarize, or flatten, the wafer. This technique, chemical-mechanical polishing (CMP), polishes the deposited dielectric layers during the metal interconnect process to provide a smooth and level surface for the next metal layers being deposited [15]. Due to the material property differences between the metal and the dielectric, erosion can occur during CMP on very wide metal traces [15]. Therefore, the metal density needs to be reduced in those areas. This reduction is referred to as metal slotting and is implemented by creating rectangular openings, or slots, in the wide metal traces during the physical design process [16]. Another problem, damage to the gate oxide of a transistor, is introduced during the metal etching stages of fabrication. Ion etching is used both to remove excess metal as defined by a mask and to remove the mask layer itself [16].
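Why redundant vias raise yield can be seen with a simple probability model. This sketch assumes, purely for illustration, that every via fails independently with the same hypothetical probability and that a connection opens only if all of its vias fail; real defect statistics are more complex, and the numbers are not from the source.

```python
def routing_yield(p_via_fail: float, vias_per_connection: int,
                  num_connections: int) -> float:
    """Probability that no via-based connection is open.

    A connection opens only if every via in its group fails; vias are
    assumed to fail independently with the same probability.
    """
    p_open = p_via_fail ** vias_per_connection
    return (1.0 - p_open) ** num_connections


single = routing_yield(1e-6, 1, 1_000_000)  # one via per connection
double = routing_yield(1e-6, 2, 1_000_000)  # a redundant via on every connection
print(f"{single:.3f} {double:.6f}")  # 0.368 0.999999
```

Under these assumptions, doubling every via turns a likely failure into a near-certain success, at the cost of the extra area the added vias occupy.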
During this etching, charge collects on the metal traces, and if the charge is significant, the transistor can be damaged or even destroyed. This phenomenon is referred to as the antenna effect. The longer the metal trace connected to the transistor gate, the more charge can collect during etching; therefore, the amount of metal connected to a transistor gate needs to be limited. Many chip manufacturers define the acceptable amount of metal by the ratio of the metal area to the transistor gate area. This antenna ratio rule must be observed just like other design rules such as metal spacing and width. Astro can recognize and repair any routes that violate the antenna ratio rule using the standard techniques of layer jumping and diode insertion during the routing phase of physical design.
III. VERIFICATION
Once the physical design process in Astro is complete, the design needs to be verified prior to fabrication for timing, functionality, and manufacturability. This verification is completed outside of Astro, using industry-standard, production-quality (signoff) CAD tools. The Synopsys signoff tool Formality is used to check the functionality of the design. This formal verification compares the original gate-level netlist produced by the logic synthesis tool with the final netlist created by Astro, to ensure functional equivalence at the logical level between the two implementations of the design. The verification of timing constraints is a multiple-step
process which begins with extracting the parasitics in the design. The tool Star-RCXT performs this layout parasitic extraction by calculating the resistances and capacitances of all connections (routes) in the design and writing the results in a format such as SPEF that can be read by a static timing analysis tool. The static timing analysis tool PrimeTime detects timing violations in the design by combining the results from Star-RCXT with the netlist from Astro and checking that information against the clock frequencies implemented. Once the design has been verified for timing and functionality, it needs to be physically verified. This verification checks whether the design can be fabricated, and that the final design has no physical defects that would prevent it from functioning properly. The Synopsys tool Hercules can be used to perform these checks, which are referred to as Design Rule Checking (DRC), Electrical Rules Checking (ERC), and Layout Versus Schematic (LVS). Design Rule Checks verify that the design does not violate any fabrication rules associated with the target process technology, such as metal spacing/widths and the previously mentioned antenna ratios. Electrical Rules Checks verify that there are no short circuits or open circuits involving power and ground in the design, and no resistors, capacitors, or transistors with floating nodes. The Layout Versus Schematic check verifies that the final physical design matches the logical version of the design in terms of correct connectivity and the number of electrical devices, such as resistors, capacitors, and transistors. After successful completion of physical verification as well as timing and functional verification, the design is complete (signed off), and the process follows the flow outlined in Fig 14, beginning with taping out the design into a format such as GDSII and resulting in fabrication of the design.
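The antenna ratio rule checked during routing and again in DRC reduces to a ratio comparison per net. This is a minimal sketch, assuming a single per-layer check with a hypothetical limit of 400; real foundry rules differ in whether ratios are cumulative across layers, how diodes are credited, and what the limits are. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class GateConnection:
    """Hypothetical per-net data for one metal layer of an antenna check."""
    net: str
    metal_area_um2: float  # metal on this layer connected to the gate during etch
    gate_area_um2: float   # total transistor gate area the net connects to


def antenna_violations(conns: list[GateConnection], max_ratio: float) -> list[str]:
    """Nets whose metal-to-gate area ratio exceeds the allowed antenna ratio."""
    return [c.net for c in conns
            if c.metal_area_um2 / c.gate_area_um2 > max_ratio]


conns = [GateConnection("n1", 120.0, 0.5),  # ratio 240
         GateConnection("n2", 300.0, 0.5)]  # ratio 600
print(antenna_violations(conns, max_ratio=400.0))  # ['n2']
```

In these terms, layer jumping shortens the metal connected to the gate on the offending layer (lowering `metal_area_um2`), while diode insertion gives the collected charge a discharge path, which rule decks typically account for separately.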
The manufactured design can then be integrated into the system architecture for which it was designed.
Fig 14. Steps after the design has been timing, functionally, and physically verified [3].
IV. FUTURE
The technology and manufacturing industries continue to push the envelope with each new process node. The requirements that chip designs achieve the fastest speeds, lowest cost, and lowest power, all within the smallest possible area, must be confronted during the physical design process. The challenges encountered during physical design at the 130nm and 90nm nodes are dramatic because problems such as voltage drop, crosstalk and signal integrity, and reliability issues including electromigration are more prevalent at these nodes. However, the tools described in this paper can still be used at these process nodes by taking advantage of advanced features and practices to overcome the daunting obstacles at 130nm and 90nm. Since the industry, including EDA vendors, has been at these process nodes for some time, the manufacturing and design processes have matured to the point where many companies are releasing production designs at 90nm, with some implementing 65nm. The next process node is 45nm, and the challenges at this node are similar to those of the previous nodes, only more severe. One concern is that the leakage power of transistors could reach the level of the dynamic power of the design [6]. Another is that wiring delays will outweigh gate delays, which has already been seen at the 130nm node. With respect to wiring, the cross-coupling capacitance will begin to dominate over the capacitance of the wire itself. So at the 45nm node, the process and design complexity will require greater advancements in the capabilities of EDA software. These process node problems