Improved overlay control using robust outlier removal methods

John C. Robinson 1, Osamu Fujita 2, Hiroyuki Kurita 2, Pavel Izikson 3, Dana Klein 3, and Inna Tarshish-Shapir 3

1 KLA-Tencor Corporation, One Technology Drive, Milpitas, CA 95035, USA
2 KLA-Tencor Corporation Japan, 134 Godo-cho, Hodogaya-ku, Yokohama, Kanagawa, Japan
3 KLA-Tencor Corporation Israel, Haticshoret St., P.O. Box 143, Migdal Haemek 23100, Israel

ABSTRACT

Overlay control is one of the most critical areas in advanced semiconductor processing. Maintaining optimal product disposition and control requires high quality data as an input. Outliers can contaminate lot statistics and negatively impact lot disposition and feedback control. Advanced outlier removal methods have been developed to minimize their impact on overlay data processing. Rejection methods in use today are generally based on metrology quality metrics, raw data statistics, and/or residual data statistics. Shortcomings of typical methods include the inability to detect multiple outliers as well as the unnecessary rejection of valid data. As the semiconductor industry adopts high-order overlay modeling techniques, outlier rejection becomes more important than it was for linear modeling. In this paper we discuss the use of robust regression methods in order to more accurately eliminate outliers. We show the results of an extensive simulation study, as well as a case study with data from a semiconductor manufacturer.

Keywords: Overlay, metrology, outliers, robust regression.

1. INTRODUCTION

Overlay control in advanced semiconductor processing continues to be one of the key challenges as the industry progresses to more advanced processing and smaller dimensions of integrated circuitry. Overlay metrology measures the relative positioning of one pattern layer to another.
The results are modeled and used for advanced process control, equipment control, product disposition, equipment qualification, continuous improvement, troubleshooting, and the like. The integrity of these results is of utmost importance to support the manufacture of integrated circuits. A significant amount of effort is placed in developing proper metrology targets and advanced metrology systems such as the Archer family of metrology tools. Additionally, advanced automated data analysis packages such as KT Analyzer have been developed to model overlay results. Part of the analysis task is to ensure the integrity of the data that is used in modeling. In statistics, an outlying observation (called an outlier or flier) is one that appears to deviate markedly from the other members of the sample in which it occurs [1]. Causes of outliers are varied. For example, a given data set may contain multiple statistical sub-populations, some of which may deviate significantly from what is expected. There may also be a measurement error, such as a heavily damaged metrology target or one of poor contrast. Data distributions may have heavy tails, not falling off as rapidly as a Gaussian distribution, resulting in significant amounts of data that deviate greatly. Overlay data in semiconductor processing is inherently systematic, so methods intended for normally distributed random data can be problematic.

Metrology, Inspection, and Process Control for Microlithography XXV, edited by Christopher J. Raymond, Proc. of SPIE Vol. 7971, 79711G © 2011 SPIE

In reality there exists no rigid mathematical definition of what constitutes an outlier, so identifying outliers is ultimately a subjective exercise. Our goal here is to minimize the adverse impact of outliers in automated overlay analysis in a high volume manufacturing (HVM) environment without missing legitimate excursions. While identifying and minimizing the impact of outliers has been a reality of day-to-day semiconductor fabrication since the beginning, there is currently a paradigm shift towards high-order overlay modeling that brings more importance to this topic. First, high-order modeling requires significant sampling near the edges of the wafers [2, 3], where processing often has a more adverse impact on metrology mark integrity. Second, the impact of outliers on modeled parameters is far greater for high-order models than it is for linear models. For these two reasons, there is a renewed interest in the identification and minimization of the impact of outliers.

Figure 1. Typical overlay data processing: quality metrics from the tool can identify outliers, raw data statistical methods can be used to identify outliers, and finally ordinary least squares (OLS) regression is used to determine coefficients for control and disposition, during or after which outliers can potentially be identified.

Figure 1 shows a typical overlay data processing sequence. The X and Y overlay errors, and their locations, from the metrology tool comprise the raw data. The data is typically modeled by ordinary least squares (OLS) regression to produce the modeled terms used in control and disposition tasks. The residual data is one of the ways to determine the quality of the model fit, and can provide insights into un-modeled systematics and the like. The first post-metrology opportunity for outlier identification is the use of quality metrics.
The metrology tool algorithms that provide the overlay metrology also provide metrics indicating the integrity of the target, such as correlation scores, noise metrics, asymmetry metrics, goodness-of-fit, and the like. These quality metrics can be used to help identify outliers, such as in the case of a poor quality metrology mark. The next post-metrology opportunity to identify and eliminate outliers, commonly used in the industry today, is based on raw data statistical methods. These methods typically involve establishing thresholds at the tails of the distribution, outside of which data is eliminated, as shown in Figure 2. The simplest case is fixed thresholds, which can be based on knowledge of the limits of a particular type of metrology structure. In some cases, however, the threshold is arbitrary, and this method does not adjust dynamically to the data. Another typical method is to exclude data that lie outside plus or minus some multiple of the standard deviation (σ) from the mean m of the data set. For example, it is typical to remove all data that are < m − 3σ or > m + 3σ. This method has the advantage of adjusting dynamically to the data at hand; however, the mean and σ are themselves highly influenced by outliers and thus are not robust statistics with which to identify outliers. A better approach is to exclude data that lie outside plus or minus a multiple of the inter-quartile range (IQR) from the median; the IQR and median are more robust to outliers. The difficulty with using raw data statistics for outlier removal is that overlay data is highly systematic and correlated, with heavy tails, and is generally non-normal.
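As an illustrative sketch of these two raw-data thresholding rules (ours, not the implementation in any KLA-Tencor product; the function names are hypothetical), the following compares a mean +/- 3σ filter with a median +/- 1.5·IQR filter. Note that in small samples the mean/σ rule suffers badly from masking: in a sample of n points, no single point can have a z-score above (n − 1)/√n, which is below 3 for n ≤ 10, so a lone outlier can never be rejected at 3σ.

```python
import numpy as np

def sigma_filter(data, k=3.0):
    # Keep points within mean +/- k*sigma (hypothetical helper).
    # The mean and sigma are themselves inflated by the outliers.
    m, s = data.mean(), data.std()
    return np.abs(data - m) <= k * s          # True = keep

def iqr_filter(data, k=1.5):
    # Keep points within median +/- k*IQR (hypothetical helper).
    # The median and IQR are largely unaffected by the outliers.
    q1, q3 = np.percentile(data, [25, 75])
    return np.abs(data - np.median(data)) <= k * (q3 - q1)

# Nine well-behaved values plus one gross outlier.
data = np.array([-0.30, -0.22, -0.15, -0.08, 0.0,
                 0.07, 0.14, 0.21, 0.29, 50.0])

keep_sigma = sigma_filter(data)   # outlier survives: it inflates sigma
keep_iqr = iqr_filter(data)       # outlier correctly rejected
```

The 1.5·IQR fences are Tukey's rule; the σ-based rule is kept here only to exhibit the masking failure described above.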

Figure 2. Data distribution illustration. Thresholds are typically established, outside of which data are excluded as outliers. Determination of these thresholds can be based on fixed values, or can adjust dynamically based on statistics such as the mean and standard deviation or the median and inter-quartile range.

The next opportunity to eliminate outliers is after (or during) the OLS regression used to calculate modeled terms (the so-called correctables). If the overlay model, which typically includes translation, rotation, scale, and potentially higher order terms, successfully accounts for the systematic components of the data, then the residual values should be random and hence more appropriately handled by the statistical methods described in the previous paragraph: fixed thresholds, or dynamic thresholds based on the mean +/- a multiple of the standard deviation or, better yet, the median +/- a multiple of the IQR. In many cases outlier identification is done iteratively, and the OLS is repeated until the proper conditions are met.

Figure 3. Illustration of the impact of outliers on regression modeling. (a) For OLS regression, data is weighted quadratically. Outliers can significantly impact the model, especially when multiple outliers are present. (b) Robust regression methods can be implemented such that there is much less sensitivity to potential outliers.

One difficulty with using the OLS regression that calculates the correctables to also identify outliers is that the data is weighted quadratically. Data that deviate far from the correct values can therefore have a significant impact on the very model being used to identify the outliers. This is especially true when multiple outliers are present. Robust regression methods [4, 5, 6], however, can be much less sensitive to outliers and their impact on the model. There are many robust methods in the statistical literature, some more suitable for the overlay use-cases than others.
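To make the contrast of Figure 3 concrete, the following sketch (our illustration on a simple one-dimensional linear fit, not the algorithm used in KT Analyzer) places OLS alongside one standard robust scheme, iteratively reweighted least squares (IRLS) with Huber weights [6]:

```python
import numpy as np

def ols_fit(X, y):
    # Ordinary least squares: the squared-error loss lets gross
    # outliers dominate the fitted coefficients.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def huber_irls(X, y, c=1.345, iters=30):
    # Iteratively reweighted least squares with Huber weights:
    # residuals beyond c * (robust scale) are down-weighted, limiting
    # the influence of any single point on the fit.
    beta = ols_fit(X, y)
    for _ in range(iters):
        r = y - X @ beta
        # Robust scale from the median absolute deviation (MAD).
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        w = np.minimum(1.0, c * scale / np.maximum(np.abs(r), 1e-12))
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

# A line y = 1 + 2x with small noise and one gross outlier at the edge.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.05, x.size)
y[-1] += 5.0                          # inject the outlier

slope_ols = ols_fit(X, y)[1]          # pulled far from the true slope of 2
slope_robust = huber_irls(X, y)[1]    # remains close to 2
```

A redescending estimator (e.g., Tukey's biweight) or a least-trimmed-squares fit [4] would suppress gross outliers even more aggressively; the choice among such estimators is part of what makes some robust methods more suitable for the overlay use-cases than others.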
In our investigations we have determined that some robust regression methods are superior to OLS regression for the identification of outliers; however, the modeled parameters are less suited for use as overlay correctables than those of OLS regression. To make use of the best of both methods, we've investigated and implemented a data flow as shown in Figure 4. Data is first modeled using robust regression methods to identify outliers, and then a subsequent OLS regression is performed to model the parameters, the so-called correctables and other derived quantities, which are used for process control, equipment qualification, product disposition, and the like. In this paper we will discuss two investigations related to outlier identification, comparing the methods described above. First, we will describe a numerical simulation where the true overlay values are known and the outliers are injected for the purposes of illustration. Second, we describe a case study involving data from high volume semiconductor manufacturing.

Figure 4. Data flow illustration including robust regression for outlier identification, and OLS regression for calculation of modeled parameters (overlay correctables).

2. OVERLAY SIMULATION INVESTIGATION

This section describes a numerical simulation investigation to compare and contrast various outlier removal methods. Two overlay model types were investigated: a standard linear model and a high order grid correction (GCM) model. The linear model contains wafer offset, scaling, and rotation terms as well as field magnification and rotation terms in X and Y. The GCM model contains terms up through 3rd order. Appropriate sample plans were chosen for each model, and 10,000 different numerical simulations were performed for each method under study. Between 2 and 6 outliers per wafer, of random magnitude (within a specified range) and random location, were included in the simulation. Three different metrics were investigated for each case: the number of outliers identified, the difference between the true and estimated maximum predicted/modeled overlay, and the difference between the true and predicted residuals. Two outlier removal methods are presented here: elimination of all points outside +/- 3σ from the mean based on OLS regression, and robust regression. Figure 5 shows the percentage of simulation cases in which outliers were identified. Clearly the robust regression is more effective at identifying outliers.
The advantage improves as the number of outliers increases, which is to be expected based on the discussion above. In a small minority of cases the algorithm over-filters, thereby removing some valid data from the data set. Assuming the sampling plan is prudently chosen, the impact of over-filtering should be minimal.
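A stripped-down, single-wafer version of such a simulation can be sketched as follows (our illustration only: the paper's actual sample plans, model terms, magnitudes, and 10,000-trial protocol are not reproduced here, and the Huber-weighted IRLS shown is just one representative robust estimator). It injects outliers into data generated from a known linear model, flags points from the robust-fit residuals, and then computes the correctables from an OLS fit of the surviving data, following the flow of Figure 4:

```python
import numpy as np

def huber_irls(X, y, c=1.345, iters=30):
    # Huber-weighted IRLS (one representative robust regression scheme).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        w = np.minimum(1.0, c * scale / np.maximum(np.abs(r), 1e-12))
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(1)

# Hypothetical sample plan: a 5 x 5 grid of site positions (mm).
xv, yv = np.meshgrid(np.linspace(-80, 80, 5), np.linspace(-80, 80, 5))
xs, ys = xv.ravel(), yv.ravel()

# Known linear wafer model for the X overlay error (nm):
# dx = translation + scale * x - rotation * y, plus measurement noise.
X = np.column_stack([np.ones_like(xs), xs, ys])
true_beta = np.array([2.0, 0.01, -0.005])
dx = X @ true_beta + rng.normal(0.0, 0.3, xs.size)

# Inject 3 outliers of known location and magnitude.
out_idx = rng.choice(xs.size, 3, replace=False)
dx[out_idx] += 6.0

# Stage 1: robust fit; flag points beyond 3 robust sigma.
r = dx - X @ huber_irls(X, dx)
robust_sigma = np.median(np.abs(r - np.median(r))) / 0.6745
flagged = np.abs(r) > 3.0 * robust_sigma

# Stage 2: OLS correctables from the surviving points.
beta_hat = np.linalg.lstsq(X[~flagged], dx[~flagged], rcond=None)[0]
```

Because the injected outliers and the true coefficients are known, detection rate and coefficient accuracy can be tallied over many such trials, which is the structure of the metrics reported in Figures 5 and 6.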

Figure 5. Numerical simulation on the impact of outliers: the horizontal axis is the number of outliers present, and the vertical axis is the percentage of the 10,000 simulations for each case. (a) Linear model with OLS regression. (b) Linear model with robust regression. (c) GCM model with OLS regression. (d) GCM model with robust regression. In all cases robust regression did a better job of identifying outliers, and the improvement increases with the number of outliers present.

In Figure 6 we compare the ideal results with no outliers (the "true" results) to the results obtained with outliers present and subsequent outlier removal (the "estimated" results). For both the linear and the GCM model, we subtract the estimated from the true results for 10,000 simulations in each case and calculate the standard deviation. It is clear from panels (a) and (c) that the robust method is more accurate than the OLS regression method. Additionally, we use the OLS model to calculate the maximum predicted overlay from the coefficients (including a confidence factor based on the residuals) in each case, subtracting the estimated results from the true results for 10,000 simulations of each case and calculating the standard deviation. Again, it is clear from panels (b) and (d) that the robust outlier removal method provides more accurate results than the OLS regression outlier removal method.

Figure 6. Numerical simulation on the impact of outliers: the horizontal axis is the number of outliers present, and the vertical axis is 3σ (in nm) of the estimated metric minus the true metric for the 10,000 simulations for each case. (a) Linear model residuals. (b) Linear model maximum predicted overlay. (c) GCM model residuals. (d) GCM model maximum predicted overlay. In all cases the robust regression method is more accurate, and the accuracy advantage increases with the number of outliers present.

The conclusions from the simulation portion of this study are as follows. The robust regression outlier removal method is more successful than the OLS regression method at identifying outliers. The improvement goes up with the number of outliers present in the data set. This conclusion is valid for both the linear and the high order model. For all metrics studied, the residuals and the maximum predicted overlay, the robust regression method produced more accurate results.

3. CASE STUDY

In this section we describe a specific case study on overlay data in a high volume manufacturing scenario. The data studied here involve extremely grainy aluminum back-end metal overlay metrology marks, as can be seen in Figure 7. This study includes 42 different lots, each measured at 4 different back-end layers over a period of a few weeks. The wafers were measured at after-develop inspection (ADI), that is, after the litho process step and prior to any subsequent processing. For each lot, 1 wafer was measured, with 9 fields per wafer and 4 sites per field. The metrology tool is an Archer 100 from KLA-Tencor Corp., used in a single-grab acquisition mode. The metrology marks are Box-in-Box (BiB) with a 20μm x 20μm outer box and a 7μm x 7μm inner box. It should be noted that there are more robust target designs available; however, for the purpose of this study optimal target design was not addressed.

Figure 7. Example of a Box-in-Box overlay target in a back-end process. The outer box corresponds to the previous layer, and the inner box is photoresist of the current layer. In this case the metrology mark is grainy, and in some cases the inner box is of relatively poor contrast.

In the case study we present a comparison of 3 different outlier removal methods, as well as the case of no outlier removal for comparison. The first method excludes all residual data outside +/- 3σ from the mean of the OLS regression. Using 3 times the standard deviation is a commonly used factor in the industry; however, other numerical factors could also be considered. The second method excludes OLS residual data that deviates from the median by more than 1.5 times the IQR. In this case the factor of 1.5 was chosen based on Tukey's rule. Again, other numerical factors could be studied; however, we did not explore this parameter, as our purpose here is purely illustrative. The final method excludes residual data that exceeds +/- 3σ from the mean using robust regression. Of course, in all cases the modeled overlay parameters are based on an OLS regression, as described in the introduction.

Figure 8. Back-end case study results for various outlier rejection methods. The single maximum and minimum overlay measurement for each of the 42 lots across 4 process layers is plotted post outlier rejection. (a) No outlier removal. (b) Residual 3σ OLS regression method. (c) Residual 1.5-factor IQR OLS regression method. (d) Robust regression 3σ method.

In Figure 8 we plot the single maximum and minimum overlay measurement within each lot for each of the 4 layers measured and the 42 lots involved. The original complete data set is plotted in panel (a). Panel (b) shows the residual 3σ OLS regression method. It is clear from the data that this method did a poor job of identifying and removing many of the outliers. This is not surprising because, as discussed in the introduction, the OLS regression method is very susceptible to the very outliers that we are attempting to identify. Panel (c) shows the residual 1.5-factor IQR OLS regression method. In this case it is clear that too much data is eliminated, as there is very little data remaining between the minimum and maximum. Finally, panel (d) shows the robust regression 3σ method. In this case we can see that the outliers are removed and yet the breadth of the remaining data remains intact.

Figure 9. Back-end case study results for various outlier rejection methods. (a) Normalized maximum predicted overlay by layer for 4 cases: no flier/outlier removal, the residual 3σ OLS regression method, the residual 1.5-factor IQR OLS regression method, and the robust regression 3σ method. (b) The number of removed points by layer and method: the residual 3σ OLS regression method, the residual 1.5-factor IQR OLS regression method, and the robust regression 3σ method.

In Figure 9a we compare the maximum predicted overlay for the 4 cases discussed (original data and the 3 removal methods). Maximum predicted overlay uses the OLS regression model to predict the maximum overlay on the wafer, including a confidence factor based on the residuals. For layers 2 and 3 the difference between the methods is minimal; however, for layers 1 and 4 there are significant differences. Operations where the maximum predicted overlay is used for disposition would likely see a significant reduction in unnecessary rework.
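The maximum predicted overlay metric is described above only qualitatively; a minimal sketch consistent with that description (our assumption, since the actual KT Analyzer computation is not given in this paper) is to evaluate the fitted model over a dense wafer grid and widen the largest magnitude by a k·σ confidence factor derived from the residuals:

```python
import numpy as np

def max_predicted_overlay(beta, residuals, grid_X, k=3.0):
    # Hypothetical sketch: largest |modeled overlay| over the wafer
    # grid, widened by a k*sigma confidence factor from the residuals.
    pred = grid_X @ beta
    return np.abs(pred).max() + k * residuals.std()

# Toy check: the model dx = 0.5 + 2x on x in [-1, 1] peaks at |dx| = 2.5.
x = np.linspace(-1.0, 1.0, 101)
grid_X = np.column_stack([np.ones_like(x), x])
beta = np.array([0.5, 2.0])
residuals = np.array([0.1, -0.1, 0.1, -0.1])   # std = 0.1
mpo = max_predicted_overlay(beta, residuals, grid_X)   # 2.5 + 3*0.1 = 2.8
```

Under any such definition, residual statistics feed directly into the disposition metric, which is why inflated residuals from unremoved outliers translate into unnecessary rework.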
Figure 9b shows the number of points removed by the 3 methods (the non-removal case is not shown, since no data was removed). Clearly the IQR method, based on Tukey's rule, excludes far too much data. A less restrictive factor could be chosen; however, deciding on the exact factor that would remain appropriate over time and across processes may be problematic.

In Figure 10 we show the maximum overlay data value within each lot (black dots), resulting in a non-symmetric distribution about zero. Box plots for the 4 cases studied are included, showing median and quartile values in red. In addition, the median and 95% confidence interval values are indicated in green. We can see that the 3σ OLS method is not capable of removing the long tail of outliers. Additionally, it is clear that the IQR method results in a significant decline in the mean of the data, which is problematic in that this is a statistic commonly used in product disposition.

Figure 10. Back-end case study maximum raw overlay values for each of nearly 200 lots for each case. (a) Full view to show outliers. (b) Expanded view to show the impact on statistics. Red box plots show median and quartile values. Green plots show median and 95% confidence levels. Four cases studied: no flier/outlier removal, the residual 3σ OLS regression method, the residual 1.5-factor IQR OLS regression method, and the robust regression 3σ method.

Finally, in Figure 11 we look at one particular lot at one particular layer and the impact of outliers. In this case there was a preliminary tool calibration step performed prior to the measurements, called tool induced shift (TIS) calibration. In this particular case there were outliers due to excessively grainy and low contrast overlay marks during the TIS calibration pre-step. The result is the propagation of the error to multiple measurement sites, causing multiple outliers in the data. The robust algorithm is well suited to identify and eliminate multiple outliers, unlike the OLS removal method. The difference in the residual statistics is significant. Another approach to outliers of the TIS pre-calibration step would be to perform culling based on TIS values prior to overlay data analysis, similar to the quality metric methods described above.

Figure 11. Back-end case study individual example lot. Tool induced shift (TIS) fliers/outliers caused multiple outliers in the data. The field map shows removed outlier vectors (red) and valid data vectors (green). Residual statistics are shown for the residual 3σ OLS regression method and the robust regression 3σ method. The outliers are accurately removed by the robust method.

4. SUMMARY

In this paper we discussed common outlier removal methods used today in overlay analysis in the semiconductor industry. We showed both a simulation and a case study demonstrating the accuracy of robust regression methods for outlier removal, resulting in more accurate subsequent overlay modeling, which is critical for process control, equipment qualification, product disposition, continuous improvement, and the like. As the industry transitions to high-order overlay modeling, outlier identification becomes even more critical than in the linear era. First, more sampling needs to be done near the edges of the wafer, where processing effects are more likely to negatively impact overlay metrology targets. Second, fitting high order functions is more sensitive to outliers than the linear case. In order to support best known methods for high volume manufacturing of semiconductors, KLA-Tencor Corp. has implemented these as well as other methods in its KT Analyzer product.

5. REFERENCES

[1] Chatterjee, Samprit; Ali S. Hadi (1986). Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science, Vol. 1, No. 3.
[2] Koay, Chiew-seng; Matthew E. Colburn; Pavel Izikson; John C. Robinson; Cindy Kato; Hiroyuki Kurita; Venkat Nagaswami. Automated optimized overlay sampling for high-order processing in double patterning lithography. Metrology, Inspection, and Process Control for Microlithography XXIV, SPIE Volume 7638 (2010).
[3] Kato, Cindy; Hiroyuki Kurita; Pavel Izikson; John C. Robinson. Sampling for advanced overlay process control. Metrology, Inspection, and Process Control for Microlithography XXIII, SPIE Volume 7272 (2009).
[4] Peter J. Rousseeuw; Annick M. Leroy (1987). Robust Regression and Outlier Detection. John Wiley & Sons, Inc.
[5] Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
[6] Huber, P. (1981). Robust Statistics. Wiley.

6. ACKNOWLEDGEMENTS

The authors would like to thank Canon, Inc., of Japan, for the use of their overlay data for the case study example.