Session 1 (9:15-10AM)
A Non-linear Mixed Model Approach for Detecting Outlying Profiles
Valeria Quevedo, Universidad de Piura and Geoff Vining, Virginia Tech
In parametric non-linear profile modeling, mapping the impact of the model parameters to a single metric is essential. The profile monitoring literature suggests using multivariate T^2 statistics to monitor the stability of the parameters simultaneously. These approaches focus on the estimated parameters of the non-linear model and treat them as separate but correlated quality characteristics of the process. As a result, they do not take full advantage of the model structure. We propose a procedure based on a non-linear mixed model that takes into account the proper variance-covariance structure. This procedure recognizes and uses the correlations among the estimates of the fixed parameters created by the model. The proposed method is based on the concept of the externally studentized residual: we test whether a given profile is sufficiently different from the other profiles under a non-linear mixed model. Our simulation results show that the proposed method performs better than the classical control chart based on the T^2 statistic. We also apply our approach to an aquaculture process on a shrimp farm that monitors shrimp weight in over 300 ponds each year.
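The studentization idea at the heart of the proposed method can be illustrated with a scalar sketch (hypothetical pond weights, not the authors' mixed-model procedure): each profile summary is compared against the mean and spread of all the other profiles, so an outlying profile cannot inflate its own reference spread.

```python
import statistics

def externally_studentized(values):
    """Externally studentized residuals: each value is compared with the
    mean and SD of the *other* values (leave-one-out)."""
    n = len(values)
    out = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        m, s = statistics.mean(rest), statistics.stdev(rest)
        # Var(x_i - mean(rest)) = sigma^2 * (1 + 1/(n-1))
        out.append((x - m) / (s * (1 + 1 / (n - 1)) ** 0.5))
    return out

# hypothetical average shrimp weights (g) for six ponds
weights = [21.3, 20.8, 21.1, 21.5, 20.9, 27.0]
r = externally_studentized(weights)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 3]
```

The last pond stands far outside the spread of the others and is the only one flagged.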
A Staggered-Level Design Case Study: Staple Fiber Cutting Experiment
Peter Goos, KU Leuven and Katherine Brickey, Eastman Chemical Company
Real-world experimental designs often require constraints on randomization. Time, cost, or equipment limitations prevent some factors from being independently reset at frequent intervals. Staggered-level designs provide a flexible structure for experiments with more than one hard-to-change factor by allowing the hard-to-change factor levels to be reset at different points in time throughout the design. In this case study, a staggered-level designed experiment was completed involving five experimental factors. Two hard-to-change factors were reset at staggered intervals during the experiment. This presentation will describe the experimental planning and the analysis of the results of the first documented real-world staggered-level experiment. It will also discuss the advantages of staggered-level designs compared to traditional split and split-split plot designs.
Statistical Tools for Survey Data: A Case Study of a First Responders Survey
Adam Pintar, Kerrianne Buchanan, and Yee-Ying Choong, National Institute of Standards and Technology
This presentation examines a survey conducted by researchers at the National Institute of Standards and Technology (NIST). The survey is of first responders, e.g., firefighters, EMS, police, and 9-1-1 dispatch, and its purpose was to assess first responders’ experiences with communication technology, e.g., radios. From the perspective of a statistical analysis, there are some non-standard and interesting aspects.
First, one of the demographic questions requested the respondent’s title. The question required an open text response. One research topic of interest is if respondents with chief/managerial roles think differently about technology than those on the frontlines. To consider that topic, it was necessary to code the open text responses into categories, which would typically be done manually by the researchers for each of the more than 7,000 respondents. It was possible to save hours of researcher time by applying “off-the-shelf” tools in natural language processing (word embeddings) and machine learning (random forests). A similar process could be useful for categorizing user or product experience feedback.
A second interesting aspect is a type of data that seems uncommon, traditionally, in engineering and physical science settings but could be very applicable to new product development. A respondent is presented with a list of K choices and is asked to rank their top l selections. Given the rankings of n respondents, the goal is to estimate the group preference for each choice. A few probability models are applicable to the problem. The Plackett-Luce model, used in this work, will be reviewed, and its ability to highlight group differences in technology preferences demonstrated.
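As a sketch of the model under discussion, the Plackett-Luce probability of a top-l ranking can be computed directly from item "worth" parameters; the item names and worth values below are hypothetical.

```python
def plackett_luce_prob(ranking, worth):
    """Plackett-Luce probability of a (possibly partial, top-l) ranking:
    at each stage the next-ranked item is drawn from the remaining items
    with probability proportional to its worth parameter."""
    remaining = set(worth)
    p = 1.0
    for item in ranking:
        p *= worth[item] / sum(worth[k] for k in remaining)
        remaining.remove(item)
    return p

# hypothetical worth parameters for K = 4 technologies
worth = {"radio": 4.0, "phone": 2.0, "pager": 1.0, "tablet": 1.0}
p = plackett_luce_prob(["radio", "phone"], worth)  # a top-2 ranking
```

Here p = (4/8) x (2/4) = 0.25; fitting the worths to observed rankings (by maximum likelihood or Bayesian methods) yields the group preferences.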
The presentation will conclude with some key takeaways for the NIST researchers. For example, those with chief/managerial roles do indeed seem to think differently about technology than those on the frontlines, with the price tag of that technology receiving greater scrutiny.
Session 2 (10:30AM-12PM)
Additive Manufacturing: A Case Study for Characterizing Variability
Lauren Wilson, Manuel Martinez, Leslie Moore, and Joshua Yee, Sandia National Laboratories
Additive manufacturing (AM) provides production advantages with respect to cost, weight, and design complexity. However, the relative newness of AM compared to conventional methods and its known variability carry non-trivial risk. Building metal AM components with dimensional requirements is of interest for a high-risk application. A statistical experimental design and analysis evaluates the effects of various factors on component acceptance criteria, demonstrating a methodology that provides quantitative evidence in support of an alternative manufacturing process. The analysis includes common data visualization techniques novel to AM process exploration, quantification of margins and uncertainties (QMU) to ensure product quality relative to dimensional limits, and estimation of variance components (within- and between-batch) to inform production sampling and future AM studies.
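The within- and between-batch variance components mentioned above can be estimated with the standard balanced one-way ANOVA (method-of-moments) formulas; the batch data below are hypothetical.

```python
import statistics

def variance_components(batches):
    """Balanced one-way random-effects ANOVA (method of moments):
    within-batch variance = MSW; between-batch variance
    = (MSB - MSW) / n, truncated at zero, where n is the batch size."""
    a, n = len(batches), len(batches[0])
    grand = statistics.mean(x for b in batches for x in b)
    means = [statistics.mean(b) for b in batches]
    msb = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((x - m) ** 2
              for b, m in zip(batches, means) for x in b) / (a * (n - 1))
    return msw, max((msb - msw) / n, 0.0)

# hypothetical dimensional measurements from three AM build batches
batches = [[10.1, 10.3, 10.2], [10.6, 10.8, 10.7], [10.0, 10.2, 10.1]]
within, between = variance_components(batches)
```

A between-batch component much larger than the within-batch component, as here, suggests sampling plans should spread measurements across builds rather than within one build.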
A Closed-Loop Machine Learning and Compensation Framework for Geometric Accuracy Control of 3D Printed Products
Wenbin Zhu and Arman Sabbaghi, Purdue University
Additive manufacturing (AM) systems enable direct printing of three-dimensional (3D) physical products from computer-aided design (CAD) models. Despite the many advantages that AM systems have over traditional manufacturing, one of their significant limitations that impedes their wide adoption is geometric inaccuracies, or shape deviations between the printed product and the nominal CAD model. Machine learning for shape deviations can enable geometric accuracy control of 3D printed products via the generation of compensation plans, which are modifications of CAD models informed by the machine learning algorithm that reduce deviations in expectation. However, existing machine learning and compensation frameworks cannot accommodate deviations of fully 3D shapes with different geometries. The feasibility of existing frameworks for geometric accuracy control is further limited by resource constraints in AM systems that prevent the printing of multiple copies of new shapes. We present a closed-loop machine learning and compensation framework that can improve geometric accuracy control of 3D shapes in AM systems. Our framework is based on a Bayesian extreme learning machine (BELM) architecture that leverages data and deviation models from previously printed products to transfer deviation models, and more accurately capture deviation patterns, for new 3D products. The closed-loop nature of compensation under our framework, in which past compensated products that do not adequately meet dimensional specifications are fed into the BELMs to re-learn the deviation model, enables the identification of effective compensation plans and satisfies resource constraints by printing only one new shape at a time. The power and cost-effectiveness of our framework are demonstrated with two validation experiments that involve different geometries for a Markforged Metal X AM machine printing 17-4 PH stainless steel products. 
As demonstrated in our case studies, our framework can reduce shape inaccuracies by 30% to 60% (depending on a shape’s geometric complexity) in at most two iterations, with three training shapes and one or two test shapes for a specific geometry involved across the iterations. Ultimately, our closed-loop machine learning and compensation framework provides an important step towards accurate and cost-efficient deviation modeling and compensation for fully 3D printed products using a minimal number of printed training and test shapes, and thereby can advance AM as a high-quality manufacturing paradigm.
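The closed-loop idea, learn a deviation model from printed parts and adjust the CAD input to cancel the expected deviation, can be sketched with a scalar stand-in for the BELM deviation model; the 4% over-sizing "printer" below is hypothetical.

```python
def closed_loop_compensation(nominal, print_fn, iters=2):
    """Closed-loop sketch: print the current CAD input, estimate a
    multiplicative deviation from printed vs nominal size, and rescale
    the CAD input to cancel the expected deviation. A one-dimensional
    scalar stand-in for the BELM deviation model."""
    cad = nominal
    errors = []
    for _ in range(iters):
        printed = print_fn(cad)
        errors.append(abs(printed - nominal))
        ratio = printed / cad      # learned deviation "model"
        cad = nominal / ratio      # compensation plan for the next print
    return cad, errors

printed_size = lambda x: 1.04 * x  # hypothetical 4% systematic oversizing
cad, errors = closed_loop_compensation(10.0, printed_size, iters=3)
```

With a purely systematic deviation the error vanishes after one compensation step; the paper's framework handles full 3D shapes and noisy, geometry-dependent deviations.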
Optimal Designs with Axial Values
Cameron Wilden and Willis Jensen, W.L. Gore & Associates, Inc.
We introduce a modification to the coordinate-exchange algorithm for generating optimal experimental designs that incorporates off-face axial-value runs similar to a central composite design (CCD). This modification addresses a weakness of current optimal designs relative to classical designs: CCDs with off-face axial values have superior properties compared with equal-sized optimal designs constrained to the cuboidal experimental region. CCDs tend to have significantly less collinearity among quadratic effects, which results in higher power for quadratic terms, better D-efficiency, and lower average prediction variance (i.e., I-optimality). Optimal designs, by contrast, offer greater flexibility in design size and model specification, and the ability to incorporate categorical factors. By incorporating axial values into an optimal design algorithm, the strengths of both approaches can be combined in a single design that generally outperforms both CCDs and current optimal designs.
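For reference, the off-face axial structure of a CCD, the ingredient being added to the coordinate-exchange algorithm, can be generated directly (a standard textbook construction, not the authors' algorithm):

```python
from itertools import product

def central_composite(k, alpha, n_center=1):
    """Central composite design in coded units: 2^k factorial (cube)
    points, 2k axial points at +/-alpha on each axis, plus center runs.
    alpha > 1 gives the off-face axial values discussed above."""
    cube = [list(pt) for pt in product((-1.0, 1.0), repeat=k)]
    axial = []
    for j in range(k):
        for a in (-alpha, alpha):
            pt = [0.0] * k
            pt[j] = a
            axial.append(pt)
    center = [[0.0] * k for _ in range(n_center)]
    return cube + axial + center

design = central_composite(3, alpha=1.682, n_center=2)  # rotatable alpha
```

The axial points at distance 1.682 leave the cuboidal region, which is exactly what cuboidal-constrained optimal designs cannot do.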
The State of Supersaturated Design and Analysis
Byran Smucker and Maria Weese, Miami University; Jon Stallrich and Kade Young, North Carolina State University; David Edwards, Virginia Commonwealth University
Supersaturated designs (SSDs) try to pull off what seems impossible: Identify important effects when there are more factors than experimental runs. Despite the vast amount of literature on the topic, there is little record of their use in practice. We contend this imbalance is due to conflicting recommendations regarding SSD use in the literature as well as the designs’ inabilities to meet practitioners’ analysis expectations. In this talk, we learn about some practitioner concerns and expectations via an informal questionnaire, discuss and compare two recent SSDs that pair a design construction method with a particular analysis method, and introduce some new results that provide a way to directly assess the quality of SSDs that will be analyzed using the LASSO.
Using BART to Perform Pareto Optimization and Quantify its Uncertainties
Akira Horiguchi, Duke University
Techniques to reduce the energy burden of an industrial ecosystem often require solving a multiobjective optimization problem. However, collecting experimental data can often be either expensive or time-consuming. In such cases, statistical methods can be helpful. This article proposes Pareto Front (PF) and Pareto Set (PS) estimation methods using Bayesian Additive Regression Trees (BART), which is a nonparametric model whose assumptions are typically less restrictive than popular alternatives, such as Gaussian Processes (GPs). These less restrictive assumptions allow BART to handle scenarios (e.g., high-dimensional input spaces, nonsmooth responses, large datasets) that GPs find difficult. The performance of our BART-based method is compared to a GP-based method using analytic test functions, demonstrating convincing advantages. Finally, our BART-based methodology is applied to a motivating engineering problem. Supplementary materials, which include a theorem proof, algorithms, and R code, for this article are available online.
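After BART produces predicted objective values, the Pareto front itself is the nondominated subset of those predictions. A minimal sketch of that extraction step (both objectives minimized; points hypothetical):

```python
def pareto_front(points):
    """Nondominated subset with both objectives minimized: p is dominated
    if some other point q is <= p in every coordinate and differs."""
    return [
        p for p in points
        if not any(all(qi <= pi for qi, pi in zip(q, p)) and q != p
                   for q in points)
    ]

# hypothetical (energy use, cost) predictions, both to be minimized
pts = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(pts)
```

In the article's method this step is applied to posterior draws from BART, so the front itself carries uncertainty rather than being a single fixed set.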
Sensitivity Prewarping for Local Surrogate Modeling
Nathan Wycoff, Georgetown University
In the continual effort to improve product quality and decrease operations costs, computational modeling is increasingly being deployed to determine feasibility of product designs or configurations. Surrogate modeling of these computer experiments via local models, which induce sparsity by only considering short range interactions, can tackle huge analyses of complicated input–output relationships. However, narrowing focus to local scale means that global trends must be relearned over and over again. In this article, we propose a framework for incorporating information from a global sensitivity analysis into the surrogate model as an input rotation and rescaling preprocessing step. We discuss the relationship between several sensitivity analysis methods based on kernel regression before describing how they give rise to a transformation of the input variables. Specifically, we perform an input warping such that the “warped simulator” is equally sensitive to all input directions, freeing local models to focus on local dynamics. Numerical experiments on observational data and benchmark test functions, including a high-dimensional computer simulator from the automotive industry, provide empirical validation.
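A diagonal (rescale-only) version of the prewarping idea can be sketched as follows: estimate per-input sensitivities by finite differences, then rescale inputs so the warped simulator is roughly equally sensitive in every direction. The linear test function is hypothetical, and the paper's framework also includes rotations.

```python
def diagonal_prewarp(f, xs, h=1e-4):
    """Estimate each input's average absolute finite-difference
    sensitivity over sample points xs, then build warp/unwarp maps that
    stretch sensitive directions so g(u) = f(unwarp(u)) is
    (approximately) equally sensitive to all input directions."""
    d = len(xs[0])
    sens = [0.0] * d
    for x in xs:
        fx = f(x)
        for j in range(d):
            xp = list(x)
            xp[j] += h
            sens[j] += abs(f(xp) - fx) / h
    sens = [s / len(xs) for s in sens]
    scale = [s * d / sum(sens) for s in sens]  # mean scale is 1
    warp = lambda z: [zj * sj for zj, sj in zip(z, scale)]
    unwarp = lambda z: [zj / sj for zj, sj in zip(z, scale)]
    return scale, warp, unwarp

f = lambda x: 10 * x[0] + x[1]            # x0 is 10x more influential
xs = [[0.1, 0.2], [0.5, 0.7], [0.9, 0.3]]
scale, warp, unwarp = diagonal_prewarp(f, xs)
g = lambda u: f(unwarp(u))                # the "warped simulator"
```

Local models fit to g see balanced dynamics in every direction and can spend their flexibility on genuinely local structure.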
Session 3 (2-3:30PM)
Analytical Problem Solving Based on Causal, Predictive and Deductive Models
Jeroen de Mast and Stefan H. Steiner, University of Waterloo, Wim P. M. Nuijten, Jheronimus Academy of Data Science, and Daniel Kapitan, Eindhoven University of Technology
Many approaches for solving problems in business and industry are based on analytics and statistical modelling. Analytical problem solving is driven by the modelling of relationships between dependent (Y) and independent (X) variables, and we discuss three frameworks for modelling such relationships: cause-and-effect modelling, popular in applied statistics and beyond, correlational predictive modelling, popular in machine learning, and deductive (first-principles) modelling, popular in business analytics and operations research. We aim to explain the differences between these types of models, and flesh out the implications of these differences for study design, for discovering potential X/Y relationships, and for the types of solution patterns that each type of modelling could support. We use our account to clarify the popular descriptive-diagnostic-predictive-prescriptive analytics framework, but extend it to offer a more complete model of the process of analytical problem solving, reflecting the essential differences between causal, predictive and deductive models.
Data Analytics for Decision-Making
Joanne Wendelberger, Los Alamos National Laboratory
Many organizations collect vast amounts of data for various purposes. While data streams are often collected with a specific end goal in mind, they can also provide valuable information for broader initiatives and improved decision-making. Standard analyses can often be enhanced by incorporating additional information from previously underutilized data sources into the modeling process. The impact of individual analyses can be increased by developing reusable analysis processes with associated computational infrastructure and workflows. The integration of heterogeneous data from multiple sources can pose a variety of challenges in terms of data acquisition, storage, processing, analysis, and ongoing delivery of results. This presentation will discuss the development of an integrated analytics capability that leverages the power of statistical methods with access to data from multiple sources. Examples of analysis applications that have been developed to provide dynamic results informed by disparate data streams will also be discussed.
New Developments in Space Filling Designs
Christine Anderson-Cook, Los Alamos National Laboratory and Lu Lu, University of South Florida
Space-filling designs have been broadly used for data collection in computer experiments. Traditionally, they have focused on uniform spacing throughout the input space. In this talk, we introduce two recent developments for more flexible construction of these designs, which allow existing information about the study to be strategically incorporated into the construction of the designs. Non-uniform space-filling (NUSF) designs allow users to customize the regions of emphasis with a varied degree of concentration of points. These designs are desirable when the user wishes to focus on regions with interesting features, larger variability, or larger discrepancy between the computer model and the physical experimental data. Input-response space-filling (IRSF) designs consider the spread of design points in the input space as well as the spread of estimated response values across their anticipated range. By constructing a Pareto front, the IRSF approach provides a suite of potential designs with varied emphasis on the input and response spaces. The methods and a step-by-step implementation process are illustrated with examples to demonstrate their flexibility to match experimental goals.
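As background, the traditional uniform-emphasis construction that NUSF and IRSF designs generalize can be sketched with a greedy maximin rule (a textbook heuristic, not the authors' construction):

```python
def greedy_maximin(candidates, n):
    """Greedy maximin subset: start from the first candidate, then
    repeatedly add the point whose minimum Euclidean distance to the
    points already chosen is largest (uniform-emphasis space filling)."""
    chosen = [candidates[0]]
    while len(chosen) < n:
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(
                sum((a - b) ** 2 for a, b in zip(c, p)) ** 0.5
                for p in chosen),
        )
        chosen.append(best)
    return chosen

grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]  # 5x5 candidates
design = greedy_maximin(grid, 4)
```

NUSF replaces the plain distance criterion with one weighted toward regions of emphasis; IRSF adds a second, response-space distance and trades the two off along a Pareto front.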
Artificial intelligence (AI) algorithms provide state-of-the-art performance in a variety of applications. However, the performance of AI is reliant on informative datasets. In other words, these algorithms are vulnerable to data quality issues, such as mislabeled training data, out-of-distribution test data, class imbalance in both training and test data, and adversarial attacks. The safety and robustness of AI applications must therefore be evaluated and tested. In this work, we present a new framework based on experimental design to investigate the performance of AI classification algorithms. The performance of AI algorithms can be affected by a variety of factors, including the algorithm's hyperparameters, data types, and class proportions in the training and test data. Specifically, a space-filling design based on a projection criterion is developed with an efficient construction algorithm. Our approach to constructing the design of experiments has a better capacity to comprehensively reveal the performance of AI algorithms. We gathered experimental AI algorithm performance data and conducted a statistical analysis to better understand how these factors influence the robustness of AI algorithms. Our framework can be used to evaluate the effects of factors on AI algorithm robustness and the sensitivity of the hyperparameters, and to compare and assess the performance of AI algorithms across data-quality conditions.
Debunking Stress Rupture Theories Using Weibull Regression Plots
Anne Driscoll and Geoff Vining, Virginia Tech
As statisticians, we are always working on new ways to explain statistical methodologies to non-statisticians. It is in this realm that we never underestimate the value of graphics and patience! In this presentation, we present a case study involving stress rupture data where a Weibull regression is needed to estimate the parameters. The case study results from a multi-stage project supported by the NASA Engineering and Safety Center (NESC) whose objective was to assess the safety of composite overwrapped pressure vessels (COPVs). The analytical team was tasked with devising a test plan to model stress rupture failure risk in the carbon fiber strands that encase the COPVs, with the goal of understanding the reliability of the strands at use conditions over the expected mission life. While analyzing the data, we found that the proper analysis contradicts accepted theories about the stress rupture phenomenon. In this talk, we will introduce ways to graph the stress rupture data to better explain the proper analysis of the model and to explore its assumptions.
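One such graph is the Weibull probability plot: on median-rank plotting positions, Weibull data fall near a straight line whose slope estimates the shape parameter. A sketch with hypothetical, uncensored lifetimes (the NESC data involve censoring and a stress covariate, which this omits):

```python
import math

def weibull_plot_coords(times):
    """Weibull probability-plot coordinates with median-rank plotting
    positions F_i = (i - 0.3)/(n + 0.4): x = ln(t), y = ln(-ln(1 - F)).
    Weibull data fall near a line with slope equal to the shape."""
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

times = [105, 180, 260, 350, 480, 700]   # hypothetical lifetimes
xs, ys = weibull_plot_coords(times)
shape, intercept = fit_line(xs, ys)
scale = math.exp(-intercept / shape)     # characteristic life estimate
```

Curvature or a kink on these axes is exactly the kind of visual evidence that can contradict an assumed stress rupture model.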
Spatially Correlated Time-to-Event Model for Titan GPU Failures Data Under Competing Risks
Jie Min and Yili Hong, Virginia Tech
Graphics processing units (GPUs) are widely used in high-performance computing (HPC), and the reliability of GPUs is of interest for the overall reliability of HPC systems. The Cray XK7 Titan supercomputer was one of the top ten supercomputers in the world. The failure times of more than 19,000 GPUs in Titan were recorded, and previous research shows that the failure time of a GPU may be affected by its position inside the supercomputer. In this paper, we conduct in-depth statistical modeling of the effect of positions on GPU failures under competing risks with covariates and spatially correlated random effects. In particular, two major failure types of GPUs in Titan are considered, the positions of GPUs inside each cabinet are modeled as covariates, and the positions of cabinets are modeled as spatially correlated random effects. We use the powered-exponential covariance function to construct the spatial random effects' covariance matrix and estimate the correlation of random effects between the two failure modes. The Bayesian method is used for statistical inference, and posterior samples are obtained using the No-U-Turn Sampler (NUTS), a Hamiltonian Monte Carlo method. The proposed model combines competing risks and spatial random effects in modeling the Titan GPU failure data, and our results provide interesting insights into GPU failures in HPC systems.
Session 4 (8-9:30AM)
Covariate Software Vulnerability Discovery Model to Support Cybersecurity Test & Evaluation
Lance Fiondella, University of Massachusetts Dartmouth
Vulnerability discovery models (VDM) have been proposed as an application of software reliability growth models (SRGM) to software security related defects. VDM model the number of vulnerabilities discovered as a function of testing time, enabling quantitative measures of security. Despite their obvious utility, past VDM have been limited to parametric forms that do not consider the multiple activities software testers undertake in order to identify vulnerabilities. In contrast, covariate SRGM characterize the software defect discovery process in terms of one or more test activities. However, data sets documenting multiple security testing activities suitable for application of covariate models are not readily available in the open literature.
To demonstrate the applicability of covariate SRGM to vulnerability discovery, this research identified a web application to target as well as multiple tools and techniques to test for vulnerabilities. The time dedicated to each test activity and the corresponding number of unique vulnerabilities discovered were documented and prepared in a format suitable for application of covariate SRGM. Analysis and prediction were then performed and compared with a flexible VDM without covariates, namely the Alhazmi-Malaiya Logistic Model (AML). Our results indicate that covariate VDM significantly outperformed the AML model on predictive and information theoretic measures of goodness of fit, suggesting that covariate VDM are a suitable and effective method to predict the impact of applying specific vulnerability discovery tools and techniques.
Rethinking Software Fault Tolerance
Kishor Trivedi, Duke University
Complex systems in different domains contain a significant amount of software. Several studies have established that a large fraction of system outages are due to software faults. Traditional methods of fault avoidance, fault removal based on extensive testing/debugging, and fault tolerance based on design/data diversity have been found inadequate to ensure high software dependability. The key challenge then is how to provide highly dependable software. We discuss a viewpoint of fault tolerance of software-based systems to ensure high dependability. We classify software faults into Bohrbugs and Mandelbugs, and identify aging-related bugs as a subtype of the latter. Traditional methods have been designed to deal with Bohrbugs. The key challenge, then, is to develop mitigation methods for Mandelbugs in general and aging-related bugs in particular. We submit that mitigation methods for Mandelbugs should utilize environmental diversity. Retrying an operation, restarting an application, failing over to an identical replica (hot, warm, or cold), and rebooting the OS are reactive recovery techniques applied after the occurrence of a failure; all are examples of techniques that rely on environmental diversity. For aging-related bugs, it is also possible to utilize a proactive environmental diversity technique known as software rejuvenation. We discuss environmental diversity from both experimental and analytic points of view and cite examples of real systems employing these techniques.
D-optimal Mixture Designs for Ordinal Responses
Rong Pan, Arizona State University
Ordered, categorical data in experimental studies are common in industries with sensory programs that typically solicit subjective responses from trained sensory experts. For example, Mancenido, Pan, and Montgomery (2016) described a case study where the quality attribute of interest in a three-chemical experiment was the fragrance intensity of chemical formulations, measured on an ordered intensity scale from 1 (very weak) to 7 (very strong). Due to the lack of guidelines in the literature for designing mixture experiments with categorical responses, computer-generated optimal designs for continuous, numeric responses are often used as substitutes. In our experience, using experimental designs that are ill-adapted to categorical responses can be detrimental to the modeling result. In this talk we will discuss the challenges in finding D-optimal mixture designs for ordinal responses and present an exchange algorithm that is efficient in finding a near-optimal design. Our methods will be compared with other approaches.
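The flavor of an exchange algorithm can be sketched for the simpler, classical case of a D-optimal design under a linear Scheffé mixture model in three components (the ordinal-response criterion in the talk is more involved):

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def xtx(design):
    """Information matrix X'X for a linear Scheffe model in 3 components."""
    return [[sum(r[i] * r[j] for r in design) for j in range(3)]
            for i in range(3)]

def point_exchange(candidates, n, passes=5):
    """Greedy point exchange: replace each design row in turn with the
    candidate that most increases det(X'X); stop when a full pass makes
    no improvement."""
    design = list(candidates[:n])
    for _ in range(passes):
        improved = False
        for i in range(n):
            best, best_d = design[i], det3(xtx(design))
            for c in candidates:
                d = det3(xtx(design[:i] + [c] + design[i + 1:]))
                if d > best_d + 1e-12:
                    best, best_d, improved = c, d, True
            design[i] = best
        if not improved:
            break
    return design

# candidate mixture blends: centroid, edge midpoints, pure components
cands = [(1/3, 1/3, 1/3), (0.5, 0.5, 0), (0, 0, 1),
         (1, 0, 0), (0, 1, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)]
design = point_exchange(cands, n=3)  # converges to the pure blends
```

For ordinal responses, det(X'X) is replaced by the determinant of the Fisher information of a cumulative-link model, which is what makes the design problem in the talk harder.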
Specifying Prior Distributions in Reliability Applications
Qinglong Tian, University of Waterloo
Especially when facing reliability data with limited information (e.g., a small number of failures), there are strong motivations for using Bayesian inference methods. These include the option to use information from physics-of-failure or previous experience with a failure mode in a particular material to specify an informative prior distribution. Another advantage is the ability to make statistical inferences without having to rely on specious (when the number of failures is small) asymptotic theory needed to justify non-Bayesian methods. Users of non-Bayesian methods are faced with multiple methods of constructing uncertainty intervals (Wald, likelihood, and various bootstrap methods) that can give substantially different answers when there is little information in the data. For Bayesian inference, there is only one method—but it is necessary to provide a prior distribution to fully specify the model. Much work has been done to find default or objective prior distributions that will provide inference methods with good (and in some cases exact) frequentist coverage properties. This paper reviews some of this work and provides, evaluates, and illustrates principled extensions and adaptations of these methods to the practical realities of reliability data (e.g., non-trivial censoring).
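A minimal conjugate illustration of a prior carrying physics-of-failure or historical information: exponential lifetimes with a Gamma prior on the failure rate, where censored units contribute only their exposure time (data hypothetical; the paper treats richer models and censoring schemes).

```python
import random

def exp_posterior(prior_a, prior_b, failures, total_time,
                  n_draws=20000, seed=1):
    """Conjugate update for exponential lifetimes with a Gamma(a, b)
    prior on the failure rate: the posterior is Gamma(a + r, b + T),
    where r counts failures and T is total time on test (failed units'
    lifetimes plus censored units' running times). Posterior draws give
    interval summaries."""
    a, b = prior_a + failures, prior_b + total_time
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(a, 1.0 / b) for _ in range(n_draws))
    lo, hi = draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]
    return a / b, (lo, hi)

# 2 failures among 10 units; the other 8 censored at 1,000 hours
total_time = 400 + 700 + 8 * 1000
mean, (lo, hi) = exp_posterior(0.5, 100.0, failures=2, total_time=total_time)
```

With only two failures, the choice of (prior_a, prior_b) visibly moves the interval, which is exactly why principled default priors matter.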
Estimating Pure-Error from Near Replicates in Design of Experiments
Caleb King, JMP Statistical Discovery, LLC, Thomas Bzik, Statistical Consultant, and Peter Parker, NASA
In design of experiments, setting exact replicates of factor settings enables estimation of pure-error: a model-independent estimate of experimental error useful in communicating inherent system noise and testing model lack-of-fit. Often in practice, the factor levels for replicates are precisely measured rather than precisely set, resulting in near-replicates. This can result in inflated estimates of pure-error due to uncompensated set-point variation. In this presentation, we review different strategies for estimating pure-error from near-replicates, including our own recent work as well as new material brought to our attention after the publication of our original work.
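With exact replicates, pure-error estimation reduces to pooling within-group sums of squares; the replicate groups below are hypothetical. The near-replicate strategies in the talk address the inflation that occurs when the groups' factor settings are not actually identical.

```python
import statistics

def pure_error(groups):
    """Pure-error mean square from exact replicates: pool the
    within-group sums of squares and divide by the pooled
    degrees of freedom."""
    ss = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return ss / df, df

# hypothetical responses grouped by identical (exactly set) factor levels
groups = [[10.2, 10.6, 10.4], [7.9, 8.3], [12.0, 12.0]]
mse, df = pure_error(groups)
```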
Monitoring Proportions with Two Components of Common Cause Variation
Rob Goedhart, University of Amsterdam, and William H. Woodall, Virginia Tech (retired)
We propose a method for monitoring proportions when the in-control proportion and the sample sizes vary over time. Our approach is able to overcome some of the performance issues of other commonly used methods, as we demonstrate in this paper using analytical and numerical methods. The derivations and results are shown mainly for monitoring proportions, but we show how the method can be extended to the monitoring of count data.
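The baseline whose performance issues motivate the talk is the standardized (normal-approximation) p-chart statistic, which accommodates a time-varying in-control proportion p0 and sample size n; the counts below are hypothetical.

```python
def p_chart_z(count, n, p0):
    """Standardized p-chart statistic: approximately N(0,1) in control,
    even when the in-control proportion p0 and sample size n vary."""
    phat = count / n
    return (phat - p0) / (p0 * (1 - p0) / n) ** 0.5

# (nonconforming count, sample size, in-control proportion) per period
data = [(18, 200, 0.10), (25, 250, 0.10), (90, 500, 0.12)]
zs = [p_chart_z(x, n, p0) for x, n, p0 in data]
signals = [abs(z) > 3 for z in zs]
```

The normal approximation degrades for small n or extreme p0, which is one of the issues the proposed method is designed to overcome.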
Session 5 (10-11:30AM)
New SPlit Method for Cross-Validation
V. Roshan Joseph and Akhil Vakayil, Georgia Tech
For developing statistical and machine learning models, it is common to split the dataset into two parts: training and testing. The training part is used for fitting the model and the testing part for evaluating the performance of the fitted model. The most common strategy for splitting is to randomly sample a fraction of the dataset. In this talk, I will discuss an optimal method for data splitting called Support Points-based split (SPlit).
Active Learning for Deep Gaussian Process Surrogates
Annie Sauer, Robert B. Gramacy, and David Higdon, Virginia Tech
Deep Gaussian processes (DGPs) are increasingly popular as predictive models in machine learning for their non-stationary flexibility and ability to cope with abrupt regime changes in training data. Here we explore DGPs as surrogates for computer simulation experiments whose response surfaces exhibit similar characteristics. In particular, we transport a DGP’s automatic warping of the input space and full uncertainty quantification, via a novel elliptical slice sampling Bayesian posterior inferential scheme, through to active learning strategies that distribute runs non-uniformly in the input space — something an ordinary (stationary) GP could not do. Building up the design sequentially in this way allows smaller training sets, limiting both expensive evaluation of the simulator code and mitigating cubic costs of DGP inference. When training data sizes are kept small through careful acquisition, and with parsimonious layout of latent layers, the framework can be both effective and computationally tractable. Our methods are illustrated on simulation data and two real computer experiments of varying input dimensionality. We provide an open source implementation in the “deepgp” package on CRAN.
Advances in Orthogonal Minimally Aliased Response Surface (OMARS) Designs: Designs with Two-Level Categorical Factors, with a Blocking Factor, and Strong OMARS Designs
José Ares, Eric Schoen, and Peter Goos, KU Leuven
Response surface designs are a core component of the response surface methodology, which is widely used in the context of product and process optimization. The best known response surface designs are central composite and Box-Behnken designs. The problem with these designs is that the number of tests they require increases rapidly with the number of factors. In this presentation, we present a new family of response surface designs called orthogonal minimally aliased response surface or OMARS designs. The OMARS designs are available for many different run sizes and therefore bridge the gap between the small definitive screening designs and the large classical response surface designs. While the original OMARS designs included only three-level quantitative factors, there are now also mixed-level OMARS designs including both three-level quantitative factors and two-level categorical ones. In addition, we show how to arrange OMARS designs in blocks, how to select the best possible OMARS design for a given experiment, and how to assess the trade-off between the run size and the statistical quality of the design. We end the talk with the presentation of strong OMARS designs, which form a subclass of OMARS designs specially suited for optimization experiments.
Model Robust Response Surface Optimization for Mixed Quantitative and Qualitative Factors
Gautham Sunder and Christopher Nachtsheim, University of Minnesota
Motivated by our joint work with a medical device manufacturer on hyperparameter optimization of deep neural networks, in this study we propose a model robust response surface optimization (RSO) strategy for the dual goals of model estimation and response optimization in the presence of mixed quantitative and qualitative (QQ) factors. The RSO literature prescribes using Classical-RSO (C-RSO) methods when the response surface is assumed to be second-order and Bayesian Optimization (BO) when the response surface is assumed to be complex and nonlinear. However, neither C-RSO nor BO formally validates the assumptions made about the response surface complexity. The proposed model robust RSO strategy is highly efficient when the experimenter is uncertain about the response surface complexity (second-order versus complex and nonlinear). The strategy is initialized with a model robust starting design, which is a compromise between space-filling designs and Bayesian D-optimal designs and is supersaturated for a full second-order model. We formally validate the adequacy of a second-order approximation by proposing a modified version of RPtest, a bootstrap goodness-of-fit test, and the strategy transitions to BO methods if the second-order approximation is inadequate. Our simulation study illustrates that the proposed RSO method is highly efficient for estimating a second-order response surface with QQ factors and that the proposed goodness-of-fit test has high power to detect the inadequacy of a second-order approximation. Additionally, the proposed RSO strategy has comparable performance to standard BO methods for estimating a complex response surface.
Incorporating Uncertainty for Enhanced Leadership Scoring and Ranking in Data Competitions
Lu Lu, University of South Florida and Christine Anderson-Cook, Los Alamos National Laboratory
Data competitions have become a popular and cost-effective approach for crowdsourcing versatile solutions from diverse expertise. Current practice relies on simple leaderboard scoring based on a given set of competition data for ranking competitors and distributing the prize. However, a disadvantage of this practice is that a slight difference in scores, due to the natural variability of the observed data, can result in a much larger difference in prize amounts. In this article, we propose a new strategy to quantify the uncertainty in the rankings and scores that would arise from using different data sets that share common characteristics with the provided competition data. By using a bootstrap approach to generate many comparable data sets, the new method has four advantages over current practice. During the competition, it provides a mechanism for competitors to get feedback about the uncertainty in their relative ranking. After the competition, it allows the host to gain a deeper understanding of algorithm performance and robustness across representative data sets. It also offers a transparent mechanism for prize distribution that more fairly rewards competitors with superior and robust performance. Finally, it makes it possible to explore what the results might have looked like if the competition goals had evolved from their original choices. The implementation of the strategy is illustrated with a real data competition hosted by Topcoder on urban radiation search.
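The bootstrap step at the heart of this strategy can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each competitor has a vector of per-case scores (higher is better) and reports how often each competitor leads across resampled evaluation sets. The function name and data layout are hypothetical.

```python
import random
from statistics import mean

def bootstrap_rank_frequencies(case_scores, n_boot=2000, seed=0):
    """Quantify leaderboard uncertainty by resampling the evaluation cases.

    case_scores: dict mapping competitor -> list of per-case scores
                 (higher is better), all indexed by the same cases.
    Returns the fraction of bootstrap replicates in which each
    competitor holds first place.
    """
    rng = random.Random(seed)
    names = list(case_scores)
    n_cases = len(next(iter(case_scores.values())))
    wins = {name: 0 for name in names}
    for _ in range(n_boot):
        # Resample evaluation cases with replacement.
        idx = [rng.randrange(n_cases) for _ in range(n_cases)]
        boot_means = {n: mean(case_scores[n][i] for i in idx) for n in names}
        wins[max(names, key=boot_means.get)] += 1
    return {n: wins[n] / n_boot for n in names}

# Two competitors whose mean scores are close: the bootstrap shows
# the apparent leader is not a sure winner.
scores = {
    "A": [0.90, 0.80, 0.85, 0.70, 0.95, 0.60],
    "B": [0.88, 0.82, 0.80, 0.75, 0.90, 0.65],
}
freq = bootstrap_rank_frequencies(scores)
```

A host could report something like `freq` alongside the point leaderboard, so competitors see how stable their ranking is under data variability.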
Understanding and Addressing Complexity in Problem Solving
Willis Jensen, W. L. Gore & Associates, Inc. and Roger Hoerl, Union College
Complexity manifests itself in many ways when attempting to solve different problems, and different tools are needed to deal with the different dimensions underlying that complexity. Not all complexity is created equal. We find that most treatments of complexity in problem solving within both the statistical and quality literature focus narrowly on technical complexity, which includes complexity of subject matter knowledge as well as complexity in accessing and analyzing data. The literature lacks an understanding of how political or organizational complexity interferes with good technical solutions when trying to deploy them. As a result, people trained in statistical problem solving are ill-prepared for the situations they are likely to face on real projects. We propose a framework that illustrates examples of complexity from our own experiences and the literature. This framework highlights the need for more holistic problem-solving approaches and a broader view of complexity. We also propose approaches to successfully navigate complexity.
Session 6 (1:30-3PM)
Modeling Semiconductor Yield Using Statistical Engineering – A Case Study
Dana Krueger, Roadmap Research Global, and Douglas Montgomery, Arizona State University
Yield is a key process performance characteristic in the capital-intensive semiconductor fabrication process. In an industry where machines cost millions of dollars and cycle times are several months, predicting and optimizing yield are critical to process improvement, customer satisfaction, and financial success. Semiconductor yield modeling is essential to identifying processing issues, improving quality, and meeting customer demand in the industry. However, the complicated fabrication process, the massive amount of data collected, and the number of modeling approaches available make yield modeling a complex and unstructured problem that crosses potential silos separating process engineers, device engineers, defect metrologists, statisticians, and yield analysts. This complexity makes the problem a strong candidate for applying statistical engineering principles. This project first examines the issues of data pedigree and integration as a dataset is created to combine wafer-level process measurements (such as critical dimensions, where process owners use control charts to detect problems), differences found in defectivity scans after each processing layer of the wafer (where defect metrology engineers use scanning electron microscopes to identify problems and look for patterns on the scans), and electrical test data that assess functionality for the wafer (such as threshold voltage and approximately 400 other parameters). The yield data are captured after testing each die, and each die is assigned to a bin designated as “passing” or “failing.” Various failure modes are denoted by different bin numbers. Once the dataset is constructed, a yield modeling approach must be determined. Relying on statistical methods alone limits the effectiveness of this effort.
This case study reviews the benefits and limitations of using an iterative approach to improving yield models by first applying generalized linear models (GLMs), then generalized linear mixed models (GLMMs), and finally a combination of classification trees and GLMs. This third approach illustrates how being “tool agnostic” can help the statistical engineer find new, creative solutions to modeling challenges. The results show the strengths and limitations of each modeling approach. The challenge and benefits of obtaining subject matter expertise on the project throughout the modeling process and ideas for how a developed model could be deployed and sustained will also be discussed.
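As a small illustration of the first modeling stage, the sketch below fits a binomial GLM (logistic regression) of die pass/fail on a single process measurement by plain gradient ascent. It is a stand-in for the study's actual GLMs, which involve many predictors and a full model-fitting toolkit; the data and variable names here are invented.

```python
import math

def fit_logistic(x, y, lr=0.5, n_iter=2000):
    """One-predictor logistic regression (binomial GLM, logit link)
    fit by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-z))           # fitted pass probability
            g0 += yi - p          # score contribution for the intercept
            g1 += (yi - p) * xi   # score contribution for the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: dies with more defects found in scans fail more often.
defect_counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
die_passed    = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
b0, b1 = fit_logistic(defect_counts, die_passed)
```

The fitted slope is negative, reflecting the lower pass probability at higher defect counts; a GLMM would add random effects (for wafer or lot, say) on top of this same linear predictor.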
Fleet Predictions Using Bayesian Multilevel Models: An Example using Targeting Pod Accuracy
Todd Remund, Northrop Grumman Space
Fleet behavior and structure must be represented in the model, so that the model can represent the fleet. Deeper thinking suggests that fleet behavior and structure must also be represented in the data collection effort, so that the model best represents the reality of the fleet. This leads to the statement: appropriate models represent data, and data represent reality—we hope. Fleets make up many populations of operational systems across the aerospace industry, as well as plenty of other engineering industries. Fleets can comprise subgroups such as batches, material lots, software versions, suppliers, and aircraft tail numbers, to name a few. With these varying layers of groups comes structure that can heavily dictate the decisions and guidance given to operators. Some of the results from modeling and analysis guide operators of systems in making real-time decisions with heavy implications, such as weapon delivery to a target in the proximity of friendly forces. Safety and effectiveness drive the need to seek proper representation of fleets so that these two aspects of operational activity can be preserved to the best extent possible. Multilevel models are not a new approach for data analysis and prediction; alternate names for this capability include mixed models, random coefficient models, and hierarchical models. Textbooks have existed for decades describing both frequentist and Bayesian implementations of this method, and the literature and computer applications available offer plenty of avenues to utilize these modeling methods. Within the aerospace and engineering communities there are ample opportunities to apply this approach. Applications to tolerance intervals and probabilistic metrics are the focus here, with an emphasis on Bayesian implementation. Methods of applying these to the aerospace industry will be given with an example of targeting accuracy in targeting pod performance.
This is a call to seriously consider, or increase, the use of multilevel models and associated data collection to model fleet realities. Methods are offered in a simple example to depict how the Bayesian output from multilevel models can be specifically used to represent fleet realities in performance measures. Where some applications appropriately remove nuisance effects from performance metrics such as tester ID, many aerospace applications have the opposite need—to account for fleet group-to-group effects instead of removing them. This paper advances the use of the Bayesian multilevel model’s posterior distribution to account for these random group-to-group effects within predictions involving tolerance intervals and probabilistic representations of fleet success criteria. An example of notional target pod data will be used to demonstrate these methods.
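A minimal sketch of the core computation: turning posterior draws from a multilevel model into a one-sided upper tolerance bound for an unseen fleet member. This is a generic illustration under a normal-normal model, not the authors' targeting-pod analysis; the posterior draws below are stand-ins for output from an actual MCMC fit.

```python
import random
from statistics import NormalDist

def fleet_upper_tolerance_bound(posterior_draws, content=0.95, conf=0.90, seed=0):
    """For each posterior draw (mu, sigma_group, sigma_within), simulate a
    new group effect, compute the `content` quantile of that group's
    predictive distribution, and take the `conf` quantile across draws."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(content)
    bounds = []
    for mu, sigma_group, sigma_within in posterior_draws:
        g = rng.gauss(0.0, sigma_group)           # effect of an unseen group
        bounds.append(mu + g + z * sigma_within)  # content quantile in that group
    bounds.sort()
    return bounds[min(len(bounds) - 1, int(conf * len(bounds)))]

# Stand-in "posterior": 1000 identical draws (mu=0, sigma_group=1,
# sigma_within=1), so all uncertainty here comes from the new-group effect.
draws = [(0.0, 1.0, 1.0)] * 1000
bound = fleet_upper_tolerance_bound(draws)
```

With real posterior draws, parameter uncertainty propagates automatically, which is the point of the Bayesian multilevel formulation: group-to-group variation is accounted for in the prediction rather than averaged away.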
Tuning Parameter Selection for Variable Selection via an R2 Metric
Jon Stallrich and Julia Holter, North Carolina State University
Many penalized estimators are capable of performing simultaneous variable selection and estimation but are burdened by tuning parameter selection. Existing tuning parameter selection procedures tend to choose more variables than necessary and are computationally expensive. We propose a tuning parameter selection strategy based on the squared correlations between the observed response and the predicted values of candidate models, rather than on squared error loss. Tuning parameters selected under our procedure are shown to better balance predictive capability and model simplicity. The approach is computationally efficient and, in the domain of penalized estimation, competitive with popular tuning parameter selection techniques in its capacity for variable selection. We explore the efficacy of this approach in a project involving optimal EMG placement for a robotic prosthesis controller.
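The selection rule can be sketched as follows, assuming a solution path of (penalty value, number of active variables, fitted values) has already been computed by some penalized estimator. The pick-the-sparsest-model-within-tolerance rule is an illustrative simplification of the paper's metric, and the helper names are hypothetical.

```python
def r2(y, yhat):
    """Squared Pearson correlation between observed and fitted values."""
    n = len(y)
    my, mf = sum(y) / n, sum(yhat) / n
    sy = sum((a - my) ** 2 for a in y)
    sf = sum((b - mf) ** 2 for b in yhat)
    sxy = sum((a - my) * (b - mf) for a, b in zip(y, yhat))
    return sxy * sxy / (sy * sf) if sy > 0 and sf > 0 else 0.0

def select_tuning(y, path, tol=0.01):
    """path: (lambda, n_active, fitted) triples ordered from the heaviest
    penalty (sparsest model) to the lightest. Returns the sparsest model
    whose R^2 is within `tol` of the best on the path."""
    best = max(r2(y, fit) for _, _, fit in path)
    for lam, n_active, fit in path:  # sparsest first
        if r2(y, fit) >= best - tol:
            return lam, n_active

# Toy path: the one-variable model correlates almost perfectly with y,
# so it is preferred over the denser exact fit.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
path = [
    (1.0, 0, [3.5] * 6),
    (0.5, 1, [1.1, 2.0, 2.9, 4.1, 5.0, 5.9]),
    (0.1, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
]
lam, n_active = select_tuning(y, path)
```

Because squared correlation is scale-free, a near-perfect sparse fit is not penalized for small calibration errors the way squared error loss would penalize it.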
A Probabilistic n-ary Classification Algorithm for Flexible Types of High-Dimensional, Heterogeneous, and Complex Data
Madeline Ausdemore, Los Alamos National Laboratory
In many applications, there is a need to classify data into one of n classes, where the data may be scalar, functional, or high-dimensional. This presentation proposes a probabilistic n-ary classification model that allows for simultaneously determining the class of a set of objects, rather than classifying each object in turn. This is particularly useful in forensic or intelligence scenarios, in which we may observe several pieces of glass on a burglary suspect, several pieces of paint on a hit-and-run vehicle, or several particles of nuclear material. In these instances, it is logical to assume that the multiple pieces of glass originate from the same window, that the paint chips originate from the same car, or that the particles of nuclear material were processed under similar conditions. Oftentimes, differentiating between classes can be reduced to a simple classification or model-selection problem that can be addressed using machine learning or likelihood-based techniques. However, when we are faced with high-dimensional, complex, heterogeneous data and limited sample sizes, the process is not so straightforward: small sample sizes rule out many machine learning techniques, and high-dimensional, complex, or heterogeneous data make it impossible to assign the probability measures necessary for computing Bayes factors or performing any other likelihood-based inference. Furthermore, multi-class extensions of many algorithms can lead to ambiguous results. This presentation proposes a model that leverages the properties of kernel functions to obtain a vector of scores that characterizes pairwise comparisons of all objects. This vector consists of within-class scores, which arise when compared objects originate from a common class, and between-class scores, which arise when compared objects originate from different classes. The model capitalizes on the variability that exists within and between these sets of scores to address the above inference question.
Because the method relies on a kernel function, it can be tailored to any type of data by merely modifying this function, and the subsequent inference process remains the same. Furthermore, the model makes only one assumption, which can be satisfied through the design of the kernel function. This presentation considers the development of the proposed model along with applications. First, the development of the model is presented. Next, an application to the MNIST handwritten digit dataset is considered to demonstrate the usefulness of this algorithm and to compare its performance to a classical support vector machine (SVM). Finally, an application to a real forensic dataset is considered.
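The pairwise-score construction can be illustrated with a simple radial basis function kernel on low-dimensional points. This toy sketch shows only the score computation, not the inference model built on top of it; the kernel choice and data are invented for illustration.

```python
import math
from itertools import combinations

def rbf(u, v, gamma=1.0):
    """Radial basis function kernel: near 1 for similar objects, near 0 otherwise."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_scores(objects, labels, gamma=1.0):
    """Split all pairwise kernel scores into within-class and between-class
    sets, the two score populations the model exploits."""
    within, between = [], []
    for i, j in combinations(range(len(objects)), 2):
        s = rbf(objects[i], objects[j], gamma)
        (within if labels[i] == labels[j] else between).append(s)
    return within, between

# Two tight clusters, e.g. glass fragments from two different windows.
objects = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.05),
           (3.0, 3.0), (3.1, 2.9), (2.9, 3.1)]
labels = [0, 0, 0, 1, 1, 1]
within, between = pairwise_scores(objects, labels)
```

Swapping `rbf` for a kernel on curves, images, or spectra changes nothing downstream, which is what makes the approach flexible across data types.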
Applied vs Academic Approaches to Statistical Process Monitoring (SPM)
James Lucas, J. M. Lucas and Associates
I was an integral part of the largest known implementation of CUSUM procedures. CUSUM was the Statistical Process Monitoring (SPM) procedure used in our company-wide quality system. There are differences between what we did when implementing CUSUM SPM for our quality system and the recommendations contained in many academic articles. This talk will describe those differences and explain why we conducted SPM differently from many academic recommendations. We introduce this talk by giving an overview of our quality system. We introduce CUSUM control procedures and give an implementation example. The example describes the specific CUSUM procedure that was used in a majority of our implementations. This specific CUSUM was used so frequently that I have described it as the 80% solution. A head-start feature was almost always used, and we show that a head-start feature should be part of most SPM implementations. We then discuss specifics of our implementation, describe differences from academic recommendations, and explain why those different choices were made. Some specific topics discussed are: Does a CUSUM signal cause you to shut down the process? Why, or when, does it cause a process shutdown? What False Alarm Probability (FAP) or Average Run Length (ARL) should be used, and why? We compare our choices with academic recommendations. How much historical process data is needed to set up an individual CUSUM loop? Academics frequently recommend much more data than we used; our recommendation can be stated succinctly: implement and update the CUSUM procedure rather than wait to implement. A recent review of data aggregation procedures did not describe the aggregation procedures we used. We describe our general process model, which uses five variance components. Our modelling structure often used vessels, sub-vessels, and sub-sub-vessels, and we show how this fact determined our aggregation procedures.
Reducing process variability is an important goal of the quality system. However, almost all of the CUSUM loops were for shifts in mean level. We tell how variability reduction was achieved by using mean-level control of sub-vessels. I conclude by comparing academic and applied approaches to SPM research. I describe the recent work of our research team that improves SPM and makes it easier to conduct. We have developed Power Guidelines for SPM. The guidelines are succinctly described using a stoplight model. Our power guidelines use an ARL ratio; we recommend ARL ratio = ARL(in-control)/ARL(out-of-control) ≥ 20. Much of our recent research concerns SPM for attribute data. Our power guidelines are used to determine shift detection possibilities when count levels are low. We describe our algorithmic approach for designing CUSUM-for-counts SPM procedures. Table lookup procedures are more difficult to use when there are distributional changes with changes in sample sizes and mean level. We take “A Closer Look at Geometric and Bernoulli Procedures.” We compare steady-state and initial-state ARL procedures and tell when each is most appropriate. We tell why CCC-r procedures should seldom if ever be used. This continues recent valuable research criticizing misguided SPM methods.
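For readers unfamiliar with the head-start (fast initial response) feature, a minimal one-sided CUSUM sketch is below. The reference value k, decision interval h, and data are illustrative choices, not the talk's 80% solution.

```python
def cusum_upper(x, mu0, k, h, headstart=0.0):
    """One-sided upper CUSUM statistic; setting headstart = h/2 gives the
    fast-initial-response feature, which speeds detection of a process
    that is already off target at start-up."""
    c = headstart
    stats = []
    for xi in x:
        c = max(0.0, c + (xi - mu0) - k)  # accumulate deviations above mu0 + k
        stats.append(c)
    return stats

# In-control readings followed by an upward mean shift.
readings = [0.2, -0.1, 0.3, -0.2, 0.1, 0.0, -0.3, 0.2,
            2.2, 1.9, 2.4, 2.1, 2.3]
stats = cusum_upper(readings, mu0=0.0, k=0.5, h=4.0, headstart=2.0)
signal_at = next((i for i, c in enumerate(stats) if c > 4.0), None)
```

Note how the head start decays to zero during the in-control stretch, so it costs little in false alarms when the process starts on target.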
A Consistent Data Model for Different Data Granularity in Control Charts
Scott Grimshaw, Brigham Young University
After a long-running show was canceled, control charts were used to identify if and when viewing dropped. The finest-granularity data, daily viewing, has high autocorrelation, so the control charts use residuals from a seasonal ARIMA model. For coarser-granularity data (weekly and monthly viewing), an approximate AR model is derived to be consistent with the finest-granularity model. With the proposed approach, a longer-memory model is used in the coarse-granularity control charts, reducing the number of false alarms relative to control charts that treat each level of granularity as a separate measurement.
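The idea of charting residuals from a time-series fit can be sketched with a toy lag-1 autoregression standing in for the seasonal ARIMA model used in the talk. The data below are synthetic: stable viewing followed by a sharp drop.

```python
from statistics import mean, stdev

def ar1_residual_chart(x, L=3.0):
    """Fit an AR(1) via the lag-1 sample autocorrelation and flag
    one-step-ahead residuals falling outside +/- L * sigma."""
    m = mean(x)
    phi = (sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
           / sum((xi - m) ** 2 for xi in x))
    resid = [x[t] - m - phi * (x[t - 1] - m) for t in range(1, len(x))]
    s = stdev(resid)
    signals = [abs(e) > L * s for e in resid]
    return phi, resid, signals

# Synthetic daily viewing: stable around 10 units, then a sharp drop.
viewing = [10.0, 10.1, 9.9] * 10 + [4.0]
phi, resid, signals = ar1_residual_chart(viewing)
```

Charting the residuals rather than the raw series is what keeps the autocorrelation from inflating the false-alarm rate; the paper's contribution is making the coarse-granularity charts consistent with this finest-granularity model.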