# October 4th

### Session 1 (9:15-10AM)

Murat Kulahci, Technical University of Denmark and Luleå University of Technology

Manufacturing has been going through a rejuvenation driven by automation and digitalization, bringing forth a new industrial era, called Industry 4.0 in Europe. For the past several years we have collaborated with companies from various industries that are all going through this transformation towards more digital production, where the goal is to digitalize the information content in production and hence enhance knowledge about their processes. The expectation is that the data, often collected in extensive amounts, should contain the sought-after information. In pursuit of relevant information, many companies from a wide range of industries are now frantically scrambling to collect “more data” from their processes under the wishful thinking that it contains the “necessary information.” It has, however, been our unfortunate and repeated observation that this is done in a frenzy, with little consideration of issues such as: what kinds of problems we would actually like to solve; what kind of data is in fact needed, and how we can collect such data efficiently; how we can handle such data properly; and how we can make sure the extracted information is used most effectively. Instead, this haste to collect as much data as possible often generates more problems than it solves. In response, academics and practitioners alike rush into yet another frenzy of dealing with these problems, which may well be artifacts of poor planning in data collection. Over the years, through our own collaborations, we have gathered a vast number of examples of “bad” experiences with Big Data applications in production statistics. Our goal in this paper is to share those experiences and the lessons learned in dealing with practical issues, from data acquisition to data management and finally to data analytics. When relevant, we also describe the paths we took towards resolving some of those issues.

**Big Data at Owens Corning: A Case Study on Deriving VOC from 1.4 Million Words of Free-Form Text**

Tina V. Pickerel, Owens Corning; Keith Bowers, Bowers Management Analytics

**Motivation.** Owens Corning was not able to effectively analyze the large volume of unstructured documents that made up its product return files. In order to analyze, classify, and act on this information, the company turned to big data techniques.

**Description of Work Done.** We analyzed over 10,000 complaint and product return files from over forty Owens Corning plants around the world. These unstructured text files comprised over 1.4 million words in six languages. We used Natural Language Processing and other big data techniques to classify the text complaints into natural clusters and ranked those clusters by cost and quantity. We then used a Bayesian multilevel model to prioritize improvement efforts across our many manufacturing sites. This combination of big data tools helps set our agenda so that our quality improvement efforts reap maximum benefit. This presentation focuses on the practical uses and benefits of big data tools for the quality professional rather than the technical details of the algorithms.

**Significance.** The case study demonstrates a very effective application of big data techniques to a common problem faced by manufacturing firms: how to analyze and profit from the huge amount of useful information now hidden in their text documents.

**Slack-Variable Versus Mixture Modeling for Mixture Experiments: A Definitive Comparison**

Greg F. Piepel, Dayton C. Hoffmann, and Scott K. Cooley, Pacific Northwest National Laboratory

A mixture experiment (ME) involves (i) combining various proportions of the components in a mixture making up an end product, and (ii) measuring the values of one or more response variables for each mixture. The proportions of the components making up the end product must sum to 1.0. Since the late 1950s, statistical methods for designing MEs and modeling response variables as functions of the component proportions have appeared in the literature. The slack‐variable (SV) approach for designing and modeling MEs was used prior to the existence of ME methods and is still favored by some researchers. The SV approach develops the experimental design and models in terms of proportions of all but one of the components, which is referred to as the SV. The proportion of the SV for a given mixture in an experimental design is obtained by subtracting from 1.0 the sum of the proportions of the remaining components, thus “taking up the slack”. The SV approach uses classical statistical experimental designs and model forms, whereas the ME approach uses designs and models specifically for MEs that account for the proportions of varied components in a mixture summing to 1.0.
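The contrast between the two modeling approaches can be made concrete with the standard quadratic model forms from the ME literature, written here for q = 3 components (a sketch, not the authors' notation):

```latex
% Scheffe quadratic mixture (ME) model for q = 3 components:
% there is no intercept term, because x_1 + x_2 + x_3 = 1.
E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
     + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3

% Slack-variable (SV) quadratic model: designate x_3 as the SV
% (x_3 = 1 - x_1 - x_2) and fit a classical full quadratic
% in the remaining two proportions.
E(y) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2
     + \alpha_{11} x_1^2 + \alpha_{22} x_2^2 + \alpha_{12} x_1 x_2
```

Substituting x₃ = 1 − x₁ − x₂ into the Scheffé form shows that the two full quadratic models span the same space; the practical differences arise in collinearity and in how reduced (subset) models behave, which is exactly the comparison the talk addresses.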

Over the years, several journal articles have discussed and advocated using SV models. These articles (including recent ones published in 2015 and 2016) have claimed SV models have collinearity and goodness-of-fit advantages over ME models. On the other hand, a 2009 article recommended ME models because they can be more appropriate and fit at least as well or better than SV models in four situations where the SV approach is often advocated.

This presentation summarizes the results of a definitive comparison of the SV and ME approaches for modeling. Analytical methods and examples from the literature are used to assess and illustrate the advantages and disadvantages of the SV and ME modeling approaches. The results of the definitive comparison of the ME and SV modeling approaches are used to make recommendations for choosing the ME or SV modeling approach in practice.

### Session 2 (10:30AM-12PM)

**Strategic Design and Analysis for Hosting Data Competitions**

Christine Anderson-Cook, Los Alamos National Laboratory; Lu Lu, University of South Florida; Kary Myers, Los Alamos National Laboratory

Open source data competitions have become very popular as a way to accelerate complex problem solving and advance methodological development by leveraging the depth and breadth of solutions possible from crowdsourcing. While many statisticians have experience as competitors, we focus instead on strategies for the host. Currently, implementations by hosts are highly variable and can sometimes lead to unsatisfying solutions (i.e., learning algorithms). This talk presents new design and analysis approaches to improve learning from hosting data competitions.

First, strategic generation of relevant and informative data is the key to maximizing the learning outcome and finding the best solution that closely matches the study goals. New methods for designing competition data are proposed based on (1) precisely defining the problem, (2) identifying the target competitors, (3) strategically generating data sets to test interpolation and extrapolation to new scenarios of interest, and (4) carefully preventing unintentional artifacts in the competition data sets.

Second, developing a robust and efficient scoring metric is essential for appropriately ranking the competitors to match study goals and choosing winners on the leaderboard. Beyond the leaderboard, we propose a complementary post-competition analysis to improve understanding on more detailed sub-questions regarding similarities and differences between algorithms across scenarios, and universally easy or hard regions of the input space. The combination of the leaderboard and post-analysis can facilitate deeper understanding of competitor solutions.

The methods are illustrated with a current competition to evaluate algorithms capable of detecting, locating, and characterizing radioactive materials, which is of particular interest to the Office of Proliferation Detection (OPD). However, the general principles, methods, and statistical tools are relevant to much broader problems.

**Statistics and Metrology: A Collaboration**

Stephen V. Crowder and David S. Walsh, Sandia National Laboratories

Opportunities for collaboration between statisticians and metrologists are plentiful throughout industry, government agencies and the national laboratories. The statistician has a great deal to offer in these collaborations, including sizing the measurement experiment, identifying appropriate measurement data, and providing technical support in the uncertainty analysis. This talk will discuss a collaboration at Sandia National Laboratories between the Statistical Sciences Department and Sandia’s Primary Standards Lab (PSL) involving the estimation of uncertainty of laboratory neutron measurements.

**Customizing and Assessing Deep Learning for Specific Tasks**

Amir Tavanaei, William Brenneman, and Matthew Barker, Procter and Gamble

Deep learning is inspired by the human brain, where learning in neural networks (NNs) occurs through a feature-discovery hierarchy of many layers of non-linear units called neurons. Recently, deep learning approaches have shown remarkable performance in many areas of pattern recognition. Despite the common view of deep learning as a black box for data processing, different neural architectures and learning methods provide diverse data representations that are interpretable at each layer. Thus, the main step in developing a deep learning model, or more generally a machine learning system, is to assess the type and size of data required for accomplishing a given pattern recognition task using an appropriate model structure.

This presentation discusses three aspects: 1) a brief introduction to different deep learning architectures such as deep NNs, convolutional NNs, deep belief networks, and deep recurrent NNs; 2) problems and data that can be solved and processed by deep learning; and 3) a review of real case challenges that have been (and are being) addressed by deep learning, such as large-scale image recognition, healthcare and biomedical issues, and big data mining and analysis. At the end, we provide a viewpoint to connect the right tool to the right purpose in a set of applications and problems.

**Understanding and Leveraging Today’s Artificial Intelligence and Machine Learning**

Michael Garris, NIST

A new generation of Artificial Intelligence (AI) capabilities based on machine learning (ML) has emerged and is accelerating technological innovations that are laying the groundwork for societal change. But what is fact and what is hype? This talk unpacks the very broad category of AI technologies: it distinguishes today’s “narrow” AI from the “general” AI of the future, describes the different types of problems that can be solved with ML as well as the workflow and lifecycle of developing ML/AI solutions, and discusses the challenges and limitations of today’s AI, while providing real-world use cases of ML/AI at the National Institute of Standards and Technology.

**Fast Computation of Exact G-Optimal Designs Via Iλ-Optimality**

Christopher J. Nachtsheim, University of Minnesota; Lucia N. Hernandez, National University of Rosario

Exact G-optimal designs have rarely, if ever, been employed in practical applications. One reason is that, due to the computational difficulties involved, no statistical software system currently provides capabilities for constructing them. Two algorithms for constructing exact G-optimal designs involving one to three factors have been discussed in the literature: one employing a genetic algorithm and one employing a coordinate-exchange algorithm. However, these algorithms are extremely computer-intensive even for small experiments and do not scale beyond two or three factors. This work develops a new method for constructing exact G-optimal designs using the integrated variance criterion, Iλ-optimality. We show that with careful selection of the weight function, a difficult exact G-optimal design construction problem can be converted to an equivalent exact Iλ-optimal design problem, which is easily and quickly solved. We illustrate the use of the algorithm for full quadratic models in one to five factors.

**Central Composite Experimental Designs for Multiple Responses with Different Models**

Wilmina M. Marget, Augsburg University; Max D. Morris, Iowa State University

Central Composite Designs (CCDs) [Box and Wilson, 1951] are widely accepted and used experimental designs for fitting second-order polynomial models in response surface methods. However, these designs are based only on the number of explanatory variables being investigated. In a multi-response problem where prior information is available, in the form of a screening experiment or previous process knowledge, investigators often know which factors will be used in the estimation of each response. This work presents an alternative design based on CCDs that allows main effects to be aliased for factors that are not related to the same response. By taking this prior information into account, the design requires fewer runs than current designs, saving investigators both time and money.

### Session 3 (2-3:30PM)

**A Recommended Set of Indices for Evaluating Process Health**

Kevin White, Eastman; Willis Jensen, WL Gore; John Szarka, WL Gore

Assessment of process health is an important aspect of any quality system. While control charts are often used in the day-to-day operations of processes, it is often helpful to take a retrospective look at past performance in the form of an overall process health assessment to identify specific opportunities for future improvement efforts. This presentation will recommend a complete set of indices to evaluate various aspects of process health including actual process performance, process stability, process centering, and potential process capability. It will be shown how the set of indices can be particularly useful when there are many processes to evaluate simultaneously. Both tabular and graphical methods using these indices will be shown for quick identification of potential problem areas that warrant more investigation. It will be demonstrated how these metrics can be used when there are both two-sided specifications and one-sided specifications (with and without a defined target). Finally, the presentation will explore the connection between the process and measurement system and recommend indices for evaluating the health of the measurement system.

**Specification Setting – An Adaptive Approach**

Brad Evans, Pfizer Research and Development

Specification setting is a critical component in the development of every new pharmaceutical product. Many specifications are set based on compendial, clinical, safety, and efficacy limits. However, there are times when specifications are set based on non-clinical data collected during development and manufacturing transfer. Statistics plays an important role here, leveraging the knowledge gained through development and, in particular, the analytical data collected from relevant-scale batches. Setting data-driven specifications is challenging due to the sample size available at the time the specification is required. This presentation gives an overview of Tolerance Intervals and an adaptive approach based on them, with consideration given to both large- and small-sample scenarios, as well as producer risk vs. consumer risk, particularly within a highly regulated landscape. The approach, a comparison to control charts, and summary measures will be presented to help understand the potential risks under different scenarios.

Chris Gotwalt, JMP Division of the SAS Institute; Philip J. Ramsey, University of New Hampshire

There are two different goals in statistical modeling: explanation and prediction. Explanatory models often predict poorly (Shmueli, 2010). Analyses of designed experiments (DOEs) are often explanatory, yet the experimental goals are predictive. Predictive modeling exercises typically partition the data into training and validation sets, where the training set is used for fitting models and the validation set for an independent assessment of goodness of fit. Most DOEs have too few observations to form a validation set, precluding direct assessment of prediction performance; furthermore, the time and resources available for a DOE often do not allow for an additional set of validation trials. We demonstrate a “Balanced Auto-Validation” technique that uses the original data to create two nearly identical copies of that data, one a training set and the other a validation set; these two sets differ only in how the rows are weighted. The weights for corresponding rows in the training and validation sets are identically gamma distributed, but are constructed so that these pairs of weights are highly anticorrelated. In this way, observations that contribute more to the training copy of the data contribute less to the validation copy (and vice versa). A bootstrapping-like simulation is employed: at each simulation step a new set of weight pairs is created and a variable selection procedure is applied to the data, with the parameters estimated using the training weights and the model selection criterion being the weighted sum of squares using the validation weights. Although our focus is on least squares-based models in the context of the analysis of designed experiments, the technique is very general in that it can extend the applicability of many predictive modeling techniques to the smaller datasets common to laboratory and manufacturing studies. Two bio-pharma process development case studies are used to demonstrate the feasibility of the approach. Both datasets are Definitive Screening Designs accompanied by a fairly large number of confirmation runs.
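One way to generate such anticorrelated, identically gamma-distributed weight pairs is sketched below. The construction here (w = −ln(1 − U) for the training copy and w = −ln U for the validation copy, with U uniform) gives Exponential(1) — i.e., Gamma(1, 1) — marginals and strong negative dependence; it is an illustrative assumption, and the authors' exact construction may differ:

```python
import math
import random

def autovalidation_weights(n, seed=1):
    """Paired (training, validation) row weights for auto-validation.

    Each weight is marginally Exponential(1) -- a gamma distribution --
    and the pair for a given row is strongly anticorrelated: a row that
    weighs heavily in the training copy weighs lightly in the
    validation copy, and vice versa.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u in (0, 1)
        train.append(-math.log(1.0 - u))  # Exponential(1) weight
        valid.append(-math.log(u))        # anticorrelated partner
    return train, valid

train_w, valid_w = autovalidation_weights(1000)
```

For this construction the theoretical correlation between the paired weights is 1 − π²/6 ≈ −0.64, so rows are pushed strongly toward one copy or the other on each simulation draw.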

**Predictive Response Surface Models: To Reduce or Not to Reduce?**

Byran Smucker, Miami University; David J. Edwards, Virginia Commonwealth University; Maria Weese, Miami University

In classical response surface methodology (RSM), the final optimization step uses a small number of factors that clearly drive the process under study. In practice, however, experimenters sometimes fit a second-order model without having done much previous experimentation. In this case the true model is uncertain, and using the full model may lead to overfitting. In this study, we obtain 25 responses from 12 response surface studies in the RSM literature, each of which includes published validation runs. We analyze the original RSM experiments under several strategies, including the full second-order model, reduction via p-values, forward selection, and the Lasso. We then compare the predictions from these various methods with the actual validation responses to determine which method predicts best. We also study the same analysis methods using simulated data.

**Condition-Based Maintenance Policy under Gamma Degradation Process**

David Han, University of Texas at San Antonio

(Presentation unavailable. Please contact the author for further information about talk.)

As a part of System Health Management, Condition-Based Maintenance utilizes modern sensor technology for periodic inspections of a system; unlike traditional time-based methods, maintenance actions are based on the inspected working condition of the system. It is an effective way to reduce unexpected failures as well as operations and maintenance costs. In this work, we discuss a condition-based maintenance/replacement policy with optimal inspection points under a gamma degradation process. A random-effect parameter is used to account for population heterogeneity, and its distribution is continuously updated at each inspection epoch. The observed degradation level, along with the system age, is utilized for making the optimal maintenance decision. Aiming to minimize the total discounted operational costs, we investigate the structural properties of the optimal policy and determine the optimal inspection intervals.

**Estimating the Uncertainty of the Change in Holdup Inventory**

Stephen Croft and Philip Gibbs, Oak Ridge National Laboratory

One of the challenging aspects of Nuclear Material Accountancy and Control (NMAC) is monitoring and evaluating in-process nuclear material inventories. Some fraction of process throughput inevitably remains in the process as holdup, where it is difficult to measure. The amounts and statistical uncertainty of holdup can be large enough to completely dominate NMAC inventory loss detection limits. This paper discusses the statistical approaches and methods used to determine the uncertainty in the amounts of, and change in, holdup inventory between two successive periods. The amount of holdup will vary based on process design and the chemical form of the material. Ideally it is minimized through cleanout procedures; however, a certain amount of fixed holdup (material that is very difficult or impossible to clean out) will remain. History has shown that ingrowth to an equilibrium “fixed” holdup amount can never truly be assumed in an operating line; it is more likely that holdup varies over time, including the possibility of monotonic growth. The fixed holdup is periodically determined by an established nondestructive assay (NDA) measurement procedure. For loss evaluations, holdup inventory enters the inventory equation through the holdup terms in the expression below [see page 2.51, NUREG/CR-2935]:

Loss detection for a process unit = [Receipts (loss detection) − Shipments (loss detection)] + [Uncertainty of beginning holdup − Uncertainty of ending holdup]

These methods are part of NMAC, an integrated approach to nuclear safety, security, and safeguards intertwined with facility operations. NMAC monitors nuclear inventories to provide detection against unauthorized removals for illicit purposes.

# October 5th

### Session 4 (8-9:30AM)

**Sequential Bayesian Design for Accelerated Life Tests**

Yili Hong, Virginia Tech

Most recently developed methods for optimum planning of accelerated life tests (ALTs) involve “guessing” values of the parameters to be estimated and substituting those guesses into the proposed solution to obtain the final test plan. In reality, such guesses may be very different from the true parameter values, leading to inefficient test plans. To address this problem, we propose a sequential Bayesian strategy for planning ALTs and a Bayesian estimation procedure for updating the parameter estimates sequentially. The proposed approach is motivated by ALTs for polymer composite materials, but is generally applicable to a wide range of testing scenarios. Through the proposed sequential Bayesian design, one can efficiently collect data and then make predictions for field performance. We use extensive simulations to evaluate the properties of the proposed sequential test planning strategy and compare it to various traditional non-sequential optimum designs. Our results show that the proposed strategy is more robust and efficient than existing non-sequential optimum designs. The supplementary material for this paper is available online.

**Comparing Two Kaplan-Meier Curves with the Probability of Agreement**

Nathaniel Stevens, University of San Francisco

The probability of agreement has been used as an effective strategy for quantifying the similarity between the reliability of two populations. In contrast to the p-value approach associated with hypothesis testing, the probability of agreement provides a more realistic assessment of similarity by accounting for what constitutes a practically important difference. This talk discusses two approaches for assessing the probability of agreement and its associated uncertainty when comparing Kaplan-Meier curves that estimate the reliability of two populations. The first approach provides a convenient assessment based on large-sample approximations. The second offers more precise estimation using the nonparametric fractional random-weight bootstrap. Both methods are illustrated with examples in which comparing the reliability curves of related populations is of interest.
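The ingredients can be sketched with plain Python: a Kaplan-Meier estimator plus a simple plug-in agreement summary over a time grid. The `agreement_fraction` helper and the tolerance `delta` are illustrative stand-ins, not the talk's exact probability-of-agreement formulation:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate; returns a list of (t, S(t)).

    times: observed times; events: 1 = failure, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, out = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for (tt, e) in data if tt == t)   # failures at t
        c = sum(1 for (tt, e) in data if tt == t)   # all leaving the risk set at t
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            out.append((t, surv))
        n_at_risk -= c
        i += c
    return out

def survival_at(km, t):
    """Step-function lookup of S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for (tt, ss) in km:
        if tt <= t:
            s = ss
    return s

def agreement_fraction(km1, km2, grid, delta):
    """Fraction of grid times where the two curves agree within delta
    (a crude plug-in stand-in for the probability of agreement)."""
    agree = sum(1 for t in grid
                if abs(survival_at(km1, t) - survival_at(km2, t)) <= delta)
    return agree / len(grid)
```

In the fractional random-weight bootstrap mentioned in the abstract, each resample would reweight the observations with continuous random weights and recompute the curves, yielding an uncertainty band for the agreement summary.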

**Introduction to the Design and Analysis of Order-of-Addition Experiments**

The order in which components are added in a chemical batch, paint formulation, film, food product, or a study of protein transport may be a primary consideration in an experiment, especially in its earliest stages. Until our recent work, little research had been done on the design of such order-of-addition (OofA) experiments. We define a reference standard for OofA experiments by extending the idea of orthogonal arrays. For strength-2 designs, on which we focus most of our attention, we find that OofA orthogonal arrays require *N* ≡ 0 (mod 12) runs when the number *m* of components exceeds 3. We consider a χ² criterion to measure the balance of an OofA array, and show that for strength-2 designs, OofA OAs (corresponding to χ² = 0) are essentially equivalent to D-optimal designs. In many situations a number of non-isomorphic designs exist; in such cases we use additional criteria, including their strength-3 properties, as finer measures of quality. We then extend these optimal OofA designs to incorporate standard process variables as well, so that, for example, temperature or mixing speeds may be included. Our methods can also take into account natural restrictions the experimenter may have, such as requiring that one component always be added before another. Finally, we provide examples of the analysis of three related OofA experiments.

**Design and Analysis for Order-of-Addition Experiments: Some Recent Advances**

Dennis Lin, Pennsylvania State University

In Fisher (1971), a lady was able to tell (by tasting) whether the tea or the milk was added to the cup first. This is probably the first popular order-of-addition experiment. The question to be addressed is how to determine the “optimal” order of addition. There, only two (2! = 2) order sequences are possible, so the problem is relatively simple. In general, there are m required components and we hope to determine the optimal sequence for adding these m components. There are in total m! potential orders to be tested (note that, for example, when m = 10, m! = 10! = 3,628,800, making it impossible to test them all). Order of addition is an important factor in many areas, including the medical, food, and film industries, and knowing the optimal order of addition of components in production is crucial. However, studies of this important subject are rather limited in the statistical literature; this is a relatively new area for industrial statistics. It is anticipated that this session will open a new Pandora’s box for design of experiments.
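The combinatorial explosion, and one common way the recent OofA literature tames it, can be sketched in a few lines: rather than treating each of the m! orders as a separate level, the pairwise-order (PWO) model codes each order by which of the C(m, 2) component pairs appear in which relative order. A minimal sketch (the `pwo_row` helper is an illustrative name, not from the talk):

```python
from itertools import permutations

def pwo_row(order):
    """Pairwise-order (PWO) indicators for one addition order:
    +1 if component i is added before component j, else -1,
    for every pair i < j."""
    m = len(order)
    pos = {c: k for k, c in enumerate(order)}  # position of each component
    return [1 if pos[i] < pos[j] else -1
            for i in range(m) for j in range(i + 1, m)]

orders = list(permutations(range(4)))  # all 4! = 24 addition orders
X = [pwo_row(o) for o in orders]       # 24 x C(4,2) = 24 x 6 model matrix
```

A fractional design then needs only enough orders to estimate the C(m, 2) pairwise effects well, instead of observing all m! sequences.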

**Bias/Variance Trade-Off in Estimates of a Process Parameter Based on Temporal Data**

Tricia Barfoot, Emmertros Ltd; Stefan Steiner, University of Waterloo

In the analysis of performance data over time, common objectives are to compare estimates of the current mean value of a process parameter with a target, over levels of the covariates, across multiple streams, and over time. When samples are taken over time, we can make the desired estimates using only the present-time data or an augmented dataset that includes historical data. However, when the characteristic is drifting over time and sample sizes are small, the decision to include historical data trades precision for bias in the present-time estimates. We propose an approach that regulates the bias-variance tradeoff using Weighted Estimating Equations, where the estimating equations are based on a suitable Generalized Linear Model adjusting for the levels of the covariates. A customer loyalty survey for a smartphone vendor will be presented, and the resulting present-time estimates of Net Promoter Score will be compared across various approaches applied to example and simulated data.

**DP-Optimality in Terms of Multiple Criteria and Its Application to the Split-Plot Design**

Shaun S. Wulff, University of Wyoming

Choosing an optimal design is inherently a multi-criteria problem. This research develops methodology for selecting optimal designs with respect to multiple conflicting criteria. A Pareto approach is used that conveniently allows the experimenter to evaluate the design trade-offs without having to check all the criteria individually and without having to specify weighted combinations of the criteria. The methodology is demonstrated for incorporating pure error, along with traditional design criteria, for choosing optimal completely randomized designs (CRDs). The proposed approach also allows extension of the criteria for purposes of incorporating pure error degrees of freedom into the selection of optimal split-plot designs (SPDs). Results will be compared to optimal designs presented in the literature.
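The core of a Pareto screen like the one described is easy to state in code: given candidate designs scored on several criteria, keep only the nondominated ones and present those trade-offs to the experimenter. The sketch below uses hypothetical scores and a larger-is-better convention; it is an illustration of the Pareto idea, not the talk's implementation:

```python
def pareto_front(scores):
    """Return indices of nondominated candidates.

    scores: list of tuples of criterion values, larger is better.
    A candidate is dominated if some other candidate is at least as
    good on every criterion and strictly better on at least one.
    """
    front = []
    for i, s in enumerate(scores):
        dominated = any(
            all(o[k] >= s[k] for k in range(len(s))) and
            any(o[k] > s[k] for k in range(len(s)))
            for j, o in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front

# Four hypothetical designs scored on (D-efficiency, pure-error df):
designs = [(0.95, 3), (0.90, 5), (0.92, 5), (0.80, 2)]
```

Here `pareto_front(designs)` keeps the first and third designs: neither beats the other on both criteria, so the experimenter weighs the trade-off directly instead of committing to a weighted combination up front.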

### Session 5 (10-11:30AM)

**FPL: Power Delivery’s Powerful Predictors**

Daniel Barbosa and Yinuo Du, Florida Power & Light

(Presentation unavailable. Please contact the authors for further information about talk.)

**Our Story.** Over the last 100 years, utilities have always initiated their restoration processes after an outage occurs. Our team challenged that model by leveraging data from our smart meters. Equipment condition data from the meters has allowed us to actually predict power outages and respond before they occur.

**Analysis.** The team analyzed millions of data points from the smart meters with conventional statistical tools. However, the team could not determine any pattern correlations to detect an intermittent customer power outage. Intermittent power outages are very infrequent and difficult to detect, producing an imbalanced data set.

**Solutions.** We were inspired by other world-class companies that had made headlines with their innovative techniques of using data to anticipate their customers’ preferences. The team used data mining approaches to develop decision tree models and create a predictive algorithm.

**Results.** The predictive algorithm had high accuracy in detecting intermittent power outages. Not only did we detect these outage types, but we could now proactively prevent unplanned customer outages. We automated and operationalized the algorithm into our daily restoration processes, creating more than 2,000 proactive customer activities in 2015.

**Storm Outage Forecasting: Handling Uncertainty and Dealing with Zero-Inflation**

Seth Guikema, University of Michigan

There have been substantial advances in power outage forecasting for storms in the past decade. For some types of storms, such as hurricanes, modeling efforts are relatively mature and models are used operationally in practice. However, substantial challenges remain. The first is uncertainty in the forecasts. Any prediction of the impacts of an event made prior to the event is inherently uncertain, yet many existing models do not account for and represent this uncertainty. This remains a significant technical challenge, and this talk presents ongoing work in this area. The second major challenge is zero-inflation in the outage data. Zero-inflation occurs when a large fraction of the spatial units has zero outages; this is a common problem, even for high-impact events, if the modeling is done at a fine spatial scale. Zero-inflated input data can lead to situations in which standard error metrics (MSE, MAE, raw accuracy, etc.) are substantially misleading. This talk also presents work being done to address the zero-inflation problem common in outage data.
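The point about misleading metrics is easy to demonstrate with a toy zero-inflated data set (hypothetical numbers): a trivial model that always predicts "no outages" scores very well on raw accuracy and MAE while forecasting nothing useful about where outages will occur.

```python
# 1,000 spatial grid cells; 95% experience zero outages (zero-inflated).
actual = [0] * 950 + [3, 1, 7, 2, 1] * 10   # 50 cells with outages

# A useless "model" that always forecasts zero outages everywhere:
predicted = [0] * len(actual)

# Raw accuracy looks excellent, and MAE looks tiny, purely because
# zeros dominate -- the model identifies zero affected cells.
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Here accuracy is 95% and MAE is 0.14 outages per cell, even though the forecast misses every single outage, which is why zero-inflation-aware models and metrics are needed.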

**Difficulties with Applied Statistics in DoD: Practical Solutions to Limitations in Testing**

Francisco Ortiz, The Perduco Group

(Presentation unavailable. Please contact the author for further information about talk.)

Although testing is the foundation for producing effective products within the Department of Defense, many difficulties arise from the nature of the systems involved. Whether the obstacle is budgetary constraints or complex systems with disallowed combinations, creative solutions are required to produce meaningful test results. These solutions must provide rigorous and defensible results in order to field systems, despite the limitations that exist. This session aims to highlight some of these difficulties and provide practical solutions to the problems that arise in DoD testing.

**Power Approximations for Failure-Time Regression Models**

Rebecca Medlin, Thomas Johnson, and Laura Freeman, Institute for Defense Analyses

Reliability experiments determine which factors drive product reliability. Often, the reliability or lifetime data collected in these experiments follow distinctly non-normal distributions and typically include censored observations. The experimental design should accommodate the skewed nature of the response and allow for censored observations, which occur when products do not fail within the allotted test time. To account for these design and analysis considerations, Monte Carlo simulations are frequently used to evaluate design properties for reliability experiments. Simulation provides accurate power calculations as a function of sample size, allowing researchers to determine adequate sample sizes at each level of the treatment. However, simulation may be inefficient and cumbersome for comparing multiple experiments of various sizes. We present a closed-form approach for calculating power, based on the non-central chi-squared approximation to the distribution of the likelihood ratio statistic.
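The closed-form idea can be sketched as follows (a minimal illustration under standard large-sample assumptions, not the authors' exact formulation): under the alternative, the likelihood ratio statistic is approximately non-central chi-squared, so power is the tail probability of that distribution beyond the central chi-squared critical value.

```python
from scipy.stats import chi2, ncx2

def lr_power(noncentrality, df=1, alpha=0.05):
    """Approximate power of a likelihood ratio test: under the alternative,
    the LR statistic ~ non-central chi-squared(df, noncentrality)."""
    crit = chi2.ppf(1 - alpha, df)          # central chi-squared critical value
    return 1 - ncx2.cdf(crit, df, noncentrality)

# The noncentrality parameter typically scales linearly with sample size,
# so power for different run sizes follows from one formula, no simulation.
print(round(lr_power(noncentrality=7.85), 2))  # about 0.80 for df=1, alpha=0.05
```

Because the noncentrality grows with sample size, sweeping this function over candidate run sizes replaces a separate Monte Carlo study for each design.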

Sara R. Wilson, NASA; Kurt A. Swieringa, NASA; Robert D. Leonard, VCU; Evan Freitag, VCU; David J. Edwards, VCU

This article presents a statistical engineering approach for clustering aircraft trajectories. The clustering methodology was developed to address the need to incorporate more realistic trajectories in fast-time computer simulations used to evaluate an aircraft spacing algorithm. The methodology is a combination of Dynamic Time Warping and k-Means clustering, and can be viewed as one of many possible solutions to the immediate problem. The implementation of this statistical engineering approach is also repeatable, scalable, and extendable to the investigation of other air traffic management technologies. Development of the clustering methodology is presented in addition to an application and description of results.
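The dynamic time warping distance underlying the clustering step can be sketched with its classic dynamic-programming recursion (a minimal one-dimensional version; the k-means loop, centroid averaging, and trajectory preprocessing are omitted):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences, computed by the
    standard dynamic program over all monotone alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw_distance([0, 1, 2], [0, 1, 2]))        # 0.0 -- identical paths
print(dtw_distance([0, 1, 2], [0, 0, 1, 1, 2]))  # 0.0 -- same shape, time-warped
```

The second call illustrates why DTW suits trajectories: two aircraft flying the same path at different speeds compare as similar, which a pointwise Euclidean distance would not capture.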

**Prioritization of Stockpile Maintenance with Layered Pareto Fronts**

Sarah Burke, The Perduco Group; Christine Anderson-Cook, Los Alamos National Laboratory; Lu Lu, Los Alamos National Laboratory; Doug Montgomery, Arizona State University

Difficult choices must be made in any decision-making process where resources and budgets are increasingly constrained. This paper demonstrates a structured decision-making approach using layered Pareto fronts to prioritize the allocation of funds among munitions stockpiles based on estimated reliability, urgency, and consequence metrics. The case study illustrates the process of identifying appropriate metrics that summarize the important dimensions of the decision, then eliminating non-contenders from further consideration in an objective stage. The final subjective stage incorporates decision-makers' priorities to select the four stockpiles to receive funds, based on an understanding of the trade-offs and robustness to user priorities.
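The layered-front construction can be sketched as repeated non-dominated sorting (a generic illustration with hypothetical two-metric scores where larger is better; the case study's actual metrics and layering rules may differ):

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every metric and
    strictly better on at least one (larger is better here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_layers(points):
    """Peel off successive Pareto fronts: layer 1 is the non-dominated set,
    layer 2 is the non-dominated set among the remainder, and so on."""
    remaining, layers = list(points), []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q != p)]
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers

# Hypothetical (reliability, urgency) scores for five stockpiles:
layers = pareto_layers([(0.9, 0.2), (0.7, 0.7), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)])
print(layers[0])  # the non-dominated trade-off points survive the objective stage
```

Only the top layers carry forward to the subjective stage, where decision-maker priorities break the remaining trade-offs.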

### Session 6 (1:30-3PM)

**The Art of Teaching and Communicating Design of Experiments to Non-Statisticians**

Shari Kraber, Stat-Ease

Drawing from over 20 years of experience teaching design of experiments to non-statisticians and consulting with clients on DOE projects, Shari shares some lessons learned. She will provide several entertaining analogies used to help clients understand statistical concepts. (For engineers and chemists, golfing and helicopters can provide unique inspiration, thus overcoming those alpha and beta obstacles!) Discover some of the key teaching points that help non-statisticians overcome their fear of statistics and allow them to propagate good DOE practices. Communication being the key to success, Shari will finish up with tips and tricks for presenting experimental results compellingly to managers and clients.

**Small Statistics, Big Data Curriculum**

Chad Foster, GE Aviation

There is increased demand for big-data and machine learning instruction. These classes frequently focus on database knowledge and algorithm implementation, and many neglect the statistical methods needed to get the most out of the resulting analytics. This lack of rigor is evident in predictions that never achieve their expected accuracy. Most instructors and practitioners shared the belief that the traditional statistics field did not have sufficiently updated tools for the big-data environment. Drawing on feedback from existing classes and lessons from the development of hundreds of predictive models, several traditional statistical methods were identified as critical to future analytics success. These methods were integrated with traditional data science content to create a new course. The resulting course, combining data science fundamentals with statistical methods, has been extremely popular and is continuously oversubscribed. This presentation introduces the selected methods along with examples of their use within the predictive analytics environment. The major areas addressed are hypothesis testing, data veracity, significance methods, ranking parameters, building models, and displaying results. No new methods are introduced; rather, an updated link is made between these methods and big-data machine learning approaches in a hardware-based analytics environment. The methods presented are standard statistical approaches and can be applied in numerous settings: they were initially applied in fielded machine predictive analytics, and there are now examples in manufacturing and business processes. Other areas that can benefit from these standard statistical methods are also presented.

When approaching the future of big data and its applications to machines, processes, and people, there is a clear place for traditional statistical procedures that should not be neglected.

**A Practical Framework for the Design and Analysis of Experiments with Interference Effects**

Katherine Allen Moyer and Jonathan Stallrich, North Carolina State University

Interference models are employed when a treatment applied to one experimental unit potentially interferes with the surrounding units. Interference effects are possible in a number of application areas, including pharmaceutical studies that apply a sequence of treatments to a patient over time. While including washout periods or buffers can mitigate interference effects, there are situations where interference cannot be fully avoided, such as experiments with constraints on the available time or space. Interference designs have the potential to save experimenters time and money and to answer nuanced questions with fewer experimental units. However, we feel that many of the currently employed designs and active areas of research for interference designs do not mirror how this problem presents itself to the practitioner. In this talk, we propose a framework that accounts for more general design questions associated with these types of experiments. This framework motivates a new design criterion that allows experimenters to reflect the reality of interference when choosing a design. Examples demonstrate the framework, and an exchange algorithm is used to find the optimal design under these scenarios.

A. Valeria Quevedo, Carlos Chang, Gerardo Chang, Edgar Rodriguez, Jenny Sanchez, Susana Vegas, and Geoff Vining, Virginia Tech

The International Roughness Index (IRI) is a standard worldwide indicator of road surface roughness and supports the evaluation and management of road performance. A variety of instruments with different precision levels calculate and estimate IRI measurements. High-precision instruments (called class 1 or class 2) have the disadvantage of being costly or having poor productivity. There are also low-cost, highly available instruments, such as smartphones, that may not give good estimates. Smartphone applications estimate the IRI through regression equations based on road condition. The literature shows that, under controlled conditions, there are good correlations between IRI measurements from class 1 or 2 instruments and IRI measurements from smartphones. However, these regression models are based on experiments considering factors that differ from the Peruvian reality, such as a different car fleet composition and maintenance strategies, among others. In addition, smartphone applications estimate IRI at reference speeds higher than those regulated for Peruvian urban traffic. The objective of this study is to calibrate the regression models used by the Roadroid smartphone application to estimate the IRI in urban areas of Peru, comparing Roadroid's IRI measurements against measurements acquired by a MERLIN road roughness measuring machine. A designed experiment will be conducted according to the factors suggested by the literature, along with additional locally based factors, to calibrate the models.
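At its simplest, the calibration step amounts to fitting a correction model of smartphone readings against the reference instrument (a minimal least-squares sketch with made-up paired readings; the study's actual model forms, factors, and data will differ):

```python
# Hypothetical paired readings: Roadroid IRI (x) vs. MERLIN IRI (y), in m/km.
x = [2.1, 3.4, 4.0, 5.2, 6.8]
y = [2.5, 3.9, 4.6, 5.9, 7.5]

# Ordinary least squares for y = intercept + slope * x.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar

def calibrated_iri(roadroid_iri):
    """Map a smartphone reading onto the MERLIN reference scale."""
    return intercept + slope * roadroid_iri

print(round(slope, 3), round(intercept, 3))
```

A designed experiment earns its keep here by varying the local factors (vehicle type, speed, pavement condition) so the fitted correction holds across the conditions actually encountered on Peruvian urban roads.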

Arman Sabbaghi, Purdue University; Qiang Huang, University of Southern California

Geometric shape deviation models constitute an important component in quality control for additive manufacturing (AM) systems. However, specified models have a limited scope of application across the vast spectrum of processes in a system that are characterized by different settings of process variables, and across the disparate classes of shapes that are of interest for manufacture. A methodology that can make full use of data collected on different shapes and processes, and reduce the haphazard aspect of traditional statistical model building techniques, is necessary in this context. We develop a new Bayesian procedure, based on the concepts of effect equivalence and modular deviation features, that incorporates all available data for the systematic and comprehensive construction of shape deviation models in an AM system. Our methodology is applied to dramatically facilitate modeling of the multiple deviation profiles that exist in cylinders with different types of cavities. Ultimately, our Bayesian approach connects different processes and shapes to provide a unified framework for geometric quality control in AM systems.

**Applying Monte Carlo Logic Regression to the Drug – Adverse Event Association Study**

Minh Pham, Feng Cheng, Kandethody Ramachandran, University of South Florida

One of the objectives of the U.S. Food and Drug Administration (FDA) is to protect public health through post-marketing drug safety surveillance. An inexpensive and efficient way to inspect post-marketing drug safety is to apply data mining algorithms to Electronic Health Records to discover associations between drugs and adverse events. The FDA Adverse Event Reporting System (FAERS) is a rich data source for this purpose, with more than 17 million records of drugs taken and more than 14 million records of adverse events observed. All of the common methods for this problem have a univariate structure, using only the count of one drug and one adverse event to calculate association measures. These methods are therefore prone to false positives when certain drugs are usually co-prescribed. For instance, if drug A and drug B are commonly co-prescribed and only drug A causes an adverse event, univariate methods are likely to give a positive detection for drug B as well. We propose a Monte Carlo Logic Regression (MCLR) approach, which has been used in genome-wide association studies to detect associations between genomic sequences and diseases and to automatically detect interactions between genomic sequences. Due to unique characteristics of the drug-adverse event association study, we needed to modify the original MCLR algorithm. Our comparison study using the FAERS database shows that the modified MCLR algorithm outperforms the common methods when dealing with commonly co-prescribed drugs and can automatically detect drug interactions without the need to specify them in a model.
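The co-prescription artifact the authors target is easy to reproduce with a univariate disproportionality measure such as the reporting odds ratio (the 2x2 counts below are fabricated for illustration): drug B never causes the event, but because it is always co-prescribed with drug A, its contingency table inherits A's signal.

```python
def reporting_odds_ratio(a, b, c, d):
    """Univariate signal measure: odds of the event among reports with the
    drug (a:b) divided by the odds among reports without it (c:d)."""
    return (a / b) / (c / d)

# Toy counts: 200 reports mention drug B, always co-prescribed with drug A,
# which truly causes the event 40% of the time; the background rate is 5%.
with_B_event, with_B_no_event = 80, 120        # B inherits A's event rate
without_B_event, without_B_no_event = 40, 760  # background reports

ror_B = reporting_odds_ratio(with_B_event, with_B_no_event,
                             without_B_event, without_B_no_event)
print(round(ror_B, 2))  # far above 1: a spurious signal for the innocent drug B
```

A multivariate method such as logic regression can instead attribute the events to the A-and-B combination, leaving no residual signal for B alone.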