October 9th

Session 1 (9:15-10AM)

 

AI, BI & SI—Artificial Intelligence, Biological Intelligence and Statistical Intelligence

Dennis Lin, Purdue University

Artificial Intelligence (AI) is clearly one of the hottest subjects these days. Basically, AI employs a huge number of inputs (training data), super-efficient computer power/memory, and smart algorithms to achieve its intelligence. In contrast, Biological Intelligence (BI) is a natural intelligence that requires very little or even no input. This talk will first discuss the fundamental issue of input (training data) for AI. After all, not-so-informative inputs (even if they are huge) will result in not-so-intelligent AI. Specifically, three issues will be discussed: (1) input bias, (2) data right vs. right data, and (3) sample vs. population. Finally, the importance of Statistical Intelligence (SI) will be introduced. SI sits somewhere between AI and BI. It employs informative sample data, solid and theoretically proven statistical inference/models, and natural intelligence. In my view, AI will become more and more powerful in many senses, but it will never replace BI. After all, it is said that “The truth is stranger than fiction, because fiction must make sense.”

 

Quality by Design for the Era of Precision Medicine

Julia O’Neill, Direxa Consulting LLC

The pharmaceutical industry is adopting the Quality by Design (QbD) paradigm after a long history of relying on Quality by Inspection. Recent accelerated approval pathways for vaccines, gene therapies, and other pioneering biotherapies have been supported by the deep connection between knowledge and statistical methods that is the foundation of QbD. The transition from conventional to modern development requires fundamental change. Many established practices in pharmaceutical statistics represent doing the “wrong thing right”. Statistical strategies must be modernized to do the “right thing right” instead.

Statistical thinking provides insights to establish effective quality control strategies. Advances in analytical testing have delivered a suite of methods better able to characterize products. However, the traditional Quality by Inspection focus has mandated functional and clinical testing. Functional assays may provide an intuitively direct link to safety or efficacy, but older methods such as cell-based potency assays are often handicapped by inherently lower precision and accuracy relative to newer methods. The transition to modern methods for analytical characterization can be unfamiliar and complex. Examples from recent successful submissions will be introduced to illustrate possibilities in this new era, and to stimulate discussion about Quality by Design thinking to bridge the gap between familiar and novel control strategies.

 

Calibration and Uncertainty Quantification for Estimating Topographic Speedup Factors with CFD Models

Adam L. Pintar, NIST

During extreme wind events, such as hurricanes, the wind can be significantly accelerated by hilly or mountainous terrain. This topographic influence on wind fields has been identified as a major contributor to catastrophic damage and loss of life, as evidenced by past hurricanes, such as Iniki and Maria, which severely impacted Hawaii and Puerto Rico, respectively. Buildings located in mountainous regions could experience wind speeds that are nearly double those in flat terrain. Thus, estimating this speedup is essential to determining wind loads for designing buildings to resist such extreme wind events. Wind tunnel experiments are often used to investigate topographic effects on wind speeds; however, they face challenges in accurately generating approach flows, representing complex interactions between the natural landscape and constructions at small scales, and the spatial resolution of measurements. Computational fluid dynamics (CFD) simulations provide an alternative without the scale, flow field, or resolution limitations. However, the use of CFD simulations for this purpose requires the calibration of the simulation input parameters as well as statements about uncertainty.

This work combines CFD simulations and data from wind tunnel experiments to estimate topographic speed-up factors (TSFs) for generic topographic features. The TSF is the ratio of wind speed over topography to wind speed over flat land. The typical workflow for the calibration of computer experiments is followed. Gaussian process (GP) surrogates are created for the CFD simulations that mimic the wind tunnel experiments, one with a bare floor and one with topographic features. Maximum projection and maximin designs are utilized to develop the surrogates. The simulation output is two-dimensional (wind speed and turbulence) and functional (over 2D space). For the bare-floor surrogate, five tuning parameters are considered, and for the topographic surrogate, a sixth is added. The tuning parameters of each surrogate are optimized to match wind tunnel measurements, and additional GPs are used to assess the discrepancy between the surrogate predictions and the wind tunnel measurements. The result is a calibrated estimate of the TSF for a full 2D cross section of the wind tunnel. The uncertainty statements account for the finite sample used to construct the surrogates, the limited number of wind tunnel measurements, and noise in the wind tunnel measurements.
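
As a rough illustration of this workflow (not the authors' code or data), the sketch below fits a GP surrogate to hypothetical simulator runs over one spatial input and one tuning parameter, then calibrates the tuning parameter by matching the surrogate to noisy wind-tunnel-style measurements. The function `cfd_speed` and all numbers are made up.

```python
# A minimal calibration sketch with a hypothetical simulator and invented data.
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def cfd_speed(z, theta):
    # Stand-in "CFD" output: wind speed at height z for tuning parameter theta.
    return np.log1p(10 * z) * (1.0 + 0.3 * theta)

design = rng.uniform(size=(40, 2))          # columns (z, theta); space-filling in practice
y_sim = cfd_speed(design[:, 0], design[:, 1])

surrogate = GaussianProcessRegressor(ConstantKernel() * RBF([0.2, 0.5]), alpha=1e-6)
surrogate.fit(design, y_sim)                # GP surrogate for the simulator

# Hypothetical wind-tunnel measurements at a few heights (true theta near 0.6, noisy).
z_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y_obs = cfd_speed(z_obs, 0.6) + rng.normal(scale=0.02, size=z_obs.size)

def sse(theta):
    pred = surrogate.predict(np.column_stack([z_obs, np.full_like(z_obs, theta)]))
    return np.sum((y_obs - pred) ** 2)

theta_hat = minimize_scalar(sse, bounds=(0, 1), method="bounded").x
print("calibrated tuning parameter:", round(theta_hat, 3))
# The TSF would then be estimated as the ratio of calibrated predictions over
# topography to those over the bare floor at matched locations.
```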

Session 2 (10:30AM-12PM)

 

Minimally Aliased D- and A-optimal Main-effects Designs

Mohammed Saif Ismail Hameed, KU Leuven

The literature on two-level D- and A-optimal designs for the main-effects model is extensive for run sizes that are multiples of four, because complete catalogs of D- and A-optimal designs exist for these run sizes. However, for run sizes that are not multiples of four, no such catalogs exist, and experimenters resort to heuristic optimization algorithms (such as coordinate- and point-exchange algorithms) to create designs. This approach has multiple weaknesses. First, it requires substantial computing time. Second, heuristic optimization algorithms often fail to return a truly optimal design. Third, even when the design produced is truly optimal for the main-effects model, it often exhibits substantial aliasing between the main effects and the two-factor interactions as well as among the two-factor interactions. In this presentation, we explain how to enumerate complete catalogs of D- and A-optimal main-effects designs for run sizes that are not multiples of four, and how to select the best of these designs in terms of aliasing between the main effects and the two-factor interactions and among the two-factor interactions. As a result of our work, the use of heuristic optimization can be avoided for most optimal design problems with run sizes of at most 20 when the primary interest of the experimenter is in a main-effects model, and statistical software can provide a minimally aliased D- and A-optimal design instantaneously.
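
For readers less familiar with the criteria, here is a small hedged sketch (with a made-up 6-run candidate, not one of the catalog designs) showing how a main-effects design is scored by the D- and A-criteria, and how its aliasing with two-factor interactions can be inspected via the alias matrix.

```python
# Evaluate a candidate two-level main-effects design at a run size (6) that is not a
# multiple of four, so no fully orthogonal main-effects design exists.
import numpy as np

D6 = np.array([[ 1,  1,  1],      # hypothetical 6-run, 3-factor candidate design
               [ 1, -1, -1],
               [-1,  1, -1],
               [-1, -1,  1],
               [ 1,  1, -1],
               [-1, -1, -1]])
X = np.column_stack([np.ones(6), D6])          # intercept plus main effects
M = X.T @ X                                    # information matrix

print("D-criterion (det M):", np.linalg.det(M))
print("A-criterion (tr M^-1):", np.trace(np.linalg.inv(M)))

# Aliasing of main effects with two-factor interactions: alias matrix (X'X)^-1 X'X2.
X2 = np.column_stack([D6[:, 0] * D6[:, 1], D6[:, 0] * D6[:, 2], D6[:, 1] * D6[:, 2]])
print(np.round(np.linalg.inv(M) @ X.T @ X2, 2))
```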

 

Optimal Designs Under Model Uncertainty

Xietao Zhou, King’s College London

Design of experiments is a useful approach for studying the effects of several factors on one or more responses, and it has seen wide application in industrial research and many other areas. Traditional approaches assume that the best-fitting model is fixed in advance, and classic optimality criteria have been applied to evaluate designs under this model. Early extensions allowed a few alternative models to be considered. The QB criterion was then proposed to allow hundreds of alternative models, as can arise in multifactor designs, to be considered. It avoids the risk of committing to an extreme belief in a single best-fitting model and allows the experimenter’s beliefs about the model to be reflected in the design selection process by assigning different prior probabilities to each possible model.

Recently, an alternative parameterization of factorial models, called the baseline parameterization, has been considered in the literature. It has been argued that such a parameterization arises naturally when each factor has a null state, and corresponding optimal designs have been proposed.

In this talk, I will introduce the basic framework of the QB criterion and show how it can be extended to the baseline parameterization. I will then present some QB-optimal designs we have found and show that, for various specified prior probabilities of main effects and two-factor interactions being included in the best-fitting model, they achieve advantages in terms of traditional A-optimality over the optimal designs from the previous literature.
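
As a rough, hedged sketch of the underlying idea only (an illustrative Monte Carlo stand-in, not the exact QB formula or its baseline-parameterization extension): score a design by averaging an A-type criterion over randomly drawn candidate models, with main effects and two-factor interactions included according to assumed prior probabilities.

```python
# Illustrative model-averaged design score under model uncertainty (assumed priors p1, p2).
import itertools
import numpy as np

def weighted_score(D, p1=0.8, p2=0.3, n_draws=500, seed=0):
    rng = np.random.default_rng(seed)
    n, k = D.shape
    tfi = list(itertools.combinations(range(k), 2))
    total = 0.0
    for _ in range(n_draws):                              # Monte Carlo over candidate models
        mains = [j for j in range(k) if rng.uniform() < p1]
        ints = [(a, b) for a, b in tfi
                if a in mains and b in mains and rng.uniform() < p2]
        X = np.column_stack([np.ones(n)]
                            + [D[:, j] for j in mains]
                            + [D[:, a] * D[:, b] for a, b in ints])
        M = X.T @ X
        if np.linalg.matrix_rank(M) < M.shape[0]:
            total += 1e6                                  # heavy penalty: model not estimable
        else:
            total += np.trace(np.linalg.inv(M)) / M.shape[0]   # A-type average variance
    return total / n_draws                                # lower is better

D = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))   # toy 8-run candidate
print(weighted_score(D))
```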

 

Peelle’s Pertinent Puzzle and D’Agostini Bias — Estimating the Mean with Relative Systematic Uncertainty

Scott Vander Wiel, LANL

Scientists often represent relative, systematic uncertainty by a covariance term proportional to the outer product of the measurements (yy^T). This form of error covariance produces generalized least squares estimates that are biased toward zero, an effect known in the nuclear data community as Peelle’s Pertinent Puzzle and in high energy physics as D’Agostini Bias. Decades of explanations and proposed fixes have not been connected to well-established statistical methods and theory. This work fills the gap, establishing why this covariance has unacceptable properties and providing a clear explanation of the root cause, namely overfitting. We compare properties of such estimators to the Gaussian maximum likelihood estimator and iteratively reweighted least squares (IRLS), showing that IRLS provides stable estimates with uncertainties that correctly reflect the impact of systematic errors. Peelle’s Puzzle is solved!
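
A minimal numerical sketch of the phenomenon (the classic two-point textbook example, not results from the talk): two measurements of the same quantity with 10% independent and 20% fully correlated relative uncertainty. Building the covariance from the observed values (the yy^T term) drags the GLS estimate below both measurements; rebuilding it from the current fit in an IRLS-style loop removes the effect.

```python
import numpy as np

y = np.array([1.5, 1.0])
one = np.ones(2)

def gls_mean(y, V):
    Vinv = np.linalg.inv(V)
    return (one @ Vinv @ y) / (one @ Vinv @ one)

# Covariance built from the data themselves: statistical term plus 0.2^2 * y y^T.
V_data = np.diag((0.1 * y) ** 2) + 0.04 * np.outer(y, y)
print("data-based covariance:", gls_mean(y, V_data))   # about 0.88, below both points

# Iteratively reweighted: covariance rebuilt from the current estimate mu.
mu = y.mean()
for _ in range(20):
    V_mu = (0.1 * mu) ** 2 * np.eye(2) + (0.2 * mu) ** 2 * np.outer(one, one)
    mu = gls_mean(y, V_mu)
print("IRLS with model-based covariance:", mu)          # 1.25, the sensible average
```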

 

Estimation and Variable Selection of Conditional Main Effects for Generalized Linear Models

Kexin Xie, Virginia Tech

In numerous engineering and healthcare applications, understanding the interaction effects among system factors is crucial. Traditional interaction effects, while powerful, often present challenges in interpretation and clarity, particularly within the context of complex systems. The concept of Conditional Main Effects (CMEs) marks a significant development aimed at addressing this challenge. In this work, we focus on modeling data with non-continuous responses using CMEs within the generalized linear model framework. The proposed method considers an appropriate penalized likelihood function for model estimation, integrating CME coupling and reduction alongside an overlapping group structure to refine bi-level variable selection. An iteratively reweighted least squares procedure is used to enhance computational efficiency. Simulation studies are conducted to examine the proposed method’s advantages in selection and estimation accuracy. A case study in public health is used to demonstrate the merits of the proposed method.

 

Drift vs Shift: Decoupling Trends and Changepoint Analysis

Toryn Schafer, Texas A&M

We introduce a new approach for decoupling trends (drift) and changepoints (shifts) in time series. Our locally adaptive, model-based approach for robust decoupling combines Bayesian trend filtering and machine-learning-based regularization. An over-parameterized Bayesian dynamic linear model (DLM) is first applied to characterize drift. Then a weighted penalized likelihood estimator is paired with the estimated DLM posterior distribution to identify shifts. We show how Bayesian DLMs specified with so-called shrinkage priors can provide smooth estimates of underlying trends in the presence of complex noise components. However, their inability to shrink exactly to zero inhibits direct changepoint detection. In contrast, penalized likelihood methods are highly effective at locating changepoints, but they require data with simple patterns in both signal and noise. The proposed decoupling approach combines the strengths of both, i.e., the flexibility of Bayesian DLMs with the hard thresholding property of penalized likelihood estimators, to provide changepoint analysis in complex, modern settings. The proposed framework is outlier robust and can identify a variety of changes, including in mean and slope. It is also easily extended to the analysis of parameter shifts in time-varying parameter models such as dynamic regressions. We illustrate the flexibility of our approach and contrast its performance and robustness with several alternative methods across a wide range of simulations and application examples.

 

Building Trees for Probabilistic Prediction via Scoring Rules

Sara Shashaani, NC State

Decision trees built with data remain in widespread use for nonparametric prediction. Predicting probability distributions is preferred over point predictions when uncertainty plays a prominent role in analysis and decision-making. We study modifying a tree to produce nonparametric predictive distributions. We find that the standard method for building trees may not result in good predictive distributions and propose changing the splitting criterion for trees to one based on proper scoring rules. Analysis of both simulated data and several real datasets demonstrates that using this new splitting criterion results in trees with improved predictive properties when the entire predictive distribution is considered.
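
A minimal sketch of the idea (not the authors' implementation): choose a split threshold by minimizing a proper scoring rule, here the CRPS of each child node's empirical distribution, rather than the usual squared-error impurity. The data below are synthetic.

```python
import numpy as np

def crps_empirical(sample, y_obs):
    """CRPS of the empirical distribution of `sample`, averaged over `y_obs`."""
    s = np.asarray(sample)
    term1 = np.mean(np.abs(s[None, :] - np.asarray(y_obs)[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return np.mean(term1 - term2)

def best_split(x, y):
    """Threshold on a single feature minimizing total in-node CRPS (toy example)."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[1:]:
        left, right = y[x < t], y[x >= t]
        if len(left) < 5 or len(right) < 5:
            continue
        score = (len(left) * crps_empirical(left, left)
                 + len(right) * crps_empirical(right, right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
# Change in both mean and spread at x = 0.4, which a distributional score can reward.
y = np.where(x < 0.4, rng.normal(0, 0.2, 200), rng.normal(1, 1.0, 200))
print(best_split(x, y))
```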

Session 3 (2-3:30PM)

 

XGBoost Modeling for Live Use in Manufacturing

Amanda Yoder, Corning Incorporated

Detecting defects early in the process leads to cost reduction and improved product capability. One key metric used is the strength testing of filters. Many techniques have been used in the past, but they have required much time and many resources to complete. In addition, the feedback loop to manufacturing, where we run continuously, was insufficient for making quick decisions to fix the process. With a combination of tools including JMP, XGBoost modeling, SQL, and many others, we have built a seamless process that provides this feedback to manufacturing. We used tabular data collected from high-resolution images to build the model. Using JMP, we were able to easily export the model as SQL, which links to the web application that manufacturing sees today as an automatically updating bar graph. Shapley values are used in conjunction with the model to provide manufacturing with the reasons behind the predicted result of the black-box model, which leads to process changes and prevents leakage of defects.
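
A minimal sketch of the modeling step with hypothetical data and feature names (not Corning's actual pipeline, features, or deployment code): fit a gradient-boosted model on tabular image-derived features and compute Shapley values to explain an individual prediction for operators. This assumes the xgboost and shap packages are installed.

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "wall_thickness": rng.normal(0.3, 0.02, 500),   # hypothetical image-derived features
    "porosity": rng.uniform(0.2, 0.5, 500),
    "defect_count": rng.poisson(2, 500),
})
strength = 100 - 40 * X["porosity"] - 5 * X["defect_count"] + rng.normal(0, 2, 500)

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, strength)

# Per-prediction attributions that could sit alongside the predicted strength on a
# live dashboard, so manufacturing sees why a part is flagged.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[0]])
print(dict(zip(X.columns, np.round(shap_values[0], 2))))
```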

 

Can You Dig It? Using Machine Learning to Efficiently Audit Utility Locator Tickets Prior to Excavation to Protect Underground Utilities

Jennifer H Van Mullekom, Virginia Tech

Virginia 811 (VA811) is a not-for-profit company in Virginia, USA, that administers the transactional aspects of utility location prior to commencing an excavation project, as mandated by Virginia law. In recent years, an increasing number of excavation tickets have been entered by web users rather than through the call center. These web-entry tickets have more errors than those entered by call agents. Prior to working with the Virginia Tech Statistical Applications and Innovations Group (VTSAIG), VA811 performed random audits of their tickets to ensure quality. Beginning in 2020, the VTSAIG developed two machine learning models to predict ticket quality. The most recent of these models has been integrated into VA811’s quality assurance program and detects nearly twice as many poor-quality tickets as the random audit process used prior to late 2022. This talk details the case study in the context of the phases of the Cross-Industry Standard Process for Data Mining (CRISP-DM). Statistical methods include measurement systems analysis and gradient boosted machines. Features were engineered using text mining and geographical information systems data. Practical aspects of project implementation will also be discussed, including data cleaning, model implementation, and model monitoring. This case study truly harmonizes quality, statistics, and data science by employing statistical thinking, feature engineering, machine learning models, and statistical programming to more accurately and efficiently audit the quality of a transactional process.

 

Deep Gaussian Processes for Surrogate Modeling with Categorical Data

Andrew Cooper, Virginia Tech

Many applications of experimental design produce categorical response data. Gaussian Processes (GPs) are stochastic models that provide flexible fitting of response surfaces, but they must be modified to handle non-Gaussian likelihoods. Performing fully Bayesian estimation of a GP classifier requires directly sampling from a latent GP layer, and the bottleneck of inverting covariance matrices makes posterior estimation computationally infeasible in large-data regimes. The Vecchia approximation can reduce the cost of inverting covariance matrices by inducing a sparse Cholesky decomposition. By combining this with the Elliptical Slice Sampling (ESS) algorithm for generating valid posterior samples from a latent layer, we obtain a tractable, fully Bayesian approach to fitting a global GP classification model that can handle large training sizes. We apply our methods to a Binary Black Hole (BBH) simulator example, which contains both binary and real-valued components in its response. Our method of combining fully Bayesian classification and regression models provides full Uncertainty Quantification (UQ) of BBH formation and chirp mass. Finally, we introduce an additional latent GP layer to add “deepness” to our model, which can capture non-stationary behavior in BBH formation and improve UQ.
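
A minimal sketch of one elliptical slice sampling update (Murray, Adams and MacKay, 2010) for a latent GP layer under a Bernoulli likelihood; it omits the Vecchia approximation and the additional deep layers described in the talk, and the data are synthetic.

```python
import numpy as np

def log_lik(f, y):
    # Bernoulli log-likelihood with a logistic link.
    return np.sum(y * f - np.logaddexp(0.0, f))

def ess_step(f, y, K_chol, rng):
    nu = K_chol @ rng.normal(size=f.size)          # prior draw from N(0, K)
    log_u = log_lik(f, y) + np.log(rng.uniform())  # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop, y) > log_u:
            return f_prop
        # Shrink the bracket toward theta = 0 and try again.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy use: latent GP over 1-D inputs with a squared-exponential covariance.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2) + 1e-8 * np.eye(60)
K_chol = np.linalg.cholesky(K)
y = (x > 0.5).astype(float)                        # hypothetical binary responses
f = np.zeros(60)
for _ in range(500):
    f = ess_step(f, y, K_chol, rng)
```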

Generating Higher Resolution Sky Maps Using a Deep Gaussian Process Poisson Model

Steven D. Barnett, Virginia Tech

The Interstellar Boundary Explorer (IBEX) satellite was launched in 2008 in an effort to learn more about the heliosphere, which sits at the boundary between our solar system and interstellar space. IBEX detects energetic neutral atoms (ENAs) emanating from the heliosphere to create sky maps describing their rate of emission. These maps are used by physicists to inform their theoretical models of the heliosphere. However, the data collected by IBEX are both noisy and irregular. Multiple tools have been developed to smooth these data to produce higher resolution sky maps. We propose a deep Gaussian process Poisson model for the rate of ENAs emanating from the heliosphere. We believe our deep Gaussian process model constitutes a more cohesive model than those developed previously. Additionally, deep Gaussian processes have shown a greater ability to learn complex surfaces while maintaining a simpler covariance function. We hope to develop a Markov chain Monte Carlo algorithm utilizing elliptical slice sampling and the Vecchia approximation to learn the underlying latent deep Gaussian process.

 

Selection of Initial Points Using Latin Hypercube Sampling for Active Learning

Roelof Coetzer, North-West University

Binary classification is a common task in machine learning where the goal is to assign binary labels to observed and new observations. However, labelling large sets of data is often a time-consuming and expensive process. Active learning learns from a few data points while selecting the most informative unlabelled samples for labelling to improve model performance. The success of active learning depends on the selection of the initial points used to initialize the active learning process and on the selection criteria used to identify informative samples. In this paper, we illustrate the use of Latin hypercube sampling, conditioned Latin hypercube sampling, and a modified Latin hypercube sampling procedure for initializing active learning for the estimation of a logistic regression classifier. In addition to the usual performance measures for classification, we consider the mean squared errors of the predicted posterior probability and of the classifier as performance measures. The results are demonstrated using simulated data sets and some actual case studies. We show that Latin hypercube sampling outperforms the traditional random sampling approach on various performance measures.
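
A minimal sketch of the workflow with synthetic data (not the authors' procedure or case studies): initialize with a small Latin hypercube design, fit a logistic regression classifier, and then actively query the unlabelled pool point whose predicted probability is closest to 0.5.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def oracle(X):                       # hypothetical (expensive) labelling process
    return (X[:, 0] + X[:, 1] + rng.normal(0, 0.1, len(X)) > 1.0).astype(int)

pool = rng.uniform(size=(2000, 2))   # unlabelled pool

# Initial design: a 10-point Latin hypercube in [0, 1]^2 (vs. a random subsample).
X_lab = qmc.LatinHypercube(d=2, seed=1).random(10)
y_lab = oracle(X_lab)

model = LogisticRegression()
for _ in range(20):                  # active-learning loop: uncertainty sampling
    model.fit(X_lab, y_lab)
    p = model.predict_proba(pool)[:, 1]
    idx = np.argmin(np.abs(p - 0.5))                 # most ambiguous pool point
    X_lab = np.vstack([X_lab, pool[idx]])
    y_lab = np.append(y_lab, oracle(pool[[idx]]))
    pool = np.delete(pool, idx, axis=0)

print("labelled points used:", len(y_lab))
```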

 

Optimal Experimental Designs for Process Robustness Studies

Peter Goos, KU Leuven

In process robustness studies, experimenters are interested in comparing the responses at different locations within the normal operating ranges of the process parameters to the response at the target operating condition. Small differences in the responses imply that the manufacturing process is not affected by the expected fluctuations in the process parameters, indicating its robustness. In this presentation, we propose an optimal design criterion, named the generalized integrated variance for differences (GID) criterion, to set up experiments for robustness studies. GID-optimal designs have broad applications, particularly in pharmaceutical product development and manufacturing. We show that GID-optimal designs have better predictive performances than other commonly used designs for robustness studies, especially when the target operating condition is not located at the center of the experimental region. In some situations that we encountered, the alternative designs typically used are roughly only 50% as efficient as GID-optimal designs. We will demonstrate the advantages of tailor-made GID-optimal designs through an application to a manufacturing process robustness study of the Rotarix liquid vaccine.

 

October 10th

Session 4 (8-9:30AM)

On the Testing of Statistical Software

Ryan Lekivetz, JMP

Testing statistical software is an extremely difficult task. What is more, for many statistical packages, the developer and test engineer are one and the same person, who may not have formal training in software testing techniques and may have limited time for testing. This makes it imperative that the adopted testing approach is both efficient and effective and, at the same time, based on principles that are readily understood by the developer. As it turns out, the construction of test cases can be thought of as a designed experiment (DOE). This article provides a treatment of DOE principles applied to testing statistical software and includes other considerations that may be less familiar to those developing and testing statistical packages.

 

MaLT: Machine-Learning-Guided Test Case Design and Fault Localization of Complex Software Systems

Irene Ji, Duke

Software testing is essential for the reliable and robust development of complex software systems. Due to the complexity of these systems, testing and fault localization can be very costly. To mitigate this cost, we outline in this work a holistic machine-learning-guided test case design and fault localization (MaLT) framework, which leverages recent probabilistic machine learning methods to accelerate the testing of complex software systems. MaLT consists of three steps: (i) the construction of a suite of test cases using a covering array for initial testing, (ii) the investigation of posterior root cause probabilities via a Bayesian fault localization procedure, and (iii) the use of this Bayesian analysis to guide the selection of subsequent test cases via active learning. The proposed MaLT framework can thus facilitate efficient identification and subsequent diagnosis of software faults with limited test runs. We demonstrate the effectiveness of MaLT via a numerical experiment and an application to the Traffic Alert and Collision Avoidance System (TCAS).
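
A minimal greedy sketch of the flavor of step (i), not MaLT's actual construction: build a small suite of binary test cases whose rows cover every pair of settings of every two factors at least once (a strength-2 covering array), here for six hypothetical on/off configuration options.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
k = 6
uncovered = {(i, j, vi, vj) for i, j in itertools.combinations(range(k), 2)
             for vi in (0, 1) for vj in (0, 1)}

def pairs_covered(row):
    return {(i, j, row[i], row[j]) for i, j in itertools.combinations(range(k), 2)}

suite = []
while uncovered:
    candidates = rng.integers(0, 2, size=(200, k))            # random candidate test cases
    best = max(candidates, key=lambda r: len(pairs_covered(r) & uncovered))
    suite.append(best)                                        # keep the most useful candidate
    uncovered -= pairs_covered(best)

print(f"{len(suite)} test cases cover all {k * (k - 1) // 2 * 4} factor-pair settings:")
print(np.array(suite))
```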

 

On Forming Control Limits for Short Run Standardized Xbar Control Charts with Varying Subgroup Sizes

Annie Dudley and Di Michelson, JMP; Bill Woodall, Virginia Tech

Short Run control charts are commonly used when manufacturing more than one product on the same production line. Developments in High Throughput and Just In Time manufacturing methods lead to multiple products being made on the same line, with shorter intervals between products. Short Run control charts enable the practitioner to view all products, in the order they were made, on the same chart, by either centering the data at the corresponding product mean or target, or standardizing the data using the corresponding product target and estimate of sigma.

In the absence of Short Run control chart methods, a practitioner can neither view all the data on a meaningful control chart nor apply any of the Nelson runs tests, as the runs are not in order.

Current methods for computing control limits for Short Run, Standardized and Summarized (XBar/R) control charts have deficiencies when the subgroup size is not constant. In practice, XBar/R charts might regularly have unequal subgroup sizes. In this paper, we explore methods to relax the constant-subgroup-size constraint.

We consider different published sets of recommendations for the control limit calculation of Short Run standardized XBar/R control charts. For each, we study the distributional foundations of the estimators, run a simulation study to verify chart performance, and present our findings for both the estimate of sigma and the control limits.
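
A minimal sketch of the standardized short-run statistic itself, with made-up targets, sigmas, and subgroups: each subgroup mean is centered at its product's target and scaled by sigma/sqrt(n_i), so products with different targets, and here different subgroup sizes, can share one chart with control limits at plus or minus 3.

```python
import numpy as np

products = {"A": {"target": 10.0, "sigma": 0.5},
            "B": {"target": 25.0, "sigma": 1.2}}

# (product, subgroup measurements) in production order; subgroup sizes vary.
subgroups = [("A", [10.2, 9.8, 10.1]),
             ("B", [26.0, 24.5, 25.2, 25.6]),
             ("A", [9.6, 10.4]),
             ("B", [23.1, 24.0, 23.5])]

for prod, values in subgroups:
    n = len(values)
    xbar = np.mean(values)
    t, s = products[prod]["target"], products[prod]["sigma"]
    z = (xbar - t) / (s / np.sqrt(n))      # standardized subgroup statistic
    flag = "OUT" if abs(z) > 3 else "in"
    print(f"{prod}  n={n}  xbar={xbar:5.2f}  z={z:6.2f}  {flag}")
```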

 

Collaborative Design of Controlled Experiments in the Presence of Subject Covariates

William Fisher, Clemson University

In some cases, researchers may run multiple, separate controlled experiments in which subjects participate in more than one experiment. Because subjects participate in multiple experiments, the responses are correlated across experiments. Taking this correlation into account, Zhang et al. (2024) proposed the collaborative analysis framework and demonstrated that it can provide more precise estimates of treatment effects than analyzing the experiments separately. In this work, we consider the experimental design problem of allocating subjects to treatment or control within each of the multiple experiments when subject covariate information is available. The goal of the allocation is to provide precise estimates of treatment effects for each experiment to further improve the precision gained through collaborative analysis. Using D-optimality as our allocation criterion, we propose semidefinite-programming-based randomized algorithms that provide solutions to the D-optimality problem. We showcase the performance of our algorithms in a simulation study, demonstrating their effectiveness over pure randomization methods when the number of subject covariates is large.

 

Nonparametric Online Monitoring of Dynamic Networks

Peihua Qiu, University of Florida

Network sequences are commonly used to describe the longitudinal pattern of a dynamic system. Proper online monitoring of a network sequence is thus important for detecting temporal structural changes in the system. To this end, some methods have been discussed in the statistical process control (SPC) literature that first extract features from the observed networks and then apply an SPC chart to monitor the extracted features sequentially over time. However, the features used in many existing methods are insensitive to some important network structural changes, and the control charts used cannot properly accommodate the complex structure of the extracted features. In this paper, we suggest using four specific features to describe the structure of an observed network, whose combination can reflect most network structural changes that we are interested in detecting in various applications. After the four features are extracted from the observed networks, we suggest using a multivariate nonparametric control chart to monitor them online. Numerical studies show that our proposed network monitoring method is more reliable and effective than some representative existing methods in the various cases considered.

 

A Graphical Comparison of Screening Designs using Support Recovery Probabilities

Kade Young, Eli Lilly & Co.

A screening experiment attempts to identify a subset of important effects using a relatively small number of experimental runs. Given the limited run size and a large number of possible effects, penalized regression is a popular tool used to analyze screening designs. In particular, an automated implementation of the Gauss-Dantzig selector has been widely recommended to compare screening design construction methods. Here, we illustrate potential reproducibility issues that arise when comparing screening designs via simulation, and recommend a graphical method, based on screening probabilities, which compares designs by evaluating them along the penalized regression solution path. This method can be implemented using simulation, or, in the case of lasso, by using exact local lasso sign recovery probabilities. Our approach circumvents the need to specify tuning parameters associated with regularization methods, leading to more reliable design comparisons.
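
A minimal simulation sketch of the graphical idea (using a random stand-in design and simulated responses rather than the paper's exact local sign-recovery probabilities): estimate, at each point along the lasso solution path, the probability that the signs of all effects, active and inactive, are recovered.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(12, 8))          # stand-in 12-run, 8-factor design
beta = np.zeros(8)
beta[:3] = [3.0, -2.0, 2.0]                        # three active main effects
alphas = np.logspace(0, -2, 50)                    # penalty values, largest first

hits = np.zeros(len(alphas))
for _ in range(500):                               # simulated screening experiments
    y = X @ beta + rng.normal(size=12)
    _, coefs, _ = lasso_path(X, y, alphas=alphas)
    signs_ok = np.all(np.sign(coefs) == np.sign(beta)[:, None], axis=0)
    hits += signs_ok

recovery = hits / 500                              # plot vs. alphas to compare designs
print(np.round(recovery[::10], 2))
```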

Session 5 (10-11:30AM)

 

Machine Learning, Cross Validation, and DOE

Maria Weese, Miami University

Recently, there has been interest in fitting complex models, such as machine learning models, to data from traditional experimental designs such as central composite designs. Machine learning models, which require out-of-sample data for hyperparameter tuning, are often optimized with cross-validation, which involves selecting subsets of training and test data. When larger, less structured data sets are used to train machine learning models, there is little concern with the structure of the subsets, since they are not likely to differ from the full data. However, when the training data are collected using a structured experimental design, creating subsets via cross-validation might produce training samples that do not preserve the structure of the original design. In this work, we investigate the consequences of using cross-validation on data collected using various types of experimental designs. We provide a literature review that illustrates the types of models fit to designed experiments and how those models are tuned. Finally, we present designs constructed using an optimality criterion to mitigate the effects of sampling in leave-one-out and k-fold cross-validation procedures.
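
A minimal sketch of the issue (synthetic, not from the talk): a 2^3 full factorial with an intercept is orthogonal, but the training folds produced by a random 2-fold split generally are not, so a model fit within a fold no longer sees the structure the design was built to provide.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

D = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))   # 8-run full factorial
X_full = np.column_stack([np.ones(8), D])
off_diag = lambda M: M - np.diag(np.diag(M))
print("full design orthogonal:", not off_diag(X_full.T @ X_full).any())

for fold, (train, _) in enumerate(KFold(n_splits=2, shuffle=True, random_state=1).split(D)):
    X_train = np.column_stack([np.ones(len(train)), D[train]])
    M = X_train.T @ X_train
    print(f"fold {fold} X'X:\n{M.astype(int)}")   # nonzero off-diagonals: orthogonality lost
```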

 

Autonomy versus Safety: Joint Modeling of Disengagement and Collision Events in Autonomous Vehicle Driving Study

Simin Zheng, Virginia Tech

As the popularity of artificial intelligence (AI) continues to grow, AI systems have become increasingly embedded into various aspects of daily life, leading to significant transformations in industries and changing how people live. One of the typical applications of AI systems is autonomous vehicles (AVs). In AVs, the relationship between the level of autonomy and safety is an important research question to answer, which can lead to two types of recurrent events data being recorded: disengagement and collision events. This paper proposes a joint modeling approach with multivariate random effects to analyze these two types of recurrent events data. The proposed model captures the inter-correlation between the levels of autonomy and safety in AVs. We apply an expectation-maximization (EM) algorithm to obtain maximum likelihood estimates for the functional form of fixed effects, variance-covariance components, and baseline intensity functions. This proposed joint modeling approach can be useful for modeling recurrent events data with multiple event types from various applications. We analyze disengagement and collision events data from the California Department of Motor Vehicles AV testing program to demonstrate its application.

 

A Replacement for Lenth’s Method for Nonorthogonal Designs

Caleb King, JMP

A key part of implementing a screening experiment is the correct identification of the active factors. To do this requires separating the signal from the noise. Therefore, a precise estimate of σ is desirable. Lenth (1989) provided a method for estimating σ in the context of unreplicated factorial designs that has become the standard for factor screening. Currently, optimal designs are in common use as screening designs as they exist for any desired number of runs. The price for using these designs is often the loss of orthogonality of the effects. For nonorthogonal designs then, a replacement for Lenth’s method is necessary. This work provides a method for analyzing saturated nonorthogonal screening designs. We show that our method performs well even when the number of active effects is up to half the number of runs.
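
For reference, a minimal sketch of the baseline being replaced: Lenth's pseudo standard error, computed here from a vector of hypothetical effect estimates from an unreplicated orthogonal design.

```python
import numpy as np

def lenth_pse(effects):
    abs_c = np.abs(np.asarray(effects))
    s0 = 1.5 * np.median(abs_c)                       # initial robust scale estimate
    pse = 1.5 * np.median(abs_c[abs_c < 2.5 * s0])    # trimmed re-estimate of sigma
    return pse

effects = np.array([21.5, 1.2, -0.8, 14.3, 0.5, -1.1, 0.9])   # hypothetical estimates
pse = lenth_pse(effects)
print("PSE:", pse, "  |t|-ratios:", np.round(np.abs(effects) / pse, 2))
```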

 

Optimal Two-level Designs Under Model Uncertainty

Steven Gilmour, King’s College London

Two-level designs are widely used for screening experiments where the goal is to identify a few active factors which have major effects. We apply the model-robust Q_B criterion for the selection of optimal two-level designs without the usual requirements of level balance and pairwise orthogonality. We provide a coordinate exchange algorithm for the construction of Q_B-optimal designs for the first-order maximal model and second-order maximal model and demonstrate that different designs will be recommended under different prior beliefs. Additionally, we study the relationship between this new criterion and the aberration-type criteria. Some new classes of model-robust designs which respect experimenters’ prior beliefs are found.
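
A minimal coordinate-exchange sketch for two-level designs; for brevity it maximizes the D-criterion for a main-effects model, whereas the talk's algorithm would evaluate the Q_B criterion (a prior-weighted model-average criterion) in its place.

```python
import numpy as np

def d_criterion(D):
    X = np.column_stack([np.ones(len(D)), D])      # intercept plus main effects
    return np.linalg.det(X.T @ X)

def coordinate_exchange(n_runs, n_factors, n_starts=20, seed=0):
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_starts):                      # random restarts
        D = rng.choice([-1.0, 1.0], size=(n_runs, n_factors))
        current = d_criterion(D)
        improved = True
        while improved:
            improved = False
            for i in range(n_runs):
                for j in range(n_factors):
                    D[i, j] *= -1                  # try flipping one coordinate
                    trial = d_criterion(D)
                    if trial > current:
                        current, improved = trial, True
                    else:
                        D[i, j] *= -1              # revert if no improvement
        if current > best_val:
            best, best_val = D.copy(), current
    return best, best_val

design, value = coordinate_exchange(n_runs=12, n_factors=5)
print("best D-criterion value found:", value)
```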

 

Monitoring Univariate Processes Using Control Charts: Some Practical Issues and Advice

Bill Woodall, Virginia Tech

We provide an overview and discussion of some issues and guidelines related to monitoring univariate processes with control charts. We offer some advice to practitioners to help them set up control charts appropriately and use them most effectively. We propose a four-phase framework for control chart set-up, implementation, use, and maintenance. In addition, our recommendations may be useful for researchers in the field of statistical process monitoring. We identify some current best practices, some misconceptions, and some practical issues that rely on practitioner judgment.

 

How Generative AI models such as ChatGPT can be (Mis)Used in SPC Practice, Education, and Research? An Exploratory Study

Fadel Megahed, Miami University

Generative Artificial Intelligence (AI) models such as OpenAI’s ChatGPT have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research. However, these tools are in the early stages of development and can be easily misused or misunderstood. In this paper, we give an overview of the development of Generative AI. Specifically, we explore ChatGPT’s ability to provide code, explain basic concepts, and create knowledge related to SPC practice, learning, and research. By investigating responses to structured prompts, we highlight the benefits and limitations of the results. Our study indicates that the current version of ChatGPT performs well for structured tasks, such as translating code from one language to another and explaining well-known concepts but struggles with more nuanced tasks, such as explaining less widely known terms and creating code from scratch. We find that using new AI tools may help practitioners, educators, and researchers to be more efficient and productive. However, in their current stages of development, some results are misleading and wrong. Overall, the use of generative AI models in SPC must be properly validated and used in conjunction with other methods to ensure accurate results.

Session 6 (1:30-3PM)

 

Active Learning for a Recursive Non-Additive Emulator for Multi-Fidelity Computer Experiments

Junoh Heo, Michigan State University

Computer simulations have become essential for analyzing complex systems, but high-fidelity simulations often come with significant computational costs. To tackle this challenge, multi-fidelity computer experiments have emerged as a promising approach that leverages both low-fidelity and high-fidelity simulations, enhancing both the accuracy and efficiency of the analysis. In this paper, we introduce a new and flexible statistical model, the Recursive Non-Additive (RNA) emulator, that integrates the data from multi-fidelity computer experiments. Unlike conventional multi-fidelity emulation approaches that rely on an additive auto-regressive structure, the proposed RNA emulator recursively captures the relationships between multi-fidelity data using Gaussian process priors without making the additive assumption, allowing the model to accommodate more complex data patterns. Importantly, we derive the posterior predictive mean and variance of the emulator, which can be efficiently computed in a closed-form manner, leading to significant improvements in computational efficiency. Additionally, based on this emulator, we introduce four active learning strategies that optimize the balance between accuracy and simulation costs to guide the selection of the fidelity level and input locations for the next simulation run. We demonstrate the effectiveness of the proposed approach in a suite of synthetic examples and a real-world problem. An R package RNAmf for the proposed methodology is provided on CRAN. 
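
A minimal two-fidelity sketch in the spirit of non-additive autoregressive emulation, not the RNAmf package or the authors' RNA emulator: the high-fidelity GP takes both the input x and the low-fidelity prediction at x as inputs, so the relationship between fidelities need not be additive. The test functions are standard toy stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def f_low(x):  return np.sin(8 * np.pi * x)                 # cheap simulator
def f_high(x): return (x - np.sqrt(2)) * f_low(x) ** 2      # expensive simulator

x_lo = np.linspace(0, 1, 30)[:, None]                       # many cheap runs
x_hi = np.linspace(0, 1, 8)[:, None]                        # few expensive runs

gp_lo = GaussianProcessRegressor(ConstantKernel() * RBF(0.1), alpha=1e-8)
gp_lo.fit(x_lo, f_low(x_lo).ravel())

# High-fidelity emulator over the augmented input (x, f_low(x)): non-additive link.
aug_hi = np.column_stack([x_hi, gp_lo.predict(x_hi)])
gp_hi = GaussianProcessRegressor(ConstantKernel() * RBF([0.2, 0.5]), alpha=1e-8)
gp_hi.fit(aug_hi, f_high(x_hi).ravel())

x_new = np.linspace(0, 1, 200)[:, None]
aug_new = np.column_stack([x_new, gp_lo.predict(x_new)])
pred, sd = gp_hi.predict(aug_new, return_std=True)          # emulated high-fidelity surface
print("max posterior sd:", sd.max())
```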

 

Quantitative Assessment of Machine Learning Reliability and Resilience

Lance Fiondella, University of Massachusetts Dartmouth

Advances in machine learning (ML) have led to applications in safety-critical domains, including security, defense, and healthcare. These ML models are confronted with dynamically changing and actively hostile conditions characteristic of real-world applications, requiring systems incorporating ML to be reliable and resilient. Many studies propose techniques to improve the robustness of ML algorithms. However, fewer consider quantitative techniques to assess the reliability and resilience of these systems. To address this gap, this study demonstrates how to collect relevant data during the training and testing of ML suitable for the application of software reliability models, with and without covariates, and resilience models, and how to interpret the resulting analyses. The proposed approach promotes quantitative risk assessment of machine learning technologies, providing the ability to track and predict degradation and improvement in ML model performance and assisting ML and system engineers with an objective approach to compare the relative effectiveness of alternative training and testing methods. The approach is illustrated in the context of an image recognition model, which is subjected to two generative adversarial attacks and then iteratively retrained to improve the system’s performance. Our results indicate that software reliability models incorporating covariates characterized the misclassification discovery process more accurately than models without covariates. Moreover, the resilience model based on multiple linear regression incorporating interactions between covariates tracks and predicts degradation and recovery of performance best. Thus, software reliability and resilience models offer rigorous quantitative assurance methods for ML-enabled systems and processes.

 

Quick Input-Response Space-Filling (QIRSF) Designs

Xiankui Yang, University of South Florida

Space-filling designs have been broadly used in computer experiments to guide efficient and informative data collection. Traditional space-filling designs primarily focus on uniformly spreading design points throughout the input space. Recent developments on input-response space-filling (IRSF) designs offer additional advantages when good coverage over the range of response values is also desirable for some applications. The original IRSF designs use the maximin distance criterion and a modified point exchange algorithm to balance the uniform spread of design points across the input space and the response values. In this paper, we develop a new quick input-response space-filling (QIRSF) design approach that utilizes hierarchical clustering techniques and the minimax point to achieve desirable coverage in the input and response spaces. The new method reduces the computing time by at least 20-fold while closely approximating the IRSF designs. The performance and computational efficiency of the proposed methods are demonstrated through multiple examples with different input and response dimensions and varied characteristics of the response surfaces. An R Shiny app is offered to facilitate easy construction of QIRSF designs of flexible size and dimension.

 

A Kernel-Based Approach for Modelling Gaussian Processes with Functional Information

Andrew Brown, Clemson

Gaussian processes are commonly used tools for modeling continuous processes in machine learning and statistics. This is partly because a Gaussian process may be employed as an interpolator for a finite set of known points, which can then be used for prediction and straightforward uncertainty quantification at other locations. However, it is not always the case that the available information is in the form of a finite collection of points. For example, boundary value problems contain information on the boundary of a domain, which is an uncountable collection of points that cannot be incorporated into typical Gaussian process techniques. In this paper, we propose and construct Gaussian processes that unify, via reproducing kernel Hilbert spaces, the typical finite case with the case of having uncountable information, by exploiting the equivalence of conditional expectation and orthogonal projection. We show existence of the proposed Gaussian process and that it is the limit of a conventional Gaussian process conditioned on an increasing but finite number of points. We illustrate the applicability via numerical examples and a proof of concept.

 

Optimal Experimental Designs for Precision Medicine with Multi-component Treatments

Yeng Saanchi, NC State

In recent years, many medical and behavioral interventions have evolved to combine multiple therapeutic options. We refer to treatments that consist of a combination of therapeutic options as multi-component treatments. Cancer treatment is one area in which the use of such treatments is increasingly common. Considering several therapeutic options in a clinical trial can result in a potentially large number of unique treatment combinations at each decision point. Ignoring the overlap among components and viewing each combination as unique when evaluating the treatment combinations is inefficient. We propose a strategy that leverages classical optimal design methods to assign combination treatments so as to maximize power for the primary and secondary analyses of interest. Our method uses the c- and D-optimality criteria to construct a composite objective function that is minimized to obtain the optimal allocation of multi-component treatments, promoting patients’ long-term well-being.

 

Simulation Experiment Design for Calibration via Active Learning

Ozge Surer, Miami University

Simulation models often have parameters as input and return outputs to understand the behavior of complex systems. Calibration is the process of estimating the values of the parameters in a simulation model in light of observed data from the system that is being simulated. When simulation models are expensive, emulators are built with simulation data as a computationally efficient approximation of an expensive model. An emulator then can be used to predict model outputs, instead of repeatedly running an expensive simulation model during the calibration process. Sequential design with an intelligent selection criterion can guide the process of collecting simulation data to build an emulator, making the calibration process more efficient and effective. This presentation focuses on two novel criteria for sequentially acquiring new simulation data in an active learning setting by considering uncertainties on the posterior density of parameters. Analysis of several simulation experiments and real-data simulation experiments from epidemiology demonstrates that proposed approaches result in improved posterior and field predictions.