FTC 2024 Short Courses

Short courses will be held on October 8, 2024. Attendees may register for the conference and short courses together or separately. Conference attendance is not required to attend a short course.

Full Day

Steven Gilmour and Olga Egorova – Modern Response Surface Methodology

Christian Lucero and Alex Jaimes-Sandoval – Introduction to Python and Statistical Machine Learning

Half Day

Peng Liu – Introduction to Reliability Data Analysis and Reintroduction to Maximum Likelihood Method

Peng Liu – Introduction to Statistical Analysis Software Development

Luke Hagar – A Practitioner’s Guide to A/B tests with Posterior Probabilities

Mario Davidson and Jennifer Van Mullekom – Ethics in Statistics and Data Science

FULL DAY COURSES

8:30am to 5:30pm (1-hour lunch at 12:30pm)

Modern Response Surface Methodology

Steven Gilmour, King’s College London

Olga Egorova, King’s College London

Abstract: This course provides a brief overview of the basics of response surface methodology (RSM) and introduces the most recent work, with a particular focus on the multi-objective design of experiments. We will introduce the methodological background accompanied by the practical tools and examples (in R).

Common practical experimental setups involve limited experimental resources and/or restrictions on their allocation, such as hard-to-change factors. At the same time, experimenters would like to plan according to various objectives, such as the quality of the model parameter estimators and of overall predictions, and the ability to protect against and effectively detect a potential lack of fit of the fitted model. The course presents the methodological knowledge and tools for such multi-objective planning, focusing on understanding the underlying principles and their implementation in various frameworks. The flexibility of the approach presented allows practitioners to obtain solutions tailored to their specific experimental aims and circumstances and to make informed choices regarding their experimental plan.

Target audience and prerequisites: The course is aimed at statisticians and experienced users of statistics who work with experimenters in various fields and is designed for attendees with basic familiarity with optimal DoE.

CONTACT: steven.gilmour@kcl.ac.uk

Steven Gilmour is Professor of Statistics at King’s College London, where he also served as Chair of the Department of Mathematics from 2019-24. He has more than 35 years of research experience in the theory, methodology and applications of the design and analysis of experiments. Steve is co-author of the book Statistical Principles for the Design of Experiments, as well as more than 100 research articles in both Statistics journals and diverse application areas. He has taught design and analysis of experiments to many different audiences, as well as teaching most other areas of statistics. Steve’s current research includes various aspects of multi-objective optimal design, model-robust design, and design in the context of big data. He is known in both his teaching and research for always trying to build new ideas on the strong classical foundations of the design of experiments.

Olga Egorova is a Research Associate at King’s College London and has been working on methodological developments in optimal experimental design and providing application-tailored design and analysis support for experimenters across academia and industry. She has more than 5 years of research experience, working on multi-disciplinary projects in different collaborative settings, along with teaching various statistical methods. Olga specializes in multi-objective experimental design, and most recently in screening designs and sequential planning of experiments – building comprehensive statistical methodologies for general experimentation as well as particular applications.

Introduction to Python and Statistical Machine Learning

Christian Lucero, Virginia Tech

Alex Jaimes-Sandoval, Virginia Tech

Abstract: Statistical machine learning involves using statistical models and other algorithms that can learn from data and make predictions or decisions. Supervised learning includes regression and classification techniques and will be the main focus of this course. Topics include methods for model building, support vector machines, discriminant analysis methods, and tree-based methods. In order to properly use these methods, data cleaning and manipulation are necessary precursors along with basic exploratory data analysis (i.e. summary statistics and visualizations) which help investigators to form research questions and make decisions about which statistical machine learning methods to use.

In this full-day course, we will focus upon the complete data analysis workflow. During the first half we will introduce the Python programming language including reading in data, cleaning and manipulating data, some exploratory data analysis, and producing visualizations. In the second half of the course, we will have an overview of statistical learning methods and provide some case studies that demonstrate the use of these techniques.

Target audience and prerequisites: This course is intended for researchers who have some knowledge of statistics. No prior experience with Python is required. Participants are encouraged to download and install Anaconda (https://www.anaconda.com/download) prior to the course, but we can assist with installation at the start of the event. An alternative to installation will also be provided.

CONTACT: chlucero@vt.edu

Christian Lucero is a Collegiate Assistant Professor of Statistics in the College of Science at Virginia Polytechnic Institute and State University (Virginia Tech). He earned his PhD in Mathematical and Computer Sciences from the Colorado School of Mines in 2013, where his research focused upon optimal experimental designs for ill-posed inverse problems. His general interests include statistical computing, statistical machine learning, inverse problems, uncertainty quantification, and education in statistics and data science. While his home department is Statistics, he is a core faculty member teaching in the Computational Modeling and Data Analytics (CMDA) program. He regularly teaches courses on statistical machine learning and statistical computing at both the undergraduate and graduate levels. He advocates the use of experiential learning and, as much as possible, prefers to teach using real-world datasets to answer questions that stakeholders care about. One of his goals is to give students opportunities to work with real data as much and as early as possible, which he regularly achieves by hosting data competitions each semester at Virginia Tech. The short course he will be presenting is based upon the most popular CMDA course, which is taken by hundreds of Virginia Tech students each year.

Alex Jaimes-Sandoval received his undergraduate degree from Virginia Tech in Computational Modeling and Data Analytics in 2024 with minors in Mathematics, Statistics, and Computer Science. He served as the head of the CMDA Computing Consultants, an in-major tutoring group that provides help with Python, R, Java, and other languages that are regularly used by students in CMDA courses. He has received multiple awards including the Alice-Luther Hamlett Scholarship and the CMDA Outstanding Senior Award. Alex has also won the grand prize in multiple American Statistical Association (ASA) DataFest competitions and has interned at JP Morgan Chase as an AI Analyst. He is currently a graduate student in Computer Science and will be focusing on Machine Learning and Numerical Analysis.

HALF DAY COURSES

Introduction to Reliability Data Analysis and Reintroduction to Maximum Likelihood Method
8:30am to 12:30pm

Peng Liu, JMP

Abstract: Reliability data analysis is an important subject in the reliability engineering discipline, but a specialized one for general statisticians. Enormous research and development effort has been invested in the area, yet the resulting advances may not be well known to a general audience. The methodologies that have been developed and the experience that has been gained in this area, however, are extremely valuable to scientists, engineers, and statisticians in general.

This half-day short course will serve as an introduction to reliability data analysis by presenting cases that are important to reliability engineers and may be interesting to a general audience of scientists, engineers, and statisticians. Specific subjects include censored data, analysis of censored data, analysis of non-normal data, challenges of very limited data, challenges of extrapolation in space and time, analysis of recurrent events, and, if time allows, system reliability engineering.

The instructor is a statistician and a software engineer by training. He does not analyze reliability data daily as a profession. He has, however, dedicated 17 years to developing software for analyzing reliability data. Some prominent statisticians have observed that a statistician must fully implement a method in software to truly understand the corresponding theory. The instructor has gained firsthand experience with analyzing reliability data through this software implementation.

This short course will also devote some extra time to a reintroduction of the maximum likelihood method. Based on the instructor’s own background and experience, the method of maximum likelihood is under-appreciated and less understood than it needs to be. That is an unfortunate reality among scientists, engineers, and even general statisticians. The subject will be interwoven into the reliability data analysis part of the short course wherever it is relevant and appropriate. The instructor believes the subject is so important that one can almost treat it as a first principle for statistical analysis, whereas without it one might easily be overwhelmed by statistical analysis as a huge bag of tricks.
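As a rough illustration of how maximum likelihood handles censored data (a sketch that is not taken from the course materials), the Python example below assumes exponentially distributed lifetimes with right censoring. The function names and the five data values are hypothetical. Under that assumption, observed failures contribute the density to the likelihood and censored units contribute the survival probability, and the maximizer has a closed form: the number of observed failures divided by the total time on test.

```python
import math


def exponential_mle_censored(times, observed):
    """MLE of the exponential failure rate with right-censored data.

    times    -- time on test for each unit (failure time or censoring time)
    observed -- True if the unit failed, False if it was still running (censored)
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    failures = sum(1 for flag in observed if flag)
    total_time = sum(times)
    if failures == 0 or total_time <= 0:
        raise ValueError("need at least one failure and positive total time")
    # Closed-form maximizer of the censored exponential likelihood.
    return failures / total_time


def log_likelihood(rate, times, observed):
    """Censored exponential log-likelihood evaluated at a given rate."""
    total = 0.0
    for t, failed in zip(times, observed):
        if failed:
            total += math.log(rate) - rate * t  # log density f(t)
        else:
            total += -rate * t  # log survival S(t)
    return total


# Hypothetical data: hours on test; two units were still running at 500 hours.
times = [120.0, 340.0, 500.0, 500.0, 75.0]
observed = [True, True, False, False, True]

rate_hat = exponential_mle_censored(times, observed)
print("estimated failure rate per hour:", rate_hat)
print("estimated mean time to failure:", 1.0 / rate_hat)
```

For these hypothetical values there are 3 failures over 1535 hours on test, so the estimated rate is 3/1535 per hour, which corresponds to a mean time to failure of about 512 hours. Dropping the two censored units, or treating them as failures at 500 hours, would each give a different and biased answer.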

Target audience and prerequisites: The short course will focus on the practicality of both subjects. There will be no complex mathematics. Concepts will be illustrated using plots and graphs as much as possible. The required background from the audience is minimal. Even if both subjects are new to you, the short course might help you to be better prepared to go into this field or apply what you learn here to other areas.

CONTACT: peng.liu@jmp.com

Peng Liu is a Principal Research Statistician Developer at JMP Statistical Discovery LLC. He holds a Ph.D. in statistics from NCSU. He has been working at JMP since 2007. He specializes in computational statistics, software engineering, reliability data analysis, reliability engineering, time series analysis, and time series forecasting. He is responsible for developing and maintaining all JMP platforms in the above areas. He has a broad interest in statistical analysis research and software product development.

Introduction to Statistical Analysis Software Development
1:30pm to 5:30pm

Peng Liu, JMP

Abstract: The short course will emphasize the instructor’s main experience in developing desktop software driven by a graphical user interface (GUI). Unlike the development of arbitrary desktop GUI applications, statistical analysis software development has its own unique challenges and opportunities, and the short course will bring attention to them. Although the course is centered around GUI-driven statistical analysis software development, the instructor will cover three other areas of software development that are important to data scientists and machine learning engineers. Two are prominent: software development in the R ecosystem and in the Python ecosystem. Both are related to GUI-driven statistical analysis software development but differ from it substantially, and each has its own advantages, disadvantages, challenges, and opportunities. The third is a new area of statistical analysis software: web-based software, both single-user and collaborative. Web-based software is very likely to rely on the existing R and Python ecosystems, but a purely JavaScript-based approach is also feasible. This new area has its own challenges and opportunities.

The content of the short course includes:

  • basic knowledge and skill set for developing GUI-driven desktop software,
  • basic knowledge and skill set for developing statistical analysis software,
  • an example of developing a maximum likelihood estimation procedure,
  • basic knowledge and skill set for testing statistical analysis software,
  • basic knowledge and skill set of software engineering, and
  • special knowledge and skill set for developing in the R, Python, and Web ecosystems.

Target audience and prerequisites: The required background from the audience is minimal. The intended audience is anyone with an interest in the intersection of statistical analysis and software engineering. The GUI-specific part of the short course will be directly relevant to anyone interested in working on GUI-driven software. The non-GUI-specific part of the course will be useful to anyone regardless of the actual programming paradigm.

CONTACT: peng.liu@jmp.com

Peng Liu is a Principal Research Statistician Developer at JMP Statistical Discovery LLC. He holds a Ph.D. in statistics from NCSU. He has been working at JMP since 2007. He specializes in computational statistics, software engineering, reliability data analysis, reliability engineering, time series analysis, and time series forecasting. He is responsible for developing and maintaining all JMP platforms in the above areas. He has a broad interest in statistical analysis research and software product development.

A Practitioner’s Guide to A/B tests with Posterior Probabilities
1:30pm to 5:30pm

Luke Hagar, University of Waterloo

Abstract: Comparing two alternatives drives decision making. In online controlled experiments, such two-group comparisons are called A/B tests. While A/B tests motivate this course, the overviewed methods can be generally applied to design and analyze two-group comparisons in reliability, pharmaceutical, and quality control contexts. Characteristics, such as median watch times for two video advertisements, quantify the impact of each choice available to decision makers. Given estimates for these two characteristics, such comparisons are often made via hypothesis tests with p-values.

However, posterior probabilities are easier for practitioners to interpret than p-values or Bayes factors. While posterior probabilities equip decision makers with an intuitive means to compare two alternatives based on superiority or practical equivalence, the process to calculate them is often computationally complex. It is also not trivial to assess the operating characteristics of an A/B test based on posterior probabilities, including power and the type I error rate. Practitioners typically use existing software to approximate posterior distributions, such as the rjags package in R. These solutions reduce the mathematical overhead required to calculate posterior probabilities, but they yield slow performance when realistic statistical models and large data sets are considered.

This course focuses on the design and analysis of A/B tests with posterior probabilities, where experimental design is considered by way of sample size determination. General R functions will be provided to leverage – and improve upon – the rjags package for a variety of statistical models. These functions allow practitioners to quickly compute posterior probabilities without intensive mathematical derivations.

The learning outcomes for this short course are as follows:

  • Familiarize participants with online A/B tests.
  • Describe how posterior distributions incorporate prior beliefs and information from observed data for Bayesian inference.
  • Draw formal conclusions from A/B tests based on posterior probabilities.
  • Assess the operating characteristics of A/B tests based on posterior probabilities via simulation.
  • Expedite computational design and analysis of A/B tests with posterior probabilities.
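The course itself works in R with the rjags package. Purely as an illustration of the quantity involved, the Python sketch below assumes the simplest conjugate case: binary conversion outcomes with independent Beta(1, 1) priors, so that each posterior is a Beta distribution and no MCMC is needed. The function name, counts, and default settings are hypothetical and are not drawn from the course.

```python
import random


def prob_b_beats_a(success_a, n_a, success_b, n_b,
                   draws=20000, seed=1, prior=(1.0, 1.0)):
    """Monte Carlo estimate of the posterior probability P(theta_B > theta_A | data).

    Assumes binomial data in each group and the same Beta prior for both
    conversion rates, so each posterior is
    Beta(alpha + successes, beta + failures).
    """
    if not (0 <= success_a <= n_a and 0 <= success_b <= n_b):
        raise ValueError("successes must lie between 0 and the group size")
    alpha, beta = prior
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(alpha + success_a, beta + n_a - success_a)
        theta_b = rng.betavariate(alpha + success_b, beta + n_b - success_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws


# Hypothetical A/B test: 120 of 1000 conversions for A, 150 of 1000 for B.
p_superior = prob_b_beats_a(120, 1000, 150, 1000)
print("P(B better than A | data) is approximately", p_superior)
```

A value close to 1 favors variant B and a value close to 0 favors variant A. A decision rule would compare this probability with a pre-specified threshold, and repeating the calculation over many simulated data sets is one way to estimate operating characteristics such as power and the type I error rate.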

CONTACT: lmhagar@uwaterloo.ca

Luke Hagar is currently a PhD Candidate in Statistics at the University of Waterloo, under the supervision of Dr. Nathaniel Stevens. Luke will soon begin a Postdoctoral Fellowship at McGill University. Previously, he earned MMATH (2021) and BMATH (2020) degrees in Statistics from the University of Waterloo.

At the University of Waterloo, Luke taught an undergraduate course in computational statistics and data analysis as a Sessional Lecturer; he also worked part time with their Centre for Teaching Excellence and Statistical Consulting and Survey Research Unit. Luke’s research interests include experimental design, hypothesis testing, Bayesian methods, and computational inference. He also volunteers with the American Society for Quality in the Chemical and Process Industries Division.

Ethics in Statistics and Data Science
8:30am to 12:30pm

Mario Davidson, Vanderbilt

Jennifer Van Mullekom, Virginia Tech

Abstract: The recent revision of the American Statistical Association (ASA) Ethical Guidelines for Statistical Practice combined with increased media attention on ethical data science algorithms has prompted our profession to renew its commitment to ethics education. We have developed DEPICT, a six-phase ethical reasoning process tailored to statistics and data science. We will present overviews of ethics paradigms and the ASA Ethical Guidelines followed by a deep dive into the DEPICT process. Participants will Define ethical dilemmas; Explore possible resolutions; Plan resolutions; anticipate issues associated with Implementation; Contemplate their actions; and Transcend to incorporate key learnings to avoid future dilemmas. The course consists of interactive exercises to learn ethical frameworks, guidance, and reasoning followed by applications in complex, nuanced case studies with applications in the physical & engineering sciences and quality & productivity. Attendees will participate in facilitated small group discussions as they apply the framework, reporting key elements of their small group discussion to the larger group. Participants will develop multi-perspective views and debate the pros and cons of various resolutions based on professional guidance. This course is appropriate for students, faculty, early career professionals, and managers in any application area. Participants will learn how to apply DEPICT as well as teach or mentor ethical reasoning in statistics and data science. Pre-reading the American Statistical Association Ethical Guidelines for Statistical Practice and a sample case study will allow participants to get the most out of the course.

CONTACT: vanmuljh@vt.edu

Dr. Mario Davidson is an associate professor in biostatistics at Vanderbilt University School of Medicine in Nashville, TN. Holding the position of Associate Vice-Chair of Equity, Diversity, and Inclusion within the Department of Biostatistics, he plays a pivotal role in fostering a culture of inclusivity. Dr. Davidson earned his Ph.D. in Statistics Education from The Ohio State University.

As the lead biostatistician for medical students and educators at Vanderbilt, Dr. Davidson directs the second-year research course known as PLAN. This innovative program employs weekly topics to guide students in comprehending research methodologies and crafting protocols. Additionally, he has facilitated numerous case-based learning courses for medical students and oversees the Classroom Peer Reviews, where he administers training for faculty peer reviewers.

Dr. Davidson is credited with developing the department’s integral course, Statistical Collaboration in Health Sciences. This course emphasizes communication, professionalism, and ethics, showcasing his expertise in statistical collaboration, education, and ethical considerations. Actively engaged in service committees at Vanderbilt, he contributes to discussions on diversity, equity, inclusion, assessment, bias, and admissions.

A member of the Academy of Excellence in Education, Dr. Davidson has served on its board, reflecting his commitment to advancing educational excellence.

After a 20-year career in industry, Dr. Jennifer Van Mullekom joined Virginia Tech in Fall 2016 as the Director of the Statistical Applications and Innovations Group (SAIG) where she is a Professor of Practice in the Department of Statistics. In addition to directing SAIG, she teaches collaboration skills and design of experiments to graduate students while serving as an active member of the global statistical practice community.

Formerly, she was a Senior Consulting Statistician and Certified Six Sigma Master Black Belt in DuPont’s Applied Statistics Group, supporting the DuPont Protection Technologies business. At DuPont, she provided statistical leadership to the Tyvek® Medical Packaging Transition Project in the areas of product development, quality, commercialization, and regulatory. Her contributions to this project earned her a DuPont Engineering Excellence Award, one of the company’s highest honors. She continues to collaborate with DuPont on various material science projects as permitted by Virginia Tech’s external consulting policy.

Dr. Van Mullekom is active in professional societies, holding leadership roles in the American Statistical Association (ASA) and the American Society for Quality (ASQ). She is an inventor on two US Patents and has also worked at Lubrizol and Capital One. Dr. Van Mullekom is a regular participant at the Conference on Statistical Practice on topics such as communication, collaboration, leadership, and ethics. In 2024, she was honored with the American Statistical Association’s Section on Statistical Consulting Mentoring Award for her role mentoring junior employees, colleagues, and students. She holds an MS and PhD in Statistics from Virginia Tech and a BS in Mathematics and a BS ED in Mathematics Education from Concord University.