Statistics Seminar
High-dimensional Change-point Detection Using Generalized Homogeneity Metrics
Abstract: Change-point detection has been a classical problem in statistics, finding applications in a wide variety of fields. A nonparametric change-point detection procedure is concerned with detecting abrupt distributional changes in the data generating distribution, rather than only mean changes. We consider the problem of detecting an unknown number of change-points in an independent sequence of high-dimensional observations and testing for the significance of the estimated change-point locations. Our approach essentially rests upon nonparametric tests for the homogeneity of two high-dimensional distributions. We construct a single change-point location estimator via defining a cumulative sum process in an embedded Hilbert space. As the main theoretical innovation, we rigorously derive its limiting distribution under the high-dimension medium sample size framework. Subsequently, we combine our statistic with the idea of wild binary segmentation to recursively estimate and test for multiple change-point locations. The superior performance of our methodology compared to several other existing procedures is illustrated via both simulated and real datasets.
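The single change-point idea above can be illustrated with a small sketch. This is not the speaker's estimator: it swaps in the classical sample energy distance as the two-sample homogeneity measure and scans all candidate split points with a CUSUM-style weight k(n − k)/n. The function names, the `min_seg` parameter, and the simulated data are assumptions made only for illustration.

```python
import numpy as np

def energy_distance(x, y):
    """Sample energy distance between two samples (rows are observations)."""
    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (a_i, b_j).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2)).mean()
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

def estimate_single_change_point(data, min_seg=5):
    """Return the split index maximizing a weighted energy-distance scan statistic."""
    n = data.shape[0]
    best_k, best_stat = None, -np.inf
    for k in range(min_seg, n - min_seg + 1):
        left, right = data[:k], data[k:]
        # The weight k(n-k)/n plays the role of the usual CUSUM scaling (an assumption here).
        stat = k * (n - k) / n * energy_distance(left, right)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

rng = np.random.default_rng(0)
# Hypothetical data: 40 observations in dimension 50; the mean shifts after observation 20.
before = rng.normal(0.0, 1.0, size=(20, 50))
after = rng.normal(1.5, 1.0, size=(20, 50))
k_hat, stat = estimate_single_change_point(np.vstack([before, after]))
print(k_hat, stat)
```

Extending this to multiple change-points would mean applying the same scan recursively on random sub-intervals, in the spirit of wild binary segmentation; that step is omitted here.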
Joint Bayesian analysis of multiple response-types using the hierarchical generalized transformation model
Abstract: Consider the situation where an analyst has a Bayesian statistical model that performs well for continuous data. However, suppose the observed dataset consists of multiple response-types (e.g., continuous, count-valued, Bernoulli trials, etc.), which are distributed from more than one class of distributions. We refer to these types of data as "multiple response-type" datasets. The goal of this talk is to introduce a reasonable, easy-to-implement, all-purpose method that "converts" a Bayesian statistical model for continuous responses (call this the preferred model) into a Bayesian model for multiple response-type datasets. To do this, we consider a transformation of the multiple response-type data, such that the transformed data can be reasonably modeled using the preferred model. What is unique about our strategy is that we treat the transformations as unknown and use a Bayesian approach to model this uncertainty. The implementation of our Bayesian approach to unknown transformations is straightforward and involves two steps. The first step produces posterior replicates of the transformed multiple response-type data from a latent conjugate multivariate (LCM) model. The second step involves generating values from the posterior distribution implied by the preferred model. We demonstrate the flexibility of our model through an application to Bayesian additive regression trees (BART) and a spatio-temporal mixed effects (SME) model. We provide a thorough joint multiple response-type spatio-temporal analysis of coronavirus disease 2019 (COVID-19) cases, the adjusted closing price of the Dow Jones Industrial Average, and Google Trends data.
Data Augmentation Algorithms for Bayesian Analysis of Directional Data
Penalized likelihood estimation for Pearson's family of distributions, with an application to financial market risk
Abstract: Pearson’s family of distributions consists of all continuous densities f which are solutions to the differential equation:
f' = −g_β f, where g_β(x) = (x − β1) / (β2 + β3 x + β4 x^2) for all x in a connected subset of the real line and β = (β1, β2, β3, β4) is a given vector.
It is a rich class of models which includes many classical distributions and which can accommodate both skewness and flexible tail behavior. However, estimation of a Pearson density is challenging because a small variation in β can induce a wild change in the shape of the solution f_β. In this talk, I will show how β and f_β can be estimated effectively through a penalized likelihood procedure incorporating Pearson’s differential equation. The approach relies on a parameter cascading method from the functional data analysis literature. Simulations and an illustration involving the S&P 500 index will show that it leads to estimates of Value-at-Risk and Expected Shortfall that can substantially improve market risk assessment by outperforming the estimates currently used by financial institutions and regulators. This talk is based on joint work with M. Carey (Dublin) and my colleague J.O. Ramsay.
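As a concrete instance of Pearson's differential equation, setting β3 = β4 = 0 and β2 = σ² > 0 gives f'/f = −(x − β1)/σ², whose solution is the normal density with mean β1 and variance σ². The sketch below checks this numerically by integrating the equation on a grid. It illustrates only the differential equation, not the penalized likelihood procedure from the talk; the function name `pearson_density`, the grid, and the parameter values are assumptions.

```python
import numpy as np

def pearson_density(beta, grid):
    """Numerically solve f' = -g_beta f on a grid and normalize to integrate to 1."""
    b1, b2, b3, b4 = beta
    g = (grid - b1) / (b2 + b3 * grid + b4 * grid ** 2)
    # log f(x) = -(integral of g), accumulated with the trapezoidal rule.
    steps = np.diff(grid)
    log_f = -np.concatenate([[0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * steps)])
    f = np.exp(log_f - log_f.max())
    area = np.sum(0.5 * (f[1:] + f[:-1]) * steps)
    return f / area

grid = np.linspace(-8.0, 8.0, 4001)
mu, sigma = 1.0, 1.5  # hypothetical values for the check
f = pearson_density((mu, sigma ** 2, 0.0, 0.0), grid)
normal = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
max_err = np.max(np.abs(f - normal))
print(max_err)
```

The grid must avoid roots of the denominator β2 + β3 x + β4 x^2; here the denominator is the constant σ², so that is not a concern.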
A Multivariate Spatio-temporal Change Point Model of Opioid Overdose Deaths in Ohio
Abstract: Ohio is one of the states most impacted by the opioid epidemic and experienced the second highest age-adjusted fatal drug overdose rate in 2017. Initially, it was believed that prescription opioids were driving the opioid crisis in Ohio. However, as the epidemic evolved, opioid overdose deaths due to fentanyl have drastically increased. In this work, we develop a Bayesian multivariate spatio-temporal model for Ohio county overdose death rates from 2007 to 2018 due to different types of opioids. The log-odds are assumed to follow a spatially varying change point regression model. By assuming the regression coefficients are a multivariate conditional autoregressive process, we capture spatial dependence within each drug type and also dependence across drug types. The proposed model allows us to not only study spatio-temporal trends in overdose death rates, but also to detect county-level shifts in these trends over time for various types of opioids.
Dr. Staci Hepler is originally from southern Ohio and earned a Bachelor's degree in Mathematics Education from Shawnee State University in 2010 before going on to earn a PhD in Statistics from The Ohio State University. In 2015 Staci joined the faculty in the Department of Mathematics and Statistics at Wake Forest University in Winston-Salem, NC. Her primary research interests are in applied spatio-temporal statistics and Bayesian modeling, and she focuses on problems in public health, ecology, and environmental science.
R.L. Anderson Lecture
The final event in the Statistics Seminar Series for the 2019-2020 Academic Year will be the R.L. Anderson lecture, presented by University of Kentucky faculty member Dr. Richard Kryscio.
Statistics Seminar Series - Lecture 4
Conjugate Bayesian Modeling of High-Dimensional Count Valued Survey Data Under Informative Sampling Designs
Abstract: We introduce a computationally efficient Bayesian model for predicting high-dimensional dependent count-valued data. In this setting, the Poisson data model with a latent Gaussian process model has become the de facto model. However, this model can be difficult to use in high-dimensional settings, where the data may be tabulated over different variables, geographic regions, and times. These computational difficulties are further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, many current approaches to Bayesian inference require one to carefully calibrate a Markov chain Monte Carlo (MCMC) technique. We avoid MCMC methods that require tuning by developing a new conjugate multivariate distribution. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used, resulting in an area-level model. In contrast, unit-level models for survey data offer many advantages over their area-level counterparts, such as potential for more precise estimates and a natural benchmarking property. However, two main challenges occur in this context: accounting for an informative survey design and handling non-Gaussian data types. The pseudo-likelihood approach is one solution to the former, and conjugate multivariate distribution theory offers a solution to the latter. By combining these approaches, we attain a unit-level model for count data that accounts for informative sampling designs and includes fully Bayesian model uncertainty propagation. Importantly, conjugate full conditional distributions hold under the pseudo-likelihood, yielding an extremely computationally efficient approach. Our methods are illustrated using data obtained from the US Census Bureau’s American Community Survey (ACS) and Longitudinal Employer-Household Dynamics (LEHD) program.
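The claim that conjugacy survives under a pseudo-likelihood can be seen in a much simpler setting than the one in the abstract. The sketch below assumes a single Poisson rate λ with a Gamma(a, b) prior (shape a, rate b) and raises each observation's likelihood to its survey weight w_i. The pseudo-likelihood is then proportional to λ^(Σ w_i y_i) · exp(−λ Σ w_i), so the pseudo-posterior is Gamma(a + Σ w_i y_i, b + Σ w_i). This univariate gamma-Poisson example is a stand-in, not the multivariate model from the talk; the function name, data, and weights are hypothetical.

```python
import numpy as np

def pseudo_posterior_gamma(y, w, a=1.0, b=1.0):
    """Gamma (shape, rate) pseudo-posterior for a Poisson rate with survey weights."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    shape = a + np.sum(w * y)  # prior shape plus weighted count total
    rate = b + np.sum(w)       # prior rate plus total weight
    return shape, rate

# Hypothetical counts and survey weights.
y = [2, 0, 5, 3]
w = [1.0, 2.0, 0.5, 1.5]
shape, rate = pseudo_posterior_gamma(y, w)

# Exact draws from the pseudo-posterior; no MCMC tuning is needed.
rng = np.random.default_rng(1)
draws = rng.gamma(shape, 1.0 / rate, size=1000)  # NumPy's gamma takes scale = 1 / rate
print(shape, rate, draws.mean())
```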