
Statistics Seminar

Seminar

Jonas Beck, Paris Lodron University of Salzburg, Department of Artificial Intelligence and Human Interfaces, Austria, jonas.beck@plus.ac.at

Abstract: A fundamental functional in nonparametric statistics is the Mann-Whitney functional θ = P(X < Y), which constitutes the basis for the most popular nonparametric procedures. The functional θ measures a location or stochastic tendency effect between two distributions. A limitation of θ is its inability to capture scale differences. If differences of this nature are to be detected, specific tests for scale or omnibus tests need to be employed. However, the latter often suffer from low power, and they do not yield interpretable effect measures. In this manuscript, we extend θ by additionally incorporating the recently introduced distribution overlap index I² (a nonparametric dispersion measure) that can be expressed in terms of the quantile process. We derive the joint asymptotic distribution of the respective estimators of θ and I² and construct confidence regions. Extending the Wilcoxon-Mann-Whitney test, we introduce a new test based on the joint use of these functionals. It has a much larger consistency region while maintaining power competitive with the rank sum test in situations where θ alone would suffice. Compared with classical omnibus tests, the simulated power is much improved. Additionally, the newly proposed inference method yields effect measures whose interpretation is surprisingly straightforward.
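As a quick illustration of the first ingredient (and only that one, since the abstract gives no closed form for I²), here is a minimal NumPy sketch of the standard estimator θ̂ = (1/(nm)) Σ_i Σ_j [1(X_i < Y_j) + ½·1(X_i = Y_j)]; the simulated data are illustrative, not from the talk.

```python
import numpy as np

def theta_hat(x, y):
    """Estimate the Mann-Whitney functional theta = P(X < Y),
    counting ties as 1/2, from two independent samples."""
    x = np.asarray(x)[:, None]   # shape (n, 1)
    y = np.asarray(y)[None, :]   # shape (1, m)
    return np.mean((x < y) + 0.5 * (x == y))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.5, 1.0, size=40)   # location shift, so theta > 1/2
print(theta_hat(x, y))              # near P(X < Y) = Phi(0.5 / sqrt(2)) ~ 0.64
```

This θ̂ equals the Mann-Whitney U statistic divided by nm, which is why the rank sum test and θ carry the same location information.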

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Gaussian Process Modeling for Dissolution Curve Comparison

Abstract: Dissolution studies are an integral part of pharmaceutical drug development, yet standard methods for analyzing dissolution data are inadequate for capturing the true underlying shapes of the dissolution curves. Methods based on similarity factors, such as the f2 statistic, have been developed to demonstrate comparability of dissolution curves; however, their inability to capture the shapes of the curves can lead to substantial bias in comparability estimators. In this talk, we propose two novel semi-parametric dissolution curve modeling strategies for establishing the comparability of dissolution curves. The first method relies upon hierarchical Gaussian process regression models to construct an f2 statistic based on continuous-time modeling that results in significant bias reduction. The second method uses a Bayesian model selection approach to create a framework that does not suffer from the limitations of the f2 statistic. Overall, these two methods are shown to be superior to their comparator methods and provide feasible alternatives for similarity assessment under practical limitations. Illustrations highlighting the success of our methods are provided for two motivating real dissolution data sets from the literature, as well as extensive simulation studies.
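For readers unfamiliar with the f2 statistic referenced above, the sketch below computes the classical similarity factor f2 = 50·log10(100 / sqrt(1 + mean squared difference)) between two profiles observed at common time points; the profiles here are made up for illustration and are unrelated to the talk's data sets.

```python
import numpy as np

def f2(ref, test):
    """Classical f2 similarity factor between two dissolution profiles
    (percent dissolved) measured at the same time points."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    msd = np.mean((ref - test) ** 2)          # mean squared difference
    return 50.0 * np.log10(100.0 / np.sqrt(1.0 + msd))

# Illustrative profiles; f2 >= 50 is conventionally read as "similar".
ref  = [22, 41, 62, 79, 90, 96]
test = [19, 38, 58, 76, 88, 95]
print(round(f2(ref, test), 1))                # ~76: similar profiles
```

Because f2 depends only on pointwise squared differences, it is blind to the shape of the curves between (and at) the sampled time points, which is the bias the talk's continuous-time modeling addresses.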

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Linear Models for Matrix-Variate Data

Abstract: Observations made on p response variables, with each response variable measured over n sites or time points, form a matrix-variate response and arise across a wide range of disciplines, including medical, environmental, and agricultural studies. The observations in an (n × p)-dimensional matrix-variate sample are not independent but doubly correlated. The popularity of the classical general linear model (CGLM) is largely due to the ease of modeling and of verifying the model's appropriateness. However, the CGLM is not appropriate for doubly correlated matrix-variate data. We propose an extension of the CGLM for matrix-variate data with exchangeably distributed errors across multiple observations. Maximum likelihood estimates of the matrix parameters of the intercept, the slope, and the eigenblocks of the exchangeable error matrix are derived, along with the distributions of these estimators. The practical implications of the methodological aspects of the proposed extended model are demonstrated using two medical datasets.
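A minimal sketch of what "doubly correlated" means here, assuming a matrix-normal form Y = M + A Z Bᵀ with an exchangeable (compound-symmetry) row covariance; this is only a toy data-generating illustration, not the talk's estimation method.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3                     # n sites/time points, p response variables

# Exchangeable row covariance (correlation rho between any two rows)
# and an arbitrary column covariance between the p responses.
rho = 0.5
U = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
V = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.3],
              [0.1, 0.3, 1.0]])

M = np.zeros((n, p))            # mean (intercept) matrix
A = np.linalg.cholesky(U)
B = np.linalg.cholesky(V)

# Y ~ MatrixNormal(M, U, V); equivalently vec(Y) ~ N(vec(M), V kron U),
# so every entry is correlated with every other entry: double correlation.
Z = rng.standard_normal((n, p))
Y = M + A @ Z @ B.T
print(Y.shape)                  # one doubly correlated (n x p) observation
```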

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Surrogate method for partial association between mixed data with application to well-being survey analysis

Abstract: 

This paper is motivated by the analysis of a survey study focusing on college student well-being before and after the COVID-19 pandemic outbreak. A statistical challenge in well-being studies lies in the multidimensionality of outcome variables, recorded on various scales such as continuous, binary, or ordinal. The presence of mixed data complicates the examination of their relationships when adjusting for important covariates. To address this challenge, we propose a unifying framework for studying partial association between mixed data. We achieve this by defining a unified residual using the surrogate method. The idea is to map the residual randomness to a consistent continuous scale, regardless of the original scales of the outcome variables. This framework applies to parametric or semiparametric models for covariate adjustment. We validate the use of such residuals for assessing partial association, introducing a measure that generalizes classical Kendall's tau to capture both partial and marginal associations. Moreover, our development advances the theory of the surrogate method by demonstrating its applicability without requiring outcome variables to have a latent variable structure. In the analysis of the college student well-being survey, our proposed method reveals the relationships between multidimensional well-being measures and both micro-level personal risk factors (e.g., physical health, loneliness, and accommodation) and the macro-level disruption caused by COVID-19.
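The sketch below shows the classical latent-variable version of the surrogate residual for an ordinal probit model (the talk extends the idea beyond latent-variable structures); the model, cut points, and data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

def surrogate_residuals(eta, y, cuts):
    """Surrogate residuals for an ordinal probit model: sample a latent
    S ~ N(eta, 1) truncated to the interval implied by the observed
    category y, then return R = S - eta, a residual on a continuous scale.
    eta:  fitted linear predictor, shape (n,)
    y:    observed categories coded 0..K-1
    cuts: cut points (-inf, c1, ..., c_{K-1}, inf), length K+1"""
    lo, hi = cuts[y], cuts[y + 1]
    a, b = lo - eta, hi - eta          # standardized truncation bounds
    s = truncnorm.rvs(a, b, loc=eta, scale=1.0, random_state=rng)
    return s - eta

# Toy example: one covariate, 3 ordered categories.
n = 200
x = rng.normal(size=n)
beta = 1.0
cuts = np.array([-np.inf, -0.5, 0.8, np.inf])
latent = beta * x + rng.normal(size=n)
y = np.searchsorted(cuts[1:-1], latent)       # observed ordinal outcome
r = surrogate_residuals(beta * x, y, cuts)    # ~ N(0,1) if the model is right
print(r.mean(), r.std())
```

Because R lives on one continuous scale whatever the outcome's original scale, residuals from continuous, binary, and ordinal outcomes can be fed into a common partial-association measure, which is the role they play in the framework above.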

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Importance tempering of Markov chain Monte Carlo schemes

Abstract: Informed importance tempering (IIT) is an easy-to-implement MCMC algorithm that can be seen as an extension of the familiar Metropolis-Hastings algorithm, with the special feature that informed proposals are always accepted; Zhou and Smith (2022) showed that it converges much more quickly in some common circumstances. This work develops a new, comprehensive guide to the use of IIT in many situations. First, we propose two IIT schemes that run faster than existing informed MCMC methods on discrete spaces by not requiring posterior evaluation of all neighboring states. Second, we integrate IIT with other MCMC techniques, including simulated tempering, pseudo-marginal, and multiple-try methods (on general state spaces), which have conventionally been implemented as Metropolis-Hastings schemes and can suffer from low acceptance rates. The use of IIT allows us to always accept proposals and opens new opportunities for optimizing the sampler that are not possible under the Metropolis-Hastings framework. Numerical examples illustrating our findings are provided for each proposed algorithm, and a general theory on the complexity of IIT methods is developed. Joint work with G. Li and A. Smith.
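A minimal sketch of the always-accept-and-reweight idea on a toy discrete target, assuming neighborhoods on a cycle and the balancing function h(r) = √r; this is a schematic reading of the scheme, not the speakers' implementation. The chain always moves to an informed neighbor, and each visited state is downweighted by its total outflow so that weighted averages still target π.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy target on Z_K (a cycle, so every state has exactly two neighbours).
K = 50
logpi = -0.5 * ((np.arange(K) - 20.0) / 4.0) ** 2   # unnormalized log target

h = np.sqrt        # balancing function: h(r) = r * h(1/r) holds for sqrt

def iit(n_steps, x0=0):
    """Informed moves among neighbours are ALWAYS accepted; visited state
    x gets importance weight 1/z(x), z(x) = sum over neighbours y of
    h(pi(y)/pi(x)). Weighting corrects the always-accept chain back to pi."""
    x = x0
    states, weights = [], []
    for _ in range(n_steps):
        nbrs = np.array([(x - 1) % K, (x + 1) % K])
        hvals = h(np.exp(logpi[nbrs] - logpi[x]))   # h(pi(y)/pi(x))
        z = hvals.sum()
        states.append(x)
        weights.append(1.0 / z)
        x = rng.choice(nbrs, p=hvals / z)           # always accept
    return np.array(states), np.array(weights)

s, w = iit(20000)
# Self-normalized importance estimate of E_pi[X]; should be near 20.
print(np.sum(w * s) / np.sum(w))
```

The detailed-balance check is one line: the unweighted chain is reversible with respect to π(x)z(x), so the 1/z(x) weights recover π exactly.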

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Balancing Inferential Integrity and Disclosure Risk via a Multiple Imputation Synthesis Strategy

Abstract: Responsible data sharing anchors research reproducibility and promotes the integrity of scientific research. The possibility of identification creates tension between data sharing, which facilitates medical treatment and collaborative research, and patient privacy protection. Releasing synthetic datasets generated from imputation models is one resolution, but information loss due to incorrect specification of the imputation models can weaken or even invalidate the inference obtained from the synthetic datasets. In this talk, we focus on privacy protection in the direction of statistical disclosure control. We introduce a synthetic component into the synthesis strategy behind the traditional multiple imputation framework to ease the task of conducting inference for researchers with limited statistical backgrounds. Tuning the injected synthetic component enables balancing inferential quality against disclosure risk, and its addition also protects against model misspecification. The framework can be combined with existing missing data methods to produce complete synthetic data sets for public release. We show, using the Canadian Scleroderma Research Group data set, that the new synthesis strategy achieves better data utility than direct use of the classical multiple imputation approach while providing similar or better protection against identity disclosure. This is joint work with Bei Jiang, Adrian Raftery, and Russell Steele.
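For orientation only, here is a generic partially synthetic data sketch in the multiple-imputation spirit, using a plug-in approximation to the posterior predictive and Reiter-style combining rules; it is not the talk's synthesis strategy, whose tuned synthetic component is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Replace a sensitive outcome y by m synthetic draws from a fitted normal
# linear model (plug-in approximation to the posterior predictive), then
# combine the m analyses with Reiter's rules for partially synthetic data.
n, m = 300, 5
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # confidential outcome

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ beta_hat) ** 2) / (n - 2)

estimates, variances = [], []
for _ in range(m):
    y_syn = X @ beta_hat + rng.normal(scale=np.sqrt(sigma2), size=n)
    b, *_ = np.linalg.lstsq(X, y_syn, rcond=None)
    s2 = np.sum((y_syn - X @ b) ** 2) / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    estimates.append(b[1])                    # slope estimate per copy
    variances.append(cov[1, 1])

q = np.mean(estimates)                        # combined point estimate
b_m = np.var(estimates, ddof=1)               # between-copy variance
T = b_m / m + np.mean(variances)              # combined variance (partial synthesis)
print(q, np.sqrt(T))
```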

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Causal Discovery from Multivariate Functional Data

Abstract: Discovering causal relationships from multivariate functional data has received significant attention recently. We introduce a functional linear structural equation model for causal structure learning. To enhance interpretability, our model involves a low-dimensional causal embedding space such that all the relevant causal information in the multivariate functional data is preserved in this lower-dimensional subspace. We prove that the proposed model is causally identifiable under standard assumptions that are often made in the causal discovery literature. To carry out inference, we develop a fully Bayesian framework with suitable prior specifications and uncertainty quantification through posterior summaries. We illustrate the superior performance of our method over existing methods in causal graph estimation through extensive simulation studies, and we demonstrate the proposed method using a brain EEG dataset.
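A toy sketch of the embedding idea only: each functional variable is reduced to a few principal-component scores, so structure learning can happen in a low-dimensional space. The talk's Bayesian machinery and identifiability theory are not reproduced; the data, dimensions, and the assumed causal order X → Y are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, d = 100, 50, 3            # n subjects, T grid points, d retained scores
tgrid = np.linspace(0, 1, T)

# Two functional variables with a true effect X -> Y.
X = rng.normal(size=(n, 1)) * np.sin(2 * np.pi * tgrid) \
    + 0.1 * rng.normal(size=(n, T))
Y = 0.8 * X + 0.1 * rng.normal(size=(n, T))

def fpca_scores(F, d):
    """First d functional principal-component scores via SVD of the
    centered curves (the low-dimensional embedding)."""
    Fc = F - F.mean(axis=0)
    U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:d].T        # (n, d) score matrix

sx, sy = fpca_scores(X, d), fpca_scores(Y, d)
# With the causal order assumed known, estimate the edge by least squares
# in the embedded space.
B, *_ = np.linalg.lstsq(sx, sy, rcond=None)
print(np.round(B, 2))           # nonzero block reflects the X -> Y effect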

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Recent Advances in Independence Testing of Stochastic Processes

Abstract:

This talk focuses on testing the independence of two stochastic processes. In contrast to i.i.d. data, data originating from a stochastic process typically exhibit strong correlations, an inherent feature that poses significant challenges for statistical inference. We will commence by reviewing the historical context of Yule's nonsense correlation, defined as the empirical correlation of two independent random walks; its distribution is known to be heavily dispersed and frequently large in absolute value, which demonstrates the difficulty of statistical inference for stochastic processes. The second part of the talk is devoted to AR(1) processes. We investigate the rate at which the distribution of the correlation between two independent AR(1) processes converges to the normal distribution; our analysis shows that this rate is of order √(log n / n), where n is the length of the processes. Finally, we will discuss the potential for a new methodology to test the independence of both random walks and AR(1) processes.
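Yule's phenomenon is easy to reproduce: the sketch below samples the empirical correlation of two independent random walks and shows that it remains heavily dispersed even for long series; the simulation settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def nonsense_correlations(n, reps=2000):
    """Empirical correlations of two INDEPENDENT random walks of length n.
    Unlike the i.i.d. case, this distribution does not concentrate at 0."""
    out = np.empty(reps)
    for i in range(reps):
        x = np.cumsum(rng.standard_normal(n))
        y = np.cumsum(rng.standard_normal(n))
        out[i] = np.corrcoef(x, y)[0, 1]
    return out

r = nonsense_correlations(n=1000)
# Despite independence, |corr| is frequently large.
print(np.std(r), np.mean(np.abs(r) > 0.5))
```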

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series:

Maximum Wilcoxon-Mann-Whitney Test in High Dimensional Applications

Abstract: 

The statistical comparison of two multivariate samples is a frequent task, e.g., in biomarker analysis. Parametric and nonparametric multivariate analysis of variance (MANOVA) procedures are well-established for the analysis of such data. Which method to use depends on the scales of the endpoints and on whether the assumption of a parametric multivariate distribution is meaningful. However, in case of a significant outcome, MANOVA methods can only provide the information that the treatments (conditions) differ in at least one of the endpoints; they cannot locate the endpoint(s) responsible. Multiple contrast tests, formulated as maximum tests, by contrast provide local test results and thus the information of interest.

The maximum test method controls the error rate by comparing the largest contrast in magnitude to the (1-α)-equicoordinate quantile of the joint distribution of all considered contrasts. The advantage of this approach over existing and commonly used methods that control the multiple type-I error rate, such as Bonferroni, Holm, or Hochberg, is that it is appealingly simple, yet has sufficient power to detect a significant difference in high-dimensional designs, and does not require strong assumptions (such as MTP2) about the joint distribution of the test statistics. Furthermore, the computation of simultaneous confidence intervals is possible. The challenge, however, is that the joint distribution of the test statistics must be known in order to implement the method.

In this talk, we develop a simultaneous maximum Wilcoxon-Mann-Whitney test for the analysis of multivariate data in two independent samples, considering both low- and high-dimensional designs. We derive the (asymptotic) joint distribution of the test statistic and propose different bootstrap approximations for small sample sizes, investigating their quality in extensive simulation studies. It turns out that the methods control the multiple type-I error rate well, even in high-dimensional designs with small sample sizes. A real data set illustrates the application.
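A rough sketch of the max-type construction: componentwise standardized Wilcoxon-Mann-Whitney statistics, a maximum over endpoints, and a critical value from a simple permutation approximation (the talk instead derives the joint distribution and proposes bootstrap approximations). All tuning choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def wmw_stats(X, Y):
    """Standardized WMW statistic per endpoint.
    X: (n1, p), Y: (n2, p) independent samples."""
    n1, n2 = len(X), len(Y)
    theta = np.mean((X[:, None, :] < Y[None, :, :])
                    + 0.5 * (X[:, None, :] == Y[None, :, :]), axis=(0, 1))
    sd0 = np.sqrt((n1 + n2 + 1) / (12.0 * n1 * n2))  # null sd of theta-hat
    return (theta - 0.5) / sd0

def max_wmw_test(X, Y, alpha=0.05, B=999):
    """Max statistic over endpoints, critical value from a permutation
    approximation to its (1 - alpha)-quantile under the null."""
    t_obs = np.max(np.abs(wmw_stats(X, Y)))
    Z, n1 = np.vstack([X, Y]), len(X)
    t_null = np.empty(B)
    for b in range(B):
        idx = rng.permutation(len(Z))
        t_null[b] = np.max(np.abs(wmw_stats(Z[idx[:n1]], Z[idx[n1:]])))
    return t_obs, np.quantile(t_null, 1 - alpha)

X = rng.normal(size=(25, 10))
Y = rng.normal(size=(30, 10)); Y[:, 0] += 1.0     # shift in endpoint 1 only
t, crit = max_wmw_test(X, Y)
print(t > crit)            # reject globally; the arg-max endpoint localizes it
```

Because the test rejects via the endpoint achieving the maximum, a significant outcome immediately points to the responsible endpoint(s), which is exactly the local information MANOVA procedures cannot provide.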

Date:
-
Location:
MDS 220
Tags/Keywords:
Event Series: