
Statistics Seminar

Quantile Regression for Panel Data and Factor Models

 

Abstract: For nearly 25 years, advances in longitudinal data and quantile regression developed almost entirely in parallel, with no intersection until the work of Koenker (2004). The early theoretical work in statistics and economics raised more questions than answers, but it encouraged the development of several promising new approaches and research that offered a better understanding of the challenges and possibilities at the intersection of the two literatures. Panel data quantile regression allows the estimation of effects that are heterogeneous throughout the conditional distribution of the response variable, while controlling for individual- and time-specific confounders. This type of heterogeneous effect is not well summarized by the average effect. For instance, the relationship between the number of students in a class and average educational achievement has been extensively investigated, but research also shows that class size affects low-achieving and high-achieving students differently. Recent advances in panel data include several methods and algorithms that have created opportunities for more informative and robust empirical analysis in models with subject heterogeneity and factor structure.
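The mechanics behind quantile regression can be illustrated with the check (pinball) loss: the τ-th sample quantile minimizes the total check loss, and quantile regression extends this idea to conditional quantiles. A minimal stdlib-Python sketch (the data and the brute-force minimizer are illustrative, not from the talk):

```python
# The tau-th sample quantile minimizes the check (pinball) loss
# rho_tau(u) = u * (tau - 1{u < 0}); quantile regression generalizes
# this criterion to conditional quantiles of a response.

def pinball_loss(y, q, tau):
    """Total check loss of candidate quantile q for data y."""
    return sum((tau - (yi < q)) * (yi - q) for yi in y)

def sample_quantile(y, tau):
    """Brute-force minimizer of the check loss over the data points."""
    return min(y, key=lambda q: pinball_loss(y, q, tau))

y = [3, 1, 4, 1, 5, 9, 2, 6]
med = sample_quantile(y, 0.5)   # a sample median
```

For even sample sizes the minimizer is not unique (any point between the two middle order statistics works); the brute-force search simply returns the first data point attaining the minimum.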

 

Date:
Location: https://uky.zoom.us/j/81136716902
Event Series:

Meta Clustering for Collaborative Learning

 

Abstract: A growing number of learning scenarios involve multiple learners/analysts, each equipped with a unique dataset and algorithm, who may collaborate with each other to enhance their learning performance. From the perspective of a particular learner, a careless collaboration with task-irrelevant learners is likely to incur modeling error. A crucial problem is to search for the most appropriate collaborators so that their data and modeling resources can be effectively leveraged. Motivated by this, we propose to study the problem of ‘meta clustering’, where the goal is to identify subsets of relevant learners whose collaboration will improve the performance of each individual learner. In particular, we study the scenario where each learner is performing a supervised regression, and the meta clustering aims to categorize the underlying supervised relations (between responses and predictors) rather than the raw data. We propose a general method named Select-Exchange-Cluster (SEC) for performing such a clustering. Our method is computationally efficient, as it does not require the learners to exchange their raw data. We prove that the SEC method can accurately cluster the learners into appropriate collaboration sets according to their underlying regression functions. Synthetic and real data examples show the desired performance and wide applicability of SEC across a variety of learning tasks.

 

Date:
Location: https://uky.zoom.us/j/84096885475
Event Series:

Statistical Methods for Complex Cancer Data: Q&A with Dr. Chi Wang

 

Abstract: State-of-the-art high-throughput technologies, such as next-generation sequencing and mass spectrometry, have dramatically advanced the understanding of biological organisms and diseases. However, the big and complex data generated from those technologies pose substantial challenges in data analysis that demand rapid development of new statistical methods. In this talk, I will present several statistical methods my students and I have developed for analyzing genomic, transcriptomic, proteomic and metabolomic data with applications in cancer studies. I will also discuss ongoing projects and future directions.

 

Date:
Location: https://uky.zoom.us/j/83971643563
Event Series:

From Collaboration to Dissertation: The Case of LD50

 

Abstract: It is common for authors to contact a statistician, saying, “My paper has been tentatively accepted, but I’ve been told to consult a statistician.” This typically indicates that something is wrong or lacking in the paper’s data analysis. Often the authors need a “better” model for their data. In the case we are going to discuss, the model was fine, but there were multiple problems with the way the data were summarized. My first attempts at constructing more appropriate summaries failed miserably. Eventually, I employed an untested version of the bootstrap, which produced reasonable confidence intervals with meaningful interpretations. After completing the analysis, I recognized that I had implemented a new methodology for computing a confidence interval for the LD50, the dose of a drug needed to kill 50% of the experimental units. Such an innovation still has various empirical and theoretical properties that can be explored, and it can thus serve as the foundation for a dissertation.
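The talk does not specify which bootstrap variant was used, but the general percentile-bootstrap recipe for a confidence interval can be sketched as follows. The sample median stands in for the LD50 estimator (which would come from a dose-response fit), since the resampling logic is the same; all data and names here are illustrative:

```python
import random

# Percentile bootstrap: resample the data with replacement, recompute
# the statistic on each resample, and read the CI off the empirical
# quantiles of the bootstrap replicates.

def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    reps = sorted(
        statistic([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return (s[n // 2] + s[(n - 1) // 2]) / 2

data = [2.1, 2.4, 2.8, 3.0, 3.1, 3.5, 3.9, 4.2, 4.4, 5.0]
lo, hi = percentile_bootstrap_ci(data, median)
```

For an actual LD50 analysis, `statistic` would refit the dose-response model on each resample and return its estimated 50% lethal dose.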

 

Date:
Location: https://uky.zoom.us/j/82169066733
Event Series:

A Bayesian Stochastic Approximation Method

Abstract: Motivated by the goal of improving the efficiency of a sequential design with a small sample size, we propose a novel Bayesian stochastic approximation method for estimating the root of a regression function. The method features adaptive local modeling and nonrecursive iteration. Consistency of the Bayes estimator is established. Simulation studies show its superior small-sample performance relative to Robbins–Monro type procedures. An extension to a version of the generalized multivariate quantile is also presented.
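For context, here is a minimal sketch of the classical Robbins–Monro recursion that the proposed Bayesian method is benchmarked against. The toy regression function, step-size constant, and noise level are illustrative assumptions, not from the talk:

```python
import random

# Robbins-Monro root finding: seek the root of a regression function f
# from noisy evaluations via x_{n+1} = x_n - a_n * y_n, with step
# sizes a_n = c / n shrinking just slowly enough for convergence.

def robbins_monro(noisy_f, x0, n_steps=5000, c=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        y = noisy_f(x, rng)        # noisy observation of f(x)
        x = x - (c / n) * y        # step toward the root
    return x

# Toy regression function f(x) = 2 * (x - 1.5), observed with noise;
# its root is 1.5.
def noisy_f(x, rng):
    return 2.0 * (x - 1.5) + rng.gauss(0.0, 0.5)

root = robbins_monro(noisy_f, x0=0.0)
```

The slow 1/n steps are exactly what makes such procedures inefficient at small sample sizes, which is the regime the talk's Bayesian alternative targets.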

 

Bio: Jin Xu is a professor of statistics at East China Normal University. He received his PhD from Bowling Green State University. His research interests include statistical methods in clinical trials, multivariate analysis, and sequential designs.

 

Date:
Location: https://uky.zoom.us/j/84973251075
Event Series:

Central Quantile Subspace

Abstract: Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. There is a great amount of work on linear and nonlinear QR models. In particular, nonparametric estimation of conditional quantiles has received special attention due to its model flexibility. However, nonparametric QR techniques are limited in the number of covariates they can handle. Dimension reduction offers a solution to this problem by allowing low-dimensional smoothing without specifying any parametric or nonparametric regression relation. Existing dimension reduction techniques focus on the entire conditional distribution. We, on the other hand, turn our attention to dimension reduction techniques for conditional quantiles and introduce a new method for reducing the dimension of the predictor X. The novelty of this work is threefold. We start with a single-index quantile regression model, which assumes that the conditional quantile depends on X through a single linear combination of the predictors; we then extend to a multi-index quantile regression model; and finally, we generalize the proposed methodology to any statistical functional of the conditional distribution. The performance of the methodology is demonstrated through simulation examples and real data applications. Our results suggest that the method has good finite-sample performance and often outperforms existing methods.

 

Date:
Location: https://uky.zoom.us/j/82109355405
Event Series:

Support Vector Machine-Based Real-Time Sufficient Dimension Reduction

Abstract: In this talk we discuss one of the first efforts toward real-time sufficient dimension reduction. Support Vector Machine (SVM) based sufficient dimension reduction algorithms were proposed over the last decade to provide a unified framework for linear and nonlinear sufficient dimension reduction. We present our idea of using a variant of the classic SVM algorithm, known as Least Squares SVM (LSSVM), to achieve real-time sufficient dimension reduction. We demonstrate the computational advantages and efficiency of our algorithm through simulated and real data experiments. This is joint work with my collaborators Yuexiao Dong (Temple University) and Seung Jun Shin (Korea University).
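The computational appeal of LSSVM is that it replaces the SVM's inequality constraints with equalities, so training reduces to solving a single linear system rather than a quadratic program. A toy, self-contained sketch of that reduction (linear kernel, dense Gaussian elimination, tiny illustrative data; this is a plain classifier, not the authors' dimension reduction algorithm):

```python
# LSSVM training solves the bordered linear system
#   [ 0   1^T          ] [ b     ]   [ 0 ]
#   [ 1   K + I/gamma  ] [ alpha ] = [ y ]
# for the bias b and dual coefficients alpha.

def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lssvm_train(X, y, gamma=10.0):
    k = lambda u, v: sum(a * b for a, b in zip(u, v))   # linear kernel
    n = len(X)
    A = [[0.0] + [1.0] * n] + [
        [1.0] + [k(X[i], X[j]) + (1.0 / gamma if i == j else 0.0)
                 for j in range(n)]
        for i in range(n)
    ]
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * k(xi, x) for a, xi in zip(alpha, X))

X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
f = lssvm_train(X, y)   # classify new points by the sign of f(x)
```

Because a linear system admits cheap rank-one updates as observations arrive, this formulation lends itself to the real-time setting discussed in the talk.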

 

Date:
Location: https://uky.zoom.us/j/92843330341
Event Series:

Kaplan-Meier Estimator, Alternative Variance Formula and Restricted Mean Survival Time Based Tests

Abstract: Recently, many research reports have advocated the use of the Restricted Mean Survival Time (RMST) to compare treatment effects when the proportional hazards assumption is in doubt (i.e., when the log-rank test may not work well). SAS/STAT version 15.1 and later include this option.
 
We shall take a closer look at the variance of the Kaplan-Meier integral, both theoretically (as it relates to the semiparametric Fisher information) and in terms of how to estimate it (if we must).
 
Simulations of proposed improvements over existing RMST test calculations will be presented.
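The two quantities in the title are simple to compute: the Kaplan-Meier product-limit estimator, and the RMST as the area under the KM step function up to a truncation time τ. A stdlib-Python sketch with illustrative data (not from the talk):

```python
# Kaplan-Meier: at each event time t, multiply survival by
# (1 - deaths/at_risk).  RMST(tau) is the area under the resulting
# step function on [0, tau].

def kaplan_meier(times, events):
    """Return [(event_time, survival), ...] for right-censored data.
    events[i] is 1 for an observed event, 0 for censoring."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            censored += 1 - pairs[i][1]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

def rmst(curve, tau):
    """Area under the KM step function on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)   # [(1, 0.8), (3, 8/15), (4, 4/15)]
```

Comparing two arms by the difference of their RMSTs at a common τ is the alternative to the log-rank test that the abstract refers to.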
 
Date:
Location: https://uky.zoom.us/j/83021956044
Event Series:

Fiducial Distributions with Applications

Abstract: The fiducial distribution for a parameter is essentially a posterior distribution obtained with no prior distribution on the parameter. In this talk, we describe Fisher's method of finding a fiducial distribution for a parameter, and fiducial inference, through examples involving well-known distributions such as the normal and binomial. We then describe the approach for finding fiducial distributions for the parameters of a location-scale family. In particular, we shall see fiducial methods for finding confidence intervals, prediction intervals, prediction limits for the mean of a future sample, and one-sided tolerance limits in one-way random models. An application to the analysis of zero-inflated lognormal data will also be discussed. All the methods will be illustrated with practical examples.
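Fisher's argument is easy to demonstrate for a normal mean with known σ: the pivot Z = √n(x̄ − μ)/σ is standard normal, so fiducially μ = x̄ − σZ/√n, and Monte Carlo draws of Z yield draws from the fiducial distribution, whose quantiles give a fiducial interval (here coinciding with the usual z-interval). A sketch with illustrative data; the talk covers richer cases such as location-scale families:

```python
import random

# Fiducial distribution of a normal mean with known sigma: invert the
# pivot Z = sqrt(n) * (xbar - mu) / sigma to get mu = xbar - sigma*Z/sqrt(n),
# then sample Z ~ N(0, 1) to trace out the fiducial distribution of mu.

def fiducial_interval(data, sigma, alpha=0.05, n_draws=100_000, seed=0):
    rng = random.Random(seed)
    n = len(data)
    xbar = sum(data) / n
    draws = sorted(xbar - sigma * rng.gauss(0, 1) / n ** 0.5
                   for _ in range(n_draws))
    return draws[int(alpha / 2 * n_draws)], draws[int((1 - alpha / 2) * n_draws) - 1]

data = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0]  # xbar = 10.0
lo, hi = fiducial_interval(data, sigma=0.2)
```

For unknown σ the same recipe applies with a Student-t pivot, which is how the location-scale results in the talk generalize this example.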

 

Date:
Location: https://uky.zoom.us/j/86465597933
Event Series:

On model-based clustering of skewed matrix and tensor data

Abstract: The existing finite mixture modeling and model-based clustering literature focuses primarily on the analysis of multivariate data observed in the form of vectors, with each element representing a specific feature. In this setting, multivariate Gaussian mixture models have been the most commonly used. Due to the severe modeling issues that arise when normal components cannot provide an adequate fit to the groups, much attention has been paid to developing models capable of accounting for skewness in data. We target the problem of mixture modeling with components that can handle skewness in matrix- and tensor-valued data. The proposed developments open a wide range of modeling capabilities, with numerous applications, as illustrated in the talk.

 

Date:
Location: https://uky.zoom.us/j/88952989266
Event Series: