Statistics Colloquium Series

Marginal correlation measures for unpaired clustered data under cluster-based informativeness

In the marginal analysis of clustered data, two types of informativeness have been shown to bias standard methods for marginal inference: informative cluster size, in which the number of observations in a cluster is associated with a response variable, and subcluster covariate informativeness, in which the probability that a covariate takes a certain value is associated with the response. Monte Carlo-based within-cluster resampling estimators and cluster- and covariate-weighted analytic estimators have been suggested to adjust for both of these problems. In this talk, we suggest a unifying cluster-weighting paradigm for the marginal analysis of clustered data. We then apply this paradigm to unpaired clustered data (data that are paired at the cluster level but unpaired within cluster) and develop marginal correlation estimators for such data. The suggested estimators are evaluated through simulation studies and illustrated with an application to data from a longitudinal dental study.
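
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of cluster weighting under informative cluster size, not the estimators presented in the talk. It assumes a toy simulation (all function names and parameter values are hypothetical) in which large clusters tend to have larger responses. Pooling all observations over-represents large clusters, whereas weighting each observation by the inverse of its cluster size, or resampling one observation per cluster, targets the mean for a typical member of a typical cluster.

```python
import random


def simulate_clusters(n_clusters, seed=0):
    """Toy data (assumed): 'high' clusters have mean 1 and size 10,
    'low' clusters have mean 0 and size 2, so size is informative."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        high = rng.random() < 0.5
        mu, size = (1.0, 10) if high else (0.0, 2)
        clusters.append([rng.gauss(mu, 1.0) for _ in range(size)])
    return clusters


def pooled_mean(clusters):
    """Unweighted mean over all observations; large clusters dominate."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)


def cluster_weighted_mean(clusters):
    """Each observation gets weight 1/(cluster size), so every cluster
    contributes equally: this is the mean of the cluster means."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


def within_cluster_resampling_mean(clusters, n_resamples, seed=0):
    """Monte Carlo version: repeatedly draw one observation per cluster
    and average the resulting means over resamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        total += sum(rng.choice(c) for c in clusters) / len(clusters)
    return total / n_resamples


if __name__ == "__main__":
    data = simulate_clusters(4000, seed=1)
    print("pooled:", pooled_mean(data))
    print("cluster-weighted:", cluster_weighted_mean(data))
    print("within-cluster resampling:", within_cluster_resampling_mean(data, 50))
```

In this toy setup the cluster-level target is 0.5, while the pooled mean drifts toward the large-cluster mean; the resampling average approaches the cluster-weighted mean as the number of resamples grows.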

Date:
-
Location:
223 MDS Building

Running Markov chain without Markov basis

The methodology of Markov bases initiated by Diaconis and Sturmfels (1998) stimulated active research on Markov bases for more than a decade. It also motivated improvements of algorithms for Gröbner basis computation for toric ideals, such as those implemented in 4ti2. However, at present, explicit forms of Markov bases are known only for some relatively simple models, such as the decomposable models of contingency tables. Furthermore, general algorithms for Markov basis computation often fail to produce Markov bases even for moderate-sized models in a practical amount of time. Hence, so far we could not perform exact tests based on Markov basis methodology for many important practical problems. In this talk we introduce two alternative methods for running a Markov chain instead of using a Markov basis. The first is to use a Markov subbasis for connecting practical fibers. The second is to use a lattice basis, which is a basis of the integer kernel of a design matrix.
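
The lattice-basis idea can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the model is two-way independence (row and column sums fixed), the lattice basis consists of the adjacent 2x2 moves, the target is the hypergeometric distribution on the fiber, and each proposal is a random integer combination of basis elements with coefficients drawn from a symmetric distribution on the integers, so that a plain Metropolis acceptance step applies.

```python
import math
import random


def lattice_basis(r, c):
    """Adjacent 2x2 moves for an r x c table (assumed independence model).
    Each move leaves all row and column sums unchanged."""
    basis = []
    for i in range(r - 1):
        for j in range(c - 1):
            move = [[0] * c for _ in range(r)]
            move[i][j] = move[i + 1][j + 1] = 1
            move[i][j + 1] = move[i + 1][j] = -1
            basis.append(move)
    return basis


def symmetric_integer(rng):
    """Symmetric distribution on all integers: geometric size, random sign."""
    size = 0
    while rng.random() < 0.5:
        size += 1
    return size if rng.random() < 0.5 else -size


def log_weight(table):
    """Log of the unnormalized hypergeometric weight 1 / prod(x_ij!)."""
    return -sum(math.lgamma(x + 1) for row in table for x in row)


def step(table, basis, rng):
    """One Metropolis step using a random integer combination of basis moves."""
    proposal = [row[:] for row in table]
    for move in basis:
        coef = symmetric_integer(rng)
        if coef:
            for i, row in enumerate(move):
                for j, m in enumerate(row):
                    proposal[i][j] += coef * m
    if any(x < 0 for row in proposal for x in row):
        return table  # proposal left the fiber; stay put
    log_u = math.log(1.0 - rng.random())  # argument lies in (0, 1]
    if log_u < log_weight(proposal) - log_weight(table):
        return proposal
    return table


if __name__ == "__main__":
    rng = random.Random(0)
    table = [[3, 1, 2], [0, 4, 2]]
    basis = lattice_basis(2, 3)
    for _ in range(1000):
        table = step(table, basis, rng)
    print(table)
```

Because the coefficient distribution puts positive probability on every integer, any integer combination of basis elements can be proposed in a single step, which is what lets a lattice basis stand in for a full Markov basis in this sketch.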
Date:
-
Location:
223 MDS Building

An unbiased estimator for the mean of {0,1} random variables whose relative error distribution is known

Refreshments: 3:30-4:00 in 312 MDS building
 
Consider the problem of estimating the mean p of a {0,1} random variable. This problem arises in many areas, such as experiments with binary data, estimating exact p-values, and acceptance-rejection methods for integration and summation. Suppose it is desired to know p to a set number of significant figures. Then it is necessary to bound the relative error of the estimate. In this talk I will present a new estimate q such that the relative error (q/p - 1) of the estimate has a distribution that does not depend on p at all! This allows for a guaranteed quality of the relative error of the estimate using far fewer samples on average than previous methods.
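
The abstract does not say how q is constructed. One construction with the stated property, given here as a hedged sketch rather than as the method of the talk, attaches an independent Exp(1) variable to every {0,1} draw and stops at the k-th success. The accumulated total R then follows a Gamma distribution with shape k and rate p, so p·R is Gamma(k, 1) and q/p = (k - 1)/(p·R) has a distribution free of p. The names `scale_free_estimate`, `draw`, and `k` are illustrative.

```python
import random


def scale_free_estimate(draw, k, rng):
    """Estimate the mean p of a {0,1} variable.

    draw: zero-argument function returning 0 or 1 (assumed i.i.d.).
    k:    number of successes to wait for (k >= 2 so the mean exists).
    Returns (k - 1) / R, where R sums one Exp(1) variable per draw.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    successes = 0
    total = 0.0
    while successes < k:
        total += rng.expovariate(1.0)  # one exponential per Bernoulli draw
        successes += draw()
    return (k - 1) / total


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.02
    q = scale_free_estimate(lambda: 1 if rng.random() < p else 0, 1000, rng)
    print("estimate:", q, "relative error:", q / p - 1)
```

Since the distribution of q/p depends only on k, the value of k needed for a given relative-error guarantee can be chosen in advance from Gamma(k, 1) quantiles, without knowing p.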
Date:
-
Location:
MDS 223

MODELLING MARK-RECAPTURE DATA WITH MISIDENTIFICATION

Refreshments: 3:30-4:00

312 MDS building

Mark-recapture methods are crucial for studying animals in the wild and for monitoring species threatened by changes in the environment. The assumption that individuals are identified without error is standard in most models of mark-recapture data. However, mistakes can always occur, and it is difficult to account for such errors. The recorded data present a corrupted version of what was actually observed, and naively modelling these data will produce incorrect estimates. However, the possible configurations of true data consistent with the recorded data may be so numerous and complex that it is usually impossible to evaluate the likelihood function.

In this talk, I discuss several contributions I have made to modelling mark-recapture data with identification errors. First, I introduce methods that I have developed to model data from populations in which each individual can be identified from multiple marks that cannot be linked (e.g., left and right skin pigmentation patterns). Building on the framework of the latent multinomial model introduced by Link et al. (Biometrics, 2010), these methods provide Bayesian inference using Markov chain Monte Carlo (MCMC) to sample from the joint posterior distribution of the model parameters and the true data. I illustrate these methods with data from a photo-identification study of whale sharks and use this example to show how the MCMC algorithm of Link et al. (2010) can be simplified to produce faster computation.

I also discuss problems with extending these methods to account for more complex identification errors. I show that the MCMC algorithm proposed by Link et al. (2010) may not produce irreducible Markov chains for some models and explain how this problem may be solved with new MCMC algorithms. I conclude by discussing my ongoing work to develop improved MCMC algorithms using Markov bases and further tools from algebraic statistics.

Date:
-
Location:
223 MDS Building

Predictive accuracy of covariates for survival outcomes

Li Chen; Assistant Professor, Markey Cancer Center and the Department of Biostatistics

Sept 20th, 4-5 p.m.

MDS 223

Refreshments: 3:30-4:00

312 MDS building

We propose a graphical measure, the negative predictive function, to quantify the predictive accuracy of covariates for survival outcomes. This new measure characterizes the survival probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. A breast cancer gene expression study is provided for illustration.

Date:
-
Location:
312 MDS Building

Goodness-of-fit testing in Ising Models

Markov bases have been developed in algebraic statistics for exact goodness-of-fit testing. They connect all elements in a fiber (given by the sufficient statistics) and allow building a Markov chain to approximate the distribution of a test statistic by its posterior distribution. However, finding a Markov basis is often computationally intractable. In addition, the number of Markov steps required for converging to the stationary distribution depends on the connectivity of the sampling space.
 
In this joint work with Caroline Uhler, we study the combinatorial structure of the finite lattice Ising model and propose a new method for exact goodness-of-fit testing which avoids computing a Markov basis. Our technique is to build a Markov chain consisting only of simple moves (i.e. swaps of two interior sites). These simple moves might not be sufficient to create a connected Markov chain. We prove that when a bounded change in the sufficient statistics is allowed, the resulting Markov chain is connected. The proposed algorithm not only overcomes the computational burden of finding a Markov basis, but it might also lead to a better connectivity of the sampling space and hence a faster convergence.
Date:
-
Location:
220 MDS Building

The effect of uncertainty about the background population on the forensic value of evidence

 

Presenter:
Dr. Christopher Saunders
Assistant Professor
Department of Mathematics and Statistics
South Dakota State University
 
January 18, 2013
4:00-5:00 p.m.
MDS 220
 
Refreshments: 312 MDS building
3:30-4:00
 
Abstract: 
A goal in the forensic interpretation of scientific evidence is to make an inference about the source of a trace of unknown origin; the inference usually concerns two propositions. The first proposition is usually referred to as the prosecution hypothesis and states that a given specific source is the actual source of the trace of unknown origin. The second, usually referred to as the defense hypothesis, states that the actual source of the trace of unknown origin is randomly selected from a relevant alternative source population, i.e., the background population. The evidence that a forensic scientist is given for deciding between these two propositions is: (a) the trace of unknown origin, (b) a sample from the specific source specified by the prosecution hypothesis, and (c) a collection of samples from the alternative source population. One common approach is to assume that the collection of samples from the alternative source population is sufficiently large to completely specify the alternative source population and to rely on a value of evidence for deciding between the competing hypotheses, as described in Lindley (1977). In this presentation, we present our construction of a Bayes Factor for deciding between the prosecution and defense hypotheses when the collection of samples from the alternative source population is not sufficiently large to completely characterize the alternative source population. We argue that the resulting Bayes Factor should be considered the Value of the Evidence and discuss its relationship to the standard value of evidence as developed by Lindley and presented in Aitken and Taroni (2004). We conclude with a discussion of some of our concerns about the effect of prior choice for the nuisance parameters in the alternative and specific source distributions on the resulting Bayes Factor.
We will illustrate the construction of the Bayes Factors with a well-studied collection of samples relating to glass fragments under the assumption of a hierarchical normal model.
Date:
-
Location:
MDS 220

A Nonparametric Linear Hazards Model for Waiting Times from a Multistate Model

Refreshments: 15:30 in MDS 312

Students visit: 15:00 in MDS 312

Abstract: 

Traditional methods for the analysis of failure time data are often employed in the marginal analysis of waiting times from multistate models. However, such methods can exhibit substantial bias when transition times between model states are dependent, even when censoring is independent. We introduce a nonparametric, inverse probability of censoring–weighted (IPCW) linear hazard model for waiting times from multistate models, analogous to Aalen’s linear hazard model for failure time data. We provide a weak convergence result for the IPCW regression coefficient estimator and illustrate its unbiasedness through a simulation study, while also demonstrating the bias of the traditional linear hazard model for failure time data when waiting times are correlated. The IPCW estimators are used to examine prognostic indicators for patients receiving bone marrow transplants and predictors of ambulatory recovery in a data set of incomplete spinal cord injury patients receiving activity-based rehabilitation. This is joint work with Doug Lorenz.

Date:
-
Location:
MDS 220

Semiparametric Analysis of possibly time-dependent Treatment Effect with Survival Data

Refreshments: MDS 312 @ 15:30

Abstract:

For clinical trials with survival data, the hazard ratio has been the most widely used measure for describing the treatment effect. The short-term and long-term hazard ratio model of Yang and Prentice (2005) contains the proportional hazards model and the proportional odds model as sub-models, and does not impose restrictions, such as a zero or infinite hazard ratio in the short term or long term, that many other semiparametric models impose. Thus it provides sufficient flexibility when there is possibly a treatment-by-time interaction. We investigate various measures under this model. Point estimates, point-wise confidence intervals, and simultaneous confidence bands of the hazard ratio and a few other measures are established. These results can be used to capture and to graphically present the treatment effect. We also investigate extension of the model to allow covariate-adjusted analysis. We illustrate these visual tools and discuss their merits and limitations in applications to clinical trials including the Women's Health Initiative.
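
For orientation, the two-sample form of the short-term and long-term hazard ratio model can be written as follows. The notation is assumed here rather than taken from the abstract: λ_C and S_C denote the control-group hazard and survivor functions, λ_T the treatment-group hazard, and β_1, β_2 the log short-term and log long-term hazard ratios.

```latex
\[
  \frac{\lambda_T(t)}{\lambda_C(t)}
  = \frac{1}{e^{-\beta_1} S_C(t) + e^{-\beta_2}\{1 - S_C(t)\}},
  \qquad
  \lim_{t \to 0} \frac{\lambda_T(t)}{\lambda_C(t)} = e^{\beta_1},
  \qquad
  \lim_{t \to \infty} \frac{\lambda_T(t)}{\lambda_C(t)} = e^{\beta_2}.
\]
```

In this form, setting β_1 = β_2 makes the ratio constant in t, which recovers the proportional hazards model, and setting β_2 = 0 recovers the proportional odds model. The limit as t grows assumes S_C(t) tends to 0.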

Date:
-
Location:
MDS 220