Higher-Order Asymptotics for Repeated Measures Analysis under General Conditions
Abstract:
In human genetics, many quantitative traits, such as blood pressure, are thought to be influenced by particular genes but are also affected by environmental factors, making the associated genes difficult to identify and locate from genetic data alone. As a result, classical statistical methods have difficulty detecting and localizing single nucleotide polymorphisms (SNPs) associated with quantitative traits in genome-wide association study (GWAS) data. I will present a coalescent approach that searches for SNPs associated with quantitative traits in GWAS data by taking into account the evolutionary history among SNPs, and I will evaluate its performance using simulated data. I will also present results from applying the methodology to a real data set to search for SNPs associated with high-density lipoprotein cholesterol in mice. By combining methods from stochastic processes and phylogenetics, this work provides an innovative avenue for the development of new statistical methodology in statistical genetics.
Abstract: We present a method for summarizing and visualizing large, tree-structured data. Many data sets can be represented by a rooted, node-weighted tree, such as a company organizational chart, clicks on webpages, flows to and from IP addresses, or hard disk file structures, where the weights represent some attribute of interest for each node. If such a tree has thousands (or millions) of nodes, it is difficult to visualize on a single sheet of paper or computer screen. We define a way to aggregate the weights of a large, n-node tree into a smaller k-node “summary tree” (where k is something like 50 or 100), and we present a dynamic programming algorithm to compute the summary tree with maximum entropy among all summary trees of a given size, where the entropy of a node-weighted tree is defined as the entropy of the discrete probability distribution whose probabilities are the normalized node weights. We discuss and provide examples of how this algorithm produces useful visualizations and may also be optimal for certain kinds of data analysis tasks. The talk will be heavy on visualization techniques, but I will also spend some time discussing statistical issues related to hierarchical data.
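The entropy criterion in the abstract above can be illustrated with a minimal sketch. It computes only the entropy of a node-weighted tree as defined there (the Shannon entropy of the normalized node weights); it does not implement the dynamic programming algorithm, which the abstract does not spell out. The function name `tree_entropy`, the choice of base-2 logarithms, and the example weights are assumptions made purely for illustration.

```python
import math


def tree_entropy(weights):
    """Entropy in bits of the distribution given by normalized node weights.

    Only the weights matter for this quantity; the tree structure constrains
    which nodes may be aggregated, not how the entropy itself is computed.
    """
    if any(w < 0 for w in weights):
        raise ValueError("node weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("node weights must have a positive sum")
    # Zero-weight nodes contribute nothing (0 * log 0 is taken as 0).
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


# Hypothetical 4-node tree, one weight per node.
full_tree = [4, 2, 1, 1]

# One hypothetical 3-node summary: the two weight-1 nodes are aggregated
# into a single node whose weight is their sum.
summary_tree = [4, 2, 2]

print(tree_entropy(full_tree))     # entropy before aggregation
print(tree_entropy(summary_tree))  # entropy of the candidate summary
```

Under this definition, comparing candidate k-node summaries amounts to comparing their entropies; the algorithm described in the talk searches for the candidate with the largest value.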