Abstract: This talk concerns integrative data analytics and distributed inference in data integration. As data sharing across related studies becomes of interest, statistical methods for the joint analysis of all available datasets are needed in practice to achieve better statistical power and to detect signals that cannot be captured from any single dataset alone. A major challenge in integrative data analytics pertains to the principles of information aggregation, learning data heterogeneity, and inference and algorithms for model fusion. Generalizing the classical theoretical foundation of information aggregation, we propose a new framework of distributed inference functions and divide-and-conquer algorithms to handle large-scale correlated data. I will focus on two new approaches: renewable estimation and incremental inference (REII), and the distributed and integrated method of moments (DIMM). I will discuss both the conceptual formulations and the theoretical guarantees of these methods, and illustrate their performance via numerical examples. This is joint work with Emily Hector, Lan Luo and Ling Zhou.