Title: Bayesian Model Criticism: From Holdout Checks to Model Comparison
Abstract: In this talk, I will cover two recent works on Bayesian model criticism. The first is Holdout Predictive Checks (HPCs). HPCs are built on posterior predictive checks (PPCs), which check a model by assessing the posterior predictive distribution on the observed data. PPCs, however, use the data twice: once to calculate the posterior predictive and once to evaluate it. This double use of the data can lead to uncalibrated p-values. HPCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution: a held-out dataset. This method blends Bayesian modeling with frequentist assessment.
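As a rough sketch in notation of our own choosing (not necessarily the paper's: T is a discrepancy or test statistic, and x_tr, x_ho denote a train/holdout split), the two p-values differ in where the realized discrepancy comes from:

```latex
% Classical PPC p-value: the observed data x enters twice, once in the
% posterior predictive and once in the realized discrepancy.
p_{\mathrm{PPC}} = \Pr\!\big( T(x^{\mathrm{rep}}) \ge T(x) \mid x \big),
  \qquad x^{\mathrm{rep}} \sim p(\,\cdot \mid x).

% HPC sketch: condition the posterior predictive on the training split and
% evaluate the discrepancy on the held-out data, so no datum is used twice.
p_{\mathrm{HPC}} = \Pr\!\big( T(x^{\mathrm{rep}}) \ge T(x_{\mathrm{ho}}) \mid x_{\mathrm{tr}} \big),
  \qquad x^{\mathrm{rep}} \sim p(\,\cdot \mid x_{\mathrm{tr}}).
```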
We prove that, unlike the PPC, the HPC is properly calibrated. Empirically, we study HPCs on classical regression, a hierarchical model of text data, and factor analysis. In the second work, we introduce the posterior predictive null check (PPN), a method for Bayesian model criticism that helps characterize the relationships between models. The idea behind the PPN is to check whether data from one model's predictive distribution can pass a predictive check designed for another model. This form of criticism complements the classical predictive check by providing a comparative tool.
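One plausible formalization, again in our own notation rather than the paper's exact definition: a directed PPN from model A to model B draws data from A's predictive and scores it with the check built for B,

```latex
% PPN from A to B (our notation): data from A's predictive is judged by
% B's check; a non-extreme p-value means the A-data "passes".
\mathrm{PPN}(A \to B)
  = \Pr\!\big( T_B(x^{\mathrm{rep}}_B) \ge T_B(\tilde{x}_A) \big),
  \qquad \tilde{x}_A \sim p_A(\,\cdot \mid x),
  \quad x^{\mathrm{rep}}_B \sim p_B(\,\cdot \mid x).
```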
A collection of PPNs, which we call a PPN study, can help us understand which models are equivalent and which models provide different perspectives on the data. With mixture models, we demonstrate how a PPN study, along with traditional predictive checks, can help select the number of components by the principle of parsimony. With probabilistic factor models, we demonstrate how a PPN study can help understand relationships between different classes of models, such as linear models and models based on neural networks. Finally, we discuss ongoing work on aggregated posterior checks.
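To make the comparative idea concrete, here is a minimal, self-contained sketch. The two Gaussian samplers are toy stand-ins for real posterior predictive distributions of fitted models, and every name here (draw_from_A, check_pvalue, ppn) is ours for illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the posterior predictive samplers of two fitted models.
def draw_from_A(n=200):
    return rng.normal(0.0, 1.0, size=n)  # "narrow" model

def draw_from_B(n=200):
    return rng.normal(0.0, 2.0, size=n)  # "wide" model

def check_pvalue(x, draw_replicate, statistic, n_rep=200):
    """One-sided predictive check: estimate P(T(x_rep) >= T(x))."""
    t_obs = statistic(x)
    t_rep = np.array([statistic(draw_replicate(len(x))) for _ in range(n_rep)])
    return (t_rep >= t_obs).mean()

def ppn(draw_null_model, draw_check_model, statistic, n_draws=50):
    """PPN sketch: run the check built for one model on datasets drawn
    from another model's predictive; return the resulting p-values."""
    return np.array([
        check_pvalue(draw_null_model(), draw_check_model, statistic)
        for _ in range(n_draws)
    ])

# If data from A routinely lands in the tails of B's variance check,
# the two models disagree about that aspect of the data.
pvals = ppn(draw_from_A, draw_from_B, np.var)
print("share of extreme p-values:", np.mean((pvals < 0.05) | (pvals > 0.95)))
```

If the p-values look roughly uniform, the two models are interchangeable as far as this statistic is concerned; a pile-up in the tails signals that they offer genuinely different views of the data, which is the kind of information a PPN study aggregates across pairs of models and check statistics.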