
Balancing Inferential Integrity and Disclosure Risk via a Multiple Imputation Synthesis Strategy

Date:
-
Location:
MDS 220
Speaker(s) / Presenter(s):
Dr. Naisyin Wang

Abstract: Responsible data sharing anchors research reproducibility and promotes the integrity of scientific research. The possibility of identification creates tension between sharing data to facilitate medical treatment or collaborative research and protecting patient privacy. Information loss due to incorrect specification of imputation models can weaken or even invalidate the inference obtained from synthetic data sets. In this talk, we focus on privacy protection in the direction of statistical disclosure control. We introduce a synthetic component into the synthesis strategy behind the traditional multiple imputation framework to ease the task of conducting inferences for researchers with limited statistical backgrounds. Tuning the injected synthetic component makes it possible to balance inferential quality against disclosure risk. Its addition also has the advantage of protecting against model misspecification. This framework can be combined with existing missing data methods to produce complete synthetic data sets for public release. We show, using the Canadian Scleroderma Research Group data set, that the new synthesis strategy achieves better data utility than the direct use of the classical multiple imputation approach while providing similar or better protection against identity disclosure. This is joint work with Bei Jiang, Adrian Raftery, and Russell Steele.
