Speaker: Hsin-Cheng Huang, Distinguished Research Fellow (Institute of Statistical Science, Academia Sinica)

  • Event Date: 2025-05-16


Topic: Data Splitting for Statistical Inference: From High-Dimensional Regression to Generative Modeling

Time: May 16 (Friday), 2025, 10:40-11:30

Place: 4F-427, Assembly Building I


Online seminar: Google Meet

Abstract
Data splitting is a core principle in statistical learning, offering a structured approach to decouple model training, hyperparameter tuning, and inference. This talk highlights how data splitting enables valid and adaptable inference procedures in two modern settings: high-dimensional regression and generative modeling.

In the first part, I present a multi-split framework for assigning valid p-values in high-dimensional regression. The method repeatedly partitions the data into two subsets, using one for variable selection and the other for inference via ordinary least squares regression. Stability selection identifies variables consistently selected across splits, and the resulting dependent p-values are aggregated using a Cauchy combination approach. This yields theoretically justified and interpretable p-values for post-selection inference.

In the second part, I discuss ongoing work on calibrating predictive distributions from generative models. Outputs from diffusion models and other complex architectures can be poorly calibrated, especially with limited data. We investigate split conformal prediction as a model-agnostic strategy to construct valid predictive intervals, achieving reliable coverage without requiring a correctly specified generative mechanism.
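The aggregation step in the first part can be illustrated with a minimal sketch of the standard Cauchy combination statistic, T = sum_i w_i tan((0.5 - p_i) pi), whose combined p-value is 0.5 - arctan(T)/pi. The sketch assumes each split has already produced a p-value for one variable from an OLS fit on the held-out half. The abstract does not say how splits in which the variable was not selected are handled, or how weights are chosen, so the equal default weights, the function name `cauchy_combination`, and the example p-values are illustrative assumptions, not the speaker's implementation.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(
    pvalues: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Combine possibly dependent p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with nonnegative weights summing to 1.
    The combined p-value is P(C > T) for a standard Cauchy variable C,
    i.e. 0.5 - arctan(T) / pi.
    """
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights (an assumption), normalized below
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    total = float(sum(weights))
    t = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly inside (0, 1)")
        if p < 1e-15:
            # tan((0.5 - p) * pi) loses precision here; 1 / (p * pi) is its leading term.
            term = 1.0 / (p * math.pi)
        else:
            term = math.tan((0.5 - p) * math.pi)
        t += (w / total) * term
    if t > 1.0:
        # Equal to 0.5 - atan(t) / pi, but avoids cancellation when t is large.
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values for one variable from five random splits (illustrative numbers).
per_split = [0.004, 0.02, 0.011, 0.3, 0.008]
print(cauchy_combination(per_split))
```

The Cauchy statistic is attractive here because its tail behaviour is largely insensitive to the dependence among the per-split p-values, which is exactly the situation created by reusing the same data across many random splits.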
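Split conformal prediction for the second part can likewise be sketched in a few lines. This is a generic, minimal version under stated assumptions, not the method from the talk: the hypothetical `sample_model` stands in for a generative model that is deliberately overconfident (noise scale 0.3 versus a true scale of 1.0), the conformity score is the absolute residual from the mean of generated samples, and the calibration set is assumed to be held out from whatever data trained the generator. The conformal quantile is the ceil((n+1)(1 - alpha))-th smallest calibration score.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def sample_model(x, m, rng):
    # Hypothetical generative model: draws m samples of Y given x,
    # but with noise scale 0.3 while the true noise scale is 1.0 (overconfident).
    return [2.0 * x + rng.gauss(0.0, 0.3) for _ in range(m)]


def simulate_y(x, rng):
    # Hypothetical data-generating process used only for this illustration.
    return 2.0 * x + rng.gauss(0.0, 1.0)


rng = random.Random(0)
alpha = 0.1  # target miscoverage
m = 200      # generated samples per input

# Calibration split: data never used to train the generator.
x_cal = [rng.uniform(-1, 1) for _ in range(500)]
y_cal = [simulate_y(x, rng) for x in x_cal]
scores = [abs(y - sum(sample_model(x, m, rng)) / m) for x, y in zip(x_cal, y_cal)]
q = conformal_quantile(scores, alpha)

# Test points: compare naive sample-quantile intervals with conformal ones.
naive_hits = conf_hits = 0
n_test = 2000
for _ in range(n_test):
    x = rng.uniform(-1, 1)
    y = simulate_y(x, rng)
    draws = sorted(sample_model(x, m, rng))
    lo, hi = draws[int(alpha / 2 * m)], draws[int((1 - alpha / 2) * m) - 1]
    naive_hits += lo <= y <= hi
    center = sum(draws) / m
    conf_hits += center - q <= y <= center + q

naive_coverage = naive_hits / n_test
conformal_coverage = conf_hits / n_test
print(f"naive: {naive_coverage:.3f}, conformal: {conformal_coverage:.3f}")
```

When calibration and test points are exchangeable, the conformal interval has marginal coverage of at least 1 - alpha however miscalibrated the generator is, whereas the naive interval built from the generator's own sample quantiles inherits its overconfidence. The absolute-residual score yields intervals of constant width; scores scaled by the spread of the generated samples are a common alternative when adaptive widths are wanted.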