Seminar. Speaker: Distinguished Research Fellow Hsin-Cheng Huang (黃信誠), Institute of Statistical Science, Academia Sinica

  • Event date: 2025-05-16


Title: Data Splitting for Statistical Inference: From High-Dimensional Regression to Generative Modeling

Speaker: Distinguished Research Fellow Hsin-Cheng Huang (黃信誠), Institute of Statistical Science, Academia Sinica

Time: Friday, May 16, 2025 (ROC year 114), 10:40-11:30 a.m.
    (Tea reception 10:20-10:40 a.m. in Room 428, 綜合一館)

Venue: Room 427, 綜合一館


The talk will be streamed live on Google Meet.
The meeting opens 20 minutes before the talk begins; click the link below and press "Join" to enter.
https://meet.google.com/pie-jmyd-cra

 
Abstract
Data splitting is a core principle in statistical learning, offering a structured approach to decouple model training, hyperparameter tuning, and inference. This talk highlights how data splitting enables valid and adaptable inference procedures in two modern settings: high-dimensional regression and generative modeling. In the first part, I present a multi-split framework for assigning valid p-values in high-dimensional regression. The method repeatedly partitions the data into two subsets, using one for variable selection and the other for inference via ordinary least squares regression. Stability selection identifies variables consistently selected across splits, and the resulting dependent p-values are aggregated using a Cauchy combination approach. This yields theoretically justified and interpretable p-values for post-selection inference. In the second part, I discuss ongoing work on calibrating predictive distributions from generative models. Outputs from diffusion models and other complex architectures can be poorly calibrated, especially with limited data. We investigate split conformal prediction as a model-agnostic strategy to construct valid predictive intervals, achieving reliable coverage without requiring a correctly specified generative mechanism.
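
For readers unfamiliar with the two generic tools named in the abstract, the following is a minimal sketch of each, not the speaker's implementation. It assumes equal weights in the Cauchy combination and absolute residuals as the conformity score in split conformal prediction; the function names `cauchy_combination` and `split_conformal_halfwidth` are hypothetical and chosen only for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine possibly dependent p-values with the Cauchy combination rule.

    Assumption: equal weights unless the caller supplies weights summing to 1.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper-tail probability of a standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


def split_conformal_halfwidth(calibration_residuals, alpha=0.1):
    """Half-width q so that [prediction - q, prediction + q] targets
    1 - alpha coverage, given residuals from a held-out calibration split.

    Assumption: the conformity score is the absolute residual.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return scores[rank - 1]
```

In use, the model is fitted on one part of the data, residuals are computed on the held-out calibration part, and the returned half-width is added to and subtracted from each new point prediction. With very small calibration sets the half-width can be infinite, which reflects the finite-sample guarantee rather than a bug.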