陈俊帆
Paper
Open-Set Semi-Supervised Text Classification with Latent Outlier Softening
Posted: 2025-10-22
Venue: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), CCF-A
Abstract: Semi-supervised text classification (STC) has been studied extensively as a way to reduce human annotation effort. However, the assumption in existing work that unlabeled data contains only in-distribution texts is unrealistic. This paper extends STC to a more practical Open-set Semi-supervised Text Classification (OSTC) setting, in which the unlabeled data may contain out-of-distribution (OOD) texts. The main challenge in OSTC is the false-positive inference problem caused by inadvertently including OOD texts during training. To address this problem, we first develop baseline models that use outlier detectors for hard OOD-data filtering in a pipeline procedure. We then propose a Latent Outlier Softening (LOS) framework that integrates semi-supervised training and outlier detection within probabilistic latent-variable modeling. LOS softens the impact of OOD texts through the Expectation-Maximization (EM) algorithm and weighted entropy maximization. Experiments on three newly constructed datasets show that LOS significantly outperforms the baselines.
Co-authors: 陈俊帆, 张日崇, Junchi Chen, 胡春明, Yongyi Mao
Paper type: International academic conference
Pages: 226-236
Translation:
Publication date: 2023-01-01
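The abstract's contrast between hard OOD filtering and latent outlier softening can be illustrated with a minimal sketch. Everything below (function names, the form of the per-example weight) is an assumption for illustration only, not the paper's LOS implementation: the idea shown is that each unlabeled text receives a soft in-distribution weight that scales its contribution to an entropy term, instead of the text being kept or discarded outright.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over class logits.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_entropy_loss(logits, id_weights):
    """Per-example prediction entropy, scaled by a soft weight
    (e.g., an estimated probability of being in-distribution).
    A hypothetical illustration of soft outlier weighting,
    not the LOS objective from the paper."""
    p = softmax(logits)
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)  # entropy per example
    return float((id_weights * ent).mean())
```

With a weight of 0, an example contributes nothing, which corresponds to hard filtering; intermediate weights soften its influence rather than making a binary keep/discard decision.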