Personal Homepage

Personal Information

Associate Professor

Supervisor of Master's Candidates

Date of Employment:2025-05-21

School/Department:School of Software

Education Level:Postgraduate (doctoral)

Business Address:New Main Building C808, G517

Gender:Male

Contact Information:18810578537

Degree:Doctorate

Status:Employed

Alma Mater:Beihang University

Discipline:Software Engineering, Computer Science and Technology

Junfan Chen

Paper

Open-Set Semi-Supervised Text Classification with Latent Outlier Softening

Journal:Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), CCF-A
Abstract:Semi-supervised text classification (STC) has been extensively researched and reduces the need for human annotation. However, the assumption in existing research that unlabeled data contains only in-distribution texts is unrealistic. This paper extends STC to a more practical Open-set Semi-supervised Text Classification (OSTC) setting, which assumes that the unlabeled data contains out-of-distribution (OOD) texts. The main challenge in OSTC is the false positive inference problem caused by inadvertently including OOD texts during training. To address the problem, we first develop baseline models that use outlier detectors for hard OOD-data filtering in a pipeline procedure. Furthermore, we propose a Latent Outlier Softening (LOS) framework that integrates semi-supervised training and outlier detection within probabilistic latent variable modeling. LOS softens the impact of OOD texts through the Expectation-Maximization (EM) algorithm and weighted entropy maximization. Experiments on three created datasets show that LOS significantly outperforms the baselines.
Co-author:Junfan Chen, Richong Zhang, Junchi Chen, Chunming Hu, Yongyi Mao
Indexed by:International academic conference
Page Number:226-236
Translation or Not:no
Date of Publication:2023-01-01