陈俊帆
Paper
Preserving Label Correlation for Multi-label Text Classification by Prototypical Regularizations
Posted: 2025-10-22
Published in: Proceedings of the ACM on Web Conference 2025 (WWW), CCF-A
Abstract: Multi-label text classification (MLTC) aims to assign multiple relevant labels to a given sentence. Compared with multi-class text classification, an inherent challenge of MLTC is capturing label correlations. Existing MLTC models primarily focus on leveraging correlation information but often overlook the common issue of overfitting. Meanwhile, plug-and-play regularization methods struggle to preserve correlations effectively. In this paper, we distinguish two types of label correlations, explicit co-occurrence correlations and implicit semantic correlations, and propose two regularization methods based on prototypical label embeddings, one to preserve each type. Specifically, we first generate the prototypical label embedding of multiple co-occurring labels as an intermediate representation. We then apply a prototypical label regularization on the distance between the sentence embedding and the corresponding prototypical label embedding to alleviate the over-alignment issue caused by binary cross-entropy loss and to facilitate explicit correlation preservation. Finally, we extend vanilla Mixup, which mixes only multi-hot labels, to also mix prototypical label embeddings, which promotes implicit correlation preservation. Empirical studies show the effectiveness of our regularization methods.
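The abstract names three steps: building a prototypical label embedding, regularizing the sentence-to-prototype distance, and mixing prototypes alongside multi-hot labels. The following is only an illustrative sketch of how such steps could look, not the paper's implementation. The abstract does not specify the aggregation, the distance, or the mixing rule, so the mean over active label embeddings, the squared Euclidean distance, and the linear interpolation with a fixed `lam` are all assumptions, as are the function names and toy values.

```python
from typing import List

Vector = List[float]


def prototype(label_embeddings: List[Vector], multi_hot: List[int]) -> Vector:
    """Average the embeddings of the active labels.

    Assumption: the abstract does not say how the prototype is built;
    a plain mean is used here for illustration only.
    """
    active = [emb for emb, y in zip(label_embeddings, multi_hot) if y == 1]
    if not active:
        raise ValueError("at least one active label is required")
    dim = len(active[0])
    return [sum(emb[d] for emb in active) / len(active) for d in range(dim)]


def prototype_distance(sentence: Vector, proto: Vector) -> float:
    """Squared Euclidean distance between a sentence embedding and its prototype.

    Assumption: the distance function is not given in the abstract.
    """
    return sum((s - p) ** 2 for s, p in zip(sentence, proto))


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Mixup-style linear interpolation: lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Hypothetical toy data: three labels with 2-dimensional embeddings.
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_a = [1, 1, 0]  # sample A carries labels 0 and 1
y_b = [0, 0, 1]  # sample B carries label 2

proto_a = prototype(label_embeddings, y_a)
proto_b = prototype(label_embeddings, y_b)

# Regularization term for sample A (would be added to the BCE loss).
sentence_a = [0.5, 0.0]
reg_a = prototype_distance(sentence_a, proto_a)

# Vanilla Mixup mixes only the multi-hot labels; the extension sketched
# here mixes the prototypes with the same coefficient.
lam = 0.5
mixed_labels = mix([float(v) for v in y_a], [float(v) for v in y_b], lam)
mixed_proto = mix(proto_a, proto_b, lam)
```

In a real model the embeddings would be learned tensors and `lam` would typically be sampled per batch; the fixed values here only keep the sketch easy to follow.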
Co-authors: Fanshuang Kong, 张日崇, Xiaohui Guo, 陈俊帆, Ziqiao Wang
Paper type: International academic conference
Pages: 3300-3310
Publication date: 2025-01-01