Paper
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, Libo Qin, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che. Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), April 2025, Albuquerque, New Mexico, USA. (CCF B)
Release time: 2025-01-23
