Safe and Secure Generative AI
Publications
Prompt injection attacks and defenses
- Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong. "DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks". In IEEE Symposium on Security and Privacy, 2025.
- Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, and Neil Zhenqiang Gong. "Optimization-based Prompt Injection Attack to LLM-as-a-Judge". In ACM Conference on Computer and Communications Security (CCS), 2024. Distinguished Paper Award.
- Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. "PLeak: Prompt Leaking Attacks against Large Language Model Applications". In ACM Conference on Computer and Communications Security (CCS), 2024.
- Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. "Formalizing and Benchmarking Prompt Injection Attacks and Defenses". In USENIX Security Symposium, 2024.
Detecting and attributing AI-generated content
- Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, and Neil Gong. "A Transfer Attack to Image Watermarks". In International Conference on Learning Representations (ICLR), 2025.
- Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, and Neil Zhenqiang Gong. "AudioMarkBench: Benchmarking Robustness of Audio Watermarking". In NeurIPS Datasets and Benchmarks, 2024.
- Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, and Neil Zhenqiang Gong. "Certifiably Robust Image Watermark". In European Conference on Computer Vision (ECCV), 2024.
- Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, and Neil Zhenqiang Gong. "Watermark-based Attribution of AI-Generated Content". In arXiv, 2024.
- Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, and Neil Gong. "Stable Signature is Unstable: Removing Image Watermark from Diffusion Models". In arXiv, 2024.
- Zhengyuan Jiang, Jinghuai Zhang, and Neil Zhenqiang Gong. "Evading Watermark based Detection of AI-Generated Content". In ACM Conference on Computer and Communications Security (CCS), 2023.
Preventing harmful content generation and jailbreaking
- Yueqi Xie, Minghong Fang, Renjie Pi, and Neil Zhenqiang Gong. "GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis". In Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
- Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. "SneakyPrompt: Jailbreaking Text-to-image Generative Models". In IEEE Symposium on Security and Privacy (IEEE S&P), 2024.
Hallucinations
- Wen Huang, Hongbin Liu, Minxin Guo, and Neil Zhenqiang Gong. "Visual Hallucinations of Multi-modal Large Language Models". In Findings of the Association for Computational Linguistics (ACL Findings), 2024.
Robustness to common perturbations
- Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, and Xing Xie. "PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts". In arXiv, 2023.
Poisoning and backdoor attacks to embedding foundation models (e.g., CLIP) and defenses
These foundation models underpin many generative AI systems: for instance, the CLIP text encoder is used in text-to-image models, and the CLIP vision encoder is used in multi-modal LLMs (a minimal usage sketch follows this list).
- Hongbin Liu, Michael K Reiter, and Neil Zhenqiang Gong. "Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models". In USENIX Security Symposium, 2024.
- Jinghuai Zhang, Hongbin Liu, Jinyuan Jia, and Neil Zhenqiang Gong. "Data Poisoning based Backdoor Attacks to Contrastive Learning". In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
- Hongbin Liu, Jinyuan Jia, and Neil Zhenqiang Gong. "PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning". In USENIX Security Symposium, 2022.
- Jinyuan Jia, Yupei Liu, and Neil Zhenqiang Gong. "BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning". In IEEE Symposium on Security and Privacy, 2022.
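A minimal sketch of this reuse, assuming the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint (the image path is a placeholder): the same pre-trained CLIP model supplies the text embeddings that text-to-image models condition on and the image features that multi-modal LLMs consume, which is why a poisoned or backdoored encoder affects every generative system built on top of it.

```python
# Sketch only: how one pre-trained CLIP checkpoint feeds two kinds of
# generative systems. Assumes Hugging Face `transformers` and Pillow.
from PIL import Image
from transformers import (CLIPTokenizer, CLIPTextModel,
                          CLIPImageProcessor, CLIPVisionModel)

ckpt = "openai/clip-vit-base-patch32"  # public checkpoint used for illustration

# Text encoder: text-to-image models (e.g., latent diffusion pipelines)
# condition image generation on these text embeddings.
tokenizer = CLIPTokenizer.from_pretrained(ckpt)
text_encoder = CLIPTextModel.from_pretrained(ckpt)
tokens = tokenizer(["a photo of a cat"], padding=True, return_tensors="pt")
text_embeds = text_encoder(**tokens).last_hidden_state    # (1, seq_len, hidden)

# Vision encoder: multi-modal LLMs project these image features into the
# LLM's token space so the model can reason about the image.
processor = CLIPImageProcessor.from_pretrained(ckpt)
vision_encoder = CLIPVisionModel.from_pretrained(ckpt)
pixels = processor(images=Image.open("example.jpg"), return_tensors="pt")  # placeholder image
image_embeds = vision_encoder(**pixels).last_hidden_state  # (1, num_patches, hidden)

# A backdoor or poisoning attack on either encoder therefore propagates to
# every generative system that builds on it.
print(text_embeds.shape, image_embeds.shape)
```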
Intellectual property and data-use auditing in model training
We study intellectual property protection for both model providers and users. For model providers, we study attacks that steal foundation models and defenses against them. For users, we study auditing/tracing whether their data was used to pre-train foundation models (a generic illustration follows this list).
- Zonghao Huang, Neil Zhenqiang Gong, and Michael K Reiter. "A General Framework for Data-Use Auditing of ML Models". In ACM Conference on Computer and Communications Security (CCS), 2024.
- Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, and Bhuwan Dhingra. "ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods". In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
- Yupei Liu, Jinyuan Jia, Hongbin Liu, and Neil Zhenqiang Gong. "StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning". In ACM Conference on Computer and Communications Security (CCS), 2022.
- Hongbin Liu*, Jinyuan Jia*, Wenjie Qu, and Neil Zhenqiang Gong. "EncoderMI: Membership Inference against Pre-trained Encoders in Contrastive Learning". In ACM Conference on Computer and Communications Security (CCS), 2021. *Equal contribution
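As a concrete illustration of the auditing side, the sketch below shows the generic likelihood signal that membership-inference-style data auditing builds on: text seen during training tends to receive a higher average log-likelihood under the model. It assumes the Hugging Face transformers library and the public gpt2 checkpoint, and it is only a generic baseline, not the methods of the papers above (e.g., ReCaLL or the data-use auditing framework).

```python
# Sketch of the generic likelihood-based membership signal used in data
# auditing; not the specific methods of the papers above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")     # public model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return -loss.item()

# Text the model saw during (pre-)training tends to score higher. A threshold
# calibrated on known non-member text turns the score into a membership call.
candidate = "The quick brown fox jumps over the lazy dog."
threshold = -3.5  # hypothetical value; must be calibrated per model and domain
print("likely member" if avg_log_likelihood(candidate) > threshold else "likely non-member")
```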
Talks
- Talks on safe and robust generative AI were given at Google, Microsoft Research Asia, the Privacy and Security in ML Seminars, the ICLR 2022 Workshop on Socially Responsible Machine Learning (SRML), etc. The talk given at Google is available on YouTube [here].
Code and Data
- [Code and data] for prompt injection attacks and defenses.
- [Code and data] for PromptBench.
- [Code and data] for watermark-based detection of AI-generated content.
- [Code and data] for BadEncoder.
Slides
- Safe and Robust Generative AI [Slides].
- Secure Foundation Models [Slides].
- Robustness of Watermark-based Detection of AI-generated Content [Slides].