Abstract
Large Language Models (LLMs) can memorize training data and, when specifically prompted, reproduce or leak information from their training data. Information leakage has been observed for all types of machine-learning models, but the threat is considerably larger for LLMs because of their wide range of applications as generative AI. This chapter relates the threat of information leakage to other adversarial threats, provides an overview of the current state of research on the mechanisms behind memorization in LLMs, and discusses adversarial attacks that aim to extract memorized information from LLMs.
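To make the prompting-based leakage scenario concrete, the following is a minimal sketch of a prefix-continuation extraction probe against an open causal language model, assuming the Hugging Face transformers API. The model name, the prefix, and the checked "canary" string are illustrative assumptions, not a specific attack from the chapter.

# Minimal sketch of a prompt-based extraction probe (assumptions: Hugging Face
# transformers is installed; "gpt2" is a placeholder model; the prefix and the
# expected continuation are hypothetical examples, not real training data).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM can be probed the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical prefix taken from a document suspected to be in the training set.
prefix = "John Doe's email address is"
inputs = tokenizer(prefix, return_tensors="pt")

# Greedy decoding keeps the probe deterministic; memorized sequences tend to be
# reproduced verbatim under greedy or low-temperature decoding.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
continuation = tokenizer.decode(outputs[0], skip_special_tokens=True)

# If the continuation reproduces the exact string from the suspected training
# document, that is evidence of memorization-based leakage.
print(continuation)

In practice, such probes are repeated over many prefixes and combined with a scoring step (for example, comparing the model's likelihood of the generated sequence against a reference model) to separate memorized content from generic completions.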