Abstract
The “Deep Web” contains, among other data, sensitive information that is left unsecured and publicly accessible but is not indexed, and therefore cannot be found through search engines. Search-augmented language models could make the deep web shallower and more searchable. This poses a concern for cyber defense, particularly in countries with distinct linguistic characteristics. Mitigation strategies include red-teaming LLM-based search engines, end-to-end encryption, and changing the terminology used in critical cyber-physical systems so that their resources are harder to find. However, these approaches have limitations and may disrupt user workflows.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Holland, A. (2024). Deep(er) Web Indexing with LLMs. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_12
DOI: https://doi.org/10.1007/978-3-031-54827-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)