Deep(er) Web Indexing with LLMs

Holland, Aidan

doi:10.1007/978-3-031-54827-7_12

Abstract

The “Deep Web” contains, among other data, sensitive information that is left unsecured and publicly available but is not indexed, and so cannot be located through search engines. Search-augmented language models could make the deep web shallower and more searchable, which poses a concern for cyber defense, particularly in countries with linguistic specifics. Mitigation strategies include red-teaming LLM-based search engines, applying end-to-end encryption, or modifying the terms used in critical cyber-physical systems so that resources are harder to find. However, these approaches may have limitations and may disrupt user workflows.

Author information

Authors and Affiliations

Censys, Inc., Ann Arbor, MI, USA
Aidan Holland

Corresponding author

Correspondence to Aidan Holland.

Editor information

Editors and Affiliations

HES-SO Valais-Wallis, Sierre, Switzerland
Andrei Kucharavy
Cyber-Defence Campus, armasuisse Science and Technology, Thun, Switzerland
Octave Plancherel
Cyber-Defence Campus, armasuisse Science and Technology, Thun, Switzerland
Valentin Mulder
Cyber-Defence Campus, armasuisse Science and Technology, Thun, Switzerland
Alain Mermoud
Cyber-Defence Campus, armasuisse Science and Technology, Thun, Switzerland
Vincent Lenders

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Holland, A. (2024). Deep(er) Web Indexing with LLMs. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_12

DOI: https://doi.org/10.1007/978-3-031-54827-7_12
Published: 12 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)
