Abstract
This chapter introduces the different types of data sources, from unstructured to structured, that will be used in the remainder of the book. Specifically, we discuss the web, Wikipedia, and knowledge bases. We further introduce standard datasets and provide pointers to tools and resources.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Editor(s) (if applicable) and the Author(s)
About this chapter
Cite this chapter
Balog, K. (2018). Meet the Data. In: Entity-Oriented Search. The Information Retrieval Series, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-319-93935-3_2
DOI: https://doi.org/10.1007/978-3-319-93935-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93933-9
Online ISBN: 978-3-319-93935-3
eBook Packages: Computer Science (R0)