Abstract
Topic modeling is a popular method in tourism data analysis. Many authors have applied various approaches to summarize the main themes of travel blogs, reviews, video diaries, and similar media. A common shortcoming of these methods is that they perform poorly on short documents, such as blog readers’ feedback (reactions). In the past few years, a new crop of large language models (LLMs), such as ChatGPT, has become available to researchers. We investigate the capability of LLMs to extract the main themes of viewers’ reactions to popular videos of a rural China destination that explore the cultural, technological, and natural heritage of the countryside. We compare the extracted topics and model accuracy with the results of the traditional Latent Dirichlet Allocation approach. Overall, the LLM results are more accurate, more specific, and better at separating discussion topics.
1 Introduction
The history of automated annotation of textual documents starts in the 1960s, when Borko and Bernick [1] applied exploratory factor analysis to the unsupervised classification of scientific publication abstracts. Nowadays, dozens of models have been developed and applied to extract topics from texts [2, 3]. In tourism, and in the social sciences in general, the most popular approach [4] is Latent Dirichlet Allocation (LDA), developed by Blei et al. [5]. However, LDA has important restrictions, which are usually ignored by authors. First, LDA relies on estimating the parameters of the document-topic and topic-word distributions, which requires documents of ample length that encapsulate a diverse mixture of topics. Second, the LDA algorithm requires a substantial corpus of textual data to ensure precise estimation of the underlying topic distributions. Lastly, discordant or extraneous documents within the corpus, which are common in social media, negatively impact the quality of the inferred topics. Even when all these assumptions are met, LDA topic models are criticized for inherent instability and for the challenge of defining the “optimal” number of target topics.
In the past few years, a new crop of large language models (LLMs), such as Google’s BERT [6], has become increasingly popular, owing their success to the ability to capture context instead of considering document words in isolation. In the tourism domain, the TourBERT topic model was pre-trained on tourist reviews and descriptions of tourist services, attractions, and sights [7], though we are not aware of any publication in tourism journals that utilizes it.
The explosive development of the LLM field, which drew public attention after ChatGPT became freely available through a web-based interface, has led to the exploration of LLM topic extraction capabilities following a set of instructions (prompts). A new discipline known as prompt engineering explores the ability of LLMs to learn new tasks from examples provided as input (prompts). The key concepts of prompt engineering are precise setting of the context, such as providing relevant facts; providing elaborate instructions; conditioning LLM behavior by, e.g., providing examples; controlling for data biases; iterative refinement of LLM responses; and, finally, result validation [8, 9].
Emerging studies hint at the possibility of using LLM prompt engineering for topic modeling [10,11,12]. In this respect, LLMs have numerous advantages over the previous generation of topic models: they leverage general knowledge obtained in the pre-training process to infer comment topics, even when the data is incomplete or ambiguous; they can infer the topic of short comments by transferring knowledge from similar domains; and they are robust to noise in the data. They can handle misspellings, grammatical errors, and inconsistent punctuation, which are common in noisy documents, by capitalizing on the surrounding context and their understanding of language patterns [8, 9].
This paper is, to the best of our knowledge, the first attempt to apply an LLM (GPT-3) to the extraction of topics from a set of online feedback (reactions) left by blog readers. A typical reaction is short (one sentence) and noisy (it contains cultural references, slang, and typos), which makes topic extraction with traditional methods challenging. We compare the extracted topics with the results of a traditional LDA model trained on the same dataset.
2 Data and Methodology
The specific setting is the online reactions to the famous Chinese social media influencer Li Ziqi, who holds a Guinness World Record for the “most subscribers for a Chinese language channel on YouTube”. The focus of Li Ziqi’s videos is on rural China; their depiction of a simple yet beautiful traditional way of life evidently impacts potential tourists wishing to “visit LIZIQI’S world”. We collected all Weibo and YouTube reactions to the four most popular Li Ziqi videos reflective of her areas of interest: rural way of life; traditional self-made culture; food and cooking; and China’s contribution to world civilization. The collected data was cleaned, and short reactions (fewer than three words) were removed. In total, 1,852 reactions in English were collected on YouTube. On Weibo, 2,980 reactions in simplified Chinese were collected and translated into English with Google Translate. The quality of the translation was verified by a native speaker.
The collected data was then processed in batches of circa 2,000 words to fit GPT-3 input limits, using the following prompt: “Find the most common and prominent topics covered in the {text}. For each topic that you find print the number of occurrences of this topic.” Here, {text} represents a block of reactions. The identified topics were then merged using GPT-3, resulting in 18 major topics. Finally, the reactions were mapped back to the topics following prompt engineering best practices (abridged):
- goal = “match review to the best fitting review topic from a list of topics”
- steps = “1. Break the list of reviews onto separate reviews; 2. For each review find two best matching review topics from the list of review topics separated by the ‘;’ sign; 3. When there are no well-matching topics, assume that the topic is ‘Other’; 4. Print the review followed by the best matching topics”
- actAs = “a classifier assigning a class label to a data input”
- format = “a table with reviews in the first column …”
- prompt = “Your goal is to {goal}, acting as {actAs}. To achieve this, take a systematic approach by: {steps}. Present your response in markdown format, following the structure: {format}. The list of review topics are as follows: {topics_str}”.
- The list of reviews is as follows: {text}
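The abridged components above can be assembled into a single classification prompt with ordinary string formatting. The sketch below illustrates this assembly step only; the topic list and reviews passed in at the end are illustrative assumptions, not the paper’s actual data, and the exact batching and API call are omitted.

```python
# Assemble the classification prompt from the abridged components above.
# The topics and reviews at the bottom are illustrative placeholders.

goal = "match review to the best fitting review topic from a list of topics"
actAs = "a classifier assigning a class label to a data input"
steps = (
    "1. Break the list of reviews onto separate reviews; "
    "2. For each review find two best matching review topics from the "
    "list of review topics separated by the ';' sign; "
    "3. When there are no well-matching topics, assume that the topic "
    "is 'Other'; "
    "4. Print the review followed by the best matching topics"
)
fmt = "a table with reviews in the first column …"

PROMPT_TEMPLATE = (
    "Your goal is to {goal}, acting as {actAs}. To achieve this, take a "
    "systematic approach by: {steps}. Present your response in markdown "
    "format, following the structure: {format}. The list of review "
    "topics are as follows: {topics_str}\n"
    "The list of reviews is as follows: {text}"
)

def build_prompt(topics, reviews):
    """Fill the template with a topic list and one batch of reviews."""
    return PROMPT_TEMPLATE.format(
        goal=goal, actAs=actAs, steps=steps, format=fmt,
        topics_str="; ".join(topics),
        text="\n".join(reviews),
    )

prompt = build_prompt(
    ["Rural life", "Food and cooking", "Other"],            # illustrative
    ["So peaceful and beautiful!", "I want to try that dish."],
)
```

The assembled string would then be sent to the model in batches, as described above.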
For comparison, we used the identical set of reactions to extract topics with LDA. The data was pre-processed following the best practices of topic modeling: stop word removal, bigram tokenization, and lemmatization. Then, LDA topic modeling was performed for numbers of topics varying from 5 to 25. A 13-topic solution was selected for its best interpretability.
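The LDA comparison pipeline can be sketched with scikit-learn on a toy corpus. This is an illustrative sketch under stated simplifications, not the authors’ code: lemmatization is omitted, and `ngram_range=(1, 2)` stands in for the bigram tokenization step.

```python
# Sketch of the LDA pipeline: bag-of-words with stop-word removal and
# bigrams, then LDA fitted for several candidate topic counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

toy_reactions = [  # illustrative stand-ins for the collected reactions
    "such a peaceful rural life in the countryside",
    "the traditional cooking looks amazing and delicious",
    "beautiful nature and traditional village crafts",
    "i want to taste this food so much",
]

# Stop-word removal; unigrams plus bigrams approximate the paper's
# bigram tokenization step (lemmatization omitted in this sketch).
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(toy_reactions)

# Fit models for a range of topic counts, as in the paper (5 to 25);
# the range is shrunk here to fit the toy corpus.
models = {}
for k in (2, 3):
    lda = LatentDirichletAllocation(n_components=k, random_state=42)
    lda.fit(X)
    models[k] = lda

# Per-document topic mixture for the 2-topic model; each row sums to 1.
doc_topics = models[2].transform(X)
```

In practice, the candidate solutions would be compared for interpretability, as the paper does when selecting its 13-topic model.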
3 Results
Table 1 presents the LLM topics, together with validation outcomes. The quality of topic modeling was validated by a bilingual expert on a stratified random sample of 360 reactions (20 per topic). The overall accuracy of topic modeling, as conducted by the LLM, was found to be 97.7%. The most important reason for the high accuracy is improved recognition of short texts. Note that 30% of reviews were classified into the “Other” category and were not rated. In a similar way, we performed validation of the LDA topics (Table 2). For each document, LDA returns a mix of topics; we validated the topic with the highest probability, and only when this probability exceeded 0.5. This decision can be interpreted as assigning documents not strongly related to any topic to the category “Other” (42% of the dataset) and removing them from the validation process. The overall accuracy of topic assignment was 58%.
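The LDA validation rule (keep the highest-probability topic only when its probability exceeds 0.5, otherwise assign “Other”) can be sketched as a small assignment function; the probability vectors and topic labels below are illustrative.

```python
# Map each document to its highest-probability LDA topic, falling back
# to "Other" when no topic probability exceeds the 0.5 threshold,
# mirroring the validation rule described above.

def assign_topic(topic_probs, labels, threshold=0.5):
    best = max(range(len(topic_probs)), key=topic_probs.__getitem__)
    return labels[best] if topic_probs[best] > threshold else "Other"

labels = ["Rural life", "Food", "Crafts"]        # illustrative topic labels
print(assign_topic([0.70, 0.20, 0.10], labels))  # → Rural life
print(assign_topic([0.40, 0.35, 0.25], labels))  # → Other
```

Documents that fall into “Other” under this rule are the ones excluded from the validation sample.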
4 Discussion
Given that social media reactions tend to be short, it is not surprising that LDA topic modeling accuracy was moderate (58%); in comparison, LLM accuracy was excellent (98%). Meanwhile, even though LDA performance in assigning documents to specific topics was unimpressive, the overall set of topics is similar between LDA and the LLM. It includes themes related to Chinese culture, crafts, the beauty of living with nature, pets, and variations on expressions of praise towards the influencer. Note that the LLM-derived topics are much more specific, easier to comprehend, and did not require a tedious interpretation process.
Since this is, to the best of our knowledge, the first attempt to use an LLM in the tourism domain, a much wider effort is needed to draw solid conclusions about the best practices and limitations of the methodology; the field of prompt engineering has existed for only about a year. In our view, however, the application of LLMs to topic modeling in the tourism domain has very high potential. Our next plans are to explore LLM capabilities in the analysis of textual and pictorial tourism data, with the goals of understanding limitations and formulating best practices.
References
Borko, H., Bernick, M.: Automatic document classification. J. ACM JACM 10, 151–162 (1963)
Churchill, R., Singh, L.: The evolution of topic modeling. ACM Comput. Surv. 54, 1–35 (2022)
Vayansky, I., Kumar, S.A.: A review of topic modeling methods. Inf. Syst. 94, 101582 (2020)
Egger, R., Yu, J.: A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify twitter posts. Front. Sociol. 7, 886498 (2022)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. ArXiv Prepr. arXiv:1810.04805 (2018)
Arefieva, V., Egger, R.: TourBERT: a pretrained language model for the tourism industry. ArXiv Prepr. arXiv:2201.07449 (2022)
Ekin, S.: Prompt Engineering for ChatGPT: A Quick Guide to Techniques, Tips, and Best Practices (2023)
White, J., et al.: A prompt pattern catalog to enhance prompt engineering with ChatGPT. ArXiv Prepr. arXiv:2302.11382 (2023)
Bhaskar, A., Fabbri, A.R., Durrett, G.: Zero-shot opinion summarization with GPT-3. ArXiv Prepr. arXiv:2211.15914 (2022)
Kublik, S., Saboo, S.: GPT-3. O’Reilly Media, Incorporated, Sebastopol (2022)
Rijcken, E., Scheepers, F., Zervanou, K., Spruit, M., Mosteiro, P., Kaymak, U.: Towards interpreting topic models with ChatGPT. Presented at the 20th World Congress of the International Fuzzy Systems Association (2023)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this paper
Kirilenko, A., Stepchenkova, S. (2024). Automated Topic Analysis with Large Language Models. In: Berezina, K., Nixon, L., Tuomi, A. (eds) Information and Communication Technologies in Tourism 2024. ENTER 2024. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-58839-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58838-9
Online ISBN: 978-3-031-58839-6