1 Introduction

Recommender systems have been around for multiple decades. In tourism, they help visitors identify worthwhile points of interest (POIs), hiking tours, destinations, and other items. Compared to early, relatively simple approaches, state-of-the-art recommender systems achieve impressive results, largely thanks to improved technologies, including matrix factorization, deep learning, and graph neural networks (GNNs) [1]. Recent use cases even go beyond “simply” identifying suitable items and aim at generating multistakeholder recommendations [2], for example, to address sustainability goals [3, 4]. Such recommender systems consider multiple perspectives, including visitors’ preferences, nature preservation, local inhabitants, and the spatiotemporal occupancy of specific places. Regarding visitors’ acceptance of generated recommendations, there is significant evidence that visitors are more likely to follow recommendations if they understand the reasons behind them [5]. Moreover, recommender systems tackling sustainability issues (e.g., by guiding visitors to little-crowded places [4]) not only aim to recommend alternatives but also include the critical aspect of visitor education, helping visitors understand the implications of their actions [6].

This, however, reveals an essential shortcoming of many state-of-the-art recommender systems: due to their complex underlying architecture, they are usually hard to understand and often referred to as (AI) black boxes [7]. Fortunately, not all use cases require opening these black boxes, and hence visitors do not always need to understand the recommender’s actual (algorithmic) inference (i.e., the technical path the algorithm takes to derive a particular recommendation). Instead, it is often sufficient to find approximate explanations, for instance, a comprehensible explanation of why a visitor should not visit a specific place for sustainability reasons and why a recommended (possibly similar) alternative may also appeal to the visitor’s interests [8]. Developing such systems, even based on approximate explanations, remains challenging: it requires considerable engineering effort and domain-specific knowledge, involves expensive trial and error, and demands sufficiently high accuracy.

In this paper, we introduce an approach for visually and interactively exploring the space of potential explanations for generated recommendations, showcasing how trustworthy explanations can be designed based on an interactive dashboard. We employ knowledge graphs (i.e., structural descriptions of real-world entities [9]) to incorporate domain-specific knowledge, allowing the systematic exploration of interrelationships within generated recommendations. Intended to be used by researchers and developers of touristic recommendation services, our dashboard helps find convincing reasoning paths suitable to simplify implementing a goal-oriented explanation logic. We evaluate our approach using a prototype to find explanations for two state-of-the-art recommender systems, including a knowledge-based system that exposes hints of its internal functionality and a deep-learning approach that we consider a complete black box.

2 Background

The most prominent categorization of recommender systems distinguishes content-based (CB), collaborative-filtering-based (CFB), and hybrid methods. CB recommendation techniques aim to recommend items similar (with respect to item attributes) to those a user has interacted with in the past [10]. In contrast, CFB recommenders generate recommendations from items that similar users (i.e., users with a similar user-item interaction history) have interacted with but the target user has not yet seen [10]. Hybrid recommender systems [11] combine different techniques to balance the disadvantages of specific methods and tend to achieve higher precision than traditional algorithms [12]. More recent approaches leverage information beyond pure interaction data and item attributes. Context-aware recommender systems extend the generation of recommendations by incorporating additional context information about entities (e.g., users, items, additional objects) [11], comprising time, location, groups, and social context [13]. Similarly, knowledge-based recommender systems leverage side information, often contained in knowledge graphs (KGs) [9], to achieve better recommendation performance [7]. KGs capture entities in a structured way, presenting them as labeled nodes that are linked by labeled edges (representing relationships). Using formal semantics, KGs support logical reasoning that captures subtle relationships (e.g., for a transitive relation, if A is located in B and B is located in C, then A is also located in C) [14].
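
To illustrate, the following minimal Python sketch (with hypothetical entities and a relation we assume to be transitive) derives such an implied relationship from a set of KG triples:

```python
# Minimal sketch of transitive reasoning over KG triples
# (hypothetical entities; "located_in" is assumed to be transitive).
triples = {
    ("tour_A", "located_in", "region_B"),
    ("region_B", "located_in", "country_C"),
}

def infer_transitive(triples, relation):
    """Repeatedly add (a, r, c) whenever (a, r, b) and (b, r, c) exist."""
    inferred = set(triples)
    while True:
        new = {
            (a, relation, c)
            for (a, r1, b) in inferred if r1 == relation
            for (b2, r2, c) in inferred if r2 == relation and b2 == b
        }
        if new <= inferred:
            return inferred
        inferred |= new

# Also derives ("tour_A", "located_in", "country_C").
print(infer_transitive(triples, "located_in"))
```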

With the growing complexity of the underlying recommender algorithm, generating comprehensive yet understandable explanations of why an algorithm internally inferred a specific item becomes an increasingly difficult task. However, full transparency on how a recommender’s inference internally works may be irrelevant and may even overwhelm regular visitors. The authors of [8] identify seven frequently used explanation goals in the literature: transparency, scrutability, trust, effectiveness, persuasiveness, efficiency, and satisfaction. Since achieving all goals in any use case is neither possible nor necessarily desirable, they highlight the importance of defining the individual goals before diving deeper into a specific implementation.

The specified explainability goal determines which information is required for the explanation task. Transparency, for example, where explanations should reason about an algorithm’s inference process, requires a model-specific approach that reveals model internals (e.g., the structure of regression trees [15] or attention weights [16]). Otherwise, a post-hoc method with explanations generated by a separate model and thus independent of the selected recommender may be sufficient or even lead to more suitable explanations [7].

To convey generated explanations to a user, researchers have presented different explanation styles that may vary depending on the recommendation model, the explanation goal, the intended users, and the application domain [8]. Table 1 summarizes prominent examples of explanation styles.

Table 1. Overview of explanation styles.

Despite the research progress in explainable recommender systems, developing such systems remains challenging. Choosing suitable explanation techniques and communication styles requires comprehensive knowledge of the underlying domain and a thorough analysis of the particular use case [8]. This considerable engineering effort is compounded by expensive trial and error, as the effects of changes must be evaluated regularly. A viable solution would allow researchers and developers to explore different explanation techniques and styles without constant re-development.

However, existing approaches in this regard are rare and bound to specific explanation styles and use cases [18], leaving tourism largely untouched. Furthermore, visualizing the data and internals of recommender systems for algorithmic improvement is an urgent yet underexplored field [19]. In the following, we propose a visual, interactive approach to explore different explanations tailored to the tourism domain, where location-based recommendations and geographic attributes are crucial [20]. Our approach is centered around KGs, which have proven successful not only for generating recommendations but also for generating related explanations [1]. Employing KGs, we can incorporate tourism-specific knowledge to dive deep into model internals or to support the creation of model-agnostic (i.e., post-hoc) explanations. Using our approach, researchers and developers can interactively evaluate different explanation styles (and configurations) to find the most effective approach for their use case.

3 Method

This paper adopts the design science research (DSR) paradigm [21], which is well suited to developing and evaluating novel artifacts that solve real-world problems. To endow our research with the necessary rigor, we follow the framework of [22] using an objective-centered approach comprising the following steps: (i) definition of solution objectives, (ii) conceptualization, (iii) demonstration, (iv) evaluation, and (v) communication (covered by the publication of this work).

Our overall objective is to support researchers and developers in making decisions for implementing recommendation explainability in the scope of tourism. More precisely, we aim to provide them with an interactive, visual approach to experiment with and explore various explanation styles and associated configurations to discover an appropriate solution for their use case. Our objectives are based on the theoretical background outlined in the previous section and the specifics of the tourism domain, including the importance of geographic objects [20], touristic knowledge graphs [23], and the dynamic nature of data-driven tourism practices [24], all of which make it necessary for explanations to adapt specifically to different use cases [8]. Considering the diversity of existing recommendation algorithms, we incorporate both model-specific and post-hoc explanation styles to keep our approach generalizable. In particular, we support side information from knowledge graphs, even for non-knowledge-graph-based models. Our concrete objectives are summarized in Table 2.

Table 2. Determined solution objectives.

In line with the second DSR step, we developed a visual approach to realizing these objectives. For this purpose, we conceptualized a dashboard containing interactive visualizations of different explainability methods. Following our objectives, our overall approach is heavily based on knowledge graphs, which allow highly dynamic data structures and enable the modeling of complex situations that can be exploited for sophisticated explanations. Regarding DSR steps three (i.e., demonstration) and four (i.e., evaluation), we developed a prototypical implementation of our concept [26, 27] and created an illustrative scenario [28] based on real-world objects and simulated user interactions to compare our solution objectives with the outputs of the developed prototype, a typical evaluation method in DSR for solutions to problems that are not yet sufficiently addressed [22]. In the following sections, we detail the conceptualization, demonstration, and evaluation steps, respectively.

4 Conceptualizing an Explainability Dashboard for Tourism

Our dashboard has been designed for explaining touristic objects (i.e., items), including POIs and tours. While bar charts and boxplot diagrams give detailed information about specific item features (e.g., tour lengths), a geographic map provides an overview of the items’ geographic properties. The interactive core visualization style of our dashboard is a graph-based approach that allows inferring relationships for the purpose of explainability.

Our explainability dashboard allows a deep dive into explanations for a configurable number \(N\) of recommendations of a selected model, based on a selected user’s profile and history. While the features of our dashboard can be used in any order, a typical analysis workflow (Fig. 1) starts by inspecting item-based explanations before diving deeper into more complex, graph-based explanations.

Fig. 1. Typical explainability analysis workflow using the explainability dashboard.

4.1 Item-Based Explanations

Item-based explanations refer to explanations generated based on item features. To emphasize the relevance of geographic properties in tourism [20], we further distinguish between feature-based and geographic explanations.

Feature-Based Explanations. Feature-based explanations provide insights into the distribution of feature values for recommended and interacted items. Boxplot diagrams visualize the distribution of numerical features (e.g., a tour’s altitude gain or steepness) and reflect their statistical properties (i.e., quartiles, minimum, and maximum). For categorical features (e.g., difficulty), we utilize bar charts to visualize the distribution of the respective values. Feature-based visualization allows us to detect correlations and to understand which item properties may have been most influential for an underlying recommendation. Since preferred feature values may differ depending on item types (e.g., 30 km may be substantial for hiking but short for biking), we suggest creating separate boxplot diagrams per item category, although this is a case-by-case consideration.
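
As an illustration, a minimal sketch of such a feature-based view, here using matplotlib with hypothetical feature values (the prototype’s actual charting components may differ):

```python
# Minimal sketch of a feature-based explanation view using matplotlib
# (values are illustrative; the prototype's charting library may differ).
import matplotlib.pyplot as plt

# Hypothetical tour lengths (km) of interacted vs. recommended items.
interacted_lengths = [8.2, 11.5, 9.7, 12.1, 10.4]
recommended_lengths = [9.1, 10.8, 12.5, 8.9, 11.0]

fig, (ax_num, ax_cat) = plt.subplots(1, 2, figsize=(8, 3))

# Boxplots compare the distribution of a numerical feature.
ax_num.boxplot([interacted_lengths, recommended_lengths])
ax_num.set_xticks([1, 2])
ax_num.set_xticklabels(["interacted", "recommended"])
ax_num.set_ylabel("tour length (km)")

# Bar charts compare the distribution of a categorical feature.
difficulties = ["easy", "medium", "hard"]
ax_cat.bar(difficulties, [1, 3, 1], alpha=0.6, label="interacted")
ax_cat.bar(difficulties, [2, 2, 1], alpha=0.6, label="recommended")
ax_cat.set_ylabel("number of tours")
ax_cat.legend()

plt.tight_layout()
plt.show()
```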

Geographic Explanations. Geographic explanations provide a more detailed overview of geographic features (e.g., the coordinates of a POI or the length and shape of a hiking tour). They help identify interrelations between the items a user has interacted with and the items that have been recommended subsequently. This allows both a rough estimate of how well the underlying recommender works and an understanding of which geographic similarities and proximities are important for a recommendation. Besides, the map should support visualizing the items’ underlying regions. Regions may include political and touristic (marketing) regions and help visualize regional influence, including the effects of marketing campaigns by destination management organizations.
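
A minimal sketch of such a map view, here using the folium library (a Python wrapper around Leaflet) with hypothetical coordinates; the prototype’s actual map component may differ:

```python
# Minimal sketch of a geographic explanation view using folium
# (a Python wrapper around Leaflet; all coordinates are hypothetical).
import folium

interacted_tour = [(47.40, 10.27), (47.41, 10.30), (47.40, 10.33)]
recommended_tour = [(47.45, 10.28), (47.46, 10.31), (47.45, 10.34)]

m = folium.Map(location=[47.42, 10.30], zoom_start=11)

# Interacted tours in blue, recommended tours in red (as in Fig. 2).
folium.PolyLine(interacted_tour, color="blue",
                tooltip="interacted tour").add_to(m)
folium.PolyLine(recommended_tour, color="red",
                tooltip="recommended tour").add_to(m)

m.save("geographic_explanation.html")
```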

4.2 Graph-Based Explanations

Graph-based explanations exploit a knowledge graph’s structure to find explanations based on paths. KGs often capture a wide range of domain knowledge, making them well-suited for interactively exploring possible explanations. Note, however, that the recommender does not need to employ a graph-based model; KG-based explanations can be generated post-hoc without involving the original recommender algorithm. On the other hand, if a graph-based model exposes internal knowledge about the algorithm’s inference (e.g., attention weights), this information can be used to generate model-specific explanations. The following paragraphs describe two approaches to generating explanations based on KG paths.

Shortest-Path Explanations. Shortest-path explanations aim to identify the closest connections between recommended and interacted items. Based on a shortest-path algorithm, the shortest paths from interacted to recommended items are identified and visualized as an interactive subgraph. For example, a recommended hiking tour \(A\) could be linked to an interacted hiking tour \(B\) via category or difficulty nodes, indicating that \(A\) and \(B\) belong to the same category and are similarly difficult. To compare different shortest paths interactively, filters can be applied to specify which intermediate nodes are allowed (e.g., to filter out less expressive nodes or those with a high node degree).
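
The following minimal sketch illustrates this idea on a toy KG using networkx; node names, edge labels, and the filtering mechanism are illustrative, and the prototype instead relies on a graph database:

```python
# Minimal sketch of shortest-path explanations on a toy KG using networkx
# (the prototype uses a graph database; nodes and edges are hypothetical).
import networkx as nx

kg = nx.Graph()
kg.add_edge("tour_A", "hiking", label="has_category")   # recommended tour
kg.add_edge("tour_B", "hiking", label="has_category")   # interacted tour
kg.add_edge("tour_A", "Austria", label="located_in")
kg.add_edge("tour_B", "Austria", label="located_in")

def shortest_path_explanations(graph, interacted, recommended, blocked=()):
    """Shortest paths from an interacted to a recommended item,
    optionally excluding unexpressive or high-degree intermediate nodes."""
    view = graph.subgraph(n for n in graph if n not in set(blocked))
    return list(nx.all_shortest_paths(view, interacted, recommended))

# Filter out country nodes, which are often too generic to be expressive.
print(shortest_path_explanations(kg, "tour_B", "tour_A", blocked={"Austria"}))
# -> [['tour_B', 'hiking', 'tour_A']]
```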

Score-Path Explanations. While shortest-path explanations may extract the most obvious connections and may be sufficient for a range of use cases, their results largely depend on the knowledge graph’s overall design. Score paths allow the definition of more sophisticated patterns (i.e., meta paths) to achieve precise results independent of the knowledge graph’s structure. Meta paths are defined by specifying a sequence of allowed nodes and edges, each of which may be restricted to a subset of types (i.e., node types and edge types). Based on this specification, matching paths are retrieved from the knowledge graph and scored according to customizable scoring functions (e.g., PageRank and centrality measures). While these graph metrics are independent of the underlying recommendation algorithm and therefore constitute a post-hoc explainability approach, model-specific explanations can be realized by simply adjusting the scoring function to incorporate information internal to the model’s inference process (e.g., attention weights).
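
A minimal sketch of this idea, matching a hypothetical meta path on a typed toy KG with networkx and scoring matches by the mean PageRank of their nodes (names, types, and the scoring function are illustrative):

```python
# Minimal sketch of score-path explanations: match a meta path on a typed
# toy KG and score the matching paths by the mean PageRank of their nodes
# (node names, node types, and the scoring function are hypothetical).
import networkx as nx

kg = nx.Graph()
for node, ntype in [("tour_A", "tour"), ("tour_B", "tour"),
                    ("hiking", "category"), ("Tyrol", "province")]:
    kg.add_node(node, type=ntype)
kg.add_edges_from([("tour_A", "hiking"), ("tour_B", "hiking"),
                   ("tour_A", "Tyrol"), ("tour_B", "Tyrol")])

def match_meta_path(graph, start, meta_path):
    """Return all paths from `start` whose node types follow `meta_path`."""
    paths = [[start]] if graph.nodes[start]["type"] == meta_path[0] else []
    for ntype in meta_path[1:]:
        paths = [p + [n] for p in paths for n in graph.neighbors(p[-1])
                 if graph.nodes[n]["type"] == ntype and n not in p]
    return paths

pagerank = nx.pagerank(kg)
candidates = match_meta_path(kg, "tour_B", ["tour", "category", "tour"])
scored = sorted(candidates,
                key=lambda p: sum(pagerank[n] for n in p) / len(p),
                reverse=True)
print(scored)  # -> [['tour_B', 'hiking', 'tour_A']]
```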

5 Demonstration and Evaluation

Following the DSR paradigm, we aim to compare our dashboard’s functionality with the previously defined solution objectives [22]. To perform this comparison, we implemented a web-based prototype of our concept, a common evaluation technique in DSR [26], which aims to verify solutions based on artificial or naturalistic use cases [27]. We apply our prototype by adopting an illustrative scenario [28] serving the use case of explaining recommendations in outdoor tourism (i.e., a tour recommender). Based on real-world outdoor data and simulated user behavior, we showcase the creation of explanations for tour recommendations generated by two exemplary models.

5.1 Data and Model Generation

To initialize our illustrative scenario, we created a simple KG comprising 185,229 tours from Outdooractive [29], one of Europe’s largest platforms for outdoor tourism. Our exemplary knowledge graph is stored in a graph database. It includes three different types of entities: (i) tours, (ii) hierarchical tour categories, and (iii) touristic and political regions (which may overlap). Each tour has been connected to one category node (e.g., hiking, cycling) and multiple region nodes (i.e., all regions the tour intersects with). Furthermore, to simulate user interactions with tours, we generated 10,000 artificial users with random user preferences, including (i) preferred tour lengths and durations, (ii) a selection of one to four favorite tour categories, and (iii) three to ten favorite regions. The preference distribution has been designed to closely match the real data on the Outdooractive platform. For each user, we generated between five and 100 interactions (i.e., clicks) matching the specified preferences, resulting in 517,866 interactions.
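
A minimal sketch of this simulation step (parameter ranges follow the description above; sampling details, category names, and region identifiers are assumptions):

```python
# Minimal sketch of the simulated-user generation described above
# (ranges follow the text; category names and region IDs are hypothetical).
import random

CATEGORIES = ["hiking", "cycling", "running", "skiing"]
REGIONS = [f"region_{i}" for i in range(50)]

def generate_user(user_id):
    return {
        "id": user_id,
        "preferred_length_km": random.uniform(5, 40),
        "favorite_categories": random.sample(CATEGORIES, random.randint(1, 4)),
        "favorite_regions": random.sample(REGIONS, random.randint(3, 10)),
    }

users = [generate_user(i) for i in range(10_000)]

# For each user, sample 5-100 interactions with tours matching the
# user's preferences (the tour lookup itself is omitted in this sketch).
n_interactions = {u["id"]: random.randint(5, 100) for u in users}
```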

To standardize the creation and training of our recommendation models, we used RecBole [30], a Python-based framework providing state-of-the-art recommendation models and a standardized training and evaluation process. We extended RecBole with the ability to export model-internal data to evaluate model-specific explanations and chose two models, namely (i) KTUP [31], a knowledge-graph-based model that employs an attention mechanism, allowing the use of attention weights for explainability, and (ii) MultiDAE [32], a general-purpose, deep-learning-based recommender system that does not rely on a KG and is used for post-hoc explanations. Both models were trained for 100 epochs with a learning rate of 0.001 and a batch size of 2096.
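
A minimal sketch of this training setup via RecBole’s quick-start API (the dataset name is hypothetical and stands in for our exported atomic files; hyperparameters follow the text):

```python
# Minimal sketch of the model training via RecBole's quick-start API
# (the dataset name is hypothetical; hyperparameters follow the text).
from recbole.quick_start import run_recbole

config = {
    "epochs": 100,
    "learning_rate": 0.001,
    "train_batch_size": 2096,
}

for model in ["KTUP", "MultiDAE"]:
    # "outdooractive" stands in for our custom dataset in RecBole's
    # atomic-file format; it is not shipped with the framework.
    run_recbole(model=model, dataset="outdooractive", config_dict=config)
```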

5.2 Results

We verified the suitability of our approach by generating 50 recommendations for a randomly selected user using KTUP and MultiDAE. We map the dashboard’s abilities to our objectives and discuss the degree to which our solution supports them. In the following, we discuss the different explanation styles for the generated recommendations.

Fig. 2. Exemplary visualizations of interacted (blue) and KTUP-recommended (red) tours. Boxplots explain the distributions of numerical features; geographic features are shown on a map. (The map is based on Leaflet (leafletjs.com); map data from OpenStreetMap (openstreetmap.org/copyright).)

Feature-Based Explanation. To get an overview of the similarity between interacted and recommended tours, we first visualize their feature values using boxplot diagrams (Fig. 2a) and recognize a correlation between interacted and recommended tours, showing that the applied KTUP recommender seems to respect such properties. For MultiDAE, we obtained a similar result (feature-based explanations are generated post-hoc).

Geographic Explanation. As a next step, we investigate the geographic properties of individual tours (Fig. 2b). As with feature-based explanations, geographic explanations are generated post-hoc and lead to similar results for both models. Comparing the interacted with the recommended tours reveals one of the user’s favorite regions (centered on the map) and their preference for round trips. Following this first impression of the map visualization, most recommendations seem to reflect these preferences, which allows inferring that tours are recommended based on their region and shape. However, contradictory examples exist. A red tour in the bottom right corner (i.e., “Drei Täler Runde”, DTR) lies in a different region and seems to exceed the user’s preferred length. To explain this recommendation, a more sophisticated approach is necessary.

Graph-Based Explanation. To generate suitable explanations for more complex recommendations, we make use of a graph-based approach. For this, we navigate existing paths of the knowledge graph connecting users and tours using graph algorithms implemented in the underlying graph database. First, we apply the shortest-path explainability module to the DTR tour above and find that it shares the same countries, categories, and provinces with many of the interacted tours. However, due to their size, countries may be insufficient for explaining a recommendation and should be excluded, while using the category seems to be a valid approach (Fig. 3).

Fig. 3. Exemplary shortest-path explanation indicating (previously hidden) shortest relations between recommended (bottom) and interacted (top) tours according to the underlying KG.

Besides shortest paths, score paths allow fine-grained definitions of subgraphs and meta paths. To investigate the DTR tour in more detail, we perform a model-specific explanation by scoring paths, which reveals the main difference between explanations with MultiDAE and KTUP. While MultiDAE is restricted to general graph metrics such as PageRank and betweenness centrality that do not capture any model internals, KTUP allows a model-specific exploration of why a specific recommendation has been inferred, as it exposes model-specific attention weights (Fig. 4). In the DTR case, the province seems to play the predominant role in the recommendation.
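
As a minimal sketch of this difference, a path-scoring function based on exported attention weights could look as follows (edge weights, node names, and the aggregation are hypothetical):

```python
# Minimal sketch of a model-specific scoring function: instead of generic
# graph metrics, score a path by the attention weights exported from the
# model (the weight values and the lookup structure are hypothetical).
attention = {("tour_B", "Tyrol"): 0.71, ("Tyrol", "DTR"): 0.64,
             ("tour_B", "hiking"): 0.22, ("hiking", "DTR"): 0.31}

def attention_score(path):
    """Average attention weight along the path's edges."""
    edges = list(zip(path, path[1:]))
    return sum(attention.get(e, 0.0) for e in edges) / len(edges)

# The province path dominates, matching the observation for the DTR tour.
print(attention_score(["tour_B", "Tyrol", "DTR"]))   # 0.675
print(attention_score(["tour_B", "hiking", "DTR"]))  # 0.265
```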

Next, we provide a mapping of the evaluation results to our initial solution objectives (SO). As the results show, our dashboard supports various explanation styles (SO #1), including the styles mentioned in the background section, except for opinion-based explanations. The latter was excluded from our evaluation due to the nature of the underlying data. However, we argue that opinion-based explanations can still be realized using our graph-based approach, provided that the KG includes the necessary opinions. Furthermore, we verified the support of both post-hoc and model-specific explanations (SO #2 and #3). Our map visualization module allows us to visually explore geographic properties (SO #4). Finally, our approach is centered around a touristic KG, exploiting arbitrary side information and allowing us to define (meta) paths for explanations tailored to the underlying use case (SO #5 and #6).

Fig. 4. Exemplary model-specific score path with attention weights.

6 Conclusion and Future Work

This paper introduced a visual, interactive approach to support the explainability of touristic recommendations, supporting both model-specific and post-hoc explanations, mainly based on knowledge graphs. To demonstrate and evaluate our approach, we developed a prototypical dashboard implementing our concept and evaluated it following an illustrative scenario. Our results show that our concept can explain recommendations generated by state-of-the-art deep-learning models, even in complex scenarios. By providing a novel approach for interactively exploring recommendation explanations, our work contributes to theory and practice. Researchers may use our concept to study the feasibility of different explanations, especially when studying different explanation goals [8] in multistakeholder recommendation scenarios [2]. Practitioners, in turn, benefit from our dashboard as a tool for developing tailored explanations for their specific real-world use cases.

Like any research, our work is subject to certain limitations. First, while supporting many of the most important recommendation explanation techniques mentioned in the literature, our approach does not consider opinion-based explanations. Moreover, our evaluation is based on an illustrative scenario and should be complemented by further evaluation, especially with real user involvement. Depending on the results of such an evaluation, our concept may have to be further adjusted.

For future work, we plan to use our dashboard more broadly, incorporating real-world user interactions and dashboard users to obtain more in-depth evaluation results. Furthermore, we will extend our dashboard to support additional explanation types and model-specific parameters (e.g., integrated gradients). Moreover, we want to motivate other researchers to strengthen the investigation of visual recommendation evaluation tools in general, not limited to explainability. Finally, using our dashboard, we plan to develop an explainable, knowledge-graph-based recommendation model that captures the dimensions of multiple stakeholders and to deploy it in the context of sustainable tourism.