
1 Introduction and Related Work

This paper introduces \(\chi \)iplot, a modular system for interactive exploration of data and pre-trained machine learning models. \(\chi \)iplot can be run locally on the user’s computer or installation-free in a web browser. Our motivation for writing \(\chi \)iplot was threefold.

(i) First, we want a Python-based system for developing and testing machine learning and dimensionality reduction methods, such as Slisemap [1], a manifold visualisation method for explainable AI. For this purpose, we prefer a modular system that is easy to extend and modify, so that new machine learning and visualisation methods and interaction ideas can be tested quickly.

(ii) Second, we need a tool that facilitates collaboration with domain experts, primarily in quantum chemistry but also in other fields. Ideally, we want to avoid forcing our collaborators to install additional software. However, we also do not want to set up and maintain server infrastructure to host a web-accessible service.

(iii) Third, the system should be practical and usable for end users, including physicists and chemists, even though it is built for quick prototyping and painless implementation. To our knowledge, no prior system satisfies all three requirements.

Many interactive visualisation tools are available; see, e.g., [7] for a recent survey and references. Much of our research collaboration targets quantum chemistry; hence, the system must also be able to visualise, e.g., molecular structures from SMILES strings [11]. The ChemInformatics Model Explorer (CIME) [5] is a tool for exploring explainable AI in small-molecule research. However, CIME has only four fixed views, and full functionality requires a server. Another recent example is XSMILES [4], where users can examine individual molecules in 2D diagrams and visualise attribution scores for atoms and non-atom tokens.

2 Usage

The main idea of \(\chi \)iplot is to show multiple plots and visualisations simultaneously, so that diverse information can be compared and contrasted. Since \(\chi \)iplot also targets non-technical end users, the plots must be selectable and configurable through an intuitive visual interface.

\(\chi \)iplot comes with six types of plots out of the box: scatterplots, histograms, heat maps, bar plots, data tables, and SMILES plots, which render molecules as stick structures from SMILES strings [11]. More plot types can be added through \(\chi \)iplot’s plugin system. Users can add and remove plots to create the layout that best suits their needs. They can also create clusters, either by running the k-means algorithm or by lasso selection in a scatterplot; each cluster is shown in a distinct colour. In addition, users can compute a 2D embedding with Principal Component Analysis (PCA).
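To illustrate what the k-means clustering step does, here is a minimal plain-Python sketch of Lloyd's algorithm on 2D points (for example, coordinates from an embedding). It uses a deterministic farthest-first initialisation instead of random seeding so that the result is reproducible. The function `kmeans` and its parameters are hypothetical and are not \(\chi \)iplot's actual implementation; lasso selection and PCA are not shown.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm: return a cluster label in 0..k-1 for every point."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point that is farthest from all centroids chosen so far.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids)))

    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels if labels is not None else []


# Usage: two well-separated toy groups of three points each.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(kmeans(pts, 2))  # → [0, 0, 0, 1, 1, 1]
```

Each resulting label can then be mapped to a distinct colour, as in the cluster view described above.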

To use \(\chi \)iplot locally, the user can install it with pip install xiplot. The xiplot console command then starts a local \(\chi \)iplot server. Alternatively, an installation-free WebAssembly (WASM) version can be used immediately at https://edahelsinki.fi/xiplot.

Fig. 1. \(\chi \)iplot interface when studying a regression model on the QM9 dataset.

We demonstrate the main concepts with the QM9 molecular dataset [8, 9], a collection of quantum chemical properties calculated for small organic molecules. Our machine learning task is to estimate some of these quantum chemical properties from the molecules’ structural descriptions; such estimates can come from physics simulators of varying fidelity or from regression models. In this example, we want to study how the structures in the dataset relate to the estimation task. We precomputed a 2D Slisemap [1] embedding, which reveals the structures relevant to a regression model, and attached it to the dataset file that we uploaded to \(\chi \)iplot.
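Attaching a precomputed embedding to a dataset can be as simple as adding coordinate columns. The following is a minimal sketch, assuming a plain CSV table is an accepted input format and that each embedding dimension is stored as its own column. The helper `attach_embedding`, the column names `Slisemap 1`/`Slisemap 2`, and the toy rows are hypothetical illustrations, not \(\chi \)iplot conventions or QM9 data.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple


def attach_embedding(
    rows: List[Dict[str, object]],
    embedding: Sequence[Tuple[float, float]],
    prefix: str = "Slisemap",  # hypothetical column-name prefix
) -> str:
    """Return CSV text where each data row gets two extra embedding columns."""
    if not rows or len(rows) != len(embedding):
        raise ValueError("need one 2D embedding coordinate per (non-empty) data row")
    x_col, y_col = f"{prefix} 1", f"{prefix} 2"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()) + [x_col, y_col])
    writer.writeheader()
    for row, (x, y) in zip(rows, embedding):
        writer.writerow({**row, x_col: x, y_col: y})
    return out.getvalue()


# Usage with made-up toy values (not real QM9 properties or Slisemap output).
text = attach_embedding(
    [{"smiles": "C", "value": 0.5}, {"smiles": "CO", "value": 0.3}],
    [(0.1, -0.2), (0.3, 0.4)],
)
print(text.splitlines()[0])  # → smiles,value,Slisemap 1,Slisemap 2
```

Keeping the embedding in the same table means every row stays aligned with its coordinates, so a scatter plot of the embedding columns and any other view of the same row refer to the same molecule.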

Figure 1 shows the \(\chi \)iplot interface during our exploration. A chemist can explore the Slisemap embedding in the scatter plot on the left. The embedding has a notable cluster structure, so we use \(\chi \)iplot to find the clusters and plot their distributions in the middle. Comparing the two clusters reveals that their distributions of functional groups differ. We could also manually draw an additional cluster in the scatter plot to study the two subgroups in the rightmost cluster further.

The behaviour of a molecule is determined not only by its functional groups but also by how they are arranged in its structure. However, good summary statistics for structure are much harder to find. Therefore, we add a visualisation of individual molecules on the right of Fig. 1. A chemist can then rapidly inspect multiple molecules within and between clusters by hovering over points in the scatter plot, which automatically updates the molecule visualisation.

3 Description of the System

A key aspect of \(\chi \)iplot is interactivity, not just within a single plot but also between plots. For example, selecting a data item in one plot can show more information about it in another, as described above. To achieve this, the plots of \(\chi \)iplot are implemented as independent modules that communicate through shared data storage. Furthermore, to support collaboration and sharing, the set of active plots, their configurations, and the data can be saved to and restored from a file. Since \(\chi \)iplot is an interactive system, time-consuming computations (e.g., learning the Slisemap embedding) should be done as part of data preprocessing.
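The pattern of independent plot modules linked only through shared storage, plus saving and restoring a session, can be sketched framework-free as below. This is an illustration under stated assumptions, not \(\chi \)iplot's code: the class `SharedStore`, the functions `save_session`/`restore_session`, the key `"hover"`, and the plot configuration dictionaries are all hypothetical. (In a Dash app, a comparable pattern is usually built from `dcc.Store` components and callbacks; we do not claim this is how \(\chi \)iplot does it.)

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Listener = Callable[[str, Any], None]


class SharedStore:
    """Key-value state shared by all plots; notifies subscribers on every change."""

    def __init__(self, initial: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = dict(initial or {})
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        for listener in self._listeners:  # each plot reacts independently
            listener(key, value)

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def snapshot(self) -> Dict[str, Any]:
        return dict(self._data)


def save_session(plots: List[Dict[str, Any]], store: SharedStore) -> str:
    """Serialise the active plots' configurations and the shared state to JSON."""
    return json.dumps({"plots": plots, "state": store.snapshot()})


def restore_session(text: str) -> Tuple[List[Dict[str, Any]], SharedStore]:
    """Rebuild the plot configurations and a fresh store from saved JSON."""
    payload = json.loads(text)
    return payload["plots"], SharedStore(payload["state"])


# Usage: a molecule view re-renders whenever the scatter plot reports a hover.
store = SharedStore()
rendered: List[Any] = []


def molecule_view(key: str, value: Any) -> None:
    if key == "hover":
        rendered.append(value)  # stand-in for redrawing the hovered molecule


store.subscribe(molecule_view)
store.set("hover", 42)  # e.g. the scatter plot reports that row 42 is hovered
plots = [{"type": "scatterplot", "x": "Slisemap 1", "y": "Slisemap 2"}, {"type": "smiles"}]
restored_plots, restored_store = restore_session(save_session(plots, store))
```

Because the scatter plot never calls the molecule view directly, a new plot type only needs to read from and write to the store to interact with existing ones. Note that subscribers are not serialised: after a restore, the recreated plots subscribe again.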

\(\chi \)iplot is implemented in Python, using Plotly [6] for the plots and Dash for the interactivity. Normally, this would require users to install Python packages (see Sect. 2). However, we also provide a static, serverless web page version of \(\chi \)iplot that runs both the Dash backend and the Plotly frontend inside the browser, without installation, using WebAssembly (WASM) [10]. As a consequence, no data leaves the user’s computer in the WASM version.

In detail, the WASM version of \(\chi \)iplot uses Pyodide [3] to run Python in the browser. The front- and backend communication is intercepted and redirected to the in-WASM server, inspired by the WebDash prototype [2]. Crucially, neither the front- nor backend code needs to know that it runs inside a browser.
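The idea of redirecting frontend requests to an in-process backend can be illustrated with WSGI, the interface that Flask-based servers such as Dash expose. The sketch below calls a WSGI application directly with a synthetic request instead of going over the network. It is an assumption-laden illustration, not the WebDash or \(\chi \)iplot implementation: in the browser, the JavaScript side would hand the intercepted request to a Python function like the hypothetical `dispatch` via Pyodide, and `echo_app` is a toy stand-in for the real Dash server.

```python
import io
import json
from typing import Any, Dict
from wsgiref.util import setup_testing_defaults


def dispatch(app: Any, method: str, path: str, body: bytes = b"",
             content_type: str = "application/json"):
    """Call a WSGI app in-process, as if an HTTP request had arrived over the network."""
    environ: Dict[str, Any] = {
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured: Dict[str, Any] = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = app(environ, start_response)
    try:
        payload = b"".join(chunks)
    finally:
        if hasattr(chunks, "close"):  # WSGI requires close() to be called if present
            chunks.close()
    return captured["status"], captured["headers"], payload


def echo_app(environ, start_response):
    """Toy WSGI app that echoes the request path and JSON body."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    request = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"path": environ["PATH_INFO"], "received": request}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


# Usage: a callback-style POST handled entirely in-process (the path is illustrative).
status, headers, payload = dispatch(echo_app, "POST", "/_dash-update-component",
                                    json.dumps({"x": 1}).encode())
```

Since the backend only sees a standard WSGI request and the frontend only sees a standard response, neither side needs to know that no real network was involved.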

As Pyodide does not yet support all Python packages, we use dynamic import detection to enable optional features and fallbacks, such as additional data file formats. Deploying the WASM version requires bundling all frontend files, \(\chi \)iplot itself, and the scripts that bootstrap the web app in the WASM backend; these steps are documented in the \(\chi \)iplot GitHub repository.
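Dynamic import detection can be done with the standard library's `importlib.util.find_spec`, which reports whether a package is installed without importing it. The sketch below registers file readers accordingly; the function `available_readers`, the choice of Parquet via `pyarrow` as the optional format, and the reader signatures are assumptions for illustration, not \(\chi \)iplot's actual code.

```python
import csv
import importlib.util
import io
import json
from typing import Callable, Dict, List

Reader = Callable[[bytes], List[dict]]


def _read_csv(data: bytes) -> List[dict]:
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def _read_json(data: bytes) -> List[dict]:
    return json.loads(data)


def available_readers() -> Dict[str, Reader]:
    """Map file extensions to readers, enabling optional formats only if supported."""
    readers: Dict[str, Reader] = {".csv": _read_csv, ".json": _read_json}
    # find_spec returns None when the package is not installed (e.g. missing in Pyodide).
    if importlib.util.find_spec("pyarrow") is not None:

        def _read_parquet(data: bytes) -> List[dict]:
            import pyarrow.parquet as pq  # imported lazily, only when actually used

            return pq.read_table(io.BytesIO(data)).to_pylist()

        readers[".parquet"] = _read_parquet
    return readers


# Usage: the baseline formats always work; Parquet appears only if pyarrow is installed.
readers = available_readers()
rows = readers[".csv"](b"a,b\n1,2\n")
```

The same check-then-register approach lets one code base run unchanged in both environments, with the browser version silently offering fewer formats.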

To open up \(\chi \)iplot to even more use cases, \(\chi \)iplot has an API for creating plugins for, e.g., new visualisations and machine learning methods. It uses the “entry points” feature of Python to discover installed plugins, which also works in the WASM version. Due to the modular design with shared data, new plots can automatically interact with old ones.
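Plugin discovery through Python entry points can be sketched with the standard library's `importlib.metadata`. The group name `example_app.plugins` and the function `discover_plugins` are hypothetical; the entry-point groups that \(\chi \)iplot actually uses are documented in its repository.

```python
from importlib.metadata import entry_points
from typing import Any, Dict

# Hypothetical group name. A plugin package would register itself in its
# pyproject.toml, for example:
#   [project.entry-points."example_app.plugins"]
#   my_plot = "my_package.plots:MyPlot"
PLUGIN_GROUP = "example_app.plugins"


def discover_plugins(group: str = PLUGIN_GROUP) -> Dict[str, Any]:
    """Load every installed plugin registered under the given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: supports selecting by group
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a dict mapping group name -> list of EntryPoint
        selected = eps.get(group, [])
    plugins: Dict[str, Any] = {}
    for ep in selected:
        try:
            plugins[ep.name] = ep.load()  # imports the module and returns the object
        except Exception as exc:  # a broken plugin should not crash the application
            print(f"skipping plugin {ep.name!r}: {exc}")
    return plugins
```

Because discovery only reads installed package metadata, it needs no server or configuration file, which is consistent with it also working when packages are installed inside Pyodide.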

4 Conclusions

We have already found \(\chi \)iplot helpful when collaborating with domain experts, since it lets them configure interactive plots without programming or installing anything. The online version also makes it easy to share results without exposing the data to any third party. For more technical users, \(\chi \)iplot is easy to maintain and extend thanks to its modular architecture. Finally, \(\chi \)iplot is available under the open-source MIT license from GitHub (the repository includes documentation, usage examples, and a demonstration video).