Abstract
\(\chi \)iplot is an HTML5-based system for interactive exploration of data and machine learning models. A key aspect is interaction, not only within individual plots but also between plots. Even though \(\chi \)iplot is not restricted to any single application domain, we have developed and tested it with domain experts in quantum chemistry to study molecular interactions and regression models. \(\chi \)iplot can be run both locally and online in a web browser (keeping the data local). The plots and data can also easily be exported and shared. The modular structure also makes \(\chi \)iplot well suited for developing machine learning and new interaction methods.
Supported by the Research Council of Finland (decisions 346376 and 345704) and the Future Makers Funding Programme of Technology Industries of Finland Centennial Foundation and Jane and Aatos Erkko Foundation.
1 Introduction and Related Work
This paper introduces \(\chi \)iplot, a modular system for interactive exploration of data and pre-trained machine learning models. \(\chi \)iplot can be run locally on the user’s computer or installation-free in a web browser. Our motivation for writing \(\chi \)iplot was threefold.
(i) First, we want a Python-based system to develop and test machine learning and dimensionality reduction methods, such as Slisemap [1], a manifold visualisation method for explainable AI. For this purpose, we prefer a modular system that is easy to expand and modify to test new machine learning and visualisation methods and interaction ideas.
(ii) Second, we need a tool to facilitate collaboration, primarily with domain experts in quantum chemistry but also with experts in other domains. Ideally, we want to avoid forcing our collaborators to install additional software. However, we also do not want to set up and maintain server infrastructure to host a web-accessible service.
(iii) Third, the system should be practical and usable for the end user, including physicists and chemists, despite being built for quick prototyping and painless implementation. We know of no prior system that satisfies all three requirements.
Many interactive visualisation tools are available; see, e.g., [7] for a recent survey and references. Much of our research collaboration targets quantum chemistry; hence the system must also be capable of visualising, e.g., molecular structures from SMILES strings [11]. ChemInformatics Model Explorer [5] (CIME) is another tool that explores explainable AI in small molecule research. However, CIME has only four fixed views, and full functionality requires a server. Another recent example is XSMILES [4], where users can examine individual molecules in 2D diagrams and visualise attribution scores for atoms and non-atom tokens.
2 Usage
The main idea of \(\chi \)iplot is to simultaneously show multiple plots and visualisations to compare and contrast diverse information. Since \(\chi \)iplot also targets non-technical end users, intuitive visual selection and configuration of the plots are required.
\(\chi \)iplot comes with six types of plots out of the box – scatterplots, histograms, heat maps, bar plots, data tables, and SMILES plots, which render molecules as stick structures from SMILES strings [11] – but more can be added with \(\chi \)iplot's plugin system. Users can add and remove plots to create the layout that best suits their needs. End users can generate clusters by running a k-means algorithm or by lasso selection on a scatterplot. Unique colours distinguish the generated clusters. In addition, end users can generate a 2D embedding through Principal Component Analysis (PCA).
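To make the clustering step concrete, the following is a minimal plain-Python sketch of k-means (Lloyd's algorithm). It is an illustration only and not \(\chi \)iplot's own implementation; the function name `kmeans`, its parameters, and the toy points are assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into `k` groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k sampled data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Toy 2D data with two well-separated groups (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(points, k=2)
```

The returned list holds one cluster index per point, which a plotting layer could map to the unique cluster colours mentioned above.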
To use \(\chi \)iplot, the user may install it with pip install xiplot. The xiplot console command is then available to host a local \(\chi \)iplot server. Alternatively, an installation-free WebAssembly (WASM) version (see Note 1) can be used immediately at https://edahelsinki.fi/xiplot.
We demonstrate the main concepts with the QM9 molecular dataset [8, 9], a collection of quantum chemical properties calculated for small organic molecules. Our machine-learning task is to estimate some quantum chemical properties from their structural description. We can use physics simulators with varying fidelity or regression models. In this example, we want to study how the structures in the dataset relate to the estimation task. We have precomputed a 2D Slisemap [1] embedding (revealing the structures relevant to a regression model) and attached the embedding to the dataset file we uploaded to \(\chi \)iplot.
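Attaching a precomputed embedding to a dataset file can be as simple as appending two columns before upload. The sketch below uses only the Python standard library; the column names (`smiles`, `target`, `embedding_1`, `embedding_2`), the helper names, and the toy values are hypothetical and do not reflect the actual QM9 file or any column naming that \(\chi \)iplot expects.

```python
import csv
import io


def attach_embedding(rows, embedding, names=("embedding_1", "embedding_2")):
    """Return copies of the row dicts with the two embedding coordinates appended."""
    if len(rows) != len(embedding):
        raise ValueError("need exactly one embedding point per data row")
    return [
        {**row, names[0]: z1, names[1]: z2}
        for row, (z1, z2) in zip(rows, embedding)
    ]


def to_csv(rows):
    """Serialise a non-empty list of row dicts (with identical keys) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy stand-ins for dataset rows and a precomputed 2D embedding (made-up values).
rows = [{"smiles": "C", "target": 0.1}, {"smiles": "CC", "target": 0.2}]
embedding = [(0.5, -1.0), (1.5, 2.0)]
csv_text = to_csv(attach_embedding(rows, embedding))
```

Keeping the embedding in the same file as the data means the time-consuming embedding computation happens once, during preprocessing, rather than inside the interactive tool.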
Figure 1 shows a view of the \(\chi \)iplot interface during our exploration. A chemist can explore the Slisemap embedding in a scatter plot on the left. There is a notable cluster structure, so we use \(\chi \)iplot to find the clusters and plot their distribution in the middle. If we compare the two clusters, we notice that the distributions of the functional groups differ. For example, we could manually draw an additional cluster in the scatter plot to further study the two subgroups in the rightmost cluster.
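Manually drawing a cluster amounts to testing which points fall inside the drawn outline. The following sketch uses the standard ray-casting point-in-polygon test; it is not taken from \(\chi \)iplot, and the function names and sample coordinates are assumptions made for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` strictly inside the closed `polygon`?"""
    x, y = point
    inside = False
    # Pair every vertex with its successor, wrapping around to close the outline.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # the rightward ray crosses this edge
    return inside


def lasso_cluster(points, polygon, labels, new_label):
    """Relabel every point inside the lasso outline; keep the other labels."""
    return [
        new_label if point_in_polygon(p, polygon) else old
        for p, old in zip(points, labels)
    ]


# Toy example: a square lasso drawn around two of three points.
lasso = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
points = [(1.0, 1.0), (3.0, 1.0), (0.5, 1.5)]
relabelled = lasso_cluster(points, lasso, ["c1", "c1", "c2"], "lasso")
```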
The behaviour of a molecule is not only determined by the functional groups but also by how they are structured. However, finding good summary statistics for structure is much more difficult. Therefore, we add a visualisation of individual molecules on the right of Fig. 1. A chemist can then rapidly inspect multiple molecules inside and between clusters by hovering over the points in the scatter plot; the molecule visualisation is automatically updated.
3 Description of the System
A key aspect of \(\chi \)iplot is interactivity, not just within a single plot but also between plots. For example, selecting a data item in one plot might show more information about it in another, as described above. To accomplish this interactivity, the plots of \(\chi \)iplot are implemented as independent modules, communicating through shared data storage. Furthermore, to support collaboration and sharing, the set of active plots, their configuration, and the data can be saved to and restored from a file. Since \(\chi \)iplot is an interactive system, time-consuming computations (e.g., learning the Slisemap embedding) should be done as part of data preprocessing.
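The pattern of independent modules communicating through shared storage can be sketched in a few lines of plain Python. This is a conceptual illustration, not \(\chi \)iplot's code: the classes `SharedStore`, `ScatterPlot`, and `MoleculeView`, as well as the key `hovered_row`, are hypothetical names. In a Dash application, a comparable role is commonly played by `dcc.Store` components together with callbacks.

```python
import json


class SharedStore:
    """Key-value storage that notifies subscribers whenever a key is written."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self._subscribers = {}

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

    def set(self, key, value):
        self._data[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def to_json(self):
        """Serialise the shared state so it can be saved to a file."""
        return json.dumps(self._data)

    @classmethod
    def from_json(cls, text):
        """Restore the shared state from previously saved JSON text."""
        return cls(json.loads(text))


class ScatterPlot:
    """Writes the hovered row to the store; knows nothing about other plots."""

    def __init__(self, store):
        self._store = store

    def hover(self, row_index):
        self._store.set("hovered_row", row_index)


class MoleculeView:
    """Reacts to hover changes; knows nothing about the scatter plot."""

    def __init__(self, store, smiles):
        self._smiles = smiles
        self.shown = None
        store.subscribe("hovered_row", self._on_hover)

    def _on_hover(self, row_index):
        self.shown = self._smiles[row_index]


store = SharedStore()
scatter = ScatterPlot(store)
view = MoleculeView(store, smiles=["C", "CC", "CCO"])
scatter.hover(2)  # the molecule view updates without a direct reference
restored = SharedStore.from_json(store.to_json())
```

Because the two plot classes only share the store, a new plot type can react to existing interactions by subscribing to the same keys.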
\(\chi \)iplot is implemented in Python using Plotly [6] for the plots and Dash for the interactivity. Usually, this would require the users to be able to install Python packages (see Sect. 2). However, we also provide a static server-less webpage version of \(\chi \)iplot that runs both the Dash backend and the Plotly frontend installation-free inside a browser using WebAssembly [10] (WASM). This also means no data leaves the user’s computer in the WASM version.
In detail, the WASM version of \(\chi \)iplot uses Pyodide [3] to run Python in the browser. The front- and backend communication is intercepted and redirected to the in-WASM server, inspired by the WebDash prototype [2]. Crucially, neither the front- nor backend code needs to know that it runs inside a browser.
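The server side of this redirection can be illustrated without any browser: a WSGI application (Dash is built on Flask, which is a WSGI framework) can be called directly as a Python function, with no network socket involved. The sketch below is a simplified stand-in and not the WebDash or \(\chi \)iplot code; `demo_app`, `handle_in_process`, and the `/update` path are hypothetical, and the browser-side interception of requests is not shown.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Toy WSGI backend standing in for the real server (hypothetical)."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"path": environ["PATH_INFO"], "echo": payload}).encode()
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def handle_in_process(app, method, path, body=b""):
    """Call a WSGI app directly, as an intercepted request would be dispatched."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the mandatory WSGI variables
    environ.update(
        {
            "REQUEST_METHOD": method,
            "PATH_INFO": path,
            "CONTENT_LENGTH": str(len(body)),
            "wsgi.input": io.BytesIO(body),
        }
    )
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, response = handle_in_process(demo_app, "POST", "/update", b'{"x": 1}')
```

Since the application only sees a standard WSGI request, it does not need to know whether the request arrived over a socket or was handed over in-process, which mirrors the point that neither side needs to know it runs inside a browser.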
As Pyodide does not yet support all Python packages, we use dynamic import detection to enable certain features and fallbacks, such as additional data file formats. Deploying the WASM version requires bundling all frontend files, \(\chi \)iplot, and the scripts that bootstrap the web app in the WASM backend, all documented in the \(\chi \)iplot GitHub repository.
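Dynamic import detection of this kind can be sketched with the standard library's `importlib.util.find_spec`. The mapping from file formats to optional packages below is an assumption chosen for illustration, not the list that \(\chi \)iplot actually checks, and the function names are hypothetical.

```python
import importlib.util


def is_available(module_name):
    """True if the top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def supported_formats():
    """List the data file formats that can be offered to the user."""
    formats = ["csv", "json"]  # readable with the standard library alone
    # Illustrative mapping from optional format to the package it would need.
    optional = {"parquet": "pyarrow", "xlsx": "openpyxl"}
    for file_format, package in optional.items():
        if is_available(package):
            formats.append(file_format)
    return formats


formats = supported_formats()
```

Checking availability up front lets the same code run both in a full local installation and in an environment where some packages are missing, with the user interface offering only what works.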
To open up \(\chi \)iplot to even more use cases, \(\chi \)iplot has an API for creating plugins for, e.g., new visualisations and machine learning methods. It uses the “entry points” feature of Python to discover installed plugins, which also works in the WASM version. Due to the modular design with shared data, new plots can automatically interact with old ones.
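Plugin discovery through entry points can be sketched with `importlib.metadata` from the standard library. The group name `example_app.plugins` and the plugin names in the comments are hypothetical; the actual group used by \(\chi \)iplot is documented in its repository and is not reproduced here.

```python
from importlib.metadata import entry_points

# A plugin package would register itself in its pyproject.toml, for example:
#
#   [project.entry-points."example_app.plugins"]
#   my_plot = "my_plugin:MyPlot"
#
PLUGIN_GROUP = "example_app.plugins"  # hypothetical group name


def _select(group):
    """Return the entry points of one group across Python versions."""
    found = entry_points()
    if hasattr(found, "select"):  # Python 3.10 and later
        return found.select(group=group)
    return found.get(group, [])  # older versions return a plain dict


def discover_plugins(group=PLUGIN_GROUP):
    """Import every object registered under `group`, keyed by entry point name."""
    return {entry.name: entry.load() for entry in _select(group)}


plugins = discover_plugins()
```

Because discovery relies only on installed package metadata, adding a plugin requires no change to the host application: installing the plugin package is enough for it to be found.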
4 Conclusions
We have already found \(\chi \)iplot helpful when collaborating with domain experts since it lets them configure interactive plots without programming or installing anything (see Note 2). The online version also enables easy sharing of results without exposing the data to any third party. For more technical users, \(\chi \)iplot is easy to maintain and expand due to the modular architecture. Finally, \(\chi \)iplot is available under the open-source MIT license on GitHub (see Note 3); the repository includes documentation, usage examples, and a demonstration video.
Notes
- 1. WASM is supported in most modern browsers; see https://caniuse.com/wasm.
- 2. Installation-free version at https://edahelsinki.fi/xiplot.
- 3.
References
1. Björklund, A., Mäkelä, J., Puolamäki, K.: SLISEMAP: supervised dimensionality reduction through local explanations. Mach. Learn. 112(1), 1–43 (2023). https://doi.org/10.1007/s10994-022-06261-1
2. Dafna, I., Tulop, J., Ivanov, P.: Webdash (2022). https://github.com/ibdafna/webdash
3. Droettboom, M., Chatham, H., Yurchak, R., Choi, G., et al.: Pyodide/pyodide: 21.0, August 2022. https://doi.org/10.5281/ZENODO.6977227
4. Heberle, H., Zhao, L., Schmidt, S., Wolf, T., Heinrich, J.: XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J. Cheminformatics 15(1), 2 (2023). https://doi.org/10.1186/s13321-022-00673-w
5. Humer, C., et al.: ChemInformatics Model Explorer (CIME): exploratory analysis of chemical model explanations. J. Cheminformatics 14(1), 1–14 (2022). https://doi.org/10.1186/s13321-022-00600-z
6. Plotly: Plotly Open Source Graphing Library for Python (2023). https://plotly.com/python/
7. Qin, X., Luo, Y., Tang, N., Li, G.: Making data visualization more efficient and effective: a survey. VLDB J. 29(1), 93–117 (2019). https://doi.org/10.1007/s00778-019-00588-3
8. Ramakrishnan, R., Dral, P.O., Rupp, M., von Lilienfeld, O.A.: Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1(1), 140022 (2014). https://doi.org/10.1038/sdata.2014.22
9. Stuke, A., et al.: Chemical diversity in molecular orbital energy predictions with kernel ridge regression. J. Chem. Phys. 150(20), 204121 (2019). https://doi.org/10.1063/1.5086105
10. W3C: WebAssembly Core Specification, April 2022. https://www.w3.org/TR/wasm-core-2
11. Weininger, D.: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28(1), 31–36 (1988). https://doi.org/10.1021/ci00057a005
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Tanaka, A., Tyree, J., Björklund, A., Mäkelä, J., Puolamäki, K. (2023). \(\chi \)iplot: Web-First Visualisation Platform for Multidimensional Data. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14175. Springer, Cham. https://doi.org/10.1007/978-3-031-43430-3_26
DOI: https://doi.org/10.1007/978-3-031-43430-3_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43429-7
Online ISBN: 978-3-031-43430-3
eBook Packages: Computer Science, Computer Science (R0)