$\chi $iplot: Web-First Visualisation Platform for Multidimensional Data

Tanaka, Akihiro; Tyree, Juniper; Björklund, Anton; Mäkelä, Jarmo; Puolamäki, Kai

doi:10.1007/978-3-031-43430-3_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14175)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

$\chi $iplot is an HTML5-based system for interactive exploration of data and machine learning models. A key aspect is interaction, not only within individual plots but also between plots. Even though $\chi $iplot is not restricted to any single application domain, we have developed and tested it with domain experts in quantum chemistry to study molecular interactions and regression models. $\chi $iplot can be run both locally and online in a web browser (keeping the data local). The plots and data can also easily be exported and shared. A modular structure also makes $\chi $iplot well suited for developing machine learning and new interaction methods.

Supported by the Research Council of Finland (decisions 346376 and 345704) and the Future Makers Funding Programme of Technology Industries of Finland Centennial Foundation and Jane and Aatos Erkko Foundation.

1 Introduction and Related Work

This paper introduces $\chi $iplot, a modular system for interactive exploration of data and pre-trained machine learning models. $\chi $iplot can be run locally on the user’s computer or installation-free in a web browser. Our motivation for writing $\chi $iplot was three-fold.

(i) First, we want a Python-based system to develop and test machine learning and dimensionality reduction methods, such as [1], a manifold visualisation method for explainable AI. For this purpose, we prefer a modular system that is easy to expand and modify to test new machine learning and visualisation methods and interaction ideas.

(ii) Second, we need a tool to facilitate collaboration with domain experts, primarily in quantum chemistry but also in other domains. Ideally, we want to avoid forcing our collaborators to install additional software. However, we also do not want to set up and maintain server infrastructure to host a web-accessible service.

(iii) Third, the system should be practical and usable for the end user, including physicists and chemists, despite being built for quick prototyping and painless implementation. We know of no prior system that satisfies all three of these requirements.

Many interactive visualisation tools are available; see, e.g., [7] for a recent survey and references. Much of our research collaboration targets quantum chemistry; hence the system must also be capable of visualising, e.g., molecular structures from SMILES strings [11]. ChemInformatics Model Explorer [5] (CIME) is another tool that explores explainable AI in small molecule research. However, CIME has only four fixed views, and full functionality requires a server. Another recent example is XSMILES [4], where users can examine individual molecules in 2D diagrams and visualise attribution scores for atoms and non-atom tokens.

2 Usage

The main idea of $\chi $iplot is to simultaneously show multiple plots and visualisations to compare and contrast diverse information. Since $\chi $iplot also targets non-technical end users, intuitive visual selection and configuration of the plots are required.

$\chi $iplot comes with six types of plots out-of-the-box: scatterplots, histograms, heat maps, bar plots, data tables, and SMILES plots, which render molecules in a stick structure from a SMILES string [11]. More can be added with $\chi $iplot’s plugin system. Users can add and remove plots to create the layout that best suits their specific needs. End users can generate clusters by running a k-means algorithm or by lasso selection on a scatterplot. Unique colours distinguish the generated clusters. In addition, end users can generate a 2D embedding through Principal Component Analysis (PCA).
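To make the clustering step concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) on 2D points using only the Python standard library. It is not $\chi $iplot's own implementation; the function name `kmeans`, the toy data, and the simple "first k points" initialisation are assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Cluster 2D points with Lloyd's algorithm; returns (labels, centroids)."""
    # Simple deterministic initialisation (an assumption of this sketch):
    # use the first k points as the starting centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Two well-separated groups of toy points (illustrative data only).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels, centroids = kmeans(points, k=2)
```

Each label could then be mapped to a distinct colour, which is how the text above describes clusters being distinguished.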

To use $\chi $iplot, the user may install it with pip install xiplot. The xiplot console command is then available to host a local $\chi $iplot server. Alternatively, an installation-free WebAssembly (WASM)^{Footnote 1} version can be used immediately at https://edahelsinki.fi/xiplot.

We demonstrate the main concepts with the QM9 molecular dataset [8, 9], a collection of quantum chemical properties calculated for small organic molecules. Our machine-learning task is to estimate some quantum chemical properties from their structural description. We can use physics simulators with varying fidelity or regression models. In this example, we want to study how the structures in the dataset relate to the estimation task. We have precomputed a 2D Slisemap [1] embedding (revealing the structures relevant to a regression model) and attached the embedding to the dataset file we uploaded to $\chi $iplot.
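Attaching a precomputed embedding to a dataset, as described above, could look like the following sketch. It assumes the dataset is a CSV table, which the paper does not state; the column names `embedding_x` and `embedding_y`, the helper functions, and the toy rows are all hypothetical.

```python
import csv
import io


def attach_embedding(rows, embedding, names=("embedding_x", "embedding_y")):
    """Return new rows with the 2D embedding coordinates added as two columns."""
    if len(rows) != len(embedding):
        raise ValueError("need exactly one embedding point per data row")
    return [
        {**row, names[0]: x, names[1]: y}
        for row, (x, y) in zip(rows, embedding)
    ]


def to_csv(rows):
    """Serialise a list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy rows standing in for a molecular dataset (values are made up).
rows = [
    {"smiles": "CCO", "target": 1.5},
    {"smiles": "C=O", "target": 2.5},
]
# Precomputed 2D coordinates, e.g. from an offline dimensionality reduction run.
embedding = [(0.1, -0.2), (1.3, 0.4)]

combined = attach_embedding(rows, embedding)
csv_text = to_csv(combined)
```

Doing this offline keeps the expensive embedding computation out of the interactive session; the visualisation tool only needs to read two extra columns.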

Figure 1 shows a view of the $\chi $iplot interface during our exploration. A chemist can explore the Slisemap embedding in a scatter plot on the left. There is a notable cluster structure, so we use $\chi $iplot to find the clusters and plot their distribution in the middle. If we compare the two clusters, we notice that the distributions of the functional groups differ. For example, we could manually draw an additional cluster in the scatter plot to further study the two subgroups in the rightmost cluster.

The behaviour of a molecule is not only determined by the functional groups but also by how they are structured. However, finding good summary statistics for structure is much more difficult. Therefore, we add a visualisation of individual molecules on the right of Fig. 1. A chemist can then rapidly inspect multiple molecules inside and between clusters by hovering over the points in the scatter plot; the molecule visualisation is automatically updated.

3 Description of the System

A key aspect of $\chi $iplot is interactivity, not just for a single plot but also between plots. For example, selecting a data item in one might show you more information about it in another, as described above. To accomplish this interactivity, the plots of $\chi $iplot are implemented as independent modules, communicating through shared data storage. Furthermore, to support collaboration and sharing, the set of active plots, their configuration, and the data can be saved to and restored from a file. Since $\chi $iplot is an interactive system, time-consuming computations (e.g., learning the Slisemap embedding) should be done as part of data preprocessing.
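The idea of independent plot modules that communicate only through shared data storage can be sketched as follows. This is a simplified illustration in plain Python, not Dash code and not $\chi $iplot's actual classes; `SharedStore`, `ScatterPlot`, `MoleculeView`, and the key `hovered_row` are hypothetical names.

```python
class SharedStore:
    """Key-value store that notifies subscribers when a value changes."""

    def __init__(self):
        self._values = {}
        self._subscribers = {}

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

    def set(self, key, value):
        self._values[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)

    def get(self, key, default=None):
        return self._values.get(key, default)


class ScatterPlot:
    """Writes the hovered row index to the store; knows nothing about other plots."""

    def __init__(self, store):
        self.store = store

    def hover(self, row_index):
        self.store.set("hovered_row", row_index)


class MoleculeView:
    """Reads the hovered row index from the store and shows that row's SMILES."""

    def __init__(self, store, smiles):
        self.smiles = smiles
        self.shown = None
        store.subscribe("hovered_row", self.update)

    def update(self, row_index):
        self.shown = self.smiles[row_index]


store = SharedStore()
scatter = ScatterPlot(store)
view = MoleculeView(store, smiles=["CCO", "C=O", "CC(=O)O"])
scatter.hover(1)  # hovering in one plot updates the other via the store
```

Because neither plot references the other, a newly added plot only has to subscribe to the keys it cares about in order to interact with the existing ones.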

$\chi $iplot is implemented in Python using Plotly [6] for the plots and Dash for the interactivity. Usually, this would require the users to be able to install Python packages (see Sect. 2). However, we also provide a static server-less webpage version of $\chi $iplot that runs both the Dash backend and the Plotly frontend installation-free inside a browser using WebAssembly [10] (WASM). This also means no data leaves the user’s computer in the WASM version.

In detail, the WASM version of $\chi $iplot uses Pyodide [3] to run Python in the browser. The front- and backend communication is intercepted and redirected to the in-WASM server, inspired by the WebDash prototype [2]. Crucially, neither the front- nor backend code needs to know that it runs inside a browser.
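One way to picture the redirection is as an in-process call: Dash is built on Flask, a WSGI framework, and a WSGI application can be invoked directly as a Python callable without any network socket. The sketch below shows this general technique with the standard library only. It is not the actual mechanism used by $\chi $iplot or WebDash; `demo_app`, `call_in_process`, and the `/update` path are hypothetical.

```python
import io
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    """Tiny WSGI application standing in for the Python backend."""
    body = ("handled " + environ["PATH_INFO"]).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_in_process(app, path, method="GET", data=b""):
    """Dispatch one request to a WSGI app without opening a network socket."""
    environ = {
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body


# A request that a frontend would normally send over HTTP is answered in-process.
status, body = call_in_process(demo_app, "/update", method="POST", data=b"{}")
```

In a browser setting, the JavaScript side would additionally need to intercept the frontend's requests and hand them to such a function; that part is outside the scope of this sketch.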

As Pyodide does not yet support all Python packages, we use dynamic import detection to enable certain features and fallbacks, such as additional data file formats. Deploying the WASM version requires bundling all frontend files, $\chi $iplot, and the scripts that bootstrap the web app in the WASM backend, all documented in the $\chi $iplot GitHub repository.
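Dynamic import detection of this kind can be sketched with `importlib.util.find_spec`, which reports whether a module is importable without actually importing it. The mapping from file formats to optional packages below is hypothetical and only illustrates the pattern; it is not $\chi $iplot's actual list of formats or dependencies.

```python
import importlib.util

# Hypothetical mapping: file format -> optional module needed to read it.
OPTIONAL_FORMATS = {"parquet": "pyarrow", "xlsx": "openpyxl"}


def is_available(module_name):
    """Return True if the module can be imported in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def supported_formats(optional=OPTIONAL_FORMATS):
    """List the formats that can be offered given the installed packages."""
    formats = ["csv", "json"]  # readable with the standard library alone
    formats += [fmt for fmt, module in optional.items() if is_available(module)]
    return formats


formats = supported_formats()
```

A user interface built on this would simply hide the options whose backing package is missing, so the same code runs both in a full local installation and in a more restricted environment.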

To open up $\chi $iplot to even more use cases, $\chi $iplot has an API for creating plugins for, e.g., new visualisations and machine learning methods. It uses the “entry points” feature of Python to discover installed plugins, which also works in the WASM version. Due to the modular design with shared data, new plots can automatically interact with old ones.
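Plugin discovery through entry points can be sketched with the standard `importlib.metadata` module. The group name used here is hypothetical and is not claimed to be the one $\chi $iplot uses; consult the $\chi $iplot repository for the actual plugin API.

```python
import sys
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {name: loaded object} for all entry points registered under group."""
    if sys.version_info >= (3, 10):
        found = entry_points(group=group)
    else:
        # Before Python 3.10, entry_points() returns a dict keyed by group.
        found = entry_points().get(group, [])
    plugins = {}
    for entry_point in found:
        plugins[entry_point.name] = entry_point.load()  # imports the plugin
    return plugins


# Hypothetical group name; with no such plugins installed this is empty.
plugins = discover_plugins("example_app.plugins")
```

A plugin package would advertise itself by declaring an entry point in the same group in its packaging metadata, so installing the package is enough for the host application to find it.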

4 Conclusions

We have already found $\chi $iplot helpful when collaborating with domain experts since it lets them configure interactive plots without programming or installing anything^{Footnote 2}. The online version also enables easy sharing of results without exposing the data to any third party. For more technical users, $\chi $iplot is easy to maintain and expand due to the modular architecture. Finally, $\chi $iplot is available under the Open Source MIT license from GitHub^{Footnote 3} (which includes documentation, usage examples, and a demonstration video).

Notes

1. WASM is supported in most modern browsers; see https://caniuse.com/wasm.
2. Installation-free version at https://edahelsinki.fi/xiplot.
3. https://github.com/edahelsinki/xiplot.

References

1. Björklund, A., Mäkelä, J., Puolamäki, K.: SLISEMAP: supervised dimensionality reduction through local explanations. Mach. Learn. 112(1), 1–43 (2023). https://doi.org/10.1007/s10994-022-06261-1
2. Dafna, I., Tulop, J., Ivanov, P.: Webdash (2022). https://github.com/ibdafna/webdash
3. Droettboom, M., Chatham, H., Yurchak, R., Choi, G., et al.: Pyodide/pyodide: 21.0, August 2022. https://doi.org/10.5281/ZENODO.6977227
4. Heberle, H., Zhao, L., Schmidt, S., Wolf, T., Heinrich, J.: XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J. Cheminformatics 15(1), 2 (2023). https://doi.org/10.1186/s13321-022-00673-w
5. Humer, C., et al.: ChemInformatics Model Explorer (CIME): exploratory analysis of chemical model explanations. J. Cheminformatics 14(1), 1–14 (2022). https://doi.org/10.1186/s13321-022-00600-z
6. Plotly: Plotly Open Source Graphing Library for Python (2023). https://plotly.com/python/
7. Qin, X., Luo, Y., Tang, N., Li, G.: Making data visualization more efficient and effective: a survey. VLDB J. 29(1), 93–117 (2019). https://doi.org/10.1007/s00778-019-00588-3
8. Ramakrishnan, R., Dral, P.O., Rupp, M., von Lilienfeld, O.A.: Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1(1), 140022 (2014). https://doi.org/10.1038/sdata.2014.22
9. Stuke, A., et al.: Chemical diversity in molecular orbital energy predictions with kernel ridge regression. J. Chem. Phys. 150(20), 204121 (2019). https://doi.org/10.1063/1.5086105
10. W3C: WebAssembly Core Specification, April 2022. https://www.w3.org/TR/wasm-core-2
11. Weininger, D.: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28(1), 31–36 (1988). https://doi.org/10.1021/ci00057a005

Author information

Authors and Affiliations

University of Helsinki, Helsinki, Finland
Akihiro Tanaka, Juniper Tyree, Anton Björklund, Jarmo Mäkelä & Kai Puolamäki
CSC – IT Center for Science Ltd., Espoo, Finland
Jarmo Mäkelä

Corresponding author

Correspondence to Anton Björklund.

Editor information

Editors and Affiliations

CENTAI, Turin, Italy
Gianmarco De Francisci Morales
NYU and Two Sigma, New York, NY, USA
Claudia Perlich
Netflix, Los Angeles, CA, USA
Natali Ruchansky
Telefonica Research, Barcelona, Spain
Nicolas Kourtellis
Politecnico di Torino, Turin, Italy
Elena Baralis
CENTAI, Turin, Italy
Francesco Bonchi

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Tanaka, A., Tyree, J., Björklund, A., Mäkelä, J., Puolamäki, K. (2023). $\chi $iplot: Web-First Visualisation Platform for Multidimensional Data. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14175. Springer, Cham. https://doi.org/10.1007/978-3-031-43430-3_26

DOI: https://doi.org/10.1007/978-3-031-43430-3_26
Published: 17 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43429-7
Online ISBN: 978-3-031-43430-3
eBook Packages: Computer Science (R0)
