Abstract
For partitioning clustering methods, the number of clusters has to be determined in advance. One approach to this problem is the use of stability indices. In this paper, several stability-based validation methods are investigated with regard to the k-prototypes algorithm for mixed-type data. In a comprehensive simulation study, the stability-based approaches are compared with common validation indices in order to analyze which is preferable as a function of the underlying data-generating process.
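To make the idea of a stability index concrete, the following is a minimal sketch and not the chapter's implementation. It assumes that, for each candidate number of clusters, the same objects have already been clustered several times (for example on resampled data) and that agreement between two partitions is measured with the Rand index. The function names `rand_index`, `mean_stability`, and `most_stable_k` are hypothetical, and the stability-based methods examined in the chapter may use other similarity measures and resampling schemes.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree (Rand index)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    agreements = 0
    for i, j in pairs:
        together_in_a = labels_a[i] == labels_a[j]
        together_in_b = labels_b[i] == labels_b[j]
        # A pair counts as agreement if both partitions join it or both split it.
        if together_in_a == together_in_b:
            agreements += 1
    return agreements / len(pairs)


def mean_stability(partitions):
    """Average Rand index over all pairs of partitions of the same objects."""
    scores = [rand_index(p, q) for p, q in combinations(partitions, 2)]
    if not scores:
        raise ValueError("at least two partitions are required")
    return sum(scores) / len(scores)


def most_stable_k(partitions_by_k):
    """Return the candidate k whose repeated partitions agree most on average."""
    return max(partitions_by_k, key=lambda k: mean_stability(partitions_by_k[k]))


# Illustrative labels only (assumption): three repeated runs per candidate k.
example_runs = {
    2: [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]],
    3: [[0, 0, 1, 2], [0, 1, 1, 2], [0, 1, 2, 2]],
}
chosen_k = most_stable_k(example_runs)
```

The Rand index ignores the arbitrary numbering of clusters, which is why the relabeled runs for two clusters in the example still count as identical partitions. It is not corrected for chance agreement, so comparisons across very different numbers of clusters should be read with care.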
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Aschenbruck, R., Szepannek, G., Wilhelm, A.F.X. (2023). Stability of Mixed-Type Cluster Partitions for Determination of the Number of Clusters. In: Brito, P., Dias, J.G., Lausen, B., Montanari, A., Nugent, R. (eds) Classification and Data Science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-09034-9_6
DOI: https://doi.org/10.1007/978-3-031-09034-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09033-2
Online ISBN: 978-3-031-09034-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)