Abstract
Gaussian-process-based Bayesian optimization is widely adopted for solving problems whose inputs lie in Euclidean spaces. In this paper we instead take the inputs to be discrete probability distributions, i.e., elements of the probability simplex. Searching this new design space requires a distance between distributions; the optimal transport distance (also known as the Wasserstein distance) is chosen for its mathematical structure and the computational strategies it enables. Both the Gaussian process and the acquisition function are generalized, the latter into an acquisition functional over the probability simplex. Two methods are proposed to optimize this functional: one based on automatic differentiation and the other on the proximal-point algorithm and the gradient flow. Finally, we report a preliminary set of computational results on a class of problems whose dimension ranges from 5 to 100. These results show that embedding the Bayesian optimization process in the probability simplex yields an effective algorithm whose advantage over standard Bayesian optimization grows with problem dimensionality.
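The abstract's central ingredient, a Gaussian process over the probability simplex keyed by the Wasserstein distance, can be illustrated in the one-dimensional case, where the 1-Wasserstein distance between two histograms on a shared sorted support has a closed form as the integral of the absolute difference of their cumulative distribution functions. The sketch below is illustrative only, not the paper's implementation: the function names and the exponential kernel form exp(-W1/ℓ) are assumptions.

```python
import numpy as np

def wasserstein_1d(p, q, support):
    """1-Wasserstein distance between two histograms p, q on a shared,
    sorted 1-D support, via the closed form: integral of |CDF_p - CDF_q|."""
    cdf_diff = np.abs(np.cumsum(p) - np.cumsum(q))
    # piecewise-constant CDFs: sum |CDF difference| times support spacing
    return float(np.sum(cdf_diff[:-1] * np.diff(support)))

def wasserstein_kernel(p, q, support, lengthscale=1.0):
    """Exponential kernel exp(-W1/lengthscale); an assumed form for a
    Wasserstein-based GP covariance, not the paper's exact choice."""
    return float(np.exp(-wasserstein_1d(p, q, support) / lengthscale))

# Two points of the 3-dimensional probability simplex, supported on {0, 1, 2}
support = np.array([0.0, 1.0, 2.0])
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
print(wasserstein_1d(p, q, support))   # all mass moves a distance of 2
```

In higher dimensions the closed form no longer applies and the distance is obtained by solving an optimal transport problem (e.g., with the POT library cited in the references); note also that exponential kernels built on non-Euclidean distances are not positive definite in general, a point the Wasserstein-kernel literature treats with care.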
Data availability
Code and data used in this paper are available at https://github.com/andreaponti5/boris.
Funding
Open access funding provided by Università degli Studi di Milano - Bicocca within the CRUI-CARE Agreement.
Ethics declarations
Conflict of interest statement
The authors declare that they have no conflict of interest.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Missing Open Access funding information has been added in the Funding Note
About this article
Cite this article
Candelieri, A., Ponti, A. & Archetti, F. Bayesian optimization over the probability simplex. Ann Math Artif Intell (2023). https://doi.org/10.1007/s10472-023-09883-w