Abstract
Gaussian-process-based Bayesian optimization is widely adopted for solving problems whose inputs lie in Euclidean spaces. In this paper we instead take the inputs to be discrete probability distributions, i.e., elements of the probability simplex. Searching this new design space requires a distance between distributions; the optimal transport distance (also known as the Wasserstein distance) is chosen for its mathematical structure and the computational strategies it enables. Both the Gaussian process and the acquisition function are generalized, the latter into an acquisition functional over the probability simplex. Two methods are proposed to optimize this functional: one based on automatic differentiation and the other on the proximal-point algorithm and the gradient flow. Finally, we report a preliminary set of computational results on a class of problems whose dimension ranges from 5 to 100. These results show that embedding the Bayesian optimization process in the probability simplex yields an effective algorithm whose advantage over standard Bayesian optimization grows with problem dimensionality.
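The abstract's central ingredient, a Gaussian process over the probability simplex keyed by the Wasserstein distance, can be illustrated in the one-dimensional case, where the 1-Wasserstein distance between two histograms on a shared sorted support has a closed form as the integral of the absolute difference of their cumulative distribution functions. The sketch below is illustrative only, not the paper's implementation: the function names and the exponential kernel form exp(-W1/ℓ) are assumptions.

```python
import numpy as np

def wasserstein_1d(p, q, support):
    """1-Wasserstein distance between two histograms p, q on a shared,
    sorted 1-D support, via the closed form: integral of |CDF_p - CDF_q|."""
    cdf_diff = np.abs(np.cumsum(p) - np.cumsum(q))
    # piecewise-constant CDFs: sum |CDF difference| times support spacing
    return float(np.sum(cdf_diff[:-1] * np.diff(support)))

def wasserstein_kernel(p, q, support, lengthscale=1.0):
    """Exponential kernel exp(-W1/lengthscale); an assumed form for a
    Wasserstein-based GP covariance, not the paper's exact choice."""
    return float(np.exp(-wasserstein_1d(p, q, support) / lengthscale))

# Two points of the 3-dimensional probability simplex, supported on {0, 1, 2}
support = np.array([0.0, 1.0, 2.0])
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
print(wasserstein_1d(p, q, support))   # all mass moves a distance of 2
```

In higher dimensions the closed form no longer applies and the distance is obtained by solving an optimal transport problem (e.g., with the POT library cited in the references); note also that exponential kernels built on non-Euclidean distances are not positive definite in general, a point the Wasserstein-kernel literature treats with care.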
Data availability
Code and data used in this paper are available at https://github.com/andreaponti5/boris.
Funding
Open access funding provided by Università degli Studi di Milano - Bicocca within the CRUI-CARE Agreement.
Ethics declarations
Conflict of interest statement
The authors declare that they have no conflict of interest.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Missing Open Access funding information has been added in the Funding Note
About this article
Cite this article
Candelieri, A., Ponti, A. & Archetti, F. Bayesian optimization over the probability simplex. Ann Math Artif Intell (2023). https://doi.org/10.1007/s10472-023-09883-w