Abstract
In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently proposed by Varadhan and Roland constitute one notable exception. This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Alexander, D.H., Novembre, J., Lange, K.L.: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009)
Becker, M.P., Young, I., Lange, K.L.: EM algorithms without missing data. Stat. Methods Med. Res. 6, 37–53 (1997)
de Leeuw, J.: Block relaxation algorithms in statistics. In: Bock, H.H., Lenski, W., Richter, M.M. (eds.) Information Systems and Data Analysis, pp. 308–325. Springer, New York (1994)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Griffiths, D.A.: Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29(4), 637–648 (1973)
Heiser, W.J.: Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski, W.J. (ed.) Recent Advances in Descriptive Multivariate Analysis, pp. 157–189. Clarendon Press, Oxford (1995)
Hestenes, M.R., Karush, W.: A method of gradients for the calculation of the characteristic roots and vectors of a real symmetric matrix. J. Res. Natl. Bur. Stand. 47, 45–61 (1951a)
Hestenes, M.R., Karush, W.: Solutions of Ax=λ Bx. J. Res. Natl. Bur. Stand. 47, 471–478 (1951b)
Hotelling, H.: Analysis of a complex of statistical variables onto principal components. J. Educ. Psychol. 24, 417–441 (1933)
Hotelling, H.: Relations between two sets of variables. Biometrika 28, 321–377 (1936)
Hunter, D.R., Lange, K.L.: A tutorial on MM algorithms. Am. Stat. 58, 30–37 (2004)
Jamshidian, M., Jennrich, R.I.: Conjugate gradient acceleration of the EM algorithm. J. Am. Stat. Assoc. 88(421), 221–228 (1993)
Jamshidian, M., Jennrich, R.I.: Acceleration of the EM algorithm by using quasi-Newton methods. J. R. Stat. Soc. Ser. B 59(3), 569–587 (1997)
Jolliffe, I.: Principal Component Analysis. Springer, New York (1986)
Kent, J.T., Tyler, D.E., Vardi, Y.: A curious likelihood identity for the multivariate t-distribution. Commun. Stat. Simul. Comput. 23(2), 441–453 (1994)
Lange, K.L., Carson, R.: EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 8(2), 306–316 (1984)
Lange, K.L.: A quasi-Newton acceleration of the EM algorithm. Stat. Sin. 5(1), 1–18 (1995)
Lange, K.L.: Numerical Analysis for Statisticians. Springer, New York (1999)
Lange, K.L.: Optimization transfer using surrogate objective functions. J. Comput. Statist. 9, 1–59 (2000)
Lange, K.L.: Optimization. Springer, New York (2004)
Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
Lidwell, O.M., Somerville, T.: Observations on the incidence and distribution of the common cold in a rural community during 1948 and 1949. J. Hyg. Camb. 49, 365–381 (1951)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley-Interscience, New York (2002)
Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633–648 (1994)
Liu, C., Rubin, D.B., Wu, Y.N.: Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 85(4), 755–770 (1998)
Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44(2), 226–233 (1982)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley-Interscience, New York (2008)
Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2), 267–278 (1993)
Meng, X.L., van Dyk, D.: The EM algorithm—an old folk-song sung to a fast new tune (with discussion). J. R. Stat. Soc. Ser. B 59(3), 511–567 (1997)
Mitchell, M., Gregersen, P., Johnson, S., Parsons, R., Vlahov, D.: The New York Cancer Project: rationale, organization, design, and baseline characteristics. J. Urban Health 61, 301–310 (2004)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems. Halstead [Wiley], New York (1992)
Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Ueda, N., Nakano, R.: Deterministic annealing EM algorithm. Neural Netw. 11, 271–282 (1998)
Varadhan, R., Roland, C.: Squared extrapolation methods (squarem): a new class of simple and efficient numerical schemes for accelerating the convergence of the EM algorithm. Johns Hopkins University, Department of Biostatistics Working Papers (Paper 63) (2004)
Varadhan, R., Roland, C.: Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scand. J. Statist. 35(2), 335–353 (2008)
Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography (with discussion). J. Am. Stat. Assoc. 80(389), 8–37 (1985)
Wu, T.T., Lange, K.L.: The MM alternatives to EM. Stat. Sci. (2009, in press)
Zhou, H., Lange, K.L.: Rating movies and rating the raters who rate them. Am. Stat. 63(4), 297–307 (2009)
Zhou, H., Lange, K.L.: MM algorithms for some discrete multivariate distributions. J. Comput. Graph. Stat. (2009b, to appear)
Zhou, H., Lange, K.L.: On the bumpy road to the dominant mode. Scand. J. Stat. (2009c, to appear)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Zhou, H., Alexander, D. & Lange, K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat Comput 21, 261–273 (2011). https://doi.org/10.1007/s11222-009-9166-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-009-9166-3