Abstract
A major challenge in applying Bayesian methods to tracking 3D human body pose is the high dimensionality of the pose state space. It has been observed that 3D human body pose parameters can typically be assumed to lie on a low-dimensional manifold embedded in the high-dimensional space. The goal of this work is to approximate this low-dimensional manifold so that a low-dimensional state vector can be obtained for efficient and effective Bayesian tracking. To achieve this goal, a globally coordinated mixture of factor analyzers is learned from motion capture data. Each factor analyzer in the mixture is a “locally linear dimensionality reducer” that approximates a part of the manifold. The global parametrization of the manifold is obtained by aligning these locally linear pieces in a global coordinate system. To enable automatic and optimal selection of the number of factor analyzers and the dimensionality of the manifold, a variational Bayesian formulation of the globally coordinated mixture of factor analyzers is proposed. The advantages of the proposed model are demonstrated in a multiple hypothesis tracker for tracking 3D human body pose. Quantitative comparisons on benchmark datasets show that the proposed method produces more accurate 3D pose estimates over time than two previously proposed Bayesian tracking methods.
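To make the idea of aligning locally linear pieces concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional latent coordinate, diagonal noise, and hand-set parameters instead of the variational Bayesian learning described in the abstract. All names (`local_posterior`, `global_coordinate`, `ANALYZERS`, and the `scale`/`offset` alignment fields) are hypothetical. Each analyzer infers a local coordinate for a pose vector, and the global coordinate is a responsibility-weighted blend of per-analyzer affine maps of those local coordinates.

```python
import math

def local_posterior(x, mu, lam, psi):
    """Posterior mean of a 1-D latent z and log-likelihood of x for one analyzer.

    Assumed model: x = mu + lam * z + noise, z ~ N(0, 1), noise ~ N(0, diag(psi)).
    """
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Scalars lam^T Psi^-1 lam and lam^T Psi^-1 d (Psi is diagonal).
    lpl = sum(l * l / p for l, p in zip(lam, psi))
    lpd = sum(l * di / p for l, di, p in zip(lam, d, psi))
    z_mean = lpd / (1.0 + lpl)
    # The marginal covariance is lam lam^T + Psi; the Woodbury identity and the
    # matrix determinant lemma avoid forming or inverting it explicitly.
    quad = sum(di * di / p for di, p in zip(d, psi)) - lpd * lpd / (1.0 + lpl)
    log_det = sum(math.log(p) for p in psi) + math.log(1.0 + lpl)
    log_lik = -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + quad)
    return z_mean, log_lik

def global_coordinate(x, analyzers):
    """Blend affine maps of the local coordinates, weighted by responsibility."""
    zs, logs = [], []
    for a in analyzers:
        z, log_lik = local_posterior(x, a["mu"], a["lam"], a["psi"])
        zs.append(z)
        logs.append(math.log(a["weight"]) + log_lik)
    # Normalise in log space for numerical stability.
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    resp = [u / total for u in unnorm]
    g = sum(r * (a["scale"] * z + a["offset"])
            for r, a, z in zip(resp, analyzers, zs))
    return g, resp

# Hypothetical hand-set parameters: two linear pieces of an L-shaped curve in 2-D.
ANALYZERS = [
    {"weight": 0.5, "mu": [0.0, 0.0], "lam": [1.0, 0.0], "psi": [0.1, 0.1],
     "scale": 1.0, "offset": 0.0},
    {"weight": 0.5, "mu": [2.0, 1.0], "lam": [0.0, 1.0], "psi": [0.1, 0.1],
     "scale": 1.0, "offset": 2.0},
]

g, resp = global_coordinate([0.0, 0.0], ANALYZERS)
```

In a tracker, a low-dimensional value such as `g` would stand in for the full pose vector as the state. The number of analyzers and the latent dimensionality are fixed by hand here, whereas the paper selects them automatically.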
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Li, R., Tian, TP., Sclaroff, S. et al. 3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers. Int J Comput Vis 87, 170–190 (2010). https://doi.org/10.1007/s11263-009-0283-4
DOI: https://doi.org/10.1007/s11263-009-0283-4