Abstract
This area is often referred to as transfer of knowledge across tasks, or simply transfer learning; it aims at developing learning algorithms that leverage the results of previous learning tasks. This chapter discusses different approaches to transfer learning, such as representational transfer, where transfer takes place after one or more source models have been trained; here an explicit form of knowledge is transferred directly to the target model or to the meta-model. The chapter also discusses functional transfer, where two or more models are trained simultaneously, a situation sometimes referred to as multi-task learning; in this approach, the models share their internal structure (or some parts of it) during learning. Other topics include instance-, feature-, and parameter-based transfer learning, often used to initialize the search on the target domain. A distinct topic is transfer learning in neural networks, which includes, for instance, the transfer of a part of the network structure. The chapter also presents the double-loop architecture, where the base-learner iterates over the training set in an inner loop, while the metalearner iterates over different tasks to learn metaparameters in an outer loop. Details are given on transfer learning within kernel methods and parametric Bayesian models.
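The double-loop architecture can be sketched concretely. The toy example below is a minimal, hypothetical illustration (a Reptile-style first-order update on one-parameter linear regression tasks, not the chapter's specific algorithm): the base-learner runs a few gradient-descent steps on each task's training set in the inner loop, while the metalearner iterates over sampled tasks in the outer loop, nudging a metaparameter toward each task's adapted solution. All names and the task family are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A family of related 1-D regression tasks: y = a * x, with a task-specific slope a.
def sample_task():
    a = rng.uniform(0.5, 1.5)
    X = rng.normal(size=(20, 1))
    return X, a * X

def inner_loop(w, X, y, lr=0.1, steps=10):
    # Base-learner: a few gradient steps on one task's training set,
    # starting from the metaparameter w.
    for _ in range(steps):
        grad = 2 * np.mean(X * (X * w - y))  # gradient of mean squared error
        w = w - lr * grad
    return w

# Outer loop (metalearner): move the metaparameter w_meta toward the
# parameters the base-learner reached on each sampled task.
w_meta = 0.0
meta_lr = 0.5
for _ in range(100):
    X, y = sample_task()
    w_task = inner_loop(w_meta, X, y)
    w_meta += meta_lr * (w_task - w_meta)
```

On this task family the metaparameter drifts toward the average task slope, so the inner loop can adapt to a new task from a well-placed starting point in only a few gradient steps.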
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Vilalta, R., Meskhi, M.M. (2022). Transfer of Knowledge Across Tasks. In: Metalearning. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-67024-5_12
Print ISBN: 978-3-030-67023-8
Online ISBN: 978-3-030-67024-5