Abstract
This area is often referred to as transfer of knowledge across tasks, or simply transfer learning; it aims at developing learning algorithms that leverage the results of previous learning tasks. This chapter discusses different approaches to transfer learning, such as representational transfer, where transfer takes place after one or more source models have been trained; here an explicit form of knowledge is transferred directly to the target model or to the meta-model. The chapter also discusses functional transfer, where two or more models are trained simultaneously, a situation sometimes referred to as multi-task learning; in this approach, the models share their internal structure (or some parts of it) during learning. Other topics include instance-, feature-, and parameter-based transfer learning, often used to initialize the search on the target domain. A distinct topic is transfer learning in neural networks, which includes, for instance, the transfer of a part of the network structure. The chapter also presents the double-loop architecture, where the base-learner iterates over the training set in an inner loop, while the metalearner iterates over different tasks to learn metaparameters in an outer loop. Details are given on transfer learning within kernel methods and parametric Bayesian models.
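The double-loop architecture can be sketched concretely. The toy example below is a minimal, hypothetical illustration (a Reptile-style first-order update on one-parameter linear regression tasks, not the chapter's specific algorithm): the base-learner runs a few gradient-descent steps on each task's training set in the inner loop, while the metalearner iterates over sampled tasks in the outer loop, nudging a metaparameter toward each task's adapted solution. All names and the task family are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A family of related 1-D regression tasks: y = a * x, with a task-specific slope a.
def sample_task():
    a = rng.uniform(0.5, 1.5)
    X = rng.normal(size=(20, 1))
    return X, a * X

def inner_loop(w, X, y, lr=0.1, steps=10):
    # Base-learner: a few gradient steps on one task's training set,
    # starting from the metaparameter w.
    for _ in range(steps):
        grad = 2 * np.mean(X * (X * w - y))  # gradient of mean squared error
        w = w - lr * grad
    return w

# Outer loop (metalearner): move the metaparameter w_meta toward the
# parameters the base-learner reached on each sampled task.
w_meta = 0.0
meta_lr = 0.5
for _ in range(100):
    X, y = sample_task()
    w_task = inner_loop(w_meta, X, y)
    w_meta += meta_lr * (w_task - w_meta)
```

On this task family the metaparameter drifts toward the average task slope, so the inner loop can adapt to a new task from a well-placed starting point in only a few gradient steps.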
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Vilalta, R., Meskhi, M.M. (2022). Transfer of Knowledge Across Tasks. In: Metalearning. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-67024-5_12
Print ISBN: 978-3-030-67023-8
Online ISBN: 978-3-030-67024-5