Learning to assess visual aesthetics of food images

Sheng, Kekai; Dong, Weiming; Huang, Haibin; Chai, Menglei; Zhang, Yong; Ma, Chongyang; Hu, Bao-Gang

doi:10.1007/s41095-020-0193-5

Research Article
Open access
Published: 28 November 2020

Volume 7, pages 139–152 (2021)

Computational Visual Media

Kekai Sheng¹,², Weiming Dong², Haibin Huang³, Menglei Chai⁴, Yong Zhang⁵, Chongyang Ma³ & Bao-Gang Hu²

Abstract

Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We also provide a non-stationary regularization method to combat over-fitting and enhance the ability of tuned models to generalize. Quantitative results from extensive experiments, including a generalization ability test, verify that neural networks trained on the GPD achieve comparable performance to human experts on the task of aesthetic assessment. We reveal several valuable findings to support further research and applications related to visual aesthetic analysis of food images. To encourage further research, we have made the GPD publicly available at https://github.com/Openning07/GPA.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61832016 and 61672520, and by a CASIA-Tencent Youtu joint research project.

Author information

Authors and Affiliations

Youtu Lab, Tencent, Shanghai, 200233, China
Kekai Sheng
NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Kekai Sheng, Weiming Dong & Bao-Gang Hu
Kuaishou Technology, Beijing, 100085, China
Haibin Huang & Chongyang Ma
Snap Inc., Santa Monica, 90405, USA
Menglei Chai
AI Lab, Tencent Inc., Shenzhen, 518000, China
Yong Zhang

Corresponding author

Correspondence to Weiming Dong.

Additional information

Kekai Sheng received his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences in 2019. He received his B.Eng. degree in telecommunication engineering from the University of Science and Technology, Beijing in 2014. He is currently a research engineer at Youtu Lab, Tencent Inc. His research interests include image quality evaluation, domain adaptation, and AutoML.

Weiming Dong is a professor in the Chinese-French Joint Laboratory for Computer Sciences, Control, and Applied Mathematics and the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences. He received his B.Eng. and M.S. degrees in computer science in 2001 and 2004 from Tsinghua University. He received his Ph.D. degree in information technology from the University of Lorraine, France, in 2007. His research interests include visual media synthesis and evaluation. He is a member of the ACM and IEEE.

Haibin Huang is a senior research scientist at Kuaishou Technology. He obtained his Ph.D. degree in computer science from UMass Amherst. He obtained his B.S. and M.S. degrees from the Department of Mathematics, Zhejiang University. His research focuses on visual content analysis and creation.

Menglei Chai is a senior research scientist at Snap Inc. He received his Ph.D. and B.Eng. degrees in computer science from Zhejiang University in 2017 and 2011, respectively. His research interests are in computer vision and graphics, especially photo manipulation and physics-based simulation.

Yong Zhang is a senior researcher in the Tencent AI Lab. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2018. He was supervised by Prof. Bao-Gang Hu and Prof. Weiming Dong at the National Laboratory of Pattern Recognition. He obtained his B.Eng. degree in automation from Hunan University in 2012. His research is on computer vision and machine learning, particularly human facial behavior analysis, face recognition, and face synthesis.

Chongyang Ma received his B.S. degree in fundamental science (mathematics and physics) from Tsinghua University in 2007 and his Ph.D. degree in computer science from the Institute for Advanced Study of Tsinghua University in 2012. He is currently a research leader at Kuaishou Technology. His research interests include computer graphics and computer vision.

Bao-Gang Hu is a full professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received his M.S. degree from the University of Science and Technology, Beijing, China in 1983, and his Ph.D. degree from McMaster University, Canada in 1993, both in mechanical engineering. He worked as a lecturer at the University of Science and Technology, Beijing, from 1983 to 1987. From 1994 to 1997, he was a research engineer and senior research engineer at C-CORE, the Memorial University of Newfoundland, Canada. From 2000 to 2005, he was the Chinese Director of the Chinese-French Joint Laboratory for Computer Science, Control and Applied Mathematics. He is a Senior Member of the IEEE.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.


About this article

Cite this article

Sheng, K., Dong, W., Huang, H. et al. Learning to assess visual aesthetics of food images. Comp. Visual Media 7, 139–152 (2021). https://doi.org/10.1007/s41095-020-0193-5


Received: 09 June 2020
Accepted: 25 August 2020
Published: 28 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s41095-020-0193-5
