Abstract
Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We also provide a non-stationary regularization method to combat over-fitting and enhance the ability of tuned models to generalize. Quantitative results from extensive experiments, including a generalization ability test, verify that neural networks trained on the GPD achieve comparable performance to human experts on the task of aesthetic assessment. We reveal several valuable findings to support further research and applications related to visual aesthetic analysis of food images. To encourage further research, we have made the GPD publicly available at https://github.com/Openning07/GPA.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Manna, L. Digital food photography. Cengage Learning PTR, 2015.
Murray, N.; Marchesotti, L.; Perronnin, F. Ava: A large-scale database for aesthetic visual analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2408–2415, 2012.
Ma, S.; Liu, J.; Chen, C. W. A-lamp: Adaptive layout-aware multi-patch deep convolutional neural network for photo aesthetic assessment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 722–731, 2017.
Hosu, V.; Goldlücke, B.; Saupe, D. Efiective aesthetics prediction with multi-level spatially pooled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9367–9375, 2019.
Bossard, L.; Guillaumin, M.; van Gool, L. Food-101—mining discriminative components with random forests. In: Computer Vision-ECCV 2014. Lecture Notes in Computer Science, Vol. 8694. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 446–461, 2014.
Zhang, X. J.; Lu, Y. F.; Zhang, S. H. Multi-task learning for food identification and analysis with deep convolutional neural networks. Journal of Computer Science and Technology Vol. 31, No. 3, 489–500, 2016.
Salvador, A.; Hynes, N.; Aytar, Y.; Marin, J.; Oi, F.; Weber, I.; Torralba, A. Learning cross-modal embeddings for cooking recipes and food images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3068–3076, 2017.
Li, Y.; Sheopuri, A. Applying image analysis to assess food aesthetics and uniqueness. In: Proceedings of the IEEE International Conference on Image Processing, 311–314, 2015.
Luo, W.; Wang, X.; Tang, X. Content-based photo quality assessment. In: Proceedings of the IEEE International Conference on Computer Vision, 2206–2213, 2011.
Chen, X.; Zhu, Y.; Zhou, H.; Diao, L.; Wang, D. ChineseFoodNet: A large-scale image dataset for chinese food recognition. arXiv preprint arXiv:1705.02743, 2017.
Sheng, K. K.; Dong, W. M.; Huang, H. B.; Ma, C. Y.; Hu, B. G. Gourmet photography dataset for aesthetic assessment of food images. In: Proceedings of the SIGGRAPH Asia 2018 Technical Briefs, Article No. 20, 2018.
Datta, R.; Joshi, D.; Li, J.; Wang, J. Z. Studying aesthetics in photographic images using a computational approach. In: Computer Vision-ECCV 2006. Lecture Notes in Computer Science, Vol. 3953. Leonardis, A.; Bischof, H.; Pinz, A. Eds. Springer Berlin Heidelberg, 288–301, 2006.
Zhang, F. L., Wang, M.; Hu, S. M. Aesthetic image enhancement by dependence-aware object recomposition. IEEE Transactions on Multimedia Vol. 15, No. 7, 1480–1490, 2013.
Kong, S.; Shen, X. H.; Lin, Z.; Mech, R.; Fowlkes, C. Photo aesthetics ranking network with attributes and content adaptation. In: Computer Vision-ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 662–679, 2016.
Lu, X.; Lin, Z.; Shen, X.; Mech, R.; Wang, J. Z. Deep multi-patch aggregation network for image style, aesthetics, and quality estimation. In: Proceedings of the IEEE International Conference on Computer Vision, 990–998, 2015.
Talebi, H., Milanfar, P. NIMA: Neural image assessment. IEEE Transactions on Image Processing Vol. 27, No. 8, 3998–4011, 2018.
Sheng, K. K.; Dong, W. M.; Ma, C. Y.; Mei, X.; Huang, F. Y.; Hu, B. G. Attention-based multipatch aggregation for image aesthetic assessment. In: Proceedings of the 26th ACM International Conference on Multimedia, 879–886, 2018.
Kucer, M.; Loui, A. C.; Messinger, D. W. Leveraging expert feature knowledge for predicting image aesthetics. IEEE Transactions on Image Processing Vol. 27, No. 10, 5100–5112, 2018.
Liu, Z. G.; Wang, Z. P.; Yao, Y. Y.; Zhang, L. M.; Shao, L. Deep active learning with contaminated tags for image aesthetics assessment. IEEE Transactions on Image Processing doi: https://doi.org/10.1109/TIP.2018.2828326, 2018.
Sun, R.; Lian, Z.; Tang, Y.; Xiao, J. Aesthetic visual quality evaluation of Chinese handwritings. In: Proceedings of the International Joint Conferences on Artificial Intelligence, 2510–2516, 2015.
Chang, H. W.; Yu, F.; Wang, J.; Ashley, D.; Finkelstein, A. Automatic triage for a photo series. ACM Transactions on Graphics Vol. 35, No. 4, Article No. 148, 2016.
Chang, K.-Y.; Lu, K.-H.; Chen, C.-S. Aesthetic critiques generation for photos. In: Proceedings of the IEEE International Conference on Computer Vision, 3514–3523, 2017.
Hung, W.-C.; Zhang, J.; Shen, X.; Lin, Z.; Lee, J.-Y.; Yang, M.-H. Learning to blend photos. In: Proceedings of the European Conference on Computer Vision, 70–86, 2018.
Yu, W. H.; Zhang, H. D.; He, X. N.; Chen, X.; Xiong, L.; Qin, Z. Aesthetic-based clothing recommendation. In: Proceedings of the World Wide Web Conference, 649–658, 2018.
Hassannejad, H.; Matrella, G.; Ciampolini, P.; de Munari, I.; Mordonini, M.; Cagnoni, S. Food image recognition using very deep convolutional networks. In: Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, 41–49, 2016.
Meyers, A.; Johnston, N.; Rathod, V.; Korattikara, A.; Gorban, A.; Silberman, N.; Guadarrama, S.; Papandreou, G.; Huang, J.; Murphy, K. P. Im2Calories: Towards an automated mobile vision food diary. In: Proceedings of the IEEE International Conference on Computer Vision, 1233–1241, 2015.
Hinton, G. E.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2014.
Szegedy, C.; Vanhoucke, V.; Iofie, S.; Shlens, J.; Z. Wojna. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826, 2016.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research Vol. 15, No. 1, 1929–1958, 2014.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM Vol. 60, No. 6, 84–90, 2017.
Hein, M.; Andriushchenko, M.; Bitterwolf, J. Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 41–50, 2019.
Manning, C. D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval. Cambridge University Press, 2008.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248–255, 2009.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9, 2015.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision Vol. 42, No. 3, 145–175, 2001.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556v6, 2015.
Zhou, B. L.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 6, 1452–1464, 2018.
Zhang, R.; Efros, A. A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 586–595, 2018.
Mai, L.; Jin, H.; Liu, F. Composition-preserving deep photo aesthetics assessment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 497–506, 2016.
Zhang, X. D.; Gao, X. B.; Lu, W.; He, L. H. A gated peripheral-foveal convolutional neural network for unified image aesthetic prediction. IEEE Transactions on Multimedia Vol. 21, No. 11, 2815–2826, 2019.
Deng, Y.; Loy, C. C.; Tang, X. Aesthetic-driven image enhancement by adversarial learning. In: Proceedings of the 26th ACM International Conference on Multimedia, 870–878, 2018.
Hu, Y.; He, H.; Xu, C.; Wang, B.; Lin, S. Exposure: A white-box photo post-processing framework. ACM Transactions on Graphics Vol. 37, No. 2, Article No. 26, 2018.
Xu, Z.; Huang, S. L.; Zhang, Y.; Tao, D. C. Webly-supervised fine-grained visual categorization via deep domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 5, 1100–1113, 2018.
Sheng, K. K.; Dong, W. M.; Kong, Y.; Mei, X.; Li, J. L.; Wang, C. J.; Huang, F.; Hu, B. Evaluating the quality of face alignment without ground truth. Computer Graphics Forum Vol. 34, No. 7, 213–223, 2015.
Papadopoulos, D. P.; Tamaazousti, Y.; Oi, F.; Weber, I.; Torralba, A. How to make a pizza: Learning a compositional layer-based GAN model. In: proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8002–8011, 2019.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61832016 and 61672520, and by a CASIA-Tencent Youtu joint research project.
Author information
Authors and Affiliations
Corresponding author
Additional information
Kekai Sheng received his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences in 2019. He received his B.Eng. degree in telecommunication engineering from the University of Science and Technology, Beijing in 2014. He is currently a researcher engineer at Youtu Lab, Tencent Inc. His research interests include image quality evaluation, domain adaptation, and AutoML.
Weiming Dong is a professor in the Chinese-French Joint Laboratory for Computer Sciences, Control, and Applied Mathematics and the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences. He received his B.Eng. and M.S. degrees in computer science in 2001 and 2004 from Tsinghua University. He received his Ph.D. degree in information technology from the University of Lorraine, France, in 2007. His research interests include visual media synthesis and evaluation. He is a member of the ACM and IEEE.
Haibin Huang is a senior research scientist at Kuaishou Technology. He obtained his Ph.D. degree in computer science from UMass Amherst. He obtained his B.S. and an M.S. degrees in the Department of Mathematics, Zhejiang University. His research focuses on visual content analysis and creation.
Menglei Chai is a senior research scientist at Snap Inc. He received his Ph.D. and B.Eng. degrees in computer science from Zhejiang University in 2017 and 2011 respectively. His research interests are in computer vision and graphics, especially in photo manipulation and physics-based simulation.
Yong Zhang is a senior researcher in the Tencent AI Lab. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2018. He was supervised by Prof. Bao-Gang Hu and Prof. Weiming Dong at the National Laboratory of Pattern Recognition. He obtained his B.Eng degree in automation from Hunan University in 2012. His research is on computer vision and machine learning, particularly human facial behavior analysis, face recognition, and face synthesis.
Chongyang Ma received his B.S. degree in fundamental science (mathematics and physics) from Tsinghua University in 2007 and his Ph.D. degree in computer science from the Institute for Advanced Study of Tsinghua University in 2012. He is currently a research leader at Kuaishou Technology. His research interests include computer graphics and computer vision.
Bao-Gang Hu is a full professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received his M.S. degree from the University of Science and Technology, Beijing, China in 1983, and his Ph.D. degree from McMaster University, Canada in 1993, both in mechanical engineering. He worked as a lecturer in the University of Science and Technology, Beijing, from 1983 to 1987. From 1994 to 1997, he was a research engineer and senior research engineer at C-CORE, the Memorial University of Newfoundland, Canada. From 2000 to 2005, he was the Chinese Director of the Chinese-French Joint Laboratory for Computer Science, Control and Applied Mathematics. He is Senior Member of the IEEE.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
About this article
Cite this article
Sheng, K., Dong, W., Huang, H. et al. Learning to assess visual aesthetics of food images. Comp. Visual Media 7, 139–152 (2021). https://doi.org/10.1007/s41095-020-0193-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s41095-020-0193-5