
Journal of Tea Science ›› 2023, Vol. 43 ›› Issue (3): 411-423. doi: 10.13305/j.cnki.jts.2023.03.006

• Research Report •

Data Enhancement Optimization and Class Activation Mapping Quantitative Evaluation for CNN Image Recognition of Multiple Tea Categories

ZHANG Zhanyi1, ZHANG Baoquan1, WANG Zhouli1, YANG Yao1, FAN Dongmei1, HE Weizhong2, MA Junhui3,*, LIN Jie1,*

  1. College of Tea Science and Tea Culture, Zhejiang A&F University, Lin'an 311300, China;
  2. Lishui Academy of Agricultural and Forestry Sciences, Lishui 323000, China;
  3. Lishui Economic Crop Terminal, Lishui 323000, China
  • Received: 2023-02-14 Revised: 2023-04-19 Online: 2023-06-15 Published: 2023-06-29
  • Corresponding authors: *278805795@qq.com; linjie@zafu.edu.cn
  • About the author: ZHANG Zhanyi, female, undergraduate student majoring in tea science, 2248559187@qq.com.
  • Funding:
    National Undergraduate Innovation and Entrepreneurship Training Program (202110341061); 2022 Lishui Tea Industry Expert Team Project (202203); Zhejiang Major Agricultural Technology Collaborative Extension Program (2022XTTGCY04)

Abstract: China produces many kinds of tea, and identifying them by eye is error-prone and depends heavily on professional experience. Convolutional neural network (CNN) image recognition is objective, tolerates complex image backgrounds, and can be deployed on mobile devices. However, existing CNN-based tea recognition work has not studied how to optimize data augmentation or how to evaluate objectively whether the model relies on the correct image regions, which limits model robustness and generalization. In this study, 6 123 images of 29 common tea categories were collected to build a dataset, and the effects of 10 image data augmentation methods on ResNet-18 (Residual network-18) training were compared. To evaluate objectively the accuracy of the regions the model recognizes, two quantitative evaluation indexes based on gradient-weighted class activation mapping (Grad-CAM), IOB and MPI, were constructed. The results show that grid erasure (Ratio=0.3), resolution perturbation, and HSV (Hue, Saturation, Value) color space perturbation were the better augmentation methods, performing well overall on four indicators: accuracy, loss, IOB, and MPI. Ablation experiments then identified the best combination of augmentation methods: horizontal mirror flip + grid erasure (Ratio=0.3) + HSV color space perturbation. With this combination the model reached a test accuracy of 99.82% with a loss of only 0.64, and it also performed well on IOB and MPI, indicating that the recognized image regions were accurate. This study optimized data augmentation for tea images and produced a highly robust multi-tea CNN image recognition model. The IOB and MPI indexes also address the problem of objectively evaluating the accuracy of CAM recognition regions.

Key words: tea recognition, convolutional neural network, image recognition, data augmentation, class activation mapping
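The combination named in the abstract (horizontal mirror flip + grid erasure + HSV color space perturbation) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The function names, the tile size `cell`, the jitter range `max_shift`, and the reading of `ratio` as the erased fraction of each tile's side are all assumptions made here for illustration; the abstract does not define them. Images are tiny nested lists of RGB tuples so that the logic stays visible; a real pipeline would use an array or tensor library.

```python
import colorsys
import random

Pixel = tuple[float, float, float]   # RGB, each channel in [0, 1]
Image = list[list[Pixel]]            # list of rows, each a list of pixels


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def hsv_jitter(img: Image, rng: random.Random, max_shift: float = 0.1) -> Image:
    """Shift hue, saturation and value by one random offset per image.

    ASSUMPTION: a uniform offset in [-max_shift, max_shift] per channel.
    """
    dh, ds, dv = (rng.uniform(-max_shift, max_shift) for _ in range(3))

    def shift(px: Pixel) -> Pixel:
        h, s, v = colorsys.rgb_to_hsv(*px)
        h = (h + dh) % 1.0                   # hue wraps around
        s = min(1.0, max(0.0, s + ds))       # saturation is clamped
        v = min(1.0, max(0.0, v + dv))       # value is clamped
        return colorsys.hsv_to_rgb(h, s, v)

    return [[shift(px) for px in row] for row in img]


def grid_erase(img: Image, cell: int, ratio: float) -> Image:
    """Black out a square in the top-left corner of every cell x cell tile.

    ASSUMPTION: `ratio` is the erased fraction of each tile's side length.
    """
    side = round(cell * ratio)
    return [
        [(0.0, 0.0, 0.0) if (y % cell < side and x % cell < side) else px
         for x, px in enumerate(row)]
        for y, row in enumerate(img)
    ]


def augment(img: Image, rng: random.Random,
            cell: int = 10, ratio: float = 0.3) -> Image:
    """Flip with probability 0.5, jitter colors, then erase the grid.

    Color jitter runs before erasure so erased pixels stay black.
    """
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    img = hsv_jitter(img, rng)
    return grid_erase(img, cell, ratio)
```

Calling `augment(img, random.Random(0))` on each training image would give one randomized variant per call. Seeding the generator keeps a run reproducible, which matters when comparing augmentation methods one at a time, as in an ablation.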

CLC number: