
Journal of Tea Science ›› 2025, Vol. 45 ›› Issue (3): 393-401.

• Research Report •

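  • Corresponding authors: *malei220@tricaas.com; meijufen@sina.com
  • About the author: XU Xin, male, PhD candidate, mainly engaged in research on intelligent tea plant breeding.
  • Funding:
    National Key Research and Development Program of China (2024YFD1200504); Jiangsu Province Seed Industry Revitalization "Jiebang Guashuai" (open competition) Project (JBGS[2021]085); "Taihu Light" Science and Technology Research Project (N20231002)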

AI in Tea Breeding: A Case Study on Prediction of the Yellowing Trait

XU Xin1,3, LI Yaqi1,3, YANG Yiyang4, XU Qi1,3, QIAN Xuefei1,3, MA Chunlei2,*, MEI Jufen1,3,*   

  1. Jiangsu Tea Research Institute, Wuxi Tea Breeding Research Co., Ltd., Wuxi 214000, China;
  2. Tea Research Institute of the Chinese Academy of Agricultural Sciences/Key Laboratory of Biology, Genetics and Breeding of Special Economic Animals and Plants, Ministry of Agriculture and Rural Affairs, P. R. China, Hangzhou 310008, China;
  3. Jiangsu Tea Research Institute, Germplasm Resource Nursery of Jiangsu Province, Wuxi 214000, China;
  4. Institute of Leisure Agriculture, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
  • Received: 2024-12-27 Revised: 2025-01-10 Online: 2025-06-15 Published: 2025-06-18



Abstract: Quality improvement is the core breeding objective for tea plant (Camellia sinensis), an important economic crop. To address the long cycle of traditional tea breeding (≥2 years) and the low efficiency of phenotypic identification, this study used 90 progeny from natural hybridization of the chlorotic cultivar ‘Anjihuangye’ and integrated genotypic data from 40 326 core single nucleotide polymorphism (SNP) loci with phenotypic observations from two consecutive years (chlorotic∶non-chlorotic = 54∶36). Three machine learning models (logistic regression, random forest, and support vector machine) were systematically compared for predictive performance. The random forest model performed best in 10-fold cross-validation, with an accuracy of 78.96%, significantly higher than that of the other models (P<0.05). Feature importance analysis identified two key genetic marker loci: Chr8_142477650 (encoding the chloroplast pyruvate dehydrogenase E1 β subunit) and Chr8_126475215 (involved in the regulation of RNA processing). However, in independent validation on 109 chlorotic germplasm accessions from multiple sources, prediction accuracy fell to 21.10%; shifts in feature weights caused by differences in genetic background were the main limiting factor. The study established a machine learning framework for predicting the yellowing trait in tea plants that replaces a 24-month phenotypic identification cycle with immediate genotype-based analysis, enabling trait prediction at an early stage of breeding. Although cross-cultivar generalization still needs improvement, the SNP-phenotype association model provides an extensible research paradigm for dissecting complex genotype-phenotype relationships in tea plants and represents an innovative application of artificial intelligence to complex-trait prediction in woody plants.
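
To put the reported accuracies in context, the sketch below runs a stratified 10-fold split on the class counts given in the abstract (54 chlorotic, 36 non-chlorotic) and scores a trivial rule that always predicts the majority class of the training folds. It is not the authors' code and uses no genotype data: the function names `stratified_folds` and `majority_baseline_cv`, the round-robin fold assignment, and the seed are all illustrative assumptions. It also converts the 21.10% validation figure into a count, assuming accuracy means correct predictions divided by the 109 accessions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while keeping class ratios similar.

    Hypothetical helper: a simple round-robin scheme, not the authors' method.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    counter = 0  # keeps running across classes so fold sizes stay balanced
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # randomise which samples land in which fold
        for idx in members:
            folds[counter % k].append(idx)
            counter += 1
    return folds


def majority_baseline_cv(labels, k=10, seed=0):
    """Return (pooled correct predictions, folds) for a majority-class rule."""
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        # The "model" is just the most frequent label in the training folds.
        majority = Counter(train_labels).most_common(1)[0][0]
        correct += sum(labels[i] == majority for i in test_idx)
    return correct, folds


# Class counts taken from the abstract; the labels themselves are placeholders.
labels = ["chlorotic"] * 54 + ["non-chlorotic"] * 36
correct, folds = majority_baseline_cv(labels, k=10)
baseline_accuracy = correct / len(labels)

# Assumption: reported accuracy = correct predictions / number of accessions.
implied_correct = round(0.2110 * 109)

print(f"fold sizes: {[len(f) for f in folds]}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.2%}")
print(f"implied correct predictions in validation: {implied_correct} of 109")
```

Under these assumptions the majority-class rule scores 60.00% on the 90 progeny, so the 78.96% reported for the random forest is the figure to weigh against that floor, and 21.10% on the validation set corresponds to 23 of 109 accessions.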

Key words: tea breeding, machine learning, leaf yellowing trait, trait prediction
