AI in Tea Breeding: A Case Study on Prediction of the Yellowing Trait

XU Xin; LI Yaqi; YANG Yiyang; XU Qi; QIAN Xuefei; MA Chunlei; MEI Jufen

doi:10.13305/j.cnki.jts.2025.03.006

Journal of Tea Science, 2025, Vol. 45, Issue 3: 393-401

Research Report

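1. Jiangsu Tea Research Institute, Wuxi Tea Breeding Research Co., Ltd., Wuxi 214000, China;
2. Tea Research Institute of the Chinese Academy of Agricultural Sciences/Key Laboratory of Biology, Genetics and Breeding of Special Economic Animals and Plants, Ministry of Agriculture and Rural Affairs, Hangzhou 310008, China;
3. Jiangsu Tea Research Institute, Germplasm Resource Nursery of Jiangsu Province, Wuxi 214000, China;
4. Institute of Leisure Agriculture, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China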

XU Xin, male, doctoral candidate, works mainly on intelligent tea plant breeding.

Received date: 2024-12-27

Revised date: 2025-01-10

Online published: 2025-06-18

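Funding

National Key Research and Development Program of China (2024YFD1200504); Jiangsu Province Seed Industry Revitalization "Jie Bang Gua Shuai" (open competition) Project (JBGS[2021]085); "Taihu Light" Science and Technology Research Project (N20231002)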

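Cite this article

XU Xin, LI Yaqi, YANG Yiyang, XU Qi, QIAN Xuefei, MA Chunlei, MEI Jufen. AI in Tea Breeding: A Case Study on Prediction of the Yellowing Trait[J]. Journal of Tea Science, 2025, 45(3): 393-401. DOI: 10.13305/j.cnki.jts.2025.03.006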

Abstract

Tea plant (Camellia sinensis) is an important economic crop, and quality improvement is the core goal of its breeding. To address the long cycle of conventional tea breeding (≥2 years) and the low efficiency of phenotypic identification, this study used 90 progeny from natural hybridization of the chlorotic cultivar ‘Anjihuangye’ and integrated genotype data from 40 326 core single nucleotide polymorphism (SNP) loci with yellowing phenotypes observed over two consecutive years (chlorotic∶non-chlorotic = 54∶36). Three machine learning models (logistic regression, random forest, and support vector machine) were systematically compared for predictive performance. The random forest model performed best in 10-fold cross-validation, with an accuracy of 78.96%, significantly higher than that of the other models (P<0.05). Feature importance analysis identified two key genetic marker loci: Chr8_142477650 (encoding the chloroplast-localized pyruvate dehydrogenase E1 beta subunit) and Chr8_126475215 (involved in the regulation of RNA processing). However, in independent validation on 109 chlorotic germplasm accessions from multiple sources, prediction accuracy dropped to 21.10%; shifts in feature weights caused by differences in genetic background were the main limiting factor. The study established a machine learning framework for predicting the yellowing trait in tea plants, replacing a 24-month phenotypic identification cycle with immediate genotype-based analysis and enabling trait prediction at the early stage of breeding. Although cross-cultivar generalization needs improvement, the SNP-phenotype association model offers an extensible paradigm for deciphering complex genotype-phenotype relationships in tea plants and represents an innovative application of artificial intelligence to complex-trait prediction in woody perennials.

Key words: leaf yellowing trait; machine learning; tea breeding; trait prediction
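
The reported accuracies are easier to read against a reference point. The sketch below is not the authors' pipeline: it uses only the class counts given in the abstract (54 chlorotic, 36 non-chlorotic) to show how a 10-fold split of 90 plants might look and what a classifier that always predicts the majority class would score. Stratified folds, the function names `stratified_folds` and `majority_baseline_accuracy`, and the random seed are all assumptions made for illustration; the abstract states only that 10-fold cross-validation was used. The last lines back out the number of correct calls implied by 21.10% of 109 accessions, which is an inferred count, not a figure from the paper.

```python
import random

# Class counts from the abstract: 54 chlorotic (1) and 36 non-chlorotic (0) progeny.
labels = [1] * 54 + [0] * 36


def stratified_folds(labels, k=10, seed=0):
    """Deal the indices of each class round-robin into k folds.

    Stratification is an assumption; the abstract only says "10-fold".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # keep dealing where the previous class stopped
    return folds


def majority_baseline_accuracy(labels, folds):
    """Mean held-out accuracy of always predicting the training-set majority class."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [y for i, y in enumerate(labels) if i not in held]
        majority = max(set(train), key=train.count)
        correct = sum(labels[i] == majority for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


folds = stratified_folds(labels, k=10)
baseline = majority_baseline_accuracy(labels, folds)
print("fold sizes:", [len(f) for f in folds])
print(f"majority-class baseline accuracy: {baseline:.2%}")

# Correct predictions implied by the reported 21.10% on 109 accessions (inferred).
implied_correct = round(0.2110 * 109)
print(f"implied correct calls: {implied_correct} of 109 "
      f"({implied_correct / 109:.2%})")
```

Under these assumptions every fold holds nine plants and the majority-class baseline is 60%, so the 78.96% cross-validation accuracy sits roughly 19 percentage points above a trivial predictor, while the independent-validation figure corresponds to about 23 correct calls out of 109.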
