Journal of Tea Science ›› 2025, Vol. 45 ›› Issue (3): 393-401.

• Research Paper •

AI in Tea Breeding: A Case Study on Prediction of the Yellowing Trait

XU Xin1,3, LI Yaqi1,3, YANG Yiyang4, XU Qi1,3, QIAN Xuefei1,3, MA Chunlei2,*, MEI Jufen1,3,*   

    1. Jiangsu Tea Research Institute, Wuxi Tea Breeding Research Co., Ltd., Wuxi 214000, China;
    2. Tea Research Institute of the Chinese Academy of Agricultural Sciences/Key Laboratory of Biology, Genetics and Breeding of Special Economic Animals and Plants, Ministry of Agriculture and Rural Affairs, P. R. China, Hangzhou 310008, China;
    3. Jiangsu Tea Research Institute, Germplasm Resource Nursery of Jiangsu Province, Wuxi 214000, China;
    4. Institute of Leisure Agriculture, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
  • Received: 2024-12-27 Revised: 2025-01-10 Online: 2025-06-15 Published: 2025-06-18

Abstract: Tea plant (Camellia sinensis) is a crucial economic crop, and improving its quality through breeding remains a core challenge. To address the prolonged cycle of traditional breeding (≥2 years) and inefficient phenotypic identification, this study used 90 progeny from natural hybridization of the chlorotic cultivar ‘Anjihuangye’, integrating genotypic data from 40 326 core single nucleotide polymorphism (SNP) loci with two years of phenotypic observations (chlorotic∶non-chlorotic = 54∶36). We systematically compared the predictive performance of three machine learning models (logistic regression, random forest, and support vector machine). The random forest model performed best in 10-fold cross-validation, with an accuracy of 78.96%, significantly higher than that of the other models (P<0.05). Feature importance analysis identified two critical genetic markers: Chr8_142477650 (encoding the chloroplast-localized pyruvate dehydrogenase E1 beta subunit) and Chr8_126475215 (involved in RNA processing regulation). However, independent validation on 109 germplasms with diverse yellowing traits showed that the prediction accuracy of the model decreased to 21.10%, and that feature weight deviations caused by genetic background heterogeneity were the main limiting factor. This study established a machine learning prediction framework for the tea yellowing trait that shortens phenotypic identification from a 24-month cycle to real-time genotype analysis, enabling trait prediction at an early stage of breeding. Although cross-cultivar generalizability requires improvement, the SNP-phenotype association model provides an extensible paradigm for deciphering genotype-phenotype complexity in tea plants and represents an innovative application of artificial intelligence to predicting complex traits in woody perennials.
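
To put the reported accuracies in context, the following is a minimal sketch (not the authors' pipeline) of a stratified 10-fold split over the 54∶36 phenotype labels, together with the accuracy of a trivial classifier that always predicts the majority class. The function names `stratified_kfold` and `majority_baseline_cv` are hypothetical, the labels are reconstructed only from the class counts given in the abstract, and no genotype data are used. The final line assumes that all 109 validation germplasms were scored, in which case 21.10% would correspond to 23 correct predictions.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Return k lists of test indices with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def majority_baseline_cv(labels, k=10, seed=0):
    """Mean CV accuracy of always predicting the training majority class."""
    accuracies = []
    for test_idx in stratified_kfold(labels, k, seed):
        held_out = set(test_idx)
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        majority = max(set(train), key=train.count)
        correct = sum(1 for i in test_idx if labels[i] == majority)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# 1 = chlorotic, 0 = non-chlorotic (class counts taken from the abstract).
labels = [1] * 54 + [0] * 36
baseline = majority_baseline_cv(labels)

# Assumption: all 109 validation germplasms were scored.
validation_accuracy = 23 / 109

print(f"majority-class baseline: {baseline:.2%}")
print(f"23 of 109 correct: {validation_accuracy:.2%}")
```

Under these assumptions, the cross-validated random forest accuracy of 78.96% sits well above the majority-class baseline implied by the 54∶36 split, while the independent validation accuracy of 21.10% falls far below it, which is consistent with the abstract's conclusion that the model does not transfer across genetic backgrounds.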

Key words: tea breeding, machine learning, leaf yellowing trait, trait prediction
