Genomic Selection for fast-tracking cotton breeding

Project Leader: Warren Conaty

Key Researchers: Zitong Li, Philippe Moncuquet, Qian-Hao Zhu, Iain Wilson, Shiming Liu, Warwick Stiller and Warren Conaty

Brief Summary of Project Objectives:

This project aims to develop and evaluate a new predictive breeding approach called Genomic Selection (GS).

GS will allow phenotypic outcomes (for example, yield, fibre quality or other agronomic properties) to be predicted in breeding populations from the presence or absence of large numbers of DNA markers in individual plants, together with environmental data. GS is already being used in other crops such as maize and soybean.

It has the potential to revolutionise the way we breed cotton and may speed up the delivery of new cotton varieties by the Core Breeding project.

Market/end user:

Cotton growers are the primary end users of the research, through the Core Breeding and Core Biotech Projects.

Estimated year to uptake by end user:

This project is the development of a revolutionary breeding process in cotton that may not deliver direct outcomes to the breeding program for another decade. However, this approach has the potential to dramatically change how we generate new elite cotton varieties.

Once the appropriate data is collected and protocols are established and validated, the new breeding approach will be deployed by our cotton breeders into their routine crossing and selection procedures.

It is from this point that the outcomes of the GS project will impact variety development.

Executive Summary

The aim of this project is to develop and evaluate a new predictive breeding approach called Genomic Selection (GS).

GS has the potential to revolutionise the way we breed cotton and may speed up the delivery of new cotton varieties by the Core Breeding project.

The workflow of genomic selection. Both DNA (genomic) and trait (phenomic, such as yield and fibre quality) information are fed into machine learning (ML) models that describe mathematically how the DNA information influences the traits of interest. Other data, such as climate (environment) information, can be added to the model. Once the model is developed, DNA information from new breeding lines can be used to estimate their performance, supporting the selection of improved cotton varieties.
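The train-then-predict workflow described above can be illustrated with a minimal sketch. It assumes ridge regression on marker genotypes coded 0/1/2 as the model, which is one common choice for genomic prediction and not necessarily the model used in this project. The function names (`solve`, `fit_ridge`, `predict`), the penalty `lam` and the simulated data are all hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ridge(markers, trait, lam=1.0):
    """Estimate marker effects from training lines (rows = lines, columns = markers).

    Uses the dual form beta = Z'(ZZ' + lam*I)^-1 (y - mean), which only needs
    an n x n solve and so suits the usual case of more markers than lines.
    """
    n, p = len(markers), len(markers[0])
    mean_y = sum(trait) / n
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    k = [[sum(z[i][t] * z[j][t] for t in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, [y - mean_y for y in trait])
    beta = [sum(alpha[i] * z[i][j] for i in range(n)) for j in range(p)]
    return mean_y, col_means, beta

def predict(model, markers):
    """Predict trait values for new lines from their marker genotypes alone."""
    mean_y, col_means, beta = model
    return [mean_y + sum((g - c) * b for g, c, b in zip(row, col_means, beta))
            for row in markers]

if __name__ == "__main__":
    # Simulated data only: 40 training lines, 20 markers, random marker effects.
    random.seed(0)
    effects = [random.gauss(0, 1) for _ in range(20)]
    def simulate(n):
        geno = [[random.choice([0, 1, 2]) for _ in effects] for _ in range(n)]
        pheno = [sum(g * e for g, e in zip(row, effects)) + random.gauss(0, 1)
                 for row in geno]
        return geno, pheno
    train_geno, train_pheno = simulate(40)
    new_geno, _ = simulate(5)
    model = fit_ridge(train_geno, train_pheno, lam=1.0)
    print(predict(model, new_geno))  # predicted performance of untested lines
```

Environmental covariates could, under the same assumptions, be appended as extra columns, although a production model would normally treat them separately from markers.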

Excellent progress has been made in the GS project. Excitingly, after seven years of research and development, we are aiming to deploy GS in a Single Plant Selection (SPS) population in the 2022/23 cotton season.

Analysis has shown that the prediction accuracy of genomic estimated breeding values (GEBVs) for the traits of interest measured at the SPS stage is comparable to the accuracy of observed SPS phenotypes (relative to more accurate phenotype records developed from replicated experiments).

Of note, the accuracy of GEBVs appears to be of benefit for traits that are more affected by environment, such as micronaire.

Although the deployment of GS will only be on one population, this development in the GS project is significant!

Progress has also been made in the development of the genomic selection model. The model now routinely accounts for multiple environments and pedigree data. The updated GS model has been tested with genotype and phenotype data from the 2020/21 season.

Our GS model produced prediction accuracies of 0.44, 0.45, 0.33 and 0.17 for fibre length (LEN), fibre strength (STR), lint percentage (LP) and lint yield (LY), respectively.
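The summary does not state how these prediction accuracies were calculated. In genomic selection studies, accuracy is commonly reported as the Pearson correlation between predicted and observed values for lines held out of model training. The sketch below assumes that definition and a simple k-fold cross-validation scheme. The function names and the toy one-marker model are hypothetical and are not the project's pipeline.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kfold(n, k):
    """Yield (train, test) index lists; line i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def cross_validated_accuracy(fit, predict, markers, trait, k=5):
    """Predict every line from a model trained without it, then correlate
    the held-out predictions with the observed trait values."""
    predicted = [None] * len(trait)
    for train, test in kfold(len(trait), k):
        model = fit([markers[i] for i in train], [trait[i] for i in train])
        held_out = predict(model, [markers[i] for i in test])
        for i, value in zip(test, held_out):
            predicted[i] = value
    return pearson(predicted, trait)

# Toy stand-in model (assumption): least-squares regression on the first marker.
def fit_one_marker(markers, trait):
    x = [row[0] for row in markers]
    n = len(x)
    mx, my = sum(x) / n, sum(trait) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, trait))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def predict_one_marker(model, markers):
    intercept, slope = model
    return [intercept + slope * row[0] for row in markers]
```

Under this definition, a value such as 0.17 for lint yield would mean that predictions explain only a small share of the variation in observed yield, which is consistent with yield being a complex, environmentally sensitive trait.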

These prediction accuracies remain largely unchanged from those observed in the last reporting period. This suggests that, without a major increase in the number of characterised lines, accuracies with our existing GS model may be reaching a plateau. We therefore believe that incorporating the effect of environment on GEBVs is a critical next step in the development of our GS model. Environmental data has been collected from 55 year-site locations, and environmental variables associated with the traits of interest have been selected. A pipeline is now being developed to incorporate environment into our GS model.

Research to define the minimum number of SNP markers required for accurate genomic prediction is close to completion. A pipeline based on linkage disequilibrium (LD) pruning has been developed. This work is central to the cost-effective rollout of GS across the breeding program. A small-scale GM and native trait seed genotyping platform has been successfully developed and is now being deployed commercially for the first time in the GM and native trait introgression programs for two breeding populations.
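The idea behind LD pruning is that markers in strong linkage disequilibrium carry largely redundant information, so one of each correlated group can be dropped with little loss of predictive power. The sketch below shows one simple greedy variant, assuming markers are ordered by genome position and coded 0/1/2. The `threshold` and `window` values and the function names are hypothetical, and the project's actual pipeline is not described in this summary.

```python
def r_squared(a, b):
    """Squared correlation between the genotype calls of two markers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        # A monomorphic marker has no variance, so r^2 is undefined; treat as 0.
        # Such markers would normally be filtered out before pruning.
        return 0.0
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab * sab / (saa * sbb)

def ld_prune(genotypes, threshold=0.8, window=50):
    """Greedy LD pruning.

    genotypes: one list of 0/1/2 calls per marker (rows = markers, columns =
    lines), ordered by position. A marker is kept only if its r^2 with each of
    the last `window` retained markers is below `threshold`.
    Returns the indices of the retained markers.
    """
    kept = []
    for j, marker in enumerate(genotypes):
        recent = kept[-window:]
        if all(r_squared(marker, genotypes[i]) < threshold for i in recent):
            kept.append(j)
    return kept

if __name__ == "__main__":
    # Toy data: markers 1 and 2 are fully redundant with marker 0.
    toy = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [2, 1, 0, 2, 1, 0],
        [0, 0, 1, 2, 2, 1],
    ]
    print(ld_prune(toy, threshold=0.8))
```

Lowering `threshold` retains fewer markers. The minimum marker number could then be explored by checking how prediction accuracy changes as the threshold is tightened.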

Routine milestones around the collection of genotype and environment data have been successfully completed in this reporting period. An additional 1495 breeding lines and cultivars were genotyped, and environment data was collected from 12 of our breeding and evaluation sites.