A,novel,procedure,for,identifying,a,hybrid,QTL-allele,system,for,hybrid-vigor,improvement,with,a,case,study,in,soybean(Glycine,max)yield

来源:优秀文章 发布时间:2023-04-25 点击:

Jinshe Wang,Jianbo He,Jiayin Yang,Junyi Gai

Soybean Research Institute/MARA National Center for Soybean Improvement/MARA Key Laboratory of Biology and Genetic Improvement of Soybean(General)/State Key Laboratory for Crop Genetics and Germplasm Enhancement/Jiangsu Collaborative Innovation Center for Modern Crop Production,Nanjing Agricultural University,Nanjing 210095,Jiangsu,China

Keywords:Breeding by design Diallel hybrid population PLSRGA(partial least squares regression via genetic algorithm)QTL-allele matrix of additive/dominance effect Simulation experiment Soybean[Glycine max(L.)Merr.]

ABSTRACT‘‘Breeding by design”for pure lines may be achieved by construction of an additive QTL-allele matrix in a germplasm panel or breeding population,but this option is not available for hybrids,where both additive and dominance QTL-allele matrices must be constructed.In this study,a hybrid-QTL identification approach,designated PLSRGA,using partial least squares regression(PLSR)for model fitting integrated with a genetic algorithm(GA)for variable selection based on a multi-locus,multi-allele model is described for additive and dominance QTL-allele detection in a diallel hybrid population(DHP).The PLSRGA was shown by simulation experiments to be superior to single-marker analysis and was then used for QTL-allele identification in a soybean DPH yield experiment with eight parents.Twenty-eight main-effect QTL with 138 alleles and nine QTL×environment QTL with 46 alleles were identified,with respective contributions of 61.8%and 23.5%of phenotypic variation.Main-effect additive and dominance QTL-allele matrices were established as a compact form of the DHP genetic structure.The mechanism of heterosis superior-to-parents(or superior-to-parents heterosis,SPH)was explored and might be explained by a complementary locus-set composed of OD+(showing positive over-dominance,most often),PD+(showing positive partial-to-complete dominance,less often)and HA+(showing positive homozygous additivity,occasionally)loci,depending on the parental materials.Any locus-type,whether OD+,PD+and HA+,could be the best genotype of a locus.All hybrids showed various numbers of better or best genotypes at many but not necessarily all loci,indicating further SPH improvement.Based on the additive/dominance QTL-allele matrices,the best hybrid genotype was predicted,and a hybrid improvement approach is suggested.PLSRGA is powerful for hybrid QTL-allele detection and cross-SPH improvement.

In crop production,hybrids and pure lines are the basic types of crop cultivar.Plant breeding is a procedure aimed at combining complementary alleles from adapted parental materials or transferring alleles from specific donors into adapted genotypes to generate composite individuals genetically improved in productivity,quality,and resistance or tolerance to biotic or abiotic stresses.For pure-line cultivars,additive and additive-by-additive epistatic effects of elite alleles are used,while for hybrid cultivars,additive,dominance and the corresponding epistatic effects among alleles are involved.

Cross design and progeny selection are the two keys to crop breeding.A sizable population of parental materials provides potential choices for optimizing cross design,which is usually based on phenotypic information in conventional breeding plans[1].As genomic marker has become increasingly available,marker-assisted selection(MAS)provides a promising approach to direct genotypic selection,making the progeny selection more efficient in plant breeding.Peleman and van der Voort[2]introduced the concept of‘‘breeding by design”based on functional genes and quantitative trait loci(QTL)aimed at choosing parents for potential crosses.Therefore,a relatively thorough detection of the whole-genome QTL-allele constitution is an essential requirement for MAS and‘‘breeding by design”.

QTL mapping with linkage analysis is usually conducted based on segregating populations such as F2,BC(backcross),RIL(recombinant inbred lines),etc.However,in these biparental genetic populations,the genetic basis is relatively narrow,limiting recombination and resulting in limited mapping resolution and efficiency,given that only two alleles per locus are involved.In contrast,in natural populations,multiple alleles per locus have accumulated during their history[3].Accordingly,multi-parent advanced-generation intercross(MAGIC)populations were developed[4,5]for mapping QTL with multiple alleles.Then,association mapping was shown[6–8]to be a promising approach for mapping QTL with multiple alleles using historical linkage disequilibrium between molecular markers and QTL or genes in natural populations or germplasm panels.

For hybrid cultivars,extensive studies have focused on investigation of the mechanism and utilization of hybrid vigor or heterosis.Because hybrid vigor involves both additive and dominance genetic effects[9–11],the genetic dissection of hybrid cultivars involves the contribution of not only additive but dominance and even epistatic effects[9,12].However,previous QTL or gene mapping approaches(both linkage and association mapping procedures)used with natural and biparental populations[13–16]are not directly usable for dissecting the genetic basis of hybrids.An approach for mapping QTL in hybrid populations(hybrid QTL)with multiple alleles per locus and estimating multiple allele effects needs to be established with a mapping population and corresponding mapping procedure.Because hybrid vigor is greatest in the F1generation and decays with generation advance[17,18],a diallel cross[19]F1derived from a group of parents should be the ideal population for mapping hybrid QTL expressing dominance,additive,and corresponding epistatic effects.

To dissect the genetic components of hybrids,Guan[20]proposed an approach for diallel crosses,designated as‘‘locus-group analysis”based on the concept of differential loci among a group of parental materials[21].This method was a pure statistical approach not involving molecular markers and allowed inference of only the number and effects of hybrid QTL,leaving their genomic positions unknown.Li et al.[22]used a biparental mapping population designated as immortalized F2,to which linkage mapping procedures(composite interval mapping[13]and inclusive composite interval mapping[23])could be applied,to identify fiber quality QTL in upland cotton.In rice,Qu[24]developed a backcross RIL population with three parental lines in a NCII design[25]population that was used to dissect QTL for combining ability and heterosis of agronomic traits.In these studies,because the genetic population was derived mainly from biparental materials,the genetic analysis was limited to a biallelic model.In addition,these methods may also be affected by inbreeding depression,possibly obscuring loci influencing hybrid vigor.In hybrid QTL mapping,multiple hybrids or DHP with multiple alleles per locus are involved.Zhang[26]described a single-marker analysis(SMA)method in a diallel rice population,in which one-way ANOVA was fitted for testing the significance of each marker individually.Yang[27]used SMA to test SSR(simple sequence repeat,with multiple alleles)loci associated with F1yield in a diallel hybrid population of soybean.However,SMA can identify QTL individually but not multiple QTL simultaneously.In view of the resulting risk of false positives,multi-locus multi-allele models should be considered for hybrid QTL mapping.

Gai et al.[28]proposed that the detected QTL-alleles and their genetic effects can be organized into a QTL-allele matrix as a compact form of genetic basis of a population.The multiple QTL-allele matrices for target traits of germplasm(breeding material)populations can be established for prediction of optimal crosses.He et al.[6]established the RTM-GWAS(restricted two-stage multi-locus genome-wide association study)procedure to detect QTL with multiple alleles.Using this procedure,QTL-allele matrices[28]were established for germplasm(natural)populations and optimal crosses were designed in breeding for pure-line cultivars.

We wished to develop a procedure to detect hybrid QTL,to establish QTL-allele matrices,not only of additive effect but of dominance effects,to predict optimal hybrids,and based on them to design breeding plans for the improvement of the parental materials of the predicted optimal crosses and thereby their hybrids.To achieve our goal,the key point was to establish a procedure for identifying all QTL along with their allele effects,both additive and dominance effects.

In using linear regression to detect QTL or marker-trait associations,the number of observations is usually much smaller than the number of possible QTL alleles with their corresponding parameters,especially in diallel cross experiments,resulting in an underdetermined multi-locus model.An appropriate variable selection procedure is required to find the best multi-locus model from the huge model space of dense genome-wide markers.Therefore,the present study was aimed at developing a hybrid QTLallele detection procedure under a multi-locus model,verifying its feasibility and effectiveness using simulation studies,and then demonstrating its efficiency in genetic dissection of hybrid vigor in terms of QTL-allele matrix.We proposed the partial least squares regression(PLSR)[29]integrated with a genetic algorithm(GA)[30]in this study for identifying an optimized multi-locus model with the corresponding parameters estimated under reduced matrix rank.From PLSR the corresponding genetic parameters of the candidate QTL or markers can be estimated[31–34],while the optimized multi-locus model(the candidate markers or QTL)is determined from a model selection procedure GA[30,35,36].

2.1.Design and establishment of the PLSRGA(partial least squares regression via genetic algorithm)procedure for mapping QTL in hybrids

2.1.1.General idea and linear model of a diallel cross population

The basic assumptions for genetic model establishment are:(i)diploid segregation,(ii)homozygous parental materials,(iii)additive–dominance genetic model(assuming no epistasis for simplification),and(iv)absence of reciprocal or maternal effects.Assuming that a molecular marker is closely associated with a genetic locus,it can be treated as a QTL for the trait.For a diallel cross population derived from n parents,let A1,A2,...,Aprepresent the p(p≤n)alleles at a locus with additive effects and frequencies of a1,a2,...,apand f1,f2,...,fp,respectively.If the ith(i=1,2,...,n)parent in a diallel cross has genotype ArArand the jth(j=1,2,...,n)parent has genotype AsAs,then the genotype of the F1individual derived from the ithand jthparents is ArAs.Its genetic effect can be defined as.

where drsrepresents the dominance effect of alleles Arand As.A weighted sum-to-zero constraint for additive effects is assumed as∑

kfkak=0.Accordingly,the total genetic effect for multiple loci is the sum of the genetic effect of each locus.

In a diallel cross population,the linear model for a quantitative trait is written as follows:

where yijis the phenotypic value of the F1hybrid derived from the ithand jthparents;μ is the population mean;K is the total number of loci;akis a Lk×1 matrix containing the additive effects of the kthlocus,Lkis the number of alleles at the kthlocus,and Lk≥2;dkis a Pk×1 matrix containing the dominance effects of pairs of alleles at the kthlocus,Pk=Lk×(Lk-1)/2;Xijkis a 1×Lkdesign matrix of additive effects for the kthlocus;Zijkis a 1×Pkdesign matrix of dominance effects for the kthlocus,and its elements are all 0 for homozygous genotypes.

Based on the genetic model(1),the design matrices of additive and dominance effects are determined by the locus genotype.Taking a locus with three alleles(A1,A2,A3)as an example,the additive design matrix for homozygous genotype A1A1is Xijk=[1,0,0],for A2A2is Xijk=[0,1,0],and for A3A3is Xijk=[0,0,1].The design matrix for heterozygous genotype A1A2is Xijk=[1/2,1/2,0](for additive effect)and Zijk=[1,0,0](for dominance effect),for genotype A1A3is Xijk=[1/2,0,1/2]and Zijk=[0,1,0],and for genotype A2A3is Xijk=[0,1/2,1/2]and Zijk=[0,0,1].

Model(2)can be written further in matrix form:

where y is a column vector of length n(n-1)/2 containing phenotype observations;b is a row vector containing the overall mean and additive and dominance effects;X is the design matrix for the overall mean and additive and dominance effects;e is a column vector of random errors.

2.1.2.Parameter estimation with PLSR

As the number of observations in diallel cross is generally less than the number of parameters to be estimated in multi-locus model(3),ordinary least-squares regression cannot be readily used for parameter estimation.In the present study,the PLSR was used and only one response variable(y)was modeled,corresponding to univariate PLSR.The basic idea of PLSR is that y and X are modeled indirectly by a set of latent components(columns of matrix T).

where T is a N×k matrix in which the k columns represent k latent components and N is the number of observations;P is a p×k matrix of X loadings,and q is a k×1 column vector of y loadings,where p is the number of columns of X;E is a N×p matrix of random error,and e is a N×1 column vector of random errors[29].

The main step of PLSR is to estimate latent components as a linear combination of X columns,i.e.,T=XW,by maximizing the covariance between the latent components and the response.W is a p×k matrix of weights to be estimated.In this study,the widely used SIMPLS(a straightforward implementation of a statistically inspired modification of the PLS method according to the simple concept)algorithm[37]was used by default for finding latent components.However,other PLSR algorithms can also be applied.Once T is estimated,q is obtained as the least-squares solution q=(T’T)-1T’y and then the regression coefficient is estimated as b=Wq[29].A key issue in PLSR is how to determine the number of latent components[38].In the present study,the widely used cross-validation(CV)method was used to test the predictive significance of each latent component.CV is performed by dividing the diallel cross population into V approximately equalsized sub-populations,and then one sub-population at a time is excluded for validation while the remaining data are used to build the model.Once the model is established,the predicted values of the excluded(validation)sub-population are calculated.The squared difference between the observed and predicted response is calculated and aggregated to give the predictive sum of squares(PRESS).This step is replicated V times so that each subset is excluded exactly once.This procedure is also called V-fold cross validation.The PRESS statistic for a model including h latent components is defined as.

The CV criterion for inclusion of the hthlatent component is defined as.

where RSSh-1represents the residual sum of squares of the model including h-1 latent components based on all the samples.When Q2≥1-(0.95)2=0.0975,the hthlatent component will be added into the model.

2.1.3.QTL detection using GA

With dense genome-wide markers,tight linkage between genetic loci and markers is assumed.The problem of detecting multiple QTL is simplified to finding a subset of markers,rather than a single marker,best predicting the phenotype.The selection status of each marker can be represented by 1(in the model)or 0(not in the model).Thus,the final multi-locus model in term of selection status of all markers is a vector consisting of binary elements 1 and 0,and there are 2mpossible models,where m is the number of markers.In the present study,GA was used for model selection.In GA,a possible model in term of selection status of all markers(a vector of 1 and 0)is treated as a haploid individual,and the selection status of each marker is treated as a biallelic genetic locus(allele of 1 or 0).The performance of a model is treated as the fitness of the individual,and the whole GA process as a population contains a set of individuals(models)with differing fitness values.GA starts with a randomly defined population,after which the population evolves for a number of generations under selection for fitness.In each generation,progeny are derived from pairs of appropriate individuals(models)by recombination and mutation.The final model is selected as the individual with best fitness in the final generation(Fig.S1).

In each generation of GA,three major operations are involved.(i)Selection.The root mean square error(RMSE)was used to measure the fitness of an individual(a model),and lower RMSE indicates better fitness.The RMSE is defined as the square root of the mean square error in a regression model.Once the RMSE of each GA individual(model)is estimated,half of the individuals with lowest RMSE are chosen to survive and used to generate the next generation.(ii)Reproduction(crossover,recombination).Two individuals are selected randomly from survivals to produce a new individual(model)via meiosis where a crossover position is simulated randomly.(iii)Mutation.Random mutation is applied to each individual(model)to avoid local optima.A small mutation rate is applied to children to randomly change the status of one locus(1 to 0 or 0 to 1).

The processes of selection,reproduction,and mutation are repeated until the number of new individuals equal the number of individuals in the survival pool,so that the population size(S)in GA remains unchanged.In the present study,the initial population size was set equal to the number of markers(m)when it was even,or m+1 when it was odd,given that half of the individuals of each current generation are chosen to survive and used to generate the next generation[39].

GA is considered to have converged if most of the individuals in the last generation are identical.In the present study,average entropy was used as a convergence criterion[39].Let fikdenote the status(taking a value of 1 for in the model,0 for not in the model)of the kthmarker(locus)for the ithindividual(model),anddenotes the value at the kthmarker averaged over S individuals.If all S individuals are identical in a single GA generation at the kthmarker,thentaking a value of 1 indicates that the marker is selected in the model,while 0 indicates that the marker is unselected.The mean entropy of a collection of S individuals is calculated as.

As GA is mainly a stochastic algorithm,in order to obtain robust and definitive solutions with the method,it is suggested by Leardi[40]that in practice at least five runs should be performed for each data set.

2.1.4.The computer software and major steps of PLSRGA

According to the above procedure,an R[41]package PLSRGA was developed for hybrid QTL-allele detection in both diallel cross design and North Carolina II(NCII)mating design.It consists of the following four steps:

Step 1:According to(1)and(2),the design matrices of additive and dominance effects of each polymorphic locus are constructed.Step 2:According to(3)and step 1,for marker/QTL identification,the initial GA population is generated randomly with the population size as the marker number and each marker selected at least once in the population.The GA individual in the population follows a regression model,in which the dependent variable is the phenotype and the independent variable is the design matrix.(i)The fitness of each GA individual in turn is estimated by PLSR.(ii)The operation(crossover,mutation,and selection)of GA is performed iteratively on the GA population until the population size is equal to that of the initial population.(iii)The stopping criterion is estimated according to(1.5),and one generation of GA is completed.Procedures(i),(ii)and(iii)are repeated until the average stopping criterion of the last five generations is below the preset criterion(0.05 in the present study).Step 3:In parameter estimation,PLSR is used to estimate the additive effect,dominance effect,and contribution proportion of each QTL locus.Step 4:From the detected QTL with their allele effects,additive and dominance QTL-allele matrices are established for further genetic improvement plans of hybrids as well as their parental lines.

2.2.Model validation by simulation

To evaluate the power of the PLSRGA procedure,a F1hybrid population of a type II diallel cross[19],including 10 parents and their 45 possible crosses was simulated.If p alleles are present at a locus,then p(p+1)/2 genetic parameters(p additive effects and p(p-1)/2 dominance effects)are to be estimated.The alleles at each locus were generated randomly according to the corresponding preset allele numbers p(2≤p≤n)and allele frequencies,where n is the number of parents.Thus,the simulated parents carry distinct alleles when the number of alleles at a locus equals the number of parents,and some parents may carry the same allele when p<n.

QTL effects were simulated by specifying the contributions of additive and dominance variance to the total phenotypic variance at each locus.The genetic variances were specified with reference to a randomly mating population from which the 10 inbred founder parents had been extracted.In simulation experiments,the total genetic variances of the QTL in the reference population contributed 80 % of the phenotypic variance,leaving a residual error variance of 20 %.The genetic effects were generated as follows:for each QTL,the additive effects,denoted as ai(i=1,2,...,p)corresponding to the ithallele carried by one of the founder parents,were sampled from a standard normal distribution.The effects were standardized to have mean zero and variance equal to half of the specified one for the locus.Thus,a QTL with those alleles at frequencies fiwere generated with the specified additive variance.The QTL variances in the simulation were continuously fixed from one simulation to the next and equal to the specified variances.For each QTL,the dominance effects,denoted as dij(=1,2,...,p)and dij=dji,were sampled from a standard normal distribution.Each effect corresponded to the difference between the value conferred by a given diploid genotype and the value predicted by the additive effects of the genotype’s two alleles.These effects were standardized so that∑jdij=0 for any j andfor any i,and the variance of dijwas equal to that specified for the QTL.The consequence of this sampling scheme is that the variance of the dii,called the variance of homozygous dominance deviations,is expected to be equal to the variance of the dijas a whole,and the covariance between additive effects and homozygous dominance deviations is expected to be zero.After the genetic effects had been sampled,the genotypic value of each progeny was generated by first simulating the progeny’s QTL genotype and then summing all the genetic effects associated with that genotype.

Finally,500 markers and 10 unlinked QTL were simulated based on this scheme.As SMA performs an association test on a singlemarker basis,the cumulative contribution of the detected QTL may be inflated obviously due to the correlations among neighboring loci.Accordingly,to make SMA comparable with PLSRGA,only unlinked QTL were considered in the simulation study.All the markers were multi-allelic and informative in all crosses.The simulation procedure was performed using two plans,one repeated 1000 times and the other only once.All the 55 entries,including the 10 parents and their 45 F1s,comprised the population following the type II diallel cross design[19,41].

2.3.A case study of a hybrid QTL-allele system for genetic improvement of hybrid vigor of soybean yield

2.3.1.Materials and field experiments

The data set of our previous diallel cross experiment[27,42]in soybean(Glycine max)was used.In that experiment,eight soybean cultivars(seven from China and one from the USA,Table S1)were crossed with one another to form a type II diallel cross[19]composed of 28 crosses and 8 parents.These materials were tested in a randomized complete block experiment,with one-row plots,4 m long with 0.4 m row spacing,in 3 replications at Huaiyin Institute of Agricultural Sciences(33°38′N,119°09′E),Jiangsu province,China,in summer,2002–2004.In each plot,the cultivar Zhongdou 20 was sown in the flanking rows as a guard row in a plot to maintain a uniform environment for the hybrids.At maturity,the seeds of individual genotypes were harvested and dried to constant moisture for yield measurement[42].

2.3.2.SNP genotyping and SNPLDB construction

Restriction site-associated DNA sequencing was used for single nucleotide polymorphism(SNP)marker genotyping of the eight parents along with a Chinese germplasm population of 1024 soybean accessions.Sequencing was performed at BGI Tech,Shenzhen,Guangdong,China.A total of 100,904 SNPs(Table S2)was identified for the eight parents.For the diallel cross population,there could be multiple alleles at each locus.To match the requirement,a marker type with multiple haplotypes should be used.According to Meng et al.[16],He et al.[6]and Li et al.[43],the multi-allelic marker-type termed SNPLDB(SNP linkage disequilibrium block)was used as a genomic marker for soybean germplasm population.Although SNPs are usually distributed unevenly in the genome,the tight and loose linkage between neighboring SNPs leads to a blocklike structure of genomic sequence that may be transmitted together to offspring.The different combinations of linked SNPs in a block were assigned as multiple haplotypes or alleles.Because the material used in the present study was limited,SNPs were grouped into SNPLDBs following the pattern of the 1024 germplasm accessions,revealing 18,512 SNPLDBs in the eight parents.

2.3.3.QTL mapping and establishment of additive and dominance QTLallele matrices

Analysis of variance was performed for the yield dataset of the diallel hybrid population(36 entries,3 replicates in 3 years)using the SAS/STAT 9.4 software(SAS Institute Inc.,Cary,NC,USA)to test differences among genotypes and genotype-by-environment interactions.Then the hybrid QTL with their alleles were detected using the PLSRGA package based on SNPLDBs.The additive and dominance genetic effects of the alleles were estimated and then organized into the respective additive-effect and dominance-effect QTL-allele matrices.The QTL-allele matrices were used for developing efficient breeding plans,including choices of potential crosses and improvement plans of the hybrid parents.

2.3.4.Candidate gene annotation

The candidate genes for the detected QTL were annotated using the following procedures.(i)Candidate gene(s)in a QTL region were found by comparison of the sequences with those in SoyBase(https://soybase.org).(ii)Annotated genes within the detected QTL,or its flanking SNPLDBs if no gene was located in the QTL,were retrieved.(iii)Candidate genes from the annotated genes were identified using a chi-square test of association between the SNPs in SNPLDBs and those in annotated genes at the significance level of 0.05.A candidate gene was accepted when the SNPs in a SNPLDB were significantly associated with those in the candidate gene.

2.4.Data availability

The genotype and phenotype data are available in the Supplemental Materials.The software code of PLSRGA procedure is available at https://github.com/njau-sri/PLSRGA.

3.1.Simulation results confirmed the superiority of the PLSRGA procedure over the SMA procedure

In SMA results from 1000 simulation runs,71 loci with 400 alleles and 1476 genetic effects were detected at P<0.05,including the preset 10 loci with their 55 alleles and 186 genetic effects.The detection proportions among 1000 replications of the 10 preset loci were all>0.900 except for one locus(0.060 for M160).The detection proportions of the 61 false positives ranged from 0.530 to 0.090(Table S3).All 71 loci contributed 36.09 times the phenotypic variance(PV,or 3609%)whereas the preset 10 loci contributed 4.85 times the PV(485%),far exceeding the heritability of 80.00%.If only a single simulation(imitating a real experiment)was considered,39 loci were detected with only 9 preset loci included and the total contributions were respectively 18.96(1896%)and 4.38(438%).Thus,SMA yielded an extremely high false positive rate even though the preset loci were detected.In PLSRGA results from 1000 simulations,13 loci with 78 alleles and 286 genetic effects were detected,covering the preset 10 loci with 55 alleles and 186 genetic effects.The detection proportions among 1000 replications were 0.874–0.987 for the 10 preset loci and 0.137–0.159 for the three false-positive loci.The mean false-positive rate was 4.54% with 95% confidence interval 0–9.09%.All the 13 loci contributed 0.82(82%)to PV,slightly more than the preset heritability value of 0.80(or 80%).If only a single simulation was considered,11 loci with 62 alleles and 214 genetic effects were detected,covering all the 10 preset loci with 55 alleles and 186 genetic effects,and the total contribution was 0.80(80%),including 0.79(79%)for the 10 preset loci.All the genetic parameters of the 10 loci were estimated completely and the preset and estimated genetic contributions were approximately equal(Table 1).Thus,the simulation results demonstrated the superiority of the PLSRGA procedure over the SMA procedure,even using a single simulation,which is equivalent to a real experiment.

3.2.Detection of yield QTL-allele system in the diallel hybrid population using PLSRGA

According to Yang and Gai[42],the yields of the F1hybrids varied from 1740.0(P2×P6)to 2857.2 kg ha-1(P6×P7)on average over three environments,with an overall mean of 2220.6 kg ha-1(Tables S4,S5).The top three crosses among the 28 crosses were P6×P7,P1×P7and P3×P5,and the yields were 2857.2 kg ha-1,2749.7 kg ha-1and 2676.6 kg ha-1,with the heterosis values superior-to-parents(or superior-to-parents heterosis,SPH)of 43.47%,23.02% and 25.60%,respectively(Table S5).The ANOVA results(Table S6)showed significant differences(P<0.01)in yield among crosses(genotypes)and genotype-by-environment interaction(GEI),while the variance among F1lines was 2.68 times greater than that of GEI[42].

In the following text describing genetic analysis of hybrid yield,the unit for all the allele effects(kg ha-1)is omitted for simplicity.

Using PLSRGA with a QTL×environment interaction(QEI)model,31 QTL with 154 alleles were identified(Table 2;Fig.1A).Of these,28 QTL with 138(82 negative plus 56 positive)alleles were main-effect QTL,and 9 QTL with 46(26 negative plus 15 positive)alleles were QEI QTL.Of the 9 QEI QTL,6 were shared and 3 not shared with main-effect QTL.The genetic variance of the QTL was separated into additive and dominance variances,(1020.33 vs 2427.34),indicating that dominance variance was the major contributor to the hybrid population(Table 2).The QTL-allele results are summarized in Table 3.The 28 main-effect QTL contributed 58.04% to phenotypic variance(PV),including 16 large-contribution(≥2%)QTL accounting for 43.53% PV and 12 small-contribution(<2%)QTL accounting for 14.51% PV.The 9 QEI QTL contributed 21.43% PV,including 5 large-contribution QTL accounting for 16.02% PV and 4 small-contribution QTL accounting for 5.41%PV.The trait heritability was 62.69%for main effect and 35.65% for GEI.Thus,4.65%(62.69%–58.04%)was the genetic variation due to the unmapped minor QTL and 14.22%(35.65%–21.43%)was the genetic variation due to the unmapped minor QEI QTL.This finding implied that more QTL might be separated from the collective unmapped minor QTL if experimental precision could be increased.All the 31 main-effect and QEI QTL were located on 13 chromosomes(Table 2;Fig.1A).

Because the environment factor‘‘Year”was a random factor whose exact conditions were undefined,the further analysis will be focused on the main-effect QTL system only,or in other words,on the hybrid-population genetic components under average year conditions rather than individual years.

3.3.Additive-and dominance-effect QTL-allele matrices of the diallel hybrid population

From PLSRGA results for the main-effect QTL system,two sets of genetic effects(Tables S5,S6,S7,and S8):additive effect of eachallele at each locus and the dominance effect of a pair of alleles at each locus,were estimated and organized into respectively an additive-effect QTL-allele matrix(Fig.1B)and a dominanceeffect QTL-allele matrix(Fig.1C).A third matrix is the composition of the above two matrices;that is,the locus genotypic effect matrix of the population,including all parents and hybrids(Fig.1D).

Table 1 Identification of the scheduled QTL in the simulation experiment using PLSRGA approach.

Table 2 QTL for hybrid yield in the soybean DHP and corresponding information.

Table 3 Summary of the detected significant QTL-allele system influencing hybrid yield in the diallel F1 population of soybean.

Fig.1.Positions of detected QTL and corresponding additive,dominance,and genotypic matrices of the diallel hybrid population.(A)Chromosome position of the identified QTL influencing soybean hybrid yield.The blue vertical bars represent SNPLDB marker positions(or intervals for SNPLDB markers composed of multiple SNPs)and red vertical bars show the detected hybrid yield QTL positions(or intervals for SNPLDB markers composed of multiple SNPs,and bars for single SNPs are made wider for clarity)in the diallel hybrid population.The detected 31 QTL are scattered over 13 chromosomes,omitting chromosomes 1,4,5,10,12,19 and 20,and 12 QTL lie on chromosome 15.(B)The QTL-allele additive effect matrix of the 8 parents in the diallel hybrid population.The additive loci are on the vertical axis and parents on the horizontal axis,with gray color denoting negative effect,blue denoting positive effect,and color intensity showing effect size.(C)The QTL-allele dominance effect matrix among the 8 parents in the diallel hybrid population.QTL are on the vertical axis and crosses on the horizontal axis;the colors are as in B.(D)The genotypic matrix of the diallel F1 population.Loci are on the vertical axis and hybrids(including parents)on the horizontal axis;the colors are as in B.

The additive effects of the 138 alleles of the 28 QTL varied from-50.0 to 16.9(Table 4).As shown in Fig.1B,parent P3carried the most alleles with positive additive effect and P1took second place.The alleles carried by parents P6and P8were mostly of negative additive effects.Among the 275 dominance effects for the identified 28 QTL,there were 143 positive and 132 negative dominance effects(Table S7).The dominance effects ranged from-43.64 to 41.04(or from-206.1 to 90.7 expressed in dominance degree,Table S8).The allele pair A1A2of qyGm17-28 had the highest dominance effect 41.0 among the 28 loci,while A2A4of qyGm14-13 had the smallest,-42.9(Table S7).

There are three categories of allele-pair effect:overdominance(OD),partial or complete dominance(PD),and homozygous additive(HA)and six subcategories of allele-pair effect,OD+,OD-,PD+,PD-,HA+,and HA-in the diallel hybrid population,if positive and negative effects are distinguished.Total numbers of loci with OD+,OD-,PD+,PD-,HA+and HAwere 238,186,104,124,17 and 115,respectively in the diallel hybrid population without parents.There were multiple allele pairs for a single locus under the multi-allelic model,among which the best allele pair was usually heterozygous but might occasionally be homozygous,such as the A1A1allele pair of the qygm08-9 locus(Table 4).The additive and dominance effect QTL-allele matrix may provide a basis for inferring the genetic structure of elite hybrids or the genetic mechanism of heterosis superior to parents.

Table 4 Additive and dominance effects of 28 main-effect yield QTL in the diallel cross population.

3.4.Genetic mechanism of superior-to-parents heterosis

Plant breeders are interested in hybrid cultivars with high SPH.Among the 28 crosses,26 crosses showed SPH(cross-SPH),indicating that crosses with SPH accounted for a large portion of the diallel hybrid population(Table S3).To elucidate the genetic mechanism of cross-SPH,the allele-pair effect types at each locus for each hybrid are summarized in Table 5.The mean numbers of OD+,OD-,PD+,PD-,HA+and HA-were 17.0,0.7,3.3,4.3,1.7,and 1.0 loci in the three best crosses with cross-SPH(P1×P7,P3×P5and P6×P7)and those were 3.0,15.0,0,2.5,2.5,and 5.0 loci in the two crosses without cross-SPH(P1×P4and P2×P5),respectively.Obviously,the crosses with strong cross-SPH had more OD+and fewer OD-,some more PD+and some fewer HA+and HA-loci.With respect to the mean number of loci with best allele pair in a cross,the numbers of best genotype with OD+at a locus(BGOD),best genotype with PD+at a locus(BGPD)and best genotype with HA+at a locus(BGHA)were respectively 7.3,0.7,and 0.3 in the three best crosses with strong cross-SPH and 1.0,0.5,and 0.5 in the crosses without cross-SPH.The crosses with best cross-SPH had many more BGOD loci than the crosses without cross-SPH.BGPD and BGHA contributed some phenotypic variation to cross-SPH but the contribution was not large(Table 5).

Among the 28 loci,four homozygous additive allele pairs exceeded the highest dominance effect at locus level:A1A1of qyGm08-9,A3A3of qyGm-15–17,A3A3of qyGm-15–18,and A3A3of qyGm-15–24.Their respective contrasts were 11.5 HA effect vs 8.2 OD effect,6.6 HA effect vs 6.2 OD effect,6.6 HA effect vs 6.2 OD effect,and 6.6 HA effect vs 6.2 OD effect,although these cases are rare in corresponding parental cross(Tables 4,S9).

From the above results,the genetic mechanism of cross-SPH is mainly OD+,followed by PD+and then HA+.Thus,the best hybrid should have the best genotype at all the loci,which may consist of the best combination of OD+,PD+,and HA+loci depending on the parental materials.However,as in the present soybean hybrid population,a best hybrid should maintain a high portion of OD+loci,while the PD+and HA+are also important for some loci.

3.5.Genetic design for the improvement of superior-to-parent heterosis of hybrids

The genetic structure of cross P6×P7(with the highest yield among the 28 crosses)is described as the allele composition at each locus(Table 6),from which the locus genotypic values along with their total genotypic value were obtained as 2535.6.The best genotype and the corresponding best genotypic value with additive and dominance effect for each locus are also listed(Table S9).This composite genotype is expected to be the best potential genotype in the genetic system of the eight parental materials,with its genotypic value of 2688.7(with unmapped minor QTL not yet included).The cross P6×P7was chosen as the best cross in the hybrid population because more OD loci were included,with the OD+:PD+:HA+locus composition being 20:7:1.Among them,12 loci were of the best genotypes for their corresponding loci,but 16 loci did not carry their best genotypes,so that P6×P7still has room for improvement(2688.7–2535.6=153.1)to reach the theoretically best cross genotype.

To achieve the best cross genotype(23 OD++1 PD++3 HA+,with as many as 3 best HA+),the approaches for improvement of cross P6×P7are shown in Table 6.For example,at locus qyGm02-2,the allele A3carried by parents P3 or P4 can be introduced into P6×P7,so that its allelic combination may be changed from A1A2to A2A3and its genotype value increased from 4.6 to 11.3.There are 16 loci with room for improvement,among them some having larger genetic potential(Table 6)that can be improved first.If all the locus-genotypes can be improved and the final hybrid genotype reaches the ideal one,the maximum genotypic value will be 2688.7 kg ha-1(omitting unmapped minor QTL).Thus,QTL-allele dissection of the diallel hybrid population has provided a way to establish the locus genotype and effect matrices of the diallel hybrid population,from which a genetic improvement plan can be designed for each of the crosses.It thus makes breeding by design possible for hybrids.

Table 5 Numbers of locus-effect types and of best locus-effect types in the hybrids.

Table 6 The genetic structure of the best hybrid P6/P7 and its improvement plan.

3.6.Candidate genes annotated

According to SoyBase(https://www.soybase.org),from the position of the 28 QTL of soybean hybrid yield,258 possible candidate genes were annotated.After chi-square testing for SNP association between the annotated genes and sequence segments in the parental materials,155 candidate genes at the 28 loci with 58.04%contribution to phenotypic variance were identified.GO enrichment analysis showed that 128 of these were associated with 44 GO groups with functions in biological processes,molecular functions and cellular components,including nucleus,cytoplasm,plasma membrane,mitochondrion,and others(Fig.S2),and the remaining 27 candidate genes were of unknown function.This result showed that candidate genes with differing functions worked together as an interrelated gene network contributing to yield hybrid vigor in soybean,its detailed genetic mechanism is for further studies.

4.1.Identification of hybrid QTL-allele constitution and superiority of PLSRGA over SMA approach

PLSRGA was established to identify QTL with their alleles in diallel hybrid population under a multi-locus multi-allele genetic model combining GA for marker selection and PLSR for parameter estimation.The procedure models all markers simultaneously in a single linear model by using PLSR to resolve the contradiction between a large number of parameters to be estimated and a small number of observations provided.Accordingly,the total contribution of the detected QTL was confined to the trait heritability value,avoiding the expansion of phenotypic contribution that occurs in SMA.This was demonstrated by the simulation experiments,which indicated that all the preset loci,genetic effects and contributions were identified properly,with false positives rare in 1000 simulation runs and even in a single simulation(imitating an experimental observation).This result was completely different from that of the SMA simulation experiment,where a large number of false positives appeared in both 1000 runs and a single simulation run,owing to inflation of the false positive rate and the estimated contribution proportion by interference among neighboring loci[44,45].The PLSRGA accurately detected all 10 preset loci and their alleles,with a low false positive rate,completely and exactly estimated the preset genetic parameters,and estimated contribution proportions close to the preset values.The reason for PLSRGA superior to SMA is that a multi-locus model in addition to PLSR with GA was used for PLSRGA,controlling neighbor influences and confining the total genetic contribution within the heritability value,avoiding the large numbers of false positives and overflowing heritability occurring in SMA without this control.

The superiority of the PLSRGA over the SMA method was further demonstrated by the experiment using real data.In the diallel hybrid population composed of eight soybean parents[27],there was a dataset composed of 36 entries×3 replications per year×3 years=324 plots,from which 28 main-effect QTL with 138 alleles and 9 QEI QTL with 46 alleles were identified.All the allele additive and dominance effects of the main-effect QTL were estimated.The total genetic contributions to PV for main-effect QTL and QEI QTL were estimated as 58.04% and 21.43%,close to but less than the corresponding heritability values of 62.69% and 35.65%.Using SMA,4600 QTL and 10,335 alleles with 131562%contribution to PV were estimated,extremely unreasonable figures(data omitted).The PVE of detected QTL in SMA may be inflated by the correlations among neighboring loci.This is a pitfall of SMA and may also cause false positives.It is more reasonable to explicitly include multiple loci in the statistical model[46].In this study,PLSRGA was based on a multi-locus model,and multiple QTL were fitted in a single model,such that individual QTL PVs are additive,and the total PVE will be<1(100%)and even less than the heritability.

4.2.Genetic mechanism of cross-SPH under multi-allelic model

Under the genetic model ignoring multiple alleles,the genetic mechanism of cross-SPH is due to OD+at some loci plus some complementary PD+and HA+among loci[12,42].However,under the multi-allele genetic model,a locus with OD+is not necessarily a best locus-genotype to contribute to cross-SPH because several heterozygous genotypes with OD+may be present at the same locus,with some contributing to cross-SPH and some not,and the case is similar for PD+and HA+.Among the allele pairs at a locus,plant breeders are most likely interested in those with higher or highest OD+,PD+,and HA+.In the diallel hybrid population,of the 728 loci of the 26 hybrids with cross-SPH,232 OD+,104 PD+,and 12 HA+,for a total of 348 or 47.8%of loci,contributed to cross-SPH,with 71 loci carrying the best allele pairs in respective locus which should be the key for cross-SPH in this population.Thus in the genetic mechanism of cross-SPH under the multi-allelic model,the genetic effect of OD+is the most important,followed by PD+and then HA+,in addition to complementation among loci.In fact,in the previous study[42],the function of HA+in cross-SPH was often neglected,whereas in the present study,taking the P6×P7hybrid as an example,4 loci of 28 loci could be further improved using homozygous HA+effects.Thus,the relative proportion of OD+,PD+,and HA+for the best hybrid depends on the group of parental materials,when a multiple-allele model is used in breeding programs.

4.3.Genetic dissection of diallel hybrid parents and implications for breeding by design for hybrids

By PLSRGA analysis,two genetic-effect matrices:an additiveand a dominance-effect matrix[28],were estimated,allowing the construction of a locus-genotypic effect matrix.The three matrices are compact genetic structures describing parental materials and hybrids,from which the best cross genotypes can be designed and their breeding or genetic improvement procedures can be designed.If the genetic dissection of a diallel hybrid population has been performed using genome-wide markers,the resulting genetic-effect matrices and the locus-genotypic effect matrix are expected to contain all the genome-wide genetic information,so that cross selection and design are performed at the genomewide level.Thus,the PLSRGA represents a novel genomic approach to‘‘breeding by design”for hybrid cultivar development.

Genetic dissection of a group of parental materials requires testing all possible crosses,as shown in the present study,where 28 hybrids plus 8 parents,for a total of 36 materials,were tested.If more parental materials are involved in a study,many more hybrids will have to be tested.Then the question is how to obtain the necessary genetic information while minimizing the expense of testing hybrids.One solution might be using a partial diallel cross design[47],and another might be using the previous results to establish the locus-genotypic effect matrix for designing potential crosses.Further knowledge,such as epistatic effects and more parental cross,should also be incorporated for hybrid improvement,given that in the present genetic model only additive and dominance effects were included.For exhausting all the genetic information,whether epistasis should be included and how important this additional information may be remaining to be addressed,even though our previous studies showed epistasis to be of less importance than additive and dominance effects[42].These should be our next targets for further improvement of hybrid QTL mapping technology.

CRediT authorship contribution statement

Jinshe Wang:Methodology,Software,Data curation,Formal analysis,Validation,Writing-original draft,Writing-review &editing.Jianbo He:Validation,Writing-review & editing.Jiayin Yang:Data curation,Investigation.Junyi Gai:Conceptualization,Methodology,Funding acquisition,Writing-review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2021YFF1001204,2017YFD0101500),the MOE Program of Introducing Talents of Discipline to Universities(‘‘111”Project,B08025),the MOE Program for Changjiang Scholars and Innovative Research Team in University(PCSIRT_17R55),the MARA CARS-04 Program,the Jiangsu Higher Education PAPD Program,the Fundamental Research Funds for the Central Universities(KYZZ201901),and the Jiangsu JCICMCP Program.

Appendix A.Supplementary data

Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.04.019.

推荐访问:allele System vigor
上一篇:郭予元院士在小麦害虫预测和治理上的学术贡献——纪念郭予元院士诞辰90周年
下一篇:《堂吉诃德》与“《三体》三部曲”比较研究

Copyright @ 2013 - 2018 优秀啊教育网 All Rights Reserved

优秀啊教育网 版权所有