Genome Organization in Dicots: I. Genome Duplication in Arabidopsis and Synteny Between Soybean and Arabidopsis


David Grant1, Perry Cregan2 and Randy C. Shoemaker1
1USDA-ARS Corn Insect and Crop Genetics Research Unit, Department of Agronomy, Iowa State University, Ames, Iowa 50011
2USDA-ARS Soybean and Alfalfa Research Unit, BARC-West, Beltsville, MD 20705

Proc. Nat. Acad. Sci. (USA) ??:????-???? (2000)

ABSTRACT

Synteny between soybean and Arabidopsis was studied using conceptual translations of DNA sequences from loci which map to soybean linkage groups A2, J and L. Synteny was found between these linkage groups and all four of the Arabidopsis chromosomes where GenBank contained enough sequence for synteny to be confidently identified. Soybean linkage group A2 (soyA2) and Arabidopsis chromosome I showed significant synteny over almost their entire lengths, with only two to three chromosomal rearrangements required to bring the maps into substantial agreement. Smaller blocks of synteny were identified between soyA2 and Arabidopsis chromosomes IV and V (near the RPP5 and RPP8 genes) and between soyA2 and Arabidopsis chromosomes I and V (near the PhyA and PhyC genes). These subchromosomal syntenic regions were themselves homoeologous, suggesting that Arabidopsis has undergone a number of segmental duplications or possibly a complete genome duplication during its evolution. Homologies between the homoeologous soybean linkage groups J and L and Arabidopsis chromosomes II and IV also revealed evidence of segmental duplication in Arabidopsis. Further support for this hypothesis was provided by the observation of very close linkage in Arabidopsis of homologs of soybean Vsp27 and Bng181 (three locations) and purple acid phosphatase-like sequences and homologs of soybean A256 (five locations). Simulations show that the synteny and duplications we report are unlikely to have arisen by chance during our analysis of the homology reports.

Results of TBLASTX Analysis

For this study we sequenced 23 clones from which simple sequence repeat (SSR) markers on linkage group J were derived, together with all available RFLP probes that mapped to soybean linkage groups A2, L and J. These soybean linkage groups were chosen as being representative of a densely populated map (A2) and a pair of homoeologous linkage groups (J and L). The table shows the number of matches to Arabidopsis BACs identified by TBLASTX analysis for each probe.

Number of hits to Arabidopsis per RFLP probe

Linkage Group A2          Linkage Group J           Linkage Group L
RFLP probe  # of hits     RFLP probe  # of hits     RFLP probe  # of hits
A065        many          A060        2             A023        12
A085        many          A199        1             A071        12
A096        1             A204        1             A106        3
A110        many cDNAs    A233        1             A132-1      0
A111        0             A363        26            A132-3      9
A117        3             A450        3             A169        1
A136        0             A724        0             A264        0
A170        8             B032        2             A459        5
A256        11            B074        0             A461        0
A262        0             B101        7             A489        3
A486        0             B122        12            A537        1
A505        0             B166        6             B046        1
A510        0             G815        13            B124        1
A572        12            K375        4             B162        1
A638        4             K384        1             B164        1
A690        many          L050        many          B174        0
A975        0             mO109       2             K385        3
B132        4             BNG44       0             R001        13
K417        2             BNG63       0             R201        0
K636        2             BNG179      1             BNG71       7
P003        1             Sct_046     6             BNG88       2
R028        18                                      BNG95       3
T036        2
T153        2
BNG77       0
BNG121      6
BNG181      6
BNG205      1
BNG225      1
VSP27       5


Simulations

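The general algorithm we used can be summarized as follows: the Arabidopsis genome was divided into N bins, each homology was assigned to a bin at random, and a simulation was counted as a match if it reproduced the clustering we observed. The exact numbers we used for each simulation and the results we obtained are summarized below.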

Figure 1 - soyA2 is syntenic with arabI
N = 4 (we are considering synteny between whole chromosomes and 4 Arabidopsis chromosomes were analyzed)
There were 14 soybean sequences that had a total of 84 homologies somewhere in the Arabidopsis genome. Each of these loci had one homology on arabI. The loci and the number of matches each had were

Bng121    6        R028     18
Vsp27     5        T036      2
A638      4        Bng225    1
P003      1        A572     12
T153      2        Bng181    6
A096      1        A256     11
A117      3        I7

10,000 simulations were run. In 23 of them (0.0023) 1 of the 4 chromosomes contained all 14 homologies.
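
The procedure described for this figure can be sketched in Python as follows. This is a minimal sketch, not the original program: it assumes that each homology is dropped independently and uniformly into one of N bins, and that a simulation matches when at least one bin holds a homology of every locus. The function name `fraction_matching`, its parameters and the seed are illustrative. The match counts are read from the list above, with the final entry taken as 7 matches, which is also an assumption.

```python
import random

def fraction_matching(match_counts, n_bins, bins_required=1,
                      n_trials=10_000, seed=0):
    """Fraction of random placements in which at least `bins_required`
    bins hold one or more homologies of every locus."""
    rng = random.Random(seed)
    n_loci = len(match_counts)
    hits = 0
    for _ in range(n_trials):
        # loci_in_bin[b] = set of locus indices with a homology in bin b
        loci_in_bin = [set() for _ in range(n_bins)]
        for locus, count in enumerate(match_counts):
            for _ in range(count):
                loci_in_bin[rng.randrange(n_bins)].add(locus)
        full_bins = sum(1 for s in loci_in_bin if len(s) == n_loci)
        if full_bins >= bins_required:
            hits += 1
    return hits / n_trials

# Figure 1: 14 loci, N = 4 chromosomes (counts as listed; last one assumed).
FIG1_COUNTS = [6, 5, 4, 1, 2, 1, 3, 18, 2, 1, 12, 6, 11, 7]
if __name__ == "__main__":
    print(fraction_matching(FIG1_COUNTS, n_bins=4))
```

Under these assumptions the fraction should come out on the order of 0.002, in the same range as the value reported above, although the exact figure varies with the seed.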

Figure 2 - regions of arabIV, arabV and arabI are homoeologous
There were three soybean sequences that had a total of 37 homologies somewhere in the Arabidopsis genome. Each of them had one homology in the putatively duplicated regions on the three Arabidopsis chromosomes and two of them (R028 and Vsp27) had two homologies on arabV. In addition, each chromosomal region contained a phytochrome gene. The loci and the number of matches each had were

PhyX     5
Vsp27    5
A572    12
R028    18

The putatively homoeologous regions on arabI, IV and V had sizes of ~23 cM, ~23 cM and ~46 cM, respectively. The total size of the four Arabidopsis chromosomes was ~460 cM.

The simulated genome was divided into 20 bins (460/23=20). The 42 loci were randomly placed into these bins and this result saved. Adjacent pairs of bins were combined to yield 10 large bins (460/46=10) and these results saved.

A simulation was considered to match our results if all 6 loci were found in 1 of the 10 large bins AND there were 2 of the smaller bins which contained 1 copy of each of the 4 loci.

10,000 simulations were run; 360 of them (0.036) matched our results. There was also an R gene in each of the three homoeologous regions. This locus was not included in the simulation as it was not clear how many such loci should be randomly distributed in the simulated genome. However, intuitively it would seem that their presence in our results lends support to our conclusions.

Figure 4 - regions of arabII and arabIV are homoeologous
The putatively homoeologous regions were each approximately 10 cM. This results in N = 46 (460/10=46).

Three soybean sequences had a total of 34 homologies somewhere in the Arabidopsis genome. Each of these had one homology in each homoeologous region. A single BAC in each region contained one copy each of two of these three but this fact was not considered in the simulation. The loci and the number of matches each had were

A363      26
Sct_046    6
A060       2

10,000 simulations were run. In 19 of them (0.0019) 2 of the 46 bins contained all 3 homologies.
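
As a rough cross-check on this simulation, the probability can also be written in closed form. The sketch below assumes the model the text appears to use: every homology falls independently and uniformly into one of N bins, and a match means that two bins each hold at least one homology of all three loci. Because A060 has only two homologies, they must land in two different bins, and each other locus must then hit both of those bins; inclusion-exclusion gives the per-locus term. The function names are illustrative and do not come from the original analysis.

```python
def hits_both(k, n_bins):
    """P(a locus with k random homologies lands in both of two given bins)."""
    miss_one = (1 - 1 / n_bins) ** k     # misses one particular bin
    miss_both = (1 - 2 / n_bins) ** k    # misses both bins
    return 1 - 2 * miss_one + miss_both  # inclusion-exclusion

def p_two_full_bins(other_counts, n_bins):
    """P(two bins each hold every locus) when one locus has exactly 2 homologies."""
    p = 1 - 1 / n_bins  # the two homologies of the 2-copy locus fall in different bins
    for k in other_counts:
        p *= hits_both(k, n_bins)
    return p

# Figure 4: A363 (26 matches) and Sct_046 (6), with A060 as the 2-copy locus.
print(p_two_full_bins([26, 6], n_bins=46))
```

Under these assumptions the value comes out near 0.002, comparable to the 19 matches in 10,000 simulations reported above.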

Figure 5 - small regions have been duplicated in the Arabidopsis genome
There were 638 BACs in the four Arabidopsis chromosomes analyzed. The A256/PAP and Bng181/AP duplications were considered separately.

A256/PAP
There were five instances where these loci were linked in Arabidopsis with 1-7 BACs between them. Thus N = 91 (638/7=91). The loci and the number of matches each had were

A256    11
PAP      8

10,000 simulations were run. In 3 of them (0.0003) 5 of the 91 bins contained both homologies.

Bng181/AP
There were three instances where these loci were linked in Arabidopsis with 1-9 BACs between them. Thus N = 71 (638/9=71). The loci and the number of matches each had were

Bng181   6
AP       7

10,000 simulations were run. In 76 of them (0.0076) 3 of the 71 bins contained both homologies.
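
One way to see why the linked pairs in Figure 5 are unusual is to compare the observed counts with the number of bins expected to hold both loci by chance. The sketch below is an illustration under an assumed model (each homology placed independently and uniformly among N bins), not the simulation itself; `expected_shared_bins` is a hypothetical helper name.

```python
def expected_shared_bins(count_a, count_b, n_bins):
    """Expected number of bins holding at least one homology of each locus."""
    p_a = 1 - (1 - 1 / n_bins) ** count_a  # a given bin holds locus A
    p_b = 1 - (1 - 1 / n_bins) ** count_b  # a given bin holds locus B
    return n_bins * p_a * p_b

print(expected_shared_bins(11, 8, 91))  # A256 / PAP, 5 shared bins observed
print(expected_shared_bins(6, 7, 71))   # Bng181 / AP, 3 shared bins observed
```

Under this model both expectations are below one bin, against the five and three shared bins observed.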


