Genome Organization in Dicots: I. Genome Duplication in Arabidopsis and Synteny Between Soybean and Arabidopsis
David Grant^{1}, Perry Cregan^{2} and Randy C. Shoemaker^{1}


^{1}USDA-ARS Corn Insect and Crop Genetics Research Unit, Department of Agronomy, Iowa State University, Ames, Iowa 50011
^{2}USDA-ARS Soybean and Alfalfa Research Unit, BARC-West, Beltsville, MD 20705
Proc. Nat. Acad. Sci. (USA) ??:???????? (2000)
ABSTRACT
Synteny between soybean and Arabidopsis was studied using conceptual translations of DNA sequences from loci that map to soybean linkage groups A2, J and L. Synteny was found between these linkage groups and all four of the Arabidopsis chromosomes for which GenBank contained enough sequence for synteny to be confidently identified. Soybean linkage group A2 (soyA2) and Arabidopsis chromosome I showed significant synteny over almost their entire lengths, with only two to three chromosomal rearrangements required to bring the maps into substantial agreement. Smaller blocks of synteny were identified between soyA2 and Arabidopsis chromosomes IV and V (near the RPP5 and RPP8 genes) and between soyA2 and Arabidopsis chromosomes I and V (near the PhyA and PhyC genes). These subchromosomal syntenic regions were themselves homoeologous, suggesting that Arabidopsis has undergone a number of segmental duplications or possibly a complete genome duplication during its evolution. Homologies between the homoeologous soybean linkage groups J and L and Arabidopsis chromosomes II and IV also revealed evidence of segmental duplication in Arabidopsis. Further support for this hypothesis was provided by the observation of very close linkage in Arabidopsis of homologs of soybean Vsp27 and Bng181 (three locations) and of purple acid phosphatase-like sequences and homologs of soybean A256 (five locations). Simulations show that the synteny and duplications we report are unlikely to have arisen by chance during our analysis of the homology reports.
Results of TBLASTX Analysis
For this study we sequenced 23 clones from which simple sequence repeat (SSR) markers on linkage group J were derived, along with all available RFLP probes that mapped to soybean linkage groups A2, L, and J. These soybean linkage groups were chosen as being representative of a densely populated map (A2) and a pair of homoeologous linkage groups (J and L). The table shows the number of matches to Arabidopsis BACs identified by TBLASTX analysis for each probe.
Linkage Group A2                      Linkage Group J                       Linkage Group L
RFLP probe  # of hits to Arabidopsis  RFLP probe  # of hits to Arabidopsis  RFLP probe  # of hits to Arabidopsis
A065        many                      A060        2                         A023        12
A085        many                      A199        1                         A071        12
A096        1                         A204        1                         A106        3
A110        many cDNAs                A233        1                         A1321       0
A111        0                         A363        26                        A1323       9
A117        3                         A450        3                         A169        1
A136        0                         A724        0                         A264        0
A170        8                         B032        2                         A459        5
A256        11                        B074        0                         A461        0
A262        0                         B101        7                         A489        3
A486        0                         B122        12                        A537        1
A505        0                         B166        6                         B046        1
A510        0                         G815        13                        B124        1
A572        12                        K375        4                         B162        1
A638        4                         K384        1                         B164        1
A690        many                      L050        many                      B174        0
A975        0                         mO109       2                         K385        3
B132        4                         BNG44       0                         R001        13
K417        2                         BNG63       0                         R201        0
K636        2                         BNG179      1                         BNG71       7
P003        1                         Sct_046     6                         BNG88       2
R028        18                                                              BNG95       3
T036        2
T153        2
BNG77       0
BNG121      6
BNG181      6
BNG205      1
BNG225      1
VSP27       5
Simulations
The general algorithm we used can be summarized as follows:
 1. Divide a simulated Arabidopsis genome into N equally sized bins. The size of the bins, and therefore their number, depends on the exact result being tested.
 2. For each locus L being tested, randomly distribute H homologies into the bins, where H is the number of soybean:Arabidopsis homologies actually found for soybean sequence L.
 3. Determine the number of bins containing at least one homology for each locus. Do not consider the order of the homologies in each bin.
 4. Compare this number to the number actually observed.
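The steps above can be sketched in Python as follows. This is a minimal illustration, not the original program: the function name `fraction_matching`, its parameters, the fixed seed, and the use of uniform random placement through Python's `random` module are all assumptions made for this example.

```python
import random

def fraction_matching(hit_counts, n_bins, required_bins, n_trials=10_000, seed=0):
    """Estimate how often at least `required_bins` bins each receive
    at least one homology from every locus (hypothetical helper)."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_trials):
        # For each locus, the set of bins that received one or more of its H homologies.
        occupied = [{rng.randrange(n_bins) for _ in range(h)} for h in hit_counts]
        # A bin qualifies only if every locus placed a homology in it;
        # the order of homologies within a bin is ignored.
        shared = set(range(n_bins)).intersection(*occupied)
        if len(shared) >= required_bins:
            matches += 1
    return matches / n_trials
```

For a whole-chromosome comparison one would call it with `n_bins=4` and `required_bins=1`, passing the observed number of homologies for each locus as `hit_counts`.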
The exact numbers we used for each simulation and the results we obtained are summarized below.
Figure 1: soyA2 is syntenic with arabI
N = 4 (we are considering synteny between whole chromosomes and 4 Arabidopsis chromosomes were analyzed)
There were 14 soybean sequences that had a total of 84 homologies somewhere in the Arabidopsis genome. Each of these loci had one homology on arabI. The loci and the number of matches each had were
Bng121  6   R028  18 
Vsp27  5   T036  2 
A638  4   Bng225  1 
P003  1   A572  12 
T153  2   Bng181  6 
A096  1   A256  11 
A117  3   I  7 
10,000 simulations were run. In 23 of them (0.0023) 1 of the 4 chromosomes contained all 14 homologies.
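As a rough cross-check on a simulation of this kind, the chance that one specified chromosome receives at least one homology from every locus can be written in closed form, assuming each homology lands independently and uniformly in one of the N bins: for a locus with H homologies it is 1 - (1 - 1/N)^H, and the per-locus terms multiply. The sketch below is not part of the original analysis; the function name `p_bin_has_all` and the use of a union bound over the four chromosomes are assumptions introduced here.

```python
def p_bin_has_all(hit_counts, n_bins):
    """Probability that one specified bin gets >= 1 homology from every locus,
    assuming independent, uniform placement (hypothetical helper)."""
    p = 1.0
    for h in hit_counts:
        p *= 1.0 - (1.0 - 1.0 / n_bins) ** h
    return p

# Match counts for the 14 loci, in the order they are listed in the table above.
counts = [6, 18, 5, 2, 4, 1, 1, 12, 2, 6, 1, 11, 3, 7]

per_chromosome = p_bin_has_all(counts, 4)
# Union bound: an upper limit on the chance that any of the 4 chromosomes qualifies.
upper_bound = min(1.0, 4 * per_chromosome)
print(per_chromosome, upper_bound)
```

The printed upper bound can be compared with the simulated fraction reported above; the two need not agree exactly, since the bound ignores overlap between chromosomes and the simulation is subject to sampling noise.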
Figure 2: regions of arabIV, arabV and arabI are homoeologous
There were three soybean sequences that had a total of 37 homologies somewhere in the Arabidopsis genome. Each of them had one homology in the putatively duplicated regions on the three Arabidopsis chromosomes and two of them (R028 and Vsp27) had two homologies on arabV. In addition, each chromosomal region contained a phytochrome gene. The loci and the number of matches each had were
PhyX  5 
Vsp27  5 
A572  12 
R028  18 
The putatively homoeologous regions on arabI, IV and V had sizes of ~23 cM, ~23 cM and ~46 cM, respectively. The total size of the four Arabidopsis chromosomes was ~460 cM.
The simulated genome was divided into 20 bins (460/23=20). The 42 loci were randomly placed into these bins and this result saved. Adjacent pairs of bins were combined to yield 10 large bins (460/46=10) and these results saved.
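One way to obtain both bin resolutions from a single random placement is sketched below. It is only an illustration of the bookkeeping: the names `place_hits`, `GENOME_CM` and `SMALL_CM` are hypothetical, and pairing small bins 0 and 1, 2 and 3, and so on is an assumption about how adjacent bins were combined.

```python
import random
from collections import Counter

GENOME_CM = 460   # approximate total map length quoted in the text
SMALL_CM = 23     # size of the smaller putatively homoeologous regions

def place_hits(n_hits, rng):
    """Place n_hits at random and tally them at both resolutions."""
    n_small = GENOME_CM // SMALL_CM                 # 20 small bins
    small = [rng.randrange(n_small) for _ in range(n_hits)]
    # Combine adjacent pairs of small bins: 0-1 -> 0, 2-3 -> 1, ...
    large = [s // 2 for s in small]                 # 10 large bins
    return Counter(small), Counter(large)

small_bins, large_bins = place_hits(42, random.Random(0))
```

Because the large-bin tally is derived from the same placement, each large bin always holds exactly the sum of its two small bins.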
A simulation was considered to match our results if all 6 loci were found in 1 of the 10 large bins AND there were 2 of the smaller bins which contained 1 copy of each of the 4 loci.
10,000 simulations were run, and 360 of them (0.036) matched our results. There was also an R gene in each of the three homoeologous regions. This locus was not included in the simulation because it was not clear how many such loci should be randomly distributed in the simulated genome. Intuitively, however, the presence of these genes in all three regions lends further support to our conclusions.
Figure 4: regions of arabII and arabIV are homoeologous
The putatively homoeologous regions were each approximately 10 cM. This results in N = 46 (460/10=46).
Three soybean sequences had a total of 34 homologies somewhere in the Arabidopsis genome. Each of these had one homology in each homoeologous region. A single BAC in each region contained one copy each of two of these three but this fact was not considered in the simulation. The loci and the number of matches each had were
10,000 simulations were run. In 19 of them (0.0019) 2 of the 46 bins contained all 3 homologies.
Figure 5: small regions have been duplicated in the Arabidopsis genome
There were 638 BACs in the four Arabidopsis chromosomes analyzed. The A256/PAP and Bng181/AP duplications were considered separately.
A256/PAP
There were five instances where these loci were linked in Arabidopsis with 1 to 7 BACs between them. Thus N = 91 (638/7=91). The loci and the number of matches each had were
10,000 simulations were run. In 3 of them (0.0003) 5 of the 91 bins contained both homologies.
Bng181/AP
There were three instances where these loci were linked in Arabidopsis with 1 to 9 BACs between them. Thus N = 71 (638/9=71). The loci and the number of matches each had were
10,000 simulations were run. In 76 of them (0.0076) 3 of the 71 bins contained both homologies.
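The bin counts quoted for the simulations all follow from dividing the genome size by the size of the region being tested. The short sketch below reproduces that arithmetic; the helper name `n_bins` is hypothetical, and rounding to the nearest whole bin is an assumption, since the text gives only the quotients.

```python
def n_bins(genome_size, window):
    """Number of equally sized bins, rounded to the nearest whole bin (assumed)."""
    return round(genome_size / window)

print(n_bins(460, 23))  # 20 small bins (cM)
print(n_bins(460, 46))  # 10 large bins (cM)
print(n_bins(460, 10))  # 46 bins (cM)
print(n_bins(638, 7))   # 91 bins (BACs)
print(n_bins(638, 9))   # 71 bins (BACs)
```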