SoyBase
Integrating Genetics and Genomics to Advance Soybean Research

Gene Names Derived by Sequence Similarity

When assigning gene function, and therefore gene names, to soybean genes based on sequence similarity to orthologs in other species, keep in mind that any particular gene in another species may be present in two or more copies in soybean, because the genomic history of soybean includes a whole-genome duplication.

Name assignments based on sequence similarity should be performed by reciprocal best-BLAST assignment. The last step in the name assignment should be a multiple sequence alignment of the soybean sequence with other known sequences. This alignment should be used to confirm that the required motifs and/or catalytic residues are present in the soybean protein sequence.
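The reciprocal best-BLAST criterion can be sketched in a few lines of Python. This is only an illustration, not SoyBase's pipeline: it assumes the two BLAST searches (soybean against the other species, and the reverse) have already been run and reduced to (query, subject, bit score) tuples, and every sequence identifier shown is hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    On a tied score the first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(soy_vs_other, other_vs_soy):
    """Return sorted (soybean_id, other_id) pairs that are each other's best hit."""
    forward = best_hits(soy_vs_other)
    reverse = best_hits(other_vs_soy)
    return sorted(
        (soy, other)
        for soy, other in forward.items()
        if reverse.get(other) == soy
    )


# Hypothetical identifiers and scores, for illustration only.
soy_vs_other = [
    ("soy_seq_A", "other_gene_X", 540.0),
    ("soy_seq_A", "other_gene_Y", 210.0),
    ("soy_seq_B", "other_gene_X", 515.0),
]
other_vs_soy = [
    ("other_gene_X", "soy_seq_A", 538.0),
    ("other_gene_X", "soy_seq_B", 512.0),
]

pairs = reciprocal_best_hits(soy_vs_other, other_vs_soy)
```

A strict reciprocal test keeps only one soybean sequence per gene in the other species (here `soy_seq_A`), so a second soybean copy such as `soy_seq_B` would still need to be evaluated separately, for example in the multiple sequence alignment step.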

In the event the soybean genome contains more than one copy of a sequence (paralogs), a dash-number will be arbitrarily assigned to each paralog to uniquely identify the sequence. For example, if two sequences in soybean are identified as similar by sequence to the Arabidopsis gene alcohol dehydrogenase 1 (Arabidopsis gene symbol ADH1), one would be named ADH1-1 and the other ADH1-2.
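The dash-number rule can be sketched as a small Python helper. The function name and the sequence identifiers are hypothetical. Because the numbering is arbitrary, this sketch simply numbers the paralogs in the order they are supplied.

```python
def name_paralogs(ortholog_symbol, soybean_sequence_ids):
    """Assign <symbol>-1, <symbol>-2, ... to soybean paralogs.

    The numbering is arbitrary; here it follows the input order.
    """
    return {
        seq_id: f"{ortholog_symbol}-{number}"
        for number, seq_id in enumerate(soybean_sequence_ids, start=1)
    }


# Hypothetical soybean sequence identifiers, for illustration only.
names = name_paralogs("ADH1", ["soy_seq_A", "soy_seq_B"])
```

With the two hypothetical sequences above, `names` maps `soy_seq_A` to `ADH1-1` and `soy_seq_B` to `ADH1-2`, matching the ADH1 example in the text.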

To identify the allele status of the sequence, an alphabetical symbol will be added to the gene name. Each soybean cultivar will be assigned a letter combination that uniquely identifies it. This letter combination will be appended to the “dash” number to designate the cultivar from which the sequence was derived. For example, the cultivar Williams82 has been assigned the letter designation “a”. Therefore, any gene identified in this cultivar will be assigned the “a” designator, e.g., ADH1-1a. A master list of cultivar-letter designators will be available at SoyBase. If your cultivar is not on the list, please email SoyBase so that a new designator for your cultivar can be determined and posted on the list.
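Appending the cultivar designator can be sketched in Python as follows. The lookup table is a stand-in for the SoyBase master list and contains only the one assignment stated above (Williams82 is “a”). The function name and the error handling are assumptions made for illustration.

```python
# Stand-in for the SoyBase master list; only this entry is given in the text.
CULTIVAR_DESIGNATORS = {"Williams82": "a"}


def allele_name(paralog_name, cultivar, designators=CULTIVAR_DESIGNATORS):
    """Append the cultivar's letter designator to a dash-numbered gene name."""
    if cultivar not in designators:
        # Unlisted cultivars need a designator assigned by SoyBase first.
        raise KeyError(f"no designator on the list for cultivar {cultivar!r}")
    return f"{paralog_name}{designators[cultivar]}"


full_name = allele_name("ADH1-1", "Williams82")
```

Raising an error for an unlisted cultivar, rather than inventing a letter, mirrors the instruction to request a new designator from SoyBase.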
