EST sequencing and phylogenetic analysis of the model grass Brachypodium distachyon

John P. Vogel1*, Yong Q. Gu1, Paul Twigg2, Gerard R. Lazo1, Debbie Laudencia-Chingcuanco1, Daniel M. Hayden1, Teresa J. Donze2, Lindsay A. Vivian2, Boryana Stamova1, and Devin Coleman-Derr1
1 USDA-ARS Western Regional Research Center, 800 Buchanan Street, Albany, CA 94710
2 University of Nebraska at Kearney, 905 W. 25th St., Kearney, NE 68849

* corresponding author
Theoretical and Applied Genetics (2006) 113:186-195

For more information, visit the bEST web site.

SUPPLEMENTAL MATERIALS

Of the 20,587 Brachypodium distachyon ESTs sequenced, 14,617 (71%) could be assigned GO Molecular Function terms, 13,770 (69%) GO Biological Process terms, and 11,898 (58%) GO Cellular Component terms. Together, these assignments accounted for 15,595 of the 20,587 ESTs.
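The three per-ontology counts sum to more than 15,595 because a single EST can carry terms from more than one ontology; the combined figure is the size of the union. A minimal sketch of that bookkeeping follows, together with the remainder implied by the reported totals. The EST identifiers and set contents are made up for illustration and are not data from the study.

```python
# Hypothetical EST identifiers; real assignments would come from GO annotation files.
molecular = {"est001", "est002", "est003"}
biological = {"est002", "est003", "est004"}
cellular = {"est003", "est005"}

# An EST counts once in the combined total, however many ontologies annotate it.
go_assigned = molecular | biological | cellular


def remaining(total, assigned):
    """Number of ESTs left without any GO assignment."""
    return total - assigned


def percent(part, total):
    """Share of the total as a percentage, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Figures reported above: 20,587 ESTs sequenced, 15,595 with at least one GO term.
TOTAL_ESTS = 20587
GO_ASSIGNED = 15595
unassigned = remaining(TOTAL_ESTS, GO_ASSIGNED)
print(len(go_assigned), unassigned, percent(GO_ASSIGNED, TOTAL_ESTS))
```

The remainder, 4,992, is the number of ESTs that were then searched against the NR database.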

ESTs not covered by the GO associations were searched against the NCBI non-redundant (NR) database (release 144) to approximate candidate gene assignments. BlastX was better for matches, but BlastN was better for gaining additional candidate assignments.
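One way to combine the two searches is to take the BlastX description where one exists and fall back to the BlastN description otherwise. The sketch below assumes the best hit per EST has already been extracted into dictionaries. The function name, EST identifiers, and hit descriptions are hypothetical, and the text does not state how the two result sets were actually merged.

```python
def merge_candidates(blastx_hits, blastn_hits):
    """Prefer the BlastX description; use BlastN only for ESTs BlastX missed."""
    merged = {}
    for est_id, description in blastx_hits.items():
        merged[est_id] = ("blastx", description)
    for est_id, description in blastn_hits.items():
        # setdefault leaves an existing BlastX assignment untouched.
        merged.setdefault(est_id, ("blastn", description))
    return merged


# Illustrative input only: EST id -> description of its best database hit.
blastx_hits = {"est101": "putative ribosomal protein L3"}
blastn_hits = {
    "est101": "Oryza sativa genomic DNA, chromosome 1",
    "est102": "Zea mays genomic clone",
}
candidates = merge_candidates(blastx_hits, blastn_hits)
print(candidates)
```

Recording which program supplied each description keeps the protein-level and nucleotide-level assignments distinguishable later.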

The descriptions of the matches to NR release 144 were then compared with the NCBI Clusters of Orthologous Groups (COG) Index (www.ncbi.nlm.nih.gov/COG/) to complete the descriptions. Of the 4,992 ESTs searched against the NR database, most matched the database and were placed, where possible, in the functional categories defined by the NCBI COG Index; 560 apparently had no matches, but even so, most of these did match sequences in plant EST databases.

The 5,548 sequences without GO assignments were extracted and assigned, by NR keyword matches, to the COG tables and miscellaneous categories. There were 550 that were not assigned by this method but did have matches to the NR database and, upon inspection, would fit into the general genome sequence class; only 10 sequences in the entire collection had no match to either the UniProt or the NR database.
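The keyword step can be sketched as a lookup from words in an NR hit description to a category code. The keyword table below is a small, hypothetical example, since the keyword lists actually used are not given here. The codes follow the COG letters and the X-prefixed miscellaneous classes in the table that follows, and the fallback to XG for hits that match no keyword is an assumption.

```python
from collections import Counter

# Hypothetical keyword table: (keyword, category code), checked in order.
KEYWORD_TO_CODE = [
    ("ribosomal", "J"),          # Translation, ribosomal structure and biogenesis
    ("transcription", "K"),      # Transcription
    ("tubulin", "Z"),            # Cytoskeleton
    ("storage", "XS"),           # Storage
    ("retrotransposon", "XT"),   # Repetitive
]


def assign_category(description):
    """Return a category code for an NR hit description ('X' if there is no hit)."""
    if not description:
        return "X"  # Other: no NR match
    text = description.lower()
    for keyword, code in KEYWORD_TO_CODE:
        if keyword in text:
            return code
    return "XG"  # General Genome: matched NR, but no keyword applied (assumed fallback)


# Illustrative hits only: EST id -> NR description ("" means no match).
nr_hits = {
    "est201": "60S ribosomal protein L3",
    "est202": "seed storage protein",
    "est203": "hypothetical protein",
    "est204": "",
}
tally = Counter(assign_category(desc) for desc in nr_hits.values())
print(tally)
```

Checking keywords in a fixed order makes the assignment deterministic when a description contains more than one keyword.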

Count  Code  Category

INFORMATION STORAGE AND PROCESSING (2.5%)
   85  J     Translation, ribosomal structure and biogenesis
    2  A     RNA processing and modification
   47  K     Transcription
    2  L     Replication, recombination and repair
    5  B     Chromatin structure and dynamics

CELLULAR PROCESSES AND SIGNALING (1.1%)
    4  D     Cell cycle control, cell division, chromosome partitioning
    .  Y     Nuclear structure
    3  V     Defense mechanisms
    8  T     Signal transduction mechanisms
    3  M     Cell wall/membrane/envelope biogenesis
    1  N     Cell motility
   17  Z     Cytoskeleton
    .  W     Extracellular structures
    3  U     Intracellular trafficking, secretion, and vesicular transport
   22  O     Posttranslational modification, protein turnover, chaperones

METABOLISM (2.4%)
   12  C     Energy production and conversion
   13  G     Carbohydrate transport and metabolism
   78  E     Amino acid transport and metabolism
    3  F     Nucleotide transport and metabolism
    .  H     Coenzyme transport and metabolism
    8  I     Lipid transport and metabolism
    9  P     Inorganic ion transport and metabolism
    8  Q     Secondary metabolites biosynthesis, transport and catabolism

POORLY CHARACTERIZED
   94  R     General function prediction only
    2  S     Function unknown

MISCELLANEOUS CATEGORIES OF INTEREST (82.2%)
    2  XS    Storage
    7  XT    Repetitive
   10        Sorghum Genome
  349        Zea mays Genome
 3107        Oryza sativa Genome
  111        Triticum spp. Genome
    1        Pennisetum spp. Genome
   94        Hordeum spp. Genome
   67        Saccharum spp. Genome
  154        Arabidopsis thaliana Genome
  397        Mus musculus Genome
  179        Homo sapiens Genome
   81  XG    General Genome

OTHER (10.1%)
  560  X     Other

 5548        Total
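The group percentages can be reproduced from the per-category counts and the total of 5,548. A short check for the first three groups follows, with counts copied from the table and "." entries treated as zero, which is an assumption about what the dot denotes.

```python
TOTAL = 5548

# Counts copied from the table; "." entries are treated as 0 (assumed).
groups = {
    "INFORMATION STORAGE AND PROCESSING": [85, 2, 47, 2, 5],
    "CELLULAR PROCESSES AND SIGNALING": [4, 0, 3, 8, 3, 1, 17, 0, 3, 22],
    "METABOLISM": [12, 13, 78, 3, 0, 8, 9, 8],
}


def group_percent(counts, total=TOTAL):
    """Group subtotal as a percentage of all sequences, to one decimal place."""
    return round(100.0 * sum(counts) / total, 1)


percentages = {name: group_percent(counts) for name, counts in groups.items()}
print(percentages)
```

The subtotals are 141, 61, and 131 sequences, which give 2.5%, 1.1%, and 2.4% of the 5,548 sequences.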