We have generated three transcriptome datasets: one for Triticum urartu, one for Triticum turgidum, and a complementary set of published wheat transcripts that are not present in the T. turgidum set.
Transcripts
Assembled transcripts, including their 5' and 3' untranslated regions (UTRs). T. urartu: 86,247; T. turgidum: 140,118.
Open reading frames
Open reading frames (ORFs) start at the ATG start codon and end at the stop codon, unless they are truncated. Predicted pseudogenes have been excluded. T. urartu: 37,806; T. turgidum: 66,633.
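The ORF definition above can be illustrated with a short Python sketch. It is not the findorf procedure used to build these datasets (findorf draws on BLASTX evidence); it simply scans the three forward reading frames of a transcript for the longest ATG-initiated ORF and flags an ORF that reaches the 3' end without a stop codon as truncated. The function name `longest_orf` is hypothetical, and the sketch ignores the reverse strand, 5'-truncated ORFs that lack an ATG, and pseudogene filtering.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str) -> Optional[Tuple[int, int, bool]]:
    """Return (start, end, truncated) for the longest ATG-initiated ORF.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon
    when one is found. Returns None if no ATG exists in any forward frame.
    Hypothetical helper: forward strand only, no pseudogene filtering.
    """
    seq = transcript.upper()
    best: Optional[Tuple[int, int, bool]] = None

    def consider(candidate: Tuple[int, int, bool]) -> None:
        # Keep the longest ORF seen so far (first one wins on ties).
        nonlocal best
        if best is None or candidate[1] - candidate[0] > best[1] - best[0]:
            best = candidate

    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens the ORF
            elif start is not None and codon in STOP_CODONS:
                consider((start, i + 3, False))  # complete ORF, stop included
                start = None
        if start is not None:
            # Reached the 3' end without a stop codon: truncated ORF.
            consider((start, start + (len(seq) - start) // 3 * 3, True))
    return best
```

For example, `longest_orf("CCATGAAATTTTAAGG")` returns `(2, 14, False)`: an ATG in the third frame followed by two sense codons and a TAA stop.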
Proteins
Protein sequences translated from the ORFs. The protein datasets are listed when BLASTP is selected. Predicted pseudogenes have been excluded. T. urartu: 37,806; T. turgidum: 66,633.
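Translating an ORF into its protein amounts to reading its codons with the genetic code. A minimal sketch, assuming the standard nuclear genetic code and mapping ambiguous codons (for example, those containing N) to X; `CODON_TABLE` and `translate` are illustrative names, not part of any tool used to build these datasets.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon (*)."""
    seq = orf.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # unknown codons, e.g. with N, become X
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGAAATTTTAA")` returns `"MKF"`; the TAA stop codon ends translation and is not included in the protein.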
"Complementary" wheat transcribed sequences
Because our tetraploid transcriptome was assembled from a limited number of tissues and developmental stages, we generated an additional non-redundant set of sequences from:
- published wheat transcriptomes [1,2,3],
- full-length cDNA datasets [4], and
- re-assembled wheat expressed sequence tags (ESTs).
The initial non-redundant set included 146,300 contigs, available in the dataset "Published wheat transcripts".
Following BLASTX searches against databases of characterized plant proteins, we used findorf to predict 65,921 ORFs (>30 amino acids, pseudogenes excluded) within this dataset. We then used CD-HIT-2D to identify and remove ORFs that were already present in our T. turgidum dataset. The remaining 27,544 non-redundant ORFs are available in the "Complementary wheat ORFs" and "Complementary wheat proteins" datasets.
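The final filtering step can also be sketched in Python. This is a simplified stand-in, not CD-HIT-2D: CD-HIT-2D removes sequences whose identity to a reference sequence exceeds a chosen threshold, whereas the sketch below removes a candidate only when it occurs as an exact substring of a reference protein, so it discards fewer sequences. It also assumes the comparison is made on protein sequences; the function `complementary_proteins` and its `min_aa` parameter are hypothetical.

```python
from typing import Dict


def complementary_proteins(
    candidates: Dict[str, str], reference: Dict[str, str], min_aa: int = 30
) -> Dict[str, str]:
    """Keep candidates longer than `min_aa` residues that are not already
    contained, as exact substrings, in any reference protein.

    Hypothetical stand-in for CD-HIT-2D: exact containment only, no
    identity threshold and no alignment.
    """
    ref_seqs = [s.upper() for s in reference.values()]
    kept: Dict[str, str] = {}
    for name, prot in candidates.items():
        prot = prot.upper()
        if len(prot) <= min_aa:
            continue  # keep only ORFs of more than 30 amino acids
        if any(prot in ref for ref in ref_seqs):
            continue  # already represented in the reference (T. turgidum) set
        kept[name] = prot
    return kept
```

An all-against-all substring scan is quadratic in the number of sequences; CD-HIT-family tools use short-word filtering to make comparisons at this scale practical.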
Cautionary note: Some of the T. turgidum assembled transcripts and predicted ORFs are chimeras between A and B genome transcripts.
If you use this data in your publications, please cite:
Krasileva KV, Buffalo V, Bailey P, Pearce S, Soria M, Tabbita F, Uauy C, International Wheat Genome Sequencing Consortium, Dubcovsky J: Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biology 2013, in press.
References
1. Brenchley R, Spannagl M, Pfeifer M, Barker GL, D'Amore R, Allen AM, McKenzie N, Kramer M, Kerhornou A, Bolser D, Kay S, Waite D, Trick M, Bancroft I, Gu Y, Huo N, Luo MC, Sehgal S, Gill B, Kianian S, Anderson O, Kersey P, Dvorak J, McCombie WR, Hall A, Mayer KF, Edwards KJ, Bevan MW, Hall N: Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 2012, 491:705-710.
2. Schreiber AW, Hayden MJ, Forrest KL, Kong SL, Langridge P, Baumann U: Transcriptome-scale homoeolog-specific transcript assemblies of bread wheat. BMC Genomics 2012, 13:492.
3. Cantu D, Pearce SP, Distelfeld A, Christiansen MW, Uauy C, Akhunov E, Fahima T, Dubcovsky J: Effect of the down-regulation of the high Grain Protein Content (GPC) genes on the wheat transcriptome during monocarpic senescence. BMC Genomics 2011, 12:492.
4. Mochida K, Yoshida T, Sakurai T, Ogihara Y, Shinozaki K: TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. Plant Physiol 2009, 150:1135-1146.