Information resources for wheat genomics
D.E. Matthews1, G.R. Lazo2, V.L. Carollo2 and O.D. Anderson2
1USDA-ARS, Cornell University, 409 Bradfield Hall, Ithaca, NY 14853, USA
2USDA-ARS Western Regional Research Center, 800 Buchanan St., Albany, CA 94710, USA
ABSTRACT
In the last few years there has been an explosion of information about the wheat genome, including the sequence of hundreds of thousands of expressed sequence tags (ESTs). This information is distributed among dozens of WWW sites, and finding it all can be difficult. The "Genomics" page on the GrainGenes website, http://wheat.pw.usda.gov/ggpages/genomics, provides the necessary pointers. Resources available include assemblies of the ESTs into groups putatively derived from the same gene (unigenes), a map of thousands of unigenes on deletion lines of Chinese Spring, alignments of wheat ESTs to the sequence of the rice genome, cooperative international projects to develop wheat SNPs and SSRs, an assembly of large DNA clones (BACs) into a physical map of the D genome, a database of repeat sequences from the Triticeae (TREP), and protocols developed for marker-assisted selection of important genes for pest resistance and quality (MASwheat).
INTRODUCTION
Here we give an overview of the major World Wide Web sites with data about wheat genomics. Only a few of these sites are focused specifically on wheat. Many of them provide useful cross-species comparisons with barley, rice, maize and other grasses. Some sites that deal only with other grasses are of interest for wheat genome research too.
RESULTS AND DISCUSSION
Project Websites
There are three sites that serve data derived from individual large research projects on wheat.
US EST project. http://wheat.pw.usda.gov/NSF is the site for an NSF project that has sequenced 90,000 ESTs from 50 cDNA libraries, and has mapped ESTs from 7,000 unigenes against a set of 101 Chinese Spring deletion lines. This website provides downloads of the sequences and assembled contigs. A companion site http://wheat.pw.usda.gov/wEST adds a searchable database including the map positions and functional annotation, as well as a BLAST server. For users interested specifically in the map data, the best page is http://wheat.pw.usda.gov/cgi-bin/westsql/map_locus.cgi.
The mapping in this project was performed using the cDNA clones as hybridization probes. Thus the actual sequence mapped by each clone is longer than the EST sequence derived from it. In many cases the EST is part of a longer contig. Both "Mapped ESTs" and "Mapped Contigs" are available for searching in the BLAST server.
UK EST project. http://www.cerealsdb.uk.net/wheat.htm serves data from a BBSRC project that has sequenced 26,000 ESTs from 35 cDNA libraries. The database can be searched with BLAST, by keywords in the annotation against known genes, or by "digital differential display" of contigs containing significantly more ESTs from one specified set of libraries than another. A microarray of 10,000 unigenes derived from these ESTs is available from this project.
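The idea behind a "digital differential display" comparison can be illustrated with a short sketch. This is not the CerealsDB implementation, which is not described here; it assumes that over-representation is judged with a one-sided Fisher's exact test on EST counts, and the function name and all counts are hypothetical.

```python
from math import comb

def fisher_one_sided(k_a, n_a, k_b, n_b):
    """Probability of seeing k_a or more of a contig's ESTs in library set A by chance.

    k_a, k_b: ESTs of the contig that come from library sets A and B
    n_a, n_b: total ESTs sequenced from library sets A and B
    Assumption: ESTs of the contig are drawn at random from the pooled
    libraries (hypergeometric null model).
    """
    k_total = k_a + k_b
    n_total = n_a + n_b
    denom = comb(n_total, k_total)
    upper = min(k_total, n_a)
    # Sum the hypergeometric tail from the observed count upward.
    tail = sum(comb(n_a, x) * comb(n_b, k_total - x)
               for x in range(k_a, upper + 1))
    return tail / denom
```

A contig with 12 ESTs from set A and 2 from set B, each set holding 1,000 ESTs, would be tested with fisher_one_sided(12, 1000, 2, 1000); a small value suggests the contig is over-represented in set A.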
Physical map of the D genome. http://wheat.pw.usda.gov/PhysicalMapping is the site for a U.S. NSF project that is building a map of overlapping BAC clones to cover the Aegilops tauschii genome. This physical map is being anchored to the genetic and deletion-line maps by hybridization of standard RFLP probes and mapped ESTs to the BAC clones. The database at http://wheatdb.ucdavis.edu:8080/wheatdb/Database can be searched and browsed for these markers, currently 600 of them.
Other EST projects. Yasunari Ogihara's 116,000 ESTs from 10 libraries have been integrated into the Komugi database, http://shigen.lab.nig.ac.jp/wheat/komugi. The data can be searched by library, annotated function, and BLAST. An additional 200,000 ESTs will be added after the 10th IWGS.
For barley, the new BarleyBase site has data about the Barley1 GeneChip for gene expression studies: http://www.barleybase.org.
SNPs and SSRs
Single nucleotide polymorphisms in wheat are now almost free for the asking because of the large number of EST sequences available. Two sites have done the bioinformatics to identify SNPs ripe for development.
CerealsDB at U. Bristol, http://www.cerealsdb.uk.net/discover.htm, has assembled 400,000 wheat ESTs and processed the resulting contigs with their AutoSNP software to identify polymorphisms that correlate with the germplasm source as opposed to sequencing errors. The database is searchable by BLAST or GenBank accession.
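The distinction between polymorphisms that follow the germplasm source and isolated sequencing errors can be sketched as follows. This is only an illustration of the principle, not the AutoSNP algorithm; the function name, the threshold of two reads per allele, and the line labels are assumptions. It also ignores the complication that a contig from hexaploid wheat may mix homoeologous gene copies.

```python
from collections import Counter, defaultdict

def candidate_snp(column, min_reads_per_allele=2):
    """Decide whether one aligned contig position looks like a real SNP.

    column: list of (germplasm_source, base) pairs, one per EST read
            covering the position.
    Returns True when at least two alleles are each supported by
    min_reads_per_allele reads and every source carries only one of them.
    """
    allele_counts = Counter(base for _, base in column)
    supported = {b for b, n in allele_counts.items()
                 if n >= min_reads_per_allele}
    if len(supported) < 2:
        return False  # a base seen once is treated as a sequencing error
    alleles_by_source = defaultdict(set)
    for source, base in column:
        if base in supported:
            alleles_by_source[source].add(base)
    # The polymorphism must correlate with the germplasm source.
    return all(len(alleles) == 1 for alleles in alleles_by_source.values())
```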
The Wheat SNP Development project, http://wheat.pw.usda.gov/ITMI/WheatSNP, has its own assembly of 400,000 ESTs. Only the raw contigs are available; detection of the SNPs is left to the user. The important thing about this site is that it serves as a coordination point for development of the SNPs into usable markers. If you are developing wheat SNPs, please consult this site to see who may already be working on the same contig, and please register yourself there so that other people don't repeat work you've already done. Even better, join the organized development project, testing against the same set of 31 wheat lines.
The Genoplante / INRA Wheat SSR Club, http://wheat.pw.usda.gov/ggpages/SSRclub, is a similar coordinated public development project for microsatellites. Volunteers are assigned a set of SSR-containing sequences to design and test primers, and to map the SSRs. Results are reported back and published on the website.
The Triticeae EST-SSR Coordination site, http://wheat.pw.usda.gov/ITMI/EST-SSR, is somewhat less coordinated in that no specific development assignments are made. However, it does provide information about which SSR-containing sequences have been mined from the public EST database and who is working on developing some of them, as well as reports of the results.
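Mining SSR-containing sequences from ESTs amounts to scanning each sequence for short tandem repeats. A minimal sketch is given below; it is not the procedure used by either project, and the function names and the minimum repeat numbers (6 for dinucleotides, 5 for trinucleotides, 4 for tetranucleotides) are assumptions chosen for illustration.

```python
import re

# Hypothetical thresholds: repeat unit length -> minimum number of repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def is_compound_of_shorter_unit(motif):
    """True for motifs such as 'AAA' or 'AGAG' that repeat a shorter unit."""
    n = len(motif)
    return any(motif == motif[:d] * (n // d)
               for d in range(1, n) if n % d == 0)

def find_ssrs(seq, min_repeats=DEFAULT_MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # One unit followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_of_shorter_unit(motif):
                continue  # homopolymer run, or already found as a shorter unit
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits
```

Start positions are 0-based. Sequences that yield hits would then go forward to primer design and testing.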
EST Unigene Sets
For many experimental purposes it is necessary to assemble the ca. 420,000 wheat ESTs into a smaller number of groups representing different putative genes. Several such unigene sets are available online. Besides the assemblies mentioned above for the US and UK EST projects and the two SNP projects, there are sets at the following sites.
- NCBI (23,000 unigenes), http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Ta
- TIGR (110,000), http://www.tigr.org/tdb/tgi/tagi
- PlantGDB (108,000), http://www.plantgdb.org/ESTCluster/progress.html
- HarvEST (38,000), http://harvest.ucr.edu
Different assemblies can give greatly different results, depending in large part on the software and parameters used. A prototype database to correlate contigs from different assemblies based on the ESTs they contain is AssembliesDB, http://www.graingenes.org/cgi-bin/ace/search/assemblies.
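Correlating contigs from different assemblies by the ESTs they contain can be sketched in a few lines. This is not the AssembliesDB code; the function name, data layout, and identifiers are hypothetical, and each assembly is assumed to be available as a mapping from contig name to the set of EST accessions it contains.

```python
from collections import defaultdict

def correlate_assemblies(assembly_a, assembly_b):
    """Count the ESTs shared by each pair of contigs from two assemblies.

    assembly_a, assembly_b: dict mapping contig ID -> set of EST accessions.
    Returns a dict mapping (contig_a, contig_b) -> number of shared ESTs;
    pairs with no ESTs in common are omitted.
    """
    # Index the second assembly by EST so each EST is looked up only once.
    est_to_b = defaultdict(set)
    for contig_b, ests in assembly_b.items():
        for est in ests:
            est_to_b[est].add(contig_b)

    shared = defaultdict(int)
    for contig_a, ests in assembly_a.items():
        for est in ests:
            for contig_b in est_to_b.get(est, ()):
                shared[(contig_a, contig_b)] += 1
    return dict(shared)
```

A contig in one assembly that shares ESTs with two or more contigs in the other shows where the assemblies split or merged a putative gene differently.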
Comparative Mapping vs. Rice and Other Grasses
The positions of wheat ESTs in the wheat deletion-line map described above have been correlated with the positions of corresponding BLAST similarities in the rice genome (Sorrells et al., 2003). The detailed results from this study -- which ESTs have BLAST hits in which rice BAC/PACs -- are available as Supplementary Data at http://wheat.pw.usda.gov/pubs/2003/Sorrells.
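One simple way to summarize such data is to tabulate, for every mapped EST with a rice hit, its wheat chromosome against the rice chromosome of the hit. The sketch below is not the analysis performed in the study; the function name, the data layout, and the toy identifiers are assumptions.

```python
from collections import Counter

def synteny_table(wheat_positions, rice_hits):
    """Cross-tabulate wheat map positions against rice BLAST hit locations.

    wheat_positions: dict mapping EST -> wheat chromosome (or deletion bin)
    rice_hits:       dict mapping EST -> rice chromosome of its best hit
    Returns a Counter keyed by (wheat_chromosome, rice_chromosome).
    ESTs without a rice hit are skipped.
    """
    table = Counter()
    for est, wheat_chrom in wheat_positions.items():
        rice_chrom = rice_hits.get(est)
        if rice_chrom is not None:
            table[(wheat_chrom, rice_chrom)] += 1
    return table
```

Cells with high counts point to chromosome regions that may be conserved between the two genomes.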
Alignments of wheat ESTs to rice BAC/PACs can be viewed and searched at
- Gramene, http://www.gramene.org/perl/SeqTable
- TIGR, http://www.tigr.org/tdb/tgi/tagi, and
- GrainGenes, http://wheat.pw.usda.gov/cgi-bin/gbrowse?source=japonica
The data at GrainGenes include only rice chromosomes 1 and 2, but these chromosomes have been assembled from their constituent BAC/PACs into full-length pseudomolecules.
Gramene's comparative map viewer, http://www.gramene.org/cmap/viewer, allows side by side comparison of genetic maps of wheat, barley, rice, oat, maize and sorghum, with common markers highlighted. NCBI has recently made a similar facility available, http://www.ncbi.nlm.nih.gov/mapview.
BACs and Repeat Sequences
A compilation of existing libraries of BACs and other large-insert clones from wheat and related species is maintained by Jorge Dubcovsky at http://agronomy.ucdavis.edu/Dubcovsky/BAC-library/ITMIbac/ITMIBAC.htm.
Results of hybridization of RFLP mapping probes to these libraries are available for Ae. tauschii BACs at http://wheat.pw.usda.gov/PhysicalMapping as mentioned above. Corresponding information about barley BACs is at Andy Kleinhofs' site http://barleygenomics.wsu.edu/databases/databases.html.
Although few wheat BACs have been sequenced yet, the results have produced considerable information about the repetitive sequence elements present in the genome. This information has been compiled and annotated, and is being maintained as the Triticeae repeat sequence database TREP (Wicker et al., 2002), which is online at http://wheat.pw.usda.gov/ITMI/Repeats.
Taking it to the Field
Finally, the MASwheat site http://maswheat.ucdavis.edu is a good example of an information resource on applied genomics. It provides specific protocols and results on the application of markers to assist selection of genes for pest resistance and quality in backcrosses to adapted cultivars and breeding lines. Protocols for 16 genes and QTLs are available already and another 24 are being developed.
Although the MASwheat site doesn't involve a huge database with a query interface, interactive graphics, etc., it shares one feature with all the other resources described above: it delivers necessary wheat genome information on demand via the Internet to those who need it and know where to find it.
Note
A copy of this article in HTML format, which saves typing in the URLs cited, is at http://wheat.pw.usda.gov/pubs/2003/Matthews.
ACKNOWLEDGEMENTS
This work was supported by the United States Department of Agriculture, Agricultural Research Service, CRIS #5325-21000-007-00D.
REFERENCES
Sorrells, M.E., La Rota, M., Bermudez-Kandianis, C.E., Greene, R.A., Kantety, R., Munkvold, J.D., Miftahudin, N., Mahmoud, A., Ma, X., Gustafson, P.J., Qi, L.L., Echalier, B., Gill, B.S., Matthews, D.E., Lazo, G.R., Chao, S., Anderson, O.D., Edwards, H., Linkiewicz, A.M., Dubcovsky, J., Akhunov, E.D., Dvorak, J., Zhang, D., Nguyen, H.T., Peng, J., Lapitan, N.L.V., Gonzalez-Hernandez, J.L., Anderson, J.A., Hossain, K., Kalavacharla, V., Kianian, S.F., Choi, D-W., Close, T.J., Dilbirligi, M., Gill, K.S., Steber, C., Walker-Simmons, M.K., McGuire, P.E. and Qualset, C.O. 2003. Comparative DNA sequence analysis of wheat and rice genomes. Genome Research, in press.
Wicker, T., Matthews, D.E. and Keller, B. 2002. TREP: a database for Triticeae repetitive elements. Trends in Plant Science 7: 561-562.