SNPversity 2.0 Data Downloads
Release 2.0.80 (20 September 2024)
Overview
Below you will find detailed information about the datasets available for download, along with guidelines and best practices. For more information on how the datasets were generated see the 'Help' section.
How to Download
Raw SNPs (FigShare) : Original raw SNP VCF files are here with a permanent DOI (10.6084/m9.figshare.30531372).
Passport data : Passport data for each wild barley sample are available in the supplementary data here .
GitHub : GitHub repository with the code used for variant calling available here .
High-coverage SNP VCFs (GrainGenes) : Files were processed using the SNPversity workflow and are available here .
High-quality SNP VCFs (GrainGenes) : Files were processed using the SNPversity workflow and are available here .
High-coverage InDel VCFs (GrainGenes) : Files were processed using the SNPversity workflow and are available here .
High-quality InDel VCFs (GrainGenes) : Files were processed using the SNPversity workflow and are available here .
Download Instructions
Prepare Adequate Storage : Ensure you have sufficient storage space on your system for the download.
Use a Reliable Internet Connection : A wired connection is recommended for faster and more stable downloads.
Consider Using a Download Manager : For large files, a download manager can help manage the download process, allowing for pause, resume, and recovery of the download.
Verify Data Integrity : After download, use checksums (provided in the tables below) to verify the integrity of the downloaded files.
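The integrity check in the last step can be scripted. The following is a minimal sketch using only the Python standard library; the function names and the local file path are illustrative assumptions, and the expected value is the MD5 checksum published for the chr1H high-coverage SNP file.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True when the file's MD5 matches the published checksum."""
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was saved.
    local_file = Path("chr1H.high_coverage.vcf.gz")
    if local_file.exists():
        ok = verify_download(local_file, "46c46a25a88541a4961cfe9d4e25104b")
        print("checksum OK" if ok else "checksum MISMATCH: re-download the file")
```

Reading in chunks keeps memory use flat even for the files larger than 1 GB. A mismatch usually means the download was truncated or corrupted and should be repeated.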
Barley High Coverage SNP Datasets (VCFs) Download
| Dataset Name | Description | Size | MD5 Checksum |
| --- | --- | --- | --- |
| chr1H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 1H | 1.18 GB | 46c46a25a88541a4961cfe9d4e25104b |
| chr2H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 2H | 1.55 GB | b1ccb1a3f124dc34c87c85ec81aa207f |
| chr3H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 3H | 1.48 GB | 9540b9de8cba3d5c43047fe9ff539ce9 |
| chr4H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 4H | 1.47 GB | 6c7b8eed0120e420f25bc7881d6c0149 |
| chr5H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 5H | 1.32 GB | 4414107070a511c46c784939383cabdf |
| chr6H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 6H | 1.31 GB | b1305985578042bdda0209e40339474c |
| chr7H.high_coverage.vcf.gz | High Coverage SNP variants and metadata identified in Chromosome 7H | 1.51 GB | 720dcca043983ff330dbbd8682f8a4b5 |
Barley High Quality SNP Datasets (VCFs) Download
| Dataset Name | Description | Size | MD5 Checksum |
| --- | --- | --- | --- |
| chr1H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 1H | 11 MB | ff1d939804560c961397f3351499fdfe |
| chr2H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 2H | 14 MB | dc3ae1695f96029de98098e50443f4c5 |
| chr3H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 3H | 15 MB | fabe5a9a58ac1c482fd668ad9395e6cd |
| chr4H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 4H | 5 MB | 0962f1e32ab40cd2f87b1481a90d1f25 |
| chr5H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 5H | 14 MB | fcf674464fdc78a53d557f0a2f3748f9 |
| chr6H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 6H | 11 MB | 434d02ea3331edde57d756e0177e90b5 |
| chr7H.high_quality.vcf.gz | High Quality SNP variants identified in Chromosome 7H | 15 MB | 17309202b24d42e5b20515a18c290f09 |
Barley High Coverage InDel Datasets (VCFs) Download
| Dataset Name | Description | Size | MD5 Checksum |
| --- | --- | --- | --- |
| chr1H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 1H | 122 MB | a638e5a6a2e68e3d169b7fcdada79589 |
| chr2H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 2H | 159 MB | c39d1912d6eed9224a64ef8d3be42018 |
| chr3H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 3H | 154 MB | 53f745e0dbc0ff99193954e3d7880c28 |
| chr4H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 4H | 143 MB | 76d155919bf5eb947aacd7f1551543a5 |
| chr5H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 5H | 139 MB | b2c5076a1d2e63e065c8f52d97ce79a4 |
| chr6H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 6H | 132 MB | 7a54684350cec206c3dc366b0fb20fd4 |
| chr7H.high_coverage.vcf.gz | High Coverage InDel variants identified in Chromosome 7H | 158 MB | f4a76dd8a8cc10d41b7a8500b86194ca |
Barley High Quality InDel Datasets (VCFs) Download
| Dataset Name | Description | Size | MD5 Checksum |
| --- | --- | --- | --- |
| chr1H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 1H | 24 MB | be7b270b1ca048bc14c46629dea895ab |
| chr2H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 2H | 31 MB | 31eda619e2e77c65127dd2112da5b0d9 |
| chr3H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 3H | 31 MB | d202fb1da546a46ab5f231a572744793 |
| chr4H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 4H | 29 MB | daf62688bf97841a8f9fc030dbb1f28b |
| chr5H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 5H | 28 MB | edc7b9b05994374a03543a8c5047aa43 |
| chr6H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 6H | 25 MB | cff1dd53c84a829712497b44b5cbd5ab |
| chr7H.high_quality.vcf.gz | High Quality InDel variants identified in Chromosome 7H | 33 MB | 83d0cbc5ab0e0b7cf924eaacec3726e5 |
Need Help?
If you encounter any issues during the download or have any questions, please contact us through GrainGenes Feedback .
Disclaimer
We reserve the right to modify the access procedures to ensure data security, integrity, and convenience. This includes using another platform to host the datasets. This page will be updated with the most up-to-date datasets and methods of accessing the data.
SNPversity 2.0
Citation
If you use SNPversity for your research, please cite one or more of the following:
SNPversity 2.0 Andorf CM, et al. (2025) A unified VCF dataset from nearly 1,500 diverse maize accessions and resources to explore the genomic landscape of maize. G3 Genes|Genomes|Genetics. doi:10.1093/g3journal/jkae281 .
SNPversity Schott DA, et al. (2018) SNPversity: a web-based tool for visualizing diversity. Database. doi:10.1093/database/bay037 .
GrainGenes Database Yao E, et al. (2025). GrainGenes: genetics, genomes, and pangenomes. Genetics. doi:10.1093/genetics/iyaf270
Wild Barley (Hordeum vulgare ssp. spontaneum) Dataset Spanner R, et al. (2025). Whole-Genome Resequencing of the Wild Barley Diversity Collection: A Resource for Identifying and Exploiting Genetic Variation for Cultivated Barley Improvement. G3 Genes|Genomes|Genetics. doi:10.1093/g3journal/jkaf261
Overview
Welcome to SNPversity 2.0, the second generation of the SNPversity platform. Designed as an open-source visualization tool, SNPversity 2.0 enables users to explore extensive variant datasets with ease. Here's how it works:
Input & Output : Users can input specific genomic intervals and select accessions of interest. SNPversity 2.0 processes this input to deliver a Variant Call Format (VCF) file and a detailed table. These outputs showcase the alleles matching the user's query, providing a clear and concise visualization of genetic variations.
Technical Architecture : At its core, SNPversity 2.0 is powered by a robust HDF5 database back-end that manages variant annotations efficiently. A data exchange layer, developed in Python, facilitates seamless data handling, while a JavaScript-based interface layer presents Single Nucleotide Polymorphisms (SNPs) and Insertions/Deletions (InDels) in an interactive format.
Real-Time Visualization : SNPversity 2.0 elevates data presentation by displaying information in real-time directly in your web browser. Tables are intuitively color-coded, reflecting each SNP's allelic status and mutational state for straightforward interpretation.
Enhanced Data Insights : Beyond basic variant data, SNPversity 2.0 enriches your research with valuable metadata. This includes variant effect annotations, mapping quality scores, genotypic coverage, and maximum R2 scores to indicate linkage disequilibrium, offering a comprehensive view of genetic variations.
Code
For information about the code used in SNPversity 2.0 please refer to the GitHub page. This page provides detailed information about web compatibility, the directory structure, system requirements, and a comprehensive description of the code. It also offers guidance on how to use the websites, information about the pipelines used, and instructions on preparing data for PanEffect.
For more details, please visit our GitHub page:
GitHub - SNPversity 2.0 .
How to use the website
There are five tabs in the SNPversity 2.0 website: 'Select options', 'Table view', 'Tree view', 'Downloads', and 'Help'.
Select options
The two main inputs for this tool are the genomic interval of interest and the subset of accessions to include.
Select genomic interval:
This section allows the user to select the genomic interval of interest. The two main options are entering the genomic coordinates or entering a gene model identifier.
Genome Version : Select the reference genome version. The only option currently available is barley Morex version 3.
Dataset : Select the variation dataset to query. The 'High Quality' datasets were filtered on mapping quality (>= 30), genotypic coverage (>= 50%), and linkage disequilibrium with max R2 > 0.5. The 'High Coverage' datasets were filtered only on mapping quality (>= 30) and genotypic coverage (>= 50%).
Chromosome : Select the chromosome.
Genome Start Position (bp) : Select the start position on the chromosome.
Genome End Position (bp) : Select the end position on the chromosome.
Loci per page : Select the number of loci to view per page in the HTML table view. This number corresponds to the number of rows in the table.
Gene Model ID : Optionally, select the gene model identifier and the number of base pairs to add as padding to the start and end of the gene model coordinates. Pressing the 'Load coordinates' button will load the chromosome, start, and end positions if the gene model is found. This option currently only accepts barley Morex (v3) gene models.
(NOTE: Genomic regions larger than 1 Mb will only be available as VCF downloads. The table and tree views will not be available.)
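The rules in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's implementation: the function names are hypothetical, the thresholds are the ones quoted for the Dataset option, coordinates are assumed to be 1-based and inclusive, and clamping at the chromosome end is omitted.

```python
MAX_VIEW_SPAN_BP = 1_000_000  # regions larger than 1 Mb are download-only


def dataset_membership(mq, completeness_pct, max_r2):
    """Classify one variant using the thresholds quoted for the Dataset option."""
    # High Coverage: mapping quality >= 30 and genotypic coverage >= 50%.
    high_coverage = mq >= 30 and completeness_pct >= 50
    # High Quality: the same filters plus max R2 > 0.5.
    high_quality = high_coverage and max_r2 is not None and max_r2 > 0.5
    return {"high_coverage": high_coverage, "high_quality": high_quality}


def padded_interval(gene_start, gene_end, padding_bp=0):
    """Extend gene model coordinates by padding_bp on both sides, clamped at 1."""
    return max(1, gene_start - padding_bp), gene_end + padding_bp


def views_available(start, end):
    """Return True when the interval is small enough for the table and tree views."""
    # Assumption: the span counts both end points (inclusive coordinates).
    return (end - start + 1) <= MAX_VIEW_SPAN_BP


if __name__ == "__main__":
    start, end = padded_interval(500, 2000, padding_bp=1000)
    print(start, end, views_available(start, end))
    print(dataset_membership(mq=58, completeness_pct=99, max_r2=0.64))
```

Checking the interval size before submitting a query shows in advance whether the result will be viewable in the browser or available only as a VCF download.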
Select which accessions to include:
This section allows the user to select a subset of the accessions to view. The three main options are to upload a file with the accession IDs, use the buttons to randomly subsample the datasets, or use the checkboxes to manually select the accessions. A list of all accessions can be found in XLSX .
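An accession file for upload can be prepared with a short script. The sketch below assumes a plain-text file with one accession ID per line; the page does not state the expected upload format, so treat that layout, the function names, and the placeholder IDs as assumptions.

```python
import random


def subsample_accessions(all_ids, k, seed=None):
    """Pick k distinct accession IDs at random; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(all_ids), k))


def write_accession_file(ids, path):
    """Write one accession ID per line (assumed upload layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for accession_id in ids:
            handle.write(f"{accession_id}\n")


def read_accession_file(path):
    """Read IDs back, skipping blank lines and dropping duplicates in order."""
    with open(path, encoding="utf-8") as handle:
        ids = [line.strip() for line in handle if line.strip()]
    return list(dict.fromkeys(ids))


if __name__ == "__main__":
    # Placeholder IDs in the 'accession name, underscore, SRR ID' pattern.
    catalogue = [f"ACC{i:03d}_SRR{i:07d}" for i in range(1, 21)]
    chosen = subsample_accessions(catalogue, k=5, seed=42)
    write_accession_file(chosen, "accessions.txt")
    print(read_accession_file("accessions.txt"))
```

Recording the seed alongside the file makes a random subsample reproducible in a later session.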
Table view
The table view option allows the user to download the VCF generated from the 'Select options' tab and displays a table of the data (for regions <= 1 Mb). Each row of the data corresponds to a locus position in the dataset. The descriptions of the columns are in the following table.
| Column name | Abbreviation | Definition | Example data |
| --- | --- | --- | --- |
| Chromosome | CHR | The chromosome where the locus is located. | chr1 |
| Position | POS | The genomic coordinate on the chromosome. | 104985 |
| Reference allele | REF | The allele value for the locus in the reference genome barley Morex (v3). | A |
| Alternate allele | ALT | The alternative allele value found in other barley accessions. | T |
| Gene models | Gene model(s) | The name of the barley Morex (v3) gene model affected by the variant. Displays the closest gene models when the variant is intergenic. | HORVU.MOREX.r3.1HG0090290 |
| Effect type | Effect type | The type of effect using Sequence Ontology terms. | stop gained |
| Effect impact | Effect impact | An estimation of putative impact/deleteriousness. | HIGH MODIFIER |
| Mapping quality score | MQ | The average mapping quality of reads supporting the variant. | 58 |
| Completeness | COMP | The percent of accessions that provide genotype data for a particular variant (i.e., there is at least one read for that accession at the given variant). Note that this is different from read coverage. | 99 |
| Maximum squared correlation | max R2 | The linkage disequilibrium measured by the maximum R2 for a given locus. | 0.64 |
| Minor allele frequency | MAF | The proportion of the less common allele at a genetic locus within a given population. | 0.23 |
The gene models in the Gene model(s) column are linked to the GrainGenes barley Morex (v3) genome browser. The position of the locus is shown as a vertical line on the browser.
Variant effects were calculated using SnpEff .
Linkage disequilibrium values were calculated using PLINK .
The remainder of the columns are based on the subset of selected barley accessions. There is one column for each barley accession and the column header is color-coded based on the project. The columns are named based on the accession name, an underscore, and the SRR ID. The values and colors of the data in these columns are based on the allele value for the given locus for that accession.
0 : Homozygous reference genotype
1 : Heterozygous genotype with one reference and one alternate allele
2 : Homozygous alternate genotype
N : Missing or unknown genotypes
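The genotype codes above correspond to the alternate-allele count of a diploid genotype call. A minimal sketch of that mapping from a VCF GT string follows; the function name is hypothetical, and treating multi-allelic or malformed calls as N is an assumption rather than documented site behavior.

```python
def genotype_code(gt):
    """Map a diploid VCF GT string to the table codes '0', '1', '2' or 'N'."""
    # Accept both unphased ('/') and phased ('|') separators.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        # Missing ('./.'), haploid, or multi-allelic calls (assumed to map to N).
        return "N"
    # The code is the number of alternate alleles in the genotype.
    return str(alleles.count("1"))


if __name__ == "__main__":
    for gt in ("0/0", "0/1", "1|0", "1/1", "./."):
        print(gt, "->", genotype_code(gt))
```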
Variant effect types
| VCF Column | Description |
| --- | --- |
| MQ | Mapping quality score - The average mapping quality of reads supporting the variant |
| COMP | Completeness - The percent of accessions that provide genotype data for a particular variant |
| max R2 | Maximum squared correlation - The linkage disequilibrium measured by the maximum R2 for a given locus |
| MAF | Minor allele frequency - The proportion of the less common allele at a genetic locus within a given population |
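As a worked illustration of the COMP and MAF definitions, the sketch below computes both from a list of diploid GT strings for one biallelic locus. It approximates completeness by counting called genotypes, and the function name is hypothetical. The values reported by the site are precomputed over the full accession panel, so numbers recomputed on a subset can differ.

```python
def completeness_and_maf(gt_calls):
    """Return (completeness in percent, minor allele frequency or None)."""
    called = []
    for gt in gt_calls:
        alleles = gt.replace("|", "/").split("/")
        # Keep only fully called biallelic diploid genotypes.
        if len(alleles) == 2 and all(a in ("0", "1") for a in alleles):
            called.append(alleles)
    completeness = 100.0 * len(called) / len(gt_calls) if gt_calls else 0.0
    n_alleles = 2 * len(called)
    if n_alleles == 0:
        return completeness, None  # MAF is undefined without any calls
    alt_freq = sum(pair.count("1") for pair in called) / n_alleles
    # The minor allele is whichever of REF/ALT is less frequent.
    return completeness, min(alt_freq, 1 - alt_freq)


if __name__ == "__main__":
    print(completeness_and_maf(["0/0", "0/0", "0/1", "1/1", "./."]))
```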
| Seq. Ontology | Effect | Description | Impact |
| --- | --- | --- | --- |
| intergenic_region | INTERGENIC | The variant is in an intergenic region | MODIFIER |
| upstream_gene_variant | UPSTREAM | Upstream of a gene (default length: 5K bases) | MODIFIER |
| 5_prime_UTR_variant | UTR_5_PRIME | Variant hits 5'UTR region | MODIFIER |
| coding_sequence_variant | CDS | The variant hits a CDS. | MODIFIER |
| exon_variant | EXON | The variant hits an exon (from a non-coding transcript) or a retained intron. | MODIFIER |
| intron_variant | INTRON | Variant hits an intron. Technically, hits no exon in the transcript. | MODIFIER |
| frameshift_variant | FRAME_SHIFT | Insertion or deletion causes a frame shift, e.g., an indel size is not a multiple of 3 | HIGH |
| missense_variant | NON_SYNONYMOUS_CODING | Variant causes a codon that produces a different amino acid, e.g., Tgg/Cgg, W/R | MODERATE |
| synonymous_variant | SYNONYMOUS_CODING | Variant causes a codon that produces the same amino acid, e.g., Ttg/Ctg, L/L | LOW |
| stop_lost | STOP_LOST | Variant causes stop codon to be mutated into a non-stop codon, e.g., Tga/Cga, */R | HIGH |
| stop_gained | STOP_GAINED | Variant causes a STOP codon, e.g., Cag/Tag, Q/* | HIGH |
| 3_prime_UTR_variant | UTR_3_PRIME | Variant hits 3'UTR region | MODIFIER |
| downstream_gene_variant | DOWNSTREAM | Downstream of a gene (default length: 5K bases) | MODIFIER |
| Impact | Meaning | Example |
| --- | --- | --- |
| HIGH | The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. | stop_gained, frameshift_variant |
| MODERATE | A non-disruptive variant that might change protein effectiveness. | missense_variant, inframe_deletion |
| LOW | Assumed to be mostly harmless or unlikely to change protein behavior. | synonymous_variant |
| MODIFIER | Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. | exon_variant, downstream_gene_variant |
To see a full list of the variant effect types click here .
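When filtering a downloaded VCF, the effect-to-impact pairs listed above can be encoded as a lookup table. The sketch below covers only the Sequence Ontology terms shown on this page (the full list is longer); the function names and the severity ordering used to rank impacts are illustrative assumptions.

```python
# Sequence Ontology term -> impact, limited to the terms listed on this page.
EFFECT_IMPACT = {
    "intergenic_region": "MODIFIER",
    "upstream_gene_variant": "MODIFIER",
    "5_prime_UTR_variant": "MODIFIER",
    "coding_sequence_variant": "MODIFIER",
    "exon_variant": "MODIFIER",
    "intron_variant": "MODIFIER",
    "frameshift_variant": "HIGH",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "stop_lost": "HIGH",
    "stop_gained": "HIGH",
    "3_prime_UTR_variant": "MODIFIER",
    "downstream_gene_variant": "MODIFIER",
}

# Assumed ranking, most severe first.
SEVERITY = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def impact_of(effect_type):
    """Return the impact for a listed effect type, or None if it is not listed."""
    return EFFECT_IMPACT.get(effect_type)


def most_severe(effect_types):
    """Return the most severe impact among several effect types, or None."""
    impacts = {impact_of(e) for e in effect_types} - {None}
    for level in SEVERITY:
        if level in impacts:
            return level
    return None


if __name__ == "__main__":
    print(impact_of("stop_gained"))
    print(most_severe(["intron_variant", "missense_variant"]))
```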
VCF Download from Search
After completing a search based on a genomic region and a subset of accessions, a "Download the VCF file" button will be available in the "Table view" tab. When the user clicks this link, a subset of the variants meeting the search criteria will be downloaded as a VCF file. The VCF will only include the specified set of accessions and each variant within the selected genomic region. The VCF will contain rows with no variant relative to the reference. The genotype coverage fields (CVC and CVP) will refer to the number of accessions with genotype data for a particular variant based on all 1,525 accessions, not the local genotype coverage of the selected accessions.
Below is a sample of an output VCF file generated from SNPversity 2.0:
##fileformat=VCFv4.2
##fileDate=20240531
##source=MaizeGDB2024
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS mapping quality">
##INFO=<ID=CVC,Number=1,Type=Integer,Description="The number of accessions that have genotype data for a particular variant">
##INFO=<ID=CVP,Number=1,Type=Float,Description="The percent of accessions that have genotype data for a particular variant.">
##INFO=<ID=TYPE,Number=.,Type=String,Description="The type of effect using Sequence Ontology terms">
##INFO=<ID=EFFECT,Number=.,Type=String,Description="An estimation of putative impact/deleteriousness">
##INFO=<ID=GENEMODEL,Number=.,Type=String,Description="The name of the gene model affected by the variant">
##INFO=<ID=SUB,Number=.,Type=String,Description="The amino acid substitution for missense and non-synonymous variants">
##INFO=<ID=MAXR2,Number=1,Type=Float,Description="The maximum R2 for a given loci">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CML228_SRR8906784 CML69_SRR8906963 B97_CRX445264 CML322_CRX445267 CML333_CRX445268 CML52_SRR5725841 CML103_SRR5976229 M37W_SRR5976317
chr10 218441 . T C 5127.59 . MQ=59.89;CVC=1481;CVP=98.87;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.910833 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218443 . G A 5127.84 . MQ=60;CVC=1484;CVP=99.07;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.910835 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218458 . T C 1656.67 . MQ=53.05;CVC=1484;CVP=99.07;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.666034 GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218498 . T A 1715.17 . MQ=52.05;CVC=1465;CVP=97.80;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.916933 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218502 . C CTCTGTCTG 1676.75 . MQ=47.9;CVC=1444;CVP=96.40;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.916923 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
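A record from a file like the sample above can be read without external libraries. The sketch below is a minimal parser with hypothetical function names, not a full VCF reader. It splits on any whitespace so that it works on both tab-delimited files and the space-separated rendering shown here, which assumes that no field contains embedded spaces.

```python
def parse_info(info):
    """Turn 'MQ=59.89;CVC=1481' into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(header_line, record_line):
    """Parse one data line using the '#CHROM ...' header line for column names."""
    columns = header_line.lstrip("#").split()
    values = record_line.split()
    # The first nine VCF columns are fixed; the rest are per-sample genotypes.
    record = dict(zip(columns[:9], values[:9]))
    record["POS"] = int(record["POS"])
    record["INFO"] = parse_info(record["INFO"])
    record["genotypes"] = dict(zip(columns[9:], values[9:]))
    return record


if __name__ == "__main__":
    header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CML228_SRR8906784 CML69_SRR8906963"
    line = ("chr10 218441 . T C 5127.59 . MQ=59.89;CVC=1481;CVP=98.87;"
            "TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;"
            "SUB=NA;MAXR2=0.910833 GT 0/0 1/1")
    rec = parse_record(header, line)
    print(rec["CHROM"], rec["POS"], rec["INFO"]["MAXR2"], rec["genotypes"])
```

INFO values are kept as strings so the caller decides which fields to convert, for example `float(rec["INFO"]["MQ"])`.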
Tree view
The tree view option allows the user to generate phylogenetic tree views based on the VCF2PopTree tool . The input to build the tree is based on the VCF file generated from the 'Select options' tab.
The tree can be constructed as either a UPGMA tree or a Neighbour-Joining tree (unrooted). The drawing options include a Rectangular tree or a Radial tree. In addition, the trees can be saved in the following text formats: Newick tree, Pair-wise diversity (MEGA), or PHYLIP.
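To illustrate how a UPGMA tree and its Newick text relate to genotype data, here is a small self-contained sketch. It is not the VCF2PopTree implementation: the distance measure (mean difference in alternate-allele count, scaled to the range 0 to 1) and all function names are assumptions chosen for the example.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Mean scaled difference in alternate-allele count over loci called in both."""
    diffs = [abs(x - y) / 2 for x, y in zip(a, b) if x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


def upgma_newick(genotypes):
    """genotypes: {name: [0, 1, 2 or None per locus]} -> Newick string."""
    names = list(genotypes)
    if not names:
        raise ValueError("at least one accession is required")
    # cluster id -> (newick label, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {
        (i, j): pairwise_distance(genotypes[names[i]], genotypes[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        label_i, size_i, height_i = clusters.pop(i)
        label_j, size_j, height_j = clusters.pop(j)
        height = d / 2
        label = (f"({label_i}:{height - height_i:.4f},"
                 f"{label_j}:{height - height_j:.4f})")
        # Distance to each remaining cluster is the size-weighted mean (UPGMA).
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    (label, _, _), = clusters.values()
    return label + ";"


if __name__ == "__main__":
    calls = {"A": [0, 0, 0, 0], "B": [0, 0, 0, 2], "C": [2, 2, 2, 2]}
    print(upgma_newick(calls))
```

UPGMA assumes a constant rate of change along all branches and yields a rooted tree, whereas Neighbour-Joining drops that assumption and yields an unrooted tree, which is why the two options can produce different topologies for the same VCF.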
Downloads
The Downloads tab provides detailed information about the datasets available for download, along with guidelines and best practices.
Help
The help page gives an overview of the webpage, descriptions of the datasets and methods, and how to use the website.