SNP Functional Annotation Dataset Description

A total of 1,556 soybean whole-genome sequences, generated by the Soybean Genomics and Improvement Laboratory at USDA-ARS, Beltsville, MD, USA, and mostly deposited in the NCBI Sequence Read Archive (SRA), were analyzed to identify SNPs (Zhang, H., Jiang, H., Hu, Z., Song, Q., & An, Y.C. (2022) BMC Genomics, 23, 250). We converted the SNP VCF files to SNP allele files and removed SNPs with more than 30% missing and heterozygous calls, as well as SNPs with only one allele or with three alleles. To reduce false-positive SNP calls, we also removed SNPs whose minor allele was carried by two or fewer of the 1,556 cultivated and wild soybean accessions.

SNP functional annotation was performed with ANNOVAR using the whole-genome sequence assembly Wm82a2v1 and its gene IDs. The input variant file contained four columns: chromosome ID, physical position, reference allele, and alternate allele. The annotation file contains the following columns: chromosome, SNP position, reference allele, alternative allele, G. max minor allele, G. soja minor allele, mutation type, gene ID in Wm82a2v1, and the position and function of the SNP within the gene. Variants are described using the HGVS (Human Genome Variation Society) nomenclature; see https://hgvs-nomenclature.org/stable/recommendations/general/.

For details, please refer to: Liu Z, Shi XL, Yang Q, Li Y, Yang CY, Zhang MC, An YQ, Yan L, Song QJ. 2025. Landscape of rare allelic variants in cultivated and wild soybean genomes. In review.
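The SNP filters described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build the dataset. The genotype encoding (two-letter diploid calls such as "AA" or "AG", with "NN" for a missing call), the function name passes_filters, and the reading of the 30% threshold as a combined missing-plus-heterozygous rate are all assumptions; the description could also be read as two separate 30% thresholds.

```python
from collections import Counter

MISSING = "NN"  # hypothetical code for a missing genotype call


def passes_filters(calls, max_missing_het=0.30, min_minor_accessions=3):
    """Return True if a SNP survives the filters (hypothetical sketch).

    `calls` holds one genotype call per accession. Assumed encoding:
    two-letter diploid genotypes such as "AA" or "AG", "NN" for missing.
    The 30% threshold is applied to missing + heterozygous calls combined.
    """
    n = len(calls)
    if n == 0:
        return False

    missing = sum(1 for c in calls if c == MISSING)
    het = sum(1 for c in calls if c != MISSING and c[0] != c[1])
    if (missing + het) / n > max_missing_het:
        return False

    # Count the accessions carrying each allele (each accession counted once per allele).
    carriers = Counter()
    for c in calls:
        if c != MISSING:
            carriers.update(set(c))

    # Keep bi-allelic SNPs only, dropping mono-allelic and tri-allelic sites.
    if len(carriers) != 2:
        return False

    # The minor allele is taken to be the allele carried by fewer accessions;
    # SNPs whose minor allele occurs in 2 or fewer accessions are removed.
    return min(carriers.values()) >= min_minor_accessions


kept = passes_filters(["AA"] * 6 + ["GG"] * 3 + ["NN"])                          # True
too_rare = passes_filters(["AA"] * 8 + ["GG"] * 2)                               # False: minor allele in 2 accessions
too_messy = passes_filters(["AA"] * 4 + ["AG"] * 2 + ["NN"] * 2 + ["GG"] * 2)   # False: 40% missing + het
tri = passes_filters(["AA"] * 5 + ["GG"] * 3 + ["TT"] * 3)                       # False: tri-allelic
```

Counting carrier accessions rather than allele copies mirrors the wording "the number of accessions containing minor alleles"; a frequency-based definition of the minor allele would be an equally plausible alternative.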
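A minimal sketch of reading the annotation file in Python, assuming (this is not documented in the description) that the file is tab-delimited, has one header row, and lists its columns in the order given. The field names, the helper parse_annotations, and the sample rows are hypothetical. Only the first eight columns are parsed individually; everything after the gene ID is kept as one string, because the description does not say whether the position and function of the SNP occupy one column or two.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class SnpAnnotation(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    max_minor: str      # G. max minor allele
    soja_minor: str     # G. soja minor allele
    mutation_type: str
    gene_id: str        # gene ID in Wm82a2v1
    gene_detail: str    # position/function within the gene (columns 9+ joined by tabs)


def parse_annotations(lines: Iterable[str], has_header: bool = True) -> Iterator[SnpAnnotation]:
    """Yield one SnpAnnotation per data row of a tab-delimited file (assumed layout)."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the assumed header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield SnpAnnotation(
            chrom=row[0],
            pos=int(row[1]),
            ref=row[2],
            alt=row[3],
            max_minor=row[4],
            soja_minor=row[5],
            mutation_type=row[6],
            gene_id=row[7],
            gene_detail="\t".join(row[8:]),
        )


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    "chrom\tpos\tref\talt\tmax_minor\tsoja_minor\ttype\tgene\tdetail",
    "Gm01\t1000\tA\tG\tG\tA\tsynonymous SNV\tGENE_1\tc.A30G",
    "Gm01\t2000\tC\tT\tT\tT\tnonsynonymous SNV\tGENE_1\tc.C60T",
    "Gm02\t3000\tG\tA\tA\tG\tnonsynonymous SNV\tGENE_2\tc.G90A",
]

records = list(parse_annotations(sample))
# Summarize the file by mutation type.
type_counts = Counter(r.mutation_type for r in records)
```

Counting SNPs by mutation type is one way to summarize the file; the same records could be filtered by gene_id or by comparing the G. max and G. soja minor alleles.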