There are certainly accuracy issues with these tests, but that doesn't mean that they are not useful.
Since I can't post links, here's the text of one company's analysis:
"Identifying hundreds of inaccurate SNPs with high-impact in 23andMe raw data"
While developing and testing our Enlis Genome Personal software, we noticed some unusual SNPs in 23andMe’s raw data. We found a lot of rare homozygous SNPs, with very serious consequences, and the same SNPs were found in multiple samples that we had on hand!
The SNP variant shown here is a splice disruption in a gene called HEXA. Splice disruptions in HEXA are known to cause Tay-Sachs disease. Not only do all 3 of these 23andMe users have this extremely rare homozygous (2 copies) splice disruption SNP, but all 3 users also have 2 more extremely rare homozygous splice disruption SNPs in the same HEXA gene! That can’t be right.
We wanted to verify this with more data and identify similar inaccurate positions, so first we downloaded the database of user-submitted 23andMe data from Opensnp.org.
Then, using our software’s Variation Filter tool, we were able to compare the allele frequency of each 23andMe SNP among 1,500 users against the expected allele frequency, based on next-generation sequencing projects (1000 Genomes and the Exome Aggregation Consortium).
As it turns out, there are more than 500 inaccurate positions like this in 23andMe’s raw data:
- 323 of the faulty SNPs are in splice sites, and 246 of those are splice disruptions (more serious).
- 75 are missense.
- The faulty SNPs are in 279 different genes, and 243 of those genes are known to affect a human disease or trait.
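For anyone who wants to try this kind of check themselves, here is a rough Python sketch of the frequency comparison described above. It is not Enlis’s actual method or tool, just one simple way to do it. The function names, the rsIDs, the sample genotypes, and the 0.5 cutoff are all made up for illustration. It assumes genotypes are two-letter strings such as "AG", with "--" meaning no call.

```python
from collections import Counter


def observed_allele_freq(genotypes, allele):
    """Fraction of called alleles equal to `allele`.

    Genotypes that are not two characters long, or that contain '-'
    (assumed here to mark a no-call), are skipped.
    Returns None if no usable calls remain.
    """
    counts = Counter()
    for gt in genotypes:
        if len(gt) != 2 or "-" in gt:
            continue
        counts.update(gt)  # counts each of the two alleles
    total = sum(counts.values())
    return counts[allele] / total if total else None


def flag_suspect_snps(calls, reference, min_excess=0.5):
    """Return rsIDs whose observed frequency exceeds the expected one
    by at least `min_excess` (a hypothetical cutoff, not a standard)."""
    suspects = []
    for rsid, genotypes in calls.items():
        if rsid not in reference:
            continue
        allele, expected = reference[rsid]
        observed = observed_allele_freq(genotypes, allele)
        if observed is not None and observed - expected >= min_excess:
            suspects.append(rsid)
    return sorted(suspects)


# Made-up data: genotype calls per SNP across a handful of users.
calls = {
    "rs_example_1": ["GG", "GG", "GG", "--"],  # everyone homozygous for G
    "rs_example_2": ["AA", "AG", "AA", "AA"],  # G is uncommon, as expected
}
# Made-up reference: (allele of interest, expected frequency).
reference = {
    "rs_example_1": ("G", 0.0001),
    "rs_example_2": ("G", 0.1),
}

print(flag_suspect_snps(calls, reference))
```

A supposedly rare allele that shows up as homozygous in nearly every sample is far more likely to be a probe or strand problem than a real finding, which is the pattern this flags. A real check would need many more samples and a proper statistical test rather than a fixed cutoff.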
We have notified 23andMe of this problem in the hope that they would fix their raw data. So far, however, they have not seemed very interested in our findings. This raises a question: if 23andMe wants to have an ongoing relationship with its customers, what is its responsibility to fix the raw data when errors are discovered?
So there is some inaccurate data in 23andMe’s results. Is this cause for banning the download of raw data? No, not at all. In data sets this large, there are bound to be errors of this nature. We should fix errors where we find them and move forward. But if you want to get raw data interpreted, make sure that you use an experienced service with quality control measures in place.