As DNA tests for ancestry explode in popularity, a fundamental problem remains: the tests deliver more detailed results for people of European descent, as the ethnic regions and reference data used by the major DNA testing companies show. While this bias should, in theory, diminish as more people take the tests and add their DNA to the mix, the companies have work to do before their kits perform reasonably well for a worldwide population.

In 2017, more people took DNA tests than in all the previous years combined, according to the MIT Technology Review, and that number keeps climbing. According to the International Society of Genetic Genealogy (ISOGG), more than 18 million people have tested their DNA to learn about their ethnic identity or to find relatives. DNA testing companies like AncestryDNA and 23andMe have become household names as a result, while new tests claiming more specialized results crop up every few years.

It’s easy to see the appeal. For $99, 23andMe and AncestryDNA simply require that you spit in a cup, send it off to a lab for testing, and then wait a matter of weeks to learn the ethnic breakdown of your genes by region.

The data problem

The risk of bias in DNA tests starts with the databases the companies use. AncestryDNA, for instance, bases its ethnicity estimate on a reference panel drawn from the DNA of 16,638 people representing 43 different populations. The people in the reference panel are screened to ensure they strongly represent a particular ethnicity: “people with a long family history in one place or within one group,” as the company explains. The screening includes controls, such as removing close relatives, so that the ethnicity profile is not skewed.

While this pre-screened data can identify ethnicity on a broad level, more detail comes only with more data. Every DNA test kit sent in adds to the company’s database. That’s why leading contenders AncestryDNA and 23andMe have some of the best estimates available—they have more customers, and therefore more data. 

However, because tests like AncestryDNA and 23andMe were at first available only in the United States, and have since expanded mostly to European countries or former European colonies, their customer base remains fairly homogeneous. ISOGG estimates that four-fifths of the people who have taken DNA tests are U.S. citizens, which means the data reflects a population of majority European ancestry.
AncestryDNA’s ethnic regions. Colored areas represent locations that came up in PCWorld’s review. (Image: Dieter Holger/IDG)

Funding shortfalls and poor infrastructure make it harder to gather genetic data on underrepresented groups such as Africans, Asians, and indigenous peoples. “Right now, it’s not possible to infer the exact sources of ancestry of African Americans,” Sarah Tishkoff, a professor at the University of Pennsylvania who has studied African genomics for 18 years, told PCWorld, “and it would be unfortunate if they have the expectation that they will be able to get that information.”

Tishkoff said that gathering a more diverse set of DNA data brings its own challenges, both financial and ethical. “There needs to be better funding and resources for generating that data. It’s also important to do the research in an ethical manner. I personally think there should be caution about using information from indigenous populations for commercial purposes such as ancestry testing.”