
27 million ancestors make up the world's largest human family tree

By Science Gazette, 25 February 2022

The Big Data Institute at the University of Oxford has made a significant step in mapping the totality of our genetic relationships: a single genealogy that tracks our common history. The research was published in the journal Science today.

Human genetic research has advanced dramatically in the last two decades, yielding genomic data for hundreds of thousands of people, including thousands of ancient people. This opens up the fascinating prospect of tracing the beginnings of human genetic variation and creating a comprehensive map of how people throughout the globe are linked.

Until recently, the primary obstacles to realizing this goal were figuring out how to merge genomic sequences from many databases and creating algorithms to manage such large amounts of data. Researchers from the University of Oxford’s Big Data Institute have developed a novel approach that can readily merge data from numerous sources and scale to handle millions of genomic sequences.

One of the study’s main authors, Dr. Yan Wong, an evolutionary geneticist at the Big Data Institute, explained: “We’ve essentially created a massive family tree, a genealogy for all of humanity, that models the history that led to all of the genetic variation we see in humans today as accurately as possible. This genealogy enables us to examine how each individual’s genetic sequence links to one another throughout the genome.”

The lineage of each point on the genome may be thought of as a tree, because each region of the genome is inherited from only one parent, either the mother or the father. A “tree sequence” or “ancestral recombination graph” is a series of such trees that connects genomic regions back in time to the ancestors in whom the genetic variation first emerged.
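To make the idea concrete, here is a minimal Python sketch of a tree sequence as a list of local trees, each covering one stretch of the genome. It is an illustration only, not the researchers' method or software: the class `LocalTree`, the helper functions, the sample names A, B and C, the ancestor names X, Y and Z, and the 100-position toy genome are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class LocalTree:
    """One tree covering the genomic interval [left, right)."""
    left: int
    right: int
    parent: dict  # maps each node to its parent; the root has no entry


def tree_at(trees, position):
    """Return the local tree that covers a given genomic position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"no tree covers position {position}")


def lineage(tree, node):
    """Walk from a node up to the root, returning every node on the way."""
    path = [node]
    while path[-1] in tree.parent:
        path.append(tree.parent[path[-1]])
    return path


def common_ancestor(tree, a, b):
    """Return the most recent ancestor shared by a and b in this tree."""
    ancestors_of_a = set(lineage(tree, a))
    for node in lineage(tree, b):
        if node in ancestors_of_a:
            return node
    return None


# Hypothetical toy data: three sampled genomes (A, B, C) and three
# inferred ancestors (X, Y, Z). The relationships differ between the
# two halves of the genome, so two local trees are needed.
TREES = [
    LocalTree(0, 50, {"A": "X", "B": "X", "X": "Z", "C": "Z"}),
    LocalTree(50, 100, {"B": "Y", "C": "Y", "Y": "Z", "A": "Z"}),
]

if __name__ == "__main__":
    for position in (10, 75):
        tree = tree_at(TREES, position)
        print(position, common_ancestor(tree, "A", "B"))
```

In this toy example, samples A and B share the recent ancestor X in the first half of the genome, but in the second half their lineages only meet at the older ancestor Z. That is the sense in which the answer to "how are two people related?" can change from one region of the genome to the next.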

Dr. Anthony Wilder Wohns, the study’s lead author, who conducted the research as part of his PhD at the Big Data Institute and is now a postdoctoral researcher at the Broad Institute of MIT and Harvard, said: “In essence, we’re reconstructing our ancestors’ genomes and using them to create a vast network of relationships. After that, we can figure out when and where their ancestors lived. Our method is unique in that it requires minimal assumptions about the underlying data and can be used with both current and ancient DNA samples.”

The research combined data on present-day and ancient human genomes from eight separate databases, covering a total of 3,609 individual genome sequences from 215 groups. The ancient genomes came from samples found all across the globe, ranging in age from a few thousand years to over 100,000 years. To explain patterns of genetic variation, the algorithms predicted where common ancestors must be present in the evolutionary trees. The resulting network contained almost 27 million ancestors.

After adding location data to these sample genomes, the authors used the network to estimate where the predicted common ancestors lived. The findings accurately reconstructed crucial events in human evolution, such as the migration out of Africa.

Although the genealogical map is already a very thorough resource, the study team intends to expand it even further by adding genetic data as it becomes available. Because tree sequences store data very efficiently, the dataset could readily accommodate millions of additional genomes.

Dr. Wong said, “This research will pave the way for the next generation of DNA sequencing. The trees will get more precise as the quality of genome sequences from present and ancient DNA samples improves, and we will ultimately be able to build a single, unified map that explains the ancestry of all human genetic diversity we observe today.”

Dr. Wohns said, “While our research focuses on people, the approach is applicable to most living things, from orangutans to microorganisms. It might be especially useful in medical genetics, as it could help distinguish between actual links between genetic regions and illnesses and false connections resulting from our common ancestors.”
