Genealogy and mathematics

Summary: Mathematical methods are used to investigate genealogies in a variety of ways, including creating probability models and simulations to determine the likelihood of common ancestors.

Genealogy is the study of families, often motivated by the desire to tell the story of lineage and to place family history in a larger historical context. For instance, many Americans take a great interest in determining their pre-American roots. The study of genealogy requires not only an understanding of history and the ability to work with primary historical sources but also a grasp of mathematical structures. The number of ancestors on a chart doubles with every generation, and this geometric progression quickly grows large, so mathematical techniques have been fundamental in organizing and presenting family connections, both visually and in numerical formats. Mathematicians may use probability models and simulations to investigate the likelihood of common ancestors. Mathematicians also construct their own mathematical genealogies, in which parentage is redefined as the relationship between adviser and student.
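The doubling described above can be made concrete with a short calculation. The following is a minimal sketch; the function names are illustrative and do not come from any genealogical software. It counts positions on a chart rather than distinct people, since the same person can occupy more than one position when branches of a family overlap.

```python
def ancestors_in_generation(n: int) -> int:
    """Ancestor positions exactly n generations back (parents are n = 1)."""
    if n < 0:
        raise ValueError("generation must be non-negative")
    return 2 ** n


def total_ancestor_positions(n: int) -> int:
    """Positions in generations 1 through n: 2 + 4 + ... + 2**n."""
    if n < 0:
        raise ValueError("generation must be non-negative")
    # Sum of the geometric progression with ratio 2.
    return 2 ** (n + 1) - 2


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 30):
        print(n, ancestors_in_generation(n), total_ancestor_positions(n))
```

Thirty generations back, the chart already has more than a billion positions in that single generation, which is why the positions cannot all be filled by different people.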

Genealogy Graph and Visualization Formats

Though people tend to think of a “family tree,” genealogical graphs may contain overlapping branches and need not be tree-shaped at all. Other representations of the data include hourglass charts, which are centered on an individual and spread both upward and downward to show direct ancestors and descendants, leaving out relations such as cousins. Exponential crowding and edge crossing are common challenges in visualizing family data, and some researchers propose a multitree arrangement.

Genealogical software typically presents a variety of visualization options. Numbering systems have long been used to identify individuals. Methods from graph theory are important in analyzing the data for connections and patterns. Another genealogical challenge is the integration of information from disparate sources, such as census information and individual recordkeeping.

While standardized software and Genealogical Data Communication (GEDCOM) files may on one level make it easier to share information, the dynamic nature of huge online genealogical graphs presents new mathematical challenges. One method used to simplify such graphs has been to deemphasize individuals who enlarge a tree but do not increase the complexity.

Brief History of Genealogy

Historically, most genealogy was the study of the kinship and descent of royal and noble families—in this form, many of the earliest histories in Egypt and ancient Rome are genealogies mixed with mythology. As a study of royalty, genealogical research generally had the ultimate goal of demonstrating or undercutting claims of legitimacy or determining a line of succession. Early American genealogical research was associated with efforts to prove kinship to noble families and was thus part of the British class system, which the egalitarian republic had outgrown. The New England genealogist and historian John Farmer (1789–1838) may have been the first to change this, as his work on local histories was seen as a way to honor and glorify the work of early Americans and the story of America’s growth from loosely affiliated royal colonies to an independent nation. Farmer referred to his work—the combination of genealogy and local history—as “antiquarianism.”

The trend he helped to popularize led to the creation of the New England Historic Genealogical Society (NEHGS), the oldest genealogical society in the United States, in 1845, seven years after his death. Many such societies opened throughout the country, notably the Genealogical Society of Utah (1894), now associated with the Church of Jesus Christ of Latter-day Saints (LDS), which has since developed the most extensive genealogical records in the world. Because LDS beliefs focus strongly on the sealing of family units together so that they may persist together in eternity, genealogy is an especially critical concern for the faith and necessary for religious ceremonies. In the 1960s and 1970s, renewed interest in ethnic identity and in ethnic roots that had long been abandoned or forgotten led to a revival of interest in genealogy. This interest was furthered in the following decades as software and genetic research provided new genealogical tools, while the Internet provided a new means of sharing information.

Genealogical Numbering Systems (GNS)

A variety of numbering systems are used to identify individuals and record family relationships. One descending numbering system, which traces the line of descent from an earlier ancestor, is the Register System, developed by NEHGS in 1870 to simplify recordkeeping in the New England Historical and Genealogical Register.

The system groups generations separately and uses both Arabic and Roman numerals, assigning each parent a unique Arabic numeral and using lower-case Roman numerals to enumerate progeny of each parent:

1 Parent
    2 i Child
      ii Child (no progeny)
    3 iii Child
(2nd generation)
2 Child
    4 i Grandchild
3 Child
      i Grandchild (no progeny)
(3rd generation)
4 Grandchild
    5 i Great-grandchild
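The numbering rule illustrated above can be sketched in code. This is a minimal illustration under stated assumptions, not the NEHGS specification: the family is given as a hypothetical dictionary mapping each person to a list of children in birth order, the names are invented, and only children who themselves have recorded children receive an Arabic number.

```python
def lower_roman(n: int) -> str:
    """Convert a positive integer to a lower-case Roman numeral."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def register_numbers(children: dict, root: str):
    """Number a descendancy generation by generation.

    Returns (numbers, lines): numbers maps each person with progeny to an
    Arabic numeral, and lines is a printable listing.
    """
    numbers = {root: 1}
    next_number = 2
    generation = [root]
    lines = []
    while generation:
        next_generation = []
        for person in generation:
            lines.append(f"{numbers[person]} {person}")
            for i, child in enumerate(children.get(person, []), start=1):
                prefix = ""
                if children.get(child):  # child has progeny: gets a number
                    numbers[child] = next_number
                    next_number += 1
                    next_generation.append(child)
                    prefix = f"{numbers[child]} "
                lines.append(f"  {prefix}{lower_roman(i)} {child}")
        generation = next_generation
    return numbers, lines


# Hypothetical family used only for illustration.
family = {
    "Parent": ["Ann", "Ben", "Cal"],
    "Ann": ["Dee"],
    "Cal": ["Eli"],
    "Dee": ["Fay"],
}
numbers, lines = register_numbers(family, "Parent")

if __name__ == "__main__":
    print("\n".join(lines))
```

Processing one whole generation before starting the next is what keeps the Arabic numerals in the order a reader meets them on the page.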

Along with the Register System, the most popular GNS in the United States is the NGSQ System, named for the National Genealogical Society Quarterly and often called the Record System. It is derived from the Register System but assigns Arabic numbers to children without progeny as well. Because the numbers are assigned in sequence, the family numbers must be recalculated if a new child is discovered.

An older GNS is the Ahnentafel (“ancestor table”), published by historian Michael Eytzinger in 1590. Unlike the Register and Record systems, the Ahnentafel is an ascending numbering system, beginning with “1” in the present generation and increasing as generations are traced backward through time:

1 Subject
2 Father
3 Mother
4 Father’s father
5 Father’s mother
6 Mother’s father
7 Mother’s mother
8 Father’s father’s father

The Ahnentafel results in the following mathematical relationship: the number of an individual’s father is double that individual’s number, while the number of the individual’s mother is double plus one. Apart from #1, all even-numbered persons are male, and all odd-numbered persons are female. It is plain to see why this version did not become the dominant system in the United States: it is principally concerned with demonstrating to which families one has a blood relation, without accounting for siblings in any generation; nor can it be continued through the subject’s children and their descendants. It was a popular system among European nobility for displaying the noble families to whom they were related, along with their coats of arms.
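The doubling rule means that an Ahnentafel number, written in binary, spells out the path from the subject to the ancestor: after the leading 1, each 0 is a step to a father and each 1 is a step to a mother. The following is a minimal sketch of that arithmetic; the function names are illustrative only.

```python
def father(n: int) -> int:
    """Ahnentafel number of person n's father."""
    return 2 * n


def mother(n: int) -> int:
    """Ahnentafel number of person n's mother."""
    return 2 * n + 1


def child(n: int) -> int:
    """Ahnentafel number of the child through whom n is listed."""
    if n < 2:
        raise ValueError("the subject (1) has no child in the table")
    return n // 2


def generation(n: int) -> int:
    """Generations between the subject (0) and person n."""
    if n < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    return n.bit_length() - 1


def describe(n: int) -> str:
    """Spell out the relationship, e.g. 5 is the father's mother."""
    if n < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    if n == 1:
        return "Subject"
    bits = bin(n)[3:]  # drop the '0b' prefix and the leading 1
    words = ["father" if b == "0" else "mother" for b in bits]
    return "'s ".join(words).capitalize()


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, describe(n))
```

The same arithmetic shows why the table cannot hold siblings or descendants: every positive integer is already assigned to exactly one direct ancestor of the subject.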

Common Ancestors

Common ancestors are individuals who are the genealogical ancestors of every person in some given set of people. Genealogists and mathematicians often try to determine how many generations into the past a tree must be traced to find such common ancestors. Models built for this purpose rely on statistical estimates, such as the average human lifespan at different points in time, the average length of time between successive generations, and rates of reproduction. In the early twenty-first century, computer scientist Douglas Rohde, writer and editor Steve Olson, and statistician Joseph Chang collaborated to create mathematical models to estimate when the most recent common ancestor (MRCA) of every human currently alive lived. Their initial probabilistic model, designed primarily for theoretical insight, assumed an unrealistic random mating scheme to facilitate an explicit analytical solution. A second, more realistic model required the researchers to mathematically express historical population dynamics and conduct Monte Carlo simulations to produce a distribution of feasible results with associated probabilities. According to these models, the MRCA likely lived just a few thousand years ago, perhaps during the reign of Tutankhamen, or even as recently as the start of the first century c.e.
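A toy simulation can show how quickly a common ancestor appears under random mating. It is a sketch in the spirit of the simplified first model, not a reproduction of the researchers' code. The assumptions are a fixed population size, non-overlapping generations, and two parents drawn uniformly at random (possibly the same one twice) for every child. The function name and parameters are illustrative.

```python
import random


def generations_to_common_ancestor(pop_size: int,
                                   rng: random.Random,
                                   max_generations: int = 10_000) -> int:
    """Generations until some founder is an ancestor of everyone alive.

    Each person carries a bitmask of the founders (generation 0) they
    descend from; a child's mask is the union of its two parents' masks.
    """
    if pop_size < 1:
        raise ValueError("population must contain at least one person")
    masks = [1 << i for i in range(pop_size)]  # founder i descends from itself
    for t in range(1, max_generations + 1):
        masks = [
            masks[rng.randrange(pop_size)] | masks[rng.randrange(pop_size)]
            for _ in range(pop_size)
        ]
        # A founder is a common ancestor if its bit is set in every mask.
        common = masks[0]
        for mask in masks[1:]:
            common &= mask
        if common:
            return t
    raise RuntimeError("no common ancestor within max_generations")


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so that runs are repeatable
    trials = [generations_to_common_ancestor(500, rng) for _ in range(20)]
    print(sorted(trials))
```

Repeating the trial many times yields a distribution of waiting times rather than a single answer, which is the essence of the Monte Carlo approach. A more realistic model would replace the uniform choice of parents with geography, migration, and changing population sizes.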

In 2008, the genealogical Web site Geni allowed people with common ancestors to merge trees. Privacy was maintained by defining a set distance from the ancestor within which the information would be viewable. This defined distance has changed over time. Trees often grew faster as they became larger. One particularly large connected component of Geni’s graph is known as the “big tree” and represented over 35 million people in 2010.
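Merging trees and tracking the largest connected component is a standard graph problem. The following is a minimal sketch using a disjoint-set (union-find) structure. The class, the person identifiers, and the links are hypothetical and do not reflect Geni's data model or any Geni interface.

```python
class DisjointSet:
    """Tracks which people belong to the same connected family graph."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        """Return the representative of x's component, adding x if new."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Record a family link between a and b, merging their trees."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller component to the larger
        self.size[ra] += self.size[rb]

    def largest_component_size(self) -> int:
        return max((self.size[self.find(x)] for x in self.parent), default=0)


# Hypothetical links between person identifiers in two separate trees.
graph = DisjointSet()
for a, b in [("a", "b"), ("b", "c"), ("d", "e")]:
    graph.union(a, b)
size_before = graph.largest_component_size()

graph.union("c", "d")  # a shared relative is found and the trees merge
size_after = graph.largest_component_size()

if __name__ == "__main__":
    print(size_before, size_after)
```

Each newly discovered link between two trees fuses their components, and a large component offers more people to whom a newcomer might be linked, which is consistent with large trees growing faster.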

Bibliography

Chang, Joseph. “Recent Common Ancestors of All Present-Day Individuals.” Advances in Applied Probability 31 (1999).

Lewis, Cathryn. “The Use of Graph Theory Techniques to Investigate Genealogical Structure.” Mathematical Medicine and Biology 9, no. 3 (1992).

McGuffin, Michael, and Ravin Balakrishnan. “Interactive Visualization of Genealogical Graphs.” Proceedings of IEEE Symposium on Information Visualization (InfoVis), 2005. http://profs.etsmtl.ca/mmcguffin/research.