African-American mitochondrial DNAs often match mtDNAs found in multiple African ethnic groups

Background Mitochondrial DNA (mtDNA) haplotypes have become popular tools for tracing maternal ancestry, and several companies offer this service to the general public. Numerous studies have demonstrated that human mtDNA haplotypes can be used with confidence to identify the continent where the haplotype originated. Ideally, mtDNA haplotypes could also be used to identify a particular country or ethnic group from which the maternal ancestor emanated. However, the geographic distribution of mtDNA haplotypes is greatly influenced by the movement of both individuals and population groups. Consequently, common mtDNA haplotypes are shared among multiple ethnic groups. We have studied the distribution of mtDNA haplotypes among West African ethnic groups to determine how often mtDNA haplotypes can be used to reconnect Americans of African descent to a country or ethnic group of a maternal African ancestor. The nucleotide sequence of the mtDNA hypervariable segment I (HVS-I) usually provides sufficient information to assign a particular mtDNA to the proper haplogroup, and it contains most of the variation that is available to distinguish a particular mtDNA haplotype from closely related haplotypes. In this study, samples of general African-American and specific Gullah/Geechee HVS-I haplotypes were compared with two databases of HVS-I haplotypes from sub-Saharan Africa, and the incidence of perfect matches recorded for each sample. Results When two independent African-American samples were analyzed, more than half of the sampled HVS-I mtDNA haplotypes exactly matched common haplotypes that were shared among multiple African ethnic groups. Another 40% did not match any sequence in the database, and fewer than 10% were an exact match to a sequence from a single African ethnic group. 
Differences in the regional distribution of haplotypes were observed in the African database, and the African-American haplotypes were more likely to match haplotypes found in ethnic groups from West or West Central Africa than those found in eastern or southern Africa. Fewer than 14% of the African-American mtDNA sequences matched sequences from only West Africa or only West Central Africa. Conclusion Our database of sub-Saharan mtDNA sequences includes the most common haplotypes that are shared among ethnic groups from multiple regions of Africa. These common haplotypes have been found in half of all sub-Saharan Africans. More than 60% of the remaining haplotypes differ from the common haplotypes at a single nucleotide position in the HVS-I region, and they are likely to occur at varying frequencies within sub-Saharan Africa. However, the finding that 40% of the African-American mtDNAs analyzed had no match in the database indicates that only a small fraction of the total number of African haplotypes has been identified. In addition, the finding that fewer than 10% of African-American mtDNAs matched mtDNA sequences from a single African region suggests that few African Americans might be able to trace their mtDNA lineages to a particular region of Africa, and even fewer will be able to trace their mtDNA to a single ethnic group. However, no firm conclusions should be made until a much larger database is available. It is clear, however, that when identical mtDNA haplotypes are shared among many ethnic groups from different parts of Africa, it is impossible to determine which single ethnic group was the source of a particular maternal ancestor based on the mtDNA sequence.


Background
The Atlantic slave trade resulted in the forced migration of an estimated 11 million Africans to the Americas. Only 9 million are thought to have survived the passage, and many more died in the early years of captivity. Historical accounts indicate that virtually all enslaved Africans brought to North America came from either West or West Central Africa. A recent comparison of mtDNA sequences from 1148 African Americans living in the US with a database of African mtDNA sequences showed that more than 55% of the US lineages have a West African ancestor, while fewer than 41% came from West Central or South West Africa [1]. In North America, different constellations of African groups were brought to various staging areas [2]. Among the important staging areas for the arrival and distribution of enslaved Africans were the ports of Savannah, GA and Charleston, SC. Estimates of the origin of enslaved Africans received at these sites are presented in Figure 1, with the largest African regional contributions coming from West Central Africa (40%; contemporary Angola, the Congos, Equatorial Guinea, and Gabon), and the West African regions of Senegambia (23%; contemporary Senegal, Gambia, and northern Guinea) and Upper Guinea (18%; contemporary Guinea, Sierra Leone, and northwestern Liberia). Africans in the Carolina coast region were intentionally mixed to reduce the possibilities for successful revolts and to facilitate their assimilation into plantation-slave society. The contemporary Gullah/Geechee culture emerged from these Africans.
Because mitochondrial DNA (mtDNA) is passed from mother to daughter with few, if any, changes occurring over many generations, it is possible to compare contemporary African-American mtDNA haplotypes with contemporary mtDNA haplotypes in a worldwide database to obtain information about the ancestral origins of these mtDNAs. In such a comparison, continent-specific haplotypes are readily observed, and the assignment of mtDNAs to continent of origin is relatively straightforward. The more difficult task is to tie particular mtDNA haplotypes to specific geographical regions and ethnic groups within a continent. This task is particularly difficult for Africa, as there is more genetic diversity among Africans than among people from any other continent and because humanity has resided in Africa longer than anywhere else.
Comparisons of individual mtDNA haplotypes could be used to identify a geographical region, particular country, or even an ethnic group from which a maternal ancestor emanated. However, the geographic distribution of mtDNA haplotypes is greatly influenced by the migration of individuals or population groups. These movements often result in the assimilation of people from other ethnic groups. Intermarriage also causes mtDNA haplotypes to move from one ethnic group to another. Over time, mtDNA haplotypes that originated in a single ethnic group are distributed among many ethnic groups. Despite these complications, mtDNA analyses for the purposes of ancestry reconstruction are increasing in popularity. Many people have had their mtDNA tested with the hope that the test will match their DNA to an mtDNA haplotype found in a particular ethnic group. For African Americans, who have been disenfranchised from their specific African roots, such a test might provide a clue about the ethnic group or country in Africa where one of their maternal ancestors originated. However, if identical mtDNA haplotypes are shared among many ethnic groups from different parts of Africa, it would be impossible to use DNA sequence information to determine which single ethnic group was the source of a particular maternal ancestor. To date, there are no published assessments that provide quantitative information about how often African-American mtDNAs are exact matches to multiple African ethnic groups. Therefore, we decided to compare samples of Carolina coast and other African-American mtDNAs to a database of sub-Saharan African mtDNAs to generate such an assessment.
Figure 1. Proportions of enslaved Africans brought to historic Carolina coast ports from the 17th to 19th centuries CE (from Jackson, 2004 [2]).

Database characterization
We assembled a database of 3645 mtDNA HVS-I sequences from the published literature and 80 additional sequences from our own (unpublished) studies of ethnic groups in Mali to generate a database of 3725 sequences. Only sequences from sub-Saharan Africa were included in the database, because North African mtDNAs are quite different from sub-Saharan mtDNAs [1] and few North American slaves are thought to have come from North African countries. Within the sub-Saharan database, more than 50% of the sequences were identical to a sequence from at least one other ethnic group. The remaining sequences either occurred multiple times within a single ethnic group or occurred only once in the database.
To provide a regional analysis of the database, samples were assigned to geographic regions as shown in Table 1 and Figure 2, and the percentages of within-region and among-region matches were determined. The West African region contributed 1528 (41%) of the sequences (Table 2). The sizes of the other regional groups ranged from 127 to 995. Overall, 40% of the sequences were present only once in the database or were found multiple times within a single ethnic group. In contrast, 24% of sequences were found in multiple ethnic groups from at least three geographical regions.
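The regional tallies described above could be reproduced along the following lines. This is only a minimal sketch: the record layout `(haplotype, ethnic_group, region)`, the function name `sharing_profile`, and the toy labels are assumptions made for illustration and are not taken from the study's database.

```python
from collections import defaultdict

def sharing_profile(records):
    """Return two fractions of database sequences:
    (found in a single ethnic group, found in multiple ethnic groups
    from at least three regions).

    records: list of (haplotype, ethnic_group, region) tuples, one per
    sampled individual. This layout is a hypothetical choice.
    """
    groups = defaultdict(set)   # haplotype -> ethnic groups carrying it
    regions = defaultdict(set)  # haplotype -> regions carrying it
    for hap, group, region in records:
        groups[hap].add(group)
        regions[hap].add(region)

    n = len(records)
    single_group = sum(1 for hap, _, _ in records if len(groups[hap]) == 1)
    widespread = sum(
        1 for hap, _, _ in records
        if len(groups[hap]) > 1 and len(regions[hap]) >= 3
    )
    return single_group / n, widespread / n

# Toy records with placeholder labels (not real data).
toy = [
    ("hapA", "group_1", "West"), ("hapA", "group_2", "West"),
    ("hapA", "group_3", "West Central"), ("hapA", "group_4", "East"),
    ("hapB", "group_1", "West"), ("hapB", "group_1", "West"),
    ("hapC", "group_5", "East"),
    ("hapD", "group_6", "West"), ("hapD", "group_7", "West"),
]
single_fraction, widespread_fraction = sharing_profile(toy)
```

Counting is done per sequence rather than per distinct haplotype, which matches the way the percentages in the text are expressed as shares of the database sequences.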
Two of the regional groupings, East and South, had an excess of sequences that were found in a single ethnic group, and a corresponding deficit of matches to sequences from multiple regions. This result is consistent with the idea that these two regions are dominated by samples that have low levels of the mtDNA haplotypes that are characteristic of the Bantu [4,5]. In contrast, the majority of mtDNA sequences from Mozambique in the Southeast region match sequences from multiple regions, and only a small percentage of these sequences are unique to ethnic groups from Mozambique, perhaps reflecting the fact that only Bantu speakers were sampled [5,6]. In support of this idea, most matches that include sequences from only two regions involve the West Central region that is believed to have been the original Bantu homeland [7].

Comparison of African-American samples with the sub-Saharan databases
Two African-American samples, a sample of African Americans who self-identified as Gullah/Geechee and a sample of African-American DNAs obtained from the Armed Forces DNA Identification Laboratory (AFDIL), were compared with both the original and the expanded databases to provide a sense of how increasing the database size impacts the distribution of exact matches. The Gullah/Geechee people are an African-American microethnic group residing in the Georgia/South Carolina Lowcountry and coastal islands whose numbers are now estimated between 200,000 and 500,000 in the Sea Islands of South Carolina, Georgia, North Florida, and beyond [8]. Gullah/Geechee language and culture include unique practices and artefacts (e.g., coiled basketry, Brer Rabbit stories, praise houses) including a distinct linguistic style with roots among the Mende peoples of Sierra Leone, West Africa. When a sample of 74 Gullah/Geechee mtDNA sequences was compared with the sub-Saharan database, approximately half of the mtDNAs were identical to two or more mtDNAs in the database and only seven mtDNAs matched mtDNAs from a single ethnic group (Table 3). The remaining 28 mtDNAs were not identical to any sequence in the expanded database.
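The Gullah/Geechee counts given above (74 sequences, 7 single-group matches, 28 unmatched) imply the size of the multiple-match class by subtraction. The short sketch below works through that arithmetic; the `classify` helper and the variable names are illustrative assumptions, not the authors' procedure.

```python
def classify(n_ethnic_groups):
    """Label a query mtDNA by how many ethnic groups it matches exactly."""
    if n_ethnic_groups == 0:
        return "no match"
    if n_ethnic_groups == 1:
        return "single ethnic group"
    return "multiple ethnic groups"

# Counts reported in the text for the Gullah/Geechee sample.
total = 74
single = 7
unmatched = 28
multiple = total - single - unmatched  # implied by subtraction

shares = {
    "multiple ethnic groups": round(100 * multiple / total, 1),
    "single ethnic group": round(100 * single / total, 1),
    "no match": round(100 * unmatched / total, 1),
}
print(multiple, shares)
# 39 {'multiple ethnic groups': 52.7, 'single ethnic group': 9.5, 'no match': 37.8}
```

The derived shares agree with the descriptions "approximately half", "fewer than 10%", and roughly 40% used elsewhere in the text.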
Similar results were obtained when the 97 African-American AFDIL mtDNAs were compared with the databases. Approximately half (49) of the mtDNAs were identical to multiple sequences in the original database (Table 3). As with the Gullah/Geechee sample, fewer than 10% of the sequences matched a sequence from a single ethnic group, and 40% of the sequences did not have any perfect match in the database.
When the unmatched AFDIL and Gullah/Geechee mtDNAs were combined and analyzed further, 63% differed from a database sequence at a single nucleotide position (Table 4). Nearly three-quarters of these imperfect matches were to sequences that were found in multiple ethnic groups. Thus, most of the imperfect matches appear to be derived from the common haplotypes by a single mutational event.
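An imperfect match that is one mutational step away from a database haplotype can be detected as sketched below. The sketch assumes each haplotype is stored as a set of differences from a reference sequence, written like `"16223T"`; that encoding and the function names are assumptions for illustration and may differ from how the authors stored their data.

```python
import re

def variant_positions(variants):
    """Extract the numeric positions from variant labels such as '16223T'."""
    return {int(re.match(r"\d+", v).group()) for v in variants}

def differs_at_one_position(hap_a, hap_b):
    """True when two haplotypes differ at exactly one nucleotide position.

    The symmetric difference holds every variant present in only one of
    the two haplotypes; it is reduced to positions so that two different
    bases at the same site count as a single difference.
    """
    return len(variant_positions(set(hap_a) ^ set(hap_b))) == 1

def near_match_groups(query, database):
    """Ethnic groups holding a haplotype one position away from the query.

    database: iterable of (ethnic_group, haplotype) pairs (hypothetical layout).
    """
    return {group for group, hap in database
            if differs_at_one_position(query, hap)}
```

For example, `{"16223T"}` and `{"16223T", "16278T"}` differ only at position 16278, so they would count as an imperfect match of this kind.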

Geographical distribution of database matches
The majority of African-American mtDNAs that were identical to database mtDNAs matched mtDNAs from ethnic groups that were scattered throughout sub-Saharan Africa. However, 41% of the Gullah/Geechee and 37% of the AFDIL mtDNAs that matched database sequences were identical to mtDNAs found only in western (West plus West Central) Africa (Table 5). Only one Gullah/Geechee mtDNA and one AFDIL mtDNA matched mtDNAs that are found exclusively in eastern Africa in the sub-Saharan database. This distribution of matches is consistent with the historical information that most North American slaves were originally from western Africa. Most of the single-region matches to both the Gullah/Geechee and the AFDIL mtDNAs occurred with West African samples (Table 6). This result is consistent with the historical records indicating that West Africa was a major source of American slaves, but it also probably reflects the fact that the West African samples made up 41% of the expanded database. Surprisingly, five AFDIL mtDNAs matched only mtDNAs from the two Angolan samples that make up 4% of the database. This result is consistent with historical records indicating that a large proportion of the enslaved Africans brought to the Americas came from the Angola/Congo region of West Central Africa, and suggests that ethnic groups in this region of Africa need to be sampled more extensively.
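The geographic categories used above ("western only", single-region matches) can be derived from the set of regions in which a query's exact matches occur. The following is a hedged sketch; the region labels follow the names used in the text, while the function name `summarize_regions` and the returned dictionary keys are illustrative assumptions.

```python
# "Western" Africa is defined in the text as West plus West Central.
WESTERN = {"West", "West Central"}

def summarize_regions(matched_regions):
    """Summarize where the exact matches of one query mtDNA were found.

    matched_regions: iterable of region names, one per matching ethnic
    group (duplicates are allowed and are collapsed).
    """
    regions = set(matched_regions)
    return {
        "matched": bool(regions),
        # True when every match lies in West and/or West Central Africa.
        "western_only": bool(regions) and regions <= WESTERN,
        # The region name when all matches fall in one region, else None.
        "single_region": next(iter(regions)) if len(regions) == 1 else None,
    }

print(summarize_regions(["West", "West Central", "West"]))
# {'matched': True, 'western_only': True, 'single_region': None}
```

Note that a match confined to West Africa alone counts both as "western only" and as a single-region match, which is why the two tallies in the text are reported separately.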
Figure 2. Map depicting the geographic locations and the regional groupings of the population samples used in this study.

Language group comparisons
Considering Africa's geographical size and population density, and the duration of human residence on this continent, linguistic diversity at the taxonomic level of family is amazingly low. This low level of linguistic diversity is probably the consequence of protracted mobility and interaction among Africa's indigenous groups, facilitated by the longstanding presence of such organized political-social units as kingdoms and empires and such sociocultural practices as polygamy.
Among the AFDIL sequences with more than five matches to various African ethnic groups, most language diversity was within the various subfamilies of the Niger-Congo family. These subfamilies include Atlantic Congo (e.g., the ethnic groups Fula, Yoruba, Wolof, Balanta) and Mande (e.g., the ethnic groups Mandingo, Mende, Bambara). However, in some of the sequence matches, different linguistic families were represented altogether, including the Afro-Asiatic (e.g., the Tuareg ethnic group) and Nilo-Saharan (e.g., the Dinka ethnic group) families, along with members of the Niger-Congo family.

Discussion
Because only a small fraction of the sub-Saharan African ethnic groups have been sampled, and there are parts of sub-Saharan Africa that are poorly represented in our database (Figure 2), our database cannot be considered a representative subset of the sub-Saharan mtDNA gene pool. Nevertheless, it is clear that a much larger database is needed since 40% of the African-American samples analyzed have no exact match in our database. The extensive sharing of mtDNA haplotypes among ethnic groups from different regions of Africa is consistent with the historical evidence of extensive migration and mixing of African ethnic groups. Indeed, the well-documented Bantu migrations appear to have had a major impact [4], as have the formation of the historic empires and kingdoms of the region (such as the historic empires of Ghana, Mali, and the Songhai, Bakongo, and Ashanti Kingdoms). Despite the limitations of our database of sub-Saharan mtDNA sequences, it is likely that we have identified the most common haplotypes found in this region. Some are found throughout the region that includes the Bantu migrations, and others are found primarily in either the western or the eastern parts of the continent. We intend to continue to increase the size of our database, because a significantly larger database would provide more information about haplotypes that are present at lower frequencies than the most common haplotypes. Some of these lower-frequency haplotypes are likely to be shared among widely
Another way to assess our sub-Saharan mtDNA database would be to see how well African-American mtDNAs match database sequences. Historical accounts of the trans-Atlantic slave trade indicate that most North American slaves came from the western coast of Africa, including the geographical regions from present-day Angola to Senegal.
When African-American mitochondrial DNA HVS-I sequences were studied, nearly half were identical to those from two or more African ethnic groups in our expanded database. Furthermore, the average number of perfect matches per matching African-American mtDNA increased from 3.6 different ethnic groups to 6.1 different ethnic groups when the size of the database was increased by 53% to its present size of 3725 sequences. These results reflect the fact that approximately half the mtDNA sequences in our sub-Saharan database are shared by members of three or more ethnic groups.
In both of the African-American samples, approximately 40% of the mtDNA sequences did not match any sequence in any other ethnic group (Table 3). However, more than half of these sequences differed from multiple database sequences at a single position (Table 4). Because it is unlikely that more than a few of these differences result from new mutations that occurred in North America or that more than a few lineages went extinct in Africa after being introduced to the New World, this result suggests that only a small fraction of the mtDNA diversity present in sub-Saharan Africa has been sampled, and that much of the unsampled diversity is due to single mutations that have occurred in the common haplotypes.
Many African Americans are interested in learning more about their African roots and are willing to pay to have their mtDNA analyzed in the hope that it will match DNA from a particular African ethnic group. However, as more than half of the mtDNA sequences in the African database are identical to sequences from other ethnic groups, African-American mtDNAs will be much more likely to match sequences from multiple ethnic groups than sequences from a single ethnic group. When this result is coupled with the fact that 40% of African-American mtDNAs did not match any sequence in the database, it is clear that matches to a single African ethnic group will not be the outcome for most African Americans, and even when a match to a single ethnic group is obtained, multiple matches may occur in a larger database. Furthermore, for the typical African American, the maternal ancestor who was the source of the mtDNA was just one of hundreds of enslaved African ancestors. In fact, it is likely that there has been more mixing of African ethnic groups in the Americas than has ever occurred elsewhere. Thus, the ancestors of virtually all contemporary African Americans came from a large number of ethnic groups located throughout the region from Senegal to Angola.

Conclusion
Half of the sub-Saharan mtDNA sequences in our database are common haplotypes that are shared among ethnic groups from multiple regions of sub-Saharan Africa. The finding that fewer than 10% of African-American mtDNAs matched mtDNA sequences from a single African region suggests that as few as one in nine African Americans may be able to trace their mtDNA lineage to a particular region of Africa. However, no firm conclusions should be made until a much larger database is available. It is clear, however, that nearly half of contemporary African-American mtDNAs are identical to African haplotypes that are found in multiple ethnic groups throughout sub-Saharan Africa. For these mtDNAs, it is impossible to use only mtDNA sequence information to determine which single ethnic group was the source of the maternal ancestor.

African-American samples
A sample of 78 African Americans who self-identified as Gullah/Geechee was generated by our laboratories from unrelated people sampled in the coastal areas of South Carolina and Georgia using either cheek swabs or mouthwash to collect buccal cells. DNA was isolated using a BuccalAmp DNA Extraction Kit (Epicentre, Madison, WI) for the cheek swabs or a DNAzol procedure (Molecular Research Center, Cincinnati, OH) for the mouthwash samples. The HVS-I region was amplified and sequenced as described previously [3]. Those mtDNAs with non-African haplotypes, three with Native American haplotypes (two haplotype B and one haplotype A2) and one with European mtDNA (haplotype H), were excluded from further analysis (Table 9). A second sample of 104 African-American mtDNA sequences was obtained from Tom Parsons at the Armed Forces DNA Identification Laboratory. In this sample, mtDNAs with non-African haplotypes (five haplotype H, one haplotype J, and one haplotype U4) were excluded.
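The exclusions described above account for the sample sizes analyzed in the comparisons (74 Gullah/Geechee and 97 AFDIL sequences). The brief check below simply restates that arithmetic; the variable names are illustrative.

```python
# Sequences collected, and non-African haplotypes excluded, per sample,
# using the counts reported in the text.
gullah_sampled = 78
gullah_excluded = {"B": 2, "A2": 1, "H": 1}

afdil_sampled = 104
afdil_excluded = {"H": 5, "J": 1, "U4": 1}

gullah_analyzed = gullah_sampled - sum(gullah_excluded.values())
afdil_analyzed = afdil_sampled - sum(afdil_excluded.values())

print(gullah_analyzed, afdil_analyzed)  # 74 97
```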

Database assembly
A database of 3725 mtDNA HVS-I sequences from people living in sub-Saharan Africa was assembled from the published literature in October 2005 (Table 7) with the addition of 80 new mtDNA sequences from people belonging to the Malinke and Bambara ethnic groups in Mali (Table 8). DNA from these latter samples was isolated using a BuccalAmp DNA Extraction Kit (Epicentre, Madison, WI) from cheek swabs obtained from unrelated volunteers. MtDNA HVS-I sequences from two African-American population samples were then compared with these databases to determine how often individual HVS-I sequences are identical to African HVS-I sequences in the databases. For these comparisons, only sequences from 16030 to 16420 were considered, and both insertions and differences at positions 16182 and 16183 were ignored. In addition, a change to 16390A was inferred for all L2 haplogroup sequences that did not include this mutation. No attempt was made to correct any other errors that might be present among the published sequences. However, the presence of sequencing errors would have the effect of reducing the incidence of perfect matches so that the frequencies of perfect matches we observe should be considered minimum estimates. Matches to multiple individuals within an African ethnic group were considered a single match. Sequences included in the databases are available from Bert Ely.
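The comparison rules described above can be expressed as a small normalization and matching routine. This is a sketch under stated assumptions rather than the authors' actual software: haplotypes are assumed to be stored as sets of differences from a reference sequence (for example `"16223T"`), insertions are assumed to use dotted notation (for example `"16193.1C"`), and the haplogroup label is assumed to be supplied with each sequence. The function names are hypothetical.

```python
import re

RANGE_START, RANGE_END = 16030, 16420   # positions considered
IGNORED_POSITIONS = {16182, 16183}      # differences here are ignored

def normalize(variants, haplogroup=""):
    """Apply the comparison rules to one haplotype (a set of variant labels)."""
    kept = set()
    for variant in variants:
        if "." in variant:               # assumed insertion notation: skip
            continue
        match = re.match(r"\d+", variant)
        if match is None:                # not a position-based label: skip
            continue
        position = int(match.group())
        if not RANGE_START <= position <= RANGE_END:
            continue
        if position in IGNORED_POSITIONS:
            continue
        kept.add(variant)
    if haplogroup.startswith("L2"):      # 16390A inferred for all L2 sequences
        kept.add("16390A")
    return frozenset(kept)

def matching_groups(query, database):
    """Ethnic groups with at least one haplotype identical to the query.

    database: iterable of (ethnic_group, normalized_haplotype) pairs.
    Returning a set means several identical individuals within one ethnic
    group count as a single match, as stated in the text.
    """
    return {group for group, hap in database if hap == query}
```

A query and every database entry would be passed through `normalize` before comparison, so that both sides are trimmed and corrected in the same way.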