The Little Scottish Cluster is a group of families who share a recent Y-DNA genetic relationship. The most recent common ancestor (MRCA) of this group lived in Scotland over a millennium ago. This website is a place for the members of this cluster to come together, share information, and trace their common roots. There is also a Little Scottish Cluster project at FTDNA, where we hope most of our data collection will take place in the future.

Introduction

The founding member of this cluster lived over a thousand years ago, before the formal adoption of surnames. His descendants now carry a wide variety of surnames. Some of these surnames were adopted early on and have a large number of members, while others were adopted much later, leading to much smaller families. Some of these surnames have remained in their native Scotland, while many of them have spread to Ireland, England and beyond.

Within each of the surname groups, the members of this cluster should be able (at least in principle) to trace their ancestry back to the first member of the Little Scottish Cluster to adopt that surname. This assumes that the surname was only adopted by one member of the cluster, and not independently by other members as well. As this cluster is fairly small, however, multiple origins for a single surname are not a likely scenario.

Conventional paper genealogy, as well as analysis of each member's haplotype (results from genetic testing), can be used to reconstruct each surname's family tree. Paper genealogy can of course only be used as far back as paper records exist; beyond that, we have only our genetic heritage to rely on.

The illustration below gives an idea of what techniques can be used to trace our genealogy ever further back in time, all the way to the beginning. The years given in the diagram are very approximate and are only used to give a rough idea of the time period.

Timeline

The purpose of this website is to reach beyond your family surname and to study those families to whom you are most closely related. Assuming you are a member of this cluster, an issue we will discuss in more detail below, we have only genetic genealogy and the occasional historical record to guide us in understanding our larger family. By first reconstructing the ancestral haplotypes for each of the members' various surnames, and then fitting these pieces together, we can hopefully reconstruct our larger family tree.

How do we identify cluster members?

Ideally, our common ancestor would have possessed a SNP (single-nucleotide polymorphism) uniquely identifying him among all other men. This SNP would be passed on to all his male descendants, their male descendants, and so on. A simple test for this SNP would indicate whether you descend from this man, and thus whether you are a member of this cluster. Unfortunately, no such SNP has yet been found, but we can keep our fingers crossed for the future.

Instead, we must rely on comparing any candidate's haplotype (sequence of STR markers) with the haplotypes of the other cluster members. (A comparison of haplotypes is how this cluster was first discovered.) It is not sufficient that the candidate's sequence be within a certain 'genetic distance' of another cluster member. Instead, the candidate's haplotype should be consistent with what we would expect of a descendant of the cluster's founder. In particular, we would expect to find any "peculiarities" of the founder's haplotype in the candidate's haplotype.

If one were to compare the haplotypes of the members of our cluster with typical R1b values, say the WAMH (Western Atlantic Modal Haplotype), then one would find that the members shared a common set of markers whose values differed from the typical R1b values; these are our characteristic markers. They are the "peculiarities" that have been passed down from our cluster's founder. (See ySearch ID 55GU9 for the R1b modal value.)

Table illustrating the characteristic markers.

The complete list of characteristic markers is summarized in the table below:

| Marker | Typical R1b value | Typical cluster value(s) |
|---|---|---|
| DYS 391 | 11 | 10 |
| DYS 458 | 17 | 18 |
| DYS 449 | 29 | 30 |
| DYS 464 | 15-15-17-17 | 13-15-17-X |
| GATA H4 | 11 | 10 |
| DYS 590 | 8 | 9 |
| DYS 413 | 23-23 | 22-23 |

Group members tend to be different from the modal R1b values for the above markers. For DYS 464, X tends to take on the values 13, 16, 17 and 18. (All values assume FTDNA conventions.)

Of course, not all group members will have all of these characteristic values. Mutations can happen to any marker, even the characteristic ones. Thus each member will also have markers whose values differ from both the typical R1b values and the characteristic values. There is one marker in particular, however, DYS 590, that has an exceptionally low mutation rate, perhaps even the lowest of the 67 FTDNA markers. The vast majority of the members of R1b have the value of 8 repeats for this marker. Members of our cluster, however, have the value 9. To date, no member of our cluster has been identified who does not have the value 9.

If you have tested only 25 markers and match the characteristic markers above, then the results only suggest that you are a member. The more markers tested, the more certain you can be. At 37 markers, if you match, then it is very likely that you're a member. At 67 markers, if you fit fairly well with the characteristic markers and have DYS 590 = 9, then you are definitely a part of this cluster.
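
As a rough illustration of the comparison described above, the sketch below counts how many of the characteristic markers a candidate matches and checks DYS 590 separately. The function names and the candidate's values are hypothetical, DYS 464 is left out because its fourth value varies between members, and no cut-off for "fits fairly well" is implied; that judgment is left to the reader.

```python
# Characteristic cluster values taken from the table above (FTDNA conventions).
# DYS 464 is omitted because its fourth value (X) varies between members.
CHARACTERISTIC = {
    "DYS 391": "10",
    "DYS 458": "18",
    "DYS 449": "30",
    "GATA H4": "10",
    "DYS 590": "9",
    "DYS 413": "22-23",
}


def characteristic_matches(haplotype):
    """Return (matched, tested) counts for the characteristic markers.

    `haplotype` maps marker names to values as strings; markers that were
    not tested are simply absent from the dict.
    """
    tested = [m for m in CHARACTERISTIC if m in haplotype]
    matched = [m for m in tested if haplotype[m] == CHARACTERISTIC[m]]
    return len(matched), len(tested)


def has_dys590_signature(haplotype):
    """True if DYS 590 was tested and shows the cluster value of 9."""
    return haplotype.get("DYS 590") == "9"


# Hypothetical candidate, for illustration only.
candidate = {
    "DYS 391": "10", "DYS 458": "18", "DYS 449": "29",
    "GATA H4": "10", "DYS 590": "9", "DYS 413": "22-23",
}
matched, tested = characteristic_matches(candidate)
print(f"{matched} of {tested} characteristic markers match")
print("DYS 590 = 9:", has_dys590_signature(candidate))
```

A candidate who has not tested a given marker is neither helped nor penalised here, which mirrors the point that fewer markers give a less certain answer.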

When fewer than 67 markers are tested, there are other groups whose members' haplotypes are very similar to our own, in particular Irish Type III. Our most striking feature at the 25 and 37 marker level, that of having 13-15-X-17 for the DYS 464 markers, is also shared by them, where it is almost always 13-13-15-17. They, however, tend to have DYS 459 values of 8-9, where we have the typical R1b values of 9-10 (although several members have 9-9). They also tend to have DYS 391 equal to 11, rather than our characteristic value of 10.

Haplogroup

Haplogroups have a more ancient origin than the founder of this cluster. They are defined in terms of SNP mutations that happened thousands of years before the MRCA for our cluster. As such, each member of this cluster must belong to the same haplogroup; the results for one member apply equally to all members and to the cluster as a whole.

Various members of the cluster have been tested to determine their haplogroup. In the 2007 nomenclature, this haplogroup was originally given as R1b1c. Since that time, several new SNPs have been discovered, and the naming conventions have changed. On November 7, 2008 Steven Colson, a cluster member, tested positive for the SNP L21, making our cluster's haplogroup R-L21*. In August of 2011, Clif Hinds and Frank Boggs tested positive for DF21, making our cluster's haplogroup R-DF21. For those of you tested L21+, I encourage you to join the R-L21 Project at FTDNA. For those of you who have tested DF21+, I encourage you to join the R-DF21 Project at FTDNA.

To clarify the nomenclature, R-L21* indicates that we are members of the R clade and that we have tested positive for the SNP L21. The * indicates that we do not belong to any known sub-clades of R-L21; in particular, we do not belong to R-M222. Older style haplogroup designations, such as R1b1c, change frequently as new SNPs are discovered. I will not attempt to keep up with these designations. At one point in time during 2009, ISOGG would have characterized our haplogroup as R1b1b2a1b6*, but our current FTDNA designation is R1b1a2a1a1b4.

A large diagram showing a possible structure of R-L21 with various clusters labeled can be found here. The data for this diagram came from a spreadsheet maintained by Mike Walsh for the Yahoo R-L21 project, and was generated by the PHYLIP software package. Among the various clusters that have been labeled are the Little Scottish Cluster, Irish Type III, and the Scots cluster discussed above.

Member Families of the Cluster

At present, there are about 60 different surnames known to have members included in this cluster. Some surnames, such as Kilgore, Sloan, Boggs and Munro, have a large number of members who are also members of our cluster. Other surnames have only a few members. There are many factors that affect the number of members with a specific surname.

  • Some surnames may have been established earlier than the others, allowing them more time to grow in size.
  • Some families, perhaps due to economic/social circumstances, may have been better positioned to grow in size.
  • Some surname projects are perhaps just over-sampled, and a large number of members does not accurately reflect the population, relative to other families.

Below is a table summarizing the various families that are a part of our cluster. The column labelled "# markers" in the table below indicates the minimum number of markers that have been tested by a member of that family. Families with a "67" will have members that have had at least 67 markers tested (the FTDNA 67 marker test), and are confirmed to have DYS 590 = 9.

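| Surname | # of members | Website (Subgroup) | # markers | Notes | Gp |
|---|---|---|---|---|---|
| Kilgore | 30 | Groups 1, 2, 3 and 4 | 67 | 2 of the "Kilgore Unassigned" are likely members as well | A |
| Sloan | 29 | R-L21 | 67 | | A |
| Boggs | 22 | Boggs, Lineage II | 67 | | C |
| Munro | 19 | No Specific Subgroup | 67 | | A |
| Stark | 13 | Group 2 | 67 | | C |
| Hall | 11 | Unassigned Members | 67 | There are several SMGF members | A |
| Crockett | 9 | Subgroup Five | 67 | | B |
| Walker | 9 | Group GR-14 | 67 | | A |
| Doig | 8 | Those with the green "Match" | 67 | There are two entries on ySearch. | B |
| McCorkle | 8 | Group 1 | 67 | | A |
| Akin | 7 | Lineage II | 67 | 1 member in 'Haplogroup R1b - Not yet assigned to a Lineage' may be Group B | A |
| McClaren | 6 | No Specific Subgroup | 67 | | B |
| Thornton | 6 | Group South Carolina A | 37 | | ? |
| Williamson | 6 | E Sub group | 67 | | A |
| Gilreath | 4 | Galbraith - Group 6 | 37 | | ? |
| Mitchell | 3 | Hap R1b Group 04 | 67 | | A |
| Ewing | 3 | Group 4b | 67 | | B |
| McGuire | 3 | 1790 Fayette Ky | 37 | | ? |
| Wilson | 3 | No website | 67 | All three members have ySearch entries. | A |
| Ferguson | 3 | R1b | 67 | | B |
| Savage | 2 | No Specific Subgroup | 67 | | B |
| Colson | 2 | Part of R1b | 67 | | A |
| Livingston | 2 | Boggs | 67 | | C |
| Russell | 2 | 08.Lineage IV | 67 | | A |
| McLaren | 2 | No Specific Subgroup | 67 | | B |
| Morrison | 2 | Group Y | 67 | | A |
| Shaw | 2 | Shaw Lineage III | 67 | | A |
| Hodge(s) | 2 | No Lineage Assigned | 37 | | ? |
| Martin | 2 | No website | 37 | One SMGF and one ySearch member | ? |
| Gilmore | 2 | Group AB | 37 | | ? |
| McGregor | 2 | MacGregor distant | 37 | | ? |
| Thompson | 2 | Group 14 (R-M269) | 67 | | A |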

This table only includes families that have at least two participants who are members of our cluster. There are many more families with just one member at present. If I have missed a website for your family, or you wish to add others, please let me know. I look for new members from time to time, but have no way of catching them all. Please feel free to e-mail me if you find any new members, or if you run a surname project and a new member enters your project. There is an Excel (2007) file, maintained semi-occasionally, containing a list of these members as well as their haplotypes.

How are we all related?

This is a challenging question, and a question this website is here to answer. Unfortunately, we don't yet have the answer. With traditional paper genealogy, we can typically trace back our families several hundred years. But, when the records run out, the genealogy stops. Genetic genealogy does not suffer this limitation of time; the records however are far more cryptic, and can only hint at past relationships.

There are many ways of analyzing the genetic data to extract relationships. Certainly, one of the easiest and most powerful, but also time-consuming and tedious, is to just stare at the data. With time, various patterns will emerge, and you'll have your first hints at our relationships. Computers of course offer us other alternatives.

Below is an early diagram of our cluster generated using the PHYLIP computer software. This software takes a matrix of the genetic distance between all the pairs of members of our cluster, and attempts to piece them together. In this diagram, time runs to the right. The separation between the vertical grey lines represents a duration of 10 generations, or about 250 years.

Tree Diagram
The separation between each grey line represents a duration of 10 generations or about 250 years.
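
To make the idea of a pairwise distance matrix concrete, here is a minimal sketch. The text above does not say which distance measure was fed to PHYLIP, so the step-wise measure used here (the sum of absolute repeat-count differences over the markers both men tested) is an assumption, and the three members and their values are hypothetical.

```python
from itertools import combinations


def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over markers both men tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def distance_matrix(haplotypes):
    """Build a symmetric matrix (dict of dicts) of pairwise distances."""
    names = sorted(haplotypes)
    matrix = {n: {m: 0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = genetic_distance(haplotypes[x], haplotypes[y])
        matrix[x][y] = matrix[y][x] = d
    return names, matrix


# Hypothetical members with only three markers each, for illustration.
members = {
    "member_1": {"DYS 534": 15, "DYS 487": 13, "DYS 385b": 14},
    "member_2": {"DYS 534": 16, "DYS 487": 13, "DYS 385b": 14},
    "member_3": {"DYS 534": 16, "DYS 487": 14, "DYS 385b": 15},
}
names, matrix = distance_matrix(members)
for name in names:
    print(name, [matrix[name][other] for other in names])
```

A matrix like this would then have to be written out in whatever input format the tree-building program expects; that step is not shown.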

There are very large (about 40%) uncertainties in the estimates of the number of generations between each pair of men. There is about a 100% uncertainty in the estimates of the mutation rates used to calculate that number of generations. As a result, we can only draw very broad conclusions from the diagram.

  • The cluster of markers containing the surnames Boggs, Brown (Livingstone), and Walters is definitely distinct from the rest of the cluster. This distinctiveness is in fact fairly robust, and remains even when further members are added.
  • As can be seen, the "depth" of this diagram is about 6 grey lines or about 60 generations (1500 years). This is the age of our most recent common ancestor (MRCA). It is important to keep in mind that this is a very rough estimate, likely only accurate within 500 years or so.

Another very popular style of diagram is the median-joined network. Below is a beautiful diagram recently (Summer 2009) produced by Steven Colson using Fluxus software.

Network Diagram
Median-joined network diagram of the cluster.

Although line lengths do not have a direct correspondence with genetic separation, you can see that the diagram contains many of the same features, but also introduces several new ones. The large group of members around the Boggs surname is common to both diagrams. New, however, is another fairly large group of names that lie upstream of the Boggs group.

Although I am very much enamoured with the above analyses (in particular the latter diagram), there are several opportunities for improvement:

  • Strictly speaking, there really isn't enough information in the haplotypes to accurately determine the exact relationship of the members. Some aspects of the above trees may in fact be accurate, but others are accidental and just a reflection of the randomness of the mutations that have taken place over the centuries. Two markers which just happened to have mutated in a similar manner may make two distantly related individuals seem more closely related than they really are.
  • These trees are not particularly stable; instead of refining the existing tree, the addition of more members can shift around the branches of the tree as the algorithms try to find a more perfect fit.
  • They fail to make use of information from several valuable sources:
    1. They do not use the fact that men with the same surname are likely more closely related than those with different surnames. This may not actually be that important a point, as the above diagrams seem to suggest that anyway.
    2. They fail to make use of 12, 25, and 37 marker data, which is in fact more plentiful than the 67 marker data. Although it's possible to find algorithms that use this additional information, these plots do not.
    3. They fail to make use of any of the hard work of traditional genealogy in determining relationships in the last several centuries.
    Failure to include this information in no way invalidates these diagrams, but one might expect to do better if it were somehow included.

The next section aims to address some of these criticisms, but will fall far short. This is a program for the future. The best way to proceed may be to split the problem into two pieces. First, the surname projects could attempt to reconstruct each of their respective family trees, and in particular, their ancestral haplotypes. This reconstruction can make use of both traditional and genetic genealogy. Second, examination of the ancestral haplotypes can identify patterns which can at least partially reassemble our larger family tree in a more stable manner. As far as reconstructing our larger family tree is concerned, each surname's ancestral haplotype (including when and where its bearer lived) encapsulates all the pertinent information; what happened to each surname afterwards is of no relevance.

Future Work

What can be done to advance things?

If we follow the approach outlined above (which certainly may not be the best approach), then both this cluster as a whole and each surname project have important work to do.

Although the ancestral haplotype of each surname project isn't known as yet, it is possible to reconstruct the ancestral haplotype of our entire cluster with fair certainty. For the FTDNA 67 markers, the ancestral haplotype is:

Ancestral Haplotype of the Cluster
Ancestral haplotype of the cluster. The coloured values indicate variations from typical R1b values.

The most uncertain value in the above haplotype is for CDY a. The typical R1b value is 36, while the value for our cluster is most likely either 37 or 38 (as given in the table). 38 happens to be the modal value, while 37 may give a more balanced tree. A better understanding of the ancestral surname haplotypes should resolve this issue in the future.
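
Since the modal value is mentioned above, here is a small sketch of how a per-marker modal haplotype could be computed from a set of members. It is only an illustration: the text does not say that the ancestral haplotype was built purely from modal values, the sample data are hypothetical, and ties are resolved arbitrarily in favour of the value encountered first.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common value for each marker across the haplotypes.

    `haplotypes` is a list of dicts mapping marker name to repeat count.
    Markers missing from a haplotype are ignored for that member.
    """
    markers = set().union(*(h.keys() for h in haplotypes))
    modal = {}
    for marker in sorted(markers):
        counts = Counter(h[marker] for h in haplotypes if marker in h)
        modal[marker] = counts.most_common(1)[0][0]
    return modal


# Hypothetical members, for illustration only.
sample = [
    {"CDY a": 38, "DYS 590": 9},
    {"CDY a": 38, "DYS 590": 9},
    {"CDY a": 37, "DYS 590": 9},
    {"CDY a": 36},
]
print(modal_haplotype(sample))
```

For a fast-mutating marker such as CDY a the mode can be a poor guide to the ancestral value, which is why the text treats 37 as a serious alternative to the modal 38.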

As mentioned above, each surname's ancestral haplotype is not available yet, but some educated guesses are possible. The table below summarizes the possible ancestral haplotype for various surnames and gives a possible time period during which the common ancestor may have lived. Each haplotype is described in terms of how it deviates from the cluster's ancestral haplotype given above.

| Surname | Haplotype | (# mutations)/(# markers) | Age | Year |
|---|---|---|---|---|
| Kilgore | DYS 449 = 29/30, DYS 464d = 18, CDY b = 38/39, DYS 520 = 21, DYS 617 = 13 | 59/853 = 7.07% | 25.2 | 1320 |
| Sloan | CDY b = 39, DYS 537 = 11 | 77/1128 = 6.83% | 24.3 | 1340 |
| Munro | DYS 385b = 15, DYS 464c = 16, DYS 576 = 17/18, DYS 570 = 17/18, CDY a = 37/38, DYS 568 = 9? | 35/595 = 5.88% | 20.1 | 1425 |
| Hall | DYS 458 = 17, DYS 459b = 9, DYS 464d = 18, DYS 456 = 15/16, CDY b = 39 | 27/498 = 5.42% | 19.3 | 1470 |
| Boggs | DYS 385b = 15, DYS 458 = 19, CDY a = 36, CDY b = 37, DYS 534 = 16, DYS 487 = 14 | 31/1113 = 2.78% | 9.9 | 1700 |
| Stark | DYS 385b = 15, DYS 456 = 15, CDY a = 36, CDY b = 36, DYS 534 = 17, DYS 487 = 14 | 11/404 = 2.72% | 9.7 | 1710 |
| Williamson | DYS 576 = 19, DYS 537 = 11 | 8/300 = 2.67% | 9.5 | 1710 |
| Walker | DYS 439 = 14, DYS 464c = 16, DYS 570 = 17 | 9/347 = 2.59% | 9.3 | 1720 |

Ancestral haplotypes of various surnames in the cluster. 'Age' is given in terms of generations, and this is translated into an approximate time period based on 25-year generations in 'Year'.

The method used to determine each surname's age is not particularly sophisticated. It assumes that haplotypes from older surnames have simply had more time to accumulate mutations since their founding, and so will have a greater number of mutations per marker tested. An average mutation rate of 0.003 mutations per marker per generation is assumed, and a 7% correction is applied because of the possibility of mutations reverting back to their previous value. One should not read too much into these ages. There is a large uncertainty in the average mutation rate, but what's worse, the average mutation rate may vary from surname to surname.
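
The calculation can be sketched as follows. The mutation rate, the 7% correction and the 25-year generation come from the text; applying the correction as a simple multiplier, and counting back from a reference year of about 1950, are assumptions made here because they roughly reproduce most rows of the table (for example about 24.3 generations and a date near 1340 for Sloan's 77 mutations in 1128 markers).

```python
MUTATION_RATE = 0.003         # mutations per marker per generation (from the text)
BACK_MUTATION_FACTOR = 1.07   # 7% correction; used as a multiplier (assumption)
YEARS_PER_GENERATION = 25     # from the table caption
REFERENCE_YEAR = 1950         # assumption: roughly matches the 'Year' column


def age_in_generations(mutations, markers):
    """Estimate the age of a group from the fraction of markers that mutated."""
    fraction = mutations / markers
    return BACK_MUTATION_FACTOR * fraction / MUTATION_RATE


def founding_year(mutations, markers):
    """Translate the age in generations into an approximate calendar year."""
    age = age_in_generations(mutations, markers)
    return REFERENCE_YEAR - YEARS_PER_GENERATION * age


# Sloan row of the table: 77 mutations observed over 1128 marker results.
print(round(age_in_generations(77, 1128), 1), "generations")
print("about the year", round(founding_year(77, 1128)))
```

Because the age scales inversely with the assumed mutation rate, any uncertainty in that rate carries straight through to the estimated date.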

Age in Generations

The role of the Surname DNA Projects cannot be overestimated. Traditional paper genealogy is invaluable for piecing together trees and figuring out ancestral surname haplotypes. If this cluster is about 1500-2000 years old, then about half of the mutations that have occurred along the various lines will have happened since the adoption of surnames. Thus the Surname DNA Projects will give us a better glimpse at what our ancestors' DNA looked like some 800 years ago. It would be much easier to piece together these much older ancestral pieces than the recent ones. The more members the surname DNA projects have, the better their common ancestral DNA can be reconstructed. Moreover, traditional genealogy will add greatly to determining when and where this ancestor lived.

Analysis first steps

As can be seen from the diagrams above, there is some stable structure present in our cluster. Several of the members do group more closely to each other than to the other members. Rather than reproduce the above trees in their full glory, we can try to be much more cautious, and simply divide our cluster into three separate groups (A, B & C). These groups manifest themselves not only in the trees above, but also in their haplotypes; a simple pattern of mutations separates these groups. Although it may be possible to further divide our cluster, and in the future we aim to do so, this breakdown is a conservative first step that is likely to last should an increased number of markers be tested.

The diagram below gives the definition of these three groups in terms of mutations. Group A comprises members with the ancestral values of 15, 13, and 14 for the markers DYS 534, DYS 487 and DYS 385b respectively. At some early point in our cluster's history, the value of DYS 534 mutated from 15 to 16 for the founder of group B. His descendants would pass on that mutation. Similarly, Group C was founded by an individual who had both DYS 487 = 14 and DYS 385b = 15.

Diagram showing subclusters
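
The group definitions can be written as a simple rule, sketched below. The function name is hypothetical, and the choice to test DYS 487 before DYS 534 (so that a Group C member is not labelled B) is an assumption of this sketch rather than something stated in the diagram. DYS 385b is not used here, and individual members can of course carry later mutations that such a rule will misread.

```python
def assign_group(haplotype):
    """Assign a 67-marker haplotype to Group A, B or C, or '?' if unclear.

    `haplotype` maps marker names to integer repeat counts.
    """
    dys534 = haplotype.get("DYS 534")
    dys487 = haplotype.get("DYS 487")
    if dys534 is None or dys487 is None:
        return "?"  # these markers are needed to decide
    if dys487 == 14:
        return "C"  # the mutation associated with the founder of Group C
    if dys534 == 16:
        return "B"  # DYS 534 mutated from 15 to 16
    if dys534 == 15 and dys487 == 13:
        return "A"  # ancestral values
    return "?"


# Hypothetical haplotypes, for illustration only.
print(assign_group({"DYS 534": 15, "DYS 487": 13, "DYS 385b": 14}))
print(assign_group({"DYS 534": 16, "DYS 487": 13, "DYS 385b": 14}))
print(assign_group({"DYS 534": 16, "DYS 487": 14, "DYS 385b": 15}))
```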

There are several methods of estimating the age of each of these groups. One can look at the diversity of haplotypes in each of these groups. This diversity is due to mutations which have taken time to accumulate; hence the greater number of mutations present, the older the group. This is the technique that was used for the surnames above. In November 2009 this technique was used and the following dates found:

| Group | (# mutations)/(# markers) = fraction | Age (gen.) | Year |
|---|---|---|---|
| Full Cluster | 1038/9452 = 10.98% | 39.2 | 970 |
| Group B (including C) | 281/3444 = 8.16% | 29.1 | 1220 |
| Group C | 93/1919 = 4.85% | 17.3 | 1520 |

Age of the full cluster as well as Groups B and C. The fraction of markers that have mutated, the corresponding age in generations, and the estimated start date are given.

One could also consider the most diverse pairs of individuals from each group, and calculate the age of their MRCA. The oldest such age would also be an estimate of the age of the group.
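
A minimal sketch of this pairwise approach is given below. The text does not give a formula, so the one used here is an assumption: the two lines are taken to accumulate mutations independently, so the expected number of differences over n shared markers after t generations is about 2 × n × rate × t. The mutation rate is the 0.003 assumed earlier, no back-mutation correction is applied, and the function names are hypothetical.

```python
from itertools import combinations

MUTATION_RATE = 0.003  # mutations per marker per generation, as assumed earlier


def pair_age(a, b):
    """Rough number of generations to the MRCA of two men.

    Uses the step-wise difference over shared markers; returns None if the
    two haplotypes have no markers in common.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    differences = sum(abs(a[m] - b[m]) for m in shared)
    # Both lines accumulate mutations, hence the factor of 2.
    return differences / (2 * len(shared) * MUTATION_RATE)


def group_age(haplotypes):
    """Largest pairwise age in the group, or None if no pair is comparable."""
    ages = [pair_age(a, b) for a, b in combinations(haplotypes, 2)]
    ages = [x for x in ages if x is not None]
    return max(ages) if ages else None
```

With real 67-marker haplotypes, `group_age` would be called on the list of members in a group. A single unusually divergent pair dominates the result, so this estimate is even noisier than the diversity-based one.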

Group A contains the majority of the members of our cluster. Apart from having the characteristic markers which define our cluster, the members of group A differ from groups B and C at DYS 534 and DYS 487. Group A members have the values of 15 and 13 respectively for these two markers; these are in fact the typical R1b values. In the future, it would be nice to break this group into smaller pieces, once it becomes more apparent how this should be done.

Group B contains members of our cluster that "branched" off the main group approximately 800 years ago. The members of this group typically have the value 16 for DYS 534, instead of the value 15 that is present for the members of group A. They also typically have CDYa values of about 36, as compared to 37 or 38 in group A.

Group C "branched" from Group B perhaps 500 years ago. The most important mutation defining this group is the change from DYS 487 = 13 to DYS 487 = 14. They also typically have DYS 385b = 15, CDYb values about 37 and DYS 458 values of about 19.

The table below gives a list of the surnames present in each group, and a count of the members in each of them:

| Group | Surnames (number of members) | Total number of members |
|---|---|---|
| Group A | Sloan (28), Kilgore (20), Munro (19), Hall (11), McCorkle (8), Walker (9), Williamson (6), Aiken (6), Mitchell (3), Wilson (3), Colson (2), Russell (2), Morrison (2), Shaw (2), Thompson (2), Chambers (1), Adams (1), Hinds (1), Benjamin (1), Norris (1), McHenry (1), Land (1), Hudson (1), Brown (1), unknown (1) | 133 |
| Group B | Crockett (9), Doig (8), McClaren (6), Ewing (3), Ferguson (3), Savage (2), Reid (1), McArthur (1), Lamond (1), Keeney (1), Curren (1), Hamilton (1) | 37 |
| Group C | Boggs (22), Stark (13), Livingston (2), Bogle (1), Walters (1), Clements (1), Ray (1) | 41 |

Above is a list of surnames, with the number of members in parentheses, for the various groups A, B and C.

It should be noted that the group C surnames Livingston and Shearer have the value 15 for DYS 534, whereas all other members of groups B and C have the value 16. The other markers for these individuals clearly show that they should indeed belong to group C. It would appear that the DYS 534 marker mutated back to 15 in some recent common ancestor of these individuals. It should also be noted that the rooting of this tree is not 100% certain. It could in fact be that Group B is the root of the tree, with groups A and C deriving from it.

The above grouping can only be done for surnames that have cluster members who have taken at least the 67 marker FTDNA test. There are important surnames such as Thornton and Gilreath for which we have only 37 marker results, and as such we can only guess at their proper group. Judging by the low values for CDYa, and the fact that they have DYS 458 = 19, the Thornton members possibly belong to group B. The Gilreath members, on the other hand, have haplotypes that are consistent with group A.

History

This cluster's origin may well be in the vicinity of Stirlingshire, Scotland about 900 to 1200 years ago. Those wishing to know more about what our possible history may be might enjoy reading Steven Colson's article on our history from the November 2007 issue of the Journal of Clan Ewing.

If you have comments or suggestions please feel free to contact the web master, and cluster member, Alex Williamson at Alex _at_ LittleScottishCluster.com. Or you can contact Steven Colson at Steven _at_ LittleScottishCluster.com. We are always looking for more data and more insight into our cluster. If you have either of these, please let us know.