Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics

This project is funded by the AHRC and IRC jointly under the UK–Ireland collaboration in digital humanities programme.

August 2021–July 2024


This project will fuse deep, qualitative analysis with cutting-edge computational methodologies to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. In doing so, it will provide the most detailed account to date of convergence and divergence in the narrative traditions of Scotland and Ireland and, by extension, a novel understanding of their joint cultural history. Leveraging recent advances in Natural Language Processing, the consortium will digitise, convert and help to disseminate a vast corpus of folklore manuscripts in Irish and Scottish Gaelic.

The project team will create, analyse and disseminate a large text corpus of folktales from the Tale Archive of the School of Scottish Studies Archives and from the Main Manuscript Collection of the Irish National Folklore Collection. The creation of this corpus will involve the scanning of c.80k manuscript pages (and will also include pages scanned by the Dúchas digitisation project), the recognition of handwritten text on these pages (as well as some audio material in Scotland), the normalisation of non-standard text, and the machine translation of Scottish Gaelic into Irish. The corpus will then be annotated with document-level and motif-level metadata.

Analysis of the corpus will be carried out using data mining and phylogenetic techniques. Both the data mining and phylogenetic workstreams will encompass the entire corpus. In addition, the phylogenetic workstream will focus on three folktale types as case studies: Aarne–Thompson–Uther (ATU) 400 ‘The Search for the Lost Wife’, ATU 425 ‘The Search for the Lost Husband’, and ATU 503 ‘The Gifts of the Little People’. The results of these analyses will be published in a series of articles and in a book entitled Digital Folkloristics. The corpus will be disseminated via Dúchas and Tobar an Dualchais, and via a new aggregator website (under construction) that will include map and graph visualisations of the corpus data and of the results of our analysis.

Project team


  • Principal Investigator Dr William Lamb, The University of Edinburgh (School of Literatures, Languages and Cultures)
  • Co-Investigator Prof. Jamshid Tehrani, Durham University (Department of Anthropology)
  • Co-Investigator Dr Beatrice Alex, The University of Edinburgh (School of Literatures, Languages and Cultures)

The University of Edinburgh

  • Language Technician: Michael Bauer
  • Copyright Administrator: Louise Scollay
  • Digitisation and Data Entry Technician: TBA


  • Co-Principal Investigator Dr Brian Ó Raghallaigh, Dublin City University (Fiontar & Scoil na Gaeilge)
  • Co-Investigator Dr Críostóir Mac Cárthaigh, University College Dublin (National Folklore Collection)
  • Co-Investigator Dr Barbara Hillers, Indiana University (Folklore and Ethnomusicology)

Dublin City University

  • Postdoctoral Researcher: Dr Andrea Palandri
  • Research Assistant: Kate Ní Ghallchóir



