Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics

This project was funded by UKRI-AHRC and the Irish Research Council under the ‘UK-Ireland Collaboration in the Digital Humanities Research Grants Call’ (grant numbers AH/W001934/1 and IRC/W001934/1).

August 2021–July 2024


This project will fuse deep qualitative analysis with cutting-edge computational methodologies to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. In doing so, it will provide the most detailed account to date of convergence and divergence in the narrative traditions of Scotland and Ireland and, by extension, a novel understanding of their joint cultural history. Leveraging recent advances in Natural Language Processing, the consortium will digitise, convert and help to disseminate a vast corpus of folklore manuscripts in Irish and Scottish Gaelic.

The project team will create, analyse and disseminate a large text corpus of folktales from the Tale Archive of the School of Scottish Studies Archives and from the Main Manuscript Collection of the Irish National Folklore Collection. The creation of this corpus will involve the scanning of c.80k manuscript pages (and will also include pages scanned by the Dúchas digitisation project), the recognition of handwritten text on these pages (as well as some audio material in Scotland), the normalisation of non-standard text, and the machine translation of Scottish Gaelic into Irish. The corpus will then be annotated with document-level and motif-level metadata.

Analysis of the corpus will be carried out using data mining and phylogenetic techniques. Both workstreams will encompass the entire corpus; the phylogenetic workstream will also focus on three folktale types as case studies: Aarne–Thompson–Uther (ATU) 400 ‘The Search for the Lost Wife’, ATU 425 ‘The Search for the Lost Husband’, and ATU 503 ‘The Gifts of the Little People’. The results of these analyses will be published in a series of articles and in a book entitled Digital Folkloristics. The corpus will be disseminated via Dúchas and Tobar an Dualchais, and via a new aggregator website (under construction) that will include map and graph visualisations of the corpus data and of the results of our analysis.
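As a rough illustration of how motif-level metadata could feed a phylogenetic analysis, the sketch below computes pairwise Jaccard distances between tale variants represented as sets of motifs. It is not the project's method: the variant names, motif labels and the choice of Jaccard distance are all assumptions made for the example. A distance matrix of this kind is one common input to tree-building or network methods.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(variants: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the distance for every unordered pair of variants."""
    return {
        (x, y): jaccard_distance(variants[x], variants[y])
        for x, y in combinations(sorted(variants), 2)
    }


# Hypothetical variants and motif labels, for illustration only.
variants = {
    "variant_A": {"m1", "m2", "m3"},
    "variant_B": {"m1", "m2", "m4"},
    "variant_C": {"m5"},
}

distances = distance_matrix(variants)
for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy data, variants A and B share two of four distinct motifs, so they sit closer to each other than either does to variant C, which shares none.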

Project team


  • Principal Investigator: Prof. William Lamb, The University of Edinburgh (School of Literatures, Languages and Cultures).
  • Co-Investigator: Prof. Jamshid Tehrani, Durham University (Department of Anthropology).
  • Co-Investigator: Dr Beatrice Alex, The University of Edinburgh (School of Literatures, Languages and Cultures).
  • Co-Investigator: Dr Barbara Hillers, Indiana University (Folklore and Ethnomusicology).

The University of Edinburgh

  • Technical Supervisor: Gavin Willshaw.
  • Scottish and University Collections Archivist: Kirsty Stewart.
  • Language Technician: Michael Bauer.
  • Digitisation and Data Entry Technician: Cristina Horvath.
  • Copyright Administrator: Louise Scollay.


  • Co-Principal Investigator: Dr Brian Ó Raghallaigh, Dublin City University (Fiontar & Scoil na Gaeilge).
  • Co-Investigator: Dr Críostóir Mac Cárthaigh, University College Dublin (National Folklore Collection).

Dublin City University

  • Postdoctoral Researcher: Dr Andrea Palandri.
  • Research Assistant: Kate Ní Ghallchóir.
  • Research Assistant: Tiernan Gaffney.

Other collaborators

  • Postgraduate Research Fellow: Monica Marion.
  • Academic Advisors: Úna Bhreathnach, Kevin Scannell, Tiber Falzett.
  • Other members of the project Steering Group: Melissa Terras (Chairperson), Rachel Hosker, Floraidh Forrest (Tobar an Dualchais).




Publications

  • Mark Sinclair, William Lamb, Beatrice Alex. 2022. Handwriting Recognition for Scottish Gaelic. In Proceedings of the CLTW 4 @ LREC2022, 60–70.
  • Brian Ó Raghallaigh, Andrea Palandri, Críostóir Mac Cárthaigh. 2022. Handwritten Text Recognition (HTR) for Irish-Language Folklore. In Proceedings of the CLTW 4 @ LREC2022, 121–126.
