Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics
This project was funded by UKRI-AHRC and the Irish Research Council under the ‘UK-Ireland Collaboration in the Digital Humanities Research Grants Call’ (grant numbers AH/W001934/1 and IRC/W001934/1).
August 2021–July 2024
This project will fuse deep qualitative analysis with cutting-edge computational methodologies to decode, interpret and curate the hidden heritages of Gaelic traditional narrative. In doing so, it will provide the most detailed account to date of convergence and divergence in the narrative traditions of Scotland and Ireland and, by extension, a novel understanding of their joint cultural history. Leveraging recent advances in Natural Language Processing, the consortium will digitise, convert and help to disseminate a vast corpus of folklore manuscripts in Irish and Scottish Gaelic.
The project team will create, analyse and disseminate a large text corpus of folktales from the Tale Archive of the School of Scottish Studies Archives and from the Main Manuscript Collection of the Irish National Folklore Collection. The creation of this corpus will involve the scanning of c.80k manuscript pages (and will also include pages scanned by the Dúchas digitisation project), the recognition of handwritten text on these pages (as well as some audio material in Scotland), the normalisation of non-standard text, and the machine translation of Scottish Gaelic into Irish. The corpus will then be annotated with document-level and motif-level metadata.
Analysis of the corpus will be carried out using data mining and phylogenetic techniques. Both the data mining and phylogenetic workstreams will encompass the entire corpus; the phylogenetic workstream will also focus on three folktale types as case studies: Aarne–Thompson–Uther (ATU) 400 ‘The Search for the Lost Wife’, ATU 425 ‘The Search for the Lost Husband’, and ATU 503 ‘The Gifts of the Little People’. The results of these analyses will be published in a series of articles and in a book entitled Digital Folkloristics. The corpus will be disseminated via Dúchas and Tobar an Dualchais, and via a new aggregator website (under construction) that will include map and graph visualisations of corpus data and of the results of our analysis.
- Principal Investigator: Prof. William Lamb, The University of Edinburgh (School of Literatures, Languages and Cultures).
- Co-Investigator: Prof. Jamshid Tehrani, Durham University (Department of Anthropology).
- Co-Investigator: Dr Beatrice Alex, The University of Edinburgh (School of Literatures, Languages and Cultures).
- Co-Investigator: Dr Barbara Hillers, Indiana University (Folklore and Ethnomusicology).
- Postdoctoral Researcher: Julie-Anne Meaney.
- Technical Supervisor: Gavin Willshaw.
- Scottish and University Collections Archivist: Kirsty Stewart.
- Language Technician: Michael Bauer.
- Digitisation and Data Entry Technicians: Cristina Horvath, Catherine Banks.
- Copyright Administrator: Louise Scollay.
- Co-Principal Investigator: Dr Brian Ó Raghallaigh, Dublin City University (Fiontar & Scoil na Gaeilge).
- Co-Investigator: Dr Críostóir Mac Cárthaigh, University College Dublin (National Folklore Collection).
- Co-Investigator: Dr Tiber Falzett, University College Dublin (School of Irish, Celtic Studies and Folklore).
- Postdoctoral Researcher: Dr Andrea Palandri.
- Research Assistant: Kate Ní Ghallchóir.
- Research Assistant: Tiernan Gaffney.
- Research Assistant: Monica Marion.
- Postgraduate Research Fellow: Monica Marion.
- Academic Advisors: Úna Bhreathnach, Kevin Scannell.
- Other members of the project Steering Group: Melissa Terras (Chairperson), Rachel Hosker, Floraidh Forrest (Tobar an Dualchais).
- Junfan Huang, Beatrice Alex, Michael Bauer, David Salvador-Jasin, Yuchao Liang, Robert Thomas, William Lamb. 2023. ‘A transformer-based standardisation system for Scottish Gaelic’. In Proceedings of SIGUL 2023, Special Session on Celtic Languages.
- Brian Ó Raghallaigh, Andrea Palandri, Críostóir Mac Cárthaigh. 2022. ‘Handwritten Text Recognition (HTR) for Irish-Language Folklore’. In Proceedings of the CLTW 4 @ LREC2022, 121–126.
- Mark Sinclair, William Lamb, Beatrice Alex. 2022. ‘Handwriting Recognition for Scottish Gaelic’. In Proceedings of the CLTW 4 @ LREC2022, 60–70.
- William Lamb & Brian Ó Raghallaigh, ‘Decoding Hidden Heritages in Gaelic Traditional Narrative with Text-Mining and Phylogenetics’, Scottish-Irish Cultural Diplomacy and Cultural Relations: Arts and Humanities in Focus, Edinburgh, 11 October 2022.
- Brian Ó Raghallaigh et al., ‘Providing full-text access to Scottish and Irish folklore archives: Decoding Hidden Heritages’, ICA-SUV, Dublin, 29–31 May 2023.
- Tiber Falzett et al., ‘Enhancing access to Scottish and Irish traditional narrative: Decoding Hidden Heritages’, SIEF2023, Brno, 7–10 June 2023.
- Barbara Hillers et al., ‘Mining the Celtic Folklore Archives: Decoding Hidden Heritages in Gaelic Traditional Narrative’, ICCS, Utrecht, 24–28 July 2023.
- Beatrice Alex, ‘AI-driven language technologies and digital collections: the need for interdisciplinary communication, co-design and training’, The 27th International Conference on Theory and Practice of Digital Libraries, Zadar, Croatia, 28 September 2023.