Enabling NLP for Small Language Communities

Only a handful of the world’s languages benefit from modern language technology such as online search, automatic translation, and recent advances in generative AI, as exemplified by ChatGPT.

Because state-of-the-art solutions require massive amounts of data, all other languages will continue to be marginalised. For instance, it is not feasible for a community the size of the Faroe Islands to generate enough textual data for modern NLP approaches to work.

This talk outlines the core challenges in this setting and considers a few possible solutions. We will first discuss how we can use what we know about language, from the field of linguistic typology, before considering approaches to resource creation for truly low-resource language communities.
We will look at Creoles, a type of natural language spoken by approximately 180 million people, which notably evolved from historical linguistic contact between unrelated languages. Creoles typically lack a standardized written form and are frequently stigmatized due to historical ties with colonization and slavery. Whereas a large portion of the world’s languages can be characterised as low-resource, Creoles are typically in a no-resource scenario.

Because of these challenges, typical data-hungry approaches to NLP do not extend to Creoles. How can we develop technology to include such smaller communities?

Daniel Russo, AAU
Date & Time
Thursday, November 9, 2023, 3:00 PM - 3:30 PM
Theater 1
AI Ethics
