Enabling NLP for Small Language Communities

Only a handful of the world’s languages benefit from modern language technology, such as online search, automatic translation, and recent advances in generative AI, as exemplified by ChatGPT.

Because state-of-the-art solutions require massive amounts of data, the remaining languages will continue to be marginalized. For instance, it is not feasible for a community the size of the Faroe Islands to generate enough textual data for modern NLP approaches to work.

This talk outlines the core challenges in this setting and considers a few possible solutions. We will first look at how we can use what we know about language from the field of linguistic typology, before considering approaches to resource creation for truly low-resource language communities.
We will look at Creoles, a type of natural language spoken by approximately 180 million people, which notably evolved from historical linguistic contact between unrelated languages. Creoles typically lack a standardized written form and are frequently stigmatized due to historical ties with colonization and slavery. Whereas a large portion of the world’s languages can be characterized as low-resource, Creoles are typically in a no-resource scenario.

Because of these challenges, typical data-hungry approaches to NLP do not extend to Creoles. How can we develop technology to include such smaller communities?

Daniel Russo, AAU
Date & Time
Thursday, November 9, 2023, 3:00 PM - 3:30 PM
Room 1
AI & Ethics
