Whether it’s virtual assistants on our phones or chatbots on government websites, the large language models (LLMs) that power AI tools such as ChatGPT are almost everywhere online.
But growing evidence suggests these LLMs are judging dialect speakers harshly.
In 2024 researchers from the University of California, Berkely, tested ChatGPT’s responses to several varieties of English dialects from places like India, Ireland, and Nigeria.
Compared to American or British English, responses to dialects showed increases in stereotyping (18% worse), demeaning content (25% worse) and condescending responses (15% worse).
Some models also simply cannot comprehend dialects at all. In July 2025, an AI assistant used by the Derby City Council struggled to understand a radio presenter’s Derbyshire dialect when she used words like mardy (complaining) and duck (dear/love) during a call she'd made on air to test the AI assistant.
Other dialect speakers have experienced much worse effects. As businesses and governments use more AI in their services, researchers are getting worried. AI developers, however, see an opportunity to provide tailored LLMs for dialect speakers.
In a new German study presented at the 2025 Conference on Empirical Methods in Natural Language Processing in Suzhou, China, researchers first gathered ten LLMs including OpenAI's ChatGPT-5 mini and Meta's Llama 3.1. They then presented the models with texts in either standard German or one of seven German dialects, such as Bavarian, North Frisian and Kölsch.
The models were asked to describe the speakers of these texts with personal attributes, and to then assign individuals in different scenarios. For example, the models were asked who should be hired for low-education work or where they think the speakers lived.
In nearly all tests the models attached stereotypes to dialect speakers. The LLMs described them as uneducated, farm workers and needing anger management. This bias grew when the LLMs were told the text was a dialect.
"We actually see, I think, really shocking adjectives being attached to the dialect speakers," Minh Duc Bui of Johannes Gutenberg-University Mainz, one of the study's co-lead authors, told DW.
To view this video please enable JavaScript, and consider upgrading to a web browser that supports HTML5 video
This type of consistent dialect bias is "impactful and alarming," said Emma Harvey, a PhD student in information science at Cornell University in the US.
In July she and colleagues published research that showed that Amazon's AI shopping assistant, Rufus, responded with vague or even incorrect answers to people writing in an African American English dialect. And if those inputs have typos, the replies can get even worse.
"As LLMs become more widely used, this means that they may not only perpetuate but also amplify existing biases and harms," Harvey told DW.
In India, one job applicant turned to ChatGPT to proofread his English on a job application. One of the corrections included changing the applicant's surname to one that signaled a higher position in India's caste structure, the MIT Technology Review reported in October 2025.
So one-size-fits-all LLMs don't seem to work. Instead, it might be time for AI to embrace dialects.
One paper published in Current Opinion in Psychology in August 2024 suggests that personalized AI "speaking" dialects could lead to users viewing them as warmer, more competent and authentic.
LLMs first gather a lot of text and then generate the likely result to a given prompt. The problem lies in who writes the text. This means LLMs learning from web data could also pick up what someone writes about a dialect-speaker, said Carolin Holtermann from the University of Hamburg and the co-lead author of the German paper.
Holtermann says that one benefit of LLMs is that, unlike many human speakers, biases can also be tuned out of the system.
"We can actually steer against this kind of expression," she told DW.
AI companies make sure their LLMs reply in a way that users want them to, and don’t discriminate against gender or age. But so far it doesn't seem like this alignment training includes nuanced data like dialects.
The answer might lie with more customized LLMs. One of the AIs in the German study, Aya Expanse, said that the model tested in the paper was a research-only model and that they work with business clients to customize their LLM for factors including dialects.
Other AI companies are making this customization a selling point. An LLM called Arcee-Meraj for example focuses on multiple Arabic dialects such as Egyptian, Levantine, Maghrebi and Gulf.
As new and more customized LLMs appear, Holtmann says that AI should not be considered as an enemy of dialects but rather as a flawed tool that, like humans, can improve.
