Disability Representation in Text-to-Image AI Models
In this post, we summarize the findings of our research paper “‘They only care to show us the wheelchair:’ Disability Representation in Text-to-Image AI Models”, which was accepted to the ACM Conference on Human Factors in Computing Systems (CHI) 2024. This paper was authored by Kelly Avery Mack, Rida Qadri, Remi Denton, Shaun K. Kane* and Cynthia L. Bennett*. (*Both authors contributed equally to this work).
Generative artificial intelligence (GAI) is becoming increasingly popular, and text-to-image (T2I) models are growing more powerful in the range of representations they can produce. T2I models take in a text prompt and generate an image corresponding to that prompt. However, prior work has found that GAI models can recreate problematic biases and stereotypes that exist in the real world and in the data they were trained on [1–2, 4–5]. We wanted to investigate how T2I models represent people with disabilities, and specifically whether they produce images that perpetuate bias or ones that show positive portrayals of disability.
To understand how disabled communities feel about the ways T2I models represent disabled people, we conducted eight focus group sessions with 25 participants. In these sessions, we asked participants to share their thoughts about images we generated based on topics that interested them, and we asked them to prompt the models themselves and respond to the results. The images in this study came from three models that were publicly available in the summer of 2023: Midjourney, DALL·E 2, and Stable Diffusion 1.5.
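For readers unfamiliar with how T2I models are accessed programmatically, the sketch below shows how a simple prompt can be run against one of the publicly available models named above (Stable Diffusion 1.5) using the Hugging Face diffusers library. This is an illustration only, not the pipeline used in the study; the model identifier, prompt, and generation parameters are assumptions for the example.

```python
# Illustrative only: running a short disability-related prompt against the
# publicly available Stable Diffusion 1.5 checkpoint via Hugging Face diffusers.
# The model id, prompt, and parameters are assumptions for this sketch, not the
# study's actual generation setup.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # SD 1.5 checkpoint (the hosted id may differ today)
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU is available

prompt = "a person with a disability"  # the kind of simple prompt discussed in the study
images = pipe(prompt, num_images_per_prompt=4).images  # generate a small batch to inspect

for i, image in enumerate(images):
    image.save(f"disability_prompt_{i}.png")
```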
Models produced negative tropes about disability
Our participants identified many negative stereotypes of disability that were common in these models. Our paper contains a full list, but here we share a few of the most common or notable themes. It is important to note that these problematic representations have real-world impacts for people with disabilities. As one of our participants commented: “These stereotypes can limit opportunities for people with disabilities in … education, employment, and social interactions by shaping negative perceptions about their capabilities.”
Too many wheelchairs
Models overwhelmingly produced images of people using wheelchairs when we prompted them with topics about disability, a trend also found by researchers examining the outputs of large language models [4]. While some people with disabilities do use wheelchairs, there is a wide variety of other experiences of disability that rarely appeared in the model outputs unless we explicitly prompted for them. This trend perpetuates the misconception that almost all people with disabilities use wheelchairs.
Over-emphasis on negative emotions
Our participants were also unhappy that most of the people in the models’ images looked sad, lonely, or in pain. One participant commented: “what struck me is that we always concentrate on the bad,” and explained that, while some elements of disability might be hard, there are many neutral to positive experiences of living with a disability as well. People were frustrated that there were not “more smiles,” and that the models overwhelmingly showed people looking stationary and upset.
Dehumanizing imagery
Finally, the models produced images that were dehumanizing. Some images had near horror-movie aesthetics, which participants found unsettling. These images included faceless heads, stacks of grey bodies, faces with unrealistically exaggerated decaying skin and sunken eyes, faces covered in shadows, and tattered or dirty clothes. One participant stated that “this image should have required a long and detailed prompt to achieve something so bizarre,” rather than being returned for a simple prompt like “a person with depression.”
Another form of dehumanizing representation was imagery that cropped the person out of the frame. These images highlighted the assistive technology at the expense of the person, removing their face from the image entirely.
People wanted images of disabled people doing everyday activities
Instead of images of wheelchair users looking sad or lonely, our participants wanted to see representations of disabled people in community with nondisabled people, doing everyday activities like cooking dinner, playing sports, or parenting a child. They emphasized the desire to see a diverse range of disabilities, races, genders, and ages represented in the images, since “[disability] looks like you and you and me. And that person over there and that person over there.”
Some topics were more complicated
Our participants’ feedback was not always so straightforward. Sometimes they discussed their own conflicting priorities in representation or directly disagreed with each other. Here are a few of the top issues on which our participants had conflicting opinions:
- How can we represent disabilities that are less visually discernible in images? Some disabilities are more or less visible than others in different contexts [3]. For example, our participants were split about how to represent people with chronic or mental health conditions, people with autism, or people with epilepsy in a single image. Participants wanted to be able to show these disabilities in images, but were unsure how to do so without falling back on reductive representations (e.g., showing a person with depression looking sad).
- How do models choose what details to add to an image? No prompt can ever fully define what an image should look like; the model needs to fill in some of the blanks. For example, if a person enters the prompt “a person playing basketball,” the model needs to decide if the person is playing with other people, if they are playing inside or outside, and what kind of clothes they are wearing. Sometimes, our participants did not approve of the way the model filled in the blanks in their prompts. For example, images of “a person using a walker in a park” showed only the disabled person rather than a busy public park, which participants felt made the person look lonely.
- How can model filters protect users without offending them? We found that the filtering mechanisms put in place to help ensure generated images follow content policies sometimes filtered out certain groups altogether. For example, when we put the prompt “a person with bipolar disorder” into three models, two returned results, and one returned an error message stating: “It looks like this request may not follow our content policy.” With this response, the model suggests that the user is incorrect or bad for trying to prompt about a stigmatized disability (in this case, bipolar disorder), which further alienates people with this disability. Models use filters to try to keep users safe by preventing them from seeing offensive content. But when these filters block or cannot produce respectful images related to our participants’ experiences, they contribute to making topics like disability taboo at a time when communities want them to be normalized. (A minimal sketch of where this kind of filtering can happen appears after this list.)
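To make the filtering discussion concrete, here is a minimal sketch of the two places a filter can act: refusing the prompt outright (the error-message behavior participants encountered) or checking generated images after the fact (as the open-source Stable Diffusion pipeline’s built-in safety checker does). The keyword blocklist below is purely hypothetical and stands in for prompt-level policies we cannot inspect; the model identifier and parameters are likewise assumptions.

```python
# Minimal sketch of two filtering points, under stated assumptions:
# (1) prompt-level refusal, where the service rejects the request before generating
#     anything -- the error-message behavior participants found alienating; and
# (2) output-level checking, as in the diffusers Stable Diffusion pipeline's
#     built-in safety checker, which flags individual generated images instead.
# BLOCKED_TERMS is purely hypothetical; real services use classifiers and policies
# that are not public.
import torch
from diffusers import StableDiffusionPipeline

BLOCKED_TERMS = {"example-blocked-term"}  # hypothetical stand-in for a prompt filter


def generate_with_filters(pipe, prompt: str):
    # Prompt-level filter: reject the whole request with an error message.
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return [], "It looks like this request may not follow our content policy."

    # Output-level filter: generate first, then drop images the safety checker flags.
    result = pipe(prompt, num_images_per_prompt=2)
    kept = [
        image
        for image, flagged in zip(result.images, result.nsfw_content_detected)
        if not flagged
    ]
    return kept, None


pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
images, error = generate_with_filters(pipe, "a person with bipolar disorder")
```

Either point of intervention can produce the effect participants described: a prompt-level refusal implies the user did something wrong, while overly aggressive output-level filtering can make it impossible to obtain respectful images of a stigmatized disability at all.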
In summary, text-to-image AI models are biased in how they represent people with disabilities. They are biased toward showing people in wheelchairs and showing people who look sad, incapable, or scary, portrayals that do not reflect the vast diversity of disabilities and experiences that exist. Instead, participants wanted realistic portrayals of disabled people doing everyday things to help normalize disability in society. We argue that AI model developers need to continue to engage with people with disabilities to limit the harm that these models produce. We invite you to read our paper for more details.
References
[1] Bender, Emily M., et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021.
[2] Bianchi, Federico, et al. “Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale.” Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 2023.
[3] Faucett, Heather A., et al. “(In)visibility in Disability and Assistive Technology.” ACM Transactions on Accessible Computing (TACCESS) 10.4 (2017): 1–17.
[4] Gadiraju, Vinitha, et al. “‘I Wouldn’t Say Offensive but…’: Disability-Centered Perspectives on Large Language Models.” Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 2023.
[5] Qadri, Rida, et al. “AI’s Regimes of Representation: A Community-Centered Study of Text-to-Image Models in South Asia.” Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 2023.