Designing Tools for High-Quality Alt Text Authoring

Kelly Avery Mack
Aug 23, 2021


In this post, we summarize the findings of our research paper “Designing Tools for High-Quality Alt Text Authoring,” which was accepted to the International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS) 2021. This paper was authored by Kelly Mack, Edward Cutrell, Bongshin Lee, and Meredith Ringel Morris while at Microsoft Research.

Images convey information in numerous contexts, including social, work, and educational settings, and understanding them is often required to make sense of what is going on. This is equally true for sighted users and for those who are blind or have low vision and/or use screen readers to interact with technology. When a screen reader user (SRU) encounters an image, their screen reader software reads aloud its alt text, a description of the content of the image. Unfortunately, alt text is often missing from images. While technologists have created artificial intelligence (AI) based systems that generate alt text for images, these systems still struggle to produce descriptions that are accurate and relevant to a given context or task. Therefore, it is important to encourage human authors to create high-quality alt text.

We designed new interfaces to support alt text authors with two tasks: 1) authoring alt text and 2) providing feedback on automatic alt text. We then conducted two complementary studies with alt text authors and consumers to test the effectiveness of our interfaces. Our findings show that interfaces that suggest what to include in alt text are beneficial. Additionally, our studies revealed a difference between SRU and author definitions of “acceptable alt text,” as well as negative impacts of automatic alt text on final alt text quality.

Four subimages. (a) The free-form authoring interface: the traditional alt text editing pane with the addition of bullets before the text box. The bullets say “the subject(s) in detail,” “the setting,” “the actions/interactions,” and “other relevant info.” (b) The template authoring interface: an interface pane with four prompts, each followed by an individual text box. (c) and (d) The feedback interfaces below an image, asking whether the alt text is acceptable, unacceptable, or offensive.
Caption: Interface variations that we tested with alt text authors: authoring interfaces (top) and feedback interfaces (bottom).

Alt Text Authoring Interfaces: Both of our authoring interfaces conveyed tips on what to include in the alt text, but they differed in how much structure they provided. One presented tips in a more “prompt-response” format, whereas the other presented a consolidated bulleted list of suggestions.
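As a rough illustration of the “prompt-response” idea, the sketch below models the four prompts as separate fields and joins the non-empty answers into one alt text string. The class name, field names, and the joining rule are all assumptions made for this example; they are not taken from the paper or from PowerPoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AltTextTemplate:
    """Hypothetical model of the four template prompts (names are assumptions)."""

    subjects: str = ""    # the subject(s) in detail
    setting: str = ""     # the setting
    actions: str = ""     # the actions/interactions
    other_info: str = ""  # other relevant info

    def compose(self) -> str:
        """Join the non-empty answers into a single alt text string."""
        sentences = []
        for part in (self.subjects, self.setting, self.actions, self.other_info):
            text = part.strip()
            if not text:
                continue
            # Close each answer with a period unless it already ends a sentence.
            if text[-1] not in ".!?":
                text += "."
            # Capitalize only the first character; leave the rest untouched.
            sentences.append(text[0].upper() + text[1:])
        return " ".join(sentences)

    def missing_prompts(self) -> List[str]:
        """Return the names of prompts the author left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]
```

A tool built along these lines could use something like `missing_prompts()` to nudge an author toward a blank field before saving, which is one way to give less experienced authors the extra structure discussed later in this post.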

Alt Text Feedback Interfaces: We created two interfaces, varying the location of the interface and the method used to collect feedback — both allowed users to indicate if alt text was acceptable, unacceptable, or offensive and provide written feedback. The location was either below the image or in a side pane, and the method of input was either with icons (thumbs up and down) or check boxes.
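One way to picture what such a feedback interface collects is a small record holding the rating and the optional written comment. The sketch below is only an assumption about how that data might be represented; the type names and the `summarize` helper are hypothetical and do not describe the implementation we tested.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Rating(Enum):
    """The three choices offered by both feedback interfaces."""

    ACCEPTABLE = "acceptable"
    UNACCEPTABLE = "unacceptable"
    OFFENSIVE = "offensive"


@dataclass(frozen=True)
class AltTextFeedback:
    """One piece of feedback on a single alt text (hypothetical structure)."""

    alt_text: str
    rating: Rating
    comment: str = ""  # optional written feedback


def summarize(feedback: Iterable[AltTextFeedback]) -> Counter:
    """Count how many times each rating was given."""
    return Counter(item.rating for item in feedback)
```

Keeping the rating as a fixed set of values, separate from the free-text comment, would make it straightforward to compare ratings from different groups of reviewers on the same alt text.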

We conducted two complementary studies with alt text authors and SRUs. In our first study, we interviewed 12 alt text authors about their experience with authoring alt text. We then had them generate alt text with our new interface designs and with the existing PowerPoint interface as a control. In our second study, we interviewed people who use screen readers about perceptions of alt text and automatic alt text. We then asked them to review the alt text generated with our new interfaces to examine whether our interfaces were encouraging the creation of SRU-defined high-quality alt text.

Results and Discussion

Differing alt text preferences among people using screen readers

Our SRU interviewees reported the characteristics of alt text that were most important to them. Some traits were shared by all interviewees, including accuracy (i.e., no incorrect information) and completeness (i.e., no important information is omitted). Other characteristics varied between individuals, including use of natural-sounding language (as opposed to machine-generated language), conciseness, and detail. These latter two characteristics are often in tension with each other. For example, the two most preferred alt texts for an image of a person drinking coffee were:

Detailed: “A woman with curly black hair, glasses, and a green sweater sits in a coffee shop or office. She has a cup of espresso in one hand, a saucer in front of her. She is leaning on one arm and looking at the camera, smiling slightly.”

Concise: “A young lady looking at the camera sitting down drinking a cappuccino.”

Implications: Our results suggest that a one-size-fits-all approach to alt text is suboptimal for alt text consumption. Technology that lets consumers customize which types of image details are read from the alt text could be valuable, but it would be technically challenging to build. Since asking authors to write multiple versions of alt text (e.g., both a more and a less detailed version) is impractical, using natural language processing techniques such as text summarization could be an interesting area of future work.

Author experiences creating alt text with and without interface variations

Before testing our interfaces, alt text author interviewees reported trouble knowing what to include in alt text: “What’s alt text versus what’s a caption? I don’t really know … [I’m not sure] what kind of context to put in it, [and] where to begin and end.” After using our interfaces, participants felt more supported in writing alt text: “It’s kind of nice to have that detail … so you’re just not out there, like ‘what should I put?’” At the same time, the alt text generated with our interfaces was more frequently of higher quality than that created with the traditional PowerPoint interface, both according to an existing quality scale and reports from SRU interviewees. While the more structured template-based interface was considered more tedious by experienced alt text authors, less experienced authors preferred the more structured template.

Implications: Our findings suggest that providing suggestions for what to include in alt text leads to more complete, detailed alt text. The more free-form interface with a bulleted list of suggestions was appreciated by experienced authors and performed better than the traditional PowerPoint interface the majority of the time. However, we found that authors who are less experienced with alt text could benefit from a bit more structure and support.

Feedback interface outcomes

Our author interviewees generally appreciated the feedback interface, where alt text could be marked acceptable, unacceptable, or offensive. They found the interface quick and easy to use, with most preferring icon-based feedback over checkboxes. Preferences around location were split: some preferred the interface to be close to the image, while others preferred it to be separate in a side pane.

However, through use of the feedback interface, we found considerable differences between what author and SRU interviewees considered to be “acceptable” alt text. SRU interviewees specified that accuracy was the most important characteristic; inaccurate alt text was considered “misleading.” For example, all SRU interviewees marked automatic alt text that stated a person was sitting “on a table” as unacceptable, since the person was actually sitting “at the table.” On the other hand, eight out of 11 author interviewees marked this same alt text as acceptable. They noted that what they considered “acceptable” automatic alt text was a “pretty low bar” and they expected the AI to include “just the basics.”

Implications: We found a difference between how alt text authors and consumers define acceptable automatic alt text. This finding highlights two key areas worth further investigation. First, it is important to create feedback mechanisms for alt text consumers, since they are the end users of the technology. Second, alt text authors need better support in understanding what is important to include in alt text, which our interface designs can help provide.

Automatic alt text views and effects

Finally, we found that author interviewees preferred to have the alt text authoring area prefilled with automatic alt text generated by AI. Even though several authors commented on the inaccuracy of the alt text, they preferred to have it since removing it was of little cost and it helped them feel more supported: “ … it makes me feel like somebody is helping me and I like that because I’m more inclined to put the effort into it because I’m like, ‘Oh, you don’t have to do it alone.’”

However, we also found that the alt text authors wrote when starting from automatic alt text scored significantly lower in quality, as judged by a subjective scale, than the alt text they wrote when starting from a blank text box. For an image of a young person with curly black hair and glasses drinking coffee, the automatic alt text was “A person sitting on a table.” The alt text created by one interviewee before and after seeing the automatic alt text was:

Before: “A young lady with dark curly hair and glasses, sitting down at a coffee table. She is holding an espresso cup with her right arm and leaning her head on her left hand.”

After: “A young female person sitting on a table, smiling at the camera.”

Implications: We suggest that further research investigate how to encourage authors to more carefully review automatic alt text, perhaps by prompting them to check for specific elements or add key information that an AI system cannot recognize (e.g., the context for why the image was included).

We hope that this work spurs conversation around the use of automatic alt text and feedback interfaces in applications that use alt text, and we are excited to report that the free-form authoring interface is now available in a gated release in PowerPoint.