1 Introduction

The availability of Sign Language in media varies worldwide, primarily due to differences in accessible communication services, such as Sign Language Interpreters (SLI) [1]. This variation is largely influenced by each country's recognition of sign language (SL) and the legal requirements to provide SLI in media. Approximately 2.8% of the U.S. population, including Deaf or hard-of-hearing (DHH) people, family members, educators, and interpreters, are SL users. Within this group, 100,000 to 200,000 people are Deaf primary users of American Sign Language (ASL) [2].

The inclusion of SL in media offers benefits that subtitles cannot provide, such as real-time representation of intonation, emotions, and other auditory information. Additionally, literacy levels within the Deaf community vary widely [2, 3], and many people who are proficient in reading closed captions still prefer SL as their native or primary language. Access to information in SL is recognized as a human right under Article 21 of the United Nations Convention on the Rights of Persons with Disabilities [4]. The W3C Web Content Accessibility Guidelines (WCAG) require sign language media for Level AAA compliance, per Success Criterion 1.2.6 [5]. Additionally, guidelines from the World Association of SLI and the World Federation of the Deaf require it for emergency broadcasts [6]. The use of SLI in media is not a recent innovation but rather a technology that has existed for over 70 years and remains under-investigated [7].

A milestone in the U.S. was reached when the National Association of the Deaf successfully sued the White House to ensure the provision of SLI during all briefings and emergency communication [8, 9]. However, this requirement has not been universally applied across all media outlets, and enforcement mechanisms to ensure widespread content availability are lacking. In some instances, media outlets retransmitted press briefings but cropped out the SLI, highlighting the challenges in consistently applying W3C Level AAA standards across different media. More work on state-of-the-art technology is needed to ensure universal provision of SLI in video programming.

Our goal is to ensure that interpreted content is accessible and can be toggled on and off, like subtitles for the DHH. Toward this goal, we present a pair of usability studies of a functional proof-of-concept implementation named “Closed Interpreting Accessibility” (CIA). They build upon previous research conducted in Europe [7, 10], Asia [11], and the U.S. [12], and focus on how specific features and customization options available in the CIA prototype can enhance accessibility.

This paper is structured around two research questions:

  1. RQ1: Which technical features are most useful in supporting SL in visual media?

  2. RQ2: How do viewers of the Deaf community interact with and process information under different viewing contexts?

2 Related Work

Bosch et al. [7] investigated placement and viewing behavior on TV, including aspects such as eye gaze patterns, attention shifts, and visual strategies. They found that users focus more on an SLI positioned close to the main screen area than on one in a picture-in-picture (PIP) format, and that users prefer a medium size for the interpreter’s video. A study in Asia focused on users choosing settings such as color, placement, and on/off features [11]. This research revealed a preference among deaf users for medium-sized interpreter videos with no background color, positioned outside the video window, depending on the content type. Additionally, Kushalnagar et al. [12] found that an interpreter window size of at least one-third of the full screen is optimal for visual processing. Debevc et al. [10] and Kushalnagar et al. [12] also noted a preference among deaf users for access to customized controls and a transparent background for the SLI’s video.

From these studies, a clear pattern emerges: Deaf users prefer medium-sized sign language interpretation with a transparent background and, crucially, the ability to customize these settings according to individual preferences and needs. The concept of CIA is fairly new and has not yet been investigated much in the U.S. This project aims to learn how the available features of sign language interpreter placement, background, size, and positioning should be applied in practice. Our prototype modified the open-source, WCAG-compliant AblePlayer [13], adding functionality to drag and resize the SLI video, as well as to change its background color and transparency.
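
As a rough illustration of how such controls could be realized in a web player, the TypeScript sketch below wires up a draggable interpreter overlay whose background color and transparency can be adjusted. The SLISettings shape, color values, and pointer handling are our own illustrative assumptions, not AblePlayer's actual API.

```typescript
// Minimal sketch of a draggable sign language interpreter overlay with an
// adjustable background. The SLISettings shape and color values are
// illustrative assumptions, not AblePlayer's actual API.

interface SLISettings {
  background: 'blue' | 'green' | 'white';
  opacity: number; // 0.25 to 1.0, mirroring the study's 25%-100% range
}

const COLORS: Record<SLISettings['background'], string> = {
  blue: '0, 0, 255',
  green: '0, 128, 0',
  white: '255, 255, 255',
};

// Apply the chosen background color at the chosen transparency level,
// leaving the interpreter video itself fully opaque.
function applySettings(sli: HTMLElement, s: SLISettings): void {
  sli.style.backgroundColor = `rgba(${COLORS[s.background]}, ${s.opacity})`;
}

// Let the viewer drag the interpreter window around the player.
// Assumes the overlay element is absolutely positioned.
function makeDraggable(sli: HTMLElement): void {
  let offsetX = 0;
  let offsetY = 0;
  sli.addEventListener('pointerdown', (e: PointerEvent) => {
    offsetX = e.clientX - sli.offsetLeft;
    offsetY = e.clientY - sli.offsetTop;
    sli.setPointerCapture(e.pointerId);
  });
  sli.addEventListener('pointermove', (e: PointerEvent) => {
    if (!sli.hasPointerCapture(e.pointerId)) return;
    sli.style.left = `${e.clientX - offsetX}px`;
    sli.style.top = `${e.clientY - offsetY}px`;
  });
}
```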

3 Method

Two studies were conducted. The first focused on user evaluation and usability of the proposed new features with the proof-of-concept platform, where users can adjust the SLI, and the second focused on analyzing eye gaze and attention behavior.

3.1 Study #1: Technical Features: Evaluation and Usability

Ten DHH and hearing individuals participated in the study. All were proficient in sign language, and five also used speech. Each received a $25 gift card. This study employed a mixed-methods approach under controlled conditions. The experimental design had two components: (i) video viewing with user customization, accompanied by descriptive observation, and (ii) evaluation surveys. The videos were presented on a mockup news website using AblePlayer. We added SL support through a drop-down menu for selecting an SLI with different background colors (blue, green, and white) and transparency levels from 25% to 100%. Users could drag the SLI to reposition it.
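
The drop-down controls described above might be wired to the overlay roughly as follows. The element IDs and option values here are hypothetical, and the sketch reuses the applySettings and makeDraggable helpers from the earlier snippet.

```typescript
// Hypothetical wiring of the study's drop-down menus to the overlay;
// the element IDs and option values are assumptions, not the prototype's markup.
const sli = document.getElementById('sli-video') as HTMLElement;
const bgSelect = document.getElementById('sli-bg') as HTMLSelectElement;
const opacitySelect = document.getElementById('sli-opacity') as HTMLSelectElement;

function currentSettings(): SLISettings {
  return {
    background: bgSelect.value as SLISettings['background'], // 'blue' | 'green' | 'white'
    opacity: Number(opacitySelect.value),                    // e.g. 0.25, 0.5, 0.75, 1.0
  };
}

bgSelect.addEventListener('change', () => applySettings(sli, currentSettings()));
opacitySelect.addEventListener('change', () => applySettings(sli, currentSettings()));
makeDraggable(sli);
```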

Four videos were presented to each participant: an Oscar ceremony clip, a news segment on gas prices, an interview, and an animal documentary, each ranging from 45 to 75 s. Two Certified Deaf Interpreters (CDI), one African American and one White, provided interpretation. The combination of four videos and two interpreters was randomized on a per-participant basis. Observational data were collected using the Think Aloud Protocol (TAP) [14], allowing participants to express their thoughts, feelings, and decisions in real time while being recorded. An ASL version of the System Usability Scale (ASL-SUS) [15] was used to evaluate the system’s usability; it comprises 10 questions, each rated on a 5-point Likert scale. Additionally, the overall user-friendliness of the technology was rated on a 7-point adjective scale ranging from Worst Imaginable to Best Imaginable.
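
For reference, standard SUS scoring converts the ten 5-point responses to a 0-100 scale. The sketch below implements that standard formula, on the assumption that the ASL-SUS is scored the same way as the English SUS it translates.

```typescript
// Standard SUS scoring: odd-numbered items contribute (response - 1),
// even-numbered items contribute (5 - response); the summed contributions
// are scaled by 2.5 to yield a 0-100 score. Assumes the ASL-SUS follows
// the same scoring rule as the original SUS.
function susScore(responses: number[]): number {
  if (responses.length !== 10) throw new Error('SUS has exactly 10 items');
  const sum = responses.reduce(
    (acc, r, i) => acc + (i % 2 === 0 ? r - 1 : 5 - r), // index 0 is item 1 (odd)
    0,
  );
  return sum * 2.5; // 0 (worst) to 100 (best)
}

// A uniformly favorable response pattern yields the maximum score of 100.
console.log(susScore([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]));
```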

3.2 Study #2: Interaction and Processing by Users

Nine DHH individuals, aged 18 to 44, participated in the study: five males, three females, and one nonbinary individual. Three were Black, three White, two Asian, and one Hispanic. Five participants preferred both English and ASL, while the rest preferred ASL. Each received a $25 gift card. This study utilized a mixed-methods approach under controlled conditions. Participants observed stimuli on a 27″ monitor positioned 42″ away at eye level, and a mounted webcam recorded their eye movements. The Gorilla Experiment Builder [16] sampled the webcam video every 50 ms to generate eye-tracking data. Due to technical difficulties, eye-tracking data could be analyzed for only one participant.
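
As an illustration of how such 50 ms samples can be turned into the heatmaps shown later (Figs. 2 and 4), the sketch below bins normalized gaze coordinates into a coarse grid of dwell counts. The GazeSample shape is an assumption for illustration, not Gorilla's actual export format.

```typescript
// Hypothetical post-processing of webcam eye-tracking output: gaze samples
// arrive every 50 ms as normalized screen coordinates and are binned into
// a grid of dwell counts for heatmap rendering.
interface GazeSample {
  t: number; // timestamp in ms
  x: number; // 0..1, left edge to right edge of the screen
  y: number; // 0..1, top edge to bottom edge of the screen
}

function heatmapCounts(samples: GazeSample[], cols = 16, rows = 9): number[][] {
  const grid = Array.from({ length: rows }, () => new Array<number>(cols).fill(0));
  for (const s of samples) {
    const c = Math.min(cols - 1, Math.floor(s.x * cols));
    const r = Math.min(rows - 1, Math.floor(s.y * rows));
    grid[r][c] += 1; // each 50 ms sample adds one unit of dwell time
  }
  return grid;
}
```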

The stimuli consisted of two 60-s videos depicting two scenarios: passive (interview) and active (basketball game). For each scenario, the SLI was presented in three sizes (regular, large, and super) and two positions with respect to the video (interior, exterior); see Fig. 1 for examples from the active scenario. We excluded the Interior/Super combination because it did not fit inside the video. Participants watched 10 videos in random order, covering all combinations of video type, size, and positioning. The SL content was prerecorded by a CDI, as in Study #1. Participants completed a brief survey and provided feedback after each video. A 5-point Likert scale assessed (i) comprehension, (ii) visibility, and (iii) cohesion, followed by an open-ended question.
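
A sketch of how these ten conditions per participant can be enumerated and shuffled is given below; the type and function names are our own illustration of the design, not the study's actual code.

```typescript
// Enumerate the Study #2 conditions: 2 scenarios x 3 sizes x 2 positions,
// minus Interior/Super, yields the 10 videos each participant watched.
type Scenario = 'passive' | 'active';
type Size = 'regular' | 'large' | 'super';
type Position = 'interior' | 'exterior';

interface Condition {
  scenario: Scenario;
  size: Size;
  position: Position;
}

function buildConditions(): Condition[] {
  const out: Condition[] = [];
  for (const scenario of ['passive', 'active'] as const) {
    for (const size of ['regular', 'large', 'super'] as const) {
      for (const position of ['interior', 'exterior'] as const) {
        if (position === 'interior' && size === 'super') continue; // does not fit
        out.push({ scenario, size, position });
      }
    }
  }
  return out; // 10 conditions
}

// Fisher-Yates shuffle gives each participant an independent random order.
function shuffle<T>(items: T[]): T[] {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

const playlist = shuffle(buildConditions());
```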

Fig. 1. Interpreter positioning and size in the active scenario: Exterior/Regular, Exterior/Large, Exterior/Super, Interior/Regular, and Interior/Large.

4 Results

4.1 Study #1: Technical Features: Evaluation and Usability

Upon aggregating the scores of all ten participants, the mean System Usability Scale (SUS) score was 82.2, which indicates good overall usability and effective system design. Notably, two participants recorded lower scores of 65 and 55, respectively. Despite these variations, the mean score remains well above the average SUS benchmark of 68, denoting a high level of usability. On the adjective scale, reactions were positive: 33.3% of respondents rated user-friendliness as ‘Good’ (score 5), 50% perceived it as ‘Excellent’ (score 6), and 16.7% rated it as the ‘Best Imaginable’ (score 7). Participants liked the ability to reposition the interpreter within the video window: “The draggable signer window is fantastic…” Customizable backgrounds had a mixed reception, especially the transparency feature. Two-thirds of the participants felt that this feature was not helpful because the variability of the foreground color can interfere with the viewing experience. One participant commented: “It (transparency) looks cool at first, but when I enter Fullscreen mode and activate it with 50% opacity with a busy background, it is difficult to view”.

This was an experimental study in which the platform was built on AblePlayer, and its UI/UX was not fully tested beforehand. Consequently, there were some glitches, which were resolved during the study. Notably, the preference settings are complex and not friendly to inexperienced computer users; some participants required more time and guidance to take advantage of the features offered. Ultimately, most participants agreed that the CIA was a useful platform for supporting the accessibility of SL in the media. A few respondents indicated they would still prefer PIP or auto-generated live captions for immediate accessibility.

4.2 Study #2: Interaction and Processing by Users

In the passive scenario, participants predominantly focused on the right side of the screen where the interviewer and interpreter were positioned. The heatmap (Fig. 2) provides insight into a participant’s gaze activity, highlighting attention on both the interviewer and, notably, the interpreter. Participants slightly preferred the exterior interpreter scenarios (Fig. 3); however, paired t-tests showed no significant differences. Participants commented that exterior placement facilitated optimal visibility of both the interviewer and the guest without any visual obstruction: “I prefer an interpreter on the exterior from outside of the video because it is easy to see clearly on the side of the video than ‘in the way’ of the video screen”. We observed that larger interpreter sizes, particularly the Exterior/Super setting, obscured the background video or introduced visual distractions.

Fig. 2. Passive scenario, Exterior/Regular interpreter placement; a participant’s gaze heatmap overlaps the interviewer and the interpreter.

Fig. 3. Passive scenario evaluation: grouped bars of comprehension, visibility, and cohesion ratings (level of agreement) for Interior/Regular, Interior/Large, Exterior/Regular, Exterior/Large, and Exterior/Super.

In the active scenario of a basketball game, data from one participant showed more variability in their eye gaze patterns. Their attention was not solely confined to the interpreter but engaged with the entire screen. This variation can largely be attributed to the moving and unpredictable nature of the on-screen action, primarily centered around the movement of the basketball (Fig. 4). No consistent preferences as to placement emerged (Fig. 5), although one participant commented on a preference for the interior interface: “Less strain on my eyes and still want to see the actions and the interpreter at the same time”. Others noted that larger interpreter sizes obstructed the content in the corner of the screen, interfering at critical moments, such as a winning shot. A participant stated: “A large-sized interpreter outside the video is much better and clearer”.

Fig. 4. Active scenario, Interior/Regular interpreter placement; a participant’s gaze heatmap covers most of the frame.

Fig. 5. Active scenario evaluation: grouped bars of comprehension, visibility, and cohesion ratings (level of agreement) for Interior/Regular, Interior/Large, Exterior/Regular, Exterior/Large, and Exterior/Super.

5 Discussion

Participants liked the usability of the CIA prototype. Both studies addressed the key research questions by demonstrating improved accessibility of SL in the media. The SL-specific features presented in the first study all proved useful, especially the sizing, draggable interpreter placement, and background options. With such features, media outlets could add accessibility without compromising primary content delivery, unlike a fixed PIP that cannot be toggled on and off. To enhance user experience, it is essential to include a comprehensive set of options that balances a wide variety of choices with an intuitive preference-adjustment interface. These findings are consistent with the specific elements explored by Bosch et al. [7], Debevc et al. [10], and Yi et al. [11].

Results from the second study show that participant preferences for interpreter placement and size vary, and the limited eye-gaze data suggest that the preferred placement may depend on the type of video shown; additional eye-gaze analysis is needed to confirm this. Furthermore, an interpreter occupying approximately one-third of the screen, referred to as ‘Regular’, was preferred for both types of content, which aligns with the work of Kushalnagar et al. [12]. To enhance user experience, a comprehensive set of options must be included, balancing the variety of choices with an intuitive interface for adjusting preferences; this would also support broader adoption of the technology in mainstream media.

These studies collectively reveal a clear pattern that aligns with findings from other research in the field. Ultimately, this study’s primary goal is to enhance user experience, so personalization of the interpreter layout should be prioritized to accommodate the viewing patterns and preferences of diverse users. This includes fully accessible, preferred-language options for the DHH community, while encouraging media outlets to include SLI in their programming without compromising the aesthetics of the viewing screen. Moreover, the integration of other devices, such as Smart TVs and tablets with gesture commands, should be considered due to their unique functionalities and potential impact on user interaction. To achieve comprehensive and inclusive results, future studies should adopt a systematic research design incorporating diverse community members’ perspectives.