1 Introduction

The yearly growth of worldwide social media usage has led to 64.4% of the world population being active online [12]. Furthermore, social media has been a place for building and maintaining a community [19]. Additionally, the COVID-19 pandemic resulted in video conferencing tools growing exponentially [25] and social media in the shape of audio chat platforms emerging. One of the new audio chat platforms is Clubhouse, which was launched in 2020 [13].

The increased engagement online in combination with everyone having the right to take part in society on equal terms [35, 36] puts pressure on social media companies to make their platforms accessible for everyone [5, 9, 38, 40]. The introduction of guidelines like Web Content Accessibility Guidelines (WCAG) [37] has solved some accessibility issues but many remain [6]. Additionally, touchscreens being designed to be handled by visual keys add a layer of complexity [20], despite the introduction of assistive technology, like VoiceOver on iPhone.

A study by Wu and Adamic [40] states that there is still a lot to be covered when it comes to researching the needs of visually impaired users and accessibility of the Internet in general and social media in particular. Similarly, regarding studies on blind users online, Nogueira and Ferreira [26] underline that the UX perspective is missing, whilst Aizpurua, Harper and Vigo [1] highlight that the subjective experience has not been investigated enough. Whilst Radcliffe [31] noted the rapid growth of the new social media, connecting the audio medium to authenticity and intimacy, Strielkowski [33], on the other hand, questioned the future of audio chat platforms. Now that the hype (and the COVID-19 pandemic) has taken its next turn, audio chat platforms are yet to be investigated.

Thus, the new audio-based social media in combination with taking a user experience perspective on visually impaired users engaging online is a topic yet unexplored. This study is one of the first to focus on the experience of visually impaired users of audio chat platforms. By identifying which factors influence accessibility for visually impaired users of audio platforms, this study aims to enhance the understanding of these users’ experience, as well as create guidelines for accessible design of audio chat platforms. The study focuses particularly on the usability and accessibility of the iPhone application Clubhouse, when accessing it by using VoiceOver.

The following research questions guide the study:

  1. What influences the accessibility for visually impaired VoiceOver users to successfully partake in discussions on audio chat platforms?

  2. Which guidelines could be devised to assist in designing audio-based platforms accessible for visually impaired users?

This paper first lays out a framework for the study, describing screen and social media accessibility, as well as what to consider when designing for visually impaired users. Next, the methods used are explained, followed by the findings. Finally, the results are discussed against the framework, with concluding remarks and suggestions for further research.

2 Background

The accessibility for visually impaired users, especially when it comes to screens and social media, is set as a framework for this study. Additionally, an overview of designing for accessibility is given.

2.1 Accessibility for the visually impaired

This study builds on the social model of disability [23], which is in line with the World Health Organisation’s [39] definition of visual impairment, and with what universal design strives to accomplish [28]: a visual impairment is the society limiting the visually impaired individuals, resulting in them being (regarded) impaired.

In line with this, Seale [32] implies that the barriers are created by society, leading to individuals being excluded. Visually impaired users face several barriers when engaging online [40], especially as Web 2.0 is focused on visual elements [5]. In particular, Whitney and Kolar [38] state that navigating social media is a time-consuming and difficult experience for visually impaired users. A study by Della Libera and Jurber [13] indicates that visually impaired users favour WhatsApp, as it does not rely heavily on images. The study also adds that Facebook is one of the social media outlets that the visually impaired use, corroborated by several other studies [5, 30, 38, 40]. Thus, the results show that visually impaired users are bound to choose accessible social media platforms [13, 40]. This can lead to them being excluded from the social media their peers use, as well as (or alternatively) them not experiencing social media as relaxing [38].

2.2 Screens and accessibility

With touchscreens being designed for navigation using sight [20], assistive technologies, like screen readers, have emerged to make smartphones accessible also for visually impaired users [1, 22]. VoiceOver is an integrated iPhone screen reader that describes with audio what is visible or selected on the screen, allowing the user to navigate and interact with their iPhone using different gestures [3]. Some gestures can be used throughout all applications, for example, a two-finger flick up to read the page from the top. Several applications, however, implement integrated gestures for specific commands. For example, the two-finger single tap, also called the magic tap, plays and pauses sound in several applications. The rotor function allows for quick access to specific commands of the user’s choice, for example, between which items they wish to toggle on the screen.

Despite the usage of assistive technology, using and interacting online is challenging for visually impaired users [1]. Consequently, according to Qui, Hu and Rautenberg [30], blind people contribute less online, staying as passive receivers, potentially linked to accessibility issues. Whitney and Kolar [38] suggest that accessing information by using an assistive technology is more difficult. Furthermore, they single out images with embedded text as inaccessible content despite using screen readers. This is in line with Qui, Hu and Rautenberg [30] pointing out photographs as a source of exclusion. Babu [5] noted in their study that the issues visually impaired users encountered on Facebook were, amongst other things, due to a lack of descriptive labelling.

2.3 Designing for accessibility

The United Nations Convention on the Rights of Persons with Disabilities, Article 4 [36], the Directive 2016/2102 of the European Parliament and of the Council of 26 October 2016 [15], as well as the ISO Standard 9241–171:2008 [21], all promote universal design as the approach when designing for accessibility. Universal design aims to take everyone’s needs into account when designing, promoting inclusion to the highest extent possible [28].

Creating accessible products requires time and research, making existing design principles attractive to many companies [17]. WCAG is a popular standard tool that designers turn to early on when designing [6, 8, 37]. However, Babu [5] showcases that some accessibility and usability issues are unrelated to following WCAG guidelines, whilst Begnum et al. [6] point out that some usability issues are unaccounted for in WCAG. More specifically, Power et al. [29] declare that of the problems blind users face online, only half of them are covered by WCAG 2.0.

Many state that studies on visual impairments and social media should focus more on the users [1, 8, 22]. Begnum et al. [6] suggest that whilst guidelines are of importance, empathy with and knowledge of the users are of as much value. Empathy and knowledge can be accumulated by taking into account the user experience, i.e. “the experience the product creates for people who use it in the real world”, including both how easy or hard the usage is and how the user feels [16, p. 6]. Aizpurua, Harper and Vigo [1] explicitly state that user experience design is central to online accessibility, as it helps understand how users feel when interacting with a website. Yesilada et al. [41] echo the importance of taking actual user needs into account through a focus on user experience, which goes hand in hand with accessibility. Nogueira and Ferreira [26], however, state that there is a lack of studies looking into visual impairment, accessibility, and user experience simultaneously. Out of 1015 scientific papers they reviewed, only five studies matched all three key topics. Out of those five, none had a focus on social media.

3 Method

In this section the research approach, selection, as well as methods for data collection and analysis are described. Furthermore, reliability, validity and ethical considerations are presented.

3.1 Data collection

The data collection started with familiarisation with Clubhouse, the platform chosen for the case study. The goal was to understand the context of use and facilitate the process of developing relevant interview questions with both expert and visually impaired user participants. The aim of the interviews with experts was for the authors to extend the understanding of the visually impaired users and accessibility as a field. Moreover, the interviews aided with making informed decisions when preparing questions for the user interviews. Consequently, the aim was to identify what made Clubhouse accessible for visually impaired users, as well as where the platform was lacking in accessibility. The entire data collection process took seven weeks, and it is depicted in Fig. 1.

Fig. 1 Visualisation of the research approach

3.1.1 Clubhouse as the case study

Clubhouse was selected as the case study since it was an established audio chat platform with more than 10 million weekly users in 2021 [14]. The versions of Clubhouse in use during this study were 0.1.31–0.1.36. Users in Clubhouse can operate as individuals or start a club. Clubs are groups joined by people interested in similar topics. In the interface, each club name is accompanied by a green unlabelled house icon.

Both individual users and clubs can start and host audio chat rooms where other users can join to listen or discuss. The different chat rooms are shown in a list named the hallway (Fig. 2), where the scheduled and open chat rooms are listed. For each open room, the following information is listed: name of the club hosting the room (if any), the name of the chat room, a few users currently on stage in the room, the total number of users in the room and the number of speakers.

Fig. 2 Screenshot of Clubhouse (version 0.1.33), from the hallway, where upcoming and ongoing rooms are listed. Names and images have been replaced with placeholders

The chat rooms (Fig. 3) are divided into a stage section and an audience section, which in turn is divided into followed by the speakers and others in the room. On stage, the room moderator(s) and speaker(s) can engage in discussions, having access to a mute/unmute button, which is located in the same place as the raise hand button. A red line over the microphone icon next to the profile picture of a user indicates that the user is muted. When selecting the user via VoiceOver, both their name and them being muted is announced. An icon in the form of a green star next to the profile picture indicates that this person is a moderator. A beige circle around the profile picture indicates that this person is currently speaking. In version 0.1.35, the magic tap gesture was introduced, which when used, stated the name of the current speaker.
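The visual cues described above (red line, green star, beige circle) can be thought of as states that a screen reader needs to receive as text. The following is a minimal sketch in plain Python, not an iOS API and not Clubhouse's implementation; the `Participant` class, its field names, and the wording of the label are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical model of one person on stage; field names are assumptions.
    name: str
    muted: bool = True
    moderator: bool = False
    speaking: bool = False


def spoken_label(p: Participant) -> str:
    """Build the text a screen reader could announce for one participant."""
    parts = [p.name]
    if p.moderator:
        parts.append("moderator")  # visual counterpart: green star
    if p.speaking:
        parts.append("speaking")   # visual counterpart: beige circle
    # Visual counterpart: red line over the microphone icon.
    parts.append("muted" if p.muted else "unmuted")
    return ", ".join(parts)


print(spoken_label(Participant("Alex", muted=False, moderator=True, speaking=True)))
```

Putting every visual state into one label keeps the information reachable with a single selection, at the cost of a longer announcement for each person.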

Fig. 3 Screenshot of Clubhouse (version 0.1.33), from inside a room. Names and images are replaced by placeholders

When joining a chat room, the user is put into the audience. The audience can only listen. If a user wants to be let up on stage to join the conversation, they select the raise hand button. Selecting the hallway icon will open the hallway whilst continuing listening to the conversation in the room. To leave a chat room, the user selects the leave quietly button.

Users can access their own user profile by clicking on the profile image at the top right corner of the screen. To access the profile of another user, the search function can be used, or the profile picture of another user can be selected when inside a chat room. Users can follow clubs and other users they find interesting.

3.1.2 Selection of participants

The expert participants had working experience within the field of accessibility and visual impairments. Additionally, they were themselves visually impaired. The user participants (Table 1) were selected by convenience sampling; they all defined themselves as visually impaired and accessed their iPhones by using VoiceOver. At the time of the interviews, the participants had been users of Clubhouse for between two weeks and three months. The participants were between 25 and 64 years old. They all used Clubhouse with VoiceOver and most of them were blind, two of them having 1–3% sight whilst one could distinguish between light and dark. Half of the participants had high technical skill levels both regarding using VoiceOver and Clubhouse, whilst the rest were on a moderate to basic technical skill level. All users and experts were located in Sweden.

Table 1 Information of the user participants interviewed

3.1.3 Interviews

The expert interviews were all conducted within one week. Four experts were interviewed individually and each interview lasted approximately 30 min. All interviews were recorded, resulting in three of them being transcribed, whereas due to a technical malfunction, one interview was written down from memory. The interviews were based on 13 topics, chosen based on the collected knowledge from the literature study as well as the observations (Table 3 in Appendix 1). Whilst the interviews had clear start and end points, the semi-structured approach allowed the experts to dwell on topics of their knowledge [10]. Two experts were active users of Clubhouse, whereas the two other experts had heard about the platform, but did not use it themselves.

The user interviews spanned three weeks, with each interview lasting between 30 and 60 min. All interviews were recorded and transcribed. The interview questions were formed based on the insights from the literature study, the observations, as well as the expert interviews. Most of the questions were broad and open-ended, allowing for a semi-structured approach with varying follow-up questions. Due to an update of Clubhouse after the fourth interview, some questions were revised before the following interviews. Simultaneously, some questions not generating additional data of value were removed (Table 4 in Appendix 1).

3.2 Reliability, validity and ethical aspects

To ensure reliability, the data collection points were clearly defined [34], and the participants were chosen adhering to specific criteria noted in Sect. 3.1.2 [10]. Additionally, the documented audit trails contribute to transparency (ibid.). The validity was heightened by method triangulation, namely by using observations and interviews, as well as researcher perspective triangulation [24]. Furthermore, participant triangulation took place by sharing the interview statements with the participants. All participants gave their informed consent prior to the interviews [10]. The observations, however, were done with no disclosure, leading to ethical questions arising [2], although the setting observed was public and the focus was on the general behaviour rather than the individuals.

Furthermore, as capabilities and issues with using social media can be a sensitive topic for visually impaired users [30], it was of importance to be mindful that the interaction with participants did not create discomfort for them [17]. It was articulated in the interviews that the focus was on the technology, not on the user’s technical skills. Five weeks for observations intertwined with four weeks for interviews seemed an appropriate duration considering the saturation of the information reached. Additionally, interviewing ten user participants and four experts was deemed to be sufficient for reliable data [27].

4 Analysis and findings

The data analysis process is described below, followed by the results from the different data collection points being presented separately.

4.1 Analysis

Each data collection point was first analysed separately for insights from the earlier data to be used as a basis for the later data collection points. This approach preserved traceability and allowed for keeping track of which insights came from which data. The observations were continually analysed whilst the data collection was ongoing, allowing for the focus to be narrowed down gradually and clarifying the process of how to conduct the expert and user interviews.

The expert interviews were analysed by consolidating all transcribed interviews into predefined categories, as suggested by Chism, Douglas & Hilson [10]. The user interviews were analysed by allowing the overarching themes to arise from the content as explained by Arvola [4]. Finally, insights from all the data were grouped into four prominent themes and used to formulate guidelines for designing audio-chat-based platforms in a way that makes partaking accessible for visually impaired users. This process is depicted in Fig. 4.

Fig. 4 Data analysis process

4.2 Findings

The results from the observations, the expert interviews, and the user interviews are presented separately.

4.2.1 Observations

The observations, as the first data point, served to familiarise researchers with the context and the platform, to be able to design better expert and user interview questions. As such, the observations showed that the technical features of the application were quite accessible. For example, almost all buttons were labelled for VoiceOver, whereas images were limited to profile pictures of users and clubs. However, user workaround behaviour related to missing features in the application caused accessibility problems, for example, applauding with the mute button, and users switching their profile picture to one with embedded text to state that they were only there to listen, or to show a picture mid-discussion. These instances also came up during user interviews and are discussed below within the provided themes.

4.2.2 Expert interviews

The data collected during the expert interviews were divided into the following pre-defined categories, which were based on gaps identified after conducting the observations and the literature study. In addition to the results presented next, the experts aided in practicalities regarding communicating with and interviewing visually impaired individuals.

4.2.2.1 Topic 1: Accessibility and design on social media in general

The conversations with the experts started with their perception of social media in general. A heavy focus on the visual elements as well as cluttered interfaces were issues that the experts underlined. The experts also stressed that in addition to following guidelines when designing for visually impaired users, designers should also follow best practices, include accessibility settings, and involve visually impaired users in the design process.

4.2.2.2 Topic 2: The audio chat platform Clubhouse

One positive aspect noted by the experts was the uncluttered interface of Clubhouse, making it easier to navigate. Another positive aspect was that visually impaired individuals could partake in discussions on a close to equal level with sighted individuals, as there were no videos and barely any visual elements that had to be taken into account when talking. However, the experts also noted issues when visually impaired users took part in the chats, including the following: difficulty in knowing who was talking, not knowing the number of participants in the room, not being aware of applause, as well as both asking to join the stage and joining the stage having the same audio feedback, making it unclear if one indeed had joined the stage.

4.2.3 Qualitative semi-structured interviews with users

Four themes emerged through the data analysis. The first theme encompasses Clubhouse as a social media in general, whereas the two following themes focus on the core activities in Clubhouse: being in the audience in chat rooms and being on the stage in chat rooms. The last theme highlights how the visually impaired users could receive more information on, e.g. activity in a chat room.

4.2.3.1 Theme 1: Clubhouse as a shared social space

The participants used Clubhouse for many reasons: spontaneous conversations, conversations in hang-out rooms, to listen in and discuss formal topics, to get information (there were, for instance, rooms teaching how to use VoiceOver), to find people with shared interests, and for networking.

Apart from Clubhouse, Facebook was used by all participants on a regular basis to connect with friends and get a sense of community. A majority of the participants thought that Facebook was attentive to the needs of visually impaired individuals, for instance, by integrating VoiceOver gestures and labelled buttons and icons. On the downside, they commented that Facebook had become hard to navigate due to its size, many features, as well as the complex layout with banners and many sections.

Thus, a majority of the participants were positive about Clubhouse being small in size and scope as one participant commented:

“The advantage of Clubhouse is that it’s really limited, there’s not that many things one can do”. (P2)

There was, however, a concern of Clubhouse starting to add too many features, which would make it difficult to use the application on a small smartphone screen.

“If there are too many features on a small screen, it’s hard to locate exactly where it is. [...] If there is a bigger portion of the screen dedicated to that functionality, it’s easier”. (P10)

Thus, the fact that Clubhouse was only accessed as a smartphone application, and not via a browser, also made a difference. One of the participants explained how a website is two-dimensional for sighted, but one-dimensional for those using a screen reader, since navigation takes place by moving forward or backwards and memorising the names of the icons to be able to navigate easily. When accessing Clubhouse, the participants explained that being able to locate items via VoiceOver on the screen by utilising the labelling was key.

Several participants noted that the low focus on visual elements made Clubhouse unique. Instagram, Snapchat and TikTok were platforms that some participants had tried but discontinued using or only used sparingly, due to the platforms focusing heavily on pictures and videos. Thus, the participants were positive about Clubhouse not having a dedicated way of sharing pictures, except for the profile picture.

Additionally, two participants appreciated that no camera was used within the app. Whilst most of the participants used video conferencing tools on a regular basis and mentioned both Teams and Zoom being mostly accessible, they also stated that there was an additional pressure of taking the camera into account. The participants pointed out that they did not know if they were centred in front of or looking into the camera when speaking. Additionally, video was not deemed as providing added value for themselves.

Thus, all participants were delighted about the fact that Clubhouse was sound-based. One participant mentioned:

“Clubhouse has taken over my world. I’ve been waiting for this for 10–15 years, it’s amazing that we [who are visually impaired] finally can take ownership of what we are saying, just like anyone else. This is the first app where we can be truly integrated” (P9).

Another participant mentioned that:

“We’ve joked amongst friends, we’ve wished there was something like Voicebook instead of Facebook, which is exactly what Clubhouse is”. (P6)

Some of the participants mentioned that, to their surprise, the number of Swedish visually impaired users was growing steadily on Clubhouse:

“We who are visually impaired, we are usually not the first to start using some new social media, but Clubhouse works for us”. (P8)

Three participants explicitly mentioned that there was a low threshold to start using Clubhouse, although one of them stated that:

“The strong suit of Clubhouse is that it’s very limited [...] but it’s a balancing act – of course one would like more features and enhanced capabilities”. (P2)

4.2.3.2 Theme 2: In the audience

Some of the key actions in Clubhouse are: looking for a chat room, joining a chat room and listening to the conversation whilst being in the audience.

The participants stated that it was easy to find and join chat rooms. Additionally, all participants said it was easy to leave chat rooms, as the icon for leaving the room was labelled. Initially, before update 0.1.32, there had been some inconsistency in labelling buttons related to browsing chat rooms, as some were labelled “hallway” and others “all rooms”. This made it unclear if the different buttons led to the same or to different places in the application.

When joining a chat room, the participants expressed that, in general, it was easy to listen in on discussions. However, some did explain that initially, the layout of the room was a bit unclear, as the room was divided into a section without a heading indicating people on the stage, and an audience divided into two sections named Followed by the speakers and Others in the room. Thus, clarifying the sections in the room with headings and dividers, including clearer labels, would make navigation easier, according to participants.

User workaround behaviour related to the profile picture was highlighted by some of the participants as not accessible: people on stage changed their profile picture, as a way of “sharing” pictures with the audience during discussions. Two of the participants mentioned that whilst in the audience, sometimes they just wanted to listen. However, they still were encouraged to join the stage. Thus, the visually impaired participants wished for a way to signal to other users that they were only interested in listening, which sighted users did by changing their profile picture to one with embedded text.

The participants also mentioned that they would like to know who was speaking. Few participants were aware of the current speaker visually being marked on the screen. During the first five interviews, the participants explained that the speaker had to either state their name or the visually impaired user had to try to recognise their voice to know who was speaking. The participants were aware of the muted/unmuted icon being visually available next to each person on stage and suggested that knowing when someone was unmuting would indicate the name of the current (or upcoming) speaker.

Some technically more advanced participants used the watch functionality on VoiceOver to know who was unmuting. Watching is a functionality that tracks selected objects on the interface when their state changes, e.g. an icon changing from muted to unmuted. Having to click around in order to find out who was speaking, however, was deemed tedious as it took attention from the discussion. Nevertheless, when the first five interviews were conducted, the name of the person speaking was inaccessible on VoiceOver.
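The idea behind watching, being told about a change instead of having to go looking for it, can be illustrated with a small sketch. This is plain Python, not VoiceOver's actual mechanism; the `MuteWatcher` class and the announcement wording are hypothetical and serve only to show the change-detection logic.

```python
from typing import Callable, Dict, List


class MuteWatcher:
    """Hypothetical watcher: reports only when a watched person's mute state changes."""

    def __init__(self, announce: Callable[[str], None]) -> None:
        self._announce = announce
        self._muted: Dict[str, bool] = {}

    def update(self, name: str, muted: bool) -> None:
        previous = self._muted.get(name)
        self._muted[name] = muted
        # Stay silent on the first observation and when nothing has changed.
        if previous is not None and previous != muted:
            self._announce(f"{name} {'muted' if muted else 'unmuted'}")


spoken: List[str] = []          # stands in for speech output
watcher = MuteWatcher(spoken.append)
watcher.update("Alex", True)    # first observation: no announcement
watcher.update("Alex", False)   # state changed: announced
watcher.update("Alex", False)   # unchanged: no announcement
print(spoken)
```

Announcing only transitions keeps the listener's attention on the discussion, which is the burden the participants described when having to click around.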

However, in the Clubhouse version 0.1.35, a gesture for VoiceOver announcing the name of the person speaking was added, hence, the remaining five interviews were conducted with this version in use. When the new feature was added, all remaining participants spoke enthusiastically about it. For instance:

“If you would have asked me just a few days ago [what I’m missing on Clubhouse] I would have said it’s a shame you can’t see who’s talking” (P7).

However, the feature did not work smoothly for everyone. Two participants experienced problems with the gesture, like the speaker's name not being announced or other applications starting in the background when engaging with this gesture. They assumed the reason was the chosen gesture by Clubhouse being the magic tap, which in several other applications equals “play”.

In addition to not knowing who was speaking, the visually impaired users did not know who entered or left the room and the stage. This was visible on the screen, as one could see the users’ profile pictures appearing and disappearing. With a VoiceOver gesture, the participants could go through the names one by one, but they lacked the possibility of the quick glance they wished for. One participant commented that:

“When you can’t see, you must consistently check if someone new comes into the room” (P2).

The number of participants inside a room was also available visually as a lineup of profiles. However, even for the sighted, this was not as straightforward as the visually impaired participants assumed. The number of participants was only available in the hallway, whereas one could get an idea of the size of the audience by scrolling through the profiles being present in a room.

From the hallway, the number of room participants was in earlier versions accessible via VoiceOver. However, the participants stated that it was tedious to jump back and forth between the room and the hallway to check the number of room participants, as one missed out on the discussion listening to several VoiceOver announcements. They wished that the total number of participants in the room as well as in each section would be accessible from inside the room.
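What the participants asked for amounts to a single in-room summary that replaces the trip back to the hallway. A minimal sketch of such a summary is given below, assuming the three room sections described earlier; the function name and the wording of the announcement are hypothetical.

```python
def room_summary(on_stage: int, followed: int, others: int) -> str:
    """Compose one announcement with the room total and the count per section."""
    total = on_stage + followed + others
    return (
        f"{total} people in the room: {on_stage} on stage, "
        f"{followed} followed by the speakers, {others} others"
    )


print(room_summary(3, 5, 12))
```

A single sentence of this kind could be read out on demand, so the listener would miss only a few seconds of the discussion.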

Two participants pointed out that after an update (0.1.36) the information in the hallway had suddenly become unavailable on VoiceOver, probably due to labelling being removed. One participant stated:

“After the last update, I cannot see how many [people] there are in the rooms anymore […], they’ve removed that information, and that’s a shame” (P9).

Several participants put emphasis on the importance of following interesting clubs and speakers, as the content of Clubhouse was affected by what one followed. When moderators prompted users to follow the club organising the current room, they often referred to a green house icon located in the proximity to the club name. The participants mentioned that the icon was unlabelled and unclickable, making it impossible to locate on the screen and hence, harder to follow a club.

The participants also pointed out that it was not straightforward to follow speakers. Even if they knew the name of the speaker, they still needed to locate the speaker on stage. This was difficult considering that there often were many on stage, sometimes even with the same name. One of the participants interviewed prior to the addition of the magic tap feature, suggested that the current speaker should be placed in a dedicated area on top of the stage. Other participants, interviewed after the magic tap was added, wished that Clubhouse would utilise VoiceOver gestures to a larger extent.

“For example, [following the magic tap gesture and hearing the name of the user selected] one could [use a gesture to] pull down to get options, like ‘follow’.” (P9)

Several participants wished for Clubhouse to implement Clubhouse-specific VoiceOver gestures, as gestures made it possible to access features and icons directly, instead of locating them on the screen. Utilising the more advanced rotor functionality for even more possibilities was also suggested.

4.2.3.3 Theme 3: On stage

In addition to joining chat rooms to listen to the conversations, joining the stage to participate in a discussion was another key activity. All participants had been on stage at least once and most participants considered it easy to enter and leave the stage.

However, some of the participants considered it complicated with the two-step action to join the stage. First, they had to select the raised hand icon labelled “Request to speak, button” and then, as the request was accepted by the moderators, confirm a popup with a button labelled “Join to speak, button”. Some participants wished that the sound for joining the stage would be unique, instead of sharing the sound with requesting to join. The participants suggested that it would be easier to access the stage if a VoiceOver gesture for requesting to join the stage was added, combined with a sound confirming the action. In total, half of the participants proposed a dedicated functionality for asking to be let up on stage. The participants commented that this could be combined with a waiting list of speakers, which would be most useful in bigger and more strictly moderated rooms compared to smaller informal rooms, and suggested that the moderators of each room could decide if there should be a waiting list or not. Two participants further wished for a sound notification that informed when one had reached the top of the waiting list.
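The suggestions above (a distinct sound per step, an optional waiting list, and a notification on reaching the top of the list) can be sketched as a small queue. This is an illustrative model only: the `StageQueue` class and the cue names are assumptions, and recorded cue names stand in for actual audio playback.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical cue names; the point is only that each event has its own cue.
CUES = {"requested": "cue_request", "top_of_list": "cue_next", "joined": "cue_joined"}


class StageQueue:
    """Waiting list for the stage that records a distinct cue for each event."""

    def __init__(self) -> None:
        self.waiting: Deque[str] = deque()
        self.played: List[Tuple[str, str]] = []  # (user, cue), stands in for audio

    def request(self, user: str) -> None:
        self.waiting.append(user)
        self.played.append((user, CUES["requested"]))
        if len(self.waiting) == 1:
            # The list was empty, so this user is immediately next in line.
            self.played.append((user, CUES["top_of_list"]))

    def admit_next(self) -> str:
        # Raises IndexError if nobody is waiting.
        user = self.waiting.popleft()
        self.played.append((user, CUES["joined"]))
        if self.waiting:
            # Tell the new first-in-line user that they are next.
            self.played.append((self.waiting[0], CUES["top_of_list"]))
        return user


queue = StageQueue()
queue.request("A")
queue.request("B")
admitted = queue.admit_next()
print(admitted, queue.played)
```

Because requesting and joining map to different cues, a listener can tell the two steps apart without checking the screen.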

An additional issue regarding the button labelled “Request to speak, button” was linked to the visually impaired users largely being required to locate items on the screen, instead of being able to utilise gestures. When a user was let up on stage, this button was exchanged for a mute/unmute button. Whilst on stage, the participants thought it was easy to locate the mute/unmute icon; however, the fact that it shared its screen location with the request to join the stage had initially confused a few of the users. One user suggested that being able to mute/unmute with a gesture, instead of by locating the button on the screen, would make it much easier to use with VoiceOver.

Whilst voice as the medium was accessible in general, the sighted users on stage often signalled their wish to speak by unmuting or slowly blinking with the mute button, resulting in visually impaired users struggling to take part in conversations:

“I often interrupt [others] when talking, for example, because I haven’t been able to see that someone else has been blinking” (P9).

Whilst it was visually indicated on the screen and VoiceOver could detect if someone was unmuted, the visually impaired users could not quickly access the information. Interrupting their participation, the VoiceOver users had to listen to the VoiceOver reading through all the people on stage to hear if a microphone was muted or unmuted, or they would have to know how to use the watch functionality.

In addition to signalling a wish to speak, the people on stage applauded by rapidly tapping the mute button, resulting in a blinking icon. This was also an obstacle for the visually impaired users, as several of the participants explained they could not know when others were applauding. Furthermore, the participants explained that the rapid tapping required a lot of training if done with VoiceOver. Hence, not all VoiceOver users could applaud, as it was technically too complicated. Whilst the participants wished for a dedicated applause feature, they were divided about it, as one participant stated:

“The risk is that if also the audience can applaud, and there are 1200 individuals in the audience and someone says something amazing, that’s gonna be one heck of a hullabaloo if the VoiceOver reads the names of everyone who has applauded” (P10).

To solve this challenge, participants provided several suggestions. One was to restrict applause only to users on stage. Another suggestion was to allow hearing only the applause directed at oneself. Some suggested that the applause could be conveyed by a short sound or vibration, whilst other participants wished for more VoiceOver feedback, like hearing the name of the user applauding or hearing the number of users applauding. However, the suggestions – adding sound, vibrations, names and numbers – were often followed by remarks of them easily becoming overpowering, tiring or hard to distinguish. In addition to the applaud functionality, nine participants wished that Clubhouse would add the possibility to react and to be reacted to, similar to the “liking” gesture on other platforms.

4.2.3.4 Theme 4: Sound notifications to enhance feedback

As showcased above, knowing who was speaking and applauding were two of the major features visually impaired users wished they would have access to. However, there were also other discussions regarding inaccessible information, for example, users joining or leaving a room or a stage.

Eight participants wished for a VoiceOver notification when someone entered the room or stage. One informant pointed out that it also would be good to know who joined, as they might want to steer clear of some topics if a specific person was in the room. In contrast, the majority of the participants preferred no announcement when someone left the room. On the other hand, all participants wished for a VoiceOver announcement when someone left the stage. A participant clarified that if they knew a person left the stage, they wouldn’t address that person anymore.

Since Clubhouse and VoiceOver were both audio-based, they potentially competed for the auditory space. Participants wanted VoiceOver feedback, but at the same time, the announcements risked disturbing the discussions. The factors suggested as influencing how much audio feedback participants wanted were: the size of the room; whether the room was strict or formal; whether one was listening or speaking; and whether one was solely focused on the discussion or multitasking.

To keep the number of VoiceOver announcements balanced, some participants suggested that there could be more VoiceOver announcements in smaller rooms and fewer in bigger rooms. Another suggestion was that one would only hear announcements related to users they followed. Yet another idea was to use sound feedback instead of VoiceOver announcements, since sounds are shorter. However, sound distinction was mentioned as a concern, as one participant stated:

“There is a risk that if there are too many notification sounds, you can’t remember them all. ‘What was that sound again?’” (P1)

Many participants pointed out that the preferences and needs probably were rather diverse amongst visually impaired individuals. Therefore, participants wished for a possibility to adjust the amount of VoiceOver and sound feedback. The participants suggested adding a setting that users could tweak according to their needs, either at a global level in the user profile or at a local level in each room. Some of the participants, however, were worried that accessibility settings may add complexity and thus recommended having a couple of pre-set levels to choose from for users with lower technical skills.
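To make the participants' suggestion more concrete, the following is a minimal sketch of how such adjustable feedback could be modelled, under stated assumptions. It is not based on Clubhouse's implementation or on any VoiceOver API. The event names, the three pre-set levels (`minimal`, `balanced`, `detailed`) and the class `FeedbackSettings` are hypothetical and chosen only for illustration. The sketch combines three ideas raised in the interviews: a global pre-set that can be overridden per room, an option to hear only feedback about followed users, and the choice between a short sound and a spoken announcement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Event(Enum):
    JOINED_ROOM = "joined_room"
    LEFT_ROOM = "left_room"
    JOINED_STAGE = "joined_stage"
    LEFT_STAGE = "left_stage"
    APPLAUSE = "applause"


class Channel(Enum):
    SILENT = "silent"  # no feedback at all
    SOUND = "sound"    # short notification sound
    SPOKEN = "spoken"  # screen reader announcement


# Hypothetical pre-set levels (assumption, not from the study).
# Events missing from a pre-set produce no feedback.
PRESETS = {
    "minimal": {Event.LEFT_STAGE: Channel.SOUND},
    "balanced": {
        Event.JOINED_ROOM: Channel.SOUND,
        Event.JOINED_STAGE: Channel.SPOKEN,
        Event.LEFT_STAGE: Channel.SPOKEN,
        Event.APPLAUSE: Channel.SOUND,
    },
    "detailed": {
        Event.JOINED_ROOM: Channel.SPOKEN,
        Event.JOINED_STAGE: Channel.SPOKEN,
        Event.LEFT_STAGE: Channel.SPOKEN,
        Event.APPLAUSE: Channel.SOUND,
    },
}


@dataclass
class FeedbackSettings:
    """User preferences: a global pre-set plus optional per-room overrides."""
    global_preset: str = "balanced"
    room_presets: dict = field(default_factory=dict)  # room id -> pre-set name
    followed_only: bool = False  # only give feedback about followed users

    def channel_for(self, event, room_id, actor_is_followed=True):
        """Return how an event should be conveyed in the given room."""
        if self.followed_only and not actor_is_followed:
            return Channel.SILENT
        # A room-level setting takes precedence over the global one.
        preset = self.room_presets.get(room_id, self.global_preset)
        return PRESETS[preset].get(event, Channel.SILENT)


settings = FeedbackSettings()
settings.room_presets["large-room"] = "minimal"
print(settings.channel_for(Event.JOINED_STAGE, "small-room"))
print(settings.channel_for(Event.JOINED_STAGE, "large-room"))
```

In this sketch, none of the pre-sets announces users leaving the room, mirroring the majority preference reported above. Offering a small number of named levels, rather than a separate switch for every event, reflects the participants' concern that detailed accessibility settings could add complexity for users with lower technical skills.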

As an alternative, one participant suggested that Clubhouse could add an Activity Section where they could hear VoiceOver automatically announcing any changes or updates, such as users entering, leaving, or applauding. If users wished to discontinue the announcements, they could just leave the Activity Section. The idea was discussed with four other participants, who were positive.

5 Discussion

The findings from the expert and user interviews corroborated each other to a large extent when similar issues were discussed. In the following section, the findings of the study are viewed through the framework of previous studies on visual impairments, accessibility and design.

5.1 Discussion of results

This study set out to identify the factors influencing the interaction of visually impaired users on an audio chat platform. Additionally, an outcome of the results is the derivation of guidelines for accessible design.

5.1.1 Accessibility for visually impaired users

The first research question the study set out to answer was:

What influences the accessibility for visually impaired VoiceOver users to successfully partake in discussions on audio chat platforms?

The findings of the study showcased that there were several factors that influenced the accessibility of partaking in discussions on the platform. First, accessing Clubhouse via VoiceOver with most parts of the application being clearly labelled was a key component of accessibility. For example, due to clear labelling, users could easily join a room and request to speak. This is in agreement with Babu [5], showing how unclear or missing labelling is one of the defining factors of poor accessibility. Second, the limited scope of Clubhouse, i.e. only having a handful of features available, was another factor positively influencing accessibility. This is in line with the experts mentioning cluttered interfaces creating issues and the study by Whitney and Kolar [38] stating that visually impaired users navigating complex platforms results in missing out on information. Third, the limited focus on videos and photographs was an accessibility factor supported by several other studies [13, 30, 38, 40], as well as by the experts interviewed. Finally, having sound as the main medium contributed heavily to the accessibility of the platform. Both the users and the experts pointed out that not having to think about looking into the camera when talking allowed visually impaired users to participate on equal grounds with the sighted. This can be linked to the study by Whitney and Kolar [38], who showcased that information can be lost when accessed using assistive technology. Thus, since the discussions on Clubhouse were a form of audio-based communication and there were no visual elements to take into account, the visually impaired users could interact with minimal use of assistive technology.

Therefore, the visually impaired users experienced independence, ownership and inclusion when using Clubhouse. They felt they had control of and access to both the application in general and the discussions in particular, making the users feel they had ownership of their own actions. The users chose to stay and return, making the platform a deliberate choice of their liking. According to Bigham et al. [7] and Della Líbera and Jurber [13], blind users stick to content they deem accessible. Therefore, in contrast to the study by Qui, Hu and Rautenberg [30], indicating that visually impaired users more often than sighted users tend to be spectators on social media, our study shows that the visually impaired users are active content creators. We suggest this is due to the high accessibility of the platform. Additionally, as Gruzd and Haythornthwaite [19] imply regarding social media in general, Clubhouse became a place of community, especially in the light of the ongoing COVID-19 pandemic.

However, there were also findings offering indications as to what lowered the accessibility for visually impaired users. Whilst the limited features and the simple interface were deemed to enhance accessibility, they were also a complicating factor. In contrast to Whitney and Kolar [38], the visually impaired users were at times aware of the content they missed out on.

First, user work-around behaviour led to some activities and features being inaccessible to the visually impaired users. Examples of this included blinking with the mute button to applaud and ask for the floor, as well as using the profile picture to communicate a visual message or unwillingness to participate in the discussion. As the user behaviour did not follow what the features were intended for, there were no integrated labels [5], and the pictures with embedded text were also inaccessible [38]. Second, there was information only accessible visually on the screen. For example, people on the stage were visually located on the top part of the screen, but there was no heading to indicate they were on stage. Similarly, there was a visual indication of who was unmuted and speaking, but (initially) there was no way to quickly get the same information via VoiceOver. Another aspect was the inability to easily follow a person on the platform, which was something users were interested in doing. Finally, there was a green house icon next to the club name that room moderators often referred to as a guidance point in the interface, but since it was unlabelled, it was not possible for the visually impaired users to locate it.

All of these examples were obstacles for the visually impaired users when listening to and partaking in the discussions in the chat rooms. This also affected the content of Clubhouse for them, as following interesting users and clubs was what tailored their hallway feed. However, in contrast to Whitney and Kolar [38], the visually impaired users also partly had the wrong impression of what was visually accessible. For example, they assumed there were clear divisions of the sections in the chat rooms and that sighted users had an overview of the number of participants.

Finally, whilst Clubhouse was a social media platform with some less accessible features, it was fairly accessible to the visually impaired users due to labelling, a limited scope of features, as well as a focus on audio chatting. Nevertheless, the accessibility of the platform was to a large extent built on the visually impaired users navigating by using generic VoiceOver gestures to locate labelled icons on the screen. Having to take into account where something was located on the screen, and having the chats interrupted by labels being read out, could be avoided by implementing Clubhouse-specific gestures for faster access. Clubhouse had only implemented one gesture, the magic tap, announcing the name of the speaker. This was highly praised by all user participants interviewed after the update including the gesture had been released, despite it not working without fault. In addition to the magic tap, the users would want, for example, to be able to use VoiceOver gestures when muting/unmuting themselves or to follow a speaker on stage. Additionally, adding the rotor functionality for shortcuts would take the accessibility to the next level, allowing the visually impaired users to tailor their experience.

Another unexplored possibility for added accessibility was utilising sound and haptic feedback. For example, two distinct events, confirming the request to join the stage and being let up on stage, were announced to the user with the same sound. On the other hand, the user work-around for applause was not conveyed at all. The visually impaired users were positive about implementing clearly distinguished sound and haptic feedback for different features. However, they also showed concern that providing rich feedback could pollute the conversation and become tiring. As the results also indicated that using VoiceOver on an audio-based platform carried the risk of VoiceOver announcements disturbing the ongoing chats, the visually impaired users suggested accessibility settings or an activity centre to regulate the amount of VoiceOver feedback to a level that was helpful and not disturbing.

Thus, the platform had to a large extent not made use of integrated Clubhouse-specific gestures, the rotor functionality, or sound and haptic feedback. The platform was fairly accessible but lacked a true rethinking of how it could be used by visually impaired individuals through interaction modalities more suitable for non-sighted users. Hence, truly empathising with the visually impaired users did not take place when designing Clubhouse.

5.1.2 Designing for accessibility

The second question the study set out to answer was:

Which guidelines could be devised to assist in designing audio-based platforms accessible for visually impaired users?

As showcased above and in several articles [1, 6], empathising with the visually impaired users by taking a UX perspective is a key factor in designing accessible audio chat platforms. Additionally, WCAG is a popular starting point in accessible design, and at times the sole perspective on accessibility, due to restricted resources [17].

To support designing for accessibility, guidelines for designing accessible audio chat platforms were derived based on the results of this study (Table 2). The guidelines do not exist in a vacuum [6] but should be utilised in a holistic approach [1], to complement WCAG, universal and inclusive design approaches as well as best practices.

Table 2 Guidelines devised to assist in designing

6 Conclusion and future work

The heavy reliance on visual elements makes social media platforms less accessible for visually impaired users compared to their sighted peers. This study investigated an audio chat platform, Clubhouse, aiming to understand what influences the accessibility for visually impaired users to successfully partake in discussions, and identify how these insights could be used in other design instances.

The results of the study highlight several aspects that influence the accessibility for visually impaired VoiceOver users on audio chat platforms. The following aspects influence accessibility positively: labelling clearly for VoiceOver users, keeping the scope of features limited, keeping the number of images and videos to a minimum, having sound as the main interaction and feedback medium, as well as implementing VoiceOver gestures in combination with sound and haptic feedback. If users create work-around solutions due to the limited set of features, being agile and rapidly providing accessible alternatives should be prioritised. In general, accessibility can be heightened by allowing visually impaired users to interact with minimal use of assistive technology, as this improves their feeling of independence and inclusion. An increased implementation of integrated gestures for navigation and interaction would have the potential to significantly increase accessibility for visually impaired individuals. That would be especially true if gestures were conceptualised from the initial design, rather than being an after-development addition.

The study also contributes with a set of guidelines for designing accessible audio chat platforms that adds to the existing recommendations from WCAG, universal and inclusive design approaches as well as best practices. The guidelines include recommendations for designing accessible audio chat platforms in general, like allowing users to define their own preferences in an accessibility settings menu.

There are a few limitations to this study. Only four experts and ten visually impaired users were interviewed, which could be considered a small sample. Additionally, many participants had higher technical skill levels than an average visually impaired user, which could have skewed their perceptions of audio chat platforms. Whilst the population from which the participants were drawn was clearly defined beforehand, thus facilitating reliability [34], the criteria could have been stricter, taking into account the level of visual impairment of the participants, as well as their technical skill level or the extent to which they had been active on Clubhouse. This makes the study difficult to repeat, which lowers the reliability [34]. It should also be noted that the interest in Clubhouse peaked in 2021, when this study was conducted, whilst the interest in the application has decreased drastically since then [18], and in September 2023, the company announced that it was relaunching the application as a group chat [11].

Another limitation is this research being conducted as an instrumental case study based only on a qualitative approach, which possibly resulted in a limited understanding of the user needs [10]. Combining the qualitative data collection with a quantitative method would have been preferable [34], as triangulation of methods increases validity [24].

Due to these limitations, and considering the lack of studies focusing on the use of social media by the visually impaired [1], the following is proposed as future work. The new group chat version of Clubhouse should be examined to understand its accessibility. Additionally, case studies of other audio chat platforms would be a good complement, allowing comparison of results, whilst also deepening the knowledge through involving a larger number of visually impaired participants with diverse technical skills. In addition, controlled usability studies could be conducted, combined with observations of participants using the platform, in order to quantitatively measure UX dimensions such as effectiveness, efficiency and satisfaction.