On TTS audio description – Robert Kingett

If you’re blind or partially sighted, you’ll know about text to speech. It’s in your daily life, even if you don’t want to have text to speech in your life. It’s in the smart speakers, the smart assistants, and other things that have algorithms that people call different things depending on what decade we’re in.

Even if you’re blind or partially sighted, though, you might still not know about audio description. Audio description is a service that brings the visual world into an audio, and sometimes, Braille medium, by publishing the written description script or full film transcript with the description script and timecodes included along with dialog timecodes so Deaf-Blind people can read a movie with Braille. Audio description is a service that was started by blind people, for blind people.

I couldn’t get a general consensus of when audio description actually started. Even Wikipedia has an incomplete history of audio description. Some say, outside of Wikipedia, that it was in the 1910s. Others say the 1980s. Some say it was before. Some say it was when Netflix described Daredevil for the first time. It changes depending on who you ask, but audio description was, and is, a service for the blind and partially sighted first. It was never meant for sighted people, but sighted people take advantage of audio description all the time. Maybe they have face blindness, but can still, in the eyes of society, see. Maybe they have a reading disability. Audio description can read text on screen. There’s a whole host of use cases for audio description, as outlined in the many episodes about audio description in the Reid my Mind podcast, and other podcasts on the subject.

Audio description even made its way into the audiobook sphere with this accessible audiobook adaptation of Winnie the Pooh.

If you’re in the audio description space, or at least care about audio description enough, you’ll eventually come across text to speech audio description.

Text to speech audio description, or TTS AD, is exactly what it says on the tin. It’s generated audio description. It isn’t crafted by humans except for maybe some tweaking here and there after things are generated. Corporations love exploiting humans for labor, so the humans will need to clean up what the TTS audio description misses. These humans are paid far less than what they’re worth, or, in some cases, not at all. One human will be cleaning up the poorly generated script for less pay. Another human will clean up the TTS voice. A third human will clean up the automated mixing, all for less pay or even no pay at all.

TTS audio description is everywhere, unfortunately. In fact, you can find many examples of TTS audio description in this master list of audio described titles in the United States.

If you take a look online, you’ll find corporations, and even some blind people, advocating for TTS audio description.

The common reasoning I hear with regard to using TTS audio description is this: “Well, people won’t do it. Plus, we have to pay people, and I don’t have to pay a machine.” There’s a third agonizing comment I’ll never understand to this day. That comment is, “I don’t care about the quality of audio description. I care that I have it at all.” Let’s put a pin in that and come back to this thought.

There’s the other side of the coin, the side I’m on, that says the quality of audio description does matter. These TTS audio description advocates also think that in a few more years, humans will never be needed to make audio description again because machines will be super good at it. Good luck when the next tech hype comes about and leaves so-called AI in the dust and leaves you without access, but let’s explore the process of audio description.

Audio description is for blind people but is often provided by sighted people without any input from the very blind audience it’s supposed to be for, because sighted people believe that just because they have sight, they know how to obtain information better than we do. It’s the same dismissal found in text to speech audio description. Blind audio description writers, like me, aren’t taken seriously in the audio description space because, after all, how are we going to describe something we can’t see? This is their logic, not mine. TTS AD has this same kind of mindset. We don’t matter. Here’s your thing, go away.

In most cases, audio description is provided as a compliance assurance. The craft of audio description is rarely, if ever, considered. This comparison of TTS AD showcases the lack of attention to the craft. It’s just a compliance thing, corporations think. It isn’t a form of art, and it doesn’t have layers to it. There are people that think this way too, but the denigration happens because people want to just be done with audio description. They don’t care if we get the information, because they’re compliant. Why should they care if we get all the information?

This apathy, simply put, leads to inaccurate and missing information. Like I said, there are blind and partially sighted people that see this kind of breadcrumb access as just fine. I don’t. I want full inclusion. By its nature, TTS audio description will never give me full inclusion because it’s compliant. It isn’t inclusive. It’s void of culture, of experiences, and, moreover, it’s a flattened representation of the world and the director’s vision. I want to have the full experience when I sit down to watch a movie or go to a show or go to a museum. I want to have the full, colorful breadth of the world with me as I drink in this art that someone crafted so I can enjoy it. TTS audio description leaves out everything complete about art. It’s taking away my access through the inauthentic nature of its generation, because care wasn’t put into the generation, and that takes away my understanding of the art. By generating the TTS, corporations can save a few bucks.

The common rebuttal to the above is, “But humans get stuff wrong too!” Three people could look at a painting and get three different interpretations of it. Those interpretations, though, are based in the world and based on experiences and beliefs. I’d rather have a slightly different interpretation of art than be left out completely by getting a bland view.

Blind and partially sighted people will say that they’re saving money by providing TTS audio description. This way, movies and TV shows will get described that would’ve never had description otherwise.

Enter Vigilante audio description

While TTS audio description might provide a stopgap, for now, there’s a movement out here being led by audio description professionals that are doing the work corporations refuse to do, pay for, or even contract. It doesn’t have an official name, but I call it vigilante audio description.

I do a lot of vigilante audio description because I like collaborating with people to improve skills. Vigilante audio description is relatively new. Enthusiastic consumers and professionals come together, trade skills, and work on something that helps the community. The Social Audio Description Collective did it, and sometimes still does vigilante audio description when clients aren’t begging to hire the collective. They’re not the only professionals doing fantastic vigilante audio description. Other highly skilled professionals are vigilante describing, such as the audio description creators club in the Audio Vault chat room server.

Vigilante audio description is filling a gap in the market that producers and directors aren’t taking advantage of. Vigilante audio description is providing a service for the community. No profits are involved, and, what’s even better, you get people that care about the craft of audio description and are working to improve their skills and talents rather than making a profit.

You also get to see actual audio description scripts, too!

Obviously, I’m not paid up front for the ones I do. I’ve received donations after people found my vigilante audio description and thought it was good enough to throw me money, but I’m not contracted to do vigilante audio description. I do it because I want to improve my skills.

Vigilante audio description exists because producers don’t make the audio description scripts, or tracks, available everywhere the film or show is playing, so the scripts, and tracks, get lost. Outside of a few writers publishing their scripts, the audio description scripts are never published, so they are lost. Sometimes, an audio description track is created and never used. Sometimes, the audio description is locked behind a single service and doesn’t travel with the film or media. Vigilante audio description is filling in those gaps by making the audio description scripts, and tracks, available everywhere regardless of geographic location or platform.

Vigilante audio description isn’t stopping corporations and producers from making their own audio description. If you pay attention, vigilante audio description is pointing out what’s wrong with the landscape of audio description, including second-rate access. Studios rarely have audio description tracks travel with the media, for example, so vigilante audio description makes it available everywhere. That’s just one of many examples of how vigilante audio description is highlighting gaps in the industry.

Vigilante audio description is providing a service because the audio description doesn’t exist or is of such poor quality that it’s useless. Vigilante audio description is professionals showing the public what’s possible within the film and TV landscape. It’s the same as piracy filling a consumer need that business owners haven’t figured out yet.

Even if there are people that see people providing vigilante audio description as amateurs and not worth paying for, the thing is that with each subsequent crafting of the script, each tuning of the narrator voice, each polishing by the AD producer, everybody involved in this crafting of audio description has learned something new throughout the process, and, as a result, will do better next time. It’s mutual aid audio description. With TTS audio description, the only thing you’re paying for is an experience that won’t get better until someone cares enough to make the TTS better. Comparing vigilante audio description and TTS audio description is a no-brainer for me. I’ll take imperfect growth over lazy scattered breadcrumbs any day of the week.

I enjoy doing vigilante audio description work because it also allows me to show others how to get involved in the audio description industry. If more blind and partially sighted people become involved in the audio description industry, then a fuller picture of what we want will materialize. In my view, that’s a far better investment than a machine.

Thanks for reading! I don't have comments on this blog, so support me financially and/or send me an email to reply to this blog post!