I’ve recently been experimenting with ElevenLabs’ text to speech technology to see how easy it is to produce high quality, instantly generated audio using AI. Turns out it’s really easy.
With less than ten minutes of recorded audio, I was able to transform my Dante-inspired post “Why Hell beats Heaven” into cloned versions of me speaking…
For longer samples of the audio and text, see the links above.
Here’s how I did it—
Record an audio sample
Translate text using GPT-4
Input translated text
That’s really all there is to it. And while the AI tools used above run a combined $40/month, you can still get a lot of play from the free combo of Eleven for hobbyists and ChatGPT. It’s never been this easy to make the world your audience.
Thanks for reading! If you enjoyed this, check out my other writing in the Infovores Newsletter. You can also follow me on Twitter.
I'm thinking about how this technology will improve medical practice (to take one small example). Taking care of a psychiatrically ill patient who doesn't speak your language is difficult. Human translators are great when you can get them but it is an imperfect system, especially when the patient speaks an uncommon language. Having instant translators like this in our pockets will change the game.
American here. In Tokyo in 2018 our English-only group was assigned a Japanese technical translator to help us navigate meetings, etc.
Very competent and sociable guy. One of my smart-alec colleagues opened Google audio translate for English-to-Japanese. Said some things and pressed the translate button.
After hearing the result, the translator was crestfallen. "I have to get another job."
Are we re-entering the time of Babel?