ElevenLabs, the viral AI-powered platform for creating synthetic voices, has raised a new round of cash.
Today, the startup announced the closure of a $19 million Series A round co-led by entrepreneurs Nat Friedman and Daniel Gross alongside Andreessen Horowitz. Other participants included heavyweights Creator Ventures, SV Angel, Instagram co-founder Mike Krieger, Oculus co-founder Brendan Iribe, DeepMind and Inflection AI co-founder Mustafa Suleyman and O’Reilly Media founder Tim O’Reilly.
A source familiar with the matter tells TechCrunch that the tranche values ElevenLabs at $99 million post-money — a respectable figure, especially considering that the startup launched just over a year ago.
“This investment will be used to continue building ElevenLabs’ cutting-edge research hub for voice AI and to launch a range of additional products to support specific market verticals such as publishing, gaming, entertainment and conversational applications,” co-founder and CEO Mati Staniszewski told TechCrunch via email.
ElevenLabs, which has made headlines over the past few months for reasons both good and abhorrent, was founded by Staniszewski, who previously worked at Palantir, and his childhood friend Piotr Dabkowski, an ex-Google employee. Inspired by the mediocre dubbing of American movies they watched growing up in Poland, their native country, the pair set about designing a platform that could do better — leveraging AI, of course.
ElevenLabs can turn text into speech using synthetic voices, cloned voices or entirely novel “artificial” voices that mimic the sounds of people of various genders, ages and ethnicities. The company’s AI text-to-speech models are language-agnostic, allowing corporate customers to fine-tune them and build their own, proprietary speech models on top.
Coinciding with the Series A raise, 15-employee ElevenLabs is launching Projects, a workflow for editing and creating long-form spoken content. With Projects, users can generate dialogue segments and even audiobooks without having to leave the platform.
“For business-to-business partners, our technology can be used in areas such as scalable and multilingual audiobook creation, voicing characters in video games, voicing digital articles, supporting the visually impaired to access online written content and powering AI radio,” Staniszewski said.
ElevenLabs, which launched in beta in late January, picked up steam rather quickly — owing to the extremely high quality of its generated voices, speedy generation times and generous free tier. But as alluded to earlier, the publicity hasn’t always been positive — particularly once bad actors began to exploit the platform for their own ends.
4chan, the infamous message board known for its conspiratorial content, used ElevenLabs’ tool to share hateful messages mimicking celebrities like the actor Emma Watson. Elsewhere, The Verge’s James Vincent was able to tap ElevenLabs to clone targets’ voices in a matter of seconds — generating audio samples containing everything from threats of violence to expressions of racism and transphobia.
In response, ElevenLabs said that it would introduce a set of new safeguards, like limiting voice cloning to paid accounts, banning users who repeatedly violate its terms of service and providing a new AI detection tool.
The detection tool launches today. Called AI Speech Classifier and available as an API to “selected” partners, it’s designed to detect whether an uploaded audio sample contains AI-generated content from ElevenLabs.
“Ensuring Generative AI platforms can be embraced safely is a key challenge for the whole AI-generated sector, including text, image and voice platforms,” Staniszewski said. “We must ensure that people are educated about the nature of the generative media landscape and know that such content is out there — we are committed to building tools to help people detect AI-generated content, in the interest of transparency.”
A voluntary detection tool — assuming it even works as advertised — won’t necessarily deter bad behavior. But there’s another elephant in the room that ElevenLabs hasn’t addressed: the existential threat its tech poses to voice actors.
Motherboard writes about how voice actors are increasingly being asked to sign away the rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them — sometimes without additional compensation. Internal emails seen by The New York Times, meanwhile, indicate that Activision Blizzard, one of the biggest game publishers in the world, is working on tools for AI-assisted “voice cloning.”
It would appear that ElevenLabs sees this as the natural progression of things, touting its work with publishers like Storytel and media platforms like TheSoul Publishing and MNTN for audiobooks, video games and radio content. (Storytel and TheSoul Publishing are strategic investors.) The company claims that it has over a million registered users across the creative, entertainment and publishing spaces who’ve created ten years’ worth of audio content.
ElevenLabs plans to eventually extend its AI models to voice dubbing, following in the footsteps of startups like Papercup and Deepdub and building what it calls “a foundation to be able to transfer emotions and intonation from one language to another.”
“This will enable any video to be dubbed into any language in an engaging, effective, and scalable way, all while maintaining the original speaker’s voice,” ElevenLabs writes in a press release. “[We are] already conducting a number of tests with industry partners to enable AI dubbing at scale.”
With $21 million in the bank ($2 million of which came from a pre-seed round in January), ElevenLabs — consequences be damned — is laser-focused on beating back its rivals in the burgeoning generative voice space. They include incumbents like Amazon, Google and Microsoft as well as startups like Murf, Tavus, Resemble AI, Respeecher, Play.ht and Lovo.