They frame it as though it’s for user content. More likely it’s to train AI, but in fact it gives them the right to do almost anything they want - up to (but not including) stealing the content outright.
That’s some wild speculation there.
What you described would be a contrived and inefficient workaround that would have little to no impact on its legality compared to just using the underlying texts as part of a training corpus.
Not sure why you think Spotify wouldn’t want to eliminate the cost of voice actors and production. If you’re self-publishing, recording and producing an audiobook traditionally is a substantial expense. If Spotify can offer something like Google’s Auto-Narrated Audiobooks to authors, then that would enable them to bring those authors to Spotify (potentially exclusively).
Spotify’s goal is also not necessarily to imitate the voices from the existing audiobooks. There is a lot that goes into making an audiobook successful, and just copying the voice alone wouldn’t convey that: for example, pairing tone and cadence changes with what’s being narrated, techniques for conveying dialogue (particularly between different characters), etc. How you speak is just as important as your raw voice.
That would allow Spotify to create audiobooks using those techniques without using the voice of anyone who hadn’t signed away rights to it. However, I would argue that some of the techniques they would likely use are integral to a person’s voice.
It’s also feasible that Spotify wants to be able to take an existing audiobook and make it available with a different voice. This wouldn’t require the audiobook to have ever been trained on - they would just replace the existing voice in it with another while preserving the pauses, tone shifts, etc. (and possibly adjusting them to be appropriate for the new voice).
More closely aligned with the specific derivative work they mentioned would be implementing something like Kindle/Audible’s Whispersync, potentially in collaboration with a non-Amazon ebook retailer like Barnes & Noble or Kobo.
This is a much better take.
Intonation is huge, and something general models tend to have trouble with - especially with something like an audiobook, which is narration - it’s very contextual in a way not found in almost any other form of communication. It even encapsulates every other form of context through dialogue.
And not only that - a lot of audiobooks have versions by multiple voice actors. They might change a word here or there, but it’s highly structured data - it’s truly a treasure trove.
I’d go a step further and say they really want access to the dataset - not just for audiobooks, but because this is a fantastic dataset for training very context-aware (and silky smooth) text-to-voice.
Spotify probably doesn’t have the chops to do this, but they might be trying to leverage the dataset - I’m not sure if they could sell it wholesale or not, but if nothing else they could “partner” with Microsoft or Google to train text-to-voice capabilities into multi-modal LLMs (a pitch with all the buzzwords to make investors need to change their underwear).