Last week’s chaos meant I spent more time hacking this week than usual. I’ve been spending my time talking to as many people as possible in the podcast space, so there have been a few ideas I’ve been itching to build.
One of these was a speaker identification database - being able to know who is speaking just by their voice.
Standing this up was pretty simple: the audio tracks, transcripts, and speaker segments were already available for every ingested show, which gave me a solid start on what I wanted to deliver. What was missing was the ability to compare speakers across episodes, i.e. to know that SPEAKER_00 in one show is SPEAKER_02 in another.
This existing dataset turned out to have all the information required to build a database of high-quality audio snippets suitable for speaker identification. All that was left was to manually label a few missing speakers, e.g. SPEAKER_00=Joe Biden, and the system could then match every instance of that voice across the entire database.
Pretty good for a week’s worth of work.
This final data point allows transcripts to be annotated in a way that provides a much richer experience. Starting with a wall of text like this:
And using data about the speaker, sentiment, and this new voice identification system, we’re able to return a transcript that’s segmented by who’s talking, what their name is, and how they feel about what they’re saying:
There’s more to hack on, but it’s always a good feeling to be able to knock out a feature like this in a day or so.