The pipeline, the code and systems that can ingest arbitrary podcast data, is more or less running at the moment. It's overly complex, considering that it currently requires 4 different servers to operate, but it's built in a way that keeps the cost of operation down while I'm still experimenting on this dataset. In the process of assembling all of this I realized something that should have been obvious from the start:
I've built an enrichment service for podcast data, and I fell backwards into it. Give me a podcast episode and I can tell you what ads are running on it, what people are talking about, and the emotional charge therein. Across all languages, and able to reach into all past content that's still available.
Critically, this is done in such a way as to be defensible(-ish) as a product. It's not just waving a modern AI tool over some audio and repackaging it; rather, it's taking several tools and combining them in a specific way such that the resulting data is unique. It's not something a PM can cowboy their way through with some clever use of Zapier and Airtable.
The specific value/use-case of this service is still unknown to me, but there are a few different products a dataset like this could support:
Ad spend and placement insight into the podcast space
Alerting for changes in conversation
Trends and influencer sourcing as concepts wax and wane
Visibility into the non-English podcast space for all of the above.
As I build the dataset (the lights in my apartment are still flickering) I'll have a better sense of the feasibility of each of these things. I'll share more of the good stuff as I discover it.
My mind has obviously been in the data space for a while after working at Clearbit for 5 years, with a lot of my engineering practices having been sharpened on those sorts of problems. Maybe I shouldn't have been surprised that this is what I ended up doing once I had time to hack the planet.
Old habits die hard I guess.