Podsmart: Summarize Podcasts with AI

Isaac Tham
6 min read · Apr 22, 2023
Sign up for Podsmart today and choose 5 podcast episodes to transcribe and summarize for free!

For the past three months, I’ve been working on a podcast AI app, and today I’m really excited to unveil it to the world! Podsmart summarizes podcasts to help busy intellectuals learn more efficiently. We transcribe podcasts and generate key insights in an interactive summary, saving you hours of listening.

This is a personally meaningful project for several reasons: I’m an avid podcast listener who faces many of the pain points the app aims to address, this is the first software product I’ve released publicly, and I developed it over the past three months from an army bunk using a mobile hotspot. I’ll go into my development journey in a later post; this post is about the idea behind Podsmart.

The Power of Audio: Speech Dominates the Spread of Ideas Today

Firstly, audio is the dominant medium in which ideas are produced. Thought leaders and industry experts share valuable insights and experience in the form of interviews, fireside chats, presentations, and speeches. Think of some of the most influential people in the world today. How did you come to know of their ideas and beliefs? Unless you are thinking of a famous author, it is exceedingly likely that these people communicated their ideas through speaking, and you either listened to their speech or read an article based on what they said, rather than reading something they wrote themselves. Hence, engaging with valuable ideas necessarily involves the audio medium.

One reason for this is the ease of audio production: people communicate through speech much more easily and quickly than through writing. The average person speaks at least 7000 words a day, while an average professional writer writes 1000 words a day. Not just the volume but also the rate of information transfer is higher through speech. People speak at a rate of around 150 words per minute, compared to just 40 words per minute for writing.

As a result, audio media such as podcasts have been growing in popularity. The proportion of the US population that has listened to a podcast in the past month has tripled over the past 10 years to 38%, 18% listen to podcasts daily, and youths (ages 12 to 34) are the main group consuming podcasts.

The Drawbacks of Audio Consumption

However, audio information is more difficult to consume.

The rate of information consumption is slower than with text. Humans read at an average of 300 wpm, compared to 210 wpm for listening. Furthermore, podcasts are often in the form of conversations, which means there is more irrelevant information in those 210 words, such as small talk, filler words, and pauses, which further reduces the rate at which we receive relevant information. For those with limited time, the inferior information density of the audio medium can be a barrier to consuming information and interacting with valuable ideas. Personally, I love listening to podcasts, especially on economics and technology. Every week ~20 new episodes are released from the main podcasts I follow, and with each episode averaging an hour long, I hardly have the time to listen to all the podcasts I want to engage with. Hence, the sheer volume of audio production means that many valuable ideas are contained in audio, but those ideas are much less densely concentrated than in text. This shows the need for solutions that summarize and distil key ideas from audio.
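To put rough numbers on that weekly backlog, here is a back-of-the-envelope sketch using only the figures quoted in this post. It assumes every episode is spoken at about 150 wpm for its whole length, which ignores pauses and music, so treat the result as an estimate rather than a measurement. The function name weekly_hours is purely illustrative.

```python
SPEAKING_WPM = 150  # speaking rate quoted earlier in this post
READING_WPM = 300   # average reading rate quoted above


def weekly_hours(episodes: int, hours_per_episode: float) -> tuple[float, float]:
    """Return (listening_hours, reading_hours) for a week's backlog.

    Assumption: speech runs at SPEAKING_WPM for the full episode length.
    """
    listening_hours = episodes * hours_per_episode
    words_spoken = listening_hours * 60 * SPEAKING_WPM
    reading_hours = words_spoken / READING_WPM / 60
    return listening_hours, reading_hours


listening, reading = weekly_hours(episodes=20, hours_per_episode=1.0)
print(f"Listening: {listening:.0f} h, reading full transcripts: {reading:.0f} h")
```

Under these assumptions, even reading every transcript in full would still take about half as long as listening, which is why transcription alone does not solve the problem and summarization matters.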

Furthermore, insights gained from engaging with the audio medium are difficult to retrieve and share. It is very difficult to search for specific information within audio media like podcasts. For example, when I’m in a conversation with friends, sometimes I recall an interesting insight from a podcast episode I previously listened to. I tell my friends that I’ll send them a link to the podcast that mentions that fact, but then I realize I’ve forgotten which of the several podcasts I listen to contained it. Moreover, even if I eventually remember the specific podcast, while text can be easily searched using a Ctrl-F function, searching for a sentence in an audio recording usually means painfully listening through the entire recording for a few seconds of relevant information. Such frustrating experiences inhibit people from interacting with ideas in the audio sphere.

Today’s Transcription Services are Inadequate

The obvious solution to unlocking the information in audio is to transcribe it to text. For decades, human transcription has been the gold standard, but it is painfully slow (professional transcribers take 2–3 hours to transcribe 1 hour of audio) and expensive (the standard price for 1 hour of transcription is $90). Recent technological innovations have heralded a new era of AI transcription services. However, the mass adoption of AI transcription has been held back because audio is an extremely difficult modality for machines to process, with many factors such as quality, background noise, diversity of audio sources, formats, and compression types preventing effective transcription solutions from emerging as readily as image-recognition or NLP models.

Today, popular AI transcription services are priced at $0.85 (Otter.ai), $0.87 (Deepgram), $0.90 (AssemblyAI), $1.25 (Amazon) and $1.44 (Google Speech to Text) per hour of audio. Despite these low prices, existing Speech-to-Text APIs have been reported to be complicated to implement, requiring in-depth expertise on matters such as audio formats and sampling rates.

Additionally, these AI transcription services fail to address the problem of audio’s poor information density, as most do not offer summarization, which limits how effectively end users can consume the transcripts. Even if transcription is available cheaply, the additional time required to sieve out the important information is costly. The limited pool of AI services that offer summarization includes AssemblyAI (which also includes sentiment analysis and entity detection) at $3/hr in total, and Sonix’s premium plan at $5/hr with a $22/month subscription.

We need a cost-efficient end-to-end solution that transcribes podcasts and summarizes the content into digestible insights.

Enter Podsmart. The intelligent app that seamlessly unlocks knowledge from podcasts.

Podsmart synthesizes information from podcasts into an accessible and intuitive visual format, enabling you to interact with your podcasts and take action like never before.

Search for any podcast that is available on Spotify, and Podsmart generates a state-of-the-art transcription of podcast episodes.

Beyond transcribing, Podsmart’s audio intelligence features allow you to effectively extract insights from podcasts. Podsmart uses AI clustering techniques to extract the podcast’s main topics and provides informative AI-generated titles and summaries of each topic. Podsmart provides a summary of the episode, with color-coded highlights to show which topic each part of the summary corresponds to.
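To give a feel for what grouping transcript segments into topics can look like, here is a deliberately simple sketch. It is not Podsmart’s actual pipeline: it swaps in a greedy grouping based on keyword overlap (Jaccard similarity) in place of whatever AI clustering the app uses, and the stopword list, the 0.2 threshold, and the function names are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not a complete one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "on", "for", "that", "this", "it", "we", "how"}


def keywords(text: str) -> set[str]:
    """Lowercase the text and keep the words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words common to both sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_segments(segments: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Greedily assign each segment to the most similar existing topic.

    A segment starts a new topic when no existing topic reaches `threshold`.
    Returns one list of segment indices per topic.
    """
    clusters: list[dict] = []
    for index, segment in enumerate(segments):
        words = keywords(segment)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(words, cluster["vocab"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"vocab": set(words), "members": [index]})
        else:
            best["vocab"] |= words
            best["members"].append(index)
    return [cluster["members"] for cluster in clusters]


segments = [
    "Inflation is rising and central banks raise interest rates",
    "Higher interest rates slow inflation",
    "Chatbots use large language models",
    "Language models power modern chatbots",
]
print(cluster_segments(segments))  # [[0, 1], [2, 3]]
```

A real system would more plausibly compare learned sentence embeddings than raw word overlap, but the shape of the idea is the same: similar segments end up in the same topic, and each topic can then be titled and summarized.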

Podsmart understands that listeners want to interact with podcasts at various levels of granularity. We often want more details about one topic and only a brief summary of another, and different people focus on different topics. Hence, providing merely a default summary or, at the other extreme, just showing the entire text transcript is inadequate. With Podsmart, clicking on each topic gives greater granularity: the timestamps of the audio segments making up each topic, so you can listen to the audio yourself, along with segment summaries and the raw transcript text word-for-word.

Podsmart allows you to interact with your audio knowledge as well through the Q&A chatbot. Using semantic search on podcast transcripts, the chatbot delivers accurate, customized answers, along with the most relevant transcript segments for you to explore further in more detail. Furthermore, Podsmart allows you to integrate information across multiple podcast episodes — the chatbot can synthesize the most relevant information across many different podcasts to arrive at an answer. This is ideal for comparing and contrasting opinions across different podcasts.
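As a rough illustration of retrieving the most relevant transcript segments for a question across several episodes, here is a minimal sketch. It is an assumption-laden stand-in, not Podsmart’s implementation: true semantic search would compare learned embeddings, whereas this sketch uses plain word-count vectors and cosine similarity so that it runs with the standard library alone. The episode names, sample text, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_segments(question: str, segments: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (episode, text) pairs most similar to the question."""
    query = vectorize(question)
    scored = [(cosine(query, vectorize(text)), episode, text) for episode, text in segments]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(episode, text) for score, episode, text in scored[:k] if score > 0]


# Hypothetical transcript segments drawn from three different episodes.
segments = [
    ("Episode A", "Central banks raise interest rates to fight inflation"),
    ("Episode B", "The new race car has a faster engine"),
    ("Episode C", "Inflation falls when interest rates stay high"),
]
for episode, text in top_segments("What happens to inflation when interest rates rise?", segments):
    print(episode, "|", text)
```

In this toy example the two economics segments are returned and the unrelated one is dropped. In a chatbot setting, the retrieved segments would then be handed to a language model as context for composing the answer, which is how information from different episodes can be combined in one reply.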

Podsmart supports non-English podcasts, making it a perfect tool for those trying to pick up a new language.

A bonus feature of Podsmart is multilingual support. As someone trying to pick up a new language (Spanish), I listen to language podcasts to learn new vocabulary and sentence structures, as effective language learning requires correlating the spoken and written word. However, I’m always frustrated by transcriptions being stuck behind a paywall or non-existent. For second-language learners like me, Podsmart has you covered. Podsmart transcribes and translates podcasts, displaying text in both languages side-by-side to effectively bridge audio and text learning.

Unlimited access to Podsmart comes with a monthly subscription of $4.99 — giving you a valuable product at a superior price.

Ideas are valuable and powerful, and in today’s fast-paced world, it’s crucial to process, internalize and integrate new ideas efficiently. Use Podsmart: the intelligent app that seamlessly unlocks knowledge from podcasts.

Isaac Tham

economics enthusiast, data science devotee, f1 fanatic, son of God