
Ep.58 – Whisper AI and searchable audio

Okay, I am not sure they sell Wispa chocolate bars internationally, but worry not, this podcast has nothing to do with chocolate (sadly!).

Use Whisper AI to transcribe the audio from video or various audio formats, in many languages, into text.

We are talking about Whisper AI from OpenAI, the good folks who brought ChatGPT to the unsuspecting masses (and totally destroyed our world! 🙂 ).

Read to the end to hear about running Whisper AI as a Docker container 🙂

The reason you might not have heard of it as much as ChatGPT is that you generally need to run Whisper AI yourself, or use a cloud service to run it for you.  I have played with Whisper AI via Python/PyTorch using the Google Colab service, which you can use for free (need to check the actual info here!), to work through some of these use-cases.  In my case, Colab gave me a VM of sorts to actually run Python/PyTorch and interact with Whisper AI.
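For anyone curious what that Python side looks like, here is a minimal sketch. It assumes the open-source `openai-whisper` package (`pip install openai-whisper`) and ffmpeg are installed. The `base` model size is just a small default I picked, and `format_timestamp` and `print_transcript` are little helpers of my own, not part of Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into HH:MM:SS for display."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcribe(path: str, model_name: str = "base") -> list:
    """Transcribe an audio/video file and return Whisper's timed segments."""
    # Imported here so the helpers in this file still work without Whisper.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path)         # ffmpeg decodes the audio track
    # Each segment is a dict with (among others) "start", "end" and "text".
    return result["segments"]


def print_transcript(segments: list) -> None:
    """Print one line per segment, prefixed with its start time."""
    for seg in segments:
        print(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
```

Something like `print_transcript(transcribe("episode.mp3"))` would then print the transcript line by line, where `episode.mp3` is a placeholder for your own file. Larger models are generally more accurate but slower, so expect to experiment.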

Whisper AI itself is pretty amazing, but add to that the fact that you can now make the transcriptions searchable.  In my case I am already working with Microsoft Azure Cognitive Search for this type of thing; however, you could also use something like OramaSearch, which lets you use JavaScript to run powerful searches that not only find the text but also time-link to the audio itself.  There is an example where a developer has used Astro (a JavaScript framework) to stitch the UI side together. I will get a link to that.
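To make the "searchable and time-linked" idea concrete, here is a rough sketch in plain Python. To be clear, this is not how Azure Cognitive Search or OramaSearch work internally; it is just a tiny inverted index over Whisper-style segments (dicts with `start`, `end` and `text`). The sample segments and the `episode.mp3` name are made up for illustration, and the `#t=` suffix assumes a player that honours media fragment URLs.

```python
import re
from collections import defaultdict

# Made-up sample data in the shape of Whisper's timed segments.
SAMPLE_SEGMENTS = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to the podcast."},
    {"start": 4.2, "end": 9.8, "text": " Today we look at Whisper and searchable audio."},
    {"start": 9.8, "end": 15.0, "text": " You can run Whisper in a Docker container."},
]


def tokenize(text: str) -> list:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: list) -> dict:
    """Map each word to the set of segment positions that contain it."""
    index = defaultdict(set)
    for position, seg in enumerate(segments):
        for word in tokenize(seg["text"]):
            index[word].add(position)
    return index


def search(query: str, segments: list, index: dict) -> list:
    """Return the segments containing every word of the query, in time order."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [segments[position] for position in sorted(hits)]


def time_link(media_url: str, seconds: float) -> str:
    """Build a link that jumps to the given second (media fragment syntax)."""
    return f"{media_url}#t={int(seconds)}"
```

With this, `search("whisper docker", SAMPLE_SEGMENTS, build_index(SAMPLE_SEGMENTS))` returns only the last sample segment, and `time_link("episode.mp3", 9.8)` turns its start time into a link the UI could attach to the search result. A real search service adds ranking, typo tolerance and so on, which this sketch leaves out.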

But now the real golden nugget: you can run Whisper AI as a Docker container!  Oh yes, but make sure you have some decent PC performance 🙂  This opens up new opportunities, not least being able to run it on your own infrastructure.

Until the next time, take care.