Cutting the Cord: Harnessing OpenAI's Whisper in an Offline Environment

Recently I presented a webinar on using ChatGPT for OSINT, and the presentation filled the entire one-hour window. I encouraged attendees to ask questions in chat and said I would answer them on my blog. One of the questions was about another of OpenAI’s offerings, Whisper.

I recently blogged about Whisper, but the summary is that it’s an incredible audio-to-text transcription engine that can be installed on your local system and used 100% free. The question was: does Whisper send the information to the cloud, or can it run 100% offline? Offline would be fantastic from an OPSEC and privacy point of view. I found one third-party blog post from when Whisper first launched that mentioned it could be used offline, but the official Whisper documentation didn’t explicitly mention it. In my reply, I said to expect a blog post on that soon, and this is that post.

I started off by using pip to install Whisper on a small NUC device running Ubuntu 22.04.

Once that was done, I installed the open-source tool that specializes in audio and video files (Whisper's README lists ffmpeg as its required dependency for this).

Here I'm confirming that Whisper was installed correctly. Notice the large number of languages supported. 
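
For anyone who would rather script this check than eyeball it, here is a minimal Python sketch. It assumes Whisper was installed with pip into the current Python environment, and that both the `whisper` command and `ffmpeg` should be on the PATH. The function name `check_whisper_setup` is made up for this example and is not part of Whisper.

```python
import importlib.util
import shutil


def check_whisper_setup():
    """Report which pieces of a local Whisper setup can be found.

    Nothing is imported or executed here; the function only looks for
    the Python package and for the two command-line tools on the PATH.
    """
    return {
        # Is there an importable package named "whisper"?
        "whisper_package": importlib.util.find_spec("whisper") is not None,
        # Is the `whisper` command-line tool on the PATH?
        "whisper_cli": shutil.which("whisper") is not None,
        # Is ffmpeg on the PATH? Whisper uses it to decode media files.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
```

Calling `check_whisper_setup()` returns a dictionary of three True/False values. Any False entry points at the piece that still needs installing.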

When asked to transcribe a video file I downloaded from YouTube, Whisper first downloads its default model, "small". The file is only 461 MB in size and downloads quickly. Once downloaded, Whisper correctly identifies the audio as English and starts to transcribe it.
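
The download only happens once because Whisper keeps the model file on disk. As far as I know, the default location is a `whisper` folder under `~/.cache` (or under `$XDG_CACHE_HOME` if that variable is set). The sketch below lists which models are already cached, which is worth checking before pulling the network cable. The helper names `whisper_cache_dir` and `cached_models` are hypothetical, not part of Whisper.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Return the folder where Whisper is assumed to cache its models."""
    cache_root = os.getenv(
        "XDG_CACHE_HOME",
        os.path.join(os.path.expanduser("~"), ".cache"),
    )
    return Path(cache_root) / "whisper"


def cached_models(cache_dir=None):
    """Map each cached model name (e.g. "small") to its file size in bytes.

    Returns an empty dict if the cache folder does not exist yet.
    """
    cache_dir = Path(cache_dir) if cache_dir is not None else whisper_cache_dir()
    if not cache_dir.is_dir():
        return {}
    # Model checkpoints are assumed to be stored as <name>.pt files.
    return {p.stem: p.stat().st_size for p in sorted(cache_dir.glob("*.pt"))}
```

If `cached_models()` comes back empty, Whisper will try to download a model on its next run, and that will fail without a connection.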

To test Whisper's offline capability, I disconnected the Ethernet cable from the NUC and confirmed that the device had no internet connection.
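
That confirmation can also be scripted. The sketch below is one rough way to do it, not necessarily how it was done for this post: it tries to open a TCP connection and reports whether that succeeded. The default target, `1.1.1.1` on port 53 (a public DNS resolver), is an assumption; substitute any host you expect to be reachable when online. The function name `can_reach` is hypothetical.

```python
import socket


def can_reach(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    The default host and port are an assumption chosen for illustration.
    A False result means the connection failed or timed out, which is
    what you would expect with the network cable unplugged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

A single unreachable host does not prove the machine is fully offline, so for anything sensitive it is still worth checking the interface state as well.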

I then used Whisper to transcribe a video with English audio.

Here it's transcribing a video in Spanish.
Here I've asked Whisper to translate the Spanish audio to English, and it does so, all while still offline.
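
For reference, runs like these come down to calling the `whisper` command with a different `--task`. Below is a minimal Python sketch that builds and runs those commands. `--model`, `--task`, and `--language` are options of the Whisper CLI; the helper names and the file name `entrevista.mp4` are placeholders, and the `small` model is assumed to be cached already.

```python
import shutil
import subprocess


def build_whisper_command(media_path, task="transcribe", model="small", language=None):
    """Build the argument list for one run of the `whisper` CLI.

    task is "transcribe" (text in the spoken language) or "translate"
    (text translated to English). language is optional; if omitted,
    Whisper detects the spoken language itself.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    command = ["whisper", str(media_path), "--model", model, "--task", task]
    if language:
        command += ["--language", language]
    return command


def run_whisper(media_path, **options):
    """Run the `whisper` CLI on a media file and wait for it to finish."""
    if shutil.which("whisper") is None:
        raise RuntimeError("the whisper command was not found on the PATH")
    return subprocess.run(build_whisper_command(media_path, **options), check=True)
```

With those in place, `run_whisper("entrevista.mp4", language="es")` would transcribe the Spanish audio, and `run_whisper("entrevista.mp4", task="translate", language="es")` would produce the English translation.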

Whisper would be a fantastic tool no matter what, but the ability to transcribe video offline is a huge win for users who prioritize privacy and data security. Law enforcement officers transcribing interviews, or intelligence analysts downloading and transcribing audio and video on their focus area, can use this tool and avoid many of the OPSEC worries that come with third-party tools and services.


