Transcribe Block

The Transcribe block takes spoken language input and creates a text transcription.[1] Read on for information on configuring the Transcribe block, as well as other tips.

Caution: Beta Ahead!

Please note that the Transcribe block is currently in public beta. We believe it's ready for everyday use, but it is not yet fully refined.

One reason for providing early access to Transcribe is to obtain feedback from users. Please get in touch with questions and feedback.

Configuring Transcribe

Using the Transcribe block should be fairly straightforward. This section provides details on all of its setup options.

Model

Transcribe offers two transcription models: Low Resources and High Accuracy.[2] You can download and use either or both models. The Low Resources model will download quickly and take up minimal local disk space, while the High Accuracy model will take longer to download and use more disk space.

In practice, the Low Resources model will use less CPU power and produce transcripts quickly, which can be helpful for real-time (or near real-time) use cases. By contrast, the High Accuracy model will use more CPU power and take longer to produce a transcript, but it will provide the most accurate results.

Language

Use the Language selector to have Transcribe focus on a specific language, which produces better results. If multiple languages may be spoken, choose the Auto setting, and Transcribe will attempt to detect the language automatically.

Input(s)

Transcribe will show an Input field for each input connected to the block. Each Input can be given a custom name, which will then preface its content in the resulting transcript. Several variables are available for use in this field.

For more details, see “Transcribe From Multiple Inputs” below.

Output

File name

Specify the desired file name for your transcript. Several variables are available for use in this field.

Save to

Specify the location to which Audio Hijack should save your transcript files. By default, Audio Hijack saves to ~/Documents/Audio Hijack.
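
If you want to pick up saved transcripts from a script, something like the following Python sketch may help. It is only an illustration, not part of Audio Hijack: the function name latest_transcript is hypothetical, and it assumes transcripts are plain-text files with a .txt extension in the default folder mentioned above. Adjust the folder and pattern to match your own Save to and File name settings.

```python
from pathlib import Path
from typing import Optional

# Default save location named in this manual; change it if "Save to" points elsewhere.
DEFAULT_FOLDER = Path.home() / "Documents" / "Audio Hijack"


def latest_transcript(folder: Path = DEFAULT_FOLDER, pattern: str = "*.txt") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None.

    Assumption: transcripts are saved as .txt files; adjust `pattern` if not.
    """
    if not folder.is_dir():
        return None
    candidates = [p for p in folder.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    newest = latest_transcript()
    print(newest if newest else "No transcript files found.")
```

Passing the folder as a parameter keeps the sketch usable if you have chosen a different save location.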

Transcribe Audio From a Microphone

The most straightforward way to use Transcribe is by providing it with live audio from a microphone connected to your Mac. The block will transcribe this audio, producing a text file containing a transcript.

In Audio Hijack’s Template Chooser, you’ll find a Transcribe template to help you get started transcribing from a microphone. This simple template takes audio from a microphone and runs it through the Transcribe block, which saves the transcript to a text file. By default, transcription files are saved into an Audio Hijack sub-folder in your Mac’s Documents folder, named with the date and time followed by the word “Transcription”.

Use the Transcribe template to speak into your Mac and get a transcript back out.

Transcribe Audio From an Application

Audio Hijack can capture audio from any application running on your Mac, and that means you can also transcribe anything you can hear. This is particularly useful for voice and video calls on Zoom, Skype, and other VoIP services.

With text transcripts, your meetings on Zoom and calls on FaceTime can now be referenced and searched. Use Transcribe with any application on your Mac, for endless speech-to-text possibilities.

Transcribe From a File

Transcribe can also assist if you have an existing audio file and want to get a transcript from it. To do this, you’ll play the file in any app (such as macOS’s QuickTime Player) and capture the audio with Audio Hijack.

Once the audio is flowing through Audio Hijack, you can route it through the Transcribe block to get a transcript.

Transcribe From Multiple Inputs

Transcribe is especially handy for podcast creators who wish to provide a text transcript for their shows. You can configure your podcast setup so that each speaker is identified, based on input.

Block Nicknames

You can also use block nicknames to identify your speakers. For example, edit the name of each input block so that the first input has the nickname “Ammo” and the second has the nickname “Ammette”. Then use the Source variable in each Input field, so that the speaker’s name appears before their text in the resulting transcript.
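
The exact layout of a speaker-labeled transcript is not reproduced here. As a rough sketch of how such a file could be processed afterwards, the Python example below assumes each labeled line starts with the nickname followed by a colon (for example "Ammo: ..."). That layout, the function name group_by_speaker, and the sample text are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def group_by_speaker(transcript: str, speakers: List[str]) -> Dict[str, List[str]]:
    """Group transcript lines under the speaker whose label starts the line.

    Assumption: labeled lines look like "Ammo: Hello there". Unlabeled lines
    are attached to the most recent speaker, if there is one.
    """
    grouped: Dict[str, List[str]] = {name: [] for name in speakers}
    current: Optional[str] = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        for name in speakers:
            prefix = f"{name}:"
            if line.startswith(prefix):
                current = name
                line = line[len(prefix):].strip()
                break
        if current is not None and line:
            grouped[current].append(line)
    return grouped


if __name__ == "__main__":
    # Made-up sample text, not real Transcribe output.
    sample = (
        "Ammo: Welcome to the show.\n"
        "Ammette: Thanks for having me.\n"
        "It's great to be here.\n"
        "Ammo: Let's get started.\n"
    )
    for speaker, lines in group_by_speaker(sample, ["Ammo", "Ammette"]).items():
        print(speaker, len(lines))
```

Passing the known nicknames explicitly avoids mistaking an ordinary sentence that contains a colon for a speaker label.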

Check Your Transcripts

The Transcribe block is powered by Whisper, OpenAI’s impressive automatic speech recognition system. While the speech recognition is very good, it is not perfect. Be sure to check your transcripts for accuracy.
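
A manual read-through is the real check, but a small script can point you at spots worth a closer look. The Python sketch below flags runs of identical consecutive lines, which may (or may not) indicate a recognition glitch. The function name find_repeated_runs and the threshold of three repeats are arbitrary choices made for this illustration.

```python
from typing import List, Optional, Tuple


def find_repeated_runs(text: str, min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start_line, run_length, line_text) for repeated consecutive lines.

    Line numbers are 1-based. Blank lines are skipped, and comparison ignores
    case and surrounding whitespace.
    """
    runs: List[Tuple[int, int, str]] = []
    prev: Optional[str] = None
    start = 0
    count = 0
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip().lower()
        if not line:
            continue
        if line == prev:
            count += 1
        else:
            # Close out the previous run before starting a new one.
            if prev is not None and count >= min_run:
                runs.append((start, count, prev))
            prev, start, count = line, number, 1
    if prev is not None and count >= min_run:
        runs.append((start, count, prev))
    return runs


if __name__ == "__main__":
    # Made-up sample text, not real Transcribe output.
    sample = "Hello\nThank you.\nThank you.\nThank you.\nGoodbye\n"
    for start, length, text in find_repeated_runs(sample):
        print(f"line {start}: {length} repeats of {text!r}")
```

A flagged run is only a hint; the speaker may really have repeated themselves, so compare against the audio before editing.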

Apple Silicon Recommended

If you’re on an Intel Mac, you’ll see a notice when using Transcribe.

The processing requirements for the transcription models are fairly hefty, and Intel Macs struggle to keep up. You can still use the Transcribe block with an Intel Mac, but results may be slow or inconsistent. Apple Silicon-based Macs are thus recommended for use with Transcribe.


Footnotes:

  1. In addition to transcribing English, Transcribe can understand and transcribe 98 other languages. See the full list in the Language menu within the Transcribe block. Note that the accuracy and quality of transcripts vary by language.

  2. At present, “Low Resources” uses the “Base” Whisper model, while “High Accuracy” uses the “Large (v2)” Whisper model.
