My Awesome Phase Vocoder (Audio Examples and Free Software Download)

I've been spending a good amount of time working on a phase vocoder lately. I'm happy to say I have an early working demo version that sounds quite good.

Piano Waveform Stretched and Compressed

The phase vocoder allows for stretching and compressing the time duration of audio (as shown in the waveforms above) without changing the pitch and, ideally, without noticeably degrading the quality of the audio.

The phase vocoder is a crucial part of many popular music production software packages, and is used in various other audio processing applications. For music production, it allows for beat matching audio snippets (i.e. loops). When used in conjunction with a resampler, one can produce multiple notes from a single audio sample (e.g. a single piano key strike).

I won't go into all of the details of how the phase vocoder works, but the basic process consists of taking a short frame of audio, converting it to the frequency domain, and calculating new "stretched" or "compressed" phase values for each frequency. Once the new phases are calculated, the frequency information is converted back to the time domain, resulting in a new output frame. Overlapping and adding these output frames results in stretched or compressed audio output.
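As a rough illustration of the "new phase values" step, here is a minimal sketch of the textbook phase-propagation calculation for a single frequency bin. It is not the code inside the executable; the function names, parameters, and the choice of hop sizes are assumptions made for the example. The idea is to measure how far a bin's phase actually advanced between two analysis frames, turn that into an estimated frequency, and then advance the output phase by that frequency over the (longer or shorter) synthesis hop.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Wrap an angle in radians into the range [-pi, pi].
double wrapPhase(double phase) {
    return std::remainder(phase, 2.0 * kPi);
}

// Hypothetical helper: compute the next synthesis phase for one FFT bin.
// analysisHop is the spacing (in samples) between input frames and
// synthesisHop is the spacing between output frames; their ratio sets
// the stretch factor.
double advanceSynthesisPhase(double prevAnalysisPhase,
                             double currAnalysisPhase,
                             double prevSynthesisPhase,
                             int bin, int fftSize,
                             int analysisHop, int synthesisHop) {
    assert(fftSize > 0 && analysisHop > 0 && synthesisHop > 0);

    // Center frequency of this bin in radians per sample.
    const double binFrequency = 2.0 * kPi * bin / fftSize;

    // Phase advance we would see if the signal sat exactly on the bin center.
    const double expectedAdvance = binFrequency * analysisHop;

    // Difference between the measured and the expected advance, wrapped.
    const double deviation =
        wrapPhase(currAnalysisPhase - prevAnalysisPhase - expectedAdvance);

    // Estimated true frequency of the component in this bin.
    const double trueFrequency = binFrequency + deviation / analysisHop;

    // Advance the output phase over the synthesis hop instead.
    return wrapPhase(prevSynthesisPhase + trueFrequency * synthesisHop);
}
```

In a full phase vocoder this would run for every bin of every frame, with the magnitudes left unchanged, before the inverse transform and overlap-add.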

Audio Stretching/Compressing Examples

The following short snippets of audio demonstrate the quality of my phase vocoder. Each sound has three audio samples: the original sound, the sound stretched using my phase vocoder, and the sound compressed using my phase vocoder:

Sound Original Stretched Compressed

The stretch and compress amounts applied to the above sounds vary. The piano is stretched to 150% of its original duration and compressed to 75%. The 808 kick uses 400% and 25%, the guitar 180% and 40%, and the synth 200% and 50%.

Download Link to the Software

If you're interested in trying it on your own audio, I have a simple command line executable that should work on 64-bit Intel Windows, OS X, and Debian-based Linux systems. Download links:

Note that these are just zipped executables (not an installer). The command line usage is quite simple:

 CommandLinePhaseVocoder -input inputfile.wav -output outputfile.wav -stretch stretchfactor

Also, note that there are currently a few limitations to be aware of:
  • It currently supports only mono (single-channel), 16-bit, uncompressed wave files.
  • I have done very little experimentation with stretch factors below 0.25 or above 4.0, so there are no guarantees when stretching or compressing by such extreme values.
  • It's currently designed for "short" wave files (under ~30 seconds) with a single transient at the beginning of the audio. See below for more information on transients.
  • This is not what I would call "production quality software". For example, I do not yet have an automated release build, automated testing, or an extensive set of test cases in place. This is simply an early demonstration of my phase vocoder.
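
If you want to check whether a file meets the first limitation before running it through the tool, something like the following sketch could help. It is not part of the executable; the struct and function names are made up for this example. It assumes a standard RIFF/WAVE layout held in memory as bytes, walks the chunks until it finds "fmt ", and then tests for PCM (format code 1), one channel, and 16 bits per sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of interest from the "fmt " chunk (hypothetical struct).
struct WavFormat {
    std::uint16_t audioFormat;    // 1 means uncompressed PCM
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
};

// Read a little-endian unsigned integer of `count` bytes (at most 4).
std::uint32_t readLittleEndian(const std::vector<std::uint8_t>& bytes,
                               std::size_t offset, std::size_t count) {
    assert(count <= 4 && offset + count <= bytes.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
    }
    return value;
}

// Locate the "fmt " chunk and fill in `format`. Returns false if the data
// does not look like a RIFF/WAVE file or has no usable "fmt " chunk.
bool readWavFormat(const std::vector<std::uint8_t>& bytes, WavFormat& format) {
    if (bytes.size() < 12 ||
        std::memcmp(bytes.data(), "RIFF", 4) != 0 ||
        std::memcmp(bytes.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    std::size_t offset = 12;
    while (offset + 8 <= bytes.size()) {
        const std::size_t chunkSize = readLittleEndian(bytes, offset + 4, 4);
        if (std::memcmp(bytes.data() + offset, "fmt ", 4) == 0) {
            if (chunkSize < 16 || offset + 8 + 16 > bytes.size()) {
                return false;
            }
            const std::size_t body = offset + 8;
            format.audioFormat =
                static_cast<std::uint16_t>(readLittleEndian(bytes, body, 2));
            format.channels =
                static_cast<std::uint16_t>(readLittleEndian(bytes, body + 2, 2));
            format.sampleRate = readLittleEndian(bytes, body + 4, 4);
            format.bitsPerSample =
                static_cast<std::uint16_t>(readLittleEndian(bytes, body + 14, 2));
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        offset += 8 + chunkSize + (chunkSize % 2);
    }
    return false;
}

// True for mono, 16-bit, uncompressed PCM.
bool isSupportedInput(const WavFormat& format) {
    return format.audioFormat == 1 && format.channels == 1 &&
           format.bitsPerSample == 16;
}
```

Reading the file into the byte vector (for example with std::ifstream in binary mode) is left out to keep the sketch short.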

What's Next

As mentioned above, this is simply an early demonstration of my phase vocoder and it still has some limitations. It currently has no audio transient detection. It simply expects the beginning of the input audio file to start with a transient. If tested on audio that does not start with a transient, or contains multiple transients, I would expect the quality to suffer quite a bit. Adding transient detection to the process is something I plan on doing very soon.
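
To give a feel for what transient detection might involve, here is a minimal sketch of one simple, common approach: compare the average energy of each frame with that of the previous frame and flag sudden jumps. This is only an illustration, not the planned implementation; the function name, the frame-based design, and the threshold parameters are all assumptions, and real detectors are usually more sophisticated (for example, working on spectral changes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the starting sample index of every frame whose mean energy is
// above minEnergy and more than `ratio` times the previous frame's energy.
// All parameters are hypothetical tuning knobs for this sketch.
std::vector<std::size_t> detectTransients(const std::vector<float>& samples,
                                          std::size_t frameSize,
                                          float ratio,
                                          float minEnergy) {
    assert(frameSize > 0 && ratio > 1.0f);

    std::vector<std::size_t> onsets;
    float previousEnergy = 0.0f;

    // Walk over non-overlapping frames; a trailing partial frame is ignored.
    for (std::size_t start = 0; start + frameSize <= samples.size();
         start += frameSize) {
        float energy = 0.0f;
        for (std::size_t i = 0; i < frameSize; ++i) {
            energy += samples[start + i] * samples[start + i];
        }
        energy /= static_cast<float>(frameSize);

        if (energy > minEnergy && energy > ratio * previousEnergy) {
            onsets.push_back(start);
        }
        previousEnergy = energy;
    }
    return onsets;
}
```

The frame size limits the time resolution here: an onset is only located to the start of the frame in which the energy jump shows up.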

I also plan to implement a resampler soon. As mentioned above, combining it with the phase vocoder will allow me to produce a variety of different pitches (notes) from the same sound.
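
The arithmetic behind that combination can be sketched as follows. Shifting by n semitones in equal temperament means multiplying the frequency by 2^(n/12). Stretching the sound by that ratio with the phase vocoder and then resampling by the same ratio brings the duration back to roughly the original while the pitch moves. The code below is a hypothetical illustration with made-up function names, and it uses plain linear interpolation, which is the simplest possible resampler and not a high-quality one (a production resampler would normally use band-limited filtering to avoid aliasing).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Frequency ratio for a shift of `semitones` in 12-tone equal temperament.
double pitchRatio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// Linear-interpolation resampler. The input is read `ratio` times faster,
// so the output is about 1/ratio as long and, played back at the original
// sample rate, sounds `ratio` times higher in pitch.
std::vector<float> resampleLinear(const std::vector<float>& input,
                                  double ratio) {
    assert(ratio > 0.0);
    if (input.size() < 2) {
        return input;
    }

    const std::size_t last = input.size() - 1;
    const std::size_t outputLength =
        static_cast<std::size_t>(std::floor(last / ratio)) + 1;

    std::vector<float> output(outputLength);
    for (std::size_t i = 0; i < outputLength; ++i) {
        // Fractional read position in the input.
        const double position = i * ratio;
        const std::size_t index =
            std::min(static_cast<std::size_t>(position), last);
        const std::size_t next = std::min(index + 1, last);
        const double fraction = position - index;

        // Blend the two neighboring input samples.
        output[i] = static_cast<float>((1.0 - fraction) * input[index] +
                                       fraction * input[next]);
    }
    return output;
}
```

For example, to raise a piano note by 3 semitones, one would stretch it with a factor of pitchRatio(3.0) and then pass the result through resampleLinear with that same ratio.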

Beyond that, I'm aware of at least one other method to implement time stretching/compression of digital audio signals. I believe the processing for this method is entirely in the time domain (as opposed to the frequency domain). If I understand correctly, the process looks for small snippets of similar adjacent audio waveforms within the input signal, and then either duplicates the snippet to time-stretch the audio or eliminates it to compress the audio.
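
As a very rough sketch of that idea (an interpretation of the description above, not a reference implementation of any particular published algorithm), the code below searches for the lag at which a snippet best matches the audio immediately following it, and then repeats that snippet once to lengthen the signal. The function names and the mean-absolute-difference similarity measure are assumptions chosen for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Find the lag in [minLag, maxLag] for which the snippet of that length
// starting at `start` is most similar to the snippet that follows it.
// Similarity is measured as the mean absolute difference (lower is better);
// ties go to the smallest lag.
std::size_t findBestLag(const std::vector<float>& samples, std::size_t start,
                        std::size_t minLag, std::size_t maxLag) {
    assert(minLag > 0 && minLag <= maxLag);
    assert(start + 2 * maxLag <= samples.size());

    std::size_t bestLag = minLag;
    double bestDifference = -1.0;  // negative means "nothing measured yet"

    for (std::size_t lag = minLag; lag <= maxLag; ++lag) {
        double difference = 0.0;
        for (std::size_t i = 0; i < lag; ++i) {
            difference +=
                std::fabs(samples[start + i] - samples[start + lag + i]);
        }
        difference /= static_cast<double>(lag);

        if (bestDifference < 0.0 || difference < bestDifference) {
            bestDifference = difference;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Lengthen the signal by repeating the snippet [start, start + lag) once.
std::vector<float> duplicateSnippet(const std::vector<float>& samples,
                                    std::size_t start, std::size_t lag) {
    assert(start + lag <= samples.size());
    const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(start);
    const std::ptrdiff_t end = static_cast<std::ptrdiff_t>(start + lag);

    // Everything up to the end of the snippet, then the snippet onward again.
    std::vector<float> output(samples.begin(), samples.begin() + end);
    output.insert(output.end(), samples.begin() + first, samples.end());
    return output;
}
```

A practical version would cross-fade at the splice points rather than butt the snippets together, and would repeat or drop snippets throughout the signal according to the desired stretch factor.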

I'm interested in implementing this algorithm and comparing its quality to my phase vocoder. Hopefully I'll have more to report on this soon.

Date: February 14th, 2017 at 9:42pm
Author: Terence Darwen
Tags: Audio Software, Cross-Platform Development, Phase Vocoder, Digital Signal Processing, DSP, C++
