Audio analysis: Mel Spectrograms

Matt Moore

Waveform

You’ve probably seen audio waveforms before. A waveform is essentially a graph of sound pressure over time: it shows the amount of displacement, or pressure, at each point in time. It’s a very effective tool for audio editors, who use it to slice audio and adjust volume (amplitude).

In the waveform graph below, the X axis is time and the Y axis is amplitude (pressure).

waveform
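To make the two axes concrete, here is a minimal sketch that builds a waveform by hand using only the Python standard library. The sample rate, duration, and the two tone frequencies are illustrative assumptions, not values from the graph above.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
DURATION = 0.5      # length of the clip in seconds (assumed)

num_samples = int(SAMPLE_RATE * DURATION)

# X axis: the time of each sample, in seconds.
times = [n / SAMPLE_RATE for n in range(num_samples)]

# Y axis: the amplitude at each time.
# Here the signal is a 440 Hz tone plus a quieter 880 Hz tone.
waveform = [
    0.6 * math.sin(2 * math.pi * 440 * t) + 0.3 * math.sin(2 * math.pi * 880 * t)
    for t in times
]

# Changing the volume means scaling the amplitude of every sample.
quieter = [0.5 * sample for sample in waveform]

# Slicing audio means slicing the sample list:
# the first 100 ms is the first SAMPLE_RATE // 10 samples.
first_100ms = waveform[: SAMPLE_RATE // 10]

print(len(waveform), len(first_100ms))
```

Both editing operations mentioned above, slicing and volume changes, are simple list operations in this representation, which is part of why the time domain is so convenient for editing.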

While useful for general audio editing and mastering, a waveform represents the time domain, so it doesn’t directly show which frequencies are present in the signal.

Discrete Fourier Transform

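If we want to see the specific frequencies in a given audio signal, we need to go from the “time domain” of the waveform to the “frequency domain”. We do this with a Fourier transform. To be exact, because digital audio is a sequence of discrete samples, we use the discrete Fourier transform (DFT), which is usually computed with the fast Fourier transform (FFT) algorithm.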

Notice that the X axis here represents frequency, and the Y axis represents the magnitude of each frequency.

discrete-fft
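As a rough sketch of what the transform computes, the code below evaluates the DFT directly from its definition using only the standard library. This is the slow O(N²) form, shown for clarity; it is not how the plot above was necessarily produced, and in practice an FFT routine such as `numpy.fft.rfft` would be used instead. The function name `dft_magnitudes`, the sample rate, and the test tones are all assumptions made for this example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each non-negative frequency bin of a real signal."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid that completes
        # k cycles over the n samples.
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(total))
    return magnitudes


SAMPLE_RATE = 800  # samples per second (assumed for illustration)
N = 200            # number of samples, giving 800 / 200 = 4 Hz per bin

# Time-domain signal: a 100 Hz tone plus a quieter 240 Hz tone.
signal = [
    math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 240 * i / SAMPLE_RATE)
    for i in range(N)
]

magnitudes = dft_magnitudes(signal)

# X axis of the frequency plot: bin k corresponds to k * SAMPLE_RATE / N Hz.
freqs = [k * SAMPLE_RATE / N for k in range(len(magnitudes))]

# Pick the two strongest bins and report their frequencies in ascending order.
strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
peak_freqs = [freqs[k] for k in sorted(strongest)]

print(peak_freqs)  # [100.0, 240.0]
```

The two tones that were mixed together in the time domain show up as two separate peaks in the frequency domain, with the 100 Hz peak twice as tall as the 240 Hz one, matching their amplitudes.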