Waveform
You may have seen audio waveforms before. A waveform is essentially a graph of pressure over time: it shows the amount of displacement or pressure at each point in time. Audio editors rely on it for slicing audio and adjusting volume (pressure).
In the waveform graph below, the X axis is time and the Y axis is amplitude (pressure).
import librosa, librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
plt.style.use('dark_background')
# Load the recording as a mono float array; sr is the sampling rate in Hz
signal, sr = librosa.load("Haunting_song_of_humpback_whales-youtube-W5Trznre92c.wav")
# Audio player widget (rendered in a notebook)
ipd.Audio(signal, rate=sr)
# Plot amplitude against time
plt.figure(figsize=(20, 5))
librosa.display.waveshow(signal, sr=sr)
plt.title("Waveshow", fontdict=dict(size=18))
plt.xlabel("Time", fontdict=dict(size=15))
plt.ylabel("Amplitude", fontdict=dict(size=15))
plt.show()
While useful for general audio editing and mastering, a waveform represents the time domain and does not directly show which frequencies are present in the signal.
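To make the time-domain picture concrete, here is a minimal sketch of what a waveform is underneath the plot: a 1-D array of amplitude samples, where each sample's position divided by the sampling rate gives its time. It assumes a synthetic sine tone in place of the whale recording; the tone frequency, duration, and variable names are illustrative and not part of the original example.

```python
import numpy as np

# Assumption: 22050 Hz, the default sampling rate used by librosa.load.
sr = 22050
duration = 2.0  # seconds (illustrative)

# Time stamp (in seconds) of every sample: sample index divided by sr.
t = np.arange(int(sr * duration)) / sr

# Hypothetical stand-in for the recording: a 440 Hz sine with peak amplitude 0.5.
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)

n_samples = len(signal)
length_seconds = n_samples / sr       # what the X axis of the waveform spans
peak = float(np.max(np.abs(signal)))  # largest excursion on the Y axis

print(n_samples, length_seconds, peak)
```

Plotting `signal` against `t` would give the same kind of picture as the waveform graph: time on the X axis, amplitude on the Y axis. Nothing in the array itself tells you that the tone is at 440 Hz.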
Discrete Fourier Transform
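If we want to see the specific frequencies in a given audio signal, we need to go from the “time domain” of the waveform to the “frequency domain”. We do this with a Fourier transform. Because digital audio is a sequence of discrete samples, we use the Discrete Fourier Transform (DFT), which in practice is computed with the Fast Fourier Transform (FFT) algorithm.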
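As a rough sketch of that step, the snippet below computes a magnitude spectrum with NumPy's `np.fft.rfft`, an FFT routine for real-valued input. It assumes a synthetic two-tone signal rather than the whale recording, so the tone frequencies (440 Hz and 1000 Hz) and all variable names are illustrative, and this is not necessarily how the original notebook computes its spectrum.

```python
import numpy as np

sr = 22050              # assumed sampling rate in Hz (librosa.load's default)
t = np.arange(sr) / sr  # exactly one second of sample times

# Hypothetical test signal: a strong 440 Hz tone plus a weaker 1000 Hz tone.
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# DFT of a real signal, computed with the FFT algorithm.
spectrum = np.fft.rfft(signal)

# Magnitude of each complex coefficient: how strongly each frequency is present.
magnitude = np.abs(spectrum)

# Frequency in Hz that each coefficient corresponds to.
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)

# The two largest peaks sit at the two tones that were mixed in.
strongest = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(strongest)
```

Plotting `magnitude` against `freqs` (for example with `plt.plot(freqs, magnitude)`) would give a frequency-domain graph with frequency on the X axis and magnitude on the Y axis.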
Notice that in the frequency-domain plot, the X axis represents frequency and the Y axis represents the magnitude of each frequency.
import librosa, librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
plt.style.use('dark_background')
signal, sr = librosa.load("Haunting_song_of_humpback_whales-youtube-W5Trznre92c.wav")
ipd.Audio(signal, rate=sr)

