Background

I drew a spectrogram by applying a short-time Fourier transform to a piano recording I made. The code is as follows:

import wave
import numpy as np
import scipy.signal as sp
import matplotlib.pyplot as plt

# Read the recording as raw 16-bit samples (assumes a mono, 16-bit WAV file)
file = wave.open('/Users/***/Desktop/Musica/doremi.wav')
data = file.readframes(file.getnframes())
data = np.frombuffer(data, dtype=np.int16)

# f: frequency of each row, t: time of each column, i: complex STFT coefficients
f, t, i = sp.stft(data, fs=file.getframerate(), window='hann', nperseg=512, noverlap=256)

# Convert magnitude to decibels; the small offset avoids log(0)
i = 20 * np.log10(np.abs(i) + 1e-12)

plt.title('Time-Frequency')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.pcolormesh(t, f, i, cmap='jet')
plt.show()


Question

I wanted to know what f and t are in the code above, so I searched in various places, but I couldn't find any explanation more helpful than "it's the frequency axis" or "it's the time axis". Here is what I got when I printed them:

(abridged)
print(t)
print(f)
[0.00000000e+00 5.80498866e-03 1.16099773e-02 ... 1.76297506e+01
 1.76355556e+01 1.76413605e+01]
[0. 86.1328125 172.265625 258.3984375 344.53125
   430.6640625 516.796875 602.9296875 689.0625 775.1953125
   861.328125 947.4609375 1033.59375 1119.7265625 1205.859375
  1291.9921875 1378.125 1464.2578125 1550.390625 1636.5234375
  1722.65625 1808.7890625 1894.921875 1981.0546875 2067.1875
  2153.3203125 2239.453125 2325.5859375 2411.71875 2497.8515625
  2583.984375 2670.1171875 2756.25 2842.3828125 2928.515625
  3014.6484375 3100.78125 3186.9140625 3273.046875 3359.1796875
  3445.3125 3531.4453125 3617.578125 3703.7109375 3789.84375
  3875.9765625 3962.109375 4048.2421875 4134.375 4220.5078125
  4306.640625 4392.7734375 4478.90625 4565.0390625 4651.171875
  4737.3046875 4823.4375 4909.5703125 4995.703125 5081.8359375
  5167.96875 5254.1015625 5340.234375 5426.3671875 5512.5
  5598.6328125 5684.765625 5770.8984375 5857.03125 5943.1640625
  6029.296875 6115.4296875 6201.5625 6287.6953125 6373.828125
  6459.9609375 6546.09375 6632.2265625 6718.359375 6804.4921875
  6890.625 6976.7578125 7062.890625 7149.0234375 7235.15625
  7321.2890625 7407.421875 7493.5546875 7579.6875 7665.8203125
  7751.953125 7838.0859375 7924.21875 8010.3515625 8096.484375
  8182.6171875 8268.75 8354.8828125 8441.015625 8527.1484375
  8613.28125 8699.4140625 8785.546875 8871.6796875 8957.8125
  9043.9453125 9130.078125 9216.2109375 9302.34375 9388.4765625
  9474.609375 9560.7421875 9646.875 9733.0078125 9819.140625
  9905.2734375 9991.40625 10077.5390625 10163.671875 10249.8046875
 10335.9375 10422.0703125 10508.203125 10594.3359375 10680.46875
 10766.6015625 10852.734375 10938.8671875 11025. 11111.1328125
 11197.265625 11283.3984375 11369.53125 11455.6640625 11541.796875
 11627.9296875 11714.0625 11800.1953125 11886.328125 11972.4609375
 12058.59375 12144.7265625 12230.859375 12316.9921875 12403.125
 12489.2578125 12575.390625 12661.5234375 12747.65625 12833.7890625
 12919.921875 13006.0546875 13092.1875 13178.3203125 13264.453125
 13350.5859375 13436.71875 13522.8515625 13608.984375 13695.1171875
 13781.25 13867.3828125 13953.515625 14039.6484375 14125.78125
 14211.9140625 14298.046875 14384.1796875 14470.3125 14556.4453125
 14642.578125 14728.7109375 14814.84375 14900.9765625 14987.109375
 15073.2421875 15159.375 15245.5078125 15331.640625 15417.7734375
 15503.90625 15590.0390625 15676.171875 15762.3046875 15848.4375
 15934.5703125 16020.703125 16106.8359375 16192.96875 16279.1015625
 16365.234375 16451.3671875 16537.5 16623.6328125 16709.765625
 16795.8984375 16882.03125 16968.1640625 17054.296875 17140.4296875
 17226.5625 17312.6953125 17398.828125 17484.9609375 17571.09375
 17657.2265625 17743.359375 17829.4921875 17915.625 18001.7578125
 18087.890625 18174.0234375 18260.15625 18346.2890625 18432.421875
 18518.5546875 18604.6875 18690.8203125 18776.953125 18863.0859375
 18949.21875 19035.3515625 19121.484375 19207.6171875 19293.75
 19379.8828125 19466.015625 19552.1484375 19638.28125 19724.4140625
 19810.546875 19896.6796875 19982.8125 20068.9453125 20155.078125
 20241.2109375 20327.34375 20413.4765625 20499.609375 20585.7421875
 20671.875 20758.0078125 20844.140625 20930.2734375 21016.40625
 21102.5390625 21188.671875 21274.8046875 21360.9375 21447.0703125
 21533.203125 21619.3359375 21705.46875 21791.6015625 21877.734375
 21963.8671875 22050.]


Looking at these numbers, each array is just a sequence of multiples of some constant. In what sense are they frequency and time? How do these arrays contribute to frequency analysis? That is what I don't understand. I'm sorry for the beginner question, but I would appreciate any help.

  • Answer #1

    stft (Short-Time Fourier Transform) cuts a short segment out of the original signal by multiplying it by a window function, applies a Fourier transform to that segment, and repeats this while shifting the window step by step along the time axis. Roughly speaking, t is the elapsed time from the start of the recording, f is the set of reference frequencies, and i holds the coefficients (the mix of volumes for each f) that make up the piano sound at time t.
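To make this concrete, the sketch below rebuilds both axes by hand from the STFT parameters and compares them with what scipy returns. It is only an illustration: the sample rate of 44100 Hz is an assumption inferred from the printed spacing (86.1328125 Hz × 512 = 44100 Hz), random noise stands in for the piano recording, and the names `Z`, `f_manual`, and `t_manual` are made up for this example.

```python
import numpy as np
import scipy.signal as sp

fs = 44100               # assumed sample rate, inferred from the printed f spacing
nperseg = 512            # window length in samples
noverlap = 256           # samples shared by neighbouring windows
hop = nperseg - noverlap # how far the window moves each step

# Stand-in signal: one second of noise instead of the piano recording.
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)

f, t, Z = sp.stft(x, fs=fs, window='hann', nperseg=nperseg, noverlap=noverlap)

# Frequency axis: bin k corresponds to k * fs / nperseg Hz.
f_manual = np.arange(nperseg // 2 + 1) * fs / nperseg
# Time axis: frame m is centred at m * hop / fs seconds.
t_manual = np.arange(Z.shape[1]) * hop / fs

print(f[:3], t[:3])
print(np.allclose(f, f_manual), np.allclose(t, t_manual))
```

So the "constant" in each array is set purely by the parameters: the frequency step is fs / nperseg = 86.1328125 Hz and the time step is 256 / 44100 ≈ 0.0058 s, which matches the printed values.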

    Reference frequencies f = (f0, f1, f2, f3, ...)
    Piano sound at time t = i(f0, t) * f0 + i(f1, t) * f1 + i(f2, t) * f2 + i(f3, t) * f3 + ...
    (here each fk on the right-hand side stands for a sinusoid of frequency fk)

    Therefore, t and f are meaningful only as reference values used together with the coefficients i. Looked at on their own, t and f are nothing more than monotonically increasing values.
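One way to see f and t acting as reference values is to use them as lookup tables for the coefficient array. The sketch below is a hedged example on a synthetic signal, not the original recording: it plays 440 Hz for half a second and then 880 Hz, and the helper `dominant_frequency` is a hypothetical name introduced only here. The sample rate of 44100 Hz is again an assumption.

```python
import numpy as np
import scipy.signal as sp

fs = 44100                      # assumed sample rate
n = np.arange(fs)               # one second of sample indices

# 440 Hz for the first half second, 880 Hz for the second half.
tone = np.where(n < fs // 2,
                np.sin(2 * np.pi * 440 * n / fs),
                np.sin(2 * np.pi * 880 * n / fs))

f, t, Z = sp.stft(tone, fs=fs, window='hann', nperseg=512, noverlap=256)

def dominant_frequency(at_time):
    """Return the frequency of the strongest bin in the frame nearest at_time."""
    col = np.argmin(np.abs(t - at_time))   # t tells us which column to read
    row = np.argmax(np.abs(Z[:, col]))     # strongest coefficient in that column
    return f[row]                          # f turns the row index into Hz

early = dominant_frequency(0.25)
late = dominant_frequency(0.75)
print(early, late)
```

The row and column indices of the coefficient array mean nothing by themselves; t converts a column index into seconds and f converts a row index into Hz. The results should land on the bins closest to 440 Hz and 880 Hz, within one bin width of about 86 Hz.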

    Note that i is a complex number and represents not only the volume but also the phase.
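The following sketch illustrates that last point under the same assumptions as before (a 44100 Hz sample rate and a synthetic 440 Hz tone in place of the recording). It splits the coefficients into magnitude and phase, puts them back together, and uses `scipy.signal.istft` to recover the waveform. The variable `Z` plays the role of `i` before the decibel conversion in the question's code.

```python
import numpy as np
import scipy.signal as sp

fs = 44100                                   # assumed sample rate
n = np.arange(fs)
x = 0.5 * np.sin(2 * np.pi * 440 * n / fs)   # synthetic 440 Hz tone

f, t, Z = sp.stft(x, fs=fs, window='hann', nperseg=512, noverlap=256)

magnitude = np.abs(Z)    # strength of each frequency in each frame
phase = np.angle(Z)      # position of each sinusoid within its cycle, in radians

# Magnitude and phase together carry the same information as Z itself.
Z_rebuilt = magnitude * np.exp(1j * phase)

# Invert the transform; the output can be slightly longer than x due to padding.
_, x_rec = sp.istft(Z_rebuilt, fs=fs, window='hann', nperseg=512, noverlap=256)

m = min(len(x), len(x_rec))
print(np.iscomplexobj(Z))
print(np.max(np.abs(x[:m] - x_rec[:m])))
```

Taking `np.abs` for the spectrogram, as the question's code does, keeps only the magnitude. That is enough for drawing the picture, but the phase is needed if the goal is to turn the coefficients back into sound.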