A piano tone recorded with the iPhone Voice Memos app was converted to WAV and plotted as a waveform (navy).
I then ran it through scipy.signal.stft/istft and plotted the resynthesized waveform with matplotlib (aquamarine).
The corresponding code and output are as follows:

import numpy as np
import wave
import scipy.signal as sp
import matplotlib.pyplot as plt
import math
file = wave.open('/Users/***/Desktop/Musica/doremi.wav')
data = file.readframes(file.getnframes())
data = np.frombuffer(data, dtype=np.int16)
duration = file.getnframes() / file.getframerate()
x1_list = []
for i in range(len(data)):
    x1_list.append(duration * i / len(data))
x1 = np.array(x1_list)
y1 = np.array(data)
f, t, stft_i = sp.stft(data, fs=file.getframerate(), window='hann', nperseg=512, noverlap=256)
stft_i = 10 * np.log(np.abs(stft_i))
t, istft_i = sp.istft(stft_i, fs=file.getframerate(), window='hann', nperseg=512, noverlap=256)
x2 = t
y2 = istft_i
plt.title('Spectrogram of DoReMi')
plt.xlabel('Times (s)')
plt.ylabel('Intensity (Pa)')
plt.plot(x1, y1, color='navy')
plt.plot(x2, y2, color='aquamarine')
plt.show()

Question ①

The resynthesized waveform is twice as long in time. Unlike x1, t here is a complex signal, but why does the total length change?

Question ②

The intensity is also completely different before and after the Fourier transform and inverse transform. What went wrong?

I'm sorry for the amateur question, but I would appreciate any pointers.

  • Answer # 1

    Isn't it because the line
    stft_i = 10 * np.log(np.abs(stft_i))
    sits between the transform and the inverse transform?

    np.abs discards the phase, so istft no longer receives the complex spectrum it needs. On top of that, the log compresses the amplitude overall, which is likely why the inverse transform comes out looking flat.
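
The point of Answer #1 can be sketched roughly as follows. This is only a minimal illustration under assumptions: a synthetic 440 Hz sine stands in for the piano recording, the sample rate is made up, and the names `signal`, `Zxx`, `spec_db`, `rec`, and `bad` are illustrative rather than taken from the original script. The idea is to keep the complex STFT untouched for `istft` and to store the log magnitude in a separate variable used only for display.

```python
import numpy as np
import scipy.signal as sp

fs = 8000                                   # assumed sample rate for this synthetic example
time = np.arange(fs) / fs                   # one second of samples
signal = 0.5 * np.sin(2 * np.pi * 440.0 * time)  # stand-in for the piano tone

# Forward STFT: Zxx is complex and carries both magnitude and phase.
f, t, Zxx = sp.stft(signal, fs=fs, window='hann', nperseg=512, noverlap=256)

# Log magnitude for display only, kept in its own variable.
# The small offset avoids log10(0).
spec_db = 20 * np.log10(np.abs(Zxx) + 1e-12)

# Inverse STFT from the untouched complex spectrum.
t_rec, rec = sp.istft(Zxx, fs=fs, window='hann', nperseg=512, noverlap=256)
rec = rec[:len(signal)]                     # drop the zero padding added by stft
max_err = np.max(np.abs(rec - signal))

# For comparison: feeding the log magnitude to istft, as in the question.
_, bad = sp.istft(spec_db, fs=fs, window='hann', nperseg=512, noverlap=256)
bad = bad[:len(signal)]
bad_err = np.max(np.abs(bad - signal))

print("round-trip error with complex STFT:", max_err)
print("error when inverting the log magnitude:", bad_err)
```

With the complex spectrum the round trip should reproduce the input up to floating-point error, while the log-magnitude version does not resemble it. Regarding the doubled time axis, one thing that may be worth checking (an assumption, since the question does not say whether the file is mono or stereo): `readframes` returns interleaved samples for every channel, so for a stereo file `len(data)` is twice `getnframes()`, and passing that array to `stft` with `fs=file.getframerate()` would stretch `t` to twice the real duration. `file.getnchannels()` reports the channel count.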