> Fine for the time dimension of the spectrogram, but rather dubious for the frequency axis.
MFCCs[1] are essentially that: fixed cosine filters applied along the frequency axis of a (log, mel-warped) Fourier spectrum, and they are well-established features for music classification tasks.
It makes sense if you think of timbre as a time-varying relationship between the harmonics of a single pitch. On a log-frequency axis, changing the pitch just shifts the whole harmonic pattern, so translation invariance along the frequency axis can tell you that there are partials typical of, e.g., a guitar or a flute, without caring what particular pitch those instruments are playing.
And timbre is a bigger source of variety in popular music than e.g. the particular notes used.
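
For anyone who wants to see what "a transform along the frequency axis" means concretely, here is a rough pure-Python sketch of the usual MFCC recipe for a single frame: triangular mel filters, log, then a DCT-II across the filter outputs. The function names, the filter count, the 16 kHz sample rate and the synthetic harmonic spectrum are all my own assumptions for illustration, not any particular library's API, and real implementations add windowing, pre-emphasis and normalization.

```python
import math

def hz_to_mel(f):
    # Common mel formula (one of several variants in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    n_bins is the number of non-negative FFT bins (n_fft // 2 + 1).
    Returns n_filters rows of n_bins weights in [0, 1].
    """
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge points, as fractional bin positions.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(x):
    # Unnormalized DCT-II: this is the step that runs along the frequency axis.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def mfcc(power_spectrum, sample_rate, n_filters=20, n_coeffs=13):
    bank = mel_filterbank(n_filters, len(power_spectrum), sample_rate)
    # Floor the energies so empty filters do not produce log(0).
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-10))
                    for row in bank]
    return dct2(log_energies)[:n_coeffs]

def harmonic_spectrum(f0, n_bins=257, sample_rate=16000, n_harmonics=10):
    # Hypothetical test input: a few partials with 1/h^2 power rolloff.
    spec = [1e-6] * n_bins
    for h in range(1, n_harmonics + 1):
        b = round(h * f0 / (sample_rate / 2.0) * (n_bins - 1))
        if b < n_bins:
            spec[b] += 1.0 / h ** 2
    return spec

coeffs = mfcc(harmonic_spectrum(220.0), 16000)
```

The low-order coefficients describe the broad shape of the spectral envelope and the high-order ones (dropped here) describe fine detail such as individual harmonics, which is a large part of why the result is fairly insensitive to the exact pitch.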
[1] https://en.wikipedia.org/wiki/Mel-frequency_cepstrum