Fast Fourier transform. How do I select the frequency of a note?

I'm interested in implementing note detection and recognizing a signal by its notes. Say you take a piano, record the sound of a pair of keys, and apply the FFT. The output is an array of complex numbers, where the modulus of each number is the amplitude and its argument is the phase. What is not clear to me is how to pick out the frequency of the note.

Author: Kromster, 2017-05-19

1 answer

The Fourier transform operates on complex numbers. For each input element, set the real part to the signal value (the sample amplitude) and the imaginary part to zero.
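To make the input format concrete, here is a minimal sketch in plain Python. The `fft` function is a textbook recursive radix-2 implementation written for illustration, not the code from any of the linked articles, and the `samples` values are made up (two periods of a sine over 8 points).

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical real-valued samples: sin(2*pi*2*t/8) for t = 0..7.
samples = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]

# Real part = sample value, imaginary part = 0.
fft_input = [complex(s, 0.0) for s in samples]
spectrum = fft(fft_input)
```

With this input, only elements 2 and 6 of `spectrum` are noticeably non-zero, since the sine completes exactly two periods within the 8 samples.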

At the output we get an array of complex numbers, where the modulus of each number is the amplitude and its argument is the phase. Each element of the array corresponds to a single harmonic, from the zeroth to the last. Or should they be called spectral components? Different sources use different terms for this.

The harmonics are spaced at a fixed step equal to the sampling frequency divided by the number of samples. The number of samples is simply the length of the input array, that is, how many complex numbers are fed to the transform.

The more samples at the input, the finer the frequency resolution at the output.

That is, if we feed in 1024 complex values, we get 1024 amplitude-phase values at the output. Plotting the amplitudes in array order gives something like a visualization of sound amplitude by frequency.
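Getting amplitude and phase out of the complex values takes one line each in Python. The `spectrum` values below are invented purely for illustration; in practice they would come from the FFT.

```python
import cmath

# Hypothetical FFT output values, for illustration only.
spectrum = [complex(4, 0), complex(0, -4), complex(3, 4)]

# Amplitude is the modulus of each complex number.
amplitudes = [abs(c) for c in spectrum]

# Phase is the argument, in radians within [-pi, pi].
phases = [cmath.phase(c) for c in spectrum]
```

The `amplitudes` list, taken in index order, is what one would plot against frequency.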

If the sampling frequency is 44100 Hz and the input array has a length of 65536, then the step between output array elements is 44100 / 65536 = 0.672912598 Hz.
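The same arithmetic as a small helper, in case you want to try other lengths. The function name `bin_step_hz` is just something I made up for the example.

```python
def bin_step_hz(sample_rate, n_samples):
    """Frequency distance between neighbouring FFT output elements, in Hz."""
    return sample_rate / n_samples


step = bin_step_hz(44100, 65536)
print(round(step, 9))  # 0.672912598
```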

For the purposes of human speech recognition, such accuracy is meaningless and redundant. By my estimate (treating the natural selection of the human species on planet Earth as a kind of genetic algorithm), the optimal step between frequencies lies between 1 and 9 Hz. So for a sampling frequency of 44100 Hz it is desirable to feed in at most 8192 samples, which gives a step of about 5.38 Hz.
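As a quick check of these numbers, the sketch below lists the power-of-two lengths whose step falls inside a given range at 44100 Hz. The 1 to 9 Hz range is taken from the paragraph above as given, and the function name and the `max_exp` limit are my own choices for the example.

```python
SAMPLE_RATE = 44100


def lengths_with_step_between(lo_hz, hi_hz, sample_rate=SAMPLE_RATE, max_exp=20):
    """Return (length, step_hz) pairs for power-of-two lengths in the range."""
    result = []
    for exp in range(1, max_exp + 1):
        n = 2 ** exp
        step = sample_rate / n
        if lo_hz <= step <= hi_hz:
            result.append((n, step))
    return result


candidates = lengths_with_step_between(1.0, 9.0)
for n, step in candidates:
    print(n, round(step, 3))
```

Taken literally, the range admits 8192, 16384 and 32768 samples; 8192 is the shortest of them and gives the coarsest step that still fits.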

But before this data is passed on for recognition, it needs some more processing, and I do not know exactly what. It has something to do with window functions. I did not understand that part.
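For reference, a window function is usually just a list of weights that the samples are multiplied by before the FFT, so that the block fades in and out instead of being cut off abruptly. Below is a sketch with the Hann window in its periodic form. This is one common choice and an assumption on my part, not necessarily the one the linked articles use.

```python
import math


def hann_window(n):
    """Periodic Hann window: w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def apply_window(samples):
    """Multiply each sample by the matching window weight."""
    window = hann_window(len(samples))
    return [s * w for s, w in zip(samples, window)]


# Hypothetical constant signal, to make the shape of the window visible.
samples = [1.0] * 8
windowed = apply_window(samples)
```

The result starts at zero, rises to the full sample value in the middle of the block, and falls off again towards the end. The windowed samples are then what goes into the real part of the FFT input.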

UPD:

As already written above, the FFT input is an array sample[N], where N is a power of two. With N = 65536 and sampling frequency Fs = 44100, the output is two arrays of length N (the real parts and the imaginary parts). To put it another way, an array of complex values C[N].

How to interpret it?

C[0] holds information about the 0 Hz frequency.
C[1] holds information about the 0.672912598 Hz frequency.
C[n] holds information about the frequency n*Fs/N Hz.
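Putting the n*Fs/N rule to work for the original question, here is a sketch that finds the strongest element in the lower half of the output, converts its index to Hz, and maps that to the nearest note name. Assumptions on my side: equal temperament with A4 = 440 Hz, MIDI-style note numbering, a synthetic 440 Hz sine in place of a real recording, and made-up helper names (`peak_frequency`, `nearest_note`). The `fft` is a textbook radix-2 routine written for the example.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest element among indices 1 .. N/2 - 1."""
    n = len(samples)
    spectrum = fft([complex(s, 0.0) for s in samples])
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / n


def nearest_note(freq_hz):
    """Nearest equal-tempered note name, assuming A4 = 440 Hz (MIDI note 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


FS, N = 44100, 8192
# Synthetic test tone standing in for a recorded note.
tone = [math.sin(2 * math.pi * 440.0 * i / FS) for i in range(N)]
freq = peak_frequency(tone, FS)
note = nearest_note(freq)
print(note)  # A4
```

The detected frequency is only accurate to within one step (about 5.38 Hz here). A real piano recording also contains overtones, and two keys pressed together would need more than one peak, so treat this as a starting point only.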

References:

https://m.habrahabr.ru/post/247385/

http://introcs.cs.princeton.edu/java/32class/Complex.java

http://websound.ru/articles/theory/fft.htm

Mirror effect: http://psi-logic.narod.ru/fft/fft8.htm

Is the mirror effect a consequence of feeding only real values to the FFT input, with no imaginary part? (Correct me if I'm wrong.)
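This can be checked numerically. For purely real input the discrete Fourier transform satisfies C[N-k] = conj(C[k]), so the amplitudes in the upper half of the array mirror those in the lower half. The sketch below uses a slow direct DFT (fine for 8 points) and arbitrary made-up sample values.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform, for small inputs only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Arbitrary real-valued samples, imaginary part set to zero.
real_signal = [0.0, 1.0, 0.5, -1.0, 0.25, 0.75, -0.5, 0.0]
spectrum = dft([complex(s, 0.0) for s in real_signal])

n = len(spectrum)
# For real input, element N-k is the complex conjugate of element k.
mirrored = all(
    abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9 for k in range(1, n)
)
print(mirrored)  # True
```

This is why, for a real signal, only the elements up to index N/2 carry independent information.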

Smudge effect: http://psi-logic.narod.ru/fft/fft9.htm

Article about the practice: https://habr.com/ru/post/269991/

Assumption: the mirror effect may arise because only a real signal is applied to the input, rather than a signal decomposed into I/Q components.

Author: Askalite, 2020-10-31 01:06:25