Friday, 28 May 2021

Voice Recognition

Introduction
Modern voice recognition methods differ from the older ones.

Modern Method
Neural networks are used.

Old Method
The explanation is as follows:
I did basic voice recognition on an Atari ST (8MHz 68000, 8-bit mono sampling). If it could be done on a 1985 desktop then it should be no problem for an early naughties cell-phone.

IIRC, the algorithm was roughly as follows:

- Sample the audio (8-bit mono @ 22kHz?)
- Split the audio into short (½ second?) pieces
- Do an FFT on each piece. The results are placed into a 2-dimensional array (piece #, binned frequency intensity)
- Compare the array against a set of reference patterns (one for each recognizable word, stored in the same format) and return the closest match (along with the strength of the match). A diagram illustrating this is at the end of this answer.
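
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original Atari ST code: a naive DFT stands in for the FFT so the example needs nothing beyond the standard library, the "strength of the match" is taken to be a sum of squared differences (lower means closer), and all function names, the piece length, the bin count and the synthetic test tones are hypothetical choices made for the example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT (stand-in for an FFT); magnitudes for bins 0..n//2 - 1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def bin_intensities(mags, num_bins):
    """Average the magnitudes into num_bins coarse frequency bands.
    Assumes len(mags) >= num_bins."""
    width = len(mags) // num_bins
    return [sum(mags[i * width:(i + 1) * width]) / width for i in range(num_bins)]

def spectrogram(samples, piece_len, num_bins):
    """Split the audio into pieces and return a 2-D list: [piece #][binned intensity]."""
    starts = range(0, len(samples) - piece_len + 1, piece_len)
    return [bin_intensities(dft_magnitudes(samples[s:s + piece_len]), num_bins) for s in starts]

def distance(a, b):
    """Sum of squared differences between two equally shaped 2-D lists."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def closest_match(pattern, references):
    """Return (word, distance) for the reference pattern nearest to pattern."""
    word = min(references, key=lambda w: distance(pattern, references[w]))
    return word, distance(pattern, references[word])

def tone(freq_hz, rate=800, n=256):
    """Synthetic sine wave used here in place of recorded speech (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]

# Two made-up "words", each represented by a pure tone.
references = {
    "low": spectrogram(tone(50), 64, 8),
    "high": spectrogram(tone(300), 64, 8),
}
word, dist = closest_match(spectrogram(tone(55), 64, 8), references)
print(word, dist)
```

Real speech would need the input and every reference to cover the same number of pieces, which is one reason an approach like this suits short, discrete words.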

No neural networks were used (though I undoubtedly experimented with them), just basic arithmetic. Training was done by recording the same word multiple times and then averaging the resulting arrays. Note that the algorithm only worked for discrete words, not continuous speech.
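
The training step described above (averaging the arrays from several recordings of the same word) might look like the following sketch. The function name and the tiny sample arrays are hypothetical, and it assumes every recording has already been turned into a 2-D array of the same shape.

```python
def average_patterns(recordings):
    """Element-wise mean of several equally shaped 2-D lists
    ([piece #][binned intensity]); assumes at least one recording."""
    count = len(recordings)
    rows = len(recordings[0])
    cols = len(recordings[0][0])
    return [
        [sum(rec[i][j] for rec in recordings) / count for j in range(cols)]
        for i in range(rows)
    ]

# Two made-up takes of the same word, 2 pieces x 2 frequency bins each.
takes = [
    [[1.0, 3.0], [0.0, 2.0]],
    [[3.0, 5.0], [2.0, 2.0]],
]
reference = average_patterns(takes)
print(reference)
```

Averaging smooths out differences between individual takes, so the stored reference reflects what the recordings have in common.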
The diagram for the algorithm is as follows:


