Yazılım Çorbası: Voice Recognition

Friday, 28 May 2021

Voice Recognition

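Introduction

Modern voice recognition methods differ from older ones: modern systems rely on neural networks, while older systems got by with basic signal processing and arithmetic.

Modern Method

Neural networks are used.

Older Method

The explanation is as follows: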

I did basic voice recognition on an Atari ST (8MHz 68000, 8-bit mono sampling). If it could be done on a 1985 desktop then it should be no problem for an early naughties cell-phone.

IIRC, the algorithm was roughly as follows:

- Sample the audio (8-bit mono @ 22kHz?)
- Split the audio into short (½ second?) pieces
- Do an FFT on each piece. The results are placed into a 2-dimensional array (piece #, binned frequency intensity)
- Compare the array against a set of reference patterns (one for each recognizable word, stored in the same format) and return the closest match (along with the strength of the match). A diagram illustrating this is at the end of this answer.
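The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the original Atari ST code: the piece length, the number of frequency bins, the sum-of-squared-differences distance, and the names `fft`, `features`, `distance`, `recognize` and `tone` are all chosen here for the example, and synthetic sine tones stand in for recorded words.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def features(samples, piece_len=256, n_bins=8):
    """Return a 2-D list: one row per piece, one column per frequency bin."""
    rows = []
    for start in range(0, len(samples) - piece_len + 1, piece_len):
        spectrum = fft(samples[start:start + piece_len])
        # Keep only the positive-frequency half and take magnitudes.
        mags = [abs(c) for c in spectrum[:piece_len // 2]]
        width = len(mags) // n_bins
        rows.append([sum(mags[b * width:(b + 1) * width]) for b in range(n_bins)])
    return rows

def distance(a, b):
    """Sum of squared differences over the overlapping rows (assumed metric)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def recognize(samples, references):
    """Return (best_word, distance); a smaller distance is a stronger match."""
    feats = features(samples)
    scored = ((word, distance(feats, ref)) for word, ref in references.items())
    return min(scored, key=lambda pair: pair[1])

def tone(freq, n=1024, rate=8192):
    """Synthetic stand-in for a recorded word: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Two hypothetical reference "words", then a query that resembles the first.
references = {"low": features(tone(320)), "high": features(tone(3200))}
word, dist = recognize(tone(352), references)
```

Here the query tone falls into the same frequency bin as the "low" reference, so `recognize` picks "low". Real speech would need pieces aligned in time and some loudness normalization, which this sketch leaves out.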

No neural networks were used (though I undoubtedly experimented with them), just basic arithmetic. Training was done by recording the same word multiple times and then averaging the resulting arrays. Note that the algorithm only worked for discrete words, not continuous speech.
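The training step described above, averaging the arrays from several recordings of the same word, might look like the following minimal sketch. It assumes every recording has already been turned into a 2-D array of the same shape; the function name `average_templates` and the tiny sample arrays are made up for illustration.

```python
def average_templates(recordings):
    """Element-wise mean of several equally shaped 2-D feature arrays."""
    n = len(recordings)
    rows, cols = len(recordings[0]), len(recordings[0][0])
    return [
        [sum(rec[i][j] for rec in recordings) / n for j in range(cols)]
        for i in range(rows)
    ]

# Two hypothetical takes of the same word (2 pieces x 2 frequency bins each).
takes = [
    [[1.0, 4.0], [2.0, 0.0]],
    [[3.0, 2.0], [4.0, 2.0]],
]
template = average_templates(takes)
# template == [[2.0, 3.0], [3.0, 1.0]]
```

The averaged array then serves as the reference pattern for that word when comparing new input.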

[Figure: diagram of the algorithm]
