VoiceGuard - A Speaker Recognizer


Project's name:	VoiceGuard
Introduction:	VoiceGuard is a speaker recognition application for Windows. The goal is to develop software that lets users log on to their machines by verifying their voiceprints.
Mailing list:	Voiceguard-development (please use this list to send feature requests)

Status

The project is in the planning phase.

Whole Story

Once developed, VoiceGuard will work as follows. During the Windows logon process, VoiceGuard will be activated and require the user to verify their identity by voice. The user will speak a word (we have not yet decided whether to use text-dependent or text-independent speaker verification), and VoiceGuard will sample and record the audio. The recorded audio will then go through a series of mathematical computations, and the result will be compared with a set of stored results; the closest match identifies the person. The stored results are gathered during training sessions.
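A minimal sketch of the train-then-match flow described above, assuming each utterance has already been reduced to a fixed-length feature vector and that "closest" means smallest Euclidean distance (the text does not fix a distance measure). The functions `enroll` and `closest_speaker`, the template dictionary, and the toy two-element vectors are hypothetical illustrations, not VoiceGuard code.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical template store: speaker name -> averaged feature vector
# built from that speaker's training utterances (assumed representation).
Templates = Dict[str, List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(templates: Templates, speaker: str, utterances: List[List[float]]) -> None:
    """Training session: average several utterance feature vectors into one template."""
    if not utterances:
        raise ValueError("need at least one training utterance")
    dim = len(utterances[0])
    templates[speaker] = [sum(u[i] for u in utterances) / len(utterances) for i in range(dim)]


def closest_speaker(templates: Templates, features: Sequence[float]) -> str:
    """Logon: return the enrolled speaker whose template is nearest to `features`."""
    if not templates:
        raise ValueError("no speakers enrolled")
    return min(templates, key=lambda name: euclidean(templates[name], features))


if __name__ == "__main__":
    templates: Templates = {}
    enroll(templates, "alice", [[0.9, -0.2], [1.1, -0.4]])  # toy 2-D features
    enroll(templates, "bob", [[-0.5, 0.3], [-0.7, 0.5]])
    print(closest_speaker(templates, [1.0, -0.25]))  # prints "alice"
```

A pure closest-match rule always names some enrolled speaker, even for an unknown voice; the description above does not yet say whether a distance threshold would also be used to reject such inputs.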

Requirements

Operating System: Windows (ideally any version, i.e. 95/98/NT/2000/XP)

Hardware: PC with a sound card and a microphone

Team

The following people have contributed to the project:
1. Parhar (os_parhar@hotmail.com)
2. Chris Mahn
3. Cory Clark
4. Mike Canann

Technical Details

Cory has suggested the following approach (17 Oct, 2002):
The reflection coefficients, or PARCORs (partial correlation coefficients), are fairly easy to obtain. We first need to filter the audio signal (pre-emphasis) to compensate for the spectral tilt of the speech signal. Then we can compute the autocorrelation up to lag 14 (or maybe higher) and derive the reflection coefficients from it, for example with the Levinson-Durbin recursion. The autocorrelation itself should be easy to compute.

I'm not sure what the best approach is after that, but perhaps we should average all the PARCORs over about 2 seconds of audio and use the averaged coefficients as the identifier for that person. We can then take an unknown piece of audio, compute its reflection coefficients averaged over 2 seconds, and measure how close they are to the coefficients of each previously identified speaker; the closest match is identified as the speaker. If this works well, we could add more features to the vector, such as average voice pitch. We can also experiment with higher-order autocorrelation, but I think the usefulness of the higher orders starts to decline.
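The feature-extraction steps in Cory's proposal can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the pre-emphasis coefficient (0.95), the 30 ms Hamming-windowed frames with a 10 ms step, and all function names are assumptions. Only the order of 14, the use of reflection coefficients, and the 2-second averaging come from the proposal. The Levinson-Durbin recursion used here is the standard way to turn autocorrelation values into reflection coefficients.

```python
import math
import random
from typing import List, Sequence

# Assumed analysis settings (not specified in the proposal); function names are hypothetical.
PRE_EMPHASIS = 0.95   # pre-emphasis coefficient
FRAME_MS = 30.0       # analysis frame length in milliseconds
HOP_MS = 10.0         # frame step in milliseconds


def pre_emphasis(x: Sequence[float], alpha: float = PRE_EMPHASIS) -> List[float]:
    """High-pass filter y[n] = x[n] - alpha * x[n-1] to compensate for spectral tilt."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(n: int) -> List[float]:
    """Hamming window of length n (windowing is an assumption, not in the proposal)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def reflection_coefficients(r: Sequence[float], order: int) -> List[float]:
    """Levinson-Durbin recursion: autocorrelation -> reflection coefficients (PARCORs).

    Uses the convention A(z) = 1 + a1 z^-1 + ...; some texts flip the sign of k.
    """
    a = [1.0] + [0.0] * order  # predictor polynomial coefficients, a[0] = 1
    err = r[0]                 # prediction error energy
    ks: List[float] = []
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. silent) frame: pad remaining k with zeros
            ks.extend([0.0] * (order - i + 1))
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return ks


def average_parcors(signal: Sequence[float], sample_rate: int,
                    order: int = 14, seconds: float = 2.0) -> List[float]:
    """Average the per-frame PARCOR vectors over the first `seconds` of audio."""
    x = pre_emphasis(signal[: int(seconds * sample_rate)])
    frame_len = int(sample_rate * FRAME_MS / 1000.0)
    hop = int(sample_rate * HOP_MS / 1000.0)
    win = hamming(frame_len)
    vectors = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(x[start:start + frame_len], win)]
        r = autocorrelation(frame, order)
        if r[0] > 0.0:  # skip all-zero frames
            vectors.append(reflection_coefficients(r, order))
    if not vectors:
        raise ValueError("signal too short or silent")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(order)]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]  # 2 s of synthetic noise at 8 kHz
    print([round(k, 3) for k in average_parcors(noise, sample_rate=8000)])
```

The resulting 14-element vector is the kind of per-speaker identifier the proposal describes; extra features such as average pitch could be appended to it before comparison.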

Last updated: 17 Oct, 2002