Karaoke Trainer

Figure 2: Photo of client using the device.

Designers: Swarna Solanki and Omar Aman


Our client is a 19-year-old male who has severe mental retardation.  He has limited dexterity, visual impairment, and limited verbal capabilities.  The client has become motivated to practice vocalizing through music therapy, whereas other forms of speech therapy were ineffective.  In a music therapy session, the client’s therapist sings a familiar song and stops before the end of a phrase, for example, “Take me out to the ball…”  Once he verbally finishes the phrase with “game”, the therapist continues singing the next line of the song.  If he does not respond, the therapist encourages him to finish the line before moving on.

Figure 1: Photo of the device with cover removed to reveal the circuitry inside

The goal of our project was to develop a device that simulates these music therapy sessions so that the client may practice vocalizing independently.  Voice detection was necessary so that the device knows when the client has vocalized.  His voice is difficult to understand, but speech recognition is not necessary: as long as he is vocalizing, he is meeting the therapist’s goals.  In addition, a simple interface was required so that he may use the device without supervision, preferably one comparable to an Apple iPod shuffle, with which he was already familiar.


The device successfully motivates the client to practice vocalization by singing along with his favorite songs.  Very quickly, the client learned that the device would only continue playing the song if he vocalized appropriately.  Thus, if he mumbled or whispered the word and the device could not register his voice, the client would make another attempt that was louder and clearer.  The positive feedback from the device puts a smile on his face as he claps and listens to the music.  This result was exactly the type of influence his family and therapist had hoped for.  After he had used the device for a short time, there was an obvious increase in the client’s ability to respond verbally to those around him, just as he would experience after a session with his music therapist.

The client’s mother stated that the device was simply “Great!” after she saw it in action for the first time.  She continued, “there are very few times in Alex’s life that somebody can actually make a difference for him.  I feel like this device is going to be a huge success!”  The client’s father stated, “I am so happy, I feel like I am going to cry!”


The device is based on the PIC 16F876 microcontroller (Microchip, Inc., Chandler, AZ).  Other major components are the music player, the user interface, and the voice detection circuit.  The music player is a Rogue Robotics (Toronto, ON) uMP3 Playback Module.  The module plays MP3 files stored on a Secure Digital (SD) card, and the microcontroller tells it which file to play.  Each song is stored in its own folder on the SD card.  Within that folder, the song is broken into separate tracks, with each break falling where the client is expected to respond.
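The folder-per-song, track-per-phrase layout can be sketched in C as a small cursor that only moves forward when the client vocalizes.  This is a minimal illustration, not the designers' firmware: the path scheme "/SONG01/T01.MP3" and the names SongCursor, track_path, and advance_on_vocalization are assumptions made for the example, since the actual file names and player commands are not given.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical layout: one folder per song, one MP3 file per phrase segment,
   for example "/SONG01/T03.MP3". The real names on the SD card are not given. */
typedef struct {
    int song;         /* 1-based folder number */
    int track;        /* 1-based segment within the song */
    int track_count;  /* number of segments in this song */
} SongCursor;

/* Write the path of the current segment into buf.
   Returns the value of snprintf (characters needed, excluding the NUL). */
int track_path(const SongCursor *c, char *buf, size_t len) {
    return snprintf(buf, len, "/SONG%02d/T%02d.MP3", c->song, c->track);
}

/* Move to the next segment only if the client vocalized.
   Returns true while there is still a segment left to play. */
bool advance_on_vocalization(SongCursor *c, bool vocalized) {
    if (vocalized && c->track <= c->track_count) {
        c->track++;
    }
    return c->track <= c->track_count;
}
```

In use, the firmware would play the file named by track_path, wait for the voice detector, and call advance_on_vocalization with its result; when the function returns false, the song is complete.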

The user interface consists of a “next” button, similar to the next song button on an iPod Shuffle.  This button is active at the beginning of a song, but at the therapist’s request, it becomes inactive once the client starts singing a song.  This forces him to complete a song once he starts it.
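The button lockout described above amounts to a single flag.  The sketch below is one way it might look in C; the Player structure, the function names, the wrap-around to the first song, and the choice to re-enable the button when the song finishes are all assumptions for illustration, not details taken from the device.

```c
#include <stdbool.h>

typedef struct {
    int  song;             /* index of the selected song, 0-based */
    int  song_count;       /* number of songs on the card */
    bool singing_started;  /* set once the client has vocalized in this song */
} Player;

/* Handle a press of the "next" button.
   Returns true if the press was accepted, false if it was ignored. */
bool press_next(Player *p) {
    if (p->singing_started) {
        return false;      /* locked: the client must finish the song */
    }
    p->song = (p->song + 1) % p->song_count;  /* assumed wrap-around */
    return true;
}

/* Called when the voice detector first reports a vocalization in a song. */
void on_first_vocalization(Player *p) { p->singing_started = true; }

/* Called after the last track of the song has played (assumed unlock point). */
void on_song_finished(Player *p) { p->singing_started = false; }
```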

The voice detection system must identify when the client vocalizes, while ignoring his claps and grunts as well as other ambient noises.  According to our studies, actual vocalization results in a signal of greater amplitude than background noise or grunts, and of greater duration than hand claps or finger snaps.  Thus, our circuit must identify as vocalization only those signals that exceed both a minimum amplitude and a minimum duration.

In our vocalization detection system, the microphone is connected to an instrumentation amplifier, whose output feeds a comparator.  The comparator outputs 5 V whenever the amplified microphone signal exceeds the reference voltage.  The comparator output is sent to a retriggerable one shot, which outputs a 5 V pulse for a short, fixed time each time the comparator output goes high.  Because the one shot is retriggerable, it maintains a constant 5 V output as long as peaks in the input signal recur at intervals shorter than the one shot’s output pulse duration.  This signal is fed to the PIC, which measures the duration of the one shot output and determines whether it is long enough to represent vocalization.  The system effectively filters out clapping, finger snapping, grunting, and other short or quiet sounds.
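The detection chain can be modeled in software to show why it rejects short and quiet sounds.  The C sketch below is a simulation under stated assumptions, not the device's circuit or firmware: the comparator and one shot are hardware in the real design, and the tick period, the pulse width ONE_SHOT_TICKS, the minimum duration MIN_VOCAL_TICKS, and all names are hypothetical values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ONE_SHOT_TICKS   20u   /* assumed one-shot pulse width, in ticks */
#define MIN_VOCAL_TICKS 300u   /* assumed minimum vocalization length, in ticks */

typedef struct {
    bool     prev_comparator;  /* comparator level on the previous tick */
    uint16_t pulse_left;       /* ticks left in the current one-shot pulse */
    uint16_t high_ticks;       /* consecutive ticks the one shot has been high */
} VoiceDetector;

/* Retriggerable one shot: every rising edge of the comparator output
   restarts the pulse, so closely spaced peaks keep the output high. */
static bool one_shot_tick(VoiceDetector *d, bool comparator_high) {
    if (comparator_high && !d->prev_comparator) {
        d->pulse_left = ONE_SHOT_TICKS;
    }
    d->prev_comparator = comparator_high;
    if (d->pulse_left == 0) {
        return false;
    }
    d->pulse_left--;
    return true;
}

/* Call once per tick with the amplified microphone voltage and the
   comparator reference. Returns true exactly once per vocalization, on the
   tick where the one-shot output has stayed high for MIN_VOCAL_TICKS. */
bool voice_detector_tick(VoiceDetector *d, double amplified_v, double reference_v) {
    bool high = one_shot_tick(d, amplified_v > reference_v);
    if (!high) {
        d->high_ticks = 0;     /* output dropped: start measuring again */
        return false;
    }
    if (d->high_ticks >= MIN_VOCAL_TICKS) {
        return false;          /* already reported this vocalization */
    }
    d->high_ticks++;
    return d->high_ticks == MIN_VOCAL_TICKS;
}
```

With these assumed values, a clap that produces peaks for 30 ticks holds the one-shot output high for only about 45 ticks and is ignored, a sound that never exceeds the reference never triggers the one shot at all, and a sustained voice with peaks every few ticks is reported once it has lasted 300 ticks.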

The reference voltage for the comparator can be adjusted by the parent or therapist.  Using a potentiometer, the user can set this reference voltage between 0 and 1 V, based on the response of the microphone.  An LED array shows the relative strength of the reference, and allows a visual comparison between the output signal of the instrumentation amplifier and the reference voltage at the input of the comparator.
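As a rough illustration of how a bar-style display could represent the 0 to 1 V range, the C sketch below maps a voltage to a number of lit LEDs.  How the actual LED array is driven is not described, so this is only an assumed scheme: the LED count of 8, the linear scale, and the function name leds_for_millivolts are hypothetical.

```c
#define LED_COUNT   8      /* assumed number of LEDs in the array */
#define REF_MAX_MV  1000   /* reference range is 0 to 1 V, per the text */

/* Number of LEDs to light for a voltage given in millivolts.
   Values outside the 0 to 1 V range are clamped. */
int leds_for_millivolts(int mv) {
    if (mv <= 0) {
        return 0;
    }
    if (mv >= REF_MAX_MV) {
        return LED_COUNT;
    }
    return (mv * LED_COUNT) / REF_MAX_MV;
}
```

Showing both the reference and the amplified signal on the same scale lets the parent or therapist see at a glance whether the client's voice is clearing the threshold.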

Our final device employed an external speaker or headphones.  The sensitivity control was intentionally difficult to access to avoid accidental adjustment.  The next button and LED array, volume control, audio jacks, and on/off switch were all made accessible.  The enclosure was custom made of blue acrylic.  The total cost of the device was $396.
