AiroSound

The gesture-based air instrument interpreter
Welcome to the homepage of the AiroSound project.

Who we are

We are:
Gert Menke (219 027)
Ralf Miunske (223 816)

Abstract

AiroSound aims to be a landmark-based virtual instrument interpreter that uses image recognition techniques to detect certain gestures, which are then transformed into music.

Because of the difficulties involved in reliable human gesture detection, we decided to use landmarks such as coloured gloves that can hopefully be detected by our software.
It should then detect the user's motions and convert them into sound in an intuitively understandable manner.

How it works

AiroSound uses a simple V4L-capable webcam and extracts certain image features, such as the size, bounding box and center of gravity of any previously trained coloured glove, in order to translate this information into notes that can be played back easily.
 
At this point we would like to thank the authors of the LTI-Lib for saving us a lot of work and causing us some pain instead.
 

(Gert playing the air-guitar)
To achieve this aim, the software acts as follows:
  • Learning the background

    (coloured Ralf with greyed background)
    When the software starts, the first images from the video source are taken as the empty background. Anything seen during that phase will be ignored in all future operations until the background is retrained. This is done by applying the lti::backgroundModel to every image, as described in 'Pfinder: Real-Time Tracking of the Human Body' by Wren, Azarbayejani, Darrell and Pentland, IEEE PAMI, July 1997, vol. 19, no. 7, pp. 780-785.
    The background subtraction was additionally improved by a shadow reduction that is based on the assumption that shadows usually only change the value (V) component of an HSV image.
    Furthermore, a fast median filter is applied to every resulting background mask to reduce some of the noise (a small sketch of this step can be found below).

     
  • Training the gloves and the AccousticTowel(tm)
    After AiroSound knows which parts of the video image belong to the background, it is necessary to calibrate the colour of each glove or of the AccousticTowel(tm). To do so, one has to place the glove, and only the glove that shall be trained, into the video image. The program then extracts those parts of the image that do not belong to the previously learned background and determines their average hue and saturation together with the respective deviations. From then on, every background-subtracted image is classified using the trained colours, resulting in a labeled image mask containing only the recognized, indexed blobs (sketched below). Fortunately we did not encounter any of the common lighting problems, because the colours are recalibrated every time the software is started.
     
  • Geometric analysis
    Now that AiroSound knows which pixels belong to which coloured glove or to the AccousticTowel(tm), some simple geometric features of the 30 largest blobs are computed, such as their bounding box, size and center of gravity, by applying lti::multiGeometricFeaturesFromMask to the labeled mask. Afterwards only the geometric features of the largest blob of each colour are preserved (see the sketch below).
     
  • Queueing
    When the geometric features of each colour's largest blob have been computed by the analysis thread, they are packed into timestamped events and put into a global event queue from which the synthesis can dispatch them at any time (a minimal queue sketch can be found below).
     

(full-featured Gert)
  • Geometric calibration
    Even though the geometric events are already put into the event queue, the air-guitar still does not know which glove positions represent the top and the bottom of its neck, nor which positions mark the guitar's "trigger". To calibrate these, the musician simply has to pose the way he likes to play his instrument.
     
  • Interpreting the motion
    After being told all four coordinates, AiroSound is able to interpret the musician's motions and to translate them into notes. The mapping is quite simple so far: the guitar's pitch is determined by the glove's relative position between the previously calibrated upper and lower bounds of the guitar's neck. Playing the note itself is triggered by moving the other glove from the upper to the lower half of the learned trigger range (see the sketch below).
    The AccousticTowel's(tm) horizontal and vertical positions in the image define the two mixed frequencies; its volume is determined by its visible horizontal and vertical expansion.
     
  • Playing the note using samples
    After the note to play has been determined and triggered, AiroSound knows two ways of generating the appropriate sound. The simple way is to play back prerecorded samples for each note using an external player such as artsplay.
     
  • Playing the note using PureData
    As a more sophisticated way of playing the computed note, AiroSound uses a UDP connection to our pd-patch, telling pd which MIDI note to play and for how long (a sketch of such a message can be found below).
     
  • The PD-patch
    The PD-patch receives sound information via UDP and routes the incoming information to its destination instrument. The guitar is configured by setting the tone duration, falloff and volume of its main frequency as well as those of the harmonics. When a new guitar note arrives, the preconfigured settings are played back.
    The AccousticTowel(tm) owns presets for two voices, containing the base frequency, top frequency and volumes. When a new tone arrives, it is played back and falls off over a longer time. Every time a new sound arrives, the volume rises back to the maximum and the frequencies slide smoothly to the new ones.
    Unfortunately we have not yet solved the problem of synthesizing the guitar sound ourselves. Thus we included the possibility to play back prerecorded guitar samples, although the synthesis is used normally.
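
The following sketches illustrate some of the steps above in C++. They are simplified illustrations, not the actual AiroSound code. The first one covers the background handling: it does not use the lti::backgroundModel interface, and the per-pixel statistics, thresholds and function names are assumptions chosen for illustration. It shows the three ideas from above: comparing a pixel against a learned background model, discarding shadow pixels whose hue and saturation barely change while the value drops, and cleaning the resulting mask with a small median (majority) filter.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Hypothetical per-pixel background statistics: mean and deviation per HSV channel.
    struct PixelStat { float mean[3]; float dev[3]; };

    // A pixel counts as foreground if it deviates too far from the learned background.
    // The shadow test assumes that a shadow mostly lowers the value (V) channel
    // while hue and saturation stay roughly the same.
    bool isForeground(const float hsv[3], const PixelStat& bg,
                      float threshold, float shadowTolerance)
    {
        float dh = std::fabs(hsv[0] - bg.mean[0]);
        float ds = std::fabs(hsv[1] - bg.mean[1]);
        float dv = std::fabs(hsv[2] - bg.mean[2]);

        // close enough to the background model: not foreground
        if (dh < threshold * bg.dev[0] &&
            ds < threshold * bg.dev[1] &&
            dv < threshold * bg.dev[2])
            return false;

        // shadow reduction: hue and saturation unchanged, only V dropped
        if (dh < shadowTolerance && ds < shadowTolerance && hsv[2] < bg.mean[2])
            return false;

        return true;
    }

    // 3x3 median filter on a binary mask to remove single-pixel noise.
    void medianFilter3x3(std::vector<uint8_t>& mask, int width, int height)
    {
        std::vector<uint8_t> out(mask);
        for (int y = 1; y < height - 1; ++y)
            for (int x = 1; x < width - 1; ++x) {
                int ones = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        ones += mask[(y + dy) * width + (x + dx)];
                out[y * width + x] = (ones >= 5) ? 1 : 0;   // majority of 9
            }
        mask.swap(out);
    }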
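
The colour training can be sketched in a similar way. Again, this is only a rough illustration with assumed names, not the LTI-Lib based implementation: it estimates mean hue and saturation and their standard deviations from the foreground pixels recorded while a single glove is in the picture, and later labels a pixel with the first trained glove whose colour it matches within a few deviations.

    #include <cmath>
    #include <utility>
    #include <vector>

    // Hypothetical per-glove colour model: mean and standard deviation of hue and saturation.
    struct ColourModel { float meanH, meanS, devH, devS; };

    // hs holds the (hue, saturation) pairs of all pixels that were NOT classified
    // as background while only the glove to be trained was in the picture.
    ColourModel trainColour(const std::vector<std::pair<float,float>>& hs)
    {
        ColourModel m{0, 0, 0, 0};
        if (hs.empty()) return m;
        for (const auto& p : hs) { m.meanH += p.first; m.meanS += p.second; }
        m.meanH /= hs.size();
        m.meanS /= hs.size();
        for (const auto& p : hs) {
            m.devH += (p.first  - m.meanH) * (p.first  - m.meanH);
            m.devS += (p.second - m.meanS) * (p.second - m.meanS);
        }
        m.devH = std::sqrt(m.devH / hs.size());
        m.devS = std::sqrt(m.devS / hs.size());
        return m;
    }

    // Label a foreground pixel with the index of the first matching glove,
    // or 0 if no trained colour is close enough (k = allowed deviations).
    int classifyPixel(float h, float s,
                      const std::vector<ColourModel>& gloves, float k)
    {
        for (size_t i = 0; i < gloves.size(); ++i)
            if (std::fabs(h - gloves[i].meanH) < k * gloves[i].devH &&
                std::fabs(s - gloves[i].meanS) < k * gloves[i].devS)
                return static_cast<int>(i) + 1;   // labels start at 1
        return 0;
    }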
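
The geometric analysis in AiroSound is done by lti::multiGeometricFeaturesFromMask; the simplified sketch below only illustrates what those features are. It assumes that a connected-components step has already produced a blob-id mask and a colour label per blob (both assumptions), accumulates area, bounding box and centre of gravity per blob, and keeps only the largest blob of each colour.

    #include <algorithm>
    #include <climits>
    #include <vector>

    struct BlobFeatures {
        int area = 0;
        int minX = INT_MAX, minY = INT_MAX, maxX = INT_MIN, maxY = INT_MIN;
        double cogX = 0, cogY = 0;   // centre of gravity
    };

    // blobId: one id per connected component (0 = background), one entry per pixel.
    // blobColour: colour label of each blob id (index 0 unused).
    // Returns, per colour, the features of that colour's largest blob.
    std::vector<BlobFeatures> largestBlobPerColour(
        const std::vector<int>& blobId, int width, int height,
        const std::vector<int>& blobColour, int numColours)
    {
        std::vector<BlobFeatures> perBlob(blobColour.size());
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x) {
                int id = blobId[y * width + x];
                if (id == 0) continue;               // background pixel
                BlobFeatures& f = perBlob[id];
                ++f.area;
                f.minX = std::min(f.minX, x);  f.maxX = std::max(f.maxX, x);
                f.minY = std::min(f.minY, y);  f.maxY = std::max(f.maxY, y);
                f.cogX += x;  f.cogY += y;
            }

        std::vector<BlobFeatures> best(numColours + 1);
        for (size_t id = 1; id < perBlob.size(); ++id) {
            BlobFeatures& f = perBlob[id];
            if (f.area == 0) continue;
            f.cogX /= f.area;  f.cogY /= f.area;     // finish centre of gravity
            if (f.area > best[blobColour[id]].area)
                best[blobColour[id]] = f;            // keep the largest blob only
        }
        return best;
    }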
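
The hand-over between the analysis thread and the synthesis could look like the following sketch (C++17; the event fields and class names are assumptions, the real implementation may differ):

    #include <chrono>
    #include <mutex>
    #include <optional>
    #include <queue>

    // One timestamped geometric event per colour and frame.
    struct GloveEvent {
        int    colour;                  // which glove / AccousticTowel(tm)
        double cogX, cogY;              // centre of gravity in the image
        double width, height;           // bounding-box size
        std::chrono::steady_clock::time_point stamp;
    };

    // Minimal thread-safe queue: the analysis thread pushes,
    // the synthesis side pops whenever it is ready to dispatch an event.
    class EventQueue {
    public:
        void push(GloveEvent e) {
            std::lock_guard<std::mutex> lock(mtx_);
            q_.push(std::move(e));
        }
        std::optional<GloveEvent> pop() {
            std::lock_guard<std::mutex> lock(mtx_);
            if (q_.empty()) return std::nullopt;
            GloveEvent e = q_.front();
            q_.pop();
            return e;
        }
    private:
        std::mutex mtx_;
        std::queue<GloveEvent> q_;
    };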
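
The guitar mapping itself amounts to two small functions. The sketch below uses assumed names and an assumed number of discrete notes, and assumes that the y axis grows downwards in the image; the calibration values are the ones learned during the geometric calibration described above.

    #include <algorithm>
    #include <cmath>

    // Calibrated positions taken from the musician's pose.
    struct GuitarCalibration {
        double neckTopY, neckBottomY;        // range of the fret glove
        double triggerTopY, triggerBottomY;  // range of the strumming glove
    };

    // The fret glove's relative position between the calibrated neck bounds
    // selects one of numNotes pitches above baseNote.
    int pitchFromPosition(double fretY, const GuitarCalibration& c,
                          int baseNote, int numNotes)
    {
        double t = (fretY - c.neckTopY) / (c.neckBottomY - c.neckTopY);
        t = std::clamp(t, 0.0, 1.0);
        return baseNote + static_cast<int>(std::round(t * (numNotes - 1)));
    }

    // A note is triggered when the strumming glove moves from the upper
    // to the lower half of the learned trigger range.
    bool strumTriggered(double prevY, double currY, const GuitarCalibration& c)
    {
        double mid = 0.5 * (c.triggerTopY + c.triggerBottomY);
        return prevY < mid && currY >= mid;
    }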
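
Finally, a sketch of how a note could be sent to the pd-patch over UDP. The host, the port and the message format ("note <midinote> <duration>;") are assumptions and have to match the netreceive object and the routing inside the actual patch; the sockets code itself is plain POSIX.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    // Send one "note" message to the pd-patch listening on a UDP port.
    bool sendNoteToPd(const char* host, int port, int midiNote, int durationMs)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) return false;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        if (inet_pton(AF_INET, host, &addr.sin_addr) != 1) { close(sock); return false; }

        char msg[64];
        std::snprintf(msg, sizeof(msg), "note %d %d;\n", midiNote, durationMs);

        ssize_t sent = sendto(sock, msg, std::strlen(msg), 0,
                              reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        close(sock);
        return sent == static_cast<ssize_t>(std::strlen(msg));
    }

    // Example (assumed address and port): play MIDI note 60 for half a second.
    // sendNoteToPd("127.0.0.1", 3000, 60, 500);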
     

Future work

So far, the only instruments supported by AiroSound are the air-guitar and the AccousticTowel(tm). However, AiroSound is not limited to those two instruments. As an example, one could also imagine an air-piano, an air-drum, or some completely new and unconventional musical instrument that would even be impossible to build with real matter. One could change the instrument one is playing by showing a simple gesture to the cam and select a new one in a similar manner.
Another simple extension of the project would be support for even more musicians at once, as the software features a very selective colour recognition that would be capable of distinguishing even more than the six gloves it handles now.

References

  • Video for Linux
  • The LTI-Lib
  • 'Pfinder: Real-Time Tracking of the Human Body' by Wren, Azarbayejani, Darrell and Pentland, IEEE PAMI, July 1997, vol. 19, no. 7, pp. 780-785
  • ARTS

Download