Group 3
EyeTable
Output
Project Files and instructions
Project Report and Presentation
WiimoteTest.exe for Wii calibration (Note: this copy doesn't work; download it from http://www.codeplex.com/WiimoteLib instead)
Members
- Ilya Brin
- Daniel Eisenberg
- Kevin Li
Documents
Elements
- 1 Laptop
- 1 Bluetooth USB
- 2 WiiMote
- 2 Bluetooth Headset
- 2 Regular headset
- 4 LEDs and Batteries
To Do
- 10/2
  - Complete this wiki page with your project design details
  - Research and link to a suitable IR LED to purchase
  - Test Wii API (You can borrow one WiiMote from Mehrbod)
- 10/9
  - Using a single computer and headset, test software that captures speaking rate, F0, etc. in real time (See Elements page, discuss with Alan and/or Group 1)
  - Write methods to extract basic head motions from a stream of periodic LED positions
Demo Details
Two demonstrators will have a chat at the EyeTable and a projected screen will analyze how much each person is enjoying the conversation. The following will be displayed:
- Talkative meter for each person
- Excitedness meter for each person
- Awkward meter, measuring the overall awkwardness (or lack thereof) of the conversation
At the conclusion of the conversation, the screen will suggest services to the participants based on how the conversation went. We will do this for a boring conversation and a lively conversation.
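As a rough illustration of how the talkative and awkward meters above could be computed, the following Python sketch works on per-frame volume readings for the two speakers. The `Frame` type, the `conversation_meters` function, the 0..1 volume scale, and the `speech_threshold` value are all assumptions made for this example and are not taken from the project's software. Awkwardness is approximated here simply as the fraction of frames in which nobody is speaking.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One analysis frame: volume per speaker (hypothetical 0..1 scale)."""
    volume_a: float
    volume_b: float


def conversation_meters(frames, speech_threshold=0.1):
    """Return talkativeness per speaker and overall awkwardness, each in 0..1.

    A speaker counts as talking in a frame when their volume exceeds
    speech_threshold (an assumed value that would need tuning).
    """
    if not frames:
        return {"talkative_a": 0.0, "talkative_b": 0.0, "awkward": 0.0}
    n = len(frames)
    a_talking = sum(f.volume_a > speech_threshold for f in frames)
    b_talking = sum(f.volume_b > speech_threshold for f in frames)
    # Frames in which neither person speaks count toward awkwardness.
    silent = sum(
        f.volume_a <= speech_threshold and f.volume_b <= speech_threshold
        for f in frames
    )
    return {
        "talkative_a": a_talking / n,
        "talkative_b": b_talking / n,
        "awkward": silent / n,
    }
```

A real awkwardness measure would probably also weigh the length of individual pauses, since one long silence feels different from many short ones.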
Completed elements:
- Software
- On-the-fly voice analysis
- Extracts F0, volume, excitedness
- Wii-based LED tracking
- Software tested and framework built
- Hardware
- Audio hardware all set up
- Wiimotes successfully synchronized with software
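For readers unfamiliar with the voice features listed above, here is a minimal Python sketch of one common way to estimate F0 and volume from a single audio frame: autocorrelation for pitch and root-mean-square for volume. This is not necessarily the method the group's software uses. The function names, the 75-400 Hz search range, and the frame format (a list of float samples) are assumptions for illustration.

```python
import math


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Returns None when the frame is too short or contains no signal.
    """
    max_lag = int(sample_rate / f_min)
    min_lag = max(1, int(sample_rate / f_max))
    if len(samples) <= max_lag:
        return None
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove DC offset
    if sum(v * v for v in x) == 0.0:
        return None
    best_lag, best_score = None, 0.0
    # The lag with the strongest self-similarity is taken as the pitch period.
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    return sample_rate / best_lag


def rms_volume(samples):
    """Root-mean-square amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

An "excitedness" score could then be derived from how much F0 and volume vary over time, but the exact formula is left open here because the source does not specify one.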
To do:
- Software
- Better voice excitedness algorithms
- Fine-tune head nodding detection based on actual data from head movements
- Improve appearance for presentation
- Hardware
- Attach IR LEDs to headsets
- Acquire stereo-to-mono splitter for the two microphones
- Mount Wiimotes in position
Introduction
Modern technology is reaching an important milestone: we are gaining the ability to integrate intelligent applications into nearly every aspect of our lives. However, as humans use more technology, the technology ought to understand more about humans. With this project, we seek to answer the question: can computers understand the moods and desires of their users without actually understanding the content of their speech?
The EyeTable (name subject to change) will be an intelligent restaurant table that can sense whether customers are having a positive or negative dining experience. We plan to use novel applications of currently available technology to analyze body language and tone of voice. Additionally, the algorithms and heuristics that we develop will remain applicable to more advanced, less intrusive technologies that may emerge in the future.
Overview & Deliverables
Basic functionality will be built around the “date” scenario – a man and a woman sitting across from one another at a restaurant table. We plan to implement the following capabilities:
First, the table will sense how well the date is going based on head movements and tone of voice. Each participant will wear a special Bluetooth headset that allows the table to track the orientation of the wearer's head and process the sound of their speech. This will allow the system to keep track of eye contact, positive and negative gestures, frequency of conversation, etc. Based on this data, the table will determine whether the date is going well or poorly.
Second, the table will respond to specific cues from the diners and alert the waiter appropriately. When both diners begin to move their heads to the left and right, as if looking for something or somebody, the waiter will be summoned for assistance via a screen in the “kitchen” (i.e. behind the scenes). If one of the diners tilts his or her head backwards significantly, as if to drink the last drops of a beverage, the waiter will be instructed to bring a refill. When both diners cease to tilt their heads downward for a long period of time and are no longer talking, the table will assume that the meal is finished and they are ready to receive their bill. Additional gestures can be detected if time and capability permit.
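The three cues described above amount to a small set of rules over each diner's recent head pose and speech activity. The Python sketch below shows one way such rules might be written. The `DinerState` fields, the `waiter_alerts` function, the alert names, and every threshold are hypothetical and would have to be tuned against recorded data.

```python
from dataclasses import dataclass


@dataclass
class DinerState:
    """Summary of one diner over a recent time window (all fields assumed)."""
    yaw_sweep_deg: float            # left-right range of head yaw in the window
    max_pitch_back_deg: float       # largest backward head tilt in the window
    seconds_since_look_down: float  # time since the head last tilted downward
    seconds_since_speech: float     # time since this diner last spoke


def waiter_alerts(a, b, sweep_deg=60.0, tilt_back_deg=35.0, idle_seconds=120.0):
    """Return the list of alerts to show on the kitchen screen."""
    alerts = []
    # Both diners looking around: summon the waiter for assistance.
    if a.yaw_sweep_deg >= sweep_deg and b.yaw_sweep_deg >= sweep_deg:
        alerts.append("assist")
    # Either diner tilting far back, as if finishing a drink: bring a refill.
    if max(a.max_pitch_back_deg, b.max_pitch_back_deg) >= tilt_back_deg:
        alerts.append("refill")
    # Neither diner looking down at a plate or talking for a while: bring the bill.
    if all(
        d.seconds_since_look_down >= idle_seconds
        and d.seconds_since_speech >= idle_seconds
        for d in (a, b)
    ):
        alerts.append("bill")
    return alerts
```

Keeping the rules in one function makes it straightforward to add further gestures later, as the paragraph above anticipates.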
Third, the table will suggest additional services to the diners during and after their date. Armed with the knowledge of how well the date is going, the table will present products and activities that best suit the customers via a visual interface. If the date is going well, the table can suggest that the waiter try to sell them an additional bottle of wine. Afterwards, it can suggest fun or romantic after-dinner activities to the diners. On the other hand, victims of a disastrous date can be provided with phone numbers for cab companies.
A secondary goal is to implement the above system with more than two people sitting around a circular table, which presents additional challenges in tracking eye contact and differentiating voices. One idea is to have a table of people with Bluetooth headsets on, and when one user looks at somebody and speaks, his/her voice is amplified in only that recipient’s headset.
Future Uses & Commercial Applications
Using body language and voice tone to infer mood has an important advantage in commercial applications: it does not invade the privacy of the users’ conversation. This could be useful in many hospitality industries where clients may not want a machine listening in to what they are saying. Obviously, customers at a restaurant would not be willing to wear a special apparatus that monitors their body movements. However, we anticipate that video processing technology will eventually enable computers to determine eye contact and body movement completely remotely. In this way, the algorithms themselves will be easily scalable: although the hardware and software interfaces may become more advanced, the associations between body language, tone, and emotion in humans are likely to remain the same.
Challenges
This project involves a number of technical challenges and open research questions:
- How can we track head movement and eye contact between two (or more) people in a semi-uncontrolled environment?
- Each participant will wear a headset with two infrared LEDs attached (see “Elements”). Two Wiimotes will be mounted in the center of the table, each pointing upwards towards one of the diners. A computer will use data from the Wii API to determine head orientation.
- How can we use voice tone to accurately determine whether a date is going well?
- Software exists that can extract tone and pitch from speech. We plan to use this data to determine how “excited” the speaker is. We hypothesize that more frequent, excited conversation correlates with better dates.
- What head movement patterns indicate positive and negative emotions?
- This will require some research, but intuition suggests that frequent nodding is more positive than static or jerky head movement.
- How can we determine whether somebody is finished eating a meal based on head movements?
- Once again, this will involve some experimentation. We intend to record the head-movement data of a few people eating and determine some common characteristics. Other cues can also be used, such as a lack of conversation and the diners “looking around” for their waiter to bring the check.
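To make the head-tracking and nodding ideas above more concrete, the Python sketch below derives two simple quantities from the image positions of a headset's two IR LEDs: the sideways tilt (roll) of the head, and a nod count based on vertical swings of the LED midpoint. It assumes each LED arrives as an (x, y) pixel pair, as a Wiimote's IR camera would report within its roughly 1024 by 768 coordinate range. The function names and the `min_swing` threshold are hypothetical, and reading the data through the Wii API is not shown.

```python
import math


def head_roll_deg(led_left, led_right):
    """Angle of the line joining the two LED image points, in degrees."""
    dx = led_right[0] - led_left[0]
    dy = led_right[1] - led_left[1]
    return math.degrees(math.atan2(dy, dx))


def midpoint(led_left, led_right):
    """Centre of the two LED image points."""
    return ((led_left[0] + led_right[0]) / 2, (led_left[1] + led_right[1]) / 2)


def count_nods(midpoint_ys, min_swing=20.0):
    """Count nods as pairs of opposite vertical swings of the LED midpoint.

    A swing is a move of at least min_swing pixels away from the most recent
    extreme; smaller jitter is ignored. Two consecutive swings (one each way)
    make one nod.
    """
    if not midpoint_ys:
        return 0
    swings = 0
    extreme = midpoint_ys[0]
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    for y in midpoint_ys[1:]:
        if direction >= 0 and y <= extreme - min_swing:
            swings += 1
            direction, extreme = -1, y
        elif direction <= 0 and y >= extreme + min_swing:
            swings += 1
            direction, extreme = 1, y
        elif direction == 1:
            extreme = max(extreme, y)
        elif direction == -1:
            extreme = min(extreme, y)
    return swings // 2
```

For example, a midpoint track of `[100, 130, 100]` with the default threshold counts as one nod, while small jitter such as `[100, 105, 98, 102]` counts as none. Whether frequent nodding really signals a positive conversation is the open question the list above raises; this sketch only covers the measurement side.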
