Group 1

CiBorg (Computationally Intelligent Business Organization System)

Output

Project Report and Presentation

Project programs: Server (Readme) and Client

Members

  • Justin Hurley
  • Brian Martin
  • Gregory Methvin

Documents

Project Design

Elements

  • 1 Laptop
  • 4 Wired Headsets
  • 4 Bluetooth Headsets
  • 1 USB Soundcard

To Do

  • Using a single computer and headset, test software that captures speaking rate, F0, etc. in real time (see Elements page; discuss with Alan and/or Group 3)
  • Test whether two simultaneous audio inputs work over Bluetooth (the TA can provide headsets; most laptops have Bluetooth)
  • Research USB soundcards, check their specs for concurrent-input capability, and provide links

Introduction

CiBorg (Computationally Intelligent Business Organization System) aims to improve communication in a meeting environment. By recording and monitoring what each person says, CiBorg can make suggestions to improve group communication. CiBorg consists of three main components: receiving, processing, and responding to audio input.

Receiving

Receiving audio information from multiple speakers poses many problems, in particular separating the sources. Common business setups use a central microphone such as a PZM (Pressure Zone Microphone) to record multiple speakers. By using offset speakers, the noise from individual speakers can be cancelled to partially isolate a specific source. However, separating audio sources is complicated and resource intensive. Instead of separating several speakers from a single input source, we will use a separate input for each participant. In the business environment, Bluetooth headsets have become commonplace. By using each participant’s individual Bluetooth headset, we will be able to isolate each speaker. Each Bluetooth headset will be connected to one central computer where all processing will take place. So that we can focus on the intelligent aspects of the system rather than the hardware, we will limit the input to four headsets.

Processing

The majority of the project will focus on processing the audio inputs. The computer will record each input as a separate audio track. From each input we can determine the following data:

  • Time (when the person is speaking)
  • Volume (changes in relative volume and overall volume)
  • Pitch
  • Waveform
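The per-track measurements listed above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes each track arrives as frames of float samples normalized to [-1, 1], and the function names, the 0.02 speech threshold, and the 75-400 Hz pitch search range are all hypothetical choices. Pitch is estimated here with a plain autocorrelation peak search, which is only one of several possible techniques.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one frame (a simple proxy for volume)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def is_speaking(frame, threshold=0.02):
    """Crude voice-activity test: treat the frame as speech if its RMS
    exceeds a threshold (the threshold value is an assumption)."""
    return rms(frame) > threshold

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz from the strongest autocorrelation peak whose lag
    corresponds to a frequency between fmin and fmax. Returns None when no
    positive peak is found (for example, on silence)."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Example: a 50 ms frame of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
silence = [0.0] * 400
tone_pitch = estimate_pitch(tone, sample_rate)
```

Recording which frames pass `is_speaking`, together with their timestamps, gives the "time" data; the waveform is simply the stored samples.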

Using these inputs, we will investigate determining the following characteristics of speech:

  • Rate (number and duration of breaks in speech) (Kuny)
  • Enunciation (smoothness of the waveform)
  • Tone (intonation from pitch) (Kuny) (Intonation (linguistics))
  • Interruptions (using time to determine when two or more people are speaking simultaneously)
  • Agreement/Disagreement (recognizing brief sounds as response instead of interruptions)
  • Group Dynamics (when people speak, how much each person speaks, etc.)

For the above, we need to investigate what characterizes each. The ideas in parentheses will need to be further explored.
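As a starting point for that investigation, the sketch below shows one way the time data alone could separate interruptions from brief agreement sounds and summarize group dynamics. It assumes speech has already been segmented into (speaker, start, end) turns in seconds; the function names and the one-second cutoff for a "brief" response are hypothetical and would need tuning against real meetings.

```python
def classify_overlaps(turns, backchannel_max=1.0):
    """Split overlapping speech into interruptions and brief responses.

    turns: list of (speaker, start, end) in seconds.
    A turn that starts while another speaker is still talking counts as a
    brief response if it lasts at most backchannel_max seconds (an assumed
    cutoff), otherwise as an interruption.
    Returns two lists of (who_spoke, who_was_speaking, start_time).
    """
    interruptions, backchannels = [], []
    ordered = sorted(turns, key=lambda t: t[1])
    for i, (spk_a, start_a, end_a) in enumerate(ordered):
        for spk_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # turns are sorted by start, so nothing later overlaps
            if spk_b == spk_a:
                continue
            event = (spk_b, spk_a, start_b)
            if end_b - start_b <= backchannel_max:
                backchannels.append(event)
            else:
                interruptions.append(event)
    return interruptions, backchannels

def talk_share(turns):
    """Fraction of total speaking time used by each speaker."""
    totals = {}
    for speaker, start, end in turns:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    grand_total = sum(totals.values())
    return {s: t / grand_total for s, t in totals.items()} if grand_total else {}

# Example: B says something short while A talks; C cuts in before A finishes.
turns = [("A", 0.0, 10.0), ("B", 4.0, 4.5), ("C", 8.0, 14.0)]
interruptions, backchannels = classify_overlaps(turns)
share = talk_share(turns)
```

Duration alone is a rough criterion; volume and pitch from the same frames could later refine the distinction.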

Responding

Using the above information, CiBorg will intelligently make suggestions to each person through their headset. For example, if a person is repeatedly interrupting other participants, CiBorg will advise that they be more courteous. Similarly, if a person is talking fast, CiBorg will quietly suggest that they slow down. If the person slows down, CiBorg will give them positive encouragement (“well done – this is a much better rate”). Likewise, by monitoring others’ speech, CiBorg can advise the speaker: if the other participants are not engaged in the presentation, CiBorg will alert the speaker.
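The response behaviour described here could begin as a small set of rules. The following is a minimal sketch under stated assumptions: the `stats` dictionary, its `rate` and `interruptions` keys, the thresholds, and the message wording are all hypothetical placeholders rather than a settled design, and the units of `rate` are left open.

```python
def suggest(stats, previous=None, max_rate=4.0, max_interruptions=3):
    """Return the list of spoken suggestions for one participant.

    stats / previous: dicts with 'rate' (speaking rate, hypothetical units)
    and 'interruptions' (count in the current window). 'previous' holds the
    same figures for the preceding window, if there was one.
    """
    messages = []
    if stats["interruptions"] >= max_interruptions:
        messages.append("Please let others finish speaking.")
    if stats["rate"] > max_rate:
        messages.append("Try slowing down.")
    elif previous is not None and previous["rate"] > max_rate:
        # The speaker was too fast in the last window and has since slowed.
        messages.append("Well done, this is a much better rate.")
    return messages

# Example: a speaker who was too fast, then slows down but starts interrupting.
first = suggest({"rate": 5.0, "interruptions": 0})
second = suggest({"rate": 3.0, "interruptions": 4},
                 previous={"rate": 5.0, "interruptions": 0})
```

Comparing each window with the previous one is what allows positive encouragement as well as warnings.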

Challenges

  1. Multiple Inputs: Allowing any number of audio inputs would require significant hardware considerations. To avoid this, we will limit our implementation to four speakers. Likely, this will require an additional (external) sound card. Using mono input will help manage multiple inputs.
  2. Synchronization: The audio inputs need to be synchronized for the time information to be useful. To resolve this, we will need to investigate the timestamp data associated with each input and the existing methods for aligning inputs.
  3. Real-Time: For the responses to be useful, CiBorg must be able to respond in real-time. Again, an additional sound card will be necessary. We will need to investigate the processing required.
  4. Bluetooth: With Bluetooth headsets there is a potential for disconnect. Our system will need to respond appropriately. More importantly, we want to focus on the processing portion as much as possible. As such, we will use wired headsets initially that can plug directly into the sound card.
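For the synchronization challenge, one simple approach is to trim every track to the time window that all tracks cover, so that the same index refers to the same instant in each. The sketch below assumes each track comes with a start timestamp on a shared clock and that all tracks use the same sample rate; whether the audio hardware actually provides such timestamps is exactly what still needs to be investigated. The function name and data layout are hypothetical.

```python
def align_streams(streams, sample_rate):
    """Trim all tracks to their common time window.

    streams: dict mapping a track name to (start_time_seconds, samples).
    Returns (window_start_seconds, dict of trimmed sample lists). If the
    tracks share no common window, the trimmed lists are empty.
    """
    latest_start = max(start for start, _ in streams.values())
    earliest_end = min(start + len(samples) / sample_rate
                       for start, samples in streams.values())
    aligned = {}
    for name, (start, samples) in streams.items():
        first = round((latest_start - start) * sample_rate)
        last = max(first, round((earliest_end - start) * sample_rate))
        aligned[name] = samples[first:last]
    return latest_start, aligned

# Example at 10 samples per second: track "b" starts half a second late.
streams = {
    "a": (0.0, list(range(20))),
    "b": (0.5, list(range(100, 115))),
}
window_start, aligned = align_streams(streams, sample_rate=10)
```

This only corrects a fixed start offset; clock drift between devices, which Bluetooth headsets may introduce, would need a separate correction.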

Scenarios

The applications and implementation of our design are illustrated in the following two scenarios:

  • Aaron is nervous and speaking fast during his sales presentation, losing the attention and focus of the audience. By monitoring his rate, CiBorg knows that he has begun to talk faster. Recognizing this, his headset reminds him to slow down his speech. As it continues to monitor his speech, CiBorg will encourage him for being clearer and regaining the focus of the audience.
  • Bob is very strongly opinionated; his frequent interruptions make his coworkers less inclined to give input. CiBorg recognizes that the other participants are not expressing their opinions because of Bob’s interruptions and suggests that he allow others to speak. Each time Bob interrupts, a small sound can be played to alert him.

Tentative Implementation Plan

  • Thu 10/9/08: Planning completed, software layout. Hardware setup, audio inputs recognized. Research speech analysis.
  • Thu 10/16/08: Record each audio channel. Begin algorithm design.
  • Thu 10/23/08: Identify time spoken for each speaker, synchronization. Data for how much each person speaks, when they speak, and interruption information.
  • Thu 10/30/08: Midpoint Milestone - Working Demo. Group dynamics information.
  • Thu 11/13/08: Individual data recording: volume, pitch, and waveform.
  • Thu 11/20/08: Individual data analysis: rate and tone.
  • Thu 12/04/08: Project presentations
  • Thu 12/11/08: Project Reports (5-8 Pages) + Project presentations