Human Computer Interaction
Adaptive Spoken Dialog with Human and Computer Partners
PI: Susan Brennan (Psychology); Co-PIs: Richard Gerrig (Psychology), Marie Huffman (Linguistics), Arthur Samuel (Psychology), Amanda Stent (Computer Science)
This 4-year project takes an innovative and multidisciplinary approach to characterizing how speech production and interpretation are coordinated in dialogs. It examines how speakers adapt to both human and computer conversational partners, and will produce prototype systems that flexibly adapt to a human user over the course of a dialog. Adaptations include those that make processing easier for both partners (and that may be fairly automatic), such as converging on the same wording or dialect; adaptations also include adjustments made by one partner explicitly "for" the other. Findings from human dialog will be applied to systems that use speech recognition and generation, with the goals of (1) adapting the system's vocabulary, dialect, and perspective to the user's needs whenever feasible (responsive generation), and (2) shaping users to spontaneously adapt their utterances to forms that the system can process more robustly (directive generation). The project brings together methods and theoretical perspectives from computer science, linguistics and psychology to advance theories and improve applications. Our methods include controlled experiments; data collection in the lab, in the field, and on the Web; corpus analysis; simulation studies; and prototyping and evaluation of spoken dialog systems. Three applications are planned: a picture matching game, a PDA-based calendar system, and a telephone-based course evaluation system for Stony Brook's undergraduate community. (NSF)
Content-Driven Techniques for Non-Visual Web Access
PI: Y. Annie Liu
The World Wide Web has evolved into an indispensable medium for dissemination of information, entertainment, commerce and education. However, the graphical nature of most browsing software, as well as the diversity and complexity of web content, has limited access to this technology for an entire community of persons with visual disabilities. Existing audio browsers that are based on text-to-speech conversion (e.g., screen readers) are not capable of describing the conceptual organization of a document's content or of letting a user select parts of a document to listen to. As a result, persons with visual disabilities can find it difficult to understand the organization of documents (for example, to distinguish topics or correlate similar items), and they waste considerable time and attention listening to irrelevant information. This project is developing HearSay, a system that will bring the browsing experience of persons with visual disabilities closer to that of sighted people. HearSay will be based on automated techniques for structuring the content of web documents into labeled partitions consisting of logically related items. By enabling interactive speech-driven guided exploration, in which the system presents the document's labeled content and the user selects which parts of the content to listen to and when to navigate to a new page, HearSay will make non-visual browsing far less cumbersome. Furthermore, for repetitive browsing tasks, HearSay will let users create and retrieve personalized content in different ways, ranging from content-based voicemarking of selected partitions in a page to powerful personal information assistants that gather and present user-defined information at the user's command. The ability to browse the Web using alternative modalities, which HearSay will facilitate, will offer significant benefits not only to users with disabilities but also to mobile users of hand-held devices. (NSF)

