- Chapter 1. Speech: Articulatory, Linguistic, Acoustic, and Perceptual Descriptions
- 1. Introduction
- 2. The Encoded Nature of Speech
- 3. The Production of Speech
- 3.1 The Vocal Folds: Voiced and Voiceless Sounds
- 3.2 The Supraglottal Vocal Tract
- 3.3 American English Consonants
- 3.3.1 Place of Articulation
- 3.3.2 Manner of Articulation
- 3.4 American English Vowels
- 4. Acoustic Theory of Speech Production
- 4.1 The Source-Filter Theory
- 4.2 The Excitation Source
- 4.3 The Filter: A Formant Representation
- 4.4 Other Representations of the Acoustic Filter
- 4.5 Independence of Source and Filter
- 5. Linguistic Units of Speech
- 5.1 Phonemes
- 5.2 Distinctive Features
- 5.3 Morphemes
- 5.4 Syllabic Structure
- 5.5 Prosodic Structure
- 6. Acoustic Characteristics of Speech
- 6.1 Frequency Analysis
- 6.2 Formants
- 6.3 Consonants
- 6.4 Vowels
- 7. Auditory Perception
- 7.1 Human Audition: Perceptual Measures
- 7.2 Loudness Estimation
- 7.3 Speech in Noise
- 7.4 Pitch Perception
- 8. Speech Perception
- 8.1 Differences between Speakers
- 8.2 Continuous Speech
- 9. Summary
- References
- Chapter 2. Speech Technology
- 1. Speech Analysis
- 1.1 Short-Time Speech Analysis
- 1.2 Time-Domain Analysis of Speech Signals
- 1.2.1 Short-Time Average Zero-Crossing Rate
- 1.2.2 Short-Time Autocorrelation Function
- 1.3 Frequency-Domain Parameters
- 1.3.1 Short-Time Fourier Analysis
- 1.3.2 Formant Estimation and Tracking
- 1.4 F0 (Pitch) Estimation
- 1.5 Reduction of Information
- 2. Digital Coding of Speech Signals
- 2.1 Quantization
- 2.2 Speech Redundancies
- 2.3 Time-Adaptive Waveform Coding
- 2.4 Exploiting Properties of the Spectral Envelope
- 2.4.1 Differential Pulse Code Modulation (DPCM)
- 2.4.2 Linear Prediction
- 2.4.3 Delta Modulation (DM)
- 2.4.4 Adaptive Differential Pulse-Code Modulation (ADPCM)
- 2.5 Exploiting the Periodicity of Voiced Speech
- 2.6 Exploiting Auditory Limitations
- 2.7 Spectral (Frequency-Domain) Coders
- 2.8 Vocoders
- 2.8.1 Linear Predictive (LPC) Vocoders
- 2.8.2 Least-Squares LPC Analysis
- 2.8.3 Spectral Estimation via LPC
- 2.8.4 LPC Synthesis
- 2.8.5 Window Considerations
- 2.8.6 Excitation for Vocoders
- 2.8.7 Multipulse-Excited LPC (MLPC)
- 2.9 Vector Quantization (VQ)
- 2.10 Network Considerations
- 3. Speech Synthesis
- 3.1 Concatenation of Speech Units
- 3.2 Unrestricted Text (Text-to-Speech) Systems
- 3.3 Formant Synthesis
- 3.4 Linear Predictive Coding (LPC) Synthesis
- 3.5 Intonation
- 4. Speech Recognition
- 4.1 Performance Evaluation
- 4.2 Basic Pattern Recognition Approach
- 4.3 Parametric Representation
- 4.4 Distance Measures and Decisions
- 4.5 Timing Considerations
- 4.6 Segmentation
- 4.7 Dynamic Time Warping (DTW)
- 4.8 Applying Vector Quantization to ASR
- 4.9 Networks for Speech Recognition
- 4.10 Using Prosodics to Aid Recognition
- 5. References
- Chapter 3. Text-to-Speech Systems
- 1. Introduction
- 1.1 Applications of TTS
- 1.2 Stages of Text-to-Speech Conversion
- 2. Overview of TTS Technology and Behavioral Science Contributions to TTS Technology
- 2.1 Text Normalization
- 2.2 Syntactic Parsing
- 2.3 Word Pronunciation
- 2.3.1 Letter-to-Sound Rules
- 2.3.2 Dictionaries
- 2.3.3 Subjective Evaluation of Pronunciation Accuracy
- 2.4 Determination of Prosody
- 2.4.1 Intonation
- 2.4.2 Segmental Durations
- 2.4.3 Intensity
- 2.5 Speech Synthesis
- 3. TTS Evaluations
- 4. Human Factors Issues in Tailoring TTS Technology for Specific Applications
- 4.1 Wording and Tuning TTS Announcements
- 5. Summary and Conclusions
- 6. References
- Chapter 4. Using Speech Recognition Systems: Issues in Cognitive Engineering
- 1. Abstract
- 2. Acknowledgments
- 3. Introduction
- 4. The Recognition System
- 4.1 Characteristics of Speech Recognition Systems
- 4.2 Training a Recognition System
- 4.3 Speech Input
- 4.4 Contributions from Speech Perception Research
- 4.5 The User Interface
- 4.6 Performance Assessment
- 4.7 The Next Generation of Recognition Systems
- 5. The Human Operator
- 6. The Recognition Environment and Context
- 7. Conclusions and Implications
- 8. References
- Chapter 5. Intelligibility and Acceptability Testing for Speech Technology
- 1. Abstract
- 2. Introduction
- 3. Overview of the Testing Process
- 3.1 The Speaker
- 3.2 The Speaker Environment
- 3.3 The Voice Processor
- 3.4 The Transmission Channel
- 3.5 The Listener
- 3.6 The Listening Environment
- 4. Factors that Influence Speech Intelligibility and Acceptability
- 4.1 Segmental Information
- 4.2 Suprasegmental Information
- 4.3 Nonspeech Sounds
- 4.4 Contextual Information
- 4.5 Speaker Recognition
- 5. Overview of Speech Evaluation Techniques
- 5.1 Intelligibility Test Methods
- 5.1.1 Standard Tests
- 5.1.1.1 PB Words
- 5.1.1.2 Diagnostic Rhyme Test
- 5.1.1.3 Modified Rhyme Test
- 5.1.1.4 Other Rhyme Tests
- 5.1.1.5 Comparison of Tests
- 5.1.2 Sentence Tests: Harvard and Haskins Sentences
- 5.1.3 Other Speech Materials
- 5.2 Acceptability Test Methods
- 5.2.1 Diagnostic Acceptability Measure (DAM)
- 5.2.2 Mean Opinion Score (MOS)
- 5.2.3 Phoneme Specific Sentences
- 5.3 Communicability Tests
- 5.3.1 Free Conversation Test
- 5.3.2 Diagnostic Communicability Test
- 5.3.3 NRL Communicability Test
- 5.4 Physical Measures of the Speech Signal
- 5.4.1 Articulation Index (AI)
- 5.4.2 Speech Transmission Index (STI)
- 5.4.3 Combined Measures
- 6. Relations Among Different Tests
- 7. Selecting Test Methods
- 7.1 Reasons for Testing
- 7.2 Type of Voice Application and Comparisons to be Made
- 8. General Recommendations
- 9. Acknowledgments
- 10. References
- Chapter 6. Perception and Comprehension of Speech
- 1. Introduction
- 1.1 Voice Output Devices
- 1.2 Behavioral Evaluation
- 1.3 Intelligibility
- 1.4 Comprehension
- 1.5 Attention and Processing Resources
- 1.5.1 Limited Capacity
- 1.5.2 Attention and Synthetic Speech
- 1.5.3 Attention and Comprehension
- 1.5.4 Summary
- 2. Measures of Comprehension
- 2.1 Successive Measures
- 2.1.1 Recall
- 2.1.2 Recognition
- 2.1.3 Sentence Verification
- 2.2 Simultaneous Measures
- 2.2.1 Monitoring Tasks
- 2.2.2 Reading Times
- 3. Comprehension of Synthetic Speech
- 3.1 Successive Measures
- 3.1.1 Sentence Verification
- 3.1.2 Comprehension of Fluent Connected Speech
- 3.2 Simultaneous Measures
- 4. Summary and Conclusions
- 4.1 Efficiency of Comprehension
- 4.2 Capacity Demands
- 4.3 Training Effects and Perceptual Learning
- 4.4 Applications
- 4.5 Future Directions
- 4.5.1 Capacity Demands
- 4.5.2 Training Effects
- 4.5.3 Memory Decay
- 4.5.4 Generalization
- 4.5.5 Miscellaneous Issues
- 5. References
- Chapter 7. Human Factors in Lifecycle Development
- 1. Introduction and Motivation
- 2. A Normative Model of Life Cycle Development
- 3. Problem Finding
- 4. Problem Formulation
- 5. Invention
- 6. Design
- 6.1 Product Definition
- 6.2 Initial Design
- 6.3 Usability Criteria
- 7. Prototype
- 8. Evaluation
- 9. Redesign
- 10. Testing
- 11. Deployment
- 12. Observation in the Field
- 13. Post-Mortem
- 14. Special Problems With Voice
- 14.1 Testing for Technologies That Do Not Exist
- Chapter 8. Technical Issues Underlying the Development and Use of a Speech Research Laboratory
- 1. Introduction
- 2. Requirements
- 2.1 Input Source
- 2.2 Basic Principles of Data Acquisition
- 2.2.1 Resolution
- 2.2.2 Sampling Rate
- 2.2.3 Triggering Options
- 2.2.4 Number of Channels
- 2.2.5 Input Voltage Range
- 2.2.6 Direct Memory Access (DMA)
- 2.3 Filters and Low-Pass Filtering
- 2.3.1 Programmable Cut-Off Frequency
- 2.3.2 Roll-Off Frequencies
- 2.3.3 Input Voltage Range
- 2.3.4 Active or Passive Filters
- 2.4 Computer Systems
- 2.5 Speech Software
- 2.5.1 Ease of Recording
- 2.5.2 Options in Displaying the Digitized Speech Signal
- 2.5.3 Types of Analyses of the Speech Signal
- 2.5.4 Digital Filtering
- 2.5.5 File Manipulation
- 2.6 Speech Output and Signal Generation (Speech Synthesis)
- 3. Areas of Speech Research and Its Applications
- 4. A Description of a Functional Speech Lab
- 5. References
- Chapter 9. Speech Recognition
- 1. Introduction
- 1.1 Special Considerations of Natural Speech for Speech Recognizers
- 2. Subdividing the Speech Recognition Problem
- 3. Overview of Recognition Algorithms
- 3.1 Endpoint Detection
- 3.2 Speech Coding
- 3.3 Recognition Algorithms
- 3.4 Second Level Algorithms
- 4. Defining Your Speech Recognition Needs
- 4.1 Application Environment
- 4.2 Vocabulary
- 4.3 Speaker Dependence and Training Methods
- 5. Future Speech Recognition Systems
- 6. References
- 7.1 Annotated References
- Chapter 10. Psychological and Human Factors Issues in the Design of Speech Recognition Systems
- 1. Introduction
- 2. Current Applications
- 3. The Physical and Acoustical Environment
- 3.1 The Public Switched Telephone Network (PSTN)
- 3.2 The Acoustic Environment and Human Conversational Behavior
- 4. Accuracy/System Evaluation
- 4.1 Measures of Recognition Performance
- 4.2 Accuracy of Strings Versus Isolated Digits or Words
- 4.3 Recognizer Performance in Lab Versus Field
- 4.4 The Need for Multiple Criteria
- 5. Typical Problems Encountered in Current Systems
- 6. User-Interface Design
- 6.1 Dialogue Control
- 6.1.1 Feedback and Timing
- 6.1.2 Error Recovery
- 6.1.3 Exiting the System
- 6.2 Convergence
- 6.3 Combining Input: Voice and Tone Dialing
- 6.4 Prompt Messages
- 7. Developing Automation with ASR: A Five-Stage Approach
- 7.1 Stage 1
- 7.2 Stage 2
- 7.3 Stage 3
- 7.4 Stage 4
- 7.5 Stage 5
- 8. Future Systems and Their Requirements
- 8.1 Conversational Computer Systems
- 8.2 Vocabulary Requirements
- 9. Conclusions
- 10. Acknowledgments
- 11. References
- Chapter 11. Voiced Mail: Speech Synthesis of Electronic Mail
- 1. Applications of Synthetic Speech
- 2. Limitations of Synthetic Speech Output
- 3. Voiced Mail Operation
- 3.1 Text Filtering
- 3.1.1 The From
- 3.1.2 The Date
- 3.2 Message Ordering
- 3.3 Intelligibility Issues
- 3.4 Advantages of Multiple Media
- 3.5 Flow of Presentation in a System Without Menus
- 3.6 Simple Command Set
- 4. Voiced Mail Today
- 5. References
- Chapter 12. The Design of Spoken Language Interfaces
- 1. Introduction
- 2. Task Analysis
- 3. Language Design
- 3.1 The Wizard Paradigm
- 3.2 Protocol Transcription
- 3.3 Language Analysis and Design
- 3.4 Parser Design
- 4. The Recognition System
- 5. Interface Design
- 5.1 Interaction Structure
- 6. System Evaluation
- 6.1 Task Completion Time
- 6.2 Speech Recognizer Performance
- 7. Summary
- 8. Acknowledgments
- 9. References
- Chapter 13. Synthetic Spoken Driving Instructions by Telephone and While Driving: Implications for Interfaces
- 1. Direction Assistance
- 1.1 The Initial Message
- 1.2 Six Different Protocols for Data Entry
- 1.3 Correcting Errors
- 1.4 How Direction Assistance Determines Locations
- 1.5 The Telephone Is a Mixed Blessing for Direction Assistance
- 1.6 The Describer Produces a Text Description of the Route
- 1.7 The Narrator Reads the Route Description
- 2. The Back Seat Driver
- 2.1 The Back Seat Driver Maintains the Driver's Sense of Co-presence
- 3. Conclusions
- 4. Acknowledgments
- 5. References
- Chapter 14. Human Factors Contributions to the Development of a Speech Recognition Cellular Telephone
- Chapter 15. Voice Quality Assessment of Digital Network Technologies
- 1. Introduction
- 2. Voice Quality Assessments of Echo Cancellation Devices
- 3. Performance of Digital Circuit Multiplication Systems
- 3.1 Voice Quality Associated with Low Bit-Rate Voice Compression
- 3.2 Voice Quality Associated with Digital Speech Interpolation
- 3.3 Voice Quality Associated with Hybrid Systems
- 4. Concluding Remarks
- 5. Acknowledgments
- 6. References
- Chapter 16. Behavioral Aspects of Speech Technology: Industrial Systems
- 1. Introduction
- 2. General Design Considerations
- 3. Determining a Role for Speech Input
- 3.1 Task Analysis
- 3.2 Task Loading
- 3.3 User Population
- 3.4 The Environment
- 3.5 Technology Capabilities
- 4. Optimizing System Design
- 4.1 Transaction Time
- 4.2 Dialogue Design
- 4.2.1 Dialogue Acts
- 4.2.2 Feedback
- 4.2.3 Error Correction
- 5. Case Studies
- 5.1 Jaguar Cars
- 5.2 Caterpillar
- 6. Conclusions
- 7. References
- Chapter 17. Toys That Talk: Two Case Studies
- 1. Introduction
- 2. Speak & Spell
- 2.1 Program Beginnings
- 2.2 Alternatives
- 2.3 Processing Architecture
- 2.4 Program Support
- 2.5 Market Assessments
- 2.6 Product Design
- 2.7 Production Start-Up
- 2.8 Conclusions
- 3. Julie Doll
- 3.1 Technology
- 3.2 Product Development
- 3.3 Product Integration
- 4. Conclusions
- 5. References
- Chapter 18. HADES: A Case Study of the Development of a Signal Analysis System
- 1. Introduction
- 2. Background
- 3. Early Developments: WENDY and SPA
- 4. HUI: The Haskins User Interface Shell
- 5. A New Approach to System Development
- 6. Initial Steps to a Modern Design: SPEED
- 7. The HADES Prototype: An Advanced Design
- 8. HADES 0.8: The Beta Version
- 9. Moving to Release
- 10. Conclusion
- 11. References
- 12. Acknowledgments
- Chapter 19. The Perceptual and Acoustic Assessment of the Speech of Hearing-Impaired Talkers
- 1. Abstract
- 2. Introduction
- 3. Methods
- 3.1 Subjects
- 3.2 Material
- 3.3 Recording Conditions
- 3.4 Digitization and Phonetic Judgment
- 3.5 Acoustic Analysis
- 4. Results
- 4.1 Intelligibility, Perceptual Consensus, and Phonetic Distortion
- 4.2 Acoustic Properties and Perceptual Judgments
- 4.2.1 Vowel Fronting
- 4.2.2 Stridency
- 4.2.3 Nasal Manner
- 4.2.4 Fundamental Frequency
- 4.2.5 Breathiness and Nasality
- 5. Discussion
- 6. Conclusion
- 7. Acknowledgements
- 8. References
- Chapter 20. The Development of Text-to-Speech Technology for Use in Communication Aids
- 1. Introduction
- 2. The Multi-Language Text-To-Speech System
- 3. Special Features for Applications
- 3.1 Reading Mode
- 3.2 Speech Tempo and Loudness
- 3.3 Variation in Voices
- 3.4 A New Voice Source
- 3.5 Modeling Style
- 3.6 Controlling Emphasis
- 3.7 Saving Sentences
- 3.8 User-Defined Pronunciations
- 3.9 Indexing
- 3.10 Changing Language
- 4. Text-to-Speech in Voice Prosthesis
- 5. Multi-Talk
- 6. Blisstalk
- 7. Word Prediction
- 8. Talking Terminals
- 9. Daily Newspapers
- 10. Public Applications
- 11. Speech Recognition
- 12. Conclusions
- 13. References
- Chapter 21. Computer Assisted Speech Training: Practical Considerations
- 1. Introduction
- 2. Computer-Based Speech Training (CBST) Systems
- 3. Design Considerations for Speech Feedback
- 4. Speech Training Curriculum
- 5. Role of Clinical Evaluation
- 5.1 Acceptability Evaluations
- 5.2 Clinical Effectiveness
- 5.3 Independent Verification of Clinical Effectiveness
- 5.4 Beta-Test Evaluations
- 6. Summary
- 7. References
- 8. Acknowledgments
- Chapter 22. What Computerized Speech Can Add to Remedial Reading
- 1. Introduction
- 2. What Is the Deficit Underlying Reading Disabilities?
- 3. What Digitized Speech Allows
- 3.1 Short-term Studies with DECtalk Speech Synthesizer
- 3.2 Long-term Studies with ROSS
- 3.3 Taking Advantage of DECtalk's Flexibility
- 3.4 Limitations of DECtalk
- 4. References
- Chapter 23. Design of a Hearing Screening Test Using Synthetic Speech
- 1. Introduction
- 2. Word Sets
- 3. Synthesis of Speech Material
- 4. Screening Protocol
- 5. Summary
- 6. Acknowledgments
- 7. References
- Chapter 24. Speech Technology in Interactive Instruction
- 1. Introduction
- 2. Knowledge versus Skill
- 3. Spoken Instruction
- 3.1 Intelligibility
- 3.2 Naturalness
- 3.3 Identical Repetition
- 4. Instruction, Presentation and Feedback
- 5. Speech Storage and Production
- 5.1 Speed of Response
- 5.2 Controllable Intelligibility
- 5.3 Pre-established Temporal Structure
- 5.4 Prosodic Continuity
- 5.5 Limited On-line Memory
- 5.6 Real-time Operation
- 5.7 Reliability
- 5.8 Upgrades and Modification
- 6. References
- Index