Chapter 1—Speech: Articulatory, Linguistic, Acoustic, and Perceptual Descriptions
1. Introduction
2. The Encoded Nature of Speech
3. The Production of Speech
3.1 The Vocal Folds—Voiced and Voiceless Sounds
3.2 The Supraglottal Vocal Tract
3.3 American English Consonants
3.3.1 Place of Articulation
3.3.2 Manner of Articulation
3.4 American English Vowels
4. Acoustic Theory of Speech Production
4.1 The Source-Filter Theory
4.2 The Excitation Source
4.3 The Filter: A Formant Representation
4.4 Other Representations of the Acoustic Filter
4.5 Independence of Source and Filter
5. Linguistic Units of Speech
5.1 Phonemes
5.2 Distinctive Features
5.3 Morphemes
5.4 Syllabic Structure
5.5 Prosodic Structure
6. Acoustic Characteristics of Speech
6.1 Frequency Analysis
6.2 Formants
6.3 Consonants
6.4 Vowels
7. Auditory Perception
7.1 Human Audition—Perceptual Measures
7.2 Loudness Estimation
7.3 Speech in Noise
7.4 Pitch Perception
8. Speech Perception
8.1 Differences between Speakers
8.2 Continuous Speech
9. Summary
References

Chapter 2—Speech Technology
1. Speech Analysis
1.1 Short-Time Speech Analysis
1.2 Time-Domain Analysis of Speech Signals
1.2.1 Short-Time Average Zero-Crossing Rate
1.2.2 Short-Time Autocorrelation Function
1.3 Frequency-Domain Parameters
1.3.1 Short-Time Fourier Analysis
1.3.2 Formant Estimation and Tracking
1.4 F0 (“Pitch”) Estimation
1.5 Reduction of Information
2. Digital Coding of Speech Signals
2.1 Quantization
2.2 Speech Redundancies
2.3 Time-Adaptive Waveform Coding
2.4 Exploiting Properties of the Spectral Envelope
2.4.1 Differential Pulse-Code Modulation (DPCM)
2.4.2 Linear Prediction
2.4.3 Delta Modulation (DM)
2.4.4 Adaptive Differential Pulse-Code Modulation (ADPCM)
2.5 Exploiting the Periodicity of Voiced Speech
2.6 Exploiting Auditory Limitations
2.7 Spectral (Frequency-Domain) Coders
2.8 Vocoders
2.8.1 Linear Predictive (LPC) Vocoders
2.8.2 Least-Squares LPC Analysis
2.8.3 Spectral Estimation via LPC
2.8.4 LPC Synthesis
2.8.5 Window Considerations
2.8.6 Excitation for Vocoders
2.8.7 Multipulse-Excited LPC (MLPC)
2.9 Vector Quantization (VQ)
2.10 Network Considerations
3. Speech Synthesis
3.1 Concatenation of Speech Units
3.2 Unrestricted Text (Text-to-Speech) Systems
3.3 Formant Synthesis
3.4 Linear Predictive Coding (LPC) Synthesis
3.5 Intonation
4. Speech Recognition
4.1 Performance Evaluation
4.2 Basic Pattern Recognition Approach
4.3 Parametric Representation
4.4 Distance Measures and Decisions
4.5 Timing Considerations
4.6 Segmentation
4.7 Dynamic Time Warping (DTW)
4.8 Applying Vector Quantization to ASR
4.9 Networks for Speech Recognition
4.10 Using Prosodics to Aid Recognition
5. References

Chapter 3—Text-to-Speech Systems
1. Introduction
1.1 Applications of TTS
1.2 Stages of Text-to-Speech Conversion
2. Overview of TTS Technology and Behavioral Science Contributions to TTS Technology
2.1 Text Normalization
2.2 Syntactic Parsing
2.3 Word Pronunciation
2.3.1 Letter-to-Sound Rules
2.3.2 Dictionaries
2.3.3 Subjective Evaluation of Pronunciation Accuracy
2.4 Determination of Prosody
2.4.1 Intonation
2.4.2 Segmental Durations
2.4.3 Intensity
2.5 Speech Synthesis
3. TTS Evaluations
4. Human Factors Issues in Tailoring TTS Technology for Specific Applications
4.1 Wording and Tuning TTS Announcements
5. Summary and Conclusions
6. References

Chapter 4—Using Speech Recognition Systems: Issues in Cognitive Engineering
1. Abstract
2. Acknowledgments
3. Introduction
4. The Recognition System
4.1 Characteristics of Speech Recognition Systems
4.2 Training a Recognition System
4.3 Speech Input
4.4 Contributions from Speech Perception Research
4.5 The User Interface
4.6 Performance Assessment
4.7 The Next Generation of Recognition Systems
5. The Human Operator
6. The Recognition Environment and Context
7. Conclusions and Implications
8. References

Chapter 5—Intelligibility and Acceptability Testing for Speech Technology
1. Abstract
2. Introduction
3. Overview of the Testing Process
3.1 The Speaker
3.2 The Speaker Environment
3.3 The Voice Processor
3.4 The Transmission Channel
3.5 The Listener
3.6 The Listening Environment
4. Factors that Influence Speech Intelligibility and Acceptability
4.1 Segmental Information
4.2 Suprasegmental Information
4.3 Nonspeech Sounds
4.4 Contextual Information
4.5 Speaker Recognition
5. Overview of Speech Evaluation Techniques
5.1 Intelligibility Test Methods
5.1.1 Standard Tests
5.1.1.1 PB Words
5.1.1.2 Diagnostic Rhyme Test
5.1.1.3 Modified Rhyme Test
5.1.1.4 Other Rhyme Tests
5.1.1.5 Comparison of Tests
5.1.2 Sentence Tests: Harvard and Haskins Sentences
5.1.3 Other Speech Materials
5.2 Acceptability Test Methods
5.2.1 Diagnostic Acceptability Measure (DAM)
5.2.2 Mean Opinion Score (MOS)
5.2.3 Phoneme-Specific Sentences
5.3 Communicability Tests
5.3.1 Free Conversation Test
5.3.2 Diagnostic Communicability Test
5.3.3 NRL Communicability Test
5.4 Physical Measures of the Speech Signal
5.4.1 Articulation Index (AI)
5.4.2 Speech Transmission Index (STI)
5.4.3 Combined Measures
6. Relations Among Different Tests
7. Selecting Test Methods
7.1 Reasons for Testing
7.2 Type of Voice Application and Comparisons to be Made
8. General Recommendations
9. Acknowledgments
10. References

Chapter 6—Perception and Comprehension of Speech
1. Introduction
1.1 Voice Output Devices
1.2 Behavioral Evaluation
1.3 Intelligibility
1.4 Comprehension
1.5 Attention and Processing Resources
1.5.1 Limited Capacity
1.5.2 Attention and Synthetic Speech
1.5.3 Attention and Comprehension
1.5.4 Summary
2. Measures of Comprehension
2.1 Successive Measures
2.1.1 Recall
2.1.2 Recognition
2.1.3 Sentence Verification
2.2 Simultaneous Measures
2.2.1 Monitoring Tasks
2.2.2 Reading Times
3. Comprehension of Synthetic Speech
3.1 Successive Measures
3.1.1 Sentence Verification
3.1.2 Comprehension of Fluent Connected Speech
3.2 Simultaneous Measures
4. Summary and Conclusions
4.1 Efficiency of Comprehension
4.2 Capacity Demands
4.3 Training Effects and Perceptual Learning
4.4 Applications
4.5 Future Directions
4.5.1 Capacity Demands
4.5.2 Training Effects
4.5.3 Memory Decay
4.5.4 Generalization
4.5.5 Miscellaneous Issues
5. References

Chapter 7—Human Factors in Lifecycle Development
1. Introduction and Motivation
2. A Normative Model of Lifecycle Development
3. Problem Finding
4. Problem Formulation
5. Invention
6. Design
6.1 Product Definition
6.2 Initial Design
6.3 Usability Criteria
7. Prototype
8. Evaluation
9. Redesign
10. Testing
11. Deployment
12. Observation in the Field
13. Post-Mortem
14. Special Problems With Voice
14.1 Testing for Technologies that Do Not Exist

Chapter 8—Technical Issues Underlying the Development and Use of a Speech Research Laboratory
1. Introduction
2. Requirements
2.1 Input Source
2.2 Basic Principles of Data Acquisition
2.2.1 Resolution
2.2.2 Sampling Rate
2.2.3 Triggering Options
2.2.4 Number of Channels
2.2.5 Input Voltage Range
2.2.6 Direct Memory Access (DMA)
2.3 Filters and Low-Pass Filtering
2.3.1 Programmable Cut-Off Frequency
2.3.2 Roll-Off Frequencies
2.3.3 Input Voltage Range
2.3.4 Active or Passive Filters
2.4 Computer Systems
2.5 Speech Software
2.5.1 Ease of Recording
2.5.2 Options in Displaying the Digitized Speech Signal
2.5.3 Types of Analyses of the Speech Signal
2.5.4 Digital Filtering
2.5.5 File Manipulation
2.6 Speech Output and Signal Generation (Speech Synthesis)
3. Areas of Speech Research and Its Applications
4. A Description of a Functional Speech Lab
5. References

Chapter 9—Speech Recognition
1. Introduction
1.1 Special Considerations of Natural Speech for Speech Recognizers
2. Subdividing the Speech Recognition Problem
3. Overview of Recognition Algorithms
3.1 Endpoint Detection
3.2 Speech Coding
3.3 Recognition Algorithms
3.4 Second Level Algorithms
4. Defining Your Speech Recognition Needs
4.1 Application Environment
4.2 Vocabulary
4.3 Speaker Dependence and Training Methods
5. Future Speech Recognition Systems
6. References
6.1 Annotated References

Chapter 10—Psychological and Human Factors Issues in the Design of Speech Recognition Systems
1. Introduction
2. Current Applications
3. The Physical and Acoustical Environment
3.1 The Public Switched Telephone Network (PSTN)
3.2 The Acoustic Environment and Human Conversational Behavior
4. Accuracy/System Evaluation
4.1 Measures of Recognition Performance
4.2 Accuracy of Strings Versus Isolated Digits or Words
4.3 Recognizer Performance in Lab Versus Field
4.4 The Need for Multiple Criteria
5. Typical Problems Encountered in Current Systems
6. User-Interface Design
6.1 Dialogue Control
6.1.1 Feedback and Timing
6.1.2 Error Recovery
6.1.3 Exiting the System
6.2 Convergence
6.3 Combining Input: Voice and Tone Dialing
6.4 Prompt Messages
7. Developing Automation with ASR: A Five-Stage Approach
7.1 Stage 1
7.2 Stage 2
7.3 Stage 3
7.4 Stage 4
7.5 Stage 5
8. Future Systems and Their Requirements
8.1 Conversational Computer Systems
8.2 Vocabulary Requirements
9. Conclusions
10. Acknowledgments
11. References

Chapter 11—Voiced Mail: Speech Synthesis of Electronic Mail
1. Applications of Synthetic Speech
2. Limitations of Synthetic Speech Output
3. Voiced Mail Operation
3.1 Text Filtering
3.1.1 The “From”
3.1.2 The “Date”
3.2 Message Ordering
3.3 Intelligibility Issues
3.4 Advantages of Multiple Media
3.5 Flow of Presentation in a System Without Menus
3.6 Simple Command Set
4. Voiced Mail Today
5. References

Chapter 12—The Design of Spoken Language Interfaces
1. Introduction
2. Task Analysis
3. Language Design
3.1 The “Wizard” Paradigm
3.2 Protocol Transcription
3.3 Language Analysis and Design
3.4 Parser Design
4. The Recognition System
5. Interface Design
5.1 Interaction Structure
6. System Evaluation
6.1 Task Completion Time
6.2 Speech Recognizer Performance
7. Summary
8. Acknowledgments
9. References

Chapter 13—Synthetic Spoken Driving Instructions by Telephone and While Driving: Implications for Interfaces
1. Direction Assistance
1.1 The Initial Message
1.2 Six Different Protocols for Data Entry
1.3 Correcting Errors
1.4 How Direction Assistance Determines Locations
1.5 The Telephone Is a Mixed Blessing for Direction Assistance
1.6 The Describer Produces a Text Description of the Route
1.7 The Narrator Reads the Route Description
2. The Back Seat Driver
2.1 The Back Seat Driver Maintains the Driver’s Sense of Co-presence
3. Conclusions
4. Acknowledgments
5. References

Chapter 14—Human Factors Contributions to the Development of a Speech Recognition Cellular Telephone
Chapter 15—Voice Quality Assessment of Digital Network Technologies
1. Introduction
2. Voice Quality Assessments of Echo Cancellation Devices
3. Performance of Digital Circuit Multiplication Systems
3.1 Voice Quality Associated with Low Bit-Rate Voice Compression
3.2 Voice Quality Associated with Digital Speech Interpolation
3.3 Voice Quality Associated with Hybrid Systems
4. Concluding Remarks
5. Acknowledgments
6. References

Chapter 16—Behavioral Aspects of Speech Technology: Industrial Systems
1. Introduction
2. General Design Considerations
3. Determining a Role for Speech Input
3.1 Task Analysis
3.2 Task Loading
3.3 User Population
3.4 The Environment
3.5 Technology Capabilities
4. Optimizing System Design
4.1 Transaction Time
4.2 Dialogue Design
4.2.1 Dialogue Acts
4.2.2 Feedback
4.2.3 Error Correction
5. Case Studies
5.1 Jaguar Cars
5.2 Caterpillar
6. Conclusions
7. References

Chapter 17—Toys That Talk: Two Case Studies
1. Introduction
2. Speak & Spell
2.1 Program Beginnings
2.2 Alternatives
2.3 Processing Architecture
2.4 Program Support
2.5 Market Assessments
2.6 Product Design
2.7 Production Start-Up
2.8 Conclusions
3. Julie Doll
3.1 Technology
3.2 Product Development
3.3 Product Integration
4. Conclusions
5. References

Chapter 18—HADES: A Case Study of the Development of a Signal Analysis System
1. Introduction
2. Background
3. Early Developments—WENDY and SPA
4. HUI—The Haskins User Interface Shell
5. A New Approach to System Development
6. Initial Steps to a Modern Design: SPEED
7. The HADES Prototype: An Advanced Design
8. HADES 0.8: The Beta Version
9. Moving to Release
10. Conclusion
11. References
12. Acknowledgments

Chapter 19—The Perceptual and Acoustic Assessment of the Speech of Hearing-Impaired Talkers
1. Abstract
2. Introduction
3. Methods
3.1 Subjects
3.2 Material
3.3 Recording Conditions
3.4 Digitization and Phonetic Judgment
3.5 Acoustic Analysis
4. Results
4.1 Intelligibility, Perceptual Consensus, and Phonetic Distortion
4.2 Acoustic Properties and Perceptual Judgments
4.2.1 Vowel Fronting
4.2.2 Stridency
4.2.3 Nasal Manner
4.2.4 Fundamental Frequency
4.2.5 Breathiness and Nasality
5. Discussion
6. Conclusion
7. Acknowledgments
8. References

Chapter 20—The Development of Text-to-Speech Technology for Use in Communication Aids
1. Introduction
2. The Multi-Language Text-to-Speech System
3. Special Features for Applications
3.1 Reading Mode
3.2 Speech Tempo and Loudness
3.3 Variation in Voices
3.4 A New Voice Source
3.5 Modeling Style
3.6 Controlling Emphasis
3.7 Saving Sentences
3.8 User-Defined Pronunciations
3.9 Indexing
3.10 Changing Language
4. Text-to-Speech in Voice Prosthesis
5. Multi-Talk
6. Blisstalk
7. Word Prediction
8. Talking Terminals
9. Daily Newspapers
10. Public Applications
11. Speech Recognition
12. Conclusions
13. References

Chapter 21—Computer-Assisted Speech Training: Practical Considerations
1. Introduction
2. Computer-Based Speech Training (CBST) Systems
3. Design Considerations for Speech Feedback
4. Speech Training Curriculum
5. Role of Clinical Evaluation
5.1 Acceptability Evaluations
5.2 Clinical Effectiveness
5.3 Independent Verification of Clinical Effectiveness
5.4 Beta-Test Evaluations
6. Summary
7. References
8. Acknowledgments

Chapter 22—What Computerized Speech Can Add to Remedial Reading
1. Introduction
2. What Is the Deficit Underlying Reading Disabilities?
3. What Digitized Speech Allows
3.1 Short-Term Studies with DECtalk Speech Synthesizer
3.2 Long-Term Studies with ROSS
3.3 Taking Advantage of DECtalk’s Flexibility
3.4 Limitations of DECtalk
4. References

Chapter 23—Design of a Hearing Screening Test using Synthetic Speech
1. Introduction
2. Word Sets
3. Synthesis of Speech Material
4. Screening Protocol
5. Summary
6. Acknowledgments
7. References

Chapter 24—Speech Technology in Interactive Instruction
1. Introduction
2. Knowledge versus Skill
3. Spoken Instruction
3.1 Intelligibility
3.2 Naturalness
3.3 Identical Repetition
4. Instruction, Presentation and Feedback
5. Speech Storage and Production
5.1 Speed of Response
5.2 Controllable Intelligibility
5.3 Pre-established Temporal Structure
5.4 Prosodic Continuity
5.5 Limited On-line Memory
5.6 Real-time Operation
5.7 Reliability
5.8 Upgrades and Modification
6. References
Index