ICDM-UESTC/Invalid-Speech-Rejection-Solution-for-Complex-Scenarios

Invalid Speech Rejection for Voice Robots in Complex Scenarios

1. Project Introduction

This project focuses on robust understanding for customer-service outbound voice robots in complex real-world conditions.

In practical outbound-call scenarios, the spoken language understanding (SLU) system is easily disrupted by:

  • environmental noise
  • background human speech
  • non-human-machine conversation segments
  • colloquial and non-standard expressions

These disturbances can lead to invalid responses and false triggering. To address this, we design an invalid speech rejection solution for complex scenarios that improves rejection robustness while maintaining intent understanding performance.
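As a minimal illustration of the rejection idea only (not this repository's actual model), one common baseline is a confidence threshold over the intent posteriors: if the top softmax probability is too low, the utterance is rejected instead of answered. The function and threshold below are hypothetical.

```python
import math

def reject_or_classify(intent_logits, threshold=0.5):
    # Toy rejection rule (illustrative, not the repository's method):
    # softmax over intent logits, reject when the top posterior is
    # below the confidence threshold.
    m = max(intent_logits)
    exps = [math.exp(x - m) for x in intent_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = probs.index(max(probs))
    if probs[top] < threshold:
        return "reject", probs[top]
    return top, probs[top]
```

For example, a confident logit vector like `[10.0, 0.0, 0.0]` yields an intent label, while a flat (uncertain) vector is rejected.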

2. Project Structure (Concise)

ICDM-SLU/
├── eval.sh                              # Main evaluation entry script
├── speech_intent_slot_eval.py           # Inference + intent/reject metric computation
├── slu_models_e2e.py                    # Core end-to-end SLU model
├── e2e_cl_fusion_data.py                # Clean/noisy data fusion and data pipeline
├── fusionLayerWrapper.py                # Fusion layer wrapper
├── contrastive_pretraining.py           # Contrastive pretraining
├── val_ic_cf_from_eval.py               # Validation helper example
├── tokenizer.sh                         # Tokenizer training script
├── requirements.txt                     # Python dependency list
├── configs/
│   └── parakeet_transformer_large_bpe_SE_fusion.yaml   # Main model/training config
├── ckpt/
│   └── Intent/                          # Model checkpoints and prediction outputs (e.g., predictions.json)
├── encoders/
│   ├── parakeet-tdt_ctc-110m/           # Pretrained ASR encoder resources
│   ├── whisper-m/                       # Whisper-related resources
│   └── t5_tiny/                         # Lightweight text encoder/LM resources
├── eval_utils/
│   ├── inference.py                     # Inference runner
│   ├── evaluator.py                     # SLURP evaluator
│   └── evaluation/metrics/              # Metric utilities
├── SE_modules/                          # Speech enhancement modules (including Matcha-related components)
└── scripts/                             # Utility scripts (export, tokenizer, ASR/LM, etc.)

3. Usage and Inference

3.1 Data Preparation

The evaluation input paths used in eval.sh are:

  • dataset_manifest: ./data/test/manifest.json
  • asr_transcripts_filepath: ./data/test/asr_transcripts.jsonl

You can:

  1. run directly with this path layout;
  2. place data in your own location and update paths in eval.sh accordingly.

Recommended in-repo organization:

ICDM-SLU-open-ver/
├── data/
│   └── test/
│       ├── manifest.json
│       └── asr_transcripts.jsonl

SLURP can be obtained from: https://github.com/pswietojanski/slurp
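Both evaluation inputs are assumed to be JSON-Lines files (one JSON object per line). A small loader sketch, where the field names shown in the test are illustrative NeMo-style keys and not confirmed by this repository:

```python
import json

def load_jsonl(path):
    # Read a JSON-Lines file (e.g. manifest.json or
    # asr_transcripts.jsonl) into a list of dicts.
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Usage: `manifest = load_jsonl("./data/test/manifest.json")`.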

Data construction uses SLURP as the base corpus, then mixes in VoiceBank human-speech noise and ESC-50 environmental sounds at an equal ratio, at SNRs of 0, 5, and 10 dB (SNRS_DB = [0, 5, 10]); the resulting noisy utterances have WER greater than 25%.
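The SNR-controlled mixing step can be sketched as follows: scale the noise so that the speech-to-noise power ratio matches the target SNR in dB, then add the two signals. This is a generic sketch under standard definitions; the repository's actual augmentation pipeline may differ.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    # Scale `noise` so that 10*log10(P_speech / P_noise_scaled) == snr_db,
    # then add it sample-by-sample. `speech` and `noise` are equal-length
    # sequences of float samples.
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

With SNRS_DB = [0, 5, 10], each clean utterance would be mixed three times, once per SNR level.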

3.2 Environment Setup

conda create -n slu python=3.10.12 -y
conda activate slu
pip install -r requirements.txt

3.3 Run Evaluation

bash eval.sh

3.4 Result Location

If output_filename is not explicitly set, speech_intent_slot_eval.py writes predictions to:

  • ckpt/Intent/predictions.json

The terminal prints:

  • scenario/action/intent-related metrics
  • reject precision, recall, F1, and accuracy
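For reference, the reject metrics reported above follow the standard binary precision/recall/F1/accuracy definitions. A generic sketch (not the repository's evaluator), treating "should be rejected" as the positive class:

```python
def reject_metrics(gold, pred):
    # gold/pred: equal-length lists of booleans,
    # True = utterance should be / was rejected.
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    correct = sum(g == p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(gold)
    return precision, recall, f1, accuracy
```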

3.5 Final Performance

Under the above noise-augmented SLURP setting, this solution achieves:

  • Intent recognition accuracy > 90%
  • Reject accuracy > 95%

4. Open-Source Pretrained Models

5. Our implementation refers to the following open-source repositories:
