Building a Speech Recognition Service with FunASR and VAD Detection


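Adjusting VAD Parameters

1. Locating the VAD Model's Configuration File

The VAD model in FunASR is FSMN-VAD. Its parameter configuration class is VADXOptions, which can be found at the following path: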

/workspace/FunASR/runtime/python/onnxruntime/funasr_onnx/utils/e2e_vad.py

The VADXOptions class defines a number of VAD parameters. The commonly used ones are defined as follows:

class VADXOptions:
    sample_rate: int = 16000
    detect_mode: int = VadDetectMode.kVadMutipleUtteranceDetectMode.value
    snr_mode: int = 0
    max_end_silence_time: int = 800
    max_start_silence_time: int = 3000
    do_start_point_detection: bool = True
    do_end_point_detection: bool = True
    window_size_ms: int = 200
    sil_to_speech_time_thres: int = 150
    speech_to_sil_time_thres: int = 150
    speech_2_noise_ratio: float = 1.0
    do_extend: int = 1
    lookback_time_start_point: int = 200
    lookahead_time_end_point: int = 100
    max_single_segment_time: int = 60000

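These parameters control the VAD's silence detection, the speech-to-noise ratio, and related behavior. The main parameters have the following meanings:

max_single_segment_time: maximum duration of a single audio segment; default 60000 ms (1 minute).
max_end_silence_time: maximum duration of trailing silence before the end of speech is declared; default 800 ms.
max_start_silence_time: maximum duration of leading silence before speech starts; default 3000 ms.
sil_to_speech_time_thres: time threshold for switching from silence to speech; default 150 ms.
speech_to_sil_time_thres: time threshold for switching from speech to silence; default 150 ms.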
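
To make the role of max_end_silence_time more concrete, here is a minimal toy sketch. It is not FunASR's actual algorithm (which also involves windowed smoothing, lookback, and other options); the 10 ms frame shift, the function name segment, and the per-frame boolean input are all assumptions made for illustration.

```python
from typing import List, Tuple

FRAME_MS = 10  # assumed frame shift for this toy example


def segment(frames: List[bool], max_end_silence_time: int = 800) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame speech flags.

    An open segment is closed once the trailing silence reaches
    max_end_silence_time milliseconds.
    """
    segments = []
    start = None          # start time of the open segment, in ms
    last_speech_end = 0   # end time of the most recent speech frame, in ms
    silence_ms = 0        # silence accumulated since the last speech frame
    for i, is_speech in enumerate(frames):
        t = i * FRAME_MS
        if is_speech:
            if start is None:
                start = t
            last_speech_end = t + FRAME_MS
            silence_ms = 0
        elif start is not None:
            silence_ms += FRAME_MS
            if silence_ms >= max_end_silence_time:
                segments.append((start, last_speech_end))
                start = None
                silence_ms = 0
    if start is not None:
        segments.append((start, last_speech_end))
    return segments


# 1 s of speech, a 700 ms pause, then 1 s of speech
frames = [True] * 100 + [False] * 70 + [True] * 100
print(segment(frames, max_end_silence_time=800))  # [(0, 2700)]
print(segment(frames, max_end_silence_time=600))  # [(0, 1000), (1700, 2700)]
```

With the 800 ms default, the 700 ms pause stays inside one segment; with 600 ms, the same audio is split into two segments.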

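2. Modifying the VAD Configuration
The configuration that the VAD model actually uses is read from the config.yaml file in the model directory. The file can be found at the following path: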

/workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/config.yaml

The model_conf field in config.yaml contains the detailed configuration of the VAD model:

model: FsmnVADStreaming
model_conf:
  sample_rate: 16000
  detect_mode: 1
  snr_mode: 0
  max_end_silence_time: 800
  max_start_silence_time: 3000
  do_start_point_detection: True
  do_end_point_detection: True
  window_size_ms: 200
  sil_to_speech_time_thres: 150
  speech_to_sil_time_thres: 150
  speech_2_noise_ratio: 1.0
  do_extend: 1
  lookback_time_start_point: 200
  lookahead_time_end_point: 100
  max_single_segment_time: 60000
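
The relationship between the class defaults and model_conf can be sketched as keyword overrides on a dataclass. This is an illustration under assumptions, not FunASR's loading code: VadOptionsSketch is a hypothetical stand-in that mirrors only a subset of the VADXOptions fields, and model_conf is written as a plain dict, as it might look after config.yaml has been parsed.

```python
from dataclasses import dataclass


@dataclass
class VadOptionsSketch:
    # Hypothetical stand-in: a subset of the VADXOptions defaults listed earlier.
    sample_rate: int = 16000
    max_end_silence_time: int = 800
    max_start_silence_time: int = 3000
    sil_to_speech_time_thres: int = 150
    speech_to_sil_time_thres: int = 150
    max_single_segment_time: int = 60000


def options_from_model_conf(model_conf: dict) -> VadOptionsSketch:
    """Build options from model_conf; fields missing from it keep their defaults."""
    return VadOptionsSketch(**model_conf)


# Only two values are overridden here; everything else falls back to the defaults.
model_conf = {"max_end_silence_time": 600, "max_single_segment_time": 30000}
opts = options_from_model_conf(model_conf)
print(opts.max_end_silence_time, opts.max_start_silence_time)  # 600 3000
```

Under this reading, a value present in config.yaml takes precedence over the default in the class definition, which is why edits belong in config.yaml rather than in e2e_vad.py.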

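3. Parameter Modification Example
Suppose you want the end of speech to be detected sooner. You can lower max_end_silence_time from its default of 800 ms to 600 ms. Edit config.yaml and change the following line: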

max_end_silence_time: 800

to:

max_end_silence_time: 600

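With this change, the VAD model treats 600 ms of trailing silence as the end of a speech segment, which suits speech recognition scenarios that need a faster response.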
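
If the edit needs to be scripted rather than done by hand, one possible approach is a line-level text substitution that leaves the rest of the file untouched. The helper name set_yaml_int is hypothetical, and the sketch assumes the key appears exactly once with an integer value on its own line, as in the config.yaml shown above.

```python
import re


def set_yaml_int(text: str, key: str, value: int) -> str:
    """Replace the integer on the `key: <int>` line, keeping indentation and all other lines."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)\d+([ \t]*)$", re.MULTILINE
    )
    new_text, count = pattern.subn(rf"\g<1>{value}\g<2>", text)
    if count != 1:
        raise ValueError(f"expected exactly one '{key}' line, found {count}")
    return new_text


sample = (
    "model: FsmnVADStreaming\n"
    "model_conf:\n"
    "  max_end_silence_time: 800\n"
    "  max_start_silence_time: 3000\n"
)
print(set_yaml_int(sample, "max_end_silence_time", 600))

# Applying it to the real file (path taken from the section above):
# from pathlib import Path
# path = Path("/workspace/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx/config.yaml")
# path.write_text(set_yaml_int(path.read_text(), "max_end_silence_time", 600))
```

Plain text substitution is used here instead of a YAML round trip so that comments, key order, and formatting in the file are preserved.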

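Tuning Recommendations in Practice

Live Streaming Configuration

{
  "max_single_segment_time": 30000,   // 30-second segments
  "max_end_silence_time": 500,        // fast end-of-speech detection
  "max_start_silence_time": 1000,     // filter out opening noise
  "sil_to_speech_time_thres": 80,     // sensitive to speech onset
  "speech_to_sil_time_thres": 200     // lenient about speech ending
}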

Customer Service Recording Processing

{
  "max_single_segment_time": 60000,             // keep the full conversation
  "max_end_silence_time": 1500,                 // wait for the customer to confirm
  "enable_semantic_sentence_detection": true    // semantic segmentation
}
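
Before copying a scenario preset into config.yaml, it can help to check its keys against the fields listed for VADXOptions. The sketch below is only an illustration: DEFAULTS is transcribed from the config.yaml shown earlier, apply_preset is a hypothetical helper, and a key reported as unrecognized simply means it does not appear in that listing, not that FunASR necessarily rejects it.

```python
# Defaults transcribed from the model_conf listing earlier in this article.
DEFAULTS = {
    "sample_rate": 16000, "detect_mode": 1, "snr_mode": 0,
    "max_end_silence_time": 800, "max_start_silence_time": 3000,
    "do_start_point_detection": True, "do_end_point_detection": True,
    "window_size_ms": 200, "sil_to_speech_time_thres": 150,
    "speech_to_sil_time_thres": 150, "speech_2_noise_ratio": 1.0,
    "do_extend": 1, "lookback_time_start_point": 200,
    "lookahead_time_end_point": 100, "max_single_segment_time": 60000,
}

PRESETS = {
    "live_streaming": {
        "max_single_segment_time": 30000,
        "max_end_silence_time": 500,
        "max_start_silence_time": 1000,
        "sil_to_speech_time_thres": 80,
        "speech_to_sil_time_thres": 200,
    },
    "customer_service": {
        "max_single_segment_time": 60000,
        "max_end_silence_time": 1500,
        "enable_semantic_sentence_detection": True,
    },
}


def apply_preset(name: str):
    """Return (merged_conf, unrecognized_keys) for the named preset."""
    merged = dict(DEFAULTS)
    unrecognized = []
    for key, value in PRESETS[name].items():
        if key in DEFAULTS:
            merged[key] = value
        else:
            unrecognized.append(key)  # not among the listed VADXOptions fields
    return merged, unrecognized


for name in PRESETS:
    conf, unknown = apply_preset(name)
    print(name, conf["max_end_silence_time"], unknown)
```

In this sketch the customer service preset reports enable_semantic_sentence_detection as unrecognized, since that key is not among the VADXOptions fields listed above; it is worth verifying against the FunASR version in use before relying on it.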

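Performance Impact Comparison

Parameter                  Risk when set too low                      Risk when set too high
max_single_segment_time    Semantic fragmentation                     Memory overflow
max_end_silence_time       Premature truncation (missed recognition)  Delayed ending (more noise)
sil_to_speech_time_thres   False triggers from noise                  Missed speech onsets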

Reference link:
https://blog.51cto.com/u_16732038/12047312
