fix: Fix Edge TTS audio format not being recognized #237
while retry_count < max_retries:
    try:
-       communicate = edge_tts.Communicate(text, voice)
+       communicate = edge_tts.Communicate(text, voice, audio_format="riff-24khz-16bit-mono-pcm")
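The requested format name spells out what a conforming file should contain: a RIFF/WAVE container, a 24 kHz sample rate, 16-bit samples, one channel, and uncompressed PCM. The sketch below uses only the standard-library wave module to build a short silent file with those parameters and to check arbitrary bytes against them. It does not call edge_tts, and the helper names (make_silent_wav, matches_expected_format) are hypothetical, not part of SecRandom.

```python
import io
import wave

# What "riff-24khz-16bit-mono-pcm" implies for the file on disk.
EXPECTED_RATE = 24_000      # samples per second
EXPECTED_SAMPWIDTH = 2      # bytes per sample (16 bit)
EXPECTED_CHANNELS = 1       # mono


def make_silent_wav(seconds: float) -> bytes:
    """Build an in-memory WAV file of silence with the expected parameters."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(EXPECTED_CHANNELS)
        w.setsampwidth(EXPECTED_SAMPWIDTH)
        w.setframerate(EXPECTED_RATE)
        frames = int(EXPECTED_RATE * seconds)
        w.writeframes(b"\x00\x00" * frames)  # one zero-valued 16-bit sample per frame
    return buf.getvalue()


def matches_expected_format(data: bytes) -> bool:
    """Return True if data is a 24 kHz / 16-bit / mono PCM WAV file."""
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            return (
                w.getframerate() == EXPECTED_RATE
                and w.getsampwidth() == EXPECTED_SAMPWIDTH
                and w.getnchannels() == EXPECTED_CHANNELS
                and w.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
            )
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file at all (e.g. MP3 bytes), or too short to parse.
        return False
```

A file built this way carries the standard 44-byte PCM header followed by 2 bytes per frame, and MP3 bytes fail the check because they do not start with RIFF.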
    filename = f"{voice}_{safe_text}.wav"
    return os.path.join(self.audio_dir, filename)

def _is_valid_audio(self, file_path: str) -> bool:
Pull request overview
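This PR fixes a problem where audio files generated by Edge TTS have a .wav extension but actually contain MP3 data, which makes soundfile/libsndfile raise "Format not recognised" so the audio cannot be played. It also adds cache validation that automatically cleans up old, corrupted cache files.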
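Because the file name says nothing reliable about its contents, the mismatch can be confirmed by reading the first bytes of a cached file: a WAV file starts with RIFF and has WAVE at offset 8, while an MP3 file usually starts with an ID3 tag or an MPEG frame-sync pattern. The helper below is a hypothetical diagnostic, not part of the PR; it is useful for checking what was actually written to disk regardless of which output format was requested. The frame-sync test is a heuristic, since AAC ADTS streams share the same leading bits.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container from a file's first bytes, ignoring its extension.

    Returns "wav", "mp3", or "unknown".
    """
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 with a leading ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame: 11 sync bits set
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

Run against an affected cache entry, sniff_file would report "mp3" even though the path ends in .wav.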
Changes:
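- Explicitly request RIFF/WAV PCM output (riff-24khz-16bit-mono-pcm) when generating Edge TTS speech.
- Add an audio validity check (soundfile.info()) before a disk-cache hit is used; invalid files are deleted and regenerated.
- Touches the line ending of the file's last line (risk of mixed line endings).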
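The third point concerns a change whose only effect is the terminator on the file's last line. One way to check a file for mixed terminators is to count them in its raw bytes. The helpers below are hypothetical and not something the PR adds.

```python
def line_ending_counts(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """True if more than one kind of line ending appears."""
    counts = line_ending_counts(data)
    return sum(1 for n in counts.values() if n > 0) > 1
```

Read the file in binary mode ("rb"). Text mode with the default newline handling translates CRLF to LF on read, which would hide the problem.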
    return os.path.join(self.audio_dir, filename)

def _is_valid_audio(self, file_path: str) -> bool:
    """Check whether the audio file has a valid format"""
    ...
    except Exception:
        logger.warning(f"Cached file has an invalid format; regenerating: {file_path}")
        try:
            os.remove(file_path)
        except OSError:
            pass
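The fragment above shows only the failure branch of _is_valid_audio. A self-contained sketch of the whole check follows. It swaps the PR's soundfile.info() call for the standard-library wave module so it runs without libsndfile; note that wave accepts only uncompressed PCM WAV, so unlike soundfile it would also reject valid formats such as FLAC or float WAV. The function name is hypothetical, and keeping the delete inside the check mirrors the fragment rather than a recommendation.

```python
import logging
import os
import wave

logger = logging.getLogger(__name__)


def is_valid_audio(file_path: str) -> bool:
    """Return True if file_path parses as a PCM WAV file.

    On failure, log a warning, delete the file so it gets regenerated,
    and return False.
    """
    try:
        with wave.open(file_path, "rb"):
            pass  # opening parses the RIFF header plus the fmt and data chunks
        return True
    except (wave.Error, EOFError, OSError):
        logger.warning("Cached file has an invalid format; regenerating: %s", file_path)
        try:
            os.remove(file_path)
        except OSError:
            pass  # already gone or not removable; the caller regenerates anyway
        return False
```

An MP3 payload saved as .wav fails at the RIFF check, is removed, and the function returns False.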
    self.voice_engine.stop()
except Exception as e:
-   logger.warning(f"Failed to stop the system TTS engine: {e}")
+   logger.warning(f"Failed to stop the system TTS engine: {e}")
@copilot Review the code at the location below. A potential bug has been identified by an AI.
Location: app/common/voice/voice.py#L406
Potential issue: The …
Location: app/common/voice/voice.py#L471
Potential issue: In …
Agent-Logs-Url: https://github.com/SECTL/SecRandom/sessions/4f3a64bb-fdf0-422c-b1ed-180dbdab86d1
Co-authored-by: WSXYT <102407247+WSXYT@users.noreply.github.com>
Confirmed that both issues are real, and both have been fixed (commit …
Problem description
When playing speech with Edge TTS, users hit LibsndfileError: Error opening '...wav': Format not recognised. As a result, the audio could not be played.
Root cause
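When saving audio generated by Edge TTS, edge_tts.Communicate.save() wrote MP3 data even though the file extension was .wav. libsndfile (which soundfile uses) cannot recognize an MP3 file disguised as WAV, so it raised the format-not-recognised error.
Solution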
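- When creating edge_tts.Communicate, explicitly pass audio_format="riff-24khz-16bit-mono-pcm". This ensures the Edge TTS service returns RIFF/WAV audio that libsndfile can parse correctly.
- In VoiceCacheManager.get_voice, add a new _is_valid_audio helper. Before a cached file is loaded from disk, its format is checked with soundfile.info(). If the format is invalid (for example, an old MP3-in-WAV file), the file is deleted and regeneration is triggered, so corrupted cache entries are cleaned up and repaired automatically.

These changes make the audio files generated by Edge TTS compatible with the playback system and resolve the playback errors caused by the format mismatch.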
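The cache-side half of the fix can be sketched as a small class in which a cache hit is used only if it passes validation, and anything else is removed and regenerated. This is a hypothetical stand-in for VoiceCacheManager, not its actual code: validate plays the role of _is_valid_audio and generate the role of the Edge TTS call, both injected so the flow can run offline. For brevity the file name uses the raw text, whereas the real code builds it from safe_text, whose derivation is not shown here.

```python
import os
from typing import Callable


class VoiceCacheSketch:
    """Disk cache that regenerates entries failing a format check."""

    def __init__(
        self,
        audio_dir: str,
        validate: Callable[[str], bool],
        generate: Callable[[str, str, str], None],
    ) -> None:
        self.audio_dir = audio_dir
        self.validate = validate   # path -> is the cached file usable?
        self.generate = generate   # (text, voice, path) -> writes the audio file
        self.generated = 0         # how many times synthesis was needed

    def get_voice(self, text: str, voice: str) -> str:
        path = os.path.join(self.audio_dir, f"{voice}_{text}.wav")
        if os.path.exists(path) and self.validate(path):
            return path  # valid cache hit
        # Missing or invalid (e.g. an old MP3-in-WAV file): drop it and regenerate.
        if os.path.exists(path):
            os.remove(path)
        self.generate(text, voice, path)
        self.generated += 1
        return path
```

The existence check before os.remove covers both designs: a validator that already deleted the bad file, as in the fragment above, and one that only reports failure.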
Fixes SECRANDOM-5E