<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Audio Processing on Chico's Tech Blog</title><link>https://realtime-ai.chat/tags/%E9%9F%B3%E9%A2%91%E5%A4%84%E7%90%86/</link><description>Recent content in Audio Processing on Chico's Tech Blog</description><image><title>Chico's Tech Blog</title><url>https://github.com/chicogong.png</url><link>https://github.com/chicogong.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Thu, 15 Jan 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://realtime-ai.chat/tags/%E9%9F%B3%E9%A2%91%E5%A4%84%E7%90%86/index.xml" rel="self" type="application/rss+xml"/><item><title>TTS Data Preparation: The Complete Workflow from Recording to Training</title><link>https://realtime-ai.chat/posts/tts-data-preparation/</link><pubDate>Thu, 15 Jan 2026 10:00:00 +0800</pubDate><guid>https://realtime-ai.chat/posts/tts-data-preparation/</guid><description>The complete TTS data preparation workflow, from recording and sample rate to cleaning and labeling. As a rule of thumb, data quality determines about 80% of speech synthesis quality.</description><content:encoded><![CDATA[<h2 id="数据决定上限">Data Sets the Ceiling</h2>
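<p>As a rule of thumb, about 80% of a TTS model's output quality comes down to data quality.</p>
<p><strong>Common problems:</strong></p>
<ul>
<li>Noise floor in the recording → audible noise in the synthesized speech</li>
<li>Unstable volume → synthesized speech swings between loud and quiet</li>
<li>Unnatural sentence breaks → odd rhythm in the synthesized speech</li>
</ul>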
<hr>
<h2 id="录音要求">Recording Requirements</h2>
<h3 id="硬件">Hardware</h3>
<table>
  <thead>
      <tr>
          <th>Equipment</th>
          <th>Recommendation</th>
          <th>Budget</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Microphone</td>
          <td>Condenser microphone (e.g. AT2020)</td>
          <td>¥500-1500</td>
      </tr>
      <tr>
          <td>Audio interface</td>
          <td>Dedicated audio interface, or a USB microphone</td>
          <td>¥300-800</td>
      </tr>
      <tr>
          <td>Environment</td>
          <td>Quiet room plus acoustic foam</td>
          <td>¥100-300</td>
      </tr>
  </tbody>
</table>
<h3 id="录音参数">Recording Parameters</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">Sample rate: 48 kHz (16 kHz minimum)
</span></span><span class="line"><span class="cl">Bit depth: 24-bit
</span></span><span class="line"><span class="cl">Format: WAV (lossless)
</span></span></code></pre></td></tr></table>
</div>
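</div><h3 id="录音技巧">Recording Tips</h3>
<ol>
<li><strong>Distance</strong>: keep the microphone 15-20 cm from your mouth</li>
<li><strong>Level</strong>: keep the input level between -12 dB and -6 dB on the recording meter</li>
<li><strong>Delivery</strong>: speak at a normal pace and breathe naturally</li>
<li><strong>Duration</strong>: at least 2 hours in total (5-10 hours gives better results)</li>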
</ol>
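<p>The level guideline above can be checked after a take. The following is a minimal sketch, assuming the -12 dB to -6 dB range refers to peak level in dBFS (0 dBFS = full scale) and that samples are available as signed PCM integers. The function names <code>peak_dbfs</code> and <code>in_target_range</code> are made up for illustration.</p>

```python
import math


def peak_dbfs(samples, bit_depth=16):
    """Return the peak level of signed integer PCM samples in dBFS."""
    full_scale = 2 ** (bit_depth - 1)  # e.g. 32768 for 16-bit audio
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / full_scale)


def in_target_range(level_db, low=-12.0, high=-6.0):
    """Check a level against the assumed -12 dB to -6 dB recording window."""
    return low <= level_db <= high


# Synthetic 16-bit samples peaking at half of full scale (about -6.02 dBFS)
samples = [0, 8192, 16384, 8192, 0, -8192, -16384, -8192]
level = peak_dbfs(samples)
print(round(level, 2), in_target_range(level))
```

<p>For a 24-bit recording, pass <code>bit_depth=24</code>. A take that peaks above the window risks clipping on louder phrases, while one far below it wastes headroom and raises the relative noise floor.</p>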
<hr>
<h2 id="数据清洗流程">Data Cleaning Workflow</h2>
<pre class="mermaid">graph LR
    A[Raw audio] --> B[Denoise]
    B --> C[Normalize loudness]
    C --> D[Split into sentences]
    D --> E[Align text]
    E --> F[Quality check]
    F --> G[Training data]
</pre><h3 id="1-降噪">1. Denoising</h3>
<p><strong>Tools</strong>: Audacity (free), Adobe Podcast (online)</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Denoise with ffmpeg's arnndn filter (the RNNoise model file must exist at this path)</span>
</span></span><span class="line"><span class="cl">ffmpeg -i input.wav -af <span class="s2">&#34;arnndn=m=rnnoise-models/bd.rnnn&#34;</span> output.wav
</span></span></code></pre></td></tr></table>
</div>
</div><blockquote>
<p>⚠️ Note: over-aggressive denoising distorts the voice. It is better to leave a little noise floor.</p></blockquote>
<h3 id="2-音量标准化">2. Loudness Normalization</h3>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Normalize to -16 LUFS; -ar keeps the output at 48 kHz because loudnorm upsamples internally</span>
</span></span><span class="line"><span class="cl">ffmpeg -i input.wav -af <span class="nv">loudnorm</span><span class="o">=</span><span class="nv">I</span><span class="o">=</span>-16:TP<span class="o">=</span>-1.5:LRA<span class="o">=</span><span class="m">11</span> -ar <span class="m">48000</span> output.wav
</span></span></code></pre></td></tr></table>
</div>
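</div><h3 id="3-切分句子">3. Sentence Segmentation</h3>
<p><strong>Principle</strong>: one audio clip = one sentence (3-15 seconds)</p>
<p><strong>Tools</strong>:</p>
<ul>
<li><strong>Whisper</strong>: automatic transcription with timestamps</li>
<li><strong>UVR</strong>: vocal separation (removes background music)</li>
</ul>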
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Get timestamps with Whisper (segments plus word-level timing)</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">whisper</span>
</span></span><span class="line"><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">whisper</span><span class="o">.</span><span class="n">load_model</span><span class="p">(</span><span class="s2">&#34;medium&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">result</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transcribe</span><span class="p">(</span><span class="s2">&#34;audio.wav&#34;</span><span class="p">,</span> <span class="n">word_timestamps</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><h3 id="4-文本对齐">4. Text Alignment</h3>
<p>Each audio clip needs a matching text transcript:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">audio_001.wav | The weather is really nice today.
</span></span><span class="line"><span class="cl">audio_002.wav | Let's go for a walk in the park.
</span></span><span class="line"><span class="cl">audio_003.wav | Sure, let me change my clothes first.
</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>Common formats</strong>:</p>
<ul>
<li>LJSpeech-style: <code>filename|text</code></li>
<li>CSV: adds timestamp columns</li>
</ul>
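<p>A small reader and writer for the pipe-separated layout helps catch malformed lines before training. This is a sketch under stated assumptions: two fields per line (<code>filename|text</code>) as shown above, and hypothetical helper names <code>write_metadata</code> and <code>read_metadata</code>. Individual training toolkits may expect extra columns, so check the one in use.</p>

```python
import io


def write_metadata(rows, fh):
    """Write (filename, text) pairs as 'filename|text' lines."""
    for name, text in rows:
        # The pipe is the field separator, so it must not appear in the text
        if "|" in text or "\n" in text:
            raise ValueError(f"text for {name} contains a reserved character")
        fh.write(f"{name}|{text.strip()}\n")


def read_metadata(fh):
    """Parse 'filename|text' lines and reject lines with a missing field."""
    rows = []
    for line_no, line in enumerate(fh, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, sep, text = line.partition("|")
        if not sep or not name.strip() or not text.strip():
            raise ValueError(f"line {line_no}: expected 'filename|text'")
        rows.append((name.strip(), text.strip()))
    return rows


# Round-trip demo using an in-memory file
buf = io.StringIO()
write_metadata(
    [
        ("audio_001.wav", "The weather is really nice today."),
        ("audio_002.wav", "Let's go for a walk in the park."),
    ],
    buf,
)
buf.seek(0)
print(read_metadata(buf))
```

<p>Because the parser strips whitespace around each field, it also accepts the spaced form <code>audio_001.wav | text</code> used in the example above.</p>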
<hr>
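<h2 id="常见坑">Common Pitfalls</h2>
<h3 id="坑1录音环境不一致">Pitfall 1: Inconsistent Recording Environment</h3>
<p>Within one dataset, some clips are recorded in a bedroom and others in an office → unstable synthesis quality</p>
<p><strong>Fix</strong>: record everything in one fixed environment</p>
<h3 id="坑2情绪变化大">Pitfall 2: Large Swings in Energy</h3>
<p>Energetic at the start, tired and flat later → the synthesized voice is inconsistent</p>
<p><strong>Fix</strong>: split the recording into several sessions, each under 1 hour</p>
<h3 id="坑3文本标注错误">Pitfall 3: Wrong Transcripts</h3>
<p>The audio says &quot;today&quot; but the label says &quot;yesterday&quot; → the model learns the wrong mapping</p>
<p><strong>Fix</strong>: transcribe automatically with Whisper, then verify by hand</p>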
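<p>Manual verification goes faster when only suspicious clips are reviewed. The sketch below compares each existing label with an automatic transcript and flags pairs that differ noticeably. It assumes the transcripts have already been produced (for example by Whisper) and are passed in as plain strings. The 0.9 threshold and the names <code>similarity</code> and <code>flag_mismatches</code> are illustrative choices, not values from any toolkit.</p>

```python
import difflib


def similarity(label, transcript):
    """Character-level similarity in [0, 1], ignoring case and whitespace."""
    a = "".join(label.lower().split())
    b = "".join(transcript.lower().split())
    return difflib.SequenceMatcher(None, a, b).ratio()


def flag_mismatches(pairs, threshold=0.9):
    """Return names of clips whose label and transcript fall below the threshold."""
    return [name for name, label, asr in pairs if similarity(label, asr) < threshold]


# (clip name, human label, automatic transcript)
pairs = [
    ("audio_001.wav", "The weather is nice today.", "The weather is nice today."),
    ("audio_002.wav", "The weather was nice yesterday.", "The weather is nice today."),
]
print(flag_mismatches(pairs))
```

<p>Flagged clips still need a human decision, since the automatic transcript can be the wrong side of the pair. The threshold should be tuned on a small hand-checked sample.</p>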
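<h3 id="坑4切分太碎">Pitfall 4: Clips Sliced Too Finely</h3>
<p>Clips of only three characters → the model cannot learn complete intonation</p>
<p><strong>Fix</strong>: keep each clip at roughly 5-15 seconds</p>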
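<p>One way to avoid over-fragmented clips is to merge adjacent short segments before cutting the audio. The following is a rough sketch, assuming segments are dictionaries with <code>start</code>, <code>end</code>, and <code>text</code> keys (the shape of the entries in Whisper's <code>result["segments"]</code>). The greedy strategy and the name <code>merge_segments</code> are assumptions made for illustration.</p>

```python
def merge_segments(segments, min_len=5.0, max_len=15.0):
    """Greedily merge adjacent segments until a clip reaches min_len seconds,
    never letting a merged clip grow beyond max_len seconds."""
    clips = []
    current = None
    for seg in segments:
        if current is None:
            current = dict(seg)
            continue
        current_len = current["end"] - current["start"]
        merged_len = seg["end"] - current["start"]
        if current_len < min_len and merged_len <= max_len:
            # Current clip is still too short: absorb the next segment
            current["end"] = seg["end"]
            current["text"] = current["text"].strip() + " " + seg["text"].strip()
        else:
            clips.append(current)
            current = dict(seg)
    if current is not None:
        clips.append(current)  # the final clip may remain shorter than min_len
    return clips


segments = [
    {"start": 0.0, "end": 1.2, "text": "Hi."},
    {"start": 1.2, "end": 4.0, "text": "How are you?"},
    {"start": 4.0, "end": 9.5, "text": "I was thinking we could walk to the park."},
    {"start": 9.5, "end": 11.0, "text": "Sure."},
]
for clip in merge_segments(segments):
    print(clip["start"], clip["end"], clip["text"])
```

<p>Merging trades strict one-sentence clips for longer, more natural units. A trailing clip that stays below the minimum is worth reviewing by hand rather than dropping automatically.</p>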
<hr>
<h2 id="数据量参考">Data Volume Reference</h2>
<table>
  <thead>
      <tr>
          <th>Goal</th>
          <th>Data volume</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Quick proof of concept</td>
          <td>10 minutes</td>
          <td>Barely usable</td>
      </tr>
      <tr>
          <td>Basically usable</td>
          <td>1-2 hours</td>
          <td>Recognizably similar</td>
      </tr>
      <tr>
          <td>Good quality</td>
          <td>5-10 hours</td>
          <td>Close to a real speaker</td>
      </tr>
      <tr>
          <td>Professional grade</td>
          <td>20+ hours</td>
          <td>Nearly indistinguishable</td>
      </tr>
  </tbody>
</table>
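<p>To see which row of the table a dataset falls into, the total duration of its clips can be summed. This is a minimal sketch using only Python's standard <code>wave</code> module, with illustrative names <code>wav_duration</code> and <code>total_hours</code>. One caveat: depending on the Python version, <code>wave</code> may reject some WAV header variants (such as extensible-format files written by certain tools), in which case an audio library would be needed instead.</p>

```python
import tempfile
import wave
from pathlib import Path


def wav_duration(path):
    """Duration of one WAV file in seconds (frames divided by frame rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_hours(folder):
    """Total duration in hours of all .wav files directly inside a folder."""
    return sum(wav_duration(p) for p in sorted(Path(folder).glob("*.wav"))) / 3600.0


def write_silence(path, seconds, rate=16000):
    """Demo helper only: write a mono 16-bit silent WAV of the given length."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


# Demo on two generated clips so the sketch runs without real recordings
with tempfile.TemporaryDirectory() as tmp:
    write_silence(Path(tmp) / "audio_001.wav", 4.0)
    write_silence(Path(tmp) / "audio_002.wav", 8.0)
    hours = total_hours(tmp)
    print(f"{hours * 3600:.1f} seconds = {hours:.4f} hours")
```

<p>In practice, point <code>total_hours</code> at the folder of cleaned clips rather than the raw recordings, since slicing and filtering usually discard part of the material.</p>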
<hr>
<h2 id="推荐工具">Recommended Tools</h2>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Tool</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Recording</td>
          <td>Audacity</td>
          <td>Free</td>
      </tr>
      <tr>
          <td>Denoising</td>
          <td>Adobe Podcast</td>
          <td>Free, online</td>
      </tr>
      <tr>
          <td>Vocal separation</td>
          <td>UVR5</td>
          <td>Open source</td>
      </tr>
      <tr>
          <td>Transcription</td>
          <td>Whisper</td>
          <td>Open source</td>
      </tr>
      <tr>
          <td>Batch processing</td>
          <td>FFmpeg</td>
          <td>Command-line tool</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="下一步">Next Steps</h2>
<p>Once the data is ready, training can begin. The next post covers fine-tuning a TTS model.</p>
<p>Questions? Leave a comment.</p>
]]></content:encoded></item></channel></rss>