<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Tool Use on Chico's Tech Blog</title><link>https://realtime-ai.chat/tags/tool-use/</link><description>Recent content in Tool Use on Chico's Tech Blog</description><image><title>Chico's Tech Blog</title><url>https://github.com/chicogong.png</url><link>https://github.com/chicogong.png</link></image><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Sun, 17 May 2026 11:00:00 +0800</lastBuildDate><atom:link href="https://realtime-ai.chat/tags/tool-use/index.xml" rel="self" type="application/rss+xml"/><item><title>Writing Tools for Agents: What a Good Tool Looks Like</title><link>https://realtime-ai.chat/posts/agent-tool-design/</link><pubDate>Sun, 17 May 2026 11:00:00 +0800</pubDate><guid>https://realtime-ai.chat/posts/agent-tool-design/</guid><description>When an agent performs poorly, the cause is often badly designed tools rather than a weak model. This post covers how to handle tool descriptions, parameters, return values, error reporting, and granularity, with a bad and a good example for each.</description><content:encoded><![CDATA[<p>I once watched a team try to make their agent &ldquo;smarter&rdquo; by swapping a mid-size model for a large one. The bill tripled and the results barely moved. The problem turned out to be a tool called <code>query</code>: its description was a single sentence, &quot;query the database&quot;, and it returned a 4,000-line blob of JSON stuffed with fields like <code>created_at_unix</code>, <code>tenant_uuid</code>, and <code>row_version</code>. The model was not lacking intelligence; after every call it had to fish for a needle in a pile of noise, and it often fished out the wrong one.</p>
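<p>After the team split that tool in two, wrote clear descriptions, and cut the return value by 80%, the mid-size model outperformed the original large-model setup.</p>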
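<p>This is not an isolated case. <strong>The ceiling on an agent's capability is often tool design, not the model.</strong> The model is the part you cannot change: Anthropic or OpenAI trains it, and all you get to do is choose one. Tools are the part you fully control. Effort spent where you have control pays off far better.</p>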
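<p>One line in Anthropic's 2025 engineering post <em>Writing effective tools for AI agents</em> matches my experience: tools are a new kind of software, a <strong>contract between deterministic systems and non-deterministic agents</strong>. You can no longer write tools the way you would write an API for another programmer. The caller has changed, so the design principles have to change with it.</p>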
<h2 id="工具描述你在跟模型招标">Tool descriptions: the model is running a &quot;tender&quot;</h2>
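<p>Faced with a set of tools, the model does something close to running a tender: it reads each tool's description and decides who should get the job. If a description is vague, it picks the wrong tool; if the boundaries between descriptions are blurry, it flips back and forth.</p>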
<p>The most common smell is <strong>describing implementation details instead of usage scenarios</strong>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># Bad
</span></span><span class="line"><span class="cl">{
</span></span><span class="line"><span class="cl">  &#34;name&#34;: &#34;db_query&#34;,
</span></span><span class="line"><span class="cl">  &#34;description&#34;: &#34;Run a SQL query against the primary database&#34;
</span></span><span class="line"><span class="cl">}
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># Good
</span></span><span class="line"><span class="cl">{
</span></span><span class="line"><span class="cl">  &#34;name&#34;: &#34;search_orders&#34;,
</span></span><span class="line"><span class="cl">  &#34;description&#34;: &#34;Look up orders by user ID, time range, or order status.
</span></span><span class="line"><span class="cl">                  Use it for questions like &#39;what has this user bought&#39; or &#39;where is this order&#39;.
</span></span><span class="line"><span class="cl">                  Do not use it to check product stock; that is search_inventory&#39;s job.&#34;
</span></span><span class="line"><span class="cl">}
</span></span></code></pre></td></tr></table>
</div>
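</div><p>What is the difference? The bad example describes how the tool works internally (it runs SQL), which the model does not care about. What it cares about is &quot;when should I use this?&quot; The good example states the <strong>trigger scenarios</strong> directly and also draws the boundary with a neighboring tool.</p>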
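<p>One point is easy to miss: <strong>when you have several similar tools, the description must spell out &quot;what I am not.&quot;</strong> Anthropic suggests namespacing, for example <code>asana_search</code> and <code>jira_search</code>, or the finer-grained <code>asana_projects_search</code> and <code>asana_users_search</code>. The prefix is itself a boundary declaration. When names alone are not enough, say it in the description: &quot;use me for X; for Y, use that other tool.&quot;</p>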
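<p>Another practical trick: <strong>put one or two usage examples in the description</strong>. Most functions the model saw in web text came with call examples next to them, so this is the format it knows best. A single example such as <code>search_orders(user_id=&quot;u_123&quot;, status=&quot;shipped&quot;)</code> does more than three lines of abstract explanation. Anthropic's Claude API has since turned this into a product feature called Tool Use Examples, which shows that examples are a serious technique, not a nicety.</p>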
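<p>To make the three ingredients concrete (trigger scenarios, an explicit boundary, and one call example), here is a minimal sketch of a tool definition as a Python dict in the <code>name</code> / <code>description</code> / <code>input_schema</code> shape. The wording, the <code>search_inventory</code> neighbor, and the helper <code>describes_boundary</code> are illustrative assumptions, not taken from a real system.</p>

```python
# Hypothetical tool definition; every value here is illustrative only.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Look up orders by user ID, time range, or order status. "
        "Use it for questions like 'what has this user bought' or "
        "'where is this order'. "
        "Do not use it to check product stock; use search_inventory for that. "
        'Example: search_orders(user_id="u_123", status="shipped")'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "description": "User ID such as u_123"},
            "status": {
                "type": "string",
                "enum": ["pending", "shipped", "delivered", "cancelled"],
            },
        },
        "required": ["user_id"],
    },
}


def describes_boundary(tool: dict, neighbor: str) -> bool:
    """Return True if the description names the neighboring tool explicitly."""
    return neighbor in tool["description"]
```

<p>A check like <code>describes_boundary(SEARCH_ORDERS_TOOL, "search_inventory")</code> is cheap to run in CI for every pair of tools you know the model tends to confuse.</p>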
<h2 id="参数让模型填得对而不是填得全">Parameters: help the model fill them in &quot;correctly&quot;, not &quot;completely&quot;</h2>
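<p>The core tension in parameter design is that you want flexibility and the model wants clarity. The two often conflict, and you should side with the model.</p>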
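<p><strong>First, do not use bare strings as enums.</strong> Take a <code>status</code> parameter: if the description only says &quot;pass the order status&quot;, the model might send <code>&quot;Shipped&quot;</code>, <code>&quot;shipped&quot;</code>, <code>&quot;SHIPPED&quot;</code>, or <code>&quot;shipping&quot;</code>. That is four spellings; how many does your code accept? Lock the allowed values down with an enum:</p>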
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Bad: status is a str, so the model improvises</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">search_orders</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">status</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span> <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Good: an enum, so the model can only pick a valid value</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">OrderStatus</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">Enum</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">PENDING</span> <span class="o">=</span> <span class="s2">&#34;pending&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">SHIPPED</span> <span class="o">=</span> <span class="s2">&#34;shipped&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">DELIVERED</span> <span class="o">=</span> <span class="s2">&#34;delivered&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">CANCELLED</span> <span class="o">=</span> <span class="s2">&#34;cancelled&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">search_orders</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">status</span><span class="p">:</span> <span class="n">OrderStatus</span> <span class="o">|</span> <span class="kc">None</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span> <span class="o">...</span>
</span></span></code></pre></td></tr></table>
</div>
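</div><p><strong>Second, if a parameter can have a default, do not make the model fill it in.</strong> Every extra required parameter is another chance for the model to get something wrong. For pagination's <code>page_size</code> or sorting's <code>order_by</code>, provide a sensible default and the model will rarely need to touch it.</p>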
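<p>A minimal sketch of defaults in practice, assuming a hypothetical <code>search_orders</code> signature. The default values and the <code>MAX_PAGE_SIZE</code> cap are assumptions made for illustration, and the function only echoes the effective query rather than hitting a real backend.</p>

```python
MAX_PAGE_SIZE = 50  # hypothetical upper bound, chosen only for this sketch


def search_orders(
    user_id: str,
    page_size: int = 10,
    order_by: str = "created_at_desc",
) -> dict:
    """Echo the effective query so the defaults are visible; no real lookup."""
    # Clamp instead of failing: an oversized page_size is harmless to correct.
    effective_size = max(1, min(page_size, MAX_PAGE_SIZE))
    return {"user_id": user_id, "page_size": effective_size, "order_by": order_by}
```

<p>With this shape the only thing the model must supply is <code>user_id</code>; everything else is opt-in.</p>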
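<p><strong>Third, watch out for look-alike parameters.</strong> When a tool takes both <code>start_date</code> and <code>end_date</code>, the model occasionally swaps them. If the business allows it, merging them into a single <code>time_range</code> enum (<code>last_7_days</code>, <code>last_30_days</code>, <code>this_month</code>) is often more robust, because you have taken the job of interpreting date ranges back from the model. When you do need exact ranges, two parameters are still the right call; this is a trade-off, not dogma.</p>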
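<p>Taking date interpretation back from the model can be sketched as a small resolver on the tool side. The enum members mirror the three values named above; treating each range as inclusive of <code>today</code> is an assumption of this sketch, and your business rules may differ.</p>

```python
from datetime import date, timedelta
from enum import Enum


class TimeRange(str, Enum):
    LAST_7_DAYS = "last_7_days"
    LAST_30_DAYS = "last_30_days"
    THIS_MONTH = "this_month"


def resolve_time_range(time_range: TimeRange, today: date) -> tuple[date, date]:
    """Turn a coarse enum value into an inclusive (start, end) date pair."""
    if time_range is TimeRange.LAST_7_DAYS:
        return today - timedelta(days=6), today  # today counts as day 7
    if time_range is TimeRange.LAST_30_DAYS:
        return today - timedelta(days=29), today
    if time_range is TimeRange.THIS_MONTH:
        return today.replace(day=1), today
    raise ValueError(f"unsupported time range: {time_range!r}")
```

<p>The model only ever sends a string such as <code>"last_7_days"</code>; <code>TimeRange("last_7_days")</code> turns it into the enum, and the start and end can no longer be swapped.</p>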
<p>A rule of thumb: <strong>if you need three seconds to work out what a parameter should be, the model will be even more confused.</strong></p>
<h2 id="返回值给模型能用的信息不是给它一份数据库导出">Return values: give the model information it can use, not a database dump</h2>
<p>This is where I have seen the most mistakes, so it deserves its own section.</p>
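<p>A tool's return value goes <strong>into the model's context window as-is</strong>. That means two things. First, it consumes tokens, and it does so in the context window, the most expensive place to spend them. Second, the model has to extract information from it to decide its next step. So the design goal comes down to one thing: <strong>a high signal-to-noise ratio</strong>.</p>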
<p>A bad example looks like this:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;data&#34;</span><span class="p">:</span> <span class="p">[{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;order_id&#34;</span><span class="p">:</span> <span class="s2">&#34;ord_8f3a2b1c-9d4e-4f5a-8b6c-1d2e3f4a5b6c&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;tenant_uuid&#34;</span><span class="p">:</span> <span class="s2">&#34;tn_a1b2c3d4&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;created_at_unix&#34;</span><span class="p">:</span> <span class="mi">1747300800</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;updated_at_unix&#34;</span><span class="p">:</span> <span class="mi">1747387200</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;row_version&#34;</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;status_code&#34;</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;_internal_flags&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;is_migrated&#34;</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nt">&#34;shard&#34;</span><span class="p">:</span> <span class="mi">3</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}]</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
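</div><p>Seeing this, the model has to work things out for itself: what does <code>status_code: 2</code> mean? How does <code>created_at_unix</code> convert into a readable time? Should <code>tenant_uuid</code> be carried into the next step? All of this is noise, and each item is a potential point of failure.</p>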
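<p>Anthropic states the principle plainly: <strong>return human-readable fields, not low-level technical identifiers.</strong> Fields like <code>name</code>, <code>status</code>, and <code>created_at</code> (written as a readable time) directly inform the model's next action; <code>uuid</code>, <code>mime_type</code>, and <code>row_version</code> do not, and only take up space.</p>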
<p>A good example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;orders&#34;</span><span class="p">:</span> <span class="p">[{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;id&#34;</span><span class="p">:</span> <span class="s2">&#34;ord_8f3a2b1c&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;status&#34;</span><span class="p">:</span> <span class="s2">&#34;shipped&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;created_at&#34;</span><span class="p">:</span> <span class="s2">&#34;2026-05-15 14:00&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;total&#34;</span><span class="p">:</span> <span class="s2">&#34;¥299.00&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">&#34;items_summary&#34;</span><span class="p">:</span> <span class="s2">&#34;Wireless earbuds x1&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="p">}],</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;total_count&#34;</span><span class="p">:</span> <span class="mi">47</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;showing&#34;</span><span class="p">:</span> <span class="s2">&#34;1-10&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;hint&#34;</span><span class="p">:</span> <span class="s2">&#34;37 more results; add status or a narrower time range to narrow them down&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
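</div><p>Note the final <code>hint</code> field. <strong>A return value is not just data; it is also a next-step prompt for the model.</strong> When there are too many results, do not return all 47 and blow up the context; return 10 plus a line saying &quot;37 more, here is how to filter.&quot; Anthropic lists pagination, range filtering, and truncation as mechanisms for this, and the core idea is the same in each: do not let the model drown in data, and actively steer it toward narrower, more token-efficient queries.</p>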
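<p>The shaping step that turns raw rows into a reply like the one above can be sketched as follows. The <code>STATUS_NAMES</code> mapping, the UTC formatting, and the idea that the 12-character ID prefix is enough to identify an order are all assumptions of this sketch; the input field names come from the bad example earlier.</p>

```python
from datetime import datetime, timezone

# Hypothetical mapping from internal status codes to readable names.
STATUS_NAMES = {1: "pending", 2: "shipped", 3: "delivered", 4: "cancelled"}


def shape_orders(rows: list[dict], page_size: int = 10) -> dict:
    """Keep readable fields only, truncate to one page, and add a hint."""
    page = rows[:page_size]
    orders = []
    for row in page:
        created = datetime.fromtimestamp(row["created_at_unix"], tz=timezone.utc)
        orders.append({
            # Assumes the short prefix is accepted as an ID by follow-up tools.
            "id": row["order_id"][:12],
            "status": STATUS_NAMES.get(row["status_code"], "unknown"),
            "created_at": created.strftime("%Y-%m-%d %H:%M UTC"),
        })
    result = {
        "orders": orders,
        "total_count": len(rows),
        "showing": f"1-{len(page)}" if page else "0",
    }
    remaining = len(rows) - len(page)
    if remaining > 0:
        # Only spend tokens on a hint when there is something left to narrow.
        result["hint"] = (
            f"{remaining} more results; add status or a narrower "
            "time range to narrow them down"
        )
    return result
```

<p>Fields such as <code>tenant_uuid</code> and <code>row_version</code> never reach the model because the function builds the output from an allowlist of fields rather than deleting known-bad ones.</p>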
<p>The diagram below shows the trade-offs in return-value design:</p>
<pre class="mermaid">flowchart TD
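  A[Tool gets raw result] --> B{Is the result large?}
  B -->|Small| C[Return readable fields directly]
  B -->|Large| D[Truncate + paginate]
  D --> E["Attach hint: how to narrow the scope"]
  C --> F["Strip uuid / timestamps / internal flags"]
  E --> F
  F --> G[Enter model context]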
  style F fill:#fde7c2,stroke:#e8b23c
  style E fill:#fde7c2,stroke:#e8b23c
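</pre><p>The two orange boxes, <strong>stripping noise fields</strong> and <strong>attaching a guiding hint</strong>, are the steps most easily skipped and the ones that affect results the most.</p>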
<h2 id="错误怎么回错误信息是给模型的操作手册">Returning errors: an error message is the model's &quot;operating manual&quot;</h2>
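<p>Failed tool calls are the norm, not the exception. The model passes a bad parameter, the requested resource does not exist, a rate limit trips; these happen every day. What really determines an agent's resilience is <strong>whether it can pick itself up after an error</strong>, and that depends on how your error messages are written.</p>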
<p>Bad examples:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&#34;Invalid input&#34;</span><span class="p">)</span>          <span class="c1"># Model: which input? What was wrong with it?</span>
</span></span><span class="line"><span class="cl"><span class="k">return</span> <span class="p">{</span><span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="s2">&#34;ERR_4012&#34;</span><span class="p">}</span>                 <span class="c1"># Model: how would I know what 4012 means?</span>
</span></span><span class="line"><span class="cl"><span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="n">traceback</span><span class="o">.</span><span class="n">format_exc</span><span class="p">())</span>      <span class="c1"># Model: half a screen of tokens gone, and I still don't know what to do</span>
</span></span></code></pre></td></tr></table>
</div>
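</div><p>All three share one problem: <strong>after reading them, the model does not know what to do next.</strong> It either gives up or retries with the same bad arguments and gets stuck in a loop.</p>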
<p>A good error message meets one standard: <strong>after reading it, the model knows how to fix the call</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Good: say what went wrong + give an actionable next step</span>
</span></span><span class="line"><span class="cl"><span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="s2">&#34;Invalid value &#39;shipping&#39; for parameter status&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="s2">&#34;valid_values&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;pending&#34;</span><span class="p">,</span> <span class="s2">&#34;shipped&#34;</span><span class="p">,</span> <span class="s2">&#34;delivered&#34;</span><span class="p">,</span> <span class="s2">&#34;cancelled&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">  <span class="s2">&#34;hint&#34;</span><span class="p">:</span> <span class="s2">&#34;You probably meant &#39;shipped&#39;&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="s2">&#34;error&#34;</span><span class="p">:</span> <span class="s2">&#34;No user found for user_id &#39;u_999&#39;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="s2">&#34;hint&#34;</span><span class="p">:</span> <span class="s2">&#34;Check that the ID is correct, or call search_users first to look up the ID by username&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
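</div><p>As Anthropic puts it, you can <strong>prompt-engineer your error responses</strong>: write them as clear, actionable suggestions for improvement rather than opaque error codes or stack traces. A good error message also tells the model which tool to call next, as the <code>search_users</code> hint above does. In effect, the error message becomes one more entry point for steering the model.</p>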
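<p>One more point that is often overlooked: <strong>errors should save tokens too.</strong> Do not send back the whole Python traceback; those few hundred tokens carry almost no information for the model. One plain sentence is enough.</p>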
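<p>Both ideas, an actionable hint and a one-sentence error instead of a traceback, can be sketched with the standard library. The helper names <code>status_error</code> and <code>safe_call</code> are hypothetical, and using <code>difflib.get_close_matches</code> to guess the intended value is one possible approach, not something the sources prescribe.</p>

```python
import difflib

VALID_STATUSES = ["pending", "shipped", "delivered", "cancelled"]


def status_error(value: str) -> dict:
    """Build a short, actionable error for an invalid status value."""
    error = {
        "error": f"Invalid value {value!r} for parameter status",
        "valid_values": VALID_STATUSES,
    }
    # Suggest the closest valid value, if any is similar enough.
    guess = difflib.get_close_matches(value.lower(), VALID_STATUSES, n=1)
    if guess:
        error["hint"] = f"You probably meant {guess[0]!r}"
    return error


def safe_call(tool, **kwargs) -> dict:
    """Run a tool; on failure return one plain sentence, never a traceback."""
    try:
        return tool(**kwargs)
    except Exception as exc:
        return {"error": f"{tool.__name__} failed: {exc}"}
```

<p>The wrapper is only as good as the exception messages underneath it, so the tools themselves still need to raise errors that say what to change.</p>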
<h2 id="工具粒度太细太粗都不行">Tool granularity: neither too fine nor too coarse</h2>
<p>The last topic, and the hardest: how big should a tool be?</p>
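<p><strong>The trap of cutting too fine.</strong> Splitting <code>get_user</code>, <code>get_user_orders</code>, and <code>get_order_detail</code> into three separate tools sounds nicely &quot;single-responsibility.&quot; But to answer &quot;where is this user's latest order?&quot; the agent has to make three calls in a row: one for the user, one for the order list, one for the detail. That is three round trips and three return values piled into the context, and a wrong choice at any step means starting over. <strong>When tools are too fine-grained, the model is forced to do orchestration, which is exactly where it is most error-prone.</strong></p>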
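<p><strong>The trap of cutting too coarse.</strong> Go the other way and build an all-purpose <code>manage_order</code> that switches between &quot;query / create / refund / change address&quot; through an <code>action</code> parameter. On every call the model must first work out what <code>action</code> should be and which parameters go with it, and the description grows too long to read. A single tool with that much authority is also hard to audit and to put guardrails on: you cannot give an agent &quot;query&quot; permission without also giving it &quot;refund&quot; permission.</p>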
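<p>The permission point is easier to see in code. Once each intent is its own tool, access control reduces to a per-agent allowlist. The sketch below assumes hypothetical tool functions, agent names, and a <code>tools_for</code> helper; none of them come from a real framework.</p>

```python
# Hypothetical narrowly scoped tools; the bodies are stand-ins.
def get_order_status(order_id: str) -> dict:
    return {"id": order_id, "status": "shipped"}


def refund_order(order_id: str) -> dict:
    return {"id": order_id, "refunded": True}


REGISTRY = {"get_order_status": get_order_status, "refund_order": refund_order}

# Hypothetical agents and the tools each one may call.
AGENT_ALLOWLIST = {
    "support_readonly": {"get_order_status"},
    "support_refunds": {"get_order_status", "refund_order"},
}


def tools_for(agent: str) -> dict:
    """Return only the tools this agent is allowed to call."""
    allowed = AGENT_ALLOWLIST.get(agent, set())
    return {name: fn for name, fn in REGISTRY.items() if name in allowed}
```

<p>With a single <code>manage_order(action=...)</code> tool, the same restriction would have to be enforced inside the tool by inspecting <code>action</code>, which is harder to audit.</p>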
<p>My rule of thumb: <strong>cut by &quot;user intent&quot;, not by &quot;database table&quot; and not by &quot;one super action.&quot;</strong></p>
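<table>
  <thead>
      <tr>
          <th>How to cut</th>
          <th>Example</th>
          <th>Result</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>By table (too fine)</td>
          <td><code>get_user</code> / <code>get_orders</code> / <code>get_items</code></td>
          <td>The model is forced to orchestrate several calls, which is error-prone</td>
      </tr>
      <tr>
          <td>By super action (too coarse)</td>
          <td><code>manage_order(action=...)</code></td>
          <td>Coupled parameters, bloated description, permissions hard to control</td>
      </tr>
      <tr>
          <td><strong>By intent (recommended)</strong></td>
          <td><code>get_order_status(order_id)</code> returns order + shipping + item summary in one call</td>
          <td>One call answers one complete question</td>
      </tr>
  </tbody>
</table>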
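<p>An intent-level tool in the spirit of the recommended row might look like the sketch below. The three in-memory dicts stand in for the order, item, and shipping lookups; their contents, the field names, and the <code>search_orders</code> hint are assumptions for illustration.</p>

```python
# Hypothetical in-memory stand-ins for three backend lookups.
ORDERS = {"ord_1": {"status": "shipped", "item_ids": ["sku_9"], "tracking_no": "TRK1"}}
ITEMS = {"sku_9": "Wireless earbuds"}
TRACKING = {"TRK1": "Out for delivery"}


def get_order_status(order_id: str) -> dict:
    """Answer 'where is this order?' in one call by joining three lookups."""
    order = ORDERS.get(order_id)
    if order is None:
        return {
            "error": f"No order found for order_id {order_id!r}",
            "hint": "Confirm the ID, or use search_orders to find the user's orders first",
        }
    # The joins happen here, in deterministic code, not in the model's head.
    items = ", ".join(ITEMS[item_id] for item_id in order["item_ids"])
    return {
        "id": order_id,
        "status": order["status"],
        "shipping": TRACKING.get(order["tracking_no"], "no tracking information yet"),
        "items_summary": items,
    }
```

<p>The orchestration still exists, but it now lives in deterministic code where it can be unit-tested, and the model sees one call with one compact result.</p>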
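<p>The test is simple: <strong>picture a real user question and count how many tool calls the agent needs to answer it.</strong> If a common question takes four or five calls, your tools are probably cut too fine; if a tool's description has to fill a screen before it makes sense, it is almost certainly cut too coarse.</p>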
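<p>The &quot;evaluation-driven development&quot; that Anthropic keeps stressing is especially useful here: first run a batch of evaluations on real tasks, see where the agent gets stuck and how many detours it takes, then go back and adjust tool granularity. Tool design is not something you get right in one pass; it is tested and revised into shape.</p>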
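<p>Counting calls per task does not need a framework to get started. The sketch below assumes you already log, for each evaluation task, the ordered list of tool names the agent called; the <code>TRANSCRIPTS</code> data and the threshold of three calls are made up for illustration.</p>

```python
from collections import Counter

# Hypothetical transcripts: task -> ordered tool calls the agent made.
TRANSCRIPTS = {
    "where is my latest order": ["get_user", "get_user_orders", "get_order_detail"],
    "what did u_123 buy last week": ["search_orders"],
    "refund order ord_1": ["get_user", "get_user_orders", "get_order_detail", "manage_order"],
}


def flag_chatty_tasks(transcripts: dict[str, list[str]], max_calls: int = 3) -> list[str]:
    """Return the tasks that needed more than max_calls tool calls."""
    return [task for task, calls in transcripts.items() if len(calls) > max_calls]


def tool_usage(transcripts: dict[str, list[str]]) -> Counter:
    """Count how often each tool appears across all transcripts."""
    return Counter(call for calls in transcripts.values() for call in calls)
```

<p>Tasks that get flagged, and tools that always appear together in the same order, are the first candidates for merging into one intent-level tool.</p>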
<h2 id="几条收尾的话">A few closing words</h2>
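<p>Taken apart, the above is five topics; taken together it is one shift in perspective: <strong>you are not writing an interface for a program; you are writing an operating manual for an &quot;intern&quot; who can read, makes mistakes, and has limited context.</strong></p>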
<p>Day to day, this is how I would prioritize:</p>
<ol>
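<li><strong>Fix return values first.</strong> Drop uuids, timestamps, and internal flags, and keep only readable fields. It costs next to nothing and the payoff is immediate.</li>
<li><strong>Then fix error messages.</strong> Rewrite every error as &quot;what went wrong + what to do next&quot;. Most of an agent's resilience comes from this.</li>
<li><strong>Next, sort out granularity.</strong> Cut by intent, and measure call counts on real tasks.</li>
<li><strong>Finally, polish descriptions and parameters.</strong> Add examples, use enums, provide defaults.</li>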
</ol>
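<p>Do not start by swapping models. First make the part you control 100%, the tools, solid; then talk about model selection. Often a mid-size model with a good set of tools runs far more reliably than a large model with a bad set, and costs less.</p>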
<hr>
<p><strong>References</strong></p>
<ul>
<li><a href="https://www.anthropic.com/engineering/writing-tools-for-agents">Writing effective tools for AI agents — Anthropic Engineering</a></li>
<li><a href="https://www.anthropic.com/engineering/advanced-tool-use">Introducing advanced tool use on the Claude Developer Platform — Anthropic</a></li>
<li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective context engineering for AI agents — Anthropic</a></li>
<li><a href="https://modelcontextprotocol.info/docs/tutorials/writing-effective-tools/">Writing Effective Tools for Agents: Complete MCP Development Guide</a></li>
</ul>
]]></content:encoded></item></channel></rss>