docs: add fix-double-finalize-and-bindings-api implementation plan
wren
2026-04-28 11:44:31 +08:00
parent 1b4e0ec00a
commit be9fc4856b
15 changed files with 5733 additions and 0 deletions
@@ -0,0 +1,241 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AuditCtx: Structure and Usage Quick Reference</title>
<style>
:root {
--c-bg: #fafaf8;
--c-panel: #ffffff;
--c-text: #1a1a1a;
--c-muted: #666;
--c-subtle: #999;
--c-border: #e5e5e0;
--c-accent: #2a4d6e;
--c-accent-soft: #eaf0f6;
--c-code-bg: #2d2d2a;
--c-code-text: #f0f0e8;
--c-inline-bg: #f0ede5;
--c-inline-text: #5a3a1a;
--font-cn: "PingFang SC", "Microsoft YaHei", system-ui, sans-serif;
--font-mono: "JetBrains Mono", "SF Mono", Menlo, monospace;
}
* { box-sizing: border-box; }
html, body {
margin: 0; padding: 0;
background: var(--c-bg); color: var(--c-text);
font-family: var(--font-cn); line-height: 1.7; font-size: 15px;
}
.layout {
max-width: 880px;
margin: 0 auto;
padding: 48px 56px 80px;
}
h1 {
font-size: 28px; font-weight: 600; margin: 0 0 6px;
border-bottom: 3px solid var(--c-accent); padding-bottom: 14px;
}
h2 {
font-size: 20px; font-weight: 600;
margin: 40px 0 14px;
padding-left: 12px;
border-left: 4px solid var(--c-accent);
}
.subtitle { color: var(--c-muted); font-size: 14px; margin-bottom: 16px; }
.lead {
background: var(--c-accent-soft);
border-left: 4px solid var(--c-accent);
padding: 14px 20px;
margin: 20px 0 28px;
border-radius: 0 6px 6px 0;
}
.lead strong { color: var(--c-accent); }
p { margin: 10px 0; }
ul { padding-left: 22px; margin: 8px 0; }
ul li { margin: 5px 0; }
ul li::marker { color: var(--c-accent); }
code {
font-family: var(--font-mono);
background: var(--c-inline-bg);
color: var(--c-inline-text);
padding: 1px 6px; border-radius: 3px;
font-size: 13px;
}
pre {
background: var(--c-code-bg);
color: var(--c-code-text);
padding: 16px 20px;
border-radius: 6px;
overflow-x: auto;
margin: 12px 0;
font-family: var(--font-mono);
font-size: 13px;
line-height: 1.55;
}
pre code {
background: transparent; color: inherit; padding: 0; font-size: inherit;
}
.cmt { color: #8a9a7a; }
table {
width: 100%; border-collapse: collapse; margin: 12px 0;
background: var(--c-panel);
border: 1px solid var(--c-border);
border-radius: 6px; overflow: hidden;
font-size: 14px;
}
th, td {
padding: 9px 13px; text-align: left; vertical-align: top;
border-bottom: 1px solid var(--c-border);
}
th {
background: #f5f3ec; font-weight: 600;
color: var(--c-muted); font-size: 13px;
}
tr:last-child td { border-bottom: none; }
td code { font-size: 12px; }
.fileref {
font-family: var(--font-mono); font-size: 12px;
background: #efece4; color: #5a3a1a;
padding: 1px 6px; border-radius: 3px;
border: 1px solid #d8d2c0;
}
.note {
background: var(--c-accent-soft);
border-left: 4px solid var(--c-accent);
padding: 10px 16px; margin: 14px 0;
border-radius: 0 6px 6px 0;
font-size: 14px;
}
.note .tag { font-weight: 700; color: var(--c-accent); margin-right: 6px; }
.field-cat-身份 { color: #2a5a8a; font-weight: 600; }
.field-cat-装配 { color: #80560a; font-weight: 600; }
.field-cat-产物 { color: #4a7050; font-weight: 600; }
.field-cat-诊断 { color: #888; font-weight: 600; }
</style>
</head>
<body>
<div class="layout">
<h1>AuditCtx: Structure and Usage Quick Reference</h1>
<div class="subtitle">The immutable context of the LeAudit service orchestration layer · 2026-04-27</div>
<div class="lead">
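<strong>What it is:</strong>
<code>AuditCtx</code> (<span class="fileref">src/leaudit/services/audit_ctx.py</span>) is the <strong>immutable skeleton</strong> of a single audit run: a <code>@dataclass(frozen=True)</code> that is constructed once and passed from service to service. Each stage derives a new ctx with <code>with_xxx()</code>; the old ctx is left unchanged.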
</div>
<h2>Structure</h2>
<table>
<thead><tr><th style="width:24%">Field</th><th style="width:14%">Category</th><th style="width:32%">Type</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>document_id</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>str</code></td><td>ID of this run</td></tr>
<tr><td><code>rules_file</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>RulesFile | None</code></td><td>Audit rules; when None, the classifier decides</td></tr>
<tr><td><code>file_path</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>str | None</code></td><td>Local file path</td></tr>
<tr><td><code>page_range</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>tuple[int,...] | None</code></td><td>Page range of the sub-document</td></tr>
<tr><td><code>services</code></td><td><span class="field-cat-装配">Wiring</span></td><td><code>AuditServices</code></td><td>Container for all services / clients</td></tr>
<tr><td><code>config</code></td><td><span class="field-cat-装配">Wiring</span></td><td><code>AuditConfig</code></td><td>Runtime knobs</td></tr>
<tr><td><code>normalized_doc</code></td><td><span class="field-cat-产物">Output</span></td><td><code>NormalizedDocument | None</code></td><td>OCR + classification + segmentation</td></tr>
<tr><td><code>extraction</code></td><td><span class="field-cat-产物">Output</span></td><td><code>ExtractionBundle | None</code></td><td>Extracted fields / multi-entity / derived values</td></tr>
<tr><td><code>phase</code></td><td><span class="field-cat-产物">Output</span></td><td><code>str | None</code></td><td><code>draft</code> / <code>executed</code></td></tr>
<tr><td><code>evaluation</code></td><td><span class="field-cat-产物">Output</span></td><td><code>EvaluationResult | None</code></td><td>Audit verdict for each rule</td></tr>
<tr><td><code>fallback_tasks</code></td><td><span class="field-cat-产物">Output</span></td><td><code>tuple[RescueTask,...]</code></td><td>Rescue tasks for failed rules</td></tr>
<tr><td><code>extraction_errors</code></td><td><span class="field-cat-诊断">Diagnostics</span></td><td><code>tuple[str,...]</code></td><td>Extraction error log</td></tr>
<tr><td><code>timing</code></td><td><span class="field-cat-诊断">Diagnostics</span></td><td><code>Mapping[str,float]</code></td><td>Elapsed time per stage</td></tr>
</tbody>
</table>
<div class="note"><span class="tag">Immutability:</span><code>frozen=True</code> prevents field rebinding; <code>__post_init__</code> coerces list/dict values to <code>tuple</code> / <code>MappingProxyType</code>, which blocks write-through mutations such as <code>ctx.timing["x"] = 1</code>.</div>
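<p>The two layers of protection described in the note can be sketched as follows. <code>MiniCtx</code> is a hypothetical two-field stand-in, not the real <code>AuditCtx</code>; the actual <code>__post_init__</code> may coerce its fields differently.</p>

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class MiniCtx:
    """Hypothetical stand-in for AuditCtx, reduced to two fields."""

    extraction_errors: tuple = ()
    timing: Mapping = field(default_factory=dict)

    def __post_init__(self):
        # frozen=True blocks normal assignment, so the coercion has to go
        # through object.__setattr__.
        object.__setattr__(self, "extraction_errors", tuple(self.extraction_errors))
        object.__setattr__(self, "timing", MappingProxyType(dict(self.timing)))


ctx = MiniCtx(extraction_errors=["bad page"], timing={"ocr": 1.2})

try:
    ctx.timing["x"] = 1  # write-through attempt on a container
    write_through_blocked = False
except TypeError:
    write_through_blocked = True

try:
    ctx.timing = {}  # field rebinding attempt
    rebind_blocked = False
except AttributeError:  # FrozenInstanceError is a subclass of AttributeError
    rebind_blocked = True
```

<p>Copying into a fresh <code>dict</code> before wrapping also means later changes to the caller's original dict do not leak into the ctx.</p>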
<h2>Usage</h2>
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">1. Construction</h3>
<pre><code>from leaudit.services.audit_ctx import AuditCtx
from leaudit.services.audit_services import AuditServices
from leaudit.config.audit_config import AuditConfig

<span class="cmt"># llm, vlm, ocr, the *_svc objects and rules are built elsewhere</span>
services = AuditServices(
    llm_client=llm, vlm_client=vlm, ocr_client=ocr,
    normalization=norm_svc, extraction=ext_svc, evaluation=eval_svc,
)
ctx = AuditCtx(
    document_id="my_doc",
    rules_file=rules,
    services=services,
    file_path="/path/to/doc.pdf",
    config=AuditConfig(group_size=8),
)</code></pre>
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">2. Evolution (once per stage)</h3>
<pre><code>ctx = ctx.with_normalized_doc(ocr_result)  <span class="cmt"># Stage 1</span>
ctx = ctx.with_extraction(extraction)      <span class="cmt"># Stage 3</span>
ctx = ctx.with_phase("executed")           <span class="cmt"># Stage 4</span>
ctx = ctx.with_evaluation(evaluation)      <span class="cmt"># Stage 5</span>
ctx = ctx.with_fallback_task(task)         <span class="cmt"># Stage 6 (once per failed rule)</span>
ctx = ctx.with_timing(ocr=1.2, total=8.5)  <span class="cmt"># accumulate timings</span></code></pre>
<p>All <code>with_xxx()</code> methods are thin wrappers around <code>dataclasses.replace()</code>: each <strong>returns a new object</strong> and leaves the old ctx unchanged, so capture the return value with <code>ctx = ctx.with_...()</code>.</p>
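<p>A minimal sketch of what such wrappers might look like. The class below is a hypothetical reduction of <code>AuditCtx</code>, and the merge behaviour of <code>with_timing</code> (later keys override earlier ones) is an assumption, not something this page confirms.</p>

```python
from dataclasses import dataclass, field, replace
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class MiniCtx:
    """Hypothetical reduced ctx; the real AuditCtx has more fields."""

    document_id: str
    phase: Optional[str] = None
    fallback_tasks: tuple = ()
    timing: Mapping = field(default_factory=dict)

    def __post_init__(self):
        object.__setattr__(self, "fallback_tasks", tuple(self.fallback_tasks))
        object.__setattr__(self, "timing", MappingProxyType(dict(self.timing)))

    def with_phase(self, phase):
        return replace(self, phase=phase)

    def with_fallback_task(self, task):
        # Append semantics: one call per failed rule.
        return replace(self, fallback_tasks=self.fallback_tasks + (task,))

    def with_timing(self, **seconds):
        # Assumed merge semantics: existing keys are kept, new keys win.
        return replace(self, timing={**self.timing, **seconds})


old = MiniCtx(document_id="my_doc")
new = old.with_phase("executed").with_fallback_task("rule-7").with_timing(ocr=1.2)
new = new.with_timing(total=8.5)
```

<p>Because <code>replace()</code> re-runs <code>__init__</code> and therefore <code>__post_init__</code>, every derived ctx gets the same container coercion as a freshly constructed one.</p>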
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">3. Reading</h3>
<pre><code>ocr_result = ctx.normalized_doc
fields = ctx.extraction.fields
phase = ctx.phase
score = ctx.evaluation.total_score
ocr_secs = ctx.timing.get("ocr", 0.0)
text = ctx.source_text <span class="cmt"># derived property, equivalent to ctx.extraction.source_text</span></code></pre>
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">4. Two bridging APIs</h3>
<table>
<thead><tr><th style="width:30%">API</th><th>Purpose</th></tr></thead>
<tbody>
<tr>
<td><code>ctx.effective_config</code></td>
<td>Merges <code>ctx.config</code> + <code>Settings</code>. <strong>Business code reads only this</strong>; calling <code>get_settings()</code> / <code>os.getenv</code> directly is forbidden (the architecture guard lives in <span class="fileref">tests/architecture/</span>).</td>
</tr>
<tr>
<td><code>ctx.as_rescue_inputs()</code></td>
<td>Converts the ctx into the engine's legacy <code>RescueInputs</code> shape, bridging to <code>engine._try_ai_rescue</code> without changing the engine.</td>
</tr>
</tbody>
</table>
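<p>One way the <code>ctx.config</code> + <code>Settings</code> merge behind <code>effective_config</code> could work is sketched below. Everything here is assumed for illustration: the field <code>ocr_timeout_s</code>, the "None means not set" convention, and the precedence rule (a per-run value wins over the global one) are not taken from the LeAudit source.</p>

```python
from dataclasses import dataclass, fields
from types import MappingProxyType
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical process-wide defaults, e.g. loaded once at startup."""

    group_size: int = 4
    ocr_timeout_s: float = 30.0


@dataclass(frozen=True)
class AuditConfig:
    """Hypothetical per-run knobs; None means 'not set for this run'."""

    group_size: Optional[int] = None
    ocr_timeout_s: Optional[float] = None


def effective_config(config, settings):
    # Assumed precedence: a per-run value wins, otherwise fall back to Settings.
    merged = {}
    for f in fields(settings):
        override = getattr(config, f.name, None)
        merged[f.name] = getattr(settings, f.name) if override is None else override
    # Read-only view, in keeping with the immutable ctx.
    return MappingProxyType(merged)


cfg = effective_config(AuditConfig(group_size=8), Settings())
```

<p>Funnelling every read through one merged view is what makes the "no direct <code>get_settings()</code>" rule enforceable: there is a single place where precedence is decided.</p>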
<h2>Typical call: AuditService.audit(ctx)</h2>
<p>All stages are already orchestrated inside <code>AuditService.audit()</code> (<span class="fileref">src/leaudit/services/audit_service.py:152</span>). For ordinary use, construct a ctx and call it:</p>
<pre><code>ctx = AuditCtx(document_id="...", rules_file=rules, services=services,
               file_path="...", config=cfg)
ctx = await audit_service.audit(ctx)
<span class="cmt"># after the run, ctx is populated with normalized_doc / extraction / phase / evaluation / fallback_tasks / timing</span></code></pre>
<p>The seven stages, in order: <code>Normalize</code> → <code>Resolve rules</code> → <code>Extract</code> → <code>Phase determination</code> → <code>Evaluate</code> → <code>Rescue</code> → <code>Finalize</code></p>
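<p>The stage sequence can be pictured as a loop that threads an immutable ctx through async stage functions. This is only a sketch of the pattern: <code>MiniCtx</code>, <code>make_stage</code> and the <code>trail</code> field are hypothetical, and the real <code>AuditService.audit()</code> is not shown on this page.</p>

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniCtx:
    """Hypothetical reduced ctx that only records which stages ran."""

    document_id: str
    trail: tuple = ()


def make_stage(name):
    async def stage(ctx):
        # A stage never mutates its input; it returns a derived ctx.
        return replace(ctx, trail=ctx.trail + (name,))

    return stage


STAGES = tuple(
    make_stage(name)
    for name in (
        "normalize", "resolve_rules", "extract", "phase",
        "evaluate", "rescue", "finalize",
    )
)


async def audit(ctx):
    # Each stage receives the ctx returned by the previous one.
    for stage in STAGES:
        ctx = await stage(ctx)
    return ctx


start = MiniCtx(document_id="my_doc")
done = asyncio.run(audit(start))
```

<p>Since <code>start</code> is never modified, a caller can keep it around and re-run the pipeline from the same starting point.</p>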
<h2>Three rules</h2>
<ul>
<li>Adding an output field → add the field to <code>audit_ctx.py</code> plus one <code>with_xxx</code> helper</li>
<li>Reading runtime configuration → use <code>ctx.effective_config</code>; <strong>do not</strong> call <code>get_settings()</code> directly</li>
<li>Changing stage behavior → <strong>must</strong> return a new ctx; <strong>do not</strong> modify fields in place</li>
</ul>
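<p>The second rule is the kind of constraint an architecture guard can check mechanically. The sketch below shows one possible approach using the standard <code>ast</code> module; it is not the actual test under <code>tests/architecture/</code>, and <code>find_forbidden_calls</code> is a hypothetical helper name.</p>

```python
import ast

# Call names that business code is not supposed to use directly.
FORBIDDEN_CALLS = {"get_settings", "getenv"}


def find_forbidden_calls(source):
    """Return sorted (line, name) pairs for each call to a forbidden function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Covers bare names (get_settings()) and attribute calls (os.getenv()).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in FORBIDDEN_CALLS:
            hits.append((node.lineno, name))
    return sorted(hits)


bad = "import os\nkey = os.getenv('API_KEY')\ncfg = get_settings()\n"
good = "def run(ctx):\n    return ctx.effective_config\n"

bad_hits = find_forbidden_calls(bad)
good_hits = find_forbidden_calls(good)
```

<p>This version matches on the call name alone, so it would also flag an unrelated <code>obj.getenv()</code> and would miss <code>os.environ[...]</code> lookups; a real guard would likely need to be stricter on both counts.</p>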
</div>
</body>
</html>