<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AuditCtx: Structure and Usage Quick Reference</title>
<style>
:root {
--c-bg: #fafaf8;
--c-panel: #ffffff;
--c-text: #1a1a1a;
--c-muted: #666;
--c-subtle: #999;
--c-border: #e5e5e0;
--c-accent: #2a4d6e;
--c-accent-soft: #eaf0f6;
--c-code-bg: #2d2d2a;
--c-code-text: #f0f0e8;
--c-inline-bg: #f0ede5;
--c-inline-text: #5a3a1a;
--font-cn: "PingFang SC", "Microsoft YaHei", system-ui, sans-serif;
--font-mono: "JetBrains Mono", "SF Mono", Menlo, monospace;
}
* { box-sizing: border-box; }
html, body {
margin: 0; padding: 0;
background: var(--c-bg); color: var(--c-text);
font-family: var(--font-cn); line-height: 1.7; font-size: 15px;
}
.layout {
max-width: 880px;
margin: 0 auto;
padding: 48px 56px 80px;
}
h1 {
font-size: 28px; font-weight: 600; margin: 0 0 6px;
border-bottom: 3px solid var(--c-accent); padding-bottom: 14px;
}
h2 {
font-size: 20px; font-weight: 600;
margin: 40px 0 14px;
padding-left: 12px;
border-left: 4px solid var(--c-accent);
}
.subtitle { color: var(--c-muted); font-size: 14px; margin-bottom: 16px; }
.lead {
background: var(--c-accent-soft);
border-left: 4px solid var(--c-accent);
padding: 14px 20px;
margin: 20px 0 28px;
border-radius: 0 6px 6px 0;
}
.lead strong { color: var(--c-accent); }
p { margin: 10px 0; }
ul { padding-left: 22px; margin: 8px 0; }
ul li { margin: 5px 0; }
ul li::marker { color: var(--c-accent); }
code {
font-family: var(--font-mono);
background: var(--c-inline-bg);
color: var(--c-inline-text);
padding: 1px 6px; border-radius: 3px;
font-size: 13px;
}
pre {
background: var(--c-code-bg);
color: var(--c-code-text);
padding: 16px 20px;
border-radius: 6px;
overflow-x: auto;
margin: 12px 0;
font-family: var(--font-mono);
font-size: 13px;
line-height: 1.55;
}
pre code {
background: transparent; color: inherit; padding: 0; font-size: inherit;
}
.cmt { color: #8a9a7a; }
table {
width: 100%; border-collapse: collapse; margin: 12px 0;
background: var(--c-panel);
border: 1px solid var(--c-border);
border-radius: 6px; overflow: hidden;
font-size: 14px;
}
th, td {
padding: 9px 13px; text-align: left; vertical-align: top;
border-bottom: 1px solid var(--c-border);
}
th {
background: #f5f3ec; font-weight: 600;
color: var(--c-muted); font-size: 13px;
}
tr:last-child td { border-bottom: none; }
td code { font-size: 12px; }
.fileref {
font-family: var(--font-mono); font-size: 12px;
background: #efece4; color: #5a3a1a;
padding: 1px 6px; border-radius: 3px;
border: 1px solid #d8d2c0;
}
.note {
background: var(--c-accent-soft);
border-left: 4px solid var(--c-accent);
padding: 10px 16px; margin: 14px 0;
border-radius: 0 6px 6px 0;
font-size: 14px;
}
.note .tag { font-weight: 700; color: var(--c-accent); margin-right: 6px; }
.field-cat-身份 { color: #2a5a8a; font-weight: 600; }
.field-cat-装配 { color: #80560a; font-weight: 600; }
.field-cat-产物 { color: #4a7050; font-weight: 600; }
.field-cat-诊断 { color: #888; font-weight: 600; }
</style>
</head>
<body>
<div class="layout">
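<h1>AuditCtx: Structure and Usage Quick Reference</h1>
<div class="subtitle">The immutable context of the LeAudit service orchestration layer · 2026-04-27</div>
<div class="lead">
<strong>What it is:</strong>
<code>AuditCtx</code> (<span class="fileref">src/leaudit/services/audit_ctx.py</span>) is the <strong>immutable skeleton</strong> of a single audit run, declared with <code>@dataclass(frozen=True)</code>. It is constructed once and passed from service to service; each stage derives a new ctx with <code>with_xxx()</code> and leaves the old ctx unchanged.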
</div>
<h2>Structure</h2>
<table>
<thead><tr><th style="width:24%">Field</th><th style="width:14%">Category</th><th style="width:32%">Type</th><th>Description</th></tr></thead>
<tbody>
<tr><td><code>document_id</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>str</code></td><td>ID of this run</td></tr>
<tr><td><code>rules_file</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>RulesFile | None</code></td><td>Audit rules; when <code>None</code>, the classifier decides</td></tr>
<tr><td><code>file_path</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>str | None</code></td><td>Local file path</td></tr>
<tr><td><code>page_range</code></td><td><span class="field-cat-身份">Identity</span></td><td><code>tuple[int,...] | None</code></td><td>Page range of a sub-document</td></tr>
<tr><td><code>services</code></td><td><span class="field-cat-装配">Wiring</span></td><td><code>AuditServices</code></td><td>Container for all services / clients</td></tr>
<tr><td><code>config</code></td><td><span class="field-cat-装配">Wiring</span></td><td><code>AuditConfig</code></td><td>Runtime knobs</td></tr>
<tr><td><code>normalized_doc</code></td><td><span class="field-cat-产物">Product</span></td><td><code>NormalizedDocument | None</code></td><td>OCR + classification + segmentation</td></tr>
<tr><td><code>extraction</code></td><td><span class="field-cat-产物">Product</span></td><td><code>ExtractionBundle | None</code></td><td>Extracted fields / multiple entities / derived values</td></tr>
<tr><td><code>phase</code></td><td><span class="field-cat-产物">Product</span></td><td><code>str | None</code></td><td><code>draft</code> or <code>executed</code></td></tr>
<tr><td><code>evaluation</code></td><td><span class="field-cat-产物">Product</span></td><td><code>EvaluationResult | None</code></td><td>Audit verdict for each rule</td></tr>
<tr><td><code>fallback_tasks</code></td><td><span class="field-cat-产物">Product</span></td><td><code>tuple[RescueTask,...]</code></td><td>Rescue tasks for failed rules</td></tr>
<tr><td><code>extraction_errors</code></td><td><span class="field-cat-诊断">Diagnostics</span></td><td><code>tuple[str,...]</code></td><td>Extraction error log</td></tr>
<tr><td><code>timing</code></td><td><span class="field-cat-诊断">Diagnostics</span></td><td><code>Mapping[str,float]</code></td><td>Elapsed time per stage</td></tr>
</tbody>
</table>
<div class="note"><span class="tag">Immutability:</span><code>frozen=True</code> prevents fields from being rebound; <code>__post_init__</code> coerces lists and dicts to <code>tuple</code> / <code>MappingProxyType</code>, which blocks write-through mutations such as <code>ctx.timing["x"] = 1</code>.</div>
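<p>The two layers of protection described in the note can be sketched as follows. This is a minimal illustration, not the real class: <code>CtxSketch</code> is a hypothetical stand-in that keeps only three of the fields from the table, and the use of <code>object.__setattr__</code> inside <code>__post_init__</code> is an assumption about how the coercion might be done on a frozen dataclass.</p>

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class CtxSketch:
    """Hypothetical stand-in for AuditCtx with a reduced field set."""

    document_id: str
    extraction_errors: tuple[str, ...] = ()
    timing: Mapping[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so the coercion has to go
        # through object.__setattr__ (assumed approach, not confirmed).
        object.__setattr__(self, "extraction_errors", tuple(self.extraction_errors))
        # Copy the dict first so later changes to the caller's dict do not leak in.
        object.__setattr__(self, "timing", MappingProxyType(dict(self.timing)))


# Callers may pass a list and a dict; the ctx stores read-only equivalents.
ctx = CtxSketch("my_doc", extraction_errors=["bad field"], timing={"ocr": 1.2})
```

<p>With this sketch, <code>ctx.document_id = "x"</code> raises <code>dataclasses.FrozenInstanceError</code>, and <code>ctx.timing["x"] = 1</code> raises <code>TypeError</code> because a mapping proxy does not support item assignment.</p>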
<h2>Usage</h2>
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">1. Construction</h3>
<pre><code>from leaudit.services.audit_ctx import AuditCtx
from leaudit.services.audit_services import AuditServices
from leaudit.config.audit_config import AuditConfig

services = AuditServices(
    llm_client=llm, vlm_client=vlm, ocr_client=ocr,
    normalization=norm_svc, extraction=ext_svc, evaluation=eval_svc,
)
ctx = AuditCtx(
    document_id="my_doc",
    rules_file=rules,
    services=services,
    file_path="/path/to/doc.pdf",
    config=AuditConfig(group_size=8),
)</code></pre>
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">2. Evolution (once per stage)</h3>
<pre><code>ctx = ctx.with_normalized_doc(ocr_result)   <span class="cmt"># Stage 1</span>
ctx = ctx.with_extraction(extraction)       <span class="cmt"># Stage 3</span>
ctx = ctx.with_phase("executed")            <span class="cmt"># Stage 4</span>
ctx = ctx.with_evaluation(evaluation)       <span class="cmt"># Stage 5</span>
ctx = ctx.with_fallback_task(task)          <span class="cmt"># Stage 6 (once per failed rule)</span>
ctx = ctx.with_timing(ocr=1.2, total=8.5)   <span class="cmt"># accumulate elapsed time</span></code></pre>
<p>Every <code>with_xxx()</code> is a thin wrapper around <code>dataclasses.replace()</code>: it <strong>returns a new object</strong> and leaves the old ctx unchanged. Always capture the return value with <code>ctx = ctx.with_...()</code>.</p>
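<p>As a rough illustration of what such a wrapper might look like, the sketch below defines two helpers on a hypothetical <code>EvolvingCtx</code> class. The method names mirror the ones listed above, but their bodies are assumptions, and <code>fallback_tasks</code> holds plain strings here instead of <code>RescueTask</code> objects.</p>

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvolvingCtx:
    """Hypothetical stand-in for AuditCtx; only two product fields are kept."""

    document_id: str
    phase: Optional[str] = None
    fallback_tasks: tuple[str, ...] = ()

    def with_phase(self, phase: str) -> "EvolvingCtx":
        # replace() copies every field and overrides only the named ones.
        return replace(self, phase=phase)

    def with_fallback_task(self, task: str) -> "EvolvingCtx":
        # Tuples are immutable, so "appending" builds a new, longer tuple.
        return replace(self, fallback_tasks=self.fallback_tasks + (task,))


old = EvolvingCtx("my_doc")
new = old.with_phase("executed").with_fallback_task("rule_7")
```

<p>After these calls <code>old</code> still has <code>phase=None</code> and an empty <code>fallback_tasks</code>; only <code>new</code> carries the changes, which is why dropping the return value silently loses the update.</p>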
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">3. Reading</h3>
<pre><code>ocr_result = ctx.normalized_doc
fields = ctx.extraction.fields
phase = ctx.phase
score = ctx.evaluation.total_score
ocr_secs = ctx.timing.get("ocr", 0.0)
text = ctx.source_text <span class="cmt"># derived property = ctx.extraction.source_text</span></code></pre>
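<p>A derived attribute such as <code>source_text</code> can be expressed as a read-only <code>@property</code> on the frozen dataclass. The sketch below uses hypothetical <code>ExtractionSketch</code> and <code>ReadingCtx</code> classes; returning <code>None</code> before extraction has run is an assumption, since the behavior of the real property in that case is not stated here.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionSketch:
    """Hypothetical stand-in for ExtractionBundle."""

    source_text: str


@dataclass(frozen=True)
class ReadingCtx:
    """Hypothetical stand-in for AuditCtx."""

    extraction: Optional[ExtractionSketch] = None

    @property
    def source_text(self) -> Optional[str]:
        # Assumption: return None until the extraction stage has filled the field.
        if self.extraction is None:
            return None
        return self.extraction.source_text


empty = ReadingCtx()
filled = ReadingCtx(extraction=ExtractionSketch(source_text="contract body"))
```

<p>Because product fields are typed <code>... | None</code>, code that reads <code>ctx.extraction.fields</code> or <code>ctx.evaluation.total_score</code> before the corresponding stage has run would hit an <code>AttributeError</code> on <code>None</code>.</p>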
<h3 style="font-size:15px; color:var(--c-accent); margin:20px 0 8px;">4. Two bridge APIs</h3>
<table>
<thead><tr><th style="width:30%">API</th><th>Purpose</th></tr></thead>
<tbody>
<tr>
<td><code>ctx.effective_config</code></td>
<td>Merges <code>ctx.config</code> + <code>Settings</code>. <strong>Business code reads only this</strong>; calling <code>get_settings()</code> / <code>os.getenv</code> directly is forbidden (the architecture guard lives in <span class="fileref">tests/architecture/</span>).</td>
</tr>
<tr>
<td><code>ctx.as_rescue_inputs()</code></td>
<td>Converts the ctx into the engine's legacy <code>RescueInputs</code> shape, bridging to <code>engine._try_ai_rescue</code> without changing the engine.</td>
</tr>
</tbody>
</table>
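<p>One plausible way to merge a per-run config with process-wide settings is sketched below. Everything in it is hypothetical: <code>SettingsSketch</code>, <code>ConfigSketch</code>, the <code>llm_timeout_s</code> field, and the precedence rule (a non-<code>None</code> value on the run config overrides the settings default) are assumptions made for illustration, not the documented behavior of <code>effective_config</code>.</p>

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical process-wide defaults."""

    group_size: int = 4
    llm_timeout_s: float = 30.0


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical per-run overrides; None means 'use the default'."""

    group_size: Optional[int] = None
    llm_timeout_s: Optional[float] = None


def effective_config(config: ConfigSketch, settings: SettingsSketch) -> SettingsSketch:
    merged = {}
    for f in fields(settings):
        override = getattr(config, f.name, None)
        # Assumed precedence: an explicit per-run value wins over the default.
        merged[f.name] = getattr(settings, f.name) if override is None else override
    return SettingsSketch(**merged)


eff = effective_config(ConfigSketch(group_size=8), SettingsSketch())
```

<p>Funnelling every read through one merged object is what makes a rule like "no <code>get_settings()</code> in business code" checkable by an architecture test.</p>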
<h2>Typical call: AuditService.audit(ctx)</h2>
<p>All stages are already orchestrated inside <code>AuditService.audit()</code> (<span class="fileref">src/leaudit/services/audit_service.py:152</span>). In ordinary use you only construct a ctx and call it:</p>
<pre><code>ctx = AuditCtx(document_id="...", rules_file=rules, services=services,
               file_path="...", config=cfg)
ctx = await audit_service.audit(ctx)
<span class="cmt"># after the run, ctx is populated with normalized_doc / extraction / phase / evaluation / fallback_tasks / timing</span></code></pre>
<p>The seven stages run in order: <code>Normalize</code> → <code>Resolve rules</code> → <code>Extract</code> → <code>Phase decision</code> → <code>Evaluate</code> → <code>Rescue</code> → <code>Finalize</code></p>
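<p>The way an immutable ctx is threaded through a sequence of async stages can be sketched as below. This is not the code of <code>AuditService.audit()</code>: <code>PipelineCtx</code>, <code>run_stages</code>, and the two placeholder stages are hypothetical, and recording one timing entry per stage is an assumption about how <code>timing</code> gets filled.</p>

```python
import asyncio
import time
from dataclasses import dataclass, field, replace
from types import MappingProxyType
from typing import Awaitable, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class PipelineCtx:
    """Hypothetical stand-in for AuditCtx."""

    document_id: str
    phase: Optional[str] = None
    timing: Mapping[str, float] = field(default_factory=lambda: MappingProxyType({}))


Stage = Callable[[PipelineCtx], Awaitable[PipelineCtx]]


async def normalize(ctx: PipelineCtx) -> PipelineCtx:
    return ctx  # placeholder: a real stage would attach normalized_doc


async def decide_phase(ctx: PipelineCtx) -> PipelineCtx:
    return replace(ctx, phase="executed")  # placeholder decision


async def run_stages(ctx: PipelineCtx, stages: Sequence[tuple[str, Stage]]) -> PipelineCtx:
    for name, stage in stages:
        start = time.perf_counter()
        ctx = await stage(ctx)  # each stage returns a new ctx
        elapsed = time.perf_counter() - start
        # Build a fresh read-only mapping instead of mutating the old one.
        ctx = replace(ctx, timing=MappingProxyType({**ctx.timing, name: elapsed}))
    return ctx


initial = PipelineCtx("my_doc")
final = asyncio.run(run_stages(initial, [("normalize", normalize), ("phase", decide_phase)]))
```

<p>The <code>initial</code> object is untouched after the run, so a caller can keep it for retries or comparison while <code>final</code> carries the accumulated products and timings.</p>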
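<h2>Three rules</h2>
<ul>
<li>Adding a product field → add the field to <code>audit_ctx.py</code> plus a matching <code>with_xxx</code> helper</li>
<li>Reading runtime configuration → use <code>ctx.effective_config</code>; <strong>do not</strong> call <code>get_settings()</code> directly</li>
<li>Changing stage behavior → you <strong>must</strong> return a new ctx; <strong>do not</strong> mutate fields in place</li>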
</ul>
</div>
</body>
</html>