# RAG DSL Bridge Merge Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Safely bring `gitea/jande-feature-dsl` RAG-backed rule DSL execution into current `wren-dev` without overwriting tenant, permission, or page-quality work already present.

**Architecture:** Add a shared `RagRetriever` service that both RAG chat and the leaudit evaluation pipeline can call. Wire the pipeline to pass a retriever into `leaudit.evaluate_extraction`, where the installed/local `leaudit` package already supports `stage.rag.query_template`. Keep current tenant-aware RAG chat code as the source of truth and only replace duplicate retrieval helpers with calls into the shared retriever.

**Tech Stack:** Python 3.12, FastAPI, SQLAlchemy async sessions, Chroma, httpx, pytest, leaudit DSL/evaluation engine.

---

## File Structure

- Create `fastapi_modules/fastapi_leaudit/rag_engine/retriever.py`: shared RAG retrieval implementation copied from `gitea/jande-feature-dsl`, with vector search, keyword fallback, source filtering, document hydration, and source formatting.
- Modify `fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py`: inject `RagRetriever` into `LauditPipeline` and pass it to `evaluate_extraction`.
- Modify `fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py`: preserve current tenant-aware chat API and delegate retrieval internals to `RagRetriever`.
- Modify `rules/行政处罚/rules.yaml`: append only the new `JZ-JD-005` RAG-backed rule from `gitea/jande-feature-dsl`; do not rewrite existing rules.
- Create `tests/test_rag_retriever.py`: validates vector retrieval source filtering and keyword fallback.
- Create `tests/test_leaudit_rag_bridge.py`: validates pipeline passes the injected retriever to leaudit evaluation.

## Execution Constraints

- Do not run `git merge gitea/jande-feature-dsl` in the working tree.
- Do not touch current untracked files unless explicitly requested.
- Do not overwrite `fastapi_modules/fastapi_leaudit/services/impl/documentServiceImpl.py`; it has unrelated page-quality work in progress.
- Stage and commit only after explicit user approval.
- If `ragChatServiceImpl.py` behavior changes beyond retrieval helper delegation, stop and re-review before continuing.

### Task 1: Copy Shared Retriever And Tests

**Files:**
- Create: `fastapi_modules/fastapi_leaudit/rag_engine/retriever.py`
- Create: `tests/test_rag_retriever.py`
- Create: `tests/test_leaudit_rag_bridge.py`

- [ ] **Step 1: Restore files from source branch**

Run:

```bash
git checkout gitea/jande-feature-dsl -- fastapi_modules/fastapi_leaudit/rag_engine/retriever.py tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py
```

Expected: three files appear in the working tree and are staged in the index (`git checkout <branch> -- <paths>` updates both); no existing tenant code is modified.

- [ ] **Step 2: Verify copied files are isolated**

Run:

```bash
git status --short fastapi_modules/fastapi_leaudit/rag_engine/retriever.py tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py
```

Expected:

```text
A  fastapi_modules/fastapi_leaudit/rag_engine/retriever.py
A  tests/test_leaudit_rag_bridge.py
A  tests/test_rag_retriever.py
```

### Task 2: Wire Retriever Into Pipeline

**Files:**
- Modify: `fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py`
- Test: `tests/test_leaudit_rag_bridge.py`

- [ ] **Step 1: Inspect current pipeline imports and constructor**

Run:

```bash
sed -n '1,120p' fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py
```

Expected: confirm current imports include `StorageAdapter` and constructor currently has `ocr_client`, `llm_client`, `storage_adapter`.

- [ ] **Step 2: Add retriever import**

Add this import next to the existing platform imports:

```python
from fastapi_modules.fastapi_leaudit.rag_engine.retriever import RagRetriever
```

- [ ] **Step 3: Extend constructor without changing existing arguments**

Change constructor signature to include:

```python
rag_retriever: RagRetriever | None = None,
```

Then initialize:

```python
self.rag_retriever = rag_retriever or RagRetriever()
```

- [ ] **Step 4: Pass retriever to evaluation**

In the `evaluate_extraction(...)` call, add:

```python
retriever=self.rag_retriever,
```

Expected: this is a pure dependency-injection change; OCR, extraction, storage, and tenant behavior are unchanged.

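The constructor change and the extra keyword argument can be sketched in isolation as follows. This is a minimal sketch, not the real pipeline: `FakeRetriever`, `PipelineSketch`, and the stand-in `evaluate_extraction` are hypothetical names used only for illustration, and the existing `ocr_client`, `llm_client`, and `storage_adapter` arguments are omitted.

```python
from __future__ import annotations


class FakeRetriever:
    """Hypothetical stand-in for RagRetriever; it carries no behavior."""


def evaluate_extraction(extraction: dict, *, retriever: object | None = None) -> dict:
    # Stand-in for leaudit.evaluate_extraction: it only reports what it received.
    return {"fields": sorted(extraction), "retriever": retriever}


class PipelineSketch:
    def __init__(self, rag_retriever: FakeRetriever | None = None) -> None:
        # An injected retriever wins; otherwise a default instance is built.
        self.rag_retriever = rag_retriever or FakeRetriever()

    def evaluate(self, extraction: dict) -> dict:
        # The retriever is forwarded as a keyword argument, as in Step 4.
        return evaluate_extraction(extraction, retriever=self.rag_retriever)


injected = FakeRetriever()
result = PipelineSketch(rag_retriever=injected).evaluate({"b": 1, "a": 2})
default_result = PipelineSketch().evaluate({})
```

One design note: `x or Default()` replaces any falsy value, so if the retriever class ever defines `__bool__` or `__len__`, an explicit `if rag_retriever is None` check would be the safer form.
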
### Task 3: Preserve Tenant-Aware RAG Chat And Delegate Retrieval

**Files:**
- Modify: `fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py`

- [ ] **Step 1: Inspect current import area and constructor**

Run:

```bash
sed -n '1,110p' fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py
```

Expected: current file imports `build_openai_embeddings_url`, `get_chroma`, and `TenantResolver`; constructor initializes `self.TenantResolver`.

- [ ] **Step 2: Update imports minimally**

Remove unused direct retrieval imports after delegation:

```python
build_openai_embeddings_url,
```

Remove:

```python
from fastapi_modules.fastapi_leaudit.rag_engine.chroma_client import get_chroma
```

Add:

```python
from fastapi_modules.fastapi_leaudit.rag_engine.retriever import RagRetriever
```

Keep:

```python
from fastapi_modules.fastapi_leaudit.services.impl.tenantResolver import TenantResolver
```

- [ ] **Step 3: Preserve constructor and add retriever**

Change constructor to:

```python
def __init__(self, retriever: RagRetriever | None = None) -> None:
    self.TenantResolver = TenantResolver()
    self.retriever = retriever or RagRetriever()
```

Expected: existing tenant resolver remains active.

- [ ] **Step 4: Replace `_retrieve_context` implementation only**

Replace the body of `_retrieve_context` with:

```python
result = await self.retriever.retrieve(query=query, dataset_id=dataset_id)
return result.chunks, result.dataset_name
```

Do not change method signature.

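A minimal sketch of the Step 4 delegation, assuming (as the replacement body implies) that `retrieve` is a coroutine returning an object with `chunks` and `dataset_name` attributes. `RetrievalResult`, `StubRetriever`, and `ChatServiceSketch` are hypothetical stand-ins, and the `_retrieve_context` signature shown here is assumed for illustration; the real signature must stay as it is.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:
    # Hypothetical result shape: only the two attributes Step 4 reads.
    chunks: list = field(default_factory=list)
    dataset_name: str = ""


class StubRetriever:
    async def retrieve(self, *, query: str, dataset_id: int) -> RetrievalResult:
        # Returns a canned hit so the delegation can be observed.
        return RetrievalResult(
            chunks=[{"content": f"hit for {query}"}],
            dataset_name=f"dataset-{dataset_id}",
        )


class ChatServiceSketch:
    def __init__(self, retriever=None) -> None:
        self.retriever = retriever or StubRetriever()

    async def _retrieve_context(self, query: str, dataset_id: int):
        # Same two lines as the replacement body in Step 4.
        result = await self.retriever.retrieve(query=query, dataset_id=dataset_id)
        return result.chunks, result.dataset_name


chunks, dataset_name = asyncio.run(ChatServiceSketch()._retrieve_context("fine range", 7))
```

Because the service only unpacks `chunks` and `dataset_name`, callers of `_retrieve_context` keep receiving the same two-element tuple as before the delegation.
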
- [ ] **Step 5: Delegate `_embed_texts`**

Replace the body of `_embed_texts` with:

```python
return await self.retriever._embed_texts(texts, model_name)
```

- [ ] **Step 6: Delegate keyword fallback helper**

Replace the body of `_keyword_retrieve_context` with:

```python
chunks = await self.retriever._keyword_retrieve_context(
    dataset_id=dataset_id,
    collection_name=collection_name,
    dataset_name=dataset_name,
    query=query,
    top_k=top_k,
    score_threshold=score_threshold,
    source_names=None,
)
return chunks[:top_k]
```

- [ ] **Step 7: Delegate keyword utility methods**

Replace helper bodies:

```python
def _build_keyword_terms(self, query: str) -> list[str]:
    return self.retriever._build_keyword_terms(query)

def _normalize_keyword_query(self, query: str) -> str:
    return self.retriever._normalize_keyword_query(query)

def _score_keyword_chunk(self, *, query: str, terms: list[str], content: str, document_name: str) -> float:
    return self.retriever._score_keyword_chunk(
        query=query,
        terms=terms,
        content=content,
        document_name=document_name,
    )
```

- [ ] **Step 8: Delegate source building and hydration**

Replace `_build_sources` body:

```python
build_sources = getattr(self.retriever, "build_sources", None)
if callable(build_sources):
    return build_sources(context_chunks, dataset_name)
return RagRetriever(hydrate_documents=False).build_sources(context_chunks, dataset_name)
```

Replace `_hydrate_document_hits` body:

```python
return await self.retriever._hydrate_document_hits(dataset_id, chunks)
```

Expected: public chat behavior, tenant filtering, app resolution, session ownership, and permission feedback remain current `wren-dev` behavior.

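The `getattr` guard in Step 8 presumably exists so that an injected retriever lacking `build_sources` (for example, a test double) still works. A minimal sketch of that pattern follows; `DefaultRetriever`, `BareDouble`, and the free function `build_sources_via` are hypothetical names, and the returned dict shape is invented for illustration.

```python
class DefaultRetriever:
    """Hypothetical stand-in for RagRetriever(hydrate_documents=False)."""

    def build_sources(self, context_chunks, dataset_name):
        return [{"dataset": dataset_name, "content": c} for c in context_chunks]


class BareDouble:
    """A test double that does not implement build_sources."""


def build_sources_via(retriever, context_chunks, dataset_name):
    build_sources = getattr(retriever, "build_sources", None)
    if callable(build_sources):
        return build_sources(context_chunks, dataset_name)
    # Fall back to a fresh default retriever when the injected object
    # cannot format sources itself.
    return DefaultRetriever().build_sources(context_chunks, dataset_name)


from_double = build_sources_via(BareDouble(), ["a"], "kb")
from_real = build_sources_via(DefaultRetriever(), ["a", "b"], "kb")
```

Both paths return the same shape, so callers of `_build_sources` do not need to know which retriever produced the result.
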
### Task 4: Append Administrative Penalty RAG Rule

**Files:**
- Modify: `rules/行政处罚/rules.yaml`

- [ ] **Step 1: Confirm new rule is not already present**

Run:

```bash
grep -n "JZ-JD-005" rules/行政处罚/rules.yaml
```

Expected: no output before insertion.

- [ ] **Step 2: Append only the source-branch rule**

Extract from `gitea/jande-feature-dsl` and insert the `JZ-JD-005` block after the existing `JZ-JD-004` rule in `rules/行政处罚/rules.yaml`.

The inserted rule (its name translates roughly as "accuracy of applying the case cause and discretion standards") must include:

```yaml
- rule_id: JZ-JD-005
  name: 案由及裁量标准适用准确性
  desc: 结合处罚决定书认定依据、处罚依据、罚款项目和罚款金额,检索案由与裁量标准,判断处罚种类和罚款幅度是否适用准确。
  risk: medium
  score: 10
  scope:
    - 处罚决定书
  rag:
    collection: general_legal_kb
    top_k: 5
    source_names:
      - 广东省烟草专卖行政处罚裁量执行标准-rag.md
      - 案由_行政处罚与反走私管理治理办法.md
    query_template: |
      认定依据:{{处罚决定书.认定依据}}
      处罚依据:{{处罚决定书.处罚依据}}
      罚款项目:{{处罚决定书.罚款项目}}
      罚款基数:{{处罚决定书.罚款基数}}
      罚款比例:{{处罚决定书.罚款比例}}
      罚款总额:{{处罚决定书.罚款总额}}
      问题:检索对应案由、裁量档次、处罚种类和罚款幅度
    inject_as: rag_context
    resources_as: rag_resources
  stages:
    - id: '1'
      check: required
      fields:
        - 处罚决定书.认定依据
        - 处罚决定书.处罚依据
        - 处罚决定书.罚款项目
        - 处罚决定书.罚款基数
        - 处罚决定书.罚款比例
        - 处罚决定书.罚款总额
    - id: '2'
      check: ai
      prompt: |
        请结合检索到的法律知识和卷宗处罚决定书字段,判断案由、裁量档次、处罚种类和罚款幅度是否适用准确。

        【检索依据】
        {{rag_context}}

        【处罚决定书字段】
        认定依据:{{处罚决定书.认定依据}}
        处罚依据:{{处罚决定书.处罚依据}}
        罚款项目:{{处罚决定书.罚款项目}}
        罚款基数:{{处罚决定书.罚款基数}}
        罚款比例:{{处罚决定书.罚款比例}}
        罚款总额:{{处罚决定书.罚款总额}}

        【判断要求】
        1. 判断违法事实对应案由是否准确;
        2. 判断处罚依据是否能支撑对应处罚种类;
        3. 判断罚款基数、比例、总额是否落在裁量标准允许幅度内;
        4. 若检索依据不足以确认,应给出 warn,不要编造依据。
  logic: 1 AND 2
  messages:
    pass: 案由、裁量档次、处罚种类和罚款幅度适用准确。
    fail: 案由、裁量档次、处罚种类或罚款幅度可能适用不准确,请核对。
  references_laws:
    - 《中华人民共和国行政处罚法》第五十九条
  type: ai_rule
```

Expected: YAML diff only contains the new rule block.

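To make the `{{处罚决定书.认定依据}}`-style placeholders concrete, here is a minimal sketch of how a `query_template` could be filled from extracted fields. This is an assumption for illustration only: the plan does not show how `leaudit` renders templates, and `render_template` and the sample value are hypothetical. In this sketch, unknown placeholders are left untouched rather than replaced with a guess.

```python
import re

# Matches {{ key }} where the key contains no braces.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_template(template: str, fields: dict[str, str]) -> str:
    """Replace {{key}} with fields[key]; leave unknown placeholders as-is."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return fields.get(key, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


template = "处罚依据:{{处罚决定书.处罚依据}}\n罚款总额:{{处罚决定书.罚款总额}}\n"
fields = {"处罚决定书.处罚依据": "sample legal basis"}  # illustrative value only
rendered = render_template(template, fields)
```

Leaving unresolved placeholders visible makes a missing field easy to spot in the rendered query, which is one reason a `required` stage over the same fields is a useful companion to the template.
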
### Task 5: Verify Compile And Targeted Tests

**Files:**
- Verify: `fastapi_modules/fastapi_leaudit/rag_engine/retriever.py`
- Verify: `fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py`
- Verify: `fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py`
- Verify: `tests/test_rag_retriever.py`
- Verify: `tests/test_leaudit_rag_bridge.py`

- [ ] **Step 1: Run Python compile check**

Run:

```bash
.venv/bin/python -m py_compile fastapi_modules/fastapi_leaudit/rag_engine/retriever.py fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py
```

Expected: exit code 0, no syntax errors.

- [ ] **Step 2: Run targeted tests**

Run:

```bash
.venv/bin/pytest tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py -q
```

Expected: all tests pass.

- [ ] **Step 3: Run current relevant RAG chat tests if present**

Run (the parentheses make `-type f` apply to both name patterns):

```bash
find tests -maxdepth 2 -type f \( -iname '*rag*chat*' -o -iname '*rag*permission*' \)
```

If files are found, run them with `.venv/bin/pytest <files> -q`. Expected: the tests pass, or any pre-existing unrelated failure is documented.

### Task 6: Final Diff Review

**Files:**
- Review all touched files.

- [ ] **Step 1: Show changed files**

Run:

```bash
git status --short
```

Expected: touched files include only the planned files plus pre-existing unrelated files.

- [ ] **Step 2: Review diff summary**

Run (comparing against `HEAD` so that the files staged by the Task 1 checkout are included):

```bash
git diff --stat HEAD
```

Expected: planned files dominate; no accidental frontend, database, or page-quality code changes except pre-existing `documentServiceImpl.py`.

- [ ] **Step 3: Confirm no conflict markers**

Run:

```bash
grep -R "<<<<<<<\\|=======\\|>>>>>>>" -n fastapi_modules tests rules || true
```

Expected: no output.

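The unanchored `=======` pattern in Step 3 also matches separator lines that are not conflict markers. As an alternative, here is a minimal Python sketch that only flags lines starting with exactly seven marker characters; `find_conflict_markers` is a hypothetical helper, not part of the repository.

```python
import re
import tempfile
from pathlib import Path

# Seven marker characters at line start, followed by whitespace or end of line.
MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(roots):
    """Return (path, line_number) pairs for lines that start with a conflict marker."""
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if MARKER.match(line):
                    hits.append((str(path), number))
    return hits


# Small self-check on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.md").write_text("Title\n==========\nx = '======='\n", encoding="utf-8")
    Path(tmp, "dirty.py").write_text(
        "<<<<<<< HEAD\na = 1\n=======\na = 2\n>>>>>>> other\n", encoding="utf-8"
    )
    flagged = [(Path(p).name, n) for p, n in find_conflict_markers([tmp])]
```

Calling `find_conflict_markers(["fastapi_modules", "tests", "rules"])` from the repository root covers the same directories as the `grep` command; an empty list corresponds to "no output".
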
- [ ] **Step 4: Report status instead of committing**

Do not commit automatically. Report:

- Files changed.
- Tests run and their results.
- Whether the rules YAML needs an OSS/database re-import to become active.
- Any known risk, especially the dependency on the installed `leaudit` package supporting `retriever`.