wren 68d0b4c878 feat: update audit platform workspace

2026-05-25 09:50:01 +08:00

RAG DSL Bridge Merge Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Safely bring gitea/jande-feature-dsl RAG-backed rule DSL execution into current wren-dev without overwriting tenant, permission, or page-quality work already present.

Architecture: Add a shared RagRetriever service that both RAG chat and the leaudit evaluation pipeline can call. Wire the pipeline to pass a retriever into leaudit.evaluate_extraction, where the installed/local leaudit package already supports stage.rag.query_template. Keep current tenant-aware RAG chat code as the source of truth and only replace duplicate retrieval helpers with calls into the shared retriever.
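The shared-retriever arrangement described above can be sketched as follows. This is a minimal illustration, not the project's code: `RetrievalResult`, `StubRetriever`, `ChatService`, and `AuditPipeline` are hypothetical stand-ins, and the only details taken from the plan are the `retrieve(query=..., dataset_id=...)` call and the `chunks` / `dataset_name` attributes on its result. The `str` type for `dataset_id` is an assumption.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RetrievalResult:
    # Assumed shape: the plan only shows `chunks` and `dataset_name` being read.
    chunks: list[dict] = field(default_factory=list)
    dataset_name: str = ""


class Retriever(Protocol):
    async def retrieve(self, *, query: str, dataset_id: str) -> RetrievalResult: ...


class StubRetriever:
    """In-memory stand-in for the shared retriever (illustration only)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    async def retrieve(self, *, query: str, dataset_id: str) -> RetrievalResult:
        self.calls.append((query, dataset_id))
        return RetrievalResult(chunks=[{"content": f"hit for {query}"}], dataset_name="demo")


class ChatService:
    """Hypothetical chat consumer: owns no retrieval logic of its own."""

    def __init__(self, retriever: Retriever) -> None:
        self.retriever = retriever

    async def context_for(self, query: str, dataset_id: str) -> list[dict]:
        result = await self.retriever.retrieve(query=query, dataset_id=dataset_id)
        return result.chunks


class AuditPipeline:
    """Hypothetical evaluation consumer sharing the same retriever instance."""

    def __init__(self, retriever: Retriever) -> None:
        self.retriever = retriever

    async def context_for_rule(self, rendered_query: str, dataset_id: str) -> list[dict]:
        result = await self.retriever.retrieve(query=rendered_query, dataset_id=dataset_id)
        return result.chunks


async def demo() -> list[tuple[str, str]]:
    shared = StubRetriever()
    await ChatService(shared).context_for("chat question", "ds-1")
    await AuditPipeline(shared).context_for_rule("rule query", "ds-1")
    return shared.calls
```

Both consumers receive the retriever through their constructors, so a test can substitute a stub without touching Chroma or the embeddings endpoint.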

Tech Stack: Python 3.12, FastAPI, SQLAlchemy async sessions, Chroma, httpx, pytest, leaudit DSL/evaluation engine.

File Structure

Create fastapi_modules/fastapi_leaudit/rag_engine/retriever.py: shared RAG retrieval implementation copied from gitea/jande-feature-dsl, with vector search, keyword fallback, source filtering, document hydration, and source formatting.
Modify fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py: inject RagRetriever into LauditPipeline and pass it to evaluate_extraction.
Modify fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py: preserve current tenant-aware chat API and delegate retrieval internals to RagRetriever.
Modify rules/行政处罚/rules.yaml: append only the new JZ-JD-005 RAG-backed rule from gitea/jande-feature-dsl; do not rewrite existing rules.
Create tests/test_rag_retriever.py: validates vector retrieval source filtering and keyword fallback.
Create tests/test_leaudit_rag_bridge.py: validates pipeline passes the injected retriever to leaudit evaluation.

Execution Constraints

Do not run git merge gitea/jande-feature-dsl in the working tree.
Do not touch current untracked files unless explicitly requested.
Do not overwrite fastapi_modules/fastapi_leaudit/services/impl/documentServiceImpl.py; it has unrelated page-quality work in progress.
Stage and commit only after explicit user approval.
If ragChatServiceImpl.py behavior changes beyond retrieval helper delegation, stop and re-review before continuing.

Task 1: Copy Shared Retriever And Tests

Files:

Create: fastapi_modules/fastapi_leaudit/rag_engine/retriever.py
Create: tests/test_rag_retriever.py
Create: tests/test_leaudit_rag_bridge.py

Step 1: Restore files from source branch

Run:

git checkout gitea/jande-feature-dsl -- fastapi_modules/fastapi_leaudit/rag_engine/retriever.py tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py

Expected: the three files appear in the working tree and are also staged in the index, because git checkout <branch> -- <path> updates both; no existing tenant code is modified.

Step 2: Verify copied files are isolated

Run:

git status --short fastapi_modules/fastapi_leaudit/rag_engine/retriever.py tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py

Expected:

A  fastapi_modules/fastapi_leaudit/rag_engine/retriever.py
A  tests/test_leaudit_rag_bridge.py
A  tests/test_rag_retriever.py
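The isolation check can also be automated. The sketch below is a hypothetical helper (`unexpected_status_entries` is not part of the repository) that parses `git status --short` output, assuming the simple two-character status code followed by a space and an unquoted path, and reports every line that is not a staged add of an expected file.

```python
def unexpected_status_entries(status_output: str, expected_added: set[str]) -> list[str]:
    """Return the `git status --short` lines that are not `A ` entries for expected paths."""
    problems: list[str] = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Short format: two status characters, one space, then the path.
        code, path = line[:2], line[3:]
        if code != "A " or path not in expected_added:
            problems.append(line)
    return problems


EXPECTED = {
    "fastapi_modules/fastapi_leaudit/rag_engine/retriever.py",
    "tests/test_leaudit_rag_bridge.py",
    "tests/test_rag_retriever.py",
}
```

An empty return value corresponds to the expected output above; any other entry deserves a look before continuing.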

Task 2: Wire Retriever Into Pipeline

Files:

Modify: fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py
Test: tests/test_leaudit_rag_bridge.py

Step 1: Inspect current pipeline imports and constructor

Run:

sed -n '1,120p' fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py

Expected: the imports include StorageAdapter, and the constructor currently takes ocr_client, llm_client, and storage_adapter.

Step 2: Add retriever import

Add this import next to the existing platform imports:

from fastapi_modules.fastapi_leaudit.rag_engine.retriever import RagRetriever

Step 3: Extend constructor without changing existing arguments

Change constructor signature to include:

        rag_retriever: RagRetriever | None = None,

Then initialize:

        self.rag_retriever = rag_retriever or RagRetriever()

Step 4: Pass retriever to evaluation

In the evaluate_extraction(...) call, add:

            retriever=self.rag_retriever,

Expected: this is a pure dependency-injection change; OCR, extraction, storage, and tenant behavior are unchanged.
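A minimal sketch of this injection pattern, and of the kind of check a bridge test might make, is shown below. `PipelineSketch`, `DefaultRetriever`, and `recording_evaluate` are hypothetical stand-ins; the real `LauditPipeline` constructor and the full `evaluate_extraction` signature are not shown in this plan, so only the `retriever=` keyword is taken from it.

```python
from __future__ import annotations

from typing import Any, Callable


class DefaultRetriever:
    """Placeholder for RagRetriever; its real constructor arguments are not shown in the plan."""


def recording_evaluate(calls: list[dict[str, Any]]) -> Callable[..., str]:
    """Build a stand-in for evaluate_extraction that records its keyword arguments."""

    def evaluate_extraction(extraction: Any, **kwargs: Any) -> str:
        calls.append({"extraction": extraction, **kwargs})
        return "ok"

    return evaluate_extraction


class PipelineSketch:
    def __init__(self, evaluate: Callable[..., str], rag_retriever: object | None = None) -> None:
        self._evaluate = evaluate
        # Same defaulting idiom as the plan: fall back to a fresh retriever when none is injected.
        self.rag_retriever = rag_retriever or DefaultRetriever()

    def run(self, extraction: Any) -> str:
        # The only change to the evaluation call is the extra keyword argument.
        return self._evaluate(extraction, retriever=self.rag_retriever)
```

One design note: `x or Default()` replaces any falsy argument, not only `None`. If a retriever class ever defines `__bool__` or `__len__`, an explicit `is None` check would be the safer form.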

Task 3: Preserve Tenant-Aware RAG Chat And Delegate Retrieval

Files:

Modify: fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py

Step 1: Inspect current import area and constructor

Run:

sed -n '1,110p' fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py

Expected: current file imports build_openai_embeddings_url, get_chroma, and TenantResolver; constructor initializes self.TenantResolver.

Step 2: Update imports minimally

Remove unused direct retrieval imports after delegation:

    build_openai_embeddings_url,

Remove:

from fastapi_modules.fastapi_leaudit.rag_engine.chroma_client import get_chroma

Add:

from fastapi_modules.fastapi_leaudit.rag_engine.retriever import RagRetriever

Keep:

from fastapi_modules.fastapi_leaudit.services.impl.tenantResolver import TenantResolver

Step 3: Preserve constructor and add retriever

Change constructor to:

    def __init__(self, retriever: RagRetriever | None = None) -> None:
        self.TenantResolver = TenantResolver()
        self.retriever = retriever or RagRetriever()

Expected: existing tenant resolver remains active.

Step 4: Replace _retrieve_context implementation only

Replace the body of _retrieve_context with:

        result = await self.retriever.retrieve(query=query, dataset_id=dataset_id)
        return result.chunks, result.dataset_name

Do not change method signature.

Step 5: Delegate _embed_texts

Replace the body of _embed_texts with:

        return await self.retriever._embed_texts(texts, model_name)

Step 6: Delegate keyword fallback helper

Replace the body of _keyword_retrieve_context with:

        chunks = await self.retriever._keyword_retrieve_context(
            dataset_id=dataset_id,
            collection_name=collection_name,
            dataset_name=dataset_name,
            query=query,
            top_k=top_k,
            score_threshold=score_threshold,
            source_names=None,
        )
        return chunks[:top_k]

Step 7: Delegate keyword utility methods

Replace helper bodies:

    def _build_keyword_terms(self, query: str) -> list[str]:
        return self.retriever._build_keyword_terms(query)

    def _normalize_keyword_query(self, query: str) -> str:
        return self.retriever._normalize_keyword_query(query)

    def _score_keyword_chunk(self, *, query: str, terms: list[str], content: str, document_name: str) -> float:
        return self.retriever._score_keyword_chunk(
            query=query,
            terms=terms,
            content=content,
            document_name=document_name,
        )

Step 8: Delegate source building and hydration

Replace _build_sources body:

        build_sources = getattr(self.retriever, "build_sources", None)
        if callable(build_sources):
            return build_sources(context_chunks, dataset_name)
        return RagRetriever(hydrate_documents=False).build_sources(context_chunks, dataset_name)

Replace _hydrate_document_hits body:

        return await self.retriever._hydrate_document_hits(dataset_id, chunks)

Expected: public chat behavior, tenant filtering, app resolution, session ownership, and permission feedback remain current wren-dev behavior.
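The `getattr` fallback used for `_build_sources` in Step 8 can be illustrated in isolation. In this sketch `FullRetriever`, `BareRetriever`, and `build_sources_with_fallback` are hypothetical names, and the returned source dictionaries are an invented shape for demonstration only; the real `build_sources` output format is not described in this plan.

```python
from typing import Any, Callable


class FullRetriever:
    """Stand-in retriever that exposes build_sources."""

    def build_sources(self, context_chunks: list[dict[str, Any]], dataset_name: str) -> list[dict[str, str]]:
        return [{"dataset": dataset_name, "document": c["document_name"]} for c in context_chunks]


class BareRetriever:
    """Stand-in for an injected test double that has no build_sources method."""


def build_sources_with_fallback(
    retriever: Any,
    context_chunks: list[dict[str, Any]],
    dataset_name: str,
    fallback_factory: Callable[[], Any],
) -> list[dict[str, str]]:
    # Prefer the injected retriever's method when it exists and is callable.
    build_sources = getattr(retriever, "build_sources", None)
    if callable(build_sources):
        return build_sources(context_chunks, dataset_name)
    # Otherwise construct a default retriever just for source formatting.
    return fallback_factory().build_sources(context_chunks, dataset_name)
```

The fallback keeps the chat service usable with minimal test doubles that implement only `retrieve`, at the cost of constructing a second retriever on that path.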

Task 4: Append Administrative Penalty RAG Rule

Files:

Modify: rules/行政处罚/rules.yaml

Step 1: Confirm new rule is not already present

Run:

grep -n "JZ-JD-005" rules/行政处罚/rules.yaml

Expected: no output before insertion.
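The same pre-check, plus a post-insertion check that nothing else changed, can be sketched in Python. The helper names are hypothetical, and the regular expression assumes each rule starts with a `- rule_id: <id>` list item, as in the block shown below; it is a text scan, not a YAML parse.

```python
import re

# Matches list items of the form "- rule_id: JZ-JD-005" at any indentation.
RULE_ID = re.compile(r"^[ \t]*-[ \t]*rule_id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def rule_ids(yaml_text: str) -> list[str]:
    """Return rule ids in file order."""
    return RULE_ID.findall(yaml_text)


def only_new_rule_added(before: str, after: str, new_id: str) -> bool:
    """True when `after` has exactly the old ids, in order, plus one occurrence of `new_id`."""
    old, new = rule_ids(before), rule_ids(after)
    return (
        new_id not in old
        and new.count(new_id) == 1
        and [rid for rid in new if rid != new_id] == old
    )
```

Running `only_new_rule_added` on the file contents before and after Step 2 gives a quick signal that existing rule ids were neither dropped nor reordered; it does not compare rule bodies, so the diff review is still needed.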

Step 2: Append only the source-branch rule

Extract the JZ-JD-005 block from gitea/jande-feature-dsl and insert it after the existing JZ-JD-004 rule in rules/行政处罚/rules.yaml.

The inserted rule must include:

  - rule_id: JZ-JD-005
    name: 案由及裁量标准适用准确性
    desc: 结合处罚决定书认定依据、处罚依据、罚款项目和罚款金额，检索案由与裁量标准，判断处罚种类和罚款幅度是否适用准确。
    risk: medium
    score: 10
    scope:
    - 处罚决定书
    rag:
      collection: general_legal_kb
      top_k: 5
      source_names:
      - 广东省烟草专卖行政处罚裁量执行标准-rag.md
      - 案由_行政处罚与反走私管理治理办法.md
      query_template: |
        认定依据：{{处罚决定书.认定依据}}
        处罚依据：{{处罚决定书.处罚依据}}
        罚款项目：{{处罚决定书.罚款项目}}
        罚款基数：{{处罚决定书.罚款基数}}
        罚款比例：{{处罚决定书.罚款比例}}
        罚款总额：{{处罚决定书.罚款总额}}
        问题：检索对应案由、裁量档次、处罚种类和罚款幅度
      inject_as: rag_context
      resources_as: rag_resources
    stages:
    - id: '1'
      check: required
      fields:
      - 处罚决定书.认定依据
      - 处罚决定书.处罚依据
      - 处罚决定书.罚款项目
      - 处罚决定书.罚款基数
      - 处罚决定书.罚款比例
      - 处罚决定书.罚款总额
    - id: '2'
      check: ai
      prompt: |
        请结合检索到的法律知识和卷宗处罚决定书字段，判断案由、裁量档次、处罚种类和罚款幅度是否适用准确。

        【检索依据】
        {{rag_context}}

        【处罚决定书字段】
        认定依据：{{处罚决定书.认定依据}}
        处罚依据：{{处罚决定书.处罚依据}}
        罚款项目：{{处罚决定书.罚款项目}}
        罚款基数：{{处罚决定书.罚款基数}}
        罚款比例：{{处罚决定书.罚款比例}}
        罚款总额：{{处罚决定书.罚款总额}}

        【判断要求】
        1. 判断违法事实对应案由是否准确；
        2. 判断处罚依据是否能支撑对应处罚种类；
        3. 判断罚款基数、比例、总额是否落在裁量标准允许幅度内；
        4. 若检索依据不足以确认，应给出 warn，不要编造依据。
    logic: 1 AND 2
    messages:
      pass: 案由、裁量档次、处罚种类和罚款幅度适用准确。
      fail: 案由、裁量档次、处罚种类或罚款幅度可能适用不准确，请核对。
    references_laws:
    - 《中华人民共和国行政处罚法》第五十九条
    type: ai_rule

Expected: YAML diff only contains the new rule block.
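Because the rule references the same six fields in three places (the `query_template`, the stage 1 `required` list, and the stage 2 prompt), a small consistency check can catch a typo in any of them. The sketch below assumes `{{name}}` placeholders are plain names without nested braces; how the leaudit engine actually resolves them is not described here, so this is only a static text check with hypothetical helper names.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")

# Field names copied from the stage 1 `fields` list of JZ-JD-005.
REQUIRED_FIELDS = [
    "处罚决定书.认定依据",
    "处罚决定书.处罚依据",
    "处罚决定书.罚款项目",
    "处罚决定书.罚款基数",
    "处罚决定书.罚款比例",
    "处罚决定书.罚款总额",
]

QUERY_TEMPLATE = (
    "认定依据：{{处罚决定书.认定依据}}\n"
    "处罚依据：{{处罚决定书.处罚依据}}\n"
    "罚款项目：{{处罚决定书.罚款项目}}\n"
    "罚款基数：{{处罚决定书.罚款基数}}\n"
    "罚款比例：{{处罚决定书.罚款比例}}\n"
    "罚款总额：{{处罚决定书.罚款总额}}\n"
)


def placeholders(template: str) -> set[str]:
    """Collect every distinct placeholder name in a template."""
    return set(PLACEHOLDER.findall(template))


def unresolved(template: str, fields: list[str], injected: tuple[str, ...] = ()) -> set[str]:
    """Placeholders that are neither required fields nor injected variables."""
    return placeholders(template) - set(fields) - set(injected)
```

For the stage 2 prompt, passing `injected=("rag_context",)` accounts for the variable named by `inject_as`; an empty result means every placeholder is covered.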

Task 5: Verify Compile And Targeted Tests

Files:

Verify: fastapi_modules/fastapi_leaudit/rag_engine/retriever.py
Verify: fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py
Verify: fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py
Verify: tests/test_rag_retriever.py
Verify: tests/test_leaudit_rag_bridge.py

Step 1: Run Python compile check

Run:

.venv/bin/python -m py_compile fastapi_modules/fastapi_leaudit/rag_engine/retriever.py fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py

Expected: exit code 0, no syntax errors.
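For a per-file error summary instead of a single exit code, the same check can be driven from Python with the standard-library `py_compile` module. This is a sketch: `compile_errors` is a hypothetical helper, and like the command above it writes bytecode to `__pycache__` as a side effect.

```python
import py_compile


def compile_errors(paths: list[str]) -> dict[str, str]:
    """Byte-compile each file and return {path: message} for the ones that fail."""
    errors: dict[str, str] = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError instead of printing them.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            errors[path] = exc.msg
        except OSError as exc:
            # Missing or unreadable file.
            errors[path] = str(exc)
    return errors


PLANNED_FILES = [
    "fastapi_modules/fastapi_leaudit/rag_engine/retriever.py",
    "fastapi_modules/fastapi_leaudit/leaudit_bridge/pipeline.py",
    "fastapi_modules/fastapi_leaudit/services/impl/ragChatServiceImpl.py",
]
```

An empty dictionary from `compile_errors(PLANNED_FILES)` corresponds to exit code 0 above.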

Step 2: Run targeted tests

Run:

.venv/bin/pytest tests/test_rag_retriever.py tests/test_leaudit_rag_bridge.py -q

Expected: all tests pass.

Step 3: Run current relevant RAG chat tests if present

Run:

find tests -maxdepth 2 -type f \( -iname '*rag*chat*' -o -iname '*rag*permission*' \)

If files are found, run them with .venv/bin/pytest <files> -q. Expected: pass or pre-existing unrelated failure documented.
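A portable equivalent of this lookup, useful where `find` is unavailable, can be sketched with `pathlib`. `find_rag_tests` is a hypothetical helper; it applies the file-type check to both name patterns and matches case-insensitively by lowercasing the file name before comparing.

```python
from fnmatch import fnmatchcase
from pathlib import Path

PATTERNS = ("*rag*chat*", "*rag*permission*")


def find_rag_tests(root: str, max_depth: int = 2) -> list[Path]:
    """Return regular files under `root`, at most `max_depth` levels deep, matching PATTERNS."""
    base = Path(root)
    matches: list[Path] = []
    for path in base.rglob("*"):
        # Depth 1 is a direct child of root, mirroring `find -maxdepth`.
        depth = len(path.relative_to(base).parts)
        if depth > max_depth or not path.is_file():
            continue
        name = path.name.lower()
        if any(fnmatchcase(name, pattern) for pattern in PATTERNS):
            matches.append(path)
    return sorted(matches)
```

The resulting paths can be passed straight to `.venv/bin/pytest`.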

Task 6: Final Diff Review

Files:

Review all touched files.

Step 1: Show changed files

Run:

git status --short

Expected: touched files include only the planned files plus pre-existing unrelated files.

Step 2: Review diff summary

Run:

git diff --stat

Expected: planned files dominate; no accidental frontend, database, or page-quality code changes except pre-existing documentServiceImpl.py.

Step 3: Confirm no conflict markers

Run:

grep -RnE '^(<<<<<<<|=======$|>>>>>>>)' fastapi_modules tests rules || true

Expected: no output.
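A stricter scan that only flags lines shaped like real conflict markers can be sketched as below. `conflict_marker_lines` is a hypothetical helper; it assumes the default seven-character markers and ignores the `|||||||` base marker that diff3-style conflicts add.

```python
import re

# A marker must start the line: "<<<<<<< label", a bare "=======", or ">>>>>>> label".
CONFLICT_MARKER = re.compile(r"^(<{7}( |$)|={7}$|>{7}( |$))")


def conflict_marker_lines(text: str) -> list[int]:
    """Return 1-based line numbers that look like merge conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]
```

Anchoring the pattern avoids false positives from longer separator lines, such as a row of ten `=` characters under a heading.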

Step 4: Report status instead of committing

Do not commit automatically. Report:

Files changed.
Tests run and result.
Whether rules YAML needs OSS/database re-import to become active.
Any known risk, especially the dependency on the installed leaudit package accepting the retriever argument in evaluate_extraction.
