commit 52d7d14795
2026-03-11 16:49:00 +08:00
53 changed files with 4991 additions and 0 deletions

.dockerignore (new file, 10 lines)

@@ -0,0 +1,10 @@
.git
.tmp
.venv*
__pycache__
*.pyc
*.pyo
*.pyd
*.log
*.pid
README.draft.md

.env (new file, 36 lines)

@@ -0,0 +1,36 @@
# ===== Gateway and backend ports =====
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=8080
BACKEND_HOST=127.0.0.1
BACKEND_PORT=8081
# ===== Inference parameters =====
THINK_MODE=think-on
CTX_SIZE=16384
IMAGE_MIN_TOKENS=256
IMAGE_MAX_TOKENS=2048
MMPROJ_OFFLOAD=on
# ===== Read-only filesystem scope =====
READONLY_FS_ROOTS=C:\;D:\;E:\ # Leave empty to allow reading only the project directory; separate multiple roots with semicolons, e.g. D:\docs;D:\projects
READONLY_FS_MAX_READ_BYTES=524288 # Per-read limit; default 512 KB (524288 bytes)
# ===== Filesystem write permissions =====
ENABLE_FILE_WRITE=True # Master switch: whether the model may call write tools; set to False to disable
REQUIRE_HUMAN_CONFIRM=True # High-risk switch: force a terminal (y/n) confirmation before each write; set to False to disable
WRITEABLE_FS_ROOTS=E:\AI_Workspace;E:\Temp\Output # Safety boundary: the only root directories writes are allowed under, semicolon-separated. Leave empty to forbid all writes.
# Where the memory file is stored; a relative path (or any convenient absolute path) is recommended
MEMORY_FILE_PATH=./.tmp/super_agent_data/memory.json
# ===== 9B model paths (defaults are fine; no change needed) =====
MODEL_PATH=.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/Qwen3.5-9B-Q4_K_M.gguf
MMPROJ_PATH=.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/mmproj-Qwen3.5-9B-BF16.gguf
# To switch to Q8 in one step, run .\install_q8.cmd; it rewrites the two model path entries to the Q8 files automatically
# ===== Install-time download sources (optional overrides) =====
# LLAMA_WIN_CUDA_URL=
# MODEL_GGUF_URL=
# MODEL_MMPROJ_URL=
# MODEL_GGUF_SHA256=
# MODEL_MMPROJ_SHA256=
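The semicolon-separated root lists above (`READONLY_FS_ROOTS`, `WRITEABLE_FS_ROOTS`) can be parsed with a small helper along these lines. This is a sketch under assumptions: the helper name is invented, and the actual gateway may split on `os.pathsep` rather than a literal `;`.

```python
from pathlib import Path

def parse_roots(raw: str) -> list[Path]:
    # Drop empty segments so values like "C:\;D:\;" or a blank variable
    # degrade gracefully; each surviving entry becomes an absolute Path.
    items = [item.strip() for item in raw.split(';') if item.strip()]
    return [Path(item).expanduser().resolve() for item in items]
```

An empty variable therefore yields an empty list, which matches the documented "leave empty to forbid all writes" behaviour.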

.env.example (new file, 28 lines)

@@ -0,0 +1,28 @@
# ===== Gateway and backend ports =====
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=8080
BACKEND_HOST=127.0.0.1
BACKEND_PORT=8081
# ===== Inference parameters =====
THINK_MODE=think-on
CTX_SIZE=16384
IMAGE_MIN_TOKENS=256
IMAGE_MAX_TOKENS=1024
MMPROJ_OFFLOAD=off
# ===== Read-only filesystem scope =====
READONLY_FS_ROOTS= # Leave empty to allow reading only the project directory; separate multiple roots with semicolons, e.g. D:\docs;D:\projects
READONLY_FS_MAX_READ_BYTES=524288 # Per-read limit; default 512 KB
# ===== 9B model paths (defaults are fine; no change needed) =====
MODEL_PATH=.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/Qwen3.5-9B-Q4_K_M.gguf
MMPROJ_PATH=.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/mmproj-Qwen3.5-9B-BF16.gguf
# To switch to Q8 in one step, run .\install_q8.cmd; it rewrites the two model path entries to the Q8 files automatically
# ===== Install-time download sources (optional overrides) =====
# LLAMA_WIN_CUDA_URL=
# MODEL_GGUF_URL=
# MODEL_MMPROJ_URL=
# MODEL_GGUF_SHA256=
# MODEL_MMPROJ_SHA256=

README-ToolHub.md (new file, 47 lines)

@@ -0,0 +1,47 @@
# Qwen3.5-9B ToolHub
A local all-in-one deployment of the Qwen3.5-9B multimodal model plus callable tools.
✅ Web search, image understanding, local file reading
Inference runs on your local GPU and is exposed through an API.
Requires Windows 10/11, an NVIDIA GPU (≥ 8 GB VRAM), and Python 3.10+.
## Getting started
```
1. Double-click bootstrap.bat   ← first-time install; downloads ~6 GB of models
2. .\start_8080_toolhub_stack.cmd start
3. Open http://127.0.0.1:8080 in your browser
Stop: .\start_8080_toolhub_stack.cmd stop
```
Each start takes 30-60 seconds to load the model.
## Alternative routes
The steps above are the default Windows path. If your setup differs, you can choose:
- **Docker Compose** — for environments where Docker and GPU containers already work. Just run `docker compose up --build`. → [Details](docs/DOCKER_COMPOSE.md)
- **WSL** — for users with an existing WSL environment. `./install.sh` + `./start_8080_toolhub_stack.sh start`; reuses the Windows main path under the hood.
- **Q8 quantization (uses ~10.2 GB)** — if you have ≥ 12 GB of VRAM, double-click `bootstrap_q8.bat`; the script switches the model and downloads it automatically.
## What it can do
- Search the web, fetch pages, and produce summaries with sources attached
- Ask questions about uploaded images, with region zoom-in and reverse image search
- Browse local files read-only, letting the AI look through your documents and logs
- Built-in chain of thought; complex questions can expand the reasoning process
- OpenAI-compatible API (`http://127.0.0.1:8080/v1`), usable from any compatible client
## Documentation
- [Detailed guide](docs/QUICKSTART.md) — install, startup, configuration, service management
- [FAQ](docs/TROUBLESHOOTING.md) — troubleshooting guide
- [Docker Compose](docs/DOCKER_COMPOSE.md) — containerized deployment
## Credits
- [Qwen3.5](https://github.com/QwenLM/Qwen3) — Tongyi Qianwen multimodal LLM
- [llama.cpp](https://github.com/ggml-org/llama.cpp) — high-performance GGUF inference engine
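The OpenAI-compatible endpoint can be driven by any HTTP client. A minimal sketch of assembling a chat-completions request for the local gateway; the model name is an assumption, so check `GET /v1/models` for the real identifier:

```python
import json

def build_chat_request(prompt: str) -> tuple[str, dict, bytes]:
    # Assembles a standard /v1/chat/completions request for the local gateway.
    url = "http://127.0.0.1:8080/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    payload = {
        "model": "qwen3.5-9b",  # assumed name; list models via GET /v1/models
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, headers, json.dumps(payload).encode("utf-8")

url, headers, body = build_chat_request("Introduce yourself in one sentence.")
```

With the stack running, send it with `requests.post(url, headers=headers, data=body)` or any other HTTP client.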

README.md (new file, 152 lines)

@@ -0,0 +1,152 @@
# Qwen3.5-9B ToolHub Enhanced Version
> **Version tag**: **original baseline + second-stage enhancement modules** | **Local all-in-one deployment of the Qwen3.5 multimodal toolchain**
A local all-in-one deployment of the Qwen3.5-9B multimodal model plus callable tools. This project is a deep fork of the original, taking the AI from "can only look" to "**can write, can remember, can perceive**".
---
## 📌 Positioning and statement
### Baseline positioning
- ✅ Web search, image understanding, file reading (original capabilities)
- **Inference runs on the local GPU and is exposed through an API**
- Requires Windows 10/11, an NVIDIA GPU (≥ 8 GB VRAM), and Python 3.10+
### Statement
This version was built by **Lao Wang** together with AI collaborators, to explore the productivity limits of small local models in real office scenarios.
Open-source credits: [Qwen3.5](https://github.com/QwenLM/Qwen3) | [llama.cpp](https://github.com/ggml-org/llama.cpp)
---
## 🚀 Core enhancements (fork-only)
### 1. ⚡ Atomic Write Engine
- **Capability**: the new `write_tools.py` module gives the model genuine physical write access
- **Silent persistence**: the `WRITEABLE_FS_ROOTS` whitelist configured in `.env` enables safe, fast automated file saving
- **Sandboxing**: writes are strictly limited to whitelisted directories, keeping core system files safe
### 2. 🧠 Persistent Memory Injection ("knows on waking")
- **Long-term memory store**: the standalone `memory_tools.py` keeps preferences, identity, and habits in JSON
- **Hot loading**: `toolhub_gateway_agent.py` was reworked to inject the contents of `memory.json` into the System Prompt when each conversation turn is initialized
- **Zero-overhead recall**: the AI instantly knows the user's nickname (e.g. "Lao Wang") and formatting preferences (e.g. Markdown), without having to look anything up
### 3. 🛡️ Robust Web Fetcher (anti-blocking)
- **HTTP 429 fix**: resolves the frequent HTTP 429 errors the original hit when fetching GitHub and similar sites
- **Details**: sends a mainstream browser User-Agent and retries with exponential backoff
### 4. ⏰ Real-time environment awareness
- **Freshness**: dynamically injects the current system time, weekday, and runtime context, markedly improving accuracy on time-sensitive instructions
---
## ✅ Full feature list (merged)
| Category | Baseline (original) | Enhancement (fork) |
|---------|-----------------|---------------|
| **Web search** | ✅ Page fetching, summarization, source links | ✅ Anti-blocking tweaks, automatic retry on 429 |
| **Images** | ✅ Visual Q&A, region zoom-in, reverse image search | - |
| **Files** | ✅ Read-only browsing of local files/logs | ⭐ Atomic writes (whitelisted sandbox) |
| **Memory** | No long-term memory | ⭐ JSON-persisted preference store |
| **Environment** | Basic context | ⭐ Live time/weekday injection |
| **API** | ✅ OpenAI-compatible API (v1) | - |
---
## 🛠️ Installation and configuration (full walkthrough)
### Main route: Windows default (recommended for beginners)
#### First-time install (downloads ~6 GB of models)
```bash
# Option 1: standard bootstrap script (double-click to run)
bootstrap.bat
# Option 2: Q8 quantization (needs ≥ 12 GB VRAM; uses ~10.2 GB)
bootstrap_q8.bat
```
#### Start the services
```bash
.\start_8080_toolhub_stack.cmd start
# Open http://127.0.0.1:8080 in your browser
# Stop: .\start_8080_toolhub_stack.cmd stop
```
> ⚠️ **Each start takes 30-60 seconds to load the model**
---
### 🔧 Enhanced configuration (fork-only)
Add the following to the `.env` file to enable the enhancements:
```env
# =========================================
# [Fork] enhancement switches and paths
# =========================================
# ✅ Enable file writing
ENABLE_FILE_WRITE=True
# 📂 Write whitelist (sandbox protection)
WRITEABLE_FS_ROOTS=E:\AI_Workspace
# 💾 Memory file location
MEMORY_FILE_PATH=./.tmp/super_agent_data/memory.json
```
---
### Other deployment routes
- **🧊 WSL mode** — for users with an existing WSL environment
  ```bash
  ./install.sh
  ./start_8080_toolhub_stack.sh start
  ```
  Reuses the Windows main path under the hood; well suited to dual-system development.
---
## 📖 Documentation
| Section | Description | Path |
|------|------|------|
| [Detailed guide](docs/QUICKSTART.md) | Install, startup, configuration, service management | docs/QUICKSTART.md |
| [FAQ](docs/TROUBLESHOOTING.md) | Troubleshooting, including HTTP 429 handling | docs/TROUBLESHOOTING.md |
| [Docker Compose](docs/DOCKER_COMPOSE.md) | Containerized deployment guide | docs/DOCKER_COMPOSE.md |
---
## 📊 Live system status (environment context)
```text
Current time: 🕒 2026-03-11 Wednesday 16:29:03 CST
Mode: local GPU inference + OpenAI-compatible API layer
API endpoint: http://127.0.0.1:8080/v1
```
---
## ⚙️ Command cheat sheet
| Action | Windows command |
|------|------------|
| **First-time install** | `bootstrap.bat` |
| **Q8 quantization** | `bootstrap_q8.bat` (≥ 12 GB VRAM) |
| **Start services** | `.\start_8080_toolhub_stack.cmd start` |
| **Stop services** | `.\start_8080_toolhub_stack.cmd stop` |
---
## 🤝 Credits
- **[Qwen3.5](https://github.com/QwenLM/Qwen3)** — Tongyi Qianwen multimodal LLM
- **[llama.cpp](https://github.com/ggml-org/llama.cpp)** — high-performance GGUF inference engine
---
Happy hacking! 🚀
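The memory hot-injection described above boils down to prepending stored entries to the system prompt at the start of each turn. A sketch under assumptions: the function name and section formatting are invented here, and the real logic lives in `toolhub_gateway_agent.py`.

```python
import json
from pathlib import Path

def inject_memory(system_prompt: str, memory_path: str) -> str:
    # Read memory.json (a JSON list of strings) and append it to the
    # system prompt; fall back to the bare prompt on any problem.
    path = Path(memory_path)
    if not path.exists():
        return system_prompt
    try:
        memories = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return system_prompt
    if not isinstance(memories, list) or not memories:
        return system_prompt
    bullet_lines = "\n".join(f"- {m}" for m in memories)
    return f"{system_prompt}\n\n[Long-term memory]\n{bullet_lines}"
```

Because the merged prompt is rebuilt each turn, edits to `memory.json` take effect on the very next message, which is the "hot loading" behaviour the README claims.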


@@ -0,0 +1,7 @@
from . import code_tool # noqa: F401
from . import image_zoom_tool # noqa: F401
from . import search_tools # noqa: F401
from . import system_tools # noqa: F401
from . import web_fetch_tool # noqa: F401
from . import workflow_tools # noqa: F401
from . import memory_tools # noqa: F401



@@ -0,0 +1,74 @@
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Union

from qwen_agent.tools.base import BaseTool, register_tool
from qwen_agent.utils.utils import extract_code

ROOT_DIR = Path(__file__).resolve().parents[1]
RUN_DIR = ROOT_DIR / '.tmp' / 'super_agent_data' / 'code_runs'
DEFAULT_TIMEOUT = 60


@register_tool('code_interpreter', allow_overwrite=True)
class LocalCodeInterpreterTool(BaseTool):
    description = 'Executes Python code on the local machine and returns stdout and stderr.'
    parameters = {
        'type': 'object',
        'properties': {
            'code': {
                'type': 'string',
                'description': 'Python code to execute'
            },
            'timeout_sec': {
                'type': 'integer',
                'description': 'Timeout in seconds',
                'default': DEFAULT_TIMEOUT
            }
        },
        'required': ['code'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params_dict = self._parse_code_params(params)
        code = params_dict['code']
        timeout_sec = int(params_dict.get('timeout_sec', DEFAULT_TIMEOUT))
        RUN_DIR.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', dir=RUN_DIR, delete=False, encoding='utf-8') as fp:
            fp.write(code)
            script_path = fp.name
        completed = subprocess.run(
            [sys.executable, script_path],
            text=True,
            capture_output=True,
            timeout=timeout_sec,
            check=False,
        )
        payload = {
            'script_path': script_path,
            'returncode': completed.returncode,
            'stdout': completed.stdout,
            'stderr': completed.stderr,
        }
        return json.dumps(payload, ensure_ascii=False, indent=2)

    def _parse_code_params(self, params: Union[str, dict]) -> dict:
        if isinstance(params, dict):
            if 'code' not in params:
                raise ValueError("missing 'code' field")
            return params
        try:
            parsed = json.loads(params)
            if isinstance(parsed, dict) and 'code' in parsed:
                return parsed
        except json.JSONDecodeError:
            pass
        code = extract_code(params)
        if not code.strip():
            raise ValueError('no executable code detected')
        return {'code': code}
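Stripped of the qwen_agent plumbing, the execution core of this tool is simply "write the snippet to a temp file, run it with the current interpreter, capture everything". A standalone sketch:

```python
import subprocess
import sys
import tempfile

def run_snippet(code: str, timeout_sec: int = 10) -> dict:
    # Persist the snippet so tracebacks show a real file name, then
    # execute it in a child interpreter and capture the result.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py",
                                     delete=False, encoding="utf-8") as fp:
        fp.write(code)
        script_path = fp.name
    completed = subprocess.run(
        [sys.executable, script_path],
        text=True, capture_output=True, timeout=timeout_sec, check=False,
    )
    return {"returncode": completed.returncode,
            "stdout": completed.stdout, "stderr": completed.stderr}

result = run_snippet("print(2 + 2)")
```

Like the tool above, this raises `subprocess.TimeoutExpired` when the snippet overruns `timeout_sec`, which a caller may want to catch.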


@@ -0,0 +1,32 @@
import threading
from pathlib import Path
from typing import Dict

_MAP_LOCK = threading.Lock()
_SAFE_TO_ORIGINAL: Dict[str, str] = {}
MAX_RECORDS = 2048


def _normalize_path(path_or_uri: str) -> str:
    raw = path_or_uri.strip()
    if raw.startswith('file://'):
        raw = raw[len('file://'):]
    return str(Path(raw).expanduser().resolve())


def register_safe_image(safe_path: str, original_path: str) -> None:
    safe_abs = _normalize_path(safe_path)
    original_abs = _normalize_path(original_path)
    with _MAP_LOCK:
        _SAFE_TO_ORIGINAL[safe_abs] = original_abs
        if len(_SAFE_TO_ORIGINAL) <= MAX_RECORDS:
            return
        overflow = len(_SAFE_TO_ORIGINAL) - MAX_RECORDS
        for key in list(_SAFE_TO_ORIGINAL.keys())[:overflow]:
            del _SAFE_TO_ORIGINAL[key]


def resolve_original_image(path_or_uri: str) -> str:
    safe_abs = _normalize_path(path_or_uri)
    with _MAP_LOCK:
        return _SAFE_TO_ORIGINAL.get(safe_abs, safe_abs)
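The eviction in `register_safe_image` relies on Python dicts preserving insertion order, so deleting the first keys drops the oldest records. A toy sketch of the same policy with a capacity of 3:

```python
MAX_RECORDS = 3
mapping: dict[str, str] = {}

def register(safe: str, original: str) -> None:
    mapping[safe] = original
    if len(mapping) <= MAX_RECORDS:
        return
    # Oldest insertions come first when iterating, so trim from the front.
    for key in list(mapping)[: len(mapping) - MAX_RECORDS]:
        del mapping[key]

for i in range(5):
    register(f"safe{i}.jpg", f"original{i}.png")
```

The early return matters: slicing with a non-positive overflow would otherwise delete live entries.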


@@ -0,0 +1,185 @@
import math
import os
import uuid
import base64
from io import BytesIO
from pathlib import Path
from typing import List, Tuple, Union

import requests
from PIL import Image
from qwen_agent.llm.schema import ContentItem
from qwen_agent.log import logger
from qwen_agent.tools.base import BaseToolWithFileAccess, register_tool
from qwen_agent.utils.utils import extract_images_from_messages

from .image_source_map import resolve_original_image

MAX_IMAGE_PIXELS = int(os.getenv('SAFE_MAX_IMAGE_PIXELS', str(4 * 1024 * 1024)))
MAX_IMAGE_SIDE = int(os.getenv('SAFE_MAX_IMAGE_SIDE', '3072'))
MIN_IMAGE_SIDE = int(os.getenv('SAFE_MIN_IMAGE_SIDE', '28'))
MIN_BBOX_SIDE = 32
JPEG_QUALITY = int(os.getenv('SAFE_JPEG_QUALITY', '90'))
RESAMPLE_LANCZOS = getattr(getattr(Image, 'Resampling', Image), 'LANCZOS')
HTTP_TIMEOUT_SEC = 30


def _normalize_local_path(path_or_uri: str) -> str:
    raw = path_or_uri.strip()
    if raw.startswith('file://'):
        raw = raw[len('file://'):]
    return str(Path(raw).expanduser().resolve())


def _is_image_data_uri(image_ref: str) -> bool:
    return image_ref.strip().lower().startswith('data:image')


def _load_data_uri_image(image_ref: str) -> Image.Image:
    try:
        header, encoded = image_ref.split(',', 1)
    except ValueError as exc:
        raise ValueError('malformed data URI') from exc
    if ';base64' not in header.lower():
        raise ValueError('only base64 image data URIs are supported')
    decoded = base64.b64decode(encoded)
    return Image.open(BytesIO(decoded)).convert('RGB')


def _resolve_image_reference(image_ref: str) -> str:
    if _is_image_data_uri(image_ref):
        return image_ref
    if image_ref.startswith('http://') or image_ref.startswith('https://'):
        return image_ref
    return resolve_original_image(image_ref)


def _load_image(image_ref: str, work_dir: str) -> Image.Image:
    if _is_image_data_uri(image_ref):
        return _load_data_uri_image(image_ref)
    if image_ref.startswith('http://') or image_ref.startswith('https://'):
        response = requests.get(image_ref, timeout=HTTP_TIMEOUT_SEC)
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert('RGB')
    local = _normalize_local_path(image_ref)
    if os.path.exists(local):
        return Image.open(local).convert('RGB')
    fallback = os.path.join(work_dir, image_ref)
    return Image.open(fallback).convert('RGB')


def _ensure_min_bbox(
    left: float,
    top: float,
    right: float,
    bottom: float,
    img_w: int,
    img_h: int,
) -> Tuple[int, int, int, int]:
    width = max(1.0, right - left)
    height = max(1.0, bottom - top)
    if width >= MIN_BBOX_SIDE and height >= MIN_BBOX_SIDE:
        return int(left), int(top), int(right), int(bottom)
    scale = MIN_BBOX_SIDE / min(width, height)
    half_w = width * scale * 0.5
    half_h = height * scale * 0.5
    center_x = (left + right) * 0.5
    center_y = (top + bottom) * 0.5
    new_left = max(0, int(math.floor(center_x - half_w)))
    new_top = max(0, int(math.floor(center_y - half_h)))
    new_right = min(img_w, int(math.ceil(center_x + half_w)))
    new_bottom = min(img_h, int(math.ceil(center_y + half_h)))
    return new_left, new_top, new_right, new_bottom


def _relative_bbox_to_absolute(bbox_2d: list, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    rel_x1, rel_y1, rel_x2, rel_y2 = [float(v) for v in bbox_2d]
    abs_x1 = max(0.0, min(img_w, rel_x1 / 1000.0 * img_w))
    abs_y1 = max(0.0, min(img_h, rel_y1 / 1000.0 * img_h))
    abs_x2 = max(0.0, min(img_w, rel_x2 / 1000.0 * img_w))
    abs_y2 = max(0.0, min(img_h, rel_y2 / 1000.0 * img_h))
    left = min(abs_x1, abs_x2)
    top = min(abs_y1, abs_y2)
    right = max(abs_x1, abs_x2)
    bottom = max(abs_y1, abs_y2)
    return _ensure_min_bbox(left, top, right, bottom, img_w, img_h)


def _scale_size(width: int, height: int) -> Tuple[int, int]:
    pixel_count = width * height
    if pixel_count <= 0:
        raise ValueError(f'Invalid image size: {width}x{height}')
    scale_by_pixels = math.sqrt(MAX_IMAGE_PIXELS / pixel_count) if pixel_count > MAX_IMAGE_PIXELS else 1.0
    longest_side = max(width, height)
    scale_by_side = MAX_IMAGE_SIDE / longest_side if longest_side > MAX_IMAGE_SIDE else 1.0
    scale = min(1.0, scale_by_pixels, scale_by_side)
    return (
        max(MIN_IMAGE_SIDE, int(width * scale)),
        max(MIN_IMAGE_SIDE, int(height * scale)),
    )


def _resize_crop_if_needed(image: Image.Image) -> Image.Image:
    width, height = image.size
    new_w, new_h = _scale_size(width, height)
    if (new_w, new_h) == (width, height):
        return image
    return image.resize((new_w, new_h), RESAMPLE_LANCZOS)


@register_tool('image_zoom_in_tool', allow_overwrite=True)
class OriginalImageZoomTool(BaseToolWithFileAccess):
    description = 'Crops the requested region from the original image, then rescales the crop within safety limits.'
    parameters = {
        'type': 'object',
        'properties': {
            'bbox_2d': {
                'type': 'array',
                'items': {
                    'type': 'number'
                },
                'minItems': 4,
                'maxItems': 4,
                'description': 'Crop box [x1,y1,x2,y2]; coordinates range from 0 to 1000'
            },
            'label': {
                'type': 'string',
                'description': 'Label of the target object'
            },
            'img_idx': {
                'type': 'number',
                'description': 'Image index, starting at 0'
            }
        },
        'required': ['bbox_2d', 'label', 'img_idx']
    }

    def call(self, params: Union[str, dict], **kwargs) -> List[ContentItem]:
        params = self._verify_json_format_args(params)
        images = extract_images_from_messages(kwargs.get('messages', []))
        if not images:
            return [ContentItem(text='Error: no input images found')]
        img_idx = int(params['img_idx'])
        if img_idx < 0 or img_idx >= len(images):
            return [ContentItem(text=f'Error: img_idx out of range; {len(images)} image(s) available')]
        os.makedirs(self.work_dir, exist_ok=True)
        try:
            image_ref = images[img_idx]
            source_ref = _resolve_image_reference(image_ref)
            image = _load_image(source_ref, self.work_dir)
            bbox = _relative_bbox_to_absolute(params['bbox_2d'], *image.size)
            cropped = image.crop(bbox)
            resized = _resize_crop_if_needed(cropped)
            output_path = os.path.abspath(os.path.join(self.work_dir, f'{uuid.uuid4()}.jpg'))
            resized.save(output_path, format='JPEG', quality=JPEG_QUALITY, optimize=True)
            return [ContentItem(image=output_path)]
        except Exception as exc:
            logger.warning(str(exc))
            return [ContentItem(text=f'Tool Execution Error: {exc}')]
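`bbox_2d` coordinates live on a model-facing 0-1000 grid, so mapping them to pixels is a scale-by-`size/1000` with clamping. A self-contained sketch of that conversion (without the minimum-size expansion applied above):

```python
def rel_bbox_to_abs(bbox, img_w, img_h):
    # Scale each 0-1000 coordinate to pixels and clamp into the image;
    # min/max at the end tolerates a box given in any corner order.
    x1, y1, x2, y2 = (float(v) for v in bbox)
    ax1 = max(0.0, min(img_w, x1 / 1000.0 * img_w))
    ay1 = max(0.0, min(img_h, y1 / 1000.0 * img_h))
    ax2 = max(0.0, min(img_w, x2 / 1000.0 * img_w))
    ay2 = max(0.0, min(img_h, y2 / 1000.0 * img_h))
    return (int(min(ax1, ax2)), int(min(ay1, ay2)),
            int(max(ax1, ax2)), int(max(ay1, ay2)))
```

Because the grid is relative, the same `bbox_2d` selects the same region regardless of the image's actual resolution.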


@@ -0,0 +1,74 @@
import json
import os
from pathlib import Path
from typing import Union

from qwen_agent.tools.base import BaseTool, register_tool

# Read from the environment; fall back to memory.json in the current directory.
# .resolve() turns a relative path into an absolute one automatically.
MEMORY_FILE = Path(os.getenv('MEMORY_FILE_PATH', './memory.json')).resolve()


def _load_memory() -> list:
    """Internal: load memory defensively, coercing the result to a list."""
    if not MEMORY_FILE.exists():
        return []
    try:
        content = MEMORY_FILE.read_text(encoding='utf-8').strip()
        if not content:
            return []
        data = json.loads(content)
        # Core fix: if the file holds a dict or any other shape, coerce to a list.
        if isinstance(data, list):
            return data
        return []
    except Exception:
        return []


def _save_memory(memories: list):
    """Internal: save safely."""
    try:
        MEMORY_FILE.parent.mkdir(parents=True, exist_ok=True)
        MEMORY_FILE.write_text(json.dumps(memories, ensure_ascii=False, indent=2), encoding='utf-8')
    except Exception as e:
        print(f"Failed to write memory file: {e}")


@register_tool('manage_memory', allow_overwrite=True)
class MemoryTool(BaseTool):
    description = 'Long-term memory manager. Supports add, list, and delete (by index).'
    parameters = {
        'type': 'object',
        'properties': {
            'operation': {'type': 'string', 'description': 'Operation type: add|list|delete'},
            'content': {'type': 'string', 'description': 'Memory content (add only)'},
            'index': {'type': 'integer', 'description': 'Index number (delete only)'}
        },
        'required': ['operation'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        op = params['operation'].lower()
        memories = _load_memory()
        if op == 'add':
            content = params.get('content', '').strip()
            if not content:
                return "Error: content must not be empty."
            memories.append(content)
            _save_memory(memories)
            return f'✅ Saved: "{content}"'
        elif op == 'list':
            if not memories:
                return "No long-term memories yet."
            return "Memory list:\n" + "\n".join([f"[{i}] {m}" for i, m in enumerate(memories)])
        elif op == 'delete':
            idx = params.get('index')
            if idx is None or not (0 <= idx < len(memories)):
                return f"Error: index {idx} is invalid."
            removed = memories.pop(idx)
            _save_memory(memories)
            return f'🗑️ Deleted: "{removed}"'
        return f"Unsupported operation: {op}"


@@ -0,0 +1,107 @@
import json
import os
from pathlib import Path
from typing import Iterable, Union

from qwen_agent.tools.base import BaseTool, register_tool

DEFAULT_MAX_READ_BYTES = 512 * 1024


def _project_root() -> Path:
    return Path(__file__).resolve().parents[1]


def _split_root_items(raw: str) -> list[str]:
    if not raw.strip():
        return []
    return [item.strip() for item in raw.split(os.pathsep) if item.strip()]


def _resolve_roots() -> tuple[Path, ...]:
    roots_value = os.getenv('READONLY_FS_ROOTS', '')
    root_items = _split_root_items(roots_value)
    if not root_items:
        legacy_root = os.getenv('READONLY_FS_ROOT', '')
        if legacy_root.strip():
            root_items = [legacy_root.strip()]
    if not root_items:
        root_items = [str(_project_root())]
    return tuple(Path(os.path.expanduser(item)).resolve() for item in root_items)


def _resolve_target(raw_path: str) -> Path:
    return Path(os.path.expanduser(raw_path)).resolve()


def _is_within_root(target: Path, root: Path) -> bool:
    try:
        target.relative_to(root)
        return True
    except ValueError:
        return False


def _ensure_within_roots(target: Path, roots: Iterable[Path]) -> None:
    allowed_roots = tuple(roots)
    if any(_is_within_root(target, root) for root in allowed_roots):
        return
    allowed_text = ', '.join(str(root) for root in allowed_roots)
    raise PermissionError(f'Access is limited to paths under these roots: {allowed_text}; denied: {target}')


@register_tool('filesystem', allow_overwrite=True)
class ReadOnlyFilesystemTool(BaseTool):
    description = 'Read-only filesystem tool supporting the list and read operations.'
    parameters = {
        'type': 'object',
        'properties': {
            'operation': {
                'type': 'string',
                'description': 'Only list|read are supported'
            },
            'path': {
                'type': 'string',
                'description': 'Target path'
            },
        },
        'required': ['operation', 'path'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        operation = str(params['operation']).strip().lower()
        if operation not in {'list', 'read'}:
            raise PermissionError(f'Read-only policy active; operation={operation} is forbidden')
        roots = _resolve_roots()
        target = _resolve_target(str(params['path']))
        _ensure_within_roots(target, roots)
        if operation == 'list':
            return self._list_path(target)
        return self._read_file(target)

    def _list_path(self, target: Path) -> str:
        if not target.exists():
            raise FileNotFoundError(f'Path does not exist: {target}')
        if target.is_file():
            stat = target.stat()
            payload = {'type': 'file', 'path': str(target), 'size': stat.st_size}
            return json.dumps(payload, ensure_ascii=False)
        items = []
        for child in sorted(target.iterdir()):
            item_type = 'dir' if child.is_dir() else 'file'
            size = child.stat().st_size if child.is_file() else None
            items.append({'name': child.name, 'type': item_type, 'size': size})
        payload = {'type': 'dir', 'path': str(target), 'items': items}
        return json.dumps(payload, ensure_ascii=False, indent=2)

    def _read_file(self, target: Path) -> str:
        if not target.exists() or not target.is_file():
            raise FileNotFoundError(f'File does not exist: {target}')
        limit = int(os.getenv('READONLY_FS_MAX_READ_BYTES', str(DEFAULT_MAX_READ_BYTES)))
        size = target.stat().st_size
        if size > limit:
            raise ValueError(f'File too large: {size} bytes exceeds the read limit of {limit} bytes')
        return target.read_text(encoding='utf-8')
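The whitelist check above hinges on `Path.relative_to`, which raises `ValueError` for any path outside the root; because both sides are passed through `resolve()` first, `..` traversal is canonicalized away before the test. A minimal sketch:

```python
from pathlib import Path

def is_within(target: str, root: str) -> bool:
    # resolve() canonicalizes ".." segments before the containment test.
    try:
        Path(target).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False
```

Note this lexical check does not follow symlinks inside the tree beyond what `resolve()` does; stricter sandboxes may need extra handling.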


@@ -0,0 +1,135 @@
import os
import re
from typing import List, Union

from ddgs import DDGS
from qwen_agent.llm.schema import ContentItem
from qwen_agent.tools.base import BaseTool, register_tool

DEFAULT_RESULTS = 6
DEFAULT_REGION = os.getenv('WEB_SEARCH_REGION', 'wt-wt')
DEFAULT_SAFESEARCH = os.getenv('WEB_SEARCH_SAFESEARCH', 'on')
QUERY_SUFFIX_PATTERN = re.compile(
    r'(是谁|是什么|是啥|什么意思|介绍一下|请介绍|是谁啊|是谁呀|是啥啊|是啥呀|吗|嘛|呢|么)$'
)


def _normalize_query(query: str) -> str:
    compact = query.strip()
    compact = compact.replace('?', '?').replace('!', '!').replace('。', '.')
    compact = compact.strip(' ?!.,;:,。?!;:')
    compact = compact.removeprefix('请问').strip()
    compact = QUERY_SUFFIX_PATTERN.sub('', compact).strip()
    compact = compact.strip(' ?!.,;:,。?!;:')
    return compact or query.strip()


def _clamp_results(value: int) -> int:
    if value < 1:
        return 1
    if value > 12:
        return 12
    return value


@register_tool('web_search', allow_overwrite=True)
class LocalWebSearchTool(BaseTool):
    description = 'Searches the web and returns titles, links, and snippets.'
    parameters = {
        'type': 'object',
        'properties': {
            'query': {
                'type': 'string',
                'description': 'Search keywords'
            },
            'max_results': {
                'type': 'integer',
                'description': 'Number of results; 1 to 12 recommended',
                'default': DEFAULT_RESULTS
            }
        },
        'required': ['query'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        query = _normalize_query(params['query'])
        if not query:
            raise ValueError('query must not be empty')
        max_results = _clamp_results(int(params.get('max_results', DEFAULT_RESULTS)))
        with DDGS() as ddgs:
            results = list(
                ddgs.text(
                    query=query,
                    max_results=max_results,
                    region=DEFAULT_REGION,
                    safesearch=DEFAULT_SAFESEARCH,
                )
            )
        if not results:
            return f'No results found (query={query})'
        lines = []
        for idx, item in enumerate(results, start=1):
            title = item.get('title', '').strip()
            href = item.get('href', '').strip()
            body = item.get('body', '').strip()
            lines.append(f'[{idx}] {title}\nURL: {href}\nSnippet: {body}')
        return '\n\n'.join(lines)


@register_tool('image_search', allow_overwrite=True)
class LocalImageSearchTool(BaseTool):
    description = 'Searches images by keyword and returns mixed text and image results.'
    parameters = {
        'type': 'object',
        'properties': {
            'query': {
                'type': 'string',
                'description': 'Image search keywords'
            },
            'max_results': {
                'type': 'integer',
                'description': 'Number of results; 1 to 12 recommended',
                'default': DEFAULT_RESULTS
            }
        },
        'required': ['query'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> List[ContentItem]:
        params = self._verify_json_format_args(params)
        query = _normalize_query(params['query'])
        if not query:
            raise ValueError('query must not be empty')
        max_results = _clamp_results(int(params.get('max_results', DEFAULT_RESULTS)))
        try:
            with DDGS() as ddgs:
                results = list(
                    ddgs.images(
                        query=query,
                        max_results=max_results,
                        region=DEFAULT_REGION,
                        safesearch=DEFAULT_SAFESEARCH,
                    )
                )
        except Exception as exc:
            return [ContentItem(text=f'Image search failed: {exc}')]
        if not results:
            return [ContentItem(text=f'No images found (query={query})')]
        content: List[ContentItem] = []
        for idx, item in enumerate(results, start=1):
            title = item.get('title', '').strip()
            image_url = item.get('image', '').strip()
            page_url = item.get('url', '').strip()
            text = f'[{idx}] {title}\nImage: {image_url}\nSource: {page_url}'
            content.append(ContentItem(text=text))
            if image_url:
                content.append(ContentItem(image=image_url))
        return content
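The query normalization above strips Chinese question scaffolding (leading "may I ask", trailing "who is / what is" phrases and particles) so the search engine sees just the entity. A standalone sketch of the same pipeline, using a subset of the suffix pattern:

```python
import re

# Trailing Chinese question words/particles to drop (subset of the real pattern).
SUFFIX_PATTERN = re.compile(r'(是谁|是什么|是啥|什么意思|介绍一下|请介绍|吗|嘛|呢|么)$')

def normalize_query(query: str) -> str:
    compact = query.strip()
    compact = compact.strip(' ?!.,;:,。?!;:')   # half- and full-width punctuation
    compact = compact.removeprefix('请问').strip()  # leading "may I ask"
    compact = SUFFIX_PATTERN.sub('', compact).strip()
    compact = compact.strip(' ?!.,;:,。?!;:')
    return compact or query.strip()
```

The final `or query.strip()` guards against over-stripping: if normalization empties the string, the original query is used as-is.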


@@ -0,0 +1,159 @@
import json
import os
import shutil
import subprocess
from pathlib import Path
from typing import Union

from qwen_agent.tools.base import BaseTool, register_tool

DEFAULT_TIMEOUT = 60


def _ensure_parent(path: Path) -> None:
    parent = path.parent
    parent.mkdir(parents=True, exist_ok=True)


def _build_shell_command(command: str) -> list[str]:
    if os.name == 'nt':
        return ['powershell.exe', '-NoProfile', '-Command', command]
    return ['bash', '-lc', command]


@register_tool('filesystem', allow_overwrite=True)
class FilesystemTool(BaseTool):
    description = 'Filesystem tool supporting directory listing, file read/write, directory creation, and deletion.'
    parameters = {
        'type': 'object',
        'properties': {
            'operation': {
                'type': 'string',
                'description': 'list|read|write|append|mkdir|remove'
            },
            'path': {
                'type': 'string',
                'description': 'Target path'
            },
            'content': {
                'type': 'string',
                'description': 'Content to write; required only for write or append'
            }
        },
        'required': ['operation', 'path'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        operation = params['operation'].strip().lower()
        target = Path(os.path.expanduser(params['path'])).resolve()
        handlers = {
            'list': self._list_path,
            'read': self._read_file,
            'write': self._write_file,
            'append': self._append_file,
            'mkdir': self._mkdir_path,
            'remove': self._remove_path,
        }
        if operation not in handlers:
            raise ValueError(f'Unsupported operation: {operation}')
        return handlers[operation](target, params)

    def _list_path(self, target: Path, params: dict) -> str:
        if not target.exists():
            raise FileNotFoundError(f'Path does not exist: {target}')
        if target.is_file():
            stat = target.stat()
            return json.dumps({'type': 'file', 'path': str(target), 'size': stat.st_size}, ensure_ascii=False)
        items = []
        for child in sorted(target.iterdir()):
            item_type = 'dir' if child.is_dir() else 'file'
            size = child.stat().st_size if child.is_file() else None
            items.append({'name': child.name, 'type': item_type, 'size': size})
        return json.dumps({'type': 'dir', 'path': str(target), 'items': items}, ensure_ascii=False, indent=2)

    def _read_file(self, target: Path, params: dict) -> str:
        if not target.exists() or not target.is_file():
            raise FileNotFoundError(f'File does not exist: {target}')
        return target.read_text(encoding='utf-8')

    def _write_file(self, target: Path, params: dict) -> str:
        content = params.get('content')
        if content is None:
            raise ValueError('The write operation requires content')
        _ensure_parent(target)
        target.write_text(content, encoding='utf-8')
        return f'Write succeeded: {target}'

    def _append_file(self, target: Path, params: dict) -> str:
        content = params.get('content')
        if content is None:
            raise ValueError('The append operation requires content')
        _ensure_parent(target)
        with target.open('a', encoding='utf-8') as fp:
            fp.write(content)
        return f'Append succeeded: {target}'

    def _mkdir_path(self, target: Path, params: dict) -> str:
        target.mkdir(parents=True, exist_ok=True)
        return f'Directory created: {target}'

    def _remove_path(self, target: Path, params: dict) -> str:
        if not target.exists():
            raise FileNotFoundError(f'Path does not exist: {target}')
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
        return f'Removed: {target}'


@register_tool('run_command', allow_overwrite=True)
class RunCommandTool(BaseTool):
    description = 'Runs a local command and returns the exit code, stdout, and stderr.'
    parameters = {
        'type': 'object',
        'properties': {
            'command': {
                'type': 'string',
                'description': 'Command to run'
            },
            'cwd': {
                'type': 'string',
                'description': 'Working directory'
            },
            'timeout_sec': {
                'type': 'integer',
                'description': 'Timeout in seconds',
                'default': DEFAULT_TIMEOUT
            }
        },
        'required': ['command'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        command = params['command'].strip()
        if not command:
            raise ValueError('command must not be empty')
        timeout_sec = int(params.get('timeout_sec', DEFAULT_TIMEOUT))
        cwd_raw = params.get('cwd') or os.getcwd()
        cwd = str(Path(os.path.expanduser(cwd_raw)).resolve())
        completed = subprocess.run(
            _build_shell_command(command),
            cwd=cwd,
            text=True,
            capture_output=True,
            timeout=timeout_sec,
            check=False,
        )
        payload = {
            'command': command,
            'cwd': cwd,
            'returncode': completed.returncode,
            'stdout': completed.stdout,
            'stderr': completed.stderr,
        }
        return json.dumps(payload, ensure_ascii=False, indent=2)


@@ -0,0 +1,104 @@
import time
import random
from typing import Tuple, Union

import requests
from requests import Response
from requests.exceptions import SSLError, RequestException
from bs4 import BeautifulSoup
from qwen_agent.tools.base import BaseTool, register_tool

DEFAULT_MAX_CHARS = 10000
# Browser-like request headers, to keep sites such as GitHub from returning 429.
COMMON_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive'
}


def _normalize_text(text: str) -> str:
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    return '\n'.join(lines)


def _fetch_page(url: str, timeout: int = 30, retries: int = 2) -> Tuple[Union[Response, str], bool]:
    """Fetch with retries, jittered backoff, and browser-style headers."""
    for i in range(retries + 1):
        try:
            if i > 0:
                time.sleep(2 + random.uniform(1, 2) * i)
            response = requests.get(url, headers=COMMON_HEADERS, timeout=timeout, verify=True)
            if response.status_code == 429:
                if i < retries:
                    continue
                return 'Error: the target site is rate-limiting requests (429). Try again later; do not fall back to reading unrelated local files.', False
            response.raise_for_status()
            return response, False
        except SSLError:
            try:
                response = requests.get(url, headers=COMMON_HEADERS, timeout=timeout, verify=False)
                response.raise_for_status()
                return response, True
            except Exception as e:
                return f'SSL error, and the unverified fallback also failed: {str(e)}', False
        except RequestException as e:
            if i < retries:
                continue
            return f'Fetch failed: {str(e)}', False
    return 'Unknown fetch error', False


def _extract_page_text(html: str, max_chars: int) -> Tuple[str, str]:
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup(['script', 'style', 'noscript', 'header', 'footer', 'nav']):
        tag.decompose()
    title = soup.title.string.strip() if soup.title and soup.title.string else 'Untitled'
    body_text = _normalize_text(soup.get_text(separator='\n'))
    return title, body_text[:max_chars]


@register_tool('web_fetch', allow_overwrite=True)
class WebFetchTool(BaseTool):
    description = 'Fetches a web page and returns its readable body text.'
    parameters = {
        'type': 'object',
        'properties': {
            'url': {'type': 'string', 'description': 'Page URL'},
            'max_chars': {'type': 'integer', 'description': 'Maximum number of characters to return', 'default': DEFAULT_MAX_CHARS}
        },
        'required': ['url'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        params = self._verify_json_format_args(params)
        url = params['url'].strip()
        max_chars = int(params.get('max_chars', DEFAULT_MAX_CHARS))
        result, insecure = _fetch_page(url)
        if isinstance(result, str):
            return result
        title, body_text = _extract_page_text(result.text, max_chars)
        insecure_note = '(note: fetched over an unverified TLS connection)\n' if insecure else ''
        return f'Title: {title}\nURL: {url}\n{insecure_note}\n{body_text}'


@register_tool('web_extractor', allow_overwrite=True)
class WebExtractorTool(BaseTool):
    description = 'Extracts the body text of a single web page.'
    parameters = {
        'type': 'object',
        'properties': {
            'url': {'type': 'string', 'description': 'Page URL'},
            'max_chars': {'type': 'integer', 'description': 'Maximum number of characters to return', 'default': DEFAULT_MAX_CHARS}
        },
        'required': ['url'],
    }

    def call(self, params: Union[str, dict], **kwargs) -> str:
        # Reuses WebFetchTool's logic, registered as an independent tool class.
        fetcher = WebFetchTool(self.cfg)
        return fetcher.call(params, **kwargs)
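The retry pacing in `_fetch_page` (`time.sleep(2 + random.uniform(1, 2) * i)`) is a fixed base plus a jitter term that grows with the attempt number. Isolated as a sketch with an invented helper name:

```python
import random

def backoff_delays(retries: int, base: float = 2.0) -> list[float]:
    # Attempt i (1-based) waits base + U(1, 2) * i seconds, so later
    # retries back off further while the jitter de-synchronizes clients.
    return [base + random.uniform(1, 2) * attempt
            for attempt in range(1, retries + 1)]

delays = backoff_delays(2)
```

With the defaults, the first retry waits 3-4 seconds and the second 4-6, which is enough for most transient 429 windows without stalling the tool.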


@@ -0,0 +1,170 @@
import json
import os
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Union
from qwen_agent.tools.base import BaseTool, register_tool
ROOT_DIR = Path(__file__).resolve().parents[1]
DATA_DIR = ROOT_DIR / '.tmp' / 'super_agent_data'
MEMORY_FILE = DATA_DIR / 'memory.json'
TODO_DIR = DATA_DIR / 'todos'
TASK_FILE = DATA_DIR / 'tasks.jsonl'
def _build_shell_command(command: str) -> list[str]:
if os.name == 'nt':
return ['powershell.exe', '-NoProfile', '-Command', command]
return ['bash', '-lc', command]
def _ensure_data_dirs() -> None:
DATA_DIR.mkdir(parents=True, exist_ok=True)
TODO_DIR.mkdir(parents=True, exist_ok=True)
if not MEMORY_FILE.exists():
MEMORY_FILE.write_text('{}', encoding='utf-8')
def _load_memory() -> Dict[str, Any]:
_ensure_data_dirs()
return json.loads(MEMORY_FILE.read_text(encoding='utf-8'))
def _save_memory(data: Dict[str, Any]) -> None:
_ensure_data_dirs()
MEMORY_FILE.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding='utf-8')
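The two helpers above amount to a plain JSON round trip. A self-contained sketch using a temporary file in place of `MEMORY_FILE`:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / 'memory.json'
    memory_file.write_text('{}', encoding='utf-8')  # _ensure_data_dirs seeds an empty object

    # _load_memory -> mutate -> _save_memory
    memory = json.loads(memory_file.read_text(encoding='utf-8'))
    memory['preferred_language'] = 'zh-CN'
    memory_file.write_text(json.dumps(memory, ensure_ascii=False, indent=2),
                           encoding='utf-8')

    reloaded = json.loads(memory_file.read_text(encoding='utf-8'))
```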
@register_tool('save_memory', allow_overwrite=True)
class SaveMemoryTool(BaseTool):
description = '保存一条长期记忆,按 key 覆盖写入。'
parameters = {
'type': 'object',
'properties': {
'key': {
'type': 'string',
'description': '记忆键名'
},
'value': {
'type': 'string',
'description': '记忆内容'
}
},
'required': ['key', 'value'],
}
def call(self, params: Union[str, dict], **kwargs) -> str:
params = self._verify_json_format_args(params)
key = params['key'].strip()
if not key:
raise ValueError('key 不能为空')
memory = _load_memory()
memory[key] = params['value']
_save_memory(memory)
return f'已保存记忆: {key}'
@register_tool('read_memory', allow_overwrite=True)
class ReadMemoryTool(BaseTool):
description = '读取长期记忆,支持读取单个 key 或全部。'
parameters = {
'type': 'object',
'properties': {
'key': {
'type': 'string',
'description': '可选,不传则返回全部记忆'
}
},
'required': [],
}
def call(self, params: Union[str, dict], **kwargs) -> str:
params = self._verify_json_format_args(params)
memory = _load_memory()
key = params.get('key')
if key:
return json.dumps({key: memory.get(key)}, ensure_ascii=False, indent=2)
return json.dumps(memory, ensure_ascii=False, indent=2)
@register_tool('todo_write', allow_overwrite=True)
class TodoWriteTool(BaseTool):
description = '写入任务清单文件。'
parameters = {
'type': 'object',
'properties': {
'title': {
'type': 'string',
'description': '清单标题'
},
'items': {
'type': 'array',
'items': {
'type': 'string'
},
'description': '任务项数组'
}
},
'required': ['title', 'items'],
}
def call(self, params: Union[str, dict], **kwargs) -> str:
params = self._verify_json_format_args(params)
_ensure_data_dirs()
ts = datetime.now().strftime('%Y%m%d_%H%M%S')
safe_title = ''.join(ch if ch.isalnum() else '_' for ch in params['title'])[:40]
todo_path = TODO_DIR / f'{ts}_{safe_title}.md'
lines = [f'# {params["title"]}', '']
for item in params['items']:
lines.append(f'- [ ] {item}')
todo_path.write_text('\n'.join(lines), encoding='utf-8')
return f'任务清单已写入: {todo_path}'
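The filename sanitization and checklist rendering in `TodoWriteTool.call` can be exercised on their own (timestamp pinned here for determinism):

```python
from datetime import datetime

def render_todo(title: str, items: list[str]) -> tuple[str, str]:
    # Same logic as TodoWriteTool: timestamped, sanitized filename
    # plus a markdown checklist body.
    ts = datetime(2026, 3, 11, 16, 49, 0).strftime('%Y%m%d_%H%M%S')
    safe_title = ''.join(ch if ch.isalnum() else '_' for ch in title)[:40]
    lines = [f'# {title}', ''] + [f'- [ ] {item}' for item in items]
    return f'{ts}_{safe_title}.md', '\n'.join(lines)

name, body = render_todo('Release v1.0', ['write notes', 'tag build'])
```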
@register_tool('task', allow_overwrite=True)
class TaskTool(BaseTool):
description = '登记任务并可选执行命令,返回执行结果。'
parameters = {
'type': 'object',
'properties': {
'task_name': {
'type': 'string',
'description': '任务名称'
},
'notes': {
'type': 'string',
'description': '任务说明'
},
'command': {
'type': 'string',
'description': '可选,执行命令'
}
},
'required': ['task_name'],
}
def call(self, params: Union[str, dict], **kwargs) -> str:
params = self._verify_json_format_args(params)
_ensure_data_dirs()
event = {
'time': datetime.now().isoformat(timespec='seconds'),
'task_name': params['task_name'],
'notes': params.get('notes', ''),
'command': params.get('command', ''),
}
result = None
command = params.get('command')
if command:
run = subprocess.run(_build_shell_command(command), text=True, capture_output=True, check=False)
result = {
'returncode': run.returncode,
'stdout': run.stdout,
'stderr': run.stderr,
}
event['result'] = result
with TASK_FILE.open('a', encoding='utf-8') as fp:
fp.write(json.dumps(event, ensure_ascii=False) + '\n')
payload = {'saved_to': str(TASK_FILE), 'task': event, 'command_result': result}
return json.dumps(payload, ensure_ascii=False, indent=2)
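`_build_shell_command` dispatches on the host OS: PowerShell on Windows, `bash -lc` elsewhere. A standalone run of the same dispatch:

```python
import os
import subprocess

def build_shell_command(command: str) -> list[str]:
    # Identical dispatch to _build_shell_command above.
    if os.name == 'nt':
        return ['powershell.exe', '-NoProfile', '-Command', command]
    return ['bash', '-lc', command]

run = subprocess.run(build_shell_command('echo hello'),
                     text=True, capture_output=True, check=False)
```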

View File

@@ -0,0 +1,43 @@
import os
import json
from pathlib import Path
from typing import Union
from qwen_agent.tools.base import BaseTool, register_tool
def _split_items(raw: str) -> list[str]:
return [item.strip() for item in raw.split(';') if item.strip()]
def _resolve_write_roots() -> tuple[Path, ...]:
roots_value = os.getenv('WRITEABLE_FS_ROOTS', '')
return tuple(Path(os.path.expanduser(item)).resolve() for item in _split_items(roots_value))
@register_tool('write_file', allow_overwrite=True)
class WriteFileTool(BaseTool):
description = '文件写入工具。只要路径在白名单内,即可直接创建或覆盖文件。'
parameters = {
'type': 'object',
'properties': {
'path': {'type': 'string', 'description': '目标绝对路径'},
'content': {'type': 'string', 'description': '要写入的完整内容'}
},
'required': ['path', 'content'],
}
def call(self, params: Union[str, dict], **kwargs) -> str:
params = self._verify_json_format_args(params)
target = Path(os.path.expanduser(str(params['path']))).resolve()
content = str(params.get('content', ''))
        # Core safeguard: check that the path is inside the whitelist
roots = _resolve_write_roots()
if not any(target.is_relative_to(root) for root in roots):
allowed = ", ".join(str(r) for r in roots)
return f"拒绝写入:路径不在白名单内。允许范围:{allowed}"
try:
target.parent.mkdir(parents=True, exist_ok=True)
with open(target, 'w', encoding='utf-8') as f:
f.write(content)
return f"✅ 成功:内容已保存至 {target}"
except Exception as e:
return f"写入失败:{str(e)}"
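The containment check is the only gate: with an empty `WRITEABLE_FS_ROOTS` the `any(...)` over zero roots is `False`, so every write is refused. A sketch of the same test (the root path below is hypothetical):

```python
from pathlib import Path

def is_write_allowed(target: str, roots: tuple[Path, ...]) -> bool:
    # Same containment test as WriteFileTool; resolve() collapses
    # '..' segments before comparison, so traversal cannot escape.
    resolved = Path(target).resolve()
    return any(resolved.is_relative_to(root) for root in roots)

roots = (Path('/tmp/ai_workspace').resolve(),)
```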

13
bootstrap.bat Normal file
View File

@@ -0,0 +1,13 @@
@echo off
setlocal
set SCRIPT_DIR=%~dp0
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "%SCRIPT_DIR%install.ps1" %*
if errorlevel 1 (
echo.
echo [bootstrap] Install failed.
exit /b 1
)
echo.
echo [bootstrap] Install completed.
echo [bootstrap] Start command: .\start_8080_toolhub_stack.cmd start
exit /b 0

13
bootstrap_q8.bat Normal file
View File

@@ -0,0 +1,13 @@
@echo off
setlocal
set SCRIPT_DIR=%~dp0
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "%SCRIPT_DIR%install_q8.ps1" %*
if errorlevel 1 (
echo.
echo [bootstrap_q8] Q8 install failed.
exit /b 1
)
echo.
echo [bootstrap_q8] Q8 install started or completed.
echo [bootstrap_q8] Start command: .\start_8080_toolhub_stack.cmd start
exit /b 0

56
compose.yml Normal file
View File

@@ -0,0 +1,56 @@
services:
gateway:
build:
context: .
dockerfile: docker/gateway/Dockerfile
restart: unless-stopped
environment:
GATEWAY_HOST: 0.0.0.0
GATEWAY_PORT: 8080
BACKEND_BASE: http://backend:8081
MODEL_SERVER: http://backend:8081/v1
BACKEND_WAIT_HINT: docker compose logs -f backend
ACCESS_URLS: http://127.0.0.1:${GATEWAY_PORT:-8080}
READONLY_FS_ROOTS: /workspace
ports:
- "${GATEWAY_PORT:-8080}:8080"
volumes:
- .:/workspace:ro
depends_on:
- backend
healthcheck:
test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:8080/gateway/health >/dev/null || exit 1"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
backend:
build:
context: .
dockerfile: docker/backend/Dockerfile
restart: unless-stopped
environment:
HOST: 0.0.0.0
PORT: 8081
THINK_MODE: ${THINK_MODE:-think-on}
CTX_SIZE: ${CTX_SIZE:-16384}
IMAGE_MIN_TOKENS: ${IMAGE_MIN_TOKENS:-256}
IMAGE_MAX_TOKENS: ${IMAGE_MAX_TOKENS:-1024}
MMPROJ_OFFLOAD: ${MMPROJ_OFFLOAD:-off}
MODEL_PATH: /models/model.gguf
MMPROJ_PATH: /models/mmproj.gguf
MODEL_GGUF_URL: ${MODEL_GGUF_URL:-https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf}
MODEL_MMPROJ_URL: ${MODEL_MMPROJ_URL:-https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen3.5-9B-BF16.gguf}
MODEL_GGUF_SHA256: ${MODEL_GGUF_SHA256:-}
MODEL_MMPROJ_SHA256: ${MODEL_MMPROJ_SHA256:-}
expose:
- "8081"
volumes:
- toolhub-models:/models
gpus: all
healthcheck:
test: ["NONE"]
volumes:
toolhub-models:

15
docker/backend/Dockerfile Normal file
View File

@@ -0,0 +1,15 @@
FROM ghcr.io/ggml-org/llama.cpp:server-cuda
USER root
WORKDIR /app
RUN apt-get update \
&& apt-get install -y --no-install-recommends curl ca-certificates \
&& rm -rf /var/lib/apt/lists/*
COPY docker/backend/entrypoint.sh /usr/local/bin/toolhub-backend-entrypoint.sh
COPY docker/backend/entrypoint_helpers.sh /usr/local/bin/toolhub-backend-helpers.sh
RUN chmod +x /usr/local/bin/toolhub-backend-entrypoint.sh /usr/local/bin/toolhub-backend-helpers.sh
ENTRYPOINT ["/usr/local/bin/toolhub-backend-entrypoint.sh"]

View File

@@ -0,0 +1,176 @@
#!/usr/bin/env bash
set -euo pipefail
DEFAULT_GGUF_URL="https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf"
DEFAULT_MMPROJ_URL="https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen3.5-9B-BF16.gguf"
BACKEND_READY_TIMEOUT_SEC=180
RECENT_LOG_LINE_COUNT=80
. /usr/local/bin/toolhub-backend-helpers.sh
log_step() {
printf '[toolhub-backend] %s\n' "$1"
}
log_stage() {
log_step "$1"
}
resolve_llama_server_bin() {
local candidate=""
if candidate="$(command -v llama-server 2>/dev/null)"; then
printf '%s\n' "$candidate"
return
fi
candidate="/app/llama-server"
if [[ -x "$candidate" ]]; then
printf '%s\n' "$candidate"
return
fi
    printf '未找到 llama-server 可执行文件:既不在 PATH 中,也不在 /app/llama-server\n' >&2
exit 1
}
require_positive_integer() {
local key="$1"
local value="$2"
if [[ ! "$value" =~ ^[0-9]+$ ]] || [[ "$value" -le 0 ]]; then
printf '%s 必须是正整数,收到: %s\n' "$key" "$value" >&2
exit 1
fi
}
verify_sha256() {
local path="$1"
local expected="$2"
if [[ -z "$expected" ]]; then
return
fi
local actual
actual="$(sha256sum "$path" | awk '{print $1}')"
if [[ "${actual,,}" != "${expected,,}" ]]; then
printf 'SHA256 校验失败: %s\n' "$path" >&2
printf '期望: %s\n' "$expected" >&2
printf '实际: %s\n' "$actual" >&2
exit 1
fi
}
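A Python counterpart of `verify_sha256` (hashlib instead of `sha256sum`; streamed reads so multi-GB GGUF files never load into memory at once):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open('rb') as fp:
        # Read in 1 MiB chunks to keep memory flat on large files.
        for chunk in iter(lambda: fp.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / 'model.gguf'
    model.write_bytes(b'dummy weights')
    actual = sha256_of(model)
```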
resolve_runtime_profile() {
case "${THINK_MODE:-think-on}" in
think-on)
REASONING_BUDGET="-1"
MAX_TOKENS="-1"
;;
think-off)
REASONING_BUDGET="0"
MAX_TOKENS="2048"
;;
*)
printf '不支持的 THINK_MODE: %s\n' "${THINK_MODE:-}" >&2
exit 1
;;
esac
}
main() {
local host_addr="${HOST:-0.0.0.0}"
local port_num="${PORT:-8081}"
local model_path="${MODEL_PATH:-/models/model.gguf}"
local mmproj_path="${MMPROJ_PATH:-/models/mmproj.gguf}"
local gguf_url="${MODEL_GGUF_URL:-$DEFAULT_GGUF_URL}"
local mmproj_url="${MODEL_MMPROJ_URL:-$DEFAULT_MMPROJ_URL}"
local ctx_size="${CTX_SIZE:-16384}"
local image_min_tokens="${IMAGE_MIN_TOKENS:-256}"
local image_max_tokens="${IMAGE_MAX_TOKENS:-1024}"
local mmproj_offload="${MMPROJ_OFFLOAD:-off}"
local backend_ready_timeout_sec="$BACKEND_READY_TIMEOUT_SEC"
local llama_server_bin
local runtime_dir="/tmp/toolhub-backend"
local stdout_log="${runtime_dir}/llama-server.stdout.log"
local stderr_log="${runtime_dir}/llama-server.stderr.log"
local llama_pid
log_stage '阶段 1/6: 检查运行参数'
require_positive_integer "PORT" "$port_num"
require_positive_integer "CTX_SIZE" "$ctx_size"
require_positive_integer "IMAGE_MIN_TOKENS" "$image_min_tokens"
require_positive_integer "IMAGE_MAX_TOKENS" "$image_max_tokens"
require_positive_integer "BACKEND_READY_TIMEOUT_SEC" "$backend_ready_timeout_sec"
if (( image_min_tokens > image_max_tokens )); then
printf 'IMAGE_MIN_TOKENS 不能大于 IMAGE_MAX_TOKENS\n' >&2
exit 1
fi
if [[ "$mmproj_offload" != "on" && "$mmproj_offload" != "off" ]]; then
        printf 'MMPROJ_OFFLOAD 仅支持 on 或 off,收到: %s\n' "$mmproj_offload" >&2
exit 1
fi
resolve_runtime_profile
llama_server_bin="$(resolve_llama_server_bin)"
mkdir -p "$runtime_dir"
: > "$stdout_log"
: > "$stderr_log"
log_stage '阶段 2/6: 检查或下载主模型'
download_if_missing "$model_path" "$gguf_url" "主模型"
log_stage '阶段 3/6: 检查或下载视觉模型'
download_if_missing "$mmproj_path" "$mmproj_url" "视觉模型"
log_stage '阶段 4/6: 校验模型文件'
verify_sha256 "$model_path" "${MODEL_GGUF_SHA256:-}"
verify_sha256 "$mmproj_path" "${MODEL_MMPROJ_SHA256:-}"
local args=(
-m "$model_path"
-mm "$mmproj_path"
--n-gpu-layers all
--flash-attn on
--fit on
--fit-target 256
--temp 1.0
--top-p 0.95
--top-k 20
--min-p 0.1
--presence-penalty 1.5
--repeat-penalty 1.05
-n "$MAX_TOKENS"
--reasoning-budget "$REASONING_BUDGET"
-c "$ctx_size"
--image-min-tokens "$image_min_tokens"
--image-max-tokens "$image_max_tokens"
--host "$host_addr"
--port "$port_num"
--webui
)
if [[ "$mmproj_offload" == "off" ]]; then
args+=(--no-mmproj-offload)
else
args+=(--mmproj-offload)
fi
log_stage '阶段 5/6: 启动 llama-server'
log_step "启动参数: host=$host_addr port=$port_num think=${THINK_MODE:-think-on}"
"$llama_server_bin" "${args[@]}" >"$stdout_log" 2>"$stderr_log" &
llama_pid=$!
log_step "llama-server 已启动: PID ${llama_pid}"
log_stage '阶段 6/6: 等待模型加载到 GPU'
if ! wait_for_backend_ready "$port_num" "$backend_ready_timeout_sec" "$llama_pid" "$stdout_log" "$stderr_log"; then
if kill -0 "$llama_pid" 2>/dev/null; then
kill "$llama_pid" 2>/dev/null || true
wait "$llama_pid" 2>/dev/null || true
fi
exit 1
fi
wait "$llama_pid"
}
main "$@"

View File

@@ -0,0 +1,156 @@
#!/usr/bin/env bash
show_recent_server_logs() {
local stdout_log="$1"
local stderr_log="$2"
log_step '后端启动失败,最近日志如下'
if [[ -s "$stdout_log" ]]; then
log_step '=== 最近标准输出 ==='
tail -n "$RECENT_LOG_LINE_COUNT" "$stdout_log"
fi
if [[ -s "$stderr_log" ]]; then
log_step '=== 最近标准错误 ==='
tail -n "$RECENT_LOG_LINE_COUNT" "$stderr_log" >&2
fi
}
probe_backend_ready() {
local port_num="$1"
curl -fsS "http://127.0.0.1:${port_num}/health" >/dev/null 2>&1
}
wait_for_backend_ready() {
local port_num="$1"
local timeout_sec="$2"
local llama_pid="$3"
local stdout_log="$4"
local stderr_log="$5"
local elapsed_sec=0
while (( elapsed_sec < timeout_sec )); do
if ! kill -0 "$llama_pid" 2>/dev/null; then
log_step '后端启动失败: llama-server 进程已提前退出'
show_recent_server_logs "$stdout_log" "$stderr_log"
return 1
fi
if probe_backend_ready "$port_num"; then
log_step '后端健康检查已通过,网关会继续完成预热'
return 0
fi
log_step "等待模型加载到 GPU... ${elapsed_sec}/${timeout_sec}"
sleep 1
elapsed_sec=$((elapsed_sec + 1))
done
log_step "后端在 ${timeout_sec} 秒内未就绪"
show_recent_server_logs "$stdout_log" "$stderr_log"
return 1
}
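`wait_for_backend_ready` is a one-second poll loop with two exits: probe success, or the llama-server process dying early. The same shape in Python with injectable probes (a hypothetical illustration, not project code):

```python
import time

def wait_until_ready(probe, process_alive, timeout_sec: int,
                     interval_sec: float = 0.0) -> bool:
    # Mirrors wait_for_backend_ready: fail fast if the server process
    # died, succeed when the health probe passes, give up on timeout.
    # The real script sleeps 1s per iteration; interval is injectable here.
    elapsed = 0
    while elapsed < timeout_sec:
        if not process_alive():
            return False
        if probe():
            return True
        time.sleep(interval_sec)
        elapsed += 1
    return False

attempts = iter([False, False, True])
ok = wait_until_ready(lambda: next(attempts), lambda: True, timeout_sec=10)
```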
format_bytes() {
local bytes="$1"
awk -v bytes="$bytes" '
BEGIN {
split("B KiB MiB GiB TiB", units, " ")
value = bytes + 0
idx = 1
while (value >= 1024 && idx < 5) {
value /= 1024
idx++
}
printf "%.1f %s", value, units[idx]
}
'
}
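`format_bytes` walks the unit ladder in awk; the same conversion in Python:

```python
def format_bytes(n: float) -> str:
    # Same ladder as the awk helper: divide by 1024 until the value
    # drops below 1024 or units run out, then print one decimal place.
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB']
    value = float(n)
    idx = 0
    while value >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    return f'{value:.1f} {units[idx]}'
```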
resolve_content_length() {
local url="$1"
curl -fsSLI "$url" \
| tr -d '\r' \
| awk 'tolower($1) == "content-length:" { print $2 }' \
| tail -n 1
}
read_file_size() {
local path="$1"
if [[ -f "$path" ]]; then
stat -c '%s' "$path"
return
fi
printf '0\n'
}
render_progress_message() {
local label="$1"
local current_bytes="$2"
local total_bytes="$3"
local speed_bytes="$4"
local current_text
local total_text
local speed_text
current_text="$(format_bytes "$current_bytes")"
speed_text="$(format_bytes "$speed_bytes")"
total_text="$(format_bytes "${total_bytes:-0}")"
if [[ -n "$total_bytes" && "$total_bytes" =~ ^[0-9]+$ && "$total_bytes" -gt 0 ]]; then
awk -v label="$label" -v current="$current_bytes" -v total="$total_bytes" \
-v current_text="$current_text" -v total_text="$total_text" -v speed_text="$speed_text" '
BEGIN {
pct = (current / total) * 100
printf "下载%s: %.1f%% %s / %s %s/s\n",
label, pct, current_text, total_text, speed_text
}
'
return
fi
printf '下载%s: 已下载 %s %s/s\n' "$label" "$current_text" "$speed_text"
}
download_if_missing() {
local path="$1"
local url="$2"
local label="$3"
local temp_path="${path}.part"
local total_bytes=""
local previous_bytes=0
local current_bytes=0
local speed_bytes=0
local curl_pid
mkdir -p "$(dirname "$path")"
if [[ -f "$path" ]]; then
log_step "检测到现有${label},跳过下载"
return
fi
log_step "下载${label}: $url"
total_bytes="$(resolve_content_length "$url" || true)"
previous_bytes="$(read_file_size "$temp_path")"
curl --fail --location --retry 5 --retry-delay 2 --retry-connrefused \
--continue-at - --output "$temp_path" --silent --show-error "$url" &
curl_pid=$!
while kill -0 "$curl_pid" 2>/dev/null; do
sleep 2
current_bytes="$(read_file_size "$temp_path")"
speed_bytes=$(( (current_bytes - previous_bytes) / 2 ))
if (( speed_bytes < 0 )); then
speed_bytes=0
fi
log_step "$(render_progress_message "$label" "$current_bytes" "$total_bytes" "$speed_bytes")"
previous_bytes="$current_bytes"
done
if ! wait "$curl_pid"; then
printf '下载失败: %s\n' "$url" >&2
exit 1
fi
current_bytes="$(read_file_size "$temp_path")"
log_step "下载${label}完成: $(format_bytes "$current_bytes")"
mv "$temp_path" "$path"
}

14
docker/gateway/Dockerfile Normal file
View File

@@ -0,0 +1,14 @@
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN python -m pip install --no-cache-dir --upgrade pip wheel \
&& python -m pip install --no-cache-dir -r /app/requirements.txt
COPY . /app
CMD ["python", "run_8080_toolhub_gateway.py", "--host", "0.0.0.0", "--port", "8080"]

84
docs/DOCKER_COMPOSE.md Normal file
View File

@@ -0,0 +1,84 @@
# Docker Compose
ToolHub ships a Docker Compose entry point, suited to Linux hosts or to users who prefer not to install Python on a Windows host. This is an optional route; it does not replace the native Windows scripts as the primary path.
---
## Prerequisites
- Docker and Docker Compose installed
- NVIDIA GPU driver installed, with the NVIDIA Container Toolkit available
Verify the GPU container environment:
```bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
---
## Start and stop
```bash
docker compose up --build      # start in the foreground
docker compose up --build -d   # start in the background
docker compose down            # stop
```
On first start the backend container downloads the model files automatically; afterwards they are cached in the named Docker volume `toolhub-models`.
Once started, open [http://127.0.0.1:8080](http://127.0.0.1:8080) in a browser.
If the backend is still downloading the model or loading it onto the GPU, the browser first shows a "preparing" page. In that case check:
```bash
docker compose logs -f backend
```
to follow the download and loading progress.
---
## Container layout
Compose starts two services:
| Service | Base image | Role |
| --- | --- | --- |
| `gateway` | `python:3.11-slim` | Gateway layer: web entry point and OpenAI-compatible API (port 8080) |
| `backend` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | Model backend: GPU inference (port 8081) |
The architecture matches the native Windows route: the browser talks to the gateway, which forwards inference requests to the backend. The gateway container mounts the project directory read-only (`/workspace`), so filesystem access behaves the same as on the Windows route.
---
## Model management
Models are not baked into the images. The backend container downloads them from Hugging Face on first start and caches them in the named volume `toolhub-models`. The Q4_K_M quantization is downloaded by default.
To switch to Q8, point `MODEL_GGUF_URL` in `.env` at the Q8 download URL (or run `.\install_q8.cmd` on the host first to have it edited automatically), then restart the containers:
```bash
docker compose down
docker compose up --build -d
```
> The in-container model cache (the named volume) and the Windows route's local cache (`.tmp/models/`) are two independent caches and do not affect each other.
---
## Configuration
Compose reads configuration from the `.env` file. The following variables affect container behavior:
| Variable | Default | Description |
| --- | --- | --- |
| `GATEWAY_PORT` | `8080` | Gateway port on the host |
| `BACKEND_PORT` | `8081` | Backend port |
| `THINK_MODE` | `think-on` | Thinking mode |
| `CTX_SIZE` | `16384` | Context window size |
| `IMAGE_MIN_TOKENS` | `256` | Minimum image tokens |
| `IMAGE_MAX_TOKENS` | `1024` | Maximum image tokens |
| `MMPROJ_OFFLOAD` | `off` | Vision projector offload switch |
Restart the containers after editing `.env` for changes to take effect.

137
docs/QUICKSTART.md Normal file
View File

@@ -0,0 +1,137 @@
# Quick start
Complete instructions from zero to a working setup. The default route is native Windows; WSL and Docker Compose are covered at the end.
---
## System requirements
| Item | Requirement |
| --- | --- |
| OS | Windows 10 / 11 |
| GPU | NVIDIA, driver ≥ 525, ≥ 8 GB VRAM recommended |
| Python | 3.10+, on PATH |
| Disk | ≥ 20 GB free |
> At Q4_K_M quantization the model plus the vision projector takes about 6.1 GB of VRAM. 8 GB of VRAM is sufficient.
The Docker Compose route does not require Python on the host; see the [Docker Compose docs](DOCKER_COMPOSE.md) for its requirements.
---
## 1. Install
Double-click `bootstrap.bat`, or run from the command line:
```powershell
.\install.cmd
```
The installer automatically:
- creates a Python virtual environment and installs dependencies
- downloads the llama.cpp CUDA runtime
- downloads the Qwen3.5-9B Q4_K_M main model and the mmproj vision projector model
The first install downloads about 6 GB of model files; make sure the network is stable.
---
## 2. Start
```powershell
.\start_8080_toolhub_stack.cmd start
```
The first start takes 30 to 60 seconds to load the model onto the GPU. The stack is ready once you see "stack started".
---
## 3. Open the web UI
Open [http://127.0.0.1:8080](http://127.0.0.1:8080) in a browser.
---
## 4. Service management
```powershell
.\start_8080_toolhub_stack.cmd start    # start
.\start_8080_toolhub_stack.cmd stop     # stop
.\start_8080_toolhub_stack.cmd restart  # restart
.\start_8080_toolhub_stack.cmd status   # show status
.\start_8080_toolhub_stack.cmd logs     # show logs
```
---
## 5. Optional: upgrade to Q8 quantization
With ≥ 12 GB of VRAM you can switch to Q8 for higher inference precision.
Double-click `bootstrap_q8.bat`, or run `.\install_q8.cmd`. The script edits the model path and download URL in `.env` automatically, then starts the download. The mmproj vision model does not need to be replaced.
When the download finishes, run `.\start_8080_toolhub_stack.cmd restart` to switch.
---
## 6. Configuration
Copy `.env.example` to `.env` and edit as needed; the start script loads it automatically.
Common adjustments:
**Switch thinking mode:**
```powershell
$env:THINK_MODE = 'think-off'; .\start_8080_toolhub_stack.cmd restart
```
**Shrink the context window to save VRAM:**
```powershell
$env:CTX_SIZE = '8192'; .\start_8080_toolhub_stack.cmd restart
```
**Widen the readable filesystem scope:** edit `READONLY_FS_ROOTS` in `.env`; separate multiple directories with semicolons. When empty, only the project directory is readable.
Run `.\start_8080_toolhub_stack.cmd restart` after changes to apply them.
---
## 7. API calls
The gateway is compatible with the OpenAI API format:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-9B-Q4_K_M",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Any tech news today?"}
    ]
  }'
```
Clients that support the OpenAI API can set the Base URL to `http://127.0.0.1:8080/v1`.
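The same request can be issued from Python's standard library; the payload mirrors the curl example above (the gateway is assumed reachable at 127.0.0.1:8080):

```python
import json
from urllib import request

def build_chat_request(prompt: str,
                       base_url: str = 'http://127.0.0.1:8080/v1') -> request.Request:
    # Same payload as the curl example; stream disabled here so the
    # response arrives as a single JSON body.
    payload = {
        'model': 'Qwen3.5-9B-Q4_K_M',
        'stream': False,
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return request.Request(
        f'{base_url}/chat/completions',
        data=json.dumps(payload, ensure_ascii=False).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

req = build_chat_request('Any tech news today?')
# with request.urlopen(req) as resp:   # requires a running gateway
#     print(json.load(resp)['choices'][0]['message']['content'])
```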
---
## Other entry points
### WSL
The WSL entry point reuses the main Windows pipeline; it does not create a separate Linux virtual environment.
```bash
./install.sh                         # install
./start_8080_toolhub_stack.sh start  # start
```
Service management commands match Windows; just replace `.cmd` with `.sh`.
### Docker Compose
No host Python install or manual model download required. See the [Docker Compose docs](DOCKER_COMPOSE.md).

21
docs/RELEASE_NOTES.md Normal file
View File

@@ -0,0 +1,21 @@
# RELEASE NOTES
## v1.0.0
Release date: 2026-03-04
Highlights:
- Delivery scope fixed to Qwen3.5-9B
- Entry point fixed to the port-8080 web UI
- Tool capabilities integrated into the 8080 gateway
- Streaming output and chain-of-thought output supported
- Performance statistics shown below each answer
- Chain-of-thought tokens counted in the statistics
- Live statistics above the input bar can be hidden
- Install scripts, start scripts, and documentation included
Limitations:
- Only the 9B model is supported
- Local deployment only

116
docs/TROUBLESHOOTING.md Normal file
View File

@@ -0,0 +1,116 @@
# FAQ and troubleshooting
---
## 1. PowerShell reports a script execution policy error
If you see `PSSecurityException` or `about_Execution_Policies`, use the `.cmd` entry points instead:
```powershell
.\install.cmd
.\start_8080_toolhub_stack.cmd start
```
If you must invoke the `.ps1` directly:
```powershell
powershell -NoProfile -ExecutionPolicy Bypass -File .\install.ps1
```
---
## 2. llama-server.exe not found
Re-run the installer:
```powershell
.\install.cmd
```
Afterwards confirm the file exists: `.tmp\llama_win_cuda\llama-server.exe`.
---
## 3. Model files reported incomplete
Check that both files exist and have a normal size:
- the main model file pointed to by `MODEL_PATH` in `.env`, by default `Qwen3.5-9B-Q4_K_M.gguf`, or `Qwen3.5-9B-Q8_0.gguf` after a Q8 install
- `.tmp\models\crossrepo\lmstudio-community__Qwen3.5-9B-GGUF\mmproj-Qwen3.5-9B-BF16.gguf`
If a file is truncated or 0 bytes, delete it and re-run `.\install.cmd`.
---
## 4. Model not ready after start
```powershell
.\start_8080_toolhub_stack.cmd status
.\start_8080_toolhub_stack.cmd logs
```
The first start takes 30 to 60 seconds to load the model; if the service just started, wait a moment.
---
## 5. Page reports a content encoding error
```powershell
.\start_8080_toolhub_stack.cmd restart
```
If it persists, clear the browser cache and refresh.
---
## 6. Out of VRAM
At Q4_K_M quantization the model plus the vision projector takes about 6.1 GB of VRAM. If VRAM is tight:
**Shrink the context window:**
```powershell
$env:CTX_SIZE = '8192'; .\start_8080_toolhub_stack.cmd restart
```
**Lower the image token cap:**
```powershell
$env:IMAGE_MAX_TOKENS = '512'; .\start_8080_toolhub_stack.cmd restart
```
You can also edit the corresponding values in `.env` directly, then restart.
---
## 7. No performance statistics below answers
Restart the service and send a new message. Old messages are not backfilled with statistics.
---
## 8. WSL
The WSL entry point reuses the main Windows pipeline. If `powershell.exe` is not found inside WSL, check whether `interop` is disabled in your WSL configuration.
---
## 9. Docker Compose
### Containers fail to start
Confirm the GPU container environment works:
```bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
If GPU information is not printed, fix the GPU container environment first.
### Model download fails
Containers download the models automatically on first start. If the download fails, override `MODEL_GGUF_URL` and `MODEL_MMPROJ_URL` in `.env` with a faster mirror, then run `docker compose up --build`.
### Port conflicts
Change `GATEWAY_PORT` and `BACKEND_PORT` in `.env`, then restart the containers.

120
env_config.ps1 Normal file
View File

@@ -0,0 +1,120 @@
function Normalize-EnvValue {
param([string]$Value)
$trimmed = $Value.Trim()
if (-not $trimmed) {
return ''
}
if ($trimmed.StartsWith('#')) {
return ''
}
$hashIndex = $trimmed.IndexOf(' #')
if ($hashIndex -ge 0) {
$trimmed = $trimmed.Substring(0, $hashIndex).TrimEnd()
}
$hasQuotes = (
($trimmed.StartsWith('"') -and $trimmed.EndsWith('"')) -or
($trimmed.StartsWith("'") -and $trimmed.EndsWith("'"))
)
if ($hasQuotes -and $trimmed.Length -ge 2) {
return $trimmed.Substring(1, $trimmed.Length - 2)
}
return $trimmed
}
function Import-EnvFile {
param([string]$Path)
if (-not (Test-Path $Path)) {
return
}
foreach ($line in Get-Content -Path $Path -Encoding UTF8) {
$trimmed = $line.Trim()
if (-not $trimmed -or $trimmed.StartsWith('#')) {
continue
}
$delimiter = $trimmed.IndexOf('=')
if ($delimiter -lt 1) {
continue
}
$key = $trimmed.Substring(0, $delimiter).Trim()
$value = Normalize-EnvValue -Value ($trimmed.Substring($delimiter + 1))
if (-not $key -or (Test-Path "Env:$key")) {
continue
}
[Environment]::SetEnvironmentVariable($key, $value, 'Process')
}
}
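`Import-EnvFile` skips blank lines and comments, trims inline ` #` comments, strips one layer of matching quotes, and never overrides keys already set in the environment. A Python rendering of the same parse rules (a hypothetical helper, not part of the project):

```python
from typing import Optional, Tuple

def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    # Mirrors Normalize-EnvValue + Import-EnvFile line handling:
    # skip blanks/comments, split on the first '=', drop an inline
    # ' #' comment, strip one layer of matching quotes.
    stripped = line.strip()
    if not stripped or stripped.startswith('#') or '=' not in stripped:
        return None
    key, _, value = stripped.partition('=')
    key = key.strip()
    if not key:
        return None
    value = value.strip()
    if value.startswith('#'):
        return key, ''
    hash_idx = value.find(' #')
    if hash_idx >= 0:
        value = value[:hash_idx].rstrip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
        value = value[1:-1]
    return key, value
```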
function Resolve-ManagedPath {
param(
[string]$BaseDir,
[string]$Value,
[string]$DefaultRelativePath
)
$effective = if ([string]::IsNullOrWhiteSpace($Value)) { $DefaultRelativePath } else { $Value.Trim() }
if ([string]::IsNullOrWhiteSpace($effective)) {
return ''
}
if ([System.IO.Path]::IsPathRooted($effective)) {
return $effective
}
return [System.IO.Path]::GetFullPath((Join-Path $BaseDir $effective))
}
function Ensure-EnvFile {
param(
[string]$Path,
[string]$TemplatePath
)
if (Test-Path $Path) {
return
}
if (Test-Path $TemplatePath) {
Copy-Item -Path $TemplatePath -Destination $Path -Force
return
}
Set-Content -Path $Path -Value @() -Encoding UTF8
}
function Set-EnvFileValue {
param(
[string]$Path,
[string]$Key,
[string]$Value
)
$lines = [System.Collections.Generic.List[string]]::new()
if (Test-Path $Path) {
foreach ($line in Get-Content -Path $Path -Encoding UTF8) {
$lines.Add([string]$line)
}
}
$replacement = "$Key=$Value"
$pattern = '^\s*' + [regex]::Escape($Key) + '\s*='
$updated = $false
for ($i = 0; $i -lt $lines.Count; $i++) {
if ($lines[$i] -match $pattern) {
$lines[$i] = $replacement
$updated = $true
break
}
}
if (-not $updated) {
if ($lines.Count -gt 0 -and -not [string]::IsNullOrWhiteSpace($lines[$lines.Count - 1])) {
$lines.Add('')
}
$lines.Add($replacement)
}
Set-Content -Path $Path -Value $lines -Encoding UTF8
}
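`Set-EnvFileValue` is an upsert: replace the first `KEY=` line if present, otherwise append, separated by a blank line. The same behavior as a pure function over the file's lines (a hypothetical sketch):

```python
import re

def set_env_value(lines: list[str], key: str, value: str) -> list[str]:
    # Upsert semantics matching Set-EnvFileValue: replace the first
    # 'KEY=' line, else append at the end after a blank separator.
    pattern = re.compile(r'^\s*' + re.escape(key) + r'\s*=')
    out = list(lines)
    for i, line in enumerate(out):
        if pattern.match(line):
            out[i] = f'{key}={value}'
            return out
    if out and out[-1].strip():
        out.append('')
    out.append(f'{key}={value}')
    return out

updated = set_env_value(['THINK_MODE=think-on', 'CTX_SIZE=16384'],
                        'CTX_SIZE', '8192')
```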

5
install.cmd Normal file
View File

@@ -0,0 +1,5 @@
@echo off
setlocal
set SCRIPT_DIR=%~dp0
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "%SCRIPT_DIR%install.ps1" %*
exit /b %ERRORLEVEL%

49
install.ps1 Normal file
View File

@@ -0,0 +1,49 @@
param(
[switch]$Wsl
)
$ErrorActionPreference = 'Stop'
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$WinInstaller = Join-Path $ScriptDir 'install.win.ps1'
function Write-Step {
param([string]$Message)
Write-Host "[install] $Message"
}
function Invoke-WslInstaller {
if (-not (Get-Command wsl.exe -ErrorAction SilentlyContinue)) {
throw 'wsl.exe not found. Please install WSL first.'
}
Write-Step "Run install.sh inside WSL"
$WslDir = (& wsl.exe wslpath -a "$ScriptDir").Trim()
if ([string]::IsNullOrWhiteSpace($WslDir)) {
throw 'Cannot convert current directory to a WSL path.'
}
$Cmd = "cd '$WslDir' && ./install.sh"
& wsl.exe bash -lc $Cmd
if ($LASTEXITCODE -ne 0) {
throw "Install failed, exit code: $LASTEXITCODE"
}
Write-Step 'Install completed (WSL)'
Write-Step 'Start command: ./start_8080_toolhub_stack.sh start'
}
function Invoke-WinInstaller {
if (-not (Test-Path $WinInstaller)) {
throw "Windows installer not found: $WinInstaller"
}
Write-Step 'Run install.win.ps1'
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $WinInstaller
if ($LASTEXITCODE -ne 0) {
throw "Windows install failed, exit code: $LASTEXITCODE"
}
Write-Step 'Install completed (Windows)'
Write-Step 'Start command: .\start_8080_toolhub_stack.cmd start'
}
if ($Wsl) {
Invoke-WslInstaller
} else {
Invoke-WinInstaller
}

96
install.sh Normal file
View File

@@ -0,0 +1,96 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
WIN_INSTALLER_PS1="$ROOT_DIR/install.win.ps1"
print_usage() {
cat <<'USAGE'
用法:
./install.sh
说明:
这是 WSL 兼容入口。
它会直接复用 Windows 安装主脚本,和 cmd / PowerShell 的安装结果保持一致。
USAGE
}
to_win_path_if_needed() {
local raw="$1"
if [[ -z "$raw" ]]; then
printf ''
return
fi
if [[ "$raw" == /* ]]; then
wslpath -w "$raw"
return
fi
printf '%s' "$raw"
}
ps_escape_single_quotes() {
printf "%s" "$1" | sed "s/'/''/g"
}
require_windows_power_shell() {
if ! command -v powershell.exe >/dev/null 2>&1; then
echo "未找到 powershell.exeWSL 兼容入口无法调用 Windows 安装器。"
exit 1
fi
if [[ ! -f "$WIN_INSTALLER_PS1" ]]; then
echo "缺少安装脚本: $WIN_INSTALLER_PS1"
exit 1
fi
}
build_env_overrides() {
local -n out_ref=$1
out_ref=()
for key in PYTHON_BIN LLAMA_WIN_CUDA_URL LLAMA_WIN_CUDART_URL MODEL_GGUF_URL MODEL_MMPROJ_URL MODEL_GGUF_SHA256 MODEL_MMPROJ_SHA256; do
if [[ -z "${!key-}" ]]; then
continue
fi
local value="${!key}"
if [[ "$key" == "PYTHON_BIN" ]]; then
value="$(to_win_path_if_needed "$value")"
fi
out_ref+=("$key=$value")
done
}
build_ps_env_setup() {
local -n env_ref=$1
local lines=()
local item key value escaped_value
for item in "${env_ref[@]}"; do
key="${item%%=*}"
value="${item#*=}"
escaped_value="$(ps_escape_single_quotes "$value")"
lines+=("[Environment]::SetEnvironmentVariable('$key', '$escaped_value', 'Process')")
done
printf '%s; ' "${lines[@]}"
}
main() {
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
print_usage
exit 0
fi
require_windows_power_shell
local installer_win
installer_win="$(wslpath -w "$WIN_INSTALLER_PS1")"
local env_overrides=()
build_env_overrides env_overrides
local ps_command
local ps_env_setup
ps_env_setup="$(build_ps_env_setup env_overrides)"
ps_command="[Console]::InputEncoding = [System.Text.UTF8Encoding]::new(\$false); [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new(\$false); chcp 65001 > \$null; ${ps_env_setup}& '$installer_win'"
powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "$ps_command"
}
main "$@"

438
install.win.ps1 Normal file
View File

@@ -0,0 +1,438 @@
param()
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'SilentlyContinue'
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$RootDir = (Resolve-Path $ScriptDir).Path
$EnvConfig = Join-Path $RootDir 'env_config.ps1'
if (Test-Path $EnvConfig) {
. $EnvConfig
Import-EnvFile -Path (Join-Path $RootDir '.env')
}
$VenvDir = Join-Path $RootDir '.venv-qwen35'
$VenvPython = Join-Path $VenvDir 'Scripts\python.exe'
$LlamaDir = Join-Path $RootDir '.tmp\llama_win_cuda'
$ModelRelativeDir = '.tmp\models\crossrepo\lmstudio-community__Qwen3.5-9B-GGUF'
$DefaultGgufRelativePath = Join-Path $ModelRelativeDir 'Qwen3.5-9B-Q4_K_M.gguf'
$DefaultMmprojRelativePath = Join-Path $ModelRelativeDir 'mmproj-Qwen3.5-9B-BF16.gguf'
$GgufPath = Resolve-ManagedPath -BaseDir $RootDir -Value $env:MODEL_PATH -DefaultRelativePath $DefaultGgufRelativePath
$MmprojPath = Resolve-ManagedPath -BaseDir $RootDir -Value $env:MMPROJ_PATH -DefaultRelativePath $DefaultMmprojRelativePath
$DefaultGgufUrl = 'https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf'
$DefaultMmprojUrl = 'https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen3.5-9B-BF16.gguf'
$LlamaReleaseApiUrl = 'https://api.github.com/repos/ggml-org/llama.cpp/releases/latest'
$LlamaReleasePageUrl = 'https://github.com/ggml-org/llama.cpp/releases/latest'
$LlamaReleaseDownloadPrefix = 'https://github.com/ggml-org/llama.cpp/releases/latest/download/'
$PreferredCudaBinAssetRegexes = @(
'^llama-.*-bin-win-cuda-12\.4-x64\.zip$',
'^llama-.*-bin-win-cuda-13\.1-x64\.zip$',
'^llama-.*-bin-win-cuda-.*-x64\.zip$'
)
$PreferredCudaRuntimeAssetRegexes = @(
'^cudart-llama-bin-win-cuda-12\.4-x64\.zip$',
'^cudart-llama-bin-win-cuda-13\.1-x64\.zip$',
'^cudart-llama-bin-win-cuda-.*-x64\.zip$'
)
function Write-Step {
param([string]$Message)
Write-Host "[install] $Message"
}
function New-PythonCandidate {
param(
[string]$Label,
[string]$Command,
[string[]]$Args = @()
)
return [PSCustomObject]@{
Label = $Label
Command = $Command
Args = $Args
}
}
function Get-PythonCandidates {
$candidates = @()
if ($env:PYTHON_BIN) {
$candidates += New-PythonCandidate -Label "PYTHON_BIN=$($env:PYTHON_BIN)" -Command $env:PYTHON_BIN
}
$candidates += New-PythonCandidate -Label 'py -3' -Command 'py' -Args @('-3')
$candidates += New-PythonCandidate -Label 'python' -Command 'python'
$candidates += New-PythonCandidate -Label 'python3' -Command 'python3'
return $candidates
}
function Test-PythonCandidate {
param([object]$PythonSpec)
$probeCode = 'import sys, venv; raise SystemExit(0 if sys.version_info >= (3, 10) else 3)'
try {
& $PythonSpec.Command @($PythonSpec.Args + @('-c', $probeCode)) *> $null
} catch {
Write-Step "跳过 Python 候选 $($PythonSpec.Label): $($_.Exception.Message)"
return $false
}
if ($LASTEXITCODE -eq 0) {
return $true
}
if ($LASTEXITCODE -eq 3) {
Write-Step "跳过 Python 候选 $($PythonSpec.Label): Python 版本低于 3.10"
return $false
}
Write-Step "跳过 Python 候选 $($PythonSpec.Label): 解释器不可用或缺少 venv 模块exit code: $LASTEXITCODE"
return $false
}
function Resolve-PythonSpec {
foreach ($candidate in Get-PythonCandidates) {
if (Test-PythonCandidate -PythonSpec $candidate) {
Write-Step "使用 Python: $($candidate.Label)"
return $candidate
}
}
throw '未找到可用 Python请安装 Python 3.10+ 并确保 venv 模块可用。'
}
function Invoke-CommandChecked {
param(
[string]$Command,
[string[]]$CommandArgs,
[string]$Action,
[string]$DisplayName = $Command
)
try {
& $Command @CommandArgs
} catch {
throw "$Action 失败。命令: $DisplayName。错误: $($_.Exception.Message)"
}
if ($LASTEXITCODE -ne 0) {
throw "$Action 失败。命令: $DisplayName。exit code: $LASTEXITCODE"
}
}
function Invoke-Python {
param(
[object]$PythonSpec,
[string[]]$PythonArgs,
[string]$Action
)
Invoke-CommandChecked -Command $PythonSpec.Command -CommandArgs ($PythonSpec.Args + $PythonArgs) -Action $Action -DisplayName $PythonSpec.Label
}
function Test-VenvPython {
param([string]$Path)
if (-not (Test-Path $Path)) {
return $false
}
try {
& $Path '-c' 'import sys' *> $null
} catch {
return $false
}
return $LASTEXITCODE -eq 0
}
function Ensure-Dir {
param([string]$Path)
if (-not (Test-Path $Path)) {
New-Item -Path $Path -ItemType Directory -Force | Out-Null
}
}
function Resolve-CurlPath {
$curl = Get-Command curl.exe -ErrorAction SilentlyContinue
if (-not $curl) {
throw '未找到 curl.exe,无法执行带进度显示的下载。'
}
return $curl.Source
}
function Download-File {
param(
[string]$Url,
[string]$OutFile
)
Write-Step "下载: $Url"
$targetDir = Split-Path -Parent $OutFile
if (-not [string]::IsNullOrWhiteSpace($targetDir)) {
Ensure-Dir $targetDir
}
$tempFile = "$OutFile.part"
$curlPath = Resolve-CurlPath
$curlArgs = @(
'--fail',
'--location',
'--retry', '5',
'--retry-delay', '2',
'--output', $tempFile
)
if (Test-Path $tempFile) {
Write-Step '检测到未完成下载,继续传输'
$curlArgs += @('--continue-at', '-')
}
$curlArgs += $Url
try {
& $curlPath @curlArgs
} catch {
throw "下载失败。命令: curl.exe。错误: $($_.Exception.Message)"
}
if ($LASTEXITCODE -ne 0) {
throw "下载失败。命令: curl.exe。exit code: $LASTEXITCODE"
}
if (Test-Path $OutFile) {
Remove-Item -Path $OutFile -Force -ErrorAction SilentlyContinue
}
Move-Item -Path $tempFile -Destination $OutFile -Force
}
function Verify-Sha256 {
param(
[string]$Path,
[string]$Expected
)
if ([string]::IsNullOrWhiteSpace($Expected)) {
return
}
$actual = (Get-FileHash -Path $Path -Algorithm SHA256).Hash.ToLowerInvariant()
$exp = $Expected.ToLowerInvariant()
if ($actual -ne $exp) {
throw "SHA256 校验失败: $Path"
}
}
function Get-LlamaReleaseAssetsFromApi {
try {
$release = Invoke-RestMethod -Uri $LlamaReleaseApiUrl -Method Get
return @($release.assets | ForEach-Object {
[PSCustomObject]@{
Name = [string]$_.name
Url = [string]$_.browser_download_url
}
})
} catch {
Write-Step "GitHub API 不可用,改用页面解析。原因: $($_.Exception.Message)"
return @()
}
}
function Get-LlamaReleaseAssetsFromHtml {
try {
$response = Invoke-WebRequest -Uri $LlamaReleasePageUrl -UseBasicParsing
} catch {
throw "获取 llama.cpp release 页面失败: $($_.Exception.Message)"
}
$content = [string]$response.Content
$regex = '(?:cudart-)?llama-[^"''<> ]*bin-win-cuda-[0-9.]+-x64\.zip'
$patternMatches = [regex]::Matches($content, $regex, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
$seen = @{}
$assets = @()
foreach ($match in $patternMatches) {
$name = [string]$match.Value
$key = $name.ToLowerInvariant()
if ($seen.ContainsKey($key)) {
continue
}
$seen[$key] = $true
$assets += [PSCustomObject]@{
Name = $name
Url = "$LlamaReleaseDownloadPrefix$name"
}
}
return $assets
}
function Select-LlamaAsset {
param(
[object[]]$Assets,
[string[]]$Regexes
)
foreach ($regex in $Regexes) {
$candidate = $Assets | Where-Object { $_.Name -match $regex } | Select-Object -First 1
if ($candidate) {
return $candidate
}
}
return $null
}
function Resolve-LlamaCudaAssets {
if ($env:LLAMA_WIN_CUDA_URL) {
$binName = Split-Path -Path $env:LLAMA_WIN_CUDA_URL -Leaf
$runtimeUrl = if ($env:LLAMA_WIN_CUDART_URL) { [string]$env:LLAMA_WIN_CUDART_URL } else { '' }
$runtimeName = if ([string]::IsNullOrWhiteSpace($runtimeUrl)) { '' } else { (Split-Path -Path $runtimeUrl -Leaf) }
Write-Step "使用自定义 llama.cpp 主包: $binName"
if (-not [string]::IsNullOrWhiteSpace($runtimeName)) {
Write-Step "使用自定义 CUDA 运行时包: $runtimeName"
}
return @{
BinUrl = [string]$env:LLAMA_WIN_CUDA_URL
RuntimeUrl = $runtimeUrl
}
}
$assets = Get-LlamaReleaseAssetsFromApi
if ($assets.Count -eq 0) {
$assets = Get-LlamaReleaseAssetsFromHtml
}
if ($assets.Count -eq 0) {
throw '自动解析 llama.cpp CUDA 资源失败,未读取到任何 win-cuda 包。'
}
$bin = Select-LlamaAsset -Assets $assets -Regexes $PreferredCudaBinAssetRegexes
if (-not $bin) {
$preview = (@($assets | Select-Object -ExpandProperty Name | Select-Object -First 12)) -join ', '
throw "自动解析失败:未找到完整 CUDA 主包。可用资源: $preview"
}
$runtime = Select-LlamaAsset -Assets $assets -Regexes $PreferredCudaRuntimeAssetRegexes
Write-Step "使用 llama.cpp 主包: $($bin.Name)"
if ($runtime) {
Write-Step "可选 CUDA 运行时包: $($runtime.Name)"
}
return @{
BinUrl = [string]$bin.Url
RuntimeUrl = if ($runtime) { [string]$runtime.Url } else { '' }
}
}
function Get-LlamaRuntimeStatus {
param([string]$BaseDir)
$missing = @()
$llamaExe = Test-Path (Join-Path $BaseDir 'llama-server.exe')
if (-not $llamaExe) {
$missing += 'llama-server.exe'
}
$cudaBackendDll = @(Get-ChildItem -Path $BaseDir -Filter 'ggml-cuda*.dll' -File -ErrorAction SilentlyContinue | Select-Object -First 1)
if ($cudaBackendDll.Count -eq 0) {
$missing += 'ggml-cuda*.dll'
}
$cudartDll = @(Get-ChildItem -Path $BaseDir -Filter 'cudart64_*.dll' -File -ErrorAction SilentlyContinue | Select-Object -First 1)
if ($cudartDll.Count -eq 0) {
$missing += 'cudart64_*.dll'
}
$cublasDll = @(Get-ChildItem -Path $BaseDir -Filter 'cublas64_*.dll' -File -ErrorAction SilentlyContinue | Select-Object -First 1)
if ($cublasDll.Count -eq 0) {
$missing += 'cublas64_*.dll'
}
return @{
Ready = ($missing.Count -eq 0)
Missing = $missing
}
}
function Clear-LlamaRuntimeDirectory {
if (-not (Test-Path $LlamaDir)) {
Ensure-Dir $LlamaDir
return
}
try {
Get-ChildItem -Path $LlamaDir -Force -ErrorAction Stop | Remove-Item -Recurse -Force -ErrorAction Stop
} catch {
throw "清理 CUDA 运行时目录失败,请先停止服务后重试。目录: $LlamaDir。错误: $($_.Exception.Message)"
}
}
function Ensure-PythonEnv {
$python = Resolve-PythonSpec
$venvExists = Test-Path $VenvDir
$venvReady = Test-VenvPython -Path $VenvPython
if ($venvExists -and -not $venvReady) {
Write-Step "检测到不完整或非 Windows 虚拟环境,重建: $VenvDir"
Remove-Item -Path $VenvDir -Recurse -Force -ErrorAction SilentlyContinue
if (Test-Path $VenvDir) {
Write-Step '目录无法直接删除,尝试 venv --clear 重建'
Invoke-Python -PythonSpec $python -PythonArgs @('-m', 'venv', '--clear', $VenvDir) -Action '清空并重建虚拟环境'
}
}
if (-not (Test-Path $VenvDir)) {
Write-Step "创建虚拟环境: $VenvDir"
Invoke-Python -PythonSpec $python -PythonArgs @('-m', 'venv', $VenvDir) -Action '创建虚拟环境'
}
if (-not (Test-VenvPython -Path $VenvPython)) {
throw "虚拟环境未就绪: $VenvPython。请检查上面的 Python 或权限报错。"
}
Write-Step '安装 Python 依赖'
Invoke-CommandChecked -Command $VenvPython -CommandArgs @('-m', 'pip', 'install', '--upgrade', 'pip', 'wheel') -Action '升级 pip 和 wheel'
Invoke-CommandChecked -Command $VenvPython -CommandArgs @('-m', 'pip', 'install', '-r', (Join-Path $RootDir 'requirements.txt')) -Action '安装 requirements.txt 依赖'
}
function Ensure-LlamaRuntime {
Ensure-Dir $LlamaDir
$status = Get-LlamaRuntimeStatus -BaseDir $LlamaDir
if ($status.Ready) {
Write-Step '检测到完整 CUDA 运行时,跳过下载'
return
}
Write-Step '检测到不完整 CUDA 运行时,清理后重装'
Clear-LlamaRuntimeDirectory
$assets = Resolve-LlamaCudaAssets
$binZipPath = Join-Path $LlamaDir 'llama-win-cuda-bin.zip'
Download-File -Url $assets.BinUrl -OutFile $binZipPath
Write-Step '解压 llama.cpp CUDA 主包'
Expand-Archive -Path $binZipPath -DestinationPath $LlamaDir -Force
$foundServer = Get-ChildItem -Path $LlamaDir -Filter 'llama-server.exe' -Recurse -File | Select-Object -First 1
if (-not $foundServer) {
throw 'llama-server.exe 下载或解压失败,未在主包中找到可执行文件。'
}
$srcDir = Split-Path -Parent $foundServer.FullName
$srcDirResolved = (Resolve-Path $srcDir).Path
$llamaDirResolved = (Resolve-Path $LlamaDir).Path
if ($srcDirResolved -ne $llamaDirResolved) {
Copy-Item -Path (Join-Path $srcDir '*') -Destination $LlamaDir -Recurse -Force
}
$status = Get-LlamaRuntimeStatus -BaseDir $LlamaDir
$needRuntime = ($status.Missing | Where-Object { $_ -match '^cudart64_|^cublas64_' }).Count -gt 0
if ($needRuntime -and -not [string]::IsNullOrWhiteSpace([string]$assets.RuntimeUrl)) {
$runtimeZipPath = Join-Path $LlamaDir 'llama-win-cuda-runtime.zip'
Download-File -Url $assets.RuntimeUrl -OutFile $runtimeZipPath
Write-Step '解压 CUDA 运行时补充包'
Expand-Archive -Path $runtimeZipPath -DestinationPath $LlamaDir -Force
}
$status = Get-LlamaRuntimeStatus -BaseDir $LlamaDir
if (-not $status.Ready) {
$missingText = ($status.Missing -join ', ')
throw "CUDA 运行时不完整,缺失: $missingText"
}
}
function Ensure-ModelFiles {
Ensure-Dir (Split-Path -Parent $GgufPath)
Ensure-Dir (Split-Path -Parent $MmprojPath)
$ggufUrl = if ($env:MODEL_GGUF_URL) { $env:MODEL_GGUF_URL } else { $DefaultGgufUrl }
$mmprojUrl = if ($env:MODEL_MMPROJ_URL) { $env:MODEL_MMPROJ_URL } else { $DefaultMmprojUrl }
Write-Step "主模型路径: $GgufPath"
Write-Step "视觉模型路径: $MmprojPath"
if (-not (Test-Path $GgufPath)) {
Download-File -Url $ggufUrl -OutFile $GgufPath
} else {
Write-Step '检测到现有 9B 主模型,跳过下载'
}
if (-not (Test-Path $MmprojPath)) {
Download-File -Url $mmprojUrl -OutFile $MmprojPath
} else {
Write-Step '检测到现有 mmproj,跳过下载'
}
Verify-Sha256 -Path $GgufPath -Expected $env:MODEL_GGUF_SHA256
Verify-Sha256 -Path $MmprojPath -Expected $env:MODEL_MMPROJ_SHA256
}
function Main {
Ensure-PythonEnv
Ensure-LlamaRuntime
Ensure-ModelFiles
Write-Step '安装完成'
Write-Step '启动命令: .\start_8080_toolhub_stack.cmd start'
Write-Step '停止命令: .\start_8080_toolhub_stack.cmd stop'
}
Main

5
install_q8.cmd Normal file
View File

@@ -0,0 +1,5 @@
@echo off
setlocal
set SCRIPT_DIR=%~dp0
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "%SCRIPT_DIR%install_q8.ps1" %*
exit /b %ERRORLEVEL%

63
install_q8.ps1 Normal file
View File

@@ -0,0 +1,63 @@
param()
$ErrorActionPreference = 'Stop'
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$RootDir = (Resolve-Path $ScriptDir).Path
$EnvConfig = Join-Path $RootDir 'env_config.ps1'
if (-not (Test-Path $EnvConfig)) {
throw "未找到 env_config.ps1: $EnvConfig"
}
. $EnvConfig
$EnvFile = Join-Path $RootDir '.env'
$EnvExample = Join-Path $RootDir '.env.example'
$InstallScript = Join-Path $RootDir 'install.win.ps1'
$Q8RelativePath = '.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/Qwen3.5-9B-Q8_0.gguf'
$MmprojRelativePath = '.tmp/models/crossrepo/lmstudio-community__Qwen3.5-9B-GGUF/mmproj-Qwen3.5-9B-BF16.gguf'
$Q8Url = 'https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q8_0.gguf'
$MmprojUrl = 'https://huggingface.co/lmstudio-community/Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen3.5-9B-BF16.gguf'
function Write-Step {
param([string]$Message)
Write-Host "[install_q8] $Message"
}
function Set-ProcessEnvValue {
param(
[string]$Key,
[string]$Value
)
[Environment]::SetEnvironmentVariable($Key, $Value, 'Process')
}
function Update-Q8Env {
Ensure-EnvFile -Path $EnvFile -TemplatePath $EnvExample
Set-EnvFileValue -Path $EnvFile -Key 'MODEL_PATH' -Value $Q8RelativePath
Set-EnvFileValue -Path $EnvFile -Key 'MMPROJ_PATH' -Value $MmprojRelativePath
Set-EnvFileValue -Path $EnvFile -Key 'MODEL_GGUF_URL' -Value $Q8Url
Set-EnvFileValue -Path $EnvFile -Key 'MODEL_MMPROJ_URL' -Value $MmprojUrl
Set-EnvFileValue -Path $EnvFile -Key 'MODEL_GGUF_SHA256' -Value ''
Set-EnvFileValue -Path $EnvFile -Key 'MODEL_MMPROJ_SHA256' -Value ''
}
function Main {
if (-not (Test-Path $InstallScript)) {
throw "未找到安装脚本: $InstallScript"
}
Update-Q8Env
Set-ProcessEnvValue -Key 'MODEL_PATH' -Value $Q8RelativePath
Set-ProcessEnvValue -Key 'MMPROJ_PATH' -Value $MmprojRelativePath
Set-ProcessEnvValue -Key 'MODEL_GGUF_URL' -Value $Q8Url
Set-ProcessEnvValue -Key 'MODEL_MMPROJ_URL' -Value $MmprojUrl
Set-ProcessEnvValue -Key 'MODEL_GGUF_SHA256' -Value ''
Set-ProcessEnvValue -Key 'MODEL_MMPROJ_SHA256' -Value ''
Write-Step "已写入 .env: MODEL_PATH=$Q8RelativePath"
Write-Step '已切换到 Q8 量化下载源,开始执行 install.win.ps1'
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $InstallScript
if ($LASTEXITCODE -ne 0) {
throw "Q8 安装失败,exit code: $LASTEXITCODE"
}
}
Main

10
requirements.txt Normal file
View File

@@ -0,0 +1,10 @@
fastapi==0.135.1
uvicorn==0.41.0
requests==2.32.5
qwen-agent==0.0.34
ddgs==9.11.1
beautifulsoup4==4.14.3
Pillow==11.3.0
numpy==2.3.3
soundfile==0.13.1
python-dateutil==2.9.0.post0

593
run_8080_toolhub_gateway.py Normal file
View File

@@ -0,0 +1,593 @@
#!/usr/bin/env python3
import argparse
import os
import threading
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any, Dict, Set, Tuple
import requests
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, Response, StreamingResponse
from starlette.concurrency import run_in_threadpool
from toolhub_gateway_agent import (
build_non_stream_response,
run_chat_completion,
stream_chat_completion,
)
DEFAULT_GATEWAY_HOST = '127.0.0.1'
DEFAULT_GATEWAY_PORT = 8080
DEFAULT_BACKEND_BASE = 'http://127.0.0.1:8081'
DEFAULT_MODEL_SERVER = 'http://127.0.0.1:8081/v1'
DEFAULT_TIMEOUT_SEC = 180
DEFAULT_BACKEND_WAIT_HINT = ''
DEFAULT_ACCESS_URLS = 'http://127.0.0.1:8080,http://localhost:8080'
READY_ANNOUNCE_INTERVAL_SEC = 2
WAIT_LOG_INTERVAL_SEC = 10
WARMUP_MESSAGE = '请只回复一个字:好'
WARMUP_PARSE_ERROR_MARKER = 'Failed to parse input'
STREAM_CHUNK_BYTES = 8192
SUPPORTED_PROXY_METHODS = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE', 'OPTIONS', 'HEAD']
HOP_HEADERS = {
'connection',
'keep-alive',
'proxy-authenticate',
'proxy-authorization',
'te',
'trailers',
'transfer-encoding',
'upgrade',
}
LOCAL_CONFIG_KEY = 'LlamaCppWebui.config'
LOCAL_OVERRIDES_KEY = 'LlamaCppWebui.userOverrides'
WEBUI_SETTINGS_PATCH = f"""
<script>
(function () {{
try {{
var cfgKey = '{LOCAL_CONFIG_KEY}';
var ovKey = '{LOCAL_OVERRIDES_KEY}';
var cfg = JSON.parse(localStorage.getItem(cfgKey) || '{{}}');
cfg.showMessageStats = true;
cfg.keepStatsVisible = false;
cfg.showThoughtInProgress = true;
cfg.disableReasoningParsing = false;
localStorage.setItem(cfgKey, JSON.stringify(cfg));
var overrides = JSON.parse(localStorage.getItem(ovKey) || '[]');
var set = new Set(Array.isArray(overrides) ? overrides : []);
['showMessageStats', 'keepStatsVisible', 'showThoughtInProgress', 'disableReasoningParsing']
.forEach(function (k) {{ set.add(k); }});
localStorage.setItem(ovKey, JSON.stringify(Array.from(set)));
}} catch (e) {{
console.error('webui settings patch failed', e);
}}
}})();
</script>
<style>
.chat-processing-info-container {{
display: none !important;
}}
</style>
""".strip()
BACKEND_LOADING_HTML = """
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>ToolHub 正在准备中</title>
<style>
:root {
color-scheme: light;
font-family: "Segoe UI", "PingFang SC", "Microsoft YaHei", sans-serif;
background: #0f172a;
color: #e2e8f0;
}
body {
margin: 0;
min-height: 100vh;
display: grid;
place-items: center;
background:
radial-gradient(circle at top, rgba(59, 130, 246, 0.18), transparent 45%),
linear-gradient(180deg, #111827, #020617);
}
main {
width: min(680px, calc(100vw - 32px));
padding: 28px;
border-radius: 20px;
background: rgba(15, 23, 42, 0.88);
border: 1px solid rgba(148, 163, 184, 0.24);
box-shadow: 0 24px 80px rgba(15, 23, 42, 0.45);
}
h1 {
margin: 0 0 12px;
font-size: 28px;
}
.status {
display: flex;
align-items: center;
gap: 16px;
margin-bottom: 18px;
}
.spinner-shell {
position: relative;
width: 40px;
height: 40px;
flex: 0 0 auto;
}
.spinner-ring {
position: absolute;
inset: 0;
border-radius: 999px;
border: 3px solid rgba(148, 163, 184, 0.16);
border-top-color: #93c5fd;
border-right-color: rgba(96, 165, 250, 0.92);
animation: spin 12s steps(12, end) infinite;
will-change: transform;
transform: translateZ(0);
}
.spinner-ring::after {
content: "";
position: absolute;
top: 3px;
left: 50%;
width: 7px;
height: 7px;
margin-left: -3.5px;
border-radius: 999px;
background: #e0f2fe;
box-shadow: 0 0 12px rgba(96, 165, 250, 0.78);
}
.spinner-core {
position: absolute;
inset: 9px;
border-radius: 999px;
background:
radial-gradient(circle, rgba(191, 219, 254, 0.96) 0, rgba(147, 197, 253, 0.82) 34%, rgba(59, 130, 246, 0.18) 65%, transparent 72%);
}
p {
margin: 10px 0;
line-height: 1.7;
color: #cbd5e1;
}
.state-line {
margin-top: 14px;
color: #93c5fd;
}
.elapsed-line {
margin-top: 8px;
color: #cbd5e1;
font-variant-numeric: tabular-nums;
}
.hint-box {
margin-top: 16px;
padding: 14px 16px;
border-radius: 14px;
background: rgba(15, 23, 42, 0.72);
border: 1px solid rgba(148, 163, 184, 0.18);
}
details {
margin-top: 16px;
color: #94a3b8;
}
summary {
cursor: pointer;
}
pre {
margin: 10px 0 0;
padding: 12px;
border-radius: 12px;
background: rgba(2, 6, 23, 0.86);
border: 1px solid rgba(148, 163, 184, 0.16);
color: #cbd5e1;
white-space: pre-wrap;
word-break: break-word;
font-family: "Cascadia Code", "Consolas", monospace;
font-size: 13px;
line-height: 1.6;
}
code {
font-family: "Cascadia Code", "Consolas", monospace;
color: #f8fafc;
}
@keyframes spin {
from { transform: rotate(0deg); }
to { transform: rotate(360deg); }
}
@media (prefers-reduced-motion: reduce) {
.spinner-ring {
animation: none;
}
}
</style>
</head>
<body>
<main>
<div class="status">
<div class="spinner-shell" aria-hidden="true">
<div class="spinner-ring"></div>
<div class="spinner-core"></div>
</div>
<h1>ToolHub 正在准备中</h1>
</div>
<p>网关已经启动,但模型后端暂时还没有就绪。</p>
<p>如果这是第一次启动,程序可能正在下载模型文件,或者正在把模型加载到 GPU。</p>
<p>页面会停留在这个等待界面里,并自动检查后端状态。准备完成后会自动进入聊天界面,不再整页反复刷新。</p>
<p class="state-line" id="state-line">正在检查后端状态...</p>
<p class="elapsed-line" id="elapsed-line">已等待 0 秒</p>
<div class="hint-box">
<p>如果你是刚在终端里执行了启动命令,最直接的进度信息通常就在那个终端窗口里。</p>
__HINT_BLOCK__
</div>
<details>
<summary>查看技术详情</summary>
<pre>__DETAIL__</pre>
</details>
</main>
<script>
(function () {
var stateLine = document.getElementById('state-line');
var elapsedLine = document.getElementById('elapsed-line');
var healthUrl = '/gateway/health';
var startedAt = Date.now();
function updateState(message) {
if (stateLine) {
stateLine.textContent = message;
}
}
function updateElapsed() {
if (!elapsedLine) {
return;
}
var elapsedSec = Math.floor((Date.now() - startedAt) / 1000);
elapsedLine.textContent = '已等待 ' + elapsedSec + ' 秒';
}
async function pollHealth() {
try {
var response = await fetch(healthUrl, { cache: 'no-store' });
var payload = await response.json();
if (payload.status === 'ok') {
updateState('后端已经就绪,正在进入聊天界面...');
updateElapsed();
window.location.reload();
return;
}
updateState('模型仍在准备中,页面会自动继续等待。');
} catch (error) {
updateState('暂时还连不上后端,继续等待即可。');
}
window.setTimeout(pollHealth, 4000);
}
updateElapsed();
window.setInterval(updateElapsed, 1000);
window.setTimeout(pollHealth, 1200);
})();
</script>
</body>
</html>
""".strip()
@dataclass(frozen=True)
class GatewayConfig:
backend_base: str
model_server: str
gateway_host: str
gateway_port: int
timeout_sec: int = DEFAULT_TIMEOUT_SEC
backend_wait_hint: str = DEFAULT_BACKEND_WAIT_HINT
access_urls: Tuple[str, ...] = ()
@dataclass
class GatewayState:
ready_event: threading.Event
def parse_args() -> GatewayConfig:
parser = argparse.ArgumentParser(description='Run 8080 toolhub gateway with 8081 llama-server backend.')
parser.add_argument('--host', default=os.getenv('GATEWAY_HOST', DEFAULT_GATEWAY_HOST))
parser.add_argument('--port', type=int, default=int(os.getenv('GATEWAY_PORT', str(DEFAULT_GATEWAY_PORT))))
parser.add_argument('--backend-base', default=os.getenv('BACKEND_BASE', DEFAULT_BACKEND_BASE))
parser.add_argument('--model-server', default=os.getenv('MODEL_SERVER', DEFAULT_MODEL_SERVER))
parser.add_argument('--timeout-sec', type=int, default=int(os.getenv('GATEWAY_TIMEOUT_SEC', str(DEFAULT_TIMEOUT_SEC))))
parser.add_argument('--backend-wait-hint', default=os.getenv('BACKEND_WAIT_HINT', DEFAULT_BACKEND_WAIT_HINT))
parser.add_argument('--access-urls', default=os.getenv('ACCESS_URLS', DEFAULT_ACCESS_URLS))
args = parser.parse_args()
return GatewayConfig(
backend_base=args.backend_base.rstrip('/'),
model_server=args.model_server.rstrip('/'),
gateway_host=args.host,
gateway_port=args.port,
timeout_sec=args.timeout_sec,
backend_wait_hint=args.backend_wait_hint.strip(),
access_urls=parse_access_urls(args.access_urls),
)
def parse_access_urls(raw: str) -> Tuple[str, ...]:
urls = [item.strip() for item in raw.split(',') if item.strip()]
return tuple(dict.fromkeys(urls))
def filtered_headers(headers: Dict[str, str]) -> Dict[str, str]:
blocked = HOP_HEADERS | {'host', 'content-length', 'proxy-connection'}
return {key: value for key, value in headers.items() if key.lower() not in blocked}
def drop_headers_ci(headers: Dict[str, str], names: Set[str]) -> Dict[str, str]:
lowered = {name.lower() for name in names}
return {key: value for key, value in headers.items() if key.lower() not in lowered}
def build_backend_url(base: str, path: str, query: str) -> str:
if not query:
return f'{base}{path}'
return f'{base}{path}?{query}'
def stream_upstream(upstream: requests.Response):
try:
for chunk in upstream.iter_content(chunk_size=STREAM_CHUNK_BYTES):
if chunk:
yield chunk
finally:
upstream.close()
def inject_webui_settings(html: str) -> str:
if WEBUI_SETTINGS_PATCH in html:
return html
if '<head>' in html:
return html.replace('<head>', f'<head>\n{WEBUI_SETTINGS_PATCH}\n', 1)
if '<body>' in html:
return html.replace('<body>', f'<body>\n{WEBUI_SETTINGS_PATCH}\n', 1)
return f'{WEBUI_SETTINGS_PATCH}\n{html}'
def build_backend_loading_response(detail: str, wait_hint: str) -> Response:
safe_detail = detail.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
hint_block = ''
if wait_hint:
safe_hint = wait_hint.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
hint_block = f'<p>如果你想单独查看后端准备进度,可以执行:<br><code>{safe_hint}</code></p>'
html = BACKEND_LOADING_HTML.replace('__DETAIL__', safe_detail).replace('__HINT_BLOCK__', hint_block)
return Response(
content=html,
status_code=200,
media_type='text/html; charset=utf-8',
headers={'Cache-Control': 'no-store, max-age=0'},
)
def is_root_request(request: Request, path: str) -> bool:
return request.method == 'GET' and path in {'/', '/index.html'}
def is_backend_wait_status(status_code: int) -> bool:
return status_code in {502, 503, 504}
def format_access_urls(access_urls: Tuple[str, ...]) -> str:
return ' '.join(access_urls)
def check_backend_ready(cfg: GatewayConfig) -> bool:
try:
response = requests.get(f'{cfg.backend_base}/health', timeout=cfg.timeout_sec)
response.raise_for_status()
except Exception: # noqa: BLE001
return False
return True
def announce_access_urls(cfg: GatewayConfig) -> None:
if not cfg.access_urls:
return
print(
f'[toolhub-gateway] 网页入口已经开放,正在加载模型,完成后可访问: {format_access_urls(cfg.access_urls)}',
flush=True,
)
def announce_backend_ready(cfg: GatewayConfig) -> None:
if not cfg.access_urls:
return
print(
f'[toolhub-gateway] 模型已完成加载和预热,可以打开: {format_access_urls(cfg.access_urls)}',
flush=True,
)
def is_gateway_ready(state: GatewayState) -> bool:
return state.ready_event.is_set()
def warmup_model(cfg: GatewayConfig) -> Tuple[bool, str]:
payload = {
'messages': [{'role': 'user', 'content': WARMUP_MESSAGE}],
'max_tokens': 1,
'stream': False,
'temperature': 0,
}
try:
response = requests.post(
f'{cfg.model_server}/chat/completions',
json=payload,
timeout=cfg.timeout_sec,
)
except Exception as exc: # noqa: BLE001
return False, f'模型预热请求失败: {exc}'
if response.ok:
return True, '模型预热已完成'
body = response.text.strip()
if response.status_code == 500 and WARMUP_PARSE_ERROR_MARKER in body:
return True, '模型首轮预热已经完成'
return False, f'模型预热暂未完成: HTTP {response.status_code} {body[:200]}'
def run_ready_announcer(cfg: GatewayConfig, state: GatewayState) -> None:
last_wait_detail = ''
last_wait_log_at = 0.0
announce_access_urls(cfg)
while True:
if check_backend_ready(cfg):
ready, wait_detail = warmup_model(cfg)
else:
ready, wait_detail = False, '后端健康检查尚未通过'
if ready:
state.ready_event.set()
announce_backend_ready(cfg)
return
now = time.monotonic()
if wait_detail != last_wait_detail or (now - last_wait_log_at) >= WAIT_LOG_INTERVAL_SEC:
print(f'[toolhub-gateway] 后端仍在准备中: {wait_detail}', flush=True)
last_wait_detail = wait_detail
last_wait_log_at = now
time.sleep(READY_ANNOUNCE_INTERVAL_SEC)
async def handle_gateway_health(cfg: GatewayConfig, state: GatewayState) -> Dict[str, Any]:
status = 'ok' if is_gateway_ready(state) else 'warming'
backend_error = ''
try:
health = requests.get(f'{cfg.backend_base}/health', timeout=cfg.timeout_sec)
health.raise_for_status()
except Exception as exc: # noqa: BLE001
status = 'degraded'
backend_error = str(exc)
return {'status': status, 'backend_base': cfg.backend_base, 'backend_error': backend_error}
async def handle_chat_completions(request: Request, cfg: GatewayConfig) -> Response:
payload = await request.json()
stream = bool(payload.get('stream', False))
if stream:
try:
iterator = stream_chat_completion(payload, cfg.model_server, cfg.timeout_sec)
except Exception as exc: # noqa: BLE001
error = {'error': {'code': 500, 'type': 'gateway_error', 'message': str(exc)}}
return JSONResponse(status_code=500, content=error)
return StreamingResponse(iterator, media_type='text/event-stream')
try:
result = await run_in_threadpool(run_chat_completion, payload, cfg.model_server, cfg.timeout_sec)
except Exception as exc: # noqa: BLE001
error = {'error': {'code': 500, 'type': 'gateway_error', 'message': str(exc)}}
return JSONResponse(status_code=500, content=error)
answer = result['answer']
model = result['model']
reasoning = result.get('reasoning', '')
return JSONResponse(content=build_non_stream_response(answer, model, reasoning))
async def handle_proxy(request: Request, full_path: str, cfg: GatewayConfig, state: GatewayState) -> Response:
path = '/' + full_path
if is_root_request(request, path) and not is_gateway_ready(state):
return build_backend_loading_response('模型正在加载或预热,完成后会自动进入聊天界面。', cfg.backend_wait_hint)
url = build_backend_url(cfg.backend_base, path, request.url.query)
headers = filtered_headers(dict(request.headers))
body = await request.body()
try:
upstream = requests.request(
method=request.method,
url=url,
headers=headers,
data=body,
stream=True,
timeout=cfg.timeout_sec,
allow_redirects=False,
)
except Exception as exc: # noqa: BLE001
if is_root_request(request, path):
return build_backend_loading_response(str(exc), cfg.backend_wait_hint)
if request.method == 'GET' and path == '/favicon.ico':
return Response(status_code=204)
error = {'error': {'type': 'proxy_error', 'message': str(exc)}}
return JSONResponse(status_code=502, content=error)
response_headers = filtered_headers(dict(upstream.headers))
content_type = upstream.headers.get('content-type', '')
if is_root_request(request, path) and is_backend_wait_status(upstream.status_code):
detail = upstream.text.strip() or f'backend returned {upstream.status_code}'
upstream.close()
return build_backend_loading_response(detail, cfg.backend_wait_hint)
if request.method == 'GET' and path == '/favicon.ico' and is_backend_wait_status(upstream.status_code):
upstream.close()
return Response(status_code=204)
if 'text/event-stream' in content_type:
return StreamingResponse(
stream_upstream(upstream),
status_code=upstream.status_code,
headers=response_headers,
media_type='text/event-stream',
)
is_webui_html = (
request.method == 'GET'
and path in {'/', '/index.html'}
and upstream.status_code == 200
and 'text/html' in content_type
)
if is_webui_html:
encoding = upstream.encoding or 'utf-8'
html = upstream.content.decode(encoding, errors='replace')
injected = inject_webui_settings(html)
upstream.close()
clean_headers = drop_headers_ci(response_headers, {'content-encoding', 'content-length', 'etag'})
return Response(
content=injected.encode('utf-8'),
status_code=200,
headers=clean_headers,
media_type='text/html; charset=utf-8',
)
upstream.raw.decode_content = False
data = upstream.raw.read(decode_content=False)
upstream.close()
return Response(content=data, status_code=upstream.status_code, headers=response_headers)
def create_app(cfg: GatewayConfig, state: GatewayState) -> FastAPI:
@asynccontextmanager
async def lifespan(_: FastAPI):
threading.Thread(target=run_ready_announcer, args=(cfg, state), daemon=True).start()
yield
app = FastAPI(title='Qwen3.5 ToolHub Gateway 8080', lifespan=lifespan)
@app.get('/gateway/health')
async def gateway_health() -> Dict[str, Any]:
return await handle_gateway_health(cfg, state)
@app.post('/v1/chat/completions')
async def chat_completions(request: Request) -> Response:
return await handle_chat_completions(request, cfg)
@app.api_route('/{full_path:path}', methods=SUPPORTED_PROXY_METHODS)
async def proxy_all(request: Request, full_path: str) -> Response:
return await handle_proxy(request, full_path, cfg, state)
return app
def main() -> None:
cfg = parse_args()
state = GatewayState(ready_event=threading.Event())
app = create_app(cfg, state)
uvicorn.run(app, host=cfg.gateway_host, port=cfg.gateway_port, log_level='info')
if __name__ == '__main__':
main()

5
start_8080_toolhub_stack.cmd Normal file
View File

@@ -0,0 +1,5 @@
@echo off
setlocal
set SCRIPT_DIR=%~dp0
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "%SCRIPT_DIR%start_8080_toolhub_stack.ps1" %*
exit /b %ERRORLEVEL%

292
start_8080_toolhub_stack.ps1 Normal file
View File

@@ -0,0 +1,292 @@
param(
[string]$Command = 'status'
)
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'SilentlyContinue'
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$RootDir = (Resolve-Path $ScriptDir).Path
$EnvConfig = Join-Path $RootDir 'env_config.ps1'
if (Test-Path $EnvConfig) {
. $EnvConfig
Import-EnvFile -Path (Join-Path $RootDir '.env')
}
$PythonBin = Join-Path $RootDir '.venv-qwen35\Scripts\python.exe'
$GatewayRun = Join-Path $RootDir 'run_8080_toolhub_gateway.py'
$RuntimeDir = Join-Path $RootDir '.tmp\toolhub_gateway'
$PidFile = Join-Path $RuntimeDir 'gateway.pid'
$LogFile = Join-Path $RuntimeDir 'gateway.log'
$ErrLogFile = Join-Path $RuntimeDir 'gateway.err.log'
$ModelSwitch = Join-Path $RootDir 'switch_qwen35_webui.ps1'
$GatewayHost = if ($env:GATEWAY_HOST) { $env:GATEWAY_HOST } else { '127.0.0.1' }
$GatewayPort = if ($env:GATEWAY_PORT) { $env:GATEWAY_PORT } else { '8080' }
$BackendHost = if ($env:BACKEND_HOST) { $env:BACKEND_HOST } else { '127.0.0.1' }
$BackendPort = if ($env:BACKEND_PORT) { $env:BACKEND_PORT } else { '8081' }
$ThinkMode = if ($env:THINK_MODE) { $env:THINK_MODE } else { 'think-on' }
$BackendWaitHint = '.\start_8080_toolhub_stack.cmd logs'
$SpinnerFrameIntervalMs = 120
$SpinnerProbeIntervalMs = 1000
function Ensure-Dir {
param([string]$Path)
if (-not (Test-Path $Path)) {
New-Item -Path $Path -ItemType Directory -Force | Out-Null
}
}
function Test-GatewayRunning {
if (-not (Test-Path $PidFile)) {
return $false
}
$raw = Get-Content -Path $PidFile -ErrorAction SilentlyContinue | Select-Object -First 1
$gatewayPid = 0
if (-not [int]::TryParse([string]$raw, [ref]$gatewayPid)) {
return $false
}
$proc = Get-Process -Id $gatewayPid -ErrorAction SilentlyContinue
return $null -ne $proc
}
function Test-GatewayReady {
try {
$null = Invoke-RestMethod -Uri "http://$GatewayHost`:$GatewayPort/gateway/health" -Method Get -TimeoutSec 2
return $true
} catch {
return $false
}
}
function Show-GatewayFailureLogs {
Write-Host '网关启动失败,最近日志如下:'
if (Test-Path $LogFile) {
Write-Host '=== 网关标准输出 ==='
Get-Content -Path $LogFile -Tail 120 -ErrorAction SilentlyContinue
}
if (Test-Path $ErrLogFile) {
Write-Host '=== 网关标准错误 ==='
Get-Content -Path $ErrLogFile -Tail 120 -ErrorAction SilentlyContinue
}
}
function Write-SpinnerLine {
param(
[string]$Label,
[double]$Current,
[int]$Total,
[int]$Tick
)
$frames = @('|', '/', '-', '\')
$frame = $frames[$Tick % $frames.Count]
$currentText = [string][int][Math]::Floor($Current)
Write-Host -NoNewline "`r$Label $frame $currentText/$Total"
}
function Complete-SpinnerLine {
Write-Host ''
}
function Stop-OrphanGatewayProcesses {
try {
$rootPattern = [regex]::Escape($RootDir)
$targets = Get-CimInstance Win32_Process -Filter "Name='python.exe'" -ErrorAction SilentlyContinue | Where-Object {
$cmd = [string]$_.CommandLine
$cmd -match 'run_8080_toolhub_gateway\.py' -and $cmd -match $rootPattern
}
foreach ($proc in $targets) {
if ($proc.ProcessId) {
Stop-Process -Id ([int]$proc.ProcessId) -Force -ErrorAction SilentlyContinue
}
}
} catch {}
}
function Start-Backend {
if ($env:MODEL_KEY -and $env:MODEL_KEY -ne '9b') {
throw "当前交付包仅支持 MODEL_KEY=9b,收到: $($env:MODEL_KEY)"
}
$oldHost = $env:HOST
$oldPort = $env:PORT
try {
$env:HOST = $BackendHost
$env:PORT = $BackendPort
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ModelSwitch '9b' $ThinkMode
if ($LASTEXITCODE -ne 0) {
throw '后端启动失败,请先查看上面的直接原因'
}
} finally {
$env:HOST = $oldHost
$env:PORT = $oldPort
}
}
function Start-Gateway {
Ensure-Dir $RuntimeDir
Stop-OrphanGatewayProcesses
if (Test-GatewayRunning) {
Write-Host '网关状态: 已运行'
Write-Host "PID: $(Get-Content -Path $PidFile)"
return
}
if (-not (Test-Path $PythonBin)) {
throw "Python 环境不存在: $PythonBin"
}
$args = @(
$GatewayRun,
'--host', $GatewayHost,
'--port', $GatewayPort,
'--backend-base', "http://$BackendHost`:$BackendPort",
'--model-server', "http://$BackendHost`:$BackendPort/v1"
)
if (Test-Path $ErrLogFile) {
Remove-Item -Path $ErrLogFile -Force -ErrorAction SilentlyContinue
}
$oldWaitHint = $env:BACKEND_WAIT_HINT
try {
$env:BACKEND_WAIT_HINT = $BackendWaitHint
$proc = Start-Process -FilePath $PythonBin -ArgumentList $args -WindowStyle Hidden -RedirectStandardOutput $LogFile -RedirectStandardError $ErrLogFile -PassThru
} finally {
$env:BACKEND_WAIT_HINT = $oldWaitHint
}
Set-Content -Path $PidFile -Value $proc.Id -Encoding ascii
$timeoutSec = 60
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$nextProbeMs = 0
$tick = 0
while ($stopwatch.Elapsed.TotalSeconds -lt $timeoutSec) {
Write-SpinnerLine -Label '网关启动中...' -Current $stopwatch.Elapsed.TotalSeconds -Total $timeoutSec -Tick $tick
if ($stopwatch.ElapsedMilliseconds -ge $nextProbeMs) {
if (-not (Test-GatewayRunning)) {
break
}
if (Test-GatewayReady) {
Complete-SpinnerLine
return
}
$nextProbeMs += $SpinnerProbeIntervalMs
}
Start-Sleep -Milliseconds $SpinnerFrameIntervalMs
$tick++
}
Complete-SpinnerLine
Show-GatewayFailureLogs
throw '网关启动失败。'
}
function Stop-Gateway {
Stop-OrphanGatewayProcesses
if (-not (Test-GatewayRunning)) {
if (Test-Path $PidFile) {
Remove-Item -Path $PidFile -Force -ErrorAction SilentlyContinue
}
Write-Host '网关状态: 未运行'
return
}
$gatewayPid = [int](Get-Content -Path $PidFile | Select-Object -First 1)
Stop-Process -Id $gatewayPid -Force -ErrorAction SilentlyContinue
Start-Sleep -Seconds 1
if (Test-Path $PidFile) {
Remove-Item -Path $PidFile -Force -ErrorAction SilentlyContinue
}
Write-Host '网关状态: 已停止'
}
function Show-Status {
Write-Host '=== 网关 ==='
if (Test-GatewayRunning) {
$state = if (Test-GatewayReady) { '可访问' } else { '初始化中' }
Write-Host '状态: 运行中'
Write-Host "PID: $(Get-Content -Path $PidFile)"
Write-Host "地址: http://$GatewayHost`:$GatewayPort"
Write-Host "健康: $state"
Write-Host "日志: $LogFile"
Write-Host "错误日志: $ErrLogFile"
} else {
Write-Host '状态: 未运行'
}
Write-Host ''
Write-Host '=== 模型后端 ==='
$oldHost = $env:HOST
$oldPort = $env:PORT
try {
$env:HOST = $BackendHost
$env:PORT = $BackendPort
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ModelSwitch 'status'
} finally {
$env:HOST = $oldHost
$env:PORT = $oldPort
}
}
function Show-Logs {
$shown = $false
if (Test-Path $LogFile) {
Write-Host '=== 网关日志 ==='
Get-Content -Path $LogFile -Tail 120
$shown = $true
}
if (Test-Path $ErrLogFile) {
Write-Host '=== 网关错误日志 ==='
Get-Content -Path $ErrLogFile -Tail 120
$shown = $true
}
if (-not $shown) {
Write-Host '暂无日志'
}
}
function Stop-Backend {
$oldHost = $env:HOST
$oldPort = $env:PORT
try {
$env:HOST = $BackendHost
$env:PORT = $BackendPort
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $ModelSwitch 'stop'
} finally {
$env:HOST = $oldHost
$env:PORT = $oldPort
}
}
function Start-Stack {
try {
Write-Host '步骤 1/2: 启动模型后端'
Start-Backend
Write-Host '步骤 2/2: 启动网关服务'
Start-Gateway
Write-Host '栈已启动'
Write-Host "网页入口: http://$GatewayHost`:$GatewayPort"
Write-Host '可用状态检查命令: .\start_8080_toolhub_stack.cmd status'
Write-Host '停止命令: .\start_8080_toolhub_stack.cmd stop'
} catch {
Write-Host $_.Exception.Message
exit 1
}
}
function Stop-Stack {
Stop-Gateway
Stop-Backend
}
switch ($Command) {
'start' { Start-Stack; break }
'stop' { Stop-Stack; break }
'restart' { Stop-Stack; Start-Stack; break }
'status' { Show-Status; break }
'logs' { Show-Logs; break }
default {
Write-Host '用法:'
Write-Host ' .\start_8080_toolhub_stack.cmd {start|stop|restart|status|logs}'
Write-Host ''
Write-Host '可选环境变量:'
Write-Host ' GATEWAY_HOST=127.0.0.1'
Write-Host ' GATEWAY_PORT=8080'
Write-Host ' BACKEND_HOST=127.0.0.1'
Write-Host ' BACKEND_PORT=8081'
Write-Host ' THINK_MODE=think-on'
exit 1
}
}
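Both `Start-Gateway` above and the backend's `Wait-Ready` poll on two cadences: the spinner frame is redrawn every 120 ms, while the health probe fires only once per second. A minimal Python sketch of that pattern (illustrative only; the names are ours, and the `clock`/`sleep` hooks exist purely to make the loop testable):

```python
import time
from typing import Callable

def wait_until_ready(
    probe: Callable[[], bool],
    timeout_sec: float,
    frame_interval: float = 0.12,   # spinner redraw cadence (SpinnerFrameIntervalMs)
    probe_interval: float = 1.0,    # health-check cadence (SpinnerProbeIntervalMs)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Redraw a spinner frequently while probing readiness at a coarser interval."""
    start = clock()
    next_probe = 0.0
    tick = 0
    frames = '|/-\\'
    while clock() - start < timeout_sec:
        elapsed = clock() - start
        # The real scripts write this with a leading `\r` so the line overwrites itself.
        _ = f"waiting {frames[tick % 4]} {int(elapsed)}/{int(timeout_sec)}"
        if elapsed >= next_probe:
            if probe():
                return True
            next_probe += probe_interval
        sleep(frame_interval)
        tick += 1
    return False
```

Separating the two intervals keeps the console animation responsive without hammering the health endpoint once per frame.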

105
start_8080_toolhub_stack.sh Normal file
View File

@@ -0,0 +1,105 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PS1_PATH="$ROOT_DIR/start_8080_toolhub_stack.ps1"
print_usage() {
cat <<'USAGE'
用法:
./start_8080_toolhub_stack.sh {start|stop|restart|status|logs}
说明:
WSL 入口会直接复用 Windows 主脚本的完整启动链路。
包括后端 GPU 强校验与网关管理,行为与 cmd / PowerShell 保持一致。
USAGE
}
to_win_path_if_needed() {
local raw="$1"
if [[ -z "$raw" ]]; then
printf ''
return
fi
if [[ "$raw" == /* ]]; then
wslpath -w "$raw"
return
fi
printf '%s' "$raw"
}
ps_escape_single_quotes() {
printf "%s" "$1" | sed "s/'/''/g"
}
require_windows_power_shell() {
if ! command -v powershell.exe >/dev/null 2>&1; then
echo "未找到 powershell.exe,无法从 WSL 调用 Windows 栈脚本。"
exit 1
fi
if [[ ! -f "$PS1_PATH" ]]; then
echo "缺少栈脚本: $PS1_PATH"
exit 1
fi
}
build_env_overrides() {
local -n out_ref=$1
out_ref=()
for key in GATEWAY_HOST GATEWAY_PORT BACKEND_HOST BACKEND_PORT THINK_MODE HOST PORT CTX_SIZE IMAGE_MIN_TOKENS IMAGE_MAX_TOKENS MMPROJ_OFFLOAD GPU_MEMORY_DELTA_MIN_MIB; do
if [[ -n "${!key-}" ]]; then
out_ref+=("$key=${!key}")
fi
done
if [[ -n "${BIN_PATH-}" ]]; then
out_ref+=("BIN_PATH=$(to_win_path_if_needed "$BIN_PATH")")
fi
if [[ -n "${MODEL_PATH-}" ]]; then
out_ref+=("MODEL_PATH=$(to_win_path_if_needed "$MODEL_PATH")")
fi
if [[ -n "${MMPROJ_PATH-}" ]]; then
out_ref+=("MMPROJ_PATH=$(to_win_path_if_needed "$MMPROJ_PATH")")
fi
}
build_ps_env_setup() {
local -n env_ref=$1
local lines=()
local item key value escaped_value
for item in "${env_ref[@]}"; do
key="${item%%=*}"
value="${item#*=}"
escaped_value="$(ps_escape_single_quotes "$value")"
lines+=("[Environment]::SetEnvironmentVariable('$key', '$escaped_value', 'Process')")
done
if [[ ${#lines[@]} -eq 0 ]]; then
return
fi
printf '%s; ' "${lines[@]}"
}
main() {
local command="${1:-status}"
case "$command" in
start|stop|restart|status|logs) ;;
*)
print_usage
exit 1
;;
esac
require_windows_power_shell
local ps1_win
ps1_win="$(wslpath -w "$PS1_PATH")"
local env_overrides=()
build_env_overrides env_overrides
local ps_command
local ps_env_setup
ps_env_setup="$(build_ps_env_setup env_overrides)"
ps_command="[Console]::InputEncoding = [System.Text.UTF8Encoding]::new(\$false); [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new(\$false); chcp 65001 > \$null; ${ps_env_setup}& '$ps1_win' '$command'"
powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "$ps_command"
}
main "${1:-status}"
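The WSL entry point forwards environment overrides by emitting one process-scoped `[Environment]::SetEnvironmentVariable(...)` statement per variable, doubling single quotes so values survive PowerShell's single-quoted literals. The same transformation, sketched in Python for clarity (function names mirror the shell helpers, but the Python shape is ours):

```python
def ps_escape_single_quotes(value: str) -> str:
    """PowerShell single-quoted literals escape ' by doubling it (mirrors the sed call)."""
    return value.replace("'", "''")

def build_ps_env_setup(overrides: dict) -> str:
    """Render KEY -> VALUE pairs as process-scoped SetEnvironmentVariable statements."""
    lines = [
        f"[Environment]::SetEnvironmentVariable('{key}', '{ps_escape_single_quotes(value)}', 'Process')"
        for key, value in overrides.items()
    ]
    return ''.join(f'{line}; ' for line in lines)
```

Process scope matters here: the variables exist only for the single `powershell.exe` invocation and never touch the user's persistent environment.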

499
switch_qwen35_webui.ps1 Normal file
View File

@@ -0,0 +1,499 @@
param(
[string]$Command = 'status',
[string]$ThinkMode = 'think-on'
)
$ErrorActionPreference = 'Stop'
$ProgressPreference = 'SilentlyContinue'
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$RootDir = (Resolve-Path $ScriptDir).Path
$EnvConfig = Join-Path $RootDir 'env_config.ps1'
if (Test-Path $EnvConfig) {
. $EnvConfig
Import-EnvFile -Path (Join-Path $RootDir '.env')
}
$BinPath = if ($env:BIN_PATH) { $env:BIN_PATH } else { Join-Path $RootDir '.tmp\llama_win_cuda\llama-server.exe' }
$HostAddr = if ($env:HOST) { $env:HOST } else { '127.0.0.1' }
$PortNum = if ($env:PORT) { $env:PORT } else { '8081' }
$CtxSize = if ($env:CTX_SIZE) { $env:CTX_SIZE } else { '16384' }
$ImageMinTokens = if ($env:IMAGE_MIN_TOKENS) { $env:IMAGE_MIN_TOKENS } else { '256' }
$ImageMaxTokens = if ($env:IMAGE_MAX_TOKENS) { $env:IMAGE_MAX_TOKENS } else { '1024' }
$MmprojOffload = if ($env:MMPROJ_OFFLOAD) { $env:MMPROJ_OFFLOAD } else { 'off' }
$ModelPath = Resolve-ManagedPath -BaseDir $RootDir -Value $env:MODEL_PATH -DefaultRelativePath '.tmp\models\crossrepo\lmstudio-community__Qwen3.5-9B-GGUF\Qwen3.5-9B-Q4_K_M.gguf'
$MmprojPath = Resolve-ManagedPath -BaseDir $RootDir -Value $env:MMPROJ_PATH -DefaultRelativePath '.tmp\models\crossrepo\lmstudio-community__Qwen3.5-9B-GGUF\mmproj-Qwen3.5-9B-BF16.gguf'
$WebuiDir = Join-Path $RootDir '.tmp\webui'
$PidFile = Join-Path $WebuiDir 'llama_server.pid'
$CurrentLogFile = Join-Path $WebuiDir 'current.log'
$CurrentErrLogFile = Join-Path $WebuiDir 'current.err.log'
$GpuMemoryDeltaMinMiB = if ($env:GPU_MEMORY_DELTA_MIN_MIB) { $env:GPU_MEMORY_DELTA_MIN_MIB } else { '1024' }
$BackendReadyTimeoutSec = if ($env:BACKEND_READY_TIMEOUT_SEC) { $env:BACKEND_READY_TIMEOUT_SEC } else { '180' }
$GpuVerifyTimeoutSec = if ($env:GPU_VERIFY_TIMEOUT_SEC) { $env:GPU_VERIFY_TIMEOUT_SEC } else { '180' }
$SpinnerFrameIntervalMs = 120
$SpinnerProbeIntervalMs = 1000
function Ensure-Dir {
param([string]$Path)
if (-not (Test-Path $Path)) {
New-Item -Path $Path -ItemType Directory -Force | Out-Null
}
}
function Test-Health {
try {
$null = Invoke-RestMethod -Uri "http://$HostAddr`:$PortNum/health" -Method Get -TimeoutSec 2
return $true
} catch {
return $false
}
}
function Get-ModelId {
try {
$models = Invoke-RestMethod -Uri "http://$HostAddr`:$PortNum/v1/models" -Method Get -TimeoutSec 3
if ($models.data -and $models.data.Count -gt 0) {
return [string]$models.data[0].id
}
return ''
} catch {
return ''
}
}
function Write-SpinnerLine {
param(
[string]$Label,
[double]$Current,
[int]$Total,
[int]$Tick
)
$frames = @('|', '/', '-', '\')
$frame = $frames[$Tick % $frames.Count]
$currentText = [string][int][Math]::Floor($Current)
Write-Host -NoNewline "`r$Label $frame $currentText/$Total"
}
function Complete-SpinnerLine {
Write-Host ''
}
function Test-ProcessRunning {
param([int]$ProcessId)
try {
$null = Get-Process -Id $ProcessId -ErrorAction Stop
return $true
} catch {
return $false
}
}
function Wait-Ready {
param([int]$ProcessId)
$timeoutSec = [int]$BackendReadyTimeoutSec
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$nextProbeMs = 0
$tick = 0
while ($stopwatch.Elapsed.TotalSeconds -lt $timeoutSec) {
Write-SpinnerLine -Label '后端加载中...' -Current $stopwatch.Elapsed.TotalSeconds -Total $timeoutSec -Tick $tick
if ($stopwatch.ElapsedMilliseconds -ge $nextProbeMs) {
if (-not (Test-ProcessRunning -ProcessId $ProcessId)) {
Complete-SpinnerLine
return @{ Ready = $false; Reason = 'llama-server 进程已提前退出' }
}
if (Test-Health) {
$modelId = Get-ModelId
if (-not [string]::IsNullOrWhiteSpace($modelId)) {
Complete-SpinnerLine
return @{ Ready = $true; Reason = "模型已就绪: $modelId" }
}
}
$nextProbeMs += $SpinnerProbeIntervalMs
}
Start-Sleep -Milliseconds $SpinnerFrameIntervalMs
$tick++
}
Complete-SpinnerLine
return @{ Ready = $false; Reason = "后端在 $timeoutSec 秒内未就绪" }
}
function Read-LogText {
param([string]$Path)
if (-not (Test-Path $Path)) {
return ''
}
try {
$lines = Get-Content -Path $Path -Tail 400 -ErrorAction SilentlyContinue
if ($null -eq $lines) {
return ''
}
return ($lines -join "`n")
} catch {
return ''
}
}
function Show-RecentServerLogs {
param(
[string]$OutLogPath,
[string]$ErrLogPath
)
Write-Host '后端启动失败,最近日志如下:'
if (Test-Path $OutLogPath) {
Write-Host '=== 标准输出 ==='
Get-Content -Path $OutLogPath -Tail 120 -ErrorAction SilentlyContinue
}
if (Test-Path $ErrLogPath) {
Write-Host '=== 标准错误 ==='
Get-Content -Path $ErrLogPath -Tail 120 -ErrorAction SilentlyContinue
}
}
function Test-GpuReadyFromLogs {
param(
[string]$OutLogPath,
[string]$ErrLogPath
)
$content = (Read-LogText -Path $OutLogPath) + "`n" + (Read-LogText -Path $ErrLogPath)
if ([string]::IsNullOrWhiteSpace($content)) {
return @{ Ready = $false; Reason = '日志为空' }
}
$match = [regex]::Match($content, 'offloaded\s+(\d+)\/(\d+)\s+layers\s+to\s+GPU', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
if ($match.Success) {
$offloaded = [int]$match.Groups[1].Value
$total = [int]$match.Groups[2].Value
if ($offloaded -gt 0) {
return @{ Ready = $true; Reason = "offloaded $offloaded/$total" }
}
return @{ Ready = $false; Reason = "offloaded 0/$total" }
}
$cpuFallbackPattern = 'cuda[^\r\n]*failed|no cuda-capable device|unable to initialize cuda|using cpu'
if ($content -match $cpuFallbackPattern) {
return @{ Ready = $false; Reason = '检测到 CUDA 初始化失败或 CPU 回退' }
}
return @{ Ready = $false; Reason = '未检测到 GPU 卸载证据' }
}
function Ensure-GpuOffload {
param(
[int]$ProcessId,
[int]$BaselineMemoryMiB,
[string]$OutLogPath,
[string]$ErrLogPath
)
$moduleResult = @{ Ready = $false; Reason = '未执行检查' }
$result = @{ Ready = $false; Reason = '未知原因' }
$nvidiaResult = @{ Ready = $false; Reason = '未执行检查' }
$timeoutSec = [int]$GpuVerifyTimeoutSec
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$nextProbeMs = 0
$tick = 0
while ($stopwatch.Elapsed.TotalSeconds -lt $timeoutSec) {
Write-SpinnerLine -Label 'GPU 校验中...' -Current $stopwatch.Elapsed.TotalSeconds -Total $timeoutSec -Tick $tick
if ($stopwatch.ElapsedMilliseconds -ge $nextProbeMs) {
if (-not (Test-ProcessRunning -ProcessId $ProcessId)) {
Complete-SpinnerLine
throw 'llama-server 在 GPU 校验期间提前退出'
}
$moduleResult = Test-CudaBackendLoaded -ProcessId $ProcessId
$result = Test-GpuReadyFromLogs -OutLogPath $OutLogPath -ErrLogPath $ErrLogPath
$nvidiaResult = Test-GpuReadyByNvidiaSmi -BaselineMemoryMiB $BaselineMemoryMiB
if ($moduleResult.Ready -and ($result.Ready -or $nvidiaResult.Ready)) {
Complete-SpinnerLine
if ($result.Ready) {
return "$($moduleResult.Reason);$($result.Reason)"
}
return "$($moduleResult.Reason);$($nvidiaResult.Reason)"
}
$nextProbeMs += $SpinnerProbeIntervalMs
}
Start-Sleep -Milliseconds $SpinnerFrameIntervalMs
$tick++
}
Complete-SpinnerLine
throw "已禁止 CPU 回退,但未检测到 GPU 卸载。模块检查: $($moduleResult.Reason);nvidia-smi: $($nvidiaResult.Reason);日志检查: $($result.Reason)"
}
function Test-CudaBackendLoaded {
param([int]$ProcessId)
try {
$mods = Get-Process -Id $ProcessId -Module -ErrorAction Stop
$cuda = $mods | Where-Object { $_.ModuleName -match '^ggml-cuda.*\.dll$' } | Select-Object -First 1
if ($null -ne $cuda) {
return @{ Ready = $true; Reason = "检测到 $($cuda.ModuleName) 已加载" }
}
return @{ Ready = $false; Reason = '未检测到 ggml-cuda*.dll' }
} catch {
return @{ Ready = $false; Reason = '无法读取 llama-server 进程模块' }
}
}
function Test-GpuReadyByNvidiaSmi {
param([int]$BaselineMemoryMiB)
$snapshot = Get-GpuMemoryUsedMiB
if (-not $snapshot.Ok) {
return @{ Ready = $false; Reason = $snapshot.Reason }
}
$delta = $snapshot.UsedMiB - $BaselineMemoryMiB
if ($snapshot.UsedMiB -gt 0 -and $delta -ge [int]$GpuMemoryDeltaMinMiB) {
return @{ Ready = $true; Reason = "nvidia-smi 显存占用 $($snapshot.UsedMiB)MiB,较基线增加 ${delta}MiB" }
}
return @{ Ready = $false; Reason = "显存占用 $($snapshot.UsedMiB)MiB,较基线增加 ${delta}MiB,阈值 ${GpuMemoryDeltaMinMiB}MiB" }
}
function Get-GpuMemoryUsedMiB {
$nvidia = Get-Command nvidia-smi.exe -ErrorAction SilentlyContinue
if (-not $nvidia) {
$nvidia = Get-Command nvidia-smi -ErrorAction SilentlyContinue
}
if (-not $nvidia) {
return @{ Ok = $false; UsedMiB = 0; Reason = 'nvidia-smi 不可用' }
}
$output = & $nvidia.Source '--query-gpu=memory.used' '--format=csv,noheader,nounits' 2>&1
if ($LASTEXITCODE -ne 0) {
return @{ Ok = $false; UsedMiB = 0; Reason = 'nvidia-smi 执行失败' }
}
$rows = @($output | ForEach-Object { "$_".Trim() } | Where-Object { $_ -match '^[0-9]+$' })
if ($rows.Count -eq 0) {
return @{ Ok = $false; UsedMiB = 0; Reason = 'nvidia-smi 未返回显存数据' }
}
$maxUsed = 0
foreach ($row in $rows) {
$memValue = 0
if ([int]::TryParse($row, [ref]$memValue)) {
if ($memValue -gt $maxUsed) {
$maxUsed = $memValue
}
}
}
return @{ Ok = $true; UsedMiB = $maxUsed; Reason = 'ok' }
}
function Get-StartupFailureReason {
param(
[string]$OutLogPath,
[string]$ErrLogPath
)
$content = (Read-LogText -Path $OutLogPath) + "`n" + (Read-LogText -Path $ErrLogPath)
if ([string]::IsNullOrWhiteSpace($content)) {
return ''
}
$bindMatch = [regex]::Match($content, "couldn't bind HTTP server socket, hostname:\s*([^,]+), port:\s*([0-9]+)", [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
if ($bindMatch.Success) {
$busyPort = $bindMatch.Groups[2].Value
return "端口 $busyPort 已被占用,请先关闭占用该端口的服务,再重新启动"
}
return ''
}
function Get-PortOwnerSummary {
param([string]$Port)
try {
$listeners = Get-NetTCPConnection -LocalPort ([int]$Port) -State Listen -ErrorAction SilentlyContinue
if (-not $listeners) {
return ''
}
$owners = @()
foreach ($listener in @($listeners | Select-Object -ExpandProperty OwningProcess -Unique)) {
$proc = Get-Process -Id $listener -ErrorAction SilentlyContinue
if ($proc) {
$owners += ('{0} (PID {1})' -f $proc.ProcessName, $proc.Id)
} else {
$owners += ('PID {0}' -f $listener)
}
}
return ($owners -join ', ')
} catch {
return ''
}
}
function Stop-Server {
if (Test-Path $PidFile) {
$raw = Get-Content -Path $PidFile -ErrorAction SilentlyContinue | Select-Object -First 1
$serverPid = 0
if ([int]::TryParse([string]$raw, [ref]$serverPid) -and $serverPid -gt 0) {
try {
Stop-Process -Id $serverPid -Force -ErrorAction SilentlyContinue
} catch {}
}
}
$procs = Get-Process -Name 'llama-server' -ErrorAction SilentlyContinue
if ($procs) {
$procs | Stop-Process -Force -ErrorAction SilentlyContinue
}
if (Test-Path $PidFile) {
Remove-Item -Path $PidFile -Force -ErrorAction SilentlyContinue
}
if (Test-Path $CurrentErrLogFile) {
Remove-Item -Path $CurrentErrLogFile -Force -ErrorAction SilentlyContinue
}
}
function Show-Status {
if (Test-Health) {
$modelId = Get-ModelId
if ([string]::IsNullOrWhiteSpace($modelId)) {
$modelId = 'loading'
}
Write-Host '状态: 运行中'
Write-Host "地址: http://$HostAddr`:$PortNum"
Write-Host "模型: $modelId"
if (Test-Path $CurrentLogFile) {
$p = Get-Content -Path $CurrentLogFile -ErrorAction SilentlyContinue | Select-Object -First 1
if ($p) {
Write-Host "日志: $p"
}
}
if (Test-Path $CurrentErrLogFile) {
$ep = Get-Content -Path $CurrentErrLogFile -ErrorAction SilentlyContinue | Select-Object -First 1
if ($ep) {
Write-Host "错误日志: $ep"
}
}
return
}
Write-Host '状态: 未运行'
}
function Resolve-RuntimeProfile {
switch ($ThinkMode) {
'think-on' { return @{ ReasoningBudget = '-1'; MaxTokens = '-1' } }
'think-off' { return @{ ReasoningBudget = '0'; MaxTokens = '2048' } }
default { throw "不支持的思考模式: $ThinkMode" }
}
}
function Validate-Limits {
if (($CtxSize -notmatch '^[0-9]+$') -or ($ImageMinTokens -notmatch '^[0-9]+$') -or ($ImageMaxTokens -notmatch '^[0-9]+$')) {
throw 'CTX_SIZE / IMAGE_MIN_TOKENS / IMAGE_MAX_TOKENS 必须是正整数'
}
if ([int]$CtxSize -le 0 -or [int]$ImageMinTokens -le 0 -or [int]$ImageMaxTokens -le 0) {
throw 'CTX_SIZE / IMAGE_MIN_TOKENS / IMAGE_MAX_TOKENS 必须大于 0'
}
if ([int]$ImageMinTokens -gt [int]$ImageMaxTokens) {
throw 'IMAGE_MIN_TOKENS 不能大于 IMAGE_MAX_TOKENS'
}
if ($MmprojOffload -ne 'on' -and $MmprojOffload -ne 'off') {
throw 'MMPROJ_OFFLOAD 仅支持 on 或 off'
}
if (($GpuMemoryDeltaMinMiB -notmatch '^[0-9]+$') -or [int]$GpuMemoryDeltaMinMiB -le 0) {
throw 'GPU_MEMORY_DELTA_MIN_MIB 必须是正整数'
}
if (($BackendReadyTimeoutSec -notmatch '^[0-9]+$') -or [int]$BackendReadyTimeoutSec -le 0) {
throw 'BACKEND_READY_TIMEOUT_SEC 必须是正整数'
}
if (($GpuVerifyTimeoutSec -notmatch '^[0-9]+$') -or [int]$GpuVerifyTimeoutSec -le 0) {
throw 'GPU_VERIFY_TIMEOUT_SEC 必须是正整数'
}
}
function Start-Server {
if (-not (Test-Path $BinPath)) {
throw "llama-server.exe 不存在: $BinPath"
}
if (-not (Test-Path $ModelPath) -or -not (Test-Path $MmprojPath)) {
throw "模型文件不完整。`nMODEL_PATH=$ModelPath`nMMPROJ_PATH=$MmprojPath"
}
Ensure-Dir $WebuiDir
Validate-Limits
$profile = Resolve-RuntimeProfile
Stop-Server
$portOwner = Get-PortOwnerSummary -Port $PortNum
if ($portOwner) {
throw "端口 $PortNum 已被占用: $portOwner"
}
$args = @(
'-m', $ModelPath,
'-mm', $MmprojPath,
'--n-gpu-layers', 'all',
'--flash-attn', 'on',
'--fit', 'on',
'--fit-target', '256',
'--temp', '1.0',
'--top-p', '0.95',
'--top-k', '20',
'--min-p', '0.1',
'--presence-penalty', '1.5',
'--repeat-penalty', '1.05',
'-n', $profile.MaxTokens,
'--reasoning-budget', $profile.ReasoningBudget,
'-c', $CtxSize,
'--image-min-tokens', $ImageMinTokens,
'--image-max-tokens', $ImageMaxTokens,
'--host', $HostAddr,
'--port', $PortNum,
'--webui'
)
if ($MmprojOffload -eq 'off') {
$args += '--no-mmproj-offload'
} else {
$args += '--mmproj-offload'
}
$timestamp = Get-Date -Format 'yyyyMMdd_HHmmss'
$logPath = Join-Path $WebuiDir ("llama_server_9b_{0}.log" -f $timestamp)
$errLogPath = Join-Path $WebuiDir ("llama_server_9b_{0}.err.log" -f $timestamp)
if (Test-Path $logPath) {
Remove-Item -Path $logPath -Force -ErrorAction SilentlyContinue
}
if (Test-Path $errLogPath) {
Remove-Item -Path $errLogPath -Force -ErrorAction SilentlyContinue
}
$baselineGpuMemoryMiB = 0
$gpuBaseline = Get-GpuMemoryUsedMiB
if ($gpuBaseline.Ok) {
$baselineGpuMemoryMiB = [int]$gpuBaseline.UsedMiB
}
Write-Host '后端进程启动中,正在装载模型到 GPU...'
$proc = Start-Process -FilePath $BinPath -ArgumentList $args -WindowStyle Hidden -RedirectStandardOutput $logPath -RedirectStandardError $errLogPath -PassThru
Set-Content -Path $PidFile -Value $proc.Id -Encoding ascii
Set-Content -Path $CurrentLogFile -Value $logPath -Encoding utf8
Set-Content -Path $CurrentErrLogFile -Value $errLogPath -Encoding utf8
$startupReady = $false
try {
$readyResult = Wait-Ready -ProcessId $proc.Id
if (-not $readyResult.Ready) {
$startupFailureReason = Get-StartupFailureReason -OutLogPath $logPath -ErrLogPath $errLogPath
if ($startupFailureReason) {
throw "服务启动失败: $startupFailureReason"
}
throw "服务启动失败: $($readyResult.Reason)"
}
$gpuInfo = Ensure-GpuOffload -ProcessId $proc.Id -BaselineMemoryMiB $baselineGpuMemoryMiB -OutLogPath $logPath -ErrLogPath $errLogPath
Write-Host "GPU 校验通过: $gpuInfo"
$startupReady = $true
} finally {
if (-not $startupReady) {
Show-RecentServerLogs -OutLogPath $logPath -ErrLogPath $errLogPath
Stop-Server
}
}
Write-Host "已切换到 9b,思考模式: $ThinkMode"
Write-Host "地址: http://$HostAddr`:$PortNum"
Write-Host "视觉限制: image tokens $ImageMinTokens-$ImageMaxTokens, mmproj offload=$MmprojOffload, ctx=$CtxSize"
Show-Status
}
switch ($Command) {
'status' { Show-Status; break }
'stop' { Stop-Server; Write-Host '服务已停止'; break }
'9b' { Start-Server; break }
default {
Write-Host '用法:'
Write-Host ' .\switch_qwen35_webui.ps1 status'
Write-Host ' .\switch_qwen35_webui.ps1 stop'
Write-Host ' .\switch_qwen35_webui.ps1 9b [think-on|think-off]'
exit 1
}
}
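`Test-GpuReadyFromLogs` keys off the llama-server load line `offloaded N/M layers to GPU` and only accepts a strictly positive offload count. The same check, restated in Python (a sketch of the regex logic for reference, not part of the shipped scripts):

```python
import re

# Same pattern the PowerShell function matches against the combined stdout/stderr tail.
OFFLOAD_RE = re.compile(r'offloaded\s+(\d+)/(\d+)\s+layers\s+to\s+GPU', re.IGNORECASE)

def gpu_ready_from_logs(log_text: str) -> tuple:
    """Require evidence that at least one layer was offloaded to the GPU."""
    match = OFFLOAD_RE.search(log_text)
    if match:
        offloaded, total = int(match.group(1)), int(match.group(2))
        if offloaded > 0:
            return True, f'offloaded {offloaded}/{total}'
        return False, f'offloaded 0/{total}'
    return False, 'no GPU offload evidence found'
```

Note that `offloaded 0/N` is treated as a hard failure: the stack refuses to run on a silent CPU fallback.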

101
switch_qwen35_webui.sh Normal file
View File

@@ -0,0 +1,101 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PS1_PATH="$ROOT_DIR/switch_qwen35_webui.ps1"
print_usage() {
cat <<'USAGE'
用法:
./switch_qwen35_webui.sh status
./switch_qwen35_webui.sh stop
./switch_qwen35_webui.sh 9b [think-on|think-off]
说明:
WSL 入口会直接复用 Windows 主脚本的 GPU 强校验逻辑。
若未成功加载到 GPU,脚本会直接失败,不会回退 CPU。
USAGE
}
to_win_path_if_needed() {
local raw="$1"
if [[ -z "$raw" ]]; then
printf ''
return
fi
if [[ "$raw" == /* ]]; then
wslpath -w "$raw"
return
fi
printf '%s' "$raw"
}
require_windows_power_shell() {
if ! command -v powershell.exe >/dev/null 2>&1; then
echo "未找到 powershell.exe,WSL 模式无法调用 Windows 后端脚本。"
exit 1
fi
if [[ ! -f "$PS1_PATH" ]]; then
echo "缺少后端脚本: $PS1_PATH"
exit 1
fi
}
build_env_overrides() {
local -n out_ref=$1
out_ref=()
for key in HOST PORT CTX_SIZE IMAGE_MIN_TOKENS IMAGE_MAX_TOKENS MMPROJ_OFFLOAD GPU_MEMORY_DELTA_MIN_MIB; do
if [[ -n "${!key-}" ]]; then
out_ref+=("$key=${!key}")
fi
done
if [[ -n "${BIN_PATH-}" ]]; then
out_ref+=("BIN_PATH=$(to_win_path_if_needed "$BIN_PATH")")
fi
if [[ -n "${MODEL_PATH-}" ]]; then
out_ref+=("MODEL_PATH=$(to_win_path_if_needed "$MODEL_PATH")")
fi
if [[ -n "${MMPROJ_PATH-}" ]]; then
out_ref+=("MMPROJ_PATH=$(to_win_path_if_needed "$MMPROJ_PATH")")
fi
}
main() {
local command="${1:-status}"
local think_mode="${2:-think-on}"
case "$command" in
status|stop) ;;
9b)
case "$think_mode" in
think-on|think-off) ;;
*)
echo "不支持的思考模式: $think_mode"
exit 1
;;
esac
;;
*)
print_usage
exit 1
;;
esac
require_windows_power_shell
local ps1_win
ps1_win="$(wslpath -w "$PS1_PATH")"
local env_overrides=()
build_env_overrides env_overrides
if [[ "$command" == "9b" ]]; then
env "${env_overrides[@]}" powershell.exe -NoProfile -ExecutionPolicy Bypass -File "$ps1_win" "$command" "$think_mode"
return
fi
env "${env_overrides[@]}" powershell.exe -NoProfile -ExecutionPolicy Bypass -File "$ps1_win" "$command"
}
main "${1:-status}" "${2:-think-on}"
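`build_env_overrides` forwards only a whitelist of non-empty variables into the Windows process. The filtering rule, restated as a small Python sketch (the key list is copied from the script; the function shape is ours):

```python
# Whitelisted variables the WSL wrapper is willing to forward to powershell.exe.
FORWARD_KEYS = ('HOST', 'PORT', 'CTX_SIZE', 'IMAGE_MIN_TOKENS', 'IMAGE_MAX_TOKENS',
                'MMPROJ_OFFLOAD', 'GPU_MEMORY_DELTA_MIN_MIB')

def build_env_overrides(environ: dict) -> list:
    """Forward only whitelisted, non-empty variables as KEY=VALUE pairs."""
    return [f'{key}={environ[key]}' for key in FORWARD_KEYS if environ.get(key)]
```

Keeping the whitelist explicit means unrelated shell state (PATH tweaks, prompt variables, etc.) can never leak into the Windows-side launch.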

446
toolhub_gateway_agent.py Normal file
View File

@@ -0,0 +1,446 @@
import json
import os
import time
import datetime
import uuid
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Sequence, Union
import requests
from qwen_agent.agents import Assistant
from qwen_agent.llm.schema import ContentItem, Message
import agent_runtime # noqa: F401
from agent_runtime import readonly_tools # noqa: F401
DEFAULT_SYSTEM_PROMPT = (
'你是 Qwen3.5,本地部署的多模态中文助手。\n'
'默认中文回答。\n'
'当用户只是打招呼或闲聊时,自然回应即可,不要主动枚举全部工具。\n'
'你的目标是先使用可用工具获得可验证信息,再给出结论。\n'
'规则:\n'
'1. 对最新信息先用 web_search,再按需用 web_fetch 或 web_extractor 抓取正文。\n'
'2. 对人名、作品名、小众概念等不确定知识先 web_search,若结果歧义则改写关键词再检索一次。\n'
'3. 核心规则:你已具备 filesystem 的读写能力。你可以读取文件,如果用户有需求,你也可以调用 write_file 工具进行写入。\n'
'4. 图片问题先看整图,细节再用 image_zoom_in_tool,使用相对坐标。\n'
'5. 工具失败时必须明确说明原因,不得伪造结果。\n'
'6. 联网任务要控制上下文预算,优先少量高质量来源,避免搬运大段无关正文。\n'
'7. 严禁在未获授权的情况下使用 filesystem 工具查看助手自身的源代码或运行目录。\n'
'8. 长期记忆(主动意识):你拥有 manage_memory 工具。当你从对话中识别出以下内容时,必须【主动】调用 add 操作:\n - 用户的明确偏好(如:喜欢 MD 格式、不喜欢繁琐说明)。\n - 重要的个人事实(如:职业、项目代号、系统配置路径)。\n - 约定的工作习惯(如:每段代码都要加注释)。\n 执行后,在回复中自然地告知用户“我已记下此习惯/信息”。当用户问“你了解我什么”或要求修改时,配合 list 和 delete 操作。\n'
)
DEFAULT_FUNCTION_LIST = [
'web_search',
'web_fetch',
'web_extractor',
'image_search',
'image_zoom_in_tool',
'filesystem',
'manage_memory',
]
TIMINGS_EMIT_INTERVAL_SEC = 0.8
MAX_FALLBACK_PART_TEXT_CHARS = 512
# --- Long-term memory injection ---
def get_injected_memory() -> str:
"""Load long-term memories from the file named by MEMORY_FILE_PATH."""
path_str = os.getenv('MEMORY_FILE_PATH', './memory.json')
memory_path = Path(path_str).resolve()
if not memory_path.exists():
return ""
try:
with open(memory_path, 'r', encoding='utf-8') as f:
memories = json.load(f)
if not isinstance(memories, list) or not memories:
return ""
memory_str = "\n".join([f"- {m}" for m in memories])
return f"\n【长期记忆库(已自动加载)】:\n{memory_str}\n"
except Exception as e:
print(f"注入记忆失败: {e}")
return ""
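`get_injected_memory` expects the file at `MEMORY_FILE_PATH` to contain a JSON array of strings. The rendering step can be isolated as a pure function (an illustrative refactor for reference; the gateway itself does the file I/O inline as above):

```python
def render_memory_block(memories) -> str:
    """Turn a JSON-decoded memory list into the prompt block get_injected_memory emits."""
    # Anything other than a non-empty list yields no injection at all.
    if not isinstance(memories, list) or not memories:
        return ''
    memory_str = '\n'.join(f'- {m}' for m in memories)
    return f'\n【长期记忆库(已自动加载)】:\n{memory_str}\n'
```

Separating parsing from rendering also makes the empty/invalid-file behavior (silently inject nothing) easy to pin down in a test.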
def fetch_model_id(model_server: str, timeout_sec: int) -> str:
response = requests.get(f'{model_server}/models', timeout=timeout_sec)
response.raise_for_status()
return response.json()['data'][0]['id']
def _extract_image_uri(part: Dict[str, Any]) -> Optional[str]:
keys = ('image_url', 'image', 'url', 'input_image', 'image_uri')
for key in keys:
value = part.get(key)
if isinstance(value, str) and value.strip():
return value.strip()
if isinstance(value, dict):
nested = value.get('url') or value.get('image_url') or value.get('image')
if isinstance(nested, str) and nested.strip():
return nested.strip()
return None
def _build_compact_part_text(part: Dict[str, Any], part_type: Any) -> str:
part_keys = sorted(str(k) for k in part.keys())
payload = {'type': str(part_type or 'unknown'), 'keys': part_keys[:12]}
text = part.get('text')
if isinstance(text, str) and text.strip():
payload['text'] = text.strip()[:MAX_FALLBACK_PART_TEXT_CHARS]
return json.dumps(payload, ensure_ascii=False)
def extract_generate_cfg(payload: Dict[str, Any]) -> Dict[str, Any]:
cfg: Dict[str, Any] = {}
keys = ('temperature', 'top_p', 'top_k', 'presence_penalty', 'frequency_penalty')
for key in keys:
value = payload.get(key)
if value is not None:
cfg[key] = value
repeat_penalty = payload.get('repeat_penalty')
if repeat_penalty is not None:
cfg['repetition_penalty'] = repeat_penalty
extra_body = payload.get('extra_body')
if not isinstance(extra_body, dict):
extra_body = {}
chat_template_kwargs = extra_body.get('chat_template_kwargs')
if not isinstance(chat_template_kwargs, dict):
chat_template_kwargs = {}
# Thinking is enabled by default; if the caller explicitly passes false, keep the caller's value.
chat_template_kwargs.setdefault('enable_thinking', True)
extra_body['chat_template_kwargs'] = chat_template_kwargs
requested_reasoning_format = payload.get('reasoning_format')
if isinstance(requested_reasoning_format, str) and requested_reasoning_format.strip():
extra_body.setdefault('reasoning_format', requested_reasoning_format.strip())
else:
extra_body.setdefault('reasoning_format', 'deepseek')
extra_body.setdefault('reasoning_budget', -1)
cfg['extra_body'] = extra_body
max_tokens = payload.get('max_tokens')
if isinstance(max_tokens, int) and max_tokens > 0:
cfg['max_tokens'] = max_tokens
if not cfg:
cfg = {'temperature': 0.7, 'top_p': 0.9, 'max_tokens': 512}
return cfg
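The `extra_body` handling above is the subtle part: thinking stays enabled unless the caller explicitly disables it, and `reasoning_format` / `reasoning_budget` receive defaults only when absent. A trimmed sketch of just that defaulting logic (the helper name is hypothetical; the behavior mirrors `extract_generate_cfg`):

```python
def apply_extra_body_defaults(payload: dict) -> dict:
    """Trimmed sketch of extract_generate_cfg's extra_body handling only."""
    extra_body = payload.get('extra_body')
    if not isinstance(extra_body, dict):
        extra_body = {}
    kwargs = extra_body.get('chat_template_kwargs')
    if not isinstance(kwargs, dict):
        kwargs = {}
    kwargs.setdefault('enable_thinking', True)   # on by default, caller may opt out
    extra_body['chat_template_kwargs'] = kwargs
    fmt = payload.get('reasoning_format')
    if isinstance(fmt, str) and fmt.strip():
        extra_body.setdefault('reasoning_format', fmt.strip())
    else:
        extra_body.setdefault('reasoning_format', 'deepseek')
    extra_body.setdefault('reasoning_budget', -1)  # -1 = no cap on reasoning tokens
    return extra_body
```

Using `setdefault` throughout is what makes the gateway non-destructive: any value the upstream client already set wins over the local default.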
def build_agent(
model_server: str,
timeout_sec: int,
generate_cfg: Dict[str, Any],
model_id: Optional[str] = None,
system_prompt: str = DEFAULT_SYSTEM_PROMPT,
) -> Assistant:
if not model_id:
model_id = fetch_model_id(model_server, timeout_sec)
llm_cfg = {
'model': model_id,
'model_server': model_server,
'api_key': os.getenv('OPENAI_API_KEY', 'EMPTY'),
'model_type': 'qwenvl_oai',
'generate_cfg': generate_cfg,
}
# Assemble the tool list dynamically: write_file is exposed only when explicitly enabled.
actual_function_list = list(DEFAULT_FUNCTION_LIST)
if os.getenv('ENABLE_FILE_WRITE', 'False').lower() == 'true':
import agent_runtime.write_tools  # noqa: F401  # importing triggers register_tool registration
actual_function_list.append('write_file')
# Inject persistent memory and the current time into the system prompt.
# 1. Load long-term memories from the persistent file.
persistent_memory = get_injected_memory()
# 2. Capture the current wall-clock time.
now = datetime.datetime.now()
current_time = now.strftime("%Y年%m月%d日 %H:%M:%S")
weekdays = ["一", "二", "三", "四", "五", "六", "日"]
dynamic_context = f"【系统实时状态】\n当前时间:{current_time},星期{weekdays[now.weekday()]}\n"
# 3. Concatenation order: time -> long-term memory -> base instructions.
actual_system_prompt = dynamic_context + persistent_memory + system_prompt
return Assistant(
name='Qwen3.5-9B-ToolHub-8080',
description='8080 网页工具代理',
llm=llm_cfg,
function_list=actual_function_list,
system_message=actual_system_prompt,
)
def to_content_items(content: Any) -> Union[str, List[ContentItem]]:
if isinstance(content, str):
return content
if not isinstance(content, list):
return str(content)
items: List[ContentItem] = []
for part in content:
if not isinstance(part, dict):
items.append(ContentItem(text=str(part)))
continue
part_type = part.get('type')
if part_type in (None, 'text', 'input_text'):
text = part.get('text', '')
if text:
items.append(ContentItem(text=str(text)))
continue
image_uri = _extract_image_uri(part)
if image_uri:
items.append(ContentItem(image=image_uri))
continue
items.append(ContentItem(text=_build_compact_part_text(part, part_type)))
return items if items else ''
def to_qwen_messages(openai_messages: Sequence[Dict[str, Any]]) -> List[Message]:
qwen_messages: List[Message] = []
for item in openai_messages:
role = str(item.get('role', '')).strip()
if role not in {'system', 'user', 'assistant'}:
continue
qwen_messages.append(Message(role=role, content=to_content_items(item.get('content', ''))))
if not qwen_messages:
raise ValueError('messages 为空或不包含可用角色')
return qwen_messages
def content_to_text(content: Any) -> str:
if isinstance(content, str):
return content
if not isinstance(content, list):
return str(content)
texts: List[str] = []
for item in content:
if isinstance(item, str):
texts.append(item)
continue
if isinstance(item, dict) and item.get('text'):
texts.append(str(item['text']))
continue
text = getattr(item, 'text', None)
if text:
texts.append(str(text))
return '\n'.join(texts).strip()
def extract_answer_and_reasoning(messages: Sequence[Message]) -> Dict[str, str]:
answer = ''
reasoning_parts: List[str] = []
for message in messages:
if getattr(message, 'role', '') != 'assistant':
continue
content_text = content_to_text(message.get('content', ''))
if content_text:
answer = content_text
reasoning_text = content_to_text(message.get('reasoning_content', ''))
if reasoning_text:
reasoning_parts.append(reasoning_text)
return {'answer': answer, 'reasoning': '\n'.join(reasoning_parts).strip()}
def run_chat_completion(payload: Dict[str, Any], model_server: str, timeout_sec: int) -> Dict[str, str]:
openai_messages = payload.get('messages')
if not isinstance(openai_messages, list):
raise ValueError('messages 必须是数组')
model_id = fetch_model_id(model_server, timeout_sec)
agent = build_agent(model_server, timeout_sec, extract_generate_cfg(payload), model_id=model_id)
qwen_messages = to_qwen_messages(openai_messages)
final_batch = None
for batch in agent.run(qwen_messages):
final_batch = batch
if not final_batch:
raise RuntimeError('未收到模型输出')
texts = extract_answer_and_reasoning(final_batch)
answer = texts['answer']
reasoning = texts['reasoning']
return {'model': model_id, 'answer': answer, 'reasoning': reasoning}
def build_sse_chunk(
chat_id: str,
created: int,
model: str,
delta: Dict[str, Any],
finish_reason: Optional[str] = None,
timings: Optional[Dict[str, Any]] = None,
) -> bytes:
chunk = {
'id': chat_id,
'object': 'chat.completion.chunk',
'created': created,
'model': model,
'choices': [{'index': 0, 'delta': delta, 'finish_reason': finish_reason}],
}
if timings:
chunk['timings'] = timings
return f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode('utf-8')
def text_delta(previous: str, current: str) -> str:
    # Return the newly appended suffix of a cumulative stream; if the model
    # rewrote earlier text (current no longer extends previous), fall back to
    # emitting the full current string rather than dropping content.
    if not current:
        return ''
    if current.startswith(previous):
        return current[len(previous):]
    return current
def model_base_url(model_server: str) -> str:
    # Strip a trailing /v1 so backend-native endpoints such as /tokenize are reachable.
    if model_server.endswith('/v1'):
        return model_server[:-3]
    return model_server.rstrip('/')
def count_text_tokens(model_server: str, timeout_sec: int, text: str) -> int:
    # Count tokens via the backend's /tokenize endpoint (llama.cpp server API).
    if not text:
        return 0
    url = f'{model_base_url(model_server)}/tokenize'
    response = requests.post(url, json={'content': text}, timeout=timeout_sec)
    response.raise_for_status()
    data = response.json()
    tokens = data.get('tokens')
    if not isinstance(tokens, list):
        raise ValueError('unexpected /tokenize response format')
    return len(tokens)
def build_live_timings(token_count: int, elapsed_sec: float) -> Dict[str, Any]:
    # Approximate llama.cpp-style "timings" for live throughput display;
    # guard against division by zero on the first measurement.
    safe_elapsed = elapsed_sec if elapsed_sec > 0 else 1e-6
    return {
        'prompt_n': 0,
        'prompt_ms': 0,
        'predicted_n': token_count,
        'predicted_ms': safe_elapsed * 1000.0,
        'predicted_per_second': token_count / safe_elapsed,
        'cache_n': 0,
    }
def merge_generated_text(reasoning: str, answer: str) -> str:
if reasoning and answer:
return f'{reasoning}\n{answer}'
return reasoning or answer
def stream_chat_completion(payload: Dict[str, Any], model_server: str, timeout_sec: int) -> Iterable[bytes]:
openai_messages = payload.get('messages')
if not isinstance(openai_messages, list):
        raise ValueError('messages must be an array')
model_id = fetch_model_id(model_server, timeout_sec)
agent = build_agent(model_server, timeout_sec, extract_generate_cfg(payload), model_id=model_id)
qwen_messages = to_qwen_messages(openai_messages)
now = int(time.time())
chat_id = f'chatcmpl-{uuid.uuid4().hex}'
yield build_sse_chunk(chat_id, now, model_id, {'role': 'assistant'})
previous_answer = ''
previous_reasoning = ''
started_at = time.perf_counter()
last_timing_at = started_at
last_reported_tokens = -1
last_counted_text = ''
for batch in agent.run(qwen_messages):
texts = extract_answer_and_reasoning(batch)
answer = texts['answer']
reasoning = texts['reasoning']
reasoning_inc = text_delta(previous_reasoning, reasoning)
if reasoning_inc:
yield build_sse_chunk(chat_id, now, model_id, {'reasoning_content': reasoning_inc})
answer_inc = text_delta(previous_answer, answer)
if answer_inc:
yield build_sse_chunk(chat_id, now, model_id, {'content': answer_inc})
generated_text = merge_generated_text(reasoning, answer)
current_time = time.perf_counter()
should_emit_timing = (
generated_text
and generated_text != last_counted_text
and (current_time - last_timing_at) >= TIMINGS_EMIT_INTERVAL_SEC
)
if should_emit_timing:
token_count = count_text_tokens(model_server, timeout_sec, generated_text)
if token_count != last_reported_tokens:
timings = build_live_timings(token_count, current_time - started_at)
yield build_sse_chunk(chat_id, now, model_id, {}, timings=timings)
last_reported_tokens = token_count
last_counted_text = generated_text
last_timing_at = current_time
previous_reasoning = reasoning
previous_answer = answer
final_generated_text = merge_generated_text(previous_reasoning, previous_answer)
if final_generated_text and final_generated_text != last_counted_text:
final_time = time.perf_counter()
token_count = count_text_tokens(model_server, timeout_sec, final_generated_text)
if token_count != last_reported_tokens:
timings = build_live_timings(token_count, final_time - started_at)
yield build_sse_chunk(chat_id, now, model_id, {}, timings=timings)
yield build_sse_chunk(chat_id, now, model_id, {}, 'stop')
yield b'data: [DONE]\n\n'
def build_non_stream_response(answer: str, model: str, reasoning: str = '') -> Dict[str, Any]:
now = int(time.time())
message = {'role': 'assistant', 'content': answer}
if reasoning:
message['reasoning_content'] = reasoning
return {
'id': f'chatcmpl-{uuid.uuid4().hex}',
'object': 'chat.completion',
'created': now,
'model': model,
'choices': [{
'index': 0,
'message': message,
'finish_reason': 'stop',
}],
        'usage': {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0},  # token counts not tracked here
}
def sse_lines(answer: str, model: str, reasoning: str = '') -> Iterable[bytes]:
now = int(time.time())
chat_id = f'chatcmpl-{uuid.uuid4().hex}'
chunks = [
{
'id': chat_id,
'object': 'chat.completion.chunk',
'created': now,
'model': model,
'choices': [{'index': 0, 'delta': {'role': 'assistant'}, 'finish_reason': None}],
},
]
if reasoning:
chunks.append({
'id': chat_id,
'object': 'chat.completion.chunk',
'created': now,
'model': model,
'choices': [{'index': 0, 'delta': {'reasoning_content': reasoning}, 'finish_reason': None}],
})
chunks.append({
'id': chat_id,
'object': 'chat.completion.chunk',
'created': now,
'model': model,
'choices': [{'index': 0, 'delta': {'content': answer}, 'finish_reason': None}],
})
chunks.append({
'id': chat_id,
'object': 'chat.completion.chunk',
'created': now,
'model': model,
'choices': [{'index': 0, 'delta': {}, 'finish_reason': 'stop'}],
})
for chunk in chunks:
yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode('utf-8')
yield b'data: [DONE]\n\n'