On-Device Deployment of the Qwen2.5-VL-3B Model

Published: 2026-01-27 19:19:17

Qwen2.5-VL-3B Model Deployment and Usage Guide

This project is based on ai-engine-direct-helper (QAI_AppBuilder):
https://github.com/quic/ai-engine-direct-helper.git

Part 1: Windows

This part describes how to set up and run the Qwen2.5-VL-3B model on Windows.

1.1 Downloading and Preparing Resources

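Download the model files: visit the website and download the model for your platform:
Qwen2.5-VL-3B model download for the Snapdragon X Elite platform (8380)
Qwen2.5-VL-3B model download for the Snapdragon X2 Elite platform (8480)
Place the downloaded model in the ai-engine-direct-helper\samples\genie\python\models directory.
Download the Genie service program: go to the GitHub Releases page and download GenieAPIService_v2.1.3_QAIRT_v2.42.0_v73.zip (Releases download page).
Extract the files: extract the downloaded archive into the project directory ai-engine-direct-helper\samples.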
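
Before starting the service, it can help to confirm that the files ended up where the later commands expect them. The following is a minimal sketch, not part of the project: `missing_files` is a hypothetical helper, and the two relative paths are assumptions taken from the commands used later in this guide (the model folder name `qwen2.5vl3b` in particular may differ for your download).

```python
from pathlib import Path

# Assumed layout, copied from the paths used in this guide; adjust if yours differs.
MODEL_CONFIG = Path("genie/python/models/qwen2.5vl3b/config.json")
SERVICE_EXE = Path("GenieAPIService/GenieAPIService.exe")


def missing_files(samples_dir):
    """Return the expected files (as POSIX-style relative paths) absent under samples_dir."""
    root = Path(samples_dir)
    expected = (MODEL_CONFIG, SERVICE_EXE)
    return [rel.as_posix() for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files(r"ai-engine-direct-helper\samples")` returns an empty list when both files are in place, and otherwise lists what still needs to be downloaded or extracted.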

1.2 Starting the Service and Running the Example

Steps: open a terminal, change to the samples directory, then run the service command and the client command in turn.

# 1. Change to the samples directory
cd ai-engine-direct-helper\samples

# 2. Start the GenieAPI service (loads the config file)
GenieAPIService\GenieAPIService.exe -c "genie\python\models\qwen2.5vl3b\config.json" -l
On a successful start, the service prints a log like this:
[W] load successfully! use second: 4.56947
[W] Model load successfully: qwen2.5vl3b
[W] GenieService::setupHttpServer start
[W] GenieService::setupHttpServer end
[A] [OK] Genie API Service IS Running.
[A] [OK] Genie API Service -> http://0.0.0.0:8910

# 3. Run the client to test (make sure test.png exists in the current directory)
GenieAPIClient.exe --prompt "Describe this image." --img test.png --stream --model qwen2.5vl3b

Note: before running the client command, make sure an image file named test.png exists in the current directory.
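
If the client cannot connect, a quick first check is whether anything is listening on the service port at all. The sketch below uses only the Python standard library; `service_is_listening` is a hypothetical helper name, and the default port 8910 is taken from the startup log shown above. It only verifies that a TCP connection can be opened, not that the model has loaded.

```python
import socket


def service_is_listening(host="127.0.0.1", port=8910, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For a service running on another machine or on a phone, pass that device's IP address as `host`.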

Part 2: Android

2.1 Downloading and Installing Resources

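Download the model files: as on Windows, first download the model for your platform:
Qwen2.5-VL-3B model download for the Snapdragon 8 Elite platform (8750)
Qwen2.5-VL-3B model download for the Snapdragon 8 Elite Gen 5 platform (8850)
Place the downloaded model in the /sdcard/GenieModels/ directory.
Download and install the APK: go to the GitHub Releases page, download GenieAPIService.apk, and install it on your Android device (Releases download page).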
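
One common way to copy the model folder onto the device is `adb push`. The guide does not say which transfer method to use, so treat the following as a sketch under that assumption: `adb_push_command` is a hypothetical helper that only builds the argument list, and the local folder name in the usage note is illustrative. Note that `adb push` places a local folder inside the destination only if the destination directory already exists on the device.

```python
# Destination directory named in this guide.
DEVICE_MODEL_DIR = "/sdcard/GenieModels/"


def adb_push_command(local_model_dir, serial=None):
    """Build the adb argument list that copies a local model folder to the device."""
    cmd = ["adb"]
    if serial:
        # -s selects one device when several are attached
        cmd += ["-s", serial]
    cmd += ["push", local_model_dir, DEVICE_MODEL_DIR]
    return cmd
```

The result can be passed to `subprocess.run(adb_push_command("qwen2.5vl3b"), check=True)` once `adb devices` shows the phone.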

2.2 Building and Running the Sample App

The source code of the Android sample app is in the project directory; you need to build it yourself.

Source path: samples\android\GenieChat
Usage: open this directory in Android Studio, build the app, install it on the device, and use it together with the installed GenieAPIService.

2.3 Sample App Screenshot

[Screenshot: GenieChat sample app]

Part 3: Calling the Service from Python

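Whether you run GenieAPIService.exe on Windows or launch GenieAPIService.apk on Android, the service displays an IP address and port once it has started (for example 127.0.0.1:8910, or the phone's IP address). You can then call the service from Python through its OpenAI-compatible interface.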

3.1 Environment Setup

Make sure the openai and requests libraries are installed; the client script imports both.

pip install openai requests
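
To confirm the environment before running the client, you can check whether the packages are importable. This is a small sketch using the standard library; `missing_packages` is a hypothetical helper, and the default list reflects the two third-party imports in the client script (`openai` and `requests`).

```python
import importlib.util


def missing_packages(names=("openai", "requests")):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list means both packages are available; any name returned still needs a `pip install`.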

3.2 Python Client Code (vl_client.py)

Create a Python script (for example vl_client.py) and copy the following code into it. Change the IP address and port to match the address your service reports.

import argparse
import base64
import requests
import os
from openai import OpenAI

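# --- Configuration ---
IP_ADDR = "127.0.0.1:8910"  # port from the service startup log; change to match your service
MODEL_NAME = "qwen2.5vl3b-8380-2.42" # make sure this is the name of the model you want to call
API_KEY = "123"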

# --- Helper: image encoding ---
def encode_image(image_input):
    """Return the Base64 encoding of an image given a local path or a URL."""
    if image_input.startswith(('http://', 'https://')):
        try:
            print(f"Downloading image from URL: {image_input}...")
            response = requests.get(image_input, timeout=10)
            response.raise_for_status()
            return base64.b64encode(response.content).decode('utf-8')
        except Exception as e:
            # Chain the original error so the traceback keeps the root cause
            raise Exception(f"Failed to download image from URL: {e}") from e
    else:
        try:
            if not os.path.exists(image_input):
                raise FileNotFoundError(f"Local file not found: {image_input}")
            with open(image_input, "rb") as image_file:
                return base64.b64encode(image_file.read()).decode('utf-8')
        except Exception as e:
            raise Exception(f"Failed to load local image: {e}") from e

def main():
    # 1. Parse command-line arguments
    parser = argparse.ArgumentParser(description="Genie API Client for LLM and VL models")
    parser.add_argument("--stream", action="store_true", help="Enable streaming output")
    parser.add_argument("--prompt", type=str, default="Hello", help="The text prompt")
    # --image is optional; omitting it selects text-only (LLM) mode
    parser.add_argument("--image", type=str, required=False, help="Path to image or URL (Trigger VL mode)")
    args = parser.parse_args()

    # 2. Initialize the client
    client = OpenAI(base_url="http://" + IP_ADDR + "/v1", api_key=API_KEY)

    # Base extra_body configuration
    extra_body = {
        "size": 4096,
        "temp": 1.5,
        "top_k": 13,
        "top_p": 0.6
    }

    # 3. Build a different request body depending on whether an image was given
    messages_payload = []
   
    if args.image:
        # =========== VL (image + text) mode ===========
        print(f"--- Mode: VL (Visual Language) [Image: {args.image}] ---")
        try:
            base64_image = encode_image(args.image)
        except Exception as e:
            print(f"Error processing image: {e}")
            return

        # extra_body structure specific to the VL model
        custom_messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": {
                    "question": args.prompt,  
                    "image": base64_image
                }
            }
        ]
        # Put the real data into extra_body
        extra_body["messages"] = custom_messages

        # The standard messages field carries a placeholder (a special requirement of Genie VL)
        messages_payload = [{"role": "user", "content": "placeholder"}]
   
    else:
        # =========== LLM (text-only) mode ===========
        print("--- Mode: LLM (Text Only) ---")
        messages_payload = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": args.prompt}
        ]
        # In LLM mode, extra_body does not need a messages field

    # 4. Send the request
    try:
        if args.stream:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                stream=True,
                messages=messages_payload,
                extra_body=extra_body
            )
            print("Response: ", end="")
            for chunk in response:
                if chunk.choices:
                    content = chunk.choices[0].delta.content
                    if content is not None:
                        print(content, end="", flush=True)
            print() # end the line after the streamed output
        else:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=messages_payload,
                extra_body=extra_body
            )
            if response.choices:
                print("Response:", response.choices[0].message.content)

    except Exception as e:
        print(f"\nRequest failed: {e}")

if __name__ == "__main__":
    main()
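
The unusual part of the script above is how a VL request is laid out: the real conversation, including the Base64 image, travels in `extra_body["messages"]`, while the standard `messages` argument carries only a placeholder. The sketch below isolates that layout in a pure function so it can be inspected or unit-tested without a running service. `build_vl_request` is a hypothetical helper name; the field names and parameter values are copied from the script, not from any official Genie documentation.

```python
def build_vl_request(prompt, image_b64, system_prompt="You are a helpful assistant."):
    """Return (messages, extra_body) laid out the way the script above sends a VL request."""
    extra_body = {
        # Parameters as used in the script above
        "size": 4096,
        "temp": 1.5,
        "top_k": 13,
        "top_p": 0.6,
        # The real conversation goes here, with question and image as a dict
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": {"question": prompt, "image": image_b64}},
        ],
    }
    # The standard field only carries a placeholder in VL mode
    messages = [{"role": "user", "content": "placeholder"}]
    return messages, extra_body
```

The two return values map directly onto the `messages=` and `extra_body=` arguments of `client.chat.completions.create`.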

3.3 Running the Script

Run the script from the command line, specifying the image path and, optionally, a prompt:

python vl_client.py --image test.png --prompt "What is in the image?" --stream
python vl_client.py --image "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png" --prompt "What does this logo imply?"
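
With `--stream`, as in the first command, the reply arrives as a sequence of chunks that the script prints one by one. If you would rather collect the full reply as a single string (for logging or post-processing), the same loop can be written as a function. This is a sketch: `collect_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` only imitates the attributes the script reads (`choices[0].delta.content`) so the function can be tried without a running service.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices; skip them
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

In the script, `collect_stream(response)` could replace the printing loop when `stream=True`.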