State‑of‑the‑art large language model useful for a variety of language understanding and generation tasks.
Qwen2.5‑3B‑Instruct is a state‑of‑the‑art multilingual language model with 3 billion parameters that excels at language understanding, generation, coding, and mathematics.
SC8380
Inference speed: 18 TPS (tokens per second)
Input sequence length for Prompt Processor: 128
Context length: 4096
Number of parameters: 3B
Precision: W4A16 (4-bit weights, 16-bit activations)
Number of key-value heads: The model uses Grouped-Query Attention (GQA).
Information about the model parts: The model is split into 5 parts, and weight sharing is enabled across models with different auto-regression lengths (e.g., 128 and 32).
Supported languages: Multiple languages, including English and various European languages that use the Latin alphabet.
Minimum QNN SDK version required: 2.31
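
The figures in the list above can be turned into rough deployment estimates. The following is a minimal back-of-the-envelope sketch, not part of the published model card. It assumes that "3B" means exactly 3e9 parameters, that the weight footprint counts only the 4-bit weights (no quantization scales, KV cache, or activations), that prompts are processed in fixed chunks of 128 tokens, and that 18 TPS holds as a constant decode rate. All function names are hypothetical.

```python
import math

# Values taken from the spec list; everything derived from them is an estimate.
NUM_PARAMS = 3e9         # "3B", assumed to mean exactly 3e9 parameters
WEIGHT_BITS = 4          # W4A16: 4-bit weights
PROMPT_CHUNK = 128       # input sequence length for the prompt processor
CONTEXT_LENGTH = 4096    # total context length in tokens
TOKENS_PER_SECOND = 18   # reported inference speed


def weight_footprint_gb(num_params=NUM_PARAMS, bits=WEIGHT_BITS):
    """Approximate weight storage in GB (weights only, no scales or KV cache)."""
    return num_params * bits / 8 / 1e9


def prompt_chunks(prompt_tokens, chunk=PROMPT_CHUNK):
    """Number of prompt-processor passes, assuming fixed chunks of `chunk` tokens."""
    return math.ceil(prompt_tokens / chunk)


def max_new_tokens(prompt_tokens, context=CONTEXT_LENGTH):
    """Tokens left for generation once the prompt occupies part of the context."""
    return max(context - prompt_tokens, 0)


def generation_seconds(new_tokens, tps=TOKENS_PER_SECOND):
    """Estimated decode time, assuming a constant token rate."""
    return new_tokens / tps


if __name__ == "__main__":
    print(f"Weights: ~{weight_footprint_gb():.1f} GB")
    print(f"Chunks for a 300-token prompt: {prompt_chunks(300)}")
    print(f"Room left after a 1000-token prompt: {max_new_tokens(1000)} tokens")
    print(f"Time to generate 180 tokens: ~{generation_seconds(180):.0f} s")
```

Under these assumptions the weights alone come to about 1.5 GB, and a 300-token prompt needs three prompt-processor passes. Actual memory use and latency on a device will differ, since the estimate ignores the KV cache, runtime overhead, and prompt-processing time.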
Conversation
Content generation
Customer support
Source Model: Apache 2.0
Deployable Model: Apache 2.0