Llama 3.2 3B

Introduction

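Llama 3.2 3B is a lightweight large language model from Meta, designed for resource-constrained mobile devices. Despite its compact parameter count, it performs well across a range of language understanding and generation tasks and is suited to efficient deployment on phones and tablets. The 3B-parameter version is optimized for mobile scenarios and balances small model size against output quality. The prompt processor takes input in chunks of 128 tokens, and the maximum context length is 4096 tokens. Weights use mixed w4 and w8 quantization, which substantially reduces memory footprint and compute power consumption while preserving inference quality. Typical uses on mobile devices include everyday dialogue, content creation, and customer support.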
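
As a rough illustration of why 4-bit and 8-bit weights matter on a phone, the sketch below estimates weight storage only. It assumes a nominal 3e9 parameters (the "3B" in the name, not an exact count) and a hypothetical 10% share of weights held at 8 bits, since this page only says "few layers". Activations, the KV cache, and quantization scales are ignored, so actual on-device memory will be higher.

```python
def weight_memory_gib(num_params: float, w8_fraction: float = 0.0) -> float:
    """Approximate weight storage in GiB for a mixed 4-bit / 8-bit model.

    w8_fraction is the share of parameters stored at 8 bits; the rest
    are stored at 4 bits. Scales and zero-points are not counted.
    """
    if not 0.0 <= w8_fraction <= 1.0:
        raise ValueError("w8_fraction must be between 0 and 1")
    avg_bits_per_param = (1.0 - w8_fraction) * 4 + w8_fraction * 8
    total_bytes = num_params * avg_bits_per_param / 8
    return total_bytes / 2**30


NUM_PARAMS = 3e9       # assumption: nominal parameter count implied by "3B"
W8_FRACTION = 0.10     # hypothetical: the page does not state the real share

fp16_gib = NUM_PARAMS * 16 / 8 / 2**30  # unquantized fp16 baseline
mixed_gib = weight_memory_gib(NUM_PARAMS, W8_FRACTION)

print(f"fp16 weights:    {fp16_gib:.2f} GiB")
print(f"w4 + w8 weights: {mixed_gib:.2f} GiB")
```

Under these assumptions the quantized weights take a bit over a quarter of the fp16 footprint; the exact ratio depends on how many layers are actually kept at 8 bits.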

Specifications and Downloads

Device model	Download link
Snapdragon X Elite-8380	Download
Snapdragon 8 Elite-8750	Download
Snapdragon 8 Elite Gen 5-8850	Download

Technical Details

Input sequence length for Prompt Processor: 128
Maximum context length: 4096
Quantization type: w4 + w8 (few layers) with fp16 activations; w4a16 + w8a16 (few layers) is also supported
Supported languages: English
TTFT: Time To First Token is the time it takes to generate the first response token. It is expressed as a range because it varies with the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
Response Rate: The rate at which response tokens are generated after the first response token.
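
The TTFT range and response rate above can be combined into a rough latency estimate. The sketch below is a simplification: it assumes each 128-token prompt-processor iteration costs the same time, which ignores attention cost growing with context and any fixed startup overhead. The per-iteration time and tokens-per-second values are hypothetical placeholders, not measurements for any device listed on this page.

```python
import math

PROMPT_CHUNK_TOKENS = 128   # from the spec: prompt processor input length
MAX_CONTEXT_TOKENS = 4096   # from the spec: maximum context length


def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Number of 128-token prompt-processor passes needed for a prompt."""
    if not 1 <= prompt_tokens <= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt_tokens must be in [1, MAX_CONTEXT_TOKENS]")
    return math.ceil(prompt_tokens / PROMPT_CHUNK_TOKENS)


def estimate_ttft_range(seconds_per_iteration: float) -> tuple[float, float]:
    """(lower, upper) TTFT bounds, assuming constant cost per iteration."""
    low = seconds_per_iteration * prompt_processor_iterations(PROMPT_CHUNK_TOKENS)
    high = seconds_per_iteration * prompt_processor_iterations(MAX_CONTEXT_TOKENS)
    return low, high


def estimate_total_latency(ttft_seconds: float, response_tokens: int,
                           tokens_per_second: float) -> float:
    """Time to finish a response: the first token arrives at TTFT, and the
    remaining tokens are produced at the steady response rate."""
    if response_tokens < 1 or tokens_per_second <= 0:
        raise ValueError("need at least one token and a positive rate")
    return ttft_seconds + (response_tokens - 1) / tokens_per_second


# Hypothetical inputs for illustration only.
SECONDS_PER_ITERATION = 0.1
TOKENS_PER_SECOND = 20.0

low, high = estimate_ttft_range(SECONDS_PER_ITERATION)
print(f"iterations at full context: {prompt_processor_iterations(MAX_CONTEXT_TOKENS)}")
print(f"TTFT range: {low:.2f} s to {high:.2f} s")
print(f"200-token reply, short prompt: "
      f"{estimate_total_latency(low, 200, TOKENS_PER_SECOND):.2f} s")
```

A full 4096-token prompt needs 32 prompt-processor iterations versus 1 for a short prompt, which is why TTFT is reported as a range instead of a single number.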

Applications

Dialogue
Content Generation
Customer Support

License Information

Source Model: APACHE-2.0
Deployable Model: AI-HUB-MODELS-LICENSE