Llama 3.1 8B

Introduction

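Llama 3.1 8B is a large language model developed by Meta that performs strongly across language understanding and generation tasks. This build uses mixed 4-bit and 8-bit quantization and is optimized for on-device deployment, so it runs smoothly on Snapdragon mobile platforms. The Llama 3.1 series is Meta's openly released family of large language models and has posted leading results on a number of widely used benchmarks. The prompt processor consumes input in chunks of 128 tokens, and the maximum context length is 4096 tokens, which covers most dialogue and content generation scenarios. Quantization substantially reduces the model size and improves inference efficiency, which makes deployment on phones, tablets, and other mobile devices practical and gives users a local, on-device AI experience.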
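To give a rough sense of why quantization matters for on-device deployment, the sketch below estimates the storage needed for the weights alone at different bit widths. The parameter count of 8e9 is an assumed round figure for an "8B" model, and the helper name `weight_size_gib` is hypothetical. The estimate ignores activations, the KV cache, quantization scales, and the few layers kept at 8-bit, so the real deployable size will differ.

```python
def weight_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumption: a round 8 billion parameters for an "8B" model.
NUM_PARAMS = 8e9

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label} weights: ~{weight_size_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage to roughly a quarter, which is what brings an 8B model within reach of phone and tablet memory budgets.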

Specifications and Downloads

Device	Download Link
Snapdragon X Elite-8380	Download
Snapdragon 8 Elite-8750	Download
Snapdragon 8 Elite Gen 5-8850	Download

Technical Details

Input sequence length for Prompt Processor: 128
Maximum context length: 4096
Quantization Type: w4a16 + w8a16 (few layers)
Language(s) supported: English
TTFT: Time To First Token is the time it takes to generate the first response token. It is expressed as a range because it varies with the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt that uses the full context length (4096 tokens).
Response Rate: Rate of response generation after the first response token.
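The relationship between prompt length, TTFT, and response rate can be sketched as follows. This is a minimal illustration, assuming the prompt processor handles the prompt in fixed 128-token iterations as described above. The function names are hypothetical, and the latency formula is a simple approximation that takes a measured TTFT and response rate as inputs. No figures for this model are implied.

```python
import math

PROMPT_CHUNK = 128   # input sequence length for the prompt processor
MAX_CONTEXT = 4096   # maximum context length


def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Number of 128-token prompt-processor passes needed for a prompt."""
    if not 1 <= prompt_tokens <= MAX_CONTEXT:
        raise ValueError(f"prompt must be 1..{MAX_CONTEXT} tokens")
    return math.ceil(prompt_tokens / PROMPT_CHUNK)


def estimated_total_seconds(ttft_s: float, response_tokens: int,
                            tokens_per_s: float) -> float:
    """Approximate end-to-end time: TTFT covers the first response token,
    and the remaining tokens arrive at the response rate."""
    if response_tokens < 1 or tokens_per_s <= 0:
        raise ValueError("need at least one token and a positive rate")
    return ttft_s + (response_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    for n in (1, 128, 129, 4096):
        print(n, "tokens ->", prompt_processor_iterations(n), "iteration(s)")
```

A prompt of up to 128 tokens needs one iteration (the TTFT lower bound), while a full 4096-token prompt needs 32 iterations (the upper bound), which is why TTFT is reported as a range.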

Applications

Dialogue
Content Generation
Customer Support

License Information

Source Model: APACHE-2.0
Deployable Model: AI-HUB-MODELS-LICENSE