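Llama 3.1 8B is a large language model developed by Meta. It belongs to the Llama 3.1 family of open-source models, which has posted leading results on several established benchmarks for language understanding and generation. This version uses mixed 4-bit and 8-bit quantization (w4a16, with w8a16 for a few layers) and is optimized for on-device deployment on Snapdragon mobile platforms. The prompt processor takes 128 input tokens per iteration, and the maximum context length is 4096 tokens, which covers most dialogue and content generation scenarios. Quantization reduces the model size and improves inference efficiency, making it practical to run on phones, tablets, and other mobile devices for a local AI experience.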
Input sequence length for Prompt Processor: 128
Maximum context length: 4096
Quantization Type: w4a16 + w8a16 (few layers)
Language(s) supported: English.
TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
Response Rate: Rate of response generation after the first response token.
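
The TTFT range follows from how many prompt processor iterations a prompt needs. The sketch below only does that arithmetic using the two figures listed above (128 tokens per iteration, 4096 tokens of context). The function name and constants are hypothetical and are not part of any Qualcomm or Meta API, and the sketch assumes a prompt is processed in whole 128-token iterations, with a partial final chunk still costing one iteration.

```python
import math

# Values taken from the spec fields above.
PROMPT_PROCESSOR_SEQ_LEN = 128   # tokens consumed per prompt processor iteration
MAX_CONTEXT_LEN = 4096           # maximum context length in tokens


def prompt_processor_iterations(prompt_tokens: int) -> int:
    """Return how many prompt processor iterations a prompt of this length needs.

    Hypothetical helper for illustration: assumes a partially filled
    final chunk still costs one full iteration.
    """
    if not 1 <= prompt_tokens <= MAX_CONTEXT_LEN:
        raise ValueError(
            f"prompt_tokens must be between 1 and {MAX_CONTEXT_LEN}"
        )
    return math.ceil(prompt_tokens / PROMPT_PROCESSOR_SEQ_LEN)


if __name__ == "__main__":
    for n in (1, 128, 129, 4096):
        print(n, "tokens ->", prompt_processor_iterations(n), "iteration(s)")
```

A prompt of up to 128 tokens needs one iteration (the TTFT lower bound), while a full 4096-token prompt needs 32. How TTFT scales between those two points depends on the device and runtime and is not modeled here.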
Dialogue
Content Generation
Customer Support
Source Model: APACHE-2.0
Deployable Model: AI-HUB-MODELS-LICENSE