Llama-3.1-Minitron 4B: Nvidia's Compressed Language Model

🦙

Llama-3.1-Minitron 4B

Nvidia's compressed version of Llama 3.1 8B, designed for resource-constrained devices

Key Features

  • Only 4 billion parameters
  • Efficient training and deployment
  • Comparable performance to larger models

Technologies Used

  • 🔪 Model Pruning (Depth & Width; sketch below)
  • 🧠 Knowledge Distillation (sketch below)
  • 🔬 NeMo-Aligner Fine-tuning

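The two compression steps above can be pictured with short sketches. First, depth pruning: whole decoder blocks are removed from the larger parent model. The snippet below is a minimal illustration using PyTorch and Hugging Face transformers, assuming access to the gated Llama 3.1 8B checkpoint; NVIDIA's actual recipe scores layer and width importance on calibration data rather than dropping a fixed slice of layers, and also trims width (hidden, attention, and FFN dimensions).

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Assumption: access to the original Llama 3.1 8B checkpoint (a gated repo).
base_id = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(base_id)

# Drop a contiguous run of middle decoder blocks. NVIDIA's recipe instead
# estimates per-layer importance on a calibration set before choosing which
# layers to remove; the fixed indices here are only for illustration.
layers = model.model.layers
keep = list(range(0, 12)) + list(range(20, len(layers)))
model.model.layers = nn.ModuleList(layers[i] for i in keep)
model.config.num_hidden_layers = len(keep)
print(f"pruned from {len(layers)} to {len(keep)} decoder blocks")
```

Second, knowledge distillation: the pruned student is retrained to match the original model's output distribution so that lost capacity is recovered with far less data. Below is a minimal sketch of a logit-distillation loss; the hypothetical `distillation_loss` helper and its weighting are illustrative only and are not NVIDIA's exact objective, which lives in the NeMo training stack.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft teacher-matching KL term with hard-label cross-entropy."""
    # Soften both next-token distributions with a temperature and match them.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(log_p_student, log_p_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    # Standard next-token cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kl + (1.0 - alpha) * ce
```

During training, the teacher (the unpruned 8B model) runs with gradients disabled while only the pruned student is optimized.
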
Training Efficiency

  • 40x less training data than training a similarly sized model from scratch
  • 16% improvement in benchmark performance over training from scratch

Model Capabilities

📝 Instruction Following (see the usage sketch after this list)
🎭 Role Playing
🔍 Retrieval-Augmented Generation
⚙️ Function Calling
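
As a quick way to exercise the capabilities listed above, the sketch below loads the model with Hugging Face transformers and generates a short completion. The repo ID points at the width-pruned base checkpoint published under the NVIDIA org; treat the exact ID, dtype choice, and hardware assumptions as illustrative, and check the model card and license terms (including commercial use) before deploying.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the width-pruned base checkpoint under the NVIDIA org on
# Hugging Face; verify the exact repo ID on the model card before use.
model_id = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~8 GB of weights at 4B parameters in bf16
    device_map="auto",
)

prompt = "Summarize what model pruning does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that instruction-style behaviors such as role playing and function calling generally come from an aligned variant fine-tuned with NeMo-Aligner rather than from the raw base checkpoint loaded above.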

Tags: Llama-3.1-Minitron 4B, Nvidia, language model, compressed language model, resource-constrained devices, model pruning, knowledge distillation, AI model deployment, Hugging Face, commercial use