Official model pages:

- Llama: https://www.llama.com/
- Mistral: https://mistral.ai/models

Notable open models:
| Model | Params | Org | License | Notes |
|---|---|---|---|---|
| LLaMA 2 | 7B / 13B / 70B | Meta | Custom (non-commercial for some uses) | Powerful, widely used. Available via Hugging Face. |
| LLaMA 3 | 8B / 70B | Meta | Custom (open-weight) | Newer and more capable than LLaMA 2. May carry commercial restrictions. |
| Mistral 7B | 7B | Mistral AI | Apache 2.0 | Fast, strong performance. Uses grouped-query and sliding-window attention. |
| Mixtral 8x7B | ~12.9B active | Mistral AI | Apache 2.0 | Sparse MoE; routes each token to 2 of 8 experts. High performance. |
| Phi-2 / Phi-3 | 2.7B / 3.8B+ | Microsoft | MIT | Small but very efficient. Good for on-device use. |
| Gemma | 2B / 7B | Google DeepMind | Gemma Terms of Use (permits commercial use) | Lightweight, efficient, for research & commercial use. |
| Command R / R+ | 35B / 104B | Cohere | CC BY-NC 4.0 (non-commercial weights) | Fine-tuned for RAG (retrieval-augmented generation). |
| OpenHermes 2.5 / 2.5-Mistral | 7B | Teknium | Open (follows base model license) | Popular open instruct models built on Mistral. |
| Yi-34B | 34B | 01.AI | Open (restrictions may apply) | High-performance model from 01.AI (China). |
| Dolphin 2.7 | 7B | Cognitive Computations | Open | Instruction-tuned; strong performance. |
| StableLM Zephyr | 3B / 7B | Stability AI | Open | Chat-tuned and preference-aligned. |
| Pythia | 70M–12B | EleutherAI | Apache 2.0 | Model suite designed for transparency & research. |
| RedPajama | 3B / 7B | Together / Hazy Research | Apache 2.0 | Full-stack open dataset + model project. |
| Falcon | 7B / 40B | TII (UAE) | Apache 2.0 | Early open model; still useful. |
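When weighing the parameter counts above against your hardware, a useful rule of thumb (an estimate, not something the model cards guarantee) is that weight memory is roughly parameter count times bytes per parameter, with activations and the KV cache adding overhead on top:

```python
def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough memory footprint of model weights alone, in GiB.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit.
    Activations and the KV cache add more on top, so this is a lower bound.
    """
    return n_params * bytes_per_param / 1024**3

if __name__ == "__main__":
    for name, params in [("Mistral 7B", 7e9), ("LLaMA 3 70B", 70e9)]:
        for label, bpp in [("fp16", 2), ("4-bit", 0.5)]:
            print(f"{name} @ {label}: ~{weights_gib(params, bpp):.1f} GiB")
```

By this estimate a 7B model needs roughly 13 GiB in fp16 but only about 3.3 GiB at 4-bit, which is why quantized local runners fit these models on consumer hardware.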
Where to Use or Download Them
- Hugging Face – Most models are hosted here with easy-to-use APIs.
- Ollama – Run models locally with one command (supports LLaMA 2/3, Mistral, etc.).
- LM Studio – GUI for running open LLMs locally on Mac/Windows.
- Replicate – Run open models via web APIs.
- GPT4All – Desktop apps and models optimized for offline use.
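As a concrete example of the local-runner route: a running Ollama daemon exposes an HTTP API on port 11434, so you can drive it from any language with no extra dependencies. The sketch below builds a request for its `/api/generate` endpoint using only the Python standard library; the model name `mistral` assumes you have already run `ollama pull mistral`:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    # Ollama's REST API accepts POST /api/generate with a JSON body;
    # stream=False asks for a single complete response instead of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_generate_request("mistral", "Why is the sky blue?")
    print(req.full_url)
    # Actually sending it requires a running Ollama daemon:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
```

The HTTP call itself is left commented out since it needs the daemon running; everything up to that point is plain request construction.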
---
🧠 Tips When Choosing a Model
- Use Mistral 7B or Mixtral 8x7B for high-quality, efficient chat or RAG apps.
- Use Phi-3 or Gemma 2B for on-device or low-resource environments.
- Use LLaMA 3 (8B or 70B) if you want Meta’s best open-weight models for research.