Run Llama 3 8B locally
Llama 3 8B is the model most people should try first. It's small enough to run on almost any laptop from the last five years, capable enough to feel genuinely useful, and free to download. If you're new to local AI, this is the one.
$ hivebear run llama-3-8b

HiveBear will profile your hardware, pick the right quantization for your pool, and fall back to the hive if your machine can't carry it alone.
Hardware: running it alone
Any laptop with 16 GB of RAM, any Apple Silicon Mac (M1 or later), or any PC with a discrete GPU from the last five years can run this model comfortably.
Q4_K_M quantization gets you to ~5 GB on disk and ~6-7 GB of active memory. A Raspberry Pi 5 with 8 GB of RAM will run it, just slowly.
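If you want to sanity-check those numbers yourself, here's a back-of-envelope sketch. The architecture figures (32 layers, 8 KV heads, head dim 128, 8K context) are Llama 3 8B's published config; the ~4.85 bits per weight for Q4_K_M is an approximate average, not an exact spec.

```python
# Rough memory estimate for Llama 3 8B at Q4_K_M.
# bits_per_weight is an approximation of the Q4_K_M average.
params = 8.0e9          # ~8B weights
bits_per_weight = 4.85  # Q4_K_M averages roughly this much per weight

weights_gb = params * bits_per_weight / 8 / 1e9

# KV cache at the full 8K context, fp16, with grouped-query attention:
layers, kv_heads, head_dim, ctx = 32, 8, 128, 8192
kv_gb = 2 * layers * ctx * kv_heads * head_dim * 2 / 1e9  # K and V, 2 bytes each

print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
      f"total ~{weights_gb + kv_gb:.1f} GB before runtime overhead")
```

That lands around 4.9 GB of weights plus about 1.1 GB of KV cache, which is why the live footprint runs a couple of GB above the on-disk size.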
Hardware: running it on the hive
You don't really need the hive for this one — it fits on almost anything alone. Where the hive helps is if you want faster tokens/sec: splitting across two peers can roughly double throughput on weaker hardware.
Llama 3 8B is the model we recommend starting with before attempting the bigger ones on the hive. It's the best way to get a feel for what 'fast enough' vs 'too slow' means on your hardware.
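A toy latency model shows why a two-peer split roughly doubles throughput rather than exactly doubling it. The numbers below are illustrative assumptions, not HiveBear measurements: a machine decoding at 5 tok/s alone spends ~200 ms of compute per token, and we assume one LAN hand-off per token.

```python
# Toy model: split the layers across two equal peers.
compute_ms = 200.0   # per-token compute on one weak machine (assumed)
hop_ms = 10.0        # one activation hand-off per token on a LAN (assumed)

solo_tps = 1000 / compute_ms
# Each peer does ~half the compute, plus one network hop per token.
split_tps = 1000 / (compute_ms / 2 + hop_ms)

print(f"alone: {solo_tps:.1f} tok/s, split: {split_tps:.1f} tok/s "
      f"({split_tps / solo_tps:.2f}x)")
```

The hop cost is why the gain is "roughly" double: the slower your network relative to your compute, the less the split buys you.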
Things to know
Real gotchas from the hive. No sales pitch.
- The base instruct model is trained to refuse some things — if you're hitting refusals on benign tasks, try a community fine-tune.
- Context window on base Llama 3 (not 3.1) is only 8K tokens — fine for chat, short for long documents.
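If you're unsure whether a document will fit in that 8K window, a quick heuristic check helps before you paste it in. The ~4 characters per token rule of thumb below is a rough estimate for English prose, not a real tokenizer, and `fits` is a hypothetical helper, not a HiveBear API — leave headroom for your prompt and the reply.

```python
# Rough context-fit check for Llama 3's 8K window.
CTX = 8192

def rough_token_count(text: str) -> int:
    return len(text) // 4  # ~4 chars per token for English prose (heuristic)

def fits(text: str, reply_budget: int = 1024) -> bool:
    # Reserve some of the window for the model's answer.
    return rough_token_count(text) + reply_budget <= CTX

doc = "word " * 6000  # ~30,000 characters of input
print(rough_token_count(doc), fits(doc))
```

A 30,000-character document already estimates at ~7,500 tokens, which leaves no room for a reply — the kind of case where you'd want Llama 3.1's longer context instead.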
What Llama 3 8B is great at
Starter local LLM. Chat, quick questions, coding help, summarization. Fast enough on modern hardware to feel interactive.
If this isn't the one, try these instead
- Mistral 7B — similar size, different training data, often better at non-English tasks.
- Phi-3 Mini — even smaller (~4B), stronger on reasoning than its size suggests.
- Qwen 2.5 7B — strong all-rounder, especially good at multilingual and code.
Give it a run on your hive
Free, open-source, no sign-up. The hive helps when your machine can't carry it alone.
