Run Llama 3 8B locally
Llama 3 8B is the model most people should try first. It's small enough to run on almost any laptop from the last five years, capable enough to feel genuinely useful, and free to download. If you're new to local AI, this is the one.
$ hivebear run llama-3-8b

HiveBear will profile your hardware, pick the right quantization for your pool, and fall back to the hive if your machine can't carry it alone.
Hardware: running it alone
Any laptop with 16 GB of RAM, any Apple Silicon Mac (M1 or later), or any PC with a discrete GPU from the last five years can run this model comfortably.
Q4_K_M quantization gets you to ~5 GB on disk and ~6-7 GB of active memory. A Raspberry Pi 5 with 8 GB of RAM will run it, just slowly.
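If you want to sanity-check those numbers yourself, here's a back-of-envelope sketch. The architecture figures (32 layers, 8 KV heads, head dim 128, 8K context) are Llama 3 8B's published config; the ~4.85 bits per weight for Q4_K_M is an approximate average, not an exact spec.

```python
# Rough memory estimate for Llama 3 8B at Q4_K_M.
# bits_per_weight is an approximation of the Q4_K_M average.
params = 8.0e9          # ~8B weights
bits_per_weight = 4.85  # Q4_K_M averages roughly this much per weight

weights_gb = params * bits_per_weight / 8 / 1e9

# KV cache at the full 8K context, fp16, with grouped-query attention:
layers, kv_heads, head_dim, ctx = 32, 8, 128, 8192
kv_gb = 2 * layers * ctx * kv_heads * head_dim * 2 / 1e9  # K and V, 2 bytes each

print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
      f"total ~{weights_gb + kv_gb:.1f} GB before runtime overhead")
```

That lands around 4.9 GB of weights plus about 1.1 GB of KV cache, which is why the live footprint runs a couple of GB above the on-disk size.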
Hardware: running it on the hive
You don't really need the hive for this one — it fits on almost anything alone. Where the hive helps is if you want faster tokens/sec: splitting across two peers can roughly double throughput on weaker hardware.
Llama 3 8B is the model we recommend starting with before attempting the bigger ones on the hive. It's the best way to get a feel for what 'fast enough' vs 'too slow' means on your hardware.
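A toy latency model shows why a two-peer split roughly doubles throughput rather than exactly doubling it. The numbers below are illustrative assumptions, not HiveBear measurements: a machine decoding at 5 tok/s alone spends ~200 ms of compute per token, and we assume one LAN hand-off per token.

```python
# Toy model: split the layers across two equal peers.
compute_ms = 200.0   # per-token compute on one weak machine (assumed)
hop_ms = 10.0        # one activation hand-off per token on a LAN (assumed)

solo_tps = 1000 / compute_ms
# Each peer does ~half the compute, plus one network hop per token.
split_tps = 1000 / (compute_ms / 2 + hop_ms)

print(f"alone: {solo_tps:.1f} tok/s, split: {split_tps:.1f} tok/s "
      f"({split_tps / solo_tps:.2f}x)")
```

The hop cost is why the gain is "roughly" double: the slower your network relative to your compute, the less the split buys you.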
Things to know
Real gotchas from the hive. No sales pitch.
- The base instruct model is trained to refuse some things — if you're hitting refusals on benign tasks, try a community fine-tune.
- Context window on base Llama 3 (not 3.1) is only 8K tokens — fine for chat, short for long documents.
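If you're unsure whether a document will fit in that 8K window, a quick heuristic check helps before you paste it in. The ~4 characters per token rule of thumb below is a rough estimate for English prose, not a real tokenizer, and `fits` is a hypothetical helper, not a HiveBear API — leave headroom for your prompt and the reply.

```python
# Rough context-fit check for Llama 3's 8K window.
CTX = 8192

def rough_token_count(text: str) -> int:
    return len(text) // 4  # ~4 chars per token for English prose (heuristic)

def fits(text: str, reply_budget: int = 1024) -> bool:
    # Reserve some of the window for the model's answer.
    return rough_token_count(text) + reply_budget <= CTX

doc = "word " * 6000  # ~30,000 characters of input
print(rough_token_count(doc), fits(doc))
```

A 30,000-character document already estimates at ~7,500 tokens, which leaves no room for a reply — the kind of case where you'd want Llama 3.1's longer context instead.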
What Llama 3 8B is great at
Starter local LLM. Chat, quick questions, coding help, summarization. Fast enough on modern hardware to feel interactive.
If this isn't the one, try these instead
- Mistral 7B — similar size, different training data, often better at non-English tasks.
- Phi-3 Mini — even smaller (~4B), stronger on reasoning than its size suggests.
- Qwen 2.5 7B — strong all-rounder, especially good at multilingual and code.
Give it a run on your hive
Free, open-source, no sign-up. The hive helps when your machine can't carry it alone.
