Run DeepSeek locally

DeepSeek's open-weight reasoning models run on your own hardware, no API key required. Here is how to pull and run DeepSeek-R1 with Ollama, pick a size that fits your machine, and query it over HTTP.

Published by Mark van Eijk on July 1, 2026
Updated on July 1, 2026 · 2 minute read

  1. Install Ollama
  2. Pull and run DeepSeek-R1
  3. Pick a size that fits
  4. Query it over HTTP
  5. Run it for a team

DeepSeek releases its models as open weights, which means you can download DeepSeek-R1 and run it entirely on your own hardware. No account, no API key, no per-token bill, and your prompts never leave the machine. The easiest way to do it is with Ollama, which handles the download and serves the model over HTTP.

Install Ollama

If you don't already have it, install Ollama first:

curl -fsSL https://ollama.com/install.sh | sh

On macOS, brew install ollama or the desktop app works too. For the full setup, see "run a local LLM with Ollama".
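
To confirm the install worked, you can ask the local server for its version. This is a minimal sketch, assuming Ollama is listening on its default port 11434; the function name ollama_version is ours, not part of Ollama.

```python
import json
import urllib.request


def ollama_version(base_url="http://localhost:11434", timeout=2.0):
    """Return the Ollama server's version string, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a response body that is not valid JSON.
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version} is running" if version else "Ollama is not reachable")
```

If it reports that Ollama is not reachable, the server is probably not running yet.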

Pull and run DeepSeek-R1

Run the model and Ollama downloads the weights on first use:

ollama run deepseek-r1

That default tag is a distilled model that fits most machines. DeepSeek-R1 is a reasoning model, so it thinks out loud inside <think> tags before giving its final answer. Ask it something with a few steps and you'll see the working:

>>> If a server handles 200 requests per second and each takes 40ms, how many run concurrently?

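To check the model's working on that question, the expected answer follows from Little's law: requests in flight equal the arrival rate multiplied by the time each request takes. A quick sketch, assuming steady load (the function name is ours):

```python
def concurrent_requests(rate_per_second, latency_ms):
    """Little's law: average requests in flight = arrival rate x time per request."""
    return rate_per_second * latency_ms / 1000


print(concurrent_requests(200, 40))  # 8.0
```

So the model's final answer should land on 8 concurrent requests.
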
Pick a size that fits

DeepSeek-R1 is available in several distilled sizes. Choose the largest that fits in memory, since a model has to fit in RAM or VRAM to run well:

ollama run deepseek-r1:1.5b   # laptop / CPU friendly
ollama run deepseek-r1:7b     # modest GPU
ollama run deepseek-r1:14b    # mid-range GPU
ollama run deepseek-r1:32b    # high-end GPU
ollama run deepseek-r1:70b    # 40 GB+ VRAM
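
If you'd rather estimate than guess, here is a rough sketch that picks the largest of the tags above for a given amount of memory. The two constants are assumptions, not measured values: roughly 4-bit quantized weights (half a byte per parameter) and about 20% extra for context and runtime buffers. Treat the result as a starting point and check the real download size.

```python
# Parameter counts (in billions) for the tags listed above.
TAGS = {
    "deepseek-r1:1.5b": 1.5,
    "deepseek-r1:7b": 7,
    "deepseek-r1:14b": 14,
    "deepseek-r1:32b": 32,
    "deepseek-r1:70b": 70,
}

BYTES_PER_PARAM = 0.5  # assumption: roughly 4-bit quantized weights
OVERHEAD = 1.2         # assumption: ~20% extra for context and runtime buffers


def estimated_gb(params_billions):
    """Rough memory footprint in GB for a model of the given size."""
    return params_billions * BYTES_PER_PARAM * OVERHEAD


def largest_tag_that_fits(memory_gb):
    """Return the biggest tag whose estimate fits in memory_gb, or None."""
    fitting = [tag for tag, size in TAGS.items() if estimated_gb(size) <= memory_gb]
    return max(fitting, key=TAGS.get) if fitting else None


print(largest_tag_that_fits(16))  # deepseek-r1:14b
```

With these assumptions the 70b tag comes out at about 42 GB, which lines up with the 40 GB+ note above.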

The distilled versions are smaller Qwen and Llama models fine-tuned on the full R1's output, so you keep much of the reasoning behaviour at a fraction of the size. Check what you've downloaded and remove sizes you're done with:

ollama list
ollama rm deepseek-r1:1.5b
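
The same listing is available over HTTP from the /api/tags endpoint, which is handy in scripts. A small sketch, assuming the default port and a response that carries a "models" list with "name" and "size" (in bytes) fields; the helper names are ours:

```python
import json
import urllib.request


def summarize_models(payload):
    """Turn an /api/tags response into (name, size in GB) pairs, largest first."""
    models = payload.get("models", [])
    rows = [(m["name"], round(m["size"] / 1e9, 1)) for m in models]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))
```

Call fetch_models() while Ollama is running to see which sizes are taking up the most disk space.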

Query it over HTTP

Once it's pulled, the model is available on Ollama's API on port 11434, so any script or app can call it:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "List three causes of a slow database query.",
  "stream": false
}'
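
The same request from Python needs only the standard library. This is a sketch rather than a full client: it assumes the default port and a non-streaming call, and the names build_request and generate are ours.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1"):
    """Build the same POST request as the curl example."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="deepseek-r1", timeout=300):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Reasoning models can take a while to answer, hence the generous timeout. Call generate("List three causes of a slow database query.") once the model is pulled.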

The response includes the reasoning trace as well as the answer. If you only want the final answer, strip everything up to and including the closing </think> tag before you display it.
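
A minimal way to do that stripping, assuming the reasoning arrives inline in the response text as described above (the function name is ours):

```python
def strip_reasoning(text):
    """Drop everything up to and including the first closing </think> tag."""
    _, found, answer = text.partition("</think>")
    # If the tag is missing, there is no trace to remove, so keep the text as-is.
    return answer.strip() if found else text.strip()


raw = "<think>\n200 * 0.04 = 8\n</think>\n\nAbout 8 requests run concurrently."
print(strip_reasoning(raw))  # About 8 requests run concurrently.
```

Text without a closing tag passes through unchanged, so it is safe to run on every response.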

Run it for a team

For shared use, put DeepSeek on a server with a GPU rather than a laptop. The steps are the same as for any model: bind Ollama to all interfaces, then protect it with a reverse proxy and a token. The full hardening walkthrough (sizing the box, Nginx in front, TLS, firewalling the port) is in "self-host an LLM on your own server".
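
On the client side, a shared setup mostly means a different URL and an extra header. The sketch below assumes the reverse proxy checks a bearer token in the Authorization header, which is one common choice rather than something Ollama itself enforces. The URL and token are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: replace with your own proxy URL and token.
BASE_URL = "https://llm.example.com"
API_TOKEN = "change-me"


def build_team_request(prompt, model="deepseek-r1", base_url=BASE_URL, token=API_TOKEN):
    """Build a generate request that goes through the authenticated proxy."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: proxy expects a bearer token
        },
        method="POST",
    )
```

Pass the result to urllib.request.urlopen to send it. In practice, read the token from an environment variable instead of hard-coding it.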

Frequently asked questions

Which DeepSeek model size should I run?
Pick the largest that fits comfortably in memory. The distilled 1.5B and 7B variants run on a laptop or a modest GPU, 14B and 32B want a dedicated GPU, and the 70B distill needs 40 GB+ of VRAM. All of these are smaller models fine-tuned to mimic the full R1's reasoning, and they are the right starting point for most people.

What is the difference between DeepSeek-R1 and a normal chat model?
R1 is a reasoning model: it produces an explicit chain of thought (wrapped in <think> tags) before its final answer. That makes it stronger at maths, logic, and multi-step problems, but slower and more verbose than a standard chat model. For quick factual replies a smaller general model is often a better fit.

Can I run DeepSeek without a GPU?
Yes, the smaller distilled models (1.5B and 7B) run on CPU, just slowly. That is fine for occasional use but frustrating for anything interactive. A GPU, or an Apple Silicon Mac with unified memory, makes a large difference.

Is it safe to run DeepSeek locally with private data?
Running the open weights locally means your prompts never leave your machine, which is the main privacy advantage over a hosted API. The model weights themselves run offline once downloaded. As with any self-hosted service, the security work is in how you expose the API, not in the model.