We live in the age of API wrappers. Most “AI apps” are just thin veneers over OpenAI or Anthropic, burning credits with every keystroke.
I wanted something different. I wanted autonomy.
The Problem: The API Tax
Building complex software with agents is expensive. A simple refactor session can burn 1M tokens. When you’re experimenting, that “meter anxiety” kills creativity.
I needed a brain that was smart enough to code, but cheap enough to run in a loop.
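To put numbers on that meter anxiety, here is a rough back-of-the-envelope comparison. The per-token rates below are illustrative round numbers I picked for the sketch, not any provider's actual price sheet:

```python
# Rough cost sketch: a heavy agent loop on a metered API vs. $0 locally.
# The $/1M-token rates are illustrative assumptions, not real pricing.
INPUT_RATE = 3.00    # assumed $ per 1M input tokens
OUTPUT_RATE = 15.00  # assumed $ per 1M output tokens

def session_cost(input_tokens, output_tokens):
    """Dollar cost of one agent session at the assumed rates."""
    return (input_tokens / 1e6) * INPUT_RATE + (output_tokens / 1e6) * OUTPUT_RATE

# One refactor session burning ~1M tokens (say 800k in, 200k out):
one = session_cost(800_000, 200_000)
print(f"one session: ${one:.2f}, 50 sessions: ${one * 50:.2f}")
```

Run that loop fifty times in an afternoon of experimenting and the bill is real money; locally it rounds to electricity.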
The Solution: Silicon at Home
I have a gaming rig gathering dust during the day: an NVIDIA RTX 3060 with 12GB of VRAM.
Enter Ollama and Qwen 2.5 Coder 32B. One caveat up front: at ~4-bit quantization the 32B weights come to roughly 18–20 GB, which doesn’t fit entirely in 12 GB of VRAM, so Ollama splits the layers between the GPU and system RAM. Slower than a full-GPU fit, but it runs.
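As a sanity check on how a 32B model relates to that card, a back-of-the-envelope memory estimate (the ~4.5 bits/weight figure is an assumption typical of common 4-bit quant formats, and the sketch ignores KV cache and runtime overhead):

```python
# Back-of-envelope sizing: how big are a model's weights at a given quantization?
def model_weight_gb(params_b, bits_per_weight):
    """Approximate weight memory in GB for params_b billion parameters.
    Ignores KV cache and runtime overhead, so treat it as a lower bound."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

q4 = model_weight_gb(32, 4.5)  # ~4.5 bits/weight, typical of 4-bit quants
print(f"32B @ ~4.5-bit: ~{q4:.0f} GB of weights")
```

That lands around 18 GB, which is why the 12 GB card ends up sharing layers with system RAM.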
By running the model locally, I get:
- Zero Cost: No per-token billing.
- Privacy: My code never leaves my network.
- Speed: Local inference is surprisingly snappy.
The Stack: OpenClaw + Tailscale
The magic glue is OpenClaw. It runs on my laptop (or VPS) but “thinks” using my gaming PC via a Tailscale tunnel.
# On Gaming PC — bind to all interfaces so the Tailscale IP is reachable
# (ollama serve listens on 127.0.0.1 by default)
ollama pull qwen2.5-coder:32b
OLLAMA_HOST=0.0.0.0 ollama serve

# On Laptop
export OLLAMA_BASE_URL=http://100.x.y.z:11434
openclaw run "Build me a website"
The agent on my laptop sends the prompt over the encrypted mesh network to my desktop. The GPU spins up, writes the code, and sends it back. OpenClaw then takes that code and deploys it to my VPS.
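OpenClaw’s internals aside, the laptop side of that round trip is just an HTTP POST to Ollama’s `/api/generate` endpoint. Here is a minimal stdlib-only sketch of that hop; the Tailscale address is a placeholder, and `build_request`/`generate` are names I made up for illustration:

```python
import json
import os
import urllib.request

# Falls back to localhost if the env var from the snippet above isn't set.
OLLAMA_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

def build_request(prompt, model="qwen2.5-coder:32b"):
    """Payload for Ollama's /api/generate; stream=False returns one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    """POST the prompt over the (Tailscale-encrypted) wire, return the completion."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a reachable Ollama server):
# print(generate("Write a haiku about GPUs"))
```

Everything above the HTTP layer — planning, file edits, deployment — is the agent’s job; the GPU box only ever sees prompts and returns text.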
The Result
You’re looking at it.
This entire site—the Astro configuration, the component structure, the CSS—was generated by an agent running on consumer hardware, deployed to a $5 VPS, all while I watched.
The future isn’t just in the cloud. It’s on your desk.