How I Stopped AI Memory Amnesia Without a Vector DB

Most agents feel smart for 30 minutes, then forget the plot.

You explain your workflow on Monday, and by Wednesday the agent asks the same setup questions again. Not because it’s dumb, but because session context is ephemeral.

I wanted a memory system that does three things at once:

  1. Keeps full history
  2. Keeps startup context cheap
  3. Lets the agent know what it already knows

The Core Problem

Most memory approaches solve retrieval, not awareness.

  • Flat MEMORY.md grows too fast and becomes a manual pruning chore.
  • RAG-only memory works when you know what to query, but fails for “Do I already have context on this?”
  • Huge context windows increase cost and reduce signal quality over time.

The missing piece was an always-on, low-cost awareness layer.

The Model: Compaction Tree

I moved to a five-level memory hierarchy:

  • Raw logs (full fidelity)
  • Daily summaries
  • Weekly summaries
  • Monthly summaries
  • ROOT.md (global awareness index)

At session start, the agent reads ROOT.md first. That gives a fast map of active context, recent decisions, recurring patterns, and open threads.

If it needs more detail, it drills down through monthly, weekly, and daily nodes, reaching for raw logs only as a last resort.

Why This Works

It separates awareness from retrieval.

  • Awareness: “I already have context on contractor web design and deployment workflows.”
  • Retrieval: “Find all notes about DNS issues from last week.”

You can still use semantic search, but now search is a second layer—not the only memory strategy.

Practical Benefits

  • Lower token burn at session start
  • Fewer repeated explanations
  • Better continuity across days
  • Clear carry-forward tasks and decisions
  • Human-editable local files (no lock-in)

Local-First by Default

This approach runs on plain markdown files and scripts. No hosted memory API, no extra billing surface, no dependency on external state.

That also makes it easy to version, inspect, and repair.

Final Take

If your agent keeps forgetting context, don’t just add more retrieval.

Give it a memory shape.

A compacted hierarchy gives you the best of both worlds: persistent history and fast operational awareness.

And that’s the difference between an assistant that chats and one that compounds.