AI Agents - Memory Schema

Introduction

In the past 6 weeks, I've been using Hermes Agent as my main agent for coding, inspection and research. Here's my take after diving into most parts of it.

For the first week, I kept testing it the way I test every agent: I told it something, closed the session and open a new one, and tried to retrieve it back again by asking again. Most agents do not really have memory. They have chat history, a big context window, and sometimes a vector store after the fact. It looks like memory in a demo and falls apart the moment work spans more than one session.

Hermes did not forget. That sounds like a small thing. It is not. After six weeks of using it as my daily driver, the part that changed how I work was not any single feature. It was that I stopped re-explaining myself. I had spent many months training myself to re-establish context at the start of every conversation, like dumping a README file before closing a session and re-reading it again the next day. Or, trying to write AI-aided skills to make sure a workflow is always remembered and investing many hours fine-tuning it to my liking. With Hermes I slowly dropped those habits, and nothing broke. Which made me impressed, and also interested to know more about their implementation.

So this is less a tour of an architecture and more a report on what living with one taught me. The architecture matters, and I will get to it, because the reasons it works are worth understanding. But the headline is the day-to-day: the agent remembers the right things, can dig up the rest when asked, and turns workflows I repeat into things it knows how to do. Let me start with where that actually showed up.

The win I did not expect: I stopped repeating myself

The thing that wears you down with most agents is not that they are wrong. It is that they are amnesiac. You explain your stack, your conventions, the way you like commands shown before they run, and then you do it again tomorrow, and again next week. Every conversation starts from zero.

By the second week with Hermes, that tax was mostly gone. I had told it, once, to show me the command before running it. It remembered. I mentioned the staging host quirk, the port I always forget, the fact that I prefer concise output. Those stuck. None of them needed to be repeated, and none of them needed a vector database to retrieve, because they were not buried in a transcript somewhere. They lived in a small, always-loaded layer, that I even didn't build or had to mention.

That is the first lesson, and it is the one the architecture is built around: giving an agent more text doesn't tell it what matters. Pouring a longer history into the context window does not help the agent know whether a line is a standing preference, a one-off task state, a correction, or evidence it might need later. It also gives it no way to stop carrying old mistakes forward. Memory design has to answer those questions before you ever pick a database.

Different memories, different rules

The reason a single bucket fails is that the things we lump together as "memory" want to be treated differently.

Memory type	Example	How the agent should use it
Semantic memory	"This cluster runs on K3s, and deploys go through Helm."	Keep it compact and available often.
Episodic memory	The conversation where we decided to move config out of Docker images and into a ConfigMap.	Search it when someone needs the full history.
Procedural memory	The Ansible playbook for provisioning a new worker node.	Load it when the task matches.
Task state	A `kubectl rollout` that is currently in progress.	Scope it tightly and expire it when the task ends.

LangChain draws a similar line between thread-scoped state and long-term memory, with semantic, episodic, and procedural forms. Its memory overview is worth reading because it makes scope explicit. A memory that belongs to one conversation should not quietly become a fact for every future user.

This is easy to miss when everything sits in one vector store. Search can find related text, but it cannot tell you whether that text is still true or whether the agent is even allowed to act on it. I felt the difference in practice. The preference "show the command first" never came back to me as a fuzzy nearest-neighbor match. It was just there, every turn, because it had earned a place in the hot layer instead of being one more document to retrieve.

Why more than one layer

An agent needs a small hot layer for facts that deserve a spot in every prompt. It needs a searchable evidence layer for the conversations and tool results behind those facts. It needs a procedural layer for methods that are too big to carry everywhere but worth keeping. Sometimes it needs a provider that can do semantic retrieval, a graph, or a richer model of the user.

The principle is the same one I kept bumping into: do not make every stored thing a permanent prompt tax.

What Hermes keeps in the hot layer

Hermes keeps its built-in layer deliberately small. It persists MEMORY.md for agent and environment notes, plus USER.md for preferences and communication style. The documented defaults cap them at 2,200 and 1,375 characters, and Hermes loads both into the system prompt at the start of a session. Hermes persistent memory

Those caps annoyed me at first. They turned out to be the point. A preference like "show the command before running it" earns its place on many turns. A long debugging transcript does not. By keeping the always-loaded layer tight, Hermes forces the agent to decide what is worth carrying, which is exactly the decision that keeps the hot layer useful instead of bloated. The transcripts go somewhere else: Hermes stores sessions in SQLite with full message history and FTS5 full-text search, and the agent retrieves the original discussion when it actually needs the detail. Hermes sessions

That gives a clean split between what is always present and what is one query away.

Need	Hermes layer
A stable fact needed often	Built-in memory or user profile
The reason behind a decision	Searchable session history
A workflow that has proved useful	A skill loaded on demand
Deeper semantic or graph retrieval	An optional external provider

The second win: workflows that became skills

If the first win was the agent remembering my preferences, the second one took longer to notice and ended up mattering more. Hermes does not only read memory. It updates it, and some of those updates are not facts at all. They are procedures.

Here is how it played out. There was a recovery routine I kept walking the agent through, the same sequence of checks every time a particular service misbehaved. The third or fourth time, instead of treating it as a fresh problem, the agent had turned the routine into a skill. The next time the service fell over, it just ran the method. I had not sat down to author anything. The repetition itself produced the procedure.

Hermes treats skills as procedural memory. The agent can create or update a skill after it finds a useful workflow, hits a repeated failure mode, or gets a correction. Skills use progressive disclosure, so they do not sit in every prompt. Hermes skills

This is the part that made Hermes feel more complete than a memory store. It can hold "the staging host uses port 2222" as a short fact while keeping a longer recovery method as a skill. A fact tells the agent something. A skill tells it how to do something. Plenty of agent products promise that learning loop. Hermes is the first one where I watched it happen without me running the loop by hand.

Under the hood, the mechanics behind both updates are worth knowing. During a task, the agent can add, replace, or remove entries in its built-in memory. MEMORY.md takes environment facts, project conventions, and things learned while working. USER.md takes preferences and communication style. Exact duplicates are rejected. When an entry would blow past the configured limit, Hermes rejects the write instead of silently dropping older material, so the agent has to consolidate or remove something first. Hermes persistent memory A background review runs after a conversation and can save a durable correction, update an existing memory, or promote a repeated workflow into a skill.

The friction: it remembers things I meant to forget

I want to be honest about the part that wore on me, because it is the kind of thing a feature list never mentions.

The same instinct that makes Hermes good at keeping my standing preferences sometimes fires on the wrong thing. I would tell it something that was true for one task only, a flag I wanted just this once, a path that mattered for a single debugging session, and later find that it had filed that away as a durable preference. It read a one-off instruction as a standing rule. Nothing dramatic happens when it gets this wrong, but the next session starts carrying a preference I never meant to keep, and over a few weeks those add up.

So I got into the habit of opening MEMORY.md and USER.md every now and then and pruning the entries that should never have been promoted. It is not a long job, but it is a recurring one, and it is the clearest reminder that admission is the hard part of memory. Deciding what is worth keeping is exactly the judgment call an agent will sometimes miss, and the cost of a miss lands later, quietly, in a future session.

Hermes does give you a lever for this. Turn on memory.write_approval and foreground and background writes get staged for you to confirm before they land, and skills have their own approval gate with a diff review on top. Hermes memory write approval Hermes skills Approval catches the over-eager writes at the door instead of leaving them for you to find later, at the cost of a confirmation step on every write. That tradeoff, between reviewing up front and cleaning up after, is one you should expect to tune rather than set once.

There is a related quirk worth flagging, less friction than a thing to understand. Hermes writes an update to storage right away, but it does not rewrite the built-in memory block in the middle of the current session. The current prompt keeps its frozen snapshot. A new session, or another prompt rebuild, picks up the new version. Hermes documents this as a way to keep prompt caching stable. Hermes prompt assembly

That sounds limiting until you sit with the alternative. If the agent rewrote its system prompt after every small write, cost and behavior would get harder to predict. In the current conversation the agent already knows what it just learned. In the next session it gets the compact version that survived curation. In practice I rarely noticed the lag, and once I knew the reason it stopped bothering me at all.

Why keep session search separate from memory

An agent should not have to choose between forgetting everything and repeating everything. Hermes keeps full sessions as evidence and lets the agent search them when needed. I leaned on this more than I expected. "Why did we change this setting?" is a question I ask a lot, and the answer was usually in a tool result from two weeks earlier, not in a five-line summary. The summary tells you the current state. The transcript tells you how you got there.

The separation also paid off when work moved between surfaces. Hermes documents session persistence and handoff across its supported interfaces, so a terminal conversation can continue in a messaging channel without turning every past message into permanent system context. Hermes sessions

Where Hermes is strong, and where it is not

Hermes did not invent persistent memory, semantic search, or agent-managed procedures. LangGraph gives teams low-level state and storage primitives. If you need custom transactions, a particular database topology, or application-owned state machines, a framework is probably the better starting point, and you should reach for one.

Hermes is strong when you want the whole memory loop already wired together: local facts, searchable sessions, cross-surface continuity, optional provider depth, skills, and governance over updates. You do not have to assemble every part before the agent can remember a preference, find an old decision, or hold on to a workflow that worked. After six weeks, that "already wired together" quality is what I would actually pay for. I have built memory stacks by hand before. Not having to was the difference between an agent I configure and an agent I use.

What to take from this

Do not open by asking whether to use a vector store, SQLite, or a graph database. Open by deciding what the agent is allowed to remember, who owns it, where the evidence lives, when a fact expires, and how someone corrects it. The storage choice falls out of those answers. It does not lead them.

Hermes is interesting because it answers those questions with separate layers instead of one large memory feature, and using it day to day is what made the value concrete for me. I stopped re-explaining myself. I watched a repeated chore turn into a skill on its own. I traded some approval friction for trust as I learned what the agent chose to keep. The architecture is the reason all of that holds together, but the experience is what convinced me.

AI Agents - Memory Schema

Introduction

The win I did not expect: I stopped repeating myself

Different memories, different rules

Why more than one layer

What Hermes keeps in the hot layer

The second win: workflows that became skills

The friction: it remembers things I meant to forget

Why keep session search separate from memory

Where Hermes is strong, and where it is not

What to take from this

Further reading

Comments

More from this blog

Observability Is a Method, Not a Stack

RTS – AST Based Test Selection: A Deep Dive into Building Production-Grade Regression Test Selection

Command Palette

Introduction

The win I did not expect: I stopped repeating myself

Different memories, different rules

Why more than one layer

What Hermes keeps in the hot layer

The second win: workflows that became skills

The friction: it remembers things I meant to forget

Why keep session search separate from memory

Where Hermes is strong, and where it is not

What to take from this

Further reading

Comments

More from this blog