Building AI Apps with Persistent Memory: A Practical SDK Guide
Learn how to add persistent memory to AI apps built on the Anthropic or OpenAI API — architecture, what to store, and a TypeScript SDK walkthrough.
MemNexus Team
Engineering
You've built the AI feature. It calls the API, streams a response, formats it nicely. Users are impressed — until the second session. Then they have to re-explain everything. Their stack. Their preferences. The context you spent the first session establishing. Gone.
This isn't a UX problem you can paper over with a better onboarding flow or a longer system prompt. It's structural. The LLM your app is built on is stateless by design, and the memory features inside ChatGPT or Claude Desktop don't cross the wall into the API. When you call the Anthropic API or the OpenAI API, you get a blank model. Every time.
The solution is to build the memory layer yourself — and it's more straightforward than it sounds.
What the architecture looks like without memory
Most AI features follow the same basic loop:
- Build a system prompt with static instructions
- Append the user's message
- Call the model
- Return the response
- Repeat next session from scratch
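In code, the stateless loop is little more than prompt assembly and a model call. A minimal sketch — `callModel` is a placeholder for whichever API client you use (Anthropic, OpenAI, or otherwise):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assemble the request: static instructions plus the user's message.
// Nothing here varies by user — it's the same system prompt every session.
function buildMessages(baseInstructions: string, userMessage: string): ChatMessage[] {
  return [
    { role: "system", content: baseInstructions },
    { role: "user", content: userMessage },
  ];
}

// `callModel` stands in for your real API client.
async function handleTurn(
  callModel: (messages: ChatMessage[]) => Promise<string>,
  userMessage: string,
): Promise<string> {
  const messages = buildMessages("You are a helpful assistant.", userMessage);
  return callModel(messages); // next session starts from scratch
}
```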
The system prompt might be excellent. But it's static. It doesn't know this specific user prefers terse responses. It doesn't know they're building on a PostgreSQL backend with a Zod validation layer. It doesn't know they already tried the approach you're about to suggest and found a gotcha.
Every session, the model meets your user for the first time.
What the architecture looks like with memory
Add a memory layer and the loop changes:
- At session start: query the memory store for context relevant to this user and topic
- Inject retrieved memories into the system prompt
- Append the user's message
- Call the model
- Return the response
- At session end: extract anything worth saving and write it to the memory store
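The steps above can be sketched as a wrapper around the model call. The function names here are placeholders for whatever store and client you use — only the shape of the loop matters:

```typescript
type SessionDeps = {
  searchMemories: (userId: string, query: string) => Promise<string[]>;
  callModel: (systemPrompt: string, userMessage: string) => Promise<string>;
  saveMemory: (userId: string, content: string) => Promise<void>;
};

// One session turn with the memory layer wrapped around the model call.
async function sessionWithMemory(
  deps: SessionDeps,
  userId: string,
  userMessage: string,
  extract: (reply: string) => string | null, // decides what's worth saving
): Promise<string> {
  // 1. Retrieve context relevant to this user and topic.
  const memories = await deps.searchMemories(userId, userMessage);

  // 2. Inject retrieved memories into the system prompt.
  const systemPrompt = [
    "You are a helpful assistant.",
    memories.length ? `Context about this user:\n${memories.join("\n")}` : "",
  ].join("\n").trim();

  // 3-4. Call the model and capture the response.
  const reply = await deps.callModel(systemPrompt, userMessage);

  // 5. Save anything worth carrying into the next session.
  const worthSaving = extract(reply);
  if (worthSaving) await deps.saveMemory(userId, worthSaving);

  return reply;
}
```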
The model still resets between sessions. But the knowledge doesn't. Your app carries forward what matters and surfaces it when it's relevant.
This is the pattern behind consumer AI memory features — it's just not exposed to you when you're on the API side. You need to build it yourself, or use a library that handles the infrastructure. If you're evaluating purpose-built memory infrastructure, see how MemNexus compares to Mem0 — a common alternative for app developers.
What's actually worth persisting
Not everything from a session belongs in a memory store. Three categories are consistently valuable:
User preferences. How they like to communicate. Their formatting preferences. Whether they want verbose explanations or just the answer. Their preferred patterns, libraries, and conventions. These are often stated explicitly early in a relationship and never again — unless the app remembers them.
Factual context. Their tech stack. Project structure. Team conventions. Domain-specific knowledge about their business or codebase. This is the background the model needs to give relevant answers instead of generic ones. A user who's told your app they use raw SQL with pg doesn't want suggestions involving an ORM.
Interaction history. What they've tried. Decisions made and why. Problems solved and how. This is the most valuable category and the hardest to capture — but it's what lets your app say "we already covered this" or "last time you ran into X, you solved it by doing Y."
What's not worth persisting
Raw conversation transcripts are too noisy. Long exchanges contain false starts, clarifications, and back-and-forth that dilutes the signal. You want the distilled insight, not the full dialogue.
Facts the model already knows aren't worth storing. Saving "Python's map() returns a lazy iterator" wastes space and search relevance. Your memory store should contain things specific to this user or this project, not general programming knowledge.
Low-signal messages don't need to be saved. "Can you repeat that?" or "Thanks" aren't context worth carrying forward. A useful heuristic: if the message doesn't contain a preference, a fact about the user's situation, or a decision, it probably doesn't belong in the store.
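That heuristic can be made concrete as a cheap pre-filter before extraction. This keyword version is deliberately crude — a real implementation would more likely use a classifier or an LLM call — but it shows the shape of the check:

```typescript
// Crude pre-filter: does this message plausibly contain a preference,
// a fact about the user's situation, or a decision? Keyword matching is
// illustrative only; production code would use a classifier or LLM call.
const SIGNAL_PATTERNS: RegExp[] = [
  /\bI (prefer|like|want|always|never|use)\b/i,           // preferences
  /\b(our|my) (stack|project|team|codebase|backend)\b/i,  // factual context
  /\bwe (decided|chose|tried|settled on)\b/i,             // decisions
];

const NOISE_PATTERNS: RegExp[] = [
  /^(thanks|thank you|ok|okay|got it)[.!]?$/i,
  /^can you repeat that\??$/i,
];

function isWorthSaving(message: string): boolean {
  const trimmed = message.trim();
  if (NOISE_PATTERNS.some((p) => p.test(trimmed))) return false;
  return SIGNAL_PATTERNS.some((p) => p.test(trimmed));
}
```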
The pattern in code
Here's the conceptual structure. This isn't pseudocode — these are real SDK calls you'd make using the MemNexus TypeScript SDK.
Install the package and initialize a client authenticated with your API key. Then, at the start of a chat session, search the memory store for context relevant to what the user is working on:
import { MemnexusClient } from "@memnexus-ai/mx-typescript-sdk";
const memory = new MemnexusClient({ apiKey: process.env.MX_API_KEY });
// Before the session: retrieve relevant context for this user
const relevant = await memory.memories.search({
  query: userMessage, // use the opening message as the search query
  topics: [userId],   // scope to this user's memories
  limit: 5,
});
const memoryContext = relevant.data
  .map((r) => r.memory.content)
  .join("\n");
Then build the system prompt with retrieved context injected:
const systemPrompt = `
You are a helpful assistant.
${memoryContext ? `Context about this user:\n${memoryContext}` : ""}
${yourBaseInstructions}
`.trim();
Call the model as normal, with the enriched system prompt. After the session ends, save anything worth keeping:
// After the session: save what's worth remembering
await memory.memories.create({
  content: "User confirmed they prefer TypeScript over JavaScript and use Zod for all validation. Avoid suggesting class-based patterns.",
  topics: [userId, "preferences"],
});
The extraction step — deciding what to save — is where most of the judgment lives. You can do this manually, with a second LLM call that summarizes the session, or with a structured extraction prompt that pulls out preferences and facts. The MemNexus server handles topic extraction, entity recognition, and semantic indexing automatically once you write the content.
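One way to do structured extraction is a second model call at session end with a prompt that asks only for the three categories above. This sketch builds such a prompt; how you send it depends on your API client, and the JSON convention is an illustrative choice, not a MemNexus requirement:

```typescript
// Build an extraction prompt for a second LLM call at session end.
// The requested JSON array shape is an illustrative convention — the
// memory server accepts plain text content either way.
function buildExtractionPrompt(transcript: string): string {
  return [
    "Review the conversation below and extract only durable context:",
    "1. User preferences (style, tooling, conventions)",
    "2. Factual context (stack, project structure, domain facts)",
    "3. Decisions made, with the reasoning behind them",
    "",
    "Ignore pleasantries, clarifications, and general knowledge the model",
    "already has. Respond with a JSON array of short strings, one memory",
    "per entry. Respond with [] if nothing qualifies.",
    "",
    "Conversation:",
    transcript,
  ].join("\n");
}
```

Each string the model returns can then be written to the store with a separate create call.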
The MemNexus SDK approach
The SDK takes care of the infrastructure so you can focus on what to store and when.
Install with npm:
npm install @memnexus-ai/mx-typescript-sdk
Authenticate with your API key:
import { MemnexusClient } from "@memnexus-ai/mx-typescript-sdk";
const client = new MemnexusClient({
  apiKey: process.env.MX_API_KEY,
});
The memory store is fully managed. You write content, and the server extracts structured facts and topics from it automatically — so your search results improve over time without additional engineering on your end. Hybrid search combines vector similarity with full-text matching, so queries like "user's database preferences" or "decisions made about the auth layer" return accurate results without requiring exact keyword matches.
You can scope memories to individual users with topics, group related memories into conversations for context-aware retrieval, and search with time filters to prioritize recent context over older records.
Full documentation is at /docs/guides/sdk/installation.
Beyond the basics
Once the pattern is in place, you can extend it in useful directions.
Scoping by user keeps memory stores clean — each user's context doesn't bleed into another's. Use a stable user identifier as a topic when creating memories, and filter by it when searching.
Tiering by recency helps surface the most relevant context first. A preference stated last week matters more than one from six months ago. The SDK's recent filter on search lets you weight recent memories more heavily in what you inject.
Grouping sessions into conversations makes retrieval more coherent. When a user returns to a topic they worked on before, searching by conversation brings back a clustered set of related memories rather than scattered results. The SDK supports conversationId on memory creation for exactly this.
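Putting the three extensions together, the search call can be parameterized per user and conversation. The option names here follow the SDK behavior described above, but treat the exact value shapes as assumptions and confirm them against the SDK docs:

```typescript
// Assemble search parameters scoped to one user, optionally narrowed to a
// conversation and weighted toward recent memories. Option names follow the
// SDK features described above; exact value shapes may differ — check docs.
type SearchParams = {
  query: string;
  topics: string[];
  limit: number;
  recent?: boolean;
  conversationId?: string;
};

function scopedSearchParams(
  userId: string,
  query: string,
  opts: { conversationId?: string; preferRecent?: boolean } = {},
): SearchParams {
  return {
    query,
    topics: [userId], // stable user identifier as a topic keeps stores clean
    limit: 5,
    ...(opts.preferRecent ? { recent: true } : {}),
    ...(opts.conversationId ? { conversationId: opts.conversationId } : {}),
  };
}
```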
Persistent memory is what makes users come back
The difference between an AI feature users try once and one they rely on is almost always context. A model that knows this user, their project, and their history gives answers that feel relevant instead of generic. That relevance compounds — every session adds to what the app knows, which makes the next session more useful.
The structural pieces are straightforward: search before a session, inject what's relevant, save what matters after. The MemNexus SDK handles the storage, indexing, and retrieval infrastructure. What's left is your judgment about what's worth remembering — which is the interesting part.
Request access to MemNexus and start building memory into your AI features.