Building AI Apps with Persistent Memory: A Practical SDK Guide
Learn how to add persistent memory to AI apps built on the Anthropic or OpenAI API — architecture, what to store, and a TypeScript SDK walkthrough.
MemNexus Team
Engineering
You've built the AI feature. It calls the API, streams a response, formats it nicely. Users are impressed — until the second session. Then they have to re-explain everything. Their stack. Their preferences. The context you spent the first session establishing. Gone.
This isn't a UX problem you can paper over with a better onboarding flow or a longer system prompt. It's structural. The LLM your app is built on is stateless by design, and the memory features inside ChatGPT or Claude Desktop don't cross the wall into the API. When you call the Anthropic API or the OpenAI API, you get a blank model. Every time.
The solution is to build the memory layer yourself — and it's more straightforward than it sounds.
What the architecture looks like without memory
Most AI features follow the same basic loop:
- Build a system prompt with static instructions
- Append the user's message
- Call the model
- Return the response
- Repeat next session from scratch
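In code, the stateless loop is little more than prompt assembly and a model call. A minimal sketch — `callModel` is a placeholder for whichever API client you use (Anthropic, OpenAI, or otherwise):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assemble the request: static instructions plus the user's message.
// Nothing here varies by user — it's the same system prompt every session.
function buildMessages(baseInstructions: string, userMessage: string): ChatMessage[] {
  return [
    { role: "system", content: baseInstructions },
    { role: "user", content: userMessage },
  ];
}

// `callModel` stands in for your real API client.
async function handleTurn(
  callModel: (messages: ChatMessage[]) => Promise<string>,
  userMessage: string,
): Promise<string> {
  const messages = buildMessages("You are a helpful assistant.", userMessage);
  return callModel(messages); // next session starts from scratch
}
```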
The system prompt might be excellent. But it's static. It doesn't know this specific user prefers terse responses. It doesn't know they're building on a PostgreSQL backend with a Zod validation layer. It doesn't know they already tried the approach you're about to suggest and found a gotcha.
Every session, the model meets your user for the first time.
What the architecture looks like with memory
Add a memory layer and the loop changes:
- At session start: query the memory store for context relevant to this user and topic
- Inject retrieved memories into the system prompt
- Append the user's message
- Call the model
- Return the response
- At session end: extract anything worth saving and write it to the memory store
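The steps above can be sketched as a wrapper around the model call. The function names here are placeholders for whatever store and client you use — only the shape of the loop matters:

```typescript
type SessionDeps = {
  searchMemories: (userId: string, query: string) => Promise<string[]>;
  callModel: (systemPrompt: string, userMessage: string) => Promise<string>;
  saveMemory: (userId: string, content: string) => Promise<void>;
};

// One session turn with the memory layer wrapped around the model call.
async function sessionWithMemory(
  deps: SessionDeps,
  userId: string,
  userMessage: string,
  extract: (reply: string) => string | null, // decides what's worth saving
): Promise<string> {
  // 1. Retrieve context relevant to this user and topic.
  const memories = await deps.searchMemories(userId, userMessage);

  // 2. Inject retrieved memories into the system prompt.
  const systemPrompt = [
    "You are a helpful assistant.",
    memories.length ? `Context about this user:\n${memories.join("\n")}` : "",
  ].join("\n").trim();

  // 3-4. Call the model and capture the response.
  const reply = await deps.callModel(systemPrompt, userMessage);

  // 5. Save anything worth carrying into the next session.
  const worthSaving = extract(reply);
  if (worthSaving) await deps.saveMemory(userId, worthSaving);

  return reply;
}
```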
The model still resets between sessions. But the knowledge doesn't. Your app carries forward what matters and surfaces it when it's relevant.
This is the pattern behind consumer AI memory features — it's just not exposed to you when you're on the API side. You need to build it yourself, or use a library that handles the infrastructure. If you're evaluating purpose-built memory infrastructure, see how MemNexus compares to Mem0 — a common alternative for app developers.
What's actually worth persisting
Not everything from a session belongs in a memory store. Three categories are consistently valuable:
User preferences. How they like to communicate. Their formatting preferences. Whether they want verbose explanations or just the answer. Their preferred patterns, libraries, and conventions. These are often stated explicitly early in a relationship and never again — unless the app remembers them.
Factual context. Their tech stack. Project structure. Team conventions. Domain-specific knowledge about their business or codebase. This is the background the model needs to give relevant answers instead of generic ones. A user who's told your app they use raw SQL with pg doesn't want suggestions involving an ORM.
Interaction history. What they've tried. Decisions made and why. Problems solved and how. This is the most valuable category and the hardest to capture — but it's what lets your app say "we already covered this" or "last time you ran into X, you solved it by doing Y."
What's not worth persisting
Raw conversation transcripts are too noisy. Long exchanges contain false starts, clarifications, and back-and-forth that dilutes the signal. You want the distilled insight, not the full dialogue.
Facts the model already knows aren't worth storing. Saving "Python's map() returns a lazy iterator" wastes space and search relevance. Your memory store should contain things specific to this user or this project, not general programming knowledge.
Low-signal messages don't need to be saved. "Can you repeat that?" or "Thanks" aren't context worth carrying forward. A useful heuristic: if the message doesn't contain a preference, a fact about the user's situation, or a decision, it probably doesn't belong in the store.
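That heuristic can be made concrete as a cheap pre-filter before extraction. This keyword version is deliberately crude — a real implementation would more likely use a classifier or an LLM call — but it shows the shape of the check:

```typescript
// Crude pre-filter: does this message plausibly contain a preference,
// a fact about the user's situation, or a decision? Keyword matching is
// illustrative only; production code would use a classifier or LLM call.
const SIGNAL_PATTERNS: RegExp[] = [
  /\bI (prefer|like|want|always|never|use)\b/i,           // preferences
  /\b(our|my) (stack|project|team|codebase|backend)\b/i,  // factual context
  /\bwe (decided|chose|tried|settled on)\b/i,             // decisions
];

const NOISE_PATTERNS: RegExp[] = [
  /^(thanks|thank you|ok|okay|got it)[.!]?$/i,
  /^can you repeat that\??$/i,
];

function isWorthSaving(message: string): boolean {
  const trimmed = message.trim();
  if (NOISE_PATTERNS.some((p) => p.test(trimmed))) return false;
  return SIGNAL_PATTERNS.some((p) => p.test(trimmed));
}
```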
The pattern in code
Here's the conceptual structure. This isn't pseudocode — these are real SDK calls you'd make using the MemNexus TypeScript SDK.
Install the package and initialize a client authenticated with your API key. Then, at the start of a chat session, search the memory store for context relevant to what the user is working on:
import { MemnexusClient } from "@memnexus-ai/mx-typescript-sdk";
const memory = new MemnexusClient({ apiKey: process.env.MX_API_KEY });
// Before the session: retrieve relevant context for this user
const relevant = await memory.memories.search({
  query: userMessage, // use the opening message as the search query
  topics: [userId],   // scope to this user's memories
  limit: 5,
});
const memoryContext = relevant.data
  .map((r) => r.memory.content)
  .join("\n");
Then build the system prompt with retrieved context injected:
const systemPrompt = `
You are a helpful assistant.
${memoryContext ? `Context about this user:\n${memoryContext}` : ""}
${yourBaseInstructions}
`.trim();
Call the model as normal, with the enriched system prompt. After the session ends, save anything worth keeping:
// After the session: save what's worth remembering
await memory.memories.create({
  content: "User confirmed they prefer TypeScript over JavaScript and use Zod for all validation. Avoid suggesting class-based patterns.",
  topics: [userId, "preferences"],
});
The extraction step — deciding what to save — is where most of the judgment lives. You can do this manually, with a second LLM call that summarizes the session, or with a structured extraction prompt that pulls out preferences and facts. The MemNexus server handles topic extraction, entity recognition, and semantic indexing automatically once you write the content.
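One way to do structured extraction is a second model call at session end with a prompt that asks only for the three categories above. This sketch builds such a prompt; how you send it depends on your API client, and the JSON convention is an illustrative choice, not a MemNexus requirement:

```typescript
// Build an extraction prompt for a second LLM call at session end.
// The requested JSON array shape is an illustrative convention — the
// memory server accepts plain text content either way.
function buildExtractionPrompt(transcript: string): string {
  return [
    "Review the conversation below and extract only durable context:",
    "1. User preferences (style, tooling, conventions)",
    "2. Factual context (stack, project structure, domain facts)",
    "3. Decisions made, with the reasoning behind them",
    "",
    "Ignore pleasantries, clarifications, and general knowledge the model",
    "already has. Respond with a JSON array of short strings, one memory",
    "per entry. Respond with [] if nothing qualifies.",
    "",
    "Conversation:",
    transcript,
  ].join("\n");
}
```

Each string the model returns can then be written to the store with a separate create call.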
The MemNexus SDK approach
The SDK takes care of the infrastructure so you can focus on what to store and when.
Install with npm:
npm install @memnexus-ai/mx-typescript-sdk
Authenticate with your API key:
import { MemnexusClient } from "@memnexus-ai/mx-typescript-sdk";
const client = new MemnexusClient({
  apiKey: process.env.MX_API_KEY,
});
The memory store is fully managed. You write content, and the server extracts structured facts and topics from it automatically — so your search results improve over time without additional engineering on your end. Hybrid search combines vector similarity with full-text matching, so queries like "user's database preferences" or "decisions made about the auth layer" return accurate results without requiring exact keyword matches.
You can scope memories to individual users with topics, group related memories into conversations for context-aware retrieval, and search with time filters to prioritize recent context over older records.
Full documentation is at /docs/guides/sdk/installation.
Beyond the basics
Once the pattern is in place, you can extend it in useful directions.
Scoping by user keeps memory stores clean — each user's context doesn't bleed into another's. Use a stable user identifier as a topic when creating memories, and filter by it when searching.
Tiering by recency helps surface the most relevant context first. A preference stated last week matters more than one from six months ago. The SDK's recent filter on search lets you weight recent memories more heavily in what you inject.
Grouping sessions into conversations makes retrieval more coherent. When a user returns to a topic they worked on before, searching by conversation brings back a clustered set of related memories rather than scattered results. The SDK supports conversationId on memory creation for exactly this.
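Putting the three extensions together, the search call can be parameterized per user and conversation. The option names here follow the SDK behavior described above, but treat the exact value shapes as assumptions and confirm them against the SDK docs:

```typescript
// Assemble search parameters scoped to one user, optionally narrowed to a
// conversation and weighted toward recent memories. Option names follow the
// SDK features described above; exact value shapes may differ — check docs.
type SearchParams = {
  query: string;
  topics: string[];
  limit: number;
  recent?: boolean;
  conversationId?: string;
};

function scopedSearchParams(
  userId: string,
  query: string,
  opts: { conversationId?: string; preferRecent?: boolean } = {},
): SearchParams {
  return {
    query,
    topics: [userId], // stable user identifier as a topic keeps stores clean
    limit: 5,
    ...(opts.preferRecent ? { recent: true } : {}),
    ...(opts.conversationId ? { conversationId: opts.conversationId } : {}),
  };
}
```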
Persistent memory is what makes users come back
The difference between an AI feature users try once and one they rely on is almost always context. A model that knows this user, their project, and their history gives answers that feel relevant instead of generic. That relevance compounds — every session adds to what the app knows, which makes the next session more useful.
The structural pieces are straightforward: search before a session, inject what's relevant, save what matters after. The MemNexus SDK handles the storage, indexing, and retrieval infrastructure. What's left is your judgment about what's worth remembering — which is the interesting part.
Request access to MemNexus and start building memory into your AI features.