We Taught MemNexus to Read Between the Lines
MemNexus now auto-extracts topics, facts, and entities from your memories using LLM analysis. Richer metadata, more retrieval paths, same search speed.
Claude Opus 4.6
AI, edited by Harry Mower
MemNexus already does a solid job of storing and retrieving your memories. Semantic search finds what you need, narrative reconstruction tracks how knowledge evolves, and timeline search lets you replay how a debugging session unfolded. But we kept thinking about one thing: every memory you save is rich with structure that the system could understand better. Version numbers, technology names, decisions, configuration values — all sitting right there in the content.
So we built something to extract it automatically.
The Idea
When you save a memory like this:
"Deployed CLI v1.7.29 with batch retrieval support.
Max 100 IDs per request."
You might tag it with cli, deployment — which is totally reasonable. That's what you're thinking about in the moment. But the content itself contains a lot more signal:
| What's in the Content | Type |
|-----------------------|------|
| v1.7.29 | Version number |
| batch retrieval | Feature name |
| 100 IDs | Configuration limit |
| CLI | Project reference |
We thought: what if MemNexus could pick up on all of that automatically? Not to replace your tags, but to supplement them. Your tags capture intent ("this is a deployment"). Extraction captures structure ("here are the specific things mentioned").
Introducing Automatic Content Extraction
Starting in v1.26.0, MemNexus can automatically extract three types of structured metadata from every memory:
- Topics — Additional tags derived from the content (3-7 per memory), merged alongside your own
- Facts — Structured knowledge triples like ("connection pool", "CONFIGURED_AS", "50 connections")
- Entities — Named things: people, projects, technologies, versions, API endpoints
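To make the three metadata types concrete, here's a minimal sketch of how they might be modeled as records. The field names are illustrative, not the actual MemNexus schema:

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    # A subject-predicate-object triple with the extractor's confidence.
    subject: str
    predicate: str
    obj: str
    confidence: float

@dataclass
class Entity:
    # A named thing classified by type: PERSON, PROJECT, TECHNOLOGY, VERSION, ...
    entity_type: str
    name: str

@dataclass
class ExtractionResult:
    # Everything the background pass could attach to one memory.
    topics: list[str] = field(default_factory=list)
    facts: list[Fact] = field(default_factory=list)
    entities: list[Entity] = field(default_factory=list)

result = ExtractionResult(
    topics=["cli", "deployment", "batch-retrieval"],
    facts=[Fact("CLI", "VERSION_IS", "v1.7.29", 0.95)],
    entities=[Entity("VERSION", "v1.7.29"), Entity("PROJECT", "CLI")],
)
```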
Extraction runs asynchronously in the background. Memory creation is just as fast as before — you get your response immediately, and extraction completes 1-3 seconds later.
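The fire-and-forget pattern here can be sketched with a thread pool: creation returns an ID immediately while extraction finishes in the background. The functions below are hypothetical stand-ins, not the MemNexus internals:

```python
from concurrent.futures import ThreadPoolExecutor
import time
import uuid

executor = ThreadPoolExecutor(max_workers=4)
metadata_store = {}  # memory_id -> extracted metadata

def extract_metadata(memory_id, content):
    # Stand-in for the LLM extraction call (the real one takes 1-3 s).
    time.sleep(0.1)
    metadata_store[memory_id] = {"topics": ["deployment"], "facts": [], "entities": []}

def create_memory(content):
    memory_id = str(uuid.uuid4())
    # Respond to the caller right away; enrichment runs in the background.
    future = executor.submit(extract_metadata, memory_id, content)
    return memory_id, future

memory_id, future = create_memory("Deployed CLI v1.7.29")
future.result()  # only for this demo: wait so we can inspect the result
```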
What It Looks Like in Practice
Topic Enrichment
Say you save a memory about fixing a JWT bug and tag it completed:
Input: "Fixed the authentication bug where JWT tokens weren't being refreshed.
The issue was in the middleware - added proper token expiry checking."
Your tags: completed
Extracted topics:
- authentication
- jwt
- token-refresh
- middleware
- bug-fix
Final topic list: completed, authentication, jwt, token-refresh, middleware, bug-fix
Your tag captures the status. The extracted topics capture the technical surface area. Now when you search for "jwt middleware" three weeks from now, you land right on it.
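The merge step above amounts to an order-preserving union: manual tags first, then extracted topics, skipping duplicates. A minimal sketch (not the actual merge logic):

```python
def merge_topics(manual_tags, extracted_topics):
    """Manual tags come first; extracted topics are appended,
    skipping anything the user already tagged (case-insensitive)."""
    merged = list(manual_tags)
    seen = {t.lower() for t in manual_tags}
    for topic in extracted_topics:
        if topic.lower() not in seen:
            merged.append(topic)
            seen.add(topic.lower())
    return merged

final = merge_topics(
    ["completed"],
    ["authentication", "jwt", "token-refresh", "middleware", "bug-fix"],
)
# final == ["completed", "authentication", "jwt",
#           "token-refresh", "middleware", "bug-fix"]
```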
Fact Extraction
Facts get pulled out as subject-predicate-object triples with confidence scores:
Input: "Increased database connection pool to 50 connections.
This resolved the performance issues under high load."
Extracted facts:
- ("connection pool", "CONFIGURED_AS", "50 connections", confidence: 0.92)
- ("performance issues", "RESOLVED_BY", "pool size increase", confidence: 0.88)
This means MemNexus can start answering structured queries — "what's our connection pool set to?" — by looking up facts directly, rather than relying solely on semantic similarity across your full memory set.
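A direct fact lookup is just a filter over stored triples rather than a vector search. A toy version, using the example facts above:

```python
facts = [
    ("connection pool", "CONFIGURED_AS", "50 connections", 0.92),
    ("performance issues", "RESOLVED_BY", "pool size increase", 0.88),
]

def lookup(subject, predicate):
    # Return matching objects, highest-confidence first.
    hits = [(obj, conf) for s, p, obj, conf in facts
            if s == subject and p == predicate]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

lookup("connection pool", "CONFIGURED_AS")
# [('50 connections', 0.92)]
```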
Entity Recognition
Entities get classified by type and linked into the knowledge graph:
Input: "John reviewed the analytics integration PR. We're using their
JavaScript SDK v1.88.0 for the marketing site analytics."
Extracted entities:
- (PERSON: "John")
- (TECHNOLOGY: "analytics platform")
- (PROJECT: "marketing site")
- (VERSION: "v1.88.0")
- (CONCEPT: "analytics")
Once entities are linked, you get a new way to explore your knowledge. "What do we know about our analytics setup?" surfaces every memory that mentions it, regardless of how it was tagged.
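Entity-based exploration boils down to an inverted index from entity names to memory IDs. A sketch with made-up data:

```python
from collections import defaultdict

memories = {
    "m1": ["John", "analytics", "v1.88.0", "marketing site"],
    "m2": ["Redis", "marketing site"],
    "m3": ["analytics", "Redis"],
}

# Inverted index: entity name -> set of memory IDs that mention it.
entity_index = defaultdict(set)
for memory_id, entities in memories.items():
    for entity in entities:
        entity_index[entity].add(memory_id)

def memories_mentioning(entity):
    return sorted(entity_index.get(entity, set()))

memories_mentioning("analytics")  # ['m1', 'm3']
```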
More Signal, Same Speed
One thing we were careful about: extraction shouldn't slow down search. So we designed it as additional retrieval paths that run in parallel alongside the existing search, all within a single database round-trip. Search latency stays at 50-70ms. The difference is that each query now has more ways to find a relevant result — through extracted topics, shared entities, and matching facts, on top of the semantic similarity and manual tags that were already working.
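The "more retrieval paths" idea reduces to a union of per-path candidate lists, keeping each memory's best score. A toy merge (the path names and scores are invented for illustration):

```python
def merge_paths(*path_results):
    """Each path yields {memory_id: score}; keep the best score per memory."""
    best = {}
    for results in path_results:
        for memory_id, score in results.items():
            best[memory_id] = max(best.get(memory_id, 0.0), score)
    # Rank by score, best first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

semantic_path = {"m1": 0.81, "m2": 0.60}
topic_path    = {"m2": 0.75}
entity_path   = {"m3": 0.70}

merge_paths(semantic_path, topic_path, entity_path)
# [('m1', 0.81), ('m2', 0.75), ('m3', 0.7)]
```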
A Richer Knowledge Graph
Here's what the graph looks like with extraction enabled:
Memories connect to topics (both your manual tags and extracted ones), facts, and entities through typed relationships. Entities that appear together in memories are linked via co-occurrence relationships.
This opens up some interesting queries:
- "Show me all memories mentioning Redis" — entity traversal across your whole knowledge base
- "What technologies does the marketing site use?" — follow project-to-technology entity links
- "What's the current CLI version?" — direct fact lookup: subject="CLI", predicate="VERSION_IS"
The graph gets more connected with every memory, and those connections become new retrieval paths.
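Co-occurrence links can be built by counting entity pairs that share a memory. A minimal sketch with invented data:

```python
from itertools import combinations
from collections import Counter

memory_entities = [
    ["Redis", "marketing site"],
    ["Redis", "marketing site", "analytics"],
    ["CLI", "v1.7.29"],
]

# Count how often each pair of entities appears in the same memory.
cooccurrence = Counter()
for entities in memory_entities:
    # Sort so each pair gets one canonical key.
    for a, b in combinations(sorted(set(entities)), 2):
        cooccurrence[(a, b)] += 1

cooccurrence[("Redis", "marketing site")]  # 2
```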
Under the Hood
We built a provider-agnostic extraction interface that supports multiple LLM backends. The current default is optimized for speed, cost-effectiveness, and handling technical content well. We can swap in different models for cases where we want more nuanced entity classification, without changing any of the pipeline logic.
Not every extraction is worth keeping, so we filter by confidence. Each extracted topic, fact, and entity comes with a confidence score, and anything below our threshold gets logged but not added to the graph. This keeps the metadata clean and search results sharp.
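The gate described above is a simple threshold split: keep what clears the bar, log the rest. A sketch (the cutoff value and record shape are illustrative):

```python
THRESHOLD = 0.6  # illustrative cutoff, not the actual MemNexus value

def filter_by_confidence(extractions, threshold=THRESHOLD):
    kept, rejected = [], []
    for item in extractions:
        (kept if item["confidence"] >= threshold else rejected).append(item)
    for item in rejected:
        # Rejected extractions are logged, not added to the graph.
        print(f"dropped {item['value']!r} (confidence {item['confidence']})")
    return kept

candidates = [
    {"value": "jwt", "confidence": 0.91},
    {"value": "stuff", "confidence": 0.35},
]
kept = filter_by_confidence(candidates)
# kept == [{'value': 'jwt', 'confidence': 0.91}]
```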
For existing memories created before we enabled extraction, we built a backfill pipeline that processes them in batches with rate limiting. It runs oldest-first so the knowledge graph builds up chronologically, handles failures gracefully, and can be scoped to individual users or run across the board.
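A backfill of that shape, oldest-first, batched, rate-limited between batches, collecting failures instead of aborting, can be sketched like this (all names hypothetical):

```python
import time

def backfill(memories, batch_size=10, delay=0.0, user_id=None):
    """Process pre-extraction memories oldest-first in rate-limited
    batches; failures are collected rather than stopping the run."""
    pending = sorted(
        (m for m in memories if user_id is None or m["user"] == user_id),
        key=lambda m: m["created_at"],
    )
    failures = []
    for i in range(0, len(pending), batch_size):
        for memory in pending[i:i + batch_size]:
            try:
                extract(memory)  # stand-in for the real extraction call
            except Exception:
                failures.append(memory["id"])
        time.sleep(delay)  # rate limiting between batches
    return failures

def extract(memory):
    if memory["id"] == "bad":
        raise ValueError("extraction failed")

memories = [
    {"id": "bad", "user": "u1", "created_at": 2},
    {"id": "ok", "user": "u1", "created_at": 1},
]
backfill(memories)  # ['bad']
```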
It Just Works
From your perspective as a user, nothing changes about how you create or search memories. You keep using `mx memories create` and `mx memories search` exactly as before. The extraction happens behind the scenes — your memory gets richer metadata automatically, and search has more paths to find what you're looking for.
You might notice that searches start surfacing results you wouldn't have found before. That's the extraction doing its job.
What We're Thinking About Next
Extraction is one piece of a larger goal: making MemNexus a system that gets smarter as you use it. The more memories you add, the more connections the graph discovers, and the better retrieval gets.
Some things we're exploring:
- Entity-based exploration — browse your knowledge graph by entity ("show me everything about Redis") rather than just searching by query
- Fact-powered Q&A — answer direct questions ("what port does the API run on?") from extracted facts without needing a full search
- Cross-memory patterns — surface connections between memories that share entities or facts, even if they're about different topics
We're building toward memory that organizes itself. Extraction is the foundation.
Automatic Content Extraction is available now in MemNexus Core API v1.26.0+.