# Conversational Memory
Maintain context across multi-turn conversations by managing short-term, long-term, and summary memory for AI assistants.
## Overview
LLMs are stateless — each API call is independent. Conversational Memory patterns maintain context across turns by managing what previous information to include in each request. The challenge: context windows are limited, so you must be strategic about what to remember.
## When to Use
- Building chatbots or conversational assistants
- Multi-turn workflows requiring context from earlier turns
- Long conversations that exceed the context window
- Applications where users expect the AI to remember preferences or facts
## Architecture
```mermaid
flowchart TB
    U[User Message] --> MM[Memory Manager]
    subgraph Memory Types
        BM[Buffer Memory<br>Recent N messages]
        SM[Summary Memory<br>Compressed history]
        EM[Entity Memory<br>Key facts & entities]
    end
    MM --> BM
    MM --> SM
    MM --> EM
    BM --> PB[Prompt Builder]
    SM --> PB
    EM --> PB
    U --> PB
    PB --> LLM[LLM]
    LLM --> R[Response]
    R --> MM
```
## Memory Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Buffer | Keep last N messages | Simple, preserves recent context | Loses old context |
| Window | Sliding window of N tokens | Token-aware | Hard cuts mid-conversation |
| Summary | LLM summarizes older messages | Compact, retains key info | Lossy, extra LLM call |
| Entity | Extract and track entities/facts | Structured, selective | Complex to maintain |
| Hybrid | Combine summary + recent buffer | Best of both | More complexity |
## Implementation
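A minimal sketch of the hybrid approach: a fixed-size buffer of recent messages, with older turns folded into a running summary. All names here (`MemoryManager`, `Message`, `_summarize`) are illustrative rather than from any specific library, and the summarizer is a placeholder for what would be an LLM call in practice.

```python
# Hybrid memory sketch: recent-message buffer + running summary of older turns.
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

@dataclass
class MemoryManager:
    buffer_size: int = 6                       # keep the last N messages verbatim
    summary: str = ""                          # compressed history of older turns
    buffer: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.buffer.append(Message(role, content))
        # When the buffer overflows, fold the oldest message into the summary.
        while len(self.buffer) > self.buffer_size:
            oldest = self.buffer.pop(0)
            self.summary = self._summarize(self.summary, oldest)

    def _summarize(self, summary: str, msg: Message) -> str:
        # Placeholder: in production this would be an LLM call, e.g.
        # "Condense this summary plus the new message into a short summary."
        return (summary + f" {msg.role}: {msg.content}").strip()

    def build_prompt(self, user_message: str) -> list:
        """Assemble the message list for the next LLM request."""
        system = "You are a helpful assistant."
        if self.summary:
            system += f"\nConversation so far (summary): {self.summary}"
        msgs = [{"role": "system", "content": system}]
        msgs += [{"role": m.role, "content": m.content} for m in self.buffer]
        msgs.append({"role": "user", "content": user_message})
        return msgs
```

The key design choice is that overflow is handled at write time (`add`), so `build_prompt` stays cheap and the prompt shape is predictable on every turn.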
## Gotchas & Best Practices
Always count tokens, not messages. A single message can be thousands of tokens. Use a tokenizer to measure actual context usage and leave room for the response.
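Token-aware trimming can be sketched as below. Real token counts should come from the model's actual tokenizer (e.g. tiktoken for OpenAI models); here a rough ~4-characters-per-token heuristic stands in so the example is self-contained, and `reserve_for_response` is an assumed parameter name.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~4 characters per token.
    return max(1, len(text) // 4)

def trim_to_budget(messages: list, max_tokens: int,
                   reserve_for_response: int = 512) -> list:
    """Drop oldest messages until the prompt fits, leaving room to respond."""
    budget = max_tokens - reserve_for_response
    kept = []
    used = 0
    # Walk newest-to-oldest so recent context survives the cut.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Note that trimming by token budget rather than message count means one unusually long message cannot silently evict the rest of the conversation.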
Repeated summarization is lossy — important details can gradually disappear. Consider keeping critical facts in a separate entity store that never gets summarized away.
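A separate entity store can be as simple as a dict of facts that is never passed through the summarizer, so repeated compression cannot erode it. The extraction below is a deliberately naive regex for illustration; a production system would use an LLM extraction call.

```python
import re

class EntityMemory:
    """Key facts stored outside the summarization loop."""

    def __init__(self):
        self.facts = {}

    def extract(self, text: str) -> None:
        # Toy pattern: capture "my X is Y" statements from user messages.
        for key, value in re.findall(r"my (\w+) is (\w+)", text, re.IGNORECASE):
            self.facts[key.lower()] = value

    def render(self) -> str:
        """Format facts for inclusion in the system prompt."""
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.facts.items()))
```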
In production, combine approaches: summary for old history + buffer for recent messages + entity memory for key facts. This gives the best coverage within token limits.
Give users a way to see and correct what the AI remembers. This builds trust and fixes inevitable extraction errors.
## Variations
- Buffer Memory — Simple FIFO message queue
- Summary Memory — LLM-generated summaries of older context
- Entity Memory — Track specific entities and facts
- Knowledge Graph Memory — Store relationships between entities
- Retrieval-Based Memory — Embed past messages, retrieve relevant ones per query
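The retrieval-based variation can be sketched with bag-of-words cosine similarity standing in for real embeddings; a production system would use an embedding model and a vector store instead. `RetrievalMemory` and its methods are illustrative names.

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Bag-of-words "embedding" stand-in.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RetrievalMemory:
    """Store all past messages; pull back only the relevant ones per query."""

    def __init__(self):
        self.history = []

    def add(self, message: str) -> None:
        self.history.append(message)

    def retrieve(self, query: str, k: int = 3) -> list:
        """Return the k past messages most similar to the query."""
        qv = _vec(query)
        ranked = sorted(self.history, key=lambda m: _cosine(qv, _vec(m)),
                        reverse=True)
        return ranked[:k]
```

Unlike buffer or summary memory, this scales to arbitrarily long histories: the prompt cost per turn depends on `k`, not on how much has been said.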