
Conversational Memory

Maintain context across multi-turn conversations by managing short-term, long-term, and summary memory for AI assistants.

Tags: memory, conversation, context-window, summarization, history

Overview

LLMs are stateless — each API call is independent. Conversational Memory patterns maintain context across turns by managing what previous information to include in each request. The challenge: context windows are limited, so you must be strategic about what to remember.

When to Use

  • Building chatbots or conversational assistants
  • Multi-turn workflows requiring context from earlier turns
  • Long conversations that exceed the context window
  • Applications where users expect the AI to remember preferences or facts

Architecture

flowchart TB
    U[User Message] --> MM[Memory Manager]
    
    subgraph Memory Types
        BM[Buffer Memory<br>Recent N messages]
        SM[Summary Memory<br>Compressed history]
        EM[Entity Memory<br>Key facts & entities]
    end
    
    MM --> BM
    MM --> SM
    MM --> EM
    
    BM --> PB[Prompt Builder]
    SM --> PB
    EM --> PB
    U --> PB
    PB --> LLM[LLM]
    LLM --> R[Response]
    R --> MM
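
The Prompt Builder step in the diagram can be sketched as a small function that merges the three memory types with the incoming user message. This is a minimal illustration, not a specific library API; the function and parameter names are assumptions:

```python
def build_prompt(summary, entities, buffer, user_msg):
    """Assemble summary, entity facts, and the recent buffer into a
    chat-messages list, mirroring the Prompt Builder in the diagram."""
    system = "You are a helpful assistant."
    if summary:
        system += f"\n\nConversation summary:\n{summary}"
    if entities:
        facts = "\n".join(f"- {k}: {v}" for k, v in entities.items())
        system += f"\n\nKnown facts:\n{facts}"
    messages = [{"role": "system", "content": system}]
    messages.extend(buffer)  # recent turns, included verbatim
    messages.append({"role": "user", "content": user_msg})
    return messages
```

The key design point is that summary and entity memory are rendered into the system message, while buffer memory stays as literal conversation turns.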

Memory Strategies

| Strategy | How It Works | Pros | Cons |
|----------|--------------|------|------|
| Buffer | Keep last N messages | Simple, preserves recent context | Loses old context |
| Window | Sliding window of N tokens | Token-aware | Hard cuts mid-conversation |
| Summary | LLM summarizes older messages | Compact, retains key info | Lossy, extra LLM call |
| Entity | Extract and track entities/facts | Structured, selective | Complex to maintain |
| Hybrid | Combine summary + recent buffer | Best of both | More complexity |
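
As a concrete instance of the simplest row, buffer memory is just a fixed-length queue; a minimal sketch (the class name and default size are illustrative):

```python
from collections import deque

class BufferMemory:
    """Buffer strategy: keep only the last N messages, evicting FIFO."""

    def __init__(self, max_messages=6):
        self.messages = deque(maxlen=max_messages)  # auto-evicts oldest

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def get(self):
        return list(self.messages)
```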

Implementation

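
A minimal hybrid sketch combining the strategies above: a rolling summary for overflow, a recent-message buffer, and an entity store for key facts. The `summarize` hook stands in for a real LLM summarization call, and all names here are illustrative assumptions:

```python
class HybridMemory:
    """Hybrid strategy: rolling summary + recent buffer + entity facts."""

    def __init__(self, buffer_size=4, summarize=None):
        self.buffer = []       # recent messages, kept verbatim
        self.summary = ""      # compressed older history
        self.entities = {}     # key facts, never summarized away
        self.buffer_size = buffer_size
        self.summarize = summarize or self._naive_summarize

    def _naive_summarize(self, old_summary, messages):
        # Placeholder: a real system would call an LLM here.
        lines = " ".join(f"{m['role']}: {m['content']}" for m in messages)
        return (old_summary + " " + lines).strip()[:500]

    def add_message(self, role, content):
        self.buffer.append({"role": role, "content": content})
        if len(self.buffer) > self.buffer_size:
            overflow = self.buffer[:-self.buffer_size]
            self.buffer = self.buffer[-self.buffer_size:]
            self.summary = self.summarize(self.summary, overflow)

    def remember(self, key, value):
        self.entities[key] = value

    def context(self, user_msg):
        """Build the messages list for the next LLM call."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        if self.entities:
            facts = "; ".join(f"{k}={v}" for k, v in self.entities.items())
            parts.append(f"Known facts: {facts}")
        system = "\n".join(parts) or "You are a helpful assistant."
        return [{"role": "system", "content": system},
                *self.buffer,
                {"role": "user", "content": user_msg}]
```

Usage: call `add_message` after every turn, `remember` when a durable fact is extracted, and `context` to build each request.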

Gotchas & Best Practices

🚨 Token Limits Are Hard

Always count tokens, not messages. A single message can be thousands of tokens. Use a tokenizer to measure actual context usage and leave room for the response.
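
A sketch of token-aware trimming; the 4-characters-per-token heuristic used here is a rough stand-in for a real tokenizer (e.g. tiktoken for OpenAI models), and the function names are illustrative:

```python
def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English text.
    Production code should use the model's actual tokenizer."""
    return max(1, len(text) // 4)

def trim_to_budget(messages, max_tokens, reserve_for_response=512):
    """Drop oldest messages until the estimated total fits the budget,
    leaving headroom for the model's response."""
    budget = max_tokens - reserve_for_response
    kept = list(messages)
    while kept and sum(estimate_tokens(m["content"]) for m in kept) > budget:
        kept.pop(0)  # evict the oldest message first
    return kept
```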

⚠️ Summary Drift

Repeated summarization is lossy — important details can gradually disappear. Consider keeping critical facts in a separate entity store that never gets summarized away.
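
One way to keep critical facts out of the summarization loop is a separate store that is rendered into the prompt verbatim; because `set` overwrites, it also gives a natural hook for user corrections. A minimal sketch with illustrative names:

```python
class EntityStore:
    """Facts kept outside the summarization loop so they never drift away."""

    def __init__(self):
        self._facts = {}  # entity -> {attribute: value}

    def set(self, entity, attribute, value):
        # Overwriting doubles as the correction mechanism.
        self._facts.setdefault(entity, {})[attribute] = value

    def render(self):
        """Render all facts as plain lines for the system prompt."""
        return "\n".join(
            f"{entity}.{attr} = {value}"
            for entity, attrs in self._facts.items()
            for attr, value in attrs.items()
        )
```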

💡 Hybrid Is Usually Best

In production, combine approaches: summary for old history + buffer for recent messages + entity memory for key facts. This gives the best coverage within token limits.

💡 Let Users Correct Memory

Give users a way to see and correct what the AI remembers. This builds trust and fixes inevitable extraction errors.

Variations

  • Buffer Memory — Simple FIFO message queue
  • Summary Memory — LLM-generated summaries of older context
  • Entity Memory — Track specific entities and facts
  • Knowledge Graph Memory — Store relationships between entities
  • Retrieval-Based Memory — Embed past messages, retrieve relevant ones per query
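
The retrieval-based variation can be sketched with a toy bag-of-words similarity; a production system would use a real embedding model and a vector store, so everything below is an illustrative stand-in:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RetrievalMemory:
    """Store every past message; fetch the top-k most similar per query."""

    def __init__(self):
        self.items = []  # (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```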

Further Reading