AI agents start every session from scratch. A persistent memory system lets agents store lessons and retrieve relevant context via vector search. Turso’s embedded database with built-in vector functions makes this possible in a single local file. This guide covers the schema and queries for building a memory system with vector similarity search.
Schema
At its simplest, a memory system needs just two tables: one for memories and one for tracking which memories were used in which tasks.
The memories table stores each lesson with its vector embedding and a category (e.g. correction, insight, user, discovery). The embedding is a 384-dimensional int8-quantized vector (F8_BLOB(384)).
The tasks table records each task the agent worked on, linking task descriptions to their embeddings for retrieval.
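A minimal sketch of what these two tables might look like. Every column name here is an assumption made for illustration; only the F8_BLOB(384) embedding type comes from the description above. Python's built-in sqlite3 module is used purely as a stand-in to check that the DDL parses (SQLite accepts arbitrary type names). It does not provide Turso's vector functions, so in practice the same statements would be run through a Turso connection.

```python
import sqlite3

# Hypothetical column names; only the F8_BLOB(384) type is taken from the guide.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id              INTEGER PRIMARY KEY,
    content         TEXT NOT NULL,
    category        TEXT NOT NULL,           -- e.g. correction, insight, user, discovery
    embedding       F8_BLOB(384) NOT NULL,   -- 384-dimensional int8-quantized vector
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    last_retrieved  TEXT,
    retrieval_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS tasks (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    embedding   F8_BLOB(384) NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The retrieval bookkeeping columns (last_retrieved, retrieval_count) are not required by the two-table description; they are included because later sections talk about memories that are never retrieved.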
Connecting
Storing a memory
When the agent learns something (a correction, a user preference, an insight), store it with an embedding. Use vector8() to convert a JSON float array into an int8-quantized vector.
The argument to vector8() is a JSON-stringified float array (e.g. 384 dimensions from a model like all-MiniLM-L6-v2). Turso handles the quantization internally.
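A sketch of preparing that call from Python, assuming a hypothetical memories(content, category, embedding) table and a placeholder embedding in place of real model output. The code only builds the SQL statement and its parameters; executing it requires a Turso connection, since vector8() is not part of stock SQLite. The size comparison at the end is plain arithmetic on the 384-dimension figure.

```python
import json

DIM = 384  # dimensionality stated in the text

# Hypothetical table and column names.
INSERT_SQL = (
    "INSERT INTO memories (content, category, embedding) "
    "VALUES (?, ?, vector8(?))"
)


def build_insert(content: str, category: str, embedding: list[float]) -> tuple[str, tuple]:
    """Return the statement and parameters for storing one memory."""
    if len(embedding) != DIM:
        raise ValueError(f"expected {DIM} dimensions, got {len(embedding)}")
    # vector8() receives the embedding as a JSON-stringified float array.
    return INSERT_SQL, (content, category, json.dumps(embedding))


# Placeholder vector standing in for real embedding-model output.
fake_embedding = [0.01 * (i % 7) for i in range(DIM)]
sql, params = build_insert("Use tabs in Makefiles", "correction", fake_embedding)

float32_bytes = DIM * 4  # 4 bytes per float32 component
int8_bytes = DIM * 1     # 1 byte per int8 component (ignoring any per-vector metadata)
saving = 1 - int8_bytes / float32_bytes
print(sql)
print(float32_bytes, int8_bytes, saving)
```

Passing the embedding as a bound parameter rather than formatting it into the SQL string keeps the statement reusable and avoids quoting problems.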
Retrieving relevant memories
When a new task starts, find the most relevant memories using Turso’s vector_distance_cos() function.
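A sketch of what such a query might look like, with hypothetical table and column names, next to a pure-Python reference of the ranking it performs. The Python part works on unquantized floats and is only meant to illustrate the ordering; it is not how Turso computes the result.

```python
import math

# Hypothetical table/column names. Intended for a Turso connection, binding the
# JSON-stringified query embedding and the number of results k.
SEARCH_SQL = """
SELECT id, content, category,
       vector_distance_cos(embedding, vector8(?)) AS distance
FROM memories
ORDER BY distance
LIMIT ?
"""


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 2 for opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], memories: list[tuple[str, list[float]]], k: int):
    """Full scan: score every memory, sort by ascending distance, keep k."""
    scored = [(cosine_distance(query, vec), content) for content, vec in memories]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


memories = [
    ("opposite", [-1.0, 0.0]),
    ("orthogonal", [0.0, 1.0]),
    ("same direction", [1.0, 0.0]),
]
results = top_k([1.0, 0.0], memories, k=2)
for distance, content in results:
    print(f"{content}: distance={distance:.1f}, similarity={1.0 - distance:.1f}")
```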
vector_distance_cos() returns cosine distance (0 = identical, 2 = opposite). Lower is better. To convert to similarity: 1.0 - distance.
Correcting wrong memories
When a memory turns out to be wrong, delete it and store a corrected version.
Task tracking
Record tasks to understand what the agent has been working on.
Purging old memories
Remove memories that have been around a long time but are never retrieved.
Key design points
- Vector search uses Turso’s vector8() and vector_distance_cos() for cosine similarity directly in SQL. Int8 quantization reduces storage by 75% compared to float32 with minimal impact on search quality.
- For small per-project datasets, a full table scan with ORDER BY distance LIMIT k is fast enough without a vector index.
- Everything runs in a single embedded database file with no external services.
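The correction, task-tracking, and purge steps described in the earlier sections can be sketched with plain SQL. The table layout, column names, and the 90-day threshold below are assumptions. Embeddings are passed as opaque placeholder blobs so the sketch runs on Python's built-in sqlite3; with Turso, the embedding parameters would instead be wrapped in vector8(?).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; embedding is a plain BLOB in this stand-in.
conn.executescript("""
CREATE TABLE memories (
    id              INTEGER PRIMARY KEY,
    content         TEXT NOT NULL,
    category        TEXT NOT NULL,
    embedding       BLOB NOT NULL,
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    retrieval_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tasks (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    embedding   BLOB NOT NULL,
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
""")


def correct_memory(conn, memory_id, new_content, new_embedding):
    """Delete the wrong memory and store the corrected one in one transaction."""
    with conn:
        conn.execute("DELETE FROM memories WHERE id = ?", (memory_id,))
        cur = conn.execute(
            "INSERT INTO memories (content, category, embedding) "
            "VALUES (?, 'correction', ?)",
            (new_content, new_embedding),
        )
    return cur.lastrowid


def record_task(conn, description, embedding):
    """Log a task the agent worked on."""
    with conn:
        cur = conn.execute(
            "INSERT INTO tasks (description, embedding) VALUES (?, ?)",
            (description, embedding),
        )
    return cur.lastrowid


def purge_stale(conn, max_age_days=90):
    """Remove memories older than max_age_days that were never retrieved."""
    with conn:
        cur = conn.execute(
            "DELETE FROM memories "
            "WHERE retrieval_count = 0 AND created_at < datetime('now', ?)",
            (f"-{max_age_days} days",),
        )
    return cur.rowcount


blob = bytes(384)  # placeholder for a 384-byte int8 embedding
conn.execute(
    "INSERT INTO memories (content, category, embedding, created_at) "
    "VALUES ('old unused', 'insight', ?, datetime('now', '-200 days'))",
    (blob,),
)
wrong_id = conn.execute(
    "INSERT INTO memories (content, category, embedding) "
    "VALUES ('API uses v1', 'insight', ?)",
    (blob,),
).lastrowid

correct_memory(conn, wrong_id, "API uses v2", blob)
record_task(conn, "Upgrade API client", blob)
purged = purge_stale(conn)
remaining = [row[0] for row in conn.execute("SELECT content FROM memories")]
print(purged, remaining)
```

Wrapping the delete and insert in a single transaction means a failed insert does not leave the store without either version of the memory.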
Going further
The schema above is deliberately minimal. In practice you will want to experiment with strategies for keeping the memory store useful as it grows. Some ideas:
- Weighted memories: add a weight REAL column and boost or penalize memories based on whether the agent found them useful after retrieval. Use an exponential moving average to update weights over time.
- Time decay: multiply a recency factor into the retrieval ranking so that stale memories rank lower. For example: (1.0 - distance) * POWER(0.95, days_since_last_retrieved).
- Garbage collection: periodically remove memories with low weight and high retrieval count (retrieved often, never useful), or memories that have never been retrieved after a threshold period.
- Task scoring: track outcome metrics (tokens used, errors, corrections) per task and use them to compute a credit signal for the memories that were retrieved during that task.
- Retrieval attribution: add a join table (memory_id, task_id, similarity, credit) to record exactly which memories were used in each task and how they were rated.
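Two of these ideas, the exponential moving average for weights and the time-decay ranking, can be sketched in a few lines. The smoothing factor alpha, the starting weight, and the 0/1 usefulness signal are assumptions; the 0.95 decay base is the example value from the list.

```python
def update_weight(weight: float, useful: bool, alpha: float = 0.2) -> float:
    """Exponential moving average toward 1.0 (useful) or 0.0 (not useful)."""
    signal = 1.0 if useful else 0.0
    return (1.0 - alpha) * weight + alpha * signal


def decayed_score(
    distance: float, days_since_last_retrieved: float, decay: float = 0.95
) -> float:
    """(1.0 - distance) * decay ** days: similarity scaled by a recency factor."""
    return (1.0 - distance) * decay ** days_since_last_retrieved


# A memory starts at a neutral weight and receives three feedback signals.
w = 0.5
for useful in (True, True, False):
    w = update_weight(w, useful)

# Same cosine distance, different staleness.
fresh = decayed_score(distance=0.2, days_since_last_retrieved=0)
stale = decayed_score(distance=0.2, days_since_last_retrieved=30)
print(round(w, 4), round(fresh, 4), round(stale, 4))
```

A larger alpha makes weights react faster to recent feedback, while a decay base closer to 1.0 lets old memories keep their rank longer. Both values are tuning knobs rather than recommendations.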