AI agents start every session from scratch. A persistent memory system lets agents store lessons and retrieve relevant context via vector search. Turso’s embedded database with built-in vector functions makes this possible in a single local file.
This guide covers the schema and queries for building a memory system with vector similarity search.
Schema
At its simplest, a memory system needs just two tables: one for memories and one for tracking which memories were used in which tasks.
CREATE TABLE IF NOT EXISTS memories (
id TEXT PRIMARY KEY,
content TEXT NOT NULL,
embedding F8_BLOB(384),
category TEXT NOT NULL,
created_at INTEGER NOT NULL,
last_retrieved INTEGER,
retrieval_count INTEGER DEFAULT 0,
source_task TEXT
);
CREATE TABLE IF NOT EXISTS tasks (
id TEXT PRIMARY KEY,
description TEXT,
embedding F8_BLOB(384),
started_at INTEGER,
finished_at INTEGER
);
memories stores each lesson with its vector embedding and a category (e.g. correction, insight, user, discovery). The embedding is a 384-dimensional int8-quantized vector (F8_BLOB(384)).
tasks records each task the agent worked on, linking task descriptions to their embeddings for retrieval.
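For convenience, the two CREATE TABLE statements above can be bundled into a single string. The `SCHEMA` constant name is an assumption here; it matches the identifier used in the connection snippet below:

```typescript
// SCHEMA: both CREATE TABLE statements in one string, so a single
// db.exec(SCHEMA) call creates the tables idempotently.
const SCHEMA = `
CREATE TABLE IF NOT EXISTS memories (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  embedding F8_BLOB(384),
  category TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  last_retrieved INTEGER,
  retrieval_count INTEGER DEFAULT 0,
  source_task TEXT
);
CREATE TABLE IF NOT EXISTS tasks (
  id TEXT PRIMARY KEY,
  description TEXT,
  embedding F8_BLOB(384),
  started_at INTEGER,
  finished_at INTEGER
);
`;
```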
Connecting
import { connect } from "@tursodatabase/database";
// SCHEMA holds the CREATE TABLE statements from the Schema section
const db = await connect(".memory/agent.db");
await db.exec(SCHEMA);
Storing a memory
When the agent learns something — a correction, a user preference, an insight — store it with an embedding. Use vector8() to convert a JSON float array into an int8-quantized vector:
INSERT INTO memories (id, content, embedding, category, created_at, source_task)
VALUES (?, ?, vector8(?), ?, ?, ?);
The parameter for vector8() is a JSON-stringified float array (e.g. 384 dimensions from a model like all-MiniLM-L6-v2). Turso handles the quantization internally:
db.prepare(`
INSERT INTO memories (id, content, embedding, category, created_at)
VALUES (?, ?, vector8(?), ?, ?)
`).run(id, content, JSON.stringify(Array.from(embedding)), "correction", Date.now());
Retrieving relevant memories
When a new task starts, find the most relevant memories using Turso’s vector_distance_cos() function:
SELECT id, content, category, created_at, retrieval_count,
vector_distance_cos(embedding, vector8(?)) AS distance
FROM memories
WHERE embedding IS NOT NULL
ORDER BY distance ASC
LIMIT ?;
vector_distance_cos() returns cosine distance (0 = identical, 2 = opposite). Lower is better. To convert to similarity: 1.0 - distance.
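As a sanity check on what the query is ranking by, cosine distance can be computed in plain TypeScript. This is an illustrative reimplementation of the formula, not Turso's code:

```typescript
// Cosine distance: 1 - (a·b) / (|a| * |b|).
// 0 means identical direction, 1 orthogonal, 2 opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```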
After retrieval, update the memory’s metadata:
UPDATE memories SET last_retrieved = ?, retrieval_count = retrieval_count + 1
WHERE id = ?;
Correcting wrong memories
When a memory turns out to be wrong, delete it and store a corrected version:
DELETE FROM memories WHERE id = ?;
INSERT INTO memories (id, content, embedding, category, created_at)
VALUES (?, ?, vector8(?), 'correction', ?);
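The delete-and-reinsert pair can be wrapped in one helper. The `correctMemory` name and the minimal `Db` type are assumptions for illustration:

```typescript
// Minimal shape of the database handle this helper needs.
interface Db {
  prepare(sql: string): { run(...args: unknown[]): unknown };
}

// Replace a wrong memory: delete the old row, insert a corrected one
// under the 'correction' category.
function correctMemory(
  db: Db,
  oldId: string,
  newId: string,
  content: string,
  embedding: number[],
): void {
  db.prepare(`DELETE FROM memories WHERE id = ?`).run(oldId);
  db.prepare(
    `INSERT INTO memories (id, content, embedding, category, created_at)
     VALUES (?, ?, vector8(?), 'correction', ?)`,
  ).run(newId, content, JSON.stringify(embedding), Date.now());
}
```

In production you would likely wrap the two statements in a transaction so a crash between them cannot drop the memory without its replacement.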
Task tracking
Record tasks to understand what the agent has been working on:
-- Start a task
INSERT INTO tasks (id, description, embedding, started_at)
VALUES (?, ?, vector8(?), ?);
-- Complete a task
UPDATE tasks SET finished_at = ? WHERE id = ?;
Purging old memories
Remove memories that are older than a cutoff and have never been retrieved:
DELETE FROM memories
WHERE retrieval_count = 0 AND created_at < ?;
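A sketch of running this purge with an age threshold, assuming the same minimal `Db` shape as earlier snippets (`purgeUnretrieved` is an illustrative name, not an API):

```typescript
// Minimal shape of the database handle this helper needs.
interface Db {
  prepare(sql: string): { run(...args: unknown[]): unknown };
}

// Delete memories created more than maxAgeDays ago that were never retrieved.
function purgeUnretrieved(db: Db, maxAgeDays: number, now = Date.now()): void {
  const cutoff = now - maxAgeDays * 86_400_000; // 86,400,000 ms per day
  db.prepare(
    `DELETE FROM memories WHERE retrieval_count = 0 AND created_at < ?`,
  ).run(cutoff);
}
```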
Key design points
- Vector search uses Turso’s vector8() and vector_distance_cos() for cosine similarity directly in SQL. Int8 quantization reduces storage by 75% compared to float32 with minimal impact on search quality.
- For small per-project datasets, a full table scan with ORDER BY distance LIMIT k is fast enough without a vector index.
- Everything runs in a single embedded database file with no external services.
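The 75% figure follows from byte counts: 384 float32 components take 1536 bytes, while one byte per component takes 384. A symmetric int8 quantizer makes the trade-off concrete (illustrative only; Turso's internal quantization scheme may differ):

```typescript
// Quantize each component to one signed byte using a per-vector scale.
function quantize(v: number[]): { scale: number; q: Int8Array } {
  const maxAbs = Math.max(1e-12, ...v.map(Math.abs));
  const scale = maxAbs / 127;
  return { scale, q: Int8Array.from(v, (x) => Math.round(x / scale)) };
}

// Recover approximate floats: error per component is at most scale / 2.
function dequantize(scale: number, q: Int8Array): number[] {
  return Array.from(q, (x) => x * scale);
}
```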
Going further
The schema above is deliberately minimal. In practice you will want to experiment with strategies for keeping the memory store useful as it grows. Some ideas:
- Weighted memories — add a weight REAL column and boost or penalize memories based on whether the agent found them useful after retrieval. Use an exponential moving average to update weights over time.
- Time decay — multiply a recency factor into the retrieval ranking so that stale memories rank lower. For example: (1.0 - distance) * POWER(0.95, days_since_last_retrieved).
- Garbage collection — periodically remove memories with low weight and high retrieval count (retrieved often, never useful), or memories that have never been retrieved after a threshold period.
- Task scoring — track outcome metrics (tokens used, errors, corrections) per task and use them to compute a credit signal for the memories that were retrieved during that task.
- Retrieval attribution — add a join table (memory_id, task_id, similarity, credit) to record exactly which memories were used in each task and how they were rated.
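Two of these ideas fit in a few lines. The decay formula comes from the list above; the function names and the 0.2 learning rate are assumptions:

```typescript
// Time-decayed relevance: (1 - distance) * 0.95^daysSinceLastRetrieved.
function rankScore(distance: number, lastRetrievedMs: number, nowMs: number): number {
  const days = Math.max(0, nowMs - lastRetrievedMs) / 86_400_000;
  return (1 - distance) * Math.pow(0.95, days);
}

// Exponential moving average weight update: nudge the weight toward 1
// when a retrieved memory proved useful, toward 0 otherwise.
function updateWeight(weight: number, useful: boolean, alpha = 0.2): number {
  return weight + alpha * ((useful ? 1 : 0) - weight);
}
```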
There is no single best approach — the right strategy depends on your agent’s workload, how often memories are created, and how aggressively you want to prune. Start simple and iterate.
Example
memelord is an MCP server and hooks system that implements this pattern with reinforcement learning and weight decay as a persistent memory layer for AI coding agents.