Turso and libSQL enable vector search without an extension.

This feature is currently in technical preview. Join us in Discord to provide feedback and report any issues.

Full support for vector search in the Turso platform starts from version v0.24.24
(use the turso group show <group-name> command to check the group version).

How it works

  • Create a table with one or more vector columns (e.g. FLOAT32)
  • Provide vector values in binary format or convert text representation to binary using the appropriate conversion function (e.g. vector32(...))
  • Calculate vector similarity between vectors in the table or from the query itself using dedicated vector functions (e.g. vector_distance_cos)
  • Create a special vector index to speed up nearest-neighbor queries (use the libsql_vector_idx(column) expression in the CREATE INDEX statement to create a vector index)
  • Query the index with the special vector_top_k(idx_name, q_vector, k) table-valued function

Vectors

Types

LibSQL uses the native SQLite BLOB storage class for vector columns. To align with SQLite affinity rules, all type names have two alternatives: one that is easy to type and another with a _BLOB suffix that is consistent with affinity rules.

We suggest library authors use type names with the _BLOB suffix to make results more generic and universal. For regular applications, developers can choose either alternative, as the type name only serves as a hint for SQLite and external extensions.

As LibSQL does not introduce a new storage class, all metadata about vectors is also encoded in the BLOB itself. This comes at the cost of a few bytes per row but greatly simplifies the design of the feature.

The table below lists the six vector types currently supported by LibSQL. Types are listed from more precise and storage-heavy to more compact but less precise alternatives (D, the number of dimensions in the vector, is used to estimate storage requirements for a single vector).

| Type name | Storage (bytes) | Description |
|---|---|---|
| FLOAT64 / F64_BLOB | 8D + 1 | Implementation of the IEEE 754 double precision format for 64-bit floating point numbers |
| FLOAT32 / F32_BLOB | 4D | Implementation of the IEEE 754 single precision format for 32-bit floating point numbers |
| FLOAT16 / F16_BLOB | 2D + 1 | Implementation of the IEEE 754-2008 half precision format for 16-bit floating point numbers |
| FLOATB16 / FB16_BLOB | 2D + 1 | Implementation of the bfloat16 format for 16-bit floating point numbers |
| FLOAT8 / F8_BLOB | D + 14 | LibSQL-specific implementation which compresses each vector component to a single u8 byte b and reconstructs the value from it using a simple transformation: shift + alpha · b |
| FLOAT1BIT / F1BIT_BLOB | ⌈D/8⌉ + 3 | LibSQL-specific implementation which compresses each vector component down to 1 bit and packs multiple components into a single machine word, achieving a very compact representation |
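The FLOAT8 row describes reconstructing each component as shift + alpha · b. The page does not say how shift and alpha are chosen, so the Python sketch below assumes the simplest option (shift is the smallest component, alpha spreads the value range over the 256 byte codes). The function names are hypothetical and this is not LibSQL's actual encoder.

```python
def quantize_u8(values):
    """Compress floats to one byte each; returns (shift, alpha, codes)."""
    lo, hi = min(values), max(values)
    shift = lo
    # Assumption: map the [lo, hi] range evenly onto byte codes 0..255.
    alpha = (hi - lo) / 255 if hi > lo else 1.0
    codes = bytes(min(255, max(0, round((v - shift) / alpha))) for v in values)
    return shift, alpha, codes


def dequantize_u8(shift, alpha, codes):
    """Reconstruct approximate values as shift + alpha * b for each byte b."""
    return [shift + alpha * b for b in codes]


original = [0.800, 0.579, 0.481, 0.229]
shift, alpha, codes = quantize_u8(original)
restored = dequantize_u8(shift, alpha, codes)
print(len(codes), restored)
```

Under this assumption each restored component differs from the original by at most alpha / 2, which is the precision given up for storing one byte per dimension.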

For most applications, the FLOAT32 type should be a good starting point, but you may want to explore more compact options if your table has a large number of rows with vectors.
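To compare the options for a concrete table, the storage column can be turned into a small calculator. This is a minimal Python sketch: the per-type formulas are copied from the table above, while the 1536-dimension, one-million-row workload is a hypothetical example.

```python
# Bytes needed for one vector with `d` dimensions, per the table above.
STORAGE_BYTES = {
    "FLOAT64": lambda d: 8 * d + 1,
    "FLOAT32": lambda d: 4 * d,
    "FLOAT16": lambda d: 2 * d + 1,
    "FLOATB16": lambda d: 2 * d + 1,
    "FLOAT8": lambda d: d + 14,
    "FLOAT1BIT": lambda d: (d + 7) // 8 + 3,  # ceil(d / 8) + 3
}


def vector_storage(type_name, dims, rows=1):
    """Estimate bytes used by `rows` vectors of the given type and dimension."""
    return STORAGE_BYTES[type_name](dims) * rows


# Hypothetical workload: one million 1536-dimensional embeddings.
for name in STORAGE_BYTES:
    print(f"{name:10} {vector_storage(name, 1536, rows=1_000_000):>15,} bytes")
```

The estimate covers vector payloads only; other columns and any vector index add to it.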

While FLOAT16 and FLOATB16 use the same amount of storage, they provide different trade-offs between speed and accuracy. Generally, operations over bfloat16 are faster but come at the expense of lower precision.
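The precision difference comes from how the 16 bits are split: half precision keeps 10 mantissa bits, bfloat16 keeps only 7 but has the same exponent range as float32. The Python sketch below round-trips a value through both formats. The bfloat16 conversion simply truncates the low 16 bits of the float32 encoding, which is an assumption made for illustration; LibSQL may round differently.

```python
import struct


def to_float16(x):
    """Round-trip x through IEEE 754 half precision (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x):
    """Round-trip x through bfloat16 (7 mantissa bits).

    Assumption: truncate the low 16 bits of the float32 encoding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


x = 0.579
print("float16 :", to_float16(x), "error", abs(to_float16(x) - x))
print("bfloat16:", to_bfloat16(x), "error", abs(to_bfloat16(x) - x))
```

For values like this one, the half-precision error is noticeably smaller than the bfloat16 error.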

Functions

To work with vectors, LibSQL provides several functions that operate in the vector domain. Each function understands vectors in binary format aligned with the six types described above or in text format as a single JSON array of numbers.
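On the client side, both input forms can be produced from an ordinary list of floats. The Python sketch below builds the JSON text form and a candidate binary form for a FLOAT32 vector. The binary layout (components packed back to back as little-endian 32-bit floats, matching the 4D bytes listed for FLOAT32) is an assumption; the text form is the safer choice when in doubt.

```python
import json
import struct


def to_text(vec):
    """Text form: a single JSON array of numbers."""
    return json.dumps(vec)


def to_f32_blob(vec):
    """Binary form for FLOAT32.

    Assumption: little-endian f32 components with no extra header or trailer.
    """
    return struct.pack(f"<{len(vec)}f", *vec)


embedding = [0.800, 0.579, 0.481, 0.229]
print(to_text(embedding))           # pass as text, e.g. to vector32(?)
print(len(to_f32_blob(embedding)))  # 4 components * 4 bytes
```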

Currently, LibSQL supports the following functions:

| Function name | Description |
|---|---|
| vector64 / vector32 / vector16 / vectorb16 / vector8 / vector1bit | Conversion function which accepts a valid vector and converts it to the corresponding target type |
| vector | Alias for the vector32 conversion function |
| vector_extract | Extraction function which accepts a valid vector and returns its text representation |
| vector_distance_cos | Cosine distance (1 - cosine similarity) function which operates over vectors of the same type and dimensionality |
| vector_distance_l2 | Euclidean distance function which operates over vectors of the same type and dimensionality |
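For reference, the two distance definitions can be written out in a few lines of Python. This sketch shows the math only, not LibSQL's implementation, and it does not handle zero-length vectors, for which cosine distance is undefined.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(l2_distance([0.0, 0.0], [3.0, 4.0]))      # 3-4-5 triangle
```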

Vectors usage

1. Create a table

Begin by declaring a column used for storing vectors with the F32_BLOB datatype:

CREATE TABLE movies (
  title     TEXT,
  year      INT,
  embedding F32_BLOB(4) -- 4-dimensional f32 vector
);

The number in parentheses (4) specifies the dimensionality of the vector. This means each vector in this column will have exactly 4 components.

2. Generate and insert embeddings

Once you generate embeddings for your data (via an LLM), you can insert them into your table:

INSERT INTO movies (title, year, embedding)
VALUES
  ('Napoleon', 2023, vector32('[0.800, 0.579, 0.481, 0.229]')),
  ('Black Hawk Down', 2001, vector32('[0.406, 0.027, 0.378, 0.056]')),
  ('Gladiator', 2000, vector32('[0.698, 0.140, 0.073, 0.125]')),
  ('Blade Runner', 1982, vector32('[0.379, 0.637, 0.011, 0.647]'));

Popular tools like LangChain, Hugging Face or OpenAI can be used to generate embeddings.

3. Perform a vector similarity search

You can now write queries combining vectors and standard SQLite data:

SELECT title,
       vector_extract(embedding),
       vector_distance_cos(embedding, vector32('[0.064, 0.777, 0.661, 0.687]'))
FROM movies
ORDER BY 
       vector_distance_cos(embedding, vector32('[0.064, 0.777, 0.661, 0.687]'))
ASC;

The vector_distance_cos function calculates the cosine distance, which equals 1 - cosine similarity. Therefore, a smaller distance indicates that the vectors are closer to each other.
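To sanity-check the ordering such a query should produce, the same ranking can be reproduced outside the database. The Python sketch below recomputes cosine distance for the four sample rows and sorts them; it is a brute-force check, not how LibSQL evaluates the query.

```python
import math

# Sample rows from the INSERT statement above.
movies = {
    "Napoleon": [0.800, 0.579, 0.481, 0.229],
    "Black Hawk Down": [0.406, 0.027, 0.378, 0.056],
    "Gladiator": [0.698, 0.140, 0.073, 0.125],
    "Blade Runner": [0.379, 0.637, 0.011, 0.647],
}
query = [0.064, 0.777, 0.661, 0.687]


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


# Smallest distance first, mirroring ORDER BY ... ASC.
ranked = sorted(movies, key=lambda title: cosine_distance(movies[title], query))
for title in ranked:
    print(f"{title:16} {cosine_distance(movies[title], query):.4f}")
```

With this data, Blade Runner is the closest match, followed by Napoleon, Black Hawk Down, and Gladiator.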

Vector Limitations

  • Euclidean distance is not supported for 1-bit FLOAT1BIT vectors
  • LibSQL can only operate on vectors with no more than 65536 dimensions

Indexing

Nearest neighbors (NN) queries are popular for various AI-powered applications (RAG uses NN queries to extract relevant information, and recommendation engines can suggest items based on embedding similarity).

LibSQL implements the DiskANN algorithm to speed up approximate nearest neighbors queries for tables with vector columns.

The DiskANN algorithm trades search accuracy for speed, so LibSQL queries may return slightly suboptimal neighbors for tables with a large number of rows.
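One common way to quantify "slightly suboptimal" is recall@k: the fraction of the true k nearest neighbors that the approximate search also returned. The Python sketch below is a generic helper for such a measurement; the row ids in the example are hypothetical, and the exact neighbor list would come from a brute-force query ordered by distance.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact nearest neighbors found by the approximate search."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical ids: the index returned rows 1, 2, 4; the true top 3 are 1, 2, 3.
print(recall_at_k([1, 2, 4], [1, 2, 3]))
```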

Vector Index

LibSQL introduces a custom index type that helps speed up nearest neighbors queries against a fixed distance function (cosine similarity by default).

From a syntax perspective, the vector index differs from ordinary application-defined B-Tree indices in that it must wrap the vector column in the libsql_vector_idx marker function, like this:

CREATE INDEX movies_idx ON movies (libsql_vector_idx(embedding));

A vector index works only for a column with one of the vector types described above.

The vector index is fully integrated into the LibSQL core, so it inherits all operations and most features from ordinary indices:

  • An index created for a table with existing data will be automatically populated with this data
  • All updates to the base table will be automatically reflected in the index
  • You can rebuild the index from scratch using the REINDEX movies_idx command
  • You can drop the index with the DROP INDEX movies_idx command
  • You can create a partial vector index with a custom filtering rule:
CREATE INDEX movies_idx ON movies (libsql_vector_idx(embedding)) 
WHERE year >= 2000;

Query

At the moment, a vector index must be queried explicitly with the special vector_top_k(idx_name, q_vector, k) table-valued function. The function accepts the index name, the query vector, and the number of neighbors to return. It searches for the k approximate nearest neighbors and returns the ROWID of each matching row, or the PRIMARY KEY if the base table does not have a ROWID.

For the table-valued function to work, the query vector must have the same vector type and the same dimensionality as the indexed column.
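Conceptually, the function maps a query vector to a short list of row identifiers. The Python sketch below shows that contract with an exact, brute-force search; the real index is approximate and uses the metric it was built with, whereas this stand-in uses Euclidean distance via math.dist (Python 3.8+). The function name and sample rows are hypothetical.

```python
import heapq
import math


def top_k_exact(rows, query, k):
    """Return the ids of the k rows whose vectors are closest to `query`.

    `rows` is an iterable of (rowid, vector) pairs. math.dist raises
    ValueError when dimensionalities differ, mirroring the requirement above.
    """
    nearest = heapq.nsmallest(k, rows, key=lambda row: math.dist(row[1], query))
    return [rowid for rowid, _ in nearest]


rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [5.0, 5.0])]
print(top_k_exact(rows, [0.9, 0.9], k=2))
```

Only identifiers come back, which is why results are joined to the base table to fetch other columns.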

Settings

A LibSQL vector index can optionally accept settings, which are passed as variadic parameters of the libsql_vector_idx function, each as a string in the format key=value:

CREATE INDEX movies_idx 
ON movies(libsql_vector_idx(embedding, 'metric=l2', 'compress_neighbors=float8'));
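When index definitions are generated from application code, the key=value strings can be assembled programmatically. The Python helper below is a hypothetical convenience, not part of any libSQL client. It performs no identifier quoting or validation, so it should only be used with trusted names and values.

```python
def vector_index_sql(index, table, column, **settings):
    """Build a CREATE INDEX statement with libsql_vector_idx settings.

    Each keyword argument becomes one 'key=value' string parameter.
    """
    args = [column] + [f"'{key}={value}'" for key, value in settings.items()]
    return f"CREATE INDEX {index} ON {table} (libsql_vector_idx({', '.join(args)}));"


print(vector_index_sql("movies_idx", "movies", "embedding",
                       metric="l2", compress_neighbors="float8"))
```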

At the moment, LibSQL supports the following settings:

| Setting key | Value type | Description |
|---|---|---|
| metric | cosine / l2 | Which distance function to use for building the index. Default: cosine |
| max_neighbors | positive integer | How many neighbors to store for every node in the DiskANN graph. A lower value reduces index storage at the cost of search precision. Default: 3√D, where D is the dimensionality of the vector column |
| compress_neighbors | float1bit / float8 / float16 / floatb16 / float32 | Which vector type to use for storing the neighbors of every node in the DiskANN graph. A more compact type reduces index storage at the cost of search precision. Default: no compression (neighbors have the same type as the base table) |
| alpha | positive float ≥ 1 | "Density" parameter of the general sparse neighborhood graph built by the DiskANN algorithm. A lower value produces a sparser graph, which can speed up queries at the cost of search precision. Default: 1.2 |
| search_l | positive integer | Limits the number of neighbors visited during vector search. A lower value makes search queries faster at the cost of search precision. Default: 200 |
| insert_l | positive integer | Limits the number of neighbors visited during vector insert. A lower value makes inserts faster at the cost of the DiskANN graph's navigability. Default: 70 |

A vector index for a column of type T1 with max_neighbors=M and compress_neighbors=T2 will use approximately N · (Storage(T1) + M · Storage(T2)) bytes of storage for N rows.
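The estimate is easy to evaluate for a planned table. The Python sketch below applies the formula with the per-type storage sizes from the Types section. The default for max_neighbors is taken as 3√D rounded up, and that rounding is an assumption; the 1024-dimension example is hypothetical.

```python
import math

# Bytes per vector of dimension d, from the Types section.
VECTOR_BYTES = {
    "float64": lambda d: 8 * d + 1,
    "float32": lambda d: 4 * d,
    "float16": lambda d: 2 * d + 1,
    "floatb16": lambda d: 2 * d + 1,
    "float8": lambda d: d + 14,
    "float1bit": lambda d: (d + 7) // 8 + 3,
}


def index_storage(rows, dims, column_type="float32",
                  max_neighbors=None, compress_neighbors=None):
    """Approximate index size: N * (Storage(T1) + M * Storage(T2))."""
    if max_neighbors is None:
        # Assumption: the 3 * sqrt(D) default is rounded up to an integer.
        max_neighbors = math.ceil(3 * math.sqrt(dims))
    neighbor_type = compress_neighbors or column_type  # default: no compression
    per_row = (VECTOR_BYTES[column_type](dims)
               + max_neighbors * VECTOR_BYTES[neighbor_type](dims))
    return rows * per_row


print(index_storage(1_000_000, 1024))
print(index_storage(1_000_000, 1024, compress_neighbors="float8"))
```

Comparing the two printed values shows how much compress_neighbors can shrink the index, since neighbor storage dominates the per-row cost.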

Index usage

1. Create a table

Begin by declaring a column used for storing vectors with the F32_BLOB datatype:

CREATE TABLE movies (
  title     TEXT,
  year      INT,
  embedding F32_BLOB(4) -- 4-dimensional f32 vector
);

The number in parentheses (4) specifies the dimensionality of the vector. This means each vector in this column will have exactly 4 components.

2. Generate and insert embeddings

Once you generate embeddings for your data (via an LLM), you can insert them into your table:

INSERT INTO movies (title, year, embedding)
VALUES
  ('Napoleon', 2023, vector32('[0.800, 0.579, 0.481, 0.229]')),
  ('Black Hawk Down', 2001, vector32('[0.406, 0.027, 0.378, 0.056]')),
  ('Gladiator', 2000, vector32('[0.698, 0.140, 0.073, 0.125]')),
  ('Blade Runner', 1982, vector32('[0.379, 0.637, 0.011, 0.647]'));

Popular tools like LangChain, Hugging Face or OpenAI can be used to generate embeddings.

3. Create an index

Create an index using the libsql_vector_idx function:

CREATE INDEX movies_idx ON movies(libsql_vector_idx(embedding));

This creates an index optimized for vector similarity searches on the embedding column.

The libsql_vector_idx marker function is required and used by libSQL to distinguish ANN-indices from ordinary B-Tree indices.

4. Query the indexed table

SELECT title, year 
FROM vector_top_k('movies_idx', vector32('[0.064, 0.777, 0.661, 0.687]'), 3)
JOIN movies ON movies.rowid = id
WHERE year >= 2020;

This query uses the vector_top_k table-valued function to efficiently find the top 3 most similar vectors to [0.064, 0.777, 0.661, 0.687] using the index.
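Note that the WHERE clause is applied to the rows that vector_top_k has already returned, so the query can yield fewer than 3 rows. The Python sketch below mimics the query on the sample data, with an exact cosine search standing in for the index (an assumption; the real index is approximate).

```python
import math

MOVIES = [
    ("Napoleon", 2023, [0.800, 0.579, 0.481, 0.229]),
    ("Black Hawk Down", 2001, [0.406, 0.027, 0.378, 0.056]),
    ("Gladiator", 2000, [0.698, 0.140, 0.073, 0.125]),
    ("Blade Runner", 1982, [0.379, 0.637, 0.011, 0.647]),
]


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k_then_filter(query, k, min_year):
    """Take the k nearest rows first, then apply the year filter."""
    nearest = sorted(MOVIES, key=lambda m: cosine_distance(m[2], query))[:k]
    return [(title, year) for title, year, _ in nearest if year >= min_year]


result = top_k_then_filter([0.064, 0.777, 0.661, 0.687], k=3, min_year=2020)
print(result)
```

On this data the three nearest rows are Blade Runner, Napoleon, and Black Hawk Down, and only Napoleon passes the year filter. If a fixed number of filtered results is needed, one option is to request a larger k and limit the result afterwards.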

Index limitations

  • A vector index works only for tables with a ROWID or with a single-column PRIMARY KEY. A composite PRIMARY KEY without a ROWID is not supported.