Vector Search Quickstart
This notebook shows the core vector search workflow in ApertureDB:
- Create a DescriptorSet (vector index)
- Add Descriptors (embeddings with metadata)
- Run KNN search to find similar items
For real embeddings and runnable end-to-end examples, jump straight to the notebooks linked at the bottom.
Connect to ApertureDB
Option A: ApertureDB Cloud (recommended)
Sign up for a free 30-day trial, get your key from Connect > Generate API Key, and add it to a .env file in this directory:
APERTUREDB_KEY=your_key_here
Option B: Community Edition (local Docker)
Run this in a terminal before starting the notebook:
docker run -d --name aperturedb \
-p 55555:55555 -e ADB_MASTER_KEY=admin -e ADB_FORCE_SSL=false \
aperturedata/aperturedb-community
See client configuration options for all connection methods and server setup options for deployment choices.
%pip install --upgrade --quiet aperturedb python-dotenv
# Option A: ApertureDB Cloud
from dotenv import load_dotenv
load_dotenv() # loads APERTUREDB_KEY from .env into the environment
True
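If the connection step later fails, it can help to confirm that the key actually reached the environment. The following is a minimal sketch; the helper name key_is_set is hypothetical and not part of the aperturedb package.

```python
import os


def key_is_set(name="APERTUREDB_KEY", env=None):
    """Return True if the variable exists and is not blank."""
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


# Prints False if load_dotenv() did not find a .env file or the key is empty.
print("APERTUREDB_KEY loaded:", key_is_set())
```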
# Option B: Community Edition (local Docker)
# !adb config create localdb --active \
# --host localhost --port 55555 \
# --username admin --password admin \
# --no-use-ssl --no-interactive
from aperturedb.CommonLibrary import create_connector
client = create_connector()
response, _ = client.query([{"GetStatus": {}}])
client.print_last_response()
[
{
"GetStatus": {
"info": "OK",
"status": 0,
"system": "ApertureDB",
"version": "0.19.6"
}
}
]
Create a Vector Index
A DescriptorSet is a named, indexed collection of vectors. All vectors in a set must have the same number of dimensions.
SET_NAME = "recipe_search"
client.query([{
"AddDescriptorSet": {
"name": SET_NAME,
"dimensions": 4, # use 384, 512, 1024, etc. for real models
"engine": "FaissFlat", # exact search; use HNSW for large-scale ANN
"metric": "CS", # cosine similarity; or "L2" for Euclidean
}
}])
client.print_last_response()
[
{
"AddDescriptorSet": {
"status": 0
}
}
]
Add Vectors
Each Descriptor is a float32 vector plus optional metadata properties. The vector is passed as a binary blob.
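To make the "binary blob" concrete, here is a small NumPy-only sketch of what tobytes() produces for a 4-dimensional float32 vector, and how the bytes map back to the original values. It runs without a server.

```python
import numpy as np

vec = np.array([0.9, 0.1, 0.8, 0.2], dtype="float32")
blob = vec.tobytes()

# 4 dimensions x 4 bytes per float32 value
print(len(blob))

# The blob decodes back to the same values
restored = np.frombuffer(blob, dtype="float32")
print(np.array_equal(vec, restored))

# NumPy defaults to float64, which doubles the byte length,
# so the blob would no longer represent 4 float32 values.
wrong = np.array([0.9, 0.1, 0.8, 0.2]).tobytes()
print(len(wrong))
```

Setting dtype="float32" explicitly, as the notebook does, avoids the float64 default.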
import numpy as np
dishes = [
{"name": "Butter Chicken", "cuisine": "Indian", "vec": [0.9, 0.1, 0.8, 0.2]},
{"name": "Rajma Chawal", "cuisine": "Indian", "vec": [0.8, 0.2, 0.9, 0.1]},
{"name": "Ramen", "cuisine": "Japanese", "vec": [0.1, 0.9, 0.2, 0.8]},
{"name": "Sushi", "cuisine": "Japanese", "vec": [0.2, 0.8, 0.1, 0.9]},
{"name": "Focaccia", "cuisine": "Italian", "vec": [0.5, 0.5, 0.6, 0.4]},
]
for dish in dishes:
    vec = np.array(dish["vec"], dtype="float32")
    client.query([{
        "AddDescriptor": {
            "set": SET_NAME,
            "properties": {"name": dish["name"], "cuisine": dish["cuisine"]},
        }
    }], [vec.tobytes()])
print(f"Added {len(dishes)} descriptors")
Added 5 descriptors
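The loop above sends one query per vector. As a sketch, the same commands could be collected and sent in a single query, assuming blobs are matched to commands in order. The block below only builds the two lists so it runs without a server; the final call is left as a comment.

```python
import numpy as np

SET_NAME = "recipe_search"
dishes = [
    {"name": "Butter Chicken", "cuisine": "Indian", "vec": [0.9, 0.1, 0.8, 0.2]},
    {"name": "Ramen", "cuisine": "Japanese", "vec": [0.1, 0.9, 0.2, 0.8]},
]

commands, blobs = [], []
for dish in dishes:
    commands.append({
        "AddDescriptor": {
            "set": SET_NAME,
            "properties": {"name": dish["name"], "cuisine": dish["cuisine"]},
        }
    })
    # One blob per AddDescriptor command, in the same order
    blobs.append(np.array(dish["vec"], dtype="float32").tobytes())

print(len(commands), len(blobs))
# With a live connection (assumption: one blob is consumed per command, in order):
# client.query(commands, blobs)
```

For large datasets, the Bulk Embeddings notebook listed under Next Steps covers ParallelLoader.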
KNN Search
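FindDescriptor takes a query vector and returns the k nearest neighbors, ranked by the set's metric. Because this set uses "CS", the reported _distance is a cosine similarity, so higher values mean closer matches.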
query_vec = np.array([0.85, 0.15, 0.85, 0.15], dtype="float32") # close to Indian dishes
response, _ = client.query([{
"FindDescriptor": {
"set": SET_NAME,
"k_neighbors": 3,
"distances": True,
"results": {"all_properties": True},
}
}], [query_vec.tobytes()])
client.print_last_response()
[
{
"FindDescriptor": {
"entities": [
{
"_distance": 0.9966610670089722,
"_set_name": "recipe_search",
"_uniqueid": "3.192.488740",
"cuisine": "Indian",
"name": "Butter Chicken"
},
{
"_distance": 0.9966610670089722,
"_set_name": "recipe_search",
"_uniqueid": "3.193.488760",
"cuisine": "Indian",
"name": "Rajma Chawal"
},
{
"_distance": 0.867941677570343,
"_set_name": "recipe_search",
"_uniqueid": "3.196.488820",
"cuisine": "Italian",
"name": "Focaccia"
}
],
"returned": 3,
"status": 0
}
}
]
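The reported values can be reproduced locally. The sketch below recomputes cosine similarity for the five sample vectors with NumPy; it is an independent check rather than ApertureDB's internal implementation, and the helper name cosine_similarity is illustrative.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query = [0.85, 0.15, 0.85, 0.15]
vectors = {
    "Butter Chicken": [0.9, 0.1, 0.8, 0.2],
    "Rajma Chawal": [0.8, 0.2, 0.9, 0.1],
    "Ramen": [0.1, 0.9, 0.2, 0.8],
    "Sushi": [0.2, 0.8, 0.1, 0.9],
    "Focaccia": [0.5, 0.5, 0.6, 0.4],
}

scores = {name: cosine_similarity(query, vec) for name, vec in vectors.items()}

# Higher similarity means a closer match, so sort in descending order
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked[:3]:
    print(f"{name:<16} {score:.4f}")
```

The two Indian dishes score about 0.9967 and Focaccia about 0.8679, matching the _distance values in the response above to four decimal places.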
Python SDK: Descriptors Wrapper
The Descriptors class in the Python SDK wraps the query language and adds reranking with maximal marginal relevance (MMR).
from aperturedb.Descriptors import Descriptors
descriptors = Descriptors(client)
# Basic similarity search — distances available
descriptors.find_similar(
set=SET_NAME,
vector=query_vec,
k_neighbors=3,
distances=True,
)
print("find_similar:")
for r in descriptors.response:
    print(f" {r['name']:<20} distance={r['_distance']:.4f}")
print()
# MMR: diversify results (avoids near-duplicates)
# Note: find_similar_mmr uses blobs internally for reranking;
# _distance is not available in the output.
descriptors.find_similar_mmr(
set=SET_NAME,
vector=query_vec,
k_neighbors=3,
fetch_k=5,
lambda_mult=0.5, # 0.0 = max diversity, 1.0 = similarity only
)
print("find_similar_mmr (diversified):")
for r in descriptors.response:
    print(f" {r['name']:<20} cuisine={r['cuisine']}")
find_similar:
Butter Chicken distance=0.9967
Rajma Chawal distance=0.9967
Focaccia distance=0.8679
find_similar_mmr (diversified):
Butter Chicken cuisine=Indian
Rajma Chawal cuisine=Indian
Focaccia cuisine=Italian
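For intuition about what lambda_mult trades off, here is a generic MMR sketch in plain NumPy. It follows the textbook formulation (relevance to the query minus similarity to items already selected) and is not the SDK's implementation, so its ordering may differ from the output above. The function name mmr and its signature are illustrative.

```python
import numpy as np


def mmr(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    relevance = c @ q   # cosine similarity of each candidate to the query
    pairwise = c @ c.T  # cosine similarity between candidates

    selected = [int(np.argmax(relevance))]  # start with the most relevant item
    while len(selected) < min(k, len(c)):
        remaining = [i for i in range(len(c)) if i not in selected]
        # Redundancy: similarity to the closest already-selected item
        redundancy = pairwise[np.ix_(remaining, selected)].max(axis=1)
        scores = lambda_mult * relevance[remaining] - (1 - lambda_mult) * redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected


names = ["Butter Chicken", "Rajma Chawal", "Ramen", "Sushi", "Focaccia"]
vectors = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.2, 0.8, 0.1, 0.9],
    [0.5, 0.5, 0.6, 0.4],
])
query = np.array([0.85, 0.15, 0.85, 0.15])

picked = mmr(query, vectors, k=3, lambda_mult=0.5)
print([names[i] for i in picked])
```

With lambda_mult=1.0 the redundancy term vanishes and the selection reduces to plain similarity ranking; lower values penalize candidates that resemble items already chosen.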
Cleanup
client.query([{"DeleteDescriptorSet": {"with_name": SET_NAME}}])
client.print_last_response()
[
{
"DeleteDescriptorSet": {
"count": 1,
"status": 0
}
}
]
Next Steps
Replace the synthetic vectors above with real embeddings from your data:
| Data type | Notebook |
|---|---|
| Text / documents | Recipe Text Search: sentence-transformers on Cookbook dish descriptions |
| Text / documents | Work with PDFs: chunk, embed, and search a PDF blob |
| Images | Image Vector Search: CLIP embeddings on dish images, text-to-image search |
| Video frames | Video Vector Search: CLIP frame embeddings, text-to-frame search |
| Audio | Audio Vector Search: audio embedding and search |
| Bulk loading | Bulk Embeddings: ParallelLoader for large-scale ingestion |
| Hybrid search | Hybrid Search: combine KNN with metadata filters |
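As a preview of the hybrid search row, the sketch below builds a FindDescriptor query that restricts the KNN search to one cuisine. It assumes FindDescriptor accepts a "constraints" parameter in the form {"property": ["==", value]}; check the query language reference before relying on it. The helper name build_filtered_knn_query is hypothetical. The block only constructs the query, so it runs without a server.

```python
import json


def build_filtered_knn_query(set_name, k, cuisine):
    """Build a KNN query limited to descriptors with a matching 'cuisine' property."""
    return [{
        "FindDescriptor": {
            "set": set_name,
            "k_neighbors": k,
            "distances": True,
            # Assumption: metadata filter syntax of the ApertureDB query language
            "constraints": {"cuisine": ["==", cuisine]},
            "results": {"all_properties": True},
        }
    }]


query = build_filtered_knn_query("recipe_search", 3, "Indian")
print(json.dumps(query, indent=2))
# With a live connection and an existing set:
# client.query(query, [query_vec.tobytes()])
```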