
Text Embedding Models

ApertureDB stores text embeddings as Descriptors in a DescriptorSet. Any model that produces a fixed-size float vector works — open-source local models and API-based models both follow the same pattern.


For setup and client configuration, see Client Configuration. For server setup options, see Server Setup.


Sentence Transformers

sentence-transformers provides lightweight open-source models that run locally without an API key.

pip install -U aperturedb sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np
from aperturedb.CommonLibrary import create_connector

client = create_connector()
model = SentenceTransformer("all-MiniLM-L6-v2") # 384-dimensional

recipes = [
    {"name": "Butter Chicken", "description": "Creamy tomato-based curry with tender chicken pieces", "cuisine": "Indian"},
    {"name": "Focaccia", "description": "Italian flatbread topped with olive oil, rosemary, and sea salt", "cuisine": "Italian"},
    {"name": "Crepe Flambe", "description": "Thin French pancakes flambeed with Grand Marnier and orange zest", "cuisine": "French"},
    {"name": "Rajma Chawal", "description": "Red kidney beans slow-cooked in spiced tomato gravy over rice", "cuisine": "Indian"},
    {"name": "Ramen", "description": "Japanese noodle soup with soft-boiled egg, nori, and pork belly", "cuisine": "Japanese"},
]

descriptions = [r["description"] for r in recipes]
embeddings = model.encode(descriptions, normalize_embeddings=True)

# Create DescriptorSet
client.query([{"AddDescriptorSet": {
    "name": "recipe_text_search",
    "dimensions": 384,
    "engine": "HNSW",
    "metric": "CS"
}}])

# Add descriptors with metadata
for recipe, emb in zip(recipes, embeddings):
    client.query(
        [{"AddDescriptor": {
            "set": "recipe_text_search",
            "properties": {"name": recipe["name"], "cuisine": recipe["cuisine"]}
        }}],
        [emb.astype("float32").tobytes()]
    )
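The blob passed alongside each AddDescriptor is nothing more than the vector's raw float32 bytes, which is exactly what tobytes() produces. A standalone NumPy sketch (using a tiny 3-dimensional stand-in for a real 384-dimensional embedding) showing that the round trip is lossless:

```python
import numpy as np

# Tiny stand-in for a real 384-dimensional embedding.
vec = np.array([0.25, -1.0, 0.5], dtype="float32")

blob = vec.tobytes()                # the raw bytes sent as the descriptor blob
assert len(blob) == vec.size * 4    # 4 bytes per float32 value

# frombuffer recovers the original vector exactly.
restored = np.frombuffer(blob, dtype="float32")
assert np.array_equal(vec, restored)
```

The same packing applies at query time: a 384-dimensional query vector becomes a 1536-byte blob.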

Search:

query_emb = model.encode(["spicy curry with rice"], normalize_embeddings=True)[0]

response, _ = client.query(
    [{"FindDescriptor": {
        "set": "recipe_text_search",
        "k_neighbors": 3,
        "constraints": {"cuisine": ["==", "Indian"]},
        "distances": True,
        "results": {"all_properties": True}
    }}],
    [query_emb.astype("float32").tobytes()]
)
client.print_last_response()
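The "CS" metric above is cosine similarity, and normalize_embeddings=True scales every vector to unit length, so cosine similarity reduces to a plain dot product. A small NumPy sketch of that equivalence, using random vectors in place of real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)

# Cosine similarity of the raw vectors...
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ...equals the dot product once both are normalized to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot = np.dot(a_unit, b_unit)

assert abs(cosine - dot) < 1e-12
```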

Common models:

Model               Dimensions   Notes
all-MiniLM-L6-v2    384          Fast, lightweight, good general-purpose
all-mpnet-base-v2   768          Higher quality, slower
clip-ViT-B-32       512          Encodes both text and images (multimodal)
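Whichever model you pick, the DescriptorSet's dimensions must match the model's output size. A small sketch that derives the AddDescriptorSet command from the table above (the MODEL_DIMS mapping and helper name are illustrative, not part of the ApertureDB SDK):

```python
# Illustrative mapping built from the table above.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "clip-ViT-B-32": 512,
}

def add_descriptor_set_query(set_name: str, model_name: str) -> dict:
    """Build an AddDescriptorSet command sized for the chosen model."""
    return {"AddDescriptorSet": {
        "name": set_name,
        "dimensions": MODEL_DIMS[model_name],
        "engine": "HNSW",
        "metric": "CS",
    }}

query = add_descriptor_set_query("recipe_text_search_v2", "all-mpnet-base-v2")
```

Swapping models later means creating a new set with the new dimensions and re-embedding, since vectors of different sizes cannot share one DescriptorSet.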

Cohere

Cohere's embed-english-v3.0 produces 1024-dimensional embeddings and uses separate input_type values for documents vs. queries, which improves retrieval accuracy.

pip install -U aperturedb cohere
import cohere
import numpy as np
from aperturedb.CommonLibrary import create_connector

co = cohere.Client("YOUR_COHERE_API_KEY")
client = create_connector()

descriptions = [r["description"] for r in recipes]  # `recipes` from the Sentence Transformers example above

doc_embeddings = co.embed(
    texts=descriptions,
    model="embed-english-v3.0",
    input_type="search_document"
).embeddings

client.query([{"AddDescriptorSet": {
    "name": "recipe_cohere",
    "dimensions": 1024,
    "engine": "HNSW",
    "metric": "CS"
}}])

for recipe, emb in zip(recipes, doc_embeddings):
    vec = np.array(emb, dtype="float32")
    client.query(
        [{"AddDescriptor": {"set": "recipe_cohere", "properties": {"name": recipe["name"]}}}],
        [vec.tobytes()]
    )

# Query — use input_type="search_query" for the query embedding
query_emb = np.array(
    co.embed(texts=["noodle soup"], model="embed-english-v3.0", input_type="search_query").embeddings[0],
    dtype="float32"
)
response, _ = client.query(
    [{"FindDescriptor": {"set": "recipe_cohere", "k_neighbors": 3, "distances": True, "results": {"all_properties": True}}}],
    [query_emb.tobytes()]
)
client.print_last_response()

Google Gemini

Google's text-embedding-004 model produces 768-dimensional embeddings and supports separate task types for documents vs. queries (retrieval_document / retrieval_query), similar to Cohere. The same ApertureDB pattern applies: create a DescriptorSet with dimensions: 768, embed your texts with the Gemini API, and pass the float32 bytes to AddDescriptor.

pip install -U aperturedb google-generativeai
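A minimal sketch of that pattern, assuming the google-generativeai client described above. The API call is shown as a comment (it requires an API key), with a placeholder list of 768 floats standing in for its response; everything else is the same float32 packing used in the earlier examples:

```python
import numpy as np

# A real call would look roughly like this (requires an API key):
# import google.generativeai as genai
# genai.configure(api_key="YOUR_GEMINI_API_KEY")
# embedding = genai.embed_content(
#     model="models/text-embedding-004",
#     content="Creamy tomato-based curry with tender chicken pieces",
#     task_type="retrieval_document",   # use "retrieval_query" at query time
# )["embedding"]

# Placeholder standing in for the 768-float list the API returns.
embedding = [0.0] * 768

vec = np.array(embedding, dtype="float32")
assert vec.shape == (768,)      # must match the DescriptorSet's dimensions
blob = vec.tobytes()            # 768 float32 values = 3072 bytes per descriptor

add_set = {"AddDescriptorSet": {
    "name": "recipe_gemini",
    "dimensions": 768,
    "engine": "HNSW",
    "metric": "CS",
}}
# Then client.query([add_set]) once, followed by AddDescriptor with [blob]
# per text, exactly as in the Sentence Transformers and Cohere examples.
```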

For a complete working example, see Ayesha Imran's GraphRAG series, which uses Gemini embeddings with ApertureDB for knowledge graph construction and retrieval.


What's Next