Text Embedding Models
ApertureDB stores text embeddings as Descriptors in a DescriptorSet. Any model that produces a fixed-size float vector works — open-source local models and API-based models both follow the same pattern.
- Recipe Text Search — sentence-transformers on Cookbook dish descriptions
- Work with Descriptors — add, search, update, and delete embeddings
For setup and client configuration, see Client Configuration. For server setup options, see Server Setup.
Sentence Transformers
sentence-transformers provides lightweight open-source models that run locally without an API key.
pip install -U aperturedb sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np
from aperturedb.CommonLibrary import create_connector
client = create_connector()
model = SentenceTransformer("all-MiniLM-L6-v2") # 384-dimensional
recipes = [
    {"name": "Butter Chicken", "description": "Creamy tomato-based curry with tender chicken pieces", "cuisine": "Indian"},
    {"name": "Focaccia", "description": "Italian flatbread topped with olive oil, rosemary, and sea salt", "cuisine": "Italian"},
    {"name": "Crepe Flambe", "description": "Thin French pancakes flambeed with Grand Marnier and orange zest", "cuisine": "French"},
    {"name": "Rajma Chawal", "description": "Red kidney beans slow-cooked in spiced tomato gravy over rice", "cuisine": "Indian"},
    {"name": "Ramen", "description": "Japanese noodle soup with soft-boiled egg, nori, and pork belly", "cuisine": "Japanese"},
]
descriptions = [r["description"] for r in recipes]
embeddings = model.encode(descriptions, normalize_embeddings=True)
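Here normalize_embeddings=True L2-normalizes each vector, so under the CS (cosine similarity) metric used below, similarity reduces to a dot product. A pure-NumPy sketch of the same normalization, using toy 2-dimensional vectors for illustration:

```python
import numpy as np

# Toy vectors standing in for model outputs (illustration only)
raw = np.array([[3.0, 4.0], [1.0, 0.0]], dtype="float32")

# L2-normalize each row, as normalize_embeddings=True does
norms = np.linalg.norm(raw, axis=1, keepdims=True)
unit = raw / norms

# Every row now has length ~1, so cosine similarity is just a dot product
print(np.linalg.norm(unit, axis=1))  # each value is ~1.0
print(unit[0] @ unit[1])             # cosine similarity between the two rows
```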
# Create DescriptorSet
client.query([{"AddDescriptorSet": {
    "name": "recipe_text_search",
    "dimensions": 384,
    "engine": "HNSW",
    "metric": "CS"
}}])
# Add descriptors with metadata
for recipe, emb in zip(recipes, embeddings):
    client.query(
        [{"AddDescriptor": {
            "set": "recipe_text_search",
            "properties": {"name": recipe["name"], "cuisine": recipe["cuisine"]}
        }}],
        [emb.astype("float32").tobytes()]
    )
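Each AddDescriptor is paired with its blob positionally, and the blob should contain exactly dimensions float32 values (4 bytes each). A small sanity-check helper you might run before ingesting; the to_blob name is ours, not part of the aperturedb API:

```python
import numpy as np

DIMENSIONS = 384  # must match the DescriptorSet's "dimensions"

def to_blob(vector):
    """Pack an embedding into the float32 byte blob AddDescriptor expects."""
    arr = np.asarray(vector, dtype="float32")
    if arr.shape != (DIMENSIONS,):
        raise ValueError(f"expected {DIMENSIONS} dims, got {arr.shape}")
    return arr.tobytes()

blob = to_blob(np.zeros(384))
print(len(blob))  # 384 floats x 4 bytes = 1536 bytes
```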
Search:
query_emb = model.encode(["spicy curry with rice"], normalize_embeddings=True)[0]
response, _ = client.query(
    [{"FindDescriptor": {
        "set": "recipe_text_search",
        "k_neighbors": 3,
        "constraints": {"cuisine": ["==", "Indian"]},
        "distances": True,
        "results": {"all_properties": True}
    }}],
    [query_emb.astype("float32").tobytes()]
)
client.print_last_response()
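On success, the response list has one entry per command; with distances: True and all_properties requested, each match carries its properties plus a _distance field under entities. A sketch of extracting ranked results — the sample_response dict here is hand-written for illustration, not real server output:

```python
# Hand-written stand-in for a FindDescriptor response (illustration only)
sample_response = [{"FindDescriptor": {
    "returned": 2,
    "status": 0,
    "entities": [
        {"name": "Butter Chicken", "cuisine": "Indian", "_distance": 0.12},
        {"name": "Rajma Chawal", "cuisine": "Indian", "_distance": 0.18},
    ],
}}]

entities = sample_response[0]["FindDescriptor"]["entities"]
for rank, ent in enumerate(sorted(entities, key=lambda e: e["_distance"]), 1):
    print(f"{rank}. {ent['name']} (distance {ent['_distance']:.2f})")
```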
Common models:
| Model | Dimensions | Notes |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast, lightweight, good general-purpose |
| all-mpnet-base-v2 | 768 | Higher quality, slower |
| clip-ViT-B-32 | 512 | Encodes both text and images (multimodal) |
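Whichever model you choose, the DescriptorSet's dimensions value must match the model's output size, since the server expects each blob to hold exactly that many float32 values. One way to keep the two in sync is to derive both from a single table; this helper is illustrative, not part of the aperturedb API:

```python
# Illustrative helper: keep model choice and DescriptorSet
# dimensions in one place so they can't drift apart.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "clip-ViT-B-32": 512,
}

def descriptor_set_command(set_name, model_name):
    return {"AddDescriptorSet": {
        "name": set_name,
        "dimensions": MODEL_DIMS[model_name],
        "engine": "HNSW",
        "metric": "CS",
    }}

cmd = descriptor_set_command("recipe_text_search", "all-mpnet-base-v2")
print(cmd["AddDescriptorSet"]["dimensions"])  # 768
```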
Cohere
Cohere's embed-english-v3.0 produces 1024-dimensional embeddings and uses separate input_type values for documents vs. queries, which improves retrieval accuracy.
pip install -U aperturedb cohere
import cohere
import numpy as np
from aperturedb.CommonLibrary import create_connector
co = cohere.Client("YOUR_COHERE_API_KEY")
client = create_connector()
# Reuses the recipes list defined in the Sentence Transformers section
descriptions = [r["description"] for r in recipes]
doc_embeddings = co.embed(
    texts=descriptions,
    model="embed-english-v3.0",
    input_type="search_document"
).embeddings
client.query([{"AddDescriptorSet": {
    "name": "recipe_cohere",
    "dimensions": 1024,
    "engine": "HNSW",
    "metric": "CS"
}}])
for recipe, emb in zip(recipes, doc_embeddings):
    vec = np.array(emb, dtype="float32")
    client.query(
        [{"AddDescriptor": {"set": "recipe_cohere", "properties": {"name": recipe["name"]}}}],
        [vec.tobytes()]
    )
# Query — use input_type="search_query" for the query embedding
query_emb = np.array(
    co.embed(texts=["noodle soup"], model="embed-english-v3.0", input_type="search_query").embeddings[0],
    dtype="float32"
)
response, _ = client.query(
    [{"FindDescriptor": {"set": "recipe_cohere", "k_neighbors": 3, "distances": True, "results": {"all_properties": True}}}],
    [query_emb.tobytes()]
)
client.print_last_response()
Google Gemini
Google's text-embedding-004 model produces 768-dimensional embeddings and supports separate task types for documents vs. queries (retrieval_document / retrieval_query), similar to Cohere. The same ApertureDB pattern applies: create a DescriptorSet with dimensions: 768, embed your texts with the Gemini API, and pass the float32 bytes to AddDescriptor.
pip install -U aperturedb google-generativeai
For a complete working example, see Ayesha Imran's GraphRAG series, which uses Gemini embeddings with ApertureDB for knowledge graph construction and retrieval.
What's Next
- Text Chunking — split documents into passages before embedding
- Building RAG Pipelines — MMR reranking, LangChain, LlamaIndex
- Bulk Embedding Ingestion — parallel ingestion with ParallelLoader