Text Embedding Models
ApertureDB stores text embeddings as Descriptors in a DescriptorSet. Any model that produces a fixed-size float vector works — open-source local models and API-based models both follow the same pattern.
- Recipe Text Search — sentence-transformers on Cookbook dish descriptions
- Work with Descriptors — add, search, update, and delete embeddings
For setup and client configuration, see Client Configuration. For server setup options, see Server Setup.
Sentence Transformers
sentence-transformers provides lightweight open-source models that run locally without an API key.
pip install -U aperturedb sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np
from aperturedb.CommonLibrary import create_connector
client = create_connector()
model = SentenceTransformer("all-MiniLM-L6-v2") # 384-dimensional
recipes = [
    {"name": "Butter Chicken", "description": "Creamy tomato-based curry with tender chicken pieces", "cuisine": "Indian"},
    {"name": "Focaccia", "description": "Italian flatbread topped with olive oil, rosemary, and sea salt", "cuisine": "Italian"},
    {"name": "Crepe Flambe", "description": "Thin French pancakes flambeed with Grand Marnier and orange zest", "cuisine": "French"},
    {"name": "Rajma Chawal", "description": "Red kidney beans slow-cooked in spiced tomato gravy over rice", "cuisine": "Indian"},
    {"name": "Ramen", "description": "Japanese noodle soup with soft-boiled egg, nori, and pork belly", "cuisine": "Japanese"},
]
descriptions = [r["description"] for r in recipes]
embeddings = model.encode(descriptions, normalize_embeddings=True)
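Here normalize_embeddings=True L2-normalizes each vector, so under the CS (cosine similarity) metric used below, similarity reduces to a dot product. A pure-NumPy sketch of the same normalization, using toy 2-dimensional vectors for illustration:

```python
import numpy as np

# Toy vectors standing in for model outputs (illustration only)
raw = np.array([[3.0, 4.0], [1.0, 0.0]], dtype="float32")

# L2-normalize each row, as normalize_embeddings=True does
norms = np.linalg.norm(raw, axis=1, keepdims=True)
unit = raw / norms

# Every row now has length ~1, so cosine similarity is just a dot product
print(np.linalg.norm(unit, axis=1))  # each value is ~1.0
print(unit[0] @ unit[1])             # cosine similarity between the two rows
```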
# Create DescriptorSet
client.query([{"AddDescriptorSet": {
    "name": "recipe_text_search",
    "dimensions": 384,
    "engine": "HNSW",
    "metric": "CS"
}}])
# Add descriptors with metadata
for recipe, emb in zip(recipes, embeddings):
    client.query(
        [{"AddDescriptor": {
            "set": "recipe_text_search",
            "properties": {"name": recipe["name"], "cuisine": recipe["cuisine"]}
        }}],
        [emb.astype("float32").tobytes()]
    )
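Each AddDescriptor is paired with its blob positionally, and the blob should contain exactly dimensions float32 values (4 bytes each). A small sanity-check helper you might run before ingesting; the to_blob name is ours, not part of the aperturedb API:

```python
import numpy as np

DIMENSIONS = 384  # must match the DescriptorSet's "dimensions"

def to_blob(vector):
    """Pack an embedding into the float32 byte blob AddDescriptor expects."""
    arr = np.asarray(vector, dtype="float32")
    if arr.shape != (DIMENSIONS,):
        raise ValueError(f"expected {DIMENSIONS} dims, got {arr.shape}")
    return arr.tobytes()

blob = to_blob(np.zeros(384))
print(len(blob))  # 384 floats x 4 bytes = 1536 bytes
```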
Search:
query_emb = model.encode(["spicy curry with rice"], normalize_embeddings=True)[0]
response, _ = client.query(
    [{"FindDescriptor": {
        "set": "recipe_text_search",
        "k_neighbors": 3,
        "constraints": {"cuisine": ["==", "Indian"]},
        "distances": True,
        "results": {"all_properties": True}
    }}],
    [query_emb.astype("float32").tobytes()]
)
client.print_last_response()
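On success, the response list has one entry per command; with distances: True and all_properties requested, each match carries its properties plus a _distance field under entities. A sketch of extracting ranked results — the sample_response dict here is hand-written for illustration, not real server output:

```python
# Hand-written stand-in for a FindDescriptor response (illustration only)
sample_response = [{"FindDescriptor": {
    "returned": 2,
    "status": 0,
    "entities": [
        {"name": "Butter Chicken", "cuisine": "Indian", "_distance": 0.12},
        {"name": "Rajma Chawal", "cuisine": "Indian", "_distance": 0.18},
    ],
}}]

entities = sample_response[0]["FindDescriptor"]["entities"]
for rank, ent in enumerate(sorted(entities, key=lambda e: e["_distance"]), 1):
    print(f"{rank}. {ent['name']} (distance {ent['_distance']:.2f})")
```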
Common models:
| Model | Dimensions | Notes |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast, lightweight, good general-purpose |
| all-mpnet-base-v2 | 768 | Higher quality, slower |
| clip-ViT-B-32 | 512 | Encodes both text and images (multimodal) |
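Whichever model you choose, the DescriptorSet's dimensions value must match the model's output size, since the server expects each blob to hold exactly that many float32 values. One way to keep the two in sync is to derive both from a single table; this helper is illustrative, not part of the aperturedb API:

```python
# Illustrative helper: keep model choice and DescriptorSet
# dimensions in one place so they can't drift apart.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "clip-ViT-B-32": 512,
}

def descriptor_set_command(set_name, model_name):
    return {"AddDescriptorSet": {
        "name": set_name,
        "dimensions": MODEL_DIMS[model_name],
        "engine": "HNSW",
        "metric": "CS",
    }}

cmd = descriptor_set_command("recipe_text_search", "all-mpnet-base-v2")
print(cmd["AddDescriptorSet"]["dimensions"])  # 768
```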
Cohere
Cohere's embed-english-v3.0 produces 1024-dimensional embeddings and uses separate input_type values for documents vs. queries, which improves retrieval accuracy.
pip install -U aperturedb cohere
import cohere
import numpy as np
from aperturedb.CommonLibrary import create_connector
co = cohere.Client("YOUR_COHERE_API_KEY")
client = create_connector()
# Reuses the recipes list defined in the Sentence Transformers section
descriptions = [r["description"] for r in recipes]
doc_embeddings = co.embed(
    texts=descriptions,
    model="embed-english-v3.0",
    input_type="search_document"
).embeddings
client.query([{"AddDescriptorSet": {
    "name": "recipe_cohere",
    "dimensions": 1024,
    "engine": "HNSW",
    "metric": "CS"
}}])
for recipe, emb in zip(recipes, doc_embeddings):
    vec = np.array(emb, dtype="float32")
    client.query(
        [{"AddDescriptor": {"set": "recipe_cohere", "properties": {"name": recipe["name"]}}}],
        [vec.tobytes()]
    )
# Query — use input_type="search_query" for the query embedding
query_emb = np.array(
    co.embed(texts=["noodle soup"], model="embed-english-v3.0", input_type="search_query").embeddings[0],
    dtype="float32"
)
response, _ = client.query(
    [{"FindDescriptor": {"set": "recipe_cohere", "k_neighbors": 3, "distances": True, "results": {"all_properties": True}}}],
    [query_emb.tobytes()]
)
client.print_last_response()
Google Gemini
Google's text-embedding-004 model produces 768-dimensional embeddings and supports separate task types for documents vs. queries (retrieval_document / retrieval_query), similar to Cohere. The same ApertureDB pattern applies: create a DescriptorSet with dimensions: 768, embed your texts with the Gemini API, and pass the float32 bytes to AddDescriptor.
pip install -U aperturedb google-generativeai
For a complete working example, see Ayesha Imran's GraphRAG series, which uses Gemini embeddings with ApertureDB for knowledge graph construction and retrieval.
What's Next
- Text Chunking — split documents into passages before embedding
- Building RAG Pipelines — MMR reranking, LangChain, LlamaIndex
- Bulk Embedding Ingestion — parallel ingestion with ParallelLoader