
Image Embedding Models

ApertureDB stores images and their embeddings together, linked by a graph edge. A KNN query can traverse from matching descriptors directly to image blobs — no separate fetch step.

Runnable Notebooks

For setup and client configuration, see Client Configuration. For server setup options, see Server Setup.


CLIP

CLIP embeds images and text into the same vector space, enabling text-to-image and image-to-image search. The clip-ViT-B-32 model from sentence-transformers is the simplest way to use it without PyTorch boilerplate:

pip install -U aperturedb sentence-transformers Pillow requests

import requests
import numpy as np
from PIL import Image
from io import BytesIO
from sentence_transformers import SentenceTransformer
from aperturedb.CommonLibrary import create_connector

client = create_connector()
model = SentenceTransformer("clip-ViT-B-32") # 512-dimensional

# Create DescriptorSet
client.query([{"AddDescriptorSet": {
    "name": "food_image_search",
    "dimensions": 512,
    "engine": "HNSW",
    "metric": "CS",
}}])

# Add image + embedding in one transaction
image_url = "https://example.com/butter_chicken.jpg"
resp = requests.get(image_url, timeout=10)
img = Image.open(BytesIO(resp.content)).convert("RGB")
emb = model.encode(img, normalize_embeddings=True).astype("float32")

client.query(
    [
        {"AddImage": {"url": image_url, "_ref": 1,
                      "properties": {"dish": "Butter Chicken", "cuisine": "Indian"}}},
        {"AddDescriptor": {"set": "food_image_search",
                           "connect": {"ref": 1, "class": "has_embedding"},
                           "properties": {"dish": "Butter Chicken"}}},
    ],
    [emb.tobytes()]
)

Text-to-image search — CLIP text and image embeddings are comparable, so a text query returns visually matching images:

query_emb = model.encode("creamy curry", normalize_embeddings=True).astype("float32")

q = [
{"FindDescriptor": {"set": "food_image_search", "k_neighbors": 5, "distances": True, "_ref": 1}},
{"FindImage": {"is_connected_to": {"ref": 1, "class": "has_embedding"}, "blobs": True, "results": {"all_properties": True}}},
]
response, blobs = client.query(q, [query_emb.tobytes()])

The FindDescriptor → FindImage traversal returns the matched images and their metadata in a single round trip.
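The reply arrives as one JSON dict per command, in order. A sketch of pairing FindImage results with FindDescriptor distances — the response below is a mocked illustration of the general shape (entity field names and values are assumptions for illustration, not captured server output):

```python
# Mocked response illustrating the general reply shape for the
# FindDescriptor + FindImage query above (illustrative values only)
response = [
    {"FindDescriptor": {"returned": 2, "entities": [
        {"_distance": 0.12}, {"_distance": 0.34}]}},
    {"FindImage": {"returned": 2, "entities": [
        {"dish": "Butter Chicken", "cuisine": "Indian"},
        {"dish": "Paneer Tikka Masala", "cuisine": "Indian"}]}},
]
blobs = [b"<jpeg bytes>", b"<jpeg bytes>"]

# Distances come from the descriptor command, metadata and blobs
# from the image command; zip them back together by position
distances = [e["_distance"] for e in response[0]["FindDescriptor"]["entities"]]
images = response[1]["FindImage"]["entities"]

for meta, dist, blob in zip(images, distances, blobs):
    print(f"{meta['dish']}: distance={dist}, {len(blob)} bytes")
```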


Ingesting the Cookbook Dataset

The Cookbook dataset (20+ dish photos) can be ingested with CLIP embeddings in one command using the ApertureDB CLI:

wget https://github.com/aperture-data/Cookbook/raw/refs/heads/main/scripts/load_cookbook_data.sh
bash load_cookbook_data.sh

This ingests all dish images with CLIP ViT-B/16 embeddings stored in a ViT-B/16 DescriptorSet. After ingestion, the Quick Start notebook's section 5c runs text-to-image search over all dish photos.


FaceNet

For a large-scale example with a different model, the CelebA Face Similarity Search walkthrough uses FaceNet embeddings on 200k+ celebrity images with metadata-filtered KNN search (hair color, glasses, age).
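Metadata-filtered KNN composes by adding constraints to FindDescriptor. A sketch of the query shape — the set name and property names (hair_color, wears_glasses) are illustrative assumptions, not the walkthrough's actual schema:

```python
# Hypothetical set and property names, for illustration only
q = [
    {"FindDescriptor": {
        "set": "celeba_facenet",       # assumed DescriptorSet name
        "k_neighbors": 10,
        "distances": True,
        "constraints": {               # metadata filter on the descriptors
            "hair_color": ["==", "blond"],
            "wears_glasses": ["==", False],
        },
        "_ref": 1,
    }},
    {"FindImage": {
        "is_connected_to": {"ref": 1},
        "blobs": True,
        "results": {"all_properties": True},
    }},
]
```

The filter narrows candidates by property while k_neighbors still bounds the number of returned matches.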


Structured Ingestion with DataModels

For bulk ingestion using typed Pydantic schemas, see Structured Ingestion with DataModels. This approach is used in the Cookbook dataset loader and the CelebA similarity search example.
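The linked guide covers the actual DataModels classes; as a rough stand-in for the idea, each typed record pairs an image source with its metadata and maps onto one AddImage command. A plain-dataclass sketch (the class and field names are hypothetical, not the DataModels API):

```python
from dataclasses import dataclass

# Hypothetical record type standing in for a typed ingestion schema
@dataclass
class DishImage:
    url: str
    dish: str
    cuisine: str

records = [
    DishImage("https://example.com/butter_chicken.jpg", "Butter Chicken", "Indian"),
]

# Each record maps onto one AddImage command in a bulk load
commands = [
    {"AddImage": {"url": r.url,
                  "properties": {"dish": r.dish, "cuisine": r.cuisine}}}
    for r in records
]
```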


What's Next