Image Vector Search
This notebook embeds food images from the Cookbook dataset using a CLIP model and runs text-to-image search against ApertureDB.
Connect to ApertureDB
Option A: ApertureDB Cloud (recommended)
Sign up for a free 30-day trial. Get your key from Connect → Generate API Key and add it to a .env file in this directory:
APERTUREDB_KEY=your_key_here
Option B: Community Edition (local Docker)
Run this in a terminal before starting the notebook:
docker run -d --name aperturedb \
  -p 55555:55555 -e ADB_MASTER_KEY=admin -e ADB_FORCE_SSL=false \
  aperturedata/aperturedb-community
See client configuration options for all connection methods and server setup options for deployment choices.
%pip install aperturedb sentence-transformers Pillow requests pandas python-dotenv
from dotenv import load_dotenv
load_dotenv()
True
# !adb config create localdb --active \
# --host localhost --port 55555 \
# --username admin --password admin \
# --no-use-ssl --no-interactive
from aperturedb.CommonLibrary import create_connector
client = create_connector()
response, _ = client.query([{"GetStatus": {}}])
client.print_last_response()
[
    {
        "GetStatus": {
            "info": "OK",
            "status": 0,
            "system": "ApertureDB",
            "version": "0.19.6"
        }
    }
]
Step: Load the CLIP model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("clip-ViT-B-32")
print(f"Embedding dimensions: {model.get_sentence_embedding_dimension()}")
Step: Load image URLs from Cookbook dataset
import pandas as pd
dishes = pd.read_csv(
    "https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images.adb.csv"
)
dishes = dishes.head(10)  # use the first 10 rows for this demo
print(f"Loaded {len(dishes)} dishes")
print(dishes[["dish_name", "url", "food_tags"]].to_string(index=False))
Loaded 10 dishes
dish_name url food_tags
rajma chawal https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/001%20Large.jpeg Indian
paneer bhurji https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/002%20Large.jpeg Indian
moong dal https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/003%20Large.jpeg Indian
Butter chicken https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/004%20Large.jpeg Indian
porridge https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_0898.jpeg Scottish
baked potato https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_5728.jpeg Scottish
haggis bonbons and steak and ale pie https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_5743.jpeg Scottish
butter chicken with special fried rice and assorted naan breads https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_5786.jpeg Indian
duck breast and rump lamb https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_5812.jpeg British
traditional scottish and continental breakfast https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/IMG_5815.jpeg Scottish
Step: Create a DescriptorSet for image embeddings
SET_NAME = "dish_image_search"
client.query([{
    "AddDescriptorSet": {
        "name": SET_NAME,
        "dimensions": 512,  # CLIP ViT-B/32 output size
        "engine": "HNSW",   # approximate nearest-neighbor index for larger sets
        "metric": "CS",     # cosine similarity
    }
}])
client.print_last_response()
[
    {
        "AddDescriptorSet": {
            "status": 0
        }
    }
]
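The metric "CS" (cosine similarity) pairs with the `normalize_embeddings=True` used at encode time below: for unit-length vectors, cosine similarity reduces to a plain dot product, so all comparisons stay consistent. A minimal numpy sketch of that identity, using random stand-in vectors rather than real CLIP embeddings:

```python
import numpy as np

# Random 512-dim stand-ins for CLIP embeddings (assumption: real vectors
# behave the same way once normalized).
rng = np.random.default_rng(0)
a = rng.standard_normal(512).astype("float32")
b = rng.standard_normal(512).astype("float32")

# What normalize_embeddings=True does internally: scale to unit length.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
dot = float(np.dot(a, b))
print(abs(cosine - dot) < 1e-6)  # True: identical for unit vectors
```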
Step: Add images with embeddings in one transaction
Each AddImage + AddDescriptor pair runs as a single atomic transaction. The descriptor is linked to its image via the connect clause, and the CLIP model encodes the PIL image directly.
import requests
from PIL import Image
from io import BytesIO
import numpy as np
def fix_url(url):
    """Normalize Cookbook image URLs: refs/heads/main -> main, spaces -> underscores."""
    return url.replace("refs/heads/main", "main").replace("%20", "_").replace(" ", "_")

for _, row in dishes.iterrows():
    img_url = fix_url(row["url"])
    resp = requests.get(img_url, timeout=10)
    resp.raise_for_status()
    img_bytes = resp.content

    img = Image.open(BytesIO(img_bytes)).convert("RGB")
    emb = model.encode(img, normalize_embeddings=True).astype("float32")

    q = [
        {
            "AddImage": {
                "_ref": 1,
                "properties": {
                    "dish_name": row["dish_name"],
                    "cuisine": row["food_tags"],
                }
            }
        },
        {
            "AddDescriptor": {
                "set": SET_NAME,
                "properties": {
                    "dish_name": row["dish_name"],
                    "cuisine": row["food_tags"],
                },
                "connect": {
                    "ref": 1,
                    "class": "has_embedding"
                }
            }
        }
    ]

    # blobs[0] -> AddImage (raw image bytes), blobs[1] -> AddDescriptor (embedding)
    client.query(q, [img_bytes, emb.tobytes()])

print("Done adding images and embeddings")
Done adding images and embeddings
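The descriptor blob passed alongside AddDescriptor is just the raw float32 byte buffer, so its length must match the set's dimensions times 4 bytes (512 × 4 = 2048 here); that is why the embedding is cast with `.astype("float32")` before `.tobytes()`. A quick round-trip sketch with a random stand-in vector:

```python
import numpy as np

# Random stand-in for a CLIP embedding (assumption: same dtype/shape as above).
emb = np.random.default_rng(1).standard_normal(512).astype("float32")
blob = emb.tobytes()
print(len(blob))  # 2048 = 512 dimensions * 4 bytes per float32

# The vector is recovered losslessly from the buffer.
restored = np.frombuffer(blob, dtype="float32")
print(np.array_equal(emb, restored))  # True
```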
Step: Text-to-image search
CLIP encodes text and images into the same embedding space, so a text query returns the visually closest images.
query_text = "creamy curry with rice"
query_emb = model.encode(query_text, normalize_embeddings=True).astype("float32")
# Find top-3 descriptors, then traverse to linked images
q = [
    {
        "FindDescriptor": {
            "set": SET_NAME,
            "k_neighbors": 3,
            "distances": True,
            "_ref": 1,
            "results": {"all_properties": True},
        }
    },
    {
        "FindImage": {
            "is_connected_to": {"ref": 1, "connection_class": "has_embedding"},
            "blobs": False,
            "results": {"all_properties": True},
        }
    }
]
response, _ = client.query(q, [query_emb.tobytes()])
if not isinstance(response, list) or response[0].get("FindDescriptor", {}).get("status", 0) != 0:
    print("Query failed — check the response:")
    client.print_last_response()
else:
    descriptors = response[0]["FindDescriptor"].get("entities", [])
    images = response[1]["FindImage"].get("entities", [])
    print(f'Query: "{query_text}"\n')
    for desc, img in zip(descriptors, images):
        print(f" {img['dish_name']:<25} cuisine={img['cuisine']:<15} distance={desc['_distance']:.4f}")
Query: "creamy curry with rice"
rajma chawal cuisine=Indian distance=0.3054
moong dal cuisine=Indian distance=0.2955
Butter chicken cuisine=Indian distance=0.2759
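Vector search can also be combined with property filters. ApertureDB Find commands generally accept a constraints block; assuming FindDescriptor supports it the same way, a search restricted to one cuisine could be sketched as below (query construction only; `build_filtered_search` is a hypothetical helper, and the query would still be sent with `client.query(q, [query_emb.tobytes()])` as above):

```python
SET_NAME = "dish_image_search"

def build_filtered_search(cuisine, k=3):
    """Build a k-NN query restricted to descriptors whose cuisine property matches."""
    return [
        {
            "FindDescriptor": {
                "set": SET_NAME,
                "k_neighbors": k,
                "distances": True,
                # Assumption: constraints filter on the properties written at ingest time.
                "constraints": {"cuisine": ["==", cuisine]},
                "results": {"all_properties": True},
            }
        }
    ]

q = build_filtered_search("Scottish")
print(q[0]["FindDescriptor"]["constraints"])  # {'cuisine': ['==', 'Scottish']}
```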
Step: Cleanup (optional)
Note: DeleteImage with no constraints removes every image in the database, not just the ten added here.
client.query([{"DeleteDescriptorSet": {"with_name": SET_NAME}}])
client.query([{"DeleteImage": {}}])
client.print_last_response()
[
    {
        "DeleteImage": {
            "count": 10,
            "status": 0
        }
    }
]