
ApertureDB as Vector Database

ApertureDB stores and manages multimodal data, including videos, clips, and their associated embeddings. It supports indexing embeddings (also known as feature vectors, or descriptors in ApertureDB terminology) of any dimensionality, as well as k-nearest neighbor (KNN) search and classification, so it provides the functionality expected of a vector database.

Descriptor Sets, Collections, Vector Search Space, or Vector Index

A DescriptorSet, also called a collection, vector search space, or vector index, is created to store embeddings. It is a group of descriptors with a fixed number of dimensions, all produced by the same feature-extraction algorithm. For instance, we can create a DescriptorSet, insert multiple descriptors obtained with OpenFace (128 dimensions), and then index and perform matching operations over those descriptors. The set defines the search space for our embeddings.

The engine (e.g. HNSW) and the distance metric (e.g. cosine) are assigned to the set when it is created. All embeddings added to the set are then indexed using that engine, and KNN search is based on the specified distance metric. ApertureDB also allows a descriptor set to be created with multiple engines and distance metrics, so users can choose the KNN criteria at query time.
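As a rough illustration of assigning the engine and metric at creation time, the sketch below builds a JSON query for a 128-dimensional set. The command name `AddDescriptorSet`, its field names, and the values `"HNSW"`, `"FaissFlat"`, `"CS"`, and `"L2"` are assumptions modeled on the ApertureDB query language and should be checked against the query reference; the helper `make_descriptor_set_query` and the set names are hypothetical. The code only constructs the query and does not contact a server.

```python
import json


def make_descriptor_set_query(name, dimensions, engine="HNSW", metric="CS"):
    """Build a one-command query that would create a descriptor set.

    `engine` and `metric` may each be a string or a list of strings; passing
    lists is assumed here to be how several engines/metrics are requested.
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return [{
        "AddDescriptorSet": {
            "name": name,              # hypothetical set name
            "dimensions": dimensions,  # fixed for every descriptor in the set
            "engine": engine,          # assumed identifier for the index type
            "metric": metric,          # assumed identifier for the distance
        }
    }]


# One engine and one metric, fixed at creation time.
single = make_descriptor_set_query("openface_faces", 128)

# Several engines and metrics, so the KNN criteria can be picked per search.
multi = make_descriptor_set_query(
    "openface_faces_multi", 128,
    engine=["HNSW", "FaissFlat"], metric=["CS", "L2"],
)

print(json.dumps(single, indent=2))
```

With a running instance, a query like this would typically be sent through the Python connector; that step is omitted here because it depends on deployment details not covered in this section.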

Descriptors or Embeddings

Unimodal or multimodal embeddings are always added to a DescriptorSet, along with any metadata properties and labels associated with them.

A blob containing the descriptor's values must be provided in the array of blobs. The blob is a binary array of 32-bit floating-point values, so its size in bytes must be dimensions × 4 (for example, 512 bytes for a 128-dimensional descriptor).
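The size rule can be checked with a small sketch that packs a list of floats into such a blob using only the Python standard library. Little-endian byte order is an assumption (the text above does not state one), and the helper names `pack_descriptor` and `unpack_descriptor` are hypothetical. The `AddDescriptor` command and its `set` and `label` fields at the end are likewise assumptions modeled on the query language, shown only to indicate where the blob would travel.

```python
import struct

BYTES_PER_FLOAT32 = 4


def pack_descriptor(values, dimensions):
    """Pack `values` into a float32 blob (little-endian byte order assumed)."""
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} values, got {len(values)}")
    return struct.pack(f"<{dimensions}f", *values)


def unpack_descriptor(blob):
    """Inverse of pack_descriptor: recover the list of floats from a blob."""
    if len(blob) % BYTES_PER_FLOAT32 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    count = len(blob) // BYTES_PER_FLOAT32
    return list(struct.unpack(f"<{count}f", blob))


dimensions = 128
# Illustrative values only; a real embedding would come from a model.
embedding = [i / dimensions for i in range(dimensions)]

blob = pack_descriptor(embedding, dimensions)
print(len(blob))  # 128 * 4 = 512 bytes

# Assumed shape of the accompanying command; the blob goes in a parallel array.
query = [{"AddDescriptor": {"set": "openface_faces", "label": "person_a"}}]
blobs = [blob]
```

With NumPy, the same blob is usually produced by `numpy.asarray(embedding, dtype=numpy.float32).tobytes()`, which uses the machine's native byte order.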

Application Examples

Some examples of ApertureDB being used as a vector database:

Building {Vector|Graph|Agentic|*}RAG Pipelines

In our LangChain integration, we implement one reranking method, maximal marginal relevance (MMR), in our Python connector in order to build RAG pipelines. Users can also combine the graph capabilities with vector search for combined graph-vector use cases, such as restricting responses based on user access permissions for secure RAG implementations. Users can also monitor and improve their RAG responses through better input processing.
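For readers unfamiliar with MMR, the following is a minimal, generic sketch of the technique in pure Python. It is not the connector's implementation: the function names, the `lambda_mult` parameter name, and the toy vectors are all illustrative assumptions. MMR repeatedly picks the candidate that maximizes `lambda_mult * similarity(query, candidate) - (1 - lambda_mult) * max similarity(candidate, already selected)`, trading relevance against redundancy.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Return the indices of up to k candidates chosen by MMR, in pick order."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.5]
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(mmr(query_vec, docs, k=2))                   # balances relevance and diversity
print(mmr(query_vec, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

In this toy data the first two documents are near-duplicates, so with `lambda_mult=0.5` the second pick is the less relevant but more distinct third document, whereas `lambda_mult=1.0` reduces to plain similarity ranking.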