ApertureDB as Vector Database
ApertureDB allows us to store and manage multimodal data, including videos, clips, and their associated embeddings. With its support for indexing embeddings (also known as feature vectors, or descriptors in ApertureDB terminology) of any dimension, and for performing K-nearest neighbor (KNN) search and classification, ApertureDB offers all the necessary support as a vector database.
Descriptor Sets, Collections, Vector Search Space, or Vector Index
A DescriptorSet, also called a collection, vector search space, or vector index, is created to store the embeddings. It is a group of descriptors with a fixed number of dimensions that are produced by the same feature extraction algorithm. For instance, we can create a DescriptorSet and insert multiple descriptors obtained using OpenFace (128 dimensions), and then index and perform matching operations over those descriptors. This set defines the search space for our embeddings.
The engine (e.g. HNSW) and distance metric (e.g. cosine) are assigned to the set when it is created. All the embeddings added to the set are then indexed using that engine, and KNN search is based on the specified distance metric. ApertureDB allows descriptor sets to be created with multiple engines and distance metrics, so users can choose the KNN criteria on the fly.
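As a rough sketch, creating a descriptor set through the Python client can look like the following. The set name, host, and credentials are placeholders, and the exact engine and metric identifiers should be checked against the AddDescriptorSet command reference:

```python
from aperturedb.Connector import Connector

# Placeholder connection details -- replace with your deployment's values.
db = Connector(host="localhost", user="admin", password="admin")

# Create a descriptor set sized for 128-dimensional OpenFace embeddings,
# indexed with HNSW and compared using cosine similarity.
query = [{
    "AddDescriptorSet": {
        "name": "openface_faces",   # hypothetical set name
        "dimensions": 128,
        "engine": "HNSW",
        "metric": "CS"              # cosine similarity
    }
}]

response, _ = db.query(query)
print(response)
```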
Descriptors or Embeddings
Unimodal or multimodal embeddings are always added to a DescriptorSet, along with any metadata properties and labels associated with them.
A blob must be provided in the array of blobs, containing the descriptor's values. The blob is a binary array of 32-bit floating point values. The size of the blob, in bytes, must be dimensions * 4.
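For illustration, here is a minimal sketch of adding a descriptor and running a KNN search with the Python client, assuming the "openface_faces" set created above. The property names and labels are hypothetical, and the FindDescriptor options should be verified against the command reference:

```python
import numpy as np
from aperturedb.Connector import Connector

db = Connector(host="localhost", user="admin", password="admin")

# The descriptor blob is a binary array of 32-bit floats: dimensions * 4 bytes.
embedding = np.random.rand(128).astype(np.float32)  # stand-in for a real OpenFace embedding

add_query = [{
    "AddDescriptor": {
        "set": "openface_faces",
        "label": "person_1",                         # optional label
        "properties": {"source": "frame_0042.jpg"}   # hypothetical metadata
    }
}]
db.query(add_query, [embedding.tobytes()])

# KNN search: find the 5 nearest descriptors to a query embedding.
query_embedding = np.random.rand(128).astype(np.float32)
find_query = [{
    "FindDescriptor": {
        "set": "openface_faces",
        "k_neighbors": 5,
        "distances": True,
        "results": {"list": ["label", "source"]}
    }
}]
response, _ = db.query(find_query, [query_embedding.tobytes()])
print(response)
```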
Application Examples
Some examples of ApertureDB being used as a vector database:
- Finding faces in images using multimodal embeddings
- Building agents using ApertureDB vector search
- Video semantic search
Building {Vector|Graph|Agentic|*}RAG Pipelines
In our LangChain integration, we implement one reranking method (MMR in this case) in our Python connector in order to build RAG pipelines. Users can also combine ApertureDB's graph capabilities for more interesting graph-vector use cases, such as restricting responses based on user access permissions for secure RAG implementations, and can monitor and improve their RAG responses through better input processing.
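As a rough sketch, a retrieval step with MMR reranking through the LangChain integration can look like the following. It assumes the langchain_community ApertureDB vector store, a configured ApertureDB connection, and an OpenAI embedding model; the texts and parameter values are placeholders:

```python
from langchain_community.vectorstores import ApertureDB
from langchain_openai import OpenAIEmbeddings

# Placeholder embedding model -- any LangChain-compatible embeddings work.
embeddings = OpenAIEmbeddings()

# Index a few example documents into an ApertureDB-backed vector store.
texts = [
    "Clip of a goal scored in the second half.",
    "Interview with the coach after the match.",
    "Highlights reel of the tournament final.",
]
store = ApertureDB.from_texts(texts, embedding=embeddings)

# Maximal Marginal Relevance (MMR) retrieval: balances similarity to the
# query against diversity among the returned documents.
docs = store.max_marginal_relevance_search("match highlights", k=2, fetch_k=3)
for doc in docs:
    print(doc.page_content)
```

The retrieved documents can then be passed to an LLM as context in a standard RAG chain.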