LlamaIndex

LlamaIndex is a simple, flexible framework for building knowledge assistants with LLMs connected to your enterprise data. It provides end-to-end tooling to ship a context-augmented AI agent to production. ApertureDB fits into this ecosystem as an advanced search and retrieval provider, which makes an ApertureDB vector store a natural component of pipelines such as RAG (Retrieval-Augmented Generation).

ApertureDB is a hybrid vector store and graph database. Currently, LlamaIndex supports the vector store functionality of ApertureDB (contributed by the AIMon team), so ApertureDB can serve as a vector store provider for LlamaIndex, letting you store and retrieve vectors through LlamaIndex's API.

In the future, we plan to add support for ApertureDB's graph database functionality to LlamaIndex. This will allow you to store and query graphs in ApertureDB using LlamaIndex's API.

This is a work in progress and will be contributed upstream to the LlamaIndex repository soon. If you are using this integration, please let us know at team@aperturedata.io.

Vector Store

ApertureDB's integration with LlamaIndex is done through the ApertureDB Vector Store. This is a Python package that provides an interface for storing and retrieving vectors from ApertureDB.

The source code for this integration is available in the llama_index repository.

Examples of using ApertureDB in LlamaIndex:

Example code using the ApertureDB vector store:


!mkdir data
!cd data && wget https://vldb.org/pvldb/vol14/p3240-remis.pdf && cd -

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.vector_stores.ApertureDB import ApertureDBVectorStore

from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# dimensions must match the embedding model (1536 for OpenAI's default embeddings).
adb_client = ApertureDBVectorStore(dimensions=1536)
storage_context = StorageContext.from_defaults(vector_store=adb_client)

# Load the downloaded PDF and index it into ApertureDB.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

query_engine = index.as_query_engine()
query_str = [
    "Who created VDMS? Give me a list.",
    "How many images were ingested in VDMS for its scale test?",
    "What are distinguishing features of VDMS?",
]
for qs in query_str:
    response = query_engine.query(qs)
    print(f"{qs=}\n")
    print(response)

Here's a link to a complete working example.
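Under the hood, the retrieval step of the query engine embeds the question and asks ApertureDB for the nearest stored vectors. As a rough, self-contained illustration of that idea (plain Python with hypothetical toy vectors, not the ApertureDB API), top-k retrieval by cosine similarity looks like this:

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    # vectors: mapping of document id -> embedding.
    scored = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy 3-dimensional embeddings (real embeddings have e.g. 1536 dimensions).
vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], vectors, k=2))  # → ['doc_a', 'doc_b']
```

In production the nearest-neighbor search runs server-side in ApertureDB over a DescriptorSet, so the full embedding collection never has to be pulled into the client.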

Graph database

ApertureDB can also be used as a property graph store in LlamaIndex. There is an implementation in the GitHub repository.

An example that automatically builds a knowledge graph from unstructured text data using an LLM can be implemented as follows:

mkdir -p 'data/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Ingesting the nodes

from llama_index.graph_stores.ApertureDB import ApertureDBGraphStore
from llama_index.core import SimpleDirectoryReader, PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.llms.openai import OpenAI

graph_store = ApertureDBGraphStore()
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

# Use an LLM to extract (entity, relation, entity) paths from the text.
kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    strict=True,
)

index = PropertyGraphIndex.from_documents(
    documents,
    embed_kg_nodes=False,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    show_progress=True,
)

Querying the nodes

from llama_index.graph_stores.ApertureDB import ApertureDBGraphStore
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.llms.openai import OpenAI

graph_store = ApertureDBGraphStore()

kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0),
    strict=True,
)

# Reconnect to the graph that was ingested earlier.
index = PropertyGraphIndex.from_existing(
    embed_kg_nodes=False,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    show_progress=True,
)

query_engine = index.as_query_engine()


def run_queries(query_engine):
    query_str = [
        "What has Paul Graham worked on?",
        "Who has Paul Graham worked with?",
    ]
    for qs in query_str:
        response = query_engine.query(qs)
        print(f"{qs=}")
        print(f"{response.response=}")
        print("=" * 50)


run_queries(query_engine)

Implementation details

Those attempting a hybrid approach should note a few details of how LlamaIndex vector stores and documents are represented internally in ApertureDB:

  • The LlamaIndex vector store corresponds to a DescriptorSet in ApertureDB.
  • Documents with embeddings correspond to Descriptors.
  • The document id field is stored in the uniqueid property.
  • The document text field is stored in the text property.
  • Metadata properties are stored as properties with an lm_ prefix.
  • The Entity nodes of LlamaIndex get converted into Entities in ApertureDB.
  • The Relation nodes of LlamaIndex get converted into Connections in ApertureDB.
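The property naming scheme above can be sketched in plain Python. The helper below is purely illustrative (it is not part of the integration's API), assuming the property names listed above:

```python
def node_to_properties(doc_id, text, metadata):
    """Illustrative sketch: map a LlamaIndex node onto the ApertureDB
    Descriptor properties described above (not the integration's real code)."""
    properties = {
        "uniqueid": doc_id,  # the document id field
        "text": text,        # the document text field
    }
    # Metadata keys are stored as properties with an lm_ prefix.
    for key, value in metadata.items():
        properties[f"lm_{key}"] = value
    return properties


props = node_to_properties("doc-1", "hello world", {"author": "pg"})
print(props)  # → {'uniqueid': 'doc-1', 'text': 'hello world', 'lm_author': 'pg'}
```

Keeping this mapping in mind is useful when querying the same Descriptors directly through ApertureDB's native API alongside LlamaIndex.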