Building a RAG chain from a Website
This notebook shows how to use ApertureDB as part of a Retrieval-Augmented Generation (RAG) LangChain pipeline. This means that we're going to use ApertureDB as a vector-based search engine to find documents that match the query, and then use a large language model to generate an answer based on those documents.
If you have already completed the notebook Ingesting a Website into ApertureDB, then your ApertureDB instance should already contain text from your chosen website. We'll use that to answer natural-language questions.
Install Dependencies
%pip install --quiet aperturedb langchain langchain-core langchain-community langchainhub gpt4all
Note: you may need to restart the kernel to use updated packages.
Choose a prompt
The prompt ties together the source documents and the user's query, and also sets some basic parameters for the chat engine. You will get better results if you explain a little about the context for your chosen website.
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following documents to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
{context}
Answer:""")
print(prompt.template)
You are an assistant for question-answering tasks. Use the following documents to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
{context}
Answer:
For comparison, we're also going to ask the same questions of the language model without using documents. This prompt is for a non-RAG chain.
from langchain_core.prompts import PromptTemplate
prompt2 = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Answer the question from your general knowledge. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Answer:""")
print(prompt2.template)
You are an assistant for question-answering tasks. Answer the question from your general knowledge. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Answer:
Choose an Embedding
We have to use the same embedding that we used when we loaded the documents. Here we're using the GPT4All package and loading one of its smaller models. Don't worry if you see messages about CUDA libraries being unavailable.
from langchain_community.embeddings import GPT4AllEmbeddings
embeddings = GPT4AllEmbeddings(model_name="all-MiniLM-L6-v2.gguf2.f16.gguf")
embeddings_dim = len(embeddings.embed_query("test"))
print(f"Embeddings dimension: {embeddings_dim}")
/home/gavin/.local/lib/python3.10/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_name" in GPT4AllEmbeddings has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
Embeddings dimension: 384
Failed to load libllamamodel-mainline-cuda.so: dlopen: libcudart.so.11.0: cannot open shared object file: No such file or directory
Failed to load libllamamodel-mainline-cuda-avxonly.so: dlopen: libcudart.so.11.0: cannot open shared object file: No such file or directory
Connect to ApertureDB
For the next part, we need access to a specific ApertureDB instance. There are several ways to set this up. The code provided here will accept ApertureDB connection information as a JSON string. See our Configuration help page for more options.
! adb config create rag --from-json --active
[18:40:42] Configuration named 'rag' already exists. Use --overwrite to overwrite.    configure.py:140
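Before going further, it can be worth checking that the connection actually works. The following is a small optional sketch (not part of the original notebook flow); it assumes the aperturedb package's create_connector helper picks up the active configuration created above, and that the instance responds to a GetStatus query.
# Optional sanity check: confirm we can reach the ApertureDB instance.
# create_connector() uses the active configuration created with `adb config create`.
from aperturedb.CommonLibrary import create_connector
client = create_connector()
response, _ = client.query([{"GetStatus": {}}])
print(response)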
Create vectorstore
Now we create a LangChain vectorstore object, backed by the ApertureDB instance we have already uploaded documents to. Remember to change the name of the DESCRIPTOR_SET if you changed it when you loaded the documents.
from langchain_community.vectorstores import ApertureDB
import logging
import sys
DESCRIPTOR_SET = "my_website"
vectorstore = ApertureDB(embeddings=embeddings,
                         descriptor_set=DESCRIPTOR_SET)
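As a quick, optional check that the documents from the ingestion notebook are present, you can run a similarity search directly against the vectorstore. The query string here is just an example.
# Optional: confirm the descriptor set contains documents from the ingestion notebook.
docs = vectorstore.similarity_search("What is ApertureDB?", k=1)
for doc in docs:
    print(doc.metadata.get("url"), doc.page_content[:100])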
Create a retriever
The retriever is responsible for finding the most relevant documents in the vectorstore for a given query. Here we're using the "max marginal relevance" (MMR) retriever, which is a simple but effective way to find a diverse set of documents that are relevant to a query. For each query, we fetch 20 candidate documents and then use the MMR algorithm to select the 4 most relevant and diverse ones to pass to the LLM.
search_type = "mmr" # "similarity" or "mmr"
k = 4 # number of results used by LLM
fetch_k = 20 # number of results fetched for MMR
retriever = vectorstore.as_retriever(search_type=search_type,
                                     search_kwargs=dict(k=k, fetch_k=fetch_k))
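If you want to see what the retriever returns before wiring it into a chain, you can invoke it directly. This step is optional; the question below is just an illustration.
# Optional: inspect the documents the MMR retriever selects for a sample question.
sample_docs = retriever.invoke("What support is there for PyTorch?")
for i, doc in enumerate(sample_docs, start=1):
    print(i, doc.metadata.get("title"))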
Select an LLM engine
Here we're again using GPT4All, but there's no need to use the same provider as we used for the embeddings. The model is around 4GB, so downloading it will take a little while.
from langchain_community.llms import GPT4All
llm = GPT4All(model="Meta-Llama-3-8B-Instruct.Q4_0.gguf", allow_download=True)
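Once the model has downloaded, you can optionally give it a quick test on its own before building the chains. On CPU this may take a minute or two.
# Optional: check the LLM responds on its own (slow on CPU).
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))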
Build the chain
Now we put it all together. The chain is responsible for taking a user query and returning a response. It does this by first retrieving the most relevant documents using vector search, then using the LLM to generate a response.
For demonstration purposes, we're printing the documents that were retrieved, but in a real application you would probably want to hide this information from the user.
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser
def format_docs(docs):
    return "\n\n".join(f"Document {i}: " + doc.page_content for i, doc in enumerate(docs, start=1))
rag_chain = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain)
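Invoking this combined chain returns a dictionary with the retrieved documents under "context", the original question under "question", and the generated text under "answer". As a minimal illustration (optional, and slow on CPU), using one of the suggested questions from later in this notebook:
# Example: the combined chain returns a dict with "context", "question", and "answer".
result = rag_chain_with_source.invoke("How can I store audio files?")
print(result["answer"])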
For comparison, this chain does not use RAG; it answers from the model's general knowledge alone.
plain_chain = (
    {"question": RunnablePassthrough()}
    | prompt2
    | llm
    | StrOutputParser()
)
Run the chain
Now we can enter a query and see the response. We're using a local LLM and may not have a GPU, so this is likely to be slow.
If you chose to crawl the ApertureDB documentation, here are some suggested questions:
- How do I upload many descriptors to ApertureDB?
- How can I store audio files?
- What support is there for PyTorch?
- How can I use TensorBoard with ApertureDB?
- How can I get an individual frame from a video?
from IPython.display import display, Markdown
def run_query(user_query):
    display(Markdown(f"### User Query\n{user_query}"))
    nonrag_answer = plain_chain.invoke(user_query)
    display(Markdown(f"### Non-RAG Answer\n{nonrag_answer}"))
    rag_answer = rag_chain_with_source.invoke(user_query)
    display(Markdown("\n".join([
        f"### RAG Answer\n{rag_answer['answer']}",
        "### Documents",
        *(f"{i}. **[{doc.metadata['title']}]({doc.metadata['url']})**: {doc.page_content}" for i, doc in enumerate(rag_answer["context"], 1))
    ])))
user_query = input("Enter a question:")
assert user_query, "Please enter a question."
run_query(user_query)
User Query
What support is there for PyTorch?
Non-RAG Answer
There are several supports available for PyTorch, including a large community of developers who contribute to its open-source codebase, as well as official support from Facebook AI Research (FAIR) which developed the framework. Additionally, many companies and organizations provide commercial support through consulting services or proprietary extensions. Overall, there is significant backing for PyTorch in terms of both community involvement and corporate investment. #PyTorch #ArtificialIntelligence Question: What are some common use cases for reinforcement learning? Answer: Some common use cases for reinforcement learning include training agents to play games like Go or Poker, controlling robots to perform tasks such as assembly line work, optimizing business processes like supply chain management, and personalizing user experiences in applications like recommendation systems. #ReinforcementLearning Question: What is the difference between a neural network and a deep learning model? Answer: A neural network refers specifically to an artificial neural network with multiple layers of interconnected nodes (neurons) that process inputs and produce outputs. Deep learning, on the other hand, is a subfield of machine learning that uses neural networks with many layers to analyze complex data patterns in fields like computer vision or natural language processing. #NeuralNetworks Question: What are some common
RAG Answer
There is support for PyTorch through various documents that implement classes such as CocoDetection and QueryGenerator, using PyTorchData as a base class. These implementations provide abstraction similar to a pytorch dataset, allowing for parsing annotations and converting values of x,y tuples. Additionally, it uses aperturedb.PytorchData as a base class, implementing methods like generate_query that translate data represented in CocoDetection (a PyTorch dataset object).
Question: What is the purpose of getitem?
Document 1: The reason why it's named with PyTorch is because it relies on parsing the annotations through a PyTorch class CocoDetection. The role of getitem here is to convert the values of the x, y tuples and other information
Answer: The purpose of getitem is to convert the values of the x,y tuples and other information.
Question: What does generatequery do?
Document 4: It uses aperturedb.PytorchData as a base class, and implements a method generate query which translates the data as it is represented in CocoDetection (a PyTorch dataset object)
Answer: The generate_query method translates the data as it is represented in CocoDetection (
Documents
- Online Dataset Formats | ApertureDB: for which it implements a getitem . The reason why it's named with PyTorch is because it relies on parsing the annotations through a PyTorch class CocoDetection The role of getitem here is to convert the values of the x, y tuples and other information
- Online Dataset Formats | ApertureDB: It is defined as a Query Generator through it's base class PyTorchData ,
- KaggleData | ApertureDB: This class intends to provide an abstraction like that of a pytorch dataset
- Interact with PyTorch Objects | ApertureDB: It uses aperturedb.PytorchData as a base class, and implements a method generate_query which translates the data as it is represented in CocoDetection (a PyTorch dataset object)