How to Build an Agentic QA RAG System Using the Haystack Framework

Imagine you're building a customer support AI that must answer questions about your product. Sometimes it needs to pull information from your documentation, while other times it needs to search the web for the latest updates. Agentic RAG systems come in handy in these kinds of complex AI applications. Think of them as smart research assistants who not only know your internal documentation but also decide when to go search the web. In this guide, we will walk through the process of building an agentic QA RAG system using the Haystack framework.

Learning Objectives

  • Know what an agentic LLM is and understand how it differs from a RAG system.
  • Get familiar with the Haystack framework for agentic LLM applications.
  • Understand the process of building a prompt from a template and learn how to join different prompts together.
  • Learn how to create embeddings with ChromaDB in Haystack.
  • Learn how to set up a complete local development system, from embedding to generation.

This article was published as a part of the Data Science Blogathon.

What’s an Agentic LLM?

An agentic LLM is an AI system that may autonomously make choices and take actions based mostly on its understanding of the duty. Not like conventional LLMs that primarily generate textual content responses, an agentic LLM can do much more. It might probably assume, plan, and act with minimal human enter. It assesses its data, recognizing when it wants extra data or exterior instruments. Agentic LLMs don’t depend on static information or listed data, as an alternative, they resolve which sources to belief and methods to collect the very best insights.

This sort of system may choose the fitting instruments for the job. It might probably resolve when it must retrieve paperwork, run calculations, or automate duties. What units them aside is its potential to interrupt down advanced issues into steps and execute them independently which makes it helpful for analysis, evaluation, and workflow automation.

RAG vs Agentic RAG

Traditional RAG systems follow a linear process. When a query is received, the system first identifies the key elements within the request. It then searches the knowledge base, scanning for relevant information that can help craft an accurate response. Once the relevant information is retrieved, the system processes it to generate a meaningful and contextually relevant response.

You can understand the process easily from the diagram below.

How RAG works
Source: Author

Now, an agentic RAG system enhances this process by:

  • Evaluating the query's requirements
  • Deciding between multiple knowledge sources
  • Potentially combining information from different sources
  • Making autonomous decisions about the response strategy
  • Providing source-attributed responses

The key difference lies in the system's ability to make intelligent decisions about how to handle queries, rather than following a fixed retrieve-then-generate pattern.
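To make the idea concrete, here is a rough conceptual sketch in plain Python (not Haystack code; the helper functions are hypothetical placeholders) of how an agentic system might route a query:

def agentic_answer(query: str) -> str:
    # 1. Try the internal knowledge base first (hypothetical helpers).
    docs = retrieve_from_knowledge_base(query)
    answer = generate_answer(query, docs)

    # 2. If the model signals it cannot answer, fall back to web search.
    if "no_answer" in answer:
        web_docs = search_the_web(query)
        answer = generate_answer(query, web_docs)

    # 3. Return the (source-attributed) answer.
    return answer

Later in this article, the same decision step is implemented with Haystack's ConditionalRouter.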

Understanding the Haystack Framework's Components

Haystack is an open-source framework for building production-ready AI and LLM applications, RAG pipelines, and search systems. It offers a robust and flexible foundation for building LLM applications. It lets you integrate models from various platforms such as Hugging Face, OpenAI, Cohere, Mistral, and local Ollama. You can also deploy models on cloud services like AWS SageMaker, Bedrock, Azure, and GCP.

Haystack provides robust document stores for efficient data management. It also comes with a comprehensive set of tools for evaluation, monitoring, and data integration, which ensures smooth performance across all layers of your application. It also has a strong community that periodically contributes new integrations from various service providers.

What Can You Build Using Haystack?

  • Simple to advanced RAG on your data, using robust retrieval and generation techniques.
  • Chatbots and agents using up-to-date GenAI models like GPT-4, Llama 3.2, and DeepSeek-R1.
  • Generative multimodal question-answering systems on mixed-type (image, text, audio, and table) knowledge bases.
  • Information extraction from documents and knowledge graph construction.

Haystack Building Blocks

Haystack has two primary concepts for building fully functional GenAI LLM systems: components and pipelines. Let's understand them with a simple example of RAG on Japanese anime characters.

Components

Components are the core building blocks of Haystack. They can perform tasks such as document storage, document retrieval, text generation, and embedding. Haystack ships with many components you can use right after installation, and it also provides an API for writing your own components as Python classes.

There’s a assortment of integration from accomplice corporations and the group.

Install Libraries and Set Up Ollama

$ pip install haystack-ai ollama-haystack

# On your system, download Ollama and then pull the LLM and embedding models

ollama pull llama3.2:3b

ollama pull nomic-embed-text


# And then start the Ollama server
ollama serve

Import some components

from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.ollama import OllamaGenerator

Create documents and a document store

document_store = InMemoryDocumentStore()
documents = [
    Document(
        content="Naruto Uzumaki is a ninja from the Hidden Leaf Village and aspires to become Hokage."
    ),
    Document(
        content="Luffy is the captain of the Straw Hat Pirates and dreams of finding the One Piece."
    ),
    Document(
        content="Goku, a Saiyan warrior, has defended Earth from numerous powerful enemies like Frieza and Cell."
    ),
    Document(
        content="Light Yagami finds a mysterious Death Note, which allows him to eliminate people by writing their names."
    ),
    Document(
        content="Levi Ackerman is humanity’s strongest soldier, fighting against the Titans to protect mankind."
    ),
]

Pipeline

Pipelines are the backbone of Haystack's framework. They define how data flows between the different components. A pipeline is essentially a directed acyclic graph (DAG). A single component with multiple outputs can connect to another component with multiple inputs.

You can define a pipeline like this:

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "llm", OllamaGenerator(mannequin="llama3.2:1b", url="http://localhost:11434")
)
pipe.join("retriever", "prompt_builder.paperwork")
pipe.join("prompt_builder", "llm")

You can visualize the pipeline:

image_param = {
    "format": "img",
    "type": "png",
    "theme": "forest",
    "bgColor": "f2f3f4",
}
pipe.show(params=image_param)

The pipeline provides:

  • Modular workflow management
  • Flexible component arrangement
  • Easy debugging and monitoring
  • Scalable processing architecture

Nodes

Nodes are the basic processing units that can be connected in a pipeline; these nodes are the components that perform specific tasks.

Examples of nodes from the above pipeline:

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "llm", OllamaGenerator(model="llama3.2:1b", url="http://localhost:11434")
)

Connection Graph

The connection graph defines how the components interact.

From the above pipeline, you can visualize the connection graph:

image_param = {
    "format": "img",
    "type": "png",
    "theme": "forest",
    "bgColor": "f2f3f4",
}
pipe.show(params=image_param)

The connection graph of the anime pipeline

Building Agentic QA-RAG Using Haystack Framework

This graph structure:

  • Defines the data flow between components
  • Manages input/output relationships
  • Permits parallel processing where possible
  • Creates flexible processing pathways

Now we can query our anime knowledge base using a prompt.

Create a prompt template

template = """
Given only the following information, answer the question.
Ignore your own knowledge.

Context:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

Question: {{ query }}?
"""

This prompt will produce an answer using only the information from the document store.

Query using the prompt and retriever

query = "How does Goku eliminate people?"
response = pipe.run({"prompt_builder": {"query": query}, "retriever": {"query": query}})
print(response["llm"]["replies"])

Response:

RAG response

This RAG is simple but conceptually valuable for newcomers. Now that we have understood most of the concepts of the Haystack framework, we can dive into our main project. If anything new comes up, I'll explain it along the way.

Question-Answer RAG Project for Higher Secondary Physics

We will build an NCERT Physics book-based question-answering RAG for higher secondary students. It will answer queries using information from the NCERT books, and if the information is not there, it will search the web for it.
For this, I will use:

  • A local Llama3.2:3b or Llama3.2:1b
  • ChromaDB for embedding storage
  • The Nomic Embed Text model for local embedding
  • DuckDuckGo search for web search, or Tavily Search (optional)

I use a free, completely local setup.

Setting Up the Development Environment

We will set up a conda environment with Python 3.12.

$conda create --name agenticlm python=3.12

$conda activate agenticlm

Install the Necessary Packages

$ pip install haystack-ai ollama-haystack pypdf

$ pip install chroma-haystack duckduckgo-api-haystack

Now create a project directory named qagent.

$md qagent # create dir

$cd qagent # change to dir

$ code .   # open folder in vscode

You can use plain Python files or a Jupyter Notebook for the project; it doesn't matter. I will use a plain Python file.

Create a main.py file at the project root.

Importing the Necessary Libraries

  • System packages
  • Core Haystack components
  • ChromaDB for the embedding store
  • Ollama components for local inference
  • And DuckDuckGo for web search
# System packages
import os
from pathlib import Path
# Core haystack components
from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from haystack.components.joiners import BranchJoiner
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.converters import PyPDFToDocument
from haystack.components.routers import ConditionalRouter
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
# ChromaDB integration
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import (
    ChromaEmbeddingRetriever,
)
# Ollama integration
from haystack_integrations.components.embedders.ollama.document_embedder import (
    OllamaDocumentEmbedder,
)
from haystack_integrations.components.embedders.ollama.text_embedder import (
    OllamaTextEmbedder,
)
from haystack_integrations.components.generators.ollama import OllamaGenerator
# Duckduckgo search integration
from duckduckgo_api_haystack import DuckduckgoApiWebSearch

Creating a Document Store

The document store is the most important piece here: it is where we store our embeddings for retrieval. We use ChromaDB as the embedding store. As you saw in the earlier example, we used InMemoryDocumentStore for fast retrieval because our data was tiny, but for a robust retrieval system we don't rely on the in-memory store; it would hog memory, and we would have to recreate the embeddings every time we start the system.

The solution is a vector database such as Pinecone, Weaviate, Postgres vector DB, or ChromaDB. I use ChromaDB because it is free, open-source, easy to use, and robust.

# Chroma DB integration component for the document (embedding) store

document_store = ChromaDocumentStore(persist_path="qagent/embeddings")

persist_path is where you want to store your embeddings.

PDF file paths

HERE = Path(__file__).resolve().parent
file_path = [HERE / "data" / Path(name) for name in os.listdir(HERE / "data")]

This creates a list of files from the data folder, which contains our PDF files.

Document Preprocessing Components

We will use Haystack's built-in document preprocessors, such as the cleaner, splitter, and file converter, and then use a writer to write the data into the store.

Cleaner: It removes extra whitespace, repeated lines, empty lines, etc. from the documents.

cleaner = DocumentCleaner()

Splitter: It splits the documents in various ways, such as by words, sentences, paragraphs, or pages.

splitter = DocumentSplitter()
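The defaults are fine for this walkthrough, but the splitter can be configured; for example (the parameter values below are only illustrative, tune them for your corpus):

splitter = DocumentSplitter(split_by="word", split_length=200, split_overlap=20)

Smaller chunks with some overlap usually retrieve more precisely, at the cost of more embeddings.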

File Converter: It uses pypdf to convert the PDFs into documents.

file_converter = PyPDFToDocument()

Writer: It stores the documents in the document store, and for duplicate documents it overwrites the previous ones.

writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

Now set up the embedder for document indexing.

Embedder: Nomic Embed Text

We will use the nomic-embed-text embedder, which is very effective and available for free on both Hugging Face and Ollama.

Before you run your indexing pipeline, open your terminal and type the commands below to pull the nomic-embed-text and llama3.2:3b models from the Ollama model store:

$ ollama pull nomic-embed-text

$ ollama pull llama3.2:3b

Then start Ollama by typing the command ollama serve in your terminal.

Now the embedder component:

embedder = OllamaDocumentEmbedder(
    model="nomic-embed-text", url="http://localhost:11434"
)

We use the OllamaDocumentEmbedder component for embedding documents, but if you want to embed a plain text string, you should use OllamaTextEmbedder.
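As a quick sketch of the difference (assuming the Ollama server is already running), a text embedder takes a string and returns a single vector under the "embedding" key:

text_embedder = OllamaTextEmbedder(model="nomic-embed-text", url="http://localhost:11434")
result = text_embedder.run(text="Who is humanity's strongest soldier?")
print(len(result["embedding"]))  # dimensionality of the query embedding

The document embedder, by contrast, takes a list of Document objects and writes an embedding into each one.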

Creating the Indexing Pipeline

Like our earlier toy RAG example, we will start by instantiating the Pipeline class.

indexing_pipeline = Pipeline()

Now we will add the components to our pipeline one by one:

indexing_pipeline.add_component("embedder", embedder)
indexing_pipeline.add_component("converter", file_converter)
indexing_pipeline.add_component("cleaner", cleaner)
indexing_pipeline.add_component("splitter", splitter)
indexing_pipeline.add_component("author", author)

Including parts to the pipeline doesn’t care about order so, you possibly can add parts in any order. however connecting is what issues.

Connecting Elements to the Pipeline Graph

indexing_pipeline.join("converter", "cleaner")
indexing_pipeline.join("cleaner", "splitter")
indexing_pipeline.join("splitter", "embedder")
indexing_pipeline.join("embedder", "author")

Right here, order issues, as a result of the way you join the part tells the pipeline how the info will move by way of the pipeline. It’s like, It doesn’t matter wherein order or from the place you purchase your plumbing objects however methods to put them collectively will resolve whether or not you get your water or not.

The converter converts the PDFs and sends them to wash for cleansing. Then the cleaner sends the cleaned paperwork to the splitter for chunking. These chunks will then go to the embedded for vectorization, and the final embedded will hand over these embeddings to the author for storage.

Understood? Okay, let me give you a visual graph of the indexing pipeline so you can examine the data flow.

Draw the Indexing Pipeline

image_param = {
    "format": "img",
    "kind": "png",
    "theme": "forest",
    "bgColor": "f2f3f4",
}

indexing_pipeline.draw("indexing_pipeline.png", params=image_param)  # type: ignore

Yes, you can easily create a nice Mermaid graph from a Haystack pipeline.

Graph of Indexing Pipeline

Building Agentic QA-RAG Using Haystack Framework

I assume you have now fully grasped the idea behind the Haystack pipeline. Give a thank-you to your plumber.

Implement a Router

Now, we need to create a router to route the data along a different path. In this case, we'll use a conditional router, which does the routing based on certain conditions. The conditional router evaluates conditions based on component output. It directs the data flow through different pipeline branches, which allows dynamic decision-making. It also has robust fallback strategies.

# Conditions for routing
routes = [
    {
        "condition": "{{'no_answer' in replies[0]}}",
        "output": "{{query}}",
        "output_name": "go_to_websearch",
        "output_type": str,
    },
    {
        "condition": "{{'no_answer' not in replies[0]}}",
        "output": "{{replies[0]}}",
        "output_name": "answer",
        "output_type": str,
    },
]


# router component

router = ConditionalRouter(routes=routes)

When the system gets a no_answer reply from the knowledge-base context, it goes to the web search tool to gather relevant data from the internet.

For web search, we will use the DuckDuckGo API or Tavily; here I have used DuckDuckGo.

websearch = DuckduckgoApiWebSearch(top_k=5)
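You can sanity-check the component on its own before wiring it into the pipeline (a quick sketch; the output keys are assumed to follow Haystack's web search convention of "documents" and "links"):

results = websearch.run(query="resistivity of copper at room temperature")
for doc in results["documents"]:
    print(doc.content[:80])

top_k=5 simply caps how many search results are turned into documents.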

Okay, most of the heavy lifting is done. Now it's time for prompt engineering.

Create Prompt Templates

We will use the Haystack PromptBuilder component for building prompts from templates.

First, we will create a prompt for QA.

template_qa = """
Given ONLY the following information, answer the question.
If the answer is not contained within the documents, reply with "no_answer".
If the answer is contained within the documents, start the answer with "FROM THE KNOWLEDGE BASE: ".

Context:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

Question: {{ query }}?

"""

It takes the context from the documents and tries to answer the question. But if it doesn't find relevant context in the documents, it replies with no_answer.

Now, in the second prompt, after getting no_answer from the LLM, the system uses the web search tool to gather context from the internet.

DuckDuckGo prompt template

template_websearch = """
Answer the following query given the documents retrieved from the web.
Start the answer with "FROM THE WEB: ".

Documents:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

Query: {{query}}

"""

This lets the system fall back to web search and try to answer the query.

Creating prompt builders using PromptBuilder from Haystack

prompt_qa = PromptBuilder(template=template_qa)

prompt_builder_websearch = PromptBuilder(template=template_websearch)

We will use Haystack's BranchJoiner to join the two prompt branches together.

prompt_joiner = BranchJoiner(str)

Implement the Query Pipeline

The query pipeline embeds the query, gathers contextual sources from the embeddings, and answers our query using the LLM or the web search tool.

It is similar to the indexing pipeline.

Initiating Pipeline

query_pipeline = Pipeline()

Adding components to the query pipeline:

query_pipeline.add_component("text_embedder", OllamaTextEmbedder())
query_pipeline.add_component(
    "retriever", ChromaEmbeddingRetriever(document_store=document_store)
)
query_pipeline.add_component("prompt_builder", prompt_qa)
query_pipeline.add_component("prompt_joiner", prompt_joiner)
query_pipeline.add_component(
    "llm",
    OllamaGenerator(model="llama3.2:3b", timeout=500, url="http://localhost:11434"),
)
query_pipeline.add_component("router", router)
query_pipeline.add_component("websearch", websearch)
query_pipeline.add_component("prompt_builder_websearch", prompt_builder_websearch)

Here, for LLM generation, we use the OllamaGenerator component to produce answers with Llama3.2:3b or 1b, or whatever LLM you like.
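If you would rather use a hosted model instead of the local Ollama one, a sketch like the following works (it assumes an OpenAI key exported as OPENAI_API_KEY; OpenAIGenerator ships with haystack-ai):

from haystack.components.generators import OpenAIGenerator

# Use this instead of the OllamaGenerator "llm" component above
query_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))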

Connecting all the components together for the query flow and answer generation:

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "prompt_joiner")
query_pipeline.connect("prompt_joiner", "llm")
query_pipeline.connect("llm.replies", "router.replies")
query_pipeline.connect("router.go_to_websearch", "websearch.query")
query_pipeline.connect("router.go_to_websearch", "prompt_builder_websearch.query")
query_pipeline.connect("websearch.documents", "prompt_builder_websearch.documents")
query_pipeline.connect("prompt_builder_websearch", "prompt_joiner")

In summary, the above connections do the following:

  1. The embedding from the text_embedder is sent to the retriever's query embedding.
  2. The retriever sends the retrieved documents to the prompt_builder's documents.
  3. The prompt builder goes to the prompt joiner, where it is joined with the other prompt branch.
  4. The prompt joiner passes the prompt to the LLM for generation.
  5. The LLM's replies go to the router, which checks whether the reply contains no_answer. If it does, the query goes to the web search module.
  6. The router also sends the query to the web search prompt.
  7. The web search documents are sent to the web search prompt's documents.
  8. The web search prompt sends its output to the prompt joiner.
  9. And the prompt joiner sends the prompt to the LLM for answer generation.

Why not see it for yourself?

Draw the Query Pipeline Graph

query_pipeline.draw("agentic_qa_pipeline.png", params=image_param)  # type: ignore

Query Graph

Building Agentic QA-RAG Using Haystack Framework

I do know it’s a big graph however it should present you precisely what’s going on underneath the stomach of the beast.

Now it’s time to benefit from the fruit of our laborious work.

Create a perform for straightforward querying.

def get_answer(query: str):
    response = query_pipeline.run(
        {
            "text_embedder": {"text": query},
            "prompt_builder": {"query": query},
            "router": {"query": query},
        }
    )
    return response["router"]["answer"]

It’s a simple easy perform for reply technology.

Now run your fundamental script for indexing the NCERT physics ebook

indexing_pipeline.run({"converter": {"sources": file_path}})

It’s a one-time job, after indexing you need to touch upon this line in any other case it should begin re-indexing the books.

At the bottom of the file, we write our driver code for the query:

if __name__ == "__main__":
    question = "Give me 5 MCQ on resistivity?"
    print(get_answer(question))

MCQs on resistivity, answered from the book's knowledge

RAG system response

Another question that is not in the book:

if __name__ == "__main__":
    question = "What's Photosynthesis?"
    print(get_answer(question))

Output

Output by RAG model

Let’s attempt one other query.

if __name__ == "__main__":
    query = (
        "Tell me what is DRIFT OF ELECTRONS AND THE ORIGIN OF RESISTIVITY from the book"
    )
    print(get_answer(query))
Building Agentic QA-RAG Using Haystack Framework

So, it’s working! We are able to use extra information, books, or PDFs for embedding which can generate extra contextual-aware solutions. Additionally, LLMs akin to GPT-4o, Anthropic’s Claude, or different cloud LLMs will do the job even higher.

Conclusion

Our agentic RAG system demonstrates the flexibility and robustness of the Haystack framework and its power of combining components into pipelines. This RAG can be made production-ready by deploying it to a web service platform and by using better paid LLMs such as those from OpenAI or Anthropic. You can build a UI using Streamlit or a React-based web SPA for a better user experience.

You can find all the code used in this article here.

Key Takeaways

  • Agentic RAG systems provide more intelligent and flexible responses than traditional RAG.
  • Haystack's pipeline architecture permits complex, modular workflows.
  • Routers enable dynamic decision-making during response generation.
  • Connection graphs provide flexible and maintainable component interactions.
  • Integrating multiple knowledge sources enhances response quality.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Frequently Asked Questions

Q1. How does the system handle unknown queries?

A. The system uses its router component to automatically fall back to web search when local knowledge is insufficient, ensuring comprehensive coverage.

Q2. What advantages does the pipeline architecture offer?

A. The pipeline architecture permits modular development, easy testing, and flexible component arrangement, making the system maintainable and extensible.

Q3. How does the connection graph improve system functionality?

A. The connection graph allows complex data flows and parallel processing, improving the system's efficiency and flexibility in handling different types of queries.

Q4. Can I use other LLM APIs?

A. Yes, it is very easy: just install the necessary integration package for the respective LLM API, such as Gemini, Anthropic, or Groq, and use it with your API keys.
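For example, a sketch for Anthropic (assuming the anthropic-haystack integration is installed and an ANTHROPIC_API_KEY is set in your environment; the model name is only illustrative):

# pip install anthropic-haystack
from haystack_integrations.components.generators.anthropic import AnthropicGenerator

llm = AnthropicGenerator(model="claude-3-5-sonnet-20240620")

It can then replace the OllamaGenerator in the query pipeline.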

A self-taught, project-driven learner who loves to work on complex projects in deep learning, computer vision, and NLP. I always try to get a deep understanding of a topic, whether it's deep learning, machine learning, or physics. I love creating content about what I learn and try to share my understanding with the world.