MAS is all you need: supercharge your RAG with a Multi-Agent System

AG2 (previously known as AutoGen) is a revolutionary open-source programming framework designed to facilitate the development of AI agents and to enhance collaboration among multiple agents tackling complex tasks. Its main goal is to simplify the creation and evaluation of agentic AI. While the official AG2 website claims that the framework lets you "build production-ready multi-agent systems in minutes," I personally believe that there is still some work needed before it can be considered fully production-ready. Nevertheless, it is undeniable that AG2 provides a very user-friendly environment for building research-oriented experiments. It is worth emphasizing that there are many other frameworks available for creating multi-agent systems, for example Letta, LangGraph, CrewAI, etc.

In this tutorial we are going to implement a MAS with:

  • Human → a proxy for human input.
  • Agent Ingestion → responsible for ingesting knowledge from text files or directly from text inputs.
  • Agent Retrieve → responsible for extracting relevant information from the internal database to help the other agents answer user questions.
  • Agent Answer → responsible for answering user queries using the information retrieved by Agent Retrieve.
  • Agent Router → responsible for facilitating communication between the human user and the other agents.

The human will interact only with Agent Router, which will be responsible for an internal chat group that includes Agent Retrieve, Agent Answer and Agent Ingestion. The agents inside the chat group collaborate, combining their knowledge and tools, to provide the best possible answer.

# Agents' Topology

Human <-> Agent Router <-> [Agent Ingestion, Agent Retrieve, Agent Answer]

The complete code for the MA-RAG (Multi-Agent Retrieval-Augmented Generation) system can be found in the mas.py file. In this section, we will discuss some key components and features of the code that are particularly noteworthy.

Agents Definition

To define an agent in AG2, we use the ConversableAgent() class. For instance, to define Agent Ingestion:

agent_ingestion = ConversableAgent(
    name="agent_ingestion",
    system_message=SYSTEM_PROMPT_AGENT_INGESTION,
    description=DESCRIPTION_AGENT_INGESTION,
    llm_config=llm_config,
    human_input_mode="NEVER",
    silent=False,
)

We specify:

  • a name (agent_ingestion);
  • the system prompt that defines the agent (SYSTEM_PROMPT_AGENT_INGESTION is a variable defined in prompts.py);
SYSTEM_PROMPT_AGENT_INGESTION = '''

You are the **Ingestion Agent** tasked with acquiring new knowledge from various sources. Your primary responsibility is to ingest information from text files or directly from text inputs.

### Key Guidelines:
- **No New Information**: You do not contribute new information to conversations; your role is strictly to ingest and store knowledge.
- **Evaluation of Knowledge**: Before ingesting any new knowledge, carefully assess whether the information provided is genuinely novel and relevant.
- **Step-by-Step Approach**: Take a moment to reflect and approach each task methodically. Breathe deeply and focus on the process.

### Tools Available:
1. **`path_to_db()`**: Use this tool to ingest knowledge from a specified text file.
2. **`text_to_db()`**: Use this tool to ingest knowledge directly from the provided text.

Your mission is to enrich the database with accurate and relevant information while ensuring that you adhere to the guidelines above.

'''

  • the description that will help during the routing of messages (DESCRIPTION_AGENT_INGESTION is a variable defined in prompts.py);
DESCRIPTION_AGENT_INGESTION = '''

I am the **Ingestion Agent** responsible for acquiring new knowledge from text files or directly from user-provided text.

'''

  • the configuration for the LLM;
llm_config = {
    "config_list": [
        {
            "model": "gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "temperature": 0.7,
        }
    ]
}
  • whether to ask for human input every time a message is received (by setting human_input_mode = "NEVER" the agent will never prompt for human input);
  • whether to suppress printing of the messages sent (silent).

Similarly, we can define all the other agents (human, agent_retrieve, agent_answer, agent_router).
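For instance, a minimal sketch of how the human proxy could look is shown below; the exact prompts and options used in mas.py may differ, so treat this as an assumption rather than the actual definition:

# Minimal sketch (assumed configuration): a proxy agent that simply relays
# what the person types, with no LLM behind it.
from autogen import ConversableAgent

human = ConversableAgent(
    name="human",
    system_message="A human user interacting with the multi-agent system.",
    llm_config=False,            # no LLM: this agent only forwards human input
    human_input_mode="ALWAYS",   # always prompt the person for the next message
)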

Adding Tools

So far, we have defined the various agents; however, as they are currently configured, these agents can only receive text inputs and respond with text outputs. They are not equipped to perform more complex tasks that require specific tools. For instance, an agent in its current state cannot access the database we created in the first part of this tutorial to perform searches.

Tools. Photo by Kajetan Sumila on Unsplash

To enable this functionality, we need to "tell" the agent that it has access to a tool capable of performing certain tasks. Our choice to implement the tools deterministically, rather than asking the agent to figure them out on its own, is based on efficiency and reliability. A deterministic approach reduces the likelihood of errors, since the process can be clearly defined and coded. On the other hand, we still give the agent the responsibility and autonomy to select which tool to use, determine the parameters for its use, and decide how to combine multiple tools to handle complex requests. This balance between guidance and autonomy enhances the agent's capabilities while maintaining a structured approach.

I hope it’s clear by now that, opposite to the claims made by many non-experts who recommend that brokers are “so clever” that they will effortlessly deal with advanced duties, there may be really a major quantity of labor taking place behind the scenes. The foundational instruments that brokers depend on require cautious examine, implementation, and testing. Nothing happens “automagically,” even within the realm of generative AI. Understanding this distinction is essential for appreciating the complexity and energy concerned in growing efficient AI programs. Whereas these brokers can carry out spectacular duties, their capabilities are the results of meticulous engineering and considerate design quite than innate intelligence.

Remember the functions text_to_db() and path_to_db() we created earlier for the ingestion? We can "register" them with Agent Ingestion in this way:

register_function(
    path_to_db,
    caller=agent_ingestion,
    executor=agent_ingestion,
    name="path_to_db",
    description="Ingest new knowledge from a text file given its path.",
)

register_function(
    text_to_db,
    caller=agent_ingestion,
    executor=agent_ingestion,
    name="text_to_db",
    description="Ingest new knowledge from a piece of conversation.",
)

Similarly, we can add the retrieve tool to Agent Retrieve:

register_function(
    retrieve_str,
    caller=agent_retrieve,
    executor=agent_retrieve,
    name="retrieve_str",
    description="Retrieve useful information from the internal DB.",
)

MAS Topology

So far, we have defined each agent, their roles, and the tools they can use. What remains is how these agents are organized and how they communicate with one another. We aim to create a topology in which the Human interacts with the Agent Router, which then participates in a nested chat group with the other agents. This group collaborates to address the human query, autonomously determining the order of operations, selecting the appropriate tools, and formulating responses. In this setup, the Agent Router acts as a central coordinator that directs the flow of information among the agents (Agent Ingestion, Agent Retrieve, and Agent Answer). Each agent has a specific function: Agent Ingestion processes incoming data, Agent Retrieve accesses relevant information from the database, and Agent Answer proposes the final response based on the gathered insights.

To create a group chat, we can use the GroupChat() class.

group_chat = GroupChat(
    agents=[
        agent_router,
        agent_ingestion,
        agent_retrieve,
        agent_answer,
    ],
    messages=[],
    send_introductions=False,
    max_round=10,
    speaker_selection_method="auto",
    speaker_transitions_type="allowed",
    allowed_or_disallowed_speaker_transitions={
        agent_router: [agent_ingestion, agent_retrieve, agent_answer],
        agent_ingestion: [agent_router],
        agent_retrieve: [agent_answer],
        agent_answer: [agent_router],
    },
)

In this instantiation, we list the agents that will be part of the group (agents), decide that they do not need to introduce themselves at the beginning of the chat (send_introductions), set the maximum number of conversation rounds to 10 (max_round), delegate the selection of the speaker at each round to the chat manager (speaker_selection_method), and constrain the conversation transitions to a particular scheme (allowed_or_disallowed_speaker_transitions).

Once the group is created, we need a group manager that handles the order of the conversation:

group_chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    silent=False,
    is_termination_msg=lambda msg: "(to human)" in msg["content"].lower(),
)

It is important to note the lambda function used for the is_termination_msg parameter. This function determines when the chat should terminate by checking whether the last message contains the substring "(to human)". This mechanism is crucial because the system prompt for the Agent Router specifies: "Clearly indicate your message's intended recipient. For example, use (to human) when addressing the user." This approach provides a clear signal for when to exit the nested chat and return a response to the human user.
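To make the mechanism concrete, here is an illustrative fragment of what SYSTEM_PROMPT_AGENT_ROUTER could contain; only the quoted routing instruction comes from the actual prompt, the rest of the wording is an assumption:

# Illustrative fragment (assumption): the real prompt lives in prompts.py.
SYSTEM_PROMPT_AGENT_ROUTER = '''
You are the **Router Agent**, the single point of contact between the human user
and the internal group of agents.

### Key Guidelines:
- Forward the user's requests to the internal group and report back the final answer.
- Clearly indicate your message's intended recipient. For example, use (to human) when addressing the user.
'''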

Now, we need to make the group chat we have just created a nested chat that starts from Agent Router.

nested_chats = [
    {
        "recipient": group_chat_manager,
        "summary_method": "last_msg",
    }
]

agent_router.register_nested_chats(
    nested_chats,
    trigger=lambda sender: sender in [human],
)

By leveraging a structured communication framework and predefined transitions between agents, we ensure efficient collaboration among the agents while at the same time allowing flexibility in decision-making.

Let’s begin chatting

We’re actually prepared now. To begin chatting with Agent Router:

chat_results = human.initiate_chat(
    agent_router,
    message=input("Ciao! How can I help you today? 😊 : "),
    max_turns=100,
)

Or, we can run the Python script directly with:

python mas.py

and it’s best to see one thing like

$ python mas.py                                                                                     [conda:mas]
Ciao! How can I help you today? 😊 :

Just type your message and you will receive the reply from Agent Router.

Right here you’ll find an instance of dialog. The chat began with the query

[Human]

Is there a university in Philadelphia, and was it established before the University of California, Berkeley?

and the answer given by Agent Router was

[Agent Router]

Sure, there’s a college in Philadelphia referred to as the **College of Pennsylvania** (generally known as Penn or UPenn). It’s a non-public Ivy League analysis college and was chartered previous to the U.S. Declaration of Independence, making it one of many oldest establishments of upper training in the US. The college first convened its board of trustees in 1749, which positions it because the fourth-oldest establishment, although some argue it could be the fifth-oldest because of the institution dates of different establishments. 📚

Alternatively, the **College of California, Berkeley** was based in **1868**.

Given these dates, the College of Pennsylvania was established considerably sooner than the College of California, Berkeley. 🎓

**Sources**:
1. College of Pennsylvania Overview (final up to date: 12 January 2025)
2. College of California, Berkeley Overview (final up to date: 12 January 2025)

Note that the direct answer to the initial question was not provided in the text files we ingested at the beginning. In fact, if we look at the log of the nested chat at line 74, we can see that inside the nested chat Agent Retrieve autonomously makes two different queries with the retrieve tool, asking for information about the university in Philadelphia and UC Berkeley, and then Agent Answer proposes the answer based on the retrieved chunks.

A similar mechanism also comes into play when we add a new data source (remember the file about Turin we left out previously?):

[Human]

What do you know about the University of Turin?

[Agent Router]

🤔 It appears I couldn’t discover particular details about the College of Turin. The main points I retrieved have been primarily about different universities just like the College of Barcelona and UC Berkeley. If there’s a specific facet of the College of Turin you’re inquisitive about, or should you’d like info on one thing else, simply let me know! 😊

[Human]

Information about Turin can be found in kb/cities/europe/turin.txt

[Agent Router]

Here is some information about Turin:

### Overview

**Location**: Turin (Italian: Torino) is a city in Northern Italy, …

[…]

In the background, what happened was that during the first interaction in the nested chat, Agent Answer realized that none of the retrieved information pertained to Turin (line 163). However, the second time, when the agents detected a new source of information, Agent Ingestion was called upon (line 179) to process the new data and save it in the database. This dynamic illustrates the collaborative nature of the agents within the system. Initially, Agent Answer's inability to find relevant information highlights the importance of thorough data retrieval processes. Once a new source was identified, Agent Ingestion swiftly stepped in to ensure that the valuable information was incorporated into the database, thereby enhancing the agents' ability to respond effectively in future interactions.

You can find more examples in the log file, where we tested how the system is resilient to external contradictory statements (line 34) and how new information coming directly from the conversation is saved by Agent Ingestion (lines 54, 352).

Beyond Toy MA-RAG

We have explored how to build a RAG system based on a Multi-Agent paradigm. What we presented is, of course, a simplification of how such a system needs to function in a production environment. We deliberately left out many important aspects (such as guardrails, token consumption, chat interface design, authentication, etc.), and there are numerous areas that require significant improvement. For instance, a complete pipeline for data ingestion and knowledge base updates is necessary, as well as better information retrieval techniques that could leverage graph-based approaches rather than relying solely on embedding similarity. Moreover, the topology of the agents can be as complex as desired. For example, multiple chat groups could be created, each specialized in a particular aspect of the overall pipeline. Additionally, we could introduce oversight/judge roles to critically assess proposed plans and solutions. The possibilities are virtually limitless, and finding the right solution for a specific use case can be a form of art in itself.

The rapid rise in popularity of MAS certainly has elements of a bubble, but it is also driven by the potential of such systems to tackle complex tasks that were previously impossible. At present, we are still in an early phase of this technology, although platforms are emerging to facilitate the creation of MAS. Reflecting on this tutorial, it is evident that, along with the capabilities of LLMs, the management of the knowledge base is fundamentally important for a RAG system, even when enhanced by a MAS.

Moreover, while MAS unlocks new capabilities, it also introduces complexity in programming such systems. As we increase the number of agents linearly, the number of interactions between them can potentially grow quadratically. With every interaction comes the risk of ambiguities and inefficiencies that may propagate into subsequent interactions. In summary, there are numerous opportunities but also significant new risks. What we can do is try to understand these systems deeply, so that we are prepared for both their challenges and their possibilities.