GenAI models are good at a handful of tasks such as text summarization, question answering, and code generation. If you have a business process that can be broken down into a set of steps, and one or more of those steps involves one of these GenAI superpowers, then you will be able to partially automate your business process using GenAI. We call the software application that automates such a step an agent.
While agents use LLMs just to process text and generate responses, this basic capability can provide quite advanced behavior, such as the ability to invoke backend services autonomously.
Let’s say that you want to build an agent that is able to answer questions such as “Is it raining in Chicago?”. You cannot answer a question like this using just an LLM because it is not a task that can be performed by memorizing patterns from large volumes of text. Instead, to answer this question, you’ll need to reach out to real-time sources of weather information.
There is an open and free API from the US National Weather Service (NWS) that provides the short-term weather forecast for a location. However, using this API to answer a question like “Is it raining in Chicago?” requires several additional steps (see Figure 1):
- We will need to set up an agentic framework to coordinate the rest of these steps.
- What location is the user interested in? The answer in our example sentence is “Chicago”. It is not as simple as extracting the last word of the sentence: if the user were to ask “Is Orca Island hot today?”, the location of interest would be “Orca Island”. Because extracting the location from a question requires being able to understand natural language, you can prompt an LLM to identify the location the user is interested in.
- The NWS API operates on latitudes and longitudes. If you want the weather in Chicago, you’ll have to convert the string “Chicago” into a point latitude and longitude and then invoke the API. This is called geocoding. Google Maps has a Geocoder API that, given a place name such as “Chicago”, will respond with the latitude and longitude. Tell the agent to use this tool to get the coordinates of the location.
- Send the location coordinates to the NWS weather API. You’ll get back a JSON object containing weather data.
- Tell the LLM to extract the corresponding weather forecast (for example, if the question is about now, tonight, or next Monday) and add it to the context of the question.
- Based on this enriched context, the agent is finally able to answer the user’s question.
Let’s go through these steps one by one.
First, we will use Autogen, an open-source agentic framework created by Microsoft. To follow along, clone my Git repository and get API keys following the directions provided by Google Cloud and OpenAI. Then switch to the genai_agents folder and update the keys.env file with your keys:
GOOGLE_API_KEY=AI…
OPENAI_API_KEY=sk-…
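The scripts read these keys from the process environment. As a minimal sketch of one way to get them there using only the standard library (the helper name `load_env_file` is hypothetical, and the repository may well rely on a package such as python-dotenv instead):

```python
import os
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines from a file and export them to os.environ.

    Hypothetical helper for illustration. Blank lines, lines starting
    with '#', and lines without '=' are skipped; quotes around a value
    are stripped.
    """
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        os.environ[key] = value
        loaded[key] = value
    return loaded
```

Calling `load_env_file("keys.env")` before building the LLM configuration would make `os.environ.get("OPENAI_API_KEY")` return the stored key.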
Next, install the required Python modules using pip:
pip install -r requirements.txt
This will install the autogen module and client libraries for Google Maps and OpenAI.
Follow the discussion below by looking at ag_weather_agent.py.
Autogen treats agentic tasks as a conversation between agents. So, the first step in Autogen is to create the agents that will perform the individual steps. One will be the proxy for the end-user. It will initiate chats with the AI agent that we will refer to as the Assistant:
import autogen
from autogen import AssistantAgent, UserProxyAgent

user_proxy = UserProxyAgent("user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    # Stop the conversation once the Assistant's reply contains TERMINATE
    is_termination_msg=lambda x: autogen.code_utils.content_str(x.get("content")).find("TERMINATE") >= 0,
    human_input_mode="NEVER",
)
There are three things to note about the user proxy above:
- If the Assistant responds with code, the user proxy is capable of executing that code in a sandbox (here, a local coding working directory, since use_docker is False).
- The user proxy terminates the conversation if the Assistant response contains the word TERMINATE. This is how the LLM tells us that the user query has been fully answered. Making the LLM do this is part of the hidden system prompt that Autogen sends to the LLM.
- The user proxy never asks the end-user any follow-up questions. If there were follow-ups, we would specify the condition under which the human is asked for more input.
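The lambda passed as is_termination_msg is dense, so here is a rough equivalent written as a named function in plain Python. It is a simplified stand-in, not Autogen's own logic: it assumes the message is a dict whose "content" is a string or None, and it does not handle the list-style multimodal content that autogen.code_utils.content_str covers.

```python
def is_termination_msg(message: dict) -> bool:
    """Return True when the Assistant signals that it is done.

    Simplified stand-in for the lambda above: a missing or None
    "content" is treated as an empty string.
    """
    content = message.get("content") or ""
    return "TERMINATE" in content
```

Writing the check this way also makes it easy to test the stopping condition without starting a chat.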
Even though Autogen is from Microsoft, it is not limited to Azure OpenAI. The AI assistant can use OpenAI:
openai_config = {
    "config_list": [
        {
            "model": "gpt-4",
            "api_key": os.environ.get("OPENAI_API_KEY")
        }
    ]
}
or Gemini:
gemini_config = {
    "config_list": [
        {
            "model": "gemini-1.5-flash",
            "api_key": os.environ.get("GOOGLE_API_KEY"),
            "api_type": "google"
        }
    ],
}
Anthropic and Ollama are supported as well.
Supply the appropriate LLM configuration to create the Assistant:
assistant = AssistantAgent(
"Assistant",
llm_config=gemini_config,
max_consecutive_auto_reply=3
)
Before we wire up the rest of the agentic framework, let’s ask the Assistant to answer our sample query.
response = user_proxy.initiate_chat(
assistant, message=f"Is it raining in Chicago?"
)
print(response)
The Assistant responds with this code to reach out to an existing Google web service and scrape the response:
```python
# filename: weather.py
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?q=weather+chicago"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
weather_info = soup.find('div', {'id': 'wob_tm'})
print(weather_info.text)
```
This gets at the power of an agentic framework when powered by a frontier foundational model: the Assistant has autonomously figured out a web service that provides the desired functionality and is using its code generation and execution capability to provide something akin to the desired functionality! However, it is not quite what we wanted. We asked whether it was raining, and we got back the full website instead of the desired answer.
Secondly, the autonomous capability doesn’t really meet our pedagogical needs. We are using this example as illustrative of enterprise use cases, and it is unlikely that the LLM will know about your internal APIs and tools well enough to use them autonomously. So, let’s proceed to build out the framework shown in Figure 1 to invoke the specific APIs we want to use.
Because extracting the location from the question is just text processing, you can simply prompt the LLM. Let’s do this with a single-shot example:
SYSTEM_MESSAGE_1 = """
In the question below, what location is the user asking about?
Example:
Question: What is the weather in Kalamazoo, Michigan?
Answer: Kalamazoo, Michigan.
Question:
"""
Now, when we initiate the chat by asking whether it is raining in Chicago:
response1 = user_proxy.initiate_chat(
assistant, message=f"{SYSTEM_MESSAGE_1} Is it raining in Chicago?"
)
print(response1)
we get again:
Answer: Chicago.
TERMINATE
So, step 2 of Figure 1 is complete.
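If downstream code needs the location as a plain string, the reply has to be cleaned up first. A minimal sketch, assuming the reply contains a line beginning with "Answer:" followed by the TERMINATE marker; the helper name `extract_answer` is hypothetical and is not part of Autogen:

```python
from typing import Optional


def extract_answer(reply: str) -> Optional[str]:
    """Pull the text after 'Answer:' out of an Assistant reply.

    Returns None if no line starts with 'Answer:'. A trailing period is
    dropped, and the TERMINATE line is ignored.
    """
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip().rstrip(".")
    return None
```

In the agentic version of the workflow this parsing is unnecessary, because the LLM itself passes the location on to the next tool.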
Step 3 is to get the latitude and longitude coordinates of the location that the user is interested in. Write a Python function that calls the Google Maps API and extracts the required coordinates:
def geocoder(location: str) -> (float, float):
    # gmaps is the Google Maps client created with your GOOGLE_API_KEY
    geocode_result = gmaps.geocode(location)
    return (round(geocode_result[0]['geometry']['location']['lat'], 4),
            round(geocode_result[0]['geometry']['location']['lng'], 4))
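The function above indexes geocode_result[0] directly, so it raises an IndexError when the place name cannot be resolved. Here is a sketch of a more defensive variant. The client is passed in as a parameter (an assumption made here so the example runs with a stand-in object), and both `geocode_or_raise` and `FakeGeocoder` are hypothetical names:

```python
from typing import Tuple


def geocode_or_raise(client, location: str) -> Tuple[float, float]:
    """Return (lat, lng) rounded to 4 decimals, or raise ValueError.

    `client` is any object with a geocode(location) method returning a
    list of results shaped like the Google Maps Geocoding response.
    """
    results = client.geocode(location)
    if not results:
        raise ValueError(f"Could not geocode location: {location!r}")
    point = results[0]["geometry"]["location"]
    return (round(point["lat"], 4), round(point["lng"], 4))


class FakeGeocoder:
    """Stand-in client for trying the function without an API key."""

    def geocode(self, location: str):
        if location == "Chicago":
            return [{"geometry": {"location": {"lat": 41.87811, "lng": -87.62980}}}]
        return []  # unknown place: the real API also returns an empty list


print(geocode_or_raise(FakeGeocoder(), "Chicago"))
```

A clear error message matters here because the exception text is what the Assistant sees when a tool call fails. The rest of this walkthrough continues with the simpler geocoder function.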
Next, register the geocoder function so that the Assistant can call it in its generated code, and the user proxy can execute it in its sandbox:
autogen.register_function(
    geocoder,
    caller=assistant,  # The assistant agent can suggest calls to the geocoder.
    executor=user_proxy,  # The user proxy agent can execute the geocoder calls.
    name="geocoder",  # By default, the function name is used as the tool name.
    description="Finds the latitude and longitude of a location or landmark",  # A description of the tool.
)
Note that, at the time of writing, function calling is supported by Autogen only for GPT-4 models.
We now expand the example in the prompt to include the geocoding step:
SYSTEM_MESSAGE_2 = """
In the question below, what latitude and longitude is the user asking about?
Example:
Question: What is the weather in Kalamazoo, Michigan?
Step 1: The user is asking about Kalamazoo, Michigan.
Step 2: Use the geocoder tool to get the latitude and longitude of Kalamazoo, Michigan.
Answer: (42.2917, -85.5872)
Question:
"""
Now, when we initiate the chat by asking whether it is raining in Chicago:
response2 = user_proxy.initiate_chat(
assistant, message=f"{SYSTEM_MESSAGE_2} Is it raining in Chicago?"
)
print(response2)
we get again:
Answer: (41.8781, -87.6298)
TERMINATE
Now that we have the latitude and longitude coordinates, we are ready to invoke the NWS API to get the weather data. Step 4, getting the weather data, is similar to geocoding, except that we are invoking a different API and extracting a different object from the web service response. Please look at the code on GitHub to see the full details.
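For readers who do not want to open the repository, here is a hedged sketch of what such a tool could look like. It is not the author's implementation. It assumes the public api.weather.gov flow, in which a request to /points/{lat},{lon} returns a forecast URL under properties.forecast, and that URL returns the forecast periods under properties.periods. The function names and the `fetch_json` parameter are hypothetical; the parameter exists only so the function can be exercised with canned data.

```python
import json
import urllib.request
from typing import Callable, List


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    # The NWS API expects clients to identify themselves with a User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "weather-agent-demo"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def get_weather_from_nws_sketch(
    latitude: float,
    longitude: float,
    fetch_json: Callable[[str], dict] = http_get_json,
) -> List[dict]:
    """Return the list of NWS forecast periods for a point."""
    # First request: look up the forecast URL for this point.
    point = fetch_json(f"https://api.weather.gov/points/{latitude},{longitude}")
    forecast_url = point["properties"]["forecast"]
    # Second request: fetch the forecast itself.
    forecast = fetch_json(forecast_url)
    return forecast["properties"]["periods"]
```

Each period is a JSON object with fields such as a name (for example "Tonight") and a detailed forecast, which is what the LLM reads in the next step.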
The upshot is that the system prompt expands to encompass all the steps in the agentic application:
SYSTEM_MESSAGE_3 = """
Follow the steps in the example below to retrieve the weather information requested.
Example:
Question: What is the weather in Kalamazoo, Michigan?
Step 1: The user is asking about Kalamazoo, Michigan.
Step 2: Use the geocoder tool to get the latitude and longitude of Kalamazoo, Michigan.
Step 3: latitude, longitude is (42.2917, -85.5872)
Step 4: Use the get_weather_from_nws tool to get the weather from the National Weather Service at the latitude, longitude
Step 5: The detailed forecast for tonight reads 'Showers and thunderstorms before 8pm, then showers and thunderstorms likely. Some of the storms could produce heavy rain. Mostly cloudy. Low around 68, with temperatures rising to around 70 overnight. West southwest wind 5 to 8 mph. Chance of precipitation is 80%. New rainfall amounts between 1 and 2 inches possible.'
Answer: It will rain tonight. Temperature is around 70F.
Question:
"""
Based on this prompt, the response to the question about Chicago weather extracts the right information and answers the question correctly.
In this example, we allowed Autogen to select the next agent in the conversation autonomously. We can also specify a different next speaker selection method: in particular, setting this to be “manual” inserts a human in the loop, and allows the human to select the next agent in the workflow.
Where Autogen treats agentic workflows as conversations, LangGraph is an open source framework that allows you to build agents by treating a workflow as a graph. This is inspired by the long history of representing data processing pipelines as directed acyclic graphs (DAGs).
In the graph paradigm, our weather agent would look as shown in Figure 2.
There are a few key differences between Figures 1 (Autogen) and 2 (LangGraph):
- In Autogen, each of the agents is a conversational agent. Workflows are treated as conversations between agents that talk to each other. Agents jump into the conversation when they believe it is “their turn”. In LangGraph, workflows are treated as a graph whose nodes the workflow cycles through based on rules that we specify.
- In Autogen, the AI assistant is not capable of executing code; instead the Assistant generates code, and it is the user proxy that executes the code. In LangGraph, there is a special ToolsNode that consists of capabilities made available to the Assistant.
You can follow along with this section by referring to the file lg_weather_agent.py in my GitHub repository.
We set up LangGraph by creating the workflow graph. Our graph consists of two nodes: the Assistant Node and a ToolsNode. Communication within the workflow happens via a shared state.
workflow = StateGraph(MessagesState)
workflow.add_node("assistant", call_model)
workflow.add_node("tools", ToolNode(tools))
The tools are Python functions:
@tool
def latlon_geocoder(location: str) -> (float, float):
    """Converts a place name such as "Kalamazoo, Michigan" to latitude and longitude coordinates"""
    geocode_result = gmaps.geocode(location)
    return (round(geocode_result[0]['geometry']['location']['lat'], 4),
            round(geocode_result[0]['geometry']['location']['lng'], 4))

tools = [latlon_geocoder, get_weather_from_nws]
The Assistant calls the language model:
model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0).bind_tools(tools)

def call_model(state: MessagesState):
    messages = state['messages']
    response = model.invoke(messages)
    # This message will get appended to the existing list
    return {"messages": [response]}
LangGraph uses langchain, and so changing the model provider is straightforward. To use Gemini, you can create the model using:
model = ChatGoogleGenerativeAI(model='gemini-1.5-flash',
                               temperature=0).bind_tools(tools)
Next, we define the graph’s edges:
workflow.set_entry_point("assistant")
workflow.add_conditional_edges("assistant", assistant_next_node)
workflow.add_edge("tools", "assistant")
The first and last lines above are self-explanatory: the workflow starts with a question being sent to the Assistant. Anytime a tool is called, the next node in the workflow is the Assistant, which will use the result of the tool. The middle line sets up a conditional edge in the workflow, since the next node after the Assistant is not fixed. Instead, the Assistant calls a tool or ends the workflow based on the contents of the last message:
def assistant_next_node(state: MessagesState) -> Literal["tools", END]:
    messages = state['messages']
    last_message = messages[-1]
    # If the LLM makes a tool call, then we route to the "tools" node
    if last_message.tool_calls:
        return "tools"
    # Otherwise, we stop (reply to the user)
    return END
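To see the routing rule in isolation, here is a plain-Python sketch that mimics the cycle between the two nodes without LangGraph or an LLM. Everything in it is a stand-in: `FakeMessage`, `route`, `run`, and the scripted replies are hypothetical, and `END_SENTINEL` is only a local marker playing the role of END.

```python
from dataclasses import dataclass, field
from typing import List

END_SENTINEL = "__end__"  # local stand-in for the END marker


@dataclass
class FakeMessage:
    content: str
    tool_calls: List[str] = field(default_factory=list)


def route(messages: List[FakeMessage]) -> str:
    """Same rule as assistant_next_node: go to tools if requested, else stop."""
    return "tools" if messages[-1].tool_calls else END_SENTINEL


def run(scripted_replies: List[FakeMessage]) -> List[str]:
    """Walk the assistant/tools cycle and record which nodes were visited."""
    messages: List[FakeMessage] = []
    path: List[str] = []
    for reply in scripted_replies:
        path.append("assistant")
        messages.append(reply)
        if route(messages) == END_SENTINEL:
            break
        # The tools node always hands control back to the assistant.
        path.append("tools")
        messages.append(FakeMessage(content="tool result"))
    return path


replies = [
    FakeMessage("", tool_calls=["latlon_geocoder"]),
    FakeMessage("", tool_calls=["get_weather_from_nws"]),
    FakeMessage("It will rain tonight."),
]
print(run(replies))
```

With these three scripted replies the assistant is visited three times and the tools node twice, which matches the geocode, weather lookup, and final answer sequence described in the text.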
Once the workflow has been created, compile the graph and then run it by passing in questions:
app = workflow.compile()
final_state = app.invoke(
{"messages": [HumanMessage(content=f"{system_message} {question}")]}
)
The system message and question are exactly what we employed in Autogen:
system_message = """
Follow the steps in the example below to retrieve the weather information requested.
Example:
Question: What is the weather in Kalamazoo, Michigan?
Step 1: The user is asking about Kalamazoo, Michigan.
Step 2: Use the latlon_geocoder tool to get the latitude and longitude of Kalamazoo, Michigan.
Step 3: latitude, longitude is (42.2917, -85.5872)
Step 4: Use the get_weather_from_nws tool to get the weather from the National Weather Service at the latitude, longitude
Step 5: The detailed forecast for tonight reads 'Showers and thunderstorms before 8pm, then showers and thunderstorms likely. Some of the storms could produce heavy rain. Mostly cloudy. Low around 68, with temperatures rising to around 70 overnight. West southwest wind 5 to 8 mph. Chance of precipitation is 80%. New rainfall amounts between 1 and 2 inches possible.'
Answer: It will rain tonight. Temperature is around 70F.
Question:
"""
question = "Is it raining in Chicago?"
The result is that the agent framework uses the steps to come up with an answer to our question:
Step 1: The user is asking about Chicago.
Step 2: Use the latlon_geocoder tool to get the latitude and longitude of Chicago.
[41.8781, -87.6298]
[{"number": 1, "name": "This Afternoon", "startTime": "2024-07-30T14:00:00-05:00", "endTime": "2024-07-30T18:00:00-05:00", "isDaytime": true, …]
There is a chance of showers and thunderstorms after 8pm tonight. The low will be around 73 degrees.
Between Autogen and LangGraph, which one should you choose? A few considerations:
Of course, the level of Autogen support for non-OpenAI models and other tooling could improve by the time you are reading this. LangGraph could add autonomous capabilities, and Autogen could provide you more fine-grained control. The agent space is moving fast!
- ag_weather_agent.py: https://github.com/lakshmanok/lakblogs/blob/main/genai_agents/ag_weather_agent.py
- lg_weather_agent.py: https://github.com/lakshmanok/lakblogs/blob/main/genai_agents/lg_weather_agent.py
This article is an excerpt from a forthcoming O’Reilly book “Visualizing Generative AI” that I’m writing with Priyanka Vergadia. All diagrams in this post were created by the author.