Building a Research Agent That Can Write to Google Docs (Part 1) | by Robert Martin-Short | Nov, 2024

Large Language Models (LLMs) are quickly finding use in all sorts of applications relevant to analysts and researchers, especially when it comes to the extraction, organization and summarization of text information. The community — both commercial and open source — is also making it increasingly easy to build and scale so-called "agentic" applications, in which the LLM assumes the role of a (hopefully) skilled analyst and makes semi-autonomous decisions. In a chatbot application, for example, if the user asks a complex or multi-step question the LLM might need to design a plan of action, correctly query multiple external tools — perhaps calculators, web searchers, vector databases etc. — assemble the results and generate an answer.

Systems like this are often said to use the ReAct framework of prompt engineering, which stands for "Reasoning-Acting". Basically, the structure and sequence of prompts forces the LLM to answer the question in a very methodical fashion, first by articulating a thought (typically a plan of attack), carrying out an action, then making an observation of the result. In agentic systems, this process can continue iteratively until the LLM decides that it's come to an acceptable answer.
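As a rough illustration of the idea — this is a generic sketch, not the exact prompt used by any particular framework — a ReAct-style system prompt might look something like this:

```python
# A generic, illustrative ReAct-style prompt template. The exact wording
# varies between frameworks; this just shows the thought -> action ->
# observation loop described above.
REACT_STYLE_TEMPLATE = """
Answer the following question as best you can. You have access to these tools:
{tool_descriptions}

Use the following format:
Thought: reason about what to do next
Action: the tool to use, in the form tool_name[tool_input]
Observation: the result returned by the tool
... (Thought/Action/Observation can repeat as many times as needed)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Question: {question}
"""
```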

In this series of articles, we'll use the LangGraph library and the Tavily search tool to build a simple research assistant that demonstrates some of these concepts and might even be useful for those of us looking to generate quick, well written reports about any subject. Our agent will be inspired by the plan -> research -> write -> submit -> review -> revise cycle that happens in peer-reviewed research, and you can take a look at the prompts for these different sections here.

To make the system feel more complete, we'll also add the ability to automatically upload the material generated to a Google Doc, which is explored in part 2. This should be considered as more of an add-on than an integrated component of the agent, but it's interesting in its own right and so can be read as a stand-alone article.

Before looking at how we can build this assistant and what it means for it to be "agentic", we should think briefly about what we'd like it to do. The goal is to build a system that can plan and write short, informative articles about a given topic, then improve its own work through review and revision.

Why? Mainly this is just an exploration of technology, but the use of LLMs as semi-autonomous researchers is an active field of investigation and is yielding interesting projects such as GPT-researcher. They have the potential to speed up the work of analysts, students, authors and researchers — though of course if the goal is human learning, there is no substitute for careful reading, note taking and discussion, which AI cannot replace.

LLMs like GPT4, Anthropic Claude Sonnet, Meta Llama 3, Google Gemini Pro etc. can already write great articles out of the box with just a single prompt. However, these LLMs have knowledge cutoffs and so need access to additional tools in order to fetch the latest information, such as news about current events. There are plenty of services — notably tools like Perplexity, ChatGPT (now accessible via chat.com) and Google's AI overview — that already have this ability, but they're geared more towards providing quick summaries than polished research reports.

Right here, we’re making the belief that a number of iterations of assessment and revision will enhance the standard of an article generated by an LLM. That is actually the way it works in human writing. Our assistant could have the next elements, every with its personal instruction immediate

  • Planner. Turns a poorly defined task into a structured article plan
  • Researcher. Takes the plan and searches the internet for relevant content.
  • Writer. Uses the plan, retrieved content and its own knowledge to write the report
  • Reviewer. Reads the report and offers constructive criticism
  • Editor. Reads the report and the reviewer's criticism and decides if the report needs to be revised. If so, the report is sent back to the researcher and writer stages.

In our implementation each of these components will be calling the same LLM, namely GPT4o-mini, but in a real application they could just as easily use different, more specialized models.

The output will be a well-written, informative report — ideally with references — that we can programmatically drop into a Google doc for safe keeping. It's easy to modify the "personality" of our researcher by adapting the prompts. The editor is particularly important, because it's the gatekeeper for the end of the process. If we make our editor very strict, the system might need to loop through many revisions to get accepted. To what extent will a stricter editor improve the quality of the result? That's a very interesting question which, as they say, is beyond the scope of the current work!
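For instance, a stricter editor could be created just by rewriting its instruction prompt. Something like this hypothetical variant would work — the real editor prompt lives in the repo, and prompts in this project are dataclasses, as we'll see shortly:

```python
from dataclasses import dataclass

@dataclass
class StrictEditorPrompt:
    """Hypothetical, stricter variant of the editor's instruction prompt."""
    system_template: str = """
    You are a demanding editor reviewing a research report and its critique.
    Only approve the report if every point in the critique has been fully
    addressed and the report is well structured, referenced and precise.
    Respond with a yes/no decision and explain your reasoning.
    """
```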

Our research assistant is based heavily on the example described in this excellent short course about LangGraph. LangGraph is an LLM orchestration library that attempts to make it easier for us to design and build reliable agents. For an in-depth comparison of LangGraph and LangChain, I recommend this excellent article.

What exactly is an agent? It seems that the community has not yet settled on a definition, but at least broadly speaking we might say that an agent is a multi-step system where an LLM is allowed to make meaningful decisions about the outcome. This makes it more complex (and potentially more unpredictable) than a chain, which is just a predefined set of LLM calls one after the other.

In an agent framework, the LLM has some autonomy over how to solve the problem it's given, perhaps by choosing the appropriate tool to call or deciding when to stop refining a solution once it's good enough. In that sense the LLM becomes more of the brain of the system, acting more like a human analyst than just an API call. One interesting challenge here is that while agents might be free to make decisions, they are usually embedded within or interact with traditional software systems that require structured inputs and outputs. It's therefore very important to force the agent to return its answers in a way that these other systems understand, regardless of the decision it makes.
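With LangChain-style chat models, one way to do this is to bind a Pydantic schema to the model so that its reply always parses into a known structure. Here's a minimal sketch — the EditorDecision schema is invented for illustration, though we'll see the same with_structured_output pattern used for real later in this article:

```python
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class EditorDecision(BaseModel):
    """Hypothetical schema forcing a yes/no decision plus reasoning."""
    approved: bool
    reasoning: str

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# the reply is now constrained to parse as an EditorDecision instance
decision = model.with_structured_output(EditorDecision).invoke(
    "Here is a draft report: ... Decide whether it is ready for publication."
)
print(decision.approved, decision.reasoning)
```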

For a more in-depth discussion of agents in the context of LangGraph, this documentation is very helpful. Our research agent will be quite a simple one (partly because I'm still learning this material too!) but hopefully can be a stepping stone towards something more sophisticated.

In LangGraph we define the logic of our system as a graph, which consists of nodes and edges. Nodes are where LLM calls are made, and edges pass information from one node to the next. Edges can be conditional, meaning that they can direct information to different nodes depending on what decision is made. Information is passed between nodes in a structured format defined by a state.

Our research assistant has a single state called AgentState and it looks like this:

from typing import List, TypedDict

class AgentState(TypedDict):
    """
    A dictionary representing the state of the research agent.

    Attributes:
        task (str): The description of the task to be performed.
        plan (str): The research plan generated for the task.
        draft (str): The current draft of the research report.
        critique (str): The critique received for the draft.
        content (List[str]): A list of content gathered during research.
        editor_comment (str): The editor's feedback on the draft.
        revision_number (int): The current revision number of the draft.
        max_revisions (int): The maximum number of revisions allowed.
        finalized_state (bool): Indicates whether the report is finalized.
    """

    task: str
    plan: str
    draft: str
    critique: str
    content: List[str]
    editor_comment: str
    revision_number: int
    max_revisions: int
    finalized_state: bool

This is where all the information relevant to our problem gets stored, and it can be updated by LLM action inside a node of the graph.

Now we can define some nodes. In the code, all the nodes are kept inside the AgentNodes class, which is just a way I found helpful to group them. For example the planner node looks like this:

def plan_node(self, state: AgentState) -> Dict[str, str]:
    """
    Generate a research plan based on the current state.

    Args:
        state (AgentState): The current state of the research agent.

    Returns:
        Dict[str, str]: A dictionary containing the generated research plan.
    """
    messages = [
        SystemMessage(content=ResearchPlanPrompt.system_template),
        HumanMessage(content=state["task"]),
    ]
    response = self.model.invoke(messages)
    return {"plan": response.content}

Note how it takes in an AgentState and returns a modification to one of its components, namely the text for the research plan. When this node is run, the plan is updated.

The code inside the node function uses standard LangChain syntax. self.model is an instance of ChatOpenAI, which looks like this:

model = ChatOpenAI(
    model="gpt-4o-mini", temperature=0, api_key=secrets["OPENAI_API_KEY"]
)

The prompt consists of a system message from the ResearchPlanPrompt dataclass concatenated with the "task" element of the AgentState, which is the research topic supplied by the user. The plan prompt looks like this:

@dataclass
class ResearchPlanPrompt:
    system_template: str = """
    You are an expert writer tasked with creating a high-level outline for a research report.
    Write such an outline for the user-provided topic. Include relevant notes or instructions for each section.
    The style of the research report should be geared towards the educated public. It should be detailed enough to provide
    a good level of understanding of the topic, but not unnecessarily dense. Think of it more like a whitepaper to be consumed
    by a business leader rather than an academic journal article.
    """

Similar nodes need to be made for the following tasks:

  • Conduct research. This is where we use an LLM to convert the research task into a series of queries, then use the Tavily search tool to find their answers online and save this under "content" in the AgentState. This process is discussed in more detail in section 2, and a sketch of this node appears after the list.
  • Write the report. Here we make use of the task name, the research plan, the research content and any previous reviewer comments to actually write the research report. This gets saved under "draft" in the AgentState. Whenever this runs, the revision_number indicator gets updated.
  • Review the report. Call the LLM to critique the research report and save the review under "critique"
  • Conduct more research in response to the critique. This is going to take in the original draft and the review and generate some more queries for Tavily that should help the system address the reviewer comments. Once again, this information is saved under "content"
  • Make a decision about whether or not the report satisfies the reviewer's comments. This is done by the LLM with the guidance of the editor prompt, which instructs it to make a yes/no decision on the article and explain its reasoning.
  • Dummy nodes for rejecting or accepting the research. Once we get to either of these, we can end the flow. The final research report can then be extracted from the AgentState
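As a rough sketch of the first of these, here's roughly what the research node could look like — assuming the AgentNodes class holds a Tavily client as self.searcher, and reusing the Queries model and prompt imports that appear later in this article. The real implementation in the repo may differ in its details:

```python
def research_plan_node(self, state: AgentState) -> Dict[str, List[str]]:
    """Sketch: turn the task into search queries, then gather content with Tavily."""
    # ask the LLM for a structured list of search queries about the task
    queries = self.model.with_structured_output(Queries).invoke(
        [
            SystemMessage(content=ResearchPlanPrompt.system_template),
            HumanMessage(content=state["task"]),
        ]
    )
    # run each query through Tavily and keep only the text chunks
    content = state.get("content", [])
    for query in queries.queries:
        response = self.searcher.search(query=query, max_results=2)
        for result in response["results"]:
            content.append(result["content"])
    return {"content": content}
```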

We need to make a conditional edge in the graph at the editor node: if the editor says yes, we go to the accepted node. If no, we go back to the review node.

To define this logic, we need to make a function to run inside the conditional edge. I've chosen to put this in an AgentEdges class, but this isn't a requirement.

def should_continue(state: AgentState) -> str:
    """
    Determine whether the research process should continue based on the current state.

    Args:
        state (AgentState): The current state of the research agent.

    Returns:
        str: The next state to transition to ("to_review", "accepted", or "rejected").
    """
    # always send to review if the editor hasn't made comments yet
    current_editor_comments = state.get("editor_comment", [])
    if not current_editor_comments:
        return "to_review"

    final_state = state.get("finalized_state", False)
    if final_state:
        return "accepted"
    elif state["revision_number"] > state["max_revisions"]:
        logger.info("Revision number > max allowed revisions")
        return "rejected"
    else:
        return "to_review"

In code, the entire graph setup looks like this:

from research_assist.researcher.AgentComponents import (
    AgentNodes,
    AgentState,
    AgentEdges,
)
# END is the predefined end node
from langgraph.graph import StateGraph, END

agent = StateGraph(AgentState)
nodes = AgentNodes(model, searcher)
edges = AgentEdges()

## Nodes
agent.add_node("initial_plan", nodes.plan_node)
agent.add_node("write", nodes.generation_node)
agent.add_node("review", nodes.review_node)
agent.add_node("do_research", nodes.research_plan_node)
agent.add_node("research_revise", nodes.research_critique_node)
agent.add_node("reject", nodes.reject_node)
agent.add_node("accept", nodes.accept_node)
agent.add_node("editor", nodes.editor_node)

## Edges
agent.set_entry_point("initial_plan")
agent.add_edge("initial_plan", "do_research")
agent.add_edge("do_research", "write")
agent.add_edge("write", "editor")

## Conditional edges
agent.add_conditional_edges(
    "editor",
    edges.should_continue,
    {"accepted": "accept", "to_review": "review", "rejected": "reject"},
)
agent.add_edge("review", "research_revise")
agent.add_edge("research_revise", "write")
agent.add_edge("reject", END)
agent.add_edge("accept", END)

Before data can flow through a graph, the graph must be compiled. My understanding from the docs is that this just runs some simple checks on the structure of the graph and returns a CompiledGraph object, which has methods like stream and invoke. These allow you to pass inputs to the starting node, which is defined using set_entry_point in the code above.

When building these graphs, it can be very helpful to visualize all the nodes and edges in a notebook. This can be done with the following command:

from IPython.display import Image

Image(agent.compile().get_graph().draw_png())

LangGraph offers a few different ways of drawing the graph, depending on what visualization package you have installed. I'm using pygraphviz, which can be installed on an M-series Mac using the following commands:

brew install graphviz
pip install -U --no-cache-dir \
    --config-settings="--global-option=build_ext" \
    --config-settings="--global-option=-I$(brew --prefix graphviz)/include/" \
    --config-settings="--global-option=-L$(brew --prefix graphviz)/lib/" \
    pygraphviz
Visualization of the control flow for our agent. Nodes are where LLM calls take place, while edges indicate the flow of information. Image generated by the author.

How can we test our agent? The simplest approach would just be to call invoke with initial values for some of the components of AgentState (i.e. task, max_revisions and revision_number), which enter the graph at the entry point node.

graph = agent.compile()
res = graph.invoke(
    {
        "task": "What are the key trends in LLM research and application that you see in 2024",
        "max_revisions": 1,
        "revision_number": 0,
    }
)

After some time (it can be several minutes if max_revisions is set to be large) this will return a dictionary of the agent state with all the components filled in. I'm using gpt4o-mini for this and the results are very impressive, although the extent to which adding the "review" and "editor" components really helps to improve the quality of the article could be debated and we'll return to that in section 3.

What if we want more insight into the inputs and outputs of the nodes at each stage of the graph? This is essential for debugging and explainability as the graph grows or if we're hoping to deploy something like this in production. Thankfully LangGraph has some great tools here, which are covered under the persistence and streaming sections of its documentation. A minimal implementation looks something like this, where we're using an in-memory store to keep track of the updates that come out of each stage of the graph:

from langgraph.store.memory import InMemoryStore
from langgraph.checkpoint.memory import MemorySaver
import uuid

checkpointer = MemorySaver()
in_memory_store = InMemoryStore()
graph = agent.compile(checkpointer=checkpointer, store=in_memory_store)

# Invoke the graph
user_id = "1"
config = {"configurable": {"thread_id": "1", "user_id": user_id}}
namespace = (user_id, "memories")
results = []

# task_description and max_revisions are defined as in the invoke example above
for i, update in enumerate(graph.stream(
    {
        "task": task_description,
        "max_revisions": max_revisions,
        "revision_number": 0,
    }, config, stream_mode="updates"
)):
    # print the data that just got generated
    print(update)
    memory_id = str(uuid.uuid4())
    # store the data that just got generated in memory
    in_memory_store.put(namespace, memory_id, {"memory": update})
    results.append(update)

More sophisticated applications would access the store from inside the nodes themselves, allowing a chatbot to recall previous conversations with a given user for example. Here we're just using the memory to save the outputs of each of the nodes, which can then be viewed for debugging purposes. We'll explore that a bit more in the final section.
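As a taste of what node-level store access might look like, here's a minimal sketch assuming LangGraph's injection of the compiled store into node functions (check the persistence documentation for the exact mechanism in your version; this node and its memory-folding logic are hypothetical):

```python
from typing import Dict

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore

def plan_node_with_memory(
    state: AgentState, config: RunnableConfig, *, store: BaseStore
) -> Dict[str, str]:
    """Sketch: a planner node that folds stored memories into its prompt."""
    user_id = config["configurable"]["user_id"]
    namespace = (user_id, "memories")
    # retrieve anything previously stored under this user's namespace
    past_items = store.search(namespace)
    memories = "\n".join(str(item.value) for item in past_items)
    messages = [
        SystemMessage(content=ResearchPlanPrompt.system_template),
        HumanMessage(content=f"{state['task']}\n\nPreviously stored notes:\n{memories}"),
    ]
    response = model.invoke(messages)
    return {"plan": response.content}
```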

Perhaps the most interesting components of the control flow above are the do_research and research_revise nodes. Inside both of these nodes we're using an LLM to generate some web search queries relevant to the task, and then we're using the Tavily API to actually conduct the search. Tavily is a relatively new service that offers a search engine optimized for AI agents. Practically what this means is that the service returns search results as chunks of relevant text from websites, rather than just a list of urls (which would need to be scraped and parsed) as in the case of typical search engine APIs.

Under the hood, Tavily is likely using web scrapers and LLMs to extract content relevant to the user's search, but all of that is abstracted away. You can sign up here for Tavily's free "Researcher" plan, which gives 1000 free API calls. Unfortunately after that you'd need to pay a monthly fee to keep using it, which is likely only worth it for business use cases.

Let's see an example using code similar to what's going on inside AgentNodes.research_plan_node:


from typing import List

from pydantic import BaseModel
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
)
from research_assist.researcher.prompts import (
    ResearchPlanPrompt,
)
from research_assist.researcher.Agent import load_secrets
from langchain_openai import ChatOpenAI
from tavily import TavilyClient

class Queries(BaseModel):
    """
    A model representing a list of search queries.

    Attributes:
        queries (List[str]): A list of search queries to be executed.
    """

    queries: List[str]

# set up the task
task = """
What are the key trends in LLM research and application that you see in 2024
"""

# set up the LLM and Tavily
secrets = load_secrets()
model = ChatOpenAI(
    model="gpt-4o-mini", temperature=0, api_key=secrets["OPENAI_API_KEY"]
)
tavily = TavilyClient(api_key=secrets["TAVILY_API_KEY"])

# generate some queries relevant to the task
queries = model.with_structured_output(Queries).invoke(
    [
        SystemMessage(content=ResearchPlanPrompt.system_template),
        HumanMessage(content=task),
    ]
)

This generates 5 search queries relevant to the task we defined, which look like this:

['key trends in LLM research 2024',
'LLM applications 2024',
'latest developments in LLM technology 2024',
'future of LLMs 2024',
'LLM research advancements 2024']

Next we can call Tavily search on each of these queries:

response = tavily.search(query=queries.queries[0], max_results=2)

This gives a nicely formatted result with url, title and text chunk.

Example results from a Tavily search. Image generated by the author.

This is a very powerful and easy to use search tool that can give LLM applications access to the web without the need for extra work!

In our researcher agent, we're currently only using the content field, which we extract and append to a list that is passed into the AgentState. That information then gets injected into the prompt that's used for the writer node, hence allowing the LLM to have access to it when generating the report.
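For reference, pulling those text chunks out of a single response looks something like this, assuming the response format shown above:

```python
# each Tavily result carries a url, a title and a pre-extracted text chunk;
# we keep only the text, which later gets appended to "content" in the state
text_chunks = [result["content"] for result in response["results"]]
```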

There is a lot more you can do with Tavily search, but be aware that experimenting with it will quickly burn through your free API calls. In fact, for our report writing task there are many applications where Tavily calls probably aren't necessary (i.e. the LLM already has sufficient knowledge to write the report), so I would recommend adding an additional conditional edge that allows the system to bypass the do_research and research_revise nodes if it determines that a web search is not needed. I'll likely update the repo with this change soon; a rough sketch of how that might look is below.
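This change isn't in the repo yet, so treat the following as a sketch: a hypothetical routing function reads a needs_search flag (which a new LLM-driven decision node would have to set after initial_plan) and skips straight to the writer when a search isn't needed:

```python
# hypothetical: route around the research nodes when search isn't needed.
# The "needs_search" flag does not exist in the current AgentState; a new
# decision node would have to set it after initial_plan.
def should_search(state: AgentState) -> str:
    return "do_research" if state.get("needs_search", True) else "write"

# this conditional edge would replace the fixed initial_plan -> do_research edge
agent.add_conditional_edges(
    "initial_plan",
    should_search,
    {"do_research": "do_research", "write": "write"},
)
```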

To solidify everything we just learned, let's walk through an example of the researcher in action, using the same task as above.

First, we import the libraries and set up our LLM and searcher models:

from research_assist.researcher.Agent import ResearchAgent, load_secrets
from langchain_openai import ChatOpenAI
from tavily import TavilyClient

secrets = load_secrets()
model = ChatOpenAI(
    model="gpt-4o-mini", temperature=0, api_key=secrets["OPENAI_API_KEY"]
)
tavily = TavilyClient(api_key=secrets["TAVILY_API_KEY"])

agent = ResearchAgent(model, tavily)

Now we can run the agent on a task and give it a maximum number of revisions.

job = """
What are the important thing traits in LLM reseach and software that you simply see in 2024
"""
consequence = agent.run_task(task_description=job,max_revisions=3)

Now the agent will run its task, which might take a couple of minutes. Logging has been added to show what it's doing and, importantly, the results are being saved to the in_memory_store, which we saw at the end of section 2.

The final report is accessible in a few ways. It's saved in the result list and can be visualized in a notebook like this:

from IPython.display import Markdown

Markdown(result[-3]['write']['draft'])

It's also saved in the agent's memory along with all the other outputs. We can access it as follows:

agent.in_memory_store.search(("1", "memories"))[-3].dict()

The report itself is about 1300 words long — a bit too much to copy here — but I've pasted it into the repo here. We can also take a look at what the editor thought of it after one round of revision:

editor_comments = agent.in_memory_store.search(("1", "memories"))[-2].dict()
{'value': {'memory': {'editor': {'editor_comment': 
'The report has addressed the critiques by enhancing depth in key sections,
adding clarity, and improving structure with subheadings.
It provides specific examples and discusses ethical considerations,
making it a valuable resource. The revisions are sufficient for publication.',
'finalized_state': True}}},
'key': '9005ad06-c8eb-4c6f-bb94-e77f2bc867bc',
'namespace': ['1', 'memories'],
'created_at': '2024-11-11T06:09:46.170263+00:00',
'updated_at': '2024-11-11T06:09:46.170267+00:00'}

It seems the editor was satisfied!

For debugging purposes, though, we probably need to read through all the other outputs. This can be painful to do in a notebook, so in the next article we'll discuss how they can be programmatically dropped into Google Docs. Thanks for making it to the end and we'll pick up in part 2!

The author is unaffiliated with any of the tools discussed in this article.