The Different Agentic Patterns
What is an Agent?
The two-agent conversation/collaboration pattern
Simple chats
Tool Use
LLM-based chat
The Collaboration Pattern with Three or more Agents
The Group Chat pattern
The Supervision pattern
The Hierarchical Teams pattern
Planning agents
Plan and Execute
Reasoning without Observation: Plan-Work-Solve Pattern
LLMCompiler: Tool calls optimization
Reflection & Critique
What is an Agent?
There is a fine line between an agent and a non-agentic pipeline. Let’s define an agent:
It perceives an environment: it can receive inputs from its environment. When we think about LLMs, the inputs typically need to be in a textual or image format.
It maintains an internal state: the internal state can be its original knowledge base with additional context that can be updated over time based on new information or experiences.
It has goals or objectives: they influence how the agent prioritizes actions and makes decisions in varying contexts.
It processes inputs using an LLM: not all agents are LLM-based, but as part of the agentic system, some of the agents will use LLMs as the decision engine.
It decides on actions: based on its inputs, its internal state, and its objective, the agent will take an action. The action taken is decided by the decision engine.
The action affects the environment: the actions taken influence the environment by creating new data, informing the user, or changing the internal state of other agents.
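To make these properties concrete, here is a minimal sketch of an LLM-backed agent loop. Everything here (the Agent class, the call_llm placeholder, the way state is stored) is a hypothetical illustration of the perceive-decide-act cycle described above, not a reference implementation.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. a chat-completion client).
    return "search the web for the current weather in Paris"

@dataclass
class Agent:
    objective: str                                   # its goals or objectives
    memory: list[str] = field(default_factory=list)  # its internal state

    def perceive(self, observation: str) -> None:
        # Receive a (textual) input from the environment and update the internal state.
        self.memory.append(observation)

    def decide(self) -> str:
        # The LLM is the decision engine: objective + internal state -> next action.
        prompt = f"Objective: {self.objective}\nContext: {self.memory}\nNext action:"
        return call_llm(prompt)

    def act(self) -> str:
        # The chosen action affects the environment (here, it simply updates the state).
        action = self.decide()
        self.perceive(f"Took action: {action}")
        return action

agent = Agent(objective="Answer the user's weather question")
agent.perceive("User: what is the weather in Paris?")
print(agent.act())
```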
So why would we want an agent instead of a pipeline where we encode all the possible states and actions that can be taken in the environment? For example, in the previous newsletter, I showed how to implement a complex RAG pipeline.
Goals were set and decisions were taken, so by the definition above, it could pass as an agentic design. However, when we talk about agentic patterns, we typically refer to systems where the agents choose their own paths.
If we know the possible states and action outcomes that can exist in the environment, and the number of state-action pairs is manageable for an engineer to implement, we are better served by a rigid pipeline. All the actions are pre-defined, and we can monitor that the pipeline behaves correctly. Leaving the choice of actions to an LLM is prone to errors and inefficiencies: LLMs hallucinate, and we are likely to encounter undesired outcomes if we don't have control over the decisions taken. Here is an example of a simple problem:
We don’t need to leave the choice of action to a decision engine because the path is straightforward!
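For a problem like this, a rigid pipeline with hard-coded steps is easier to build, test, and monitor than an agent choosing its own path. A rough sketch, assuming a simple retrieve-then-generate flow with placeholder step functions:

```python
def retrieve_documents(question: str) -> list[str]:
    # Placeholder retrieval step (in practice, a search or vector-store query).
    return ["relevant document about the question"]

def generate_answer(question: str, documents: list[str]) -> str:
    # Placeholder generation step (in practice, a single LLM call with the retrieved context).
    return f"Answer to '{question}' grounded in {len(documents)} document(s)"

def rigid_pipeline(question: str) -> str:
    # Every step is pre-defined by the engineer: retrieve, then generate.
    # No decision engine is needed because the path is always the same.
    documents = retrieve_documents(question)
    return generate_answer(question, documents)

print(rigid_pipeline("What is an agent?"))
```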
Agents become a better option for choosing the next best action when the number of possible state-action pairs is too large to enumerate. Encoding all the possible outcomes for all the possible inputs and internal states can become unmanageable in complex problems. That is why we think of agents when we want to solve complex problems. What does "complex problem" mean? It is a problem where the number of states in the environment, and the number of possible actions we could take from each of them, can grow arbitrarily large!
On a closely related subject, this is also the motivation behind Reinforcement Learning: when it is hard to find the optimal state-action path for a complex problem, we use reinforcement learning agents to learn the right action to take in a given state. In reinforcement learning, rewards are used to teach the objectives to the agents; for LLM agents, we can encode the objectives directly in the prompt.
LLMs are not good at making complex decisions: they hallucinate, and that is unavoidable. However, they are quite consistent when making very simple decisions. When we engineer agents, we need to account for those weaknesses. Typically, the more complex the problem to solve, the more complex the decisions that have to be taken. As complexity increases, it becomes better to increase the number of agents: each agent specializes in a specific objective, and the complexity is met by the collaboration of multiple agents that each only need to make very simple decisions at each step.
Let’s see how agents can collaborate to solve complex problems!
The two-agent conversation/collaboration pattern
Simple chats
Let’s imagine we want to build a multi-agent system that can solve problems by writing code. We could have two agents: one that specializes in writing code and another that specializes in executing it.
They iteratively send messages to each other until one of them decides that the problem has been solved. The code assistant is powered by an LLM that can produce code. Here is the system prompt that Autogen uses for its code assistant agents:
You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Reply "TERMINATE" in the end when everything is done.
The code executor can be given very little agency and be driven mostly by regex pattern matching: if code is detected in the incoming message, the executor extracts and executes the code, then sends the standard output back to the sender. The code assistant can then use the resulting error traces or success messages to decide on the next step, as sketched below.
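Here is a minimal sketch of both sides of that loop. The regex, the subprocess call, and the placeholder assistant are illustrative assumptions rather than Autogen's actual implementation: the executor runs whatever fenced code block it finds and returns the output, and the conversation stops when the assistant replies with "TERMINATE" as instructed by the system prompt.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # the triple-backtick delimiter around code blocks

def execute_message(message: str) -> str | None:
    """Extract the first fenced code block from a message, run it, and return the output."""
    match = re.search(FENCE + r"(python|sh)\n(.*?)" + FENCE, message, re.DOTALL)
    if match is None:
        return None  # no code detected: nothing to execute
    language, code = match.groups()
    command = [sys.executable, "-c", code] if language == "python" else ["sh", "-c", code]
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    # Both stdout and the error trace are useful feedback for the code assistant.
    return result.stdout + result.stderr

def call_code_assistant(history: list[dict]) -> str:
    # Placeholder for the LLM call that uses the system prompt shown above.
    return "All done.\nTERMINATE"

def run_chat(user_request: str, max_turns: int = 10) -> list[dict]:
    """Drive the two-agent conversation until the assistant replies with TERMINATE."""
    history = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = call_code_assistant(history)  # the assistant proposes code or an answer
        history.append({"role": "assistant", "content": reply})
        if "TERMINATE" in reply:
            break  # the assistant decided the task is solved
        output = execute_message(reply)       # the executor runs the code, if any
        history.append({"role": "user", "content": output or "No code block found."})
    return history
```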
Let’s see an example. A user asks a question, and it is sent to the code assistant:
The request can also be captured as part of a message history system to ensure that we don’t come back to a previously visited state. The code assistant, being prompted to solve the problem with code, will generate a response, including a script following the format imposed by the system prompt. For example, here is a possible response by the agent: