Multi-Agent Collaboration Frameworks

Is this the path to AGI?

Picking up from our previous discussion of agents (mainly single-agent systems) and how to keep them coherent as they plan and execute complex tasks, today we look at some multi-agent collaboration frameworks. Like us, LLMs struggle when they are handed too many choices and tools while every step of a complex task is crammed into one long prompt: they tend to miss instructions, hallucinate, get confused, throw errors and/or get stuck in a loop.

Therein lies the value of multi-agent collaboration. Taking “think step by step” to “plan and execute step by step”, these agentic workflows hold the promise of automating entire teams, with a process flow broken down into smaller tasks for specialized agents. Remind you of something?

With the disclaimer that “we are still early” in this niche, a few interesting frameworks have surfaced, promising collaborative and collective decision making via multiple autonomous agents, each with its own persona and capabilities. Maybe this is the path to AGI, maybe not, but it is the closest we have come to the way human group dynamics function. A good starting point for thinking about these architectures is to consider four dimensions:

  • Environment interface: refers to the context defining how the agents are deployed and what they interact with.

  • Profiling: traits, actions and skills. These profiles can be explicitly defined by the system designers, generated by the LLMs or based on pre-existing datasets.

  • Communication: comprises paradigms, i.e. the style and method of interaction (cooperative, competitive or debate), and structure, i.e. the architecture of the network and the content being exchanged between the agents.

  • Capabilities acquisition: refers to adjustments made based on feedback.

With this as context, let’s look at some multi-agent frameworks.

ChatDev

The ChatDev paper presented the idea of “Communicative Agents for Software Development”. On being given a task by a human client, a team of software agents uses effective communication and mutual verification to automatically craft comprehensive software solutions that encompass source code, environment dependencies and user manuals. We as humans know the power of collaboration; pass it on to agents and we get closer to workflow automation.

MetaGPT

Based on emulating Standard Operating Procedures (SOPs) and assigning specific roles to agents for workflow processing, this paper went a step further than ChatDev by incorporating efficient human workflows into LLM-based multi-agent systems. By breaking a complex task down into sub-tasks and using an assembly line paradigm to assign a diverse set of roles to the agents that execute those tasks, MetaGPT is credited with ushering in the race to agentic workflow automation, leading to open source solutions such as Autogen, crewAI and Langgraph.
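The assembly line idea can be sketched in a few lines of plain Python. This is only an illustration and not MetaGPT's actual API: the `RoleAgent` class, the `run_sop` function, the role names and the lambda "workers" (which stand in for LLM calls) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RoleAgent:
    """A hypothetical role on the assembly line."""
    role: str
    produces: str                          # name of the artifact this role hands off
    work: Callable[[Dict[str, str]], str]  # stand-in for an LLM call


def run_sop(task: str, line: List[RoleAgent]) -> Dict[str, str]:
    """Run each role in order; every role sees the artifacts produced so far."""
    artifacts = {"task": task}
    for agent in line:
        artifacts[agent.produces] = agent.work(artifacts)
    return artifacts


# Illustrative roles; the lambdas only format strings instead of calling a model.
line = [
    RoleAgent("Product Manager", "requirements",
              lambda a: f"Requirements for: {a['task']}"),
    RoleAgent("Architect", "design",
              lambda a: f"Design based on [{a['requirements']}]"),
    RoleAgent("Engineer", "code",
              lambda a: f"Code implementing [{a['design']}]"),
]

result = run_sop("a to-do list app", line)
for name, artifact in result.items():
    print(f"{name}: {artifact}")
```

Each role only needs to understand the artifact handed to it, which is the point of the SOP: a structured hand-off replaces one long, overloaded prompt.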

Autogen

A programming framework for agentic AI by Microsoft, Autogen is focused on enabling application development using conversational agents. It is distinctive for its high level of customization, enabling developers to program agents using both natural language and code to define how these agents interact. This versatility enables its use in diverse fields, from technical areas such as coding and mathematics to consumer-focused sectors like entertainment. 

Source: Arxiv, Wu et al

Using Reflection to trigger the inner monologue, Autogen enables significant improvements in LLM responses. It allows nested agents to provide and collate feedback before responding, as part of a larger workflow. With Tool Use, these agents can be used to validate responses and execute functions, even for LLMs that may not support function calling!
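The reflection pattern described above boils down to a draft, critique, revise loop. Here is a minimal sketch in plain Python; it does not use Autogen's API, and `reflect_and_revise` plus the three stub functions are hypothetical stand-ins for what would be LLM-backed agents.

```python
from typing import Callable, List, Optional, Tuple


def reflect_and_revise(
    task: str,
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Optional[str]],
    revise_fn: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Draft an answer, then loop: critique it and revise until the critic is satisfied."""
    draft = draft_fn(task)
    history = [draft]
    for _ in range(max_rounds):
        feedback = critique_fn(task, draft)
        if feedback is None:  # critic has no objections, stop reflecting
            break
        draft = revise_fn(task, draft, feedback)
        history.append(draft)
    return draft, history


# Stub "agents": a writer that starts out wrong and a critic that checks the sum.
def writer(task: str) -> str:
    return "2 + 2 = 5"


def critic(task: str, draft: str) -> Optional[str]:
    return None if "4" in draft else "The sum is wrong; recompute."


def reviser(task: str, draft: str, feedback: str) -> str:
    return "2 + 2 = 4"


final, history = reflect_and_revise("What is 2 + 2?", writer, critic, reviser)
print(final)
print(history)
```

The `max_rounds` cap matters in practice: without it, a critic that is never satisfied would keep the agents in exactly the kind of loop these frameworks are trying to avoid.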

crewAI

crewAI is a framework for orchestrating role-playing, autonomous AI agents. crewAI allows creating role-based agents which can autonomously delegate tasks and inquire amongst themselves, enhancing problem-solving efficiency. These processes can be sequential or hierarchical, while being able to define tasks with customizable tools.

crewAI focuses on process, with easily configurable agent interactions, which may be its biggest differentiator versus Autogen, which does well at creating conversational, collaborative agents. It is also more flexible in process structuring than ChatDev, offering a production-ready framework.

We have in our previous posts discussed in detail how Planning, Memory and Tool Use are the three core components for AI agents. In a similar vein, the key elements of an AI agent as defined by crewAI are:

  • Role playing: Shape, form or function of an AI agent. For example, “you are a FINRA analyst”.

  • Focus: Expert agents focused on specialized roles with limited tools at their disposal to not get confused or hallucinate. Further to the roles, it is important to have well defined tasks with clear goals and processes.

  • Tools: Choose a set of specific tools as per the specialized role, vs giving every agent access to all the tools available. Well defined tools are versatile, fault-tolerant and support caching.

  • Collaboration: Enable the agents to bounce ideas off each other and give feedback to improve performance.

  • Guardrails: Use guardrails to prevent agents from hallucinating or getting stuck in a loop trying to use the same tool repeatedly.

  • Memory: Crucial to enable agents to use learnings from previous interactions for future decision making.
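Several of these elements (role, a limited tool set, caching, guardrails and memory) can be sketched together in plain Python. This is not crewAI's API: `FocusedAgent`, `GuardrailError`, the `lookup_rule` tool and the repeat limit are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class GuardrailError(RuntimeError):
    """Raised when an agent steps outside its allowed behaviour."""


@dataclass
class FocusedAgent:
    role: str
    tools: Dict[str, Callable[[str], str]]  # only the tools this role needs
    max_repeat_calls: int = 2               # guardrail against tool loops
    memory: List[str] = field(default_factory=list)
    _cache: Dict[Tuple[str, str], str] = field(default_factory=dict, repr=False)
    _calls: Dict[Tuple[str, str], int] = field(default_factory=dict, repr=False)

    def use_tool(self, name: str, arg: str) -> str:
        if name not in self.tools:
            raise GuardrailError(f"{self.role} has no access to tool '{name}'")
        key = (name, arg)
        self._calls[key] = self._calls.get(key, 0) + 1
        if self._calls[key] > self.max_repeat_calls:
            raise GuardrailError(f"{self.role} keeps repeating {name}({arg!r})")
        if key not in self._cache:          # cache results of identical calls
            self._cache[key] = self.tools[name](arg)
        result = self._cache[key]
        self.memory.append(f"{name}({arg}) -> {result}")  # keep learnings for later
        return result


# A single specialized agent with exactly one (made-up) tool.
analyst = FocusedAgent(
    role="FINRA analyst",
    tools={"lookup_rule": lambda query: f"rule text for {query}"},
)
print(analyst.use_tool("lookup_rule", "margin requirements"))
print(analyst.memory)
```

Calling a tool the agent was never given, or repeating the same call more than `max_repeat_calls` times, raises `GuardrailError` instead of silently looping.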

Langgraph

Speaking of flexibility, we need to discuss Langgraph! An extension of the Langchain package, it is more developer focused, and hence offers more customization and flexibility than both crewAI and Autogen. Langgraph has the concept of “state” at the heart of its framework (ref to state space models in the earlier post on alternative architectures). Each agent is a node in the graph, connected by edges which manage the control flow. This is akin to a state machine, whereby each node can be compared to a state, while the edges connecting them are comparable to transitions.

As to how it differs from Directed Acyclic Graphs (DAGs), Langgraph adds cyclicality and persistence to LLM workflows. So for DAGs the Langchain expression language is sufficient (chaining with no looping).
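To make the difference from a DAG concrete, here is a small state graph with a cycle, written in plain Python. It mimics the idea only and is not Langgraph's API: `MiniGraph`, `END`, the node names and the routing functions are all assumptions for the sketch, and "persistence" is reduced to a state dict passed from node to node.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
END = "END"


class MiniGraph:
    """Nodes transform the shared state; routers pick the next node (cycles allowed)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.routers: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Callable[[State], str]) -> None:
        self.routers[src] = router

    def run(self, entry: str, state: State, max_steps: int = 20) -> State:
        current, steps = entry, 0
        while current != END:
            if steps >= max_steps:  # safety net against endless cycles
                raise RuntimeError("step limit reached")
            state = self.nodes[current](state)
            current = self.routers[current](state)
            steps += 1
        return state


graph = MiniGraph()
# The writer produces another draft; the reviewer approves from the second draft on.
graph.add_node("writer", lambda s: {**s, "drafts": s["drafts"] + 1})
graph.add_node("reviewer", lambda s: {**s, "approved": s["drafts"] >= 2})
graph.add_edge("writer", lambda s: "reviewer")
# Conditional edge: loop back to the writer until the draft is approved.
graph.add_edge("reviewer", lambda s: END if s["approved"] else "writer")

final_state = graph.run("writer", {"drafts": 0, "approved": False})
print(final_state)
```

The edge from the reviewer back to the writer is what a DAG cannot express: the same node runs again with updated state until a condition is met.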

Langgraph allows for hierarchical teams connected by a Supervisor, whereby sub-agents work as a team, with routing handled at multiple levels by the LLM, i.e. one Langgraph calling another Langgraph. This allows for orchestrating more complex processes, with each sub-agent allowing for a custom prompt, LLM, tools and code.
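The supervisor pattern can be sketched as a function that routes a task to one of its members, where a member may itself be another supervisor. Again, this is plain Python rather than Langgraph's API; `make_supervisor`, the team names and the keyword-based routing (standing in for an LLM deciding who goes next) are assumptions for illustration.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]


def make_supervisor(route: Callable[[str], str], members: Dict[str, Worker]) -> Worker:
    """Build a supervisor that forwards each task to the member chosen by `route`."""
    def supervisor(task: str) -> str:
        choice = route(task)
        if choice not in members:
            raise KeyError(f"router picked unknown member '{choice}'")
        return members[choice](task)
    return supervisor


# Two teams, each with its own supervisor and sub-agents (simple string formatters here).
research_team = make_supervisor(
    lambda task: "search" if "find" in task else "summarize",
    {
        "search": lambda task: f"[search] {task}",
        "summarize": lambda task: f"[summarize] {task}",
    },
)
writing_team = make_supervisor(
    lambda task: "draft",
    {"draft": lambda task: f"[draft] {task}"},
)

# A supervisor is itself a Worker, so teams nest under a top-level supervisor.
top_supervisor = make_supervisor(
    lambda task: "research" if ("find" in task or "summarize" in task) else "writing",
    {"research": research_team, "writing": writing_team},
)

print(top_supervisor("find papers on agents"))
print(top_supervisor("write an intro"))
```

Because every level has the same shape (task in, result out), each sub-team can be developed and tested on its own before being plugged into the larger hierarchy.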

When AGI?

The frameworks above suggest that multi-agent orchestration can substantially improve the performance of LLMs on complex tasks. By choosing specialized, fine-tuned smaller LLMs, these workflows can also be optimized for cost and latency. Using Reflection and Tool Use, we are already close to emulating real-world structured processes. Such is the excitement around agentic frameworks that they are being touted as the eventual path to AGI - your guess here would be as good as mine! What we can be more certain of is that these frameworks will power applications in production settings for significant productivity gains - exciting times!

Until next time…Happy Reading!
