AI Agents

Agents, Agents, Agents everywhere. How did we get here?

Nov 28, 2024

Agents are the new frontier of AI. They've become the label that practitioners and startups use to market themselves. But is the concept really new? Yes and no.

Let's briefly follow the history of Agents starting from year zero: the release of ChatGPT in November 2022.

1st Phase: LLMs are AGI

From December 2022 to March 2023, agents weren't really a topic. People were still trying to understand what to do with LLMs, and there was a widespread belief that we could achieve AGI simply through new model iterations (GPT-4, 5, 6, or whatever). The main focus was on developing what was initially called "semantic search." This technique later evolved into RAG (Retrieval Augmented Generation). The early approach was very simple (now called Naive RAG) compared to more complex techniques (which we now call Agentic RAG).
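
For reference, here is the Naive RAG pattern in a few lines. This is a minimal sketch, not a real implementation: embed() is a toy bag-of-words stand-in for a neural embedding model, and llm() is a hypothetical placeholder for a model call.

```python
# Toy illustration of the Naive RAG pattern: embed documents, retrieve the
# closest ones to a query, stuff them into the prompt, generate an answer.
import math
from collections import Counter

def llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical placeholder: swap in your model provider's client")

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in: real systems use a neural embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

Agentic RAG replaces this single retrieve-then-generate pass with a loop in which the model itself decides when and what to retrieve.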

2nd Phase: AutoGPT and BabyAGI

Around April 2023, it became clear that LLMs alone couldn't achieve all the cool things our imagination could envision. The term "Agents" started appearing more frequently, and projects like AutoGPT and BabyAGI became incredibly popular. And I mean REALLY popular. AutoGPT smashed every conceivable GitHub star growth record. In the image below you can see how ridiculous its growth was compared to earlier, VERY SUCCESSFUL projects like Kubernetes and Django.

The main ideas revolved around three things (a minimal sketch combining them follows the list):

  • Providing tools to LLMs (such as web browsing, email capabilities, etc.)

  • Implementing a planner cycle: having the LLM develop a step-by-step plan for specific queries/tasks and then execute each step

  • Incorporating concepts like persistence and memory (cleverly leveraged by vector database companies to raise hundreds of millions in funding)
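
Combining the three ideas above, a stripped-down AutoGPT-style loop looks roughly like this. This is a sketch under assumptions, not any project's actual code: llm() is a hypothetical model call, and the tools are toys.

```python
# AutoGPT-style loop: ask the model for a plan, execute each step with a
# tool, and keep a running memory that feeds back into later steps.
def llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical placeholder: swap in your model provider's client")

TOOLS = {
    "web_search": lambda arg: f"(search results for {arg!r})",  # toy tool
    "send_email": lambda arg: f"(email sent: {arg!r})",         # toy tool
}

def run_agent(task: str, max_steps: int = 10) -> list[str]:
    memory: list[str] = []  # naive persistence: just a transcript
    plan = llm(f"Break this task into numbered steps:\n{task}").splitlines()
    for step in plan[:max_steps]:
        action = llm(
            f"Task: {task}\nMemory so far: {memory}\nCurrent step: {step}\n"
            f"Reply as 'tool: argument' using one of {list(TOOLS)}."
        )
        name, _, arg = action.partition(":")
        tool = TOOLS.get(name.strip(), lambda a: f"(unknown tool {name!r})")
        memory.append(f"{step} -> {tool(arg.strip())}")  # result feeds the next step
    return memory
```

Notice that every llm() call in the loop is a chance for the model to go wrong, which is exactly the problem described next.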

Problem: The technology wasn't (and still isn't) ready for solving generic tasks. Without constraints and clear directions, LLMs hallucinate and make mistakes. In multi-step tasks requiring 5 or 10 steps to complete, an error in any single step cascades through the process, rendering the final output unusable.
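
A back-of-the-envelope calculation makes the cascade concrete (purely illustrative, assuming each step succeeds independently):

```python
# If each step succeeds independently with probability p, an n-step task
# succeeds end-to-end with probability p**n.
for p in (0.95, 0.90, 0.80):
    for n in (5, 10):
        print(f"p={p:.2f}, n={n:>2}: end-to-end success = {p**n:.0%}")
# Even 95%-reliable steps only reach ~60% over 10 steps; at 90% per step
# a 10-step task succeeds only ~35% of the time.
```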

Results: While these projects were fascinating, they remained far from production-ready applications.



3rd Phase: First Glimpses of Agentic Behaviors

OK, OK, we get it: agents can't do everything. Still, during 2023 features like ChatGPT's Code Interpreter emerged as the closest thing we had to working agents. OpenAI also attempted to launch plugins, which seemed promising in theory but struggled with integration issues. Indeed, building integrations that both function properly and get called at the right time with the correct parameters remains one of the key challenges in developing functional AI Agents.
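
To make the integration problem concrete, here is the kind of tool definition that function-calling APIs later converged on (OpenAI's Chat Completions style shown here; get_weather is a hypothetical example tool):

```python
# The model sees a JSON Schema per tool and must decide both WHEN to call it
# and WITH WHICH arguments - exactly where early plugins struggled.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
```

The hard parts in practice are not the schema itself but the behavior around it: calling get_weather when (and only when) the user asks about the weather, and filling in the arguments correctly.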

On the research front, papers like Voyager and MetaGPT advanced interesting new concepts and offered an early glimpse of agents' potential capabilities.

Andrej Karpathy drew an analogy between LLM systems (agents) and operating systems. The idea became very popular, and many startups started using the line "We are the AI OS for X".


4th Phase: Unbundling

In the first quarter of 2024, people started to focus on more specific products that do one thing well, rather than on a horizontal product like ChatGPT, which also saw its growth slow down.

People are now building LLM systems, called Agents, for specific industry verticals or tasks (for example, coding with Devin and Cursor). It is still unclear what level of autonomy these agents should have, and there is a lot of research on UI/UX.
It is clear that in many cases people and agents will work together in very tight feedback loops. The best-known examples are Artifacts in Claude and Canvas in ChatGPT.


What is coming next?

The framework below from OpenAI shows how the company thinks about the coming decades: five levels of AI capability, from Level 1 (Chatbots) and Level 2 (Reasoners) through Level 3 (Agents) and Level 4 (Innovators), up to Level 5 (Organizations).

This is a simplification of course.

We are supposedly between Level 2 and Level 3. The o1-preview model is meant to be the first step in that direction (though arguably we are not fully there yet).
Startups are now trying to build the first products that could be classified as Level 3. 2025 will be about the first success stories. Level 3 could drive innovation for years, if not decades, to come.

The ideas are there for everybody to grab. Code assistants, sales assistants improving the closing rate of below-average salespeople, personal assistants doing the job of executive VAs, customer support, financial planners, health coaches, market researchers, etc., etc… I could go on for longer than you are willing to read (congratulations for making it this far, btw).

A mixture of improvements in the base models and good engineering will get us there. Many teams are tackling the same problems. Ideas are a dime a dozen; execution, as always, will be key.
