Artificial Intelligence Agents: A New Architecture for Enterprise Automation and Intelligent Transformation

Generative AI is entering the agent era, and “agentic AI” and “AI agent” are the buzzwords of the moment. The agent architectures and early use cases we see today are only the beginning of a broader transformation that promises to redefine the human-machine dynamic, with profound implications for enterprise applications and infrastructure.

Generative AI has so far found its footing in three core use cases: search, synthesis, and generation, all of which have shown strong market fit. Several companies in Menlo Ventures' portfolio illustrate this: Sana for enterprise search, Eve for legal research assistance, and Typeface* for content generation. Each is an early breakthrough application in its category, and each is built on the few-shot reasoning capability of large language models (LLMs).

However, the future of generative AI extends far beyond these first core applications. While AI that can read and write is impressive enough, even more exciting is AI that can think and act on our behalf. Here, cutting-edge application developers such as Anterior, Sema4, and Cognition are building solutions that can handle workflows that used to require a lot of human involvement.

By introducing new building blocks such as multi-step logic, external memory, and access to third-party tools and APIs, a new generation of AI agents is pushing the boundaries of AI capabilities and driving the automation of end-to-end processes.

As we explore the field of AI agents further, we will elaborate on Menlo Ventures' views on the emerging market. First, we will clarify the definition of agents and the key factors that enable their realization. Then, we will trace the architectural development of the modern AI stack, from simple prompt input, to retrieval-augmented generation (RAG) technology, to mature agent systems. In subsequent articles, we will also explore the profound impact of this paradigm shift on the application and infrastructure levels.

1. Four building blocks of AI agents

Fully autonomous agents are defined by four elements that combine to achieve full agent capabilities: reasoning, external memory, execution, and planning.

Reasoning

At the most basic level, agents must be able to reason effectively over unstructured data. Foundation models from providers such as Anthropic* and OpenAI already do this well out of the box: a partial world model is encoded in the pre-trained weights of their large language models (LLMs), which supplies broad general knowledge and basic logic.

External Memory

Beyond this built-in general knowledge, agents need external memory to store and recall domain-specific knowledge and the context of the problem at hand. This is typically provided by a vector database such as Pinecone*.

Execution

To solve problems effectively, agents need tools that let them carry out tasks. Many early agent platforms shipped with a toolbox of actions predefined in code, from which the agent could choose. More general-purpose agent tools are now emerging as well, including web browsing, code interpretation, authentication and authorization, and connectors to enterprise systems such as CRM and ERP that let agents perform user interface actions inside those systems.

Planning

Instead of simply solving problems by continuously predicting the next token (such as writing an entire article in one go), intelligent agents use a more human-like way of thinking, breaking down complex tasks into smaller subtasks and plans, constantly reflecting on progress, and making necessary adjustments based on actual conditions. This strategic planning capability enables agents to handle complex problems more efficiently and respond flexibly to various challenges.
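The cycle of decomposing a task, acting, and reflecting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `run_plan`, `toy_planner`, `toy_executor`, and `toy_reviewer` are hypothetical names, and in a real agent each of the three callables would wrap an LLM call or a tool.

```python
def run_plan(goal, planner, executor, reviewer, max_steps=10):
    """Break a goal into subtasks, execute them, and revise the plan as needed."""
    plan = list(planner(goal))        # initial decomposition into subtasks
    history = []                      # (subtask, result) pairs completed so far
    steps = 0
    while plan and steps < max_steps:
        task = plan.pop(0)
        history.append((task, executor(task)))
        # Reflection: the reviewer may return a revised list of remaining
        # subtasks, or None to keep the current plan unchanged.
        revised = reviewer(goal, history, plan)
        if revised is not None:
            plan = list(revised)
        steps += 1
    return history


# Toy stand-ins (assumptions) for what would be LLM calls in practice.
def toy_planner(goal):
    return ["outline", "draft"]


def toy_executor(task):
    return f"done: {task}"


def toy_reviewer(goal, history, remaining):
    # After drafting, decide that one revision pass is still needed.
    if history[-1][0] == "draft":
        return remaining + ["revise"]
    return None


history = run_plan("write an article", toy_planner, toy_executor, toy_reviewer)
print([task for task, _ in history])  # prints ['outline', 'draft', 'revise']
```

The `max_steps` guard is a design choice of this sketch: because the reviewer can keep adding subtasks, an explicit budget keeps the loop from running indefinitely.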

2. AI Agent Examples: From RAG to Autonomous Agents

It is important to note that while future fully autonomous agents may integrate all four core building blocks, current Large Language Model (LLM) applications and agents are not yet at this stage.

For example, the popular retrieval-augmented generation (RAG) architecture is not agentic, although it has reasoning and external memory at its foundation. Some designs, such as OpenAI's Structured Outputs, even support tool use. The key difference is that these applications still treat the LLM as a "tool" for semantic search, synthesis, or generation, while the steps they take (their logical flow) remain predetermined by code.

In contrast, agents emerge when LLMs are placed at the heart of an application’s control flow, dynamically deciding what actions to perform, which tools to use, and how to parse and respond to input. In this case, some agents don’t even need to interact with external tools or take actual actions.


At Menlo, we distinguish three types of agents, which differ in their primary use case and in the degree of freedom they have over the application's control flow.

The most constrained are “decisioning agent” designs, which use language models to traverse predefined decision trees. “Agents on rails” offer one more degree of freedom: the agent is given a higher-level goal, while the solution space is constrained by standard operating procedures (SOPs) and a library of preset “tools.” At the far end of the spectrum are “general AI agents,” which have almost no data scaffolding on top and rely entirely on the reasoning power of the language model for planning, reflection, and course correction, much like an unconstrained for loop.
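To make the for-loop comparison concrete, here is a minimal sketch of a general agent in Python. Everything in it is an assumption made for illustration: `general_agent`, `toy_decide`, and `toy_tools` are hypothetical names, and `toy_decide` is a hard-coded stand-in for the LLM call that would choose each action in a real system.

```python
def general_agent(goal, decide, tools, max_iterations=8):
    """Let the model pick every action until it declares the goal finished."""
    scratchpad = []  # running record of (action name, observation)
    for _ in range(max_iterations):
        action = decide(goal, scratchpad)         # the model chooses the next step
        if action["name"] == "finish":
            return action["answer"], scratchpad
        tool = tools[action["name"]]
        observation = tool(**action["args"])      # act, then record what happened
        scratchpad.append((action["name"], observation))
    return None, scratchpad                       # iteration budget exhausted


def toy_decide(goal, scratchpad):
    # Stand-in policy (assumption): call "add" once, then report the result.
    if not scratchpad:
        return {"name": "add", "args": {"a": 2, "b": 3}}
    return {"name": "finish", "answer": str(scratchpad[-1][1])}


toy_tools = {"add": lambda a, b: a + b}

answer, trace = general_agent("What is 2 + 3?", toy_decide, toy_tools)
print(answer)  # prints 5
```

Nothing in the loop encodes the order of steps; the only constraints are the tool dictionary and the iteration budget, which is what separates this design from the more constrained agent types.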

Next, we’ll dive deeper into five reference architectures and AI agent examples for each agent type.

3. Retrieval-Augmented Generation (RAG)

Reference Architecture: Retrieval-Augmented Generation

Establishing a baseline: retrieval-augmented generation (RAG) is the standard architecture for many modern AI applications today. Sana's enterprise search feature is a useful example of how it works under the hood.

The process begins with loading and transforming unstructured files (PDFs, slides, text files, and so on), which usually sit in enterprise data silos such as Google Drive and Notion. A data preprocessing engine such as Unstructured converts these files into a format an LLM can query. Along the way, the files are “chunked,” that is, split into smaller text units that allow more precise retrieval. These text units are then embedded as vectors and stored in a database such as Pinecone.
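The ingestion steps above (split into text units, embed, store) can be sketched as follows. This is a toy under loud assumptions: `chunk_text`, `toy_embed`, and `build_index` are hypothetical names, the embedding is a hashed bag-of-words rather than a learned embedding model, and a plain Python list stands in for a vector database such as Pinecone.

```python
import hashlib
import math


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (the smaller text units)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break                      # the final window reached the end of the text
        start += max_words - overlap
    return chunks


def toy_embed(text, dim=64):
    """Toy embedding (assumption): hash each word into a bucket, then normalize."""
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def build_index(documents, max_words=40, overlap=10):
    """Return a list of records; a real system would write these to a vector store."""
    index = []
    for doc_id, text in documents.items():
        for position, chunk in enumerate(chunk_text(text, max_words, overlap)):
            index.append({"id": f"{doc_id}#{position}",
                          "text": chunk,
                          "vector": toy_embed(chunk)})
    return index
```

The overlap between neighboring windows is a common design choice: it reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.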

When a user submits a query to the AI application (for example, “Please summarize all my meeting notes with Company X”), the system retrieves the text units most semantically relevant to the query and folds them into a “meta-prompt,” that is, the user's query augmented with the retrieved information. The meta-prompt is then passed to the LLM, which synthesizes an answer from this context and returns a concise response to the user.
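The query-time half of this flow can be sketched in the same spirit. The names `similarity`, `retrieve`, `build_meta_prompt`, and `answer` are hypothetical, the word-overlap cosine score is a simple stand-in for vector similarity over learned embeddings, and `llm` is any callable that maps a prompt string to a response string.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def similarity(a, b):
    """Cosine similarity over word counts (a stand-in for embedding similarity)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k text units most similar to the query, best first."""
    return sorted(chunks, key=lambda c: similarity(query, c), reverse=True)[:k]


def build_meta_prompt(query, retrieved):
    """Combine the user's query with the retrieved context."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved))
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")


def answer(query, chunks, llm, k=2):
    """One retrieval step followed by one LLM call."""
    return llm(build_meta_prompt(query, retrieve(query, chunks, k)))
```

The prompt wording inside `build_meta_prompt` is illustrative only; production systems tune this template heavily.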

Note that the flow above shows a single retrieval step with a single LLM call. In practice, AI applications are more complex and may involve dozens or even hundreds of retrieval steps. They typically use “prompt chains,” in which the output of one step becomes the input of the next, and they run multiple prompt chains in parallel for different types of tasks. The results are then combined to produce the final output.

Reference Architecture: RAG + Prompt Chaining

For example, the legal research copilot Eve* might break a research query about Title VII into separate prompt chains, each focused on a predetermined subtopic such as employer background, employment history, Title VII itself, relevant case law, and supporting evidence for the plaintiff's case. The LLM runs each prompt chain, generates an intermediate output for each, and then synthesizes those outputs into a final memorandum.
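A prompt-chaining flow of this shape can be sketched as below. This is not Eve's implementation: `run_chain`, `run_chains_in_parallel`, and `synthesize` are hypothetical names, the prompt templates are invented, and `echo_llm` is a stub that wraps its prompt in angle brackets so the data flow is visible without calling a model.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain(llm, steps, initial_input):
    """Run one chain: the output of each step becomes the input of the next."""
    output = initial_input
    for template in steps:
        output = llm(template.format(input=output))
    return output


def run_chains_in_parallel(llm, chains, query):
    """Run every named chain on the same query concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(run_chain, llm, steps, query)
                   for name, steps in chains.items()}
        return {name: future.result() for name, future in futures.items()}


def synthesize(llm, results):
    """Combine the intermediate outputs into one final LLM call."""
    findings = "\n".join(f"{name}: {text}" for name, text in results.items())
    return llm("Write a memorandum from these findings:\n" + findings)


def echo_llm(prompt):
    # Stub model (assumption): echoes the prompt so each step can be traced.
    return f"<{prompt}>"


chains = {
    "case_law": ["Find relevant case law for: {input}", "Summarize: {input}"],
    "employment_history": ["Extract the employment history for: {input}"],
}
results = run_chains_in_parallel(echo_llm, chains, "Title VII claim")
memo = synthesize(echo_llm, results)
```

Threads suit this sketch because real LLM calls are network-bound; the chains themselves stay sequential internally, since each step needs the previous step's output.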

4. Tool Usage

Reference Architecture: Tool Use

Tool use, also called function calling, marks an important step from retrieval-augmented generation (RAG) toward agentic behavior and adds a new layer to modern AI architectures.

These tools are essentially pre-written code components that perform specific actions. General-purpose capabilities such as web browsing (Browserbase, Tiny Fish), code interpretation (E2B), and authorization and authentication (Anon) have emerged, giving LLMs the ability to browse the web, interact with external systems such as CRM and ERP, and execute custom code. The system presents the available tools to the LLM, which selects one, formats the required inputs as structured JSON, and triggers an API call that carries out the final action.
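The select, format, and execute handoff can be sketched as a small dispatcher. The JSON shape used here (`"tool"` and `"arguments"` keys) is an assumption chosen for illustration and is not any particular vendor's wire format; `add_numbers`, `lookup_account`, `describe_tools`, and `dispatch` are likewise hypothetical.

```python
import json


def add_numbers(a, b):
    return a + b


def lookup_account(name):
    # Stand-in for a CRM connector; returns canned data in this sketch.
    return {"name": name, "status": "active"}


# Registry of tools the application is willing to expose to the model.
TOOLS = {
    "add_numbers": {"function": add_numbers, "required": ["a", "b"],
                    "description": "Add two numbers."},
    "lookup_account": {"function": lookup_account, "required": ["name"],
                       "description": "Look up a customer account by name."},
}


def describe_tools():
    """Text listing of the available tools, to be included in the prompt."""
    return "\n".join(
        f"{name}: {spec['description']} (arguments: {', '.join(spec['required'])})"
        for name, spec in TOOLS.items())


def dispatch(model_output):
    """Parse the model's JSON tool call, validate it, and execute the tool."""
    call = json.loads(model_output)
    name = call.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    arguments = call.get("arguments", {})
    missing = [arg for arg in TOOLS[name]["required"] if arg not in arguments]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return TOOLS[name]["function"](**arguments)


# A model response selecting a tool might look like this:
print(dispatch('{"tool": "add_numbers", "arguments": {"a": 2, "b": 3}}'))  # prints 5
```

Validating the tool name and arguments before execution matters because the JSON comes from a model and cannot be assumed to be well-formed or safe.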

Omni’s Calculations AI feature illustrates this approach: it uses an LLM to write the appropriate Excel functions directly into a spreadsheet, which then performs the calculation and returns the results of complex queries to the user.

However, powerful as tool use is, it is not enough by itself to make an application an "agent," because the logical control flow is still predetermined by the application. In what follows, we turn to true agent designs, in which the LLM dynamically writes part or all of that control flow.

5. Decisioning Agent

Reference Architecture: Decisioning Agent

The first type of agent we cover is the decisioning agent, which uses the LLM's decision-making ability to steer a complex, multi-step reasoning flow and arrive at business decisions. Unlike RAG or tool use, this architecture is the first to hand part of the control logic to the LLM rather than hard-coding every step in advance. It still sits at the low end of the range of agentic freedom, however, because the agent acts mainly as a “router” that navigates a predefined decision tree.

Consider Anterior (formerly Co:Helm), a health plan automation company that has built a clinical decision engine to automate the review of claims submissions. Traditionally, nurses perform these reviews manually, working through payer rules full of conditional logic, much like a complex “choose your own adventure” game.

Anterior streamlines this process. The company first converts the payer rules into a directed acyclic graph (DAG), using rule-based scripts and language models. Its agent then traverses this decision tree, using LLMs at each node to evaluate the relevant clinical documents against the rule for that node. For simpler nodes, this may involve only a basic retrieval-augmented generation (RAG) step. Anterior often encounters more complex tasks that require subchains, however, in which the agent must choose the best approach before advancing to the next node. The agent updates its state with each decision, managing these intermediate results in memory, and keeps advancing through the tree until it reaches a final determination.
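The traversal pattern described above can be sketched as follows. This is not Anterior's system: the rule graph and its rules are invented for illustration, `traverse` and `keyword_evaluator` are hypothetical names, and the keyword check is a crude stand-in for the LLM evaluation that would run at each node.

```python
# Hypothetical rule graph; these rules are invented and carry no clinical meaning.
RULE_GRAPH = {
    "coverage": {
        "rule": "Is the requested procedure covered by the plan?",
        "edges": {"yes": "prior_treatment", "no": "deny"},
    },
    "prior_treatment": {
        "rule": "Was physical therapy attempted first?",
        "edges": {"yes": "approve", "no": "escalate"},
    },
    "approve": {"decision": "approve"},
    "deny": {"decision": "deny"},
    "escalate": {"decision": "escalate to human reviewer"},
}


def traverse(graph, start, evaluate, document, max_hops=50):
    """Walk the graph from start, letting evaluate pick the edge at each node."""
    state = {"path": [], "answers": {}}   # intermediate results kept in memory
    node_id = start
    for _ in range(max_hops):
        node = graph[node_id]
        state["path"].append(node_id)
        if "decision" in node:            # leaf node: final determination
            return node["decision"], state
        answer = evaluate(node["rule"], document, state)
        if answer not in node["edges"]:
            raise ValueError(f"unexpected answer {answer!r} at node {node_id!r}")
        state["answers"][node_id] = answer
        node_id = node["edges"][answer]
    raise RuntimeError("hop limit reached without a decision")


def keyword_evaluator(rule, document, state):
    # Stand-in for an LLM call (assumption): naive keyword matching only.
    text = document.lower()
    if "covered" in rule:
        return "yes" if "is covered" in text else "no"
    return "yes" if "physical therapy" in text else "no"


decision, state = traverse(
    RULE_GRAPH, "coverage", keyword_evaluator,
    "The procedure is covered. Patient completed physical therapy.")
print(decision, state["path"])
```

The model only chooses among the edges that a node offers, which is why this design sits at the constrained end of the agent spectrum: the graph, not the model, defines the set of reachable decisions.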

Anterior is not alone in taking this approach. Norm AI, for example, is building AI agents for regulatory compliance, while Parcha is building agents for know-your-customer (KYC) identity verification.

 


