AI-Agents 02
The Architectural Spectrum of Agentic Systems

Michael Schöffel
January 20, 2026 - 20 min. read
Content
- 1. Summary
- 2. The Paradigm Shift to Agentic Orchestration
- 3. Pure Python: The Architecture of Total Control
- 4. LangChain: The Building Block Principle (Pipeline Architecture)
- 5. LangGraph: The Intelligent Cycle (State Machine Architecture)
- 6. AI Agentic Systems: Role-Based Teams and Orchestration
- 7. Event-Driven & Declarative Architectures
- 8. Long-Running AI Agentic Systems: Persistence and Memory
- 9. Standardized Protocols: The Future of Interoperability
- 10. Conclusion and Strategic Recommendations
- Further posts
1. Summary
The evolution from isolated generative models to "Agentic AI" marks a fundamental paradigm shift in artificial intelligence. The focus is no longer on statistical token prediction, but on the architecture of autonomous systems as a cognitive operating system. Agentic systems differ from pure generative AI through autonomous orchestration, persistent state management, and goal-oriented tool usage across extended planning horizons. This forces a transformation of software development from linear pipelines to complex, cyclical control loops ("Reasoning Loops").
In implementation, "Pure Python" architectures with maximum control compete against specialized framework solutions. While avoiding frameworks ensures transparency for critical infrastructures but requires high maintenance effort, solutions like LangGraph have prevailed for complex requirements. Through graph-based state machines, they enable cyclical workflows, self-reflection, and stable "Human-in-the-Loop" interactions that cannot be realized with linear approaches like LangChain. Additionally, frameworks like CrewAI offer abstractions for role-based team dynamics.
For the scalability of "Long-Running Systems," robust persistence mechanisms like checkpointing are absolutely necessary to ensure fault tolerance and temporal continuity. However, the critical bottleneck for a networked AI ecosystem remains interoperability. The adoption of open standards - specifically the Model Context Protocol (MCP) for tool integration and the Agent-to-Agent (A2A) Protocol for collaboration - forms the necessary foundation for modular systems. Thus, the future AI architecture is based on standardized, autonomous cooperation rather than monolithic silos.
2. The Paradigm Shift to Agentic Orchestration
The landscape of artificial intelligence is undergoing a fundamental metamorphosis in 2026 that goes far beyond the iterative improvement of model parameters or context windows. We are observing a historic departure from isolated, stochastic "Prompt-Response" interactions toward complex, agentic systems characterized by autonomy, persistent state management, and collaborative problem-solving. While generative AI (GenAI) was primarily focused on content creation, "Agentic AI" aims at autonomously achieving complex goals through multi-step action sequences [1].
In this context, the core challenge of modern software architecture no longer lies solely in the quality of the underlying Large Language Model (LLM), but in orchestration. An AI model, no matter how powerful it may be, is like a highly intelligent advisor operating in an isolated room without tools, memory, or procedural instructions. Without a defined architecture, this advisor lacks the ability for temporal continuity, the use of external aids, and the structured decomposition of abstract ambitions into executable steps. The AI architecture functions here as the cognitive operating system that dictates to the model when external information should be retrieved, where intermediate results are persisted, and how errors should be corrected through reflexive loops.
In academic taxonomy, this leap is clearly distinguished. While classic "AI Agents" are often defined as modular systems with high autonomy within specific tasks, "Agentic AI" describes a system that operates through goal decomposition, inter-agent communication, and contextual adaptation across wide planning horizons.
| Feature | Generative AI | Agentic AI |
|---|---|---|
| Primary Goal | Content creation (text, image, code) | Autonomous goal achievement & problem-solving |
| Temporal Continuity | Stateless, session-based | Persistent across workflow stages |
| Planning Horizon | None (Immediate Response) | High (Decomposition into sub-tasks) |
| Memory | Static (Pre-trained) or short context window | Shared episodic & procedural memory |
| Interaction | Linear (Human → AI → Human) | Collaborative (Multi-Agent & Human-in-the-Loop) |
This report provides an exhaustive analysis of the dominant architectural patterns defining this new ecosystem. We examine the spectrum from "Pure Python" implementations that offer maximum control with highest manual effort, to highly abstracted frameworks like LangChain, LangGraph, CrewAI, and LlamaIndex Workflows. A particular focus lies on the technological differentiation between transient systems and "Long-running AI Agentic Systems" that require persistent state management over days or weeks. Finally, we analyze the critical infrastructure for interoperability, namely the Model Context Protocol (MCP) and the Agent-to-Agent (A2A) Protocol, which drive the standardization of communication between models, data sources, and autonomous agents.
3. Pure Python: The Architecture of Total Control
The "Pure Python" approach, often also classified as "No-Framework" or "Vendor-Agnostic Architecture", is the foundation upon which all more abstract systems are built. It stands for total control and transparency and often forms the "Escape Hatch" for senior engineers who need to free themselves from the constraints and abstraction layers of modern frameworks.
3.1 Conceptual Philosophy and the "Reasoning Loop"
The philosophy behind Pure Python is the reduction of complexity by avoiding unnecessary middleware. In an era where frameworks often suffer from "Leaky Abstractions", meaning the framework's internal complexity complicates debugging and renders stack traces unreadable, direct access to the API of the model provider (e.g., OpenAI, Anthropic) is often the most robust path.
An LLM like GPT-4 is stateless by definition. It has no inherent memory of previous interactions between two API calls. So if we want to conduct a conversation or an agentic process, we as architects must artificially create cognitive continuity. This happens through manual management of a list structure (List/Array), in which all previous messages (SystemMessage, UserMessage, AssistantMessage) are stored and sent again to the model with each new inference step.
The core mechanism of this approach is the infinite loop (While-Loop), which implements the so-called "Reasoning Loop" (thinking cycle) [2]. The algorithm follows a strict deterministic pattern:
- Initialization: A list is created that functions as context storage.
- Model Inference: In each iteration of the loop, the entire history is sent to the model. The model decides based on this context on the next step ("Next Token Prediction" at the macro level).
- Decision Tree: The model delivers either a textual response or requests the execution of a tool (Tool Call). This is the critical branching point in the code.
- Execution: When a tool call is signaled (e.g., through a special JSON schema or tool_calls in the API response), the Python code interrupts the flow. It extracts the function name and the arguments, executes the corresponding Python function locally and injects the result (Observation) back into the message list.
- Recursion: The process repeats with the now enriched context, until the model generates a final response or an abort condition is reached.
This code implements the described approach. We use the OpenAI SDK here as an example, as it is the industry standard for this type of API interaction, but completely omit abstract frameworks like LangChain or CrewAI.
import openai
import json

# 1. Initialization: Define tools (local Python functions)
def get_weather(location):
    """Simulates an API call for weather data."""
    print(f"--- [System] Executing tool 'get_weather' for {location} ---")
    return {"temp": "22°C", "condition": "Sunny"}

# Mapping for local execution
available_tools = {
    "get_weather": get_weather
}

# Tool definition that is announced to the model
tool_definitions = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Returns the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

def run_pure_python_agent(user_prompt):
    client = openai.OpenAI()  # Requires API key in environment variable

    # 2.1 Cognitive Continuity: The manual list structure
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Use tools if needed."},
        {"role": "user", "content": user_prompt}
    ]

    # 2.2 The Reasoning Loop (Infinite loop with abort condition)
    while True:
        # Step: Model Inference over the entire history
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tool_definitions
        )

        response_message = response.choices[0].message

        # 2.3 Decision Tree: Tool Call or Response?
        if response_message.tool_calls:
            # Save the model's tool request in the history
            messages.append(response_message)

            for tool_call in response_message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)

                # Execute tool locally
                function_response = available_tools[function_name](**function_args)

                # Inject result into the history (important for cognitive continuity)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(function_response)
                })
            # Loop continues -> Next inference step
        else:
            # Final response received
            return response_message.content

# Example call
if __name__ == "__main__":
    result = run_pure_python_agent("What's the weather like in Berlin?")
    print(f"\nFinal Answer: {result}")
3.2 Technical Analysis: Transparency vs. Responsibility
The decision for Pure Python is a decision for radical transparency. The developer understands every line of code. There are no hidden prompts that the framework injects in the background ("System Prompts"), and no opaque token costs. This is especially critical for debugging: When an agent hallucinates or is caught in a loop, this approach allows direct inspection of the raw messages and the state object. Standard debuggers and logging mechanisms work immediately.
However, this approach shifts all responsibility to the developer. Topics like parsing the often inconsistent LLM responses, catching API errors (rate limits, timeouts), managing retries, and especially formatting data for different model providers must be implemented manually. This quickly leads to high maintenance effort with growing complexity, as features that are "free" in frameworks (e.g., streaming, caching) must be built here from scratch.
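To make this trade-off concrete, the following minimal sketch shows the kind of plumbing a framework would otherwise provide: a hand-rolled retry wrapper with exponential backoff. The exception classes are those of the current OpenAI Python SDK and would have to be adapted for other providers; the wrapper is illustrative, not a reference implementation.
import time
import openai

def chat_with_retry(client, messages, model="gpt-4o", max_retries=5):
    """Calls the chat API and retries transient failures with exponential backoff."""
    delay = 1.0
    for attempt in range(1, max_retries + 1):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (openai.RateLimitError, openai.APITimeoutError) as exc:
            # Transient error: wait, then retry with a doubled delay
            print(f"[Retry {attempt}/{max_retries}] {type(exc).__name__}, waiting {delay:.0f}s")
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("LLM call failed after repeated retries")

# Usage (assumes OPENAI_API_KEY is set):
# client = openai.OpenAI()
# response = chat_with_retry(client, [{"role": "user", "content": "Ping"}])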
3.3 Use Case: Critical Infrastructure
This architectural style is ideal for highly critical production systems where predictability, security, and auditability are more important than fast time-to-market. It is excellent for microservices that must fulfill a very specific, narrow task (e.g., classification of financial data), without carrying the overhead of a large library. It is also the preferred choice when absolutely no dependencies on third-party code (except the model provider's SDK) are allowed.
4. LangChain: The Building Block Principle (Pipeline Architecture)
LangChain historically marked the first major step toward democratization of LLM application development and established the fundamental concept of the "Chain". It functioned as the "duct tape" that held together the fragmented world of early LLM APIs and created standardized interfaces [3].
4.1 The LangChain Expression Language (LCEL) and DAGs
LangChain's architecture is based on composition and the assembly line principle. Instead of writing monolithic scripts, the framework offers pre-built modules: "Prompts" (instruction templates), "Models" (interfaces to different LLMs), and "Output Parsers" (which convert unstructured text into structured data). These building blocks are connected using the LangChain Expression Language (LCEL). LCEL uses a declarative syntax heavily inspired by UNIX pipes (|) to define the data flow. Data flows as in an industrial process from left to right: Input | Prompt | Model | Parser.
Architecturally, LangChain is primarily designed for DAGs (Directed Acyclic Graphs): the information flow is linear and contains no loops. This makes the framework extremely efficient for tasks with a clear start and end sequence, but limits its ability to handle complex problem-solving, which often requires iteration.
This diagram visualizes the "assembly line principle" (pipeline architecture), where data flows linearly from one component to the next, without feedback loops (Directed Acyclic Graph).
The following example shows how the pipe syntax (|) mentioned in the text looks in practice. We create a chain that processes a simple question and parses the model output into a plain string.
# Requirements: pip install langchain langchain-openai
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Model Interface (Interface to LLMs)
model = ChatOpenAI(model="gpt-4o-mini")

# 2. Prompt (Instruction template)
prompt = ChatPromptTemplate.from_template(
    "You are a technical expert. Explain the concept of {topic} in one short sentence."
)

# 3. Output Parser (Converts unstructured text to a string)
parser = StrOutputParser()

# 4. The "Chain" - the heart (building block principle via LCEL)
# Here you see the UNIX pipes | that define the DAG
chain = prompt | model | parser

# Execute the pipeline
if __name__ == "__main__":
    result = chain.invoke({"topic": "Directed Acyclic Graphs (DAG)"})
    print(f"Chain Result:\n{result}")
4.2 Dominance in the RAG Sector
LangChain's greatest strength lies in its enormous integration density. There are hundreds of ready-made connectors ("Loaders") to databases, file systems (PDF, Notion, Slack), search engines, and vector stores. This has made LangChain the de facto standard for Retrieval Augmented Generation (RAG) [4]. In a RAG scenario, the flow is usually linear: load the document, split it into chunks, create embeddings, search for passages relevant to the user's question, generate the response. LangChain is optimized for exactly this pipeline architecture and is hard to beat in terms of implementation efficiency, as the sketch below illustrates.
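The following sketch wires a minimal RAG chain with LCEL to show how compact such a pipeline becomes. The FAISS vector store, the OpenAIEmbeddings model, and the hard-coded sample text are illustrative choices, not prescriptions; any supported loader, splitter, or vector store could be swapped in.
# Assumed requirements: pip install langchain langchain-openai langchain-community faiss-cpu
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load & split: a hard-coded text stands in for a document loader
raw_text = (
    "LangGraph models agents as state machines with cycles. "
    "LangChain composes linear pipelines out of prompts, models and parsers."
)
chunks = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0).split_text(raw_text)

# 2. Embed & index the chunks in a vector store
retriever = FAISS.from_texts(chunks, OpenAIEmbeddings()).as_retriever()

# 3. Retrieval + generation as a single linear LCEL pipeline
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

if __name__ == "__main__":
    print(rag_chain.invoke("What does LangGraph model agents as?"))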
4.3 Criticism and Limitations
The flip side of the coin is debugging complexity. When a chain fails, the deep nesting of "Runnables" often makes it hard to figure out which building block contains the error. The "black box" problem is accepted here in exchange for development speed. Critics complain that simple tasks become unnecessarily complicated through abstraction and that stack traces are hard to read. As soon as conditional logic (loops, complex if/else branching) becomes necessary, the LCEL syntax becomes awkward and hard to follow, making the transition to graph-based approaches necessary.
5. LangGraph: The Intelligent Cycle (State Machine Architecture)
In the real world, problem-solving processes are rarely strictly linear. A human expert doesn't write a report in one go from A to Z. They create a draft, review it, correct errors, research missing facts, and revise the text. LangGraph extends LangChain's concept to model exactly these cyclical workflows and marks the transition from pipelines to state machines [5].
5.1 Graph Theory as Architectural Foundation
LangGraph models AI applications as Finite State Machines. The architecture views the agent as a graph, consisting of three core components:
- Nodes: These represent work steps or functions (e.g., "Research", "Write", "Execute code"). Each node performs a specific task and returns an update for the state.
- Edges: These define the control flow between nodes.
- State: A central data object (often a TypedDict or Pydantic Model), which holds the current status of the application. Each node reads this state and writes changes into it.
The crucial architectural breakthrough is the introduction of Cycles (Loops). At branching points ("Conditional Edges"), the graph can decide, based on the current state, to return to a previous node. This was not possible in LangChain's DAG architecture.
This example implements a minimal "Writer-Reviewer" graph. An agent creates text, and a "critic" node decides if it's good enough or needs to be revised.
from typing import TypedDict
from langgraph.graph import StateGraph, END

# 1. State Definition
class AgentState(TypedDict):
    content: str
    feedback: str
    iterations: int

# 2. Node Definitions
def designer_node(state: AgentState):
    print("--- DESIGNER: Creating draft ---")
    content = state.get("content", "")
    if not content:
        content = "This is a first draft about AI systems."
    else:
        content += " (Revised based on feedback)"

    return {"content": content, "iterations": state.get("iterations", 0) + 1}

def reviewer_node(state: AgentState):
    print("--- REVIEWER: Checking quality ---")
    if state["iterations"] < 2:
        return {"feedback": "Please add more details."}
    return {"feedback": "Good enough."}

# 3. Logic for conditional edges
def should_continue(state: AgentState):
    if state["feedback"] == "Good enough.":
        return "end"
    return "continue"

# 4. Assemble the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("designer", designer_node)
workflow.add_node("reviewer", reviewer_node)

# Define edges
workflow.set_entry_point("designer")
workflow.add_edge("designer", "reviewer")

# Conditional edge: Reviewer -> Designer OR End
workflow.add_conditional_edges(
    "reviewer",
    should_continue,
    {
        "continue": "designer",
        "end": END
    }
)

# Compile graph
app = workflow.compile()

# Test run
inputs = {"content": "", "iterations": 0}
for output in app.stream(inputs):
    print(output)
5.2 Reflection, Self-Correction and Human-in-the-Loop
This cyclical structure enables cognitive patterns like "Self-Reflection". An agent can generate a response, evaluate it itself in a subsequent step (e.g., "Is the generated code syntactically correct?" or "Is the answer factually supported?"), and repeat the process step if the result is negative. This leads to significantly higher robustness against errors.
Another unique selling point is the persistence layer ("Checkpointer"). LangGraph saves the state after each step. This not only enables fault tolerance (resumption after crash), but also Human-in-the-Loop workflows. The graph can pause at a specific point ("Breakpoint"), wait for human approval (e.g., before sending an email or a financial transaction), and then continue with the approved or corrected state.
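A minimal sketch of such a breakpoint follows, assuming an in-memory checkpointer and a hypothetical "send_mail" node as the critical step. The graph runs until just before that node, waits, and is resumed by invoking it again on the same thread_id; in a real system, a human would inspect or edit the persisted state in between.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class MailState(TypedDict):
    draft: str

def write_mail(state: MailState):
    return {"draft": "Dear customer, your refund has been approved."}

def send_mail(state: MailState):
    print(f"--- SENDING: {state['draft']} ---")
    return {}

workflow = StateGraph(MailState)
workflow.add_node("write_mail", write_mail)
workflow.add_node("send_mail", send_mail)
workflow.add_edge(START, "write_mail")
workflow.add_edge("write_mail", "send_mail")
workflow.add_edge("send_mail", END)

# Pause before the critical step; the checkpointer persists the paused state
app = workflow.compile(checkpointer=MemorySaver(), interrupt_before=["send_mail"])

config = {"configurable": {"thread_id": "mail-42"}}
app.invoke({"draft": ""}, config)   # Runs until the breakpoint before 'send_mail'
print(app.get_state(config).next)   # -> ('send_mail',): waiting for human approval
app.invoke(None, config)            # Resuming with None continues from the checkpoint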
5.3 Use Case: Complex Agents
LangGraph is the industry standard for complex enterprise agents, coding assistants, and customer support bots that need to solve problems autonomously. It is used wherever the logic is non-linear and the agent must "think" and adapt its strategy during runtime.
6. AI Agentic Systems: Role-Based Teams and Orchestration
While LangGraph addresses the technical level of state management, approaches like "AI Agentic Systems" focus on the semantic level of collaboration. Instead of programming a single monolithic agent, a team of specialized agents is created. This reflects human work organization, where complex projects are rarely solved by individuals, but by teams with diversified skills.
6.1 CrewAI: The Role-Based Framework
CrewAI abstracts team dynamics through a strong metaphor system [6]. Agents are defined like employees: they receive a role (e.g., "Senior Research Analyst"), a goal, and a backstory (personality/resume).
- Architecture: The framework acts as orchestrator. It assigns tasks and manages communication. When a "Researcher" agent delivers results, these are automatically passed as context to the "Writer" agent. Processes can be organized sequentially or hierarchically.
- Strengths: The definition via natural language and roles is extremely intuitive and massively lowers the entry barrier. Agents can autonomously delegate tasks and collaborate.
- Weaknesses: High token consumption is critical. The extensive "backstories" and system prompts sent with each step increase costs and latency. Additionally, such systems tend toward instability and "black box" behavior in unstructured chats, as it is difficult to trace why an agent made a specific decision.
The following diagram illustrates how the orchestrator uses roles, goals, and backstories to process tasks in a sequential chain.
This example shows the implementation of the concepts described in the text (role, goal, backstory) using the crewai framework.
from crewai import Agent, Task, Crew, Process

# 1. Define agents with specific roles and personalities
researcher = Agent(
    role='Senior Research Analyst',
    goal='Find the latest trends in AI agents 2026',
    backstory="""You are an experienced analyst with an eye for
    disruptive technologies. Your writing style is precise and fact-oriented.""",
    verbose=True,
    allow_delegation=False
)

writer = Agent(
    role='Content Strategist',
    goal='Create an easy-to-understand blog post based on the research',
    backstory="""You specialize in making complex technical topics
    accessible to a broad audience without losing depth.""",
    verbose=True,
    allow_delegation=False
)

# 2. Define tasks
task_research = Task(
    description='Analyze the top 3 frameworks for multi-agent systems.',
    expected_output='A detailed report on CrewAI, LangGraph and AutoGPT.',
    agent=researcher
)

task_write = Task(
    description='Write a blog article about the advantages of role-based AIs.',
    expected_output='A blog post with 500 words in Markdown format.',
    agent=writer,
    context=[task_research]  # The result of task 1 is passed as context
)

# 3. The Crew orchestrates the collaboration
my_crew = Crew(
    agents=[researcher, writer],
    tasks=[task_research, task_write],
    process=Process.sequential,  # Sequential flow as described in the text
    verbose=True
)

# Start the process
result = my_crew.kickoff()

print("######################")
print("TEAM RESULT:")
print(result)
6.2 LangGraph Supervisor Model
A more structured alternative to CrewAI's often free-form conversations is the Supervisor Model within LangGraph. Here, a central router agent (the supervisor) acts as a manager [7].
- Functionality: The supervisor receives the request, analyzes it, and decides which specialist (node in the graph) is responsible. It delegates the work, and after completion, control flow returns to the supervisor. It checks the result and decides whether the task is complete or another specialist (e.g., for editing) is needed.
- Advantages:
- Determinism: The developer defines exactly in the graph who can communicate with whom.
- State Control: The global "State" contains the entire history, ensuring auditability.
- Modularity: New specialists can easily be added as nodes without rewriting the overall logic.
This diagram illustrates the cyclical process where control returns to the supervisor after each work step until the goal is achieved.
This example outlines how a Supervisor Model is structurally built with langgraph and langchain. It shows the orchestration between a "Researcher" and a "Writer".
import operator
from typing import Annotated, List, TypedDict

from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage, HumanMessage

# 1. State Definition
class AgentState(TypedDict):
    # The history of the entire communication
    messages: Annotated[List[BaseMessage], operator.add]
    # Who is next?
    next: str

# 2. The Worker Nodes (Specialists)
def researcher_node(state: AgentState):
    # Simulates research
    last_message = state['messages'][-1].content
    return {"messages": [HumanMessage(content=f"Research result on: {last_message}", name="Researcher")]}

def writer_node(state: AgentState):
    # Simulates writing a text
    return {"messages": [HumanMessage(content="Here is the detailed article based on the research.", name="Writer")]}

# 3. The Supervisor Node (Decision Logic)
def supervisor_node(state: AgentState):
    # In a real system, an LLM (e.g., ChatOpenAI) would decide who is next.
    # For this example, we use simple deterministic logic:
    if len(state['messages']) < 2:
        return {"next": "Researcher"}
    elif state['messages'][-1].name == "Researcher":
        return {"next": "Writer"}
    else:
        return {"next": "FINISH"}

# 4. Graph Creation
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("Supervisor", supervisor_node)
workflow.add_node("Researcher", researcher_node)
workflow.add_node("Writer", writer_node)

# Define edges (The heart of orchestration)
workflow.add_edge("Researcher", "Supervisor")
workflow.add_edge("Writer", "Supervisor")

# Conditional edges from Supervisor
workflow.add_conditional_edges(
    "Supervisor",
    lambda x: x["next"],
    {
        "Researcher": "Researcher",
        "Writer": "Writer",
        "FINISH": END
    }
)

# Set entry point
workflow.set_entry_point("Supervisor")

# Compile graph
app = workflow.compile()

# Test run
inputs = {"messages": [HumanMessage(content="Write a blog post about quantum computers.")]}
for output in app.stream(inputs):
    print(output)
6.3 Flexible Graph Topologies: Decentralized and Custom Orchestration
While the supervisor enforces a hierarchical structure, the true strength of LangGraph lies in its agnosticism toward topology. Developers are not restricted to the manager model but can design decentralized networks or specific pipelines.
How it works: In this mode, no central decision maker exists. Instead, the logic of task handoff is programmed directly into the "edges" of the graph. An agent can pass the result of its work directly to the next specialist (chaining) or decide based on a condition which path to take next.
Advantages:
- Lower Latency & Cost: Since no supervisor LLM needs to analyze at every intermediate step who comes next, tokens and time are saved.
- Explicit Workflows: For processes with a clear sequence (e.g., extraction → validation → archiving), direct chaining is more efficient and less error-prone than dynamic assignment.
- State-Driven Cycles: One can precisely define that an agent repeats its task (self-correction) until a technical condition is met, without requiring a human or AI manager to intervene.
Comparison of approaches: In LangGraph, a system can thus function like clockwork (fixed, predefined paths), like a team (supervisor), or like a free network (peer-to-peer). This makes it the most precise tool for complex enterprise applications where predictability is more important than the purely intuitive task distribution of frameworks like CrewAI.
In this diagram, you see that the data flow starts linearly, but creates a direct feedback loop between Editor and Writer when quality issues arise - without detouring through a supervisor.
In this code, we use add_edge for fixed transitions and add_conditional_edges for the editor's quality control.
from typing import TypedDict
from langgraph.graph import StateGraph, END

# 1. State Definition
class GraphState(TypedDict):
    # The document being worked on
    content: str
    # Research notes
    research_notes: str
    # Number of revisions so far (to prevent infinite loops)
    revision_count: int

# 2. Node Functions (Nodes)
def researcher(state: GraphState):
    print("--- RESEARCHER WORKING ---")
    return {
        "research_notes": "AI Trends 2026: Agentic Workflows and LangGraph Flexibility.",
        "revision_count": 0
    }

def writer(state: GraphState):
    print("--- WRITER WORKING ---")
    notes = state["research_notes"]
    count = state["revision_count"]
    return {
        "content": f"Blog post (Version {count+1}): Based on {notes}...",
        "revision_count": count + 1
    }

def editor(state: GraphState):
    print("--- EDITOR REVIEWING ---")
    # Simulated logic: Only after the 2nd revision is it 'perfect'
    if state["revision_count"] < 2:
        return "re-write"
    return "accept"

# 3. Graph Construction
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)

# Define edges
workflow.set_entry_point("researcher")

# Direct transition: Researcher ALWAYS sends data immediately to Writer
workflow.add_edge("researcher", "writer")

# Conditional transition: The editor check decides the path
workflow.add_conditional_edges(
    "writer",
    editor,  # The 'editor' function decides the next step
    {
        "re-write": "writer",  # Back to Writer (Loop)
        "accept": END          # End process
    }
)

# Compile
app = workflow.compile()

# Test run
for output in app.stream({"content": "", "research_notes": "", "revision_count": 0}):
    print(output)
6.4 Holarchies vs. Hierarchies: A Theoretical Digression
In the advanced consideration of agentic systems, a debate between hierarchical and holarchic structures is emerging. While hierarchies (like the supervisor model) offer clear chains of command, researchers argue that holarchies - in which each unit (holon) is both a whole and a part - are better suited for adaptive systems [8]. In a holarchy, each agent (holon) adapts autonomously to context changes without waiting for central commands. This enables higher flexibility in dynamic environments where the problem structure changes during runtime ("Contextual Fluidity"). This stands in contrast to rigid hierarchies, which often become bottlenecks when the central manager is overloaded or cannot fully capture the local context [9].
In this model, commands flow from top to bottom. The Central Manager is the bottleneck: All information must flow through them, and agents only act on instruction.
Here, each agent is a Holon - meaning simultaneously an independent whole and part of a larger system. There is no rigid top-down control; instead, holons communicate autonomously and adapt to context.
6.4.1 Summary of Differences
| Feature | Hierarchy | Holarchy |
|---|---|---|
| Control | Centralized (Top-Down) | Decentralized (Autonomous) |
| Flexibility | Rigid, prone to bottlenecks | High ("Contextual Fluidity") |
| Role of Unit | Subordinate Executor | Holon (Both whole and part) |
| Communication | Vertical (Command & Report) | Horizontal & Networked |
6.5 Pure Python Hand-off Architecture
For maximum performance, the team concept can also be realized without frameworks via hand-off logic. Each agent is a class or function here. The result of one function serves as input for the next. This eliminates dependencies and maximizes execution speed, but requires manually writing all orchestration logic. This is often the path for high-performance applications where every millisecond of latency counts.
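The following sketch illustrates this hand-off pattern with plain functions standing in for LLM-backed agents; the agent names and the TaskResult container are purely illustrative.
from dataclasses import dataclass

@dataclass
class TaskResult:
    content: str

def research_agent(topic: str) -> TaskResult:
    # In a real system, this function would wrap an LLM call plus tool usage
    return TaskResult(content=f"Notes on {topic}: frameworks, protocols, memory.")

def writer_agent(research: TaskResult) -> TaskResult:
    return TaskResult(content=f"Article draft based on: {research.content}")

def review_agent(draft: TaskResult) -> str:
    # Stands in for a quality-control step
    return draft.content.strip()

# The "orchestration" is nothing more than explicit function composition
if __name__ == "__main__":
    final = review_agent(writer_agent(research_agent("AI agents")))
    print(final)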
7. Event-Driven & Declarative Architectures
Parallel to the graph-based and linear approaches, paradigms are establishing themselves that focus on events and declaration to solve specific problems of scalability and optimization.
7.1 LlamaIndex Workflows: Event-Driven Architecture
LlamaIndex, originally known as a library for data ingestion and RAG, has introduced an event-driven model with "Workflows" [10]. Instead of imperatively programming "Step A then Step B", you define: "When Event X occurs, execute Step Y".
- Mechanism: Components emit Events (e.g., RetrievalEvent or ReasoningEvent), to which other components react asynchronously (Listener Pattern).
- Advantage: This maximally decouples components and enables highly flexible, asynchronous systems. It is ideal for complex RAG scenarios where, for example, multiple documents need to be processed in parallel and the next step only starts when all events have arrived. It allows natural modeling of parallelism that often seems cumbersome in rigid graphs.
import asyncio
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event

class ResearchEvent(Event):
    content: str

class MyWorkflow(Workflow):
    @step
    async def step_1(self, ev: StartEvent) -> ResearchEvent:
        # Reacts to the start and "fires" a ResearchEvent
        return ResearchEvent(content="AI data found")

    @step
    async def step_2(self, ev: ResearchEvent) -> StopEvent:
        # Waits for a ResearchEvent
        return StopEvent(result=f"Report about: {ev.content}")

# Run the event-driven workflow
if __name__ == "__main__":
    result = asyncio.run(MyWorkflow(timeout=10).run())
    print(result)
7.2 DSPy: Declarative Self-Improving Python
DSPy (developed at Stanford University) takes a radically different approach [11]: Programming instead of Prompting. The central thesis is that manual prompt optimization ("Prompt Engineering") is fragile, not scalable, and error-prone.
- Architecture: Developers define signatures (input/output types, e.g., Question → Answer) and modules. An "Optimizer" (Teleprompter) then compiles this code into optimal prompts.
- Mechanism: The optimizer runs test runs, automatically selects the best few-shot examples and adapts the instructions to maximize a defined metric (e.g., accuracy).
- Implication: The system optimizes itself. When the underlying LLM changes (e.g., switching from GPT-4 to Llama-3), the prompt doesn't need to be manually rewritten; the compiler is simply re-executed ("Re-Compile"). This promises highest precision and robustness and shifts the focus from the "art" of prompting to the "engineering craft" of system definition.
import dspy

# Configure a language model first (the exact configuration API can
# vary slightly between DSPy versions)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class AnswerQuestion(dspy.Signature):
    """Answer questions briefly and precisely."""
    question = dspy.InputField()
    answer = dspy.OutputField()

# DSPy finds the optimal prompt for this goal in the background
qa = dspy.ChainOfThought(AnswerQuestion)
print(qa(question="Why is the sky blue?").answer)
8. Long-Running AI Agentic Systems: Persistence and Memory
A fundamental difference between a chatbot and a real agentic system is the lifespan of the process. Transient agents only exist for the duration of a request. Long-lived agents ("Long-Running Systems") must work on a task for days or weeks, be able to "sleep", and know exactly where they were after a restart. They must be robust against infrastructure failures.
8.1 State-Based Persistence (LangGraph)
The architecture for longevity in LangGraph is based on Checkpoints. After each individual atomic step in the graph, the entire state is serialized and stored in a database (e.g., SQLite, Postgres).
- Resiliency: If the server crashes or is restarted, the agent can load the last valid state using the thread_id and continue as if nothing happened. No progress is lost.
- Time Travel: This architecture even allows developers to travel back in time. You can load the state at an earlier point, manually correct it (e.g., edit a hallucinated response or override an incorrect tool output) and restart the agent from there ("Replay"). This is a powerful tool for debugging and human-in-the-loop corrections.
The diagram shows how LangGraph stores the state in a Checkpoint Storage after each step. This enables fault tolerance (Resiliency) and jumping back in history (Time Travel).
In this example, we use the MemorySaver (an in-memory checkpointer). For production systems, you would simply replace it with a SqliteSaver or PostgresSaver.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

# 1. State Definition
class AgentState(TypedDict):
    input: str
    steps: list[str]

# 2. Node Definitions
def research_step(state: AgentState):
    print("--- Executing Research ---")
    return {"steps": state["steps"] + ["Research completed"]}

def writing_step(state: AgentState):
    print("--- Writing Report ---")
    return {"steps": state["steps"] + ["Report created"]}

# 3. Graph Construction
workflow = StateGraph(AgentState)
workflow.add_node("research", research_step)
workflow.add_node("writer", writing_step)

workflow.add_edge(START, "research")
workflow.add_edge("research", "writer")
workflow.add_edge("writer", END)

# 4. Persistence: Add Checkpointer
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)

# 5. Execution with a thread_id
config = {"configurable": {"thread_id": "project-123"}}

print("Starting first phase...")
initial_input = {"input": "AI Market Analysis", "steps": []}
app.invoke(initial_input, config)

# --- Simulation: System Restart or Later Continuation ---
print("\n--- Simulation: Continuation after Pause ---")
# We use the same thread_id to load the state
current_state = app.get_state(config)
print(f"Last known state: {current_state.values['steps']}")

# Time Travel / Manual Correction
# We could edit the state here before continuing
# app.update_state(config, {"steps": ["Correction: Research was incomplete"]})
8.2 Knowledge-Based Persistence (CrewAI)
CrewAI approaches the topic via a hierarchical storage system, not primarily through technical state serialization at the graph level.
- Short-Term Memory: Stores the context of the current execution (RAG) to have relevant information ready during discussion.
- Long-Term Memory: Stores insights permanently in a database, so the agent knows what it learned in earlier sessions even after a complete system restart.
- Entity Memory: Stores specific information about entities (e.g., users, projects) to maintain consistency.
- Human-in-the-Loop: Through features like human_input=True, an agent can pause and wait for feedback. However, since the process status is often held in RAM, technical robustness against crashes is often lower here than with LangGraph's database-based state machine, which persists every step.
This diagram illustrates how CrewAI uses the different memory levels to simulate persistence and retain information across multiple executions.
In CrewAI, persistence is activated by setting the flag memory=True. To make this "long-running" in a production environment, the data would normally be stored in a vector database (like ChromaDB or Pinecone), which CrewAI manages in the background.
from crewai import Agent, Task, Crew, Process

# Example: An agent that learns from past interactions
# This requires the installation of 'crewai'.

# 1. Define agents
researcher = Agent(
    role='Research Specialist',
    goal='Analyze the latest trends in {topic} and remember important facts.',
    backstory='You are an expert in market research and remember previous analyses.',
    verbose=True,
    allow_delegation=False,
    # Activates the memory system (Short-Term, Long-Term, Entity)
    memory=True
)

# 2. Define task
research_task = Task(
    description='Identify 3 core trends in the field of {topic}. Ask the user for feedback before finalizing the result.',
    expected_output='A detailed report of the top 3 trends.',
    agent=researcher,
    # Human-in-the-Loop: Pauses execution for user input
    human_input=True
)

# 3. Assemble the crew
my_crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    process=Process.sequential,
    # Global activation of memory for the entire crew
    memory=True,
    verbose=True
)

# 4. Start execution
# Upon system restart with the same database, the agent would be able to
# access the Long-Term Memory.
result = my_crew.kickoff(inputs={'topic': 'Artificial Intelligence 2026'})

print("######################")
print("FINAL RESULT:")
print(result)
8.3 The "Memory Wall" and Challenges
A central challenge for long-lived systems is the "Memory Wall" - the limitation of the context window and the cost of tokens. Since the entire history of weeks cannot fit into a prompt, architectures must use strategies like Summarization (periodic summary of old states) or Vector Embeddings to dynamically load relevant memories. This requires a sophisticated memory architecture that goes beyond pure database persistence and evaluates semantic relevance.
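The sketch below outlines one such strategy: once the history exceeds a threshold, older messages are collapsed into a single summary message while the most recent turns are kept verbatim. The summarize() helper is a placeholder for an LLM call, and the thresholds are arbitrary assumptions.
def summarize(messages: list[dict]) -> str:
    # Placeholder: in practice, an LLM call would condense the old messages
    return f"Summary of {len(messages)} earlier messages."

def compress_history(messages: list[dict], keep_last: int = 6, threshold: int = 20) -> list[dict]:
    """Collapses old messages into a single summary once the history grows too long."""
    if len(messages) <= threshold:
        return messages
    system_prompt, rest = messages[0], messages[1:]
    old, recent = rest[:-keep_last], rest[-keep_last:]
    summary_msg = {"role": "system", "content": f"Conversation so far: {summarize(old)}"}
    # Keep: original system prompt + rolling summary + the most recent turns
    return [system_prompt, summary_msg, *recent]

# Example usage with a synthetic history of 25 user turns
if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a long-running agent."}]
    history += [{"role": "user", "content": f"Message {i}"} for i in range(25)]
    print(len(compress_history(history)))  # -> 8 messages instead of 26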
8.4 Comparative Summary of Long-Running Architectures
| Feature | LangGraph (State-Based) | CrewAI (Memory-Based) |
|---|---|---|
| What is stored? | The exact "point" in the program flow. | The learned knowledge and facts. |
| Resumption | Exactly at the last checkpoint. | Restarts task but uses old knowledge. |
| Best Application | Highly complex logic graphs. | Knowledge-intensive role-plays & teams. |
This comparison makes clear that "Long-Running" in the AI world has two faces: the technical freezing of a state (LangGraph) and continuous learning combined with waiting for humans (CrewAI).
9. Standardized Protocols: The Future of Interoperability
For agents not to operate in isolated silos, standards for communication are emerging. The year 2026 is characterized by the battle for these protocols, which should form the backbone of an "Internet of Agents".
9.1 Model Context Protocol (MCP)
The MCP developed by Anthropic is an open standard that revolutionizes the connection of data sources to AI models [12]. The specification was primarily authored by David Soria Parra and Justin Spahr-Summers.
- Problem: Previously, a separate integration had to be written for each tool (Google Drive, Slack, SQL) for each agent (M x N Problem).
- Solution: MCP defines a universal interface (based on JSON-RPC). An "MCP Server" provides data or tools. Any "MCP Client" (e.g., Claude Desktop, IDEs like Cursor, or custom agents) can use these immediately.
- Analogy: It is the "USB-C for AI applications". Once standardized, the connector fits everywhere. This enables developers to write a connector once and use it across all MCP-compatible platforms.
from fastapi import FastAPI

# Note: simplified, REST-style illustration of a tool server.
# The actual MCP specification is based on JSON-RPC, not plain REST.
app = FastAPI()

# A standardized endpoint that describes tools for agents
@app.get("/tools")
async def list_tools():
    return {
        "tools": [{
            "name": "query_database",
            "description": "Searches in the customer database",
            "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}
        }]
    }

@app.post("/call")
async def call_tool(tool_name: str, arguments: dict):
    # Standardized execution
    return {"status": "success", "data": "Result from DB"}
9.2 Agent-to-Agent Protocol (A2A)
While MCP solves the vertical connection (agent to data/tools), the A2A protocol initiated by Google addresses horizontal collaboration (agent to agent). It was introduced in April 2025.
- Concept: How does a travel agent find a calendar agent without them being explicitly linked? A2A defines mechanisms for discovery. Each agent publishes an agent.json (similar to a robots.txt or business card) that describes its capabilities.
- Collaboration: Agents can commission each other, negotiate tasks, and exchange results without knowing the internal implementation of the other. This enables dynamic ecosystems where agents can form ad-hoc teams.
- Mechanisms: The protocol uses standard web technologies (HTTP, JSON) and supports both synchronous REST calls and asynchronous webhooks for long-running tasks.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# 1. Agent Card Definition
@app.get("/.well-known/agent.json")
async def get_agent_card():
    return {
        "name": "GlobalResearchBot",
        "description": "Specialist for web search and summarization",
        "endpoints": {"execute": "/api/v1/execute"},
        "capabilities": ["web_search", "summarization"]
    }

# 2. Implementation of capabilities
class TaskRequest(BaseModel):
    capability: str
    input_data: str

@app.post("/api/v1/execute")
async def execute_task(request: TaskRequest):
    if request.capability == "web_search":
        # Logic for web search
        return {"result": f"Search for {request.input_data} successful."}

    elif request.capability == "summarization":
        # Logic for summarization
        return {"result": f"Summary of {request.input_data[:20]}..."}

    return {"error": "Capability not supported"}
9.3 Comparison of MCP and A2A Protocols
| Feature | Model Context Protocol (MCP) | Agent-to-Agent (A2A) |
|---|---|---|
| Primary Focus | Vertical: Connection of AI models with data & tools | Horizontal: Connection of autonomous agents with each other |
| Originator | Anthropic (Open Source via MIT License) | Google (Open Source via Linux Foundation) |
| Core Mechanism | JSON-RPC (local or remote) | HTTP / REST / Webhooks |
| Discovery | Explicit configuration of servers | Dynamic discovery via agent.json (Agent Card) |
| Main Application | IDEs, Desktop Apps, Standard Tooling | Decentralized agent networks, B2B agent communication |
| Analogy | "USB-C Connector" | "Universal Translator / Directory" |
10. Conclusion and Strategic Recommendations
The choice of architecture is not a mere matter of taste, but a fundamental strategic decision that determines scalability, maintainability, and success of an AI project.
10.1 Comparative Decision Matrix of Architectures
| Feature | Pure Python | LangChain | LangGraph | CrewAI | LlamaIndex |
|---|---|---|---|---|---|
| Abstraction Level | Low (Code) | Medium (Pipelines) | Medium (Graphs) | High (Roles) | Medium (Events) |
| Control & Debugging | Maximum | Medium (LCEL opaque) | High (Explicit State) | Low (Black Box) | High |
| State Management | Manual | Limited | Native (Cyclic, DB) | Automatic (Memory) | Native (Async) |
| Multi-Agent | Complex | Possible | Very Good (Graph) | Excellent (Team) | Good |
| Production Readiness | High (Stable) | High (for RAG) | High (Enterprise) | Medium (Unstable) | High (Data-Heavy) |
| Primary Focus | Performance | Simple RAG | Complex Processes | Creativity/Teams | Async Workflows |
10.2 Strategic Outlook
For companies and developers, the following recommendations can be derived:
- Beginners & Simple RAG: Use LangChain. It offers the fastest path to the goal for standard tasks where document processing is in the foreground.
- Process Automation & Enterprise: Use LangGraph. Control over cycles, persistence ("Time Travel"), and error handling is indispensable for business-critical applications. It is the most robust framework for long-running processes.
- Creative Teams & Content: Use CrewAI. It excels for exploratory tasks and brainstorming where the interplay of different "personalities" offers added value and strict reproducibility is secondary.
- Data-Intensive Workflows: For asynchronous, event-driven processes, LlamaIndex is the first choice.
- Interoperability: Rely on standards. Stop building proprietary tool integrations and instead develop MCP servers. Prepare your agents for A2A to be future-proof and able to participate in agentic networks.
The trend is clearly moving away from rigid scripts toward adaptive, long-lived systems. The architecture of the future is no longer a single pipeline, but a network of specialized, autonomous nodes that communicate via standardized protocols and are capable of correcting their own errors through reflection. The "magic" of AI no longer lies only in the model, but in the structure we give it - the "Reasoning Loop", the memory, and the orchestration.
Further posts
- AI-Agents 01 - Beyond Automation: Designing Cognitive Architectures for AI-Agents
- AI-Agents 02 - The Architectural Spectrum of Agentic Systems
- AI-Agents 03 - Self-Reflection in Agentic Systems
Sources
[1]
A. Ng, "Agentic AI: The Next Step for Large Language Models," The Batch, 2024. [Online]. Available: https://www.deeplearning.ai/the-batch/
[2]
S. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR, 2023.
[3]
H. Chase, "LangChain Documentation," 2024. [Online]. Available: https://python.langchain.com/
[4]
P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS, 2020.
[5]
LangChain, "LangGraph: Building Language Agents as Graphs," LangChain Blog, 2024. [Online]. Available: https://blog.langchain.dev/langgraph/
[6]
CrewAI, "CrewAI - Orchestrating Role-Playing Agents," 2024. [Online]. Available: https://www.crewai.com/
[7]
Q. Wu et al., "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation," ArXiv preprint arXiv:2308.08155, 2023.
[8]
A. Koestler, The Ghost in the Machine. London: Hutchinson, 1967.
[9]
A. Giret and V. Botti, "Holonic Multi-agent Systems," Computer Science and Information Systems, 2004.
[10]
LlamaIndex, "LlamaIndex Workflows: Event-Driven Agentic AI," 2024. [Online]. Available: https://www.llamaindex.ai/
[11]
O. Khattab et al., "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines," ICLR, 2024.
[12]
Anthropic, "Introducing the Model Context Protocol," 2024. [Online]. Available: https://www.anthropic.com/news/model-context-protocol