1. Introduction

The number of frameworks emerging for GenAI application development is staggering and, in my opinion, is becoming excessive. Every time a new framework appears, it seems to do more or less the same things as the previous one. Some may offer better modularity or a design that is more robust against long-term obsolescence, but they all look much the same to me and, although I enjoy experimenting, studying yet another miraculous and promising GenAI framework is becoming less and less appealing.

Just to give you an idea, a few days ago, I came across AnyAgent: yet another framework that acts as a wrapper for other frameworks!

Don’t get me wrong: the work that Mozilla’s AI team is doing is commendable, and I recommend reading their blog, which publishes very interesting articles. But this kind of Kafkaesque superstructure seems to introduce more problems than it solves, because each additional layer complicates long-term software maintainability, not to mention all the issues with dependencies, compatibility, and updates.

I also find it wasteful to specialize vertically in one framework or another, both because of lock-in and because I don’t want to lose the level of abstraction I need when designing an end-to-end solution.

Another area where almost all frameworks are showing their limitations is in multi-agent support. The very first GenAI development frameworks focused primarily on LLM wrapping, tracing, and chaining, while now most attention is directed toward agentic (I hate this word) capabilities, but in the vast majority of cases, the original framework design has remained the same.

Most of these frameworks are still monolithic, making it difficult to realize a true multi-agent application without being constrained to the specific framework and underlying runtime.

This is where A2A (Agent-to-Agent) comes into play. The A2A protocol aims to enable smooth communication and collaboration between agents, regardless of the framework or runtime used, through standardized messaging formats and transport protocols. Rather than continuing to study or delve deeper into this or that framework, I decided to take the A2A protocol seriously and try to implement an end-to-end application solution compatible with this standard.

The first step was solving the discovery problem, and in my previous post I described how to build a simple Agent Registry that fits well with the A2A protocol.

Subsequently, I wanted to dive into a concrete use case, so I looked for an interesting application to tackle with my new A2A-compliant Multi-Agent architecture. I then happened to come across the AlphaAgents paper, published a few days ago on arXiv by the BlackRock team.

The paper describes a multi-agent collaboration approach to support stock selection and financial analysis starting from a set of securities. I thought it was a perfect use case and jumped right in!

Unfortunately, the BlackRock team did not release the source code, so I had to implement the entire AlphaAgents solution by myself, using the paper as a high-level specification. I still had to make some assumptions and trade-offs; for example, the information extraction and synthesis tools for the fundamental analysis agent do not actually use RAG, but rather very basic keyword or prefix lookup techniques applied to the data returned by Yahoo Finance. As a result, the implementation is not 100% compliant with what is described in the paper, but it can serve as a valid starting point.

In any case, my focus was more on the architecture than on the reliability of the individual agents’ responses. I also aimed for an agnostic approach regarding the underlying GenAI framework, where the orchestrator agent can search for and select specialized agents at runtime, thus completely ignoring the underlying details in terms of models, languages, and libraries.

In the following chapters, I’ll briefly recap what AlphaAgents is and how it works in theory, then describe the architecture of my A2A-compliant implementation and some technical details, and finally show what happens during each step of a concrete execution.

I’ll add a disclaimer that’s probably obvious but mandatory: the system output is intended purely as a demonstration of the multi-agent solution and should not be considered in any way as input for any type of financial decision. I also point out that, although I’ve performed some spot checks for consistency on the calculated KPIs and found results more or less in line with expectations, no thorough or rigorous validation of the results was performed. Therefore, I cannot rule out the possibility of systematic errors or responses influenced by LLM hallucinations.

2. Brief Introduction to AlphaAgents

The AlphaAgents paper introduces an innovative approach to the stock selection problem through a multi-agent system based on Large Language Models (LLMs).

2.1 The Problem and Solution

Traditional portfolio analysis requires considerable human effort to process enormous amounts of heterogeneous data: SEC documents (10-K, 10-Q), earnings call transcripts, target prices, financial ratios, market news, and sector research. This process also presents significant challenges in terms of scalability and objectivity. One of the risks cited multiple times in the paper is that of leaning toward sub-optimal allocations due to subjective factors such as risk aversion or excessive confidence in one’s intuitions.

AlphaAgents addresses these problems through a multi-agent framework that incorporates some adversarial reasoning techniques and divides financial analysis into distinct specialization areas, each managed by a dedicated agent.

2.2 The Three Agents

Fundamental Agent: Handles traditional fundamental analysis, examining financial statements, cash flows, and financial performance in depth through analysis of SEC documents and financial reports.

Sentiment Agent: Analyzes market sentiment through processing financial news, analyst rating changes, and corporate disclosures to assess potential impact on stock prices.

Valuation Agent: Focuses on technical and quantitative analysis, processing historical price and volume data to calculate volatility metrics, risk-adjusted returns, and valuation indicators.
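To make the Valuation Agent's metrics more concrete, here is a minimal sketch of an annualized volatility and a Sharpe-style risk-adjusted return computed from daily closing prices. The function names, the 252-trading-day convention, and the zero default risk-free rate are my assumptions for illustration; the paper does not spell out its exact formulas.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization convention


def daily_returns(closes: list[float]) -> list[float]:
    """Simple day-over-day returns from a series of closing prices."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def annualized_volatility(closes: list[float]) -> float:
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(daily_returns(closes)) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(closes: list[float], risk_free_rate: float = 0.0) -> float:
    """Annualized mean excess return divided by annualized volatility."""
    returns = daily_returns(closes)
    annual_return = statistics.mean(returns) * TRADING_DAYS
    return (annual_return - risk_free_rate) / annualized_volatility(closes)


if __name__ == "__main__":
    # Illustrative prices, not real market data
    prices = [100.0, 101.0, 99.5, 102.0, 103.5, 102.8]
    print(f"volatility: {annualized_volatility(prices):.2%}")
    print(f"sharpe:     {sharpe_ratio(prices):.2f}")
```

At least three prices are needed, since the sample standard deviation requires two or more returns.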

2.3 Collaboration and Debate Mechanism

The framework implements a centralized coordination mechanism through a Group Chat Agent that manages interaction between specialized agents. This coordinator agent:

Routes requests to the most appropriate agents based on the type of analysis required using an LLM-based approach
Manages the structured debate process implementing a round-robin approach
Guarantees fair participation by making sure each agent speaks at least twice during the debate
Facilitates convergence through a consensus building system that involves sharing intermediate results with all agents
Synthesizes multiple analyses into a coherent report highlighting the debate process
Mitigates hallucinations through cross-agent comparison and collaborative validation

In summary: agents take sequential turns until consensus is reached or the maximum number of rounds is hit, which reduces the risk that some agents dominate the discussion.

graph TD
    A[User Request] --> B[Group Chat Agent]
    B --> C{LLM-based Debate Detection}
    C -->|Investment Decision| D[Structured Debate Mode]
    C -->|Factual Query| E[Simple Coordination]
    D --> F[Round 1: Initial Positions]
    F --> G[Agent 1 Turn]
    G --> H[Agent 2 Turn]
    H --> I[Agent 3 Turn]
    I --> J[Round 2: Debate & Counter-arguments]
    J --> K[Agent 1 Response]
    K --> L[Agent 2 Response]
    L --> M[Agent 3 Response]
    M --> N{Consensus Check}
    N -->|No Consensus| O[Additional Rounds<br/>Max 5 rounds]
    O --> J
    N -->|Consensus| P[Participation Check<br/>Min 2 turns each]
    P -->|Insufficient| Q[Extra Turns]
    Q --> P
    P -->|Sufficient| R[Debate Consolidation]
    E --> S[Concurrent Agent Calls]
    S --> T[Simple Consolidation]
    R --> U[Final Report with Debate Metadata]
    T --> V[Final Report - Standard]

    style B fill:#e1f5fe
    style D fill:#fff3e0
    style N fill:#ffecb3
    style R fill:#e8f5e8
    style C fill:#f3e5f5

2.4 Intelligent Risk Tolerance Management

One innovative aspect of the framework is the incorporation of risk tolerance profiles indicated by the user. The paper doesn’t specify precisely how the BlackRock team implemented these profiles, but I hypothesize they used prompting mechanisms to automatically infer the risk level from the user query. From this analysis of the query, the system derives an explicit risk profile, which is then used in all interactions with the agents.

This simple device adds a level of personalization to the recommendations provided by the agents.

2.4.1 Risk Profile Classification

Risk-averse: Conservative investors who prioritize capital preservation, stable returns, dividend-paying stocks, low volatility, safety-first approach
Risk-neutral: Balanced investors seeking moderate growth with reasonable risk, diversified portfolios, standard market exposure
Risk-seeking: Aggressive investors pursuing high returns, accepting high volatility, growth stocks, speculative investments

2.4.2 Automatic Classification Examples

🟢 Risk-Averse (Conservative):
"I'm looking for safe and stable investments for my retirement"
"I'm looking for safe stocks with regular dividends"
"I want to preserve my capital without risks"

🟡 Risk-Neutral (Balanced):  
"Should I invest in Tesla? I want a balanced analysis"
"Should I invest in Apple stock?"
"Microsoft stock analysis for my portfolio"

🔴 Risk-Seeking (Aggressive):
"I want high-growth stocks even if volatile"
"I want maximum returns, I can handle high risk"
"I'm looking for high-potential investments"

2.4.3 Integration in the Debate Process

The inferred risk profile is automatically:

Passed to all specialized agents via metadata in the JSON-RPC message
Maintained throughout the debate for recommendation consistency
Logged for decision process transparency
Used to personalize the analyses of each specialist agent
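On the receiving side, a specialist agent has to read the profile back out of the request and fold it into its prompt. The following is only a sketch of that step: `read_risk_tolerance`, `personalize_prompt`, and the guidance strings are hypothetical names and wording, not taken from the actual agents.

```python
from typing import Any

VALID_PROFILES = {"averse", "neutral", "seeking"}

# Hypothetical guidance strings; the real agent prompts may differ.
PROFILE_GUIDANCE = {
    "averse": "Prioritize capital preservation, low volatility, and dividends.",
    "neutral": "Balance growth potential against reasonable risk.",
    "seeking": "Favor high growth potential and accept high volatility.",
}


def read_risk_tolerance(params: dict[str, Any]) -> str:
    """Return the risk profile from JSON-RPC params, defaulting to neutral."""
    metadata = params.get("metadata") or {}
    profile = str(metadata.get("risk_tolerance", "neutral")).lower()
    return profile if profile in VALID_PROFILES else "neutral"


def personalize_prompt(user_text: str, params: dict[str, Any]) -> str:
    """Prefix the user request with guidance for the active risk profile."""
    profile = read_risk_tolerance(params)
    header = f"[Investor profile: risk-{profile}] {PROFILE_GUIDANCE[profile]}"
    return f"{header}\n\n{user_text}"


if __name__ == "__main__":
    params = {"metadata": {"risk_tolerance": "averse"}}
    print(personalize_prompt("Analyze AAPL", params))
```

Falling back to "neutral" for missing or unknown values mirrors the safe default used by the orchestrator.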

2.5 Advantages of Multi-Agent Approach with Structured Debate

The system offers several benefits compared to traditional analysis:

Bias mitigation: Each agent operates independently, reducing the impact of human cognitive biases
Fair participation: The turn-taking mechanism allows all agents to contribute fairly
Collaborative validation: Individual positions are challenged and refined through direct confrontation
Guided convergence: The consensus building system leads to more robust and thoughtful decisions
Automatic personalization: Risk tolerance inferred automatically for tailored analyses
Multilingual adaptation: Native support for users of any language, with no configuration
Scalability: Ability to process much larger volumes of information
Specialization: Each agent can develop deep expertise in its domain
Flexibility: The framework can be extended with additional agents (e.g., Technical Analysis Agent, Macro Economist Agent)

3. Starting Point

To write the code, I made extensive use of Claude Code, but I still spent a significant amount of time refining the detailed technical specifications beyond what I had in mind after reading the paper (e.g., class structure, tool management, etc.), as well as testing and revising some classes and methods.

Assuming that each agent could be implemented independently, I settled, for simplicity, on the following stack:

Runtime: Python 3.12
Package manager: uv
GenAI Framework: LangChain Agents (but it could be replaced with another framework like AutoGen, CrewAI, Smolagents, etc.)
Serving Layer: FastAPI
Communication protocol: JSON-RPC
Testing: simple, re-runnable Python scripts
Architecture based on Agent Registry, with dynamic discovery of Agent Card for each agent
LLM: for simplicity I used GPT-4o, without worrying too much about model choice

In the following chapters, I illustrate the architecture and implementation in detail. One of the preparatory steps was to equip each agent with an endpoint that exposes its Agent Card at the usual “.well-known/agent-card” path and to register the agents in the Agent Registry (Figure 1).


Figure 1 - Adding the Valuation Agent

4. High-Level Design

The architecture is quite simple. It consists of:

1 Agent Registry that manages agent registration and discovery (this is the module I discussed in my previous article)
3 specialized agents, each running on a specific host and port (in my case, localhost and ports 3001-3003)
1 orchestration agent, which receives user inputs, calls other agents, and produces the final response.

4.1 Overall Logical Architecture

I launched all agents locally and configured the Agent Cards accordingly, but obviously the system is designed to also support distributed deployments.

%%{init: {"themeVariables": { "fontSize": "16px", "nodeSpacing": 50, "rankSpacing": 70 }}}%%
graph TD
    %% Client Layer
    subgraph "Client Layer"
        CLI[CLI Client]
        WEB[Web Interface]
        API[API Client]
    end

    %% A2A Protocol Layer
    subgraph "A2A Protocol Layer"
        JSONRPC[JSON-RPC Transport]
        A2AMSG[A2A Message Format]
    end

    %% Agent Registry
    subgraph "Agent Registry"
        REG[Agent Registry Service]
        REGDB[(Agent Cards DB)]
    end

    %% Orchestration Layer
    subgraph "Orchestration Layer"
        GCA[Group Chat Agent]
        ROUTER[Intelligent Routing]
        CONSOLIDATOR[Response Consolidator]
    end

    %% Specialized Agents
    subgraph "Specialized Agents"
        FA[Fundamental Agent<br/>:3002]
        SA[Sentiment Agent<br/>:3003]
        VA[Valuation Agent<br/>:3001]
    end

    %% External Data Sources
    subgraph "External Data Sources"
        YF[Yahoo Finance API]
        SEC[SEC EDGAR API]
        NEWS["Financial News APIs"]
    end

    %% AI/LLM Layer
    subgraph "AI/LLM Layer"
        GPT[OpenAI GPT-4o]
        VADER[VADER Sentiment]
        TEXTBLOB[TextBlob Analysis]
    end

    %% Main connections
    CLI --> JSONRPC
    WEB --> JSONRPC
    API --> JSONRPC
    JSONRPC --> A2AMSG
    A2AMSG --> GCA
    GCA --> REG
    REG --> REGDB
    GCA --> ROUTER
    ROUTER --> FA
    ROUTER --> SA
    ROUTER --> VA
    FA --> CONSOLIDATOR
    SA --> CONSOLIDATOR
    VA --> CONSOLIDATOR
    CONSOLIDATOR

4.2 Registration Process

In my case, I registered the agents manually on the Agent Registry (Figure 1); however, the process could easily be automated. Essentially, each agent that wants to be published within the organization could register itself with the Agent Registry.

Whether registration is manual or automatic, the Group Chat Agent can then query the Agent Registry, discover the specialized agents needed to respond to a user request, and decide how to orchestrate communication between them.

sequenceDiagram
    participant GCA as Group Chat Agent
    participant REG as Agent Registry
    participant FA as Fundamental Agent
    participant SA as Sentiment Agent
    participant VA as Valuation Agent

    Note over FA,VA: Agent Startup & Registration
    FA->>REG: POST /agents/register<br/>{agent_card, endpoints}
    REG-->>FA: 200 OK {agent_id}
    SA->>REG: POST /agents/register<br/>{agent_card, endpoints}
    REG-->>SA: 200 OK {agent_id}
    VA->>REG: POST /agents/register<br/>{agent_card, endpoints}
    REG-->>VA: 200 OK {agent_id}

    Note over GCA,REG: Agent Discovery
    GCA->>REG: GET /agents/search?skills=fundamental,sentiment,valuation
    REG-->>GCA: 200 OK {agents[]}

    Note over GCA,VA: Health Check & Validation
    GCA->>FA: GET /health
    FA-->>GCA: 200 OK {status: healthy}
    GCA->>SA: GET /health
    SA-->>GCA: 200 OK {status: healthy}
    GCA->>VA: GET /health
    VA-->>GCA: 200 OK {status: healthy}
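The discovery and health-check steps in the diagram can be sketched in a few lines. This is an illustration under assumptions: the response shapes (`{"agents": [...]}` with each entry carrying `name` and `url`, and `{"status": "healthy"}`) are inferred from the diagram, and `http_get` is a hypothetical injected function that performs a GET and returns the decoded JSON, so the logic can be exercised without a running registry.

```python
from typing import Any, Callable
from urllib.parse import urlencode


def build_search_url(registry_url: str, skills: list[str]) -> str:
    """Compose the discovery URL used in the diagram above."""
    query = urlencode({"skills": ",".join(skills)}, safe=",")
    return f"{registry_url.rstrip('/')}/agents/search?{query}"


def discover_healthy_agents(
    registry_url: str,
    skills: list[str],
    http_get: Callable[[str], dict[str, Any]],
) -> dict[str, str]:
    """Return {agent name: base URL} for agents that pass the health check."""
    found = http_get(build_search_url(registry_url, skills)).get("agents", [])
    healthy: dict[str, str] = {}
    for card in found:
        base_url = card["url"].rstrip("/")
        # Skip agents that are registered but not currently reachable/healthy
        if http_get(f"{base_url}/health").get("status") == "healthy":
            healthy[card["name"]] = base_url
    return healthy
```

In a real deployment `http_get` would wrap an HTTP client call; keeping it injectable makes the selection logic easy to test offline.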

4.3 Agent Card

Each agent exposes its capabilities through an Agent Card compliant with the A2A schema. For example, here is the Agent Card for the Fundamental Agent:

{
  "name": "Fundamental Analysis Agent",
  "description": "Specialized agent for fundamental equity analysis. Analyzes 10-K and 10-Q financial reports, financial statements, and company fundamentals to assess stock value and projected trajectory. Provides qualitative and quantitative analysis of a company's financial performance, cash flow, operations, gross margins, and progress towards stated objectives.",
  "version": "1.0.0",
  "protocolVersion": "0.3.0",
  "url": "http://0.0.0.0:3003/",
  "preferredTransport": "JSONRPC",
  "provider": {
    "organization": "AlphaAgents Financial",
    "url": "https://AlphaAgents-dummy-corp.ai"
  },
  "capabilities": {
    "streaming": true,
    "pushNotifications": true,
    "stateTransitionHistory": true
  },
  "defaultInputModes": [
    "text/plain",
    "application/json",
    "application/pdf"
  ],
  "defaultOutputModes": [
    "text/plain",
    "application/json"
  ],
  "skills": [
    {
      "id": "fundamental_analysis",
      "name": "Fundamental Financial Analysis",
      "description": "Performs comprehensive fundamental analysis of stocks using 10-K and 10-Q reports. Analyzes financial statements, cash flow, income statements, balance sheets, operations, gross margins, and company progress towards objectives. Provides buy/sell/hold recommendations based on fundamental metrics and risk tolerance.",
      "tags": [
        "financial-analysis",
        "10k-reports",
        "10q-reports",
        "financial-statements",
        "cash-flow-analysis",
        "fundamental-research",
        "equity-valuation"
      ],
      "examples": [
        "Analyze Apple's latest 10-K report and provide a fundamental analysis with recommendation",
        "Evaluate Microsoft's cash flow trends and operational efficiency from recent financial filings",
        "Compare Tesla's fundamental metrics against industry benchmarks and provide investment recommendation"
      ]
    },
    {
      "id": "financial_report_extraction",
      "name": "Financial Report Data Extraction",
      "description": "Extracts and processes key financial data from 10-K and 10-Q SEC filings. Uses RAG capabilities to retrieve relevant sections and perform targeted analysis on specific financial metrics, ratios, and disclosures.",
      "tags": [
        "data-extraction",
        "sec-filings",
        "financial-metrics",
        "rag-analysis",
        "document-processing"
      ],
      "examples": [
        "Extract revenue growth metrics from the latest quarterly report",
        "Pull debt-to-equity ratios and leverage analysis from annual filing",
        "Identify key risk factors mentioned in the 10-K management discussion"
      ]
    },
    {
      "id": "risk_assessment",
      "name": "Fundamental Risk Assessment",
      "description": "Assesses fundamental risks based on financial health indicators, debt levels, cash flow stability, competitive positioning, and management guidance. Tailors analysis to different risk tolerance profiles (risk-averse, risk-neutral).",
      "tags": [
        "risk-assessment",
        "financial-health",
        "debt-analysis",
        "competitive-analysis",
        "risk-tolerance"
      ],
      "examples": [
        "Assess the fundamental risk profile of Netflix given current debt levels and competitive pressures",
        "Evaluate Amazon's financial stability risks for risk-averse investors",
        "Analyze fundamental downside risks for Google's cloud business segment"
      ]
    }
  ],
  "additionalInterfaces": [
    {
      "url": "http://0.0.0.0:3003/",
      "transport": "JSONRPC"
    }
  ]
}

4.4 End-to-End Analysis Flow

The analysis process relies on the orchestrator (the Group Chat Agent) running an LLM-based routing system that:

Automatically detects whether the query requires structured debate (investment decisions) or simple coordination (informational queries)
Determines which agents to involve to perform detailed analysis using multilingual semantic analysis
Manages the Round Robin debate process when necessary, ensuring at least 2 turns per agent

For questions regarding investment decisions, the system activates the structured debate mechanism:

Initial round: Each agent presents its initial position sequentially
Debate rounds: Agents confront each other in turns, presenting counter-arguments
Consensus check: Verification of 75% agreement after each round
Participation enforcement: Ensures minimum 2 turns per agent
Final synthesis: Consolidates final positions highlighting the debate process

For informational queries, the system uses traditional parallel coordination to maximize speed and efficiency.
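The parallel path boils down to fanning the same query out to the selected agents and gathering the replies. Here is a minimal sketch with `asyncio.gather`; the `AgentCall` type, `coordinate_in_parallel`, and the stand-in agents are assumptions for illustration, where each callable would wrap a real A2A request.

```python
import asyncio
from typing import Awaitable, Callable

AgentCall = Callable[[str], Awaitable[str]]


async def coordinate_in_parallel(
    query: str, agents: dict[str, AgentCall]
) -> dict[str, str]:
    """Call every selected agent concurrently and collect replies by name.

    A failing agent yields an error string instead of aborting the batch.
    """
    names = list(agents)
    results = await asyncio.gather(
        *(agents[name](query) for name in names), return_exceptions=True
    )
    return {
        name: f"ERROR: {res}" if isinstance(res, Exception) else res
        for name, res in zip(names, results)
    }


async def _demo() -> None:
    async def fundamental(q: str) -> str:  # stand-in for a real A2A call
        await asyncio.sleep(0.01)
        return f"fundamental view on: {q}"

    async def sentiment(q: str) -> str:  # simulates an unavailable agent
        raise RuntimeError("news feed unavailable")

    agents = {"fundamental": fundamental, "sentiment": sentiment}
    print(await coordinate_in_parallel("AAPL revenue?", agents))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Using `return_exceptions=True` is a design choice of this sketch: one slow or broken specialist should not prevent the orchestrator from consolidating the other answers.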

graph TD
    A[User Request] --> B[Group Chat Agent]
    B --> C{LLM-based Debate Detection<br/>Multi-language Support}
    C -->|Investment Decision| D[Structured Debate Path]
    C -->|Information Query| E[Simple Coordination Path]

    subgraph "Structured Debate (Sequential)"
        D --> F[Initial Round - Sequential]
        F --> G[Agent 1: Initial Analysis]
        G --> H[Agent 2: Initial Analysis]
        H --> I[Agent 3: Initial Analysis]
        I --> J[Debate Round 2]
        J --> K[Agent 1: Counter-arguments]
        K --> L[Agent 2: Counter-arguments]
        L --> M[Agent 3: Counter-arguments]
        M --> N{Consensus Check<br/>75% Threshold}
        N -->|No| O[Continue Debate<br/>Max 5 Rounds]
        O --> J
        N -->|Yes| P{Participation Check<br/>Min 2 turns each}
        P -->|No| Q[Extra Turns for Equity]
        Q --> P
        P -->|Yes| R[Debate Consolidation<br/>with Metadata]
    end

    subgraph "Simple Coordination (Parallel)"
        E --> S[A2A Message/Send<br/>to Required Agents]
        S --> T[Concurrent Processing]
        T --> U[Standard Consolidation]
    end

    R --> V[Final Report + Debate History]
    U --> W[Final Report - Standard]

    style B fill:#e1f5fe
    style C fill:#f3e5f5
    style D fill:#fff3e0
    style N fill:#ffecb3
    style R fill:#e8f5e8
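The structured debate path can be condensed into a small loop. The sketch below is a simplification under assumptions, not the actual orchestrator: each agent is a plain callable returning a `(recommendation, argument)` pair, agreement is measured as the share of agents backing the most common recommendation, and all names are hypothetical. The constants come from the description above (75% threshold, at most 5 rounds, at least 2 turns each).

```python
from collections import Counter
from typing import Any, Callable

# An agent turn receives the transcript so far and returns
# (recommendation, argument). Hypothetical interface for this sketch.
AgentTurn = Callable[[list[str]], tuple[str, str]]

CONSENSUS_THRESHOLD = 0.75
MAX_ROUNDS = 5
MIN_TURNS = 2


def has_consensus(positions: dict[str, str], threshold: float = CONSENSUS_THRESHOLD) -> bool:
    """True when the most common recommendation reaches the threshold share."""
    if not positions:
        return False
    top_count = Counter(positions.values()).most_common(1)[0][1]
    return top_count / len(positions) >= threshold


def run_debate(agents: dict[str, AgentTurn]) -> dict[str, Any]:
    """Round-robin debate: stop on consensus or after MAX_ROUNDS."""
    transcript: list[str] = []
    positions: dict[str, str] = {}
    turns: Counter[str] = Counter()
    rounds = 0
    while rounds < MAX_ROUNDS:
        rounds += 1
        for name, take_turn in agents.items():  # fixed speaking order
            positions[name], argument = take_turn(transcript)
            transcript.append(f"[round {rounds}] {name}: {argument}")
            turns[name] += 1
        # Round 1 only collects initial positions. Checking from round 2 on
        # also guarantees the minimum of two turns per agent.
        if rounds >= MIN_TURNS and has_consensus(positions):
            break
    return {
        "rounds": rounds,
        "consensus": has_consensus(positions),
        "positions": positions,
        "turns": dict(turns),
        "transcript": transcript,
    }
```

Because every agent speaks once per round in this strict round-robin, the separate participation check is satisfied automatically after round 2. Note also that with exactly three agents a 75% threshold amounts to unanimity, since two out of three is only about 67%.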

4.5 Transport Layer and Messaging Protocol

All inter-agent traffic uses the A2A-compliant JSON-RPC protocol:

Request Format:

{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": {
      "kind": "message",
      "messageId": "uuid",
      "role": "user",
      "parts": [{"kind": "text", "text": "Analyze AAPL"}],
      "contextId": "uuid",
      "taskId": "uuid"
    },
    "metadata": {}
  },
  "id": "uuid"
}

Response Format:

{
  "jsonrpc": "2.0",
  "result": {
    "kind": "message", 
    "messageId": "uuid",
    "role": "agent",
    "parts": [{"kind": "text", "text": "Analysis results..."}],
    "contextId": "uuid",
    "metadata": {
      "analysis_type": "fundamental",
      "confidence_score": 0.85
    }
  },
  "id": "uuid"
}

5. Implementation

The AlphaAgents implementation was created with a focus on modularity, testability, and A2A protocol compliance. Each agent is an independent application with its own dependencies and can be deployed separately.

Implementation Note: As mentioned in the introduction, the implementation differs from the original paper in some technical aspects. In particular:

The Sentiment Agent uses the VADER and TextBlob libraries, which should be a simple but effective solution, since the news is presumably all in English and does not require deep contextual understanding. In the future, more sophisticated LLM-based sentiment analysis or domain-specific models such as FinBERT could be integrated.
The Fundamental Agent employs keyword matching on Yahoo Finance data instead of true RAG techniques with vector databases
In general, I focused on architecture rather than specific performance of individual agents
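To give an idea of what keyword or prefix lookup means in place of RAG, here is a deliberately simple sketch. It is not the actual tool code: `lookup_fields` is a hypothetical helper, and the sample dictionary only stands in for the kind of flat field/value data a Yahoo Finance client might return, with made-up values.

```python
from typing import Any


def lookup_fields(
    data: dict[str, Any], keywords: list[str], prefixes: tuple[str, ...] = ()
) -> dict[str, Any]:
    """Select entries whose key contains a keyword or starts with a prefix.

    Matching is case-insensitive on the key names only.
    """
    lowered_keywords = [k.lower() for k in keywords]
    lowered_prefixes = tuple(p.lower() for p in prefixes)
    selected: dict[str, Any] = {}
    for key, value in data.items():
        name = key.lower()
        if name.startswith(lowered_prefixes) or any(k in name for k in lowered_keywords):
            selected[key] = value
    return selected


if __name__ == "__main__":
    # Illustrative values, not real company data
    sample = {
        "totalRevenue": 1_000,
        "revenueGrowth": 0.05,
        "debtToEquity": 1.2,
        "sector": "Technology",
    }
    print(lookup_fields(sample, ["revenue"]))
    print(lookup_fields(sample, [], prefixes=("debt",)))
```

The trade-off is clear: this is cheap and predictable, but it only finds what the field names happen to spell out, whereas a retrieval step over the filings could surface relevant passages by meaning.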

All code is available on GitHub.

5.1 Technology Stack and Dependencies

The project uses Python 3.12 as the runtime and uv as the package manager for fast dependency management. Below is an excerpt from the configuration file to give an idea of the dependencies:

# pyproject.toml - Global Dependencies
[project]
name = "alpha_agents"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "langchain>=0.3.27",
    "langchain-openai>=0.3.32",
    "langchain-community>=0.3.28",
    "openai>=1.102.0",
    "yfinance>=0.2.65",
    "pandas>=2.3.2",
    "numpy>=2.3.2",
    "fastapi>=0.116.1",
    "uvicorn>=0.35.0",
    "pydantic>=2.11.7",
    "python-dotenv>=1.1.1",
    "httpx>=0.28.1",
    "aiohttp>=3.12.15",
    "requests>=2.32.5",
    "beautifulsoup4>=4.13.5",
    "feedparser>=6.0.11",
    "newspaper3k>=0.2.8",
    "textblob>=0.18.0",
    "vaderSentiment>=3.3.2",
    "python-dateutil>=2.9.0"
]

5.2 Project Structure

Below is an overview of the project structure:

alpha-agents-implementation/
├── agents/
│   ├── groupchat-agent/          # Main orchestrator
│   │   ├── src/groupchat_agent/
│   │   │   ├── a2a_agent.py      # Coordination logic
│   │   │   ├── registry_service.py # Agent Registry client
│   │   │   └── server.py         # FastAPI server
│   │   └── pyproject.toml
│   ├── fundamental-agent/        # Fundamental analysis
│   │   ├── src/fundamental_agent/
│   │   │   ├── agent.py          # Langchain agent
│   │   │   ├── tools.py          # SEC/Finance tools
│   │   │   └── server.py         # A2A endpoint
│   │   └── pyproject.toml
│   ├── sentiment-agent/          # Sentiment analysis
│   │   ├── src/sentiment_agent/
│   │   │   ├── agent.py          # Langchain agent
│   │   │   ├── tools.py          # News collection / Sentiment Analysis
│   │   │   └── server.py         # A2A endpoint
│   │   └── pyproject.toml
│   └── valuation-agent/          # Quantitative analysis
│       ├── src/valuation_agent/
│       │   ├── agent.py          # Langchain agent
│       │   ├── tools.py          # Stock data collection / KPI calculation
│       │   └── server.py         # A2A endpoint
│       └── pyproject.toml

5.3 Group Chat Agent Implementation

The Group Chat Agent is the heart and entry point of the system, implemented in the A2AGroupChatAgent class. Its main responsibilities include:

Intelligent debate detection: Uses LLM to determine if the query requires structured debate
Automatic risk tolerance inference: Analyzes user message to identify risk profile
Multilingual routing: Identifies necessary agents regardless of query language
Debate orchestration: Manages fair participation through a round-robin turn system
Consensus monitoring: Verifies agreement and terminates debate when appropriate
Metadata enhancement: Automatically enriches metadata with risk_tolerance for all agents

5.3.1 Risk Tolerance Inference

The system uses an LLM-based classifier that analyzes the user’s message to infer the risk profile. The classification guidelines include:

Explicit risk preferences: Terms like “conservative,” “aggressive,” “high growth,” “safe,” “stable”
Investment timeframe: Mentions of “retirement,” “long term,” “quick gains”
Vocabulary analysis: Linguistic choices and tone (cautious vs confident vs speculative)
Asset types mentioned: Bonds/dividends = averse, growth stock = seeking
Multilingual support: Works in any language by analyzing intent and risk signals

As I mentioned earlier, I have no idea how the BlackRock researchers identified the risk profile. However, it’s easy to imagine they did something similar.

5.3.2 Code Excerpt

Below is an excerpt from the A2AGroupChatAgent class illustrating some of the main methods:

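import logging
from typing import Any, Dict, Optional

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

logger = logging.getLogger(__name__)

# AgentRegistryService is the project's own registry client (registry_service.py)


class A2AGroupChatAgent: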
    def __init__(self, openai_api_key: str, registry_url: str, model_name: str = "gpt-4o"):
        self.llm = ChatOpenAI(
            model=model_name,
            openai_api_key=openai_api_key,
            temperature=0.1  # Low temperature for financial analysis
        )
        self.registry_service = AgentRegistryService(registry_url)
        self._agent_urls = {}
    
    def _requires_debate_analysis(self, user_message: str) -> bool:
        """Uses LLM to intelligently determine if structured debate is needed."""
        system_prompt = """You are a debate necessity classifier for a multi-agent financial analysis system.
        
        STRUCTURED DEBATE IS NEEDED when:
        1. The query asks for investment decisions or recommendations (buy/sell/hold advice)
        2. The query requires weighing conflicting factors or multiple perspectives
        3. The query asks for comprehensive analysis that would benefit from specialist disagreement/consensus
        4. The query involves risk assessment or strategic financial decisions
        5. The query asks for opinions, advice, or evaluations that could have multiple valid perspectives
        6. The query involves portfolio management decisions
        
        The query can be in ANY language. Focus on the intent and meaning, not specific keywords.
        Respond with ONLY "YES" if structured debate is needed, or "NO" if it's not needed."""
        
        response = self.llm.invoke([
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"User query: {user_message}")
        ])
        
        # Tolerate quotes or a trailing period around the YES/NO answer
        return response.content.strip().strip("\"'. ").upper() == "YES"
    
    def _enhance_metadata_with_risk_tolerance(
        self, user_message: str, metadata: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """Extract or infer risk tolerance and add it to metadata."""
        
        enhanced_metadata = metadata.copy() if metadata else {}
        
        # If risk_tolerance already provided in metadata, use it
        if "risk_tolerance" in enhanced_metadata:
            logger.info(f"Using provided risk_tolerance: {enhanced_metadata['risk_tolerance']}")
            return enhanced_metadata
        
        # Otherwise, infer using LLM
        risk_tolerance = self._infer_risk_tolerance_from_message(user_message)
        enhanced_metadata["risk_tolerance"] = risk_tolerance
        
        logger.info(f"Inferred risk_tolerance: {risk_tolerance} from user message")
        return enhanced_metadata

    def _infer_risk_tolerance_from_message(self, user_message: str) -> str:
        """Use LLM to infer risk tolerance from user message content."""
        
        system_prompt = """You are a financial risk tolerance classifier. 
        Analyze the user's message to determine their risk tolerance profile.

        RISK TOLERANCE LEVELS:
        - "averse": Conservative investors (capital preservation, stable returns, dividends)
        - "neutral": Balanced investors (moderate growth with reasonable risk)  
        - "seeking": Aggressive investors (high returns, accept high volatility)

        CLASSIFICATION GUIDELINES:
        1. Look for explicit risk preferences ("conservative", "aggressive", "safe")
        2. Consider investment timeframe ("retirement", "long-term", "quick gains")
        3. Analyze vocabulary and tone (cautious vs confident vs speculative)
        4. Consider asset mentions (bonds/dividends = averse, growth stocks = seeking)
        
        The message can be in any language. Focus on intent and risk signals.
        Respond with ONLY one word: "averse", "neutral", or "seeking"."""

        try:
            response = self.llm.invoke([
                SystemMessage(content=system_prompt),
                HumanMessage(content=f"User message: {user_message}")
            ])
            
            # Tolerate quotes or a trailing period around the one-word answer
            risk_level = response.content.strip().strip("\"'. ").lower()
            return risk_level if risk_level in ["averse", "neutral", "seeking"] else "neutral"
            
        except Exception as e:
            logger.error(f"Error inferring risk tolerance: {e}")
            return "neutral"  # Safe default

5.4 A2A Communication Patterns

Every communication between agents follows the A2A protocol using JSON-RPC:

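import uuid
from typing import Any, Dict, Optional

import aiohttp


async def _send_message_to_agent(
    self, agent_name: str, agent_url: str, 
    message: Dict[str, Any], metadata: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Send A2A-compliant message to specialized agents."""

    # Assumption (not shown in the original excerpt): the text and the A2A
    # identifiers used below are taken from the incoming `message` dict.
    user_message = " ".join(p.get("text", "") for p in message.get("parts", [])
                            if p.get("kind") == "text")
    context_id = message.get("contextId")
    task_id = message.get("taskId")
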
    request_payload = {
        "jsonrpc": "2.0",
        "method": "message/send",
        "params": {
            "message": {
                "kind": "message",
                "messageId": str(uuid.uuid4()),
                "role": "user", 
                "parts": [{"kind": "text", "text": user_message}],
                "contextId": context_id,
                "taskId": task_id
            },
            "metadata": metadata or {}
        },
        "id": str(uuid.uuid4())
    }
    
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=300.0)) as session:
        async with session.post(agent_url, json=request_payload, 
                              headers={"Content-Type": "application/json"}) as response:
            result = await response.json()
            
            # Extract content from A2A response
            a2a_result = result.get("result", {})
            if a2a_result.get("kind") == "message":
                parts = a2a_result.get("parts", [])
                analysis = " ".join([p.get("text", "") for p in parts 
                                   if p.get("kind") == "text"])
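A JSON-RPC 2.0 response carries either a `result` or an `error` member, so the extraction step benefits from handling both. The following is a minimal sketch under that assumption; `extract_text_from_a2a_response` is a hypothetical helper name and raising `RuntimeError` is just one possible policy.

```python
from typing import Any, Dict

def extract_text_from_a2a_response(result: Dict[str, Any]) -> str:
    """Return the concatenated text parts of a JSON-RPC response (sketch)."""
    # Surface protocol-level failures instead of returning an empty analysis
    if "error" in result:
        err = result["error"]
        raise RuntimeError(f"Agent returned error {err.get('code')}: {err.get('message')}")
    a2a_result = result.get("result", {})
    # Only plain Message results are handled here; other kinds yield an empty string
    if a2a_result.get("kind") != "message":
        return ""
    parts = a2a_result.get("parts", [])
    return " ".join(p.get("text", "") for p in parts if p.get("kind") == "text")
```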

5.5 Specialized Agent Implementation

5.5.1 Fundamental Agent

The Fundamental Agent uses LangChain's tool-calling pattern to access financial data:

class FundamentalAgent:
    def __init__(self, openai_api_key: str, model_name: str = "gpt-4o"):
        self.llm = ChatOpenAI(
            api_key=openai_api_key,
            model=model_name,
            temperature=0.1,
            max_tokens=4000
        )
        
        # Specialized tools for fundamental analysis
        self.tools = get_fundamental_tools()  # Yahoo Finance API, keyword-based analysis
        
        # Specialized prompt for fundamental analysis
        self.system_prompt = """You are a specialized Fundamental Analysis Agent.
        Your core expertise includes:
        
        **Financial Statement Analysis:**
        - Deep analysis of income statements, balance sheets, and cash flow statements
        - Assessment of financial health, profitability, and operational efficiency
        
        **SEC Filing Analysis:**  
        - Comprehensive review of 10-K and 10-Q reports
        - Management Discussion & Analysis (MD&A) insights
        
        **Investment Decision Framework:**
        Your analysis should result in clear BUY/SELL/HOLD recommendations with:
        - Target price estimates with supporting rationale
        - Risk assessment (financial, operational, market risks)"""
        
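        # Prompt wiring the system prompt to the tool-calling agent
        # Requires: from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
        prompt = ChatPromptTemplate.from_messages([
            ("system", self.system_prompt),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ])
        
        # LangChain Agent Executor
        self.agent_executor = AgentExecutor(
            agent=create_openai_tools_agent(self.llm, self.tools, prompt),
            tools=self.tools,
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=10
        )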

5.5.2 Sentiment Agent - VADER Implementation

The Sentiment Agent uses a combination of VADER (Valence Aware Dictionary and sEntiment Reasoner) and TextBlob to analyze financial news sentiment:

class NewsSummarizationTool(BaseTool):
    name: str = "analyze_news_sentiment"
    description: str = "Analyzes financial news sentiment using VADER and TextBlob"
    
    def _analyze_vader_sentiment(self, text: str) -> Dict[str, Any]:
        """Analyze sentiment using VADER - optimized for social media and news."""
        vader_analyzer = SentimentIntensityAnalyzer()
        scores = vader_analyzer.polarity_scores(text)
        
        # Compound score from -1 to +1
        compound = scores['compound']
        if compound >= 0.05:
            label = "positive"
        elif compound <= -0.05:
            label = "negative"
        else:
            label = "neutral"
        
        return {
            "compound": float(compound),
            "pos": float(scores['pos']),
            "neg": float(scores['neg']),
            "neu": float(scores['neu']),
            "label": label
        }
    
    def _analyze_textblob_sentiment(self, text: str) -> Dict[str, Any]:
        """Analyze sentiment using TextBlob."""
        blob = TextBlob(text)
        polarity = blob.sentiment.polarity  # -1 to 1
        subjectivity = blob.sentiment.subjectivity  # 0 to 1
        
        if polarity > 0.1:
            label = "positive"
        elif polarity < -0.1:
            label = "negative"
        else:
            label = "neutral"
        
        return {"polarity": float(polarity), "subjectivity": float(subjectivity), "label": label}
    
    def _combine_sentiments(self, textblob: Dict, vader: Dict) -> Dict[str, Any]:
        """Combine results with higher weight for VADER (better for news)."""
        combined_score = (textblob["polarity"] * 0.4) + (vader["compound"] * 0.6)
        
        if combined_score > 0.1:
            label = "positive"
        elif combined_score < -0.1:
            label = "negative"
        else:
            label = "neutral"
        
        return {"score": float(combined_score), "label": label, "confidence": abs(combined_score)}
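To make the weighting concrete, here is the same 40/60 blend restated as a standalone function and applied to the average scores reported for the test run in section 6.5.2. It is a simplified restatement of `_combine_sentiments` for illustration, not additional project code.

```python
from typing import Tuple

def combine(textblob_polarity: float, vader_compound: float) -> Tuple[float, str]:
    """Blend the two scores (40% TextBlob, 60% VADER) and label the result."""
    score = 0.4 * textblob_polarity + 0.6 * vader_compound
    if score > 0.1:
        label = "positive"
    elif score < -0.1:
        label = "negative"
    else:
        label = "neutral"
    return score, label

# Average scores taken from the Tesla and Palantir results in section 6.5.2
tsla_score, tsla_label = combine(0.0024, -0.0651)
pltr_score, pltr_label = combine(0.0235, 0.4288)
```

Since the blend is linear, combining the averages gives the same number as averaging the per-article combined scores.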

5.5.3 Fundamental Agent - Keyword-Based Implementation

The fundamental analysis does not use RAG techniques. It relies on simple keyword matching over the data returned by Yahoo Finance:

class RAGAnalysisTool(BaseTool):  # Name kept for compatibility and future evolution, but NOT true RAG
    name: str = "rag_fundamental_analysis"
    description: str = "Performs fundamental analysis using keyword-based data extraction"
    
    def _find_cash_flow_key(self, data: Dict[str, Any], possible_keys: List[str]) -> Optional[str]:
        """Find correct key for cash flow data using pattern matching."""
        for item_name, _ in data.items():
            for key in possible_keys:
                if key.lower() in item_name.lower():
                    return item_name
        return None
    
    def _analyze_cash_flow(self, financial_context: Dict[str, Any], guidance: Dict[str, Any]):
        """Analyze cash flow with keyword search, not RAG."""
        quarterly_data = financial_context.get("cash_flow", {}).get("quarterly", {}).get("data", {})
        
        if quarterly_data:
            # Search for "Operating Cash Flow" using keyword matching
            operating_cf_key = self._find_cash_flow_key(
                quarterly_data, 
                ["Operating Cash Flow", "Total Cash From Operating Activities"]
            )
            
            if operating_cf_key:
                # Extract values using found pattern
                cf_values = []
                for period, values in quarterly_data.items():
                    if operating_cf_key in values and values[operating_cf_key] is not None:
                        cf_values.append(values[operating_cf_key])

5.5.4 Valuation Agent Tools

The Valuation Agent implements specialized financial calculations using pandas and numpy:

class VolatilityCalculationTool(BaseTool):
    name: str = "calculate_volatility_metrics"
    description: str = "Calculates comprehensive volatility and risk metrics"
    
    def _run(self, symbol: str, risk_free_rate: float = 0.05) -> Dict[str, Any]:
        """Calculate volatility and risk metrics synchronously."""
        try:
            # First fetch the data
            stock_tool = StockDataTool()
            stock_data = stock_tool._run(symbol)
            
            if not stock_data["success"]:
                return stock_data
            
            # Convert to pandas DataFrame for calculations
            price_data = stock_data["price_data"]
            df = pd.DataFrame({
                'Date': pd.to_datetime(price_data["dates"]),
                'Close': price_data["close"],
                'Volume': price_data["volume"]
            })
            df.set_index('Date', inplace=True)
            
            # Calculate daily returns
            df['Daily_Return'] = df['Close'].pct_change()
            df = df.dropna()
            
            if len(df) < 2:
                return {
                    "success": False,
                    "error": "Insufficient data for calculations",
                    "symbol": symbol
                }
            
            # Basic metrics
            daily_returns = df['Daily_Return']
            
            # Daily metrics
            mean_daily_return = daily_returns.mean()
            daily_volatility = daily_returns.std()
            
            # Calculate cumulative return for proper annualized return
            start_price = df['Close'].iloc[0]
            end_price = df['Close'].iloc[-1]
            cumulative_return = (end_price / start_price) - 1
            trading_days = len(df)
            
            # Annualized metrics
            annualized_return = ((1 + cumulative_return) ** (TRADING_DAYS_PER_YEAR / trading_days)) - 1
            annualized_volatility = daily_volatility * np.sqrt(TRADING_DAYS_PER_YEAR)
            
            # Sharpe ratio
            sharpe_ratio = (annualized_return - risk_free_rate) / annualized_volatility if annualized_volatility > 0 else 0
            
            # Maximum drawdown calculation
            cumulative_returns = (1 + daily_returns).cumprod()
            rolling_max = cumulative_returns.expanding().max()
            drawdowns = (cumulative_returns - rolling_max) / rolling_max
            max_drawdown = drawdowns.min()
            
            # Value at Risk (VaR) - 5% and 1%
            var_5 = np.percentile(daily_returns, 5)
            var_1 = np.percentile(daily_returns, 1)
            
            # Additional statistics
            skewness = daily_returns.skew()
            kurtosis = daily_returns.kurtosis()
            
            # Price performance metrics
            total_return = (df['Close'].iloc[-1] - df['Close'].iloc[0]) / df['Close'].iloc[0]
            
            result = {
                "success": True,
                "symbol": symbol.upper(),
                "analysis_period": {
                    "start_date": df.index[0].strftime("%Y-%m-%d"),
                    "end_date": df.index[-1].strftime("%Y-%m-%d"),
                    "trading_days": len(df)
                },
                "price_metrics": {
                    "start_price": float(df['Close'].iloc[0]),
                    "end_price": float(df['Close'].iloc[-1]),
                    "total_return": float(total_return),
                    "annualized_return": float(annualized_return)
                },
                "volatility_metrics": {
                    "daily_volatility": float(daily_volatility),
                    "annualized_volatility": float(annualized_volatility),
                    "volatility_percentage": float(annualized_volatility * 100)
                },
                "risk_metrics": {
                    "sharpe_ratio": float(sharpe_ratio),
                    "max_drawdown": float(max_drawdown),
                    "max_drawdown_percentage": float(max_drawdown * 100),
                    "var_5_percent": float(var_5),
                    "var_1_percent": float(var_1),
                    "risk_free_rate": float(risk_free_rate)
                },
                "distribution_metrics": {
                    "mean_daily_return": float(mean_daily_return),
                    "skewness": float(skewness),
                    "kurtosis": float(kurtosis),
                    "positive_days": int((daily_returns > 0).sum()),
                    "negative_days": int((daily_returns < 0).sum())
                },
                "volume_metrics": {
                    "average_volume": float(df['Volume'].mean()),
                    "volume_volatility": float(df['Volume'].std()),
                    "latest_volume": float(df['Volume'].iloc[-1])
                }
            }
            
            logger.info(f"Successfully calculated metrics for {symbol}")
            return result
            
        except Exception as e:
            logger.error(f"Error calculating metrics for {symbol}: {str(e)}")
            return {
                "success": False,
                "error": f"Failed to calculate metrics for {symbol}: {str(e)}",
                "symbol": symbol
            }
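The maximum drawdown step is the least obvious of these metrics, so here is a dependency-free sketch of the same idea computed directly on prices. It should agree with the cumulative-returns version above up to a constant scale factor; the price series in the example is invented.

```python
from typing import List

def max_drawdown(prices: List[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (0.0 if prices never fall)."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        # Track the highest price seen so far
        peak = max(peak, price)
        # Drawdown of the current price relative to that running peak
        worst = min(worst, (price - peak) / peak)
    return worst

# Invented series: the worst decline is from the peak of 120 down to 80
example = max_drawdown([100, 120, 90, 110, 80, 130])
```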

5.6 Agent Registration and Discovery

Each agent exposes a /.well-known/agent-card endpoint for automatic discovery. Before launching the entire stack, I used these endpoints to register each agent in the Agent Registry:

@app.get("/.well-known/agent-card")
async def get_agent_card():
    return {
        "name": "Valuation Analysis Agent",
        "description": "Specialized agent for stock valuation and technical analysis",
        "version": "1.0.0",
        "protocolVersion": "0.3.0",
        "url": "http://0.0.0.0:3001/",
        "preferredTransport": "JSONRPC",
        "skills": [
            {
                "id": "technical_valuation_analysis",
                "name": "Technical Valuation Analysis",
                "description": "Comprehensive technical analysis using historical data",
                "tags": ["technical-analysis", "volatility-analysis", "valuation-metrics"]
            }
        ]
    }
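Before registering a card, it can be useful to check that it is complete. The sketch below does that with a hypothetical helper, `missing_card_fields`; the list of expected fields simply mirrors the card shown above and is an assumption, not the normative list from the A2A specification.

```python
from typing import Any, Dict, List

# Assumption: mirrors the fields of the card shown above, not the official schema
EXPECTED_FIELDS = ("name", "description", "version", "protocolVersion", "url", "skills")
EXPECTED_SKILL_FIELDS = ("id", "name", "description")

def missing_card_fields(card: Dict[str, Any]) -> List[str]:
    """List the expected top-level and per-skill fields that the card lacks."""
    missing = [field for field in EXPECTED_FIELDS if field not in card]
    for index, skill in enumerate(card.get("skills", [])):
        for field in EXPECTED_SKILL_FIELDS:
            if field not in skill:
                missing.append(f"skills[{index}].{field}")
    return missing
```

An empty list means the card has everything this sketch looks for; a registry could reject the registration otherwise.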

5.7 Consolidation Pattern with LLM

The consolidation of specialist analyses uses a meta-prompt pattern to synthesize the different perspectives:

def _consolidate_analyses(self, agent_responses: List[Dict], user_message: str) -> str:
    system_prompt = """You are a financial analysis consolidator. Your role is to 
    synthesize insights from multiple specialist agents into a coherent response.
    
    CONSOLIDATION GUIDELINES:
    1. Integrate insights from all successful agent responses
    2. Identify agreements and disagreements between agents
    3. Provide clear, actionable recommendations when possible
    4. If agents disagree, explain the different perspectives"""
    
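    # Keep only the responses that completed successfully
    # (assumption: each response dict carries a boolean "success" flag)
    successful_responses = [r for r in agent_responses if r.get("success", True)]
    
    analyses_text = []
    for response in successful_responses: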
        agent_name = response.get("agent", "unknown")
        analysis = response.get("analysis", "No analysis provided")
        analyses_text.append(f"**{agent_name.upper()} AGENT:**\n{analysis}")
    
    consolidation_prompt = f"""
    USER REQUEST: {user_message}
    
    AGENT ANALYSES:
    {chr(10).join(analyses_text)}
    
    Please provide a consolidated analysis that synthesizes these perspectives."""
    
    response = self.llm.invoke([
        SystemMessage(content=system_prompt),
        HumanMessage(content=consolidation_prompt)
    ])
    
    return response.content

This consolidation step complements the structured debate system. When a debate is activated, the final consolidation highlights the collaborative process and the debate metadata (rounds, turns, participation); for informational queries, it uses the standard consolidation to maximize speed and efficiency.

5.8 Current Implementation Simplifications

It’s important to highlight some simplifications I adopted as practical compromises with respect to the original paper’s specifications:

Sentiment Analysis: The use of VADER and TextBlob is effective for a prototype, but in a production implementation it would be appropriate to evaluate:

Transformer models specialized for financial text (FinBERT)
More sophisticated multi-source analysis
Financial context-specific handling

Fundamental Analysis: The keyword-based approach works on structured Yahoo Finance data, but a production implementation would require:

True RAG system with vector embeddings for SEC documents
Integration with professional financial databases (Bloomberg, Refinitiv)
Advanced semantic parsing of 10-K/10-Q documents

Validation: The system lacks:

Rigorous backtesting of recommendations
Performance metrics validated on historical datasets
Comparison with market benchmarks

Because each of these components sits behind its own agent and tool interface, it can later be replaced with a more sophisticated implementation without modifying the overall architecture.

6. X-Ray Scan of a User Request

To understand how the AlphaAgents system works, let’s see it in action! I ran an end-to-end test using a representative user query that requires the involvement of all three specialized agents.

User query: “I’m undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk”

In this section, I will document step by step everything that happens behind the scenes from the moment the request is received by the Group Chat Agent until the generation of the final synthesis report.

6.1 Phase 1: Initialization and Ecosystem Setup

Before the user query can be processed, the entire AlphaAgents ecosystem must be up and running:

6.1.1 Agent Registry Startup

The Agent Registry is the first component that must be activated, as all agents depend on it for discovery:

# Agent Registry active on port 8000
curl -s http://localhost:8000/health
# Response: {"status":"ok","time":"2025-08-30T17:06:48.102057+00:00"}

6.1.2 Specialized Agent Startup

I launched the specialized agents in my local configuration on ports 3001, 3002, and 3003:

Valuation Agent (port 3001):

2025-08-30 19:06:50 - __main__ - INFO - Starting Valuation Agent...
2025-08-30 19:06:50 - valuation_agent.agent - INFO - Valuation Agent initialized successfully
2025-08-30 19:06:50 - valuation_agent.server - INFO - A2A Valuation Server initialized on 0.0.0.0:3001
INFO: Uvicorn running on http://0.0.0.0:3001

Sentiment Agent (port 3002):

2025-08-30 19:06:50 - __main__ - INFO - Starting Sentiment Agent...
2025-08-30 19:06:50 - sentiment_agent.agent - INFO - Sentiment Agent initialized successfully
2025-08-30 19:06:50 - sentiment_agent.server - INFO - A2A Sentiment Server initialized on 0.0.0.0:3002
INFO: Uvicorn running on http://0.0.0.0:3002

Fundamental Agent (port 3003):

2025-08-30 19:06:50 - __main__ - INFO - Starting Fundamental Agent...
2025-08-30 19:06:50 - fundamental_agent.agent - INFO - Fundamental Agent initialized successfully
2025-08-30 19:06:50 - fundamental_agent.server - INFO - A2A Fundamental Server initialized on 0.0.0.0:3003
INFO: Uvicorn running on http://0.0.0.0:3003

6.1.3 Group Chat Agent Startup

The Group Chat Agent is the last to be started and acts as orchestrator:

2025-08-30 19:07:31 - groupchat_agent.server - INFO - Starting GroupChat Agent server on 0.0.0.0:3000
2025-08-30 19:07:31 - groupchat_agent.a2a_agent - INFO - A2A GroupChat Agent initialized successfully
INFO: Uvicorn running on http://0.0.0.0:3000

6.2 Phase 2: Query Reception and Parsing

6.2.1 JSON-RPC A2A Request

The user query is sent to the Group Chat Agent via JSON-RPC protocol:

{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": "I'm undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk",
    "metadata": {
      "user_id": "demo_user", 
      "session_id": "demo_session"
    }
  },
  "id": "1"
}

6.2.2 Conversion to A2A Message Format

The server automatically converts the query into standard A2A Message format:

{
  "kind": "message",
  "messageId": "uuid-generated",
  "role": "user", 
  "parts": [
    {
      "kind": "text",
      "text": "I'm undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk"
    }
  ],
  "contextId": "demo_session"
}
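The conversion can be sketched as a small function. This is an illustration under stated assumptions, not the server's actual code: `to_a2a_message` is a hypothetical name, and reusing `session_id` as `contextId` is inferred from the example above, where both are "demo_session".

```python
import uuid
from typing import Any, Dict

def to_a2a_message(params: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a plain-text request into the Message structure shown above (sketch)."""
    metadata = params.get("metadata", {})
    message: Dict[str, Any] = {
        "kind": "message",
        "messageId": str(uuid.uuid4()),
        "role": "user",
        "parts": [{"kind": "text", "text": params["message"]}],
    }
    # Assumption: the session identifier doubles as the A2A contextId
    if "session_id" in metadata:
        message["contextId"] = metadata["session_id"]
    return message
```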

6.3 Phase 3: Automatic Risk Tolerance Inference

6.3.1 LLM Call for Risk Assessment

The system semantically analyzes the user query to automatically infer the risk profile:

2025-08-30 19:09:40 - groupchat_agent.a2a_agent - INFO - Inferred risk_tolerance: seeking from user message

LLM Prompt used:

"""
You are a financial risk tolerance classifier. 
Analyze the user's message to determine their risk tolerance profile.

RISK TOLERANCE LEVELS:
- "averse": Conservative investors (capital preservation, stable returns, dividends)
- "neutral": Balanced investors (moderate growth with reasonable risk)  
- "seeking": Aggressive investors (high returns, accept high volatility)

User message: "I'm undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk"
"""

Result: risk_tolerance: "seeking"

As expected, the system identified an aggressive profile based on “maximum returns” and “I can handle high risk”.

6.4 Phase 4: Agent Discovery and Intelligent Routing

6.4.1 Agent Card Retrieval from Agent Registry

The Group Chat Agent queries the Agent Registry to get the Agent Cards and endpoints of the specialist agents:

2025-08-30 19:09:40 - groupchat_agent.registry_service - INFO - Fetching specialist agent URLs from registry...

HTTP Request: GET http://localhost:8000/agents?name=valuation
HTTP Request: GET http://localhost:8000/agents?name=sentiment  
HTTP Request: GET http://localhost:8000/agents?name=fundamental

2025-08-30 19:09:40 - groupchat_agent.registry_service - INFO - Found agent 'valuation' at http://0.0.0.0:3001/
2025-08-30 19:09:40 - groupchat_agent.registry_service - INFO - Found agent 'sentiment' at http://0.0.0.0:3002
2025-08-30 19:09:40 - groupchat_agent.registry_service - INFO - Found agent 'fundamental' at http://0.0.0.0:3003/

6.4.2 Routing LLM for Agent Selection

The system uses a prompt to determine which agents to involve:

LLM Routing Prompt:

"""
You are a routing coordinator for financial analysis agents. 
Analyze the user's request and determine which specialist agents should be consulted:

AVAILABLE AGENTS:
- valuation: Technical analysis, price charts, volatility, quantitative metrics
- sentiment: News analysis, market sentiment, social media trends  
- fundamental: Financial reports, earnings, company fundamentals, SEC filings

User request: I'm undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk
"""

Result:

2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - 🤖 LLM ROUTING RESPONSE: ["valuation", "sentiment", "fundamental"]
2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - Determined required agents: ['valuation', 'sentiment', 'fundamental']
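The routing reply is a JSON list, so it has to be parsed and validated before use. Here is a minimal sketch of that step; `parse_routing_response` is a hypothetical helper, and falling back to all agents on an unusable reply is my assumption about a sensible default, not documented behaviour.

```python
import json
from typing import List

KNOWN_AGENTS = ("valuation", "sentiment", "fundamental")

def parse_routing_response(raw: str) -> List[str]:
    """Parse the router's JSON list; fall back to all agents if it is unusable."""
    try:
        selected = json.loads(raw)
    except json.JSONDecodeError:
        return list(KNOWN_AGENTS)
    if not isinstance(selected, list):
        return list(KNOWN_AGENTS)
    # Drop anything that is not a known agent name
    valid = [name for name in selected if name in KNOWN_AGENTS]
    return valid or list(KNOWN_AGENTS)
```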

6.4.3 Decision for Structured Debate

Another prompt determines if the query requires structured debate or simple coordination:

2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - 🤔 LLM DEBATE DECISION: YES for query: 'I'm undecided whether to invest in Tesla or Pala...'
2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - Multi-agent analysis detected - initiating structured debate

6.5 Phase 5: Round 1 - Initial Analysis

The structured debate mechanism with round-robin is activated. Each agent presents its initial analysis sequentially.

2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - Starting structured debate with agents: ['valuation', 'sentiment', 'fundamental']
2025-08-30 19:09:41 - groupchat_agent.a2a_agent - INFO - === INITIAL ANALYSIS ROUND ===
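The round-robin ordering can be pictured with a tiny generator. This is only a sketch of the scheduling idea, assuming a fixed agent order and one turn per agent per round; it does not reproduce the orchestrator's actual code.

```python
from typing import Iterator, List, Tuple

def round_robin_turns(agents: List[str], rounds: int) -> Iterator[Tuple[int, str]]:
    """Yield (round_number, agent_name) in a fixed order, one turn per agent per round."""
    for round_number in range(1, rounds + 1):
        for agent in agents:
            yield round_number, agent

schedule = list(round_robin_turns(["valuation", "sentiment", "fundamental"], rounds=2))
```

With three agents and two rounds the schedule has six entries, and the fourth one hands the floor back to the valuation agent at the start of Round 2.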

6.5.1 Valuation Agent - Round 1

Ticker Resolution: The Valuation Agent begins by resolving company names to stock tickers, then retrieves financial data, calculates valuation metrics, and finally generates a brief synthesis report recommending investment in Palantir:

> Entering new AgentExecutor chain...

Invoking: `resolve_company_ticker` with `{'query': 'Tesla'}`
→ {'success': True, 'query': 'Tesla', 'ticker': 'TSLA', 'company_name': 'Tesla, Inc.', 'resolution_method': 'company_mapping'}

Invoking: `resolve_company_ticker` with `{'query': 'Palantir'}`  
→ {'success': False, 'query': 'Palantir', 'error': "Could not resolve 'Palantir' to a valid stock ticker"}

Invoking: `resolve_company_ticker` with `{'query': 'PLTR'}`
→ {'success': True, 'query': 'PLTR', 'ticker': 'PLTR', 'company_name': 'Palantir Technologies Inc.', 'resolution_method': 'direct_ticker'}

Financial Data Retrieval:

Invoking: `fetch_stock_data` with `{'symbol': 'TSLA', 'period_days': 365}`
→ Successfully fetched 250 data points for TSLA

Invoking: `fetch_stock_data` with `{'symbol': 'PLTR', 'period_days': 365}`  
→ Successfully fetched 250 data points for PLTR

Volatility and Metrics Calculations:

Invoking: `calculate_volatility_metrics` with `{'symbol': 'TSLA'}`
Invoking: `calculate_volatility_metrics` with `{'symbol': 'PLTR'}`

Valuation Agent Comparative Analysis:

Tesla (TSLA):

Current Price: $333.87
Total Return (1 year): 58.53%
Annualized Return: 59.42%
Annualized Volatility: 71.44%
Sharpe Ratio: 0.76
Max Drawdown: -53.77%

Palantir (PLTR):

Current Price: $156.71
Total Return (1 year): 413.63%
Annualized Return: 423.86%
Annualized Volatility: 72.53%
Sharpe Ratio: 5.77
Max Drawdown: -40.61%
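As a sanity check, the reported Sharpe ratios and annualized returns can be recomputed from the other figures with the formulas of the volatility tool in section 5.5.4. Two inputs are my assumptions: 252 trading days per year (the value of `TRADING_DAYS_PER_YEAR` is not shown) and 249 daily returns (250 data points minus the first one). The 5% risk-free rate is the tool's default.

```python
def sharpe_ratio(annualized_return: float, annualized_volatility: float,
                 risk_free_rate: float = 0.05) -> float:
    """Excess return per unit of volatility."""
    return (annualized_return - risk_free_rate) / annualized_volatility

def annualize(total_return: float, return_days: int,
              trading_days_per_year: int = 252) -> float:
    """Compound a cumulative return over `return_days` to a yearly rate."""
    return (1 + total_return) ** (trading_days_per_year / return_days) - 1

# Inputs are the figures reported above, expressed as fractions
tsla_sharpe = sharpe_ratio(0.5942, 0.7144)
pltr_sharpe = sharpe_ratio(4.2386, 0.7253)
tsla_annualized = annualize(0.5853, 249)
pltr_annualized = annualize(4.1363, 249)
```

Under these assumptions the recomputed values land within rounding distance of the reported ones, which suggests the tool's output is internally consistent.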

Valuation Agent Recommendation:

STRONG BUY Palantir
Palantir’s superior risk-adjusted returns and extraordinary growth potential make it a more attractive option for maximizing returns.

6.5.2 Sentiment Agent - Round 1

The Sentiment Agent searches for news about Tesla and Palantir on Yahoo Finance and Google News, retrieves the text of the articles, and calculates sentiment scores using VADER and TextBlob.

Financial News Collection:

2025-08-30 19:10:08 - sentiment_agent.tools - INFO - Collecting news for TSLA (Tesla, Inc.)
2025-08-30 19:10:08 - sentiment_agent.tools - INFO - Collecting news for PLTR (Palantir Technologies Inc.)

HTTP Request: GET https://feeds.finance.yahoo.com/rss/2.0/headline?s=TSLA&region=US&lang=en-US
HTTP Request: GET https://feeds.finance.yahoo.com/rss/2.0/headline?s=PLTR&region=US&lang=en-US
HTTP Request: GET https://news.google.com/rss/search?q="Tesla,+Inc."+OR+"TSLA"+stock+finance

2025-08-30 19:10:09 - sentiment_agent.tools - INFO - Successfully collected 9 articles for TSLA
2025-08-30 19:10:09 - sentiment_agent.tools - INFO - Successfully collected 6 articles for PLTR

Sentiment Analysis with VADER + TextBlob:

Tesla (TSLA):

Overall Sentiment: Neutral
TextBlob Average: 0.0024
VADER Average: -0.0651
Combined Average: -0.0381 (slight negative trend)
Distribution: 22% positive, 33% negative, 44% neutral

Palantir (PLTR):

Overall Sentiment: Positive
TextBlob Average: 0.0235
VADER Average: 0.4288
Combined Average: 0.2667 (moderately positive)
Distribution: 83% positive, 17% negative

Key Tesla News Analysis:

Negative: “Nearly Half Of Americans Say Tesla FSD Should Be Illegal”
Negative: “Tesla’s sales down 40% in EU while Chinese EV maker BYD is up 200%”
Positive: “Tesla Stock is Going Up Today? TSLA Shares Jump to June Highs on Elon Musk’s FSD Optimism”

Key Palantir News Analysis:

Positive: “Palantir Technologies Inc. (PLTR)’s New Name Is ‘Karpe Diem,’ Says Jim Cramer”
Positive: “Moody’s Just Upgraded The Top S&P 500 Stock Not Named Palantir”

Sentiment Agent Recommendation:

BUY Palantir
Strong positive sentiment and market performance vs Tesla’s neutral sentiment with regulatory challenges.

6.5.3 Fundamental Agent - Round 1

The output of the Fundamental Agent is considerably more detailed. Rather than picking one of the two stocks outright, it issued a separate recommendation for each: SELL on Tesla and BUY on Palantir.

In-Depth Financial Analysis:

Initially, the agent retrieves and examines the financial reports of both companies, highlighting their performance and key metrics. The data is retrieved through Yahoo Finance APIs.

2025-08-30 19:10:38 - fundamental_agent.tools - INFO - Pulling financial reports for TSLA (Tesla, Inc.)
2025-08-30 19:10:40 - fundamental_agent.tools - INFO - Successfully retrieved financial reports for TSLA

2025-08-30 19:10:40 - fundamental_agent.tools - INFO - Pulling financial reports for PLTR (Palantir Technologies Inc.) 
2025-08-30 19:10:42 - fundamental_agent.tools - INFO - Successfully retrieved financial reports for PLTR

RAG-based Analysis Results:

As mentioned earlier, this is not true RAG analysis, but rather a keyword search over the Yahoo Finance API output. From the retrieved financial data, the agent calculates the KPIs of interest and produces an “expert” opinion.

Tesla (TSLA):

Sector: Consumer Cyclical - Auto Manufacturers
Gross Margin: 17.48%
Operating Margin: 4.10%
Revenue Growth: -11.80% (negative)
Earnings Growth: -17.50% (negative)
Fundamental Score: 0/100
Investment Recommendation: SELL

Concerns: “Low operating margin indicates operational challenges, negative revenue growth indicates declining business”

Palantir (PLTR):

Sector: Technology - Software Infrastructure
Gross Margin: 80.03% (excellent)
Operating Margin: 26.83% (strong)
Revenue Growth: 48.00% (robust)
Earnings Growth: 116.70% (extraordinary)
Fundamental Score: 100/100
Investment Recommendation: BUY

Strengths: Strong gross margin indicates good pricing power, Strong operating margin indicates efficient operations, Strong revenue growth indicates market expansion

Round 1 Conclusions:

2025-08-30 19:11:12 - groupchat_agent.a2a_agent - INFO - fundamental completed turn 1

6.6 Phase 6: Round 2 - Debate

The system proceeds with Round 2 of the structured debate, where each agent has the opportunity to respond to the analyses of others:

2025-08-30 19:11:12 - groupchat_agent.a2a_agent - INFO - === DEBATE ROUND 2 ===
2025-08-30 19:11:12 - groupchat_agent.a2a_agent - INFO - Round 2: valuation's turn (turn #2)

6.6.1 Debate

In Round 2, agents have access to the analyses of other agents and can present counter-arguments or strengthen their positions. This process allows for:

Identifying disagreements: Tesla vs Palantir as optimal investment
Comparing methodologies: Technical vs fundamental vs sentiment analysis
Refining recommendations: Based on peer-to-peer feedback
Cross-validation: Cross-reference between different analytical perspectives

6.6.2 Consensus Building Process

The Group Chat Agent has a consensus threshold that is configured via an environment variable. I left the default value of 75%, but it can be adjusted based on debate needs. In the test performed, the agents' responses did not differ significantly, so consensus was reached easily.
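How agreement is measured is not shown here, so the following is only one plausible reading: a majority vote whose share is compared against the threshold. The helper name `consensus` and the vote-counting rule are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

def consensus(recommendations: Dict[str, str], threshold: float = 0.75) -> Tuple[str, float, bool]:
    """Return (majority pick, share of agents backing it, whether the threshold is met)."""
    counts = Counter(recommendations.values())
    pick, votes = counts.most_common(1)[0]
    share = votes / len(recommendations)
    return pick, share, share >= threshold

# All three agents preferred Palantir in the test run
result = consensus({"valuation": "PLTR", "sentiment": "PLTR", "fundamental": "PLTR"})
```

With three agents and a 75% threshold, a two-against-one split (about 67%) would not count as consensus under this rule, so a further debate round would be needed.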

Nevertheless, before proceeding to the consolidation phase, the Group Chat Agent highlighted some distinctive elements of the three agents' responses and then provided a synthesis comment:

Valuation vs Fundamental divergence on Tesla:
Valuation Agent: Tesla showed 59.42% annualized return → Potential BUY
Fundamental Agent: Tesla has -11.80% revenue growth, 4.10% operating margins → SELL
Consensus on Palantir:
Valuation: 423.86% annualized return, Sharpe 5.77 → STRONG BUY
Fundamental: Score 100/100, 80.03% margins → BUY
Sentiment: 83% positive sentiment → BUY
Risk-adjusted perspective:
All agree: For risk-seeking investor with 5-7 year horizon, Palantir is superior to Tesla

6.7 Phase 7: Final Consolidation and Report Generation

6.7.1 LLM-based Synthesis

The Group Chat Agent uses an LLM with ad-hoc prompts to consolidate the analyses. To give an idea of how the process works, here is an example of the prompt used, with placeholders standing in for the full agent outputs:

Consolidation Prompt:

"""
You are a financial analysis consolidator. Your role is to synthesize insights from multiple specialist agents into a coherent response.

USER REQUEST: I'm undecided whether to invest in Tesla or Palantir. My investment horizon is 5-7 years and I want maximum returns, I can handle high risk

AGENT ANALYSES:
**VALUATION AGENT:**
[Complete analysis with quantitative metrics...]

**SENTIMENT AGENT:** 
[Sentiment analysis with news distribution...]

**FUNDAMENTAL AGENT:**
[Fundamental analysis with score...]
"""

6.7.2 Final Report

The system generates a complete final report that integrates all specialist analyses. Here is the report in its entirety:

📊 ALPHAAGENTS INVESTMENT ANALYSIS REPORT

The analysis of whether to invest in Tesla (TSLA) or Palantir (PLTR) over a 5-7 year horizon, with a focus on maximizing returns and a willingness to manage high risk, was conducted through a structured multi-agent debate process. This comprehensive approach involved valuation, sentiment, and fundamental analyses, ensuring a well-rounded perspective.

🎯 CONSENSUS AND KEY FINDINGS:

1. Palantir Technologies Inc. (PLTR)

Valuation: Palantir has demonstrated an extraordinary annualized return of 423.86% with a high Sharpe ratio of 5.77, indicating exceptional risk-adjusted returns. The volatility is high, but the risk-reward profile is favorable.
Sentiment: The sentiment analysis shows a strong positive outlook, with 83% of articles reflecting positive sentiment. This aligns with the valuation analysis, reinforcing Palantir’s growth potential.
Fundamentals: Palantir’s strong growth metrics and operational efficiency support the positive sentiment and valuation findings.

Recommendation: STRONG BUY. Palantir is recommended for risk-seeking investors due to its exceptional growth potential and positive sentiment, making it an attractive option for maximizing returns.

2. Tesla, Inc. (TSLA):

Valuation: Tesla has a strong annualized return of 59.42% with high volatility. The Sharpe ratio of 0.76 suggests moderate risk-adjusted returns. The potential for significant returns exists, but with substantial risk.
Sentiment: The sentiment analysis is neutral with a slight negative tilt, reflecting concerns about automotive sector challenges and regulatory issues with Tesla’s Full Self-Driving technology.
Fundamentals: Operational challenges and strategic uncertainties contribute to a more cautious outlook.

Recommendation: HOLD. While Tesla has potential upside, the current sentiment and operational challenges suggest a more cautious approach. Investors should wait for more positive sentiment or strategic clarity before increasing exposure.

🏆 FINAL RECOMMENDATION:

Based on the collective reasoning from the multi-agent debate, Palantir (PLTR) is the preferred choice for a risk-seeking investor aiming for maximum returns over a 5-7 year horizon. Its strong positive sentiment, exceptional risk-adjusted returns, and growth potential make it a compelling investment.

Tesla (TSLA), while still a strong contender, is recommended as a hold due to its current neutral sentiment and operational challenges. Investors should monitor Tesla for strategic developments that could enhance its investment appeal.

This analysis benefits from the collaborative insights of multiple expert perspectives, providing increased confidence in the recommendations provided.

6.7.3 Debate Process Metadata

The final report also includes some metadata about the process followed, which can be used for explainability analysis:

{
  "debate_metadata": {
    "total_rounds": 2,
    "agents_participated": ["valuation", "sentiment", "fundamental"],  
    "consensus_achieved": true,
    "consensus_percentage": 100,
    "risk_tolerance_inferred": "seeking",
    "debate_duration": "3m 32s",
    "llm_calls": {
      "risk_inference": 1,
      "agent_routing": 1, 
      "debate_detection": 1,
      "consolidation": 1
    }
  }
}
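As an example of how this metadata could feed an explainability check, the sketch below parses the structure shown above and derives a few summary values. The function name `summarize_debate` and the output keys are assumptions of mine, not part of the released code.

```python
import json
import re

# Compact copy of the metadata shown above.
SAMPLE_METADATA = """
{"debate_metadata": {
  "total_rounds": 2,
  "agents_participated": ["valuation", "sentiment", "fundamental"],
  "consensus_achieved": true,
  "consensus_percentage": 100,
  "risk_tolerance_inferred": "seeking",
  "debate_duration": "3m 32s",
  "llm_calls": {"risk_inference": 1, "agent_routing": 1,
                "debate_detection": 1, "consolidation": 1}
}}
"""


def summarize_debate(metadata_json: str) -> dict:
    """Derive a few summary values from the debate metadata."""
    meta = json.loads(metadata_json)["debate_metadata"]
    # The duration is assumed to follow the "<minutes>m <seconds>s" pattern.
    match = re.fullmatch(r"(\d+)m (\d+)s", meta["debate_duration"])
    if match is None:
        raise ValueError("unexpected duration format")
    minutes, seconds = (int(part) for part in match.groups())
    return {
        "agents": len(meta["agents_participated"]),
        "rounds": meta["total_rounds"],
        "unanimous": meta["consensus_achieved"]
        and meta["consensus_percentage"] == 100,
        # Sum of the calls listed under "llm_calls" only.
        "llm_calls_total": sum(meta["llm_calls"].values()),
        "duration_seconds": minutes * 60 + seconds,
    }


summary = summarize_debate(SAMPLE_METADATA)
```

For the run above, `summary["duration_seconds"]` is 212 and `summary["llm_calls_total"]` is 4.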

6.8 Performance and Execution Times

An important aspect to document is the time needed to complete the entire analysis process. Based on the timestamps recorded during execution, here is the detailed temporal breakdown:

6.8.1 Overall Duration

Total execution time: 3 minutes and 32 seconds (from 19:09:40 to 19:13:12)

6.8.2 Temporal Breakdown by Phases

Phase	Duration	Start Timestamp	End Timestamp	Description
Initial Setup	~41s	19:06:50	19:07:31	Launch of all agents, Agent Registry, Healthcheck and basic verifications
Query Processing	~1s	19:09:40	19:09:41	User query reception and parsing
Risk Tolerance Inference	<1s	19:09:40	19:09:40	LLM analysis to infer risk profile
Agent Discovery	<1s	19:09:40	19:09:40	Endpoint retrieval from Agent Registry
Agent Routing	~1s	19:09:41	19:09:41	LLM decision on agents to involve
Debate Detection	<1s	19:09:41	19:09:41	LLM decision for structured debate
Round 1 Analysis	~91s	19:09:41	19:11:12	Sequential initial analyses of 3 agents
Round 2 Debate	~120s	19:11:12	19:13:12	Second round and final consolidation
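The durations in the table can be re-derived from the timestamps with a few lines of Python. This is only a quick consistency check written for this article, with the timestamps copied from the table; `seconds_between` is a helper name I introduce here.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Start and end timestamps of the two dominant phases, from the table.
PHASES = {
    "Round 1 Analysis": ("19:09:41", "19:11:12"),
    "Round 2 Debate": ("19:11:12", "19:13:12"),
}

durations = {name: seconds_between(*span) for name, span in PHASES.items()}
total = seconds_between("19:09:40", "19:13:12")

# Fraction of the request time spent in the two rounds.
share = sum(durations.values()) / total
```

The check gives 91s and 120s for the two rounds against a total of 212s, so the two rounds account for more than 99% of the request time, while everything before Round 1 fits in about one second.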

6.8.3 Analysis of Most Expensive Components

The most time-intensive phases are:

Round 1 Analysis (91 seconds):
- Valuation Agent: ~30s for Yahoo Finance data retrieval + volatility calculations
- Sentiment Agent: ~30s for news collection from multiple sources + VADER/TextBlob analysis
- Fundamental Agent: ~31s for financial data processing + keyword matching

Round 2 Debate (120 seconds):
- Structured debate: ~60s for second round of cross-agent analysis
- LLM Consolidation: ~60s for final synthesis and report generation

6.8.4 Bottlenecks and Possible Optimizations

Identified bottlenecks:

API Latency: Calls to Yahoo Finance (~2-3s per request)
LLM Processing: Complex analyses require 10-15s per agent
Sequential Execution: Debate turns are strictly sequential by design and cannot be parallelized

Future optimizations:

Round 1 Parallelization: Potential reduction from 91s to ~35-40s
Financial data caching: Avoid multiple retrievals of same tickers
Model optimization: More efficient prompts or faster models
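The Round 1 parallelization idea can be sketched with `asyncio.gather`. In this sketch `run_agent` is a hypothetical stand-in for the real A2A call and simply sleeps for a scaled-down duration; the per-agent timings are the approximate values measured in this run.

```python
import asyncio

# Approximate Round 1 durations in seconds, from the breakdown above.
ROUND1_SECONDS = {"valuation": 30, "sentiment": 30, "fundamental": 31}


async def run_agent(name: str, seconds: float, scale: float = 0.001) -> str:
    """Stand-in for an A2A request; sleeps for a scaled-down duration."""
    await asyncio.sleep(seconds * scale)
    return f"{name}: done"


async def round1_parallel() -> list[str]:
    # gather() starts all calls together and returns results in input order.
    return await asyncio.gather(
        *(run_agent(name, secs) for name, secs in ROUND1_SECONDS.items())
    )


# Sequential cost is the sum of the calls; parallel cost is bounded below
# by the slowest call, before any orchestration overhead.
sequential_estimate = sum(ROUND1_SECONDS.values())
parallel_estimate = max(ROUND1_SECONDS.values())

if __name__ == "__main__":
    print(asyncio.run(round1_parallel()))
```

With these timings the sequential estimate is 91s and the lower bound for the parallel version is 31s, which is consistent with the ~35-40s target once some overhead is added. This applies to Round 1 only, because the three initial analyses do not depend on each other, while the debate turns do.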

6.8.5 Performance Considerations

The total time of 3 minutes and 32 seconds is a reasonable compromise for the financial use case:

✅ Acceptable for:

Financial analyses where accuracy > speed
Investors requiring multi-perspective validation
Medium to long-term investment decisions

⚠️ To improve for:

High-frequency trading
Real-time market analysis
Scenarios with high volume of simultaneous requests

6.9 End-to-End Process Conclusions

This complete trace demonstrates how the AlphaAgents system automatically orchestrates a sophisticated analysis process in 3 minutes and 32 seconds, including:

Intelligent discovery of agents via Agent Registry
Automatic inference of risk tolerance via LLM semantic analysis
Multilingual routing to identify relevant agents
Structured debate with strict turn-taking and consensus building
Sequential specialist analyses with advanced tool calling
Multi-perspective consolidation via LLM meta-prompt
Final report with complete process metadata

Final result: Strong recommendation for Palantir Technologies based on unanimous agent consensus, supported by:

Superior return metrics (423% vs 59%)
Positive sentiment vs neutral for Tesla
Solid fundamentals (80% margins vs 17% for Tesla)
Alignment with risk-seeking profile and 5-7 year horizon

The structured multi-agent debate converged on a single recommendation through collaborative validation, which helps mitigate the bias of any individual agent. The run also shows that the A2A-compliant framework can handle a realistic financial analysis scenario with performance that is reasonable for the application domain.

7. Conclusions

7.1 Project Motivations and Objectives

This project was born from the need to test the A2A protocol and the Agent Registry concretely, in a real application scenario built as a distributed multi-agent application.

As I highlighted in the introduction, the proliferation of monolithic GenAI frameworks and their limitations in multi-agent support pushed me to invest some time to raise the level of abstraction and try to design an A2A-based agent architecture, in order to guarantee:

Communication standardization between heterogeneous agents
Elimination of lock-in to specific frameworks
Horizontal scalability through distributed deployments
Interoperability between agents implemented with different technology stacks

This small project kept me up a few nights but gave me the chance to get hands-on with a REAL multi-agent system different from the monolithic case study that is often presented in the “Getting Started” section of this or that GenAI framework.

I’m increasingly convinced that for Enterprise solutions, it’s worth starting to think seriously about agent architecture and collaboration protocols rather than the specific technology stack for implementing this or that Use Case.

7.2 The Choice of AlphaAgents

The selection of the BlackRock AlphaAgents paper as a use case proved particularly successful: despite its moderate complexity, it allowed me to create an end-to-end solution with tangible and measurable results.

Compared to the original paper, I had to accept some simplifications in the implementation, but these did not prevent me from achieving my objective.

7.3 Future Developments

The end-to-end test confirmed the validity of the approach and also provided me with some insights for future improvements.

7.4.1 Technical Improvements

Performance Optimization:

Round 1 Parallelization: Potential reduction from 91s to ~35-40s
Data caching: Avoid multiple retrievals of same tickers
LLM optimization: Prompt engineering to reduce latency
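For the data caching point, a minimal in-process sketch could rely on `functools.lru_cache`. Here `fetch_from_provider` is a hypothetical placeholder for the remote market-data request, and `get_price_history` is a name I made up for illustration.

```python
from functools import lru_cache

fetch_count = 0  # counts simulated remote requests


def fetch_from_provider(ticker: str, period: str) -> tuple:
    """Hypothetical stand-in for a remote market-data request."""
    global fetch_count
    fetch_count += 1
    return (ticker, period, "price-series-placeholder")


@lru_cache(maxsize=128)
def get_price_history(ticker: str, period: str = "1y") -> tuple:
    # Cached per argument combination; repeated requests skip the remote call.
    return fetch_from_provider(ticker, period)


get_price_history("TSLA", "1y")
get_price_history("PLTR", "1y")
get_price_history("TSLA", "1y")  # served from the cache
```

After these three calls only two remote requests have been made. Note that this cache lives inside a single process and never expires: agents running as separate services would need a shared cache, and market data would need some form of time-based invalidation.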

Agent reliability:

Sentiment Agent: FinBERT integration for more accurate analysis
Fundamental Agent: True RAG implementation with vector embeddings
Risk Assessment: More sophisticated models for portfolio risk management

Registry improvements:

Agent versioning and compatibility management
Load balancing for agents with multiple instances
Health monitoring and automatic failover
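To give an idea of what load balancing with failover might look like on the registry side, here is a purely hypothetical sketch: a round-robin selection that skips instances marked as unhealthy. None of these classes exist in the current Agent Registry, and the URLs are invented.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class AgentInstance:
    url: str
    healthy: bool = True  # would be updated by a periodic health check


@dataclass
class AgentPool:
    """Hypothetical registry entry holding several instances of one agent."""

    instances: list[AgentInstance]
    _counter: count = field(default_factory=count, repr=False)

    def pick(self) -> AgentInstance:
        """Round-robin over the instances that are currently healthy."""
        healthy = [inst for inst in self.instances if inst.healthy]
        if not healthy:
            raise RuntimeError("no healthy instance available")
        return healthy[next(self._counter) % len(healthy)]


pool = AgentPool([
    AgentInstance("http://valuation-a.example"),
    AgentInstance("http://valuation-b.example"),
])
```

Successive calls to `pool.pick()` alternate between the two instances; if one is marked unhealthy, all requests go to the other until it recovers.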

7.4.2 Functional Expansions

New Specialized Agents:

Technical Analysis Agent: Pattern recognition, support/resistance
Macro Economic Agent: Macroeconomic and sectoral analysis
ESG Agent: Environmental, Social, Governance scoring
Options Agent: Derivatives analysis and hedging strategies

Results verification:

Double-check the metrics calculated by the Valuation Agent on large data samples
Cross-validation with other valuation models
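A double check of this kind could start from an independent recomputation of annualized return, volatility and Sharpe ratio from daily returns. The sketch below uses one common convention (252 trading days, compounded returns, a configurable risk-free rate); these are my assumptions, and the Valuation Agent's exact formulas may differ.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: common convention for annualizing daily data


def annualized_metrics(daily_returns: list[float], risk_free_rate: float = 0.0):
    """Return (annualized return, annualized volatility, Sharpe ratio)."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two daily returns")
    # Compound the simple daily returns, then scale growth to one trading year.
    growth = math.prod(1.0 + r for r in daily_returns)
    ann_return = growth ** (TRADING_DAYS / len(daily_returns)) - 1.0
    # Sample standard deviation of daily returns, scaled by sqrt(time).
    ann_vol = statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)
    # Sharpe ratio: excess annualized return per unit of annualized volatility.
    if ann_vol > 0:
        sharpe = (ann_return - risk_free_rate) / ann_vol
    else:
        sharpe = float("nan")
    return ann_return, ann_vol, sharpe
```

Running the same price series through this function and through the Valuation Agent, and comparing the outputs within a tolerance, would be a first step toward the cross-validation mentioned above.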

7.5 Considerations on Financial Results

7.5.1 Limitations and Risks

Technical Limitations:

Limited data: Much of the analysis is based on Yahoo Finance data rather than professional sources
Lack of backtesting: As I emphasized multiple times, I performed no systematic validation on historical data
Restricted scope: The sample I used for end-to-end tests is limited to a few stocks and a few types of user requests

Risks:

LLM hallucinations: Possible erroneous or contradictory analyses
Data quality: Input data accuracy not guaranteed
Model bias: Bias in language models used
Market conditions: The analysis does not consider current macroeconomic conditions

7.5.2 ⚠️ Important Disclaimer

Also in light of the limitations listed above, all results, recommendations and analyses presented in this article are purely illustrative and serve only to show how the multi-agent system operates.

The results reported here and the software released on GitHub must NOT be used in any way as:

Basis for real investment decisions
Financial advice or trading recommendations
Input for portfolio management strategies
