Agent Harness Architecture
The Agent Harness is the core orchestration layer for the Dexorder AI platform, built on LangChain.js and LangGraph.js.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Gateway (Fastify) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ WebSocket │ │ Telegram │ │ Event │ │
│ │ Handler │ │ Handler │ │ Router │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
│ │ │
│ ┌───────▼────────┐ │
│ │ Agent Harness │ │
│ │ (Stateless) │ │
│ └───────┬────────┘ │
│ │ │
│              ┌─────────────────┴─────────────────┐           │
│              │                                   │           │
│         ┌────▼─────┐                        ┌────▼─────┐     │
│         │   MCP    │                        │   LLM    │     │
│         │ Connector│                        │  Router  │     │
│         └────┬─────┘                        └────┬─────┘     │
│              │                                   │           │
└──────────────┼───────────────────────────────────┼───────────┘
               │                                   │
               ▼                                   ▼
        ┌────────────┐                      ┌───────────┐
        │   User's   │                      │    LLM    │
        │    MCP     │                      │ Providers │
        │ Container  │                      │(Anthropic,│
        │ (k8s pod)  │                      │  OpenAI,  │
        │            │                      │   etc.)   │
        └────────────┘                      └───────────┘
Message Processing Flow
When a user sends a message:
1. Gateway receives message via channel (WebSocket/Telegram)
↓
2. Authenticator validates user and gets license info
↓
3. Container Manager ensures user's MCP container is running
↓
4. Agent Harness processes message:
│
├─→ a. MCPClientConnector fetches context resources:
│ - context://user-profile
│ - context://conversation-summary
│ - context://workspace-state
│ - context://system-prompt
│
├─→ b. Build system prompt:
│ - Base platform prompt
│ - User profile context
│ - Workspace state
│ - Custom user instructions
│
├─→ c. ModelRouter selects LLM:
│      - Based on license tier
│      - Query complexity
│      - Configured routing strategy
│
├─→ d. LLM invocation with tool support:
│      - Send messages to LLM
│      - If tool calls requested:
│        • Platform tools → handled by gateway
│        • User tools → proxied to MCP container
│      - Loop until no more tool calls
│
├─→ e. Save conversation to MCP:
│      - mcp.callTool('save_message', user_message)
│      - mcp.callTool('save_message', assistant_message)
│
└─→ f. Return response to user via channel
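The per-message flow above can be sketched as a single async loop. All interfaces and names below (Llm, McpClient, the platform-tool set) are illustrative stand-ins, not the real types in gateway/src/harness/agent-harness.ts:

```typescript
// Sketch of the per-message orchestration loop.
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LlmReply { text: string; toolCalls: ToolCall[]; }

interface Llm { invoke(messages: string[]): Promise<LlmReply>; }
interface McpClient {
  readResource(uri: string): Promise<string>;
  callTool(name: string, args: unknown): Promise<string>;
}

const PLATFORM_TOOLS = new Set(["get_quote"]); // handled in-gateway (hypothetical name)

async function handleMessage(
  llm: Llm,
  mcp: McpClient,
  userMessage: string,
): Promise<string> {
  // a. Fetch context resources from the user's MCP container
  const profile = await mcp.readResource("context://user-profile");

  // b. Build the system prompt from platform base + user context
  const messages = [`SYSTEM: ${profile}`, `USER: ${userMessage}`];

  // c-d. Invoke the LLM, looping while it requests tools
  let reply = await llm.invoke(messages);
  while (reply.toolCalls.length > 0) {
    for (const call of reply.toolCalls) {
      // Platform tools run in the gateway; everything else proxies to MCP
      const result = PLATFORM_TOOLS.has(call.name)
        ? `platform:${call.name}`
        : await mcp.callTool(call.name, call.args);
      messages.push(`TOOL(${call.name}): ${result}`);
    }
    reply = await llm.invoke(messages);
  }

  // e. Persist both sides of the exchange back to the MCP container
  await mcp.callTool("save_message", { role: "user", text: userMessage });
  await mcp.callTool("save_message", { role: "assistant", text: reply.text });
  return reply.text;
}
```

The key invariant is that the harness itself holds no state between calls: everything it needs is fetched at step (a) and written back at step (e).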
Core Components
1. Agent Harness (gateway/src/harness/agent-harness.ts)
Stateless orchestrator - all state lives in user's MCP container.
Responsibilities:
- Fetch context from user's MCP resources
- Build prompts with full context
- Route to appropriate LLM
- Handle tool calls (platform vs user)
- Save conversation back to MCP
- Stream responses to user
Key Methods:
- handleMessage(): Process a single message (non-streaming)
- streamMessage(): Process with streaming response
- initialize(): Connect to user's MCP server
2. MCP Client Connector (gateway/src/harness/mcp-client.ts)
Connects to user's MCP container using Model Context Protocol.
Features:
- Resource reading (context://, indicators://, strategies://)
- Tool execution (save_message, run_backtest, etc.)
- Automatic reconnection on container restarts
- Error handling and fallbacks
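The reconnection behavior can be sketched as a retry wrapper around the transport. The Transport interface and retry count here are assumptions for illustration; the real connector wraps the MCP SDK session:

```typescript
// Illustrative reconnect-and-retry wrapper around an MCP transport.
interface Transport {
  request(uri: string): Promise<string>;
  connect(): Promise<void>;
}

class McpClientConnector {
  constructor(private transport: Transport, private maxRetries = 2) {}

  async readResource(uri: string): Promise<string> {
    for (let attempt = 0; ; attempt++) {
      try {
        return await this.transport.request(uri);
      } catch (err) {
        if (attempt >= this.maxRetries) throw err;
        // Container may have restarted (new pod, new socket):
        // re-establish the session and retry the read
        await this.transport.connect();
      }
    }
  }
}
```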
3. Model Router (gateway/src/llm/router.ts)
Routes queries to appropriate LLM based on:
- License tier: Free users → smaller models, paid → better models
- Complexity: Simple queries → fast models, complex → powerful models
- Cost optimization: Balance performance vs cost
Routing Strategies:
- COST: Minimize cost
- COMPLEXITY: Match model to query complexity
- SPEED: Prioritize fast responses
- QUALITY: Best available model
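A minimal selection function shows how the strategies compose with license tiers. The tier labels, model names, and complexity threshold are placeholders, not the platform's actual configuration:

```typescript
// Sketch of tier- and strategy-based model selection.
type Tier = "free" | "standard" | "enterprise";
type Strategy = "COST" | "COMPLEXITY" | "SPEED" | "QUALITY";

function selectModel(tier: Tier, strategy: Strategy, complexity: number): string {
  // License tier caps the model regardless of strategy
  if (tier === "free") return "small-fast";
  switch (strategy) {
    case "COST":
    case "SPEED":
      return "small-fast";
    case "QUALITY":
      return "large-best";
    case "COMPLEXITY":
      // Route simple queries cheaply, hard ones to the large model
      return complexity > 0.5 ? "large-best" : "small-fast";
  }
}
```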
4. Memory Layer
Two-Tier Storage:
Redis (Hot Storage)
- Active session state
- Recent conversation history (last 50 messages)
- LangGraph checkpoints (1 hour TTL)
- Fast reads for active conversations
Iceberg (Cold Storage)
- Full conversation history (partitioned by user_id, session_id)
- Checkpoint snapshots for replay
- Analytics and time-travel queries
- GDPR-compliant with compaction
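The two tiers behave like a read-through cache: hot reads hit Redis, and misses fall back to cold storage. Both store interfaces below are illustrative, not the actual Redis or Iceberg client APIs:

```typescript
// Read-through sketch across the two storage tiers.
interface Store { get(key: string): Promise<string[] | null>; }

async function loadHistory(
  hot: Store,   // Redis: last ~50 messages, TTL-bounded
  cold: Store,  // Iceberg: full history, partitioned by user/session
  sessionKey: string,
): Promise<string[]> {
  const recent = await hot.get(sessionKey);
  if (recent !== null) return recent;
  // Hot cache expired (e.g. idle session): rehydrate from cold storage
  return (await cold.get(sessionKey)) ?? [];
}
```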
5. Skills vs Subagents
Skills (gateway/src/harness/skills/)
Use for: Well-defined, specific tasks
- Market analysis
- Strategy validation
- Single-purpose capabilities
- Defined in markdown + TypeScript
Structure:
class MarketAnalysisSkill extends BaseSkill {
async execute(context, parameters) {
// Implementation
}
}
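The surrounding contract can be sketched as a base class plus a registry for dispatch. The signatures below mirror the example above but are assumptions, not the actual BaseSkill API:

```typescript
// Minimal sketch of the BaseSkill contract and a dispatch registry.
abstract class BaseSkill {
  abstract readonly name: string;
  abstract execute(
    context: unknown,
    parameters: Record<string, unknown>,
  ): Promise<string>;
}

class SkillRegistry {
  private skills = new Map<string, BaseSkill>();

  register(skill: BaseSkill): void {
    this.skills.set(skill.name, skill);
  }

  async run(
    name: string,
    context: unknown,
    params: Record<string, unknown>,
  ): Promise<string> {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    return skill.execute(context, params);
  }
}
```

A registry like this lets the harness expose each skill to the LLM as a tool without hard-coding the dispatch.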
Subagents (gateway/src/harness/subagents/)
Use for: Complex domain expertise with context
- Code reviewer with review guidelines
- Risk analyzer with risk models
- Multi-file knowledge base in memory/ directory
- Custom system prompts
Structure:
subagents/
code-reviewer/
config.yaml # Model, memory files, capabilities
system-prompt.md # Specialized instructions
memory/
review-guidelines.md
common-patterns.md
best-practices.md
index.ts
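A config.yaml for the code-reviewer subagent might look like the sketch below. The field names are assumptions for illustration, not the actual schema:

```yaml
# Hypothetical schema - illustrative only
name: code-reviewer
model: routed                # or pin a specific model id
system_prompt: system-prompt.md
memory:
  - memory/review-guidelines.md
  - memory/common-patterns.md
  - memory/best-practices.md
capabilities:
  - read_workspace
  - comment_on_code
```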
Recommendation: Prefer skills for most tasks. Use subagents when you need:
- Substantial domain-specific knowledge
- Multi-file context management
- Specialized system prompts
6. Workflows (gateway/src/harness/workflows/)
LangGraph state machines for multi-step orchestration:
Features:
- Validation loops (retry with fixes)
- Human-in-the-loop (approval gates)
- Error recovery
- State persistence via checkpoints
Example Workflows:
- Strategy validation: review → backtest → risk → approval
- Trading request: analysis → risk → approval → execute
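The validation-loop pattern can be sketched without the library as a bounded retry over workflow state. The real workflows use LangGraph.js state graphs with checkpointed state and interrupt-based approval gates; the state shape and auto-approval below are simplifications:

```typescript
// Dependency-free sketch of a validation loop with a retry edge.
interface WorkflowState {
  attempts: number;
  valid: boolean;
  approved: boolean;
}

async function strategyValidationWorkflow(
  validate: (s: WorkflowState) => Promise<boolean>,
  maxAttempts = 3,
): Promise<WorkflowState> {
  const state: WorkflowState = { attempts: 0, valid: false, approved: false };
  // Validation loop: retry with fixes until valid or attempts exhausted
  while (!state.valid && state.attempts < maxAttempts) {
    state.attempts++;
    state.valid = await validate(state);
  }
  // A human-in-the-loop gate would interrupt here and wait for approval;
  // the sketch auto-approves anything that validated
  state.approved = state.valid;
  return state;
}
```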
User Context Structure
Every interaction includes rich context:
interface UserContext {
userId: string;
sessionId: string;
license: UserLicense;
// Multi-channel support
activeChannel: {
type: 'websocket' | 'telegram' | 'slack' | 'discord';
channelUserId: string;
capabilities: {
supportsMarkdown: boolean;
supportsImages: boolean;
supportsButtons: boolean;
maxMessageLength: number;
};
};
// Retrieved from MCP + RAG
conversationHistory: BaseMessage[];
relevantMemories: MemoryChunk[];
workspaceState: WorkspaceContext;
}
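The activeChannel capabilities let the harness tailor one reply per channel. A sketch of that adaptation (the truncation marker and markdown-stripping regex are illustrative choices):

```typescript
// Sketch of adapting a reply to channel capabilities from UserContext.
interface ChannelCapabilities {
  supportsMarkdown: boolean;
  maxMessageLength: number;
}

function formatForChannel(text: string, caps: ChannelCapabilities): string {
  // Strip markdown emphasis for channels that render plain text only
  const body = caps.supportsMarkdown ? text : text.replace(/[*_`]/g, "");
  // Truncate to the channel's hard limit, reserving one char for the marker
  return body.length <= caps.maxMessageLength
    ? body
    : body.slice(0, caps.maxMessageLength - 1) + "…";
}
```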
User-Specific Files and Tools
User's MCP container provides access to:
Indicators (indicators/*.py)
- Custom technical indicators
- Pure functions: DataFrame → Series/DataFrame
- Version controlled in user's git repo
Strategies (strategies/*.py)
- Trading strategies with entry/exit rules
- Position sizing and risk management
- Backtestable and deployable
Watchlists
- Saved ticker lists
- Market monitoring
Preferences
- Trading style and risk tolerance
- Chart settings and colors
- Notification preferences
Executors (sub-strategies)
- Tactical order generators (TWAP, iceberg, etc.)
- Smart order routing
Container Lifecycle
User Container Creation
When user connects:
- Gateway checks if container exists (ContainerManager)
- If not, creates Kubernetes pod with:
- Agent container (Python + conda)
- Lifecycle sidecar (container management)
- Persistent volume (git repo)
- Waits for MCP server ready (~5-10s cold start)
- Establishes MCP connection
- Begins message processing
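The cold-start wait in the sequence above can be sketched as a bounded readiness poll. The probe function and timing defaults are stand-ins; real readiness comes from the lifecycle sidecar:

```typescript
// Polling sketch of the MCP cold-start wait.
async function waitForMcpReady(
  probe: () => Promise<boolean>, // e.g. a health-check request to the pod
  timeoutMs = 15_000,            // generous bound over the ~5-10s cold start
  intervalMs = 500,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await probe()) return; // MCP server is accepting connections
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("MCP container did not become ready in time");
}
```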
Container Shutdown
Free users: 15-minute idle timeout
Paid users: Longer timeout based on license
On shutdown:
- Graceful save of all state
- Persistent storage retained
- Fast restart on next connection
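The tier-dependent timeout can be sketched as a lookup. Only the free-tier 15-minute value comes from this document; the paid-tier values and tier names are placeholders:

```typescript
// Sketch of license-tier idle-timeout selection.
type LicenseTier = "free" | "standard" | "enterprise";

function idleTimeoutMs(tier: LicenseTier): number {
  switch (tier) {
    case "free":
      return 15 * 60_000; // 15 minutes (documented above)
    case "standard":
      return 60 * 60_000; // placeholder value
    case "enterprise":
      return 4 * 60 * 60_000; // placeholder value
  }
}
```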
MCP Authentication Modes
- Public Mode (Free tier): No auth, read-only, anonymous session
- Gateway Auth (Standard): Gateway authenticates, container trusts gateway
- Direct Auth (Enterprise): User authenticates directly with container
Implementation Status
✅ Completed
- Agent Harness with MCP integration
- Model routing with license tiers
- Document loader for global knowledge
- EmbeddingService (Ollama/OpenAI)
- Skills and subagents framework
- Multi-channel support (WebSocket, Telegram)
- Container lifecycle management
- Event system with ZeroMQ
🚧 In Progress
- Iceberg integration (checkpoint-saver, conversation-store)
- More subagents (risk-analyzer, market-analyst)
- LangGraph workflows with interrupts
- Platform tools (market data, charting)
📋 Planned
- File watcher for hot-reload in development
- Advanced RAG strategies (hybrid search, re-ranking)
- Caching layer for expensive operations
- Performance monitoring and metrics
References
- Implementation: gateway/src/harness/
- Documentation: gateway/src/harness/README.md
- Knowledge base: gateway/knowledge/
- LangGraph: https://langchain-ai.github.io/langgraphjs/
- MCP Spec: https://modelcontextprotocol.io/