# Agent Harness Architecture

The Agent Harness is the core orchestration layer for the Dexorder AI platform, built on LangChain.js and LangGraph.js.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                     Gateway (Fastify)                       │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │  WebSocket   │   │   Telegram   │   │    Event     │     │
│  │   Handler    │   │   Handler    │   │    Router    │     │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
│         │                  │                  │             │
│         └──────────────────┴──────────────────┘             │
│                            │                                │
│                    ┌───────▼────────┐                       │
│                    │ Agent Harness  │                       │
│                    │  (Stateless)   │                       │
│                    └───────┬────────┘                       │
│                            │                                │
│         ┌──────────────────┼──────────────────┐             │
│         │                  │                  │             │
│    ┌────▼─────┐       ┌────▼─────┐       ┌────▼─────┐       │
│    │   MCP    │       │   LLM    │       │   RAG    │       │
│    │ Connector│       │  Router  │       │ Retriever│       │
│    └────┬─────┘       └────┬─────┘       └────┬─────┘       │
│         │                  │                  │             │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
    ┌────────────┐     ┌───────────┐      ┌───────────┐
    │   User's   │     │    LLM    │      │  Qdrant   │
    │    MCP     │     │ Providers │      │ (Vectors) │
    │ Container  │     │(Anthropic,│      │           │
    │ (k8s pod)  │     │  OpenAI,  │      │  Global + │
    │            │     │   etc)    │      │   User    │
    └────────────┘     └───────────┘      └───────────┘
```

## Message Processing Flow

When a user sends a message:

```
1. Gateway receives message via channel (WebSocket/Telegram)
        ↓
2. Authenticator validates user and gets license info
        ↓
3. Container Manager ensures user's MCP container is running
        ↓
4. Agent Harness processes message:
   │
   ├─→ a. MCPClientConnector fetches context resources:
   │      - context://user-profile
   │      - context://conversation-summary
   │      - context://workspace-state
   │      - context://system-prompt
   │
   ├─→ b. RAGRetriever searches for relevant memories:
   │      - Embeds user query
   │      - Searches Qdrant: user_id = current_user OR user_id = "0"
   │      - Returns user-specific + global platform knowledge
   │
   ├─→ c. Build system prompt:
   │      - Base platform prompt
   │      - User profile context
   │      - Workspace state
   │      - Custom user instructions
   │      - Relevant RAG memories
   │
   ├─→ d. ModelRouter selects LLM:
   │      - Based on license tier
   │      - Query complexity
   │      - Configured routing strategy
   │
   ├─→ e. LLM invocation with tool support:
   │      - Send messages to LLM
   │      - If tool calls requested:
   │        • Platform tools → handled by gateway
   │        • User tools → proxied to MCP container
   │      - Loop until no more tool calls
   │
   ├─→ f. Save conversation to MCP:
   │      - mcp.callTool('save_message', user_message)
   │      - mcp.callTool('save_message', assistant_message)
   │
   └─→ g. Return response to user via channel
```
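The tool-call loop in step (e) can be sketched in TypeScript. This is a simplified stand-in under stated assumptions: the `LLM`/`ToolRunner` signatures and the `platform:` name prefix are illustrative, not the gateway's actual interfaces.

```typescript
// Illustrative types; the real gateway types will differ.
interface ToolCall { name: string; args: Record<string, unknown> }
interface LLMReply { content: string; toolCalls: ToolCall[] }

type LLM = (messages: string[]) => LLMReply;
type ToolRunner = (call: ToolCall) => string;

// Loop until the model stops requesting tools, dispatching each call to
// either the gateway (platform tools) or the user's MCP container.
function runToolLoop(
  llm: LLM,
  messages: string[],
  platformTool: ToolRunner,
  userTool: ToolRunner,
  maxIterations = 10,
): string {
  for (let i = 0; i < maxIterations; i++) {
    const reply = llm(messages);
    if (reply.toolCalls.length === 0) return reply.content; // no more tool calls
    for (const call of reply.toolCalls) {
      const result = call.name.startsWith('platform:') // hypothetical naming convention
        ? platformTool(call) // handled by gateway
        : userTool(call);    // proxied to MCP container
      messages = [...messages, `tool:${call.name}:${result}`];
    }
  }
  throw new Error('tool loop did not converge');
}
```

The `maxIterations` guard mirrors a common safeguard in agent loops: a model that keeps requesting tools should eventually be cut off rather than loop forever.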

## Core Components

### 1. Agent Harness (`gateway/src/harness/agent-harness.ts`)

**Stateless orchestrator** - all state lives in user's MCP container or RAG.

**Responsibilities:**

- Fetch context from user's MCP resources
- Query RAG for relevant memories
- Build prompts with full context
- Route to appropriate LLM
- Handle tool calls (platform vs user)
- Save conversation back to MCP
- Stream responses to user

**Key Methods:**

- `handleMessage()`: Process single message (non-streaming)
- `streamMessage()`: Process with streaming response
- `initialize()`: Connect to user's MCP server

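The `handleMessage()` orchestration order can be sketched with its collaborators injected. The `HarnessDeps` interface below is an illustrative simplification, not the harness's real types; it only shows the sequence: fetch MCP context, retrieve RAG memories, invoke the routed LLM, then save both sides of the exchange.

```typescript
// Hypothetical, minimal collaborator interface for the sketch.
interface HarnessDeps {
  fetchContext: () => Promise<string>;             // MCP context resources
  retrieve: (query: string) => Promise<string[]>;  // RAG memories
  invoke: (system: string, user: string) => Promise<string>; // routed LLM
  save: (role: string, text: string) => Promise<void>;       // back to MCP
}

async function handleMessage(deps: HarnessDeps, userMessage: string): Promise<string> {
  const context = await deps.fetchContext();          // a. MCP resources
  const memories = await deps.retrieve(userMessage);  // b. RAG search
  const systemPrompt = [context, ...memories].join('\n'); // c. build prompt
  const reply = await deps.invoke(systemPrompt, userMessage); // d/e. LLM
  await deps.save('user', userMessage);               // f. persist exchange
  await deps.save('assistant', reply);
  return reply;                                       // g. back to channel
}
```

Because every dependency is injected and all state lives behind `fetchContext`/`save`, the function itself stays stateless, matching the harness design.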
### 2. MCP Client Connector (`gateway/src/harness/mcp-client.ts`)

Connects to the user's MCP container using the Model Context Protocol.

**Features:**

- Resource reading (context://, indicators://, strategies://)
- Tool execution (save_message, run_backtest, etc.)
- Automatic reconnection on container restarts
- Error handling and fallbacks

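The automatic-reconnection behavior can be sketched as retry-with-exponential-backoff around a `connect()` call. The attempt count and delays below are illustrative defaults, not the connector's actual configuration.

```typescript
// Retry connect() with exponential backoff; useful when the user's MCP
// container is restarting and briefly unreachable.
async function connectWithRetry(
  connect: () => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 100,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await connect();
      return attempt; // number of attempts it took
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up, surface the error
      // Back off: 100ms, 200ms, 400ms, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

A real connector would also re-subscribe to resources after reconnecting; that part is omitted here.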
### 3. Model Router (`gateway/src/llm/router.ts`)

Routes queries to the appropriate LLM based on:

- **License tier**: Free users → smaller models, paid → better models
- **Complexity**: Simple queries → fast models, complex → powerful models
- **Cost optimization**: Balance performance vs cost

**Routing Strategies:**

- `COST`: Minimize cost
- `COMPLEXITY`: Match model to query complexity
- `SPEED`: Prioritize fast responses
- `QUALITY`: Best available model

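The routing decision can be sketched as a pure function over tier, strategy, and a complexity score. The tier names, model ids, and the 0.5 threshold are placeholders; the real router reads these from configuration.

```typescript
type Tier = 'free' | 'pro' | 'enterprise';       // placeholder tier names
type Strategy = 'COST' | 'COMPLEXITY' | 'SPEED' | 'QUALITY';

// complexity is a 0..1 score from some upstream classifier (assumed).
function routeModel(tier: Tier, strategy: Strategy, complexity: number): string {
  // License tier caps the choice: free users get the small model regardless.
  if (tier === 'free') return 'model-small';
  switch (strategy) {
    case 'COST': return 'model-small';     // minimize cost
    case 'SPEED': return 'model-fast';     // prioritize latency
    case 'QUALITY': return 'model-large';  // best available
    case 'COMPLEXITY':
      // Simple queries → fast model, complex queries → powerful model.
      return complexity > 0.5 ? 'model-large' : 'model-fast';
  }
}
```

Keeping the decision pure makes it trivial to unit-test each tier/strategy combination.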

### 4. Memory Layer

#### Three-Tier Storage:

**Redis** (Hot Storage)

- Active session state
- Recent conversation history (last 50 messages)
- LangGraph checkpoints (1 hour TTL)
- Fast reads for active conversations

**Qdrant** (Vector Search)

- Conversation embeddings
- User-specific memories (user_id = actual user ID)
- **Global platform knowledge** (user_id = "0")
- RAG retrieval with cosine similarity
- GDPR-compliant (indexed by user_id for fast deletion)

**Iceberg** (Cold Storage)

- Full conversation history (partitioned by user_id, session_id)
- Checkpoint snapshots for replay
- Analytics and time-travel queries
- GDPR-compliant with compaction

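The hot tier's "last 50 messages" policy can be sketched as a capped append. A `Map` stands in for Redis here (an assumption for the sketch); the real layer would use a Redis list with `LPUSH`/`LTRIM` and apply the 1-hour TTL to checkpoints.

```typescript
// Keep only the most recent messages per session in hot storage.
const HOT_HISTORY_LIMIT = 50;
const hotStore = new Map<string, string[]>(); // stand-in for Redis

function appendMessage(sessionId: string, message: string): void {
  const history = hotStore.get(sessionId) ?? [];
  history.push(message);
  // Trim from the front so only the newest HOT_HISTORY_LIMIT messages stay hot;
  // older messages live on in Qdrant/Iceberg, not here.
  hotStore.set(sessionId, history.slice(-HOT_HISTORY_LIMIT));
}
```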

#### RAG System:

**Global Knowledge** (user_id="0"):

- Platform capabilities and architecture
- Trading concepts and fundamentals
- Indicator development guides
- Strategy patterns and examples
- Loaded from `gateway/knowledge/` markdown files

**User Knowledge** (user_id=specific user):

- Personal conversation history
- Trading preferences and style
- Custom indicators and strategies
- Workspace state and context

**Query Flow:**

1. User query is embedded using EmbeddingService
2. Qdrant searches: `user_id IN (current_user, "0")`
3. Top-K relevant chunks returned
4. Added to LLM context automatically

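The filter from step 2 can be expressed in Qdrant's JSON filter format: a `should` clause acts as an OR over payload conditions, matching points whose `user_id` is either the current user or `"0"` (global knowledge). The builder below constructs that plain object; the exact retriever wiring is omitted.

```typescript
// Build the Qdrant payload filter: user_id IN (current_user, "0").
function buildRagFilter(userId: string) {
  return {
    should: [ // OR semantics in Qdrant's filter DSL
      { key: 'user_id', match: { value: userId } }, // user-specific memories
      { key: 'user_id', match: { value: '0' } },    // global platform knowledge
    ],
  };
}
```

Filtering on an indexed `user_id` payload field is also what makes GDPR deletion fast: all of a user's points can be selected (and removed) by the same key.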

### 5. Skills vs Subagents

#### Skills (`gateway/src/harness/skills/`)

**Use for**: Well-defined, specific tasks

- Market analysis
- Strategy validation
- Single-purpose capabilities
- Defined in markdown + TypeScript

**Structure:**

```typescript
class MarketAnalysisSkill extends BaseSkill {
  async execute(context, parameters) {
    // Implementation
  }
}
```


#### Subagents (`gateway/src/harness/subagents/`)

**Use for**: Complex domain expertise with context

- Code reviewer with review guidelines
- Risk analyzer with risk models
- Multi-file knowledge base in memory/ directory
- Custom system prompts

**Structure:**

```
subagents/
  code-reviewer/
    config.yaml          # Model, memory files, capabilities
    system-prompt.md     # Specialized instructions
    memory/
      review-guidelines.md
      common-patterns.md
      best-practices.md
    index.ts
```

**Recommendation**: Prefer skills for most tasks. Use subagents when you need:

- Substantial domain-specific knowledge
- Multi-file context management
- Specialized system prompts


### 6. Workflows (`gateway/src/harness/workflows/`)

LangGraph state machines for multi-step orchestration:

**Features:**

- Validation loops (retry with fixes)
- Human-in-the-loop (approval gates)
- Error recovery
- State persistence via checkpoints

**Example Workflows:**

- Strategy validation: review → backtest → risk → approval
- Trading request: analysis → risk → approval → execute

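The validation-loop pattern (retry with fixes) can be sketched in plain TypeScript. The real workflows express this as LangGraph nodes with persisted checkpoints; this stand-in only shows the control flow: generate, validate, and feed errors back into the next attempt.

```typescript
interface StepResult { ok: boolean; output: string }

// generate: produces a candidate, optionally informed by prior feedback.
// validate: checks the candidate, returning errors to retry with.
function runValidationLoop(
  generate: (feedback: string | null) => string,
  validate: (candidate: string) => StepResult,
  maxRetries = 3,
): string {
  let feedback: string | null = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const candidate = generate(feedback);
    const result = validate(candidate);
    if (result.ok) return candidate; // validation passed
    feedback = result.output;        // retry with the validator's errors
  }
  throw new Error('validation loop exhausted retries');
}
```

An approval gate (human-in-the-loop) would sit after this loop in a real workflow, implemented with a LangGraph interrupt rather than inline code.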

## User Context Structure

Every interaction includes rich context:

```typescript
interface UserContext {
  userId: string;
  sessionId: string;
  license: UserLicense;

  // Multi-channel support
  activeChannel: {
    type: 'websocket' | 'telegram' | 'slack' | 'discord';
    channelUserId: string;
    capabilities: {
      supportsMarkdown: boolean;
      supportsImages: boolean;
      supportsButtons: boolean;
      maxMessageLength: number;
    };
  };

  // Retrieved from MCP + RAG
  conversationHistory: BaseMessage[];
  relevantMemories: MemoryChunk[];
  workspaceState: WorkspaceContext;
}
```

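One way `activeChannel.capabilities` is used when delivering a reply: respect `maxMessageLength` (e.g. Telegram's limit) by splitting long responses into chunks. The function name and the naive fixed-width chunking are illustrative; a real sender would prefer to split on paragraph or sentence boundaries.

```typescript
// Split a reply into channel-sized chunks so no message exceeds the
// channel's maxMessageLength capability.
function splitForChannel(text: string, maxMessageLength: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxMessageLength) {
    chunks.push(text.slice(i, i + maxMessageLength));
  }
  return chunks.length > 0 ? chunks : ['']; // always send at least one message
}
```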
## User-Specific Files and Tools

The user's MCP container provides access to:

**Indicators** (`indicators/*.py`)

- Custom technical indicators
- Pure functions: DataFrame → Series/DataFrame
- Version controlled in the user's git repo

**Strategies** (`strategies/*.py`)

- Trading strategies with entry/exit rules
- Position sizing and risk management
- Backtestable and deployable

**Watchlists**

- Saved ticker lists
- Market monitoring

**Preferences**

- Trading style and risk tolerance
- Chart settings and colors
- Notification preferences

**Executors** (sub-strategies)

- Tactical order generators (TWAP, iceberg, etc.)
- Smart order routing

## Global Knowledge Management

### Document Loading

At gateway startup:

1. DocumentLoader scans `gateway/knowledge/` directory
2. Markdown files chunked by headers (~1000 tokens/chunk)
3. Embeddings generated via EmbeddingService
4. Stored in Qdrant with user_id="0"
5. Content hashing enables incremental updates

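Step 5's incremental update can be sketched as a content-hash check: hash each chunk, skip re-embedding when the stored hash matches. The `Map` stands in for looking up the hash persisted alongside the Qdrant point; only `node:crypto` is assumed.

```typescript
import { createHash } from 'node:crypto';

// Stand-in for the hash stored with each chunk's Qdrant payload.
const storedHashes = new Map<string, string>();

function contentHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// True when the chunk is new or changed and therefore needs re-embedding.
function needsReembedding(chunkId: string, text: string): boolean {
  const hash = contentHash(text);
  if (storedHashes.get(chunkId) === hash) return false; // unchanged: skip embedding
  storedHashes.set(chunkId, hash);
  return true;
}
```

This keeps reloads cheap: embedding calls are only paid for chunks whose markdown actually changed.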
### Directory Structure

```
gateway/knowledge/
├── platform/      # Platform capabilities
├── trading/       # Trading fundamentals
├── indicators/    # Indicator development
└── strategies/    # Strategy patterns
```

### Updating Knowledge

**Development:**

```bash
curl -X POST http://localhost:3000/admin/reload-knowledge
```

**Production:**

- Update markdown files
- Deploy new version
- Auto-loaded on startup

**Monitoring:**

```bash
curl http://localhost:3000/admin/knowledge-stats
```


## Container Lifecycle

### User Container Creation

When a user connects:

1. Gateway checks whether the container exists (ContainerManager)
2. If not, creates a Kubernetes pod with:
   - Agent container (Python + conda)
   - Lifecycle sidecar (container management)
   - Persistent volume (git repo)
3. Waits for the MCP server to be ready (~5-10s cold start)
4. Establishes MCP connection
5. Begins message processing

### Container Shutdown

**Free users:** 15-minute idle timeout

**Paid users:** Longer timeout based on license

**On shutdown:**

- Graceful save of all state
- Persistent storage retained
- Fast restart on next connection

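The idle-timeout policy can be sketched as a lookup by license tier. Only the free tier's 15 minutes comes from this document; the paid values and tier names below are placeholders for whatever the license actually specifies.

```typescript
// Idle minutes before a user's container is shut down.
function idleTimeoutMinutes(tier: 'free' | 'pro' | 'enterprise'): number {
  switch (tier) {
    case 'free': return 15;        // 15-minute idle timeout (documented)
    case 'pro': return 60;         // placeholder value
    case 'enterprise': return 240; // placeholder value
  }
}
```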

### MCP Authentication Modes

1. **Public Mode** (Free tier): No auth, read-only, anonymous session
2. **Gateway Auth** (Standard): Gateway authenticates, container trusts gateway
3. **Direct Auth** (Enterprise): User authenticates directly with container

## Implementation Status

### ✅ Completed

- Agent Harness with MCP integration
- Model routing with license tiers
- RAG retriever with Qdrant
- Document loader for global knowledge
- EmbeddingService (Ollama/OpenAI)
- Skills and subagents framework
- Multi-channel support (WebSocket, Telegram)
- Container lifecycle management
- Event system with ZeroMQ

### 🚧 In Progress

- Iceberg integration (checkpoint-saver, conversation-store)
- More subagents (risk-analyzer, market-analyst)
- LangGraph workflows with interrupts
- Platform tools (market data, charting)

### 📋 Planned

- File watcher for hot-reload in development
- Advanced RAG strategies (hybrid search, re-ranking)
- Caching layer for expensive operations
- Performance monitoring and metrics

## References

- Implementation: `gateway/src/harness/`
- Documentation: `gateway/src/harness/README.md`
- Knowledge base: `gateway/knowledge/`
- LangGraph: https://langchain-ai.github.io/langgraphjs/
- Qdrant: https://qdrant.tech/documentation/
- MCP Spec: https://modelcontextprotocol.io/