
Agent Harness

Comprehensive agent orchestration system for the Dexorder AI platform, built on LangChain.js and LangGraph.js.

Architecture Overview

```
gateway/src/harness/
├── memory/              # Storage layer (Redis + Iceberg + Qdrant)
├── skills/              # Individual capabilities (markdown + TypeScript)
├── subagents/           # Specialized agents with multi-file memory
├── workflows/           # LangGraph state machines
├── tools/               # Platform tools (non-MCP)
├── config/              # Configuration files
└── index.ts             # Main exports
```

Core Components

1. Memory Layer (memory/)

A tiered storage architecture:

  • Redis: Hot state (active sessions, checkpoints)
  • Iceberg: Cold storage (durable conversations, analytics)
  • Qdrant: Vector search (RAG, semantic memory)

Key Files:

  • checkpoint-saver.ts: LangGraph checkpoint persistence
  • conversation-store.ts: Message history management
  • rag-retriever.ts: Vector similarity search
  • embedding-service.ts: Text→vector conversion
  • session-context.ts: User context with channel metadata
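
The tiers combine into a simple read path: check the hot tier first, fall back to cold storage on a miss. A minimal sketch, assuming a hypothetical `MessageStore` interface (the actual store APIs differ):

```typescript
// Illustrative tiered read path; MessageStore and loadConversation are
// hypothetical names, not the real harness APIs.
interface MessageStore {
  get(sessionId: string): Promise<string[] | null>;
}

async function loadConversation(
  hot: MessageStore,   // e.g. backed by Redis
  cold: MessageStore,  // e.g. backed by Iceberg
  sessionId: string
): Promise<string[]> {
  const cached = await hot.get(sessionId);
  if (cached) return cached;                 // hot hit: active session
  return (await cold.get(sessionId)) ?? [];  // cold fallback: durable history
}
```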

2. Skills (skills/)

Self-contained capabilities with markdown definitions:

  • *.skill.md: Human-readable documentation
  • *.ts: Implementation extending BaseSkill
  • Input validation and error handling
  • Can use LLM, MCP tools, or platform tools

Example:

```typescript
import { MarketAnalysisSkill } from './skills';

const skill = new MarketAnalysisSkill(logger, model);
const result = await skill.execute({
  context: userContext,
  parameters: { ticker: 'BTC/USDT', period: '4h' }
});
```

See skills/README.md for authoring guide.

3. Subagents (subagents/)

Specialized agents with multi-file memory:

```
subagents/
  code-reviewer/
    config.yaml              # Model, memory files, capabilities
    system-prompt.md         # System instructions
    memory/                  # Multi-file knowledge base
      review-guidelines.md
      common-patterns.md
      best-practices.md
    index.ts                 # Implementation
```

Features:

  • Dedicated system prompts
  • Memory split across logical files for better organization
  • Per-subagent model overrides
  • Capability tagging

Example:

```typescript
const codeReviewer = await createCodeReviewerSubagent(model, logger, basePath);
const review = await codeReviewer.execute({ userContext }, strategyCode);
```
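
The features above imply a parsed shape for each subagent's config.yaml. A hedged sketch of that shape as a TypeScript type; the field names are assumptions for illustration, not the actual schema:

```typescript
// Hypothetical parsed form of a subagent's config.yaml; field names
// are illustrative, not the real schema.
interface SubagentConfig {
  name: string;
  model?: string;            // optional per-subagent model override
  memoryFiles: string[];     // markdown files loaded into context
  capabilities: string[];    // tags consumed by subagent-routing.yaml
}

const codeReviewerConfig: SubagentConfig = {
  name: 'code-reviewer',
  model: 'claude-sonnet',    // illustrative model id
  memoryFiles: [
    'memory/review-guidelines.md',
    'memory/common-patterns.md',
    'memory/best-practices.md',
  ],
  capabilities: ['code-review', 'typescript'],
};
```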

4. Workflows (workflows/)

LangGraph state machines with:

  • Validation loops (retry with fixes)
  • Human-in-the-loop (approval gates)
  • Multi-step orchestration
  • Error recovery

Example Workflows:

  • strategy-validation/: Code review → backtest → risk → approval
  • trading-request/: Analysis → risk → approval → execute

See individual workflow READMEs for details.

5. Configuration (config/)

YAML-based configuration:

  • models.yaml: LLM providers, routing, rate limits
  • subagent-routing.yaml: When to use which subagent

User Context

Enhanced session context with channel awareness for multi-channel support:

```typescript
interface UserContext {
  userId: string;
  sessionId: string;
  license: UserLicense;

  activeChannel: {
    type: 'websocket' | 'telegram' | 'slack' | 'discord';
    channelUserId: string;
    capabilities: {
      supportsMarkdown: boolean;
      supportsImages: boolean;
      supportsButtons: boolean;
      maxMessageLength: number;
    };
  };

  conversationHistory: BaseMessage[];
  relevantMemories: MemoryChunk[];
  workspaceState: WorkspaceContext;
}
```

This allows workflows to:

  • Route responses to correct channel
  • Format output for channel capabilities
  • Handle channel-specific interactions (buttons, voice, etc.)
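
For example, capability-aware formatting can be a small pure function over `activeChannel.capabilities`. A minimal sketch; the markdown-stripping and truncation rules here are illustrative, not the harness's actual formatter:

```typescript
// Sketch: adapt an outgoing message to the active channel's capabilities.
// The stripping/truncation rules are illustrative only.
interface ChannelCapabilities {
  supportsMarkdown: boolean;
  maxMessageLength: number;
}

function formatForChannel(text: string, caps: ChannelCapabilities): string {
  let out = caps.supportsMarkdown
    ? text
    : text.replace(/[*_`#]/g, '');   // naive markdown strip
  if (out.length > caps.maxMessageLength) {
    out = out.slice(0, caps.maxMessageLength - 1) + '…';
  }
  return out;
}
```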

Storage Architecture

Based on harness-rag.txt discussion:

Hot Path (Redis)

  • Active checkpoints (TTL: 1 hour)
  • Recent messages (last 50)
  • Session metadata
  • Fast reads for active conversations
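
The hot-path bookkeeping can be reduced to a key-naming convention and a sliding window over recent messages. A sketch under stated assumptions: in the real store these would map to Redis list operations (LPUSH/LTRIM) with a one-hour TTL; the key format and helper names here are illustrative:

```typescript
// Illustrative hot-path helpers; key format and names are assumptions.
const MAX_RECENT = 50;

function sessionKey(userId: string, sessionId: string): string {
  return `session:${userId}:${sessionId}:messages`;
}

// Keep only the most recent MAX_RECENT messages (the Redis equivalent
// would be LPUSH followed by LTRIM).
function appendWithWindow(messages: string[], next: string): string[] {
  const out = [...messages, next];
  return out.length > MAX_RECENT ? out.slice(out.length - MAX_RECENT) : out;
}
```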

Cold Path (Iceberg)

  • Full conversation history (partitioned by user_id, session_id)
  • Checkpoint snapshots
  • Time-travel queries
  • GDPR-compliant deletion with compaction

Vector Search (Qdrant)

  • Conversation embeddings
  • Long-term memory
  • RAG retrieval
  • Payload-indexed by user_id for fast GDPR deletion
  • Global knowledge base (user_id="0") loaded from markdown files

GDPR Compliance

```typescript
// Delete user data across all stores
await conversationStore.deleteUserData(userId);
await ragRetriever.deleteUserData(userId);
await checkpointSaver.delete(userId);
await containerManager.deleteContainer(userId);

// Iceberg physical delete
await icebergTable.expire_snapshots();
await icebergTable.rewrite_data_files();
```

Standard Patterns

Validation Loop (Retry with Fixes)

```typescript
graph.addConditionalEdges('validate', (state) => {
  if (state.errors.length > 0 && state.retryCount < 3) {
    return 'fix_errors';  // Loop back
  }
  return state.errors.length === 0 ? 'approve' : 'reject';
});
```

Human-in-the-Loop (Approval Gates)

```typescript
const approvalNode = async (state) => {
  // Send to user's channel
  await sendToChannel(state.userContext.activeChannel, {
    type: 'approval_request',
    data: { /* details */ }
  });

  // LangGraph pauses here via Interrupt
  // Resume with user input: graph.invoke(state, { ...resumeConfig })

  return { approvalRequested: true };
};
```

Getting Started

1. Install Dependencies

Already in gateway/package.json:

```json
{
  "@langchain/core": "^0.3.24",
  "@langchain/langgraph": "^0.2.26",
  "@langchain/anthropic": "^0.3.8",
  "ioredis": "^5.4.2"
}
```

2. Initialize Memory Layer

```typescript
import Redis from 'ioredis';
import {
  TieredCheckpointSaver,
  ConversationStore,
  EmbeddingService,
  RAGRetriever
} from './harness/memory';

const redis = new Redis(process.env.REDIS_URL);

const checkpointSaver = new TieredCheckpointSaver(redis, logger);
const conversationStore = new ConversationStore(redis, logger);
const embeddings = new EmbeddingService({ provider: 'openai', apiKey }, logger);
const ragRetriever = new RAGRetriever({ url: QDRANT_URL }, logger);

await ragRetriever.initialize();
```

3. Create Subagents

```typescript
import { createCodeReviewerSubagent } from './harness/subagents';
import { ModelRouter } from './llm/router';

// modelRouter: an instance of ModelRouter
const model = await modelRouter.route(query, license);
const codeReviewer = await createCodeReviewerSubagent(
  model,
  logger,
  'gateway/src/harness/subagents/code-reviewer'
);
```

4. Build Workflows

```typescript
import { createStrategyValidationWorkflow } from './harness/workflows';

const workflow = await createStrategyValidationWorkflow(
  model,
  codeReviewer,
  mcpBacktestFn,
  logger,
  'gateway/src/harness/workflows/strategy-validation/config.yaml'
);

const result = await workflow.execute({
  userContext,
  strategyCode: '...',
  ticker: 'BTC/USDT',
  timeframe: '4h'
});
```

5. Use Skills

```typescript
import { MarketAnalysisSkill } from './harness/skills';

const skill = new MarketAnalysisSkill(logger, model);
const analysis = await skill.execute({
  context: userContext,
  parameters: { ticker: 'BTC/USDT', period: '1h' }
});
```

Global Knowledge System

The harness includes a document loader that automatically loads markdown files from gateway/knowledge/ into Qdrant as global knowledge (user_id="0").

Directory Structure

```
gateway/knowledge/
  ├── platform/          # Platform capabilities and architecture
  ├── trading/           # Trading concepts and fundamentals
  ├── indicators/        # Indicator development guides
  └── strategies/        # Strategy patterns and examples
```

How It Works

  1. Startup: Documents are loaded automatically when the gateway starts
  2. Chunking: Intelligent splitting by markdown headers (~1000 tokens/chunk)
  3. Embedding: Chunks are embedded using the configured embedding service
  4. Storage: Stored in Qdrant with user_id="0" (global namespace)
  5. Updates: Content hashing detects changes for incremental updates
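
The header-based chunking step can be sketched as a small splitter. A minimal illustration only: it splits at first- and second-level headings and omits the ~1000-token budget enforcement that the real loader applies:

```typescript
// Minimal header-based chunker (illustrative; the real loader also
// enforces a token budget per chunk).
function chunkByHeaders(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split('\n')) {
    // Start a new chunk at each '#' or '##' heading
    if (/^#{1,2}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n').trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join('\n').trim());
  return chunks.filter((c) => c.length > 0);
}
```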

RAG Query Flow

When a user sends a message:

  1. Query is embedded using the same embedding service
  2. Qdrant searches vectors with the filter: user_id = current_user OR user_id = "0"
  3. Results include both user-specific and global knowledge
  4. Relevant chunks are added to LLM context
  5. LLM generates response with platform knowledge
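
The "current user OR global" condition in step 2 maps onto Qdrant's JSON filter format as a `should` clause over the `user_id` payload field. A sketch of building that filter (the helper name is illustrative; the object shape follows Qdrant's filter DSL):

```typescript
// Build a Qdrant filter matching points whose user_id payload is either
// the current user or "0" (the global knowledge namespace).
function userOrGlobalFilter(userId: string) {
  return {
    should: [
      { key: 'user_id', match: { value: userId } },
      { key: 'user_id', match: { value: '0' } },
    ],
  };
}
```

The returned object would be passed as the `filter` option of a Qdrant search call.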

Managing Knowledge

Add new documents:

```bash
# Create markdown file in appropriate directory
echo "# New Topic" > gateway/knowledge/platform/new-topic.md

# Reload knowledge (development)
curl -X POST http://localhost:3000/admin/reload-knowledge
```

Check stats:

```bash
curl http://localhost:3000/admin/knowledge-stats
```

In production: just deploy updated markdown files; they'll be loaded on startup.

See gateway/knowledge/README.md for detailed documentation.

Next Steps

  1. Implement Iceberg Integration: Complete TODOs in checkpoint-saver.ts and conversation-store.ts
  2. Add More Subagents: Risk analyzer, market analyst, etc.
  3. Implement Interrupts: Full human-in-the-loop with LangGraph interrupts
  4. Add Platform Tools: Market data queries, chart rendering, etc.
  5. Expand Knowledge Base: Add more platform documentation to knowledge/

References