ai/gateway/ARCHITECTURE.md

Gateway Architecture: LangChain.js + LangGraph

Why LangChain.js (Not Vercel AI SDK or Direct Anthropic SDK)?

The Decision

After evaluating Vercel AI SDK and LangChain.js, we chose LangChain.js + LangGraph for these reasons:

  1. Multi-model support: 300+ models via OpenRouter, plus direct integrations
  2. Complex workflows: LangGraph for stateful trading analysis pipelines
  3. No vendor lock-in: Switch between Anthropic, OpenAI, Google with one line
  4. Streaming: Same as Vercel AI SDK (.stream() method)
  5. Tool calling: Unified across all providers
  6. Trading-specific: State management, conditional branching, human-in-the-loop

We don't need Vercel AI SDK because:

  • We use Vue (not React), so we don't need React hooks
  • We run Node.js servers (not edge), so we don't need the edge runtime
  • We DO need complex workflows (strategy analysis, backtesting, approvals)
  • We DO need stateful execution (resume from failures)

Architecture Layers

Layer 1: Model Abstraction (src/llm/)

Provider Factory (provider.ts)

const factory = new LLMProviderFactory(config, logger);

// Create any model
const claude = factory.createModel({
  provider: 'anthropic',
  model: 'claude-3-5-sonnet-20241022',
});

const gpt4 = factory.createModel({
  provider: 'openai',
  model: 'gpt-4o',
});

Model Router (router.ts)

const router = new ModelRouter(factory, logger);

// Intelligently route based on:
// - User license (free → Gemini Flash, pro → GPT-4o, enterprise → Claude)
// - Query complexity (simple → cheap, complex → smart)
// - User preference (if set in license.preferredModel)
// - Cost optimization (always use cheapest)

const model = await router.route(
  message.content,
  userLicense,
  RoutingStrategy.COMPLEXITY
);

Layer 2: Agent Harness (src/harness/)

Stateless Orchestrator

The harness holds ZERO conversation state. Everything lives in the user's MCP container.

Flow:

async handleMessage(message: InboundMessage) {
  // 1. Fetch context from the user's MCP server (resources, not tools)
  const resources = await mcpClient.listResources(); // discover what the container exposes
  const context = await Promise.all([
    mcpClient.readResource('context://user-profile'),         // Trading style
    mcpClient.readResource('context://conversation-summary'), // RAG summary
    mcpClient.readResource('context://workspace-state'),      // Current chart
    mcpClient.readResource('context://system-prompt'),        // Custom instructions
  ]);

  // 2. Route to the appropriate model
  const model = await modelRouter.route(message.content, license);

  // 3. Build messages with embedded context
  const messages = buildLangChainMessages(systemPrompt, context);

  // 4. Call the LLM
  const response = await model.invoke(messages);

  // 5. Persist both turns to the user's MCP server (tool calls)
  await mcpClient.callTool('save_message', { role: 'user', content: message.content });
  await mcpClient.callTool('save_message', { role: 'assistant', content: response.content });

  return response;
}
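The `buildLangChainMessages` helper is not shown in this doc; a minimal self-contained sketch of what it could look like follows. The `ChatMessage`/`ContextResource` shapes and the three-argument signature are illustrative assumptions — the real helper returns LangChain message objects:

```typescript
// Hypothetical sketch: embed MCP resources into the system message before the
// LLM call. Types and signature are assumptions, not the gateway's real API.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type ContextResource = { uri: string; text: string };

function buildLangChainMessages(
  systemPrompt: string,
  context: ContextResource[],
  userText: string,
): ChatMessage[] {
  // Label each resource with its URI so the model can tell the sources apart
  const contextBlock = context
    .map((r) => `## ${r.uri}\n${r.text}`)
    .join('\n\n');

  return [
    { role: 'system', content: `${systemPrompt}\n\n# Context\n${contextBlock}` },
    { role: 'user', content: userText },
  ];
}
```

Because all context rides in the system message, the harness stays stateless: nothing needs to be remembered between calls.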

Streaming variant:

async *streamMessage(message: InboundMessage) {
  const model = await modelRouter.route(message.content, license);
  const messages = buildMessages(context, message); // context fetched as in handleMessage

  const stream = await model.stream(messages);

  let fullResponse = '';
  for await (const chunk of stream) {
    fullResponse += chunk.content;
    yield chunk.content; // Stream to WebSocket/Telegram
  }

  // Persist only after streaming completes
  await mcpClient.callTool('save_message', { /* ... */ });
}
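On the transport side, a consumer drains the generator and forwards each chunk as it arrives. A minimal sketch, with `fakeStream` standing in for the model stream and `send` as a placeholder for the real WebSocket/Telegram write (both names are illustrative):

```typescript
// Stand-in for the chunks yielded by streamMessage
async function* fakeStream(): AsyncGenerator<string> {
  yield 'RSI is ';
  yield 'currently 62.';
}

// Forward chunks to a transport as they arrive, and return the full text
// afterwards (e.g. for logging or persistence once streaming completes).
async function pipeToTransport(
  stream: AsyncGenerator<string>,
  send: (chunk: string) => void,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk;
    send(chunk); // each token batch reaches the user immediately
  }
  return full;
}
```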

Layer 3: Workflows (src/workflows/)

LangGraph for Complex Trading Analysis

// Example: Strategy Analysis Pipeline
const workflow = new StateGraph(StrategyAnalysisState)
  .addNode('code_review', async (state) => {
    const model = new ChatAnthropic({ model: 'claude-3-opus' });
    const review = await model.invoke(`Review: ${state.strategyCode}`);
    return { codeReview: review.content };
  })
  .addNode('backtest', async (state) => {
    // Call the user's MCP backtest tool
    const results = await mcpClient.callTool('run_backtest', {
      strategy: state.strategyCode,
      ticker: state.ticker,
    });
    return { backtestResults: results };
  })
  .addNode('risk_assessment', async (state) => {
    const model = new ChatAnthropic({ model: 'claude-3-5-sonnet' });
    const assessment = await model.invoke(
      `Analyze risk: ${JSON.stringify(state.backtestResults)}`
    );
    return { riskAssessment: assessment.content };
  })
  .addNode('human_approval', async (state) => {
    // Pause for user review (human-in-the-loop)
    return { humanApproved: await waitForUserApproval(state) };
  })
  .addEdge(START, 'code_review')
  .addEdge('code_review', 'backtest')
  .addEdge('backtest', 'risk_assessment')
  .addEdge('risk_assessment', 'human_approval')
  .addConditionalEdges('human_approval', (state) => {
    // 'deploy' and 'reject' nodes omitted here for brevity
    return state.humanApproved ? 'deploy' : 'reject';
  })
  .compile();

// Execute
const result = await workflow.invoke({
  strategyCode: userCode,
  ticker: 'BTC/USDT',
  timeframe: '1h',
});

Benefits:

  • Stateful: Resume if the server crashes mid-analysis
  • Conditional: Route based on results (e.g. if Sharpe > 2 → deploy, else → reject)
  • Human-in-the-loop: Pause for user approval
  • Multi-step: Each node can use a different model
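The conditional-routing benefit can be sketched as a plain predicate. The Sharpe threshold and state shape here are illustrative assumptions, not the shipped logic:

```typescript
// Hypothetical predicate for the conditional edge: require both a strong
// risk-adjusted return AND explicit human approval before deploying.
type AnalysisState = {
  backtestResults: { sharpe: number };
  humanApproved: boolean;
};

function nextNode(state: AnalysisState): 'deploy' | 'reject' {
  const strongSharpe = state.backtestResults.sharpe > 2; // assumed threshold
  return strongSharpe && state.humanApproved ? 'deploy' : 'reject';
}
```

Keeping the predicate pure (no I/O) makes the routing decision trivially testable outside the graph.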

User Context Architecture

MCP Resources (Not Tools)

User's MCP server exposes resources (read-only context):

context://user-profile          → Trading style, preferences
context://conversation-summary  → RAG-generated summary
context://workspace-state       → Current chart, positions
context://system-prompt         → User's custom AI instructions

Gateway fetches and embeds in LLM call:

const userProfile = await mcpClient.readResource('context://user-profile');
const conversationSummary = await mcpClient.readResource('context://conversation-summary');

// User's MCP server runs RAG search and returns summary
// Gateway embeds this in Claude/GPT prompt

Why resources, not tools?

  • Resources = context injection (read-only)
  • Tools = actions (write operations)
  • Context should be fetched before LLM call, not during

Model Routing Strategies

1. User Preference

// User's license has preferred model
{
  "preferredModel": {
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-20241022"
  }
}

// Router uses this if set

2. Complexity-Based

const isComplex = message.includes('backtest') || message.length > 200;

if (isComplex) {
  return { provider: 'anthropic', model: 'claude-3-opus' }; // Smart
} else {
  return { provider: 'openai', model: 'gpt-4o-mini' }; // Fast
}

3. License Tier

switch (license.licenseType) {
  case 'free':
    return { provider: 'google', model: 'gemini-2.0-flash-exp' }; // Cheap
  case 'pro':
    return { provider: 'openai', model: 'gpt-4o' }; // Balanced
  case 'enterprise':
    return { provider: 'anthropic', model: 'claude-3-5-sonnet' }; // Premium
  default:
    return { provider: 'google', model: 'gemini-2.0-flash-exp' }; // Unknown tier → cheapest
}

4. Cost-Optimized

return { provider: 'google', model: 'gemini-2.0-flash-exp' }; // Always cheapest
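The four strategies naturally compose in precedence order: explicit user preference first, then license tier, then complexity. A self-contained sketch under that assumption — the `License`/`ModelChoice` types and precedence are illustrative, not the router's real implementation:

```typescript
type ModelChoice = { provider: string; model: string };
type License = {
  licenseType: 'free' | 'pro' | 'enterprise';
  preferredModel?: ModelChoice;
};

function routeModel(message: string, license: License): ModelChoice {
  // 1. An explicit user preference always wins
  if (license.preferredModel) return license.preferredModel;

  // 2. The free tier is pinned to the cheapest model
  if (license.licenseType === 'free') {
    return { provider: 'google', model: 'gemini-2.0-flash-exp' };
  }

  // 3. Paid tiers escalate on query complexity
  const isComplex = message.includes('backtest') || message.length > 200;
  return isComplex
    ? { provider: 'anthropic', model: 'claude-3-5-sonnet' }
    : { provider: 'openai', model: 'gpt-4o-mini' };
}
```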

When to Use What

Simple Chat → Agent Harness

// User: "What's the RSI on BTC?"
// → Fast streaming response via harness.streamMessage()

Complex Analysis → LangGraph Workflow

// User: "Analyze this strategy and backtest it"
// → Multi-step workflow: code review → backtest → risk → approval

Direct Tool Call → MCP Client

// User: "Get my watchlist"
// → Direct MCP tool call, no LLM needed
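The three routes above imply a dispatcher in front of them. A keyword-based sketch — the real gateway presumably uses richer intent detection, and all names here are illustrative:

```typescript
type Route = 'harness' | 'workflow' | 'mcp_tool';

function dispatch(message: string): Route {
  const text = message.toLowerCase();

  // Known direct commands skip the LLM entirely
  if (/^(get|show) my (watchlist|positions)/.test(text)) return 'mcp_tool';

  // Multi-step analysis goes to a LangGraph workflow
  if (text.includes('backtest')) return 'workflow';
  if (text.includes('analyze') && text.includes('strategy')) return 'workflow';

  // Everything else: fast streaming chat via the harness
  return 'harness';
}
```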

Data Flow

User Message ("Analyze my strategy")
    ↓
Gateway → Route to workflow (not harness)
    ↓
LangGraph Workflow:
  ├─ Node 1: Code Review (Claude Opus)
  │   └─ Analyzes strategy code
  ├─ Node 2: Backtest (MCP tool call)
  │   └─ User's container runs backtest
  ├─ Node 3: Risk Assessment (Claude Sonnet)
  │   └─ Evaluates results
  ├─ Node 4: Human Approval (pause)
  │   └─ User reviews in UI
  └─ Node 5: Recommendation (GPT-4o-mini)
      └─ Final decision

Result → Return to user

Benefits Summary

| Feature           | LangChain.js   | Vercel AI SDK     | Direct Anthropic SDK |
|-------------------|----------------|-------------------|----------------------|
| Multi-model       | 300+ models    | 100+ models       | Anthropic only       |
| Streaming         | .stream()      | streamText()      | .stream()            |
| Tool calling      | Unified        | Unified           | Anthropic format     |
| Complex workflows | LangGraph      | Limited           | DIY                  |
| Stateful agents   | LangGraph      | No                | No                   |
| Human-in-the-loop | LangGraph      | No                | No                   |
| React hooks       | N/A            | useChat()         | N/A                  |
| Bundle size       | Large (~101 kB) | Small (~30 kB)   | Medium (~60 kB)      |
| Dexorder needs    | Perfect fit    | Missing workflows | Vendor lock-in       |

Next Steps

  1. Implement tool calling in agent harness (bind MCP tools to LangChain)
  2. Add state persistence for LangGraph (PostgreSQL checkpointer)
  3. Build more workflows: market scanner, portfolio optimizer
  4. Add monitoring: Track model usage, costs, latency
  5. User container: Implement Python MCP server with resources