Dexorder Gateway

Multi-channel gateway with agent harness for the Dexorder AI platform.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Platform Gateway                      │
│                   (Node.js/Fastify)                      │
│                                                          │
│  ┌────────────────────────────────────────────────┐    │
│  │  Channels                                       │    │
│  │  - WebSocket (/ws/chat)                         │    │
│  │  - Telegram Webhook (/webhook/telegram)        │    │
│  └────────────────────────────────────────────────┘    │
│                         ↕                                │
│  ┌────────────────────────────────────────────────┐    │
│  │  Authenticator                                  │    │
│  │  - JWT verification (WebSocket)                 │    │
│  │  - Channel linking (Telegram)                   │    │
│  │  - User license lookup (PostgreSQL)             │    │
│  └────────────────────────────────────────────────┘    │
│                         ↕                                │
│  ┌────────────────────────────────────────────────┐    │
│  │  Agent Harness (per-session)                    │    │
│  │  - Claude API integration                       │    │
│  │  - MCP client connector                         │    │
│  │  - Conversation state                           │    │
│  └────────────────────────────────────────────────┘    │
│                         ↕                                │
│  ┌────────────────────────────────────────────────┐    │
│  │  MCP Client                                     │    │
│  │  - User container connection                    │    │
│  │  - Tool routing                                 │    │
│  └────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                          ↕
          ┌─────────────────────────────────┐
          │  User MCP Server (Python)       │
          │  - Strategies, indicators       │
          │  - Memory, preferences          │
          │  - Backtest sandbox             │
          └─────────────────────────────────┘

Features

  • Automatic container provisioning: Creates user agent containers on-demand via Kubernetes
  • Multi-channel support: WebSocket and Telegram webhooks
  • Per-channel authentication: JWT for web, channel linking for chat apps
  • User license management: Feature flags and resource limits from PostgreSQL
  • Container lifecycle management: Auto-shutdown on idle (handled by container sidecar)
  • License-based resources: Different memory/CPU/storage limits per tier
  • Multi-model LLM support: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
  • Zero vendor lock-in: Switch models with one line, powered by LangChain.js
  • Intelligent routing: Auto-select models based on complexity, license tier, or user preference
  • Streaming responses: Real-time chat with WebSocket and Telegram
  • Complex workflows: LangGraph for stateful trading analysis (backtest → risk → approval)
  • Agent harness: Stateless orchestrator (all context lives in user's MCP container)
  • MCP resource integration: User's RAG, conversation history, and preferences
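
The "intelligent routing" feature can be made concrete with a small sketch. The tier names match the license tiers described below, but the model IDs and the complexity threshold are illustrative, not the gateway's actual routing table:

```typescript
// Illustrative model router: picks a model from license tier and a
// complexity score in [0, 1]. The model IDs and the 0.7 threshold are
// examples only, not the gateway's real configuration.
type Tier = "free" | "pro" | "enterprise";

function selectModel(tier: Tier, complexity: number): string {
  if (tier === "free") return "claude-3-5-haiku-20241022"; // free tier stays on a small model
  if (tier === "enterprise" || complexity > 0.7) return "claude-3-5-sonnet-20241022";
  return "claude-3-5-haiku-20241022"; // simple pro-tier tasks also use the small model
}
```

In practice the routing table would live in configuration (or per-user preferences) rather than in code, so tiers and models can change without a redeploy.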

Container Management

When a user authenticates, the gateway:

  1. Checks for existing container: Queries Kubernetes for deployment
  2. Creates if missing: Renders YAML template based on license tier
  3. Waits for ready: Polls deployment status until healthy
  4. Returns MCP endpoint: Computed from service name
  5. Connects to MCP server: Proceeds with normal authentication flow
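
The five steps above can be sketched as follows. The KubeClient interface, the template-rendering call, and the endpoint format are assumptions standing in for the gateway's actual Kubernetes integration:

```typescript
// Sketch of the provisioning flow, with the Kubernetes calls abstracted
// behind an assumed KubeClient interface.
interface KubeClient {
  getDeployment(name: string): Promise<{ readyReplicas: number } | null>;
  applyTemplate(name: string, tier: string): Promise<void>;
}

async function ensureUserContainer(
  kube: KubeClient,
  userId: string,
  tier: "free" | "pro" | "enterprise",
  pollMs = 1000,
  maxPolls = 60
): Promise<string> {
  const name = `agent-${userId}`; // deployment naming scheme is hypothetical

  // 1. Check for an existing deployment
  let dep = await kube.getDeployment(name);

  // 2. Create if missing, rendering the tier-specific template
  if (dep === null) {
    await kube.applyTemplate(name, tier);
  }

  // 3. Poll until the deployment reports a ready replica
  for (let i = 0; i < maxPolls; i++) {
    dep = await kube.getDeployment(name);
    if (dep && dep.readyReplicas > 0) {
      // 4./5. Return the MCP endpoint computed from the service name;
      // the caller then connects and continues the auth flow.
      return `http://${name}.user-agents.svc.cluster.local:8080/mcp`;
    }
    await new Promise((r) => setTimeout(r, pollMs));
  }
  throw new Error(`container ${name} not ready after ${maxPolls} polls`);
}
```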

Container templates by license tier:

Tier        Memory  CPU    Storage  Idle Timeout
Free        512Mi   500m   1Gi      15min
Pro         2Gi     2000m  10Gi     60min
Enterprise  4Gi     4000m  50Gi     Never

Containers self-manage their lifecycle using the lifecycle sidecar (see ../lifecycle-sidecar/).
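
As a rough illustration of how the table maps onto a rendered template, a Free-tier container might carry a fragment like the following (the IDLE_TIMEOUT variable name is hypothetical, consumed by the lifecycle sidecar):

```yaml
# Hypothetical fragment of the Free-tier container template.
# Values come from the table above; names are illustrative.
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
env:
- name: IDLE_TIMEOUT
  value: "15m"
```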

Setup

Prerequisites

  • Node.js >= 22.0.0
  • PostgreSQL database
  • At least one LLM provider API key:
    • Anthropic Claude
    • OpenAI GPT
    • Google Gemini
    • OpenRouter (one key for 300+ models)
  • Ollama (for embeddings): https://ollama.com/download
  • Redis (for session/hot storage)
  • Qdrant (for RAG vector search)
  • Kafka + Flink + Iceberg (for durable storage)

Development

  1. Install dependencies:
npm install
  2. Copy environment template:
cp .env.example .env
  3. Configure .env (see .env.example):
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder

# Configure at least one provider
ANTHROPIC_API_KEY=sk-ant-xxxxx
# OPENAI_API_KEY=sk-xxxxx
# GOOGLE_API_KEY=xxxxx
# OPENROUTER_API_KEY=sk-or-xxxxx

# Optional: Set default model
DEFAULT_MODEL_PROVIDER=anthropic
DEFAULT_MODEL=claude-3-5-sonnet-20241022
  4. Start Ollama and pull embedding model:
# Install Ollama (one-time): https://ollama.com/download
# Or with Docker: docker run -d -p 11434:11434 ollama/ollama

# Pull the all-minilm embedding model (90MB, CPU-friendly)
ollama pull all-minilm

# Alternative models:
# ollama pull nomic-embed-text  # 8K context length
# ollama pull mxbai-embed-large  # Higher accuracy, slower
  5. Run development server:
npm run dev

Production Build

npm run build
npm start

Docker

docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest

Database Schema

Required PostgreSQL tables (migrations will be documented separately):

user_licenses

  • user_id (text, primary key)
  • email (text)
  • license_type (text: 'free', 'pro', 'enterprise')
  • features (jsonb)
  • resource_limits (jsonb)
  • mcp_server_url (text)
  • expires_at (timestamp, nullable)
  • created_at (timestamp)
  • updated_at (timestamp)

user_channel_links (table name assumed; maps chat-app identities to platform users)
  • id (serial, primary key)
  • user_id (text, foreign key)
  • channel_type (text: 'telegram', 'slack', 'discord')
  • channel_user_id (text)
  • created_at (timestamp)

API Endpoints

WebSocket

GET /ws/chat

  • WebSocket connection for web client
  • Auth: Bearer token in headers
  • Protocol: JSON messages

Example:

// Node.js example using the ws package (a browser WebSocket cannot set
// custom headers, so web clients must pass the token differently).
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: {
    'Authorization': 'Bearer your-jwt-token'
  }
});

ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  console.log(msg);
});

// Send only after the connection opens
ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, AI!'
  }));
});

Telegram Webhook

POST /webhook/telegram

  • Telegram bot webhook endpoint
  • Auth: Telegram user linked to platform user
  • Automatically processes incoming messages
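
The first step of that processing can be sketched as pulling the fields the gateway needs out of a Telegram Update payload. Field names follow the Telegram Bot API; the ChatMessage shape and helper name are assumptions:

```typescript
// Hypothetical helper: extract the channel user id and message text
// from a Telegram Update object (Telegram Bot API field names).
interface ChatMessage {
  channelUserId: string; // Telegram numeric user id, stringified
  text: string;
}

function parseTelegramUpdate(update: any): ChatMessage | null {
  const msg = update?.message;
  if (!msg || typeof msg.text !== "string") return null; // ignore edits, stickers, etc.
  const id = msg.from?.id ?? msg.chat?.id;
  if (id === undefined) return null;
  return { channelUserId: String(id), text: msg.text };
}
```

The returned channelUserId is what the authenticator would look up in the channel-links table to resolve the platform user.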

Health Check

GET /health

  • Returns server health status

Ollama Deployment Options

The gateway requires Ollama for embedding generation in RAG queries. You have two deployment options:

Option 1: Ollama In-Container

Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.

Dockerfile additions:

FROM node:22-slim

# Install Ollama (curl and ca-certificates are not in the slim image)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    curl -fsSL https://ollama.com/install.sh | sh && \
    rm -rf /var/lib/apt/lists/*

# Pull embedding model at build time (kill by PID; pkill is not in the slim image)
RUN ollama serve & \
    OLLAMA_PID=$! && \
    sleep 5 && \
    ollama pull all-minilm && \
    kill $OLLAMA_PID

# ... rest of your gateway Dockerfile

Start script (entrypoint.sh):

#!/bin/bash
# Start Ollama in background
ollama serve &

# Start gateway
node dist/main.js

Pros:

  • Simple networking (localhost:11434)
  • No extra K8s resources
  • Self-contained deployment

Cons:

  • Larger container image (~200MB extra)
  • CPU/memory shared with gateway process

Resource requirements:

  • Add +200MB memory
  • Add +0.2 CPU cores for embedding inference

Option 2: Ollama as Separate Pod/Sidecar

Deploy Ollama as a separate container in the same pod (sidecar) or as its own deployment.

K8s Deployment (sidecar pattern):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
      - name: gateway
        image: ghcr.io/dexorder/gateway:latest
        env:
        - name: OLLAMA_URL
          value: http://localhost:11434

      - name: ollama
        image: ollama/ollama:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            ollama serve &
            sleep 5
            ollama pull all-minilm
            wait
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

K8s Deployment (separate service):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        # ... same as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
  - port: 11434

Gateway .env:

OLLAMA_URL=http://ollama:11434

Pros:

  • Isolated resource limits
  • Can scale separately
  • Easier to monitor/debug

Cons:

  • More K8s resources
  • Network hop (minimal latency)
  • More complex deployment

Recommendation

For most deployments: Use Option 1 (in-container) for simplicity, unless you need to:

  • Share Ollama across multiple services
  • Scale embedding inference independently
  • Run Ollama on GPU nodes (gateway on CPU nodes)
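
Whichever option you pick, the gateway reaches Ollama through OLLAMA_URL. A sketch of the embedding call against Ollama's /api/embeddings endpoint (the embedText helper and hard-coded all-minilm model are illustrative):

```typescript
// Build the request separately from sending it, so the URL/body logic
// is easy to test without a running Ollama instance.
function buildEmbeddingRequest(base: string, model: string, text: string) {
  return {
    url: `${base}/api/embeddings`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt: text }),
    },
  };
}

// Illustrative helper: embed one text via Ollama (Node >= 22 has global fetch).
async function embedText(text: string): Promise<number[]> {
  const { url, init } = buildEmbeddingRequest(
    process.env.OLLAMA_URL ?? "http://localhost:11434",
    "all-minilm",
    text
  );
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}
```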

TODO

  • Implement JWT verification with JWKS
  • Implement MCP HTTP/SSE transport
  • Add rate limiting per user license
  • Add message usage tracking
  • Add streaming responses for WebSocket
  • Add Slack and Discord channel handlers
  • Add session cleanup/timeout logic