# Dexorder Gateway
Multi-channel gateway with agent harness for the Dexorder AI platform.
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Platform Gateway                     │
│                    (Node.js/Fastify)                    │
│                                                         │
│   ┌────────────────────────────────────────────────┐    │
│   │                   Channels                     │    │
│   │  - WebSocket (/ws/chat)                        │    │
│   │  - Telegram Webhook (/webhook/telegram)        │    │
│   └────────────────────────────────────────────────┘    │
│                           ↕                             │
│   ┌────────────────────────────────────────────────┐    │
│   │                Authenticator                   │    │
│   │  - JWT verification (WebSocket)                │    │
│   │  - Channel linking (Telegram)                  │    │
│   │  - User license lookup (PostgreSQL)            │    │
│   └────────────────────────────────────────────────┘    │
│                           ↕                             │
│   ┌────────────────────────────────────────────────┐    │
│   │          Agent Harness (per-session)           │    │
│   │  - Claude API integration                      │    │
│   │  - MCP client connector                        │    │
│   │  - Conversation state                          │    │
│   └────────────────────────────────────────────────┘    │
│                           ↕                             │
│   ┌────────────────────────────────────────────────┐    │
│   │                  MCP Client                    │    │
│   │  - User container connection                   │    │
│   │  - Tool routing                                │    │
│   └────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                            ↕
             ┌───────────────────────────────┐
             │   User MCP Server (Python)    │
             │  - Strategies, indicators     │
             │  - Memory, preferences       │
             │  - Backtest sandbox           │
             └───────────────────────────────┘
```
## Features
- Automatic container provisioning: Creates user agent containers on-demand via Kubernetes
- Multi-channel support: WebSocket and Telegram webhooks
- Per-channel authentication: JWT for web, channel linking for chat apps
- User license management: Feature flags and resource limits from PostgreSQL
- Container lifecycle management: Auto-shutdown on idle (handled by container sidecar)
- License-based resources: Different memory/CPU/storage limits per tier
- Multi-model LLM support: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
- Zero vendor lock-in: Switch models with one line, powered by LangChain.js
- Intelligent routing: Auto-select models based on complexity, license tier, or user preference
- Streaming responses: Real-time chat with WebSocket and Telegram
- Complex workflows: LangGraph for stateful trading analysis (backtest → risk → approval)
- Agent harness: Stateless orchestrator (all context lives in user's MCP container)
- MCP resource integration: User's RAG, conversation history, and preferences
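The intelligent-routing feature above can be sketched as a small selection function. This is an illustrative sketch, not the gateway's actual routing table: the thresholds, fallback model names, and the `preferred` override rule are all assumptions.

```typescript
// Hypothetical model router: picks a provider/model from the license tier
// and a rough complexity score. Names and thresholds are illustrative.
type Tier = 'free' | 'pro' | 'enterprise';

interface ModelChoice {
  provider: string;
  model: string;
}

function selectModel(tier: Tier, complexity: number, preferred?: ModelChoice): ModelChoice {
  // An explicit user preference wins when the license tier allows it.
  if (preferred && tier !== 'free') return preferred;
  // Cheap, fast model for simple requests and free-tier users.
  if (tier === 'free' || complexity < 0.3) {
    return { provider: 'anthropic', model: 'claude-3-5-haiku-20241022' };
  }
  // Default to the stronger model for complex work.
  return { provider: 'anthropic', model: 'claude-3-5-sonnet-20241022' };
}
```

Because LangChain.js normalizes the chat-model interface, the returned `provider`/`model` pair is the only thing that has to change to switch vendors.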
## Container Management

When a user authenticates, the gateway:

1. Checks for an existing container: queries Kubernetes for the deployment
2. Creates it if missing: renders a YAML template based on the license tier
3. Waits for readiness: polls the deployment status until healthy
4. Returns the MCP endpoint: computed from the service name
5. Connects to the MCP server: proceeds with the normal authentication flow
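The steps above can be sketched as follows. The Kubernetes client is abstracted behind a minimal interface here, and the deployment name, port, and endpoint format are illustrative assumptions, not the gateway's actual values:

```typescript
// Minimal abstraction over the Kubernetes operations the flow needs.
// Method names are illustrative, not a real client library's API.
interface K8sClient {
  deploymentExists(name: string): Promise<boolean>;
  applyTemplate(name: string, tier: string): Promise<void>;
  isReady(name: string): Promise<boolean>;
}

async function ensureContainer(client: K8sClient, userId: string, tier: string): Promise<string> {
  const name = `agent-${userId}`; // hypothetical naming scheme
  if (!(await client.deploymentExists(name))) {
    await client.applyTemplate(name, tier); // render the tier's YAML template
  }
  for (let i = 0; i < 60; i++) {
    // Poll until the deployment reports healthy (up to ~60s).
    if (await client.isReady(name)) {
      return `http://${name}:8080/mcp`; // endpoint computed from the service name
    }
    await new Promise((r) => setTimeout(r, 1000));
  }
  throw new Error(`container ${name} did not become ready`);
}
```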
Container templates by license tier:
| Tier | Memory | CPU | Storage | Idle Timeout |
|---|---|---|---|---|
| Free | 512Mi | 500m | 1Gi | 15min |
| Pro | 2Gi | 2000m | 10Gi | 60min |
| Enterprise | 4Gi | 4000m | 50Gi | Never |
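The tier table can be expressed as a lookup that the template renderer consumes. A minimal sketch — the structure and the free-tier fallback are assumptions, only the values come from the table above:

```typescript
// Resource limits per license tier, mirroring the table above.
// idleTimeoutMin: null means the container never idles out.
interface TierLimits {
  memory: string;
  cpu: string;
  storage: string;
  idleTimeoutMin: number | null;
}

const TIER_LIMITS: Record<string, TierLimits> = {
  free:       { memory: '512Mi', cpu: '500m',  storage: '1Gi',  idleTimeoutMin: 15 },
  pro:        { memory: '2Gi',   cpu: '2000m', storage: '10Gi', idleTimeoutMin: 60 },
  enterprise: { memory: '4Gi',   cpu: '4000m', storage: '50Gi', idleTimeoutMin: null },
};

function limitsFor(tier: string): TierLimits {
  // Unknown tiers fall back to the most restrictive limits.
  return TIER_LIMITS[tier] ?? TIER_LIMITS.free;
}
```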
Containers self-manage their lifecycle using the lifecycle sidecar (see `../lifecycle-sidecar/`).
## Setup

### Prerequisites
- Node.js >= 22.0.0
- PostgreSQL database
- At least one LLM provider API key:
  - Anthropic Claude
  - OpenAI GPT
  - Google Gemini
  - OpenRouter (one key for 300+ models)
- Ollama (for embeddings): https://ollama.com/download
- Redis (for session/hot storage)
- Qdrant (for RAG vector search)
- Kafka + Flink + Iceberg (for durable storage)
### Development
1. Install dependencies:

   ```bash
   npm install
   ```

2. Copy the environment template:

   ```bash
   cp .env.example .env
   ```

3. Configure `.env` (see `.env.example`):

   ```bash
   DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder

   # Configure at least one provider
   ANTHROPIC_API_KEY=sk-ant-xxxxx
   # OPENAI_API_KEY=sk-xxxxx
   # GOOGLE_API_KEY=xxxxx
   # OPENROUTER_API_KEY=sk-or-xxxxx

   # Optional: set the default model
   DEFAULT_MODEL_PROVIDER=anthropic
   DEFAULT_MODEL=claude-3-5-sonnet-20241022
   ```

4. Start Ollama and pull the embedding model:

   ```bash
   # Install Ollama (one-time): https://ollama.com/download
   # Or with Docker: docker run -d -p 11434:11434 ollama/ollama

   # Pull the all-minilm embedding model (~90MB, CPU-friendly)
   ollama pull all-minilm

   # Alternative models:
   # ollama pull nomic-embed-text   # 8K context length
   # ollama pull mxbai-embed-large  # Higher accuracy, slower
   ```

5. Run the development server:

   ```bash
   npm run dev
   ```
### Production Build

```bash
npm run build
npm start
```
### Docker

```bash
docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest
```
## Database Schema

Required PostgreSQL tables (will be documented separately):

### user_licenses

- `user_id` (text, primary key)
- `email` (text)
- `license_type` (text: 'free', 'pro', 'enterprise')
- `features` (jsonb)
- `resource_limits` (jsonb)
- `mcp_server_url` (text)
- `expires_at` (timestamp, nullable)
- `created_at` (timestamp)
- `updated_at` (timestamp)

### user_channel_links

- `id` (serial, primary key)
- `user_id` (text, foreign key)
- `channel_type` (text: 'telegram', 'slack', 'discord')
- `channel_user_id` (text)
- `created_at` (timestamp)
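A gateway-side check against a `user_licenses` row might look like the following. The row shape follows the columns above; the helper functions themselves are illustrative, not the gateway's actual code:

```typescript
// Illustrative license checks over a user_licenses row.
interface LicenseRow {
  user_id: string;
  license_type: 'free' | 'pro' | 'enterprise';
  features: Record<string, boolean>; // parsed from the features jsonb column
  expires_at: string | null;         // nullable timestamp
}

function isLicenseActive(row: LicenseRow, now: Date = new Date()): boolean {
  // A null expires_at means the license never expires.
  return row.expires_at === null || new Date(row.expires_at) > now;
}

function hasFeature(row: LicenseRow, feature: string): boolean {
  // A feature is usable only while the license is active.
  return isLicenseActive(row) && row.features[feature] === true;
}
```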
## API Endpoints

### WebSocket

`GET /ws/chat`

- WebSocket connection for the web client
- Auth: Bearer token in headers
- Protocol: JSON messages
Example (using the `ws` npm package, since the browser `WebSocket` cannot set headers):

```javascript
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: {
    Authorization: 'Bearer your-jwt-token'
  }
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  console.log(msg);
});

// Send only after the connection opens; ws throws otherwise.
ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, AI!'
  }));
});
```
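Since the protocol is plain JSON, the server side has to validate incoming frames before dispatching them. A minimal guard — the `type`/`content` shape is taken from the example above; everything else is an assumption:

```typescript
// Validates an incoming WebSocket frame against the chat message shape.
interface ChatMessage {
  type: 'message';
  content: string;
}

function parseChatMessage(raw: string): ChatMessage | null {
  try {
    const msg = JSON.parse(raw);
    if (msg && msg.type === 'message' && typeof msg.content === 'string') {
      return msg as ChatMessage;
    }
    return null; // well-formed JSON but the wrong shape
  } catch {
    return null; // not JSON at all
  }
}
```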
### Telegram Webhook

`POST /webhook/telegram`

- Telegram bot webhook endpoint
- Auth: Telegram user linked to a platform user
- Automatically processes incoming messages

### Health Check

`GET /health`

- Returns server health status
## Ollama Deployment Options

The gateway requires Ollama for embedding generation in RAG queries. You have two deployment options:

### Option 1: Ollama in the Gateway Container (recommended for simplicity)

Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.

Dockerfile additions:
```dockerfile
FROM node:22-slim

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model at build time
RUN ollama serve & \
    sleep 5 && \
    ollama pull all-minilm && \
    pkill ollama

# ... rest of your gateway Dockerfile
```
Start script (`entrypoint.sh`):

```bash
#!/bin/bash
# Start Ollama in the background
ollama serve &

# Start the gateway (exec so it receives container signals directly)
exec node dist/main.js
```
Pros:
- Simple networking (localhost:11434)
- No extra K8s resources
- Self-contained deployment
Cons:
- Larger container image (~200MB extra)
- CPU/memory shared with gateway process
Resource requirements:
- Add +200MB memory
- Add +0.2 CPU cores for embedding inference
### Option 2: Ollama as a Separate Pod or Sidecar

Deploy Ollama as a separate container in the same pod (sidecar) or as its own Deployment.

K8s Deployment (sidecar pattern):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  template:
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dexorder/gateway:latest
          env:
            - name: OLLAMA_URL
              value: http://localhost:11434
        - name: ollama
          image: ollama/ollama:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              ollama serve &
              sleep 5
              ollama pull all-minilm
              wait
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```
K8s Deployment (separate service):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          # ... same as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```

Gateway `.env`:

```bash
OLLAMA_URL=http://ollama:11434
```
Pros:
- Isolated resource limits
- Can scale separately
- Easier to monitor/debug
Cons:
- More K8s resources
- Network hop (minimal latency)
- More complex deployment
### Recommendation

For most deployments, use Option 1 (in-container) for simplicity, unless you need to:
- Share Ollama across multiple services
- Scale embedding inference independently
- Run Ollama on GPU nodes (gateway on CPU nodes)
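Whichever option you pick, the gateway reaches Ollama over plain HTTP at `OLLAMA_URL`. A sketch of building an embedding request against Ollama's `/api/embeddings` endpoint — the `model`/`prompt` request shape reflects Ollama's documented API, but verify field names against your Ollama version:

```typescript
// Builds the HTTP request for Ollama's /api/embeddings endpoint.
function buildEmbeddingRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl}/api/embeddings`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt }),
    },
  };
}
```

Usage with the environment from the options above:

```typescript
const { url, init } = buildEmbeddingRequest(
  process.env.OLLAMA_URL ?? 'http://localhost:11434',
  'all-minilm',
  'some text to embed'
);
const { embedding } = await (await fetch(url, init)).json(); // array of floats
```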
## TODO
- Implement JWT verification with JWKS
- Implement MCP HTTP/SSE transport
- Add rate limiting per user license
- Add message usage tracking
- Add streaming responses for WebSocket
- Add Slack and Discord channel handlers
- Add session cleanup/timeout logic