# Dexorder Gateway

Multi-channel gateway with agent harness for the Dexorder AI platform.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Platform Gateway                     │
│                    (Node.js/Fastify)                    │
│                                                         │
│  ┌────────────────────────────────────────────────┐     │
│  │  Channels                                      │     │
│  │  - WebSocket (/ws/chat)                        │     │
│  │  - Telegram Webhook (/webhook/telegram)        │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │  Authenticator                                 │     │
│  │  - JWT verification (WebSocket)                │     │
│  │  - Channel linking (Telegram)                  │     │
│  │  - User license lookup (PostgreSQL)            │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │  Agent Harness (per-session)                   │     │
│  │  - Claude API integration                      │     │
│  │  - MCP client connector                        │     │
│  │  - Conversation state                          │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │  MCP Client                                    │     │
│  │  - User container connection                   │     │
│  │  - Tool routing                                │     │
│  └────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────┘
                            ↕
             ┌───────────────────────────────┐
             │ User MCP Server (Python)      │
             │ - Strategies, indicators      │
             │ - Memory, preferences         │
             │ - Backtest sandbox            │
             └───────────────────────────────┘
```

## Features

- **Automatic container provisioning**: Creates user agent containers on-demand via Kubernetes
- **Multi-channel support**: WebSocket and Telegram webhooks
- **Per-channel authentication**: JWT for web, channel linking for chat apps
- **User license management**: Feature flags and resource limits from PostgreSQL
- **Container lifecycle management**: Auto-shutdown on idle (handled by container sidecar)
- **License-based resources**: Different memory/CPU/storage limits per tier
- **Multi-model LLM support**: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
- **Zero vendor lock-in**: Switch models
with one line, powered by LangChain.js
- **Intelligent routing**: Auto-select models based on complexity, license tier, or user preference
- **Streaming responses**: Real-time chat with WebSocket and Telegram
- **Complex workflows**: LangGraph for stateful trading analysis (backtest → risk → approval)
- **Agent harness**: Stateless orchestrator (all context lives in the user's MCP container)
- **MCP resource integration**: User's RAG, conversation history, and preferences

## Container Management

When a user authenticates, the gateway:

1. **Checks for an existing container**: Queries Kubernetes for the deployment
2. **Creates it if missing**: Renders a YAML template based on the license tier
3. **Waits for ready**: Polls deployment status until healthy
4. **Returns the MCP endpoint**: Computed from the service name
5. **Connects to the MCP server**: Proceeds with the normal authentication flow

Container templates by license tier:

| Tier | Memory | CPU | Storage | Idle Timeout |
|------|--------|-----|---------|--------------|
| Free | 512Mi | 500m | 1Gi | 15min |
| Pro | 2Gi | 2000m | 10Gi | 60min |
| Enterprise | 4Gi | 4000m | 50Gi | Never |

Containers self-manage their lifecycle using the lifecycle sidecar (see `../lifecycle-sidecar/`).

## Setup

### Prerequisites

- Node.js >= 22.0.0
- PostgreSQL database
- At least one LLM provider API key:
  - Anthropic Claude
  - OpenAI GPT
  - Google Gemini
  - OpenRouter (one key for 300+ models)
- Ollama (for embeddings): https://ollama.com/download
- Redis (for session/hot storage)
- Qdrant (for RAG vector search)
- Kafka + Flink + Iceberg (for durable storage)

### Development

1. Install dependencies:

```bash
npm install
```

2. Copy the environment template:

```bash
cp .env.example .env
```

3.
Configure `.env` (see `.env.example`):

```bash
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder

# Configure at least one provider
ANTHROPIC_API_KEY=sk-ant-xxxxx
# OPENAI_API_KEY=sk-xxxxx
# GOOGLE_API_KEY=xxxxx
# OPENROUTER_API_KEY=sk-or-xxxxx

# Optional: Set default model
DEFAULT_MODEL_PROVIDER=anthropic
DEFAULT_MODEL=claude-3-5-sonnet-20241022
```

4. Start Ollama and pull the embedding model:

```bash
# Install Ollama (one-time): https://ollama.com/download
# Or with Docker: docker run -d -p 11434:11434 ollama/ollama

# Pull the all-minilm embedding model (90MB, CPU-friendly)
ollama pull all-minilm

# Alternative models:
# ollama pull nomic-embed-text   # 8K context length
# ollama pull mxbai-embed-large  # Higher accuracy, slower
```

5. Run the development server:

```bash
npm run dev
```

### Production Build

```bash
npm run build
npm start
```

### Docker

```bash
docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest
```

## Database Schema

Required PostgreSQL tables (will be documented separately):

### `user_licenses`

- `user_id` (text, primary key)
- `email` (text)
- `license_type` (text: 'free', 'pro', 'enterprise')
- `features` (jsonb)
- `resource_limits` (jsonb)
- `mcp_server_url` (text)
- `expires_at` (timestamp, nullable)
- `created_at` (timestamp)
- `updated_at` (timestamp)

### `user_channel_links`

- `id` (serial, primary key)
- `user_id` (text, foreign key)
- `channel_type` (text: 'telegram', 'slack', 'discord')
- `channel_user_id` (text)
- `created_at` (timestamp)

## API Endpoints

### WebSocket

**`GET /ws/chat`**

- WebSocket connection for the web client
- Auth: Bearer token in headers
- Protocol: JSON messages

Example (Node.js, using the `ws` package, which supports custom headers):

```javascript
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: { 'Authorization': 'Bearer your-jwt-token' }
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  console.log(msg);
});

ws.send(JSON.stringify({
  type: 'message',
  content: 'Hello, AI!'
}));
```

### Telegram Webhook

**`POST /webhook/telegram`**

- Telegram bot webhook endpoint
- Auth: Telegram user linked to a platform user
- Automatically processes incoming messages

### Health Check

**`GET /health`**

- Returns server health status

## Ollama Deployment Options

The gateway requires Ollama for embedding generation in RAG queries. You have two deployment options:

### Option 1: Ollama in Gateway Container (Recommended for simplicity)

Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.

**Dockerfile additions:**

```dockerfile
FROM node:22-slim

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model at build time
RUN ollama serve & \
    sleep 5 && \
    ollama pull all-minilm && \
    pkill ollama

# ... rest of your gateway Dockerfile
```

**Start script (entrypoint.sh):**

```bash
#!/bin/bash
# Start Ollama in background
ollama serve &

# Start gateway
node dist/main.js
```

**Pros:**
- Simple networking (localhost:11434)
- No extra K8s resources
- Self-contained deployment

**Cons:**
- Larger container image (~200MB extra)
- CPU/memory shared with the gateway process

**Resource requirements:**
- Add +200MB memory
- Add +0.2 CPU cores for embedding inference

### Option 2: Ollama as Separate Pod/Sidecar

Deploy Ollama as a separate container in the same pod (sidecar) or as its own deployment.
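With either option, the gateway only needs Ollama's HTTP API. The sketch below (a hypothetical helper, not actual gateway code) calls the standard Ollama `/api/embeddings` endpoint with the `all-minilm` model pulled during setup, plus the cosine-similarity ranking typically used over RAG hits; the hard-coded `OLLAMA_URL` is an assumption and would come from the environment in practice:

```javascript
// Hypothetical helper, not actual gateway code.
const OLLAMA_URL = 'http://localhost:11434'; // Option 1; 'http://ollama:11434' in Option 2

// Request an embedding vector from Ollama (Node 18+ global fetch).
// The /api/embeddings endpoint responds with { embedding: number[] }.
async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'all-minilm', prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama error: ${res.status}`);
  const { embedding } = await res.json();
  return embedding;
}

// Cosine similarity, the usual metric for ranking stored chunks
// against a query embedding in a RAG lookup.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Whichever deployment you pick, only `OLLAMA_URL` changes; the calling code is identical.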
**K8s Deployment (sidecar pattern):**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  template:
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dexorder/gateway:latest
          env:
            - name: OLLAMA_URL
              value: http://localhost:11434
        - name: ollama
          image: ollama/ollama:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              ollama serve &
              sleep 5
              ollama pull all-minilm
              wait
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```

**K8s Deployment (separate service):**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          # ... same as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```

Gateway `.env`:

```bash
OLLAMA_URL=http://ollama:11434
```

**Pros:**
- Isolated resource limits
- Can scale separately
- Easier to monitor/debug

**Cons:**
- More K8s resources
- Network hop (minimal latency)
- More complex deployment

### Recommendation

For most deployments: **Use Option 1 (in-container)** for simplicity, unless you need to:

- Share Ollama across multiple services
- Scale embedding inference independently
- Run Ollama on GPU nodes (gateway on CPU nodes)

## TODO

- [ ] Implement JWT verification with JWKS
- [ ] Implement MCP HTTP/SSE transport
- [ ] Add rate limiting per user license
- [ ] Add message usage tracking
- [ ] Add streaming responses for WebSocket
- [ ] Add Slack and Discord channel handlers
- [ ] Add session cleanup/timeout logic
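For reference, the "intelligent routing" feature listed under Features (auto-selecting a model from complexity and license tier) can be sketched as a pure lookup. Everything below is illustrative: the routing table, the 0.7 complexity threshold, and the mapping of tiers to model IDs are assumptions, not the gateway's actual policy:

```javascript
// Illustrative routing table: tier -> model choice per complexity band.
// Model IDs and thresholds are examples only.
const ROUTES = {
  free:       { simple: 'claude-3-5-haiku-20241022',  complex: 'claude-3-5-haiku-20241022' },
  pro:        { simple: 'claude-3-5-haiku-20241022',  complex: 'claude-3-5-sonnet-20241022' },
  enterprise: { simple: 'claude-3-5-sonnet-20241022', complex: 'claude-3-5-sonnet-20241022' },
};

// Pick a model from the license tier and a complexity score in [0, 1].
// Unknown tiers fall back to the free route.
function selectModel(tier, complexity) {
  const route = ROUTES[tier] ?? ROUTES.free;
  return complexity > 0.7 ? route.complex : route.simple;
}
```

A user-preference override (from the `features` jsonb column in `user_licenses`) would slot in naturally as a check before the table lookup.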