# Dexorder Gateway
Multi-channel gateway with agent harness for the Dexorder AI platform.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                    Platform Gateway                     │
│                   (Node.js/Fastify)                     │
│                                                         │
│  ┌────────────────────────────────────────────────┐     │
│  │ Channels                                       │     │
│  │  - WebSocket (/ws/chat)                        │     │
│  │  - Telegram Webhook (/webhook/telegram)        │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │ Authenticator                                  │     │
│  │  - JWT verification (WebSocket)                │     │
│  │  - Channel linking (Telegram)                  │     │
│  │  - User license lookup (PostgreSQL)            │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │ Agent Harness (per-session)                    │     │
│  │  - Claude API integration                      │     │
│  │  - MCP client connector                        │     │
│  │  - Conversation state                          │     │
│  └────────────────────────────────────────────────┘     │
│                            ↕                            │
│  ┌────────────────────────────────────────────────┐     │
│  │ MCP Client                                     │     │
│  │  - User container connection                   │     │
│  │  - Tool routing                                │     │
│  └────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────┘
                             ↕
             ┌───────────────────────────────┐
             │ User MCP Server (Python)      │
             │  - Strategies, indicators     │
             │  - Memory, preferences       │
             │  - Backtest sandbox           │
             └───────────────────────────────┘
```
## Features
- **Automatic container provisioning**: Creates user agent containers on-demand via Kubernetes
- **Multi-channel support**: WebSocket and Telegram webhooks
- **Per-channel authentication**: JWT for web, channel linking for chat apps
- **User license management**: Feature flags and resource limits from PostgreSQL
- **Container lifecycle management**: Auto-shutdown on idle (handled by container sidecar)
- **License-based resources**: Different memory/CPU/storage limits per tier
- **Multi-model LLM support**: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
- **Zero vendor lock-in**: Switch models with one line, powered by LangChain.js
- **Intelligent routing**: Auto-select models based on complexity, license tier, or user preference
- **Streaming responses**: Real-time chat with WebSocket and Telegram
- **Complex workflows**: LangGraph for stateful trading analysis (backtest → risk → approval)
- **Agent harness**: Stateless orchestrator (all context lives in user's MCP container)
- **MCP resource integration**: User's RAG, conversation history, and preferences
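The "intelligent routing" feature can be pictured as a small decision function. This is a hypothetical sketch, not the gateway's actual routing table: the model IDs, the 0–1 `complexity` score, and the threshold are all illustrative.

```typescript
// Hypothetical routing rule: pick a model based on license tier and a
// rough task-complexity score in [0, 1]. Model IDs and the 0.7 threshold
// are illustrative assumptions, not the gateway's real configuration.
type Tier = "free" | "pro" | "enterprise";

interface ModelChoice {
  provider: string;
  model: string;
}

function routeModel(tier: Tier, complexity: number): ModelChoice {
  // Free users always stay on the cheapest model.
  if (tier === "free") {
    return { provider: "anthropic", model: "claude-3-5-haiku-20241022" };
  }
  // Paid tiers escalate to a stronger model for complex requests.
  if (complexity > 0.7) {
    return { provider: "anthropic", model: "claude-3-5-sonnet-20241022" };
  }
  return { provider: "anthropic", model: "claude-3-5-haiku-20241022" };
}
```

User preference and per-provider availability would layer on top of a rule like this.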
## Container Management
When a user authenticates, the gateway:
1. **Checks for existing container**: Queries Kubernetes for deployment
2. **Creates if missing**: Renders YAML template based on license tier
3. **Waits for ready**: Polls deployment status until healthy
4. **Returns MCP endpoint**: Computed from service name
5. **Connects to MCP server**: Proceeds with normal authentication flow
Container templates by license tier:
| Tier | Memory | CPU | Storage | Idle Timeout |
|------|--------|-----|---------|--------------|
| Free | 512Mi | 500m | 1Gi | 15min |
| Pro | 2Gi | 2000m | 10Gi | 60min |
| Enterprise | 4Gi | 4000m | 50Gi | Never |
Containers self-manage their lifecycle using the lifecycle sidecar (see `../lifecycle-sidecar/`).
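The tier table above can be expressed as a lookup used when rendering the container template. A minimal sketch; the field names and `tierLimits` helper are assumptions, since the real gateway renders these values into a YAML template:

```typescript
// Per-tier container limits, mirroring the table above.
type Tier = "free" | "pro" | "enterprise";

interface ContainerLimits {
  memory: string;
  cpu: string;
  storage: string;
  idleTimeoutMinutes: number | null; // null = never shut down
}

const TIER_LIMITS: Record<Tier, ContainerLimits> = {
  free:       { memory: "512Mi", cpu: "500m",  storage: "1Gi",  idleTimeoutMinutes: 15 },
  pro:        { memory: "2Gi",   cpu: "2000m", storage: "10Gi", idleTimeoutMinutes: 60 },
  enterprise: { memory: "4Gi",   cpu: "4000m", storage: "50Gi", idleTimeoutMinutes: null },
};

function tierLimits(tier: Tier): ContainerLimits {
  return TIER_LIMITS[tier];
}
```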
## Setup
### Prerequisites
- Node.js >= 22.0.0
- PostgreSQL database
- At least one LLM provider API key:
  - Anthropic Claude
  - OpenAI GPT
  - Google Gemini
  - OpenRouter (one key for 300+ models)
- Ollama (for embeddings): https://ollama.com/download
- Redis (for session/hot storage)
- Qdrant (for RAG vector search)
- Kafka + Flink + Iceberg (for durable storage)
### Development
1. Install dependencies:
```bash
npm install
```
2. Copy environment template:
```bash
cp .env.example .env
```
3. Configure `.env` (see `.env.example`):
```bash
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder
# Configure at least one provider
ANTHROPIC_API_KEY=sk-ant-xxxxx
# OPENAI_API_KEY=sk-xxxxx
# GOOGLE_API_KEY=xxxxx
# OPENROUTER_API_KEY=sk-or-xxxxx
# Optional: Set default model
DEFAULT_MODEL_PROVIDER=anthropic
DEFAULT_MODEL=claude-3-5-sonnet-20241022
```
4. Start Ollama and pull embedding model:
```bash
# Install Ollama (one-time): https://ollama.com/download
# Or with Docker: docker run -d -p 11434:11434 ollama/ollama
# Pull the all-minilm embedding model (90MB, CPU-friendly)
ollama pull all-minilm
# Alternative models:
# ollama pull nomic-embed-text # 8K context length
# ollama pull mxbai-embed-large # Higher accuracy, slower
```
5. Run development server:
```bash
npm run dev
```
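At startup the gateway has to pick a provider from the keys configured in step 3. A minimal sketch of that resolution, assuming the env var names from `.env.example`; the `resolveProvider` helper is hypothetical:

```typescript
// Map each provider to the env var holding its API key (names from .env.example).
const PROVIDER_KEYS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
};

// Prefer DEFAULT_MODEL_PROVIDER if its key is set, else fall back to the
// first configured provider; fail loudly if none is configured.
function resolveProvider(env: Record<string, string | undefined>): string {
  const preferred = env.DEFAULT_MODEL_PROVIDER;
  if (preferred && env[PROVIDER_KEYS[preferred]]) return preferred;
  for (const [provider, key] of Object.entries(PROVIDER_KEYS)) {
    if (env[key]) return provider;
  }
  throw new Error("No LLM provider API key configured");
}
```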
### Production Build
```bash
npm run build
npm start
```
### Docker
```bash
docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest
```
## Database Schema
Required PostgreSQL tables (full DDL will be documented separately):
### `user_licenses`
- `user_id` (text, primary key)
- `email` (text)
- `license_type` (text: 'free', 'pro', 'enterprise')
- `features` (jsonb)
- `resource_limits` (jsonb)
- `mcp_server_url` (text)
- `expires_at` (timestamp, nullable)
- `created_at` (timestamp)
- `updated_at` (timestamp)
### `user_channel_links`
- `id` (serial, primary key)
- `user_id` (text, foreign key)
- `channel_type` (text: 'telegram', 'slack', 'discord')
- `channel_user_id` (text)
- `created_at` (timestamp)
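For illustration, the `user_licenses` row maps to a TypeScript shape like the sketch below, including the expiry check implied by the nullable `expires_at` column. The interface and `isLicenseActive` helper are assumptions mirroring the schema above, not the gateway's actual types:

```typescript
// Shape of a user_licenses row (names mirror the schema above).
interface UserLicense {
  user_id: string;
  email: string;
  license_type: "free" | "pro" | "enterprise";
  features: Record<string, unknown>;
  resource_limits: Record<string, unknown>;
  mcp_server_url: string;
  expires_at: Date | null; // null = never expires
  created_at: Date;
  updated_at: Date;
}

// A license is active when expires_at is NULL or still in the future.
function isLicenseActive(license: UserLicense, now: Date = new Date()): boolean {
  return license.expires_at === null || license.expires_at > now;
}
```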
## API Endpoints
### WebSocket
**`GET /ws/chat`**
- WebSocket connection for web client
- Auth: Bearer token in headers
- Protocol: JSON messages
Example:
```javascript
// Using the `ws` package (Node.js) — the browser WebSocket API cannot set
// custom headers, so pass the JWT differently for browser clients.
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: {
    'Authorization': 'Bearer your-jwt-token'
  }
});

ws.on('open', () => {
  // Send only after the connection is established
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, AI!'
  }));
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  console.log(msg);
});
```
### Telegram Webhook
**`POST /webhook/telegram`**
- Telegram bot webhook endpoint
- Auth: Telegram user linked to platform user
- Automatically processes incoming messages
### Health Check
**`GET /health`**
- Returns server health status
## Ollama Deployment Options
The gateway requires Ollama for embedding generation in RAG queries. You have two deployment options:
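For context, embedding generation is a single HTTP call to Ollama's `/api/embeddings` endpoint (`{ model, prompt }` in, `{ embedding }` out). A minimal sketch; the `buildEmbeddingRequest`/`embed` helper names are assumptions, not the gateway's actual client code:

```typescript
// Build the request for Ollama's embeddings endpoint.
function buildEmbeddingRequest(baseUrl: string, text: string) {
  return {
    url: `${baseUrl}/api/embeddings`,
    body: JSON.stringify({ model: "all-minilm", prompt: text }),
  };
}

// Call Ollama and return the embedding vector. baseUrl is whatever
// OLLAMA_URL points at (localhost:11434 for Option 1 below).
async function embed(baseUrl: string, text: string): Promise<number[]> {
  const { url, body } = buildEmbeddingRequest(baseUrl, text);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
  if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding;
}
```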
### Option 1: Ollama in Gateway Container (Recommended for simplicity)
Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.
**Dockerfile additions:**
```dockerfile
FROM node:22-slim

# node:22-slim does not ship curl; install it before fetching Ollama
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model at build time
RUN ollama serve & \
    sleep 5 && \
    ollama pull all-minilm && \
    pkill ollama

# ... rest of your gateway Dockerfile
```
**Start script (entrypoint.sh):**
```bash
#!/bin/bash
set -e

# Start Ollama in background
ollama serve &

# Wait until Ollama accepts requests before starting the gateway
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  sleep 1
done

# Start gateway
node dist/main.js
```
**Pros:**
- Simple networking (localhost:11434)
- No extra K8s resources
- Self-contained deployment
**Cons:**
- Larger container image (~200MB extra)
- CPU/memory shared with gateway process
**Resource requirements:**
- Add +200MB memory
- Add +0.2 CPU cores for embedding inference
### Option 2: Ollama as Separate Pod/Sidecar
Deploy Ollama as a separate container in the same pod (sidecar) or as its own deployment.
**K8s Deployment (sidecar pattern):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dexorder/gateway:latest
          env:
            - name: OLLAMA_URL
              value: http://localhost:11434
        - name: ollama
          image: ollama/ollama:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              ollama serve &
              sleep 5
              ollama pull all-minilm
              wait
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```
**K8s Deployment (separate service):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          # ... same as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```
Gateway `.env`:
```bash
OLLAMA_URL=http://ollama:11434
```
**Pros:**
- Isolated resource limits
- Can scale separately
- Easier to monitor/debug
**Cons:**
- More K8s resources
- Network hop (minimal latency)
- More complex deployment
### Recommendation
For most deployments: **Use Option 1 (in-container)** for simplicity, unless you need to:
- Share Ollama across multiple services
- Scale embedding inference independently
- Run Ollama on GPU nodes (gateway on CPU nodes)
## TODO
- [ ] Implement JWT verification with JWKS
- [ ] Implement MCP HTTP/SSE transport
- [ ] Add rate limiting per user license
- [ ] Add message usage tracking
- [ ] Add streaming responses for WebSocket
- [ ] Add Slack and Discord channel handlers
- [ ] Add session cleanup/timeout logic