# Dexorder Gateway

Multi-channel gateway with agent harness for the Dexorder AI platform.
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   Platform Gateway                      │
│                  (Node.js/Fastify)                      │
│                                                         │
│  ┌────────────────────────────────────────────────┐     │
│  │              Channels                          │     │
│  │  - WebSocket (/ws/chat)                        │     │
│  │  - Telegram Webhook (/webhook/telegram)        │     │
│  └────────────────────────────────────────────────┘     │
│                        ↕                                │
│  ┌────────────────────────────────────────────────┐     │
│  │              Authenticator                     │     │
│  │  - JWT verification (WebSocket)                │     │
│  │  - Channel linking (Telegram)                  │     │
│  │  - User license lookup (PostgreSQL)            │     │
│  └────────────────────────────────────────────────┘     │
│                        ↕                                │
│  ┌────────────────────────────────────────────────┐     │
│  │         Agent Harness (per-session)            │     │
│  │  - Claude API integration                      │     │
│  │  - MCP client connector                        │     │
│  │  - Conversation state                          │     │
│  └────────────────────────────────────────────────┘     │
│                        ↕                                │
│  ┌────────────────────────────────────────────────┐     │
│  │              MCP Client                        │     │
│  │  - User container connection                   │     │
│  │  - Tool routing                                │     │
│  └────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────┘
                            ↕
             ┌───────────────────────────────┐
             │  User MCP Server (Python)     │
             │  - Strategies, indicators     │
             │  - Memory, preferences        │
             │  - Backtest sandbox           │
             └───────────────────────────────┘
```
## Features

- **Automatic container provisioning**: Creates user agent containers on demand via Kubernetes
- **Multi-channel support**: WebSocket and Telegram webhooks
- **Per-channel authentication**: JWT for web, channel linking for chat apps
- **User license management**: Feature flags and resource limits from PostgreSQL
- **Container lifecycle management**: Auto-shutdown on idle (handled by the container sidecar)
- **License-based resources**: Different memory/CPU/storage limits per tier
- **Multi-model LLM support**: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
- **Zero vendor lock-in**: Switch models with one line, powered by LangChain.js
- **Intelligent routing**: Auto-select models based on complexity, license tier, or user preference
- **Streaming responses**: Real-time chat over WebSocket and Telegram
- **Complex workflows**: LangGraph for stateful trading analysis (backtest → risk → approval)
- **Agent harness**: Stateless orchestrator (all context lives in the user's MCP container)
- **MCP resource integration**: User's RAG, conversation history, and preferences
## Container Management

When a user authenticates, the gateway:

1. **Checks for an existing container**: Queries Kubernetes for the deployment
2. **Creates it if missing**: Renders a YAML template based on the license tier
3. **Waits for ready**: Polls deployment status until healthy
4. **Returns the MCP endpoint**: Computed from the service name
5. **Connects to the MCP server**: Proceeds with the normal authentication flow

Container templates by license tier:

| Tier       | Memory | CPU   | Storage | Idle Timeout |
|------------|--------|-------|---------|--------------|
| Free       | 512Mi  | 500m  | 1Gi     | 15 min       |
| Pro        | 2Gi    | 2000m | 10Gi    | 60 min       |
| Enterprise | 4Gi    | 4000m | 50Gi    | Never        |

Containers self-manage their lifecycle using the lifecycle sidecar (see `../lifecycle-sidecar/`).
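The tier table above maps naturally to a lookup in code. A minimal sketch (object shape and field names are illustrative, not the gateway's actual template keys):

```javascript
// Resource limits per license tier, mirroring the table above.
// A null idle timeout means the container never idles out (Enterprise).
const TIER_RESOURCES = {
  free:       { memory: '512Mi', cpu: '500m',  storage: '1Gi',  idleTimeoutMin: 15 },
  pro:        { memory: '2Gi',   cpu: '2000m', storage: '10Gi', idleTimeoutMin: 60 },
  enterprise: { memory: '4Gi',   cpu: '4000m', storage: '50Gi', idleTimeoutMin: null },
};

function resourcesForTier(licenseType) {
  const limits = TIER_RESOURCES[licenseType];
  if (!limits) throw new Error(`Unknown license tier: ${licenseType}`);
  return limits;
}
```

The template renderer in step 2 would feed values like these into the per-tier YAML.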
## Setup

### Prerequisites

- Node.js >= 22.0.0
- PostgreSQL database
- At least one LLM provider API key:
  - Anthropic Claude
  - OpenAI GPT
  - Google Gemini
  - OpenRouter (one key for 300+ models)
- Ollama (for embeddings): https://ollama.com/download
- Redis (for session/hot storage)
- Qdrant (for RAG vector search)
- Kafka + Flink + Iceberg (for durable storage)

### Development
1. Install dependencies:

   ```bash
   npm install
   ```

2. Copy the environment template:

   ```bash
   cp .env.example .env
   ```

3. Configure `.env` (see `.env.example`):

   ```bash
   DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder

   # Configure at least one provider
   ANTHROPIC_API_KEY=sk-ant-xxxxx
   # OPENAI_API_KEY=sk-xxxxx
   # GOOGLE_API_KEY=xxxxx
   # OPENROUTER_API_KEY=sk-or-xxxxx

   # Optional: set the default model
   DEFAULT_MODEL_PROVIDER=anthropic
   DEFAULT_MODEL=claude-3-5-sonnet-20241022
   ```

4. Start Ollama and pull the embedding model:

   ```bash
   # Install Ollama (one-time): https://ollama.com/download
   # Or with Docker: docker run -d -p 11434:11434 ollama/ollama

   # Pull the all-minilm embedding model (~90MB, CPU-friendly)
   ollama pull all-minilm

   # Alternative models:
   # ollama pull nomic-embed-text   # 8K context length
   # ollama pull mxbai-embed-large  # higher accuracy, slower
   ```

5. Run the development server:

   ```bash
   npm run dev
   ```
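At startup, the gateway must decide which of the providers configured in step 3 to use. One way to resolve that is a small helper; a sketch (function name and fallback order are illustrative, the env variable names follow the `.env` example above):

```javascript
// Map each provider to the env variable that enables it.
const PROVIDER_KEYS = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
  openrouter: 'OPENROUTER_API_KEY',
};

function resolveProvider(env) {
  // Honor an explicit default if its API key is actually configured...
  const preferred = env.DEFAULT_MODEL_PROVIDER;
  if (preferred && env[PROVIDER_KEYS[preferred]]) return preferred;
  // ...otherwise fall back to the first configured provider.
  for (const [provider, key] of Object.entries(PROVIDER_KEYS)) {
    if (env[key]) return provider;
  }
  throw new Error('No LLM provider API key configured');
}
```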
### Production Build

```bash
npm run build
npm start
```

### Docker

```bash
docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest
```
## Database Schema

Required PostgreSQL tables (will be documented separately):

### `user_licenses`
- `user_id` (text, primary key)
- `email` (text)
- `license_type` (text: 'free', 'pro', 'enterprise')
- `features` (jsonb)
- `resource_limits` (jsonb)
- `mcp_server_url` (text)
- `expires_at` (timestamp, nullable)
- `created_at` (timestamp)
- `updated_at` (timestamp)

### `user_channel_links`
- `id` (serial, primary key)
- `user_id` (text, foreign key)
- `channel_type` (text: 'telegram', 'slack', 'discord')
- `channel_user_id` (text)
- `created_at` (timestamp)
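A `user_licenses` row drives the license check at authentication time. A minimal sketch of that check (the function is illustrative; it assumes, per the schema above, that a null `expires_at` means the license never expires):

```javascript
// Decide whether a user_licenses row is currently active.
// `row` has the column shape documented above; `now` is injectable for testing.
function isLicenseActive(row, now = new Date()) {
  if (!row) return false;                       // no license row at all
  if (row.expires_at === null) return true;     // null expiry = never expires
  return new Date(row.expires_at) > now;        // otherwise compare timestamps
}
```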
## API Endpoints

### WebSocket

**`GET /ws/chat`**
- WebSocket connection for the web client
- Auth: Bearer token in headers
- Protocol: JSON messages

Example (Node.js, using the `ws` package, which supports custom headers):
```javascript
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: {
    'Authorization': 'Bearer your-jwt-token'
  }
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  console.log(msg);
});

// Send only once the connection is open
ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, AI!'
  }));
});
```
### Telegram Webhook

**`POST /webhook/telegram`**
- Telegram bot webhook endpoint
- Auth: Telegram user must be linked to a platform user
- Automatically processes incoming messages
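The Telegram auth step resolves the sender to a platform user through the `user_channel_links` table. A sketch of that lookup over already-fetched rows (the function is illustrative; the real gateway would query PostgreSQL directly):

```javascript
// Resolve a channel identity (e.g. a Telegram user id) to a platform user_id.
// `links` is an array of user_channel_links rows as documented in the schema.
function resolveChannelUser(links, channelType, channelUserId) {
  const link = links.find(
    (l) => l.channel_type === channelType && l.channel_user_id === channelUserId
  );
  return link ? link.user_id : null; // null = unlinked sender, reject the message
}
```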
### Health Check

**`GET /health`**
- Returns server health status
## Ollama Deployment Options

The gateway requires Ollama for embedding generation in RAG queries. There are two deployment options:

### Option 1: Ollama in Gateway Container (Recommended for simplicity)

Install Ollama directly in the gateway container. This keeps all dependencies local and simplifies networking.

**Dockerfile additions:**
```dockerfile
FROM node:22-slim

# Install curl (not included in slim images), then Ollama
RUN apt-get update && apt-get install -y curl && \
    curl -fsSL https://ollama.com/install.sh | sh

# Pull the embedding model at build time
RUN ollama serve & \
    sleep 5 && \
    ollama pull all-minilm && \
    pkill ollama

# ... rest of your gateway Dockerfile
```

**Start script (entrypoint.sh):**
```bash
#!/bin/bash
# Start Ollama in the background
ollama serve &

# Give Ollama a moment to bind before the gateway connects
sleep 2

# Start the gateway
node dist/main.js
```
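Since the entrypoint starts Ollama asynchronously, the gateway can guard against racing it with a readiness poll at startup. A sketch (the URL, timing, and use of Node 22's global `fetch` are assumptions):

```javascript
// Poll Ollama's root endpoint until it responds, or give up after timeoutMs.
async function waitForOllama(url = 'http://localhost:11434', timeoutMs = 30000) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(url); // Ollama's root route answers once it is up
      if (res.ok) return true;
    } catch (_) {
      // Connection refused: server not up yet, retry shortly.
    }
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
  throw new Error('Ollama did not become ready in time');
}
```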
**Pros:**
- Simple networking (localhost:11434)
- No extra K8s resources
- Self-contained deployment

**Cons:**
- Larger container image (~200MB extra)
- CPU/memory shared with the gateway process

**Resource requirements:**
- Add +200MB memory
- Add +0.2 CPU cores for embedding inference
### Option 2: Ollama as Separate Pod/Sidecar

Deploy Ollama as a second container in the same pod (sidecar) or as its own deployment.

**K8s Deployment (sidecar pattern):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dexorder/gateway:latest
          env:
            - name: OLLAMA_URL
              value: http://localhost:11434

        - name: ollama
          image: ollama/ollama:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              ollama serve &
              sleep 5
              ollama pull all-minilm
              wait
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```
**K8s Deployment (separate service):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          # ... same command/args/resources as above
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
```

Gateway `.env`:
```bash
OLLAMA_URL=http://ollama:11434
```
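Whichever option you choose, generating an embedding through `OLLAMA_URL` is a single HTTP call. A sketch against Ollama's `/api/embeddings` endpoint (the helper itself is illustrative; Node 22's global `fetch` is assumed):

```javascript
// POST a text to Ollama's embeddings endpoint and return the vector.
async function embed(text, baseUrl = process.env.OLLAMA_URL || 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'all-minilm', prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
  const { embedding } = await res.json();
  return embedding; // array of floats, ready for Qdrant upsert/search
}
```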
**Pros:**
- Isolated resource limits
- Can scale separately
- Easier to monitor/debug

**Cons:**
- More K8s resources
- Extra network hop (minimal latency)
- More complex deployment

### Recommendation

For most deployments, **use Option 1 (in-container)** for simplicity, unless you need to:
- Share Ollama across multiple services
- Scale embedding inference independently
- Run Ollama on GPU nodes (with the gateway on CPU nodes)
## TODO

- [ ] Implement JWT verification with JWKS
- [ ] Implement MCP HTTP/SSE transport
- [ ] Add rate limiting per user license
- [ ] Add message usage tracking
- [ ] Add streaming responses for WebSocket
- [ ] Add Slack and Discord channel handlers
- [ ] Add session cleanup/timeout logic