container lifecycle management

2026-03-12 15:13:38 -04:00
parent e99ef5d2dd
commit b9cc397e05
61 changed files with 6880 additions and 31 deletions

gateway/README.md (new file)
# Dexorder Gateway
Multi-channel gateway with agent harness for the Dexorder AI platform.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Platform Gateway │
│ (Node.js/Fastify) │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Channels │ │
│ │ - WebSocket (/ws/chat) │ │
│ │ - Telegram Webhook (/webhook/telegram) │ │
│ └────────────────────────────────────────────────┘ │
│ ↕ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Authenticator │ │
│ │ - JWT verification (WebSocket) │ │
│ │ - Channel linking (Telegram) │ │
│ │ - User license lookup (PostgreSQL) │ │
│ └────────────────────────────────────────────────┘ │
│ ↕ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Agent Harness (per-session) │ │
│ │ - Claude API integration │ │
│ │ - MCP client connector │ │
│ │ - Conversation state │ │
│ └────────────────────────────────────────────────┘ │
│ ↕ │
│ ┌────────────────────────────────────────────────┐ │
│ │ MCP Client │ │
│ │ - User container connection │ │
│ │ - Tool routing │ │
│ └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                ↕
┌───────────────────────────────┐
│ User MCP Server (Python)      │
│ - Strategies, indicators      │
│ - Memory, preferences         │
│ - Backtest sandbox            │
└───────────────────────────────┘
```
## Features
- **Automatic container provisioning**: Creates user agent containers on-demand via Kubernetes
- **Multi-channel support**: WebSocket and Telegram webhooks
- **Per-channel authentication**: JWT for web, channel linking for chat apps
- **User license management**: Feature flags and resource limits from PostgreSQL
- **Container lifecycle management**: Auto-shutdown on idle (handled by container sidecar)
- **License-based resources**: Different memory/CPU/storage limits per tier
- **Multi-model LLM support**: Anthropic Claude, OpenAI GPT, Google Gemini, OpenRouter (300+ models)
- **Zero vendor lock-in**: Switch models with one line, powered by LangChain.js
- **Intelligent routing**: Auto-select models based on complexity, license tier, or user preference
- **Streaming responses**: Real-time chat with WebSocket and Telegram
- **Complex workflows**: LangGraph for stateful trading analysis (backtest → risk → approval)
- **Agent harness**: Stateless orchestrator (all context lives in user's MCP container)
- **MCP resource integration**: User's RAG, conversation history, and preferences
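The intelligent-routing feature above can be sketched as a small selector. The routing table, model names, and the length-based complexity heuristic below are illustrative assumptions, not the gateway's actual policy:

```javascript
// Hypothetical routing table: which model each tier gets for simple vs. complex
// requests. Names are illustrative, not the gateway's real configuration.
const TIER_MODELS = {
  free:       { simple: 'claude-3-5-haiku-20241022',  complex: 'claude-3-5-haiku-20241022' },
  pro:        { simple: 'claude-3-5-haiku-20241022',  complex: 'claude-3-5-sonnet-20241022' },
  enterprise: { simple: 'claude-3-5-sonnet-20241022', complex: 'claude-3-5-sonnet-20241022' },
};

function selectModel(tier, message, preferred = null) {
  // Honor an explicit user preference on paid tiers
  if (preferred && tier !== 'free') return preferred;
  // Crude complexity heuristic: longer messages get the stronger model
  const complexity = message.length > 500 ? 'complex' : 'simple';
  return TIER_MODELS[tier][complexity];
}
```

In practice the complexity signal could come from the agent harness (tool use, conversation depth) rather than raw message length.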
## Container Management
When a user authenticates, the gateway:
1. **Checks for existing container**: Queries Kubernetes for deployment
2. **Creates if missing**: Renders YAML template based on license tier
3. **Waits for ready**: Polls deployment status until healthy
4. **Returns MCP endpoint**: Computed from service name
5. **Connects to MCP server**: Proceeds with normal authentication flow
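The five steps above can be sketched as follows, with the Kubernetes client injected so the flow stays testable. The deployment/service naming scheme, the `dexorder-users` namespace, the port, and the client method names are all assumptions for illustration:

```javascript
// Step 4: the MCP endpoint is derived from the per-user service name.
// Naming scheme and namespace are assumptions, not the gateway's real convention.
function mcpEndpointFor(userId) {
  return `http://mcp-${userId}.dexorder-users.svc.cluster.local:8080`;
}

// Steps 1-5 with an injected client (`k8s` is a stand-in for a real
// Kubernetes API wrapper; its method names are hypothetical).
async function ensureContainer(k8s, userId, tier) {
  const name = `mcp-${userId}`;
  // Step 1: check for an existing deployment
  if (!(await k8s.deploymentExists(name))) {
    // Step 2: create from the tier-specific template
    await k8s.applyTemplate(name, tier);
  }
  // Step 3: poll until the deployment reports ready
  while (!(await k8s.isReady(name))) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  // Steps 4-5: hand back the endpoint for the MCP client to connect to
  return mcpEndpointFor(userId);
}
```

Injecting the client keeps the provisioning logic decoupled from any particular Kubernetes SDK.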
Container templates by license tier:
| Tier | Memory | CPU | Storage | Idle Timeout |
|------|--------|-----|---------|--------------|
| Free | 512Mi | 500m | 1Gi | 15min |
| Pro | 2Gi | 2000m | 10Gi | 60min |
| Enterprise | 4Gi | 4000m | 50Gi | Never |
Containers self-manage their lifecycle using the lifecycle sidecar (see `../lifecycle-sidecar/`).
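The tier table translates directly into a lookup the template renderer can consume. The values mirror the table; the object shape and `idleMinutes` field name are illustrative:

```javascript
// Resource limits per license tier, mirroring the table above.
// idleMinutes: null means the container never shuts down on idle.
const TIER_LIMITS = {
  free:       { memory: '512Mi', cpu: '500m',  storage: '1Gi',  idleMinutes: 15 },
  pro:        { memory: '2Gi',   cpu: '2000m', storage: '10Gi', idleMinutes: 60 },
  enterprise: { memory: '4Gi',   cpu: '4000m', storage: '50Gi', idleMinutes: null },
};

function limitsFor(tier) {
  const limits = TIER_LIMITS[tier];
  if (!limits) throw new Error(`Unknown license tier: ${tier}`);
  return limits;
}
```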
## Setup
### Prerequisites
- Node.js >= 22.0.0
- PostgreSQL database
- At least one LLM provider API key:
- Anthropic Claude
- OpenAI GPT
- Google Gemini
- OpenRouter (one key for 300+ models)
### Development
1. Install dependencies:
```bash
npm install
```
2. Copy environment template:
```bash
cp .env.example .env
```
3. Configure `.env` (see `.env.example`):
```bash
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/dexorder
# Configure at least one provider
ANTHROPIC_API_KEY=sk-ant-xxxxx
# OPENAI_API_KEY=sk-xxxxx
# GOOGLE_API_KEY=xxxxx
# OPENROUTER_API_KEY=sk-or-xxxxx
# Optional: Set default model
DEFAULT_MODEL_PROVIDER=anthropic
DEFAULT_MODEL=claude-3-5-sonnet-20241022
```
4. Run development server:
```bash
npm run dev
```
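The provider variables from step 3 can be resolved at startup with a small helper. The fallback order and the idea of validating that the chosen provider's key is actually set are assumptions, not documented gateway behavior:

```javascript
// Hypothetical helper: resolve the default LLM provider from environment
// variables, falling back to the first provider with a configured API key.
const PROVIDER_KEYS = {
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  google: 'GOOGLE_API_KEY',
  openrouter: 'OPENROUTER_API_KEY',
};

function resolveProvider(env) {
  // An explicit DEFAULT_MODEL_PROVIDER wins if its key is actually configured
  const chosen = env.DEFAULT_MODEL_PROVIDER;
  if (chosen && env[PROVIDER_KEYS[chosen]]) return chosen;
  // Otherwise fall back to the first configured provider
  for (const [provider, key] of Object.entries(PROVIDER_KEYS)) {
    if (env[key]) return provider;
  }
  throw new Error('No LLM provider API key configured');
}
```

Failing fast at boot on a missing key beats a confusing runtime error on the first chat message.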
### Production Build
```bash
npm run build
npm start
```
### Docker
```bash
docker build -t dexorder/gateway:latest .
docker run -p 3000:3000 --env-file .env dexorder/gateway:latest
```
## Database Schema
Required PostgreSQL tables (complete DDL will be documented separately):
### `user_licenses`
- `user_id` (text, primary key)
- `email` (text)
- `license_type` (text: 'free', 'pro', 'enterprise')
- `features` (jsonb)
- `resource_limits` (jsonb)
- `mcp_server_url` (text)
- `expires_at` (timestamp, nullable)
- `created_at` (timestamp)
- `updated_at` (timestamp)
### `user_channel_links`
- `id` (serial, primary key)
- `user_id` (text, foreign key)
- `channel_type` (text: 'telegram', 'slack', 'discord')
- `channel_user_id` (text)
- `created_at` (timestamp)
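A `user_licenses` row is interpreted at auth time roughly like this sketch: a license is active when `expires_at` is NULL or in the future, and feature flags come from the `features` jsonb column. The function names and boolean-flag shape of `features` are assumptions:

```javascript
// A license is active when expires_at is NULL (never expires) or still in the future.
function licenseActive(row, now = new Date()) {
  return row.expires_at === null || new Date(row.expires_at) > now;
}

// Feature flags live in the `features` jsonb column; an expired license grants nothing.
function hasFeature(row, feature) {
  return licenseActive(row) && row.features[feature] === true;
}
```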
## API Endpoints
### WebSocket
**`GET /ws/chat`**
- WebSocket connection for web client
- Auth: Bearer token in headers
- Protocol: JSON messages
Example:
```javascript
// Node.js client using the `ws` package (a browser WebSocket cannot set headers)
const WebSocket = require('ws');

const ws = new WebSocket('ws://localhost:3000/ws/chat', {
  headers: {
    'Authorization': 'Bearer your-jwt-token'
  }
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  console.log(msg);
});

// Send only after the connection opens; ws throws if the socket is still connecting
ws.on('open', () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Hello, AI!'
  }));
});
```
### Telegram Webhook
**`POST /webhook/telegram`**
- Telegram bot webhook endpoint
- Auth: Telegram user linked to platform user
- Automatically processes incoming messages
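Resolving an incoming Telegram update to a platform user via `user_channel_links` can be sketched as below; `links` stands in for the database lookup, and the function name is hypothetical (the `update.message.from.id` / `chat.id` / `text` fields follow the Telegram Bot API Update shape):

```javascript
// Map a Telegram update to a platform user via the user_channel_links mapping.
// `links` is a stand-in for the PostgreSQL lookup.
function resolveTelegramUser(update, links) {
  const message = update.message;
  if (!message || !message.text) return null; // ignore non-text updates
  const channelUserId = String(message.from.id);
  const link = links.find(
    (l) => l.channel_type === 'telegram' && l.channel_user_id === channelUserId
  );
  if (!link) return null; // Telegram account not linked to a platform user
  return { userId: link.user_id, chatId: message.chat.id, text: message.text };
}
```

An unlinked sender would typically get a reply prompting them to link their account rather than a silent drop.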
### Health Check
**`GET /health`**
- Returns server health status
## TODO
- [ ] Implement JWT verification with JWKS
- [ ] Implement MCP HTTP/SSE transport
- [ ] Add Redis for session persistence
- [ ] Add rate limiting per user license
- [ ] Add message usage tracking
- [ ] Add streaming responses for WebSocket
- [ ] Add Slack and Discord channel handlers
- [ ] Add session cleanup/timeout logic