# DexOrder AI Platform Architecture
## Overview
DexOrder is an AI-powered trading platform that combines real-time market data processing, user-specific AI agents, and a flexible data pipeline. The system is designed for scalability, isolation, and extensibility.
## High-Level Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ User Clients │
│ (Web, Mobile, Telegram, External MCP) │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Gateway │
│ • WebSocket/HTTP/Telegram handlers │
│ • Authentication & session management │
│ • Agent Harness (LangChain/LangGraph orchestration) │
│ - MCP client connector to user containers │
│ - RAG retriever (Qdrant) │
│ - Model router (LLM selection) │
│ - Skills & subagents framework │
│ • Dynamic user container provisioning │
│ • Event routing (informational & critical) │
└────────┬──────────────────┬────────────────────┬────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ User Containers │ │ Relay │ │ Infrastructure │
│ (per-user pods) │ │ (ZMQ Router) │ │ • DragonflyDB (cache)│
│ │ │ │ │ • Qdrant (vectors) │
│ • MCP Server │ │ • Market data│ │ • PostgreSQL (meta) │
│ • User files: │ │ fanout │ │ • MinIO (S3) │
│ - Indicators │ │ • Work queue │ │ │
│ - Strategies │ │ • Stateless │ │ │
│ - Preferences │ │ │ │ │
│ • Event Publisher│ │ │ │ │
│ • Lifecycle Mgmt │ │ │ │ │
└──────────────────┘ └──────┬───────┘ └──────────────────────┘
┌──────────────┴──────────────┐
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────────┐
│ Ingestors │ │ Flink Cluster │
│ • CCXT adapters │ │ • Deduplication │
│ • Exchange APIs │ │ • OHLC aggregation │
│ • Push to Kafka │ │ • CEP engine │
└────────┬─────────┘ │ • Writes to Iceberg │
│ │ • Market data PUB │
│ └──────────┬───────────┘
▼ │
┌─────────────────────────────────────▼────────────┐
│ Kafka │
│ • Durable append log │
│ • Topic-based streams │
│ • Event sourcing │
└──────────────────────┬───────────────────────────┘
┌─────────────────┐
│ Iceberg Catalog │
│ • Historical │
│ OHLC storage │
│ • Query API │
└─────────────────┘
```
## Core Components
### 1. Gateway
**Location:** `gateway/`
**Language:** TypeScript (Node.js)
**Purpose:** Entry point for all user interactions
**Responsibilities:**
- **Authentication:** JWT tokens, Telegram OAuth, multi-tier licensing
- **Session Management:** WebSocket connections, Telegram webhooks, multi-channel support
- **Container Orchestration:** Dynamic provisioning of user agent pods ([[gateway_container_creation]])
- **Event Handling:**
- Subscribe to user container events (XPUB/SUB for informational)
- Route critical events (ROUTER/DEALER with ack) ([[user_container_events]])
- **Agent Harness (LangChain/LangGraph):** ([[agent_harness]])
- Stateless LLM orchestration
- MCP client connector to user containers
- RAG retrieval from Qdrant (global + user-specific knowledge)
- Model routing based on license tier and complexity
- Skills and subagents framework
- Workflow state machines with validation loops
**Key Features:**
- **Stateless design:** All conversation state lives in user containers or Qdrant
- **Multi-channel support:** WebSocket, Telegram (future: mobile, Discord, Slack)
- **Kubernetes-native:** Uses k8s API for container management
- **Three-tier memory:**
- Redis: Hot storage, active sessions, LangGraph checkpoints (1 hour TTL)
- Qdrant: Vector search, RAG, global + user knowledge, GDPR-compliant
- Iceberg: Cold storage, full history, analytics, time-travel queries
**Infrastructure:**
- Deployed in `dexorder-system` namespace
- RBAC: Can create but not delete user containers
- Network policies: Access to k8s API, user containers, infrastructure
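The model-routing step can be sketched as a lookup keyed on license tier plus a crude complexity heuristic. This is an illustrative sketch only; the model identifiers, keyword list, and length threshold are assumptions, not the actual routing configuration:

```python
# Illustrative model router: pick an LLM by license tier and a rough
# complexity estimate. Model names and thresholds are placeholders.
TIER_MODELS = {
    "free":       {"simple": "small-model", "complex": "small-model"},
    "pro":        {"simple": "small-model", "complex": "large-model"},
    "enterprise": {"simple": "large-model", "complex": "large-model"},
}

def estimate_complexity(query: str) -> str:
    # Naive heuristic: long queries or domain keywords count as complex.
    keywords = ("backtest", "strategy", "indicator", "optimize")
    if len(query) > 400 or any(k in query.lower() for k in keywords):
        return "complex"
    return "simple"

def route_model(tier: str, query: str) -> str:
    # Unknown tiers fall back to the cheapest models.
    models = TIER_MODELS.get(tier, TIER_MODELS["free"])
    return models[estimate_complexity(query)]
```

A real router would also weigh the configured strategy (cost/speed/quality), but the tier-first lookup is the core idea.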
---
### 2. User Containers
**Location:** `client-py/`
**Language:** Python
**Purpose:** Per-user isolated workspace and data storage
**Architecture:**
- One pod per user (auto-provisioned by gateway)
- Persistent storage (PVC) for user data
- Multi-container pod:
- **Agent container:** MCP server + event publisher + user files
- **Lifecycle sidecar:** Auto-shutdown and cleanup
**Components:**
#### MCP Server
Exposes user-specific resources and tools via Model Context Protocol.
**Resources (Context for LLM):**
Gateway fetches these before each LLM call:
- `context://user-profile` - Trading preferences, style, risk tolerance
- `context://conversation-summary` - Recent conversation with semantic context
- `context://workspace-state` - Current chart, watchlist, positions, alerts
- `context://system-prompt` - User's custom AI instructions
**Tools (Actions with side effects):**
Gateway proxies these to user's MCP server:
- `save_message(role, content)` - Save to conversation history
- `search_conversation(query)` - Semantic search over past conversations
- `list_strategies()`, `read_strategy(name)`, `write_strategy(name, code)`
- `list_indicators()`, `read_indicator(name)`, `write_indicator(name, code)`
- `run_backtest(strategy, params)` - Execute backtest
- `get_watchlist()`, `execute_trade(params)`, `get_positions()`
- `run_python(code)` - Execute Python with data science libraries
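Internally, a tool surface like this boils down to a name-to-handler registry that the MCP server dispatches into. The sketch below shows that shape with stub bodies; the real server speaks the Model Context Protocol, and the return values here are illustrative:

```python
# Minimal tool registry sketch. Names match the tool list above;
# bodies are stubs standing in for the real file/volume operations.
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def save_message(role: str, content: str) -> dict:
    # Real implementation appends to the on-disk conversation history.
    return {"saved": True, "role": role}

@tool
def list_strategies() -> list:
    # Real implementation lists strategies/*.py in the user volume.
    return ["momentum", "mean_reversion"]

def call_tool(name: str, **kwargs):
    # Entry point the gateway's proxied tool calls would land on.
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```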
**User Files:**
- `indicators/*.py` - Custom technical indicators
- `strategies/*.py` - Trading strategies with entry/exit rules
- Watchlists and preferences
- Git-versioned in persistent volume
#### Event Publisher ([[user_container_events]])
Publishes user events (order fills, alerts, workspace changes) via dual-channel ZMQ:
- **XPUB:** Informational events (fire-and-forget to active sessions)
- **DEALER:** Critical events (guaranteed delivery with ack)
#### Lifecycle Manager ([[container_lifecycle_management]])
Tracks MCP activity and active triggers, and shuts the container down when idle:
- Configurable idle timeouts by license tier
- Exit code 42 signals intentional shutdown
- Sidecar deletes deployment and optionally PVC
**Isolation:**
- Network policies: Cannot access k8s API, other users, or system services
- PodSecurity: Non-root, read-only rootfs, dropped capabilities
- Resource limits enforced by license tier
---
### 3. Data Pipeline
#### Relay (ZMQ Router)
**Location:** `relay/`
**Language:** Rust
**Purpose:** Stateless message router for market data and requests
**Architecture:**
- Well-known bind point (all components connect to it)
- No request tracking or state
- Topic-based routing
**Channels:**
1. **Client Requests (ROUTER):** Port 5559 - Historical data requests
2. **Ingestor Work Queue (PUB):** Port 5555 - Work distribution with exchange prefix
3. **Market Data Fanout (XPUB/XSUB):** Port 5558 - Realtime data + notifications
4. **Responses (SUB → PUB proxy):** Forwards notifications from Flink to clients
See [[protocol]] for detailed ZMQ patterns and message formats.
---
#### Ingestors
**Location:** `ingestor/`
**Language:** Python
**Purpose:** Fetch market data from exchanges
**Features:**
- CCXT-based exchange adapters
- Subscribes to work queue via exchange prefix (e.g., `BINANCE:`)
- Writes raw data to Kafka only (no direct client responses)
- Supports realtime ticks and historical OHLC
**Data Flow:**
```
Exchange API → Ingestor → Kafka → Flink → Iceberg
Notification → Relay → Clients
```
---
#### Kafka
**Deployment:** KRaft mode (no Zookeeper)
**Purpose:** Durable event log and stream processing backbone
**Topics:**
- Raw market data streams (per exchange/symbol)
- Processed OHLC data
- Notification events
- User events (orders, alerts)
**Retention:**
- Configurable per topic (default: 7 days for raw data)
- Longer retention for aggregated data
---
#### Flink
**Deployment:** JobManager + TaskManager(s)
**Purpose:** Stream processing and aggregation
**Jobs:**
1. **Deduplication:** Remove duplicate ticks from multiple ingestors
2. **OHLC Aggregation:** Build candles from tick streams
3. **CEP (Complex Event Processing):** Pattern detection and alerts
4. **Iceberg Writer:** Batch write to long-term storage
5. **Notification Publisher:** ZMQ PUB for async client notifications
**State:**
- Checkpointing to MinIO (S3-compatible)
- Exactly-once processing semantics
**Scaling:**
- Multiple TaskManagers for parallelism
- Headless service for ZMQ discovery (see [[protocol#TODO: Flink-to-Relay ZMQ Discovery]])
---
#### Apache Iceberg
**Deployment:** REST catalog with PostgreSQL backend
**Purpose:** Historical data lake for OHLC and analytics
**Features:**
- Schema evolution
- Time travel queries
- Partitioning by date/symbol
- Efficient columnar storage (Parquet)
**Storage:** MinIO (S3-compatible object storage)
---
### 4. Infrastructure Services
#### DragonflyDB
- Redis-compatible in-memory cache
- Session state, rate limiting, hot data
#### Qdrant
- Vector database for RAG
- **Global knowledge** (user_id="0"): Platform capabilities, trading concepts, strategy patterns
- **User knowledge** (user_id=specific): Personal conversations, preferences, strategies
- GDPR-compliant (indexed by user_id for fast deletion)
#### PostgreSQL
- Iceberg catalog metadata
- User accounts and license info (gateway)
- Per-user data lives in user containers
#### MinIO
- S3-compatible object storage
- Iceberg table data
- Flink checkpoints
- User file uploads
---
## Data Flow Patterns
### Historical Data Query (Async)
```
1. Client → Gateway → User Container MCP: User requests data
2. Gateway → Relay (REQ/REP): Submit historical request
3. Relay → Ingestors (PUB/SUB): Broadcast work with exchange prefix
4. Ingestor → Exchange API: Fetch data
5. Ingestor → Kafka: Write OHLC batch with metadata
6. Flink → Kafka: Read, process, dedupe
7. Flink → Iceberg: Write to table
8. Flink → Relay (PUB): Publish HistoryReadyNotification
9. Relay → Client (SUB): Notification delivered
10. Client → Iceberg: Query data directly
```
**Key Design:**
- Client subscribes to notification topic BEFORE submitting request (prevents race)
- Notification topics are deterministic: `RESPONSE:{client_id}` or `HISTORY_READY:{request_id}`
- No state in Relay (fully topic-based routing)
See [[protocol#Historical Data Query Flow]] for details.
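The subscribe-before-request ordering can be sketched with an in-memory stand-in for the transport. Here the ZMQ SUB socket is replaced by a dict of queues so the ordering logic stands alone; the topic naming matches the deterministic format above:

```python
# Sketch of the race-free request pattern: subscribe to the deterministic
# notification topic BEFORE submitting the request, so a fast response
# cannot be missed. Transport is simulated; the real path uses ZMQ.
import queue

SUBSCRIPTIONS = {}

def subscribe(topic: str) -> queue.Queue:
    return SUBSCRIPTIONS.setdefault(topic, queue.Queue())

def publish(topic: str, message: dict) -> None:
    q = SUBSCRIPTIONS.get(topic)
    if q is not None:          # XPUB-style: drop if nobody is listening
        q.put(message)

def request_history(request_id: str) -> queue.Queue:
    # 1. Subscribe first, using the deterministic topic name...
    inbox = subscribe(f"HISTORY_READY:{request_id}")
    # 2. ...then submit the request (stub for the Relay round-trip).
    return inbox
```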
---
### Realtime Market Data
```
1. Ingestor → Kafka: Write realtime ticks
2. Flink → Kafka: Read and aggregate OHLC
3. Flink → Relay (PUB): Publish market data
4. Relay → Clients (XPUB/SUB): Fanout to subscribers
```
**Topic Format:** `{ticker}|{data_type}` (e.g., `BINANCE:BTC/USDT|tick`)
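A small helper makes the topic format concrete. Note the ticker itself embeds the exchange prefix, so splitting must happen right-to-left on `|` first:

```python
# Build and parse `{ticker}|{data_type}` topics, where the ticker is
# `{exchange}:{symbol}` (e.g. `BINANCE:BTC/USDT`).
def build_topic(exchange: str, symbol: str, data_type: str) -> str:
    return f"{exchange}:{symbol}|{data_type}"

def parse_topic(topic: str):
    # Split on the last `|` first, since symbols may contain `/`.
    ticker, data_type = topic.rsplit("|", 1)
    exchange, symbol = ticker.split(":", 1)
    return exchange, symbol, data_type
```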
---
### User Events
User containers emit events (order fills, alerts) that must reach users reliably.
**Dual-Channel Design:**
1. **Informational Events (XPUB/SUB):**
- Container tracks active subscriptions via XPUB
- Publishes only if someone is listening
- Zero latency, fire-and-forget
2. **Critical Events (DEALER/ROUTER):**
- Container sends to gateway ROUTER with event ID
- Gateway delivers via Telegram/email/push
- Gateway sends EventAck back to container
- Container retries on timeout
- Persisted to disk on shutdown
See [[user_container_events]] for implementation.
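The critical-event retry loop can be sketched independently of the sockets. The DEALER/ROUTER pair is replaced by a callable so the retry logic stands alone; the retry count is an illustrative default, not the production setting:

```python
# Sketch of the critical-event path: send with an event ID, wait for an
# EventAck, retry on timeout. `transport(event) -> bool` stands in for a
# DEALER send plus ack wait; True means the gateway acked in time.
import uuid

def send_critical_event(payload: dict, transport, max_retries: int = 3) -> bool:
    event = {"event_id": str(uuid.uuid4()), "payload": payload}
    for _ in range(max_retries):
        if transport(event):
            return True        # acked: delivery guaranteed
    return False               # caller persists the event to disk instead
```

Note the event ID is generated once and reused across retries, so the gateway can deduplicate if an ack was merely lost rather than the event.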
---
## Container Lifecycle
### Creation ([[gateway_container_creation]])
```
User authenticates → Gateway checks if deployment exists
→ If missing, create from template (based on license tier)
→ Wait for ready (2min timeout)
→ Return MCP endpoint
```
**Templates by Tier:**
| Tier | Memory | CPU | Storage | Idle Timeout |
|------|--------|-----|---------|--------------|
| Free | 512Mi | 500m | 1Gi | 15min |
| Pro | 2Gi | 2000m | 10Gi | 60min |
| Enterprise | 4Gi | 4000m | 50Gi | Never |
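The table above can be treated as data that the provisioning step selects from. Values mirror the table; the field names and the fallback-to-free rule are assumptions for illustration:

```python
# Tier templates as data, mirroring the table above. An idle timeout of
# None means the container is never shut down for inactivity.
TIER_TEMPLATES = {
    "free":       {"memory": "512Mi", "cpu": "500m",  "storage": "1Gi",  "idle_timeout_min": 15},
    "pro":        {"memory": "2Gi",   "cpu": "2000m", "storage": "10Gi", "idle_timeout_min": 60},
    "enterprise": {"memory": "4Gi",   "cpu": "4000m", "storage": "50Gi", "idle_timeout_min": None},
}

def template_for(tier: str) -> dict:
    # Unknown or missing tiers fall back to the most restrictive template.
    return TIER_TEMPLATES.get(tier, TIER_TEMPLATES["free"])
```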
---
### Lifecycle Management ([[container_lifecycle_management]])
**Idle Detection:**
- Container is idle when: no active triggers + no recent MCP activity
- Lifecycle manager tracks:
- MCP tool/resource calls (reset idle timer)
- Active triggers (data subscriptions, CEP patterns)
**Shutdown:**
- On idle timeout: exit with code 42
- Lifecycle sidecar detects exit code 42
- Sidecar calls k8s API to delete deployment
- Optionally deletes PVC (anonymous users only)
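The shutdown decision itself is a small predicate. A rough sketch, with parameter names assumed; the real manager exits the process with code 42 so the sidecar can distinguish intentional shutdown from a crash:

```python
# Sketch of the idle-shutdown decision. Exit code 42 signals an
# intentional shutdown to the lifecycle sidecar.
import time

EXIT_INTENTIONAL = 42

def should_shut_down(last_activity: float, active_triggers: int,
                     idle_timeout_s, now=None) -> bool:
    if idle_timeout_s is None:      # enterprise tier: never idle out
        return False
    if active_triggers > 0:         # live subscriptions keep the pod alive
        return False
    now = time.time() if now is None else now
    return (now - last_activity) >= idle_timeout_s
```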
**Security:**
- Sidecar has RBAC to delete its own deployment only
- Cannot delete other deployments or access other namespaces
- Gateway cannot delete deployments (separation of concerns)
---
## Security Architecture
### Network Isolation
**NetworkPolicies:**
- User containers:
- ✅ Connect to gateway (MCP)
- ✅ Connect to relay (market data)
- ✅ Outbound HTTPS (exchanges, LLM APIs)
- ❌ No k8s API access
- ❌ No system namespace access
- ❌ No inter-user communication
- Gateway:
- ✅ k8s API (create containers)
- ✅ User containers (MCP client)
- ✅ Infrastructure (Postgres, Redis)
- ✅ Outbound (Anthropic API)
---
### RBAC
**Gateway ServiceAccount:**
- Create deployments/services/PVCs in `dexorder-agents` namespace
- Read pod status and logs
- Cannot delete, exec, or access secrets
**Lifecycle Sidecar ServiceAccount:**
- Delete deployments in `dexorder-agents` namespace
- Delete PVCs (conditional on user type)
- Cannot access other resources
---
### Admission Control
All pods in `dexorder-agents` namespace must:
- Use approved images only (allowlist)
- Run as non-root
- Drop all capabilities
- Use read-only root filesystem
- Have resource limits
See `deploy/k8s/base/admission-policy.yaml`
---
## Agent Harness Flow
The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions with full context.
```
1. User sends message → Gateway (WebSocket/Telegram)
2. Authenticator validates user and gets license info
3. Container Manager ensures user's MCP container is running
4. Agent Harness processes message:
├─→ a. MCPClientConnector fetches context resources from user's MCP:
│ - context://user-profile
│ - context://conversation-summary
│ - context://workspace-state
│ - context://system-prompt
├─→ b. RAGRetriever searches Qdrant for relevant memories:
│ - Embeds user query
│ - Searches: user_id IN (current_user, "0")
│ - Returns user-specific + global platform knowledge
├─→ c. Build system prompt:
│ - Base platform prompt
│ - User profile context
│ - Workspace state
│ - Custom user instructions
│ - Relevant RAG memories
├─→ d. ModelRouter selects LLM:
│ - Based on license tier
│ - Query complexity
│ - Routing strategy (cost/speed/quality)
├─→ e. LLM invocation with tool support:
│ - Send messages to LLM
│ - If tool calls requested:
│ • Platform tools → handled by gateway
│ • User tools → proxied to MCP container
│ - Loop until no more tool calls
├─→ f. Save conversation to MCP:
│ - mcp.callTool('save_message', user_message)
│ - mcp.callTool('save_message', assistant_message)
└─→ g. Return response to user via channel
```
**Key Architecture:**
- **Gateway is stateless:** No conversation history stored in gateway
- **User context in MCP:** All user-specific data lives in user's container
- **Global knowledge in Qdrant:** Platform documentation loaded from `gateway/knowledge/`
- **RAG at gateway level:** Semantic search combines global + user knowledge
- **Skills vs Subagents:**
- Skills: Well-defined, single-purpose tasks
- Subagents: Complex domain expertise with multi-file context
- **Workflows:** LangGraph state machines for multi-step processes
See [[agent_harness]] for detailed implementation.
---
## Configuration Management
All services use dual YAML files:
- `config.yaml` - Non-sensitive configuration (mounted from ConfigMap)
- `secrets.yaml` - Credentials and tokens (mounted from Secret)
**Environment Variables:**
- K8s downward API for pod metadata
- Service discovery via DNS (e.g., `kafka:9092`)
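Once both files are parsed, the secrets overlay the base config. A minimal merge sketch, taking already-parsed dicts so it stays dependency-free (actual loading would read the mounted YAML files):

```python
# Deep-merge sketch: secrets.yaml values overlay config.yaml values,
# recursing into nested sections rather than replacing them wholesale.
def merge_config(base: dict, overlay: dict) -> dict:
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```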
---
## Deployment
### Development
```bash
# Start local k8s
minikube start
# Apply infrastructure
kubectl apply -k deploy/k8s/dev
# Build and load images
docker build -t dexorder/gateway:latest gateway/
minikube image load dexorder/gateway:latest
# Port-forward for access
kubectl port-forward -n dexorder-system svc/gateway 3000:3000
```
---
### Production
```bash
# Apply production configs
kubectl apply -k deploy/k8s/prod
# Push images to registry
docker push ghcr.io/dexorder/gateway:latest
docker push ghcr.io/dexorder/agent:latest
docker push ghcr.io/dexorder/lifecycle-sidecar:latest
```
**Namespaces:**
- `dexorder-system` - Platform services (gateway, infrastructure)
- `dexorder-agents` - User containers (isolated)
---
## Observability
### Metrics (Prometheus)
- Container creation/deletion rates
- Idle shutdown counts
- MCP call latency and errors
- Event delivery rates and retries
- Kafka lag and throughput
- Flink checkpoint duration
### Logging
- Structured JSON logs
- User ID in all agent logs
- Aggregated via Loki or CloudWatch
### Tracing
- OpenTelemetry spans across gateway → MCP → LLM
- User-scoped traces for debugging
---
## Scalability
### Horizontal Scaling
**Stateless Components:**
- Gateway: Add replicas behind load balancer
- Relay: Stateless, but runs as a single instance (well-known bind point)
- Ingestors: Scale by exchange workload
**Stateful Components:**
- Flink: Scale TaskManagers
- User containers: One per user (1000s of pods)
**Bottlenecks:**
- Flink → Relay ZMQ: Requires discovery protocol (see [[protocol#TODO: Flink-to-Relay ZMQ Discovery]])
- Kafka: Partition by symbol for parallelism
- Iceberg: Partition by date/symbol
---
### Cost Optimization
**Tiered Resources:**
- Free users: Aggressive idle shutdown (15min)
- Pro users: Longer timeout (60min)
- Enterprise: Always-on containers
**Storage:**
- PVC deletion for anonymous users
- Tiered storage classes (fast SSD → cheap HDD)
**LLM Costs:**
- Rate limiting per license tier
- Caching of MCP resources (1-5min TTL)
- Conversation summarization to reduce context size
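The MCP resource cache is a simple TTL lookup. A sketch, with the default TTL chosen from the 1-5 minute range mentioned above:

```python
# TTL cache sketch for MCP context resources: reuse a fetched resource
# for a short window before re-fetching, trimming per-message MCP calls.
import time

class TTLCache:
    def __init__(self, ttl_s: float = 60.0):
        self.ttl_s = ttl_s
        self._store = {}   # key -> (fetched_at, value)

    def get_or_fetch(self, key, fetch, now=None):
        now = time.time() if now is None else now
        hit = self._store.get(key)
        if hit and now - hit[0] < self.ttl_s:
            return hit[1]                   # fresh: skip the MCP call
        value = fetch()
        self._store[key] = (now, value)
        return value
```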
---
## Development Roadmap
See [[backend_redesign]] for detailed notes.
**Phase 1: Foundation (Complete)**
- Gateway with k8s integration
- User container provisioning
- MCP protocol implementation
- Basic market data pipeline
**Phase 2: Data Pipeline (In Progress)**
- Kafka topic schemas
- Flink jobs for aggregation
- Iceberg integration
- Historical backfill service
**Phase 3: Agent Features**
- RAG integration (Qdrant)
- Strategy backtesting
- Risk management tools
- Portfolio analytics
**Phase 4: Production Hardening**
- Multi-region deployment
- HA for infrastructure
- Comprehensive monitoring
- Performance optimization
---
## Related Documentation
- [[protocol]] - ZMQ message protocols and data flow
- [[gateway_container_creation]] - Dynamic container provisioning
- [[container_lifecycle_management]] - Idle shutdown and cleanup
- [[user_container_events]] - Event system implementation
- [[agent_harness]] - LLM orchestration flow
- [[m_c_p_tools_architecture]] - User MCP tools specification
- [[user_mcp_resources]] - Context resources and RAG
- [[m_c_p_client_authentication_modes]] - MCP authentication patterns
- [[backend_redesign]] - Design notes and TODO items