feat: add @tag model override support and remove Qdrant dependencies
- Add model-tags parser for @Tag syntax in chat messages
- Support Anthropic models (Sonnet, Haiku, Opus) via @tag
- Remove Qdrant vector database from infrastructure and configs
- Simplify license model config to use null fallbacks
- Add greeting stream after model switch via @tag
- Fix protobuf field names to camelCase for v7 compatibility
- Add 429 rate limit retry logic with exponential backoff
- Remove RAG references from agent harness documentation
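The parser itself is not part of the hunks shown on this page, so the following is only a minimal sketch of how an `@tag` override could be pulled out of a chat message. The names `parseModelTag`, `ModelFamily`, and `ParsedMessage`, the accepted tag spellings, and the choice to strip the tag from the text are all assumptions for illustration, not the commit's actual code.

```typescript
// Hypothetical sketch: every name and rule here is an assumption, not the commit's code.
type ModelFamily = "sonnet" | "haiku" | "opus";

interface ParsedMessage {
  model: ModelFamily | null; // null means no override was requested
  text: string;              // message text with the first tag removed
}

// Matches "@sonnet", "@Haiku", "@OPUS" only as a standalone token, so that
// something like "me@sonnet.example" is not treated as a tag.
const TAG_PATTERN = /(^|\s)@(sonnet|haiku|opus)(?=\s|$)/i;

function parseModelTag(message: string): ParsedMessage {
  const match = TAG_PATTERN.exec(message);
  if (!match) {
    return { model: null, text: message.trim() };
  }
  const model = match[2].toLowerCase() as ModelFamily;
  // Drop the tag (and the whitespace captured before it), then trim the ends.
  const text = message.replace(TAG_PATTERN, "").trim();
  return { model, text };
}
```

Returning `null` when no tag is present fits the "null fallbacks" idea in the commit message: the caller can fall through to whatever model the license tier selects.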
@@ -10,7 +10,7 @@ The platform runs across two namespaces:

 | Namespace | Contents |
 |-----------|----------|
-| `ai` | Gateway, web UI, all infrastructure services (postgres, minio, kafka, flink, relay, ingestor, qdrant, dragonfly, iceberg-catalog) |
+| `ai` | Gateway, web UI, all infrastructure services (postgres, minio, kafka, flink, relay, ingestor, dragonfly, iceberg-catalog) |
 | `sandbox` | Per-user sandbox containers (created dynamically by the gateway) |

 Secrets are managed via 1Password CLI (`op inject`). All `.tpl.yaml` files in `deploy/k8s/prod/secrets/` contain `op://` references and are safe to commit; actual values are never stored in git.
@@ -217,7 +217,7 @@ kubectl --context=prod -n ai get configmaps

 ## Step 7 — Deploy Infrastructure

-Infrastructure services (postgres, minio, kafka, iceberg-catalog, dragonfly, qdrant, relay, ingestor, flink) are defined in `deploy/k8s/prod/infrastructure.yaml` and were applied in Step 4.
+Infrastructure services (postgres, minio, kafka, iceberg-catalog, dragonfly, relay, ingestor, flink) are defined in `deploy/k8s/prod/infrastructure.yaml` and were applied in Step 4.

 Wait for the StatefulSets and Deployments to become ready:

@@ -225,7 +225,6 @@ Wait for the StatefulSets and Deployments to become ready:
 kubectl --context=prod -n ai rollout status statefulset/postgres
 kubectl --context=prod -n ai rollout status statefulset/minio
 kubectl --context=prod -n ai rollout status statefulset/kafka
-kubectl --context=prod -n ai rollout status statefulset/qdrant
 kubectl --context=prod -n ai rollout status deployment/dragonfly
 kubectl --context=prod -n ai rollout status deployment/iceberg-catalog
 kubectl --context=prod -n ai rollout status deployment/relay
@@ -22,20 +22,20 @@ The Agent Harness is the core orchestration layer for the Dexorder AI platform,
 │ ┌──────────────────┼──────────────────┐ │
 │ │ │ │ │
 │ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ │
-│ │ MCP │ │ LLM │ │ RAG │ │
-│ │ Connector│ │ Router │ │ Retriever│ │
-│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
-│ │ │ │ │
-└─────────┼──────────────────┼──────────────────┼─────────────┘
-│ │ │
-▼ ▼ ▼
-┌────────────┐ ┌───────────┐ ┌───────────┐
-│ User's │ │ LLM │ │ Qdrant │
-│ MCP │ │ Providers │ │ (Vectors) │
-│ Container │ │(Anthropic,│ │ │
-│ (k8s pod) │ │ OpenAI, │ │ Global + │
-│ │ │ etc) │ │ User │
-└────────────┘ └───────────┘ └───────────┘
+│ │ MCP │ │ LLM │ │
+│ │ Connector│ │ Router │ │
+│ └────┬─────┘ └────┬─────┘ │
+│ │ │ │
+└─────────┼──────────────────┼─────────────┘
+│ │
+▼ ▼
+┌────────────┐ ┌───────────┐
+│ User's │ │ LLM │
+│ MCP │ │ Providers │
+│ Container │ │(Anthropic,│
+│ (k8s pod) │ │ OpenAI, │
+│ │ │ etc) │
+└────────────┘ └───────────┘
 ```

 ## Message Processing Flow
@@ -57,17 +57,11 @@ When a user sends a message:
 │ - context://workspace-state
 │ - context://system-prompt
 │
-├─→ b. RAGRetriever searches for relevant memories:
-│ - Embeds user query
-│ - Searches Qdrant: user_id = current_user OR user_id = "0"
-│ - Returns user-specific + global platform knowledge
-│
-├─→ c. Build system prompt:
+├─→ b. Build system prompt:
 │ - Base platform prompt
 │ - User profile context
 │ - Workspace state
 │ - Custom user instructions
-│ - Relevant RAG memories
 │
 ├─→ d. ModelRouter selects LLM:
 │ - Based on license tier
@@ -92,11 +86,10 @@ When a user sends a message:

 ### 1. Agent Harness (`gateway/src/harness/agent-harness.ts`)

-**Stateless orchestrator** - all state lives in user's MCP container or RAG.
+**Stateless orchestrator** - all state lives in user's MCP container.

 **Responsibilities:**
 - Fetch context from user's MCP resources
-- Query RAG for relevant memories
 - Build prompts with full context
 - Route to appropriate LLM
 - Handle tool calls (platform vs user)
@@ -141,40 +134,12 @@ Routes queries to appropriate LLM based on:
 - LangGraph checkpoints (1 hour TTL)
 - Fast reads for active conversations

-**Qdrant** (Vector Search)
-- Conversation embeddings
-- User-specific memories (user_id = actual user ID)
-- **Global platform knowledge** (user_id = "0")
-- RAG retrieval with cosine similarity
-- GDPR-compliant (indexed by user_id for fast deletion)
-
 **Iceberg** (Cold Storage)
 - Full conversation history (partitioned by user_id, session_id)
 - Checkpoint snapshots for replay
 - Analytics and time-travel queries
 - GDPR-compliant with compaction

-#### RAG System:
-
-**Global Knowledge** (user_id="0"):
-- Platform capabilities and architecture
-- Trading concepts and fundamentals
-- Indicator development guides
-- Strategy patterns and examples
-- Loaded from `gateway/knowledge/` markdown files
-
-**User Knowledge** (user_id=specific user):
-- Personal conversation history
-- Trading preferences and style
-- Custom indicators and strategies
-- Workspace state and context
-
-**Query Flow:**
-1. User query is embedded using EmbeddingService
-2. Qdrant searches: `user_id IN (current_user, "0")`
-3. Top-K relevant chunks returned
-4. Added to LLM context automatically
-
 ### 5. Skills vs Subagents

 #### Skills (`gateway/src/harness/skills/`)
@@ -290,44 +255,6 @@ User's MCP container provides access to:
 - Tactical order generators (TWAP, iceberg, etc.)
 - Smart order routing

-## Global Knowledge Management
-
-### Document Loading
-
-At gateway startup:
-1. DocumentLoader scans `gateway/knowledge/` directory
-2. Markdown files chunked by headers (~1000 tokens/chunk)
-3. Embeddings generated via EmbeddingService
-4. Stored in Qdrant with user_id="0"
-5. Content hashing enables incremental updates
-
-### Directory Structure
-
-```
-gateway/knowledge/
-├── platform/ # Platform capabilities
-├── trading/ # Trading fundamentals
-├── indicators/ # Indicator development
-└── strategies/ # Strategy patterns
-```
-
-### Updating Knowledge
-
-**Development:**
-```bash
-curl -X POST http://localhost:3000/admin/reload-knowledge
-```
-
-**Production:**
-- Update markdown files
-- Deploy new version
-- Auto-loaded on startup
-
-**Monitoring:**
-```bash
-curl http://localhost:3000/admin/knowledge-stats
-```
-
 ## Container Lifecycle

 ### User Container Creation
@@ -362,7 +289,6 @@ When user connects:
 ### ✅ Completed
 - Agent Harness with MCP integration
 - Model routing with license tiers
-- RAG retriever with Qdrant
 - Document loader for global knowledge
 - EmbeddingService (Ollama/OpenAI)
 - Skills and subagents framework
@@ -388,5 +314,4 @@ When user connects:
 - Documentation: `gateway/src/harness/README.md`
 - Knowledge base: `gateway/knowledge/`
 - LangGraph: https://langchain-ai.github.io/langgraphjs/
-- Qdrant: https://qdrant.tech/documentation/
 - MCP Spec: https://modelcontextprotocol.io/
@@ -19,7 +19,6 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 │ • Authentication & session management │
 │ • Agent Harness (LangChain/LangGraph orchestration) │
 │ - MCP client connector to user containers │
-│ - RAG retriever (Qdrant) │
 │ - Model router (LLM selection) │
 │ - Skills & subagents framework │
 │ • Dynamic user container provisioning │
@@ -30,8 +29,7 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 ┌──────────────────┐ ┌──────────────┐ ┌──────────────────────┐
 │ User Containers │ │ Relay │ │ Infrastructure │
 │ (per-user pods) │ │ (ZMQ Router) │ │ • DragonflyDB (cache)│
-│ │ │ │ │ • Qdrant (vectors) │
-│ • MCP Server │ │ • Market data│ │ • PostgreSQL (meta) │
+│ │ │ │ • MCP Server │ │ • Market data│ │ • PostgreSQL (meta) │
 │ • User files: │ │ fanout │ │ • MinIO (S3) │
 │ - Indicators │ │ • Work queue │ │ │
 │ - Strategies │ │ • Stateless │ │ │
@@ -86,18 +84,16 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 - **Agent Harness (LangChain/LangGraph):** ([[agent_harness]])
 - Stateless LLM orchestration
 - MCP client connector to user containers
-- RAG retrieval from Qdrant (global + user-specific knowledge)
 - Model routing based on license tier and complexity
 - Skills and subagents framework
 - Workflow state machines with validation loops

 **Key Features:**
-- **Stateless design:** All conversation state lives in user containers or Qdrant
+- **Stateless design:** All conversation state lives in user containers
 - **Multi-channel support:** WebSocket, Telegram (future: mobile, Discord, Slack)
 - **Kubernetes-native:** Uses k8s API for container management
 - **Three-tier memory:**
 - Redis: Hot storage, active sessions, LangGraph checkpoints (1 hour TTL)
-- Qdrant: Vector search, RAG, global + user knowledge, GDPR-compliant
 - Iceberg: Cold storage, full history, analytics, time-travel queries

 **Infrastructure:**
@@ -270,12 +266,6 @@ Exchange API → Ingestor → Kafka → Flink → Iceberg
 - Redis-compatible in-memory cache
 - Session state, rate limiting, hot data

-#### Qdrant
-- Vector database for RAG
-- **Global knowledge** (user_id="0"): Platform capabilities, trading concepts, strategy patterns
-- **User knowledge** (user_id=specific): Personal conversations, preferences, strategies
-- GDPR-compliant (indexed by user_id for fast deletion)
-
 #### PostgreSQL
 - Iceberg catalog metadata
 - User accounts and license info (gateway)
@@ -458,17 +448,11 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
 │ - context://workspace-state
 │ - context://system-prompt
 │
-├─→ b. RAGRetriever searches Qdrant for relevant memories:
-│ - Embeds user query
-│ - Searches: user_id IN (current_user, "0")
-│ - Returns user-specific + global platform knowledge
-│
-├─→ c. Build system prompt:
+├─→ b. Build system prompt:
 │ - Base platform prompt
 │ - User profile context
 │ - Workspace state
 │ - Custom user instructions
-│ - Relevant RAG memories
 │
 ├─→ d. ModelRouter selects LLM:
 │ - Based on license tier
@@ -492,8 +476,6 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
 **Key Architecture:**
 - **Gateway is stateless:** No conversation history stored in gateway
 - **User context in MCP:** All user-specific data lives in user's container
-- **Global knowledge in Qdrant:** Platform documentation loaded from `gateway/knowledge/`
-- **RAG at gateway level:** Semantic search combines global + user knowledge
 - **Skills vs Subagents:**
 - Skills: Well-defined, single-purpose tasks
 - Subagents: Complex domain expertise with multi-file context
@@ -630,7 +612,6 @@ See [[backend_redesign]] for detailed notes.
 - Historical backfill service

 **Phase 3: Agent Features**
-- RAG integration (Qdrant)
 - Strategy backtesting
 - Risk management tools
 - Portfolio analytics
10
doc/plan.md
@@ -14,3 +14,13 @@
 * TradingView indicator import tool
 * Results persistence: ~~research analysis~~, backtests, strategy performance metrics, etc.
 * Free tier with token limits and sandbox shutdown
+* Performance analysis
+* Custom pre-session scanners / summaries
+* Saved prompts (Create /presession prompt command for easy re-use)
+
+
+https://github.com/wangzhe3224/awesome-systematic-trading
+https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3247865 151 trading strategies
+https://vectorbt.dev/
+https://github.com/shiyu-coder/Kronos
+https://x.com/RohOnChain/status/2041180375838498950?s=20 combining signals