feat: add @tag model override support and remove Qdrant dependencies

- Add model-tags parser for @Tag syntax in chat messages
- Support Anthropic models (Sonnet, Haiku, Opus) via @tag
- Remove Qdrant vector database from infrastructure and configs
- Simplify license model config to use null fallbacks
- Add greeting stream after model switch via @tag
- Fix protobuf field names to camelCase for v7 compatibility
- Add 429 rate limit retry logic with exponential backoff
- Remove RAG references from agent harness documentation
2026-04-27 20:55:18 -04:00
parent 6f937f9e5e
commit d41fcd0499
50 changed files with 956 additions and 798 deletions
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -19,7 +19,6 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 │  • Authentication & session management                           │
 │  • Agent Harness (LangChain/LangGraph orchestration)            │
 │    - MCP client connector to user containers                    │
-│    - RAG retriever (Qdrant)                                     │
 │    - Model router (LLM selection)                               │
 │    - Skills & subagents framework                               │
 │  • Dynamic user container provisioning                           │
@@ -30,8 +29,7 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 ┌──────────────────┐  ┌──────────────┐   ┌──────────────────────┐
 │ User Containers  │  │    Relay     │   │   Infrastructure     │
 │ (per-user pods)  │  │ (ZMQ Router) │   │ • DragonflyDB (cache)│
-│                  │  │              │   │ • Qdrant (vectors)   │
-│ • MCP Server     │  │ • Market data│   │ • PostgreSQL (meta)  │
+│ • MCP Server     │  │ • Market data│   │ • PostgreSQL (meta)  │
 │ • User files:    │  │   fanout     │   │ • MinIO (S3)         │
 │   - Indicators   │  │ • Work queue │   │                      │
 │   - Strategies   │  │ • Stateless  │   │                      │
@@ -86,18 +84,16 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 - **Agent Harness (LangChain/LangGraph):** ([[agent_harness]])
  - Stateless LLM orchestration
  - MCP client connector to user containers
-  - RAG retrieval from Qdrant (global + user-specific knowledge)
  - Model routing based on license tier and complexity
  - Skills and subagents framework
  - Workflow state machines with validation loops

 **Key Features:**
- **Stateless design:** All conversation state lives in user containers or Qdrant
+- **Stateless design:** All conversation state lives in user containers
 - **Multi-channel support:** WebSocket, Telegram (future: mobile, Discord, Slack)
 - **Kubernetes-native:** Uses k8s API for container management
 - **Three-tier memory:**
  - Redis: Hot storage, active sessions, LangGraph checkpoints (1 hour TTL)
-  - Qdrant: Vector search, RAG, global + user knowledge, GDPR-compliant
  - Iceberg: Cold storage, full history, analytics, time-travel queries

 **Infrastructure:**
@@ -270,12 +266,6 @@ Exchange API → Ingestor → Kafka → Flink → Iceberg
 - Redis-compatible in-memory cache
 - Session state, rate limiting, hot data

-#### Qdrant
- Vector database for RAG
- **Global knowledge** (user_id="0"): Platform capabilities, trading concepts, strategy patterns
- **User knowledge** (user_id=specific): Personal conversations, preferences, strategies
- GDPR-compliant (indexed by user_id for fast deletion)
-
 #### PostgreSQL
 - Iceberg catalog metadata
 - User accounts and license info (gateway)
@@ -458,17 +448,11 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
   │      - context://workspace-state
   │      - context://system-prompt
   │
-   ├─→ b. RAGRetriever searches Qdrant for relevant memories:
-   │      - Embeds user query
-   │      - Searches: user_id IN (current_user, "0")
-   │      - Returns user-specific + global platform knowledge
-   │
-   ├─→ c. Build system prompt:
+   ├─→ b. Build system prompt:
   │      - Base platform prompt
   │      - User profile context
   │      - Workspace state
   │      - Custom user instructions
-   │      - Relevant RAG memories
   │
   ├─→ d. ModelRouter selects LLM:
   │      - Based on license tier
@@ -492,8 +476,6 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
 **Key Architecture:**
 - **Gateway is stateless:** No conversation history stored in gateway
 - **User context in MCP:** All user-specific data lives in user's container
- **Global knowledge in Qdrant:** Platform documentation loaded from `gateway/knowledge/`
- **RAG at gateway level:** Semantic search combines global + user knowledge
 - **Skills vs Subagents:**
  - Skills: Well-defined, single-purpose tasks
  - Subagents: Complex domain expertise with multi-file context
@@ -630,7 +612,6 @@ See [[backend_redesign]] for detailed notes.
 - Historical backfill service

 **Phase 3: Agent Features**
- RAG integration (Qdrant)
 - Strategy backtesting
 - Risk management tools
 - Portfolio analytics