feat: add @tag model override support and remove Qdrant dependencies

- Add model-tags parser for @Tag syntax in chat messages
- Support Anthropic models (Sonnet, Haiku, Opus) via @tag
- Remove Qdrant vector database from infrastructure and configs
- Simplify license model config to use null fallbacks
- Add greeting stream after model switch via @tag
- Fix protobuf field names to camelCase for v7 compatibility
- Add 429 rate limit retry logic with exponential backoff
- Remove RAG references from agent harness documentation
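
For readers without the gateway source at hand, here is a minimal sketch of two behaviours named in the list above: pulling an @tag model override out of a chat message, and retrying a rate-limited call with exponential backoff. Everything in it is an assumption made for illustration. The names `parseModelTag`, `backoffDelayMs` and `withRateLimitRetry`, the accepted tag spellings, and the 500 ms base / 30 s cap are hypothetical and are not taken from this commit's code.

```typescript
// Hypothetical sketch: names, tag spellings and timings are assumptions,
// not the gateway's actual implementation.

type ModelFamily = "sonnet" | "haiku" | "opus";

interface TagParseResult {
  model: ModelFamily | null; // null: no override, keep the configured default
  text: string;              // message with the tag removed
}

// "@sonnet", "@Haiku", "@OPUS" as a whole word, at the start of the message
// or after whitespace (so "bob@sonnet.io" is not treated as a tag).
const TAG_PATTERN = /(^|\s)@(sonnet|haiku|opus)\b/i;

function parseModelTag(message: string): TagParseResult {
  const match = TAG_PATTERN.exec(message);
  if (!match) {
    return { model: null, text: message };
  }
  const model = (match[2] ?? "").toLowerCase() as ModelFamily;
  // Drop the first tag, then tidy the doubled spaces it leaves behind.
  const text = message
    .replace(TAG_PATTERN, "$1")
    .replace(/[ \t]{2,}/g, " ")
    .trim();
  return { model, text };
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries `call` while `isRateLimited(err)` is true (e.g. an HTTP 429),
// sleeping with exponential backoff between attempts.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean,
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxRetries) {
        throw err;
      }
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Under these assumptions, `parseModelTag("@Opus explain this chart")` yields the override `"opus"` and the cleaned text `"explain this chart"`, while a message without a tag comes back unchanged with `model: null`. A production version would likely add jitter to the delay and honour a `Retry-After` header when the provider sends one.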
2026-04-27 20:55:18 -04:00
parent 6f937f9e5e
commit d41fcd0499
50 changed files with 956 additions and 798 deletions
--- a/doc/CLUSTER_SETUP.md
+++ b/doc/CLUSTER_SETUP.md
@@ -10,7 +10,7 @@ The platform runs across two namespaces:

 | Namespace | Contents |
 |-----------|----------|
-| `ai` | Gateway, web UI, all infrastructure services (postgres, minio, kafka, flink, relay, ingestor, qdrant, dragonfly, iceberg-catalog) |
+| `ai` | Gateway, web UI, all infrastructure services (postgres, minio, kafka, flink, relay, ingestor, dragonfly, iceberg-catalog) |
 | `sandbox` | Per-user sandbox containers (created dynamically by the gateway) |

 Secrets are managed via 1Password CLI (`op inject`). All `.tpl.yaml` files in `deploy/k8s/prod/secrets/` contain `op://` references and are safe to commit; actual values are never stored in git.
@@ -217,7 +217,7 @@ kubectl --context=prod -n ai get configmaps

 ## Step 7 — Deploy Infrastructure

-Infrastructure services (postgres, minio, kafka, iceberg-catalog, dragonfly, qdrant, relay, ingestor, flink) are defined in `deploy/k8s/prod/infrastructure.yaml` and were applied in Step 4.
+Infrastructure services (postgres, minio, kafka, iceberg-catalog, dragonfly, relay, ingestor, flink) are defined in `deploy/k8s/prod/infrastructure.yaml` and were applied in Step 4.

 Wait for the StatefulSets and Deployments to become ready:

@@ -225,7 +225,6 @@ Wait for the StatefulSets and Deployments to become ready:
 kubectl --context=prod -n ai rollout status statefulset/postgres
 kubectl --context=prod -n ai rollout status statefulset/minio
 kubectl --context=prod -n ai rollout status statefulset/kafka
-kubectl --context=prod -n ai rollout status statefulset/qdrant
 kubectl --context=prod -n ai rollout status deployment/dragonfly
 kubectl --context=prod -n ai rollout status deployment/iceberg-catalog
 kubectl --context=prod -n ai rollout status deployment/relay
--- a/doc/agent_harness.md
+++ b/doc/agent_harness.md
@@ -22,20 +22,20 @@ The Agent Harness is the core orchestration layer for the Dexorder AI platform,
 │         ┌──────────────────┼──────────────────┐             │
 │         │                  │                  │             │
 │    ┌────▼─────┐      ┌────▼─────┐      ┌────▼─────┐       │
-│    │   MCP    │      │   LLM    │      │   RAG    │       │
-│    │ Connector│      │  Router  │      │ Retriever│       │
-│    └────┬─────┘      └────┬─────┘      └────┬─────┘       │
-│         │                  │                  │             │
-└─────────┼──────────────────┼──────────────────┼─────────────┘
-          │                  │                  │
-          ▼                  ▼                  ▼
-   ┌────────────┐     ┌───────────┐     ┌───────────┐
-   │   User's   │     │    LLM    │     │  Qdrant   │
-   │    MCP     │     │ Providers │     │ (Vectors) │
-   │ Container  │     │(Anthropic,│     │           │
-   │ (k8s pod)  │     │  OpenAI,  │     │  Global + │
-   │            │     │   etc)    │     │   User    │
-   └────────────┘     └───────────┘     └───────────┘
+│    │   MCP    │      │   LLM    │      │
+│    │ Connector│      │  Router  │      │
+│    └────┬─────┘      └────┬─────┘      │
+│         │                  │             │
+└─────────┼──────────────────┼─────────────┘
+          │                  │
+          ▼                  ▼
+   ┌────────────┐     ┌───────────┐
+   │   User's   │     │    LLM    │
+   │    MCP     │     │ Providers │
+   │ Container  │     │(Anthropic,│
+   │ (k8s pod)  │     │  OpenAI,  │
+   │            │     │   etc)    │
+   └────────────┘     └───────────┘
 ```

 ## Message Processing Flow
@@ -57,17 +57,11 @@ When a user sends a message:
   │      - context://workspace-state
   │      - context://system-prompt
   │
-   ├─→ b. RAGRetriever searches for relevant memories:
-   │      - Embeds user query
-   │      - Searches Qdrant: user_id = current_user OR user_id = "0"
-   │      - Returns user-specific + global platform knowledge
-   │
-   ├─→ c. Build system prompt:
+   ├─→ b. Build system prompt:
   │      - Base platform prompt
   │      - User profile context
   │      - Workspace state
   │      - Custom user instructions
-   │      - Relevant RAG memories
   │
   ├─→ d. ModelRouter selects LLM:
   │      - Based on license tier
@@ -92,11 +86,10 @@ When a user sends a message:

 ### 1. Agent Harness (`gateway/src/harness/agent-harness.ts`)

-**Stateless orchestrator** - all state lives in user's MCP container or RAG.
+**Stateless orchestrator** - all state lives in user's MCP container.

 **Responsibilities:**
 - Fetch context from user's MCP resources
- Query RAG for relevant memories
 - Build prompts with full context
 - Route to appropriate LLM
 - Handle tool calls (platform vs user)
@@ -141,40 +134,12 @@ Routes queries to appropriate LLM based on:
 - LangGraph checkpoints (1 hour TTL)
 - Fast reads for active conversations

-**Qdrant** (Vector Search)
- Conversation embeddings
- User-specific memories (user_id = actual user ID)
- **Global platform knowledge** (user_id = "0")
- RAG retrieval with cosine similarity
- GDPR-compliant (indexed by user_id for fast deletion)
-
 **Iceberg** (Cold Storage)
 - Full conversation history (partitioned by user_id, session_id)
 - Checkpoint snapshots for replay
 - Analytics and time-travel queries
 - GDPR-compliant with compaction

-#### RAG System:
-
-**Global Knowledge** (user_id="0"):
- Platform capabilities and architecture
- Trading concepts and fundamentals
- Indicator development guides
- Strategy patterns and examples
- Loaded from `gateway/knowledge/` markdown files
-
-**User Knowledge** (user_id=specific user):
- Personal conversation history
- Trading preferences and style
- Custom indicators and strategies
- Workspace state and context
-
-**Query Flow:**
-1. User query is embedded using EmbeddingService
-2. Qdrant searches: `user_id IN (current_user, "0")`
-3. Top-K relevant chunks returned
-4. Added to LLM context automatically
-
 ### 5. Skills vs Subagents

 #### Skills (`gateway/src/harness/skills/`)
@@ -290,44 +255,6 @@ User's MCP container provides access to:
 - Tactical order generators (TWAP, iceberg, etc.)
 - Smart order routing

-## Global Knowledge Management
-
-### Document Loading
-
-At gateway startup:
-1. DocumentLoader scans `gateway/knowledge/` directory
-2. Markdown files chunked by headers (~1000 tokens/chunk)
-3. Embeddings generated via EmbeddingService
-4. Stored in Qdrant with user_id="0"
-5. Content hashing enables incremental updates
-
-### Directory Structure
-
-```
-gateway/knowledge/
-  ├── platform/          # Platform capabilities
-  ├── trading/           # Trading fundamentals
-  ├── indicators/        # Indicator development
-  └── strategies/        # Strategy patterns
-```
-
-### Updating Knowledge
-
-**Development:**
-```bash
-curl -X POST http://localhost:3000/admin/reload-knowledge
-```
-
-**Production:**
- Update markdown files
- Deploy new version
- Auto-loaded on startup
-
-**Monitoring:**
-```bash
-curl http://localhost:3000/admin/knowledge-stats
-```
-
 ## Container Lifecycle

 ### User Container Creation
@@ -362,7 +289,6 @@ When user connects:
 ### ✅ Completed
 - Agent Harness with MCP integration
 - Model routing with license tiers
- RAG retriever with Qdrant
 - Document loader for global knowledge
 - EmbeddingService (Ollama/OpenAI)
 - Skills and subagents framework
@@ -388,5 +314,4 @@ When user connects:
 - Documentation: `gateway/src/harness/README.md`
 - Knowledge base: `gateway/knowledge/`
 - LangGraph: https://langchain-ai.github.io/langgraphjs/
- Qdrant: https://qdrant.tech/documentation/
 - MCP Spec: https://modelcontextprotocol.io/
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -19,7 +19,6 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 │  • Authentication & session management                           │
 │  • Agent Harness (LangChain/LangGraph orchestration)            │
 │    - MCP client connector to user containers                    │
-│    - RAG retriever (Qdrant)                                     │
 │    - Model router (LLM selection)                               │
 │    - Skills & subagents framework                               │
 │  • Dynamic user container provisioning                           │
@@ -30,8 +29,7 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 ┌──────────────────┐  ┌──────────────┐   ┌──────────────────────┐
 │ User Containers  │  │    Relay     │   │   Infrastructure     │
 │ (per-user pods)  │  │ (ZMQ Router) │   │ • DragonflyDB (cache)│
-│                  │  │              │   │ • Qdrant (vectors)   │
-│ • MCP Server     │  │ • Market data│   │ • PostgreSQL (meta)  │
+│ • MCP Server     │  │ • Market data│   │ • PostgreSQL (meta)  │
 │ • User files:    │  │   fanout     │   │ • MinIO (S3)         │
 │   - Indicators   │  │ • Work queue │   │                      │
 │   - Strategies   │  │ • Stateless  │   │                      │
@@ -86,18 +84,16 @@ Dexorder is an AI-powered trading platform that combines real-time market data p
 - **Agent Harness (LangChain/LangGraph):** ([[agent_harness]])
  - Stateless LLM orchestration
  - MCP client connector to user containers
-  - RAG retrieval from Qdrant (global + user-specific knowledge)
  - Model routing based on license tier and complexity
  - Skills and subagents framework
  - Workflow state machines with validation loops

 **Key Features:**
- **Stateless design:** All conversation state lives in user containers or Qdrant
+- **Stateless design:** All conversation state lives in user containers
 - **Multi-channel support:** WebSocket, Telegram (future: mobile, Discord, Slack)
 - **Kubernetes-native:** Uses k8s API for container management
 - **Three-tier memory:**
  - Redis: Hot storage, active sessions, LangGraph checkpoints (1 hour TTL)
-  - Qdrant: Vector search, RAG, global + user knowledge, GDPR-compliant
  - Iceberg: Cold storage, full history, analytics, time-travel queries

 **Infrastructure:**
@@ -270,12 +266,6 @@ Exchange API → Ingestor → Kafka → Flink → Iceberg
 - Redis-compatible in-memory cache
 - Session state, rate limiting, hot data

-#### Qdrant
- Vector database for RAG
- **Global knowledge** (user_id="0"): Platform capabilities, trading concepts, strategy patterns
- **User knowledge** (user_id=specific): Personal conversations, preferences, strategies
- GDPR-compliant (indexed by user_id for fast deletion)
-
 #### PostgreSQL
 - Iceberg catalog metadata
 - User accounts and license info (gateway)
@@ -458,17 +448,11 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
   │      - context://workspace-state
   │      - context://system-prompt
   │
-   ├─→ b. RAGRetriever searches Qdrant for relevant memories:
-   │      - Embeds user query
-   │      - Searches: user_id IN (current_user, "0")
-   │      - Returns user-specific + global platform knowledge
-   │
-   ├─→ c. Build system prompt:
+   ├─→ b. Build system prompt:
   │      - Base platform prompt
   │      - User profile context
   │      - Workspace state
   │      - Custom user instructions
-   │      - Relevant RAG memories
   │
   ├─→ d. ModelRouter selects LLM:
   │      - Based on license tier
@@ -492,8 +476,6 @@ The gateway's agent harness (LangChain/LangGraph) orchestrates LLM interactions
 **Key Architecture:**
 - **Gateway is stateless:** No conversation history stored in gateway
 - **User context in MCP:** All user-specific data lives in user's container
- **Global knowledge in Qdrant:** Platform documentation loaded from `gateway/knowledge/`
- **RAG at gateway level:** Semantic search combines global + user knowledge
 - **Skills vs Subagents:**
  - Skills: Well-defined, single-purpose tasks
  - Subagents: Complex domain expertise with multi-file context
@@ -630,7 +612,6 @@ See [[backend_redesign]] for detailed notes.
 - Historical backfill service

 **Phase 3: Agent Features**
- RAG integration (Qdrant)
 - Strategy backtesting
 - Risk management tools
 - Portfolio analytics
--- a/doc/plan.md
+++ b/doc/plan.md
@@ -14,3 +14,13 @@
 * TradingView indicator import tool
 * Results persistence: ~~research analysis~~, backtests, strategy performance metrics, etc.
 * Free tier with token limits and sandbox shutdown
+* Performance analysis
+* Custom pre-session scanners / summaries
+* Saved prompts (Create /presession prompt command for easy re-use)
+
+
+https://github.com/wangzhe3224/awesome-systematic-trading
+https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3247865 151 trading strategies
+https://vectorbt.dev/
+https://github.com/shiyu-coder/Kronos
+https://x.com/RohOnChain/status/2041180375838498950?s=20 combining signals