feat: add @tag model override support and remove Qdrant dependencies

- Add model-tags parser for @tag syntax in chat messages
- Support Anthropic models (Sonnet, Haiku, Opus) via @tag
- Remove Qdrant vector database from infrastructure and configs
- Simplify license model config to use null fallbacks
- Add greeting stream after model switch via @tag
- Fix protobuf field names to camelCase for v7 compatibility
- Add 429 rate limit retry logic with exponential backoff
- Remove RAG references from agent harness documentation
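The 429 retry bullet above can be sketched roughly as follows. This is an illustrative sketch only: `backoffDelayMs`, `callWithRetry`, the default retry count, and the `status` field on the thrown error are assumptions, not the gateway's actual implementation.

```typescript
// Illustrative sketch: retry a provider call on HTTP 429 with exponential
// backoff. Names and defaults are invented for this example.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  // 500 ms, 1 s, 2 s, 4 s, ... capped at maxMs
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      // Only rate-limit errors are retried; anything else propagates.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```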
2026-04-27 20:55:18 -04:00
parent 6f937f9e5e
commit d41fcd0499
50 changed files with 956 additions and 798 deletions


@@ -22,20 +22,20 @@ The Agent Harness is the core orchestration layer for the Dexorder AI platform,
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ │
-│ │ MCP │ │ LLM │ │ RAG │ │
-│ │ Connector│ │ Router │ │ Retriever│ │
-│ └────┬─────┘ └────┬─────┘ └────┬─────┘
-│ │ │
-└─────────┼──────────────────┼──────────────────┼─────────────
-│ │
-▼ ▼
-┌────────────┐ ┌───────────┐ ┌───────────┐
-│ User's │ │ LLM │ │ Qdrant │
-│ MCP │ │ Providers │ │ (Vectors) │
-│ Container │ │(Anthropic,│ │ │
-│ (k8s pod) │ │ OpenAI, │ │ Global + │
-│ │ │ etc) │ │ User │
-└────────────┘ └───────────┘ └───────────┘
+│ │ MCP │ │ LLM │ │
+│ │ Connector│ │ Router │ │
+│ └────┬─────┘ └────┬─────┘ │
+│ │ │ │
+└─────────┼──────────────────┼─────────────┘
+│ │
+▼ ▼
+┌────────────┐ ┌───────────┐
+│ User's │ │ LLM │
+│ MCP │ │ Providers │
+│ Container │ │(Anthropic,│
+│ (k8s pod) │ │ OpenAI, │
+│ │ │ etc) │
+└────────────┘ └───────────┘
```
## Message Processing Flow
@@ -57,17 +57,11 @@ When a user sends a message:
│ - context://workspace-state
│ - context://system-prompt
-├─→ b. RAGRetriever searches for relevant memories:
-│ - Embeds user query
-│ - Searches Qdrant: user_id = current_user OR user_id = "0"
-│ - Returns user-specific + global platform knowledge
-├─→ c. Build system prompt:
+├─→ b. Build system prompt:
│ - Base platform prompt
│ - User profile context
│ - Workspace state
│ - Custom user instructions
-│ - Relevant RAG memories
├─→ d. ModelRouter selects LLM:
│ - Based on license tier
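The tier-based selection in the ModelRouter step could look roughly like this sketch. The tier names, model identifiers, and `selectModel` signature are invented for illustration; the real mapping is not shown in this diff.

```typescript
// Hypothetical tier → model mapping; actual tiers and model IDs may differ.
type LicenseTier = "free" | "pro" | "enterprise";

const TIER_DEFAULTS: Record<LicenseTier, string> = {
  free: "haiku",
  pro: "sonnet",
  enterprise: "opus",
};

// An explicit @tag override (e.g. "@opus" in the message) wins over the
// tier default; otherwise fall back to the tier's model.
function selectModel(tier: LicenseTier, tagOverride?: string): string {
  return tagOverride ?? TIER_DEFAULTS[tier];
}
```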
@@ -92,11 +86,10 @@ When a user sends a message:
### 1. Agent Harness (`gateway/src/harness/agent-harness.ts`)
-**Stateless orchestrator** - all state lives in user's MCP container or RAG.
+**Stateless orchestrator** - all state lives in user's MCP container.
**Responsibilities:**
- Fetch context from user's MCP resources
-- Query RAG for relevant memories
- Build prompts with full context
- Route to appropriate LLM
- Handle tool calls (platform vs user)
@@ -141,40 +134,12 @@ Routes queries to appropriate LLM based on:
- LangGraph checkpoints (1 hour TTL)
- Fast reads for active conversations
-**Qdrant** (Vector Search)
-- Conversation embeddings
-- User-specific memories (user_id = actual user ID)
-- **Global platform knowledge** (user_id = "0")
-- RAG retrieval with cosine similarity
-- GDPR-compliant (indexed by user_id for fast deletion)
**Iceberg** (Cold Storage)
- Full conversation history (partitioned by user_id, session_id)
- Checkpoint snapshots for replay
- Analytics and time-travel queries
- GDPR-compliant with compaction
-#### RAG System:
-**Global Knowledge** (user_id="0"):
-- Platform capabilities and architecture
-- Trading concepts and fundamentals
-- Indicator development guides
-- Strategy patterns and examples
-- Loaded from `gateway/knowledge/` markdown files
-**User Knowledge** (user_id=specific user):
-- Personal conversation history
-- Trading preferences and style
-- Custom indicators and strategies
-- Workspace state and context
-**Query Flow:**
-1. User query is embedded using EmbeddingService
-2. Qdrant searches: `user_id IN (current_user, "0")`
-3. Top-K relevant chunks returned
-4. Added to LLM context automatically
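The Query Flow removed by this commit amounts to a filtered top-K similarity search. The sketch below is a self-contained in-memory approximation: `Chunk`, `cosine`, and `retrieve` are invented stand-ins for the EmbeddingService output and the Qdrant client, not code from the repository.

```typescript
// Illustrative in-memory stand-in for the removed Qdrant-backed retrieval.
interface Chunk {
  userId: string;
  text: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Match the current user's chunks OR global ones (user_id "0"),
// then return the top-K by cosine similarity.
function retrieve(query: number[], chunks: Chunk[], userId: string, k = 3): Chunk[] {
  return chunks
    .filter((c) => c.userId === userId || c.userId === "0")
    .map((c) => ({ c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}
```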
### 5. Skills vs Subagents
#### Skills (`gateway/src/harness/skills/`)
@@ -290,44 +255,6 @@ User's MCP container provides access to:
- Tactical order generators (TWAP, iceberg, etc.)
- Smart order routing
-## Global Knowledge Management
-### Document Loading
-At gateway startup:
-1. DocumentLoader scans `gateway/knowledge/` directory
-2. Markdown files chunked by headers (~1000 tokens/chunk)
-3. Embeddings generated via EmbeddingService
-4. Stored in Qdrant with user_id="0"
-5. Content hashing enables incremental updates
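Steps 2 and 5 of the document-loading pipeline can be sketched as below. `chunkByHeaders` is an invented name, and the real DocumentLoader's ~1000-token chunk sizing is omitted; this only shows header-based splitting plus content hashing for incremental updates.

```typescript
import { createHash } from "node:crypto";

// Hedged sketch: split markdown into header-led chunks and hash each chunk's
// content so unchanged chunks can be skipped on reload.
interface KnowledgeChunk {
  heading: string;
  body: string;
  hash: string;
}

function chunkByHeaders(markdown: string): KnowledgeChunk[] {
  // Split at lines starting a level-1 or level-2 header, keeping the header.
  const parts = markdown.split(/^(?=#{1,2} )/m).filter((p) => p.trim().length > 0);
  return parts.map((part) => {
    const [heading, ...rest] = part.split("\n");
    return {
      heading: heading.trim(),
      body: rest.join("\n").trim(),
      hash: createHash("sha256").update(part).digest("hex"),
    };
  });
}
```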
-### Directory Structure
-```
-gateway/knowledge/
-├── platform/ # Platform capabilities
-├── trading/ # Trading fundamentals
-├── indicators/ # Indicator development
-└── strategies/ # Strategy patterns
-```
-### Updating Knowledge
-**Development:**
-```bash
-curl -X POST http://localhost:3000/admin/reload-knowledge
-```
-**Production:**
-- Update markdown files
-- Deploy new version
-- Auto-loaded on startup
-**Monitoring:**
-```bash
-curl http://localhost:3000/admin/knowledge-stats
-```
## Container Lifecycle
### User Container Creation
@@ -362,7 +289,6 @@ When user connects:
### ✅ Completed
- Agent Harness with MCP integration
- Model routing with license tiers
-- RAG retriever with Qdrant
- Document loader for global knowledge
- EmbeddingService (Ollama/OpenAI)
- Skills and subagents framework
@@ -388,5 +314,4 @@ When user connects:
- Documentation: `gateway/src/harness/README.md`
- Knowledge base: `gateway/knowledge/`
- LangGraph: https://langchain-ai.github.io/langgraphjs/
-- Qdrant: https://qdrant.tech/documentation/
- MCP Spec: https://modelcontextprotocol.io/