feat: add @tag model override support and remove Qdrant dependencies

- Add model-tags parser for @tag syntax in chat messages
- Support Anthropic models (Sonnet, Haiku, Opus) via @tag
- Remove Qdrant vector database from infrastructure and configs
- Simplify license model config to use null fallbacks
- Add greeting stream after model switch via @tag
- Fix protobuf field names to camelCase for v7 compatibility
- Add 429 rate limit retry logic with exponential backoff
- Remove RAG references from agent harness documentation
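The 429 retry bullet above can be sketched roughly as follows. This is an illustrative sketch only: `backoffDelayMs`, `callWithRetry`, the default retry count, and the `status` field on the thrown error are assumptions, not the gateway's actual implementation.

```typescript
// Illustrative sketch: retry a provider call on HTTP 429 with exponential
// backoff. Names and defaults are invented for this example.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  // 500 ms, 1 s, 2 s, 4 s, ... capped at maxMs
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function callWithRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      // Only rate-limit errors are retried; anything else propagates.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```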
2026-04-27 20:55:18 -04:00
parent 6f937f9e5e
commit d41fcd0499
50 changed files with 956 additions and 798 deletions


@@ -22,20 +22,20 @@ The Agent Harness is the core orchestration layer for the Dexorder AI platform,
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ │
-│ │ MCP │ │ LLM │ │ RAG │ │
-│ │ Connector│ │ Router │ │ Retriever│ │
-│ └────┬─────┘ └────┬─────┘ └────┬─────┘
-│ │ │
-└─────────┼──────────────────┼──────────────────┼─────────────
-│ │
-▼ ▼
-┌────────────┐ ┌───────────┐ ┌───────────┐
-│ User's │ │ LLM │ │ Qdrant │
-│ MCP │ │ Providers │ │ (Vectors) │
-│ Container │ │(Anthropic,│ │ │
-│ (k8s pod) │ │ OpenAI, │ │ Global + │
-│ │ │ etc) │ │ User │
-└────────────┘ └───────────┘ └───────────┘
+│ │ MCP │ │ LLM │ │
+│ │ Connector│ │ Router │ │
+│ └────┬─────┘ └────┬─────┘ │
+│ │ │ │
+└─────────┼──────────────────┼─────────────┘
+│ │
+▼ ▼
+┌────────────┐ ┌───────────┐
+│ User's │ │ LLM │
+│ MCP │ │ Providers │
+│ Container │ │(Anthropic,│
+│ (k8s pod) │ │ OpenAI, │
+│ │ │ etc) │
+└────────────┘ └───────────┘
```
## Message Processing Flow
@@ -57,17 +57,11 @@ When a user sends a message:
│ - context://workspace-state
│ - context://system-prompt
-├─→ b. RAGRetriever searches for relevant memories:
-│ - Embeds user query
-│ - Searches Qdrant: user_id = current_user OR user_id = "0"
-│ - Returns user-specific + global platform knowledge
-├─→ c. Build system prompt:
+├─→ b. Build system prompt:
│ - Base platform prompt
│ - User profile context
│ - Workspace state
│ - Custom user instructions
-│ - Relevant RAG memories
├─→ d. ModelRouter selects LLM:
│ - Based on license tier
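The tier-based selection in the ModelRouter step could look roughly like this sketch. The tier names, model identifiers, and `selectModel` signature are invented for illustration; the real mapping is not shown in this diff.

```typescript
// Hypothetical tier → model mapping; actual tiers and model IDs may differ.
type LicenseTier = "free" | "pro" | "enterprise";

const TIER_DEFAULTS: Record<LicenseTier, string> = {
  free: "haiku",
  pro: "sonnet",
  enterprise: "opus",
};

// An explicit @tag override (e.g. "@opus" in the message) wins over the
// tier default; otherwise fall back to the tier's model.
function selectModel(tier: LicenseTier, tagOverride?: string): string {
  return tagOverride ?? TIER_DEFAULTS[tier];
}
```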
@@ -92,11 +86,10 @@ When a user sends a message:
### 1. Agent Harness (`gateway/src/harness/agent-harness.ts`)
-**Stateless orchestrator** - all state lives in user's MCP container or RAG.
+**Stateless orchestrator** - all state lives in user's MCP container.
**Responsibilities:**
- Fetch context from user's MCP resources
-- Query RAG for relevant memories
- Build prompts with full context
- Route to appropriate LLM
- Handle tool calls (platform vs user)
@@ -141,40 +134,12 @@ Routes queries to appropriate LLM based on:
- LangGraph checkpoints (1 hour TTL)
- Fast reads for active conversations
-**Qdrant** (Vector Search)
-- Conversation embeddings
-- User-specific memories (user_id = actual user ID)
-- **Global platform knowledge** (user_id = "0")
-- RAG retrieval with cosine similarity
-- GDPR-compliant (indexed by user_id for fast deletion)
**Iceberg** (Cold Storage)
- Full conversation history (partitioned by user_id, session_id)
- Checkpoint snapshots for replay
- Analytics and time-travel queries
- GDPR-compliant with compaction
-#### RAG System:
-**Global Knowledge** (user_id="0"):
-- Platform capabilities and architecture
-- Trading concepts and fundamentals
-- Indicator development guides
-- Strategy patterns and examples
-- Loaded from `gateway/knowledge/` markdown files
-**User Knowledge** (user_id=specific user):
-- Personal conversation history
-- Trading preferences and style
-- Custom indicators and strategies
-- Workspace state and context
-**Query Flow:**
-1. User query is embedded using EmbeddingService
-2. Qdrant searches: `user_id IN (current_user, "0")`
-3. Top-K relevant chunks returned
-4. Added to LLM context automatically
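The Query Flow removed by this commit amounts to a filtered top-K similarity search. The sketch below is a self-contained in-memory approximation: `Chunk`, `cosine`, and `retrieve` are invented stand-ins for the EmbeddingService output and the Qdrant client, not code from the repository.

```typescript
// Illustrative in-memory stand-in for the removed Qdrant-backed retrieval.
interface Chunk {
  userId: string;
  text: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Match the current user's chunks OR global ones (user_id "0"),
// then return the top-K by cosine similarity.
function retrieve(query: number[], chunks: Chunk[], userId: string, k = 3): Chunk[] {
  return chunks
    .filter((c) => c.userId === userId || c.userId === "0")
    .map((c) => ({ c, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}
```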
### 5. Skills vs Subagents
#### Skills (`gateway/src/harness/skills/`)
@@ -290,44 +255,6 @@ User's MCP container provides access to:
- Tactical order generators (TWAP, iceberg, etc.)
- Smart order routing
-## Global Knowledge Management
-### Document Loading
-At gateway startup:
-1. DocumentLoader scans `gateway/knowledge/` directory
-2. Markdown files chunked by headers (~1000 tokens/chunk)
-3. Embeddings generated via EmbeddingService
-4. Stored in Qdrant with user_id="0"
-5. Content hashing enables incremental updates
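Steps 2 and 5 of the document-loading pipeline can be sketched as below. `chunkByHeaders` is an invented name, and the real DocumentLoader's ~1000-token chunk sizing is omitted; this only shows header-based splitting plus content hashing for incremental updates.

```typescript
import { createHash } from "node:crypto";

// Hedged sketch: split markdown into header-led chunks and hash each chunk's
// content so unchanged chunks can be skipped on reload.
interface KnowledgeChunk {
  heading: string;
  body: string;
  hash: string;
}

function chunkByHeaders(markdown: string): KnowledgeChunk[] {
  // Split at lines starting a level-1 or level-2 header, keeping the header.
  const parts = markdown.split(/^(?=#{1,2} )/m).filter((p) => p.trim().length > 0);
  return parts.map((part) => {
    const [heading, ...rest] = part.split("\n");
    return {
      heading: heading.trim(),
      body: rest.join("\n").trim(),
      hash: createHash("sha256").update(part).digest("hex"),
    };
  });
}
```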
-### Directory Structure
-```
-gateway/knowledge/
-├── platform/ # Platform capabilities
-├── trading/ # Trading fundamentals
-├── indicators/ # Indicator development
-└── strategies/ # Strategy patterns
-```
-### Updating Knowledge
-**Development:**
-```bash
-curl -X POST http://localhost:3000/admin/reload-knowledge
-```
-**Production:**
-- Update markdown files
-- Deploy new version
-- Auto-loaded on startup
-**Monitoring:**
-```bash
-curl http://localhost:3000/admin/knowledge-stats
-```
## Container Lifecycle
### User Container Creation
@@ -362,7 +289,6 @@ When user connects:
### ✅ Completed
- Agent Harness with MCP integration
- Model routing with license tiers
-- RAG retriever with Qdrant
- Document loader for global knowledge
- EmbeddingService (Ollama/OpenAI)
- Skills and subagents framework
@@ -388,5 +314,4 @@ When user connects:
- Documentation: `gateway/src/harness/README.md`
- Knowledge base: `gateway/knowledge/`
- LangGraph: https://langchain-ai.github.io/langgraphjs/
-- Qdrant: https://qdrant.tech/documentation/
- MCP Spec: https://modelcontextprotocol.io/