backend redesign

This commit is contained in:
2026-03-11 18:47:11 -04:00
parent 8ff277c8c6
commit e99ef5d2dd
210 changed files with 12147 additions and 155 deletions

doc/backend_redesign.md Normal file

@@ -0,0 +1,110 @@
# aiignore
# This is not implemented yet; these are just notes for Tim
# Overview
We need a realtime data system that is scalable and durable, so we have the following architecture:
* Protobufs over ZeroMQ for data streaming
* Ingestors
* Realtime data subscriptions (tick data)
* Historical data queries (OHLC)
* Everything pushes to Kafka topics
* Kafka
* Durable append logs for incoming and in-process data
* Topics maintained by Flink in redesign/flink/src/main/resources/topics.yaml
* Flink
* Raw ingestor streams are read from Kafka
* Deduplication
* Builds OHLCs
* Apache Iceberg
* Historical data storage
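The tick-to-OHLC step that Flink performs can be sketched as a plain-Python reduction over one window of ticks. This is a stand-in for the real Flink windowed aggregate (which would key by exchange/market and window by period); the `(price, size)` tuple shape is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class OHLC:
    open: float
    high: float
    low: float
    close: float
    volume: float

def build_ohlc(ticks):
    """Aggregate (price, size) ticks into one OHLC bar.

    Stand-in for the Flink windowed aggregation; a real job would
    key by (exchange_id, market_id) and window by period.
    """
    if not ticks:
        raise ValueError("empty window")
    prices = [p for p, _ in ticks]
    return OHLC(
        open=prices[0],
        high=max(prices),
        low=min(prices),
        close=prices[-1],
        volume=sum(s for _, s in ticks),
    )
```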
# Configuration
All systems should use two YAML configuration files that are mounted by k8s from a ConfigMap and/or Secrets. Keep secrets separate from config.
When a configuration or secrets item is needed, describe it in redesign/doc/config.md
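A minimal sketch of the two-file split, assuming the ConfigMap and Secret are each mounted as a YAML file (loaded here as plain dicts; in practice `yaml.safe_load` on the two mounted paths):

```python
def load_settings(config: dict, secrets: dict) -> dict:
    """Merge non-secret config with secrets into one settings mapping.

    config  -- values mounted from the k8s ConfigMap (safe to log)
    secrets -- values mounted from a k8s Secret (never log these)

    Keeping the two files disjoint makes it obvious where each key
    belongs, so any overlap is treated as a mistake.
    """
    overlap = config.keys() & secrets.keys()
    if overlap:
        raise ValueError(f"keys defined in both config and secrets: {sorted(overlap)}")
    merged = dict(config)
    merged.update(secrets)
    return merged
```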
# Ingest
Ingestion API
* all symbols
* exchange id (BINANCE)
* market_id (BTC/USDT)
* market_type
* Spot
* description (Bitcoin/Tether on Binance)
* column names (['open', 'high', 'low', 'close', 'volume', 'taker_vol', 'maker_vol'])
* name
* exchange
* base asset
* quote asset
* earliest time
* tick size
* supported periods
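The per-symbol metadata above could travel as a record like this; the field names and defaults are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolMeta:
    """Per-symbol metadata as described in the Ingestion API notes."""
    exchange_id: str                 # e.g. "BINANCE"
    market_id: str                   # e.g. "BTC/USDT"
    market_type: str                 # e.g. "spot"
    description: str                 # e.g. "Bitcoin/Tether on Binance"
    columns: list                    # column names for the series
    name: str = ""
    base_asset: str = ""
    quote_asset: str = ""
    earliest_time: int = 0           # epoch ms of earliest known data
    tick_size: float = 0.0
    supported_periods: tuple = ("1m", "5m", "1h", "1d")
```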
* Centralized data streaming backend
* Ingestion of tick, OHLC, news, etc. into Kafka by worker gatherers
* Flink with:
* zmq pubsub
* (seq, time) key for every row in a tick series
* every series also has seq->time and time->seq indexes
* Sequence tick series with strict seq numbers AND a time index (seq can just be an autoincrementing row counter)
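The (seq, time) keying with both lookups can be sketched as below, assuming seq is an autoincrementing row counter and timestamps arrive non-decreasing; the class name is hypothetical.

```python
import bisect

class TickSeries:
    """Keeps parallel seq->time and time->seq lookups for one tick stream."""

    def __init__(self):
        self.times = []  # index = seq (autoincrement), value = timestamp

    def append(self, ts: int) -> int:
        """Record a tick's timestamp; returns its strict, gap-free seq."""
        seq = len(self.times)
        self.times.append(ts)
        return seq

    def seq_to_time(self, seq: int) -> int:
        return self.times[seq]

    def time_to_seq(self, ts: int) -> int:
        """First seq with timestamp >= ts (times are non-decreasing)."""
        return bisect.bisect_left(self.times, ts)
```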
* Historical data
* Apache Iceberg
* Clients query here first
* Backfill service
* Quote Server
* Realtime current prices for selected quote currencies
* Workspace
* Current chart, indicators, drawings, etc.
* Always in context, so it must be brief. Data series are stored as references, not the actual data.
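Because the workspace must stay small enough to live in context, series appear only by reference; a hypothetical shape for that:

```python
from dataclasses import dataclass, field

@dataclass
class SeriesRef:
    """Pointer to data held elsewhere (Iceberg/Kafka), not the rows themselves."""
    exchange_id: str
    market_id: str
    period: str   # e.g. "1h"

@dataclass
class Workspace:
    """Current chart state; indicators/drawings are names+params, not data."""
    chart: SeriesRef
    indicators: list = field(default_factory=list)
    drawings: list = field(default_factory=list)
```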
* Analysis
* Analysis engines are short-running and always tied to a user
* Free users lose their pod and data when the session times out
* Conda available with many preinstalled packages
* Pip & Conda configured to install
* Src dir r/w with git
* Indicators
* Strategies
* Analysis
* Request Context
* User ID
* Workspace ID
* Channel
* Telegram
* Web
* Website
* Current vue site
* Gateway
* Websocket gateway
* Authentication
* User Featureset / License Info added to requests/headers
* Relays data pub/sub to web/mobile clients
* Routes agent chat to/from user container
* Active channel features
* TV Chart
* Text chat
* Plot out
* Voice/Audio
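One way the gateway could attach featureset/license info to upstream requests after authentication; the header names here are made up for illustration, not a decided convention.

```python
def add_license_headers(headers: dict, user: dict) -> dict:
    """Annotate an upstream request with the caller's license info.

    Hypothetical header names; the real gateway would define its own.
    Returns a new mapping so the original request headers are untouched.
    """
    out = dict(headers)
    out["X-User-Id"] = user["id"]
    out["X-License-Tier"] = user.get("tier", "free")
    out["X-Features"] = ",".join(sorted(user.get("features", [])))
    return out
```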
* Static file server
* Kafka
* Temp Gateway files (image responses, etc.)
* Logs
* Kafka
* Strategy Logs
* Order/Execution Logs
* Chat Logs
* Per-user topics have a TTL based on license
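The license-based TTL for a user's log topic could map from tier to Kafka retention like this; the tier names and day counts are placeholders, not decided values.

```python
# Retention per license tier, in days (placeholder numbers).
LOG_TTL_DAYS = {"free": 7, "pro": 90, "enterprise": 365}

def log_retention_ms(tier: str) -> int:
    """retention.ms for a user's log topic; unknown tiers fall back to free."""
    days = LOG_TTL_DAYS.get(tier, LOG_TTL_DAYS["free"])
    return days * 24 * 60 * 60 * 1000
```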
* Agent Framework
* Soul file
* Tool set (incl subagents)
* LLM choice
* RAG namespace
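The four pieces of the agent framework above could be bundled into one spec record per agent; the field names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """One agent definition: soul file, tools, model choice, RAG namespace."""
    soul_file: str                               # path to persona/prompt file
    tools: list = field(default_factory=list)    # tool names, incl. subagents
    llm: str = "default"                         # model choice
    rag_namespace: str = ""                      # retrieval namespace
```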
* Agents
* Top-level coordinator
* TradingView agent
* Indicators, Drawings, Annotations
* Research Agent
* Pandas/Polars analysis
* Plot generation
* License Manager
* Kafka Topics Doc w/ schemas