backend redesign

This commit is contained in:
2026-03-11 18:47:11 -04:00
parent 8ff277c8c6
commit e99ef5d2dd
210 changed files with 12147 additions and 155 deletions

doc/backend_redesign.md Normal file

@@ -0,0 +1,110 @@
# aiignore
# This is not implemented yet; these are just notes for Tim
# Overview
We need a realtime data system that is scalable and durable, so we have the following architecture:
* Protobufs over ZeroMQ for data streaming
* Ingestors
* Realtime data subscriptions (tick data)
* Historical data queries (OHLC)
* Everything pushes to Kafka topics
* Kafka
* Durable append logs for incoming and in-process data
* Topics maintained by Flink in redesign/flink/src/main/resources/topics.yaml
* Flink
* Raw ingestor streams are read from Kafka
* Deduplication
* Builds OHLCs
* Apache Iceberg
* Historical data storage
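The tick-to-OHLC step that Flink performs can be sketched as a plain-Python reduction over one window of ticks. This is a stand-in for the real Flink windowed aggregate (which would key by exchange/market and window by period); the `(price, size)` tuple shape is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class OHLC:
    open: float
    high: float
    low: float
    close: float
    volume: float

def build_ohlc(ticks):
    """Aggregate (price, size) ticks into one OHLC bar.

    Stand-in for the Flink windowed aggregation; a real job would
    key by (exchange_id, market_id) and window by period.
    """
    if not ticks:
        raise ValueError("empty window")
    prices = [p for p, _ in ticks]
    return OHLC(
        open=prices[0],
        high=max(prices),
        low=min(prices),
        close=prices[-1],
        volume=sum(s for _, s in ticks),
    )
```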
# Configuration
All systems should use two YAML configuration files that are mounted by k8s from a ConfigMap and/or Secrets. Keep secrets separate from config.
When a configuration or secrets item is needed, describe it in redesign/doc/config.md
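A minimal sketch of the two-file split, assuming the ConfigMap and Secret are each mounted as a YAML file (loaded here as plain dicts; in practice `yaml.safe_load` on the two mounted paths):

```python
def load_settings(config: dict, secrets: dict) -> dict:
    """Merge non-secret config with secrets into one settings mapping.

    config  -- values mounted from the k8s ConfigMap (safe to log)
    secrets -- values mounted from a k8s Secret (never log these)

    Keeping the two files disjoint makes it obvious where each key
    belongs, so any overlap is treated as a mistake.
    """
    overlap = config.keys() & secrets.keys()
    if overlap:
        raise ValueError(f"keys defined in both config and secrets: {sorted(overlap)}")
    merged = dict(config)
    merged.update(secrets)
    return merged
```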
# Ingest
Ingestion API
* all symbols
* exchange id (BINANCE)
* market_id (BTC/USDT)
* market_type
* Spot
* description (Bitcoin/Tether on Binance)
* column names (['open', 'high', 'low', 'close', 'volume', 'taker_vol', 'maker_vol'])
* name
* exchange
* base asset
* quote asset
* earliest time
* tick size
* supported periods
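The per-symbol metadata above could travel as a record like this; the field names and defaults are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolMeta:
    """Per-symbol metadata as described in the Ingestion API notes."""
    exchange_id: str                 # e.g. "BINANCE"
    market_id: str                   # e.g. "BTC/USDT"
    market_type: str                 # e.g. "spot"
    description: str                 # e.g. "Bitcoin/Tether on Binance"
    columns: list                    # column names for the series
    name: str = ""
    base_asset: str = ""
    quote_asset: str = ""
    earliest_time: int = 0           # epoch ms of earliest known data
    tick_size: float = 0.0
    supported_periods: tuple = ("1m", "5m", "1h", "1d")
```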
* Centralized data streaming backend
* Ingestion of tick, OHLC, news, etc. into Kafka by worker gatherers
* Flink with:
* zmq pubsub
* (seq, time) key for every row in a tick series
* every series also has seq->time and time->seq indexes
* Sequence tick series with strict seq numbers AND a time index (seq can just be an autoincrementing row counter)
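The (seq, time) keying with both lookups can be sketched as below, assuming seq is an autoincrementing row counter and timestamps arrive non-decreasing; the class name is hypothetical.

```python
import bisect

class TickSeries:
    """Keeps parallel seq->time and time->seq lookups for one tick stream."""

    def __init__(self):
        self.times = []  # index = seq (autoincrement), value = timestamp

    def append(self, ts: int) -> int:
        """Record a tick's timestamp; returns its strict, gap-free seq."""
        seq = len(self.times)
        self.times.append(ts)
        return seq

    def seq_to_time(self, seq: int) -> int:
        return self.times[seq]

    def time_to_seq(self, ts: int) -> int:
        """First seq with timestamp >= ts (times are non-decreasing)."""
        return bisect.bisect_left(self.times, ts)
```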
* Historical data
* Apache Iceberg
* Clients query here first
* Backfill service
* Quote Server
* Realtime current prices for selected quote currencies
* Workspace
* Current chart, indicators, drawings, etc.
* Always in context, so it must be brief. Data series are stored as references, not the actual data.
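Because the workspace must stay small enough to live in context, series appear only by reference; a hypothetical shape for that:

```python
from dataclasses import dataclass, field

@dataclass
class SeriesRef:
    """Pointer to data held elsewhere (Iceberg/Kafka), not the rows themselves."""
    exchange_id: str
    market_id: str
    period: str   # e.g. "1h"

@dataclass
class Workspace:
    """Current chart state; indicators/drawings are names+params, not data."""
    chart: SeriesRef
    indicators: list = field(default_factory=list)
    drawings: list = field(default_factory=list)
```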
* Analysis
* Analysis engines are short-running and always tied to a user
* Free users lose their pod and data when the session times out
* Conda available with many preinstalled packages
* Pip & Conda configured to install
* Src dir r/w with git
* Indicators
* Strategies
* Analysis
* Request Context
* User ID
* Workspace ID
* Channel
* Telegram
* Web
* Website
* Current vue site
* Gateway
* Websocket gateway
* Authentication
* User Featureset / License Info added to requests/headers
* Relays data pub/sub to web/mobile clients
* Routes agent chat to/from user container
* Active channel features
* TV Chart
* Text chat
* Plot out
* Voice/Audio
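One way the gateway could attach featureset/license info to upstream requests after authentication; the header names here are made up for illustration, not a decided convention.

```python
def add_license_headers(headers: dict, user: dict) -> dict:
    """Annotate an upstream request with the caller's license info.

    Hypothetical header names; the real gateway would define its own.
    Returns a new mapping so the original request headers are untouched.
    """
    out = dict(headers)
    out["X-User-Id"] = user["id"]
    out["X-License-Tier"] = user.get("tier", "free")
    out["X-Features"] = ",".join(sorted(user.get("features", [])))
    return out
```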
* Static file server
* Kafka
* Temp Gateway files (image responses, etc.)
* Logs
* Kafka
* Strategy Logs
* Order/Execution Logs
* Chat Logs
* Per-user topics have a TTL based on license
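The license-based TTL for a user's log topic could map from tier to Kafka retention like this; the tier names and day counts are placeholders, not decided values.

```python
# Retention per license tier, in days (placeholder numbers).
LOG_TTL_DAYS = {"free": 7, "pro": 90, "enterprise": 365}

def log_retention_ms(tier: str) -> int:
    """retention.ms for a user's log topic; unknown tiers fall back to free."""
    days = LOG_TTL_DAYS.get(tier, LOG_TTL_DAYS["free"])
    return days * 24 * 60 * 60 * 1000
```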
* Agent Framework
* Soul file
* Tool set (incl subagents)
* LLM choice
* RAG namespace
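The four pieces of the agent framework above could be bundled into one spec record per agent; the field names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """One agent definition: soul file, tools, model choice, RAG namespace."""
    soul_file: str                               # path to persona/prompt file
    tools: list = field(default_factory=list)    # tool names, incl. subagents
    llm: str = "default"                         # model choice
    rag_namespace: str = ""                      # retrieval namespace
```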
* Agents
* Top-level coordinator
* TradingView agent
* Indicators, Drawings, Annotations
* Research Agent
* Pandas/Polars analysis
* Plot generation
* License Manager
* Kafka Topics Doc w/ schemas