# aiignore

# This is not implemented yet; these are just notes for Tim

# Overview

We need a realtime data system that is scalable and durable, so we have the following architecture:

* Protobufs over ZeroMQ for data streaming
* Ingestors
  * Realtime data subscriptions (tick data)
  * Historical data queries (OHLC)
  * Everything pushes to Kafka topics
* Kafka
  * Durable append logs for incoming and in-process data
  * Topics maintained by Flink in `redesign/flink/src/main/resources/topics.yaml`
* Flink
  * Reads raw ingestor streams from Kafka
  * Deduplication
  * Builds OHLCs
* Apache Iceberg
  * Historical data storage

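As a concrete sketch of the "Builds OHLCs" step, here is a pure-Python stand-in for the aggregation Flink would run per window (field names and the `(price, size)` tick shape are assumptions, not the actual schema):

```python
from dataclasses import dataclass

@dataclass
class OHLC:
    open: float
    high: float
    low: float
    close: float
    volume: float

def build_ohlc(ticks):
    """Aggregate a window of (price, size) ticks into one OHLC bar."""
    prices = [price for price, _ in ticks]
    return OHLC(
        open=prices[0],          # first trade in the window
        high=max(prices),
        low=min(prices),
        close=prices[-1],        # last trade in the window
        volume=sum(size for _, size in ticks),
    )
```

In the real pipeline this aggregation would run per (symbol, period) window inside Flink, keyed off the Kafka tick streams.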
# Configuration

All systems should use two YAML configuration files that are mounted by Kubernetes from a ConfigMap and/or Secrets. Keep secrets separate from config.

When a configuration or secrets item is needed, describe it in `redesign/doc/config.md`.

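A deployment fragment for the two-file split might look like this (a sketch only; the service name, file names, and mount paths are assumptions):

```yaml
# Hypothetical pod spec fragment -- names and paths are assumptions
volumes:
  - name: config
    configMap:
      name: myservice-config        # provides config.yaml
  - name: secrets
    secret:
      secretName: myservice-secrets # provides secrets.yaml
containers:
  - name: myservice
    volumeMounts:
      - name: config
        mountPath: /etc/myservice/config.yaml
        subPath: config.yaml
      - name: secrets
        mountPath: /etc/myservice/secrets.yaml
        subPath: secrets.yaml
```

Mounting with `subPath` keeps each file at a fixed path while the ConfigMap and Secret stay independently rotatable.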
# Ingest

Ingestion API

* all symbols
  * exchange id (BINANCE)
  * market_id (BTC/USDT)
  * market_type
    * Spot
  * description (Bitcoin/Tether on Binance)
  * column names (`['open', 'high', 'low', 'close', 'volume', 'taker_vol', 'maker_vol']`)
  * name
  * exchange
  * base asset
  * quote asset
  * earliest time
  * tick size
  * supported periods

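The symbol record the Ingestion API returns could be sketched as a dataclass (field names follow the list above; the types and the epoch-ms convention for `earliest_time` are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class SymbolInfo:
    exchange_id: str          # e.g. "BINANCE"
    market_id: str            # e.g. "BTC/USDT"
    market_type: str          # e.g. "Spot"
    description: str          # e.g. "Bitcoin/Tether on Binance"
    name: str
    exchange: str
    base_asset: str
    quote_asset: str
    earliest_time: int        # epoch ms (assumed)
    tick_size: float
    supported_periods: list = field(default_factory=list)
    column_names: list = field(default_factory=lambda: [
        'open', 'high', 'low', 'close', 'volume', 'taker_vol', 'maker_vol'])
```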
* Centralized data streaming backend
  * Ingestion of tick, OHLC, news, etc. into Kafka by worker gatherers
  * Flink with:
    * ZeroMQ pub/sub
    * (seq, time) key for every row in a tick series
    * Every series also has seq->time and time->seq indexes
    * Sequence tickers with strict seqs AND a time index (seq can just be an autoincrementing row counter)
  * Historical data
    * Apache Iceberg
    * Clients query here first
    * Backfill service
* Quote Server
  * Realtime current prices for selected quote currencies

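A minimal sketch of the (seq, time) keying described above: seq is a strict autoincrementing row counter, so the seq->time index is just positional lookup, and time->seq is a binary search over the sorted timestamps (pure Python for illustration; the real indexes would live in Flink state / Iceberg):

```python
import bisect

class TickSeries:
    """Tick rows keyed by (seq, time); seq is a strict autoincrement."""

    def __init__(self):
        self.rows = []   # row i has seq == i, so seq -> time is a list lookup
        self.times = []  # monotonically increasing timestamps, for time -> seq

    def append(self, time_ms, price):
        seq = len(self.rows)
        self.rows.append((seq, time_ms, price))
        self.times.append(time_ms)
        return seq

    def time_for_seq(self, seq):
        """seq -> time index."""
        return self.rows[seq][1]

    def seq_at_or_after(self, time_ms):
        """time -> seq index: first seq whose timestamp is >= time_ms."""
        return bisect.bisect_left(self.times, time_ms)
```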
* Workspace
  * Current chart, indicators, drawings, etc.
  * Always in context, must be brief. Data series are a reference, not the actual data.

* Analysis
  * Analysis engines are short-running and always tied to a user
  * Free users lose their pod and data when the session times out
  * Conda available with many preinstalled packages
  * Pip & Conda configured to install
  * Src dir r/w with git
    * Indicators
    * Strategies
    * Analysis

* Request Context
  * User ID
  * Workspace ID
  * Channel
    * Telegram
    * Web

|
* Website
  * Current Vue site

|
* Gateway
  * Websocket gateway
  * Authentication
  * User Featureset / License Info added to requests/headers
  * Relays data pub/sub to web/mobile clients
  * Routes agent chat to/from user container
  * Active channel features
    * TV Chart
    * Text chat
    * Plot out
    * Voice/Audio
  * Static file server
    * Kafka
    * Temp Gateway files (image responses, etc.)

* Logs
  * Kafka
    * Strategy Logs
    * Order/Execution Logs
    * Chat Logs
    * User ID Topic has TTL based on license

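The per-license TTL on a user's log topic could be realized as a Kafka `retention.ms` setting derived from the license tier. The tier names and durations below are assumptions for illustration:

```python
# Hypothetical license tiers -> Kafka log retention (retention.ms values)
RETENTION_MS = {
    "free": 7 * 24 * 3600 * 1000,          # 7 days
    "pro": 90 * 24 * 3600 * 1000,          # 90 days
    "enterprise": 365 * 24 * 3600 * 1000,  # 1 year
}

def topic_retention_ms(license_tier: str) -> int:
    """Retention for a per-user log topic, defaulting to the free tier."""
    return RETENTION_MS.get(license_tier, RETENTION_MS["free"])
```

The resulting value would be applied to the topic out of band, e.g. with `kafka-configs.sh --alter --entity-type topics --entity-name <topic> --add-config retention.ms=<value>`.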
* Agent Framework
  * Soul file
  * Tool set (incl. subagents)
  * LLM choice
  * RAG namespace
* Agents
  * Top-level coordinator
  * TradingView agent
    * Indicators, Drawings, Annotations
  * Research Agent
    * Pandas/Polars analysis
    * Plot generation
* License Manager
* Kafka Topics Doc w/ schemas
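
Given the framework pieces above (soul file, tool set, LLM choice, RAG namespace), an agent could be declared as data, roughly like this (a sketch only; every key, path, and value here is an assumption):

```yaml
# Hypothetical agent definition -- keys and values are assumptions
name: research-agent
soul_file: souls/research.md   # persona / system prompt
llm: claude-sonnet             # LLM choice, per agent
rag_namespace: research        # retrieval namespace
tools:
  - pandas_analysis
  - plot_generation
  - subagent:tradingview       # subagents are just another tool
```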