Backend Redesign

This is not implemented yet; these are just notes for Tim.

Overview

We need a realtime data system that is scalable and durable, so we propose the following architecture:

  • Protobufs over ZeroMQ for data streaming (ingestor sketch after this list)
  • Ingestors
    • Realtime data subscriptions (tick data)
    • Historical data queries (OHLC)
    • Everything pushes to Kafka topics
  • Kafka
    • Durable append logs for incoming and in-process data
    • Topics are maintained by Flink, defined in redesign/flink/src/main/resources/topics.yaml
  • Flink
    • Reads raw ingestor streams from Kafka
    • Deduplicates records
    • Builds OHLC bars
  • Apache Iceberg
    • Historical data storage
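
A minimal sketch of the ZeroMQ-to-Kafka leg of an ingestor, assuming pyzmq and confluent-kafka as the clients; the Tick protobuf message, endpoints, and topic name are placeholders, not decided yet:

```python
# Sketch: ZeroMQ SUB -> protobuf decode -> Kafka produce.
# Assumes pyzmq and confluent-kafka; tick_pb2.Tick is a placeholder
# generated protobuf module, not something that exists in the repo yet.
import zmq
from confluent_kafka import Producer

import tick_pb2  # hypothetical: generated from a tick.proto

producer = Producer({"bootstrap.servers": "kafka:9092"})

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://feed-host:5556")       # placeholder upstream feed
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to everything

while True:
    frame = sub.recv()                    # one protobuf-encoded tick
    tick = tick_pb2.Tick()
    tick.ParseFromString(frame)
    # Key by (exchange, market) so a market's ticks stay ordered
    # within a single Kafka partition.
    key = f"{tick.exchange_id}:{tick.market_id}".encode()
    producer.produce("ticks.raw", key=key, value=frame)
    producer.poll(0)                      # serve delivery callbacks
```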

Configuration

All systems should use two YAML configuration files mounted by k8s: one from a ConfigMap (general config) and one from Secrets (credentials). Keep secrets separate from config.

When a configuration or secrets item is needed, describe it in redesign/doc/config.md
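
A sketch of how a service could load the two files, assuming PyYAML; the mount paths are placeholders:

```python
# Sketch: load the ConfigMap-mounted config and the Secret-mounted
# secrets as two separate YAML files. Paths are hypothetical; k8s
# mounts them read-only into the pod.
import yaml

def load_settings(
    config_path: str = "/etc/app/config.yaml",    # from ConfigMap
    secrets_path: str = "/etc/app/secrets.yaml",  # from Secrets
) -> tuple[dict, dict]:
    with open(config_path) as f:
        config = yaml.safe_load(f) or {}
    with open(secrets_path) as f:
        secrets = yaml.safe_load(f) or {}
    # Deliberately returned separately: secrets must never be merged
    # into (or logged alongside) the general config.
    return config, secrets
```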

Ingest

Ingestion API

  • all symbols — one metadata record per market (see the dataclass sketch after this item):

    • exchange id (BINANCE)
    • market_id (BTC/USDT)
    • market_type
      • Spot
    • description (Bitcoin/Tether on Binance)
    • column names (['open', 'high', 'low', 'close', 'volume', 'taker_vol', 'maker_vol'])
    • name
    • exchange
    • base asset
    • quote asset
    • earliest time
    • tick size
    • supported periods
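
One way the symbol record could look as a Python dataclass; field names mirror the list above, types and defaults are assumptions:

```python
# Sketch of the per-symbol metadata record; names mirror the list
# above, types and defaults are assumptions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SymbolInfo:
    exchange_id: str         # e.g. "BINANCE"
    market_id: str           # e.g. "BTC/USDT"
    market_type: str         # e.g. "spot"
    description: str         # e.g. "Bitcoin/Tether on Binance"
    name: str
    exchange: str
    base_asset: str          # e.g. "BTC"
    quote_asset: str         # e.g. "USDT"
    earliest_time: datetime  # first available data point
    tick_size: float
    supported_periods: list[str] = field(default_factory=list)  # e.g. ["1m", "1h"]
    columns: list[str] = field(default_factory=lambda: [
        "open", "high", "low", "close", "volume", "taker_vol", "maker_vol",
    ])
```
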
  • Centralized data streaming backend

    • Ingestion of tick, OHLC, news, etc. into Kafka by worker gatherers
    • Flink with:
      • ZeroMQ pub/sub input
    • (seq, time) key for every row in a tick series
    • Every series also maintains seq→time and time→seq indexes
    • Tickers are sequenced with a strict seq AND a time index (seq can just be an autoincrementing row counter; see the sketch below)
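
A sketch of the (seq, time) keying and the two derived indexes, in-memory only for illustration; in the real system these would live alongside the series storage:

```python
# Sketch: strict per-series sequence numbers plus a time index,
# giving seq->time and time->seq lookups. In-memory for illustration.
from bisect import bisect_right

class TickSeries:
    def __init__(self) -> None:
        self.rows: list[tuple[int, int, bytes]] = []  # (seq, time_ns, payload)
        self.times: list[int] = []                    # parallel time_ns array

    def append(self, time_ns: int, payload: bytes) -> int:
        seq = len(self.rows)  # seq is just an autoincrement row counter
        self.rows.append((seq, time_ns, payload))
        self.times.append(time_ns)
        return seq

    def time_of(self, seq: int) -> int:
        """seq -> time lookup."""
        return self.rows[seq][1]

    def seq_at_or_before(self, time_ns: int) -> int:
        """time -> seq lookup: last row at or before the given time."""
        i = bisect_right(self.times, time_ns)
        if i == 0:
            raise KeyError("time precedes series start")
        return i - 1
```
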
  • Historical data

    • Apache Iceberg
      • Clients query here first
    • Backfill service
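
A sketch of the "clients query Iceberg first" path, assuming PyIceberg; the catalog name, table name, and column names are placeholders:

```python
# Sketch: query historical OHLC from Iceberg first, assuming PyIceberg.
# Catalog, table, and column names are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")             # configured via env/config
table = catalog.load_table("market.ohlc_1m")  # hypothetical table

df = table.scan(
    row_filter="market_id = 'BTC/USDT'",
    selected_fields=("ts", "open", "high", "low", "close", "volume"),
    limit=10_000,
).to_pandas()
```
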
  • Quote Server

    • Realtime current prices for selected quote currencies
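
The quote server core could be as small as a last-price map fed by the tick stream; Kafka wiring is omitted and the field names follow the earlier placeholders:

```python
# Sketch: keep the latest price per (exchange, market) from the tick
# stream and serve it on demand. Kafka consumer wiring omitted.
latest: dict[tuple[str, str], float] = {}

def on_tick(exchange_id: str, market_id: str, price: float) -> None:
    latest[(exchange_id, market_id)] = price

def quote(exchange_id: str, market_id: str) -> float | None:
    """Current price for a market, or None if no tick seen yet."""
    return latest.get((exchange_id, market_id))
```
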
  • Workspace

    • Current chart, indicators, drawings, etc.
    • Always in context, so it must be brief. Data series are references, not the actual data.
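
Since series must be references rather than payloads, a workspace entry might look like this; all field names are assumptions:

```python
# Sketch: a workspace references series by id/range; the actual rows
# stay in Kafka/Iceberg. All names are assumptions.
from dataclasses import dataclass

@dataclass
class SeriesRef:
    series_id: str   # e.g. "BINANCE:BTC/USDT:1m"
    start_seq: int   # inclusive
    end_seq: int     # inclusive; resolved lazily by the reader

@dataclass
class Workspace:
    workspace_id: str
    chart_symbol: str
    indicators: list[str]
    series: list[SeriesRef]  # references only, keeps context brief
```
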
  • Analysis

    • Analysis engines are short-running and always tied to a user
    • Free users lose their pod and data when the session times out
    • Conda available with many preinstalled packages
      • Pip & Conda configured so users can install more
    • Source dir is read/write and git-backed:
      • Indicators
      • Strategies
      • Analysis
  • Request Context

    • User ID
    • Workspace ID
    • Channel
      • Telegram
      • Web
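
One possible shape for the request context as a small typed record, with the channel as an enum; names are assumptions:

```python
# Sketch of the per-request context; names are assumptions.
from dataclasses import dataclass
from enum import Enum

class Channel(Enum):
    TELEGRAM = "telegram"
    WEB = "web"

@dataclass(frozen=True)
class RequestContext:
    user_id: str
    workspace_id: str
    channel: Channel
```
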
  • Website

    • Current Vue site
  • Gateway

    • Websocket gateway
      • Authentication
        • User feature set / license info added to request headers
      • Relays data pub/sub to web/mobile clients
      • Routes agent chat to/from user container
    • Active channel features
      • TV Chart
      • Text chat
      • Plot output
      • Voice/Audio
    • Static file server
      • Kafka
      • Temp Gateway files (image responses, etc.)
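
A sketch of the auth step that stamps feature set / license info onto relayed requests; verify_token and the header names are hypothetical:

```python
# Sketch: after the gateway authenticates a client, it adds the user's
# feature set / license info to headers relayed upstream. The token
# check and header names are hypothetical.
def verify_token(token: str) -> dict:
    """Hypothetical stub; the real gateway would validate a JWT/session."""
    if token != "valid-demo-token":
        raise PermissionError("invalid token")
    return {"id": "u123", "features": ["tv_chart", "voice"], "license_tier": "pro"}

def annotate_request(headers: dict[str, str], token: str) -> dict[str, str]:
    user = verify_token(token)
    out = dict(headers)
    out["X-User-Id"] = user["id"]
    out["X-User-Features"] = ",".join(user["features"])
    out["X-License-Tier"] = user["license_tier"]
    return out
```
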
  • Logs

    • Kafka
      • Strategy Logs
      • Order/Execution Logs
    • Chat Logs
      • Per-user-ID topic has a TTL based on license (retention sketch below)
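
A sketch of creating a per-user log topic with license-based retention, assuming confluent-kafka's AdminClient; the tier-to-TTL mapping and topic naming are placeholders:

```python
# Sketch: per-user chat-log topic with license-dependent retention,
# using confluent-kafka's AdminClient. Tier->TTL map is a placeholder.
from confluent_kafka.admin import AdminClient, NewTopic

RETENTION_MS = {
    "free": 7 * 24 * 3600 * 1000,   # 7 days
    "pro": 90 * 24 * 3600 * 1000,   # 90 days
}

def create_chat_topic(admin: AdminClient, user_id: str, tier: str) -> None:
    topic = NewTopic(
        f"chat.{user_id}",
        num_partitions=1,
        replication_factor=3,  # placeholder sizing
        config={"retention.ms": str(RETENTION_MS[tier])},
    )
    # create_topics returns per-topic futures; .result() raises on failure
    for fut in admin.create_topics([topic]).values():
        fut.result()
```
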
  • Agent Framework

    • Soul file
    • Tool set (incl subagents)
    • LLM choice
    • RAG namespace
    • Agents
      • Top-level coordinator
      • TradingView skill
        • Indicators, Drawings, Annotations
      • Research Agent
        • Pandas/Polars analysis
        • Plot generation
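
The agent definition as a record, per the framework list above; all names and types are assumptions:

```python
# Sketch of an agent definition per the framework above; all names
# and types are assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    name: str             # e.g. "coordinator", "research"
    soul_file: str        # path to the agent's persona/prompt file
    tools: list[str]      # tool names, including subagents
    llm: str              # model choice
    rag_namespace: str    # vector-store namespace for retrieval
    subagents: list["AgentSpec"] = field(default_factory=list)
```
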
  • License Manager

  • Kafka topics doc with schemas