# ZeroMQ Protocol Architecture

Our data transfer protocol uses ZeroMQ with Protobufs. Each message is a small envelope: the first frame carries a protocol version byte; the second frame carries a type ID as its first byte, followed by the protobuf payload in the same frame. OHLC periods are expressed in seconds.

## Data Flow Overview

**Relay as Gateway**: The Relay is a well-known bind point that all components connect to. It routes messages between clients, ingestors, and Flink.

### Historical Data Query Flow (Async Event-Driven Architecture)

* Client generates request_id and/or client_id (both are client-generated)
* Client computes the notification topic: `RESPONSE:{client_id}` or `HISTORY_READY:{request_id}`
* **Client subscribes to the notification topic BEFORE sending the request (prevents a race condition)**
* Client sends SubmitHistoricalRequest to the Relay (REQ/REP)
* Relay returns an immediate SubmitResponse with request_id and notification_topic (for confirmation)
* Relay publishes DataRequest to the ingestor work queue with an exchange prefix (PUB/SUB)
* Ingestor receives the request and fetches data from the exchange
* Ingestor writes OHLC data to Kafka with __metadata in the first record
* Flink reads from Kafka, processes the data, and writes to Iceberg
* Flink publishes HistoryReadyNotification to a ZMQ PUB socket (port 5557) with a deterministic topic
* Relay proxies the notification via XSUB → XPUB to clients
* Client receives the notification (already subscribed) and queries Iceberg for the data

**Key Architectural Change**: The Relay is completely stateless. No request/response correlation is needed. All notification routing is topic-based (e.g., `RESPONSE:{client_id}`).

**Race Condition Prevention**: Notification topics are deterministic, derived from client-generated values (request_id or client_id). Clients MUST subscribe to the notification topic BEFORE submitting the request to avoid missing notifications.
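The deterministic topic scheme described above can be sketched in a few lines. This is illustrative only: the function names are not part of the protocol, and using UUIDs for the identifiers is an assumption (any client-generated unique string would do).

```python
import uuid

def response_topic(client_id: str) -> str:
    """Per-client notification topic, reused across all of a client's requests."""
    return f"RESPONSE:{client_id}"

def history_ready_topic(request_id: str) -> str:
    """Per-request notification topic, unique to a single request."""
    return f"HISTORY_READY:{request_id}"

# Client-generated identifiers (UUIDs shown here as one option).
client_id = str(uuid.uuid4())
request_id = str(uuid.uuid4())

# The client must subscribe to the chosen topic BEFORE sending
# SubmitHistoricalRequest, so the notification cannot be missed.
topic = response_topic(client_id)
```

Because both sides can compute the topic from client-generated values alone, no server-side correlation state is ever needed.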
**Two Notification Patterns**:

1. **Per-client topic** (`RESPONSE:{client_id}`): Subscribe once during connection and reuse it for all requests from this client. Recommended for most clients.
2. **Per-request topic** (`HISTORY_READY:{request_id}`): Subscribe immediately before each request. Use this when you need per-request isolation or don't have a persistent client_id.

### Realtime Data Flow (Flink → Relay → Clients)

* Ingestors write realtime ticks to Kafka
* Flink reads from Kafka and processes OHLC aggregations and CEP triggers
* Flink publishes market data via ZMQ PUB
* Relay subscribes to Flink (XSUB) and fans out to clients (XPUB)
* Clients subscribe to specific tickers

### Data Processing (Kafka → Flink → Iceberg)

* All market data flows through Kafka (durable event log)
* Flink processes streams for aggregations and CEP
* Flink writes historical data to Apache Iceberg tables
* Clients can query Iceberg for historical data (alternative to ingestor backfill)

**Key Design Principles**:

* The Relay is the well-known bind point - all other components connect to it
* The Relay is completely stateless - no request tracking, only topic-based routing
* Exchange prefix filtering allows ingestor specialization (e.g., BINANCE-only ingestors)
* Historical data flows through Kafka (durable processing) only - no direct response
* Async event-driven notifications via pub/sub (Flink → Relay → Clients)
* Protobufs over ZMQ for all inter-service communication
* Kafka for durability and Flink stream processing
* Iceberg for long-term historical storage and client queries

## ZeroMQ Channels and Patterns

All sockets bind on the **Relay** (well-known endpoint); components connect to the Relay.
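As a quick reference, the Relay bind endpoints described in the channel sections of this document might be collected like this. The dict layout and helper function are illustrative assumptions, not actual project code:

```python
# Relay-bound endpoints as listed in the channel descriptions of this document.
RELAY_ENDPOINTS = {
    "client_requests":    {"pattern": "ROUTER", "endpoint": "tcp://*:5559"},
    "ingestor_work":      {"pattern": "PUB",    "endpoint": "tcp://*:5555"},
    "market_data_fanout": {"pattern": "XPUB",   "endpoint": "tcp://*:5558"},
}

def connect_addr(endpoint: str, relay_host: str = "localhost") -> str:
    """Translate a Relay bind endpoint into the address a component connects to."""
    return endpoint.replace("*", relay_host)
```

Components never bind these ports themselves; they resolve the Relay's hostname and connect.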
### 1. Client Request Channel (Clients → Relay)

**Pattern**: ROUTER (Relay binds, Clients use REQ)

- **Socket Type**: Relay uses ROUTER (bind), Clients use REQ (connect)
- **Endpoint**: `tcp://*:5559` (Relay binds)
- **Message Types**: `SubmitHistoricalRequest` → `SubmitResponse`
- **Behavior**:
  - Client generates request_id and/or client_id
  - Client computes the notification topic deterministically
  - **Client subscribes to the notification topic FIRST (prevents the race)**
  - Client sends a REQ for historical OHLC data
  - Relay validates the request and returns an immediate acknowledgment
  - The response includes notification_topic for client confirmation
  - Relay publishes DataRequest to the ingestor work queue
  - No request tracking - the Relay is stateless

### 2. Ingestor Work Queue (Relay → Ingestors)

**Pattern**: PUB/SUB with exchange prefix filtering

- **Socket Type**: Relay uses PUB (bind), Ingestors use SUB (connect)
- **Endpoint**: `tcp://*:5555` (Relay binds)
- **Message Types**: `DataRequest` (historical or realtime)
- **Topic Prefix**: Exchange name (e.g., `BINANCE:`, `COINBASE:`)
- **Behavior**:
  - Relay publishes work with the exchange prefix taken from the ticker
  - Ingestors subscribe only to the exchanges they support
  - Multiple ingestors can compete for the same exchange
  - Ingestors write data to Kafka only (no direct response)
  - Flink processes Kafka → Iceberg → notification
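Exchange-prefix filtering leans on ZeroMQ's prefix-matching subscriptions: a SUB socket subscribed to a string receives every topic that starts with it. A minimal sketch (the helper names are assumptions, not project code):

```python
def exchange_prefix(ticker: str) -> str:
    """Derive the work-queue topic prefix from a ticker like 'BINANCE:BTC/USDT'."""
    return ticker.split(":", 1)[0] + ":"

def matches(subscription: str, topic: str) -> bool:
    """Mirror ZeroMQ SUB semantics: a subscription matches any topic it prefixes."""
    return topic.startswith(subscription)
```

An ingestor subscribed to `BINANCE:` therefore receives all BINANCE work and nothing else; several such ingestors can subscribe to the same prefix and compete for it.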
### 3. Market Data Fanout (Relay ↔ Flink ↔ Clients)

**Pattern**: XPUB/XSUB proxy

- **Socket Type**:
  - Relay XPUB (bind) ← Clients SUB (connect) - Port 5558
  - Relay XSUB (connect) → Flink PUB (bind) - Port 5557
- **Message Types**: `Tick`, `OHLC`, `HistoryReadyNotification`
- **Topic Formats**:
  - Market data: `{ticker}|{data_type}` (e.g., `BINANCE:BTC/USDT|tick`)
  - Notifications: `RESPONSE:{client_id}` or `HISTORY_READY:{request_id}`
- **Behavior**:
  - Clients subscribe to ticker topics and notification topics via the Relay XPUB
  - Relay forwards subscriptions to Flink via XSUB
  - Flink publishes processed market data and notifications
  - Relay proxies data to subscribed clients (stateless forwarding)
  - Dynamic subscription management (no pre-registration)

### 4. Ingestor Control Channel (Optional - Future Use)

**Pattern**: PUB/SUB (broadcast control)

- **Socket Type**: Relay uses PUB, Ingestors use SUB
- **Endpoint**: `tcp://*:5557` (Relay binds)
- **Message Types**: `IngestorControl` (cancel, config updates)
- **Behavior**:
  - Broadcast control messages to all ingestors
  - Used for realtime subscription cancellation
  - Configuration updates

## Message Envelope Format

The core protocol uses two ZeroMQ frames:

```
Frame 1: [1 byte: protocol version]
Frame 2: [1 byte: message type ID][N bytes: protobuf message]
```

This two-frame approach allows receivers to check the protocol version before parsing the message type and protobuf payload.
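A minimal sketch of packing and unpacking this envelope. The version value `1` and the constant names are assumptions; `0x05` is the OHLC type ID from the table in this document:

```python
PROTOCOL_VERSION = 1   # assumed value; the spec only fixes a 1-byte version field
TYPE_OHLC = 0x05       # OHLC type ID from the Message Type IDs table

def pack_envelope(type_id: int, payload: bytes,
                  version: int = PROTOCOL_VERSION) -> list[bytes]:
    """Build the logical two-frame envelope: [version][type_id + protobuf bytes]."""
    return [bytes([version]), bytes([type_id]) + payload]

def unpack_envelope(frames: list[bytes]) -> tuple[int, int, bytes]:
    """Read the version frame first, then split the type ID from the payload."""
    version = frames[0][0]
    type_id = frames[1][0]
    payload = frames[1][1:]
    return version, type_id, payload
```

The frames would typically be sent with a multipart send (e.g., pyzmq's `send_multipart`), with the payload being the serialized protobuf message.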
**Important**: Some ZeroMQ socket patterns (PUB/SUB, XPUB/XSUB) may prepend additional frames for routing purposes. For example:

- **PUB/SUB with topic filtering**: SUB sockets receive `[topic frame][version frame][message frame]`
- **ROUTER sockets**: Prepend identity frames before the message

Components must handle these additional frames appropriately:

- SUB sockets: Skip the first frame (topic), then parse the remaining frames as the standard 2-frame envelope
- ROUTER sockets: Extract the identity frames, then parse the standard 2-frame envelope

The two-frame envelope is the **logical protocol format**; physical transmission may include additional ZeroMQ transport frames.

## Message Type IDs

| Type ID | Message Type              | Description                                    |
|---------|---------------------------|------------------------------------------------|
| 0x01    | DataRequest               | Request for historical or realtime data        |
| 0x02    | DataResponse (deprecated) | Historical data response (no longer used)      |
| 0x03    | IngestorControl           | Control messages for ingestors                 |
| 0x04    | Tick                      | Individual trade tick data                     |
| 0x05    | OHLC                      | Single OHLC candle with volume                 |
| 0x06    | Market                    | Market metadata                                |
| 0x07    | OHLCRequest (deprecated)  | Client request (replaced by SubmitHistorical)  |
| 0x08    | Response (deprecated)     | Generic response (replaced by SubmitResponse)  |
| 0x09    | CEPTriggerRequest         | Register CEP trigger                           |
| 0x0A    | CEPTriggerAck             | CEP trigger acknowledgment                     |
| 0x0B    | CEPTriggerEvent           | CEP trigger fired callback                     |
| 0x0C    | OHLCBatch                 | Batch of OHLC rows with metadata (Kafka)       |
| 0x10    | SubmitHistoricalRequest   | Client request for historical data (async)     |
| 0x11    | SubmitResponse            | Immediate ack with notification topic          |
| 0x12    | HistoryReadyNotification  | Notification that data is ready in Iceberg     |

## Error Handling

**Async Architecture Error Handling**:

- Failed historical requests: the ingestor writes an error marker to Kafka
- Flink reads the error marker and publishes HistoryReadyNotification with ERROR status
- Client timeout: if no notification is received within the timeout, assume failure
- Realtime requests are cancelled via the control channel if an ingestor fails
- REQ/REP timeouts: 30 seconds default for client request submission
- PUB/SUB has no delivery guarantees (Kafka provides durability)
- No response routing needed - all notifications travel via topic-based pub/sub

**Durability**:

- All data flows through Kafka for durability
- Flink checkpointing ensures exactly-once processing
- A client can retry the request with a new request_id if the notification is not received
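The timeout-and-retry guidance above can be sketched as a client-side loop. Everything here is hypothetical: the `submit` and `wait_for_notification` callables stand in for the REQ/REP submission and the SUB-side wait, which this document does not specify as code.

```python
import uuid

def request_with_retry(submit, wait_for_notification,
                       max_attempts: int = 3, timeout_s: float = 30.0):
    """Retry a historical request with a fresh request_id until notified."""
    for _ in range(max_attempts):
        request_id = str(uuid.uuid4())          # new request_id per attempt
        topic = f"HISTORY_READY:{request_id}"   # subscribe BEFORE submitting
        submit(request_id, topic)
        notification = wait_for_notification(topic, timeout_s)
        if notification is not None:            # None models a timeout
            return notification
    raise TimeoutError("no notification received after retries")
```

A fresh request_id per attempt is safe precisely because the Relay keeps no per-request state; the only cost is that a late notification for an abandoned request_id goes to a topic nobody is subscribed to.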