ZeroMQ Protocol Architecture
Our data transfer protocol uses ZeroMQ with Protocol Buffers (protobuf). Every message is a small two-frame envelope: the first frame carries a protocol version byte, and the second frame carries a one-byte message type ID followed by the protobuf payload.
OHLC periods are represented as seconds.
Data Flow Overview
Relay as Gateway: The Relay is a well-known bind point that all components connect to. It routes messages between clients, ingestors, and Flink.
Historical Data Query Flow (Async Event-Driven Architecture)
- Client generates request_id and/or client_id (both are client-generated)
- Client computes notification topic: `RESPONSE:{client_id}` or `HISTORY_READY:{request_id}`
- Client subscribes to notification topic BEFORE sending request (prevents race condition)
- Client sends SubmitHistoricalRequest to Relay (REQ/REP)
- Relay returns immediate SubmitResponse with request_id and notification_topic (for confirmation)
- Relay publishes DataRequest to ingestor work queue with exchange prefix (PUB/SUB)
- Ingestor receives request, fetches data from exchange
- Ingestor writes OHLC data to Kafka with __metadata in first record
- Flink reads from Kafka, processes data, writes to Iceberg
- Flink task manager sends HistoryReadyNotification via PUSH to job manager PULL (port 5561)
- Job manager `HistoryNotificationForwarder` republishes on MARKET_DATA_PUB (port 5558)
- Relay proxies notification via XSUB → XPUB to clients
- Client receives notification (already subscribed) and queries Iceberg for data
Key Architectural Change: Relay is completely stateless. No request/response correlation needed. All notification routing is topic-based (e.g., "RESPONSE:{client_id}").
Race Condition Prevention: Notification topics are deterministic based on client-generated values (request_id or client_id). Clients MUST subscribe to the notification topic BEFORE submitting the request to avoid missing notifications.
Two Notification Patterns:
- Per-client topic (`RESPONSE:{client_id}`): Subscribe once during connection, reuse for all requests from this client. Recommended for most clients.
- Per-request topic (`HISTORY_READY:{request_id}`): Subscribe immediately before each request. Use when you need per-request isolation or don't have a persistent client_id. (See the sketch below.)
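A minimal client-side sketch of this ordering, assuming Python with pyzmq and a placeholder Relay hostname; protobuf construction and the REQ submission itself are omitted:

```python
import uuid
import zmq

ctx = zmq.Context.instance()

RELAY_XPUB_ADDR = "tcp://relay:5558"   # placeholder address for the Relay XPUB fanout

client_id = str(uuid.uuid4())    # per-client pattern: stable for the whole connection
request_id = str(uuid.uuid4())   # per-request pattern: fresh for every request

# Pick one of the two deterministic topics described above.
topic = f"RESPONSE:{client_id}"  # or f"HISTORY_READY:{request_id}"

# Subscribe BEFORE submitting the request so the notification cannot be missed.
sub = ctx.socket(zmq.SUB)
sub.connect(RELAY_XPUB_ADDR)
sub.setsockopt_string(zmq.SUBSCRIBE, topic)

# Only now send SubmitHistoricalRequest on the REQ channel (port 5559, section 1 below).
```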
Realtime Data Flow (Flink → Relay → Clients)
- Ingestors write realtime ticks to Kafka
- Flink reads from Kafka, processes OHLC aggregations, CEP triggers
- Flink publishes market data via ZMQ PUB (port 5558)
- Relay subscribes to Flink (XSUB) and fans out to clients (XPUB)
- Clients subscribe to specific tickers
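A hedged subscriber sketch for this flow (Python/pyzmq; the Relay hostname is a placeholder and protobuf parsing is omitted):

```python
import zmq

ctx = zmq.Context.instance()

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://relay:5558")   # Relay XPUB fanout; hostname is a placeholder
sub.setsockopt_string(zmq.SUBSCRIBE, "BTC/USDT.BINANCE|tick")  # {ticker}|{data_type}

while True:
    # SUB sockets receive [topic][version][type ID + protobuf] (see Message Envelope Format).
    topic, version, message = sub.recv_multipart()
    msg_type, body = message[0], message[1:]
    # 0x04 = Tick per the type ID table; parse `body` with the corresponding protobuf class.
```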
Symbol Metadata Update Flow (Flink → Gateways)
- Ingestors write symbol metadata to Kafka
- Flink reads from Kafka, writes to Iceberg symbol_metadata table
- After committing to Iceberg, Flink publishes SymbolMetadataUpdated notification on MARKET_DATA_PUB
- Gateways subscribe to METADATA_UPDATE topic on startup
- Upon receiving notification, gateways reload symbol metadata from Iceberg
- This prevents race conditions where gateways start before symbol metadata is available
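A gateway-side sketch of this behavior, assuming Python/pyzmq; `load_symbol_metadata_from_iceberg` is a hypothetical helper standing in for the actual Iceberg reload:

```python
import zmq

def load_symbol_metadata_from_iceberg():
    # Hypothetical helper: query the Iceberg symbol_metadata table and return it.
    ...

ctx = zmq.Context.instance()

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://relay:5558")   # Relay XPUB; hostname is a placeholder
sub.setsockopt_string(zmq.SUBSCRIBE, "METADATA_UPDATE")

symbol_metadata = load_symbol_metadata_from_iceberg()   # initial load at startup

while True:
    sub.recv_multipart()   # SymbolMetadataUpdated (0x13); payload not needed for a reload
    symbol_metadata = load_symbol_metadata_from_iceberg()
```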
Data Processing (Kafka → Flink → Iceberg)
- All market data flows through Kafka (durable event log)
- Flink processes streams for aggregations and CEP
- Flink writes historical data to Apache Iceberg tables
- Clients can query Iceberg for historical data (alternative to ingestor backfill)
Key Design Principles:
- Relay is the well-known bind point - all other components connect to it
- Relay is completely stateless - no request tracking, only topic-based routing
- Exchange prefix filtering allows ingestor specialization (e.g., only BINANCE ingestors)
- Historical data flows through Kafka (durable processing) only - no direct response
- Async event-driven notifications via pub/sub (Flink → Relay → Clients)
- Protobufs over ZMQ for all inter-service communication
- Kafka for durability and Flink stream processing
- Iceberg for long-term historical storage and client queries
ZeroMQ Channels and Patterns
The Relay binds the well-known client-facing endpoints, and clients connect to it; Flink binds its own internal channels (the ingestor work queue and MARKET_DATA_PUB, described below), which the Relay and ingestors connect to.
1. Client Request Channel (Clients → Relay)
Pattern: ROUTER (Relay binds, Clients use REQ)
- Socket Type: Relay uses ROUTER (bind), Clients use REQ (connect)
- Endpoint: `tcp://*:5559` (Relay binds)
- Message Types: `SubmitHistoricalRequest` → `SubmitResponse`
- Behavior:
- Client generates request_id and/or client_id
- Client computes notification topic deterministically
- Client subscribes to notification topic FIRST (prevents race)
- Client sends REQ for historical OHLC data
- Relay validates request and returns immediate acknowledgment
- Response includes notification_topic for client confirmation
- Relay publishes DataRequest to ingestor work queue
- No request tracking - relay is stateless
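A hedged client sketch for this channel (Python/pyzmq). The protocol version value, Relay hostname, and the pre-built `submit_request` protobuf are assumptions; the notification subscription from the flow above is presumed to already be in place:

```python
import zmq

PROTOCOL_VERSION = 1           # assumed value; the spec only says "a protocol version byte"
TYPE_SUBMIT_HISTORICAL = 0x10  # SubmitHistoricalRequest
TYPE_SUBMIT_RESPONSE = 0x11    # SubmitResponse

ctx = zmq.Context.instance()
req = ctx.socket(zmq.REQ)
req.connect("tcp://relay:5559")        # Relay ROUTER endpoint; hostname is a placeholder
req.setsockopt(zmq.RCVTIMEO, 30_000)   # 30 s submission timeout (see Error Handling)

payload = submit_request.SerializeToString()   # SubmitHistoricalRequest protobuf, built elsewhere
req.send_multipart([
    bytes([PROTOCOL_VERSION]),                  # Frame 1: protocol version byte
    bytes([TYPE_SUBMIT_HISTORICAL]) + payload,  # Frame 2: type ID + protobuf body
])

version_frame, response = req.recv_multipart()
assert response[0] == TYPE_SUBMIT_RESPONSE
# Parse response[1:] as SubmitResponse to read request_id and notification_topic.
```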
2. Ingestor Work Queue (Flink ↔ Ingestors)
Pattern: ROUTER/DEALER slot-based broker
- Socket Type: Flink `IngestorBroker` uses ROUTER (bind), Ingestors use DEALER (connect)
- Endpoint: `tcp://*:5567` (Flink binds)
- Message Types: `WorkerReady` (slot offer), `DataRequest` (work assignment), `WorkComplete`, `WorkHeartbeat`, `WorkReject`, `WorkStop`
- Capacity model:
  - Each `WorkerReady` (0x20) is ONE slot offer for one exchange and one job type (SlotType: `HISTORICAL=1`, `REALTIME=2`, `ANY=0`)
  - Ingestors send N `WorkerReady` messages at startup — one per available slot per exchange per type
  - Flink dispatches a job by matching the slot's exchange and SlotType to the request
  - The slot is consumed on dispatch; the ingestor re-offers it (new `WorkerReady`) when the job ends
  - Rate-limit backoff: if the exchange returns a 429, the ingestor delays the re-offer by the `Retry-After` duration from the response header
- Historical job lifecycle:
  - Flink dispatches `DataRequest` (HISTORICAL_OHLC) → ingestor fetches and writes to Kafka → sends `WorkComplete` (0x21) → sends new `WorkerReady` for that slot
- Realtime job lifecycle:
  - Flink dispatches `DataRequest` (REALTIME_TICKS) → ingestor polls exchange and writes ticks to Kafka → sends `WorkHeartbeat` (0x22) every 5 s → on `WorkStop` (0x25) from Flink: cancels and sends new `WorkerReady`
- Slot configuration (per ingestor, per exchange), `exchange_capacity`:
  - BINANCE: `{ historical_slots: 3, realtime_slots: 5 }`
  - KRAKEN: `{ historical_slots: 2, realtime_slots: 3 }`
  - COINBASE: `{ historical_slots: 2, realtime_slots: 4 }`
- Flink restart: when Flink restarts, its `freeSlots` deque is cleared; all in-flight jobs time out on the ingestor side, releasing their slots, which are then re-offered via `WorkerReady`
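An ingestor-side sketch of the slot protocol, assuming Python/pyzmq; the protocol version, broker hostname, and the pre-built `WorkerReady`/`WorkComplete` protobufs are assumptions, and exchange fetching and Kafka writes are elided:

```python
import zmq

PROTOCOL_VERSION = 1     # assumed value
TYPE_WORKER_READY = 0x20
TYPE_DATA_REQUEST = 0x01
TYPE_WORK_COMPLETE = 0x21

ctx = zmq.Context.instance()
dealer = ctx.socket(zmq.DEALER)
dealer.connect("tcp://flink-jobmanager:5567")   # IngestorBroker ROUTER; hostname is a placeholder

def send(msg_type: int, payload: bytes) -> None:
    # Standard two-frame envelope over the DEALER socket.
    dealer.send_multipart([bytes([PROTOCOL_VERSION]), bytes([msg_type]) + payload])

# Offer one HISTORICAL slot for BINANCE at startup (WorkerReady protobuf built elsewhere).
send(TYPE_WORKER_READY, worker_ready_slot.SerializeToString())

while True:
    version_frame, message = dealer.recv_multipart()
    if message[0] == TYPE_DATA_REQUEST:
        # Fetch from the exchange and write OHLC to Kafka (elided), then report
        # completion and re-offer the slot so the broker can dispatch the next job.
        send(TYPE_WORK_COMPLETE, work_complete.SerializeToString())
        send(TYPE_WORKER_READY, worker_ready_slot.SerializeToString())
```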
3. Market Data Fanout (Relay ↔ Flink ↔ Clients)
Pattern: XPUB/XSUB proxy
- Socket Type:
  - Relay XPUB (bind, port 5558) ← Clients SUB (connect)
  - Relay XSUB (connect) → Flink MARKET_DATA_PUB (bind, port 5558)
- Message Types: `Tick`, `OHLC`, `HistoryReadyNotification`, `SymbolMetadataUpdated`
- Topic Formats:
  - Market data: `{ticker}|{data_type}` (e.g., `BTC/USDT.BINANCE|tick`)
  - Notifications: `RESPONSE:{client_id}` or `HISTORY_READY:{request_id}`
  - System notifications: `METADATA_UPDATE` (for symbol metadata updates)
- Behavior:
- Clients subscribe to ticker topics and notification topics via Relay XPUB
- Relay forwards subscriptions to Flink via XSUB
- Flink publishes processed market data and notifications
- Relay proxies data to subscribed clients (stateless forwarding)
- Dynamic subscription management (no pre-registration)
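A minimal sketch of the Relay's role here, assuming Python/pyzmq and a placeholder Flink hostname; the real Relay may add monitoring, but the core is a stateless XSUB/XPUB proxy:

```python
import zmq

ctx = zmq.Context.instance()

# Client-facing side: clients SUB-connect here (Relay port 5558).
xpub = ctx.socket(zmq.XPUB)
xpub.bind("tcp://*:5558")

# Upstream side: connect out to Flink's MARKET_DATA_PUB; hostname is a placeholder.
xsub = ctx.socket(zmq.XSUB)
xsub.connect("tcp://flink-jobmanager:5558")

# Stateless bidirectional forwarding: market data downstream, subscriptions upstream.
zmq.proxy(xsub, xpub)
```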
Internal Flink notification path (port 5561):
- Flink task managers send `HistoryReadyNotification` via PUSH to job manager PULL (port 5561)
- `HistoryNotificationForwarder` (job manager) receives and republishes on MARKET_DATA_PUB (port 5558)
- This decouples task manager instances from direct pub/sub and handles multi-task-manager setups
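The forwarder itself runs inside the Flink job manager, so the sketch below is purely illustrative (Python/pyzmq); `topic_for` is a hypothetical helper that derives the deterministic notification topic from the message:

```python
import zmq

ctx = zmq.Context.instance()

# Collect HistoryReadyNotification envelopes from all task managers.
pull = ctx.socket(zmq.PULL)
pull.bind("tcp://*:5561")

# MARKET_DATA_PUB: republished notifications continue on to the Relay XSUB.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5558")

while True:
    frames = pull.recv_multipart()   # [version][type ID + protobuf] from a task manager
    topic = topic_for(frames)        # hypothetical: RESPONSE:{client_id} or HISTORY_READY:{request_id}
    pub.send_multipart([topic.encode()] + frames)   # prepend topic so subscribed clients match it
```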
4. User Event Channels (User Containers → Gateway)
See user-events.md for the full spec including ZMQ patterns, protobuf schemas, and delivery semantics for ports 5570 and 5571.
Message Envelope Format
The core protocol uses two ZeroMQ frames:
Frame 1: [1 byte: protocol version]
Frame 2: [1 byte: message type ID][N bytes: protobuf message]
This two-frame approach allows receivers to check the protocol version before parsing the message type and protobuf payload.
Important: Some ZeroMQ socket patterns (PUB/SUB, XPUB/XSUB) may prepend additional frames for routing purposes. For example:
- PUB/SUB with topic filtering: SUB sockets receive `[topic frame][version frame][message frame]`
- ROUTER sockets: Prepend identity frames before the message
Components must handle these additional frames appropriately:
- SUB sockets: Skip the first frame (topic), then parse the remaining frames as the standard 2-frame envelope
- ROUTER sockets: Extract identity frames, then parse the standard 2-frame envelope
The two-frame envelope is the logical protocol format, but physical transmission may include additional ZeroMQ transport frames.
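A small helper sketch for the envelope, assuming Python; the protocol version value is an assumption, and the topic-frame heuristic only covers the SUB case described above (ROUTER identity frames would be stripped before calling `unpack`):

```python
PROTOCOL_VERSION = 1   # assumed value; the spec only defines "a protocol version byte"

def pack(msg_type: int, payload: bytes) -> list[bytes]:
    """Build the logical two-frame envelope: [version], [type ID + protobuf]."""
    return [bytes([PROTOCOL_VERSION]), bytes([msg_type]) + payload]

def unpack(frames: list[bytes]) -> tuple[int, int, bytes]:
    """Return (version, type ID, protobuf body), skipping a leading topic frame if present."""
    if len(frames) == 3:   # SUB sockets: [topic][version][message]
        frames = frames[1:]
    version_frame, message_frame = frames
    return version_frame[0], message_frame[0], message_frame[1:]

# Usage: version, msg_type, body = unpack(socket.recv_multipart())
```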
Message Type IDs
| Type ID | Message Type | Description |
|---|---|---|
| 0x01 | DataRequest | Request for historical or realtime data |
| 0x02 | DataResponse (deprecated) | Historical data response (no longer used) |
| 0x03 | IngestorControl | Control messages for ingestors |
| 0x04 | Tick | Individual trade tick data |
| 0x05 | OHLC | Single OHLC candle with volume |
| 0x06 | Market | Market metadata |
| 0x07 | OHLCRequest (deprecated) | Client request (replaced by SubmitHistoricalRequest) |
| 0x08 | Response (deprecated) | Generic response (replaced by SubmitResponse) |
| 0x09 | CEPTriggerRequest | Register CEP trigger |
| 0x0A | CEPTriggerAck | CEP trigger acknowledgment |
| 0x0B | CEPTriggerEvent | CEP trigger fired callback |
| 0x0C | OHLCBatch | Batch of OHLC rows with metadata (Kafka) |
| 0x10 | SubmitHistoricalRequest | Client request for historical data (async) |
| 0x11 | SubmitResponse | Immediate ack with notification topic |
| 0x12 | HistoryReadyNotification | Notification that data is ready in Iceberg |
| 0x13 | SymbolMetadataUpdated | Notification that symbol metadata refreshed |
| 0x20 | UserEvent | Container → Gateway event (see user-events.md) |
| 0x21 | EventAck | Gateway → Container acknowledgment |
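One way to use this table is a decode registry keyed by type ID. The generated protobuf module and class names below are assumptions (they depend on the actual .proto files):

```python
from market_data_pb2 import (   # assumed name of the generated protobuf module
    DataRequest, Tick, OHLC, OHLCBatch,
    SubmitHistoricalRequest, SubmitResponse,
    HistoryReadyNotification, SymbolMetadataUpdated,
)

# Partial registry covering the non-deprecated data-path messages from the table above.
MESSAGE_TYPES = {
    0x01: DataRequest,
    0x04: Tick,
    0x05: OHLC,
    0x0C: OHLCBatch,
    0x10: SubmitHistoricalRequest,
    0x11: SubmitResponse,
    0x12: HistoryReadyNotification,
    0x13: SymbolMetadataUpdated,
}

def decode(msg_type: int, body: bytes):
    """Look up the protobuf class for a type ID and parse the payload."""
    return MESSAGE_TYPES[msg_type].FromString(body)
```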
Error Handling
Async Architecture Error Handling:
- Failed historical requests: ingestor writes error marker to Kafka
- Flink reads error marker and publishes HistoryReadyNotification with ERROR status
- Client timeout: if no notification received within timeout, assume failure
- REQ/REP timeouts: 30 seconds default for client request submission
- PUB/SUB has no delivery guarantees (Kafka provides durability)
- No response routing needed - all notifications via topic-based pub/sub
Durability:
- All data flows through Kafka for durability
- Flink checkpointing ensures exactly-once processing
- Client can retry request with new request_id if notification not received
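A hedged sketch of the per-request timeout-and-retry pattern (Python/pyzmq); the timeout value and `submit_fn` helper are assumptions:

```python
import uuid
import zmq

NOTIFICATION_TIMEOUT_MS = 60_000   # assumed value; the spec does not fix this client timeout

def wait_or_retry(sub: zmq.Socket, submit_fn, attempts: int = 2):
    """Submit a historical request and wait for its notification, retrying with a new request_id."""
    for _ in range(attempts):
        request_id = str(uuid.uuid4())
        sub.setsockopt_string(zmq.SUBSCRIBE, f"HISTORY_READY:{request_id}")   # subscribe first
        submit_fn(request_id)            # hypothetical helper: sends SubmitHistoricalRequest via REQ
        if sub.poll(NOTIFICATION_TIMEOUT_MS):
            return sub.recv_multipart()  # HistoryReadyNotification envelope (may carry ERROR status)
        # No notification within the timeout: assume failure and retry with a fresh request_id.
    return None
```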