bugfixes; research subproc; higher sandbox limits

2026-04-16 18:11:26 -04:00
parent f80c943dc3
commit 3153e89d4f
54 changed files with 1947 additions and 498 deletions

@@ -81,18 +81,29 @@ All sockets bind on **Relay** (well-known endpoint). Components connect to relay
- Relay publishes DataRequest to ingestor work queue
- No request tracking - relay is stateless
### 2. Ingestor Work Queue (Relay → Ingestors)
**Pattern**: PUB/SUB with exchange prefix filtering
- **Socket Type**: Relay uses PUB (bind), Ingestors use SUB (connect)
- **Endpoint**: `tcp://*:5555` (Relay binds)
- **Message Types**: `DataRequest` (historical or realtime)
- **Topic Prefix**: Market name (e.g., `BTC/USDT.`, `ETH/BTC.`)
- **Behavior**:
- Relay publishes work with exchange prefix from ticker
- Ingestors subscribe only to exchanges they support
- Multiple ingestors can compete for same exchange
- Ingestors write data to Kafka only (no direct response)
- Flink processes Kafka → Iceberg → notification
### 2. Ingestor Work Queue (Flink ↔ Ingestors)
**Pattern**: ROUTER/DEALER slot-based broker
- **Socket Type**: Flink `IngestorBroker` uses ROUTER (bind), Ingestors use DEALER (connect)
- **Endpoint**: `tcp://*:5567` (Flink binds)
- **Message Types**: `WorkerReady` (slot offer), `DataRequest` (work assignment), `WorkComplete`, `WorkHeartbeat`, `WorkReject`, `WorkStop`
- **Capacity model**:
- Each `WorkerReady` (0x20) is ONE slot offer for one exchange and one job type (`SlotType`: `HISTORICAL=1`, `REALTIME=2`, `ANY=0`)
- Ingestors send N `WorkerReady` messages at startup — one per available slot per exchange per type
- Flink dispatches a job by matching the slot's exchange and SlotType to the request
- The slot is consumed on dispatch; the ingestor re-offers it (new `WorkerReady`) when the job ends
- Rate-limit backoff: if the exchange returns a 429, the ingestor delays the re-offer by the `Retry-After` duration from the response header
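The capacity model above boils down to the broker matching an incoming request against its pool of free slot offers. A minimal sketch of that matching rule (function and variable names are illustrative, not from the codebase; only the `SlotType` values are from the protocol description):

```python
from collections import deque

# SlotType values from the protocol description
SLOT_ANY, SLOT_HISTORICAL, SLOT_REALTIME = 0, 1, 2

def match_slot(free_slots, exchange, slot_type):
    """Consume and return the first free slot whose exchange matches the
    request and whose SlotType is either ANY or the requested type.
    Returns None when no slot matches (request stays queued)."""
    for i, (slot_exchange, slot_st) in enumerate(free_slots):
        if slot_exchange == exchange and slot_st in (SLOT_ANY, slot_type):
            del free_slots[i]          # slot is consumed on dispatch
            return (slot_exchange, slot_st)
    return None
```

An `ANY` slot can serve either job type, so a request only fails to dispatch when no slot for its exchange is free or the remaining slots are typed for the other job kind.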
- **Historical job lifecycle**:
- Flink dispatches `DataRequest` (HISTORICAL_OHLC) → ingestor fetches and writes to Kafka → sends `WorkComplete` (0x21) → sends new `WorkerReady` for that slot
- **Realtime job lifecycle**:
- Flink dispatches `DataRequest` (REALTIME_TICKS) → ingestor polls exchange and writes ticks to Kafka → sends `WorkHeartbeat` (0x22) every 5 s → on `WorkStop` (0x25) from Flink: cancels and sends new `WorkerReady`
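Since realtime jobs report liveness via `WorkHeartbeat` every 5 s, the broker side can declare a job dead after some number of missed beats. A sketch under the assumption of a three-interval liveness window (the 5 s interval is from the protocol; the threshold and names are hypothetical):

```python
HEARTBEAT_INTERVAL_S = 5.0   # WorkHeartbeat cadence from the protocol
LIVENESS = 3                 # assumed: missed intervals before declaring the job dead

def is_job_dead(last_heartbeat_ts, now):
    """True once no WorkHeartbeat has arrived for LIVENESS full intervals."""
    return (now - last_heartbeat_ts) > HEARTBEAT_INTERVAL_S * LIVENESS
```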
- **Slot configuration** (per ingestor, per exchange):
```yaml
exchange_capacity:
BINANCE: { historical_slots: 3, realtime_slots: 5 }
KRAKEN: { historical_slots: 2, realtime_slots: 3 }
COINBASE: { historical_slots: 2, realtime_slots: 4 }
```
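Given such a config, each ingestor's startup burst of `WorkerReady` messages is just the expansion of the per-exchange slot counts into individual offers. A sketch (function name and the dict-based config shape are illustrative; the `SlotType` values and config keys are from the doc):

```python
def initial_offers(exchange_capacity):
    """Expand the per-exchange slot config into the list of
    (exchange, slot_type) WorkerReady offers sent at startup."""
    HISTORICAL, REALTIME = 1, 2   # SlotType values from the protocol
    offers = []
    for exchange, caps in exchange_capacity.items():
        offers += [(exchange, HISTORICAL)] * caps.get("historical_slots", 0)
        offers += [(exchange, REALTIME)] * caps.get("realtime_slots", 0)
    return offers
```

With the config shown above, an ingestor would offer 3 + 5 + 2 + 3 + 2 + 4 = 19 slots in total, one `WorkerReady` per slot.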
- **Flink restart**: when Flink restarts, its `freeSlots` deque is cleared; all in-flight jobs time out on the ingestor side, releasing their slots, which are then re-offered via new `WorkerReady` messages
### 3. Market Data Fanout (Relay ↔ Flink ↔ Clients)
**Pattern**: XPUB/XSUB proxy