backend redesign

2026-03-11 18:47:11 -04:00
parent 8ff277c8c6
commit e99ef5d2dd
210 changed files with 12147 additions and 155 deletions

7
relay/.gitignore vendored Normal file

@@ -0,0 +1,7 @@
target/
config.yaml
secrets.yaml
*.log
.env
.DS_Store
protobuf/

1466
relay/Cargo.lock generated Normal file

File diff suppressed because it is too large

27
relay/Cargo.toml Normal file

@@ -0,0 +1,27 @@
[package]
name = "relay"
version = "0.1.0"
edition = "2021"
[dependencies]
zmq = "0.9"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
tokio = { version = "1.0", features = ["full"] }
tokio-zmq = "0.10"
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
bytes = "1.0"
dashmap = "5.5"
prost = "0.13"
[build-dependencies]
prost-build = "0.13"
[profile.release]
opt-level = 3
lto = true
codegen-units = 1

52
relay/Dockerfile Normal file

@@ -0,0 +1,52 @@
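# Build on the same Debian release as the runtime stage (bookworm) so the
# binary links against the glibc and libzmq versions it will run with
FROM rust:1-bookworm AS builder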
WORKDIR /app
# Install ZMQ and protobuf dependencies
RUN apt-get update && apt-get install -y \
libzmq3-dev \
pkg-config \
protobuf-compiler \
&& rm -rf /var/lib/apt/lists/*
# Copy manifests
COPY Cargo.toml Cargo.lock* ./
# Copy build script and protobuf files (required for build)
COPY build.rs ./
COPY protobuf ./protobuf
# Copy source code
COPY src ./src
# Build application (includes dependencies)
RUN cargo build --release
# Runtime stage
FROM debian:bookworm-slim
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
libzmq5 \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy binary from builder
COPY --from=builder /app/target/release/relay /app/relay
# Create config directory
RUN mkdir -p /config
# Set environment
ENV RUST_LOG=relay=info
# Expose ports
# 5555: Ingestor work queue (PUB)
# 5556: Ingestor response (ROUTER)
# 5558: Market data publication (XPUB)
# 5559: Client requests (ROUTER)
EXPOSE 5555 5556 5558 5559
CMD ["/app/relay"]

238
relay/README.md Normal file

@@ -0,0 +1,238 @@
# ZMQ Relay Gateway
High-performance ZMQ relay/gateway that routes messages between clients, Flink, and ingestors.
## Architecture
The relay acts as a well-known bind point for all components:
```
┌─────────┐                    ┌───────┐                    ┌──────────┐
│ Clients │◄──────────────────►│ Relay │◄──────────────────►│ Ingestors│
└─────────┘                    └───┬───┘                    └──────────┘
                                   │
                               ┌────────┐
                               │ Flink  │
                               └────────┘
```
## Responsibilities
### 1. Client Request Routing
- **Socket**: ROUTER (bind on port 5559)
- **Flow**: Client REQ → Relay ROUTER → Ingestor PUB
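- Receives `SubmitHistoricalRequest` messages (type `0x10`) from clients
- Routes to appropriate ingestors using exchange prefix filtering
- Replies immediately with a `SubmitResponse` (type `0x11`, status `Queued`) that names the notification topic; the relay keeps no pending-request state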
### 2. Ingestor Work Distribution
- **Socket**: PUB (bind on port 5555)
- **Pattern**: Topic-based distribution with exchange prefixes
- Publishes work requests with exchange prefix (e.g., `BINANCE:`)
- Ingestors subscribe to exchanges they support
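A minimal sketch of how the prefix scheme behaves, assuming tickers follow the `EXCHANGE:SYMBOL` shape used in the examples here. The names `exchange_prefix` and `subscription_matches` are illustrative and not part of the relay's code; the second function only mimics the byte-prefix rule that ZMQ applies to SUB subscriptions.

```rust
/// Returns the exchange prefix, including the trailing ':', of a ticker such as
/// "BINANCE:BTC/USDT". Returns None when there is no non-empty exchange part.
fn exchange_prefix(ticker: &str) -> Option<String> {
    match ticker.split_once(':') {
        Some((exchange, _)) if !exchange.is_empty() => Some(format!("{}:", exchange)),
        _ => None,
    }
}

/// Mimics ZMQ subscription filtering: a message is delivered when its first
/// frame starts with the subscribed bytes. An empty subscription matches all.
fn subscription_matches(subscription: &[u8], topic_frame: &[u8]) -> bool {
    topic_frame.starts_with(subscription)
}
```

An ingestor that supports Binance would subscribe to `BINANCE:` and receive every work request whose topic frame begins with those bytes.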
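### 3. Response Routing
- **Socket**: XPUB (bind on port 5558), shared with market data fanout
- **Flow**: Flink PUB → Relay XSUB → Relay XPUB → Client SUB
- Completion is signalled by a `HistoryReadyNotification` (type `0x12`) that Flink publishes and the relay forwards
- Clients subscribe to the topic returned in `SubmitResponse.notification_topic`: `RESPONSE:{client_id}` when a client ID is set, otherwise `HISTORY_READY:{request_id}`
- `ingestor_response_port` (5556) is still present in the configuration, but `relay.rs` does not bind a socket on it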
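In `relay.rs` the client learns where to listen from the `notification_topic` field of `SubmitResponse`. The sketch below restates that rule as a standalone function; the free function `notification_topic` and its signature are assumptions made for illustration.

```rust
/// Derives the pub/sub topic on which a client is notified.
/// Mirrors the rule in relay.rs: prefer the client ID, fall back to the request ID.
fn notification_topic(client_id: Option<&str>, request_id: &str) -> String {
    match client_id {
        Some(cid) => format!("RESPONSE:{}", cid),
        None => format!("HISTORY_READY:{}", request_id),
    }
}
```

Because the topic depends only on values the client already knows, a client can subscribe before it submits the request, which avoids missing a notification that arrives early.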
### 4. Market Data Fanout
- **Sockets**: XPUB (bind on 5558) + XSUB (connect to Flink:5557)
- **Pattern**: XPUB/XSUB proxy
- Relays market data from Flink to multiple clients
- Manages subscriptions dynamically
- Forwards subscription messages upstream to Flink
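Subscription messages arriving on an XPUB socket are single frames whose first byte is `1` for subscribe or `0` for unsubscribe, followed by the topic bytes. A sketch of decoding such a frame follows; `SubscriptionEvent` and `parse_subscription` are hypothetical names, not part of the relay.

```rust
/// A decoded XPUB subscription frame.
#[derive(Debug, PartialEq)]
enum SubscriptionEvent {
    Subscribe(String),
    Unsubscribe(String),
}

/// Decodes a subscription frame: first byte is the action, the rest is the topic.
/// Returns None for an empty frame or an unknown action byte.
fn parse_subscription(frame: &[u8]) -> Option<SubscriptionEvent> {
    let (&action, topic) = frame.split_first()?;
    let topic = String::from_utf8_lossy(topic).into_owned();
    match action {
        1 => Some(SubscriptionEvent::Subscribe(topic)),
        0 => Some(SubscriptionEvent::Unsubscribe(topic)),
        _ => None,
    }
}
```

The relay itself forwards the frame unchanged to the XSUB socket; decoding is only needed for logging or diagnostics.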
## Message Flows
### Historical Data Request
```
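1. Client subscribes to its notification topic (before submitting)
   Socket: SUB → XPUB (5558)
   Topic: "RESPONSE:{client_id}" or "HISTORY_READY:{request_id}"
2. Client → Relay
   Socket: REQ → ROUTER (5559)
   Message: SubmitHistoricalRequest (0x10)
3. Relay → Ingestor
   Socket: PUB (5555)
   Topic: Exchange prefix (e.g., "BINANCE:")
   Message: DataRequest (0x01)
4. Relay → Client
   Socket: ROUTER → REQ
   Message: SubmitResponse (0x11), status Queued
5. Ingestor fetches data from exchange
6. Flink → Relay → Client
   Socket: PUB (5557) → XSUB, then XPUB (5558) → SUB
   Message: HistoryReadyNotification (0x12)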
```
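On the wire, `relay.rs` sends each message as a one-byte protocol version frame followed by a message frame whose first byte is the type code and whose remainder is the protobuf payload. A minimal sketch of that framing, assuming the version constant `0x01` from `relay.rs`; `encode_frames` and `decode_frames` are hypothetical helper names.

```rust
/// Protocol version byte, matching PROTOCOL_VERSION in relay.rs.
const PROTOCOL_VERSION: u8 = 0x01;

/// Builds the (version frame, message frame) pair for a payload.
fn encode_frames(msg_type: u8, payload: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let mut message = Vec::with_capacity(1 + payload.len());
    message.push(msg_type);
    message.extend_from_slice(payload);
    (vec![PROTOCOL_VERSION], message)
}

/// Splits a message frame into (type code, payload) after checking the version.
/// Returns None on a version mismatch or an empty message frame.
fn decode_frames<'a>(version: &[u8], message: &'a [u8]) -> Option<(u8, &'a [u8])> {
    if version.len() != 1 || version[0] != PROTOCOL_VERSION {
        return None;
    }
    let (&msg_type, payload) = message.split_first()?;
    Some((msg_type, payload))
}
```

Keeping the version in its own frame lets a receiver reject an incompatible peer before it looks at the payload.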
### Market Data Subscription
```
1. Client subscribes to ticker
   Socket: SUB → XPUB (5558)
   Topic: "BINANCE:BTC/USDT|tick"
2. Relay forwards subscription
   Socket: XSUB → Flink PUB (5557)
3. Flink publishes data
   Socket: PUB (5557) → XSUB
4. Relay fanout to clients
   Socket: XPUB (5558) → SUB
```
## Configuration
Edit `config.yaml`:
```yaml
bind_address: "tcp://*"
client_request_port: 5559
market_data_pub_port: 5558
ingestor_work_port: 5555
ingestor_response_port: 5556
flink_market_data_endpoint: "tcp://flink-jobmanager:5557"
request_timeout_secs: 30
high_water_mark: 10000
```
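The relay combines `bind_address` with each port to form the endpoint it binds, while `flink_market_data_endpoint` is used as-is for the outgoing connection. A small sketch of that composition with the default values; `bind_endpoint` and `default_bind_endpoints` are illustrative names, and the labels in the returned list are not configuration keys the relay reads.

```rust
/// Joins the bind address and a port the same way relay.rs formats endpoints.
fn bind_endpoint(bind_address: &str, port: u16) -> String {
    format!("{}:{}", bind_address, port)
}

/// Lists the endpoints bound by relay.rs under the default configuration.
fn default_bind_endpoints() -> Vec<(&'static str, String)> {
    let bind_address = "tcp://*";
    vec![
        ("client_request", bind_endpoint(bind_address, 5559)),
        ("market_data_pub", bind_endpoint(bind_address, 5558)),
        ("ingestor_work", bind_endpoint(bind_address, 5555)),
    ]
}
```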
## Building
```bash
cargo build --release
```
## Running
```bash
# With default config
./target/release/relay
# With custom config
CONFIG_PATH=/path/to/config.yaml ./target/release/relay
# With Docker
docker build -t relay .
docker run -p 5555-5559:5555-5559 relay
```
## Environment Variables
- `CONFIG_PATH`: Path to config file (default: `/config/config.yaml`)
- `RUST_LOG`: Log level (default: `relay=info`)
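`Config::from_env` falls back to built-in defaults when the file named by `CONFIG_PATH` does not exist, so a mistyped path does not produce an error. The sketch below isolates that decision using only the standard library; `resolve_config_path`, `config_source`, and `ConfigSource` are hypothetical names for illustration.

```rust
use std::path::Path;

/// Default location, as documented for CONFIG_PATH.
const DEFAULT_CONFIG_PATH: &str = "/config/config.yaml";

/// Where the configuration would come from.
#[derive(Debug, PartialEq)]
enum ConfigSource {
    File(String),
    Defaults,
}

/// Picks the path from the environment value, or the documented default.
fn resolve_config_path(env_value: Option<String>) -> String {
    env_value.unwrap_or_else(|| DEFAULT_CONFIG_PATH.to_string())
}

/// Uses the file when it exists; otherwise falls back to built-in defaults.
fn config_source(path: &str) -> ConfigSource {
    if Path::new(path).exists() {
        ConfigSource::File(path.to_string())
    } else {
        ConfigSource::Defaults
    }
}
```

In a real program the environment value would come from `std::env::var("CONFIG_PATH").ok()`. Logging which source was chosen at startup makes a silently ignored path easier to spot.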
## Ports
| Port | Socket Type | Direction | Purpose |
|------|------------|-----------|---------|
| 5555 | PUB | → Ingestors | Work distribution with exchange prefix |
| 5556 | ROUTER | ← Ingestors | Response collection |
| 5557 | - | (Flink) | Flink market data publication |
| 5558 | XPUB | → Clients | Market data fanout |
| 5559 | ROUTER | ← Clients | Client request handling |
## Monitoring
The relay logs all major events:
```
INFO relay: Handling request submission: request_id=..., ticker=..., client_id=...
INFO relay: Published to ingestors: prefix=BINANCE:, request_id=...
INFO relay: Sent SubmitResponse to client: request_id=..., topic=...
WARN relay: Unknown message type from client: 0x..
```
## Performance
- **High water mark**: Configurable per socket (default: 10,000 messages)
- **Request timeout**: Automatic cleanup of expired requests (default: 30s)
- **Zero-copy proxying**: XPUB/XSUB market data forwarding
- **Async cleanup**: Background task for timeout management
## Design Decisions
### Why Rust?
- **Performance**: Zero-cost abstractions, minimal overhead
- **Safety**: Memory safety without garbage collection
- **Concurrency**: Fearless concurrency with strong type system
- **ZMQ Integration**: Excellent ZMQ bindings
### Why ROUTER for clients?
- Preserves client identity for request/response matching
- Allows async responses (no blocking)
- Handles multiple concurrent clients efficiently
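A ROUTER socket prepends the sender's identity to each incoming message, and a REQ client adds an empty delimiter frame, so the relay sees `[identity][empty][version][message]`. A sketch of splitting and rebuilding that envelope follows. `ClientMessage`, `split_envelope`, and `reply_frames` are hypothetical names, and rejecting a non-empty delimiter is stricter than `relay.rs`, which ignores that frame's content.

```rust
/// The meaningful parts of a client message once the envelope is split.
#[derive(Debug, PartialEq)]
struct ClientMessage {
    identity: Vec<u8>,
    version: Vec<u8>,
    message: Vec<u8>,
}

/// Splits [identity][empty][version][message]; returns None for any other shape.
fn split_envelope(frames: Vec<Vec<u8>>) -> Option<ClientMessage> {
    if frames.len() != 4 {
        return None;
    }
    let mut it = frames.into_iter();
    let identity = it.next()?;
    let delimiter = it.next()?;
    let version = it.next()?;
    let message = it.next()?;
    if !delimiter.is_empty() {
        return None;
    }
    Some(ClientMessage { identity, version, message })
}

/// Builds the reply frames; the identity frame routes the reply to the right client.
fn reply_frames(identity: &[u8], version: &[u8], message: &[u8]) -> Vec<Vec<u8>> {
    vec![identity.to_vec(), Vec::new(), version.to_vec(), message.to_vec()]
}
```

Echoing the identity frame back is all the ROUTER needs to deliver the reply, which is why no per-client connection table is required.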
### Why PUB for ingestor work?
- Topic-based filtering by exchange
- Multiple ingestors can compete for same exchange
- Scales horizontally with ingestor count
- No single point of failure
### Why XPUB/XSUB for market data?
- Dynamic subscription management
- Efficient fanout to many clients
- Upstream subscription control
- Standard ZMQ proxy pattern
## Troubleshooting
### No response from ingestors
Check:
- Ingestors are connected to port 5555
- Ingestors have subscribed to exchange prefix
- Topic format: `EXCHANGE:` (e.g., `BINANCE:`)
### Client timeout
Check:
- Request timeout configuration
- Ingestor availability
- Network connectivity
- Pending requests map (logged on timeout)
### Market data not flowing
Check:
- Flink is publishing on port 5557
- Relay XSUB is connected to Flink
- Clients have subscribed to correct topics
- Topic format: `{ticker}|{data_type}`
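When checking topics by hand, it can help to build and split them programmatically. A minimal sketch, assuming the `{ticker}|{data_type}` format above and that the data type contains no `|`; `build_topic` and `parse_topic` are illustrative names.

```rust
/// Builds a market data topic such as "BINANCE:BTC/USDT|tick".
fn build_topic(ticker: &str, data_type: &str) -> String {
    format!("{}|{}", ticker, data_type)
}

/// Splits a topic at the last '|' into (ticker, data_type).
/// Returns None when the separator is missing or either part is empty.
fn parse_topic(topic: &str) -> Option<(&str, &str)> {
    let (ticker, data_type) = topic.rsplit_once('|')?;
    if ticker.is_empty() || data_type.is_empty() {
        return None;
    }
    Some((ticker, data_type))
}
```

Since ZMQ matches subscriptions by byte prefix, a subscription to `BINANCE:BTC/USDT|` would receive every data type for that ticker.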
## Testing
Run the test client:
```bash
cd ../test/history_client
python client.py
```
Expected flow:
1. Client sends request to relay:5559
2. Relay publishes to ingestors:5555
3. Ingestor fetches and responds to relay:5556
4. Relay returns to client
## Future Enhancements
- [ ] Metrics collection (Prometheus)
- [ ] Health check endpoint
- [ ] Request rate limiting
- [ ] Circuit breaker for failed ingestors
- [ ] Request deduplication
- [ ] Response caching
- [ ] Multi-part response support for large datasets

16
relay/build.rs Normal file

@@ -0,0 +1,16 @@
fn main() {
// Use Config to compile all protos together
// Since the proto files don't have package declarations,
// they'll all be generated into a single _.rs file
prost_build::Config::new()
.compile_protos(
&[
"protobuf/ingestor.proto",
"protobuf/market.proto",
"protobuf/ohlc.proto",
"protobuf/tick.proto",
],
&["protobuf/"],
)
.unwrap_or_else(|e| panic!("Failed to compile protos: {}", e));
}

19
relay/config.example.yaml Normal file

@@ -0,0 +1,19 @@
# ZMQ Relay Configuration
# Bind address for all relay sockets
bind_address: "tcp://*"
# Client-facing ports
client_request_port: 5559 # ROUTER - Client historical data requests
market_data_pub_port: 5558 # XPUB - Market data fanout to clients
# Ingestor-facing ports
ingestor_work_port: 5555 # PUB - Distribute work with exchange prefix
ingestor_response_port: 5556 # ROUTER - Receive responses from ingestors
# Flink connection
flink_market_data_endpoint: "tcp://flink-jobmanager:5557" # XSUB - Subscribe to Flink market data
# Timeouts and limits
request_timeout_secs: 30 # Timeout for pending client requests
high_water_mark: 10000 # ZMQ high water mark for all sockets

104
relay/src/config.rs Normal file

@@ -0,0 +1,104 @@
use anyhow::Result;
use serde::{Deserialize, Serialize};
use std::fs;
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Config {
/// Bind address for all relay sockets (client-facing and ingestor-facing)
#[serde(default = "default_bind_address")]
pub bind_address: String,
/// Client request port (ROUTER - receives client requests)
#[serde(default = "default_client_request_port")]
pub client_request_port: u16,
/// Market data publication port (XPUB - clients subscribe here)
#[serde(default = "default_market_data_pub_port")]
pub market_data_pub_port: u16,
/// Ingestor work queue port (PUB - publish work with exchange prefix)
#[serde(default = "default_ingestor_work_port")]
pub ingestor_work_port: u16,
/// Ingestor response port (ROUTER - receives responses from ingestors)
#[serde(default = "default_ingestor_response_port")]
pub ingestor_response_port: u16,
/// Flink market data endpoint (XSUB - relay subscribes to Flink)
#[serde(default = "default_flink_market_data_endpoint")]
pub flink_market_data_endpoint: String,
/// Request timeout in seconds
#[serde(default = "default_request_timeout_secs")]
pub request_timeout_secs: u64,
/// High water mark for sockets
#[serde(default = "default_hwm")]
pub high_water_mark: i32,
}
fn default_bind_address() -> String {
"tcp://*".to_string()
}
fn default_client_request_port() -> u16 {
5559
}
fn default_market_data_pub_port() -> u16 {
5558
}
fn default_ingestor_work_port() -> u16 {
5555
}
fn default_ingestor_response_port() -> u16 {
5556
}
fn default_flink_market_data_endpoint() -> String {
"tcp://flink-jobmanager:5557".to_string()
}
fn default_request_timeout_secs() -> u64 {
30
}
fn default_hwm() -> i32 {
10000
}
impl Default for Config {
fn default() -> Self {
Self {
bind_address: default_bind_address(),
client_request_port: default_client_request_port(),
market_data_pub_port: default_market_data_pub_port(),
ingestor_work_port: default_ingestor_work_port(),
ingestor_response_port: default_ingestor_response_port(),
flink_market_data_endpoint: default_flink_market_data_endpoint(),
request_timeout_secs: default_request_timeout_secs(),
high_water_mark: default_hwm(),
}
}
}
impl Config {
pub fn from_file(path: &str) -> Result<Self> {
let contents = fs::read_to_string(path)?;
let config: Config = serde_yaml::from_str(&contents)?;
Ok(config)
}
pub fn from_env() -> Result<Self> {
let config_path = std::env::var("CONFIG_PATH")
.unwrap_or_else(|_| "/config/config.yaml".to_string());
if std::path::Path::new(&config_path).exists() {
Self::from_file(&config_path)
} else {
Ok(Self::default())
}
}
}

47
relay/src/main.rs Normal file

@@ -0,0 +1,47 @@
mod config;
mod relay;
mod proto;
use anyhow::Result;
use config::Config;
use relay::Relay;
use tracing::{info, error};
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
#[tokio::main]
async fn main() -> Result<()> {
// Initialize tracing
tracing_subscriber::registry()
.with(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| "relay=info".into()),
)
.with(tracing_subscriber::fmt::layer())
.init();
info!("Starting Stateless ZMQ Relay Gateway");
info!("Architecture: Async event-driven with pub/sub notifications");
// Load configuration
let config = Config::from_env()?;
info!("Configuration loaded: {:?}", config);
// Create and run stateless relay
let relay = Relay::new(config)?;
// Handle shutdown signals
tokio::select! {
result = relay.run() => {
match result {
Ok(_) => info!("Relay stopped gracefully"),
Err(e) => {
error!("Relay error: {}", e);
// Propagate the error so the process exits with a non-zero status
return Err(e);
}
}
}
_ = tokio::signal::ctrl_c() => {
info!("Received shutdown signal");
}
}
info!("ZMQ Relay Gateway stopped");
Ok(())
}

3
relay/src/proto.rs Normal file

@@ -0,0 +1,3 @@
// Include generated protobuf code from build.rs
// Since proto files have no package declaration, they're all in _.rs
include!(concat!(env!("OUT_DIR"), "/_.rs"));

323
relay/src/relay.rs Normal file

@@ -0,0 +1,323 @@
use crate::config::Config;
use crate::proto;
use anyhow::{Context, Result};
use prost::Message;
use tracing::{debug, error, info, warn};
const PROTOCOL_VERSION: u8 = 0x01;
const MSG_TYPE_SUBMIT_REQUEST: u8 = 0x10;
const MSG_TYPE_SUBMIT_RESPONSE: u8 = 0x11;
const MSG_TYPE_DATA_REQUEST: u8 = 0x01;
const MSG_TYPE_HISTORY_READY: u8 = 0x12;
pub struct Relay {
config: Config,
context: zmq::Context,
}
impl Relay {
pub fn new(config: Config) -> Result<Self> {
let context = zmq::Context::new();
Ok(Self {
config,
context,
})
}
pub async fn run(self) -> Result<()> {
info!("Initializing Stateless ZMQ Relay");
// Bind sockets
let client_request_socket = self.create_client_request_socket()?;
let market_data_frontend = self.create_market_data_frontend()?;
let market_data_backend = self.create_market_data_backend()?;
let ingestor_work_socket = self.create_ingestor_work_socket()?;
info!("All sockets initialized successfully - relay is STATELESS");
info!("No pending requests tracked - all async via pub/sub");
// Run main loop
tokio::task::spawn_blocking(move || {
Self::proxy_loop(
client_request_socket,
market_data_frontend,
market_data_backend,
ingestor_work_socket,
)
})
.await?
}
fn create_client_request_socket(&self) -> Result<zmq::Socket> {
let socket = self.context.socket(zmq::ROUTER)?;
socket.set_sndhwm(self.config.high_water_mark)?;
socket.set_rcvhwm(self.config.high_water_mark)?;
socket.set_linger(1000)?;
let endpoint = format!("{}:{}", self.config.bind_address, self.config.client_request_port);
socket.bind(&endpoint)?;
info!("Client request socket (ROUTER) bound to {}", endpoint);
info!(" → Accepts SubmitHistoricalRequest, returns SubmitResponse immediately");
Ok(socket)
}
fn create_market_data_frontend(&self) -> Result<zmq::Socket> {
let socket = self.context.socket(zmq::XPUB)?;
socket.set_sndhwm(self.config.high_water_mark)?;
socket.set_xpub_verbose(true)?;
let endpoint = format!("{}:{}", self.config.bind_address, self.config.market_data_pub_port);
socket.bind(&endpoint)?;
info!("Market data frontend (XPUB) bound to {}", endpoint);
info!(" → Clients subscribe here for HistoryReadyNotification and market data");
Ok(socket)
}
fn create_market_data_backend(&self) -> Result<zmq::Socket> {
let socket = self.context.socket(zmq::XSUB)?;
socket.set_rcvhwm(self.config.high_water_mark)?;
socket.connect(&self.config.flink_market_data_endpoint)?;
info!("Market data backend (XSUB) connected to {}", self.config.flink_market_data_endpoint);
info!(" → Receives HistoryReadyNotification and market data from Flink");
Ok(socket)
}
fn create_ingestor_work_socket(&self) -> Result<zmq::Socket> {
let socket = self.context.socket(zmq::PUB)?;
socket.set_sndhwm(self.config.high_water_mark)?;
socket.set_linger(1000)?;
let endpoint = format!("{}:{}", self.config.bind_address, self.config.ingestor_work_port);
socket.bind(&endpoint)?;
info!("Ingestor work queue (PUB) bound to {}", endpoint);
info!(" → Publishes DataRequest with exchange prefix");
Ok(socket)
}
fn proxy_loop(
client_request_socket: zmq::Socket,
market_data_frontend: zmq::Socket,
market_data_backend: zmq::Socket,
ingestor_work_socket: zmq::Socket,
) -> Result<()> {
let mut items = [
client_request_socket.as_poll_item(zmq::POLLIN),
market_data_frontend.as_poll_item(zmq::POLLIN),
market_data_backend.as_poll_item(zmq::POLLIN),
];
info!("Entering stateless proxy loop");
loop {
// Poll with 100ms timeout
zmq::poll(&mut items, 100)
.context("Failed to poll sockets")?;
// Handle client request submissions
if items[0].is_readable() {
if let Err(e) = Self::handle_client_submission(
&client_request_socket,
&ingestor_work_socket,
) {
error!("Error handling client submission: {}", e);
}
}
// Handle market data subscriptions from clients (XPUB → XSUB)
if items[1].is_readable() {
if let Err(e) = Self::proxy_subscription(&market_data_frontend, &market_data_backend) {
error!("Error proxying subscription: {}", e);
}
}
// Handle market data from Flink (XSUB → XPUB)
// This includes HistoryReadyNotification and regular market data
if items[2].is_readable() {
if let Err(e) = Self::proxy_market_data(&market_data_backend, &market_data_frontend) {
error!("Error proxying market data: {}", e);
}
}
}
}
fn handle_client_submission(
client_socket: &zmq::Socket,
ingestor_socket: &zmq::Socket,
) -> Result<()> {
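// Receive from client: [identity][empty][version][message]
// Read the whole multipart message in one call so that a short or malformed
// message cannot leave stray frames to be misread as the next request.
let frames = client_socket.recv_multipart(0)?;
let frames = match <[Vec<u8>; 4]>::try_from(frames) {
Ok(frames) => frames,
Err(frames) => {
warn!("Expected 4 frames from client, got {}", frames.len());
return Ok(());
}
};
let [identity, _empty, version_frame, message_frame] = frames;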
if version_frame.len() != 1 || version_frame[0] != PROTOCOL_VERSION {
warn!("Invalid protocol version from client");
return Ok(());
}
if message_frame.is_empty() {
warn!("Empty message frame from client");
return Ok(());
}
let msg_type = message_frame[0];
let payload = &message_frame[1..];
debug!("Received client submission: type=0x{:02x}, payload_len={}", msg_type, payload.len());
match msg_type {
MSG_TYPE_SUBMIT_REQUEST => {
Self::handle_submit_request(
identity,
payload,
client_socket,
ingestor_socket,
)?;
}
_ => {
warn!("Unknown message type from client: 0x{:02x}", msg_type);
}
}
Ok(())
}
fn handle_submit_request(
client_identity: Vec<u8>,
payload: &[u8],
client_socket: &zmq::Socket,
ingestor_socket: &zmq::Socket,
) -> Result<()> {
// Parse protobuf request
let request = proto::SubmitHistoricalRequest::decode(payload)
.context("Failed to parse SubmitHistoricalRequest")?;
let request_id = request.request_id.clone();
let ticker = request.ticker.clone();
let client_id = request.client_id.clone();
info!("Handling request submission: request_id={}, ticker={}, client_id={:?}",
request_id, ticker, client_id);
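// Extract exchange prefix from ticker (e.g. "BINANCE:BTC/USDT" yields "BINANCE:").
// split_once returns None when the ticker has no ':', so a ticker without an
// exchange part yields an empty prefix instead of "<ticker>:".
let exchange_prefix = match ticker.split_once(':') {
Some((exchange, _)) if !exchange.is_empty() => format!("{}:", exchange),
_ => String::new(),
};
if exchange_prefix.is_empty() {
warn!("Ticker '{}' missing exchange prefix", ticker);
}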
// Build DataRequest protobuf for ingestors
let data_request = proto::DataRequest {
request_id: request_id.clone(),
r#type: proto::data_request::RequestType::HistoricalOhlc as i32,
ticker: ticker.clone(),
historical: Some(proto::HistoricalParams {
start_time: request.start_time,
end_time: request.end_time,
period_seconds: request.period_seconds,
limit: request.limit,
}),
realtime: None,
client_id: client_id.clone(),
};
let mut data_request_bytes = Vec::new();
data_request.encode(&mut data_request_bytes)?;
// Publish to ingestors with exchange prefix
let version_frame = vec![PROTOCOL_VERSION];
let mut message_frame = vec![MSG_TYPE_DATA_REQUEST];
message_frame.extend_from_slice(&data_request_bytes);
ingestor_socket.send(&exchange_prefix, zmq::SNDMORE)?;
ingestor_socket.send(&version_frame, zmq::SNDMORE)?;
ingestor_socket.send(&message_frame, 0)?;
info!("Published to ingestors: prefix={}, request_id={}", exchange_prefix, request_id);
// Build SubmitResponse protobuf
// NOTE: This topic is DETERMINISTIC based on client-generated values.
// Client should have already subscribed to this topic BEFORE sending the request
// to prevent race condition where notification arrives before client subscribes.
let notification_topic = if let Some(cid) = &client_id {
format!("RESPONSE:{}", cid)
} else {
format!("HISTORY_READY:{}", request_id)
};
let response = proto::SubmitResponse {
request_id: request_id.clone(),
status: proto::submit_response::SubmitStatus::Queued as i32,
error_message: None,
notification_topic: notification_topic.clone(),
};
let mut response_bytes = Vec::new();
response.encode(&mut response_bytes)?;
// Send immediate response to client
let version_frame = vec![PROTOCOL_VERSION];
let mut message_frame = vec![MSG_TYPE_SUBMIT_RESPONSE];
message_frame.extend_from_slice(&response_bytes);
client_socket.send(&client_identity, zmq::SNDMORE)?;
client_socket.send(&[] as &[u8], zmq::SNDMORE)?;
client_socket.send(&version_frame, zmq::SNDMORE)?;
client_socket.send(&message_frame, 0)?;
info!("Sent SubmitResponse to client: request_id={}, topic={}", request_id, notification_topic);
// Relay is now DONE with this request - completely stateless!
// Client will receive notification via pub/sub when Flink publishes HistoryReadyNotification
Ok(())
}
fn proxy_subscription(
frontend: &zmq::Socket,
backend: &zmq::Socket,
) -> Result<()> {
// Forward subscription message from XPUB to XSUB
let msg = frontend.recv_bytes(0)?;
backend.send(&msg, 0)?;
if !msg.is_empty() {
let action = if msg[0] == 1 { "subscribe" } else { "unsubscribe" };
let topic = String::from_utf8_lossy(&msg[1..]);
debug!("Client {} to topic: {}", action, topic);
}
Ok(())
}
fn proxy_market_data(
backend: &zmq::Socket,
frontend: &zmq::Socket,
) -> Result<()> {
// Forward every frame of one message from XSUB to XPUB.
// Note: recv_bytes copies each frame into a Vec, so this path is not zero-copy.
// This includes:
// - Regular market data (ticks, OHLC)
// - HistoryReadyNotification from Flink
loop {
let msg = backend.recv_bytes(0)?;
let more = backend.get_rcvmore()?;
if more {
frontend.send(&msg, zmq::SNDMORE)?;
} else {
frontend.send(&msg, 0)?;
break;
}
}
Ok(())
}
}