prod alpha deploy

2026-04-10 16:09:39 -04:00
parent 7d231169d9
commit 6418729b16
17 changed files with 375 additions and 967 deletions

doc/scaling.md Normal file

@@ -0,0 +1,35 @@
# Scaling Notes
## TODO: Flink-to-Relay ZMQ Discovery
Currently Relay connects to Flink via XSUB on a single endpoint. With multiple Flink instances behind a K8s service, we need many-to-many connectivity.
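**Problem**: K8s service load balancing doesn't help ZMQ. A regular ClusterIP service balances per TCP connection, and ZMQ connections are persistent, so a single connection from Relay lands on one Flink pod and stays there. Relay needs to connect to ALL Flink instances to receive all published messages.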
**Proposed Solution**: Use a K8s headless service for Flink workers:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-workers
spec:
  clusterIP: None  # headless: DNS returns pod IPs instead of a single virtual IP
  selector:
    app: flink
```
Relay implementation:
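1. On startup and periodically (every N seconds), resolve `flink-workers.<namespace>.svc.cluster.local`, where `<namespace>` is the namespace the service is deployed in
2. DNS returns one A record per ready Flink pod, so the result is the current set of Flink pod IPs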
3. Diff against current XSUB connections
4. Connect to new pods, disconnect from removed pods
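The steps above can be sketched roughly as follows. This is a minimal illustration, not the Relay implementation: the function names (`resolve_endpoints`, `diff_endpoints`, `reconcile`), the `tcp://` transport, and the port are assumptions, and `sock` is assumed to be any object with ZMQ-style `connect(endpoint)` and `disconnect(endpoint)` methods (for example a pyzmq XSUB socket).

```python
import socket
from typing import Set, Tuple


def resolve_endpoints(hostname: str, port: int) -> Set[str]:
    """Resolve all IPv4 addresses behind a hostname into ZMQ endpoint strings.

    For a headless service, the resolver is expected to return one address
    per ready pod. The tcp:// scheme and the port are assumptions.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (ip, port).
    return {f"tcp://{info[4][0]}:{port}" for info in infos}


def diff_endpoints(current: Set[str], resolved: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_connect, to_disconnect) for the given endpoint sets."""
    return resolved - current, current - resolved


def reconcile(sock, current: Set[str], resolved: Set[str]) -> Set[str]:
    """Connect to new endpoints, disconnect from removed ones.

    Returns the endpoint set that should be stored as the new current state.
    """
    to_connect, to_disconnect = diff_endpoints(current, resolved)
    for endpoint in sorted(to_connect):
        sock.connect(endpoint)
    for endpoint in sorted(to_disconnect):
        sock.disconnect(endpoint)
    return set(resolved)
```

A polling loop would call `resolve_endpoints` every N seconds and feed the result to `reconcile`, keeping the returned set as `current` for the next round. Explicitly disconnecting removed endpoints matters because ZMQ otherwise keeps retrying a connection to an address that no longer exists.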
**Alternative approaches considered**:
- XPUB/XSUB broker: Adds single point of failure and latency
- Service discovery (etcd/Redis): More complex, requires additional infrastructure
**Open questions**:
- Appropriate polling interval for DNS resolution (5-10 seconds?)
- Handling of brief disconnection during pod replacement
- Whether to use K8s Endpoints API watch instead of DNS polling for faster reaction
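One possible way to approach the disconnection question, sketched under assumptions: delay the disconnect until an endpoint has been missing from DNS for a grace period, so that a single empty or partial DNS answer does not tear down live connections. The function name `endpoints_to_drop`, the `missing_since` bookkeeping dict, and the grace period value are all hypothetical. A replaced pod typically comes back with a new IP, so this mainly guards against transient resolver gaps rather than speeding up reconnection.

```python
from typing import Dict, Set


def endpoints_to_drop(
    connected: Set[str],
    resolved: Set[str],
    missing_since: Dict[str, float],
    now: float,
    grace_seconds: float,
) -> Set[str]:
    """Return endpoints absent from DNS for at least grace_seconds.

    missing_since maps endpoint -> time it was first seen missing and is
    updated in place on every call.
    """
    # Forget endpoints that reappeared in DNS or are no longer connected.
    for endpoint in list(missing_since):
        if endpoint in resolved or endpoint not in connected:
            del missing_since[endpoint]

    drop: Set[str] = set()
    for endpoint in connected - resolved:
        # Record the first time this endpoint went missing, if not yet known.
        first_missing = missing_since.setdefault(endpoint, now)
        if now - first_missing >= grace_seconds:
            drop.add(endpoint)
    return drop
```

With this in place, new endpoints can still be connected immediately on each poll, while only the endpoints returned by `endpoints_to_drop` are disconnected.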