backend redesign

2026-03-11 18:47:11 -04:00
parent 8ff277c8c6
commit e99ef5d2dd
210 changed files with 12147 additions and 155 deletions

ingestor/README.md

# CCXT Market Data Ingestor
A Node.js market data ingestor that uses CCXT to fetch historical OHLC data and realtime tick data from cryptocurrency exchanges. It integrates with Apache Flink via ZeroMQ for work distribution and writes the data to Kafka.
## Architecture
The ingestor is a worker process that:
1. Connects to Flink's ZMQ work queue (PULL socket) to receive data requests
2. Connects to Flink's ZMQ control channel (SUB socket) to receive control messages
3. Fetches market data from exchanges using CCXT
4. Writes the data to Kafka as protobuf-encoded messages
### Data Request Types
#### Historical OHLC
- Fetches historical candlestick data for a specified time range
- Uses CCXT's `fetchOHLCV` method
- Writes OHLC messages to Kafka
- Each request is removed from the queue once processing completes
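Exchanges cap how many candles a single `fetchOHLCV` call returns, so a long historical range is usually fetched in pages. The following is a minimal sketch of planning those pages. It assumes a fixed candle duration and a per-call `limit`. `planOhlcvPages` is a hypothetical helper, not part of `src/ccxt-fetcher.js`:

```javascript
/**
 * Split [startMs, endMs) into `since` values for successive fetchOHLCV calls.
 * Assumes each call returns at most `limit` candles of `timeframeMs` each.
 * (Hypothetical helper for illustration.)
 */
function planOhlcvPages(startMs, endMs, timeframeMs, limit) {
  if (timeframeMs <= 0 || limit <= 0) {
    throw new RangeError('timeframeMs and limit must be positive');
  }
  const pageSpanMs = timeframeMs * limit; // time covered by one full page
  const pages = [];
  for (let since = startMs; since < endMs; since += pageSpanMs) {
    pages.push(since);
  }
  return pages;
}

// Usage: one day of 1-minute candles, at most 500 candles per request.
const DAY_MS = 24 * 60 * 60 * 1000;
const pages = planOhlcvPages(0, DAY_MS, 60_000, 500);
// Each entry would be passed as `since`, e.g.
// await exchange.fetchOHLCV('BTC/USDT', '1m', since, 500);
```

In practice, the next `since` is often taken from the last candle actually returned. This approach also copes with gaps in an exchange's data.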
#### Realtime Ticks
- Subscribes to realtime trade data
- Polls for recent trades every 10 seconds using CCXT's `fetchTrades` method
- Writes Tick messages to the Kafka `market-0` topic
- The subscription persists until Flink cancels it with a control message
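CCXT's `fetchTrades(symbol, since)` returns trades at or after `since`, so consecutive polls overlap at the boundary timestamp. The following is a minimal sketch of one way to emit each trade only once. It assumes CCXT unified trade objects that carry an `id` and a millisecond `timestamp`. The cursor shape and `takeNewTrades` are hypothetical:

```javascript
/**
 * Return only trades not emitted by a previous poll, and advance the cursor.
 * cursor = { since: last seen timestamp (ms), seenIds: Set of ids seen at that timestamp }
 * Assumes every trade has an id.
 */
function takeNewTrades(trades, cursor) {
  const fresh = [];
  // Process in time order so the cursor only moves forward.
  const sorted = [...trades].sort((a, b) => a.timestamp - b.timestamp);
  for (const trade of sorted) {
    if (trade.timestamp < cursor.since) continue; // older than the cursor
    if (trade.timestamp === cursor.since && cursor.seenIds.has(trade.id)) continue; // boundary duplicate
    if (trade.timestamp > cursor.since) {
      // New boundary timestamp: ids seen at older timestamps are no longer needed.
      cursor.since = trade.timestamp;
      cursor.seenIds = new Set();
    }
    cursor.seenIds.add(trade.id);
    fresh.push(trade);
  }
  return fresh;
}

// Usage: the second poll passes cursor.since (2000) and receives trade 'b' again.
const cursor = { since: 0, seenIds: new Set() };
const first = takeNewTrades([{ id: 'a', timestamp: 1000 }, { id: 'b', timestamp: 2000 }], cursor);
const second = takeNewTrades([{ id: 'b', timestamp: 2000 }, { id: 'c', timestamp: 2000 }], cursor);
```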
## Installation
```bash
npm install
```
## Configuration
Create `config.yaml` based on `config.example.yaml`:
```yaml
# Flink ZMQ endpoints
flink_hostname: localhost
ingestor_work_port: 5555
ingestor_control_port: 5556
# Kafka configuration
kafka_brokers:
- localhost:9092
kafka_topic: market-0
# Worker configuration
max_concurrent: 10
poll_interval_ms: 10000
```
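The two ZMQ endpoints are derived from `flink_hostname` and the port settings. The following is a minimal sketch. It assumes TCP transport and a config object that has already been parsed from YAML. `buildEndpoints` is a hypothetical helper:

```javascript
/**
 * Build the ZMQ connect addresses from a parsed config.yaml object.
 * Assumes TCP transport; the worker connects to Flink rather than binding.
 */
function buildEndpoints(config) {
  const host = config.flink_hostname;
  for (const key of ['ingestor_work_port', 'ingestor_control_port']) {
    const port = config[key];
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new RangeError(`${key} must be an integer port, got ${port}`);
    }
  }
  return {
    work: `tcp://${host}:${config.ingestor_work_port}`,       // PULL socket
    control: `tcp://${host}:${config.ingestor_control_port}`, // SUB socket
  };
}

// Usage with the example configuration values.
const endpoints = buildEndpoints({
  flink_hostname: 'localhost',
  ingestor_work_port: 5555,
  ingestor_control_port: 5556,
});
```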
An optional `secrets.yaml` holds sensitive configuration. Its location is set by `SECRETS_PATH` (see Environment Variables).
## Usage
### Development
```bash
npm run dev
```
### Production
```bash
npm start
```
### Docker
```bash
docker build -t ccxt-ingestor .
docker run -v /path/to/config:/config ccxt-ingestor
```
## Ticker Format
Tickers must be in the format `EXCHANGE:SYMBOL`.

Examples:
- `BINANCE:BTC/USDT`
- `COINBASE:ETH/USD`
- `KRAKEN:XRP/EUR`
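The following is a minimal sketch of splitting a ticker. It assumes the exchange prefix maps to a CCXT exchange id by lowercasing, since CCXT ids such as `binance` are lowercase. It splits on the first colon only, because CCXT symbols for derivatives can themselves contain a colon (for example `BTC/USDT:USDT`). `parseTicker` is a hypothetical helper:

```javascript
/**
 * Parse "EXCHANGE:SYMBOL" into a CCXT exchange id and unified symbol.
 * Splits on the first ':' only, so "BINANCE:BTC/USDT:USDT" keeps the full symbol.
 */
function parseTicker(ticker) {
  const sep = ticker.indexOf(':');
  if (sep <= 0 || sep === ticker.length - 1) {
    throw new Error(`Invalid ticker "${ticker}", expected EXCHANGE:SYMBOL`);
  }
  return {
    exchangeId: ticker.slice(0, sep).toLowerCase(), // e.g. "kraken"
    symbol: ticker.slice(sep + 1),                  // e.g. "XRP/EUR"
  };
}

const { exchangeId, symbol } = parseTicker('KRAKEN:XRP/EUR');
```

The lowercased id could then select the exchange class, as in `new ccxt[exchangeId]()`.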
## Protocol
### ZeroMQ Message Format
All messages use a two-frame envelope:
```
Frame 1: [1 byte: protocol version = 0x01]
Frame 2: [1 byte: message type ID][N bytes: protobuf message]
```
### Message Type IDs
- `0x01`: DataRequest
- `0x02`: IngestorControl
- `0x03`: Tick
- `0x04`: OHLC
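The envelope can be built and parsed with Node's `Buffer`. The following is a minimal sketch. It assumes the protobuf payload is encoded or decoded separately. The function names are hypothetical:

```javascript
const PROTOCOL_VERSION = 0x01;

// Message type IDs listed above.
const MessageType = Object.freeze({
  DataRequest: 0x01,
  IngestorControl: 0x02,
  Tick: 0x03,
  OHLC: 0x04,
});

/** Wrap an encoded protobuf payload in the two-frame envelope. */
function encodeEnvelope(typeId, payload) {
  const frame1 = Buffer.from([PROTOCOL_VERSION]);
  const frame2 = Buffer.concat([Buffer.from([typeId]), payload]);
  return [frame1, frame2];
}

/** Validate and unwrap a received two-frame message. */
function decodeEnvelope(frames) {
  if (frames.length !== 2) throw new Error(`expected 2 frames, got ${frames.length}`);
  const [frame1, frame2] = frames;
  if (frame1.length !== 1 || frame1[0] !== PROTOCOL_VERSION) {
    throw new Error('unsupported protocol version');
  }
  if (frame2.length < 1) throw new Error('missing message type byte');
  return { typeId: frame2[0], payload: frame2.subarray(1) };
}
```

Assuming the zeromq.js v6 API, `await socket.receive()` resolves to an array of `Buffer` frames, and `socket.send()` accepts an array for a multipart message. These helpers would therefore sit directly around those calls.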
### DataRequest (from Flink)
```protobuf
message DataRequest {
string request_id = 1;
RequestType type = 2; // HISTORICAL_OHLC or REALTIME_TICKS
string ticker = 3;
optional HistoricalParams historical = 4;
optional RealtimeParams realtime = 5;
}
```
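The `historical` field is optional in the schema, so a worker would typically check that it is present before running a historical fetch. The following is a minimal sketch of dispatching a decoded request. It assumes `type` is decoded as its enum name string (as protobufjs does with `toObject(message, { enums: String })`). It also assumes hypothetical `historical`/`realtime` handler callbacks:

```javascript
/**
 * Dispatch a decoded DataRequest to the matching handler.
 * Assumes request.type is the enum name, e.g. 'HISTORICAL_OHLC'.
 */
function routeRequest(request, handlers) {
  switch (request.type) {
    case 'HISTORICAL_OHLC':
      // A historical fetch needs its time range, so the params must be present.
      if (!request.historical) {
        throw new Error('HISTORICAL_OHLC request without historical params');
      }
      return handlers.historical(request.ticker, request.historical);
    case 'REALTIME_TICKS':
      // Treat missing realtime params as "use defaults" in this sketch.
      return handlers.realtime(request.ticker, request.realtime ?? {});
    default:
      throw new Error(`unknown request type: ${request.type}`);
  }
}
```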
### IngestorControl (from Flink)
```protobuf
message IngestorControl {
ControlAction action = 1; // CANCEL, SHUTDOWN, CONFIG_UPDATE, HEARTBEAT
optional string request_id = 2;
optional IngestorConfig config = 3;
}
```
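The control channel is a SUB socket, so every worker receives every control message. A `CANCEL` for a request running on another worker is therefore simply a no-op. The following is a minimal sketch of reacting to each `ControlAction`. It assumes actions arrive as enum name strings and that field names match the `.proto` (`request_id`). The `subscriptions` map, its `stop()` method, and the `onShutdown`/`onConfig` callbacks are hypothetical:

```javascript
/**
 * Apply one decoded IngestorControl message.
 * subscriptions: Map<request_id, { stop(): void }> of active realtime requests.
 * Returns true if the message was acted on by this worker.
 */
function handleControl(msg, subscriptions, { onShutdown, onConfig }) {
  switch (msg.action) {
    case 'CANCEL': {
      const sub = subscriptions.get(msg.request_id);
      if (sub) {
        sub.stop(); // stop polling for this request
        subscriptions.delete(msg.request_id);
      }
      return Boolean(sub); // false when the request is not running on this worker
    }
    case 'SHUTDOWN':
      for (const sub of subscriptions.values()) sub.stop();
      subscriptions.clear();
      onShutdown();
      return true;
    case 'CONFIG_UPDATE':
      onConfig(msg.config);
      return true;
    case 'HEARTBEAT':
      return true; // no state change needed in this sketch
    default:
      return false; // ignore unknown actions
  }
}
```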
### Tick (to Kafka)
```protobuf
message Tick {
string trade_id = 1;
string ticker = 2;
uint64 timestamp = 3; // microseconds
int64 price = 4; // fixed-point (10^8)
int64 amount = 5; // fixed-point (10^8)
int64 quote_amount = 6; // fixed-point (10^8)
bool taker_buy = 7;
}
```
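The following is a minimal sketch of mapping one CCXT unified trade to `Tick` fields. It assumes CCXT's `side` is the taker's side, which is the usual convention for public trades but worth verifying per exchange. It uses snake_case keys that mirror the `.proto`, although a protobufjs encoder may expect camelCase instead. It uses `BigInt` for the 64-bit fields. `tradeToTick` is hypothetical:

```javascript
// 8-decimal fixed point via a rounded decimal string, avoiding x * 1e8 float error.
const toFixed8 = (x) => BigInt(x.toFixed(8).replace('.', ''));

/** Map a CCXT unified trade object to Tick fields. */
function tradeToTick(ticker, trade) {
  // Use CCXT's `cost` (price * amount) when reported; otherwise compute it.
  const cost = trade.cost ?? trade.price * trade.amount;
  return {
    trade_id: String(trade.id),
    ticker,
    timestamp: BigInt(trade.timestamp) * 1000n, // CCXT milliseconds -> microseconds
    price: toFixed8(trade.price),
    amount: toFixed8(trade.amount),
    quote_amount: toFixed8(cost),
    taker_buy: trade.side === 'buy', // assumes `side` is the taker's side
  };
}
```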
### OHLC (to Kafka)
```protobuf
message OHLC {
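int64 open = 2; // fixed-point (10^8)
int64 high = 3; // fixed-point (10^8)
int64 low = 4; // fixed-point (10^8)
int64 close = 5; // fixed-point (10^8)
optional int64 volume = 6; // fixed-point (10^8)
optional int64 open_time = 9; // microseconds
optional int64 close_time = 12; // microseconds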
string ticker = 14;
}
```
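CCXT's `fetchOHLCV` returns each candle as an array `[timestamp, open, high, low, close, volume]`, with the timestamp in milliseconds. The following is a minimal sketch of mapping one candle to `OHLC` fields. It assumes snake_case keys that mirror the `.proto` and `BigInt` values for the `int64` fields. It also assumes `close_time` is the open time plus the candle duration, which the schema does not specify. `candleToOhlc` and its `timeframeMs` parameter are hypothetical:

```javascript
// 8-decimal fixed point via a rounded decimal string.
const toFixed8 = (x) => BigInt(x.toFixed(8).replace('.', ''));

/**
 * Map one CCXT OHLCV candle to OHLC fields.
 * Assumption: close_time = open_time + candle duration.
 */
function candleToOhlc(ticker, candle, timeframeMs) {
  const [timestampMs, open, high, low, close, volume] = candle;
  const openTime = BigInt(timestampMs) * 1000n; // milliseconds -> microseconds
  const ohlc = {
    ticker,
    open: toFixed8(open),
    high: toFixed8(high),
    low: toFixed8(low),
    close: toFixed8(close),
    open_time: openTime,
    close_time: openTime + BigInt(timeframeMs) * 1000n,
  };
  // volume is optional in the schema, so omit it when the exchange does not report it.
  if (volume != null) ohlc.volume = toFixed8(volume);
  return ohlc;
}
```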
## Fixed-Point Encoding
All prices and amounts are encoded as fixed-point integers with 8 decimal places (denominator 10^8):
- Example: 123.45678901 → 12345678901
- Integer encoding avoids binary floating-point rounding errors; with `int64` fields, the largest representable value is 92,233,720,368.54775807
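The following is a minimal sketch of encoding and decoding, using `BigInt` so the full 64-bit range stays exact. It formats numbers with `toFixed(8)` rather than multiplying by 1e8, because a float product such as `x * 1e8` can land slightly off the intended integer. The helper names are hypothetical:

```javascript
const DECIMALS = 8;
const SCALE = 10n ** BigInt(DECIMALS);

/**
 * Encode a decimal value as a fixed-point BigInt.
 * Numbers are rounded to 8 decimals by toFixed; decimal strings with more
 * than 8 fractional digits are truncated.
 */
function toFixedPoint(value) {
  const str = typeof value === 'number' ? value.toFixed(DECIMALS) : String(value);
  const negative = str.startsWith('-');
  const [intPart, fracPart = ''] = (negative ? str.slice(1) : str).split('.');
  // Pad or truncate the fractional part to exactly 8 digits.
  const frac = fracPart.padEnd(DECIMALS, '0').slice(0, DECIMALS);
  const magnitude = BigInt(intPart || '0') * SCALE + BigInt(frac);
  return negative ? -magnitude : magnitude;
}

/** Decode a fixed-point BigInt back to a decimal string with 8 places. */
function fromFixedPoint(fixed) {
  const negative = fixed < 0n;
  const abs = negative ? -fixed : fixed;
  const intPart = abs / SCALE; // BigInt division truncates
  const frac = (abs % SCALE).toString().padStart(DECIMALS, '0');
  return `${negative ? '-' : ''}${intPart}.${frac}`;
}
```

Decoding to a string rather than a `Number` keeps values above 2^53 / 10^8 (roughly 90 million) exact.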
## Components
### `src/index.js`
Main worker process that coordinates all components and handles the work loop.
### `src/zmq-client.js`
ZeroMQ client for connecting to Flink's work queue and control channel.
### `src/kafka-producer.js`
Kafka producer for writing protobuf-encoded messages to Kafka topics.
### `src/ccxt-fetcher.js`
CCXT wrapper for fetching historical OHLC and recent trades from exchanges.
### `src/realtime-poller.js`
Manages realtime subscriptions with 10-second polling for trade updates.
### `src/proto/messages.js`
Protobuf message definitions and encoding/decoding utilities.
## Error Handling
- Failed requests automatically return to the Flink work queue
- Realtime subscriptions are cancelled after 5 consecutive errors
- Worker logs all errors with context for debugging
- Graceful shutdown on SIGINT/SIGTERM
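The following is a minimal sketch of the consecutive-error rule. It assumes one counter per subscription that resets on any successful poll. `ErrorTracker` and its constructor parameter are hypothetical; the default threshold of 5 comes from the rule above:

```javascript
/** Counts consecutive poll failures for one realtime subscription. */
class ErrorTracker {
  constructor(maxConsecutive = 5) {
    this.maxConsecutive = maxConsecutive;
    this.consecutive = 0;
  }

  /** Call after a successful poll: the failure streak is broken. */
  recordSuccess() {
    this.consecutive = 0;
  }

  /** Call after a failed poll; returns true when the subscription should be cancelled. */
  recordFailure() {
    this.consecutive += 1;
    return this.consecutive >= this.maxConsecutive;
  }
}
```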
## Monitoring
The worker logs status information every 60 seconds including:
- Number of active requests
- Realtime subscription statistics
- Error counts
## Environment Variables
- `CONFIG_PATH`: Path to config.yaml (default: `/config/config.yaml`)
- `SECRETS_PATH`: Path to secrets.yaml (default: `/config/secrets.yaml`)
- `LOG_LEVEL`: Log level (default: `info`)
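The following is a minimal sketch of applying these defaults. `resolveSettings` is hypothetical, and treating an empty variable as unset is a design choice of this sketch:

```javascript
/** Resolve runtime settings from the environment, falling back to the documented defaults. */
function resolveSettings(env = process.env) {
  // `||` (not `??`) so an empty string also falls back to the default.
  return {
    configPath: env.CONFIG_PATH || '/config/config.yaml',
    secretsPath: env.SECRETS_PATH || '/config/secrets.yaml',
    logLevel: env.LOG_LEVEL || 'info',
  };
}
```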
## Supported Exchanges
All exchanges supported by CCXT can be used. Popular exchanges include:
- Binance
- Coinbase
- Kraken
- Bitfinex
- Huobi
- And 100+ more
## Development
### Project Structure
```
ingestor/
├── src/
│   ├── index.js             # Main worker
│   ├── zmq-client.js        # ZMQ client
│   ├── kafka-producer.js    # Kafka producer
│   ├── ccxt-fetcher.js      # CCXT wrapper
│   ├── realtime-poller.js   # Realtime poller
│   └── proto/
│       └── messages.js      # Protobuf definitions
├── config.example.yaml
├── Dockerfile
├── package.json
└── README.md
```
## License
ISC