# CCXT Market Data Ingestor

A Node.js-based market data ingestor that uses CCXT to fetch historical OHLC data and realtime tick data from cryptocurrency exchanges. It integrates with Apache Flink via ZeroMQ for work distribution and writes data to Kafka.

## Architecture

The ingestor is a worker process that:
1. Connects to Flink's ZMQ work queue (PULL socket) to receive data requests
2. Connects to Flink's ZMQ control channel (SUB socket) to receive control messages
3. Fetches market data from exchanges using CCXT
4. Writes data to Kafka using the protobuf protocol

### Data Request Types

#### Historical OHLC
- Fetches historical candlestick data for a specified time range
- Uses CCXT's `fetchOHLCV` method
- Writes OHLC messages to Kafka
- Request is completed and removed from queue after processing

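Exchanges generally cap how many candles a single `fetchOHLCV` call returns, so a long time range usually has to be split into several requests. The following is a minimal sketch of that planning step, not the ingestor's actual code: `planOhlcvRequests` and its parameters are hypothetical names, and the page size of 100 is an assumed value that varies by exchange.

```javascript
// Hypothetical helper: plan the `since`/`limit` pairs needed to cover a range.
// Assumes each request returns at most `limit` candles of `timeframeMs` each.
function planOhlcvRequests(startMs, endMs, timeframeMs, limit) {
  if (endMs <= startMs) return [];
  const windowMs = timeframeMs * limit; // time span covered by one full page
  const requests = [];
  for (let since = startMs; since < endMs; since += windowMs) {
    const remaining = Math.ceil((endMs - since) / timeframeMs);
    requests.push({ since, limit: Math.min(limit, remaining) });
  }
  return requests;
}

// Two hours of 1-minute candles with an assumed page size of 100 candles.
const pages = planOhlcvRequests(0, 2 * 60 * 60 * 1000, 60 * 1000, 100);
console.log(pages);
```

Each entry could then be passed as the `since` and `limit` arguments of `fetchOHLCV`.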
#### Realtime Ticks
- Subscribes to realtime trade data
- Uses 10-second polling to fetch recent trades via `fetchTrades`
- Writes Tick messages to Kafka `market-0` topic
- Subscription persists until cancelled by Flink control message

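Because polling returns the most recent trades on every call, consecutive polls will typically overlap. This README does not say how the ingestor handles that overlap; the sketch below shows one possible approach, de-duplicating on the trade `id`. `createTradeFilter` and `maxRemembered` are hypothetical names introduced for illustration.

```javascript
// Hypothetical de-duplication between polls, keyed on the trade id.
function createTradeFilter(maxRemembered = 1000) {
  const seen = new Set(); // insertion-ordered, so the oldest id comes first
  return function selectNew(trades) {
    const fresh = [];
    for (const trade of trades) {
      if (seen.has(trade.id)) continue; // already emitted by an earlier poll
      seen.add(trade.id);
      fresh.push(trade);
      if (seen.size > maxRemembered) {
        seen.delete(seen.values().next().value); // evict the oldest id
      }
    }
    return fresh;
  };
}

const selectNew = createTradeFilter();
const firstPoll = selectNew([{ id: 'a' }, { id: 'b' }]);
const secondPoll = selectNew([{ id: 'b' }, { id: 'c' }]);
console.log(firstPoll.length, secondPoll.map((t) => t.id)); // 2 [ 'c' ]
```

Bounding the remembered ids keeps memory flat for long-lived subscriptions, at the cost of possibly re-emitting a very old trade.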
## Installation

```bash
npm install
```

## Configuration

Create `config.yaml` based on `config.example.yaml`:

```yaml
# Flink ZMQ endpoints
flink_hostname: localhost
ingestor_work_port: 5555
ingestor_control_port: 5556

# Kafka configuration
kafka_brokers:
  - localhost:9092
kafka_topic: market-0

# Worker configuration
max_concurrent: 10
poll_interval_ms: 10000
```

Optional `secrets.yaml` for sensitive configuration.

## Usage

### Development
```bash
npm run dev
```

### Production
```bash
npm start
```

### Docker
```bash
docker build -t ccxt-ingestor .
docker run -v /path/to/config:/config ccxt-ingestor
```

## Ticker Format

Tickers must be in the format: `EXCHANGE:SYMBOL`

Examples:
- `BINANCE:BTC/USDT`
- `COINBASE:ETH/USD`
- `KRAKEN:XRP/EUR`

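A minimal sketch of how such a ticker could be split into its two parts. `parseTicker` is a hypothetical helper, not necessarily the name used in `src/`. It splits on the first colon only, since some CCXT symbols (for example derivatives such as `BTC/USDT:USDT`) contain a colon themselves, and it lowercases the exchange name on the assumption that it will be used as a CCXT exchange id, which is lowercase (`binance`, `kraken`).

```javascript
// Hypothetical helper: split "EXCHANGE:SYMBOL" into a CCXT exchange id and symbol.
function parseTicker(ticker) {
  const sep = ticker.indexOf(':'); // first colon only; the symbol may contain more
  if (sep <= 0 || sep === ticker.length - 1) {
    throw new Error(`Invalid ticker "${ticker}", expected EXCHANGE:SYMBOL`);
  }
  return {
    exchangeId: ticker.slice(0, sep).toLowerCase(),
    symbol: ticker.slice(sep + 1),
  };
}

console.log(parseTicker('BINANCE:BTC/USDT'));
```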
## Protocol

### ZeroMQ Message Format

All messages use a two-frame envelope:
```
Frame 1: [1 byte: protocol version = 0x01]
Frame 2: [1 byte: message type ID][N bytes: protobuf message]
```

### Message Type IDs
- `0x01`: DataRequest
- `0x02`: IngestorControl
- `0x03`: Tick
- `0x04`: OHLC

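The two-frame envelope and the type IDs above can be sketched with plain Node.js `Buffer` operations. This is an illustration under stated assumptions: `encodeEnvelope`, `decodeEnvelope`, and the `MessageType` constant names are hypothetical, and the payload here is placeholder bytes rather than a real protobuf encoding.

```javascript
const PROTOCOL_VERSION = 0x01;
// Type IDs as listed above; the constant names are illustrative.
const MessageType = { DATA_REQUEST: 0x01, INGESTOR_CONTROL: 0x02, TICK: 0x03, OHLC: 0x04 };

// Build the two frames: [version] and [type ID | protobuf bytes].
function encodeEnvelope(typeId, payload) {
  const versionFrame = Buffer.from([PROTOCOL_VERSION]);
  const bodyFrame = Buffer.concat([Buffer.from([typeId]), payload]);
  return [versionFrame, bodyFrame];
}

// Validate the version frame and split the type ID from the payload.
function decodeEnvelope([versionFrame, bodyFrame]) {
  if (versionFrame.length !== 1 || versionFrame[0] !== PROTOCOL_VERSION) {
    throw new Error('Unsupported protocol version');
  }
  if (bodyFrame.length < 1) throw new Error('Missing message type byte');
  return { typeId: bodyFrame[0], payload: bodyFrame.subarray(1) };
}

const frames = encodeEnvelope(MessageType.TICK, Buffer.from('protobuf bytes go here'));
const decoded = decodeEnvelope(frames);
console.log(decoded.typeId, decoded.payload.toString());
```

Keeping the version in its own frame lets a receiver reject an incompatible message before it looks at the body.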
### DataRequest (from Flink)

```protobuf
message DataRequest {
  string request_id = 1;
  RequestType type = 2;  // HISTORICAL_OHLC or REALTIME_TICKS
  string ticker = 3;
  optional HistoricalParams historical = 4;
  optional RealtimeParams realtime = 5;
}
```

### IngestorControl (from Flink)

```protobuf
message IngestorControl {
  ControlAction action = 1;  // CANCEL, SHUTDOWN, CONFIG_UPDATE, HEARTBEAT
  optional string request_id = 2;
  optional IngestorConfig config = 3;
}
```

### Tick (to Kafka)

```protobuf
message Tick {
  string trade_id = 1;
  string ticker = 2;
  uint64 timestamp = 3;      // microseconds
  int64 price = 4;           // fixed-point (10^8)
  int64 amount = 5;          // fixed-point (10^8)
  int64 quote_amount = 6;    // fixed-point (10^8)
  bool taker_buy = 7;
}
```

### OHLC (to Kafka)

```protobuf
message OHLC {
  int64 open = 2;                  // fixed-point (10^8)
  int64 high = 3;
  int64 low = 4;
  int64 close = 5;
  optional int64 volume = 6;
  optional int64 open_time = 9;    // microseconds
  optional int64 close_time = 12;
  string ticker = 14;
}
```

## Fixed-Point Encoding

All prices and amounts are encoded as fixed-point integers using 8 decimal places (denominator = 10^8):
- Example: 123.45678901 → 12345678901
- This preserves eight decimal places of precision and avoids floating-point rounding errors

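As an illustration of the encoding, the sketch below converts decimal strings to scaled integers and back using `BigInt`, so no floating-point arithmetic is involved. `toFixedPoint` and `fromFixedPoint` are hypothetical helper names, and truncating (rather than rounding) digits beyond the eighth decimal place is an assumption made here, not documented behaviour of the ingestor.

```javascript
const SCALE_DIGITS = 8; // denominator = 10^8

// Convert a plain decimal string (or number) to a scaled BigInt.
// Values in exponent notation such as 1e-7 are rejected; pass strings instead.
function toFixedPoint(decimal) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(String(decimal));
  if (!match) throw new Error(`Not a plain decimal: ${decimal}`);
  const [, sign, whole, fraction = ''] = match;
  // Pad or truncate the fractional part to exactly 8 digits.
  const scaled = fraction.padEnd(SCALE_DIGITS, '0').slice(0, SCALE_DIGITS);
  const value = BigInt(whole + scaled);
  return sign === '-' ? -value : value;
}

// Convert a scaled BigInt back to a decimal string with 8 fractional digits.
function fromFixedPoint(value) {
  const negative = value < 0n;
  const digits = (negative ? -value : value).toString().padStart(SCALE_DIGITS + 1, '0');
  const whole = digits.slice(0, -SCALE_DIGITS);
  const fraction = digits.slice(-SCALE_DIGITS);
  return `${negative ? '-' : ''}${whole}.${fraction}`;
}

console.log(toFixedPoint('123.45678901')); // 12345678901n
console.log(fromFixedPoint(12345678901n)); // 123.45678901
```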
## Components

### `src/index.js`
Main worker process that coordinates all components and handles the work loop.

### `src/zmq-client.js`
ZeroMQ client for connecting to Flink's work queue and control channel.

### `src/kafka-producer.js`
Kafka producer for writing protobuf-encoded messages to Kafka topics.

### `src/ccxt-fetcher.js`
CCXT wrapper for fetching historical OHLC and recent trades from exchanges.

### `src/realtime-poller.js`
Manages realtime subscriptions with 10-second polling for trade updates.

### `src/proto/messages.js`
Protobuf message definitions and encoding/decoding utilities.

## Error Handling

- Failed requests automatically return to the Flink work queue
- Realtime subscriptions are cancelled after 5 consecutive errors
- Worker logs all errors with context for debugging
- Graceful shutdown on SIGINT/SIGTERM

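The "5 consecutive errors" rule implies a counter that resets whenever a poll succeeds. A minimal sketch of that bookkeeping follows; `createErrorTracker` and its method names are hypothetical and are not taken from `src/realtime-poller.js`.

```javascript
const MAX_CONSECUTIVE_ERRORS = 5;

// Hypothetical per-subscription tracker for consecutive poll failures.
function createErrorTracker(limit = MAX_CONSECUTIVE_ERRORS) {
  let consecutive = 0;
  return {
    // A successful poll resets the streak.
    recordSuccess() {
      consecutive = 0;
    },
    // Returns true once the subscription should be cancelled.
    recordError() {
      consecutive += 1;
      return consecutive >= limit;
    },
  };
}

const tracker = createErrorTracker();
let cancelled = false;
for (let i = 0; i < 4; i++) cancelled = tracker.recordError();
tracker.recordSuccess(); // four failures, then a good poll: streak is reset
for (let i = 0; i < 5; i++) cancelled = tracker.recordError();
console.log(cancelled); // true
```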
## Monitoring

The worker logs status information every 60 seconds including:
- Number of active requests
- Realtime subscription statistics
- Error counts

## Environment Variables

- `CONFIG_PATH`: Path to config.yaml (default: `/config/config.yaml`)
- `SECRETS_PATH`: Path to secrets.yaml (default: `/config/secrets.yaml`)
- `LOG_LEVEL`: Log level (default: `info`)

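For illustration, the variables and defaults listed above could be resolved as follows. `resolveSettings` and the property names it returns are hypothetical; only the variable names and default values come from this README.

```javascript
// Hypothetical helper: read the documented variables, falling back to the defaults.
function resolveSettings(env = process.env) {
  return {
    configPath: env.CONFIG_PATH || '/config/config.yaml',
    secretsPath: env.SECRETS_PATH || '/config/secrets.yaml',
    logLevel: env.LOG_LEVEL || 'info',
  };
}

// Passing an explicit object instead of process.env makes the defaults easy to check.
console.log(resolveSettings({ LOG_LEVEL: 'debug' }));
```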
## Supported Exchanges

All exchanges supported by CCXT can be used. Popular exchanges include:
- Binance
- Coinbase
- Kraken
- Bitfinex
- Huobi
- And 100+ more

## Development

### Project Structure
```
redesign/ingestor/
├── src/
│   ├── index.js              # Main worker
│   ├── zmq-client.js         # ZMQ client
│   ├── kafka-producer.js     # Kafka producer
│   ├── ccxt-fetcher.js       # CCXT wrapper
│   ├── realtime-poller.js    # Realtime poller
│   └── proto/
│       └── messages.js       # Protobuf definitions
├── config.example.yaml
├── Dockerfile
├── package.json
└── README.md
```

## License

ISC