2026-03-11 18:47:11 -04:00


CCXT Market Data Ingestor

A Node.js market data ingestor that uses CCXT to fetch historical OHLC data and realtime tick data from cryptocurrency exchanges. It receives work and control messages from Apache Flink over ZeroMQ and writes the resulting data to Kafka.

Architecture

The ingestor is a worker process that:

  1. Connects to Flink's ZMQ work queue (PULL socket) to receive data requests
  2. Connects to Flink's ZMQ control channel (SUB socket) to receive control messages
  3. Fetches market data from exchanges using CCXT
  4. Writes the data to Kafka as protobuf-encoded messages
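The work-handling part of these steps can be sketched as a dispatcher that routes each decoded DataRequest to a handler. This is a minimal illustration, not the code in src/index.js. `createDispatcher`, `fetchHistorical` and `startRealtime` are hypothetical names, and the sketch assumes `type` arrives as the enum name string rather than a number.

```javascript
// Hypothetical dispatcher: routes a decoded DataRequest to a handler by type.
// Assumes `request.type` holds the enum name ("HISTORICAL_OHLC" / "REALTIME_TICKS").
function createDispatcher({ fetchHistorical, startRealtime }) {
  return async function dispatch(request) {
    switch (request.type) {
      case "HISTORICAL_OHLC":
        // One-shot job: fetch the range, write OHLC messages, then finish.
        return fetchHistorical(request);
      case "REALTIME_TICKS":
        // Long-lived job: start a subscription that runs until cancelled.
        return startRealtime(request);
      default:
        throw new Error(`Unknown request type: ${request.type}`);
    }
  };
}

// Usage with stub handlers:
const dispatch = createDispatcher({
  fetchHistorical: async (req) => `historical:${req.ticker}`,
  startRealtime: async (req) => `realtime:${req.ticker}`,
});
dispatch({ request_id: "r1", type: "HISTORICAL_OHLC", ticker: "BINANCE:BTC/USDT" }).then(console.log);
```

With this structure, the work loop only decodes messages and awaits `dispatch`. Each request type keeps its own lifecycle inside its handler.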

Data Request Types

Historical OHLC

  • Fetches historical candlestick data for a specified time range
  • Uses CCXT's fetchOHLCV method
  • Writes OHLC messages to Kafka
  • The request is marked complete and removed from the queue after processing
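A historical range usually spans more candles than one fetchOHLCV call returns, so the fetch is typically paged. The sketch below assumes a CCXT exchange instance, though any object with `fetchOHLCV(symbol, timeframe, since, limit)` works. The function name, the options object and the default `limit` of 500 are illustrative and do not reflect the ingestor's actual HistoricalParams.

```javascript
// Page through [startMs, endMs) with fetchOHLCV. Each CCXT candle is
// [timestampMs, open, high, low, close, volume].
async function fetchOhlcvRange(exchange, { symbol, timeframe, timeframeMs, startMs, endMs, limit = 500 }) {
  const candles = [];
  let since = startMs;
  while (since < endMs) {
    const batch = await exchange.fetchOHLCV(symbol, timeframe, since, limit);
    if (batch.length === 0) break; // exchange has no more data for this range
    for (const candle of batch) {
      if (candle[0] < since) continue; // skip anything before the requested start
      if (candle[0] >= endMs) break; // past the end of the range
      candles.push(candle);
    }
    // Resume one timeframe after the last candle returned to avoid duplicates.
    const next = batch[batch.length - 1][0] + timeframeMs;
    if (next <= since) break; // guard against an exchange that ignores `since`
    since = next;
  }
  return candles;
}
```

With the ccxt package, a call might look like `fetchOhlcvRange(new ccxt.binance(), { symbol: "BTC/USDT", timeframe: "1m", timeframeMs: 60000, startMs, endMs })`.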

Realtime Ticks

  • Subscribes to realtime trade data
  • Polls CCXT's fetchTrades every 10 seconds (poll_interval_ms) for recent trades
  • Writes Tick messages to the market-0 Kafka topic
  • The subscription persists until a Flink control message cancels it
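The polling loop can be sketched as follows, assuming a CCXT exchange instance with `fetchTrades(symbol, since)` that returns trades oldest first. `TradePoller`, `onTrades` and `onCancel` are hypothetical names. The default interval matches poll_interval_ms, and the consecutive-error limit mirrors the Error Handling section.

```javascript
// Hypothetical realtime poller: calls fetchTrades on a fixed interval and
// forwards only trades it has not forwarded before.
class TradePoller {
  constructor(exchange, symbol, { intervalMs = 10000, maxErrors = 5, onTrades, onCancel }) {
    Object.assign(this, { exchange, symbol, intervalMs, maxErrors, onTrades, onCancel });
    this.since = undefined; // timestamp (ms) of the newest trade seen so far
    this.lastIds = new Set(); // ids from the previous batch, to drop overlaps
    this.errors = 0; // consecutive failures
    this.running = false;
    this.timer = null;
  }

  async pollOnce() {
    try {
      const trades = await this.exchange.fetchTrades(this.symbol, this.since);
      const fresh = trades.filter((t) => !this.lastIds.has(t.id));
      // Forward first, so a failed write is retried on the next poll.
      if (fresh.length > 0) await this.onTrades(fresh);
      if (trades.length > 0) {
        this.since = trades[trades.length - 1].timestamp;
        this.lastIds = new Set(trades.map((t) => t.id));
      }
      this.errors = 0;
    } catch (err) {
      this.errors += 1;
      if (this.errors >= this.maxErrors) {
        this.stop();
        this.onCancel(err);
      }
    }
  }

  start() {
    this.running = true;
    const tick = async () => {
      await this.pollOnce();
      // Schedule the next poll only after this one finishes, so polls never overlap.
      if (this.running) this.timer = setTimeout(tick, this.intervalMs);
    };
    this.timer = setTimeout(tick, 0);
  }

  stop() {
    this.running = false;
    clearTimeout(this.timer);
  }
}
```

The sketch uses setTimeout after each poll instead of setInterval. This means a slow exchange response delays the next poll rather than stacking up overlapping requests.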

Installation

npm install

Configuration

Create config.yaml based on config.example.yaml:

# Flink ZMQ endpoints
flink_hostname: localhost
ingestor_work_port: 5555
ingestor_control_port: 5556

# Kafka configuration
kafka_brokers:
  - localhost:9092
kafka_topic: market-0

# Worker configuration
max_concurrent: 10
poll_interval_ms: 10000

An optional secrets.yaml holds sensitive configuration.

Usage

Development

npm run dev

Production

npm start

Docker

docker build -t ccxt-ingestor .
docker run -v /path/to/config:/config ccxt-ingestor

Ticker Format

Tickers must use the format EXCHANGE:SYMBOL, where SYMBOL is a CCXT unified symbol such as BTC/USDT.

Examples:

  • BINANCE:BTC/USDT
  • COINBASE:ETH/USD
  • KRAKEN:XRP/EUR
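A ticker can be split into a CCXT exchange id and a unified symbol roughly as shown below. `parseTicker` is a hypothetical helper. Mapping the prefix to an exchange id by lowercasing it is an assumption: it holds for the examples above (binance, coinbase, kraken), but other exchanges may need an explicit lookup table.

```javascript
// Hypothetical helper: "BINANCE:BTC/USDT" -> { exchangeId: "binance", symbol: "BTC/USDT" }.
function parseTicker(ticker) {
  // Split at the first colon only: CCXT symbols for derivatives can contain one
  // themselves (e.g. "BTC/USDT:USDT").
  const idx = ticker.indexOf(":");
  if (idx <= 0) throw new Error(`Invalid ticker "${ticker}": expected EXCHANGE:SYMBOL`);
  const exchangeId = ticker.slice(0, idx).toLowerCase();
  const symbol = ticker.slice(idx + 1);
  if (!symbol.includes("/")) throw new Error(`Invalid symbol "${symbol}": expected BASE/QUOTE`);
  return { exchangeId, symbol };
}
```

With the ccxt package, the result could then be used as `new ccxt[exchangeId]()`.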

Protocol

ZeroMQ Message Format

All messages use a two-frame envelope:

Frame 1: [1 byte: protocol version = 0x01]
Frame 2: [1 byte: message type ID][N bytes: protobuf message]
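Building and checking this envelope needs only Node's Buffer. The helpers below are a sketch, not the contents of src/zmq-client.js. The function names are hypothetical, the payload is assumed to be already protobuf-encoded, and the type IDs come from the Message Type IDs list.

```javascript
const PROTOCOL_VERSION = 0x01;
// Type IDs as listed under "Message Type IDs".
const MESSAGE_TYPES = { DataRequest: 0x01, IngestorControl: 0x02, Tick: 0x03, OHLC: 0x04 };

// Build the two frames: [version] and [type ID + protobuf bytes].
function encodeEnvelope(typeId, payload) {
  return [Buffer.from([PROTOCOL_VERSION]), Buffer.concat([Buffer.from([typeId]), payload])];
}

// Validate the version frame and split the second frame into type ID and payload.
function decodeEnvelope(frames) {
  if (frames.length !== 2) throw new Error(`Expected 2 frames, got ${frames.length}`);
  const [versionFrame, body] = frames;
  if (versionFrame.length !== 1 || versionFrame[0] !== PROTOCOL_VERSION) {
    throw new Error("Unsupported protocol version");
  }
  if (body.length < 1) throw new Error("Missing message type ID");
  return { typeId: body[0], payload: body.subarray(1) };
}
```

The returned frame array corresponds to one ZeroMQ multipart message, so it can be passed to the socket's multipart send as a unit.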

Message Type IDs

  • 0x01: DataRequest
  • 0x02: IngestorControl
  • 0x03: Tick
  • 0x04: OHLC

DataRequest (from Flink)

message DataRequest {
  string request_id = 1;
  RequestType type = 2;  // HISTORICAL_OHLC or REALTIME_TICKS
  string ticker = 3;
  optional HistoricalParams historical = 4;
  optional RealtimeParams realtime = 5;
}

IngestorControl (from Flink)

message IngestorControl {
  ControlAction action = 1;  // CANCEL, SHUTDOWN, CONFIG_UPDATE, HEARTBEAT
  optional string request_id = 2;
  optional IngestorConfig config = 3;
}
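Control messages can be handled with a small switch over the action. In this sketch, `handleControl`, the `subscriptions` map (request_id to an object with `stop()`), `shutdown` and `applyConfig` are hypothetical. Actions are compared as enum name strings, which assumes the decoder is configured to produce names. protobufjs, for example, yields numeric enum values unless converted with `toObject(message, { enums: String })`.

```javascript
// Hypothetical control handler; returns true if the message was acted on.
async function handleControl(msg, { subscriptions, shutdown, applyConfig }) {
  switch (msg.action) {
    case "CANCEL": {
      // Stop the realtime subscription for this request, if it exists.
      const sub = subscriptions.get(msg.request_id);
      if (!sub) return false;
      sub.stop();
      subscriptions.delete(msg.request_id);
      return true;
    }
    case "SHUTDOWN":
      await shutdown();
      return true;
    case "CONFIG_UPDATE":
      if (msg.config) applyConfig(msg.config);
      return true;
    case "HEARTBEAT":
      return true; // no action needed in this sketch
    default:
      return false;
  }
}
```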

Tick (to Kafka)

message Tick {
  string trade_id = 1;
  string ticker = 2;
  uint64 timestamp = 3;      // microseconds
  int64 price = 4;           // fixed-point (10^8)
  int64 amount = 5;          // fixed-point (10^8)
  int64 quote_amount = 6;    // fixed-point (10^8)
  bool taker_buy = 7;
}
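A CCXT trade can be mapped onto the Tick fields roughly as follows. `tradeToTick` and `toFixed8` are hypothetical helpers, and the int64/uint64 fields are carried as BigInt. `taker_buy` is derived from CCXT's `side`, which assumes it reports the taker's side. That is common for public trades but worth verifying per exchange.

```javascript
// Convert a JS number to an 8-decimal fixed-point BigInt. toFixed(8) rounds the
// number's binary value to 8 places, which is adequate for typical CCXT prices.
function toFixed8(x) {
  return BigInt(x.toFixed(8).replace(".", ""));
}

// Map a CCXT trade object onto the Tick message fields.
function tradeToTick(trade, ticker) {
  // Fall back to price * amount when the exchange does not report cost.
  const cost = trade.cost ?? trade.price * trade.amount;
  return {
    trade_id: String(trade.id),
    ticker,
    timestamp: BigInt(trade.timestamp) * 1000n, // CCXT milliseconds -> microseconds
    price: toFixed8(trade.price),
    amount: toFixed8(trade.amount),
    quote_amount: toFixed8(cost),
    taker_buy: trade.side === "buy", // assumes `side` is the taker's side
  };
}
```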

OHLC (to Kafka)

message OHLC {
  int64 open = 2;            // fixed-point (10^8)
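  int64 high = 3;            // fixed-point (10^8)
  int64 low = 4;             // fixed-point (10^8)
  int64 close = 5;           // fixed-point (10^8)
  optional int64 volume = 6; // fixed-point (10^8)
  optional int64 open_time = 9;   // microseconds
  optional int64 close_time = 12; // microseconds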
  string ticker = 14;
}

Fixed-Point Encoding

All prices and amounts are encoded as fixed-point integers using 8 decimal places (denominator = 10^8):

  • Example: 123.45678901 → 12345678901
  • Integer values are exact, which avoids the rounding errors of binary floating point
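Encoding from a decimal string avoids float rounding entirely, and decoding back to a string preserves full precision. BigInt is needed because a JavaScript Number holds integers exactly only up to 2^53. At a 10^8 scale, that limit is roughly 90 million units, while an int64 reaches about 92.2 billion units. The helper names are hypothetical. Rejecting inputs with more than 8 fractional digits, instead of rounding them, is a design choice made for this sketch.

```javascript
const SCALE = 10n ** 8n;
const INT64_MAX = 2n ** 63n - 1n;

// Encode a decimal string such as "123.45678901" exactly, without a float step.
function encodeFixed8(decimal) {
  const m = /^(-)?(\d+)(?:\.(\d{1,8}))?$/.exec(decimal);
  if (!m) throw new Error(`Not a decimal with at most 8 fractional digits: ${decimal}`);
  const [, sign, whole, frac = ""] = m;
  let value = BigInt(whole) * SCALE + BigInt(frac.padEnd(8, "0"));
  if (sign) value = -value;
  if (value > INT64_MAX || value < -INT64_MAX - 1n) throw new RangeError(`Out of int64 range: ${decimal}`);
  return value;
}

// Decode a fixed-point BigInt back to a decimal string with 8 fractional digits.
function decodeFixed8(value) {
  const negative = value < 0n;
  const abs = negative ? -value : value;
  const whole = abs / SCALE;
  const frac = (abs % SCALE).toString().padStart(8, "0");
  return `${negative ? "-" : ""}${whole}.${frac}`;
}
```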

Components

src/index.js

Main worker process that coordinates all components and handles the work loop.

src/zmq-client.js

ZeroMQ client for connecting to Flink's work queue and control channel.

src/kafka-producer.js

Kafka producer for writing protobuf-encoded messages to Kafka topics.

src/ccxt-fetcher.js

CCXT wrapper for fetching historical OHLC and recent trades from exchanges.

src/realtime-poller.js

Manages realtime subscriptions with 10-second polling for trade updates.

src/proto/messages.js

Protobuf message definitions and encoding/decoding utilities.

Error Handling

  • Failed requests automatically return to the Flink work queue
  • Realtime subscriptions are cancelled after 5 consecutive errors
  • Worker logs all errors with context for debugging
  • Graceful shutdown on SIGINT/SIGTERM
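Graceful shutdown on SIGINT/SIGTERM can be sketched as an idempotent shutdown function wired to both signals. `installShutdownHandlers` and `cleanup` are hypothetical. In the worker, cleanup would plausibly stop realtime pollers, close the ZMQ sockets and flush the Kafka producer, but that is an assumption.

```javascript
// Register signal handlers that run `cleanup` once, no matter how often shutdown is requested.
function installShutdownHandlers(cleanup, { signals = ["SIGINT", "SIGTERM"], log = console.error } = {}) {
  let inFlight = null;
  const shutdown = (reason) => {
    if (!inFlight) {
      log(`Shutting down (${reason})`);
      inFlight = Promise.resolve()
        .then(cleanup)
        .catch((err) => {
          log(`Cleanup failed: ${err.message}`);
          process.exitCode = 1;
        });
    }
    return inFlight; // later callers wait on the same cleanup
  };
  for (const signal of signals) process.once(signal, () => shutdown(signal));
  return shutdown;
}
```

Because the returned function is idempotent, a SHUTDOWN control message and an OS signal can both call it safely. With `process.once`, a second signal is no longer caught by this handler, so Node's default handling should end the process immediately, which is a common way to force an exit.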

Monitoring

The worker logs status information every 60 seconds, including:

  • Number of active requests
  • Realtime subscription statistics
  • Error counts
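A periodic status log like this can be sketched with setInterval. `getStats`, the stats field names and the log line format are assumptions for illustration. Calling `unref()` keeps the timer from holding the process open during shutdown.

```javascript
// Format the counters listed above into one log line (format is hypothetical).
function formatStatus({ activeRequests, realtimeSubscriptions, errorCount }) {
  return `status active_requests=${activeRequests} realtime_subscriptions=${realtimeSubscriptions} errors=${errorCount}`;
}

// Log a status line every intervalMs; returns a function that stops the logger.
function startStatusLogger(getStats, { intervalMs = 60000, log = console.log } = {}) {
  const timer = setInterval(() => log(formatStatus(getStats())), intervalMs);
  timer.unref(); // don't keep the process alive just for status logging
  return () => clearInterval(timer);
}
```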

Environment Variables

  • CONFIG_PATH: Path to config.yaml (default: /config/config.yaml)
  • SECRETS_PATH: Path to secrets.yaml (default: /config/secrets.yaml)
  • LOG_LEVEL: Log level (default: info)
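Resolving these variables with their documented defaults takes only a few lines. `resolveEnv` and `readOptionalFile` are hypothetical helpers. The file contents would still need a YAML parser, which is outside this sketch. Treating a missing secrets file as "no secrets" follows from secrets.yaml being optional.

```javascript
const fs = require("node:fs");

// Resolve the documented environment variables, applying their defaults.
function resolveEnv(env = process.env) {
  return {
    configPath: env.CONFIG_PATH || "/config/config.yaml",
    secretsPath: env.SECRETS_PATH || "/config/secrets.yaml",
    logLevel: env.LOG_LEVEL || "info",
  };
}

// Read a file that may legitimately be absent (such as secrets.yaml);
// returns null instead of throwing when it does not exist.
function readOptionalFile(path) {
  try {
    return fs.readFileSync(path, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}
```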

Supported Exchanges

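Any exchange supported by CCXT can be used, provided it implements the method a request needs (fetchOHLCV for historical OHLC, fetchTrades for realtime ticks). Popular exchanges include: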

  • Binance
  • Coinbase
  • Kraken
  • Bitfinex
  • Huobi
  • And 100+ more
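Before accepting a request, a worker could check that the target exchange actually offers the needed method. CCXT exchange instances expose a `has` map whose entries are true, false, "emulated" or undefined. `supportsRequest` and the request-to-method table are hypothetical, and treating "emulated" as supported is a choice made in this sketch.

```javascript
// Which CCXT method each request type relies on.
const REQUIRED_METHOD = { HISTORICAL_OHLC: "fetchOHLCV", REALTIME_TICKS: "fetchTrades" };

// Return true if the exchange reports native or emulated support for the method.
function supportsRequest(exchange, requestType) {
  const method = REQUIRED_METHOD[requestType];
  if (!method) throw new Error(`Unknown request type: ${requestType}`);
  const flag = exchange.has[method];
  return flag === true || flag === "emulated";
}
```

With the ccxt package installed, this could be called as `supportsRequest(new ccxt.kraken(), "HISTORICAL_OHLC")`.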

Development

Project Structure

redesign/ingestor/
├── src/
│   ├── index.js              # Main worker
│   ├── zmq-client.js         # ZMQ client
│   ├── kafka-producer.js     # Kafka producer
│   ├── ccxt-fetcher.js       # CCXT wrapper
│   ├── realtime-poller.js    # Realtime poller
│   └── proto/
│       └── messages.js       # Protobuf definitions
├── config.example.yaml
├── Dockerfile
├── package.json
└── README.md

License

ISC