backend redesign

client-py/README.md (new file, 259 lines)
@@ -0,0 +1,259 @@
# DexOrder Python Client Library

High-level Python API for accessing historical OHLC data from the DexOrder trading platform.

## Features

- **Smart Caching**: Automatically checks the Iceberg warehouse before requesting new data
- **Async Request/Response**: Non-blocking historical data requests via the relay
- **Gap Detection**: Identifies and requests only the missing data ranges
- **Transparent Access**: A single API for both cached and on-demand data

## Installation

```bash
cd redesign/client-py
pip install -e .
```

## Quick Start

```python
import asyncio
from dexorder import OHLCClient


async def main():
    # Initialize client
    client = OHLCClient(
        iceberg_catalog_uri="http://iceberg-catalog:8181",
        relay_endpoint="tcp://relay:5555",
        notification_endpoint="tcp://flink:5557"
    )

    # Start background notification listener
    await client.start()

    try:
        # Fetch OHLC data (automatically checks cache and requests missing data)
        df = await client.fetch_ohlc(
            ticker="BINANCE:BTC/USDT",
            period_seconds=3600,  # 1-hour candles
            start_time=1735689600000000,  # microseconds
            end_time=1736294399000000
        )

        print(f"Fetched {len(df)} candles")
        print(df.head())

    finally:
        await client.stop()


# Run
asyncio.run(main())
```

## Using Context Manager

```python
async def main():
    async with OHLCClient(...) as client:
        df = await client.fetch_ohlc(...)
```

## Architecture

### Components

1. **OHLCClient**: High-level API with smart caching
2. **IcebergClient**: Direct queries to the Iceberg warehouse
3. **HistoryClient**: Submits requests via the relay and waits for notifications

### Data Flow

```
┌─────────┐
│ Client  │
└────┬────┘
     │ 1. fetch_ohlc()
     ▼
┌─────────────────┐
│   OHLCClient    │
└────┬────────────┘
     │ 2. Check Iceberg
     ▼
┌─────────────────┐      ┌──────────┐
│  IcebergClient  │─────▶│ Iceberg  │
└─────────────────┘      └──────────┘
     │ 3. Missing data?
     ▼
┌─────────────────┐      ┌──────────┐
│  HistoryClient  │─────▶│  Relay   │
└────┬────────────┘      └──────────┘
     │                        │
     │ 4. Wait for notification
     │◀───────────────────────┘
     │ 5. Query Iceberg again
     ▼
┌─────────────────┐
│   Return data   │
└─────────────────┘
```
## API Reference

### OHLCClient

#### `__init__(iceberg_catalog_uri, relay_endpoint, notification_endpoint, namespace="trading")`

Initialize the client with connection parameters.

#### `async fetch_ohlc(ticker, period_seconds, start_time, end_time, request_timeout=30.0)`

Fetch OHLC data with smart caching.

**Parameters:**
- `ticker` (str): Market identifier (e.g., "BINANCE:BTC/USDT")
- `period_seconds` (int): OHLC period in seconds (60, 300, 3600, 86400, etc.)
- `start_time` (int): Start timestamp in microseconds
- `end_time` (int): End timestamp in microseconds
- `request_timeout` (float): Timeout for historical requests in seconds

**Returns:** `pd.DataFrame` with columns:
- `ticker`: Market identifier
- `period_seconds`: Period in seconds
- `timestamp`: Candle timestamp (microseconds)
- `open`, `high`, `low`, `close`: Prices (integer format)
- `volume`: Trading volume
- Additional fields: `buy_vol`, `sell_vol`, `open_interest`, etc.

### IcebergClient

Direct access to the Iceberg warehouse.

#### `query_ohlc(ticker, period_seconds, start_time, end_time)`

Query OHLC data directly from Iceberg.

#### `find_missing_ranges(ticker, period_seconds, start_time, end_time)`

Identify missing data ranges. Returns a list of `(start_time, end_time)` tuples.
#### `has_data(ticker, period_seconds, start_time, end_time)`

Check whether any data exists for the given parameters.

### HistoryClient

Low-level client for submitting historical data requests.

**IMPORTANT**: Always call `connect()` before making requests to prevent a race condition.

#### `async connect()`

Connect to the relay and start the notification listener. **MUST be called before making any requests.**

This subscribes to the notification topic `RESPONSE:{client_id}` BEFORE any requests are sent,
preventing the race condition where notifications arrive before the subscription is in place.
#### `async request_historical_ohlc(ticker, period_seconds, start_time, end_time, timeout=30.0, limit=None)`

Submit a historical data request and wait for the completion notification.

**Returns:** dict with keys:
- `request_id`: The request ID
- `status`: 'OK', 'NOT_FOUND', or 'ERROR'
- `error_message`: Error message if status is 'ERROR'
- `iceberg_namespace`, `iceberg_table`, `row_count`: Available when status is 'OK'

**Example:**
```python
from dexorder import HistoryClient

client = HistoryClient(
    relay_endpoint="tcp://relay:5559",
    notification_endpoint="tcp://relay:5558"
)

# CRITICAL: Connect first to prevent race condition
await client.connect()

# Now safe to make requests
result = await client.request_historical_ohlc(
    ticker="BINANCE:BTC/USDT",
    period_seconds=3600,
    start_time=1735689600000000,
    end_time=1736294399000000
)

await client.close()
```

## Configuration

The client requires the following endpoints:

- **Iceberg Catalog URI**: REST API endpoint for Iceberg metadata (default: `http://iceberg-catalog:8181`)
- **Relay Endpoint**: ZMQ REQ/REP endpoint for submitting requests (default: `tcp://relay:5555`)
- **Notification Endpoint**: ZMQ PUB/SUB endpoint for receiving notifications (default: `tcp://flink:5557`)
## Development

### Generate Protobuf Files

```bash
cd redesign/protobuf
protoc -I . --python_out=../client-py/dexorder ingestor.proto ohlc.proto
```

### Run Tests

```bash
pytest tests/
```

## Examples

See `../relay/test/async_client.py` for a complete example.

## Timestamp Format

All timestamps are in **microseconds since epoch**:

```python
from datetime import datetime, timezone

# Convert from datetime
dt = datetime(2024, 1, 1, tzinfo=timezone.utc)
timestamp_micros = int(dt.timestamp() * 1_000_000)

# Convert to datetime
dt = datetime.fromtimestamp(timestamp_micros / 1_000_000, tz=timezone.utc)
```
## Period Seconds

Common period values:
- `60` - 1 minute
- `300` - 5 minutes
- `900` - 15 minutes
- `3600` - 1 hour
- `14400` - 4 hours
- `86400` - 1 day
- `604800` - 1 week
## Error Handling

```python
try:
    df = await client.fetch_ohlc(...)
except TimeoutError:
    print("Request timed out")
except ValueError as e:
    print(f"Request failed: {e}")
except ConnectionError:
    print("Unable to connect to relay")
```
## License

Internal use only.
client-py/__init__.py (new file, 3 lines)
@@ -0,0 +1,3 @@
import logging

log = logging.getLogger(__name__)
client-py/dexorder/__init__.py (new file, 16 lines)
@@ -0,0 +1,16 @@
"""
DexOrder Trading Platform Python Client

Provides high-level APIs for:
- Historical OHLC data retrieval with smart caching
- Async request/response via relay
- Iceberg data warehouse queries
"""

__version__ = "0.1.0"

from .ohlc_client import OHLCClient
from .iceberg_client import IcebergClient
from .history_client import HistoryClient

__all__ = ['OHLCClient', 'IcebergClient', 'HistoryClient']
client-py/dexorder/history_client.py (new file, 296 lines)
@@ -0,0 +1,296 @@
"""
HistoryClient - Submit historical data requests via relay and wait for notifications

RACE CONDITION PREVENTION:
The client must subscribe to notification topics BEFORE submitting requests.
Notification topics are deterministic: RESPONSE:{client_id} or HISTORY_READY:{request_id}
Since both are client-generated, we can subscribe before sending the request.
"""

import asyncio
import os
import struct
import sys
import uuid
from typing import Optional

import zmq
import zmq.asyncio

# Import protobuf messages (assuming they're generated in ../protobuf)
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../protobuf'))
try:
    from ingestor_pb2 import SubmitHistoricalRequest, SubmitResponse, HistoryReadyNotification
except ImportError:
    print("Warning: Protobuf files not found. Run: protoc -I ../protobuf --python_out=../protobuf ../protobuf/*.proto")
    raise


class HistoryClient:
    """
    Client for submitting historical data requests via relay.

    IMPORTANT: Call connect() before making any requests. This ensures the notification
    listener is running and subscribed BEFORE any requests are submitted, preventing
    the race condition where notifications arrive before subscription.

    Provides:
    - Submit historical OHLC data requests
    - Wait for completion notifications
    - Handle request timeouts and errors
    """

    def __init__(self, relay_endpoint: str, notification_endpoint: str, client_id: Optional[str] = None):
        """
        Initialize history client.

        Args:
            relay_endpoint: ZMQ endpoint for relay client requests (e.g., "tcp://relay:5559")
            notification_endpoint: ZMQ endpoint for notifications (e.g., "tcp://relay:5558")
            client_id: Optional client ID for notification routing. If not provided, generates one.
                All notifications for this client will be sent to topic RESPONSE:{client_id}
        """
        self.relay_endpoint = relay_endpoint
        self.notification_endpoint = notification_endpoint
        self.client_id = client_id or f"client-{uuid.uuid4().hex[:8]}"
        self.context = zmq.asyncio.Context()
        self.pending_requests = {}  # request_id -> {'event': asyncio.Event, 'result': dict}
        self.notification_task = None
        self.connected = False

    async def connect(self):
        """
        Connect to relay and start the notification listener.

        CRITICAL: This MUST be called before making any requests to prevent the race
        condition. The notification listener subscribes to the deterministic topic
        RESPONSE:{client_id} BEFORE any requests are sent, ensuring we never miss
        notifications.
        """
        if self.connected:
            return

        # Start notification listener FIRST
        self.notification_task = asyncio.create_task(self._notification_listener())

        # Give the listener a moment to connect and subscribe
        await asyncio.sleep(0.1)

        self.connected = True
    async def request_historical_ohlc(
        self,
        ticker: str,
        period_seconds: int,
        start_time: int,
        end_time: int,
        timeout: float = 30.0,
        limit: Optional[int] = None
    ) -> dict:
        """
        Request historical OHLC data and wait for the completion notification.

        IMPORTANT: Call connect() before using this method.

        Args:
            ticker: Market identifier (e.g., "BINANCE:BTC/USDT")
            period_seconds: OHLC period in seconds
            start_time: Start timestamp in microseconds
            end_time: End timestamp in microseconds
            timeout: Request timeout in seconds (default: 30)
            limit: Optional limit on number of candles

        Returns:
            dict with keys:
            - request_id: The request ID
            - status: 'OK', 'NOT_FOUND', or 'ERROR'
            - error_message: Error message if status is 'ERROR'
            - iceberg_namespace: Iceberg namespace (if status is 'OK')
            - iceberg_table: Iceberg table name (if status is 'OK')
            - row_count: Number of rows written (if status is 'OK')

        Raises:
            TimeoutError: If the request times out
            ConnectionError: If unable to connect to the relay, or connect() was not called
        """
        if not self.connected:
            raise ConnectionError("Client not connected. Call connect() first to prevent race condition.")

        request_id = str(uuid.uuid4())

        # Register the pending request BEFORE sending to eliminate any race condition.
        # The notification topic is deterministic (RESPONSE:{client_id}) and the listener
        # is already subscribed, so we just need pending_requests populated before Flink
        # could possibly publish the notification.
        event = asyncio.Event()
        self.pending_requests[request_id] = {
            'event': event,
            'result': None
        }

        try:
            # Create protobuf request with client_id for notification routing
            request = SubmitHistoricalRequest(
                request_id=request_id,
                ticker=ticker,
                period_seconds=period_seconds,
                start_time=start_time,
                end_time=end_time,
                client_id=self.client_id  # CRITICAL: Enables deterministic notification topic
            )

            if limit is not None:
                request.limit = limit

            # Encode with ZMQ envelope: version (1 byte) + message type (1 byte) + protobuf payload
            MESSAGE_TYPE_SUBMIT_HISTORICAL = 0x10
            version_frame = struct.pack('B', 0x01)
            message_frame = struct.pack('B', MESSAGE_TYPE_SUBMIT_HISTORICAL) + request.SerializeToString()

            # Send request to relay
            socket = self.context.socket(zmq.REQ)
            socket.connect(self.relay_endpoint)

            try:
                # Send two frames: version, then message
                await socket.send(version_frame, zmq.SNDMORE)
                await socket.send(message_frame)

                # Wait for the relay's immediate response
                response_frames = []
                while True:
                    frame = await asyncio.wait_for(socket.recv(), timeout=5.0)
                    response_frames.append(frame)
                    if not socket.get(zmq.RCVMORE):
                        break

                # Parse response (expect 2 frames: version, message)
                if len(response_frames) < 2:
                    raise ConnectionError(f"Expected 2 frames, got {len(response_frames)}")

                response_payload = response_frames[1][1:]  # skip the message-type byte

                response = SubmitResponse()
                response.ParseFromString(response_payload)

                if response.status != 0:
                    raise ConnectionError(f"Request failed: {response.error_message}")

            finally:
                socket.close()

            # Wait for the Flink notification with timeout
            try:
                await asyncio.wait_for(event.wait(), timeout=timeout)
                return self.pending_requests[request_id]['result']
            except asyncio.TimeoutError:
                raise TimeoutError(f"Request {request_id} timed out after {timeout}s")

        finally:
            self.pending_requests.pop(request_id, None)
    async def _notification_listener(self):
        """
        Internal notification listener that subscribes to the RESPONSE:{client_id} topic.

        CRITICAL: This runs BEFORE any requests are submitted to prevent the race
        condition. The notification topic is deterministic based on our client_id.
        """
        socket = self.context.socket(zmq.SUB)
        socket.connect(self.notification_endpoint)

        # Subscribe to our client-specific topic.
        # CRITICAL: This topic is deterministic (RESPONSE:{client_id}) and we know it
        # before sending any requests, so we can subscribe first to prevent the race
        # condition.
        notification_topic = f"RESPONSE:{self.client_id}"
        socket.setsockopt_string(zmq.SUBSCRIBE, notification_topic)

        try:
            while True:
                # Receive multi-frame message: [topic][version][message]
                frames = []
                while True:
                    frame = await socket.recv()
                    frames.append(frame)
                    if not socket.get(zmq.RCVMORE):
                        break

                # Parse frames
                if len(frames) < 3:
                    continue

                version_frame = frames[1]
                message_frame = frames[2]

                # Validate version
                if len(version_frame) != 1 or version_frame[0] != 0x01:
                    continue

                # Validate message type
                if len(message_frame) < 1:
                    continue

                msg_type = message_frame[0]
                payload = message_frame[1:]

                MESSAGE_TYPE_HISTORY_READY = 0x12
                if msg_type != MESSAGE_TYPE_HISTORY_READY:
                    continue

                # Parse notification (protobuf)
                try:
                    notification = HistoryReadyNotification()
                    notification.ParseFromString(payload)
                except Exception as e:
                    print(f"Warning: failed to parse notification payload: {e}")
                    continue

                request_id = notification.request_id

                # Check if we're waiting for this request
                if request_id in self.pending_requests:
                    # Map protobuf enum to string status
                    # NotificationStatus: OK=0, NOT_FOUND=1, ERROR=2, TIMEOUT=3
                    status_map = {0: 'OK', 1: 'NOT_FOUND', 2: 'ERROR', 3: 'TIMEOUT'}
                    status = status_map.get(notification.status, 'ERROR')

                    result = {
                        'request_id': request_id,
                        'status': status,
                        'error_message': notification.error_message if notification.error_message else None
                    }

                    # Add Iceberg details if available
                    if status == 'OK':
                        result.update({
                            'iceberg_namespace': notification.iceberg_namespace,
                            'iceberg_table': notification.iceberg_table,
                            'row_count': notification.row_count,
                            'ticker': notification.ticker,
                            'period_seconds': notification.period_seconds,
                            'start_time': notification.start_time,
                            'end_time': notification.end_time,
                        })

                    self.pending_requests[request_id]['result'] = result
                    self.pending_requests[request_id]['event'].set()

        except asyncio.CancelledError:
            pass
        finally:
            socket.close()

    async def close(self):
        """
        Close the client and clean up resources.
        """
        if self.notification_task:
            self.notification_task.cancel()
            try:
                await self.notification_task
            except asyncio.CancelledError:
                pass

        self.context.term()
        self.connected = False
client-py/dexorder/iceberg_client.py (new file, 179 lines)
@@ -0,0 +1,179 @@
"""
IcebergClient - Query OHLC data from Iceberg warehouse (Iceberg 1.10.1)
"""

from typing import List, Optional, Tuple

import pandas as pd
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import (
    And,
    EqualTo,
    GreaterThanOrEqual,
    LessThanOrEqual
)


class IcebergClient:
    """
    Client for querying OHLC data from Iceberg warehouse (Iceberg 1.10.1).

    Note: Iceberg 1.x does not enforce primary keys at the table level.
    Deduplication is handled by:
    - Flink upsert mode with equality delete files
    - PyIceberg automatically filters deleted rows during queries
    - Last-write-wins semantics for duplicates

    Provides:
    - Query OHLC data by ticker, period, and time range
    - Identify missing data gaps
    - Efficient partition pruning for large datasets
    """

    def __init__(
        self,
        catalog_uri: str,
        namespace: str = "trading",
        s3_endpoint: Optional[str] = None,
        s3_access_key: Optional[str] = None,
        s3_secret_key: Optional[str] = None,
    ):
        """
        Initialize Iceberg client.

        Args:
            catalog_uri: URI of the Iceberg catalog (e.g., "http://iceberg-catalog:8181")
            namespace: Iceberg namespace (default: "trading")
            s3_endpoint: S3/MinIO endpoint URL (e.g., "http://localhost:9000")
            s3_access_key: S3/MinIO access key
            s3_secret_key: S3/MinIO secret key
        """
        self.catalog_uri = catalog_uri
        self.namespace = namespace

        catalog_props = {"uri": catalog_uri}
        if s3_endpoint:
            catalog_props["s3.endpoint"] = s3_endpoint
            catalog_props["s3.path-style-access"] = "true"
        if s3_access_key:
            catalog_props["s3.access-key-id"] = s3_access_key
        if s3_secret_key:
            catalog_props["s3.secret-access-key"] = s3_secret_key

        self.catalog = load_catalog("trading", **catalog_props)
        self.table = self.catalog.load_table(f"{namespace}.ohlc")

    def query_ohlc(
        self,
        ticker: str,
        period_seconds: int,
        start_time: int,
        end_time: int
    ) -> pd.DataFrame:
        """
        Query OHLC data for a specific ticker, period, and time range.

        Args:
            ticker: Market identifier (e.g., "BINANCE:BTC/USDT")
            period_seconds: OHLC period in seconds (60, 300, 3600, etc.)
            start_time: Start timestamp in microseconds
            end_time: End timestamp in microseconds

        Returns:
            DataFrame with OHLC data sorted by timestamp
        """
        # Reload table metadata to pick up snapshots committed after this client was initialized
        self.table = self.catalog.load_table(f"{self.namespace}.ohlc")

        df = self.table.scan(
            row_filter=And(
                EqualTo("ticker", ticker),
                EqualTo("period_seconds", period_seconds),
                GreaterThanOrEqual("timestamp", start_time),
                LessThanOrEqual("timestamp", end_time)
            )
        ).to_pandas()

        if not df.empty:
            df = df.sort_values("timestamp")

        return df
    def find_missing_ranges(
        self,
        ticker: str,
        period_seconds: int,
        start_time: int,
        end_time: int
    ) -> List[Tuple[int, int]]:
        """
        Identify missing data ranges in the requested time period.

        Returns a list of (start, end) tuples for missing ranges.
        Expected candles are calculated from period_seconds, assuming start_time
        is aligned to a candle boundary.

        Args:
            ticker: Market identifier
            period_seconds: OHLC period in seconds
            start_time: Start timestamp in microseconds
            end_time: End timestamp in microseconds

        Returns:
            List of (start_time, end_time) tuples for missing ranges
        """
        df = self.query_ohlc(ticker, period_seconds, start_time, end_time)

        if df.empty:
            # All data is missing
            return [(start_time, end_time)]

        # Convert period to microseconds
        period_micros = period_seconds * 1_000_000

        # Generate expected timestamps
        expected_timestamps = list(range(start_time, end_time + 1, period_micros))
        actual_timestamps = set(df['timestamp'].values)

        # Find gaps
        missing = sorted(set(expected_timestamps) - actual_timestamps)

        if not missing:
            return []

        # Consolidate consecutive missing timestamps into ranges
        ranges = []
        range_start = missing[0]
        prev_ts = missing[0]

        for ts in missing[1:]:
            if ts > prev_ts + period_micros:
                # Gap in missing data - close previous range
                ranges.append((range_start, prev_ts))
                range_start = ts
            prev_ts = ts

        # Close final range
        ranges.append((range_start, prev_ts))

        return ranges

    def has_data(
        self,
        ticker: str,
        period_seconds: int,
        start_time: int,
        end_time: int
    ) -> bool:
        """
        Check if any data exists for the given parameters.

        Args:
            ticker: Market identifier
            period_seconds: OHLC period in seconds
            start_time: Start timestamp in microseconds
            end_time: End timestamp in microseconds

        Returns:
            True if at least one candle exists, False otherwise
        """
        df = self.query_ohlc(ticker, period_seconds, start_time, end_time)
        return not df.empty
client-py/dexorder/ohlc_client.py (new file, 142 lines)
@@ -0,0 +1,142 @@
"""
OHLCClient - High-level API for fetching OHLC data with smart caching
"""

from typing import Optional

import pandas as pd

from .history_client import HistoryClient
from .iceberg_client import IcebergClient


class OHLCClient:
    """
    High-level client for fetching OHLC data.

    Workflow:
    1. Check Iceberg for existing data
    2. Identify missing ranges
    3. Request missing data via relay
    4. Wait for notification
    5. Query Iceberg for complete dataset
    6. Return combined results

    This provides transparent caching - clients don't need to know
    whether data came from cache or was fetched on-demand.
    """

    def __init__(
        self,
        iceberg_catalog_uri: str,
        relay_endpoint: str,
        notification_endpoint: str,
        namespace: str = "trading",
        s3_endpoint: Optional[str] = None,
        s3_access_key: Optional[str] = None,
        s3_secret_key: Optional[str] = None,
    ):
        """
        Initialize OHLC client.

        Args:
            iceberg_catalog_uri: URI of Iceberg catalog
            relay_endpoint: ZMQ endpoint for relay requests
            notification_endpoint: ZMQ endpoint for notifications
            namespace: Iceberg namespace (default: "trading")
            s3_endpoint: S3/MinIO endpoint URL (e.g., "http://localhost:9000")
            s3_access_key: S3/MinIO access key
            s3_secret_key: S3/MinIO secret key
        """
        self.iceberg = IcebergClient(
            iceberg_catalog_uri, namespace,
            s3_endpoint=s3_endpoint,
            s3_access_key=s3_access_key,
            s3_secret_key=s3_secret_key,
        )
        self.history = HistoryClient(relay_endpoint, notification_endpoint)

    async def start(self):
        """
        Start the client. Must be called before making requests.
        Starts the background notification listener.
        """
        await self.history.connect()

    async def stop(self):
        """
        Stop the client and clean up resources.
        """
        await self.history.close()

    async def fetch_ohlc(
        self,
        ticker: str,
        period_seconds: int,
        start_time: int,
        end_time: int,
        request_timeout: float = 30.0
    ) -> pd.DataFrame:
        """
        Fetch OHLC data with smart caching.

        Steps:
        1. Query Iceberg for existing data
        2. If complete, return immediately
        3. If data is missing, request it via relay
        4. Wait for completion notification
        5. Query Iceberg again for the complete dataset
        6. Return results

        Args:
            ticker: Market identifier (e.g., "BINANCE:BTC/USDT")
            period_seconds: OHLC period in seconds (60, 300, 3600, etc.)
            start_time: Start timestamp in microseconds
            end_time: End timestamp in microseconds
            request_timeout: Timeout for historical data requests (default: 30s)

        Returns:
            DataFrame with OHLC data sorted by timestamp

        Raises:
            TimeoutError: If the historical data request times out
            ValueError: If the request fails
        """
        # Step 1: Check Iceberg for existing data
        df = self.iceberg.query_ohlc(ticker, period_seconds, start_time, end_time)

        # Step 2: Identify missing ranges
        missing_ranges = self.iceberg.find_missing_ranges(
            ticker, period_seconds, start_time, end_time
        )

        if not missing_ranges:
            # All data exists in Iceberg
            return df

        # Step 3: Request missing data
        # For simplicity, request the entire range (the relay can merge adjacent requests)
        result = await self.history.request_historical_ohlc(
            ticker=ticker,
            period_seconds=period_seconds,
            start_time=start_time,
            end_time=end_time,
            timeout=request_timeout
        )

        # Step 4: Check result status
        if result['status'] == 'ERROR':
            raise ValueError(f"Historical data request failed: {result['error_message']}")

        # Step 5: Query Iceberg again for the complete dataset
        df = self.iceberg.query_ohlc(ticker, period_seconds, start_time, end_time)

        return df

    async def __aenter__(self):
        """Support async context manager."""
        await self.start()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Support async context manager."""
        await self.stop()
client-py/setup.py (new file, 23 lines)
@@ -0,0 +1,23 @@
from setuptools import setup, find_packages

setup(
    name="dexorder-client",
    version="0.1.0",
    description="DexOrder Trading Platform Python Client",
    packages=find_packages(),
    python_requires=">=3.9",
    install_requires=[
        "pyiceberg>=0.6.0",
        "pyarrow>=14.0.0",
        "pandas>=2.0.0",
        "pyzmq>=25.0",  # the `zmq` import is provided by the pyzmq distribution
        "protobuf>=4.25.0",
        "pyyaml>=6.0",
    ],
    extras_require={
        "dev": [
            "pytest>=7.0.0",
            "pytest-asyncio>=0.21.0",
        ]
    },
)