description: An artificial neural network (ANN) strategy that forecasts short-term BTC price movements using technical indicators (EMA, EMSD, RSI) as inputs and quantile-based classification as output.
tags: crypto, machine-learning, ann, bitcoin, technical-analysis

Artificial Neural Network (ANN) Strategy

Section: 18.2 | Asset Class: Cryptocurrencies | Type: Machine learning / Price prediction

Overview

This strategy uses an ANN to forecast short-term movements of BTC price based on input technical indicators. Unlike equities, cryptocurrencies have no evident "fundamentals" on which to build value-based strategies, so cryptocurrency trading strategies tend to rely on trend data mining via machine learning techniques. The ANN classifies the future normalized return into quantile buckets and generates buy/sell signals accordingly.

Construction / Mechanics

Price and Return Normalization

Let P(t) be the BTC price at time t, where t = 1, 2, ... is measured in some units (e.g., 15-minute intervals; t = 1 is the most recent time).

Return:

R(t) = P(t)/P(t+1) - 1                                          (521)

Serial mean return over T₁ periods:

R_bar(t, T₁) = (1/T₁) * sum_{t'=t+1}^{t+T₁} R(t')             (523)

Serially demeaned return:

R_tilde(t, T₁) = R(t) - R_bar(t, T₁)                           (522)

Variance:

[sigma(t, T₁)]² = (1/(T₁-1)) * sum_{t'=t+1}^{t+T₁} [R_tilde(t', T₁)]²   (525)

Normalized (serially demeaned) return:

R_hat(t, T₁) = R_tilde(t, T₁) / sigma(t, T₁)                  (524)

For notational simplicity the T₁ parameter is omitted below and R_hat(t) denotes the normalized return. T₁ should be chosen long enough to provide a reasonable volatility estimate.
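The normalization above can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation: the array is stored oldest-first (the text indexes most-recent-first), the function name `normalized_returns` is hypothetical, and for simplicity both the mean and the standard deviation for each t are computed from the single window of T₁ returns preceding it (the text's Eq. (525) demeans each return in the window against its own trailing mean):

```python
import numpy as np

def normalized_returns(prices, T1):
    """Compute normalized returns R_hat per Eqs. (521)-(525).

    `prices` is oldest-first, so R[t] = prices[t+1]/prices[t] - 1 is the
    same quantity as Eq. (521) written in the text's most-recent-first
    convention.  For each t, the mean and standard deviation are
    estimated from the T1 returns preceding index t.
    """
    prices = np.asarray(prices, dtype=float)
    R = prices[1:] / prices[:-1] - 1.0          # simple returns, oldest-first
    R_hat = np.full_like(R, np.nan)             # undefined for the first T1 bars
    for t in range(T1, len(R)):
        window = R[t - T1:t]                    # the T1 returns before t
        mu = window.mean()                      # serial mean, Eq. (523)
        sigma = window.std(ddof=1)              # unbiased std, cf. Eq. (525)
        R_hat[t] = (R[t] - mu) / sigma          # Eqs. (522), (524)
    return R, R_hat
```

The first T₁ entries of `R_hat` are left as NaN, mirroring the requirement below that the most recent d₁ points be excluded until all inputs can be computed.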

Input Layer: Technical Indicators

Exponential Moving Average (EMA):

EMA(t, lambda, tau) = ((1-lambda)/(1-lambda^tau)) * sum_{t'=t+1}^{t+tau} lambda^{t'-t-1} * R_hat(t')    (526)

Exponential Moving Standard Deviation (EMSD):

[EMSD(t, lambda, tau)]² = ((1-lambda)/(lambda - lambda^tau)) * sum_{t'=t+1}^{t+tau} lambda^{t'-t-1} * [R_hat(t') - EMA(t, lambda, tau)]²    (527)

Relative Strength Index (RSI):

RSI(t, tau) = nu_+(t, tau) / [nu_+(t, tau) + nu_-(t, tau)]      (528)

nu_±(t, tau) = sum_{t'=t+1}^{t+tau} max(±R_hat(t'), 0)          (529)

Where: tau is the moving average length; lambda is the exponential smoothing parameter (to reduce parameters, one can set lambda = (tau-1)/(tau+1)).

Typically RSI > 0.7 is interpreted as overbought; RSI < 0.3 as oversold.
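The three indicators translate directly into NumPy. A minimal sketch, assuming an oldest-first array of normalized returns `r_hat` (so "the sum from t'=t+1 to t+tau over past returns" becomes the slice `r_hat[t - tau:t]` reversed, with weight λ⁰ on the most recent value); the function names are hypothetical:

```python
import numpy as np

def ema(r_hat, lam, tau, t):
    """Eq. (526): exponentially weighted mean of the tau most recent
    normalized returns before index t (oldest-first array)."""
    w = lam ** np.arange(tau)                   # weight lam^0 on most recent
    x = r_hat[t - tau:t][::-1]                  # most recent first
    return (1 - lam) / (1 - lam**tau) * np.dot(w, x)

def emsd(r_hat, lam, tau, t):
    """Eq. (527): exponentially weighted standard deviation around the EMA."""
    m = ema(r_hat, lam, tau, t)
    w = lam ** np.arange(tau)
    x = r_hat[t - tau:t][::-1]
    var = (1 - lam) / (lam - lam**tau) * np.dot(w, (x - m) ** 2)
    return np.sqrt(var)

def rsi(r_hat, tau, t):
    """Eqs. (528)-(529): fraction of total movement that was upside,
    so RSI is bounded in [0, 1] rather than the conventional [0, 100]."""
    x = r_hat[t - tau:t]
    up = np.maximum(x, 0).sum()                 # nu_+
    down = np.maximum(-x, 0).sum()              # nu_-
    return up / (up + down)
```

Sanity checks follow from the formulas: a constant return series yields an EMA equal to that constant and an EMSD of zero, and a series of only positive returns yields RSI = 1.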

Input Layer Construction

The input layer consists of:

  • R_hat(t) — the current normalized return
  • EMA(t, lambda_a, tau_a) for a = 1, ..., m
  • EMSD(t, lambda_a, tau_a) for a = 1, ..., m
  • RSI(t, tau_{a'}) for a' = 1, ..., m'

Example parameter choices (from the literature):

  • tau_a corresponding to 30 min, 1 hr, 3 hrs, 6 hrs (so m = 4)
  • tau_{a'} corresponding to 3 hrs, 6 hrs, 12 hrs (so m' = 3)

Output Layer: Quantile Classification

The objective is to forecast which quantile the future normalized return R_hat(t) will belong to.

Let K be the number of quantiles. For training dataset D_train, compute the (K-1) quantile values q_alpha, alpha = 1, ..., K-1, of R_hat(t), t in D_train.

Define supervisory K-vectors S_alpha(t), alpha = 1, ..., K:

S_1(t) = 1,     if R_hat(t) <= q_1
S_alpha(t) = 1, if q_{alpha-1} < R_hat(t) <= q_alpha,  for 1 < alpha < K    (530)
S_K(t) = 1,     if q_{K-1} < R_hat(t)
S_alpha(t) = 0, otherwise

The output layer produces a nonnegative K-vector p_alpha(t) of class probabilities:

sum_{alpha=1}^{K} p_alpha(t) = 1                                 (531)
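Building the supervisory one-hot vectors is a quantile-bucketing exercise. A sketch using NumPy (the function name `quantile_labels` is hypothetical; the breakpoints q_alpha are estimated from the training data itself, as the text prescribes):

```python
import numpy as np

def quantile_labels(r_hat_train, K):
    """Eq. (530): map each training return to a one-hot K-vector of
    quantile-bucket membership.  Buckets use the convention
    q_{alpha-1} < r <= q_alpha, so each return lands in exactly one."""
    q = np.quantile(r_hat_train, np.arange(1, K) / K)   # K-1 breakpoints
    # searchsorted(side="left") gives bucket i with q[i-1] < r <= q[i];
    # returns above q_{K-1} fall into the top bucket K-1.
    idx = np.searchsorted(q, r_hat_train, side="left")
    S = np.zeros((len(r_hat_train), K))
    S[np.arange(len(r_hat_train)), idx] = 1.0
    return q, S
```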

Network Architecture

The ANN has L layers labeled l = 1, ..., L:

  • l = 1: input layer
  • l = L: output layer
  • Intermediate layers: hidden layers

At each layer l there are N^(l) nodes, whose values form a vector X_vec^(l) with components X_{i(l)}^(l), i(l) = 1, ..., N^(l).

Forward propagation:

X_{i(l)}^(l) = h_{i(l)}^(l)(Y_vec^(l)),    l = 2, ..., L        (532)

Y_{i(l)}^(l) = sum_{j(l-1)=1}^{N^(l-1)} A_{i(l)j(l-1)}^(l) * X_{j(l-1)}^(l-1) + B_{i(l)}^(l)    (533)

Where: A_{i(l)j(l-1)}^(l) are the weights; B_{i(l)}^(l) are the biases (both determined via training).

Activation functions:

Hidden layers use ReLU:

h_{i(l)}^(l)(Y_vec^(l)) = max(Y_{i(l)}^(l), 0),    l = 2, ..., L-1    (534)

Output layer uses softmax (ensuring probabilities sum to 1):

h_{i(L)}^(L)(Y_vec^(L)) = exp(Y_{i(L)}^(L)) * [sum_{j(L)=1}^{N^(L)} exp(Y_{j(L)}^(L))]^{-1}    (535)

ReLU fires a neuron only if Y_{i(l)}^(l) > 0; softmax enforces condition (531).
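The forward pass in Eqs. (532)–(535) is a handful of matrix products. A minimal sketch (the function name `forward` and the list-of-matrices layout are assumptions; `weights[l]` maps layer l+1 to layer l+2, and the softmax is computed in the numerically stable shifted form):

```python
import numpy as np

def forward(x, weights, biases):
    """Eqs. (532)-(533): propagate the input vector x through the net.
    Hidden layers use ReLU (Eq. 534); the output layer uses softmax,
    so the result is a probability vector satisfying Eq. (531)."""
    a = np.asarray(x, dtype=float)
    for l, (A, B) in enumerate(zip(weights, biases)):
        y = A @ a + B                           # affine map, Eq. (533)
        if l < len(weights) - 1:
            a = np.maximum(y, 0.0)              # ReLU, Eq. (534)
        else:
            z = np.exp(y - y.max())             # stable softmax, Eq. (535)
            a = z / z.sum()
    return a
```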

Training: Cross-Entropy Loss

The error function to minimize is the cross-entropy:

E = - sum_{t in D_train} sum_{alpha=1}^{K} S_alpha(t) * ln(p_alpha(t))    (536)

Minimized via stochastic gradient descent (SGD), which iterates until convergence.
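The loss itself is one line given the label and probability matrices; a sketch (the function name and the `eps` guard against ln(0) are assumptions — in practice the gradients for SGD would come from an autodiff framework rather than be derived by hand):

```python
import numpy as np

def cross_entropy(P, S, eps=1e-12):
    """Eq. (536): E = -sum_t sum_alpha S_alpha(t) * ln p_alpha(t).
    P and S are (T, K) arrays of predicted probabilities and one-hot
    labels; eps guards against taking the log of an exact zero."""
    return -np.sum(S * np.log(P + eps))
```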

Trading Signal

Signal = Buy,  iff  max_alpha p_alpha(t) = p_K(t)     (537)
Signal = Sell, iff  max_alpha p_alpha(t) = p_1(t)

The trader buys BTC if the predicted class is p_K(t) (the top quantile) and sells if it is p_1(t) (the bottom quantile). This rule can be modified — e.g., buy on top 2 quantiles and sell on bottom 2 quantiles.
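The rule in Eq. (537) reduces to an argmax over the predicted class probabilities; a sketch (the function name `signal` and the string return values are illustrative conventions, not from the source):

```python
import numpy as np

def signal(p):
    """Eq. (537): buy if the top quantile has the highest predicted
    probability, sell if the bottom quantile does, otherwise stay flat."""
    k = int(np.argmax(p))
    if k == len(p) - 1:
        return "buy"
    if k == 0:
        return "sell"
    return "flat"
```

The modified rule mentioned above (trade on the top/bottom two quantiles) would compare `k` against `len(p) - 2` and `1` instead.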

Return Profile / Objective

The strategy profits when the ANN correctly classifies the direction and magnitude of short-term BTC price movements. Returns are driven by the quality of the technical indicator signals and the ability of the trained network to generalize out-of-sample. Given BTC's high volatility, even modest directional accuracy can produce significant returns.

Key Parameters / Signals

  • T₁: lookback for return normalization (volatility estimation window)
  • tau_a: EMA/EMSD lookback periods (e.g., 30 min, 1 hr, 3 hrs, 6 hrs)
  • tau_{a'}: RSI lookback periods (e.g., 3 hrs, 6 hrs, 12 hrs)
  • lambda: exponential smoothing factor; can be set to (tau-1)/(tau+1)
  • K: number of quantile classes (e.g., K=2 for simple up/down)
  • N^(l): number of nodes at each hidden layer
  • L: total number of layers
  • d_1: number of most-recent time points excluded from training data to ensure all indicators are computed on sufficient data

Variations

  • K=2 binary classification: simple up/down forecast; buy/sell signal directly
  • K>2 multi-quantile: more granular signal strength; trade only on extreme quantiles
  • Extended indicator set: add MACD, Bollinger Bands, volume indicators to the input layer
  • LSTM/RNN variant: replace feedforward ANN with recurrent architecture to better capture time-series dependencies

Notes

The primary risk is overfitting: many free parameters (tau_a, lambda_a, tau_{a'}, N^(l), K) must be chosen, which necessitates careful out-of-sample backtesting. The training dataset must exclude the most recent d_1 time points to ensure all EMA, EMSD, and RSI values are computed using the required number of data points. This strategy is conceptually similar to the single-stock KNN trading strategy (Section 3.17) but uses an ANN instead of k-nearest neighbors. No fundamental valuation of BTC is implied.