Expand model tag support: add GLM-5.1, simplify Anthropic IDs, scan tags anywhere in message

- Debounce Flink update_bars
- Fix update_bars subscription idempotency bug
- Fix price decimal bug introduced in the previous commit
- Add GLM-5.1 model tag alongside renamed GLM-5
- Use short Anthropic model IDs (sonnet/haiku/opus) instead of full version strings
- Allow @tags anywhere in message content, not just at start
- Return hasOtherContent flag instead of trimmed rest string
- Only trigger greeting stream when tag has no other content
- Update workspace knowledge base references to platform/workspace and platform/shapes
- Hierarchical knowledge base catalog
- 151 Trading Strategies knowledge base articles
- Shapes knowledge base article
- MutateShapes tool instead of workspace patch
2026-04-28 15:05:15 -04:00
parent d41fcd0499
commit 47471b7700
184 changed files with 9044 additions and 170 deletions


@@ -0,0 +1,175 @@
---
description: "An artificial neural network (ANN) strategy that forecasts short-term BTC price movements using technical indicators (EMA, EMSD, RSI) as inputs and quantile-based classification as output."
tags: [crypto, machine-learning, ann, bitcoin, technical-analysis]
---
# Artificial Neural Network (ANN) Strategy
**Section**: 18.2 | **Asset Class**: Cryptocurrencies | **Type**: Machine learning / Price prediction
## Overview
This strategy uses an ANN to forecast short-term movements of BTC price based on input technical indicators. Unlike equities, cryptocurrencies have no evident "fundamentals" on which to build value-based strategies, so cryptocurrency trading strategies tend to rely on trend data mining via machine learning techniques. The ANN classifies the future normalized return into quantile buckets and generates buy/sell signals accordingly.
## Construction / Mechanics
### Price and Return Normalization
Let `P(t)` be the BTC price at time `t`, where `t = 1, 2, ...` is measured in some units (e.g., 15-minute intervals; `t = 1` is the most recent time).
**Return:**
```
R(t) = P(t)/P(t+1) - 1 (521)
```
**Serial mean return** over T₁ periods:
```
R_bar(t, T₁) = (1/T₁) * sum_{t'=t+1}^{t+T₁} R(t') (523)
```
**Serially demeaned return:**
```
R_tilde(t, T₁) = R(t) - R_bar(t, T₁) (522)
```
**Variance:**
```
[sigma(t, T₁)]² = (1/(T₁-1)) * sum_{t'=t+1}^{t+T₁} [R_tilde(t', T₁)]² (525)
```
**Normalized (serially demeaned) return:**
```
R_hat(t, T₁) = R_tilde(t, T₁) / sigma(t, T₁) (524)
```
For notational simplicity the T₁ parameter is omitted below and `R_hat(t)` denotes the normalized return. T₁ should be chosen long enough to provide a reasonable volatility estimate.
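The normalization chain (521)-(525) can be sketched in Python under the backward-indexing convention above (here array index 0 plays the role of `t = 1`, the most recent bar); the function name and array layout are illustrative, not from the source:

```python
import numpy as np

def normalized_returns(P, T1):
    """Eqs. (521)-(525); P[0] is the most recent price, P[-1] the oldest."""
    R = P[:-1] / P[1:] - 1                                  # Eq. (521)
    # Eq. (523): mean over the T1 returns older than t
    R_bar = np.array([R[t + 1 : t + 1 + T1].mean()
                      for t in range(len(R) - T1)])
    R_tilde = R[: len(R_bar)] - R_bar                       # Eq. (522)
    # Eq. (525): variance built from the serially demeaned returns
    n = len(R_tilde) - T1
    sigma = np.sqrt(np.array([(R_tilde[t + 1 : t + 1 + T1] ** 2).sum() / (T1 - 1)
                              for t in range(n)]))
    return R_tilde[:n] / sigma                              # Eq. (524)
```

Note that each normalized return consumes roughly `2 * T1` older observations, since the variance window is itself built from serially demeaned returns.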
### Input Layer: Technical Indicators
**Exponential Moving Average (EMA):**
```
EMA(t, lambda, tau) = ((1-lambda)/(1-lambda^tau)) * sum_{t'=t+1}^{t+tau} lambda^{t'-t-1} * R_hat(t') (526)
```
**Exponential Moving Standard Deviation (EMSD):**
```
[EMSD(t, lambda, tau)]² = ((1-lambda)/(lambda - lambda^tau)) * sum_{t'=t+1}^{t+tau} lambda^{t'-t-1} * [R_hat(t') - EMA(t, lambda, tau)]² (527)
```
**Relative Strength Index (RSI):**
```
RSI(t, tau) = nu_+(t, tau) / [nu_+(t, tau) + nu_-(t, tau)] (528)
nu_±(t, tau) = sum_{t'=t+1}^{t+tau} max(±R_hat(t'), 0) (529)
```
Where: `tau` is the moving average length; `lambda` is the exponential smoothing parameter (to reduce parameters, one can set `lambda = (tau-1)/(tau+1)`).
Typically RSI > 0.7 is interpreted as overbought; RSI < 0.3 as oversold.
### Input Layer Construction
The input layer consists of:
- `R_hat(t)` the current normalized return
- `EMA(t, lambda_a, tau_a)` for `a = 1, ..., m`
- `EMSD(t, lambda_a, tau_a)` for `a = 1, ..., m`
- `RSI(t, tau_{a'})` for `a' = 1, ..., m'`
Example parameter choices (from the literature):
- `tau_a` corresponding to 30 min, 1 hr, 3 hrs, 6 hrs (so `m = 4`)
- `tau_{a'}` corresponding to 3 hrs, 6 hrs, 12 hrs (so `m' = 3`)
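The indicators (526)-(529) and the resulting input vector can be sketched as follows; the bar counts assume 15-minute intervals, and all names are illustrative:

```python
import numpy as np

def ema(R_hat, lam, tau, t=0):
    """Eq. (526): exponentially weighted mean of the tau returns older than t."""
    w = lam ** np.arange(tau)                # lambda^{t'-t-1}, t' = t+1..t+tau
    return (1 - lam) / (1 - lam ** tau) * np.dot(w, R_hat[t + 1 : t + 1 + tau])

def emsd(R_hat, lam, tau, t=0):
    """Eq. (527): exponentially weighted standard deviation (needs tau >= 2)."""
    w = lam ** np.arange(tau)
    x = R_hat[t + 1 : t + 1 + tau]
    d = x - ema(R_hat, lam, tau, t)
    return np.sqrt((1 - lam) / (lam - lam ** tau) * np.dot(w, d ** 2))

def rsi(R_hat, tau, t=0):
    """Eqs. (528)-(529): share of upside magnitude in the tau older returns."""
    x = R_hat[t + 1 : t + 1 + tau]
    nu_p, nu_m = np.maximum(x, 0).sum(), np.maximum(-x, 0).sum()
    return nu_p / (nu_p + nu_m)

# Assuming 15-minute bars: 30 min = 2 bars, 1 hr = 4 bars, 3 hrs = 12, etc.
TAUS_MA, TAUS_RSI = [2, 4, 12, 24], [12, 24, 48]

def input_layer(R_hat, t=0):
    feats = [R_hat[t]]
    for tau in TAUS_MA:
        lam = (tau - 1) / (tau + 1)          # the suggested lambda choice
        feats += [ema(R_hat, lam, tau, t), emsd(R_hat, lam, tau, t)]
    feats += [rsi(R_hat, tau, t) for tau in TAUS_RSI]
    return np.array(feats)                   # 1 + 2m + m' = 12 features
```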
### Output Layer: Quantile Classification
The objective is to forecast which quantile the future normalized return `R_hat(t)` will belong to.
Let `K` be the number of quantiles. For training dataset `D_train`, compute the `(K-1)` quantile values `q_alpha`, `alpha = 1, ..., K-1`, of `R_hat(t)`, `t in D_train`.
Define supervisory K-vectors `S_alpha(t)`, `alpha = 1, ..., K`:
```
S_1(t) = 1, if R_hat(t) <= q_1
S_alpha(t) = 1, if q_{alpha-1} < R_hat(t) <= q_alpha, for 1 < alpha < K (530)
S_K(t) = 1, if R_hat(t) > q_{K-1}
S_alpha(t) = 0, otherwise
```
The output layer produces a nonnegative K-vector `p_alpha(t)` of class probabilities:
```
sum_{alpha=1}^{K} p_alpha(t) = 1 (531)
```
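The quantile labeling (530) reduces to computing the in-sample boundaries and one-hot encoding the class index; a minimal sketch (function names are illustrative, and tie handling exactly on a boundary is a measure-zero detail):

```python
import numpy as np

def supervisory_vectors(R_hat_train, K):
    """Eq. (530): one-hot K-vectors from the K-1 in-sample quantiles."""
    q = np.quantile(R_hat_train, np.arange(1, K) / K)   # q_1, ..., q_{K-1}
    # class index = number of boundaries strictly below R_hat(t)
    cls = np.searchsorted(q, R_hat_train, side="left")
    S = np.zeros((len(R_hat_train), K))
    S[np.arange(len(R_hat_train)), cls] = 1.0
    return q, S
```

By construction the K classes are roughly equally populated in-sample, which balances the training targets.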
### Network Architecture
The ANN has `L` layers labeled `l = 1, ..., L`:
- `l = 1`: input layer
- `l = L`: output layer
- Intermediate layers: hidden layers
At each layer `l` there are `N^(l)` nodes; the layer's state is a vector `X_vec^(l)` with components `X_{i(l)}^(l)`, `i(l) = 1, ..., N^(l)`.
**Forward propagation:**
```
X_{i(l)}^(l) = h_{i(l)}^(l)(Y_vec^(l)), l = 2, ..., L (532)
Y_{i(l)}^(l) = sum_{j(l-1)=1}^{N^(l-1)} A_{i(l)j(l-1)}^(l) * X_{j(l-1)}^(l-1) + B_{i(l)}^(l) (533)
```
Where: `A_{i(l)j(l-1)}^(l)` are the weights; `B_{i(l)}^(l)` are the biases (both determined via training).
**Activation functions:**
Hidden layers use ReLU:
```
h_{i(l)}^(l)(Y_vec^(l)) = max(Y_{i(l)}^(l), 0), l = 2, ..., L-1 (534)
```
Output layer uses softmax (ensuring probabilities sum to 1):
```
h_{i(L)}^(L)(Y_vec^(L)) = exp(Y_{i(L)}^(L)) * [sum_{j(L)=1}^{N^(L)} exp(Y_{j(L)}^(L))]^{-1} (535)
```
ReLU fires a neuron only if `Y_{i(l)}^(l) > 0`; softmax enforces condition (531).
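Forward propagation (532)-(535) in code form; a minimal sketch with the trained weights and biases supplied externally:

```python
import numpy as np

def relu(Y):                                  # Eq. (534)
    return np.maximum(Y, 0.0)

def softmax(Y):                               # Eq. (535); enforces Eq. (531)
    e = np.exp(Y - Y.max())                   # shift for numerical stability
    return e / e.sum()

def forward(X, weights, biases):
    """Eqs. (532)-(533): Y^(l) = A^(l) X^(l-1) + B^(l), X^(l) = h(Y^(l))."""
    for l, (A, B) in enumerate(zip(weights, biases)):
        Y = A @ X + B
        X = softmax(Y) if l == len(weights) - 1 else relu(Y)
    return X                                  # class probabilities p_alpha(t)
```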
### Training: Cross-Entropy Loss
The error function to minimize is the cross-entropy:
```
E = - sum_{t in D_train} sum_{alpha=1}^{K} S_alpha(t) * ln(p_alpha(t)) (536)
```
Minimized via stochastic gradient descent (SGD), which iterates until convergence.
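A minimal single-hidden-layer sketch of minimizing (536) by per-sample SGD; the architecture, learning rate, and initialization are illustrative choices, not prescribed by the source:

```python
import numpy as np

def sgd_train(X, S, n_hidden=16, lr=0.1, epochs=30, seed=0):
    """Minimize the cross-entropy (536) by SGD for one hidden layer (L = 3)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = S.shape[1]
    A1 = rng.normal(0, 0.1, (n_hidden, d)); B1 = np.zeros(n_hidden)
    A2 = rng.normal(0, 0.1, (K, n_hidden)); B2 = np.zeros(K)
    for _ in range(epochs):
        for t in rng.permutation(n):           # one sample per SGD step
            x, s = X[t], S[t]
            Y1 = A1 @ x + B1
            X1 = np.maximum(Y1, 0.0)           # ReLU, Eq. (534)
            Y2 = A2 @ X1 + B2
            e = np.exp(Y2 - Y2.max())
            p = e / e.sum()                    # softmax, Eq. (535)
            g2 = p - s                         # dE/dY2 for softmax + (536)
            g1 = (A2.T @ g2) * (Y1 > 0)        # backprop through ReLU
            A2 -= lr * np.outer(g2, X1); B2 -= lr * g2
            A1 -= lr * np.outer(g1, x);  B1 -= lr * g1
    return A1, B1, A2, B2
```

The convenient identity here is that the gradient of the cross-entropy through the softmax is simply `p - s`, which keeps the per-sample update cheap.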
### Trading Signal
```
Signal = Buy,  iff max(p_alpha(t)) = p_K(t)
       = Sell, iff max(p_alpha(t)) = p_1(t)    (537)
```
The trader buys BTC if the predicted class is `p_K(t)` (the top quantile) and sells if it is `p_1(t)` (the bottom quantile). This rule can be modified, e.g., to buy on the top 2 quantiles and sell on the bottom 2.
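Rule (537) as code, with an implicit Hold when neither extreme class has the highest probability (the Hold label is my addition for completeness, not from the source):

```python
import numpy as np

def trade_signal(p):
    """Eq. (537): Buy on the top quantile class, Sell on the bottom."""
    alpha = int(np.argmax(p))       # winning class, 0-indexed
    if alpha == len(p) - 1:
        return "Buy"
    if alpha == 0:
        return "Sell"
    return "Hold"
```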
## Return Profile / Objective
The strategy profits when the ANN correctly classifies the direction and magnitude of short-term BTC price movements. Returns are driven by the quality of the technical indicator signals and the ability of the trained network to generalize out-of-sample. Given BTC's high volatility, even modest directional accuracy can produce significant returns.
## Key Parameters / Signals
- `T₁`: lookback for return normalization (volatility estimation window)
- `tau_a`: EMA/EMSD lookback periods (e.g., 30 min, 1 hr, 3 hrs, 6 hrs)
- `tau_{a'}`: RSI lookback periods (e.g., 3 hrs, 6 hrs, 12 hrs)
- `lambda`: exponential smoothing factor; can be set to `(tau-1)/(tau+1)`
- `K`: number of quantile classes (e.g., K=2 for simple up/down)
- `N^(l)`: number of nodes at each hidden layer
- `L`: total number of layers
- `d_1`: number of most-recent time points excluded from training data to ensure all indicators are computed on sufficient data
## Variations
- **K=2 binary classification**: simple up/down forecast; buy/sell signal directly
- **K>2 multi-quantile**: more granular signal strength; trade only on extreme quantiles
- **Extended indicator set**: add MACD, Bollinger Bands, volume indicators to the input layer
- **LSTM/RNN variant**: replace feedforward ANN with recurrent architecture to better capture time-series dependencies
## Notes
The primary risk is overfitting: many free parameters (`tau_a`, `lambda_a`, `tau_{a'}`, `N^(l)`, `K`) must be chosen, and the ever-present danger of fitting noise necessitates careful out-of-sample backtesting. The training dataset must exclude the most recent `d_1` time points to ensure all EMA, EMSD, and RSI values are computed using the required number of data points. This strategy is conceptually similar to the single-stock KNN trading strategy (Section 3.17) but uses an ANN instead of k-nearest neighbors. No fundamental valuation of BTC is implied.