ai/gateway/knowledge/trading/strategies/stocks/ml-knn.md
Tim Olson 47471b7700 Expand model tag support: add GLM-5.1, simplify Anthropic IDs, scan tags anywhere in message
2026-04-28 15:05:15 -04:00


description: Predicts a stock's future T-day cumulative return using the K-nearest-neighbor algorithm on normalized price and volume features, then trades based on the predicted return signal.
tags: stocks, machine-learning, knn, prediction

Machine Learning — Single-Stock KNN

Section: 3.17 | Asset Class: Stocks | Type: Machine Learning / Prediction

Overview

This single-stock strategy uses the k-nearest-neighbor (KNN) algorithm to predict future cumulative stock returns based on a set of predictor (feature) variables derived from the stock's own price and volume history. For each stock, the model is trained independently using only that stock's data (no cross-sectional information). The predicted return is then used to generate long/short signals.

Construction / Signal

Target variable — cumulative return over the next T trading days:

Y(t) = P(t-T) / P(t) - 1                                  (332)

(Time is indexed backward: larger t is further in the past and t = 0 is today, so P(t-T) is the price T trading days after P(t).)
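As a minimal sketch of Eq. 332, assuming prices are held in a plain list indexed backward in time (index 0 is today); the function name target_returns is hypothetical and only illustrates the indexing:

```python
def target_returns(prices, horizon):
    """Y(t) = P(t - T) / P(t) - 1 with prices indexed backward (index 0 = today).

    Y(t) exists only for t >= horizon, because P(t - horizon) must already
    have been observed. Returns a dict mapping t to Y(t).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 trading day")
    return {
        t: prices[t - horizon] / prices[t] - 1.0
        for t in range(horizon, len(prices))
    }


# P(0) = 110 is today's price, P(4) = 100 is the oldest.
prices = [110.0, 105.0, 100.0, 102.0, 100.0]
y = target_returns(prices, horizon=2)
```

Here y[2] compares the price two days ago with today's price; y[0] and y[1] are absent because their T-day returns have not been realized yet.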

Predictor variables (moving averages of volume and price over varying windows T_1, T_2, T_3, ...):

X_1(t) = (1/T_1) * sum_{s=1}^{T_1} V(t+s)                (333)   [volume MA]
X_2(t) = (1/T_2) * sum_{s=1}^{T_2} P(t+s)                (334)   [price MA 1]
X_3(t) = (1/T_3) * sum_{s=1}^{T_3} P(t+s)                (335)   [price MA 2]
...                                                        (336)
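The moving-average predictors of Eqs. 333-335 can be sketched as below. This assumes backward-indexed lists (index 0 is today); the helper names trailing_mean and features, and the window lengths in the example, are illustrative choices rather than values taken from the text.

```python
def trailing_mean(series, t, window):
    """Mean of series[t+1], ..., series[t+window] (index 0 = today).

    The sum starts at s = 1, so the value at t itself is excluded.
    """
    if t + window >= len(series):
        raise IndexError("not enough history for this window")
    return sum(series[t + s] for s in range(1, window + 1)) / window


def features(prices, volumes, t, vol_window, price_windows):
    """Feature vector [X_1(t), X_2(t), X_3(t), ...] for one time point t."""
    row = [trailing_mean(volumes, t, vol_window)]
    row += [trailing_mean(prices, t, w) for w in price_windows]
    return row


prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
volumes = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
x_today = features(prices, volumes, t=0, vol_window=2, price_windows=(2, 4))
```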

Predictor variables are normalized to [0, 1] using the training period's min/max:

X_tilde_a(t) = (X_a(t) - X_a^-) / (X_a^+ - X_a^-)       (337)

where X_a^+ and X_a^- are the max and min of X_a(t) over the training period.
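A small sketch of Eq. 337, assuming each training point is a list of m raw feature values. The names fit_min_max and normalize are hypothetical, and mapping a constant feature to 0.0 is an assumption made here to avoid division by zero.

```python
def fit_min_max(train_rows):
    """Per-feature minimum X_a^- and maximum X_a^+ over the training rows."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(row, lows, highs):
    """Apply Eq. 337 using bounds fitted on the training period only."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant feature -> 0.0
        for x, lo, hi in zip(row, lows, highs)
    ]


train_rows = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
lows, highs = fit_min_max(train_rows)
scaled = normalize([2.0, 40.0], lows, highs)
```

Because the bounds come from the training period alone, a later point can fall outside [0, 1], as the second feature does in this example.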

KNN prediction — for a given t, find the k nearest neighbors of X_tilde_a(t) among training points t' = t+1, t+2, ..., t+T_* using Euclidean distance:

[D(t, t')]^2 = sum_{a=1}^{m} (X_tilde_a(t) - X_tilde_a(t'))^2   (338)

Predicted return (simple average of the k neighbors' realized returns):

Y(t) = (1/k) * sum_{alpha=1}^{k} Y(t'_alpha(t))           (339)
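Eqs. 338 and 339 together amount to a few lines of code. The sketch below assumes the feature rows are already normalized and that train_y holds the realized return for each training row; knn_predict is a hypothetical name.

```python
import math


def knn_predict(query, train_x, train_y, k):
    """Average the realized returns of the k training rows closest to query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    # Rank training rows by Euclidean distance to the query (Eq. 338).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    neighbors = order[:k]
    # Uniform average of the neighbors' returns (Eq. 339).
    return sum(train_y[i] for i in neighbors) / k


train_x = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.8]]
train_y = [0.02, 0.04, -0.05, -0.03]
prediction = knn_predict([0.05, 0.0], train_x, train_y, k=2)
```

With k = 2 the two rows near the origin are selected, so the prediction is the mean of 0.02 and 0.04.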

Alternatively, fit a linear model with weights w_alpha and intercept v:

Y(t) = sum_{alpha=1}^{k} Y(t'_alpha(t)) w_alpha + v       (340)

The weights w_alpha and intercept v are fit by regressing the realized Y(t) on the k neighbor returns Y(t'_alpha(t)) over a sample of M values of t.
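One way to fit Eq. 340 is ordinary least squares. The sketch below assumes NumPy is available and that, for each of the M sample times, the k neighbor returns have been collected into one row ordered from nearest to farthest. The function names and the synthetic numbers are illustrative only.

```python
import numpy as np


def fit_neighbor_weights(neighbor_returns, realized):
    """Least-squares fit of realized Y(t) on the k neighbor returns plus intercept.

    neighbor_returns: shape (M, k); realized: shape (M,).
    Returns (weights of shape (k,), intercept v).
    """
    x = np.asarray(neighbor_returns, dtype=float)
    y = np.asarray(realized, dtype=float)
    design = np.column_stack([x, np.ones(len(y))])  # last column carries v
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], float(coef[-1])


def predict_linear(neighbor_row, weights, intercept):
    """Evaluate Eq. 340 for one row of neighbor returns."""
    return float(np.dot(neighbor_row, weights) + intercept)


# Synthetic check: data generated from known weights should be recovered.
rows = np.array([[0.01, 0.02], [0.03, -0.01], [-0.02, 0.00],
                 [0.00, 0.04], [0.05, 0.01]])
true_w, true_v = np.array([0.6, 0.3]), 0.01
w_hat, v_hat = fit_neighbor_weights(rows, rows @ true_w + true_v)
```

The fit needs M to be comfortably larger than k + 1; otherwise the regression is underdetermined and the out-of-sample instability noted for this variant becomes more likely.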

Trading signal (z_1, z_2 are trader-defined thresholds):

Signal = { Establish long position    if Y > z_1
         { Liquidate long position    if Y <= z_2
         { Establish short position   if Y < -z_1
         { Liquidate short position   if Y >= -z_2          (341)

Entry / Exit Rules

  • Long entry: Predicted cumulative return Y = Y(0) > z_1
  • Long exit: Predicted return Y <= z_2 (where z_2 <= z_1)
  • Short entry: Predicted return Y < -z_1
  • Short exit: Predicted return Y >= -z_2
  • All thresholds must be backtested out-of-sample.
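The entry and exit rules can be read as a small state machine over the current position. The sketch below encodes the position as +1 (long), -1 (short), or 0 (flat). Checking exits before entries, so that a position can be closed and reversed on the same prediction, is an assumption of this sketch and not something the rules specify.

```python
def next_position(position, predicted, z1, z2):
    """Apply the threshold rules of Eq. 341 to one predicted return."""
    # Exits first: liquidate an open position if its exit condition holds.
    if position == 1 and predicted <= z2:
        position = 0
    elif position == -1 and predicted >= -z2:
        position = 0
    # Entries only from a flat book.
    if position == 0:
        if predicted > z1:
            return 1
        if predicted < -z1:
            return -1
    return position


# Illustrative thresholds and predictions, not values from the text.
z1, z2 = 0.02, 0.005
position, path = 0, []
for y_hat in [0.03, 0.01, 0.004, -0.03, -0.01, 0.0]:
    position = next_position(position, y_hat, z1, z2)
    path.append(position)
```

Between z_2 and z_1 an open long is held but no new long is opened, which is the hysteresis that keeps the strategy from flipping on small changes in the prediction.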

Key Parameters

  • Number of neighbors k: Typically k = floor(sqrt(T_*)) or k = ceiling(sqrt(T_*)) (T_* = training sample size)
  • Training sample size T_*: Number of historical time points used for training
  • Prediction horizon T: Number of trading days for the target return
  • Feature set m: Number and type of predictor variables (volume MAs, price MAs)
  • Thresholds z_1, z_2: Entry and exit thresholds for signals (backtested)
  • Train/validation split: E.g., 60% training, 40% cross-validation
  • Distance metric: Euclidean (Eq. 338) or Manhattan distance
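Several of these parameters reduce to one-liners. The sketch below shows the square-root heuristic for k, a 60/40 split, and the Manhattan alternative to Eq. 338. Splitting chronologically (oldest points for training, most recent for validation) is an assumption; the text gives only the proportions. All function names are hypothetical.

```python
import math


def neighbor_count_candidates(train_size):
    """floor(sqrt(T_*)) and ceiling(sqrt(T_*)) as candidate values of k."""
    root = math.sqrt(train_size)
    return math.floor(root), math.ceil(root)


def chronological_split(n_points, train_fraction=0.6):
    """Split backward-indexed points (0 = most recent) into two index lists.

    Assumption: the oldest train_fraction of points is used for training
    and the more recent remainder for cross-validation.
    """
    n_train = int(round(n_points * train_fraction))
    validation = list(range(0, n_points - n_train))
    training = list(range(n_points - n_train, n_points))
    return training, validation


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


k_low, k_high = neighbor_count_candidates(250)
training, validation = chronological_split(10)
```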

Variations

  • Weighted KNN: Weight the k neighbors non-uniformly, e.g., by inverse distance or with the fitted weights of Eq. 340, instead of the uniform average of Eq. 339
  • Cross-sectional extension: Compute expected returns Y_i for N stocks and use as inputs to cross-sectional mean-reversion or other multi-stock strategies
  • Alternative features: Fundamental data, earnings surprises, sentiment indicators in addition to price/volume
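As an illustration of the weighted variation, the sketch below uses inverse-distance weights. This is one common weighting scheme chosen for the example, not one prescribed by the text; the eps term is an assumption that guards against division by zero when a neighbor coincides with the query.

```python
import math


def weighted_knn_predict(query, train_x, train_y, k, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest neighbors' returns."""
    # Pair each distance with its return, then keep the k closest pairs.
    nearest = sorted(
        (math.dist(query, x), y) for x, y in zip(train_x, train_y)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)


train_x = [[0.0], [1.0], [5.0]]
train_y = [0.10, -0.02, 0.50]
prediction = weighted_knn_predict([0.25], train_x, train_y, k=2)
```

The two neighbors sit at distances 0.25 and 0.75, so the nearer one carries three times the weight of the farther one.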

Notes

  • This is a single-stock strategy: each stock's model is trained on that stock's own price/volume data only.
  • The strategy must be backtested strictly out-of-sample; data leakage is a critical risk.
  • Simple uniform KNN (Eq. 339) has no fitted parameters beyond the choice of k; the linear model (Eq. 340) adds k weights and an intercept that must be cross-validated, and it is prone to out-of-sample instability.
  • k can be optimized via backtesting; common heuristic: k = floor(sqrt(T_*)).
  • Typical holding period: T trading days (matching the prediction horizon).
  • Training/cross-validation split: e.g., 60%/40%.
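One concrete leakage check follows from Eq. 332: the label Y(t') uses the price at t' - T, so at prediction time t it is fully realized only when t' - T >= t. The sketch below starts the training window at t + T for that reason. This offset is a precaution suggested here, not a rule stated in the text, and the function name is hypothetical.

```python
def leak_free_training_indices(t, horizon, train_size):
    """Training times t' whose T-day return is already known at time t.

    Indices are backward in time (0 = today), so t' >= t + horizon ensures
    that P(t' - horizon) was observed no later than t.
    """
    first = t + horizon
    return list(range(first, first + train_size))


indices = leak_free_training_indices(t=0, horizon=5, train_size=3)
```

For T = 1 this coincides with a window that starts at t + 1; for longer horizons the most recent T - 1 points are skipped because their labels depend on prices after t.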