---
description: A naïve Bayes Bernoulli classifier applied to Twitter sentiment data to forecast BTC price direction, generating buy/sell signals from keyword-frequency feature vectors.
tags:
  - crypto
  - machine-learning
  - nlp
  - sentiment
  - bitcoin
  - naive-bayes
---

Sentiment Analysis — Naïve Bayes Bernoulli

Section: 18.3 | Asset Class: Cryptocurrencies | Type: Machine learning / NLP sentiment

Overview

This strategy applies a social media sentiment analysis classification scheme to forecast the direction (or quantile) of BTC price movements based on Twitter data. It uses the naïve Bayes Bernoulli model to classify tweets into outcome classes and generate trading signals. The premise is that aggregate social media sentiment contains predictive information about short-term crypto price movements.

Construction / Mechanics

Data Collection and Preprocessing

  1. Collect all tweets containing at least one keyword from a pertinent learning vocabulary V over some timeframe
  2. Clean the data: remove duplicate tweets from bots, remove stop-words (e.g., "the", "is", "in", "which"), and perform stemming (reduce words to their base forms, e.g., "investing" and "invested" → "invest"), typically via the Porter stemming algorithm
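The cleaning steps above can be sketched as follows. This is a minimal illustration: the stop-word list is a small stand-in, and the suffix stripper is a crude simplification of the Porter stemming algorithm.

```python
import re

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "is", "in", "which", "a", "an", "and", "to", "of"}

def stem(word):
    # Crude suffix stripping as a stand-in for the Porter stemmer:
    # "investing" and "invested" both reduce to "invest".
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweets):
    """Deduplicate tweets, drop stop-words, and stem remaining tokens."""
    seen, cleaned = set(), []
    for tweet in tweets:
        if tweet in seen:  # drop exact duplicates (e.g., bot reposts)
            continue
        seen.add(tweet)
        tokens = re.findall(r"[a-z']+", tweet.lower())
        cleaned.append([stem(t) for t in tokens if t not in STOP_WORDS])
    return cleaned
```

For example, `preprocess(["BTC is mooning", "BTC is mooning", "Invested in BTC"])` drops the duplicate and yields `[["btc", "moon"], ["invest", "btc"]]`.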

Let:

  • M = |V| = number of keywords in the learning vocabulary
  • N = number of tweets in the dataset
  • i = 1, ..., N labels tweets
  • a = 1, ..., M labels words w_a in V

Feature Vector Construction (Bernoulli Model)

Assign a feature M-vector X_i to each tweet i:

Bernoulli (binary presence/absence):

X_{ia} = 0    if word w_a not present in tweet T_i
X_{ia} = 1    if word w_a is present in tweet T_i      (Bernoulli)

Alternative (multinomial): X_{ia} = n_{ia}, the number of times w_a appears in T_i.

The Bernoulli case is the focus of this strategy.
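Building the binary feature M-vectors X_i from tokenized tweets is a one-liner per the definition above (the vocabulary and tweets in the usage example are illustrative):

```python
def bernoulli_features(tokenized_tweets, vocabulary):
    """X[i][a] = 1 if word w_a appears in tweet T_i, else 0 (Bernoulli model)."""
    return [
        [1 if word in set(tokens) else 0 for word in vocabulary]
        for tokens in tokenized_tweets
    ]
```

With V = ["buy", "sell", "moon", "crash"], the tweets [["moon", "buy"], ["crash"]] map to the feature vectors [[1, 0, 1, 0], [0, 0, 0, 1]].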

Classification Framework

Define K outcome classes C_alpha, alpha = 1, ..., K:

  • Simplest case: K = 2 (BTC goes up or down) — provides buy/sell signal
  • Alternative: K quantiles of the normalized return R_hat(t) (as in the ANN strategy, Section 18.2)

Goal: given the N feature vectors X_1, ..., X_N, predict class C_alpha.

Bayesian Foundation

By Bayes' theorem:

P(A|B) = P(B|A) * P(A) / P(B)                                   (538)

The posterior probability of class C_alpha given features X_1, ..., X_N:

P(C_alpha | X_1, ..., X_N) = P(X_1, ..., X_N | C_alpha) * P(C_alpha) / P(X_1, ..., X_N)    (539)

Note P(X_1, ..., X_N) is independent of C_alpha and acts only as a normalization constant.

Naïve (Conditional Independence) Assumption

The naïve Bayes simplification assumes that for a given class C_alpha, all features X_i are conditionally independent:

P(X_i | C_alpha, X_1, ..., X_{i-1}, X_{i+1}, ..., X_N) = P(X_i | C_alpha)    (540)

This gives:

P(C_alpha | X_1, ..., X_N) = gamma * P(C_alpha) * prod_{i=1}^{N} P(X_i | C_alpha)    (541)

gamma = 1 / P(X_1, ..., X_N)                                     (542)

Bernoulli Likelihood

For the Bernoulli model, the conditional probability of feature vector X_i given class C_alpha:

P(X_i | C_alpha) = prod_{a=1}^{M} Q_{ia alpha}                   (543)

Where:

Q_{ia alpha} = P(w_a | C_alpha),      if X_{ia} = 1              (544)
Q_{ia alpha} = 1 - P(w_a | C_alpha),  if X_{ia} = 0              (545)

The conditional probabilities P(w_a | C_alpha) are estimated from word occurrence frequencies in the training data. Similarly, P(C_alpha) is estimated from the training data class frequencies.
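Estimating P(C_alpha) and P(w_a | C_alpha) from a labeled training set can be sketched as follows. Laplace smoothing (the pseudo-count mentioned in the Notes) is applied so that words unseen in a class do not produce zero probabilities; the function layout here is an illustrative choice, not a fixed API.

```python
def estimate_parameters(X, labels, num_classes):
    """
    X: list of binary feature M-vectors (Bernoulli model).
    labels: class index (0..K-1) for each tweet.
    Returns (priors, cond) where priors[alpha] estimates P(C_alpha)
    and cond[alpha][a] estimates P(w_a | C_alpha).
    """
    N, M = len(X), len(X[0])
    priors, cond = [], []
    for alpha in range(num_classes):
        rows = [x for x, y in zip(X, labels) if y == alpha]
        priors.append(len(rows) / N)
        # Laplace smoothing: add 1 to the numerator and 2 to the
        # denominator (two possible Bernoulli outcomes per word).
        cond.append(
            [(sum(r[a] for r in rows) + 1) / (len(rows) + 2) for a in range(M)]
        )
    return priors, cond
```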

Prediction Rule

Set the forecasted class C_pred to the one with maximum posterior probability:

C_pred = argmax_{C_alpha in {1,...,K}}  P(C_alpha) * prod_{i=1}^{N} prod_{a=1}^{M} [P(w_a | C_alpha)]^{X_{ia}} * [1 - P(w_a | C_alpha)]^{1 - X_{ia}}    (546)
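In practice the argmax in (546) is evaluated in log space, since multiplying many small probabilities underflows floating point. A minimal sketch, assuming priors[alpha] = P(C_alpha) and cond[alpha][a] = P(w_a | C_alpha) have already been estimated:

```python
import math

def predict_class(x_new, priors, cond):
    """
    Return the class alpha maximizing the Bernoulli naive Bayes
    posterior of Eq. (546), computed as a sum of logs.
    x_new: binary feature M-vector for a new tweet.
    """
    best_alpha, best_score = None, -math.inf
    for alpha, prior in enumerate(priors):
        score = math.log(prior)
        for x, p in zip(x_new, cond[alpha]):
            # log of P(w_a|C)^x * (1 - P(w_a|C))^(1-x)
            score += math.log(p) if x else math.log(1.0 - p)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

With priors [0.5, 0.5] and cond = [[0.9, 0.1], [0.1, 0.9]], the vector [1, 0] is assigned class 0 and [0, 1] class 1.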

Trading Signal

  • For K = 2: C_pred = 1 → Sell; C_pred = 2 → Buy (consistent with ANN signal convention)
  • For K quantiles: trade on extreme quantile predictions analogously to the ANN strategy
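For the binary case, mapping the predicted class to an order is direct; the 0-based class indices below are an illustrative convention matching the lower class to Sell and the upper class to Buy:

```python
def signal(predicted_class):
    # K = 2 convention: lower class -> Sell, upper class -> Buy.
    return "Sell" if predicted_class == 0 else "Buy"
```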

Return Profile / Objective

Returns are driven by the predictive content of Twitter sentiment about short-term BTC price direction. The strategy profits when aggregate social media tone (bullish vs. bearish language in the vocabulary) correlates with subsequent price movements. High crypto volatility means that even modest accuracy gains over random can generate meaningful returns.

Key Parameters / Signals

  • Vocabulary V: the set of keywords relevant to BTC price forecasting; quality of V critically affects performance
  • K: number of outcome classes (2 for binary up/down, or quantile-based)
  • M = |V|: vocabulary size; larger vocabulary increases model complexity and overfitting risk
  • Training window: timeframe for estimating P(C_alpha) and P(w_a | C_alpha)
  • Stemming algorithm: Porter or similar; affects effective vocabulary size
  • Stop-word list: removal of common non-informative words reduces noise

Variations

  • Multinomial naïve Bayes: uses word count n_{ia} instead of binary presence; can better capture emphasis
  • Support vector machine (SVM): alternative classifier on the same feature vectors
  • Logistic regression: another popular alternative to naïve Bayes for text classification
  • Tree boosting: gradient boosting applied to tweet feature vectors for BTC direction prediction
  • Multi-source sentiment: extend beyond Twitter to Reddit (r/bitcoin), news feeds, Telegram channels

Notes

The naïve (conditional independence) assumption is rarely exactly true but often works well in practice for text classification. The key challenge is vocabulary construction — the learning vocabulary V must be chosen to be informative about BTC price movements specifically, not just generally related to Bitcoin. Laplace smoothing (adding a pseudo-count) is typically applied to avoid zero probabilities for words not observed in a given class. Compared to the ANN strategy, naïve Bayes is more interpretable and computationally cheaper to train. Social media strategies are vulnerable to coordinated manipulation (pumping sentiment with bots) and regime changes in platform usage patterns.