| description | tags |
|---|---|
| A naïve Bayes Bernoulli classifier applied to Twitter sentiment data to forecast BTC price direction, generating buy/sell signals from keyword-presence feature vectors. | |
Sentiment Analysis — Naïve Bayes Bernoulli
Section: 18.3 | Asset Class: Cryptocurrencies | Type: Machine learning / NLP sentiment
Overview
This strategy applies a social media sentiment analysis classification scheme to forecast the direction (or quantile) of BTC price movements based on Twitter data. It uses the naïve Bayes Bernoulli model to classify tweets into outcome classes and generate trading signals. The premise is that aggregate social media sentiment contains predictive information about short-term crypto price movements.
Construction / Mechanics
Data Collection and Preprocessing
- Collect all tweets containing at least one keyword from a pertinent learning vocabulary V over some timeframe
- Clean data: remove duplicate tweets from bots, remove stop-words (e.g., "the", "is", "in", "which"), perform stemming (reduce words to base forms, e.g., "investing" and "invested" → "invest")
- Stemming can be performed using the Porter stemming algorithm
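A minimal sketch of this cleaning pipeline; the stop-word list and example tweets are placeholders, and NLTK's PorterStemmer stands in for the Porter algorithm mentioned above:

```python
import re
from nltk.stem import PorterStemmer  # Porter stemming algorithm

STOP_WORDS = {"the", "is", "in", "which", "a", "an", "of", "to"}  # illustrative list
stemmer = PorterStemmer()

def preprocess(tweets):
    """Deduplicate, lowercase, strip stop-words, and stem each tweet."""
    seen, cleaned = set(), []
    for text in tweets:
        if text in seen:          # drop exact duplicates (e.g., bot reposts)
            continue
        seen.add(text)
        tokens = re.findall(r"[a-z']+", text.lower())
        stems = [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]
        cleaned.append(stems)
    return cleaned

# preprocess(["Investing in BTC is smart", "Invested early, the gains!"])
# -> [['invest', 'btc', 'smart'], ['invest', 'earli', 'gain']]
```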
Let:
- M = |V| = number of keywords in the learning vocabulary
- N = number of tweets in the dataset
- i = 1, ..., N labels tweets
- a = 1, ..., M labels words w_a in V
Feature Vector Construction (Bernoulli Model)
Assign a feature M-vector X_i to each tweet i:
Bernoulli (binary presence/absence):
X_{ia} = 0 if word w_a not present in tweet T_i
X_{ia} = 1 if word w_a is present in tweet T_i (Bernoulli)
Alternative (multinomial): X_{ia} = n_{ia}, the number of times w_a appears in T_i.
The Bernoulli case is the focus of this strategy.
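A minimal sketch of this feature map, assuming `tweets` holds preprocessed token lists and `vocabulary` the stemmed keywords of V:

```python
def bernoulli_features(tweets, vocabulary):
    """Return one binary M-vector X_i per tweet: X_ia = 1 iff w_a appears in T_i."""
    index = {w: a for a, w in enumerate(vocabulary)}   # word w_a -> column a
    X = []
    for tokens in tweets:
        x = [0] * len(vocabulary)
        for t in set(tokens):                          # presence/absence only
            if t in index:
                x[index[t]] = 1
        X.append(x)
    return X
```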
Classification Framework
Define K outcome classes C_alpha, alpha = 1, ..., K:
- Simplest case: K = 2 (BTC goes up or down) — provides buy/sell signal
- Alternative: K quantiles of the normalized return R_hat(t) (as in the ANN strategy, Section 18.2)
Goal: given the N feature vectors X_1, ..., X_N, predict class C_alpha.
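The source does not spell out the labeling step; as a hypothetical sketch, each training period's normalized return R_hat(t) could be bucketed into K quantile classes with pandas:

```python
import pandas as pd

def quantile_classes(r_hat: pd.Series, K: int) -> pd.Series:
    """Assign class labels 1..K by K-quantile bucket of the normalized return."""
    return pd.qcut(r_hat, q=K, labels=False) + 1   # qcut codes run 0..K-1

# For K = 2 this splits at the median; a sign-based split (R_hat < 0 -> class 1,
# R_hat >= 0 -> class 2) is an equally natural down/up convention.
```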
Bayesian Foundation
By Bayes' theorem:
P(A|B) = P(B|A) * P(A) / P(B) (538)
The posterior probability of class C_alpha given features X_1, ..., X_N:
P(C_alpha | X_1, ..., X_N) = P(X_1, ..., X_N | C_alpha) * P(C_alpha) / P(X_1, ..., X_N) (539)
Note P(X_1, ..., X_N) is independent of C_alpha and acts only as a normalization constant.
Naïve (Conditional Independence) Assumption
The naïve Bayes simplification assumes that for a given class C_alpha, all features X_i are conditionally independent:
P(X_i | C_alpha, X_1, ..., X_{i-1}, X_{i+1}, ..., X_N) = P(X_i | C_alpha) (540)
This gives:
P(C_alpha | X_1, ..., X_N) = gamma * P(C_alpha) * prod_{i=1}^{N} P(X_i | C_alpha) (541)
gamma = 1 / P(X_1, ..., X_N) (542)
Bernoulli Likelihood
For the Bernoulli model, the conditional probability of feature vector X_i given class C_alpha:
P(X_i | C_alpha) = prod_{a=1}^{M} Q_{ia alpha} (543)
Where:
Q_{ia alpha} = P(w_a | C_alpha), if X_{ia} = 1 (544)
Q_{ia alpha} = 1 - P(w_a | C_alpha), if X_{ia} = 0 (545)
The conditional probabilities P(w_a | C_alpha) are estimated from word occurrence frequencies in the training data. Similarly, P(C_alpha) is estimated from the training data class frequencies.
Prediction Rule
Set the forecasted class C_pred to the one with maximum posterior probability:
C_pred = argmax_{C_alpha in {1,...,K}} P(C_alpha) * prod_{i=1}^{N} prod_{a=1}^{M} [P(w_a | C_alpha)]^{X_{ia}} * [1 - P(w_a | C_alpha)]^{1 - X_{ia}} (546)
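Rule (546) is usually evaluated in log space to avoid numerical underflow. A minimal sketch, assuming every class appears in the training data and applying the Laplace smoothing mentioned in the Notes; all names are illustrative:

```python
import math

def train_bernoulli_nb(X, y, K, M):
    """Estimate P(C_alpha) and P(w_a | C_alpha) from training frequencies."""
    N = len(X)
    n_class = [0] * K                     # tweets per class
    n_word = [[0] * M for _ in range(K)]  # tweets per class containing w_a
    for x, c in zip(X, y):                # y holds class labels in {1, ..., K}
        n_class[c - 1] += 1
        for a in range(M):
            n_word[c - 1][a] += x[a]
    prior = [n_class[k] / N for k in range(K)]   # assumes every class occurs
    # Laplace smoothing: (count + 1) / (N_alpha + 2) avoids zero probabilities
    p_word = [[(n_word[k][a] + 1) / (n_class[k] + 2) for a in range(M)]
              for k in range(K)]
    return prior, p_word

def predict(X_new, prior, p_word):
    """argmax over alpha of log P(C_alpha) plus Bernoulli log-likelihoods (546)."""
    K, M = len(prior), len(p_word[0])
    scores = []
    for k in range(K):
        s = math.log(prior[k])
        for x in X_new:                   # product over tweets i
            for a in range(M):            # product over words a
                p = p_word[k][a]
                s += math.log(p) if x[a] else math.log(1 - p)
        scores.append(s)
    return scores.index(max(scores)) + 1  # C_pred in {1, ..., K}
```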
Trading Signal
- For K = 2: C_pred = 1 → Sell; C_pred = 2 → Buy (consistent with ANN signal convention)
- For K quantiles: trade on extreme quantile predictions analogously to the ANN strategy
Return Profile / Objective
Returns are driven by the predictive content of Twitter sentiment about short-term BTC price direction. The strategy profits when aggregate social media tone (bullish vs. bearish language in the vocabulary) correlates with subsequent price movements. High crypto volatility means that even modest accuracy gains over random can generate meaningful returns.
Key Parameters / Signals
- Vocabulary V: the set of keywords relevant to BTC price forecasting; quality of V critically affects performance
- K: number of outcome classes (2 for binary up/down, or quantile-based)
- M = |V|: vocabulary size; larger vocabulary increases model complexity and overfitting risk
- Training window: timeframe for estimating P(C_alpha) and P(w_a | C_alpha)
- Stemming algorithm: Porter or similar; affects effective vocabulary size
- Stop-word list: removal of common non-informative words reduces noise
Variations
- Multinomial naïve Bayes: uses word counts n_{ia} instead of binary presence; can better capture emphasis
- Support vector machine (SVM): alternative classifier on the same feature vectors
- Logistic regression: another popular alternative to naïve Bayes for text classification
- Tree boosting: gradient boosting applied to tweet feature vectors for BTC direction prediction
- Multi-source sentiment: extend beyond Twitter to Reddit (r/bitcoin), news feeds, Telegram channels
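None of these variations is prescribed in detail by the source; as a sketch, the same feature vectors plug directly into off-the-shelf scikit-learn classifiers (toy data shown for illustration):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50))   # toy binary tweet-feature matrix
y = rng.integers(1, 3, size=200)         # toy class labels (1 = down, 2 = up)

for model in (BernoulliNB(),             # the Bernoulli model of this strategy
              MultinomialNB(),           # count-based variation (use n_ia features)
              LogisticRegression(max_iter=1000)):
    model.fit(X, y)
    print(type(model).__name__, model.predict(X[:3]))
```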
Notes
The naïve (conditional independence) assumption is rarely exactly true but often works well in practice for text classification. The key challenge is vocabulary construction — the learning vocabulary V must be chosen to be informative about BTC price movements specifically, not just generally related to Bitcoin. Laplace smoothing (adding a pseudo-count) is typically applied to avoid zero probabilities for words not observed in a given class. Compared to the ANN strategy, naïve Bayes is more interpretable and computationally cheaper to train. Social media strategies are vulnerable to coordinated manipulation (pumping sentiment with bots) and regime changes in platform usage patterns.
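Concretely, with Laplace smoothing the Bernoulli estimates take the standard form (not spelled out in the text above):

P(w_a | C_alpha) ≈ (n_{a alpha} + 1) / (N_alpha + 2)

where n_{a alpha} is the number of training tweets in class C_alpha containing w_a, N_alpha is the number of training tweets in class C_alpha, and the 2 in the denominator reflects the two Bernoulli outcomes (present/absent).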