---
description: "A naïve Bayes Bernoulli classifier applied to Twitter sentiment data to forecast BTC price direction, generating buy/sell signals from keyword-frequency feature vectors."
tags: [crypto, machine-learning, nlp, sentiment, bitcoin, naive-bayes]
---

# Sentiment Analysis — Naïve Bayes Bernoulli

**Section**: 18.3 | **Asset Class**: Cryptocurrencies | **Type**: Machine learning / NLP sentiment

## Overview

This strategy applies a social media sentiment analysis classification scheme to forecast the direction (or quantile) of BTC price movements based on Twitter data. It uses the naïve Bayes Bernoulli model to map tweet feature vectors to outcome classes and generate trading signals. The premise is that aggregate social media sentiment contains predictive information about short-term crypto price movements.

## Construction / Mechanics

### Data Collection and Preprocessing

1. Collect all tweets containing at least one keyword from a pertinent learning vocabulary `V` over some timeframe
2. Clean the data: remove duplicate tweets from bots, remove stop-words (e.g., "the", "is", "in", "which"), and perform stemming (reduce words to their base forms, e.g., "investing" and "invested" → "invest")
3. Stemming can be performed using the Porter stemming algorithm

Let:

- `M = |V|` = number of keywords in the learning vocabulary
- `N` = number of tweets in the dataset
- `i = 1, ..., N` labels tweets
- `a = 1, ..., M` labels words `w_a` in `V`

### Feature Vector Construction (Bernoulli Model)

Assign a feature M-vector `X_i` to each tweet `i`:

**Bernoulli (binary presence/absence):**

```
X_{ia} = 0 if word w_a is not present in tweet T_i
X_{ia} = 1 if word w_a is present in tweet T_i       (Bernoulli)
```

Alternative (multinomial): `X_{ia} = n_{ia}`, the number of times `w_a` appears in `T_i`. The Bernoulli case is the focus of this strategy.
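The preprocessing and feature-construction steps above can be sketched as follows. This is a minimal illustration, not the strategy's actual pipeline: the stop-word list, the crude suffix-stripping stand-in for the Porter stemmer, and the toy vocabulary are all assumptions made for the example.

```python
# Toy preprocessing pipeline + Bernoulli feature vectors (steps 1-3 above).
# STOP_WORDS and stem() are illustrative stand-ins; a real pipeline would
# use a full stop-word list, the Porter stemmer, and bot deduplication.

STOP_WORDS = {"the", "is", "in", "which", "a", "to", "and"}

def stem(word):
    # Crude stand-in for the Porter algorithm: strip a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(tweet):
    # Lowercase, strip punctuation, drop stop-words, then stem.
    tokens = [w.strip(".,!?#@").lower() for w in tweet.split()]
    return [stem(w) for w in tokens if w and w not in STOP_WORDS]

def bernoulli_features(tweet, vocabulary):
    # X_ia = 1 if stemmed word w_a appears in tweet T_i, else 0.
    present = set(tokenize(tweet))
    return [1 if w in present else 0 for w in vocabulary]

V = ["invest", "moon", "crash", "sell", "hodl"]  # toy vocabulary
print(bernoulli_features("Investing in BTC before it moons!", V))  # [1, 1, 0, 0, 0]
```

Note that "investing" and "moons" both map onto vocabulary entries only because of stemming; without it, the feature vector would be all zeros for this tweet.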
### Classification Framework

Define `K` outcome classes `C_alpha`, `alpha = 1, ..., K`:

- Simplest case: `K = 2` (BTC goes up or down) — provides a buy/sell signal
- Alternative: `K` quantiles of the normalized return `R_hat(t)` (as in the ANN strategy, Section 18.2)

**Goal:** given the `N` feature vectors `X_1, ..., X_N`, predict the class `C_alpha`.

### Bayesian Foundation

By Bayes' theorem:

```
P(A|B) = P(B|A) * P(A) / P(B)                                    (538)
```

The posterior probability of class `C_alpha` given the features `X_1, ..., X_N`:

```
P(C_alpha | X_1, ..., X_N) = P(X_1, ..., X_N | C_alpha) * P(C_alpha) / P(X_1, ..., X_N)   (539)
```

Note that `P(X_1, ..., X_N)` is independent of `C_alpha` and acts only as a normalization constant.

### Naïve (Conditional Independence) Assumption

The naïve Bayes simplification assumes that for a given class `C_alpha`, all features `X_i` are conditionally independent:

```
P(X_i | C_alpha, X_1, ..., X_{i-1}, X_{i+1}, ..., X_N) = P(X_i | C_alpha)   (540)
```

This gives:

```
P(C_alpha | X_1, ..., X_N) = gamma * P(C_alpha) * prod_{i=1}^{N} P(X_i | C_alpha)   (541)
gamma = 1 / P(X_1, ..., X_N)                                                        (542)
```

### Bernoulli Likelihood

For the Bernoulli model, the conditional probability of feature vector `X_i` given class `C_alpha`:

```
P(X_i | C_alpha) = prod_{a=1}^{M} Q_{ia alpha}                   (543)
```

Where:

```
Q_{ia alpha} = P(w_a | C_alpha),      if X_{ia} = 1              (544)
Q_{ia alpha} = 1 - P(w_a | C_alpha),  if X_{ia} = 0              (545)
```

The conditional probabilities `P(w_a | C_alpha)` are estimated from word occurrence frequencies in the training data. Similarly, `P(C_alpha)` is estimated from the training data class frequencies.
### Prediction Rule

Set the forecasted class `C_pred` to the one with maximum posterior probability:

```
C_pred = argmax_{alpha in {1,...,K}} P(C_alpha) * prod_{i=1}^{N} prod_{a=1}^{M} [P(w_a | C_alpha)]^{X_{ia}} * [1 - P(w_a | C_alpha)]^{1 - X_{ia}}   (546)
```

### Trading Signal

- For `K = 2`: `C_pred = 1` → Sell; `C_pred = 2` → Buy (consistent with the ANN signal convention)
- For `K` quantiles: trade on extreme quantile predictions analogously to the ANN strategy

## Return Profile / Objective

Returns are driven by the predictive content of Twitter sentiment about short-term BTC price direction. The strategy profits when aggregate social media tone (bullish vs. bearish language in the vocabulary) correlates with subsequent price movements. High crypto volatility means that even modest accuracy gains over random can generate meaningful returns.

## Key Parameters / Signals

- **Vocabulary `V`**: the set of keywords relevant to BTC price forecasting; the quality of `V` critically affects performance
- **`K`**: number of outcome classes (2 for binary up/down, or quantile-based)
- **`M = |V|`**: vocabulary size; a larger vocabulary increases model complexity and overfitting risk
- **Training window**: timeframe for estimating `P(C_alpha)` and `P(w_a | C_alpha)`
- **Stemming algorithm**: Porter or similar; affects effective vocabulary size
- **Stop-word list**: removal of common non-informative words reduces noise

## Variations

- **Multinomial naïve Bayes**: uses word counts `n_{ia}` instead of binary presence; can better capture emphasis
- **Support vector machine (SVM)**: alternative classifier on the same feature vectors
- **Logistic regression**: another popular alternative to naïve Bayes for text classification
- **Tree boosting**: gradient boosting applied to tweet feature vectors for BTC direction prediction
- **Multi-source sentiment**: extend beyond Twitter to Reddit (r/bitcoin), news feeds, Telegram channels

## Notes

The naïve (conditional independence) assumption is rarely exactly true but often works well in practice for text classification. The key challenge is vocabulary construction — the learning vocabulary `V` must be chosen to be informative about BTC price movements specifically, not just generally related to Bitcoin. Laplace smoothing (adding a pseudo-count) is typically applied to avoid zero probabilities for words not observed in a given class. Compared to the ANN strategy, naïve Bayes is more interpretable and computationally cheaper to train. Social media strategies are vulnerable to coordinated manipulation (pumping sentiment with bots) and regime changes in platform usage patterns.
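The prediction rule (546) and the `K = 2` signal convention can be sketched as follows. A minimal illustration, assuming `class_prior` and `word_prob` have already been estimated from training data (with smoothing, so no probability is exactly 0 or 1); the function and parameter names are hypothetical. The product in (546) over many small probabilities underflows quickly, so the score is accumulated in log space, which preserves the argmax.

```python
import math

# X: list of N Bernoulli feature vectors (the product over i in Eq. 546).
# class_prior[alpha] = P(C_alpha); word_prob[alpha][a] = P(w_a | C_alpha).

def predict_class(X, class_prior, word_prob):
    K = len(class_prior)
    scores = []
    for alpha in range(K):
        # log of P(C_alpha) * prod_i prod_a [...] from Eq. (546)
        s = math.log(class_prior[alpha])
        for x in X:                       # product over tweets i
            for a, x_ia in enumerate(x):  # product over words a
                q = word_prob[alpha][a]
                s += math.log(q) if x_ia == 1 else math.log(1.0 - q)
        scores.append(s)
    return max(range(K), key=scores.__getitem__) + 1  # class label in 1..K

def signal(c_pred):
    # K = 2 convention from the text: class 1 -> Sell, class 2 -> Buy.
    return {1: "Sell", 2: "Buy"}[c_pred]
```

For example, with priors `[0.5, 0.5]` and word probabilities `[[0.9, 0.1], [0.1, 0.9]]`, a batch of tweets whose vectors are `[1, 0]` scores highest under class 1, yielding a Sell signal.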