Chapter 1: Bayesian Inference
Bayesian inference revolves around the concept of using probability distributions to represent our beliefs about uncertain events or parameters. These beliefs are not static but evolve as we acquire new data.
Understanding through an example:
Imagine you're a detective trying to solve a mystery. You start with some initial ideas (suspects) about who might have committed the crime. This is like your prior belief in Bayesian inference.
Now, you start gathering clues (evidence). Each clue either strengthens or weakens your suspicion about each suspect. For example, finding a fingerprint at the scene might strongly suggest one suspect is guilty, while discovering an alibi for another suspect might make them seem less likely. This is like the likelihood in Bayesian inference.
As you gather more and more clues, you keep updating your belief about who the culprit is. You don't throw away your initial ideas; you let the evidence reshape them into a more informed conclusion. This process of continually refining your belief based on new evidence is the heart of Bayesian inference.
The final verdict you reach (who you believe is the most likely culprit) is your posterior belief. It's a combination of your initial hunch and all the clues you've gathered along the way.
Analogy: The Cookie Jar
Let's use a simple analogy to make this even clearer. Imagine two cookie jars:
- Jar A: 90% chocolate chip, 10% oatmeal
- Jar B: 50% chocolate chip, 50% oatmeal
You pick one of the jars at random, without knowing which is which, and draw a cookie from it. It's chocolate chip. Does this mean you definitely picked from Jar A? Not necessarily! It's still possible the cookie came from Jar B, just less likely.
Now, let's use Bayesian inference to figure this out:
- Prior Belief: Before you even pick a cookie, your prior belief is that there's a 50/50 chance you'll pick from either jar.
- Likelihood: The fact that you got a chocolate chip cookie is the evidence. This evidence is more likely if you picked from Jar A (90% chocolate chip) than from Jar B (50% chocolate chip).
- Posterior Belief: Combining your prior belief and the likelihood, you can now update your belief about which jar you picked from. Since chocolate chip cookies are more common in Jar A, your posterior belief leans towards Jar A: (0.9 × 0.5) / (0.9 × 0.5 + 0.5 × 0.5) ≈ 0.64, or about a 64% chance. It's not 100% certain because there's still a chance the cookie came from Jar B.
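The cookie jar update can be sketched in a few lines of Python. This is a minimal illustration, not a general-purpose library: the function name `update` and the dictionary layout are assumptions made for this example. It multiplies each prior by its likelihood and then rescales so the results sum to 1.

```python
# Hypotheses: which jar the cookie came from (50/50 before drawing).
priors = {"Jar A": 0.5, "Jar B": 0.5}
# Probability of drawing a chocolate chip cookie from each jar.
likelihoods = {"Jar A": 0.9, "Jar B": 0.5}


def update(priors, likelihoods):
    """Return the posterior over hypotheses after one observation."""
    # Prior times likelihood for every hypothesis.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Total probability of the observation (a chocolate chip cookie).
    evidence = sum(unnormalized.values())
    # Rescale so the posterior probabilities sum to 1.
    return {h: value / evidence for h, value in unnormalized.items()}


posterior = update(priors, likelihoods)
for jar, probability in posterior.items():
    print(f"{jar}: {probability:.3f}")
# Jar A: 0.643
# Jar B: 0.357
```

Because the function returns a dictionary shaped like `priors`, its output could be fed back in as the prior for a second cookie, which mirrors the idea of updating beliefs as evidence accumulates.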
Bayes' Theorem:
Bayes' Theorem is more than just a formula; it's a way of thinking about how we update our beliefs based on new evidence. It reflects the idea that our initial beliefs (prior probabilities) can be modified as we gather more information (likelihoods) to arrive at a more accurate understanding (posterior probabilities).
A Deeper Look at the Formula
Let's look at the formula with a detailed explanation of each component:
P(A|B) = [P(B|A) * P(A)] / P(B)
- P(A|B): The Posterior Probability
  This is the updated probability of event A happening after we know that event B has occurred. In other words, it's our revised belief about A given the new evidence B. Think of it as the "after" picture: How has our perspective on A changed after learning about B?
- P(B|A): The Likelihood
  This is the probability of observing the evidence (B) if the hypothesis (A) is true. It measures how well the evidence supports the hypothesis. Think of it as a compatibility check: How likely is it that we would see B if A were the case?
- P(A): The Prior Probability
  This is our initial belief about the probability of event A happening before we consider any new evidence. Think of it as the "before" picture: What did we think about A before we knew anything about B?
- P(B): The Marginal Likelihood (or Evidence)
  This is the overall probability of observing the evidence (B), regardless of whether the hypothesis (A) is true or not. It acts as a normalizing factor, ensuring that the posterior probability is valid. Think of it as the background information: How likely is B to happen in general?
Calculating P(B): The Law of Total Probability
The calculation of P(B) often requires using the law of total probability, which states that the probability of an event can be found by summing the probabilities of that event occurring under each possible condition. In the context of Bayes' Theorem:
P(B) = [P(B|A) * P(A)] + [P(B|not A) * P(not A)]
This means that the probability of observing B is the sum of:
- The probability of B happening when A is true, weighted by the prior probability of A.
- The probability of B happening when A is false, weighted by the prior probability of not A.
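For a yes/no hypothesis, the two formulas above fit into one small helper. The sketch below is illustrative only: the function name `bayes_posterior` and the numbers in the usage line are hypothetical and do not come from any example in this chapter.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) for a binary hypothesis A and observed evidence B."""
    p_not_a = 1.0 - prior_a
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * p_not_a
    if p_b == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined.")
    # Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2
print(f"{bayes_posterior(0.3, 0.8, 0.2):.3f}")  # 0.632
```

The guard against a zero denominator matters in practice: if B can never happen under either condition, there is nothing to condition on.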
Example 1: The Super Rare Disease
Imagine there's a super rare disease called "Zombitis" that only affects 1 out of every 1000 people (0.1%). Scientists have developed a test for Zombitis, but it's not perfect:
- True Positive: If someone has Zombitis, the test will correctly say they have it 99% of the time.
- False Positive: If someone doesn't have Zombitis, the test will incorrectly say they do have it 5% of the time.
The Question: You get tested for Zombitis, and the result is positive. Should you panic? What are the chances you actually have the disease?
Intuitive (But Incorrect) Thinking
It's tempting to think, "The test is 99% accurate, so I must have a 99% chance of having Zombitis!" But that's wrong. Here's why:
The test's accuracy is only half of the picture. What a positive result means also depends on how rare the disease is. Most people don't have Zombitis, so even a small false positive rate produces far more false positives among the many healthy people than true positives among the few sick ones.
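One way to see this is to count people instead of working with probabilities. The sketch below assumes a hypothetical population of 100,000, chosen only because it gives whole numbers; the rates are the ones stated in the scenario.

```python
population = 100_000  # hypothetical size, picked for round numbers

sick = population // 1000       # 1 in 1000 has Zombitis: 100 people
healthy = population - sick     # everyone else: 99,900 people

true_positives = sick * 99 // 100       # 99% of sick people test positive
false_positives = healthy * 5 // 100    # 5% of healthy people test positive

all_positives = true_positives + false_positives
share_sick = true_positives / all_positives

print(f"True positives:  {true_positives}")    # 99
print(f"False positives: {false_positives}")   # 4995
print(f"Share of positives who are sick: {share_sick:.4f}")  # 0.0194
```

Roughly fifty healthy people test positive for every sick person who does, which is why a positive result alone is weak evidence here.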
Using Bayes' Theorem to Find the Real Answer
Let's break down how Bayes' Theorem helps us calculate the actual probability you have Zombitis:
- Define the Events:
  A = You have Zombitis.
  B = The test result is positive.
- Know the Probabilities (From the Scenario):
  P(A) = 0.001 (The probability you have Zombitis is 0.1%)
  P(B|A) = 0.99 (The probability of a positive test given you have Zombitis is 99%)
  P(B|not A) = 0.05 (The probability of a positive test given you don't have Zombitis is 5%)
- Calculate P(not A):
  P(not A) = 1 - P(A) = 0.999 (The probability you don't have Zombitis is 99.9%)
- Calculate P(B) (The Total Probability of a Positive Test): This combines the probability of a true positive and a false positive.
  P(B) = [P(B|A) * P(A)] + [P(B|not A) * P(not A)]
  P(B) = (0.99 * 0.001) + (0.05 * 0.999)
  P(B) = 0.00099 + 0.04995
  P(B) = 0.05094 (About a 5.1% chance of getting a positive test, regardless of whether you have Zombitis or not)
- Apply Bayes' Theorem:
  P(A|B) = [P(B|A) * P(A)] / P(B)
  P(A|B) = (0.99 * 0.001) / 0.05094
  P(A|B) ≈ 0.0194 (About a 1.9% chance you actually have Zombitis given a positive test result)
The Big Reveal
Even though the test seems very accurate, your chance of actually having Zombitis after a positive test is only about 1.9%. Because the disease is so rare, the false positives among healthy people far outnumber the true positives.
Key Takeaway
Bayes' Theorem helps us understand that the probability of something being true isn't just based on the test result itself, but also on how common or rare that thing is in the first place.
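To see how strongly the answer depends on the base rate, the sketch below keeps the Zombitis test's error rates fixed and varies how common the disease is. The function name `posterior_given_positive` and the alternative prevalence values are assumptions for illustration; only the 0.1% case comes from the scenario.

```python
def posterior_given_positive(prevalence, sensitivity=0.99, false_positive_rate=0.05):
    """Return P(disease | positive test) for a given prevalence."""
    # Total probability of a positive test across sick and healthy people.
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# 0.001 is the scenario's prevalence; the others are hypothetical comparisons.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    result = posterior_given_positive(prevalence)
    print(f"prevalence {prevalence:.3f} -> P(disease | positive) = {result:.3f}")
```

With the same test, the probability of disease after a positive result climbs from about 2% at a prevalence of 1 in 1000 to about 95% when half the population is affected.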
Example 2: The Suspicious Email
Imagine you have a spam filter for your email. It's pretty good, but not perfect:
- 95% of spam emails are correctly identified as spam.
- 1% of legitimate emails are mistakenly identified as spam (false positives).
- Overall, about 20% of your emails are spam.
The Question: You get an email flagged as spam. What's the probability that it's actually spam?
Using Bayes' Theorem to Find the Real Answer
- Define the Events:
  A = The email is spam.
  B = The email is flagged as spam.
- Know the Probabilities (From the Scenario):
  P(A) = 0.20 (The probability an email is spam is 20%)
  P(B|A) = 0.95 (The probability of being flagged as spam given it is spam is 95%)
  P(B|not A) = 0.01 (The probability of being flagged as spam given it's not spam is 1%)
- Calculate P(not A):
  P(not A) = 1 - P(A) = 0.80 (The probability an email is not spam is 80%)
- Calculate P(B) (The Total Probability of an Email Being Flagged as Spam): This combines the probability of a true positive (a spam email correctly flagged) and a false positive (a legitimate email incorrectly flagged).
  P(B) = [P(B|A) * P(A)] + [P(B|not A) * P(not A)]
  P(B) = (0.95 * 0.20) + (0.01 * 0.80)
  P(B) = 0.19 + 0.008
  P(B) = 0.198 (About a 19.8% chance of an email being flagged as spam, regardless of whether it actually is spam or not)
- Apply Bayes' Theorem:
  P(A|B) = [P(B|A) * P(A)] / P(B)
  P(A|B) = (0.95 * 0.20) / 0.198
  P(A|B) ≈ 0.9596 (About a 96% chance the email is actually spam given that it was flagged)
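A quick simulation offers an independent check on this calculation. The sketch below generates a large batch of imaginary emails using the rates from the scenario and counts how many flagged emails are really spam. The function name, batch size, and seed are arbitrary choices for illustration, and the estimate will vary slightly with them.

```python
import random


def simulate_spam_filter(n_emails=200_000, seed=0):
    """Estimate P(spam | flagged) by simulating the filter on random emails."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    flagged = 0
    flagged_and_spam = 0
    for _ in range(n_emails):
        is_spam = rng.random() < 0.20            # 20% of emails are spam
        p_flag = 0.95 if is_spam else 0.01       # flag rate depends on the truth
        if rng.random() < p_flag:
            flagged += 1
            flagged_and_spam += is_spam          # True counts as 1
    return flagged_and_spam / flagged


estimate = simulate_spam_filter()
print(f"Simulated P(spam | flagged): {estimate:.3f}")
```

The simulated share should land close to the 0.9596 obtained from Bayes' Theorem, with the gap shrinking as the number of simulated emails grows.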
The Big Reveal
Because the filter rarely flags legitimate emails (1%) and spam is fairly common to begin with (20% of your mail), the probability that a flagged email is actually spam is about 96%. Unlike the Zombitis test, the base rate here works in favor of the flag.
Bayesian vs Frequentist
Imagine a pharmaceutical company has developed a new drug to treat a certain disease. They conduct a clinical trial to assess its effectiveness.
Classical Inference (Frequentist Approach)
- Hypothesis Testing: The researchers set up a null hypothesis (H0) that the drug has no effect and an alternative hypothesis (Ha) that the drug is effective.
- Data Collection: They collect data from the trial, typically measuring some outcome like the reduction in disease symptoms.
- Statistical Test: They perform a statistical test (e.g., t-test) to calculate a p-value, which represents the probability of observing the data or more extreme results if the null hypothesis were true.
- Decision: If the p-value is below a pre-determined threshold (often 0.05), they reject the null hypothesis and conclude that the data provide evidence that the drug has an effect.
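The steps above can be sketched with a small worked case. Note the substitutions: instead of a t-test, this sketch uses an exact one-sided binomial test, which suits a simple improved/not-improved outcome and needs only the standard library. The trial data (62 of 100 patients improving) and the 50% improvement rate under the null hypothesis are hypothetical.

```python
from math import comb


def binomial_p_value(successes, n, p_null):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical trial: 62 of 100 patients improve.
# Hypothetical null hypothesis: 50% would improve without the drug.
p_value = binomial_p_value(successes=62, n=100, p_null=0.5)
alpha = 0.05  # pre-determined significance threshold

print(f"p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value answers "how surprising would a result at least this extreme be if the drug did nothing?", which is a statement about the data under H0 and not about the probability that the drug works.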
Key Points of Classical Inference:
- Focus on the Null Hypothesis: The primary focus is on rejecting or failing to reject the null hypothesis.
- No Prior Information: Classical inference does not formally incorporate prior knowledge or beliefs about the drug's effectiveness into the analysis.
- Limited Uncertainty Quantification: The p-value only tells us the probability of observing data at least as extreme as ours if the null hypothesis were true, not the probability of the drug being effective.
Bayesian Inference
- Prior Belief: The researchers start with a prior distribution representing their belief about the drug's effectiveness before the trial. This could be based on previous studies, expert opinions, or a skeptical viewpoint.
- Likelihood: They collect data from the trial and use it to construct a likelihood function, which describes how likely the observed data is under different values of the drug's effectiveness.
- Posterior Belief: They combine the prior distribution and the likelihood using Bayes' Theorem to obtain a posterior distribution, which represents their updated belief about the drug's effectiveness after seeing the trial data.
Key Points of Bayesian Inference:
- Focus on Parameter Estimation: The primary focus is on estimating the drug's effectiveness and quantifying uncertainty around that estimate.
- Incorporates Prior Information: Bayesian inference allows for incorporating prior knowledge into the analysis, leading to more informed conclusions.
- Full Uncertainty Quantification: The posterior distribution provides a complete picture of uncertainty about the drug's effectiveness, including credible intervals.
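The prior-to-posterior workflow can be sketched with a Beta-Binomial model, one common choice when the outcome is a success rate. Everything specific here is an assumption for illustration: the uniform Beta(1, 1) prior, the hypothetical trial result (62 of 100 patients improving), and the helper names. The credible interval is approximated by sampling from the posterior with the standard library, so its endpoints are estimates.

```python
import random


def beta_binomial_posterior(prior_alpha, prior_beta, successes, failures):
    """Conjugate update: a Beta prior plus binomial data gives a Beta posterior."""
    return prior_alpha + successes, prior_beta + failures


def credible_interval(alpha, beta, level=0.95, n_samples=100_000, seed=0):
    """Approximate an equal-tailed credible interval by sampling the posterior."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    lower_index = int((1 - level) / 2 * n_samples)
    upper_index = int((1 + level) / 2 * n_samples) - 1
    return samples[lower_index], samples[upper_index]


# Hypothetical prior Beta(1, 1): every improvement rate equally plausible.
# Hypothetical data: 62 patients improved, 38 did not.
alpha, beta = beta_binomial_posterior(1, 1, successes=62, failures=38)
posterior_mean = alpha / (alpha + beta)
low, high = credible_interval(alpha, beta)

print(f"Posterior: Beta({alpha}, {beta})")
print(f"Posterior mean improvement rate: {posterior_mean:.3f}")
print(f"Approximate 95% credible interval: [{low:.3f}, {high:.3f}]")
```

A more skeptical prior, such as one concentrated near the rate expected without the drug, would pull the posterior mean toward that rate, which is how prior information enters the analysis.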
Example: Interpretation of Results
Let's say both approaches find that the drug seems to be effective. However, their interpretations differ:
- Classical Inference: "We reject the null hypothesis that the drug has no effect at the 5% significance level (p < 0.05)."
- Bayesian Inference: "Based on our prior belief and the observed data, we estimate the drug's effectiveness to be X% with a 95% credible interval of [Y%, Z%]."
Key Differences
| Feature | Classical Inference (Frequentist) | Bayesian Inference | 
|---|---|---|
| Focus | Hypothesis testing (rejecting/failing to reject null hypothesis) | Parameter estimation and uncertainty quantification | 
| Prior Information | Not incorporated | Incorporated | 
| Uncertainty Quantification | p-values and confidence intervals; no probability statements about the hypothesis itself | Full posterior distribution and credible intervals |
| Interpretation of Results | Based on statistical significance | Based on probability of the hypothesis given the data | 
| Philosophical Underpinnings | Objective probability based on long-run frequencies | Subjective probability representing degrees of belief |