Statistics for AIML - Regression Metrics - Bayes Theorem Tutorial
- Named after Thomas Bayes
- Bayes' Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
- Note: The event whose probability we want is usually called the hypothesis, and the event we have already observed is the evidence. Bayes' Theorem gives the conditional probability of the hypothesis, that is, the probability that it occurs given that the evidence has already occurred, updated from previous knowledge.
Example:
Suppose the sky is cloudy today and you want to know whether it will rain. You therefore need to calculate the probability of rainfall given the evidence of cloudiness.
WHERE DOES IT COME FROM?
\(P(A|B) = \frac{P(A \cap B)}{P(B)}\) or \(P(B|A) = \frac{P(B \cap A)}{P(A)}\)
Rearranging the equation:
\(P(A \cap B) = P(A|B) \cdot P(B)\)
Similarly,
\(P(B \cap A) = P(B|A) \cdot P(A)\)
Since
\(P(B \cap A) = P(A \cap B)\)
Hence
\(P(B|A) \cdot P(A) = P(A|B) \cdot P(B)\)
Finally,
\(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\)
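As a quick numerical check of the derivation above, the following Python sketch compares the direct definition of P(A|B) with the Bayes form. The three joint probabilities are made-up values chosen only for illustration (think of A as rain and B as cloudy); they do not come from any data.

```python
# Hypothetical joint probabilities, assumed for illustration only.
p_a_and_b = 0.20      # P(A ∩ B): rain and cloudy
p_a_and_not_b = 0.05  # P(A ∩ not B): rain and not cloudy
p_not_a_and_b = 0.30  # P(not A ∩ B): no rain and cloudy

# Marginal probabilities obtained by summing the joint entries.
p_a = p_a_and_b + p_a_and_not_b
p_b = p_a_and_b + p_not_a_and_b

# Direct definition: P(A|B) = P(A ∩ B) / P(B)
p_a_given_b_direct = p_a_and_b / p_b

# Bayes form: P(A|B) = P(B|A) * P(A) / P(B)
p_b_given_a = p_a_and_b / p_a
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a_given_b_direct, p_a_given_b_bayes)
```

Both expressions give the same value (0.4 for these numbers, up to floating-point rounding), which is what the rearrangement predicts.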
Bayes Theorem Generalised Form -
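In its usual textbook statement, for mutually exclusive and exhaustive events \(A_1, A_2, \dots, A_n\) and an event \(B\) with \(P(B) > 0\):
\(P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^{n} P(B|A_j) \cdot P(A_j)}\)
The denominator is \(P(B)\) expanded by the law of total probability. With only two hypotheses, \(A\) and \(A^c\), it reduces to \(P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)\).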
Epidemiologists claim that the probability of breast cancer among Caucasian women in their mid-50s is 0.005. An established test identified people who had breast cancer and those who were healthy. A new mammography test in clinical trials has a probability of 0.85 for detecting cancer correctly. In women without breast cancer, it has a chance of 0.925 for a negative result. If a 55-year-old Caucasian woman tests positive for breast cancer, what is the probability that she, in fact, has breast cancer?
Solution:
- P(Cancer) = 0.005
- P(Test Positive | Cancer) = 0.85
- P(Test -ve|No cancer) = 0.925
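- P(No cancer) = 1 - 0.005 = 0.995
- P(Test Positive | No cancer) = 1 - 0.925 = 0.075
- P(Test Positive) = P(Test Positive | Cancer) * P(Cancer) + P(Test Positive | No cancer) * P(No cancer) = 0.85 * 0.005 + 0.075 * 0.995 = 0.078875
- P(Cancer | Test Positive) = P(Cancer) * P(Test Positive | Cancer) / P(Test Positive) = 0.005 * 0.85 / 0.078875 = 0.0538827, or about 5.4%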
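The same calculation can be sketched in Python. The function name `bayes_posterior` and its parameter names (`prior`, `sensitivity`, `specificity`) are labels chosen here for illustration; the numbers are the ones given in the problem.

```python
def bayes_posterior(prior, sensitivity, specificity):
    """Return P(condition | positive test) for a binary test.

    prior:       P(condition)
    sensitivity: P(positive | condition)
    specificity: P(negative | no condition)
    """
    false_positive_rate = 1.0 - specificity
    # Law of total probability: P(positive) over both groups.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


p_cancer_given_positive = bayes_posterior(
    prior=0.005, sensitivity=0.85, specificity=0.925
)
print(round(p_cancer_given_positive, 4))  # prints 0.0539
```

Even with a positive result, the posterior stays near 5% because the condition is rare: false positives from the large healthy group outnumber true positives.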
Symantec works by having users train the system. It looks for patterns in the words of emails that the user has marked as spam. For example, it may have learned that the word “free” appears in 20% of the emails marked as spam. Assuming that 0.1% of non-spam emails include the word “free” and that 50% of all emails received by the user are spam, find the probability that an email is spam if the word “free” appears in it.
Solution -
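One way to set up this calculation is with the generalised form of Bayes' Theorem, treating "spam" and "not spam" as the two hypotheses and the appearance of the word “free” as the evidence. The Python sketch below is a minimal illustration; the function name `posteriors` and the dictionary keys are hypothetical names chosen here, and the probabilities are those stated in the problem.

```python
def posteriors(priors, likelihoods):
    """Return P(H | E) for every hypothesis H.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E | H)
    """
    # P(E) by the law of total probability.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


priors = {"spam": 0.5, "not_spam": 0.5}           # 50% of all mail is spam
likelihoods = {"spam": 0.20, "not_spam": 0.001}   # P("free" | class)

result = posteriors(priors, likelihoods)
print(round(result["spam"], 4))  # prints 0.995
```

By hand this is P(Spam | Free) = 0.20 * 0.5 / (0.20 * 0.5 + 0.001 * 0.5) = 0.1 / 0.1005, or roughly 0.995.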