where A and B are events.
- P(A) and P(B) are the probabilities of A and B without regard to each other.
- P(A | B), a conditional probability, is the probability of observing event A given that B is true.
- P(B | A) is the probability of observing event B given that A is true.
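- P(A) is the prior probability (or marginal probability) of A. It is called the "prior" because it does not take any information about B into account.
- P(A|B) is the conditional probability of A given that B has occurred. Because it depends on the value of B, it is also called the posterior probability of A.
- P(B|A) is the conditional probability of B given that A has occurred. Because it depends on the value of A, it is also called the posterior probability of B. In the formula for P(A|B) it plays the role of the likelihood.
- P(B) is the prior or marginal probability of B, and also acts as the normalizing constant.
- In these terms: posterior probability = (likelihood × prior probability) / normalizing constant
In addition, the ratio P(B|A)/P(B) is sometimes called the standardised likelihood, so Bayes' theorem can also be stated as: posterior probability = standardised likelihood × prior probability.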
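Conditional probability is the probability that event A occurs given that another event B has already occurred. It is written P(A|B) and read "the probability of A given B". For a finite sample space with equally likely outcomes, it can be computed by counting:
P(A|B) = |A∩B|/|B|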
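The counting formula can be tried on a small sample space. The following is a minimal sketch, assuming a single roll of a fair six-sided die; the events `A` and `B` and the helper `conditional` are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair die, all outcomes equally likely.
omega = set(range(1, 7))
A = {n for n in omega if n % 2 == 0}   # illustrative event A: the roll is even
B = {n for n in omega if n > 4}        # illustrative event B: the roll is 5 or 6


def conditional(a, b):
    """P(a | b) by counting; valid only for equally likely outcomes."""
    if not b:
        raise ValueError("the conditioning event must be non-empty")
    return Fraction(len(a & b), len(b))


p_a_given_b = conditional(A, B)   # |{6}| / |{5, 6}|
p_b_given_a = conditional(B, A)   # |{6}| / |{2, 4, 6}|
print(p_a_given_b, p_b_given_a)
```

The two conditional probabilities differ here because |A| and |B| differ, even though both share the same intersection.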
That is, P(A|B) ≈ P(B|A) only if P(B)/P(A) ≈ 1, or equivalently, P(A) ≈ P(B).
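Alternatively, noting that A ∩ B = B ∩ A, and applying the definition of conditional probability:
P(A|B) P(B) = P(A ∩ B) = P(B ∩ A) = P(B|A) P(A)
Rearranging (dividing both sides by P(B), assuming P(B) ≠ 0) gives the result.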
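The rearranged identity translates directly into a small function. This is a sketch rather than a reference implementation: the name `bayes_posterior` and the input probabilities are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# Hypothetical inputs, not taken from the text.
p_a = Fraction(3, 10)          # P(A)
p_b = Fraction(1, 2)           # P(B)
p_b_given_a = Fraction(2, 3)   # P(B|A)

p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)

# Both products equal P(A ∩ B), which is the symmetry used in the derivation.
joint_via_b = p_a_given_b * p_b
joint_via_a = p_b_given_a * p_a
print(p_a_given_b, joint_via_b, joint_via_a)
```

Exact fractions are used instead of floats so that the equality of the two products can be checked without rounding error.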
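- P(D) is the probability that an employee uses drugs, regardless of anything else; its value is 0.005. Because the company's earlier statistics show that 0.5% of its employees use drugs, this value is the prior probability of D.
- P(N) is the probability that an employee does not use drugs. Its value is 0.995, that is, 1 - P(D).
- P(+|D) is the rate of positive results among drug users. This is a conditional probability; since the test is 99% accurate for users, its value is 0.99.
- P(+|N) is the rate of positive results among non-users, that is, the false-positive probability. Its value is 0.01: a non-user tests negative with probability 99%, so the probability of being wrongly reported as positive is 1 - 99%.
- P(+) is the overall rate of positive results, regardless of other factors. Its value is 0.0149, or 1.49%. It follows from the law of total probability: the contribution from users (0.5% × 99% = 0.495%) plus the contribution from non-users (99.5% × 1% = 0.995%). P(+) = 0.0149 is the prior probability of a positive test. Expressed mathematically:
P(+) = P(+|D) P(D) + P(+|N) P(N) = 0.99 × 0.005 + 0.01 × 0.995 = 0.0149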
Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do. This again illustrates the importance of base rates, and how the formation of policy can be egregiously misguided if base rates are neglected.
This surprising result arises because the number of non-users is very large compared to the number of users; thus the number of false positives (0.995%) outweighs the number of true positives (0.495%). To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≃ 10 false positives are expected. From the 5 users, 0.99 × 5 ≃ 5 true positives are expected. Out of 15 positive results, only 5, about 33%, are genuine.
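The figures in the drug-testing example can be reproduced in a few lines. The sketch below uses only the numbers stated in the example (0.5% prevalence, 99% accuracy for both users and non-users); the variable names are illustrative.

```python
from fractions import Fraction

p_user = Fraction(5, 1000)                # P(D): prior probability of drug use
p_non_user = 1 - p_user                   # P(N)
p_pos_given_user = Fraction(99, 100)      # P(+|D): true-positive rate
p_pos_given_non_user = Fraction(1, 100)   # P(+|N): false-positive rate

# Law of total probability: overall chance of a positive result.
p_pos = p_pos_given_user * p_user + p_pos_given_non_user * p_non_user

# Bayes' theorem: chance that a positive result comes from an actual user.
p_user_given_pos = p_pos_given_user * p_user / p_pos

print(float(p_pos))             # 0.0149
print(float(p_user_given_pos))  # roughly 0.33
```

The exact posterior is 99/298, which is where the "about 33%" in the rounded head count comes from.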
Suppose we want to know a person's probability of having cancer, but we know nothing about him or her. Despite not knowing anything about that person, a probability can be assigned based on the general prevalence of cancer. For the sake of this example, suppose it is 1%. This is known as the base rate or prior probability of having cancer. "Prior" refers to the time before being informed about the particular case at hand.
Next, suppose we find out that the person is 65 years old. If we assume that cancer and age are related, this new piece of information can be used to better assess that person's chance of having cancer. More precisely, we'd like to know the probability that a person has cancer when it is known that he or she is 65 years old. This quantity is known as the current probability (also called the posterior probability), where "current" refers to the state of knowledge after learning information about the particular case at hand.
In order to apply knowledge of that person's age in conjunction with Bayes' Theorem, two additional pieces of information are needed. Note, however, that the additional information is not specific to that person. The needed information is as follows:
- The probability of being 65 years old. Suppose it is 0.2%.
- The probability that a person with cancer is 65 years old. Suppose it is 0.5%. Note that this is greater than the previous value. This reflects that people with cancer are disproportionately 65 years old.
Knowing this, along with the base rate, we can calculate that a person who is 65 years old has a probability of having cancer equal to
P(cancer | 65) = P(65 | cancer) × P(cancer) / P(65) = (0.5% × 1%) / 0.2% = 2.5%
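The arithmetic for the age example can be checked with a short script. This is a minimal sketch using the three supposed values from the text (1% base rate, 0.2% probability of being 65, 0.5% probability of being 65 given cancer); the variable names are illustrative.

```python
from fractions import Fraction

p_cancer = Fraction(1, 100)                # base rate (prior): 1%
p_age65 = Fraction(2, 1000)                # probability of being 65: 0.2%
p_age65_given_cancer = Fraction(5, 1000)   # probability of being 65 given cancer: 0.5%

# Bayes' theorem: P(cancer | 65) = P(65 | cancer) * P(cancer) / P(65)
p_cancer_given_age65 = p_age65_given_cancer * p_cancer / p_age65

print(float(p_cancer_given_age65))  # 0.025, i.e. 2.5%
```

Under these assumed figures, learning the person's age raises the estimate from the 1% base rate to 2.5%, a factor equal to the ratio 0.5% / 0.2%.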