Bayes’ Law


Bayes' theorem is stated mathematically as the following equation:
P(A|B) = \frac{P(A)\, P(B | A)}{P(B)},

where A and B are events.

  • P(A) and P(B) are the probabilities of A and B without regard to each other.
  • P(A | B), a conditional probability, is the probability of observing event A given that B is true.
  • P(B | A) is the probability of observing event B given that A is true.

Figure: two-dimensional visualisation of Bayes' theorem.


In general, the probability of event A given that event B has occurred differs from the probability of B given A; however, the two are related in a definite way, and Bayes' theorem is the statement of that relationship. The usefulness of the Bayes formula is that it lets us derive a fourth probability from three known ones: given that B has occurred, the probability of A equals the probability of B given A, multiplied by the probability of A, divided by the probability of B. By linking A and B, it lets us compute the probability of one event from the occurrence of the other, i.e. to reason backwards from an observed effect to its cause.

Bayes' theorem is a theorem about the conditional probabilities of two random events A and B:

P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}

where P(A|B) is the probability that A occurs given that B has occurred.

Each term in Bayes' theorem has a conventional name: P(A) is the prior probability, P(B|A) the likelihood, P(B) the normalizing constant, and P(A|B) the posterior probability.

In these terms, Bayes' theorem can be stated as:

posterior = (likelihood × prior) / normalizing constant

That is, the posterior probability is proportional to the product of the prior probability and the likelihood.

The ratio P(B|A)/P(B) is also sometimes called the standardised likelihood, in which case Bayes' theorem can be stated as:

posterior = standardised likelihood × prior
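The "posterior = likelihood × prior / normalizing constant" rule above can be sketched numerically. All probabilities here are made-up illustrative values, not taken from the text:

```python
# A minimal numerical sketch of "posterior = (likelihood × prior) / normalizing constant".
# The probability values below are illustrative assumptions.
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# With P(A) = 0.3, P(B|A) = 0.5 and P(B) = 0.25:
print(round(posterior(prior=0.3, likelihood=0.5, evidence=0.25), 6))  # 0.6
```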

Conditional probability

The conditional probability of A given B is the probability that event A occurs given that another event B has already occurred. It is written P(A|B), read "the probability of A given B".

The joint probability is the probability that two events occur together. The joint probability of A and B is written P(A\cap B) or P(A, B).

The marginal probability is the probability of a single event occurring on its own. It is obtained from the joint probability by merging away the events that are not needed in the final result into their total probability: summing over them for discrete random variables, or integrating over them for continuous random variables. This is called marginalization. The marginal probability of A is written P(A), and that of B is written P(B).


Note that in these definitions, A and B need not stand in any causal or temporal relationship. A may occur before B, after it, or at the same time; A may cause B, B may cause A, or there may be no causal connection between them at all.

For events (subsets) A and B of the same sample space Ω, the conditional probability of A given B is defined as the probability that an element drawn at random from Ω, known to belong to B, also belongs to A. From this definition we obtain

P(A|B) = |A∩B|/|B|

Dividing numerator and denominator by |Ω| gives

P(A|B)={\frac {P(A\cap B)}{P(B)}}

This conditional probability is also sometimes called the posterior probability.
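The counting definition P(A|B) = |A∩B|/|B| can be sketched on a small finite sample space. The die-roll events here are assumed for illustration:

```python
# Counting sketch of P(A|B) = |A∩B| / |B| on a finite sample space.
# The die-roll events are assumed for illustration.
omega = set(range(1, 7))            # sample space: one roll of a fair die
A = {2, 4, 6}                       # event: the roll is even
B = {4, 5, 6}                       # event: the roll is greater than 3
p_A_given_B = len(A & B) / len(B)   # |A∩B| / |B|
print(p_A_given_B)                  # 2/3 ≈ 0.667
```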


Deriving Bayes' law from conditional probability

By the definition of conditional probability, the probability of event A given that event B has occurred is[1]

P(A|B)=\frac{P(A \cap B)}{P(B)}

Likewise, the probability of event B given that event A has occurred is

P(B|A) = \frac{P(A \cap B)}{P(A)}.

Rearranging and combining these two equations, we find

P(A|B)\, P(B) = P(A \cap B) = P(B|A)\, P(A).

This lemma is sometimes called the product rule for probabilities. Dividing both sides by P(A), provided that P(A) is nonzero, we obtain Bayes' theorem:

P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}.


Assuming conditional probability is of similar size to its inverse
In general, it cannot be assumed that P(A|B) ≈ P(B|A). This can be an insidious error, even for those who are highly conversant with statistics.[5] The relationship between P(A|B) and P(B|A) is given by Bayes' theorem:
P(B|A)={\frac {P(A|B)P(B)}{P(A)}}.

That is, P(A|B) ≈ P(B|A) only if P(B)/P(A) ≈ 1, or equivalently, P(A) ≈ P(B).

Alternatively, noting that A ∩ B = B ∩ A, and applying conditional probability:

P(A|B)P(B)=P(A\cap B)=P(B\cap A)=P(B|A)P(A)

Rearranging gives the result.

Figure: a geometric visualisation of Bayes' theorem. In the table, the values w, x, y and z give the relative weights of each corresponding condition and case. The figures denote the cells of the table involved in each metric, the probability being the fraction of each figure that is shaded. This shows that P(A|B) P(B) = P(B|A) P(A), i.e. P(A|B) = P(B|A)P(A)/P(B). Similar reasoning can be used to show that P(Ā|B) = P(B|Ā)P(Ā)/P(B), etc.


Assuming marginal and conditional probabilities are of similar size

In general, it cannot be assumed that P(A) ≈ P(A|B). These probabilities are linked through the law of total probability:

P(A)=\sum _{n}P(A\cap B_{n})=\sum _{n}P(A|B_{n})P(B_{n}).

where the events (B_{n}) form a countable partition of the sample space.
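The law of total probability above can be sketched with an assumed two-event partition {B1, B2}; the probability values are illustrative:

```python
# Sketch of the law of total probability with an assumed two-event partition {B1, B2}.
p_B = [0.4, 0.6]                 # P(B1), P(B2); a partition, so they sum to 1
p_A_given_B = [0.1, 0.5]         # P(A|B1), P(A|B2): illustrative values
# P(A) = sum over n of P(A|Bn) P(Bn)
p_A = sum(pab * pb for pab, pb in zip(p_A_given_B, p_B))
print(round(p_A, 6))             # 0.34
```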


This fallacy may arise through selection bias.[6] For example, in the context of a medical claim, let SC be the event that a sequela (chronic disease) S occurs as a consequence of circumstance (acute condition) C. Let H be the event that an individual seeks medical help. Suppose that in most cases C does not cause S, so P(SC) is low. Suppose also that medical attention is sought only if S has occurred due to C. From experience of patients, a doctor may therefore erroneously conclude that P(SC) is high; the probability actually observed by the doctor is P(SC|H).


Over- or under-weighting priors
Taking the prior probability into account only partially, or not at all, is called base rate neglect. The opposite error, insufficient adjustment away from the prior probability, is called conservatism.


Drug testing

Bayes' theorem is useful when screening for drug use. Suppose a routine test is 99% sensitive and 99% specific: when the subject is a drug user, each test comes back positive (+) with probability 99%, and when the subject is not a user, each test comes back negative (−) with probability 99%. Judged by these rates alone the test looks quite accurate, but Bayes' theorem reveals a potential problem. Suppose a company tests all of its employees for opiate use, and it is known that 0.5% of the employees are users. We want to know: how likely is it that an employee who tests positive actually uses drugs? Let "D" be the event that an employee uses drugs, "N" the event that an employee does not, and "+" the event of a positive test. Then:
  • P(D) is the probability that an employee uses drugs, regardless of any other information. Its value is 0.005: the company's prior statistics show that 0.5% of its employees use drugs, so this is the prior probability of D.
  • P(N) is the probability that an employee does not use drugs; clearly this is 0.995, i.e. 1 − P(D).
  • P(+|D) is the probability that a user tests positive, a conditional probability. Since the test detects users with 99% accuracy, this value is 0.99.
  • P(+|N) is the probability that a non-user tests positive, i.e. the false-positive rate. Its value is 0.01: a non-user tests negative with probability 99%, so is mistakenly flagged positive with probability 1 − 99%.
  • P(+) is the probability of a positive test regardless of other factors. Its value is 0.0149, or 1.49%, computed by the law of total probability: positives from users (0.5% × 99% = 0.495%) plus positives from non-users (99.5% × 1% = 0.995%). P(+) = 0.0149 is the prior probability of testing positive. In symbols:

P(+)=P(+,D)+P(+,N)=P(+|D)P(D)+P(+|N)P(N)

From the above, we can compute the conditional probability P(D|+) that someone who tests positive actually uses drugs:

\begin{align}P(D|+) &= \frac{P(+|D)\,P(D)}{P(+)} \\ &= \frac{P(+|D)\,P(D)}{P(+|D)\,P(D) + P(+|N)\,P(N)} \\ &= \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \\ &\approx 0.3322.\end{align}

So despite the test's high reliability, we can only conclude that someone who tests positive has a probability of only about 33% of actually being a drug user; it is more likely that the person does not use drugs. The rarer the condition being tested for (here D, employee drug use), the greater the chance that a positive result is a misclassification.

Example of Drug Test

Suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability he or she is a user?
 \begin{align} P(\text{User}\mid\text{+}) &= \frac{P(\text{+}\mid\text{User}) P(\text{User})}{P(\text{+}\mid\text{User}) P(\text{User}) + P(\text{+}\mid\text{Non-user}) P(\text{Non-user})} \\[8pt] &= \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \\[8pt] &\approx 33.2\% \end{align}

Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do. This again illustrates the importance of base rates, and how the formation of policy can be egregiously misguided if base rates are neglected.

This surprising result arises because the number of non-users is very large compared to the number of users; thus the number of false positives (0.995%) outweighs the number of true positives (0.495%). To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≃ 10 false positives are expected. From the 5 users, 0.99 × 5 ≃ 5 true positives are expected. Out of 15 positive results, only 5, about 33%, are genuine.
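The arithmetic above can be reproduced as a short sketch with the example's numbers:

```python
# The drug-test calculation above, reproduced with the example's numbers.
sensitivity = 0.99   # P(+ | User)
specificity = 0.99   # P(- | Non-user)
prevalence = 0.005   # P(User)

# Law of total probability: P(+) = P(+|User)P(User) + P(+|Non-user)P(Non-user)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
# Bayes' theorem: P(User | +)
p_user_given_pos = sensitivity * prevalence / p_pos
print(round(p_user_given_pos, 4))  # 0.3322
```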


Note: the importance of specificity can be seen by comparison. Even with 100% sensitivity and 99% specificity, the probability that a person who tests positive is a drug user is only ≈ 33%; but if the specificity is raised to 99.5% (with sensitivity dropped to 99%), that probability rises to 49.8%.
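The comparison above can be checked with a small helper; `p_user` is a name assumed here for illustration:

```python
# Sketch comparing the effect of sensitivity vs. specificity on the posterior.
# The helper name p_user is an assumption for this illustration.
def p_user(sensitivity, specificity, prevalence=0.005):
    true_pos = sensitivity * prevalence            # P(+|User) P(User)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+|Non-user) P(Non-user)
    return true_pos / (true_pos + false_pos)

print(round(p_user(1.00, 0.99), 3))   # 0.334 — perfect sensitivity barely helps
print(round(p_user(0.99, 0.995), 3))  # 0.499 — better specificity helps much more
```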

Figure: tree diagram illustrating the drug testing example. U, Ū, "+" and "−" are the events representing user, non-user, positive result and negative result. Percentages in parentheses are calculated.


Example of Cancer at age 65

Suppose we want to know a person's probability of having cancer, but we know nothing about him or her. Despite not knowing anything about that person, a probability can be assigned based on the general prevalence of cancer. For the sake of this example, suppose it is 1%. This is known as the base rate or prior probability of having cancer. "Prior" refers to the time before being informed about the particular case at hand.

Next, suppose we find out that person is 65 years old. If we assume that cancer and age are related, this new piece of information can be used to better assess that person's chance of having cancer. More precisely, we'd like to know the probability that a person has cancer when it is known that he or she is 65 years old. This quantity is known as the current probability, where "current" refers to upon finding out information about the particular case at hand.

In order to apply knowledge of that person's age in conjunction with Bayes' Theorem, two additional pieces of information are needed. Note, however, that the additional information is not specific to that person. The needed information is as follows:

  1. The probability of being 65 years old. Suppose it is 0.2%.
  2. The probability that a person with cancer is 65 years old. Suppose it is 0.5%. Note that this is greater than the previous value. This reflects that people with cancer are disproportionately 65 years old.

Knowing this, along with the base rate, we can calculate that a person who is age 65 has a probability of having cancer equal to

(1% × 0.5%) ÷ 0.2% = 2.5%


It may come as a surprise that even though being 65 years old increases the risk of having cancer, that person's probability of having cancer is still fairly low. This is because the base rate of cancer (regardless of age) is low. This illustrates both the importance of base rate, as well as that it is commonly neglected.[3] Base rate neglect leads to serious misinterpretation of statistics; therefore, special care should be taken to avoid such mistakes. Becoming familiar with Bayes' theorem is one way to combat the natural tendency to neglect base rates.
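The age-65 calculation above can be sketched directly:

```python
# The age-65 cancer calculation above, with the example's numbers.
p_cancer = 0.01               # base rate: P(cancer)
p_age65 = 0.002               # P(65 years old)
p_age65_given_cancer = 0.005  # P(65 years old | cancer)

# Bayes: P(cancer | 65) = P(65 | cancer) * P(cancer) / P(65)
p_cancer_given_age65 = p_age65_given_cancer * p_cancer / p_age65
print(round(p_cancer_given_age65, 4))  # 0.025
```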


Challenger B does not know whether the incumbent monopolist A is a high-blocking-cost type or a low-blocking-cost type, but B does know that if A is the high-cost type, the probability that A blocks B's market entry is 20% (when it does block, it does so regardless of cost, in order to preserve its high monopoly profits); if A is the low-cost type, the probability that A blocks B's entry is 100%.

  At the start of the game, B believes A is a high-blocking-cost firm with probability 70%, so B estimates the probability of being blocked on entry as:

  0.7 × 0.2 + 0.3 × 1 = 0.44

  0.44 is the probability that A blocks, given B's prior over A's type.

  When B enters the market, A does in fact block. Using Bayes' rule, based on this observed act of blocking, B's belief that A is a high-blocking-cost firm becomes: P(A is high cost) = 0.7 (prior that A is high cost) × 0.2 (probability that a high-cost firm blocks a new entrant) ÷ 0.44 ≈ 0.32.

  With this new probability, B estimates the probability of being blocked on its next entry as:

  0.32 × 0.2 + 0.68 × 1 = 0.744

  If B enters the market again and A blocks again, then using Bayes' rule, based on this second observed act of blocking, B's belief that A is a high-blocking-cost firm becomes

  P(A is high cost) = 0.32 (prior that A is high cost) × 0.2 (probability that a high-cost firm blocks a new entrant) ÷ 0.744 ≈ 0.086.

  Thus, with each successive act of blocking by A, B's judgement of A's type gradually shifts, tilting more and more toward judging A to be a low-blocking-cost firm.
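The repeated belief updates above can be sketched as a loop. Note that the text rounds intermediate values (0.32, 0.744, 0.086); exact arithmetic differs slightly:

```python
# Sketch of B's repeated Bayesian belief updates about A's type, with the numbers above.
# The text rounds intermediate values; exact arithmetic differs slightly.
p_high = 0.7         # prior: P(A is the high-blocking-cost type)
p_block_high = 0.2   # P(block | high cost)
p_block_low = 1.0    # P(block | low cost)

for entry in (1, 2):  # A blocks on both of B's entry attempts
    p_block = p_high * p_block_high + (1 - p_high) * p_block_low
    p_high = p_high * p_block_high / p_block  # Bayes update after observing a block
    print(entry, round(p_block, 3), round(p_high, 3))
```

After two observed blocks, B's belief that A is the high-cost type has fallen from 0.7 to about 0.085, matching the ≈ 0.086 in the text up to rounding.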

The example above shows that in a dynamic game of incomplete information, the actions the players take serve to transmit information. Although firm A may well be a high-cost firm, its repeated blocking of market entry gives firm B the impression that A is a low-blocking-cost firm, which leads B to abandon its attempts to enter the market.

  It should be noted that transmitting information through actions is costly. If such behaviour were costless, anyone could imitate it, and it would convey no information. Only when an action carries a substantial cost, so that others dare not imitate it lightly, can it serve to transmit information.


  The cost of transmitting information is caused by the incompleteness of information itself, but that does not mean incomplete information is necessarily a bad thing. Research shows that in a finitely repeated prisoner's dilemma, incomplete information can lead the two players to cooperate. The reason is that when information is incomplete, a player who wants the long-run gains from cooperation is unwilling to reveal his true nature too early. That is, in a long-term relationship, whether a person does good or bad often depends not on whether his nature is good or bad, but to a large extent on how strongly others believe he is good. If others do not know his true character, even a bad person will, for quite a long time, do good deeds in order to conceal himself.


