Bayesian Learning: The Naïve Bayes Model
Naïve Bayes is a simple probabilistic classifier based on applying Bayes' theorem with the assumption of independence between features.
Explanation of Bayes' rule:
Bayes' rule states that P(h|e) = P(e|h) * P(h) / P(e)
The probability of a hypothesis or event (h) occurring can be predicted based on evidence (e) that has been observed in prior events.
Some important terms are:
Prior probability – P(h), the probability of the event before the evidence is observed.
Posterior probability – P(h|e), the probability of the event after the evidence is observed.
Naïve Bayes can be used in areas such as document classification (spam filtering, website classification, etc.) and, more generally, for problems whose features can safely be assumed independent or where establishing the dependencies would be too costly.
Let H be the event "fever" and E the evidence "sore throat"; then we have
P(fever | sore throat) = P(sore throat | fever) * P(fever) / P(sore throat)
P(sore throat | fever) is the probability that a person has a "sore throat" given that they have a "fever".
P(fever) is the prior probability. This can be obtained from statistical medical records, such as the number of people who had a fever when visiting the doctor this year.
P(sore throat) is the probability of the evidence occurring. This can also be obtained from statistical medical records, such as the number of people who had a sore throat when visiting the doctor this year.
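To make the arithmetic concrete, here is a minimal sketch in Python. The visit counts are made up purely for illustration; they are not from any real medical record.

```python
# Hypothetical counts from a year of doctor visits (made up for illustration).
total_visits = 1000
fever_visits = 100        # patients who had a fever
sore_throat_visits = 150  # patients who had a sore throat
both_visits = 60          # patients who had both

p_fever = fever_visits / total_visits              # prior P(fever)
p_sore_throat = sore_throat_visits / total_visits  # evidence P(sore throat)
p_sore_given_fever = both_visits / fever_visits    # likelihood P(sore throat | fever)

# Bayes' rule: P(fever | sore throat) = P(sore throat | fever) * P(fever) / P(sore throat)
p_fever_given_sore = p_sore_given_fever * p_fever / p_sore_throat
print(f"P(fever | sore throat) = {p_fever_given_sore:.2f}")  # 0.40 with these made-up counts
```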
As one can observe from the above example, we can predict the outcome of some event by observing some evidence; the more evidence, the better the prediction. When including more evidence in our NB model, however, we can run into the problem of dependencies. For example, the evidence "excessive coughing" might itself be caused by the "sore throat", and modeling that dependence would complicate the model. Therefore we assume that all pieces of evidence are "independent" of each other, hence "naïve".
Bayes' rule for multiple pieces of evidence, assuming independence:
P(H | E1, E2, ..., En) = P(E1 | H) * P(E2 | H) * ... * P(En | H) * P(H) / P(E1, E2, ..., En)
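As a sketch of how this formula is used in code, the small helper below multiplies the individual likelihoods by the prior. The denominator P(E1, E2, ..., En) is the same for every hypothesis, so it can be dropped when we only want to compare hypotheses. The function name and structure are my own, not from any particular library.

```python
def naive_bayes_score(prior, likelihoods):
    """Unnormalized posterior P(H | E1..En): P(E1|H) * ... * P(En|H) * P(H).

    prior       -- P(H)
    likelihoods -- iterable of P(Ei | H), one per piece of evidence
    """
    score = prior
    for p in likelihoods:
        score *= p
    return score
```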
Example 1: Let's try to build an NB model, using the weather example from the book "Data Mining: Practical Machine Learning Tools & Techniques". The task is to predict whether the team will "play" given the features "outlook", "temperature", "humidity", and "windy".
Let's build a frequency table of the different evidence values per feature and classification outcome.

                Play = yes    Play = no
Outlook
  sunny             2             3
  overcast          4             0
  rainy             3             2
Temperature
  hot               2             2
  mild              4             2
  cool              3             1
Humidity
  high              3             4
  normal            6             1
Windy
  false             6             2
  true              3             3
Total               9             5

The above table tabulates all the data in one place to make comparison a little easier: each feature value, such as Outlook = sunny, has a count of Yes versus No classifications. Dividing each count by its class total gives the relative frequency as a fraction, such as P(Outlook = sunny | Play = yes) = 2/9 and P(Outlook = sunny | Play = no) = 3/5.
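For readers who prefer to see the counting done in code, below is a minimal sketch in Python. The 14 weather instances are the standard ones from the book; the variable names are my own.

```python
from collections import Counter

# The 14 weather instances (outlook, temperature, humidity, windy, play) from the book.
weather = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
features = ["outlook", "temperature", "humidity", "windy"]

# Class counts: 9 yes, 5 no.
class_counts = Counter(row[-1] for row in weather)

# Frequency of each (feature, value, class) triple, e.g. ("outlook", "sunny", "yes") -> 2.
freq = Counter()
for *values, play in weather:
    for feature, value in zip(features, values):
        freq[(feature, value, play)] += 1

# Relative frequencies, e.g. P(outlook=sunny | play=yes) = 2/9, P(outlook=sunny | play=no) = 3/5.
print(freq[("outlook", "sunny", "yes")] / class_counts["yes"])
print(freq[("outlook", "sunny", "no")] / class_counts["no"])
```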
Now that we have created the NB model via the table above, we can use it to predict the likelihood of the event "play" given different evidence values. For example,
P[yes | outlook = rainy, temperature = cool, humidity = high, windy = false] =
P[rainy | yes] * P[cool | yes] * P[high | yes] * P[false | yes] * P[yes] / P[rainy, cool, high, false]
Ignoring the denominator P[evidence] for now, the numerator is
3/9 * 3/9 * 3/9 * 6/9 * 9/14 = 0.01587
Similarly, the likelihood of:
P[no | outlook = rainy, temperature = cool, humidity = high, windy = false] =
P[rainy | no] * P[cool | no] * P[high | no] * P[false | no] * P[no] =
2/5 * 1/5 * 4/5 * 2/5 * 5/14 = 0.00914
Finally, we convert the above likelihoods into probabilities by normalization:
P[yes] = 0.01587 / (0.01587 + 0.00914) ≈ 0.635, or about a 63.5% chance that the team plays given the evidence.
P[no] = 0.00914 / (0.01587 + 0.00914) ≈ 0.365, or about a 36.5% chance that the team does not play given the evidence.
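The whole calculation can be reproduced with a short script. This is a minimal sketch: the counts are taken from the frequency table above, and the evidence is outlook = rainy, temperature = cool, humidity = high, windy = false.

```python
# Counts from the frequency table: (count given play=yes, count given play=no).
counts = {
    ("outlook", "rainy"): (3, 2),
    ("temperature", "cool"): (3, 1),
    ("humidity", "high"): (3, 4),
    ("windy", "false"): (6, 2),
}
n_yes, n_no = 9, 5

# Start from the class priors P(yes) = 9/14 and P(no) = 5/14 ...
score_yes = n_yes / (n_yes + n_no)
score_no = n_no / (n_yes + n_no)

# ... and multiply in P(evidence_i | class) for each piece of evidence.
for yes_count, no_count in counts.values():
    score_yes *= yes_count / n_yes
    score_no *= no_count / n_no

# Normalize so the two scores sum to 1.
total = score_yes + score_no
print(f"P(play=yes | evidence) = {score_yes / total:.3f}")  # about 0.635
print(f"P(play=no  | evidence) = {score_no / total:.3f}")   # about 0.365
```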
How do we deal with zero frequencies in our data, such as P(overcast | play = no) = 0/5? To be on the safe side, the data miner should not state that a hypothesis or event can never occur unless there is real scientific evidence that supports a zero probability. To solve the zero-frequency problem we use a technique known as "Laplace estimation", adding a constant "m" across all counts.
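As a minimal sketch of that idea (one common form of Laplace estimation, not necessarily the exact formulation in the cited tutorial), the helper below adds the constant m to every count and enlarges the denominator accordingly; k is the number of distinct values the feature can take.

```python
def smoothed_probability(count, class_total, k, m=1):
    """P(value | class) with Laplace estimation: add a constant m to every count.

    count       -- times this feature value occurred together with the class
    class_total -- number of training instances in the class
    k           -- number of distinct values the feature can take
    m           -- smoothing constant added to every count
    """
    return (count + m) / (class_total + m * k)

# Without smoothing, P(overcast | play=no) = 0/5 = 0 and wipes out the whole product.
print(smoothed_probability(0, 5, k=3))  # 0.125 instead of 0 (outlook has 3 possible values)
```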
The above explanation of Laplace estimation is from Haruechaiyasak, Choochart, "A Tutorial on Naive Bayes Classification."
In the next article we will discuss how to build a Naïve Bayes model in the WEKA Explorer.
References
1. Haruechaiyasak, Choochart. "A Tutorial on Naive Bayes Classification." N.p., 16 Aug. 2008. Web. 31 Oct. 2013.
2. Jurafsky, Dan. "Text Classification and Naïve Bayes." Stanford University. Web. 31 Oct. 2013.