A Bayes classifier first performs probability calculations on the training set to obtain two distributions: the prior probability distribution and the conditional probability distribution. For both, it adopts the idea of estimating the population from a sample (maximum likelihood estimation): the distributions computed from the training samples are used to approximate the population distributions.
Once these two distributions are known, Bayes' theorem
$$P(A\mid B)=\frac{P(AB)}{P(B)}=\frac{P(B\mid A)P(A)}{P(B)}$$
can be used to predict a sample's label given its known features.
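As a quick numeric illustration (all the probabilities below are made up for this example), Bayes' theorem can be evaluated directly:

# Toy illustration of Bayes' theorem; the numbers are invented for the example.
p_a = 0.3              # P(A)
p_b_given_a = 0.8      # P(B|A)
p_b_given_not_a = 0.2  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.24 / 0.38 ≈ 0.632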
Prior probability distribution:
$$P(Y=c_k),\quad k=1,2,\cdots ,K$$
Conditional probability distribution: this is where the "naive" in naive Bayes shows up; the "naive" assumption is that, given the class, the different features of a sample are mutually independent:
$$P(X=x\mid Y=c_k)=P\left(X^{(1)}=x^{(1)},\cdots,X^{(n)}=x^{(n)}\mid Y=c_k\right)=\prod_{j=1}^{n}P\left(X^{(j)}=x^{(j)}\mid Y=c_k\right)$$
The multiplication rule then gives the joint distribution $P(X,Y)$, and applying Bayes' theorem yields the posterior distribution: given a sample's feature values, it tells us the probability that the sample belongs to class $c_k$, which is exactly what prediction requires.
$$P(Y=c_k\mid X=x)=\frac{P(X=x\mid Y=c_k)\,P(Y=c_k)}{P(X=x)}=\frac{P(X=x\mid Y=c_k)\,P(Y=c_k)}{\sum_{k}P(X=x\mid Y=c_k)\,P(Y=c_k)}=\frac{P(Y=c_k)\prod_{j}P\left(X^{(j)}=x^{(j)}\mid Y=c_k\right)}{\sum_{k}P(Y=c_k)\prod_{j}P\left(X^{(j)}=x^{(j)}\mid Y=c_k\right)}$$
For the population as a whole, we generally cannot know the distribution of the classes, nor the distribution of each feature's values within a given class. We therefore use maximum likelihood estimation to estimate the population probabilities from the sample.
Maximum likelihood estimate of the prior distribution:
$$P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots ,K$$
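A minimal sketch of this counting estimate, assuming a toy label array `y` (the values are invented for illustration):

import numpy as np

# Toy label vector: N = 6 samples, K = 3 classes.
y = np.array([0, 0, 1, 2, 1, 0])

# P(Y = c_k) = (number of samples with label c_k) / N
classes, counts = np.unique(y, return_counts=True)
priors = counts / len(y)
print(dict(zip(classes, priors)))  # {0: 0.5, 1: 0.333..., 2: 0.166...}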
Maximum likelihood estimate of the conditional probabilities:
$$P\left(X^{(j)}=a_{jl}\mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{jl},\,y_{i}=c_{k}\right)}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)}$$
$$j=1,2,\cdots,n;\quad l=1,2,\cdots,S_{j};\quad y_{i}\in\{c_{1},c_{2},\cdots,c_{K}\}$$
where $a_{jl}$ denotes the $l$-th possible value of the $j$-th feature, and $S_j$ is the number of distinct values the $j$-th feature can take.
Since the denominator $P(X=x)$ is identical for every class, comparing the posteriors only requires comparing the numerators
$$P\left(Y=c_{k}\right) \prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right), \quad k=1,2, \cdots, K$$
so the predicted label is the class that maximizes this quantity:
$$y=\arg \max _{c_{k}} P\left(Y=c_{k}\right) \prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid Y=c_{k}\right)$$
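A minimal sketch of this decision rule for categorical features, built on the counting estimates above; the helper names (`fit_counts`, `predict`) and the toy data are hypothetical, not from the code later in this post:

import numpy as np

def fit_counts(X, y):
    # Estimate priors and per-feature value tables by counting (MLE).
    classes = np.unique(y)
    priors = {c: float(np.mean(y == c)) for c in classes}
    # cond[c][j] maps a feature value a_jl to P(X^(j) = a_jl | Y = c)
    cond = {}
    for c in classes:
        Xc = X[y == c]
        cond[c] = []
        for j in range(X.shape[1]):
            values, counts = np.unique(Xc[:, j], return_counts=True)
            cond[c].append(dict(zip(values, counts / len(Xc))))
    return priors, cond

def predict(x, priors, cond):
    # argmax_ck P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k);
    # a value never seen with a class gets probability 0 (no smoothing here).
    best, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            score *= cond[c][j].get(v, 0.0)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy categorical data: two features, two classes.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
priors, cond = fit_counts(X, y)
print(predict([0, 1], priors, cond))  # -> 0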
For boolean features (true and false, or 1 and 0), the Bernoulli model is used.
If the feature value is 1, then $P\left(x_{i} \mid y_{k}\right)=P\left(x_{i}=1 \mid y_{k}\right)$.
If the feature value is 0, then $P\left(x_{i} \mid y_{k}\right)=1-P\left(x_{i}=1 \mid y_{k}\right)$.
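A minimal sketch of this Bernoulli likelihood; the parameter name `p1` and the numbers are made up for illustration:

def bernoulli_likelihood(x_i, p1):
    # P(x_i | y_k) under the Bernoulli model, where p1 = P(x_i = 1 | y_k)
    return p1 if x_i == 1 else 1 - p1

# Suppose P(x_i = 1 | y_k) was estimated from the training counts as 0.7:
print(bernoulli_likelihood(1, 0.7))  # 0.7
print(bernoulli_likelihood(0, 0.7))  # 0.3 (up to float rounding)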
Maximum likelihood estimation can yield probability estimates that are exactly 0. This distorts the computation of the posterior probabilities and biases the classification. The remedy is Bayesian estimation. Specifically, the Bayesian estimate of the conditional probability is
$$P_{\lambda}\left(X^{(j)}=a_{jl} \mid Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(x_{i}^{(j)}=a_{jl},\,y_{i}=c_{k}\right)+\lambda}{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+S_{j}\lambda}$$
where $\lambda \geqslant 0$. This is equivalent to adding a positive count $\lambda>0$ to the frequency of each possible value of the random variable. When $\lambda=0$ it reduces to maximum likelihood estimation. The common choice is $\lambda=1$, which is called Laplace smoothing. Clearly, for any $l=1,2,\cdots,S_j$ and $k=1,2,\cdots,K$, we have
$$P_{\lambda}\left(X^{(j)}=a_{jl} \mid Y=c_{k}\right)>0,\qquad \sum_{l=1}^{S_{j}} P_{\lambda}\left(X^{(j)}=a_{jl} \mid Y=c_{k}\right)=1$$
Similarly, the Bayesian estimate of the prior probability is
$$P_{\lambda}\left(Y=c_{k}\right)=\frac{\sum_{i=1}^{N} I\left(y_{i}=c_{k}\right)+\lambda}{N+K \lambda}$$
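A minimal sketch of both smoothed estimates with $\lambda=1$; the function name `smoothed_estimates` and the toy data are hypothetical:

import numpy as np

def smoothed_estimates(X_col, y, lam=1.0):
    # Laplace-smoothed P_lambda(Y=c_k) and P_lambda(X^(j)=a_jl | Y=c_k)
    # for a single categorical feature column X_col.
    classes = np.unique(y)
    values = np.unique(X_col)          # the a_jl; S_j = len(values)
    K, S_j, N = len(classes), len(values), len(y)
    priors = {c: (np.sum(y == c) + lam) / (N + K * lam) for c in classes}
    cond = {c: {v: (np.sum((X_col == v) & (y == c)) + lam)
                   / (np.sum(y == c) + S_j * lam)
                for v in values}
            for c in classes}
    return priors, cond

# Value 2 never co-occurs with class 1, yet its smoothed probability stays > 0:
X_col = np.array([0, 1, 2, 0, 1, 0])
y = np.array([0, 0, 0, 1, 1, 1])
priors, cond = smoothed_estimates(X_col, y)
print(cond[1][2])  # (0 + 1) / (3 + 3*1) = 1/6, not 0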
In Gaussian naive Bayes, each feature's likelihood is assumed to be Gaussian.
Probability density function:
$$P(x_i \mid y_k)=\frac{1}{\sqrt{2\pi\sigma^2_{y_k}}}\exp\left(-\frac{(x_i-\mu_{y_k})^2}{2\sigma^2_{y_k}}\right)$$
Mean: $\mu$
Variance: $\sigma^2=\frac{\sum(X-\mu)^2}{N}$
import math
import numpy as np

class NaiveBayes:
    def __init__(self):
        self.model = None

    def summarize(self, train_data):
        # Per-feature (mean, std) over the given samples; one row per feature.
        train_data = np.array(train_data)
        mean = np.mean(train_data, axis=0)
        std = np.std(train_data, axis=0)
        summaries = np.stack((mean, std), axis=1)
        return summaries

    def fit(self, X, y):
        # Group the samples by label, then summarize each group's features.
        labels = list(set(y))
        data = {label: [] for label in labels}
        for f, label in zip(X, y):
            data[label].append(f)
        self.model = {label: self.summarize(value) for label, value in data.items()}
        return 'gaussianNB train done!'

    # Gaussian probability density function
    def gaussian_probability(self, x, mean, stdev):
        exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
        prod = (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent
        return prod

    def gaussian_probability_np(self, x, summarize):
        # Vectorized version: evaluate every feature's density at once, then
        # multiply the factors together (the naive independence assumption).
        x = np.array(x)
        x = x.reshape(x.shape[0], 1)
        mean, std = np.hsplit(summarize, indices_or_sections=2)
        exponent = np.exp(-((x - mean) ** 2 / (2 * (std ** 2))))
        prod = (1 / (np.sqrt(2 * np.pi) * std)) * exponent
        prod = float(np.prod(prod))  # collapse the per-feature factors to a scalar
        return prod

    # Likelihood of the input under each label's per-feature Gaussians (vectorized)
    def calculate_probabilities_np(self, input_data):
        probabilities = {}
        for label, value in self.model.items():
            probabilities[label] = self.gaussian_probability_np(input_data, value)
        return probabilities

    def calculate_probabilities(self, input_data):
        # Scalar version of the computation above, looping feature by feature
        probabilities = {}
        for label, value in self.model.items():
            # Initialize the running product to 1
            probabilities[label] = 1
            # One Gaussian factor per feature
            for i in range(len(value)):
                mean, stdev = value[i]
                probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)
        return probabilities

    # Predicted class
    def predict(self, X_test):
        # e.g. {0.0: 2.9680340789325763e-27, 1.0: 3.5749783019849535e-26}
        label = sorted(
            self.calculate_probabilities_np(X_test).items(),
            key=lambda x: x[-1])
        label = label[-1][0]
        return label

    def score(self, X_test, y_test):
        right = 0
        for X, y in zip(X_test, y_test):
            label = self.predict(X)
            if label == y:
                right += 1
        return right / float(len(X_test))
Finally, the Bayes classifier above is used to classify a dataset; the per-class probabilities are computed internally (as shown in `predict` above), and the predicted label and accuracy are printed.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import math

# Load the iris dataset as a (features, labels) pair
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:, :])
    return data[:, :-1], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
model = NaiveBayes()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(model.predict([4.4, 3.2, 1.3, 0.2]))
print(score)
