
Machine Learning (Week 4): Neural Networks

Posted in Tech

Lecture 1: Neural Networks

Why We Need Neural Networks

  • When the number of features is very large (as in computer vision problems), adding higher-order polynomial terms makes the derived feature set explode to a size the computer can no longer handle (see the rough count below).
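As a rough illustration (assuming a 100×100-pixel grayscale image, so \(n = 10^4\) raw features, which is not a number from the original notes), the quadratic terms alone already number in the tens of millions:

\[\class{myMJSmall}{ n = 100 \times 100 = 10^4 \quad\Rightarrow\quad \frac{n(n+1)}{2} \approx 5 \times 10^7 \text{ terms of the form } x_i x_j }\]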

Origins

Algorithms that try to mimic the brain. Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: a state-of-the-art technique for many applications.

  • One learning algorithm

A single region of the brain (e.g., the auditory cortex) can learn to process auditory, visual, or tactile information (as shown by neural rewiring experiments). This suggests there may be one learning algorithm capable of handling many different tasks.

Model Representation
  • The simplest representation (a single neuron)
\[\class{myMJSmall}{ \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \end{bmatrix} \rightarrow \begin{bmatrix} \ \ \ \end{bmatrix} \rightarrow h_\theta(x) }\]

Analogy to a neuron: the input features \(x_1\cdots x_n\) play the role of the dendrites, and the output \(h_\theta(x)\) plays the role of the axon.
\(x_0\) is always equal to 1 and is called the bias unit; it is often omitted from diagrams and is not counted when computing layer sizes.
Neural networks also use \(\frac{1}{1+e^{-\theta^Tx}}\), called the sigmoid (logistic) activation function.
The parameters \(\theta\) are sometimes also called weights.
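A minimal sketch of this single-neuron model in Python/NumPy (the names `sigmoid` and `neuron` and the weight values are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic activation: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, x):
    """One neuron: prepend the bias unit x_0 = 1, then apply g(theta^T x)."""
    x = np.concatenate(([1.0], x))  # bias unit x_0 = 1
    return sigmoid(theta @ x)

# theta holds one weight per input feature plus one for the bias unit.
theta = np.array([-1.0, 0.5, 0.5, 0.5])
print(neuron(theta, np.array([1.0, 0.0, 1.0])))  # g(0) = 0.5
```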

  • Adding layers
\[\class{myMJSmall}{ \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \end{bmatrix} \rightarrow \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \\ \end{bmatrix} \rightarrow \begin{bmatrix} a_1^{(3)} \\ \end{bmatrix} \rightarrow h_\theta(x) }\]

The layer containing \(x_0,x_1\cdots\) is layer 1, called the input layer.
The final layer (layer 3 here), whose output is \(h_\Theta(x)\), is called the output layer.
Any layer between the input and output layers (layer 2 here) is called a hidden layer.
The units \(a_0^{(2)},a_1^{(2)}\cdots\) in the hidden layer are called activation units.

  • Computing the activation units
\[\class{myMJSmall}{ \begin{align*} a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \\ a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \\ a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \\ h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \\ \end{align*} }\]

\(a_i^{(j)}\) denotes activation unit \(i\) in layer \(j\).
\(\Theta^{(j)}\) denotes the matrix of parameters mapping layer \(j\) to layer \(j+1\).
The dimension of \(\Theta^{(j)}\) is \(s_{j+1}\times (s_j+1)\), where \(s_j\) is the number of activation units in layer \(j\) (excluding the bias unit \(a_0\)).
In the network above, \(\Theta^{(1)}\) is \(3\times4\) and \(\Theta^{(2)}\) is \(1\times 4\).
The function \(g\) is the sigmoid function.
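These four equations unroll directly into code. Below is a sketch with the same shape (3 inputs, 3 hidden units, 1 output); the random weights are placeholders, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes derived above: Theta1 is 3 x 4 (layer 1 -> 2), Theta2 is 1 x 4 (layer 2 -> 3).
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))
Theta2 = rng.standard_normal((1, 4))

x = np.array([1.0, 0.0, 1.0])      # features x_1, x_2, x_3
a1 = np.concatenate(([1.0], x))    # layer 1 with bias x_0 = 1
a2 = sigmoid(Theta1 @ a1)          # a_1^(2), a_2^(2), a_3^(2)
a2 = np.concatenate(([1.0], a2))   # add bias a_0^(2) = 1
h = sigmoid(Theta2 @ a2)           # h_Theta(x) = a_1^(3)
print(h)
```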

  • Vectorized representation
\[\class{myMJSmall}{ a_1^{(2)} = g(z_1^{(2)}) \\ a_2^{(2)} = g(z_2^{(2)}) \\ a_3^{(2)} = g(z_3^{(2)}) \\ \\ a_1^{(3)} = g(z_1^{(3)}) \\ }\]

where

\[\class{myMJSmall}{ z_k^{(2)} = \Theta_{k,0}^{(1)}x_0+\Theta_{k,1}^{(1)}x_1 +\cdots + \Theta_{k,n}^{(1)}x_n \\ z_k^{(3)} = \Theta_{k,0}^{(2)}a_0^{(2)}+\Theta_{k,1}^{(2)}a_1^{(2)} +\cdots + \Theta_{k,n}^{(2)}a_n^{(2)} \\ }\]

More generally,

\[\class{myMJSmall}{ z_k^{(j)} = \Theta_{k,0}^{(j-1)}a_0^{(j-1)} + \Theta_{k,1}^{(j-1)}a_1^{(j-1)}+\cdots+\Theta_{k,n}^{(j-1)}a_n^{(j-1)} \\ }\]

\[\class{myMJSmall}{ x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n\\ \end{bmatrix} \quad a^{(j)} = \begin{bmatrix} a_0^{(j)} \\ a_1^{(j)} \\ \vdots \\ a_n^{(j)} \\ \end{bmatrix} \quad z^{(j)} = \begin{bmatrix} z_1^{(j)} \\ z_2^{(j)} \\ \vdots \\ z_n^{(j)} \\ \end{bmatrix} }\]

\[\class{myMJSmall}{ a^{(1)} = x }\]

Then the vectorized form is (adding the bias unit \(a_0^{(j)} = 1\) to \(a^{(j)}\) before computing the next layer):

\[\class{myMJSmall}{ z^{(j)} = \Theta^{(j-1)}a^{(j-1)}\\ a^{(j)} = g(z^{(j)})\\ j \in \{2,3,\cdots\} }\]
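The vectorized rule is just a loop over the weight matrices. A minimal sketch (the helper name `forward` and the list-of-matrices layout are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(thetas, x):
    """Forward propagation: a^(1) = x, then for each layer j >= 2,
    z^(j) = Theta^(j-1) a^(j-1) and a^(j) = g(z^(j)),
    inserting the bias unit a_0 = 1 before every multiplication."""
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # bias unit
        a = sigmoid(theta @ a)          # a^(j) = g(z^(j))
    return a  # the final activation, i.e. h_Theta(x)
```

With `thetas = [Theta1, Theta2]` this reproduces the three-layer computation above.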

Examples

  • AND
\[\class{myMJSmall}{ \begin{align*}\begin{bmatrix}x_0 \newline x_1 \newline x_2\end{bmatrix} \rightarrow\begin{bmatrix}g(z^{(2)})\end{bmatrix} \rightarrow h_\Theta(x)\end{align*} }\]

Suppose we have two binary features \(x_1, x_2 \in \{0, 1\}\).

Let \(\class{myMJSmall}{\Theta^{(1)} = \begin{bmatrix}-30 & 20 & 20 \end{bmatrix}}\); this yields the following truth table:

\[\class{myMJSmall}{\begin{array}{cc|c} x_1 & x_2 & h_\theta(x) \\ \hline 0 & 0 & g(-30) \approx 0 \\ 0 & 1 & g(-10) \approx 0 \\ 1 & 0 & g(-10) \approx 0 \\ 1 & 1 & g(10) \approx 1 \\ \end{array} }\]
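The table is easy to verify numerically; a quick sketch using the weights above:

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))
theta = np.array([-30.0, 20.0, 20.0])  # weights for bias, x_1, x_2

for x1 in (0, 1):
    for x2 in (0, 1):
        h = g(theta @ np.array([1.0, x1, x2]))  # x_0 = 1 is the bias unit
        print(x1, x2, int(round(h)))  # 1 only when x1 = x2 = 1
```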
  • OR and NOR

Similarly, we obtain:

\[\class{myMJSmall}{ \text{NOR}: \Theta^{(1)} = \begin{bmatrix} 10 & -20 & -20 \end{bmatrix} \\ \text{OR}: \Theta^{(1)} = \begin{bmatrix}-10 & 20 & 20\end{bmatrix} }\]
  • XNOR

Combining AND, NOR, and OR in a single neural network yields XNOR:

\[\class{myMJSmall}{ \begin{align*}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \rightarrow \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \end{bmatrix} \rightarrow \begin{bmatrix} a^{(3)} \end{bmatrix} \rightarrow h_\Theta(x) \end{align*} }\] \[\class{myMJSmall}{ \Theta^{(1)} =\begin{bmatrix}-30 & 20 & 20 \newline 10 & -20 & -20\end{bmatrix} \\ \Theta^{(2)} =\begin{bmatrix}-10 & 20 & 20\end{bmatrix} }\]

The step-by-step computation is shown in the following figure. [Figure: XNOR network]
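Wiring the two weight matrices together reproduces XNOR end to end; a sketch using the same sigmoid as before:

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))

Theta1 = np.array([[-30.0,  20.0,  20.0],    # row 1: AND
                   [ 10.0, -20.0, -20.0]])   # row 2: NOR
Theta2 = np.array([[-10.0,  20.0,  20.0]])   # OR over the hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = np.array([1.0, x1, x2])                  # input with bias x_0 = 1
        a2 = np.concatenate(([1.0], g(Theta1 @ a1)))  # hidden layer with bias
        h = g(Theta2 @ a2)[0]
        print(x1, x2, int(round(h)))  # XNOR: 1 for (0,0) and (1,1)
```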

Multiclass Classification

Here the output layer contains multiple \(h_\Theta(x)\) values, i.e., the output is a vector. Each component represents the probability that the input belongs to the corresponding class, and the class with the highest probability is taken as the prediction.

For example, with \(n\) features and 4 output classes:

\[\class{myMJSmall}{ \begin{align*}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \rightarrow \begin{bmatrix} a_0^{(2)} \\ a_1^{(2)} \\ a_2^{(2)} \\ \vdots \end{bmatrix} \rightarrow \begin{bmatrix} a_0^{(3)} \\ a_1^{(3)} \\ a_2^{(3)} \\ \vdots \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} h_\theta(x)_1 \\ h_\theta(x)_2 \\ h_\theta(x)_3 \\ h_\theta(x)_4 \\ \end{bmatrix} \end{align*} }\]

[Figure: a multiclass classification network]
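Prediction then reduces to taking the largest output component. A sketch with assumed layer sizes and random stand-in weights (a trained network would use learned parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes for illustration: 8 features, 5 hidden units, 4 classes.
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((5, 8 + 1))  # layer 1 -> layer 2
Theta2 = rng.standard_normal((4, 5 + 1))  # layer 2 -> output layer

x = rng.standard_normal(8)
a2 = sigmoid(Theta1 @ np.concatenate(([1.0], x)))
h = sigmoid(Theta2 @ np.concatenate(([1.0], a2)))  # one score per class
print("class scores:", h)
print("prediction:", int(np.argmax(h)) + 1)        # most probable class
```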

Study Materials

Slides and notes; Octave programming assignment
