2.2
- logistic
- loss
- i-th sample
- Squared error is not used: it makes the optimization problem non-convex, with many local minima
- loss → defined on a single training example; cost function → cost of the parameters over the whole training set
- i-th sample
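
A minimal sketch of the loss vs. cost distinction above, assuming the usual cross-entropy loss for logistic regression. The names `sigmoid`, `loss`, and `cost` are illustrative, not from the notes.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z to (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def loss(a, y):
    # Loss for ONE training example (a = prediction, y = label in {0, 1})
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def cost(w, b, X, Y):
    # Cost of the parameters (w, b): average loss over all m examples.
    # X is a list of feature lists, Y is a list of labels.
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += loss(sigmoid(z), y)
    return total / m
```

With all parameters at zero every prediction is 0.5, so the cost equals log 2 regardless of the labels.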
2.4 Gradient
2.7 computational graph
2.9 logistic for gradient
- dz is just the variable name used in code (it stands for the derivative of the loss with respect to z)
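
A small sketch of the backward pass for one example, assuming the cross-entropy loss, for which the derivative with respect to z simplifies to a - y. The function name `single_example_grads` is hypothetical.

```python
import math

def single_example_grads(w, b, x, y):
    # Forward pass: z = w.x + b, a = sigmoid(z)
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    a = 1.0 / (1.0 + math.exp(-z))
    # Backward pass: "dz" is only a variable name for dL/dz
    dz = a - y
    dw = [xj * dz for xj in x]   # dL/dw_j = x_j * dz
    db = dz                      # dL/db = dz
    return dw, db
```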
2.10 m samples
- naive:
Drawbacks:
- two for-loops: 1) over samples, 2) over features
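
Roughly what the naive version looks like, as a sketch: one pass of gradient computation over m samples with explicit loops. The name `naive_gradients` and the list-of-lists data layout are assumptions for illustration.

```python
import math

def naive_gradients(w, b, X, Y):
    # X: list of m samples, each a list of n features; Y: list of m labels
    m, n = len(X), len(w)
    J, db = 0.0, 0.0
    dw = [0.0] * n
    for i in range(m):                  # loop 1: over samples
        z = b
        for j in range(n):              # loop over features (forward)
            z += w[j] * X[i][j]
        a = 1.0 / (1.0 + math.exp(-z))
        J += -(Y[i] * math.log(a) + (1 - Y[i]) * math.log(1 - a))
        dz = a - Y[i]
        for j in range(n):              # loop 2: over features (backward)
            dw[j] += X[i][j] * dz
        db += dz
    # Average the accumulated sums over the m samples
    return J / m, [d / m for d in dw], db / m
```

Both loops run in interpreted Python, which is the cost that vectorization removes.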
2.11 Vectorization
z = np.dot(w.T, x) + b
Broadcasting:
for logistic:
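
A sketch of the vectorized computation for logistic regression, with no explicit loop over samples or features. It assumes X has shape (n, m) with one sample per column, Y has shape (1, m), and w has shape (n, 1); the name `vectorized_gradients` is hypothetical.

```python
import numpy as np

def vectorized_gradients(w, b, X, Y):
    m = X.shape[1]
    Z = np.dot(w.T, X) + b          # (1, m); scalar b is broadcast
    A = 1.0 / (1.0 + np.exp(-Z))    # element-wise sigmoid
    dZ = A - Y                      # (1, m)
    dw = np.dot(X, dZ.T) / m        # (n, 1)
    db = np.sum(dZ) / m             # scalar
    return dw, db
```

A gradient-descent step would then be `w = w - alpha * dw` and `b = b - alpha * db`, with `alpha` a learning rate chosen by the user.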
2.15 Broadcasting
Works the same way for rows and for columns
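
A short illustration of broadcasting along either axis. The array values are made up for the example.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])           # shape (2, 3)
row = np.array([[10.0, 20.0, 30.0]])      # shape (1, 3)
col = np.array([[100.0], [200.0]])        # shape (2, 1)

plus_row = A + row   # row is copied down the 2 rows
plus_col = A + col   # col is copied across the 3 columns

# Column-wise percentages: the (1, 3) column sums are broadcast over rows
percent = 100 * A / A.sum(axis=0, keepdims=True)
```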
trick:
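- Initialize an n x 1 vector rather than a rank-1 array
- With a rank-1 array, the outer product of a, which should be a matrix, comes out as a single number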
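
A quick demonstration of the pitfall, assuming NumPy arrays; the variable names are illustrative.

```python
import numpy as np

a = np.random.randn(5)            # rank-1 array, shape (5,)
a_T_shape = a.T.shape             # still (5,): transpose does nothing
inner = np.dot(a, a.T)            # a single number, not a 5 x 5 matrix

col = np.random.randn(5, 1)       # explicit column vector, shape (5, 1)
outer = np.dot(col, col.T)        # proper outer product, shape (5, 5)

# Cheap safeguards: assert the shape, or reshape a rank-1 array
assert col.shape == (5, 1)
fixed = a.reshape(5, 1)
```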
Maximum likelihood estimation: assumes the samples are i.i.d.
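
A sketch of why the i.i.d. assumption matters: it lets the likelihood of the whole training set factor into a product, so its log is a sum of per-example terms. The notation (\hat{y} for the prediction, \mathcal{L} for the loss, J for the cost) is assumed here, not taken from the notes.

```latex
\begin{align*}
P(y \mid x) &= \hat{y}^{\,y}\,(1-\hat{y})^{1-y}, \qquad y \in \{0, 1\} \\
\log P(y \mid x) &= y\log\hat{y} + (1-y)\log(1-\hat{y}) = -\mathcal{L}(\hat{y}, y) \\
\log \prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}\big)
  &= \sum_{i=1}^{m} \log P\big(y^{(i)} \mid x^{(i)}\big)
   = -\sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big) \\
J(w, b) &= \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)
\end{align*}
```

Maximizing the log-likelihood is therefore the same as minimizing the cost J; the factor 1/m only rescales it.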