2.2 Logistic regression

  1. logistic
  2. loss
    1. i-th sample
    2. don't use squared error: it makes the optimization problem non-convex, with many local minima
    3. loss: error on a single training example; cost function: cost of the parameters, the average of the loss over the whole training set
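The loss/cost distinction above can be sketched in numpy (the function names `loss` and `cost` are mine):

```python
import numpy as np

def loss(a, y):
    # Cross-entropy loss for a single training example:
    # L(a, y) = -(y * log(a) + (1 - y) * log(1 - a))
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def cost(A, Y):
    # The cost is the average of the per-example losses over all m samples.
    return float(np.mean(loss(A, Y)))
```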

2.4 Gradient descent
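The update rule itself is one line per parameter; a minimal sketch (the name `gradient_step` is mine):

```python
import numpy as np

def gradient_step(w, b, dw, db, alpha):
    # One gradient-descent update: step against the gradient
    # with learning rate alpha.
    return w - alpha * dw, b - alpha * db
```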

2.7 computational graph
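A tiny computational-graph sketch: forward pass left-to-right, derivatives right-to-left via the chain rule. The concrete function J = 3(a + b*c) is an assumed example:

```python
def forward(a, b, c):
    # Left-to-right pass through the graph
    u = b * c
    v = a + u
    J = 3 * v
    return J

def backward(a, b, c):
    # Right-to-left pass: chain rule
    # dJ/dv = 3; dv/da = 1; dv/du = 1; du/db = c; du/dc = b
    dJ_dv = 3
    dJ_da = dJ_dv * 1
    dJ_du = dJ_dv * 1
    dJ_db = dJ_du * c
    dJ_dc = dJ_du * b
    return dJ_da, dJ_db, dJ_dc
```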

2.9 Gradient descent for logistic regression

  1. dz is just a variable name in the code (it stands for the derivative dL/dz)
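A minimal single-example sketch of the derivatives, assuming column vectors of shape (n, 1); `single_example_grads` and `sigmoid` are my names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_example_grads(w, b, x, y):
    # Forward pass for one example x of shape (n, 1)
    z = np.dot(w.T, x) + b
    a = sigmoid(z)
    # dz is only a variable name: it holds the derivative dL/dz = a - y
    dz = a - y
    dw = x * dz   # dL/dw, same shape as w
    db = dz       # dL/db
    return dw, db
```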

2.10 Gradient descent on m samples

  1. naive implementation; drawback:
    1. needs 2 for-loops: 1) over the samples 2) over the features
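The slow two-loop version can be sketched like this (the `naive_grads` name and the features-by-samples array layout are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_grads(w, b, X, Y):
    # X: (n_x, m), one column per sample; Y: (m,)
    # Two explicit for-loops: outer over the m samples,
    # inner over the n_x features -- this is the slow version.
    n_x, m = X.shape
    dw = np.zeros(n_x)
    db = 0.0
    for i in range(m):             # loop 1: samples
        z = b
        for j in range(n_x):       # loop 2: features (forward)
            z += w[j] * X[j, i]
        a = sigmoid(z)
        dz = a - Y[i]
        for j in range(n_x):       # loop 2: features (backward)
            dw[j] += X[j, i] * dz
        db += dz
    return dw / m, db / m
```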

2.11 Vectorization

z = np.dot(w, x) + b

Broadcasting:

for logistic regression: Z = np.dot(w.T, X) + b (the scalar b is broadcast across all m columns)
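A vectorized sketch over all m samples at once (the name `vectorized_pass` and the column-per-sample layout are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vectorized_pass(w, b, X, Y):
    # X: (n_x, m); w: (n_x, 1); Y: (1, m)
    m = X.shape[1]
    Z = np.dot(w.T, X) + b      # the scalar b is broadcast over all m columns
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m    # (n_x, 1)
    db = np.sum(dZ) / m
    return dw, db
```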

2.15 Broadcasting

works equally along rows and columns; tricks:

  1. initialize (n, 1) column vectors instead of rank-1 arrays of shape (n,)
    1. with a rank-1 array a, the outer product np.dot(a, a.T), which should be a matrix, is actually just a number

cost function justification: maximum likelihood estimation, assuming i.i.d. samples
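Both broadcasting notes above (row/column broadcasting, and the rank-1-array trick) can be demonstrated in a few lines; all numbers here are made up:

```python
import numpy as np

# Broadcasting applies along rows and columns alike:
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])              # (2, 3)
by_col = M / M.sum(axis=0).reshape(1, 3)     # (1, 3) stretched down the rows
by_row = M / M.sum(axis=1).reshape(2, 1)     # (2, 1) stretched across columns

# Rank-1 pitfall: prefer explicit (n, 1) vectors
rng = np.random.default_rng(0)
a = rng.standard_normal(5)                   # shape (5,): rank-1 array, avoid
outer_bad = np.dot(a, a.T)                   # a single number, not a 5x5 matrix
v = rng.standard_normal((5, 1))              # explicit column vector
outer_good = np.dot(v, v.T)                  # (5, 5) outer product, as intended
```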