ื”ืจืฆืื” 11 - CNN

(Classical) Gradient Descent

The update step of gradient descent is given by:

\boldsymbol{\theta}^{(t+1)}=\boldsymbol{\theta}^{(t)}-\eta\nabla_{\boldsymbol{\theta}}g(\boldsymbol{\theta}^{(t)})
  1. In ERM:

    \underset{\boldsymbol{\theta}}{\arg\min} \underbrace{\frac{1}{N}\sum_{i=1}^N l(h(\boldsymbol{x}^{(i)};\boldsymbol{\theta}),y^{(i)})}_{g(\boldsymbol{\theta};\mathcal{D})}
  2. In MLE:

    \underset{\boldsymbol{\theta}}{\arg\min} \underbrace{-\sum_{i=1}^N \log(p_{\text{y}|\mathbf{x}}(y^{(i)}|\boldsymbol{x}^{(i)};\boldsymbol{\theta}))}_{g(\boldsymbol{\theta};\mathcal{D})}

(Classical) Gradient Descent

In ERM:

\underset{\boldsymbol{\theta}}{\arg\min} \underbrace{\frac{1}{N}\sum_{i=1}^N l(h(\boldsymbol{x}^{(i)};\boldsymbol{\theta}),y^{(i)})}_{g(\boldsymbol{\theta};\mathcal{D})}

In MLE:

\underset{\boldsymbol{\theta}}{\arg\min} \underbrace{-\sum_{i=1}^N \log(p_{\text{y}|\mathbf{x}}(y^{(i)}|\boldsymbol{x}^{(i)};\boldsymbol{\theta}))}_{g(\boldsymbol{\theta};\mathcal{D})}
  • The gradient contains a sum over the entire dataset, which can be problematic when the dataset is large.
  • We would like a computation that uses only part of the dataset at each step.
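
A minimal sketch of a full-batch gradient descent loop; the linear least-squares model and all the values below are assumptions chosen purely for illustration:

```python
import numpy as np

# Full-batch gradient descent on an assumed linear least-squares problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # N = 1000 samples, 5 features
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
eta = 0.1
for t in range(100):
    grad = X.T @ (X @ theta - y) / len(X)       # the gradient sums over the whole dataset
    theta = theta - eta * grad                  # theta^(t+1) = theta^(t) - eta * grad
```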

Stochastic Gradient Descent

  • At each step the gradient is computed from a single sample of the dataset, using a different sample at every step.
  • Two options for choosing the sample at each step are:

    1. Draw a different random sample at each step.
    2. Go over the samples in the dataset sequentially.
  • Each sample points in a direction different from the gradient of the full sum, but on average the overall direction matches the direction of the sum.
  • The computation is very fast, but the gradient is very "noisy".
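
A sketch of the single-sample update on the same assumed least-squares setup as in the previous snippet:

```python
import numpy as np

# Stochastic gradient descent: one randomly drawn sample per step (option 1 above).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
eta = 0.01
for t in range(10_000):
    i = rng.integers(len(X))                    # a different random sample at each step
    residual = X[i] @ theta - y[i]
    theta = theta - eta * residual * X[i]       # fast but noisy single-sample update
```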

Stochastic Gradient Descent

Advantages:

  1. The cost of an iteration does not depend on the dataset size.
  2. Memory savings.


From https://www.stat.cmu.edu/~ryantibs/convexopt/lectures/stochastic-gd.pdf

Mini-Batch Gradient Descent

  • ืคืชืจื•ืŸ ื‘ื™ื ื™ื™ื.
  • ื‘ืฉื™ื˜ื” ื–ื• ื ืฉืชืžืฉ ื‘ืงื‘ื•ืฆืช ื“ื’ื™ืžื•ืช ืžืชื•ืš ื”ืžื“ื’ื ื”ืžื›ื•ื ื” mini-batch. ื‘ื›ืœ ืฆืขื“ ืื ื• ื ื—ืœื™ืฃ ืืช ื” mini-batch.
  • ื”ืฉื™ื˜ื” ื”ื ืคื•ืฆื” ื‘ื™ื•ืชืจ ืœืื™ืžื•ืŸ ืฉืœ ืจืฉืชื•ืช ื ื•ื™ืจื•ื ื™ื.
  • ื’ื“ืœื™ื ืื•ืคื™ื™ื ื™ื ืฉืœ ื” mini-batch ื”ื™ื ื 32-256 ื“ื’ื™ืžื•ืช.

Terminology

  • Epoch: a full pass over the dataset.
  • A mini-batch is often referred to simply as a batch.
  • In many libraries the gradient descent algorithm appears under the name stochastic gradient descent.

Early stopping of gradient descent

  • Another successful way to prevent overfitting is to stop the gradient descent algorithm before it converges.
  • This is done by computing the objective on the validation set and choosing the parameters that minimize this objective.
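
A sketch of early stopping on the assumed least-squares setup from the previous snippets: the objective is tracked on a validation set and the parameters that minimize it are kept.

```python
import numpy as np

# Early stopping: keep the parameters with the lowest validation objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * rng.normal(size=1000)
X_train, y_train, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

theta = np.zeros(5)
eta = 0.1
best_theta, best_val = theta.copy(), np.inf
for t in range(200):
    grad = X_train.T @ (X_train @ theta - y_train) / len(X_train)
    theta = theta - eta * grad
    val_objective = np.mean((X_val @ theta - y_val) ** 2)
    if val_objective < best_val:                 # remember the best parameters seen so far
        best_val, best_theta = val_objective, theta.copy()
```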

Convolutional Neural Networks (CNN)

  • In an MLP, the representation power can be increased by enlarging the network (the number of layers and their width).
  • As in any parametric model, increasing the representation power also increases the overfitting.
  • A network with a good architecture is actually one with low representation power that is still able to approximate well the function it is trying to model.
  • In certain cases an architecture called a convolutional neural network (CNN) answers exactly these requirements. Its original motivation came from the field of image processing.

Convolutional Neural Networks (CNN)

So far we have assumed that the features are given to us. What happens when the input is a natural signal - an image, audio, etc.?

Convolutional Neural Networks (CNN)

Motivation for introducing structure into neural networks that operate on images

  • In natural images there is a high correlation between nearby pixels, which decays with distance.
  • Images have a hierarchical structure of local features (edges, shapes, objects).
  • Object recognition is invariant to location.
  • Objects in an image are distinguished by lines in different orientations.
  • Small illumination changes and noise should not affect recognition.

Question: can these elements be built into a neural network?

Convolution Layer

We will demonstrate this for a one-dimensional input

  1. Each neuron in this layer is fed only by a limited number of values from its immediate neighborhood.
  2. All the neurons in a given layer are identical (weight sharing).

Convolution Layer

The operation of the convolution layer can be viewed as follows:

Convolution Layer

Mathematically, the layer performs the following three operations:

  1. A cross-correlation operation (not a convolution) between the input vector \boldsymbol{x} and a weight vector \boldsymbol{w} of length K.
  2. Adding a bias b (optional).
  3. Applying an activation function to the output vector element-wise.

Convolution Layer

The cross-correlation operation is defined as follows:

y_i=\sum_{m=1}^K x_{i+m-1}w_m

The weight vector \boldsymbol{w} of the convolution layer is called the convolution kernel.
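
A minimal sketch of these three operations for a 1D input (NumPy; the ReLU activation and the toy values are assumptions for illustration):

```python
import numpy as np

def conv1d_layer(x, w, b=0.0):
    """A sketch of a 1D convolution layer: cross-correlation, bias, activation."""
    K = len(w)
    y = np.array([x[i:i + K] @ w for i in range(len(x) - K + 1)])  # y_i = sum_m x_{i+m-1} w_m
    return np.maximum(y + b, 0.0)                                  # bias, then element-wise ReLU

x = np.array([1., 2., -1., 0., 3.])
w = np.array([1., 0., -1.])                                        # the convolution kernel
print(conv1d_layer(x, w))                                          # output of length D_in - K + 1 = 3
```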

Convolution Layer

  • The output of the convolution layer is smaller than the input, and its size is given by D_{\text{out}}=D_{\text{in}}-K+1.
  • A FC layer has D_{\text{in}}\times D_{\text{out}} weights plus another D_{\text{out}} bias terms.
  • A convolution layer has K weights and a single bias term.
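
A quick check of these counts with the assumed sizes D_in = 100 and K = 5:

```python
# Comparing parameter counts for assumed sizes D_in = 100, K = 5.
D_in, K = 100, 5
D_out = D_in - K + 1                      # output size of the convolution layer

fc_params = D_in * D_out + D_out          # weights + biases of a fully connected layer
conv_params = K + 1                       # K weights + a single bias
print(D_out, fc_params, conv_params)      # 96, 9696, 6
```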

Convolution Layer

Multi-channel input

In many cases we want the convolution layer to receive multi-channel input, for example an image with three color channels or audio recorded from several input channels.

Multi-channel output

We usually want to use more than one convolution kernel; in these cases we produce several output channels, one for each of the convolution kernels.

There is no weight sharing between the different output channels.

Multi-channel output

  • C_\text{in} - the number of input channels.
  • C_\text{out} - the number of output channels.
  • K - the kernel size.

Number of parameters in the layer: \underbrace{C_\text{in}\times C_\text{out}\times K}_\text{the weights}+\underbrace{C_\text{out}}_\text{the bias}.
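
A sketch of this parameter count, assuming PyTorch's Conv1d as the implementation (the lecture itself does not fix a framework):

```python
import torch

# A 1D convolution layer with C_in = 3 input channels, C_out = 8 output channels
# and kernel size K = 5 (assumed values).
C_in, C_out, K = 3, 8, 5
conv = torch.nn.Conv1d(C_in, C_out, kernel_size=K)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params, C_in * C_out * K + C_out)   # both 128: C_in*C_out*K weights + C_out biases
```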

Padding

If we want to preserve the size of the output vector of the convolution layer, we can pad the input vector with zeros. For example:

This allows the kernel to slide along the entire image.

Stride

Sometimes we actually want to reduce the size of the output vector by some factor. One way to do this is by subsampling the output. In practice there is no need to compute the output values that are thrown away, so the convolution can be computed in jumps of a certain size, called the stride.

Reduces the computational cost by performing downsampling.

Dilation

In other cases we want to enlarge the region from which a given neuron collects its input, without increasing the number of parameters or the computational complexity. To do so, we can subsample the way the input is read in order to widen the input region. Unless stated otherwise, the dilation of a layer (the density at which the input is sampled) is 1.


Max / Average Pooling

Motivation: reducing the spatial resolution, for example for object recognition.

Additional layers that appear in many CNNs are pooling layers. Two common pooling layers are max pooling and average pooling; such a layer takes the maximum or the average of its input values.

This example shows max pooling with a window of size 2 and a stride of 2 as well:

This layer has no learned parameters.

2D Convolutional Layer

Parameter configurations illustrated in the accompanying figures:

  • kernel size=3, padding=0, stride=1, dilation=1
  • kernel size=4, padding=2, stride=1, dilation=1
  • kernel size=3, padding=1, stride=1, dilation=1 (half padding)
  • kernel size=3, padding=2, stride=1, dilation=1 (full padding)
  • kernel size=3, padding=0, stride=2, dilation=1
  • kernel size=3, padding=1, stride=2, dilation=1
  • kernel size=3, padding=1, stride=2, dilation=1
  • kernel size=3, padding=0, stride=1, dilation=2

CNN Architecture

The architecture of the first CNN, presented in 1989.


From https://d2l.ai/chapter_convolutional-neural-networks/lenet.html
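
A sketch of a LeNet-style network in PyTorch, following the structure in the d2l.ai reference above; the exact channel sizes, activations and the 28x28 input are assumptions for illustration:

```python
import torch
from torch import nn

# A LeNet-style CNN: convolution + pooling blocks followed by fully connected layers.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

x = torch.randn(1, 1, 28, 28)      # a single 28x28 grayscale image
print(lenet(x).shape)              # torch.Size([1, 10])
```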

CNN Architecture

CNN

Why are CNNs so good at certain problems?

  • CNNs are very good at classifying images according to their content.
  • The reason CNNs are well suited to this problem is, among other things, that the two properties that distinguish a convolution layer from FC layers match the representation of the solution.
  • We will address each of the two properties separately.

Each neuron depends only on its immediate neighborhood

Each neuron sees only its immediate neighborhood, so the network has to analyze the image in a hierarchical manner:


Receptive Field

  • The size of the region that affects a neuron in a given layer is called its receptive field.
  • For example, the receptive field of a neuron in the third layer is 7.
  • In addition, the pooling layers reduce the dimensions and thereby increase the receptive field.
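
A minimal sketch of this computation, assuming a stack of convolutions with kernel size 3, stride 1 and no pooling:

```python
# Each convolution layer with kernel size K (stride 1) adds K - 1 to the receptive field.
def receptive_field(num_layers, kernel_size=3):
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
    return rf

print(receptive_field(3))   # 7, as in the example above
```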

ื—ื™ืœื•ืฅ ืžืืคื™ื™ื ื™ื ืžื”ืชืžื•ื ื”

ื ื“ื’ื™ื ืืช ื”ืคืขื•ืœื” ืฉืžื‘ืฆืขืช ื”ืฉื›ื‘ื” ื”ืจืืฉื•ื ื” ื‘ืจืฉืช ืืฉืจ ืžื ืกื” ืœื–ื”ื•ืช ื”ืื ื‘ืชืžื•ื ื” ืžืกื•ื™ื™ืžืช ืžื•ืคื™ืข ืคืจืฆื•ืฃ.



The convolution kernels of the first layers sweep over the image and look for basic patterns such as vertical stripes, horizontal stripes, corners, small dots, etc.

ื—ื™ืœื•ืฅ ืžืืคื™ื™ื ื™ื ืžื”ืชืžื•ื ื”

ื›ืœ ื’ืจืขื™ืŸ ื™ื™ืฆืจ ืขืจื•ืฅ ืืฉืจ ืžืชืื™ื ืœืชื•ืคืขื” ืฉืื•ืชื” ื”ื•ื ืžื—ืคืฉ:

  • ื”ืฉื›ื‘ื•ืช ื”ื‘ืื•ืช ื‘ืจืฉืช ื™ื—ืคืฉื• ืื•ื‘ื™ื™ืงื˜ื™ื ืืฉืจ ืžื•ืจื›ื‘ื™ื ืžื”ืชื•ืคืขื•ืช ืฉืžืฆืื• ื”ืฉื›ื‘ื•ืช ื”ืจืืฉื•ื ื•ืช.
  • ืœื“ื•ื’ืžื ื ื—ืคืฉ ืื™ื–ื•ืจื™ื ืฉืžื›ื™ืœื™ื ื”ืจื‘ื” ืคืกื™ื ืื ื›ื™ื™ื ื‘ื›ื“ื™ ืœื–ื”ื•ืช ืฉื™ืขืจ, ืื• ืฉื ื™ ืคืกื™ื ืื•ืคืงื™ื™ื ืกืžื•ื›ื™ื ืฉืขืฉื•ื™ื™ื ืœื”ื›ื™ืœ ืฉืคืชื™ื™ื. ืกื•ื’ ื–ื” ืฉืœ ืขื™ื‘ื•ื“ ืžื™ื“ืข ื“ื•ืžื” ืœืžื” ืฉืžื‘ืฆืขื™ื ื‘ืžืขืจื›ืช ื”ืจืื™ื™ื” ืฉืœ ื™ื•ื ืงื™ื.

Weight sharing

The other property of the convolution layer is that the weights are shared among all the neurons in the same layer and channel.

Why this does not limit the network:

  1. The classification of the image should not be affected if the object in the image is shifted slightly to the side.
  2. The operations performed by the first layers, such as searching for horizontal and vertical lines, are common to all regions of the image.

ืกื™ื›ื•ื - ื™ืชืจื•ื ื•ืช ื’ื™ืฉืช ื” CNN

  • ืขื•ื‘ื“ ื™ืฉื™ืจื•ืช ืขืœ ื”ืงืœื˜ ื”ืžืงื•ืจื™ - ืชืžื•ื ื”
  • ื”ื™ืกื˜ื•ืจื™ืช - ื”ืฆืœื—ื” ืžืฉืžืขื•ืชื™ืช ืจืืฉื•ื ื” ื‘ืฉื™ืคื•ืจ ื‘ื™ืฆื•ืขื™ื ืžืฉืžืขื•ืชื™ (2012)
  • ืžืืคื™ื™ื ื™ื ืžืงื•ืžื™ื™ื ืชื•ืคืกื™ื ื”ื™ื˜ื‘ ืชื›ื•ื ื•ืช ืฉืœ ืชืžื•ื ื•ืช ื•ื‘ืฉื™ืœื•ื‘ื ืžืืคืฉืจื™ื ืฉื™ืœื•ื‘ ื”ื™ืจืจื›ื™
  • ืžืืคื™ื™ื ื™ื ืจืœื‘ื ื˜ื™ื™ื ื ืœืžื“ื™ื ืื•ื˜ื•ืžื˜ื™ืช (ื‘ืื•ืคืŸ ื”ื™ืจืจื›ื™) โ€“ ืจื–ื•ืœื•ืฆื™ื” ืžืฉืชื ื”
  • ืื™ื ื•ืจื™ืื ื˜ื™ื•ืช ืœื”ื–ื–ื•ืช ื•ื—ืกื™ื ื•ืช ื‘ืคื ื™ ืฉื™ื ื•ื™ื™ื ื‘ื ืชื•ื ื™ื
  • ืฉื™ืชื•ืฃ ืคืจืžื˜ืจื™ื โ€“ ื”ืงื˜ื ื” ืžืฉืžืขื•ืชื™ืช ื•ืžื ื™ืขืช ื”ืชืืžืช ื™ืชืจ, ื—ื™ืกื›ื•ืŸ ื‘ื—ื™ืฉื•ื‘ ื•ื–ื™ื›ืจื•ืŸ
  • ื”ืชืืžื•ืช ื•ื”ืจื—ื‘ื•ืช ืœื™ื™ืฉื•ืžื™ื ืื—ืจื™ื (ืื•ื“ื™ื•, ื•ื•ื™ื“ืื•)
  • ืฉื™ืœื•ื‘ ืžื•ืฉื›ืœ ื•ื™ืขื™ืœ ื‘ื™ืŸ ื™ื“ืข ืžื•ืงื“ื (ืžื‘ื ื” ื”ืจืฉืช) ืœื ืชื•ื ื™ื ืžื”ืขื•ืœื

Batch Normalization (not on the exam)

  • One of the problems when working with deep networks is a situation where the values at the outputs of different layers are of different orders of magnitude.
  • This affects the gradients and makes it hard to choose the step size.
  • One way to try to ensure that the outputs are of roughly the same order of magnitude is to add a layer called batch normalization.

Batch Normalization (not on the exam)

  • Attempts to normalize the values that pass through it (brings the mean of the values to 0 and the standard deviation to 1).
  • It does this by computing the empirical mean and standard deviation of the values over the batch.

Batch Normalization (not on the exam)

\boldsymbol{\mu}=\frac{1}{M}\sum_{i=1}^M \boldsymbol{z}_{\text{in}}^{(i)}, \qquad \sigma^2=\frac{1}{M}\sum_{i=1}^M (\boldsymbol{z}_{\text{in}}^{(i)}-\boldsymbol{\mu})^2

The output of the layer is:

\boldsymbol{z}_{\text{out}}=\frac{\boldsymbol{z}_{\text{in}}-\boldsymbol{\mu}}{\sigma+\epsilon}

Batch Normalization (not on the exam)

Usually the layer also includes a learned linear transformation with parameters \gamma and \beta:

\boldsymbol{z}_{\text{out}}=\frac{\boldsymbol{z}_{\text{in}}-\boldsymbol{\mu}}{\sigma+\epsilon}\cdot\gamma+\beta

where \gamma and \beta are vectors of the same length as \boldsymbol{z} and the multiplication by \gamma is element-wise.
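
A minimal sketch of this forward pass during training (NumPy; the shapes and values are assumed):

```python
import numpy as np

# Batch normalization forward pass: Z_in holds a batch of M vectors, one per row.
def batch_norm_forward(Z_in, gamma, beta, eps=1e-5):
    mu = Z_in.mean(axis=0)                        # empirical mean over the batch
    sigma = Z_in.std(axis=0)                      # empirical standard deviation over the batch
    Z_norm = (Z_in - mu) / (sigma + eps)          # mean 0, standard deviation ~1
    return Z_norm * gamma + beta, mu, sigma       # learned element-wise affine transform

Z_in = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
gamma, beta = np.ones(4), np.zeros(4)
Z_out, mu, sigma = batch_norm_forward(Z_in, gamma, beta)
print(Z_out.mean(axis=0).round(3), Z_out.std(axis=0).round(3))
```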

ืื—ืจื™ ืฉืœื‘ ื”ืื™ืžื•ืŸ

ื‘ืžื”ืœืš ื”ืœื™ืžื•ื“ ืžื—ื–ื™ืงื™ื ืžืžื•ืฆืข ื ืข (exponantial moving average) ืฉืœ ื”ืขืจื›ื™ื ฮผ\mu ื• ฯƒ\sigma ื•ื‘ืกื•ืฃ ืฉืœื‘ ื”ืœื™ืžื•ื“ ืžืงื‘ืขื™ื ืืช ื”ืขืจื›ื™ื ืฉืœื”ื ื•ืืœื• ื”ืขืจื›ื™ื ืฉื‘ื”ื ื”ืจืฉืช ืชืฉืชืžืฉ ืœืื—ืจ ืฉืœื‘ ื”ืื™ืžื•ืŸ.
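
A short sketch of these running statistics (the momentum value and the batch stream below are assumptions):

```python
import numpy as np

# Exponential moving average of mu and sigma, updated once per training batch;
# after training the frozen values replace the batch statistics.
momentum = 0.9
running_mu, running_sigma = np.zeros(4), np.ones(4)

batches = np.split(np.random.default_rng(0).normal(5.0, 3.0, size=(320, 4)), 10)
for Z_batch in batches:
    mu, sigma = Z_batch.mean(axis=0), Z_batch.std(axis=0)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_sigma = momentum * running_sigma + (1 - momentum) * sigma

# At inference time the frozen running statistics are used instead:
z = np.random.default_rng(1).normal(5.0, 3.0, size=4)
z_out = (z - running_mu) / (running_sigma + 1e-5)
```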