Weight Regularization
Weight regularization restrains the weights from growing too large and thereby suppresses overfitting. It works by adding a norm penalty to the loss function; the most common forms are L1 regularization and L2 regularization.
L1 regularization adds the L1 norm to the loss function.
$ L + \alpha \left| \left| w \right| \right|_1 $
The L1 norm is the sum of the absolute values of the weights, and $ \alpha $ is a parameter that controls the amount of regularization.
$ \left| \left| w \right| \right|_1 = \displaystyle \sum_{i=1}^n \left| w_i \right| $
The gradient-descent update is then as follows ($ \eta $ denotes the learning rate).
$\begin{aligned}
w' & = w - \eta \left( { \partial L \over \partial w } + \color{ red }{ { \partial \over \partial w } \alpha \ \displaystyle \sum_{i=1}^n \left| w_i \right| } \right) \\
& = w - \eta \left( { \partial L \over \partial w } + \color{ red }{ \alpha \ \mathrm{sign}(w) } \right)
\end{aligned}$
Differentiating $ \left| w \right| $ leaves only its sign; $ \mathrm{sign}(w) $ denotes the sign of $ w $.
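To make this concrete, here is a minimal NumPy sketch of the L1 penalty and an update step on the regularization term alone (the weight values, $ \alpha $, and learning rate are illustrative, not taken from the models below).
import numpy as np

w = np.array([-0.5, 1.1, 0.07])         # example weights
alpha, lr = 0.1, 0.01                   # regularization strength, learning rate

l1_penalty = alpha * np.sum(np.abs(w))  # alpha * ||w||_1, added to the loss
grad = alpha * np.sign(w)               # alpha * sign(w), added to dL/dw
w_new = w - lr * grad                   # each weight moves a constant step toward zero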
L2 regularization adds (half) the squared L2 norm to the loss function.
$ L + { \alpha \over 2 } \left| \left| w \right| \right|_2^2 $
The penalty is half the sum of the squared weights (the factor of $ { 1 \over 2 } $ simplifies the derivative), and $ \alpha $ is a parameter that controls the amount of regularization.
$ { 1 \over 2 } \left| \left| w \right| \right|_2^2 = { 1 \over 2 } \displaystyle \sum_{i=1}^n w_i^2 $
The gradient-descent update is then as follows.
$\begin{aligned}
w' & = w - \eta \left( { \partial L \over \partial w } + \color{ red }{ { \partial \over \partial w } { \alpha \over 2 } \displaystyle \sum_{i=1}^n w_i^2 } \right) \\
& = w - \eta \left( { \partial L \over \partial w } + \color{ red }{ \alpha \ w } \right)
\end{aligned}$
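Because the update shrinks each weight in proportion to its value, L2 regularization is also called weight decay. A matching NumPy sketch (again with illustrative values):
import numpy as np

w = np.array([-0.5, 1.1, 0.07])            # example weights
alpha, lr = 0.1, 0.01                      # regularization strength, learning rate

l2_penalty = alpha * 0.5 * np.sum(w ** 2)  # alpha * (1/2) * sum(w_i^2), added to the loss
grad = alpha * w                           # derivative of the penalty w.r.t. w
w_new = w - lr * grad                      # each weight shrinks by a factor (1 - lr * alpha)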
TensorFlow provides built-in regularizers that can be used directly (tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer).
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

x = tf.placeholder(tf.float32, [None, 1])

# L1 and L2 regularizers, both with strength 0.1
reg1 = tf.contrib.layers.l1_regularizer(scale=0.1)
reg2 = tf.contrib.layers.l2_regularizer(scale=0.1)

fc1 = tf.layers.dense(x, 2, kernel_regularizer=reg1)
fc2 = tf.layers.dense(x, 2, kernel_regularizer=reg2)

vars = tf.trainable_variables()
# regularization terms registered by the layers
norm = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w1 = sess.run(vars[0])
    w2 = sess.run(vars[2])
    norm_ = sess.run(norm, {x: [[1.]]})

    print('weights')
    print(w1)
    print(w2)
    print()
    print('L1 norm')
    print(norm_[0], np.sum(abs(w1)) * 0.1)
    print('L2 norm')
    print(norm_[1], np.sum(np.square(w2)) * 0.5 * 0.1)
weights
[[-0.5047436 1.1104265]]
[[-0.5299379 0.07715666]]
L1 norm
0.16151701 0.16151701211929323
L2 norm
0.014339368 0.014339368045330049
The norm added to the network's loss function is obtained by summing the regularization terms of every weight the regularizer is applied to.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

x = tf.placeholder(tf.float32, [None, 1])

# the same L2 regularizer is applied to both layers
reg = tf.contrib.layers.l2_regularizer(scale=0.1)

fc1 = tf.layers.dense(x, 2, kernel_regularizer=reg)
fc2 = tf.layers.dense(x, 2, kernel_regularizer=reg)

vars = tf.trainable_variables()
norm = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w1 = sess.run(vars[0])
    w2 = sess.run(vars[2])
    # compute each layer's penalty by hand for comparison
    fc1_ = np.sum(np.square(w1)) * 0.5 * 0.1
    fc2_ = np.sum(np.square(w2)) * 0.5 * 0.1
    norm_ = sess.run(norm, {x: [[1.]]})

    print('weights')
    print(w1)
    print(w2)
    print()
    print('norm')
    print(np.sum(norm_))
    print(fc1_ + fc2_)
weights
[[0.26062596 0.3645985 ]]
[[-0.8904719 -0.8322536]]
norm
0.08432221
0.08432220667600632
Implementation
We apply L2 regularization to the convolutional neural network model.
Load the Fashion MNIST dataset.
import numpy as np
from tensorflow.keras import datasets
(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()
print('data shape:', x_train.shape)
print('target shape:', y_train.shape)
print('target label:', np.unique(y_train, return_counts=True))
data shape: (60000, 28, 28)
target shape: (60000,)
target label: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000, 6000],
dtype=int64))
Reshape the data to include a channel dimension.
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)
Split off 20% of the training data as a validation set.
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)
Define the network.
import numpy as np
import tensorflow as tf

class Model:
    def __init__(self, lr=1e-3, reg=False, lambda_reg=0.01, path_name=''):
        tf.reset_default_graph()

        # L2 regularizer, used only when reg=True
        regularizer = None
        if reg:
            regularizer = tf.contrib.layers.l2_regularizer(scale=lambda_reg)

        with tf.name_scope('input'):
            self.x = tf.placeholder(tf.float32, [None, 28, 28, 1])
            self.y = tf.placeholder(tf.int64)

        with tf.name_scope('preprocessing'):
            x_norm = self.x / 255.0
            y_onehot = tf.one_hot(self.y, 10)

        with tf.name_scope('layer'):
            conv1 = tf.layers.conv2d(x_norm, 32, [3, 3], padding='VALID', activation=tf.nn.relu)
            pool1 = tf.layers.max_pooling2d(conv1, [2, 2], [2, 2], padding='VALID')
            conv2 = tf.layers.conv2d(pool1, 64, [3, 3], padding='VALID', activation=tf.nn.relu)
            pool2 = tf.layers.max_pooling2d(conv2, [2, 2], [2, 2], padding='VALID')
            flat = tf.layers.flatten(pool2)
            fc = tf.layers.dense(flat, 64, tf.nn.relu, kernel_regularizer=regularizer)
            logits = tf.layers.dense(fc, 10, kernel_regularizer=regularizer)

        with tf.name_scope('output'):
            self.predict = tf.argmax(tf.nn.softmax(logits), 1)

        with tf.name_scope('accuracy'):
            self.accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.to_int64(self.predict), self.y), dtype=tf.float32))

        with tf.name_scope('loss'):
            cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_onehot, logits=logits)
            self.loss = tf.reduce_mean(cross_entropy)
            if reg:
                # add the collected regularization terms to the loss
                norm = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
                self.loss += tf.reduce_sum(norm)

        with tf.name_scope('optimizer'):
            self.train_op = tf.train.AdamOptimizer(lr).minimize(self.loss)

        with tf.name_scope('summary'):
            self.summary_loss = tf.placeholder(tf.float32)
            self.summary_accuracy = tf.placeholder(tf.float32)
            tf.summary.scalar('loss', self.summary_loss)
            tf.summary.scalar('accuracy', self.summary_accuracy)
            self.merge = tf.summary.merge_all()
            self.train_writer = tf.summary.FileWriter('./tmp/cnn_fashion_mnist/' + path_name + 'train', tf.get_default_graph())
            self.val_writer = tf.summary.FileWriter('./tmp/cnn_fashion_mnist/' + path_name + 'val', tf.get_default_graph())

        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

    def write_summary(self, tl, ta, vl, va, epoch):
        train_summary = self.sess.run(self.merge, {self.summary_loss: tl, self.summary_accuracy: ta})
        val_summary = self.sess.run(self.merge, {self.summary_loss: vl, self.summary_accuracy: va})
        self.train_writer.add_summary(train_summary, epoch)
        self.val_writer.add_summary(val_summary, epoch)

    def train(self, x_train, y_train, x_val, y_val, epochs, batch_size=32):
        data_size = len(x_train)
        for e in range(epochs):
            t_l, t_a = [], []
            # shuffle the training data every epoch
            idx = np.random.permutation(np.arange(data_size))
            _x_train, _y_train = x_train[idx], y_train[idx]
            for i in range(0, data_size, batch_size):
                si, ei = i, i + batch_size
                if ei > data_size:
                    ei = data_size
                x_batch, y_batch = _x_train[si:ei, :, :], _y_train[si:ei]
                tl, ta, _ = self.sess.run([self.loss, self.accuracy, self.train_op], {self.x: x_batch, self.y: y_batch})
                t_l.append(tl)
                t_a.append(ta)
            vl, va = self.sess.run([self.loss, self.accuracy], {self.x: x_val, self.y: y_val})
            self.write_summary(np.mean(t_l), np.mean(t_a), vl, va, e)
            print('epoch:', e + 1, ' / train_loss:', np.mean(t_l), '/ train_acc:', np.mean(t_a), ' / val_loss:', vl, '/ val_acc:', va)

    def score(self, x, y):
        return self.sess.run(self.accuracy, {self.x: x, self.y: y})
Define the regularizer.
regularizer = None
if reg:
    regularizer = tf.contrib.layers.l2_regularizer(scale=lambda_reg)
Pass the regularizer as the kernel_regularizer argument on the layers it should apply to.
with tf.name_scope('layer'):
    conv1 = tf.layers.conv2d(x_norm, 32, [3, 3], padding='VALID', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1, [2, 2], [2, 2], padding='VALID')
    conv2 = tf.layers.conv2d(pool1, 64, [3, 3], padding='VALID', activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2, [2, 2], [2, 2], padding='VALID')
    flat = tf.layers.flatten(pool2)
    ## regularizer ##
    fc = tf.layers.dense(flat, 64, tf.nn.relu, kernel_regularizer=regularizer)
    logits = tf.layers.dense(fc, 10, kernel_regularizer=regularizer)
Add the norm to the loss function.
with tf.name_scope('loss'):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_onehot, logits=logits)
    self.loss = tf.reduce_mean(cross_entropy)
    if reg:
        norm = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        self.loss += tf.reduce_sum(norm)
Train and test a model without regularization.
model = Model()
model.train(x_train, y_train, x_val, y_val, epochs=20)
model.score(x_test, y_test)
epoch: 1 / train_loss: 0.4899995 / train_acc: 0.82308334 / val_loss: 0.34418163 / val_acc: 0.87675
epoch: 2 / train_loss: 0.32420996 / train_acc: 0.8821458 / val_loss: 0.32503137 / val_acc: 0.8814167
epoch: 3 / train_loss: 0.2795372 / train_acc: 0.89872915 / val_loss: 0.27930364 / val_acc: 0.8980833
epoch: 4 / train_loss: 0.24964015 / train_acc: 0.9086667 / val_loss: 0.25830066 / val_acc: 0.90525
epoch: 5 / train_loss: 0.22372074 / train_acc: 0.91735417 / val_loss: 0.25123677 / val_acc: 0.9059167
epoch: 6 / train_loss: 0.20405148 / train_acc: 0.92397916 / val_loss: 0.23442969 / val_acc: 0.91391665
epoch: 7 / train_loss: 0.18366154 / train_acc: 0.93202084 / val_loss: 0.25422248 / val_acc: 0.906
epoch: 8 / train_loss: 0.16686963 / train_acc: 0.93939584 / val_loss: 0.24268445 / val_acc: 0.91508335
epoch: 9 / train_loss: 0.1534329 / train_acc: 0.94364583 / val_loss: 0.24112262 / val_acc: 0.91525
epoch: 10 / train_loss: 0.13780946 / train_acc: 0.9493542 / val_loss: 0.23617037 / val_acc: 0.91541666
epoch: 11 / train_loss: 0.12407387 / train_acc: 0.9548333 / val_loss: 0.26025593 / val_acc: 0.91616666
epoch: 12 / train_loss: 0.112011366 / train_acc: 0.9585417 / val_loss: 0.2636378 / val_acc: 0.916
epoch: 13 / train_loss: 0.10216832 / train_acc: 0.96141666 / val_loss: 0.2987651 / val_acc: 0.91066664
epoch: 14 / train_loss: 0.09138283 / train_acc: 0.9656875 / val_loss: 0.29418647 / val_acc: 0.9174167
epoch: 15 / train_loss: 0.08280988 / train_acc: 0.96975 / val_loss: 0.31967795 / val_acc: 0.91425
epoch: 16 / train_loss: 0.075191446 / train_acc: 0.97283334 / val_loss: 0.3187853 / val_acc: 0.9095
epoch: 17 / train_loss: 0.06787912 / train_acc: 0.9746042 / val_loss: 0.3280637 / val_acc: 0.91525
epoch: 18 / train_loss: 0.06372369 / train_acc: 0.9764583 / val_loss: 0.34051234 / val_acc: 0.91608334
epoch: 19 / train_loss: 0.05584117 / train_acc: 0.9784167 / val_loss: 0.36320615 / val_acc: 0.9105
epoch: 20 / train_loss: 0.053313486 / train_acc: 0.9807708 / val_loss: 0.42804903 / val_acc: 0.9059167
0.8992
Overfitting sets in after about 10 epochs.
We compare regularization-strength values of 0.0001, 0.001, and 0.01.
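The three runs below could also be driven by a single loop over the candidate values (a sketch that simply reuses the Model class defined above):
for lambda_reg in [0.0001, 0.001, 0.01]:
    model = Model(reg=True, lambda_reg=lambda_reg,
                  path_name='weight_decay_{}_'.format(lambda_reg))
    model.train(x_train, y_train, x_val, y_val, epochs=20)
    print('lambda:', lambda_reg, '/ test_acc:', model.score(x_test, y_test))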
With the parameter set to 0.0001, the regularization is too weak and overfitting still occurs.
model = Model(reg=True, lambda_reg=0.0001, path_name='weight_decay_0.0001_')
model.train(x_train, y_train, x_val, y_val, epochs=20)
model.score(x_test, y_test)
epoch: 1 / train_loss: 0.51481104 / train_acc: 0.8163125 / val_loss: 0.41753772 / val_acc: 0.85225
epoch: 2 / train_loss: 0.345886 / train_acc: 0.8766875 / val_loss: 0.32372278 / val_acc: 0.88841665
epoch: 3 / train_loss: 0.30285904 / train_acc: 0.89360416 / val_loss: 0.31370842 / val_acc: 0.89208335
epoch: 4 / train_loss: 0.27255118 / train_acc: 0.9043125 / val_loss: 0.28951547 / val_acc: 0.90108335
epoch: 5 / train_loss: 0.24855728 / train_acc: 0.91485417 / val_loss: 0.29960057 / val_acc: 0.89675
epoch: 6 / train_loss: 0.23295376 / train_acc: 0.92170835 / val_loss: 0.28628847 / val_acc: 0.90316665
epoch: 7 / train_loss: 0.21696827 / train_acc: 0.9271875 / val_loss: 0.2921477 / val_acc: 0.902
epoch: 8 / train_loss: 0.20461519 / train_acc: 0.9319792 / val_loss: 0.27835524 / val_acc: 0.90925
epoch: 9 / train_loss: 0.1896936 / train_acc: 0.939875 / val_loss: 0.3034753 / val_acc: 0.90358335
epoch: 10 / train_loss: 0.17822354 / train_acc: 0.94325 / val_loss: 0.3268171 / val_acc: 0.9
epoch: 11 / train_loss: 0.17124838 / train_acc: 0.94602084 / val_loss: 0.2925508 / val_acc: 0.91108334
epoch: 12 / train_loss: 0.1610266 / train_acc: 0.95104164 / val_loss: 0.32277325 / val_acc: 0.90291667
epoch: 13 / train_loss: 0.15321763 / train_acc: 0.9546458 / val_loss: 0.31398556 / val_acc: 0.907
epoch: 14 / train_loss: 0.1440719 / train_acc: 0.95783335 / val_loss: 0.32070822 / val_acc: 0.90833336
epoch: 15 / train_loss: 0.13773532 / train_acc: 0.960125 / val_loss: 0.33933806 / val_acc: 0.90608335
epoch: 16 / train_loss: 0.13126785 / train_acc: 0.96283334 / val_loss: 0.34860352 / val_acc: 0.90891665
epoch: 17 / train_loss: 0.12526983 / train_acc: 0.96520835 / val_loss: 0.3467158 / val_acc: 0.909
epoch: 18 / train_loss: 0.119342156 / train_acc: 0.9675625 / val_loss: 0.34521878 / val_acc: 0.9098333
epoch: 19 / train_loss: 0.11426294 / train_acc: 0.9703125 / val_loss: 0.38304567 / val_acc: 0.90875
epoch: 20 / train_loss: 0.11378552 / train_acc: 0.9705625 / val_loss: 0.37654424 / val_acc: 0.91125
0.9094
With the parameter set to 0.001, overfitting does not occur and test performance improves slightly.
model = Model(reg=True, lambda_reg=0.001, path_name='weight_decay_0.001_')
model.train(x_train, y_train, x_val, y_val, epochs=20)
model.score(x_test, y_test)
epoch: 1 / train_loss: 0.55039114 / train_acc: 0.8218333 / val_loss: 0.4208436 / val_acc: 0.87133336
epoch: 2 / train_loss: 0.39176953 / train_acc: 0.87941664 / val_loss: 0.37457174 / val_acc: 0.88558334
epoch: 3 / train_loss: 0.3536052 / train_acc: 0.8917292 / val_loss: 0.3548428 / val_acc: 0.89208335
epoch: 4 / train_loss: 0.3287368 / train_acc: 0.90052086 / val_loss: 0.34722483 / val_acc: 0.8950833
epoch: 5 / train_loss: 0.31021914 / train_acc: 0.907625 / val_loss: 0.35307658 / val_acc: 0.89208335
epoch: 6 / train_loss: 0.29850483 / train_acc: 0.9120625 / val_loss: 0.3384591 / val_acc: 0.89933336
epoch: 7 / train_loss: 0.28659913 / train_acc: 0.9167917 / val_loss: 0.31821793 / val_acc: 0.9065833
epoch: 8 / train_loss: 0.27779076 / train_acc: 0.9196042 / val_loss: 0.34279007 / val_acc: 0.8980833
epoch: 9 / train_loss: 0.26766524 / train_acc: 0.9232708 / val_loss: 0.3185545 / val_acc: 0.9059167
epoch: 10 / train_loss: 0.25809973 / train_acc: 0.9271042 / val_loss: 0.31635404 / val_acc: 0.90716666
epoch: 11 / train_loss: 0.2516942 / train_acc: 0.9291667 / val_loss: 0.33354875 / val_acc: 0.9030833
epoch: 12 / train_loss: 0.24395902 / train_acc: 0.93291664 / val_loss: 0.3235069 / val_acc: 0.90716666
epoch: 13 / train_loss: 0.23825896 / train_acc: 0.93379164 / val_loss: 0.32964057 / val_acc: 0.9026667
epoch: 14 / train_loss: 0.23224322 / train_acc: 0.93604165 / val_loss: 0.34270942 / val_acc: 0.90216666
epoch: 15 / train_loss: 0.22589256 / train_acc: 0.93854165 / val_loss: 0.32007968 / val_acc: 0.91108334
epoch: 16 / train_loss: 0.21986204 / train_acc: 0.94083333 / val_loss: 0.32736585 / val_acc: 0.90816665
epoch: 17 / train_loss: 0.2155721 / train_acc: 0.9426875 / val_loss: 0.3209575 / val_acc: 0.912
epoch: 18 / train_loss: 0.211696 / train_acc: 0.94416666 / val_loss: 0.3179196 / val_acc: 0.91366667
epoch: 19 / train_loss: 0.20734118 / train_acc: 0.94677085 / val_loss: 0.32665992 / val_acc: 0.91225
epoch: 20 / train_loss: 0.2022699 / train_acc: 0.9475833 / val_loss: 0.31761256 / val_acc: 0.91358334
0.9106
With the parameter set to 0.01, the regularization is too strong and the model underfits.
model = Model(reg=True, lambda_reg=0.01, path_name='weight_decay_0.01_')
model.train(x_train, y_train, x_val, y_val, epochs=20)
model.score(x_test, y_test)
epoch: 1 / train_loss: 0.7216794 / train_acc: 0.8040625 / val_loss: 0.58291274 / val_acc: 0.83925
epoch: 2 / train_loss: 0.50784343 / train_acc: 0.8576667 / val_loss: 0.49030805 / val_acc: 0.8609167
epoch: 3 / train_loss: 0.4646731 / train_acc: 0.8681667 / val_loss: 0.4526889 / val_acc: 0.87375
epoch: 4 / train_loss: 0.43478292 / train_acc: 0.8775833 / val_loss: 0.47555774 / val_acc: 0.8635833
epoch: 5 / train_loss: 0.4173219 / train_acc: 0.8821458 / val_loss: 0.41809207 / val_acc: 0.8805
epoch: 6 / train_loss: 0.40076128 / train_acc: 0.88591665 / val_loss: 0.4433061 / val_acc: 0.87191665
epoch: 7 / train_loss: 0.38866675 / train_acc: 0.89079165 / val_loss: 0.43044075 / val_acc: 0.8745
epoch: 8 / train_loss: 0.3777382 / train_acc: 0.89391667 / val_loss: 0.39097238 / val_acc: 0.88958335
epoch: 9 / train_loss: 0.36583894 / train_acc: 0.8960417 / val_loss: 0.3819606 / val_acc: 0.89225
epoch: 10 / train_loss: 0.35933015 / train_acc: 0.8998333 / val_loss: 0.4025547 / val_acc: 0.8825833
epoch: 11 / train_loss: 0.35212287 / train_acc: 0.90122914 / val_loss: 0.37651968 / val_acc: 0.89225
epoch: 12 / train_loss: 0.34527814 / train_acc: 0.9032292 / val_loss: 0.37082198 / val_acc: 0.8933333
epoch: 13 / train_loss: 0.33904266 / train_acc: 0.9048125 / val_loss: 0.3604129 / val_acc: 0.89525
epoch: 14 / train_loss: 0.33481276 / train_acc: 0.9063333 / val_loss: 0.35880318 / val_acc: 0.89675
epoch: 15 / train_loss: 0.33108407 / train_acc: 0.90708333 / val_loss: 0.3690926 / val_acc: 0.8955
epoch: 16 / train_loss: 0.32683825 / train_acc: 0.9093958 / val_loss: 0.36419964 / val_acc: 0.8979167
epoch: 17 / train_loss: 0.3242014 / train_acc: 0.908125 / val_loss: 0.35681537 / val_acc: 0.8965833
epoch: 18 / train_loss: 0.31887272 / train_acc: 0.9120625 / val_loss: 0.3517346 / val_acc: 0.8995
epoch: 19 / train_loss: 0.31730837 / train_acc: 0.9111875 / val_loss: 0.3501565 / val_acc: 0.89933336
epoch: 20 / train_loss: 0.3142373 / train_acc: 0.9133125 / val_loss: 0.34369817 / val_acc: 0.90216666
0.9005
Because of the underfitting, the gap between the training and validation learning curves stays small.
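For reference, the same L2 regularization can also be applied with the tf.keras layer API instead of the deprecated tf.contrib module. Note that tf.keras.regularizers.l2(l) computes l * sum(w^2), without the 1/2 factor used by tf.contrib.layers.l2_regularizer. This is a minimal sketch, not the exact model used above:
from tensorflow.keras import layers, models, regularizers

reg = regularizers.l2(0.001)  # adds 0.001 * sum(w^2) per regularized layer to the loss
model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation='relu', kernel_regularizer=reg),
    layers.Dense(10, kernel_regularizer=reg),
])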