
step46. Updating Parameters with an Optimizer

_๋„๋… 2023. 2. 25. 00:25

๐Ÿ“ข This post is based on the book Deep Learning from Scratch 3. It is written to record what I have learned and for my own personal study. For the full details, I strongly recommend buying the book.


So far we have updated parameters with plain gradient descent. Real model training, however, uses a variety of optimization techniques, and we want to be able to switch between them easily. In this step we therefore modularize the parameter-update code.


First, we prepare a base class for updating parameters. It initializes two instance variables, target and hooks. target stores an object that holds the parameters, that is, a Model or a Layer. The actual parameter update is carried out in update_one, which each subclass overrides. hooks is used for techniques such as weight decay or gradient clipping.

class Optimizer:
    def __init__(self):
        self.target = None
        self.hooks = []
    
    def setup(self, target):
        self.target = target
        return self
    
    def update(self):
        params = [p for p in self.target.params() if p.grad is not None]
		
        # Preprocessing (optional): run registered hooks such as weight decay
        for f in self.hooks:
            f(params)

        # Update each parameter
        for param in params:
            self.update_one(param)
    
    def update_one(self, param):
        raise NotImplementedError()
    
    def add_hook(self, f):
        self.hooks.append(f)
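
To see where hooks fit in, here is a minimal sketch of a weight-decay hook (my own illustration; the hook classes shipped with the DeZero repository may differ in detail). Any callable that takes the parameter list can be registered with add_hook, and it will run right before the updates.

# Sketch of a weight-decay hook: adds rate * param to each gradient
# (i.e. L2 regularization) before update_one is called for each parameter.
class WeightDecay:
    def __init__(self, rate):
        self.rate = rate

    def __call__(self, params):
        for param in params:
            param.grad.data += self.rate * param.data

Registering it is then a one-liner, e.g. optimizer.add_hook(WeightDecay(1e-4)).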


Now let's implement the SGD class by inheriting from Optimizer. SGD stands for Stochastic Gradient Descent; here, "stochastic" means that gradient descent is performed on data selected at random from the dataset.

class SGD(Optimizer):
    def __init__(self, lr=0.01):
        super().__init__()
        self.lr = lr
    
    def update_one(self, param):
        param.data -= self.lr * param.grad.data


์ด์ œ ์ด์ „ ๋‹จ๊ณ„์—์„œ ํ’€์—ˆ๋˜ ๋ฌธ์ œ๋ฅผ ๋‹ค์‹œ ํ’€์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. MLP ํด๋ž˜์Šค๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜๊ณ , SGD ํด๋ž˜์Šค๋กœ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ฐฑ์‹ ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „๋ณด๋‹ค ํ›จ์”ฌ ๊น”๋”ํ•ด์กŒ์Šต๋‹ˆ๋‹ค.

import numpy as np
from dezero import Variable
from dezero import optimizers
import dezero.functions as F
from dezero.models import MLP


np.random.seed(0)
x = np.random.rand(100, 1)
y = np.sin(2 * np.pi * x) + np.random.rand(100, 1)

lr = 0.2
max_iter = 10000
hidden_size = 10

model = MLP((hidden_size, 1))
optimizer = optimizers.SGD(lr).setup(model)

for i in range(max_iter):
    y_pred = model(x)
    loss = F.mean_squared_error(y, y_pred)

    model.cleargrads()
    loss.backward()

    optimizer.update()
    if i % 1000 == 0:
        print(loss)
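
Because the update logic is now modularized, switching to a different optimizer only means swapping one line. For example, a momentum-based variant can be written by overriding update_one alone. The sketch below follows the standard Momentum update rule and assumes Optimizer is importable from dezero.optimizers, as in the repository; the repository's own MomentumSGD may differ in detail.

import numpy as np
from dezero.optimizers import Optimizer


# Sketch of a momentum optimizer: keeps one velocity array per parameter and
# blends the previous velocity with the current gradient before updating.
class MomentumSGD(Optimizer):
    def __init__(self, lr=0.01, momentum=0.9):
        super().__init__()
        self.lr = lr
        self.momentum = momentum
        self.vs = {}

    def update_one(self, param):
        v_key = id(param)
        if v_key not in self.vs:
            self.vs[v_key] = np.zeros_like(param.data)

        v = self.vs[v_key]
        v *= self.momentum
        v -= self.lr * param.grad.data
        param.data += v

In the training script above, only the optimizer line changes, e.g. optimizer = MomentumSGD(lr).setup(model).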


SGD ์ด์™ธ์˜ ์ถ”๊ฐ€์ ์ธ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์€ ์•„๋ž˜ ๊นƒํ—ˆ๋ธŒ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ•ด์ฃผ์„ธ์š”.


GitHub - WegraLee/deep-learning-from-scratch-3: ใ€Ž๋ฐ‘๋ฐ”๋‹ฅ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ โธใ€ (ํ•œ๋น›๋ฏธ๋””์–ด, 2020)
https://github.com/WegraLee/deep-learning-from-scratch-3