step44. A Layer that Collects Parameters

_๋„๋… 2023. 2. 23. 00:50

๐Ÿ“ข ๋ณธ ํฌ์ŠคํŒ…์€ ๋ฐ‘๋ฐ”๋‹ฅ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹3์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ฐฐ์šด ๋‚ด์šฉ์„ ๊ธฐ๋กํ•˜๊ณ , ๊ฐœ์ธ์ ์ธ ๊ณต๋ถ€๋ฅผ ์œ„ํ•ด ์ž‘์„ฑํ•˜๋Š” ํฌ์ŠคํŒ…์ž…๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๊ต์žฌ ๊ตฌ๋งค๋ฅผ ๊ฐ•๋ ฅ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

 

 

์•ž ๋‹จ๊ณ„์—์„œ๋Š” ๋‹จ์ˆœํ•˜์ง€๋งŒ ์‹ ๊ฒฝ๋ง์„ ๊ตฌํ˜„ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์ด์ œ DeZero๋Š” ์‹ ๊ฒฝ๋ง ํ”„๋ ˆ์ž„์›Œํฌ์˜ ๊ตฌ์ƒ‰์„ ๊ฐ–์ถ”๊ณ  ์žˆ์ง€๋งŒ, ์‚ฌ์šฉ ํŽธ์˜์„ฑ ์ธก๋ฉด์—์„œ๋Š” ๋ณด์™„ํ•  ์ ์ด ๋งŽ์Šต๋‹ˆ๋‹ค. ์ด๋ฒˆ ๋‹จ๊ณ„์—์„œ๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋‹ด๋Š” ๊ตฌ์กฐ๋ฅผ ๋งŒ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Parameter์™€ Layer๋ผ๋Š” ํด๋ž˜์Šค๋ฅผ ๊ตฌํ˜„ํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด ๋งค๊ฐœ๋ณ€์ˆ˜ ๊ด€๋ฆฌ๋ฅผ ์ž๋™ํ™”ํ•ฉ๋‹ˆ๋‹ค.

 

Parameter ํด๋ž˜์Šค๋Š” ์•„๋ž˜ ๋‚ด์šฉ์ด ๋‹ค์ž…๋‹ˆ๋‹ค. Variable ํด๋ž˜์Šค์™€ ๋™์ผํ•œ ๊ธฐ๋Šฅ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. ๋ฌผ๋ก  Variable ์ธ์Šคํ„ด์Šค์™€ Parameter ์ธ์Šคํ„ด์Šค๋Š” ๊ตฌ๋ณ„์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

class Parameter(Variable):
    pass
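
As a quick check (following the book's own demonstration), the two classes behave identically in computation, yet their instances can be told apart with isinstance:

import numpy as np
from dezero import Variable

x = Variable(np.array(1.0))
p = Parameter(np.array(2.0))
y = x * p  # a Parameter can be used in expressions just like a Variable

print(isinstance(p, Parameter))  # True
print(isinstance(x, Parameter))  # False
print(isinstance(y, Parameter))  # False -- the result is a plain Variable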

 

Layer ํด๋ž˜์Šค๋Š” Function ํด๋ž˜์Šค์ฒ˜๋Ÿผ ๋ณ€์ˆ˜๋ฅผ ๋ณ€ํ™˜ํ•˜๋Š” ํด๋ž˜์Šค์ง€๋งŒ, Function ํด๋ž˜์Šค์™€ ๋‹ฌ๋ฆฌ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค. ์„ค๋ช…์€ ์•„๋ž˜์—์„œ ์ด์–ด๋‚˜๊ฐ€๊ฒ ์Šต๋‹ˆ๋‹ค.

import weakref


class Layer:
    def __init__(self):
        self._params = set()

    def __setattr__(self, name, value):
        # Called whenever an instance attribute is set; if the value is a
        # Parameter, remember its name in self._params.
        if isinstance(value, Parameter):
            self._params.add(name)
        super().__setattr__(name, value)

    def __call__(self, *inputs):
        outputs = self.forward(*inputs)
        if not isinstance(outputs, tuple):
            outputs = (outputs,)
        # Keep inputs/outputs as weak references to avoid reference cycles
        self.inputs = [weakref.ref(x) for x in inputs]
        self.outputs = [weakref.ref(y) for y in outputs]
        return outputs if len(outputs) > 1 else outputs[0]

    def forward(self, inputs):
        raise NotImplementedError()

    def params(self):
        # Yield every Parameter held by this layer
        for name in self._params:
            yield self.__dict__[name]

    def cleargrads(self):
        # Reset the gradient of every parameter
        for param in self.params():
            param.cleargrad()

 

Layer ํด๋ž˜์Šค๋Š” _params๋ผ๋Š” ์ธ์Šคํ„ด์Šค ๋ณ€์ˆ˜๋ฅผ ๋‘๊ณ , ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋ณด๊ด€ํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ __setattr__์€ ์ธ์Šคํ„ด์Šค ๋ณ€์ˆ˜๋ฅผ ์„ค์ •ํ•  ๋•Œ ํ˜ธ์ถœ๋˜๋Š” ํŠน์ˆ˜ ๋ฉ”์„œ๋“œ๋กœ, ์ด ๋ฉ”์„œ๋“œ๋ฅผ ์žฌ์ •์˜ํ•˜์—ฌ ์ปค์Šคํ…€ ๋กœ์ง์„ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” value๊ฐ€ Parameter ์ธ์Šคํ„ด์Šค์ธ ๊ฒฝ์šฐ self._params์— name์„ ์ถ”๊ฐ€ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

 

The __call__ method is implemented much like the one in the Function class. The params method yields the Parameter instances held by the Layer instance, and the cleargrads method resets the gradients of all parameters.
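
Continuing the sketch above, params() yields each registered Parameter, and cleargrads() resets all of their gradients in a single call:

for param in layer.params():
    print(param)  # prints the data of p1 and p2 (order may vary)

layer.cleargrads()  # same as calling cleargrad() on every parameter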

 

Layer ํด๋ž˜์Šค์˜ ๊ฒฝ์šฐ base ํด๋ž˜์Šค์ด๊ณ , ๋‹ค๋ฅธ ํด๋ž˜์Šค๋“ค์€ ์ด๋ฅผ ์ƒ์†๋ฐ›์•„ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฒˆ์—๋Š” Linear ํด๋ž˜์Šค๋ฅผ ๊ตฌํ˜„ํ•ฉ๋‹ˆ๋‹ค. __init__์˜ ์ธ์ˆ˜๋Š” ์ˆœ์„œ๋Œ€๋กœ ์ถœ๋ ฅ ํฌ๊ธฐ, ํŽธํ–ฅ ์‚ฌ์šฉ ์—ฌ๋ถ€ ํ”Œ๋ž˜๊ทธ, ๋ฐ์ดํ„ฐ ํƒ€์ž…, ์ž…๋ ฅ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ํŠน์ดํ•œ ์ ์€ ๊ฐ€์ค‘์น˜ W๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์‹œ์ ์„ ๋Šฆ์ถ˜ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ฐ€์ค‘์น˜๋ฅผ __init__ ๋ฉ”์„œ๋“œ๊ฐ€ ์•„๋‹Œ forward ๋ฉ”์„œ๋“œ์—์„œ ์ƒ์„ฑํ•จ์œผ๋กœ์จ Linear ํด๋ž˜์Šค์˜ ์ž…๋ ฅ ํฌ๊ธฐ๋ฅผ ์ž๋™์œผ๋กœ ๊ฒฐ์ •ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๊ฐ€ ์ง€์ •ํ•˜์ง€ ์•Š์•„๋„ ๋˜๋Š” ๊ฒƒ์ด์ฃ .

class Linear(Layer):
    def __init__(self, out_size, nobias=False, dtype=np.float32, in_size=None):
        super().__init__()
        self.in_size = in_size
        self.out_size = out_size
        self.dtype = dtype

        self.W = Parameter(None, name='W')
        # If in_size is not specified, defer the weight initialization
        if self.in_size is not None:
            self._init_W()

        if nobias:
            self.b = None
        else:
            self.b = Parameter(np.zeros(out_size, dtype=dtype), name='b')

    def _init_W(self):
        I, O = self.in_size, self.out_size
        # Random values scaled by sqrt(1/I) to keep activations at a sensible scale
        W_data = np.random.randn(I, O).astype(self.dtype) * np.sqrt(1 / I)
        self.W.data = W_data

    def forward(self, x):
        # Initialize W at the moment data first flows through the layer
        if self.W.data is None:
            self.in_size = x.shape[1]
            self._init_W()

        y = F.linear(x, self.W, self.b)
        return y
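
A quick check of the deferred initialization (a sketch; the shapes here are illustrative): the weight takes its shape from the first batch that flows through the layer.

import numpy as np
import dezero.layers as L

linear = L.Linear(3)       # only the output size is specified
x = np.random.rand(5, 4)   # a batch of 5 samples with 4 features
y = linear(x)              # W is created here, on the first forward pass
print(linear.W.shape)      # (4, 3) -- in_size was inferred from x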

 

Now for a test. This code does the same job as the test in the previous step, yet it is much more concise, because parameter management is now handled by the Linear instances. In the next step we will improve things further by bundling multiple Layers into a single class.

import numpy as np
from dezero import Variable
import dezero.functions as F
import dezero.layers as L


# dataset
np.random.seed(0)
x = np.random.rand(100, 1)
y = np.sin(2 * np.pi * x) + np.random.rand(100, 1)

l1 = L.Linear(10)  # output size
l2 = L.Linear(1)


def predict(x):
    y = l1(x)
    y = F.sigmoid(y)
    y = l2(y)
    return y


lr = 0.2
iters = 10000

for i in range(iters):
    y_pred = predict(x)
    loss = F.mean_squared_error(y, y_pred)

    l1.cleargrads()
    l2.cleargrads()
    loss.backward()

    # Gradient descent: update every parameter of both layers
    for l in [l1, l2]:
        for p in l.params():
            p.data -= lr * p.grad.data
    if i % 1000 == 0:
        print(loss)