
step 43. Neural Networks

도녁 · 2023. 2. 19. 21:23

πŸ“’ λ³Έ ν¬μŠ€νŒ…μ€ λ°‘λ°”λ‹₯λΆ€ν„° μ‹œμž‘ν•˜λŠ” λ”₯λŸ¬λ‹3을 기반으둜 μž‘μ„±ν•˜μ˜€μŠ΅λ‹ˆλ‹€. 배운 λ‚΄μš©μ„ κΈ°λ‘ν•˜κ³ , 개인적인 곡뢀λ₯Ό μœ„ν•΄ μž‘μ„±ν•˜λŠ” ν¬μŠ€νŒ…μž…λ‹ˆλ‹€. μžμ„Έν•œ λ‚΄μš©μ€ ꡐ재 ꡬ맀λ₯Ό κ°•λ ₯ μΆ”μ²œλ“œλ¦½λ‹ˆλ‹€.

 

 

이번 λ‹¨κ³„μ—μ„œλŠ” μ „ λ‹¨κ³„μ—μ„œ κ΅¬ν˜„ν–ˆλ˜ μ„ ν˜• νšŒκ·€λ₯Ό μ‹ κ²½λ§μœΌλ‘œ ν™•μž₯μ‹œν‚€λ„λ‘ ν•˜κ² μŠ΅λ‹ˆλ‹€. μš°μ„  μ„ ν˜• λ³€ν™˜μ„ DeZero의 linear ν•¨μˆ˜λ‘œ κ΅¬ν˜„ν•΄λ³΄κ² μŠ΅λ‹ˆλ‹€. $y = F.matmul(x, W) + b$와 같이 μž…λ ₯ x와 λ§€κ°œλ³€μˆ˜ Wλ₯Ό ν–‰λ ¬ κ³±ν•˜κ³ , bλ₯Ό λ”ν•˜λŠ” 것을 μ„ ν˜• λ³€ν™˜(linear transformation) ν˜Ήμ€ μ•„ν•€ λ³€ν™˜(affine transformation)이라고 ν•©λ‹ˆλ‹€. (μ—„λ°€νžˆ λ§ν•˜λ©΄ bλ₯Ό μ œμ™Έν•œ 것이 μ„ ν˜• λ³€ν™˜μž…λ‹ˆλ‹€.) μ΄λ•Œ μ„ ν˜• λ³€ν™˜μ€ μ‹ κ²½λ§μ—μ„œ 완전연결계측(fully connected later)에 ν•΄λ‹Ήν•©λ‹ˆλ‹€.

 

μ„ ν˜• λ³€ν™˜μ€ μ•„λž˜ 두 κ°€μ§€ 방식을 톡해 κ΅¬ν˜„ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μ™Όμͺ½μ€ DeZero의 matmul ν•¨μˆ˜μ™€ +(add ν•¨μˆ˜)λ₯Ό μ΄μš©ν•˜λŠ”λ°, μ΄λ•Œ matmul ν•¨μˆ˜μ˜ 좜λ ₯은 Variable μΈμŠ€ν„΄μŠ€μ΄κΈ° λ•Œλ¬Έμ— 계산 κ·Έλž˜ν”„μ— κΈ°λ‘λ©λ‹ˆλ‹€. 계산 κ·Έλž˜ν”„κ°€ μ‘΄μž¬ν•˜λŠ” λ™μ•ˆμ—λŠ” Variable μΈμŠ€ν„΄μŠ€μ™€ κ·Έ μ•ˆμ— λ‹΄κΈ΄ ndarray μΈμŠ€ν„΄μŠ€λŠ” λ©”λͺ¨λ¦¬μ— 계속 λ‚¨κ²Œ λ©λ‹ˆλ‹€. ν•œνŽΈ 였λ₯Έμͺ½μ˜ Linear 클래슀λ₯Ό ν™œμš©ν•˜λŠ” 방법은 쀑간 κ²°κ³Ό tκ°€ λ³΄μ‘΄λ˜μ§€ μ•ŠκΈ° λ•Œλ¬Έμ— λ©”λͺ¨λ¦¬λ₯Ό 효율적으둜 μ‚¬μš©ν•©λ‹ˆλ‹€.

 

 

μ΄λ•Œ t의 λ°μ΄ν„°λŠ” + μ—­μ „νŒŒμ— ν•„μš”κ°€ μ—†μŠ΅λ‹ˆλ‹€. matmul μ—­μ „νŒŒ μ—­μ‹œ λ§ˆμ°¬κ°€μ§€μž…λ‹ˆλ‹€. +의 μ—­μ „νŒŒλŠ” 좜λ ₯ μͺ½ 기울기λ₯Ό λ‹¨μˆœνžˆ 흘리기 λ•Œλ¬Έμž…λ‹ˆλ‹€. λ”°λΌμ„œ λ‹€μŒκ³Ό 같이 κ΅¬ν˜„ κ°€λŠ₯ν•©λ‹ˆλ‹€. 쀑간 결과인 t 데이터λ₯Ό μ‚­μ œν•΄μ£ΌλŠ” 것이죠.

from dezero.functions import matmul  # DeZero's matmul (see dezero/functions.py)

def linear_simple(x, W, b=None):
    t = matmul(x, W)
    if b is None:
        return t

    y = t + b
    t.data = None  # Release t.data (the ndarray); backward of + and matmul does not need it
    return y
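
Here is a quick usage sketch (arbitrary example shapes), continuing from the linear_simple defined above, confirming that the backward pass still works even though t.data has been released:

import numpy as np
from dezero import Variable

x = Variable(np.random.rand(4, 3))   # 4 samples, 3 features
W = Variable(np.random.rand(3, 5))
b = Variable(np.zeros(5))

y = linear_simple(x, W, b)
y.backward()                 # neither + nor matmul needs t.data for backward
print(W.grad.shape)          # (3, 5)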

 

제 4κ³ μ§€μ—μ„œλŠ” μƒλž΅λ˜λŠ” μ½”λ“œ μ„€λͺ…이 λ§Žμ€λ°, μ•„λž˜ κΉƒν—ˆλΈŒμ—μ„œ μžμ„Έν•œ κ΅¬ν˜„μ„ 확인해볼 수 μžˆμŠ΅λ‹ˆλ‹€. Linear 클래슀 μ—­μ‹œ λ§ˆμ°¬κ°€μ§€μž…λ‹ˆλ‹€. dezero의 functions.py에 κ΅¬ν˜„λ˜μ–΄ μžˆμŠ΅λ‹ˆλ‹€.

 

GitHub - WegraLee/deep-learning-from-scratch-3: 『밑바닥부터 시작하는 딥러닝 ❸』 (한빛미디어, 2020)
https://github.com/WegraLee/deep-learning-from-scratch-3

 

Now let's implement a neural network on a dataset of our own. Unlike last time, we will generate a non-linear dataset.

import numpy as np

np.random.seed(0)
x = np.random.rand(100, 1)
y = np.sin(2 * np.pi * x) + np.random.rand(100, 1)  # sin curve plus noise

 

μ΄λŸ¬ν•œ λ°μ΄ν„°μ…‹μ—μ„œ x와 yλŠ” μ„ ν˜• 관계가 μ•„λ‹ˆκΈ° λ•Œλ¬Έμ— μ„ ν˜• νšŒκ·€λ‘œ ν’€ 수 μ—†μŠ΅λ‹ˆλ‹€. λ”°λΌμ„œ 신경망이 ν•„μš”ν•©λ‹ˆλ‹€. μ‹ κ²½λ§μ—μ„œλŠ” μ„ ν˜• λ³€ν™˜μ˜ 좜λ ₯에 λΉ„μ„ ν˜• λ³€ν™˜μ„ μˆ˜ν–‰ν•˜κ²Œ λ˜λŠ”λ°, 이 λΉ„μ„ ν˜• λ³€ν™˜μ„ ν™œμ„±ν™” ν•¨μˆ˜λΌκ³  ν•©λ‹ˆλ‹€. μ‹œκ·Έλͺ¨μ΄λ“œλ‚˜ 렐루 같은 ν•¨μˆ˜λ“€μ΄μ£ .

 

μ„ ν˜• λ³€ν™˜κ³Ό ν™œμ„±ν™” ν•¨μˆ˜λ₯Ό ν™œμš©ν•˜μ—¬ 신경망 μ½”λ“œλ₯Ό κ΅¬ν˜„ν•΄λ³΄κ² μŠ΅λ‹ˆλ‹€.

import numpy as np
import matplotlib.pyplot as plt
from dezero import Variable
import dezero.functions as F


# non-linear dataset
np.random.seed(0)
x = np.random.rand(100, 1)
y = np.sin(2 * np.pi * x) + np.random.rand(100, 1)

# κ°€μ€‘μΉ˜ μ΄ˆκΈ°ν™”
I, H, O = 1, 10, 1
W1 = Variable(0.01 * np.random.randn(I, H))
b1 = Variable(np.zeros(H))
W2 = Variable(0.01 * np.random.randn(H, O))
b2 = Variable(np.zeros(O))


# 신경망 μΆ”λ‘ 
def predict(x):
    y = F.linear(x, W1, b1)
    y = F.sigmoid(y)
    y = F.linear(y, W2, b2)
    return y


lr = 0.2
iters = 10000

# 신경망 ν•™μŠ΅
for i in range(iters):
    y_pred = predict(x)
    loss = F.mean_squared_error(y, y_pred)

    W1.cleargrad()
    b1.cleargrad()
    W2.cleargrad()
    b2.cleargrad()
    loss.backward()

    W1.data -= lr * W1.grad.data
    b1.data -= lr * b1.grad.data
    W2.data -= lr * W2.grad.data
    b2.data -= lr * b2.grad.data
    if i % 1000 == 0:
        print(loss)


# Plot
plt.scatter(x, y, s=10)
plt.xlabel('x')
plt.ylabel('y')
t = np.arange(0, 1, 0.1)[:, np.newaxis]
y_pred = predict(t)
plt.plot(t, y_pred.data, color='r')
plt.show()

variable(0.8473695850105871)
variable(0.2514286285183606)
variable(0.2475948546674987)
variable(0.2378612044705481)
variable(0.21222231333102912)
variable(0.16742181117834126)
variable(0.09681932619992642)
variable(0.07849528290602334)
variable(0.07749729552991154)
variable(0.07722132399559321)

 

ν•™μŠ΅μ„ μ™„λ£Œν•˜λ©΄ λ‹€μŒκ³Ό 같은 κ²°κ³Όλ₯Ό 얻을 수 μžˆμŠ΅λ‹ˆλ‹€. (μœ„μ˜ μ½”λ“œλ₯Ό κ·ΈλŒ€λ‘œ 돌리면 곑선이 μ•„λž˜μ²˜λŸΌ λ§€λ„λŸ½μ§€λŠ” μ•ŠμŠ΅λ‹ˆλ‹€.) κ²°κ³Όλ₯Ό ν™•μΈν•˜λ©΄ sin ν•¨μˆ˜μ˜ 곑선을 잘 ν‘œν˜„ν•˜κ³  μžˆμŠ΅λ‹ˆλ‹€. 이 방식을 ν™œμš©ν•˜λ©΄ 더 κΉŠμ€ 신경망도 κ΅¬ν˜„ν•  수 μžˆμ§€λ§Œ, λ§€κ°œλ³€μˆ˜ 관리가 νž˜λ“€μ–΄μ§‘λ‹ˆλ‹€. λ”°λΌμ„œ λ‹€μŒ λ‹¨κ³„μ—μ„œλŠ” λ§€κ°œλ³€μˆ˜ 관리λ₯Ό κ°„μ†Œν™”ν•˜λŠ” ꡬ쑰λ₯Ό λ§Œλ“€μ–΄λ³΄κ² μŠ΅λ‹ˆλ‹€.