• Linear Regression in mojo with NDBuffer


    The linear regression is the simplest machine learning algorithm. In this article I will use mojo NDBuffer to implement a simple linear regression algorithm from scratch. I will use NDArray class which was developed by in the previous article. First import the necessary libs and NDArray definition:

    # common import
    from String import String
    from Bool import Bool
    from List import VariadicList
    from Buffer import NDBuffer
    from List import DimList
    from DType import DType
    from Pointer import DTypePointer
    from TargetInfo import dtype_sizeof, dtype_simd_width
    from Index import StaticIntTuple
    from Random import rand
    
    alias nelts = dtype_simd_width[DType.f32]()
    
    struct NDArray[rank:Int, dims:DimList, dtype:DType]:
        var ndb:NDBuffer[rank, dims, dtype]
        
        fn __init__(inout self, size:Int):
            let data = DTypePointer[dtype].alloc(size)
            self.ndb = NDBuffer[rank, dims, dtype](data)
            
        fn __getitem__(self, idxs:StaticIntTuple[rank]) -> SIMD[dtype,1]:
            return self.ndb.simd_load[1](idxs)
        
        fn __setitem__(self, idxs:StaticIntTuple[rank], val:SIMD[dtype,1]):
            self.ndb.simd_store[1](idxs, val)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26

    Let’s assume we want to figure out this function:
    y = W ⋅ X y = W \cdot X y=WX
    Where:

    • W is the parameter
    • X is the sample design matrix. Each row is a sample. Each sample is a n dimension vector x ∈ R n \boldsymbol{x} \in R^{n} xRn. If we have m samples then X ∈ R m × n X \in R^{m \times n} XRm×n
      Here we will deal with a very simple toy problem. We assume n = 3 n=3 n=3 and m = 5 m=5 m=5. Let’s define the ith sample:
      x ( i ) = [ x 1 ( i ) x 2 ( i ) x 3 ( i ) ] ∈ R 3 × 1 , i ∈ { 1 , 2 , 3 , 4 , 5 } \boldsymbol{x}^{(i)} =
      [x1(i)x2(i)x3(i)]" role="presentation">[x1(i)x2(i)x3(i)]
      \in R^{3 \times 1}, i \in \{ 1, 2, 3, 4, 5\}
      x(i)= x1(i)x2(i)x3(i) R3×1,i{1,2,3,4,5}

      Notes:
    • i is the index of the sample;
    • 1, 2, 3 is the subscript of the feature dimension;
    • m is the total number of samples. In this case m=5;
    • n is the dimension of the feature vector. in this case n=3;

    Let’s generate the dataset:

    alias X_rank = 2
    alias r1 = 5
    alias r2 = 3
    var X_size = r1 * r2
    var X = NDArray[X_rank, DimList(r1, r2), DType.f32](X_size)
    # geneate random number and set to X
    var rvs = DTypePointer[DType.f32].alloc(X_size)
    rand[DType.f32](rvs, X_size)
    for d1 in range(r1):
        for d2 in range(r2):
            X[StaticIntTuple[X_rank](d1, d2)] = rvs.load(d1*r2+d2)*5.0 + 1.0
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11

    Let’s define the parameter w \boldsymbol{w} w:
    w = [ 1.1 1.2 1.3 ] \boldsymbol{w} =

    [1.11.21.3]" role="presentation">[1.11.21.3]
    w= 1.11.21.3

    Let define the ground truth hypothesis function:
    y = w T ⋅ x + b = [ 1.1 2.2 3.3 ] ⋅ [ x 1 x 2 x 3 ] + b = 1.1 ⋅ x 1 + 2.2 ⋅ x 2 + 3.3 ⋅ x 3 + 1.8 y = \boldsymbol{w}^{T} \cdot \boldsymbol{x} + b =

    [1.12.23.3]" role="presentation">[1.12.23.3]
    \cdot
    [x1x2x3]" role="presentation">[x1x2x3]
    + b = 1.1 \cdot x_{1} + 2.2 \cdot x_{2} + 3.3 \cdot x_{3} + 1.8 y=wTx+b= 1.12.23.3 x1x2x3 +b=1.1x1+2.2x2+3.3x3+1.8

    Let’s define the paramter w \boldsymbol{w} w:

    alias w_rank = 2
    alias w_r1 = 3
    alias w_r2 = 1
    var w = NDArray[w_rank, DimList(w_r1, w_r2), DType.f32](w_r1 * w_r2)
    w[StaticIntTuple[w_rank](0,0)] = 0.01
    w[StaticIntTuple[w_rank](1,0)] = 0.02
    w[StaticIntTuple[w_rank](2,0)] = 0.03
    var b = SIMD[DType.f32, 1](0.0)
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8

    Now we can get the ground truch label 𝑦:

    alias y_rank = 1
    alias y_r1 = 5
    var y = NDArray[y_rank, DimList(y_r1), DType.f32](y_r1)
    for d1 in range(y_r1):
        y[StaticIntTuple[y_rank](d1)] = 1.1 * X[StaticIntTuple[X_rank](d1,0)] + 
                    2.2 * X[StaticIntTuple[X_rank](d1,1)] + 
                    3.3 * X[StaticIntTuple[X_rank](d1,2)] + 1.8
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7

    Let define the function get_batch to get a mini batch from the training dataset:

    alias batch_size = 2
    alias batch_rank = 2
    # idx can only be 0,1 We will ignore the last element in X.
    fn get_batch(inout batch_X:NDArray[batch_rank, DimList(batch_size, r2), DType.f32],
                 inout batch_y:NDArray[y_rank, DimList(batch_size), DType.f32],
                 X:NDArray[X_rank, DimList(r1, r2), DType.f32], 
                 y:NDArray[y_rank, DimList(y_r1), DType.f32],
                 batch_idx:Int):
        for b_idx in range(batch_size):
            batch_y[StaticIntTuple[y_rank](b_idx)] = y[StaticIntTuple[y_rank]
            		(batch_size*batch_idx+b_idx)]
            for c_idx in range(r2):
                batch_X[StaticIntTuple[batch_rank](b_idx, c_idx)] = 
                		X[StaticIntTuple[X_rank](batch_size*batch_idx+b_idx,c_idx)]
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14

    Let’s discuss the math theory of linear regress. For ith sample we will omit the (i) subscript for simplicity. The calculated label y ^ \hat{y} y^:
    y ^ = w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b \hat{y} = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b y^=w1x1+w2x2+w3x3+b
    As we have the ground truth label y we define the loss function as:
    l = 1 2 ( y ^ − y ) 2 = 1 2 ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) 2 \mathcal{l} = \frac{1}{2}(\hat{y}-y)^{2} = \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} l=21(y^y)2=21(w1x1+w2x2+w3x3+by)2
    According to linear regression algorithm we will set random value to parameter w \boldsymbol{w} w and zero to b b b. We will calculate y by using these parameters setting. Then we calculate the loss which represent how good our parameters are. Our task is to find the best parameters setting to let the loss minmum:
    arg ⁡ min ⁡ w , b l \arg\min_{\boldsymbol{w},b} \mathcal{l} argw,bminl

    To get the minmum parameter we will get each individual parameter’s gradient of loss and adjust the parameter against the gradient direction. This is the gradient descent algorithm. So let get parameter w 1 w_{1} w1 gradient of loss:
    ∂ l ∂ w 1 = ∂ ( 1 2 ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) 2 ) ∂ w 1 = ∂ ( 1 2 ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) 2 ) ∂ ( ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ) ⋅ ∂ ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ∂ w 1 = ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ⋅ x 1 \frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{w_{1}}} \\ = \frac{\partial{ \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big) }}{\partial{ \big( (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \big) }} \cdot \frac{\partial{ (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) }} { \partial{w_{1}} } \\ = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} w1l=w1(21(w1x1+w2x2+w3x3+by)2)=((w1x1+w2x2+w3x3+by))(21(w1x1+w2x2+w3x3+by)2)w1(w1x1+w2x2+w3x3+by)=(w1x1+w2x2+w3x3+by)x1
    We use the chain rule of gradient in the above formula. We can get all parameters gradient of loss in the same way:
    ∂ l ∂ w 1 = ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ⋅ x 1 ∂ l ∂ w 2 = ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ⋅ x 2 ∂ l ∂ w 3 = ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) ⋅ x 3 ∂ l ∂ b = ( w 1 ⋅ x 1 + w 2 ⋅ x 2 + w 3 ⋅ x 3 + b − y ) \frac{\partial{\mathcal{l}}}{\partial{w_{1}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{2}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{2} \\ \frac{\partial{\mathcal{l}}}{\partial{w_{3}}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{3} \\ \frac{\partial{\mathcal{l}}}{\partial{b}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) w1l=(w1x1+w2x2+w3x3+by)x1w2l=(w1x1+w2x2+w3x3+by)x2w3l=(w1x1+w2x2+w3x3+by)x3bl=(w1x1+w2x2+w3x3+by)

    Assume the learning rate is α \alpha α then we can update the parameters:
    w 1 : = w 1 − α ⋅ l ∂ w 1 w 2 : = w 2 − α ⋅ l ∂ w 2 w 3 : = w 3 − α ⋅ l ∂ w 3 b : = b − α ⋅ l ∂ b w_{1} := w_{1} - \alpha \cdot \frac{\mathcal{l}}{\partial{w_{1}}} \\ w_{2} := w_{2} - \alpha \cdot \frac{\mathcal{l}}{\partial{w_{2}}} \\ w_{3} := w_{3} - \alpha \cdot \frac{\mathcal{l}}{\partial{w_{3}}} \\ b := b - \alpha \cdot \frac{\mathcal{l}}{\partial{b}} w1:=w1αw1lw2:=w2αw2lw3:=w3αw3lb:=bαbl

    let epochs = 1000
    var y_hat = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
    alias loss_rank = 1
    var loss = NDArray[loss_rank, DimList(batch_size), DType.f32](batch_size)
    var coff = 0.5
    var Xi = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
    var yi = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
    var lr = 0.001
    for epoch in range(epochs):
        for bidx in range(batch_size):
            get_batch(Xi, yi, X, y, bidx)
            # forward pass
            y_hat[StaticIntTuple[y_rank](0)] = 
            			w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](0,0)] + 
                        w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](0,1)] + 
                        w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](0,2)] +
                        b
            y_hat[StaticIntTuple[y_rank](1)] = 
            			w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](1,0)] + 
                        w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](1,1)] + 
                        w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](1,2)] +
                        b
            # calculate the loss
            loss[StaticIntTuple[loss_rank](0)] = coff *(
                    (y[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])*
                    (y[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])
            )
            loss[StaticIntTuple[loss_rank](1)] = coff *(
                    (y[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])*
                    (y[StaticIntTuple[y_rank](1)]-
                    y_hat[StaticIntTuple[y_rank](1)])
            )
            g_w1 = (y_hat[StaticIntTuple[y_rank](0)]-
            		y[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,0)] + 
                    (y_hat[StaticIntTuple[y_rank](1)]-
                    y[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,0)]
            w[StaticIntTuple[w_rank](0,0)] -= lr*g_w1
            g_w2 = (y_hat[StaticIntTuple[y_rank](0)]-
            		y[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,1)] + 
                    (y_hat[StaticIntTuple[y_rank](1)]-
                    y[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,1)]
            w[StaticIntTuple[w_rank](1,0)] -= lr*g_w2
            g_w3 = (y_hat[StaticIntTuple[y_rank](0)]-
            y[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,2)] + 
                    (y_hat[StaticIntTuple[y_rank](1)]-
                    y[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,2)]
            w[StaticIntTuple[w_rank](2,0)] -= lr*g_w3
            g_b = (y_hat[StaticIntTuple[y_rank](0)]-y[StaticIntTuple[y_rank](0)]) + 
                    (y_hat[StaticIntTuple[y_rank](1)]-y[StaticIntTuple[y_rank](1)])
            b -= lr*g_b
            loss_val = loss[StaticIntTuple[loss_rank](0)] + 
            		loss[StaticIntTuple[loss_rank](0)]
            print('epoch_', epoch, ': idx=', bidx, ' loss=', loss_val, 
            		'; w1=', w[StaticIntTuple[w_rank](0,0)], 
                  ', w2=', w[StaticIntTuple[w_rank](1,0)], 
                  ', w3=', w[StaticIntTuple[w_rank](2,0)], ', b=', b, ';')
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
  • 相关阅读:
    OpenWRT通过内网穿透实现安全可靠的ssh远程连接
    C++ 数组
    深度学习时,训练集的精度与测试集精度之间的关系
    LeetCode 每日一题——655. 输出二叉树
    Win11环境Mecab日语分词和词性分析以及动态库DLL not found问题(Python3.10)
    Java入门学习(1)
    WEB基础
    关于微前端,你理解到究极奥义了么?
    大数据笔记-Hive调优总结(二)- Hive建表设计层面
    力扣344-反转字符串——双指针法
  • 原文地址:https://blog.csdn.net/Yt7589/article/details/130864909