Deep Learning [PyTorch installation, getting started, gradient descent, linear regression]

I. Installing PyTorch

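1. Introduction to PyTorch

PyTorch is a deep learning framework released by Facebook. Its ease of use and friendly design have made it popular with a wide range of users.

2. PyTorch versions

3. Installing PyTorch

Installation guide: https://pytorch.org/get-started/locally/

Installation with GPU support:

conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

Installation without a GPU (CPU only):

conda install pytorch-cpu torchvision-cpu -c pytorch

After installation, open IPython and enter: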
```
In [1]: import torch
In [2]: torch.__version__
Out[2]: '1.0.1'
```
Note: the package you install is called pytorch, but in code you always import it as torch.

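II. Getting started with PyTorch

1. Tensors

"Tensor" is a general term that covers many kinds of objects:
- 0th-order tensor: a scalar or constant, a 0-D tensor
- 1st-order tensor: a vector, a 1-D tensor
- 2nd-order tensor: a matrix, a 2-D tensor
- 3rd-order tensor
- …
- Nth-order tensor

2. Creating tensors in PyTorch
- Create a tensor from a Python list or sequence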
```
torch.tensor([[1., -1.], [1., -1.]])
tensor([[ 1.0000, -1.0000],
        [ 1.0000, -1.0000]])
```
- Create a tensor from a NumPy array
```
torch.tensor(np.array([[1, 2, 3], [4, 5, 6]]))
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
```
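- Create a tensor with torch API functions
  - torch.empty([3,4]) creates an uninitialized 3×4 tensor (3 rows, 4 columns), filled with whatever garbage values happen to be in memory
  - torch.ones([3,4]) creates a 3×4 tensor filled with ones
  - torch.zeros([3,4]) creates a 3×4 tensor filled with zeros
  - torch.rand([3,4]) creates a 3×4 tensor of **random values** drawn uniformly from the interval [0, 1)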
```
>>> torch.rand(2, 3)
tensor([[ 0.8237,  0.5781,  0.6879],
        [ 0.3816,  0.7249,  0.0998]])
```
- torch.randint(low=0,high=10,size=[3,4]) creates a 3×4 tensor of random integers from the interval [low, high)
```
>>> torch.randint(3, 10, (2, 2))
tensor([[4, 5],
        [6, 7]])
```
- torch.randn([3,4]) creates a 3×4 tensor of random numbers drawn from a normal distribution with mean 0 and variance 1
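Below is a quick sketch that calls each creation function listed above and checks the shape and value range it promises. The variable names are only for illustration. Random values differ on every run, so the checks use ranges rather than exact numbers.

```python
import torch

e = torch.empty([3, 4])                          # uninitialized: contents are arbitrary
o = torch.ones([3, 4])                           # all ones
z = torch.zeros([3, 4])                          # all zeros
r = torch.rand([3, 4])                           # uniform samples in [0, 1)
ri = torch.randint(low=0, high=10, size=[3, 4])  # integers in [0, 10)
rn = torch.randn([3, 4])                         # standard normal samples

for name, t in [("empty", e), ("ones", o), ("zeros", z),
                ("rand", r), ("randint", ri), ("randn", rn)]:
    print(name, tuple(t.shape), t.dtype)

# Check the documented value ranges
print(bool((r >= 0).all() and (r < 1).all()))      # rand stays in [0, 1)
print(bool((ri >= 0).all() and (ri < 10).all()))   # randint stays in [low, high)

# With many samples, randn's sample mean and variance approach 0 and 1
big = torch.randn(100_000)
print(big.mean().item(), big.var().item())
```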
3. Common tensor methods in PyTorch
- Get the value stored in a tensor (only works when the tensor holds exactly one element): tensor.item()
```
In [10]: a = torch.tensor(np.arange(1))

In [11]: a
Out[11]: tensor([0])

In [12]: a.item()
Out[12]: 0
```
- Convert to a NumPy array: tensor.numpy()
```
In [55]: z.numpy()
Out[55]:
array([[-2.5871205],
       [ 7.3690367],
       [-2.4918075]], dtype=float32)
```
- Get the shape: tensor.size()
```
In [72]: x
Out[72]:
tensor([[    1,     2],
        [    3,     4],
        [    5,    10]], dtype=torch.int32)

In [73]: x.size()
Out[73]: torch.Size([3, 2])
```
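- Change the shape: tensor.view((3,4)). Like reshape in NumPy, this is a shallow copy: the result shares its underlying data with the original tensor, and only the shape changes.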
```
In [76]: x.view(2,3)
Out[76]:
tensor([[    1,     2,     3],
        [    4,     5,    10]], dtype=torch.int32)
```
- Get the number of dimensions (the order): tensor.dim()
```
In [77]: x.dim()
Out[77]: 2
```
- Get the maximum value: tensor.max()
```
In [78]: x.max()
Out[78]: tensor(10, dtype=torch.int32)
```
- Transpose: tensor.t()
```
In [79]: x.t()
Out[79]:
tensor([[    1,     3,     5],
        [    2,     4,    10]], dtype=torch.int32)
```
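- tensor[1,3] gets the value at row index 1, column index 3 (indices start at 0, so this is the second row, fourth column)
- tensor[1,3]=100 assigns the value 100 to that same position
- Slicing a tensor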
```
   In [101]: x
   Out[101]:
   tensor([[1.6437, 1.9439, 1.5393],
           [1.3491, 1.9575, 1.0552],
           [1.5106, 1.0123, 1.0961],
           [1.4382, 1.5939, 1.5012],
           [1.5267, 1.4858, 1.4007]])
   
   In [102]: x[:,1]
   Out[102]: tensor([1.9439, 1.9575, 1.0123, 1.5939, 1.4858])
```
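Here is a small self-contained sketch that uses several of these methods together on the same 3×2 int32 tensor as above. It also checks two claims from the list: view() shares data with the original tensor, and indices start at 0. The extra tensor `single` is only an illustration.

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 10]], dtype=torch.int32)

print(x.size())      # torch.Size([3, 2])
print(x.dim())       # 2
print(x.max())       # tensor(10, dtype=torch.int32)
print(x.t())         # the 2x3 transpose

single = torch.tensor([7.5])
print(single.item())  # 7.5, returned as a plain Python float

# view() shares storage with x: writing through the view changes x too
v = x.view(2, 3)
v[0, 0] = 100
print(x[0, 0].item())  # 100

# Indexing is zero-based: x[1, 1] is the second row, second column
print(x[1, 1].item())  # 4

# Slicing: all rows of column 1 (values 2, 4, 10)
print(x[:, 1])

# On the CPU, numpy() also shares memory with the tensor
arr = x.numpy()
arr[2, 1] = -1
print(x[2, 1].item())  # -1
```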
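4. Tensor data types

Tensors support many data types. The common ones are:
- 32-bit floating point: torch.float32 or torch.float (tensor type torch.FloatTensor)
- 64-bit floating point: torch.float64 or torch.double (torch.DoubleTensor)
- 16-bit floating point: torch.float16 or torch.half (torch.HalfTensor)
- 8-bit unsigned integer: torch.uint8 (torch.ByteTensor)
- 8-bit signed integer: torch.int8 (torch.CharTensor)
- 16-bit signed integer: torch.int16 or torch.short (torch.ShortTensor)
- 32-bit signed integer: torch.int32 or torch.int (torch.IntTensor)
- 64-bit signed integer: torch.int64 or torch.long (torch.LongTensor)

The tensor type in parentheses is the CPU tensor class whose instances have that dtype.
- Get a tensor's data type: tensor.dtype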
```
In [80]: x.dtype
Out[80]: torch.int32
```
- Specify the type when creating a tensor
```
In [88]: torch.ones([2,3],dtype=torch.float32)
Out[88]:
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```
- Change the type
```
In [17]: a
Out[17]: tensor([1, 2], dtype=torch.int32)

In [18]: a.type(torch.float)
Out[18]: tensor([1., 2.])

In [19]: a.double()
Out[19]: tensor([1., 2.], dtype=torch.float64)
```
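The sketch below combines the three dtype operations above in one script: choosing the dtype at creation time, reading it back, and converting it. The extra .to() call is another standard way to convert a dtype and is included only for comparison.

```python
import torch

a = torch.ones([2, 3], dtype=torch.float32)   # dtype chosen at creation time
print(a.dtype)                                 # torch.float32

b = torch.tensor([1, 2], dtype=torch.int32)
c = b.type(torch.float)    # torch.float is an alias of torch.float32
d = b.double()             # shorthand for converting to torch.float64
e = b.to(torch.int64)      # .to() can also convert the dtype
print(c.dtype, d.dtype, e.dtype)  # torch.float32 torch.float64 torch.int64

# Each conversion returns a new tensor; b itself is unchanged
print(b.dtype)             # torch.int32
```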
5. Other tensor operations
- Adding two tensors
```
In [94]: x = x.new_ones(5, 3, dtype=torch.float)

In [95]: y = torch.rand(5, 3)

In [96]: x+y
Out[96]:
tensor([[1.6437, 1.9439, 1.5393],
        [1.3491, 1.9575, 1.0552],
        [1.5106, 1.0123, 1.0961],
        [1.4382, 1.5939, 1.5012],
        [1.5267, 1.4858, 1.4007]])
In [98]: torch.add(x,y)
Out[98]:
tensor([[1.6437, 1.9439, 1.5393],
        [1.3491, 1.9575, 1.0552],
        [1.5106, 1.0123, 1.0961],
        [1.4382, 1.5939, 1.5012],
        [1.5267, 1.4858, 1.4007]])
In [99]: x.add(y)
Out[99]:
tensor([[1.6437, 1.9439, 1.5393],
        [1.3491, 1.9575, 1.0552],
        [1.5106, 1.0123, 1.0961],
        [1.4382, 1.5939, 1.5012],
        [1.5267, 1.4858, 1.4007]])
In [100]: x.add_(y)  # methods ending in an underscore modify x in place
Out[100]:
tensor([[1.6437, 1.9439, 1.5393],
        [1.3491, 1.9575, 1.0552],
        [1.5106, 1.0123, 1.0961],
        [1.4382, 1.5939, 1.5012],
        [1.5267, 1.4858, 1.4007]])

In [101]: x  # x has changed
Out[101]:
tensor([[1.6437, 1.9439, 1.5393],
        [1.3491, 1.9575, 1.0552],
        [1.5106, 1.0123, 1.0961],
        [1.4382, 1.5939, 1.5012],
        [1.5267, 1.4858, 1.4007]])
```
Note: methods whose names end with an underscore (for example add_) modify the tensor in place.
- Operations between a tensor and a number
```
In [97]: x + 10
Out[97]:
tensor([[11., 11., 11.],
        [11., 11., 11.],
        [11., 11., 11.],
        [11., 11., 11.],
        [11., 11., 11.]])
```
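- Tensors on CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's general-purpose parallel computing platform. It lets GPUs solve complex computational problems.

The torch.cuda module adds support for CUDA tensors, so you can manipulate tensors with the same methods on the CPU and the GPU.

The .to method moves a tensor to another device (for example, from the CPU to the GPU):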
```
import torch

x = torch.rand(1)                          # a one-element tensor on the CPU
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # move x to the GPU with .to
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # .to can change device and dtype at once

# Example output (x is random, so the values will differ):
# tensor([1.9806], device='cuda:0')
# tensor([1.9806], dtype=torch.float64)
```
Most torch operations work almost exactly like their NumPy counterparts.
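This sketch runs on any machine. It compares the three equivalent ways of adding tensors and checks that add_ changes x in place. It picks CUDA only when torch.cuda.is_available() reports a GPU and otherwise falls back to the CPU. The tensor shapes are arbitrary.

```python
import torch

x = torch.ones(5, 3)
y = torch.rand(5, 3)

s1 = x + y               # operator form, returns a new tensor
s2 = torch.add(x, y)     # function form
s3 = x.add(y)            # method form
print(torch.equal(s1, s2) and torch.equal(s2, s3))  # True

x_before = x.clone()
x.add_(y)                # trailing underscore: modifies x in place
print(torch.equal(x, x_before + y))                  # True

# A Python number is broadcast to every element
print(torch.ones(2, 2) + 10)

# Use the GPU if one is present, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
z = x.to(device) + y.to(device)
print(z.device)
print(z.to("cpu", torch.double).dtype)   # torch.float64
```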

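III. Gradient descent and backpropagation

1. What is a gradient

Gradient: a vector of derivatives that points in the direction in which the function increases fastest. Learning uses it to decide where to move: gradient descent steps in the opposite direction to reduce the loss.

A quick review of machine learning:

Collect data $x$, build a machine learning model $f$, and obtain $f(x,w) = Y_{predict}$.

How to judge whether the model is good: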

$\begin{aligned} loss &= (Y_{predict} - Y_{true})^{2} && \text{(regression loss)} \\ loss &= -Y_{true} \cdot \log(Y_{predict}) && \text{(classification loss)} \end{aligned}$
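Goal: adjust (learn) the parameter $w$ so that the loss becomes as small as possible. How should $w$ be adjusted?

Pick a random starting point $w_0$ and keep adjusting it until the loss function reaches its minimum.

How $w$ is updated: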
- Compute the gradient (derivative) with respect to $w$. Numerically, it can be approximated with a central difference:
$$
\nabla w \approx \frac{f(w+0.000001)-f(w-0.000001)}{2 \times 0.000001}
$$
- Update $w$: $w = w - \alpha \nabla w$, where $\alpha$ is the learning rate

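Here:
- if $\nabla w < 0$, $w$ increases
- if $\nabla w > 0$, $w$ decreases

Summary: the gradient describes how a function changes as its parameters change (the direction in which the parameters learn). For a function of one variable it is the ordinary derivative. For a function of several variables, its components are the partial derivatives.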
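The update rule above can be tried out in plain Python. This is a minimal sketch under stated assumptions: the loss f(w) = (w − 3)² is a hypothetical example with its minimum at w = 3, the start is w0 = 0, the learning rate is 0.1, and the derivative comes from the same central difference formula.

```python
# Gradient descent on a hypothetical one-parameter loss, using the
# central-difference approximation of the derivative.

def f(w):
    # hypothetical loss with its minimum at w = 3
    return (w - 3) ** 2

def numerical_grad(func, w, eps=1e-6):
    # (f(w + eps) - f(w - eps)) / (2 * eps)
    return (func(w + eps) - func(w - eps)) / (2 * eps)

# At the start the gradient is about -6: negative, so w will increase
print(numerical_grad(f, 0.0))

w = 0.0        # starting point w0
alpha = 0.1    # learning rate
for step in range(100):
    w = w - alpha * numerical_grad(f, w)   # w <- w - alpha * grad

print(round(w, 4))  # 3.0
```

Each step shrinks the distance to 3 by a factor of 0.8 here. A larger alpha converges faster until it becomes too large and the updates overshoot.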

2. Computing partial derivatives

2.1 Common derivatives
- Power rule: $f(x) = x^5$, $f'(x) = 5x^{(5-1)}$
- Basic operations: $f(x) = xy$, $f'(x) = y$ (y is treated as a constant)
- Exponentials: $f(x) = 5e^x$, $f'(x) = 5e^x$
- Logarithms: $f(x) = 5\ln x$, $f'(x) = \frac{5}{x}$, where ln is the natural logarithm (base e)
- Notation for the derivative: $f'(x) = \frac{df(x)}{dx}$, with prime notation on the left and Leibniz notation on the right
So how do we differentiate $f(x) = (1+e^{-x})^{-1}$? Use the chain rule, first breaking the function into simple pieces:

$f(x) = (1+e^{-x})^{-1}$ ==> $f(a) = a^{-1},a(b) = (1+b),b(c) = e^c,c(x) = -x$

Then:

$\begin{aligned} \frac{df(x)}{dx} &= \frac{df}{da} \times \frac{da}{db} \times \frac{db}{dc} \times \frac{dc}{dx} \\ &= -a^{-2} \times 1 \times e^{c} \times (-1) \\ &= -(1 + e^{-x})^{-2} \times e^{-x} \times (-1) \\ &= e^{-x} (1 + e^{-x})^{-2} \end{aligned}$
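You can check this result against PyTorch's autograd. In the sketch below, the sample points from −4 to 4 are an arbitrary choice. The last comparison also checks the well-known identity that this derivative equals f(x)(1 − f(x)).

```python
import torch

# A few sample points; linspace accepts requires_grad directly
x = torch.linspace(-4, 4, steps=9, requires_grad=True)
f = (1 + torch.exp(-x)) ** -1      # f(x) = (1 + e^{-x})^{-1}
f.sum().backward()                  # each f_i depends only on x_i, so x.grad holds f'(x_i)

with torch.no_grad():
    manual = torch.exp(-x) * (1 + torch.exp(-x)) ** -2   # result of the chain rule
    via_f = f * (1 - f)                                  # the same value as f(x)(1 - f(x))

print(torch.allclose(x.grad, manual))               # True
print(torch.allclose(manual, via_f, atol=1e-6))     # True
```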

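2.2 Partial derivatives of multivariate functions

A univariate function has one independent variable, e.g. $f(x)$.

A multivariate function has several independent variables, e.g. $f(x, y, z)$ with the three variables $x, y, z$.

To take a partial derivative of a multivariate function, differentiate with respect to one variable and treat all the other variables as constants.

Example 1: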

$\begin{aligned} f(x, y, z) &= ax + by + cz \\ \frac{df(x, y, z)}{dx} &= a \\ \frac{df(x, y, z)}{dy} &= b \\ \frac{df(x, y, z)}{dz} &= c \end{aligned}$
Example 2:
$\begin{aligned} f(x, y) &= xy \\ \frac{df(x, y)}{dx} &= y \\ \frac{df(x, y)}{dy} &= x \end{aligned}$
Example 3:
$\begin{aligned} f(x, w) &= (y - xw)^{2} \\ \frac{df(x, w)}{dx} &= -2w(y - xw) \\ \frac{df(x, w)}{dw} &= -2x(y - xw) \end{aligned}$
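Autograd can confirm Example 3. In this sketch the values x = 2, w = 0.5, y = 4 are arbitrary assumptions, and y is a constant, so it does not require a gradient.

```python
import torch

# Arbitrary sample values; y is treated as a constant
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.5, requires_grad=True)
y = torch.tensor(4.0)

f = (y - x * w) ** 2
f.backward()

# Autograd result next to the hand-derived formula
print(x.grad.item(), (-2 * w * (y - x * w)).item())   # -3.0 -3.0
print(w.grad.item(), (-2 * x * (y - x * w)).item())   # -12.0 -12.0
```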
Exercise:

Given $J(a, b, c) = 3(a + bc)$, let $u = a + v$ and $v = bc$. Find the partial derivatives with respect to $a$, $b$ and $c$.

$\begin{aligned} \text{Let: } & J(a, b, c) = 3u \\ \frac{dJ}{da} &= \frac{dJ}{du} \times \frac{du}{da} = 3 \times 1 \\ \frac{dJ}{db} &= \frac{dJ}{du} \times \frac{du}{dv} \times \frac{dv}{db} = 3 \times 1 \times c \\ \frac{dJ}{dc} &= \frac{dJ}{du} \times \frac{du}{dv} \times \frac{dv}{dc} = 3 \times 1 \times b \end{aligned}$

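3. The backpropagation algorithm

3.1 Computational graphs and backpropagation

Computational graph: a way of describing a function as a graph of its operations.

For the exercise above, $J(a, b, c) = 3(a + bc)$ with $u = a + v$ and $v = bc$, the computational graph looks like this:

Once the function is drawn as a computational graph, the forward computation is easy to follow.

Next, taking the partial derivative at each node gives:

Backpropagation is then the right-to-left pass through the graph above. The partial derivative with respect to each of $a, b, c$ is the product of the gradients on the edges along its path: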

$\begin{aligned} \frac{d J}{d a} & = 3 \times 1 \\ \frac{d J}{d b} & = 3 \times 1 \times c \\ \frac{d J}{d c} & = 3 \times 1 \times b \end{aligned}$
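The same right-to-left pass can be written out by hand and compared with autograd. This is a minimal sketch that assumes the sample values a = 5, b = 3, c = 2. The forward pass builds the graph node by node, and the manual backward pass multiplies the local gradients along each path.

```python
import torch

a = torch.tensor(5.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(2.0, requires_grad=True)

# Forward pass, left to right through the graph
v = b * c        # v = bc
u = a + v        # u = a + v
J = 3 * u        # J = 3u

# Backward pass by hand, right to left: multiply local gradients along each path
dJ_du = 3.0
dJ_dv = dJ_du * 1.0          # du/dv = 1
dJ_da = dJ_du * 1.0          # du/da = 1
dJ_db = dJ_dv * c.item()     # dv/db = c
dJ_dc = dJ_dv * b.item()     # dv/dc = b

# Autograd performs the same right-to-left pass
J.backward()
print(dJ_da, a.grad.item())  # 3.0 3.0
print(dJ_db, b.grad.item())  # 6.0 6.0
print(dJ_dc, c.grad.item())  # 9.0 9.0
```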

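3.2 Backpropagation in neural networks

3.2.1 Diagram of a neural network

$w_1, w_2, \dots, w_n$ denote the weights of each layer of the network, up to layer n.

$w_n[i,j]$ is the weight connecting neuron i in layer n to neuron j in layer n+1.

3.2.2 Computational graph of a neural network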

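Where:
1. $\nabla out$ is the derivative of the loss function with respect to the prediction
2. f can be understood as the activation function

**Question:** how do we compute the partial derivative with respect to $w_1[1,2]$?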

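Looking at the graph, there are two paths connecting $out$ to $w_1[1,2]$.

The result is:
$\frac{d\,out}{dW_1[1,2]} = x_1 * f'(a_2) * \left(W_2[2,1] * f'(b_1) * W_3[1,1] * \nabla out + W_2[2,2] * f'(b_2) * W_3[2,1] * \nabla out\right)$
The formula has two parts:
1. Outside the parentheses: the red path on the left
2. Inside the parentheses:
   1. Left of the plus sign: the red path on the right
   2. Right of the plus sign: the blue path

Computing gradients this way becomes very expensive when the model is large, because every parameter's path products are computed from scratch.

Backpropagation instead works backward one layer at a time. It computes the gradients for the current layer and reuses them when moving on to the previous layer, as shown below: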

The computation is as follows:
$$
\begin{aligned}
&\nabla W_3[1,1] = f(b_1)\nabla out && \text{(gradient of } W_3[1,1]\text{)}\\
&\nabla W_3[2,1] = f(b_2)\nabla out && \text{(gradient of } W_3[2,1]\text{)}\\
&\nabla b_1 = f'(b_1)W_3[1,1]\nabla out && \text{(gradient of } b_1\text{)}\\
&\nabla b_2 = f'(b_2)W_3[2,1]\nabla out && \text{(gradient of } b_2\text{)}
\end{aligned}
$$
Then continue backpropagating to the previous layer.

The computation is:

$\begin{aligned} \nabla W_{2}[1, 2] &= f(a_{1}) * \nabla b_{2} \\ \nabla a_{2} &= f'(a_{2}) * (W_{2}[2, 1] \nabla b_{1} + W_{2}[2, 2] \nabla b_{2}) \end{aligned}$
Continue backpropagating:

The computation is:

$\begin{aligned} \nabla W_{1}[1, 2] &= x_{1} * \nabla a_{2} \\ \nabla x_{1} &= (W_{1}[1, 1] * \nabla a_{1} + W_{1}[1, 2] * \nabla a_{2}) * x_{1}' \end{aligned}$
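In general:
$\begin{aligned} \nabla w^{l}_{i,j} &= f(a^l_i) * \nabla a^{l+1}_{j} \\ \nabla a^{l}_i &= f'(a^l_i) * \left(\sum_{j=1}^{m} w^{l}_{i,j} * \nabla a_j^{l+1}\right) \end{aligned}$
where $m$ is the number of neurons in layer $l+1$.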
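To make the general formulas concrete, this sketch applies them layer by layer to a small network and compares the result with autograd. Several assumptions are made for illustration only. The shapes follow the graph above: 2 inputs, hidden layers a and b with 2 neurons each, and 1 output. f is taken to be the sigmoid. The output layer is linear. The weights are random (seed 0), and the input x = [0.5, −1.0] is made up.

```python
import torch

torch.manual_seed(0)

def f(t):
    # activation function (sigmoid, chosen here as an example)
    return torch.sigmoid(t)

def f_prime(t):
    # derivative of the sigmoid: f(t) * (1 - f(t))
    s = torch.sigmoid(t)
    return s * (1 - s)

# W[i, j] connects neuron i of one layer to neuron j of the next layer
x = torch.tensor([0.5, -1.0])
W1 = torch.randn(2, 2, requires_grad=True)
W2 = torch.randn(2, 2, requires_grad=True)
W3 = torch.randn(2, 1, requires_grad=True)

# Forward pass
a = x @ W1          # a_j = sum_i x_i * W1[i, j]  (input layer has no activation)
b = f(a) @ W2       # b_j = sum_i f(a_i) * W2[i, j]
out = f(b) @ W3     # out = sum_i f(b_i) * W3[i, 0], shape (1,)

# Autograd reference, with grad_out = 1 (differentiate out itself)
out.sum().backward()

# Manual backward pass using the general formulas
with torch.no_grad():
    grad_out = torch.ones(1)
    gW3 = torch.outer(f(b), grad_out)        # ∇W3[i,j] = f(b_i) * ∇out_j
    grad_b = f_prime(b) * (W3 @ grad_out)    # ∇b_i = f'(b_i) * Σ_j W3[i,j] ∇out_j
    gW2 = torch.outer(f(a), grad_b)          # ∇W2[i,j] = f(a_i) * ∇b_j
    grad_a = f_prime(a) * (W2 @ grad_b)      # ∇a_i = f'(a_i) * Σ_j W2[i,j] ∇b_j
    gW1 = torch.outer(x, grad_a)             # ∇W1[i,j] = x_i * ∇a_j

for name, manual, auto in [("W1", gW1, W1.grad), ("W2", gW2, W2.grad), ("W3", gW3, W3.grad)]:
    print(name, torch.allclose(manual, auto, atol=1e-6))
```

Each layer reuses the node gradient (grad_b, then grad_a) computed for the layer after it. This reuse is what keeps backpropagation cheaper than summing over every path separately.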

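IV. Linear regression with PyTorch

1. Forward computation

If you set the .requires_grad attribute of a PyTorch tensor to True, PyTorch tracks every operation performed on that tensor. Another way to think about it: such a tensor is a parameter whose gradient will be computed later so that the parameter can be updated.

1.1 The computation

Suppose we have the following, where the factor 1/4 takes the mean because there are 4 values $x_i$. Use torch to carry out the forward computation:

$\begin{aligned} output &= \frac{1}{4} \sum_{i} z_{i} \\ z_{i} &= 3 (x_{i} + 2)^{2} \\ \text{where: } & z_{i} |_{x_{i} = 1} = 27 \end{aligned}$
If x is a parameter, its gradient has to be computed so that x can be updated.

So when setting the initial value of x, set its requires_grad attribute to True; the default is False.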
```
import torch
x = torch.ones(2, 2, requires_grad=True)  # initialize parameter x; requires_grad=True tracks its computation history
print(x)
#tensor([[1., 1.],
#        [1., 1.]], requires_grad=True)

y = x+2
print(y)
#tensor([[3., 3.],
#        [3., 3.]], grad_fn=<AddBackward0>)

z = y*y*3  # square, then multiply by 3
print(z)
#tensor([[27., 27.],
#        [27., 27.]], grad_fn=<MulBackward0>)

out = z.mean()  # take the mean
print(out)
#tensor(27., grad_fn=<MeanBackward0>)
```
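From the code above:
- Because x has requires_grad=True, every tensor computed from it (y, z, out) carries a grad_fn attribute that records the operation that produced it
- Linked together through grad_fn, these operations form a computational graph like the one in the previous section

1.2 requires_grad and grad_fn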
```
import torch

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)  # False
a.requires_grad_(True)  # modifies a in place
print(a.requires_grad)  # True
b = (a * a).sum()
print(b.grad_fn)  # <SumBackward0 object at ...>
# operations inside the block below are not recorded
with torch.no_grad():
    c = (a * a).sum()  # e.g. tensor(151.6830); c has no grad_fn

print(c.requires_grad)  # False
```
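Note:

To avoid tracking history (and the memory it uses), wrap the code in a with torch.no_grad(): block. This is especially useful when evaluating a model: the model may have trainable parameters with requires_grad=True, but their gradients are not needed during evaluation.

2. Computing gradients

For out from section 1.1, call the backward method to run backpropagation and compute the gradients.

After out.backward(), the derivative $\frac{d\,out}{dx}$ has been computed, and its value can be read from x.grad,

which gives: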
```
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
```
because:
$\frac{d\,output}{dx_i} = \frac{1}{4} \times 3 \times 2(x_i+2) = \frac{3}{2}(x_i+2)$,
which equals 4.5 when $x_i = 1$.

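Note: when the output is a scalar, you can call backward() on the output tensor directly. When it is a vector, backward() needs an additional argument: a gradient tensor with the same shape as the output.

Loss functions are usually scalars, so the vector case is not covered here.

loss.backward() uses the loss to compute the gradient of every parameter with requires_grad=True and accumulates it into that parameter's .grad attribute (for example x.grad). It does not update the parameters themselves.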
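The accumulation behavior is easy to see by calling backward twice. In this sketch, `forward` is a hypothetical helper that rebuilds the section 1.1 computation so that each backward call gets a fresh graph.

```python
import torch

def forward(x):
    # hypothetical helper: output = mean of 3 * (x + 2)^2, as in section 1.1
    return (3 * (x + 2) ** 2).mean()

x = torch.ones(2, 2, requires_grad=True)

forward(x).backward()
first = x.grad.clone()
print(first)             # every entry is 4.5

# A second backward pass adds to the existing x.grad
forward(x).backward()
second = x.grad.clone()
print(second)            # every entry is now 9.0

# This is why training loops reset gradients before each backward pass
x.grad.zero_()
print(x.grad)            # all zeros
```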

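Points to note:
- tensor.data:
  - when requires_grad=False, tensor.data and tensor are equivalent
  - when requires_grad=True, tensor.data returns only the tensor's data, without the autograd history
- tensor.numpy():
  - a tensor with requires_grad=True cannot be converted directly; use tensor.detach().numpy() instead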
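Here is a short sketch of the two points above. The values in w are arbitrary. The sketch catches the RuntimeError that numpy() raises on a tensor requiring grad. It then shows that the array returned by detach().numpy() shares memory with w, so an in-place update made under torch.no_grad() shows up in it.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

raised = False
try:
    w.numpy()                       # not allowed while requires_grad=True
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

arr = w.detach().numpy()            # detach() drops the autograd history
print(arr)                          # [1. 2.]

# The detached array shares storage with w, so it sees in-place updates
with torch.no_grad():
    w += 1                          # in-place update that is not recorded
print(arr)                          # [2. 3.]
print(w.data.requires_grad, w.detach().requires_grad)  # False False
```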
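3. Implementing linear regression

Below, we use a custom dataset to implement a simple linear regression with torch.

The assumed model is y = wx + b, where w and b are both parameters. The data x and y are generated with y = 3x + 0.8, so after training, w and b should end up close to 3 and 0.8 respectively.
- Prepare the data
- Compute the predictions
- Compute the loss, reset the parameter gradients to zero, and backpropagate
- Update the parameters

Implementation 1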
```
import torch
import matplotlib.pyplot as plt

learning_rate = 0.1

# 1. Prepare the data: y = 3x + 0.8
x = torch.randn([500,1])
y_true = 3*x + 0.8

# 2. Parameters for the prediction y_pred = x * w + b
w = torch.rand([],requires_grad=True)
b = torch.tensor(0,dtype=torch.float,requires_grad=True)

for k in range(30):
    # reset gradients to zero; otherwise backward() accumulates them
    for i in [w,b]:
        if i.grad is not None:
            i.grad.data.zero_()

    y_predict = x * w + b
    # 3. Compute the loss and backpropagate
    loss =  (y_predict-y_true).pow(2).mean()

    loss.backward()
    # 3.1 w.grad and b.grad now hold the gradients
    # 4. Update the parameters
    w.data = w.data - learning_rate * w.grad
    b.data = b.data - learning_rate * b.grad
    if k%10 == 0:
        print(k,loss.item(),w.item(),b.item())
# print(w,b)

# Plot the data and the fitted line
plt.figure(figsize=(20,8))
plt.scatter(x.numpy(),y_true.numpy())

y_predict =  x * w + b
plt.plot(x.numpy(),y_predict.detach().numpy(),c="red")
plt.show()
```
Implementation 2
```
import torch
import numpy as np
from matplotlib import pyplot as plt


# 1. Prepare the data y = 3x + 0.8 and the parameters
# x holds 50 values drawn uniformly from [0, 1)
x = torch.rand([50])
y = 3*x + 0.8
# initial values of w and b
w = torch.rand(1,requires_grad=True)
b = torch.rand(1,requires_grad=True)

# loss function
def loss_fn(y,y_predict):
    loss = (y_predict-y).pow(2).mean()
    for i in [w,b]:
        # reset gradients before each backward pass; otherwise they accumulate
        if i.grad is not None:
            i.grad.data.zero_()
    # [i.grad.data.zero_() for i in [w,b] if i.grad is not None]
    loss.backward()  # backpropagation
    return loss.data

# update the parameters
def optimize(learning_rate):
    # print(w.grad.data,w.data,b.data)
    w.data -= learning_rate* w.grad.data
    b.data -= learning_rate* b.grad.data

for i in range(3000):
    # 2. Compute the predictions
    y_predict = x*w + b

    # 3. Compute the loss, reset the gradients, and backpropagate
    loss = loss_fn(y,y_predict)

    if i%500 == 0:
        print(i,loss)
    # 4. Update w and b
    optimize(0.01)

# Plot the predictions after training against the true values
predict =  x*w + b  # predictions from the trained w and b

# scatter plot of the data, line for the fitted model
plt.scatter(x.data.numpy(), y.data.numpy(),c = "r")
plt.plot(x.data.numpy(), predict.data.numpy())
plt.show()

print("w",w)
print("b",b)
```
The resulting plot:

Printing w and b gives:
```
w tensor([2.9280], requires_grad=True)
b tensor([0.8372], requires_grad=True)
```
So w and b are already very close to the preset values of 3 and 0.8.
Original article: https://blog.csdn.net/weixin_43923463/article/details/126353366
