Building a model in PyTorch roughly breaks down into three parts:
1. Build the network layers: define convolution, pooling, linear, activation and other layers in the __init__ constructor of a class that inherits from torch.nn.Module;
2. Define the forward pass: override the forward instance method;
3. Define the initialization scheme: use the functions in torch.nn.init (a minimal sketch follows after this list).
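A minimal sketch of step 3, run after the model is constructed; the per-layer choices here (Kaiming init for Conv2d, Xavier init for Linear) are illustrative assumptions, not a prescription:
import torch.nn as nn

def init_weights(m):
    # apply() visits every sub-module, so we can choose an init rule per layer type
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)

# net = SomeModel()         # any nn.Module, e.g. the LeNet5 defined below
# net.apply(init_weights)   # recursively applies init_weights to every sub-module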
Assembling a model in PyTorch relies on Containers, which include Module, Sequential, ModuleList and ModuleDict.
Module is the base class of all models; blocks are usually built with Sequential, while the overall model is built with Module.
torch.nn.Module is the base class of all models. The steps to define a custom layer are:
1. Define a subclass of Module and implement two basic functions: (1) the __init__ constructor; (2) the forward function, which implements the layer's logic, i.e. the forward pass.
2. Define the layer's parameters in the __init__ constructor, e.g. the weight and bias of a Linear layer, or the in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros' arguments of a Conv2d layer.
3. Implement the forward computation in the forward function, usually via torch.nn.functional.xx() functions; if the layer has weights, they must be of type nn.Parameter (a minimal sketch follows after this list).
4. Note: the parameters we define are normally differentiable; if a custom operation is not differentiable, a backward function must be implemented.
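A minimal sketch of steps 1-3 with a hand-written linear layer (the class name MyLinear is hypothetical): the weights are declared as nn.Parameter in __init__ and the forward computation uses torch.nn.functional:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(MyLinear, self).__init__()
        # nn.Parameter registers the tensors as trainable parameters of the module
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # delegate the actual computation to torch.nn.functional
        return F.linear(x, self.weight, self.bias)

# MyLinear(784, 10)(torch.randn(4, 784)).shape == torch.Size([4, 10])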
Below are two ways to write the LeNet model.
The difference is whether the relu and maxpool operations are created as member components of the network:
if they are created as member objects, use torch.nn and create them in __init__ as self. attributes;
if they are not created as objects, use torch.nn.functional and call them directly inside forward.
import torch
from torch.nn import Module
from torch.nn import Conv2d, ReLU, MaxPool2d, Linear

class LeNet5(Module):  # inherits from Module
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        self.conv1 = Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
        self.conv2 = Conv2d(6, 16, 5)
        self.linear1 = Linear(in_features=16 * 5 * 5, out_features=120)
        self.linear2 = Linear(120, 84)
        self.linear3 = Linear(84, num_classes)
        self.relu = ReLU()
        self.maxpool = MaxPool2d(kernel_size=2)

    def forward(self, x):  # a module represents one computation and must implement forward
        x = self.conv1(x)    # 1 x 28 x 28 -> 6 x 28 x 28
        x = self.relu(x)
        x = self.maxpool(x)  # 6 x 28 x 28 -> 6 x 14 x 14
        x = self.conv2(x)    # 6 x 14 x 14 -> 16 x 10 x 10
        x = self.relu(x)
        x = self.maxpool(x)  # 16 x 10 x 10 -> 16 x 5 x 5
        x = torch.flatten(x, 1)
        x = self.linear1(x)  # 400 -> 120
        x = self.relu(x)
        x = self.linear2(x)  # 120 -> 84
        x = self.relu(x)
        x = self.linear3(x)  # 84 -> num_classes
        return x
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self, classes):  # build the layer components (created with torch.nn)
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, classes)

    def forward(self, x):  # wire the components together (functional ops from torch.nn.functional)
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
The torch.nn.Sequential class assembles network layers in order. nn.Sequential likewise inherits from nn.Module, but unlike Module it already defines a default forward: the input is passed through the modules in the order in which they were given to the constructor.
Like nn.Module, nn.Sequential is also used to build network blocks, but it needs far less boilerplate than nn.Module and lets you build a network quickly.
There are two common ways to pass layers into a Sequential:
1. Pass the layers in directly; after instantiation they are indexed by number (the more common way; see the usage note after the code below).
This LeNet consists of two Sequential blocks (features and classifier):
import torch.nn as nn

# ============================ Sequential
class LeNetSequential(nn.Module):
    def __init__(self, classes):  # this LeNet is composed of two Sequential blocks
        super(LeNetSequential, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),)
        self.classifier = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, classes),)

    def forward(self, x):  # only needs to run features (conv + pool) and classifier (fully connected)
        x = self.features(x)
        x = x.view(x.size()[0], -1)
        x = self.classifier(x)
        return x
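A brief usage note on the numeric indexing mentioned above; it assumes a 3 x 32 x 32 input, which matches the 16*5*5 flattened size:
import torch

model = LeNetSequential(classes=10)
print(model.features[0])    # first layer of the features block (the 3->6 Conv2d), indexed by position
print(model.classifier[4])  # last Linear layer, again by numeric index
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)            # torch.Size([4, 10])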
2. Pass an OrderedDict into Sequential; after instantiation the layers can be indexed by the custom names (see the usage note after the code below).
import torch.nn as nn
from collections import OrderedDict

# ============================ Sequential
class LeNetSequentialOrderDict(nn.Module):
    def __init__(self, classes):
        super(LeNetSequentialOrderDict, self).__init__()
        self.features = nn.Sequential(OrderedDict({  # name each op via an ordered dict so it can be indexed easily
            'conv1': nn.Conv2d(3, 6, 5),
            'relu1': nn.ReLU(inplace=True),
            'pool1': nn.MaxPool2d(kernel_size=2, stride=2),
            'conv2': nn.Conv2d(6, 16, 5),
            'relu2': nn.ReLU(inplace=True),
            'pool2': nn.MaxPool2d(kernel_size=2, stride=2),
        }))
        self.classifier = nn.Sequential(OrderedDict({
            'fc1': nn.Linear(16*5*5, 120),
            'relu3': nn.ReLU(),
            'fc2': nn.Linear(120, 84),
            'relu4': nn.ReLU(inplace=True),
            'fc3': nn.Linear(84, classes),
        }))

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size()[0], -1)
        x = self.classifier(x)
        return x
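And the name-based indexing this variant enables; the names come from the OrderedDict keys above:
model = LeNetSequentialOrderDict(classes=10)
print(model.features.conv1)   # sub-modules can be accessed by the names given in the OrderedDict
print(model.features[0])      # numeric indexing still works as well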
torch.nn.Parameter is a subclass of torch.Tensor whose main purpose is to act as a trainable parameter inside an nn.Module. The difference from torch.Tensor is that an nn.Parameter is automatically treated as a trainable parameter of the module, i.e. it is added to the iterator returned by parameters(); an ordinary tensor inside a module that is not an nn.Parameter is not included in parameters().
The requires_grad attribute of an nn.Parameter defaults to True, i.e. it is trainable, which is the opposite of the default for a torch.Tensor. Inside the nn.Module class, PyTorch itself also uses nn.Parameter to hold and initialize the parameters of each module.
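A minimal sketch of the difference, using a toy module (the name ScaleOnly is hypothetical): only the nn.Parameter appears in parameters(), the plain tensor does not:
import torch
import torch.nn as nn

class ScaleOnly(nn.Module):
    def __init__(self):
        super(ScaleOnly, self).__init__()
        self.scale = nn.Parameter(torch.ones(1))  # registered as a parameter, requires_grad=True by default
        self.offset = torch.zeros(1)              # plain tensor, NOT registered as a parameter

    def forward(self, x):
        return x * self.scale + self.offset

m = ScaleOnly()
print([name for name, p in m.named_parameters()])  # ['scale'] -- 'offset' is missing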
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
dilation=1, groups=1, bias=True, padding_mode='zeros')
dilation: dilated (atrous) convolution; a value greater than 1 enlarges the receptive field while keeping the feature-map size
groups: grouped convolution; instead of connecting every output channel to every input channel, the input channels are split into groups, and this sparser connectivity reduces the amount of computation
The input feature map must be given in the form (B, C, H, W)
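A small sketch of the shape bookkeeping, assuming a 4-image batch; the output size follows H_out = (H + 2*padding - dilation*(kernel_size-1) - 1) / stride + 1:
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(4, 3, 32, 32)   # (B, C, H, W)
y = conv(x)
print(y.shape)                  # torch.Size([4, 16, 32, 32]); padding=1 keeps the 32x32 size
print(conv.weight.shape)        # torch.Size([16, 3, 3, 3])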
Max pooling (downsampling) layer
nn.MaxPool2d(kernel_size, stride=None, padding=0,
dilation=1, return_indices=False, ceil_mode=False)
stride – note: the default stride equals kernel_size, not 1
return_indices – whether to also return the indices of the max locations (used later by max unpooling)
ceil_mode – when True, use ceil instead of floor to compute the output shape
Average pooling layer
nn.AvgPool2d(kernel_size, stride=None, padding=0,
ceil_mode=False, count_include_pad=True, divisor_override=None)
count_include_pad – whether the zero padding at the borders is counted when computing the average
divisor_override – if specified, it is used as the divisor instead of the pooling window size
Max unpooling (upsampling) layer
nn.MaxUnpool2d(kernel_size, stride=None, padding=0)
forward(self, input, indices)
When calling it, you must pass not only the input but also the indices of the max locations returned by the earlier max pooling.
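A minimal sketch of the pool / unpool round trip; the sizes are illustrative:
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
y, indices = pool(x)        # y: 1 x 1 x 2 x 2; indices remember where each max came from
x_rec = unpool(y, indices)  # back to 1 x 1 x 4 x 4; non-max positions are filled with 0
print(y.shape, x_rec.shape)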

nn.Linear(in_features, out_features, bias=True)
Applies a linear combination (fully connected transformation) to 1-D feature vectors
in_features: length of the input vector (number of input-layer nodes)
out_features: length of the output vector (number of output-layer nodes)
>>> linear = nn.Linear(784, 10)
>>> input = torch.randn(4, 784)
>>> output = linear(input)
>>> output.shape
torch.Size([4, 10])
nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh',
       bias=True, batch_first=False, dropout=0, bidirectional=False)
input_size: dimensionality of the input features; RNN inputs are usually word vectors, so input_size equals the word-vector dimension
hidden_size: number of hidden units, i.e. the output dimension (the RNN outputs the hidden state at each time step)
num_layers: number of stacked layers
nonlinearity: activation function ('tanh' or 'relu')
bias: whether to use a bias
batch_first: input layout; the default is False, i.e. (seq_len(num_step), batch, input_dim), with the sequence length first and the batch second
dropout: whether to apply dropout; disabled by default, set it to a number between 0 and 1 to enable it
bidirectional: whether to use a bidirectional RNN; default False
Note: the default values of some arguments are shown in the signature above
The computation at each time step (before the tanh nonlinearity), written as shapes:
H_t = X_t · W_xh + H_{t-1} · W_hh + b
[batch_size, input_dim] x [input_dim, num_hiddens] + [batch_size, num_hiddens] x [num_hiddens, num_hiddens] + bias
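A quick sketch of the input/output shapes, assuming batch_first=False (the default):
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=100, hidden_size=256, num_layers=1)
x = torch.randn(35, 4, 100)   # (seq_len, batch, input_size)
h0 = torch.zeros(1, 4, 256)   # (num_layers * num_directions, batch, hidden_size)
output, hn = rnn(x, h0)
print(output.shape)           # torch.Size([35, 4, 256]) -- hidden state at every time step
print(hn.shape)               # torch.Size([1, 4, 256])  -- hidden state of the last time step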
GRU layer
nn.GRU(input_size, hidden_size,num_layers=1)
LSTM layer
nn.LSTM(input_size=embedding_dim,hidden_size=hidden_size,num_layers=num_layers,
bias=True,batch_first=False,dropout=0.5,bidirectional=False)
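A corresponding sketch for nn.LSTM, which additionally carries a cell state; the sizes are illustrative:
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=256, num_layers=2, dropout=0.5)
x = torch.randn(35, 4, 100)    # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)     # h0/c0 default to zeros when omitted
print(output.shape)            # torch.Size([35, 4, 256])
print(hn.shape, cn.shape)      # both torch.Size([2, 4, 256]) -- one state per layer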
Activation function layers can also be replaced with the corresponding functions in torch.nn.functional
Sigmoid layer
Its derivative is less than 1, so it is prone to vanishing gradients
nn.Sigmoid()
>>> sigmoid = nn.Sigmoid()
>>> sigmoid(torch.Tensor([1, 1, 2, 2]))
tensor([0.7311, 0.7311, 0.8808, 0.8808])
ReLU layer
Its derivative is 1 for positive inputs, but it can be prone to exploding gradients
nn.ReLU(inplace=False)
>>> relu = nn.ReLU(inplace=True)
>>> input = torch.randn(2, 2)
>>> input
tensor([[-0.4853, 2.3864],
[ 0.7122, -0.6493]])
>>> relu(input)
tensor([[0.0000, 2.3864],
[0.7122, 0.0000]])
>>> input
tensor([[0.0000, 2.3864],
[0.7122, 0.0000]])
Tanh layer
Its derivative is less than 1, so it is prone to vanishing gradients
nn.Tanh()
>>> m = nn.Tanh()
>>> input = torch.randn(2)
>>> output = m(input)
>>> print(output)
tensor([0.5793, 0.2608])
Softplus layer
A smooth approximation of ReLU: Softplus(x) = log(1 + exp(x))
nn.Softplus()
Softmax layer
Turns 1-D/2-D data into the form of a probability distribution (non-negative, each row sums to 1); commonly used for classification outputs, placed after the FC layer
nn.Softmax(dim=None)
>>> softmax = nn.Softmax(dim=1)
>>> score = torch.randn(1, 4)
>>> score
tensor([[ 0.3101, 3.5648, 1.0988, -1.5856]])
>>> softmax(score)
tensor([[0.0342, 0.8855, 0.0752, 0.0051]])
Loss functions measure the discrepancy between the model output and the ground-truth labels.
CrossEntropyLoss: cross-entropy loss
nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
It fuses Softmax (LogSoftmax) with the cross-entropy computation, so it expects raw logits as input, and the combined form gives numerically stable gradients during backpropagation
weight: per-class weighting of the loss
ignore_index: a target class index to be ignored
reduction: reduction mode (none / sum / mean)
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
NLLLoss
nn.NLLLoss(weight=None, size_average=None,
ignore_index=-100, reduce=None, reduction='mean')
Negative log-likelihood loss, used for classification; it expects log-probabilities (e.g. the output of nn.LogSoftmax) as input, so LogSoftmax + NLLLoss is equivalent to CrossEntropyLoss (see the sketch below).
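A minimal sketch of that equivalence; the inputs are random:
import torch
import torch.nn as nn

logits = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, target)        # LogSoftmax + NLLLoss
loss_ce = nn.CrossEntropyLoss()(logits, target)   # CrossEntropyLoss on raw logits
print(torch.allclose(loss_nll, loss_ce))          # True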
Dropout randomly zeroes each element with probability p; in training mode the surviving values are scaled by 1/(1-p), which is why the values in the example below are doubled (p=0.5)
nn.Dropout(p=0.5, inplace=False)
>>> dropout = nn.Dropout(0.5, inplace=False)
>>> input = torch.randn(1, 20)
>>> output = dropout(input)
>>> output
tensor([[-2.9413, 0.0000, 1.8461, 1.9605, 0.2774, -0.0000, -2.5381, -2.0313,
-0.1914, 0.0000, 0.5346, -0.0000, 0.0000, 4.4960, -3.8345, -1.0938,
4.3297, 2.1258, -4.1431, 0.0000]])
>>> input
tensor([[-1.4707, 0.5105, 0.9231, 0.9802, 0.1387, -0.4195, -1.2690, -1.0156,
-0.0957, 0.8108, 0.2673, -2.0898, 0.6666, 2.2480, -1.9173, -0.5469,
2.1648, 1.0629, -2.0716, 0.9974]])
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1,
affine=True, track_running_stats=True)
num_features – C from an expected input of size (N, C, H, W)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
>>> bn = nn.BatchNorm2d(64)
>>> input = torch.randn(4, 64, 28, 28)
>>> output = bn(input)
>>> output.shape
torch.Size([4, 64, 28, 28])