VGG Study Notes

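VGGNet, a convolutional neural network 16 to 19 layers deep, explored the relationship between network depth and performance: up to a point, greater depth improves performance, lowering the error rate and improving scalability and generalization.
VGGNet can be viewed as a deeper version of AlexNet; both consist of two main parts, convolutional layers and fully connected layers.
Simplicity is the guiding principle: the whole network uses small 3x3 convolution kernels.
Whereas AlexNet uses 3x3 pooling kernels, VGG uses 2x2 pooling kernels throughout.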
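It has more channels: the channel count doubles after each of the first three pooling layers (64, 128, 256, 512) and stays at its maximum of 512 afterwards, whereas AlexNet and ZFNet end their convolutional stacks with 256 channels (their widest layers have 384).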
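The convolutional layers focus on increasing the number of channels of the feature maps, the pooling layers focus on shrinking their width and height, and the network as a whole is deeper.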
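The feature extraction stage is built by repeatedly stacking convolutional and pooling layers. It downsamples five times in total, each time by max pooling. Note: because every 3x3 convolution uses stride 1 and padding 1, none of the convolutions in the network changes the spatial size of its input feature map.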
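The size bookkeeping described above can be checked with the standard output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. The following is a minimal pure-Python sketch, not part of the original code; the helper names `conv_out_size` and `trace_shapes` are made up for illustration, and the list is a copy of the `vgg16` configuration used in the code further down.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# Copy of the vgg16 layout: an int is a 3x3 conv with that many output
# channels, "M" is a 2x2 max pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(cfg, size=224, channels=3):
    """Return (channels, height/width) after each pooling layer."""
    stages = []
    for v in cfg:
        if v == "M":
            size = conv_out_size(size, kernel=2, stride=2)
            stages.append((channels, size))
        else:
            # 3x3 conv, stride 1, padding 1: spatial size is unchanged
            size = conv_out_size(size, kernel=3, stride=1, padding=1)
            channels = v
    return stages


print(conv_out_size(224, kernel=3, stride=1, padding=1))  # 224
print(trace_shapes(VGG16_CFG))
# [(64, 112), (128, 56), (256, 28), (512, 14), (512, 7)]
```

Only the five pooling layers change the spatial size, so a 224x224 input ends up as a 512x7x7 feature map.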
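At the top of the network, three fully connected layers classify the image. Note: because of the fully connected layers, the network only accepts input images of a fixed size.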
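To see why the input size is fixed, it helps to count what the first fully connected layer expects. This is a rough sketch with hypothetical helper names (`linear_params`, `classifier_params`, `flattened_features`); it assumes the 512*7*7 -> 4096 -> 4096 -> num_classes classifier from the code below and five stride-2 poolings.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def classifier_params(num_classes=1000, flat_features=512 * 7 * 7):
    """Parameter count of the three fully connected layers."""
    return (linear_params(flat_features, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, num_classes))


def flattened_features(input_size, channels=512, num_pools=5):
    """Flattened feature length for a square input after the pooling stages."""
    return channels * (input_size // 2 ** num_pools) ** 2


print(classifier_params())      # 123642856
print(flattened_features(224))  # 25088 == 512 * 7 * 7
print(flattened_features(256))  # 32768, does not match the first Linear layer
```

A 256x256 input would produce 32768 features, which the first `nn.Linear(512*7*7, 4096)` cannot consume, so the input has to be 224x224 unless the classifier is changed.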
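Two stacked 3x3 convolutions have the same receptive field as one 5x5 convolution, and three stacked 3x3 convolutions have the same receptive field as one 7x7 convolution. Stacking adds more nonlinear mappings and also reduces the parameter count substantially (for example, a 7x7 kernel has 49 weights per input-output channel pair, while three 3x3 kernels have 27).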
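The receptive-field and parameter arithmetic can be written out as a small sketch. The function names are illustrative only, and the count assumes the same number of input and output channels (512 here) and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (no biases) with `channels` inputs and outputs per layer."""
    return num_layers * kernel * kernel * channels * channels


C = 512  # assumed channel count, chosen for illustration
print(stacked_receptive_field(2))            # 5
print(stacked_receptive_field(3))            # 7
print(conv_weights(3, C, num_layers=3))      # 7077888  (27 * C^2)
print(conv_weights(7, C))                    # 12845056 (49 * C^2)
```

For any channel count C the ratio is 27 C^2 to 49 C^2, so the three-layer stack needs roughly 55% of the weights of a single 7x7 layer.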
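It is best to type the code out yourself. Even if you can follow it just by reading, typing it is good practice for your coding skills.
VGGNet code: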

import torch.nn as nn
import torch
 
class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


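def make_features(cfg: list):
    # Build the feature extractor from a config list: an int adds a 3x3 conv
    # (padding 1) + ReLU with that many output channels, "M" adds 2x2 max pooling.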
    layers = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)

# Layer layouts for VGG11, VGG13, VGG16 and VGG19:
# an int is the output channel count of a 3x3 conv, 'M' is a max pooling layer.
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],   
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def vgg11(num_classes): 
    cfg = cfgs["vgg11"]
    model = VGG(make_features(cfg), num_classes=num_classes)
    return model

def vgg13(num_classes):  
    cfg = cfgs["vgg13"]
    model = VGG(make_features(cfg), num_classes=num_classes)
    return model

def vgg16(num_classes):  
    cfg = cfgs["vgg16"]
    model = VGG(make_features(cfg), num_classes=num_classes)
    return model

def vgg19(num_classes):  
    cfg = cfgs['vgg19']
    model = VGG(make_features(cfg), num_classes=num_classes)
    return model
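As a sanity check on the configuration table, the numbers in the names vgg11 to vgg19 should equal the number of weight layers (convolutions plus the three fully connected layers). The sketch below repeats the `cfgs` table so it runs on its own; `count_weight_layers` and `conv_params` are hypothetical helpers, not part of the original code.

```python
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def count_weight_layers(cfg, num_fc=3):
    """Conv layers in the config plus the fully connected layers."""
    return sum(1 for v in cfg if v != 'M') + num_fc


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights and biases of all conv layers described by the config."""
    total = 0
    for v in cfg:
        if v == 'M':
            continue  # pooling has no parameters
        total += in_channels * v * kernel * kernel + v
        in_channels = v
    return total


print({name: count_weight_layers(cfg) for name, cfg in cfgs.items()})
# {'vgg11': 11, 'vgg13': 13, 'vgg16': 16, 'vgg19': 19}
print(conv_params(cfgs['vgg16']))  # 14714688
```

Under these assumptions the convolutional part of VGG16 holds about 14.7 million parameters, far fewer than the fully connected classifier.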
 

Paper title: Very deep convolutional networks for large-scale image recognition
Paper link: https://arxiv.org/pdf/1409.1556.pdf
PyTorch implementation: https://github.com/Arwin-Yu/Deep-Learning-Classification-Models-Based-CNN-or-Attention

Original article: https://blog.csdn.net/qq_61735602/article/details/133943977