Residual Neural Network - ResNet

cnblogs 2024-06-26 08:13:00

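Residual Neural Network

Background

Number of layers in a neural network

The number of layers (the depth) of a neural network is the number of hidden layers plus one for the output layer; the input layer is not counted. The network in the figure below, for example, has three layers. [Pooling layers are usually not counted either.]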
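The counting rule can be checked on a small model. The sketch below is only an illustration: the network and its sizes are made up, and "layer" is taken to mean a module with learnable parameters, which is one common reading of the convention above.

```python
import torch.nn as nn

# A hypothetical fully connected network: 4 inputs, two hidden layers, 3 outputs.
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),   # hidden layer 1
    nn.Linear(8, 8), nn.ReLU(),   # hidden layer 2
    nn.Linear(8, 3),              # output layer
)

# Count only modules that carry learnable parameters; activations (and, in a
# CNN, pooling layers) have none and are skipped.
depth = sum(1 for m in net if any(p.requires_grad for p in m.parameters()))
print(depth)  # 3 = 2 hidden layers + 1 output layer
```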

low/mid/high-level features

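In a neural network, low/mid/high-level features are the features extracted at different depths; they become more abstract as depth increases.

  • Low-level features. Extracted by the first few convolutional layers. They usually capture basic image information such as edges, textures, and colors, and matter most for describing local image structure.

  • Mid-level features. Extracted by the middle layers. They capture more complex patterns and local structures, such as corners, curves, repeated structures, and object parts (eyes, mouths, wheels).

  • High-level features. Extracted by the last few layers. They are the most abstract: they capture global information and semantic content and can describe whole objects and scenes, such as an object's overall shape or its category (cat, dog, car, person).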

End-to-end

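End-to-end training means the whole model, from input to output, is trained as a single unit, with no hand-designed feature extraction or intermediate steps.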

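10-crop testing

At test time, 10 crops are taken from each image: the four corner crops and the center crop, plus the horizontal flip of each. The model predicts on each crop separately and the 10 predictions are averaged.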
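A minimal sketch of 10-crop testing in plain PyTorch follows. The helper names `ten_crop` and `predict_ten_crop`, the image and crop sizes, and the stand-in classifier are all assumptions made for illustration; they are not taken from the ResNet code.

```python
import torch

def ten_crop(img, size):
    """Return 10 crops of a (C, H, W) tensor: 4 corners + center, plus horizontal flips."""
    _, h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    crops = [
        img[:, :size, :size],                      # top-left
        img[:, :size, w - size:],                  # top-right
        img[:, h - size:, :size],                  # bottom-left
        img[:, h - size:, w - size:],              # bottom-right
        img[:, top:top + size, left:left + size],  # center
    ]
    flipped = [torch.flip(c, dims=[2]) for c in crops]  # mirror along the width axis
    return torch.stack(crops + flipped)                 # shape (10, C, size, size)

def predict_ten_crop(model, img, size):
    """Average the softmax scores over the 10 crops."""
    with torch.no_grad():
        scores = torch.softmax(model(ten_crop(img, size)), dim=1)
    return scores.mean(dim=0)

# Stand-in classifier (hypothetical): flatten + one linear layer, 5 classes.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 24 * 24, 5))
model.eval()
probs = predict_ten_crop(model, torch.rand(3, 32, 32), size=24)
print(probs.shape)  # torch.Size([5])
```

torchvision also ships a `TenCrop` transform that produces the same ten views; the hand-written version is used here only to keep the steps visible.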

fully convolutional form(FCN)

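A fully convolutional network drops the fully connected layers (and the average pooling layer) at the end of an ordinary convolutional network and uses 1*1 convolutions and transposed convolutions instead, so it can take inputs of any size.

  • 1*1 convolution: leaves the spatial layout (height and width) unchanged and mixes information across channels only. It is mainly used to change the channel count, most often to reduce it.

  • Transposed convolution: also called upsampling convolution (or, loosely, deconvolution). It increases the spatial resolution (height and width) of the feature map; the output channel count is whatever the layer is configured with and is often kept the same. Where a CNN shrinks the image, a transposed convolution enlarges it again (it restores the size, not the original pixel values).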
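The effect of the two layers on tensor shapes can be seen in a short sketch. The channel counts and the 14x14 input are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 256, 14, 14)  # (batch, channels, height, width)

reduce = nn.Conv2d(256, 64, kernel_size=1)                      # 1*1 conv: 256 -> 64 channels
upsample = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)  # doubles height and width

y = reduce(x)
z = upsample(y)
print(y.shape)  # torch.Size([1, 64, 14, 14])
print(z.shape)  # torch.Size([1, 64, 28, 28])
```

Here the transposed convolution keeps 64 channels only because it was configured that way; passing a different `out_channels` would change the channel count as well.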

error plateaus

Error plateau: a stretch of training where the error curve is flat, in other words where the error rate has stopped decreasing.
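In the ResNet paper the term shows up in the training recipe: the learning rate is divided by 10 when the error plateaus. One way to make "plateau" concrete is a patience rule, sketched below. The function name `lr_schedule` and the `patience` and `min_delta` thresholds are assumptions for illustration, not values from the paper.

```python
def lr_schedule(errors, lr=0.1, factor=0.1, patience=2, min_delta=1e-3):
    """Return the learning rate used at each epoch.

    The rate is multiplied by `factor` whenever the validation error has not
    improved by more than `min_delta` for `patience` consecutive epochs.
    """
    best, stale, lrs = float("inf"), 0, []
    for err in errors:
        lrs.append(lr)
        if err < best - min_delta:
            best, stale = err, 0       # still improving
        else:
            stale += 1                 # no meaningful improvement this epoch
            if stale >= patience:      # plateau reached: lower the learning rate
                lr, stale = lr * factor, 0
    return lrs

# Made-up validation errors: flat from epoch 2 to 4, then flat again from epoch 5.
errors = [0.50, 0.40, 0.35, 0.35, 0.35, 0.30, 0.30, 0.30]
lrs = lr_schedule(errors)
print(lrs)  # the rate drops by 10x after the first flat stretch
```

PyTorch offers `torch.optim.lr_scheduler.ReduceLROnPlateau` for the same idea in real training loops.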

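Gradient descent with momentum

Momentum is a technique for accelerating gradient descent: it accumulates past gradients into a velocity term, which speeds up training and can help the optimizer get past shallow local minima.

The plain gradient descent update is: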

\(\theta_{t + 1} = \theta_t - \eta \cdot \nabla J(\theta_t)\)

With momentum, the update becomes:

\(v_{t + 1} = \gamma v_t + \eta \nabla J(\theta_t)\)

\(\theta_{t + 1} = \theta_t - v_{t + 1}\)

Here \(\eta\) is the learning rate and \(\gamma\) is the momentum coefficient.
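Both update rules can be run side by side on a toy problem. The sketch below assumes the objective \(J(\theta) = \theta^2\), a starting point of 5.0, and the hyperparameters shown; all of these are illustrative choices.

```python
def grad(theta):
    # Gradient of the toy objective J(theta) = theta ** 2.
    return 2.0 * theta

def gradient_descent(theta, eta=0.1, steps=300):
    for _ in range(steps):
        theta = theta - eta * grad(theta)  # theta_{t+1} = theta_t - eta * grad J(theta_t)
    return theta

def momentum_descent(theta, eta=0.1, gamma=0.9, steps=300):
    v = 0.0  # initial velocity v_0
    for _ in range(steps):
        v = gamma * v + eta * grad(theta)  # v_{t+1} = gamma * v_t + eta * grad J(theta_t)
        theta = theta - v                  # theta_{t+1} = theta_t - v_{t+1}
    return theta

theta_gd = gradient_descent(5.0)
theta_mom = momentum_descent(5.0)
print(theta_gd, theta_mom)  # both end up very close to the minimizer 0
```

Note that `torch.optim.SGD` uses a slightly different formulation, applying the learning rate to the velocity rather than to the gradient inside it, so its numbers will not match this sketch step for step.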

unreferenced functions

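Unreferenced functions: a traditional network tries to learn the mapping \(H(\mathrm{x})\) from input to output directly, without any reference to the input. A residual network instead learns the difference between the output and the input, the residual \(F(\mathrm{x}) = H(\mathrm{x}) - \mathrm{x}\), so the block outputs \(F(\mathrm{x}) + \mathrm{x}\).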

"plain" networks

The networks without skip (shortcut) connections that serve as baselines in the experiments.

degradation problem

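The degradation problem: as more layers are added, the error rate goes up even though the network is not overfitting (the training error rises too, not just the test error), as shown in the figure below:

In principle this should not happen, because there is a solution by construction that lets the deeper network match the shallower one.

  • Solution by construction: take two networks, a shallow one and a deeper one built by adding layers to it. For the deeper network, a solution exists in which each added layer is an identity mapping (its output equals its input) and all other layers copy their parameters from the shallow network.

The constructed solution shows that a deeper network can always reach at least the same training error as the shallow one, since it can reduce to the shallow network.

In experiments, however, current optimizers struggle to find solutions as good as or better than the constructed one. So although in theory a deeper network should not have higher training error than a shallower one, in practice the optimization fails to find these good solutions.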
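The constructed solution is easy to write down once a layer has the residual form x + F(x): setting F to zero makes the added layer an identity. The sketch below is a toy illustration of that; `TinyResidual` and the 4-dimensional linear "shallow network" are hypothetical and much simpler than the blocks used in ResNet.

```python
import torch
import torch.nn as nn

class TinyResidual(nn.Module):
    """y = x + F(x), with F a small two-layer MLP (hypothetical example)."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

shallow = nn.Linear(4, 4)  # stands in for the shallow network
block = TinyResidual(4)    # the extra layer added on top

# Constructed solution: zero the last layer of F so F(x) = 0 and the block is an identity.
nn.init.zeros_(block.f[2].weight)
nn.init.zeros_(block.f[2].bias)

deep = nn.Sequential(shallow, block)  # shallow network plus one identity block
x = torch.rand(8, 4)
print(torch.allclose(deep(x), shallow(x)))  # True
```

With a plain (non-residual) layer the optimizer would have to learn the identity mapping through its weights and nonlinearity; with the residual form it only has to push F toward zero.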

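ResNet architecture

The figure is rotated to save space. It shows a 34-layer residual network. Shortcuts drawn with dashed lines mark places where the dimensions differ, so F(X) cannot be added to X directly.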
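One common way to handle a dashed shortcut is to project X with a strided 1x1 convolution so that its shape matches F(X); this is also what the `downsample` branch in the implementation does. The sketch below assumes a stage transition from 64 channels at 56x56 to 128 channels at 28x28.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 64, 56, 56)

# Residual branch F: halves the resolution (stride 2) and doubles the channels.
residual = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128),
)

# Dashed shortcut: a 1x1 convolution with the same stride projects x to F(x)'s shape.
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

fx, sx = residual(x), shortcut(x)
out = torch.relu(fx + sx)  # the addition works because both are (1, 128, 28, 28)
print(fx.shape, sx.shape)
```

Adding `fx + x` directly would raise a shape error here, which is exactly the mismatch the dashed lines mark.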

Implementation

import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    # Residual block with two 3x3 convolutions (used by resnet34 below).
    expansion = 1  # output channels = out_channel * expansion

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # projection for dashed shortcuts, else None

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # match the shape of the main branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity  # F(x) + x
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed residual block,
    the first 1x1 convolution has stride 2 and the 3x3 convolution has stride 1.
    In the official PyTorch implementation the first 1x1 convolution has stride 1
    and the 3x3 convolution has stride 2,
    which improves top-1 accuracy by about 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4  # output channels = out_channel * expansion

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        # Width of the 3x3 convolution; larger than out_channel only for ResNeXt.
        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity  # F(x) + x
        out = self.relu(out)

        return out

class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top  # if False, the pooling and fc head is omitted
        self.in_channel = 64            # channels entering the next stage

        self.groups = groups
        self.width_per_group = width_per_group

        # Stem: 7x7 convolution followed by 3x3 max pooling, each with stride 2.
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Four stages; every stage after the first halves the resolution.
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        # A projection shortcut is needed when the first block changes the
        # resolution or the channel count.
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        # The remaining blocks keep the shape, so they use identity shortcuts.
        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x

def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


