
Freezing Part of a Model's Parameters During Training

There are three common scenarios:

  1. Training only certain layers
  2. Temporarily freezing certain layers
  3. Freezing BN layers (watch out for this one!)

Training only certain layers

Simply pass only the parameters of the layers you want to train to the optimizer.

Step 1: in the model, set requires_grad = False on the parameters of the layers that should not be trained.

import torch
import torch.nn as nn


class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        # Train only the classifier: freeze every parameter defined up to this point
        for p in self.parameters():
            p.requires_grad = False

        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
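
After step 1 you can quickly check which parameters are still trainable; a minimal sketch (num_classes=10 is arbitrary):

model = AlexNet(num_classes=10)
# Only the classifier parameters should still report requires_grad=True
print([name for name, p in model.named_parameters() if p.requires_grad])
# e.g. ['classifier.1.weight', 'classifier.1.bias', 'classifier.4.weight', ...]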

Step 2: pass only the trainable parameters to the optimizer.

import torch.optim as optim

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)

Note: filter with a lambda is used here to drop the parameters that are not being trained.

I ran into this problem again recently and looked through the PyTorch forums. It turns out that current optimizers already skip frozen parameters implicitly: SGD, for example, skips any parameter whose .grad is None, which is the case for parameters with requires_grad=False.

Even so, it is still recommended to perform step 2 explicitly: different optimizers handle this differently, and being explicit is always safer.

I would recommend to filter them out, as this would stick to the Python Zen: "Explicit is better than implicit."

An unwanted side effect of passing all parameters could be that the parameters, which are frozen now, are still being updated, if they required gradients before, and if you are using an optimizer with running estimates, e.g. Adam.
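
An equally explicit alternative to filtering, assuming only the classifier of the model above should be trained, is to hand that sub-module's parameters to the optimizer directly:

# Pass just the classifier's parameters instead of filtering the full list
optimizer = optim.Adam(model.classifier.parameters(), lr=1e-4)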

Temporarily freezing certain layers

Filter the parameters by name with named_parameters:

for k, v in model.named_parameters():
    print(k, v.requires_grad)
    # Freeze everything except the classifier
    if 'classifier' not in k:
        v.requires_grad = False
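
Since only requires_grad is toggled here, the freeze is easy to undo later, for example after a few warm-up epochs. A minimal sketch, assuming the same naming scheme (the smaller lr=1e-5 is only illustrative):

# Unfreeze everything outside the classifier again
for k, v in model.named_parameters():
    if 'classifier' not in k:
        v.requires_grad = True

# Rebuild the optimizer so it now covers all trainable parameters
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5)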

Freezing BN layers

Reference: "Pytorch Batch Normalization layer的坑" (pitfalls of the PyTorch BatchNorm layer)

Setting requires_grad=False and filtering via filter(lambda ...) only freezes modules that have learnable parameters. A BN layer additionally maintains running statistics (running mean and variance) that are updated at training time, so it must also be explicitly switched to eval() mode.

Approach 1: after calling train(), apply a fix_bn function to explicitly put the BN layers back into eval mode.

def fix_bn(m):
    # Put every BatchNorm layer back into eval mode so its running
    # mean/variance stop being updated during training
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

...
model.train()
model.apply(fix_bn)  # fix batchnorm
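
Keep in mind that every call to model.train() puts the BN layers back into training mode, so fix_bn has to be re-applied each time, typically at the start of every epoch. A minimal loop sketch (num_epochs, train_loader and criterion are placeholders, not defined above):

for epoch in range(num_epochs):
    model.train()
    model.apply(fix_bn)  # re-freeze BN statistics right after switching to train mode
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()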

Approach 2: override the model's train() method and set the BN layer state there.

def train(self, mode=True):
    """
    Override the default train() to freeze the BN parameters
    """
    super(MyNet, self).train(mode)
    if self.freeze_bn:
        print("Freezing Mean/Var of BatchNorm2D.")
        if self.freeze_bn_affine:
            print("Freezing Weight/Bias of BatchNorm2D.")
        for m in self.backbone.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
                if self.freeze_bn_affine:
                    m.weight.requires_grad = False
                    m.bias.requires_grad = False
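
For context, a minimal sketch of a class this method could live in; MyNet, freeze_bn, freeze_bn_affine and backbone come from the snippet above, while the layers inside backbone are just illustrative placeholders:

import torch.nn as nn


class MyNet(nn.Module):
    def __init__(self, freeze_bn=True, freeze_bn_affine=False):
        super(MyNet, self).__init__()
        self.freeze_bn = freeze_bn
        self.freeze_bn_affine = freeze_bn_affine
        # Placeholder backbone; in practice this is a pretrained feature extractor
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.backbone(x)

    # ... paste the overridden train() from above here ...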

In practice, freezing all BN layers can easily make the training gradients explode. Two adjustments help:

  1. Freeze only the BN layers in the earlier part of the model and leave the later BN layers trainable (see the sketch after this list);
  2. Lower the initial learning rate for fine-tuning.
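
A minimal sketch of the first adjustment, selecting the early BN layers by module name; the 'features.1' / 'features.4' prefixes are only hypothetical examples, so substitute the names from your own model (and, as with fix_bn above, re-apply the eval() part after every model.train() call):

# Hypothetical prefixes of the early BN layers to freeze
early_bn_prefixes = ('features.1', 'features.4')

for name, m in model.named_modules():
    if isinstance(m, nn.BatchNorm2d) and name.startswith(early_bn_prefixes):
        m.eval()                        # stop updating running mean/var
        m.weight.requires_grad = False  # optionally also freeze the affine params
        m.bias.requires_grad = False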
