固定部分参数进行训练
有两种实现场景:
- 仅训练某些层
- 暂时固定某些层
- 固定BN层(注意!!!)
仅训练某些层
直接在优化器中输入要训练的层参数即可
第一步:在模型中设置不训练的层参数的require_grads
为False
class AlexNet(nn.Module):
def __init__(self, num_classes=1000):
super(AlexNet_SPP, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(64, 192, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(192, 384, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
# 只训练分类器,设置之前参数的require_grads为False
for p in self.parameters():
p.requires_grad = False
self.classifier = nn.Sequential(
nn.Dropout(),
nn.Linear(256 * 50, 4096),
nn.ReLU(inplace=True),
nn.Dropout(),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
nn.Linear(4096, num_classes),
)
def forward(self, x):
x = self.features(x)
x = self.avgpool(x)
x = torch.flatten(x, 1)
x = self.classifier(x)
return x
第二步:在optimizer
中输入待训练的层参数
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
Note:使用filter
和lambda
函数过滤不训练的层参数
最近又遇到这个问题,在pytorch
论坛上找了一些相关资料。发现目前优化器已经隐式在内部过滤requires_grad=False
的参数,比如sgd。
不过还是推荐显式执行第二步,因为不同优化器的执行会有差别,而且显式操作总是更安全。
I would recommend to filter them out, as this would stick to the Python Zen Explicit is better than implicit.
An unwanted side effect of passing all parameters could be that the parameters, which are frozen now, are still being updated, if they required gradients before, and if you are using an optimizer with running estimates, e.g. Adam.
暂时固定某些层
使用named_parameters过滤
for k, v in model.named_parameters():
print(k, v.requires_grad)
if 'classifier' not in k:
v.requires_grad = False
固定BN层
参考:Pytorch Batch Normalizatin layer的坑
仅设置requires_grad=False
然后通过filter(lambda ...)
方式只能冻结有参数部分模块,对于BN
层而言还存在运行时参数(均值和方差)的更新,必须显示设置BN
层为eval()
状态
方式一:在train()
之后调用那个fix_bn
函数显式设置BN
层状态
def fix_bn(m):
classname = m.__class__.__name__
if classname.find('BatchNorm') != -1:
m.eval()
...
model.train()
model.apply(fix_bn) # fix batchnorm
方式二:重写模型train()
方法,设置BN
层状态
def train(self, mode=True):
"""
Override the default train() to freeze the BN parameters
"""
super(MyNet, self).train(mode)
if self.freeze_bn:
print("Freezing Mean/Var of BatchNorm2D.")
if self.freeze_bn_affine:
print("Freezing Weight/Bias of BatchNorm2D.")
if self.freeze_bn:
for m in self.backbone.modules():
if isinstance(m, nn.BatchNorm2d):
m.eval()
if self.freeze_bn_affine:
m.weight.requires_grad = False
m.bias.requires_grad = False
在实际操作过程中,会发现冻结全部BN
层会导致训练梯度极易爆炸,可以做如下调整:
- 可以固定模型前面层的
BN
层参数而放开后续层BN
层训练; - 降低微调训练初始学习率。
相关阅读
- pytorch固定部分参数进行网络训练
- Parameters with requires_grad = False are updated during training
- Trying to understand Optimizer and relation to requires_grad
- What is the behavior of passing model parameters with
requires_grad == False
to an optimizer? - Why is it when I call require_grad = False on all my params my weights in the network would still update?