mindspore.compression¶
mindspore.compression.quant¶
Compression quant module.
- class mindspore.compression.quant.QuantizationAwareTraining(bn_fold=True, freeze_bn=10000000, quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False), optimize_option=OptimizeOption.QAT, one_conv_fold=True)[source]¶

  Quantizer for quantization aware training.
- Parameters
bn_fold (bool) – Whether to use batch normalization fold operations to simulate the inference pass. Default: True.
freeze_bn (int) – Number of steps after which the BatchNorm OP parameters use the accumulated total mean and variance. Default: 1e7.
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during evaluation. The first element applies to weights and the second to data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Data type used to quantize weights and activations. The first element applies to weights and the second to data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity. If True, quantization is per channel; otherwise it is per layer. The first element applies to weights and the second to data flow. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric. The first element applies to weights and the second to data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses a narrow range. The first element applies to weights and the second to data flow. Default: (False, False).
optimize_option (Union[OptimizeOption, list, tuple]) – Specifies the quantization algorithm and options; currently only QAT is supported. Default: OptimizeOption.QAT.
one_conv_fold (bool) – Whether to use one conv bn fold operation to simulate the inference pass. Default: True.
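The Union[int, list, tuple] parameters above follow a (weights, data flow) pairing convention: a scalar applies to both, while a two-element list or tuple applies its first element to weights and its second to activations. A minimal, hypothetical sketch of that normalization (normalize_pair is illustrative, not a MindSpore API):

```python
def normalize_pair(value):
    """Expand a scalar into a (weights, data_flow) pair.

    Hypothetical helper for illustration only: tuple/list parameters
    such as per_channel or symmetric apply their first element to
    weights and their second to data flow (activations).
    """
    if isinstance(value, (list, tuple)):
        if len(value) != 2:
            raise ValueError("expected exactly two elements: (weights, data_flow)")
        return tuple(value)
    # A scalar applies to both weights and data flow.
    return (value, value)

# e.g. per_channel=[True, False]: per-channel weights, per-layer activations
weights_pc, act_pc = normalize_pair([True, False])
```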
Examples
>>> class LeNet5(nn.Cell):
...     def __init__(self, num_class=10, channel=1):
...         super(LeNet5, self).__init__()
...         self.type = "fusion"
...         self.num_class = num_class
...
...         # change `nn.Conv2d` to `nn.Conv2dBnAct`
...         self.conv1 = nn.Conv2dBnAct(channel, 6, 5, pad_mode='valid', activation='relu')
...         self.conv2 = nn.Conv2dBnAct(6, 16, 5, pad_mode='valid', activation='relu')
...         # change `nn.Dense` to `nn.DenseBnAct`
...         self.fc1 = nn.DenseBnAct(16 * 5 * 5, 120, activation='relu')
...         self.fc2 = nn.DenseBnAct(120, 84, activation='relu')
...         self.fc3 = nn.DenseBnAct(84, self.num_class)
...
...         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
...         self.flatten = nn.Flatten()
...
...     def construct(self, x):
...         x = self.conv1(x)
...         x = self.max_pool2d(x)
...         x = self.conv2(x)
...         x = self.max_pool2d(x)
...         x = self.flatten(x)
...         x = self.fc1(x)
...         x = self.fc2(x)
...         x = self.fc3(x)
...         return x
...
>>> net = LeNet5()
>>> quantizer = QuantizationAwareTraining(bn_fold=False, per_channel=[True, False], symmetric=[True, False])
>>> net_qat = quantizer.quantize(net)
- mindspore.compression.quant.create_quant_config(quant_observer=(nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver), quant_delay=(0, 0), quant_dtype=(QuantDtype.INT8, QuantDtype.INT8), per_channel=(False, False), symmetric=(False, False), narrow_range=(False, False))[source]¶

  Configure the observer type of weights and data flow with quantization parameters.
- Parameters
quant_observer (Union[Observer, list, tuple]) – The observer type used for quantization. The first element applies to weights and the second to data flow. Default: (nn.FakeQuantWithMinMaxObserver, nn.FakeQuantWithMinMaxObserver).
quant_delay (Union[int, list, tuple]) – Number of steps after which weights and activations are quantized during evaluation. The first element applies to weights and the second to data flow. Default: (0, 0).
quant_dtype (Union[QuantDtype, list, tuple]) – Data type used to quantize weights and activations. The first element applies to weights and the second to data flow. Default: (QuantDtype.INT8, QuantDtype.INT8).
per_channel (Union[bool, list, tuple]) – Quantization granularity. If True, quantization is per channel; otherwise it is per layer. The first element applies to weights and the second to data flow. Default: (False, False).
symmetric (Union[bool, list, tuple]) – Whether the quantization algorithm is symmetric. If True, symmetric quantization is used; otherwise asymmetric. The first element applies to weights and the second to data flow. Default: (False, False).
narrow_range (Union[bool, list, tuple]) – Whether the quantization algorithm uses a narrow range. The first element applies to weights and the second to data flow. Default: (False, False).
- Returns
QuantConfig, containing the observer types for weights and activations.
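The returned QuantConfig pairs one observer for weights with one for activations. A rough, stand-alone sketch of that structure (this is not the MindSpore implementation; the names and string values are illustrative only):

```python
from collections import namedtuple

# Illustrative stand-in for the QuantConfig return value (NOT the
# MindSpore source): one observer slot for weights, one for activations.
QuantConfig = namedtuple("QuantConfig", ["weight", "activation"])

# Hypothetical contents, mirroring the default (FakeQuantWithMinMaxObserver
# for both weights and data flow) described in the parameters above.
cfg = QuantConfig(weight="FakeQuantWithMinMaxObserver for weights",
                  activation="FakeQuantWithMinMaxObserver for activations")
```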
- class mindspore.compression.quant.OptimizeOption[source]¶

  An enum for the model quantization optimization options. Currently only QAT is supported.
mindspore.compression.common¶
Compression common module.
- class mindspore.compression.common.QuantDtype[source]¶

  An enum for quant data types, containing INT2~INT8 and UINT2~UINT8.
- num_bits¶

  Get the number of bits of the QuantDtype member.

  - Returns
    int, the number of bits of the QuantDtype member.
Examples
>>> quant_dtype = QuantDtype.INT8
>>> num_bits = quant_dtype.num_bits
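The num_bits lookup can be pictured with a plain Python Enum. The following is an illustrative re-creation, not the MindSpore source; the class name MyQuantDtype and the name-parsing approach are assumptions made for the sketch:

```python
import re
from enum import Enum

# Illustrative re-creation of a QuantDtype-style enum (NOT the
# MindSpore source). Members span INT2~INT8 and UINT2~UINT8; the bit
# width is recovered from the trailing digits of the member name.
class MyQuantDtype(Enum):
    INT2 = "INT2"
    INT4 = "INT4"
    INT8 = "INT8"
    UINT2 = "UINT2"
    UINT4 = "UINT4"
    UINT8 = "UINT8"

    @property
    def num_bits(self):
        # Parse the trailing digits of the member name, e.g. "INT8" -> 8.
        return int(re.search(r"\d+", self.name).group())

print(MyQuantDtype.INT8.num_bits)   # 8
print(MyQuantDtype.UINT4.num_bits)  # 4
```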