bit

mindnlp.transformers.models.bit.configuration_bit.BitConfig

Bases: BackboneConfigMixin, PretrainedConfig

This is the configuration class to store the configuration of a [BitModel]. It is used to instantiate a BiT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BiT google/bit-50 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
num_channels

The number of input channels.

TYPE: `int`, *optional*, defaults to 3 DEFAULT: 3

embedding_size

Dimensionality (hidden size) for the embedding layer.

TYPE: `int`, *optional*, defaults to 64 DEFAULT: 64

hidden_sizes

Dimensionality (hidden size) at each stage.

TYPE: `List[int]`, *optional*, defaults to `[256, 512, 1024, 2048]` DEFAULT: [256, 512, 1024, 2048]

depths

Depth (number of layers) for each stage.

TYPE: `List[int]`, *optional*, defaults to `[3, 4, 6, 3]` DEFAULT: [3, 4, 6, 3]

layer_type

The layer to use; it can be either "preactivation" or "bottleneck".

TYPE: `str`, *optional*, defaults to `"preactivation"` DEFAULT: 'preactivation'

hidden_act

The non-linear activation function in each block. If string, "gelu", "relu", "selu" and "gelu_new" are supported.

TYPE: `str`, *optional*, defaults to `"relu"` DEFAULT: 'relu'

global_padding

Padding strategy to use for the convolutional layers. Can be either "valid", "same", or None.

TYPE: `str`, *optional* DEFAULT: None

num_groups

Number of groups used for the BitGroupNormActivation layers.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

drop_path_rate

The drop path rate for the stochastic depth.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

embedding_dynamic_padding

Whether or not to make use of dynamic padding for the embedding layer.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

output_stride

The output stride of the model.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

width_factor

The width factor for the model.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

out_features

If used as backbone, list of features to output. Can be any of "stem", "stage1", "stage2", etc. (depending on how many stages the model has). If unset and out_indices is set, will default to the corresponding stages. If unset and out_indices is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: `List[str]`, *optional* DEFAULT: None

out_indices

If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how many stages the model has). If unset and out_features is set, will default to the corresponding stages. If unset and out_features is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: `List[int]`, *optional* DEFAULT: None

Example
>>> from transformers import BitConfig, BitModel
...
>>> # Initializing a BiT bit-50 style configuration
>>> configuration = BitConfig()
...
>>> # Initializing a model (with random weights) from the bit-50 style configuration
>>> model = BitModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
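When `BitConfig` is used to configure a backbone, `out_features` / `out_indices` select which stages are exposed; they are aligned against `stage_names` by `get_aligned_output_features_output_indices`. A small sketch, assuming the `BackboneConfigMixin` properties mirror the upstream Transformers API:

>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
...
>>> # Stage names are "stem", "stage1", ..., "stageN" (one entry per element of `depths`)
>>> config = BitConfig(out_features=["stem", "stage4"])
>>> stages = config.stage_names          # ['stem', 'stage1', 'stage2', 'stage3', 'stage4']
>>> selected = config.out_features       # ['stem', 'stage4']
>>> indices = config.out_indices         # indices of the selected stages within stage_names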
Source code in mindnlp/transformers/models/bit/configuration_bit.py
class BitConfig(BackboneConfigMixin, PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`BitModel`]. It is used to instantiate a BiT
    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the BiT
    [google/bit-50](https://huggingface.co/google/bit-50) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        num_channels (`int`, *optional*, defaults to 3):
            The number of input channels.
        embedding_size (`int`, *optional*, defaults to 64):
            Dimensionality (hidden size) for the embedding layer.
        hidden_sizes (`List[int]`, *optional*, defaults to `[256, 512, 1024, 2048]`):
            Dimensionality (hidden size) at each stage.
        depths (`List[int]`, *optional*, defaults to `[3, 4, 6, 3]`):
            Depth (number of layers) for each stage.
        layer_type (`str`, *optional*, defaults to `"preactivation"`):
            The layer to use, it can be either `"preactivation"` or `"bottleneck"`.
        hidden_act (`str`, *optional*, defaults to `"relu"`):
            The non-linear activation function in each block. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"`
            are supported.
        global_padding (`str`, *optional*):
            Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`.
        num_groups (`int`, *optional*, defaults to 32):
            Number of groups used for the `BitGroupNormActivation` layers.
        drop_path_rate (`float`, *optional*, defaults to 0.0):
            The drop path rate for the stochastic depth.
        embedding_dynamic_padding (`bool`, *optional*, defaults to `False`):
            Whether or not to make use of dynamic padding for the embedding layer.
        output_stride (`int`, *optional*, defaults to 32):
            The output stride of the model.
        width_factor (`int`, *optional*, defaults to 1):
            The width factor for the model.
        out_features (`List[str]`, *optional*):
            If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc.
            (depending on how many stages the model has). If unset and `out_indices` is set, will default to the
            corresponding stages. If unset and `out_indices` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.
        out_indices (`List[int]`, *optional*):
            If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how
            many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
            If unset and `out_features` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.

    Example:
        ```python
        >>> from transformers import BitConfig, BitModel
        ...
        >>> # Initializing a BiT bit-50 style configuration
        >>> configuration = BitConfig()
        ...
        >>> # Initializing a model (with random weights) from the bit-50 style configuration
        >>> model = BitModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "bit"
    layer_types = ["preactivation", "bottleneck"]
    supported_padding = ["SAME", "VALID"]

    def __init__(
        self,
        num_channels=3,
        embedding_size=64,
        hidden_sizes=[256, 512, 1024, 2048],
        depths=[3, 4, 6, 3],
        layer_type="preactivation",
        hidden_act="relu",
        global_padding=None,
        num_groups=32,
        drop_path_rate=0.0,
        embedding_dynamic_padding=False,
        output_stride=32,
        width_factor=1,
        out_features=None,
        out_indices=None,
        **kwargs,
    ):
        """
        Initialize the BitConfig class with the provided configuration parameters.

        Args:
            self: Reference to the current instance of the class.
            num_channels (int): Number of input channels for the model. Defaults to 3.
            embedding_size (int): Dimensionality of the embedding space. Defaults to 64.
            hidden_sizes (list): List of integers specifying the sizes of hidden layers in the model.
            depths (list): List of integers representing the depths of each stage in the model.
            layer_type (str): Type of layer architecture to use in the model.
            hidden_act (str): Activation function to apply in the hidden layers. Default is 'relu'.
            global_padding (str): Strategy for padding. Must be one of the supported padding strategies.
            num_groups (int): Number of groups for group normalization.
            drop_path_rate (float): Probability of dropping a path during training. Default is 0.0.
            embedding_dynamic_padding (bool): Flag indicating whether dynamic padding should be applied to embeddings.
            output_stride (int): Stride value for output computation.
            width_factor (int): Factor to scale the width of the model.
            out_features (list): List of output features to align with stage names.
            out_indices (list): List of output indices to align with stage names.

        Returns:
            None.

        Raises:
            ValueError: If the provided layer_type is not supported.
            ValueError: If the global_padding strategy is not supported.
        """
        super().__init__(**kwargs)
        if layer_type not in self.layer_types:
            raise ValueError(f"layer_type={layer_type} is not one of {','.join(self.layer_types)}")
        if global_padding is not None:
            if global_padding.upper() in self.supported_padding:
                global_padding = global_padding.upper()
            else:
                raise ValueError(f"Padding strategy {global_padding} not supported")
        self.num_channels = num_channels
        self.embedding_size = embedding_size
        self.hidden_sizes = hidden_sizes
        self.depths = depths
        self.layer_type = layer_type
        self.hidden_act = hidden_act
        self.global_padding = global_padding
        self.num_groups = num_groups
        self.drop_path_rate = drop_path_rate
        self.embedding_dynamic_padding = embedding_dynamic_padding
        self.output_stride = output_stride
        self.width_factor = width_factor

        self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]
        self._out_features, self._out_indices = get_aligned_output_features_output_indices(
            out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
        )

mindnlp.transformers.models.bit.configuration_bit.BitConfig.__init__(num_channels=3, embedding_size=64, hidden_sizes=[256, 512, 1024, 2048], depths=[3, 4, 6, 3], layer_type='preactivation', hidden_act='relu', global_padding=None, num_groups=32, drop_path_rate=0.0, embedding_dynamic_padding=False, output_stride=32, width_factor=1, out_features=None, out_indices=None, **kwargs)

Initialize the BitConfig class with the provided configuration parameters.

PARAMETER DESCRIPTION
self

Reference to the current instance of the class.

num_channels

Number of input channels for the model. Defaults to 3.

TYPE: int DEFAULT: 3

embedding_size

Dimensionality of the embedding space. Defaults to 64.

TYPE: int DEFAULT: 64

hidden_sizes

List of integers specifying the sizes of hidden layers in the model.

TYPE: list DEFAULT: [256, 512, 1024, 2048]

depths

List of integers representing the depths of each stage in the model.

TYPE: list DEFAULT: [3, 4, 6, 3]

layer_type

Type of layer architecture to use in the model.

TYPE: str DEFAULT: 'preactivation'

hidden_act

Activation function to apply in the hidden layers. Default is 'relu'.

TYPE: str DEFAULT: 'relu'

global_padding

Strategy for padding. Must be one of the supported padding strategies.

TYPE: str DEFAULT: None

num_groups

Number of groups for group normalization.

TYPE: int DEFAULT: 32

drop_path_rate

Probability of dropping a path during training. Default is 0.0.

TYPE: float DEFAULT: 0.0

embedding_dynamic_padding

Flag indicating whether dynamic padding should be applied to embeddings.

TYPE: bool DEFAULT: False

output_stride

Stride value for output computation.

TYPE: int DEFAULT: 32

width_factor

Factor to scale the width of the model.

TYPE: int DEFAULT: 1

out_features

List of output features to align with stage names.

TYPE: list DEFAULT: None

out_indices

List of output indices to align with stage names.

TYPE: list DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the provided layer_type is not supported.

ValueError

If the global_padding strategy is not supported.
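Both ValueError paths above can be triggered directly; a short sketch based on the checks in the source:

>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
...
>>> # A supported padding strategy is accepted case-insensitively and stored upper-cased
>>> BitConfig(global_padding="same").global_padding
'SAME'
...
>>> # An unknown layer_type (or padding strategy) raises ValueError
>>> BitConfig(layer_type="postactivation")
Traceback (most recent call last):
    ...
ValueError: layer_type=postactivation is not one of preactivation,bottleneck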

Source code in mindnlp/transformers/models/bit/configuration_bit.py
def __init__(
    self,
    num_channels=3,
    embedding_size=64,
    hidden_sizes=[256, 512, 1024, 2048],
    depths=[3, 4, 6, 3],
    layer_type="preactivation",
    hidden_act="relu",
    global_padding=None,
    num_groups=32,
    drop_path_rate=0.0,
    embedding_dynamic_padding=False,
    output_stride=32,
    width_factor=1,
    out_features=None,
    out_indices=None,
    **kwargs,
):
    """
    Initialize the BitConfig class with the provided configuration parameters.

    Args:
        self: Reference to the current instance of the class.
        num_channels (int): Number of input channels for the model. Defaults to 3.
        embedding_size (int): Dimensionality of the embedding space. Defaults to 64.
        hidden_sizes (list): List of integers specifying the sizes of hidden layers in the model.
        depths (list): List of integers representing the depths of each stage in the model.
        layer_type (str): Type of layer architecture to use in the model.
        hidden_act (str): Activation function to apply in the hidden layers. Default is 'relu'.
        global_padding (str): Strategy for padding. Must be one of the supported padding strategies.
        num_groups (int): Number of groups for group normalization.
        drop_path_rate (float): Probability of dropping a path during training. Default is 0.0.
        embedding_dynamic_padding (bool): Flag indicating whether dynamic padding should be applied to embeddings.
        output_stride (int): Stride value for output computation.
        width_factor (int): Factor to scale the width of the model.
        out_features (list): List of output features to align with stage names.
        out_indices (list): List of output indices to align with stage names.

    Returns:
        None.

    Raises:
        ValueError: If the provided layer_type is not supported.
        ValueError: If the global_padding strategy is not supported.
    """
    super().__init__(**kwargs)
    if layer_type not in self.layer_types:
        raise ValueError(f"layer_type={layer_type} is not one of {','.join(self.layer_types)}")
    if global_padding is not None:
        if global_padding.upper() in self.supported_padding:
            global_padding = global_padding.upper()
        else:
            raise ValueError(f"Padding strategy {global_padding} not supported")
    self.num_channels = num_channels
    self.embedding_size = embedding_size
    self.hidden_sizes = hidden_sizes
    self.depths = depths
    self.layer_type = layer_type
    self.hidden_act = hidden_act
    self.global_padding = global_padding
    self.num_groups = num_groups
    self.drop_path_rate = drop_path_rate
    self.embedding_dynamic_padding = embedding_dynamic_padding
    self.output_stride = output_stride
    self.width_factor = width_factor

    self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]
    self._out_features, self._out_indices = get_aligned_output_features_output_indices(
        out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
    )

mindnlp.transformers.models.bit.modeling_bit.BitForImageClassification

Bases: BitPreTrainedModel

BitForImageClassification is a class that represents a model for image classification using a Bit (Big Transfer) architecture. It inherits from BitPreTrainedModel and provides functionalities for image classification tasks.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for classification.

TYPE: int

bit

BitModel instance for feature extraction.

TYPE: BitModel

classifier

Neural network layers for classification.

TYPE: Sequential

METHOD DESCRIPTION
__init__

Initializes the BitForImageClassification instance with the given configuration.

forward

Constructs the image classifier model with optional inputs and returns the output with or without attention.

Parameters:

  • pixel_values (mindspore.Tensor): Tensor of shape (batch_size, channels, height, width) representing input images.
  • labels (mindspore.Tensor): Tensor of shape (batch_size,) representing labels for classification/regression. Indices should be in [0, ..., config.num_labels - 1]. For classification, a classification loss is computed (Cross-Entropy).
  • output_hidden_states (bool): Flag to indicate whether to output hidden states.
  • return_dict (bool): Flag to specify the format of the returned output.
Returns

ImageClassifierOutputWithNoAttention: Output containing loss, logits, and hidden states if specified.
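A minimal end-to-end sketch with random weights; `num_labels=10` and the 224x224 input are illustrative assumptions, not values taken from this module:

>>> import mindspore
>>> from mindspore import ops
>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
>>> from mindnlp.transformers.models.bit.modeling_bit import BitForImageClassification
...
>>> # num_labels comes from PretrainedConfig and sizes the classification head
>>> model = BitForImageClassification(BitConfig(num_labels=10))
...
>>> pixel_values = ops.zeros((1, 3, 224, 224), mindspore.float32)  # dummy image batch
>>> outputs = model(pixel_values)
>>> logits = outputs.logits  # expected shape (1, 10): one score per label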

Source code in mindnlp/transformers/models/bit/modeling_bit.py
class BitForImageClassification(BitPreTrainedModel):

    """
    BitForImageClassification is a class that represents a model for image classification using a Bit (Big Transfer) architecture.
    It inherits from BitPreTrainedModel and provides functionalities for image classification tasks.

    Attributes:
        num_labels (int): The number of labels for classification.
        bit (BitModel): BitModel instance for feature extraction.
        classifier (nn.Sequential): Neural network layers for classification.

    Methods:
        __init__:
            Initializes the BitForImageClassification instance with the given configuration.

        forward:
            Constructs the image classifier model with optional inputs and returns the output with or without attention.

            Parameters:

            - pixel_values (mindspore.Tensor): Tensor of shape `(batch_size, channels, height, width)` representing input images.
            - labels (mindspore.Tensor): Tensor of shape `(batch_size,)` representing labels for classification/regression.
                Indices should be in `[0, ..., config.num_labels - 1]`. For classification, a classification loss is computed (Cross-Entropy).
            - output_hidden_states (bool): Flag to indicate whether to output hidden states.
            - return_dict (bool): Flag to specify the format of the returned output.

        Returns:
            ImageClassifierOutputWithNoAttention: Output containing loss, logits, and hidden states if specified.
    """
    def __init__(self, config):
        """
        Initializes an instance of the BitForImageClassification class.

        Args:
            self (BitForImageClassification): The current instance of the BitForImageClassification class.
            config: The configuration object containing various settings for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bit = BitModel(config)
        # classification head
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity(),
        )
        # initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> ImageClassifierOutputWithNoAttention:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bit(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

        pooled_output = outputs.pooler_output if return_dict else outputs[1]

        logits = self.classifier(pooled_output)

        loss = None

        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"
            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = F.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = F.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = F.binary_cross_entropy_with_logits(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return (loss,) + output if loss is not None else output

        return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)

mindnlp.transformers.models.bit.modeling_bit.BitForImageClassification.__init__(config)

Initializes an instance of the BitForImageClassification class.

PARAMETER DESCRIPTION
self

The current instance of the BitForImageClassification class.

TYPE: BitForImageClassification

config

The configuration object containing various settings for the model.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bit/modeling_bit.py
def __init__(self, config):
    """
    Initializes an instance of the BitForImageClassification class.

    Args:
        self (BitForImageClassification): The current instance of the BitForImageClassification class.
        config: The configuration object containing various settings for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.bit = BitModel(config)
    # classification head
    self.classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity(),
    )
    # initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bit.modeling_bit.BitForImageClassification.forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/bit/modeling_bit.py
def forward(
    self,
    pixel_values: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> ImageClassifierOutputWithNoAttention:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.bit(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

    pooled_output = outputs.pooler_output if return_dict else outputs[1]

    logits = self.classifier(pooled_output)

    loss = None

    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"
        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = F.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = F.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = F.binary_cross_entropy_with_logits(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[2:]
        return (loss,) + output if loss is not None else output

    return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)
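As the code above shows, passing `labels` selects the loss from `config.problem_type`: MSE for regression (`num_labels == 1`), cross-entropy for integer single-label targets, and BCE-with-logits for multi-label targets. A hedged sketch of the single-label path, with random weights and a hypothetical `num_labels=10`:

>>> import mindspore
>>> from mindspore import ops
>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
>>> from mindnlp.transformers.models.bit.modeling_bit import BitForImageClassification
...
>>> model = BitForImageClassification(BitConfig(num_labels=10))
>>> pixel_values = ops.zeros((2, 3, 224, 224), mindspore.float32)
>>> labels = mindspore.Tensor([1, 4], mindspore.int64)  # integer labels -> single_label_classification
...
>>> outputs = model(pixel_values, labels=labels)
>>> problem_type = model.config.problem_type  # inferred on the first call with labels
>>> loss = outputs.loss                       # scalar cross-entropy loss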

mindnlp.transformers.models.bit.modeling_bit.BitModel

Bases: BitPreTrainedModel

The BitModel class represents a model for processing pixel values using Bit embeddings and encoding techniques. It inherits from the BitPreTrainedModel and includes methods for initialization and forwarding the model output with pooling and no attention.

ATTRIBUTE DESCRIPTION
config

The configuration for the model.

embedder

Instance of BitEmbeddings for embedding the input pixel values.

encoder

Instance of BitEncoder for encoding the embedded values.

norm

Instance of BitGroupNormActivation for applying normalization to the hidden state.

pooler

Instance of nn.AdaptiveAvgPool2d for pooling the last hidden state.

METHOD DESCRIPTION
__init__

Initializes the BitModel with the provided configuration.

forward

Constructs the model output with pooling and no attention based on the input pixel values and optional flags for outputting hidden states and using a return dictionary.
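A minimal forward-pass sketch with random weights; the commented shapes are assumptions that follow from the default configuration (`output_stride=32`, `hidden_sizes[-1]=2048`) and a 224x224 input:

>>> import mindspore
>>> from mindspore import ops
>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
>>> from mindnlp.transformers.models.bit.modeling_bit import BitModel
...
>>> model = BitModel(BitConfig())  # bit-50-like architecture, random weights
>>> pixel_values = ops.zeros((1, 3, 224, 224), mindspore.float32)
>>> outputs = model(pixel_values)
...
>>> features = outputs.last_hidden_state  # expected shape (1, 2048, 7, 7): channels = hidden_sizes[-1], spatial = 224/32
>>> pooled = outputs.pooler_output        # expected shape (1, 2048, 1, 1) after AdaptiveAvgPool2d((1, 1))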

Source code in mindnlp/transformers/models/bit/modeling_bit.py
class BitModel(BitPreTrainedModel):

    """
    The BitModel class represents a model for processing pixel values using Bit embeddings and encoding techniques.
    It inherits from the BitPreTrainedModel and includes methods for initialization and
    forwarding the model output with pooling and no attention.

    Attributes:
        config: The configuration for the model.
        embedder: Instance of BitEmbeddings for embedding the input pixel values.
        encoder: Instance of BitEncoder for encoding the embedded values.
        norm: Instance of BitGroupNormActivation for applying normalization to the hidden state.
        pooler: Instance of nn.AdaptiveAvgPool2d for pooling the last hidden state.

    Methods:
        __init__(self, config): Initializes the BitModel with the provided configuration.
        forward(self, pixel_values, output_hidden_states, return_dict): Constructs the model output with pooling
            and no attention based on the input pixel values and optional flags for outputting hidden states and
            using a return dictionary.
    """
    def __init__(self, config):
        """Initializes a BitModel instance.

        Args:
            self (BitModel): An instance of the BitModel class.
            config (object): A configuration object containing various settings for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.config = config

        self.embedder = BitEmbeddings(config)

        self.encoder = BitEncoder(config)
        self.norm = (
            BitGroupNormActivation(config, num_channels=config.hidden_sizes[-1])
            if config.layer_type == "preactivation"
            else nn.Identity()
        )

        self.pooler = nn.AdaptiveAvgPool2d((1, 1))
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
    ) -> BaseModelOutputWithPoolingAndNoAttention:
        """
        Constructs the BitModel by processing the given pixel values.

        Args:
            self: The instance of the BitModel class.
            pixel_values (mindspore.Tensor): The input tensor containing pixel values.
            output_hidden_states (bool, optional): Whether to include the hidden states in the output. Defaults to None.
            return_dict (bool, optional): Whether to return the output as a dictionary. Defaults to None.

        Returns:
            BaseModelOutputWithPoolingAndNoAttention: An object containing the forwarded BitModel output,
                including the last hidden state, pooled output, and hidden states.

        Raises:
            None.

        Note:
            - The `output_hidden_states` parameter,
            if provided, overrides the `output_hidden_states` configuration of the BitModel instance.
            - The `return_dict` parameter,
            if provided, overrides the `use_return_dict` configuration of the BitModel instance.
        """
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        embedding_output = self.embedder(pixel_values)

        encoder_outputs = self.encoder(
            embedding_output, output_hidden_states=output_hidden_states, return_dict=return_dict
        )

        last_hidden_state = encoder_outputs[0]

        last_hidden_state = self.norm(last_hidden_state)

        pooled_output = self.pooler(last_hidden_state)

        if not return_dict:
            return (last_hidden_state, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndNoAttention(
            last_hidden_state=last_hidden_state,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
        )

mindnlp.transformers.models.bit.modeling_bit.BitModel.__init__(config)

Initializes a BitModel instance.

PARAMETER DESCRIPTION
self

An instance of the BitModel class.

TYPE: BitModel

config

A configuration object containing various settings for the model.

TYPE: object

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bit/modeling_bit.py
def __init__(self, config):
    """Initializes a BitModel instance.

    Args:
        self (BitModel): An instance of the BitModel class.
        config (object): A configuration object containing various settings for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.config = config

    self.embedder = BitEmbeddings(config)

    self.encoder = BitEncoder(config)
    self.norm = (
        BitGroupNormActivation(config, num_channels=config.hidden_sizes[-1])
        if config.layer_type == "preactivation"
        else nn.Identity()
    )

    self.pooler = nn.AdaptiveAvgPool2d((1, 1))
    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bit.modeling_bit.BitModel.forward(pixel_values, output_hidden_states=None, return_dict=None)

Constructs the BitModel by processing the given pixel values.

PARAMETER DESCRIPTION
self

The instance of the BitModel class.

pixel_values

The input tensor containing pixel values.

TYPE: Tensor

output_hidden_states

Whether to include the hidden states in the output. Defaults to None.

TYPE: bool DEFAULT: None

return_dict

Whether to return the output as a dictionary. Defaults to None.

TYPE: bool DEFAULT: None

RETURNS DESCRIPTION
BaseModelOutputWithPoolingAndNoAttention

An object containing the forwarded BitModel output, including the last hidden state, pooled output, and hidden states.

TYPE: BaseModelOutputWithPoolingAndNoAttention

Note
  • The output_hidden_states parameter, if provided, overrides the output_hidden_states configuration of the BitModel instance.
  • The return_dict parameter, if provided, overrides the use_return_dict configuration of the BitModel instance.
Source code in mindnlp/transformers/models/bit/modeling_bit.py
def forward(
    self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
) -> BaseModelOutputWithPoolingAndNoAttention:
    """
    Constructs the BitModel by processing the given pixel values.

    Args:
        self: The instance of the BitModel class.
        pixel_values (mindspore.Tensor): The input tensor containing pixel values.
        output_hidden_states (bool, optional): Whether to include the hidden states in the output. Defaults to None.
        return_dict (bool, optional): Whether to return the output as a dictionary. Defaults to None.

    Returns:
        BaseModelOutputWithPoolingAndNoAttention: An object containing the forwarded BitModel output,
            including the last hidden state, pooled output, and hidden states.

    Raises:
        None.

    Note:
        - The `output_hidden_states` parameter,
        if provided, overrides the `output_hidden_states` configuration of the BitModel instance.
        - The `return_dict` parameter,
        if provided, overrides the `use_return_dict` configuration of the BitModel instance.
    """
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    embedding_output = self.embedder(pixel_values)

    encoder_outputs = self.encoder(
        embedding_output, output_hidden_states=output_hidden_states, return_dict=return_dict
    )

    last_hidden_state = encoder_outputs[0]

    last_hidden_state = self.norm(last_hidden_state)

    pooled_output = self.pooler(last_hidden_state)

    if not return_dict:
        return (last_hidden_state, pooled_output) + encoder_outputs[1:]

    return BaseModelOutputWithPoolingAndNoAttention(
        last_hidden_state=last_hidden_state,
        pooler_output=pooled_output,
        hidden_states=encoder_outputs.hidden_states,
    )

mindnlp.transformers.models.bit.modeling_bit.BitPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/bit/modeling_bit.py
class BitPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = BitConfig
    base_model_prefix = "bit"
    main_input_name = "pixel_values"

    def _init_weights(self, cell):
        """
        This method initializes the weights of the given cell based on its type.

        Args:
            self: The instance of the BitPreTrainedModel class.
            cell: An instance of a neural network cell (e.g., nn.Conv2d, nn.BatchNorm2d, nn.GroupNorm).
                It represents the cell for which the weights are initialized.

        Returns:
            None.

        Raises:
            TypeError: If the 'cell' parameter is not an instance of nn.Conv2d, nn.BatchNorm2d, or nn.GroupNorm.
            ValueError: If the 'cell' parameter is provided with an unsupported type.
            RuntimeError: If the weight initialization fails due to any runtime issues.
        """
        if isinstance(cell, nn.Conv2d):
            cell.weight.set_data(initializer(HeNormal(), cell.weight.shape, cell.weight.dtype))
        elif isinstance(cell, (nn.BatchNorm2d, nn.GroupNorm)):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

mindnlp.transformers.models.bit.modeling_bit.BitBackbone

Bases: BitPreTrainedModel, BackboneMixin

A BitBackbone class represents the backbone of a Bit model, which is a pre-trained image classification model.

This class inherits from the BitPreTrainedModel and BackboneMixin classes.

The BitBackbone class has the following methods:

  • __init__(self, config): Initializes the BitBackbone instance with the provided configuration.
  • forward(self, pixel_values, output_hidden_states, return_dict): Constructs the backbone model and returns the feature maps and hidden states.
Example
>>> from transformers import AutoImageProcessor, AutoBackbone
>>> from PIL import Image
>>> import requests
...
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
...
>>> processor = AutoImageProcessor.from_pretrained("google/resnetnv2-50")
>>> model = AutoBackbone.from_pretrained("google/resnetnv2-50")
...
>>> inputs = processor(image, return_tensors="pt")
>>> outputs = model(**inputs)
Note

In the above example, the BitBackbone class is used to extract feature maps and hidden states from an image using a pre-trained Bit model.

Source code in mindnlp/transformers/models/bit/modeling_bit.py
class BitBackbone(BitPreTrainedModel, BackboneMixin):

    """
    A BitBackbone class represents the backbone of a Bit model, which is a pre-trained image classification model.

    This class inherits from the BitPreTrainedModel and BackboneMixin classes.

    The BitBackbone class has the following methods:

    - __init__(self, config): Initializes the BitBackbone instance with the provided configuration.
    - forward(self, pixel_values, output_hidden_states, return_dict): Constructs the backbone model and returns the feature maps and hidden states.

    Example:
        ```python
        >>> from transformers import AutoImageProcessor, AutoBackbone
        >>> from PIL import Image
        >>> import requests
        ...
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)
        ...
        >>> processor = AutoImageProcessor.from_pretrained("google/resnetnv2-50")
        >>> model = AutoBackbone.from_pretrained("google/resnetnv2-50")
        ...
        >>> inputs = processor(image, return_tensors="pt")
        >>> outputs = model(**inputs)
        ```

    Note:
        In the above example, the BitBackbone class is used to extract feature maps and hidden states from an image using a pre-trained Bit model.
    """
    def __init__(self, config):
        """
        Initializes an instance of the BitBackbone class.

        Args:
            self: The instance of the BitBackbone class.
            config:
                A configuration object containing the settings for the BitBackbone model.
                It should be an instance of the Config class and contain the following attributes:

                - embedding_size (int): The size of the input embedding.
                - hidden_sizes (list): A list of integers representing the sizes of hidden layers.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        super()._init_backbone(config)

        self.bit = BitModel(config)
        self.num_features = [config.embedding_size] + config.hidden_sizes

        # initialize weights and apply final processing
        self.post_init()

    def forward(
        self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
    ) -> BackboneOutput:
        """
        Returns:
            BackboneOutput

        Example:
            ```python
            >>> from transformers import AutoImageProcessor, AutoBackbone
            >>> import torch
            >>> from PIL import Image
            >>> import requests
            ...
            >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
            >>> image = Image.open(requests.get(url, stream=True).raw)
            ...
            >>> processor = AutoImageProcessor.from_pretrained("google/resnetnv2-50")
            >>> model = AutoBackbone.from_pretrained("google/resnetnv2-50")
            ...
            >>> inputs = processor(image, return_tensors="pt")
            >>> outputs = model(**inputs)
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )

        outputs = self.bit(pixel_values, output_hidden_states=True, return_dict=True)

        hidden_states = outputs.hidden_states

        feature_maps = ()
        for idx, stage in enumerate(self.stage_names):
            if stage in self.out_features:
                feature_maps += (hidden_states[idx],)

        if not return_dict:
            output = (feature_maps,)
            if output_hidden_states:
                output += (outputs.hidden_states,)
            return output

        return BackboneOutput(
            feature_maps=feature_maps,
            hidden_states=outputs.hidden_states if output_hidden_states else None,
            attentions=None,
        )

mindnlp.transformers.models.bit.modeling_bit.BitBackbone.__init__(config)

Initializes an instance of the BitBackbone class.

PARAMETER DESCRIPTION
self

The instance of the BitBackbone class.

config

A configuration object containing the settings for the BitBackbone model. It should be an instance of the Config class and contain the following attributes:

  • embedding_size (int): The size of the input embedding.
  • hidden_sizes (list): A list of integers representing the sizes of hidden layers.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bit/modeling_bit.py
def __init__(self, config):
    """
    Initializes an instance of the BitBackbone class.

    Args:
        self: The instance of the BitBackbone class.
        config:
            A configuration object containing the settings for the BitBackbone model.
            It should be an instance of the Config class and contain the following attributes:

            - embedding_size (int): The size of the input embedding.
            - hidden_sizes (list): A list of integers representing the sizes of hidden layers.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    super()._init_backbone(config)

    self.bit = BitModel(config)
    self.num_features = [config.embedding_size] + config.hidden_sizes

    # initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bit.modeling_bit.BitBackbone.forward(pixel_values, output_hidden_states=None, return_dict=None)

RETURNS DESCRIPTION
BackboneOutput

BackboneOutput

Example
>>> from transformers import AutoImageProcessor, AutoBackbone
>>> import torch
>>> from PIL import Image
>>> import requests
...
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
...
>>> processor = AutoImageProcessor.from_pretrained("google/resnetnv2-50")
>>> model = AutoBackbone.from_pretrained("google/resnetnv2-50")
...
>>> inputs = processor(image, return_tensors="pt")
>>> outputs = model(**inputs)
Source code in mindnlp/transformers/models/bit/modeling_bit.py
def forward(
    self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
) -> BackboneOutput:
    """
    Returns:
        BackboneOutput

    Example:
        ```python
        >>> from transformers import AutoImageProcessor, AutoBackbone
        >>> import torch
        >>> from PIL import Image
        >>> import requests
        ...
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)
        ...
        >>> processor = AutoImageProcessor.from_pretrained("google/resnetnv2-50")
        >>> model = AutoBackbone.from_pretrained("google/resnetnv2-50")
        ...
        >>> inputs = processor(image, return_tensors="pt")
        >>> outputs = model(**inputs)
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )

    outputs = self.bit(pixel_values, output_hidden_states=True, return_dict=True)

    hidden_states = outputs.hidden_states

    feature_maps = ()
    for idx, stage in enumerate(self.stage_names):
        if stage in self.out_features:
            feature_maps += (hidden_states[idx],)

    if not return_dict:
        output = (feature_maps,)
        if output_hidden_states:
            output += (outputs.hidden_states,)
        return output

    return BackboneOutput(
        feature_maps=feature_maps,
        hidden_states=outputs.hidden_states if output_hidden_states else None,
        attentions=None,
    )
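Because the loop above keeps only the stages named in `out_features`, the backbone can be restricted at configuration time without loading a checkpoint. A small sketch with random weights, assuming the default four-stage architecture:

>>> import mindspore
>>> from mindspore import ops
>>> from mindnlp.transformers.models.bit.configuration_bit import BitConfig
>>> from mindnlp.transformers.models.bit.modeling_bit import BitBackbone
...
>>> config = BitConfig(out_features=["stage2", "stage4"])
>>> backbone = BitBackbone(config)  # random weights
...
>>> pixel_values = ops.zeros((1, 3, 224, 224), mindspore.float32)
>>> outputs = backbone(pixel_values)
>>> feature_maps = outputs.feature_maps  # one map per requested stage; channel counts follow hidden_sizes ([512, 2048] here)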

mindnlp.transformers.models.bit.image_processing_bit.BitImageProcessor

Bases: BaseImageProcessor

Constructs a BiT image processor.

PARAMETER DESCRIPTION
do_resize

Whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

size

Size of the image after resizing. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge resized to keep the input aspect ratio. Can be overridden by size in the preprocess method.

TYPE: `Dict[str, int]`, *optional*, defaults to `{"shortest_edge": 224}` DEFAULT: None

resample

Resampling filter to use if resizing the image. Can be overridden by resample in the preprocess method.

TYPE: `PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC` DEFAULT: BICUBIC

do_center_crop

Whether to center crop the image to the specified crop_size. Can be overridden by do_center_crop in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

crop_size

Size of the output image after applying center_crop. Can be overridden by crop_size in the preprocess method.

TYPE: `Dict[str, int]` *optional*, defaults to 224 DEFAULT: None

do_rescale

Whether to rescale the image by the specified scale rescale_factor. Can be overridden by do_rescale in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

rescale_factor

Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.

TYPE: `int` or `float`, *optional*, defaults to `1/255` DEFAULT: 1 / 255

do_normalize

Whether to normalize the image. Can be overridden by do_normalize in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

image_mean

Mean to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.

TYPE: `float` or `List[float]`, *optional*, defaults to `OPENAI_CLIP_MEAN` DEFAULT: None

image_std

Standard deviation to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.

TYPE: `float` or `List[float]`, *optional*, defaults to `OPENAI_CLIP_STD` DEFAULT: None

do_convert_rgb

Whether to convert the image to RGB.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True
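A minimal preprocessing sketch on a random NumPy image; `return_tensors="ms"` is an assumption (mirroring how other mindnlp processors request MindSpore tensors), and the commented output shape follows the default resize and center-crop settings:

>>> import numpy as np
>>> from mindnlp.transformers.models.bit.image_processing_bit import BitImageProcessor
...
>>> processor = BitImageProcessor()  # shortest_edge=224 resize, 224x224 center crop, CLIP mean/std
>>> image = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)  # dummy HWC uint8 image
...
>>> inputs = processor(images=image, return_tensors="ms")  # "ms" assumed; omit return_tensors for plain arrays
>>> pixel_values = inputs["pixel_values"]  # expected shape (1, 3, 224, 224)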

Source code in mindnlp/transformers/models/bit/image_processing_bit.py
class BitImageProcessor(BaseImageProcessor):
    r"""
    Constructs a BiT image processor.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Whether to resize the image's (height, width) dimensions to the specified `size`. Can be overridden by
            `do_resize` in the `preprocess` method.
        size (`Dict[str, int]` *optional*, defaults to `{"shortest_edge": 224}`):
            Size of the image after resizing. The shortest edge of the image is resized to size["shortest_edge"], with
            the longest edge resized to keep the input aspect ratio. Can be overridden by `size` in the `preprocess`
            method.
        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
            Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method.
        do_center_crop (`bool`, *optional*, defaults to `True`):
            Whether to center crop the image to the specified `crop_size`. Can be overridden by `do_center_crop` in the
            `preprocess` method.
        crop_size (`Dict[str, int]` *optional*, defaults to 224):
            Size of the output image after applying `center_crop`. Can be overridden by `crop_size` in the `preprocess`
            method.
        do_rescale (`bool`, *optional*, defaults to `True`):
            Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by `do_rescale` in
            the `preprocess` method.
        rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
            Scale factor to use if rescaling the image. Can be overridden by `rescale_factor` in the `preprocess`
            method.
        do_normalize (`bool`, *optional*, defaults to `True`):
            Whether to normalize the image. Can be overridden by `do_normalize` in the `preprocess` method.
        image_mean (`float` or `List[float]`, *optional*, defaults to `OPENAI_CLIP_MEAN`):
            Mean to use if normalizing the image. This is a float or list of floats the length of the number of
            channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method.
        image_std (`float` or `List[float]`, *optional*, defaults to `OPENAI_CLIP_STD`):
            Standard deviation to use if normalizing the image. This is a float or list of floats the length of the
            number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method.
        do_convert_rgb (`bool`, *optional*, defaults to `True`):
            Whether to convert the image to RGB.
    """
    model_input_names = ["pixel_values"]

    def __init__(
        self,
        do_resize: bool = True,
        size: Dict[str, int] = None,
        resample: PILImageResampling = PILImageResampling.BICUBIC,
        do_center_crop: bool = True,
        crop_size: Dict[str, int] = None,
        do_rescale: bool = True,
        rescale_factor: Union[int, float] = 1 / 255,
        do_normalize: bool = True,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        do_convert_rgb: bool = True,
        **kwargs,
    ) -> None:
        """
        Initializes a BitImageProcessor instance.

        Args:
            self: The BitImageProcessor instance.
            do_resize (bool, optional): Whether to resize the image. Defaults to True.
            size (Dict[str, int], optional): The target size of the image. Defaults to None.
            resample (PILImageResampling, optional): The resampling filter to use when resizing the image.
                Defaults to PILImageResampling.BICUBIC.
            do_center_crop (bool, optional): Whether to perform center cropping. Defaults to True.
            crop_size (Dict[str, int], optional): The size for center cropping. Defaults to None.
            do_rescale (bool, optional): Whether to rescale the image. Defaults to True.
            rescale_factor (Union[int, float], optional): The rescaling factor. Defaults to 1 / 255.
            do_normalize (bool, optional): Whether to normalize the image. Defaults to True.
            image_mean (Optional[Union[float, List[float]]], optional): The mean value for image normalization.
                Defaults to None.
            image_std (Optional[Union[float, List[float]]], optional): The standard deviation for image normalization.
                Defaults to None.
            do_convert_rgb (bool, optional): Whether to convert the image to RGB format. Defaults to True.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(**kwargs)
        size = size if size is not None else {"shortest_edge": 224}
        size = get_size_dict(size, default_to_square=False)
        crop_size = crop_size if crop_size is not None else {"height": 224, "width": 224}
        crop_size = get_size_dict(crop_size, default_to_square=True, param_name="crop_size")

        self.do_resize = do_resize
        self.size = size
        self.resample = resample
        self.do_center_crop = do_center_crop
        self.crop_size = crop_size
        self.do_rescale = do_rescale
        self.rescale_factor = rescale_factor
        self.do_normalize = do_normalize
        self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
        self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
        self.do_convert_rgb = do_convert_rgb
        self._valid_processor_keys = [
            "images",
            "do_resize",
            "size",
            "resample",
            "do_center_crop",
            "crop_size",
            "do_rescale",
            "rescale_factor",
            "do_normalize",
            "image_mean",
            "image_std",
            "do_convert_rgb",
            "return_tensors",
            "data_format",
            "input_data_format",
        ]

    # Copied from transformers.models.clip.image_processing_clip.CLIPImageProcessor.resize
    def resize(
        self,
        image: np.ndarray,
        size: Dict[str, int],
        resample: PILImageResampling = PILImageResampling.BICUBIC,
        data_format: Optional[Union[str, ChannelDimension]] = None,
        input_data_format: Optional[Union[str, ChannelDimension]] = None,
        **kwargs,
    ) -> np.ndarray:
        """
        Resize an image. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge
        resized to keep the input aspect ratio.

        Args:
            image (`np.ndarray`):
                Image to resize.
            size (`Dict[str, int]`):
                Size of the output image.
            resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
                Resampling filter to use when resizing the image.
            data_format (`str` or `ChannelDimension`, *optional*):
                The channel dimension format of the image. If not provided, it will be the same as the input image.
            input_data_format (`ChannelDimension` or `str`, *optional*):
                The channel dimension format of the input image. If not provided, it will be inferred.
        """
        default_to_square = True
        if "shortest_edge" in size:
            size = size["shortest_edge"]
            default_to_square = False
        elif "height" in size and "width" in size:
            size = (size["height"], size["width"])
        else:
            raise ValueError("Size must contain either 'shortest_edge' or 'height' and 'width'.")

        output_size = get_resize_output_image_size(
            image,
            size=size,
            default_to_square=default_to_square,
            input_data_format=input_data_format,
        )
        return resize(
            image,
            size=output_size,
            resample=resample,
            data_format=data_format,
            input_data_format=input_data_format,
            **kwargs,
        )

    def preprocess(
        self,
        images: ImageInput,
        do_resize: bool = None,
        size: Dict[str, int] = None,
        resample: PILImageResampling = None,
        do_center_crop: bool = None,
        crop_size: Dict[str, int] = None,
        do_rescale: bool = None,
        rescale_factor: float = None,
        do_normalize: bool = None,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        do_convert_rgb: bool = None,
        return_tensors: Optional[Union[str, TensorType]] = None,
        data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
        input_data_format: Optional[Union[str, ChannelDimension]] = None,
        **kwargs,
    ) -> BatchFeature:
        """
        Preprocess an image or batch of images.

        Args:
            images (`ImageInput`):
                Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
                passing in images with pixel values between 0 and 1, set `do_rescale=False`.
            do_resize (`bool`, *optional*, defaults to `self.do_resize`):
                Whether to resize the image.
            size (`Dict[str, int]`, *optional*, defaults to `self.size`):
                Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with
                the longest edge resized to keep the input aspect ratio.
            resample (`int`, *optional*, defaults to `self.resample`):
                Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only
                has an effect if `do_resize` is set to `True`.
            do_center_crop (`bool`, *optional*, defaults to `self.do_center_crop`):
                Whether to center crop the image.
            crop_size (`Dict[str, int]`, *optional*, defaults to `self.crop_size`):
                Size of the center crop. Only has an effect if `do_center_crop` is set to `True`.
            do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
                Whether to rescale the image.
            rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
                Rescale factor to rescale the image by if `do_rescale` is set to `True`.
            do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
                Whether to normalize the image.
            image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
                Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.
            image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
                Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to
                `True`.
            do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
                Whether to convert the image to RGB.
            return_tensors (`str` or `TensorType`, *optional*):
                The type of tensors to return. Can be one of:

                - Unset: Return a list of `np.ndarray`.
                - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
                - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
                - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
                - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
            data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
                The channel dimension format for the output image. Can be one of:

                - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
                - Unset: Use the channel dimension format of the input image.
            input_data_format (`ChannelDimension` or `str`, *optional*):
                The channel dimension format for the input image. If unset, the channel dimension format is inferred
                from the input image. Can be one of:

                - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
                - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
        """
        do_resize = do_resize if do_resize is not None else self.do_resize
        size = size if size is not None else self.size
        size = get_size_dict(size, param_name="size", default_to_square=False)
        resample = resample if resample is not None else self.resample
        do_center_crop = do_center_crop if do_center_crop is not None else self.do_center_crop
        crop_size = crop_size if crop_size is not None else self.crop_size
        crop_size = get_size_dict(crop_size, param_name="crop_size", default_to_square=True)
        do_rescale = do_rescale if do_rescale is not None else self.do_rescale
        rescale_factor = rescale_factor if rescale_factor is not None else self.rescale_factor
        do_normalize = do_normalize if do_normalize is not None else self.do_normalize
        image_mean = image_mean if image_mean is not None else self.image_mean
        image_std = image_std if image_std is not None else self.image_std
        do_convert_rgb = do_convert_rgb if do_convert_rgb is not None else self.do_convert_rgb

        validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self._valid_processor_keys)

        images = make_list_of_images(images)

        if not valid_images(images):
            raise ValueError(
                "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
                "torch.Tensor, tf.Tensor or jax.ndarray."
            )

        validate_preprocess_arguments(
            do_rescale=do_rescale,
            rescale_factor=rescale_factor,
            do_normalize=do_normalize,
            image_mean=image_mean,
            image_std=image_std,
            do_center_crop=do_center_crop,
            crop_size=crop_size,
            do_resize=do_resize,
            size=size,
            resample=resample,
        )

        # PIL RGBA images are converted to RGB
        if do_convert_rgb:
            images = [convert_to_rgb(image) for image in images]

        # All transformations expect numpy arrays.
        images = [to_numpy_array(image) for image in images]

        if is_scaled_image(images[0]) and do_rescale:
            logger.warning_once(
                "It looks like you are trying to rescale already rescaled images. If the input"
                " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
            )

        if input_data_format is None:
            # We assume that all images have the same channel dimension format.
            input_data_format = infer_channel_dimension_format(images[0])

        if do_resize:
            images = [
                self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
                for image in images
            ]

        if do_center_crop:
            images = [
                self.center_crop(image=image, size=crop_size, input_data_format=input_data_format) for image in images
            ]

        if do_rescale:
            images = [
                self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)
                for image in images
            ]

        if do_normalize:
            images = [
                self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
                for image in images
            ]

        images = [
            to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
        ]

        data = {"pixel_values": images}
        return BatchFeature(data=data, tensor_type=return_tensors)
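
Example

A minimal usage sketch in the doctest style used on this page; the input image is generated in memory so no file path is assumed, and `return_tensors="np"` is just one of the return types accepted by `preprocess`:

>>> import numpy as np
>>> from PIL import Image
>>> from mindnlp.transformers.models.bit.image_processing_bit import BitImageProcessor
>>> # Default pipeline: resize the shortest edge to 224, center crop to 224x224,
>>> # rescale by 1/255 and normalize with the OpenAI CLIP mean/std.
>>> processor = BitImageProcessor()
>>> # A random RGB image stands in for a real photo.
>>> image = Image.fromarray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
>>> batch = processor.preprocess(image, return_tensors="np")
>>> batch["pixel_values"].shape
(1, 3, 224, 224)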

mindnlp.transformers.models.bit.image_processing_bit.BitImageProcessor.__init__(do_resize=True, size=None, resample=PILImageResampling.BICUBIC, do_center_crop=True, crop_size=None, do_rescale=True, rescale_factor=1 / 255, do_normalize=True, image_mean=None, image_std=None, do_convert_rgb=True, **kwargs)

Initializes a BitImageProcessor instance.

PARAMETER DESCRIPTION
self

The BitImageProcessor instance.

do_resize

Whether to resize the image. Defaults to True.

TYPE: bool DEFAULT: True

size

The target size of the image. Defaults to None.

TYPE: Dict[str, int] DEFAULT: None

resample

The resampling filter to use when resizing the image. Defaults to PILImageResampling.BICUBIC.

TYPE: PILImageResampling DEFAULT: BICUBIC

do_center_crop

Whether to perform center cropping. Defaults to True.

TYPE: bool DEFAULT: True

crop_size

The size for center cropping. Defaults to None.

TYPE: Dict[str, int] DEFAULT: None

do_rescale

Whether to rescale the image. Defaults to True.

TYPE: bool DEFAULT: True

rescale_factor

The rescaling factor. Defaults to 1 / 255.

TYPE: Union[int, float] DEFAULT: 1 / 255

do_normalize

Whether to normalize the image. Defaults to True.

TYPE: bool DEFAULT: True

image_mean

The mean value for image normalization. Defaults to None.

TYPE: Optional[Union[float, List[float]]] DEFAULT: None

image_std

The standard deviation for image normalization. Defaults to None.

TYPE: Optional[Union[float, List[float]]] DEFAULT: None

do_convert_rgb

Whether to convert the image to RGB format. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
None

None.
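
Example

A brief construction sketch, assuming only the module path shown in the header above; the size, crop and normalization values below are illustrative overrides rather than the defaults:

>>> from mindnlp.transformers.models.bit.image_processing_bit import BitImageProcessor
>>> # Resize the shortest edge to 256, crop to 224x224, and normalize with
>>> # custom statistics instead of the CLIP defaults.
>>> processor = BitImageProcessor(
...     size={"shortest_edge": 256},
...     crop_size={"height": 224, "width": 224},
...     image_mean=[0.5, 0.5, 0.5],
...     image_std=[0.5, 0.5, 0.5],
... )
>>> processor.size
{'shortest_edge': 256}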

Source code in mindnlp/transformers/models/bit/image_processing_bit.py (lines 91-165)
def __init__(
    self,
    do_resize: bool = True,
    size: Dict[str, int] = None,
    resample: PILImageResampling = PILImageResampling.BICUBIC,
    do_center_crop: bool = True,
    crop_size: Dict[str, int] = None,
    do_rescale: bool = True,
    rescale_factor: Union[int, float] = 1 / 255,
    do_normalize: bool = True,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    do_convert_rgb: bool = True,
    **kwargs,
) -> None:
    """
    Initializes a BitImageProcessor instance.

    Args:
        self: The BitImageProcessor instance.
        do_resize (bool, optional): Whether to resize the image. Defaults to True.
        size (Dict[str, int], optional): The target size of the image. Defaults to None.
        resample (PILImageResampling, optional): The resampling filter to use when resizing the image.
            Defaults to PILImageResampling.BICUBIC.
        do_center_crop (bool, optional): Whether to perform center cropping. Defaults to True.
        crop_size (Dict[str, int], optional): The size for center cropping. Defaults to None.
        do_rescale (bool, optional): Whether to rescale the image. Defaults to True.
        rescale_factor (Union[int, float], optional): The rescaling factor. Defaults to 1 / 255.
        do_normalize (bool, optional): Whether to normalize the image. Defaults to True.
        image_mean (Optional[Union[float, List[float]]], optional): The mean value for image normalization.
            Defaults to None.
        image_std (Optional[Union[float, List[float]]], optional): The standard deviation for image normalization.
            Defaults to None.
        do_convert_rgb (bool, optional): Whether to convert the image to RGB format. Defaults to True.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(**kwargs)
    size = size if size is not None else {"shortest_edge": 224}
    size = get_size_dict(size, default_to_square=False)
    crop_size = crop_size if crop_size is not None else {"height": 224, "width": 224}
    crop_size = get_size_dict(crop_size, default_to_square=True, param_name="crop_size")

    self.do_resize = do_resize
    self.size = size
    self.resample = resample
    self.do_center_crop = do_center_crop
    self.crop_size = crop_size
    self.do_rescale = do_rescale
    self.rescale_factor = rescale_factor
    self.do_normalize = do_normalize
    self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
    self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
    self.do_convert_rgb = do_convert_rgb
    self._valid_processor_keys = [
        "images",
        "do_resize",
        "size",
        "resample",
        "do_center_crop",
        "crop_size",
        "do_rescale",
        "rescale_factor",
        "do_normalize",
        "image_mean",
        "image_std",
        "do_convert_rgb",
        "return_tensors",
        "data_format",
        "input_data_format",
    ]

mindnlp.transformers.models.bit.image_processing_bit.BitImageProcessor.preprocess(images, do_resize=None, size=None, resample=None, do_center_crop=None, crop_size=None, do_rescale=None, rescale_factor=None, do_normalize=None, image_mean=None, image_std=None, do_convert_rgb=None, return_tensors=None, data_format=ChannelDimension.FIRST, input_data_format=None, **kwargs)

Preprocess an image or batch of images.

PARAMETER DESCRIPTION
images

Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.

TYPE: `ImageInput`

do_resize

Whether to resize the image.

TYPE: `bool`, *optional*, defaults to `self.do_resize` DEFAULT: None

size

Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with the longest edge resized to keep the input aspect ratio.

TYPE: `Dict[str, int]`, *optional*, defaults to `self.size` DEFAULT: None

resample

Resampling filter to use if resizing the image. This can be one of the enum PILImageResampling. Only has an effect if do_resize is set to True.

TYPE: `int`, *optional*, defaults to `self.resample` DEFAULT: None

do_center_crop

Whether to center crop the image.

TYPE: `bool`, *optional*, defaults to `self.do_center_crop` DEFAULT: None

crop_size

Size of the center crop. Only has an effect if do_center_crop is set to True.

TYPE: `Dict[str, int]`, *optional*, defaults to `self.crop_size` DEFAULT: None

do_rescale

Whether to rescale the image.

TYPE: `bool`, *optional*, defaults to `self.do_rescale` DEFAULT: None

rescale_factor

Rescale factor to rescale the image by if do_rescale is set to True.

TYPE: `float`, *optional*, defaults to `self.rescale_factor` DEFAULT: None

do_normalize

Whether to normalize the image.

TYPE: `bool`, *optional*, defaults to `self.do_normalize` DEFAULT: None

image_mean

Image mean to use for normalization. Only has an effect if do_normalize is set to True.

TYPE: `float` or `List[float]`, *optional*, defaults to `self.image_mean` DEFAULT: None

image_std

Image standard deviation to use for normalization. Only has an effect if do_normalize is set to True.

TYPE: `float` or `List[float]`, *optional*, defaults to `self.image_std` DEFAULT: None

do_convert_rgb

Whether to convert the image to RGB.

TYPE: `bool`, *optional*, defaults to `self.do_convert_rgb` DEFAULT: None

return_tensors

The type of tensors to return. Can be one of:

  • Unset: Return a list of np.ndarray.
  • TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
  • TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
  • TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
  • TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.

TYPE: `str` or `TensorType`, *optional* DEFAULT: None

data_format

The channel dimension format for the output image. Can be one of:

  • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • Unset: Use the channel dimension format of the input image.

TYPE: `ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST` DEFAULT: FIRST

input_data_format

The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:

  • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • "none" or ChannelDimension.NONE: image in (height, width) format.

TYPE: `ChannelDimension` or `str`, *optional* DEFAULT: None
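
Example

An illustrative call in the page's doctest style; the two in-memory images are placeholders, and the per-call `size` override and `return_tensors="np"` choice are assumptions for the sketch, not requirements:

>>> import numpy as np
>>> from PIL import Image
>>> from mindnlp.transformers.models.bit.image_processing_bit import BitImageProcessor
>>> processor = BitImageProcessor()
>>> images = [
...     Image.fromarray(np.random.randint(0, 256, (300, 500, 3), dtype=np.uint8)),
...     Image.fromarray(np.random.randint(0, 256, (640, 480, 3), dtype=np.uint8)),
... ]
>>> # Arguments passed here override the values stored on the processor.
>>> batch = processor.preprocess(images, size={"shortest_edge": 288}, return_tensors="np")
>>> batch["pixel_values"].shape
(2, 3, 224, 224)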

Source code in mindnlp/transformers/models/bit/image_processing_bit.py (lines 217-372)
def preprocess(
    self,
    images: ImageInput,
    do_resize: bool = None,
    size: Dict[str, int] = None,
    resample: PILImageResampling = None,
    do_center_crop: bool = None,
    crop_size: Dict[str, int] = None,
    do_rescale: bool = None,
    rescale_factor: float = None,
    do_normalize: bool = None,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    do_convert_rgb: bool = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
    data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    **kwargs,
) -> BatchFeature:
    """
    Preprocess an image or batch of images.

    Args:
        images (`ImageInput`):
            Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
            passing in images with pixel values between 0 and 1, set `do_rescale=False`.
        do_resize (`bool`, *optional*, defaults to `self.do_resize`):
            Whether to resize the image.
        size (`Dict[str, int]`, *optional*, defaults to `self.size`):
            Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with
            the longest edge resized to keep the input aspect ratio.
        resample (`int`, *optional*, defaults to `self.resample`):
            Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only
            has an effect if `do_resize` is set to `True`.
        do_center_crop (`bool`, *optional*, defaults to `self.do_center_crop`):
            Whether to center crop the image.
        crop_size (`Dict[str, int]`, *optional*, defaults to `self.crop_size`):
            Size of the center crop. Only has an effect if `do_center_crop` is set to `True`.
        do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
            Whether to rescale the image.
        rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
            Rescale factor to rescale the image by if `do_rescale` is set to `True`.
        do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
            Whether to normalize the image.
        image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
            Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`.
        image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
            Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to
            `True`.
        do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
            Whether to convert the image to RGB.
        return_tensors (`str` or `TensorType`, *optional*):
            The type of tensors to return. Can be one of:

            - Unset: Return a list of `np.ndarray`.
            - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
            - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
            - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
            - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
        data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
            The channel dimension format for the output image. Can be one of:

            - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            - Unset: Use the channel dimension format of the input image.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format for the input image. If unset, the channel dimension format is inferred
            from the input image. Can be one of:

            - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
    """
    do_resize = do_resize if do_resize is not None else self.do_resize
    size = size if size is not None else self.size
    size = get_size_dict(size, param_name="size", default_to_square=False)
    resample = resample if resample is not None else self.resample
    do_center_crop = do_center_crop if do_center_crop is not None else self.do_center_crop
    crop_size = crop_size if crop_size is not None else self.crop_size
    crop_size = get_size_dict(crop_size, param_name="crop_size", default_to_square=True)
    do_rescale = do_rescale if do_rescale is not None else self.do_rescale
    rescale_factor = rescale_factor if rescale_factor is not None else self.rescale_factor
    do_normalize = do_normalize if do_normalize is not None else self.do_normalize
    image_mean = image_mean if image_mean is not None else self.image_mean
    image_std = image_std if image_std is not None else self.image_std
    do_convert_rgb = do_convert_rgb if do_convert_rgb is not None else self.do_convert_rgb

    validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self._valid_processor_keys)

    images = make_list_of_images(images)

    if not valid_images(images):
        raise ValueError(
            "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
            "torch.Tensor, tf.Tensor or jax.ndarray."
        )

    validate_preprocess_arguments(
        do_rescale=do_rescale,
        rescale_factor=rescale_factor,
        do_normalize=do_normalize,
        image_mean=image_mean,
        image_std=image_std,
        do_center_crop=do_center_crop,
        crop_size=crop_size,
        do_resize=do_resize,
        size=size,
        resample=resample,
    )

    # PIL RGBA images are converted to RGB
    if do_convert_rgb:
        images = [convert_to_rgb(image) for image in images]

    # All transformations expect numpy arrays.
    images = [to_numpy_array(image) for image in images]

    if is_scaled_image(images[0]) and do_rescale:
        logger.warning_once(
            "It looks like you are trying to rescale already rescaled images. If the input"
            " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
        )

    if input_data_format is None:
        # We assume that all images have the same channel dimension format.
        input_data_format = infer_channel_dimension_format(images[0])

    if do_resize:
        images = [
            self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
            for image in images
        ]

    if do_center_crop:
        images = [
            self.center_crop(image=image, size=crop_size, input_data_format=input_data_format) for image in images
        ]

    if do_rescale:
        images = [
            self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)
            for image in images
        ]

    if do_normalize:
        images = [
            self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
            for image in images
        ]

    images = [
        to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
    ]

    data = {"pixel_values": images}
    return BatchFeature(data=data, tensor_type=return_tensors)

mindnlp.transformers.models.bit.image_processing_bit.BitImageProcessor.resize(image, size, resample=PILImageResampling.BICUBIC, data_format=None, input_data_format=None, **kwargs)

Resize an image. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge resized to keep the input aspect ratio.

PARAMETER DESCRIPTION
image

Image to resize.

TYPE: `np.ndarray`

size

Size of the output image.

TYPE: `Dict[str, int]`

resample

Resampling filter to use when resizing the image.

TYPE: `PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC` DEFAULT: BICUBIC

data_format

The channel dimension format of the image. If not provided, it will be the same as the input image.

TYPE: `str` or `ChannelDimension`, *optional* DEFAULT: None

input_data_format

The channel dimension format of the input image. If not provided, it will be inferred.

TYPE: `ChannelDimension` or `str`, *optional* DEFAULT: None
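
Example

A sketch of the two accepted `size` forms, assuming a channels-last numpy input; the output shapes follow from the aspect-ratio-preserving rule described above:

>>> import numpy as np
>>> from mindnlp.transformers.models.bit.image_processing_bit import BitImageProcessor
>>> processor = BitImageProcessor()
>>> image = np.zeros((300, 600, 3), dtype=np.uint8)  # height 300, width 600
>>> # Shortest-edge form: the 300-pixel side becomes 224 and the width scales with it.
>>> processor.resize(image, size={"shortest_edge": 224}).shape
(224, 448, 3)
>>> # Explicit form: resize to the exact height and width given.
>>> processor.resize(image, size={"height": 128, "width": 128}).shape
(128, 128, 3)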

Source code in mindnlp/transformers/models/bit/image_processing_bit.py (lines 168-215)
def resize(
    self,
    image: np.ndarray,
    size: Dict[str, int],
    resample: PILImageResampling = PILImageResampling.BICUBIC,
    data_format: Optional[Union[str, ChannelDimension]] = None,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    **kwargs,
) -> np.ndarray:
    """
    Resize an image. The shortest edge of the image is resized to size["shortest_edge"], with the longest edge
    resized to keep the input aspect ratio.

    Args:
        image (`np.ndarray`):
            Image to resize.
        size (`Dict[str, int]`):
            Size of the output image.
        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
            Resampling filter to use when resizing the image.
        data_format (`str` or `ChannelDimension`, *optional*):
            The channel dimension format of the image. If not provided, it will be the same as the input image.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format of the input image. If not provided, it will be inferred.
    """
    default_to_square = True
    if "shortest_edge" in size:
        size = size["shortest_edge"]
        default_to_square = False
    elif "height" in size and "width" in size:
        size = (size["height"], size["width"])
    else:
        raise ValueError("Size must contain either 'shortest_edge' or 'height' and 'width'.")

    output_size = get_resize_output_image_size(
        image,
        size=size,
        default_to_square=default_to_square,
        input_data_format=input_data_format,
    )
    return resize(
        image,
        size=output_size,
        resample=resample,
        data_format=data_format,
        input_data_format=input_data_format,
        **kwargs,
    )