resnet

mindnlp.transformers.models.resnet.configuration_resnet

ResNet model configuration

mindnlp.transformers.models.resnet.configuration_resnet.ResNetConfig

Bases: BackboneConfigMixin, PretrainedConfig

This is the configuration class to store the configuration of a [ResNetModel]. It is used to instantiate a ResNet model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the ResNet microsoft/resnet-50 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
num_channels

The number of input channels.

TYPE: int, optional DEFAULT: 3

embedding_size

Dimensionality (hidden size) for the embedding layer.

TYPE: int, optional DEFAULT: 64

hidden_sizes

Dimensionality (hidden size) at each stage.

TYPE: List[int], optional DEFAULT: [256, 512, 1024, 2048]

depths

Depth (number of layers) for each stage.

TYPE: List[int], optional DEFAULT: [3, 4, 6, 3]

layer_type

The layer to use: either "basic" (used for smaller models, like resnet-18 or resnet-34) or "bottleneck" (used for larger models like resnet-50 and above).

TYPE: str, optional DEFAULT: 'bottleneck'

hidden_act

The non-linear activation function in each block. If string, "gelu", "relu", "selu" and "gelu_new" are supported.

TYPE: str, optional DEFAULT: 'relu'

downsample_in_first_stage

If True, the first stage will downsample the inputs using a stride of 2.

TYPE: bool, optional DEFAULT: False

downsample_in_bottleneck

If True, the first conv 1x1 in ResNetBottleNeckLayer will downsample the inputs using a stride of 2.

TYPE: bool, optional DEFAULT: False

out_features

If used as backbone, list of features to output. Can be any of "stem", "stage1", "stage2", etc. (depending on how many stages the model has). If unset and out_indices is set, will default to the corresponding stages. If unset and out_indices is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: List[str], optional DEFAULT: None

out_indices

If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how many stages the model has). If unset and out_features is set, will default to the corresponding stages. If unset and out_features is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: List[int], optional DEFAULT: None
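
The defaulting rules described for out_features and out_indices can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `make_stage_names` mirrors the `stage_names` expression in `ResNetConfig.__init__`, while `align_outputs` is a hypothetical stand-in for `get_aligned_output_features_output_indices` that only models the three documented cases and performs no validation.

```python
from typing import List, Optional, Tuple


def make_stage_names(depths: List[int]) -> List[str]:
    # Same expression as in ResNetConfig.__init__: a stem plus one name per stage.
    return ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]


def align_outputs(
    stage_names: List[str],
    out_features: Optional[List[str]] = None,
    out_indices: Optional[List[int]] = None,
) -> Tuple[List[str], List[int]]:
    # Hypothetical helper: models only the documented defaulting behaviour.
    if out_features is None and out_indices is None:
        # Neither is set: default to the last stage.
        return [stage_names[-1]], [len(stage_names) - 1]
    if out_features is None:
        # Only indices are set: look up the corresponding stage names.
        return [stage_names[i] for i in out_indices], list(out_indices)
    if out_indices is None:
        # Only names are set: look up the corresponding indices.
        return list(out_features), [stage_names.index(n) for n in out_features]
    return list(out_features), list(out_indices)


names = make_stage_names([3, 4, 6, 3])
print(names)
print(align_outputs(names))
print(align_outputs(names, out_features=["stage2", "stage4"]))
print(align_outputs(names, out_indices=[1, 3]))
```

With the default depths there are five stage names, so index 0 refers to the stem and index 4 to the last stage.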

Example
>>> from mindnlp.transformers import ResNetConfig, ResNetModel

>>> # Initializing a ResNet resnet-50 style configuration
>>> configuration = ResNetConfig()

>>> # Initializing a model (with random weights) from the resnet-50 style configuration
>>> model = ResNetModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp/transformers/models/resnet/configuration_resnet.py
class ResNetConfig(BackboneConfigMixin, PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ResNetModel`]. It is used to instantiate a
    ResNet model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the ResNet
    [microsoft/resnet-50](https://huggingface.co/microsoft/resnet-50) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        num_channels (`int`, *optional*, defaults to 3):
            The number of input channels.
        embedding_size (`int`, *optional*, defaults to 64):
            Dimensionality (hidden size) for the embedding layer.
        hidden_sizes (`List[int]`, *optional*, defaults to `[256, 512, 1024, 2048]`):
            Dimensionality (hidden size) at each stage.
        depths (`List[int]`, *optional*, defaults to `[3, 4, 6, 3]`):
            Depth (number of layers) for each stage.
        layer_type (`str`, *optional*, defaults to `"bottleneck"`):
            The layer to use: either `"basic"` (used for smaller models, like resnet-18 or resnet-34) or
            `"bottleneck"` (used for larger models like resnet-50 and above).
        hidden_act (`str`, *optional*, defaults to `"relu"`):
            The non-linear activation function in each block. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"`
            are supported.
        downsample_in_first_stage (`bool`, *optional*, defaults to `False`):
            If `True`, the first stage will downsample the inputs using a `stride` of 2.
        downsample_in_bottleneck (`bool`, *optional*, defaults to `False`):
            If `True`, the first conv 1x1 in ResNetBottleNeckLayer will downsample the inputs using a `stride` of 2.
        out_features (`List[str]`, *optional*):
            If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc.
            (depending on how many stages the model has). If unset and `out_indices` is set, will default to the
            corresponding stages. If unset and `out_indices` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.
        out_indices (`List[int]`, *optional*):
            If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how
            many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
            If unset and `out_features` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.

    Example:
        ```python
        >>> from mindnlp.transformers import ResNetConfig, ResNetModel
        ...
        >>> # Initializing a ResNet resnet-50 style configuration
        >>> configuration = ResNetConfig()
        ...
        >>> # Initializing a model (with random weights) from the resnet-50 style configuration
        >>> model = ResNetModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "resnet"
    layer_types = ["basic", "bottleneck"]

    def __init__(
        self,
        num_channels=3,
        embedding_size=64,
        hidden_sizes=[256, 512, 1024, 2048],
        depths=[3, 4, 6, 3],
        layer_type="bottleneck",
        hidden_act="relu",
        downsample_in_first_stage=False,
        downsample_in_bottleneck=False,
        out_features=None,
        out_indices=None,
        **kwargs,
    ):
        """
        Initializes a ResNetConfig object with the specified configuration parameters.

        Args:
            self (ResNetConfig): The instance of the ResNetConfig class.
            num_channels (int): Number of input channels for the network. Default is 3.
            embedding_size (int): Size of the embedding for the network. Default is 64.
            hidden_sizes (list): List of integers representing hidden layer sizes in each stage. Default is [256, 512, 1024, 2048].
            depths (list): List of integers representing the depth of each stage. Default is [3, 4, 6, 3].
            layer_type (str): Type of layers to be used in the network. Must be one of ['basic', 'bottleneck']. Default is 'bottleneck'.
            hidden_act (str): Activation function to be used in hidden layers. Default is 'relu'.
            downsample_in_first_stage (bool): Whether to downsample in the first stage. Default is False.
            downsample_in_bottleneck (bool): Whether the first 1x1 convolution of each bottleneck layer downsamples with a stride of 2. Default is False.
            out_features (None or List[str]): Names of the stages whose feature maps are output when the model is used as a backbone. Default is None.
            out_indices (None or List[int]): Indices of the stages whose feature maps are output when the model is used as a backbone. Default is None.

        Returns:
            None.

        Raises:
            ValueError: If the provided layer_type is not one of the supported layer types.
        """
        super().__init__(**kwargs)
        if layer_type not in self.layer_types:
            raise ValueError(f"layer_type={layer_type} is not one of {','.join(self.layer_types)}")
        self.num_channels = num_channels
        self.embedding_size = embedding_size
        self.hidden_sizes = hidden_sizes
        self.depths = depths
        self.layer_type = layer_type
        self.hidden_act = hidden_act
        self.downsample_in_first_stage = downsample_in_first_stage
        self.downsample_in_bottleneck = downsample_in_bottleneck
        self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]
        self._out_features, self._out_indices = get_aligned_output_features_output_indices(
            out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
        )

mindnlp.transformers.models.resnet.configuration_resnet.ResNetConfig.__init__(num_channels=3, embedding_size=64, hidden_sizes=[256, 512, 1024, 2048], depths=[3, 4, 6, 3], layer_type='bottleneck', hidden_act='relu', downsample_in_first_stage=False, downsample_in_bottleneck=False, out_features=None, out_indices=None, **kwargs)

Initializes a ResNetConfig object with the specified configuration parameters.

PARAMETER DESCRIPTION
self

The instance of the ResNetConfig class.

TYPE: ResNetConfig

num_channels

Number of input channels for the network. Default is 3.

TYPE: int DEFAULT: 3

embedding_size

Size of the embedding for the network. Default is 64.

TYPE: int DEFAULT: 64

hidden_sizes

List of integers representing hidden layer sizes in each stage. Default is [256, 512, 1024, 2048].

TYPE: list DEFAULT: [256, 512, 1024, 2048]

depths

List of integers representing the depth of each stage. Default is [3, 4, 6, 3].

TYPE: list DEFAULT: [3, 4, 6, 3]

layer_type

Type of layers to be used in the network. Must be one of ['basic', 'bottleneck']. Default is 'bottleneck'.

TYPE: str DEFAULT: 'bottleneck'

hidden_act

Activation function to be used in hidden layers. Default is 'relu'.

TYPE: str DEFAULT: 'relu'

downsample_in_first_stage

Whether to downsample in the first stage. Default is False.

TYPE: bool DEFAULT: False

downsample_in_bottleneck

Whether the first 1x1 convolution of each bottleneck layer downsamples with a stride of 2. Default is False.

TYPE: bool DEFAULT: False

out_features

Names of the stages whose feature maps are output when the model is used as a backbone. Default is None.

TYPE: None or List[str] DEFAULT: None

out_indices

Indices of the stages whose feature maps are output when the model is used as a backbone. Default is None.

TYPE: None or List[int] DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the provided layer_type is not one of the supported layer types.
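
The validation that raises this error can be reproduced in isolation. The sketch below is a stand-alone illustration: `LAYER_TYPES` copies the class attribute `layer_types`, and `check_layer_type` is a hypothetical function that wraps the same membership test and error message used in `__init__`.

```python
# Copy of ResNetConfig.layer_types, kept local so the sketch runs without mindnlp.
LAYER_TYPES = ["basic", "bottleneck"]


def check_layer_type(layer_type: str) -> str:
    # Hypothetical wrapper around the check performed in ResNetConfig.__init__.
    if layer_type not in LAYER_TYPES:
        raise ValueError(f"layer_type={layer_type} is not one of {','.join(LAYER_TYPES)}")
    return layer_type


print(check_layer_type("basic"))

try:
    check_layer_type("wide")
except ValueError as err:
    print(err)
```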

Source code in mindnlp/transformers/models/resnet/configuration_resnet.py
def __init__(
    self,
    num_channels=3,
    embedding_size=64,
    hidden_sizes=[256, 512, 1024, 2048],
    depths=[3, 4, 6, 3],
    layer_type="bottleneck",
    hidden_act="relu",
    downsample_in_first_stage=False,
    downsample_in_bottleneck=False,
    out_features=None,
    out_indices=None,
    **kwargs,
):
    """
    Initializes a ResNetConfig object with the specified configuration parameters.

    Args:
        self (ResNetConfig): The instance of the ResNetConfig class.
        num_channels (int): Number of input channels for the network. Default is 3.
        embedding_size (int): Size of the embedding for the network. Default is 64.
        hidden_sizes (list): List of integers representing hidden layer sizes in each stage. Default is [256, 512, 1024, 2048].
        depths (list): List of integers representing the depth of each stage. Default is [3, 4, 6, 3].
        layer_type (str): Type of layers to be used in the network. Must be one of ['basic', 'bottleneck']. Default is 'bottleneck'.
        hidden_act (str): Activation function to be used in hidden layers. Default is 'relu'.
        downsample_in_first_stage (bool): Whether to downsample in the first stage. Default is False.
        downsample_in_bottleneck (bool): Whether the first 1x1 convolution of each bottleneck layer downsamples with a stride of 2. Default is False.
        out_features (None or List[str]): Names of the stages whose feature maps are output when the model is used as a backbone. Default is None.
        out_indices (None or List[int]): Indices of the stages whose feature maps are output when the model is used as a backbone. Default is None.

    Returns:
        None.

    Raises:
        ValueError: If the provided layer_type is not one of the supported layer types.
    """
    super().__init__(**kwargs)
    if layer_type not in self.layer_types:
        raise ValueError(f"layer_type={layer_type} is not one of {','.join(self.layer_types)}")
    self.num_channels = num_channels
    self.embedding_size = embedding_size
    self.hidden_sizes = hidden_sizes
    self.depths = depths
    self.layer_type = layer_type
    self.hidden_act = hidden_act
    self.downsample_in_first_stage = downsample_in_first_stage
    self.downsample_in_bottleneck = downsample_in_bottleneck
    self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(depths) + 1)]
    self._out_features, self._out_indices = get_aligned_output_features_output_indices(
        out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
    )

mindnlp.transformers.models.resnet.modeling_resnet

MindSpore ResNet model.

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBackbone

Bases: ResNetPreTrainedModel, BackboneMixin

ResNetBackbone

This class represents a ResNet backbone for image processing tasks. It inherits from the ResNetPreTrainedModel and BackboneMixin classes.

ATTRIBUTE DESCRIPTION
num_features

A list of integers giving the number of channels of the stem output (embedding_size) followed by each stage's output (hidden_sizes).

TYPE: List[int]

embedder

An instance of the ResNetEmbeddings class used for embedding pixel values.

TYPE: ResNetEmbeddings

encoder

An instance of the ResNetEncoder class used for encoding the embedded features.

TYPE: ResNetEncoder

stage_names

A list of strings representing the names of the stages in the backbone.

TYPE: List[str]

out_features

A list of strings representing the names of the output features.

TYPE: List[str]

config

An object containing the configuration parameters for the ResNetBackbone.

TYPE: object

METHOD DESCRIPTION
__init__

Initializes the ResNetBackbone instance with the given configuration.

forward

Runs the backbone and returns the feature maps of the stages listed in out_features.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetBackbone(ResNetPreTrainedModel, BackboneMixin):

    """
    ResNetBackbone

    This class represents a ResNet backbone for image processing tasks. It inherits from the ResNetPreTrainedModel
    and BackboneMixin classes.

    Attributes:
        num_features (List[int]): A list of integers giving the number of channels of the stem output
            (embedding_size) followed by each stage's output (hidden_sizes).
        embedder (ResNetEmbeddings): An instance of the ResNetEmbeddings class used for embedding pixel values.
        encoder (ResNetEncoder): An instance of the ResNetEncoder class used for encoding the embedded features.
        stage_names (List[str]): A list of strings representing the names of the stages in the backbone.
        out_features (List[str]): A list of strings representing the names of the output features.
        config (object): An object containing the configuration parameters for the ResNetBackbone.

    Methods:
        __init__: Initializes the ResNetBackbone instance with the given configuration.
        forward: Runs the backbone and returns the feature maps of the stages listed in out_features.
    """
    def __init__(self, config):
        super().__init__(config)
        super()._init_backbone(config)

        self.num_features = [config.embedding_size] + config.hidden_sizes
        self.embedder = ResNetEmbeddings(config)
        self.encoder = ResNetEncoder(config)

        # initialize weights and apply final processing
        self.post_init()

    def forward(
        self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
    ) -> BackboneOutput:
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )

        embedding_output = self.embedder(pixel_values)

        outputs = self.encoder(embedding_output, output_hidden_states=True, return_dict=True)

        hidden_states = outputs.hidden_states

        feature_maps = ()
        for idx, stage in enumerate(self.stage_names):
            if stage in self.out_features:
                feature_maps += (hidden_states[idx],)

        if not return_dict:
            output = (feature_maps,)
            if output_hidden_states:
                output += (outputs.hidden_states,)
            return output

        return BackboneOutput(
            feature_maps=feature_maps,
            hidden_states=outputs.hidden_states if output_hidden_states else None,
            attentions=None,
        )
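
The selection loop in `forward` can be exercised without tensors. The following is a minimal sketch in which strings stand in for the hidden-state tensors; `select_feature_maps` is a hypothetical helper that copies the loop, and it assumes the encoder returns one hidden state per entry of `stage_names` (the stem output followed by each stage's output).

```python
def select_feature_maps(stage_names, out_features, hidden_states):
    # Same loop as in ResNetBackbone.forward: keep the hidden states of the
    # stages named in out_features, in stage order.
    feature_maps = ()
    for idx, stage in enumerate(stage_names):
        if stage in out_features:
            feature_maps += (hidden_states[idx],)
    return feature_maps


stage_names = ["stem", "stage1", "stage2", "stage3", "stage4"]
out_features = ["stage2", "stage4"]

# Placeholder values: one fake hidden state per stage name.
hidden_states = tuple(f"hidden_{name}" for name in stage_names)

# num_features as built in __init__, using the default configuration values.
num_features = [64] + [256, 512, 1024, 2048]

selected = select_feature_maps(stage_names, out_features, hidden_states)
channels = [num_features[stage_names.index(name)] for name in out_features]

print(selected)
print(channels)
```

Because `num_features` is indexed in the same order as `stage_names`, it gives the channel count of each returned feature map.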

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBasicLayer

Bases: Module

A classic ResNet residual layer composed of two 3x3 convolutions.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetBasicLayer(nn.Module):
    """
    A classic ResNet residual layer composed of two `3x3` convolutions.
    """
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1, activation: str = "relu"):
        """
        Initializes a ResNetBasicLayer object with the specified parameters.

        Args:
            self: The object itself.
            in_channels (int): The number of input channels to the layer.
            out_channels (int): The number of output channels from the layer.
            stride (int, optional): The stride value for the layer. Defaults to 1.
            activation (str, optional): The type of activation function to apply. Defaults to 'relu'.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        should_apply_shortcut = in_channels != out_channels or stride != 1
        self.shortcut = (
            ResNetShortCut(in_channels, out_channels, stride=stride) if should_apply_shortcut else nn.Identity()
        )
        self.layer = nn.Sequential(
            ResNetConvLayer(in_channels, out_channels, stride=stride),
            ResNetConvLayer(out_channels, out_channels, activation=None),
        )
        self.activation = ACT2FN[activation]

    def forward(self, hidden_state):
        """
        Runs the forward pass of the ResNet basic layer: two convolutions followed by a residual addition and an activation.

        Args:
            self (ResNetBasicLayer): An instance of the ResNetBasicLayer class.
            hidden_state: The input feature map tensor, expected to have the shape (batch_size, in_channels, height, width).

        Returns:
            The output feature map tensor after the residual addition and activation.

        Raises:
            None
        """
        residual = hidden_state
        hidden_state = self.layer(hidden_state)
        residual = self.shortcut(residual)
        hidden_state += residual
        hidden_state = self.activation(hidden_state)
        return hidden_state
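
The residual pattern in `forward`, and the rule that chooses between `ResNetShortCut` and `nn.Identity`, can be illustrated with plain numbers. This is a toy sketch, not the layer itself: the helper names are hypothetical, floats stand in for tensors, and `toy_layer` is an arbitrary function standing in for the two stacked convolutions.

```python
def needs_projection_shortcut(in_channels: int, out_channels: int, stride: int = 1) -> bool:
    # Mirrors `should_apply_shortcut` in ResNetBasicLayer.__init__: the shortcut must
    # reshape the residual whenever the channel count or spatial size changes.
    return in_channels != out_channels or stride != 1


def residual_forward(x, layer, shortcut, activation):
    # Toy version of forward(): activation(layer(x) + shortcut(x)).
    residual = shortcut(x)
    return activation(layer(x) + residual)


def relu(value: float) -> float:
    return max(0.0, value)


def toy_layer(value: float) -> float:
    # Arbitrary stand-in for the two stacked convolutions (illustrative only).
    return 2.0 * value - 5.0


def identity(value: float) -> float:
    return value


print(needs_projection_shortcut(64, 64))            # False: identity shortcut suffices
print(needs_projection_shortcut(64, 128))           # True: channel count changes
print(needs_projection_shortcut(64, 64, stride=2))  # True: spatial size changes
print(residual_forward(4.0, toy_layer, identity, relu))  # 7.0
print(residual_forward(1.0, toy_layer, identity, relu))  # 0.0
```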

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBasicLayer.__init__(in_channels, out_channels, stride=1, activation='relu')

Initializes a ResNetBasicLayer object with the specified parameters.

PARAMETER DESCRIPTION
self

The object itself.

in_channels

The number of input channels to the layer.

TYPE: int

out_channels

The number of output channels from the layer.

TYPE: int

stride

The stride value for the layer. Defaults to 1.

TYPE: int DEFAULT: 1

activation

The type of activation function to apply. Defaults to 'relu'.

TYPE: str DEFAULT: 'relu'

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(self, in_channels: int, out_channels: int, stride: int = 1, activation: str = "relu"):
    """
    Initializes a ResNetBasicLayer object with the specified parameters.

    Args:
        self: The object itself.
        in_channels (int): The number of input channels to the layer.
        out_channels (int): The number of output channels from the layer.
        stride (int, optional): The stride value for the layer. Defaults to 1.
        activation (str, optional): The type of activation function to apply. Defaults to 'relu'.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    should_apply_shortcut = in_channels != out_channels or stride != 1
    self.shortcut = (
        ResNetShortCut(in_channels, out_channels, stride=stride) if should_apply_shortcut else nn.Identity()
    )
    self.layer = nn.Sequential(
        ResNetConvLayer(in_channels, out_channels, stride=stride),
        ResNetConvLayer(out_channels, out_channels, activation=None),
    )
    self.activation = ACT2FN[activation]

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBasicLayer.forward(hidden_state)

Runs the forward pass of the ResNet basic layer: two convolutions followed by a residual addition and an activation.

PARAMETER DESCRIPTION
self

An instance of the ResNetBasicLayer class.

TYPE: ResNetBasicLayer

hidden_state

The input feature map tensor, expected to have the shape (batch_size, in_channels, height, width).

RETURNS DESCRIPTION

The output feature map tensor after the residual addition and activation.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, hidden_state):
    """
    Runs the forward pass of the ResNet basic layer: two convolutions followed by a residual addition and an activation.

    Args:
        self (ResNetBasicLayer): An instance of the ResNetBasicLayer class.
        hidden_state: The input feature map tensor, expected to have the shape (batch_size, in_channels, height, width).

    Returns:
        The output feature map tensor after the residual addition and activation.

    Raises:
        None
    """
    residual = hidden_state
    hidden_state = self.layer(hidden_state)
    residual = self.shortcut(residual)
    hidden_state += residual
    hidden_state = self.activation(hidden_state)
    return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBottleNeckLayer

Bases: Module

A classic ResNet bottleneck layer composed of a 1x1, a 3x3, and a 1x1 convolution.

The first 1x1 convolution reduces the input by a factor of reduction in order to make the second 3x3 convolution faster. The last 1x1 convolution remaps the reduced features to out_channels. If downsample_in_bottleneck is true, downsample will be in the first layer instead of the second layer.
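
The channel and stride layout this describes can be worked out with a small stand-alone function. The sketch below only does the arithmetic from `ResNetBottleNeckLayer.__init__`; `bottleneck_plan` and its dictionary keys are hypothetical names chosen for illustration, and the middle kernel size of 3 is the `ResNetConvLayer` default.

```python
def bottleneck_plan(in_channels, out_channels, stride=1, reduction=4, downsample_in_bottleneck=False):
    # Width of the two inner convolutions, as computed in __init__.
    reduced = out_channels // reduction
    # The stride is applied in exactly one place: the first 1x1 convolution when
    # downsample_in_bottleneck is set, otherwise the 3x3 convolution.
    first_stride = stride if downsample_in_bottleneck else 1
    second_stride = stride if not downsample_in_bottleneck else 1
    return [
        {"kernel_size": 1, "in": in_channels, "out": reduced, "stride": first_stride},
        {"kernel_size": 3, "in": reduced, "out": reduced, "stride": second_stride},
        {"kernel_size": 1, "in": reduced, "out": out_channels, "stride": 1},
    ]


for conv in bottleneck_plan(256, 512, stride=2):
    print(conv)

for conv in bottleneck_plan(256, 512, stride=2, downsample_in_bottleneck=True):
    print(conv)
```

For `in_channels=256`, `out_channels=512` and `reduction=4`, the inner width is 128 channels; only the position of the stride-2 convolution differs between the two calls.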

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetBottleNeckLayer(nn.Module):
    """
    A classic ResNet bottleneck layer composed of a `1x1`, a `3x3`, and a `1x1` convolution.

    The first `1x1` convolution reduces the input by a factor of `reduction` in order to make the second `3x3`
    convolution faster. The last `1x1` convolution remaps the reduced features to `out_channels`. If
    `downsample_in_bottleneck` is true, downsample will be in the first layer instead of the second layer.
    """
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        stride: int = 1,
        activation: str = "relu",
        reduction: int = 4,
        downsample_in_bottleneck: bool = False,
    ):
        """
        Initializes a ResNetBottleNeckLayer object.

        Args:
            self: The instance of the ResNetBottleNeckLayer class.
            in_channels (int): The number of input channels.
            out_channels (int): The number of output channels.
            stride (int, optional): The stride value for the convolutional layers. Defaults to 1.
            activation (str, optional): The activation function to be applied. Defaults to 'relu'.
            reduction (int, optional): The factor by which out_channels is divided to get the width of the inner convolutions. Defaults to 4.
            downsample_in_bottleneck (bool, optional): Whether to apply the stride in the first 1x1 convolution instead of the 3x3 convolution. Defaults to False.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        should_apply_shortcut = in_channels != out_channels or stride != 1
        reduces_channels = out_channels // reduction
        self.shortcut = (
            ResNetShortCut(in_channels, out_channels, stride=stride) if should_apply_shortcut else nn.Identity()
        )
        self.layer = nn.Sequential(
            ResNetConvLayer(
                in_channels, reduces_channels, kernel_size=1, stride=stride if downsample_in_bottleneck else 1
            ),
            ResNetConvLayer(reduces_channels, reduces_channels, stride=stride if not downsample_in_bottleneck else 1),
            ResNetConvLayer(reduces_channels, out_channels, kernel_size=1, activation=None),
        )
        self.activation = ACT2FN[activation]

    def forward(self, hidden_state):
        """
        Runs the forward pass of the ResNet bottleneck layer.

        Args:
            self (ResNetBottleNeckLayer): An instance of the ResNetBottleNeckLayer class.
            hidden_state (Tensor): The input hidden state tensor.

        Returns:
            Tensor: The output hidden state after the residual addition and activation.

        Raises:
            None.
        """
        residual = hidden_state
        hidden_state = self.layer(hidden_state)
        residual = self.shortcut(residual)
        hidden_state += residual
        hidden_state = self.activation(hidden_state)
        return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBottleNeckLayer.__init__(in_channels, out_channels, stride=1, activation='relu', reduction=4, downsample_in_bottleneck=False)

Initializes a ResNetBottleNeckLayer object.

PARAMETER DESCRIPTION
self

The instance of the ResNetBottleNeckLayer class.

in_channels

The number of input channels.

TYPE: int

out_channels

The number of output channels.

TYPE: int

stride

The stride value for the convolutional layers. Defaults to 1.

TYPE: int DEFAULT: 1

activation

The activation function to be applied. Defaults to 'relu'.

TYPE: str DEFAULT: 'relu'

reduction

The factor by which out_channels is divided to get the width of the inner convolutions. Defaults to 4.

TYPE: int DEFAULT: 4

downsample_in_bottleneck

Whether to apply the stride in the first 1x1 convolution instead of the 3x3 convolution. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(
    self,
    in_channels: int,
    out_channels: int,
    stride: int = 1,
    activation: str = "relu",
    reduction: int = 4,
    downsample_in_bottleneck: bool = False,
):
    """
    Initializes a ResNetBottleNeckLayer object.

    Args:
        self: The instance of the ResNetBottleNeckLayer class.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        stride (int, optional): The stride value for the convolutional layers. Defaults to 1.
        activation (str, optional): The activation function to be applied. Defaults to 'relu'.
        reduction (int, optional): The factor by which out_channels is divided to get the width of the inner convolutions. Defaults to 4.
        downsample_in_bottleneck (bool, optional): Whether to apply the stride in the first 1x1 convolution instead of the 3x3 convolution. Defaults to False.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    should_apply_shortcut = in_channels != out_channels or stride != 1
    reduces_channels = out_channels // reduction
    self.shortcut = (
        ResNetShortCut(in_channels, out_channels, stride=stride) if should_apply_shortcut else nn.Identity()
    )
    self.layer = nn.Sequential(
        ResNetConvLayer(
            in_channels, reduces_channels, kernel_size=1, stride=stride if downsample_in_bottleneck else 1
        ),
        ResNetConvLayer(reduces_channels, reduces_channels, stride=stride if not downsample_in_bottleneck else 1),
        ResNetConvLayer(reduces_channels, out_channels, kernel_size=1, activation=None),
    )
    self.activation = ACT2FN[activation]

mindnlp.transformers.models.resnet.modeling_resnet.ResNetBottleNeckLayer.forward(hidden_state)

Runs the forward pass of the ResNet bottleneck layer.

PARAMETER DESCRIPTION
self

An instance of the ResNetBottleNeckLayer class.

TYPE: ResNetBottleNeckLayer

hidden_state

The input hidden state tensor.

TYPE: Tensor

RETURNS DESCRIPTION

The output hidden state tensor after the residual addition and activation.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, hidden_state):
    """
    Runs the forward pass of the ResNet bottleneck layer.

    Args:
        self (ResNetBottleNeckLayer): An instance of the ResNetBottleNeckLayer class.
        hidden_state (Tensor): The input hidden state tensor.

    Returns:
        Tensor: The output hidden state after the residual addition and activation.

    Raises:
        None.
    """
    residual = hidden_state
    hidden_state = self.layer(hidden_state)
    residual = self.shortcut(residual)
    hidden_state += residual
    hidden_state = self.activation(hidden_state)
    return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetConvLayer

Bases: Module

The ResNetConvLayer class represents a convolutional layer used in the ResNet neural network architecture.

This class inherits from the nn.Module class and is designed to process input data through a series of operations including convolution, normalization, and activation.

ATTRIBUTE DESCRIPTION
convolution

The convolutional layer used for feature extraction.

TYPE: Conv2d

normalization

The batch normalization layer used for normalizing the outputs of the convolutional layer.

TYPE: BatchNorm2d

activation

The activation function applied to the normalized outputs.

TYPE: Identity or callable

METHOD DESCRIPTION
__init__

Initializes the ResNetConvLayer with the specified parameters.

forward

Applies the convolutional layer, normalization, and activation to the input tensor and returns the processed tensor.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetConvLayer(nn.Module):

    """
    The ResNetConvLayer class represents a convolutional layer used in the ResNet neural network architecture. 

    This class inherits from the nn.Module class and is designed to process input data through a series of operations
    including convolution, normalization, and activation.

    Attributes:
        convolution (nn.Conv2d): The convolutional layer used for feature extraction.
        normalization (nn.BatchNorm2d): The batch normalization layer used for normalizing the outputs of
            the convolutional layer.
        activation (nn.Identity or callable): The activation function applied to the normalized outputs.

    Methods:
        __init__:
            Initializes the ResNetConvLayer with the specified parameters.

        forward:
            Applies the convolutional layer, normalization, and activation to the input tensor and returns the processed tensor.
    """
    def __init__(
        self, in_channels: int, out_channels: int, kernel_size: int = 3, stride: int = 1, activation: str = "relu"
    ):
        """
        Initializes a ResNetConvLayer object.

        Args:
            self (ResNetConvLayer): The instance of the ResNetConvLayer class.
            in_channels (int): The number of input channels.
            out_channels (int): The number of output channels.
            kernel_size (int, optional): The size of the convolutional kernel. Defaults to 3.
            stride (int, optional): The stride of the convolutional kernel. Defaults to 1.
            activation (str, optional): The activation function to be applied. Defaults to 'relu'.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.convolution = nn.Conv2d(
            in_channels, out_channels,
            kernel_size=kernel_size, stride=stride, padding=kernel_size // 2, bias=False
        )
        self.normalization = nn.BatchNorm2d(out_channels)
        self.activation = ACT2FN[activation] if activation is not None else nn.Identity()

    def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
        """
        Applies convolution, batch normalization, and the activation to the input tensor.

        Args:
            self (ResNetConvLayer): The instance of the ResNetConvLayer class.
            input (mindspore.Tensor): The input tensor to be processed by the convolution layer.

        Returns:
            mindspore.Tensor: The tensor produced by the convolution, normalization, and activation.

        Raises:
            None.
        """
        hidden_state = self.convolution(input)
        hidden_state = self.normalization(hidden_state)
        hidden_state = self.activation(hidden_state)
        return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetConvLayer.__init__(in_channels, out_channels, kernel_size=3, stride=1, activation='relu')

Initializes a ResNetConvLayer object.

PARAMETER DESCRIPTION
self

The instance of the ResNetConvLayer class.

TYPE: ResNetConvLayer

in_channels

The number of input channels.

TYPE: int

out_channels

The number of output channels.

TYPE: int

kernel_size

The size of the convolutional kernel. Defaults to 3.

TYPE: int DEFAULT: 3

stride

The stride of the convolutional kernel. Defaults to 1.

TYPE: int DEFAULT: 1

activation

The activation function to be applied. Defaults to 'relu'.

TYPE: str DEFAULT: 'relu'

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(
    self, in_channels: int, out_channels: int, kernel_size: int = 3, stride: int = 1, activation: str = "relu"
):
    """
    Initializes a ResNetConvLayer object.

    Args:
        self (ResNetConvLayer): The instance of the ResNetConvLayer class.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        kernel_size (int, optional): The size of the convolutional kernel. Defaults to 3.
        stride (int, optional): The stride of the convolutional kernel. Defaults to 1.
        activation (str, optional): The activation function to be applied. Defaults to 'relu'.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.convolution = nn.Conv2d(
        in_channels, out_channels,
        kernel_size=kernel_size, stride=stride, padding=kernel_size // 2, bias=False
    )
    self.normalization = nn.BatchNorm2d(out_channels)
    self.activation = ACT2FN[activation] if activation is not None else nn.Identity()
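
Because the convolution is padded with `kernel_size // 2`, an odd kernel with stride 1 keeps the spatial size unchanged, and stride 2 halves it. The sketch below works this out with the usual floor-mode output-size formula; it assumes dilation 1 and symmetric padding, and `conv_output_size` is a hypothetical helper, not a mindnlp function.

```python
def conv_output_size(size: int, kernel_size: int = 3, stride: int = 1) -> int:
    """Spatial size after a convolution padded with kernel_size // 2 on each side.

    Assumes dilation 1 and floor rounding, as in the common Conv2d formula.
    """
    padding = kernel_size // 2
    return (size + 2 * padding - kernel_size) // stride + 1


# Default 3x3 kernel, stride 1: the size is preserved.
print(conv_output_size(56))
# 3x3 kernel, stride 2: the size is halved.
print(conv_output_size(56, kernel_size=3, stride=2))
# 7x7 kernel, stride 2, as used by the stem.
print(conv_output_size(224, kernel_size=7, stride=2))
```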

mindnlp.transformers.models.resnet.modeling_resnet.ResNetConvLayer.forward(input)

Applies convolution, batch normalization, and the activation to the input tensor.

PARAMETER DESCRIPTION
self

The instance of the ResNetConvLayer class.

TYPE: ResNetConvLayer

input

The input tensor to be processed by the convolution layer.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The tensor produced by the convolution, normalization, and activation.

TYPE: Tensor

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
    """
    Applies convolution, batch normalization, and the activation to the input tensor.

    Args:
        self (ResNetConvLayer): The instance of the ResNetConvLayer class.
        input (mindspore.Tensor): The input tensor to be processed by the convolution layer.

    Returns:
        mindspore.Tensor: The tensor produced by the convolution, normalization, and activation.

    Raises:
        None.
    """
    hidden_state = self.convolution(input)
    hidden_state = self.normalization(hidden_state)
    hidden_state = self.activation(hidden_state)
    return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEmbeddings

Bases: Module

ResNet embeddings (stem), composed of a single aggressive convolution (7x7 kernel, stride 2) followed by max pooling (3x3 kernel, stride 2).

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetEmbeddings(nn.Module):
    """
    ResNet Embeddings (stem) composed of a single aggressive convolution.
    """
    def __init__(self, config: ResNetConfig):
        """
        Initializes an instance of the ResNetEmbeddings class.

        Args:
            self: The instance of the ResNetEmbeddings class.
            config (ResNetConfig):
                The configuration object that contains parameters for the ResNet embeddings.

                - num_channels (int): The number of input channels.
                - embedding_size (int): The size of the output embeddings.
                - hidden_act (str): The activation function for the hidden layers.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.embedder = ResNetConvLayer(
            config.num_channels, config.embedding_size, kernel_size=7, stride=2, activation=config.hidden_act
        )
        self.pooler = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.num_channels = config.num_channels

    def forward(self, pixel_values: mindspore.Tensor) -> mindspore.Tensor:
        """
        Computes the embeddings for a batch of pixel values.

        Args:
            self (ResNetEmbeddings): An instance of the ResNetEmbeddings class.
            pixel_values (mindspore.Tensor): A tensor containing the pixel values of an image.

        Returns:
            mindspore.Tensor: The embeddings generated from the pixel values.

        Raises:
            ValueError: If the number of channels in the pixel_values tensor does not match the number of channels
                set in the configuration.

        This method takes in the pixel values of an image and generates embeddings using the ResNet model.
        It first checks if the number of channels in the pixel_values tensor matches the number of channels
        set in the configuration. If they do not match, a ValueError is raised. Otherwise, the pixel_values tensor
        is passed through the embedder and then the pooler to generate the embeddings.
        The resulting embeddings are returned as a mindspore.Tensor object.
        """
        num_channels = pixel_values.shape[1]
        if num_channels != self.num_channels:
            raise ValueError(
                "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
            )
        embedding = self.embedder(pixel_values)
        embedding = self.pooler(embedding)
        return embedding

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEmbeddings.__init__(config)

Initializes an instance of the ResNetEmbeddings class.

PARAMETER DESCRIPTION
self

The instance of the ResNetEmbeddings class.

config

The configuration object that contains parameters for the ResNet embeddings.

  • num_channels (int): The number of input channels.
  • embedding_size (int): The size of the output embeddings.
  • hidden_act (str): The activation function for the hidden layers.

TYPE: ResNetConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(self, config: ResNetConfig):
    """
    Initializes an instance of the ResNetEmbeddings class.

    Args:
        self: The instance of the ResNetEmbeddings class.
        config (ResNetConfig):
            The configuration object that contains parameters for the ResNet embeddings.

            - num_channels (int): The number of input channels.
            - embedding_size (int): The size of the output embeddings.
            - hidden_act (str): The activation function for the hidden layers.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.embedder = ResNetConvLayer(
        config.num_channels, config.embedding_size, kernel_size=7, stride=2, activation=config.hidden_act
    )
    self.pooler = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
    self.num_channels = config.num_channels

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEmbeddings.forward(pixel_values)

Computes the embeddings for a batch of pixel values.

PARAMETER DESCRIPTION
self

An instance of the ResNetEmbeddings class.

TYPE: ResNetEmbeddings

pixel_values

A tensor containing the pixel values of an image.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The embeddings generated from the pixel values.

TYPE: Tensor

RAISES DESCRIPTION
ValueError

If the number of channels in the pixel_values tensor does not match the number of channels set in the configuration.

This method takes the pixel values of a batch of images and computes the stem embeddings. It first checks that the channel dimension of pixel_values (axis 1) matches the number of channels set in the configuration and raises a ValueError if it does not. Otherwise, pixel_values is passed through the embedder and then the pooler, and the result is returned as a mindspore.Tensor.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, pixel_values: mindspore.Tensor) -> mindspore.Tensor:
    """
    Computes the embeddings for a batch of pixel values.

    Args:
        self (ResNetEmbeddings): An instance of the ResNetEmbeddings class.
        pixel_values (mindspore.Tensor): A tensor containing the pixel values of an image.

    Returns:
        mindspore.Tensor: The embeddings generated from the pixel values.

    Raises:
        ValueError: If the number of channels in the pixel_values tensor does not match the number of channels
            set in the configuration.

    This method takes in the pixel values of an image and generates embeddings using the ResNet model.
    It first checks if the number of channels in the pixel_values tensor matches the number of channels
    set in the configuration. If they do not match, a ValueError is raised. Otherwise, the pixel_values tensor
    is passed through the embedder and then the pooler to generate the embeddings.
    The resulting embeddings are returned as a mindspore.Tensor object.
    """
    num_channels = pixel_values.shape[1]
    if num_channels != self.num_channels:
        raise ValueError(
            "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
        )
    embedding = self.embedder(pixel_values)
    embedding = self.pooler(embedding)
    return embedding
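
To make the effect of the stem concrete, the sketch below computes the output shape of the embedder and pooler for a given input shape, including the channel check. It assumes the usual floor-mode output-size formula with dilation 1 for both the convolution and the max pooling; `pooled_size` and `stem_output_shape` are hypothetical helpers for this illustration and do not exist in mindnlp.

```python
from typing import Tuple


def pooled_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Floor-mode output size shared by convolution and max pooling (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def stem_output_shape(
    input_shape: Tuple[int, int, int, int],
    num_channels: int = 3,
    embedding_size: int = 64,
) -> Tuple[int, int, int, int]:
    batch, channels, height, width = input_shape
    # Same check as ResNetEmbeddings.forward: axis 1 must match the configuration.
    if channels != num_channels:
        raise ValueError("channel dimension does not match the configuration")
    # embedder: 7x7 convolution, stride 2, padding 7 // 2 = 3
    height, width = (pooled_size(s, 7, 2, 3) for s in (height, width))
    # pooler: 3x3 max pooling, stride 2, padding 1
    height, width = (pooled_size(s, 3, 2, 1) for s in (height, width))
    return (batch, embedding_size, height, width)


print(stem_output_shape((1, 3, 224, 224)))
```

Under these assumptions the stem reduces each spatial dimension by a factor of 4 (224 to 112 in the convolution, 112 to 56 in the pooling) before the first encoder stage runs.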

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEncoder

Bases: Module

ResNetEncoder represents the encoder of a Residual Neural Network (ResNet). It is a subclass of nn.Module and passes its input through a stack of ResNet stages.

ATTRIBUTE DESCRIPTION
stages

A list of ResNetStage instances representing the different stages of the ResNet encoder.

TYPE: ModuleList

METHOD DESCRIPTION
__init__

Initializes a ResNetEncoder instance.

Args:

  • config (ResNetConfig): An instance of ResNetConfig class containing the configuration parameters for the ResNet encoder.
forward

Runs the forward pass of the ResNet encoder.

Args:

  • hidden_state (mindspore.Tensor): The input hidden state tensor.
  • output_hidden_states (bool, optional): A flag indicating whether to output hidden states at each stage. Defaults to False.
  • return_dict (bool, optional): A flag indicating whether to return the output as a BaseModelOutputWithNoAttention instance. Defaults to True.

Returns:

  • BaseModelOutputWithNoAttention: An instance of BaseModelOutputWithNoAttention containing the encoder output.
Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetEncoder(nn.Module):

    """
    ResNetEncoder represents the encoder of a Residual Neural Network (ResNet).
    It is a subclass of nn.Module and passes its input through a stack of ResNet stages.

    Attributes:
        stages (nn.ModuleList): A list of ResNetStage instances representing the different stages of the ResNet encoder.

    Methods:
        __init__:
            Initializes a ResNetEncoder instance.

            Args:

            - config (ResNetConfig): An instance of ResNetConfig class containing the configuration parameters
            for the ResNet encoder.

        forward:
            Runs the forward pass of the ResNet encoder.

            Args:

            - hidden_state (mindspore.Tensor): The input hidden state tensor.
            - output_hidden_states (bool, optional): A flag indicating whether to output hidden states at each stage.
            Defaults to False.
            - return_dict (bool, optional): A flag indicating whether to return the output as a
            BaseModelOutputWithNoAttention instance. Defaults to True.

            Returns:

            - BaseModelOutputWithNoAttention: An instance of BaseModelOutputWithNoAttention containing the encoder output.

    """
    def __init__(self, config: ResNetConfig):
        """
        Initializes an instance of the ResNetEncoder class.

        Args:
            self: The current instance of the class.
            config (ResNetConfig): The configuration object specifying the parameters for the ResNetEncoder.
                It is expected to have the following attributes:

                - embedding_size (int): The size of the input embeddings.
                - hidden_sizes (List[int]): A list of integers specifying the number of output channels for each
                ResNet stage.
                - depths (List[int]): A list of integers specifying the number of residual blocks in each ResNet stage.
                - downsample_in_first_stage (bool): A boolean indicating whether to perform downsampling in the first
                 ResNet stage or not. If True, the stride is set to 2; otherwise, it is set to 1.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.stages = nn.ModuleList([])
        # based on `downsample_in_first_stage` the first layer of the first stage may or may not downsample the input
        self.stages.append(
            ResNetStage(
                config,
                config.embedding_size,
                config.hidden_sizes[0],
                stride=2 if config.downsample_in_first_stage else 1,
                depth=config.depths[0],
            )
        )
        in_out_channels = zip(config.hidden_sizes, config.hidden_sizes[1:])
        for (in_channels, out_channels), depth in zip(in_out_channels, config.depths[1:]):
            self.stages.append(ResNetStage(config, in_channels, out_channels, depth=depth))

    def forward(
        self, hidden_state: mindspore.Tensor, output_hidden_states: bool = False, return_dict: bool = True
    ) -> BaseModelOutputWithNoAttention:
        """
        Runs the forward pass of the ResNetEncoder by passing the hidden state through each stage in turn.

        Args:
            self (ResNetEncoder): The instance of the ResNetEncoder class.
            hidden_state (mindspore.Tensor): The input hidden state to be processed through the encoder.
            output_hidden_states (bool, optional): Whether to output hidden states at each stage. Defaults to False.
            return_dict (bool, optional): Whether to return a BaseModelOutputWithNoAttention instead of a
                plain tuple. Defaults to True.

        Returns:
            BaseModelOutputWithNoAttention: An instance of BaseModelOutputWithNoAttention containing the
                last hidden state and optionally all hidden states if output_hidden_states is set to True.
                If return_dict is False, a tuple of the non-None values is returned instead.

        Raises:
            None explicitly; any error raised inside a stage module propagates to the caller.

        """
        hidden_states = () if output_hidden_states else None

        for stage_module in self.stages:
            if output_hidden_states:
                hidden_states = hidden_states + (hidden_state,)

            hidden_state = stage_module(hidden_state)

        if output_hidden_states:
            hidden_states = hidden_states + (hidden_state,)

        if not return_dict:
            return tuple(v for v in [hidden_state, hidden_states] if v is not None)

        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_state,
            hidden_states=hidden_states,
        )

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEncoder.__init__(config)

Initializes an instance of the ResNetEncoder class.

PARAMETER DESCRIPTION
self

The current instance of the class.

config

The configuration object specifying the parameters for the ResNetEncoder. It is expected to have the following attributes:

  • embedding_size (int): The size of the input embeddings.
  • hidden_sizes (List[int]): A list of integers specifying the number of output channels for each ResNet stage.
  • depths (List[int]): A list of integers specifying the number of residual blocks in each ResNet stage.
  • downsample_in_first_stage (bool): A boolean indicating whether to perform downsampling in the first ResNet stage or not. If True, the stride is set to 2; otherwise, it is set to 1.

TYPE: ResNetConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(self, config: ResNetConfig):
    """
    Initializes an instance of the ResNetEncoder class.

    Args:
        self: The current instance of the class.
        config (ResNetConfig): The configuration object specifying the parameters for the ResNetEncoder.
            It is expected to have the following attributes:

            - embedding_size (int): The size of the input embeddings.
            - hidden_sizes (List[int]): A list of integers specifying the number of output channels for each
            ResNet stage.
            - depths (List[int]): A list of integers specifying the number of residual blocks in each ResNet stage.
            - downsample_in_first_stage (bool): A boolean indicating whether to perform downsampling in the first
             ResNet stage or not. If True, the stride is set to 2; otherwise, it is set to 1.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.stages = nn.ModuleList([])
    # based on `downsample_in_first_stage` the first layer of the first stage may or may not downsample the input
    self.stages.append(
        ResNetStage(
            config,
            config.embedding_size,
            config.hidden_sizes[0],
            stride=2 if config.downsample_in_first_stage else 1,
            depth=config.depths[0],
        )
    )
    in_out_channels = zip(config.hidden_sizes, config.hidden_sizes[1:])
    for (in_channels, out_channels), depth in zip(in_out_channels, config.depths[1:]):
        self.stages.append(ResNetStage(config, in_channels, out_channels, depth=depth))
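
The channel wiring built by this constructor can be traced without instantiating any modules. The sketch below reproduces only the `zip` pairing and returns one `(in_channels, out_channels, depth)` tuple per stage; `stage_plan` is a hypothetical helper for illustration. Strides are left out on purpose: the first stage uses 2 or 1 depending on `downsample_in_first_stage`, while later stages rely on the default of `ResNetStage`, which is not shown in this section.

```python
from typing import List, Tuple


def stage_plan(
    embedding_size: int,
    hidden_sizes: List[int],
    depths: List[int],
) -> List[Tuple[int, int, int]]:
    """Return (in_channels, out_channels, depth) for every encoder stage."""
    # First stage: consumes the stem output.
    plan = [(embedding_size, hidden_sizes[0], depths[0])]
    # Remaining stages: each consumes the previous stage's output channels.
    in_out_channels = zip(hidden_sizes, hidden_sizes[1:])
    for (in_channels, out_channels), depth in zip(in_out_channels, depths[1:]):
        plan.append((in_channels, out_channels, depth))
    return plan


# Default values documented for ResNetConfig.
for stage in stage_plan(64, [256, 512, 1024, 2048], [3, 4, 6, 3]):
    print(stage)
```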

mindnlp.transformers.models.resnet.modeling_resnet.ResNetEncoder.forward(hidden_state, output_hidden_states=False, return_dict=True)

Runs the forward pass of the ResNetEncoder by passing the hidden state through each stage in turn.

PARAMETER DESCRIPTION
self

The instance of the ResNetEncoder class.

TYPE: ResNetEncoder

hidden_state

The input hidden state to be processed through the encoder.

TYPE: Tensor

output_hidden_states

Whether to output hidden states at each stage. Defaults to False.

TYPE: bool DEFAULT: False

return_dict

Whether to return a BaseModelOutputWithNoAttention instead of a plain tuple. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
BaseModelOutputWithNoAttention

An instance of BaseModelOutputWithNoAttention containing the last hidden state and optionally all hidden states if output_hidden_states is set to True.

TYPE: BaseModelOutputWithNoAttention

RAISES DESCRIPTION

None explicitly. The method performs no validation of its own; any error raised inside a stage module propagates to the caller.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(
    self, hidden_state: mindspore.Tensor, output_hidden_states: bool = False, return_dict: bool = True
) -> BaseModelOutputWithNoAttention:
    """
    Runs the forward pass of the ResNetEncoder by passing the hidden state through each stage in turn.

    Args:
        self (ResNetEncoder): The instance of the ResNetEncoder class.
        hidden_state (mindspore.Tensor): The input hidden state to be processed through the encoder.
        output_hidden_states (bool, optional): Whether to output hidden states at each stage. Defaults to False.
        return_dict (bool, optional): Whether to return a BaseModelOutputWithNoAttention instead of a
            plain tuple. Defaults to True.

    Returns:
        BaseModelOutputWithNoAttention: An instance of BaseModelOutputWithNoAttention containing the
            last hidden state and optionally all hidden states if output_hidden_states is set to True.
            If return_dict is False, a tuple of the non-None values is returned instead.

    Raises:
        None explicitly; any error raised inside a stage module propagates to the caller.

    """
    hidden_states = () if output_hidden_states else None

    for stage_module in self.stages:
        if output_hidden_states:
            hidden_states = hidden_states + (hidden_state,)

        hidden_state = stage_module(hidden_state)

    if output_hidden_states:
        hidden_states = hidden_states + (hidden_state,)

    if not return_dict:
        return tuple(v for v in [hidden_state, hidden_states] if v is not None)

    return BaseModelOutputWithNoAttention(
        last_hidden_state=hidden_state,
        hidden_states=hidden_states,
    )
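
One detail of this loop is easy to miss: with `output_hidden_states=True`, the collected tuple holds the state entering each stage plus the final output, so it has one more entry than there are stages. The following minimal sketch mirrors the bookkeeping with integers standing in for tensors; `run_stages` and the lambda stages are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Optional, Tuple


def run_stages(
    hidden_state: int,
    stages: List[Callable[[int], int]],
    output_hidden_states: bool = False,
) -> Tuple[int, Optional[Tuple[int, ...]]]:
    """Mirror the hidden-state bookkeeping of ResNetEncoder.forward."""
    hidden_states: Optional[Tuple[int, ...]] = () if output_hidden_states else None
    for stage in stages:
        if output_hidden_states:
            # Record the state *entering* the stage.
            hidden_states = hidden_states + (hidden_state,)
        hidden_state = stage(hidden_state)
    if output_hidden_states:
        # Record the output of the last stage.
        hidden_states = hidden_states + (hidden_state,)
    return hidden_state, hidden_states


# Four stand-in stages that each add 1 to an integer "tensor".
stages = [lambda x: x + 1] * 4
last, collected = run_stages(0, stages, output_hidden_states=True)
print(last, collected)
```

The first collected entry is therefore the encoder input (the stem output in the full model), and the last entry equals `last_hidden_state`.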

mindnlp.transformers.models.resnet.modeling_resnet.ResNetForImageClassification

Bases: ResNetPreTrainedModel

ResNetForImageClassification is a class that represents a ResNet model for image classification tasks. It inherits from the ResNetPreTrainedModel class and includes methods for initializing the model and performing image classification.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for the classification task.

TYPE: int

resnet

The ResNet model used for feature extraction.

TYPE: ResNetModel

classifier

The classifier module for final classification.

TYPE: Sequential

config

Configuration settings for the model.

TYPE: ResNetConfig

METHOD DESCRIPTION
__init__

Initializes the ResNetForImageClassification model with the given configuration.

forward

Runs the forward pass for image classification, taking pixel values, labels, and optional parameters as input and returning the classification output.

PARAMETER DESCRIPTION
pixel_values

Tensor containing the pixel values of the input images.

TYPE: Tensor

labels

Tensor containing the labels for computing classification/regression loss.

TYPE: Tensor

output_hidden_states

Flag to indicate whether to return hidden states in the output.

TYPE: bool

return_dict

Flag to indicate whether to return the output as a dictionary.

TYPE: bool

RETURNS DESCRIPTION
ImageClassifierOutputWithNoAttention

An ImageClassifierOutputWithNoAttention object containing the classification output with optional loss value and hidden states.

Notes
  • Labels should be indices in the range [0, config.num_labels - 1].
  • Classification loss is computed using Cross-Entropy if config.num_labels > 1.
  • The problem type is automatically determined based on the number of labels and label data type.
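
The automatic problem-type selection described in the notes can be summarized as a small decision function. This is a sketch of the branching only: `resolve_problem_type` is a hypothetical helper, and the `labels_are_integer` flag stands in for the check that the label dtype is `mindspore.int64` or `mindspore.int32`.

```python
from typing import Optional


def resolve_problem_type(
    num_labels: int,
    labels_are_integer: bool,
    configured: Optional[str] = None,
) -> str:
    """Pick the problem type the way the forward pass does when labels are given."""
    if configured is not None:
        # An explicit config.problem_type always wins.
        return configured
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


print(resolve_problem_type(1, labels_are_integer=False))
print(resolve_problem_type(1000, labels_are_integer=True))
print(resolve_problem_type(5, labels_are_integer=False))
```

In the source listing, these three outcomes select `ops.mse_loss`, `ops.cross_entropy`, and `ops.binary_cross_entropy_with_logits` respectively.
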
Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetForImageClassification(ResNetPreTrainedModel):

    """
    ResNetForImageClassification is a class that represents a ResNet model for image classification tasks.
    It inherits from the ResNetPreTrainedModel class and includes methods for initializing the model and
    performing image classification.

    Attributes:
        num_labels (int): The number of labels for the classification task.
        resnet (ResNetModel): The ResNet model used for feature extraction.
        classifier (nn.Sequential): The classifier module for final classification.
        config: Configuration settings for the model.

    Methods:
        __init__: Initializes the ResNetForImageClassification model with the given configuration.
        forward: Runs the forward pass for image classification, taking pixel values, labels, and optional
            parameters as input and returning the classification output.

    Parameters:
        pixel_values (mindspore.Tensor, optional): Tensor containing the pixel values of the input images.
        labels (mindspore.Tensor, optional): Tensor containing the labels for computing classification/regression loss.
        output_hidden_states (bool, optional): Flag to indicate whether to return hidden states in the output.
        return_dict (bool, optional): Flag to indicate whether to return the output as a dictionary.

    Returns:
        ImageClassifierOutputWithNoAttention:
            An ImageClassifierOutputWithNoAttention object containing the classification output with optional loss value
            and hidden states.

    Notes:
        - Labels should be indices in the range [0, config.num_labels - 1].
        - Classification loss is computed using Cross-Entropy if config.num_labels > 1.
        - The problem type is automatically determined based on the number of labels and label data type.
    """
    def __init__(self, config):
        """
        Initializes the ResNetForImageClassification class.

        Args:
            self: The instance of the class.
            config (ResNetConfig): The model configuration. The following attributes are read here:

                - num_labels (int): The number of output labels.
                - hidden_sizes (list): A list of integers representing hidden layer sizes.

        Returns:
            None.

        Raises:
            None explicitly; config is not validated in this method.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.resnet = ResNetModel(config)
        # classification head
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity(),
        )
        # initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> ImageClassifierOutputWithNoAttention:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.resnet(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

        pooled_output = outputs.pooler_output if return_dict else outputs[1]

        logits = self.classifier(pooled_output)

        loss = None

        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"
            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return (loss,) + output if loss is not None else output

        return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)

mindnlp.transformers.models.resnet.modeling_resnet.ResNetForImageClassification.__init__(config)

Initializes the ResNetForImageClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration parameters for the model. It should have the following attributes:

  • num_labels (int): The number of output labels.
  • hidden_sizes (list): A list of integers representing hidden layer sizes.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not provided or is not an object.

ValueError

If config.num_labels is not an integer or config.hidden_sizes is not a list.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(self, config):
    """
    Initializes the ResNetForImageClassification class.

    Args:
        self: The instance of the class.
        config (object): An object containing configuration parameters for the model.
            It should have the following attributes:

            - num_labels (int): The number of output labels.
            - hidden_sizes (list): A list of integers representing hidden layer sizes.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not provided or is not an object.
        ValueError: If config.num_labels is not an integer or config.hidden_sizes is not a list.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.resnet = ResNetModel(config)
    # classification head
    self.classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity(),
    )
    # initialize weights and apply final processing
    self.post_init()
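
The classification head above flattens the pooled features and then applies either a linear layer or an identity. A minimal shape-only sketch of that behaviour follows; `head_output_shape` is a hypothetical helper written for illustration and is not part of mindnlp. It assumes `nn.Flatten()` keeps the batch dimension and merges the remaining ones.

```python
from math import prod


def head_output_shape(pooled_shape, hidden_size, num_labels):
    """Return the shape produced by Flatten followed by the optional Linear layer."""
    batch, *rest = pooled_shape
    flat = prod(rest)  # Flatten keeps dim 0 and merges every other dimension
    if num_labels > 0:
        if flat != hidden_size:
            raise ValueError(f"Linear expects {hidden_size} features, got {flat}")
        return (batch, num_labels)  # Linear(hidden_size, num_labels)
    return (batch, flat)  # Identity leaves the flattened features untouched


print(head_output_shape((8, 2048, 1, 1), 2048, 1000))
print(head_output_shape((8, 2048, 1, 1), 2048, 0))
```

With `num_labels == 0` the head degenerates to a feature extractor that returns the flattened pooled features.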

mindnlp.transformers.models.resnet.modeling_resnet.ResNetForImageClassification.forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(
    self,
    pixel_values: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> ImageClassifierOutputWithNoAttention:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.resnet(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

    pooled_output = outputs.pooler_output if return_dict else outputs[1]

    logits = self.classifier(pooled_output)

    loss = None

    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"
        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[2:]
        return (loss,) + output if loss is not None else output

    return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)
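
When `config.problem_type` is unset, `forward` infers it from `num_labels` and the label dtype. The branch can be sketched in plain Python as below; `infer_problem_type` and `LOSS_FOR` are illustrative names, and the boolean `labels_are_integer` stands in for the `labels.dtype in (mindspore.int64, mindspore.int32)` check.

```python
def infer_problem_type(num_labels: int, labels_are_integer: bool) -> str:
    """Mirror the problem-type inference performed when config.problem_type is None."""
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


# Loss used by forward() for each problem type, as named in the source above.
LOSS_FOR = {
    "regression": "mse_loss",
    "single_label_classification": "cross_entropy",
    "multi_label_classification": "binary_cross_entropy_with_logits",
}

for num_labels, is_int in [(1, False), (1000, True), (5, False)]:
    problem = infer_problem_type(num_labels, is_int)
    print(num_labels, is_int, problem, LOSS_FOR[problem])
```

Note that the inferred value is written back to `self.config.problem_type`, so the first batch with labels fixes the loss for later calls.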

mindnlp.transformers.models.resnet.modeling_resnet.ResNetModel

Bases: ResNetPreTrainedModel

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetModel(ResNetPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.embedder = ResNetEmbeddings(config)
        self.encoder = ResNetEncoder(config)
        self.pooler = nn.AdaptiveAvgPool2d((1, 1))
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
    ) -> BaseModelOutputWithPoolingAndNoAttention:
        """
        Runs the forward pass of the ResNet model.

        Args:
            self: The object instance.
            pixel_values (mindspore.Tensor): The input pixel values of the images.
                It should be a tensor of shape [batch_size, num_channels, height, width].
            output_hidden_states (Optional[bool]): Whether to return hidden states of the encoder. Defaults to None.
                If not provided, it uses the value from the configuration.
            return_dict (Optional[bool]): Whether to return the output as a dictionary. Defaults to None.
                If not provided, it uses the value from the configuration.

        Returns:
            BaseModelOutputWithPoolingAndNoAttention:
                An instance of the BaseModelOutputWithPoolingAndNoAttention class containing the following outputs:

                - last_hidden_state (mindspore.Tensor): The feature map produced by the last stage of the encoder.
                It has a shape of [batch_size, hidden_sizes[-1], height, width], where height and width are
                the downsampled spatial dimensions.
                - pooler_output (mindspore.Tensor): The last feature map after global average pooling.
                It has a shape of [batch_size, hidden_sizes[-1], 1, 1].
                - hidden_states (Tuple[mindspore.Tensor]): A tuple of feature maps, returned only
                if `output_hidden_states` is True. Each one is a tensor of shape
                [batch_size, num_channels, height, width].

        Raises:
            None.

        """
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        embedding_output = self.embedder(pixel_values)

        encoder_outputs = self.encoder(
            embedding_output, output_hidden_states=output_hidden_states, return_dict=return_dict
        )

        last_hidden_state = encoder_outputs[0]

        pooled_output = self.pooler(last_hidden_state)

        if not return_dict:
            return (last_hidden_state, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndNoAttention(
            last_hidden_state=last_hidden_state,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
        )

mindnlp.transformers.models.resnet.modeling_resnet.ResNetModel.forward(pixel_values, output_hidden_states=None, return_dict=None)

Runs the forward pass of the ResNet model.

PARAMETER DESCRIPTION
self

The object instance.

pixel_values

The input pixel values of the images. It should be a tensor of shape [batch_size, num_channels, height, width].

TYPE: Tensor

output_hidden_states

Whether to return hidden states of the encoder. Defaults to None. If not provided, it uses the value from the configuration.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return the output as a dictionary. Defaults to None. If not provided, it uses the value from the configuration.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
BaseModelOutputWithPoolingAndNoAttention

An instance of the BaseModelOutputWithPoolingAndNoAttention class containing the following outputs:

  • last_hidden_state (mindspore.Tensor): The feature map produced by the last stage of the encoder. It has a shape of [batch_size, hidden_sizes[-1], height, width], where height and width are the downsampled spatial dimensions.
  • pooler_output (mindspore.Tensor): The last feature map after global average pooling. It has a shape of [batch_size, hidden_sizes[-1], 1, 1].
  • hidden_states (Tuple[mindspore.Tensor]): A tuple of feature maps, returned only if output_hidden_states is True. Each one is a tensor of shape [batch_size, num_channels, height, width].

TYPE: BaseModelOutputWithPoolingAndNoAttention

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(
    self, pixel_values: mindspore.Tensor, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None
) -> BaseModelOutputWithPoolingAndNoAttention:
    """
    Runs the forward pass of the ResNet model.

    Args:
        self: The object instance.
        pixel_values (mindspore.Tensor): The input pixel values of the images.
            It should be a tensor of shape [batch_size, num_channels, height, width].
        output_hidden_states (Optional[bool]): Whether to return hidden states of the encoder. Defaults to None.
            If not provided, it uses the value from the configuration.
        return_dict (Optional[bool]): Whether to return the output as a dictionary. Defaults to None.
            If not provided, it uses the value from the configuration.

    Returns:
        BaseModelOutputWithPoolingAndNoAttention:
            An instance of the BaseModelOutputWithPoolingAndNoAttention class containing the following outputs:

            - last_hidden_state (mindspore.Tensor): The feature map produced by the last stage of the encoder.
            It has a shape of [batch_size, hidden_sizes[-1], height, width], where height and width are
            the downsampled spatial dimensions.
            - pooler_output (mindspore.Tensor): The last feature map after global average pooling.
            It has a shape of [batch_size, hidden_sizes[-1], 1, 1].
            - hidden_states (Tuple[mindspore.Tensor]): A tuple of feature maps, returned only
            if `output_hidden_states` is True. Each one is a tensor of shape
            [batch_size, num_channels, height, width].

    Raises:
        None.

    """
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    embedding_output = self.embedder(pixel_values)

    encoder_outputs = self.encoder(
        embedding_output, output_hidden_states=output_hidden_states, return_dict=return_dict
    )

    last_hidden_state = encoder_outputs[0]

    pooled_output = self.pooler(last_hidden_state)

    if not return_dict:
        return (last_hidden_state, pooled_output) + encoder_outputs[1:]

    return BaseModelOutputWithPoolingAndNoAttention(
        last_hidden_state=last_hidden_state,
        pooler_output=pooled_output,
        hidden_states=encoder_outputs.hidden_states,
    )
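
The pooler in `ResNetModel` is `nn.AdaptiveAvgPool2d((1, 1))`, which averages each channel of the last feature map over its spatial positions. The following dependency-free sketch shows that reduction for a single image stored as nested lists; `global_avg_pool` is an illustrative helper, not a mindnlp function.

```python
def global_avg_pool(feature_map):
    """Average every channel of a [C][H][W] nested list down to [C][1][1]."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]  # all H * W positions
        pooled.append([[sum(values) / len(values)]])
    return pooled


# Two channels, each 2 x 2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_avg_pool(feature_map))  # [[[2.5]], [[2.0]]]
```

The trailing `1 x 1` spatial dimensions are kept, which is why the classification head starts with `nn.Flatten()`.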

mindnlp.transformers.models.resnet.modeling_resnet.ResNetPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = ResNetConfig
    base_model_prefix = "resnet"
    main_input_name = "pixel_values"
    _no_split_modules = ["ResNetConvLayer", "ResNetShortCut"]
    _keys_to_ignore_on_load_unexpected = [r'num_batches_tracked']

    def _init_weights(self, module):
        """
        Initializes the weights of the given module.

        Convolution weights are drawn from a He-normal distribution. Batch and group normalization
        layers are reset to the identity transform (scale of one, bias of zero). Modules of any
        other type are left unchanged.

        Args:
            self (ResNetPreTrainedModel): The instance of the ResNetPreTrainedModel class.
            module: The module for which the weights need to be initialized.

        Returns:
            None.

        Raises:
            None.
        """
        if isinstance(module, nn.Conv2d):
            module.weight.set_data(initializer(HeNormal(), module.weight.shape, module.weight.dtype))
            #module.weight.initialize(HeNormal(mode='fan_out', nonlinearity='relu'))
        elif isinstance(module, (nn.BatchNorm2d, nn.GroupNorm)):
            # Zero the bias; the scale (weight) is set to ones below.
            module.bias.set_data(
                initializer(
                    "zeros",
                    module.bias.shape,
                    module.bias.dtype,
                )
            )
            module.weight.set_data(
                initializer("ones", module.weight.shape, module.weight.dtype)
            )
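
The convolution branch above calls `HeNormal()` with no arguments. Assuming MindSpore's documented defaults for that initializer (`mode='fan_in'`, `nonlinearity='leaky_relu'`, `negative_slope=0`), the standard deviation of the sampled weights can be sketched as follows. `he_normal_std` is a hypothetical helper for illustration only.

```python
import math


def he_normal_std(weight_shape, negative_slope=0.0):
    """Standard deviation of He-normal init for a conv weight (out, in, kh, kw).

    Assumes fan_in mode: fan_in = in_channels * kernel_height * kernel_width.
    """
    _, in_channels, kernel_h, kernel_w = weight_shape
    fan_in = in_channels * kernel_h * kernel_w
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))  # sqrt(2) when the slope is 0
    return gain / math.sqrt(fan_in)


# A 7x7 convolution over 3 input channels: fan_in = 147, std is roughly 0.1166.
print(round(he_normal_std((64, 3, 7, 7)), 4))
```

The commented-out line in the source suggests `fan_out` mode was also considered; that variant would use `out_channels * kernel_h * kernel_w` in place of `fan_in`.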

mindnlp.transformers.models.resnet.modeling_resnet.ResNetShortCut

Bases: Module

ResNet shortcut, used to project the residual features to the correct size. If needed, it is also used to downsample the input using stride=2.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetShortCut(nn.Module):
    """
    ResNet shortcut, used to project the residual features to the correct size. If needed, it is also used to
    downsample the input using `stride=2`.
    """
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        """
        Initializes a new instance of the ResNetShortCut class.

        Args:
            self: The object itself.
            in_channels (int): The number of input channels.
                This parameter specifies the number of channels in the input tensor.
                It must be a positive integer.
            out_channels (int): The number of output channels.
                This parameter specifies the number of channels produced by the convolution.
                It must be a positive integer.
            stride (int, optional): The stride of the convolution. Default is 2.
                This parameter determines the stride size of the convolution operation.
                It must be a positive integer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.convolution = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
        self.normalization = nn.BatchNorm2d(out_channels)

    def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
        """
        Applies a 1x1 convolution followed by batch normalization to the input tensor.

        Args:
            self (ResNetShortCut): The instance of the ResNetShortCut class.
            input (mindspore.Tensor): The input feature map to project.

        Returns:
            mindspore.Tensor: The projected feature map after convolution and normalization.

        Raises:
            None
        """
        hidden_state = self.convolution(input)
        hidden_state = self.normalization(hidden_state)
        return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetShortCut.__init__(in_channels, out_channels, stride=2)

Initializes a new instance of the ResNetShortCut class.

PARAMETER DESCRIPTION
self

The object itself.

in_channels

The number of input channels. This parameter specifies the number of channels in the input tensor. It must be a positive integer.

TYPE: int

out_channels

The number of output channels. This parameter specifies the number of channels produced by the convolution. It must be a positive integer.

TYPE: int

stride

The stride of the convolution. Default is 2. This parameter determines the stride size of the convolution operation. It must be a positive integer.

TYPE: int DEFAULT: 2

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
    """
    Initializes a new instance of the ResNetShortCut class.

    Args:
        self: The object itself.
        in_channels (int): The number of input channels.
            This parameter specifies the number of channels in the input tensor.
            It must be a positive integer.
        out_channels (int): The number of output channels.
            This parameter specifies the number of channels produced by the convolution.
            It must be a positive integer.
        stride (int, optional): The stride of the convolution. Default is 2.
            This parameter determines the stride size of the convolution operation.
            It must be a positive integer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.convolution = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
    self.normalization = nn.BatchNorm2d(out_channels)

mindnlp.transformers.models.resnet.modeling_resnet.ResNetShortCut.forward(input)

Applies a 1x1 convolution followed by batch normalization to the input tensor.

PARAMETER DESCRIPTION
self

The instance of the ResNetShortCut class.

TYPE: ResNetShortCut

input

The input feature map to project.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The projected feature map after convolution and normalization.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
    """
    Applies a 1x1 convolution followed by batch normalization to the input tensor.

    Args:
        self (ResNetShortCut): The instance of the ResNetShortCut class.
        input (mindspore.Tensor): The input feature map to project.

    Returns:
        mindspore.Tensor: The projected feature map after convolution and normalization.

    Raises:
        None
    """
    hidden_state = self.convolution(input)
    hidden_state = self.normalization(hidden_state)
    return hidden_state
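
Because the shortcut uses a 1x1 convolution, its output shape depends only on `out_channels` and `stride`. A minimal sketch of that arithmetic is given below; `shortcut_output_shape` is an illustrative helper, and the formula assumes a kernel size of 1 with no padding. For a 1x1 kernel, "same" padding gives the identical result, since ceil(H / stride) equals floor((H - 1) / stride) + 1.

```python
def shortcut_output_shape(input_shape, out_channels, stride=2):
    """Shape after a 1x1 convolution with the given stride on an NCHW input."""
    batch, _, height, width = input_shape
    out_h = (height - 1) // stride + 1
    out_w = (width - 1) // stride + 1
    return (batch, out_channels, out_h, out_w)


print(shortcut_output_shape((1, 256, 56, 56), 512))            # (1, 512, 28, 28)
print(shortcut_output_shape((1, 256, 56, 56), 512, stride=1))  # (1, 512, 56, 56)
```

With `stride=1` the shortcut only changes the channel count, which is the projection case described in the class docstring.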

mindnlp.transformers.models.resnet.modeling_resnet.ResNetStage

Bases: Module

A ResNet stage composed of stacked layers.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
class ResNetStage(nn.Module):
    """
    A ResNet stage composed of stacked layers.
    """
    def __init__(
        self,
        config: ResNetConfig,
        in_channels: int,
        out_channels: int,
        stride: int = 2,
        depth: int = 2,
    ):
        """
        Initializes a ResNetStage object.

        Args:
            self: The instance of the class.
            config (ResNetConfig): The configuration object for the ResNet model.
            in_channels (int): The number of input channels.
            out_channels (int): The number of output channels.
            stride (int, optional): The stride passed to the first layer of the stage. Defaults to 2.
            depth (int, optional): The depth of the ResNet stage. Defaults to 2.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        layer = ResNetBottleNeckLayer if config.layer_type == "bottleneck" else ResNetBasicLayer

        if config.layer_type == "bottleneck":
            first_layer = layer(
                in_channels,
                out_channels,
                stride=stride,
                activation=config.hidden_act,
                downsample_in_bottleneck=config.downsample_in_bottleneck,
            )
        else:
            first_layer = layer(in_channels, out_channels, stride=stride, activation=config.hidden_act)
        self.layers = nn.Sequential(
            first_layer, *[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)]
        )

    def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
        """
        Passes the input through each layer of the stage in order.

        Args:
            self (ResNetStage): An instance of the ResNetStage class.
            input (mindspore.Tensor): The input feature map.

        Returns:
            mindspore.Tensor: The feature map produced by the last layer of the stage.

        Raises:
            None.
        """
        hidden_state = input
        for layer in self.layers:
            hidden_state = layer(hidden_state)
        return hidden_state

mindnlp.transformers.models.resnet.modeling_resnet.ResNetStage.__init__(config, in_channels, out_channels, stride=2, depth=2)

Initializes a ResNetStage object.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the ResNet model.

TYPE: ResNetConfig

in_channels

The number of input channels.

TYPE: int

out_channels

The number of output channels.

TYPE: int

stride

The stride passed to the first layer of the stage. Defaults to 2.

TYPE: int DEFAULT: 2

depth

The depth of the ResNet stage. Defaults to 2.

TYPE: int DEFAULT: 2

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def __init__(
    self,
    config: ResNetConfig,
    in_channels: int,
    out_channels: int,
    stride: int = 2,
    depth: int = 2,
):
    """
    Initializes a ResNetStage object.

    Args:
        self: The instance of the class.
        config (ResNetConfig): The configuration object for the ResNet model.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        stride (int, optional): The stride passed to the first layer of the stage. Defaults to 2.
        depth (int, optional): The depth of the ResNet stage. Defaults to 2.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    layer = ResNetBottleNeckLayer if config.layer_type == "bottleneck" else ResNetBasicLayer

    if config.layer_type == "bottleneck":
        first_layer = layer(
            in_channels,
            out_channels,
            stride=stride,
            activation=config.hidden_act,
            downsample_in_bottleneck=config.downsample_in_bottleneck,
        )
    else:
        first_layer = layer(in_channels, out_channels, stride=stride, activation=config.hidden_act)
    self.layers = nn.Sequential(
        first_layer, *[layer(out_channels, out_channels, activation=config.hidden_act) for _ in range(depth - 1)]
    )
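
The constructor builds one layer that changes the channel count and receives `stride`, followed by `depth - 1` layers that keep `out_channels`. The sketch below lists that plan as `(in_channels, out_channels, stride)` tuples. `stage_layer_plan` is a hypothetical helper, and the stride of 1 for the remaining layers is an assumption: the source constructs them without a `stride` argument, so they use whatever default the layer class defines.

```python
def stage_layer_plan(in_channels, out_channels, stride=2, depth=2):
    """List (in_channels, out_channels, stride) for every layer in a stage."""
    # First layer: changes the channel count and applies the stage stride.
    plan = [(in_channels, out_channels, stride)]
    # Remaining layers: same channel count; stride 1 is assumed as the layer default.
    plan += [(out_channels, out_channels, 1) for _ in range(depth - 1)]
    return plan


for layer in stage_layer_plan(256, 512, stride=2, depth=4):
    print(layer)
```

Under the default configuration (`hidden_sizes=[256, 512, 1024, 2048]`, `depths=[3, 4, 6, 3]`), the call above would correspond to the second stage.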

mindnlp.transformers.models.resnet.modeling_resnet.ResNetStage.forward(input)

Passes the input through each layer of the stage in order.

PARAMETER DESCRIPTION
self

An instance of the ResNetStage class.

TYPE: ResNetStage

input

The input feature map.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The feature map produced by the last layer of the stage.

Source code in mindnlp/transformers/models/resnet/modeling_resnet.py
def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
    """
    Passes the input through each layer of the stage in order.

    Args:
        self (ResNetStage): An instance of the ResNetStage class.
        input (mindspore.Tensor): The input feature map.

    Returns:
        mindspore.Tensor: The feature map produced by the last layer of the stage.

    Raises:
        None.
    """
    hidden_state = input
    for layer in self.layers:
        hidden_state = layer(hidden_state)
    return hidden_state