convnext

mindnlp.transformers.models.convnext.configuration_convnext

ConvNeXT model configuration

mindnlp.transformers.models.convnext.configuration_convnext.ConvNextConfig

Bases: BackboneConfigMixin, PretrainedConfig

This is the configuration class to store the configuration of a [ConvNextModel]. It is used to instantiate a ConvNeXT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the ConvNeXT facebook/convnext-tiny-224 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
num_channels

The number of input channels.

TYPE: `int`, *optional*, defaults to 3 DEFAULT: 3

patch_size

Patch size to use in the patch embedding layer.

TYPE: `int`, *optional*, defaults to 4 DEFAULT: 4

num_stages

The number of stages in the model.

TYPE: `int`, *optional*, defaults to 4 DEFAULT: 4

hidden_sizes

Dimensionality (hidden size) at each stage.

TYPE: `List[int]`, *optional*, defaults to [96, 192, 384, 768] DEFAULT: None

depths

Depth (number of blocks) for each stage.

TYPE: `List[int]`, *optional*, defaults to [3, 3, 9, 3] DEFAULT: None

hidden_act

The non-linear activation function (function or string) in each block. If string, "gelu", "relu", "selu" and "gelu_new" are supported.

TYPE: `str` or `function`, *optional*, defaults to `"gelu"` DEFAULT: 'gelu'

initializer_range

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-12 DEFAULT: 1e-12

layer_scale_init_value

The initial value for the layer scale.

TYPE: `float`, *optional*, defaults to 1e-6 DEFAULT: 1e-06

drop_path_rate

The drop rate for stochastic depth.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

out_features

If used as backbone, list of features to output. Can be any of "stem", "stage1", "stage2", etc. (depending on how many stages the model has). If unset and out_indices is set, will default to the corresponding stages. If unset and out_indices is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: `List[str]`, *optional* DEFAULT: None

out_indices

If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how many stages the model has). If unset and out_features is set, will default to the corresponding stages. If unset and out_features is unset, will default to the last stage. Must be in the same order as defined in the stage_names attribute.

TYPE: `List[int]`, *optional* DEFAULT: None

Example
>>> from transformers import ConvNextConfig, ConvNextModel
...
>>> # Initializing a ConvNext convnext-tiny-224 style configuration
>>> configuration = ConvNextConfig()
...
>>> # Initializing a model (with random weights) from the convnext-tiny-224 style configuration
>>> model = ConvNextModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
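
When used as a backbone configuration, out_features (or, equivalently, out_indices) selects which stages the backbone returns; the two are aligned against stage_names. A minimal sketch, continuing the import above:

>>> config = ConvNextConfig(out_features=["stage2", "stage4"])
>>> config.stage_names
['stem', 'stage1', 'stage2', 'stage3', 'stage4']
>>> config.out_indices
[2, 4]
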
Source code in mindnlp/transformers/models/convnext/configuration_convnext.py
class ConvNextConfig(BackboneConfigMixin, PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ConvNextModel`]. It is used to instantiate a
    ConvNeXT model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the ConvNeXT
    [facebook/convnext-tiny-224](https://huggingface.co/facebook/convnext-tiny-224) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        num_channels (`int`, *optional*, defaults to 3):
            The number of input channels.
        patch_size (`int`, *optional*, defaults to 4):
            Patch size to use in the patch embedding layer.
        num_stages (`int`, *optional*, defaults to 4):
            The number of stages in the model.
        hidden_sizes (`List[int]`, *optional*, defaults to [96, 192, 384, 768]):
            Dimensionality (hidden size) at each stage.
        depths (`List[int]`, *optional*, defaults to [3, 3, 9, 3]):
            Depth (number of blocks) for each stage.
        hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (function or string) in each block. If string, `"gelu"`, `"relu"`,
            `"selu"` and `"gelu_new"` are supported.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        layer_scale_init_value (`float`, *optional*, defaults to 1e-6):
            The initial value for the layer scale.
        drop_path_rate (`float`, *optional*, defaults to 0.0):
            The drop rate for stochastic depth.
        out_features (`List[str]`, *optional*):
            If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc.
            (depending on how many stages the model has). If unset and `out_indices` is set, will default to the
            corresponding stages. If unset and `out_indices` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.
        out_indices (`List[int]`, *optional*):
            If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how
            many stages the model has). If unset and `out_features` is set, will default to the corresponding stages.
            If unset and `out_features` is unset, will default to the last stage. Must be in the
            same order as defined in the `stage_names` attribute.

    Example:
        ```python
        >>> from transformers import ConvNextConfig, ConvNextModel
        ...
        >>> # Initializing a ConvNext convnext-tiny-224 style configuration
        >>> configuration = ConvNextConfig()
        ...
        >>> # Initializing a model (with random weights) from the convnext-tiny-224 style configuration
        >>> model = ConvNextModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "convnext"

    def __init__(
        self,
        num_channels=3,
        patch_size=4,
        num_stages=4,
        hidden_sizes=None,
        depths=None,
        hidden_act="gelu",
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        layer_scale_init_value=1e-6,
        drop_path_rate=0.0,
        image_size=224,
        out_features=None,
        out_indices=None,
        **kwargs,
    ):
        """
        Initialize a ConvNextConfig object.

        Args:
            num_channels (int): Number of input channels. Default is 3.
            patch_size (int): Patch size used in the model. Default is 4.
            num_stages (int): Number of stages in the model. Default is 4.
            hidden_sizes (list): List of hidden layer sizes for each stage. Default is [96, 192, 384, 768].
            depths (list): List of depths for each stage. Default is [3, 3, 9, 3].
            hidden_act (str): Activation function for hidden layers. Default is 'gelu'.
            initializer_range (float): Range for weight initialization. Default is 0.02.
            layer_norm_eps (float): Epsilon value for layer normalization. Default is 1e-12.
            layer_scale_init_value (float): Initial value for layer scale. Default is 1e-06.
            drop_path_rate (float): Rate of drop path regularization. Default is 0.0.
            image_size (int): Size of input images. Default is 224.
            out_features (list): List of output features.
            out_indices (list): List of output indices.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(**kwargs)

        self.num_channels = num_channels
        self.patch_size = patch_size
        self.num_stages = num_stages
        self.hidden_sizes = [96, 192, 384, 768] if hidden_sizes is None else hidden_sizes
        self.depths = [3, 3, 9, 3] if depths is None else depths
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.layer_scale_init_value = layer_scale_init_value
        self.drop_path_rate = drop_path_rate
        self.image_size = image_size
        self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(self.depths) + 1)]
        self._out_features, self._out_indices = get_aligned_output_features_output_indices(
            out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
        )

mindnlp.transformers.models.convnext.configuration_convnext.ConvNextConfig.__init__(num_channels=3, patch_size=4, num_stages=4, hidden_sizes=None, depths=None, hidden_act='gelu', initializer_range=0.02, layer_norm_eps=1e-12, layer_scale_init_value=1e-06, drop_path_rate=0.0, image_size=224, out_features=None, out_indices=None, **kwargs)

Initialize a ConvNextConfig object.

PARAMETER DESCRIPTION
num_channels

Number of input channels. Default is 3.

TYPE: int DEFAULT: 3

patch_size

Patch size used in the model. Default is 4.

TYPE: int DEFAULT: 4

num_stages

Number of stages in the model. Default is 4.

TYPE: int DEFAULT: 4

hidden_sizes

List of hidden layer sizes for each stage. Default is [96, 192, 384, 768].

TYPE: list DEFAULT: None

depths

List of depths for each stage. Default is [3, 3, 9, 3].

TYPE: list DEFAULT: None

hidden_act

Activation function for hidden layers. Default is 'gelu'.

TYPE: str DEFAULT: 'gelu'

initializer_range

Range for weight initialization. Default is 0.02.

TYPE: float DEFAULT: 0.02

layer_norm_eps

Epsilon value for layer normalization. Default is 1e-12.

TYPE: float DEFAULT: 1e-12

layer_scale_init_value

Initial value for layer scale. Default is 1e-06.

TYPE: float DEFAULT: 1e-06

drop_path_rate

Rate of drop path regularization. Default is 0.0.

TYPE: float DEFAULT: 0.0

image_size

Size of input images. Default is 224.

TYPE: int DEFAULT: 224

out_features

List of output features.

TYPE: list DEFAULT: None

out_indices

List of output indices.

TYPE: list DEFAULT: None

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/convnext/configuration_convnext.py
def __init__(
    self,
    num_channels=3,
    patch_size=4,
    num_stages=4,
    hidden_sizes=None,
    depths=None,
    hidden_act="gelu",
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    layer_scale_init_value=1e-6,
    drop_path_rate=0.0,
    image_size=224,
    out_features=None,
    out_indices=None,
    **kwargs,
):
    """
    Initialize a ConvNextConfig object.

    Args:
        num_channels (int): Number of input channels. Default is 3.
        patch_size (int): Patch size used in the model. Default is 4.
        num_stages (int): Number of stages in the model. Default is 4.
        hidden_sizes (list): List of hidden layer sizes for each stage. Default is [96, 192, 384, 768].
        depths (list): List of depths for each stage. Default is [3, 3, 9, 3].
        hidden_act (str): Activation function for hidden layers. Default is 'gelu'.
        initializer_range (float): Range for weight initialization. Default is 0.02.
        layer_norm_eps (float): Epsilon value for layer normalization. Default is 1e-12.
        layer_scale_init_value (float): Initial value for layer scale. Default is 1e-06.
        drop_path_rate (float): Rate of drop path regularization. Default is 0.0.
        image_size (int): Size of input images. Default is 224.
        out_features (list): List of output features.
        out_indices (list): List of output indices.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(**kwargs)

    self.num_channels = num_channels
    self.patch_size = patch_size
    self.num_stages = num_stages
    self.hidden_sizes = [96, 192, 384, 768] if hidden_sizes is None else hidden_sizes
    self.depths = [3, 3, 9, 3] if depths is None else depths
    self.hidden_act = hidden_act
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.layer_scale_init_value = layer_scale_init_value
    self.drop_path_rate = drop_path_rate
    self.image_size = image_size
    self.stage_names = ["stem"] + [f"stage{idx}" for idx in range(1, len(self.depths) + 1)]
    self._out_features, self._out_indices = get_aligned_output_features_output_indices(
        out_features=out_features, out_indices=out_indices, stage_names=self.stage_names
    )

mindnlp.transformers.models.convnext.modeling_convnext

MindSpore ConvNext model.

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextBackbone

Bases: ConvNextPreTrainedModel, BackboneMixin

This class represents the ConvNext backbone used in a ConvNext model for image processing tasks. It inherits functionality from ConvNextPreTrainedModel and BackboneMixin.

The ConvNextBackbone class initializes the backbone architecture with ConvNextEmbeddings and ConvNextEncoder components. It also sets up layer normalization for hidden states based on the specified configuration. The forward method processes input pixel values through the embeddings and encoder, optionally returning hidden states and feature maps. It handles the logic for outputting the desired information based on the configuration settings.

RETURNS DESCRIPTION
BackboneOutput

A named tuple containing the feature maps and hidden states of the backbone.

Example
>>> from transformers import AutoImageProcessor, AutoBackbone
>>> import torch
>>> from PIL import Image
>>> import requests
...
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
...
>>> processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
>>> model = AutoBackbone.from_pretrained("facebook/convnext-tiny-224")
...
>>> inputs = processor(image, return_tensors="pt")
>>> outputs = model(**inputs)
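
The returned BackboneOutput holds one feature map per requested stage (only the last stage under the default configuration). Continuing the example, a 224x224 input to the tiny variant yields a final map with 768 channels at 1/32 resolution:

>>> feature_maps = outputs.feature_maps
>>> list(feature_maps[-1].shape)
[1, 768, 7, 7]
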
Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextBackbone(ConvNextPreTrainedModel, BackboneMixin):

    """
    This class represents the ConvNext backbone used in a ConvNext model for image processing tasks.
    It inherits functionality from ConvNextPreTrainedModel and BackboneMixin.

    The ConvNextBackbone class initializes the backbone architecture with ConvNextEmbeddings and ConvNextEncoder
    components. It also sets up layer normalization for hidden states based on the specified configuration.
    The forward method processes input pixel values through the embeddings and encoder, optionally returning
    hidden states and feature maps. It handles the logic for outputting the desired information based
    on the configuration settings.

    Returns:
        BackboneOutput: A named tuple containing the feature maps and hidden states of the backbone.

    Example:
        ```python
        >>> from transformers import AutoImageProcessor, AutoBackbone
        >>> import torch
        >>> from PIL import Image
        >>> import requests
        ...
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)
        ...
        >>> processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
        >>> model = AutoBackbone.from_pretrained("facebook/convnext-tiny-224")
        ...
        >>> inputs = processor(image, return_tensors="pt")
        >>> outputs = model(**inputs)
        ```
    """
    def __init__(self, config):
        """
        Initializes an instance of the ConvNextBackbone class.

        Args:
            self: The instance of the class.
            config: A configuration object containing the necessary parameters for initializing the backbone.
                It should have the following attributes:

                - hidden_sizes (list): A list of integers representing the hidden layer sizes.
                - channels (list): A list of integers representing the number of channels for each stage.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        super()._init_backbone(config)

        self.embeddings = ConvNextEmbeddings(config)
        self.encoder = ConvNextEncoder(config)
        self.num_features = [config.hidden_sizes[0]] + config.hidden_sizes

        # Add layer norms to hidden states of out_features
        hidden_states_norms = {}
        for stage, num_channels in zip(self._out_features, self.channels):
            hidden_states_norms[stage] = ConvNextLayerNorm(num_channels, data_format="channels_first")
        self.hidden_states_norms = nn.ModuleDict(hidden_states_norms)

        # initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: mindspore.Tensor,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> BackboneOutput:
        """
        Returns:
            BackboneOutput

        Example:
            ```python
            >>> from transformers import AutoImageProcessor, AutoBackbone
            >>> import torch
            >>> from PIL import Image
            >>> import requests
            ...
            >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
            >>> image = Image.open(requests.get(url, stream=True).raw)
            ...
            >>> processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
            >>> model = AutoBackbone.from_pretrained("facebook/convnext-tiny-224")
            ...
            >>> inputs = processor(image, return_tensors="pt")
            >>> outputs = model(**inputs)
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )

        embedding_output = self.embeddings(pixel_values)

        outputs = self.encoder(
            embedding_output,
            output_hidden_states=True,
            return_dict=return_dict,
        )

        hidden_states = outputs.hidden_states if return_dict else outputs[1]

        feature_maps = ()
        for stage, hidden_state in zip(self.stage_names, hidden_states):
            if stage in self.out_features:
                hidden_state = self.hidden_states_norms[stage](hidden_state)
                feature_maps += (hidden_state,)

        if not return_dict:
            output = (feature_maps,)
            if output_hidden_states:
                output += (hidden_states,)
            return output

        return BackboneOutput(
            feature_maps=feature_maps,
            hidden_states=hidden_states if output_hidden_states else None,
            attentions=None,
        )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextBackbone.__init__(config)

Initializes an instance of the ConvNextBackbone class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

A configuration object containing the necessary parameters for initializing the backbone. It should have the following attributes:

  • hidden_sizes (list): A list of integers representing the hidden layer sizes.
  • channels (list): A list of integers representing the number of channels for each stage.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config):
    """
    Initializes an instance of the ConvNextBackbone class.

    Args:
        self: The instance of the class.
        config: A configuration object containing the necessary parameters for initializing the backbone.
            It should have the following attributes:

            - hidden_sizes (list): A list of integers representing the hidden layer sizes.
            - channels (list): A list of integers representing the number of channels for each stage.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    super()._init_backbone(config)

    self.embeddings = ConvNextEmbeddings(config)
    self.encoder = ConvNextEncoder(config)
    self.num_features = [config.hidden_sizes[0]] + config.hidden_sizes

    # Add layer norms to hidden states of out_features
    hidden_states_norms = {}
    for stage, num_channels in zip(self._out_features, self.channels):
        hidden_states_norms[stage] = ConvNextLayerNorm(num_channels, data_format="channels_first")
    self.hidden_states_norms = nn.ModuleDict(hidden_states_norms)

    # initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextBackbone.forward(pixel_values, output_hidden_states=None, return_dict=None)

RETURNS DESCRIPTION
BackboneOutput

A named tuple containing the feature maps and hidden states of the backbone.

Example
>>> from transformers import AutoImageProcessor, AutoBackbone
>>> import torch
>>> from PIL import Image
>>> import requests
...
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
...
>>> processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
>>> model = AutoBackbone.from_pretrained("facebook/convnext-tiny-224")
...
>>> inputs = processor(image, return_tensors="pt")
>>> outputs = model(**inputs)
Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(
    self,
    pixel_values: mindspore.Tensor,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> BackboneOutput:
    """
    Returns:
        BackboneOutput

    Example:
        ```python
        >>> from transformers import AutoImageProcessor, AutoBackbone
        >>> import torch
        >>> from PIL import Image
        >>> import requests
        ...
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)
        ...
        >>> processor = AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224")
        >>> model = AutoBackbone.from_pretrained("facebook/convnext-tiny-224")
        ...
        >>> inputs = processor(image, return_tensors="pt")
        >>> outputs = model(**inputs)
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )

    embedding_output = self.embeddings(pixel_values)

    outputs = self.encoder(
        embedding_output,
        output_hidden_states=True,
        return_dict=return_dict,
    )

    hidden_states = outputs.hidden_states if return_dict else outputs[1]

    feature_maps = ()
    for stage, hidden_state in zip(self.stage_names, hidden_states):
        if stage in self.out_features:
            hidden_state = self.hidden_states_norms[stage](hidden_state)
            feature_maps += (hidden_state,)

    if not return_dict:
        output = (feature_maps,)
        if output_hidden_states:
            output += (hidden_states,)
        return output

    return BackboneOutput(
        feature_maps=feature_maps,
        hidden_states=hidden_states if output_hidden_states else None,
        attentions=None,
    )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextDropPath

Bases: Module

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
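
The forward pass delegates to a `drop_path` helper (see the source below). As a reference, here is a minimal sketch of the standard per-sample stochastic-depth computation; it is an illustrative reimplementation, not the module's own helper, and assumes MindSpore's `ops.rand`/`ops.floor`:

```python
import mindspore
from mindspore import ops

def drop_path_sketch(x: mindspore.Tensor, drop_prob: float = 0.0, training: bool = False) -> mindspore.Tensor:
    """Zero out entire samples with probability `drop_prob`, rescaling the survivors."""
    if drop_prob == 0.0 or not training:
        return x  # identity at inference time or when disabled
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, broadcast across all non-batch dimensions.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = ops.floor(keep_prob + ops.rand(*shape, dtype=x.dtype))
    return x / keep_prob * mask  # rescale so the expected activation is unchanged
```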

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextDropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""
    def __init__(self, drop_prob: Optional[float] = None) -> None:
        """
        Initializes an instance of the ConvNextDropPath class.

        Args:
            self (object): The instance of the ConvNextDropPath class.
            drop_prob (Optional[float]): The probability of dropping a connection during training. 
                If not provided, defaults to None. Should be a float value between 0 and 1, inclusive.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Construct a drop path operation on the hidden states.

        Args:
            self (ConvNextDropPath): The instance of the ConvNextDropPath class.
            hidden_states (mindspore.Tensor):
                The input tensor of hidden states on which the drop path operation will be performed.

        Returns:
            mindspore.Tensor: The tensor resulting from applying the drop path operation on the input hidden states.

        Raises:
            ValueError: If the drop probability is not within the valid range.
            TypeError: If the input hidden_states is not a valid tensor type.
            RuntimeError: If the operation fails due to an internal error.
        """
        return drop_path(hidden_states, self.drop_prob, self.training)

    def extra_repr(self) -> str:
        """
        Method to generate a string representation of the drop probability in the ConvNextDropPath class.

        Args:
            self: ConvNextDropPath object. Represents the instance of the ConvNextDropPath class.

        Returns:
            str: A string representing the drop probability of the ConvNextDropPath object.

        Raises:
            None.
        """
        return "p={}".format(self.drop_prob)

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextDropPath.__init__(drop_prob=None)

Initializes an instance of the ConvNextDropPath class.

PARAMETER DESCRIPTION
self

The instance of the ConvNextDropPath class.

TYPE: object

drop_prob

The probability of dropping a connection during training. If not provided, defaults to None. Should be a float value between 0 and 1, inclusive.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
None

None.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, drop_prob: Optional[float] = None) -> None:
    """
    Initializes an instance of the ConvNextDropPath class.

    Args:
        self (object): The instance of the ConvNextDropPath class.
        drop_prob (Optional[float]): The probability of dropping a connection during training. 
            If not provided, defaults to None. Should be a float value between 0 and 1, inclusive.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.drop_prob = drop_prob

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextDropPath.extra_repr()

Method to generate a string representation of the drop probability in the ConvNextDropPath class.

PARAMETER DESCRIPTION
self

ConvNextDropPath object. Represents the instance of the ConvNextDropPath class.

RETURNS DESCRIPTION
str

A string representing the drop probability of the ConvNextDropPath object.

TYPE: str

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def extra_repr(self) -> str:
    """
    Method to generate a string representation of the drop probability in the ConvNextDropPath class.

    Args:
        self: ConvNextDropPath object. Represents the instance of the ConvNextDropPath class.

    Returns:
        str: A string representing the drop probability of the ConvNextDropPath object.

    Raises:
        None.
    """
    return "p={}".format(self.drop_prob)

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextDropPath.forward(hidden_states)

Construct a drop path operation on the hidden states.

PARAMETER DESCRIPTION
self

The instance of the ConvNextDropPath class.

TYPE: ConvNextDropPath

hidden_states

The input tensor of hidden states on which the drop path operation will be performed.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The tensor resulting from applying the drop path operation on the input hidden states.

RAISES DESCRIPTION
ValueError

If the drop probability is not within the valid range.

TypeError

If the input hidden_states is not a valid tensor type.

RuntimeError

If the operation fails due to an internal error.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Construct a drop path operation on the hidden states.

    Args:
        self (ConvNextDropPath): The instance of the ConvNextDropPath class.
        hidden_states (mindspore.Tensor):
            The input tensor of hidden states on which the drop path operation will be performed.

    Returns:
        mindspore.Tensor: The tensor resulting from applying the drop path operation on the input hidden states.

    Raises:
        ValueError: If the drop probability is not within the valid range.
        TypeError: If the input hidden_states is not a valid tensor type.
        RuntimeError: If the operation fails due to an internal error.
    """
    return drop_path(hidden_states, self.drop_prob, self.training)

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEmbeddings

Bases: Module

This class is comparable to (and inspired by) the SwinEmbeddings class found in src/transformers/models/swin/modeling_swin.py.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextEmbeddings(nn.Module):
    """
    This class is comparable to (and inspired by) the SwinEmbeddings class
    found in src/transformers/models/swin/modeling_swin.py.
    """
    def __init__(self, config):
        """
        Initializes the ConvNextEmbeddings class.

        Args:
            self: The instance of the ConvNextEmbeddings class.
            config: An object containing the configuration parameters for the ConvNextEmbeddings class.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.patch_embeddings = nn.Conv2d(
            config.num_channels, config.hidden_sizes[0], kernel_size=config.patch_size, stride=config.patch_size,
            pad_mode='valid', bias=True
        )
        self.layernorm = ConvNextLayerNorm(config.hidden_sizes[0], eps=1e-6, data_format="channels_first")
        self.num_channels = config.num_channels

    def forward(self, pixel_values: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs embeddings from the input pixel values using the ConvNextEmbeddings class.

        Args:
            self (ConvNextEmbeddings): An instance of the ConvNextEmbeddings class.
            pixel_values (mindspore.Tensor): A tensor containing pixel values with shape (batch_size, num_channels, height, width).
                The pixel values should align with the channel dimension specified in the configuration.

        Returns:
            mindspore.Tensor: The patch embeddings, of shape
                (batch_size, hidden_sizes[0], height // patch_size, width // patch_size).

        Raises:
            ValueError: If the number of channels in the input pixel values does not match the configured number of channels.
        """
        num_channels = pixel_values.shape[1]
        if num_channels != self.num_channels:
            raise ValueError(
                "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
            )
        embeddings = self.patch_embeddings(pixel_values)
        embeddings = self.layernorm(embeddings)
        return embeddings

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEmbeddings.__init__(config)

Initializes the ConvNextEmbeddings class.

PARAMETER DESCRIPTION
self

The instance of the ConvNextEmbeddings class.

config

An object containing the configuration parameters for the ConvNextEmbeddings class.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config):
    """
    Initializes the ConvNextEmbeddings class.

    Args:
        self: The instance of the ConvNextEmbeddings class.
        config: An object containing the configuration parameters for the ConvNextEmbeddings class.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.patch_embeddings = nn.Conv2d(
        config.num_channels, config.hidden_sizes[0], kernel_size=config.patch_size, stride=config.patch_size,
        pad_mode='valid', bias=True
    )
    self.layernorm = ConvNextLayerNorm(config.hidden_sizes[0], eps=1e-6, data_format="channels_first")
    self.num_channels = config.num_channels

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEmbeddings.forward(pixel_values)

Constructs embeddings from the input pixel values using the ConvNextEmbeddings class.

PARAMETER DESCRIPTION
self

An instance of the ConvNextEmbeddings class.

TYPE: ConvNextEmbeddings

pixel_values

A tensor containing pixel values with shape (batch_size, num_channels, height, width). The pixel values should align with the channel dimension specified in the configuration.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The patch embeddings, of shape (batch_size, hidden_sizes[0], height // patch_size, width // patch_size).

RAISES DESCRIPTION
ValueError

If the number of channels in the input pixel values does not match the configured number of channels.
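
As an illustrative shape check (assuming the import paths documented on this page), a 224x224 input under the default configuration yields 96 channels at 1/4 resolution:

```python
import mindspore
from mindspore import ops
from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextEmbeddings

embeddings = ConvNextEmbeddings(ConvNextConfig())
pixel_values = ops.zeros((1, 3, 224, 224), mindspore.float32)
print(embeddings(pixel_values).shape)  # (1, 96, 56, 56): stride-4 patch embedding
```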

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(self, pixel_values: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs embeddings from the input pixel values using the ConvNextEmbeddings class.

    Args:
        self (ConvNextEmbeddings): An instance of the ConvNextEmbeddings class.
        pixel_values (mindspore.Tensor): A tensor containing pixel values with shape (batch_size, num_channels, height, width).
            The pixel values should align with the channel dimension specified in the configuration.

    Returns:
        mindspore.Tensor: The patch embeddings, of shape
            (batch_size, hidden_sizes[0], height // patch_size, width // patch_size).

    Raises:
        ValueError: If the number of channels in the input pixel values does not match the configured number of channels.
    """
    num_channels = pixel_values.shape[1]
    if num_channels != self.num_channels:
        raise ValueError(
            "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
        )
    embeddings = self.patch_embeddings(pixel_values)
    embeddings = self.layernorm(embeddings)
    return embeddings

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEncoder

Bases: Module

ConvNextEncoder is a Python class that represents an encoder for a Convolutional Neural Network (CNN) model.

This class inherits from the nn.Module class, which is a base class for all neural network layers in the MindSpore framework.

The ConvNextEncoder class initializes a list of stages, where each stage consists of a ConvNextStage module. The number of stages is defined by the config.num_stages attribute. Each stage performs convolutional operations with different parameters, such as input and output channels, stride, and depth. The drop_path_rates parameter specifies the drop path rates for each stage.

The forward method of the ConvNextEncoder class takes a tensor of hidden states as input and performs the forward pass through each stage. It optionally returns a tuple containing all hidden states at each stage, as specified by the output_hidden_states parameter. If return_dict is set to True, it returns an instance of the BaseModelOutputWithNoAttention class, which encapsulates the last hidden state and all hidden states.

Note that this docstring is generated based on the provided code, and the actual implementation may contain additional methods or attributes.
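
For example, the per-block drop path rates ramp linearly from 0 to `config.drop_path_rate` across all blocks and are then split per stage, mirroring the `ops.linspace(...).split(...)` call in the source. A sketch with the default depths (assuming MindSpore's `ops` behaves like the module's `ops` wrapper here):

```python
from mindspore import ops

depths = [3, 3, 9, 3]                                    # default ConvNeXT-tiny depths
rates = ops.linspace(0.0, 0.1, sum(depths))              # 18 evenly spaced rates
per_stage = [x.tolist() for x in rates.split(depths)]    # one list of rates per stage
print([len(r) for r in per_stage])                       # [3, 3, 9, 3]
```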

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextEncoder(nn.Module):

    """ConvNextEncoder is a Python class that represents an encoder for a Convolutional Neural Network (CNN) model.

    This class inherits from the nn.Module class, which is a base class for all neural network layers in the MindSpore framework.

    The ConvNextEncoder class initializes a list of stages, where each stage consists of a ConvNextStage module.
    The number of stages is defined by the config.num_stages attribute. Each stage performs convolutional operations
    with different parameters, such as input and output channels, stride, and depth.
    The drop_path_rates parameter specifies the drop path rates for each stage.

    The forward method of the ConvNextEncoder class takes a tensor of hidden states as input and performs the forward
    pass through each stage. It optionally returns a tuple containing all hidden states at each stage, as specified by
    the output_hidden_states parameter.
    If return_dict is set to True, it returns an instance of the BaseModelOutputWithNoAttention class, which
    encapsulates the last hidden state and all hidden states.

    Note that this docstring is generated based on the provided code, and the actual implementation may contain
    additional methods or attributes.

    """
    def __init__(self, config):
        """
        Initializes an instance of the ConvNextEncoder class.

        Args:
            self (ConvNextEncoder): The instance of the ConvNextEncoder class.
            config:
                A configuration object containing various settings for the ConvNextEncoder.

                - drop_path_rate (float): The rate at which to apply drop path regularization.
                - depths (list[int]): List of integers representing the depths of each stage.
                - hidden_sizes (list[int]): List of integers representing the number of hidden units in each stage.
                - num_stages (int): The total number of stages in the ConvNextEncoder.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.stages = nn.ModuleList()
        drop_path_rates = [
            x.tolist() for x in ops.linspace(0, config.drop_path_rate, sum(config.depths)).split(config.depths)
        ]
        prev_chs = config.hidden_sizes[0]
        for i in range(config.num_stages):
            out_chs = config.hidden_sizes[i]
            stage = ConvNextStage(
                config,
                in_channels=prev_chs,
                out_channels=out_chs,
                stride=2 if i > 0 else 1,
                depth=config.depths[i],
                drop_path_rates=drop_path_rates[i],
            )
            self.stages.append(stage)
            prev_chs = out_chs

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        output_hidden_states: Optional[bool] = False,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple, BaseModelOutputWithNoAttention]:
        """
        Constructs the encoder for the ConvNext model.

        Args:
            self (ConvNextEncoder): The instance of the ConvNextEncoder class.
            hidden_states (mindspore.Tensor): The input hidden states to be processed by the encoder.
            output_hidden_states (Optional[bool], optional): Whether to output hidden states for each layer.
                Defaults to False.
            return_dict (Optional[bool], optional): Whether to return the output as a dictionary. Defaults to True.

        Returns:
            Union[Tuple, BaseModelOutputWithNoAttention]:
                The output value which can be a tuple of hidden states or BaseModelOutputWithNoAttention object.

        Raises:
            None
        """
        all_hidden_states = () if output_hidden_states else None

        for i, layer_module in enumerate(self.stages):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            hidden_states = layer_module(hidden_states)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states] if v is not None)

        return BaseModelOutputWithNoAttention(
            last_hidden_state=hidden_states,
            hidden_states=all_hidden_states,
        )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEncoder.__init__(config)

Initializes an instance of the ConvNextEncoder class.

PARAMETER DESCRIPTION
self

The instance of the ConvNextEncoder class.

TYPE: ConvNextEncoder

config

A configuration object containing various settings for the ConvNextEncoder.

  • drop_path_rate (float): The rate at which to apply drop path regularization.
  • depths (list[int]): List of integers representing the depths of each stage.
  • hidden_sizes (list[int]): List of integers representing the number of hidden units in each stage.
  • num_stages (int): The total number of stages in the ConvNextEncoder.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config):
    """
    Initializes an instance of the ConvNextEncoder class.

    Args:
        self (ConvNextEncoder): The instance of the ConvNextEncoder class.
        config:
            A configuration object containing various settings for the ConvNextEncoder.

            - drop_path_rate (float): The rate at which to apply drop path regularization.
            - depths (list[int]): List of integers representing the depths of each stage.
            - hidden_sizes (list[int]): List of integers representing the number of hidden units in each stage.
            - num_stages (int): The total number of stages in the ConvNextEncoder.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.stages = nn.ModuleList()
    drop_path_rates = [
        x.tolist() for x in ops.linspace(0, config.drop_path_rate, sum(config.depths)).split(config.depths)
    ]
    prev_chs = config.hidden_sizes[0]
    for i in range(config.num_stages):
        out_chs = config.hidden_sizes[i]
        stage = ConvNextStage(
            config,
            in_channels=prev_chs,
            out_channels=out_chs,
            stride=2 if i > 0 else 1,
            depth=config.depths[i],
            drop_path_rates=drop_path_rates[i],
        )
        self.stages.append(stage)
        prev_chs = out_chs

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextEncoder.forward(hidden_states, output_hidden_states=False, return_dict=True)

Constructs the encoder for the ConvNext model.

PARAMETER DESCRIPTION
self

The instance of the ConvNextEncoder class.

TYPE: ConvNextEncoder

hidden_states

The input hidden states to be processed by the encoder.

TYPE: Tensor

output_hidden_states

Whether to output hidden states for each layer. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

return_dict

Whether to return the output as a dictionary. Defaults to True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION
Union[Tuple, BaseModelOutputWithNoAttention]

Union[Tuple, BaseModelOutputWithNoAttention]: The output value which can be a tuple of hidden states or BaseModelOutputWithNoAttention object.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    output_hidden_states: Optional[bool] = False,
    return_dict: Optional[bool] = True,
) -> Union[Tuple, BaseModelOutputWithNoAttention]:
    """
    Constructs the encoder for the ConvNext model.

    Args:
        self (ConvNextEncoder): The instance of the ConvNextEncoder class.
        hidden_states (mindspore.Tensor): The input hidden states to be processed by the encoder.
        output_hidden_states (Optional[bool], optional): Whether to output hidden states for each layer.
            Defaults to False.
        return_dict (Optional[bool], optional): Whether to return the output as a dictionary. Defaults to True.

    Returns:
        Union[Tuple, BaseModelOutputWithNoAttention]:
            The output value which can be a tuple of hidden states or BaseModelOutputWithNoAttention object.

    Raises:
        None
    """
    all_hidden_states = () if output_hidden_states else None

    for i, layer_module in enumerate(self.stages):
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        hidden_states = layer_module(hidden_states)

    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, all_hidden_states] if v is not None)

    return BaseModelOutputWithNoAttention(
        last_hidden_state=hidden_states,
        hidden_states=all_hidden_states,
    )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextForImageClassification

Bases: ConvNextPreTrainedModel

ConvNextForImageClassification

This class represents a Convolutional Neural Network (CNN) model for image classification using the ConvNext architecture. The model is designed for tasks such as single-label or multi-label classification and regression. It inherits from the ConvNextPreTrainedModel class.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels in the classification task.

TYPE: int

convnext

The ConvNext model used for feature extraction.

TYPE: ConvNextModel

classifier

The classifier layer for predicting the final output.

TYPE: Linear or Identity
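
A minimal usage sketch (assuming the import paths documented on this page; the weights are random, so the loss value is illustrative only):

```python
import mindspore
from mindspore import ops
from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextForImageClassification

config = ConvNextConfig(num_labels=10)
model = ConvNextForImageClassification(config)

pixel_values = ops.zeros((1, 3, 224, 224), mindspore.float32)
labels = mindspore.Tensor([3], mindspore.int64)  # integer labels select the cross-entropy branch

outputs = model(pixel_values, labels=labels)
print(outputs.logits.shape)  # (1, 10)
print(outputs.loss)          # scalar classification loss
```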

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextForImageClassification(ConvNextPreTrainedModel):

    """ConvNextForImageClassification

    This class represents a Convolutional Neural Network (CNN) model for image classification using the
    ConvNext architecture. The model is designed for tasks such as single-label or multi-label classification
    and regression.
    It inherits from the ConvNextPreTrainedModel class.

    Attributes:
        num_labels (int): The number of labels in the classification task.
        convnext (ConvNextModel): The ConvNext model used for feature extraction.
        classifier (nn.Linear or nn.Identity): The classifier layer for predicting the final output.

    Methods:
        forward(pixel_values, labels, output_hidden_states, return_dict)
            Constructs the ConvNextForImageClassification model.

    """
    def __init__(self, config):
        """
        __init__

        Initializes an instance of the ConvNextForImageClassification class.

        Args:
            self: The instance of the class.
            config:
                An instance of the configuration class containing the necessary parameters for model initialization.

                - Type: config
                - Purpose: To configure the model with specific settings and hyperparameters.
                - Restrictions: Must be an instance of the appropriate configuration class.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        self.num_labels = config.num_labels
        self.convnext = ConvNextModel(config)

        # Classifier head
        self.classifier = (
            nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity()
        )

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: mindspore.Tensor = None,
        labels: Optional[mindspore.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, ImageClassifierOutputWithNoAttention]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
                `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.convnext(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

        pooled_output = outputs.pooler_output if return_dict else outputs[1]

        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int32, mindspore.int64):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)
        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return ImageClassifierOutputWithNoAttention(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
        )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextForImageClassification.__init__(config)

__init__

Initializes an instance of the ConvNextForImageClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the configuration class containing the necessary parameters for model initialization.

  • Type: config
  • Purpose: To configure the model with specific settings and hyperparameters.
  • Restrictions: Must be an instance of the appropriate configuration class.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config):
    """
    __init__

    Initializes an instance of the ConvNextForImageClassification class.

    Args:
        self: The instance of the class.
        config:
            An instance of the configuration class containing the necessary parameters for model initialization.

            - Type: config
            - Purpose: To configure the model with specific settings and hyperparameters.
            - Restrictions: Must be an instance of the appropriate configuration class.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    self.num_labels = config.num_labels
    self.convnext = ConvNextModel(config)

    # Classifier head
    self.classifier = (
        nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity()
    )

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextForImageClassification.forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss); if config.num_labels > 1 a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
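
When `config.problem_type` is unset, the forward pass infers it from `num_labels` and the label dtype (see the source below): `num_labels == 1` selects MSE regression, integer labels with `num_labels > 1` select single-label cross-entropy, and anything else falls back to multi-label BCE-with-logits. For instance, float multi-hot targets trigger the multi-label branch:

```python
import mindspore

# Shape (batch_size, num_labels) with float dtype -> "multi_label_classification"
labels = mindspore.Tensor([[0.0, 1.0, 0.0, 1.0]], mindspore.float32)
```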

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(
    self,
    pixel_values: mindspore.Tensor = None,
    labels: Optional[mindspore.Tensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, ImageClassifierOutputWithNoAttention]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean-squared error);
            if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.convnext(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

    pooled_output = outputs.pooler_output if return_dict else outputs[1]

    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int32, mindspore.int64):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)
    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return ImageClassifierOutputWithNoAttention(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
    )
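
Example

A minimal usage sketch for the classification head. It builds a randomly initialized model from a default configuration (not a pretrained checkpoint); the integer-typed labels select the single_label_classification branch above, and the 10-class head is an arbitrary illustrative choice.

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
>>> from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextForImageClassification
...
>>> # Randomly initialized model with an illustrative 10-class head
>>> config = ConvNextConfig(num_labels=10)
>>> model = ConvNextForImageClassification(config)
...
>>> pixel_values = mindspore.Tensor(np.random.randn(2, 3, 224, 224), mindspore.float32)
>>> labels = mindspore.Tensor(np.array([1, 7]), mindspore.int32)  # int dtype -> cross-entropy loss
>>> outputs = model(pixel_values, labels=labels, return_dict=True)
>>> outputs.logits.shape
(2, 10)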

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayer

Bases: Module

This corresponds to the Block class in the original implementation.

There are two equivalent implementations: (1) [DwConv, LayerNorm (channels_first), Conv, GELU, 1x1 Conv], all in (N, C, H, W); (2) [DwConv, Permute to (N, H, W, C), LayerNorm (channels_last), Linear, GELU, Linear], then permute back.

The authors used (2) as they find it slightly faster in PyTorch.

PARAMETER DESCRIPTION
config

Model configuration class.

TYPE: [`ConvNextConfig`]

dim

Number of input channels.

TYPE: `int`

drop_path

Stochastic depth rate. Default: 0.0.

TYPE: `float` DEFAULT: 0

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextLayer(nn.Module):
    """
    This corresponds to the `Block` class in the original implementation.

    There are two equivalent implementations: (1) [DwConv, LayerNorm (channels_first), Conv, GELU, 1x1 Conv], all in
    (N, C, H, W); (2) [DwConv, Permute to (N, H, W, C), LayerNorm (channels_last), Linear, GELU, Linear], then permute back

    The authors used (2) as they find it slightly faster in PyTorch.

    Args:
        config ([`ConvNextConfig`]): Model configuration class.
        dim (`int`): Number of input channels.
        drop_path (`float`): Stochastic depth rate. Default: 0.0.
    """
    def __init__(self, config, dim, drop_path=0):
        '''
        Initializes the ConvNextLayer.

        Args:
            self: The instance of the class.
            config: An object containing configuration settings.
            dim: An integer giving the number of input channels of the block.
            drop_path: A float giving the stochastic depth (drop path) rate for the residual branch.

        Returns:
            None.

        Raises:
            KeyError: If config.hidden_act is not found in ACT2FN.
            TypeError: If config.layer_scale_init_value is not a number.
        '''
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, pad_mode='pad', group=dim, bias=True)  # depthwise conv
        self.layernorm = ConvNextLayerNorm(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise/1x1 convs, implemented with linear layers
        self.act = ACT2FN[config.hidden_act]
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.layer_scale_parameter = (
            Parameter(config.layer_scale_init_value * ops.ones((dim)), requires_grad=True)
            if config.layer_scale_init_value > 0
            else None
        )
        self.drop_path = ConvNextDropPath(drop_path) if drop_path > 0.0 else nn.Identity()

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        '''
        Forward computation of the ConvNextLayer block.

        Args:
            self: ConvNextLayer instance.
            hidden_states (mindspore.Tensor): The input hidden states tensor.

        Returns:
            mindspore.Tensor: The output tensor after applying the convolutional layer operations.

        Raises:
            None.
        '''
        input = hidden_states
        x = self.dwconv(hidden_states)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.layernorm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.layer_scale_parameter is not None:
            x = self.layer_scale_parameter * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)

        x = input + self.drop_path(x)
        return x
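
Example

A shape-level sketch of the block: because it is a residual block, the output tensor has exactly the input shape. The dim=96 and 56x56 spatial size are arbitrary illustrative values (they happen to match the first stage of the tiny configuration).

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
>>> from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextLayer
...
>>> layer = ConvNextLayer(ConvNextConfig(), dim=96, drop_path=0.0)
>>> x = mindspore.Tensor(np.random.randn(1, 96, 56, 56), mindspore.float32)
>>> layer(x).shape
(1, 96, 56, 56)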

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayer.__init__(config, dim, drop_path=0)

Initializes the ConvNextLayer.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration settings.

dim

An integer giving the number of input channels of the block.

drop_path

A float giving the stochastic depth (drop path) rate for the residual branch.

DEFAULT: 0

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
KeyError

If config.hidden_act is not found in ACT2FN.

TypeError

If config.layer_scale_init_value is not a number.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config, dim, drop_path=0):
    '''
    Initializes the ConvNextLayer.

    Args:
        self: The instance of the class.
        config: An object containing configuration settings.
        dim: An integer giving the number of input channels of the block.
        drop_path: A float giving the stochastic depth (drop path) rate for the residual branch.

    Returns:
        None.

    Raises:
        KeyError: If config.hidden_act is not found in ACT2FN.
        TypeError: If config.layer_scale_init_value is not a number.
    '''
    super().__init__()
    self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, pad_mode='pad', group=dim, bias=True)  # depthwise conv
    self.layernorm = ConvNextLayerNorm(dim, eps=1e-6)
    self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise/1x1 convs, implemented with linear layers
    self.act = ACT2FN[config.hidden_act]
    self.pwconv2 = nn.Linear(4 * dim, dim)
    self.layer_scale_parameter = (
        Parameter(config.layer_scale_init_value * ops.ones((dim)), requires_grad=True)
        if config.layer_scale_init_value > 0
        else None
    )
    self.drop_path = ConvNextDropPath(drop_path) if drop_path > 0.0 else nn.Identity()

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayer.forward(hidden_states)

Forward computation of the ConvNextLayer block.

PARAMETER DESCRIPTION
self

ConvNextLayer instance.

hidden_states

The input hidden states tensor.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The output tensor after applying the convolutional layer operations.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    '''
    Forward computation of the ConvNextLayer block.

    Args:
        self: ConvNextLayer instance.
        hidden_states (mindspore.Tensor): The input hidden states tensor.

    Returns:
        mindspore.Tensor: The output tensor after applying the convolutional layer operations.

    Raises:
        None.
    '''
    input = hidden_states
    x = self.dwconv(hidden_states)
    x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
    x = self.layernorm(x)
    x = self.pwconv1(x)
    x = self.act(x)
    x = self.pwconv2(x)
    if self.layer_scale_parameter is not None:
        x = self.layer_scale_parameter * x
    x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)

    x = input + self.drop_path(x)
    return x

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayerNorm

Bases: Module

LayerNorm that supports two data formats, channels_last (default) and channels_first, referring to the ordering of the dimensions in the inputs: channels_last corresponds to inputs with shape (batch_size, height, width, channels), while channels_first corresponds to inputs with shape (batch_size, channels, height, width).

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextLayerNorm(nn.Module):
    r"""
    LayerNorm that supports two data formats, channels_last (default) and channels_first, referring to the ordering
    of the dimensions in the inputs: channels_last corresponds to inputs with shape (batch_size, height, width,
    channels), while channels_first corresponds to inputs with shape (batch_size, channels, height, width).
    """
    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        """
        Initializes an instance of the ConvNextLayerNorm class.

        Args:
            self: The object itself.
            normalized_shape (tuple): The shape of the input tensor normalized over the specified axes.
            eps (float, optional): A small value added to the denominator for numerical stability. Defaults to 1e-06.
            data_format (str, optional): The format of the input data. Must be either 'channels_last' or 'channels_first'.
                Defaults to 'channels_last'.

        Returns:
            None

        Raises:
            NotImplementedError: If the data format is not supported.

        """
        super().__init__()
        self.weight = Parameter(ops.ones(normalized_shape))
        self.bias = Parameter(ops.zeros(normalized_shape))
        self.eps = eps
        self.data_format = data_format
        if self.data_format not in ["channels_last", "channels_first"]:
            raise NotImplementedError(f"Unsupported data format: {self.data_format}")
        self.normalized_shape = (normalized_shape,)
        self.layer_norm = ops.LayerNorm(begin_norm_axis=-1,
                                        begin_params_axis=-1,
                                        epsilon=eps)

    def forward(self, x: mindspore.Tensor) -> mindspore.Tensor:
        """
        Applies layer normalization to the input tensor.

        Args:
            self (ConvNextLayerNorm): An instance of the ConvNextLayerNorm class.
            x (mindspore.Tensor): The input tensor to be normalized.

        Returns:
            mindspore.Tensor: The normalized tensor.

        Raises:
            None.
        """
        if self.data_format == "channels_last":
            x, _, _ = self.layer_norm(x, self.weight, self.bias)
        elif self.data_format == "channels_first":
            input_dtype = x.dtype
            x = x.float()
            u = x.mean(1, keep_dims=True)
            s = (x - u).pow(2).mean(1, keep_dims=True)
            x = (x - u) / ops.sqrt(s + self.eps)
            x = x.to(dtype=input_dtype)
            x = self.weight[:, None, None] * x + self.bias[:, None, None]
        return x
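
The channels_first branch above is per-channel normalization written out by hand. As a cross-check, the NumPy sketch below reproduces the same arithmetic; the function name is ours, for illustration only.

>>> import numpy as np
...
>>> def channels_first_layernorm(x, weight, bias, eps=1e-6):
...     # x: (N, C, H, W); statistics are taken over the channel axis, as in the branch above
...     u = x.mean(axis=1, keepdims=True)
...     s = ((x - u) ** 2).mean(axis=1, keepdims=True)
...     x_hat = (x - u) / np.sqrt(s + eps)
...     # weight/bias have shape (C,); (C, 1, 1) broadcasts over (N, C, H, W)
...     return weight[:, None, None] * x_hat + bias[:, None, None]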

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayerNorm.__init__(normalized_shape, eps=1e-06, data_format='channels_last')

Initializes an instance of the ConvNextLayerNorm class.

PARAMETER DESCRIPTION
self

The object itself.

normalized_shape

The shape of the input tensor normalized over the specified axes.

TYPE: tuple

eps

A small value added to the denominator for numerical stability. Defaults to 1e-06.

TYPE: float DEFAULT: 1e-06

data_format

The format of the input data. Must be either 'channels_last' or 'channels_first'. Defaults to 'channels_last'.

TYPE: str DEFAULT: 'channels_last'

RETURNS DESCRIPTION

None

RAISES DESCRIPTION
NotImplementedError

If the data format is not supported.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
    """
    Initializes an instance of the ConvNextLayerNorm class.

    Args:
        self: The object itself.
        normalized_shape (tuple): The shape of the input tensor normalized over the specified axes.
        eps (float, optional): A small value added to the denominator for numerical stability. Defaults to 1e-06.
        data_format (str, optional): The format of the input data. Must be either 'channels_last' or 'channels_first'.
            Defaults to 'channels_last'.

    Returns:
        None

    Raises:
        NotImplementedError: If the data format is not supported.

    """
    super().__init__()
    self.weight = Parameter(ops.ones(normalized_shape))
    self.bias = Parameter(ops.zeros(normalized_shape))
    self.eps = eps
    self.data_format = data_format
    if self.data_format not in ["channels_last", "channels_first"]:
        raise NotImplementedError(f"Unsupported data format: {self.data_format}")
    self.normalized_shape = (normalized_shape,)
    self.layer_norm = ops.LayerNorm(begin_norm_axis=-1,
                                    begin_params_axis=-1,
                                    epsilon=eps)

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextLayerNorm.forward(x)

Applies layer normalization to the input tensor.

PARAMETER DESCRIPTION
self

An instance of the ConvNextLayerNorm class.

TYPE: ConvNextLayerNorm

x

The input tensor to be normalized.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The normalized tensor.


Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(self, x: mindspore.Tensor) -> mindspore.Tensor:
    """
    Applies layer normalization to the input tensor.

    Args:
        self (ConvNextLayerNorm): An instance of the ConvNextLayerNorm class.
        x (mindspore.Tensor): The input tensor to be normalized.

    Returns:
        mindspore.Tensor: The normalized tensor.

    Raises:
        None.
    """
    if self.data_format == "channels_last":
        x, _, _ = self.layer_norm(x, self.weight, self.bias)
    elif self.data_format == "channels_first":
        input_dtype = x.dtype
        x = x.float()
        u = x.mean(1, keep_dims=True)
        s = (x - u).pow(2).mean(1, keep_dims=True)
        x = (x - u) / ops.sqrt(s + self.eps)
        x = x.to(dtype=input_dtype)
        x = self.weight[:, None, None] * x + self.bias[:, None, None]
    return x

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextModel

Bases: ConvNextPreTrainedModel

The ConvNextModel class represents a ConvNeXT model for image processing tasks. It inherits from ConvNextPreTrainedModel and includes methods for model initialization and the forward computation.

The __init__ method initializes the ConvNextModel with the provided configuration. It sets up the embeddings, encoder, and layer normalization based on the configuration parameters.

The forward method processes the input pixel values using the embeddings and encoder, and returns the last hidden state and pooled output. It allows for customization of returning hidden states and outputs as specified in the configuration parameters.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextModel(ConvNextPreTrainedModel):

    """
    The ConvNextModel class represents a ConvNeXT model for image processing tasks.
    It inherits from ConvNextPreTrainedModel and includes methods for model initialization and the forward computation.

    The __init__ method initializes the ConvNextModel with the provided configuration.
    It sets up the embeddings, encoder, and layer normalization based on the configuration parameters.

    The forward method processes the input pixel values using the embeddings and encoder, and returns
    the last hidden state and pooled output. It allows for customization of returning hidden states and outputs
    as specified in the configuration parameters.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the ConvNextModel class.

        Args:
            self: The instance of the ConvNextModel class.
            config: The `ConvNextConfig` instance containing the configuration parameters for the model.

        Returns:
            None

        Raises:
            None.
        """
        super().__init__(config)
        self.config = config

        self.embeddings = ConvNextEmbeddings(config)
        self.encoder = ConvNextEncoder(config)

        # final layernorm layer
        self.layernorm = nn.LayerNorm(config.hidden_sizes[-1], eps=config.layer_norm_eps)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: mindspore.Tensor = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]:
        """
        Runs the ConvNeXT model on the given pixel values.

        Args:
            self (ConvNextModel): The instance of the ConvNextModel class.
            pixel_values (mindspore.Tensor): The input pixel values. It should be a tensor.
            output_hidden_states (Optional[bool]): Whether or not to output hidden states. Defaults to None.
            return_dict (Optional[bool]): Whether or not to use a return dictionary. Defaults to None.

        Returns:
            Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: The ConvNextModel output.
                It can be either a tuple or an instance of BaseModelOutputWithPoolingAndNoAttention.

        Raises:
            ValueError: If pixel_values is not specified.

        Note:
            - If output_hidden_states is not provided, it defaults to the value specified in the configuration.
            - If return_dict is not provided, it defaults to the value specified in the configuration.
            - The returned value may contain the last hidden state, pooled output, and additional encoder outputs.

        """
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if pixel_values is None:
            raise ValueError("You have to specify pixel_values")

        embedding_output = self.embeddings(pixel_values)

        encoder_outputs = self.encoder(
            embedding_output,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = encoder_outputs[0]

        # global average pooling, (N, C, H, W) -> (N, C)
        pooled_output = self.layernorm(last_hidden_state.mean([-2, -1]))

        if not return_dict:
            return (last_hidden_state, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndNoAttention(
            last_hidden_state=last_hidden_state,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
        )
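
Example

A minimal feature-extraction sketch with random weights and the default (convnext-tiny-224 style) configuration. For a 224x224 input, the stem downsamples by 4 and each of the three later stages by 2, giving a 7x7 final feature map with 768 channels.

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
>>> from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextModel
...
>>> model = ConvNextModel(ConvNextConfig())  # random weights, illustrative only
>>> pixel_values = mindspore.Tensor(np.random.randn(1, 3, 224, 224), mindspore.float32)
>>> out = model(pixel_values, return_dict=True)
>>> out.last_hidden_state.shape
(1, 768, 7, 7)
>>> out.pooler_output.shape  # global average pooling followed by the final layernorm
(1, 768)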

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextModel.__init__(config)

Initializes a new instance of the ConvNextModel class.

PARAMETER DESCRIPTION
self

The instance of the ConvNextModel class.

config

The `ConvNextConfig` instance containing the configuration parameters for the model.

RETURNS DESCRIPTION

None


Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config):
    """
    Initializes a new instance of the ConvNextModel class.

    Args:
        self: The instance of the ConvNextModel class.
        config: The `ConvNextConfig` instance containing the configuration parameters for the model.

    Returns:
        None

    Raises:
        None.
    """
    super().__init__(config)
    self.config = config

    self.embeddings = ConvNextEmbeddings(config)
    self.encoder = ConvNextEncoder(config)

    # final layernorm layer
    self.layernorm = nn.LayerNorm(config.hidden_sizes[-1], eps=config.layer_norm_eps)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextModel.forward(pixel_values=None, output_hidden_states=None, return_dict=None)

Runs the ConvNeXT model on the given pixel values.

PARAMETER DESCRIPTION
self

The instance of the ConvNextModel class.

TYPE: ConvNextModel

pixel_values

The input pixel values. It should be a tensor.

TYPE: Tensor DEFAULT: None

output_hidden_states

Whether or not to output hidden states. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether or not to use a return dictionary. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]

Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: The ConvNextModel output. It can be either a tuple or an instance of BaseModelOutputWithPoolingAndNoAttention.

RAISES DESCRIPTION
ValueError

If pixel_values is not specified.

Note
  • If output_hidden_states is not provided, it defaults to the value specified in the configuration.
  • If return_dict is not provided, it defaults to the value specified in the configuration.
  • The returned value may contain the last hidden state, pooled output, and additional encoder outputs.
Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(
    self,
    pixel_values: mindspore.Tensor = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]:
    """
    Runs the ConvNeXT model on the given pixel values.

    Args:
        self (ConvNextModel): The instance of the ConvNextModel class.
        pixel_values (mindspore.Tensor): The input pixel values. It should be a tensor.
        output_hidden_states (Optional[bool]): Whether or not to output hidden states. Defaults to None.
        return_dict (Optional[bool]): Whether or not to use a return dictionary. Defaults to None.

    Returns:
        Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: The ConvNextModel output.
            It can be either a tuple or an instance of BaseModelOutputWithPoolingAndNoAttention.

    Raises:
        ValueError: If pixel_values is not specified.

    Note:
        - If output_hidden_states is not provided, it defaults to the value specified in the configuration.
        - If return_dict is not provided, it defaults to the value specified in the configuration.
        - The returned value may contain the last hidden state, pooled output, and additional encoder outputs.

    """
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if pixel_values is None:
        raise ValueError("You have to specify pixel_values")

    embedding_output = self.embeddings(pixel_values)

    encoder_outputs = self.encoder(
        embedding_output,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = encoder_outputs[0]

    # global average pooling, (N, C, H, W) -> (N, C)
    pooled_output = self.layernorm(last_hidden_state.mean([-2, -1]))

    if not return_dict:
        return (last_hidden_state, pooled_output) + encoder_outputs[1:]

    return BaseModelOutputWithPoolingAndNoAttention(
        last_hidden_state=last_hidden_state,
        pooler_output=pooled_output,
        hidden_states=encoder_outputs.hidden_states,
    )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = ConvNextConfig
    base_model_prefix = "convnext"
    main_input_name = "pixel_values"
    _no_split_modules = ["ConvNextLayer"]

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            module.weight.initialize(Normal(self.config.initializer_range))
            if module.bias is not None:
                module.bias.initialize('zeros')
        elif isinstance(module, nn.LayerNorm):
            module.bias.initialize('zeros')
            module.weight.initialize('ones')

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextStage

Bases: Module

ConvNeXT stage, consisting of an optional downsampling layer + multiple residual blocks.

PARAMETER DESCRIPTION
config

Model configuration class.

TYPE: [`ConvNextConfig`]

in_channels

Number of input channels.

TYPE: `int`

out_channels

Number of output channels.

TYPE: `int`

depth

Number of residual blocks.

TYPE: `int` DEFAULT: 2

drop_path_rates (`List[float]`)

Stochastic depth rates for each layer.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
class ConvNextStage(nn.Module):
    """ConvNeXT stage, consisting of an optional downsampling layer + multiple residual blocks.

    Args:
        config ([`ConvNextConfig`]): Model configuration class.
        in_channels (`int`): Number of input channels.
        out_channels (`int`): Number of output channels.
        depth (`int`): Number of residual blocks.
        drop_path_rates (`List[float]`): Stochastic depth rates for each layer.
    """
    def __init__(self, config, in_channels, out_channels, kernel_size=2, stride=2, depth=2, drop_path_rates=None):
        """
        Initializes a ConvNextStage object with the provided configuration.

        Args:
            self (ConvNextStage): The ConvNextStage object itself.
            config (any): The configuration settings for the ConvNextStage.
            in_channels (int): The number of input channels.
            out_channels (int): The number of output channels.
            kernel_size (int, optional): The size of the convolutional kernel. Defaults to 2.
            stride (int, optional): The stride of the convolution operation. Defaults to 2.
            depth (int): The depth of the ConvNextStage.
            drop_path_rates (list, optional): A list of stochastic depth (drop path) rates for each block in the stage. Defaults to None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        if in_channels != out_channels or stride > 1:
            self.downsampling_layer = nn.SequentialCell(
                ConvNextLayerNorm(in_channels, eps=1e-6, data_format="channels_first"),
                nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, pad_mode='valid', bias=True),
            )
        else:
            self.downsampling_layer = nn.Identity()
        drop_path_rates = drop_path_rates or [0.0] * depth
        self.layers = nn.SequentialCell(
            *[ConvNextLayer(config, dim=out_channels, drop_path=drop_path_rates[j]) for j in range(depth)]
        )

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Applies the stage's downsampling layer and residual blocks to the input hidden states.

        Args:
            self (ConvNextStage): An instance of the ConvNextStage class.
            hidden_states (mindspore.Tensor): The input tensor representing the hidden states.

        Returns:
            mindspore.Tensor: The tensor representing the output hidden states after the next stage.

        Raises:
            None.
        """
        hidden_states = self.downsampling_layer(hidden_states)
        hidden_states = self.layers(hidden_states)
        return hidden_states
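
Example

A shape-level sketch of a stage with an active downsampling layer (in_channels != out_channels triggers the LayerNorm + strided conv above), using illustrative channel counts from the tiny configuration.

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.convnext.configuration_convnext import ConvNextConfig
>>> from mindnlp.transformers.models.convnext.modeling_convnext import ConvNextStage
...
>>> stage = ConvNextStage(ConvNextConfig(), in_channels=96, out_channels=192, depth=3)
>>> x = mindspore.Tensor(np.random.randn(1, 96, 56, 56), mindspore.float32)
>>> stage(x).shape  # kernel_size=2, stride=2 halves the spatial resolution
(1, 192, 28, 28)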

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextStage.__init__(config, in_channels, out_channels, kernel_size=2, stride=2, depth=2, drop_path_rates=None)

Initializes a ConvNextStage object with the provided configuration.

PARAMETER DESCRIPTION
self

The ConvNextStage object itself.

TYPE: ConvNextStage

config

The configuration settings for the ConvNextStage.

TYPE: any

in_channels

The number of input channels.

TYPE: int

out_channels

The number of output channels.

TYPE: int

kernel_size

The size of the convolutional kernel. Defaults to 2.

TYPE: int DEFAULT: 2

stride

The stride of the convolution operation. Defaults to 2.

TYPE: int DEFAULT: 2

depth

The depth of the ConvNextStage.

TYPE: int DEFAULT: 2

drop_path_rates

A list of stochastic depth (drop path) rates for each block in the stage. Defaults to None.

TYPE: list DEFAULT: None

RETURNS DESCRIPTION

None.


Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def __init__(self, config, in_channels, out_channels, kernel_size=2, stride=2, depth=2, drop_path_rates=None):
    """
    Initializes a ConvNextStage object with the provided configuration.

    Args:
        self (ConvNextStage): The ConvNextStage object itself.
        config (any): The configuration settings for the ConvNextStage.
        in_channels (int): The number of input channels.
        out_channels (int): The number of output channels.
        kernel_size (int, optional): The size of the convolutional kernel. Defaults to 2.
        stride (int, optional): The stride of the convolution operation. Defaults to 2.
        depth (int): The depth of the ConvNextStage.
        drop_path_rates (list, optional): A list of stochastic depth (drop path) rates for each block in the stage. Defaults to None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    if in_channels != out_channels or stride > 1:
        self.downsampling_layer = nn.SequentialCell(
            ConvNextLayerNorm(in_channels, eps=1e-6, data_format="channels_first"),
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, pad_mode='valid', bias=True),
        )
    else:
        self.downsampling_layer = nn.Identity()
    drop_path_rates = drop_path_rates or [0.0] * depth
    self.layers = nn.SequentialCell(
        *[ConvNextLayer(config, dim=out_channels, drop_path=drop_path_rates[j]) for j in range(depth)]
    )

mindnlp.transformers.models.convnext.modeling_convnext.ConvNextStage.forward(hidden_states)

Applies the stage's downsampling layer and residual blocks to the input hidden states.

PARAMETER DESCRIPTION
self

An instance of the ConvNextStage class.

TYPE: ConvNextStage

hidden_states

The input tensor representing the hidden states.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The tensor representing the output hidden states after the next stage.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Applies the stage's downsampling layer and residual blocks to the input hidden states.

    Args:
        self (ConvNextStage): An instance of the ConvNextStage class.
        hidden_states (mindspore.Tensor): The input tensor representing the hidden states.

    Returns:
        mindspore.Tensor: The tensor representing the output hidden states after the next stage.

    Raises:
        None.
    """
    hidden_states = self.downsampling_layer(hidden_states)
    hidden_states = self.layers(hidden_states)
    return hidden_states

mindnlp.transformers.models.convnext.modeling_convnext.drop_path(input, drop_prob=0.0, training=False)

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the argument.

Source code in mindnlp/transformers/models/convnext/modeling_convnext.py
def drop_path(input: mindspore.Tensor, drop_prob: float = 0.0, training: bool = False) -> mindspore.Tensor:
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks,
    however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the
    layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the
    argument.
    """
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + ops.rand(shape, dtype=input.dtype)
    random_tensor = random_tensor.floor()  # binarize
    output = input.div(keep_prob) * random_tensor
    return output
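
Example

A quick numerical illustration of the scaling: with drop_prob=0.5, each sample in the batch is either zeroed or multiplied by 1/keep_prob = 2, so the expected value of the output equals the input. The exact zero pattern depends on the random draw.

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.convnext.modeling_convnext import drop_path
...
>>> x = mindspore.Tensor(np.ones((4, 3, 2, 2)), mindspore.float32)
>>> y = drop_path(x, drop_prob=0.5, training=True)
>>> y[:, 0, 0, 0]  # per sample: either 0.0 or 1/keep_prob = 2.0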

mindnlp.transformers.models.convnext.image_processing_convnext

Image processor class for ConvNeXT.

mindnlp.transformers.models.convnext.image_processing_convnext.ConvNextImageProcessor

Bases: BaseImageProcessor

Constructs a ConvNeXT image processor.

PARAMETER DESCRIPTION
do_resize

Controls whether to resize the image's (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

size

Resolution of the output image after resize is applied. If size["shortest_edge"] >= 384, the image is resized to (size["shortest_edge"], size["shortest_edge"]). Otherwise, the smaller edge of the image will be matched to int(size["shortest_edge"]/crop_pct), after which the image is cropped to (size["shortest_edge"], size["shortest_edge"]). Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.

TYPE: `Dict[str, int]`, *optional*, defaults to `{"shortest_edge": 384}` DEFAULT: None

crop_pct

Percentage of the image to crop. Only has an effect if do_resize is True and size < 384. Can be overridden by crop_pct in the preprocess method.

TYPE: `float`, *optional*, defaults to 224 / 256 DEFAULT: None

resample

Resampling filter to use if resizing the image. Can be overridden by resample in the preprocess method.

TYPE: `PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR` DEFAULT: BILINEAR

do_rescale

Whether to rescale the image by the specified scale rescale_factor. Can be overridden by do_rescale in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

rescale_factor

Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.

TYPE: `int` or `float`, *optional*, defaults to `1/255` DEFAULT: 1 / 255

do_normalize

Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

image_mean

Mean to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.

TYPE: `float` or `List[float]`, *optional*, defaults to `IMAGENET_STANDARD_MEAN` DEFAULT: None

image_std

Standard deviation to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.

TYPE: `float` or `List[float]`, *optional*, defaults to `IMAGENET_STANDARD_STD` DEFAULT: None

Source code in mindnlp/transformers/models/convnext/image_processing_convnext.py
class ConvNextImageProcessor(BaseImageProcessor):
    r"""
    Constructs a ConvNeXT image processor.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Controls whether to resize the image's (height, width) dimensions to the specified `size`. Can be
            overridden by `do_resize` in the `preprocess` method.
        size (`Dict[str, int]`, *optional*, defaults to `{"shortest_edge": 384}`):
            Resolution of the output image after `resize` is applied. If `size["shortest_edge"]` >= 384, the image is
            resized to `(size["shortest_edge"], size["shortest_edge"])`. Otherwise, the smaller edge of the image will
            be matched to `int(size["shortest_edge"]/crop_pct)`, after which the image is cropped to
            `(size["shortest_edge"], size["shortest_edge"])`. Only has an effect if `do_resize` is set to `True`. Can
            be overridden by `size` in the `preprocess` method.
        crop_pct (`float`, *optional*, defaults to 224 / 256):
            Percentage of the image to crop. Only has an effect if `do_resize` is `True` and size < 384. Can be
            overridden by `crop_pct` in the `preprocess` method.
        resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
            Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method.
        do_rescale (`bool`, *optional*, defaults to `True`):
            Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by `do_rescale` in
            the `preprocess` method.
        rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
            Scale factor to use if rescaling the image. Can be overridden by `rescale_factor` in the `preprocess`
            method.
        do_normalize (`bool`, *optional*, defaults to `True`):
            Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess`
            method.
        image_mean (`float` or `List[float]`, *optional*, defaults to `IMAGENET_STANDARD_MEAN`):
            Mean to use if normalizing the image. This is a float or list of floats the length of the number of
            channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method.
        image_std (`float` or `List[float]`, *optional*, defaults to `IMAGENET_STANDARD_STD`):
            Standard deviation to use if normalizing the image. This is a float or list of floats the length of the
            number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method.
    """
    model_input_names = ["pixel_values"]

    def __init__(
        self,
        do_resize: bool = True,
        size: Dict[str, int] = None,
        crop_pct: float = None,
        resample: PILImageResampling = PILImageResampling.BILINEAR,
        do_rescale: bool = True,
        rescale_factor: Union[int, float] = 1 / 255,
        do_normalize: bool = True,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        **kwargs,
    ) -> None:
        """
        Initialize a ConvNextImageProcessor object.

        Args:
            self (object): The instance of the ConvNextImageProcessor class.
            do_resize (bool): A flag indicating whether to resize the input image. Default is True.
            size (Dict[str, int]): A dictionary specifying the size of the output image. Default is {'shortest_edge': 384}.
            crop_pct (float): The percentage of the image to be cropped. Default is 224 / 256.
            resample (PILImageResampling): The resampling method for image resizing. Default is PILImageResampling.BILINEAR.
            do_rescale (bool): A flag indicating whether to rescale the image. Default is True.
            rescale_factor (Union[int, float]): The factor by which to rescale the image. Default is 1 / 255.
            do_normalize (bool): A flag indicating whether to normalize the image. Default is True.
            image_mean (Optional[Union[float, List[float]]]): The mean values for image normalization.
                Default is IMAGENET_STANDARD_MEAN.
            image_std (Optional[Union[float, List[float]]]): The standard deviation values for image normalization.
                Default is IMAGENET_STANDARD_STD.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__(**kwargs)
        size = size if size is not None else {"shortest_edge": 384}
        size = get_size_dict(size, default_to_square=False)

        self.do_resize = do_resize
        self.size = size
        # Default value set here for backwards compatibility where the value in config is None
        self.crop_pct = crop_pct if crop_pct is not None else 224 / 256
        self.resample = resample
        self.do_rescale = do_rescale
        self.rescale_factor = rescale_factor
        self.do_normalize = do_normalize
        self.image_mean = image_mean if image_mean is not None else IMAGENET_STANDARD_MEAN
        self.image_std = image_std if image_std is not None else IMAGENET_STANDARD_STD
        self._valid_processor_keys = [
            "images",
            "do_resize",
            "size",
            "crop_pct",
            "resample",
            "do_rescale",
            "rescale_factor",
            "do_normalize",
            "image_mean",
            "image_std",
            "return_tensors",
            "data_format",
            "input_data_format",
        ]

    def resize(
        self,
        image: np.ndarray,
        size: Dict[str, int],
        crop_pct: float,
        resample: PILImageResampling = PILImageResampling.BICUBIC,
        data_format: Optional[Union[str, ChannelDimension]] = None,
        input_data_format: Optional[Union[str, ChannelDimension]] = None,
        **kwargs,
    ) -> np.ndarray:
        """
        Resize an image.

        Args:
            image (`np.ndarray`):
                Image to resize.
            size (`Dict[str, int]`):
                Dictionary of the form `{"shortest_edge": int}`, specifying the size of the output image. If
                `size["shortest_edge"]` >= 384 image is resized to `(size["shortest_edge"], size["shortest_edge"])`.
                Otherwise, the smaller edge of the image will be matched to `int(size["shortest_edge"] / crop_pct)`,
                after which the image is cropped to `(size["shortest_edge"], size["shortest_edge"])`.
            crop_pct (`float`):
                Percentage of the image to crop. Only has an effect if size < 384.
            resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
                Resampling filter to use when resizing the image.
            data_format (`str` or `ChannelDimension`, *optional*):
                The channel dimension format of the image. If not provided, it will be the same as the input image.
            input_data_format (`ChannelDimension` or `str`, *optional*):
                The channel dimension format of the input image. If not provided, it will be inferred from the input
                image.
        """
        size = get_size_dict(size, default_to_square=False)
        if "shortest_edge" not in size:
            raise ValueError(f"Size dictionary must contain 'shortest_edge' key. Got {size.keys()}")
        shortest_edge = size["shortest_edge"]

        if shortest_edge < 384:
            # maintain same ratio, resizing shortest edge to shortest_edge/crop_pct
            resize_shortest_edge = int(shortest_edge / crop_pct)
            resize_size = get_resize_output_image_size(
                image, size=resize_shortest_edge, default_to_square=False, input_data_format=input_data_format
            )
            image = resize(
                image=image,
                size=resize_size,
                resample=resample,
                data_format=data_format,
                input_data_format=input_data_format,
                **kwargs,
            )
            # then crop to (shortest_edge, shortest_edge)
            return center_crop(
                image=image,
                size=(shortest_edge, shortest_edge),
                data_format=data_format,
                input_data_format=input_data_format,
                **kwargs,
            )
        else:
            # warping (no cropping) when evaluated at 384 or larger
            return resize(
                image,
                size=(shortest_edge, shortest_edge),
                resample=resample,
                data_format=data_format,
                input_data_format=input_data_format,
                **kwargs,
            )

    def preprocess(
        self,
        images: ImageInput,
        do_resize: bool = None,
        size: Dict[str, int] = None,
        crop_pct: float = None,
        resample: PILImageResampling = None,
        do_rescale: bool = None,
        rescale_factor: float = None,
        do_normalize: bool = None,
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        return_tensors: Optional[Union[str, TensorType]] = None,
        data_format: ChannelDimension = ChannelDimension.FIRST,
        input_data_format: Optional[Union[str, ChannelDimension]] = None,
        **kwargs,
    ) -> PIL.Image.Image:
        """
        Preprocess an image or batch of images.

        Args:
            images (`ImageInput`):
                Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
                passing in images with pixel values between 0 and 1, set `do_rescale=False`.
            do_resize (`bool`, *optional*, defaults to `self.do_resize`):
                Whether to resize the image.
            size (`Dict[str, int]`, *optional*, defaults to `self.size`):
                Size of the output image after `resize` has been applied. If `size["shortest_edge"]` >= 384, the image
                is resized to `(size["shortest_edge"], size["shortest_edge"])`. Otherwise, the smaller edge of the
                image will be matched to `int(size["shortest_edge"]/ crop_pct)`, after which the image is cropped to
                `(size["shortest_edge"], size["shortest_edge"])`. Only has an effect if `do_resize` is set to `True`.
            crop_pct (`float`, *optional*, defaults to `self.crop_pct`):
                Percentage of the image to crop if size < 384.
            resample (`int`, *optional*, defaults to `self.resample`):
                Resampling filter to use if resizing the image. This can be one of the `PILImageResampling` filters.
                Only has an effect if `do_resize` is set to `True`.
            do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
                Whether to rescale the image values to between [0, 1].
            rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
                Rescale factor to rescale the image by if `do_rescale` is set to `True`.
            do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
                Whether to normalize the image.
            image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
                Image mean.
            image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
                Image standard deviation.
            return_tensors (`str` or `TensorType`, *optional*):
                The type of tensors to return. Can be one of:

                - Unset: Return a list of `np.ndarray`.
                - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
                - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
                - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
                - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
            data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
                The channel dimension format for the output image. Can be one of:

                - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
                - Unset: Use the channel dimension format of the input image.
            input_data_format (`ChannelDimension` or `str`, *optional*):
                The channel dimension format for the input image. If unset, the channel dimension format is inferred
                from the input image. Can be one of:

                - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
                - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
                - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
        """
        do_resize = do_resize if do_resize is not None else self.do_resize
        crop_pct = crop_pct if crop_pct is not None else self.crop_pct
        resample = resample if resample is not None else self.resample
        do_rescale = do_rescale if do_rescale is not None else self.do_rescale
        rescale_factor = rescale_factor if rescale_factor is not None else self.rescale_factor
        do_normalize = do_normalize if do_normalize is not None else self.do_normalize
        image_mean = image_mean if image_mean is not None else self.image_mean
        image_std = image_std if image_std is not None else self.image_std

        size = size if size is not None else self.size
        size = get_size_dict(size, default_to_square=False)

        validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self._valid_processor_keys)

        images = make_list_of_images(images)

        if not valid_images(images):
            raise ValueError(
                "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
                "torch.Tensor, tf.Tensor or jax.ndarray."
            )

        validate_preprocess_arguments(
            do_rescale=do_rescale,
            rescale_factor=rescale_factor,
            do_normalize=do_normalize,
            image_mean=image_mean,
            image_std=image_std,
            do_resize=do_resize,
            size=size,
            resample=resample,
        )

        # All transformations expect numpy arrays.
        images = [to_numpy_array(image) for image in images]

        if is_scaled_image(images[0]) and do_rescale:
            logger.warning_once(
                "It looks like you are trying to rescale already rescaled images. If the input"
                " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
            )

        if input_data_format is None:
            # We assume that all images have the same channel dimension format.
            input_data_format = infer_channel_dimension_format(images[0])

        if do_resize:
            images = [
                self.resize(
                    image=image, size=size, crop_pct=crop_pct, resample=resample, input_data_format=input_data_format
                )
                for image in images
            ]

        if do_rescale:
            images = [
                self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)
                for image in images
            ]

        if do_normalize:
            images = [
                self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
                for image in images
            ]

        images = [
            to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
        ]

        data = {"pixel_values": images}
        return BatchFeature(data=data, tensor_type=return_tensors)
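
For orientation, the whole pipeline can be exercised end to end. The sketch below is illustrative rather than taken from the source: the import path follows this page's module headers, the input is a random array, and `return_tensors="np"` is assumed to be supported as listed above; with the default `size={"shortest_edge": 384}`, the output batch has shape `(1, 3, 384, 384)`.

Example
>>> import numpy as np
>>> from mindnlp.transformers.models.convnext.image_processing_convnext import ConvNextImageProcessor
...
>>> # a single HWC uint8 image with pixel values in [0, 255]
>>> image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
...
>>> processor = ConvNextImageProcessor()
>>> batch = processor.preprocess(image, return_tensors="np")
>>> batch["pixel_values"].shape
(1, 3, 384, 384)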

mindnlp.transformers.models.convnext.image_processing_convnext.ConvNextImageProcessor.__init__(do_resize=True, size=None, crop_pct=None, resample=PILImageResampling.BILINEAR, do_rescale=True, rescale_factor=1 / 255, do_normalize=True, image_mean=None, image_std=None, **kwargs)

Initialize a ConvNextImageProcessor object.

PARAMETER DESCRIPTION
self

The instance of the ConvNextImageProcessor class.

TYPE: object

do_resize

A flag indicating whether to resize the input image. Default is True.

TYPE: bool DEFAULT: True

size

A dictionary specifying the size of the output image. Default is {'shortest_edge': 384}.

TYPE: Dict[str, int] DEFAULT: None

crop_pct

The percentage of the image to be cropped. Default is 224 / 256.

TYPE: float DEFAULT: None

resample

The resampling method for image resizing. Default is PILImageResampling.BILINEAR.

TYPE: PILImageResampling DEFAULT: BILINEAR

do_rescale

A flag indicating whether to rescale the image. Default is True.

TYPE: bool DEFAULT: True

rescale_factor

The factor by which to rescale the image. Default is 1 / 255.

TYPE: Union[int, float] DEFAULT: 1 / 255

do_normalize

A flag indicating whether to normalize the image. Default is True.

TYPE: bool DEFAULT: True

image_mean

The mean values for image normalization. Default is IMAGENET_STANDARD_MEAN.

TYPE: Optional[Union[float, List[float]]] DEFAULT: None

image_std

The standard deviation values for image normalization. Default is IMAGENET_STANDARD_STD.

TYPE: Optional[Union[float, List[float]]] DEFAULT: None

RETURNS DESCRIPTION
None

None.

Source code in mindnlp/transformers/models/convnext/image_processing_convnext.py
def __init__(
    self,
    do_resize: bool = True,
    size: Dict[str, int] = None,
    crop_pct: float = None,
    resample: PILImageResampling = PILImageResampling.BILINEAR,
    do_rescale: bool = True,
    rescale_factor: Union[int, float] = 1 / 255,
    do_normalize: bool = True,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    **kwargs,
) -> None:
    """
    Initialize a ConvNextImageProcessor object.

    Args:
        self (object): The instance of the ConvNextImageProcessor class.
        do_resize (bool): A flag indicating whether to resize the input image. Default is True.
        size (Dict[str, int]): A dictionary specifying the size of the output image. Default is {'shortest_edge': 384}.
        crop_pct (float): The percentage of the image to be cropped. Default is 224 / 256.
        resample (PILImageResampling): The resampling method for image resizing. Default is PILImageResampling.BILINEAR.
        do_rescale (bool): A flag indicating whether to rescale the image. Default is True.
        rescale_factor (Union[int, float]): The factor by which to rescale the image. Default is 1 / 255.
        do_normalize (bool): A flag indicating whether to normalize the image. Default is True.
        image_mean (Optional[Union[float, List[float]]]): The mean values for image normalization.
            Default is IMAGENET_STANDARD_MEAN.
        image_std (Optional[Union[float, List[float]]]): The standard deviation values for image normalization.
            Default is IMAGENET_STANDARD_STD.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__(**kwargs)
    size = size if size is not None else {"shortest_edge": 384}
    size = get_size_dict(size, default_to_square=False)

    self.do_resize = do_resize
    self.size = size
    # Default value set here for backwards compatibility where the value in config is None
    self.crop_pct = crop_pct if crop_pct is not None else 224 / 256
    self.resample = resample
    self.do_rescale = do_rescale
    self.rescale_factor = rescale_factor
    self.do_normalize = do_normalize
    self.image_mean = image_mean if image_mean is not None else IMAGENET_STANDARD_MEAN
    self.image_std = image_std if image_std is not None else IMAGENET_STANDARD_STD
    self._valid_processor_keys = [
        "images",
        "do_resize",
        "size",
        "crop_pct",
        "resample",
        "do_rescale",
        "rescale_factor",
        "do_normalize",
        "image_mean",
        "image_std",
        "return_tensors",
        "data_format",
        "input_data_format",
    ]
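
A short constructor sketch follows (hedged: it only reads back the defaults set in `__init__` above; the import path follows this page's module headers):

Example
>>> from mindnlp.transformers.models.convnext.image_processing_convnext import ConvNextImageProcessor
...
>>> # defaults: size={"shortest_edge": 384}, crop_pct=224 / 256
>>> processor = ConvNextImageProcessor()
>>> processor.size
{'shortest_edge': 384}
>>> processor.crop_pct
0.875
...
>>> # a 224px evaluation setup: shorter side resized to int(224 / 0.875) = 256, then center-cropped to 224
>>> small = ConvNextImageProcessor(size={"shortest_edge": 224})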

mindnlp.transformers.models.convnext.image_processing_convnext.ConvNextImageProcessor.preprocess(images, do_resize=None, size=None, crop_pct=None, resample=None, do_rescale=None, rescale_factor=None, do_normalize=None, image_mean=None, image_std=None, return_tensors=None, data_format=ChannelDimension.FIRST, input_data_format=None, **kwargs)

Preprocess an image or batch of images.

PARAMETER DESCRIPTION
images

Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.

TYPE: `ImageInput`

do_resize

Whether to resize the image.

TYPE: `bool`, *optional*, defaults to `self.do_resize` DEFAULT: None

size

Size of the output image after resize has been applied. If size["shortest_edge"] >= 384, the image is resized to (size["shortest_edge"], size["shortest_edge"]). Otherwise, the smaller edge of the image will be matched to int(size["shortest_edge"] / crop_pct), after which the image is cropped to (size["shortest_edge"], size["shortest_edge"]). Only has an effect if do_resize is set to True.

TYPE: `Dict[str, int]`, *optional*, defaults to `self.size` DEFAULT: None

crop_pct

Percentage of the image to crop if size["shortest_edge"] < 384.

TYPE: `float`, *optional*, defaults to `self.crop_pct` DEFAULT: None

resample

Resampling filter to use if resizing the image. This can be one of the PILImageResampling filters. Only has an effect if do_resize is set to True.

TYPE: `int`, *optional*, defaults to `self.resample` DEFAULT: None

do_rescale

Whether to rescale the image values to the [0, 1] range.

TYPE: `bool`, *optional*, defaults to `self.do_rescale` DEFAULT: None

rescale_factor

Rescale factor to rescale the image by if do_rescale is set to True.

TYPE: `float`, *optional*, defaults to `self.rescale_factor` DEFAULT: None

do_normalize

Whether to normalize the image.

TYPE: `bool`, *optional*, defaults to `self.do_normalize` DEFAULT: None

image_mean

Image mean.

TYPE: `float` or `List[float]`, *optional*, defaults to `self.image_mean` DEFAULT: None

image_std

Image standard deviation.

TYPE: `float` or `List[float]`, *optional*, defaults to `self.image_std` DEFAULT: None

return_tensors

The type of tensors to return. Can be one of:

  • Unset: Return a list of np.ndarray.
  • TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
  • TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
  • TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
  • TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.

TYPE: `str` or `TensorType`, *optional* DEFAULT: None

data_format

The channel dimension format for the output image. Can be one of:

  • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • Unset: Use the channel dimension format of the input image.

TYPE: `ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST` DEFAULT: FIRST

input_data_format

The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:

  • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
  • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
  • "none" or ChannelDimension.NONE: image in (height, width) format.

TYPE: `ChannelDimension` or `str`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/convnext/image_processing_convnext.py
def preprocess(
    self,
    images: ImageInput,
    do_resize: bool = None,
    size: Dict[str, int] = None,
    crop_pct: float = None,
    resample: PILImageResampling = None,
    do_rescale: bool = None,
    rescale_factor: float = None,
    do_normalize: bool = None,
    image_mean: Optional[Union[float, List[float]]] = None,
    image_std: Optional[Union[float, List[float]]] = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
    data_format: ChannelDimension = ChannelDimension.FIRST,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    **kwargs,
) -> BatchFeature:
    """
    Preprocess an image or batch of images.

    Args:
        images (`ImageInput`):
            Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to
            255. If passing in images with pixel values between 0 and 1, set `do_rescale=False`.
        do_resize (`bool`, *optional*, defaults to `self.do_resize`):
            Whether to resize the image.
        size (`Dict[str, int]`, *optional*, defaults to `self.size`):
            Size of the output image after `resize` has been applied. If `size["shortest_edge"]` >= 384, the image
            is resized to `(size["shortest_edge"], size["shortest_edge"])`. Otherwise, the smaller edge of the
            image will be matched to `int(size["shortest_edge"] / crop_pct)`, after which the image is cropped to
            `(size["shortest_edge"], size["shortest_edge"])`. Only has an effect if `do_resize` is set to `True`.
        crop_pct (`float`, *optional*, defaults to `self.crop_pct`):
            Percentage of the image to crop if `size["shortest_edge"]` < 384.
        resample (`int`, *optional*, defaults to `self.resample`):
            Resampling filter to use if resizing the image. This can be one of the `PILImageResampling`
            filters. Only has an effect if `do_resize` is set to `True`.
        do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
            Whether to rescale the image values to the [0, 1] range.
        rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
            Rescale factor to rescale the image by if `do_rescale` is set to `True`.
        do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
            Whether to normalize the image.
        image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
            Image mean.
        image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
            Image standard deviation.
        return_tensors (`str` or `TensorType`, *optional*):
            The type of tensors to return. Can be one of:

            - Unset: Return a list of `np.ndarray`.
            - `TensorType.TENSORFLOW` or `'tf'`: Return a batch of type `tf.Tensor`.
            - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`.
            - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`.
            - `TensorType.JAX` or `'jax'`: Return a batch of type `jax.numpy.ndarray`.
        data_format (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`):
            The channel dimension format for the output image. Can be one of:

            - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            - Unset: Use the channel dimension format of the input image.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format for the input image. If unset, the channel dimension format is inferred
            from the input image. Can be one of:

            - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
            - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
            - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
    """
    do_resize = do_resize if do_resize is not None else self.do_resize
    crop_pct = crop_pct if crop_pct is not None else self.crop_pct
    resample = resample if resample is not None else self.resample
    do_rescale = do_rescale if do_rescale is not None else self.do_rescale
    rescale_factor = rescale_factor if rescale_factor is not None else self.rescale_factor
    do_normalize = do_normalize if do_normalize is not None else self.do_normalize
    image_mean = image_mean if image_mean is not None else self.image_mean
    image_std = image_std if image_std is not None else self.image_std

    size = size if size is not None else self.size
    size = get_size_dict(size, default_to_square=False)

    validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self._valid_processor_keys)

    images = make_list_of_images(images)

    if not valid_images(images):
        raise ValueError(
            "Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
            "torch.Tensor, tf.Tensor or jax.ndarray."
        )

    validate_preprocess_arguments(
        do_rescale=do_rescale,
        rescale_factor=rescale_factor,
        do_normalize=do_normalize,
        image_mean=image_mean,
        image_std=image_std,
        do_resize=do_resize,
        size=size,
        resample=resample,
    )

    # All transformations expect numpy arrays.
    images = [to_numpy_array(image) for image in images]

    if is_scaled_image(images[0]) and do_rescale:
        logger.warning_once(
            "It looks like you are trying to rescale already rescaled images. If the input"
            " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
        )

    if input_data_format is None:
        # We assume that all images have the same channel dimension format.
        input_data_format = infer_channel_dimension_format(images[0])

    if do_resize:
        images = [
            self.resize(
                image=image, size=size, crop_pct=crop_pct, resample=resample, input_data_format=input_data_format
            )
            for image in images
        ]

    if do_rescale:
        images = [
            self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)
            for image in images
        ]

    if do_normalize:
        images = [
            self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
            for image in images
        ]

    images = [
        to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
    ]

    data = {"pixel_values": images}
    return BatchFeature(data=data, tensor_type=return_tensors)
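
Per-call arguments override the instance defaults, which matters when the input is already scaled to [0, 1]. A minimal sketch (the float input is illustrative):

Example
>>> import numpy as np
>>> from mindnlp.transformers.models.convnext.image_processing_convnext import ConvNextImageProcessor
...
>>> processor = ConvNextImageProcessor()
>>> # float input already in [0, 1]: disable rescaling so values are not divided by 255 again
>>> scaled = np.random.rand(224, 224, 3).astype(np.float32)
>>> batch = processor.preprocess(scaled, do_rescale=False, return_tensors="np")
>>> batch["pixel_values"].shape
(1, 3, 384, 384)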

mindnlp.transformers.models.convnext.image_processing_convnext.ConvNextImageProcessor.resize(image, size, crop_pct, resample=PILImageResampling.BICUBIC, data_format=None, input_data_format=None, **kwargs)

Resize an image.

PARAMETER DESCRIPTION
image

Image to resize.

TYPE: `np.ndarray`

size

Dictionary of the form {"shortest_edge": int}, specifying the size of the output image. If size["shortest_edge"] >= 384, the image is resized to (size["shortest_edge"], size["shortest_edge"]). Otherwise, the smaller edge of the image will be matched to int(size["shortest_edge"] / crop_pct), after which the image is cropped to (size["shortest_edge"], size["shortest_edge"]).

TYPE: `Dict[str, int]`

crop_pct

Percentage of the image to crop. Only has an effect if size["shortest_edge"] < 384.

TYPE: `float`

resample

Resampling filter to use when resizing the image.

TYPE: `PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC` DEFAULT: BICUBIC

data_format

The channel dimension format of the image. If not provided, it will be the same as the input image.

TYPE: `str` or `ChannelDimension`, *optional* DEFAULT: None

input_data_format

The channel dimension format of the input image. If not provided, it will be inferred from the input image.

TYPE: `ChannelDimension` or `str`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/convnext/image_processing_convnext.py
def resize(
    self,
    image: np.ndarray,
    size: Dict[str, int],
    crop_pct: float,
    resample: PILImageResampling = PILImageResampling.BICUBIC,
    data_format: Optional[Union[str, ChannelDimension]] = None,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
    **kwargs,
) -> np.ndarray:
    """
    Resize an image.

    Args:
        image (`np.ndarray`):
            Image to resize.
        size (`Dict[str, int]`):
            Dictionary of the form `{"shortest_edge": int}`, specifying the size of the output image. If
            `size["shortest_edge"]` >= 384 image is resized to `(size["shortest_edge"], size["shortest_edge"])`.
            Otherwise, the smaller edge of the image will be matched to `int(size["shortest_edge"] / crop_pct)`,
            after which the image is cropped to `(size["shortest_edge"], size["shortest_edge"])`.
        crop_pct (`float`):
            Percentage of the image to crop. Only has an effect if `size["shortest_edge"]` < 384.
        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
            Resampling filter to use when resizing the image.
        data_format (`str` or `ChannelDimension`, *optional*):
            The channel dimension format of the image. If not provided, it will be the same as the input image.
        input_data_format (`ChannelDimension` or `str`, *optional*):
            The channel dimension format of the input image. If not provided, it will be inferred from the input
            image.
    """
    size = get_size_dict(size, default_to_square=False)
    if "shortest_edge" not in size:
        raise ValueError(f"Size dictionary must contain 'shortest_edge' key. Got {size.keys()}")
    shortest_edge = size["shortest_edge"]

    if shortest_edge < 384:
        # maintain same ratio, resizing shortest edge to shortest_edge/crop_pct
        resize_shortest_edge = int(shortest_edge / crop_pct)
        resize_size = get_resize_output_image_size(
            image, size=resize_shortest_edge, default_to_square=False, input_data_format=input_data_format
        )
        image = resize(
            image=image,
            size=resize_size,
            resample=resample,
            data_format=data_format,
            input_data_format=input_data_format,
            **kwargs,
        )
        # then crop to (shortest_edge, shortest_edge)
        return center_crop(
            image=image,
            size=(shortest_edge, shortest_edge),
            data_format=data_format,
            input_data_format=input_data_format,
            **kwargs,
        )
    else:
        # warping (no cropping) when evaluated at 384 or larger
        return resize(
            image,
            size=(shortest_edge, shortest_edge),
            resample=resample,
            data_format=data_format,
            input_data_format=input_data_format,
            **kwargs,
        )
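
Concretely, with the default crop_pct = 224 / 256 = 0.875, a request for shortest_edge=224 first resizes the shorter side to int(224 / 0.875) = 256 (preserving aspect ratio) and then center-crops to 224 x 224, while shortest_edge=384 skips cropping and warps directly to 384 x 384. A hedged sketch (the input shape is illustrative; the output stays channels-last because data_format is unset):

Example
>>> import numpy as np
>>> from mindnlp.transformers.models.convnext.image_processing_convnext import ConvNextImageProcessor
...
>>> processor = ConvNextImageProcessor()
>>> image = np.zeros((300, 500, 3), dtype=np.uint8)
...
>>> # shortest_edge < 384: shorter side -> int(224 / 0.875) = 256, then center crop to (224, 224)
>>> processor.resize(image, size={"shortest_edge": 224}, crop_pct=224 / 256).shape
(224, 224, 3)
...
>>> # shortest_edge >= 384: no crop, direct resize ("warp") to (384, 384)
>>> processor.resize(image, size={"shortest_edge": 384}, crop_pct=224 / 256).shape
(384, 384, 3)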