van

mindnlp.transformers.models.van.configuration_van

VAN model configuration

mindnlp.transformers.models.van.configuration_van.VanConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [VanModel]. It is used to instantiate a VAN model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the VAN Visual-Attention-Network/van-base architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
image_size

The size (resolution) of each image.

TYPE: `int`, *optional* DEFAULT: 224

num_channels

The number of input channels.

TYPE: `int`, *optional* DEFAULT: 3

patch_sizes

Patch size to use in each stage's embedding layer.

TYPE: `List[int]`, *optional* DEFAULT: [7, 3, 3, 3]

strides

Stride size to use in each stage's embedding layer to downsample the input.

TYPE: `List[int]`, *optional* DEFAULT: [4, 2, 2, 2]

hidden_sizes

Dimensionality (hidden size) at each stage.

TYPE: `List[int]`, *optional* DEFAULT: [64, 128, 320, 512]

depths

Depth (number of layers) for each stage.

TYPE: `List[int]`, *optional* DEFAULT: [3, 3, 12, 3]

mlp_ratios

The expansion ratio of the MLP layer at each stage.

TYPE: `List[int]`, *optional* DEFAULT: [8, 8, 4, 4]

hidden_act

The non-linear activation function (function or string) in each layer. If a string, "gelu", "relu", "selu" and "gelu_new" are supported.

TYPE: `str` or `function`, *optional* DEFAULT: 'gelu'

initializer_range

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional* DEFAULT: 0.02

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional* DEFAULT: 1e-06

layer_scale_init_value

The initial value for layer scaling.

TYPE: `float`, *optional* DEFAULT: 0.01

drop_path_rate

The drop probability for stochastic depth.

TYPE: `float`, *optional* DEFAULT: 0.0

dropout_rate

The probability used by the dropout layers.

TYPE: `float`, *optional* DEFAULT: 0.0

Example
>>> from mindnlp.transformers import VanModel, VanConfig
...
>>> # Initializing a VAN van-base style configuration
>>> configuration = VanConfig()
>>> # Initializing a model from the van-base style configuration
>>> model = VanModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
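
To see how `image_size`, `patch_sizes` and `strides` interact, the following sketch estimates the spatial size of the feature map after each stage. It is a rough illustration, not the library's code: the helper `stage_resolutions` is hypothetical, and it assumes each stage embeds its input with a convolution that uses `patch_size // 2` padding, which this page does not confirm.

```python
def stage_resolutions(image_size, patch_sizes, strides):
    """Return the spatial size of the feature map after each stage.

    Assumption (not confirmed by this page): each stage's embedding is a
    convolution with kernel `patch_size`, stride `stride` and padding
    `patch_size // 2`.
    """
    sizes = []
    size = image_size
    for kernel, stride in zip(patch_sizes, strides):
        padding = kernel // 2
        # Standard convolution output-size formula.
        size = (size + 2 * padding - kernel) // stride + 1
        sizes.append(size)
    return sizes


# Default VanConfig values from the parameter table above.
default_sizes = stage_resolutions(224, [7, 3, 3, 3], [4, 2, 2, 2])
print(default_sizes)  # [56, 28, 14, 7]
```

Under that assumption the defaults reduce a 224-pixel input by a factor of 4 in the first stage and by 2 in each later stage.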
Source code in mindnlp/transformers/models/van/configuration_van.py
class VanConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`VanModel`]. It is used to instantiate a VAN model
    according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the VAN
    [Visual-Attention-Network/van-base](https://huggingface.co/Visual-Attention-Network/van-base) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        image_size (`int`, *optional*, defaults to 224):
            The size (resolution) of each image.
        num_channels (`int`, *optional*, defaults to 3):
            The number of input channels.
        patch_sizes (`List[int]`, *optional*, defaults to `[7, 3, 3, 3]`):
            Patch size to use in each stage's embedding layer.
        strides (`List[int]`, *optional*, defaults to `[4, 2, 2, 2]`):
            Stride size to use in each stage's embedding layer to downsample the input.
        hidden_sizes (`List[int]`, *optional*, defaults to `[64, 128, 320, 512]`):
            Dimensionality (hidden size) at each stage.
        depths (`List[int]`, *optional*, defaults to `[3, 3, 12, 3]`):
            Depth (number of layers) for each stage.
        mlp_ratios (`List[int]`, *optional*, defaults to `[8, 8, 4, 4]`):
            The expansion ratio for mlp layer at each stage.
        hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (function or string) in each layer. If string, `"gelu"`, `"relu"`,
            `"selu"` and `"gelu_new"` are supported.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the layer normalization layers.
        layer_scale_init_value (`float`, *optional*, defaults to 0.01):
            The initial value for layer scaling.
        drop_path_rate (`float`, *optional*, defaults to 0.0):
            The dropout probability for stochastic depth.
        dropout_rate (`float`, *optional*, defaults to 0.0):
            The dropout probability for dropout.

    Example:
        ```python
        >>> from mindnlp.transformers import VanModel, VanConfig
        ...
        >>> # Initializing a VAN van-base style configuration
        >>> configuration = VanConfig()
        >>> # Initializing a model from the van-base style configuration
        >>> model = VanModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "van"

    def __init__(
        self,
        image_size=224,
        num_channels=3,
        patch_sizes=[7, 3, 3, 3],
        strides=[4, 2, 2, 2],
        hidden_sizes=[64, 128, 320, 512],
        depths=[3, 3, 12, 3],
        mlp_ratios=[8, 8, 4, 4],
        hidden_act="gelu",
        initializer_range=0.02,
        layer_norm_eps=1e-6,
        layer_scale_init_value=1e-2,
        drop_path_rate=0.0,
        dropout_rate=0.0,
        **kwargs,
    ):
        """
        Initializes a new instance of the VanConfig class.

        Args:
            self (object): The instance of the class.
            image_size (int): The size of the input image (default is 224).
            num_channels (int): The number of channels in the input image (default is 3).
            patch_sizes (list): List of patch sizes for each layer in the model (default is [7, 3, 3, 3]).
            strides (list): List of stride values for each layer in the model (default is [4, 2, 2, 2]).
            hidden_sizes (list): List of hidden layer sizes for each layer in the model (default is [64, 128, 320, 512]).
            depths (list): List of depths for each layer in the model (default is [3, 3, 12, 3]).
            mlp_ratios (list): List of MLP ratio values for each layer in the model (default is [8, 8, 4, 4]).
            hidden_act (str): The activation function to be used in hidden layers (default is 'gelu').
            initializer_range (float): The range for weight initialization (default is 0.02).
            layer_norm_eps (float): The epsilon value for layer normalization (default is 1e-06).
            layer_scale_init_value (float): The initial value for layer scale (default is 0.01).
            drop_path_rate (float): The rate for drop path regularization (default is 0.0).
            dropout_rate (float): The dropout rate (default is 0.0).

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(**kwargs)
        self.image_size = image_size
        self.num_channels = num_channels
        self.patch_sizes = patch_sizes
        self.strides = strides
        self.hidden_sizes = hidden_sizes
        self.depths = depths
        self.mlp_ratios = mlp_ratios
        self.hidden_act = hidden_act
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.layer_scale_init_value = layer_scale_init_value
        self.drop_path_rate = drop_path_rate
        self.dropout_rate = dropout_rate

mindnlp.transformers.models.van.configuration_van.VanConfig.__init__(image_size=224, num_channels=3, patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], hidden_sizes=[64, 128, 320, 512], depths=[3, 3, 12, 3], mlp_ratios=[8, 8, 4, 4], hidden_act='gelu', initializer_range=0.02, layer_norm_eps=1e-06, layer_scale_init_value=0.01, drop_path_rate=0.0, dropout_rate=0.0, **kwargs)

Initializes a new instance of the VanConfig class.

PARAMETER DESCRIPTION
self

The instance of the class.

TYPE: object

image_size

The size (resolution) of the input image.

TYPE: int DEFAULT: 224

num_channels

The number of channels in the input image.

TYPE: int DEFAULT: 3

patch_sizes

Patch size used by the embedding layer of each stage.

TYPE: list DEFAULT: [7, 3, 3, 3]

strides

Stride used by the embedding layer of each stage to downsample the input.

TYPE: list DEFAULT: [4, 2, 2, 2]

hidden_sizes

Hidden size (number of channels) of each stage.

TYPE: list DEFAULT: [64, 128, 320, 512]

depths

Number of layers in each stage.

TYPE: list DEFAULT: [3, 3, 12, 3]

mlp_ratios

MLP expansion ratio of each stage.

TYPE: list DEFAULT: [8, 8, 4, 4]

hidden_act

The activation function used in each layer.

TYPE: str DEFAULT: 'gelu'

initializer_range

The standard deviation of the truncated normal initializer used for the weight matrices.

TYPE: float DEFAULT: 0.02

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: float DEFAULT: 1e-06

layer_scale_init_value

The initial value for layer scaling.

TYPE: float DEFAULT: 0.01

drop_path_rate

The drop probability for stochastic depth.

TYPE: float DEFAULT: 0.0

dropout_rate

The probability used by the dropout layers.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION

None.
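
The constructor stores the per-stage lists as given, and the encoder's constructor later walks them with `zip`, which stops at the shortest input. A custom configuration whose lists differ in length would therefore lose stages without an error. The sketch below shows one way to check this before building a config; `check_stage_lists` is a hypothetical helper, not part of MindNLP.

```python
def check_stage_lists(**stage_lists):
    """Return the number of stages, or raise if the lists disagree in length.

    Hypothetical helper: pass the per-stage lists by keyword, e.g.
    patch_sizes=..., strides=..., hidden_sizes=..., depths=..., mlp_ratios=...
    """
    lengths = {name: len(values) for name, values in stage_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"per-stage lists differ in length: {lengths}")
    return next(iter(lengths.values()))


# A three-stage variant; the values are illustrative only.
num_stages = check_stage_lists(
    patch_sizes=[7, 3, 3],
    strides=[4, 2, 2],
    hidden_sizes=[32, 64, 160],
    depths=[2, 2, 4],
    mlp_ratios=[8, 8, 4],
)
print(num_stages)  # 3
```

The same keyword arguments could then be passed on to `VanConfig(...)` once the check succeeds.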

Source code in mindnlp/transformers/models/van/configuration_van.py
def __init__(
    self,
    image_size=224,
    num_channels=3,
    patch_sizes=[7, 3, 3, 3],
    strides=[4, 2, 2, 2],
    hidden_sizes=[64, 128, 320, 512],
    depths=[3, 3, 12, 3],
    mlp_ratios=[8, 8, 4, 4],
    hidden_act="gelu",
    initializer_range=0.02,
    layer_norm_eps=1e-6,
    layer_scale_init_value=1e-2,
    drop_path_rate=0.0,
    dropout_rate=0.0,
    **kwargs,
):
    """
    Initializes a new instance of the VanConfig class.

    Args:
        self (object): The instance of the class.
        image_size (int): The size of the input image (default is 224).
        num_channels (int): The number of channels in the input image (default is 3).
        patch_sizes (list): List of patch sizes for each layer in the model (default is [7, 3, 3, 3]).
        strides (list): List of stride values for each layer in the model (default is [4, 2, 2, 2]).
        hidden_sizes (list): List of hidden layer sizes for each layer in the model (default is [64, 128, 320, 512]).
        depths (list): List of depths for each layer in the model (default is [3, 3, 12, 3]).
        mlp_ratios (list): List of MLP ratio values for each layer in the model (default is [8, 8, 4, 4]).
        hidden_act (str): The activation function to be used in hidden layers (default is 'gelu').
        initializer_range (float): The range for weight initialization (default is 0.02).
        layer_norm_eps (float): The epsilon value for layer normalization (default is 1e-06).
        layer_scale_init_value (float): The initial value for layer scale (default is 0.01).
        drop_path_rate (float): The rate for drop path regularization (default is 0.0).
        dropout_rate (float): The dropout rate (default is 0.0).

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(**kwargs)
    self.image_size = image_size
    self.num_channels = num_channels
    self.patch_sizes = patch_sizes
    self.strides = strides
    self.hidden_sizes = hidden_sizes
    self.depths = depths
    self.mlp_ratios = mlp_ratios
    self.hidden_act = hidden_act
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.layer_scale_init_value = layer_scale_init_value
    self.drop_path_rate = drop_path_rate
    self.dropout_rate = dropout_rate

mindnlp.transformers.models.van.modeling_van

MindSpore Visual Attention Network (VAN) model.

mindnlp.transformers.models.van.modeling_van.VanDropPath

Bases: Module

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
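
"Per sample" means the decision to drop is made once for each item in the batch, not for each element. The pure-Python sketch below illustrates the idea on a list of rows. It is not the `drop_path` function the class calls, which is not shown on this page. It assumes the common formulation in which kept samples are rescaled by `1 / keep_prob`; the name `drop_path_rows` is hypothetical.

```python
import random


def drop_path_rows(rows, drop_prob=0.0, training=False, rng=random):
    """Per-sample stochastic depth on a list of rows (one row per sample).

    Assumption: kept rows are divided by keep_prob so the expected value
    is unchanged, as in the usual stochastic-depth formulation.
    """
    if not drop_prob or not training:
        # No dropping when the probability is 0/None or outside training.
        return [list(row) for row in rows]
    keep_prob = 1.0 - drop_prob
    output = []
    for row in rows:
        if rng.random() < keep_prob:
            output.append([value / keep_prob for value in row])  # keep and rescale
        else:
            output.append([0.0 for _ in row])  # drop the whole sample
    return output


batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
eval_output = drop_path_rows(batch, drop_prob=0.5, training=False)
train_output = drop_path_rows(batch, drop_prob=0.5, training=True, rng=random.Random(0))
```

In this sketch every row of `train_output` is either all zeros or the original row scaled by 2, while `eval_output` equals the input.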

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanDropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""
    def __init__(self, drop_prob: Optional[float] = None) -> None:
        """
        Initialize a new instance of the VanDropPath class.

        Args:
            self: The instance of the VanDropPath class.
            drop_prob (Optional[float]): The probability of dropping a path during training. 
                If set to None, no paths will be dropped. Should be a float value between 0 and 1, inclusive.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs a new tensor by applying drop_path operation to the given hidden states.

        Args:
            self (VanDropPath): An instance of the VanDropPath class.
            hidden_states (mindspore.Tensor): A tensor containing the hidden states.

        Returns:
            mindspore.Tensor: A tensor representing the output of the drop_path operation.

        Raises:
            None.

        Note:
            The drop_path operation acts per sample: during training it zeroes the whole tensor for a randomly
            chosen subset of samples in the batch instead of zeroing individual elements. This helps in
            regularizing the model and preventing overfitting. The drop probability is controlled by
            the 'drop_prob' attribute of the VanDropPath class.

        Example:
            ```python
            >>> drop_path = VanDropPath(drop_prob=0.1)
            >>> hidden_states = mindspore.Tensor([[1, 2, 3], [4, 5, 6]], mindspore.float32)
            >>> output = drop_path.forward(hidden_states)
            >>> # The values are random in training mode; the shape is always preserved.
            >>> print(output.shape)
            (2, 3)
            ```
        """
        return drop_path(hidden_states, self.drop_prob, self.training)

    def extra_repr(self) -> str:
        """
        Return a string representation of the probability of dropping nodes during training.

        Args:
            self (VanDropPath): An instance of the VanDropPath class.

        Returns:
            str: A string representation of the probability of dropping nodes during training.

        Raises:
            None.

        This method returns a formatted string representation of the drop probability of the VanDropPath instance.
        The drop probability is obtained from the `drop_prob` attribute of the instance. The returned string is of the
        form 'p={}', where '{}' is replaced by the actual drop probability value.

        Example:
            If the `drop_prob` attribute of the instance is 0.3, the method will return the string "p=0.3".
        """
        return "p={}".format(self.drop_prob)

mindnlp.transformers.models.van.modeling_van.VanDropPath.__init__(drop_prob=None)

Initialize a new instance of the VanDropPath class.

PARAMETER DESCRIPTION
self

The instance of the VanDropPath class.

drop_prob

The probability of dropping a path during training. If set to None, no paths will be dropped. Should be a float value between 0 and 1, inclusive.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
None

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, drop_prob: Optional[float] = None) -> None:
    """
    Initialize a new instance of the VanDropPath class.

    Args:
        self: The instance of the VanDropPath class.
        drop_prob (Optional[float]): The probability of dropping a path during training. 
            If set to None, no paths will be dropped. Should be a float value between 0 and 1, inclusive.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__()
    self.drop_prob = drop_prob

mindnlp.transformers.models.van.modeling_van.VanDropPath.extra_repr()

Return a string representation of the probability of dropping nodes during training.

PARAMETER DESCRIPTION
self

An instance of the VanDropPath class.

TYPE: VanDropPath

RETURNS DESCRIPTION
str

A string representation of the probability of dropping nodes during training.

TYPE: str

This method returns a formatted string representation of the drop probability of the VanDropPath instance. The drop probability is obtained from the drop_prob attribute of the instance. The returned string is of the form 'p={}', where '{}' is replaced by the actual drop probability value.

Example

If the drop_prob attribute of the instance is 0.3, the method will return the string "p=0.3".

Source code in mindnlp/transformers/models/van/modeling_van.py
def extra_repr(self) -> str:
    """
    Return a string representation of the probability of dropping nodes during training.

    Args:
        self (VanDropPath): An instance of the VanDropPath class.

    Returns:
        str: A string representation of the probability of dropping nodes during training.

    Raises:
        None.

    This method returns a formatted string representation of the drop probability of the VanDropPath instance.
    The drop probability is obtained from the `drop_prob` attribute of the instance. The returned string is of the
    form 'p={}', where '{}' is replaced by the actual drop probability value.

    Example:
        If the `drop_prob` attribute of the instance is 0.3, the method will return the string "p=0.3".
    """
    return "p={}".format(self.drop_prob)

mindnlp.transformers.models.van.modeling_van.VanDropPath.forward(hidden_states)

Constructs a new tensor by applying drop_path operation to the given hidden states.

PARAMETER DESCRIPTION
self

An instance of the VanDropPath class.

TYPE: VanDropPath

hidden_states

A tensor containing the hidden states.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

A tensor representing the output of the drop_path operation.

Note

The drop_path operation acts per sample: during training it zeroes the whole tensor for a randomly chosen subset of samples in the batch instead of zeroing individual elements. This helps in regularizing the model and preventing overfitting. The drop probability is controlled by the 'drop_prob' attribute of the VanDropPath class.

Example
>>> drop_path = VanDropPath(drop_prob=0.1)
>>> hidden_states = mindspore.Tensor([[1, 2, 3], [4, 5, 6]], mindspore.float32)
>>> output = drop_path.forward(hidden_states)
>>> # The values are random in training mode; the shape is always preserved.
>>> print(output.shape)
(2, 3)
Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs a new tensor by applying drop_path operation to the given hidden states.

    Args:
        self (VanDropPath): An instance of the VanDropPath class.
        hidden_states (mindspore.Tensor): A tensor containing the hidden states.

    Returns:
        mindspore.Tensor: A tensor representing the output of the drop_path operation.

    Raises:
        None.

    Note:
        The drop_path operation acts per sample: during training it zeroes the whole tensor for a randomly
        chosen subset of samples in the batch instead of zeroing individual elements. This helps in
        regularizing the model and preventing overfitting. The drop probability is controlled by
        the 'drop_prob' attribute of the VanDropPath class.

    Example:
        ```python
        >>> drop_path = VanDropPath(drop_prob=0.1)
        >>> hidden_states = mindspore.Tensor([[1, 2, 3], [4, 5, 6]], mindspore.float32)
        >>> output = drop_path.forward(hidden_states)
        >>> # The values are random in training mode; the shape is always preserved.
        >>> print(output.shape)
        (2, 3)
        ```
    """
    return drop_path(hidden_states, self.drop_prob, self.training)

mindnlp.transformers.models.van.modeling_van.VanEncoder

Bases: Module

VanEncoder, consisting of multiple stages.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanEncoder(nn.Module):
    """
    VanEncoder, consisting of multiple stages.
    """
    def __init__(self, config: VanConfig):
        """
        Initializes a VanEncoder object.

        Args:
            self: The instance of the class.
            config (VanConfig): An object containing configuration parameters for the VanEncoder.
                It includes the following attributes:

                - patch_sizes (List[int]): List of patch sizes for each stage.
                - strides (List[int]): List of stride values for each stage.
                - hidden_sizes (List[int]): List of hidden layer sizes for each stage.
                - depths (List[int]): List of depths for each stage.
                - mlp_ratios (List[int]): List of MLP expansion ratios for each stage.
                - drop_path_rate (float): Drop path rate for the encoder.

        Returns:
            None.

        Raises:
            None explicitly. The constructor does not validate `config`; a malformed config surfaces as whatever
            error the first failing attribute access or layer construction produces.
        """
        super().__init__()
        self.stages = nn.ModuleList([])
        patch_sizes = config.patch_sizes
        strides = config.strides
        hidden_sizes = config.hidden_sizes
        depths = config.depths
        mlp_ratios = config.mlp_ratios
        drop_path_rates = [x.item() for x in ops.linspace(0, config.drop_path_rate, sum(config.depths))]

        for num_stage, (patch_size, stride, hidden_size, depth, mlp_expantion, drop_path_rate) in enumerate(
            zip(patch_sizes, strides, hidden_sizes, depths, mlp_ratios, drop_path_rates)
        ):
            is_first_stage = num_stage == 0
            in_channels = hidden_sizes[num_stage - 1]
            if is_first_stage:
                in_channels = config.num_channels
            self.stages.append(
                VanStage(
                    config,
                    in_channels,
                    hidden_size,
                    patch_size=patch_size,
                    stride=stride,
                    depth=depth,
                    mlp_ratio=mlp_expantion,
                    drop_path_rate=drop_path_rate,
                )
            )

    def forward(
        self,
        hidden_state: mindspore.Tensor,
        output_hidden_states: Optional[bool] = False,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple, BaseModelOutputWithNoAttention]:
        """
        Construct method in the VanEncoder class.

        Args:
            self: The instance of the class.
            hidden_state (mindspore.Tensor): The input hidden state tensor.
            output_hidden_states (bool, optional): A flag indicating whether to output hidden states. Defaults to False.
            return_dict (bool, optional): A flag indicating whether to return the output as a dictionary. Defaults to True.

        Returns:
            Union[Tuple, BaseModelOutputWithNoAttention]: The forwarded output, which is either a tuple of hidden
                state and all hidden states or an instance of BaseModelOutputWithNoAttention.

        Raises:
            None
        """
        all_hidden_states = () if output_hidden_states else None

        for _, stage_module in enumerate(self.stages):
            hidden_state = stage_module(hidden_state)

            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_state,)

        if not return_dict:
            return tuple(v for v in [hidden_state, all_hidden_states] if v is not None)

        return BaseModelOutputWithNoAttention(last_hidden_state=hidden_state, hidden_states=all_hidden_states)

mindnlp.transformers.models.van.modeling_van.VanEncoder.__init__(config)

Initializes a VanEncoder object.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration parameters for the VanEncoder. It includes the following attributes:

  • patch_sizes (List[int]): List of patch sizes for each stage.
  • strides (List[int]): List of stride values for each stage.
  • hidden_sizes (List[int]): List of hidden layer sizes for each stage.
  • depths (List[int]): List of depths for each stage.
  • mlp_ratios (List[int]): List of MLP expansion ratios for each stage.
  • drop_path_rate (float): Drop path rate for the encoder.

TYPE: VanConfig
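
The constructor also reads `num_channels` from the config: the first stage takes the image channels as input, and every later stage takes the previous stage's hidden size. The sketch below reproduces only that wiring in plain Python; `stage_channels` is a hypothetical name used for illustration.

```python
def stage_channels(num_channels, hidden_sizes):
    """Return (in_channels, out_channels) for each stage."""
    pairs = []
    for index, out_channels in enumerate(hidden_sizes):
        # Stage 0 consumes the image; later stages consume the previous stage's output.
        in_channels = num_channels if index == 0 else hidden_sizes[index - 1]
        pairs.append((in_channels, out_channels))
    return pairs


channel_pairs = stage_channels(3, [64, 128, 320, 512])
print(channel_pairs)  # [(3, 64), (64, 128), (128, 320), (320, 512)]
```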

RETURNS DESCRIPTION

None.
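
The constructor derives its drop path values from a linear schedule with `sum(depths)` entries running from 0 to `drop_path_rate`. As the source listing reads, that schedule is passed to `zip` together with the per-stage lists, and `zip` stops at the shortest input, so each stage receives one of the first entries of the schedule. The sketch below replays that logic in plain Python. The `linspace` helper is a stand-in for `ops.linspace`, and the rate of 0.1 is an illustrative value.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for ops.linspace)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + step * index for index in range(num)]


depths = [3, 3, 12, 3]
schedule = linspace(0.0, 0.1, sum(depths))  # 21 values, one per layer

# zip truncates to the shortest input, so only the first len(depths) values are used.
stage_rates = [rate for _, rate in zip(depths, schedule)]
print(len(schedule), len(stage_rates))  # 21 4
```

With these inputs the four stages receive rates of roughly 0.0, 0.005, 0.01 and 0.015, well below the configured maximum of 0.1.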

RAISES DESCRIPTION

None explicitly. The constructor does not validate config; a malformed config surfaces as whatever error the first failing attribute access or layer construction produces.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, config: VanConfig):
    """
    Initializes a VanEncoder object.

    Args:
        self: The instance of the class.
        config (VanConfig): An object containing configuration parameters for the VanEncoder.
            It includes the following attributes:

            - patch_sizes (List[int]): List of patch sizes for each stage.
            - strides (List[int]): List of stride values for each stage.
            - hidden_sizes (List[int]): List of hidden layer sizes for each stage.
            - depths (List[int]): List of depths for each stage.
            - mlp_ratios (List[int]): List of MLP expansion ratios for each stage.
            - drop_path_rate (float): Drop path rate for the encoder.

    Returns:
        None.

    Raises:
        None explicitly. The constructor does not validate `config`; a malformed config surfaces as whatever
        error the first failing attribute access or layer construction produces.
    """
    super().__init__()
    self.stages = nn.ModuleList([])
    patch_sizes = config.patch_sizes
    strides = config.strides
    hidden_sizes = config.hidden_sizes
    depths = config.depths
    mlp_ratios = config.mlp_ratios
    drop_path_rates = [x.item() for x in ops.linspace(0, config.drop_path_rate, sum(config.depths))]

    for num_stage, (patch_size, stride, hidden_size, depth, mlp_expantion, drop_path_rate) in enumerate(
        zip(patch_sizes, strides, hidden_sizes, depths, mlp_ratios, drop_path_rates)
    ):
        is_first_stage = num_stage == 0
        in_channels = hidden_sizes[num_stage - 1]
        if is_first_stage:
            in_channels = config.num_channels
        self.stages.append(
            VanStage(
                config,
                in_channels,
                hidden_size,
                patch_size=patch_size,
                stride=stride,
                depth=depth,
                mlp_ratio=mlp_expantion,
                drop_path_rate=drop_path_rate,
            )
        )

mindnlp.transformers.models.van.modeling_van.VanEncoder.forward(hidden_state, output_hidden_states=False, return_dict=True)

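Runs the input through every stage in order and returns the final hidden state, optionally together with the hidden state produced by each stage.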

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_state

The input hidden state tensor.

TYPE: Tensor

output_hidden_states

A flag indicating whether to output hidden states. Defaults to False.

TYPE: bool DEFAULT: False

return_dict

A flag indicating whether to return the output as a dictionary. Defaults to True.

TYPE: bool DEFAULT: True
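
The two flags only change how the result is packaged. The sketch below mirrors the control flow of the method with plain callables standing in for the stages. `EncoderOutput` is a hypothetical stand-in for `BaseModelOutputWithNoAttention`, and `run_stages` is an illustrative name.

```python
from typing import NamedTuple, Optional, Tuple


class EncoderOutput(NamedTuple):
    """Hypothetical stand-in for BaseModelOutputWithNoAttention."""
    last_hidden_state: object
    hidden_states: Optional[Tuple] = None


def run_stages(stages, hidden_state, output_hidden_states=False, return_dict=True):
    """Apply each stage in turn, optionally collecting every intermediate state."""
    all_hidden_states = () if output_hidden_states else None
    for stage in stages:
        hidden_state = stage(hidden_state)
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_state,)
    if not return_dict:
        # Entries that are None are dropped from the tuple form.
        return tuple(v for v in (hidden_state, all_hidden_states) if v is not None)
    return EncoderOutput(hidden_state, all_hidden_states)


stages = [lambda x: x + 1, lambda x: x * 2]
as_tuple = run_stages(stages, 1, output_hidden_states=True, return_dict=False)
as_output = run_stages(stages, 1)
print(as_tuple)  # (4, (2, 4))
```

With `return_dict=False` and `output_hidden_states=False` the tuple has a single element, so callers should index it rather than unpack two values.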

RETURNS DESCRIPTION
Union[Tuple, BaseModelOutputWithNoAttention]

A BaseModelOutputWithNoAttention when return_dict is True; otherwise a tuple containing the last hidden state and, if output_hidden_states is set, the tuple of per-stage hidden states.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(
    self,
    hidden_state: mindspore.Tensor,
    output_hidden_states: Optional[bool] = False,
    return_dict: Optional[bool] = True,
) -> Union[Tuple, BaseModelOutputWithNoAttention]:
    """
    Construct method in the VanEncoder class.

    Args:
        self: The instance of the class.
        hidden_state (mindspore.Tensor): The input hidden state tensor.
        output_hidden_states (bool, optional): A flag indicating whether to output hidden states. Defaults to False.
        return_dict (bool, optional): A flag indicating whether to return the output as a dictionary. Defaults to True.

    Returns:
        Union[Tuple, BaseModelOutputWithNoAttention]: The forwarded output, which is either a tuple of hidden
            state and all hidden states or an instance of BaseModelOutputWithNoAttention.

    Raises:
        None
    """
    all_hidden_states = () if output_hidden_states else None

    for _, stage_module in enumerate(self.stages):
        hidden_state = stage_module(hidden_state)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_state,)

    if not return_dict:
        return tuple(v for v in [hidden_state, all_hidden_states] if v is not None)

    return BaseModelOutputWithNoAttention(last_hidden_state=hidden_state, hidden_states=all_hidden_states)

mindnlp.transformers.models.van.modeling_van.VanForImageClassification

Bases: VanPreTrainedModel

VanForImageClassification is a class that represents a model for image classification using a pre-trained VanModel for feature extraction and a classifier for final prediction. It inherits from VanPreTrainedModel and implements methods for model initialization and inference.

ATTRIBUTE DESCRIPTION
van

The VanModel instance used for feature extraction.

TYPE: VanModel

classifier

The classifier module for predicting the final output based on the extracted features.

TYPE: Module

METHOD DESCRIPTION
__init__

Initializes the VanForImageClassification model with the given configuration.

forward

Runs the forward pass for image classification and, when labels are given, computes the loss.

Args:

  • pixel_values (Optional[mindspore.Tensor]): The input pixel values representing the image.
  • labels (Optional[mindspore.Tensor]): Labels for computing the image classification/regression loss.
  • output_hidden_states (Optional[bool]): Flag to output hidden states.
  • return_dict (Optional[bool]): Flag to determine if the return should be a dictionary.

Returns:

  • Union[Tuple, ImageClassifierOutputWithNoAttention]: Tuple of output elements or ImageClassifierOutputWithNoAttention object.
Example
>>> model = VanForImageClassification(config)
>>> output = model.forward(pixel_values, labels, output_hidden_states, return_dict)
Note

The forward method computes the loss based on the labels and the model's prediction, and returns the output based on the configured settings.
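
As a rough sketch of how the tuple output is laid out when `return_dict` is false, the helper below reproduces the packing logic from the source listing using plain Python values. `pack_outputs` is a hypothetical name used only for illustration; it is not part of the library.

```python
from typing import Optional, Tuple


def pack_outputs(loss: Optional[float], logits: list, extra: Tuple = ()) -> Tuple:
    """Build the tuple output: (loss?, logits, *extra)."""
    output = (logits,) + extra
    # The loss is prepended only when it was actually computed.
    return ((loss,) + output) if loss is not None else output


without_labels = pack_outputs(None, [0.1, 0.9])
with_labels = pack_outputs(0.25, [0.1, 0.9])
```

Note that the position of the logits shifts from index 0 to index 1 once labels are supplied, which is one reason the dataclass-style output is usually easier to consume.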

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanForImageClassification(VanPreTrainedModel):

    """
    VanForImageClassification is a class that represents a model for image classification using a pre-trained VanModel
    for feature extraction and a classifier for final prediction. It inherits from VanPreTrainedModel and implements
    methods for model initialization and inference.

    Attributes:
        van (VanModel): The VanModel instance used for feature extraction.
        classifier (nn.Module): The classifier module for predicting the final output based on the extracted features.

    Methods:
        __init__:
            Initializes the VanForImageClassification model with the given configuration.

        forward:
            Runs the forward pass for image classification and, when labels are given, computes the loss.

            Args:

            - pixel_values (Optional[mindspore.Tensor]): The input pixel values representing the image.
            - labels (Optional[mindspore.Tensor]): Labels for computing the image classification/regression loss.
            - output_hidden_states (Optional[bool]): Flag to output hidden states.
            - return_dict (Optional[bool]): Flag to determine if the return should be a dictionary.

            Returns:

            - Union[Tuple, ImageClassifierOutputWithNoAttention]: Tuple of output elements or
            ImageClassifierOutputWithNoAttention object.

    Example:
        ```python
        >>> model = VanForImageClassification(config)
        >>> output = model.forward(pixel_values, labels, output_hidden_states, return_dict)
        ```

    Note:
        The forward method computes the loss based on the labels and the model's prediction, and returns the output
        based on the configured settings.
    """
    def __init__(self, config):
        """
        __init__

        Initializes an instance of the VanForImageClassification class.

        Args:
            self: The instance of the class.
            config: The VanConfig instance that configures the model. Its `hidden_sizes[-1]` sets the input width
                of the classifier head and its `num_labels` sets the number of outputs; if `num_labels` is 0,
                the head is an identity mapping.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.van = VanModel(config)
        # Classifier head
        self.classifier = (
            nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity()
        )

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, ImageClassifierOutputWithNoAttention]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean squared
                error); if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.van(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

        pooled_output = outputs.pooler_output if return_dict else outputs[1]

        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.config.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.config.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.config.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(logits.view(-1, self.config.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)

mindnlp.transformers.models.van.modeling_van.VanForImageClassification.__init__(config)

__init__

Initializes an instance of the VanForImageClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The VanConfig instance that configures the model. Its hidden_sizes[-1] sets the input width of the classifier head and its num_labels sets the number of outputs; if num_labels is 0, the head is an identity mapping.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, config):
    """
    __init__

    Initializes an instance of the VanForImageClassification class.

    Args:
        self: The instance of the class.
        config: The VanConfig instance that configures the model. Its `hidden_sizes[-1]` sets the input width
            of the classifier head and its `num_labels` sets the number of outputs; if `num_labels` is 0,
            the head is an identity mapping.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.van = VanModel(config)
    # Classifier head
    self.classifier = (
        nn.Linear(config.hidden_sizes[-1], config.num_labels) if config.num_labels > 0 else nn.Identity()
    )

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.van.modeling_van.VanForImageClassification.forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the image classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean squared error); if config.num_labels > 1, a classification loss is computed (cross-entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
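
When `config.problem_type` is unset, the source listing infers it from `num_labels` and the label dtype. The sketch below restates that decision in plain Python. `infer_problem_type` and `LOSS_BY_PROBLEM_TYPE` are hypothetical names for illustration, and the boolean `labels_are_integer` stands in for the check `labels.dtype in (mindspore.int64, mindspore.int32)`.

```python
def infer_problem_type(num_labels: int, labels_are_integer: bool) -> str:
    """Mirror the branching used when config.problem_type is None."""
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integer:
        return "single_label_classification"
    return "multi_label_classification"


# Loss function named in the source listing for each problem type.
LOSS_BY_PROBLEM_TYPE = {
    "regression": "mse_loss",
    "single_label_classification": "cross_entropy",
    "multi_label_classification": "binary_cross_entropy_with_logits",
}

problem_type = infer_problem_type(num_labels=1000, labels_are_integer=True)
loss_name = LOSS_BY_PROBLEM_TYPE[problem_type]
```

In the listing the inferred value is written back to `self.config.problem_type`, so the first call that passes labels fixes the problem type for later calls on the same model.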

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(
    self,
    pixel_values: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, ImageClassifierOutputWithNoAttention]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (mean squared
            error); if `config.num_labels > 1`, a classification loss is computed (cross-entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.van(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

    pooled_output = outputs.pooler_output if return_dict else outputs[1]

    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.config.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.config.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.config.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(logits.view(-1, self.config.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return ImageClassifierOutputWithNoAttention(loss=loss, logits=logits, hidden_states=outputs.hidden_states)

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttention

Bases: Module

Basic Large Kernel Attention (LKA).
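
The kernel sizes, paddings, and dilation in `VanLargeKernelAttention.__init__` are chosen so that the spatial size is preserved while the receptive field grows. The arithmetic can be checked with the sketch below, which assumes the layers follow the standard convolution output-size formula with stride 1. The helper names and the 56-pixel feature-map size are hypothetical.

```python
def conv_output_size(size, kernel_size, padding=0, dilation=1, stride=1):
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def effective_kernel(kernel_size, dilation=1):
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


# (kernel_size, padding, dilation) as given in VanLargeKernelAttention.__init__.
lka_convs = [(5, 2, 1), (7, 9, 3), (1, 0, 1)]

size = 56  # hypothetical feature-map height/width
for k, p, d in lka_convs:
    size = conv_output_size(size, k, padding=p, dilation=d)

# Receptive field of stacked stride-1 convolutions: 1 + sum(effective_kernel - 1).
receptive_field = 1 + sum(effective_kernel(k, d) - 1 for k, _, d in lka_convs)
print(size, receptive_field)  # 56 23
```

The dilated 7x7 kernel spans 19 positions, so padding 9 keeps the output the same size as the input, and the three convolutions together see a 23x23 window per output position.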

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanLargeKernelAttention(nn.Module):
    """
    Basic Large Kernel Attention (LKA).
    """
    def __init__(self, hidden_size: int):
        """
        Initializes an instance of the VanLargeKernelAttention class.

        Args:
            self: The instance of the class.
            hidden_size (int): The number of channels of the input feature map. Every convolution in the
                attention mechanism uses it as both its input and output channel count.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.depth_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=5, padding=2, group=hidden_size)
        self.depth_wise_dilated = nn.Conv2d(
            hidden_size, hidden_size, kernel_size=7, dilation=3, padding=9, group=hidden_size
        )
        self.point_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=1)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        Computes the attention map by applying the depth-wise, dilated depth-wise, and point-wise convolutions in sequence.

        Args:
            self (VanLargeKernelAttention): An instance of the VanLargeKernelAttention class.
            hidden_state (mindspore.Tensor): The hidden state tensor representing the input data.

        Returns:
            mindspore.Tensor: The transformed hidden state tensor after passing through the attention mechanism.

        Raises:
            None
        """
        hidden_state = self.depth_wise(hidden_state)
        hidden_state = self.depth_wise_dilated(hidden_state)
        hidden_state = self.point_wise(hidden_state)
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttention.__init__(hidden_size)

Initializes an instance of the VanLargeKernelAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_size

The number of channels of the input feature map. Every convolution in the attention mechanism uses it as both its input and output channel count.

TYPE: int

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, hidden_size: int):
    """
    Initializes an instance of the VanLargeKernelAttention class.

    Args:
        self: The instance of the class.
        hidden_size (int): The number of channels of the input feature map. Every convolution in the
            attention mechanism uses it as both its input and output channel count.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.depth_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=5, padding=2, group=hidden_size)
    self.depth_wise_dilated = nn.Conv2d(
        hidden_size, hidden_size, kernel_size=7, dilation=3, padding=9, group=hidden_size
    )
    self.point_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=1)

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttention.forward(hidden_state)

Computes the attention map by applying the depth-wise, dilated depth-wise, and point-wise convolutions in sequence.

PARAMETER DESCRIPTION
self

An instance of the VanLargeKernelAttention class.

TYPE: VanLargeKernelAttention

hidden_state

The hidden state tensor representing the input data.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The transformed hidden state tensor after passing through the attention mechanism.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    Computes the attention map by applying the depth-wise, dilated depth-wise, and point-wise convolutions in sequence.

    Args:
        self (VanLargeKernelAttention): An instance of the VanLargeKernelAttention class.
        hidden_state (mindspore.Tensor): The hidden state tensor representing the input data.

    Returns:
        mindspore.Tensor: The transformed hidden state tensor after passing through the attention mechanism.

    Raises:
        None
    """
    hidden_state = self.depth_wise(hidden_state)
    hidden_state = self.depth_wise_dilated(hidden_state)
    hidden_state = self.point_wise(hidden_state)
    return hidden_state

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttentionLayer

Bases: Module

Computes attention using Large Kernel Attention (LKA) and attends the input.
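
"Attends the input" here means an element-wise product between the input and the attention map. The sketch below illustrates this with nested lists in place of tensors. `attend` is a hypothetical helper, and the attention values are made up for the example rather than produced by real convolutions.

```python
from typing import List

Matrix = List[List[float]]


def attend(hidden_state: Matrix, attention_map: Matrix) -> Matrix:
    """Element-wise gating: every value is multiplied by its attention weight."""
    return [
        [h * a for h, a in zip(h_row, a_row)]
        for h_row, a_row in zip(hidden_state, attention_map)
    ]


hidden_state = [[1.0, 2.0], [3.0, 4.0]]
attention_map = [[0.5, 0.0], [1.0, 2.0]]  # made-up weights for illustration

attended = attend(hidden_state, attention_map)
```

The listing applies no softmax or sigmoid to the attention map before the product, so the weights are not restricted to the range 0 to 1.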

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanLargeKernelAttentionLayer(nn.Module):
    """
    Computes attention using Large Kernel Attention (LKA) and attends the input.
    """
    def __init__(self, hidden_size: int):
        """
        Initializes a VanLargeKernelAttentionLayer instance with the specified hidden size.

        Args:
            self: The instance of the VanLargeKernelAttentionLayer class.
            hidden_size (int): The size of the hidden state, representing the dimensionality of the input feature space.
                It must be a positive integer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.attention = VanLargeKernelAttention(hidden_size)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        Computes the attention map and multiplies it element-wise with the input hidden state.

        Args:
            self: The instance of the VanLargeKernelAttentionLayer class.
            hidden_state (mindspore.Tensor): The hidden state tensor on which the attention mechanism is applied.

        Returns:
            mindspore.Tensor: The attended tensor resulting from applying attention to the hidden state.

        Raises:
            No specific exceptions are raised by this method.
        """
        attention = self.attention(hidden_state)
        attended = hidden_state * attention
        return attended

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttentionLayer.__init__(hidden_size)

Initializes a VanLargeKernelAttentionLayer instance with the specified hidden size.

PARAMETER DESCRIPTION
self

The instance of the VanLargeKernelAttentionLayer class.

hidden_size

The size of the hidden state, representing the dimensionality of the input feature space. It must be a positive integer.

TYPE: int

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, hidden_size: int):
    """
    Initializes a VanLargeKernelAttentionLayer instance with the specified hidden size.

    Args:
        self: The instance of the VanLargeKernelAttentionLayer class.
        hidden_size (int): The size of the hidden state, representing the dimensionality of the input feature space.
            It must be a positive integer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.attention = VanLargeKernelAttention(hidden_size)

mindnlp.transformers.models.van.modeling_van.VanLargeKernelAttentionLayer.forward(hidden_state)

Computes the attention map and multiplies it element-wise with the input hidden state.

PARAMETER DESCRIPTION
self

The instance of the VanLargeKernelAttentionLayer class.

hidden_state

The hidden state tensor on which the attention mechanism is applied.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The attended tensor resulting from applying attention to the hidden state.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    Computes the attention map and multiplies it element-wise with the input hidden state.

    Args:
        self: The instance of the VanLargeKernelAttentionLayer class.
        hidden_state (mindspore.Tensor): The hidden state tensor on which the attention mechanism is applied.

    Returns:
        mindspore.Tensor: The attended tensor resulting from applying attention to the hidden state.

    Raises:
        No specific exceptions are raised by this method.
    """
    attention = self.attention(hidden_state)
    attended = hidden_state * attention
    return attended

mindnlp.transformers.models.van.modeling_van.VanLayer

Bases: Module

Van layer composed of normalization layers, large kernel attention (LKA), and a multilayer perceptron (MLP).

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanLayer(nn.Module):
    """
    Van layer composed of normalization layers, large kernel attention (LKA), and a multilayer perceptron (MLP).
    """
    def __init__(
        self,
        config: VanConfig,
        hidden_size: int,
        mlp_ratio: int = 4,
        drop_path_rate: float = 0.5,
    ):
        """
        Initializes an instance of the VanLayer class.

        Args:
            self: The object itself.
            config (VanConfig): An object containing configuration settings for the layer.
            hidden_size (int): The size of the hidden layer.
            mlp_ratio (int, optional): The expansion ratio of the MLP; its hidden size is hidden_size * mlp_ratio. Defaults to 4.
            drop_path_rate (float, optional): The rate at which to apply drop path regularization. Defaults to 0.5.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.drop_path = VanDropPath(drop_path_rate) if drop_path_rate > 0.0 else nn.Identity()
        self.pre_normomalization = nn.BatchNorm2d(hidden_size)
        self.attention = VanSpatialAttentionLayer(hidden_size, config.hidden_act)
        self.attention_scaling = VanLayerScaling(hidden_size, config.layer_scale_init_value)
        self.post_normalization = nn.BatchNorm2d(hidden_size)
        self.mlp = VanMlpLayer(
            hidden_size, hidden_size * mlp_ratio, hidden_size, config.hidden_act, config.dropout_rate
        )
        self.mlp_scaling = VanLayerScaling(hidden_size, config.layer_scale_init_value)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        Forward pass of the VanLayer.

        Applies a normalized attention branch and then a normalized MLP branch. Each branch output is scaled, passed
        through drop path, and added back to the branch input as a residual.

        Args:
            self: Instance of the VanLayer class.
            hidden_state (mindspore.Tensor): The input hidden state tensor on which the operations are performed.

        Returns:
            mindspore.Tensor: The output tensor after applying the operations on the input hidden state.

        Raises:
            None.
        """
        residual = hidden_state
        # attention
        hidden_state = self.pre_normomalization(hidden_state)
        hidden_state = self.attention(hidden_state)
        hidden_state = self.attention_scaling(hidden_state)
        hidden_state = self.drop_path(hidden_state)
        # residual connection
        hidden_state = residual + hidden_state
        residual = hidden_state
        # mlp
        hidden_state = self.post_normalization(hidden_state)
        hidden_state = self.mlp(hidden_state)
        hidden_state = self.mlp_scaling(hidden_state)
        hidden_state = self.drop_path(hidden_state)
        # residual connection
        hidden_state = residual + hidden_state
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanLayer.__init__(config, hidden_size, mlp_ratio=4, drop_path_rate=0.5)

Initializes an instance of the VanLayer class.

PARAMETER DESCRIPTION
self

The object itself.

config

An object containing configuration settings for the layer.

TYPE: VanConfig

hidden_size

The size of the hidden layer.

TYPE: int

mlp_ratio

The expansion ratio of the MLP; its hidden size is hidden_size * mlp_ratio. Defaults to 4.

TYPE: int DEFAULT: 4

drop_path_rate

The rate at which to apply drop path regularization. Defaults to 0.5.

TYPE: float DEFAULT: 0.5

RETURNS DESCRIPTION

None
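
As a quick worked example of the MLP width, the snippet below multiplies the default `hidden_sizes` and `mlp_ratios` from `VanConfig`. It assumes that each stage passes its own hidden size and ratio to its layers; that pairing is not shown in this section.

```python
# Defaults documented for VanConfig.
hidden_sizes = [64, 128, 320, 512]
mlp_ratios = [8, 8, 4, 4]

# VanLayer builds its MLP with hidden_size * mlp_ratio intermediate channels.
mlp_hidden_sizes = [h * r for h, r in zip(hidden_sizes, mlp_ratios)]
print(mlp_hidden_sizes)  # [512, 1024, 1280, 2048]
```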

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(
    self,
    config: VanConfig,
    hidden_size: int,
    mlp_ratio: int = 4,
    drop_path_rate: float = 0.5,
):
    """
    Initializes an instance of the VanLayer class.

    Args:
        self: The object itself.
        config (VanConfig): An object containing configuration settings for the layer.
        hidden_size (int): The size of the hidden layer.
        mlp_ratio (int, optional): The expansion ratio of the MLP; its hidden size is hidden_size * mlp_ratio. Defaults to 4.
        drop_path_rate (float, optional): The rate at which to apply drop path regularization. Defaults to 0.5.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.drop_path = VanDropPath(drop_path_rate) if drop_path_rate > 0.0 else nn.Identity()
    self.pre_normomalization = nn.BatchNorm2d(hidden_size)
    self.attention = VanSpatialAttentionLayer(hidden_size, config.hidden_act)
    self.attention_scaling = VanLayerScaling(hidden_size, config.layer_scale_init_value)
    self.post_normalization = nn.BatchNorm2d(hidden_size)
    self.mlp = VanMlpLayer(
        hidden_size, hidden_size * mlp_ratio, hidden_size, config.hidden_act, config.dropout_rate
    )
    self.mlp_scaling = VanLayerScaling(hidden_size, config.layer_scale_init_value)

mindnlp.transformers.models.van.modeling_van.VanLayer.forward(hidden_state)

Forward pass of the VanLayer.

Applies a normalized attention branch and then a normalized MLP branch. Each branch output is scaled, passed through drop path, and added back to the branch input as a residual.

PARAMETER DESCRIPTION
self

Instance of the VanLayer class.

hidden_state

The input hidden state tensor on which the operations are performed.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The output tensor after applying the operations on the input hidden state.
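
The two residual branches can be summarized in a few lines. The sketch below is a structural illustration only: `van_layer` is a hypothetical function, floats stand in for tensors, and every sub-module is replaced by a simple callable.

```python
from typing import Callable

Fn = Callable[[float], float]


def identity(v: float) -> float:
    return v


def van_layer(
    x: float,
    norm1: Fn,
    attention: Fn,
    scale1: Fn,
    norm2: Fn,
    mlp: Fn,
    scale2: Fn,
    drop_path: Fn = identity,
) -> float:
    """Pre-norm residual block: x + f(norm(x)) for attention, then for the MLP."""
    x = x + drop_path(scale1(attention(norm1(x))))
    x = x + drop_path(scale2(mlp(norm2(x))))
    return x


# Toy callables chosen so the arithmetic is easy to follow by hand.
out = van_layer(
    1.0,
    norm1=identity,
    attention=lambda v: v * 2,
    scale1=lambda v: v * 0.5,
    norm2=identity,
    mlp=lambda v: v + 2,
    scale2=lambda v: v * 0.5,
)
```

Here the attention branch contributes 1.0 (giving 2.0) and the MLP branch contributes 2.0, so `out` is 4.0. Because both branches are additive, a branch whose scaled output is near zero leaves the input nearly unchanged.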

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    Forward pass of the VanLayer.

    Applies a normalized attention branch and then a normalized MLP branch. Each branch output is scaled, passed
    through drop path, and added back to the branch input as a residual.

    Args:
        self: Instance of the VanLayer class.
        hidden_state (mindspore.Tensor): The input hidden state tensor on which the operations are performed.

    Returns:
        mindspore.Tensor: The output tensor after applying the operations on the input hidden state.

    Raises:
        None.
    """
    residual = hidden_state
    # attention
    hidden_state = self.pre_normomalization(hidden_state)
    hidden_state = self.attention(hidden_state)
    hidden_state = self.attention_scaling(hidden_state)
    hidden_state = self.drop_path(hidden_state)
    # residual connection
    hidden_state = residual + hidden_state
    residual = hidden_state
    # mlp
    hidden_state = self.post_normalization(hidden_state)
    hidden_state = self.mlp(hidden_state)
    hidden_state = self.mlp_scaling(hidden_state)
    hidden_state = self.drop_path(hidden_state)
    # residual connection
    hidden_state = residual + hidden_state
    return hidden_state

mindnlp.transformers.models.van.modeling_van.VanLayerScaling

Bases: Module

Scales the inputs by a learnable parameter initialized by initial_value.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanLayerScaling(nn.Module):
    """
    Scales the inputs by a learnable parameter initialized by `initial_value`.
    """
    def __init__(self, hidden_size: int, initial_value: float = 1e-2):
        """
        Initializes a new instance of the VanLayerScaling class.

        Args:
            self: The object itself.
            hidden_size (int): The size of the hidden layer.
            initial_value (float, optional): The initial value for the weight parameter. Default is 0.01.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.weight = Parameter(initial_value * ops.ones((hidden_size)), requires_grad=True)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        Multiplies each channel of the hidden state by its learnable scale.

        Args:
            self (VanLayerScaling): The instance of the VanLayerScaling class.
            hidden_state (mindspore.Tensor): The input tensor representing the hidden state.
                It is expected to be a tensor of type mindspore.Tensor.

        Returns:
            mindspore.Tensor: Returns a tensor of type mindspore.Tensor which is the result of scaling the
                input hidden_state tensor.

        Raises:
            None.
        """
        # unsqueezing for broadcasting
        hidden_state = self.weight.unsqueeze(-1).unsqueeze(-1) * hidden_state
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanLayerScaling.__init__(hidden_size, initial_value=0.01)

Initializes a new instance of the VanLayerScaling class.

PARAMETER DESCRIPTION
self

The object itself.

hidden_size

The size of the hidden layer.

TYPE: int

initial_value

The initial value for the weight parameter. Default is 0.01.

TYPE: float DEFAULT: 0.01

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, hidden_size: int, initial_value: float = 1e-2):
    """
    Initializes a new instance of the VanLayerScaling class.

    Args:
        self: The object itself.
        hidden_size (int): The size of the hidden layer.
        initial_value (float, optional): The initial value for the weight parameter. Default is 0.01.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.weight = Parameter(initial_value * ops.ones((hidden_size)), requires_grad=True)

mindnlp.transformers.models.van.modeling_van.VanLayerScaling.forward(hidden_state)

Multiplies each channel of the hidden state by its learnable scale.

PARAMETER DESCRIPTION
self

The instance of the VanLayerScaling class.

TYPE: VanLayerScaling

hidden_state

The input tensor representing the hidden state. It is expected to be a tensor of type mindspore.Tensor.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: Returns a tensor of type mindspore.Tensor which is the result of scaling the input hidden_state tensor.
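
The two `unsqueeze(-1)` calls in the source listing reshape the weight from `(hidden_size,)` to `(hidden_size, 1, 1)` so that one scale is broadcast over every spatial position of its channel. The sketch below shows the same effect with nested lists for a single sample. `scale_channels` is a hypothetical helper, and the weights are chosen for readability rather than being the default 1e-2.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def scale_channels(weight: List[float], feature_map: FeatureMap) -> FeatureMap:
    """Multiply every value in channel c by weight[c]."""
    if len(weight) != len(feature_map):
        raise ValueError("expected one scale per channel")
    return [
        [[w * v for v in row] for row in channel]
        for w, channel in zip(weight, feature_map)
    ]


weight = [0.5, 2.0]  # one learnable scale per channel
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[1.0, 1.0], [1.0, 1.0]],  # channel 1
]

scaled = scale_channels(weight, feature_map)
```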

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    Multiplies each channel of the hidden state by its learnable scale.

    Args:
        self (VanLayerScaling): The instance of the VanLayerScaling class.
        hidden_state (mindspore.Tensor): The input tensor representing the hidden state.
            It is expected to be a tensor of type mindspore.Tensor.

    Returns:
        mindspore.Tensor: Returns a tensor of type mindspore.Tensor which is the result of scaling the
            input hidden_state tensor.

    Raises:
        None.
    """
    # unsqueezing for broadcasting
    hidden_state = self.weight.unsqueeze(-1).unsqueeze(-1) * hidden_state
    return hidden_state

mindnlp.transformers.models.van.modeling_van.VanMlpLayer

Bases: Module

MLP with depth-wise convolution, from PVTv2: Improved Baselines with Pyramid Vision Transformer.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanMlpLayer(nn.Module):
    """
    MLP with depth-wise convolution, from [PVTv2: Improved Baselines with Pyramid Vision
    Transformer](https://arxiv.org/abs/2106.13797).
    """
    def __init__(
        self,
        in_channels: int,
        hidden_size: int,
        out_channels: int,
        hidden_act: str = "gelu",
        dropout_rate: float = 0.5,
    ):
        """
        Initializes an instance of the VanMlpLayer class.

        Args:
            self: The object itself.
            in_channels (int): The number of input channels.
                This specifies the number of channels in the input tensor.
            hidden_size (int): The size of the hidden layer.
                This determines the number of output channels of the first convolutional layer.
            out_channels (int): The number of output channels.
                This specifies the number of channels in the output tensor.
            hidden_act (str, optional): The activation function for the hidden layer. Defaults to 'gelu'.
                This specifies the activation function to be used in the hidden layer.
                Supported options are 'gelu', 'relu', 'sigmoid', 'tanh', 'softmax', 'softplus', 'softsign', 'leaky_relu'.
            dropout_rate (float, optional): The dropout rate. Defaults to 0.5.
                This specifies the probability of an element to be zeroed in the dropout layers.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.in_dense = nn.Conv2d(in_channels, hidden_size, kernel_size=1)
        self.depth_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=3, padding=1, group=hidden_size)
        self.activation = ACT2FN[hidden_act]
        self.dropout1 = nn.Dropout(dropout_rate)
        self.out_dense = nn.Conv2d(hidden_size, out_channels, kernel_size=1)
        self.dropout2 = nn.Dropout(dropout_rate)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        This method forwards a multi-layer perceptron (MLP) layer in the VanMlpLayer class.

        Args:
            self (VanMlpLayer): The instance of the VanMlpLayer class.
            hidden_state (mindspore.Tensor): The input hidden state tensor to be processed by the MLP layer.

        Returns:
            mindspore.Tensor: The output tensor after processing through the MLP layer.

        Raises:
            None
        """
        hidden_state = self.in_dense(hidden_state)
        hidden_state = self.depth_wise(hidden_state)
        hidden_state = self.activation(hidden_state)
        hidden_state = self.dropout1(hidden_state)
        hidden_state = self.out_dense(hidden_state)
        hidden_state = self.dropout2(hidden_state)
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanMlpLayer.__init__(in_channels, hidden_size, out_channels, hidden_act='gelu', dropout_rate=0.5)

Initializes an instance of the VanMlpLayer class.

PARAMETER DESCRIPTION
self

The object itself.

in_channels

The number of input channels. This specifies the number of channels in the input tensor.

TYPE: int

hidden_size

The size of the hidden layer. This determines the number of output channels of the first convolutional layer.

TYPE: int

out_channels

The number of output channels. This specifies the number of channels in the output tensor.

TYPE: int

hidden_act

The activation function for the hidden layer, given as a key of the ACT2FN mapping (for example 'gelu' or 'relu'). Defaults to 'gelu'.

TYPE: str DEFAULT: 'gelu'

dropout_rate

The dropout rate. Defaults to 0.5. This specifies the probability of an element to be zeroed in the dropout layers.

TYPE: float DEFAULT: 0.5

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(
    self,
    in_channels: int,
    hidden_size: int,
    out_channels: int,
    hidden_act: str = "gelu",
    dropout_rate: float = 0.5,
):
    """
    Initializes an instance of the VanMlpLayer class.

    Args:
        self: The object itself.
        in_channels (int): The number of input channels.
            This specifies the number of channels in the input tensor.
        hidden_size (int): The size of the hidden layer.
            This determines the number of output channels of the first convolutional layer.
        out_channels (int): The number of output channels.
            This specifies the number of channels in the output tensor.
        hidden_act (str, optional): The activation function for the hidden layer. Defaults to 'gelu'.
            This specifies the activation function to be used in the hidden layer.
            Supported options are 'gelu', 'relu', 'sigmoid', 'tanh', 'softmax', 'softplus', 'softsign', 'leaky_relu'.
        dropout_rate (float, optional): The dropout rate. Defaults to 0.5.
            This specifies the probability of an element to be zeroed in the dropout layers.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.in_dense = nn.Conv2d(in_channels, hidden_size, kernel_size=1)
    self.depth_wise = nn.Conv2d(hidden_size, hidden_size, kernel_size=3, padding=1, group=hidden_size)
    self.activation = ACT2FN[hidden_act]
    self.dropout1 = nn.Dropout(dropout_rate)
    self.out_dense = nn.Conv2d(hidden_size, out_channels, kernel_size=1)
    self.dropout2 = nn.Dropout(dropout_rate)
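
To see why the 3x3 convolution is made depth-wise, it helps to count parameters. This is a rough sketch in plain Python, assuming every convolution carries a bias term and using illustrative channel widths (64 in, 512 hidden, 64 out); `conv2d_param_count` is a hypothetical helper, not a MindNLP function.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable values in a 2D convolution with a square kernel."""
    weights = out_channels * (in_channels // groups) * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Illustrative widths only; the real values come from the model configuration.
in_channels, hidden_size, out_channels = 64, 512, 64

in_dense = conv2d_param_count(in_channels, hidden_size, kernel_size=1)
depth_wise = conv2d_param_count(hidden_size, hidden_size, kernel_size=3, groups=hidden_size)
out_dense = conv2d_param_count(hidden_size, out_channels, kernel_size=1)
full_3x3 = conv2d_param_count(hidden_size, hidden_size, kernel_size=3)  # non-grouped, for comparison

print(in_dense, depth_wise, out_dense, full_3x3)
```

Under these assumptions the depth-wise layer needs a few thousand parameters, whereas a non-grouped 3x3 convolution of the same width would need over two million.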

mindnlp.transformers.models.van.modeling_van.VanMlpLayer.forward(hidden_state)

Applies the MLP block to hidden_state: a 1x1 convolution, a 3x3 depth-wise convolution, the activation, dropout, a second 1x1 convolution, and a final dropout.

PARAMETER DESCRIPTION
self

The instance of the VanMlpLayer class.

TYPE: VanMlpLayer

hidden_state

The input hidden state tensor to be processed by the MLP layer.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The output tensor after processing through the MLP layer.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    This method forwards a multi-layer perceptron (MLP) layer in the VanMlpLayer class.

    Args:
        self (VanMlpLayer): The instance of the VanMlpLayer class.
        hidden_state (mindspore.Tensor): The input hidden state tensor to be processed by the MLP layer.

    Returns:
        mindspore.Tensor: The output tensor after processing through the MLP layer.

    Raises:
        None
    """
    hidden_state = self.in_dense(hidden_state)
    hidden_state = self.depth_wise(hidden_state)
    hidden_state = self.activation(hidden_state)
    hidden_state = self.dropout1(hidden_state)
    hidden_state = self.out_dense(hidden_state)
    hidden_state = self.dropout2(hidden_state)
    return hidden_state

mindnlp.transformers.models.van.modeling_van.VanModel

Bases: VanPreTrainedModel

The VanModel class runs pixel values through a VanEncoder and returns the resulting representations. It inherits from VanPreTrainedModel. The constructor initializes the model from the provided configuration, while the forward method processes the pixel values and returns the last hidden state, a pooled output obtained by global average pooling, and optionally the hidden states, either as a tuple or as a BaseModelOutputWithPoolingAndNoAttention.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanModel(VanPreTrainedModel):

    """
    The VanModel class represents a model for processing pixel values using the VanEncoder and providing various
    output representations. It inherits from the VanPreTrainedModel class and includes methods for initialization and
    forwarding the model's output. The constructor initializes the model with the provided configuration, while the
    forward method processes the pixel values and returns the output representation. The class provides flexibility
    for handling hidden states and returning output in the form of BaseModelOutputWithPoolingAndNoAttention.
    """
    def __init__(self, config):
        """
        Initializes a new instance of the VanModel class.

        Args:
            self: The object itself.
            config (object): The configuration object that contains various settings for the model.
                This object should have the following attributes:

                - hidden_sizes (list): A list of integers representing the sizes of hidden layers.
                - layer_norm_eps (float): A small value used for numerical stability in layer normalization.

                The config object is required for the proper initialization of the model.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.config = config
        self.encoder = VanEncoder(config)
        # final layernorm layer
        self.layernorm = nn.LayerNorm(config.hidden_sizes[-1], eps=config.layer_norm_eps)
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        pixel_values: Optional[mindspore.Tensor],
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]:
        """
        Constructs the encoder outputs and pooled output from the given pixel values.

        Args:
            self (VanModel): The instance of the VanModel class.
            pixel_values (Optional[mindspore.Tensor]): The input pixel values.
                If provided, it should be a Tensor.
            output_hidden_states (Optional[bool]): Whether to output hidden states.
                If None, the value is taken from self.config.output_hidden_states.
            return_dict (Optional[bool]): Whether to return the output as a dictionary.
                If None, the value is taken from self.config.use_return_dict.

        Returns:
            Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: A tuple containing the last hidden state and the
                pooled output, along with the encoder hidden states if return_dict is False. Otherwise, it
                returns a BaseModelOutputWithPoolingAndNoAttention object.

        Raises:
            None
        """
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        encoder_outputs = self.encoder(
            pixel_values,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        last_hidden_state = encoder_outputs[0]
        # global average pooling, n c w h -> n c
        pooled_output = last_hidden_state.mean(dim=[-2, -1])

        if not return_dict:
            return (last_hidden_state, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndNoAttention(
            last_hidden_state=last_hidden_state,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
        )

mindnlp.transformers.models.van.modeling_van.VanModel.__init__(config)

Initializes a new instance of the VanModel class.

PARAMETER DESCRIPTION
self

The object itself.

config

The configuration object that contains various settings for the model. This object should have the following attributes:

  • hidden_sizes (list): A list of integers representing the sizes of hidden layers.
  • layer_norm_eps (float): A small value used for numerical stability in layer normalization.

The config object is required for the proper initialization of the model.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, config):
    """
    Initializes a new instance of the VanModel class.

    Args:
        self: The object itself.
        config (object): The configuration object that contains various settings for the model.
            This object should have the following attributes:

            - hidden_sizes (list): A list of integers representing the sizes of hidden layers.
            - layer_norm_eps (float): A small value used for numerical stability in layer normalization.

            The config object is required for the proper initialization of the model.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.config = config
    self.encoder = VanEncoder(config)
    # final layernorm layer
    self.layernorm = nn.LayerNorm(config.hidden_sizes[-1], eps=config.layer_norm_eps)
    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.van.modeling_van.VanModel.forward(pixel_values, output_hidden_states=None, return_dict=None)

Runs the encoder on the given pixel values and computes a pooled output by global average pooling over the spatial dimensions.

PARAMETER DESCRIPTION
self

The instance of the VanModel class.

TYPE: VanModel

pixel_values

The input pixel values, expected as a tensor of shape (batch_size, num_channels, height, width).

TYPE: Optional[Tensor]

output_hidden_states

Whether to output hidden states. If None, the value is taken from self.config.output_hidden_states.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return the output as a dictionary. If None, the value is taken from self.config.use_return_dict.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]

Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: If return_dict is False, a tuple of the last hidden state and the pooled output, followed by any remaining encoder outputs (such as the hidden states). Otherwise, a BaseModelOutputWithPoolingAndNoAttention object.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(
    self,
    pixel_values: Optional[mindspore.Tensor],
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]:
    """
    Constructs the encoder outputs and pooled output from the given pixel values.

    Args:
        self (VanModel): The instance of the VanModel class.
        pixel_values (Optional[mindspore.Tensor]): The input pixel values.
            If provided, it should be a Tensor.
        output_hidden_states (Optional[bool]): Whether to output hidden states.
            If None, the value is taken from self.config.output_hidden_states.
        return_dict (Optional[bool]): Whether to return the output as a dictionary.
            If None, the value is taken from self.config.use_return_dict.

    Returns:
        Union[Tuple, BaseModelOutputWithPoolingAndNoAttention]: A tuple containing the last hidden state and the
            pooled output, along with the encoder hidden states if return_dict is False. Otherwise, it
            returns a BaseModelOutputWithPoolingAndNoAttention object.

    Raises:
        None
    """
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    encoder_outputs = self.encoder(
        pixel_values,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    last_hidden_state = encoder_outputs[0]
    # global average pooling, n c w h -> n c
    pooled_output = last_hidden_state.mean(dim=[-2, -1])

    if not return_dict:
        return (last_hidden_state, pooled_output) + encoder_outputs[1:]

    return BaseModelOutputWithPoolingAndNoAttention(
        last_hidden_state=last_hidden_state,
        pooler_output=pooled_output,
        hidden_states=encoder_outputs.hidden_states,
    )
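
The pooling step reduces each feature map to one value per channel. Below is a minimal sketch in plain Python, assuming the input is a nested list of shape (batch, channels, height, width); `global_average_pool` is a hypothetical name used only for illustration.

```python
def global_average_pool(feature_map):
    """Average a (batch, channels, height, width) nested list over height and width."""
    pooled = []
    for sample in feature_map:
        pooled.append([
            sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in sample
        ])
    return pooled


# One sample with two 2x2 channels.
last_hidden_state = [[
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]]
pooled_output = global_average_pool(last_hidden_state)
print(pooled_output)  # → [[2.5, 2.0]]
```

The result has shape (batch, channels), matching the `n c w h -> n c` comment in the source.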

mindnlp.transformers.models.van.modeling_van.VanOverlappingPatchEmbedder

Bases: Module

Downsamples the input using a patchify operation with a stride of 4 by default, making adjacent windows overlap by half of the area. From PVTv2: Improved Baselines with Pyramid Vision Transformer.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanOverlappingPatchEmbedder(nn.Module):
    """
    Downsamples the input using a patchify operation with a `stride` of 4 by default making adjacent windows overlap by
    half of the area. From [PVTv2: Improved Baselines with Pyramid Vision
    Transformer](https://arxiv.org/abs/2106.13797).
    """
    def __init__(self, in_channels: int, hidden_size: int, patch_size: int = 7, stride: int = 4):
        """
        Initializes a VanOverlappingPatchEmbedder object.

        Args:
            self: The instance of the class.
            in_channels (int): Number of input channels for the convolutional layer.
            hidden_size (int): Number of output channels from the convolutional layer.
            patch_size (int, optional): Size of the patch/kernel for the convolutional layer. Default is 7.
            stride (int, optional): Stride value for the convolution operation. Default is 4.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.convolution = nn.Conv2d(
            in_channels, hidden_size, kernel_size=patch_size, stride=stride, padding=patch_size // 2
        )
        self.normalization = nn.BatchNorm2d(hidden_size)

    def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs a hidden state tensor using the provided input tensor.

        Args:
            self (VanOverlappingPatchEmbedder): An instance of the VanOverlappingPatchEmbedder class.
            input (mindspore.Tensor): The input tensor to be processed.
                It should have shape (batch_size, channels, height, width).

        Returns:
            mindspore.Tensor: The hidden state tensor obtained from the input tensor after applying convolution and
                normalization. Its channel dimension is hidden_size; height and width shrink by roughly the stride.

        Raises:
            None.

        Note:
            - The 'convolution' method is applied to the input tensor to obtain an intermediate hidden state tensor.
            - The 'normalization' method is then applied to the intermediate hidden state tensor to obtain the final
            hidden state tensor.
        """
        hidden_state = self.convolution(input)
        hidden_state = self.normalization(hidden_state)
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanOverlappingPatchEmbedder.__init__(in_channels, hidden_size, patch_size=7, stride=4)

Initializes a VanOverlappingPatchEmbedder object.

PARAMETER DESCRIPTION
self

The instance of the class.

in_channels

Number of input channels for the convolutional layer.

TYPE: int

hidden_size

Number of output channels from the convolutional layer.

TYPE: int

patch_size

Size of the patch/kernel for the convolutional layer. Default is 7.

TYPE: int DEFAULT: 7

stride

Stride value for the convolution operation. Default is 4.

TYPE: int DEFAULT: 4

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, in_channels: int, hidden_size: int, patch_size: int = 7, stride: int = 4):
    """
    Initializes a VanOverlappingPatchEmbedder object.

    Args:
        self: The instance of the class.
        in_channels (int): Number of input channels for the convolutional layer.
        hidden_size (int): Number of output channels from the convolutional layer.
        patch_size (int, optional): Size of the patch/kernel for the convolutional layer. Default is 7.
        stride (int, optional): Stride value for the convolution operation. Default is 4.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.convolution = nn.Conv2d(
        in_channels, hidden_size, kernel_size=patch_size, stride=stride, padding=patch_size // 2
    )
    self.normalization = nn.BatchNorm2d(hidden_size)

mindnlp.transformers.models.van.modeling_van.VanOverlappingPatchEmbedder.forward(input)

Embeds the input tensor by applying the strided convolution followed by batch normalization.

PARAMETER DESCRIPTION
self

An instance of the VanOverlappingPatchEmbedder class.

TYPE: VanOverlappingPatchEmbedder

input

The input tensor to be processed. It should have shape (batch_size, channels, height, width).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The hidden state tensor obtained from the input tensor after applying convolution and normalization. Its channel dimension equals hidden_size, and its height and width are reduced by roughly a factor of stride relative to the input.

Note
  • The 'convolution' layer is applied to the input tensor to obtain an intermediate hidden state tensor.
  • The 'normalization' layer is then applied to the intermediate hidden state tensor to obtain the final hidden state tensor.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, input: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs a hidden state tensor using the provided input tensor.

    Args:
        self (VanOverlappingPatchEmbedder): An instance of the VanOverlappingPatchEmbedder class.
        input (mindspore.Tensor): The input tensor to be processed.
            It should have shape (batch_size, channels, height, width).

    Returns:
        mindspore.Tensor: The hidden state tensor obtained from the input tensor after applying convolution and
            normalization. Its channel dimension is hidden_size; height and width shrink by roughly the stride.

    Raises:
        None.

    Note:
        - The 'convolution' method is applied to the input tensor to obtain an intermediate hidden state tensor.
        - The 'normalization' method is then applied to the intermediate hidden state tensor to obtain the final
        hidden state tensor.
    """
    hidden_state = self.convolution(input)
    hidden_state = self.normalization(hidden_state)
    return hidden_state
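
The effect of the strided convolution on the spatial size can be worked out by hand. The sketch below is plain Python and assumes the convolution uses explicit symmetric padding of `patch_size // 2`, as in the constructor, together with the usual output-size formula; `conv_output_size` is a hypothetical helper. The image size, patch sizes, and strides are the VanConfig defaults.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution with explicit symmetric padding."""
    return (size + 2 * padding - kernel_size) // stride + 1


image_size = 224
patch_sizes = [7, 3, 3, 3]
strides = [4, 2, 2, 2]

sizes = []
size = image_size
for patch_size, stride in zip(patch_sizes, strides):
    size = conv_output_size(size, patch_size, stride, padding=patch_size // 2)
    sizes.append(size)

print(sizes)  # → [56, 28, 14, 7]
```

Under these assumptions the four embedders reduce a 224x224 image to a 7x7 feature map.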

mindnlp.transformers.models.van.modeling_van.VanPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = VanConfig
    base_model_prefix = "van"
    main_input_name = "pixel_values"
    supports_gradient_checkpointing = True

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, nn.Linear):
            nn.init.trunc_normal_(module.weight, std=self.config.initializer_range)
            if isinstance(module, nn.Linear) and module.bias is not None:
                nn.init.constant_(module.bias, 0)
        elif isinstance(module, nn.LayerNorm):
            nn.init.constant_(module.bias, 0)
            nn.init.constant_(module.weight, 1.0)
        elif isinstance(module, nn.Conv2d):
            fan_out = module.kernel_size[0] * module.kernel_size[1] * module.out_channels
            fan_out //= module.group
            module.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if module.bias is not None:
                module.bias.data.zero_()
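
The Conv2d branch draws weights from a normal distribution whose standard deviation depends on the fan-out. As a small sketch in plain Python, the same arithmetic can be reproduced as follows; `conv_init_std` is a hypothetical helper and the layer sizes are illustrative.

```python
import math


def conv_init_std(kernel_size, out_channels, groups=1):
    """Standard deviation sqrt(2 / fan_out), with fan_out divided by the number of groups."""
    fan_out = kernel_size[0] * kernel_size[1] * out_channels
    fan_out //= groups
    return math.sqrt(2.0 / fan_out)


pointwise_std = conv_init_std((1, 1), out_channels=64)                # 1x1 projection
depth_wise_std = conv_init_std((3, 3), out_channels=512, groups=512)  # 3x3 depth-wise

print(round(pointwise_std, 4), round(depth_wise_std, 4))
```

Dividing by the group count keeps the fan-out of a depth-wise convolution at the kernel area, so its weights start with a larger spread than those of a wide 1x1 projection.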

mindnlp.transformers.models.van.modeling_van.VanSpatialAttentionLayer

Bases: Module

Van spatial attention layer composed of projection (via conv) -> act -> Large Kernel Attention (LKA) -> projection (via conv) + residual connection.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanSpatialAttentionLayer(nn.Module):
    """
    Van spatial attention layer composed of projection (via conv) -> act -> Large Kernel Attention (LKA) ->
    projection (via conv) + residual connection.
    """
    def __init__(self, hidden_size: int, hidden_act: str = "gelu"):
        """
        Initializes an instance of the VanSpatialAttentionLayer class.

        Args:
            hidden_size (int): The size of the hidden layer.
            hidden_act (str, optional): The activation function to be used in the pre_projection layer. Defaults to 'gelu'.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.pre_projection = nn.Sequential(
            OrderedDict(
                [
                    ("conv", nn.Conv2d(hidden_size, hidden_size, kernel_size=1)),
                    ("act", ACT2FN[hidden_act]),
                ]
            )
        )
        self.attention_layer = VanLargeKernelAttentionLayer(hidden_size)
        self.post_projection = nn.Conv2d(hidden_size, hidden_size, kernel_size=1)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        This method forwards a spatial attention layer in the VanSpatialAttentionLayer class.

        Args:
            self: The instance of the VanSpatialAttentionLayer class.
            hidden_state (mindspore.Tensor): The input hidden state tensor to be processed.
                It represents the feature map of the input data and should be a tensor of shape
                [batch_size, channels, height, width].

        Returns:
            mindspore.Tensor: The processed hidden state tensor after applying the spatial attention mechanism.
                It has the same shape as the input hidden_state tensor.

        Raises:
            ValueError: If the input hidden_state tensor is not a valid mindspore.Tensor.
            RuntimeError: If an error occurs during the processing of the spatial attention mechanism.
        """
        residual = hidden_state
        hidden_state = self.pre_projection(hidden_state)
        hidden_state = self.attention_layer(hidden_state)
        hidden_state = self.post_projection(hidden_state)
        hidden_state = hidden_state + residual
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanSpatialAttentionLayer.__init__(hidden_size, hidden_act='gelu')

Initializes an instance of the VanSpatialAttentionLayer class.

PARAMETER DESCRIPTION
hidden_size

The size of the hidden layer.

TYPE: int

hidden_act

The activation function to be used in the pre_projection layer. Defaults to 'gelu'.

TYPE: str DEFAULT: 'gelu'

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(self, hidden_size: int, hidden_act: str = "gelu"):
    """
    Initializes an instance of the VanSpatialAttentionLayer class.

    Args:
        hidden_size (int): The size of the hidden layer.
        hidden_act (str, optional): The activation function to be used in the pre_projection layer. Defaults to 'gelu'.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.pre_projection = nn.Sequential(
        OrderedDict(
            [
                ("conv", nn.Conv2d(hidden_size, hidden_size, kernel_size=1)),
                ("act", ACT2FN[hidden_act]),
            ]
        )
    )
    self.attention_layer = VanLargeKernelAttentionLayer(hidden_size)
    self.post_projection = nn.Conv2d(hidden_size, hidden_size, kernel_size=1)

mindnlp.transformers.models.van.modeling_van.VanSpatialAttentionLayer.forward(hidden_state)

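Applies the spatial attention block: pre-projection (1x1 convolution and activation), large kernel attention, post-projection (1x1 convolution), and a residual connection that adds the input back.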

PARAMETER DESCRIPTION
self

The instance of the VanSpatialAttentionLayer class.

hidden_state

The input hidden state tensor to be processed. It represents the feature map of the input data and should be a tensor of shape [batch_size, channels, height, width].

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The processed hidden state tensor after applying the spatial attention mechanism. It has the same shape as the input hidden_state tensor.

RAISES DESCRIPTION
ValueError

If the input hidden_state tensor is not a valid mindspore.Tensor.

RuntimeError

If an error occurs during the processing of the spatial attention mechanism.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    This method forwards a spatial attention layer in the VanSpatialAttentionLayer class.

    Args:
        self: The instance of the VanSpatialAttentionLayer class.
        hidden_state (mindspore.Tensor): The input hidden state tensor to be processed.
            It represents the feature map of the input data and should be a tensor of shape
            [batch_size, channels, height, width].

    Returns:
        mindspore.Tensor: The processed hidden state tensor after applying the spatial attention mechanism.
            It has the same shape as the input hidden_state tensor.

    Raises:
        ValueError: If the input hidden_state tensor is not a valid mindspore.Tensor.
        RuntimeError: If an error occurs during the processing of the spatial attention mechanism.
    """
    residual = hidden_state
    hidden_state = self.pre_projection(hidden_state)
    hidden_state = self.attention_layer(hidden_state)
    hidden_state = self.post_projection(hidden_state)
    hidden_state = hidden_state + residual
    return hidden_state
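
The control flow of this method is a standard residual wrapper around a three-step branch. The following sketch, in plain Python, replaces the real layers with hypothetical stand-in functions on flat lists, purely to show how the residual is added back; none of these names exist in MindNLP.

```python
def spatial_attention_block(hidden_state, pre_projection, attention_layer, post_projection):
    """Apply pre-projection, attention, and post-projection, then add the input back."""
    residual = hidden_state
    hidden_state = pre_projection(hidden_state)
    hidden_state = attention_layer(hidden_state)
    hidden_state = post_projection(hidden_state)
    return [out + res for out, res in zip(hidden_state, residual)]


# Hypothetical stand-ins for the real layers.
def double(values):
    return [2.0 * v for v in values]


def identity(values):
    return list(values)


def zero(values):
    return [0.0 for _ in values]


with_branch = spatial_attention_block([1.0, 2.0], double, identity, double)
without_branch = spatial_attention_block([1.0, 2.0], double, identity, zero)
print(with_branch, without_branch)
```

When the branch outputs zeros, the block returns its input unchanged, which is the property a residual connection is meant to provide.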

mindnlp.transformers.models.van.modeling_van.VanStage

Bases: Module

A VAN stage, consisting of an overlapping patch embedder, a sequence of depth VanLayer blocks, and a final layer normalization.

Source code in mindnlp/transformers/models/van/modeling_van.py
class VanStage(nn.Module):
    """
    VanStage, consisting of multiple layers.
    """
    def __init__(
        self,
        config: VanConfig,
        in_channels: int,
        hidden_size: int,
        patch_size: int,
        stride: int,
        depth: int,
        mlp_ratio: int = 4,
        drop_path_rate: float = 0.0,
    ):
        """
        __init__

        Initializes a new instance of the VanStage class.

        Args:
            self: The current object instance.
            config (VanConfig): An instance of VanConfig class containing configuration parameters.
            in_channels (int): The number of input channels.
            hidden_size (int): The size of the hidden layer.
            patch_size (int): The size of the patch.
            stride (int): The stride for patching.
            depth (int): The depth of the network.
            mlp_ratio (int, optional): The ratio for the multi-layer perceptron. Defaults to 4.
            drop_path_rate (float, optional): The rate for drop path regularization. Defaults to 0.0.

        Returns:
            None.

        Raises:
            TypeError: If any of the input arguments does not match the expected type.
            ValueError: If any of the input arguments does not meet the specified restrictions.
        """
        super().__init__()
        self.embeddings = VanOverlappingPatchEmbedder(in_channels, hidden_size, patch_size, stride)
        self.layers = nn.Sequential(
            *[
                VanLayer(
                    config,
                    hidden_size,
                    mlp_ratio=mlp_ratio,
                    drop_path_rate=drop_path_rate,
                )
                for _ in range(depth)
            ]
        )
        self.normalization = nn.LayerNorm(hidden_size, eps=config.layer_norm_eps)

    def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
        """
        Embeds the input into overlapping patches, applies the stage's layers, and layer-normalizes the result.

        Args:
            self: An instance of the VanStage class.
            hidden_state (mindspore.Tensor): The input feature map.
                It should have a shape of (batch_size, in_channels, height, width).

        Returns:
            mindspore.Tensor: The output feature map of the stage. It has a shape of
                (batch_size, hidden_size, new_height, new_width), with the spatial size reduced by the embedding stride.

        Raises:
            None.
        """
        hidden_state = self.embeddings(hidden_state)
        hidden_state = self.layers(hidden_state)
        # rearrange b c h w -> b (h w) c
        batch_size, hidden_size, height, width = hidden_state.shape
        hidden_state = hidden_state.flatten(2).transpose(1, 2)
        hidden_state = self.normalization(hidden_state)
        # rearrange b (h w) c -> b c h w
        hidden_state = hidden_state.view(batch_size, height, width, hidden_size).permute(0, 3, 1, 2)
        return hidden_state

mindnlp.transformers.models.van.modeling_van.VanStage.__init__(config, in_channels, hidden_size, patch_size, stride, depth, mlp_ratio=4, drop_path_rate=0.0)

__init__

Initializes a new instance of the VanStage class.

PARAMETER DESCRIPTION
self

The current object instance.

config

An instance of VanConfig class containing configuration parameters.

TYPE: VanConfig

in_channels

The number of input channels.

TYPE: int

hidden_size

The dimensionality (hidden size) of the stage, i.e. the number of channels it outputs.

TYPE: int

patch_size

The patch size used by the stage's overlapping patch embedding layer.

TYPE: int

stride

The stride of the stage's embedding layer, used to downsample the input.

TYPE: int

depth

The number of VanLayer blocks in the stage.

TYPE: int

mlp_ratio

The ratio for the multi-layer perceptron. Defaults to 4.

TYPE: int DEFAULT: 4

drop_path_rate

The rate for drop path regularization. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If any of the input arguments does not match the expected type.

ValueError

If any of the input arguments does not meet the specified restrictions.

Source code in mindnlp/transformers/models/van/modeling_van.py
def __init__(
    self,
    config: VanConfig,
    in_channels: int,
    hidden_size: int,
    patch_size: int,
    stride: int,
    depth: int,
    mlp_ratio: int = 4,
    drop_path_rate: float = 0.0,
):
    """
    __init__

    Initializes a new instance of the VanStage class.

    Args:
        self: The current object instance.
        config (VanConfig): An instance of VanConfig class containing configuration parameters.
        in_channels (int): The number of input channels.
        hidden_size (int): The size of the hidden layer.
        patch_size (int): The size of the patch.
        stride (int): The stride for patching.
        depth (int): The depth of the network.
        mlp_ratio (int, optional): The ratio for the multi-layer perceptron. Defaults to 4.
        drop_path_rate (float, optional): The rate for drop path regularization. Defaults to 0.0.

    Returns:
        None.

    Raises:
        TypeError: If any of the input arguments does not match the expected type.
        ValueError: If any of the input arguments does not meet the specified restrictions.
    """
    super().__init__()
    self.embeddings = VanOverlappingPatchEmbedder(in_channels, hidden_size, patch_size, stride)
    self.layers = nn.Sequential(
        *[
            VanLayer(
                config,
                hidden_size,
                mlp_ratio=mlp_ratio,
                drop_path_rate=drop_path_rate,
            )
            for _ in range(depth)
        ]
    )
    self.normalization = nn.LayerNorm(hidden_size, eps=config.layer_norm_eps)
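
To see how the per-stage arguments fit together, the sketch below traces the feature-map shape through four stages using the default `patch_sizes`, `strides`, `hidden_sizes`, and `image_size` from `VanConfig`. It assumes that the overlapping patch embedder behaves like a strided convolution with padding `patch_size // 2`; that padding is an assumption and is not confirmed by the source shown here. The helper `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Standard output-length formula for a strided convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Defaults documented for VanConfig.
image_size = 224
patch_sizes = [7, 3, 3, 3]
strides = [4, 2, 2, 2]
hidden_sizes = [64, 128, 320, 512]

size = image_size
stage_shapes = []
for patch_size, stride, hidden_size in zip(patch_sizes, strides, hidden_sizes):
    # Assumed padding: half the patch size, so neighbouring patches overlap.
    size = conv_output_size(size, patch_size, stride, padding=patch_size // 2)
    stage_shapes.append((hidden_size, size, size))

for index, shape in enumerate(stage_shapes):
    print(f"stage {index}: (channels, height, width) = {shape}")
# stage 0: (channels, height, width) = (64, 56, 56)
# stage 1: (channels, height, width) = (128, 28, 28)
# stage 2: (channels, height, width) = (320, 14, 14)
# stage 3: (channels, height, width) = (512, 7, 7)
```

Under that assumption, each stage's `hidden_size` becomes the `in_channels` of the next stage, while the spatial size shrinks by the stride.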

mindnlp.transformers.models.van.modeling_van.VanStage.forward(hidden_state)

Embeds the input into overlapping patches, applies the stage's layers, and layer-normalizes the result.

PARAMETER DESCRIPTION
self

An instance of the VanStage class.

hidden_state

The input feature map. It should have a shape of (batch_size, in_channels, height, width).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The output feature map of the stage. It has a shape of (batch_size, hidden_size, new_height, new_width), with the spatial size reduced by the embedding stride.

Source code in mindnlp/transformers/models/van/modeling_van.py
def forward(self, hidden_state: mindspore.Tensor) -> mindspore.Tensor:
    """
    Embeds the input into overlapping patches, applies the stage's layers, and layer-normalizes the result.

    Args:
        self: An instance of the VanStage class.
        hidden_state (mindspore.Tensor): The input feature map.
            It should have a shape of (batch_size, in_channels, height, width).

    Returns:
        mindspore.Tensor: The output feature map of the stage. It has a shape of
            (batch_size, hidden_size, new_height, new_width), with the spatial size reduced by the embedding stride.

    Raises:
        None.
    """
    hidden_state = self.embeddings(hidden_state)
    hidden_state = self.layers(hidden_state)
    # rearrange b c h w -> b (h w) c
    batch_size, hidden_size, height, width = hidden_state.shape
    hidden_state = hidden_state.flatten(2).transpose(1, 2)
    hidden_state = self.normalization(hidden_state)
    # rearrange b (h w) c -> b c h w
    hidden_state = hidden_state.view(batch_size, height, width, hidden_size).permute(0, 3, 1, 2)
    return hidden_state
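
The two rearrangements around the normalization exist because layer normalization acts on the last axis, while the feature map keeps channels on axis 1. The sketch below reproduces that round trip in NumPy rather than MindSpore, and without the learned scale and shift of a real `LayerNorm`. The function name `channel_layer_norm` and the `eps` value are assumptions made for illustration.

```python
import numpy as np


def channel_layer_norm(x, eps=1e-6):
    """Normalize each pixel's channel vector of a (b, c, h, w) array."""
    batch_size, hidden_size, height, width = x.shape
    # b c h w -> b (h w) c: one "token" per spatial position
    tokens = x.reshape(batch_size, hidden_size, height * width).swapaxes(1, 2)
    mean = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    tokens = (tokens - mean) / np.sqrt(var + eps)
    # b (h w) c -> b c h w: restore the channels-first layout
    return tokens.reshape(batch_size, height, width, hidden_size).transpose(0, 3, 1, 2)


x = np.random.default_rng(0).normal(size=(2, 8, 3, 5))
y = channel_layer_norm(x)

print(y.shape)  # same layout as the input: (2, 8, 3, 5)
# Every spatial position now has a zero-mean channel vector.
print(np.allclose(y.mean(axis=1), 0.0, atol=1e-6))
```

Note that the reshape back goes through `(batch, height, width, channels)` before the final permutation. Reshaping the tokens directly to `(batch, channels, height, width)` would scramble channels and positions.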

mindnlp.transformers.models.van.modeling_van.drop_path(input, drop_prob=0.0, training=False)

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the argument.

Source code in mindnlp/transformers/models/van/modeling_van.py
def drop_path(input: mindspore.Tensor, drop_prob: float = 0.0, training: bool = False) -> mindspore.Tensor:
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    Comment by Ross Wightman: This is the same as the DropConnect impl I created for EfficientNet, etc networks,
    however, the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for changing the
    layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 'survival rate' as the
    argument.
    """
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + ops.rand(shape, dtype=input.dtype)
    random_tensor = random_tensor.floor()  # binarize
    output = input.div(keep_prob) * random_tensor
    return output
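
The masking logic above can be checked in isolation. The following is a NumPy re-expression of the same steps, offered as a sketch rather than the library function: `drop_path_np` and its `rng` argument are hypothetical, added only so the example runs without MindSpore and can be seeded.

```python
import numpy as np


def drop_path_np(x, drop_prob=0.0, training=False, rng=None):
    """NumPy sketch of per-sample stochastic depth."""
    if drop_prob == 0.0 or not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # One random draw per sample, broadcast over all remaining dimensions.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, else 0.
    mask = np.floor(keep_prob + rng.random(shape))
    # Surviving samples are scaled up so the expected value is unchanged.
    return x / keep_prob * mask


x = np.ones((1000, 4, 2, 2), dtype=np.float32)
out = drop_path_np(x, drop_prob=0.25, training=True, rng=np.random.default_rng(0))

kept_fraction = float((out[:, 0, 0, 0] != 0).mean())
print(f"kept fraction: {kept_fraction:.3f}")  # close to keep_prob = 0.75
print(f"mean of output: {out.mean():.3f}")    # close to the input mean of 1.0
```

Each sample is either zeroed entirely or scaled by `1 / keep_prob`, so the residual branch is dropped per sample, not per element. Outside training the input is returned unchanged.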