
albert

mindnlp.transformers.models.albert.configuration_albert

ALBERT model configuration

mindnlp.transformers.models.albert.configuration_albert.AlbertConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of an [AlbertModel] or a [TFAlbertModel]. It is used to instantiate an ALBERT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the ALBERT albert-xxlarge-v2 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the ALBERT model. Defines the number of different tokens that can be represented by the input_ids passed when calling [AlbertModel] or [TFAlbertModel].

TYPE: `int`, *optional*, defaults to 30000 DEFAULT: 30000

embedding_size

Dimensionality of vocabulary embeddings.

TYPE: `int`, *optional*, defaults to 128 DEFAULT: 128

hidden_size

Dimensionality of the encoder layers and the pooler layer.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

num_hidden_layers

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

num_hidden_groups

Number of groups for the hidden layers; parameters within the same group are shared (see the sketch after this parameter list).

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

num_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 64 DEFAULT: 64

intermediate_size

The dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 16384 DEFAULT: 16384

inner_group_num

The number of inner repetitions of attention and FFN (feed-forward) blocks within each hidden group.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

hidden_act

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `Callable`, *optional*, defaults to `"gelu_new"` DEFAULT: 'gelu_new'

hidden_dropout_prob

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0 DEFAULT: 0

attention_probs_dropout_prob

The dropout ratio for the attention probabilities.

TYPE: `float`, *optional*, defaults to 0 DEFAULT: 0

max_position_embeddings

The maximum sequence length that this model might ever be used with. Typically set this to something large (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

type_vocab_size

The vocabulary size of the token_type_ids passed when calling [AlbertModel] or [TFAlbertModel].

TYPE: `int`, *optional*, defaults to 2 DEFAULT: 2

initializer_range

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-12 DEFAULT: 1e-12

classifier_dropout_prob

The dropout ratio for attached classifiers.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

position_embedding_type

Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).

TYPE: `str`, *optional*, defaults to `"absolute"` DEFAULT: 'absolute'

pad_token_id

Padding token id.

TYPE: `int`, *optional*, defaults to 0 DEFAULT: 0

bos_token_id

Beginning of stream token id.

TYPE: `int`, *optional*, defaults to 2 DEFAULT: 2

eos_token_id

End of stream token id.

TYPE: `int`, *optional*, defaults to 3 DEFAULT: 3
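
The interaction between num_hidden_layers, num_hidden_groups and inner_group_num is what gives ALBERT its cross-layer parameter sharing. The following plain-Python sketch is illustrative only and assumes the usual ALBERT layer-to-group mapping group_idx = int(layer_idx / (num_hidden_layers / num_hidden_groups)):

```python
# Illustrative sketch of ALBERT's layer-to-group mapping (assumed formula, no framework needed).
num_hidden_layers = 12   # encoder steps executed
num_hidden_groups = 1    # distinct sets of shared parameters
inner_group_num = 1      # attention + FFN repetitions inside each group

for layer_idx in range(num_hidden_layers):
    # With num_hidden_groups=1 every step maps to group 0, i.e. all steps reuse the same weights.
    group_idx = int(layer_idx / (num_hidden_layers / num_hidden_groups))
    print(f"encoder step {layer_idx:2d} -> parameter group {group_idx}")
```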

Example
>>> from transformers import AlbertConfig, AlbertModel
...
>>> # Initializing an ALBERT-xxlarge style configuration
>>> albert_xxlarge_configuration = AlbertConfig()
...
>>> # Initializing an ALBERT-base style configuration
>>> albert_base_configuration = AlbertConfig(
...     hidden_size=768,
...     num_attention_heads=12,
...     intermediate_size=3072,
... )
...
>>> # Initializing a model (with random weights) from the ALBERT-base style configuration
>>> model = AlbertModel(albert_base_configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
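
Note that the docstring example above imports from transformers; when working with this package, the same classes should be importable under the mindnlp.transformers paths shown on this page. A minimal sketch, assuming mindnlp mirrors the Hugging Face API:

```python
# Minimal sketch assuming the mindnlp package mirrors the Hugging Face API
# (module paths taken from this page).
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertModel

# An ALBERT-base style configuration (much smaller than the xxlarge defaults).
config = AlbertConfig(hidden_size=768, num_attention_heads=12, intermediate_size=3072)

# Randomly initialized model built from that configuration.
model = AlbertModel(config)
print(model.config.hidden_size)  # 768
```
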
Source code in mindnlp/transformers/models/albert/configuration_albert.py
class AlbertConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`AlbertModel`] or a [`TFAlbertModel`]. It is used
    to instantiate an ALBERT model according to the specified arguments, defining the model architecture. Instantiating
    a configuration with the defaults will yield a similar configuration to that of the ALBERT
    [albert-xxlarge-v2](https://hf-mirror.com/albert-xxlarge-v2) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30000):
            Vocabulary size of the ALBERT model. Defines the number of different tokens that can be represented by the
            `input_ids` passed when calling [`AlbertModel`] or [`TFAlbertModel`].
        embedding_size (`int`, *optional*, defaults to 128):
            Dimensionality of vocabulary embeddings.
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimensionality of the encoder layers and the pooler layer.
        num_hidden_layers (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_hidden_groups (`int`, *optional*, defaults to 1):
            Number of groups for the hidden layers; parameters within the same group are shared.
        num_attention_heads (`int`, *optional*, defaults to 64):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (`int`, *optional*, defaults to 16384):
            The dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
        inner_group_num (`int`, *optional*, defaults to 1):
            The number of inner repetitions of attention and FFN (feed-forward) blocks within each hidden group.
        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu_new"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        hidden_dropout_prob (`float`, *optional*, defaults to 0):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_probs_dropout_prob (`float`, *optional*, defaults to 0):
            The dropout ratio for the attention probabilities.
        max_position_embeddings (`int`, *optional*, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            (e.g., 512 or 1024 or 2048).
        type_vocab_size (`int`, *optional*, defaults to 2):
            The vocabulary size of the `token_type_ids` passed when calling [`AlbertModel`] or [`TFAlbertModel`].
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        classifier_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout ratio for attached classifiers.
        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
            positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
            [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
        pad_token_id (`int`, *optional*, defaults to 0):
            Padding token id.
        bos_token_id (`int`, *optional*, defaults to 2):
            Beginning of stream token id.
        eos_token_id (`int`, *optional*, defaults to 3):
            End of stream token id.

    Example:
        ```python
        >>> from transformers import AlbertConfig, AlbertModel
        ...
        >>> # Initializing an ALBERT-xxlarge style configuration
        >>> albert_xxlarge_configuration = AlbertConfig()
        ...
        >>> # Initializing an ALBERT-base style configuration
        >>> albert_base_configuration = AlbertConfig(
        ...     hidden_size=768,
        ...     num_attention_heads=12,
        ...     intermediate_size=3072,
        ... )
        ...
        >>> # Initializing a model (with random weights) from the ALBERT-base style configuration
        >>> model = AlbertModel(albert_base_configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "albert"

    def __init__(
        self,
        vocab_size=30000,
        embedding_size=128,
        hidden_size=4096,
        num_hidden_layers=12,
        num_hidden_groups=1,
        num_attention_heads=64,
        intermediate_size=16384,
        inner_group_num=1,
        hidden_act="gelu_new",
        hidden_dropout_prob=0,
        attention_probs_dropout_prob=0,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        classifier_dropout_prob=0.1,
        position_embedding_type="absolute",
        pad_token_id=0,
        bos_token_id=2,
        eos_token_id=3,
        **kwargs,
    ):
        """
        __init__

        Initializes an instance of AlbertConfig.

        Args:
            self: The instance of the class.
            vocab_size (int, optional): The vocabulary size. Defaults to 30000.
            embedding_size (int, optional): The size of word embeddings. Defaults to 128.
            hidden_size (int, optional): The size of hidden layers. Defaults to 4096.
            num_hidden_layers (int, optional): The number of hidden layers. Defaults to 12.
            num_hidden_groups (int, optional): The number of hidden groups. Defaults to 1.
            num_attention_heads (int, optional): The number of attention heads. Defaults to 64.
            intermediate_size (int, optional): The size of intermediate layers. Defaults to 16384.
            inner_group_num (int, optional): The number of inner groups. Defaults to 1.
            hidden_act (str, optional): The activation function for hidden layers. Defaults to 'gelu_new'.
            hidden_dropout_prob (float, optional): The dropout probability for hidden layers. Defaults to 0.
            attention_probs_dropout_prob (float, optional): The dropout probability for attention probabilities. Defaults to 0.
            max_position_embeddings (int, optional): The maximum position for embeddings. Defaults to 512.
            type_vocab_size (int, optional): The size of the type vocabulary. Defaults to 2.
            initializer_range (float, optional): The range for weight initialization. Defaults to 0.02.
            layer_norm_eps (float, optional): The epsilon value for layer normalization. Defaults to 1e-12.
            classifier_dropout_prob (float, optional): The dropout probability for the classifier. Defaults to 0.1.
            position_embedding_type (str, optional): The type of position embedding. Defaults to 'absolute'.
            pad_token_id (int, optional): The ID for padding token. Defaults to 0.
            bos_token_id (int, optional): The ID for beginning of sequence token. Defaults to 2.
            eos_token_id (int, optional): The ID for end of sequence token. Defaults to 3.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_hidden_groups = num_hidden_groups
        self.num_attention_heads = num_attention_heads
        self.inner_group_num = inner_group_num
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.classifier_dropout_prob = classifier_dropout_prob
        self.position_embedding_type = position_embedding_type

mindnlp.transformers.models.albert.configuration_albert.AlbertConfig.__init__(vocab_size=30000, embedding_size=128, hidden_size=4096, num_hidden_layers=12, num_hidden_groups=1, num_attention_heads=64, intermediate_size=16384, inner_group_num=1, hidden_act='gelu_new', hidden_dropout_prob=0, attention_probs_dropout_prob=0, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, classifier_dropout_prob=0.1, position_embedding_type='absolute', pad_token_id=0, bos_token_id=2, eos_token_id=3, **kwargs)

__init__

Initializes an instance of AlbertConfig.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The vocabulary size. Defaults to 30000.

TYPE: int DEFAULT: 30000

embedding_size

The size of word embeddings. Defaults to 128.

TYPE: int DEFAULT: 128

hidden_size

The size of hidden layers. Defaults to 4096.

TYPE: int DEFAULT: 4096

num_hidden_layers

The number of hidden layers. Defaults to 12.

TYPE: int DEFAULT: 12

num_hidden_groups

The number of hidden groups. Defaults to 1.

TYPE: int DEFAULT: 1

num_attention_heads

The number of attention heads. Defaults to 64.

TYPE: int DEFAULT: 64

intermediate_size

The size of intermediate layers. Defaults to 16384.

TYPE: int DEFAULT: 16384

inner_group_num

The number of inner groups. Defaults to 1.

TYPE: int DEFAULT: 1

hidden_act

The activation function for hidden layers. Defaults to 'gelu_new'.

TYPE: str DEFAULT: 'gelu_new'

hidden_dropout_prob

The dropout probability for hidden layers. Defaults to 0.

TYPE: float DEFAULT: 0

attention_probs_dropout_prob

The dropout probability for attention probabilities. Defaults to 0.

TYPE: float DEFAULT: 0

max_position_embeddings

The maximum position for embeddings. Defaults to 512.

TYPE: int DEFAULT: 512

type_vocab_size

The size of the type vocabulary. Defaults to 2.

TYPE: int DEFAULT: 2

initializer_range

The range for weight initialization. Defaults to 0.02.

TYPE: float DEFAULT: 0.02

layer_norm_eps

The epsilon value for layer normalization. Defaults to 1e-12.

TYPE: float DEFAULT: 1e-12

classifier_dropout_prob

The dropout probability for the classifier. Defaults to 0.1.

TYPE: float DEFAULT: 0.1

position_embedding_type

The type of position embedding. Defaults to 'absolute'.

TYPE: str DEFAULT: 'absolute'

pad_token_id

The ID for padding token. Defaults to 0.

TYPE: int DEFAULT: 0

bos_token_id

The ID for beginning of sequence token. Defaults to 2.

TYPE: int DEFAULT: 2

eos_token_id

The ID for end of sequence token. Defaults to 3.

TYPE: int DEFAULT: 3

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/albert/configuration_albert.py
def __init__(
    self,
    vocab_size=30000,
    embedding_size=128,
    hidden_size=4096,
    num_hidden_layers=12,
    num_hidden_groups=1,
    num_attention_heads=64,
    intermediate_size=16384,
    inner_group_num=1,
    hidden_act="gelu_new",
    hidden_dropout_prob=0,
    attention_probs_dropout_prob=0,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    classifier_dropout_prob=0.1,
    position_embedding_type="absolute",
    pad_token_id=0,
    bos_token_id=2,
    eos_token_id=3,
    **kwargs,
):
    """
    __init__

    Initializes an instance of AlbertConfig.

    Args:
        self: The instance of the class.
        vocab_size (int, optional): The vocabulary size. Defaults to 30000.
        embedding_size (int, optional): The size of word embeddings. Defaults to 128.
        hidden_size (int, optional): The size of hidden layers. Defaults to 4096.
        num_hidden_layers (int, optional): The number of hidden layers. Defaults to 12.
        num_hidden_groups (int, optional): The number of hidden groups. Defaults to 1.
        num_attention_heads (int, optional): The number of attention heads. Defaults to 64.
        intermediate_size (int, optional): The size of intermediate layers. Defaults to 16384.
        inner_group_num (int, optional): The number of inner groups. Defaults to 1.
        hidden_act (str, optional): The activation function for hidden layers. Defaults to 'gelu_new'.
        hidden_dropout_prob (float, optional): The dropout probability for hidden layers. Defaults to 0.
        attention_probs_dropout_prob (float, optional): The dropout probability for attention probabilities. Defaults to 0.
        max_position_embeddings (int, optional): The maximum position for embeddings. Defaults to 512.
        type_vocab_size (int, optional): The size of the type vocabulary. Defaults to 2.
        initializer_range (float, optional): The range for weight initialization. Defaults to 0.02.
        layer_norm_eps (float, optional): The epsilon value for layer normalization. Defaults to 1e-12.
        classifier_dropout_prob (float, optional): The dropout probability for the classifier. Defaults to 0.1.
        position_embedding_type (str, optional): The type of position embedding. Defaults to 'absolute'.
        pad_token_id (int, optional): The ID for padding token. Defaults to 0.
        bos_token_id (int, optional): The ID for beginning of sequence token. Defaults to 2.
        eos_token_id (int, optional): The ID for end of sequence token. Defaults to 3.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

    self.vocab_size = vocab_size
    self.embedding_size = embedding_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_hidden_groups = num_hidden_groups
    self.num_attention_heads = num_attention_heads
    self.inner_group_num = inner_group_num
    self.hidden_act = hidden_act
    self.intermediate_size = intermediate_size
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.type_vocab_size = type_vocab_size
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.classifier_dropout_prob = classifier_dropout_prob
    self.position_embedding_type = position_embedding_type

mindnlp.transformers.models.albert.modeling_albert

MindSpore ALBERT model.

mindnlp.transformers.models.albert.modeling_albert.AlbertAttention

Bases: Module

A class representing the attention mechanism for the ALBERT (A Lite BERT) model.

This class implements the attention mechanism used in the ALBERT model for processing input sequences. It includes methods for processing queries, keys, and values, calculating attention scores, applying attention masks, handling position embeddings, and generating the final contextualized output.

This class inherits from the nn.Module class and contains the following methods:

  • __init__(self, config: AlbertConfig): Initializes the AlbertAttention instance with the provided configuration.
  • transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor: Transposes the input tensor for calculating attention scores.
  • prune_heads(self, heads: List[int]) -> None: Prunes specific attention heads from the model.
  • forward(self, hidden_states: mindspore.Tensor, attention_mask: Optional[mindspore.Tensor] = None, head_mask: Optional[mindspore.Tensor] = None, output_attentions: bool = False) -> Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]: Constructs the output based on the input hidden states, applying attention and head masks if provided.

The AlbertAttention class is a crucial component in the ALBERT model architecture, responsible for capturing interactions between tokens in the input sequence to generate contextualized representations.
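
A usage sketch (not taken from the source) that runs AlbertAttention on a dummy batch, assuming a small ALBERT-base style configuration so the shapes stay readable:

```python
import mindspore
from mindspore import ops
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertAttention

# Small configuration so attention_head_size = 768 / 12 = 64.
config = AlbertConfig(hidden_size=768, num_attention_heads=12, intermediate_size=3072)
attention = AlbertAttention(config)

# Dummy hidden states: (batch_size=2, seq_length=8, hidden_size=768).
hidden_states = ops.ones((2, 8, config.hidden_size), mindspore.float32)

# With output_attentions=True the returned tuple also carries the attention probabilities.
layernormed_context, attention_probs = attention(hidden_states, output_attentions=True)
print(layernormed_context.shape)  # (2, 8, 768)
print(attention_probs.shape)      # (2, 12, 8, 8)
```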

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertAttention(nn.Module):

    """
    A class representing the attention mechanism for the ALBERT (A Lite BERT) model.

    This class implements the attention mechanism used in the ALBERT model for processing input sequences.
    It includes methods for processing queries, keys, and values, calculating attention scores, applying attention masks,
    handling position embeddings, and generating the final contextualized output.

    This class inherits from the nn.Module class and contains the following methods:

    - __init__(self, config: AlbertConfig): Initializes the AlbertAttention instance with the provided configuration.
    - transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor: Transposes the input tensor for calculating attention scores.
    - prune_heads(self, heads: List[int]) -> None: Prunes specific attention heads from the model.
    - forward(self, hidden_states: mindspore.Tensor, attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None, output_attentions: bool = False) ->
    Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]: Constructs the output based on the input
    hidden states, applying attention and head masks if provided.

    The AlbertAttention class is a crucial component in the ALBERT model architecture, responsible for capturing
    interactions between tokens in the input sequence to generate contextualized representations.

    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertAttention class.

        Args:
            self: The instance of the class.
            config (AlbertConfig):
                An object of type AlbertConfig containing configuration parameters for the Albert model.

                - config.hidden_size (int): The size of the hidden layers in the model.
                - config.num_attention_heads (int): The number of attention heads in the model.
                - config.embedding_size (int, optional): The size of the embeddings in the model.
                - config.attention_probs_dropout_prob (float): The dropout probability for attention probabilities.
                - config.hidden_dropout_prob (float): The dropout probability for hidden layers.
                - config.layer_norm_eps (float): The epsilon value for LayerNorm.
                - config.position_embedding_type (str): The type of position embedding ('absolute', 'relative_key', 'relative_key_query').
                - config.max_position_embeddings (int): The maximum position embeddings allowed.

        Returns:
            None.

        Raises:
            ValueError: Raised if the hidden_size is not a multiple of the num_attention_heads and no embedding_size is provided.
        """
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
                f"heads ({config.num_attention_heads}"
            )

        self.num_attention_heads = config.num_attention_heads
        self.hidden_size = config.hidden_size
        self.attention_head_size = config.hidden_size // config.num_attention_heads
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size)
        self.key = nn.Linear(config.hidden_size, self.all_head_size)
        self.value = nn.Linear(config.hidden_size, self.all_head_size)

        self.attention_dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
        self.output_dropout = nn.Dropout(p=config.hidden_dropout_prob)
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.pruned_heads = set()

        self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

    # Copied from transformers.models.bert.modeling_bert.BertSelfAttention.transpose_for_scores
    def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
        """
        Transpose the input tensor for calculating attention scores in the AlbertAttention class.

        Args:
            self (AlbertAttention): The instance of the AlbertAttention class.
            x (mindspore.Tensor): The input tensor to be transposed. It should have a shape of (batch_size, sequence_length, hidden_size).

        Returns:
            mindspore.Tensor:
                The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).
                The attention_head_size is calculated as hidden_size / num_attention_heads.

        Raises:
            None.
        """
        new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(new_x_shape)
        return x.permute(0, 2, 1, 3)

    def prune_heads(self, heads: List[int]) -> None:
        """
        This method prunes specific attention heads from the AlbertAttention class.

        Args:
            self: The instance of the AlbertAttention class.
            heads (List[int]): A list of integers representing the attention heads to be pruned. If the list is empty, no action is taken.

        Returns:
            None: This method does not return any value, it modifies the internal state of the AlbertAttention instance.

        Raises:
            None
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.num_attention_heads, self.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.query = prune_linear_layer(self.query, index)
        self.key = prune_linear_layer(self.key, index)
        self.value = prune_linear_layer(self.value, index)
        self.dense = prune_linear_layer(self.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.num_attention_heads = self.num_attention_heads - len(heads)
        self.all_head_size = self.attention_head_size * self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        output_attentions: bool = False,
    ) -> Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]:
        '''
        Constructs the attention mechanism for the Albert model.

        Args:
            self (AlbertAttention): An instance of the AlbertAttention class.
            hidden_states (mindspore.Tensor): The input hidden states tensor of shape (batch_size, seq_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, seq_length, seq_length).
                Defaults to None.
            head_mask (Optional[mindspore.Tensor]): The head mask tensor of shape (num_attention_heads, seq_length, seq_length).
                Defaults to None.
            output_attentions (bool): Whether to output the attention probabilities. Defaults to False.

        Returns:
            Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]:

                - If output_attentions is False, returns a tuple containing:

                    - layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization
                    of shape (batch_size, seq_length, hidden_size).

                - If output_attentions is True, returns a tuple containing:

                    - layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization
                    of shape (batch_size, seq_length, hidden_size).

                    - attention_probs (mindspore.Tensor): The attention probabilities tensor of shape
                    (batch_size, num_attention_heads, seq_length, seq_length).

        Raises:
            None.
        '''
        mixed_query_layer = self.query(hidden_states)
        mixed_key_layer = self.key(hidden_states)
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer)
        key_layer = self.transpose_for_scores(mixed_key_layer)
        value_layer = self.transpose_for_scores(mixed_value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))
        attention_scores = attention_scores / math.sqrt(self.attention_head_size)

        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in the BertModel forward() function)
            attention_scores = attention_scores + attention_mask

        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            seq_length = hidden_states.shape[1]
            position_ids_l = ops.arange(seq_length, dtype=mindspore.int64).view(-1, 1)
            position_ids_r = ops.arange(seq_length, dtype=mindspore.int64).view(1, -1)
            distance = position_ids_l - position_ids_r
            positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
            positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

            if self.position_embedding_type == "relative_key":
                relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores
            elif self.position_embedding_type == "relative_key_query":
                relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

        # Normalize the attention scores to probabilities.
        attention_probs = ops.softmax(attention_scores, dim=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.attention_dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = ops.matmul(attention_probs, value_layer)
        context_layer = context_layer.swapaxes(2, 1).flatten(start_dim=2)

        projected_context_layer = self.dense(context_layer)
        projected_context_layer_dropout = self.output_dropout(projected_context_layer)
        layernormed_context_layer = self.LayerNorm(hidden_states + projected_context_layer_dropout)
        return (layernormed_context_layer, attention_probs) if output_attentions else (layernormed_context_layer,)

mindnlp.transformers.models.albert.modeling_albert.AlbertAttention.__init__(config)

Initializes an instance of the AlbertAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object of type AlbertConfig containing configuration parameters for the Albert model.

  • config.hidden_size (int): The size of the hidden layers in the model.
  • config.num_attention_heads (int): The number of attention heads in the model.
  • config.embedding_size (int, optional): The size of the embeddings in the model.
  • config.attention_probs_dropout_prob (float): The dropout probability for attention probabilities.
  • config.hidden_dropout_prob (float): The dropout probability for hidden layers.
  • config.layer_norm_eps (float): The epsilon value for LayerNorm.
  • config.position_embedding_type (str): The type of position embedding ('absolute', 'relative_key', 'relative_key_query').
  • config.max_position_embeddings (int): The maximum position embeddings allowed.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

Raised if the hidden_size is not a multiple of the num_attention_heads and no embedding_size is provided.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertAttention class.

    Args:
        self: The instance of the class.
        config (AlbertConfig):
            An object of type AlbertConfig containing configuration parameters for the Albert model.

            - config.hidden_size (int): The size of the hidden layers in the model.
            - config.num_attention_heads (int): The number of attention heads in the model.
            - config.embedding_size (int, optional): The size of the embeddings in the model.
            - config.attention_probs_dropout_prob (float): The dropout probability for attention probabilities.
            - config.hidden_dropout_prob (float): The dropout probability for hidden layers.
            - config.layer_norm_eps (float): The epsilon value for LayerNorm.
            - config.position_embedding_type (str): The type of position embedding ('absolute', 'relative_key', 'relative_key_query').
            - config.max_position_embeddings (int): The maximum position embeddings allowed.

    Returns:
        None.

    Raises:
        ValueError: Raised if the hidden_size is not a multiple of the num_attention_heads and no embedding_size is provided.
    """
    super().__init__()
    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
            f"heads ({config.num_attention_heads}"
        )

    self.num_attention_heads = config.num_attention_heads
    self.hidden_size = config.hidden_size
    self.attention_head_size = config.hidden_size // config.num_attention_heads
    self.all_head_size = self.num_attention_heads * self.attention_head_size

    self.query = nn.Linear(config.hidden_size, self.all_head_size)
    self.key = nn.Linear(config.hidden_size, self.all_head_size)
    self.value = nn.Linear(config.hidden_size, self.all_head_size)

    self.attention_dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
    self.output_dropout = nn.Dropout(p=config.hidden_dropout_prob)
    self.dense = nn.Linear(config.hidden_size, config.hidden_size)
    self.LayerNorm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.pruned_heads = set()

    self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        self.max_position_embeddings = config.max_position_embeddings
        self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

mindnlp.transformers.models.albert.modeling_albert.AlbertAttention.forward(hidden_states, attention_mask=None, head_mask=None, output_attentions=False)

Constructs the attention mechanism for the Albert model.

PARAMETER DESCRIPTION
self

An instance of the AlbertAttention class.

TYPE: AlbertAttention

hidden_states

The input hidden states tensor of shape (batch_size, seq_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor of shape (batch_size, seq_length, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor of shape (num_attention_heads, seq_length, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output the attention probabilities. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Union[Tuple[Tensor], Tuple[Tensor, Tensor]]

Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]:

  • If output_attentions is False, returns a tuple containing:

    • layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization of shape (batch_size, seq_length, hidden_size).
  • If output_attentions is True, returns a tuple containing:

    • layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization of shape (batch_size, seq_length, hidden_size).

    • attention_probs (mindspore.Tensor): The attention probabilities tensor of shape (batch_size, num_attention_heads, seq_length, seq_length).

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    output_attentions: bool = False,
) -> Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]:
    '''
    Constructs the attention mechanism for the Albert model.

    Args:
        self (AlbertAttention): An instance of the AlbertAttention class.
        hidden_states (mindspore.Tensor): The input hidden states tensor of shape (batch_size, seq_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, seq_length, seq_length).
            Defaults to None.
        head_mask (Optional[mindspore.Tensor]): The head mask tensor of shape (num_attention_heads, seq_length, seq_length).
            Defaults to None.
        output_attentions (bool): Whether to output the attention probabilities. Defaults to False.

    Returns:
        Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]:

            - If output_attentions is False, returns a tuple containing:

                - layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization
                of shape (batch_size, seq_length, hidden_size).

            - If output_attentions is True, returns a tuple containing:

                - layernormed_context_layer (mindspore.Tensor): The output tensor after applying layer normalization
                of shape (batch_size, seq_length, hidden_size).

                - attention_probs (mindspore.Tensor): The attention probabilities tensor of shape
                (batch_size, num_attention_heads, seq_length, seq_length).

    Raises:
        None.
    '''
    mixed_query_layer = self.query(hidden_states)
    mixed_key_layer = self.key(hidden_states)
    mixed_value_layer = self.value(hidden_states)

    query_layer = self.transpose_for_scores(mixed_query_layer)
    key_layer = self.transpose_for_scores(mixed_key_layer)
    value_layer = self.transpose_for_scores(mixed_value_layer)

    # Take the dot product between "query" and "key" to get the raw attention scores.
    attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)

    if attention_mask is not None:
        # Apply the attention mask (precomputed for all layers in the BertModel forward() function)
        attention_scores = attention_scores + attention_mask

    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        seq_length = hidden_states.shape[1]
        position_ids_l = ops.arange(seq_length, dtype=mindspore.int64).view(-1, 1)
        position_ids_r = ops.arange(seq_length, dtype=mindspore.int64).view(1, -1)
        distance = position_ids_l - position_ids_r
        positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
        positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

        if self.position_embedding_type == "relative_key":
            relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores
        elif self.position_embedding_type == "relative_key_query":
            relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

    # Normalize the attention scores to probabilities.
    attention_probs = ops.softmax(attention_scores, dim=-1)

    # This is actually dropping out entire tokens to attend to, which might
    # seem a bit unusual, but is taken from the original Transformer paper.
    attention_probs = self.attention_dropout(attention_probs)

    # Mask heads if we want to
    if head_mask is not None:
        attention_probs = attention_probs * head_mask

    context_layer = ops.matmul(attention_probs, value_layer)
    context_layer = context_layer.swapaxes(2, 1).flatten(start_dim=2)

    projected_context_layer = self.dense(context_layer)
    projected_context_layer_dropout = self.output_dropout(projected_context_layer)
    layernormed_context_layer = self.LayerNorm(hidden_states + projected_context_layer_dropout)
    return (layernormed_context_layer, attention_probs) if output_attentions else (layernormed_context_layer,)

mindnlp.transformers.models.albert.modeling_albert.AlbertAttention.prune_heads(heads)

This method prunes specific attention heads from the AlbertAttention class.

PARAMETER DESCRIPTION
self

The instance of the AlbertAttention class.

heads

A list of integers representing the attention heads to be pruned. If the list is empty, no action is taken.

TYPE: List[int]

RETURNS DESCRIPTION
None

This method does not return any value; it modifies the internal state of the AlbertAttention instance.

TYPE: None
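
A short sketch (hypothetical values, assuming the small configuration used in the examples above) of what pruning two heads does to the layer's bookkeeping:

```python
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertAttention

config = AlbertConfig(hidden_size=768, num_attention_heads=12, intermediate_size=3072)
attention = AlbertAttention(config)

attention.prune_heads([0, 1])          # drop heads 0 and 1
print(attention.num_attention_heads)   # 10
print(attention.all_head_size)         # 10 * 64 = 640
print(sorted(attention.pruned_heads))  # [0, 1]
```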

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def prune_heads(self, heads: List[int]) -> None:
    """
    This method prunes specific attention heads from the AlbertAttention class.

    Args:
        self: The instance of the AlbertAttention class.
        heads (List[int]): A list of integers representing the attention heads to be pruned. If the list is empty, no action is taken.

    Returns:
        None: This method does not return any value, it modifies the internal state of the AlbertAttention instance.

    Raises:
        None
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.num_attention_heads, self.attention_head_size, self.pruned_heads
    )

    # Prune linear layers
    self.query = prune_linear_layer(self.query, index)
    self.key = prune_linear_layer(self.key, index)
    self.value = prune_linear_layer(self.value, index)
    self.dense = prune_linear_layer(self.dense, index, dim=1)

    # Update hyper params and store pruned heads
    self.num_attention_heads = self.num_attention_heads - len(heads)
    self.all_head_size = self.attention_head_size * self.num_attention_heads
    self.pruned_heads = self.pruned_heads.union(heads)

mindnlp.transformers.models.albert.modeling_albert.AlbertAttention.transpose_for_scores(x)

Transpose the input tensor for calculating attention scores in the AlbertAttention class.

PARAMETER DESCRIPTION
self

The instance of the AlbertAttention class.

TYPE: AlbertAttention

x

The input tensor to be transposed. It should have a shape of (batch_size, sequence_length, hidden_size).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size). The attention_head_size is calculated as hidden_size / num_attention_heads.
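
The reshape itself can be illustrated with plain mindspore tensors (hypothetical shapes, independent of the class):

```python
import mindspore
from mindspore import ops

batch_size, seq_length, num_heads, head_size = 2, 8, 12, 64
x = ops.ones((batch_size, seq_length, num_heads * head_size), mindspore.float32)

# (batch, seq, hidden) -> (batch, seq, heads, head_size) -> (batch, heads, seq, head_size)
new_shape = x.shape[:-1] + (num_heads, head_size)
x = x.view(new_shape).permute(0, 2, 1, 3)
print(x.shape)  # (2, 12, 8, 64)
```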

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
    """
    Transpose the input tensor for calculating attention scores in the AlbertAttention class.

    Args:
        self (AlbertAttention): The instance of the AlbertAttention class.
        x (mindspore.Tensor): The input tensor to be transposed. It should have a shape of (batch_size, sequence_length, hidden_size).

    Returns:
        mindspore.Tensor:
            The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).
            The attention_head_size is calculated as hidden_size / num_attention_heads.

    Raises:
        None.
    """
    new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)

mindnlp.transformers.models.albert.modeling_albert.AlbertEmbeddings

Bases: Module

Construct the embeddings from word, position and token_type embeddings.
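
A usage sketch (not taken from the source) that embeds a dummy token batch; note the output uses embedding_size, not hidden_size:

```python
import mindspore
from mindspore import ops
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertEmbeddings

config = AlbertConfig(hidden_size=768, num_attention_heads=12, intermediate_size=3072)
embeddings = AlbertEmbeddings(config)

# Dummy token ids: (batch_size=2, seq_length=8); token id 1 is well below vocab_size.
input_ids = ops.ones((2, 8), mindspore.int64)

output = embeddings(input_ids)
print(output.shape)  # (2, 8, 128) -- embedding_size, not hidden_size
```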

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertEmbeddings(nn.Module):
    """
    Construct the embeddings from word, position and token_type embeddings.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the `AlbertEmbeddings` class.

        Args:
            self: The object itself.
            config (AlbertConfig):
                The configuration object containing various parameters for the embeddings.

                - `vocab_size` (int): The size of the vocabulary.
                - `embedding_size` (int): The size of the embeddings.
                - `pad_token_id` (int): The ID of the padding token.
                - `max_position_embeddings` (int): The maximum number of positions for the embeddings.
                - `type_vocab_size` (int): The size of the token type vocabulary.
                - `layer_norm_eps` (float): The epsilon value for LayerNorm.
                - `hidden_dropout_prob` (float): The dropout probability for embeddings.
                - `position_embedding_type` (str, optional): The type of position embeddings. Defaults to 'absolute'.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.embedding_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.embedding_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.embedding_size)

        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm([config.embedding_size], eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.position_ids = ops.arange(config.max_position_embeddings).broadcast_to((1, -1))
        self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
        self.token_type_ids = ops.zeros(*self.position_ids.shape, dtype=mindspore.int64)

    # Copied from transformers.models.bert.modeling_bert.BertEmbeddings.forward
    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values_length: int = 0,
    ) -> mindspore.Tensor:
        """
        This method 'forward' is part of the 'AlbertEmbeddings' class and builds the embeddings for the input tokens in the ALBERT model.

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input token IDs, representing the index of each token in the vocabulary. Default is None.
            token_type_ids (Optional[mindspore.Tensor]):
                The token type IDs, representing the segment ID for each token (e.g., sentence A or B). Default is None.
            position_ids (Optional[mindspore.Tensor]):
                The position IDs, representing the position of each token in the sequence. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input embeddings directly provided instead of input_ids. Default is None.
            past_key_values_length (int): The length of past key values. Default is 0.

        Returns:
            mindspore.Tensor: The computed embeddings for the input tokens.

        Raises:
            ValueError: If the input shape and inputs_embeds shape are incompatible.
            ValueError: If the position embedding type is not supported.
            ValueError: If the token type embeddings shape and input_shape are incompatible.
            ValueError: If the position embeddings shape and input_shape are incompatible.
            ValueError: If the dimensions of input_shape are not as expected during computations.
        """
        if input_ids is not None:
            input_shape = input_ids.shape
        else:
            input_shape = inputs_embeds.shape[:-1]

        seq_length = input_shape[1]

        if position_ids is None:
            position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]

        # Setting the token_type_ids to the registered buffer in the constructor, where it is all zeros, which usually occurs
        # when it's auto-generated; the registered buffer helps users when tracing the model without passing token_type_ids,
        # and solves issue #5664
        if token_type_ids is None:
            if hasattr(self, "token_type_ids"):
                buffered_token_type_ids = self.token_type_ids[:, :seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.broadcast_to((input_shape[0], seq_length))
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = ops.zeros(*input_shape, dtype=mindspore.int64)

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = inputs_embeds + token_type_embeddings
        if self.position_embedding_type == "absolute":
            position_embeddings = self.position_embeddings(position_ids)
            embeddings += position_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

mindnlp.transformers.models.albert.modeling_albert.AlbertEmbeddings.__init__(config)

Initializes an instance of the AlbertEmbeddings class.

PARAMETER DESCRIPTION
self

The object itself.

config

The configuration object containing various parameters for the embeddings.

  • vocab_size (int): The size of the vocabulary.
  • embedding_size (int): The size of the embeddings.
  • pad_token_id (int): The ID of the padding token.
  • max_position_embeddings (int): The maximum number of positions for the embeddings.
  • type_vocab_size (int): The size of the token type vocabulary.
  • layer_norm_eps (float): The epsilon value for LayerNorm.
  • hidden_dropout_prob (float): The dropout probability for embeddings.
  • position_embedding_type (str, optional): The type of position embeddings. Defaults to 'absolute'.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the `AlbertEmbeddings` class.

    Args:
        self: The object itself.
        config (AlbertConfig):
            The configuration object containing various parameters for the embeddings.

            - `vocab_size` (int): The size of the vocabulary.
            - `embedding_size` (int): The size of the embeddings.
            - `pad_token_id` (int): The ID of the padding token.
            - `max_position_embeddings` (int): The maximum number of positions for the embeddings.
            - `type_vocab_size` (int): The size of the token type vocabulary.
            - `layer_norm_eps` (float): The epsilon value for LayerNorm.
            - `hidden_dropout_prob` (float): The dropout probability for embeddings.
            - `position_embedding_type` (str, optional): The type of position embeddings. Defaults to 'absolute'.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.word_embeddings = nn.Embedding(config.vocab_size, config.embedding_size, padding_idx=config.pad_token_id)
    self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.embedding_size)
    self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.embedding_size)

    # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
    # any TensorFlow checkpoint file
    self.LayerNorm = nn.LayerNorm([config.embedding_size], eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

    # position_ids (1, len position emb) is contiguous in memory and exported when serialized
    self.position_ids = ops.arange(config.max_position_embeddings).broadcast_to((1, -1))
    self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
    self.token_type_ids = ops.zeros(*self.position_ids.shape, dtype=mindspore.int64)

mindnlp.transformers.models.albert.modeling_albert.AlbertEmbeddings.forward(input_ids=None, token_type_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0)

The 'forward' method of the 'AlbertEmbeddings' class computes the embeddings for the input tokens in the Albert model.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input token IDs, representing the index of each token in the vocabulary. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

token_type_ids

The token type IDs, representing the segment ID for each token (e.g., sentence A or B). Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The position IDs, representing the position of each token in the sequence. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings directly provided instead of input_ids. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values_length

The length of past key values. Default is 0.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The computed embeddings for the input tokens.

RAISES DESCRIPTION
ValueError

If the input shape and inputs_embeds shape are incompatible.

ValueError

If the position embedding type is not supported.

ValueError

If the token type embeddings shape and input_shape are incompatible.

ValueError

If the position embeddings shape and input_shape are incompatible.

ValueError

If the dimensions of input_shape are not as expected during computations.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values_length: int = 0,
) -> mindspore.Tensor:
    """
        The 'forward' method of the 'AlbertEmbeddings' class computes the embeddings for the input tokens in the Albert model.

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input token IDs, representing the index of each token in the vocabulary. Default is None.
        token_type_ids (Optional[mindspore.Tensor]):
            The token type IDs, representing the segment ID for each token (e.g., sentence A or B). Default is None.
        position_ids (Optional[mindspore.Tensor]):
            The position IDs, representing the position of each token in the sequence. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input embeddings directly provided instead of input_ids. Default is None.
        past_key_values_length (int): The length of past key values. Default is 0.

    Returns:
        mindspore.Tensor: The computed embeddings for the input tokens.

    Raises:
        ValueError: If the input shape and inputs_embeds shape are incompatible.
        ValueError: If the position embedding type is not supported.
        ValueError: If the token type embeddings shape and input_shape are incompatible.
        ValueError: If the position embeddings shape and input_shape are incompatible.
        ValueError: If the dimensions of input_shape are not as expected during computations.
    """
    if input_ids is not None:
        input_shape = input_ids.shape
    else:
        input_shape = inputs_embeds.shape[:-1]

    seq_length = input_shape[1]

    if position_ids is None:
        position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]

    # Set token_type_ids to the registered buffer defined in the constructor, where it is all zeros. This usually
    # occurs when token_type_ids is auto-generated; the registered buffer lets users trace the model without passing
    # token_type_ids. Solves issue #5664.
    if token_type_ids is None:
        if hasattr(self, "token_type_ids"):
            buffered_token_type_ids = self.token_type_ids[:, :seq_length]
            buffered_token_type_ids_expanded = buffered_token_type_ids.broadcast_to((input_shape[0], seq_length))
            token_type_ids = buffered_token_type_ids_expanded
        else:
            token_type_ids = ops.zeros(*input_shape, dtype=mindspore.int64)

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

    embeddings = inputs_embeds + token_type_embeddings
    if self.position_embedding_type == "absolute":
        position_embeddings = self.position_embeddings(position_ids)
        embeddings += position_embeddings
    embeddings = self.LayerNorm(embeddings)
    embeddings = self.dropout(embeddings)
    return embeddings
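
As a quick end-to-end illustration of the embedding pipeline documented above, the hedged sketch below instantiates AlbertEmbeddings from a deliberately small AlbertConfig (the sizes are arbitrary) and runs it on random token IDs. The output is the dropout of the LayerNorm applied to the sum of the word, token-type, and position embeddings, with shape (batch_size, seq_length, embedding_size).

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertEmbeddings

# Small illustrative configuration; the sizes are arbitrary for this sketch.
config = AlbertConfig(vocab_size=100, embedding_size=16, max_position_embeddings=32, type_vocab_size=2)
embeddings = AlbertEmbeddings(config)

# Random batch of token IDs: batch_size=2, seq_length=8.
input_ids = mindspore.Tensor(np.random.randint(0, config.vocab_size, (2, 8)), mindspore.int64)

# word + token-type + position embeddings -> LayerNorm -> dropout
out = embeddings(input_ids)
print(out.shape)  # (2, 8, 16)
```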

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM

Bases: AlbertPreTrainedModel

AlbertForMaskedLM is a class that represents an Albert model for Masked Language Modeling tasks. It inherits from AlbertPreTrainedModel and provides methods for setting and getting output embeddings, input embeddings, and for forwarding the model for masked language modeling. The class includes an initialization method that sets up the model with AlbertModel and AlbertMLMHead components, as well as methods for manipulating embeddings and forwarding the model for training or inference. The 'forward' method takes various input tensors and parameters for the model and returns the masked language modeling output including the loss and prediction scores. The class is designed to be used in natural language processing tasks where masked language modeling is required.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForMaskedLM(AlbertPreTrainedModel):

    """
    AlbertForMaskedLM is a class that represents an Albert model for Masked Language Modeling tasks.
    It inherits from AlbertPreTrainedModel and provides methods for setting and getting output embeddings, input
    embeddings, and for forwarding the model for masked language modeling.
    The class includes an initialization method that sets up the model with AlbertModel and AlbertMLMHead components, as well as methods for
    manipulating embeddings and forwarding the model for training or inference.
    The 'forward' method takes various input tensors and parameters for the model and returns the masked language modeling output
    including the loss and prediction scores. The class is designed to be used in natural language processing tasks where masked language modeling is required.
    """
    _tied_weights_keys = ["predictions.decoder.bias", "predictions.decoder.weight"]

    def __init__(self, config):
        """
        Initializes an instance of the AlbertForMaskedLM class.

        Args:
            self: The current instance of the class.
            config (AlbertConfig): An instance of AlbertConfig containing the model configuration settings.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)

        self.albert = AlbertModel(config, add_pooling_layer=False)
        self.predictions = AlbertMLMHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self) -> nn.Linear:
        """
        Retrieve the output embeddings from the AlbertForMaskedLM model.

        Args:
            self (AlbertForMaskedLM): The instance of the AlbertForMaskedLM class.
                This parameter is automatically passed when calling the method.
                It is used to access the model's predictions.decoder attribute.

        Returns:
            nn.Linear: The output embeddings of the model.
                These embeddings are used for generating predictions for masked tokens.

        Raises:
            None: This method does not raise any exceptions.
        """
        return self.predictions.decoder

    def set_output_embeddings(self, new_embeddings: nn.Linear) -> None:
        """
        Sets the output embeddings for the AlbertForMaskedLM model.

        Args:
            self (AlbertForMaskedLM): The instance of the AlbertForMaskedLM class.
            new_embeddings (nn.Linear): The new embeddings to be set for the output layer of the model.

        Returns:
            None.

        Raises:
            None.
        """
        self.predictions.decoder = new_embeddings

    def get_input_embeddings(self) -> nn.Embedding:
        """
        Retrieve the input embeddings for the AlbertForMaskedLM model.

        Args:
            self (AlbertForMaskedLM): An instance of the AlbertForMaskedLM class.

        Returns:
            nn.Embedding:
                The input embeddings used by the AlbertForMaskedLM model. These embeddings are of type nn.Embedding and
                represent the mapping of input tokens to their respective embeddings.

        Raises:
            None.

        Note:
            The input embeddings are obtained from the 'word_embeddings' attribute of the ALBERT model's 'embeddings' module.
        """
        return self.albert.embeddings.word_embeddings

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[MaskedLMOutput, Tuple]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
                config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
                loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`

        Returns:
            Union[MaskedLMOutput, Tuple]

        Example:
            ```python
            >>> from transformers import AutoTokenizer, AlbertForMaskedLM
            >>> import torch
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
            >>> model = AlbertForMaskedLM.from_pretrained("albert-base-v2")
            ...
            >>> # add mask_token
            >>> inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="pt")
            >>> with torch.no_grad():
            ...     logits = model(**inputs).logits
            ...
            >>> # retrieve index of [MASK]
            >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
            >>> predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
            >>> tokenizer.decode(predicted_token_id)
            'france'
            ```

            ```python
            >>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
            >>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
            >>> outputs = model(**inputs, labels=labels)
            >>> round(outputs.loss.item(), 2)
            0.81
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_outputs = outputs[0]

        prediction_scores = self.predictions(sequence_outputs)

        masked_lm_loss = None
        if labels is not None:
            masked_lm_loss = F.cross_entropy(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

        return MaskedLMOutput(
            loss=masked_lm_loss,
            logits=prediction_scores,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM.__init__(config)

Initializes an instance of the AlbertForMaskedLM class.

PARAMETER DESCRIPTION
self

The current instance of the class.

config

An instance of AlbertConfig containing the model configuration settings.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config):
    """
    Initializes an instance of the AlbertForMaskedLM class.

    Args:
        self: The current instance of the class.
        config (AlbertConfig): An instance of AlbertConfig containing the model configuration settings.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)

    self.albert = AlbertModel(config, add_pooling_layer=False)
    self.predictions = AlbertMLMHead(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring) Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size]

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[MaskedLMOutput, Tuple]

Union[MaskedLMOutput, Tuple]

Example
>>> from transformers import AutoTokenizer, AlbertForMaskedLM
>>> import torch
...
>>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
>>> model = AlbertForMaskedLM.from_pretrained("albert-base-v2")
...
>>> # add mask_token
>>> inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
...
>>> # retrieve index of [MASK]
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
>>> tokenizer.decode(predicted_token_id)
'france'
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> outputs = model(**inputs, labels=labels)
>>> round(outputs.loss.item(), 2)
0.81
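
A mindnlp-native variant of the same example is sketched below. It assumes that AutoTokenizer and AlbertForMaskedLM are exposed at the mindnlp.transformers top level and that the tokenizer accepts return_tensors="ms"; the asnumpy()-based index lookup is just one simple way to locate the [MASK] position, not the only one.

```python
from mindnlp.transformers import AutoTokenizer, AlbertForMaskedLM  # assumed top-level exports

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

# Encode a sentence containing the mask token and run the MLM head.
inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="ms")
logits = model(**inputs).logits  # (batch_size, seq_length, vocab_size)

# Locate the [MASK] position and decode the highest-scoring token.
mask_index = inputs["input_ids"][0].asnumpy().tolist().index(tokenizer.mask_token_id)
predicted_id = int(logits[0, mask_index].asnumpy().argmax())
print(tokenizer.decode([predicted_id]))  # expected to be 'france', as in the example above
```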
Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[MaskedLMOutput, Tuple]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
            loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`

    Returns:
        Union[MaskedLMOutput, Tuple]

    Example:
        ```python
        >>> from transformers import AutoTokenizer, AlbertForMaskedLM
        >>> import torch
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
        >>> model = AlbertForMaskedLM.from_pretrained("albert-base-v2")
        ...
        >>> # add mask_token
        >>> inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="pt")
        >>> with torch.no_grad():
        ...     logits = model(**inputs).logits
        ...
        >>> # retrieve index of [MASK]
        >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
        >>> predicted_token_id = logits[0, mask_token_index].argmax(dim=-1)
        >>> tokenizer.decode(predicted_token_id)
        'france'
        ```

        ```python
        >>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
        >>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
        >>> outputs = model(**inputs, labels=labels)
        >>> round(outputs.loss.item(), 2)
        0.81
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.albert(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    sequence_outputs = outputs[0]

    prediction_scores = self.predictions(sequence_outputs)

    masked_lm_loss = None
    if labels is not None:
        masked_lm_loss = F.cross_entropy(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (prediction_scores,) + outputs[2:]
        return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

    return MaskedLMOutput(
        loss=masked_lm_loss,
        logits=prediction_scores,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM.get_input_embeddings()

Retrieve the input embeddings for the AlbertForMaskedLM model.

PARAMETER DESCRIPTION
self

An instance of the AlbertForMaskedLM class.

TYPE: AlbertForMaskedLM

RETURNS DESCRIPTION
Embedding

nn.Embedding: The input embeddings used by the AlbertForMaskedLM model. These embeddings are of type nn.Embedding and represent the mapping of input tokens to their respective embeddings.

Note

The input embeddings are obtained from the 'word_embeddings' attribute of the ALBERT model's 'embeddings' module.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def get_input_embeddings(self) -> nn.Embedding:
    """
    Retrieve the input embeddings for the AlbertForMaskedLM model.

    Args:
        self (AlbertForMaskedLM): An instance of the AlbertForMaskedLM class.

    Returns:
        nn.Embedding:
            The input embeddings used by the AlbertForMaskedLM model. These embeddings are of type nn.Embedding and
            represent the mapping of input tokens to their respective embeddings.

    Raises:
        None.

    Note:
        The input embeddings are obtained from the 'word_embeddings' attribute of the ALBERT model's 'embeddings' module.
    """
    return self.albert.embeddings.word_embeddings

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM.get_output_embeddings()

Retrieve the output embeddings from the AlbertForMaskedLM model.

PARAMETER DESCRIPTION
self

The instance of the AlbertForMaskedLM class. This parameter is automatically passed when calling the method. It is used to access the model's predictions.decoder attribute.

TYPE: AlbertForMaskedLM

RETURNS DESCRIPTION
Linear

nn.Linear: The output embeddings of the model. These embeddings are used for generating predictions for masked tokens.

RAISES DESCRIPTION
None

This method does not raise any exceptions.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def get_output_embeddings(self) -> nn.Linear:
    """
    Retrieve the output embeddings from the AlbertForMaskedLM model.

    Args:
        self (AlbertForMaskedLM): The instance of the AlbertForMaskedLM class.
            This parameter is automatically passed when calling the method.
            It is used to access the model's predictions.decoder attribute.

    Returns:
        nn.Linear: The output embeddings of the model.
            These embeddings are used for generating predictions for masked tokens.

    Raises:
        None: This method does not raise any exceptions.
    """
    return self.predictions.decoder

mindnlp.transformers.models.albert.modeling_albert.AlbertForMaskedLM.set_output_embeddings(new_embeddings)

Sets the output embeddings for the AlbertForMaskedLM model.

PARAMETER DESCRIPTION
self

The instance of the AlbertForMaskedLM class.

TYPE: AlbertForMaskedLM

new_embeddings

The new embeddings to be set for the output layer of the model.

TYPE: Linear

RETURNS DESCRIPTION
None

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def set_output_embeddings(self, new_embeddings: nn.Linear) -> None:
    """
    Sets the output embeddings for the AlbertForMaskedLM model.

    Args:
        self (AlbertForMaskedLM): The instance of the AlbertForMaskedLM class.
        new_embeddings (nn.Linear): The new embeddings to be set for the output layer of the model.

    Returns:
        None.

    Raises:
        None.
    """
    self.predictions.decoder = new_embeddings

mindnlp.transformers.models.albert.modeling_albert.AlbertForMultipleChoice

Bases: AlbertPreTrainedModel

This class represents the Albert model for multiple choice classification tasks. It is a subclass of the AlbertPreTrainedModel.

The AlbertForMultipleChoice class contains methods for model initialization and forward computation. It inherits the configuration from AlbertConfig and utilizes the AlbertModel for the underlying Albert architecture.

METHOD DESCRIPTION
__init__

Initializes the AlbertForMultipleChoice model with the given configuration.

forward

Constructs the AlbertForMultipleChoice model with the given input tensors and returns the output.

ATTRIBUTE DESCRIPTION
albert

The underlying AlbertModel instance.

dropout

Dropout layer for regularization.

classifier

Dense layer for classification.

config

The AlbertConfig instance used for model initialization.

Note

The forward method follows the multiple choice classification setup and returns either a MultipleChoiceModelOutput (with the loss, logits, hidden states, and attentions) or a plain tuple, depending on the return_dict parameter.

Please refer to the AlbertConfig documentation for more details on the configuration options used by this class.
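
To make the expected input layout concrete, the hedged sketch below encodes one prompt with two candidate endings and adds a leading batch dimension, so that input_ids ends up with shape (batch_size, num_choices, seq_len). The mindnlp.transformers top-level imports and return_tensors="ms" are assumptions about the installed package; treat this as a sketch rather than the canonical API.

```python
from mindnlp.transformers import AutoTokenizer, AlbertForMultipleChoice  # assumed top-level exports

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMultipleChoice.from_pretrained("albert-base-v2")

prompt = "The capital of France is"
choices = ["Paris.", "Berlin."]

# Tokenize each (prompt, choice) pair, then add the batch dimension:
# the final shape is (batch_size=1, num_choices=2, seq_len).
encoded = tokenizer([prompt] * len(choices), choices, return_tensors="ms", padding=True)
inputs = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}

outputs = model(**inputs)  # logits shape: (1, num_choices)
best = int(outputs.logits[0].asnumpy().argmax())
print(choices[best])
```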

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForMultipleChoice(AlbertPreTrainedModel):

    """
    This class represents the Albert model for multiple choice classification tasks. It is a subclass of the AlbertPreTrainedModel.

    The AlbertForMultipleChoice class contains methods for model initialization and forward computation.
    It inherits the configuration from AlbertConfig and utilizes the AlbertModel for the underlying Albert architecture.

    Methods:
        __init__: Initializes the AlbertForMultipleChoice model with the given configuration.
        forward: Constructs the AlbertForMultipleChoice model with the given input tensors and returns the output.

    Attributes:
        albert: The underlying AlbertModel instance.
        dropout: Dropout layer for regularization.
        classifier: Dense layer for classification.
        config: The AlbertConfig instance used for model initialization.

    Note:
        The forward method follows the multiple choice classification setup and returns either a `MultipleChoiceModelOutput`
        (with the loss, logits, hidden states, and attentions) or a plain tuple, depending on the return_dict parameter.

    Please refer to the AlbertConfig documentation for more details on the configuration options used by this class.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initialize the AlbertForMultipleChoice model.

        Args:
            self: The object instance of the AlbertForMultipleChoice class.
            config (AlbertConfig): An instance of AlbertConfig class containing the configuration settings for the Albert model.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of type AlbertConfig.
            ValueError: If the classifier_dropout_prob attribute in the config parameter is not within the valid range [0, 1].
        """
        super().__init__(config)

        self.albert = AlbertModel(config)
        self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[AlbertForPreTrainingOutput, Tuple]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where *num_choices* is the size of the second dimension of the input tensors. (see
                *input_ids* above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )
        outputs = self.albert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits: mindspore.Tensor = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.albert.modeling_albert.AlbertForMultipleChoice.__init__(config)

Initialize the AlbertForMultipleChoice model.

PARAMETER DESCRIPTION
self

The object instance of the AlbertForMultipleChoice class.

config

An instance of AlbertConfig class containing the configuration settings for the Albert model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of type AlbertConfig.

ValueError

If the classifier_dropout_prob attribute in the config parameter is not within the valid range [0, 1].

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initialize the AlbertForMultipleChoice model.

    Args:
        self: The object instance of the AlbertForMultipleChoice class.
        config (AlbertConfig): An instance of AlbertConfig class containing the configuration settings for the Albert model.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of type AlbertConfig.
        ValueError: If the classifier_dropout_prob attribute in the config parameter is not within the valid range [0, 1].
    """
    super().__init__(config)

    self.albert = AlbertModel(config)
    self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
    self.classifier = nn.Linear(config.hidden_size, 1)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForMultipleChoice.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (see input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[AlbertForPreTrainingOutput, Tuple]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where *num_choices* is the size of the second dimension of the input tensors. (see
            *input_ids* above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )
    outputs = self.albert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits: mindspore.Tensor = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining

Bases: AlbertPreTrainedModel

The AlbertForPreTraining class represents an Albert model for pre-training, inheriting from AlbertPreTrainedModel. It includes methods for initializing the model with the specified configuration, retrieving output embeddings, setting new output embeddings, retrieving input embeddings, and forwarding the model for pre-training tasks. The forward method accepts various input parameters and returns pre-training outputs. It also includes examples of usage.

The AlbertForPreTraining class provides functionality for masked language modeling and next sequence prediction (classification) loss. It utilizes the Albert model, prediction heads, and sentence order prediction head to compute the total loss for pre-training tasks.

For additional details and examples on how to use the AlbertForPreTraining class, please refer to the provided code example and the official documentation for the transformers library.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForPreTraining(AlbertPreTrainedModel):

    """
    The `AlbertForPreTraining` class represents an Albert model for pre-training, inheriting from `AlbertPreTrainedModel`.
    It includes methods for initializing the model with the specified configuration, retrieving output embeddings,
    setting new output embeddings, retrieving input embeddings, and forwarding the model for pre-training tasks.
    The `forward` method accepts various input parameters and returns pre-training outputs. It also includes examples of usage.

    The `AlbertForPreTraining` class provides functionality for masked language modeling and next sequence prediction (classification) loss.
    It utilizes the Albert model, prediction heads, and sentence order prediction head to compute the total loss for pre-training tasks.

    For additional details and examples on how to use the `AlbertForPreTraining` class,
    please refer to the provided code example and the official documentation for the `transformers` library.
    """
    _tied_weights_keys = ["predictions.decoder.bias", "predictions.decoder.weight"]

    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertForPreTraining class.

        Args:
            self: The instance of the class.
            config (AlbertConfig): An object of the AlbertConfig class containing the configuration parameters for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)

        self.albert = AlbertModel(config)
        self.predictions = AlbertMLMHead(config)
        self.sop_classifier = AlbertSOPHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self) -> nn.Linear:
        """
        Retrieves the output embeddings from the AlbertForPreTraining model.

        Args:
            self (AlbertForPreTraining): The current instance of the AlbertForPreTraining class.

        Returns:
            nn.Linear: The output embeddings of the model.

        Raises:
            None.

        This method returns the output embeddings of the AlbertForPreTraining model. The output embeddings
        represent the encoded representation of the input sequence. The embeddings are obtained from the
        predictions decoder of the model.

        Example:
            ```python
            >>> model = AlbertForPreTraining()
            >>> embeddings = model.get_output_embeddings()
            ```
        """
        return self.predictions.decoder

    def set_output_embeddings(self, new_embeddings: nn.Linear) -> None:
        """
        Set the output embeddings for the AlbertForPreTraining model.

        Args:
            self (AlbertForPreTraining): The current instance of the AlbertForPreTraining model.
            new_embeddings (nn.Linear): The new embeddings to be set as the output embeddings for the model.
                It should be an instance of nn.Linear representing the new output embeddings.

        Returns:
            None.

        Raises:
            TypeError: If the new_embeddings parameter is not of type nn.Linear.
        """
        self.predictions.decoder = new_embeddings

    def get_input_embeddings(self) -> nn.Embedding:
        """
        Retrieve the input embeddings for the ALBERT model.

        Args:
            self: An instance of the AlbertForPreTraining class.

        Returns:
            nn.Embedding: An instance of the nn.Embedding class representing the input embeddings for the ALBERT model.

        Raises:
            None
        """
        return self.albert.embeddings.word_embeddings

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        sentence_order_label: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[AlbertForPreTrainingOutput, Tuple]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
                config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
                loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
            sentence_order_label (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair
                (see `input_ids` docstring) Indices should be in `[0, 1]`. `0` indicates original order (sequence A, then
                sequence B), `1` indicates switched order (sequence B, then sequence A).

        Returns:
            Union[AlbertForPreTrainingOutput, Tuple]

        Example:
            ```python
            >>> from mindnlp.transformers import AutoTokenizer, AlbertForPreTraining
            >>> import mindspore
            ...
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
            >>> model = AlbertForPreTraining.from_pretrained("albert-base-v2")
            ...
            >>> input_ids = mindspore.Tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
            >>> # Batch size 1
            >>> outputs = model(input_ids)
            ...
            >>> prediction_logits = outputs.prediction_logits
            >>> sop_logits = outputs.sop_logits
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output, pooled_output = outputs[:2]

        prediction_scores = self.predictions(sequence_output)
        sop_scores = self.sop_classifier(pooled_output)

        total_loss = None
        if labels is not None and sentence_order_label is not None:
            masked_lm_loss = F.cross_entropy(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
            sentence_order_loss = F.cross_entropy(sop_scores.view(-1, 2), sentence_order_label.view(-1))
            total_loss = masked_lm_loss + sentence_order_loss

        if not return_dict:
            output = (prediction_scores, sop_scores) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return AlbertForPreTrainingOutput(
            loss=total_loss,
            prediction_logits=prediction_scores,
            sop_logits=sop_scores,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining.__init__(config)

Initializes an instance of the AlbertForPreTraining class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object of the AlbertConfig class containing the configuration parameters for the model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertForPreTraining class.

    Args:
        self: The instance of the class.
        config (AlbertConfig): An object of the AlbertConfig class containing the configuration parameters for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)

    self.albert = AlbertModel(config)
    self.predictions = AlbertMLMHead(config)
    self.sop_classifier = AlbertSOPHead(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, sentence_order_label=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring) Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size]

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

sentence_order_label

Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair (see input_ids docstring) Indices should be in [0, 1]. 0 indicates original order (sequence A, then sequence B), 1 indicates switched order (sequence B, then sequence A).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[AlbertForPreTrainingOutput, Tuple]

Union[AlbertForPreTrainingOutput, Tuple]

Example
>>> from mindnlp.transformers import AutoTokenizer, AlbertForPreTraining
>>> import mindspore
...
...
>>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
>>> model = AlbertForPreTraining.from_pretrained("albert-base-v2")
...
>>> input_ids = mindspore.Tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
>>> # Batch size 1
>>> outputs = model(input_ids)
...
>>> prediction_logits = outputs.prediction_logits
>>> sop_logits = outputs.sop_logits
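
When both labels and sentence_order_label are supplied, the returned loss is the sum of the masked language modeling cross-entropy and the sentence-order cross-entropy, as shown in the forward source below. The following hedged sketch reuses model and input_ids from the example above; passing input_ids as its own MLM target is only a convenient stand-in for properly masked labels, and 0 marks the original sentence order.

```python
# Reuse `model` and `input_ids` from the example above.
sentence_order_label = mindspore.Tensor([0])  # 0 = original order, 1 = switched order
outputs = model(input_ids, labels=input_ids, sentence_order_label=sentence_order_label)
total_loss = outputs.loss  # masked_lm_loss + sentence_order_loss
```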
Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    sentence_order_label: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[AlbertForPreTrainingOutput, Tuple]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
            loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
        sentence_order_label (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair
            (see `input_ids` docstring) Indices should be in `[0, 1]`. `0` indicates original order (sequence A, then
            sequence B), `1` indicates switched order (sequence B, then sequence A).

    Returns:
        Union[AlbertForPreTrainingOutput, Tuple]

    Example:
        ```python
        >>> from mindnlp.transformers import AutoTokenizer, AlbertForPreTraining
        >>> import mindspore
        ...
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
        >>> model = AlbertForPreTraining.from_pretrained("albert-base-v2")
        ...
        >>> input_ids = mindspore.Tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
        >>> # Batch size 1
        >>> outputs = model(input_ids)
        ...
        >>> prediction_logits = outputs.prediction_logits
        >>> sop_logits = outputs.sop_logits
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.albert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output, pooled_output = outputs[:2]

    prediction_scores = self.predictions(sequence_output)
    sop_scores = self.sop_classifier(pooled_output)

    total_loss = None
    if labels is not None and sentence_order_label is not None:
        masked_lm_loss = F.cross_entropy(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
        sentence_order_loss = F.cross_entropy(sop_scores.view(-1, 2), sentence_order_label.view(-1))
        total_loss = masked_lm_loss + sentence_order_loss

    if not return_dict:
        output = (prediction_scores, sop_scores) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return AlbertForPreTrainingOutput(
        loss=total_loss,
        prediction_logits=prediction_scores,
        sop_logits=sop_scores,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining.get_input_embeddings()

Retrieve the input embeddings for the ALBERT model.

PARAMETER DESCRIPTION
self

An instance of the AlbertForPreTraining class.

RETURNS DESCRIPTION
Embedding

nn.Embedding: An instance of the nn.Embedding class representing the input embeddings for the ALBERT model.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def get_input_embeddings(self) -> nn.Embedding:
    """
    Retrieve the input embeddings for the ALBERT model.

    Args:
        self: An instance of the AlbertForPreTraining class.

    Returns:
        nn.Embedding: An instance of the nn.Embedding class representing the input embeddings for the ALBERT model.

    Raises:
        None
    """
    return self.albert.embeddings.word_embeddings

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining.get_output_embeddings()

Retrieves the output embeddings from the AlbertForPreTraining model.

PARAMETER DESCRIPTION
self

The current instance of the AlbertForPreTraining class.

TYPE: AlbertForPreTraining

RETURNS DESCRIPTION
Linear

nn.Linear: The output embeddings of the model.

This method returns the output embeddings of the AlbertForPreTraining model. The output embeddings represent the encoded representation of the input sequence. The embeddings are obtained from the predictions decoder of the model.

Example
>>> model = AlbertForPreTraining()
>>> embeddings = model.get_output_embeddings()
Source code in mindnlp/transformers/models/albert/modeling_albert.py
def get_output_embeddings(self) -> nn.Linear:
    """
    Retrieves the output embeddings from the AlbertForPreTraining model.

    Args:
        self (AlbertForPreTraining): The current instance of the AlbertForPreTraining class.

    Returns:
        nn.Linear: The output embeddings of the model.

    Raises:
        None.

    This method returns the output embeddings of the AlbertForPreTraining model. The output embeddings
    represent the encoded representation of the input sequence. The embeddings are obtained from the
    predictions decoder of the model.

    Example:
        ```python
        >>> model = AlbertForPreTraining()
        >>> embeddings = model.get_output_embeddings()
        ```
    """
    return self.predictions.decoder

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTraining.set_output_embeddings(new_embeddings)

Set the output embeddings for the AlbertForPreTraining model.

PARAMETER DESCRIPTION
self

The current instance of the AlbertForPreTraining model.

TYPE: AlbertForPreTraining

new_embeddings

The new embeddings to be set as the output embeddings for the model. It should be an instance of nn.Linear representing the new output embeddings.

TYPE: Linear

RETURNS DESCRIPTION
None

None.

RAISES DESCRIPTION
TypeError

If the new_embeddings parameter is not of type nn.Linear.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def set_output_embeddings(self, new_embeddings: nn.Linear) -> None:
    """
    Set the output embeddings for the AlbertForPreTraining model.

    Args:
        self (AlbertForPreTraining): The current instance of the AlbertForPreTraining model.
        new_embeddings (nn.Linear): The new embeddings to be set as the output embeddings for the model.
            It should be an instance of nn.Linear representing the new output embeddings.

    Returns:
        None.

    Raises:
        TypeError: If the new_embeddings parameter is not of type nn.Linear.
    """
    self.predictions.decoder = new_embeddings
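
A short, hedged sketch of how the three embedding accessors are typically used; the tiny configuration is illustrative, and `new_decoder` in the commented-out line is a hypothetical replacement head you would construct yourself:

```python
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertForPreTraining

config = AlbertConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                      num_hidden_layers=2, num_attention_heads=4, intermediate_size=128)
model = AlbertForPreTraining(config)

decoder = model.get_output_embeddings()    # the predictions decoder: nn.Linear(hidden_size, vocab_size)
word_emb = model.get_input_embeddings()    # the shared nn.Embedding over the vocabulary
print(type(decoder).__name__, type(word_emb).__name__)

# To swap in a different head (for example after enlarging the vocabulary), construct a
# compatible nn.Linear yourself and hand it over; `new_decoder` is hypothetical here.
# model.set_output_embeddings(new_decoder)
```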

mindnlp.transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput dataclass

Bases: ModelOutput

Output type of [AlbertForPreTraining].

PARAMETER DESCRIPTION
loss

Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.

TYPE: *optional*, returned when `labels` is provided, `mindspore.Tensor` of shape `(1,)` DEFAULT: None

prediction_logits

Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)` DEFAULT: None

sop_logits

Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).

TYPE: `mindspore.Tensor` of shape `(batch_size, 2)` DEFAULT: None

hidden_states

Tuple of mindspore.Tensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

TYPE: `tuple(mindspore.Tensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True` DEFAULT: None

attentions

Tuple of mindspore.Tensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

TYPE: `tuple(mindspore.Tensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True` DEFAULT: None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
@dataclass
class AlbertForPreTrainingOutput(ModelOutput):
    """
    Output type of [`AlbertForPreTraining`].

    Args:
        loss (*optional*, returned when `labels` is provided, `mindspore.Tensor` of shape `(1,)`):
            Total loss as the sum of the masked language modeling loss and the next sequence prediction
            (classification) loss.
        prediction_logits (`mindspore.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`):
            Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
        sop_logits (`mindspore.Tensor` of shape `(batch_size, 2)`):
            Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
            before SoftMax).
        hidden_states (`tuple(mindspore.Tensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
            Tuple of `mindspore.Tensor` (one for the output of the embeddings + one for the output of each layer) of
            shape `(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (`tuple(mindspore.Tensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
            Tuple of `mindspore.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
            sequence_length)`.

            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
    """
    loss: Optional[mindspore.Tensor] = None
    prediction_logits: mindspore.Tensor = None
    sop_logits: mindspore.Tensor = None
    hidden_states: Optional[Tuple[mindspore.Tensor]] = None
    attentions: Optional[Tuple[mindspore.Tensor]] = None
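
The output dataclass can also be built by hand, which can be handy in tests; a minimal sketch assuming only the field names defined above, with optional fields left as `None`:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.albert.modeling_albert import AlbertForPreTrainingOutput

out = AlbertForPreTrainingOutput(
    prediction_logits=mindspore.Tensor(np.zeros((1, 4, 30000), dtype=np.float32)),
    sop_logits=mindspore.Tensor(np.zeros((1, 2), dtype=np.float32)),
)
print(out.loss)                      # None: no labels were involved
print(out.prediction_logits.shape)   # (1, 4, 30000)
print(out.sop_logits.shape)          # (1, 2)
```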

mindnlp.transformers.models.albert.modeling_albert.AlbertForQuestionAnswering

Bases: AlbertPreTrainedModel

AlbertForQuestionAnswering represents a fine-tuned Albert model for question answering tasks. This class inherits from AlbertPreTrainedModel and includes functionality to handle question answering tasks by computing start and end logits for the labelled spans in the input sequence.

ATTRIBUTE DESCRIPTION
num_labels

Number of labels for the classification task.

TYPE: int

albert

The Albert model used for question answering.

TYPE: AlbertModel

qa_outputs

A dense layer for computing logits for start and end positions.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the AlbertForQuestionAnswering class with the provided configuration.

forward

Constructs the Albert model for question answering and computes the loss for token classification based on start and end positions. Returns the total loss along with start and end logits if return_dict is False, otherwise returns a QuestionAnsweringModelOutput object.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForQuestionAnswering(AlbertPreTrainedModel):

    """
    AlbertForQuestionAnswering represents a fine-tuned Albert model for question answering tasks.
    This class inherits from AlbertPreTrainedModel and includes functionality to handle question answering tasks
    by computing start and end logits for the labelled spans in the input sequence.

    Attributes:
        num_labels (int): Number of labels for the classification task.
        albert (AlbertModel): The Albert model used for question answering.
        qa_outputs (nn.Linear): A dense layer for computing logits for start and end positions.

    Methods:
        __init__: Initializes the AlbertForQuestionAnswering class with the provided configuration.
        forward:
            Constructs the Albert model for question answering and computes the loss for token classification based on start and end positions.
            Returns the total loss along with start and end logits if return_dict is False, otherwise returns a QuestionAnsweringModelOutput object.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of AlbertForQuestionAnswering.

        Args:
            self: The instance of the class.
            config (AlbertConfig): An instance of AlbertConfig containing the configuration parameters for the model.
                It is used to set up the model architecture and initialize its components.
                The parameter 'config' should be of type AlbertConfig.

        Returns:
            None.

        Raises:
            TypeError: If the 'config' parameter is not of type AlbertConfig.
            ValueError: If the 'num_labels' attribute is not found in the 'config' parameter.
            AttributeError: If an attribute error occurs while initializing the model components.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.albert = AlbertModel(config, add_pooling_layer=False)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[QuestionAnsweringModelOutput, Tuple]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        logits: mindspore.Tensor = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # On multi-GPU, gathering can add an extra dimension to the positions; squeeze it away
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
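
A hedged usage sketch for span extraction, using a tiny randomly initialized configuration; the field names follow `QuestionAnsweringModelOutput`, everything else (sizes, greedy argmax decoding) is illustrative:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertForQuestionAnswering

config = AlbertConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                      num_hidden_layers=2, num_attention_heads=4, intermediate_size=128,
                      num_labels=2)                      # start and end logits
model = AlbertForQuestionAnswering(config)

input_ids = mindspore.Tensor(np.random.randint(0, config.vocab_size, (1, 16)), mindspore.int64)
outputs = model(input_ids=input_ids)

# Greedy span decoding: take the argmax of the start and end logits independently.
start_index = int(outputs.start_logits.argmax(axis=-1).asnumpy()[0])
end_index = int(outputs.end_logits.argmax(axis=-1).asnumpy()[0])
print(start_index, end_index)   # token positions of the predicted answer span
```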

mindnlp.transformers.models.albert.modeling_albert.AlbertForQuestionAnswering.__init__(config)

Initializes an instance of AlbertForQuestionAnswering.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of AlbertConfig containing the configuration parameters for the model. It is used to set up the model architecture and initialize its components. The parameter 'config' should be of type AlbertConfig.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the 'config' parameter is not of type AlbertConfig.

ValueError

If the 'num_labels' attribute is not found in the 'config' parameter.

AttributeError

If an attribute error occurs while initializing the model components.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of AlbertForQuestionAnswering.

    Args:
        self: The instance of the class.
        config (AlbertConfig): An instance of AlbertConfig containing the configuration parameters for the model.
            It is used to set up the model architecture and initialize its components.
            The parameter 'config' should be of type AlbertConfig.

    Returns:
        None.

    Raises:
        TypeError: If the 'config' parameter is not of type AlbertConfig.
        ValueError: If the 'num_labels' attribute is not found in the 'config' parameter.
        AttributeError: If an attribute error occurs while initializing the model components.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.albert = AlbertModel(config, add_pooling_layer=False)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForQuestionAnswering.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[QuestionAnsweringModelOutput, Tuple]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.albert(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    logits: mindspore.Tensor = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # On multi-GPU, gathering can add an extra dimension to the positions; squeeze it away
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
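
To make the clamping step concrete: an out-of-range start or end label is clamped to `sequence_length` and then excluded from the loss via `ignore_index`. The sketch below reproduces that behaviour with plain `mindspore.ops.cross_entropy`, on the assumption that the `F.cross_entropy` used in the model behaves the same way for this case:

```python
import numpy as np
import mindspore
from mindspore import ops

seq_len = 8
start_logits = mindspore.Tensor(np.random.randn(2, seq_len).astype(np.float32))
# First label is valid; the second points past the sequence (e.g. the answer was truncated away).
start_positions = mindspore.Tensor([3, 50], mindspore.int64)

ignored_index = seq_len                                     # == start_logits.shape[1]
start_positions = start_positions.clamp(0, ignored_index)   # the out-of-range label becomes seq_len
loss = ops.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
print(loss)   # only the in-range example contributes to the averaged loss
```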

mindnlp.transformers.models.albert.modeling_albert.AlbertForSequenceClassification

Bases: AlbertPreTrainedModel

This class represents an Albert model for sequence classification. It inherits from AlbertPreTrainedModel and provides methods for initializing the model and computing the sequence classification output in the forward pass. The model applies the ALBERT architecture to natural language processing tasks such as text classification and regression.

The init method initializes the AlbertForSequenceClassification model with the provided AlbertConfig. It sets the number of labels, config, Albert model, dropout layer, and classifier for sequence classification.

The forward method takes input tensors and optional arguments for sequence classification and returns the sequence classifier output. It also handles the computation of loss based on the problem type and labels provided.

Note

This docstring is a high-level summary and is not meant to be executed as code.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForSequenceClassification(AlbertPreTrainedModel):

    """
    This class represents an Albert model for sequence classification.
    It inherits from AlbertPreTrainedModel and includes methods for initializing the model and forwarding the sequence classification
    output.
    The model utilizes the Albert architecture for natural language processing tasks, such as text classification and regression.

    The __init__ method initializes the AlbertForSequenceClassification model with the provided AlbertConfig.
    It sets the number of labels, config, Albert model, dropout layer, and classifier for sequence classification.

    The forward method takes input tensors and optional arguments for sequence classification and returns the sequence classifier output.
    It also handles the computation of loss based on the problem type and labels provided.

    Note:
        This docstring is a high-level summary and is not meant to be executed as code.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes a new instance of the `AlbertForSequenceClassification` class.

        Args:
            self: The instance of the class.
            config (AlbertConfig): The configuration object for the model.

        Returns:
            None

        Raises:
            None

        Description:
            This method initializes a new instance of the `AlbertForSequenceClassification` class. It takes in two parameters: `self` and `config`. The `self` parameter represents the instance of the class itself.
            The `config` parameter is an object of the `AlbertConfig` class, which holds the configuration settings for the model.

            This method performs the following operations:

            1. Calls the `__init__` method of the base class to initialize the inherited attributes.
            2. Sets the `num_labels` attribute of the instance to the `num_labels` value from the `config` parameter.
            3. Sets the `config` attribute of the instance to the `config` parameter.
            4. Creates a new instance of the `AlbertModel` class, named `albert`, using the `config` parameter.
            5. Creates a new instance of the `nn.Dropout` class, named `dropout`, with the dropout probability specified in `config.classifier_dropout_prob`.
            6. Creates a new instance of the `nn.Linear` class, named `classifier`, with the input size of `config.hidden_size` and the output size of `config.num_labels`.
            7. Calls the `post_init` method to perform any additional initialization steps.

        Note:
            The `AlbertForSequenceClassification` class is typically used for sequence classification tasks using the ALBERT model.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.albert = AlbertModel(config)
        self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[SequenceClassifierOutput, Tuple]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
                `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
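
A hedged single-label classification sketch with a tiny randomly initialized configuration; integer labels and `num_labels > 1` select the `single_label_classification` branch of the forward method above:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertForSequenceClassification

config = AlbertConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                      num_hidden_layers=2, num_attention_heads=4, intermediate_size=128,
                      num_labels=3)                       # three target classes
model = AlbertForSequenceClassification(config)

input_ids = mindspore.Tensor(np.random.randint(0, config.vocab_size, (4, 10)), mindspore.int64)
labels = mindspore.Tensor(np.random.randint(0, 3, (4,)), mindspore.int64)   # one class id per sequence

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)            # cross-entropy over the 3 classes
print(outputs.logits.shape)    # (4, 3)
```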

mindnlp.transformers.models.albert.modeling_albert.AlbertForSequenceClassification.__init__(config)

Initializes a new instance of the AlbertForSequenceClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Description

This method initializes a new instance of the AlbertForSequenceClassification class. It takes in two parameters: self and config. The self parameter represents the instance of the class itself. The config parameter is an object of the AlbertConfig class, which holds the configuration settings for the model.

This method performs the following operations:

  1. Calls the __init__ method of the base class to initialize the inherited attributes.
  2. Sets the num_labels attribute of the instance to the num_labels value from the config parameter.
  3. Sets the config attribute of the instance to the config parameter.
  4. Creates a new instance of the AlbertModel class, named albert, using the config parameter.
  5. Creates a new instance of the nn.Dropout class, named dropout, with the dropout probability specified in config.classifier_dropout_prob.
  6. Creates a new instance of the nn.Linear class, named classifier, with the input size of config.hidden_size and the output size of config.num_labels.
  7. Calls the post_init method to perform any additional initialization steps.
Note

The AlbertForSequenceClassification class is typically used for sequence classification tasks using the ALBERT model.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes a new instance of the `AlbertForSequenceClassification` class.

    Args:
        self: The instance of the class.
        config (AlbertConfig): The configuration object for the model.

    Returns:
        None

    Raises:
        None

    Description:
        This method initializes a new instance of the `AlbertForSequenceClassification` class. It takes in two parameters: `self` and `config`. The `self` parameter represents the instance of the class itself.
        The `config` parameter is an object of the `AlbertConfig` class, which holds the configuration settings for the model.

        This method performs the following operations:

        1. Calls the `__init__` method of the base class to initialize the inherited attributes.
        2. Sets the `num_labels` attribute of the instance to the `num_labels` value from the `config` parameter.
        3. Sets the `config` attribute of the instance to the `config` parameter.
        4. Creates a new instance of the `AlbertModel` class, named `albert`, using the `config` parameter.
        5. Creates a new instance of the `nn.Dropout` class, named `dropout`, with the dropout probability specified in `config.classifier_dropout_prob`.
        6. Creates a new instance of the `nn.Linear` class, named `classifier`, with the input size of `config.hidden_size` and the output size of `config.num_labels`.
        7. Calls the `post_init` method to perform any additional initialization steps.

    Note:
        The `AlbertForSequenceClassification` class is typically used for sequence classification tasks using the ALBERT model.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.config = config

    self.albert = AlbertModel(config)
    self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
    self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForSequenceClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss); if config.num_labels > 1, a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[SequenceClassifierOutput, Tuple]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.albert(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )
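
The `problem_type` selection depends only on `num_labels` and the label dtype. The helper below is a standalone sketch of that branch, not a library function, and can serve as a sanity check for how labels will be interpreted:

```python
import mindspore

def infer_problem_type(num_labels: int, labels: mindspore.Tensor) -> str:
    # Mirrors the branch in AlbertForSequenceClassification.forward (illustrative only).
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
        return "single_label_classification"
    return "multi_label_classification"

print(infer_problem_type(1, mindspore.Tensor([0.7], mindspore.float32)))            # regression
print(infer_problem_type(3, mindspore.Tensor([2], mindspore.int64)))                # single_label_classification
print(infer_problem_type(3, mindspore.Tensor([[0., 1., 1.]], mindspore.float32)))   # multi_label_classification
```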

mindnlp.transformers.models.albert.modeling_albert.AlbertForTokenClassification

Bases: AlbertPreTrainedModel

This class represents an Albert model for token classification, specifically designed for tasks like named entity recognition or part-of-speech tagging. It extends the AlbertPreTrainedModel class.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for the token classification task.

TYPE: int

albert

The underlying AlbertModel instance for feature extraction.

TYPE: AlbertModel

dropout

Dropout layer for regularization.

TYPE: Dropout

classifier

Dense layer for classification.

TYPE: Linear

config

The configuration object for the model.

TYPE: AlbertConfig

METHOD DESCRIPTION
__init__

Initializes the AlbertForTokenClassification instance.

forward

Constructs the AlbertForTokenClassification model.

Example
>>> # Initialize the configuration object
>>> config = AlbertConfig(num_labels=10, hidden_size=256, classifier_dropout_prob=0.1)
...
>>> # Create an instance of AlbertForTokenClassification
>>> model = AlbertForTokenClassification(config)
...
>>> # Perform forward pass
>>> outputs = model.forward(input_ids, attention_mask, labels=labels)
...
>>> # Extract the logits
>>> logits = outputs.logits
...
>>> # Calculate the loss
>>> loss = outputs.loss
Note

The labels should be a tensor of shape (batch_size, sequence_length) with indices in the range [0, ..., num_labels - 1].

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertForTokenClassification(AlbertPreTrainedModel):

    """
    This class represents an Albert model for token classification, specifically designed for tasks
    like named entity recognition or part-of-speech tagging. It extends the AlbertPreTrainedModel class.

    Attributes:
        num_labels (int): The number of labels for the token classification task.
        albert (AlbertModel): The underlying AlbertModel instance for feature extraction.
        dropout (nn.Dropout): Dropout layer for regularization.
        classifier (nn.Linear): Dense layer for classification.
        config (AlbertConfig): The configuration object for the model.

    Methods:
        __init__: Initializes the AlbertForTokenClassification instance.
        forward: Constructs the AlbertForTokenClassification model.

    Example:
        ```python
        >>> # Initialize the configuration object
        >>> config = AlbertConfig(num_labels=10, hidden_size=256, classifier_dropout_prob=0.1)
        ...
        >>> # Create an instance of AlbertForTokenClassification
        >>> model = AlbertForTokenClassification(config)
        ...
        >>> # Perform forward pass
        >>> outputs = model.forward(input_ids, attention_mask, labels=labels)
        ...
        >>> # Extract the logits
        >>> logits = outputs.logits
        ...
        >>> # Calculate the loss
        >>> loss = outputs.loss
        ```

    Note:
        The labels should be a tensor of shape `(batch_size, sequence_length)` with indices in the range `[0, ..., num_labels - 1]`.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertForTokenClassification class.

        Args:
            self: The instance of the class.
            config (AlbertConfig): The configuration for the Albert model.
                It contains various hyperparameters to customize the model.
                The config parameter should be an instance of AlbertConfig class.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.albert = AlbertModel(config, add_pooling_layer=False)
        classifier_dropout_prob = (
            config.classifier_dropout_prob
            if config.classifier_dropout_prob is not None
            else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[TokenClassifierOutput, Tuple]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
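
A hedged token-classification sketch with a tiny randomly initialized configuration; every token receives one label id, matching the `(batch_size, sequence_length)` labels shape documented above:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertForTokenClassification

config = AlbertConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                      num_hidden_layers=2, num_attention_heads=4, intermediate_size=128,
                      num_labels=5)                                  # e.g. a 5-tag NER scheme
model = AlbertForTokenClassification(config)

input_ids = mindspore.Tensor(np.random.randint(0, config.vocab_size, (2, 12)), mindspore.int64)
labels = mindspore.Tensor(np.random.randint(0, 5, (2, 12)), mindspore.int64)   # one tag id per token

outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)            # token-level cross-entropy
print(outputs.logits.shape)    # (2, 12, 5)
```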

mindnlp.transformers.models.albert.modeling_albert.AlbertForTokenClassification.__init__(config)

Initializes an instance of the AlbertForTokenClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration for the Albert model. It contains various hyperparameters to customize the model. The config parameter should be an instance of AlbertConfig class.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertForTokenClassification class.

    Args:
        self: The instance of the class.
        config (AlbertConfig): The configuration for the Albert model.
            It contains various hyperparameters to customize the model.
            The config parameter should be an instance of AlbertConfig class.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.albert = AlbertModel(config, add_pooling_layer=False)
    classifier_dropout_prob = (
        config.classifier_dropout_prob
        if config.classifier_dropout_prob is not None
        else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout_prob)
    self.classifier = nn.Linear(config.hidden_size, self.config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertForTokenClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[TokenClassifierOutput, Tuple]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.albert(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.albert.modeling_albert.AlbertLayer

Bases: Module

This class represents an AlbertLayer module, which is a single layer of the Albert model. It inherits from nn.Module and contains methods for initialization and forward pass computation.

The init method initializes the AlbertLayer with the provided configuration. It sets various attributes based on the configuration, including chunk size for feed forward, sequence length dimension, layer normalization, attention module, feed forward network, activation function, and dropout.

The forward method computes the forward pass for the AlbertLayer. It takes hidden_states, attention_mask, head_mask, output_attentions, and output_hidden_states as input and returns the hidden states along with optional attention outputs.

The ff_chunk method is a helper function used within the forward method to perform the feed forward computation.

Note

This class assumes that the nn module is imported as nn and that the AlbertAttention and ACT2FN classes are defined elsewhere.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertLayer(nn.Module):

    '''
    This class represents an AlbertLayer module, which is a single layer of the Albert model.
    It inherits from nn.Module and contains methods for initialization and forward pass computation.

    The __init__ method initializes the AlbertLayer with the provided configuration.
    It sets various attributes based on the configuration, including chunk size for feed forward, sequence length dimension,
    layer normalization, attention module, feed forward network, activation function, and dropout.

    The forward method computes the forward pass for the AlbertLayer.
    It takes hidden_states, attention_mask, head_mask, output_attentions, and output_hidden_states as input and returns the hidden states
    along with optional attention outputs.

    The ff_chunk method is a helper function used within the forward method to perform the feed forward computation.

    Note:
        This class assumes that the nn module is imported as nn and that the AlbertAttention and ACT2FN classes are defined elsewhere.
    '''
    def __init__(self, config: AlbertConfig):
        """Initializes an instance of the AlbertLayer class.

        Args:
            self: The instance of the class.
            config (AlbertConfig): The configuration object for the Albert model.
                This object contains various settings and hyperparameters for the model.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.config = config
        self.chunk_size_feed_forward = config.chunk_size_feed_forward
        self.seq_len_dim = 1
        self.full_layer_layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.attention = AlbertAttention(config)
        self.ffn = nn.Linear(config.hidden_size, config.intermediate_size)
        self.ffn_output = nn.Linear(config.intermediate_size, config.hidden_size)
        self.activation = ACT2FN[config.hidden_act]
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        output_attentions: bool = False,
        output_hidden_states: bool = False,
    ) -> Tuple[mindspore.Tensor, mindspore.Tensor]:
        '''
        Constructs an AlbertLayer.

        Args:
            self: The instance of the class.
            hidden_states (mindspore.Tensor): The input hidden states.
                Tensor of shape (batch_size, seq_len, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): Mask for attention computation.
                Tensor of shape (batch_size, seq_len).
            head_mask (Optional[mindspore.Tensor]): Mask for attention computation.
                Tensor of shape (num_heads,) or (num_layers, num_heads).
            output_attentions (bool): Whether to output attentions.
            output_hidden_states (bool): Whether to output hidden states.

        Returns:
            Tuple[mindspore.Tensor, mindspore.Tensor]: A tuple containing the updated hidden states
            and additional outputs based on the arguments.

        Raises:
            ValueError: If the shapes of input tensors are invalid.
            TypeError: If the input types are incorrect.
            RuntimeError: If an error occurs during the computation.
        '''
        attention_output = self.attention(hidden_states, attention_mask, head_mask, output_attentions)

        ffn_output = apply_chunking_to_forward(
            self.ff_chunk,
            self.chunk_size_feed_forward,
            self.seq_len_dim,
            attention_output[0],
        )
        hidden_states = self.full_layer_layer_norm(ffn_output + attention_output[0])

        return (hidden_states,) + attention_output[1:]  # add attentions if we output them

    def ff_chunk(self, attention_output: mindspore.Tensor) -> mindspore.Tensor:
        """
        Performs a feedforward chunk operation on the input attention output tensor.

        Args:
            self: Instance of the AlbertLayer class.
            attention_output (mindspore.Tensor): The input tensor representing the attention output.
                This tensor is expected to have the shape (batch_size, seq_length, hidden_size).
                It serves as the input to the feedforward network.

        Returns:
            mindspore.Tensor: The output tensor after applying the feedforward chunk operation.
                The shape of the returned tensor is expected to be (batch_size, seq_length, hidden_size).

        Raises:
            None
        """
        ffn_output = self.ffn(attention_output)
        ffn_output = self.activation(ffn_output)
        ffn_output = self.ffn_output(ffn_output)
        return ffn_output
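
The `chunk_size_feed_forward` / `ff_chunk` pairing lets the feed-forward network run on slices of the sequence instead of the full tensor. The function below is a conceptual sketch of what `apply_chunking_to_forward` does, written with plain mindspore ops rather than the library utility:

```python
import numpy as np
import mindspore
from mindspore import ops

def chunked_feed_forward(ff, hidden_states, chunk_size, seq_len_dim=1):
    # chunk_size == 0 means "no chunking": run the feed-forward network in one shot.
    if chunk_size == 0:
        return ff(hidden_states)
    # Otherwise apply ff to slices along the sequence dimension and concatenate the results,
    # trading a little extra compute for a lower peak memory footprint.
    chunks = ops.split(hidden_states, chunk_size, axis=seq_len_dim)
    return ops.cat([ff(chunk) for chunk in chunks], axis=seq_len_dim)

ff = lambda x: x * 2.0   # stand-in for AlbertLayer.ff_chunk
x = mindspore.Tensor(np.ones((2, 8, 4), dtype=np.float32))
print(chunked_feed_forward(ff, x, chunk_size=4).shape)   # (2, 8, 4), same result as ff(x)
```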

mindnlp.transformers.models.albert.modeling_albert.AlbertLayer.__init__(config)

Initializes an instance of the AlbertLayer class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the Albert model. This object contains various settings and hyperparameters for the model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """Initializes an instance of the AlbertLayer class.

    Args:
        self: The instance of the class.
        config (AlbertConfig): The configuration object for the Albert model.
            This object contains various settings and hyperparameters for the model.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.config = config
    self.chunk_size_feed_forward = config.chunk_size_feed_forward
    self.seq_len_dim = 1
    self.full_layer_layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.attention = AlbertAttention(config)
    self.ffn = nn.Linear(config.hidden_size, config.intermediate_size)
    self.ffn_output = nn.Linear(config.intermediate_size, config.hidden_size)
    self.activation = ACT2FN[config.hidden_act]
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

mindnlp.transformers.models.albert.modeling_albert.AlbertLayer.ff_chunk(attention_output)

Performs a feedforward chunk operation on the input attention output tensor.

PARAMETER DESCRIPTION
self

Instance of the AlbertLayer class.

attention_output

The input tensor representing the attention output. This tensor is expected to have the shape (batch_size, seq_length, hidden_size). It serves as the input to the feedforward network.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The output tensor after applying the feedforward chunk operation. The shape of the returned tensor is expected to be (batch_size, seq_length, hidden_size).

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def ff_chunk(self, attention_output: mindspore.Tensor) -> mindspore.Tensor:
    """
    Performs a feedforward chunk operation on the input attention output tensor.

    Args:
        self: Instance of the AlbertLayer class.
        attention_output (mindspore.Tensor): The input tensor representing the attention output.
            This tensor is expected to have the shape (batch_size, seq_length, hidden_size).
            It serves as the input to the feedforward network.

    Returns:
        mindspore.Tensor: The output tensor after applying the feedforward chunk operation.
            The shape of the returned tensor is expected to be (batch_size, seq_length, hidden_size).

    Raises:
        None
    """
    ffn_output = self.ffn(attention_output)
    ffn_output = self.activation(ffn_output)
    ffn_output = self.ffn_output(ffn_output)
    return ffn_output

mindnlp.transformers.models.albert.modeling_albert.AlbertLayer.forward(hidden_states, attention_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False)

Constructs an AlbertLayer.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input hidden states. Tensor of shape (batch_size, seq_len, hidden_size).

TYPE: Tensor

attention_mask

Mask for attention computation. Tensor of shape (batch_size, seq_len).

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Mask for attention computation. Tensor of shape (num_heads,) or (num_layers, num_heads).

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output attentions.

TYPE: bool DEFAULT: False

output_hidden_states

Whether to output hidden states.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor, Tensor]

Tuple[mindspore.Tensor, mindspore.Tensor]: A tuple containing the updated hidden states and additional outputs based on the arguments.

RAISES DESCRIPTION
ValueError

If the shapes of input tensors are invalid.

TypeError

If the input types are incorrect.

RuntimeError

If an error occurs during the computation.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    output_attentions: bool = False,
    output_hidden_states: bool = False,
) -> Tuple[mindspore.Tensor, mindspore.Tensor]:
    '''
    Constructs an AlbertLayer.

    Args:
        self: The instance of the class.
        hidden_states (mindspore.Tensor): The input hidden states.
            Tensor of shape (batch_size, seq_len, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): Mask for attention computation.
            Tensor of shape (batch_size, seq_len).
        head_mask (Optional[mindspore.Tensor]): Mask for attention computation.
            Tensor of shape (num_heads,) or (num_layers, num_heads).
        output_attentions (bool): Whether to output attentions.
        output_hidden_states (bool): Whether to output hidden states.

    Returns:
        Tuple[mindspore.Tensor, mindspore.Tensor]: A tuple containing the updated hidden states
        and additional outputs based on the arguments.

    Raises:
        ValueError: If the shapes of input tensors are invalid.
        TypeError: If the input types are incorrect.
        RuntimeError: If an error occurs during the computation.
    '''
    attention_output = self.attention(hidden_states, attention_mask, head_mask, output_attentions)

    ffn_output = apply_chunking_to_forward(
        self.ff_chunk,
        self.chunk_size_feed_forward,
        self.seq_len_dim,
        attention_output[0],
    )
    hidden_states = self.full_layer_layer_norm(ffn_output + attention_output[0])

    return (hidden_states,) + attention_output[1:]  # add attentions if we output them

mindnlp.transformers.models.albert.modeling_albert.AlbertLayerGroup

Bases: Module

This class represents a group of Albert layers within the Albert model. It inherits from the nn.Module class.

ATTRIBUTE DESCRIPTION
albert_layers

A list of AlbertLayer instances that make up the group.

TYPE: ModuleList

METHOD DESCRIPTION
__init__

Initializes an instance of the AlbertLayerGroup class.

forward

Constructs the AlbertLayerGroup by applying each AlbertLayer in the group to the input hidden_states. This method returns the resulting hidden states and optionally the layer attentions and hidden states.
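The structure of the returned tuple depends on the two output flags. The following pure-Python sketch (placeholder strings stand in for mindspore tensors) mirrors how the forward method below assembles it:

```python
def group_outputs(last_hidden, per_layer_hidden, per_layer_attn,
                  output_hidden_states=False, output_attentions=False):
    # Mirrors the tuple-building logic in AlbertLayerGroup.forward.
    outputs = (last_hidden,)
    if output_hidden_states:
        outputs += (per_layer_hidden,)
    if output_attentions:
        outputs += (per_layer_attn,)
    return outputs

print(group_outputs("h_last", ("h0", "h1"), ("attn0", "attn1")))
# ('h_last',)
print(group_outputs("h_last", ("h0", "h1"), ("attn0", "attn1"),
                    output_hidden_states=True, output_attentions=True))
# ('h_last', ('h0', 'h1'), ('attn0', 'attn1'))
```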

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertLayerGroup(nn.Module):

    """
    This class represents a group of Albert layers within the Albert model. It inherits from the nn.Module class.

    Attributes:
        albert_layers (nn.ModuleList): A list of AlbertLayer instances that make up the group.

    Methods:
        __init__:
            Initializes an instance of the AlbertLayerGroup class.

        forward:
            Constructs the AlbertLayerGroup by applying each AlbertLayer in the group to the input hidden_states.
            This method returns the resulting hidden states and optionally the layer attentions and hidden states.

    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertLayerGroup class.

        Args:
            self: The instance of the class.
            config (AlbertConfig):
                An instance of the AlbertConfig class that holds the configuration parameters for the Albert model.

        Returns:
            None

        Raises:
            None

        Description:
            This method initializes an instance of the AlbertLayerGroup class.
            It takes in a configuration object of type AlbertConfig, which holds the configuration parameters for the Albert model.
            The method initializes the superclass and creates a list of AlbertLayer objects, each with the given configuration parameters.
            The number of AlbertLayer objects in the list is determined by the 'inner_group_num' parameter of the configuration object.
        """
        super().__init__()

        self.albert_layers = nn.ModuleList([AlbertLayer(config) for _ in range(config.inner_group_num)])

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        output_attentions: bool = False,
        output_hidden_states: bool = False,
    ) -> Tuple[Union[mindspore.Tensor, Tuple[mindspore.Tensor]], ...]:
        """
        Constructs an Albert Layer Group.

        Args:
            self: An instance of the AlbertLayerGroup class.
            hidden_states (mindspore.Tensor): The input hidden states of shape (batch_size, sequence_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
            head_mask (Optional[mindspore.Tensor]): The head mask tensor of shape (num_hidden_layers, num_attention_heads).
            output_attentions (bool): Whether to return the attention weights. Default is False.
            output_hidden_states (bool): Whether to return the hidden states of all layers. Default is False.

        Returns:
            Tuple[Union[mindspore.Tensor, Tuple[mindspore.Tensor]], ...]:
                A tuple containing the output hidden states of shape (batch_size, sequence_length, hidden_size).

                - If output_hidden_states is True, the tuple also contains the hidden states of all layers.
                - If output_attentions is True, the tuple also contains the attention weights of all layers.

        Raises:
            None.
        """
        layer_hidden_states = ()
        layer_attentions = ()

        for layer_index, albert_layer in enumerate(self.albert_layers):
            layer_output = albert_layer(hidden_states, attention_mask, head_mask[layer_index], output_attentions)
            hidden_states = layer_output[0]

            if output_attentions:
                layer_attentions = layer_attentions + (layer_output[1],)

            if output_hidden_states:
                layer_hidden_states = layer_hidden_states + (hidden_states,)

        outputs = (hidden_states,)
        if output_hidden_states:
            outputs = outputs + (layer_hidden_states,)
        if output_attentions:
            outputs = outputs + (layer_attentions,)
        return outputs  # last-layer hidden state, (layer hidden states), (layer attentions)

mindnlp.transformers.models.albert.modeling_albert.AlbertLayerGroup.__init__(config)

Initializes an instance of the AlbertLayerGroup class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the AlbertConfig class that holds the configuration parameters for the Albert model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Description

This method initializes an instance of the AlbertLayerGroup class. It takes in a configuration object of type AlbertConfig, which holds the configuration parameters for the Albert model. The method initializes the superclass and creates a list of AlbertLayer objects, each with the given configuration parameters. The number of AlbertLayer objects in the list is determined by the 'inner_group_num' parameter of the configuration object.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertLayerGroup class.

    Args:
        self: The instance of the class.
        config (AlbertConfig):
            An instance of the AlbertConfig class that holds the configuration parameters for the Albert model.

    Returns:
        None

    Raises:
        None

    Description:
        This method initializes an instance of the AlbertLayerGroup class.
        It takes in a configuration object of type AlbertConfig, which holds the configuration parameters for the Albert model.
        The method initializes the superclass and creates a list of AlbertLayer objects, each with the given configuration parameters.
        The number of AlbertLayer objects in the list is determined by the 'inner_group_num' parameter of the configuration object.
    """
    super().__init__()

    self.albert_layers = nn.ModuleList([AlbertLayer(config) for _ in range(config.inner_group_num)])

mindnlp.transformers.models.albert.modeling_albert.AlbertLayerGroup.forward(hidden_states, attention_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False)

Constructs an Albert Layer Group.

PARAMETER DESCRIPTION
self

An instance of the AlbertLayerGroup class.

hidden_states

The input hidden states of shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor of shape (num_hidden_layers, num_attention_heads).

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to return the attention weights. Default is False.

TYPE: bool DEFAULT: False

output_hidden_states

Whether to return the hidden states of all layers. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Tuple[Union[Tensor, Tuple[Tensor]], ...]

Tuple[Union[mindspore.Tensor, Tuple[mindspore.Tensor]], ...]: A tuple containing the output hidden states of shape (batch_size, sequence_length, hidden_size).

  • If output_hidden_states is True, the tuple also contains the hidden states of all layers.
  • If output_attentions is True, the tuple also contains the attention weights of all layers.
Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    output_attentions: bool = False,
    output_hidden_states: bool = False,
) -> Tuple[Union[mindspore.Tensor, Tuple[mindspore.Tensor]], ...]:
    """
    Constructs an Albert Layer Group.

    Args:
        self: An instance of the AlbertLayerGroup class.
        hidden_states (mindspore.Tensor): The input hidden states of shape (batch_size, sequence_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
        head_mask (Optional[mindspore.Tensor]): The head mask tensor of shape (num_hidden_layers, num_attention_heads).
        output_attentions (bool): Whether to return the attention weights. Default is False.
        output_hidden_states (bool): Whether to return the hidden states of all layers. Default is False.

    Returns:
        Tuple[Union[mindspore.Tensor, Tuple[mindspore.Tensor]], ...]:
            A tuple containing the output hidden states of shape (batch_size, sequence_length, hidden_size).

            - If output_hidden_states is True, the tuple also contains the hidden states of all layers.
            - If output_attentions is True, the tuple also contains the attention weights of all layers.

    Raises:
        None.
    """
    layer_hidden_states = ()
    layer_attentions = ()

    for layer_index, albert_layer in enumerate(self.albert_layers):
        layer_output = albert_layer(hidden_states, attention_mask, head_mask[layer_index], output_attentions)
        hidden_states = layer_output[0]

        if output_attentions:
            layer_attentions = layer_attentions + (layer_output[1],)

        if output_hidden_states:
            layer_hidden_states = layer_hidden_states + (hidden_states,)

    outputs = (hidden_states,)
    if output_hidden_states:
        outputs = outputs + (layer_hidden_states,)
    if output_attentions:
        outputs = outputs + (layer_attentions,)
    return outputs  # last-layer hidden state, (layer hidden states), (layer attentions)

mindnlp.transformers.models.albert.modeling_albert.AlbertMLMHead

Bases: Module

AlbertMLMHead class represents the MLM (Masked Language Model) head for an ALBERT (A Lite BERT) model in a neural network. It includes methods for initializing the MLM head, forwarding the prediction scores, and tying the weights.

This class inherits from the nn.Module class and implements the following methods:

  1. __init__(self, config: AlbertConfig):

    • Initializes the AlbertMLMHead with the provided AlbertConfig settings.
    • Initializes the LayerNorm, bias, dense, decoder, activation, and ties the weights.
  2. forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:

    • Constructs the prediction scores based on the input hidden_states tensor.
    • Applies the dense layer, activation function, LayerNorm, and decoder to generate the prediction scores.
  3. _tie_weights(self) -> None:

    • Ties the weights by setting the bias attribute equal to the decoder's bias.

This class is designed to be used as part of an ALBERT model architecture for masked language modeling tasks.
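As a hedged usage sketch (assuming a working MindSpore environment and that the classes are importable from the module paths shown on this page; the tiny configuration values are illustrative only), the head maps hidden states of size hidden_size to per-token vocabulary scores:

```python
import numpy as np
import mindspore
from mindnlp.transformers import AlbertConfig                                  # assumed top-level export
from mindnlp.transformers.models.albert.modeling_albert import AlbertMLMHead  # assumed import path

# A deliberately tiny configuration so the example stays lightweight.
config = AlbertConfig(vocab_size=100, embedding_size=16, hidden_size=32, hidden_act="gelu_new")
mlm_head = AlbertMLMHead(config)

hidden_states = mindspore.Tensor(np.random.randn(2, 5, config.hidden_size), mindspore.float32)
prediction_scores = mlm_head(hidden_states)
print(prediction_scores.shape)  # (2, 5, 100): one score per vocabulary token at each position
```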

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertMLMHead(nn.Module):

    """
    AlbertMLMHead class represents the MLM (Masked Language Model) head for an ALBERT (A Lite BERT) model in a neural network.
    It includes methods for initializing the MLM head, forwarding the prediction scores, and tying the weights.

    This class inherits from the nn.Module class and implements the following methods:

    1. __init__(self, config: AlbertConfig):

        - Initializes the AlbertMLMHead with the provided AlbertConfig settings.
        - Initializes the LayerNorm, bias, dense, decoder, activation, and ties the weights.

    2. forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:

        - Constructs the prediction scores based on the input hidden_states tensor.
        - Applies the dense layer, activation function, LayerNorm, and decoder to generate the prediction scores.

    3. _tie_weights(self) -> None:

        - Ties the weights by setting the bias attribute equal to the decoder's bias.

    This class is designed to be used as part of an ALBERT model architecture for masked language modeling tasks.
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertMLMHead class.

        Args:
            self: The instance of the class itself.
            config (AlbertConfig):
                An object of the AlbertConfig class containing the configuration settings for the model.

                - config.embedding_size (int): The size of the embedding.
                - config.layer_norm_eps (float): The epsilon value for layer normalization.
                - config.vocab_size (int): The size of the vocabulary.
                - config.hidden_size (int): The size of the hidden layer.
                - config.hidden_act (str): The activation function for the hidden layer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.LayerNorm = nn.LayerNorm([config.embedding_size], eps=config.layer_norm_eps)
        self.bias = Parameter(ops.zeros(config.vocab_size), 'bias')
        self.dense = nn.Linear(config.hidden_size, config.embedding_size)
        self.decoder = nn.Linear(config.embedding_size, config.vocab_size)
        self.activation = ACT2FN[config.hidden_act]
        self.decoder.bias = self.bias

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        This method forwards the Albert Masked Language Model (MLM) head.

        Args:
            self (AlbertMLMHead): An instance of the AlbertMLMHead class.
            hidden_states (mindspore.Tensor): The input hidden states tensor to be processed. It represents the output
                of the previous layer and serves as input to the MLM head. It must be a tensor of shape compatible with
                the internal operations of the method.

        Returns:
            mindspore.Tensor: The prediction scores tensor generated by the MLM head. It represents the model's predictions
                for masked tokens based on the input hidden states.

        Raises:
            None
        """
        hidden_states = self.dense(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.LayerNorm(hidden_states)
        hidden_states = self.decoder(hidden_states)

        prediction_scores = hidden_states

        return prediction_scores

    def _tie_weights(self) -> None:
        """
        This method ties the weights of the decoder bias to the main decoder weights.

        Args:
            self (AlbertMLMHead): The instance of the AlbertMLMHead class.

        Returns:
            None.

        Raises:
            None
        """
        # To tie those two weights if they get disconnected (on TPU or when the bias is resized)
        self.bias = self.decoder.bias

mindnlp.transformers.models.albert.modeling_albert.AlbertMLMHead.__init__(config)

Initializes an instance of the AlbertMLMHead class.

PARAMETER DESCRIPTION
self

The instance of the class itself.

config

An object of the AlbertConfig class containing the configuration settings for the model.

  • config.embedding_size (int): The size of the embedding.
  • config.layer_norm_eps (float): The epsilon value for layer normalization.
  • config.vocab_size (int): The size of the vocabulary.
  • config.hidden_size (int): The size of the hidden layer.
  • config.hidden_act (str): The activation function for the hidden layer.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertMLMHead class.

    Args:
        self: The instance of the class itself.
        config (AlbertConfig):
            An object of the AlbertConfig class containing the configuration settings for the model.

            - config.embedding_size (int): The size of the embedding.
            - config.layer_norm_eps (float): The epsilon value for layer normalization.
            - config.vocab_size (int): The size of the vocabulary.
            - config.hidden_size (int): The size of the hidden layer.
            - config.hidden_act (str): The activation function for the hidden layer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.LayerNorm = nn.LayerNorm([config.embedding_size], eps=config.layer_norm_eps)
    self.bias = Parameter(ops.zeros(config.vocab_size), 'bias')
    self.dense = nn.Linear(config.hidden_size, config.embedding_size)
    self.decoder = nn.Linear(config.embedding_size, config.vocab_size)
    self.activation = ACT2FN[config.hidden_act]
    self.decoder.bias = self.bias

mindnlp.transformers.models.albert.modeling_albert.AlbertMLMHead.forward(hidden_states)

This method forwards the Albert Masked Language Model (MLM) head.

PARAMETER DESCRIPTION
self

An instance of the AlbertMLMHead class.

TYPE: AlbertMLMHead

hidden_states

The input hidden states tensor to be processed. It represents the output of the previous layer and serves as input to the MLM head. It must be a tensor of shape compatible with the internal operations of the method.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The prediction scores tensor generated by the MLM head. It represents the model's predictions for masked tokens based on the input hidden states.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    This method forwards the Albert Masked Language Model (MLM) head.

    Args:
        self (AlbertMLMHead): An instance of the AlbertMLMHead class.
        hidden_states (mindspore.Tensor): The input hidden states tensor to be processed. It represents the output
            of the previous layer and serves as input to the MLM head. It must be a tensor of shape compatible with
            the internal operations of the method.

    Returns:
        mindspore.Tensor: The prediction scores tensor generated by the MLM head. It represents the model's predictions
            for masked tokens based on the input hidden states.

    Raises:
        None
    """
    hidden_states = self.dense(hidden_states)
    hidden_states = self.activation(hidden_states)
    hidden_states = self.LayerNorm(hidden_states)
    hidden_states = self.decoder(hidden_states)

    prediction_scores = hidden_states

    return prediction_scores

mindnlp.transformers.models.albert.modeling_albert.AlbertModel

Bases: AlbertPreTrainedModel

This class represents the AlbertModel, which inherits from AlbertPreTrainedModel. It provides methods for initializing the model, getting and setting the input embeddings, pruning attention heads, and running the forward pass. The 'forward' method accepts the usual encoder inputs (token IDs or pre-computed embeddings, attention mask, token type IDs, position IDs, and an optional head mask) and returns either a BaseModelOutputWithPooling or a plain tuple, depending on the return_dict flag.

For more information and usage details, refer to the base class 'PreTrainedModel'.
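A hedged end-to-end sketch (assuming a working MindSpore environment and that AlbertConfig and AlbertModel are importable from mindnlp.transformers; the tiny configuration and random token IDs are illustrative placeholders, not a real checkpoint):

```python
import numpy as np
import mindspore
from mindnlp.transformers import AlbertConfig, AlbertModel  # assumed top-level exports

config = AlbertConfig(vocab_size=100, embedding_size=16, hidden_size=32,
                      num_hidden_layers=2, num_attention_heads=4, intermediate_size=64)
model = AlbertModel(config)

input_ids = mindspore.Tensor(np.random.randint(1, config.vocab_size, (2, 10)), mindspore.int64)
outputs = model(input_ids)

print(outputs.last_hidden_state.shape)  # (2, 10, 32): one hidden vector per token
print(outputs.pooler_output.shape)      # (2, 32): pooled representation of the first position
```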

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertModel(AlbertPreTrainedModel):

    """
    This class represents the AlbertModel, which inherits from AlbertPreTrainedModel.
    It includes methods for initializing the model, getting and setting input embeddings, pruning heads of the model, and
    forwarding the model. The 'forward' method takes various input parameters and returns the model output.
    The class also includes detailed comments and error handling for certain scenarios.
    The 'prune_heads' method is used to prune heads of the model, and the 'forward' method forwards the model based on input parameters.
    The model outputs are returned based on the specified conditions.

    For more information and usage details, refer to the base class 'PreTrainedModel'.
    """
    config_class = AlbertConfig
    base_model_prefix = "albert"

    def __init__(self, config: AlbertConfig, add_pooling_layer: bool = True):
        """
        Initializes an instance of the AlbertModel class.

        Args:
            self: The instance of the class.
            config (AlbertConfig): An instance of AlbertConfig containing the model configuration.
            add_pooling_layer (bool, optional): A flag indicating whether to add a pooling layer. Defaults to True.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of type AlbertConfig.
            ValueError: If the config parameter is invalid or if the add_pooling_layer parameter is not a boolean.
        """
        super().__init__(config)

        self.config = config
        self.embeddings = AlbertEmbeddings(config)
        self.encoder = AlbertTransformer(config)
        if add_pooling_layer:
            self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
            self.pooler_activation = nn.Tanh()
        else:
            self.pooler = None
            self.pooler_activation = None

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self) -> nn.Embedding:
        """
        Retrieve the input embeddings for the AlbertModel.

        Args:
            self (object): The instance of the AlbertModel class.
                This parameter is required to access the instance attributes and methods.

        Returns:
            nn.Embedding: An instance of the nn.Embedding class representing the input embeddings.
                The input embeddings are used to convert input tokens into their corresponding word embeddings.

        Raises:
            None.
        """
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value: nn.Embedding) -> None:
        """
        Set input embeddings for the AlbertModel.

        Args:
            self (object): The instance of the AlbertModel class.
            value (nn.Embedding): The input embeddings to be set for the model. It should be an instance of nn.Embedding class.

        Returns:
            None.

        Raises:
            None.
        """
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} ALBERT has
        a different architecture in that its layers are shared across groups, which then has inner groups. If an ALBERT
        model has 12 hidden layers and 2 hidden groups, with two inner groups, there is a total of 4 different layers.

        These layers are flattened: the indices [0,1] correspond to the two inner groups of the first hidden layer,
        while [2,3] correspond to the two inner groups of the second hidden layer.

        Any layer with an index other than [0,1,2,3] will result in an error. See base class PreTrainedModel for more
        information about head pruning.
        """
        for layer, heads in heads_to_prune.items():
            group_idx = int(layer / self.config.inner_group_num)
            inner_group_idx = int(layer - group_idx * self.config.inner_group_num)
            self.encoder.albert_layer_groups[group_idx].albert_layers[inner_group_idx].attention.prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[BaseModelOutputWithPooling, Tuple]:
        """
        Constructs the AlbertModel.

        Args:
            self (AlbertModel): The instance of the AlbertModel class.
            input_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the tokens. Default is None.
            attention_mask (Optional[mindspore.Tensor]): The tensor specifying which tokens should be attended to. Default is None.
            token_type_ids (Optional[mindspore.Tensor]): The tensor containing the type IDs of the tokens. Default is None.
            position_ids (Optional[mindspore.Tensor]): The tensor containing the position IDs of the tokens. Default is None.
            head_mask (Optional[mindspore.Tensor]): The tensor specifying which heads to mask in the attention layers. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]): The tensor containing the embedded representations of the input tokens. Default is None.
            output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
            output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
            return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPooling object. Default is None.

        Returns:
            Union[BaseModelOutputWithPooling, Tuple]: Either a BaseModelOutputWithPooling object or a tuple containing
                the sequence output, pooled output, hidden states, and attentions.

        Raises:
            ValueError: If both input_ids and inputs_embeds are specified.
            ValueError: If neither input_ids nor inputs_embeds are specified.
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is not None:
            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
            input_shape = input_ids.shape
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        batch_size, seq_length = input_shape

        if attention_mask is None:
            attention_mask = ops.ones(*input_shape)
        if token_type_ids is None:
            if hasattr(self.embeddings, "token_type_ids"):
                buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.broadcast_to((batch_size, seq_length))
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = ops.zeros(*input_shape, dtype=mindspore.int64)

        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
        extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility
        extended_attention_mask = (1.0 - extended_attention_mask) * mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(self.dtype)).min)
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        embedding_output = self.embeddings(
            input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
        )
        encoder_outputs = self.encoder(
            embedding_output,
            extended_attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = encoder_outputs[0]

        pooled_output = self.pooler_activation(self.pooler(sequence_output[:, 0])) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )
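
To make the layer-index flattening described in the _prune_heads docstring above concrete, here is a small pure-Python sketch of how a flattened layer index maps to a (group, inner-layer) pair (the example sizes are illustrative):

```python
# 12 hidden layers, 2 hidden groups, 2 inner layers per group -> 4 distinct parameter sets.
inner_group_num = 2

for layer in range(4):                                     # flattened indices 0..3
    group_idx = layer // inner_group_num                   # which AlbertLayerGroup
    inner_group_idx = layer - group_idx * inner_group_num  # which AlbertLayer inside it
    print(layer, "->", (group_idx, inner_group_idx))
# layer 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1)
```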

mindnlp.transformers.models.albert.modeling_albert.AlbertModel.__init__(config, add_pooling_layer=True)

Initializes an instance of the AlbertModel class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of AlbertConfig containing the model configuration.

TYPE: AlbertConfig

add_pooling_layer

A flag indicating whether to add a pooling layer. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of type AlbertConfig.

ValueError

If the config parameter is invalid or if the add_pooling_layer parameter is not a boolean.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig, add_pooling_layer: bool = True):
    """
    Initializes an instance of the AlbertModel class.

    Args:
        self: The instance of the class.
        config (AlbertConfig): An instance of AlbertConfig containing the model configuration.
        add_pooling_layer (bool, optional): A flag indicating whether to add a pooling layer. Defaults to True.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of type AlbertConfig.
        ValueError: If the config parameter is invalid or if the add_pooling_layer parameter is not a boolean.
    """
    super().__init__(config)

    self.config = config
    self.embeddings = AlbertEmbeddings(config)
    self.encoder = AlbertTransformer(config)
    if add_pooling_layer:
        self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
        self.pooler_activation = nn.Tanh()
    else:
        self.pooler = None
        self.pooler_activation = None

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.albert.modeling_albert.AlbertModel.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the AlbertModel.

PARAMETER DESCRIPTION
self

The instance of the AlbertModel class.

TYPE: AlbertModel

input_ids

The input tensor containing the IDs of the tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The tensor specifying which tokens should be attended to. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

token_type_ids

The tensor containing the type IDs of the tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The tensor containing the position IDs of the tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The tensor specifying which heads to mask in the attention layers. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The tensor containing the embedded representations of the input tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output the attentions. Default is None.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output the hidden states. Default is None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a BaseModelOutputWithPooling object. Default is None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[BaseModelOutputWithPooling, Tuple]

Union[BaseModelOutputWithPooling, Tuple]: Either a BaseModelOutputWithPooling object or a tuple containing the sequence output, pooled output, hidden states, and attentions.

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified.

ValueError

If neither input_ids nor inputs_embeds are specified.
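The additive attention-mask trick used inside this method can be illustrated with plain NumPy (a sketch of the transformation only, not mindnlp code): positions to attend get 0.0, masked positions get a very large negative number, and two singleton axes are added so the mask broadcasts over attention heads and query positions.

```python
import numpy as np

attention_mask = np.array([[1, 1, 1, 0]], dtype=np.float32)  # (batch_size, seq_length), 0 marks padding
extended = attention_mask[:, None, None, :]                  # (batch_size, 1, 1, seq_length)
extended = (1.0 - extended) * np.finfo(np.float32).min       # 1 -> 0.0, 0 -> most negative float32

print(extended.shape)     # (1, 1, 1, 4)
print(extended[0, 0, 0])  # zeros for real tokens, ~-3.4e38 for the padded position
```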

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[BaseModelOutputWithPooling, Tuple]:
    """
    Constructs the AlbertModel.

    Args:
        self (AlbertModel): The instance of the AlbertModel class.
        input_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the tokens. Default is None.
        attention_mask (Optional[mindspore.Tensor]): The tensor specifying which tokens should be attended to. Default is None.
        token_type_ids (Optional[mindspore.Tensor]): The tensor containing the type IDs of the tokens. Default is None.
        position_ids (Optional[mindspore.Tensor]): The tensor containing the position IDs of the tokens. Default is None.
        head_mask (Optional[mindspore.Tensor]): The tensor specifying which heads to mask in the attention layers. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]): The tensor containing the embedded representations of the input tokens. Default is None.
        output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
        output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
        return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPooling object. Default is None.

    Returns:
        Union[BaseModelOutputWithPooling, Tuple]: Either a BaseModelOutputWithPooling object or a tuple containing
            the sequence output, pooled output, hidden states, and attentions.

    Raises:
        ValueError: If both input_ids and inputs_embeds are specified.
        ValueError: If neither input_ids nor inputs_embeds are specified.
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is not None:
        self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
        input_shape = input_ids.shape
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.shape[:-1]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    batch_size, seq_length = input_shape

    if attention_mask is None:
        attention_mask = ops.ones(*input_shape)
    if token_type_ids is None:
        if hasattr(self.embeddings, "token_type_ids"):
            buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
            buffered_token_type_ids_expanded = buffered_token_type_ids.broadcast_to((batch_size, seq_length))
            token_type_ids = buffered_token_type_ids_expanded
        else:
            token_type_ids = ops.zeros(*input_shape, dtype=mindspore.int64)

    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
    extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility
    extended_attention_mask = (1.0 - extended_attention_mask) * mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(self.dtype)).min)
    head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

    embedding_output = self.embeddings(
        input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
    )
    encoder_outputs = self.encoder(
        embedding_output,
        extended_attention_mask,
        head_mask=head_mask,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = encoder_outputs[0]

    pooled_output = self.pooler_activation(self.pooler(sequence_output[:, 0])) if self.pooler is not None else None

    if not return_dict:
        return (sequence_output, pooled_output) + encoder_outputs[1:]

    return BaseModelOutputWithPooling(
        last_hidden_state=sequence_output,
        pooler_output=pooled_output,
        hidden_states=encoder_outputs.hidden_states,
        attentions=encoder_outputs.attentions,
    )

mindnlp.transformers.models.albert.modeling_albert.AlbertModel.get_input_embeddings()

Retrieve the input embeddings for the AlbertModel.

PARAMETER DESCRIPTION
self

The instance of the AlbertModel class. This parameter is required to access the instance attributes and methods.

TYPE: object

RETURNS DESCRIPTION
Embedding

nn.Embedding: An instance of the nn.Embedding class representing the input embeddings. The input embeddings are used to convert input tokens into their corresponding word embeddings.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def get_input_embeddings(self) -> nn.Embedding:
    """
    Retrieve the input embeddings for the AlbertModel.

    Args:
        self (object): The instance of the AlbertModel class.
            This parameter is required to access the instance attributes and methods.

    Returns:
        nn.Embedding: An instance of the nn.Embedding class representing the input embeddings.
            The input embeddings are used to convert input tokens into their corresponding word embeddings.

    Raises:
        None.
    """
    return self.embeddings.word_embeddings

mindnlp.transformers.models.albert.modeling_albert.AlbertModel.set_input_embeddings(value)

Set input embeddings for the AlbertModel.

PARAMETER DESCRIPTION
self

The instance of the AlbertModel class.

TYPE: object

value

The input embeddings to be set for the model. It should be an instance of nn.Embedding class.

TYPE: Embedding

RETURNS DESCRIPTION
None

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def set_input_embeddings(self, value: nn.Embedding) -> None:
    """
    Set input embeddings for the AlbertModel.

    Args:
        self (object): The instance of the AlbertModel class.
        value (nn.Embedding): The input embeddings to be set for the model. It should be an instance of nn.Embedding class.

    Returns:
        None.

    Raises:
        None.
    """
    self.embeddings.word_embeddings = value

mindnlp.transformers.models.albert.modeling_albert.AlbertPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = AlbertConfig
    base_model_prefix = "albert"

    def _init_weights(self, cell):
        """Initialize the weights"""
        if isinstance(cell, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(self.config.initializer_range),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, self.config.initializer_range, cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

mindnlp.transformers.models.albert.modeling_albert.AlbertSOPHead

Bases: Module

This class represents the AlbertSOPHead, which is responsible for forwarding the sentence-order prediction (SOP) head in an ALBERT (A Lite BERT) model.

The AlbertSOPHead class inherits from nn.Module and provides methods for initializing the SOP head and forwarding the logits for SOP classification.

ATTRIBUTE DESCRIPTION
config

The configuration object for the ALBERT model.

TYPE: AlbertConfig

METHOD DESCRIPTION
__init__

Initializes the AlbertSOPHead instance.

forward

Constructs the logits for SOP classification based on the pooled_output tensor.

Example
>>> import numpy as np
>>> import mindspore
...
>>> config = AlbertConfig()  # create the ALBERT configuration object
>>> albert_sop_head = AlbertSOPHead(config)  # create an instance of AlbertSOPHead
...
>>> pooled_output = mindspore.Tensor(np.random.randn(2, config.hidden_size), mindspore.float32)  # create a random pooled_output tensor
>>> logits = albert_sop_head(pooled_output)  # compute the logits for SOP classification

Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertSOPHead(nn.Module):

    """
    This class represents the AlbertSOPHead, which is responsible for forwarding the sentence-order prediction (SOP) head in an ALBERT (A Lite BERT) model.

    The AlbertSOPHead class inherits from nn.Module and provides methods for initializing the SOP head and forwarding the logits for SOP classification.

    Attributes:
        config (AlbertConfig): The configuration object for the ALBERT model.

    Methods:
        __init__:
            Initializes the AlbertSOPHead instance.

        forward:
            Constructs the logits for SOP classification based on the pooled_output tensor.

    Example:
        ```python
        >>> import numpy as np
        >>> import mindspore
        ...
        >>> config = AlbertConfig()  # create the ALBERT configuration object
        >>> albert_sop_head = AlbertSOPHead(config)  # create an instance of AlbertSOPHead
        ...
        >>> pooled_output = mindspore.Tensor(np.random.randn(2, config.hidden_size), mindspore.float32)  # create a random pooled_output tensor
        >>> logits = albert_sop_head(pooled_output)  # compute the logits for SOP classification
        ```
    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertSOPHead class.

        Args:
            self: The current instance of the class.
            config (AlbertConfig): The configuration object for the Albert model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()

        self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, pooled_output: mindspore.Tensor) -> mindspore.Tensor:
        """
        This method forwards the AlbertSOPHead by applying dropout and classifier operations on the provided pooled_output.

        Args:
            self (object): The instance of the AlbertSOPHead class.
            pooled_output (mindspore.Tensor): The pooled output tensor obtained from the previous layer. It serves as the input to the method.

        Returns:
            mindspore.Tensor:
                The output tensor (logits) obtained after applying the dropout and classifier operations on the pooled_output.
                This tensor represents the final result of the AlbertSOPHead forward pass.

        Raises:
            None
        """
        dropout_pooled_output = self.dropout(pooled_output)
        logits = self.classifier(dropout_pooled_output)
        return logits

mindnlp.transformers.models.albert.modeling_albert.AlbertSOPHead.__init__(config)

Initializes an instance of the AlbertSOPHead class.

PARAMETER DESCRIPTION
self

The current instance of the class.

config

The configuration object for the Albert model.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertSOPHead class.

    Args:
        self: The current instance of the class.
        config (AlbertConfig): The configuration object for the Albert model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()

    self.dropout = nn.Dropout(p=config.classifier_dropout_prob)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

mindnlp.transformers.models.albert.modeling_albert.AlbertSOPHead.forward(pooled_output)

This method forwards the AlbertSOPHead by applying dropout and classifier operations on the provided pooled_output.

PARAMETER DESCRIPTION
self

The instance of the AlbertSOPHead class.

TYPE: object

pooled_output

The pooled output tensor obtained from the previous layer. It serves as the input to the method.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The output tensor (logits) obtained after applying the dropout and classifier operations on the pooled_output. This tensor represents the final result of the AlbertSOPHead forward pass.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(self, pooled_output: mindspore.Tensor) -> mindspore.Tensor:
    """
    This method forwards the AlbertSOPHead by applying dropout and classifier operations on the provided pooled_output.

    Args:
        self (object): The instance of the AlbertSOPHead class.
        pooled_output (mindspore.Tensor): The pooled output tensor obtained from the previous layer. It serves as the input to the method.

    Returns:
        mindspore.Tensor:
            The output tensor (logits) obtained after applying the dropout and classifier operations on the pooled_output.
            This tensor represents the final result of the AlbertSOPHead forward pass.

    Raises:
        None
    """
    dropout_pooled_output = self.dropout(pooled_output)
    logits = self.classifier(dropout_pooled_output)
    return logits

mindnlp.transformers.models.albert.modeling_albert.AlbertTransformer

Bases: Module

This class represents the AlbertTransformer, which is a part of the Albert model in the MindSpore library. It is responsible for forwarding the Albert transformer layers.

The AlbertTransformer class inherits from the nn.Module class.

ATTRIBUTE DESCRIPTION
config

The configuration object for the Albert model.

TYPE: AlbertConfig

embedding_hidden_mapping_in

The dense layer to map the input hidden states to the embedding size.

TYPE: Linear

albert_layer_groups

A list of AlbertLayerGroup instances representing the transformer layers.

TYPE: ModuleList

METHOD DESCRIPTION
forward

Constructs the Albert transformer layers.

  • Args:

    • hidden_states (mindspore.Tensor): The input hidden states.
    • attention_mask (Optional[mindspore.Tensor]): The attention mask tensor (default None).
    • head_mask (Optional[mindspore.Tensor]): The head mask tensor (default None).
    • output_attentions (bool): Whether to output attentions (default False).
    • output_hidden_states (bool): Whether to output hidden states (default False).
    • return_dict (bool): Whether to return the output as a BaseModelOutput instance (default True).
  • Returns:

    • Union[BaseModelOutput, Tuple]: The output as a BaseModelOutput instance or a tuple of tensors.
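
Because layer parameters are shared, the forward pass below maps each of the num_hidden_layers steps onto one of the num_hidden_groups layer groups. A small pure-Python sketch of that index arithmetic (the sizes are illustrative):

```python
num_hidden_layers = 12
num_hidden_groups = 2
layers_per_group = num_hidden_layers // num_hidden_groups  # 6

for i in range(num_hidden_layers):
    group_idx = i // layers_per_group                      # which AlbertLayerGroup handles step i
    print(i, "->", group_idx)
# steps 0..5 reuse albert_layer_groups[0], steps 6..11 reuse albert_layer_groups[1]
```
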
Source code in mindnlp/transformers/models/albert/modeling_albert.py
class AlbertTransformer(nn.Module):

    """
    This class represents the AlbertTransformer, which is a part of the Albert model in the MindSpore library. It is responsible for forwarding the Albert transformer layers.

    The AlbertTransformer class inherits from the nn.Module class.

    Attributes:
        config (AlbertConfig): The configuration object for the Albert model.
        embedding_hidden_mapping_in (nn.Linear): The dense layer to map the input hidden states to the embedding size.
        albert_layer_groups (nn.ModuleList): A list of AlbertLayerGroup instances representing the transformer layers.

    Methods:
        forward(hidden_states, attention_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False, return_dict=True):
            Constructs the Albert transformer layers.

            - Args:

                - hidden_states (mindspore.Tensor): The input hidden states.
                - attention_mask (Optional[mindspore.Tensor]): The attention mask tensor (default None).
                - head_mask (Optional[mindspore.Tensor]): The head mask tensor (default None).
                - output_attentions (bool): Whether to output attentions (default False).
                - output_hidden_states (bool): Whether to output hidden states (default False).
                - return_dict (bool): Whether to return the output as a BaseModelOutput instance (default True).

            - Returns:

                - Union[BaseModelOutput, Tuple]: The output as a BaseModelOutput instance or a tuple of tensors.

    """
    def __init__(self, config: AlbertConfig):
        """
        Initializes an instance of the AlbertTransformer class.

        Args:
            self: The instance of the AlbertTransformer class.
            config (AlbertConfig): An instance of AlbertConfig specifying the configuration settings for the transformer.
                The config parameter defines the model's architecture, including the embedding size, hidden size, and the number of hidden groups.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.config = config
        self.embedding_hidden_mapping_in = nn.Linear(config.embedding_size, config.hidden_size)
        self.albert_layer_groups = nn.ModuleList([AlbertLayerGroup(config) for _ in range(config.num_hidden_groups)])

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        output_attentions: bool = False,
        output_hidden_states: bool = False,
        return_dict: bool = True,
    ) -> Union[BaseModelOutput, Tuple]:
        """
        Constructs the AlbertTransformer.

        Args:
            self: The instance of the class.
            hidden_states (mindspore.Tensor): The input hidden states to be transformed.
            attention_mask (Optional[mindspore.Tensor]): A tensor representing the attention mask, defaults to None.
            head_mask (Optional[mindspore.Tensor]): A tensor representing the head mask, defaults to None.
            output_attentions (bool): A boolean indicating whether to output attentions, defaults to False.
            output_hidden_states (bool): A boolean indicating whether to output hidden states, defaults to False.
            return_dict (bool): A boolean indicating whether to return a dictionary, defaults to True.

        Returns:
            Union[BaseModelOutput, Tuple]:
                The output value, which could be either BaseModelOutput or a tuple.

        Raises:
            None.
        """
        hidden_states = self.embedding_hidden_mapping_in(hidden_states)

        all_hidden_states = (hidden_states,) if output_hidden_states else None
        all_attentions = () if output_attentions else None

        head_mask = [None] * self.config.num_hidden_layers if head_mask is None else head_mask

        for i in range(self.config.num_hidden_layers):
            # Number of layers in a hidden group
            layers_per_group = int(self.config.num_hidden_layers / self.config.num_hidden_groups)

            # Index of the hidden group
            group_idx = int(i / (self.config.num_hidden_layers / self.config.num_hidden_groups))

            layer_group_output = self.albert_layer_groups[group_idx](
                hidden_states,
                attention_mask,
                head_mask[group_idx * layers_per_group : (group_idx + 1) * layers_per_group],
                output_attentions,
                output_hidden_states,
            )
            hidden_states = layer_group_output[0]

            if output_attentions:
                all_attentions = all_attentions + layer_group_output[-1]

            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
        )
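
The snippet below is a minimal usage sketch for this module: hidden states of width `embedding_size` go in, are projected to `hidden_size` by `embedding_hidden_mapping_in`, and then flow through the shared layer groups. The configuration sizes are deliberately tiny and illustrative (not a pretrained ALBERT size), and the sketch assumes `mindspore` and `mindnlp` are installed.

```python
import mindspore
from mindspore import ops

from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertTransformer

# Tiny, illustrative configuration: 4 layers mapped onto 2 shared layer groups.
config = AlbertConfig(
    embedding_size=128,
    hidden_size=256,
    num_attention_heads=4,
    intermediate_size=512,
    num_hidden_layers=4,
    num_hidden_groups=2,
)
transformer = AlbertTransformer(config)

# (batch, seq_len, embedding_size) in, (batch, seq_len, hidden_size) out.
hidden = ops.ones((2, 8, config.embedding_size), mindspore.float32)
out = transformer(hidden, output_hidden_states=True)

print(out.last_hidden_state.shape)  # (2, 8, 256) after embedding_hidden_mapping_in
print(len(out.hidden_states))       # 1 + num_hidden_layers = 5 recorded hidden states
```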

mindnlp.transformers.models.albert.modeling_albert.AlbertTransformer.__init__(config)

Initializes an instance of the AlbertTransformer class.

PARAMETER DESCRIPTION
self

The instance of the AlbertTransformer class.

config

An instance of AlbertConfig specifying the configuration settings for the transformer. The config parameter defines the model's architecture, including the embedding size, hidden size, and the number of hidden groups.

TYPE: AlbertConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def __init__(self, config: AlbertConfig):
    """
    Initializes an instance of the AlbertTransformer class.

    Args:
        self: The instance of the AlbertTransformer class.
        config (AlbertConfig): An instance of AlbertConfig specifying the configuration settings for the transformer.
            The config parameter defines the model's architecture, including the embedding size, hidden size, and the number of hidden groups.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.config = config
    self.embedding_hidden_mapping_in = nn.Linear(config.embedding_size, config.hidden_size)
    self.albert_layer_groups = nn.ModuleList([AlbertLayerGroup(config) for _ in range(config.num_hidden_groups)])
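
Note that the constructor allocates only `num_hidden_groups` `AlbertLayerGroup` modules rather than `num_hidden_layers`; the forward pass reuses each group for several layers, which is where ALBERT's cross-layer parameter sharing comes from. A short sketch of this, again with illustrative sizes and assuming `mindnlp` is installed:

```python
from mindnlp.transformers.models.albert.configuration_albert import AlbertConfig
from mindnlp.transformers.models.albert.modeling_albert import AlbertTransformer

# Illustrative sizes: 12 layers that all reuse a single shared layer group.
config = AlbertConfig(embedding_size=128, hidden_size=256, num_attention_heads=4,
                      intermediate_size=512, num_hidden_layers=12, num_hidden_groups=1)
transformer = AlbertTransformer(config)

print(len(transformer.albert_layer_groups))  # 1 -> the same weights are applied 12 times
```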

mindnlp.transformers.models.albert.modeling_albert.AlbertTransformer.forward(hidden_states, attention_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False, return_dict=True)

Constructs the AlbertTransformer.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input hidden states to be transformed.

TYPE: Tensor

attention_mask

A tensor representing the attention mask, defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

A tensor representing the head mask, defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

A boolean indicating whether to output attentions, defaults to False.

TYPE: bool DEFAULT: False

output_hidden_states

A boolean indicating whether to output hidden states, defaults to False.

TYPE: bool DEFAULT: False

return_dict

A boolean indicating whether to return a dictionary, defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Union[BaseModelOutput, Tuple]

Union[BaseModelOutput, Tuple]: The output value, which could be either BaseModelOutput or a tuple.

Source code in mindnlp/transformers/models/albert/modeling_albert.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    output_attentions: bool = False,
    output_hidden_states: bool = False,
    return_dict: bool = True,
) -> Union[BaseModelOutput, Tuple]:
    """
    Constructs the AlbertTransformer.

    Args:
        self: The instance of the class.
        hidden_states (mindspore.Tensor): The input hidden states to be transformed.
        attention_mask (Optional[mindspore.Tensor]): A tensor representing the attention mask, defaults to None.
        head_mask (Optional[mindspore.Tensor]): A tensor representing the head mask, defaults to None.
        output_attentions (bool): A boolean indicating whether to output attentions, defaults to False.
        output_hidden_states (bool): A boolean indicating whether to output hidden states, defaults to False.
        return_dict (bool): A boolean indicating whether to return a dictionary, defaults to True.

    Returns:
        Union[BaseModelOutput, Tuple]:
            The output value, which could be either BaseModelOutput or a tuple.

    Raises:
        None.
    """
    hidden_states = self.embedding_hidden_mapping_in(hidden_states)

    all_hidden_states = (hidden_states,) if output_hidden_states else None
    all_attentions = () if output_attentions else None

    head_mask = [None] * self.config.num_hidden_layers if head_mask is None else head_mask

    for i in range(self.config.num_hidden_layers):
        # Number of layers in a hidden group
        layers_per_group = int(self.config.num_hidden_layers / self.config.num_hidden_groups)

        # Index of the hidden group
        group_idx = int(i / (self.config.num_hidden_layers / self.config.num_hidden_groups))

        layer_group_output = self.albert_layer_groups[group_idx](
            hidden_states,
            attention_mask,
            head_mask[group_idx * layers_per_group : (group_idx + 1) * layers_per_group],
            output_attentions,
            output_hidden_states,
        )
        hidden_states = layer_group_output[0]

        if output_attentions:
            all_attentions = all_attentions + layer_group_output[-1]

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
    return BaseModelOutput(
        last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
    )
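
To make the indexing in the loop above concrete, here is a plain-Python rendering of how each of the `num_hidden_layers` iterations selects a layer group and the matching slice of `head_mask`. The numbers (12 layers, 3 groups) are illustrative; released ALBERT checkpoints typically use a single group.

```python
num_hidden_layers = 12
num_hidden_groups = 3   # illustrative; albert-xxlarge-v2 uses num_hidden_groups=1

layers_per_group = num_hidden_layers // num_hidden_groups  # 4

for i in range(num_hidden_layers):
    group_idx = i // layers_per_group
    start, stop = group_idx * layers_per_group, (group_idx + 1) * layers_per_group
    print(f"layer {i:2d} -> group {group_idx}, head_mask[{start}:{stop}]")

# Layers 0-3 use group 0, layers 4-7 use group 1, layers 8-11 use group 2.
# With num_hidden_groups=1 every layer reuses the one shared group.
```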

mindnlp.transformers.models.albert.tokenization_albert

Tokenization classes for ALBERT model.

mindnlp.transformers.models.albert.tokenization_albert.AlbertTokenizer

Bases: PreTrainedTokenizer

Construct an ALBERT tokenizer. Based on SentencePiece.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer.

TYPE: `str`

do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

remove_space

Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

keep_accents

Whether or not to keep accents when tokenizing.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

bos_token

The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token.

TYPE: `str`, *optional*, defaults to `"[CLS]"` DEFAULT: '[CLS]'

eos_token

The end of sequence token.

When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.

TYPE: `str`, *optional*, defaults to `"[SEP]"` DEFAULT: '[SEP]'

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

sep_token

The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.

TYPE: `str`, *optional*, defaults to `"[SEP]"` DEFAULT: '[SEP]'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

cls_token

The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.

TYPE: `str`, *optional*, defaults to `"[CLS]"` DEFAULT: '[CLS]'

mask_token

The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.

TYPE: `str`, *optional*, defaults to `"[MASK]"` DEFAULT: '[MASK]'

sp_model_kwargs

Will be passed to the SentencePieceProcessor.__init__() method. The Python wrapper for SentencePiece can be used, among other things, to set the options below (a short usage sketch follows the attribute table):

  • enable_sampling: Enable subword regularization.
  • nbest_size: Sampling parameters for unigram. Invalid for BPE-Dropout.

    • nbest_size = {0,1}: No sampling is performed.
    • nbest_size > 1: samples from the nbest_size results.
    • nbest_size < 0: assumes that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
  • alpha: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.

TYPE: `dict`, *optional* DEFAULT: None

ATTRIBUTE DESCRIPTION
sp_model

The SentencePiece processor that is used for every conversion (string, tokens and IDs).

TYPE: `SentencePieceProcessor`
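
As a usage sketch for sp_model_kwargs: the dictionary is forwarded verbatim to SentencePieceProcessor, so subword-regularization options can be enabled when the tokenizer is built. The file name spiece.model below is a placeholder for a real ALBERT SentencePiece vocabulary file; this is illustrative rather than a prescribed setup.

```python
from mindnlp.transformers.models.albert.tokenization_albert import AlbertTokenizer

# "spiece.model" is a placeholder path to an ALBERT SentencePiece vocabulary file.
tokenizer = AlbertTokenizer(
    "spiece.model",
    sp_model_kwargs={
        "enable_sampling": True,  # turn on subword regularization
        "nbest_size": -1,         # sample from the full lattice
        "alpha": 0.1,             # smoothing for unigram sampling
    },
)

# With sampling enabled, repeated calls may segment the same text differently.
print(tokenizer.tokenize("subword regularization in action"))
```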

Source code in mindnlp/transformers/models/albert/tokenization_albert.py
class AlbertTokenizer(PreTrainedTokenizer):
    """
    Construct an ALBERT tokenizer. Based on [SentencePiece](https://github.com/google/sentencepiece).

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
            contains the vocabulary necessary to instantiate a tokenizer.
        do_lower_case (`bool`, *optional*, defaults to `True`):
            Whether or not to lowercase the input when tokenizing.
        remove_space (`bool`, *optional*, defaults to `True`):
            Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
        keep_accents (`bool`, *optional*, defaults to `False`):
            Whether or not to keep accents when tokenizing.
        bos_token (`str`, *optional*, defaults to `"[CLS]"`):
            The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the beginning of
            sequence. The token used is the `cls_token`.

            </Tip>

        eos_token (`str`, *optional*, defaults to `"[SEP]"`):
            The end of sequence token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the end of sequence.
            The token used is the `sep_token`.

            </Tip>

        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (`str`, *optional*, defaults to `"[SEP]"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (`str`, *optional*, defaults to `"[CLS]"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (`str`, *optional*, defaults to `"[MASK]"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        sp_model_kwargs (`dict`, *optional*):
            Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
            SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
            to set:

            - `enable_sampling`: Enable subword regularization.
            - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.

                - `nbest_size = {0,1}`: No sampling is performed.
                - `nbest_size > 1`: samples from the nbest_size results.
                - `nbest_size < 0`: assumes that nbest_size is infinite and samples from all hypotheses (lattice)
                    using the forward-filtering-and-backward-sampling algorithm.

            - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
              BPE-dropout.

    Attributes:
        sp_model (`SentencePieceProcessor`):
            The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

    def __init__(
        self,
        vocab_file,
        do_lower_case=True,
        remove_space=True,
        keep_accents=False,
        bos_token="[CLS]",
        eos_token="[SEP]",
        unk_token="<unk>",
        sep_token="[SEP]",
        pad_token="<pad>",
        cls_token="[CLS]",
        mask_token="[MASK]",
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        """
        Initializes an instance of the AlbertTokenizer class.

        Args:
            self: The object instance.
            vocab_file (str): The path to the vocabulary file.
            do_lower_case (bool, optional): Whether to convert all characters to lowercase. Default is True.
            remove_space (bool, optional): Whether to remove spaces. Default is True.
            keep_accents (bool, optional): Whether to keep accents. Default is False.
            bos_token (str, optional): The beginning of sentence token. Default is '[CLS]'.
            eos_token (str, optional): The end of sentence token. Default is '[SEP]'.
            unk_token (str, optional): The unknown token. Default is '<unk>'.
            sep_token (str, optional): The separator token. Default is '[SEP]'.
            pad_token (str, optional): The padding token. Default is '<pad>'.
            cls_token (str, optional): The classification token. Default is '[CLS]'.
            mask_token (str, optional): The masking token. Default is '[MASK]'.
            sp_model_kwargs (dict, optional): Additional keyword arguments for SentencePieceProcessor. Default is None.
            **kwargs: Additional keyword arguments.

        Returns:
            None

        Raises:
            None
        """
        # The mask token behaves like a normal word, i.e. it includes the space before it and
        # is kept in the raw text, so there should be a match in a non-normalized sentence.
        mask_token = (
            AddedToken(mask_token, lstrip=True, rstrip=False, normalized=False)
            if isinstance(mask_token, str)
            else mask_token
        )

        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs

        self.do_lower_case = do_lower_case
        self.remove_space = remove_space
        self.keep_accents = keep_accents
        self.vocab_file = vocab_file

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)

        super().__init__(
            do_lower_case=do_lower_case,
            remove_space=remove_space,
            keep_accents=keep_accents,
            bos_token=bos_token,
            eos_token=eos_token,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            sp_model_kwargs=self.sp_model_kwargs,
            **kwargs,
        )

    @property
    def vocab_size(self) -> int:
        """
        This method returns the size of the vocabulary used in the AlbertTokenizer.

        Args:
            self (AlbertTokenizer): The instance of the AlbertTokenizer class.

        Returns:
            int: The size of the vocabulary used in the AlbertTokenizer.

        Raises:
            None
        """
        return len(self.sp_model)

    def get_vocab(self) -> Dict[str, int]:
        """
        Get the vocabulary of the AlbertTokenizer.

        Args:
            self: The instance of the AlbertTokenizer class.
                This parameter is required to access the tokenizer's vocabulary.

        Returns:
            Dict[str, int]: A dictionary containing the vocabulary of the AlbertTokenizer where
                the keys are strings representing tokens and the values are integers representing token IDs.

        Raises:
            No specific exceptions are raised by this method.
        """
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def __getstate__(self):
        """
        Method: __getstate__

        Description:
            This method is implemented in the 'AlbertTokenizer' class to retrieve the state of the object for pickling.

        Args:
            self: An instance of the 'AlbertTokenizer' class.

        Returns:
            dict: A copy of the instance's `__dict__` in which the 'sp_model' entry is set to None,
                so the underlying SentencePiece processor is not pickled.

        Raises:
            None.
        """
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        """
        Sets the internal state of the AlbertTokenizer instance.

        Args:
            self (AlbertTokenizer): The instance of the AlbertTokenizer class.
            d (dict): The dictionary containing the state of the instance.

        Returns:
            None.

        Raises:
            None.

        Description:
            This method is called during unpickling or deserialization of an AlbertTokenizer instance.
            It sets the internal state of the instance by assigning the provided dictionary 'd' to the '__dict__'
            attribute.

            If the instance does not have an attribute named 'sp_model_kwargs', it is initialized as an empty dictionary.

            Then, a SentencePieceProcessor object is created using the 'sp_model_kwargs' and assigned to the 'sp_model' attribute of the instance.
            The SentencePieceProcessor object is instantiated with the keyword arguments provided through 'self.sp_model_kwargs'.

            Finally, the SentencePieceProcessor object loads the vocabulary file specified by 'self.vocab_file'.

        Note:
            - This method is automatically called by the pickle module when unpickling an AlbertTokenizer object.
            - The '__setstate__' method is used in conjunction with the '__getstate__' method to enable pickling
            and unpickling of the AlbertTokenizer instances.

        Example:
            ```python
            >>> tokenizer = AlbertTokenizer("spiece.model")
            >>> state = tokenizer.__getstate__()
            >>> tokenizer.__setstate__(state)
            ```

        """
        self.__dict__ = d