
ernie_m

mindnlp.transformers.models.ernie_m.configuration_ernie_m

ErnieM model configuration

mindnlp.transformers.models.ernie_m.configuration_ernie_m.ErnieMConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [ErnieMModel]. It is used to instantiate an Ernie-M model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Ernie-M susnato/ernie-m-base_pytorch architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of inputs_ids in [ErnieMModel]; it is also the vocabulary size of the token embedding matrix. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [ErnieMModel].

TYPE: `int`, *optional*, defaults to 250002 DEFAULT: 250002

hidden_size

Dimensionality of the embedding layer, encoder layers and pooler layer.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

num_hidden_layers

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

num_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

intermediate_size

Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to feed-forward layers are first projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically intermediate_size is larger than hidden_size.

TYPE: `int`, *optional*, defaults to 3072 DEFAULT: 3072

hidden_act

The non-linear activation function in the feed-forward layer. "gelu", "relu" and any other supported activation functions can be used.

TYPE: `str`, *optional*, defaults to `"gelu"` DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for all fully connected layers in the embeddings and encoder.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability used in MultiHeadAttention in all encoder layers to drop some attention targets.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

act_dropout

This dropout probability is used in ErnieMEncoderLayer after activation.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

max_position_embeddings

The maximum value of the dimensionality of position encoding, which dictates the maximum supported length of an input sequence.

TYPE: `int`, *optional*, defaults to 514 DEFAULT: 514

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-05 DEFAULT: 1e-05

classifier_dropout

The dropout ratio for the classification head.

TYPE: `float`, *optional* DEFAULT: None

initializer_range

The standard deviation of the normal initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

pad_token_id

The index of the padding token in the token vocabulary.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

The normal initializer initializes weight matrices from a normal distribution. See ErnieMPretrainedModel._init_weights() for how weights are initialized in ErnieMModel.
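
For orientation, here is a minimal construction sketch; it assumes only that the class is importable from the documented module path, and the overridden values are arbitrary:

from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig

# Build a configuration with the documented defaults, overriding a couple of fields.
config = ErnieMConfig(num_hidden_layers=6, classifier_dropout=0.1)
print(config.vocab_size)         # 250002 (the documented default)
print(config.num_hidden_layers)  # 6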

Source code in mindnlp/transformers/models/ernie_m/configuration_ernie_m.py
class ErnieMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`ErnieMModel`]. It is used to instantiate a
    Ernie-M model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the `Ernie-M`
    [susnato/ernie-m-base_pytorch](https://hf-mirror.com/susnato/ernie-m-base_pytorch) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 250002):
            Vocabulary size of `inputs_ids` in [`ErnieMModel`]. Also is the vocab size of token embedding matrix.
            Defines the number of different tokens that can be represented by the `inputs_ids` passed when calling
            [`ErnieMModel`].
        hidden_size (`int`, *optional*, defaults to 768):
            Dimensionality of the embedding layer, encoder layers and pooler layer.
        num_hidden_layers (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (`int`, *optional*, defaults to 3072):
            Dimensionality of the feed-forward (ff) layer in the encoder. Input tensors to feed-forward layers are
            firstly projected from hidden_size to intermediate_size, and then projected back to hidden_size. Typically
            intermediate_size is larger than hidden_size.
        hidden_act (`str`, *optional*, defaults to `"gelu"`):
            The non-linear activation function in the feed-forward layer. `"gelu"`, `"relu"` and any other torch
            supported activation functions are supported.
        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings and encoder.
        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
            The dropout probability used in `MultiHeadAttention` in all encoder layers to drop some attention target.
        act_dropout (`float`, *optional*, defaults to 0.0):
            This dropout probability is used in `ErnieMEncoderLayer` after activation.
        max_position_embeddings (`int`, *optional*, defaults to 514):
            The maximum value of the dimensionality of position encoding, which dictates the maximum supported length
            of an input sequence.
        layer_norm_eps (`float`, *optional*, defaults to 1e-05):
            The epsilon used by the layer normalization layers.
        classifier_dropout (`float`, *optional*):
            The dropout ratio for the classification head.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the normal initializer for initializing all weight matrices.
        pad_token_id (`int`, *optional*, defaults to 1):
            The index of padding token in the token vocabulary.

    A normal_initializer initializes weight matrices as normal distributions. See
    `ErnieMPretrainedModel._init_weights()` for how weights are initialized in `ErnieMModel`.
    """
    model_type = "ernie_m"
    attribute_map: Dict[str, str] = {"dropout": "classifier_dropout", "num_classes": "num_labels"}

    def __init__(
        self,
        vocab_size: int = 250002,
        hidden_size: int = 768,
        num_hidden_layers: int = 12,
        num_attention_heads: int = 12,
        intermediate_size: int = 3072,
        hidden_act: str = "gelu",
        hidden_dropout_prob: float = 0.1,
        attention_probs_dropout_prob: float = 0.1,
        max_position_embeddings: int = 514,
        initializer_range: float = 0.02,
        pad_token_id: int = 1,
        layer_norm_eps: float = 1e-05,
        classifier_dropout=None,
        is_decoder=False,
        act_dropout=0.0,
        **kwargs,
    ):
        """
        This method initializes an instance of the ErnieMConfig class.

        Args:
            self: The instance of the class.
            vocab_size (int): The size of the vocabulary. Default is 250002.
            hidden_size (int): The size of the hidden layers. Default is 768.
            num_hidden_layers (int): The number of hidden layers. Default is 12.
            num_attention_heads (int): The number of attention heads. Default is 12.
            intermediate_size (int): The size of the intermediate layer in the transformer. Default is 3072.
            hidden_act (str): The activation function for the hidden layers. Default is 'gelu'.
            hidden_dropout_prob (float): The dropout probability for the hidden layers. Default is 0.1.
            attention_probs_dropout_prob (float): The dropout probability for the attention probabilities. Default is 0.1.
            max_position_embeddings (int): The maximum position for the embeddings. Default is 514.
            initializer_range (float): The range for the weight initializers. Default is 0.02.
            pad_token_id (int): The ID for padding tokens. Default is 1.
            layer_norm_eps (float): The epsilon value for layer normalization. Default is 1e-05.
            classifier_dropout (None): The dropout rate for the classifier layer. Default is None.
            is_decoder (bool): Whether the model is a decoder. Default is False.
            act_dropout (float): The dropout rate for the activation function. Default is 0.0.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__(pad_token_id=pad_token_id, **kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.classifier_dropout = classifier_dropout
        self.is_decoder = is_decoder
        self.act_dropout = act_dropout

mindnlp.transformers.models.ernie_m.configuration_ernie_m.ErnieMConfig.__init__(vocab_size=250002, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=514, initializer_range=0.02, pad_token_id=1, layer_norm_eps=1e-05, classifier_dropout=None, is_decoder=False, act_dropout=0.0, **kwargs)

This method initializes an instance of the ErnieMConfig class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Default is 250002.

TYPE: int DEFAULT: 250002

hidden_size

The size of the hidden layers. Default is 768.

TYPE: int DEFAULT: 768

num_hidden_layers

The number of hidden layers. Default is 12.

TYPE: int DEFAULT: 12

num_attention_heads

The number of attention heads. Default is 12.

TYPE: int DEFAULT: 12

intermediate_size

The size of the intermediate layer in the transformer. Default is 3072.

TYPE: int DEFAULT: 3072

hidden_act

The activation function for the hidden layers. Default is 'gelu'.

TYPE: str DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for the hidden layers. Default is 0.1.

TYPE: float DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability for the attention probabilities. Default is 0.1.

TYPE: float DEFAULT: 0.1

max_position_embeddings

The maximum position for the embeddings. Default is 514.

TYPE: int DEFAULT: 514

initializer_range

The range for the weight initializers. Default is 0.02.

TYPE: float DEFAULT: 0.02

pad_token_id

The ID for padding tokens. Default is 1.

TYPE: int DEFAULT: 1

layer_norm_eps

The epsilon value for layer normalization. Default is 1e-05.

TYPE: float DEFAULT: 1e-05

classifier_dropout

The dropout rate for the classifier layer. Default is None.

TYPE: `float`, *optional* DEFAULT: None

is_decoder

Whether the model is a decoder. Default is False.

TYPE: bool DEFAULT: False

act_dropout

The dropout rate for the activation function. Default is 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION

None.
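
Because attribute_map maps dropout to classifier_dropout and num_classes to num_labels, the aliased names read through to the underlying attributes. A short sketch, assuming the standard PretrainedConfig aliasing behaviour:

from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig

config = ErnieMConfig(classifier_dropout=0.2, num_labels=3)

# The aliases declared in attribute_map resolve to the real attributes.
assert config.dropout == config.classifier_dropout == 0.2
assert config.num_classes == config.num_labels == 3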

Source code in mindnlp/transformers/models/ernie_m/configuration_ernie_m.py
def __init__(
    self,
    vocab_size: int = 250002,
    hidden_size: int = 768,
    num_hidden_layers: int = 12,
    num_attention_heads: int = 12,
    intermediate_size: int = 3072,
    hidden_act: str = "gelu",
    hidden_dropout_prob: float = 0.1,
    attention_probs_dropout_prob: float = 0.1,
    max_position_embeddings: int = 514,
    initializer_range: float = 0.02,
    pad_token_id: int = 1,
    layer_norm_eps: float = 1e-05,
    classifier_dropout=None,
    is_decoder=False,
    act_dropout=0.0,
    **kwargs,
):
    """
    This method initializes an instance of the ErnieMConfig class.

    Args:
        self: The instance of the class.
        vocab_size (int): The size of the vocabulary. Default is 250002.
        hidden_size (int): The size of the hidden layers. Default is 768.
        num_hidden_layers (int): The number of hidden layers. Default is 12.
        num_attention_heads (int): The number of attention heads. Default is 12.
        intermediate_size (int): The size of the intermediate layer in the transformer. Default is 3072.
        hidden_act (str): The activation function for the hidden layers. Default is 'gelu'.
        hidden_dropout_prob (float): The dropout probability for the hidden layers. Default is 0.1.
        attention_probs_dropout_prob (float): The dropout probability for the attention probabilities. Default is 0.1.
        max_position_embeddings (int): The maximum position for the embeddings. Default is 514.
        initializer_range (float): The range for the weight initializers. Default is 0.02.
        pad_token_id (int): The ID for padding tokens. Default is 1.
        layer_norm_eps (float): The epsilon value for layer normalization. Default is 1e-05.
        classifier_dropout (None): The dropout rate for the classifier layer. Default is None.
        is_decoder (bool): Whether the model is a decoder. Default is False.
        act_dropout (float): The dropout rate for the activation function. Default is 0.0.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__(pad_token_id=pad_token_id, **kwargs)
    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.intermediate_size = intermediate_size
    self.hidden_act = hidden_act
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.classifier_dropout = classifier_dropout
    self.is_decoder = is_decoder
    self.act_dropout = act_dropout

mindnlp.transformers.models.ernie_m.modeling_ernie_m

MindSpore ErnieM model.

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention

Bases: Module

ErnieMAttention is a class that represents an attention mechanism used in the ERNIE-M model. It contains methods for initializing the attention mechanism, pruning attention heads, and forwarding attention outputs. This class inherits from nn.Module and utilizes an ErnieMSelfAttention module for self-attention calculations. The attention mechanism includes projection layers for query, key, and value, as well as an output projection layer. The prune_heads method allows for pruning specific attention heads based on provided indices. The forward method processes input hidden states through the self-attention mechanism and output projection layer to generate attention outputs.
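
As an illustration, a hedged sketch of pushing a dummy batch through the attention block; the tensor shapes and the direct module call are assumptions, not taken from this file:

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMAttention

config = ErnieMConfig()
attention = ErnieMAttention(config)

# A dummy batch: 2 sequences of 8 tokens, each token a hidden_size-dim vector.
hidden_states = ops.ones((2, 8, config.hidden_size), dtype=mindspore.float32)
outputs = attention(hidden_states)   # attention_mask and head_mask default to None
print(outputs[0].shape)              # (2, 8, hidden_size)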

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMAttention(nn.Module):

    """
    ErnieMAttention is a class that represents an attention mechanism used in the ERNIE-M model.
    It contains methods for initializing the attention mechanism, pruning attention heads, and forwarding attention outputs.
    This class inherits from nn.Module and utilizes an ErnieMSelfAttention module for self-attention calculations.
    The attention mechanism includes projection layers for query, key, and value, as well as an output projection layer.
    The `prune_heads` method allows for pruning specific attention heads based on provided indices.
    The `forward` method processes input hidden states through the self-attention mechanism and output projection
    layer to generate attention outputs.
    """
    def __init__(self, config, position_embedding_type=None):
        """
        Initialize the ErnieMAttention class.

        Args:
            self: The instance of the class.
            config: An object containing configuration parameters.
            position_embedding_type: Type of position embedding to be used, default is None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.self_attn = ErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
        self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        This method 'prune_heads' belongs to the class 'ErnieMAttention' and is responsible for pruning specific
        attention heads in the model based on the provided list of heads.

        Args:
            self: Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.
            heads: A list containing the indices of the attention heads that need to be pruned. Each element in the list
                should be an integer representing the index of the head to be pruned.

        Returns:
            None: This method does not return any value but modifies the attention heads in the model in-place.

        Raises:
            None:
                However, it is assumed that the functions called within this method, 
                such as 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to 
                input validation or processing errors.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
        self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
        self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
        self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

        # Update hyper params and store pruned heads
        self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
        self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        This method forwards the ErnieMAttention module.

        Args:
            self: The instance of the ErnieMAttention class.
            hidden_states (mindspore.Tensor): The input hidden states tensor.
            attention_mask (Optional[mindspore.Tensor]): Optional tensor containing attention mask values.
            head_mask (Optional[mindspore.Tensor]): Optional tensor containing head mask values.
            encoder_hidden_states (Optional[mindspore.Tensor]): Optional tensor containing encoder hidden states.
            encoder_attention_mask (Optional[mindspore.Tensor]): Optional tensor containing encoder attention mask values.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors.
            output_attentions (Optional[bool]): Optional boolean indicating whether to output attentions.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the attention output tensor.

        Raises:
            None
        """
        self_outputs = self.self_attn(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
        )
        attention_output = self.out_proj(self_outputs[0])
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.__init__(config, position_embedding_type=None)

Initialize the ErnieMAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration parameters.

position_embedding_type

Type of position embedding to be used, default is None.

DEFAULT: None

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initialize the ErnieMAttention class.

    Args:
        self: The instance of the class.
        config: An object containing configuration parameters.
        position_embedding_type: Type of position embedding to be used, default is None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.self_attn = ErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
    self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
    self.pruned_heads = set()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

This method forwards the ErnieMAttention module.

PARAMETER DESCRIPTION
self

The instance of the ErnieMAttention class.

hidden_states

The input hidden states tensor.

TYPE: Tensor

attention_mask

Optional tensor containing attention mask values.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Optional tensor containing head mask values.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

Optional tensor containing encoder hidden states.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

Optional tensor containing encoder attention mask values.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

Optional tuple containing past key and value tensors.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Optional boolean indicating whether to output attentions.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

Tuple[mindspore.Tensor]: A tuple containing the attention output tensor.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    This method forwards the ErnieMAttention module.

    Args:
        self: The instance of the ErnieMAttention class.
        hidden_states (mindspore.Tensor): The input hidden states tensor.
        attention_mask (Optional[mindspore.Tensor]): Optional tensor containing attention mask values.
        head_mask (Optional[mindspore.Tensor]): Optional tensor containing head mask values.
        encoder_hidden_states (Optional[mindspore.Tensor]): Optional tensor containing encoder hidden states.
        encoder_attention_mask (Optional[mindspore.Tensor]): Optional tensor containing encoder attention mask values.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors.
        output_attentions (Optional[bool]): Optional boolean indicating whether to output attentions.

    Returns:
        Tuple[mindspore.Tensor]: A tuple containing the attention output tensor.

    Raises:
        None
    """
    self_outputs = self.self_attn(
        hidden_states,
        attention_mask,
        head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        past_key_value,
        output_attentions,
    )
    attention_output = self.out_proj(self_outputs[0])
    outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
    return outputs

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMAttention.prune_heads(heads)

This method 'prune_heads' belongs to the class 'ErnieMAttention' and is responsible for pruning specific attention heads in the model based on the provided list of heads.

PARAMETER DESCRIPTION
self

Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.

heads

A list containing the indices of the attention heads that need to be pruned. Each element in the list should be an integer representing the index of the head to be pruned.

RETURNS DESCRIPTION
None

This method does not return any value but modifies the attention heads in the model in-place.

RAISES DESCRIPTION
None

No exception is raised directly; however, the helper functions called within this method, such as 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to input validation or processing errors.
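
A brief sketch of in-place head pruning; the head indices are purely illustrative:

from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMAttention

attention = ErnieMAttention(ErnieMConfig())     # 12 attention heads by default
attention.prune_heads([0, 2])                   # drop two heads in place
print(attention.self_attn.num_attention_heads)  # 10
print(attention.pruned_heads)                   # {0, 2}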

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def prune_heads(self, heads):
    """
    This method 'prune_heads' belongs to the class 'ErnieMAttention' and is responsible for pruning specific
    attention heads in the model based on the provided list of heads.

    Args:
        self: Instance of the 'ErnieMAttention' class. It is used to access attributes and methods within the class.
        heads: A list containing the indices of the attention heads that need to be pruned. Each element in the list
            should be an integer representing the index of the head to be pruned.

    Returns:
        None: This method does not return any value but modifies the attention heads in the model in-place.

    Raises:
        None:
            However, it is assumed that the functions called within this method, 
            such as 'find_pruneable_heads_and_indices' and 'prune_linear_layer', may raise exceptions related to 
            input validation or processing errors.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
    )

    # Prune linear layers
    self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
    self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
    self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
    self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

    # Update hyper params and store pruned heads
    self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
    self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
    self.pruned_heads = self.pruned_heads.union(heads)

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings

Bases: Module

Construct the embeddings from word and position embeddings.
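
A minimal lookup sketch; the token ids below are arbitrary and purely illustrative:

import mindspore
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMEmbeddings

config = ErnieMConfig()
embeddings = ErnieMEmbeddings(config)

# Two sequences of five arbitrary token ids each.
input_ids = mindspore.Tensor([[5, 7, 11, 13, 17], [2, 3, 5, 7, 11]], mindspore.int64)
print(embeddings(input_ids).shape)   # (2, 5, hidden_size)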

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMEmbeddings(nn.Module):
    """Construct the embeddings from word and position embeddings."""
    def __init__(self, config):
        """
        Args:
            self (object): The instance of the ErnieMEmbeddings class.
            config (object): An object containing configuration parameters for the ErnieMEmbeddings instance,
                including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer
                normalization epsilon, and hidden dropout probability.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of the expected type.
            ValueError: If the config parameter does not contain required attributes or if the padding token ID is not valid.
        """
        super().__init__()
        self.hidden_size = config.hidden_size
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
        )
        self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
        self.padding_idx = config.pad_token_id

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values_length: int = 0,
    ) -> mindspore.Tensor:
        """
        This method 'forward' in the class 'ErnieMEmbeddings' forwards the embeddings for the input tokens.

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input token IDs. Default is None. If None, 'inputs_embeds' is used to generate the embeddings.
            position_ids (Optional[mindspore.Tensor]): The position IDs for the input tokens.
                Default is None. If None, position IDs are calculated based on the input shape.
            inputs_embeds (Optional[mindspore.Tensor]): The input embeddings.
                Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.
            past_key_values_length (int): The length of past key values.
                Default is 0. It is used to adjust the 'position_ids' if past key values are present.

        Returns:
            mindspore.Tensor: The forwarded embeddings for the input tokens.

        Raises:
            ValueError: If the input shape is invalid or if 'position_ids' cannot be calculated.
            TypeError: If the input types are not as expected.
        """
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        if position_ids is None:
            input_shape = inputs_embeds.shape[:-1]
            ones = ops.ones(input_shape, dtype=mindspore.int64)
            seq_length = ops.cumsum(ones, axis=1)
            position_ids = seq_length - ones

            if past_key_values_length > 0:
                position_ids = position_ids + past_key_values_length
        # to mimic paddlenlp implementation
        position_ids += 2
        position_embeddings = self.position_embeddings(position_ids)
        embeddings = inputs_embeds + position_embeddings
        embeddings = self.layer_norm(embeddings)
        embeddings = self.dropout(embeddings)

        return embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings.__init__(config)

PARAMETER DESCRIPTION
self

The instance of the ErnieMEmbeddings class.

TYPE: object

config

An object containing configuration parameters for the ErnieMEmbeddings instance, including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer normalization epsilon, and hidden dropout probability.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of the expected type.

ValueError

If the config parameter does not contain required attributes or if the padding token ID is not valid.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config):
    """
    Args:
        self (object): The instance of the ErnieMEmbeddings class.
        config (object): An object containing configuration parameters for the ErnieMEmbeddings instance,
            including the hidden size, vocabulary size, maximum position embeddings, padding token ID, layer
            normalization epsilon, and hidden dropout probability.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of the expected type.
        ValueError: If the config parameter does not contain required attributes or if the padding token ID is not valid.
    """
    super().__init__()
    self.hidden_size = config.hidden_size
    self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
    self.position_embeddings = nn.Embedding(
        config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
    )
    self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
    self.padding_idx = config.pad_token_id

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEmbeddings.forward(input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0)

This method 'forward' in the class 'ErnieMEmbeddings' forwards the embeddings for the input tokens.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input token IDs. Default is None. If None, 'inputs_embeds' is used to generate the embeddings.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The position IDs for the input tokens. Default is None. If None, position IDs are calculated based on the input shape.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings. Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values_length

The length of past key values. Default is 0. It is used to adjust the 'position_ids' if past key values are present.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The forwarded embeddings for the input tokens.

RAISES DESCRIPTION
ValueError

If the input shape is invalid or if 'position_ids' cannot be calculated.

TypeError

If the input types are not as expected.
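
To make the position handling concrete, the following standalone sketch reproduces the arithmetic applied when position_ids is not supplied: a cumulative sum over a tensor of ones, minus the ones, plus the offset of 2 used to mimic the paddlenlp implementation.

import mindspore
from mindspore import ops

ones = ops.ones((1, 4), dtype=mindspore.int64)    # (batch, seq_len) of ones
position_ids = ops.cumsum(ones, axis=1) - ones    # [[0, 1, 2, 3]]
position_ids += 2                                 # [[2, 3, 4, 5]], paddlenlp-style offset
print(position_ids)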

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values_length: int = 0,
) -> mindspore.Tensor:
    """
    This method 'forward' in the class 'ErnieMEmbeddings' forwards the embeddings for the input tokens.

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input token IDs. Default is None. If None, 'inputs_embeds' is used to generate the embeddings.
        position_ids (Optional[mindspore.Tensor]): The position IDs for the input tokens.
            Default is None. If None, position IDs are calculated based on the input shape.
        inputs_embeds (Optional[mindspore.Tensor]): The input embeddings.
            Default is None. If None, input embeddings are generated using 'word_embeddings' based on 'input_ids'.
        past_key_values_length (int): The length of past key values.
            Default is 0. It is used to adjust the 'position_ids' if past key values are present.

    Returns:
        mindspore.Tensor: The forwarded embeddings for the input tokens.

    Raises:
        ValueError: If the input shape is invalid or if 'position_ids' cannot be calculated.
        TypeError: If the input types are not as expected.
    """
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    if position_ids is None:
        input_shape = inputs_embeds.shape[:-1]
        ones = ops.ones(input_shape, dtype=mindspore.int64)
        seq_length = ops.cumsum(ones, axis=1)
        position_ids = seq_length - ones

        if past_key_values_length > 0:
            position_ids = position_ids + past_key_values_length
    # to mimic paddlenlp implementation
    position_ids += 2
    position_embeddings = self.position_embeddings(position_ids)
    embeddings = inputs_embeds + position_embeddings
    embeddings = self.layer_norm(embeddings)
    embeddings = self.dropout(embeddings)

    return embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder

Bases: Module

ErnieMEncoder represents a multi-layer Transformer-based encoder model for processing sequences of input data.

The ErnieMEncoder class inherits from nn.Module and implements a multi-layer Transformer-based encoder, with the ability to return hidden states and attention weights if specified. The class provides methods for initializing the model and processing input data through its layers.

ATTRIBUTE DESCRIPTION
config

A configuration object containing the model's hyperparameters.

layers

A list of ErnieMEncoderLayer instances representing the individual layers of the encoder model.

METHOD DESCRIPTION
forward

Processes input embeddings through the encoder layers, optionally returning hidden states and attention weights based on the specified parameters.

Please note that the actual code implementation is not included in this docstring.
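
A hedged end-to-end sketch of passing pre-computed embeddings through a small encoder stack; the reduced layer count and tensor shapes are assumptions chosen for brevity:

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMEncoder

config = ErnieMConfig(num_hidden_layers=2)   # keep the stack small for the example
encoder = ErnieMEncoder(config)

input_embeds = ops.ones((2, 8, config.hidden_size), dtype=mindspore.float32)
out = encoder(input_embeds, output_hidden_states=True)
print(out.last_hidden_state.shape)   # (2, 8, hidden_size)
print(len(out.hidden_states))        # num_hidden_layers + 1 = 3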

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMEncoder(nn.Module):

    """
    ErnieMEncoder represents a multi-layer Transformer-based encoder model for processing sequences of input data.

    The ErnieMEncoder class inherits from nn.Module and implements a multi-layer Transformer-based encoder,
    with the ability to return hidden states and attention weights if specified.
    The class provides methods for initializing the model and processing input data through its layers.

    Attributes:
        config: A configuration object containing the model's hyperparameters.
        layers: A list of ErnieMEncoderLayer instances representing the individual layers of the encoder model.

    Methods:
        forward: Processes input embeddings through the encoder layers, optionally returning hidden states and
        attention weights based on the specified parameters.

    Please note that the actual code implementation is not included in this docstring.
    """
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMEncoder class.

        Args:
            self (ErnieMEncoder): The instance of the ErnieMEncoder class.
            config (object): The configuration object containing settings for the ErnieMEncoder.
                This parameter is required for configuring the ErnieMEncoder instance.
                It should be an object that provides necessary configuration details.
                It is expected to have attributes such as num_hidden_layers to specify the number of hidden layers.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([ErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

    def forward(
        self,
        input_embeds: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
        output_hidden_states: Optional[bool] = False,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
        """
        Constructs the ErnieMEncoder.

        Args:
            self: The instance of the class.
            input_embeds (mindspore.Tensor): The input embeddings. Shape (batch_size, sequence_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor]): The attention mask. Shape (batch_size, sequence_length).
            head_mask (Optional[mindspore.Tensor]): The head mask. Shape (num_layers, num_heads).
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key values.
                Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).
            output_attentions (Optional[bool]): Whether to output attention weights. Default is False.
            output_hidden_states (Optional[bool]): Whether to output hidden states. Default is False.
            return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
                The encoded last hidden state, optional hidden states, and optional attention weights.

        Raises:
            None.
        """
        hidden_states = () if output_hidden_states else None
        attentions = () if output_attentions else None

        output = input_embeds
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        for i, layer in enumerate(self.layers):
            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[i] if past_key_values is not None else None

            output, opt_attn_weights = layer(
                hidden_states=output,
                attention_mask=attention_mask,
                head_mask=layer_head_mask,
                past_key_value=past_key_value,
            )

            if output_hidden_states:
                hidden_states = hidden_states + (output,)
            if output_attentions:
                attentions = attentions + (opt_attn_weights,)

        last_hidden_state = output
        if not return_dict:
            return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)

        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=last_hidden_state, hidden_states=hidden_states, attentions=attentions
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder.__init__(config)

Initializes an instance of the ErnieMEncoder class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMEncoder class.

TYPE: ErnieMEncoder

config

The configuration object containing settings for the ErnieMEncoder. This parameter is required for configuring the ErnieMEncoder instance. It should be an object that provides necessary configuration details. It is expected to have attributes such as num_hidden_layers to specify the number of hidden layers.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMEncoder class.

    Args:
        self (ErnieMEncoder): The instance of the ErnieMEncoder class.
        config (object): The configuration object containing settings for the ErnieMEncoder.
            This parameter is required for configuring the ErnieMEncoder instance.
            It should be an object that provides necessary configuration details.
            It is expected to have attributes such as num_hidden_layers to specify the number of hidden layers.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.config = config
    self.layers = nn.ModuleList([ErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoder.forward(input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False, return_dict=True)

Constructs the ErnieMEncoder.

PARAMETER DESCRIPTION
self

The instance of the class.

input_embeds

The input embeddings. Shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask. Shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask. Shape (num_layers, num_heads).

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The past key values. Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Default is False.

TYPE: Optional[bool] DEFAULT: False

output_hidden_states

Whether to output hidden states. Default is False.

TYPE: Optional[bool] DEFAULT: False

return_dict

Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithPastAndCrossAttentions]

Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]: The encoded last hidden state, optional hidden states, and optional attention weights.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_embeds: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
    output_hidden_states: Optional[bool] = False,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
    """
    Constructs the ErnieMEncoder.

    Args:
        self: The instance of the class.
        input_embeds (mindspore.Tensor): The input embeddings. Shape (batch_size, sequence_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor]): The attention mask. Shape (batch_size, sequence_length).
        head_mask (Optional[mindspore.Tensor]): The head mask. Shape (num_layers, num_heads).
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key values.
            Shape (num_layers, 2, batch_size, num_heads, sequence_length // num_heads, hidden_size // num_heads).
        output_attentions (Optional[bool]): Whether to output attention weights. Default is False.
        output_hidden_states (Optional[bool]): Whether to output hidden states. Default is False.
        return_dict (Optional[bool]): Whether to return a BaseModelOutputWithPastAndCrossAttentions. Default is True.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
            The encoded last hidden state, optional hidden states, and optional attention weights.

    Raises:
        None.
    """
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None

    output = input_embeds
    if output_hidden_states:
        hidden_states = hidden_states + (output,)
    for i, layer in enumerate(self.layers):
        layer_head_mask = head_mask[i] if head_mask is not None else None
        past_key_value = past_key_values[i] if past_key_values is not None else None

        output, opt_attn_weights = layer(
            hidden_states=output,
            attention_mask=attention_mask,
            head_mask=layer_head_mask,
            past_key_value=past_key_value,
        )

        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        if output_attentions:
            attentions = attentions + (opt_attn_weights,)

    last_hidden_state = output
    if not return_dict:
        return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)

    return BaseModelOutputWithPastAndCrossAttentions(
        last_hidden_state=last_hidden_state, hidden_states=hidden_states, attentions=attentions
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer

Bases: Module

The ErnieMEncoderLayer class represents a single layer of the ErnieM (Enhanced Representation through kNowledge Integration) encoder, which is designed for natural language processing tasks. This class inherits from nn.Module and processes input hidden states with a multi-head self-attention mechanism followed by feed-forward layers, each with layer normalization and dropout.

ATTRIBUTE DESCRIPTION
self_attn

Instance of ErnieMAttention for multi-head self-attention mechanism.

linear1

Instance of nn.Linear for the first feedforward neural network layer.

dropout

Instance of nn.Dropout for applying dropout within the feedforward network.

linear2

Instance of nn.Linear for the second feedforward neural network layer.

norm1

Instance of nn.LayerNorm for the first layer normalization.

norm2

Instance of nn.LayerNorm for the second layer normalization.

dropout1

Instance of nn.Dropout for applying dropout after the first feedforward network layer.

dropout2

Instance of nn.Dropout for applying dropout after the second feedforward network layer.

activation

Activation function for the feedforward network.

METHOD DESCRIPTION
forward

Applies the multi-head self-attention mechanism and feedforward network layers to the input hidden states, optionally producing attention weights.

Args:

  • hidden_states (mindspore.Tensor): The input hidden states.
  • attention_mask (Optional[mindspore.Tensor]): Optional tensor for masking the attention scores.
  • head_mask (Optional[mindspore.Tensor]): Optional tensor for masking specific attention heads.
  • past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): Optional tuple containing past key and value tensors for fast decoding.
  • output_attentions (Optional[bool]): Optional boolean indicating whether to return attention weights.

Returns:

  • mindspore.Tensor or Tuple[mindspore.Tensor]: The processed hidden states and optionally the attention weights.
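
A single-layer usage sketch with a dummy input; with the default output_attentions=True the layer returns both the transformed hidden states and the attention weights (the shapes shown are assumptions):

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMEncoderLayer

config = ErnieMConfig()
layer = ErnieMEncoderLayer(config)
hidden = ops.ones((2, 8, config.hidden_size), dtype=mindspore.float32)

new_hidden, attn_weights = layer(hidden)
print(new_hidden.shape)     # (2, 8, hidden_size)
print(attn_weights.shape)   # (2, num_attention_heads, 8, 8)
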
Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMEncoderLayer(nn.Module):

    """
    The ErnieMEncoderLayer class represents a single layer of the ErnieM (Enhanced Representation through kNowledge 
    Integration) encoder, which is designed for natural language processing tasks. This class inherits from the nn.Module 
    class and implements the functionality for processing input hidden states using multi-head self-attention mechanism 
    and feedforward neural network layers with layer normalization and dropout.

    Attributes:
        self_attn: Instance of ErnieMAttention for multi-head self-attention mechanism.
        linear1: Instance of nn.Linear for the first feedforward neural network layer.
        dropout: Instance of nn.Dropout for applying dropout within the feedforward network.
        linear2: Instance of nn.Linear for the second feedforward neural network layer.
        norm1: Instance of nn.LayerNorm for the first layer normalization.
        norm2: Instance of nn.LayerNorm for the second layer normalization.
        dropout1: Instance of nn.Dropout for applying dropout after the first feedforward network layer.
        dropout2: Instance of nn.Dropout for applying dropout after the second feedforward network layer.
        activation: Activation function for the feedforward network.

    Methods:
        forward(self, hidden_states, attention_mask=None, head_mask=None, past_key_value=None, output_attentions=True):
            Applies the multi-head self-attention mechanism and feedforward network layers to the input hidden states, 
            optionally producing attention weights.

            Args:

            - hidden_states (mindspore.Tensor): The input hidden states.
            - attention_mask (Optional[mindspore.Tensor]): Optional tensor for masking the attention scores.
            - head_mask (Optional[mindspore.Tensor]): Optional tensor for masking specific attention heads.
            - past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            Optional tuple containing past key and value tensors for fast decoding.
            - output_attentions (Optional[bool]): Optional boolean indicating whether to return attention weights.

            Returns:

            - mindspore.Tensor or Tuple[mindspore.Tensor]: The processed hidden states and optionally the attention weights.
    """
    def __init__(self, config):
        """
        Initialize an instance of the ErnieMEncoderLayer class.

        Args:
            self (ErnieMEncoderLayer): The instance of the ErnieMEncoderLayer class.
            config (object): 
                An object containing configuration parameters for the encoder layer.

                - hidden_dropout_prob (float): The probability of dropout for hidden layers. Default is 0.1.
                - act_dropout (float): The probability of dropout for activation functions. 
                Default is the value of hidden_dropout_prob.
                - hidden_size (int): The size of the hidden layers.
                - intermediate_size (int): The size of the intermediate layers.
                - layer_norm_eps (float): The epsilon value for layer normalization.
                - hidden_act (str or function): The activation function to be used. 
                If a string, it will be converted to a function using ACT2FN dictionary.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        # to mimic paddlenlp implementation
        dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
        act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

        self.self_attn = ErnieMAttention(config)
        self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
        self.dropout = nn.Dropout(p=act_dropout)
        self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
        self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout1 = nn.Dropout(p=dropout)
        self.dropout2 = nn.Dropout(p=dropout)
        if isinstance(config.hidden_act, str):
            self.activation = ACT2FN[config.hidden_act]
        else:
            self.activation = config.hidden_act

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = True,
    ):
        """
        Constructs the forward pass of the ErnieMEncoderLayer.

        This method applies self-attention and the feed-forward network to the input hidden states.

        Args:
            self: An instance of the ErnieMEncoderLayer class.
            hidden_states (mindspore.Tensor): The input hidden states. This should be a tensor.
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor. Defaults to None.
            head_mask (Optional[mindspore.Tensor]): The head mask tensor. Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key value tensor. Defaults to None.
            output_attentions (Optional[bool]): Whether to output attention weights. Defaults to True.

        Returns:
            mindspore.Tensor or Tuple[mindspore.Tensor, mindspore.Tensor]: The transformed hidden states, plus the
            attention weights when `output_attentions` is True.

        Raises:
            None.
        """
        residual = hidden_states
        if output_attentions:
            hidden_states, attention_opt_weights = self.self_attn(
                hidden_states=hidden_states,
                attention_mask=attention_mask,
                head_mask=head_mask,
                past_key_value=past_key_value,
                output_attentions=output_attentions,
            )

        else:
            hidden_states = self.self_attn(
                hidden_states=hidden_states,
                attention_mask=attention_mask,
                head_mask=head_mask,
                past_key_value=past_key_value,
                output_attentions=output_attentions,
            )
        hidden_states = residual + self.dropout1(hidden_states)
        hidden_states = self.norm1(hidden_states)
        residual = hidden_states

        hidden_states = self.linear1(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.linear2(hidden_states)
        hidden_states = residual + self.dropout2(hidden_states)
        hidden_states = self.norm2(hidden_states)

        if output_attentions:
            return hidden_states, attention_opt_weights
        return hidden_states
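As a usage illustration (not part of the library source), the sketch below builds a deliberately tiny ErnieMConfig, instantiates a single ErnieMEncoderLayer, and runs it on dummy hidden states; the configuration values and tensor shapes are arbitrary choices for the example, and it assumes mindspore and mindnlp are installed so the imports on this page resolve.

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMEncoderLayer

# tiny, arbitrary configuration so the layer is cheap to build
config = ErnieMConfig(hidden_size=64, num_attention_heads=4, intermediate_size=128)
layer = ErnieMEncoderLayer(config)

# dummy input of shape (batch_size, seq_len, hidden_size)
hidden_states = ops.ones((2, 16, config.hidden_size), mindspore.float32)

# with output_attentions=True (the default) the layer returns
# (hidden_states, attention_weights); with False it returns only the hidden states
out, attn_weights = layer(hidden_states, output_attentions=True)
print(out.shape)  # (2, 16, 64)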

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer.__init__(config)

Initialize an instance of the ErnieMEncoderLayer class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMEncoderLayer class.

TYPE: ErnieMEncoderLayer

config

An object containing configuration parameters for the encoder layer.

  • hidden_dropout_prob (float): The probability of dropout for hidden layers. Default is 0.1.
  • act_dropout (float): The probability of dropout for activation functions. Default is the value of hidden_dropout_prob.
  • hidden_size (int): The size of the hidden layers.
  • intermediate_size (int): The size of the intermediate layers.
  • layer_norm_eps (float): The epsilon value for layer normalization.
  • hidden_act (str or function): The activation function to be used. If a string, it will be converted to a function using ACT2FN dictionary.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 501-541
def __init__(self, config):
    """
    Initialize an instance of the ErnieMEncoderLayer class.

    Args:
        self (ErnieMEncoderLayer): The instance of the ErnieMEncoderLayer class.
        config (object): 
            An object containing configuration parameters for the encoder layer.

            - hidden_dropout_prob (float): The probability of dropout for hidden layers. Default is 0.1.
            - act_dropout (float): The probability of dropout for activation functions. 
            Default is the value of hidden_dropout_prob.
            - hidden_size (int): The size of the hidden layers.
            - intermediate_size (int): The size of the intermediate layers.
            - layer_norm_eps (float): The epsilon value for layer normalization.
            - hidden_act (str or function): The activation function to be used. 
            If a string, it will be converted to a function using ACT2FN dictionary.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    # to mimic paddlenlp implementation
    dropout = 0.1 if config.hidden_dropout_prob is None else config.hidden_dropout_prob
    act_dropout = config.hidden_dropout_prob if config.act_dropout is None else config.act_dropout

    self.self_attn = ErnieMAttention(config)
    self.linear1 = nn.Linear(config.hidden_size, config.intermediate_size)
    self.dropout = nn.Dropout(p=act_dropout)
    self.linear2 = nn.Linear(config.intermediate_size, config.hidden_size)
    self.norm1 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.norm2 = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout1 = nn.Dropout(p=dropout)
    self.dropout2 = nn.Dropout(p=dropout)
    if isinstance(config.hidden_act, str):
        self.activation = ACT2FN[config.hidden_act]
    else:
        self.activation = config.hidden_act

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMEncoderLayer.forward(hidden_states, attention_mask=None, head_mask=None, past_key_value=None, output_attentions=True)

Constructs the forward pass of the ErnieMEncoderLayer.

This method applies self-attention and the feed-forward network to the input hidden states.

PARAMETER DESCRIPTION
self

An instance of the ErnieMEncoderLayer class.

hidden_states

The input hidden states. This should be a tensor.

TYPE: Tensor

attention_mask

The attention mask tensor. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

The past key value tensor. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Defaults to True.

TYPE: Optional[bool] DEFAULT: True

RETURNS DESCRIPTION

mindspore.Tensor or Tuple[mindspore.Tensor, mindspore.Tensor]: The transformed hidden states, plus the attention weights when output_attentions is True.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 543-601
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = True,
):
    """
    Constructs the forward pass of the ErnieMEncoderLayer.

    This method applies self-attention and the feed-forward network to the input hidden states.

    Args:
        self: An instance of the ErnieMEncoderLayer class.
        hidden_states (mindspore.Tensor): The input hidden states. This should be a tensor.
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor. Defaults to None.
        head_mask (Optional[mindspore.Tensor]): The head mask tensor. Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]): The past key value tensor. Defaults to None.
        output_attentions (Optional[bool]): Whether to output attention weights. Defaults to True.

    Returns:
        mindspore.Tensor or Tuple[mindspore.Tensor, mindspore.Tensor]: The transformed hidden states, plus the
        attention weights when `output_attentions` is True.

    Raises:
        None.
    """
    residual = hidden_states
    if output_attentions:
        hidden_states, attention_opt_weights = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            head_mask=head_mask,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
        )

    else:
        hidden_states = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            head_mask=head_mask,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
        )
    hidden_states = residual + self.dropout1(hidden_states)
    hidden_states = self.norm1(hidden_states)
    residual = hidden_states

    hidden_states = self.linear1(hidden_states)
    hidden_states = self.activation(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.linear2(hidden_states)
    hidden_states = residual + self.dropout2(hidden_states)
    hidden_states = self.norm2(hidden_states)

    if output_attentions:
        return hidden_states, attention_opt_weights
    return hidden_states

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction

Bases: ErnieMPreTrainedModel

ErnieMForInformationExtraction is a class that represents an ErnieM model for information extraction tasks. It inherits from ErnieMPreTrainedModel and includes methods for initializing the model and running the forward pass.

ATTRIBUTE DESCRIPTION
ernie_m

The ErnieM model used for information extraction.

TYPE: ErnieMModel

linear_start

Linear layer for predicting the start position in the input sequence.

TYPE: Linear

linear_end

Linear layer for predicting the end position in the input sequence.

TYPE: Linear

sigmoid

Sigmoid activation function for probability calculation.

TYPE: Sigmoid

METHOD DESCRIPTION
__init__

Initializes the ErnieMForInformationExtraction class with the provided configuration.

forward

Constructs the forward pass of the model for information extraction tasks.

PARAMETER DESCRIPTION
input_ids

Input tensor containing token ids.

TYPE: Tensor

attention_mask

Tensor specifying which tokens should be attended to.

TYPE: Tensor

position_ids

Tensor specifying the position ids of tokens.

TYPE: Tensor

head_mask

Tensor for masking specific heads in the self-attention layers.

TYPE: Tensor

inputs_embeds

Tensor for providing custom embeddings instead of token ids.

TYPE: Tensor

start_positions

Labels for start positions in the input sequence.

TYPE: Tensor

end_positions

Labels for end positions in the input sequence.

TYPE: Tensor

output_attentions

Flag to output attention weights.

TYPE: bool

output_hidden_states

Flag to output hidden states.

TYPE: bool

return_dict

Flag to return outputs as a dictionary.

TYPE: bool

RETURNS DESCRIPTION

Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]: Tuple of output tensors or a QuestionAnsweringModelOutput object.

RAISES DESCRIPTION
ValueError

If start_positions or end_positions are not of the expected shape.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1483-1612
class ErnieMForInformationExtraction(ErnieMPreTrainedModel):

    """
    ErnieMForInformationExtraction is a class that represents an ErnieM model for information extraction tasks. 
    It inherits from ErnieMPreTrainedModel and includes methods for initializing the model and running the forward pass.

    Attributes:
        ernie_m (ErnieMModel): The ErnieM model used for information extraction.
        linear_start (nn.Linear): Linear layer for predicting the start position in the input sequence.
        linear_end (nn.Linear): Linear layer for predicting the end position in the input sequence.
        sigmoid (nn.Sigmoid): Sigmoid activation function for probability calculation.

    Methods:
        __init__: Initializes the ErnieMForInformationExtraction class with the provided configuration.
        forward: Constructs the forward pass of the model for information extraction tasks.

    Args:
        input_ids (mindspore.Tensor): Input tensor containing token ids.
        attention_mask (mindspore.Tensor): Tensor specifying which tokens should be attended to.
        position_ids (mindspore.Tensor): Tensor specifying the position ids of tokens.
        head_mask (mindspore.Tensor): Tensor for masking specific heads in the self-attention layers.
        inputs_embeds (mindspore.Tensor): Tensor for providing custom embeddings instead of token ids.
        start_positions (mindspore.Tensor): Labels for start positions in the input sequence.
        end_positions (mindspore.Tensor): Labels for end positions in the input sequence.
        output_attentions (bool): Flag to output attention weights.
        output_hidden_states (bool): Flag to output hidden states.
        return_dict (bool): Flag to return outputs as a dictionary.

    Returns:
        Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]: Tuple of output tensors or a QuestionAnsweringModelOutput object.

    Raises:
        ValueError: If start_positions or end_positions are not of the expected shape.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the ErnieMForInformationExtraction class.

        Args:
            self: The instance of the class.
            config: An instance of the ErnieMConfig class containing the configuration parameters for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.ernie_m = ErnieMModel(config)
        self.linear_start = nn.Linear(config.hidden_size, 1)
        self.linear_end = nn.Linear(config.hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        if return_dict:
            sequence_output = result.last_hidden_state
        else:
            sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, the positions may carry an extra dimension; squeeze it
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.binary_cross_entropy(start_prob, start_positions)
            end_loss = ops.binary_cross_entropy(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            return tuple(
                i
                for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
                if i is not None
            )

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_prob,
            end_logits=end_prob,
            hidden_states=result.hidden_states,
            attentions=result.attentions,
        )
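A minimal, illustrative sketch of how this head is typically called (not taken from the library docs): it builds a small, arbitrary configuration, feeds dummy token ids, and reads back the per-token start/end probabilities produced by the sigmoid layers shown above.

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMForInformationExtraction

# small, arbitrary configuration for a quick smoke test
config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                      num_attention_heads=4, intermediate_size=128)
model = ErnieMForInformationExtraction(config)

input_ids = ops.ones((1, 16), mindspore.int64)  # (batch_size, seq_len) dummy ids
outputs = model(input_ids, return_dict=True)

# start_logits / end_logits already hold sigmoid probabilities per token here,
# as returned by the forward pass above
print(outputs.start_logits.shape, outputs.end_logits.shape)  # (1, 16) (1, 16)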

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction.__init__(config)

Initializes a new instance of the ErnieMForInformationExtraction class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the ErnieMConfig class containing the configuration parameters for the model.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1518-1537
def __init__(self, config):
    """
    Initializes a new instance of the ErnieMForInformationExtraction class.

    Args:
        self: The instance of the class.
        config: An instance of the ErnieMConfig class containing the configuration parameters for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.ernie_m = ErnieMModel(config)
    self.linear_start = nn.Linear(config.hidden_size, 1)
    self.linear_end = nn.Linear(config.hidden_size, 1)
    self.sigmoid = nn.Sigmoid()
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForInformationExtraction.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) for computing the start_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for position (index) for computing the end_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1539-1612
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the start_positions loss. Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the end_positions loss. Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    if return_dict:
        sequence_output = result.last_hidden_state
    else:
        sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, the positions may carry an extra dimension; squeeze it
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.binary_cross_entropy(start_prob, start_positions)
        end_loss = ops.binary_cross_entropy(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        return tuple(
            i
            for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
            if i is not None
        )

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_prob,
        end_logits=end_prob,
        hidden_states=result.hidden_states,
        attentions=result.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice

Bases: ErnieMPreTrainedModel

ErnieMForMultipleChoice is a class that represents a multiple choice question answering model based on the ERNIE-M architecture. It inherits from ErnieMPreTrainedModel and implements methods for running the forward pass and computing the multiple choice classification loss.

ATTRIBUTE DESCRIPTION
ernie_m

The ERNIE-M model used for processing inputs.

TYPE: ErnieMModel

dropout

Dropout layer used in the classifier.

TYPE: Dropout

classifier

Dense layer for classification.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the ErnieMForMultipleChoice model with the given configuration.

forward

Constructs the model for multiple choice question answering and computes the classification loss.

The forward method takes various input tensors and parameters, processes them through the ERNIE-M model, applies dropout, and computes the classification logits. If labels are provided, it calculates the cross-entropy loss. The method returns the loss and model outputs based on the return_dict parameter.

This class is designed to be used for multiple choice question answering tasks with ERNIE-M models.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1123-1235
class ErnieMForMultipleChoice(ErnieMPreTrainedModel):

    """
    ErnieMForMultipleChoice is a class that represents a multiple choice question answering model based on the
    ERNIE-M architecture.
    It inherits from ErnieMPreTrainedModel and implements methods for running the forward pass and computing the
    multiple choice classification loss.

    Attributes:
        ernie_m (ErnieMModel): The ERNIE-M model used for processing inputs.
        dropout (nn.Dropout): Dropout layer used in the classifier.
        classifier (nn.Linear): Dense layer for classification.

    Methods:
        __init__: Initializes the ErnieMForMultipleChoice model with the given configuration.
        forward: Constructs the model for multiple choice question answering and computes the classification loss.

    The forward method takes various input tensors and parameters, processes them through the ERNIE-M model,
    applies dropout, and computes the classification logits.
    If labels are provided, it calculates the cross-entropy loss. The method returns the loss and model outputs based on
    the return_dict parameter.

    This class is designed to be used for multiple choice question answering tasks with ERNIE-M models.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForMultipleChoice.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForMultipleChoice class.

        Args:
            self: The object instance.
            config: An instance of the ErnieMConfig class containing the model configuration.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        self.ernie_m = ErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], MultipleChoiceModelOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
                `input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
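The sketch below (illustrative only, with an arbitrary small configuration) shows the shape convention the multiple-choice head relies on: inputs come in as (batch_size, num_choices, seq_len), are flattened internally, and the logits are reshaped back to (batch_size, num_choices) before the cross-entropy loss is computed against the index of the correct choice.

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMForMultipleChoice

# small, arbitrary configuration for illustration
config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                      num_attention_heads=4, intermediate_size=128)
model = ErnieMForMultipleChoice(config)

batch_size, num_choices, seq_len = 2, 4, 16
input_ids = ops.ones((batch_size, num_choices, seq_len), mindspore.int64)
labels = mindspore.Tensor([0, 3], mindspore.int32)  # index of the correct choice per example

outputs = model(input_ids, labels=labels, return_dict=True)
print(outputs.logits.shape)  # (2, 4): one score per choice
print(outputs.loss)          # scalar cross-entropy loss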

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice.__init__(config)

Initializes an instance of the ErnieMForMultipleChoice class.

PARAMETER DESCRIPTION
self

The object instance.

config

An instance of the ErnieMConfig class containing the model configuration.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1148-1172
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForMultipleChoice class.

    Args:
        self: The object instance.
        config: An instance of the ErnieMConfig class containing the model configuration.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    self.ernie_m = ErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, 1)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForMultipleChoice.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1174-1235
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], MultipleChoiceModelOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering

Bases: ErnieMPreTrainedModel

ErnieMForQuestionAnswering is a class that represents a fine-tuned ErnieM model for question answering tasks. It is a subclass of ErnieMPreTrainedModel.

This class extends the functionality of the base ErnieM model by adding a question answering head on top of it. It takes as input the configuration of the model and initializes the necessary components. The class provides a method called 'forward' which performs the forward pass of the model for question answering.

The 'forward' method takes several input tensors such as 'input_ids', 'attention_mask', 'position_ids', 'head_mask', and 'inputs_embeds'. It also supports optional inputs like 'start_positions', 'end_positions', 'output_attentions', 'output_hidden_states', and 'return_dict'. The method returns the question answering model output, which includes the start and end logits, hidden states, attentions, and an optional total loss.

The 'forward' method internally calls the 'ernie_m' method of the base ErnieM model to obtain the sequence output. It then passes the sequence output through a dense layer 'qa_outputs' to get the logits for the start and end positions. The logits are then processed to obtain the final start and end logits. If 'start_positions' and 'end_positions' are provided, the method calculates the cross-entropy loss for the predicted logits and the provided positions. The total loss is computed as the average of the start and end losses.

The 'forward' method returns the model output in a structured manner based on the 'return_dict' parameter.

  • If 'return_dict' is False, the method returns a tuple containing the total loss, start logits, end logits, and any additional hidden states or attentions.
  • If 'return_dict' is True, the method returns an instance of the 'QuestionAnsweringModelOutput' class, which encapsulates the output elements as attributes.
Note
  • If 'start_positions' and 'end_positions' are not provided, the total loss will be None.
  • The start and end positions are clamped to the length of the sequence and positions outside the sequence are ignored for computing the loss.
Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1351-1479
class ErnieMForQuestionAnswering(ErnieMPreTrainedModel):

    """
    ErnieMForQuestionAnswering is a class that represents a fine-tuned ErnieM model for question answering tasks.
    It is a subclass of ErnieMPreTrainedModel.

    This class extends the functionality of the base ErnieM model by adding a question answering head on top of it.
    It takes as input the configuration of the model and initializes the necessary components.
    The class provides a method called 'forward' which performs the forward pass of the model for question answering.

    The 'forward' method takes several input tensors such as 'input_ids', 'attention_mask', 'position_ids',
    'head_mask', and 'inputs_embeds'. It also supports optional inputs like 'start_positions', 'end_positions',
    'output_attentions', 'output_hidden_states', and 'return_dict'.
    The method returns the question answering model output, which includes the start and end logits, hidden states,
    attentions, and an optional total loss.

    The 'forward' method internally calls the 'ernie_m' method of the base ErnieM model to obtain the sequence output.
    It then passes the sequence output through a dense layer 'qa_outputs' to get the logits for the start and end
    positions. The logits are then processed to obtain the final start and end logits. If 'start_positions' and
    'end_positions' are provided, the method calculates the cross-entropy loss for the predicted logits and the provided
    positions. The total loss is computed as the average of the start and end losses.

    The 'forward' method returns the model output in a structured manner based on the 'return_dict' parameter.

    - If 'return_dict' is False, the method returns a tuple containing the total loss, start logits, end logits, and any
    additional hidden states or attentions.
    - If 'return_dict' is True, the method returns an instance of the 'QuestionAnsweringModelOutput' class, which
    encapsulates the output elements as attributes.

    Note:
        - If 'start_positions' and 'end_positions' are not provided, the total loss will be None.
        - The start and end positions are clamped to the length of the sequence and positions outside the sequence are
        ignored for computing the loss.

    """
    # Copied from transformers.models.bert.modeling_bert.BertForQuestionAnswering.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """Initializes a new instance of the ErnieMForQuestionAnswering class.

        Args:
            self: The object itself.
            config: An instance of the ErnieMConfig class containing the model configuration.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, the positions may carry an extra dimension; squeeze it
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = ops.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
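For reference, a hedged usage sketch (arbitrary small configuration, dummy inputs): the span head produces one start and one end logit per token, and when start/end labels are supplied the forward pass clamps out-of-range positions to ignored_index so they do not contribute to the averaged cross-entropy loss.

import mindspore
from mindspore import ops
from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMForQuestionAnswering

# small, arbitrary configuration; num_labels=2 gives the start/end logit pair
config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                      num_attention_heads=4, intermediate_size=128, num_labels=2)
model = ErnieMForQuestionAnswering(config)

input_ids = ops.ones((1, 16), mindspore.int64)
start_positions = mindspore.Tensor([3], mindspore.int32)  # gold span start index
end_positions = mindspore.Tensor([7], mindspore.int32)    # gold span end index

outputs = model(input_ids, start_positions=start_positions,
                end_positions=end_positions, return_dict=True)
print(outputs.start_logits.shape, outputs.end_logits.shape)  # (1, 16) (1, 16)
print(outputs.loss)  # average of start and end cross-entropy losses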

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering.__init__(config)

Initializes a new instance of the ErnieMForQuestionAnswering class.

PARAMETER DESCRIPTION
self

The object itself.

config

An instance of the ErnieMConfig class containing the model configuration.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1387-1407
def __init__(self, config):
    """Initializes a new instance of the ErnieMForQuestionAnswering class.

    Args:
        self: The object itself.
        config: An instance of the ErnieMConfig class containing the model configuration.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForQuestionAnswering.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 1409-1479
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    logits = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, the positions may carry an extra dimension; squeeze it
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = ops.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + outputs[2:]
        return ((total_loss,) + output) if total_loss is not None else output

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification

Bases: ErnieMPreTrainedModel

ErnieMForSequenceClassification is a class that represents a fine-tuned ErnieM model for sequence classification tasks. It inherits from ErnieMPreTrainedModel and implements methods for initializing the model and producing predictions.

ATTRIBUTE DESCRIPTION
num_labels

Number of labels for sequence classification.

config

Configuration object for the model.

ernie_m

ErnieMModel instance for processing input sequences.

dropout

Dropout layer for regularization.

classifier

Dense layer for classification predictions.

METHOD DESCRIPTION
__init__

Initializes the ErnieMForSequenceClassification instance with the provided configuration.

forward

Constructs the model for making predictions on input sequences and computes the loss based on predicted labels.

Args:

  • input_ids (Optional[mindspore.Tensor]): Tensor of input token IDs.
  • attention_mask (Optional[mindspore.Tensor]): Tensor of attention masks.
  • position_ids (Optional[mindspore.Tensor]): Tensor of position IDs.
  • head_mask (Optional[mindspore.Tensor]): Tensor of head masks.
  • inputs_embeds (Optional[mindspore.Tensor]): Tensor of input embeddings.
  • past_key_values (Optional[List[mindspore.Tensor]]): List of past key values for caching.
  • use_cache (Optional[bool]): Flag for using caching.
  • output_hidden_states (Optional[bool]): Flag for outputting hidden states.
  • output_attentions (Optional[bool]): Flag for outputting attentions.
  • return_dict (Optional[bool]): Flag for returning output in a dictionary format.
  • labels (Optional[mindspore.Tensor]): Tensor of target labels for computing loss.

Returns:

  • Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]: Tuple of model outputs and loss.

Raises:

  • ValueError: If the provided labels are not in the expected format or number.
Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
Lines 976-1120
class ErnieMForSequenceClassification(ErnieMPreTrainedModel):

    """
    ErnieMForSequenceClassification is a class that represents a fine-tuned ErnieM model for sequence classification tasks.
    It inherits from ErnieMPreTrainedModel and implements methods for initializing the model and producing predictions.

    Attributes:
        num_labels: Number of labels for sequence classification.
        config: Configuration object for the model.
        ernie_m: ErnieMModel instance for processing input sequences.
        dropout: Dropout layer for regularization.
        classifier: Dense layer for classification predictions.

    Methods:
        __init__: Initializes the ErnieMForSequenceClassification instance with the provided configuration.
        forward:
            Constructs the model for making predictions on input sequences and computes the loss based on predicted labels.

            Args:

            - input_ids (Optional[mindspore.Tensor]): Tensor of input token IDs.
            - attention_mask (Optional[mindspore.Tensor]): Tensor of attention masks.
            - position_ids (Optional[mindspore.Tensor]): Tensor of position IDs.
            - head_mask (Optional[mindspore.Tensor]): Tensor of head masks.
            - inputs_embeds (Optional[mindspore.Tensor]): Tensor of input embeddings.
            - past_key_values (Optional[List[mindspore.Tensor]]): List of past key values for caching.
            - use_cache (Optional[bool]): Flag for using caching.
            - output_hidden_states (Optional[bool]): Flag for outputting hidden states.
            - output_attentions (Optional[bool]): Flag for outputting attentions.
            - return_dict (Optional[bool]): Flag for returning output in a dictionary format.
            - labels (Optional[mindspore.Tensor]): Tensor of target labels for computing loss.

            Returns:

            - Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]: Tuple of model outputs and loss.

            Raises:

            - ValueError: If the provided labels are not in the expected format or number.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForSequenceClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForSequenceClassification class.

        Args:
            self: The instance of the class.
            config (object): The configuration object containing settings for the model initialization.
                It must have the following attributes:

                - num_labels (int): The number of labels for classification.
                - classifier_dropout (float, optional): The dropout probability for the classifier layer.
                If not provided, it defaults to the hidden dropout probability.
                - hidden_dropout_prob (float): The default hidden dropout probability.

        Returns:
            None.

        Raises:
            ValueError: If the config object is missing the num_labels attribute.
            TypeError: If the config object does not have the expected attributes or if their types are incorrect.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.ernie_m = ErnieMModel(config)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = True,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
                `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_hidden_states=output_hidden_states,
            output_attentions=output_attentions,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)
        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification.__init__(config)

Initializes an instance of the ErnieMForSequenceClassification class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing settings for the model initialization. It must have the following attributes:

  • num_labels (int): The number of labels for classification.
  • classifier_dropout (float, optional): The dropout probability for the classifier layer. If not provided, it defaults to the hidden dropout probability.
  • hidden_dropout_prob (float): The default hidden dropout probability.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the config object is missing the num_labels attribute.

TypeError

If the config object does not have the expected attributes or if their types are incorrect.
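
A small sketch of the classifier_dropout fallback described above (the values shown are the documented defaults):

>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> config = ErnieMConfig()                      # classifier_dropout defaults to None
>>> config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
0.1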

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForSequenceClassification class.

    Args:
        self: The instance of the class.
        config (object): The configuration object containing settings for the model initialization.
            It must have the following attributes:

            - num_labels (int): The number of labels for classification.
            - classifier_dropout (float, optional): The dropout probability for the classifier layer.
            If not provided, it defaults to the hidden dropout probability.
            - hidden_dropout_prob (float): The default hidden dropout probability.

    Returns:
        None.

    Raises:
        ValueError: If the config object is missing the num_labels attribute.
        TypeError: If the config object does not have the expected attributes or if their types are incorrect.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.config = config

    self.ernie_m = ErnieMModel(config)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForSequenceClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None, return_dict=True, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-squared error); if config.num_labels > 1, a classification loss is computed (cross-entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
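
A minimal usage sketch for computing the loss described above. The shrunken config values and random inputs are only for illustration; the import paths are the module paths shown on this page:

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMForSequenceClassification
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128, num_labels=3)
>>> model = ErnieMForSequenceClassification(config)          # randomly initialized
>>> input_ids = mindspore.Tensor(np.random.randint(1, 1000, (2, 8)), mindspore.int64)
>>> labels = mindspore.Tensor(np.array([0, 2]), mindspore.int64)
>>> outputs = model.forward(input_ids=input_ids, labels=labels)
>>> outputs.logits.shape        # (2, 3): one score per label
>>> outputs.loss                # cross-entropy, since num_labels > 1 and labels are integers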

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = True,
    labels: Optional[mindspore.Tensor] = None,
) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_hidden_states=output_hidden_states,
        output_attentions=output_attentions,
        return_dict=return_dict,
    )

    pooled_output = outputs[1]

    pooled_output = self.dropout(pooled_output)
    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)
    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification

Bases: ErnieMPreTrainedModel

This class represents a fine-tuned ErnieM model for token classification tasks. It inherits from the ErnieMPreTrainedModel class.

The ErnieMForTokenClassification class implements the necessary methods and attributes for token classification tasks. It takes a configuration object as input during initialization and sets up the model architecture accordingly. The model consists of an ErnieMModel instance, a dropout layer, and a classifier layer.

METHOD DESCRIPTION
__init__

Initializes the ErnieMForTokenClassification instance with the given configuration. It sets the number of labels, creates an ErnieMModel object, initializes the dropout layer, and creates the classifier layer.

forward

Constructs the forward pass of the model. It takes various input tensors and returns the token classification output. Optionally, it can also compute the token classification loss if labels are provided.

ATTRIBUTE DESCRIPTION
num_labels

The number of possible labels for the token classification task.

Example
>>> config = ErnieMConfig()
>>> model = ErnieMForTokenClassification(config)
>>> input_ids = ...
>>> attention_mask = ...
>>> output = model.forward(input_ids=input_ids, attention_mask=attention_mask)
Note

It is important to provide the input tensors in the correct shape and format to ensure proper model functioning.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMForTokenClassification(ErnieMPreTrainedModel):

    """
    This class represents a fine-tuned ErnieM model for token classification tasks. It inherits from the ErnieMPreTrainedModel class.

    The ErnieMForTokenClassification class implements the necessary methods and attributes for token classification tasks.
    It takes a configuration object as input during initialization and sets up the model architecture accordingly.
    The model consists of an ErnieMModel instance, a dropout layer, and a classifier layer.

    Methods:
        __init__: Initializes the ErnieMForTokenClassification instance with the given configuration.
            It sets the number of labels, creates an ErnieMModel object, initializes the dropout layer, and
            creates the classifier layer.

        forward: Constructs the forward pass of the model. It takes various input tensors and returns the token
            classification output. Optionally, it can also compute the token classification loss if labels are provided.

    Attributes:
        num_labels: The number of possible labels for the token classification task.

    Example:
        ```python
        >>> config = ErnieMConfig()
        >>> model = ErnieMForTokenClassification(config)
        >>> input_ids = ...
        >>> attention_mask = ...
        >>> output = model.forward(input_ids=input_ids, attention_mask=attention_mask)
        ```

    Note:
        It is important to provide the input tensors in the correct shape and format to ensure proper model functioning.
    """
    # Copied from transformers.models.bert.modeling_bert.BertForTokenClassification.__init__ with Bert->ErnieM,bert->ernie_m
    def __init__(self, config):
        """
        Initializes an instance of the ErnieMForTokenClassification class.

        Args:
            self: The instance of the ErnieMForTokenClassification class.
            config: An instance of the configuration class containing the model configuration settings.

        Returns:
            None

        Raises:
            None.
        """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
        classifier_dropout = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(p=classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = True,
        labels: Optional[mindspore.Tensor] = None,
    ) -> Union[Tuple[mindspore.Tensor], TokenClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.ernie_m(
            input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification.__init__(config)

Initializes an instance of the ErnieMForTokenClassification class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMForTokenClassification class.

config

An instance of the configuration class containing the model configuration settings.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the ErnieMForTokenClassification class.

    Args:
        self: The instance of the ErnieMForTokenClassification class.
        config: An instance of the configuration class containing the model configuration settings.

    Returns:
        None

    Raises:
        None.
    """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.ernie_m = ErnieMModel(config, add_pooling_layer=False)
    classifier_dropout = (
        config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(p=classifier_dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMForTokenClassification.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, past_key_values=None, output_hidden_states=None, output_attentions=None, return_dict=True, labels=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
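
A minimal usage sketch for the per-token loss (random inputs and a shrunken, hypothetical config for illustration):

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMForTokenClassification
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128, num_labels=5)
>>> model = ErnieMForTokenClassification(config)
>>> input_ids = mindspore.Tensor(np.random.randint(1, 1000, (2, 8)), mindspore.int64)
>>> labels = mindspore.Tensor(np.random.randint(0, 5, (2, 8)), mindspore.int64)
>>> outputs = model.forward(input_ids=input_ids, labels=labels)
>>> outputs.logits.shape        # (2, 8, 5): one score per token per label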

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = True,
    labels: Optional[mindspore.Tensor] = None,
) -> Union[Tuple[mindspore.Tensor], TokenClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.ernie_m(
        input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[2:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel

Bases: ErnieMPreTrainedModel

This class represents an ERNIE-M (Enhanced Representation through kNowledge Integration) encoder for pre-training and fine-tuning on downstream tasks. It combines the ERNIE-M embeddings, encoder, and an optional pooling layer, and provides methods for initializing the model, getting and setting the input embeddings, pruning attention heads, and running the forward pass with various input and output options. The class inherits from ErnieMPreTrainedModel and extends it with the ERNIE-M-specific architecture and operations.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMModel(ErnieMPreTrainedModel):

    """
    This class represents an ERNIE-M (Enhanced Representation through kNowledge Integration) model for multi-purpose
    pre-training and fine-tuning on downstream tasks. It incorporates ERNIE-M embeddings, encoder, and optional pooling
    layer. The class provides methods for initializing, getting and setting input embeddings, pruning model heads,
    and forwarding the model with various input and output options.
    The class inherits from ErnieMPreTrainedModel and extends its functionality to support specific ERNIE-M model
    architecture and operations.
    """
    def __init__(self, config, add_pooling_layer=True):
        """
        Initializes the ErnieMModel.

        Args:
            self: The instance of the class.
            config (object): The configuration object containing model settings.
            add_pooling_layer (bool): A flag indicating whether to add a pooling layer to the model.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.initializer_range = config.initializer_range
        self.embeddings = ErnieMEmbeddings(config)
        self.encoder = ErnieMEncoder(config)
        self.pooler = ErnieMPooler(config) if add_pooling_layer else None
        self.post_init()

    def get_input_embeddings(self):
        """
        This method returns the input embeddings from the ErnieMModel.

        Args:
            self: ErnieMModel object. The instance of the ErnieMModel class.

        Returns:
            word_embeddings: The method returns the input embeddings from the ErnieMModel.

        Raises:
            None.
        """
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        """
        Set the input embeddings for the ErnieMModel.

        Args:
            self (ErnieMModel): The instance of the ErnieMModel class.
            value: The input embeddings value to be set. It should be a tensor representing the input embeddings.

        Returns:
            None.

        Raises:
            None.
        """
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layers[layer].self_attn.prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
        """
        Constructs the ERNIE-M model.

        Args:
            self: The object instance.
            input_ids (Optional[mindspore.Tensor]): The input tensor of token indices. Default is None.
            position_ids (Optional[mindspore.Tensor]): The tensor indicating the position of tokens. Default is None.
            attention_mask (Optional[mindspore.Tensor]):
                The tensor indicating which elements in the input do not need to be attended to. Default is None.
            head_mask (Optional[mindspore.Tensor]):
                The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]): The input embeddings. Default is None.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The previous key values. Default is None.
            use_cache (Optional[bool]): Whether to use the cache. Default is None.
            output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
            output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
            return_dict (Optional[bool]): Whether to return a dictionary. Default is None.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
                Depending on the value of `return_dict`, returns a tuple of tensors including the last hidden state and
                the pooler output, or a BaseModelOutputWithPoolingAndCrossAttentions object.

        Raises:
            ValueError: If both `input_ids` and `inputs_embeds` are specified.
        """
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

        # init the default bool value
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.return_dict

        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        past_key_values_length = 0
        if past_key_values is not None:
            past_key_values_length = past_key_values[0][0].shape[2]

        # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
        if attention_mask is None:
            attention_mask = (input_ids == 0).to(self.dtype)
            attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)
            if past_key_values is not None:
                batch_size = past_key_values[0][0].shape[0]
                past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
                attention_mask = ops.concat([past_mask, attention_mask], axis=-1)
        # For 2D attention_mask from tokenizer
        elif attention_mask.ndim == 2:
            attention_mask = attention_mask.to(self.dtype)
            attention_mask = 1.0 - attention_mask
            attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)

        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

        embedding_output = self.embeddings(
            input_ids=input_ids,
            position_ids=position_ids,
            inputs_embeds=inputs_embeds,
            past_key_values_length=past_key_values_length,
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            past_key_values=past_key_values,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        if not return_dict:
            sequence_output = encoder_outputs[0]
            pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
            return (sequence_output, pooler_output) + encoder_outputs[1:]

        sequence_output = encoder_outputs["last_hidden_state"]
        pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
        hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
        attentions = None if not output_attentions else encoder_outputs["attentions"]

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooler_output,
            hidden_states=hidden_states,
            attentions=attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.__init__(config, add_pooling_layer=True)

Initializes the ErnieMModel.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing model settings.

TYPE: object

add_pooling_layer

A flag indicating whether to add a pooling layer to the model.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.
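
A small sketch of the add_pooling_layer flag (shrunken, hypothetical config for illustration):

>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMModel
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128)
>>> ErnieMModel(config).pooler is not None          # pooling layer added by default
True
>>> ErnieMModel(config, add_pooling_layer=False).pooler is None
True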

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config, add_pooling_layer=True):
    """
    Initializes the ErnieMModel.

    Args:
        self: The instance of the class.
        config (object): The configuration object containing model settings.
        add_pooling_layer (bool): A flag indicating whether to add a pooling layer to the model.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.initializer_range = config.initializer_range
    self.embeddings = ErnieMEmbeddings(config)
    self.encoder = ErnieMEncoder(config)
    self.pooler = ErnieMPooler(config) if add_pooling_layer else None
    self.post_init()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.forward(input_ids=None, position_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, past_key_values=None, use_cache=None, output_hidden_states=None, output_attentions=None, return_dict=None)

Constructs the ERNIE-M model.

PARAMETER DESCRIPTION
self

The object instance.

input_ids

The input tensor of token indices. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The tensor indicating the position of tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The tensor indicating which elements in the input do not need to be attended to. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The previous key values. Default is None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

use_cache

Whether to use the cache. Default is None.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output the hidden states. Default is None.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Whether to output the attentions. Default is None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a dictionary. Default is None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithPoolingAndCrossAttentions]

Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]: Depending on the value of return_dict, returns a tuple of tensors including the last hidden state and the pooler output, or a BaseModelOutputWithPoolingAndCrossAttentions object.

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified.
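
A shape-level usage sketch (random inputs, shrunken hypothetical config). A 2D attention mask of ones keeps every position; zero-valued positions are turned into a large negative additive bias internally:

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMModel
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128)
>>> model = ErnieMModel(config)
>>> input_ids = mindspore.Tensor(np.random.randint(1, 1000, (2, 8)), mindspore.int64)
>>> attention_mask = mindspore.Tensor(np.ones((2, 8)), mindspore.float32)
>>> outputs = model.forward(input_ids=input_ids, attention_mask=attention_mask)
>>> outputs.last_hidden_state.shape     # (2, 8, 64)
>>> outputs.pooler_output.shape         # (2, 64)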

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
    """
    Constructs the ERNIE-M model.

    Args:
        self: The object instance.
        input_ids (Optional[mindspore.Tensor]): The input tensor of token indices. Default is None.
        position_ids (Optional[mindspore.Tensor]): The tensor indicating the position of tokens. Default is None.
        attention_mask (Optional[mindspore.Tensor]):
            The tensor indicating which elements in the input do not need to be attended to. Default is None.
        head_mask (Optional[mindspore.Tensor]):
            The tensor indicating the heads in the multi-head attention layer to be masked. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]): The input embeddings. Default is None.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): The previous key values. Default is None.
        use_cache (Optional[bool]): Whether to use the cache. Default is None.
        output_hidden_states (Optional[bool]): Whether to output the hidden states. Default is None.
        output_attentions (Optional[bool]): Whether to output the attentions. Default is None.
        return_dict (Optional[bool]): Whether to return a dictionary. Default is None.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
            Depending on the value of `return_dict`, returns a tuple of tensors including the last hidden state and
            the pooler output, or a BaseModelOutputWithPoolingAndCrossAttentions object.

    Raises:
        ValueError: If both `input_ids` and `inputs_embeds` are specified.
    """
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time.")

    # init the default bool value
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.return_dict

    head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

    past_key_values_length = 0
    if past_key_values is not None:
        past_key_values_length = past_key_values[0][0].shape[2]

    # Adapted from paddlenlp.transformers.ernie_m.ErnieMModel
    if attention_mask is None:
        attention_mask = (input_ids == 0).to(self.dtype)
        attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)
        if past_key_values is not None:
            batch_size = past_key_values[0][0].shape[0]
            past_mask = ops.zeros([batch_size, 1, 1, past_key_values_length], dtype=attention_mask.dtype)
            attention_mask = ops.concat([past_mask, attention_mask], axis=-1)
    # For 2D attention_mask from tokenizer
    elif attention_mask.ndim == 2:
        attention_mask = attention_mask.to(self.dtype)
        attention_mask = 1.0 - attention_mask
        attention_mask *= mindspore.tensor(np.finfo(mindspore.dtype_to_nptype(attention_mask.dtype)).min, attention_mask.dtype)

    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(1)

    embedding_output = self.embeddings(
        input_ids=input_ids,
        position_ids=position_ids,
        inputs_embeds=inputs_embeds,
        past_key_values_length=past_key_values_length,
    )
    encoder_outputs = self.encoder(
        embedding_output,
        attention_mask=extended_attention_mask,
        head_mask=head_mask,
        past_key_values=past_key_values,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    if not return_dict:
        sequence_output = encoder_outputs[0]
        pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
        return (sequence_output, pooler_output) + encoder_outputs[1:]

    sequence_output = encoder_outputs["last_hidden_state"]
    pooler_output = self.pooler(sequence_output) if self.pooler is not None else None
    hidden_states = None if not output_hidden_states else encoder_outputs["hidden_states"]
    attentions = None if not output_attentions else encoder_outputs["attentions"]

    return BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=sequence_output,
        pooler_output=pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.get_input_embeddings()

This method returns the input embeddings from the ErnieMModel.

PARAMETER DESCRIPTION
self

ErnieMModel object. The instance of the ErnieMModel class.

RETURNS DESCRIPTION
word_embeddings

The method returns the input embeddings from the ErnieMModel.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def get_input_embeddings(self):
    """
    This method returns the input embeddings from the ErnieMModel.

    Args:
        self: ErnieMModel object. The instance of the ErnieMModel class.

    Returns:
        word_embeddings: The method returns the input embeddings from the ErnieMModel.

    Raises:
        None.
    """
    return self.embeddings.word_embeddings

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMModel.set_input_embeddings(value)

Set the input embeddings for the ErnieMModel.

PARAMETER DESCRIPTION
self

The instance of the ErnieMModel class.

TYPE: ErnieMModel

value

The new word-embedding module to assign to the model, typically an nn.Embedding whose weight has shape (vocab_size, hidden_size).

RETURNS DESCRIPTION

None.
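
A small sketch of reading and re-assigning the word-embedding module (shrunken, hypothetical config for illustration):

>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMModel
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128)
>>> model = ErnieMModel(config)
>>> emb = model.get_input_embeddings()
>>> emb.weight.shape                    # (1000, 64): (vocab_size, hidden_size)
>>> model.set_input_embeddings(emb)     # assign the same (or a replacement) embedding module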

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def set_input_embeddings(self, value):
    """
    Set the input embeddings for the ErnieMModel.

    Args:
        self (ErnieMModel): The instance of the ErnieMModel class.
        value: The input embeddings value to be set. It should be a tensor representing the input embeddings.

    Returns:
        None.

    Raises:
        None.
    """
    self.embeddings.word_embeddings = value

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler

Bases: Module

This class represents the pooler module of the ERNIE-M model, which is responsible for pooling the hidden states into a single representation of the input sequence: the hidden state of the first token is passed through a dense layer and a tanh activation.

Inherits from

nn.Module

ATTRIBUTE DESCRIPTION
dense

A fully connected layer that projects the input hidden states to a new hidden size.

TYPE: Linear

activation

The activation function applied to the projected hidden states.

TYPE: Tanh

METHOD DESCRIPTION
__init__

Initializes the ERNIE MPooler module.

forward

Constructs the MPooler module by pooling the hidden states.
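
A shape-level sketch of the pooling step (random inputs and a hypothetical hidden size for illustration):

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMPooler
>>> pooler = ErnieMPooler(ErnieMConfig(hidden_size=64))
>>> hidden_states = mindspore.Tensor(np.random.randn(2, 8, 64).astype(np.float32))
>>> pooler.forward(hidden_states).shape     # (2, 64): tanh(dense(hidden_states[:, 0]))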

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMPooler(nn.Module):
    """
    This class represents the MPooler module of the ERNIE model, which is responsible for pooling the hidden states to
    obtain a single representation of the input sequence.

    Inherits from:
        nn.Module

    Attributes:
        dense (nn.Linear): A fully connected layer that projects the input hidden states to a new hidden size.
        activation (nn.Tanh): The activation function applied to the projected hidden states.

    Methods:
        __init__(config): Initializes the ERNIE MPooler module.
        forward(hidden_states): Constructs the MPooler module by pooling the hidden states.

    """
    def __init__(self, config):
        """
        Initializes a new instance of the ErnieMPooler class.

        Args:
            self: The object instance.
            config: An instance of the configuration class used to configure the ErnieMPooler.
                It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs the pooled output tensor for the ERNIE model.

        Args:
            self (ErnieMPooler): An instance of the ErnieMPooler class.
            hidden_states (mindspore.Tensor): A tensor containing the hidden states from the ERNIE model.
                It should have the shape (batch_size, sequence_length, hidden_size) where:

                - batch_size: The number of sequences in the batch.
                - sequence_length: The length of each input sequence.
                - hidden_size: The size of the hidden state vectors.

        Returns:
            mindspore.Tensor: A tensor representing the pooled output of the ERNIE model.
                The pooled output is obtained by applying dense and activation layers to the first token tensor
                extracted from the hidden states tensor.

        Raises:
            None
        """
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler.__init__(config)

Initializes a new instance of the ErnieMPooler class.

PARAMETER DESCRIPTION
self

The object instance.

config

An instance of the configuration class used to configure the ErnieMPooler. It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config):
    """
    Initializes a new instance of the ErnieMPooler class.

    Args:
        self: The object instance.
        config: An instance of the configuration class used to configure the ErnieMPooler.
            It provides various settings and parameters for the ErnieMPooler's behavior. This parameter is required.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.dense = nn.Linear(config.hidden_size, config.hidden_size)
    self.activation = nn.Tanh()

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPooler.forward(hidden_states)

Constructs the pooled output tensor for the ERNIE model.

PARAMETER DESCRIPTION
self

An instance of the ErnieMPooler class.

TYPE: ErnieMPooler

hidden_states

A tensor containing the hidden states from the ERNIE model. It should have the shape (batch_size, sequence_length, hidden_size) where:

  • batch_size: The number of sequences in the batch.
  • sequence_length: The length of each input sequence.
  • hidden_size: The size of the hidden state vectors.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: A tensor representing the pooled output of the ERNIE model. The pooled output is obtained by applying dense and activation layers to the first token tensor extracted from the hidden states tensor.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs the pooled output tensor for the ERNIE model.

    Args:
        self (ErnieMPooler): An instance of the ErnieMPooler class.
        hidden_states (mindspore.Tensor): A tensor containing the hidden states from the ERNIE model.
            It should have the shape (batch_size, sequence_length, hidden_size) where:

            - batch_size: The number of sequences in the batch.
            - sequence_length: The length of each input sequence.
            - hidden_size: The size of the hidden state vectors.

    Returns:
        mindspore.Tensor: A tensor representing the pooled output of the ERNIE model.
            The pooled output is obtained by applying dense and activation layers to the first token tensor
            extracted from the hidden states tensor.

    Raises:
        None
    """
    # We "pool" the model by simply taking the hidden state corresponding
    # to the first token.
    first_token_tensor = hidden_states[:, 0]
    pooled_output = self.dense(first_token_tensor)
    pooled_output = self.activation(pooled_output)
    return pooled_output

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
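
A small sketch of the initialization behaviour inherited by the concrete ERNIE-M models: after post_init(), dense and embedding weights are drawn from a normal distribution whose standard deviation is config.initializer_range (shrunken, hypothetical config; the printed value is illustrative):

>>> from mindnlp.transformers.models.ernie_m.configuration_ernie_m import ErnieMConfig
>>> from mindnlp.transformers.models.ernie_m.modeling_ernie_m import ErnieMModel
>>> config = ErnieMConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
...                       num_attention_heads=2, intermediate_size=128)
>>> model = ErnieMModel(config)
>>> float(model.pooler.dense.weight.asnumpy().std())   # roughly config.initializer_range (0.02)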

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = ErnieMConfig
    base_model_prefix = "ernie_m"

    def _init_weights(self, cell):
        """Initialize the weights"""
        if isinstance(cell, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(self.config.initializer_range),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, self.config.initializer_range, cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention

Bases: Module

A module that implements the self-attention mechanism used in the ERNIE-M model.

This module contains the ErnieMSelfAttention class, which represents the self-attention mechanism used in the ERNIE-M model. It is a subclass of nn.Module and is responsible for computing the attention scores and producing the context layer.

ATTRIBUTE DESCRIPTION
num_attention_heads

The number of attention heads in the self-attention mechanism.

TYPE: int

attention_head_size

The size of each attention head.

TYPE: int

all_head_size

The total size of all attention heads combined.

TYPE: int

q_proj

The projection layer for the query tensor.

TYPE: Linear

k_proj

The projection layer for the key tensor.

TYPE: Linear

v_proj

The projection layer for the value tensor.

TYPE: Linear

dropout

The dropout layer applied to the attention probabilities.

TYPE: Dropout

position_embedding_type

The type of position embedding used in the attention mechanism.

TYPE: str

distance_embedding

The embedding layer for computing relative positions in the attention scores.

TYPE: Embedding

is_decoder

Whether the self-attention mechanism is used in a decoder module.

TYPE: bool

METHOD DESCRIPTION
transpose_for_scores

Reshapes the input tensor for calculating attention scores.

forward

Constructs the self-attention mechanism by calculating attention scores and producing the context layer.

Example
>>> config = ErnieMConfig(hidden_size=768, num_attention_heads=12, attention_probs_dropout_prob=0.1)
>>> self_attention = ErnieMSelfAttention(config)
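
Continuing the example above, a shape-level sketch of a forward call (random inputs for illustration); the result is a tuple whose first element is the context layer and whose second element is the attention probabilities when output_attentions=True:

>>> import numpy as np
>>> import mindspore
>>> hidden_states = mindspore.Tensor(np.random.randn(2, 8, 768).astype(np.float32))
>>> outputs = self_attention.forward(hidden_states, output_attentions=True)
>>> outputs[0].shape    # context layer: (2, 8, 768)
>>> outputs[1].shape    # attention probabilities: (2, 12, 8, 8)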
Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class ErnieMSelfAttention(nn.Module):
    """
    A module that implements the self-attention mechanism used in ERNIE model.

    This module contains the `ErnieMSelfAttention` class, which represents the self-attention mechanism used in the
    ERNIE model. It is a subclass of `nn.Module` and is responsible for calculating the attention scores and producing
    the context layer.

    Attributes:
        num_attention_heads (int): The number of attention heads in the self-attention mechanism.
        attention_head_size (int): The size of each attention head.
        all_head_size (int): The total size of all attention heads combined.
        q_proj (nn.Linear): The projection layer for the query tensor.
        k_proj (nn.Linear): The projection layer for the key tensor.
        v_proj (nn.Linear): The projection layer for the value tensor.
        dropout (nn.Dropout): The dropout layer applied to the attention probabilities.
        position_embedding_type (str): The type of position embedding used in the attention mechanism.
        distance_embedding (nn.Embedding): The embedding layer for computing relative positions in the attention scores.
        is_decoder (bool): Whether the self-attention mechanism is used in a decoder module.

    Methods:
        transpose_for_scores:
            Reshapes the input tensor for calculating attention scores.

        forward:
            Constructs the self-attention mechanism by calculating attention scores and producing the context layer.

    Example:
        ```python
        >>> config = ErnieMConfig(hidden_size=768, num_attention_heads=12, attention_probs_dropout_prob=0.1)
        >>> self_attention = ErnieMSelfAttention(config)
        ```
        """
    def __init__(self, config, position_embedding_type=None):
        """
        Initializes the ErnieMSelfAttention class.

        Args:
            self: The object itself.
            config (object): An object containing configuration parameters for the self-attention mechanism.
            position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

        Returns:
            None.

        Raises:
            ValueError: If the hidden size is not a multiple of the number of attention heads.
        """
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
                f"heads ({config.num_attention_heads})"
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
        self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
        self.position_embedding_type = position_embedding_type or getattr(
            config, "position_embedding_type", "absolute"
        )
        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

        self.is_decoder = config.is_decoder

    def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
        """
        Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

        Args:
            self (ErnieMSelfAttention): The instance of the ErnieMSelfAttention class.
            x (mindspore.Tensor): The input tensor to be transposed.
                It should have a shape of (batch_size, sequence_length, hidden_size).

        Returns:
            mindspore.Tensor:
                The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

        Raises:
            None.
        """
        new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        This method forwards the self-attention mechanism for the ErnieMSelfAttention class.

        Args:
            self: The instance of the class.
            hidden_states (mindspore.Tensor): The input tensor representing the hidden states.
            attention_mask (Optional[mindspore.Tensor]):
                Optional tensor for masking attention scores. Defaults to None.
            head_mask (Optional[mindspore.Tensor]): Optional tensor for masking attention heads. Defaults to None.
            encoder_hidden_states (Optional[mindspore.Tensor]):
                Optional tensor representing hidden states from an encoder. Defaults to None.
            encoder_attention_mask (Optional[mindspore.Tensor]):
                Optional tensor for masking encoder attention scores. Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                Optional tuple of past key and value tensors. Defaults to None.
            output_attentions (Optional[bool]):
                Flag indicating whether to output attentions. Defaults to False.

        Returns:
            Tuple[mindspore.Tensor]:
                A tuple containing the context layer tensor and optionally the attention probabilities tensor.

        Raises:
            ValueError: If the input tensor shapes are incompatible for matrix multiplication.
            ValueError: If the position_embedding_type specified is not supported.
            RuntimeError: If there is an issue with applying softmax or dropout operations.
            RuntimeError: If there is an issue with reshaping the context layer tensor.
        """
        mixed_query_layer = self.q_proj(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        is_cross_attention = encoder_hidden_states is not None

        if is_cross_attention and past_key_value is not None:
            # reuse k,v, cross_attentions
            key_layer = past_key_value[0]
            value_layer = past_key_value[1]
            attention_mask = encoder_attention_mask
        elif is_cross_attention:
            key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
            attention_mask = encoder_attention_mask
        elif past_key_value is not None:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
            key_layer = ops.cat([past_key_value[0], key_layer], axis=2)
            value_layer = ops.cat([past_key_value[1], value_layer], axis=2)
        else:
            key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
            value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

        query_layer = self.transpose_for_scores(mixed_query_layer)

        use_cache = past_key_value is not None
        if self.is_decoder:
            # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_layer, value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

        if self.position_embedding_type in ('relative_key', 'relative_key_query'):
            query_length, key_length = query_layer.shape[2], key_layer.shape[2]
            if use_cache:
                position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                    -1, 1
                )
            else:
                position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
            position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
            distance = position_ids_l - position_ids_r

            positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
            positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

            if self.position_embedding_type == "relative_key":
                relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores
            elif self.position_embedding_type == "relative_key_query":
                relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
                relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
                attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in ErnieMModel forward() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = ops.softmax(attention_scores, axis=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = ops.matmul(attention_probs, value_layer)

        context_layer = context_layer.permute(0, 2, 1, 3)
        new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(new_context_layer_shape)

        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

        if self.is_decoder:
            outputs = outputs + (past_key_value,)
        return outputs
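
In the forward pass above, the raw attention scores are combined with an additive `attention_mask`: masked positions carry a large negative value, so after the softmax their probability is effectively zero. A minimal plain-Python sketch of that convention (the score and mask values below are invented for illustration):

```python
import math

# Hypothetical raw scores for one query over three key positions;
# the third key is a padding token, so its additive mask is a large negative number.
scores = [2.0, 1.0, 3.0]
additive_mask = [0.0, 0.0, -10000.0]

masked = [s + m for s, m in zip(scores, additive_mask)]
exps = [math.exp(v) for v in masked]
probs = [e / sum(exps) for e in exps]
print([round(p, 4) for p in probs])  # [0.7311, 0.2689, 0.0]
```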

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.__init__(config, position_embedding_type=None)

Initializes the ErnieMSelfAttention class.

PARAMETER DESCRIPTION
self

The object itself.

config

An object containing configuration parameters for the self-attention mechanism.

TYPE: object

position_embedding_type

The type of position embedding to use. Defaults to None.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the hidden size is not a multiple of the number of attention heads.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initializes the ErnieMSelfAttention class.

    Args:
        self: The object itself.
        config (object): An object containing configuration parameters for the self-attention mechanism.
        position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

    Returns:
        None.

    Raises:
        ValueError: If the hidden size is not a multiple of the number of attention heads.
    """
    super().__init__()
    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
            f"heads ({config.num_attention_heads})"
        )

    self.num_attention_heads = config.num_attention_heads
    self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
    self.all_head_size = self.num_attention_heads * self.attention_head_size

    self.q_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.k_proj = nn.Linear(config.hidden_size, self.all_head_size)
    self.v_proj = nn.Linear(config.hidden_size, self.all_head_size)

    self.dropout = nn.Dropout(p=config.attention_probs_dropout_prob)
    self.position_embedding_type = position_embedding_type or getattr(
        config, "position_embedding_type", "absolute"
    )
    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        self.max_position_embeddings = config.max_position_embeddings
        self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

    self.is_decoder = config.is_decoder

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

This method forwards the self-attention mechanism for the ErnieMSelfAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input tensor representing the hidden states.

TYPE: Tensor

attention_mask

Optional tensor for masking attention scores. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

Optional tensor for masking attention heads. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

Optional tensor representing hidden states from an encoder. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

Optional tensor for masking encoder attention scores. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

Optional tuple of past key and value tensors. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Flag indicating whether to output attentions. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

Tuple[mindspore.Tensor]: A tuple containing the context layer tensor and optionally the attention probabilities tensor.

RAISES DESCRIPTION
ValueError

If the input tensor shapes are incompatible for matrix multiplication.

ValueError

If the position_embedding_type specified is not supported.

RuntimeError

If there is an issue with applying softmax or dropout operations.

RuntimeError

If there is an issue with reshaping the context layer tensor.

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    This method forwards the self-attention mechanism for the ErnieMSelfAttention class.

    Args:
        self: The instance of the class.
        hidden_states (mindspore.Tensor): The input tensor representing the hidden states.
        attention_mask (Optional[mindspore.Tensor]):
            Optional tensor for masking attention scores. Defaults to None.
        head_mask (Optional[mindspore.Tensor]): Optional tensor for masking attention heads. Defaults to None.
        encoder_hidden_states (Optional[mindspore.Tensor]):
            Optional tensor representing hidden states from an encoder. Defaults to None.
        encoder_attention_mask (Optional[mindspore.Tensor]):
            Optional tensor for masking encoder attention scores. Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            Optional tuple of past key and value tensors. Defaults to None.
        output_attentions (Optional[bool]):
            Flag indicating whether to output attentions. Defaults to False.

    Returns:
        Tuple[mindspore.Tensor]:
            A tuple containing the context layer tensor and optionally the attention probabilities tensor.

    Raises:
        ValueError: If the input tensor shapes are incompatible for matrix multiplication.
        ValueError: If the position_embedding_type specified is not supported.
        RuntimeError: If there is an issue with applying softmax or dropout operations.
        RuntimeError: If there is an issue with reshaping the context layer tensor.
    """
    mixed_query_layer = self.q_proj(hidden_states)

    # If this is instantiated as a cross-attention module, the keys
    # and values come from an encoder; the attention mask needs to be
    # such that the encoder's padding tokens are not attended to.
    is_cross_attention = encoder_hidden_states is not None

    if is_cross_attention and past_key_value is not None:
        # reuse k,v, cross_attentions
        key_layer = past_key_value[0]
        value_layer = past_key_value[1]
        attention_mask = encoder_attention_mask
    elif is_cross_attention:
        key_layer = self.transpose_for_scores(self.k_proj(encoder_hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(encoder_hidden_states))
        attention_mask = encoder_attention_mask
    elif past_key_value is not None:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))
        key_layer = ops.cat([past_key_value[0], key_layer], axis=2)
        value_layer = ops.cat([past_key_value[1], value_layer], axis=2)
    else:
        key_layer = self.transpose_for_scores(self.k_proj(hidden_states))
        value_layer = self.transpose_for_scores(self.v_proj(hidden_states))

    query_layer = self.transpose_for_scores(mixed_query_layer)

    use_cache = past_key_value is not None
    if self.is_decoder:
        # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
        # Further calls to cross_attention layer can then reuse all cross-attention
        # key/value_states (first "if" case)
        # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
        # all previous decoder key/value_states. Further calls to uni-directional self-attention
        # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
        # if encoder bi-directional self-attention `past_key_value` is always `None`
        past_key_value = (key_layer, value_layer)

    # Take the dot product between "query" and "key" to get the raw attention scores.
    attention_scores = ops.matmul(query_layer, key_layer.swapaxes(-1, -2))

    if self.position_embedding_type in ('relative_key', 'relative_key_query'):
        query_length, key_length = query_layer.shape[2], key_layer.shape[2]
        if use_cache:
            position_ids_l = mindspore.tensor(key_length - 1, dtype=mindspore.int64).view(
                -1, 1
            )
        else:
            position_ids_l = ops.arange(query_length, dtype=mindspore.int64).view(-1, 1)
        position_ids_r = ops.arange(key_length, dtype=mindspore.int64).view(1, -1)
        distance = position_ids_l - position_ids_r

        positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
        positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

        if self.position_embedding_type == "relative_key":
            relative_position_scores = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores
        elif self.position_embedding_type == "relative_key_query":
            relative_position_scores_query = ops.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            relative_position_scores_key = ops.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    if attention_mask is not None:
        # Apply the attention mask (precomputed for all layers in ErnieMModel forward() function)
        attention_scores = attention_scores + attention_mask

    # Normalize the attention scores to probabilities.
    attention_probs = ops.softmax(attention_scores, axis=-1)

    # This is actually dropping out entire tokens to attend to, which might
    # seem a bit unusual, but is taken from the original Transformer paper.
    attention_probs = self.dropout(attention_probs)

    # Mask heads if we want to
    if head_mask is not None:
        attention_probs = attention_probs * head_mask

    context_layer = ops.matmul(attention_probs, value_layer)

    context_layer = context_layer.permute(0, 2, 1, 3)
    new_context_layer_shape = context_layer.shape[:-2] + (self.all_head_size,)
    context_layer = context_layer.view(new_context_layer_shape)

    outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

    if self.is_decoder:
        outputs = outputs + (past_key_value,)
    return outputs
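
For the `relative_key` / `relative_key_query` branches above, the distance between query position l and key position r is shifted by `max_position_embeddings - 1`, so every lookup index falls inside the `distance_embedding` table of size `2 * max_position_embeddings - 1`. A plain-Python sketch of that index computation (the lengths are made-up values, not taken from a real configuration):

```python
max_position_embeddings = 8    # hypothetical config value
query_length = key_length = 4  # hypothetical sequence length

indices = [
    [(l - r) + max_position_embeddings - 1 for r in range(key_length)]
    for l in range(query_length)
]
print(indices)
# [[7, 6, 5, 4], [8, 7, 6, 5], [9, 8, 7, 6], [10, 9, 8, 7]]
# every index lies in [0, 2 * max_position_embeddings - 2], i.e. inside the table
```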

mindnlp.transformers.models.ernie_m.modeling_ernie_m.ErnieMSelfAttention.transpose_for_scores(x)

Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

PARAMETER DESCRIPTION
self

The instance of the ErnieMSelfAttention class.

TYPE: ErnieMSelfAttention

x

The input tensor to be transposed. It should have a shape of (batch_size, sequence_length, hidden_size).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def transpose_for_scores(self, x: mindspore.Tensor) -> mindspore.Tensor:
    """
    Transposes the input tensor for calculating attention scores in the ErnieMSelfAttention class.

    Args:
        self (ErnieMSelfAttention): The instance of the ErnieMSelfAttention class.
        x (mindspore.Tensor): The input tensor to be transposed.
            It should have a shape of (batch_size, sequence_length, hidden_size).

    Returns:
        mindspore.Tensor:
            The transposed tensor with shape (batch_size, num_attention_heads, sequence_length, attention_head_size).

    Raises:
        None.
    """
    new_x_shape = x.shape[:-1] + (self.num_attention_heads, self.attention_head_size)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)
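
A quick shape walk-through of the reshape-then-permute performed by `transpose_for_scores`, using assumed example sizes (a 768-dimensional model with 12 heads):

```python
batch_size, seq_len, hidden_size = 2, 5, 768
num_attention_heads = 12
attention_head_size = hidden_size // num_attention_heads  # 64

in_shape = (batch_size, seq_len, hidden_size)
split_shape = in_shape[:-1] + (num_attention_heads, attention_head_size)
# permute(0, 2, 1, 3) swaps the sequence and head axes
out_shape = (split_shape[0], split_shape[2], split_shape[1], split_shape[3])
print(in_shape, "->", split_shape, "->", out_shape)
# (2, 5, 768) -> (2, 5, 12, 64) -> (2, 12, 5, 64)
```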

mindnlp.transformers.models.ernie_m.modeling_ernie_m.UIEM

Bases: ErnieMForInformationExtraction

UIEM model

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
class UIEM(ErnieMForInformationExtraction):
    """UIEM model"""
    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the start_positions loss. Positions outside of the
                sequence are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for position (index) for computing the end_positions loss. Positions outside of the
                sequence are not taken into account for computing the loss.
        """
        result = self.ernie_m(
            input_ids,
            # attention_mask=attention_mask,
            position_ids=position_ids,
            # head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        if return_dict:
            sequence_output = result.last_hidden_state
        else:
            sequence_output = result[0]

        start_logits = self.linear_start(sequence_output)
        start_logits = start_logits.squeeze(-1)
        start_prob = self.sigmoid(start_logits)
        end_logits = self.linear_end(sequence_output)
        end_logits = end_logits.squeeze(-1)
        end_prob = self.sigmoid(end_logits)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, splitting can add an extra dimension; squeeze it away
            if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.binary_cross_entropy(start_prob, start_positions)
            end_loss = ops.binary_cross_entropy(end_prob, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            return tuple(
                i
                for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
                if i is not None
            )

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_prob,
            end_logits=end_prob,
            hidden_states=result.hidden_states,
            attentions=result.attentions,
        )

mindnlp.transformers.models.ernie_m.modeling_ernie_m.UIEM.forward(input_ids=None, attention_mask=None, position_ids=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=True)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) for computing the start_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

end_positions

Labels for position (index) for computing the end_positions loss. Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/ernie_m/modeling_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = True,
) -> Union[Tuple[mindspore.Tensor], QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the start_positions loss. Positions outside of the
            sequence are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for position (index) for computing the end_positions loss. Positions outside of the
            sequence are not taken into account for computing the loss.
    """
    result = self.ernie_m(
        input_ids,
        # attention_mask=attention_mask,
        position_ids=position_ids,
        # head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    if return_dict:
        sequence_output = result.last_hidden_state
    else:
        sequence_output = result[0]

    start_logits = self.linear_start(sequence_output)
    start_logits = start_logits.squeeze(-1)
    start_prob = self.sigmoid(start_logits)
    end_logits = self.linear_end(sequence_output)
    end_logits = end_logits.squeeze(-1)
    end_prob = self.sigmoid(end_logits)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, splitting can add an extra dimension; squeeze it away
        if len(start_positions.shape) > 1 and start_positions.shape[-1] == 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1 and end_positions.shape[-1] == 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.binary_cross_entropy(start_prob, start_positions)
        end_loss = ops.binary_cross_entropy(end_prob, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        return tuple(
            i
            for i in [total_loss, start_prob, end_prob, result.hidden_states, result.attentions]
            if i is not None
        )

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_prob,
        end_logits=end_prob,
        hidden_states=result.hidden_states,
        attentions=result.attentions,
    )
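
UIEM returns per-token start and end probabilities (sigmoid outputs) rather than a softmax over positions. Below is a hedged sketch of one simple way to turn such probabilities into spans with a threshold; the probability values are invented and the pairing rule is just one possible post-processing heuristic, not part of the model:

```python
threshold = 0.5
start_prob = [0.1, 0.9, 0.2, 0.1]  # made-up per-token start probabilities
end_prob = [0.1, 0.2, 0.8, 0.1]    # made-up per-token end probabilities

starts = [i for i, p in enumerate(start_prob) if p > threshold]
ends = [i for i, p in enumerate(end_prob) if p > threshold]

# pair each predicted start with the first predicted end at or after it
spans = [
    (s, min(e for e in ends if e >= s))
    for s in starts
    if any(e >= s for e in ends)
]
print(spans)  # [(1, 2)]
```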

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m

MindSpore ErnieM model.

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention

Bases: Module

This class represents an attention module for MSErnieM model, which includes self-attention mechanism and projection layers. It inherits from nn.Module and provides methods to initialize the attention module, prune attention heads, and perform attention computation. The attention module consists of self-attention mechanism with configurable position embedding type and projection layers for output transformation. The 'prune_heads' method allows pruning specific attention heads based on provided indices. The 'forward' method computes the attention output given input hidden states, optional masks, and other optional inputs.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
class MSErnieMAttention(nn.Module):

    """
    This class represents an attention module for MSErnieM model, which includes self-attention mechanism and projection
    layers.
    It inherits from nn.Module and provides methods to initialize the attention module, prune attention heads, and perform
    attention computation.
    The attention module consists of self-attention mechanism with configurable position embedding type and projection
    layers for output transformation.
    The 'prune_heads' method allows pruning specific attention heads based on provided indices.
    The 'forward' method computes the attention output given input hidden states, optional masks, and other optional
    inputs.
    """
    def __init__(self, config, position_embedding_type=None):
        """
        Initializes an instance of the MSErnieMAttention class.

        Args:
            self: The instance of the class.
            config (object): An object that contains the configuration settings for the attention layer.
            position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.self_attn = MSErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
        self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        This method 'prune_heads' in the class 'MSErnieMAttention' prunes heads from the attention mechanism.

        Args:
            self (object): The instance of the class.
            heads (list): A list of integers representing the indices of heads to be pruned from the attention mechanism.

        Returns:
            None: This method does not return anything explicitly, as it operates by mutating the internal state of the class.

        Raises:
            TypeError: If the 'heads' parameter is not a list of integers.
            IndexError: If the indices in 'heads' exceed the available attention heads in the mechanism.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
        self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
        self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
        self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

        # Update hyper params and store pruned heads
        self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
        self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        Constructs the MSErnieMAttention module.

        Args:
            self (MSErnieMAttention): The instance of the MSErnieMAttention class.
            hidden_states (mindspore.Tensor): The input hidden states of the model.
                Shape: (batch_size, seq_length, hidden_size).
            attention_mask (Optional[mindspore.Tensor], optional):
                The attention mask tensor, indicating which tokens should be attended to and which should not.
                Shape: (batch_size, seq_length). Defaults to None.
            head_mask (Optional[mindspore.Tensor], optional):
                The head mask tensor, indicating which heads should be masked out.
                Shape: (num_heads, seq_length, seq_length). Defaults to None.
            encoder_hidden_states (Optional[mindspore.Tensor], optional):
                The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.
            encoder_attention_mask (Optional[mindspore.Tensor], optional):
                The attention mask tensor for the encoder, indicating which tokens should be attended to and which
                should not. Shape: (batch_size, seq_length). Defaults to None.
            past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
                The tuple of past key and value tensors for keeping the previous attention weights.
                Shape: ((batch_size, num_heads, seq_length, hidden_size),
                (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.
            output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

        Raises:
            None.
        """
        self_outputs = self.self_attn(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
        )
        attention_output = self.out_proj(self_outputs[0])
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.__init__(config, position_embedding_type=None)

Initializes an instance of the MSErnieMAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object that contains the configuration settings for the attention layer.

TYPE: object

position_embedding_type

The type of position embedding to use. Defaults to None.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
def __init__(self, config, position_embedding_type=None):
    """
    Initializes an instance of the MSErnieMAttention class.

    Args:
        self: The instance of the class.
        config (object): An object that contains the configuration settings for the attention layer.
        position_embedding_type (str, optional): The type of position embedding to use. Defaults to None.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.self_attn = MSErnieMSelfAttention(config, position_embedding_type=position_embedding_type)
    self.out_proj = nn.Linear(config.hidden_size, config.hidden_size)
    self.pruned_heads = set()

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.forward(hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_value=None, output_attentions=False)

Constructs the MSErnieMAttention module.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMAttention class.

TYPE: MSErnieMAttention

hidden_states

The input hidden states of the model. Shape: (batch_size, seq_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor, indicating which tokens should be attended to and which should not. Shape: (batch_size, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor, indicating which heads should be masked out. Shape: (num_heads, seq_length, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

The attention mask tensor for the encoder, indicating which tokens should be attended to and which should not. Shape: (batch_size, seq_length). Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_value

The tuple of past key and value tensors for keeping the previous attention weights. Shape: ((batch_size, num_heads, seq_length, hidden_size), (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

Whether to output attention weights. Defaults to False.

TYPE: Optional[bool] DEFAULT: False

RETURNS DESCRIPTION
Tuple[Tensor]

Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    output_attentions: Optional[bool] = False,
) -> Tuple[mindspore.Tensor]:
    """
    Constructs the MSErnieMAttention module.

    Args:
        self (MSErnieMAttention): The instance of the MSErnieMAttention class.
        hidden_states (mindspore.Tensor): The input hidden states of the model.
            Shape: (batch_size, seq_length, hidden_size).
        attention_mask (Optional[mindspore.Tensor], optional):
            The attention mask tensor, indicating which tokens should be attended to and which should not.
            Shape: (batch_size, seq_length). Defaults to None.
        head_mask (Optional[mindspore.Tensor], optional):
            The head mask tensor, indicating which heads should be masked out.
            Shape: (num_heads, seq_length, seq_length). Defaults to None.
        encoder_hidden_states (Optional[mindspore.Tensor], optional):
            The hidden states of the encoder. Shape: (batch_size, seq_length, hidden_size). Defaults to None.
        encoder_attention_mask (Optional[mindspore.Tensor], optional):
            The attention mask tensor for the encoder, indicating which tokens should be attended to and which
            should not. Shape: (batch_size, seq_length). Defaults to None.
        past_key_value (Optional[Tuple[Tuple[mindspore.Tensor]]], optional):
            The tuple of past key and value tensors for keeping the previous attention weights.
            Shape: ((batch_size, num_heads, seq_length, hidden_size),
            (batch_size, num_heads, seq_length, hidden_size)). Defaults to None.
        output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.

    Returns:
        Tuple[mindspore.Tensor]: A tuple containing the attention output tensor and other optional outputs.

    Raises:
        None.
    """
    self_outputs = self.self_attn(
        hidden_states,
        attention_mask,
        head_mask,
        encoder_hidden_states,
        encoder_attention_mask,
        past_key_value,
        output_attentions,
    )
    attention_output = self.out_proj(self_outputs[0])
    outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
    return outputs

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMAttention.prune_heads(heads)

This method 'prune_heads' in the class 'MSErnieMAttention' prunes heads from the attention mechanism.

PARAMETER DESCRIPTION
self

The instance of the class.

TYPE: object

heads

A list of integers representing the indices of heads to be pruned from the attention mechanism.

TYPE: list

RETURNS DESCRIPTION
None

This method does not return anything explicitly, as it operates by mutating the internal state of the class.

RAISES DESCRIPTION
TypeError

If the 'heads' parameter is not a list of integers.

IndexError

If the indices in 'heads' exceed the available attention heads in the mechanism.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
def prune_heads(self, heads):
    """
    This method 'prune_heads' in the class 'MSErnieMAttention' prunes heads from the attention mechanism.

    Args:
        self (object): The instance of the class.
        heads (list): A list of integers representing the indices of heads to be pruned from the attention mechanism.

    Returns:
        None: This method does not return anything explicitly, as it operates by mutating the internal state of the class.

    Raises:
        TypeError: If the 'heads' parameter is not a list of integers.
        IndexError: If the indices in 'heads' exceed the available attention heads in the mechanism.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.self_attn.num_attention_heads, self.self_attn.attention_head_size, self.pruned_heads
    )

    # Prune linear layers
    self.self_attn.q_proj = prune_linear_layer(self.self_attn.q_proj, index)
    self.self_attn.k_proj = prune_linear_layer(self.self_attn.k_proj, index)
    self.self_attn.v_proj = prune_linear_layer(self.self_attn.v_proj, index)
    self.out_proj = prune_linear_layer(self.out_proj, index, dim=1)

    # Update hyper params and store pruned heads
    self.self_attn.num_attention_heads = self.self_attn.num_attention_heads - len(heads)
    self.self_attn.all_head_size = self.self_attn.attention_head_size * self.self_attn.num_attention_heads
    self.pruned_heads = self.pruned_heads.union(heads)
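
The bookkeeping at the end of `prune_heads` simply shrinks the head count and the total head dimension. A small arithmetic sketch with assumed values (not taken from a real checkpoint):

```python
num_attention_heads = 12
attention_head_size = 64
all_head_size = num_attention_heads * attention_head_size  # 768

heads_to_prune = [0, 5]  # hypothetical head indices to remove
num_attention_heads -= len(heads_to_prune)                  # 10
all_head_size = num_attention_heads * attention_head_size   # 640
print(num_attention_heads, all_head_size)                   # 10 640
```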

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings

Bases: Module

Construct the embeddings from word and position embeddings.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
class MSErnieMEmbeddings(nn.Module):
    """Construct the embeddings from word and position embeddings."""
    def __init__(self, config):
        """
        Initializes an instance of the MSErnieMEmbeddings class.

        Args:
            self: The object instance.
            config (object):
                A configuration object containing various parameters.

                - hidden_size (int): The size of the hidden state.
                - vocab_size (int): The size of the vocabulary.
                - pad_token_id (int): The ID of the padding token.
                - max_position_embeddings (int): The maximum number of positional embeddings.
                - layer_norm_eps (float): The epsilon value for layer normalization.
                - hidden_dropout_prob (float): The dropout probability for the hidden state.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.hidden_size = config.hidden_size
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
        )
        self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
        self.padding_idx = config.pad_token_id

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        past_key_values_length: int = 0,
    ) -> mindspore.Tensor:
        """
        Constructs the embeddings for MSErnieM model.

        Args:
            self (MSErnieMEmbeddings): The MSErnieMEmbeddings instance.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor containing the indices of input tokens. Default is None.
            position_ids (Optional[mindspore.Tensor]):
                The input tensor containing the indices of position tokens. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input tensor containing the embeddings of input tokens. Default is None.
            past_key_values_length (int): The length of past key values. Default is 0.

        Returns:
            mindspore.Tensor: The forwarded embeddings tensor.

        Raises:
            ValueError: If the input_ids and inputs_embeds are both None.
            ValueError: If the input_shape is invalid for position_ids calculation.
            ValueError: If past_key_values_length is negative.
        """
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        if position_ids is None:
            input_shape = inputs_embeds.shape[:-1]
            ones = ops.ones(input_shape, dtype=mindspore.int64)
            seq_length = ops.cumsum(ones, axis=1)
            position_ids = seq_length - ones

            if past_key_values_length > 0:
                position_ids = position_ids + past_key_values_length
        # to mimic paddlenlp implementation
        position_ids += 2
        position_embeddings = self.position_embeddings(position_ids)
        embeddings = inputs_embeds + position_embeddings
        embeddings = self.layer_norm(embeddings)
        embeddings = self.dropout(embeddings)

        return embeddings

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings.__init__(config)

Initializes an instance of the MSErnieMEmbeddings class.

PARAMETER DESCRIPTION
self

The object instance.

config

A configuration object containing various parameters.

  • hidden_size (int): The size of the hidden state.
  • vocab_size (int): The size of the vocabulary.
  • pad_token_id (int): The ID of the padding token.
  • max_position_embeddings (int): The maximum number of positional embeddings.
  • layer_norm_eps (float): The epsilon value for layer normalization.
  • hidden_dropout_prob (float): The dropout probability for the hidden state.

TYPE: object

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
def __init__(self, config):
    """
    Initializes an instance of the MSErnieMEmbeddings class.

    Args:
        self: The object instance.
        config (object):
            A configuration object containing various parameters.

            - hidden_size (int): The size of the hidden state.
            - vocab_size (int): The size of the vocabulary.
            - pad_token_id (int): The ID of the padding token.
            - max_position_embeddings (int): The maximum number of positional embeddings.
            - layer_norm_eps (float): The epsilon value for layer normalization.
            - hidden_dropout_prob (float): The dropout probability for the hidden state.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.hidden_size = config.hidden_size
    self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
    self.position_embeddings = nn.Embedding(
        config.max_position_embeddings, config.hidden_size, padding_idx=config.pad_token_id
    )
    self.layer_norm = nn.LayerNorm([config.hidden_size], eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)
    self.padding_idx = config.pad_token_id

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEmbeddings.forward(input_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0)

Constructs the embeddings for MSErnieM model.

PARAMETER DESCRIPTION
self

The MSErnieMEmbeddings instance.

TYPE: MSErnieMEmbeddings

input_ids

The input tensor containing the indices of input tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The input tensor containing the indices of position tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input tensor containing the embeddings of input tokens. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values_length

The length of past key values. Default is 0.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: The forwarded embeddings tensor.

RAISES DESCRIPTION
ValueError

If the input_ids and inputs_embeds are both None.

ValueError

If the input_shape is invalid for position_ids calculation.

ValueError

If past_key_values_length is negative.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    past_key_values_length: int = 0,
) -> mindspore.Tensor:
    """
    Constructs the embeddings for MSErnieM model.

    Args:
        self (MSErnieMEmbeddings): The MSErnieMEmbeddings instance.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor containing the indices of input tokens. Default is None.
        position_ids (Optional[mindspore.Tensor]):
            The input tensor containing the indices of position tokens. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input tensor containing the embeddings of input tokens. Default is None.
        past_key_values_length (int): The length of past key values. Default is 0.

    Returns:
        mindspore.Tensor: The forwarded embeddings tensor.

    Raises:
        ValueError: If the input_ids and inputs_embeds are both None.
        ValueError: If the input_shape is invalid for position_ids calculation.
        ValueError: If past_key_values_length is negative.
    """
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    if position_ids is None:
        input_shape = inputs_embeds.shape[:-1]
        ones = ops.ones(input_shape, dtype=mindspore.int64)
        seq_length = ops.cumsum(ones, axis=1)
        position_ids = seq_length - ones

        if past_key_values_length > 0:
            position_ids = position_ids + past_key_values_length
    # to mimic paddlenlp implementation
    position_ids += 2
    position_embeddings = self.position_embeddings(position_ids)
    embeddings = inputs_embeds + position_embeddings
    embeddings = self.layer_norm(embeddings)
    embeddings = self.dropout(embeddings)

    return embeddings
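
When `position_ids` is not supplied, the forward pass above derives it from a cumulative sum of ones (yielding 0, 1, 2, ...), adds `past_key_values_length`, and then offsets everything by 2 to mimic the paddlenlp implementation. A plain-Python sketch with made-up lengths:

```python
seq_len = 4                 # hypothetical sequence length
past_key_values_length = 0  # no cached keys/values

# cumsum of ones minus one -> [0, 1, ..., seq_len - 1]
position_ids = list(range(seq_len))
position_ids = [p + past_key_values_length for p in position_ids]
position_ids = [p + 2 for p in position_ids]  # offset used to mimic paddlenlp
print(position_ids)  # [2, 3, 4, 5]
```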

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder

Bases: Module

This class represents an MSErnieMEncoder, which is a multi-layer transformer-based encoder model for natural language processing tasks.

The MSErnieMEncoder inherits from the nn.Module class and is designed to process input embeddings and generate hidden states, attentions, and last hidden state output.

ATTRIBUTE DESCRIPTION
config

The configuration object that contains the model's hyperparameters and settings.

TYPE: object

layers

A list of MSErnieMEncoderLayer instances that make up the layers of the encoder.

TYPE: ModuleList

METHOD DESCRIPTION
__init__

Initializes a new MSErnieMEncoder instance with the given configuration.

forward

Constructs the MSErnieMEncoder model by processing the input embeddings and generating the desired outputs.

Args:

  • input_embeds (mindspore.Tensor): The input embeddings for the model.
  • attention_mask (Optional[mindspore.Tensor], optional): The attention mask tensor to mask certain positions. Defaults to None.
  • head_mask (Optional[mindspore.Tensor], optional): The head mask tensor to mask certain heads. Defaults to None.
  • past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]], optional): The cached key-value tensors from previous decoding steps. Defaults to None.
  • output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.
  • output_hidden_states (Optional[bool], optional): Whether to output hidden states. Defaults to False.

Returns:

  • Tuple[mindspore.Tensor]: A tuple containing the last hidden state, hidden states, and attentions (if enabled).
Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py
class MSErnieMEncoder(nn.Module):

    """
    This class represents an MSErnieMEncoder, which is a multi-layer transformer-based encoder model for
    natural language processing tasks.

    The MSErnieMEncoder inherits from the nn.Module class and is designed to process input embeddings and generate
    hidden states, attentions, and last hidden state output.

    Attributes:
        config (object): The configuration object that contains the model's hyperparameters and settings.
        layers (nn.ModuleList): A list of MSErnieMEncoderLayer instances that make up the layers of the encoder.

    Methods:
        __init__(self, config):
            Initializes a new MSErnieMEncoder instance with the given configuration.

        forward(self, input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False):
            Constructs the MSErnieMEncoder model by processing the input embeddings and generating the desired outputs.

            Args:

            - input_embeds (mindspore.Tensor): The input embeddings for the model.
            - attention_mask (Optional[mindspore.Tensor], optional): The attention mask tensor to mask
            certain positions. Defaults to None.
            - head_mask (Optional[mindspore.Tensor], optional): The head mask tensor to mask certain heads.
            Defaults to None.
            - past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]], optional): The cached key-value tensors
            from previous decoding steps. Defaults to None.
            - output_attentions (Optional[bool], optional): Whether to output attention weights. Defaults to False.
            - output_hidden_states (Optional[bool], optional): Whether to output hidden states. Defaults to False.

            Returns:

            - Tuple[mindspore.Tensor]: A tuple containing the last hidden state, hidden states, and attentions (if enabled).

        """
    def __init__(self, config):
        """
        Initializes the MSErnieMEncoder class.

        Args:
            self: The object itself.
            config (object): An object containing the configuration parameters for the MSErnieMEncoder.
                The config object should have the following attributes:

                - num_hidden_layers (int): The number of hidden layers in the encoder.
                - other attributes specific to the MSErnieMEncoderLayer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.config = config
        self.layers = nn.ModuleList([MSErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])

    def forward(
        self,
        input_embeds: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        output_attentions: Optional[bool] = False,
        output_hidden_states: Optional[bool] = False,
    ) -> Tuple[mindspore.Tensor]:
        """
        Runs the forward pass of the MSErnieMEncoder over the input embeddings, applying the attention mask and
        head mask if provided.

        Args:
            self: The instance of the MSErnieMEncoder class.
            input_embeds (mindspore.Tensor): The input embeddings to be processed by the encoder.
            attention_mask (Optional[mindspore.Tensor]): An optional tensor representing the attention mask.
                If provided, it restricts the attention of the encoder.
            head_mask (Optional[mindspore.Tensor]): An optional tensor representing the head mask.
                If provided, it restricts the attention heads of the encoder.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]): An optional tuple of past key values,
                if provided, it allows the encoder to reuse previously computed key value states.
            output_attentions (Optional[bool]): An optional boolean indicating whether to output attentions.
                Default is False.
            output_hidden_states (Optional[bool]): An optional boolean indicating whether to output hidden states.
                Default is False.

        Returns:
            Tuple[mindspore.Tensor]: A tuple containing the processed output tensor.

        Raises:
            None.
        """
        hidden_states = () if output_hidden_states else None
        attentions = () if output_attentions else None

        output = input_embeds
        if output_hidden_states:
            hidden_states = hidden_states + (output,)
        for i, layer in enumerate(self.layers):
            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[i] if past_key_values is not None else None

            output, opt_attn_weights = layer(
                hidden_states=output,
                attention_mask=attention_mask,
                head_mask=layer_head_mask,
                past_key_value=past_key_value,
            )

            if output_hidden_states:
                hidden_states = hidden_states + (output,)
            if output_attentions:
                attentions = attentions + (opt_attn_weights,)

        last_hidden_state = output
        return tuple(v for v in [last_hidden_state, hidden_states, attentions] if v is not None)
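
For reference, the return statement above packs the outputs in a fixed order. The sketch below (reusing encoder and input_embeds from the earlier example) spells out which entries are present for each flag combination; whether the attention entries hold actual weight tensors depends on MSErnieMEncoderLayer, whose implementation is not shown here, so treat those lines as an assumption.

out = encoder.forward(input_embeds)
# -> (last_hidden_state,)

out = encoder.forward(input_embeds, output_hidden_states=True)
# -> (last_hidden_state, hidden_states)

out = encoder.forward(input_embeds, output_attentions=True)
# -> (last_hidden_state, attentions), assuming each layer returns its attention weights

out = encoder.forward(input_embeds, output_hidden_states=True, output_attentions=True)
# -> (last_hidden_state, hidden_states, attentions)
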

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder.__init__(config)

Initializes the MSErnieMEncoder class.

PARAMETER DESCRIPTION
self

The object itself.

config

An object containing the configuration parameters for the MSErnieMEncoder. The config object should have the following attributes:

  • num_hidden_layers (int): The number of hidden layers in the encoder.
  • other attributes specific to the MSErnieMEncoderLayer.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/ernie_m/modeling_graph_ernie_m.py (lines 629-649)
def __init__(self, config):
    """
    Initializes the MSErnieMEncoder class.

    Args:
        self: The object itself.
        config (object): An object containing the configuration parameters for the MSErnieMEncoder.
            The config object should have the following attributes:

            - num_hidden_layers (int): The number of hidden layers in the encoder.
            - other attributes specific to the MSErnieMEncoderLayer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.config = config
    self.layers = nn.ModuleList([MSErnieMEncoderLayer(config) for _ in range(config.num_hidden_layers)])
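
A quick sanity check of the constructor's behaviour, reusing the hypothetical small config from the earlier sketch: the encoder is simply a stack of config.num_hidden_layers identical layers.

config = ErnieMConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=4, intermediate_size=128)
encoder = MSErnieMEncoder(config)
# One MSErnieMEncoderLayer per hidden layer, held in an nn.ModuleList.
assert len(encoder.layers) == config.num_hidden_layers
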

mindnlp.transformers.models.ernie_m.modeling_graph_ernie_m.MSErnieMEncoder.forward(input_embeds, attention_mask=None, head_mask=None, past_key_values=None, output_attentions=False, output_hidden_states=False)

Runs the forward pass of the MSErnieMEncoder over the input embeddings, applying the attention mask and head mask if provided.

PARAMETER DESCRIPTION
self

The instance of the MSErnieMEncoder class.

input_embeds

The input embeddings to be processed by the encoder.

TYPE: Tensor

attention_mask

An optional tensor representing the attention mask. If provided, it restricts the attention of the encoder.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

An optional tensor representing the head mask. If provided, it restricts the attention heads of the encoder.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

An optional tuple of past key values, if provided, it allows the encoder to reuse previously computed key value states.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

output_attentions

An optional boolean indicating whether to output attentions. Default is False.

TYPE: Optional[bool] DEFAULT: False