bigbird_pegasus

mindnlp.transformers.models.bigbird_pegasus.configuration_bigbird_pegasus.BigBirdPegasusConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [BigBirdPegasusModel]. It is used to instantiate a BigBirdPegasus model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the BigBirdPegasus google/bigbird-pegasus-large-arxiv architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the BigBirdPegasus model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [BigBirdPegasusModel].

TYPE: `int`, *optional*, defaults to 96103 DEFAULT: 96103

d_model

Dimension of the layers and the pooler layer.

TYPE: `int`, *optional*, defaults to 1024 DEFAULT: 1024

encoder_layers

Number of encoder layers.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

decoder_layers

Number of decoder layers.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

encoder_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

decoder_attention_heads

Number of attention heads for each attention layer in the Transformer decoder.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

decoder_ffn_dim

Dimension of the "intermediate" (often named feed-forward) layer in decoder.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

encoder_ffn_dim

Dimension of the "intermediate" (often named feed-forward) layer in encoder.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

activation_function

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `function`, *optional*, defaults to `"gelu_new"` DEFAULT: 'gelu_new'

dropout

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_dropout

The dropout ratio for the attention probabilities.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

activation_dropout

The dropout ratio for activations inside the fully connected layer.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

classifier_dropout

The dropout ratio for the classifier.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

max_position_embeddings

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 1024 or 2048 or 4096).

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

init_std

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

encoder_layerdrop

The LayerDrop probability for the encoder. See the LayerDrop paper for more details.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

decoder_layerdrop

The LayerDrop probability for the decoder. See the LayerDrop paper for more details.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

use_cache

Whether or not the model should return the last key/values attentions (not used by all models).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

Example
>>> from transformers import BigBirdPegasusConfig, BigBirdPegasusModel
...
>>> # Initializing a BigBirdPegasus bigbird-pegasus-base style configuration
>>> configuration = BigBirdPegasusConfig()
...
>>> # Initializing a model (with random weights) from the bigbird-pegasus-base style configuration
>>> model = BigBirdPegasusModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
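
The block-sparse options documented in the source below (`attention_type`, `block_size`, `num_random_blocks`, `use_bias`) are accepted as regular constructor arguments. A minimal sketch of configuring them (illustrative values only; `attention_type` accepts `"original_full"` or `"block_sparse"`):

```python
>>> from transformers import BigBirdPegasusConfig, BigBirdPegasusModel
...
>>> # Fall back to full O(n^2) attention in the encoder
>>> full_attention_config = BigBirdPegasusConfig(attention_type="original_full")
...
>>> # Keep the default block-sparse O(n) attention and tune its sparsity pattern
>>> sparse_config = BigBirdPegasusConfig(
...     attention_type="block_sparse",
...     block_size=64,         # size of each attention block
...     num_random_blocks=3,   # random blocks attended to by each query
...     use_bias=False,        # no bias in query, key, value projections
... )
>>> model = BigBirdPegasusModel(sparse_config)
```
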
Source code in mindnlp/transformers/models/bigbird_pegasus/configuration_bigbird_pegasus.py
class BigBirdPegasusConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`BigBirdPegasusModel`]. It is used to instantiate
    a BigBirdPegasus model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the BigBirdPegasus
    [google/bigbird-pegasus-large-arxiv](https://hf-mirror.com/google/bigbird-pegasus-large-arxiv) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 96103):
            Vocabulary size of the BigBirdPegasus model. Defines the number of different tokens that can be represented
            by the `inputs_ids` passed when calling [`BigBirdPegasusModel`].
        d_model (`int`, *optional*, defaults to 1024):
            Dimension of the layers and the pooler layer.
        encoder_layers (`int`, *optional*, defaults to 16):
            Number of encoder layers.
        decoder_layers (`int`, *optional*, defaults to 16):
            Number of decoder layers.
        encoder_attention_heads (`int`, *optional*, defaults to 16):
            Number of attention heads for each attention layer in the Transformer encoder.
        decoder_attention_heads (`int`, *optional*, defaults to 16):
            Number of attention heads for each attention layer in the Transformer decoder.
        decoder_ffn_dim (`int`, *optional*, defaults to 4096):
            Dimension of the "intermediate" (often named feed-forward) layer in decoder.
        encoder_ffn_dim (`int`, *optional*, defaults to 4096):
            Dimension of the "intermediate" (often named feed-forward) layer in encoder.
        activation_function (`str` or `function`, *optional*, defaults to `"gelu_new"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        activation_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for activations inside the fully connected layer.
        classifier_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for classifier.
        max_position_embeddings (`int`, *optional*, defaults to 4096):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 1024 or 2048 or 4096).
        init_std (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
            The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
            for more details.
        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
            The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
            for more details.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models).
        attention_type (`str`, *optional*, defaults to `"block_sparse"`):
            Whether to use block sparse attention (with O(n) complexity), as introduced in the BigBird paper, or the
            original attention layer (with O(n^2) complexity) in the encoder. Possible values are `"original_full"`
            and `"block_sparse"`.
        use_bias (`bool`, *optional*, defaults to `False`):
            Whether to use bias in query, key, value.
        block_size (`int`, *optional*, defaults to 64):
            Size of each block. Useful only when `attention_type == "block_sparse"`.
        num_random_blocks (`int`, *optional*, defaults to 3):
            The number of random blocks each query attends to. Useful only when `attention_type == "block_sparse"`.
        scale_embedding (`bool`, *optional*, defaults to `True`):
            Whether to rescale embeddings with (hidden_size ** 0.5).

    Example:
        ```python
        >>> from transformers import BigBirdPegasusConfig, BigBirdPegasusModel
        ...
        >>> # Initializing a BigBirdPegasus bigbird-pegasus-base style configuration
        >>> configuration = BigBirdPegasusConfig()
        ...
        >>> # Initializing a model (with random weights) from the bigbird-pegasus-base style configuration
        >>> model = BigBirdPegasusModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "bigbird_pegasus"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {
        "num_attention_heads": "encoder_attention_heads",
        "hidden_size": "d_model",
        "attention_probs_dropout_prob": "attention_dropout",
    }

    def __init__(
        self,
        vocab_size=96103,
        max_position_embeddings=4096,
        encoder_layers=16,
        encoder_ffn_dim=4096,
        encoder_attention_heads=16,
        decoder_layers=16,
        decoder_ffn_dim=4096,
        decoder_attention_heads=16,
        encoder_layerdrop=0.0,
        decoder_layerdrop=0.0,
        use_cache=True,
        is_encoder_decoder=True,
        activation_function="gelu_new",
        d_model=1024,
        dropout=0.1,
        attention_dropout=0.0,
        activation_dropout=0.0,
        init_std=0.02,
        decoder_start_token_id=2,
        classifier_dropout=0.0,
        scale_embedding=True,
        pad_token_id=0,
        bos_token_id=2,
        eos_token_id=1,
        attention_type="block_sparse",  # only for encoder
        block_size=64,
        num_random_blocks=3,
        use_bias=False,
        **kwargs,
    ):
        """
        Initializes a new instance of the BigBirdPegasusConfig class.

        Args:
            self: The instance of the class.
            vocab_size (int, optional): The size of the vocabulary. Defaults to 96103.
            max_position_embeddings (int, optional): The maximum number of positional embeddings. Defaults to 4096.
            encoder_layers (int, optional): The number of encoder layers. Defaults to 16.
            encoder_ffn_dim (int, optional): The dimension of the encoder feed-forward network. Defaults to 4096.
            encoder_attention_heads (int, optional): The number of attention heads in the encoder. Defaults to 16.
            decoder_layers (int, optional): The number of decoder layers. Defaults to 16.
            decoder_ffn_dim (int, optional): The dimension of the decoder feed-forward network. Defaults to 4096.
            decoder_attention_heads (int, optional): The number of attention heads in the decoder. Defaults to 16.
            encoder_layerdrop (float, optional): The probability of dropping an encoder layer. Defaults to 0.0.
            decoder_layerdrop (float, optional): The probability of dropping a decoder layer. Defaults to 0.0.
            use_cache (bool, optional): Whether to use cache. Defaults to True.
            is_encoder_decoder (bool, optional): Whether the model is an encoder-decoder. Defaults to True.
            activation_function (str, optional): The activation function to be used. Defaults to 'gelu_new'.
            d_model (int, optional): The model dimension. Defaults to 1024.
            dropout (float, optional): The dropout probability. Defaults to 0.1.
            attention_dropout (float, optional): The dropout probability for attention layers. Defaults to 0.0.
            activation_dropout (float, optional): The dropout probability for activation layers. Defaults to 0.0.
            init_std (float, optional): The standard deviation for weight initialization. Defaults to 0.02.
            decoder_start_token_id (int, optional): The start token id for the decoder. Defaults to 2.
            classifier_dropout (float, optional): The dropout probability for the classifier. Defaults to 0.0.
            scale_embedding (bool, optional): Whether to scale the embeddings. Defaults to True.
            pad_token_id (int, optional): The id for padding tokens. Defaults to 0.
            bos_token_id (int, optional): The id for the beginning of sequence token. Defaults to 2.
            eos_token_id (int, optional): The id for the end of sequence token. Defaults to 1.
            attention_type (str, optional): The type of attention mechanism. Defaults to 'block_sparse'.
            block_size (int, optional): The size of blocks for block_sparse attention. Defaults to 64.
            num_random_blocks (int, optional): The number of random blocks for block_sparse attention. Defaults to 3.
            use_bias (bool, optional): Whether to use bias. Defaults to False.

        Returns:
            None.

        Raises:
            None.
        """
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.d_model = d_model
        self.encoder_ffn_dim = encoder_ffn_dim
        self.encoder_layers = encoder_layers
        self.encoder_attention_heads = encoder_attention_heads
        self.decoder_ffn_dim = decoder_ffn_dim
        self.decoder_layers = decoder_layers
        self.decoder_attention_heads = decoder_attention_heads
        self.dropout = dropout
        self.attention_dropout = attention_dropout
        self.activation_dropout = activation_dropout
        self.activation_function = activation_function
        self.init_std = init_std
        self.encoder_layerdrop = encoder_layerdrop
        self.decoder_layerdrop = decoder_layerdrop
        self.classifier_dropout = classifier_dropout
        self.use_cache = use_cache
        self.num_hidden_layers = encoder_layers
        self.scale_embedding = scale_embedding  # scale factor will be sqrt(d_model) if True

        # extra config
        self.attention_type = attention_type
        self.block_size = block_size
        self.num_random_blocks = num_random_blocks
        self.use_bias = use_bias

        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            is_encoder_decoder=is_encoder_decoder,
            decoder_start_token_id=decoder_start_token_id,
            **kwargs,
        )

mindnlp.transformers.models.bigbird_pegasus.configuration_bigbird_pegasus.BigBirdPegasusConfig.__init__(vocab_size=96103, max_position_embeddings=4096, encoder_layers=16, encoder_ffn_dim=4096, encoder_attention_heads=16, decoder_layers=16, decoder_ffn_dim=4096, decoder_attention_heads=16, encoder_layerdrop=0.0, decoder_layerdrop=0.0, use_cache=True, is_encoder_decoder=True, activation_function='gelu_new', d_model=1024, dropout=0.1, attention_dropout=0.0, activation_dropout=0.0, init_std=0.02, decoder_start_token_id=2, classifier_dropout=0.0, scale_embedding=True, pad_token_id=0, bos_token_id=2, eos_token_id=1, attention_type='block_sparse', block_size=64, num_random_blocks=3, use_bias=False, **kwargs)

Initializes a new instance of the BigBirdPegasusConfig class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Defaults to 96103.

TYPE: int DEFAULT: 96103

max_position_embeddings

The maximum number of positional embeddings. Defaults to 4096.

TYPE: int DEFAULT: 4096

encoder_layers

The number of encoder layers. Defaults to 16.

TYPE: int DEFAULT: 16

encoder_ffn_dim

The dimension of the encoder feed-forward network. Defaults to 4096.

TYPE: int DEFAULT: 4096

encoder_attention_heads

The number of attention heads in the encoder. Defaults to 16.

TYPE: int DEFAULT: 16

decoder_layers

The number of decoder layers. Defaults to 16.

TYPE: int DEFAULT: 16

decoder_ffn_dim

The dimension of the decoder feed-forward network. Defaults to 4096.

TYPE: int DEFAULT: 4096

decoder_attention_heads

The number of attention heads in the decoder. Defaults to 16.

TYPE: int DEFAULT: 16

encoder_layerdrop

The probability of dropping an encoder layer. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

decoder_layerdrop

The probability of dropping a decoder layer. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

use_cache

Whether to use cache. Defaults to True.

TYPE: bool DEFAULT: True

is_encoder_decoder

Whether the model is an encoder-decoder. Defaults to True.

TYPE: bool DEFAULT: True

activation_function

The activation function to be used. Defaults to 'gelu_new'.

TYPE: str DEFAULT: 'gelu_new'

d_model

The model dimension. Defaults to 1024.

TYPE: int DEFAULT: 1024

dropout

The dropout probability. Defaults to 0.1.

TYPE: float DEFAULT: 0.1

attention_dropout

The dropout probability for attention layers. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

activation_dropout

The dropout probability for activation layers. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

init_std

The standard deviation for weight initialization. Defaults to 0.02.

TYPE: float DEFAULT: 0.02

decoder_start_token_id

The start token id for the decoder. Defaults to 2.

TYPE: int DEFAULT: 2

classifier_dropout

The dropout probability for the classifier. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

scale_embedding

Whether to scale the embeddings. Defaults to True.

TYPE: bool DEFAULT: True

pad_token_id

The id for padding tokens. Defaults to 0.

TYPE: int DEFAULT: 0

bos_token_id

The id for the beginning of sequence token. Defaults to 2.

TYPE: int DEFAULT: 2

eos_token_id

The id for the end of sequence token. Defaults to 1.

TYPE: int DEFAULT: 1

attention_type

The type of attention mechanism. Defaults to 'block_sparse'.

TYPE: str DEFAULT: 'block_sparse'

block_size

The size of blocks for block_sparse attention. Defaults to 64.

TYPE: int DEFAULT: 64

num_random_blocks

The number of random blocks for block_sparse attention. Defaults to 3.

TYPE: int DEFAULT: 3

use_bias

Whether to use bias. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.
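
Because the class defines an attribute_map (shown in the source above), the generic attribute names `hidden_size`, `num_attention_heads` and `attention_probs_dropout_prob` resolve to `d_model`, `encoder_attention_heads` and `attention_dropout`. A small illustrative sketch, assuming the standard PretrainedConfig attribute aliasing and the default values listed above:

```python
>>> from transformers import BigBirdPegasusConfig
...
>>> config = BigBirdPegasusConfig()
...
>>> # Aliases declared in attribute_map resolve to the underlying fields
>>> config.hidden_size == config.d_model                             # 1024
True
>>> config.num_attention_heads == config.encoder_attention_heads     # 16
True
>>> config.attention_probs_dropout_prob == config.attention_dropout  # 0.0
True
```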

Source code in mindnlp/transformers/models/bigbird_pegasus/configuration_bigbird_pegasus.py
def __init__(
    self,
    vocab_size=96103,
    max_position_embeddings=4096,
    encoder_layers=16,
    encoder_ffn_dim=4096,
    encoder_attention_heads=16,
    decoder_layers=16,
    decoder_ffn_dim=4096,
    decoder_attention_heads=16,
    encoder_layerdrop=0.0,
    decoder_layerdrop=0.0,
    use_cache=True,
    is_encoder_decoder=True,
    activation_function="gelu_new",
    d_model=1024,
    dropout=0.1,
    attention_dropout=0.0,
    activation_dropout=0.0,
    init_std=0.02,
    decoder_start_token_id=2,
    classifier_dropout=0.0,
    scale_embedding=True,
    pad_token_id=0,
    bos_token_id=2,
    eos_token_id=1,
    attention_type="block_sparse",  # only for encoder
    block_size=64,
    num_random_blocks=3,
    use_bias=False,
    **kwargs,
):
    """
    Initializes a new instance of the BigBirdPegasusConfig class.

    Args:
        self: The instance of the class.
        vocab_size (int, optional): The size of the vocabulary. Defaults to 96103.
        max_position_embeddings (int, optional): The maximum number of positional embeddings. Defaults to 4096.
        encoder_layers (int, optional): The number of encoder layers. Defaults to 16.
        encoder_ffn_dim (int, optional): The dimension of the encoder feed-forward network. Defaults to 4096.
        encoder_attention_heads (int, optional): The number of attention heads in the encoder. Defaults to 16.
        decoder_layers (int, optional): The number of decoder layers. Defaults to 16.
        decoder_ffn_dim (int, optional): The dimension of the decoder feed-forward network. Defaults to 4096.
        decoder_attention_heads (int, optional): The number of attention heads in the decoder. Defaults to 16.
        encoder_layerdrop (float, optional): The probability of dropping an encoder layer. Defaults to 0.0.
        decoder_layerdrop (float, optional): The probability of dropping a decoder layer. Defaults to 0.0.
        use_cache (bool, optional): Whether to use cache. Defaults to True.
        is_encoder_decoder (bool, optional): Whether the model is an encoder-decoder. Defaults to True.
        activation_function (str, optional): The activation function to be used. Defaults to 'gelu_new'.
        d_model (int, optional): The model dimension. Defaults to 1024.
        dropout (float, optional): The dropout probability. Defaults to 0.1.
        attention_dropout (float, optional): The dropout probability for attention layers. Defaults to 0.0.
        activation_dropout (float, optional): The dropout probability for activation layers. Defaults to 0.0.
        init_std (float, optional): The standard deviation for weight initialization. Defaults to 0.02.
        decoder_start_token_id (int, optional): The start token id for the decoder. Defaults to 2.
        classifier_dropout (float, optional): The dropout probability for the classifier. Defaults to 0.0.
        scale_embedding (bool, optional): Whether to scale the embeddings. Defaults to True.
        pad_token_id (int, optional): The id for padding tokens. Defaults to 0.
        bos_token_id (int, optional): The id for the beginning of sequence token. Defaults to 2.
        eos_token_id (int, optional): The id for the end of sequence token. Defaults to 1.
        attention_type (str, optional): The type of attention mechanism. Defaults to 'block_sparse'.
        block_size (int, optional): The size of blocks for block_sparse attention. Defaults to 64.
        num_random_blocks (int, optional): The number of random blocks for block_sparse attention. Defaults to 3.
        use_bias (bool, optional): Whether to use bias. Defaults to False.

    Returns:
        None.

    Raises:
        None.
    """
    self.vocab_size = vocab_size
    self.max_position_embeddings = max_position_embeddings
    self.d_model = d_model
    self.encoder_ffn_dim = encoder_ffn_dim
    self.encoder_layers = encoder_layers
    self.encoder_attention_heads = encoder_attention_heads
    self.decoder_ffn_dim = decoder_ffn_dim
    self.decoder_layers = decoder_layers
    self.decoder_attention_heads = decoder_attention_heads
    self.dropout = dropout
    self.attention_dropout = attention_dropout
    self.activation_dropout = activation_dropout
    self.activation_function = activation_function
    self.init_std = init_std
    self.encoder_layerdrop = encoder_layerdrop
    self.decoder_layerdrop = decoder_layerdrop
    self.classifier_dropout = classifier_dropout
    self.use_cache = use_cache
    self.num_hidden_layers = encoder_layers
    self.scale_embedding = scale_embedding  # scale factor will be sqrt(d_model) if True

    # extra config
    self.attention_type = attention_type
    self.block_size = block_size
    self.num_random_blocks = num_random_blocks
    self.use_bias = use_bias

    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        is_encoder_decoder=is_encoder_decoder,
        decoder_start_token_id=decoder_start_token_id,
        **kwargs,
    )

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM

Bases: BigBirdPegasusPreTrainedModel

The BigBirdPegasusForCausalLM class represents a BigBird Pegasus model for causal language modeling tasks. It inherits from the BigBirdPegasusPreTrainedModel class.

The class initializes the model with the provided configuration and defines methods for getting and setting input and output embeddings, setting the decoder, and forwarding the model for generation. Additionally, it provides methods for preparing inputs for generation and reordering cache for beam search.

The forward method processes the input data for the model and returns the model outputs. The prepare_inputs_for_generation method prepares input data for generation, and the _reorder_cache method reorders the cache for beam search.

The class also includes detailed documentation for the input and output parameters of the forward method, providing information on the usage and functionality of each parameter.

Example usage of the BigBirdPegasusForCausalLM class is provided in the docstring, demonstrating how to initialize the model and generate predictions.
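
The following is a minimal end-to-end sketch of autoregressive generation with this class, assuming the checkpoint used in the docstring example below and that the standard generate API of the generation mixin is available on this model (the exact `return_tensors` value may differ depending on the backend):

```python
>>> from transformers import AutoTokenizer, BigBirdPegasusForCausalLM
...
>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
>>> model = BigBirdPegasusForCausalLM.from_pretrained(
...     "google/bigbird-pegasus-large-arxiv", add_cross_attention=False
... )
...
>>> # prepare_inputs_for_generation feeds only the last token once
>>> # past_key_values are cached, so decoding stays incremental
>>> inputs = tokenizer("The attention mechanism of BigBird", return_tensors="pt")
>>> generated_ids = model.generate(**inputs, max_new_tokens=20, use_cache=True)
>>> print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
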

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
class BigBirdPegasusForCausalLM(BigBirdPegasusPreTrainedModel):

    """
    The `BigBirdPegasusForCausalLM` class represents a BigBird Pegasus model for causal language modeling tasks.
    It inherits from the `BigBirdPegasusPreTrainedModel` class.

    The class initializes the model with the provided configuration and defines methods for getting and
    setting input and output embeddings, setting the decoder, and forwarding the model for generation.
    Additionally, it provides methods for preparing inputs for generation and reordering cache for beam search.

    The `forward` method processes the input data for the model and returns the model outputs.
    The `prepare_inputs_for_generation` method prepares input data for generation, and the `_reorder_cache` method
    reorders the cache for beam search.

    The class also includes detailed documentation for the input and output parameters of the `forward` method,
    providing information on the usage and functionality of each parameter.

    Example usage of the `BigBirdPegasusForCausalLM` class is provided in the docstring, demonstrating how to
    initialize the model and generate predictions.

    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        """
        Initializes the BigBirdPegasusForCausalLM class.

        Args:
            self: The instance of the class.
            config: A configuration object containing the model's configuration parameters.
                It is expected to be a dictionary or an object that can be deep-copied.
                It should include the necessary parameters for initializing the model.
                The 'is_decoder' and 'is_encoder_decoder' attributes will be modified within this method.

        Returns:
            None.

        Raises:
            AttributeError: If the 'config' parameter is missing required attributes.
            TypeError: If the 'config' parameter is not of the expected type.
            ValueError: If the 'config' parameter contains invalid values.
        """
        config = copy.deepcopy(config)
        config.is_decoder = True
        config.is_encoder_decoder = False
        super().__init__(config)
        self.model = BigBirdPegasusDecoderWrapper(config)

        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        Method: get_input_embeddings

        Description:
            Returns the input embeddings used by the BigBirdPegasusForCausalLM model's decoder.

        Args:
            self (object): The instance of the BigBirdPegasusForCausalLM class.

        Returns:
            The token embedding layer (`embed_tokens`) of the model's decoder.

        Raises:
            None
        """
        return self.model.decoder.embed_tokens

    def set_input_embeddings(self, value):
        """
        Sets the input embeddings for the BigBirdPegasusForCausalLM model.

        Args:
            self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
            value: The new input embedding module (for example `nn.Embedding`) to be used by the model's decoder.

        Returns:
            None.

        Raises:
            None.

        This method sets the input embeddings for the BigBirdPegasusForCausalLM model.
        It assigns the given 'value' to the 'embed_tokens' attribute of the decoder in the model.
        The 'embed_tokens' attribute represents the embedding layer used for token inputs in the decoder.
        By setting the input embeddings, the model will use the provided embeddings during inference and decoding.

        Note:
            The weight of the provided embedding module should have shape (vocab_size, embedding_dim),
            where 'vocab_size' is the size of the vocabulary and 'embedding_dim' is the dimensionality of the
            embedding space.
        """
        self.model.decoder.embed_tokens = value

    def get_output_embeddings(self):
        """
        Method to retrieve the output embeddings from the BigBirdPegasusForCausalLM model.

        Args:
            self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
                This parameter refers to the current instance of the model.

        Returns:
            lm_head: The method returns the 'lm_head' attribute of the model, which represents the output embeddings.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """Set the output embeddings for the BigBirdPegasusForCausalLM model.

        Args:
            self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
            new_embeddings (Any): The new embeddings to be set for the output layer.

        Returns:
            None:
                This method updates the lm_head attribute of the BigBirdPegasusForCausalLM instance with the new embeddings.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def set_decoder(self, decoder):
        """
        Sets the decoder for the BigBirdPegasusForCausalLM model.

        Args:
            self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
            decoder: The decoder object to be set for the model. It should be of the appropriate type.

        Returns:
            None.

        Raises:
            None.
        """
        self.model.decoder = decoder

    def get_decoder(self):
        """
        Retrieve the decoder component from the BigBirdPegasusForCausalLM model.

        Args:
            self (object): Instance of the BigBirdPegasusForCausalLM class.
                This parameter is required to access the model attributes.

        Returns:
            The decoder component of the model, which is responsible for generating the output sequences.

        Raises:
            No specific exceptions are raised by this method.
        """
        return self.model.decoder

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, CausalLMOutputWithCrossAttentions]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
                Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
                provide it.

                Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

                [What are attention masks?](../glossary#attention-mask)
            encoder_hidden_states  (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
                Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
                if the model is configured as a decoder.
            encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used
                in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
            head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.
            cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.
            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
                shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of
                shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. The two additional
                tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

                Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
                cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

                If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
                config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
                (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
                for more detail.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

        Returns:
            Union[Tuple, CausalLMOutputWithCrossAttentions]

        Example:
            ```python
            >>> from transformers import AutoTokenizer, BigBirdPegasusForCausalLM
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
            >>> model = BigBirdPegasusForCausalLM.from_pretrained(
            ...     "google/bigbird-pegasus-large-arxiv", add_cross_attention=False
            ... )
            >>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
            >>> outputs = model(**inputs)
            ...
            >>> logits = outputs.logits
            ```
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
        outputs = self.model.decoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            head_mask=head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        logits = self.lm_head(outputs[0])

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[1:]
            return (loss,) + output if loss is not None else output

        return CausalLMOutputWithCrossAttentions(
            loss=loss,
            logits=logits,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            cross_attentions=outputs.cross_attentions,
        )

    def prepare_inputs_for_generation(
        self, input_ids, past_key_values=None, attention_mask=None, use_cache=None, **kwargs
    ):
        """
        Prepare inputs for generation.

        Args:
            self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
            input_ids (mindspore.Tensor): The input tensor of shape (batch_size, sequence_length).
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                Optional tuple of past key and value tensors.
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
                If not provided, it will be initialized with ones.
            use_cache (Optional[bool]): Whether to use cache for faster decoding.

        Returns:
            Dict[str, Union[mindspore.Tensor, Tuple[mindspore.Tensor], bool]]:
                A dictionary containing the following items:

                - 'input_ids' (mindspore.Tensor): The input tensor.
                - 'attention_mask' (mindspore.Tensor): The attention mask tensor.
                - 'past_key_values' (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                Optional tuple of past key and value tensors.
                - 'use_cache' (Optional[bool]): Whether to use cache for faster decoding.

        Raises:
            None.
        """
        # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
        if attention_mask is None:
            attention_mask = ops.ones(*input_ids.shape, dtype=input_ids.dtype)

        if past_key_values:
            input_ids = input_ids[:, -1:]
        # first step, decoder_cached_states are empty
        return {
            "input_ids": input_ids,  # encoder_outputs is defined. input_ids not needed
            "attention_mask": attention_mask,
            "past_key_values": past_key_values,
            "use_cache": use_cache,
        }

    @staticmethod
    def _reorder_cache(past_key_values, beam_idx):
        """
        Method to reorder the cache values according to the given beam index.

        Args:
            past_key_values (tuple): A tuple containing the past key values for each layer.
            beam_idx (Tensor): A tensor representing the indices of beams.

        Returns:
            tuple: The past key values reordered along the beam dimension according to `beam_idx`.

        Raises:
            None
        """
        reordered_past = ()
        for layer_past in past_key_values:
            reordered_past += (
                tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),
            )
        return reordered_past
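
For reference, `_reorder_cache` above gathers every cached key/value tensor along the beam (batch) dimension with `index_select`. Below is a small standalone sketch of the same reordering applied to dummy tensors (hypothetical shapes, using `mindspore.ops` directly rather than the model):

```python
>>> import mindspore
>>> from mindspore import ops
...
>>> # Two layers of dummy (key, value) caches with shape
>>> # (batch_size * num_beams, num_heads, seq_len, head_dim) = (4, 2, 5, 8)
>>> past_key_values = tuple(
...     (ops.randn(4, 2, 5, 8), ops.randn(4, 2, 5, 8)) for _ in range(2)
... )
>>> beam_idx = mindspore.Tensor([2, 2, 0, 1], mindspore.int32)
...
>>> # Same logic as _reorder_cache: gather along dim 0 for every layer
>>> reordered = tuple(
...     tuple(past_state.index_select(0, beam_idx) for past_state in layer_past)
...     for layer_past in past_key_values
... )
>>> reordered[0][0].shape
(4, 2, 5, 8)
```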

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.__init__(config)

Initializes the BigBirdPegasusForCausalLM class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

A configuration object containing the model's configuration parameters. It is expected to be a dictionary or an object that can be deep-copied. It should include the necessary parameters for initializing the model. The 'is_decoder' and 'is_encoder_decoder' attributes will be modified within this method.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
AttributeError

If the 'config' parameter is missing required attributes.

TypeError

If the 'config' parameter is not of the expected type.

ValueError

If the 'config' parameter contains invalid values.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def __init__(self, config):
    """
    Initializes the BigBirdPegasusForCausalLM class.

    Args:
        self: The instance of the class.
        config: A configuration object containing the model's configuration parameters.
            It is expected to be a dictionary or an object that can be deep-copied.
            It should include the necessary parameters for initializing the model.
            The 'is_decoder' and 'is_encoder_decoder' attributes will be modified within this method.

    Returns:
        None.

    Raises:
        AttributeError: If the 'config' parameter is missing required attributes.
        TypeError: If the 'config' parameter is not of the expected type.
        ValueError: If the 'config' parameter contains invalid values.
    """
    config = copy.deepcopy(config)
    config.is_decoder = True
    config.is_encoder_decoder = False
    super().__init__(config)
    self.model = BigBirdPegasusDecoderWrapper(config)

    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, head_mask=None, cross_attn_head_mask=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)` DEFAULT: None

attention_mask

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

What are attention masks?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

encoder_hidden_states

Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional* DEFAULT: None

encoder_attention_mask

Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

head_mask

Mask to nullify selected heads of the attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

cross_attn_head_mask

Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

past_key_values

Tuple of tuple(mindspore.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). The two additional tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

TYPE: `tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: None

labels

Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, CausalLMOutputWithCrossAttentions]

Union[Tuple, CausalLMOutputWithCrossAttentions]

Example
>>> from transformers import AutoTokenizer, BigBirdPegasusForCausalLM
...
>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
>>> model = BigBirdPegasusForCausalLM.from_pretrained(
...     "google/bigbird-pegasus-large-arxiv", add_cross_attention=False
... )
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
...
>>> logits = outputs.logits
Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, CausalLMOutputWithCrossAttentions]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
            provide it.

            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        encoder_hidden_states  (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
            if the model is configured as a decoder.
        encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used
            in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
        head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.
        cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.
        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
            shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of
            shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. The two additional
            tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
            cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
            that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
            all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
            config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
            (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
            for more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

    Returns:
        Union[Tuple, CausalLMOutputWithCrossAttentions]

    Example:
        ```python
        >>> from mindnlp.transformers import AutoTokenizer, BigBirdPegasusForCausalLM
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
        >>> model = BigBirdPegasusForCausalLM.from_pretrained(
        ...     "google/bigbird-pegasus-large-arxiv", add_cross_attention=False
        ... )
        >>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
        >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
        >>> outputs = model(**inputs)
        ...
        >>> logits = outputs.logits
        ```
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
    outputs = self.model.decoder(
        input_ids=input_ids,
        attention_mask=attention_mask,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        head_mask=head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    logits = self.lm_head(outputs[0])

    loss = None
    if labels is not None:
        loss = F.cross_entropy(logits.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[1:]
        return (loss,) + output if loss is not None else output

    return CausalLMOutputWithCrossAttentions(
        loss=loss,
        logits=logits,
        past_key_values=outputs.past_key_values,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
        cross_attentions=outputs.cross_attentions,
    )

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.get_decoder()

Retrieve the decoder component from the BigBirdPegasusForCausalLM model.

PARAMETER DESCRIPTION
self

Instance of the BigBirdPegasusForCausalLM class. This parameter is required to access the model attributes.

TYPE: object

RETURNS DESCRIPTION
decoder

The decoder component of the model, responsible for generating the output sequences.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3898-3913
def get_decoder(self):
    """
    Retrieve the decoder component from the BigBirdPegasusForCausalLM model.

    Args:
        self (object): Instance of the BigBirdPegasusForCausalLM class.
            This parameter is required to access the model attributes.

    Returns:
        The decoder component of the model, responsible for generating the output sequences.

    Raises:
        None.
    """
    return self.model.decoder

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.get_input_embeddings()

Returns the input embeddings used by the BigBirdPegasusForCausalLM model's decoder.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class.

TYPE: object

RETURNS DESCRIPTION
embed_tokens

The input embedding layer of the decoder (model.decoder.embed_tokens).

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3805-3822
def get_input_embeddings(self):
    """
    Method: get_input_embeddings

    Description:
        Returns the input embeddings used by the BigBirdPegasusForCausalLM model's decoder.

    Args:
        self (object): The instance of the BigBirdPegasusForCausalLM class.

    Returns:
        The input embedding layer of the decoder (`model.decoder.embed_tokens`).

    Raises:
        None
    """
    return self.model.decoder.embed_tokens

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.get_output_embeddings()

Method to retrieve the output embeddings from the BigBirdPegasusForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class. This parameter refers to the current instance of the model.

TYPE: BigBirdPegasusForCausalLM

RETURNS DESCRIPTION
lm_head

The method returns the 'lm_head' attribute of the model, which represents the output embeddings.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3850-3864
def get_output_embeddings(self):
    """
    Method to retrieve the output embeddings from the BigBirdPegasusForCausalLM model.

    Args:
        self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
            This parameter refers to the current instance of the model.

    Returns:
        lm_head: The method returns the 'lm_head' attribute of the model, which represents the output embeddings.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.prepare_inputs_for_generation(input_ids, past_key_values=None, attention_mask=None, use_cache=None, **kwargs)

Prepare inputs for generation.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class.

TYPE: BigBirdPegasusForCausalLM

input_ids

The input tensor of shape (batch_size, sequence_length).

TYPE: Tensor

past_key_values

Optional tuple of past key and value tensors.

TYPE: Optional[Union[Tuple[Tensor], Tuple[Tensor, Tensor]]] DEFAULT: None

attention_mask

The attention mask tensor of shape (batch_size, sequence_length). If not provided, it will be initialized with ones.

TYPE: Optional[Tensor] DEFAULT: None

use_cache

Whether to use cache for faster decoding.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION

Dict[str, Union[mindspore.Tensor, Tuple[mindspore.Tensor], bool]]: A dictionary containing the following items:

  • 'input_ids' (mindspore.Tensor): The input tensor.
  • 'attention_mask' (mindspore.Tensor): The attention mask tensor.
  • 'past_key_values' (Optional[Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]]): Optional tuple of past key and value tensors.
  • 'use_cache' (Optional[bool]): Whether to use cache for faster decoding.
Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 4054-4094
def prepare_inputs_for_generation(
    self, input_ids, past_key_values=None, attention_mask=None, use_cache=None, **kwargs
):
    """
    Prepare inputs for generation.

    Args:
        self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
        input_ids (mindspore.Tensor): The input tensor of shape (batch_size, sequence_length).
        past_key_values (Optional[Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]]):
            Optional tuple of past key and value tensors.
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
            If not provided, it will be initialized with ones.
        use_cache (Optional[bool]): Whether to use cache for faster decoding.

    Returns:
        Dict[str, Union[mindspore.Tensor, Tuple[mindspore.Tensor], bool]]:
            A dictionary containing the following items:

            - 'input_ids' (mindspore.Tensor): The input tensor.
            - 'attention_mask' (mindspore.Tensor): The attention mask tensor.
            - 'past_key_values' (Optional[Union[Tuple[mindspore.Tensor], Tuple[mindspore.Tensor, mindspore.Tensor]]]):
              Optional tuple of past key and value tensors.
            - 'use_cache' (Optional[bool]): Whether to use cache for faster decoding.

    Raises:
        None.
    """
    # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
    if attention_mask is None:
        attention_mask = ops.ones(*input_ids.shape, dtype=input_ids.dtype)

    if past_key_values:
        input_ids = input_ids[:, -1:]
    # first step, decoder_cached_states are empty
    return {
        "input_ids": input_ids,  # encoder_outputs is defined. input_ids not needed
        "attention_mask": attention_mask,
        "past_key_values": past_key_values,
        "use_cache": use_cache,
    }
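
The trimming above only matters once a cache exists: on the first call the full prompt is passed, and afterwards only the newest token id needs to be fed to the decoder. The following is a minimal behavioural sketch of that rule, written with toy MindSpore tensors rather than calling the method itself:

```python
import mindspore
from mindspore import ops

input_ids = mindspore.Tensor([[101, 7592, 2026, 3899]], mindspore.int64)  # toy prompt ids
attention_mask = ops.ones(input_ids.shape, mindspore.int64)               # default mask of ones

# First call: no cache yet, the whole prompt goes through the decoder.
past_key_values = None

# Later calls: pretend the previous forward pass returned a cache.
past_key_values = (("key", "value"),)   # placeholder; the real entries are tensors
if past_key_values:
    input_ids = input_ids[:, -1:]       # keep only the last generated token

print(input_ids.shape)                  # (1, 1)
```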

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.set_decoder(decoder)

Sets the decoder for the BigBirdPegasusForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class.

TYPE: BigBirdPegasusForCausalLM

decoder

The decoder module to assign to the model, typically a BigBirdPegasusDecoder instance.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3882-3896
def set_decoder(self, decoder):
    """
    Sets the decoder for the BigBirdPegasusForCausalLM model.

    Args:
        self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
        decoder: The decoder module to assign to the model, typically a BigBirdPegasusDecoder instance.

    Returns:
        None.

    Raises:
        None.
    """
    self.model.decoder = decoder

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.set_input_embeddings(value)

Sets the input embeddings for the BigBirdPegasusForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class.

TYPE: BigBirdPegasusForCausalLM

value

The new input embedding module to set for the model's decoder, typically an nn.Embedding instance.

RETURNS DESCRIPTION

None.

This method sets the input embeddings for the BigBirdPegasusForCausalLM model. It assigns the given 'value' to the 'embed_tokens' attribute of the decoder in the model. The 'embed_tokens' attribute represents the embedding layer used for token inputs in the decoder. By setting the input embeddings, the model will use the provided embeddings during inference and decoding.

Note

The 'value' parameter should be an embedding module whose weight matrix has shape (vocab_size, embedding_dim), where 'vocab_size' is the size of the vocabulary and 'embedding_dim' is the dimensionality of the embedding space.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3824-3848
def set_input_embeddings(self, value):
    """
    Sets the input embeddings for the BigBirdPegasusForCausalLM model.

    Args:
        self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
        value: The new input embedding module to set for the model's decoder, typically an nn.Embedding instance.

    Returns:
        None.

    Raises:
        None.

    This method sets the input embeddings for the BigBirdPegasusForCausalLM model.
    It assigns the given 'value' to the 'embed_tokens' attribute of the decoder in the model.
    The 'embed_tokens' attribute represents the embedding layer used for token inputs in the decoder.
    By setting the input embeddings, the model will use the provided embeddings during inference and decoding.

    Note:
        The 'value' parameter should be an embedding module whose weight matrix has shape
        (vocab_size, embedding_dim), where 'vocab_size' is the size of the vocabulary and
        'embedding_dim' is the dimensionality of the embedding space.
    """
    self.model.decoder.embed_tokens = value
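
As a hedged sketch of how this setter is typically used, the snippet below swaps in a freshly sized embedding table after new tokens have been added; the import path, the already-loaded `model`, and the number of added tokens are assumptions for illustration, not part of the method itself:

```python
from mindnlp.core import nn   # assumed import path, mirroring the modeling file's own imports

d_model = model.config.d_model                  # 1024 for bigbird-pegasus-large-arxiv
new_vocab_size = model.config.vocab_size + 8    # hypothetical: 8 tokens added to the tokenizer

new_embed = nn.Embedding(new_vocab_size, d_model)
model.set_input_embeddings(new_embed)           # becomes model.model.decoder.embed_tokens

# In practice, model.resize_token_embeddings(new_vocab_size) is usually preferable,
# since it also copies the pretrained rows into the enlarged table.
```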

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForCausalLM.set_output_embeddings(new_embeddings)

Set the output embeddings for the BigBirdPegasusForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForCausalLM class.

TYPE: BigBirdPegasusForCausalLM

new_embeddings

The new embeddings to be set for the output layer.

TYPE: Any

RETURNS DESCRIPTION
None

This method updates the lm_head attribute of the BigBirdPegasusForCausalLM instance with the new embeddings.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3866-3880
def set_output_embeddings(self, new_embeddings):
    """Set the output embeddings for the BigBirdPegasusForCausalLM model.

    Args:
        self (BigBirdPegasusForCausalLM): The instance of the BigBirdPegasusForCausalLM class.
        new_embeddings (Any): The new embeddings to be set for the output layer.

    Returns:
        None:
            This method updates the lm_head attribute of the BigBirdPegasusForCausalLM instance with the new embeddings.

    Raises:
        None.
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration

Bases: BigBirdPegasusPreTrainedModel

This class represents a conditional generation model based on BigBirdPegasus. It is a subclass of BigBirdPegasusPreTrainedModel.

The BigBirdPegasusForConditionalGeneration class extends the functionality of its parent class by adding methods for conditional generation tasks, such as generating text given a prompt or a set of input tokens.

METHOD DESCRIPTION
__init__

Initializes the model with the given configuration.

get_encoder

Returns the encoder of the model.

get_decoder

Returns the decoder of the model.

resize_token_embeddings

Resizes the token embeddings of the model.

_resize_final_logits_bias

Resizes the bias tensor used for final logits.

get_output_embeddings

Returns the output embedding layer of the model.

set_output_embeddings

Sets the output embedding layer of the model.

forward

Constructs the model for conditional generation tasks.

prepare_inputs_for_generation

Prepares the input tensors for generation.

prepare_decoder_input_ids_from_labels

Prepares the decoder input IDs from the given labels.

_reorder_cache

Reorders the past key values for beam search.

The BigBirdPegasusForConditionalGeneration class is designed to be used for various conditional generation tasks, such as text generation, text completion, and text summarization.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3065-3417
class BigBirdPegasusForConditionalGeneration(BigBirdPegasusPreTrainedModel):

    """
    This class represents a conditional generation model based on BigBirdPegasus.
    It is a subclass of BigBirdPegasusPreTrainedModel.

    The BigBirdPegasusForConditionalGeneration class extends the functionality of its parent class by adding methods
    for conditional generation tasks, such as generating text given a prompt or a set of input tokens.

    Methods:
        __init__: Initializes the model with the given configuration.
        get_encoder: Returns the encoder of the model.
        get_decoder: Returns the decoder of the model.
        resize_token_embeddings: Resizes the token embeddings of the model.
        _resize_final_logits_bias: Resizes the bias tensor used for final logits.
        get_output_embeddings: Returns the output embedding layer of the model.
        set_output_embeddings: Sets the output embedding layer of the model.
        forward: Constructs the model for conditional generation tasks.
        prepare_inputs_for_generation: Prepares the input tensors for generation.
        prepare_decoder_input_ids_from_labels: Prepares the decoder input IDs from the given labels.
        _reorder_cache: Reorders the past key values for beam search.

    The BigBirdPegasusForConditionalGeneration class is designed to be used for various conditional generation tasks,
    such as text generation, text completion, and text summarization.
    """
    base_model_prefix = "model"
    _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight", "lm_head.weight"]
    _keys_to_ignore_on_load_unexpected = ["final_logits_bias"]

    def __init__(self, config: BigBirdPegasusConfig):
        """
        Initializes an instance of the BigBirdPegasusForConditionalGeneration class.

        Args:
            self: The instance of the class.
            config (BigBirdPegasusConfig): An instance of BigBirdPegasusConfig containing the configuration parameters for the model.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__(config)
        self.model = BigBirdPegasusModel(config)
        self.final_logits_bias = ops.zeros(1, self.model.shared.num_embeddings)
        self.lm_head = nn.Linear(config.d_model, self.model.shared.num_embeddings, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_encoder(self):
        """
        Retrieve the encoder component from the model.

        Args:
            self: An instance of the BigBirdPegasusForConditionalGeneration class.

        Returns:
            The encoder component of the model.

        Raises:
            None.
        """
        return self.model.get_encoder()

    def get_decoder(self):
        """
        This method returns the decoder from the BigBirdPegasusForConditionalGeneration model.

        Args:
            self (BigBirdPegasusForConditionalGeneration): The instance of the BigBirdPegasusForConditionalGeneration class.

        Returns:
            The decoder component of the model.

        Raises:
            None.
        """
        return self.model.get_decoder()

    def resize_token_embeddings(self, new_num_tokens: int, pad_to_multiple_of: Optional[int] = None) -> nn.Embedding:
        """
        Resize the token embeddings for the model.

        Args:
            self: The instance of the BigBirdPegasusForConditionalGeneration class.
            new_num_tokens (int): The new number of tokens to resize the embeddings to.
            pad_to_multiple_of (Optional[int]): A value to pad the new number of tokens to a multiple of, if specified.

        Returns:
            nn.Embedding: The resized token embeddings of type nn.Embedding.

        Raises:
            ValueError: If new_num_tokens is not a positive integer.
            TypeError: If new_num_tokens is not an integer.
            TypeError: If pad_to_multiple_of is not an integer.
        """
        new_embeddings = super().resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
        self._resize_final_logits_bias(new_embeddings.weight.shape[0])
        return new_embeddings

    def _resize_final_logits_bias(self, new_num_tokens: int) -> None:
        """
        Resizes the final logits bias tensor in the BigBirdPegasusForConditionalGeneration class.

        Args:
            self (BigBirdPegasusForConditionalGeneration): An instance of the BigBirdPegasusForConditionalGeneration class.
            new_num_tokens (int): The desired number of tokens for the resized final logits bias tensor.

        Returns:
            None:
                This method modifies the `final_logits_bias` attribute of the BigBirdPegasusForConditionalGeneration
                instance.

        Raises:
            None.

        The method `_resize_final_logits_bias` resizes the `final_logits_bias` tensor based on the provided
        `new_num_tokens` parameter. If the `new_num_tokens` is less than or equal to the current number of
        tokens in `final_logits_bias`, the tensor is sliced to retain only the first `new_num_tokens` columns.
        Otherwise, extra bias columns are added to the tensor using `zeros` and `cat` operations.

        Note:
            This method directly modifies the `final_logits_bias` attribute of the
            BigBirdPegasusForConditionalGeneration instance.
        """
        old_num_tokens = self.final_logits_bias.shape[-1]
        if new_num_tokens <= old_num_tokens:
            new_bias = self.final_logits_bias[:, :new_num_tokens]
        else:
            extra_bias = ops.zeros(1, new_num_tokens - old_num_tokens)
            new_bias = ops.cat([self.final_logits_bias, extra_bias], dim=1)
        self.final_logits_bias = new_bias

    def get_output_embeddings(self):
        """
        Returns the output embeddings for the BigBirdPegasus model.

        Args:
            self (BigBirdPegasusForConditionalGeneration):
                The instance of the BigBirdPegasusForConditionalGeneration class.

        Returns:
            The output embedding layer (`lm_head`) of the model.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Sets the output embeddings for the BigBirdPegasusForConditionalGeneration model.

        Args:
            self (BigBirdPegasusForConditionalGeneration): The instance of the BigBirdPegasusForConditionalGeneration class.
            new_embeddings (nn.Embedding): The new output embeddings to be set for the model.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        decoder_input_ids: Optional[mindspore.Tensor] = None,
        decoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        decoder_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        encoder_outputs: Optional[List[mindspore.Tensor]] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, Seq2SeqLMOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
                config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
                (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

        Returns:
            `Union[Tuple, Seq2SeqLMOutput]`
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if labels is not None:
            if use_cache:
                logger.warning("The `use_cache` argument is changed to `False` since `labels` is provided.")
            use_cache = False
            if decoder_input_ids is None and decoder_inputs_embeds is None:
                decoder_input_ids = shift_tokens_right(
                    labels, self.config.pad_token_id, self.config.decoder_start_token_id
                )

        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            encoder_outputs=encoder_outputs,
            decoder_attention_mask=decoder_attention_mask,
            head_mask=head_mask,
            decoder_head_mask=decoder_head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            decoder_inputs_embeds=decoder_inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        lm_logits = self.lm_head(outputs[0])
        lm_logits = lm_logits + self.final_logits_bias

        masked_lm_loss = None
        if labels is not None:
            masked_lm_loss = F.cross_entropy(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (lm_logits,) + outputs[1:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

        return Seq2SeqLMOutput(
            loss=masked_lm_loss,
            logits=lm_logits,
            past_key_values=outputs.past_key_values,
            decoder_hidden_states=outputs.decoder_hidden_states,
            decoder_attentions=outputs.decoder_attentions,
            cross_attentions=outputs.cross_attentions,
            encoder_last_hidden_state=outputs.encoder_last_hidden_state,
            encoder_hidden_states=outputs.encoder_hidden_states,
            encoder_attentions=outputs.encoder_attentions,
        )

    def prepare_inputs_for_generation(
        self,
        decoder_input_ids,
        past_key_values=None,
        attention_mask=None,
        decoder_attention_mask=None,
        head_mask=None,
        decoder_head_mask=None,
        cross_attn_head_mask=None,
        use_cache=None,
        encoder_outputs=None,
        **kwargs,
    ):
        """
        This method prepares inputs for generation in the BigBirdPegasusForConditionalGeneration class.

        Args:
            self: The instance of the class.
            decoder_input_ids (Tensor): The input tensor for the decoder.
            past_key_values (tuple, optional): A tuple of past key values for the model's autoregressive decoding. Defaults to None.
            attention_mask (Tensor, optional): The attention mask for the input. Defaults to None.
            decoder_attention_mask (Tensor, optional): The attention mask for the decoder input. Defaults to None.
            head_mask (Tensor, optional): The mask for the attention heads. Defaults to None.
            decoder_head_mask (Tensor, optional): The mask for the decoder's attention heads. Defaults to None.
            cross_attn_head_mask (Tensor, optional): The mask for cross-attention heads. Defaults to None.
            use_cache (bool, optional): Whether to use caching for the model. Defaults to None.
            encoder_outputs (Tensor, optional): The outputs from the encoder. Defaults to None.

        Returns:
            dict: A dictionary containing
                'input_ids', 'encoder_outputs', 'past_key_values', 'decoder_input_ids', 'attention_mask',
                'decoder_attention_mask', 'head_mask', 'decoder_head_mask', 'cross_attn_head_mask', and 'use_cache'.

        Raises:
            None
        """
        # cut decoder_input_ids if past_key_values is used
        if past_key_values is not None:
            past_length = past_key_values[0][0].shape[2]

            # Some generation methods already pass only the last input ID
            if decoder_input_ids.shape[1] > past_length:
                remove_prefix_length = past_length
            else:
                # Default to old behavior: keep only final ID
                remove_prefix_length = decoder_input_ids.shape[1] - 1

            decoder_input_ids = decoder_input_ids[:, remove_prefix_length:]

        return {
            "input_ids": None,  # encoder_outputs is defined. input_ids not needed
            "encoder_outputs": encoder_outputs,
            "past_key_values": past_key_values,
            "decoder_input_ids": decoder_input_ids,
            "attention_mask": attention_mask,
            "decoder_attention_mask": decoder_attention_mask,
            "head_mask": head_mask,
            "decoder_head_mask": decoder_head_mask,
            "cross_attn_head_mask": cross_attn_head_mask,
            "use_cache": use_cache,  # change this to avoid caching (presumably for debugging)
        }

    def prepare_decoder_input_ids_from_labels(self, labels: mindspore.Tensor):
        """
        Prepare decoder input IDs from labels.

        This method takes two parameters: self, labels.

        Args:
            self (BigBirdPegasusForConditionalGeneration): An instance of the BigBirdPegasusForConditionalGeneration class.
            labels (mindspore.Tensor): The labels tensor representing the ground truth sequence.

        Returns:
            mindspore.Tensor: The decoder input IDs obtained by shifting the labels one position to the right.

        Raises:
            None.
        """
        return shift_tokens_right(labels, self.config.pad_token_id, self.config.decoder_start_token_id)

    @staticmethod
    def _reorder_cache(past_key_values, beam_idx):
        """
        Reorders the past key values based on the provided beam index.

        Args:
            past_key_values (tuple): A tuple containing past key values for each layer.
                Each element in the tuple is a tuple representing the past key values for a layer.
            beam_idx (Tensor): A tensor containing the indices of the beams to reorder the past key values.

        Returns:
            tuple: A tuple of reordered past key values,
                where each element in the tuple represents the reordered past key values for a layer.

        Raises:
            None
        """
        reordered_past = ()
        for layer_past in past_key_values:
            # cached cross_attention states don't have to be reordered -> they are always the same
            reordered_past += (
                tuple(past_state.index_select(0, beam_idx) for past_state in layer_past[:2])
                + layer_past[2:],
            )
        return reordered_past
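
The `_reorder_cache` step above only gathers the self-attention cache entries along the beam axis, while the cross-attention states (`layer_past[2:]`) are left untouched because they are identical across beams. A toy illustration of that `index_select` gather, with made-up tensors rather than the real cache layout:

```python
import mindspore
from mindspore import ops

beam_idx = mindspore.Tensor([1, 0], mindspore.int64)                 # swap the two beams
key_states = ops.arange(6).reshape(2, 3).astype(mindspore.float32)   # toy (num_beams, ...) cache slice

print(key_states.index_select(0, beam_idx))
# rows swapped: the cache entry of beam 1 now comes first, beam 0 second
```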

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.__init__(config)

Initializes an instance of the BigBirdPegasusForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of BigBirdPegasusConfig containing the configuration parameters for the model.

TYPE: BigBirdPegasusConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3094-3114
def __init__(self, config: BigBirdPegasusConfig):
    """
    Initializes an instance of the BigBirdPegasusForConditionalGeneration class.

    Args:
        self: The instance of the class.
        config (BigBirdPegasusConfig): An instance of BigBirdPegasusConfig containing the configuration parameters for the model.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__(config)
    self.model = BigBirdPegasusModel(config)
    self.final_logits_bias = ops.zeros(1, self.model.shared.num_embeddings)
    self.lm_head = nn.Linear(config.d_model, self.model.shared.num_embeddings, bias=False)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, encoder_outputs=None, past_key_values=None, inputs_embeds=None, decoder_inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, Seq2SeqLMOutput]

Union[Tuple, Seq2SeqLMOutput]

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3232-3311
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    decoder_input_ids: Optional[mindspore.Tensor] = None,
    decoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    decoder_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    encoder_outputs: Optional[List[mindspore.Tensor]] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, Seq2SeqLMOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
            config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
            (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

    Returns:
        `Union[Tuple, Seq2SeqLMOutput]`
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if labels is not None:
        if use_cache:
            logger.warning("The `use_cache` argument is changed to `False` since `labels` is provided.")
        use_cache = False
        if decoder_input_ids is None and decoder_inputs_embeds is None:
            decoder_input_ids = shift_tokens_right(
                labels, self.config.pad_token_id, self.config.decoder_start_token_id
            )

    outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        encoder_outputs=encoder_outputs,
        decoder_attention_mask=decoder_attention_mask,
        head_mask=head_mask,
        decoder_head_mask=decoder_head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        decoder_inputs_embeds=decoder_inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    lm_logits = self.lm_head(outputs[0])
    lm_logits = lm_logits + self.final_logits_bias

    masked_lm_loss = None
    if labels is not None:
        masked_lm_loss = F.cross_entropy(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (lm_logits,) + outputs[1:]
        return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

    return Seq2SeqLMOutput(
        loss=masked_lm_loss,
        logits=lm_logits,
        past_key_values=outputs.past_key_values,
        decoder_hidden_states=outputs.decoder_hidden_states,
        decoder_attentions=outputs.decoder_attentions,
        cross_attentions=outputs.cross_attentions,
        encoder_last_hidden_state=outputs.encoder_last_hidden_state,
        encoder_hidden_states=outputs.encoder_hidden_states,
        encoder_attentions=outputs.encoder_attentions,
    )
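
Since this forward method carries no inline example, here is a hedged usage sketch; the checkpoint name, tokenizer call, and generation arguments follow the transformers conventions that mindnlp mirrors and are assumptions rather than something taken from the source above:

```python
from mindnlp.transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

article = "The dominant sequence transduction models are based on complex recurrent or convolutional networks ..."
inputs = tokenizer(article, return_tensors="ms")

# Training-style call: passing labels triggers the shift_tokens_right path above
# and returns a cross-entropy loss next to the logits.
labels = tokenizer("A survey of sequence transduction models.", return_tensors="ms").input_ids
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)

# Inference: generate() calls prepare_inputs_for_generation under the hood.
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```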

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.get_decoder()

This method returns the decoder from the BigBirdPegasusForConditionalGeneration model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForConditionalGeneration class.

TYPE: BigBirdPegasusForConditionalGeneration

RETURNS DESCRIPTION

The decoder component of the model.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3131-3144
def get_decoder(self):
    """
    This method returns the decoder from the BigBirdPegasusForConditionalGeneration model.

    Args:
        self (BigBirdPegasusForConditionalGeneration): The instance of the BigBirdPegasusForConditionalGeneration class.

    Returns:
        The decoder component of the model.

    Raises:
        None.
    """
    return self.model.get_decoder()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.get_encoder()

Retrieve the encoder component from the model.

PARAMETER DESCRIPTION
self

An instance of the BigBirdPegasusForConditionalGeneration class.

RETURNS DESCRIPTION
encoder

The encoder component of the model.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3116-3129
def get_encoder(self):
    """
    Retrieve the encoder component from the model.

    Args:
        self: An instance of the BigBirdPegasusForConditionalGeneration class.

    Returns:
        The encoder component of the model.

    Raises:
        None.
    """
    return self.model.get_encoder()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.get_output_embeddings()

Returns the output embeddings for the BigBirdPegasus model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForConditionalGeneration class.

TYPE: BigBirdPegasusForConditionalGeneration

RETURNS DESCRIPTION

The output embedding layer (lm_head) of the model.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3200-3214
def get_output_embeddings(self):
    """
    Returns the output embeddings for the BigBirdPegasus model.

    Args:
        self (BigBirdPegasusForConditionalGeneration):
            The instance of the BigBirdPegasusForConditionalGeneration class.

    Returns:
        The output embedding layer (`lm_head`) of the model.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.prepare_decoder_input_ids_from_labels(labels)

Prepare decoder input IDs from labels.

This method takes two parameters: self, labels.

PARAMETER DESCRIPTION
self

An instance of the BigBirdPegasusForConditionalGeneration class.

TYPE: BigBirdPegasusForConditionalGeneration

labels

The labels tensor representing the ground truth sequence.

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: The decoder input IDs obtained by shifting the labels one position to the right.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3375-3391
def prepare_decoder_input_ids_from_labels(self, labels: mindspore.Tensor):
    """
    Prepare decoder input IDs from labels.

    This method takes two parameters: self, labels.

    Args:
        self (BigBirdPegasusForConditionalGeneration): An instance of the BigBirdPegasusForConditionalGeneration class.
        labels (mindspore.Tensor): The labels tensor representing the ground truth sequence.

    Returns:
        mindspore.Tensor: The decoder input IDs obtained by shifting the labels one position to the right.

    Raises:
        None.
    """
    return shift_tokens_right(labels, self.config.pad_token_id, self.config.decoder_start_token_id)
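
To make the shifting concrete, here is a small illustrative re-implementation of what `shift_tokens_right` does; the real helper lives in the modeling file, and this toy version exists only to demonstrate the behaviour:

```python
import mindspore
from mindspore import ops

def shift_right_demo(labels, pad_token_id, decoder_start_token_id):
    shifted = ops.zeros_like(labels)
    shifted[:, 1:] = labels[:, :-1]            # move every label one step to the right
    shifted[:, 0] = decoder_start_token_id     # the decoder starts from this token
    # -100 labels (ignored by the loss) must become real pad ids when used as inputs
    shifted = ops.where(shifted == -100, ops.ones_like(shifted) * pad_token_id, shifted)
    return shifted

labels = mindspore.Tensor([[5, 6, 7, 8]], mindspore.int64)
print(shift_right_demo(labels, pad_token_id=0, decoder_start_token_id=2))
# -> [[2, 5, 6, 7]]
```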

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.prepare_inputs_for_generation(decoder_input_ids, past_key_values=None, attention_mask=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, use_cache=None, encoder_outputs=None, **kwargs)

This method prepares inputs for generation in the BigBirdPegasusForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The instance of the class.

decoder_input_ids

The input tensor for the decoder.

TYPE: Tensor

past_key_values

A tuple of past key values for the model's autoregressive decoding. Defaults to None.

TYPE: tuple DEFAULT: None

attention_mask

The attention mask for the input. Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_attention_mask

The attention mask for the decoder input. Defaults to None.

TYPE: Tensor DEFAULT: None

head_mask

The mask for the attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_head_mask

The mask for the decoder's attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

cross_attn_head_mask

The mask for cross-attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

use_cache

Whether to use caching for the model. Defaults to None.

TYPE: bool DEFAULT: None

encoder_outputs

The outputs from the encoder. Defaults to None.

TYPE: Tensor DEFAULT: None

RETURNS DESCRIPTION
dict

A dictionary containing 'input_ids', 'encoder_outputs', 'past_key_values', 'decoder_input_ids', 'attention_mask', 'decoder_attention_mask', 'head_mask', 'decoder_head_mask', 'cross_attn_head_mask', and 'use_cache'.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3313-3373
def prepare_inputs_for_generation(
    self,
    decoder_input_ids,
    past_key_values=None,
    attention_mask=None,
    decoder_attention_mask=None,
    head_mask=None,
    decoder_head_mask=None,
    cross_attn_head_mask=None,
    use_cache=None,
    encoder_outputs=None,
    **kwargs,
):
    """
    This method prepares inputs for generation in the BigBirdPegasusForConditionalGeneration class.

    Args:
        self: The instance of the class.
        decoder_input_ids (Tensor): The input tensor for the decoder.
        past_key_values (tuple, optional): A tuple of past key values for the model's autoregressive decoding. Defaults to None.
        attention_mask (Tensor, optional): The attention mask for the input. Defaults to None.
        decoder_attention_mask (Tensor, optional): The attention mask for the decoder input. Defaults to None.
        head_mask (Tensor, optional): The mask for the attention heads. Defaults to None.
        decoder_head_mask (Tensor, optional): The mask for the decoder's attention heads. Defaults to None.
        cross_attn_head_mask (Tensor, optional): The mask for cross-attention heads. Defaults to None.
        use_cache (bool, optional): Whether to use caching for the model. Defaults to None.
        encoder_outputs (Tensor, optional): The outputs from the encoder. Defaults to None.

    Returns:
        dict: A dictionary containing
            'input_ids', 'encoder_outputs', 'past_key_values', 'decoder_input_ids', 'attention_mask',
            'decoder_attention_mask', 'head_mask', 'decoder_head_mask', 'cross_attn_head_mask', and 'use_cache'.

    Raises:
        None
    """
    # cut decoder_input_ids if past_key_values is used
    if past_key_values is not None:
        past_length = past_key_values[0][0].shape[2]

        # Some generation methods already pass only the last input ID
        if decoder_input_ids.shape[1] > past_length:
            remove_prefix_length = past_length
        else:
            # Default to old behavior: keep only final ID
            remove_prefix_length = decoder_input_ids.shape[1] - 1

        decoder_input_ids = decoder_input_ids[:, remove_prefix_length:]

    return {
        "input_ids": None,  # encoder_outputs is defined. input_ids not needed
        "encoder_outputs": encoder_outputs,
        "past_key_values": past_key_values,
        "decoder_input_ids": decoder_input_ids,
        "attention_mask": attention_mask,
        "decoder_attention_mask": decoder_attention_mask,
        "head_mask": head_mask,
        "decoder_head_mask": decoder_head_mask,
        "cross_attn_head_mask": cross_attn_head_mask,
        "use_cache": use_cache,  # change this to avoid caching (presumably for debugging)
    }
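
A toy illustration of the prefix-trimming branch above, written as a behavioural sketch with made-up tensors rather than a call into the method: once the cache already covers part of the sequence, only the uncovered suffix of `decoder_input_ids` is kept.

```python
import mindspore

decoder_input_ids = mindspore.Tensor([[2, 15, 37, 42]], mindspore.int64)
past_length = 3   # pretend the cached key/value states cover the first 3 positions

if decoder_input_ids.shape[1] > past_length:
    remove_prefix_length = past_length                      # keep only the new ids
else:
    remove_prefix_length = decoder_input_ids.shape[1] - 1   # fallback: keep the final id

print(decoder_input_ids[:, remove_prefix_length:])          # [[42]]
```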

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.resize_token_embeddings(new_num_tokens, pad_to_multiple_of=None)

Resize the token embeddings for the model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForConditionalGeneration class.

new_num_tokens

The new number of tokens to resize the embeddings to.

TYPE: int

pad_to_multiple_of

A value to pad the new number of tokens to a multiple of, if specified.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
Embedding

nn.Embedding: The resized token embeddings of type nn.Embedding.

RAISES DESCRIPTION
ValueError

If new_num_tokens is not a positive integer.

TypeError

If new_num_tokens is not an integer.

TypeError

If pad_to_multiple_of is not an integer.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3146-3165
def resize_token_embeddings(self, new_num_tokens: int, pad_to_multiple_of: Optional[int] = None) -> nn.Embedding:
    """
    Resize the token embeddings for the model.

    Args:
        self: The instance of the BigBirdPegasusForConditionalGeneration class.
        new_num_tokens (int): The new number of tokens to resize the embeddings to.
        pad_to_multiple_of (Optional[int]): A value to pad the new number of tokens to a multiple of, if specified.

    Returns:
        nn.Embedding: The resized token embeddings of type nn.Embedding.

    Raises:
        ValueError: If new_num_tokens is not a positive integer.
        TypeError: If new_num_tokens is not an integer.
        TypeError: If pad_to_multiple_of is not an integer.
    """
    new_embeddings = super().resize_token_embeddings(new_num_tokens, pad_to_multiple_of)
    self._resize_final_logits_bias(new_embeddings.weight.shape[0])
    return new_embeddings
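
A hedged usage sketch: resizing after extending the tokenizer also resizes `final_logits_bias` through `_resize_final_logits_bias`. The `model` and `tokenizer` objects and the added tokens below are assumptions carried over from the generation example, not part of this method.

```python
num_added = tokenizer.add_tokens(["<sec>", "<eqn>"])   # hypothetical extra tokens
new_size = len(tokenizer)

embeddings = model.resize_token_embeddings(new_size)
print(embeddings.weight.shape[0])        # >= new_size (padding to a multiple is possible)
print(model.final_logits_bias.shape)     # (1, <same size>) after the bias resize
```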

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForConditionalGeneration.set_output_embeddings(new_embeddings)

Sets the output embeddings for the BigBirdPegasusForConditionalGeneration model.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusForConditionalGeneration class.

TYPE: BigBirdPegasusForConditionalGeneration

new_embeddings

The new output embeddings to be set for the model.

TYPE: Embedding

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3216-3230
def set_output_embeddings(self, new_embeddings):
    """
    Sets the output embeddings for the BigBirdPegasusForConditionalGeneration model.

    Args:
        self (BigBirdPegasusForConditionalGeneration): The instance of the BigBirdPegasusForConditionalGeneration class.
        new_embeddings (nn.Embedding): The new output embeddings to be set for the model.

    Returns:
        None.

    Raises:
        None.
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForQuestionAnswering

Bases: BigBirdPegasusPreTrainedModel

This class represents a BigBirdPegasus model for question answering tasks. It is designed to perform question answering using the BigBirdPegasus architecture.

The class includes methods for initialization and forwarding the model for question answering tasks. It inherits from the BigBirdPegasusPreTrainedModel class and utilizes a sequence-to-sequence model for processing input and generating output.

The __init__ method initializes the model with configuration settings, including setting the number of labels for classification. The forward method runs the model for question answering by processing the input tensors and generating start and end position logits for the answer span.

The class provides functionality for computing the token classification loss based on the start and end positions of the labelled span. It handles the calculation of loss and returns the output in the desired format based on the return_dict parameter.

For detailed information on the methods and parameters of this class, please refer to the class code and method docstrings.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py, lines 3558-3704
class BigBirdPegasusForQuestionAnswering(BigBirdPegasusPreTrainedModel):

    """
    This class represents a BigBirdPegasus model for question answering tasks. It is designed to perform question
    answering using the BigBirdPegasus architecture.

    The class includes methods for initialization and forwarding the model for question answering tasks.
    It inherits from the BigBirdPegasusPreTrainedModel class and utilizes a sequence-to-sequence model for
    processing input and generating output.

    The __init__ method initializes the model with configuration settings, including setting the number of labels for classification.
    The forward method forwards the model for question answering by processing input tensors and generating
    start and end position logits for the answer span.

    The class provides functionality for computing the token classification loss based on the start and end positions
    of the labelled span.
    It handles the calculation of loss and returns the output in the desired format based on the return_dict parameter.

    For detailed information on the methods and parameters of this class, please refer to the class code and method
    docstrings.
    """
    _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]

    def __init__(self, config):
        """
        Initializes a new instance of the BigBirdPegasusForQuestionAnswering class.

        Args:
            self: The object instance itself.
            config:
                An object containing configuration settings for the model.

                - Type: Any
                - Purpose: Contains configuration settings for the model initialization.
                - Restrictions: Must be a valid configuration object.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        config.num_labels = 2
        self.num_labels = config.num_labels

        self.model = BigBirdPegasusModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    # Copied from transformers.models.bart.modeling_bart.BartForQuestionAnswering.forward
    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        decoder_input_ids: Optional[mindspore.Tensor] = None,
        decoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        decoder_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        encoder_outputs: Optional[List[mindspore.Tensor]] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, Seq2SeqQuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for the position (index) of the start of the labelled span for computing the token
                classification loss. Positions are clamped to the length of the sequence (*sequence_length*).
                Positions outside of the sequence are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for the position (index) of the end of the labelled span for computing the token
                classification loss. Positions are clamped to the length of the sequence (*sequence_length*).
                Positions outside of the sequence are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if start_positions is not None and end_positions is not None:
            use_cache = False

        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            head_mask=head_mask,
            decoder_head_mask=decoder_head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            encoder_outputs=encoder_outputs,
            inputs_embeds=inputs_embeds,
            decoder_inputs_embeds=decoder_inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, splitting adds a dimension
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (
                start_logits,
                end_logits,
            ) + outputs[1:]
            return ((total_loss,) + output) if total_loss is not None else output

        return Seq2SeqQuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            past_key_values=outputs.past_key_values,
            decoder_hidden_states=outputs.decoder_hidden_states,
            decoder_attentions=outputs.decoder_attentions,
            cross_attentions=outputs.cross_attentions,
            encoder_last_hidden_state=outputs.encoder_last_hidden_state,
            encoder_hidden_states=outputs.encoder_hidden_states,
            encoder_attentions=outputs.encoder_attentions,
        )

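The start and end logits returned by forward are raw per-token scores; turning them into an answer span is left to the caller. The snippet below is a minimal, hypothetical post-processing sketch (it is not part of mindnlp) that scores every (start, end) pair for a single example and returns the highest-scoring span. The function name extract_best_span and the max_answer_length limit are illustrative assumptions.

import numpy as np

def extract_best_span(start_logits, end_logits, max_answer_length=30):
    """Pick the (start, end) pair with the highest combined score.

    start_logits and end_logits are 1-D arrays of per-token scores for one
    example, as produced by the qa_outputs head after the squeeze above.
    """
    seq_len = len(start_logits)
    best_score, best_span = -np.inf, (0, 0)
    for start in range(seq_len):
        # only consider ends at or after the start, within max_answer_length tokens
        for end in range(start, min(start + max_answer_length, seq_len)):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    return best_span, best_score

# toy scores for a 6-token example; the best span is (2, 3)
start_logits = np.array([0.1, 0.2, 3.0, 0.5, 0.1, 0.0])
end_logits = np.array([0.0, 0.1, 0.4, 2.5, 0.3, 0.1])
span, score = extract_best_span(start_logits, end_logits)
print(span, float(score))  # (2, 3) 5.5

In practice the chosen span would also be mapped back from token indices to character offsets using the tokenizer's offset mapping, but that step depends on the tokenizer in use.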
mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForQuestionAnswering.__init__(config)

Initializes a new instance of the BigBirdPegasusForQuestionAnswering class.

PARAMETER DESCRIPTION
self

The object instance itself.

config

An object containing configuration settings for the model.

  • Type: Any
  • Purpose: Contains configuration settings for the model initialization.
  • Restrictions: Must be a valid configuration object.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def __init__(self, config):
    """
    Initializes a new instance of the BigBirdPegasusForQuestionAnswering class.

    Args:
        self: The object instance itself.
        config:
            An object containing configuration settings for the model.

            - Type: Any
            - Purpose: Contains configuration settings for the model initialization.
            - Restrictions: Must be a valid configuration object.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    config.num_labels = 2
    self.num_labels = config.num_labels

    self.model = BigBirdPegasusModel(config)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForQuestionAnswering.forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, encoder_outputs=None, start_positions=None, end_positions=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
start_positions

Labels for the position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for the position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    decoder_input_ids: Optional[mindspore.Tensor] = None,
    decoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    decoder_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    encoder_outputs: Optional[List[mindspore.Tensor]] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, Seq2SeqQuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for the position (index) of the start of the labelled span for computing the token
            classification loss. Positions are clamped to the length of the sequence (*sequence_length*).
            Positions outside of the sequence are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for the position (index) of the end of the labelled span for computing the token
            classification loss. Positions are clamped to the length of the sequence (*sequence_length*).
            Positions outside of the sequence are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    if start_positions is not None and end_positions is not None:
        use_cache = False

    outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        decoder_attention_mask=decoder_attention_mask,
        head_mask=head_mask,
        decoder_head_mask=decoder_head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        encoder_outputs=encoder_outputs,
        inputs_embeds=inputs_embeds,
        decoder_inputs_embeds=decoder_inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    logits = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, splitting adds a dimension
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = F.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = F.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (
            start_logits,
            end_logits,
        ) + outputs[1:]
        return ((total_loss,) + output) if total_loss is not None else output

    return Seq2SeqQuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        past_key_values=outputs.past_key_values,
        decoder_hidden_states=outputs.decoder_hidden_states,
        decoder_attentions=outputs.decoder_attentions,
        cross_attentions=outputs.cross_attentions,
        encoder_last_hidden_state=outputs.encoder_last_hidden_state,
        encoder_hidden_states=outputs.encoder_hidden_states,
        encoder_attentions=outputs.encoder_attentions,
    )

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForSequenceClassification

Bases: BigBirdPegasusPreTrainedModel

This class represents a BigBirdPegasus model for sequence classification. It inherits from BigBirdPegasusPreTrainedModel and includes methods for model initialization and for the forward pass of the sequence classifier. The forward method takes various input parameters for decoding and attention masks, and returns the sequence classifier output, including logits and an optional loss. The class also handles different problem types such as regression, single-label classification, and multi-label classification. Additionally, it expects all examples to contain the same number of <eos> tokens.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
class BigBirdPegasusForSequenceClassification(BigBirdPegasusPreTrainedModel):

    """
    This class represents a BigBirdPegasus model for sequence classification. It inherits from
    BigBirdPegasusPreTrainedModel and includes methods for model initialization and for the forward pass of the
    sequence classifier.
    The forward method takes various input parameters for decoding and attention masks, and returns the sequence
    classifier output, including logits and an optional loss.
    The class also handles different problem types such as regression, single-label classification, and multi-label
    classification.
    Additionally, it expects all examples to contain the same number of <eos> tokens.
    """
    _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]

    def __init__(self, config: BigBirdPegasusConfig, **kwargs):
        """
        Initializes a new instance of the BigBirdPegasusForSequenceClassification class.

        Args:
            self: The object itself.
            config (BigBirdPegasusConfig): The configuration for the BigBirdPegasus model.
            **kwargs: Additional keyword arguments.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config, **kwargs)
        self.model = BigBirdPegasusModel(config)
        self.classification_head = BigBirdPegasusClassificationHead(
            config.d_model,
            config.d_model,
            config.num_labels,
            config.classifier_dropout,
        )

        # Initialize weights and apply final processing
        self.post_init()

    # Copied from transformers.models.bart.modeling_bart.BartForSequenceClassification.forward
    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        decoder_input_ids: Optional[mindspore.Tensor] = None,
        decoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        decoder_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        encoder_outputs: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, Seq2SeqSequenceClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if labels is not None:
            use_cache = False

        if input_ids is None and inputs_embeds is not None:
            raise NotImplementedError(
                f"Passing input embeddings is currently not supported for {self.__class__.__name__}"
            )

        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=decoder_input_ids,
            decoder_attention_mask=decoder_attention_mask,
            head_mask=head_mask,
            decoder_head_mask=decoder_head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            encoder_outputs=encoder_outputs,
            inputs_embeds=inputs_embeds,
            decoder_inputs_embeds=decoder_inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = outputs[0]  # last hidden state

        eos_mask = input_ids.eq(self.config.eos_token_id)

        # if len(ops.unique_consecutive(eos_mask.sum(1))) > 1:
        #     raise ValueError("All examples must have the same number of <eos> tokens.")
        sentence_representation = hidden_states[eos_mask].view(hidden_states.shape[0], -1, hidden_states.shape[-1])[
            :, -1, :
        ]
        logits = self.classification_head(sentence_representation)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.config.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.config.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.config.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = F.cross_entropy(logits.view(-1, self.config.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)
        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return Seq2SeqSequenceClassifierOutput(
            loss=loss,
            logits=logits,
            past_key_values=outputs.past_key_values,
            decoder_hidden_states=outputs.decoder_hidden_states,
            decoder_attentions=outputs.decoder_attentions,
            cross_attentions=outputs.cross_attentions,
            encoder_last_hidden_state=outputs.encoder_last_hidden_state,
            encoder_hidden_states=outputs.encoder_hidden_states,
            encoder_attentions=outputs.encoder_attentions,
        )

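The sentence representation fed to the classification head is not a pooled average: forward gathers the decoder hidden state at the last <eos> token of every example, which is why all examples are expected to contain the same number of <eos> tokens. Below is a small NumPy sketch of that pooling step on toy data; the shapes, token ids, and eos_token_id value are made-up illustration values, not values taken from the library.

import numpy as np

np.random.seed(0)
hidden_states = np.random.randn(2, 5, 4)        # (batch, sequence, hidden) toy tensor
input_ids = np.array([[5, 8, 9, 2, 0],          # assume 2 == eos_token_id, 0 == padding
                      [7, 6, 4, 2, 0]])
eos_token_id = 2

# Mirror the pooling in forward: keep the hidden state at the last <eos> token
# of each example and use that vector as the sentence representation.
eos_mask = input_ids == eos_token_id
last_eos_index = eos_mask.shape[1] - 1 - np.argmax(eos_mask[:, ::-1], axis=1)
sentence_representation = hidden_states[np.arange(hidden_states.shape[0]), last_eos_index]
print(sentence_representation.shape)            # (2, 4)

The pooled vector then passes through the classification head, and the loss is selected from config.problem_type: mean-squared error for regression, cross-entropy for single-label classification, and binary cross-entropy with logits for multi-label classification.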
mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForSequenceClassification.__init__(config, **kwargs)

Initializes a new instance of the BigBirdPegasusForSequenceClassification class.

PARAMETER DESCRIPTION
self

The object itself.

config

The configuration for the BigBirdPegasus model.

TYPE: BigBirdPegasusConfig

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def __init__(self, config: BigBirdPegasusConfig, **kwargs):
    """
    Initializes a new instance of the BigBirdPegasusForSequenceClassification class.

    Args:
        self: The object itself.
        config (BigBirdPegasusConfig): The configuration for the BigBirdPegasus model.
        **kwargs: Additional keyword arguments.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config, **kwargs)
    self.model = BigBirdPegasusModel(config)
    self.classification_head = BigBirdPegasusClassificationHead(
        config.d_model,
        config.d_model,
        config.num_labels,
        config.classifier_dropout,
    )

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusForSequenceClassification.forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, encoder_outputs=None, inputs_embeds=None, decoder_inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1 a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    decoder_input_ids: Optional[mindspore.Tensor] = None,
    decoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    decoder_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    encoder_outputs: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, Seq2SeqSequenceClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    if labels is not None:
        use_cache = False

    if input_ids is None and inputs_embeds is not None:
        raise NotImplementedError(
            f"Passing input embeddings is currently not supported for {self.__class__.__name__}"
        )

    outputs = self.model(
        input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=decoder_input_ids,
        decoder_attention_mask=decoder_attention_mask,
        head_mask=head_mask,
        decoder_head_mask=decoder_head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        encoder_outputs=encoder_outputs,
        inputs_embeds=inputs_embeds,
        decoder_inputs_embeds=decoder_inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    hidden_states = outputs[0]  # last hidden state

    eos_mask = input_ids.eq(self.config.eos_token_id)

    # if len(ops.unique_consecutive(eos_mask.sum(1))) > 1:
    #     raise ValueError("All examples must have the same number of <eos> tokens.")
    sentence_representation = hidden_states[eos_mask].view(hidden_states.shape[0], -1, hidden_states.shape[-1])[
        :, -1, :
    ]
    logits = self.classification_head(sentence_representation)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.config.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.config.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.config.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = F.cross_entropy(logits.view(-1, self.config.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)
    if not return_dict:
        output = (logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return Seq2SeqSequenceClassifierOutput(
        loss=loss,
        logits=logits,
        past_key_values=outputs.past_key_values,
        decoder_hidden_states=outputs.decoder_hidden_states,
        decoder_attentions=outputs.decoder_attentions,
        cross_attentions=outputs.cross_attentions,
        encoder_last_hidden_state=outputs.encoder_last_hidden_state,
        encoder_hidden_states=outputs.encoder_hidden_states,
        encoder_attentions=outputs.encoder_attentions,
    )

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel

Bases: BigBirdPegasusPreTrainedModel

This class represents a BigBirdPegasus model for sequence-to-sequence tasks. It is a variant of the BigBird model that is specifically designed for text generation tasks using the Pegasus architecture.

The BigBirdPegasusModel class inherits from the BigBirdPegasusPreTrainedModel class, which is a base class for all pre-trained BigBirdPegasus models. It provides common methods and attributes for loading and saving models.

METHOD DESCRIPTION
__init__

Initializes the BigBirdPegasusModel instance with a given BigBirdPegasusConfig configuration.

get_input_embeddings

Returns the shared input embeddings used by the model.

set_input_embeddings

Sets the shared input embeddings of the model.

_tie_weights

Ties the weights of the encoder and decoder embedding layers if specified in the configuration.

get_encoder

Returns the encoder module of the model.

get_decoder

Returns the decoder module of the model.

Please refer to the documentation of the individual methods for more details on their parameters and return values.

Note

This docstring is generated based on the provided code snippet and may not include all the class attributes, methods, and their details. Please refer to the source code or official documentation for complete information.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
class BigBirdPegasusModel(BigBirdPegasusPreTrainedModel):

    """
    This class represents a BigBirdPegasus model for sequence-to-sequence tasks. It is a variant of the BigBird model
    that is specifically designed for text generation tasks using the Pegasus architecture.

    The BigBirdPegasusModel class inherits from the BigBirdPegasusPreTrainedModel class, which is a base class
    for all pre-trained BigBirdPegasus models. It provides common methods and attributes for loading
    and saving models.

    Methods:
        __init__(self, config: BigBirdPegasusConfig): Initializes the BigBirdPegasusModel instance with a given configuration.
        get_input_embeddings(self): Returns the shared input embeddings used by the model.
        set_input_embeddings(self, value): Sets the shared input embeddings of the model.
        _tie_weights(self): Ties the weights of the encoder and decoder embedding layers if specified in the configuration.
        get_encoder(self): Returns the encoder module of the model.
        get_decoder(self): Returns the decoder module of the model.
        forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask,
            decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds,
            decoder_inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict):
            Constructs the model by performing encoding and decoding operations on the input sequence.

    Please refer to the documentation of the individual methods for more details on their parameters and return values.

    Note:
        This docstring is generated based on the provided code snippet and may not include all the class attributes,
        methods, and their details. Please refer to the source code or official documentation for
        complete information.
    """
    _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight"]

    def __init__(self, config: BigBirdPegasusConfig):
        """
        Initializes a new instance of the BigBirdPegasusModel class.

        Args:
            self: The current instance of the class.
            config (BigBirdPegasusConfig):
                The configuration object containing various settings for the model.

                - `pad_token_id` (int): The index of the padding token in the vocabulary.
                - `vocab_size` (int): The size of the vocabulary.
                - `d_model` (int): The dimensionality of the model's hidden states.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)

        padding_idx, vocab_size = config.pad_token_id, config.vocab_size
        self.shared = nn.Embedding(vocab_size, config.d_model, padding_idx)

        self.encoder = BigBirdPegasusEncoder(config, self.shared)
        self.decoder = BigBirdPegasusDecoder(config, self.shared)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        This method retrieves the input embeddings for the BigBirdPegasusModel.

        Args:
            self: An instance of the BigBirdPegasusModel class.

        Returns:
            nn.Embedding: The shared input embedding used by both the encoder and the decoder.

        Raises:
            None
        """
        return self.shared

    def set_input_embeddings(self, value):
        """
        Set the input embeddings for the BigBirdPegasusModel.

        Args:
            self (BigBirdPegasusModel): The instance of the BigBirdPegasusModel class.
            value (object): The input embeddings to be set for the model.

        Returns:
            None.

        Raises:
            None.
        """
        self.shared = value
        self.encoder.embed_tokens = self.shared
        self.decoder.embed_tokens = self.shared

    def _tie_weights(self):
        """
        Ties the weights of the encoder and decoder token embeddings if the tie_word_embeddings flag is set to True.

        Args:
            self (BigBirdPegasusModel): An instance of the BigBirdPegasusModel class.

        Returns:
            None.

        Raises:
            None.

        Note:
            This method is used to ensure that the encoder and decoder token embeddings share the same weights
            when the tie_word_embeddings flag is set to True.
            It helps in reducing the number of parameters in the model and improves training efficiency.
        """
        if self.config.tie_word_embeddings:
            self._tie_or_clone_weights(self.encoder.embed_tokens, self.shared)
            self._tie_or_clone_weights(self.decoder.embed_tokens, self.shared)

    def get_encoder(self):
        """
        This method returns the encoder associated with the BigBirdPegasusModel.

        Args:
            self: The instance of the BigBirdPegasusModel class.

        Returns:
            encoder: This method returns the encoder associated with the BigBirdPegasusModel.

        Raises:
            This method does not raise any exceptions.
        """
        return self.encoder

    def get_decoder(self):
        """
        Returns the decoder of the BigBirdPegasusModel.

        Args:
            self: An instance of the BigBirdPegasusModel class.

        Returns:
            decoder: This method returns the decoder of the BigBirdPegasusModel.
                The decoder is responsible for decoding the input and generating the output.

        Raises:
            None.
        """
        return self.decoder

    # Copied from transformers.models.bart.modeling_bart.BartModel.forward with Bart->BigBirdPegasus
    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        decoder_input_ids: Optional[mindspore.Tensor] = None,
        decoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        decoder_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        encoder_outputs: Optional[List[mindspore.Tensor]] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, Seq2SeqModelOutput]:
        """
        Constructs the BigBirdPegasusModel.

        Args:
            self: The object instance.
            input_ids (mindspore.Tensor, optional): The input token IDs of shape (batch_size, sequence_length).
                Defaults to None.
            attention_mask (mindspore.Tensor, optional): The attention mask of shape (batch_size, sequence_length).
                Defaults to None.
            decoder_input_ids (mindspore.Tensor, optional): The decoder input token IDs of shape
                (batch_size, sequence_length). Defaults to None.
            decoder_attention_mask (mindspore.Tensor, optional): The decoder attention mask of shape
                (batch_size, sequence_length). Defaults to None.
            head_mask (mindspore.Tensor, optional): The head mask tensor of shape (num_layers, num_heads) or
                (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
            decoder_head_mask (mindspore.Tensor, optional): The decoder head mask tensor of shape
                (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
            cross_attn_head_mask (mindspore.Tensor, optional): The cross-attention head mask tensor of shape
                (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
            encoder_outputs (List[mindspore.Tensor], optional): The encoder outputs of shape
                [(batch_size, sequence_length, hidden_size), ...]. Defaults to None.
            past_key_values (List[mindspore.Tensor], optional): The past key values of shape
                [(batch_size, num_heads, past_sequence_length, hidden_size), ...]. Defaults to None.
            inputs_embeds (mindspore.Tensor, optional): The embedded inputs tensor of shape
                (batch_size, sequence_length, hidden_size). Defaults to None.
            decoder_inputs_embeds (mindspore.Tensor, optional): The embedded decoder inputs tensor of shape
                (batch_size, sequence_length, hidden_size). Defaults to None.
            use_cache (bool, optional): Whether to use cache. Defaults to None.
            output_attentions (bool, optional): Whether to output attentions. Defaults to None.
            output_hidden_states (bool, optional): Whether to output hidden states. Defaults to None.
            return_dict (bool, optional): Whether to return as a dictionary. Defaults to None.

        Returns:
            Union[Tuple, Seq2SeqModelOutput]: A tuple or a Seq2SeqModelOutput containing the model outputs.

        Raises:
            ValueError: If no `decoder_input_ids` or `decoder_inputs_embeds` are passed and `input_ids` is None.
        """
        # different to other models, BigBirdPegasus automatically creates decoder_input_ids from
        # input_ids if no decoder_input_ids are provided
        if decoder_input_ids is None and decoder_inputs_embeds is None:
            if input_ids is None:
                raise ValueError(
                    "If no `decoder_input_ids` or `decoder_inputs_embeds` are "
                    "passed, `input_ids` cannot be `None`. Please pass either "
                    "`input_ids` or `decoder_input_ids` or `decoder_inputs_embeds`."
                )

            decoder_input_ids = shift_tokens_right(
                input_ids, self.config.pad_token_id, self.config.decoder_start_token_id
            )

        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if encoder_outputs is None:
            encoder_outputs = self.encoder(
                input_ids=input_ids,
                attention_mask=attention_mask,
                head_mask=head_mask,
                inputs_embeds=inputs_embeds,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )
        # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOutput when return_dict=True
        elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
            encoder_outputs = BaseModelOutput(
                last_hidden_state=encoder_outputs[0],
                hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
                attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
            )

        # decoder outputs consists of (dec_features, past_key_value, dec_hidden, dec_attn)
        decoder_outputs = self.decoder(
            input_ids=decoder_input_ids,
            attention_mask=decoder_attention_mask,
            encoder_hidden_states=encoder_outputs[0],
            encoder_attention_mask=attention_mask,
            head_mask=decoder_head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            past_key_values=past_key_values,
            inputs_embeds=decoder_inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        if not return_dict:
            return decoder_outputs + encoder_outputs

        return Seq2SeqModelOutput(
            last_hidden_state=decoder_outputs.last_hidden_state,
            past_key_values=decoder_outputs.past_key_values,
            decoder_hidden_states=decoder_outputs.hidden_states,
            decoder_attentions=decoder_outputs.attentions,
            cross_attentions=decoder_outputs.cross_attentions,
            encoder_last_hidden_state=encoder_outputs.last_hidden_state,
            encoder_hidden_states=encoder_outputs.hidden_states,
            encoder_attentions=encoder_outputs.attentions,
        )

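When neither decoder_input_ids nor decoder_inputs_embeds is given, BigBirdPegasusModel builds decoder_input_ids itself by shifting input_ids one position to the right with shift_tokens_right. The sketch below is an assumption-based re-implementation of that BART-style shift for illustration only; it is not the library's own helper, and the token ids used are hypothetical.

import numpy as np

def shift_tokens_right_sketch(input_ids, pad_token_id, decoder_start_token_id):
    """Illustrative version of the BART-style right shift used above.

    Every token moves one position to the right, the first position is filled
    with decoder_start_token_id, and any -100 label placeholders are replaced
    by pad_token_id.
    """
    shifted = np.empty_like(input_ids)
    shifted[:, 1:] = input_ids[:, :-1]
    shifted[:, 0] = decoder_start_token_id
    shifted[shifted == -100] = pad_token_id
    return shifted

input_ids = np.array([[11, 12, 13, 1],          # assume 1 == eos, 0 == pad, 2 == decoder start
                      [21, 22, 1, 0]])
print(shift_tokens_right_sketch(input_ids, pad_token_id=0, decoder_start_token_id=2))
# [[ 2 11 12 13]
#  [ 2 21 22  1]]

The shifted ids give the decoder the usual teacher-forcing inputs: at every position it sees the previous target token and is trained to predict the current one.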
mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.__init__(config)

Initializes a new instance of the BigBirdPegasusModel class.

PARAMETER DESCRIPTION
self

The current instance of the class.

config

The configuration object containing various settings for the model.

  • pad_token_id (int): The index of the padding token in the vocabulary.
  • vocab_size (int): The size of the vocabulary.
  • d_model (int): The dimensionality of the model's hidden states.

TYPE: BigBirdPegasusConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def __init__(self, config: BigBirdPegasusConfig):
    """
    Initializes a new instance of the BigBirdPegasusModel class.

    Args:
        self: The current instance of the class.
        config (BigBirdPegasusConfig):
            The configuration object containing various settings for the model.

            - `pad_token_id` (int): The index of the padding token in the vocabulary.
            - `vocab_size` (int): The size of the vocabulary.
            - `d_model` (int): The dimensionality of the model's hidden states.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)

    padding_idx, vocab_size = config.pad_token_id, config.vocab_size
    self.shared = nn.Embedding(vocab_size, config.d_model, padding_idx)

    self.encoder = BigBirdPegasusEncoder(config, self.shared)
    self.decoder = BigBirdPegasusDecoder(config, self.shared)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, encoder_outputs=None, past_key_values=None, inputs_embeds=None, decoder_inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the BigBirdPegasusModel.

PARAMETER DESCRIPTION
self

The object instance.

input_ids

The input token IDs of shape (batch_size, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

attention_mask

The attention mask of shape (batch_size, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_input_ids

The decoder input token IDs of shape (batch_size, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_attention_mask

The decoder attention mask of shape (batch_size, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

head_mask

The head mask tensor of shape (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_head_mask

The decoder head mask tensor of shape (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

cross_attn_head_mask

The cross-attention head mask tensor of shape (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

encoder_outputs

The encoder outputs of shape [(batch_size, sequence_length, hidden_size), ...]. Defaults to None.

TYPE: List[Tensor] DEFAULT: None

past_key_values

The past key values of shape [(batch_size, num_heads, past_sequence_length, hidden_size), ...]. Defaults to None.

TYPE: List[Tensor] DEFAULT: None

inputs_embeds

The embedded inputs tensor of shape (batch_size, sequence_length, hidden_size). Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_inputs_embeds

The embedded decoder inputs tensor of shape (batch_size, sequence_length, hidden_size). Defaults to None.

TYPE: Tensor DEFAULT: None

use_cache

Whether to use cache. Defaults to None.

TYPE: bool DEFAULT: None

output_attentions

Whether to output attentions. Defaults to None.

TYPE: bool DEFAULT: None

output_hidden_states

Whether to output hidden states. Defaults to None.

TYPE: bool DEFAULT: None

return_dict

Whether to return as a dictionary. Defaults to None.

TYPE: bool DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, Seq2SeqModelOutput]

Union[Tuple, Seq2SeqModelOutput]: A tuple or a Seq2SeqModelOutput containing the model outputs.

RAISES DESCRIPTION
ValueError

If no decoder_input_ids or decoder_inputs_embeds are passed and input_ids is None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    decoder_input_ids: Optional[mindspore.Tensor] = None,
    decoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    decoder_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    encoder_outputs: Optional[List[mindspore.Tensor]] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, Seq2SeqModelOutput]:
    """
    Constructs the BigBirdPegasusModel.

    Args:
        self: The object instance.
        input_ids (mindspore.Tensor, optional): The input token IDs of shape (batch_size, sequence_length).
            Defaults to None.
        attention_mask (mindspore.Tensor, optional): The attention mask of shape (batch_size, sequence_length).
            Defaults to None.
        decoder_input_ids (mindspore.Tensor, optional): The decoder input token IDs of shape
            (batch_size, sequence_length). Defaults to None.
        decoder_attention_mask (mindspore.Tensor, optional): The decoder attention mask of shape
            (batch_size, sequence_length). Defaults to None.
        head_mask (mindspore.Tensor, optional): The head mask tensor of shape (num_layers, num_heads) or
            (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
        decoder_head_mask (mindspore.Tensor, optional): The decoder head mask tensor of shape
            (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
        cross_attn_head_mask (mindspore.Tensor, optional): The cross-attention head mask tensor of shape
            (num_layers, num_heads) or (num_layers, num_heads, sequence_length, sequence_length). Defaults to None.
        encoder_outputs (List[mindspore.Tensor], optional): The encoder outputs of shape
            [(batch_size, sequence_length, hidden_size), ...]. Defaults to None.
        past_key_values (List[mindspore.Tensor], optional): The past key values of shape
            [(batch_size, num_heads, past_sequence_length, hidden_size), ...]. Defaults to None.
        inputs_embeds (mindspore.Tensor, optional): The embedded inputs tensor of shape
            (batch_size, sequence_length, hidden_size). Defaults to None.
        decoder_inputs_embeds (mindspore.Tensor, optional): The embedded decoder inputs tensor of shape
            (batch_size, sequence_length, hidden_size). Defaults to None.
        use_cache (bool, optional): Whether to use cache. Defaults to None.
        output_attentions (bool, optional): Whether to output attentions. Defaults to None.
        output_hidden_states (bool, optional): Whether to output hidden states. Defaults to None.
        return_dict (bool, optional): Whether to return as a dictionary. Defaults to None.

    Returns:
        Union[Tuple, Seq2SeqModelOutput]: A tuple or a Seq2SeqModelOutput containing the model outputs.

    Raises:
        ValueError: If no `decoder_input_ids` or `decoder_inputs_embeds` are passed and `input_ids` is None.
    """
    # different to other models, BigBirdPegasus automatically creates decoder_input_ids from
    # input_ids if no decoder_input_ids are provided
    if decoder_input_ids is None and decoder_inputs_embeds is None:
        if input_ids is None:
            raise ValueError(
                "If no `decoder_input_ids` or `decoder_inputs_embeds` are "
                "passed, `input_ids` cannot be `None`. Please pass either "
                "`input_ids` or `decoder_input_ids` or `decoder_inputs_embeds`."
            )

        decoder_input_ids = shift_tokens_right(
            input_ids, self.config.pad_token_id, self.config.decoder_start_token_id
        )

    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if encoder_outputs is None:
        encoder_outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
    # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOutput when return_dict=True
    elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
        encoder_outputs = BaseModelOutput(
            last_hidden_state=encoder_outputs[0],
            hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
            attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
        )

    # decoder outputs consists of (dec_features, past_key_value, dec_hidden, dec_attn)
    decoder_outputs = self.decoder(
        input_ids=decoder_input_ids,
        attention_mask=decoder_attention_mask,
        encoder_hidden_states=encoder_outputs[0],
        encoder_attention_mask=attention_mask,
        head_mask=decoder_head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        past_key_values=past_key_values,
        inputs_embeds=decoder_inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    if not return_dict:
        return decoder_outputs + encoder_outputs

    return Seq2SeqModelOutput(
        last_hidden_state=decoder_outputs.last_hidden_state,
        past_key_values=decoder_outputs.past_key_values,
        decoder_hidden_states=decoder_outputs.hidden_states,
        decoder_attentions=decoder_outputs.attentions,
        cross_attentions=decoder_outputs.cross_attentions,
        encoder_last_hidden_state=encoder_outputs.last_hidden_state,
        encoder_hidden_states=encoder_outputs.hidden_states,
        encoder_attentions=encoder_outputs.attentions,
    )

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.get_decoder()

Returns the decoder of the BigBirdPegasusModel.

PARAMETER DESCRIPTION
self

An instance of the BigBirdPegasusModel class.

RETURNS DESCRIPTION
decoder

This method returns the decoder of the BigBirdPegasusModel. The decoder is responsible for decoding the input and generating the output.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def get_decoder(self):
    """
    Returns the decoder of the BigBirdPegasusModel.

    Args:
        self: An instance of the BigBirdPegasusModel class.

    Returns:
        decoder: This method returns the decoder of the BigBirdPegasusModel.
            The decoder is responsible for decoding the input and generating the output.

    Raises:
        None.
    """
    return self.decoder

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.get_encoder()

This method returns the encoder associated with the BigBirdPegasusModel.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusModel class.

RETURNS DESCRIPTION
encoder

This method returns the encoder associated with the BigBirdPegasusModel.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def get_encoder(self):
    """
    This method returns the encoder associated with the BigBirdPegasusModel.

    Args:
        self: The instance of the BigBirdPegasusModel class.

    Returns:
        encoder: This method returns the encoder associated with the BigBirdPegasusModel.

    Raises:
        This method does not raise any exceptions.
    """
    return self.encoder

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.get_input_embeddings()

This method retrieves the input embeddings for the BigBirdPegasusModel.

PARAMETER DESCRIPTION
self

An instance of the BigBirdPegasusModel class.

RETURNS DESCRIPTION

The shared nn.Embedding used as input embeddings by both the encoder and the decoder.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def get_input_embeddings(self):
    """
    This method retrieves the input embeddings for the BigBirdPegasusModel.

    Args:
        self: An instance of the BigBirdPegasusModel class.

    Returns:
        nn.Embedding: The shared input embedding used by both the encoder and the decoder.

    Raises:
        None
    """
    return self.shared

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusModel.set_input_embeddings(value)

Set the input embeddings for the BigBirdPegasusModel.

PARAMETER DESCRIPTION
self

The instance of the BigBirdPegasusModel class.

TYPE: BigBirdPegasusModel

value

The input embeddings to be set for the model.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
def set_input_embeddings(self, value):
    """
    Set the input embeddings for the BigBirdPegasusModel.

    Args:
        self (BigBirdPegasusModel): The instance of the BigBirdPegasusModel class.
        value (object): The input embeddings to be set for the model.

    Returns:
        None.

    Raises:
        None.
    """
    self.shared = value
    self.encoder.embed_tokens = self.shared
    self.decoder.embed_tokens = self.shared

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusPreTrainedModel

Bases: PreTrainedModel

The 'BigBirdPegasusPreTrainedModel' class represents a pre-trained model for natural language processing tasks. It inherits from the 'PreTrainedModel' class and includes methods for initializing weights and generating dummy inputs for the model.

The '_init_weights' method initializes the weights of the model's cells based on the specified standard deviation. It handles different cell types such as 'nn.Linear' and 'nn.Embedding', setting their weights and biases accordingly. For 'nn.Embedding' cells, it also handles padding indices to ensure proper weight initialization.

The 'dummy_inputs' property returns a dictionary of dummy inputs for the model, including an attention mask and input IDs. It uses the specified pad token ID to generate the inputs and handles padding for the input sequences.

This class provides essential functionality for initializing model weights and generating dummy inputs, making it suitable for use in natural language processing tasks.

Source code in mindnlp/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py
class BigBirdPegasusPreTrainedModel(PreTrainedModel):

    """
    The 'BigBirdPegasusPreTrainedModel' class represents a pre-trained model for natural language processing tasks. 
    It inherits from the 'PreTrainedModel' class and includes methods for initializing weights
    and generating dummy inputs for the model.

    The '_init_weights' method initializes the weights of the model's cells based on the specified standard deviation. 
    It handles different cell types such as 'nn.Linear' and 'nn.Embedding', setting their weights and biases accordingly. 
    For 'nn.Embedding' cells, it also handles padding indices to ensure proper weight initialization.

    The 'dummy_inputs' property returns a dictionary of dummy inputs for the model, including an attention mask and input IDs. 
    It uses the specified pad token ID to generate the inputs and handles padding for the input sequences.

    This class provides essential functionality for initializing model weights and generating dummy inputs, 
    making it suitable for use in natural language processing tasks.
    """
    config_class = BigBirdPegasusConfig
    base_model_prefix = "model"
    supports_gradient_checkpointing = True
    _no_split_modules = ["BigBirdPegasusEncoderLayer", "BigBirdPegasusDecoderLayer"]
    _skip_keys_device_placement = "past_key_values"

    def _init_weights(self, cell):
        """Initialize the weights"""
        std = self.config.init_std
        if isinstance(cell, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(std),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, std, cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))

    @property
    def dummy_inputs(self):
        """
        Retrieves dummy inputs for the 'BigBirdPegasusPreTrainedModel' class.

        Args:
            self: The current instance of the class (BigBirdPegasusPreTrainedModel).

        Returns:
            dict: 
                A dictionary containing dummy inputs for the model, with the following keys:

                - 'attention_mask': A tensor representing the attention mask. 
                It is obtained by applying the 'ne' (not equal) operation on the 'input_ids' tensor, 
                with the padding token as the argument.
                - 'input_ids': A tensor representing the input IDs for the model. 
                It contains two rows, where each row represents a different sequence. 
                The first row consists of the values [0, 6, 10, 4, 2],
                and the second row consists of the values [0, 8, 12, 2, pad_token], 
                where 'pad_token' is the padding token ID obtained from the model's configuration.

        Raises:
            None.
        """
        pad_token = self.config.pad_token_id
        input_ids = mindspore.tensor([[0, 6, 10, 4, 2], [0, 8, 12, 2, pad_token]])
        dummy_inputs = {
            "attention_mask": input_ids.ne(pad_token),
            "input_ids": input_ids,
        }
        return dummy_inputs
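
A minimal smoke-test sketch (illustrative only): dummy_inputs provides a ready-made batch for checking that a freshly built model runs and produces the expected shapes. The small configuration and attention_type="original_full" are assumptions chosen so the five-token sequences are valid.

from mindnlp.transformers import BigBirdPegasusConfig, BigBirdPegasusModel

config = BigBirdPegasusConfig(vocab_size=1024, d_model=64,
                              encoder_layers=2, decoder_layers=2,
                              encoder_attention_heads=2, decoder_attention_heads=2,
                              encoder_ffn_dim=128, decoder_ffn_dim=128,
                              attention_type="original_full")
model = BigBirdPegasusModel(config)

dummy = model.dummy_inputs
outputs = model(input_ids=dummy["input_ids"],
                attention_mask=dummy["attention_mask"],
                decoder_input_ids=dummy["input_ids"])
# Decoder last hidden state: batch of 2, sequence length 5, width d_model.
print(outputs[0].shape)   # (2, 5, config.d_model)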

mindnlp.transformers.models.bigbird_pegasus.modeling_bigbird_pegasus.BigBirdPegasusPreTrainedModel.dummy_inputs property

Retrieves dummy inputs for the 'BigBirdPegasusPreTrainedModel' class.

PARAMETER DESCRIPTION
self

The current instance of the class (BigBirdPegasusPreTrainedModel).

RETURNS DESCRIPTION
dict

A dictionary containing dummy inputs for the model, with the following keys:

  • 'attention_mask': A tensor representing the attention mask. It is obtained by applying the 'ne' (not equal) operation on the 'input_ids' tensor, with the padding token as the argument.
  • 'input_ids': A tensor representing the input IDs for the model. It contains two rows, where each row represents a different sequence. The first row consists of the values [0, 6, 10, 4, 2], and the second row consists of the values [0, 8, 12, 2, pad_token], where 'pad_token' is the padding token ID obtained from the model's configuration.