pegasus

mindnlp.transformers.models.pegasus.tokenization_pegasus

Pegasus Tokenizer

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer

Bases: PreTrainedTokenizer

Construct a PEGASUS tokenizer. Based on SentencePiece.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer.

TYPE: `str`

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

eos_token

The end of sequence token.

When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

mask_token

The token used for masking single token values. This is the token used when training this model with masked language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining. It corresponds to [MASK2] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.

TYPE: `str`, *optional*, defaults to `"<mask_2>"` DEFAULT: '<mask_2>'

mask_token_sent

The token used for masking whole target sentences. This is the token used when training this model with gap sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during pretraining. It corresponds to [MASK1] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.

TYPE: `str`, *optional*, defaults to `"<mask_1>"` DEFAULT: '<mask_1>'

additional_special_tokens

Additional special tokens used by the tokenizer. If no additional_special_tokens are provided, `<mask_2>` and `<unk_2>, ..., <unk_102>` are used as additional special tokens, matching the original PEGASUS tokenizer, which uses the tokens 2 - 104 only for pretraining.

TYPE: `List[str]`, *optional* DEFAULT: None

sp_model_kwargs

Will be passed to the SentencePieceProcessor.__init__() method. The Python wrapper for SentencePiece can be used, among other things, to set:

  • enable_sampling: Enable subword regularization.
  • nbest_size: Sampling parameters for unigram. Invalid for BPE-Dropout.

    • nbest_size = {0,1}: No sampling is performed.
    • nbest_size > 1: samples from the nbest_size results.
    • nbest_size < 0: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) using forward-filtering-and-backward-sampling algorithm.
    • alpha: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.

TYPE: `dict`, *optional* DEFAULT: None
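
A minimal usage sketch follows. The checkpoint name `google/pegasus-xsum` and the use of `from_pretrained` are assumptions; any PEGASUS SentencePiece checkpoint, or a local `*.spm` file passed as `vocab_file`, should behave the same way.

```python
from mindnlp.transformers.models.pegasus.tokenization_pegasus import PegasusTokenizer

# Hypothetical checkpoint; any PEGASUS checkpoint with a SentencePiece vocab works the same way.
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")

encoded = tokenizer("PEGASUS is pretrained with gap-sentence generation.")
print(encoded["input_ids"])          # SentencePiece ids shifted by `offset`, ending with </s> (id 1)
print(tokenizer.decode(encoded["input_ids"], skip_special_tokens=True))
```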

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
class PegasusTokenizer(PreTrainedTokenizer):
    r"""
    Construct a PEGASUS tokenizer. Based on [SentencePiece](https://github.com/google/sentencepiece).

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
            contains the vocabulary necessary to instantiate a tokenizer.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the end of sequence.
            The token used is the `sep_token`.

            </Tip>

        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        mask_token (`str`, *optional*, defaults to `"<mask_2>"`):
            The token used for masking single token values. This is the token used when training this model with masked
            language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining.
            It corresponds to *[MASK2]* in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive
            Summarization](https://arxiv.org/pdf/1912.08777.pdf).
        mask_token_sent (`str`, *optional*, defaults to `"<mask_1>"`):
            The token used for masking whole target sentences. This is the token used when training this model with gap
            sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during
            pretraining. It corresponds to *[MASK1]* in [PEGASUS: Pre-training with Extracted Gap-sentences for
            Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
        additional_special_tokens (`List[str]`, *optional*):
            Additional special tokens used by the tokenizer. If no additional_special_tokens are provided <mask_2> and
            <unk_2, ..., unk_102> are used as additional special tokens corresponding to the [original PEGASUS
            tokenizer](https://github.com/google-research/pegasus/blob/939830367bcf411193d2b5eca2f2f90f3f9260ca/pegasus/ops/pretrain_parsing_ops.cc#L66)
            that uses the tokens 2 - 104 only for pretraining
        sp_model_kwargs (`dict`, *optional*):
            Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
            SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
            to set:

            - `enable_sampling`: Enable subword regularization.
            - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.

                - `nbest_size = {0,1}`: No sampling is performed.
                - `nbest_size > 1`: samples from the nbest_size results.
                - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
                using forward-filtering-and-backward-sampling algorithm.
                - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
                BPE-dropout.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        vocab_file,
        pad_token="<pad>",
        eos_token="</s>",
        unk_token="<unk>",
        mask_token="<mask_2>",
        mask_token_sent="<mask_1>",
        additional_special_tokens=None,
        offset=103,  # entries 2 - 104 are only used for pretraining
        sp_model_kwargs: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        """
        Initialize a PegasusTokenizer object.

        Args:
            vocab_file (str): Path to the vocabulary file.
            pad_token (str, optional): Token representing padding. Default is '<pad>'.
            eos_token (str, optional): Token representing end of sentence. Default is '</s>'.
            unk_token (str, optional): Token representing unknown tokens. Default is '<unk>'.
            mask_token (str, optional): Token representing masked tokens. Default is '<mask_2>'.
            mask_token_sent (str, optional): Token representing masked tokens at sentence level. Default is '<mask_1>'.
            additional_special_tokens (List[str], optional): List of additional special tokens. Default is None.
            offset (int): Offset value for special tokens.
            sp_model_kwargs (Optional[Dict[str, Any]], optional): Additional arguments for SentencePieceProcessor.
                Default is None.

        Returns:
            None

        Raises:
            TypeError: If additional_special_tokens is not a list.
            ValueError: If additional_special_tokens contain an incorrectly shifted list of unknown tokens.
        """
        self.offset = offset
        if additional_special_tokens is not None:
            if not isinstance(additional_special_tokens, list):
                raise TypeError(
                    f"additional_special_tokens should be of type {type(list)}, but is"
                    f" {type(additional_special_tokens)}"
                )
            additional_special_tokens_extended = (
                ([mask_token_sent] + additional_special_tokens)
                if mask_token_sent not in additional_special_tokens and mask_token_sent is not None
                else additional_special_tokens
            )
            # fill additional tokens with ..., <unk_token_102> in case not all additional tokens are already taken
            additional_special_tokens_extended += [
                f"<unk_{i}>" for i in range(len(additional_special_tokens_extended), self.offset - 1)
            ]

            if len(set(additional_special_tokens_extended)) != len(additional_special_tokens_extended):
                raise ValueError(
                    "Please make sure that the provided additional_special_tokens do not contain an incorrectly"
                    f" shifted list of <unk_x> tokens. Found {additional_special_tokens_extended}."
                )
            additional_special_tokens = additional_special_tokens_extended
        else:
            additional_special_tokens_extended = []
            additional_special_tokens = [mask_token_sent] if mask_token_sent is not None else []
            additional_special_tokens += [f"<unk_{i}>" for i in range(2, self.offset)]

        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.mask_token_sent = mask_token_sent
        self.vocab_file = vocab_file
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)

        _added_tokens_decoder = {
            0: AddedToken(str(pad_token), special=True),
            1: AddedToken(str(eos_token), special=True),
        }

        if self.mask_token_sent is not None:
            _added_tokens_decoder[2] = AddedToken(mask_token_sent, special=True)
            _added_tokens_decoder[3] = AddedToken(str(mask_token), special=True)

        for i in range(2, self.offset):
            _added_tokens_decoder[len(_added_tokens_decoder)] = AddedToken(f"<unk_{i}>", special=True)

        # Force update as we want to make sure vocab is enforced (same as fast)
        self._added_tokens_decoder = kwargs.pop("added_tokens_decoder", {})
        self._added_tokens_decoder.update(_added_tokens_decoder)

        super().__init__(
            eos_token=eos_token,
            unk_token=unk_token,
            mask_token=mask_token,
            pad_token=pad_token,
            mask_token_sent=mask_token_sent,
            offset=offset,
            additional_special_tokens=additional_special_tokens,
            sp_model_kwargs=self.sp_model_kwargs,
            **kwargs,
        )

    @property
    def vocab_size(self) -> int:
        """
        This method returns the size of the vocabulary used by the PegasusTokenizer.

        Args:
            self (PegasusTokenizer): The instance of the PegasusTokenizer class.

        Returns:
            int: The size of the vocabulary, calculated as the length of the sp_model attribute plus the offset.

        Raises:
            None
        """
        return len(self.sp_model) + self.offset

    def get_vocab(self) -> Dict[str, int]:
        """
        Returns the vocabulary of the PegasusTokenizer.

        Args:
            self: An instance of the PegasusTokenizer class.

        Returns:
            dict:
                A dictionary containing the vocabulary of the tokenizer, where the keys are strings representing tokens
                and the values are integers representing their corresponding ids.

        Raises:
            None.

        Note:
            The vocabulary includes both the base tokenizer's vocabulary and any additional tokens that
            have been added using the `add_tokens` method.

        Example:
            ```python
            >>> tokenizer = PegasusTokenizer()
            >>> vocab = tokenizer.get_vocab()
            >>> print(vocab)
            {'<s>': 0, '</s>': 1, '<unk>': 2, '<pad>': 3, '<mask>': 4, 'additional_token': 5, ...}
            ```
        """
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def __getstate__(self):
        """
        This method __getstate__ is defined within the class PegasusTokenizer.
        It is used to return the state of the object for serialization purposes.

        Args:
            self (object): The instance of the PegasusTokenizer class.
                This parameter refers to the current object instance used to call the method.

        Returns:
            None: This method returns a value of type None.
                It modifies the state dictionary by setting the 'sp_model' key to None before returning it.

        Raises:
            This method does not raise any exceptions.
        """
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        """
        This method __setstate__ is defined within the class PegasusTokenizer and is used to set the internal state
        of the tokenizer object based on the provided dictionary 'd'.

        Args:
            self (PegasusTokenizer): The instance of the PegasusTokenizer class on which this method is called.
            d (dict): A dictionary containing the state information to be set on the tokenizer object.
                This dictionary is expected to hold the necessary data for setting the state of the tokenizer.

        Returns:
            None: This method does not return any value explicitly. It updates the internal state of the
                PegasusTokenizer object based on the provided dictionary 'd'.

        Raises:
            None: However, potential exceptions that could occur during the execution of this method may include any
                exceptions raised by the SentencePieceProcessor class methods like Load, if there are issues with
                loading the vocabulary file specified in the state information.
        """
        self.__dict__ = d

        # for backward compatibility
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}

        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.vocab_file)

    def _tokenize(self, text: str) -> List[str]:
        """Take as input a string and return a list of strings (tokens) for words/sub-words"""
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token: str) -> int:
        """Converts a token (str) to an id using the vocab."""
        sp_id = self.sp_model.piece_to_id(token)
        return sp_id + self.offset

    def _convert_id_to_token(self, index: int) -> str:
        """Converts an index (integer) to a token (str) using the vocab."""
        if index < self.offset:
            return self.sp_model.IdToPiece(index)
        token = self.sp_model.IdToPiece(index - self.offset)
        return token

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        current_sub_tokens = []
        out_string = ""
        for token in tokens:
            # make sure that special tokens are not decoded using sentencepiece model
            if token in self.all_special_tokens:
                out_string += self.sp_model.decode(current_sub_tokens) + token
                current_sub_tokens = []
            else:
                current_sub_tokens.append(token)
        out_string += self.sp_model.decode(current_sub_tokens)
        return out_string.strip()

    def num_special_tokens_to_add(self, pair=False):
        """Just EOS"""
        return 1

    def _special_token_mask(self, seq):
        """
        This method is defined in the 'PegasusTokenizer' class and is named '_special_token_mask'.
        It takes two parameters: self and seq.

        Args:
            self: An instance of the 'PegasusTokenizer' class.
            seq (list): A list of integers representing a sequence of tokens.

        Returns:
            None.

        Raises:
            None.
        """
        all_special_ids = set(self.all_special_ids)  # call it once instead of inside list comp
        all_special_ids.remove(self.unk_token_id)  # <unk> is only sometimes special

        return [1 if x in all_special_ids else 0 for x in seq]

    def get_special_tokens_mask(
        self, token_ids_0: List, token_ids_1: Optional[List] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """Get list where entries are [1] if a token is [eos] or [pad] else 0."""
        if already_has_special_tokens:
            return self._special_token_mask(token_ids_0)
        elif token_ids_1 is None:
            return self._special_token_mask(token_ids_0) + [1]
        else:
            return self._special_token_mask(token_ids_0 + token_ids_1) + [1]

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
        and adding special tokens. A PEGASUS sequence has the following format, where `X` represents the sequence:

        - single sequence: `X </s>`
        - pair of sequences: `A B </s>` (not intended use)

        BOS is never used. Pairs of sequences are not the expected use case, but they will be handled without a
        separator.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return token_ids_0 + [self.eos_token_id]
        # We don't expect to process pairs, but leave the pair logic for API consistency
        return token_ids_0 + token_ids_1 + [self.eos_token_id]

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """Save the vocabulary files for the Pegasus Tokenizer.

        Args:
            self (PegasusTokenizer): An instance of the PegasusTokenizer class.
            save_directory (str): The directory path where the vocabulary files will be saved.
            filename_prefix (Optional[str], optional): An optional prefix to be added to the filename. Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the file path of the saved vocabulary file.

        Raises:
            OSError: If the `save_directory` path is not a valid directory.

        This method saves the vocabulary files required for the Pegasus Tokenizer. 
        The `save_directory` parameter specifies the directory path where the vocabulary files will be saved. 
        If `filename_prefix` is provided, it will be added as a prefix to the filename. 
        The saved vocabulary file path is returned as a tuple containing a single string value.

        If the `save_directory` path is not a valid directory, an OSError will be raised.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
            copyfile(self.vocab_file, out_vocab_file)
        elif not os.path.isfile(self.vocab_file):
            with open(out_vocab_file, "wb") as fi:
                content_spiece_model = self.sp_model.serialized_model_proto()
                fi.write(content_spiece_model)

        return (out_vocab_file,)

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.vocab_size: int property

This method returns the size of the vocabulary used by the PegasusTokenizer.

PARAMETER DESCRIPTION
self

The instance of the PegasusTokenizer class.

TYPE: PegasusTokenizer

RETURNS DESCRIPTION
int

The size of the vocabulary, calculated as the length of the sp_model attribute plus the offset.

TYPE: int
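
As a quick sanity check (a sketch, assuming the `tokenizer` instance from the usage example above), the reported size is simply the SentencePiece vocabulary shifted by `offset`:

```python
# ids 0 .. offset-1 are reserved for pad/eos/mask and the <unk_i> tokens;
# plain SentencePiece pieces are shifted up by `offset`.
assert tokenizer.vocab_size == len(tokenizer.sp_model) + tokenizer.offset

piece = "▁the"  # assumed to exist in the SentencePiece vocabulary
assert tokenizer.convert_tokens_to_ids(piece) == tokenizer.sp_model.piece_to_id(piece) + tokenizer.offset
```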

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.__getstate__()

This method, `__getstate__`, is defined within the class PegasusTokenizer. It returns the state of the object for serialization (pickling) purposes.

PARAMETER DESCRIPTION
self

The instance of the PegasusTokenizer class. This parameter refers to the current object instance used to call the method.

TYPE: object

RETURNS DESCRIPTION
dict

A copy of the instance's __dict__ with the 'sp_model' entry set to None, ready for serialization.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def __getstate__(self):
    """
    This method __getstate__ is defined within the class PegasusTokenizer.
    It is used to return the state of the object for serialization purposes.

    Args:
        self (object): The instance of the PegasusTokenizer class.
            This parameter refers to the current object instance used to call the method.

    Returns:
        None: This method returns a value of type None.
            It modifies the state dictionary by setting the 'sp_model' key to None before returning it.

    Raises:
        This method does not raise any exceptions.
    """
    state = self.__dict__.copy()
    state["sp_model"] = None
    return state

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.__init__(vocab_file, pad_token='<pad>', eos_token='</s>', unk_token='<unk>', mask_token='<mask_2>', mask_token_sent='<mask_1>', additional_special_tokens=None, offset=103, sp_model_kwargs=None, **kwargs)

Initialize a PegasusTokenizer object.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: str

pad_token

Token representing padding. Default is '<pad>'.

TYPE: str DEFAULT: '<pad>'

eos_token

Token representing end of sentence. Default is '</s>'.

TYPE: str DEFAULT: '</s>'

unk_token

Token representing unknown tokens. Default is '<unk>'.

TYPE: str DEFAULT: '<unk>'

mask_token

Token representing masked tokens. Default is '<mask_2>'.

TYPE: str DEFAULT: '<mask_2>'

mask_token_sent

Token representing masked tokens at sentence level. Default is '<mask_1>'.

TYPE: str DEFAULT: '<mask_1>'

additional_special_tokens

List of additional special tokens. Default is None.

TYPE: List[str] DEFAULT: None

offset

Offset value for special tokens.

TYPE: int DEFAULT: 103

sp_model_kwargs

Additional arguments for SentencePieceProcessor. Default is None.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

RETURNS DESCRIPTION
None

None

RAISES DESCRIPTION
TypeError

If additional_special_tokens is not a list.

ValueError

If additional_special_tokens contain an incorrectly shifted list of unknown tokens.
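
The two error paths can be exercised without a valid vocabulary file, since the checks run before the SentencePiece model is loaded. A sketch follows; the file name `spiece.model` is only illustrative.

```python
from mindnlp.transformers.models.pegasus.tokenization_pegasus import PegasusTokenizer

try:
    # not a list -> TypeError (raised before the vocab file is read)
    PegasusTokenizer("spiece.model", additional_special_tokens="<extra>")
except TypeError as err:
    print(err)

try:
    # "<unk_2>" collides with the auto-filled <unk_2> ... <unk_102> range -> ValueError
    PegasusTokenizer("spiece.model", additional_special_tokens=["<unk_2>"])
except ValueError as err:
    print(err)
```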

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def __init__(
    self,
    vocab_file,
    pad_token="<pad>",
    eos_token="</s>",
    unk_token="<unk>",
    mask_token="<mask_2>",
    mask_token_sent="<mask_1>",
    additional_special_tokens=None,
    offset=103,  # entries 2 - 104 are only used for pretraining
    sp_model_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> None:
    """
    Initialize a PegasusTokenizer object.

    Args:
        vocab_file (str): Path to the vocabulary file.
        pad_token (str, optional): Token representing padding. Default is '<pad>'.
        eos_token (str, optional): Token representing end of sentence. Default is '</s>'.
        unk_token (str, optional): Token representing unknown tokens. Default is '<unk>'.
        mask_token (str, optional): Token representing masked tokens. Default is '<mask_2>'.
        mask_token_sent (str, optional): Token representing masked tokens at sentence level. Default is '<mask_1>'.
        additional_special_tokens (List[str], optional): List of additional special tokens. Default is None.
        offset (int): Offset value for special tokens.
        sp_model_kwargs (Optional[Dict[str, Any]], optional): Additional arguments for SentencePieceProcessor.
            Default is None.

    Returns:
        None

    Raises:
        TypeError: If additional_special_tokens is not a list.
        ValueError: If additional_special_tokens contain an incorrectly shifted list of unknown tokens.
    """
    self.offset = offset
    if additional_special_tokens is not None:
        if not isinstance(additional_special_tokens, list):
            raise TypeError(
                f"additional_special_tokens should be of type {type(list)}, but is"
                f" {type(additional_special_tokens)}"
            )
        additional_special_tokens_extended = (
            ([mask_token_sent] + additional_special_tokens)
            if mask_token_sent not in additional_special_tokens and mask_token_sent is not None
            else additional_special_tokens
        )
        # fill additional tokens with ..., <unk_token_102> in case not all additional tokens are already taken
        additional_special_tokens_extended += [
            f"<unk_{i}>" for i in range(len(additional_special_tokens_extended), self.offset - 1)
        ]

        if len(set(additional_special_tokens_extended)) != len(additional_special_tokens_extended):
            raise ValueError(
                "Please make sure that the provided additional_special_tokens do not contain an incorrectly"
                f" shifted list of <unk_x> tokens. Found {additional_special_tokens_extended}."
            )
        additional_special_tokens = additional_special_tokens_extended
    else:
        additional_special_tokens_extended = []
        additional_special_tokens = [mask_token_sent] if mask_token_sent is not None else []
        additional_special_tokens += [f"<unk_{i}>" for i in range(2, self.offset)]

    self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
    self.mask_token_sent = mask_token_sent
    self.vocab_file = vocab_file
    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(vocab_file)

    _added_tokens_decoder = {
        0: AddedToken(str(pad_token), special=True),
        1: AddedToken(str(eos_token), special=True),
    }

    if self.mask_token_sent is not None:
        _added_tokens_decoder[2] = AddedToken(mask_token_sent, special=True)
        _added_tokens_decoder[3] = AddedToken(str(mask_token), special=True)

    for i in range(2, self.offset):
        _added_tokens_decoder[len(_added_tokens_decoder)] = AddedToken(f"<unk_{i}>", special=True)

    # Force update as we want to make sure vocab is enforced (same as fast)
    self._added_tokens_decoder = kwargs.pop("added_tokens_decoder", {})
    self._added_tokens_decoder.update(_added_tokens_decoder)

    super().__init__(
        eos_token=eos_token,
        unk_token=unk_token,
        mask_token=mask_token,
        pad_token=pad_token,
        mask_token_sent=mask_token_sent,
        offset=offset,
        additional_special_tokens=additional_special_tokens,
        sp_model_kwargs=self.sp_model_kwargs,
        **kwargs,
    )

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.__setstate__(d)

This method, `__setstate__`, is defined within the class PegasusTokenizer and is used to restore the internal state of the tokenizer object from the provided dictionary 'd'.

PARAMETER DESCRIPTION
self

The instance of the PegasusTokenizer class on which this method is called.

TYPE: PegasusTokenizer

d

A dictionary containing the state information to be set on the tokenizer object. This dictionary is expected to hold the necessary data for setting the state of the tokenizer.

TYPE: dict

RETURNS DESCRIPTION
None

This method does not return any value explicitly. It updates the internal state of the PegasusTokenizer object based on the provided dictionary 'd'.

RAISES DESCRIPTION
None

This method does not raise exceptions directly; however, `SentencePieceProcessor.Load` may raise an error if the vocabulary file referenced in the state cannot be loaded.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def __setstate__(self, d):
    """
    This method __setstate__ is defined within the class PegasusTokenizer and is used to set the internal state
    of the tokenizer object based on the provided dictionary 'd'.

    Args:
        self (PegasusTokenizer): The instance of the PegasusTokenizer class on which this method is called.
        d (dict): A dictionary containing the state information to be set on the tokenizer object.
            This dictionary is expected to hold the necessary data for setting the state of the tokenizer.

    Returns:
        None: This method does not return any value explicitly. It updates the internal state of the
            PegasusTokenizer object based on the provided dictionary 'd'.

    Raises:
        None: However, potential exceptions that could occur during the execution of this method may include any
            exceptions raised by the SentencePieceProcessor class methods like Load, if there are issues with
            loading the vocabulary file specified in the state information.
    """
    self.__dict__ = d

    # for backward compatibility
    if not hasattr(self, "sp_model_kwargs"):
        self.sp_model_kwargs = {}

    self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
    self.sp_model.Load(self.vocab_file)
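
A short pickling sketch (assuming the `tokenizer` instance from the earlier usage example, with its vocabulary file still present on disk): `__getstate__` drops the SentencePiece processor from the state and `__setstate__` rebuilds it from `vocab_file`.

```python
import pickle

restored = pickle.loads(pickle.dumps(tokenizer))
text = "gap sentence generation"
assert restored.tokenize(text) == tokenizer.tokenize(text)
```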

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A PEGASUS sequence has the following format, where X represents the sequence:

  • single sequence: X </s>
  • pair of sequences: A B </s> (not intended use)

BOS is never used. Pairs of sequences are not the expected use case, but they will be handled without a separator.

PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: List of input IDs with the appropriate special tokens.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
    and adding special tokens. A PEGASUS sequence has the following format, where `X` represents the sequence:

    - single sequence: `X </s>`
    - pair of sequences: `A B </s>` (not intended use)

    BOS is never used. Pairs of sequences are not the expected use case, but they will be handled without a
    separator.

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    if token_ids_1 is None:
        return token_ids_0 + [self.eos_token_id]
    # We don't expect to process pairs, but leave the pair logic for API consistency
    return token_ids_0 + token_ids_1 + [self.eos_token_id]
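
A small sketch (assuming the `tokenizer` instance from the earlier usage example): only the `</s>` id is appended, which matches `num_special_tokens_to_add()` returning 1.

```python
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello world"))
assert tokenizer.build_inputs_with_special_tokens(ids) == ids + [tokenizer.eos_token_id]
assert tokenizer.num_special_tokens_to_add() == 1
```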

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.convert_tokens_to_string(tokens)

Converts a sequence of tokens (strings) into a single string.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    current_sub_tokens = []
    out_string = ""
    for token in tokens:
        # make sure that special tokens are not decoded using sentencepiece model
        if token in self.all_special_tokens:
            out_string += self.sp_model.decode(current_sub_tokens) + token
            current_sub_tokens = []
        else:
            current_sub_tokens.append(token)
    out_string += self.sp_model.decode(current_sub_tokens)
    return out_string.strip()
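
For illustration (assuming the `tokenizer` instance from the earlier usage example; the exact sub-tokens depend on the SentencePiece model):

```python
tokens = tokenizer.tokenize("Summarize this article.")
print(tokens)                                      # e.g. ['▁Summar', 'ize', '▁this', '▁article', '.']
print(tokenizer.convert_tokens_to_string(tokens))  # "Summarize this article."
```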

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Get a list whose entries are 1 if the corresponding token is a special token (such as eos or pad) and 0 otherwise.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def get_special_tokens_mask(
    self, token_ids_0: List, token_ids_1: Optional[List] = None, already_has_special_tokens: bool = False
) -> List[int]:
    """Get list where entries are [1] if a token is [eos] or [pad] else 0."""
    if already_has_special_tokens:
        return self._special_token_mask(token_ids_0)
    elif token_ids_1 is None:
        return self._special_token_mask(token_ids_0) + [1]
    else:
        return self._special_token_mask(token_ids_0 + token_ids_1) + [1]
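
A small sketch (assuming the `tokenizer` instance from the earlier usage example): the trailing 1 accounts for the `</s>` token that `build_inputs_with_special_tokens` would append.

```python
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello"))
print(tokenizer.get_special_tokens_mask(ids))          # [0, ..., 0, 1]
print(tokenizer.get_special_tokens_mask(
    ids + [tokenizer.eos_token_id], already_has_special_tokens=True))  # [0, ..., 0, 1]
```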

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.get_vocab()

Returns the vocabulary of the PegasusTokenizer.

PARAMETER DESCRIPTION
self

An instance of the PegasusTokenizer class.

RETURNS DESCRIPTION
dict

A dictionary containing the vocabulary of the tokenizer, where the keys are strings representing tokens and the values are integers representing their corresponding ids.

TYPE: Dict[str, int]

Note

The vocabulary includes both the base tokenizer's vocabulary and any additional tokens that have been added using the add_tokens method.

Example
>>> tokenizer = PegasusTokenizer()
>>> vocab = tokenizer.get_vocab()
>>> print(vocab)
{'<s>': 0, '</s>': 1, '<unk>': 2, '<pad>': 3, '<mask>': 4, 'additional_token': 5, ...}
Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def get_vocab(self) -> Dict[str, int]:
    """
    Returns the vocabulary of the PegasusTokenizer.

    Args:
        self: An instance of the PegasusTokenizer class.

    Returns:
        dict:
            A dictionary containing the vocabulary of the tokenizer, where the keys are strings representing tokens
            and the values are integers representing their corresponding ids.

    Raises:
        None.

    Note:
        The vocabulary includes both the base tokenizer's vocabulary and any additional tokens that
        have been added using the `add_tokens` method.

    Example:
        ```python
        >>> tokenizer = PegasusTokenizer()
        >>> vocab = tokenizer.get_vocab()
        >>> print(vocab)
        {'<s>': 0, '</s>': 1, '<unk>': 2, '<pad>': 3, '<mask>': 4, 'additional_token': 5, ...}
        ```
    """
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.num_special_tokens_to_add(pair=False)

Just EOS

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def num_special_tokens_to_add(self, pair=False):
    """Just EOS"""
    return 1

mindnlp.transformers.models.pegasus.tokenization_pegasus.PegasusTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary files for the Pegasus Tokenizer.

PARAMETER DESCRIPTION
self

An instance of the PegasusTokenizer class.

TYPE: PegasusTokenizer

save_directory

The directory path where the vocabulary files will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the filename. Defaults to None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the file path of the saved vocabulary file.

RAISES DESCRIPTION
OSError

If the save_directory path is not a valid directory.

This method saves the vocabulary files required for the Pegasus Tokenizer. The save_directory parameter specifies the directory path where the vocabulary files will be saved. If filename_prefix is provided, it will be added as a prefix to the filename. The saved vocabulary file path is returned as a tuple containing a single string value.

If the save_directory path is not a valid directory, an OSError will be raised.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus.py
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """Save the vocabulary files for the Pegasus Tokenizer.

    Args:
        self (PegasusTokenizer): An instance of the PegasusTokenizer class.
        save_directory (str): The directory path where the vocabulary files will be saved.
        filename_prefix (Optional[str], optional): An optional prefix to be added to the filename. Defaults to None.

    Returns:
        Tuple[str]: A tuple containing the file path of the saved vocabulary file.

    Raises:
        OSError: If the `save_directory` path is not a valid directory.

    This method saves the vocabulary files required for the Pegasus Tokenizer. 
    The `save_directory` parameter specifies the directory path where the vocabulary files will be saved. 
    If `filename_prefix` is provided, it will be added as a prefix to the filename. 
    The saved vocabulary file path is returned as a tuple containing a single string value.

    If the `save_directory` path is not a valid directory, an OSError will be raised.
    """
    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    out_vocab_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
    )

    if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
        copyfile(self.vocab_file, out_vocab_file)
    elif not os.path.isfile(self.vocab_file):
        with open(out_vocab_file, "wb") as fi:
            content_spiece_model = self.sp_model.serialized_model_proto()
            fi.write(content_spiece_model)

    return (out_vocab_file,)
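
A saving sketch (assuming the `tokenizer` instance from the earlier usage example); the output file name comes from `VOCAB_FILES_NAMES["vocab_file"]` and is expected to be `spiece.model`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as save_dir:
    (vocab_path,) = tokenizer.save_vocabulary(save_dir)
    print(os.path.basename(vocab_path))  # expected: "spiece.model"
```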

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast

Tokenization class for model PEGASUS.

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast

Bases: PreTrainedTokenizerFast

Construct a "fast" PEGASUS tokenizer (backed by HuggingFace's tokenizers library). Based on Unigram.

This tokenizer inherits from [PreTrainedTokenizerFast] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer.

TYPE: `str` DEFAULT: None

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

eos_token

The end of sequence token.

When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the sep_token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

mask_token

The token used for masking single token values. This is the token used when training this model with masked language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining. It corresponds to [MASK2] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.

TYPE: `str`, *optional*, defaults to `"<mask_2>"` DEFAULT: '<mask_2>'

mask_token_sent

The token used for masking whole target sentences. This is the token used when training this model with gap sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during pretraining. It corresponds to [MASK1] in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.

TYPE: `str`, *optional*, defaults to `"<mask_1>"` DEFAULT: '<mask_1>'

additional_special_tokens

Additional special tokens used by the tokenizer. If no additional_special_tokens are provided, `<mask_2>` and `<unk_2>, ..., <unk_102>` are used as additional special tokens, matching the original PEGASUS tokenizer, which uses the tokens 2 - 104 only for pretraining.

TYPE: `List[str]`, *optional* DEFAULT: None
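
A minimal usage sketch (the checkpoint name is an assumption; the slow and fast tokenizers are expected to produce the same ids for the same input):

```python
from mindnlp.transformers.models.pegasus.tokenization_pegasus_fast import PegasusTokenizerFast

fast_tokenizer = PegasusTokenizerFast.from_pretrained("google/pegasus-xsum")  # hypothetical checkpoint
print(fast_tokenizer("PEGASUS summarization example.")["input_ids"])
```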

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus_fast.py
class PegasusTokenizerFast(PreTrainedTokenizerFast):
    r"""
    Construct a "fast" PEGASUS tokenizer (backed by HuggingFace's *tokenizers* library). Based on
    [Unigram](https://hf-mirror.com/docs/tokenizers/python/latest/components.html?highlight=unigram#models).

    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
    refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
            contains the vocabulary necessary to instantiate a tokenizer.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the end of sequence.
            The token used is the `sep_token`.

            </Tip>

        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        mask_token (`str`, *optional*, defaults to `"<mask_2>"`):
            The token used for masking single token values. This is the token used when training this model with masked
            language modeling (MLM). This is the token that the PEGASUS encoder will try to predict during pretraining.
            It corresponds to *[MASK2]* in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive
            Summarization](https://arxiv.org/pdf/1912.08777.pdf).
        mask_token_sent (`str`, *optional*, defaults to `"<mask_1>"`):
            The token used for masking whole target sentences. This is the token used when training this model with gap
            sentences generation (GSG). This is the sentence that the PEGASUS decoder will try to predict during
            pretraining. It corresponds to *[MASK1]* in [PEGASUS: Pre-training with Extracted Gap-sentences for
            Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
        additional_special_tokens (`List[str]`, *optional*):
            Additional special tokens used by the tokenizer. If no additional_special_tokens are provided <mask_2> and
            <unk_2, ..., unk_102> are used as additional special tokens corresponding to the [original PEGASUS
            tokenizer](https://github.com/google-research/pegasus/blob/939830367bcf411193d2b5eca2f2f90f3f9260ca/pegasus/ops/pretrain_parsing_ops.cc#L66)
            that uses the tokens 2 - 104 only for pretraining
    """
    vocab_files_names = VOCAB_FILES_NAMES
    slow_tokenizer_class = PegasusTokenizer
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        vocab_file=None,
        tokenizer_file=None,
        pad_token="<pad>",
        eos_token="</s>",
        unk_token="<unk>",
        mask_token="<mask_2>",
        mask_token_sent="<mask_1>",
        additional_special_tokens=None,
        offset=103,  # entries 2 - 104 are only used for pretraining
        **kwargs,
    ):
        """
        This method initializes an instance of the PegasusTokenizerFast class.

        Args:
            self: The instance of the class.
            vocab_file (str): Path to the vocabulary file. Defaults to None.
            tokenizer_file (str): Path to the tokenizer file. Defaults to None.
            pad_token (str): Special token representing padding. Defaults to '<pad>'.
            eos_token (str): Special token representing end of sequence. Defaults to '</s>'.
            unk_token (str): Special token representing unknown tokens. Defaults to '<unk>'.
            mask_token (str): Special token for masking tokens. Defaults to '<mask_2>'.
            mask_token_sent (str): Special token for masking sentences. Defaults to '<mask_1>'.
            additional_special_tokens (list): List of additional special tokens. Defaults to None.
            offset (int): Offset value for special tokens. Defaults to 103.

        Returns:
            None.

        Raises:
            TypeError: If additional_special_tokens is not a list.
            ValueError: If the provided additional_special_tokens contain an incorrectly shifted list of unknown tokens.
        """
        self.offset = offset

        if additional_special_tokens is not None:
            if not isinstance(additional_special_tokens, list):
                raise TypeError(
                    f"additional_special_tokens should be of type {type(list)}, but is"
                    f" {type(additional_special_tokens)}"
                )

            additional_special_tokens_extended = (
                ([mask_token_sent] + additional_special_tokens)
                if mask_token_sent not in additional_special_tokens and mask_token_sent is not None
                else additional_special_tokens
            )
            # fill additional tokens with ..., <unk_token_102> in case not all additional tokens are already taken
            additional_special_tokens_extended += [
                f"<unk_{i}>" for i in range(len(additional_special_tokens_extended), self.offset - 1)
            ]

            if len(set(additional_special_tokens_extended)) != len(additional_special_tokens_extended):
                raise ValueError(
                    "Please make sure that the provided additional_special_tokens do not contain an incorrectly"
                    f" shifted list of <unk_x> tokens. Found {additional_special_tokens_extended}."
                )
            additional_special_tokens = additional_special_tokens_extended
        else:
            additional_special_tokens = [mask_token_sent] if mask_token_sent is not None else []
            additional_special_tokens += [f"<unk_{i}>" for i in range(2, self.offset)]

        # pegasus was design to support changing the index of the first tokens. If one of the padding/eos/unk/mask token
        # is different from default, we must rebuild the vocab
        from_slow = kwargs.pop("from_slow", None)
        from_slow = from_slow or str(pad_token) != "<pad>" or str(eos_token) != "</s>" or str(unk_token) != "<unk>"

        kwargs.pop("added_tokens_decoder", {})

        super().__init__(
            vocab_file,
            tokenizer_file=tokenizer_file,
            pad_token=pad_token,
            eos_token=eos_token,
            unk_token=unk_token,
            mask_token=mask_token,
            mask_token_sent=mask_token_sent,
            offset=offset,
            additional_special_tokens=additional_special_tokens,
            from_slow=from_slow,
            **kwargs,
        )
        self.vocab_file = vocab_file

    @property
    def can_save_slow_tokenizer(self) -> bool:
        """
        Check whether the slow tokenizer can be saved.

        Args:
            self (PegasusTokenizerFast): The instance of the PegasusTokenizerFast class.

        Returns:
            bool: Returns True if the vocab_file exists and is a valid file path, False otherwise.

        Raises:
            None
        """
        return os.path.isfile(self.vocab_file) if self.vocab_file else False

    def _special_token_mask(self, seq):
        """
        Special Token Mask method in the PegasusTokenizerFast class.

        This method creates a special token mask for a sequence.

        Args:
            self (PegasusTokenizerFast): The instance of the PegasusTokenizerFast class.
            seq (List[int]): The input sequence for which the special token mask is to be created.

        Returns:
            List[int]: A list of integers representing the special token mask for the input sequence.
                The value 1 indicates that the token is a special token, while 0 indicates a regular token.

        Raises:
            ValueError: If the number or types of special tokens do not match the expected configuration,
                a ValueError is raised.
        """
        all_special_ids = set(self.all_special_ids)  # call it once instead of inside list comp
        all_special_ids.remove(self.unk_token_id)  # <unk> is only sometimes special

        if all_special_ids != set(range(len(self.additional_special_tokens) + 3)):
            raise ValueError(
                "There should be 3 special tokens: mask_token, pad_token, and eos_token +"
                f" {len(self.additional_special_tokens)} additional_special_tokens, but got {all_special_ids}"
            )

        return [1 if x in all_special_ids else 0 for x in seq]

    def get_special_tokens_mask(
        self, token_ids_0: List, token_ids_1: Optional[List] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """Get list where entries are [1] if a token is [eos] or [pad] else 0."""
        if already_has_special_tokens:
            return self._special_token_mask(token_ids_0)
        elif token_ids_1 is None:
            return self._special_token_mask(token_ids_0) + [1]
        else:
            return self._special_token_mask(token_ids_0 + token_ids_1) + [1]

    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None) -> List[int]:
        r"""
        Build model inputs from a sequence by adding eos to the end. no bos token is added to the front.

        - single sequence: `X </s>`
        - pair of sequences: `A B </s>` (not intended use)

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: list of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return token_ids_0 + [self.eos_token_id]
        # We don't expect to process pairs, but leave the pair logic for API consistency
        return token_ids_0 + token_ids_1 + [self.eos_token_id]

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary to the specified directory with an optional filename prefix.

        Args:
            self (PegasusTokenizerFast): The instance of the PegasusTokenizerFast class.
            save_directory (str): The directory path where the vocabulary will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the vocabulary filename. Default is None.

        Returns:
            Tuple[str]: A tuple containing the path to the saved vocabulary file.

        Raises:
            ValueError: If the fast tokenizer does not have the necessary information to save the vocabulary for
                a slow tokenizer.
            OSError: If the save_directory provided is not a valid directory path.
        """
        if not self.can_save_slow_tokenizer:
            raise ValueError(
                "Your fast tokenizer does not have the necessary information to save the vocabulary for a slow "
                "tokenizer."
            )

        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )

        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
            copyfile(self.vocab_file, out_vocab_file)

        return (out_vocab_file,)
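
The example below is a hand-written usage sketch rather than generated reference material. It assumes the PEGASUS tokenizer is re-exported as `mindnlp.transformers.PegasusTokenizer` and that the `google/pegasus-xsum` checkpoint is available; it shows that `build_inputs_with_special_tokens` only appends `</s>` and that `get_special_tokens_mask` flags that position with 1.

```python
# Usage sketch (import path and checkpoint name are assumptions, not library docs).
from mindnlp.transformers import PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")

# Raw token ids, without any special tokens.
ids = tokenizer.encode("PEGASUS is a summarization model.", add_special_tokens=False)

# build_inputs_with_special_tokens appends the eos id only; no bos is prepended.
with_specials = tokenizer.build_inputs_with_special_tokens(ids)
assert with_specials == ids + [tokenizer.eos_token_id]

# get_special_tokens_mask marks the trailing eos with 1 and regular tokens with 0.
mask = tokenizer.get_special_tokens_mask(ids)
assert mask == [0] * len(ids) + [1]
```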

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast.can_save_slow_tokenizer: bool property

Check whether the slow tokenizer can be saved.

PARAMETER DESCRIPTION
self

The instance of the PegasusTokenizerFast class.

TYPE: PegasusTokenizerFast

RETURNS DESCRIPTION
bool

Returns True if the vocab_file exists and is a valid file path, False otherwise.

TYPE: bool

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast.__init__(vocab_file=None, tokenizer_file=None, pad_token='<pad>', eos_token='</s>', unk_token='<unk>', mask_token='<mask_2>', mask_token_sent='<mask_1>', additional_special_tokens=None, offset=103, **kwargs)

This method initializes an instance of the PegasusTokenizerFast class.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_file

Path to the vocabulary file. Defaults to None.

TYPE: str DEFAULT: None

tokenizer_file

Path to the tokenizer file. Defaults to None.

TYPE: str DEFAULT: None

pad_token

Special token representing padding. Defaults to '<pad>'.

TYPE: str DEFAULT: '<pad>'

eos_token

Special token representing end of sequence. Defaults to '</s>'.

TYPE: str DEFAULT: '</s>'

unk_token

Special token representing unknown tokens. Defaults to '<unk>'.

TYPE: str DEFAULT: '<unk>'

mask_token

Special token for masking tokens. Defaults to '<mask_2>'.

TYPE: str DEFAULT: '<mask_2>'

mask_token_sent

Special token for masking sentences. Defaults to '<mask_1>'.

TYPE: str DEFAULT: '<mask_1>'

additional_special_tokens

List of additional special tokens. Defaults to None.

TYPE: list DEFAULT: None

offset

Offset value for special tokens. Defaults to 103.

TYPE: int DEFAULT: 103

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If additional_special_tokens is not a list.

ValueError

If the provided additional_special_tokens contain an incorrectly shifted list of unknown tokens.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus_fast.py, lines 88-172
def __init__(
    self,
    vocab_file=None,
    tokenizer_file=None,
    pad_token="<pad>",
    eos_token="</s>",
    unk_token="<unk>",
    mask_token="<mask_2>",
    mask_token_sent="<mask_1>",
    additional_special_tokens=None,
    offset=103,  # entries 2 - 104 are only used for pretraining
    **kwargs,
):
    """
    This method initializes an instance of the PegasusTokenizerFast class.

    Args:
        self: The instance of the class.
        vocab_file (str): Path to the vocabulary file. Defaults to None.
        tokenizer_file (str): Path to the tokenizer file. Defaults to None.
        pad_token (str): Special token representing padding. Defaults to '<pad>'.
        eos_token (str): Special token representing end of sequence. Defaults to '</s>'.
        unk_token (str): Special token representing unknown tokens. Defaults to '<unk>'.
        mask_token (str): Special token for masking tokens. Defaults to '<mask_2>'.
        mask_token_sent (str): Special token for masking sentences. Defaults to '<mask_1>'.
        additional_special_tokens (list): List of additional special tokens. Defaults to None.
        offset (int): Offset value for special tokens. Defaults to 103.

    Returns:
        None.

    Raises:
        TypeError: If additional_special_tokens is not a list.
        ValueError: If the provided additional_special_tokens contain an incorrectly shifted list of unknown tokens.
    """
    self.offset = offset

    if additional_special_tokens is not None:
        if not isinstance(additional_special_tokens, list):
            raise TypeError(
                f"additional_special_tokens should be of type {type(list)}, but is"
                f" {type(additional_special_tokens)}"
            )

        additional_special_tokens_extended = (
            ([mask_token_sent] + additional_special_tokens)
            if mask_token_sent not in additional_special_tokens and mask_token_sent is not None
            else additional_special_tokens
        )
        # fill additional tokens with ..., <unk_token_102> in case not all additional tokens are already taken
        additional_special_tokens_extended += [
            f"<unk_{i}>" for i in range(len(additional_special_tokens_extended), self.offset - 1)
        ]

        if len(set(additional_special_tokens_extended)) != len(additional_special_tokens_extended):
            raise ValueError(
                "Please make sure that the provided additional_special_tokens do not contain an incorrectly"
                f" shifted list of <unk_x> tokens. Found {additional_special_tokens_extended}."
            )
        additional_special_tokens = additional_special_tokens_extended
    else:
        additional_special_tokens = [mask_token_sent] if mask_token_sent is not None else []
        additional_special_tokens += [f"<unk_{i}>" for i in range(2, self.offset)]

    # pegasus was designed to support changing the index of the first tokens. If one of the padding/eos/unk/mask tokens
    # is different from the default, we must rebuild the vocab
    from_slow = kwargs.pop("from_slow", None)
    from_slow = from_slow or str(pad_token) != "<pad>" or str(eos_token) != "</s>" or str(unk_token) != "<unk>"

    kwargs.pop("added_tokens_decoder", {})

    super().__init__(
        vocab_file,
        tokenizer_file=tokenizer_file,
        pad_token=pad_token,
        eos_token=eos_token,
        unk_token=unk_token,
        mask_token=mask_token,
        mask_token_sent=mask_token_sent,
        offset=offset,
        additional_special_tokens=additional_special_tokens,
        from_slow=from_slow,
        **kwargs,
    )
    self.vocab_file = vocab_file
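
The `offset` bookkeeping above can be reproduced in plain Python. With the default `offset=103` and no user-supplied `additional_special_tokens`, the constructor reserves `<mask_1>` plus the `<unk_2>` ... `<unk_102>` placeholders, i.e. the vocabulary entries used only during pretraining. A standalone sketch of that default branch:

```python
# Standalone sketch of the default additional_special_tokens construction above.
mask_token_sent = "<mask_1>"
offset = 103  # entries 2 - 104 are only used for pretraining

additional_special_tokens = [mask_token_sent] if mask_token_sent is not None else []
additional_special_tokens += [f"<unk_{i}>" for i in range(2, offset)]

print(len(additional_special_tokens))  # 102: <mask_1> plus <unk_2> ... <unk_102>
print(additional_special_tokens[:3])   # ['<mask_1>', '<unk_2>', '<unk_3>']
print(additional_special_tokens[-1])   # '<unk_102>'
```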

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence by adding eos to the end. No bos token is added to the front.

  • single sequence: X </s>
  • pair of sequences: A B </s> (not intended use)
PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: list of input IDs with the appropriate special tokens.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus_fast.py, lines 230-249
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None) -> List[int]:
    r"""
    Build model inputs from a sequence by adding eos to the end. no bos token is added to the front.

    - single sequence: `X </s>`
    - pair of sequences: `A B </s>` (not intended use)

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: list of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    if token_ids_1 is None:
        return token_ids_0 + [self.eos_token_id]
    # We don't expect to process pairs, but leave the pair logic for API consistency
    return token_ids_0 + token_ids_1 + [self.eos_token_id]

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Get list where entries are [1] if a token is [eos] or [pad] else 0.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus_fast.py, lines 219-228
def get_special_tokens_mask(
    self, token_ids_0: List, token_ids_1: Optional[List] = None, already_has_special_tokens: bool = False
) -> List[int]:
    """Get list where entries are [1] if a token is [eos] or [pad] else 0."""
    if already_has_special_tokens:
        return self._special_token_mask(token_ids_0)
    elif token_ids_1 is None:
        return self._special_token_mask(token_ids_0) + [1]
    else:
        return self._special_token_mask(token_ids_0 + token_ids_1) + [1]

mindnlp.transformers.models.pegasus.tokenization_pegasus_fast.PegasusTokenizerFast.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary to the specified directory with an optional filename prefix.

PARAMETER DESCRIPTION
self

The instance of the PegasusTokenizerFast class.

TYPE: PegasusTokenizerFast

save_directory

The directory path where the vocabulary will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the vocabulary filename. Default is None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the path to the saved vocabulary file.

RAISES DESCRIPTION
ValueError

If the fast tokenizer does not have the necessary information to save the vocabulary for a slow tokenizer.

OSError

If the save_directory provided is not a valid directory path.

Source code in mindnlp/transformers/models/pegasus/tokenization_pegasus_fast.py, lines 251-284
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary to the specified directory with an optional filename prefix.

    Args:
        self (PegasusTokenizerFast): The instance of the PegasusTokenizerFast class.
        save_directory (str): The directory path where the vocabulary will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the vocabulary filename. Default is None.

    Returns:
        Tuple[str]: A tuple containing the path to the saved vocabulary file.

    Raises:
        ValueError: If the fast tokenizer does not have the necessary information to save the vocabulary for
            a slow tokenizer.
        OSError: If the save_directory provided is not a valid directory path.
    """
    if not self.can_save_slow_tokenizer:
        raise ValueError(
            "Your fast tokenizer does not have the necessary information to save the vocabulary for a slow "
            "tokenizer."
        )

    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    out_vocab_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
    )

    if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
        copyfile(self.vocab_file, out_vocab_file)

    return (out_vocab_file,)
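
As a usage note (a sketch, not generated documentation): `save_vocabulary` simply copies the SentencePiece model file into an existing directory, so the directory must be created first. The import path, checkpoint name, and resulting filename below are assumptions.

```python
# Sketch: copy the SentencePiece vocab file of a loaded tokenizer into a directory.
import os
from mindnlp.transformers import PegasusTokenizerFast

tokenizer = PegasusTokenizerFast.from_pretrained("google/pegasus-xsum")

save_dir = "./pegasus_vocab"
os.makedirs(save_dir, exist_ok=True)  # save_vocabulary expects an existing directory

(vocab_path,) = tokenizer.save_vocabulary(save_dir, filename_prefix="demo")
print(vocab_path)  # e.g. ./pegasus_vocab/demo-spiece.model
```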

mindnlp.transformers.models.pegasus.configuration_pegasus

PEGASUS model configuration

mindnlp.transformers.models.pegasus.configuration_pegasus.PegasusConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [PegasusModel]. It is used to instantiate a PEGASUS model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the PEGASUS google/pegasus-large architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the PEGASUS model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [PegasusModel] or [TFPegasusModel].

TYPE: `int`, *optional*, defaults to 50265 DEFAULT: 50265

d_model

Dimensionality of the layers and the pooler layer.

TYPE: `int`, *optional*, defaults to 1024 DEFAULT: 1024

encoder_layers

Number of encoder layers.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

decoder_layers

Number of decoder layers.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

encoder_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

decoder_attention_heads

Number of attention heads for each attention layer in the Transformer decoder.

TYPE: `int`, *optional*, defaults to 16 DEFAULT: 16

decoder_ffn_dim

Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

encoder_ffn_dim

Dimensionality of the "intermediate" (often named feed-forward) layer in encoder.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

activation_function

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `function`, *optional*, defaults to `"gelu"` DEFAULT: 'gelu'

dropout

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_dropout

The dropout ratio for the attention probabilities.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

activation_dropout

The dropout ratio for activations inside the fully connected layer.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

max_position_embeddings

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 1024 DEFAULT: 1024

init_std

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

encoder_layerdrop

The LayerDrop probability for the encoder. See the LayerDrop paper for more details.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

decoder_layerdrop

The LayerDrop probability for the decoder. See the LayerDrop paper for more details.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

scale_embedding

Scale embeddings by dividing by sqrt(d_model).

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

use_cache

Whether or not the model should return the last key/values attentions (not used by all models)

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

forced_eos_token_id

The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

Example
>>> from transformers import PegasusConfig, PegasusModel
...
>>> # Initializing a PEGASUS google/pegasus-large style configuration
>>> configuration = PegasusConfig()
...
>>> # Initializing a model (with random weights) from the google/pegasus-large style configuration
>>> model = PegasusModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp/transformers/models/pegasus/configuration_pegasus.py, lines 36-261
class PegasusConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`PegasusModel`]. It is used to instantiate an
    PEGASUS model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the PEGASUS
    [google/pegasus-large](https://hf-mirror.com/google/pegasus-large) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 50265):
            Vocabulary size of the PEGASUS model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`PegasusModel`] or [`TFPegasusModel`].
        d_model (`int`, *optional*, defaults to 1024):
            Dimensionality of the layers and the pooler layer.
        encoder_layers (`int`, *optional*, defaults to 12):
            Number of encoder layers.
        decoder_layers (`int`, *optional*, defaults to 12):
            Number of decoder layers.
        encoder_attention_heads (`int`, *optional*, defaults to 16):
            Number of attention heads for each attention layer in the Transformer encoder.
        decoder_attention_heads (`int`, *optional*, defaults to 16):
            Number of attention heads for each attention layer in the Transformer decoder.
        decoder_ffn_dim (`int`, *optional*, defaults to 4096):
            Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
        encoder_ffn_dim (`int`, *optional*, defaults to 4096):
            Dimensionality of the "intermediate" (often named feed-forward) layer in encoder.
        activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for the attention probabilities.
        activation_dropout (`float`, *optional*, defaults to 0.0):
            The dropout ratio for activations inside the fully connected layer.
        max_position_embeddings (`int`, *optional*, defaults to 1024):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        init_std (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
            The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
            for more details.
        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
            The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
            for more details.
        scale_embedding (`bool`, *optional*, defaults to `False`):
            Scale embeddings by dividing by sqrt(d_model).
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models)
        forced_eos_token_id (`int`, *optional*, defaults to 1):
            The id of the token to force as the last generated token when `max_length` is reached. Usually set to
            `eos_token_id`.

    Example:
        ```python
        >>> from transformers import PegasusConfig, PegasusModel
        ...
        >>> # Initializing a PEGASUS google/pegasus-large style configuration
        >>> configuration = PegasusConfig()
        ...
        >>> # Initializing a model (with random weights) from the google/pegasus-large style configuration
        >>> model = PegasusModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "pegasus"
    keys_to_ignore_at_inference = ["past_key_values"]
    attribute_map = {"num_attention_heads": "encoder_attention_heads", "hidden_size": "d_model"}

    def __init__(
        self,
        vocab_size=50265,
        max_position_embeddings=1024,
        encoder_layers=12,
        encoder_ffn_dim=4096,
        encoder_attention_heads=16,
        decoder_layers=12,
        decoder_ffn_dim=4096,
        decoder_attention_heads=16,
        encoder_layerdrop=0.0,
        decoder_layerdrop=0.0,
        use_cache=True,
        is_encoder_decoder=True,
        activation_function="gelu",
        d_model=1024,
        dropout=0.1,
        attention_dropout=0.0,
        activation_dropout=0.0,
        init_std=0.02,
        initializer_range=0.02,
        decoder_start_token_id=0,
        scale_embedding=False,
        pad_token_id=0,
        eos_token_id=1,
        forced_eos_token_id=1,
        **kwargs,
    ):
        """
        Initializes a new PegasusConfig object with the provided configuration parameters.

        Args:
            self: The instance of the class.
            vocab_size (int, optional): The size of the vocabulary. Default is 50265.
            max_position_embeddings (int, optional): The maximum number of tokens in a sequence. Default is 1024.
            encoder_layers (int, optional): The number of layers in the encoder. Default is 12.
            encoder_ffn_dim (int, optional): The dimension of the feedforward network in the encoder layers. Default is 4096.
            encoder_attention_heads (int, optional): The number of attention heads in the encoder layers. Default is 16.
            decoder_layers (int, optional): The number of layers in the decoder. Default is 12.
            decoder_ffn_dim (int, optional): The dimension of the feedforward network in the decoder layers. Default is 4096.
            decoder_attention_heads (int, optional): The number of attention heads in the decoder layers. Default is 16.
            encoder_layerdrop (float, optional): The probability of dropping a layer in the encoder. Default is 0.0.
            decoder_layerdrop (float, optional): The probability of dropping a layer in the decoder. Default is 0.0.
            use_cache (bool, optional): Whether to use caching for the model. Default is True.
            is_encoder_decoder (bool, optional): Whether the model is an encoder-decoder model. Default is True.
            activation_function (str, optional): The activation function to be used. Default is 'gelu'.
            d_model (int, optional): The dimension of the model. Default is 1024.
            dropout (float, optional): The dropout probability. Default is 0.1.
            attention_dropout (float, optional): The dropout probability for attention layers. Default is 0.0.
            activation_dropout (float, optional): The dropout probability for activation layers. Default is 0.0.
            init_std (float, optional): The standard deviation for weight initialization. Default is 0.02.
            initializer_range (float, optional): The range for weight initialization. Default is 0.02.
            decoder_start_token_id (int, optional): The token id for the start of the decoder sequence. Default is 0.
            scale_embedding (bool, optional): Whether to scale embeddings. Default is False.
            pad_token_id (int, optional): The token id for padding. Default is 0.
            eos_token_id (int, optional): The token id for end of sequence. Default is 1.
            forced_eos_token_id (int, optional): The token id for forced end of sequence. Default is 1.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None.
        """
        self.vocab_size = vocab_size
        self.max_position_embeddings = max_position_embeddings
        self.d_model = d_model
        self.encoder_ffn_dim = encoder_ffn_dim
        self.encoder_layers = encoder_layers
        self.encoder_attention_heads = encoder_attention_heads
        self.decoder_ffn_dim = decoder_ffn_dim
        self.decoder_layers = decoder_layers
        self.decoder_attention_heads = decoder_attention_heads
        self.dropout = dropout
        self.attention_dropout = attention_dropout
        self.activation_dropout = activation_dropout
        self.activation_function = activation_function
        self.init_std = init_std
        self.initializer_range = initializer_range
        self.encoder_layerdrop = encoder_layerdrop
        self.decoder_layerdrop = decoder_layerdrop
        self.use_cache = use_cache
        self.num_hidden_layers = encoder_layers
        self.scale_embedding = scale_embedding  # scale factor will be sqrt(d_model) if True
        super().__init__(
            pad_token_id=pad_token_id,
            eos_token_id=eos_token_id,
            is_encoder_decoder=is_encoder_decoder,
            decoder_start_token_id=decoder_start_token_id,
            forced_eos_token_id=forced_eos_token_id,
            **kwargs,
        )

    @property
    def num_attention_heads(self) -> int:
        """
        Returns the number of attention heads in the Pegasus model's encoder.

        Args:
            self (PegasusConfig): The current instance of the PegasusConfig class.

        Returns:
            int: The number of attention heads used in the encoder of the Pegasus model.

        Raises:
            None.


        The `num_attention_heads` method returns an integer value representing the number of attention heads used
        in the encoder of the Pegasus model. Attention heads are a key component of transformer models, and they enable
        the model to focus on different parts of the input sequence during processing. By varying the number of
        attention heads, the model can capture different levels of information and dependencies in the input data.

        This method is a property, which means that it can be accessed as an attribute without needing to call
        it explicitly as a function. When accessed, it directly returns the number of attention heads specified in the
        `encoder_attention_heads` attribute of the current instance of the PegasusConfig class.

        Note that the `num_attention_heads` method does not take any additional parameters beyond the `self` parameter,
        as it is designed to provide information specific to the current instance of the class.

        Example:
            ```python
            >>> config = PegasusConfig()
            >>> num_heads = config.num_attention_heads
            >>> print(num_heads)
            16
            ```

        In this example, a new instance of the PegasusConfig class is created. The `num_attention_heads` property is
        accessed as an attribute (`config.num_attention_heads`), and the resulting number of attention heads
        (16 in this case) is printed.
        """
        return self.encoder_attention_heads

    @property
    def hidden_size(self) -> int:
        """
        Returns the hidden size of the PegasusConfig object.

        Args:
            self: The PegasusConfig object.

        Returns:
            int: The hidden size of the PegasusConfig object.
                This value represents the size of the hidden state in the model.

        Raises:
            None.
        """
        return self.d_model
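
The class-level `attribute_map` makes the two properties above behave as read aliases for the underlying configuration fields. A quick check (a sketch assuming `PegasusConfig` is re-exported from `mindnlp.transformers`):

```python
# Sketch: hidden_size and num_attention_heads mirror d_model and
# encoder_attention_heads through the properties shown above.
from mindnlp.transformers import PegasusConfig

config = PegasusConfig()
assert config.hidden_size == config.d_model == 1024
assert config.num_attention_heads == config.encoder_attention_heads == 16
```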

mindnlp.transformers.models.pegasus.configuration_pegasus.PegasusConfig.hidden_size: int property

Returns the hidden size of the PegasusConfig object.

PARAMETER DESCRIPTION
self

The PegasusConfig object.

RETURNS DESCRIPTION
int

The hidden size of the PegasusConfig object. This value represents the size of the hidden state in the model.

TYPE: int

mindnlp.transformers.models.pegasus.configuration_pegasus.PegasusConfig.num_attention_heads: int property

Returns the number of attention heads in the Pegasus model's encoder.

PARAMETER DESCRIPTION
self

The current instance of the PegasusConfig class.

TYPE: PegasusConfig

RETURNS DESCRIPTION
int

The number of attention heads used in the encoder of the Pegasus model.

TYPE: int

The num_attention_heads method returns an integer value representing the number of attention heads used in the encoder of the Pegasus model. Attention heads are a key component of transformer models, and they enable the model to focus on different parts of the input sequence during processing. By varying the number of attention heads, the model can capture different levels of information and dependencies in the input data.

This method is a property, which means that it can be accessed as an attribute without needing to call it explicitly as a function. When accessed, it directly returns the number of attention heads specified in the encoder_attention_heads attribute of the current instance of the PegasusConfig class.

Note that the num_attention_heads method does not take any additional parameters beyond the self parameter, as it is designed to provide information specific to the current instance of the class.

Example
>>> config = PegasusConfig()
>>> num_heads = config.num_attention_heads
>>> print(num_heads)
16

In this example, a new instance of the PegasusConfig class is created. The num_attention_heads property is accessed as an attribute (config.num_attention_heads), and the resulting number of attention heads (16 in this case) is printed.

mindnlp.transformers.models.pegasus.configuration_pegasus.PegasusConfig.__init__(vocab_size=50265, max_position_embeddings=1024, encoder_layers=12, encoder_ffn_dim=4096, encoder_attention_heads=16, decoder_layers=12, decoder_ffn_dim=4096, decoder_attention_heads=16, encoder_layerdrop=0.0, decoder_layerdrop=0.0, use_cache=True, is_encoder_decoder=True, activation_function='gelu', d_model=1024, dropout=0.1, attention_dropout=0.0, activation_dropout=0.0, init_std=0.02, initializer_range=0.02, decoder_start_token_id=0, scale_embedding=False, pad_token_id=0, eos_token_id=1, forced_eos_token_id=1, **kwargs)

Initializes a new PegasusConfig object with the provided configuration parameters.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Default is 50265.

TYPE: int DEFAULT: 50265

max_position_embeddings

The maximum number of tokens in a sequence. Default is 1024.

TYPE: int DEFAULT: 1024

encoder_layers

The number of layers in the encoder. Default is 12.

TYPE: int DEFAULT: 12

encoder_ffn_dim

The dimension of the feedforward network in the encoder layers. Default is 4096.

TYPE: int DEFAULT: 4096

encoder_attention_heads

The number of attention heads in the encoder layers. Default is 16.

TYPE: int DEFAULT: 16

decoder_layers

The number of layers in the decoder. Default is 12.

TYPE: int DEFAULT: 12

decoder_ffn_dim

The dimension of the feedforward network in the decoder layers. Default is 4096.

TYPE: int DEFAULT: 4096

decoder_attention_heads

The number of attention heads in the decoder layers. Default is 16.

TYPE: int DEFAULT: 16

encoder_layerdrop

The probability of dropping a layer in the encoder. Default is 0.0.

TYPE: float DEFAULT: 0.0

decoder_layerdrop

The probability of dropping a layer in the decoder. Default is 0.0.

TYPE: float DEFAULT: 0.0

use_cache

Whether to use caching for the model. Default is True.

TYPE: bool DEFAULT: True

is_encoder_decoder

Whether the model is an encoder-decoder model. Default is True.

TYPE: bool DEFAULT: True

activation_function

The activation function to be used. Default is 'gelu'.

TYPE: str DEFAULT: 'gelu'

d_model

The dimension of the model. Default is 1024.

TYPE: int DEFAULT: 1024

dropout

The dropout probability. Default is 0.1.

TYPE: float DEFAULT: 0.1

attention_dropout

The dropout probability for attention layers. Default is 0.0.

TYPE: float DEFAULT: 0.0

activation_dropout

The dropout probability for activation layers. Default is 0.0.

TYPE: float DEFAULT: 0.0

init_std

The standard deviation for weight initialization. Default is 0.02.

TYPE: float DEFAULT: 0.02

initializer_range

The range for weight initialization. Default is 0.02.

TYPE: float DEFAULT: 0.02

decoder_start_token_id

The token id for the start of the decoder sequence. Default is 0.

TYPE: int DEFAULT: 0

scale_embedding

Whether to scale embeddings. Default is False.

TYPE: bool DEFAULT: False

pad_token_id

The token id for padding. Default is 0.

TYPE: int DEFAULT: 0

eos_token_id

The token id for end of sequence. Default is 1.

TYPE: int DEFAULT: 1

forced_eos_token_id

The token id for forced end of sequence. Default is 1.

TYPE: int DEFAULT: 1

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/configuration_pegasus.py, lines 111-203
def __init__(
    self,
    vocab_size=50265,
    max_position_embeddings=1024,
    encoder_layers=12,
    encoder_ffn_dim=4096,
    encoder_attention_heads=16,
    decoder_layers=12,
    decoder_ffn_dim=4096,
    decoder_attention_heads=16,
    encoder_layerdrop=0.0,
    decoder_layerdrop=0.0,
    use_cache=True,
    is_encoder_decoder=True,
    activation_function="gelu",
    d_model=1024,
    dropout=0.1,
    attention_dropout=0.0,
    activation_dropout=0.0,
    init_std=0.02,
    initializer_range=0.02,
    decoder_start_token_id=0,
    scale_embedding=False,
    pad_token_id=0,
    eos_token_id=1,
    forced_eos_token_id=1,
    **kwargs,
):
    """
    Initializes a new PegasusConfig object with the provided configuration parameters.

    Args:
        self: The instance of the class.
        vocab_size (int, optional): The size of the vocabulary. Default is 50265.
        max_position_embeddings (int, optional): The maximum number of tokens in a sequence. Default is 1024.
        encoder_layers (int, optional): The number of layers in the encoder. Default is 12.
        encoder_ffn_dim (int, optional): The dimension of the feedforward network in the encoder layers. Default is 4096.
        encoder_attention_heads (int, optional): The number of attention heads in the encoder layers. Default is 16.
        decoder_layers (int, optional): The number of layers in the decoder. Default is 12.
        decoder_ffn_dim (int, optional): The dimension of the feedforward network in the decoder layers. Default is 4096.
        decoder_attention_heads (int, optional): The number of attention heads in the decoder layers. Default is 16.
        encoder_layerdrop (float, optional): The probability of dropping a layer in the encoder. Default is 0.0.
        decoder_layerdrop (float, optional): The probability of dropping a layer in the decoder. Default is 0.0.
        use_cache (bool, optional): Whether to use caching for the model. Default is True.
        is_encoder_decoder (bool, optional): Whether the model is an encoder-decoder model. Default is True.
        activation_function (str, optional): The activation function to be used. Default is 'gelu'.
        d_model (int, optional): The dimension of the model. Default is 1024.
        dropout (float, optional): The dropout probability. Default is 0.1.
        attention_dropout (float, optional): The dropout probability for attention layers. Default is 0.0.
        activation_dropout (float, optional): The dropout probability for activation layers. Default is 0.0.
        init_std (float, optional): The standard deviation for weight initialization. Default is 0.02.
        initializer_range (float, optional): The range for weight initialization. Default is 0.02.
        decoder_start_token_id (int, optional): The token id for the start of the decoder sequence. Default is 0.
        scale_embedding (bool, optional): Whether to scale embeddings. Default is False.
        pad_token_id (int, optional): The token id for padding. Default is 0.
        eos_token_id (int, optional): The token id for end of sequence. Default is 1.
        forced_eos_token_id (int, optional): The token id for forced end of sequence. Default is 1.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None.
    """
    self.vocab_size = vocab_size
    self.max_position_embeddings = max_position_embeddings
    self.d_model = d_model
    self.encoder_ffn_dim = encoder_ffn_dim
    self.encoder_layers = encoder_layers
    self.encoder_attention_heads = encoder_attention_heads
    self.decoder_ffn_dim = decoder_ffn_dim
    self.decoder_layers = decoder_layers
    self.decoder_attention_heads = decoder_attention_heads
    self.dropout = dropout
    self.attention_dropout = attention_dropout
    self.activation_dropout = activation_dropout
    self.activation_function = activation_function
    self.init_std = init_std
    self.initializer_range = initializer_range
    self.encoder_layerdrop = encoder_layerdrop
    self.decoder_layerdrop = decoder_layerdrop
    self.use_cache = use_cache
    self.num_hidden_layers = encoder_layers
    self.scale_embedding = scale_embedding  # scale factor will be sqrt(d_model) if True
    super().__init__(
        pad_token_id=pad_token_id,
        eos_token_id=eos_token_id,
        is_encoder_decoder=is_encoder_decoder,
        decoder_start_token_id=decoder_start_token_id,
        forced_eos_token_id=forced_eos_token_id,
        **kwargs,
    )
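
As a usage note (a sketch, not part of the generated reference): the constructor stores the arguments directly, and `num_hidden_layers` is derived from `encoder_layers` as shown above, so a reduced-size configuration for quick experiments only needs a few overrides. The values below are illustrative.

```python
# Sketch: building a smaller PegasusConfig; all sizes here are illustrative only.
from mindnlp.transformers import PegasusConfig

small_config = PegasusConfig(
    vocab_size=96103,           # should match the tokenizer vocabulary in practice
    d_model=512,
    encoder_layers=6,
    decoder_layers=6,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,
    decoder_ffn_dim=2048,
)
assert small_config.num_hidden_layers == 6  # mirrors encoder_layers
assert small_config.hidden_size == 512      # alias for d_model
```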

mindnlp.transformers.models.pegasus.modeling_pegasus

MindSpore PEGASUS model.

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusAttention

Bases: Module

Multi-headed attention from 'Attention Is All You Need' paper

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py, lines 116-302
class PegasusAttention(nn.Module):
    """Multi-headed attention from 'Attention Is All You Need' paper"""
    def __init__(
        self,
        embed_dim: int,
        num_heads: int,
        dropout: float = 0.0,
        is_decoder: bool = False,
        bias: bool = True,
        is_causal: bool = False,
        config: Optional[PegasusConfig] = None,
    ):
        """
        Initializes the PegasusAttention class.

        Args:
            embed_dim (int): The dimension of the input embeddings.
            num_heads (int): The number of attention heads.
            dropout (float, optional): The dropout probability. Default is 0.0.
            is_decoder (bool, optional): Whether the attention is used in a decoder setting. Default is False.
            bias (bool, optional): Whether to use bias in linear projections. Default is True.
            is_causal (bool, optional): Whether the attention is causal. Default is False.
            config (Optional[PegasusConfig], optional): An optional Pegasus configuration object. Default is None.

        Returns:
            None.

        Raises:
            ValueError: If embed_dim is not divisible by num_heads.
        """
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.dropout = dropout
        self.head_dim = embed_dim // num_heads
        self.config = config

        if (self.head_dim * num_heads) != self.embed_dim:
            raise ValueError(
                f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim}"
                f" and `num_heads`: {num_heads})."
            )
        self.scaling = self.head_dim**-0.5
        self.is_decoder = is_decoder
        self.is_causal = is_causal

        self.k_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
        self.v_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
        self.q_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias)

    def _shape(self, tensor: mindspore.Tensor, seq_len: int, bsz: int):
        """
        Method to reshape a tensor for Pegasus attention mechanism.

        Args:
            self (PegasusAttention): An instance of the PegasusAttention class.
            tensor (mindspore.Tensor): The input tensor to be reshaped.
            seq_len (int): The length of the sequence.
            bsz (int): The batch size.

        Returns:
            mindspore.Tensor: The input tensor reshaped to (bsz, num_heads, seq_len, head_dim).

        Raises:
            None.
        """
        return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).swapaxes(1, 2)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        key_value_states: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        layer_head_mask: Optional[mindspore.Tensor] = None,
        output_attentions: bool = False,
    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
        """Input shape: Batch x Time x Channel"""
        # if key_value_states are provided this layer is used as a cross-attention layer
        # for the decoder
        is_cross_attention = key_value_states is not None

        bsz, tgt_len, _ = hidden_states.shape

        # get query proj
        query_states = self.q_proj(hidden_states) * self.scaling
        # get key, value proj
        # `past_key_value[0].shape[2] == key_value_states.shape[1]`
        # is checking that the `sequence_length` of the `past_key_value` is the same as
        # the provided `key_value_states` to support prefix tuning
        if (
            is_cross_attention
            and past_key_value is not None
            and past_key_value[0].shape[2] == key_value_states.shape[1]
        ):
            # reuse k,v, cross_attentions
            key_states = past_key_value[0]
            value_states = past_key_value[1]
        elif is_cross_attention:
            # cross_attentions
            key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
            value_states = self._shape(self.v_proj(key_value_states), -1, bsz)
        elif past_key_value is not None:
            # reuse k, v, self_attention
            key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
            value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
            key_states = ops.cat([past_key_value[0], key_states], axis=2)
            value_states = ops.cat([past_key_value[1], value_states], axis=2)
        else:
            # self_attention
            key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
            value_states = self._shape(self.v_proj(hidden_states), -1, bsz)

        if self.is_decoder:
            # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_states, value_states)

        proj_shape = (bsz * self.num_heads, -1, self.head_dim)
        query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape)
        key_states = key_states.reshape(*proj_shape)
        value_states = value_states.reshape(*proj_shape)

        src_len = key_states.shape[1]
        attn_weights = ops.bmm(query_states, key_states.swapaxes(1, 2))

        if attn_weights.shape != (bsz * self.num_heads, tgt_len, src_len):
            raise ValueError(
                f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, src_len)}, but is"
                f" {attn_weights.shape}"
            )

        if attention_mask is not None:
            if attention_mask.shape != (bsz, 1, tgt_len, src_len):
                raise ValueError(
                    f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.shape}"
                )
            attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

        attn_weights = ops.softmax(attn_weights, axis=-1)

        if layer_head_mask is not None:
            if layer_head_mask.shape != (self.num_heads,):
                raise ValueError(
                    f"Head mask for a single layer should be of size {(self.num_heads,)}, but is"
                    f" {layer_head_mask.shape}"
                )
            attn_weights = layer_head_mask.view(1, -1, 1, 1) * attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

        if output_attentions:
            # this operation is a bit awkward, but it's required to
            # make sure that attn_weights keeps its gradient.
            # In order to do so, attn_weights have to be reshaped
            # twice and have to be reused in the following
            attn_weights_reshaped = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
            attn_weights = attn_weights_reshaped.view(bsz * self.num_heads, tgt_len, src_len)
        else:
            attn_weights_reshaped = None

        attn_probs = ops.dropout(attn_weights, p=self.dropout, training=self.training)

        attn_output = ops.bmm(attn_probs, value_states)

        if attn_output.shape != (bsz * self.num_heads, tgt_len, self.head_dim):
            raise ValueError(
                f"`attn_output` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is"
                f" {attn_output.shape}"
            )

        attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
        attn_output = attn_output.swapaxes(1, 2)

        # Use the `embed_dim` from the config (stored in the class) rather than `hidden_state` because `attn_output` can be
        # partitioned across GPUs when using tensor-parallelism.
        attn_output = attn_output.reshape(bsz, tgt_len, self.embed_dim)

        attn_output = self.out_proj(attn_output)

        return attn_output, attn_weights_reshaped, past_key_value
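
The following is a minimal shape check for the attention module above. It is a sketch: it assumes MindSpore and mindnlp are installed and that the class is importable from the module path shown on this page.

```python
# Sketch: self-attention over a (batch, time, channel) input, as described above.
import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusAttention

attn = PegasusAttention(embed_dim=64, num_heads=4)  # head_dim = 64 // 4 = 16

hidden_states = mindspore.Tensor(np.random.randn(2, 5, 64), mindspore.float32)
attn_output, attn_weights, past_key_value = attn(hidden_states, output_attentions=True)

print(attn_output.shape)   # (2, 5, 64)   -> batch, time, embed_dim
print(attn_weights.shape)  # (2, 4, 5, 5) -> batch, heads, tgt_len, src_len
print(past_key_value)      # None, because is_decoder=False
```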

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusAttention.__init__(embed_dim, num_heads, dropout=0.0, is_decoder=False, bias=True, is_causal=False, config=None)

Initializes the PegasusAttention class.

PARAMETER DESCRIPTION
embed_dim

The dimension of the input embeddings.

TYPE: int

num_heads

The number of attention heads.

TYPE: int

dropout

The dropout probability. Default is 0.0.

TYPE: float DEFAULT: 0.0

is_decoder

Whether the attention is used in a decoder setting. Default is False.

TYPE: bool DEFAULT: False

bias

Whether to use bias in linear projections. Default is True.

TYPE: bool DEFAULT: True

is_causal

Whether the attention is causal. Default is False.

TYPE: bool DEFAULT: False

config

An optional Pegasus configuration object. Default is None.

TYPE: Optional[PegasusConfig] DEFAULT: None

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If embed_dim is not divisible by num_heads.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py, lines 118-165
def __init__(
    self,
    embed_dim: int,
    num_heads: int,
    dropout: float = 0.0,
    is_decoder: bool = False,
    bias: bool = True,
    is_causal: bool = False,
    config: Optional[PegasusConfig] = None,
):
    """
    Initializes the PegasusAttention class.

    Args:
        embed_dim (int): The dimension of the input embeddings.
        num_heads (int): The number of attention heads.
        dropout (float, optional): The dropout probability. Default is 0.0.
        is_decoder (bool, optional): Whether the attention is used in a decoder setting. Default is False.
        bias (bool, optional): Whether to use bias in linear projections. Default is True.
        is_causal (bool, optional): Whether the attention is causal. Default is False.
        config (Optional[PegasusConfig], optional): An optional Pegasus configuration object. Default is None.

    Returns:
        None.

    Raises:
        ValueError: If embed_dim is not divisible by num_heads.
    """
    super().__init__()
    self.embed_dim = embed_dim
    self.num_heads = num_heads
    self.dropout = dropout
    self.head_dim = embed_dim // num_heads
    self.config = config

    if (self.head_dim * num_heads) != self.embed_dim:
        raise ValueError(
            f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim}"
            f" and `num_heads`: {num_heads})."
        )
    self.scaling = self.head_dim**-0.5
    self.is_decoder = is_decoder
    self.is_causal = is_causal

    self.k_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
    self.v_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
    self.q_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
    self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusAttention.forward(hidden_states, key_value_states=None, past_key_value=None, attention_mask=None, layer_head_mask=None, output_attentions=False)

Input shape: Batch x Time x Channel

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py, lines 185-302
def forward(
    self,
    hidden_states: mindspore.Tensor,
    key_value_states: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    layer_head_mask: Optional[mindspore.Tensor] = None,
    output_attentions: bool = False,
) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
    """Input shape: Batch x Time x Channel"""
    # if key_value_states are provided this layer is used as a cross-attention layer
    # for the decoder
    is_cross_attention = key_value_states is not None

    bsz, tgt_len, _ = hidden_states.shape

    # get query proj
    query_states = self.q_proj(hidden_states) * self.scaling
    # get key, value proj
    # `past_key_value[0].shape[2] == key_value_states.shape[1]`
    # is checking that the `sequence_length` of the `past_key_value` is the same as
    # the provided `key_value_states` to support prefix tuning
    if (
        is_cross_attention
        and past_key_value is not None
        and past_key_value[0].shape[2] == key_value_states.shape[1]
    ):
        # reuse k,v, cross_attentions
        key_states = past_key_value[0]
        value_states = past_key_value[1]
    elif is_cross_attention:
        # cross_attentions
        key_states = self._shape(self.k_proj(key_value_states), -1, bsz)
        value_states = self._shape(self.v_proj(key_value_states), -1, bsz)
    elif past_key_value is not None:
        # reuse k, v, self_attention
        key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
        value_states = self._shape(self.v_proj(hidden_states), -1, bsz)
        key_states = ops.cat([past_key_value[0], key_states], axis=2)
        value_states = ops.cat([past_key_value[1], value_states], axis=2)
    else:
        # self_attention
        key_states = self._shape(self.k_proj(hidden_states), -1, bsz)
        value_states = self._shape(self.v_proj(hidden_states), -1, bsz)

    if self.is_decoder:
        # if cross_attention save Tuple(mindspore.Tensor, mindspore.Tensor) of all cross attention key/value_states.
        # Further calls to cross_attention layer can then reuse all cross-attention
        # key/value_states (first "if" case)
        # if uni-directional self-attention (decoder) save Tuple(mindspore.Tensor, mindspore.Tensor) of
        # all previous decoder key/value_states. Further calls to uni-directional self-attention
        # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
        # if encoder bi-directional self-attention `past_key_value` is always `None`
        past_key_value = (key_states, value_states)

    proj_shape = (bsz * self.num_heads, -1, self.head_dim)
    query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape)
    key_states = key_states.reshape(*proj_shape)
    value_states = value_states.reshape(*proj_shape)

    src_len = key_states.shape[1]
    attn_weights = ops.bmm(query_states, key_states.swapaxes(1, 2))

    if attn_weights.shape != (bsz * self.num_heads, tgt_len, src_len):
        raise ValueError(
            f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, src_len)}, but is"
            f" {attn_weights.shape}"
        )

    if attention_mask is not None:
        if attention_mask.shape != (bsz, 1, tgt_len, src_len):
            raise ValueError(
                f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.shape}"
            )
        attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + attention_mask
        attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

    attn_weights = ops.softmax(attn_weights, axis=-1)

    if layer_head_mask is not None:
        if layer_head_mask.shape != (self.num_heads,):
            raise ValueError(
                f"Head mask for a single layer should be of size {(self.num_heads,)}, but is"
                f" {layer_head_mask.shape}"
            )
        attn_weights = layer_head_mask.view(1, -1, 1, 1) * attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
        attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

    if output_attentions:
        # this operation is a bit awkward, but it's required to
        # make sure that attn_weights keeps its gradient.
        # In order to do so, attn_weights have to be reshaped
        # twice and have to be reused in the following
        attn_weights_reshaped = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
        attn_weights = attn_weights_reshaped.view(bsz * self.num_heads, tgt_len, src_len)
    else:
        attn_weights_reshaped = None

    attn_probs = ops.dropout(attn_weights, p=self.dropout, training=self.training)

    attn_output = ops.bmm(attn_probs, value_states)

    if attn_output.shape != (bsz * self.num_heads, tgt_len, self.head_dim):
        raise ValueError(
            f"`attn_output` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is"
            f" {attn_output.shape}"
        )

    attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
    attn_output = attn_output.swapaxes(1, 2)

    # Use the `embed_dim` from the config (stored in the class) rather than `hidden_state` because `attn_output` can be
    # partitioned across GPUs when using tensor-parallelism.
    attn_output = attn_output.reshape(bsz, tgt_len, self.embed_dim)

    attn_output = self.out_proj(attn_output)

    return attn_output, attn_weights_reshaped, past_key_value
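
The tail of this `forward` is mostly shape bookkeeping around two batched matrix multiplications: fold the head dimension into the batch, `bmm` queries against keys, softmax, `bmm` against values, then unfold the heads again. The following is a minimal NumPy sketch of those steps with hypothetical sizes; masking, head masking, dropout, and the query scaling applied earlier in the full method are omitted, so it illustrates the shape flow rather than the module's API.

```python
import numpy as np

bsz, num_heads, head_dim, tgt_len, src_len = 2, 4, 8, 5, 7
embed_dim = num_heads * head_dim

# States after view/reshape(*proj_shape): (bsz * num_heads, seq_len, head_dim)
q = np.random.randn(bsz * num_heads, tgt_len, head_dim)
k = np.random.randn(bsz * num_heads, src_len, head_dim)
v = np.random.randn(bsz * num_heads, src_len, head_dim)

attn_weights = q @ k.transpose(0, 2, 1)                 # ops.bmm(query, key.swapaxes(1, 2))
attn_weights = np.exp(attn_weights - attn_weights.max(-1, keepdims=True))
attn_weights /= attn_weights.sum(-1, keepdims=True)     # ops.softmax(..., axis=-1)

attn_output = attn_weights @ v                          # ops.bmm(attn_probs, value_states)
attn_output = attn_output.reshape(bsz, num_heads, tgt_len, head_dim)
attn_output = attn_output.transpose(0, 2, 1, 3)         # swapaxes(1, 2)
attn_output = attn_output.reshape(bsz, tgt_len, embed_dim)
assert attn_output.shape == (bsz, tgt_len, embed_dim)
```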

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder

Bases: PegasusPreTrainedModel

Transformer decoder consisting of config.decoder_layers layers. Each layer is a [PegasusDecoderLayer]

PARAMETER DESCRIPTION
config

PegasusConfig

TYPE: PegasusConfig

embed_tokens

output embedding

TYPE: Embedding DEFAULT: None

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusDecoder(PegasusPreTrainedModel):
    """
    Transformer decoder consisting of *config.decoder_layers* layers. Each layer is a [`PegasusDecoderLayer`]

    Args:
        config: PegasusConfig
        embed_tokens (nn.Embedding): output embedding
    """
    def __init__(self, config: PegasusConfig, embed_tokens: Optional[nn.Embedding] = None):
        """
        Initializes a PegasusDecoder instance.

        Args:
            self: The object itself.
            config (PegasusConfig): An instance of PegasusConfig containing configuration parameters.
            embed_tokens (Optional[nn.Embedding]): An optional instance of nn.Embedding representing embeddings.
                Defaults to None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.dropout = config.dropout
        self.layerdrop = config.decoder_layerdrop
        self.padding_idx = config.pad_token_id
        self.max_target_positions = config.max_position_embeddings
        self.embed_scale = math.sqrt(config.d_model) if config.scale_embedding else 1.0

        if embed_tokens is not None:
            self.embed_tokens = embed_tokens
        else:
            self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model, self.padding_idx)

        self.embed_positions = PegasusSinusoidalPositionalEmbedding(
            config.max_position_embeddings,
            config.d_model,
            self.padding_idx,
        )
        self.layers = nn.ModuleList([PegasusDecoderLayer(config) for _ in range(config.decoder_layers)])
        self.layer_norm = nn.LayerNorm(config.d_model, eps=1e-5)

        self.gradient_checkpointing = False
        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        This method returns the input embeddings for the PegasusDecoder.

        Args:
            self (PegasusDecoder): The instance of the PegasusDecoder class.

        Returns:
            embed_tokens: This method returns the input embeddings stored in the 'embed_tokens' attribute of
                the PegasusDecoder instance.

        Raises:
            None.
        """
        return self.embed_tokens

    def set_input_embeddings(self, value):
        """
        This method sets the input embeddings for the PegasusDecoder.

        Args:
            self (PegasusDecoder): The instance of the PegasusDecoder class.
            value: The input embeddings to be set for the decoder.
                It should be an embedding module (for example `nn.Embedding`) holding the embeddings for the input tokens.

        Returns:
            None.

        Raises:
            None.
        """
        self.embed_tokens = value

    def resize_position_embeddings(self, new_num_position_embeddings: int):
        """
        Resizes position embeddings matrix of the model if `new_num_position_embeddings !=
        config.max_position_embeddings`.

        Arguments:
            new_num_position_embeddings (`int`):
                The number of new position embeddings.

                - If position embeddings are learned, increasing the size will add newly initialized vectors at the end,
                whereas reducing the size will remove vectors from the end.
                - If position embeddings are not learned (*e.g.* sinusoidal position embeddings), increasing the size will
                add correct vectors at the end following the position encoding algorithm, whereas reducing the size
                will remove vectors from the end.
        """
        logger.info(f"Setting `config.max_position_embeddings={new_num_position_embeddings}`...")
        self.config.max_position_embeddings = new_num_position_embeddings

        self.embed_positions = PegasusSinusoidalPositionalEmbedding(
            self.config.max_position_embeddings,
            self.config.d_model,
            self.padding_idx,
        )

    def get_position_embeddings(self) -> nn.Embedding:
        """
        Returns the position embeddings matrix
        """
        return self.embed_positions

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        head_mask=None,
        cross_attn_head_mask=None,
        past_key_values=None,
        inputs_embeds=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
                Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
                provide it.

                Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

                [What are attention masks?](../glossary#attention-mask)
            encoder_hidden_states (`mindspore.Tensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*):
                Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
                of the decoder.
            encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, encoder_sequence_length)`, *optional*):
                Mask to avoid performing cross-attention on padding tokens indices of encoder input_ids. Mask values
                selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

                [What are attention masks?](../glossary#attention-mask)
            head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.

            cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the cross-attention modules in decoder to avoid performing
                cross-attention on hidden heads. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.

            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed
                or when `config.use_cache=True`):
                Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
                shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of
                shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.

                Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
                cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

                If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
            inputs_embeds (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
                than the model's internal embedding lookup matrix.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
                for more detail.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # retrieve input_ids and inputs_embeds
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
        elif input_ids is not None:
            input_shape = input_ids.shape
            input_ids = input_ids.view(-1, input_shape[-1])
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")

        # past_key_values_length
        past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0

        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale

        attention_mask = _prepare_4d_causal_attention_mask(
            attention_mask, input_shape, inputs_embeds, past_key_values_length
        )

        # expand encoder attention mask
        if encoder_hidden_states is not None and encoder_attention_mask is not None:
            # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
            encoder_attention_mask = _prepare_4d_attention_mask(
                encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
            )

        # embed positions
        positions = self.embed_positions(input_shape, past_key_values_length)
        hidden_states = inputs_embeds + positions

        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)

        if self.gradient_checkpointing and self.training:
            if use_cache:
                logger.warning_once(
                    "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
                )
                use_cache = False

        # decoder layers
        all_hidden_states = () if output_hidden_states else None
        all_self_attns = () if output_attentions else None
        all_cross_attentions = () if (output_attentions and encoder_hidden_states is not None) else None
        next_decoder_cache = () if use_cache else None

        # check if head_mask/cross_attn_head_mask has a correct number of layers specified if desired
        for attn_mask, mask_name in zip([head_mask, cross_attn_head_mask], ["head_mask", "cross_attn_head_mask"]):
            if attn_mask is not None:
                if attn_mask.shape[0] != len(self.layers):
                    raise ValueError(
                        f"The `{mask_name}` should be specified for {len(self.layers)} layers, but it is for"
                        f" {head_mask.shape[0]}."
                    )
        for idx, decoder_layer in enumerate(self.layers):
            # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            if self.training:
                dropout_probability = ops.rand([])
                if dropout_probability < self.layerdrop:
                    continue

            past_key_value = past_key_values[idx] if past_key_values is not None else None

            if self.gradient_checkpointing and self.training:
                layer_outputs = self._gradient_checkpointing_func(
                    decoder_layer.__call__,
                    hidden_states,
                    attention_mask,
                    encoder_hidden_states,
                    encoder_attention_mask,
                    head_mask[idx] if head_mask is not None else None,
                    cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None,
                    None,
                    output_attentions,
                    use_cache,
                )
            else:
                layer_outputs = decoder_layer(
                    hidden_states,
                    attention_mask=attention_mask,
                    encoder_hidden_states=encoder_hidden_states,
                    encoder_attention_mask=encoder_attention_mask,
                    layer_head_mask=(head_mask[idx] if head_mask is not None else None),
                    cross_attn_layer_head_mask=(
                        cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None
                    ),
                    past_key_value=past_key_value,
                    output_attentions=output_attentions,
                    use_cache=use_cache,
                )
            hidden_states = layer_outputs[0]

            if use_cache:
                next_decoder_cache += (layer_outputs[3 if output_attentions else 1],)

            if output_attentions:
                all_self_attns += (layer_outputs[1],)

                if encoder_hidden_states is not None:
                    all_cross_attentions += (layer_outputs[2],)

        hidden_states = self.layer_norm(hidden_states)

        # add hidden states from the last decoder layer
        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        next_cache = next_decoder_cache if use_cache else None
        if not return_dict:
            return tuple(
                v
                for v in [hidden_states, next_cache, all_hidden_states, all_self_attns, all_cross_attentions]
                if v is not None
            )
        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=hidden_states,
            past_key_values=next_cache,
            hidden_states=all_hidden_states,
            attentions=all_self_attns,
            cross_attentions=all_cross_attentions,
        )
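
As a quick orientation for the class above, the sketch below builds a deliberately tiny decoder and runs one forward pass. The hyperparameter values are hypothetical, and the import path for `PegasusConfig` is assumed to mirror the Hugging Face layout (`configuration_pegasus`); treat this as a sketch rather than a canonical recipe.

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusDecoder

# Tiny, hypothetical hyperparameters so the example is cheap to instantiate.
config = PegasusConfig(
    vocab_size=128, d_model=16,
    encoder_layers=2, encoder_attention_heads=2, encoder_ffn_dim=32,
    decoder_layers=2, decoder_attention_heads=2, decoder_ffn_dim=32,
    max_position_embeddings=64,
)
decoder = PegasusDecoder(config)

input_ids = mindspore.Tensor(np.random.randint(0, config.vocab_size, (1, 8)), mindspore.int64)
outputs = decoder(input_ids=input_ids, use_cache=True, return_dict=True)

print(outputs.last_hidden_state.shape)  # (1, 8, 16)
print(len(outputs.past_key_values))     # 2 -- one cache tuple per decoder layer
```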

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.__init__(config, embed_tokens=None)

Initializes a PegasusDecoder instance.

PARAMETER DESCRIPTION
self

The object itself.

config

An instance of PegasusConfig containing configuration parameters.

TYPE: PegasusConfig

embed_tokens

An optional instance of nn.Embedding representing embeddings. Defaults to None.

TYPE: Optional[Embedding] DEFAULT: None

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config: PegasusConfig, embed_tokens: Optional[nn.Embedding] = None):
    """
    Initializes a PegasusDecoder instance.

    Args:
        self: The object itself.
        config (PegasusConfig): An instance of PegasusConfig containing configuration parameters.
        embed_tokens (Optional[nn.Embedding]): An optional instance of nn.Embedding representing embeddings.
            Defaults to None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.dropout = config.dropout
    self.layerdrop = config.decoder_layerdrop
    self.padding_idx = config.pad_token_id
    self.max_target_positions = config.max_position_embeddings
    self.embed_scale = math.sqrt(config.d_model) if config.scale_embedding else 1.0

    if embed_tokens is not None:
        self.embed_tokens = embed_tokens
    else:
        self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model, self.padding_idx)

    self.embed_positions = PegasusSinusoidalPositionalEmbedding(
        config.max_position_embeddings,
        config.d_model,
        self.padding_idx,
    )
    self.layers = nn.ModuleList([PegasusDecoderLayer(config) for _ in range(config.decoder_layers)])
    self.layer_norm = nn.LayerNorm(config.d_model, eps=1e-5)

    self.gradient_checkpointing = False
    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, head_mask=None, cross_attn_head_mask=None, past_key_values=None, inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)` DEFAULT: None

attention_mask

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

What are attention masks?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

encoder_hidden_states

Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.

TYPE: `mindspore.Tensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional* DEFAULT: None

encoder_attention_mask

Mask to avoid performing cross-attention on padding tokens indices of encoder input_ids. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

What are attention masks?

TYPE: `mindspore.Tensor` of shape `(batch_size, encoder_sequence_length)`, *optional* DEFAULT: None

head_mask

Mask to nullify selected heads of the attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

cross_attn_head_mask

Mask to nullify selected heads of the cross-attention modules in decoder to avoid performing cross-attention on hidden heads. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

inputs_embeds

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    encoder_hidden_states=None,
    encoder_attention_mask=None,
    head_mask=None,
    cross_attn_head_mask=None,
    past_key_values=None,
    inputs_embeds=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
):
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
            provide it.

            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        encoder_hidden_states (`mindspore.Tensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
            of the decoder.
        encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, encoder_sequence_length)`, *optional*):
            Mask to avoid performing cross-attention on padding tokens indices of encoder input_ids. Mask values
            selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the cross-attention modules in decoder to avoid performing
            cross-attention on hidden heads. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed
            or when `config.use_cache=True`):
            Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
            shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of
            shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.

            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
            cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
            that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
            all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
        inputs_embeds (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
            than the model's internal embedding lookup matrix.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
            for more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # retrieve input_ids and inputs_embeds
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
    elif input_ids is not None:
        input_shape = input_ids.shape
        input_ids = input_ids.view(-1, input_shape[-1])
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.shape[:-1]
    else:
        raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")

    # past_key_values_length
    past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0

    if inputs_embeds is None:
        inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale

    attention_mask = _prepare_4d_causal_attention_mask(
        attention_mask, input_shape, inputs_embeds, past_key_values_length
    )

    # expand encoder attention mask
    if encoder_hidden_states is not None and encoder_attention_mask is not None:
        # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
        encoder_attention_mask = _prepare_4d_attention_mask(
            encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
        )

    # embed positions
    positions = self.embed_positions(input_shape, past_key_values_length)
    hidden_states = inputs_embeds + positions

    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)

    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning_once(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
            use_cache = False

    # decoder layers
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    all_cross_attentions = () if (output_attentions and encoder_hidden_states is not None) else None
    next_decoder_cache = () if use_cache else None

    # check if head_mask/cross_attn_head_mask has a correct number of layers specified if desired
    for attn_mask, mask_name in zip([head_mask, cross_attn_head_mask], ["head_mask", "cross_attn_head_mask"]):
        if attn_mask is not None:
            if attn_mask.shape[0] != len(self.layers):
                raise ValueError(
                    f"The `{mask_name}` should be specified for {len(self.layers)} layers, but it is for"
                    f" {head_mask.shape[0]}."
                )
    for idx, decoder_layer in enumerate(self.layers):
        # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
        if output_hidden_states:
            all_hidden_states += (hidden_states,)
        if self.training:
            dropout_probability = ops.rand([])
            if dropout_probability < self.layerdrop:
                continue

        past_key_value = past_key_values[idx] if past_key_values is not None else None

        if self.gradient_checkpointing and self.training:
            layer_outputs = self._gradient_checkpointing_func(
                decoder_layer.__call__,
                hidden_states,
                attention_mask,
                encoder_hidden_states,
                encoder_attention_mask,
                head_mask[idx] if head_mask is not None else None,
                cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None,
                None,
                output_attentions,
                use_cache,
            )
        else:
            layer_outputs = decoder_layer(
                hidden_states,
                attention_mask=attention_mask,
                encoder_hidden_states=encoder_hidden_states,
                encoder_attention_mask=encoder_attention_mask,
                layer_head_mask=(head_mask[idx] if head_mask is not None else None),
                cross_attn_layer_head_mask=(
                    cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None
                ),
                past_key_value=past_key_value,
                output_attentions=output_attentions,
                use_cache=use_cache,
            )
        hidden_states = layer_outputs[0]

        if use_cache:
            next_decoder_cache += (layer_outputs[3 if output_attentions else 1],)

        if output_attentions:
            all_self_attns += (layer_outputs[1],)

            if encoder_hidden_states is not None:
                all_cross_attentions += (layer_outputs[2],)

    hidden_states = self.layer_norm(hidden_states)

    # add hidden states from the last decoder layer
    if output_hidden_states:
        all_hidden_states += (hidden_states,)

    next_cache = next_decoder_cache if use_cache else None
    if not return_dict:
        return tuple(
            v
            for v in [hidden_states, next_cache, all_hidden_states, all_self_attns, all_cross_attentions]
            if v is not None
        )
    return BaseModelOutputWithPastAndCrossAttentions(
        last_hidden_state=hidden_states,
        past_key_values=next_cache,
        hidden_states=all_hidden_states,
        attentions=all_self_attns,
        cross_attentions=all_cross_attentions,
    )
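
The `past_key_values` contract described above is what enables incremental decoding: after the first call, only the newest token needs to be fed back together with the returned cache. Below is a sketch under the same tiny-config assumptions as the earlier construction example (hypothetical sizes, `PegasusConfig` import path assumed):

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusDecoder

config = PegasusConfig(vocab_size=128, d_model=16,
                       encoder_layers=2, encoder_attention_heads=2, encoder_ffn_dim=32,
                       decoder_layers=2, decoder_attention_heads=2, decoder_ffn_dim=32,
                       max_position_embeddings=64)
decoder = PegasusDecoder(config)

# Step 1: run the 4-token prefix once and keep the cache.
prefix = mindspore.Tensor(np.random.randint(0, 128, (1, 4)), mindspore.int64)
out = decoder(input_ids=prefix, use_cache=True, return_dict=True)
cache = out.past_key_values

# Step 2: feed only the newest token plus the cache; positions continue from the cached length.
next_token = mindspore.Tensor(np.random.randint(0, 128, (1, 1)), mindspore.int64)
out = decoder(input_ids=next_token, past_key_values=cache, use_cache=True, return_dict=True)

print(out.last_hidden_state.shape)         # (1, 1, 16)
print(out.past_key_values[0][0].shape[2])  # 5 -- the cached self-attention length grew from 4 to 5
```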

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.get_input_embeddings()

This method returns the input embeddings for the PegasusDecoder.

PARAMETER DESCRIPTION
self

The instance of the PegasusDecoder class.

TYPE: PegasusDecoder

RETURNS DESCRIPTION
embed_tokens

This method returns the input embeddings stored in the 'embed_tokens' attribute of the PegasusDecoder instance.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_input_embeddings(self):
    """
    This method returns the input embeddings for the PegasusDecoder.

    Args:
        self (PegasusDecoder): The instance of the PegasusDecoder class.

    Returns:
        embed_tokens: This method returns the input embeddings stored in the 'embed_tokens' attribute of
            the PegasusDecoder instance.

    Raises:
        None.
    """
    return self.embed_tokens

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.get_position_embeddings()

Returns the position embeddings matrix

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_position_embeddings(self) -> nn.Embedding:
    """
    Returns the position embeddings matrix
    """
    return self.embed_positions

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.resize_position_embeddings(new_num_position_embeddings)

Resizes position embeddings matrix of the model if new_num_position_embeddings != config.max_position_embeddings.

PARAMETER DESCRIPTION
new_num_position_embeddings

The number of new position embeddings.

  • If position embeddings are learned, increasing the size will add newly initialized vectors at the end, whereas reducing the size will remove vectors from the end.
  • If position embeddings are not learned (e.g. sinusoidal position embeddings), increasing the size will add correct vectors at the end following the position encoding algorithm, whereas reducing the size will remove vectors from the end.

TYPE: `int`

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def resize_position_embeddings(self, new_num_position_embeddings: int):
    """
    Resizes position embeddings matrix of the model if `new_num_position_embeddings !=
    config.max_position_embeddings`.

    Arguments:
        new_num_position_embeddings (`int`):
            The number of new position embeddings.

            - If position embeddings are learned, increasing the size will add newly initialized vectors at the end,
            whereas reducing the size will remove vectors from the end.
            - If position embeddings are not learned (*e.g.* sinusoidal position embeddings), increasing the size will
            add correct vectors at the end following the position encoding algorithm, whereas reducing the size
            will remove vectors from the end.
    """
    logger.info(f"Setting `config.max_position_embeddings={new_num_position_embeddings}`...")
    self.config.max_position_embeddings = new_num_position_embeddings

    self.embed_positions = PegasusSinusoidalPositionalEmbedding(
        self.config.max_position_embeddings,
        self.config.d_model,
        self.padding_idx,
    )
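
Because the positional embeddings are sinusoidal (not learned), resizing simply rebuilds the table at the new length following the encoding formula; no weights need to be retrained. A minimal usage sketch, assuming `decoder` is a `PegasusDecoder` instance as in the construction example above:

```python
# `decoder` is a PegasusDecoder instance built earlier (hypothetical example object).
decoder.resize_position_embeddings(2048)       # rebuilds the sinusoidal table at length 2048
print(decoder.config.max_position_embeddings)  # 2048
print(decoder.get_position_embeddings())       # the new PegasusSinusoidalPositionalEmbedding
```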

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoder.set_input_embeddings(value)

This method sets the input embeddings for the PegasusDecoder.

PARAMETER DESCRIPTION
self

The instance of the PegasusDecoder class.

TYPE: PegasusDecoder

value

The input embeddings to be set for the decoder. It should be an embedding module (for example `nn.Embedding`) holding the embeddings for the input tokens.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def set_input_embeddings(self, value):
    """
    This method sets the input embeddings for the PegasusDecoder.

    Args:
        self (PegasusDecoder): The instance of the PegasusDecoder class.
        value: The input embeddings to be set for the decoder.
            It should be an embedding module (for example `nn.Embedding`) holding the embeddings for the input tokens.

    Returns:
        None.

    Raises:
        None.
    """
    self.embed_tokens = value

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderLayer

Bases: Module

The PegasusDecoderLayer class represents a single layer of the Pegasus decoder model. It includes self-attention and encoder-decoder cross-attention mechanisms followed by feedforward neural network layers. This class inherits from nn.Module and implements the decoding logic for the Pegasus model.

ATTRIBUTE DESCRIPTION
embed_dim

The dimension of the embeddings used in the layer.

TYPE: int

self_attn

The self-attention mechanism used in the layer.

TYPE: PegasusAttention

dropout

The dropout probability applied in the layer.

TYPE: float

activation_fn

The activation function used in the feedforward neural network layers.

TYPE: function

activation_dropout

The dropout probability applied after the activation function.

TYPE: float

self_attn_layer_norm

Layer normalization applied after self-attention.

TYPE: LayerNorm

encoder_attn

The encoder-decoder cross-attention mechanism used in the layer.

TYPE: PegasusAttention

encoder_attn_layer_norm

Layer normalization applied after encoder-decoder cross-attention.

TYPE: LayerNorm

fc1

The first feedforward neural network layer.

TYPE: Dense

fc2

The second feedforward neural network layer.

TYPE: Dense

final_layer_norm

Layer normalization applied at the end of the layer.

TYPE: LayerNorm

METHOD DESCRIPTION
forward

Constructs the output of the layer based on the input hidden states and optional arguments. Returns the output tensor.

PARAMETER DESCRIPTION
hidden_states

Input to the layer of shape (batch, seq_len, embed_dim).

TYPE: Tensor

attention_mask

Attention mask of size (batch, 1, tgt_len, src_len) with padding indicated by large negative values.

TYPE: Tensor

encoder_hidden_states

Encoder input to the layer of shape (batch, seq_len, embed_dim).

TYPE: Tensor

encoder_attention_mask

Encoder attention mask of size (batch, 1, tgt_len, src_len) with padding indicated by large negative values.

TYPE: Tensor

layer_head_mask

Mask for attention heads in a given layer.

TYPE: Tensor

cross_attn_layer_head_mask

Mask for cross-attention heads in a given layer.

TYPE: Tensor

past_key_value

Cached past key and value projection states.

TYPE: Tuple(Tensor)

output_attentions

Flag to determine whether to return attention tensors.

TYPE: bool

use_cache

Flag to determine whether to use caching mechanism for key-value states.

TYPE: bool

RETURNS DESCRIPTION
outputs

Tuple containing the output tensor and optionally self-attention and cross-attention weights if output_attentions is True, and present key-value states if use_cache is True.

TYPE: Tuple

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusDecoderLayer(nn.Module):

    """
    The PegasusDecoderLayer class represents a single layer of the Pegasus decoder model.
    It includes self-attention and encoder-decoder cross-attention mechanisms followed by feedforward
    neural network layers. This class inherits from nn.Module and implements the decoding logic for the Pegasus model.

    Attributes:
        embed_dim (int): The dimension of the embeddings used in the layer.
        self_attn (PegasusAttention): The self-attention mechanism used in the layer.
        dropout (float): The dropout probability applied in the layer.
        activation_fn (function): The activation function used in the feedforward neural network layers.
        activation_dropout (float): The dropout probability applied after the activation function.
        self_attn_layer_norm (LayerNorm): Layer normalization applied after self-attention.
        encoder_attn (PegasusAttention): The encoder-decoder cross-attention mechanism used in the layer.
        encoder_attn_layer_norm (LayerNorm): Layer normalization applied after encoder-decoder cross-attention.
        fc1 (Dense): The first feedforward neural network layer.
        fc2 (Dense): The second feedforward neural network layer.
        final_layer_norm (LayerNorm): Layer normalization applied at the end of the layer.

    Methods:
        forward:
            Constructs the output of the layer based on the input hidden states and optional arguments.
            Returns the output tensor.

    Args:
        hidden_states (Tensor): Input to the layer of shape (batch, seq_len, embed_dim).
        attention_mask (Tensor): Attention mask of size (batch, 1, tgt_len, src_len) with padding indicated by
            large negative values.
        encoder_hidden_states (Tensor): Encoder input to the layer of shape (batch, seq_len, embed_dim).
        encoder_attention_mask (Tensor): Encoder attention mask of size (batch, 1, tgt_len, src_len) with padding
            indicated by large negative values.
        layer_head_mask (Tensor): Mask for attention heads in a given layer.
        cross_attn_layer_head_mask (Tensor): Mask for cross-attention heads in a given layer.
        past_key_value (Tuple(Tensor)): Cached past key and value projection states.
        output_attentions (bool): Flag to determine whether to return attention tensors.
        use_cache (bool): Flag to determine whether to use caching mechanism for key-value states.

    Returns:
        outputs (Tuple): Tuple containing the output tensor and optionally self-attention and cross-attention weights
            if output_attentions is True, and present key-value states if use_cache is True.
    """
    def __init__(self, config: PegasusConfig):
        """
        Initializes an instance of the PegasusDecoderLayer class.

        Args:
            self (PegasusDecoderLayer): The current instance of the class.
            config (PegasusConfig): The configuration object containing various settings for the decoder layer.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.embed_dim = config.d_model

        self.self_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
            embed_dim=self.embed_dim,
            num_heads=config.decoder_attention_heads,
            dropout=config.attention_dropout,
            is_decoder=True,
            is_causal=True,
            config=config,
        )
        self.dropout = config.dropout
        self.activation_fn = ACT2FN[config.activation_function]
        self.activation_dropout = config.activation_dropout

        self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
        self.encoder_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
            self.embed_dim,
            config.decoder_attention_heads,
            dropout=config.attention_dropout,
            is_decoder=True,
            config=config,
        )
        self.encoder_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
        self.fc1 = nn.Linear(self.embed_dim, config.decoder_ffn_dim)
        self.fc2 = nn.Linear(config.decoder_ffn_dim, self.embed_dim)
        self.final_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        layer_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_layer_head_mask: Optional[mindspore.Tensor] = None,
        past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
        output_attentions: Optional[bool] = False,
        use_cache: Optional[bool] = True,
    ) -> mindspore.Tensor:
        """
        Args:
            hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
            attention_mask (`mindspore.Tensor`): attention mask of size
                `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
            encoder_hidden_states (`mindspore.Tensor`):
                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
            encoder_attention_mask (`mindspore.Tensor`): encoder attention mask of size
                `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
            layer_head_mask (`mindspore.Tensor`): mask for attention heads in a given layer of size
                `(encoder_attention_heads,)`.
            cross_attn_layer_head_mask (`mindspore.Tensor`): mask for cross-attention heads in a given layer of
                size `(decoder_attention_heads,)`.
            past_key_value (`Tuple(mindspore.Tensor)`): cached past key and value projection states
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
        """
        residual = hidden_states
        hidden_states = self.self_attn_layer_norm(hidden_states)

        # Self Attention
        # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
        self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
        # add present self-attn cache to positions 1,2 of present_key_value tuple
        hidden_states, self_attn_weights, present_key_value = self.self_attn(
            hidden_states=hidden_states,
            past_key_value=self_attn_past_key_value,
            attention_mask=attention_mask,
            layer_head_mask=layer_head_mask,
            output_attentions=output_attentions,
        )
        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
        hidden_states = residual + hidden_states
        # Cross-Attention Block
        cross_attn_present_key_value = None
        cross_attn_weights = None
        if encoder_hidden_states is not None:
            residual = hidden_states
            hidden_states = self.encoder_attn_layer_norm(hidden_states)

            # cross_attn cached key/values tuple is at positions 3,4 of present_key_value tuple
            cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
            hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn(
                hidden_states=hidden_states,
                key_value_states=encoder_hidden_states,
                attention_mask=encoder_attention_mask,
                layer_head_mask=cross_attn_layer_head_mask,
                past_key_value=cross_attn_past_key_value,
                output_attentions=output_attentions,
            )
            hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
            hidden_states = residual + hidden_states

            # add cross-attn to positions 3,4 of present_key_value tuple
            present_key_value = present_key_value + cross_attn_present_key_value

        # Fully Connected
        residual = hidden_states
        hidden_states = self.final_layer_norm(hidden_states)
        hidden_states = self.activation_fn(self.fc1(hidden_states))
        hidden_states = ops.dropout(hidden_states, p=self.activation_dropout, training=self.training)
        hidden_states = self.fc2(hidden_states)
        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
        hidden_states = residual + hidden_states

        outputs = (hidden_states,)
        if output_attentions:
            outputs += (self_attn_weights, cross_attn_weights)

        if use_cache:
            outputs += (present_key_value,)

        return outputs
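
The layer above uses a pre-LayerNorm arrangement: each sub-block normalizes its input first, applies attention or the feed-forward network, and then adds the residual. The NumPy sketch below isolates the feed-forward sub-block to make that ordering explicit; sizes are hypothetical, dropout and the learned LayerNorm affine parameters are omitted, and ReLU stands in for whatever `ACT2FN[config.activation_function]` selects.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalization only; the real nn.LayerNorm also applies a learned scale and bias.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

embed_dim, ffn_dim = 16, 32
w1 = np.random.randn(embed_dim, ffn_dim)   # stands in for fc1
w2 = np.random.randn(ffn_dim, embed_dim)   # stands in for fc2

hidden = np.random.randn(2, 5, embed_dim)  # (batch, seq_len, embed_dim)
residual = hidden
hidden = layer_norm(hidden)                # final_layer_norm runs *before* the FFN (pre-norm)
hidden = np.maximum(hidden @ w1, 0.0)      # fc1 + activation
hidden = hidden @ w2                       # fc2
hidden = residual + hidden                 # residual connection closes the sub-block
```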

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderLayer.__init__(config)

Initializes an instance of the PegasusDecoderLayer class.

PARAMETER DESCRIPTION
self

The current instance of the class.

TYPE: PegasusDecoderLayer

config

The configuration object containing various settings for the decoder layer.

TYPE: PegasusConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config: PegasusConfig):
    """
    Initializes an instance of the PegasusDecoderLayer class.

    Args:
        self (PegasusDecoderLayer): The current instance of the class.
        config (PegasusConfig): The configuration object containing various settings for the decoder layer.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.embed_dim = config.d_model

    self.self_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
        embed_dim=self.embed_dim,
        num_heads=config.decoder_attention_heads,
        dropout=config.attention_dropout,
        is_decoder=True,
        is_causal=True,
        config=config,
    )
    self.dropout = config.dropout
    self.activation_fn = ACT2FN[config.activation_function]
    self.activation_dropout = config.activation_dropout

    self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
    self.encoder_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
        self.embed_dim,
        config.decoder_attention_heads,
        dropout=config.attention_dropout,
        is_decoder=True,
        config=config,
    )
    self.encoder_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
    self.fc1 = nn.Linear(self.embed_dim, config.decoder_ffn_dim)
    self.fc2 = nn.Linear(config.decoder_ffn_dim, self.embed_dim)
    self.final_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderLayer.forward(hidden_states, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, layer_head_mask=None, cross_attn_layer_head_mask=None, past_key_value=None, output_attentions=False, use_cache=True)

PARAMETER DESCRIPTION
hidden_states

input to the layer of shape (batch, seq_len, embed_dim)

TYPE: `mindspore.Tensor`

attention_mask

attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values.

TYPE: `mindspore.Tensor` DEFAULT: None

encoder_hidden_states

cross attention input to the layer of shape (batch, seq_len, embed_dim)

TYPE: `mindspore.Tensor` DEFAULT: None

encoder_attention_mask

encoder attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values.

TYPE: `mindspore.Tensor` DEFAULT: None

layer_head_mask

mask for attention heads in a given layer of size (encoder_attention_heads,).

TYPE: `mindspore.Tensor` DEFAULT: None

cross_attn_layer_head_mask

mask for cross-attention heads in a given layer of size (decoder_attention_heads,).

TYPE: `mindspore.Tensor` DEFAULT: None

past_key_value

cached past key and value projection states

TYPE: `Tuple(mindspore.Tensor)` DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: False

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    layer_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_layer_head_mask: Optional[mindspore.Tensor] = None,
    past_key_value: Optional[Tuple[mindspore.Tensor]] = None,
    output_attentions: Optional[bool] = False,
    use_cache: Optional[bool] = True,
) -> mindspore.Tensor:
    """
    Args:
        hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
        attention_mask (`mindspore.Tensor`): attention mask of size
            `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
        encoder_hidden_states (`mindspore.Tensor`):
            cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
        encoder_attention_mask (`mindspore.Tensor`): encoder attention mask of size
            `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
        layer_head_mask (`mindspore.Tensor`): mask for attention heads in a given layer of size
            `(encoder_attention_heads,)`.
        cross_attn_layer_head_mask (`mindspore.Tensor`): mask for cross-attention heads in a given layer of
            size `(decoder_attention_heads,)`.
        past_key_value (`Tuple(mindspore.Tensor)`): cached past key and value projection states
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
    """
    residual = hidden_states
    hidden_states = self.self_attn_layer_norm(hidden_states)

    # Self Attention
    # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
    self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
    # add present self-attn cache to positions 1,2 of present_key_value tuple
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
        hidden_states=hidden_states,
        past_key_value=self_attn_past_key_value,
        attention_mask=attention_mask,
        layer_head_mask=layer_head_mask,
        output_attentions=output_attentions,
    )
    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
    hidden_states = residual + hidden_states
    # Cross-Attention Block
    cross_attn_present_key_value = None
    cross_attn_weights = None
    if encoder_hidden_states is not None:
        residual = hidden_states
        hidden_states = self.encoder_attn_layer_norm(hidden_states)

        # cross_attn cached key/values tuple is at positions 3,4 of present_key_value tuple
        cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
        hidden_states, cross_attn_weights, cross_attn_present_key_value = self.encoder_attn(
            hidden_states=hidden_states,
            key_value_states=encoder_hidden_states,
            attention_mask=encoder_attention_mask,
            layer_head_mask=cross_attn_layer_head_mask,
            past_key_value=cross_attn_past_key_value,
            output_attentions=output_attentions,
        )
        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
        hidden_states = residual + hidden_states

        # add cross-attn to positions 3,4 of present_key_value tuple
        present_key_value = present_key_value + cross_attn_present_key_value

    # Fully Connected
    residual = hidden_states
    hidden_states = self.final_layer_norm(hidden_states)
    hidden_states = self.activation_fn(self.fc1(hidden_states))
    hidden_states = ops.dropout(hidden_states, p=self.activation_dropout, training=self.training)
    hidden_states = self.fc2(hidden_states)
    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
    hidden_states = residual + hidden_states

    outputs = (hidden_states,)
    if output_attentions:
        outputs += (self_attn_weights, cross_attn_weights)

    if use_cache:
        outputs += (present_key_value,)

    return outputs
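
The sketch below is illustrative only and not taken from the library's documentation: it shows how the returned tuple is laid out when a single decoder layer is called in isolation. The toy config values, tensor shapes, and the PegasusConfig import path are assumptions made for brevity.

import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusDecoderLayer

# toy configuration (assumed values) so the layer is cheap to build
config = PegasusConfig(d_model=64, decoder_attention_heads=4, decoder_ffn_dim=128)
layer = PegasusDecoderLayer(config)

hidden = mindspore.Tensor(np.random.randn(2, 5, 64).astype(np.float32))   # (batch, tgt_len, embed_dim)
enc_out = mindspore.Tensor(np.random.randn(2, 7, 64).astype(np.float32))  # (batch, src_len, embed_dim)

outputs = layer(
    hidden,
    encoder_hidden_states=enc_out,
    output_attentions=True,
    use_cache=True,
)
new_hidden = outputs[0]                         # (batch, tgt_len, embed_dim)
self_attn, cross_attn = outputs[1], outputs[2]  # attention weights
present_key_value = outputs[-1]                 # 4 tensors: self-attn k/v at 0-1, cross-attn k/v at 2-3

With use_cache=True, the cached tuple can be fed back as past_key_value on the next decoding step so only the newest token needs to attend over the cache.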

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderWrapper

Bases: PegasusPreTrainedModel

This wrapper class is a helper class to correctly load pretrained checkpoints when the causal language model is used in combination with the [EncoderDecoderModel] framework.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusDecoderWrapper(PegasusPreTrainedModel):
    """
    This wrapper class is a helper class to correctly load pretrained checkpoints when the causal language model is
    used in combination with the [`EncoderDecoderModel`] framework.
    """
    def __init__(self, config):
        """
        Initializes an instance of the PegasusDecoderWrapper class.

        Args:
            self (PegasusDecoderWrapper): The instance of the class itself.
            config: The configuration object containing the necessary parameters for initialization.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.decoder = PegasusDecoder(config)

    def forward(self, *args, **kwargs):
        """
        Forward all arguments to the wrapped PegasusDecoder and return its output.

        Args:
            *args: Variable length argument list.
            **kwargs: Arbitrary keyword arguments.

        Returns:
            The output of the wrapped PegasusDecoder for the given arguments.

        Raises:
            None.
        """
        return self.decoder(*args, **kwargs)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderWrapper.__init__(config)

Initializes an instance of the PegasusDecoderWrapper class.

PARAMETER DESCRIPTION
self

The instance of the class itself.

TYPE: PegasusDecoderWrapper

config

The configuration object containing the necessary parameters for initialization.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config):
    """
    Initializes an instance of the PegasusDecoderWrapper class.

    Args:
        self (PegasusDecoderWrapper): The instance of the class itself.
        config: The configuration object containing the necessary parameters for initialization.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.decoder = PegasusDecoder(config)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusDecoderWrapper.forward(*args, **kwargs)

Forward all arguments to the wrapped PegasusDecoder and return its output.

PARAMETER DESCRIPTION
*args

Variable length argument list.

DEFAULT: ()

**kwargs

Arbitrary keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

The output of the wrapped PegasusDecoder for the given arguments.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(self, *args, **kwargs):
    """
    Forward all arguments to the wrapped PegasusDecoder and return its output.

    Args:
        *args: Variable length argument list.
        **kwargs: Arbitrary keyword arguments.

    Returns:
        The output of the wrapped PegasusDecoder for the given arguments.

    Raises:
        None.
    """
    return self.decoder(*args, **kwargs)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoder

Bases: PegasusPreTrainedModel

Transformer encoder consisting of config.encoder_layers self attention layers. Each layer is a [PegasusEncoderLayer].

PARAMETER DESCRIPTION
config

PegasusConfig

TYPE: PegasusConfig

embed_tokens

output embedding

TYPE: Embedding DEFAULT: None

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusEncoder(PegasusPreTrainedModel):
    """
    Transformer encoder consisting of *config.encoder_layers* self attention layers. Each layer is a
    [`PegasusEncoderLayer`].

    Args:
        config: PegasusConfig
        embed_tokens (nn.Embedding): output embedding
    """
    def __init__(self, config: PegasusConfig, embed_tokens: Optional[nn.Embedding] = None):
        '''
        Initializes a PegasusEncoder object.

        Args:
            self: The PegasusEncoder object itself.
            config (PegasusConfig): An instance of PegasusConfig containing the configuration settings for
                the Pegasus model.
            embed_tokens (Optional[nn.Embedding]): An optional instance of nn.Embedding representing the
                token embeddings.

        Returns:
            None.

        Raises:
            None.
        '''
        super().__init__(config)

        self.dropout = config.dropout
        self.layerdrop = config.encoder_layerdrop

        embed_dim = config.d_model
        self.padding_idx = config.pad_token_id
        self.max_source_positions = config.max_position_embeddings
        self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

        if embed_tokens is not None:
            self.embed_tokens = embed_tokens
        else:
            self.embed_tokens = nn.Embedding(config.vocab_size, embed_dim, self.padding_idx)

        self.embed_positions = PegasusSinusoidalPositionalEmbedding(
            config.max_position_embeddings,
            embed_dim,
            self.padding_idx,
        )
        self.layers = nn.ModuleList([PegasusEncoderLayer(config) for _ in range(config.encoder_layers)])
        self.layer_norm = nn.LayerNorm(config.d_model, eps=1e-5)

        self.gradient_checkpointing = False
        # Initialize weights and apply final processing
        self.post_init()

    def resize_position_embeddings(self, new_num_position_embeddings: int):
        """
        Resizes position embeddings matrix of the model if `new_num_position_embeddings !=
        config.max_position_embeddings`.

        Arguments:
            new_num_position_embeddings (`int`):
                The number of new position embeddings.

                - If position embeddings are learned, increasing the size will add newly initialized vectors at the end,
                whereas reducing the size will remove vectors from the end.
                - If position embeddings are not learned (*e.g.* sinusoidal position embeddings), increasing the size
                will add correct vectors at the end following the position encoding algorithm, whereas reducing the size
                will remove vectors from the end.
        """
        logger.info(f"Setting `config.max_position_embeddings={new_num_position_embeddings}`...")
        self.config.max_position_embeddings = new_num_position_embeddings

        self.embed_positions = PegasusSinusoidalPositionalEmbedding(
            self.config.max_position_embeddings,
            self.config.d_model,
            self.padding_idx,
        )

    def get_position_embeddings(self) -> nn.Embedding:
        """
        Returns the position embeddings matrix
        """
        return self.embed_positions

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        head_mask=None,
        inputs_embeds=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
                Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
                provide it.

                Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

                [What are attention masks?](../glossary#attention-mask)
            head_mask (`mindspore.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.

            inputs_embeds (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
                than the model's internal embedding lookup matrix.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
                for more detail.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # retrieve input_ids and inputs_embeds
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
            input_shape = input_ids.shape
            input_ids = input_ids.view(-1, input_shape[-1])
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale

        embed_pos = self.embed_positions(input_shape)

        hidden_states = inputs_embeds + embed_pos

        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)

        # expand attention_mask
        if attention_mask is not None:
            # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
            attention_mask = _prepare_4d_attention_mask(attention_mask, inputs_embeds.dtype)

        encoder_states = () if output_hidden_states else None
        all_attentions = () if output_attentions else None

        # check if head_mask has a correct number of layers specified if desired
        if head_mask is not None:
            if head_mask.shape[0] != len(self.layers):
                raise ValueError(
                    f"The head_mask should be specified for {len(self.layers)} layers, but it is for"
                    f" {head_mask.shape[0]}."
                )
        for idx, encoder_layer in enumerate(self.layers):
            if output_hidden_states:
                encoder_states = encoder_states + (hidden_states,)
            # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
            to_drop = False
            if self.training:
                dropout_probability = ops.rand([])
                if dropout_probability < self.layerdrop:  # skip the layer
                    to_drop = True

            if to_drop:
                layer_outputs = (None, None)
            else:
                if self.gradient_checkpointing and self.training:
                    layer_outputs = self._gradient_checkpointing_func(
                        encoder_layer.__call__,
                        hidden_states,
                        attention_mask,
                        (head_mask[idx] if head_mask is not None else None),
                        output_attentions,
                    )
                else:
                    layer_outputs = encoder_layer(
                        hidden_states,
                        attention_mask,
                        layer_head_mask=(head_mask[idx] if head_mask is not None else None),
                        output_attentions=output_attentions,
                    )

                hidden_states = layer_outputs[0]

            if output_attentions:
                all_attentions = all_attentions + (layer_outputs[1],)

        hidden_states = self.layer_norm(hidden_states)

        if output_hidden_states:
            encoder_states = encoder_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, encoder_states, all_attentions] if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=encoder_states, attentions=all_attentions
        )
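
As a quick orientation, the following sketch (not part of the library's documentation; the toy config values and import paths are assumptions) shows the two ways the embed_tokens argument can be used when building an encoder.

from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusEncoder

config = PegasusConfig(d_model=64, encoder_layers=2, encoder_attention_heads=4, encoder_ffn_dim=128)

# Without embed_tokens, the encoder builds its own nn.Embedding(vocab_size, d_model, pad_token_id).
encoder = PegasusEncoder(config)

# Passing an existing embedding module reuses it, e.g. to share the token table with a decoder.
shared_embeddings = encoder.embed_tokens
another_encoder = PegasusEncoder(config, embed_tokens=shared_embeddings)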

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoder.__init__(config, embed_tokens=None)

Initializes a PegasusEncoder object.

PARAMETER DESCRIPTION
self

The PegasusEncoder object itself.

config

An instance of PegasusConfig containing the configuration settings for the Pegasus model.

TYPE: PegasusConfig

embed_tokens

An optional instance of nn.Embedding representing the token embeddings.

TYPE: Optional[Embedding] DEFAULT: None

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config: PegasusConfig, embed_tokens: Optional[nn.Embedding] = None):
    '''
    Initializes a PegasusEncoder object.

    Args:
        self: The PegasusEncoder object itself.
        config (PegasusConfig): An instance of PegasusConfig containing the configuration settings for
            the Pegasus model.
        embed_tokens (Optional[nn.Embedding]): An optional instance of nn.Embedding representing the
            token embeddings.

    Returns:
        None.

    Raises:
        None.
    '''
    super().__init__(config)

    self.dropout = config.dropout
    self.layerdrop = config.encoder_layerdrop

    embed_dim = config.d_model
    self.padding_idx = config.pad_token_id
    self.max_source_positions = config.max_position_embeddings
    self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

    if embed_tokens is not None:
        self.embed_tokens = embed_tokens
    else:
        self.embed_tokens = nn.Embedding(config.vocab_size, embed_dim, self.padding_idx)

    self.embed_positions = PegasusSinusoidalPositionalEmbedding(
        config.max_position_embeddings,
        embed_dim,
        self.padding_idx,
    )
    self.layers = nn.ModuleList([PegasusEncoderLayer(config) for _ in range(config.encoder_layers)])
    self.layer_norm = nn.LayerNorm(config.d_model, eps=1e-5)

    self.gradient_checkpointing = False
    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoder.forward(input_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)` DEFAULT: None

attention_mask

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

What are attention masks?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

head_mask

Mask to nullify selected heads of the attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional* DEFAULT: None

inputs_embeds

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    head_mask=None,
    inputs_embeds=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
):
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
            provide it.

            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        head_mask (`mindspore.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        inputs_embeds (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
            than the model's internal embedding lookup matrix.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
            for more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # retrieve input_ids and inputs_embeds
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    elif input_ids is not None:
        self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
        input_shape = input_ids.shape
        input_ids = input_ids.view(-1, input_shape[-1])
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.shape[:-1]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    if inputs_embeds is None:
        inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale

    embed_pos = self.embed_positions(input_shape)

    hidden_states = inputs_embeds + embed_pos

    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)

    # expand attention_mask
    if attention_mask is not None:
        # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
        attention_mask = _prepare_4d_attention_mask(attention_mask, inputs_embeds.dtype)

    encoder_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None

    # check if head_mask has a correct number of layers specified if desired
    if head_mask is not None:
        if head_mask.shape[0] != len(self.layers):
            raise ValueError(
                f"The head_mask should be specified for {len(self.layers)} layers, but it is for"
                f" {head_mask.shape[0]}."
            )
    for idx, encoder_layer in enumerate(self.layers):
        if output_hidden_states:
            encoder_states = encoder_states + (hidden_states,)
        # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
        to_drop = False
        if self.training:
            dropout_probability = ops.rand([])
            if dropout_probability < self.layerdrop:  # skip the layer
                to_drop = True

        if to_drop:
            layer_outputs = (None, None)
        else:
            if self.gradient_checkpointing and self.training:
                layer_outputs = self._gradient_checkpointing_func(
                    encoder_layer.__call__,
                    hidden_states,
                    attention_mask,
                    (head_mask[idx] if head_mask is not None else None),
                    output_attentions,
                )
            else:
                layer_outputs = encoder_layer(
                    hidden_states,
                    attention_mask,
                    layer_head_mask=(head_mask[idx] if head_mask is not None else None),
                    output_attentions=output_attentions,
                )

            hidden_states = layer_outputs[0]

        if output_attentions:
            all_attentions = all_attentions + (layer_outputs[1],)

    hidden_states = self.layer_norm(hidden_states)

    if output_hidden_states:
        encoder_states = encoder_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, encoder_states, all_attentions] if v is not None)
    return BaseModelOutput(
        last_hidden_state=hidden_states, hidden_states=encoder_states, attentions=all_attentions
    )
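
A minimal forward-pass sketch, assuming the same toy configuration and import paths as above; the mask shapes follow the parameter description of this method.

import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusEncoder

config = PegasusConfig(d_model=64, encoder_layers=2, encoder_attention_heads=4, encoder_ffn_dim=128)
encoder = PegasusEncoder(config)

input_ids = mindspore.Tensor(np.array([[5, 8, 13, 1, 0, 0]], dtype=np.int64))      # (batch, seq_len)
attention_mask = mindspore.Tensor(np.array([[1, 1, 1, 1, 0, 0]], dtype=np.int64))  # 1 = keep, 0 = pad
head_mask = mindspore.ops.ones((config.encoder_layers, config.encoder_attention_heads))

out = encoder(
    input_ids,
    attention_mask=attention_mask,
    head_mask=head_mask,
    output_hidden_states=True,
    return_dict=True,
)
last_hidden = out.last_hidden_state   # (batch, seq_len, d_model)
all_hidden = out.hidden_states        # tuple with encoder_layers + 1 entries

With return_dict=False the same values come back as a plain tuple, in the order shown in the return statement above.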

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoder.get_position_embeddings()

Returns the position embeddings matrix

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_position_embeddings(self) -> nn.Embedding:
    """
    Returns the position embeddings matrix
    """
    return self.embed_positions

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoder.resize_position_embeddings(new_num_position_embeddings)

Resizes position embeddings matrix of the model if new_num_position_embeddings != config.max_position_embeddings.

PARAMETER DESCRIPTION
new_num_position_embeddings

The number of new position embeddings.

  • If position embeddings are learned, increasing the size will add newly initialized vectors at the end, whereas reducing the size will remove vectors from the end.
  • If position embeddings are not learned (e.g. sinusoidal position embeddings), increasing the size will add correct vectors at the end following the position encoding algorithm, whereas reducing the size will remove vectors from the end.

TYPE: `int`

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def resize_position_embeddings(self, new_num_position_embeddings: int):
    """
    Resizes position embeddings matrix of the model if `new_num_position_embeddings !=
    config.max_position_embeddings`.

    Arguments:
        new_num_position_embeddings (`int`):
            The number of new position embeddings.

            - If position embeddings are learned, increasing the size will add newly initialized vectors at the end,
            whereas reducing the size will remove vectors from the end.
            - If position embeddings are not learned (*e.g.* sinusoidal position embeddings), increasing the size
            will add correct vectors at the end following the position encoding algorithm, whereas reducing the size
            will remove vectors from the end.
    """
    logger.info(f"Setting `config.max_position_embeddings={new_num_position_embeddings}`...")
    self.config.max_position_embeddings = new_num_position_embeddings

    self.embed_positions = PegasusSinusoidalPositionalEmbedding(
        self.config.max_position_embeddings,
        self.config.d_model,
        self.padding_idx,
    )
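
Since the position table here is sinusoidal rather than learned, resizing simply rebuilds it deterministically at the new length. A hypothetical call, reusing the encoder instance from the sketches above, would be:

encoder.resize_position_embeddings(2048)                  # rebuilds the sinusoidal table
assert encoder.config.max_position_embeddings == 2048     # the config is updated in place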

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoderLayer

Bases: Module

The PegasusEncoderLayer class represents a single layer of the Pegasus encoder. This layer includes self-attention, feed-forward neural network (FFN) processing, and layer normalization.

This class inherits from nn.Module and has the following attributes:

  • embed_dim: The dimension of the input embeddings
  • self_attn: The self-attention mechanism used in the layer
  • self_attn_layer_norm: The layer normalization applied after self-attention
  • dropout: The dropout rate applied during processing
  • activation_fn: The activation function used in the feed-forward neural network
  • activation_dropout: The dropout rate applied after the activation function
  • fc1: The first fully connected layer in the feed-forward neural network
  • fc2: The second fully connected layer in the feed-forward neural network
  • final_layer_norm: The layer normalization applied after the feed-forward neural network processing

The PegasusEncoderLayer class has a forward method that takes the following arguments:

  • hidden_states: Input to the layer of shape (batch, seq_len, embed_dim)
  • attention_mask: Attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values
  • layer_head_mask: Mask for attention heads in a given layer of size (encoder_attention_heads,)
  • output_attentions: Whether or not to return the attentions tensors of all attention layers

The forward method returns the following outputs:

  • hidden_states: The processed hidden states
  • attn_weights: The attention weights if output_attentions is set to True

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusEncoderLayer(nn.Module):

    '''
    The PegasusEncoderLayer class represents a single layer of the Pegasus encoder.
    This layer includes self-attention, feed-forward neural network (FFN) processing, and layer normalization.

    This class inherits from nn.Module and has the following attributes:

    - embed_dim: The dimension of the input embeddings
    - self_attn: The self-attention mechanism used in the layer
    - self_attn_layer_norm: The layer normalization applied after self-attention
    - dropout: The dropout rate applied during processing
    - activation_fn: The activation function used in the feed-forward neural network
    - activation_dropout: The dropout rate applied after the activation function
    - fc1: The first fully connected layer in the feed-forward neural network
    - fc2: The second fully connected layer in the feed-forward neural network
    - final_layer_norm: The layer normalization applied after the feed-forward neural network processing

    The PegasusEncoderLayer class has a forward method that takes the following arguments:

    - hidden_states: Input to the layer of shape `(batch, seq_len, embed_dim)`
    - attention_mask: Attention mask of size `(batch, 1, tgt_len, src_len)` where padding elements are indicated
    by very large negative values
    - layer_head_mask: Mask for attention heads in a given layer of size `(encoder_attention_heads,)`
    - output_attentions: Whether or not to return the attentions tensors of all attention layers

    The forward method returns the following outputs:

    - hidden_states: The processed hidden states
    - attn_weights: The attention weights if output_attentions is set to True
    '''
    def __init__(self, config: PegasusConfig):
        """
        Initialize a PegasusEncoderLayer object.

        Args:
            self (PegasusEncoderLayer): The instance of the PegasusEncoderLayer class.
            config (PegasusConfig):
                The configuration object containing parameters for initializing the encoder layer.

                - Type: PegasusConfig
                - Purpose: Specifies the configuration settings for the encoder layer.
                - Restrictions: Must be an instance of the PegasusConfig class.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__()
        self.embed_dim = config.d_model

        self.self_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
            embed_dim=self.embed_dim,
            num_heads=config.encoder_attention_heads,
            dropout=config.attention_dropout,
            config=config,
        )
        self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
        self.dropout = config.dropout
        self.activation_fn = ACT2FN[config.activation_function]
        self.activation_dropout = config.activation_dropout
        self.fc1 = nn.Linear(self.embed_dim, config.encoder_ffn_dim)
        self.fc2 = nn.Linear(config.encoder_ffn_dim, self.embed_dim)
        self.final_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        layer_head_mask: mindspore.Tensor,
        output_attentions: bool = False,
    ) -> mindspore.Tensor:
        """
        Args:
            hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
            attention_mask (`mindspore.Tensor`): attention mask of size
                `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
            layer_head_mask (`mindspore.Tensor`): mask for attention heads in a given layer of size
                `(encoder_attention_heads,)`.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
        """
        residual = hidden_states
        hidden_states = self.self_attn_layer_norm(hidden_states)
        hidden_states, attn_weights, _ = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            layer_head_mask=layer_head_mask,
            output_attentions=output_attentions,
        )
        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
        hidden_states = residual + hidden_states

        residual = hidden_states
        hidden_states = self.final_layer_norm(hidden_states)
        hidden_states = self.activation_fn(self.fc1(hidden_states))
        hidden_states = ops.dropout(hidden_states, p=self.activation_dropout, training=self.training)
        hidden_states = self.fc2(hidden_states)
        hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
        hidden_states = residual + hidden_states

        # guard against fp16 overflow: clamp activations when inf/nan values appear after the FFN
        if hidden_states.dtype == mindspore.float16 and (
            ops.isinf(hidden_states).any() or ops.isnan(hidden_states).any()
        ):
            clamp_value = finfo(hidden_states.dtype, 'max') - 1000
            hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

        outputs = (hidden_states,)

        if output_attentions:
            outputs += (attn_weights,)

        return outputs

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoderLayer.__init__(config)

Initialize a PegasusEncoderLayer object.

PARAMETER DESCRIPTION
self

The instance of the PegasusEncoderLayer class.

TYPE: PegasusEncoderLayer

config

The configuration object containing parameters for initializing the encoder layer.

  • Type: PegasusConfig
  • Purpose: Specifies the configuration settings for the encoder layer.
  • Restrictions: Must be an instance of the PegasusConfig class.

TYPE: PegasusConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config: PegasusConfig):
    """
    Initialize a PegasusEncoderLayer object.

    Args:
        self (PegasusEncoderLayer): The instance of the PegasusEncoderLayer class.
        config (PegasusConfig):
            The configuration object containing parameters for initializing the encoder layer.

            - Type: PegasusConfig
            - Purpose: Specifies the configuration settings for the encoder layer.
            - Restrictions: Must be an instance of the PegasusConfig class.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__()
    self.embed_dim = config.d_model

    self.self_attn = PEGASUS_ATTENTION_CLASSES[config._attn_implementation](
        embed_dim=self.embed_dim,
        num_heads=config.encoder_attention_heads,
        dropout=config.attention_dropout,
        config=config,
    )
    self.self_attn_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)
    self.dropout = config.dropout
    self.activation_fn = ACT2FN[config.activation_function]
    self.activation_dropout = config.activation_dropout
    self.fc1 = nn.Linear(self.embed_dim, config.encoder_ffn_dim)
    self.fc2 = nn.Linear(config.encoder_ffn_dim, self.embed_dim)
    self.final_layer_norm = nn.LayerNorm(self.embed_dim, eps=1e-5)

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusEncoderLayer.forward(hidden_states, attention_mask, layer_head_mask, output_attentions=False)

PARAMETER DESCRIPTION
hidden_states

input to the layer of shape (batch, seq_len, embed_dim)

TYPE: `mindspore.Tensor`

attention_mask

attention mask of size (batch, 1, tgt_len, src_len) where padding elements are indicated by very large negative values.

TYPE: `mindspore.Tensor`

layer_head_mask

mask for attention heads in a given layer of size (encoder_attention_heads,).

TYPE: `mindspore.Tensor`

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: False

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    layer_head_mask: mindspore.Tensor,
    output_attentions: bool = False,
) -> mindspore.Tensor:
    """
    Args:
        hidden_states (`mindspore.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
        attention_mask (`mindspore.Tensor`): attention mask of size
            `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
        layer_head_mask (`mindspore.Tensor`): mask for attention heads in a given layer of size
            `(encoder_attention_heads,)`.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
    """
    residual = hidden_states
    hidden_states = self.self_attn_layer_norm(hidden_states)
    hidden_states, attn_weights, _ = self.self_attn(
        hidden_states=hidden_states,
        attention_mask=attention_mask,
        layer_head_mask=layer_head_mask,
        output_attentions=output_attentions,
    )
    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
    hidden_states = residual + hidden_states

    residual = hidden_states
    hidden_states = self.final_layer_norm(hidden_states)
    hidden_states = self.activation_fn(self.fc1(hidden_states))
    hidden_states = ops.dropout(hidden_states, p=self.activation_dropout, training=self.training)
    hidden_states = self.fc2(hidden_states)
    hidden_states = ops.dropout(hidden_states, p=self.dropout, training=self.training)
    hidden_states = residual + hidden_states

    # guard against fp16 overflow: clamp activations when inf/nan values appear after the FFN
    if hidden_states.dtype == mindspore.float16 and (
        ops.isinf(hidden_states).any() or ops.isnan(hidden_states).any()
    ):
        clamp_value = finfo(hidden_states.dtype, 'max') - 1000
        hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

    outputs = (hidden_states,)

    if output_attentions:
        outputs += (attn_weights,)

    return outputs
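
Unlike the decoder layer, attention_mask and layer_head_mask are positional arguments here. The hedged sketch below (toy config values, shapes, and import paths are assumptions) passes None for both to disable masking.

import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusEncoderLayer

config = PegasusConfig(d_model=64, encoder_attention_heads=4, encoder_ffn_dim=128)
layer = PegasusEncoderLayer(config)

hidden = mindspore.Tensor(np.random.randn(2, 6, 64).astype(np.float32))  # (batch, seq_len, embed_dim)

outputs = layer(hidden, None, None, output_attentions=True)
new_hidden, attn_weights = outputs   # attn_weights is only appended when output_attentions=True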

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM

Bases: PegasusPreTrainedModel

This class represents a Pegasus model for causal language modeling (LM). It is a subclass of PegasusPreTrainedModel, which provides the basic infrastructure for loading and saving pre-trained models.

The PegasusForCausalLM class is designed for generating text in a causal manner, where each token is generated based on the previously generated tokens. It takes as input a sequence of tokens and predicts the probability distribution over the next token in the sequence.

The PegasusForCausalLM class provides various methods for interacting with the model. These include initializing the model with a configuration, getting and setting input and output embeddings, getting and setting the decoder, getting the position embeddings, resizing the position embeddings, and forwarding the model for generation.

The __init__ method initializes the PegasusForCausalLM object with a configuration. It sets the decoder configuration and initializes the model and the LM head.

The get_input_embeddings method returns the input embeddings of the model.

The set_input_embeddings method sets the input embeddings of the model to a new value.

The get_output_embeddings method returns the output embeddings (LM head) of the model.

The set_output_embeddings method sets the output embeddings (LM head) of the model to a new value.

The set_decoder method sets the decoder of the model to a new decoder.

The get_decoder method returns the decoder of the model.

The get_position_embeddings method returns the position embeddings matrix of the model.

The resize_position_embeddings method resizes the position embeddings matrix of the model if the new number of position embeddings is different from the maximum number of position embeddings specified in the configuration.

The forward method forwards the model for generation. It takes input tensors such as input_ids, attention_mask, encoder_hidden_states, and labels, and returns the model outputs, including the logits, loss, past key values, hidden states, attentions, and cross attentions.

The prepare_inputs_for_generation method prepares the inputs for generation. It takes input tensors such as input_ids, past_key_values, and attention_mask, and returns a dictionary of prepared inputs.

The _reorder_cache method reorders the past key values for generation based on the beam index.

Note

This class inherits from PegasusPreTrainedModel and provides additional methods specific to causal LM tasks.
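
To make the decoder-only usage concrete, here is a hedged sketch (toy config values and import paths are assumptions, not part of the official docs) of a standalone forward pass with and without labels.

import numpy as np
import mindspore
from mindnlp.transformers.models.pegasus.configuration_pegasus import PegasusConfig
from mindnlp.transformers.models.pegasus.modeling_pegasus import PegasusForCausalLM

config = PegasusConfig(d_model=64, decoder_layers=2, decoder_attention_heads=4,
                       decoder_ffn_dim=128, vocab_size=96103)
model = PegasusForCausalLM(config)

input_ids = mindspore.Tensor(np.array([[5, 8, 13, 21]], dtype=np.int64))

# Without labels: per-position logits over the vocabulary plus the decoding cache.
out = model(input_ids, use_cache=True, return_dict=True)
logits = out.logits               # (batch, seq_len, vocab_size)
past = out.past_key_values        # feed back via past_key_values to decode token by token

# With labels: the cross-entropy LM loss is returned as well.
out = model(input_ids, labels=input_ids, return_dict=True)
loss = out.loss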

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
class PegasusForCausalLM(PegasusPreTrainedModel):

    """
    This class represents a Pegasus model for causal language modeling (LM). It is a subclass of PegasusPreTrainedModel,
    which provides the basic infrastructure for loading and saving pre-trained models.

    The PegasusForCausalLM class is designed for generating text in a causal manner, where each token is generated
    based on the previously generated tokens. It takes as input a sequence of tokens and predicts the probability
    distribution over the next token in the sequence.

    The PegasusForCausalLM class provides various methods for interacting with the model. These include initializing
    the model with a configuration, getting and setting input and output embeddings, getting and setting the decoder,
    getting the position embeddings, resizing the position embeddings, and forwarding the model for generation.

    The `__init__` method initializes the PegasusForCausalLM object with a configuration.
    It sets the decoder configuration and initializes the model and the LM head.

    The `get_input_embeddings` method returns the input embeddings of the model.

    The `set_input_embeddings` method sets the input embeddings of the model to a new value.

    The `get_output_embeddings` method returns the output embeddings (LM head) of the model.

    The `set_output_embeddings` method sets the output embeddings (LM head) of the model to a new value.

    The `set_decoder` method sets the decoder of the model to a new decoder.

    The `get_decoder` method returns the decoder of the model.

    The `get_position_embeddings` method returns the position embeddings matrix of the model.

    The `resize_position_embeddings` method resizes the position embeddings matrix of the model if the new number of
    position embeddings is different from the maximum number of position embeddings specified in the configuration.

    The `forward` method forwards the model for generation. It takes input tensors such as input_ids, attention_mask,
    encoder_hidden_states, and labels, and returns the model outputs, including the logits, loss, past key values,
    hidden states, attentions, and cross attentions.

    The `prepare_inputs_for_generation` method prepares the inputs for generation. It takes input tensors such as
    input_ids, past_key_values, and attention_mask, and returns a dictionary of prepared inputs.

    The `_reorder_cache` method reorders the past key values for generation based on the beam index.

    Note:
        This class inherits from PegasusPreTrainedModel and provides additional methods specific to causal LM tasks.
    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        """
        Initializes a new instance of the PegasusForCausalLM class.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            config (object): The configuration object containing settings for the model.
                This object is deep copied to avoid modification of the original configuration.
                It must have the following attributes:

                - is_decoder (bool): Set to True.
                - is_encoder_decoder (bool): Set to False.

        Returns:
            None.

        Raises:
            None.
        """
        config = copy.deepcopy(config)
        config.is_decoder = True
        config.is_encoder_decoder = False
        super().__init__(config)
        self.model = PegasusDecoderWrapper(config)

        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        Method: get_input_embeddings

        Description:
        This method retrieves the input embeddings from the PegasusForCausalLM model.

        Args:
            self: PegasusForCausalLM instance. Represents the current instance of the PegasusForCausalLM class.

        Returns:
            nn.Embedding: The decoder's token embedding module (`model.decoder.embed_tokens`).

        Raises:
            None
        """
        return self.model.decoder.embed_tokens

    def set_input_embeddings(self, value):
        """
        set_input_embeddings method in the PegasusForCausalLM class sets the input embeddings for the model.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            value (nn.Embedding): The new token embedding module to be set for the model.
                Its weight should have shape (vocab_size, embedding_dim).

        Returns:
            None.

        Raises:
            None.
        """
        self.model.decoder.embed_tokens = value

    def get_output_embeddings(self):
        """
        Method to retrieve the output embeddings from the PegasusForCausalLM model.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
                This parameter is a reference to the current instance of the class.

        Returns:
            nn.Linear: The `lm_head` layer that projects hidden states to vocabulary logits.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Set the output embeddings for PegasusForCausalLM model.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            new_embeddings (object): The new embeddings to be set as output embeddings for the model.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def set_decoder(self, decoder):
        """
        Sets the decoder of the PegasusForCausalLM model.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            decoder (object): The decoder object to be set for the model.

        Returns:
            None: This method modifies the decoder attribute of the PegasusForCausalLM instance.

        Raises:
            None.
        """
        self.model.decoder = decoder

    def get_decoder(self):
        """
        This method retrieves the decoder component of the PegasusForCausalLM model.

        Args:
            self: An instance of the PegasusForCausalLM class.

        Returns:
            decoder: The method returns the decoder component of the model.

        Raises:
            None.
        """
        return self.model.decoder

    def get_position_embeddings(self) -> nn.Embedding:
        """
        Returns the position embeddings matrix
        """
        return self.model.decoder.get_position_embeddings()

    def resize_position_embeddings(self, new_num_position_embeddings: int):
        """
        Resizes position embeddings matrix of the model if `new_num_position_embeddings !=
        config.max_position_embeddings`.

        Arguments:
            new_num_position_embeddings (`int`):
                The number of new position embeddings.

                - If position embeddings are learned, increasing the size will add newly initialized vectors at the end,
                whereas reducing the size will remove vectors from the end.
                - If position embeddings are not learned (*e.g.* sinusoidal position embeddings), increasing the size
                will add correct vectors at the end following the position encoding algorithm, whereas reducing the size
                will remove vectors from the end.
        """
        self.config.max_position_embeddings = new_num_position_embeddings
        self.model.decoder.resize_position_embeddings(new_num_position_embeddings)

    # Copied from transformers.models.bart.modeling_bart.BartForCausalLM.forward with Bart->Pegasus, facebook/bart-base->google/pegasus-large
    def forward(
        self,
        input_ids: mindspore.Tensor = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, CausalLMOutputWithCrossAttentions]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
                Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
                provide it.

                Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

                [What are attention masks?](../glossary#attention-mask)
            encoder_hidden_states (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
                Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
                if the model is configured as a decoder.
            encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used
                in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.

            head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.

            cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
                Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:

                - 1 indicates the head is **not masked**,
                - 0 indicates the head is **masked**.

            past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
                shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of
                shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. The two additional
                tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

                Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
                cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

                If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
                config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
                (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
                for more detail.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

        Returns:
            Union[Tuple, CausalLMOutputWithCrossAttentions]

        Example:
            ```python
            >>> from mindnlp.transformers import AutoTokenizer, PegasusForCausalLM
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
            >>> model = PegasusForCausalLM.from_pretrained("google/pegasus-large", add_cross_attention=False)
            >>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
            >>> outputs = model(**inputs)
            ...
            >>> logits = outputs.logits
            >>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
            >>> list(logits.shape) == expected_shape
            True
            ```
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
        outputs = self.model.decoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            head_mask=head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        logits = self.lm_head(outputs[0])

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(logits.view(-1, self.config.vocab_size), labels.view(-1).astype(mindspore.int32))

        if not return_dict:
            output = (logits,) + outputs[1:]
            return (loss,) + output if loss is not None else output

        return CausalLMOutputWithCrossAttentions(
            loss=loss,
            logits=logits,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            cross_attentions=outputs.cross_attentions,
        )

    def prepare_inputs_for_generation(
        self, input_ids, past_key_values=None, attention_mask=None, use_cache=None, **kwargs
    ):
        """
        Prepare inputs for generation in the PegasusForCausalLM class.

        This method prepares inputs for generating text by adjusting input_ids and attention_mask based on
        past_key_values if provided.

        Args:
            self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            input_ids (mindspore.Tensor): The input tensor containing token ids for the model.
            past_key_values (tuple, optional): Tuple of past key values for faster generation, if available.
            attention_mask (mindspore.Tensor, optional): Tensor indicating which tokens should be attended to.
            use_cache (bool, optional): Flag indicating whether to use cache for faster decoding.

        Returns:
            dict:
                A dictionary containing the following keys:

                - input_ids (mindspore.Tensor): The adjusted input tensor after processing.
                - attention_mask (mindspore.Tensor): The attention mask for the input tensor.
                - past_key_values (tuple): Past key values if provided, else None.
                - use_cache (bool): Flag indicating whether to use cache for faster decoding.

        Raises:
            ValueError: If the input_ids and past_key_values shapes are incompatible.
            IndexError: If the input_ids shape is invalid for processing.
        """
        # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
        if attention_mask is None:
            attention_mask = input_ids.new_ones(input_ids.shape)

        if past_key_values:
            past_length = past_key_values[0][0].shape[2]

            # Some generation methods already pass only the last input ID
            if input_ids.shape[1] > past_length:
                remove_prefix_length = past_length
            else:
                # Default to old behavior: keep only final ID
                remove_prefix_length = input_ids.shape[1] - 1

            input_ids = input_ids[:, remove_prefix_length:]
        # first step, decoder_cached_states are empty
        return {
            "input_ids": input_ids,  # encoder_outputs is defined. input_ids not needed
            "attention_mask": attention_mask,
            "past_key_values": past_key_values,
            "use_cache": use_cache,
        }

    @staticmethod
    def _reorder_cache(past_key_values, beam_idx):
        """
        Reorders the cache for beam search in the PegasusForCausalLM class.

        Args:
            past_key_values (tuple): A tuple of past key-values containing cached states for each layer.
            beam_idx (mindspore.Tensor): A tensor representing the indices of the selected beams.

        Returns:
            tuple: A tuple of reordered past key-values for each layer.

        Raises:
            None.

        This static method reorders the cache for beam search in the PegasusForCausalLM class. It takes two parameters:

        - `past_key_values`: A tuple of past key-values which contains the cached states for each layer.
        This is used to keep track of the previous states.
        - `beam_idx`: A tensor representing the indices of the selected beams. This tensor is used to select the
        states corresponding to the selected beams.

        The method returns a tuple of reordered past key-values for each layer.
        This reordering is done by selecting the states in each layer's past key-values tensor based on the beam
        indices provided.

        The method does not raise any exceptions.
        """
        reordered_past = ()
        for layer_past in past_key_values:
            reordered_past += (
                tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),
            )
        return reordered_past
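
The caching logic above (`prepare_inputs_for_generation` trimming `input_ids` once a cache exists, and `forward` returning the updated `past_key_values`) can be exercised with a short manual decoding loop. The sketch below is illustrative only: it assumes `AutoTokenizer` and `PegasusForCausalLM` are importable from `mindnlp.transformers`, uses `return_tensors="ms"` as in the docstring example, and picks tokens greedily for five steps purely for demonstration.

```python
from mindspore import ops
from mindnlp.transformers import AutoTokenizer, PegasusForCausalLM  # assumed top-level exports

tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
model = PegasusForCausalLM.from_pretrained("google/pegasus-large", add_cross_attention=False)

inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
input_ids = inputs["input_ids"]

past_key_values = None
for _ in range(5):  # generate five tokens greedily (arbitrary choice for the sketch)
    # Once a cache exists, prepare_inputs_for_generation keeps only the not-yet-cached ids.
    model_inputs = model.prepare_inputs_for_generation(
        input_ids, past_key_values=past_key_values, use_cache=True
    )
    outputs = model(**model_inputs, return_dict=True)

    # Greedy selection over the last position's logits.
    next_token = outputs.logits[:, -1, :].argmax(axis=-1, keepdims=True)
    next_token = next_token.astype(input_ids.dtype)

    input_ids = ops.cat([input_ids, next_token], axis=-1)
    past_key_values = outputs.past_key_values  # reuse the cache on the next step

print(tokenizer.batch_decode(input_ids, skip_special_tokens=True))
```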

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.__init__(config)

Initializes a new instance of the PegasusForCausalLM class.

PARAMETER DESCRIPTION
self

The instance of the PegasusForCausalLM class.

TYPE: PegasusForCausalLM

config

The configuration object containing settings for the model. It is deep copied so the caller's configuration is not modified; on the copy, the following attributes are overridden:

  • is_decoder (bool): forced to True.
  • is_encoder_decoder (bool): forced to False.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def __init__(self, config):
    """
    Initializes a new instance of the PegasusForCausalLM class.

    Args:
        self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
        config (object): The configuration object containing settings for the model.
            It is deep copied so the caller's configuration is not modified; on the copy,
            the following attributes are overridden:

            - is_decoder (bool): forced to True.
            - is_encoder_decoder (bool): forced to False.

    Returns:
        None.

    Raises:
        None.
    """
    config = copy.deepcopy(config)
    config.is_decoder = True
    config.is_encoder_decoder = False
    super().__init__(config)
    self.model = PegasusDecoderWrapper(config)

    self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    # Initialize weights and apply final processing
    self.post_init()
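
As a quick illustration of the deep-copy behaviour described above, the snippet below (a minimal sketch; the `PegasusConfig` import path is an assumption) builds a decoder-only model from a fresh config and shows that only the copy held by the model is mutated.

```python
from mindnlp.transformers import PegasusConfig, PegasusForCausalLM  # assumed top-level exports

config = PegasusConfig()              # stock config: is_decoder=False, is_encoder_decoder=True
model = PegasusForCausalLM(config)    # __init__ deep-copies the config and forces decoder-only flags

assert model.config.is_decoder and not model.config.is_encoder_decoder
assert not config.is_decoder          # the caller's config object is left untouched
```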

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, head_mask=None, cross_attn_head_mask=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide it.

Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)` DEFAULT: None

attention_mask

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

What are attention masks?

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

encoder_hidden_states

Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional* DEFAULT: None

encoder_attention_mask

Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,
  • 0 for tokens that are masked.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

head_mask

Mask to nullify selected heads of the attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

cross_attn_head_mask

Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,
  • 0 indicates the head is masked.

TYPE: `mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional* DEFAULT: None

past_key_values

Tuple of tuple(mindspore.Tensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head). The two additional tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

TYPE: `tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: None

labels

Labels for computing the masked language modeling loss. Indices should either be in [0, ..., config.vocab_size] or -100 (see input_ids docstring). Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

TYPE: `bool`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, CausalLMOutputWithCrossAttentions]

A CausalLMOutputWithCrossAttentions (or a plain tuple when return_dict=False) containing the logits and, when labels is provided, the loss, plus any requested cached key/values, hidden states and attentions.

Example
>>> from mindnlp.transformers import AutoTokenizer, PegasusForCausalLM
...
>>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
>>> model = PegasusForCausalLM.from_pretrained("google/pegasus-large", add_cross_attention=False)
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
>>> outputs = model(**inputs)
...
>>> logits = outputs.logits
>>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
>>> list(logits.shape) == expected_shape
True
Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def forward(
    self,
    input_ids: mindspore.Tensor = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, CausalLMOutputWithCrossAttentions]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you
            provide it.

            Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            [What are attention masks?](../glossary#attention-mask)
        encoder_hidden_states (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention
            if the model is configured as a decoder.
        encoder_attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used
            in the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

        head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        cross_attn_head_mask (`mindspore.Tensor` of shape `(decoder_layers, decoder_attention_heads)`, *optional*):
            Mask to nullify selected heads of the cross-attention modules. Mask values selected in `[0, 1]`:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        past_key_values (`tuple(tuple(mindspore.Tensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            Tuple of `tuple(mindspore.Tensor)` of length `config.n_layers`, with each tuple having 2 tensors of
            shape `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of
            shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. The two additional
            tensors are only required when the model is used as a decoder in a Sequence to Sequence model.

            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
            cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.

            If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
            that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
            all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
            config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
            (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under
            returned tensors for more detail.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
            for more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

    Returns:
        Union[Tuple, CausalLMOutputWithCrossAttentions]

    Example:
        ```python
        >>> from mindnlp.transformers import AutoTokenizer, PegasusForCausalLM
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("google/pegasus-large")
        >>> model = PegasusForCausalLM.from_pretrained("google/pegasus-large", add_cross_attention=False)
        >>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
        >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="ms")
        >>> outputs = model(**inputs)
        ...
        >>> logits = outputs.logits
        >>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
        >>> list(logits.shape) == expected_shape
        True
        ```
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
    outputs = self.model.decoder(
        input_ids=input_ids,
        attention_mask=attention_mask,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        head_mask=head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    logits = self.lm_head(outputs[0])

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(logits.view(-1, self.config.vocab_size), labels.view(-1).astype(mindspore.int32))

    if not return_dict:
        output = (logits,) + outputs[1:]
        return (loss,) + output if loss is not None else output

    return CausalLMOutputWithCrossAttentions(
        loss=loss,
        logits=logits,
        past_key_values=outputs.past_key_values,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
        cross_attentions=outputs.cross_attentions,
    )
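
Because `forward` applies the cross-entropy directly to the provided `labels` (no internal shifting is performed, as the source above shows), callers are responsible for any label alignment. A minimal sketch, reusing `tokenizer` and `model` from the example above and treating `labels=input_ids` as a purely illustrative choice:

```python
batch = tokenizer("Hello, my dog is cute", return_tensors="ms")

outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["input_ids"],   # illustrative only; shift/mask labels to match your training setup
    return_dict=True,
)

print(outputs.loss)          # scalar cross-entropy over all label positions
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```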

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.get_decoder()

This method retrieves the decoder component of the PegasusForCausalLM model.

PARAMETER DESCRIPTION
self

An instance of the PegasusForCausalLM class.

RETURNS DESCRIPTION
decoder

The method returns the decoder component of the model.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_decoder(self):
    """
    This method retrieves the decoder component of the PegasusForCausalLM model.

    Args:
        self: An instance of the PegasusForCausalLM class.

    Returns:
        decoder: The method returns the decoder component of the model.

    Raises:
        None.
    """
    return self.model.decoder
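
`get_decoder` and the `set_decoder` method shown earlier are symmetric accessors for the wrapped decoder stack. A small sketch, reusing `model` from the examples above:

```python
decoder = model.get_decoder()        # the decoder module held by the PegasusDecoderWrapper
print(type(decoder).__name__)

model.set_decoder(decoder)           # the setter simply re-assigns model.model.decoder
assert model.get_decoder() is decoder
```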

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.get_input_embeddings()

Description: This method retrieves the input embeddings from the PegasusForCausalLM model.

PARAMETER DESCRIPTION
self

PegasusForCausalLM instance. Represents the current instance of the PegasusForCausalLM class.

RETURNS DESCRIPTION
nn.Embedding

The decoder's token embedding layer (self.model.decoder.embed_tokens).

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_input_embeddings(self):
    """
    Method: get_input_embeddings

    Description:
    This method retrieves the input embeddings from the PegasusForCausalLM model.

    Args:
        self: PegasusForCausalLM instance. Represents the current instance of the PegasusForCausalLM class.

    Returns:
        nn.Embedding: The decoder's token embedding layer (`self.model.decoder.embed_tokens`).

    Raises:
        None
    """
    return self.model.decoder.embed_tokens
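
The returned embedding layer can be called directly, for example to build `inputs_embeds` for `forward`. Note that when the decoder embeds `input_ids` itself it may apply an additional embedding scale, so the manual path below is an illustrative sketch rather than a drop-in equivalent; it reuses `tokenizer` and `model` from the examples above.

```python
embed_tokens = model.get_input_embeddings()      # the decoder's token embedding table
ids = tokenizer("Hello", return_tensors="ms")["input_ids"]

inputs_embeds = embed_tokens(ids)                # look the embeddings up manually
outputs = model(inputs_embeds=inputs_embeds)     # forward also accepts pre-computed embeddings
```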

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.get_output_embeddings()

Method to retrieve the output embeddings from the PegasusForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the PegasusForCausalLM class. This parameter is a reference to the current instance of the class.

TYPE: PegasusForCausalLM

RETURNS DESCRIPTION
nn.Linear

The language modeling head (self.lm_head): a bias-free linear projection from decoder hidden states to vocabulary logits.

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_output_embeddings(self):
    """
    Method to retrieve the output embeddings from the PegasusForCausalLM model.

    Args:
        self (PegasusForCausalLM): The instance of the PegasusForCausalLM class.
            This parameter is a reference to the current instance of the class.

    Returns:
        nn.Linear: The language modeling head (`self.lm_head`), a bias-free linear projection
            from decoder hidden states to vocabulary logits.

    Raises:
        None.
    """
    return self.lm_head
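
The returned head is the same projection that `forward` applies to the decoder output, so it can be used to turn hidden states into vocabulary logits manually. A sketch reusing `model` and `ids` from the previous example:

```python
lm_head = model.get_output_embeddings()          # nn.Linear(hidden_size -> vocab_size, no bias)

decoder_out = model.model.decoder(input_ids=ids, return_dict=True)
logits = lm_head(decoder_out.last_hidden_state)  # the same projection forward() performs internally
print(logits.shape)                              # (batch_size, sequence_length, vocab_size)
```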

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.get_position_embeddings()

Returns the position embeddings matrix

Source code in mindnlp/transformers/models/pegasus/modeling_pegasus.py
def get_position_embeddings(self) -> nn.Embedding:
    """
    Returns the position embeddings matrix
    """
    return self.model.decoder.get_position_embeddings()
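
`get_position_embeddings` and the `resize_position_embeddings` method documented earlier both operate on the decoder's position table. A hedged sketch (the target length 2048 is an arbitrary example value), reusing `model` from above:

```python
pos_embeddings = model.get_position_embeddings()      # the decoder's position embedding matrix
print(model.config.max_position_embeddings)

# PEGASUS uses sinusoidal position embeddings, so resizing regenerates the table
# following the position encoding algorithm (see resize_position_embeddings above).
model.resize_position_embeddings(2048)
assert model.config.max_position_embeddings == 2048
```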

mindnlp.transformers.models.pegasus.modeling_pegasus.PegasusForCausalLM.prepare_inputs_for_generation(input_ids, past_key_values=None, attention_mask=None, use_cache=None, **kwargs)

Prepare inputs for generation in the PegasusForCausalLM class.

This method prepares inputs for generating text by adjusting input_ids and attention_mask based on past_key_values if provided.

PARAMETER DESCRIPTION
self

The instance of the PegasusForCausalLM class.

TYPE: PegasusForCausalLM

input_ids