
cpmant

mindnlp.transformers.models.cpmant.configuration_cpmant

CPMAnt model configuration

mindnlp.transformers.models.cpmant.configuration_cpmant.CpmAntConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [CpmAntModel]. It is used to instantiate a CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the CPMAnt openbmb/cpm-ant-10b architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the CPMAnt model. Defines the number of different tokens that can be represented by the input passed when calling [CpmAntModel].

TYPE: `int`, *optional*, defaults to 30720 DEFAULT: 30720

hidden_size

Dimension of the encoder layers.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

num_attention_heads

Number of attention heads in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

dim_head

Dimension of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 128 DEFAULT: 128

dim_ff

Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 10240 DEFAULT: 10240

num_hidden_layers

Number of layers of the Transformer encoder.

TYPE: `int`, *optional*, defaults to 48 DEFAULT: 48

dropout_p

The dropout probability for all fully connected layers in the embeddings and encoder.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

position_bias_num_buckets

The number of position_bias buckets.

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

position_bias_max_distance

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 2048 DEFAULT: 2048

eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-06 DEFAULT: 1e-06

init_std

Initialize parameters with std = init_std.

TYPE: `float`, *optional*, defaults to 1.0 DEFAULT: 1.0

prompt_types

The number of prompt types.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

prompt_length

The length of the prompt.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

segment_types

The number of segment types.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

use_cache

Whether to use cache.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

Example
>>> from mindnlp.transformers import CpmAntModel, CpmAntConfig
...
>>> # Initializing a CPMAnt cpm-ant-10b style configuration
>>> configuration = CpmAntConfig()
...
>>> # Initializing a model from the cpm-ant-10b style configuration
>>> model = CpmAntModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
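The defaults above can be overridden at construction time. A minimal sketch (the smaller values here are illustrative assumptions, not a released configuration):

>>> from mindnlp.transformers import CpmAntConfig
...
>>> # A reduced configuration, e.g. for quick experiments
>>> small_config = CpmAntConfig(
...     hidden_size=1024,
...     num_hidden_layers=12,
...     num_attention_heads=16,
...     dim_head=64,
...     dim_ff=2560,
... )
>>> small_config.hidden_size
1024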
Source code in mindnlp/transformers/models/cpmant/configuration_cpmant.py
class CpmAntConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`CpmAntModel`]. It is used to instantiate a
    CPMAnt model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of the CPMAnt
    [openbmb/cpm-ant-10b](https://hf-mirror.com/openbmb/cpm-ant-10b) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30720):
            Vocabulary size of the CPMAnt model. Defines the number of different tokens that can be represented by the
            `input` passed when calling [`CpmAntModel`].
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the encoder layers.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads in the Transformer encoder.
        dim_head (`int`, *optional*, defaults to 128):
            Dimension of attention heads for each attention layer in the Transformer encoder.
        dim_ff (`int`, *optional*, defaults to 10240):
            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        num_hidden_layers (`int`, *optional*, defaults to 48):
            Number of layers of the Transformer encoder.
        dropout_p (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the embeddings and encoder.
        position_bias_num_buckets (`int`, *optional*, defaults to 512):
            The number of position_bias buckets.
        position_bias_max_distance (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the layer normalization layers.
        init_std (`float`, *optional*, defaults to 1.0):
            Initialize parameters with std = init_std.
        prompt_types (`int`, *optional*, defaults to 32):
            The number of prompt types.
        prompt_length (`int`, *optional*, defaults to 32):
            The length of the prompt.
        segment_types (`int`, *optional*, defaults to 32):
            The number of segment types.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to use cache.

    Example:
        ```python
        >>> from mindnlp.transformers import CpmAntModel, CpmAntConfig
        ...
        >>> # Initializing a CPMAnt cpm-ant-10b style configuration
        >>> configuration = CpmAntConfig()
        ...
        >>> # Initializing a model from the cpm-ant-10b style configuration
        >>> model = CpmAntModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "cpmant"

    def __init__(
        self,
        vocab_size: int = 30720,
        hidden_size: int = 4096,
        num_attention_heads: int = 32,
        dim_head: int = 128,
        dim_ff: int = 10240,
        num_hidden_layers: int = 48,
        dropout_p: float = 0.0,
        position_bias_num_buckets: int = 512,
        position_bias_max_distance: int = 2048,
        eps: float = 1e-6,
        init_std: float = 1.0,
        prompt_types: int = 32,
        prompt_length: int = 32,
        segment_types: int = 32,
        use_cache: bool = True,
        **kwargs,
    ):
        """
        Initializes an instance of the CpmAntConfig class.

        Args:
            self (CpmAntConfig): The instance of the CpmAntConfig class.
            vocab_size (int): The size of the vocabulary. Defaults to 30720.
            hidden_size (int): The size of the hidden state. Defaults to 4096.
            num_attention_heads (int): The number of attention heads. Defaults to 32.
            dim_head (int): The dimension of each attention head. Defaults to 128.
            dim_ff (int): The dimension of the feed-forward layer. Defaults to 10240.
            num_hidden_layers (int): The number of hidden layers. Defaults to 48.
            dropout_p (float): The dropout rate. Defaults to 0.0.
            position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 512.
            position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
            eps (float): The epsilon value for numerical stability. Defaults to 1e-06.
            init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
            prompt_types (int): The number of prompt types. Defaults to 32.
            prompt_length (int): The length of the prompt. Defaults to 32.
            segment_types (int): The number of segment types. Defaults to 32.
            use_cache (bool): Whether to use cache. Defaults to True.

        Returns:
            None.

        Raises:
            None.
        """
        """"""
        super().__init__(**kwargs)
        self.prompt_types = prompt_types
        self.prompt_length = prompt_length
        self.segment_types = segment_types
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.dim_head = dim_head
        self.dim_ff = dim_ff
        self.num_hidden_layers = num_hidden_layers
        self.position_bias_num_buckets = position_bias_num_buckets
        self.position_bias_max_distance = position_bias_max_distance
        self.dropout_p = dropout_p
        self.eps = eps
        self.use_cache = use_cache
        self.vocab_size = vocab_size
        self.init_std = init_std

mindnlp.transformers.models.cpmant.configuration_cpmant.CpmAntConfig.__init__(vocab_size=30720, hidden_size=4096, num_attention_heads=32, dim_head=128, dim_ff=10240, num_hidden_layers=48, dropout_p=0.0, position_bias_num_buckets=512, position_bias_max_distance=2048, eps=1e-06, init_std=1.0, prompt_types=32, prompt_length=32, segment_types=32, use_cache=True, **kwargs)

Initializes an instance of the CpmAntConfig class.

PARAMETER DESCRIPTION
self

The instance of the CpmAntConfig class.

TYPE: CpmAntConfig

vocab_size

The size of the vocabulary. Defaults to 30720.

TYPE: int DEFAULT: 30720

hidden_size

The size of the hidden state. Defaults to 4096.

TYPE: int DEFAULT: 4096

num_attention_heads

The number of attention heads. Defaults to 32.

TYPE: int DEFAULT: 32

dim_head

The dimension of each attention head. Defaults to 128.

TYPE: int DEFAULT: 128

dim_ff

The dimension of the feed-forward layer. Defaults to 10240.

TYPE: int DEFAULT: 10240

num_hidden_layers

The number of hidden layers. Defaults to 48.

TYPE: int DEFAULT: 48

dropout_p

The dropout rate. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

position_bias_num_buckets

The number of buckets for position bias. Defaults to 512.

TYPE: int DEFAULT: 512

position_bias_max_distance

The maximum distance for position bias. Defaults to 2048.

TYPE: int DEFAULT: 2048

eps

The epsilon value for numerical stability. Defaults to 1e-06.

TYPE: float DEFAULT: 1e-06

init_std

The standard deviation for weight initialization. Defaults to 1.0.

TYPE: float DEFAULT: 1.0

prompt_types

The number of prompt types. Defaults to 32.

TYPE: int DEFAULT: 32

prompt_length

The length of the prompt. Defaults to 32.

TYPE: int DEFAULT: 32

segment_types

The number of segment types. Defaults to 32.

TYPE: int DEFAULT: 32

use_cache

Whether to use cache. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.
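As with other PretrainedConfig subclasses, a constructed configuration can typically be serialized and reloaded; a brief sketch assuming the standard save_pretrained/from_pretrained API (the directory path is hypothetical):

>>> configuration = CpmAntConfig(num_hidden_layers=24)
>>> configuration.save_pretrained("./cpmant-config")   # writes config.json
>>> reloaded = CpmAntConfig.from_pretrained("./cpmant-config")
>>> reloaded.num_hidden_layers
24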

Source code in mindnlp/transformers/models/cpmant/configuration_cpmant.py
def __init__(
    self,
    vocab_size: int = 30720,
    hidden_size: int = 4096,
    num_attention_heads: int = 32,
    dim_head: int = 128,
    dim_ff: int = 10240,
    num_hidden_layers: int = 48,
    dropout_p: float = 0.0,
    position_bias_num_buckets: int = 512,
    position_bias_max_distance: int = 2048,
    eps: float = 1e-6,
    init_std: float = 1.0,
    prompt_types: int = 32,
    prompt_length: int = 32,
    segment_types: int = 32,
    use_cache: bool = True,
    **kwargs,
):
    """
    Initializes an instance of the CpmAntConfig class.

    Args:
        self (CpmAntConfig): The instance of the CpmAntConfig class.
        vocab_size (int): The size of the vocabulary. Defaults to 30720.
        hidden_size (int): The size of the hidden state. Defaults to 4096.
        num_attention_heads (int): The number of attention heads. Defaults to 32.
        dim_head (int): The dimension of each attention head. Defaults to 128.
        dim_ff (int): The dimension of the feed-forward layer. Defaults to 10240.
        num_hidden_layers (int): The number of hidden layers. Defaults to 48.
        dropout_p (float): The dropout rate. Defaults to 0.0.
        position_bias_num_buckets (int): The number of buckets for position bias. Defaults to 512.
        position_bias_max_distance (int): The maximum distance for position bias. Defaults to 2048.
        eps (float): The epsilon value for numerical stability. Defaults to 1e-06.
        init_std (float): The standard deviation for weight initialization. Defaults to 1.0.
        prompt_types (int): The number of prompt types. Defaults to 32.
        prompt_length (int): The length of the prompt. Defaults to 32.
        segment_types (int): The number of segment types. Defaults to 32.
        use_cache (bool): Whether to use cache. Defaults to True.

    Returns:
        None.

    Raises:
        None.
    """
    """"""
    super().__init__(**kwargs)
    self.prompt_types = prompt_types
    self.prompt_length = prompt_length
    self.segment_types = segment_types
    self.hidden_size = hidden_size
    self.num_attention_heads = num_attention_heads
    self.dim_head = dim_head
    self.dim_ff = dim_ff
    self.num_hidden_layers = num_hidden_layers
    self.position_bias_num_buckets = position_bias_num_buckets
    self.position_bias_max_distance = position_bias_max_distance
    self.dropout_p = dropout_p
    self.eps = eps
    self.use_cache = use_cache
    self.vocab_size = vocab_size
    self.init_std = init_std

mindnlp.transformers.models.cpmant.tokenization_cpmant

Tokenization classes for CPMAnt.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer

Bases: PreTrainedTokenizer

Construct a CPMAnt tokenizer. Text is first segmented with jieba and then split into subword tokens with WordPiece.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`

bod_token

The beginning of document token.

TYPE: `str`, *optional*, defaults to `"<d>"` DEFAULT: '<d>'

eod_token

The end of document token.

TYPE: `str`, *optional*, defaults to `"</d>"` DEFAULT: '</d>'

bos_token

The beginning of sequence token.

TYPE: `str`, *optional*, defaults to `"<s>"` DEFAULT: '<s>'

eos_token

The end of sequence token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

pad_token

The token used for padding.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

unk_token

The unknown token.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

line_token

The line token.

TYPE: `str`, *optional*, defaults to `"</n>"` DEFAULT: '</n>'

space_token

The space token.

TYPE: `str`, *optional*, defaults to `"</_>"` DEFAULT: '</_>'
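A hedged usage sketch: assuming the openbmb/cpm-ant-10b checkpoint referenced in the configuration docs provides the vocabulary file, the tokenizer can be loaded and applied as follows (jieba must be installed):

>>> from mindnlp.transformers import CpmAntTokenizer
...
>>> tokenizer = CpmAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
>>> tokens = tokenizer.tokenize("今天天气真好")   # jieba segmentation + WordPiece
>>> ids = tokenizer.convert_tokens_to_ids(tokens)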

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
class CpmAntTokenizer(PreTrainedTokenizer):
    """
    Construct a CPMAnt tokenizer. Text is first segmented with jieba and then split into subword tokens with WordPiece.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        bod_token (`str`, *optional*, defaults to `"<d>"`):
            The beginning of document token.
        eod_token (`str`, *optional*, defaults to `"</d>"`):
            The end of document token.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sequence token.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token.
        line_token (`str`, *optional*, defaults to `"</n>"`):
            The line token.
        space_token (`str`, *optional*, defaults to `"</_>"`):
            The space token.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]
    add_prefix_space = False

    def __init__(
        self,
        vocab_file,
        bod_token="<d>",
        eod_token="</d>",
        bos_token="<s>",
        eos_token="</s>",
        pad_token="<pad>",
        unk_token="<unk>",
        line_token="</n>",
        space_token="</_>",
        padding_side="left",
        **kwargs,
    ):
        """
        Initialize a CpmAntTokenizer object with the provided parameters.

        Args:
            vocab_file (str): The path to the vocabulary file to load.
            bod_token (str, optional): Beginning of document token (default is '<d>').
            eod_token (str, optional): End of document token (default is '</d>').
            bos_token (str, optional): Beginning of sentence token (default is '<s>').
            eos_token (str, optional): End of sentence token (default is '</s>').
            pad_token (str, optional): Padding token (default is '<pad>').
            unk_token (str, optional): Token for unknown words (default is '<unk>').
            line_token (str, optional): Line break token (default is '</n>').
            space_token (str, optional): Space token (default is '</_>').
            padding_side (str, optional): Side for padding (default is 'left').

        Returns:
            None.

        Raises:
            MissingBackendError: If required backend 'jieba' is not available.
            FileNotFoundError: If the specified 'vocab_file' does not exist.
            KeyError: If 'space_token' or 'line_token' are missing in the loaded vocabulary.
            Exception: Any other unforeseen error that may occur during initialization.
        """
        requires_backends(self, ["jieba"])
        self.bod_token = bod_token
        self.eod_token = eod_token
        self.encoder = load_vocab(vocab_file)
        self.encoder[" "] = self.encoder[space_token]
        self.encoder["\n"] = self.encoder[line_token]

        del self.encoder[space_token]
        del self.encoder[line_token]

        self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
        self.decoder = {v: k for k, v in self.encoder.items()}

        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.encoder, unk_token=unk_token)

        super().__init__(
            bod_token=bod_token,
            eod_token=eod_token,
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            unk_token=unk_token,
            line_token=line_token,
            space_token=space_token,
            padding_side=padding_side,
            **kwargs,
        )

    @property
    def bod_token_id(self):
        """
        This method, 'bod_token_id', is a property method defined in the 'CpmAntTokenizer' class.
        It takes no external parameters and returns the token ID associated with the 'bod_token'.

        Args:
            self (CpmAntTokenizer): The instance of the CpmAntTokenizer class.

        Returns:
            int: The token ID of the beginning-of-document token.

        Raises:
            None.
        """
        return self.encoder[self.bod_token]

    @property
    def eod_token_id(self):
        """
        This method 'eod_token_id' in the class 'CpmAntTokenizer' retrieves the token ID of the end-of-document token.

        Args:
            self: An instance of the class CpmAntTokenizer.
                It is required as this method is part of the class and needs access to its attributes and methods.

        Returns:
            int: The token ID of the end-of-document token.

        Raises:
            None.
        """
        return self.encoder[self.eod_token]

    @property
    def newline_id(self):
        r"""
        This method, newline_id, in the class CpmAntTokenizer, returns the value associated with the newline character in the encoder.

        Args:
            self (CpmAntTokenizer): The instance of the CpmAntTokenizer class.

        Returns:
            int: The token ID associated with the newline character.

        Raises:
            KeyError: If the newline character `'\n'` is not found in the encoder dictionary, a KeyError is raised.
        """
        return self.encoder["\n"]

    @property
    def vocab_size(self) -> int:
        """
        Returns the size of the vocabulary used by the CpmAntTokenizer instance.

        Args:
            self: The CpmAntTokenizer instance itself.

        Returns:
            int: The number of unique tokens in the vocabulary.

        Raises:
            None.
        """
        return len(self.encoder)

    def get_vocab(self):
        """
        Retrieves the vocabulary of the CpmAntTokenizer instance.

        Args:
            self (CpmAntTokenizer): The instance of CpmAntTokenizer.

        Returns:
            dict: The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

        Raises:
            None.

        Example:
            ```python
            >>> tokenizer = CpmAntTokenizer()
            >>> vocab = tokenizer.get_vocab()
            >>> vocab
            {'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
            ```
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    def _tokenize(self, text):
        """Tokenize a string."""
        output_tokens = []
        for x in jieba.cut(text, cut_all=False):
            output_tokens.extend(self.wordpiece_tokenizer.tokenize(x))
        return output_tokens

    def _decode(self, token_ids, **kwargs):
        """Decode ids into a string."""
        token_ids = [i for i in token_ids if i >= 0]
        token_ids = [
            x for x in token_ids if x not in (self.pad_token_id, self.eos_token_id, self.bos_token_id)
        ]
        return super()._decode(token_ids, **kwargs)

    def check(self, token):
        """
        Check if a token is present in the encoder of the CpmAntTokenizer.

        Args:
            self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
            token (Any): The token to be checked.

        Returns:
            bool: True if the token is present in the encoder, False otherwise.

        Raises:
            None.
        """
        return token in self.encoder

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        """
        Converts a list of tokens into a string representation.

        Args:
            self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
            tokens (List[str]): A list of tokens to be converted into a string representation.

        Returns:
            str: A string representation of the tokens.

        Raises:
            None.

        Note:
            - The tokens should be provided as a list of strings.
            - The method will join the tokens together using an empty string as a separator.

        Example:
            ```python
            >>> tokenizer = CpmAntTokenizer()
            >>> tokens = ['Hello', 'world', '!']
            >>> tokenizer.convert_tokens_to_string(tokens)
            'Helloworld!'
            ```
        """
        return "".join(tokens)

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.encoder.get(token, self.encoder.get(self.unk_token))

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.decoder.get(index, self.unk_token)

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary to a file with the specified directory and filename prefix.

        Args:
            self: Instance of the CpmAntTokenizer class.
            save_directory (str): The directory where the vocabulary file will be saved.
            filename_prefix (Optional[str]): A string to be prefixed to the filename. Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the path to the saved vocabulary file.

        Raises:
            None.
        """
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
            )
        else:
            vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
        index = 0
        if " " in self.encoder:
            self.encoder["</_>"] = self.encoder[" "]
            del self.encoder[" "]
        if "\n" in self.encoder:
            self.encoder["</n>"] = self.encoder["\n"]
            del self.encoder["\n"]
        self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
        with open(vocab_file, "w", encoding="utf-8") as writer:
            for token, token_index in self.encoder.items():
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                        " Please check that the vocabulary is not corrupted!"
                    )
                    index = token_index
                writer.write(token + "\n")
                index += 1
        return (vocab_file,)

    def build_inputs_with_special_tokens(self, token_ids_0: List[int], token_ids_1: List[int] = None) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
        adding special tokens. A CPMAnt sequence has the following format:

        - single sequence: `[BOS] Sequence`.

        Args:
            token_ids_0 (`List[int]`): The first tokenized sequence that special tokens will be added.
            token_ids_1 (`List[int]`): The optional second tokenized sequence that special tokens will be added.

        Returns:
            `List[int]`: The model input with special tokens.
        """
        if token_ids_1 is None:
            return [self.bos_token_id] + token_ids_0
        return [self.bos_token_id] + token_ids_0 + [self.bos_token_id] + token_ids_1

    def get_special_tokens_mask(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`): List of IDs.
            token_ids_1 (`List[int]`, *optional*): Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1))
        return [1] + ([0] * len(token_ids_0))

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.bod_token_id property

Returns the token ID associated with the beginning-of-document token (bod_token).

PARAMETER DESCRIPTION
self

The instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION

int: The token ID of the beginning-of-document token.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.eod_token_id property

Returns the token ID of the end-of-document token (eod_token).

PARAMETER DESCRIPTION
self

An instance of the class CpmAntTokenizer. It is required as this method is part of the class and needs access to its attributes and methods.

RETURNS DESCRIPTION
int

The token ID of the end-of-document token.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.newline_id property

Returns the token ID associated with the newline character in the encoder.

PARAMETER DESCRIPTION
self

The instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION

int: The token ID associated with the newline character.

RAISES DESCRIPTION
KeyError

If the newline character '\n' is not found in the encoder dictionary, a KeyError is raised.

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.vocab_size: int property

Returns the size of the vocabulary used by the CpmAntTokenizer instance.

PARAMETER DESCRIPTION
self

The CpmAntTokenizer instance itself.

RETURNS DESCRIPTION
int

The number of unique tokens in the vocabulary.

TYPE: int

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.__init__(vocab_file, bod_token='<d>', eod_token='</d>', bos_token='<s>', eos_token='</s>', pad_token='<pad>', unk_token='<unk>', line_token='</n>', space_token='</_>', padding_side='left', **kwargs)

Initialize a CpmAntTokenizer object with the provided parameters.

PARAMETER DESCRIPTION
vocab_file

The path to the vocabulary file to load.

TYPE: str

bod_token

Beginning of document token (default is '<d>').

TYPE: str DEFAULT: '<d>'

eod_token

End of document token (default is '</d>').

TYPE: str DEFAULT: '</d>'

bos_token

Beginning of sentence token (default is '<s>').

TYPE: str DEFAULT: '<s>'

eos_token

End of sentence token (default is '</s>').

TYPE: str DEFAULT: '</s>'

pad_token

Padding token (default is '<pad>').

TYPE: str DEFAULT: '<pad>'

unk_token

Token for unknown words (default is '<unk>').

TYPE: str DEFAULT: '<unk>'

line_token

Line break token (default is '</n>').

TYPE: str DEFAULT: '</n>'

space_token

Space token (default is '</_>').

TYPE: str DEFAULT: '</_>'

padding_side

Side for padding (default is 'left').

TYPE: str DEFAULT: 'left'

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
MissingBackendError

If required backend 'jieba' is not available.

FileNotFoundError

If the specified 'vocab_file' does not exist.

KeyError

If 'space_token' or 'line_token' are missing in the loaded vocabulary.

Exception

Any other unforeseen error that may occur during initialization.

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def __init__(
    self,
    vocab_file,
    bod_token="<d>",
    eod_token="</d>",
    bos_token="<s>",
    eos_token="</s>",
    pad_token="<pad>",
    unk_token="<unk>",
    line_token="</n>",
    space_token="</_>",
    padding_side="left",
    **kwargs,
):
    """
    Initialize a CpmAntTokenizer object with the provided parameters.

    Args:
        vocab_file (str): The path to the vocabulary file to load.
        bod_token (str, optional): Beginning of document token (default is '<d>').
        eod_token (str, optional): End of document token (default is '</d>').
        bos_token (str, optional): Beginning of sentence token (default is '<s>').
        eos_token (str, optional): End of sentence token (default is '</s>').
        pad_token (str, optional): Padding token (default is '<pad>').
        unk_token (str, optional): Token for unknown words (default is '<unk>').
        line_token (str, optional): Line break token (default is '</n>').
        space_token (str, optional): Space token (default is '</_>').
        padding_side (str, optional): Side for padding (default is 'left').

    Returns:
        None.

    Raises:
        MissingBackendError: If required backend 'jieba' is not available.
        FileNotFoundError: If the specified 'vocab_file' does not exist.
        KeyError: If 'space_token' or 'line_token' are missing in the loaded vocabulary.
        Exception: Any other unforeseen error that may occur during initialization.
    """
    requires_backends(self, ["jieba"])
    self.bod_token = bod_token
    self.eod_token = eod_token
    self.encoder = load_vocab(vocab_file)
    self.encoder[" "] = self.encoder[space_token]
    self.encoder["\n"] = self.encoder[line_token]

    del self.encoder[space_token]
    del self.encoder[line_token]

    self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
    self.decoder = {v: k for k, v in self.encoder.items()}

    self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.encoder, unk_token=unk_token)

    super().__init__(
        bod_token=bod_token,
        eod_token=eod_token,
        bos_token=bos_token,
        eos_token=eos_token,
        pad_token=pad_token,
        unk_token=unk_token,
        line_token=line_token,
        space_token=space_token,
        padding_side=padding_side,
        **kwargs,
    )

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A CPMAnt sequence has the following format:

  • single sequence: [BOS] Sequence.
PARAMETER DESCRIPTION
token_ids_0

The first tokenized sequence to which special tokens will be added.

TYPE: `List[int]`

token_ids_1

The optional second tokenized sequence to which special tokens will be added.

TYPE: `List[int]` DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: The model input with special tokens.
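To illustrate the format implemented below (the BOS id shown is a placeholder, not the real vocabulary id):

>>> # if bos_token_id were 6:
>>> # build_inputs_with_special_tokens([1, 2, 3])          -> [6, 1, 2, 3]
>>> # build_inputs_with_special_tokens([1, 2, 3], [4, 5])  -> [6, 1, 2, 3, 6, 4, 5]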

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def build_inputs_with_special_tokens(self, token_ids_0: List[int], token_ids_1: List[int] = None) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
    adding special tokens. A CPMAnt sequence has the following format:

    - single sequence: `[BOS] Sequence`.

    Args:
        token_ids_0 (`List[int]`): The first tokenized sequence that special tokens will be added.
        token_ids_1 (`List[int]`): The optional second tokenized sequence that special tokens will be added.

    Returns:
        `List[int]`: The model input with special tokens.
    """
    if token_ids_1 is None:
        return [self.bos_token_id] + token_ids_0
    return [self.bos_token_id] + token_ids_0 + [self.bos_token_id] + token_ids_1

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.check(token)

Check if a token is present in the encoder of the CpmAntTokenizer.

PARAMETER DESCRIPTION
self

An instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

token

The token to be checked.

TYPE: Any

RETURNS DESCRIPTION

bool: True if the token is present in the encoder, False otherwise.

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def check(self, token):
    """
    Check if a token is present in the encoder of the CpmAntTokenizer.

    Args:
        self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
        token (Any): The token to be checked.

    Returns:
        bool: True if the token is present in the encoder, False otherwise.

    Raises:
        None.
    """
    return token in self.encoder

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.convert_tokens_to_string(tokens)

Converts a list of tokens into a string representation.

PARAMETER DESCRIPTION
self

An instance of the CpmAntTokenizer class.

TYPE: CpmAntTokenizer

tokens

A list of tokens to be converted into a string representation.

TYPE: List[str]

RETURNS DESCRIPTION
str

A string representation of the tokens.

TYPE: str

Note
  • The tokens should be provided as a list of strings.
  • The method will join the tokens together using an empty string as a separator.
Example
>>> tokenizer = CpmAntTokenizer()
>>> tokens = ['Hello', 'world', '!']
>>> tokenizer.convert_tokens_to_string(tokens)
'Helloworld!'
Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def convert_tokens_to_string(self, tokens: List[str]) -> str:
    """
    Converts a list of tokens into a string representation.

    Args:
        self (CpmAntTokenizer): An instance of the CpmAntTokenizer class.
        tokens (List[str]): A list of tokens to be converted into a string representation.

    Returns:
        str: A string representation of the tokens.

    Raises:
        None.

    Note:
        - The tokens should be provided as a list of strings.
        - The method will join the tokens together using an empty string as a separator.

    Example:
        ```python
        >>> tokenizer = CpmAntTokenizer()
        >>> tokens = ['Hello', 'world', '!']
        >>> tokenizer.convert_tokens_to_string(tokens)
        'Helloworld!'
        ```
    """
    return "".join(tokens)

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

PARAMETER DESCRIPTION
token_ids_0

List of IDs.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

already_has_special_tokens

Whether or not the token list is already formatted with special tokens for the model.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

RETURNS DESCRIPTION
List[int]

List[int]: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
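A small illustration of the mask layout produced by the code below (ids are arbitrary, no special tokens already present):

>>> # get_special_tokens_mask([1, 2, 3])          -> [1, 0, 0, 0]
>>> # get_special_tokens_mask([1, 2, 3], [4, 5])  -> [1, 0, 0, 0, 1, 0, 0]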

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def get_special_tokens_mask(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]:
    """
    Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `prepare_for_model` method.

    Args:
        token_ids_0 (`List[int]`): List of IDs.
        token_ids_1 (`List[int]`, *optional*): Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the token list is already formatted with special tokens for the model.

    Returns:
        `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """
    if already_has_special_tokens:
        return super().get_special_tokens_mask(
            token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
        )

    if token_ids_1 is not None:
        return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1))
    return [1] + ([0] * len(token_ids_0))

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.get_vocab()

Retrieves the vocabulary of the CpmAntTokenizer instance.

PARAMETER DESCRIPTION
self

The instance of CpmAntTokenizer.

TYPE: CpmAntTokenizer

RETURNS DESCRIPTION
dict

The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

Example
>>> tokenizer = CpmAntTokenizer()
>>> vocab = tokenizer.get_vocab()
>>> vocab
{'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def get_vocab(self):
    """
    Retrieves the vocabulary of the CpmAntTokenizer instance.

    Args:
        self (CpmAntTokenizer): The instance of CpmAntTokenizer.

    Returns:
        dict: The vocabulary of the tokenizer, which is a dictionary mapping tokens to their corresponding IDs.

    Raises:
        None.

    Example:
        ```python
        >>> tokenizer = CpmAntTokenizer()
        >>> vocab = tokenizer.get_vocab()
        >>> vocab
        {'<pad>': 0, '<unk>': 1, '<s>': 2, '</s>': 3, ...}
        ```
    """
    return dict(self.encoder, **self.added_tokens_encoder)

mindnlp.transformers.models.cpmant.tokenization_cpmant.CpmAntTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary to a file with the specified directory and filename prefix.

PARAMETER DESCRIPTION
self

Instance of the CpmAntTokenizer class.

save_directory

The directory where the vocabulary file will be saved.

TYPE: str

filename_prefix

A string to be prefixed to the filename. Defaults to None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the path to the saved vocabulary file.
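A brief usage sketch (the paths are hypothetical; the exact filename is taken from VOCAB_FILES_NAMES["vocab_file"]):

>>> saved_files = tokenizer.save_vocabulary("./cpmant_tokenizer", filename_prefix="my")
>>> saved_files   # one-element tuple with the written vocabulary path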

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary to a file with the specified directory and filename prefix.

    Args:
        self: Instance of the CpmAntTokenizer class.
        save_directory (str): The directory where the vocabulary file will be saved.
        filename_prefix (Optional[str]): A string to be prefixed to the filename. Defaults to None.

    Returns:
        Tuple[str]: A tuple containing the path to the saved vocabulary file.

    Raises:
        None.
    """
    if os.path.isdir(save_directory):
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
    else:
        vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
    index = 0
    if " " in self.encoder:
        self.encoder["</_>"] = self.encoder[" "]
        del self.encoder[" "]
    if "\n" in self.encoder:
        self.encoder["</n>"] = self.encoder["\n"]
        del self.encoder["\n"]
    self.encoder = collections.OrderedDict(sorted(self.encoder.items(), key=lambda x: x[1]))
    with open(vocab_file, "w", encoding="utf-8") as writer:
        for token, token_index in self.encoder.items():
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                    " Please check that the vocabulary is not corrupted!"
                )
                index = token_index
            writer.write(token + "\n")
            index += 1
    return (vocab_file,)

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer

The WordpieceTokenizer class represents a tokenizer that tokenizes input text into subword tokens using the WordPiece algorithm.

ATTRIBUTE DESCRIPTION
vocab

A dictionary containing the vocabulary of subword tokens.

TYPE: dict

unk_token

The token to be used for out-of-vocabulary or unknown words.

TYPE: str

max_input_chars_per_word

The maximum number of input characters per word for tokenization.

TYPE: int

METHOD DESCRIPTION
tokenize

Tokenizes the input token into subword tokens using the WordPiece algorithm and the specified vocabulary.

Example
>>> vocab = {'hello': 0, 'world': 1}  # illustrative token-to-id mapping
>>> tokenizer = WordpieceTokenizer(vocab, '<unk>', 200)
>>> tokenized_text = tokenizer.tokenize('helloworld')
Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
class WordpieceTokenizer:

    """
    The WordpieceTokenizer class represents a tokenizer that tokenizes input text into subword tokens using the WordPiece algorithm.

    Attributes:
        vocab (dict): A dictionary containing the vocabulary of subword tokens.
        unk_token (str): The token to be used for out-of-vocabulary or unknown words.
        max_input_chars_per_word (int): The maximum number of input characters per word for tokenization.

    Methods:
        tokenize(token):
            Tokenizes the input token into subword tokens using the WordPiece algorithm and the specified vocabulary.

    Example:
        ```python
        >>> vocab = {'hello': 0, 'world': 1}  # illustrative token-to-id mapping
        >>> tokenizer = WordpieceTokenizer(vocab, '<unk>', 200)
        >>> tokenized_text = tokenizer.tokenize('helloworld')
        ```
    """
    def __init__(self, vocab, unk_token="<unk>", max_input_chars_per_word=200):
        """
        Initializes a new instance of the WordpieceTokenizer class.

        Args:
            self (WordpieceTokenizer): The current instance of the WordpieceTokenizer class.
            vocab (dict): A dictionary mapping tokens to their ids, used as the tokenizer vocabulary.
            unk_token (str, optional): The token to use for unknown words. Defaults to '<unk>'.
            max_input_chars_per_word (int, optional): The maximum number of characters allowed per word. Defaults to 200.

        Returns:
            None

        Raises:
            None.

        This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word.
        The vocabulary is a dictionary mapping tokens to ids; only key membership is used during tokenization.
        The unk_token parameter allows customization of the token used to represent unknown words. If not provided, it defaults to '<unk>'.
        The max_input_chars_per_word parameter limits the number of characters allowed per word.
        If a word exceeds this limit, it is replaced with the unknown token.

        Example:
            ```python
            >>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
            ```
        """
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, token):
        """
        This method tokenizes a given input token into sub-tokens based on the vocabulary of the WordpieceTokenizer class.

        Args:
            self (WordpieceTokenizer): The instance of the WordpieceTokenizer class.
                It is used to access the vocabulary and maximum input characters per word.
            token (str): The input token to be tokenized.
                It represents the word to be broken down into sub-tokens.
                Must be a string.

        Returns:
            list: A list of sub-tokens generated from the input token based on the vocabulary.
                If the length of the input token exceeds the maximum allowed characters per word,
                it returns a list containing the unknown token (unk_token).
                Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.

        Raises:
            None
        """
        chars = list(token)
        if len(chars) > self.max_input_chars_per_word:
            return [self.unk_token]

        start = 0
        sub_tokens = []
        while start < len(chars):
            end = len(chars)
            cur_substr = None
            while start < end:
                substr = "".join(chars[start:end])
                if substr in self.vocab:
                    cur_substr = substr
                    break
                end -= 1
            if cur_substr is None:
                sub_tokens.append(self.unk_token)
                start += 1
            else:
                sub_tokens.append(cur_substr)
                start = end

        return sub_tokens

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer.__init__(vocab, unk_token='<unk>', max_input_chars_per_word=200)

Initializes a new instance of the WordpieceTokenizer class.

PARAMETER DESCRIPTION
self

The current instance of the WordpieceTokenizer class.

TYPE: WordpieceTokenizer

vocab

A dictionary mapping tokens to their ids, used as the tokenizer vocabulary.

TYPE: dict

unk_token

The token to use for unknown words. Defaults to '<unk>'.

TYPE: str DEFAULT: '<unk>'

max_input_chars_per_word

The maximum number of characters allowed per word. Defaults to 200.

TYPE: int DEFAULT: 200

RETURNS DESCRIPTION

None

This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word. The vocabulary is a dictionary mapping tokens to ids; only key membership is used during tokenization. The unk_token parameter allows customization of the token used to represent unknown words; if not provided, it defaults to '<unk>'. The max_input_chars_per_word parameter limits the number of characters allowed per word. If a word exceeds this limit, it is replaced with the unknown token.

Example
>>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def __init__(self, vocab, unk_token="<unk>", max_input_chars_per_word=200):
    """
    Initializes a new instance of the WordpieceTokenizer class.

    Args:
        self (WordpieceTokenizer): The current instance of the WordpieceTokenizer class.
        vocab (dict): A dictionary mapping tokens to their ids, used as the tokenizer vocabulary.
        unk_token (str, optional): The token to use for unknown words. Defaults to '<unk>'.
        max_input_chars_per_word (int, optional): The maximum number of characters allowed per word. Defaults to 200.

    Returns:
        None

    Raises:
        None.

    This method initializes the WordpieceTokenizer object with the provided vocabulary, unknown token, and maximum input characters per word.
    The vocabulary is a dictionary mapping tokens to ids; only key membership is used during tokenization.
    The unk_token parameter allows customization of the token used to represent unknown words. If not provided, it defaults to '<unk>'.
    The max_input_chars_per_word parameter limits the number of characters allowed per word.
    If a word exceeds this limit, it is replaced with the unknown token.

    Example:
        ```python
        >>> tokenizer = WordpieceTokenizer(vocab=['hello', 'world'], unk_token='<unk>', max_input_chars_per_word=200)
        ```
    """
    self.vocab = vocab
    self.unk_token = unk_token
    self.max_input_chars_per_word = max_input_chars_per_word

mindnlp.transformers.models.cpmant.tokenization_cpmant.WordpieceTokenizer.tokenize(token)

This method tokenizes a given input token into sub-tokens based on the vocabulary of the WordpieceTokenizer class.

PARAMETER DESCRIPTION
self

The instance of the WordpieceTokenizer class. It is used to access the vocabulary and maximum input characters per word.

TYPE: WordpieceTokenizer

token

The input token to be tokenized. It represents the word to be broken down into sub-tokens. Must be a string.

TYPE: str

RETURNS DESCRIPTION
list

A list of sub-tokens generated from the input token based on the vocabulary. If the length of the input token exceeds the maximum allowed characters per word, it returns a list containing the unknown token (unk_token). Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.
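A worked illustration of the greedy longest-match-first behaviour implemented below; the vocabulary here is a made-up token-to-id mapping, since only key membership matters:

>>> vocab = {"un": 0, "believ": 1, "able": 2}
>>> tok = WordpieceTokenizer(vocab, unk_token="<unk>")
>>> tok.tokenize("unbelievable")
['un', 'believ', 'able']
>>> tok.tokenize("xun")   # no prefix of "xun" matches, so the first character becomes <unk>
['<unk>', 'un']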

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def tokenize(self, token):
    """
    This method tokenizes a given input token into sub-tokens based on the vocabulary of the WordpieceTokenizer class.

    Args:
        self (WordpieceTokenizer): The instance of the WordpieceTokenizer class.
            It is used to access the vocabulary and maximum input characters per word.
        token (str): The input token to be tokenized.
            It represents the word to be broken down into sub-tokens.
            Must be a string.

    Returns:
        list: A list of sub-tokens generated from the input token based on the vocabulary.
            If the length of the input token exceeds the maximum allowed characters per word,
            it returns a list containing the unknown token (unk_token).
            Otherwise, it returns a list of sub-tokens that are part of the vocabulary or the unknown token.

    Raises:
        None
    """
    chars = list(token)
    if len(chars) > self.max_input_chars_per_word:
        return [self.unk_token]

    start = 0
    sub_tokens = []
    while start < len(chars):
        end = len(chars)
        cur_substr = None
        while start < end:
            substr = "".join(chars[start:end])
            if substr in self.vocab:
                cur_substr = substr
                break
            end -= 1
        if cur_substr is None:
            sub_tokens.append(self.unk_token)
            start += 1
        else:
            sub_tokens.append(cur_substr)
            start = end

    return sub_tokens

mindnlp.transformers.models.cpmant.tokenization_cpmant.load_vocab(vocab_file)

Loads a vocabulary file into a dictionary.

Source code in mindnlp/transformers/models/cpmant/tokenization_cpmant.py
def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        token = token.rstrip("\n")
        vocab[token] = index
    return vocab
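The expected file format follows from the code: one token per line, with the (zero-based) line number becoming the token id. A hypothetical example, consistent with the vocabulary shown in get_vocab above:

>>> # vocab.txt:
>>> #   <pad>
>>> #   <unk>
>>> #   <s>
>>> vocab = load_vocab("vocab.txt")
>>> vocab["<unk>"]
1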

mindnlp.transformers.models.cpmant.modeling_cpmant

MindSpore CPMAnt

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntAttention

Bases: Module

This class represents the CpmAntAttention module, which is a component of the CpmAnt model. It performs the self-attention mechanism in the transformer block.

The CpmAntAttention module inherits from the nn.Module class and initializes with a config object of type CpmAntConfig.

ATTRIBUTE DESCRIPTION
dim_model

The hidden size of the model.

TYPE: int

num_heads

The number of attention heads.

TYPE: int

dim_head

The dimension of each attention head.

TYPE: int

project_q

The linear transformation layer for query projection.

TYPE: Linear

project_k

The linear transformation layer for key projection.

TYPE: Linear

project_v

The linear transformation layer for value projection.

TYPE: Linear

attention_out

The linear transformation layer for output projection.

TYPE: Linear

softmax

The softmax activation function for attention scores.

TYPE: Softmax

dropout

The dropout layer, if configured.

TYPE: Dropout

METHOD DESCRIPTION
forward

Constructs the self-attention block of the transformer.

Args:

  • hidden_q (mindspore.Tensor): The input tensor for the self-attention block.
  • hidden_kv (mindspore.Tensor): The tensor for key-value projection.
  • attention_mask (mindspore.Tensor): The mask tensor to avoid invalid areas in self-attention.
  • position_bias (mindspore.Tensor): The positional information tensor for self-attention.
  • output_attentions (bool, optional): Whether or not to return the attentions tensors of all attention layers.
  • past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor], optional): Cached past key and value projection states.
  • use_cache (bool, optional): Whether to use cached key-value states to speed up decoding.

Returns:

  • score (mindspore.Tensor): The output attention score tensor.
  • attn_weights (mindspore.Tensor): The attention weights tensor, if output_attentions is set to True.
  • past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor]): The cached key-value states, if use_cache is set to True.
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntAttention(nn.Module):

    """
    This class represents the CpmAntAttention module, which is a component of the CpmAnt model.
    It performs the self-attention mechanism in the transformer block.

    The CpmAntAttention module inherits from the nn.Module class and initializes with a config object of type CpmAntConfig.

    Attributes:
        dim_model (int): The hidden size of the model.
        num_heads (int): The number of attention heads.
        dim_head (int): The dimension of each attention head.
        project_q (nn.Linear): The linear transformation layer for query projection.
        project_k (nn.Linear): The linear transformation layer for key projection.
        project_v (nn.Linear): The linear transformation layer for value projection.
        attention_out (nn.Linear): The linear transformation layer for output projection.
        softmax (nn.Softmax): The softmax activation function for attention scores.
        dropout (nn.Dropout): The dropout layer, if configured.

    Methods:
        forward(hidden_q, hidden_kv, attention_mask, position_bias, output_attentions, past_key_values, use_cache):
            Constructs the self-attention block of the transformer.

            Args:

            - hidden_q (mindspore.Tensor): The input tensor for the self-attention block.
            - hidden_kv (mindspore.Tensor): The tensor for key-value projection.
            - attention_mask (mindspore.Tensor): The mask tensor to avoid invalid areas in self-attention.
            - position_bias (mindspore.Tensor): The positional information tensor for self-attention.
            - output_attentions (bool, optional): Whether or not to return the attentions tensors of all attention layers.
            - past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor], optional): Cached past key and value projection states.
            - use_cache (bool, optional): Whether to use cached key-value states to speed up decoding.

            Returns:

            - score (mindspore.Tensor): The output attention score tensor.
            - attn_weights (mindspore.Tensor): The attention weights tensor, if output_attentions is set to True.
            - past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor]): The cached key-value states, if use_cache is set to True.
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes an instance of CpmAntAttention.

        Args:
            self: The instance of the class.
            config (CpmAntConfig):
                An instance of CpmAntConfig containing configuration parameters.

                - hidden_size (int): The dimension size of the model.
                - num_attention_heads (int): The number of attention heads.
                - dim_head (int): The dimension of each attention head.
                - dropout_p (float, optional): The dropout probability. Default is None.

        Returns:
            None: This method initializes the CpmAntAttention instance with the provided configuration parameters.

        Raises:
            None.
        """
        super().__init__()
        self.dim_model = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.dim_head = config.dim_head

        self.project_q = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
        self.project_k = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
        self.project_v = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)

        self.attention_out = nn.Linear(self.num_heads * self.dim_head, self.dim_model, bias=False)

        self.softmax = nn.Softmax(axis=-1)

        if config.dropout_p is not None:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_q: mindspore.Tensor,
        hidden_kv: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_q (`mindspore.Tensor`):
                Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
            hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
                Tensor from which the *key* and *value* projections are computed, of shape `(batch, len_k, dim_model)`.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Avoid invalid areas to participate in the calculation of self-attention.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Provide positional information to self-attention block.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        batch_size = hidden_q.shape[0]
        len_q = hidden_q.shape[1]
        len_k = hidden_kv.shape[1]

        query = self.project_q(hidden_q)
        key = self.project_k(hidden_kv)
        value = self.project_v(hidden_kv)

        query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
        value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

        if past_key_values is not None:
            key = ops.cat([past_key_values[0], key], axis=-2)
            value = ops.cat([past_key_values[1], value], axis=-2)
            len_k = key.shape[-2]

        # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
        score = ops.matmul(query, key.swapaxes(-1, -2)) / math.sqrt(self.dim_head)
        score = score + position_bias

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.Tensor(False),
            ops.scalar_to_tensor(float("-inf"), dtype=score.dtype),
        )
        score = self.softmax(score)

        score = ops.masked_fill(
            score,
            attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.Tensor(False),
            ops.scalar_to_tensor(0, dtype=score.dtype),
        )
        if output_attentions:
            attn_weights = score
        else:
            attn_weights = None

        if self.dropout is not None:
            score = self.dropout(score)

        # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
        score = ops.matmul(score, value)

        score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
        score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

        score = self.attention_out(score)

        past_key_values = None
        if use_cache:
            past_key_values = (key, value)

        return score, attn_weights, past_key_values
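
For a concrete feel of the tensor shapes, the sketch below runs a deliberately tiny attention module on random inputs. It assumes mindspore and mindnlp are installed; the small configuration values, the all-true mask and the zero position bias are illustrative only (in the full model they are produced by CpmAntModel).

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntAttention
...
>>> config = CpmAntConfig(hidden_size=64, num_attention_heads=4, dim_head=16)
>>> attn = CpmAntAttention(config)
>>> batch, seq = 2, 8
>>> hidden = mindspore.Tensor(np.random.randn(batch, seq, 64), dtype=mindspore.float32)
>>> mask = mindspore.Tensor(np.ones((batch, seq, seq), dtype=bool))    # attend everywhere
>>> bias = mindspore.Tensor(np.zeros((batch, 4, seq, seq)), dtype=mindspore.float32)
>>> score, attn_weights, cache = attn.forward(hidden, hidden, mask, bias, use_cache=True)
>>> score.shape, cache[0].shape    # (batch, len_q, hidden_size), (batch, num_heads, len_k, dim_head)
((2, 8, 64), (2, 4, 8, 16))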

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntAttention.__init__(config)

Initializes an instance of CpmAntAttention.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of CpmAntConfig containing configuration parameters.

  • hidden_size (int): The dimension size of the model.
  • num_attention_heads (int): The number of attention heads.
  • dim_head (int): The dimension of each attention head.
  • dropout_p (float, optional): The dropout probability. Default is None.

TYPE: CpmAntConfig

RETURNS DESCRIPTION
None

This method initializes the CpmAntAttention instance with the provided configuration parameters.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes an instance of CpmAntAttention.

    Args:
        self: The instance of the class.
        config (CpmAntConfig):
            An instance of CpmAntConfig containing configuration parameters.

            - hidden_size (int): The dimension size of the model.
            - num_attention_heads (int): The number of attention heads.
            - dim_head (int): The dimension of each attention head.
            - dropout_p (float, optional): The dropout probability. Default is None.

    Returns:
        None: This method initializes the CpmAntAttention instance with the provided configuration parameters.

    Raises:
        None.
    """
    super().__init__()
    self.dim_model = config.hidden_size
    self.num_heads = config.num_attention_heads
    self.dim_head = config.dim_head

    self.project_q = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
    self.project_k = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)
    self.project_v = nn.Linear(self.dim_model, self.num_heads * self.dim_head, bias=False)

    self.attention_out = nn.Linear(self.num_heads * self.dim_head, self.dim_model, bias=False)

    self.softmax = nn.Softmax(axis=-1)

    if config.dropout_p is not None:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntAttention.forward(hidden_q, hidden_kv, attention_mask, position_bias, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_q

Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.

TYPE: `mindspore.Tensor`

hidden_kv

Tensor from which the key and value projections are computed, of shape (batch, len_k, dim_model)

TYPE: `mindspore.Tensor` of shape `(batch, len_k, dim_model)`

attention_mask

Avoid invalid areas to participate in the calculation of self-attention.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

position_bias

Provide positional information to self-attention block.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states.

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    hidden_q: mindspore.Tensor,
    hidden_kv: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_q (`mindspore.Tensor`):
            Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
        hidden_kv (`mindspore.Tensor` of shape `(batch, len_k, dim_model)`):
            Tensor from which the *key* and *value* projections are computed, of shape `(batch, len_k, dim_model)`.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Avoid invalid areas to participate in the calculation of self-attention.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Provide positional information to self-attention block.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor]`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    batch_size = hidden_q.shape[0]
    len_q = hidden_q.shape[1]
    len_k = hidden_kv.shape[1]

    query = self.project_q(hidden_q)
    key = self.project_k(hidden_kv)
    value = self.project_v(hidden_kv)

    query = query.view(batch_size, len_q, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    key = key.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)
    value = value.view(batch_size, len_k, self.num_heads, self.dim_head).permute(0, 2, 1, 3)

    if past_key_values is not None:
        key = ops.cat([past_key_values[0], key], axis=-2)
        value = ops.cat([past_key_values[1], value], axis=-2)
        len_k = key.shape[-2]

    # (batch_size, num_heads, len_q, dim_head) @ (batch_size, num_heads, dim_head, len_k) -> (batch_size, num_heads, len_q, len_k)
    score = ops.matmul(query, key.swapaxes(-1, -2)) / math.sqrt(self.dim_head)
    score = score + position_bias

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.Tensor(False),
        ops.scalar_to_tensor(float("-inf"), dtype=score.dtype),
    )
    score = self.softmax(score)

    score = ops.masked_fill(
        score,
        attention_mask.view(batch_size, 1, len_q, len_k) == mindspore.Tensor(False),
        ops.scalar_to_tensor(0, dtype=score.dtype),
    )
    if output_attentions:
        attn_weights = score
    else:
        attn_weights = None

    if self.dropout is not None:
        score = self.dropout(score)

    # (batch_size, num_heads, len_q, len_k) @ (batch_size, num_heads, len_k, dim_head) -> (batch_size, num_heads, len_q, dim_head)
    score = ops.matmul(score, value)

    score = score.view(batch_size, self.num_heads, len_q, self.dim_head).permute(0, 2, 1, 3)
    score = score.view(batch_size, len_q, self.num_heads * self.dim_head)

    score = self.attention_out(score)

    past_key_values = None
    if use_cache:
        past_key_values = (key, value)

    return score, attn_weights, past_key_values

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntDenseGatedACT

Bases: Module

A class representing a dense gated activation layer for neural networks in the CPM-ANT model.

This class inherits from nn.Module and provides functionality to transform an input tensor from one feature space to another via a nonlinear operation. The transformation is performed using two dense layers with gated activation.

ATTRIBUTE DESCRIPTION
w_0

The first dense layer for the transformation.

TYPE: Linear

w_1

The second dense layer for the transformation.

TYPE: Linear

act

The activation function to apply.

TYPE: GELU

METHOD DESCRIPTION
__init__

Initializes the CpmAntDenseGatedACT instance.

forward

Transforms an input tensor using the dense gated activation.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntDenseGatedACT(nn.Module):

    """
    A class representing a dense gated activation layer for neural networks in the CPM-ANT model.

    This class inherits from nn.Module and provides functionality to transform an input tensor from one feature space to another via a nonlinear operation. The transformation is performed using two dense layers
    with gated activation.

    Attributes:
        w_0 (nn.Linear): The first dense layer for the transformation.
        w_1 (nn.Linear): The second dense layer for the transformation.
        act (nn.GELU): The activation function to apply.

    Methods:
        __init__: Initializes the CpmAntDenseGatedACT instance.
        forward: Transforms an input tensor using the dense gated activation.

    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes an instance of the CpmAntDenseGatedACT class.

        Args:
            self: The object instance.
            config (CpmAntConfig):
                The configuration object that contains the required parameters for initialization.

                - `hidden_size` (int): The size of the hidden layer.
                - `dim_ff` (int): The dimension of the feed-forward layer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.w_0 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
        self.w_1 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
        self.act = nn.GELU()

    def forward(self, hidden_states: mindspore.Tensor):
        """Transform an input tensor from one feature space to another via a nonlinear operation

        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        gate_score = self.act(self.w_0(hidden_states))
        hidden_states = self.w_1(hidden_states)

        hidden_states = gate_score * hidden_states
        return hidden_states
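
In other words, the block computes GELU(w_0(x)) * w_1(x), widening the representation from hidden_size to dim_ff. A minimal sketch with deliberately small, illustrative sizes:

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntDenseGatedACT
...
>>> config = CpmAntConfig(hidden_size=32, dim_ff=64)
>>> act = CpmAntDenseGatedACT(config)
>>> x = mindspore.Tensor(np.random.randn(2, 5, 32), dtype=mindspore.float32)
>>> act.forward(x).shape    # widened from hidden_size=32 to dim_ff=64
(2, 5, 64)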

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntDenseGatedACT.__init__(config)

Initializes an instance of the CpmAntDenseGatedACT class.

PARAMETER DESCRIPTION
self

The object instance.

config

The configuration object that contains the required parameters for initialization.

  • hidden_size (int): The size of the hidden layer.
  • dim_ff (int): The dimension of the feed-forward layer.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes an instance of the CpmAntDenseGatedACT class.

    Args:
        self: The object instance.
        config (CpmAntConfig):
            The configuration object that contains the required parameters for initialization.

            - `hidden_size` (int): The size of the hidden layer.
            - `dim_ff` (int): The dimension of the feed-forward layer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.w_0 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
    self.w_1 = nn.Linear(config.hidden_size, config.dim_ff, bias=False)
    self.act = nn.GELU()

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntDenseGatedACT.forward(hidden_states)

Transform an input tensor from one feature space to another via a nonlinear operation

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor):
    """Transform an input tensor from one feature space to another via a nonlinear operation

    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    gate_score = self.act(self.w_0(hidden_states))
    hidden_states = self.w_1(hidden_states)

    hidden_states = gate_score * hidden_states
    return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntEncoder

Bases: Module

The CpmAntEncoder class represents a transformer encoder for the CpmAntConfig model. It inherits from nn.Module and contains methods for initializing the encoder and forwarding the encoder layers.

The init method initializes the CpmAntEncoder with the provided CpmAntConfig, setting the number of layers and creating a list of transformer blocks for the encoder.

The forward method takes input hidden_states, attention_mask, position_bias, and optional parameters to perform the encoding process. It iterates through the encoder layers, applying the attention mechanism and caching key and value projection states if specified. The method returns the final hidden_states, current_key_values, hidden_states of all layers, and attention weights of all layers as per the specified optional outputs.

PARAMETER DESCRIPTION
hidden_states

Input to the layer of shape (batch, seq_len, dim_model)

TYPE: Tensor

attention_mask

Avoid invalid areas to participate in the calculation of shape (batch, seq_len, seq_len)

TYPE: Tensor

position_bias

Provides position information to attention mechanism of shape (num_heads, seq_len, seq_len)

TYPE: Tensor

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: bool

output_hidden_states

Whether or not to return the hidden states of all layers.

TYPE: bool

past_key_values

Cached past key and value projection states

TYPE: Tuple[Tensor, Tensor]

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: bool

RETURNS DESCRIPTION
tuple

Tuple of mindspore.Tensor, Tuple of mindspore.Tensor, Optional[Tuple[mindspore.Tensor]], Optional[Tuple[mindspore.Tensor]]:

  • hidden_states: Final hidden states of the encoder
  • current_key_values: Current key and value projection states
  • all_hidden_states: Hidden states of all layers (if output_hidden_states is True)
  • all_self_attns: Attention weights of all layers (if output_attentions is True)
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntEncoder(nn.Module):

    """
    The CpmAntEncoder class represents a transformer encoder for the CpmAntConfig model.
    It inherits from nn.Module and contains methods for initializing the encoder and forwarding the encoder layers.

    The __init__ method initializes the CpmAntEncoder with the provided CpmAntConfig,
    setting the number of layers and creating a list of transformer blocks for the encoder.

    The forward method takes input hidden_states, attention_mask, position_bias, and optional parameters
    to perform the encoding process. It iterates through the encoder layers, applying the attention
    mechanism and caching key and value projection states if specified.
    The method returns the final hidden_states, current_key_values, hidden_states of all layers, and attention weights
    of all layers as per the specified optional outputs.

    Args:
        hidden_states (mindspore.Tensor):
            Input to the layer of shape (batch, seq_len, dim_model)
        attention_mask (mindspore.Tensor):
            Avoid invalid areas to participate in the calculation of shape (batch, seq_len, seq_len)
        position_bias (mindspore.Tensor):
            Provides position information to attention mechanism of shape (num_heads, seq_len, seq_len)
        output_attentions (bool, optional):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (bool, optional):
            Whether or not to return the hidden states of all layers.
        past_key_values (Tuple[mindspore.Tensor, mindspore.Tensor], optional):
            Cached past key and value projection states
        use_cache (bool, optional):
            If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

    Returns:
        tuple:
            Tuple of mindspore.Tensor, Tuple of mindspore.Tensor, Optional[Tuple[mindspore.Tensor]],
            Optional[Tuple[mindspore.Tensor]]:

            - hidden_states: Final hidden states of the encoder
            - current_key_values: Current key and value projection states
            - all_hidden_states: Hidden states of all layers (if output_hidden_states is True)
            - all_self_attns: Attention weights of all layers (if output_attentions is True)
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes a new instance of the CpmAntEncoder class.

        Args:
            self: The instance of the class.
            config (CpmAntConfig):
                The configuration object for the encoder.

                - num_hidden_layers (int): The number of hidden layers.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.num_layers = config.num_hidden_layers
        self.layers = nn.ModuleList([CpmAntTransformerBlock(config) for ith in range(self.num_layers)])

        self.output_layernorm = CpmAntLayerNorm(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: mindspore.Tensor,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer of shape `(batch, seq_len, dim_model)`
            attention_mask (`mindspore.Tensor`):
                Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
            position_bias (`mindspore.Tensor`):
                Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
                Cached past key and value projection states
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        all_hidden_states = () if output_hidden_states else None
        all_self_attns = () if output_attentions else None
        current_key_values = () if use_cache else None

        for i, layer in enumerate(self.layers):
            if output_hidden_states:
                all_hidden_states += (hidden_states,)
            layer_outputs = layer(
                hidden_states,
                attention_mask,
                position_bias,
                output_attentions=output_attentions,
                past_key_values=past_key_values[i] if past_key_values else None,
                use_cache=use_cache,
            )
            hidden_states, attn_weights, current_key_value = layer_outputs
            if output_attentions:
                all_self_attns += (attn_weights,)
            if current_key_value is not None:
                current_key_values = current_key_values + (current_key_value,)

        hidden_states = self.output_layernorm(hidden_states)

        if output_hidden_states:
            all_hidden_states += (hidden_states,)

        return hidden_states, current_key_values, all_hidden_states, all_self_attns
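
A minimal sketch of driving the encoder directly with a toy configuration; in the full model, CpmAntModel builds the attention mask and position bias, so the all-true mask and zero bias here are illustrative assumptions only.

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntEncoder
...
>>> config = CpmAntConfig(hidden_size=32, num_attention_heads=4, dim_head=8, dim_ff=64, num_hidden_layers=2)
>>> encoder = CpmAntEncoder(config)
>>> batch, seq = 2, 6
>>> hidden = mindspore.Tensor(np.random.randn(batch, seq, 32), dtype=mindspore.float32)
>>> mask = mindspore.Tensor(np.ones((batch, seq, seq), dtype=bool))
>>> bias = mindspore.Tensor(np.zeros((batch, 4, seq, seq)), dtype=mindspore.float32)
>>> out, caches, all_hidden, all_attn = encoder.forward(hidden, mask, bias, use_cache=True)
>>> out.shape, len(caches)    # final hidden states and one cache entry per layer
((2, 6, 32), 2)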

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntEncoder.__init__(config)

Initializes a new instance of the CpmAntEncoder class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the encoder.

  • num_hidden_layers (int): The number of hidden layers.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes a new instance of the CpmAntEncoder class.

    Args:
        self: The instance of the class.
        config (CpmAntConfig):
            The configuration object for the encoder.

            - num_hidden_layers (int): The number of hidden layers.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.num_layers = config.num_hidden_layers
    self.layers = nn.ModuleList([CpmAntTransformerBlock(config) for ith in range(self.num_layers)])

    self.output_layernorm = CpmAntLayerNorm(config)

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntEncoder.forward(hidden_states, attention_mask, position_bias, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input to the layer of shape (batch, seq_len, dim_model)

TYPE: `mindspore.Tensor`

attention_mask

Avoid invalid areas to participate in the calculation of shape (batch, seq_len, seq_len)

TYPE: `mindspore.Tensor`

position_bias

Provides position information to attention mechanism of shape (num_heads, seq_len, seq_len)

TYPE: `mindspore.Tensor`

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers.

TYPE: `bool`, *optional* DEFAULT: None

past_key_values

Cached past key and value projection states

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: mindspore.Tensor,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer of shape `(batch, seq_len, dim_model)`
        attention_mask (`mindspore.Tensor`):
            Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
        position_bias (`mindspore.Tensor`):
            Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
            Cached past key and value projection states
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    all_hidden_states = () if output_hidden_states else None
    all_self_attns = () if output_attentions else None
    current_key_values = () if use_cache else None

    for i, layer in enumerate(self.layers):
        if output_hidden_states:
            all_hidden_states += (hidden_states,)
        layer_outputs = layer(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values[i] if past_key_values else None,
            use_cache=use_cache,
        )
        hidden_states, attn_weights, current_key_value = layer_outputs
        if output_attentions:
            all_self_attns += (attn_weights,)
        if current_key_value is not None:
            current_key_values = current_key_values + (current_key_value,)

    hidden_states = self.output_layernorm(hidden_states)

    if output_hidden_states:
        all_hidden_states += (hidden_states,)

    return hidden_states, current_key_values, all_hidden_states, all_self_attns

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFFNBlock

Bases: Module

This class represents a feed-forward neural network block used in the CpmAnt model. It is a sub-module of the CpmAnt model and is responsible for applying feed-forward operations to the input hidden states.

The CpmAntFFNBlock class inherits from the nn.Module class, which is a base class for neural network cells in the MindSpore framework.

ATTRIBUTE DESCRIPTION
layernorm_before_ffn

An instance of the CpmAntLayerNorm class used for layer normalization before the feed-forward operation.

TYPE: CpmAntLayerNorm

ffn

An instance of the CpmAntFeedForward class responsible for the actual feed-forward operation.

TYPE: CpmAntFeedForward

dropout

An instance of the nn.Dropout class used for applying dropout regularization, if configured. If dropout probability is not specified, it is set to None.

TYPE: Dropout or None

METHOD DESCRIPTION
forward

Applies the feed-forward operations to the input hidden states and returns the updated hidden states.

Args:

  • hidden_states (mindspore.Tensor): The input hidden states before the feed-forward layer. It has a shape of (batch, len_seq, dim_model).

Returns:

  • mindspore.Tensor: The updated hidden states after applying the feed-forward operations.
Note

The CpmAntFFNBlock class is typically used as a building block within the CpmAnt model to process intermediate hidden states. It performs layer normalization, feed-forward operations, and optionally applies dropout regularization.

Example
>>> import numpy as np
>>> import mindspore
...
>>> config = CpmAntConfig()
>>> ffn_block = CpmAntFFNBlock(config)
>>> batch, len_seq, dim_model = 2, 16, config.hidden_size
>>> hidden_states = mindspore.Tensor(np.random.randn(batch, len_seq, dim_model), dtype=mindspore.float32)
>>> updated_hidden_states = ffn_block.forward(hidden_states)
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntFFNBlock(nn.Module):

    """
    This class represents a feed-forward neural network block used in the CpmAnt model.
    It is a sub-module of the CpmAnt model and is responsible for applying feed-forward operations to the input hidden states.

    The CpmAntFFNBlock class inherits from the nn.Module class, which is a base class for neural network cells in the MindSpore framework.

    Attributes:
        layernorm_before_ffn (CpmAntLayerNorm):
            An instance of the CpmAntLayerNorm class used for layer normalization before the feed-forward operation.
        ffn (CpmAntFeedForward):
            An instance of the CpmAntFeedForward class responsible for the actual feed-forward operation.
        dropout (nn.Dropout or None):
            An instance of the nn.Dropout class used for applying dropout regularization, if configured.
            If dropout probability is not specified, it is set to None.

    Methods:
        forward:
            Applies the feed-forward operations to the input hidden states and returns the updated hidden states.

            Args:

            - hidden_states (mindspore.Tensor): The input hidden states before the feed-forward layer.
            It has a shape of `(batch, len_seq, dim_model)`.

            Returns:

            - mindspore.Tensor: The updated hidden states after applying the feed-forward operations.

    Note:
        The CpmAntFFNBlock class is typically used as a building block within the CpmAnt model to process intermediate hidden states.
        It performs layer normalization, feed-forward operations, and optionally applies dropout regularization.

    Example:
        ```python
        >>> import numpy as np
        >>> import mindspore
        ...
        >>> config = CpmAntConfig()
        >>> ffn_block = CpmAntFFNBlock(config)
        >>> batch, len_seq, dim_model = 2, 16, config.hidden_size
        >>> hidden_states = mindspore.Tensor(np.random.randn(batch, len_seq, dim_model), dtype=mindspore.float32)
        >>> updated_hidden_states = ffn_block.forward(hidden_states)
        ```
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes a new instance of the CpmAntFFNBlock class.

        Args:
            self: The instance of the class.
            config (CpmAntConfig):
                The configuration object for the CpmAntFFNBlock. It contains the parameters and settings for the block.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.layernorm_before_ffn = CpmAntLayerNorm(config)
        self.ffn = CpmAntFeedForward(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Hidden states before feed forward layer.
        """
        ln_outputs = self.layernorm_before_ffn(hidden_states)
        outputs = self.ffn(ln_outputs)
        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = hidden_states + outputs
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFFNBlock.__init__(config)

Initializes a new instance of the CpmAntFFNBlock class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the CpmAntFFNBlock. It contains the parameters and settings for the block.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes a new instance of the CpmAntFFNBlock class.

    Args:
        self: The instance of the class.
        config (CpmAntConfig):
            The configuration object for the CpmAntFFNBlock. It contains the parameters and settings for the block.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.layernorm_before_ffn = CpmAntLayerNorm(config)
    self.ffn = CpmAntFeedForward(config)
    if config.dropout_p:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFFNBlock.forward(hidden_states)

PARAMETER DESCRIPTION
hidden_states

Hidden states before feed forward layer.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Hidden states before feed forward layer.
    """
    ln_outputs = self.layernorm_before_ffn(hidden_states)
    outputs = self.ffn(ln_outputs)
    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = hidden_states + outputs
    return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFeedForward

Bases: Module

CpmAntFeedForward represents a feedforward neural network component designed for the CpmAnt model architecture. This class inherits from nn.Module and is used for processing hidden states through a series of transformations.

ATTRIBUTE DESCRIPTION
w_in

The first layer of the feedforward network for processing input hidden states.

TYPE: CpmAntDenseGatedACT

dropout

Dropout layer for regularization, initialized based on the configuration parameter.

TYPE: Dropout or None

w_out

The output layer of the feedforward network for producing final hidden states.

TYPE: Linear

METHOD DESCRIPTION
__init__

Constructor method for initializing the CpmAntFeedForward instance with the given configuration.

forward

Method for processing the input hidden states through the network layers.

PARAMETER DESCRIPTION
config

Configuration object containing settings for the feedforward network.

TYPE: CpmAntConfig

hidden_states

Input tensor representing hidden states with shape (batch, seq_len, dim_in).

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: Output tensor containing the processed hidden states after passing through the feedforward network.

Usage

Instantiate an object of CpmAntFeedForward with a CpmAntConfig object and then call the forward method with input hidden_states to obtain the processed output hidden states.

Note
  • The dropout layer is optional based on the dropout probability specified in the configuration.
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntFeedForward(nn.Module):

    """
    CpmAntFeedForward represents a feedforward neural network component designed for the CpmAnt model architecture.
    This class inherits from nn.Module and is used for processing hidden states through a series of transformations.

    Attributes:
        w_in (CpmAntDenseGatedACT): The first layer of the feedforward network for processing input hidden states.
        dropout (nn.Dropout or None): Dropout layer for regularization, initialized based on the configuration parameter.
        w_out (nn.Linear): The output layer of the feedforward network for producing final hidden states.

    Methods:
        __init__: Constructor method for initializing the CpmAntFeedForward instance with the given configuration.
        forward: Method for processing the input hidden states through the network layers.

    Args:
        config (CpmAntConfig): Configuration object containing settings for the feedforward network.
        hidden_states (mindspore.Tensor): Input tensor representing hidden states with shape (batch, seq_len, dim_in).

    Returns:
        mindspore.Tensor: Output tensor containing the processed hidden states after passing through the feedforward network.

    Usage:
        Instantiate an object of CpmAntFeedForward with a CpmAntConfig object and then call the forward method with input hidden_states
        to obtain the processed output hidden states.

    Note:
        - The dropout layer is optional based on the dropout probability specified in the configuration.
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes an instance of the CpmAntFeedForward class.

        Args:
            self: The instance of the class.
            config (CpmAntConfig): An object of type CpmAntConfig containing configuration parameters.
                This parameter is required for configuring the feed-forward network.
                It should be an instance of CpmAntConfig class.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.w_in = CpmAntDenseGatedACT(config)
        if config.dropout_p is not None:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

        self.w_out = nn.Linear(config.dim_ff, config.hidden_size, bias=False)

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        hidden_states = self.w_in(hidden_states)

        if self.dropout is not None:
            hidden_states = self.dropout(hidden_states)

        hidden_states = self.w_out(hidden_states)

        return hidden_states
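
End to end, the feed-forward module maps hidden_size -> dim_ff -> hidden_size. A minimal sketch with illustrative sizes:

>>> import numpy as np
>>> import mindspore
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntFeedForward
...
>>> config = CpmAntConfig(hidden_size=32, dim_ff=64)
>>> ffn = CpmAntFeedForward(config)
>>> x = mindspore.Tensor(np.random.randn(2, 5, 32), dtype=mindspore.float32)
>>> ffn.forward(x).shape    # back to hidden_size after the dim_ff expansion
(2, 5, 32)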

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFeedForward.__init__(config)

Initializes an instance of the CpmAntFeedForward class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of CpmAntConfig containing the configuration parameters used to build the feed-forward network.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes an instance of the CpmAntFeedForward class.

    Args:
        self: The instance of the class.
        config (CpmAntConfig): An object of type CpmAntConfig containing configuration parameters.
            This parameter is required for configuring the feed-forward network.
            It should be an instance of CpmAntConfig class.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.w_in = CpmAntDenseGatedACT(config)
    if config.dropout_p is not None:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

    self.w_out = nn.Linear(config.dim_ff, config.hidden_size, bias=False)

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntFeedForward.forward(hidden_states)

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    hidden_states = self.w_in(hidden_states)

    if self.dropout is not None:
        hidden_states = self.dropout(hidden_states)

    hidden_states = self.w_out(hidden_states)

    return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM

Bases: CpmAntPreTrainedModel

CpmAntForCausalLM is a class representing a Causal Language Model based on the CPMAnt model for text generation tasks. This class extends the functionality of CpmAntPreTrainedModel and provides methods for model initialization, text generation, and handling embeddings.

The CpmAntForCausalLM class includes methods for model initialization, generating text based on input sequences, accessing and setting input and output embeddings, preparing inputs for text generation, and reordering cache for beam search decoding.

Example

Text Generation with CpmAntForCausalLM:

>>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
...
>>> texts = "Today is a beautiful day, "
>>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
>>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
>>> input_ids = tokenizer(texts, return_tensors="pt")
>>> outputs = model.generate(**input_ids)
>>> output_texts = tokenizer.batch_decode(outputs)
>>> print(output_texts)
['Today is a beautiful day, the sun is shining, and the birds are singing.']

METHOD DESCRIPTION
__init__

Initializes the CpmAntForCausalLM model with the provided configuration.

forward

Constructs the model for text generation based on the input arguments and returns output in the specified format.

get_input_embeddings

Retrieves the input embeddings of the model.

set_input_embeddings

Sets new input embeddings for the model.

get_output_embeddings

Retrieves the output embeddings of the model.

set_output_embeddings

Sets new output embeddings for the model.

prepare_inputs_for_generation

Prepares inputs for text generation based on the provided input_ids and keyword arguments.

_reorder_cache

Reorders the cache for beam search decoding.

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary.

TYPE: Tensor

past_key_values

Pre-computed hidden states for sequential decoding.

TYPE: List[Tuple[Tensor, Tensor]]

use_cache

Flag to determine if cache should be used for decoding.

TYPE: bool

output_attentions

Flag to include attention tensors in the output.

TYPE: bool

output_hidden_states

Flag to include hidden states of all layers in the output.

TYPE: bool

labels

Labels for computing the masked language modeling loss.

TYPE: Tensor

return_dict

Flag to determine the format of the output.

TYPE: bool

attention_mask

Dummy parameter for text-generation pipeline.

TYPE: Tensor

RETURNS DESCRIPTION

Union[Tuple, CausalLMOutputWithPast]: Tuple or CausalLMOutputWithPast object containing model outputs and past key values.

RAISES DESCRIPTION
NotImplementedError

If a method is not implemented in the subclass.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntForCausalLM(CpmAntPreTrainedModel):

    """
    CpmAntForCausalLM is a class representing a Causal Language Model based on the CPMAnt model for text generation tasks.
    This class extends the functionality of CpmAntPreTrainedModel and provides methods for model initialization,
    text generation, and handling embeddings.

    The CpmAntForCausalLM class includes methods for model initialization, generating text based on input sequences,
    accessing and setting input and output embeddings,
    preparing inputs for text generation, and reordering cache for beam search decoding.

    Example:
        Text Generation with CpmAntForCausalLM:
        ```python
        >>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
        ...
        >>> texts = "Today is a beautiful day, "
        >>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
        >>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
        >>> input_ids = tokenizer(texts, return_tensors="pt")
        >>> outputs = model.generate(**input_ids)
        >>> output_texts = tokenizer.batch_decode(outputs)
        >>> print(output_texts)
        ['Today is a beautiful day, the sun is shining, and the birds are singing.']
        ```

    Methods:
        __init__: Initializes the CpmAntForCausalLM model with the provided configuration.
        forward: Constructs the model for text generation based on the input arguments and returns output in the specified format.
        get_input_embeddings: Retrieves the input embeddings of the model.
        set_input_embeddings: Sets new input embeddings for the model.
        get_output_embeddings: Retrieves the output embeddings of the model.
        set_output_embeddings: Sets new output embeddings for the model.
        prepare_inputs_for_generation: Prepares inputs for text generation based on the provided input_ids and keyword arguments.
        _reorder_cache: Reorders the cache for beam search decoding.

    Args:
        input_ids (mindspore.Tensor): Indices of input sequence tokens in the vocabulary.
        past_key_values (List[Tuple[mindspore.Tensor, mindspore.Tensor]]): Pre-computed hidden states for sequential decoding.
        use_cache (bool): Flag to determine if cache should be used for decoding.
        output_attentions (bool): Flag to include attention tensors in the output.
        output_hidden_states (bool): Flag to include hidden states of all layers in the output.
        labels (mindspore.Tensor): Labels for computing the masked language modeling loss.
        return_dict (bool): Flag to determine the format of the output.
        attention_mask (mindspore.Tensor): Dummy parameter for text-generation pipeline.

    Returns:
        Union[Tuple, CausalLMOutputWithPast]: Tuple or CausalLMOutputWithPast object containing model outputs and past key values.

    Raises:
        NotImplementedError: If a method is not implemented in the subclass.

    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config: CpmAntConfig):
        """
        Initializes an instance of the CpmAntForCausalLM class.

        Args:
            self: The instance of the class.
            config (CpmAntConfig): The configuration object for the CpmAnt model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.cpmant = CpmAntModel(config)

        # lm_head.weight is tied to cpmant.input_embedding.weight
        self.lm_head = nn.Linear(
            config.hidden_size, config.vocab_size + config.prompt_types * config.prompt_length, bias=False
        )
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[Tuple[mindspore.Tensor, mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        labels: Optional[mindspore.Tensor] = None,
        return_dict: Optional[bool] = None,
        attention_mask: Optional[mindspore.Tensor] = None,  # dummy parameter for text-generation pipeline
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        r"""
        Args:
            input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
                Indices of input sequence tokens in the vocabulary.

                Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
                [`PreTrainedTokenizer.__call__`] for details.

                [What are input IDs?](../glossary#input-ids)
            past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
                Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
                cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            output_hidden_states (`bool`, *optional*):
                Whether or not to return the hidden states of all layers.
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss.
            return_dict (`bool`, *optional*):
                Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
            attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                CPMAnt will process attention mask automatically, this parameter is a dummy parameter for
                text-generation pipeline.

        Example:
            Text Generation with CpmAntForCausalLM.
            ```python
            >>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
            ...
            >>> texts = "今天天气不错,"
            >>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
            >>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
            >>> input_ids = tokenizer(texts, return_tensors="pt")
            >>> outputs = model.generate(**input_ids)
            >>> output_texts = tokenizer.batch_decode(outputs)
            >>> print(output_texts)
            ['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        model_output = self.cpmant(
            input_ids, output_attentions, output_hidden_states, past_key_values, use_cache, return_dict
        )
        hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

        logits = self.lm_head(hidden_states)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(logits.view(-1, logits.shape[-1]), labels.view(-1))

        if not return_dict:
            output = (logits,) + model_output[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=logits,
            past_key_values=model_output.past_key_values,
            hidden_states=model_output.hidden_states,
            attentions=model_output.attentions,
        )

    def get_input_embeddings(self):
        """
        Retrieve the input embeddings used by the CpmAntForCausalLM model.

        Args:
            self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
                This parameter is required to access the input embeddings specific to this instance.

        Returns:
            None: This method returns the input embeddings associated with the CpmAntForCausalLM model.
                The input embeddings are used for processing input data within the model.

        Raises:
            None: This method does not raise any exceptions.
        """
        return self.cpmant.input_embedding

    def set_input_embeddings(self, embeddings):
        """
        Set the input embeddings for the CpmAntForCausalLM model.

        Args:
            self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
            embeddings: The input embeddings to be set for the model.
                This parameter should be a valid embeddings object that can be assigned to the input_embedding attribute of the CpmAntForCausalLM instance.

        Returns:
            None.

        Raises:
            None.
        """
        self.cpmant.input_embedding = embeddings

    def get_output_embeddings(self):
        """
        Retrieves the output embeddings of the language model head.

        Args:
            self: An instance of the CpmAntForCausalLM class.

        Returns:
            lm_head: The method returns the output embeddings of the language model head.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Sets the output embeddings of the CpmAntForCausalLM model.

        Args:
            self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
            new_embeddings (torch.nn.Module): The new embeddings to be set as the output embeddings of the model.

        Returns:
            None

        Raises:
            None

        This method sets the output embeddings of the CpmAntForCausalLM model to the provided new embeddings.
        The new embeddings should be an instance of torch.nn.Module.

        Example:
            ```python
            >>> model = CpmAntForCausalLM()
            >>> new_embeddings = nn.Embedding(1000, 768)
            >>> model.set_output_embeddings(new_embeddings)
            ```
        """
        self.lm_head = new_embeddings

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        """
        Prepare inputs for generation.

        This method takes in two parameters: self and input_ids.
        It modifies the input_ids and returns a dictionary containing the modified input_ids, use_cache, and past_key_values.

        Args:
            self: The instance of the CpmAntForCausalLM class.
            input_ids (tensor): The input tensor containing the tokenized input sequence.

        Returns:
            dict:
                A dictionary with the following keys:

                - input_ids (tensor): The modified input tensor.
                - use_cache (bool): The value of the use_cache parameter from kwargs.
                - past_key_values (tensor or None): The value of the past_key_values parameter from kwargs,
                or None if not provided.

        Raises:
            None.

        Note:
            - The input_ids parameter is cast to int.
            - If the 'attention_mask' key is present in kwargs, its value is replaced with a zero tensor of shape (1, 1).
        """
        input_ids = input_ids.int()
        # save the memory usage of dummy attention mask
        if "attention_mask" in kwargs:
            kwargs["attention_mask"] = ops.zeros(1, 1)

        return {
            "input_ids": input_ids,
            "use_cache": kwargs["use_cache"],
            "past_key_values": kwargs.get("past_key_values", None),
        }

    def _reorder_cache(self, past_key_values, beam_idx):
        """
        Reorders the cache for the specified beam index.

        Args:
            self (CpmAntForCausalLM): An instance of the CpmAntForCausalLM class.
            past_key_values (list): A list of past key values. Each element in the list represents a key-value layer 
                and is a list containing two elements: the key and the value. If a key-value layer
                is None, it will be preserved as None.
            beam_idx (int): The index of the beam for which the cache needs to be reordered.

        Returns:
            list: The reordered cache represented as a list of past key values. Each element in the list is a key-value 
                layer, and each key-value layer is a list containing two elements: the key and the value.

        Raises:
            None

        """
        past_key_values = [list(each) if each is not None else each for each in past_key_values]
        for key_value_layer in past_key_values:
            key_value_layer[0] = key_value_layer[0][beam_idx]
            key_value_layer[1] = key_value_layer[1][beam_idx]
        return past_key_values

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.__init__(config)

Initializes an instance of the CpmAntForCausalLM class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object for the CpmAnt model.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None
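
The constructor wires the language-model head to the same extended vocabulary used by the input embedding, i.e. vocab_size + prompt_types * prompt_length output rows. A minimal, untested sketch with a small hypothetical configuration (not the 10B defaults):

```python
from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntForCausalLM

# Hypothetical small config so the model can be built quickly for inspection.
config = CpmAntConfig(vocab_size=1000, hidden_size=64, num_attention_heads=4,
                      dim_head=16, dim_ff=128, num_hidden_layers=2)
model = CpmAntForCausalLM(config)

# lm_head projects hidden states onto the extended vocabulary (regular tokens
# plus the prompt slots); its weight is tied to cpmant.input_embedding.weight
# as noted in the source below.
expected_rows = config.vocab_size + config.prompt_types * config.prompt_length
print(model.lm_head.weight.shape)  # (expected_rows, config.hidden_size)
print(expected_rows)               # 1000 + 32 * 32 = 2024
```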

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1431-1452
def __init__(self, config: CpmAntConfig):
    """
    Initializes an instance of the CpmAntForCausalLM class.

    Args:
        self: The instance of the class.
        config (CpmAntConfig): The configuration object for the CpmAnt model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.cpmant = CpmAntModel(config)

    # lm_head.weight is tied to cpmant.input_embedding.weight
    self.lm_head = nn.Linear(
        config.hidden_size, config.vocab_size + config.prompt_types * config.prompt_length, bias=False
    )
    self.post_init()

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.forward(input_ids=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, labels=None, return_dict=None, attention_mask=None, **kwargs)

PARAMETER DESCRIPTION
input_ids

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using [CPMAntTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

What are input IDs?

TYPE: `mindspore.Tensor` of shape `(batch_size, seq_len)` DEFAULT: None

past_key_values

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

TYPE: `tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True` DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: None

output_hidden_states

Whether or not to return the hidden states of all layers.

TYPE: `bool`, *optional* DEFAULT: None

labels

Labels for computing the language modeling loss.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

return_dict

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

TYPE: `bool`, *optional* DEFAULT: None

attention_mask

CPMAnt will process attention mask automatically, this parameter is a dummy parameter for text-generation pipeline.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Example

Text Generation with CpmAntForCausalLM.

>>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
...
>>> texts = "今天天气不错,"
>>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
>>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
>>> input_ids = tokenizer(texts, return_tensors="pt")
>>> outputs = model.generate(**input_ids)
>>> output_texts = tokenizer.batch_decode(outputs)
>>> print(output_texts)
['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
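
Besides generation, forward also computes a language-modeling loss when labels are provided. A minimal, untested sketch with a small hypothetical configuration (so no pretrained checkpoint is needed); note that, as in the source below, the logits are not shifted inside forward, so logits at position i are compared with labels at position i:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntForCausalLM

# Hypothetical small config for illustration only.
config = CpmAntConfig(vocab_size=1000, hidden_size=64, num_attention_heads=4,
                      dim_head=16, dim_ff=128, num_hidden_layers=2)
model = CpmAntForCausalLM(config)

# Dummy non-zero token ids (id 0 is treated as padding by the base model).
input_ids = mindspore.Tensor(np.random.randint(1, config.vocab_size, (2, 16)), mindspore.int32)

outputs = model(input_ids=input_ids, labels=input_ids, return_dict=True)
print(outputs.loss)          # scalar cross-entropy over the flattened logits
print(outputs.logits.shape)  # (2, 16, vocab_size + prompt_types * prompt_length)
```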

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1454-1531
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[Tuple[mindspore.Tensor, mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    labels: Optional[mindspore.Tensor] = None,
    return_dict: Optional[bool] = None,
    attention_mask: Optional[mindspore.Tensor] = None,  # dummy parameter for text-generation pipeline
    **kwargs,
) -> Union[Tuple, CausalLMOutputWithPast]:
    r"""
    Args:
        input_ids (`mindspore.Tensor` of shape `(batch_size, seq_len)`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using [`CPMAntTokenizer`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details.

            [What are input IDs?](../glossary#input-ids)
        past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
            Contains pre-computed hidden-states (key and values in the self-attention blocks and in the
            cross-attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
        attention_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            CPMAnt will process attention mask automatically, this parameter is a dummy parameter for
            text-generation pipeline.

    Example:
        Text Generation with CpmAntForCausalLM.
        ```python
        >>> from transformers import CPMAntTokenizer, CpmAntForCausalLM
        ...
        >>> texts = "今天天气不错,"
        >>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
        >>> tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
        >>> input_ids = tokenizer(texts, return_tensors="pt")
        >>> outputs = model.generate(**input_ids)
        >>> output_texts = tokenizer.batch_decode(outputs)
        >>> print(output_texts)
        ['今天天气不错,阳光明媚,我和妈妈一起去超市买东西。\n在超市里,我看到了一个很好玩的玩具,它的名字叫“机器人”。它有一个圆圆的脑袋,两只圆圆的眼睛,还有一个圆圆的']
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    model_output = self.cpmant(
        input_ids, output_attentions, output_hidden_states, past_key_values, use_cache, return_dict
    )
    hidden_states = model_output.last_hidden_state if return_dict else model_output[0]

    logits = self.lm_head(hidden_states)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(logits.view(-1, logits.shape[-1]), labels.view(-1))

    if not return_dict:
        output = (logits,) + model_output[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=logits,
        past_key_values=model_output.past_key_values,
        hidden_states=model_output.hidden_states,
        attentions=model_output.attentions,
    )

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.get_input_embeddings()

Retrieve the input embeddings used by the CpmAntForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the CpmAntForCausalLM class. This parameter is required to access the input embeddings specific to this instance.

TYPE: CpmAntForCausalLM

RETURNS DESCRIPTION
nn.Embedding

The input embedding module of the underlying CpmAntModel (cpmant.input_embedding), used to map token IDs to hidden states.

RAISES DESCRIPTION
None

This method does not raise any exceptions.
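
For illustration, a hedged sketch (reusing the small-config model from the earlier examples) showing that the returned module is the shared token embedding of the underlying base model:

```python
emb = model.get_input_embeddings()        # nn.Embedding over the extended vocabulary
assert emb is model.cpmant.input_embedding

# set_input_embeddings simply re-assigns that attribute on the base model.
model.set_input_embeddings(emb)
```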

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1533-1548
def get_input_embeddings(self):
    """
    Retrieve the input embeddings used by the CpmAntForCausalLM model.

    Args:
        self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
            This parameter is required to access the input embeddings specific to this instance.

    Returns:
        None: This method returns the input embeddings associated with the CpmAntForCausalLM model.
            The input embeddings are used for processing input data within the model.

    Raises:
        None: This method does not raise any exceptions.
    """
    return self.cpmant.input_embedding

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.get_output_embeddings()

Retrieves the output embeddings of the language model head.

PARAMETER DESCRIPTION
self

An instance of the CpmAntForCausalLM class.

RETURNS DESCRIPTION
lm_head

The method returns the output embeddings of the language model head.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1567-1580
def get_output_embeddings(self):
    """
    Retrieves the output embeddings of the language model head.

    Args:
        self: An instance of the CpmAntForCausalLM class.

    Returns:
        lm_head: The method returns the output embeddings of the language model head.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.prepare_inputs_for_generation(input_ids, **kwargs)

Prepare inputs for generation.

This method casts input_ids to int and returns a dictionary containing the modified input_ids together with the use_cache and past_key_values values taken from kwargs.

PARAMETER DESCRIPTION
self

The instance of the CpmAntForCausalLM class.

input_ids

The input tensor containing the tokenized input sequence.

TYPE: tensor

RETURNS DESCRIPTION
dict

A dictionary with the following keys:

  • input_ids (tensor): The modified input tensor.
  • use_cache (bool): The value of the use_cache parameter from kwargs.
  • past_key_values (tensor or None): The value of the past_key_values parameter from kwargs, or None if not provided.

Note
  • The input_ids parameter is cast to int.
  • use_cache is read directly from kwargs, so it must be supplied by the caller.
  • If the 'attention_mask' key is present in kwargs, its value is replaced with a zero tensor of shape (1, 1), since CPMAnt builds its attention mask internally (see the sketch below).
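
A hypothetical illustration of the returned dictionary, reusing the model and input_ids from the earlier sketches (use_cache has to be passed explicitly, as noted above):

```python
model_inputs = model.prepare_inputs_for_generation(
    input_ids, use_cache=True, past_key_values=None, attention_mask=None
)
print(sorted(model_inputs))  # ['input_ids', 'past_key_values', 'use_cache']
```
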
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1608-1644
def prepare_inputs_for_generation(self, input_ids, **kwargs):
    """
    Prepare inputs for generation.

    This method takes in two parameters: self and input_ids.
    It modifies the input_ids and returns a dictionary containing the modified input_ids, use_cache, and past_key_values.

    Args:
        self: The instance of the CpmAntForCausalLM class.
        input_ids (tensor): The input tensor containing the tokenized input sequence.

    Returns:
        dict:
            A dictionary with the following keys:

            - input_ids (tensor): The modified input tensor.
            - use_cache (bool): The value of the use_cache parameter from kwargs.
            - past_key_values (tensor or None): The value of the past_key_values parameter from kwargs,
            or None if not provided.

    Raises:
        None.

    Note:
        - The input_ids parameter is cast to int.
        - If the 'attention_mask' key is present in kwargs, its value is replaced with a zero tensor of shape (1, 1).
    """
    input_ids = input_ids.int()
    # save the memory usage of dummy attention mask
    if "attention_mask" in kwargs:
        kwargs["attention_mask"] = ops.zeros(1, 1)

    return {
        "input_ids": input_ids,
        "use_cache": kwargs["use_cache"],
        "past_key_values": kwargs.get("past_key_values", None),
    }

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.set_input_embeddings(embeddings)

Set the input embeddings for the CpmAntForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the CpmAntForCausalLM class.

TYPE: CpmAntForCausalLM

embeddings

The input embeddings to be set for the model. This parameter should be a valid embeddings object that can be assigned to the input_embedding attribute of the CpmAntForCausalLM instance.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1550-1565
def set_input_embeddings(self, embeddings):
    """
    Set the input embeddings for the CpmAntForCausalLM model.

    Args:
        self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
        embeddings: The input embeddings to be set for the model.
            This parameter should be a valid embeddings object that can be assigned to the input_embedding attribute of the CpmAntForCausalLM instance.

    Returns:
        None.

    Raises:
        None.
    """
    self.cpmant.input_embedding = embeddings

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntForCausalLM.set_output_embeddings(new_embeddings)

Sets the output embeddings of the CpmAntForCausalLM model.

PARAMETER DESCRIPTION
self

The instance of the CpmAntForCausalLM class.

TYPE: CpmAntForCausalLM

new_embeddings

The new embeddings to be set as the output embeddings of the model.

TYPE: Module

RETURNS DESCRIPTION

None

This method sets the output embeddings of the CpmAntForCausalLM model to the provided new embeddings. The new embeddings should be an nn.Module.

Example
>>> model = CpmAntForCausalLM.from_pretrained("openbmb/cpm-ant-10b")
>>> new_embeddings = nn.Embedding(1000, 768)
>>> model.set_output_embeddings(new_embeddings)
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1582-1606
def set_output_embeddings(self, new_embeddings):
    """
    Sets the output embeddings of the CpmAntForCausalLM model.

    Args:
        self (CpmAntForCausalLM): The instance of the CpmAntForCausalLM class.
        new_embeddings (torch.nn.Module): The new embeddings to be set as the output embeddings of the model.

    Returns:
        None

    Raises:
        None

    This method sets the output embeddings of the CpmAntForCausalLM model to the provided new embeddings.
    The new embeddings should be an instance of torch.nn.Module.

    Example:
        ```python
        >>> model = CpmAntForCausalLM()
        >>> new_embeddings = nn.Embedding(1000, 768)
        >>> model.set_output_embeddings(new_embeddings)
        ```
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntIntermediate

Bases: Module

The CpmAntIntermediate class implements a feed-forward intermediate layer for the CpmAnt model: a dense projection of the hidden states followed by a non-linear activation. It inherits from nn.Module.

ATTRIBUTE DESCRIPTION
dense

A dense layer used for transforming hidden states.

TYPE: Linear

intermediate_act_fn

The activation function applied to the hidden states.

TYPE: function

METHOD DESCRIPTION
__init__

Initializes the CpmAntIntermediate instance with the provided configuration.

forward

Applies dense transformation and activation function to the input hidden states.
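
Note that this layer reads config.intermediate_size and config.hidden_act, so the hedged sketch below feeds it a hypothetical stand-in config object carrying just those fields:

```python
from types import SimpleNamespace
import numpy as np
import mindspore
from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntIntermediate

# Hypothetical config object exposing only the two fields this layer uses.
cfg = SimpleNamespace(hidden_size=64, intermediate_size=128, hidden_act="gelu")
layer = CpmAntIntermediate(cfg)

x = mindspore.Tensor(np.random.randn(2, 16, 64).astype(np.float32))
print(layer(x).shape)  # (2, 16, 128): dense projection followed by GELU
```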

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 746-805
class CpmAntIntermediate(nn.Module):

    """
    The CpmAntIntermediate class represents an intermediate layer for the CpmAnt model.
    This class inherits from nn.Module and is used to perform operations on hidden states,
    including dense transformations and activation functions.

    Attributes:
        dense (nn.Linear): A dense layer used for transforming hidden states.
        intermediate_act_fn (function): The activation function applied to the hidden states.

    Methods:
        __init__: Initializes the CpmAntIntermediate instance with the provided configuration.
        forward: Applies dense transformation and activation function to the input hidden states.
    """
    def __init__(self, config):
        """
        Initializes an instance of the CpmAntIntermediate class.

        Args:
            self: An instance of the CpmAntIntermediate class.
            config:
                An object of type 'config' containing the configuration parameters for the model.

                - Type: 'config'
                - Purpose: The configuration parameters for the model.
                - Restrictions: None.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        """
        Docstring for method 'forward' in class 'CpmAntIntermediate':

        Args:
            self (CpmAntIntermediate): The instance of the class CpmAntIntermediate.
            hidden_states (mindspore.Tensor): A tensor containing the hidden states data to be processed.
                It should be compatible with the operations performed by the method.

        Returns:
            mindspore.Tensor: A tensor representing the processed hidden states data.
                This tensor is the result of applying the dense layer and intermediate activation function.

        Raises:
            None
        """
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntIntermediate.__init__(config)

Initializes an instance of the CpmAntIntermediate class.

PARAMETER DESCRIPTION
self

An instance of the CpmAntIntermediate class.

config

An object of type 'config' containing the configuration parameters for the model.

  • Type: 'config'
  • Purpose: The configuration parameters for the model.
  • Restrictions: None.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 761-785
def __init__(self, config):
    """
    Initializes an instance of the CpmAntIntermediate class.

    Args:
        self: An instance of the CpmAntIntermediate class.
        config:
            An object of type 'config' containing the configuration parameters for the model.

            - Type: 'config'
            - Purpose: The configuration parameters for the model.
            - Restrictions: None.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
    if isinstance(config.hidden_act, str):
        self.intermediate_act_fn = ACT2FN[config.hidden_act]
    else:
        self.intermediate_act_fn = config.hidden_act

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntIntermediate.forward(hidden_states)

Docstring for method 'forward' in class 'CpmAntIntermediate':

PARAMETER DESCRIPTION
self

The instance of the class CpmAntIntermediate.

TYPE: CpmAntIntermediate

hidden_states

A tensor containing the hidden states data to be processed. It should be compatible with the operations performed by the method.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: A tensor representing the processed hidden states data. This tensor is the result of applying the dense layer and intermediate activation function.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 787-805
def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
    """
    Docstring for method 'forward' in class 'CpmAntIntermediate':

    Args:
        self (CpmAntIntermediate): The instance of the class CpmAntIntermediate.
        hidden_states (mindspore.Tensor): A tensor containing the hidden states data to be processed.
            It should be compatible with the operations performed by the method.

    Returns:
        mindspore.Tensor: A tensor representing the processed hidden states data.
            This tensor is the result of applying the dense layer and intermediate activation function.

    Raises:
        None
    """
    hidden_states = self.dense(hidden_states)
    hidden_states = self.intermediate_act_fn(hidden_states)
    return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntLayerNorm

Bases: Module

We use Root Mean Square (RMS) layer normalization; see https://arxiv.org/abs/1910.07467 for details.
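
In RMS layer normalization the input is scaled by the reciprocal root mean square of its last dimension and then multiplied by a learned weight; there is no mean subtraction and no bias. A minimal NumPy sketch of the same computation, for illustration only:

```python
import numpy as np

def rms_layer_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Mean of squares over the last dimension, computed in float32 as in forward() below.
    variance = np.mean(np.square(x.astype(np.float32)), axis=-1, keepdims=True)
    return x * (1.0 / np.sqrt(variance + eps)) * weight

x = np.random.randn(2, 4, 8).astype(np.float32)
weight = np.ones(8, dtype=np.float32)
print(rms_layer_norm(x, weight).shape)  # (2, 4, 8)
```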

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 47-86
class CpmAntLayerNorm(nn.Module):
    """
    We use Root Mean Square (RMS) Layer Normalization, please see https://arxiv.org/abs/1910.07467 for details."
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes a new instance of the CpmAntLayerNorm class.

        Args:
            self: The object that the method belongs to.
            config (CpmAntConfig): The configuration object used to initialize the instance.
                The config parameter is of type CpmAntConfig and is required to initialize the instance.
                It contains the following attributes:

                - eps: A float value representing the epsilon value used in layer normalization.
                - hidden_size: An integer specifying the size of the hidden layer.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.eps = config.eps
        self.dim_norm = config.hidden_size
        self.weight = Parameter(ops.zeros(config.hidden_size))

    def forward(self, hidden_states: mindspore.Tensor):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
        """
        if hidden_states.shape[-1] != self.dim_norm:
            raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
        old_dtype = hidden_states.dtype
        variance = hidden_states.to(mindspore.float32).pow(2).mean(axis=-1, keep_dims=True)
        hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntLayerNorm.__init__(config)

Initializes a new instance of the CpmAntLayerNorm class.

PARAMETER DESCRIPTION
self

The object that the method belongs to.

config

The configuration object used to initialize the instance. The config parameter is of type CpmAntConfig and is required to initialize the instance. It contains the following attributes:

  • eps: A float value representing the epsilon value used in layer normalization.
  • hidden_size: An integer specifying the size of the hidden layer.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 51-74
def __init__(self, config: CpmAntConfig):
    """
    Initializes a new instance of the CpmAntLayerNorm class.

    Args:
        self: The object that the method belongs to.
        config (CpmAntConfig): The configuration object used to initialize the instance.
            The config parameter is of type CpmAntConfig and is required to initialize the instance.
            It contains the following attributes:

            - eps: A float value representing the epsilon value used in layer normalization.
            - hidden_size: An integer specifying the size of the hidden layer.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.eps = config.eps
    self.dim_norm = config.hidden_size
    self.weight = Parameter(ops.zeros(config.hidden_size))

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntLayerNorm.forward(hidden_states)

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 76-86
def forward(self, hidden_states: mindspore.Tensor):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, seq_len, dim_in)`)
    """
    if hidden_states.shape[-1] != self.dim_norm:
        raise AssertionError("hidden_states.shape[-1] != self.dim_norm")
    old_dtype = hidden_states.dtype
    variance = hidden_states.to(mindspore.float32).pow(2).mean(axis=-1, keep_dims=True)
    hidden_states = (hidden_states * ops.rsqrt(variance + self.eps)).to(old_dtype) * self.weight
    return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntModel

Bases: CpmAntPreTrainedModel

CpmAntModel is the bare CPM-Ant transformer model, outputting hidden states without any task-specific head. It inherits from CpmAntPreTrainedModel and includes methods for initializing the model, preparing attention masks, and computing the model outputs from input tensors.

ATTRIBUTE DESCRIPTION
encoder

CpmAntEncoder object for encoding input data

segment_embedding

nn.Embedding object for segment embeddings

input_embedding

nn.Embedding object for input embeddings

position_bias

CpmAntSegmentPositionEmbedding object for position bias calculations

prompt_length

Length of the prompt in the input data

vocab_size

Size of the vocabulary in the input data

METHOD DESCRIPTION
__init__

Initializes the model with the given configuration

get_input_embeddings

Returns the input embeddings

set_input_embeddings

Sets the input embeddings to the given value

_prepare_attention_mask

Prepares the attention mask for the input data

forward

Constructs the model output based on input tensors and optional configurations

This class provides functionality for processing input data, calculating attention masks, and generating model outputs for CPM-ANT tasks.
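
A minimal, untested sketch of running the bare model with a small hypothetical configuration; the prompt positions that the model prepends internally are stripped from the outputs, so last_hidden_state keeps the original sequence length:

```python
import numpy as np
import mindspore
from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntModel

# Hypothetical small config for illustration only.
config = CpmAntConfig(vocab_size=1000, hidden_size=64, num_attention_heads=4,
                      dim_head=16, dim_ff=128, num_hidden_layers=2)
model = CpmAntModel(config)

input_ids = mindspore.Tensor(np.random.randint(1, config.vocab_size, (1, 12)), mindspore.int32)
out = model(input_ids=input_ids, use_cache=True, return_dict=True)
print(out.last_hidden_state.shape)  # (1, 12, hidden_size)
print(len(out.past_key_values))     # one (key, value) pair per encoder layer
```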

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1122-1373
class CpmAntModel(CpmAntPreTrainedModel):

    """
    CpmAntModel is a class that represents a model for CPM-ANT (Antecedent-Conditioned Prompting) tasks.
    It inherits from CpmAntPreTrainedModel and includes methods for initializing the model, preparing
    attention masks, and forwarding the model output based on input tensors.

    Attributes:
        encoder: CpmAntEncoder object for encoding input data
        segment_embedding: nn.Embedding object for segment embeddings
        input_embedding: nn.Embedding object for input embeddings
        position_bias: CpmAntSegmentPositionEmbedding object for position bias calculations
        prompt_length: Length of the prompt in the input data
        vocab_size: Size of the vocabulary in the input data

    Methods:
        __init__: Initializes the model with the given configuration
        get_input_embeddings: Returns the input embeddings
        set_input_embeddings: Sets the input embeddings to the given value
        _prepare_attention_mask: Prepares the attention mask for the input data
        forward: Constructs the model output based on input tensors and optional configurations

    This class provides functionality for processing input data, calculating attention masks,
    and generating model outputs for CPM-ANT tasks.
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes a new instance of the CpmAntModel class.

        Args:
            self: The object instance itself.
            config (CpmAntConfig): An instance of CpmAntConfig containing configuration parameters for the model.
                It specifies the configuration settings required for initializing the model.
                This parameter is mandatory and must be an instance of CpmAntConfig.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.encoder = CpmAntEncoder(config)
        self.segment_embedding = nn.Embedding(config.segment_types, config.hidden_size)
        self.input_embedding = nn.Embedding(
            config.vocab_size + config.prompt_types * config.prompt_length, config.hidden_size
        )
        self.position_bias = CpmAntSegmentPositionEmbedding(config)
        self.prompt_length = config.prompt_length
        self.vocab_size = config.vocab_size

        self.post_init()

    def get_input_embeddings(self):
        """
        Retrieve the input embeddings from the CpmAntModel.

        Args:
            self: CpmAntModel - The instance of the CpmAntModel class.

        Returns:
            None:
                This method returns the input embeddings as an instance of the input_embedding attribute
                from the CpmAntModel.

        Raises:
            This method does not raise any exceptions.
        """
        return self.input_embedding

    def set_input_embeddings(self, embeddings, **kwargs):
        """
        Method to set input embeddings for the CpmAntModel.

        Args:
            self (CpmAntModel): The instance of the CpmAntModel class.
            embeddings:
                The input embeddings to be set for the model.

                - Type: Any
                - Purpose: Represents the embeddings to be assigned to the input_embedding attribute of
                the CpmAntModel instance.
                - Restrictions: None

        Returns:
            None.

        Raises:
            None.
        """
        self.input_embedding = embeddings

    def _prepare_attention_mask(self, input_ids, span, context, length):
        """
        Prepare attention mask for the CpmAntModel.

        Args:
            self (CpmAntModel): The instance of the CpmAntModel class.
            input_ids (Tensor): The input tensor containing tokenized input IDs.
            span (Tensor): The tensor containing span information.
            context (Tensor): The tensor containing context information.
            length (Tensor): The tensor containing the length information.

        Returns:
            Tensor: The attention mask tensor prepared for the CpmAntModel.

        Raises:
            ValueError: If the input_ids, span, context, or length tensors are not provided.
            RuntimeError: If there is an issue during the preparation of the attention mask.
        """
        batch = input_ids.shape[0]
        seqlen = input_ids.shape[1]
        directional_mask_2d = ops.arange(seqlen) <= ops.arange(seqlen).view(-1, 1)
        attention_mask = context[:, None, :] | (
            context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
        )
        attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
        # mask for left padding
        mask_1d = (
            mindspore.Tensor(list(range(seqlen - self.prompt_length))[::-1])[None, :].repeat(batch, 1)
            < length[:, None]
        )
        mask_1d = ops.cat((ops.ones(batch, self.prompt_length).bool(), mask_1d), axis=1)
        attention_mask = mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
        return attention_mask

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        use_cache: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPast]:
        """
        Constructs the CpmAntModel.

        This method initializes and forwards the CpmAntModel. It takes the following parameters:

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor of shape [batch_size, seq_length]. It represents the input IDs for the model.
                Defaults to None.
            output_attentions (Optional[bool]):
                Whether to output attentions. If set to True, the attentions will be returned. Defaults to None.
            output_hidden_states (Optional[bool]):
                Whether to output hidden states. If set to True, the hidden states will be returned. Defaults to None.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]):
                The past key values. Defaults to None.
            use_cache (Optional[bool]): Whether to use cache. Defaults to None.
            return_dict (Optional[bool]):
                Whether to return the output as a dictionary.
                If set to True, the output will be returned as a dictionary. Defaults to None.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutputWithPast]:
                The output of the model.

                - If return_dict is set to False, a tuple of outputs will be returned, including hidden_states,
                present_key_values, all_hidden_states, and all_attentions.
                - If return_dict is set to True, an instance of BaseModelOutputWithPast will be returned, containing
                the last_hidden_state, past_key_values, hidden_states, and attentions.

        Raises:
            None.
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        use_cache = use_cache if use_cache is not None else self.config.use_cache

        # add prompts ahead
        if input_ids.dtype != mindspore.int32:
            input_ids = input_ids.to(mindspore.int32)
        dtype = input_ids.dtype
        segment = ops.where(input_ids != 0, mindspore.tensor(2), 0).to(dtype=dtype)
        length = (segment != 0).sum(-1).to(dtype=dtype)
        input_ids = ops.cat(
            (
                ops.arange(
                    self.prompt_length * 2 + self.vocab_size,
                    self.prompt_length * 3 + self.vocab_size,
                    dtype=dtype,
                ).tile((input_ids.shape[0], 1)),
                input_ids,
            ),
            axis=1,
        )
        batch, seq_length = input_ids.shape
        segment = ops.cat((ops.zeros(batch, self.prompt_length, dtype=dtype), segment), axis=1)
        context = ops.full((batch, seq_length), 1, dtype=dtype)
        position = ops.arange(seq_length, dtype=dtype).repeat(batch, 1)
        span = ops.full((batch, seq_length), 0, dtype=dtype)

        if past_key_values is None:
            past_length = 0
            past_key_values = tuple([None] * self.encoder.num_layers)
            hidden_states = self.input_embedding(input_ids)
            segment_states = self.segment_embedding(segment)
            hidden_states = hidden_states + segment_states
        else:
            past_length = past_key_values[0][0].shape[-2]
            segment_states = self.segment_embedding(segment)
            hidden_states = self.input_embedding(input_ids) + segment_states[:, -1:, :]

        attention_mask = self._prepare_attention_mask(input_ids, span, context, length)
        position_bias = self.position_bias(position, position, segment, segment)

        attention_mask = attention_mask[:, past_length:, :]
        position_bias = position_bias[:, :, past_length:, :]
        hidden_states = hidden_states[:, past_length:, :]

        hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
            hidden_states,
            attention_mask,
            position_bias,
            output_attentions,
            output_hidden_states,
            past_key_values,
            use_cache,
        )

        if past_length == 0:
            hidden_states = hidden_states[:, self.prompt_length :, :]
            # drop the prompt
            if all_attentions is not None:
                new_attentions = ()
                for attention in all_attentions:
                    new_attentions += (attention[:, :, self.prompt_length :, self.prompt_length :],)
                all_attentions = new_attentions
            if all_hidden_states is not None:
                new_hidden_states = ()
                for hidden_state in all_hidden_states:
                    new_hidden_states += (hidden_state[:, self.prompt_length :, :],)
                all_hidden_states = new_hidden_states

        if not return_dict:
            return tuple(
                v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
            )

        return BaseModelOutputWithPast(
            last_hidden_state=hidden_states,
            past_key_values=present_key_values,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
        )

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntModel.__init__(config)

Initializes a new instance of the CpmAntModel class.

PARAMETER DESCRIPTION
self

The object instance itself.

config

An instance of CpmAntConfig containing the configuration parameters required to initialize the model. This parameter is mandatory.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1147-1173
def __init__(self, config: CpmAntConfig):
    """
    Initializes a new instance of the CpmAntModel class.

    Args:
        self: The object instance itself.
        config (CpmAntConfig): An instance of CpmAntConfig containing configuration parameters for the model.
            It specifies the configuration settings required for initializing the model.
            This parameter is mandatory and must be an instance of CpmAntConfig.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.encoder = CpmAntEncoder(config)
    self.segment_embedding = nn.Embedding(config.segment_types, config.hidden_size)
    self.input_embedding = nn.Embedding(
        config.vocab_size + config.prompt_types * config.prompt_length, config.hidden_size
    )
    self.position_bias = CpmAntSegmentPositionEmbedding(config)
    self.prompt_length = config.prompt_length
    self.vocab_size = config.vocab_size

    self.post_init()

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntModel.forward(input_ids=None, output_attentions=None, output_hidden_states=None, past_key_values=None, use_cache=None, return_dict=None, **kwargs)

Constructs the CpmAntModel.

This method initializes and forwards the CpmAntModel. It takes the following parameters:

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input tensor of shape [batch_size, seq_length]. It represents the input IDs for the model. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output attentions. If set to True, the attentions will be returned. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output hidden states. If set to True, the hidden states will be returned. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

past_key_values

The past key values. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor]]] DEFAULT: None

use_cache

Whether to use cache. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return the output as a BaseModelOutputWithPast instead of a plain tuple. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithPast]

Union[Tuple[mindspore.Tensor], BaseModelOutputWithPast]: The output of the model.

  • If return_dict is set to False, a tuple of outputs will be returned, including hidden_states, present_key_values, all_hidden_states, and all_attentions.
  • If return_dict is set to True, an instance of BaseModelOutputWithPast will be returned, containing the last_hidden_state, past_key_values, hidden_states, and attentions.
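
For illustration, a hedged sketch of the two return formats (reusing the small-config model and input_ids from the sketch above):

```python
# return_dict=True -> BaseModelOutputWithPast with named fields.
out = model(input_ids=input_ids, return_dict=True)
hidden = out.last_hidden_state

# return_dict=False -> a plain tuple; entries that are None are dropped, so the
# positions depend on which optional outputs were requested.
hidden, present_key_values, all_hidden_states = model(
    input_ids=input_ids, return_dict=False, use_cache=True, output_hidden_states=True
)
```
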
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
Lines 1248-1373
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    use_cache: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    **kwargs,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutputWithPast]:
    """
    Constructs the CpmAntModel.

    This method initializes and forwards the CpmAntModel. It takes the following parameters:

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor of shape [batch_size, seq_length]. It represents the input IDs for the model.
            Defaults to None.
        output_attentions (Optional[bool]):
            Whether to output attentions. If set to True, the attentions will be returned. Defaults to None.
        output_hidden_states (Optional[bool]):
            Whether to output hidden states. If set to True, the hidden states will be returned. Defaults to None.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor]]]):
            The past key values. Defaults to None.
        use_cache (Optional[bool]): Whether to use cache. Defaults to None.
        return_dict (Optional[bool]):
            Whether to return the output as a dictionary.
            If set to True, the output will be returned as a dictionary. Defaults to None.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutputWithPast]:
            The output of the model.

            - If return_dict is set to False, a tuple of outputs will be returned, including hidden_states,
            present_key_values, all_hidden_states, and all_attentions.
            - If return_dict is set to True, an instance of BaseModelOutputWithPast will be returned, containing
            the last_hidden_state, past_key_values, hidden_states, and attentions.

    Raises:
        None.
    """
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    use_cache = use_cache if use_cache is not None else self.config.use_cache

    # add prompts ahead
    if input_ids.dtype != mindspore.int32:
        input_ids = input_ids.to(mindspore.int32)
    dtype = input_ids.dtype
    segment = ops.where(input_ids != 0, mindspore.tensor(2), 0).to(dtype=dtype)
    length = (segment != 0).sum(-1).to(dtype=dtype)
    input_ids = ops.cat(
        (
            ops.arange(
                self.prompt_length * 2 + self.vocab_size,
                self.prompt_length * 3 + self.vocab_size,
                dtype=dtype,
            ).tile((input_ids.shape[0], 1)),
            input_ids,
        ),
        axis=1,
    )
    batch, seq_length = input_ids.shape
    segment = ops.cat((ops.zeros(batch, self.prompt_length, dtype=dtype), segment), axis=1)
    context = ops.full((batch, seq_length), 1, dtype=dtype)
    position = ops.arange(seq_length, dtype=dtype).repeat(batch, 1)
    span = ops.full((batch, seq_length), 0, dtype=dtype)

    if past_key_values is None:
        past_length = 0
        past_key_values = tuple([None] * self.encoder.num_layers)
        hidden_states = self.input_embedding(input_ids)
        segment_states = self.segment_embedding(segment)
        hidden_states = hidden_states + segment_states
    else:
        past_length = past_key_values[0][0].shape[-2]
        segment_states = self.segment_embedding(segment)
        hidden_states = self.input_embedding(input_ids) + segment_states[:, -1:, :]

    attention_mask = self._prepare_attention_mask(input_ids, span, context, length)
    position_bias = self.position_bias(position, position, segment, segment)

    attention_mask = attention_mask[:, past_length:, :]
    position_bias = position_bias[:, :, past_length:, :]
    hidden_states = hidden_states[:, past_length:, :]

    hidden_states, present_key_values, all_hidden_states, all_attentions = self.encoder(
        hidden_states,
        attention_mask,
        position_bias,
        output_attentions,
        output_hidden_states,
        past_key_values,
        use_cache,
    )

    if past_length == 0:
        hidden_states = hidden_states[:, self.prompt_length :, :]
        # drop the prompt
        if all_attentions is not None:
            new_attentions = ()
            for attention in all_attentions:
                new_attentions += (attention[:, :, self.prompt_length :, self.prompt_length :],)
            all_attentions = new_attentions
        if all_hidden_states is not None:
            new_hidden_states = ()
            for hidden_state in all_hidden_states:
                new_hidden_states += (hidden_state[:, self.prompt_length :, :],)
            all_hidden_states = new_hidden_states

    if not return_dict:
        return tuple(
            v for v in [hidden_states, present_key_values, all_hidden_states, all_attentions] if v is not None
        )

    return BaseModelOutputWithPast(
        last_hidden_state=hidden_states,
        past_key_values=present_key_values,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
    )
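
The forward pass above prepends `prompt_length` learned prompt tokens to every sequence, encodes the result, and drops the prompt positions from the returned hidden states. A minimal usage sketch with a deliberately small, randomly initialized configuration (the sizes below are illustrative, not the cpm-ant-10b defaults):

```python
>>> import mindspore
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntModel
...
>>> config = CpmAntConfig(vocab_size=128, hidden_size=64, num_attention_heads=4, dim_head=16,
...                       dim_ff=128, num_hidden_layers=2, prompt_types=8, prompt_length=8, segment_types=8)
>>> model = CpmAntModel(config)
...
>>> input_ids = mindspore.Tensor([[11, 12, 13, 14]], mindspore.int32)
>>> outputs = model(input_ids)
>>> tuple(outputs.last_hidden_state.shape) == (1, 4, config.hidden_size)   # prompt positions are dropped
True
```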

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntModel.get_input_embeddings()

Retrieve the input embeddings from the CpmAntModel.

PARAMETER DESCRIPTION
self

CpmAntModel - The instance of the CpmAntModel class.

RETURNS DESCRIPTION

The input embedding module stored in the input_embedding attribute of the CpmAntModel.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def get_input_embeddings(self):
    """
    Retrieve the input embeddings from the CpmAntModel.

    Args:
        self: CpmAntModel - The instance of the CpmAntModel class.

    Returns:
        The input embedding module stored in the input_embedding attribute of the CpmAntModel.

    Raises:
        This method does not raise any exceptions.
    """
    return self.input_embedding

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntModel.set_input_embeddings(embeddings, **kwargs)

Method to set input embeddings for the CpmAntModel.

PARAMETER DESCRIPTION
self

The instance of the CpmAntModel class.

TYPE: CpmAntModel

embeddings

The input embeddings to be set for the model.

  • Type: Any
  • Purpose: Represents the embeddings to be assigned to the input_embedding attribute of the CpmAntModel instance.
  • Restrictions: None

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def set_input_embeddings(self, embeddings, **kwargs):
    """
    Method to set input embeddings for the CpmAntModel.

    Args:
        self (CpmAntModel): The instance of the CpmAntModel class.
        embeddings:
            The input embeddings to be set for the model.

            - Type: Any
            - Purpose: Represents the embeddings to be assigned to the input_embedding attribute of
            the CpmAntModel instance.
            - Restrictions: None

    Returns:
        None.

    Raises:
        None.
    """
    self.input_embedding = embeddings
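
Together, `get_input_embeddings` and `set_input_embeddings` expose the `input_embedding` module so it can be inspected or swapped, for example when tying or resizing embeddings. A short sketch, continuing the small randomly initialized model from the example above:

```python
>>> embeddings = model.get_input_embeddings()
>>> embeddings.weight.shape[-1] == config.hidden_size
True
>>> # re-assigning the same module is a no-op, but shows how a replacement table would be installed
>>> model.set_input_embeddings(embeddings)
>>> model.get_input_embeddings() is embeddings
True
```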

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntOutput

Bases: Module

CpmAntOutput represents a custom module for processing hidden states and input tensors in a CpmAnt model.

This class inherits from nn.Module and includes methods for initializing the module and forwarding the output tensor.

ATTRIBUTE DESCRIPTION
dense

A dense layer for processing hidden states.

TYPE: Linear

LayerNorm

A layer normalization module for normalizing hidden states.

TYPE: LayerNorm

dropout

A dropout module for applying dropout to hidden states.

TYPE: Dropout

METHOD DESCRIPTION
__init__

Initializes the CpmAntOutput module with the provided configuration.

forward

Constructs the output tensor based on the given hidden states and input tensor.

Example
>>> config = Config(intermediate_size=256, hidden_size=512, layer_norm_eps=1e-6)
>>> model = CpmAntOutput(config)
>>> output = model.forward(hidden_states, input_tensor)
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntOutput(nn.Module):

    """
    CpmAntOutput represents a custom module for processing hidden states and input tensors in a CpmAnt model.

    This class inherits from nn.Module and includes methods for initializing the module and forwarding the output tensor.

    Attributes:
        dense (nn.Linear): A dense layer for processing hidden states.
        LayerNorm (nn.LayerNorm): A layer normalization module for normalizing hidden states.
        dropout (nn.Dropout): A dropout module for applying dropout to hidden states.

    Methods:
        __init__(config): Initializes the CpmAntOutput module with the provided configuration.
        forward(hidden_states, input_tensor): Constructs the output tensor based on the given hidden states and input tensor.

    Example:
        ```python
        >>> config = Config(intermediate_size=256, hidden_size=512, layer_norm_eps=1e-6)
        >>> model = CpmAntOutput(config)
        >>> output = model.forward(hidden_states, input_tensor)
        ```
    """
    def __init__(self, config):
        """
        Initializes a new instance of the CpmAntOutput class.

        Args:
            self: The object itself.
            config: An instance of the configuration class containing the model configuration parameters.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

    def forward(self, hidden_states: mindspore.Tensor, input_tensor: mindspore.Tensor) -> mindspore.Tensor:
        """
        Constructs the CpmAntOutput by processing the given hidden states and input tensor.

        Args:
            self (CpmAntOutput): An instance of the CpmAntOutput class.
            hidden_states (mindspore.Tensor): A tensor containing the intermediate hidden states.
                Shape: (batch_size, sequence_length, intermediate_size)
                These are the outputs of the preceding intermediate (feed-forward) layer.
            input_tensor (mindspore.Tensor): A tensor containing the residual input values.
                Shape: (batch_size, sequence_length, hidden_size)
                The input tensor is added to the projected hidden states before layer normalization (residual connection).

        Returns:
            mindspore.Tensor: A tensor representing the processed hidden states.
                Shape: (batch_size, sequence_length, hidden_size)
                The result is obtained by projecting the hidden states with the dense layer, applying dropout,
                adding the input tensor, and normalizing the sum with LayerNorm.

        Raises:
            None.
        """
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntOutput.__init__(config)

Initializes a new instance of the CpmAntOutput class.

PARAMETER DESCRIPTION
self

The object itself.

config

An instance of the configuration class containing the model configuration parameters.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config):
    """
    Initializes a new instance of the CpmAntOutput class.

    Args:
        self: The object itself.
        config: An instance of the configuration class containing the model configuration parameters.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
    self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
    self.dropout = nn.Dropout(p=config.hidden_dropout_prob)

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntOutput.forward(hidden_states, input_tensor)

Constructs the CpmAntOutput by processing the given hidden states and input tensor.

PARAMETER DESCRIPTION
self

An instance of the CpmAntOutput class.

TYPE: CpmAntOutput

hidden_states

A tensor containing the intermediate hidden states. Shape: (batch_size, sequence_length, intermediate_size) These are the outputs of the preceding intermediate (feed-forward) layer.

TYPE: Tensor

input_tensor

A tensor containing the residual input values. Shape: (batch_size, sequence_length, hidden_size) The input tensor is added to the projected hidden states before layer normalization (residual connection).

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

mindspore.Tensor: A tensor representing the processed hidden states. Shape: (batch_size, sequence_length, hidden_size) The result is obtained by projecting the hidden states with the dense layer, applying dropout, adding the input tensor, and normalizing the sum with LayerNorm.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(self, hidden_states: mindspore.Tensor, input_tensor: mindspore.Tensor) -> mindspore.Tensor:
    """
    Constructs the CpmAntOutput by processing the given hidden states and input tensor.

    Args:
        self (CpmAntOutput): An instance of the CpmAntOutput class.
        hidden_states (mindspore.Tensor): A tensor containing the intermediate hidden states.
            Shape: (batch_size, sequence_length, intermediate_size)
            These are the outputs of the preceding intermediate (feed-forward) layer.
        input_tensor (mindspore.Tensor): A tensor containing the residual input values.
            Shape: (batch_size, sequence_length, hidden_size)
            The input tensor is added to the projected hidden states before layer normalization (residual connection).

    Returns:
        mindspore.Tensor: A tensor representing the processed hidden states.
            Shape: (batch_size, sequence_length, hidden_size)
            The result is obtained by projecting the hidden states with the dense layer, applying dropout,
            adding the input tensor, and normalizing the sum with LayerNorm.

    Raises:
        None.
    """
    hidden_states = self.dense(hidden_states)
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.LayerNorm(hidden_states + input_tensor)
    return hidden_states
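
In short, `CpmAntOutput.forward` projects the intermediate activations back to `hidden_size`, applies dropout, and layer-normalizes the sum with the residual `input_tensor`. A shape-only sketch, using `types.SimpleNamespace` as a stand-in for a configuration that carries the `intermediate_size`, `hidden_size`, `layer_norm_eps`, and `hidden_dropout_prob` fields read by the constructor:

```python
>>> from types import SimpleNamespace
>>> from mindspore import ops
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntOutput
...
>>> cfg = SimpleNamespace(intermediate_size=256, hidden_size=64, layer_norm_eps=1e-6, hidden_dropout_prob=0.1)
>>> layer = CpmAntOutput(cfg)
>>> hidden_states = ops.ones((2, 5, 256))   # intermediate activations
>>> input_tensor = ops.ones((2, 5, 64))     # residual branch
>>> tuple(layer(hidden_states, input_tensor).shape) == (2, 5, 64)
True
```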

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = CpmAntConfig
    base_model_prefix = "cpmant"

    def _init_weights(self, cell):
        """Initialize the weights"""
        std = self.config.init_std
        if isinstance(cell, nn.Linear):
            cell.weight.set_data(initializer(Normal(std), cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(0.0, std, cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, CpmAntLayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
        elif isinstance(cell, CpmAntSegmentPositionEmbedding):
            cell.relative_attention_bias.set_data(initializer(
                Normal(std), cell.relative_attention_bias.shape, cell.relative_attention_bias.dtype))
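
`_init_weights` draws `nn.Linear`, `nn.Embedding`, and the relative attention bias from a normal distribution with standard deviation `config.init_std`, zeros the biases, and sets LayerNorm scales to one. Assuming the post-init hook applies this method when a model is built from a config (as in the upstream transformers implementation), the effect can be spot-checked on a tiny randomly initialized model (sizes and tolerance below are arbitrary):

```python
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntModel
...
>>> config = CpmAntConfig(vocab_size=128, hidden_size=64, num_attention_heads=4, dim_head=16,
...                       dim_ff=128, num_hidden_layers=1, init_std=0.02)
>>> model = CpmAntModel(config)
>>> weight = model.input_embedding.weight.asnumpy()
>>> abs(float(weight.std()) - config.init_std) < 0.01   # embeddings drawn from Normal(0, init_std)
True
```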

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSegmentPositionEmbedding

Bases: Module

This class represents a segment position embedding module for the CPM-ANT model. It is used to generate embeddings that encode the relative positions of segments in the input tensors.

The class inherits from the nn.Module class.

ATTRIBUTE DESCRIPTION
num_heads

The number of attention heads in the model.

TYPE: int

num_buckets

The number of buckets used for segment relative positions.

TYPE: int

max_distance

The maximum distance allowed for segment relative positions.

TYPE: int

num_segments

The number of segment types in the model.

TYPE: int

relative_attention_bias

The parameter used to compute the relative attention bias.

TYPE: Parameter

METHOD DESCRIPTION
__init__

Initializes the CpmAntSegmentPositionEmbedding instance with the provided configuration.

forward

Constructs the segment position embeddings based on the input key and query positions and segments.

_segment_relative_position_bucket

Computes the segment relative position bucket.

_position_bucket

Computes the position bucket.

Detailed Description

The CpmAntSegmentPositionEmbedding class is used to compute segment position embeddings for the CPM-ANT model. These embeddings encode the relative positions between different segments in the input tensors.

The class takes a configuration object (CpmAntConfig) as input during initialization. This configuration object contains various parameters such as the number of attention heads, the number of buckets for segment relative positions, the maximum distance allowed for segment relative positions, and the number of segment types in the model.

The forward method is the main function of this class. It takes four input tensors: key_pos, query_pos, key_segment, and query_segment. These tensors represent the positions and segments of the key and query elements. The method checks the shapes of the input tensors and raises an AssertionError if they are not compatible. It then performs various operations to compute the relative position bucket and the position bucket. Finally, it uses the computed embeddings to generate the segment position embeddings.

The _segment_relative_position_bucket method computes the segment relative position bucket based on the query and key segments.

The _position_bucket method computes the position bucket based on the relative position, the number of buckets, and the maximum distance.

Note

This class assumes the availability of the following modules: mindspore, math.

Example
>>> config = CpmAntConfig()
>>> segment_embedding = CpmAntSegmentPositionEmbedding(config)
>>> key_pos = mindspore.Tensor(...)
>>> query_pos = mindspore.Tensor(...)
>>> key_segment = mindspore.Tensor(...)
>>> query_segment = mindspore.Tensor(...)
>>> embeddings = segment_embedding.forward(key_pos, query_pos, key_segment, query_segment)
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntSegmentPositionEmbedding(nn.Module):

    """
    This class represents a segment position embedding module for the CPM-ANT model.
    It is used to generate embeddings that encode the relative positions of segments in the input tensors.

    The class inherits from the nn.Module class.

    Attributes:
        num_heads (int): The number of attention heads in the model.
        num_buckets (int): The number of buckets used for segment relative positions.
        max_distance (int): The maximum distance allowed for segment relative positions.
        num_segments (int): The number of segment types in the model.
        relative_attention_bias (mindspore.Parameter): The parameter used to compute the relative attention bias.

    Methods:
        __init__: Initializes the CpmAntSegmentPositionEmbedding instance with the provided configuration.
        forward: Constructs the segment position embeddings based on the input key and query positions and segments.
        _segment_relative_position_bucket: Computes the segment relative position bucket.
        _position_bucket: Computes the position bucket.

    Detailed Description:
        The CpmAntSegmentPositionEmbedding class is used to compute segment position embeddings for the CPM-ANT model.
        These embeddings encode the relative positions between different segments in the input tensors.

        The class takes a configuration object (CpmAntConfig) as input during initialization.
        This configuration object contains various parameters such as the number of attention heads, the number of buckets for
        segment relative positions, the maximum distance allowed for segment relative positions, and the number of segment types in the model.

        The forward method is the main function of this class.
        It takes four input tensors: key_pos, query_pos, key_segment, and query_segment.
        These tensors represent the positions and segments of the key and query elements.
        The method checks the shapes of the input tensors and raises an AssertionError if they are not compatible.
        It then computes the segment relative position bucket and the absolute position bucket.
        Finally, it uses the computed embeddings to generate the segment position embeddings.

        The _segment_relative_position_bucket method computes the segment relative position bucket based on the query and key segments.

        The _position_bucket method computes the position bucket based on the relative position, the number of buckets, and the maximum distance.

    Note:
        This class assumes the availability of the following modules: mindspore, math.

    Example:
        ```python
        >>> config = CpmAntConfig()
        >>> segment_embedding = CpmAntSegmentPositionEmbedding(config)
        >>> key_pos = mindspore.Tensor(...)
        >>> query_pos = mindspore.Tensor(...)
        >>> key_segment = mindspore.Tensor(...)
        >>> query_segment = mindspore.Tensor(...)
        >>> embeddings = segment_embedding.forward(key_pos, query_pos, key_segment, query_segment)
        ```
    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes an instance of the CpmAntSegmentPositionEmbedding class.

        Args:
            self: The instance of the class.
            config (CpmAntConfig):
                The configuration object containing the parameters for the segment position embedding.

                - num_heads (int): The number of attention heads.
                - num_buckets (int): The number of buckets for the position bias.
                - max_distance (int): The maximum distance for the position bias.
                - num_segments (int): The number of segment types.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()

        self.num_heads = config.num_attention_heads
        self.num_buckets = config.position_bias_num_buckets
        self.max_distance = config.position_bias_max_distance
        self.num_segments = config.segment_types

        self.relative_attention_bias = Parameter(
            ops.zeros(
                config.segment_types * config.segment_types + config.position_bias_num_buckets,
                config.num_attention_heads,
            )
        )

    def forward(
        self,
        key_pos: mindspore.Tensor,
        query_pos: mindspore.Tensor,
        key_segment: mindspore.Tensor,
        query_segment: mindspore.Tensor,
    ):
        """
        Constructs the segment position embedding for the CpmAntSegmentPositionEmbedding class.

        Args:
            self: An instance of the CpmAntSegmentPositionEmbedding class.
            key_pos (mindspore.Tensor): A tensor representing the positions of the keys. Its shape is (batch, keylen).
            query_pos (mindspore.Tensor): A tensor representing the positions of the queries. Its shape is (batch, querylen).
            key_segment (mindspore.Tensor): A tensor representing the segments of the keys. Its shape is (batch, keylen).
            query_segment (mindspore.Tensor): A tensor representing the segments of the queries. Its shape is (batch, querylen).

        Returns:
            mindspore.Tensor: The relative position bias of shape (batch, num_heads, len_q, len_k).

        Raises:
            AssertionError: If key_pos.shape[0] is not equal to query_pos.shape[0].
            AssertionError: If keylen is not equal to key_segment.shape[1] or querylen is not equal to query_segment.shape[1].
            AssertionError: If querylen is not equal to query_segment.shape[1].
        """
        batch = key_pos.shape[0]
        keylen = key_pos.shape[1]
        querylen = query_pos.shape[1]

        if key_pos.shape[0] != query_pos.shape[0]:
            raise AssertionError(
                f"key_pos.shape[0] should be equal to query_pos.shape[0], but got {key_pos.shape[0]} and {query_pos.shape[0]}!"
            )
        if keylen != key_segment.shape[1] or querylen != query_segment.shape[1]:
            raise AssertionError(
                f"keylen should be equal to key_segment.shape[1], but got {keylen} and {key_segment.shape[1]}!"
            )
        if querylen != query_segment.shape[1]:
            raise AssertionError(
                f"querylen should be equal to query_segment.shape[1], but got {querylen} and {query_segment.szie(1)}!"
            )

        key_pos = key_pos.view(batch, -1, keylen)
        query_pos = query_pos.view(batch, querylen, -1)
        key_segment = key_segment.view(batch, -1, keylen)
        query_segment = query_segment.view(batch, querylen, -1)

        relative_position_bucket = self._segment_relative_position_bucket(query_segment, key_segment)
        relative_position_bucket = relative_position_bucket + self.num_buckets

        # (batch, len_q, len_k)
        absolute_position_bucket = self._position_bucket(
            ops.arange(keylen, dtype=mindspore.int32)[None, :]
            - ops.arange(querylen, dtype=mindspore.int32)[:, None],
            num_buckets=self.num_buckets,
            max_distance=self.max_distance,
        )
        relative_position_bucket = ops.where(
            (key_segment == query_segment),
            absolute_position_bucket[None, :, :],
            relative_position_bucket,
        )

        # (batch, len_q, len_k, num_heads)
        embeds = F.embedding(relative_position_bucket, self.relative_attention_bias)
        # (batch, num_heads, len_q, len_k)
        embeds = embeds.permute(0, 3, 1, 2)
        return embeds

    def _segment_relative_position_bucket(self, query_segment, key_segment):
        """
        Method to calculate the relative position bucket between a query segment and a key segment.

        Args:
            self (CpmAntSegmentPositionEmbedding): An instance of the CpmAntSegmentPositionEmbedding class.
            query_segment (mindspore.Tensor): The segment ids of the queries.
            key_segment (mindspore.Tensor): The segment ids of the keys.

        Returns:
            mindspore.Tensor: The combined bucket ids, computed as query_segment * num_segments + key_segment.

        Raises:
            None: This method does not raise any exceptions.
        """
        return query_segment * self.num_segments + key_segment

    def _position_bucket(self, relative_position, num_buckets=32, max_distance=128):
        """
        Position bucket calculation.

        Args:
            self (CpmAntSegmentPositionEmbedding): The instance of the CpmAntSegmentPositionEmbedding class.
            relative_position (Tensor): The relative position for which the bucket is calculated.
            num_buckets (int): The total number of buckets to be used for bucketing the relative positions. Default is 32.
            max_distance (int): The maximum distance considered for bucketing. Default is 128.

        Returns:
            Tensor: The calculated relative bucket positions.

        Raises:
            ValueError: If the relative_position tensor is not valid or if any of the input parameters are invalid.
            TypeError: If the input parameters are not of the expected types.
            RuntimeError: If there is a runtime error during the bucket calculation process.
        """
        relative_buckets = 0
        # always bidirectional in CPMAnt
        num_buckets //= 2
        relative_buckets = (relative_position > 0).to(mindspore.int32) * num_buckets
        relative_position = ops.abs(relative_position)
        max_exact = num_buckets // 2
        is_small = relative_position < max_exact
        relative_postion_if_large = max_exact + (
            ops.log(relative_position.float() / max_exact)
            / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        ).to(mindspore.int32)
        relative_postion_if_large = ops.minimum(
            relative_postion_if_large,
            ops.full_like(relative_postion_if_large, num_buckets - 1),
        )
        relative_buckets += ops.where(is_small, relative_position.to(mindspore.int32), relative_postion_if_large)
        return relative_buckets
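
The `_position_bucket` helper maps a signed relative offset to one of `num_buckets` ids: half of the buckets encode the sign of the offset, nearby offsets keep exact buckets, and larger offsets are binned on a log scale up to `max_distance`. A plain-Python sketch of the same mapping for a single offset (illustrative only, not part of the mindnlp API):

```python
import math

def position_bucket(relative_position: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    # bidirectional: the upper half of the buckets is reserved for positive offsets
    num_buckets //= 2
    bucket = num_buckets if relative_position > 0 else 0
    relative_position = abs(relative_position)
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        return bucket + relative_position      # small offsets keep exact buckets
    # larger offsets are placed logarithmically, saturating at max_distance
    if_large = max_exact + int(
        math.log(relative_position / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(if_large, num_buckets - 1)

print([position_bucket(p) for p in (-130, -8, -1, 0, 1, 8, 130)])   # [15, 8, 1, 0, 17, 24, 31]
```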

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSegmentPositionEmbedding.__init__(config)

Initializes an instance of the CpmAntSegmentPositionEmbedding class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing the parameters for the segment position embedding.

  • num_heads (int): The number of attention heads.
  • num_buckets (int): The number of buckets for the position bias.
  • max_distance (int): The maximum distance for the position bias.
  • num_segments (int): The number of segment types.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes an instance of the CpmAntSegmentPositionEmbedding class.

    Args:
        self: The instance of the class.
        config (CpmAntConfig):
            The configuration object containing the parameters for the segment position embedding.

            - num_heads (int): The number of attention heads.
            - num_buckets (int): The number of buckets for the position bias.
            - max_distance (int): The maximum distance for the position bias.
            - num_segments (int): The number of segment types.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()

    self.num_heads = config.num_attention_heads
    self.num_buckets = config.position_bias_num_buckets
    self.max_distance = config.position_bias_max_distance
    self.num_segments = config.segment_types

    self.relative_attention_bias = Parameter(
        ops.zeros(
            config.segment_types * config.segment_types + config.position_bias_num_buckets,
            config.num_attention_heads,
        )
    )

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSegmentPositionEmbedding.forward(key_pos, query_pos, key_segment, query_segment)

Constructs the segment position embedding for the CpmAntSegmentPositionEmbedding class.

PARAMETER DESCRIPTION
self

An instance of the CpmAntSegmentPositionEmbedding class.

key_pos

A tensor representing the positions of the keys. Its shape is (batch, keylen).

TYPE: Tensor

query_pos

A tensor representing the positions of the queries. Its shape is (batch, querylen).

TYPE: Tensor

key_segment

A tensor representing the segments of the keys. Its shape is (batch, keylen).

TYPE: Tensor

query_segment

A tensor representing the segments of the queries. Its shape is (batch, querylen).

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: The relative position bias of shape (batch, num_heads, len_q, len_k).

RAISES DESCRIPTION
AssertionError

If key_pos.shape[0] is not equal to query_pos.shape[0].

AssertionError

If keylen is not equal to key_segment.shape[1] or querylen is not equal to query_segment.shape[1].

AssertionError

If querylen is not equal to query_segment.shape[1].

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    key_pos: mindspore.Tensor,
    query_pos: mindspore.Tensor,
    key_segment: mindspore.Tensor,
    query_segment: mindspore.Tensor,
):
    """
    Constructs the segment position embedding for the CpmAntSegmentPositionEmbedding class.

    Args:
        self: An instance of the CpmAntSegmentPositionEmbedding class.
        key_pos (mindspore.Tensor): A tensor representing the positions of the keys. Its shape is (batch, keylen).
        query_pos (mindspore.Tensor): A tensor representing the positions of the queries. Its shape is (batch, querylen).
        key_segment (mindspore.Tensor): A tensor representing the segments of the keys. Its shape is (batch, keylen).
        query_segment (mindspore.Tensor): A tensor representing the segments of the queries. Its shape is (batch, querylen).

    Returns:
        mindspore.Tensor: The relative position bias of shape (batch, num_heads, len_q, len_k).

    Raises:
        AssertionError: If key_pos.shape[0] is not equal to query_pos.shape[0].
        AssertionError: If keylen is not equal to key_segment.shape[1] or querylen is not equal to query_segment.shape[1].
        AssertionError: If querylen is not equal to query_segment.shape[1].
    """
    batch = key_pos.shape[0]
    keylen = key_pos.shape[1]
    querylen = query_pos.shape[1]

    if key_pos.shape[0] != query_pos.shape[0]:
        raise AssertionError(
            f"key_pos.shape[0] should be equal to query_pos.shape[0], but got {key_pos.shape[0]} and {query_pos.shape[0]}!"
        )
    if keylen != key_segment.shape[1] or querylen != query_segment.shape[1]:
        raise AssertionError(
            f"keylen should be equal to key_segment.shape[1], but got {keylen} and {key_segment.shape[1]}!"
        )
    if querylen != query_segment.shape[1]:
        raise AssertionError(
            f"querylen should be equal to query_segment.shape[1], but got {querylen} and {query_segment.szie(1)}!"
        )

    key_pos = key_pos.view(batch, -1, keylen)
    query_pos = query_pos.view(batch, querylen, -1)
    key_segment = key_segment.view(batch, -1, keylen)
    query_segment = query_segment.view(batch, querylen, -1)

    relative_position_bucket = self._segment_relative_position_bucket(query_segment, key_segment)
    relative_position_bucket = relative_position_bucket + self.num_buckets

    # (batch, len_q, len_k)
    absolute_position_bucket = self._position_bucket(
        ops.arange(keylen, dtype=mindspore.int32)[None, :]
        - ops.arange(querylen, dtype=mindspore.int32)[:, None],
        num_buckets=self.num_buckets,
        max_distance=self.max_distance,
    )
    relative_position_bucket = ops.where(
        (key_segment == query_segment),
        absolute_position_bucket[None, :, :],
        relative_position_bucket,
    )

    # (batch, len_q, len_k, num_heads)
    embeds = F.embedding(relative_position_bucket, self.relative_attention_bias)
    # (batch, num_heads, len_q, len_k)
    embeds = embeds.permute(0, 3, 1, 2)
    return embeds
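
Because the bias table is indexed per (query, key) pair and then permuted, the tensor returned by `forward` has shape `(batch, num_heads, len_q, len_k)` and can be added directly to the attention scores. A shape-only sketch with a reduced configuration (sizes are illustrative):

```python
>>> import mindspore
>>> from mindspore import ops
>>> from mindnlp.transformers.models.cpmant.configuration_cpmant import CpmAntConfig
>>> from mindnlp.transformers.models.cpmant.modeling_cpmant import CpmAntSegmentPositionEmbedding
...
>>> config = CpmAntConfig(num_attention_heads=4, position_bias_num_buckets=32,
...                       position_bias_max_distance=128, segment_types=8)
>>> position_bias = CpmAntSegmentPositionEmbedding(config)
...
>>> pos = ops.arange(6, dtype=mindspore.int32).tile((2, 1))   # (batch=2, len=6)
>>> seg = ops.ones((2, 6), dtype=mindspore.int32)             # a single segment type
>>> out = position_bias(pos, pos, seg, seg)
>>> tuple(out.shape) == (2, config.num_attention_heads, 6, 6)
True
```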

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSelfAttentionBlock

Bases: Module

This class represents a self-attention block used in the CpmAnt model. It is a subclass of the nn.Module class.

ATTRIBUTE DESCRIPTION
layernorm_before_attention

An instance of the CpmAntLayerNorm class that performs layer normalization before the self-attention operation.

TYPE: CpmAntLayerNorm

self_attention

An instance of the CpmAntAttention class that performs the self-attention operation.

TYPE: CpmAntAttention

dropout

An optional dropout layer. If configured, it applies dropout to the outputs.

TYPE: Dropout or None

METHOD DESCRIPTION
__init__

Initializes the CpmAntSelfAttentionBlock instance.

Args:

  • config (CpmAntConfig): The configuration object for the CpmAnt model.
forward

Applies the self-attention block to the given hidden states.

Args:

  • hidden_states (mindspore.Tensor): The input tensor of shape (batch, len_seq, dim_model) representing the hidden states.
  • attention_mask (mindspore.Tensor): The attention mask tensor of shape (batch, len_seq, len_seq) that avoids invalid areas in the self-attention calculation.
  • position_bias (Optional[mindspore.Tensor]): An optional positional bias tensor of shape (batch, len_seq, len_seq) that provides positional information to the self-attention block.
  • output_attentions (Optional[bool]): Whether or not to return the attention tensors of all attention layers.
  • past_key_values (Optional[Tuple[mindspore.Tensor, mindspore.Tensor]]): An optional tuple of past key and value projection states used for caching.
  • use_cache (Optional[bool]): If set to True, the past key and value states in past_key_values are returned and can be used to speed up decoding.

Returns:

  • Tuple[mindspore.Tensor, mindspore.Tensor, mindspore.Tensor]: A tuple containing the updated hidden states, attention weights, and current key-value states.
Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntSelfAttentionBlock(nn.Module):

    """
    This class represents a self-attention block used in the CpmAnt model. It is a subclass of the nn.Module class.

    Attributes:
        layernorm_before_attention (CpmAntLayerNorm):
            An instance of the CpmAntLayerNorm class that performs layer normalization before the self-attention operation.
        self_attention (CpmAntAttention):
            An instance of the CpmAntAttention class that performs the self-attention operation.
        dropout (nn.Dropout or None): An optional dropout layer. If configured, it applies dropout to the outputs.

    Methods:
        __init__: Initializes the CpmAntSelfAttentionBlock instance.

            Args:

            - config (CpmAntConfig): The configuration object for the CpmAnt model.

        forward: Applies the self-attention block to the given hidden states.

            Args:

            - hidden_states (mindspore.Tensor): The input tensor of shape `(batch, len_seq, dim_model)` representing the hidden states.
            - attention_mask (mindspore.Tensor): The attention mask tensor of shape `(batch, len_seq, len_seq)` that avoids invalid areas in the self-attention calculation.
            - position_bias (Optional[mindspore.Tensor]): An optional positional bias tensor of shape `(batch, len_seq, len_seq)` that provides positional information to the self-attention block.
            - output_attentions (Optional[bool]): Whether or not to return the attention tensors of all attention layers.
            - past_key_values (Optional[Tuple[mindspore.Tensor, mindspore.Tensor]]): An optional tuple of past key and value projection states used for caching.
            - use_cache (Optional[bool]): If set to `True`, the past key and value states in `past_key_values` are returned and can be used to speed up decoding.

            Returns:

            - Tuple[mindspore.Tensor, mindspore.Tensor, mindspore.Tensor]: A tuple containing the updated hidden states, attention weights, and current key-value states.
    """
    def __init__(self, config: CpmAntConfig):
        """
        This method initializes a CpmAntSelfAttentionBlock instance.

        Args:
            self (CpmAntSelfAttentionBlock): The instance of the CpmAntSelfAttentionBlock class.
            config (CpmAntConfig): The configuration object containing settings for the self-attention block.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.layernorm_before_attention = CpmAntLayerNorm(config)
        self.self_attention = CpmAntAttention(config)
        if config.dropout_p:
            self.dropout = nn.Dropout(p=config.dropout_p)
        else:
            self.dropout = None

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
                Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
            attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Avoid invalid areas to participate in the calculation of self-attention.
            position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
                Provide positional information to self-attention block.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple(torch.FloatTensor)`, *optional*):
                Cached past key and value projection states.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        outputs = self.layernorm_before_attention(hidden_states)
        outputs = self.self_attention(
            outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
        )

        outputs, attn_weights, current_key_value = outputs

        if self.dropout is not None:
            outputs = self.dropout(outputs)
        hidden_states = hidden_states + outputs

        return hidden_states, attn_weights, current_key_value

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSelfAttentionBlock.__init__(config)

This method initializes a CpmAntSelfAttentionBlock instance.

PARAMETER DESCRIPTION
self

The instance of the CpmAntSelfAttentionBlock class.

TYPE: CpmAntSelfAttentionBlock

config

The configuration object containing settings for the self-attention block.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    This method initializes a CpmAntSelfAttentionBlock instance.

    Args:
        self (CpmAntSelfAttentionBlock): The instance of the CpmAntSelfAttentionBlock class.
        config (CpmAntConfig): The configuration object containing settings for the self-attention block.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.layernorm_before_attention = CpmAntLayerNorm(config)
    self.self_attention = CpmAntAttention(config)
    if config.dropout_p:
        self.dropout = nn.Dropout(p=config.dropout_p)
    else:
        self.dropout = None

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntSelfAttentionBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, dim_model)`

attention_mask

Avoid invalid areas to participate in the calculation of self-attention.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)`

position_bias

Provide positional information to self-attention block.

TYPE: `mindspore.Tensor` of shape `(batch, len_seq, len_seq)` DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states.

TYPE: `Tuple(torch.FloatTensor)`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor` of shape `(batch, len_seq, dim_model)`):
            Input of the transformer block (self-attention block). It can be the raw embedding of a batch of sequences.
        attention_mask (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Avoid invalid areas to participate in the calculation of self-attention.
        position_bias (`mindspore.Tensor` of shape `(batch, len_seq, len_seq)`):
            Provide positional information to self-attention block.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple(torch.FloatTensor)`, *optional*):
            Cached past key and value projection states.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    outputs = self.layernorm_before_attention(hidden_states)
    outputs = self.self_attention(
        outputs, outputs, attention_mask, position_bias, output_attentions, past_key_values, use_cache
    )

    outputs, attn_weights, current_key_value = outputs

    if self.dropout is not None:
        outputs = self.dropout(outputs)
    hidden_states = hidden_states + outputs

    return hidden_states, attn_weights, current_key_value
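
The block follows a pre-norm residual layout: the hidden states are normalized first, attention runs on the normalized copy, optional dropout is applied, and the original (unnormalized) input is added back, keeping the residual stream unscaled. A generic stand-alone restatement of that pattern (a hypothetical helper, not part of the mindnlp API):

```python
def pre_norm_residual(x, sublayer, norm, dropout=None):
    """Return x + dropout(sublayer(norm(x))), the pre-norm residual used by CpmAntSelfAttentionBlock."""
    y = sublayer(norm(x))
    if dropout is not None:
        y = dropout(y)
    return x + y
```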

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntTransformerBlock

Bases: Module

This class represents a block of the CpmAntTransformer model, which is a type of transformer used for natural language processing tasks. It inherits from the nn.Module class.

ATTRIBUTE DESCRIPTION
self_att

The self-attention block of the transformer.

TYPE: CpmAntSelfAttentionBlock

ffn

The feed-forward neural network block of the transformer.

TYPE: CpmAntFFNBlock

METHOD DESCRIPTION
__init__

Initializes a new instance of the CpmAntTransformerBlock class.

forward

Constructs the transformer block.

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
class CpmAntTransformerBlock(nn.Module):

    """
    This class represents a block of the CpmAntTransformer model, which is a type of transformer used for
    natural language processing tasks. It inherits from the nn.Module class.

    Attributes:
        self_att (CpmAntSelfAttentionBlock): The self-attention block of the transformer.
        ffn (CpmAntFFNBlock): The feed-forward neural network block of the transformer.

    Methods:
        __init__: Initializes a new instance of the CpmAntTransformerBlock class.
        forward: Constructs the transformer block.

    """
    def __init__(self, config: CpmAntConfig):
        """
        Initializes a new instance of the CpmAntTransformerBlock class.

        Args:
            self: The current instance of the class.
            config (CpmAntConfig): The configuration object for the transformer block.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.self_att = CpmAntSelfAttentionBlock(config)
        self.ffn = CpmAntFFNBlock(config)

    def forward(
        self,
        hidden_states: mindspore.Tensor,
        attention_mask: mindspore.Tensor,
        position_bias: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = False,
        past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
    ):
        """
        Args:
            hidden_states (`mindspore.Tensor`):
                Input to the layer of shape `(batch, seq_len, dim_model)`
            attention_mask (`mindspore.Tensor`):
                Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
            position_bias (`mindspore.Tensor`):
                Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers.
            past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
                Cached past key and value projection states
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
        """
        hidden_states = self.self_att(
            hidden_states,
            attention_mask=attention_mask,
            position_bias=position_bias,
            output_attentions=output_attentions,
            past_key_values=past_key_values,
            use_cache=use_cache,
        )

        hidden_states, attn_weights, current_key_value = hidden_states

        hidden_states = self.ffn(hidden_states)

        return hidden_states, attn_weights, current_key_value

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntTransformerBlock.__init__(config)

Initializes a new instance of the CpmAntTransformerBlock class.

PARAMETER DESCRIPTION
self

The current instance of the class.

config

The configuration object for the transformer block.

TYPE: CpmAntConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def __init__(self, config: CpmAntConfig):
    """
    Initializes a new instance of the CpmAntTransformerBlock class.

    Args:
        self: The current instance of the class.
        config (CpmAntConfig): The configuration object for the transformer block.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.self_att = CpmAntSelfAttentionBlock(config)
    self.ffn = CpmAntFFNBlock(config)

mindnlp.transformers.models.cpmant.modeling_cpmant.CpmAntTransformerBlock.forward(hidden_states, attention_mask, position_bias=None, output_attentions=False, past_key_values=None, use_cache=None)

PARAMETER DESCRIPTION
hidden_states

Input to the layer of shape (batch, seq_len, dim_model)

TYPE: `mindspore.Tensor`

attention_mask

Avoid invalid areas to participate in the calculation of shape (batch, seq_len, seq_len)

TYPE: `mindspore.Tensor`

position_bias

Provides position information to attention mechanism of shape (num_heads, seq_len, seq_len)

TYPE: `mindspore.Tensor` DEFAULT: None

output_attentions

Whether or not to return the attentions tensors of all attention layers.

TYPE: `bool`, *optional* DEFAULT: False

past_key_values

Cached past key and value projection states

TYPE: `Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional* DEFAULT: None

use_cache

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/cpmant/modeling_cpmant.py
def forward(
    self,
    hidden_states: mindspore.Tensor,
    attention_mask: mindspore.Tensor,
    position_bias: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = False,
    past_key_values: Optional[Tuple[mindspore.Tensor, mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
):
    """
    Args:
        hidden_states (`mindspore.Tensor`):
            Input to the layer of shape `(batch, seq_len, dim_model)`
        attention_mask (`mindspore.Tensor`):
            Avoid invalid areas to participate in the calculation of shape `(batch, seq_len, seq_len)`
        position_bias (`mindspore.Tensor`):
            Provides position information to attention mechanism of shape `(num_heads, seq_len, seq_len)`
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers.
        past_key_values (`Tuple[mindspore.Tensor, mindspore.Tensor])`, *optional*):
            Cached past key and value projection states
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
            (see `past_key_values`).
    """
    hidden_states = self.self_att(
        hidden_states,
        attention_mask=attention_mask,
        position_bias=position_bias,
        output_attentions=output_attentions,
        past_key_values=past_key_values,
        use_cache=use_cache,
    )

    hidden_states, attn_weights, current_key_value = hidden_states

    hidden_states = self.ffn(hidden_states)

    return hidden_states, attn_weights, current_key_value
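
Putting the pieces together, each transformer block is simply the self-attention sublayer followed by the FFN sublayer: the attention step produces the updated hidden states along with the attention weights and the present key/value cache, and the FFN step transforms only the hidden states. A schematic restatement of that data flow (a hypothetical helper, not the mindnlp API):

```python
def run_transformer_block(self_att, ffn, hidden_states, attention_mask,
                          position_bias=None, past_key_values=None, use_cache=None):
    # self-attention returns (hidden_states, attn_weights, present key/value states)
    hidden_states, attn_weights, present_kv = self_att(
        hidden_states,
        attention_mask=attention_mask,
        position_bias=position_bias,
        past_key_values=past_key_values,
        use_cache=use_cache,
    )
    # the feed-forward block only transforms the hidden states
    hidden_states = ffn(hidden_states)
    return hidden_states, attn_weights, present_kv
```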