
mindnlp.transformers.models.gpt.modeling_gpt

MindSpore OpenAI GPT model.

mindnlp.transformers.models.gpt.modeling_gpt.Attention

Bases: Module

This class represents an attention mechanism used in neural networks. It is designed to be used as a part of a larger model and inherits from the nn.Module class.

ATTRIBUTE DESCRIPTION
bias

A lower-triangular (causal) mask tensor applied to the attention scores.

TYPE: Tensor

n_head

The number of attention heads.

TYPE: int

split_size

The size of each split in the attention mechanism.

TYPE: int

scale

A flag indicating whether to scale the attention weights.

TYPE: bool

c_attn

A Conv1D layer that projects the input into the concatenated query, key, and value representations.

TYPE: Conv1D

c_proj

A Conv1D layer that projects the merged attention output back to the embedding dimension.

TYPE: Conv1D

attn_dropout

A dropout layer applied to the attention weights.

TYPE: Dropout

resid_dropout

A dropout layer applied to the projected attention weights.

TYPE: Dropout

pruned_heads

A set of pruned attention heads.

TYPE: set

METHOD DESCRIPTION
__init__

Initializes the Attention object.

prune_heads

Prunes the specified attention heads.

_attn

Computes the attention weights.

merge_heads

Merges the attention heads.

split_heads

Splits the input into multiple attention heads.

forward

Constructs the attention mechanism.

Note
  • The Attention class assumes that the input tensors follow specific shapes and sizes. It is important to ensure that the input data is compatible with the class implementation.
  • The Attention class should be used as part of a larger model and is not intended to be used as a standalone component.
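For orientation, the following is a minimal usage sketch, not an official example: it assumes a working MindSpore installation, fabricates a small config stub carrying only the fields the constructor below reads (n_head, attn_pdrop, resid_pdrop), and uses the import path and call signature shown on this page.

```python
# Hypothetical usage sketch; the config stub, sizes, and input shapes are illustrative assumptions.
from types import SimpleNamespace

import numpy as np
import mindspore
from mindnlp.transformers.models.gpt.modeling_gpt import Attention

config = SimpleNamespace(n_head=12, attn_pdrop=0.1, resid_pdrop=0.1)  # minimal stub
attn = Attention(nx=768, n_positions=512, config=config, scale=True)

# Hidden states of shape (batch_size, seq_len, n_embd)
x = mindspore.Tensor(np.random.randn(2, 16, 768).astype(np.float32))
hidden, weights = attn(x, output_attentions=True)
print(hidden.shape)   # (2, 16, 768)
print(weights.shape)  # (2, 12, 16, 16) -- one attention map per head
```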
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class Attention(nn.Module):

    """
    This class represents an attention mechanism used in neural networks. 
    It is designed to be used as a part of a larger model and inherits from the nn.Module class.

    Attributes:
        bias (Tensor): A lower-triangular (causal) mask tensor applied to the attention scores.
        n_head (int): The number of attention heads.
        split_size (int): The size of each split in the attention mechanism.
        scale (bool): A flag indicating whether to scale the attention weights.
        c_attn (Conv1D): A Conv1D layer that projects the input into the concatenated query, key, and value representations.
        c_proj (Conv1D): A Conv1D layer that projects the merged attention output back to the embedding dimension.
        attn_dropout (Dropout): A dropout layer applied to the attention weights.
        resid_dropout (Dropout): A dropout layer applied to the projected attention weights.
        pruned_heads (set): A set of pruned attention heads.

    Methods:
        __init__: Initializes the Attention object.
        prune_heads: Prunes the specified attention heads.
        _attn: Computes the attention weights.
        merge_heads: Merges the attention heads.
        split_heads: Splits the input into multiple attention heads.
        forward: Constructs the attention mechanism.

    Note:
        - The Attention class assumes that the input tensors follow specific shapes and sizes. 
        It is important to ensure that the input data is compatible with the class implementation.
        - The Attention class should be used as part of a larger model and is not intended to be used as a 
        standalone component.
    """
    def __init__(self, nx, n_positions, config, scale=False):
        """
        Initialize the Attention class.

        Args:
            self (Attention): The instance of the Attention class.
            nx (int): The size of the input state.
            n_positions (int): The number of positions.
            config (object): An object containing configuration settings.
            scale (bool): Flag indicating whether to scale the output. Default is False.

        Returns:
            None.

        Raises:
            ValueError: If the input state size (n_state) is not divisible by the number of attention heads specified 
                in the configuration.
        """
        super().__init__()
        n_state = nx  # in Attention: n_state=768 (nx=n_embd)
        # [switch nx => n_state from Block to Attention to keep identical to TF implementation]
        if n_state % config.n_head != 0:
            raise ValueError(f"Attention n_state shape: {n_state} must be divisible by config.n_head {config.n_head}")
        self.bias = ops.tril(ops.ones((n_positions, n_positions))).view(1, 1, n_positions, n_positions)

        self.n_head = config.n_head
        self.split_size = n_state
        self.scale = scale

        self.c_attn = Conv1D(n_state * 3, nx)
        self.c_proj = Conv1D(n_state, nx)
        self.attn_dropout = nn.Dropout(p=config.attn_pdrop)
        self.resid_dropout = nn.Dropout(p=config.resid_pdrop)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        Method: prune_heads

        Description:
            This method prunes the attention heads based on the input 'heads' list and updates the necessary attributes 
            of the Attention class accordingly.

        Args:
            self (Attention): 
                An instance of the Attention class.

                - Type: Attention class object
                - Purpose: Represents the current instance of the Attention class where the pruning operation is to 
                be applied.

            heads (list): 
                The list of attention heads to be pruned.

                - Type: List
                - Purpose: Contains the indices of the attention heads to be pruned.
                - Restrictions: Should be a non-empty list of integers representing valid attention head indices.

        Returns:
            None:
                - Type: None
                - Purpose: The method does not return any value explicitly but updates the attributes of the Attention 
                class in place.

        Raises:
            None.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.n_head, self.split_size // self.n_head, self.pruned_heads
        )
        index_attn = ops.cat([index, index + self.split_size, index + (2 * self.split_size)])
        # Prune conv1d layers
        self.c_attn = prune_conv1d_layer(self.c_attn, index_attn, dim=1)
        self.c_proj = prune_conv1d_layer(self.c_proj, index, dim=0)
        # Update hyper params
        self.split_size = (self.split_size // self.n_head) * (self.n_head - len(heads))
        self.n_head = self.n_head - len(heads)
        self.pruned_heads = self.pruned_heads.union(heads)

    def _attn(self, q, k, v, attention_mask=None, head_mask=None, output_attentions=False):
        """
        Method _attn in the class Attention calculates attention weights based on the input query (q), key (k),
        and value (v) tensors.

        Args:
            self (Attention): The instance of the Attention class.
            q (Tensor): The input query tensor.
            k (Tensor): The input key tensor.
            v (Tensor): The input value tensor.
            attention_mask (Tensor, optional):
                A mask tensor to mask certain positions in the attention weights. Default is None.
            head_mask (Tensor, optional): A mask tensor to mask certain heads in multi-head attention. Default is None.
            output_attentions (bool, optional): A flag indicating whether to output the attention weights. Default is False.

        Returns:
            outputs (List[Tensor]): A list containing the output tensor representing the weighted values.
                If output_attentions is True, the list also includes the attention weights tensor.

        Raises:
            ValueError: If the dimensionality of the input tensors is not compatible for matrix multiplication.
            ValueError: If the dimensions of the bias tensor are not compatible with the computed attention weights.
            TypeError: If any of the input tensors are not of type Tensor.
            TypeError: If head_mask is provided but not of type Tensor.
            TypeError: If output_attentions is provided but not of type bool.
        """
        w = ops.matmul(q, k)
        if self.scale:
            w = w / math.sqrt(v.shape[-1])
        # w = w * self.bias + -1e9 * (1 - self.bias)  # TF implementation method: mask_attn_weights
        # XD: self.b may be larger than w, so we need to crop it
        b = self.bias[:, :, : w.shape[-2], : w.shape[-1]]
        w = w * b + -1e4 * (1 - b)

        if attention_mask is not None:
            # Apply the attention mask
            w = w + attention_mask

        w = ops.softmax(w, axis=-1)
        w = self.attn_dropout(w)

        # Mask heads if we want to
        if head_mask is not None:
            w = w * head_mask

        outputs = [ops.matmul(w, v)]
        if output_attentions:
            outputs.append(w)
        return outputs

    def merge_heads(self, x):
        """
        Merge the heads of the attention mechanism.

        Args:
            self: An instance of the Attention class.
            x: A tensor representing the input data.
                It should have a shape of (batch_size, num_heads, seq_len, head_dim).

        Returns:
            Tensor: The merged tensor of shape (batch_size, seq_len, num_heads * head_dim).

        Raises:
            None
        """
        x = x.permute(0, 2, 1, 3)
        new_x_shape = x.shape[:-2] + (x.shape[-2] * x.shape[-1],)
        return x.view(*new_x_shape)  # in Tensorflow implementation: fct merge_states

    def split_heads(self, x, k=False):
        """
        Splits the input tensor into multiple "head" tensors along the last dimension.

        Args:
            self (Attention): The instance of the Attention class.
            x (Tensor): The input tensor to be split into multiple "head" tensors.
                It should have a shape of (batch_size, seq_len, d_model).
            k (bool, optional): A boolean flag indicating whether to transpose the dimensions of the output tensors.
                Default is False.

        Returns:
            Tensor: If `k` is True, a tensor of shape (batch_size, n_head, d_model/n_head, seq_len),
                which is the layout expected for keys.
                If `k` is False, a tensor of shape (batch_size, n_head, seq_len, d_model/n_head).

        Raises:
            None: This method does not raise any exceptions.
        """
        new_x_shape = x.shape[:-1] + (self.n_head, x.shape[-1] // self.n_head)
        x = x.view(*new_x_shape)  # in Tensorflow implementation: fct split_states
        if k:
            return x.permute(0, 2, 3, 1)
        return x.permute(0, 2, 1, 3)

    def forward(self, x, attention_mask=None, head_mask=None, output_attentions=False):
        """
        This method 'forward' in the class 'Attention' processes the input data 'x' through attention mechanisms.

        Args:
            self (object): The instance of the Attention class.
            x (tensor): The input data tensor to be processed.
            attention_mask (tensor, optional):
                An optional mask tensor for masking out certain elements during attention computation.
            head_mask (tensor, optional): An optional mask tensor for masking out specific attention heads.
            output_attentions (bool): A flag indicating whether to output attention weights.

        Returns:
            list: A list whose first element is the projected attention output tensor;
                if output_attentions is True, the attention weights tensor is appended.

        Raises:
            ValueError: If the provided 'x' tensor is not compatible for processing.
            RuntimeError: If an error occurs during the attention mechanism computations.
            TypeError: If incorrect data types are provided for the input parameters.
        """
        x = self.c_attn(x)
        query, key, value = x.split(self.split_size, axis=2)
        query = self.split_heads(query)
        key = self.split_heads(key, k=True)
        value = self.split_heads(value)

        attn_outputs = self._attn(query, key, value, attention_mask, head_mask, output_attentions)
        a = attn_outputs[0]

        a = self.merge_heads(a)
        a = self.c_proj(a)
        a = self.resid_dropout(a)

        outputs = [a] + attn_outputs[1:]
        return outputs  # a, (attentions)
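The private helper _attn above applies a lower-triangular (causal) mask to the scaled dot-product scores before the softmax. The NumPy-only sketch below mirrors that arithmetic for illustration; names and shapes are assumptions, and it is not library code.

```python
# NumPy illustration of the masked attention arithmetic in `_attn` (not library code).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch, n_head, seq_len, head_dim = 1, 2, 4, 8
q = np.random.randn(batch, n_head, seq_len, head_dim)
k = np.random.randn(batch, n_head, head_dim, seq_len)  # keys arrive pre-transposed (split_heads with k=True)
v = np.random.randn(batch, n_head, seq_len, head_dim)

w = q @ k / np.sqrt(head_dim)                          # scaled dot-product scores
b = np.tril(np.ones((seq_len, seq_len)))[None, None]   # causal mask, like self.bias
w = w * b + -1e4 * (1 - b)                             # masked positions get a large negative score
w = softmax(w, axis=-1)
context = w @ v
print(context.shape)                                   # (1, 2, 4, 8)
```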

mindnlp.transformers.models.gpt.modeling_gpt.Attention.__init__(nx, n_positions, config, scale=False)

Initialize the Attention class.

PARAMETER DESCRIPTION
self

The instance of the Attention class.

TYPE: Attention

nx

The size of the input state.

TYPE: int

n_positions

The number of positions.

TYPE: int

config

An object containing configuration settings.

TYPE: object

scale

Flag indicating whether to scale the output. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the input state size (n_state) is not divisible by the number of attention heads specified in the configuration.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, nx, n_positions, config, scale=False):
    """
    Initialize the Attention class.

    Args:
        self (Attention): The instance of the Attention class.
        nx (int): The size of the input state.
        n_positions (int): The number of positions.
        config (object): An object containing configuration settings.
        scale (bool): Flag indicating whether to scale the output. Default is False.

    Returns:
        None.

    Raises:
        ValueError: If the input state size (n_state) is not divisible by the number of attention heads specified 
            in the configuration.
    """
    super().__init__()
    n_state = nx  # in Attention: n_state=768 (nx=n_embd)
    # [switch nx => n_state from Block to Attention to keep identical to TF implementation]
    if n_state % config.n_head != 0:
        raise ValueError(f"Attention n_state shape: {n_state} must be divisible by config.n_head {config.n_head}")
    self.bias = ops.tril(ops.ones((n_positions, n_positions))).view(1, 1, n_positions, n_positions)

    self.n_head = config.n_head
    self.split_size = n_state
    self.scale = scale

    self.c_attn = Conv1D(n_state * 3, nx)
    self.c_proj = Conv1D(n_state, nx)
    self.attn_dropout = nn.Dropout(p=config.attn_pdrop)
    self.resid_dropout = nn.Dropout(p=config.resid_pdrop)
    self.pruned_heads = set()

mindnlp.transformers.models.gpt.modeling_gpt.Attention.forward(x, attention_mask=None, head_mask=None, output_attentions=False)

This method 'forward' in the class 'Attention' processes the input data 'x' through attention mechanisms.

PARAMETER DESCRIPTION
self

The instance of the Attention class.

TYPE: object

x

The input data tensor to be processed.

TYPE: tensor

attention_mask

An optional mask tensor for masking out certain elements during attention computation.

TYPE: tensor DEFAULT: None

head_mask

An optional mask tensor for masking out specific attention heads.

TYPE: tensor DEFAULT: None

output_attentions

A flag indicating whether to output attention weights.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
list

A list whose first element is the projected attention output tensor; if output_attentions is True, the attention weights tensor is appended.

RAISES DESCRIPTION
ValueError

If the provided 'x' tensor is not compatible for processing.

RuntimeError

If an error occurs during the attention mechanism computations.

TypeError

If incorrect data types are provided for the input parameters.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(self, x, attention_mask=None, head_mask=None, output_attentions=False):
    """
    This method 'forward' in the class 'Attention' processes the input data 'x' through attention mechanisms.

    Args:
        self (object): The instance of the Attention class.
        x (tensor): The input data tensor to be processed.
        attention_mask (tensor, optional):
            An optional mask tensor for masking out certain elements during attention computation.
        head_mask (tensor, optional): An optional mask tensor for masking out specific attention heads.
        output_attentions (bool): A flag indicating whether to output attention weights.

    Returns:
        list: A list whose first element is the projected attention output tensor;
            if output_attentions is True, the attention weights tensor is appended.

    Raises:
        ValueError: If the provided 'x' tensor is not compatible for processing.
        RuntimeError: If an error occurs during the attention mechanism computations.
        TypeError: If incorrect data types are provided for the input parameters.
    """
    x = self.c_attn(x)
    query, key, value = x.split(self.split_size, axis=2)
    query = self.split_heads(query)
    key = self.split_heads(key, k=True)
    value = self.split_heads(value)

    attn_outputs = self._attn(query, key, value, attention_mask, head_mask, output_attentions)
    a = attn_outputs[0]

    a = self.merge_heads(a)
    a = self.c_proj(a)
    a = self.resid_dropout(a)

    outputs = [a] + attn_outputs[1:]
    return outputs  # a, (attentions)

mindnlp.transformers.models.gpt.modeling_gpt.Attention.merge_heads(x)

Merge the heads of the attention mechanism.

PARAMETER DESCRIPTION
self

An instance of the Attention class.

x

A tensor representing the input data. It should have a shape of (batch_size, num_heads, seq_len, head_dim).

RETURNS DESCRIPTION
Tensor

The merged tensor of shape (batch_size, seq_len, num_heads * head_dim).

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def merge_heads(self, x):
    """
    Merge the heads of the attention mechanism.

    Args:
        self: An instance of the Attention class.
        x: A tensor representing the input data.
            It should have a shape of (batch_size, num_heads, seq_len, head_dim).

    Returns:
        Tensor: The merged tensor of shape (batch_size, seq_len, num_heads * head_dim).

    Raises:
        None
    """
    x = x.permute(0, 2, 1, 3)
    new_x_shape = x.shape[:-2] + (x.shape[-2] * x.shape[-1],)
    return x.view(*new_x_shape)  # in Tensorflow implementation: fct merge_states
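A NumPy-only illustration of the same reshape, with assumed sizes (not library code):

```python
# NumPy equivalent of merge_heads: (batch, n_head, seq_len, head_dim) -> (batch, seq_len, n_head * head_dim)
import numpy as np

x = np.random.randn(2, 12, 16, 64)           # (batch, n_head, seq_len, head_dim)
x = x.transpose(0, 2, 1, 3)                  # (batch, seq_len, n_head, head_dim)
merged = x.reshape(*x.shape[:-2], x.shape[-2] * x.shape[-1])
print(merged.shape)                          # (2, 16, 768)
```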

mindnlp.transformers.models.gpt.modeling_gpt.Attention.prune_heads(heads)

Description

This method prunes the attention heads based on the input 'heads' list and updates the necessary attributes of the Attention class accordingly.

PARAMETER DESCRIPTION
self

An instance of the Attention class.

  • Type: Attention class object
  • Purpose: Represents the current instance of the Attention class where the pruning operation is to be applied.

TYPE: Attention

heads

The list of attention heads to be pruned.

  • Type: List
  • Purpose: Contains the indices of the attention heads to be pruned.
  • Restrictions: Should be a non-empty list of integers representing valid attention head indices.

TYPE: list

RETURNS DESCRIPTION
None
  • Type: None
  • Purpose: The method does not return any value explicitly but updates the attributes of the Attention class in place.
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def prune_heads(self, heads):
    """
    Method: prune_heads

    Description:
        This method prunes the attention heads based on the input 'heads' list and updates the necessary attributes 
        of the Attention class accordingly.

    Args:
        self (Attention): 
            An instance of the Attention class.

            - Type: Attention class object
            - Purpose: Represents the current instance of the Attention class where the pruning operation is to 
            be applied.

        heads (list): 
            The list of attention heads to be pruned.

            - Type: List
            - Purpose: Contains the indices of the attention heads to be pruned.
            - Restrictions: Should be a non-empty list of integers representing valid attention head indices.

    Returns:
        None:
            - Type: None
            - Purpose: The method does not return any value explicitly but updates the attributes of the Attention 
            class in place.

    Raises:
        None.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.n_head, self.split_size // self.n_head, self.pruned_heads
    )
    index_attn = ops.cat([index, index + self.split_size, index + (2 * self.split_size)])
    # Prune conv1d layers
    self.c_attn = prune_conv1d_layer(self.c_attn, index_attn, dim=1)
    self.c_proj = prune_conv1d_layer(self.c_proj, index, dim=0)
    # Update hyper params
    self.split_size = (self.split_size // self.n_head) * (self.n_head - len(heads))
    self.n_head = self.n_head - len(heads)
    self.pruned_heads = self.pruned_heads.union(heads)
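As a worked illustration of the bookkeeping at the end of this method (figures assumed for a 12-head, 768-dimensional model, not taken from the library):

```python
# Hypothetical numbers: pruning 2 of 12 heads keeps head_dim at 64 but shrinks the projection width.
n_head, split_size = 12, 768
pruned = [0, 5]                                  # heads to remove
head_dim = split_size // n_head                  # 64
split_size = head_dim * (n_head - len(pruned))   # 640
n_head = n_head - len(pruned)                    # 10
print(n_head, split_size)                        # 10 640
```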

mindnlp.transformers.models.gpt.modeling_gpt.Attention.split_heads(x, k=False)

Splits the input tensor into multiple "head" tensors along the last dimension.

PARAMETER DESCRIPTION
self

The instance of the Attention class.

TYPE: Attention

x

The input tensor to be split into multiple "head" tensors. It should have a shape of (batch_size, seq_len, d_model).

TYPE: Tensor

k

A boolean flag indicating whether to transpose the dimensions of the output tensors. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

Tensor: If k is True, a tensor of shape (batch_size, n_head, d_model/n_head, seq_len), which is the layout expected for keys. If k is False, a tensor of shape (batch_size, n_head, seq_len, d_model/n_head).

RAISES DESCRIPTION
None

This method does not raise any exceptions.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def split_heads(self, x, k=False):
    """
    Splits the input tensor into multiple "head" tensors along the last dimension.

    Args:
        self (Attention): The instance of the Attention class.
        x (Tensor): The input tensor to be split into multiple "head" tensors.
            It should have a shape of (batch_size, seq_len, d_model).
        k (bool, optional): A boolean flag indicating whether to transpose the dimensions of the output tensors.
            Default is False.

    Returns:
        Tensor: If `k` is True, a tensor of shape (batch_size, n_head, d_model/n_head, seq_len),
            which is the layout expected for keys.
            If `k` is False, a tensor of shape (batch_size, n_head, seq_len, d_model/n_head).

    Raises:
        None: This method does not raise any exceptions.
    """
    new_x_shape = x.shape[:-1] + (self.n_head, x.shape[-1] // self.n_head)
    x = x.view(*new_x_shape)  # in Tensorflow implementation: fct split_states
    if k:
        return x.permute(0, 2, 3, 1)
    return x.permute(0, 2, 1, 3)
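A NumPy-only illustration of the two output layouts, with assumed sizes (not library code):

```python
# NumPy equivalent of split_heads: (batch, seq_len, d_model) -> per-head layouts.
import numpy as np

n_head = 12
x = np.random.randn(2, 16, 768)                              # (batch, seq_len, d_model)
x = x.reshape(*x.shape[:-1], n_head, x.shape[-1] // n_head)  # (2, 16, 12, 64)
q_like = x.transpose(0, 2, 1, 3)   # queries/values: (batch, n_head, seq_len, head_dim)
k_like = x.transpose(0, 2, 3, 1)   # keys (k=True):  (batch, n_head, head_dim, seq_len)
print(q_like.shape, k_like.shape)  # (2, 12, 16, 64) (2, 12, 64, 16)
```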

mindnlp.transformers.models.gpt.modeling_gpt.Block

Bases: Module

This class represents a block in a neural network model. It is a subclass of nn.Module and is used for building transformer models.

ATTRIBUTE DESCRIPTION
attn

The attention module of the block.

TYPE: Attention

ln_1

The first layer normalization module.

TYPE: LayerNorm

mlp

The multi-layer perceptron module.

TYPE: MLP

ln_2

The second layer normalization module.

TYPE: LayerNorm

METHOD DESCRIPTION
__init__

Initializes a new instance of the Block class.

Args:

  • n_positions (int): The number of positions in the input sequence.
  • config (object): The configuration object for the block.
  • scale (bool, optional): Whether to scale the attention scores. Defaults to False.
forward

Constructs the block by performing the necessary computations on the input.

Args:

  • x (Tensor): The input tensor.
  • attention_mask (Tensor, optional): The attention mask tensor. Defaults to None.
  • head_mask (Tensor, optional): The head mask tensor. Defaults to None.
  • output_attentions (bool, optional): Whether to output the attention weights. Defaults to False.

Returns:

  • outputs (list): A list containing the output tensor and optional attention weights.
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class Block(nn.Module):

    """
    This class represents a block in a neural network model.
    It is a subclass of nn.Module and is used for building transformer models.

    Attributes:
        attn (Attention): The attention module of the block.
        ln_1 (nn.LayerNorm): The first layer normalization module.
        mlp (MLP): The multi-layer perceptron module.
        ln_2 (nn.LayerNorm): The second layer normalization module.

    Methods:
        __init__(self, n_positions, config, scale=False):
            Initializes a new instance of the Block class.

            Args:

            - n_positions (int): The number of positions in the input sequence.
            - config (object): The configuration object for the block.
            - scale (bool, optional): Whether to scale the attention scores. Defaults to False.

        forward(self, x, attention_mask=None, head_mask=None, output_attentions=False):
            Constructs the block by performing the necessary computations on the input.

            Args:

            - x (Tensor): The input tensor.
            - attention_mask (Tensor, optional): The attention mask tensor. Defaults to None.
            - head_mask (Tensor, optional): The head mask tensor. Defaults to None.
            - output_attentions (bool, optional): Whether to output the attention weights. Defaults to False.

            Returns:

            - outputs (list): A list containing the output tensor and optional attention weights.
    """
    def __init__(self, n_positions, config, scale=False):
        """
        Initializes a Block object.

        Args:
            self (object): The instance of the Block class.
            n_positions (int): The number of positions.
            config (object): The configuration object.
            scale (bool, optional): A flag to indicate scaling. Defaults to False.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        nx = config.n_embd
        self.attn = Attention(nx, n_positions, config, scale)
        self.ln_1 = nn.LayerNorm([nx], eps=config.layer_norm_epsilon)
        self.mlp = MLP(4 * nx, config)
        self.ln_2 = nn.LayerNorm([nx], eps=config.layer_norm_epsilon)

    def forward(self, x, attention_mask=None, head_mask=None, output_attentions=False):
        """
        Constructs a block in the given class.

        Args:
            self (Block): An instance of the Block class.
            x: The input tensor.
            attention_mask (Optional[Tensor]): An optional attention mask tensor. Default is None.
            head_mask (Optional[Tensor]): An optional head mask tensor. Default is None.
            output_attentions (bool): Whether to output attentions. Default is False.

        Returns:
            list: A list containing the output tensor and other optional attentions.

        Raises:
            None.

        This method forwards a block by performing the following steps:

        1. Calculate attention outputs using the 'attn' method, passing the input tensor, attention mask,
           head mask, and output attentions flag as parameters. Store the result in 'attn_outputs'.
        2. Retrieve the first element from 'attn_outputs' and assign it to 'a'.
        3. Add 'x' and 'a' and apply layer normalization using the 'ln_1' method. Store the result in 'n'.
        4. Apply multi-layer perceptron (MLP) to 'n' using the 'mlp' method. Store the result in 'm'.
        5. Add 'n' and 'm' and apply layer normalization using the 'ln_2' method. Store the result in 'h'.
        6. Create a list 'outputs' containing 'h' as the first element, followed by any additional elements
           from 'attn_outputs'.
        7. Return 'outputs'.
        """
        attn_outputs = self.attn(
            x,
            attention_mask=attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
        )
        a = attn_outputs[0]

        n = self.ln_1(x + a)
        m = self.mlp(n)
        h = self.ln_2(n + m)

        outputs = [h] + attn_outputs[1:]
        return outputs
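Note that the block uses the original GPT post-LayerNorm ordering: each LayerNorm is applied after its residual addition. The schematic below restates that ordering with stand-in callables; it is illustrative only, not library code.

```python
# Schematic of the post-LayerNorm residual ordering in Block.forward (illustrative only).
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def fake_attn(x):   # stand-in for self.attn(x)[0]
    return 0.1 * x

def fake_mlp(x):    # stand-in for self.mlp(n)
    return 0.1 * x

x = np.random.randn(2, 16, 768)
a = fake_attn(x)
n = layer_norm(x + a)   # ln_1 follows the attention residual
m = fake_mlp(n)
h = layer_norm(n + m)   # ln_2 follows the MLP residual
print(h.shape)          # (2, 16, 768)
```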

mindnlp.transformers.models.gpt.modeling_gpt.Block.__init__(n_positions, config, scale=False)

Initializes a Block object.

PARAMETER DESCRIPTION
self

The instance of the Block class.

TYPE: object

n_positions

The number of positions.

TYPE: int

config

The configuration object.

TYPE: object

scale

A flag to indicate scaling. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, n_positions, config, scale=False):
    """
    Initializes a Block object.

    Args:
        self (object): The instance of the Block class.
        n_positions (int): The number of positions.
        config (object): The configuration object.
        scale (bool, optional): A flag to indicate scaling. Defaults to False.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    nx = config.n_embd
    self.attn = Attention(nx, n_positions, config, scale)
    self.ln_1 = nn.LayerNorm([nx], eps=config.layer_norm_epsilon)
    self.mlp = MLP(4 * nx, config)
    self.ln_2 = nn.LayerNorm([nx], eps=config.layer_norm_epsilon)

mindnlp.transformers.models.gpt.modeling_gpt.Block.forward(x, attention_mask=None, head_mask=None, output_attentions=False)

Constructs a block in the given class.

PARAMETER DESCRIPTION
self

An instance of the Block class.

TYPE: Block

x

The input tensor.

attention_mask

An optional attention mask tensor. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

An optional head mask tensor. Default is None.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output attentions. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
list

A list containing the output tensor and other optional attentions.

This method forwards a block by performing the following steps:

  1. Calculate attention outputs using the 'attn' method, passing the input tensor, attention mask, head mask, and output attentions flag as parameters. Store the result in 'attn_outputs'.
  2. Retrieve the first element from 'attn_outputs' and assign it to 'a'.
  3. Add 'x' and 'a' and apply layer normalization using the 'ln_1' method. Store the result in 'n'.
  4. Apply multi-layer perceptron (MLP) to 'n' using the 'mlp' method. Store the result in 'm'.
  5. Add 'n' and 'm' and apply layer normalization using the 'ln_2' method. Store the result in 'h'.
  6. Create a list 'outputs' containing 'h' as the first element, followed by any additional elements from 'attn_outputs'.
  7. Return 'outputs'.
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(self, x, attention_mask=None, head_mask=None, output_attentions=False):
    """
    Constructs a block in the given class.

    Args:
        self (Block): An instance of the Block class.
        x: The input tensor.
        attention_mask (Optional[Tensor]): An optional attention mask tensor. Default is None.
        head_mask (Optional[Tensor]): An optional head mask tensor. Default is None.
        output_attentions (bool): Whether to output attentions. Default is False.

    Returns:
        list: A list containing the output tensor and other optional attentions.

    Raises:
        None.

    This method forwards a block by performing the following steps:

    1. Calculate attention outputs using the 'attn' method, passing the input tensor, attention mask,
       head mask, and output attentions flag as parameters. Store the result in 'attn_outputs'.
    2. Retrieve the first element from 'attn_outputs' and assign it to 'a'.
    3. Add 'x' and 'a' and apply layer normalization using the 'ln_1' method. Store the result in 'n'.
    4. Apply multi-layer perceptron (MLP) to 'n' using the 'mlp' method. Store the result in 'm'.
    5. Add 'n' and 'm' and apply layer normalization using the 'ln_2' method. Store the result in 'h'.
    6. Create a list 'outputs' containing 'h' as the first element, followed by any additional elements
       from 'attn_outputs'.
    7. Return 'outputs'.
    """
    attn_outputs = self.attn(
        x,
        attention_mask=attention_mask,
        head_mask=head_mask,
        output_attentions=output_attentions,
    )
    a = attn_outputs[0]

    n = self.ln_1(x + a)
    m = self.mlp(n)
    h = self.ln_2(n + m)

    outputs = [h] + attn_outputs[1:]
    return outputs

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModel

Bases: GPTPreTrainedModel

This class represents a GPT (Generative Pre-trained Transformer) model with double heads. It is used for language modeling and multiple choice classification tasks. The GPTDoubleHeadsModel inherits from the GPTPreTrainedModel class.

The GPTDoubleHeadsModel class contains methods for initializing the model, getting and setting the output embeddings, and forwarding the model. It also includes a detailed docstring for the forward method, which describes the input parameters, return values, and provides examples of how to use the method.

To use the GPTDoubleHeadsModel, follow these steps:

  1. Instantiate the GPTDoubleHeadsModel class, passing the config parameter.
  2. Use the get_output_embeddings method to get the output embeddings of the model.
  3. Use the set_output_embeddings method to set new embeddings for the model.
  4. Use the forward method to perform language modeling and multiple choice classification tasks. The method takes various input tensors and returns the model outputs, including logits for language modeling and multiple choice classification.
Example
>>> from transformers import AutoTokenizer, GPTDoubleHeadsModel
...
>>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
>>> model = GPTDoubleHeadsModel.from_pretrained("openai-gpt")
>>> tokenizer.add_special_tokens({"cls_token": "[CLS]"})
>>> model.resize_token_embeddings(len(tokenizer))
...
>>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
>>> input_ids = [tokenizer.encode(s) for s in choices]
>>> mc_token_ids = [len(ids) - 1 for ids in input_ids]
...
>>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
>>> lm_logits = outputs.logits
>>> mc_logits = outputs.mc_logits

For more details on how to use the GPTDoubleHeadsModel class, refer to the documentation and examples provided in the code.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class GPTDoubleHeadsModel(GPTPreTrainedModel):

    """
    This class represents a GPT (Generative Pre-trained Transformer) model with double heads.
    It is used for language modeling and multiple choice classification tasks.
    The GPTDoubleHeadsModel inherits from the GPTPreTrainedModel class.

    The GPTDoubleHeadsModel class contains methods for initializing the model, getting and setting the output embeddings,
    and forwarding the model. It also includes a detailed docstring for the `forward` method,
    which describes the input parameters, return values, and provides examples of how to use the method.

    To use the GPTDoubleHeadsModel, follow these steps:

    1. Instantiate the GPTDoubleHeadsModel class, passing the `config` parameter.
    2. Use the `get_output_embeddings` method to get the output embeddings of the model.
    3. Use the `set_output_embeddings` method to set new embeddings for the model.
    4. Use the `forward` method to perform language modeling and multiple choice classification tasks.
    The method takes various input tensors and returns the model outputs, including logits for language
    modeling and multiple choice classification.

    Example:
        ```python
        >>> from transformers import AutoTokenizer, GPTDoubleHeadsModel
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
        >>> model = GPTDoubleHeadsModel.from_pretrained("openai-gpt")
        >>> tokenizer.add_special_tokens({"cls_token": "[CLS]"})
        >>> model.resize_token_embeddings(len(tokenizer))
        ...
        >>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
        >>> input_ids = [tokenizer.encode(s) for s in choices]
        >>> mc_token_ids = [len(ids) - 1 for ids in input_ids]
        ...
        >>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
        >>> lm_logits = outputs.logits
        >>> mc_logits = outputs.mc_logits
        ```
    For more details on how to use the GPTDoubleHeadsModel class, refer to the documentation and examples provided
    in the code.
    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        """
        Initializes an instance of the GPTDoubleHeadsModel class.

        Args:
            self: The current object.
            config: An instance of the GPTConfig class that holds the configuration settings for the GPT model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)

        config.num_labels = 1
        self.transformer = GPTModel(config)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.multiple_choice_head = SequenceSummary(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        """
        Returns the output embeddings for the GPTDoubleHeadsModel.

        Args:
            self: An instance of the GPTDoubleHeadsModel class.

        Returns:
            nn.Linear: The lm_head linear layer used as the output embeddings.

        Raises:
            None
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Sets the output embeddings of the GPTDoubleHeadsModel.

        Args:
            self (GPTDoubleHeadsModel): The instance of the GPTDoubleHeadsModel class.
            new_embeddings (Any): The new embeddings to be set as the output embeddings. This can be an object of any type.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        mc_token_ids: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        mc_labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], GPTDoubleHeadsModelOutput]:
        r"""
        Args:
            mc_token_ids (`mindspore.Tensor` of shape `(batch_size, num_choices)`, *optional*,
                default to index of the last token of the input):
                Index of the classification token in each input sequence. Selected in the range `[0, input_ids.shape[-1] -
                1]`.
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
                `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are
                ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
            mc_labels (`mindspore.Tensor` of shape `(batch_size)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., num_choices]`
                where *num_choices* is the size of the second dimension of the input tensors. (see *input_ids* above)

        Return:
            Union[Tuple[mindspore.Tensor], GPTDoubleHeadsModelOutput]

        Example:
            ```python
            >>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
            ...
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
            >>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")
            >>> tokenizer.add_special_tokens(
            ...     {"cls_token": "[CLS]"}
            ... )  # Add a [CLS] to the vocabulary (we should train it also!)
            >>> model.resize_token_embeddings(len(tokenizer))
            ...
            >>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
            >>> input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # Batch size 1, 2 choices
            >>> mc_token_ids = torch.tensor([input_ids.shape[-1] - 1, input_ids.shape[-1] - 1]).unsqueeze(0)  # Batch size 1
            ...
            >>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
            >>> lm_logits = outputs.logits
            >>> mc_logits = outputs.mc_logits
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = transformer_outputs[0]

        lm_logits = self.lm_head(hidden_states)
        mc_logits = self.multiple_choice_head(hidden_states, mc_token_ids).squeeze(-1)

        lm_loss, mc_loss = None, None
        if mc_labels is not None:
            mc_loss = ops.cross_entropy(mc_logits.view(-1, mc_logits.shape[-1]), mc_labels.view(-1))
        if labels is not None:
            shift_logits = lm_logits[..., :-1, :]
            shift_labels = labels[..., 1:]
            lm_loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1))

        if not return_dict:
            output = (lm_logits, mc_logits) + transformer_outputs[1:]
            if mc_loss is not None:
                output = (mc_loss,) + output
            return ((lm_loss,) + output) if lm_loss is not None else output

        return GPTDoubleHeadsModelOutput(
            loss=lm_loss,
            mc_loss=mc_loss,
            logits=lm_logits,
            mc_logits=mc_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModel.__init__(config)

Initializes an instance of the GPTDoubleHeadsModel class.

PARAMETER DESCRIPTION
self

The current object.

config

An instance of the GPTConfig class that holds the configuration settings for the GPT model.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, config):
    """
    Initializes an instance of the GPTDoubleHeadsModel class.

    Args:
        self: The current object.
        config: An instance of the GPTConfig class that holds the configuration settings for the GPT model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)

    config.num_labels = 1
    self.transformer = GPTModel(config)
    self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
    self.multiple_choice_head = SequenceSummary(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModel.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, mc_token_ids=None, labels=None, mc_labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., config.vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

mc_labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices] where num_choices is the size of the second dimension of the input tensors. (see input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size)`, *optional* DEFAULT: None

Return

Union[Tuple[mindspore.Tensor], GPTDoubleHeadsModelOutput]

Example
>>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
...
...
>>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
>>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")
>>> tokenizer.add_special_tokens(
...     {"cls_token": "[CLS]"}
... )  # Add a [CLS] to the vocabulary (we should train it also!)
>>> model.resize_token_embeddings(len(tokenizer))
...
>>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
>>> input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # Batch size 1, 2 choices
>>> mc_token_ids = torch.tensor([input_ids.shape[-1] - 1, input_ids.shape[-1] - 1]).unsqueeze(0)  # Batch size 1
...
>>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
>>> lm_logits = outputs.logits
>>> mc_logits = outputs.mc_logits
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    mc_token_ids: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    mc_labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], GPTDoubleHeadsModelOutput]:
    r"""
    Args:
        mc_token_ids (`mindspore.Tensor` of shape `(batch_size, num_choices)`, *optional*,
            default to index of the last token of the input):
            Index of the classification token in each input sequence. Selected in the range `[0, input_ids.shape[-1] -
            1]`.
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
            `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are
            ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
        mc_labels (`mindspore.Tensor` of shape `(batch_size)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., num_choices]`
            where *num_choices* is the size of the second dimension of the input tensors. (see *input_ids* above)

    Return:
        Union[Tuple[mindspore.Tensor], GPTDoubleHeadsModelOutput]

    Example:
        ```python
        >>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
        ...
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
        >>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")
        >>> tokenizer.add_special_tokens(
        ...     {"cls_token": "[CLS]"}
        ... )  # Add a [CLS] to the vocabulary (we should train it also!)
        >>> model.resize_token_embeddings(len(tokenizer))
        ...
        >>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
        >>> input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # Batch size 1, 2 choices
        >>> mc_token_ids = torch.tensor([input_ids.shape[-1] - 1, input_ids.shape[-1] - 1]).unsqueeze(0)  # Batch size 1
        ...
        >>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
        >>> lm_logits = outputs.logits
        >>> mc_logits = outputs.mc_logits
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    hidden_states = transformer_outputs[0]

    lm_logits = self.lm_head(hidden_states)
    mc_logits = self.multiple_choice_head(hidden_states, mc_token_ids).squeeze(-1)

    lm_loss, mc_loss = None, None
    if mc_labels is not None:
        mc_loss = ops.cross_entropy(mc_logits.view(-1, mc_logits.shape[-1]), mc_labels.view(-1))
    if labels is not None:
        shift_logits = lm_logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        lm_loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1))

    if not return_dict:
        output = (lm_logits, mc_logits) + transformer_outputs[1:]
        if mc_loss is not None:
            output = (mc_loss,) + output
        return ((lm_loss,) + output) if lm_loss is not None else output

    return GPTDoubleHeadsModelOutput(
        loss=lm_loss,
        mc_loss=mc_loss,
        logits=lm_logits,
        mc_logits=mc_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )
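The language-modeling loss above shifts logits and labels by one position so that each token predicts its successor. The NumPy-only snippet below illustrates just that alignment with assumed shapes; it is not library code.

```python
# Illustration of the shift used for the LM loss: position t predicts token t+1.
import numpy as np

batch, seq_len, vocab = 1, 5, 10
lm_logits = np.random.randn(batch, seq_len, vocab)
labels = np.array([[3, 1, 4, 1, 5]])

shift_logits = lm_logits[..., :-1, :]   # drop the prediction at the last position
shift_labels = labels[..., 1:]          # drop the first token as a target
print(shift_logits.shape, shift_labels.shape)  # (1, 4, 10) (1, 4)
```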

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModel.get_output_embeddings()

Returns the output embeddings for the GPTDoubleHeadsModel.

PARAMETER DESCRIPTION
self

An instance of the GPTDoubleHeadsModel class.

RETURNS DESCRIPTION
Linear

The lm_head linear layer used as the output embeddings.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def get_output_embeddings(self):
    """
    Returns the output embeddings for the GPTDoubleHeadsModel.

    Args:
        self: An instance of the GPTDoubleHeadsModel class.

    Returns:
        nn.Linear: The lm_head linear layer used as the output embeddings.

    Raises:
        None
    """
    return self.lm_head

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModel.set_output_embeddings(new_embeddings)

Sets the output embeddings of the GPTDoubleHeadsModel.

PARAMETER DESCRIPTION
self

The instance of the GPTDoubleHeadsModel class.

TYPE: GPTDoubleHeadsModel

new_embeddings

The new embeddings to be set as the output embeddings. This can be an object of any type.

TYPE: Any

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def set_output_embeddings(self, new_embeddings):
    """
    Sets the output embeddings of the GPTDoubleHeadsModel.

    Args:
        self (GPTDoubleHeadsModel): The instance of the GPTDoubleHeadsModel class.
        new_embeddings (Any): The new embeddings to be set as the output embeddings. This can be an object of any type.

    Returns:
        None.

    Raises:
        None.
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.gpt.modeling_gpt.GPTDoubleHeadsModelOutput dataclass

Bases: ModelOutput

Base class for outputs of models predicting if two sentences are consecutive or not.

PARAMETER DESCRIPTION
loss

Language modeling loss.

TYPE: `mindspore.Tensor` of shape `(1,)`, *optional*, returned when `labels` is provided DEFAULT: None

mc_loss

Multiple choice classification loss.

TYPE: `mindspore.Tensor` of shape `(1,)`, *optional*, returned when `mc_labels` is provided DEFAULT: None

logits

Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

TYPE: `mindspore.Tensor` of shape `(batch_size, num_choices, sequence_length, config.vocab_size)` DEFAULT: None

mc_logits

Prediction scores of the multiple choice classification head (scores for each choice before SoftMax).

TYPE: `mindspore.Tensor` of shape `(batch_size, num_choices)` DEFAULT: None

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
@dataclass
class GPTDoubleHeadsModelOutput(ModelOutput):
    """
    Base class for outputs of models predicting if two sentences are consecutive or not.

    Args:
        loss (`mindspore.Tensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
            Language modeling loss.
        mc_loss (`mindspore.Tensor` of shape `(1,)`, *optional*, returned when `mc_labels` is provided):
            Multiple choice classification loss.
        logits (`mindspore.Tensor` of shape `(batch_size, num_choices, sequence_length, config.vocab_size)`):
            Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
        mc_logits (`mindspore.Tensor` of shape `(batch_size, num_choices)`):
            Prediction scores of the multiple choice classification head (scores for each choice before SoftMax).
        hidden_states (`tuple(mindspore.Tensor)`, *optional*, returned when `output_hidden_states=True` is passed or
            when `config.output_hidden_states=True`):
            Tuple of `mindspore.Tensor` (one for the output of the embeddings + one for the output of each layer) of
            shape `(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (`tuple(mindspore.Tensor)`, *optional*, returned when `output_attentions=True` is passed or
            when `config.output_attentions=True`):
            Tuple of `mindspore.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
            sequence_length)`.

            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
    """
    loss: Optional[mindspore.Tensor] = None
    mc_loss: Optional[mindspore.Tensor] = None
    logits: mindspore.Tensor = None
    mc_logits: mindspore.Tensor = None
    hidden_states: Optional[Tuple[mindspore.Tensor]] = None
    attentions: Optional[Tuple[mindspore.Tensor]] = None
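As with other ModelOutput subclasses, the fields above are accessed by attribute, and fields that depend on optional labels stay None. A minimal sketch with dummy tensors whose shapes follow the field documentation (the sizes are arbitrary):

```python
import numpy as np
import mindspore

batch_size, num_choices, seq_len, vocab_size = 2, 3, 5, 11
out = GPTDoubleHeadsModelOutput(
    logits=mindspore.Tensor(np.zeros((batch_size, num_choices, seq_len, vocab_size), np.float32)),
    mc_logits=mindspore.Tensor(np.zeros((batch_size, num_choices), np.float32)),
)
print(out.logits.shape)   # (2, 3, 5, 11)
print(out.mc_loss)        # None -- only populated when mc_labels is passed to forward
```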

mindnlp.transformers.models.gpt.modeling_gpt.GPTForSequenceClassification

Bases: GPTPreTrainedModel

This class 'GPTForSequenceClassification' represents a sequence classification model based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to classify sequences based on the provided input.

This class inherits from the 'GPTPreTrainedModel' class, which provides the basic functionality for a pre-trained GPT model.

The class contains an initializer method '__init__' which takes a 'config' parameter. It calls the initializer of the parent class and initializes the 'num_labels' attribute with the 'num_labels' value from the 'config'. It also initializes a 'transformer' attribute with an instance of the 'GPTModel' class from the 'config'. Additionally, it creates a 'score' attribute which is a neural network layer with a dense layer of shape '(config.n_embd, num_labels)' and no bias. Finally, it calls the 'post_init' method.

The 'forward' method is responsible for forwarding the sequence classification model. It takes several optional input tensors as parameters, including 'input_ids', 'attention_mask', 'token_type_ids', 'position_ids', 'head_mask', 'inputs_embeds', 'labels', 'output_attentions', 'output_hidden_states', and 'return_dict'. It returns a Tuple of tensors or a 'SequenceClassifierOutput' object.

The 'labels' parameter is an optional tensor of shape '(batch_size,)', which provides the labels for computing the sequence classification/regression loss. The indices in 'labels' should be in the range of '[0, ..., config.num_labels - 1]'. If 'config.num_labels == 1', a regression loss (Mean-Square loss) is computed. If 'config.num_labels > 1', a classification loss (Cross-Entropy) is computed.

The 'return_dict' parameter indicates whether the method should return a 'SequenceClassifierOutput' object. If 'return_dict' is not provided, it defaults to the value of 'self.config.use_return_dict'.

The method first calls the 'transformer' model with the provided input tensors and other optional parameters to obtain the transformer outputs, including the 'hidden_states' tensor. Then, it passes the 'hidden_states' tensor through the 'score' layer to obtain the 'logits' tensor.

Next, the method checks the shape of the 'input_ids' tensor to determine the batch size. If 'input_ids' is not None, the shape of 'input_ids' is used to calculate the sequence lengths. If 'self.config.pad_token_id' is not None, the method checks for padding tokens in 'input_ids' and calculates the sequence lengths accordingly. If 'input_ids' is None, the sequence lengths are set to -1.

The method then selects the relevant logits based on the sequence lengths. If 'sequence_lengths' is an integer, the method uses it to index the 'logits' tensor. Otherwise, it uses the 'sequence_lengths' tensor to gather the relevant logits.

The 'loss' variable is set to None initially. If 'labels' is provided, the method determines the 'problem_type' based on the 'config' and the shape and dtype of 'labels'. Depending on the 'problem_type', the method calculates the loss using operations provided by the 'ops' module.

Finally, depending on the 'return_dict' parameter, the method either returns a Tuple of tensors or a 'SequenceClassifierOutput' object containing the 'loss', 'logits', 'hidden_states', and 'attentions'.

Note

This docstring does not include method signatures or any other code for clarity.
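The pooling step described above keeps only the logits of the last non-padding token in each sequence. A small numpy sketch of the same index arithmetic (the real method performs it with ops on mindspore tensors, but the logic is identical):

```python
import numpy as np

pad_token_id = 0
input_ids = np.array([[5, 6, 7, 0, 0],    # three real tokens, two pads
                      [8, 9, 1, 2, 3]])   # no padding at all

# Index of the first pad token minus one == index of the last real token.
sequence_lengths = np.equal(input_ids, pad_token_id).astype(np.int32).argmax(-1) - 1
# A row without padding gives argmax == 0, hence index -1; the modulo maps -1 to the last position.
sequence_lengths = sequence_lengths % input_ids.shape[-1]
print(sequence_lengths)                    # [2 4]

logits = np.random.randn(2, 5, 3)          # (batch_size, seq_len, num_labels)
pooled_logits = logits[np.arange(2), sequence_lengths]   # (batch_size, num_labels)
```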

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class GPTForSequenceClassification(GPTPreTrainedModel):

    """
    This class 'GPTForSequenceClassification' represents a sequence classification model based on the GPT
    (Generative Pre-trained Transformer) architecture. It is designed to classify sequences based on the
    provided input.

    This class inherits from the 'GPTPreTrainedModel' class, which provides the basic functionality for a pre-trained
    GPT model.

    The class contains an initializer method '__init__' which takes a 'config' parameter. It calls the initializer of
    the parent class and initializes the 'num_labels' attribute with the 'num_labels' value from the 'config'.
    It also initializes a 'transformer' attribute with an instance of the 'GPTModel' class from the 'config'.
    Additionally, it creates a 'score' attribute which is a neural network layer with a dense layer of shape
    '(config.n_embd, num_labels)' and no bias. Finally, it calls the 'post_init' method.

    The 'forward' method is responsible for forwarding the sequence classification model.
    It takes several optional input tensors as parameters, including 'input_ids', 'attention_mask', 'token_type_ids',
    'position_ids', 'head_mask', 'inputs_embeds', 'labels', 'output_attentions', 'output_hidden_states',
    and 'return_dict'. It returns a Tuple of tensors or a 'SequenceClassifierOutput' object.

    The 'labels' parameter is an optional tensor of shape '(batch_size,)', which provides the labels for computing
    the sequence classification/regression loss. The indices in 'labels' should be in the range of
    '[0, ..., config.num_labels - 1]'. If 'config.num_labels == 1', a regression loss (Mean-Square loss) is computed.
    If 'config.num_labels > 1', a classification loss (Cross-Entropy) is computed.

    The 'return_dict' parameter indicates whether the method should return a 'SequenceClassifierOutput' object.
    If 'return_dict' is not provided, it defaults to the value of 'self.config.use_return_dict'.

    The method first calls the 'transformer' model with the provided input tensors and other optional parameters to
    obtain the transformer outputs, including the 'hidden_states' tensor.
    Then, it passes the 'hidden_states' tensor through the 'score' layer to obtain the 'logits' tensor.

    Next, the method checks the shape of the 'input_ids' tensor to determine the batch size. If 'input_ids' is not None,
    the shape of 'input_ids' is used to calculate the sequence lengths. If 'self.config.pad_token_id' is not None,
    the method checks for padding tokens in 'input_ids' and calculates the sequence lengths accordingly.
    If 'input_ids' is None, the sequence lengths are set to -1.

    The method then selects the relevant logits based on the sequence lengths.
    If 'sequence_lengths' is an integer, the method uses it to index the 'logits' tensor. Otherwise,
    it uses the 'sequence_lengths' tensor to gather the relevant logits.

    The 'loss' variable is set to None initially. If 'labels' is provided, the method determines the 'problem_type'
    based on the 'config' and the shape and dtype of 'labels'. Depending on the 'problem_type', the method calculates
    the loss using operations provided by the 'ops' module.

    Finally, depending on the 'return_dict' parameter, the method either returns a Tuple of tensors or a
    'SequenceClassifierOutput' object containing the 'loss', 'logits', 'hidden_states', and 'attentions'.

    Note:
        This docstring does not include method signatures or any other code for clarity.
    """
    def __init__(self, config):
        """
        Initializes an instance of GPTForSequenceClassification.

        Args:
            self (GPTForSequenceClassification): The instance itself.
            config:
                An object containing configuration settings for the model.

                - Type: object
                - Purpose: Specifies the configuration settings for the model.
                - Restrictions: Must be compatible with the GPTModel configuration.

        Returns:
            None.

        Raises:
            NotImplementedError: If the method 'post_init' is not implemented in the derived class.
            TypeError: If the configuration settings provided are not compatible with the GPTModel.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.transformer = GPTModel(config)
        self.score = nn.Linear(config.n_embd, self.num_labels, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
                `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        hidden_states = transformer_outputs[0]
        logits = self.score(hidden_states)

        if input_ids is not None:
            batch_size, _ = input_ids.shape[:2]
        else:
            batch_size, _ = inputs_embeds.shape[:2]

        # Ensure the batch size is > 1 if there is no padding.
        if self.config.pad_token_id is None and batch_size != 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")

        if self.config.pad_token_id is None:
            sequence_lengths = -1
        else:
            if input_ids is not None:
                sequence_lengths = ops.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
                # avoid backward error
                sequence_lengths = sequence_lengths % input_ids.shape[-1]
            else:
                sequence_lengths = -1
                logger.warning(
                    f"{self.__class__.__name__} will not detect padding tokens in `inputs_embeds`. Results may be "
                    "unexpected if using padding tokens in conjunction with `inputs_embeds.`"
                )

        if isinstance(sequence_lengths, int):
            pooled_logits = logits[ops.arange(batch_size), sequence_lengths]
        else:
            pooled_logits = ops.gather(logits, sequence_lengths, 1, 1)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(pooled_logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(pooled_logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(pooled_logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(pooled_logits, labels)
        if not return_dict:
            output = (pooled_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=pooled_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.gpt.modeling_gpt.GPTForSequenceClassification.__init__(config)

Initializes an instance of GPTForSequenceClassification.

PARAMETER DESCRIPTION
self

The instance itself.

TYPE: GPTForSequenceClassification

config

An object containing configuration settings for the model.

  • Type: object
  • Purpose: Specifies the configuration settings for the model.
  • Restrictions: Must be compatible with the GPTModel configuration.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
NotImplementedError

If the method 'post_init' is not implemented in the derived class.

TypeError

If the configuration settings provided are not compatible with the GPTModel.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, config):
    """
    Initializes an instance of GPTForSequenceClassification.

    Args:
        self (GPTForSequenceClassification): The instance itself.
        config:
            An object containing configuration settings for the model.

            - Type: object
            - Purpose: Specifies the configuration settings for the model.
            - Restrictions: Must be compatible with the GPTModel configuration.

    Returns:
        None.

    Raises:
        NotImplementedError: If the method 'post_init' is not implemented in the derived class.
        TypeError: If the configuration settings provided are not compatible with the GPTModel.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.transformer = GPTModel(config)
    self.score = nn.Linear(config.n_embd, self.num_labels, bias=False)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.gpt.modeling_gpt.GPTForSequenceClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss); if config.num_labels > 1, a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
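When config.problem_type is unset, forward infers it from num_labels and the dtype of labels before choosing between MSE, cross-entropy, and BCE-with-logits. A hypothetical helper mirroring that branch logic (not part of the library):

```python
import mindspore

def infer_problem_type(num_labels: int, labels: mindspore.Tensor) -> str:
    """Return the problem_type that forward() would select when config.problem_type is None."""
    if num_labels == 1:
        return "regression"                      # MSE on a single predicted score
    if labels.dtype in (mindspore.int64, mindspore.int32):
        return "single_label_classification"     # cross-entropy over integer class indices
    return "multi_label_classification"          # BCE-with-logits on float multi-hot targets

print(infer_problem_type(3, mindspore.Tensor([0, 2], mindspore.int64)))   # single_label_classification
print(infer_problem_type(3, mindspore.Tensor([[1.0, 0.0, 1.0]])))         # multi_label_classification
```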

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    hidden_states = transformer_outputs[0]
    logits = self.score(hidden_states)

    if input_ids is not None:
        batch_size, _ = input_ids.shape[:2]
    else:
        batch_size, _ = inputs_embeds.shape[:2]

    # Ensure the batch size is > 1 if there is no padding.
    if self.config.pad_token_id is None and batch_size != 1:
        raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")

    if self.config.pad_token_id is None:
        sequence_lengths = -1
    else:
        if input_ids is not None:
            sequence_lengths = ops.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
            # avoid backward error
            sequence_lengths = sequence_lengths % input_ids.shape[-1]
        else:
            sequence_lengths = -1
            logger.warning(
                f"{self.__class__.__name__} will not detect padding tokens in `inputs_embeds`. Results may be "
                "unexpected if using padding tokens in conjunction with `inputs_embeds.`"
            )

    if isinstance(sequence_lengths, int):
        pooled_logits = logits[ops.arange(batch_size), sequence_lengths]
    else:
        pooled_logits = ops.gather(logits, sequence_lengths, 1, 1)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(pooled_logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(pooled_logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(pooled_logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(pooled_logits, labels)
    if not return_dict:
        output = (pooled_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=pooled_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel

Bases: GPTPreTrainedModel

This class represents a language model head for the Generative Pre-trained Transformer (GPT) model. It is used for generating language predictions and is designed to be compatible with the GPT architecture.

The GPTLMHeadModel class provides methods for initializing the model with a configuration, getting and setting output embeddings, forwarding language model outputs, and preparing inputs for generation. It inherits from the GPTPreTrainedModel class.

METHOD DESCRIPTION
__init__

Initializes the model with a given configuration.

get_output_embeddings

Returns the output embeddings of the model.

set_output_embeddings

Sets new output embeddings for the model.

forward

Constructs language model outputs based on input features.

prepare_inputs_for_generation

Prepares input data for language generation.

The forward method takes various input arguments for language modeling and returns model outputs, including logits and hidden states. The prepare_inputs_for_generation method prepares input data specifically for language generation tasks.

Note

The GPTLMHeadModel class is designed for use in natural language processing tasks and is a part of the GPT model framework.
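A minimal usage sketch; the import path and the checkpoint name ('openai-gpt') are assumptions, and the input ids would normally come from a matching tokenizer:

```python
import numpy as np
import mindspore
from mindnlp.transformers import GPTLMHeadModel   # import path assumed from the package layout

model = GPTLMHeadModel.from_pretrained("openai-gpt")   # checkpoint name is an assumption
input_ids = mindspore.Tensor(np.array([[40, 2061, 318, 257]]), mindspore.int64)

# Passing labels=input_ids lets the model shift them internally and return a language-modeling loss.
outputs = model(input_ids, labels=input_ids, return_dict=True)
print(outputs.loss)           # scalar cross-entropy loss
print(outputs.logits.shape)   # (1, 4, vocab_size)
```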

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class GPTLMHeadModel(GPTPreTrainedModel):

    """
    This class represents a language model head for the Generative Pre-trained Transformer (GPT) model.
    It is used for generating language predictions and is designed to be compatible with the GPT architecture.

    The GPTLMHeadModel class provides methods for initializing the model with a configuration, getting and
    setting output embeddings,
    forwarding language model outputs, and preparing inputs for generation.
    It inherits from the GPTPreTrainedModel class.

    Methods:
        __init__: Initializes the model with a given configuration.
        get_output_embeddings: Returns the output embeddings of the model.
        set_output_embeddings: Sets new output embeddings for the model.
        forward: Constructs language model outputs based on input features.
        prepare_inputs_for_generation: Prepares input data for language generation.

    The forward method takes various input arguments for language modeling and returns model outputs, including logits
    and hidden states.
    The prepare_inputs_for_generation method prepares input data specifically for language generation tasks.

    Note:
        The GPTLMHeadModel class is designed for use in natural language processing tasks and is a part of the GPT model
        framework.
    """
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config):
        """
        Initializes a new instance of the GPTLMHeadModel class.

        Args:
            self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
            config (Config): The configuration object containing model parameters.

        Returns:
            None.

        Raises:
            ValueError: If the configuration object is missing or invalid.
            TypeError: If the configuration object is not of type Config.
            RuntimeError: If there are issues during model initialization.
        """
        super().__init__(config)
        self.transformer = GPTModel(config)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        """
        Method to retrieve the output embeddings from the GPTLMHeadModel.

        Args:
            self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
                This parameter refers to the current instance of the GPTLMHeadModel class.

        Returns:
            The 'lm_head' attribute of the GPTLMHeadModel instance.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        This method sets the output embeddings for the GPTLMHeadModel.

        Args:
            self (object): The instance of the GPTLMHeadModel class.
            new_embeddings (object):
                The new embeddings to be set as the output embeddings for the model.
                It should be of the same type as the existing embeddings.

        Returns:
            None.

        Raises:
            None
        """
        self.lm_head = new_embeddings

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], CausalLMOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
                `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
                are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        hidden_states = transformer_outputs[0]
        lm_logits = self.lm_head(hidden_states)

        loss = None
        if labels is not None:
            # Shift so that tokens < n predict n
            shift_logits = lm_logits[..., :-1, :]
            shift_labels = labels[..., 1:]
            # Flatten the tokens
            loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1))

        if not return_dict:
            output = (lm_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutput(
            loss=loss,
            logits=lm_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

    def prepare_inputs_for_generation(self, input_ids: mindspore.Tensor, **kwargs) -> Dict[str, Any]:
        """
        Prepare inputs for generation.

        Args:
            self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
            input_ids (mindspore.Tensor): The input tensor containing the token ids for the generation.

        Returns:
            Dict[str, Any]: A dictionary containing the prepared input_ids.

        Raises:
            None.
        """
        return {"input_ids": input_ids}

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel.__init__(config)

Initializes a new instance of the GPTLMHeadModel class.

PARAMETER DESCRIPTION
self

The instance of the GPTLMHeadModel class.

TYPE: GPTLMHeadModel

config

The configuration object containing model parameters.

TYPE: Config

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the configuration object is missing or invalid.

TypeError

If the configuration object is not of type Config.

RuntimeError

If there are issues during model initialization.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, config):
    """
    Initializes a new instance of the GPTLMHeadModel class.

    Args:
        self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
        config (Config): The configuration object containing model parameters.

    Returns:
        None.

    Raises:
        ValueError: If the configuration object is missing or invalid.
        TypeError: If the configuration object is not of type Config.
        RuntimeError: If there are issues during model initialization.
    """
    super().__init__(config)
    self.transformer = GPTModel(config)
    self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., config.vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
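Because the shift happens inside forward, labels can simply be a copy of input_ids: the logits at position t are scored against the token at position t + 1. A small numpy sketch of that alignment:

```python
import numpy as np

input_ids = np.array([[11, 12, 13, 14]])
labels = input_ids.copy()                 # labels = input_ids is fine; forward() shifts internally

# Alignment performed inside forward(): the last logit and the first label drop out.
context_positions = input_ids[..., :-1]   # tokens whose logits are kept: 11, 12, 13
target_tokens = labels[..., 1:]           # targets for those logits:     12, 13, 14
print(context_positions)                  # [[11 12 13]]
print(target_tokens)                      # [[12 13 14]]
```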

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], CausalLMOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
            are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    hidden_states = transformer_outputs[0]
    lm_logits = self.lm_head(hidden_states)

    loss = None
    if labels is not None:
        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        # Flatten the tokens
        loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1))

    if not return_dict:
        output = (lm_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutput(
        loss=loss,
        logits=lm_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel.get_output_embeddings()

Method to retrieve the output embeddings from the GPTLMHeadModel.

PARAMETER DESCRIPTION
self

The instance of the GPTLMHeadModel class. This parameter refers to the current instance of the GPTLMHeadModel class.

TYPE: GPTLMHeadModel

RETURNS DESCRIPTION

The 'lm_head' attribute of the GPTLMHeadModel instance.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def get_output_embeddings(self):
    """
    Method to retrieve the output embeddings from the GPTLMHeadModel.

    Args:
        self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
            This parameter refers to the current instance of the GPTLMHeadModel class.

    Returns:
        The 'lm_head' attribute of the GPTLMHeadModel instance.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel.prepare_inputs_for_generation(input_ids, **kwargs)

Prepare inputs for generation.

PARAMETER DESCRIPTION
self

The instance of the GPTLMHeadModel class.

TYPE: GPTLMHeadModel

input_ids

The input tensor containing the token ids for the generation.

TYPE: Tensor

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: A dictionary containing the prepared input_ids.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def prepare_inputs_for_generation(self, input_ids: mindspore.Tensor, **kwargs) -> Dict[str, Any]:
    """
    Prepare inputs for generation.

    Args:
        self (GPTLMHeadModel): The instance of the GPTLMHeadModel class.
        input_ids (mindspore.Tensor): The input tensor containing the token ids for the generation.

    Returns:
        Dict[str, Any]: A dictionary containing the prepared input_ids.

    Raises:
        None.
    """
    return {"input_ids": input_ids}
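prepare_inputs_for_generation is the hook used by the generation utilities to build the keyword arguments for each decoding step. A hand-rolled greedy loop that calls it directly might look like the following sketch (model and the starting input_ids are assumed to exist already; MindSpore's own ops are used to keep the sketch self-contained):

```python
import mindspore
from mindspore import ops as msops

def greedy_decode(model, input_ids: mindspore.Tensor, max_new_tokens: int = 20) -> mindspore.Tensor:
    """Hypothetical greedy loop built on prepare_inputs_for_generation and forward."""
    for _ in range(max_new_tokens):
        model_inputs = model.prepare_inputs_for_generation(input_ids)
        logits = model(**model_inputs, return_dict=True).logits      # (batch, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(-1).reshape(-1, 1)      # most likely next token per row
        next_token = next_token.astype(input_ids.dtype)
        input_ids = msops.cat([input_ids, next_token], axis=-1)
    return input_ids
```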

mindnlp.transformers.models.gpt.modeling_gpt.GPTLMHeadModel.set_output_embeddings(new_embeddings)

This method sets the output embeddings for the GPTLMHeadModel.

PARAMETER DESCRIPTION
self

The instance of the GPTLMHeadModel class.

TYPE: object

new_embeddings

The new embeddings to be set as the output embeddings for the model. It should be of the same type as the existing embeddings.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def set_output_embeddings(self, new_embeddings):
    """
    This method sets the output embeddings for the GPTLMHeadModel.

    Args:
        self (object): The instance of the GPTLMHeadModel class.
        new_embeddings (object):
            The new embeddings to be set as the output embeddings for the model.
            It should be of the same type as the existing embeddings.

    Returns:
        None.

    Raises:
        None
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.gpt.modeling_gpt.GPTModel

Bases: GPTPreTrainedModel

This class represents a GPT (Generative Pre-trained Transformer) model for natural language processing tasks. It inherits from the GPTPreTrainedModel class and implements the GPT architecture for generating text based on input sequences. The model includes methods for initializing embeddings, pruning heads, and forwarding the model for inference or training.

ATTRIBUTE DESCRIPTION
config

The configuration for the GPTModel, including parameters such as vocab_size, n_embd, n_positions, embd_pdrop, and n_layer.

METHOD DESCRIPTION
__init__

Initializes the GPTModel with the given configuration.

get_input_embeddings

Returns the input embeddings used in the model.

set_input_embeddings

Sets new input embeddings for the model.

_prune_heads

Prunes specified heads of the model based on the provided dictionary of layer numbers and heads to prune.

forward

Constructs the GPTModel for inference or training based on the input data and configuration.
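A minimal forward-pass sketch; model is assumed to be an already-initialized GPTModel (for example loaded with from_pretrained), and the small input below is dummy data:

```python
import numpy as np
import mindspore

input_ids = mindspore.Tensor(np.array([[3, 7, 15, 2]]), mindspore.int64)
attention_mask = mindspore.Tensor(np.array([[1, 1, 1, 1]]), mindspore.float32)

out = model(input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
            return_dict=True)
print(out.last_hidden_state.shape)   # (1, 4, n_embd)
print(len(out.hidden_states))        # n_layer + 1: embedding output plus every block
```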

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
class GPTModel(GPTPreTrainedModel):

    """
    This class represents a GPT (Generative Pre-trained Transformer) model for natural language processing tasks.
    It inherits from the GPTPreTrainedModel class and implements the GPT architecture for generating text based on
    input sequences.
    The model includes methods for initializing embeddings, pruning heads, and forwarding the model for inference
    or training.

    Attributes:
        config: The configuration for the GPTModel, including parameters such as vocab_size, n_embd, n_positions,
            embd_pdrop, and n_layer.

    Methods:
        __init__: Initializes the GPTModel with the given configuration.
        get_input_embeddings: Returns the input embeddings used in the model.
        set_input_embeddings: Sets new input embeddings for the model.
        _prune_heads: Prunes specified heads of the model based on the provided dictionary
            of layer numbers and heads to prune.
        forward: Constructs the GPTModel for inference or training based on the input data and configuration.
    """
    def __init__(self, config):
        """
        Initializes a GPTModel instance with the provided configuration.

        Args:
            self (GPTModel): The GPTModel instance to be initialized.
            config (object):
                The configuration object containing parameters for the model.

                - vocab_size (int): The size of the vocabulary.
                - n_embd (int): The embedding dimension.
                - n_positions (int): The maximum number of positions.
                - embd_pdrop (float): The dropout probability for embeddings.
                - n_layer (int): The number of layers in the model.

        Returns:
            None.

        Raises:
            TypeError: If config is not of the expected object type.
            ValueError: If any of the configuration parameters are invalid or out of range.
        """
        super().__init__(config)

        self.tokens_embed = nn.Embedding(config.vocab_size, config.n_embd)
        self.positions_embed = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(p=config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_positions, config, scale=True) for _ in range(config.n_layer)])

        self.position_ids = ops.arange(config.n_positions)
        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        This method retrieves the input embeddings from the GPTModel.

        Args:
            self: The instance of the GPTModel class.

        Returns:
            The 'tokens_embed' embedding layer used as the model's input embeddings.

        Raises:
            None.
        """
        return self.tokens_embed

    def set_input_embeddings(self, new_embeddings):
        """
        Sets the input embeddings for the GPTModel.

        Args:
            self (GPTModel): An instance of the GPTModel class.
            new_embeddings (object): The new input embeddings to be set.

        Returns:
            None.

        Raises:
            None.

        Description:
            This method allows for updating the input embeddings of the GPTModel by assigning the provided
            'new_embeddings' to the 'tokens_embed' attribute. The 'tokens_embed' attribute is used by the model during
            tokenization and embedding stages.

        The 'self' parameter refers to the current instance of the GPTModel class, while the 'new_embeddings' parameter
        represents the new input embeddings to be assigned. The 'new_embeddings' can be of any data type and should
        contain the updated embeddings.

        Note that the 'tokens_embed' attribute is expected to be updated directly by this method.
        Any existing input embeddings will be overwritten.

        Example:
            ```python
            >>> model = GPTModel()
            >>> new_embeddings = get_new_embeddings()
            >>> model.set_input_embeddings(new_embeddings)
            ```
        """
        self.tokens_embed = new_embeddings

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer}
        """
        for layer, heads in heads_to_prune.items():
            self.h[layer].attn.prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], BaseModelOutput]:
        '''
        Constructs the GPTModel.

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor of shape [batch_size, sequence_length] containing the input IDs.
            attention_mask (Optional[mindspore.Tensor]):
                The attention mask tensor of shape [batch_size, sequence_length] containing the attention mask.
            token_type_ids (Optional[mindspore.Tensor]):
                The token type IDs tensor of shape [batch_size, sequence_length] containing the token type IDs.
            position_ids (Optional[mindspore.Tensor]):
                The position IDs tensor of shape [batch_size, sequence_length] containing the position IDs.
            head_mask (Optional[mindspore.Tensor]):
                The head mask tensor of shape [num_heads] containing the head mask.
            inputs_embeds (Optional[mindspore.Tensor]):
                The inputs embeddings tensor of shape [batch_size, sequence_length, hidden_size] containing the input embeddings.
            output_attentions (Optional[bool]):
                Whether to output attentions. If not provided, it uses the value from the configuration.
            output_hidden_states (Optional[bool]):
                Whether to output hidden states. If not provided, it uses the value from the configuration.
            return_dict (Optional[bool]):
                Whether to return a BaseModelOutput instead of a tuple.
                If not provided, it uses the value from the configuration.

        Returns:
            Union[Tuple[mindspore.Tensor], BaseModelOutput]:
                The output of the GPTModel.

                - If return_dict is False, it returns a tuple containing the hidden states, all hidden states,
                and all attentions.
                - If return_dict is True, it returns a BaseModelOutput with last_hidden_state, hidden_states,
                and attentions.

        Raises:
            ValueError: If both input_ids and inputs_embeds are specified at the same time.
            ValueError: If neither input_ids nor inputs_embeds are specified.
        '''
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is not None:
            self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
            input_shape = input_ids.shape
            input_ids = input_ids.view(-1, input_shape[-1])
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if position_ids is None:
            # Code is different from when we had a single embedding matrix  from position and token embeddings
            position_ids = self.position_ids[None, : input_shape[-1]]

        # Attention mask.
        if attention_mask is not None:
            # We create a 3D attention mask from a 2D tensor mask.
            # Sizes are [batch_size, 1, 1, to_seq_length]
            # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
            # this attention mask is more simple than the triangular masking of causal attention
            # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
            attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)

            # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
            # masked positions, this operation will create a tensor which is 0.0 for
            # positions we want to attend and the dtype's smallest value for masked positions.
            # Since we are adding it to the raw scores before the softmax, this is
            # effectively the same as removing these entirely.
            attention_mask = attention_mask.to(dtype=next(self.get_parameters()).dtype)  # fp16 compatibility
            attention_mask = (1.0 - attention_mask) * mindspore.tensor(
                np.finfo(mindspore.dtype_to_nptype(self.dtype)).min, attention_mask.dtype)

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.n_layer)

        if inputs_embeds is None:
            inputs_embeds = self.tokens_embed(input_ids)
        position_embeds = self.positions_embed(position_ids)
        if token_type_ids is not None:
            token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1])
            token_type_embeds = self.tokens_embed(token_type_ids)
        else:
            token_type_embeds = 0
        hidden_states = inputs_embeds + position_embeds + token_type_embeds
        hidden_states = self.drop(hidden_states)

        output_shape = input_shape + (hidden_states.shape[-1],)

        all_attentions = () if output_attentions else None
        all_hidden_states = () if output_hidden_states else None
        for i, block in enumerate(self.h):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            outputs = block(hidden_states, attention_mask, head_mask[i], output_attentions=output_attentions)
            hidden_states = outputs[0]
            if output_attentions:
                all_attentions = all_attentions + (outputs[1],)

        hidden_states = hidden_states.view(*output_shape)
        # Add last layer
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)

        return BaseModelOutput(
            last_hidden_state=hidden_states,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
        )

mindnlp.transformers.models.gpt.modeling_gpt.GPTModel.__init__(config)

Initializes a GPTModel instance with the provided configuration.

PARAMETER DESCRIPTION
self

The GPTModel instance to be initialized.

TYPE: GPTModel

config

The configuration object containing parameters for the model.

  • vocab_size (int): The size of the vocabulary.
  • n_embd (int): The embedding dimension.
  • n_positions (int): The maximum number of positions.
  • embd_pdrop (float): The dropout probability for embeddings.
  • n_layer (int): The number of layers in the model.

TYPE: object

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If config is not of the expected object type.

ValueError

If any of the configuration parameters are invalid or out of range.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def __init__(self, config):
    """
    Initializes a GPTModel instance with the provided configuration.

    Args:
        self (GPTModel): The GPTModel instance to be initialized.
        config (object):
            The configuration object containing parameters for the model.

            - vocab_size (int): The size of the vocabulary.
            - n_embd (int): The embedding dimension.
            - n_positions (int): The maximum number of positions.
            - embd_pdrop (float): The dropout probability for embeddings.
            - n_layer (int): The number of layers in the model.

    Returns:
        None.

    Raises:
        TypeError: If config is not of the expected object type.
        ValueError: If any of the configuration parameters are invalid or out of range.
    """
    super().__init__(config)

    self.tokens_embed = nn.Embedding(config.vocab_size, config.n_embd)
    self.positions_embed = nn.Embedding(config.n_positions, config.n_embd)
    self.drop = nn.Dropout(p=config.embd_pdrop)
    self.h = nn.ModuleList([Block(config.n_positions, config, scale=True) for _ in range(config.n_layer)])

    self.position_ids = ops.arange(config.n_positions)
    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.gpt.modeling_gpt.GPTModel.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the GPTModel.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input tensor of shape [batch_size, sequence_length] containing the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The attention mask tensor of shape [batch_size, sequence_length] containing the attention mask.

TYPE: Optional[Tensor] DEFAULT: None

token_type_ids

The token type IDs tensor of shape [batch_size, sequence_length] containing the token type IDs.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The position IDs tensor of shape [batch_size, sequence_length] containing the position IDs.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor of shape [num_heads] containing the head mask.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The inputs embeddings tensor of shape [batch_size, sequence_length, hidden_size] containing the input embeddings.

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output attentions. If not provided, it uses the value from the configuration.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output hidden states. If not provided, it uses the value from the configuration.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a BaseModelOutput instead of a tuple. If not provided, it uses the value from the configuration.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutput]

Union[Tuple[mindspore.Tensor], BaseModelOutput]: The output of the GPTModel.

  • If return_dict is False, it returns a tuple containing the hidden states, all hidden states, and all attentions.
  • If return_dict is True, it returns a BaseModelOutput with last_hidden_state, hidden_states, and attentions.
RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified at the same time.

ValueError

If neither input_ids nor inputs_embeds are specified.
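The additive mask described in the source comments turns a 2D padding mask into a 4D bias added to the raw attention scores: positions to attend contribute 0, masked positions contribute the dtype's minimum value. A numpy sketch of the same arithmetic:

```python
import numpy as np

attention_mask = np.array([[1, 1, 1, 0]], dtype=np.float32)   # 1 = attend, 0 = padding

# [batch_size, seq_len] -> [batch_size, 1, 1, seq_len] so it broadcasts over heads and query positions.
extended = attention_mask[:, None, None, :]
extended = (1.0 - extended) * np.finfo(np.float32).min

print(extended.shape)      # (1, 1, 1, 4)
print(extended[0, 0, 0])   # attended positions stay ~0, the padded position becomes ~ -3.4e38
```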

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], BaseModelOutput]:
    '''
    Constructs the GPTModel.

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor of shape [batch_size, sequence_length] containing the input IDs.
        attention_mask (Optional[mindspore.Tensor]):
            The attention mask tensor of shape [batch_size, sequence_length] containing the attention mask.
        token_type_ids (Optional[mindspore.Tensor]):
            The token type IDs tensor of shape [batch_size, sequence_length] containing the token type IDs.
        position_ids (Optional[mindspore.Tensor]):
            The position IDs tensor of shape [batch_size, sequence_length] containing the position IDs.
        head_mask (Optional[mindspore.Tensor]):
            The head mask tensor of shape [num_heads] containing the head mask.
        inputs_embeds (Optional[mindspore.Tensor]):
            The inputs embeddings tensor of shape [batch_size, sequence_length, hidden_size] containing the input embeddings.
        output_attentions (Optional[bool]):
            Whether to output attentions. If not provided, it uses the value from the configuration.
        output_hidden_states (Optional[bool]):
            Whether to output hidden states. If not provided, it uses the value from the configuration.
        return_dict (Optional[bool]):
            Whether to return a BaseModelOutput instead of a tuple.
            If not provided, it uses the value from the configuration.

    Returns:
        Union[Tuple[mindspore.Tensor], BaseModelOutput]:
            The output of the GPTModel.

            - If return_dict is False, it returns a tuple containing the hidden states, all hidden states,
            and all attentions.
            - If return_dict is True, it returns a BaseModelOutput with last_hidden_state, hidden_states,
            and attentions.

    Raises:
        ValueError: If both input_ids and inputs_embeds are specified at the same time.
        ValueError: If neither input_ids nor inputs_embeds are specified.
    '''
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is not None:
        self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
        input_shape = input_ids.shape
        input_ids = input_ids.view(-1, input_shape[-1])
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.shape[:-1]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    if position_ids is None:
        # Code is different from when we had a single embedding matrix  from position and token embeddings
        position_ids = self.position_ids[None, : input_shape[-1]]

    # Attention mask.
    if attention_mask is not None:
        # We create a 3D attention mask from a 2D tensor mask.
        # Sizes are [batch_size, 1, 1, to_seq_length]
        # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
        # this attention mask is more simple than the triangular masking of causal attention
        # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
        attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)

        # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
        # masked positions, this operation will create a tensor which is 0.0 for
        # positions we want to attend and the dtype's smallest value for masked positions.
        # Since we are adding it to the raw scores before the softmax, this is
        # effectively the same as removing these entirely.
        attention_mask = attention_mask.to(dtype=next(self.get_parameters()).dtype)  # fp16 compatibility
        attention_mask = (1.0 - attention_mask) * mindspore.tensor(
            np.finfo(mindspore.dtype_to_nptype(self.dtype)).min, attention_mask.dtype)

    # Prepare head mask if needed
    head_mask = self.get_head_mask(head_mask, self.config.n_layer)

    if inputs_embeds is None:
        inputs_embeds = self.tokens_embed(input_ids)
    position_embeds = self.positions_embed(position_ids)
    if token_type_ids is not None:
        token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1])
        token_type_embeds = self.tokens_embed(token_type_ids)
    else:
        token_type_embeds = 0
    hidden_states = inputs_embeds + position_embeds + token_type_embeds
    hidden_states = self.drop(hidden_states)

    output_shape = input_shape + (hidden_states.shape[-1],)

    all_attentions = () if output_attentions else None
    all_hidden_states = () if output_hidden_states else None
    for i, block in enumerate(self.h):
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        outputs = block(hidden_states, attention_mask, head_mask[i], output_attentions=output_attentions)
        hidden_states = outputs[0]
        if output_attentions:
            all_attentions = all_attentions + (outputs[1],)

    hidden_states = hidden_states.view(*output_shape)
    # Add last layer
    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)

    return BaseModelOutput(
        last_hidden_state=hidden_states,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
    )
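
To make the additive-mask step in the forward code above concrete, here is a small NumPy-only sketch (values are arbitrary) of how a 2D padding mask is broadcast to 4D and turned into a large negative bias that is added to the raw attention scores before the softmax:

Example
>>> import numpy as np
>>> mask = np.array([[1, 1, 1, 0]], dtype=np.float32)      # 1 = attend, 0 = padding
>>> mask = mask[:, None, None, :]                          # [batch_size, 1, 1, to_seq_length]
>>> additive = (1.0 - mask) * np.finfo(np.float32).min     # 0.0 where attended, dtype minimum where masked
>>> additive.shape                                         # (1, 1, 1, 4), broadcastable over heads and query positions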

mindnlp.transformers.models.gpt.modeling_gpt.GPTModel.get_input_embeddings()

This method retrieves the input embeddings from the GPTModel.

PARAMETER DESCRIPTION
self

The instance of the GPTModel class.

RETURNS DESCRIPTION
nn.Embedding

The tokens_embed embedding layer that maps input token IDs to embeddings.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 597-610
def get_input_embeddings(self):
    """
    This method retrieves the input embeddings from the GPTModel.

    Args:
        self: The instance of the GPTModel class.

    Returns:
        nn.Embedding: The tokens_embed embedding layer of the model.

    Raises:
        None.
    """
    return self.tokens_embed

mindnlp.transformers.models.gpt.modeling_gpt.GPTModel.set_input_embeddings(new_embeddings)

Sets the input embeddings for the GPTModel.

PARAMETER DESCRIPTION
self

An instance of the GPTModel class.

TYPE: GPTModel

new_embeddings

The new input embeddings to be set.

TYPE: object

RETURNS DESCRIPTION

None.

Description

This method updates the input embeddings of the GPTModel by assigning the provided 'new_embeddings' to the 'tokens_embed' attribute, which the model uses to embed input token IDs.

The 'new_embeddings' parameter represents the new input embeddings to be assigned and is typically an embedding module such as nn.Embedding.

Note that the 'tokens_embed' attribute is overwritten directly by this method; any existing input embeddings are replaced.

Example
>>> model = GPTModel()
>>> new_embeddings = get_new_embeddings()
>>> model.set_input_embeddings(new_embeddings)
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 612-645
def set_input_embeddings(self, new_embeddings):
    """
    Sets the input embeddings for the GPTModel.

    Args:
        self (GPTModel): An instance of the GPTModel class.
        new_embeddings (object): The new input embeddings to be set.

    Returns:
        None.

    Raises:
        None.

    Description:
        This method allows for updating the input embeddings of the GPTModel by assigning the provided
        'new_embeddings' to the 'tokens_embed' attribute. The 'tokens_embed' attribute is used by the model to
        embed the input token IDs.

    The 'self' parameter refers to the current instance of the GPTModel class, while the 'new_embeddings' parameter
    represents the new input embeddings to be assigned. The 'new_embeddings' can be of any data type and should
    contain the updated embeddings.

    Note that the 'tokens_embed' attribute is expected to be updated directly by this method.
    Any existing input embeddings will be overwritten.

    Example:
        ```python
        >>> model = GPTModel()
        >>> new_embeddings = get_new_embeddings()
        >>> model.set_input_embeddings(new_embeddings)
        ```
    """
    self.tokens_embed = new_embeddings

mindnlp.transformers.models.gpt.modeling_gpt.GPTPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
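
A brief loading sketch of the interface this base class provides; it assumes GPTModel is exported from mindnlp.transformers and that the openai-gpt checkpoint can be downloaded from the configured hub or mirror.

Example
>>> from mindnlp.transformers import GPTModel
>>> model = GPTModel.from_pretrained("openai-gpt")   # downloads weights and config, then builds the model
>>> model.config.n_layer                             # configuration values are attached to the loaded model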

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 477-504
class GPTPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = GPTConfig
    base_model_prefix = "transformer"
    _keys_to_ignore_on_load_unexpected = [r'attn.bias']

    def _init_weights(self, cell):
        """Initialize the weights"""
        if isinstance(cell, (nn.Linear, Conv1D)):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            cell.weight.set_data(initializer(Normal(self.config.initializer_range),
                                                    cell.weight.shape, cell.weight.dtype))
            if cell.bias is not None:
                cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))
        elif isinstance(cell, nn.Embedding):
            weight = initializer(Normal(self.config.initializer_range),
                                                 cell.weight.shape,
                                                 cell.weight.dtype)
            if cell.padding_idx is not None:
                weight[cell.padding_idx] = 0
            cell.weight.set_data(weight)
        elif isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

mindnlp.transformers.models.gpt.modeling_gpt.MLP

Bases: Module

MLP is a class that represents a multi-layer perceptron (MLP) model.

MLP is a neural network model that consists of multiple layers of perceptrons or artificial neurons. Each layer is fully connected to the next layer, and the final layer produces the output. The MLP class inherits from the nn.Module class, the base class for all neural network modules in mindnlp.

The MLP class has the following attributes:

  • n_state: an integer representing the number of output channels in the first convolutional layer.
  • config: an object containing various configuration parameters for the MLP model.

The MLP class has the following methods:

  • __init__(self, n_state, config): Initializes the MLP object. It takes two parameters: n_state, which represents the number of output channels in the first convolutional layer, and config, which is an object containing configuration parameters for the MLP model. Inside the method, it initializes the parent class (nn.Module), sets the number of input channels (nx) to the value specified in the config, creates a 1-dimensional convolutional layer (self.c_fc) with n_state output channels and nx input channels, creates another 1-dimensional convolutional layer (self.c_proj) with nx output channels and n_state input channels, sets the activation function (self.act) to the value specified in the config, and sets the dropout probability (self.dropout) to the value specified in the config.
  • forward(self, x): Constructs the MLP model. It takes one parameter, x, which represents the input tensor. Inside the method, it applies the activation function to the output of the first convolutional layer (self.c_fc), applies the second convolutional layer (self.c_proj) to the result, and returns the output after applying dropout (self.dropout).
Note

The MLP class assumes the existence of the ACT_FNS dictionary, which maps activation function names to their corresponding functions.

Example
>>> config = GPTConfig(n_embd=128, afn='relu', resid_pdrop=0.2)
>>> mlp = MLP(n_state=64, config=config)
>>> input_tensor = mindspore.ops.randn(10, 128)
>>> output = mlp(input_tensor)
Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 291-370
class MLP(nn.Module):

    """
    MLP is a class that represents a multi-layer perceptron (MLP) model.

    MLP is a neural network model that consists of multiple layers of perceptrons or artificial neurons.
    Each layer is fully connected to the next layer, and the final layer produces the output. The MLP class
    inherits from the nn.Module class, the base class for all neural network modules in mindnlp.

    The MLP class has the following attributes:

    - n_state: an integer representing the number of output channels in the first convolutional layer.
    - config: an object containing various configuration parameters for the MLP model.

    The MLP class has the following methods:

    - __init__(self, n_state, config): Initializes the MLP object. It takes two parameters: n_state, which represents
    the number of output channels in the first convolutional layer, and config, which is an object containing
    configuration parameters for the MLP model.
    Inside the method, it initializes the parent class (nn.Module), sets the number of input channels (nx) to the value
    specified in the config, creates a 1-dimensional convolutional layer (self.c_fc) with n_state output channels and
    nx input channels, creates another 1-dimensional convolutional layer (self.c_proj) with nx output channels and
    n_state input channels, sets the activation function (self.act) to the value specified in the config, and sets the
    dropout probability (self.dropout) to the value specified in the config.
    - forward(self, x): Constructs the MLP model. It takes one parameter, x, which represents the input tensor.
    Inside the method, it applies the activation function to the output of the first convolutional layer (self.c_fc),
    applies the second convolutional layer (self.c_proj) to the result, and returns the output after applying dropout
    (self.dropout).

    Note:
        The MLP class assumes the existence of the ACT_FNS dictionary, which maps activation function names to their
        corresponding functions.

    Example:
        ```python
        >>> config = GPTConfig(n_embd=128, afn='relu', resid_pdrop=0.2)
        >>> mlp = MLP(n_state=64, config=config)
        >>> input_tensor = mindspore.ops.randn(10, 128)
        >>> output = mlp(input_tensor)
        ```
    """
    def __init__(self, n_state, config):  # in MLP: n_state=3072 (4 * n_embd)
        """
        Initializes an instance of the MLP class.

        Args:
            self: The instance of the MLP class.
            n_state (int): Number of states for the MLP.
            config (object): Configuration object containing parameters for the MLP.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        nx = config.n_embd
        self.c_fc = Conv1D(n_state, nx)
        self.c_proj = Conv1D(nx, n_state)
        self.act = ACT_FNS[config.afn]
        self.dropout = nn.Dropout(p=config.resid_pdrop)

    def forward(self, x):
        """
        Constructs the output of the Multi-Layer Perceptron (MLP) model.

        Args:
            self (MLP): The instance of the MLP class.
            x (tensor): The input tensor to be processed by the MLP.

        Returns:
            The forwarded output tensor after passing through the MLP layers.

        Raises:
            None.
        """
        h = self.act(self.c_fc(x))
        h2 = self.c_proj(h)
        return self.dropout(h2)

mindnlp.transformers.models.gpt.modeling_gpt.MLP.__init__(n_state, config)

Initializes an instance of the MLP class.

PARAMETER DESCRIPTION
self

The instance of the MLP class.

n_state

Number of states for the MLP.

TYPE: int

config

Configuration object containing parameters for the MLP.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 332-352
def __init__(self, n_state, config):  # in MLP: n_state=3072 (4 * n_embd)
    """
    Initializes an instance of the MLP class.

    Args:
        self: The instance of the MLP class.
        n_state (int): Number of states for the MLP.
        config (object): Configuration object containing parameters for the MLP.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    nx = config.n_embd
    self.c_fc = Conv1D(n_state, nx)
    self.c_proj = Conv1D(nx, n_state)
    self.act = ACT_FNS[config.afn]
    self.dropout = nn.Dropout(p=config.resid_pdrop)

mindnlp.transformers.models.gpt.modeling_gpt.MLP.forward(x)

Constructs the output of the Multi-Layer Perceptron (MLP) model.

PARAMETER DESCRIPTION
self

The instance of the MLP class.

TYPE: MLP

x

The input tensor to be processed by the MLP.

TYPE: tensor

RETURNS DESCRIPTION

The forwarded output tensor after passing through the MLP layers.

Source code in mindnlp/transformers/models/gpt/modeling_gpt.py, lines 354-370
def forward(self, x):
    """
    Constructs the output of the Multi-Layer Perceptron (MLP) model.

    Args:
        self (MLP): The instance of the MLP class.
        x (tensor): The input tensor to be processed by the MLP.

    Returns:
        The forwarded output tensor after passing through the MLP layers.

    Raises:
        None.
    """
    h = self.act(self.c_fc(x))
    h2 = self.c_proj(h)
    return self.dropout(h2)

mindnlp.transformers.models.gpt.configuration_gpt

OpenAI GPT configuration

mindnlp.transformers.models.gpt.configuration_gpt.GPTConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [OpenAIGPTModel] or a [TFOpenAIGPTModel]. It is used to instantiate a GPT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the GPT openai-gpt architecture from OpenAI.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the GPT model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [OpenAIGPTModel] or [TFOpenAIGPTModel].

TYPE: `int`, *optional*, defaults to 40478 DEFAULT: 40478

n_positions

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

n_embd

Dimensionality of the embeddings and hidden states.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

n_layer

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

n_head

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

afn

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `Callable`, *optional*, defaults to `"gelu"` DEFAULT: 'gelu'

resid_pdrop

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

embd_pdrop

The dropout ratio for the embeddings.

TYPE: `int`, *optional*, defaults to 0.1 DEFAULT: 0.1

attn_pdrop

The dropout ratio for the attention.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

layer_norm_epsilon

The epsilon to use in the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-05 DEFAULT: 1e-05

initializer_range

The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

summary_type

Argument used when doing sequence summary, used in the models [OpenAIGPTDoubleHeadsModel] and [OpenAIGPTDoubleHeadsModel]. Has to be one of the following options:

  • "last": Take the last token hidden state (like XLNet).
  • "first": Take the first token hidden state (like BERT).
  • "mean": Take the mean of all tokens hidden states.
  • "cls_index": Supply a Tensor of classification token position (like GPT/GPT-2).
  • "attn": Not implemented now, use multi-head attention.

TYPE: `str`, *optional*, defaults to `"cls_index"` DEFAULT: 'cls_index'

summary_use_proj

Argument used when doing sequence summary, used in the models [OpenAIGPTDoubleHeadsModel] and [OpenAIGPTDoubleHeadsModel].

Whether or not to add a projection after the vector extraction.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

summary_activation

Argument used when doing sequence summary, used in the models [OpenAIGPTDoubleHeadsModel] and [OpenAIGPTDoubleHeadsModel].

Pass "tanh" for a tanh activation to the output, any other value will result in no activation.

TYPE: `str`, *optional* DEFAULT: None

summary_proj_to_labels

Argument used when doing sequence summary, used in the models [OpenAIGPTDoubleHeadsModel] and [OpenAIGPTDoubleHeadsModel].

Whether the projection outputs should have config.num_labels or config.hidden_size classes.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

summary_first_dropout

Argument used when doing sequence summary, used in the models [OpenAIGPTDoubleHeadsModel] and [OpenAIGPTDoubleHeadsModel].

The dropout ratio to be used after the projection and activation.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

Example
>>> from transformers import OpenAIGPTConfig, OpenAIGPTModel
...
>>> # Initializing a GPT configuration
>>> configuration = OpenAIGPTConfig()
...
>>> # Initializing a model (with random weights) from the configuration
>>> model = OpenAIGPTModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp/transformers/models/gpt/configuration_gpt.py, lines 27-181
class GPTConfig(PretrainedConfig):
    """
    This is the configuration class to store the configuration of a [`OpenAIGPTModel`] or a [`TFOpenAIGPTModel`]. It is
    used to instantiate a GPT model according to the specified arguments, defining the model architecture.
    Instantiating a configuration with the defaults will yield a similar configuration to that of the GPT
    [openai-gpt](https://hf-mirror.com/openai-gpt) architecture from OpenAI.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 40478):
            Vocabulary size of the GPT model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`OpenAIGPTModel`] or [`TFOpenAIGPTModel`].
        n_positions (`int`, *optional*, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        n_embd (`int`, *optional*, defaults to 768):
            Dimensionality of the embeddings and hidden states.
        n_layer (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        n_head (`int`, *optional*, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        afn (`str` or `Callable`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        resid_pdrop (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        embd_pdrop (`int`, *optional*, defaults to 0.1):
            The dropout ratio for the embeddings.
        attn_pdrop (`float`, *optional*, defaults to 0.1):
            The dropout ratio for the attention.
        layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
            The epsilon to use in the layer normalization layers
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        summary_type (`str`, *optional*, defaults to `"cls_index"`):
            Argument used when doing sequence summary, used in the models [`OpenAIGPTDoubleHeadsModel`] and
            [`OpenAIGPTDoubleHeadsModel`].
            Has to be one of the following options:

            - `"last"`: Take the last token hidden state (like XLNet).
            - `"first"`: Take the first token hidden state (like BERT).
            - `"mean"`: Take the mean of all tokens hidden states.
            - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
            - `"attn"`: Not implemented now, use multi-head attention.
        summary_use_proj (`bool`, *optional*, defaults to `True`):
            Argument used when doing sequence summary, used in the models [`OpenAIGPTDoubleHeadsModel`] and
            [`OpenAIGPTDoubleHeadsModel`].

            Whether or not to add a projection after the vector extraction.
        summary_activation (`str`, *optional*):
            Argument used when doing sequence summary, used in the models [`OpenAIGPTDoubleHeadsModel`] and
            [`OpenAIGPTDoubleHeadsModel`].

            Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
        summary_proj_to_labels (`bool`, *optional*, defaults to `True`):
            Argument used when doing sequence summary, used in the models [`OpenAIGPTDoubleHeadsModel`] and
            [`OpenAIGPTDoubleHeadsModel`].

            Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
        summary_first_dropout (`float`, *optional*, defaults to 0.1):
            Argument used when doing sequence summary, used in the models [`OpenAIGPTDoubleHeadsModel`] and
            [`OpenAIGPTDoubleHeadsModel`].

            The dropout ratio to be used after the projection and activation.


    Example:
        ```python
        >>> from transformers import OpenAIGPTConfig, OpenAIGPTModel
        ...
        >>> # Initializing a GPT configuration
        >>> configuration = OpenAIGPTConfig()
        ...
        >>> # Initializing a model (with random weights) from the configuration
        >>> model = OpenAIGPTModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "openai-gpt"
    attribute_map = {
        "max_position_embeddings": "n_positions",
        "hidden_size": "n_embd",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    def __init__(
        self,
        vocab_size=40478,
        n_positions=512,
        n_embd=768,
        n_layer=12,
        n_head=12,
        afn="gelu",
        resid_pdrop=0.1,
        embd_pdrop=0.1,
        attn_pdrop=0.1,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        summary_type="cls_index",
        summary_use_proj=True,
        summary_activation=None,
        summary_proj_to_labels=True,
        summary_first_dropout=0.1,
        **kwargs,
    ):
        """
        Initializes a GPTConfig object with the provided parameters.

        Args:
            vocab_size (int): The size of the vocabulary.
            n_positions (int): The number of positions.
            n_embd (int): The embedding dimension.
            n_layer (int): The number of layers.
            n_head (int): The number of attention heads.
            afn (str): The activation function to be used.
            resid_pdrop (float): The dropout probability for residual connections.
            embd_pdrop (float): The dropout probability for the embeddings.
            attn_pdrop (float): The dropout probability for attention layers.
            layer_norm_epsilon (float): The epsilon value for layer normalization.
            initializer_range (float): The range of the initializer.
            summary_type (str): The type of summary to be used.
            summary_use_proj (bool): Whether to use projection in summary.
            summary_activation (str): The activation function for the summary.
            summary_proj_to_labels (bool): Whether to project the summary to labels.
            summary_first_dropout (float): The dropout probability for the first layer of the summary.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None
        """
        self.vocab_size = vocab_size
        self.n_positions = n_positions
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head
        self.afn = afn
        self.resid_pdrop = resid_pdrop
        self.embd_pdrop = embd_pdrop
        self.attn_pdrop = attn_pdrop
        self.layer_norm_epsilon = layer_norm_epsilon
        self.initializer_range = initializer_range
        self.summary_type = summary_type
        self.summary_use_proj = summary_use_proj
        self.summary_activation = summary_activation
        self.summary_first_dropout = summary_first_dropout
        self.summary_proj_to_labels = summary_proj_to_labels
        super().__init__(**kwargs)

mindnlp.transformers.models.gpt.configuration_gpt.GPTConfig.__init__(vocab_size=40478, n_positions=512, n_embd=768, n_layer=12, n_head=12, afn='gelu', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, summary_type='cls_index', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, **kwargs)

Initializes a GPTConfig object with the provided parameters.

PARAMETER DESCRIPTION
vocab_size

The size of the vocabulary.

TYPE: int DEFAULT: 40478

n_positions

The number of positions.

TYPE: int DEFAULT: 512

n_embd

The embedding dimension.

TYPE: int DEFAULT: 768

n_layer

The number of layers.

TYPE: int DEFAULT: 12

n_head

The number of attention heads.

TYPE: int DEFAULT: 12

afn

The activation function to be used.

TYPE: str DEFAULT: 'gelu'

resid_pdrop

The dropout probability for residual connections.

TYPE: float DEFAULT: 0.1

embd_pdrop

The dropout probability for the embeddings.

TYPE: float DEFAULT: 0.1

attn_pdrop

The dropout probability for attention layers.

TYPE: float DEFAULT: 0.1

layer_norm_epsilon

The epsilon value for layer normalization.

TYPE: float DEFAULT: 1e-05

initializer_range

The range of the initializer.

TYPE: float DEFAULT: 0.02

summary_type

The type of summary to be used.

TYPE: str DEFAULT: 'cls_index'

summary_use_proj

Whether to use projection in summary.

TYPE: bool DEFAULT: True

summary_activation

The activation function for the summary.

TYPE: str DEFAULT: None

summary_proj_to_labels

Whether to project the summary to labels.

TYPE: bool DEFAULT: True

summary_first_dropout

The dropout probability for the first layer of the summary.

TYPE: float DEFAULT: 0.1

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.
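
As a small sketch of how these arguments interact with the attribute_map aliases defined on the class (the values below are illustrative, not required defaults; it assumes GPTConfig is exported from mindnlp.transformers):

Example
>>> from mindnlp.transformers import GPTConfig
>>> config = GPTConfig(n_embd=256, n_layer=6, n_head=8, resid_pdrop=0.2)
>>> config.hidden_size               # resolves to n_embd (256) through attribute_map
>>> config.num_hidden_layers         # resolves to n_layer (6)
>>> config.max_position_embeddings   # resolves to n_positions (512 by default)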

Source code in mindnlp/transformers/models/gpt/configuration_gpt.py, lines 117-181
def __init__(
    self,
    vocab_size=40478,
    n_positions=512,
    n_embd=768,
    n_layer=12,
    n_head=12,
    afn="gelu",
    resid_pdrop=0.1,
    embd_pdrop=0.1,
    attn_pdrop=0.1,
    layer_norm_epsilon=1e-5,
    initializer_range=0.02,
    summary_type="cls_index",
    summary_use_proj=True,
    summary_activation=None,
    summary_proj_to_labels=True,
    summary_first_dropout=0.1,
    **kwargs,
):
    """
    Initializes a GPTConfig object with the provided parameters.

    Args:
        vocab_size (int): The size of the vocabulary.
        n_positions (int): The number of positions.
        n_embd (int): The embedding dimension.
        n_layer (int): The number of layers.
        n_head (int): The number of attention heads.
        afn (str): The activation function to be used.
        resid_pdrop (float): The dropout probability for residual connections.
        embd_pdrop (float): The dropout probability for the embeddings.
        attn_pdrop (float): The dropout probability for attention layers.
        layer_norm_epsilon (float): The epsilon value for layer normalization.
        initializer_range (float): The range of the initializer.
        summary_type (str): The type of summary to be used.
        summary_use_proj (bool): Whether to use projection in summary.
        summary_activation (str): The activation function for the summary.
        summary_proj_to_labels (bool): Whether to project the summary to labels.
        summary_first_dropout (float): The dropout probability for the first layer of the summary.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None
    """
    self.vocab_size = vocab_size
    self.n_positions = n_positions
    self.n_embd = n_embd
    self.n_layer = n_layer
    self.n_head = n_head
    self.afn = afn
    self.resid_pdrop = resid_pdrop
    self.embd_pdrop = embd_pdrop
    self.attn_pdrop = attn_pdrop
    self.layer_norm_epsilon = layer_norm_epsilon
    self.initializer_range = initializer_range
    self.summary_type = summary_type
    self.summary_use_proj = summary_use_proj
    self.summary_activation = summary_activation
    self.summary_first_dropout = summary_first_dropout
    self.summary_proj_to_labels = summary_proj_to_labels
    super().__init__(**kwargs)

mindnlp.transformers.models.gpt.tokenization_gpt

Tokenization classes for OpenAI GPT.

mindnlp.transformers.models.gpt.tokenization_gpt.BasicTokenizer

Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

PARAMETER DESCRIPTION
do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

never_split

Collection of tokens which will never be split during tokenization. Only has an effect when do_basic_tokenize=True

TYPE: `Iterable`, *optional* DEFAULT: None

tokenize_chinese_chars

Whether or not to tokenize Chinese characters.

This should likely be deactivated for Japanese (see this issue: https://github.com/huggingface/transformers/issues/328).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

strip_accents

Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for lowercase (as in the original BERT).

TYPE: `bool`, *optional* DEFAULT: None

do_split_on_punc

In some instances we want to skip the basic punctuation splitting so that later tokenization can capture the full context of the words, such as contractions.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True
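
A minimal usage sketch of these options; the sentence is arbitrary, and never_split is used here to keep the contraction from being split on punctuation.

Example
>>> from mindnlp.transformers.models.gpt.tokenization_gpt import BasicTokenizer
>>> tokenizer = BasicTokenizer(do_lower_case=True)
>>> tokenizer.tokenize("Hello, World! Don't split me.", never_split=["Don't"])
>>> # expected: ['hello', ',', 'world', '!', "Don't", 'split', 'me', '.']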

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py, lines 55-230
class BasicTokenizer():
    """
    Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

    Args:
        do_lower_case (`bool`, *optional*, defaults to `True`):
            Whether or not to lowercase the input when tokenizing.
        never_split (`Iterable`, *optional*):
            Collection of tokens which will never be split during tokenization. Only has an effect when
            `do_basic_tokenize=True`
        tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
            Whether or not to tokenize Chinese characters.

            This should likely be deactivated for Japanese (see this
            [issue](https://github.com/huggingface/transformers/issues/328)).
        strip_accents (`bool`, *optional*):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for `lowercase` (as in the original BERT).
        do_split_on_punc (`bool`, *optional*, defaults to `True`):
            In some instances we want to skip the basic punctuation splitting so that later tokenization can capture
            the full context of the words, such as contractions.
    """
    def __init__(
        self,
        do_lower_case=True,
        never_split=None,
        tokenize_chinese_chars=True,
        strip_accents=None,
        do_split_on_punc=True,
    ):
        """
        Initializes an instance of the BasicTokenizer class.

        Args:
            self (object): The instance of the BasicTokenizer class.
            do_lower_case (bool, optional): Indicates whether text should be converted to lowercase. Default is True.
            never_split (list, optional): List of tokens that should never be split. Default is an empty list.
            tokenize_chinese_chars (bool, optional): Indicates whether Chinese characters should be tokenized.
                Default is True.
            strip_accents (None or str, optional): Specifies the type of accents to remove. Default is None.
            do_split_on_punc (bool, optional): Indicates whether to split on punctuation. Default is True.

        Returns:
            None.

        Raises:
            None.
        """
        if never_split is None:
            never_split = []
        self.do_lower_case = do_lower_case
        self.never_split = set(never_split)
        self.tokenize_chinese_chars = tokenize_chinese_chars
        self.strip_accents = strip_accents
        self.do_split_on_punc = do_split_on_punc

    def tokenize(self, text, never_split=None):
        """
        Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

        Args:
            never_split (`List[str]`, *optional*):
                List of tokens not to split. Kept for backward compatibility purposes; now implemented directly at
                the base class level (see [`PreTrainedTokenizer.tokenize`]).
        """
        # union() returns a new set by concatenating the two sets.
        never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        if self.tokenize_chinese_chars:
            text = self._tokenize_chinese_chars(text)
        # prevents treating the same character with different unicode codepoints as different characters
        unicode_normalized_text = unicodedata.normalize("NFC", text)
        orig_tokens = whitespace_tokenize(unicode_normalized_text)
        split_tokens = []
        for token in orig_tokens:
            if token not in never_split:
                if self.do_lower_case:
                    token = token.lower()
                    if self.strip_accents is not False:
                        token = self._run_strip_accents(token)
                elif self.strip_accents:
                    token = self._run_strip_accents(token)
            split_tokens.extend(self._run_split_on_punc(token, never_split))

        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize("NFD", text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == "Mn":
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text, never_split=None):
        """Splits punctuation on a piece of text."""
        if not self.do_split_on_punc or (never_split is not None and text in never_split):
            return [text]
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1

        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like the all of the other languages.
        if (
            (cp >= 0x4E00 and cp <= 0x9FFF)
            or (cp >= 0x3400 and cp <= 0x4DBF)  #
            or (cp >= 0x20000 and cp <= 0x2A6DF)  #
            or (cp >= 0x2A700 and cp <= 0x2B73F)  #
            or (cp >= 0x2B740 and cp <= 0x2B81F)  #
            or (cp >= 0x2B820 and cp <= 0x2CEAF)  #
            or (cp >= 0xF900 and cp <= 0xFAFF)
            or (cp >= 0x2F800 and cp <= 0x2FA1F)  #
        ):  #
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xFFFD or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

mindnlp.transformers.models.gpt.tokenization_gpt.BasicTokenizer.__init__(do_lower_case=True, never_split=None, tokenize_chinese_chars=True, strip_accents=None, do_split_on_punc=True)

Initializes an instance of the BasicTokenizer class.

PARAMETER DESCRIPTION
self

The instance of the BasicTokenizer class.

TYPE: object

do_lower_case

Indicates whether text should be converted to lowercase. Default is True.

TYPE: bool DEFAULT: True

never_split

List of tokens that should never be split. Default is an empty list.

TYPE: list DEFAULT: None

tokenize_chinese_chars

Indicates whether Chinese characters should be tokenized. Default is True.

TYPE: bool DEFAULT: True

strip_accents

Specifies the type of accents to remove. Default is None.

TYPE: None or str DEFAULT: None

do_split_on_punc

Indicates whether to split on punctuation. Default is True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py, lines 77-109
def __init__(
    self,
    do_lower_case=True,
    never_split=None,
    tokenize_chinese_chars=True,
    strip_accents=None,
    do_split_on_punc=True,
):
    """
    Initializes an instance of the BasicTokenizer class.

    Args:
        self (object): The instance of the BasicTokenizer class.
        do_lower_case (bool, optional): Indicates whether text should be converted to lowercase. Default is True.
        never_split (list, optional): List of tokens that should never be split. Default is an empty list.
        tokenize_chinese_chars (bool, optional): Indicates whether Chinese characters should be tokenized.
            Default is True.
        strip_accents (None or str, optional): Specifies the type of accents to remove. Default is None.
        do_split_on_punc (bool, optional): Indicates whether to split on punctuation. Default is True.

    Returns:
        None.

    Raises:
        None.
    """
    if never_split is None:
        never_split = []
    self.do_lower_case = do_lower_case
    self.never_split = set(never_split)
    self.tokenize_chinese_chars = tokenize_chinese_chars
    self.strip_accents = strip_accents
    self.do_split_on_punc = do_split_on_punc

mindnlp.transformers.models.gpt.tokenization_gpt.BasicTokenizer.tokenize(text, never_split=None)

Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py, lines 111-147
def tokenize(self, text, never_split=None):
    """
    Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

    Args:
        never_split (`List[str]`, *optional*):
            List of tokens not to split. Kept for backward compatibility purposes; now implemented directly at the
            base class level (see [`PreTrainedTokenizer.tokenize`]).
    """
    # union() returns a new set by concatenating the two sets.
    never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
    text = self._clean_text(text)

    # This was added on November 1st, 2018 for the multilingual and Chinese
    # models. This is also applied to the English models now, but it doesn't
    # matter since the English models were not trained on any Chinese data
    # and generally don't have any Chinese data in them (there are Chinese
    # characters in the vocabulary because Wikipedia does have some Chinese
    # words in the English Wikipedia.).
    if self.tokenize_chinese_chars:
        text = self._tokenize_chinese_chars(text)
    # prevents treating the same character with different unicode codepoints as different characters
    unicode_normalized_text = unicodedata.normalize("NFC", text)
    orig_tokens = whitespace_tokenize(unicode_normalized_text)
    split_tokens = []
    for token in orig_tokens:
        if token not in never_split:
            if self.do_lower_case:
                token = token.lower()
                if self.strip_accents is not False:
                    token = self._run_strip_accents(token)
            elif self.strip_accents:
                token = self._run_strip_accents(token)
        split_tokens.extend(self._run_split_on_punc(token, never_split))

    output_tokens = whitespace_tokenize(" ".join(split_tokens))
    return output_tokens

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer

Bases: PreTrainedTokenizer

Construct a GPT Tokenizer. Based on Byte-Pair-Encoding with the following peculiarities:

  • lowercases all inputs,
  • uses SpaCy tokenizer and ftfy for pre-BPE tokenization if they are installed, falling back to BERT's BasicTokenizer if not.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`

merges_file

Path to the merges file.

TYPE: `str`

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py, lines 261-523
class GPTTokenizer(PreTrainedTokenizer):
    """
    Construct a GPT Tokenizer. Based on Byte-Pair-Encoding with the following peculiarities:

    - lowercases all inputs,
    - uses `SpaCy` tokenizer and `ftfy` for pre-BPE tokenization if they are installed, fallback to BERT's
    `BasicTokenizer` if not.

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        merges_file (`str`):
            Path to the merges file.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, vocab_file, merges_file, unk_token="<unk>", **kwargs):
        """
        This method initializes an instance of the GPTTokenizer class.

        Args:
            self: The instance of the GPTTokenizer class.
            vocab_file (str): The path to the vocabulary file containing the encoder information.
            merges_file (str): The path to the file containing merge operations for Byte Pair Encoding (BPE).
            unk_token (str, optional): The token to represent unknown words. Defaults to '<unk>'.

        Returns:
            None.

        Raises:
            ImportError: If the required packages 'ftfy' or 'spacy' are not installed, an ImportError is raised.
            FileNotFoundError: If the vocab_file or merges_file is not found, a FileNotFoundError is raised.
            JSONDecodeError: If there is an issue with decoding the vocabulary file, a JSONDecodeError is raised.
            IndexError: If there is an issue with processing the merges file, an IndexError is raised.
        """
        try:
            import ftfy
            from spacy.lang.en import English

            _nlp = English()
            self.nlp = _nlp.tokenizer
            self.fix_text = ftfy.fix_text
        except ImportError:
            logger.warning("ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.")
            self.nlp = BasicTokenizer(do_lower_case=True)
            self.fix_text = None

        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)
        self.decoder = {v: k for k, v in self.encoder.items()}
        with open(merges_file, encoding="utf-8") as merges_handle:
            merges = merges_handle.read().split("\n")[1:-1]
        merges = [tuple(merge.split()) for merge in merges]
        self.bpe_ranks = dict(zip(merges, range(len(merges))))
        self.cache = {}

        super().__init__(unk_token=unk_token, **kwargs)

    @property
    def do_lower_case(self):
        """
        Whether this tokenizer lowercases its input.

        Args:
            self: An instance of the GPTTokenizer class.

        Returns:
            bool: Always `True`. The GPT tokenizer lowercases all text before applying BPE, and this read-only
            property reflects that behavior; it cannot be toggled.
        """
        return True

    @property
    def vocab_size(self):
        """
        Method to retrieve the vocabulary size of the GPTTokenizer instance.

        Args:
            self: GPTTokenizer
                The instance of GPTTokenizer for which the vocabulary size is to be determined.
                It is automatically passed when the method is called.

        Returns:
            int:
                The vocabulary size of the GPTTokenizer instance, which is the length of the encoder used by the tokenizer.

        Raises:
            None:
                This method does not raise any exceptions.
        """
        return len(self.encoder)

    def get_vocab(self):
        """
        This method returns the vocabulary of the GPTTokenizer.

        Args:
            self (GPTTokenizer): The instance of the GPTTokenizer class.

        Returns:
            dict: A dictionary containing the vocabulary,
                where the keys are the tokens and the values are their corresponding IDs.

        Raises:
            None
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    def bpe(self, token):
        """
        This method is part of the GPTTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

        Args:
            self (GPTTokenizer): The instance of the GPTTokenizer class.
            token (str): The input token to be encoded using BPE. It should be a non-empty string.

        Returns:
            str: The BPE-encoded form of the token, with sub-word units separated by spaces and
                the final unit suffixed with '</w>'. The result is also stored in the cache.

        Raises:
            None.
        """
        word = tuple(token[:-1]) + (token[-1] + "</w>",)
        if token in self.cache:
            return self.cache[token]
        pairs = get_pairs(word)

        if not pairs:
            return token + "</w>"

        while True:
            bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
            if bigram not in self.bpe_ranks:
                break
            first, second = bigram
            new_word = []
            i = 0
            while i < len(word):
                try:
                    j = word.index(first, i)
                except ValueError:
                    new_word.extend(word[i:])
                    break
                else:
                    new_word.extend(word[i:j])
                    i = j

                if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                    new_word.append(first + second)
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            new_word = tuple(new_word)
            word = new_word
            if len(word) == 1:
                break
            pairs = get_pairs(word)
        word = " ".join(word)
        if word == "\n  </w>":
            word = "\n</w>"
        self.cache[token] = word
        return word

    def _tokenize(self, text):
        """Tokenize a string."""
        split_tokens = []
        if self.fix_text is None:
            # Using BERT's BasicTokenizer
            text = self.nlp.tokenize(text)
            for token in text:
                split_tokens.extend(list(self.bpe(token).split(" ")))
        else:
            # Using SpaCy & ftfy (original tokenization process of OpenAI GPT)
            text = self.nlp(text_standardize(self.fix_text(text)))
            for token in text:
                split_tokens.extend(list(self.bpe(token.text.lower()).split(" ")))
        return split_tokens

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.encoder.get(token, self.encoder.get(self.unk_token))

    def _convert_id_to_token(self, index):
        """Converts an id in a token (BPE) using the vocab."""
        return self.decoder.get(index, self.unk_token)

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        out_string = "".join(tokens).replace("</w>", " ").strip()
        return out_string

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary files to the specified directory.

        Args:
            self: The instance of the GPTTokenizer class.
            save_directory (str): The directory path where the vocabulary files will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the filenames. Default is None.

        Returns:
            Tuple[str]: A tuple containing the paths to the saved vocabulary file and merge file,
                or None if save_directory is not an existing directory.

        Raises:
            OSError: If the vocabulary or merge file cannot be written.

        Note:
            If save_directory is not a directory, an error is logged and None is returned instead
            of raising. Non-consecutive BPE merge indices only trigger a warning about possible
            tokenizer corruption; they do not raise.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
        merge_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
        )

        with open(vocab_file, "w", encoding="utf-8") as f:
            f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")

        index = 0
        with open(merge_file, "w", encoding="utf-8") as writer:
            writer.write("#version: 0.2\n")
            for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
                        " Please check that the tokenizer is not corrupted!"
                    )
                    index = token_index
                writer.write(" ".join(bpe_tokens) + "\n")
                index += 1

        return vocab_file, merge_file
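
A minimal usage sketch of the slow tokenizer, assuming the standard PreTrainedTokenizer interface (tokenize, convert_tokens_to_ids) and hypothetical local vocab.json / merges.txt files in the OpenAI GPT format:

from mindnlp.transformers.models.gpt.tokenization_gpt import GPTTokenizer

# Hypothetical local files: vocab.json maps BPE tokens to ids, merges.txt lists
# one merge rule per line after a '#version' header.
tokenizer = GPTTokenizer(vocab_file="vocab.json", merges_file="merges.txt")

tokens = tokenizer.tokenize("Hello world!")         # lowercased, BPE-split sub-word units
ids = tokenizer.convert_tokens_to_ids(tokens)       # vocabulary lookup; unknown tokens map to unk_token
text = tokenizer.convert_tokens_to_string(tokens)   # strips the '</w>' end-of-word markers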

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.do_lower_case property

Whether this tokenizer lowercases its input.

PARAMETER DESCRIPTION
self

An instance of the GPTTokenizer class.

RETURNS DESCRIPTION

bool: Always True. The OpenAI GPT tokenizer lowercases all input text; this read-only property reports that behavior and cannot be changed on an instance.

Example
>>> tokenizer = GPTTokenizer("vocab.json", "merges.txt")  # paths to local vocab/merges files
>>> tokenizer.do_lower_case
True

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.vocab_size property

Method to retrieve the vocabulary size of the GPTTokenizer instance.

PARAMETER DESCRIPTION
self

The instance of GPTTokenizer for which the vocabulary size is to be determined. It is automatically passed when the method is called.

TYPE: GPTTokenizer

RETURNS DESCRIPTION
int

The vocabulary size of the GPTTokenizer instance, which is the length of the encoder used by the tokenizer.

RAISES DESCRIPTION
None

This method does not raise any exceptions.

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.__init__(vocab_file, merges_file, unk_token='<unk>', **kwargs)

This method initializes an instance of the GPTTokenizer class.

PARAMETER DESCRIPTION
self

The instance of the GPTTokenizer class.

vocab_file

The path to the vocabulary file containing the encoder information.

TYPE: str

merges_file

The path to the file containing merge operations for Byte Pair Encoding (BPE).

TYPE: str

unk_token

The token to represent unknown words. Defaults to '<unk>'.

TYPE: str DEFAULT: '<unk>'

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
FileNotFoundError

If the vocab_file or merges_file is not found.

JSONDecodeError

If the vocabulary file cannot be decoded as JSON.

IndexError

If an entry in the merges file cannot be parsed.

Note: a missing 'ftfy' or 'spacy' installation does not raise; the tokenizer logs a warning and falls back to BERT's BasicTokenizer instead of SpaCy & ftfy.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
def __init__(self, vocab_file, merges_file, unk_token="<unk>", **kwargs):
    """
    This method initializes an instance of the GPTTokenizer class.

    Args:
        self: The instance of the GPTTokenizer class.
        vocab_file (str): The path to the vocabulary file containing the encoder information.
        merges_file (str): The path to the file containing merge operations for Byte Pair Encoding (BPE).
        unk_token (str, optional): The token to represent unknown words. Defaults to '<unk>'.

    Returns:
        None.

    Raises:
        FileNotFoundError: If the vocab_file or merges_file is not found.
        JSONDecodeError: If the vocabulary file cannot be decoded as JSON.
        IndexError: If an entry in the merges file cannot be parsed.

    Note:
        A missing 'ftfy' or 'spacy' installation does not raise; the tokenizer logs a warning
        and falls back to BERT's BasicTokenizer instead of SpaCy & ftfy.
    """
    try:
        import ftfy
        from spacy.lang.en import English

        _nlp = English()
        self.nlp = _nlp.tokenizer
        self.fix_text = ftfy.fix_text
    except ImportError:
        logger.warning("ftfy or spacy is not installed, using BERT BasicTokenizer instead of SpaCy & ftfy.")
        self.nlp = BasicTokenizer(do_lower_case=True)
        self.fix_text = None

    with open(vocab_file, encoding="utf-8") as vocab_handle:
        self.encoder = json.load(vocab_handle)
    self.decoder = {v: k for k, v in self.encoder.items()}
    with open(merges_file, encoding="utf-8") as merges_handle:
        merges = merges_handle.read().split("\n")[1:-1]
    merges = [tuple(merge.split()) for merge in merges]
    self.bpe_ranks = dict(zip(merges, range(len(merges))))
    self.cache = {}

    super().__init__(unk_token=unk_token, **kwargs)
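
For reference, the constructor's parsing of the merges file can be reproduced on its own; the sketch below uses an invented three-rule merges string purely for illustration:

# A merges file begins with a '#version' header line, followed by one merge pair per line.
merges_text = "#version: 0.2\nl o\nlo w</w>\ne r</w>\n"

lines = merges_text.split("\n")[1:-1]              # drop the header and the trailing empty string
merges = [tuple(line.split()) for line in lines]   # [('l', 'o'), ('lo', 'w</w>'), ('e', 'r</w>')]
bpe_ranks = dict(zip(merges, range(len(merges))))
print(bpe_ranks)  # {('l', 'o'): 0, ('lo', 'w</w>'): 1, ('e', 'r</w>'): 2} -- lower rank merges first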

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.bpe(token)

This method is part of the GPTTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

PARAMETER DESCRIPTION
self

The instance of the GPTTokenizer class.

TYPE: GPTTokenizer

token

The input token to be encoded using BPE. It should be a non-empty string.

TYPE: str

RETURNS DESCRIPTION
str

The BPE-encoded form of the token, with sub-word units separated by spaces and the final unit suffixed with '</w>'. The result is also stored in the cache.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
def bpe(self, token):
    """
    This method is part of the GPTTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

    Args:
        self (GPTTokenizer): The instance of the GPTTokenizer class.
        token (str): The input token to be encoded using BPE. It should be a non-empty string.

    Returns:
        str: The BPE-encoded form of the token, with sub-word units separated by spaces and
            the final unit suffixed with '</w>'. The result is also stored in the cache.

    Raises:
        None.
    """
    word = tuple(token[:-1]) + (token[-1] + "</w>",)
    if token in self.cache:
        return self.cache[token]
    pairs = get_pairs(word)

    if not pairs:
        return token + "</w>"

    while True:
        bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
        if bigram not in self.bpe_ranks:
            break
        first, second = bigram
        new_word = []
        i = 0
        while i < len(word):
            try:
                j = word.index(first, i)
            except ValueError:
                new_word.extend(word[i:])
                break
            else:
                new_word.extend(word[i:j])
                i = j

            if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                new_word.append(first + second)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        new_word = tuple(new_word)
        word = new_word
        if len(word) == 1:
            break
        pairs = get_pairs(word)
    word = " ".join(word)
    if word == "\n  </w>":
        word = "\n</w>"
    self.cache[token] = word
    return word
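
To see what the merge loop produces, essentially the same left-to-right merge can be re-run outside the class against a toy merge table (the ranks below are invented; a real tokenizer derives them from its merges file):

# Toy merge table: lower rank means the pair is merged earlier.
toy_ranks = {("l", "o"): 0, ("lo", "w</w>"): 1}

def toy_bpe(token, bpe_ranks):
    """Simplified standalone re-run of the same left-to-right merge loop (no caching)."""
    word = tuple(token[:-1]) + (token[-1] + "</w>",)
    pairs = {(word[i], word[i + 1]) for i in range(len(word) - 1)}
    while pairs:
        # Merge the highest-priority (lowest-rank) pair still present in the word.
        bigram = min(pairs, key=lambda pair: bpe_ranks.get(pair, float("inf")))
        if bigram not in bpe_ranks:
            break
        first, second = bigram
        merged, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(word[i])
                i += 1
        word = tuple(merged)
        pairs = {(word[i], word[i + 1]) for i in range(len(word) - 1)}
    return " ".join(word)

print(toy_bpe("low", toy_ranks))  # low</w>   ('l','o','w</w>') -> ('lo','w</w>') -> ('low</w>',)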

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.convert_tokens_to_string(tokens)

Converts a sequence of tokens (string) into a single string.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
474
475
476
477
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    out_string = "".join(tokens).replace("</w>", " ").strip()
    return out_string
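
A quick illustration of how the '</w>' end-of-word markers become spaces (the token values are invented):

tokens = ["hel", "lo</w>", "world</w>"]
# "".join gives "hello</w>world</w>"; replacing '</w>' with a space and stripping yields:
print("".join(tokens).replace("</w>", " ").strip())  # hello world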

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.get_vocab()

This method returns the vocabulary of the GPTTokenizer.

PARAMETER DESCRIPTION
self

The instance of the GPTTokenizer class.

TYPE: GPTTokenizer

RETURNS DESCRIPTION
dict

A dictionary containing the vocabulary, where the keys are the tokens and the values are their corresponding IDs.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
def get_vocab(self):
    """
    This method returns the vocabulary of the GPTTokenizer.

    Args:
        self (GPTTokenizer): The instance of the GPTTokenizer class.

    Returns:
        dict: A dictionary containing the vocabulary,
            where the keys are the tokens and the values are their corresponding IDs.

    Raises:
        None
    """
    return dict(self.encoder, **self.added_tokens_encoder)

mindnlp.transformers.models.gpt.tokenization_gpt.GPTTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary files to the specified directory.

PARAMETER DESCRIPTION
self

The instance of the GPTTokenizer class.

save_directory

The directory path where the vocabulary files will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the filenames. Default is None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

A tuple containing the paths to the saved vocabulary file and merge file, or None if save_directory is not an existing directory.

RAISES DESCRIPTION
OSError

If the vocabulary or merge file cannot be written.

Note: if save_directory is not a directory, an error is logged and None is returned instead of raising. Non-consecutive BPE merge indices only trigger a warning about possible tokenizer corruption; they do not raise.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary files to the specified directory.

    Args:
        self: The instance of the GPTTokenizer class.
        save_directory (str): The directory path where the vocabulary files will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the filenames. Default is None.

    Returns:
        Tuple[str]: A tuple containing the paths to the saved vocabulary file and merge file,
            or None if save_directory is not an existing directory.

    Raises:
        OSError: If the vocabulary or merge file cannot be written.

    Note:
        If save_directory is not a directory, an error is logged and None is returned instead
        of raising. Non-consecutive BPE merge indices only trigger a warning about possible
        tokenizer corruption; they do not raise.
    """
    if not os.path.isdir(save_directory):
        logger.error(f"Vocabulary path ({save_directory}) should be a directory")
        return
    vocab_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
    )
    merge_file = os.path.join(
        save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
    )

    with open(vocab_file, "w", encoding="utf-8") as f:
        f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")

    index = 0
    with open(merge_file, "w", encoding="utf-8") as writer:
        writer.write("#version: 0.2\n")
        for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
            if index != token_index:
                logger.warning(
                    f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
                    " Please check that the tokenizer is not corrupted!"
                )
                index = token_index
            writer.write(" ".join(bpe_tokens) + "\n")
            index += 1

    return vocab_file, merge_file
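
A sketch of saving and re-loading the vocabulary files, assuming a tokenizer built from hypothetical local files and the usual GPT file names (vocab.json, merges.txt) in VOCAB_FILES_NAMES:

import os

from mindnlp.transformers.models.gpt.tokenization_gpt import GPTTokenizer

tokenizer = GPTTokenizer(vocab_file="vocab.json", merges_file="merges.txt")  # hypothetical local files

os.makedirs("saved_tokenizer", exist_ok=True)
vocab_path, merge_path = tokenizer.save_vocabulary("saved_tokenizer", filename_prefix="gpt")
# With the usual GPT file names this yields something like:
#   saved_tokenizer/gpt-vocab.json and saved_tokenizer/gpt-merges.txt

# The written files can rebuild an equivalent tokenizer.
reloaded = GPTTokenizer(vocab_file=vocab_path, merges_file=merge_path)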

mindnlp.transformers.models.gpt.tokenization_gpt.get_pairs(word)

Return the set of symbol pairs in a word. A word is represented as a tuple of symbols (symbols being variable-length strings).

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
233
234
235
236
237
238
239
240
241
242
243
def get_pairs(word):
    """
    Return the set of symbol pairs in a word. A word is represented as a tuple of symbols
    (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs
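
For example, a three-symbol word yields its two adjacent pairs:

from mindnlp.transformers.models.gpt.tokenization_gpt import get_pairs

print(get_pairs(("l", "o", "w</w>")))
# {('l', 'o'), ('o', 'w</w>')} -- a set, so iteration order is not guaranteed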

mindnlp.transformers.models.gpt.tokenization_gpt.text_standardize(text)

Fixes some issues the spaCy tokenizer had on the BooksCorpus dataset and performs some whitespace standardization.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
246
247
248
249
250
251
252
253
254
255
256
257
258
def text_standardize(text):
    """
    Fixes some issues the spaCy tokenizer had on the BooksCorpus dataset and performs some whitespace standardization.
    """
    text = text.replace("—", "-")
    text = text.replace("–", "-")
    text = text.replace("―", "-")
    text = text.replace("…", "...")
    text = text.replace("´", "'")
    text = re.sub(r"""(-+|~+|!+|"+|;+|\?+|\++|,+|\)+|\(+|\\+|\/+|\*+|\[+|\]+|}+|{+|\|+|_+)""", r" \1 ", text)
    text = re.sub(r"\s*\n\s*", " \n ", text)
    text = re.sub(r"[^\S\n]+", " ", text)
    return text.strip()
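
For example, the acute accent and the ellipsis are normalized and runs of whitespace collapse to single spaces:

from mindnlp.transformers.models.gpt.tokenization_gpt import text_standardize

print(text_standardize("It´s here…  now"))
# It's here... now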

mindnlp.transformers.models.gpt.tokenization_gpt.whitespace_tokenize(text)

Runs basic whitespace cleaning and splitting on a piece of text.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt.py
45
46
47
48
49
50
51
def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens

mindnlp.transformers.models.gpt.tokenization_gpt_fast

Fast Tokenization classes for OpenAI GPT.

mindnlp.transformers.models.gpt.tokenization_gpt_fast.GPTTokenizerFast

Bases: PreTrainedTokenizerFast

Construct a "fast" GPT Tokenizer (backed by HuggingFace's tokenizers library). Based on Byte-Pair-Encoding with the following peculiarities:

  • lower case all inputs
  • uses BERT's BasicTokenizer for pre-BPE tokenization

This tokenizer inherits from [PreTrainedTokenizerFast] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str` DEFAULT: None

merges_file

Path to the merges file.

TYPE: `str` DEFAULT: None

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

Source code in mindnlp/transformers/models/gpt/tokenization_gpt_fast.py
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
class GPTTokenizerFast(PreTrainedTokenizerFast):
    """
    Construct a "fast" GPT Tokenizer (backed by HuggingFace's *tokenizers* library). Based on Byte-Pair-Encoding with
    the following peculiarities:

    - lower case all inputs
    - uses BERT's BasicTokenizer for pre-BPE tokenization

    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
    refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
        merges_file (`str`):
            Path to the merges file.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask"]
    slow_tokenizer_class = GPTTokenizer

    def __init__(self, vocab_file=None, merges_file=None, tokenizer_file=None, unk_token="<unk>", **kwargs):
        """
        Initialize a GPTTokenizerFast object.

        Args:
            vocab_file (str): Path to the vocabulary file. Default is None.
            merges_file (str): Path to the merges file. Default is None.
            tokenizer_file (str): Path to the tokenizer file. Default is None.
            unk_token (str): The token to represent unknown words. Default is '<unk>'.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(vocab_file, merges_file, tokenizer_file=tokenizer_file, unk_token=unk_token, **kwargs)

    @property
    def do_lower_case(self):
        """
        Whether this tokenizer lowercases its input.

        Args:
            self: The instance of the GPTTokenizerFast class.

        Returns:
            bool: Always True, indicating that lowercasing is enabled for this tokenizer.

        Raises:
            None.
        """
        return True

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary files generated by the GPTTokenizerFast instance to the specified directory.

        Args:
            self (GPTTokenizerFast): The GPTTokenizerFast instance.
            save_directory (str): The directory where the vocabulary files will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the filenames of the vocabulary files.
                Default is None.

        Returns:
            Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

        Raises:
            None.

        This method saves the vocabulary files generated by the GPTTokenizerFast instance to the specified directory.
        The save_directory parameter should be a valid directory path. If filename_prefix is provided,
        it will be added as a prefix to the filenames of the vocabulary files.
        The method returns a tuple containing the filenames of the saved vocabulary files.
        """
        files = self._tokenizer.model.save(save_directory, name=filename_prefix)
        return tuple(files)
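
A usage sketch for the fast tokenizer, assuming the from_pretrained interface inherited from PreTrainedTokenizerFast and the 'openai-gpt' checkpoint name (both are assumptions, not shown in the source above):

from mindnlp.transformers.models.gpt.tokenization_gpt_fast import GPTTokenizerFast

# Load from a hub checkpoint (name assumed) or from a local directory containing
# vocab.json, merges.txt and/or tokenizer.json.
tokenizer = GPTTokenizerFast.from_pretrained("openai-gpt")

encoded = tokenizer("Hello world!")              # dict-like with 'input_ids' and 'attention_mask'
decoded = tokenizer.decode(encoded["input_ids"])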

mindnlp.transformers.models.gpt.tokenization_gpt_fast.GPTTokenizerFast.do_lower_case property

Whether this tokenizer lowercases its input.

PARAMETER DESCRIPTION
self

The instance of the GPTTokenizerFast class.

RETURNS DESCRIPTION
bool

Always True, indicating that lowercasing is enabled for this tokenizer.

mindnlp.transformers.models.gpt.tokenization_gpt_fast.GPTTokenizerFast.__init__(vocab_file=None, merges_file=None, tokenizer_file=None, unk_token='<unk>', **kwargs)

Initialize a GPTTokenizerFast object.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file. Default is None.

TYPE: str DEFAULT: None

merges_file

Path to the merges file. Default is None.

TYPE: str DEFAULT: None

tokenizer_file

Path to the tokenizer file. Default is None.

TYPE: str DEFAULT: None

unk_token

The token to represent unknown words. Default is '<unk>'.

TYPE: str DEFAULT: '<unk>'

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt_fast.py
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
def __init__(self, vocab_file=None, merges_file=None, tokenizer_file=None, unk_token="<unk>", **kwargs):
    """
    Initialize a GPTTokenizerFast object.

    Args:
        vocab_file (str): Path to the vocabulary file. Default is None.
        merges_file (str): Path to the merges file. Default is None.
        tokenizer_file (str): Path to the tokenizer file. Default is None.
        unk_token (str): The token to represent unknown words. Default is '<unk>'.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(vocab_file, merges_file, tokenizer_file=tokenizer_file, unk_token=unk_token, **kwargs)

mindnlp.transformers.models.gpt.tokenization_gpt_fast.GPTTokenizerFast.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary files generated by the GPTTokenizerFast instance to the specified directory.

PARAMETER DESCRIPTION
self

The GPTTokenizerFast instance.

TYPE: GPTTokenizerFast

save_directory

The directory where the vocabulary files will be saved.

TYPE: str

filename_prefix

An optional prefix to be added to the filenames of the vocabulary files. Default is None.

TYPE: Optional[str] DEFAULT: None

RETURNS DESCRIPTION
Tuple[str]

Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

This method saves the vocabulary files generated by the GPTTokenizerFast instance to the specified directory. The save_directory parameter should be a valid directory path. If filename_prefix is provided, it will be added as a prefix to the filenames of the vocabulary files. The method returns a tuple containing the filenames of the saved vocabulary files.

Source code in mindnlp/transformers/models/gpt/tokenization_gpt_fast.py
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    """
    Save the vocabulary files generated by the GPTTokenizerFast instance to the specified directory.

    Args:
        self (GPTTokenizerFast): The GPTTokenizerFast instance.
        save_directory (str): The directory where the vocabulary files will be saved.
        filename_prefix (Optional[str]): An optional prefix to be added to the filenames of the vocabulary files.
            Default is None.

    Returns:
        Tuple[str]: A tuple containing the filenames of the saved vocabulary files.

    Raises:
        None.

    This method saves the vocabulary files generated by the GPTTokenizerFast instance to the specified directory.
    The save_directory parameter should be a valid directory path. If filename_prefix is provided,
    it will be added as a prefix to the filenames of the vocabulary files.
    The method returns a tuple containing the filenames of the saved vocabulary files.
    """
    files = self._tokenizer.model.save(save_directory, name=filename_prefix)
    return tuple(files)