xlm

mindnlp.transformers.models.xlm.modeling_xlm

MindSpore XLM model.

mindnlp.transformers.models.xlm.modeling_xlm.MultiHeadAttention

Bases: Module

A class representing a multi-head attention mechanism for neural networks.

This class implements multi-head attention by splitting the input projections into multiple heads and processing them in parallel. It includes methods for initializing the attention mechanism, pruning selected heads, and computing the attention output from the input, masks, and optional key-value pairs.

ATTRIBUTE DESCRIPTION
layer_id

An identifier for the attention layer.

dim

The dimensionality of the input.

n_heads

The number of attention heads.

dropout

The dropout rate for attention weights.

q_lin

Linear transformation for query vectors.

k_lin

Linear transformation for key vectors.

v_lin

Linear transformation for value vectors.

out_lin

Linear transformation for the final output.

pruned_heads

A set containing indices of pruned attention heads.

METHOD DESCRIPTION
__init__

Initializes the multi-head attention mechanism.

prune_heads

Prunes specified attention heads based on given criteria.

forward

Computes the attention output from the input, masks, and optional key-value pairs.

Note

This class inherits from nn.Module and is designed for neural network architectures that require multi-head attention mechanisms.
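
A minimal usage sketch, assuming a bare config object that only carries the `attention_dropout` value read by `__init__`:

>>> import mindspore
>>> from mindspore import ops
>>> from types import SimpleNamespace
>>> from mindnlp.transformers.models.xlm.modeling_xlm import MultiHeadAttention
...
>>> config = SimpleNamespace(attention_dropout=0.1)      # hypothetical minimal config
>>> attn = MultiHeadAttention(n_heads=8, dim=512, config=config)
>>> x = ops.ones((2, 10, 512), mindspore.float32)        # (bs, qlen, dim)
>>> mask = ops.ones((2, 10), mindspore.int32)            # (bs, klen); non-zero = attend
>>> (out,) = attn(x, mask)                               # out has shape (bs, qlen, dim)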

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class MultiHeadAttention(nn.Module):

    """
    A class representing a multi-head attention mechanism for neural networks.

    This class implements multi-head attention by dividing the input into multiple heads and processing them in parallel. 
    It includes methods for initializing the attention mechanism, pruning heads based on specific criteria, and 
    forwarding the attention output based on input, masks, and key-value pairs.

    Attributes:
        layer_id: An identifier for the attention layer.
        dim: The dimensionality of the input.
        n_heads: The number of attention heads.
        dropout: The dropout rate for attention weights.
        q_lin: Linear transformation for query vectors.
        k_lin: Linear transformation for key vectors.
        v_lin: Linear transformation for value vectors.
        out_lin: Linear transformation for the final output.
        pruned_heads: A set containing indices of pruned attention heads.

    Methods:
        __init__: Initializes the multi-head attention mechanism.
        prune_heads: Prunes specified attention heads based on given criteria.
        forward: Constructs the attention output based on input, masks, and key-value pairs.

    Note:
        This class inherits from nn.Module and is designed for neural network architectures that require multi-head 
        attention mechanisms.
    """
    NEW_ID = itertools.count()

    def __init__(self, n_heads, dim, config):
        """Initialize a MultiHeadAttention object.

        Args:
            self: The MultiHeadAttention object.
            n_heads (int): The number of attention heads.
            dim (int): The dimension of the input.
            config (object): The configuration object containing the attention dropout.

        Returns:
            None

        Raises:
            AssertionError: If the dimension is not divisible by the number of attention heads.

        """
        super().__init__()
        self.layer_id = next(MultiHeadAttention.NEW_ID)
        self.dim = dim
        self.n_heads = n_heads
        self.dropout = config.attention_dropout
        assert self.dim % self.n_heads == 0

        self.q_lin = nn.Linear(dim, dim)
        self.k_lin = nn.Linear(dim, dim)
        self.v_lin = nn.Linear(dim, dim)
        self.out_lin = nn.Linear(dim, dim)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        """
        Prunes the attention heads in a MultiHeadAttention layer.

        Args:
            self (MultiHeadAttention): The instance of the MultiHeadAttention class.
            heads (List[int]): A list of integers representing the indices of the attention heads to be pruned.

        Returns:
            None

        Raises:
            None

        This method prunes the specified attention heads in a MultiHeadAttention layer. 
        The attention heads are pruned based on the given indices. The method performs the following steps:

        1. Calculates the attention_head_size by dividing the dimension (self.dim) by the number of heads (self.n_heads).
        2. If the list of heads is empty, the method returns without performing any pruning.
        3. Calls the 'find_pruneable_heads_and_indices' function to find the pruneable heads and their corresponding 
        indices based on the given parameters (heads, self.n_heads, attention_head_size, self.pruned_heads).
        4. Prunes the linear layers q_lin, k_lin, v_lin, and out_lin using the 'prune_linear_layer' function, passing 
        the calculated indices (index) as a parameter.
        5. Updates the number of heads (self.n_heads) by subtracting the length of the pruneable heads list.
        6. Updates the dimension (self.dim) by multiplying the attention_head_size with the updated number of heads.
        7. Updates the set of pruned heads (self.pruned_heads) by adding the pruneable heads.

        Note:
            Pruning attention heads reduces the computational complexity of the MultiHeadAttention layer.
        """
        attention_head_size = self.dim // self.n_heads
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(heads, self.n_heads, attention_head_size, self.pruned_heads)
        # Prune linear layers
        self.q_lin = prune_linear_layer(self.q_lin, index)
        self.k_lin = prune_linear_layer(self.k_lin, index)
        self.v_lin = prune_linear_layer(self.v_lin, index)
        self.out_lin = prune_linear_layer(self.out_lin, index, dim=1)
        # Update hyper params
        self.n_heads = self.n_heads - len(heads)
        self.dim = attention_head_size * self.n_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(self, input, mask, kv=None, cache=None, head_mask=None, output_attentions=False):
        """
        Self-attention (if kv is None) or attention over source sentence (provided by kv).
        """
        # Input is (bs, qlen, dim)
        # Mask is (bs, klen) (non-causal) or (bs, klen, klen)
        bs, qlen, _ = input.shape
        if kv is None:
            klen = qlen if cache is None else cache["slen"] + qlen
        else:
            klen = kv.shape[1]
        # assert dim == self.dim, f'Dimensions do not match: {dim} input vs {self.dim} configured'
        n_heads = self.n_heads
        dim_per_head = self.dim // n_heads
        mask_reshape = (bs, 1, qlen, klen) if mask.dim() == 3 else (bs, 1, 1, klen)

        def shape(x):
            """projection"""
            return x.view(bs, -1, self.n_heads, dim_per_head).swapaxes(1, 2)

        def unshape(x):
            """compute context"""
            return x.swapaxes(1, 2).view(bs, -1, self.n_heads * dim_per_head)

        q = shape(self.q_lin(input))  # (bs, n_heads, qlen, dim_per_head)
        if kv is None:
            k = shape(self.k_lin(input))  # (bs, n_heads, qlen, dim_per_head)
            v = shape(self.v_lin(input))  # (bs, n_heads, qlen, dim_per_head)
        elif cache is None or self.layer_id not in cache:
            k = v = kv
            k = shape(self.k_lin(k))  # (bs, n_heads, qlen, dim_per_head)
            v = shape(self.v_lin(v))  # (bs, n_heads, qlen, dim_per_head)

        if cache is not None:
            if self.layer_id in cache:
                if kv is None:
                    k_, v_ = cache[self.layer_id]
                    k = ops.cat([k_, k], axis=2)  # (bs, n_heads, klen, dim_per_head)
                    v = ops.cat([v_, v], axis=2)  # (bs, n_heads, klen, dim_per_head)
                else:
                    k, v = cache[self.layer_id]
            cache[self.layer_id] = (k, v)

        q = q / math.sqrt(dim_per_head)  # (bs, n_heads, qlen, dim_per_head)
        scores = ops.matmul(q, k.swapaxes(2, 3))  # (bs, n_heads, qlen, klen)
        mask = (mask == 0).view(mask_reshape).expand_as(scores)  # (bs, n_heads, qlen, klen)
        scores = scores.masked_fill(mask, np.finfo(mindspore.dtype_to_nptype(scores.dtype)).min)  # (bs, n_heads, qlen, klen)

        weights = ops.softmax(scores.float(), axis=-1).astype(scores.dtype)  # (bs, n_heads, qlen, klen)
        weights = ops.dropout(weights, p=self.dropout, training=self.training)  # (bs, n_heads, qlen, klen)

        # Mask heads if we want to
        if head_mask is not None:
            weights = weights * head_mask

        context = ops.matmul(weights, v)  # (bs, n_heads, qlen, dim_per_head)
        context = unshape(context)  # (bs, qlen, dim)

        outputs = (self.out_lin(context),)
        if output_attentions:
            outputs = outputs + (weights,)
        return outputs

mindnlp.transformers.models.xlm.modeling_xlm.MultiHeadAttention.__init__(n_heads, dim, config)

Initialize a MultiHeadAttention object.

PARAMETER DESCRIPTION
self

The MultiHeadAttention object.

n_heads

The number of attention heads.

TYPE: int

dim

The dimension of the input.

TYPE: int

config

The configuration object containing the attention dropout.

TYPE: object

RETURNS DESCRIPTION

None

RAISES DESCRIPTION
AssertionError

If the dimension is not divisible by the number of attention heads.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, n_heads, dim, config):
    """Initialize a MultiHeadAttention object.

    Args:
        self: The MultiHeadAttention object.
        n_heads (int): The number of attention heads.
        dim (int): The dimension of the input.
        config (object): The configuration object containing the attention dropout.

    Returns:
        None

    Raises:
        AssertionError: If the dimension is not divisible by the number of attention heads.

    """
    super().__init__()
    self.layer_id = next(MultiHeadAttention.NEW_ID)
    self.dim = dim
    self.n_heads = n_heads
    self.dropout = config.attention_dropout
    assert self.dim % self.n_heads == 0

    self.q_lin = nn.Linear(dim, dim)
    self.k_lin = nn.Linear(dim, dim)
    self.v_lin = nn.Linear(dim, dim)
    self.out_lin = nn.Linear(dim, dim)
    self.pruned_heads = set()

mindnlp.transformers.models.xlm.modeling_xlm.MultiHeadAttention.forward(input, mask, kv=None, cache=None, head_mask=None, output_attentions=False)

Self-attention (if kv is None) or attention over source sentence (provided by kv).
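
A minimal sketch of the two modes, reusing the `attn`, `x` and `mask` objects from the class overview above; `mask` has shape `(bs, klen)` for a padding mask or `(bs, klen, klen)` for a causal mask:

>>> # Self-attention over x
>>> (self_out,) = attn(x, mask)
>>> # Attention over a source sequence (cross-attention): klen follows kv
>>> src = ops.ones((2, 7, 512), mindspore.float32)       # (bs, slen, dim)
>>> src_mask = ops.ones((2, 7), mindspore.int32)         # (bs, slen)
>>> (cross_out,) = attn(x, src_mask, kv=src)             # (bs, qlen, dim)
>>> # Request the attention weights as a second output
>>> out, weights = attn(x, mask, output_attentions=True)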

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(self, input, mask, kv=None, cache=None, head_mask=None, output_attentions=False):
    """
    Self-attention (if kv is None) or attention over source sentence (provided by kv).
    """
    # Input is (bs, qlen, dim)
    # Mask is (bs, klen) (non-causal) or (bs, klen, klen)
    bs, qlen, _ = input.shape
    if kv is None:
        klen = qlen if cache is None else cache["slen"] + qlen
    else:
        klen = kv.shape[1]
    # assert dim == self.dim, f'Dimensions do not match: {dim} input vs {self.dim} configured'
    n_heads = self.n_heads
    dim_per_head = self.dim // n_heads
    mask_reshape = (bs, 1, qlen, klen) if mask.dim() == 3 else (bs, 1, 1, klen)

    def shape(x):
        """projection"""
        return x.view(bs, -1, self.n_heads, dim_per_head).swapaxes(1, 2)

    def unshape(x):
        """compute context"""
        return x.swapaxes(1, 2).view(bs, -1, self.n_heads * dim_per_head)

    q = shape(self.q_lin(input))  # (bs, n_heads, qlen, dim_per_head)
    if kv is None:
        k = shape(self.k_lin(input))  # (bs, n_heads, qlen, dim_per_head)
        v = shape(self.v_lin(input))  # (bs, n_heads, qlen, dim_per_head)
    elif cache is None or self.layer_id not in cache:
        k = v = kv
        k = shape(self.k_lin(k))  # (bs, n_heads, qlen, dim_per_head)
        v = shape(self.v_lin(v))  # (bs, n_heads, qlen, dim_per_head)

    if cache is not None:
        if self.layer_id in cache:
            if kv is None:
                k_, v_ = cache[self.layer_id]
                k = ops.cat([k_, k], axis=2)  # (bs, n_heads, klen, dim_per_head)
                v = ops.cat([v_, v], axis=2)  # (bs, n_heads, klen, dim_per_head)
            else:
                k, v = cache[self.layer_id]
        cache[self.layer_id] = (k, v)

    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, qlen, dim_per_head)
    scores = ops.matmul(q, k.swapaxes(2, 3))  # (bs, n_heads, qlen, klen)
    mask = (mask == 0).view(mask_reshape).expand_as(scores)  # (bs, n_heads, qlen, klen)
    scores = scores.masked_fill(mask, np.finfo(mindspore.dtype_to_nptype(scores.dtype)).min)  # (bs, n_heads, qlen, klen)

    weights = ops.softmax(scores.float(), axis=-1).astype(scores.dtype)  # (bs, n_heads, qlen, klen)
    weights = ops.dropout(weights, p=self.dropout, training=self.training)  # (bs, n_heads, qlen, klen)

    # Mask heads if we want to
    if head_mask is not None:
        weights = weights * head_mask

    context = ops.matmul(weights, v)  # (bs, n_heads, qlen, dim_per_head)
    context = unshape(context)  # (bs, qlen, dim)

    outputs = (self.out_lin(context),)
    if output_attentions:
        outputs = outputs + (weights,)
    return outputs

mindnlp.transformers.models.xlm.modeling_xlm.MultiHeadAttention.prune_heads(heads)

Prunes the attention heads in a MultiHeadAttention layer.

PARAMETER DESCRIPTION
self

The instance of the MultiHeadAttention class.

TYPE: MultiHeadAttention

heads

A list of integers representing the indices of the attention heads to be pruned.

TYPE: List[int]

RETURNS DESCRIPTION

None

This method prunes the specified attention heads in a MultiHeadAttention layer. The attention heads are pruned based on the given indices. The method performs the following steps:

  1. Calculates the attention_head_size by dividing the dimension (self.dim) by the number of heads (self.n_heads).
  2. If the list of heads is empty, the method returns without performing any pruning.
  3. Calls the 'find_pruneable_heads_and_indices' function to find the pruneable heads and their corresponding indices based on the given parameters (heads, self.n_heads, attention_head_size, self.pruned_heads).
  4. Prunes the linear layers q_lin, k_lin, v_lin, and out_lin using the 'prune_linear_layer' function, passing the calculated indices (index) as a parameter.
  5. Updates the number of heads (self.n_heads) by subtracting the length of the pruneable heads list.
  6. Updates the dimension (self.dim) by multiplying the attention_head_size with the updated number of heads.
  7. Updates the set of pruned heads (self.pruned_heads) by adding the pruneable heads.
Note

Pruning attention heads reduces the computational complexity of the MultiHeadAttention layer.
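
A minimal sketch, reusing the 8-head, 512-dim `attn` module from the class overview above: pruning two heads shrinks the projections and the bookkeeping accordingly.

>>> attn.prune_heads([0, 3])        # drop heads 0 and 3
>>> attn.n_heads                    # now 6
>>> attn.dim                        # now 6 * 64 = 384
>>> sorted(attn.pruned_heads)       # [0, 3]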

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def prune_heads(self, heads):
    """
    Prunes the attention heads in a MultiHeadAttention layer.

    Args:
        self (MultiHeadAttention): The instance of the MultiHeadAttention class.
        heads (List[int]): A list of integers representing the indices of the attention heads to be pruned.

    Returns:
        None

    Raises:
        None

    This method prunes the specified attention heads in a MultiHeadAttention layer. 
    The attention heads are pruned based on the given indices. The method performs the following steps:

    1. Calculates the attention_head_size by dividing the dimension (self.dim) by the number of heads (self.n_heads).
    2. If the list of heads is empty, the method returns without performing any pruning.
    3. Calls the 'find_pruneable_heads_and_indices' function to find the pruneable heads and their corresponding 
    indices based on the given parameters (heads, self.n_heads, attention_head_size, self.pruned_heads).
    4. Prunes the linear layers q_lin, k_lin, v_lin, and out_lin using the 'prune_linear_layer' function, passing 
    the calculated indices (index) as a parameter.
    5. Updates the number of heads (self.n_heads) by subtracting the length of the pruneable heads list.
    6. Updates the dimension (self.dim) by multiplying the attention_head_size with the updated number of heads.
    7. Updates the set of pruned heads (self.pruned_heads) by adding the pruneable heads.

    Note:
        Pruning attention heads reduces the computational complexity of the MultiHeadAttention layer.
    """
    attention_head_size = self.dim // self.n_heads
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(heads, self.n_heads, attention_head_size, self.pruned_heads)
    # Prune linear layers
    self.q_lin = prune_linear_layer(self.q_lin, index)
    self.k_lin = prune_linear_layer(self.k_lin, index)
    self.v_lin = prune_linear_layer(self.v_lin, index)
    self.out_lin = prune_linear_layer(self.out_lin, index, dim=1)
    # Update hyper params
    self.n_heads = self.n_heads - len(heads)
    self.dim = attention_head_size * self.n_heads
    self.pruned_heads = self.pruned_heads.union(heads)

mindnlp.transformers.models.xlm.modeling_xlm.TransformerFFN

Bases: Module

TransformerFFN is a class that represents the feed-forward neural network component of a transformer model. It inherits from nn.Module and includes methods for initializing the network and running the forward pass.

ATTRIBUTE DESCRIPTION
in_dim

The input dimension of the network.

TYPE: int

dim_hidden

The dimension of the hidden layer in the network.

TYPE: int

out_dim

The output dimension of the network.

TYPE: int

config

The configuration object containing parameters for the network.

TYPE: object

METHOD DESCRIPTION
__init__

Initializes the TransformerFFN instance with the specified input, hidden, and output dimensions, as well as the configuration object.

forward

Runs the forward pass of the network, applying chunking to the input.

ff_chunk

Implements the feed-forward chunk of the network, including linear transformations, activation function, and dropout.

Note

This class assumes the presence of nn, ops, and apply_chunking_to_forward functions and objects for neural network and tensor operations.
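
A minimal usage sketch, assuming a bare config object carrying only the fields read by `__init__` (`dropout`, `gelu_activation`, `chunk_size_feed_forward`):

>>> import mindspore
>>> from mindspore import ops
>>> from types import SimpleNamespace
>>> from mindnlp.transformers.models.xlm.modeling_xlm import TransformerFFN
...
>>> config = SimpleNamespace(dropout=0.1, gelu_activation=True, chunk_size_feed_forward=0)
>>> ffn = TransformerFFN(in_dim=512, dim_hidden=2048, out_dim=512, config=config)
>>> x = ops.ones((2, 10, 512), mindspore.float32)        # (bs, seq_len, in_dim)
>>> y = ffn(x)                                           # (bs, seq_len, out_dim)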

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class TransformerFFN(nn.Module):

    """
    TransformerFFN is a class that represents a feed-forward neural network component of a transformer model. 
    It inherits from nn.Module and includes methods for initializing the network and forwarding the forward pass.

    Attributes:
        in_dim (int): The input dimension of the network.
        dim_hidden (int): The dimension of the hidden layer in the network.
        out_dim (int): The output dimension of the network.
        config (object): The configuration object containing parameters for the network.

    Methods:
        __init__: Initializes the TransformerFFN instance with the specified input, hidden, and output dimensions, 
            as well as the configuration object.
        forward: Constructs the forward pass of the network using chunking for the specified input.
        ff_chunk: Implements the feed-forward chunk of the network, including linear transformations, 
            activation function, and dropout.

    Note:
        This class assumes the presence of nn, ops, and apply_chunking_to_forward functions and objects for neural 
        network and tensor operations.
    """
    def __init__(self, in_dim, dim_hidden, out_dim, config):
        """
        Initializes an instance of the TransformerFFN class.

        Args:
            self (TransformerFFN): The instance of the TransformerFFN class.
            in_dim (int): The input dimension.
            dim_hidden (int): The dimension of the hidden layer.
            out_dim (int): The output dimension.
            config (object): The configuration object containing various settings.

        Returns:
            None.

        Raises:
            None.

        """
        super().__init__()
        self.dropout = config.dropout
        self.lin1 = nn.Linear(in_dim, dim_hidden)
        self.lin2 = nn.Linear(dim_hidden, out_dim)
        self.act = ops.gelu if config.gelu_activation else ops.relu
        self.chunk_size_feed_forward = config.chunk_size_feed_forward
        self.seq_len_dim = 1

    def forward(self, input):
        """
        Method 'forward' in the class 'TransformerFFN'.

        Args:
            self (object): The instance of the TransformerFFN class.
            input (any): The input data to be processed by the method.

        Returns:
            mindspore.Tensor: The output tensor produced by the chunked feed-forward pass.

        Raises:
            None.
        """
        return apply_chunking_to_forward(self.ff_chunk, self.chunk_size_feed_forward, self.seq_len_dim, input)

    def ff_chunk(self, input):
        """
        Method 'ff_chunk' in the class 'TransformerFFN'.

        Args:
            self (object): The instance of the TransformerFFN class.
            input (tensor): The input tensor to the feedforward chunk.

        Returns:
            mindspore.Tensor: The processed tensor after the two linear layers, the activation, and dropout.

        Raises:
            ValueError: If the input tensor is not in the expected format.
            RuntimeError: If an issue occurs during the dropout operation.
        """
        x = self.lin1(input)
        x = self.act(x)
        x = self.lin2(x)
        x = ops.dropout(x, p=self.dropout, training=self.training)
        return x

mindnlp.transformers.models.xlm.modeling_xlm.TransformerFFN.__init__(in_dim, dim_hidden, out_dim, config)

Initializes an instance of the TransformerFFN class.

PARAMETER DESCRIPTION
self

The instance of the TransformerFFN class.

TYPE: TransformerFFN

in_dim

The input dimension.

TYPE: int

dim_hidden

The dimension of the hidden layer.

TYPE: int

out_dim

The output dimension.

TYPE: int

config

The configuration object containing various settings.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, in_dim, dim_hidden, out_dim, config):
    """
    Initializes an instance of the TransformerFFN class.

    Args:
        self (TransformerFFN): The instance of the TransformerFFN class.
        in_dim (int): The input dimension.
        dim_hidden (int): The dimension of the hidden layer.
        out_dim (int): The output dimension.
        config (object): The configuration object containing various settings.

    Returns:
        None.

    Raises:
        None.

    """
    super().__init__()
    self.dropout = config.dropout
    self.lin1 = nn.Linear(in_dim, dim_hidden)
    self.lin2 = nn.Linear(dim_hidden, out_dim)
    self.act = ops.gelu if config.gelu_activation else ops.relu
    self.chunk_size_feed_forward = config.chunk_size_feed_forward
    self.seq_len_dim = 1

mindnlp.transformers.models.xlm.modeling_xlm.TransformerFFN.ff_chunk(input)

Applies the feed-forward chunk: two linear transformations with an activation in between, followed by dropout.

PARAMETER DESCRIPTION
self

The instance of the TransformerFFN class.

TYPE: object

input

The input tensor to the feedforward chunk.

TYPE: tensor

RETURNS DESCRIPTION

The processed tensor after the two linear layers, the activation, and dropout.

RAISES DESCRIPTION
ValueError

If the input tensor is not in the expected format.

RuntimeError

If an issue occurs during the dropout operation.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def ff_chunk(self, input):
    """
    Method 'ff_chunk' in the class 'TransformerFFN'.

    Args:
        self (object): The instance of the TransformerFFN class.
        input (tensor): The input tensor to the feedforward chunk.

    Returns:
        mindspore.Tensor: The processed tensor after the two linear layers, the activation, and dropout.

    Raises:
        ValueError: If the input tensor is not in the expected format.
        RuntimeError: If an issue occurs during the dropout operation.
    """
    x = self.lin1(input)
    x = self.act(x)
    x = self.lin2(x)
    x = ops.dropout(x, p=self.dropout, training=self.training)
    return x

mindnlp.transformers.models.xlm.modeling_xlm.TransformerFFN.forward(input)

Runs the forward pass of the feed-forward network by delegating to ff_chunk via apply_chunking_to_forward.

PARAMETER DESCRIPTION
self

The instance of the TransformerFFN class.

TYPE: object

input

The input data to be processed by the method.

TYPE: any

RETURNS DESCRIPTION

The output tensor produced by the chunked feed-forward pass.
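
The method delegates to apply_chunking_to_forward: with chunk_size_feed_forward set to 0 it simply calls ff_chunk on the whole input, while a positive chunk size splits the input along seq_len_dim into chunks of that size, applies ff_chunk to each chunk, and concatenates the results. A minimal sketch, reusing the `ffn` and `x` objects from the TransformerFFN overview above and assuming the chunking utility mirrors the usual Transformers behaviour:

>>> y_full = ffn(x)                      # chunk_size_feed_forward == 0: one ff_chunk call
>>> ffn.chunk_size_feed_forward = 2      # hypothetical: chunk the 10-step sequence
>>> y_chunked = ffn(x)                   # ff_chunk applied to 5 chunks of length 2
>>> y_chunked.shape                      # same (bs, seq_len, out_dim) shape as y_full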

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(self, input):
    """
    Method 'forward' in the class 'TransformerFFN'.

    Args:
        self (object): The instance of the TransformerFFN class.
        input (any): The input data to be processed by the method.

    Returns:
        mindspore.Tensor: The output tensor produced by the chunked feed-forward pass.

    Raises:
        None.
    """
    return apply_chunking_to_forward(self.ff_chunk, self.chunk_size_feed_forward, self.seq_len_dim, input)

mindnlp.transformers.models.xlm.modeling_xlm.XLMForMultipleChoice

Bases: XLMPreTrainedModel

XLMForMultipleChoice represents an XLM model for multiple choice tasks. It is a subclass of XLMPreTrainedModel and includes methods for building the model, processing input data, and computing the multiple choice classification loss.

ATTRIBUTE DESCRIPTION
transformer

An instance of XLMModel for processing input data.

sequence_summary

An instance of SequenceSummary for summarizing the transformer outputs.

logits_proj

An instance of nn.Linear for projecting the sequence summary outputs.

PARAMETER DESCRIPTION
config

The model configuration.

*inputs

Variable length input for the model.

DEFAULT: ()

**kwargs

Additional keyword arguments for the model.

DEFAULT: {}

METHOD DESCRIPTION
forward

Constructs the model and processes the input data for multiple choice tasks.

RETURNS DESCRIPTION

Union[Tuple, MultipleChoiceModelOutput]: A tuple containing the loss and model outputs or an instance of MultipleChoiceModelOutput.

Note

This class inherits from XLMPreTrainedModel and follows the implementation details specific to XLM multiple choice models.

See Also

XLMPreTrainedModel: The base class for all XLM model implementations.
XLMModel: The base transformer model used for processing input data.
SequenceSummary: A class for summarizing transformer outputs.
MultipleChoiceModelOutput: The output class for multiple choice model predictions.

RAISES DESCRIPTION
ValueError

If invalid input data or model configuration is provided.

RuntimeError

If errors occur during model processing or loss computation.

Example
>>> # Initialize XLMForMultipleChoice model
>>> model = XLMForMultipleChoice(config)
...
>>> # Process input data and compute multiple choice classification loss
>>> outputs = model.forward(input_ids, attention_mask, labels=labels)
...
>>> # Access model outputs
>>> logits = outputs.logits
>>> hidden_states = outputs.hidden_states
>>> attentions = outputs.attentions
Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMForMultipleChoice(XLMPreTrainedModel):

    """
    XLMForMultipleChoice represents a XLM model for multiple choice tasks. It is a subclass of XLMPreTrainedModel
    and includes methods for building the model, processing input data, and computing multiple choice classification
    loss.

    Attributes:
        transformer: An instance of XLMModel for processing input data.
        sequence_summary: An instance of SequenceSummary for summarizing the transformer outputs.
        logits_proj: An instance of nn.Linear for projecting the sequence summary outputs.

    Args:
        config: The model configuration.
        *inputs: Variable length input for the model.
        **kwargs: Additional keyword arguments for the model.

    Methods:
        forward: Constructs the model and processes the input data for multiple choice tasks.

    Returns:
        Union[Tuple, MultipleChoiceModelOutput]: A tuple containing the loss and model outputs or an instance of
            MultipleChoiceModelOutput.

    Note:
        This class inherits from XLMPreTrainedModel and follows the implementation details specific to XLM
        multiple choice models.

    See Also:
        XLMPreTrainedModel: The base class for all XLM model implementations.
        XLMModel: The base transformer model used for processing input data.
        SequenceSummary: A class for summarizing transformer outputs.
        MultipleChoiceModelOutput: The output class for multiple choice model predictions.

    Raises:
        ValueError: If invalid input data or model configuration is provided.
        RuntimeError: If errors occur during model processing or loss computation.

    Example:
        ```python
        >>> # Initialize XLMForMultipleChoice model
        >>> model = XLMForMultipleChoice(config)
        ...
        >>> # Process input data and compute multiple choice classification loss
        >>> outputs = model.forward(input_ids, attention_mask, labels=labels)
        ...
        >>> # Access model outputs
        >>> logits = outputs.logits
        >>> hidden_states = outputs.hidden_states
        >>> attentions = outputs.attentions
        ```
    """
    def __init__(self, config, *inputs, **kwargs):
        """
        Initializes the XLMForMultipleChoice class.

        Args:
            self: The instance of the class.
            config: The configuration object containing various parameters for model initialization.

        Returns:
            None.

        Raises:
            TypeError: If the provided config parameter is not of the correct type.
            ValueError: If the config parameter is missing required attributes.
            RuntimeError: If an error occurs during the initialization process.
        """
        super().__init__(config, *inputs, **kwargs)

        self.transformer = XLMModel(config)
        self.sequence_summary = SequenceSummary(config)
        self.logits_proj = nn.Linear(config.num_labels, 1)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, MultipleChoiceModelOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
                `input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
        position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
        langs = langs.view(-1, langs.shape[-1]) if langs is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        if lengths is not None:
            logger.warning(
                "The `lengths` parameter cannot be used with the XLM multiple choice models. Please use the "
                "attention mask instead."
            )
            lengths = None

        transformer_outputs = self.transformer(
            input_ids=input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        output = transformer_outputs[0]
        logits = self.sequence_summary(output)
        logits = self.logits_proj(logits)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForMultipleChoice.__init__(config, *inputs, **kwargs)

Initializes the XLMForMultipleChoice class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

The configuration object containing various parameters for model initialization.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the provided config parameter is not of the correct type.

ValueError

If the config parameter is missing required attributes.

RuntimeError

If an error occurs during the initialization process.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config, *inputs, **kwargs):
    """
    Initializes the XLMForMultipleChoice class.

    Args:
        self: The instance of the class.
        config: The configuration object containing various parameters for model initialization.

    Returns:
        None.

    Raises:
        TypeError: If the provided config parameter is not of the correct type.
        ValueError: If the config parameter is missing required attributes.
        RuntimeError: If an error occurs during the initialization process.
    """
    super().__init__(config, *inputs, **kwargs)

    self.transformer = XLMModel(config)
    self.sequence_summary = SequenceSummary(config)
    self.logits_proj = nn.Linear(config.num_labels, 1)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMForMultipleChoice.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
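
A minimal shape sketch, assuming `model` is an already-initialized XLMForMultipleChoice (hypothetical setup): multiple-choice inputs are three-dimensional, `(batch_size, num_choices, seq_len)`, and are flattened internally before being fed to the transformer, while `labels` holds the index of the correct choice for each example.

>>> import mindspore
>>> from mindspore import ops
>>> input_ids = ops.zeros((2, 4, 16), mindspore.int64)       # (batch_size, num_choices, seq_len)
>>> attention_mask = ops.ones((2, 4, 16), mindspore.int64)
>>> labels = mindspore.tensor([1, 3])                        # index of the correct choice per example
>>> outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
>>> outputs.logits.shape                                     # (2, 4) == (batch_size, num_choices)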

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, MultipleChoiceModelOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
    position_ids = position_ids.view(-1, position_ids.shape[-1]) if position_ids is not None else None
    langs = langs.view(-1, langs.shape[-1]) if langs is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    if lengths is not None:
        logger.warning(
            "The `lengths` parameter cannot be used with the XLM multiple choice models. Please use the "
            "attention mask instead."
        )
        lengths = None

    transformer_outputs = self.transformer(
        input_ids=input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    output = transformer_outputs[0]
    logits = self.sequence_summary(output)
    logits = self.logits_proj(logits)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnswering

Bases: XLMPreTrainedModel

The XLMForQuestionAnswering class is a model for question answering tasks using the XLM (Cross-lingual Language Model) architecture. It is designed to take input sequences and output the start and end positions of the answer within the sequence.

This class inherits from XLMPreTrainedModel, which provides the base functionality for loading and using pre-trained XLM models.

ATTRIBUTE DESCRIPTION
`transformer`

An instance of the XLMModel class, which is responsible for encoding the input sequences.

`qa_outputs`

An instance of the SQuADHead class, which is responsible for predicting the start and end positions of the answer.

Example
>>> from transformers import AutoTokenizer, XLMForQuestionAnswering
...
>>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
>>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
...
>>> input_ids = mindspore.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
>>> start_positions = mindspore.tensor([1])
>>> end_positions = mindspore.tensor([3])
...
>>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMForQuestionAnswering(XLMPreTrainedModel):

    """
    The `XLMForQuestionAnswering` class is a model for question answering tasks using the XLM
    (Cross-lingual Language Model) architecture. It is designed to take input sequences and output the start and end
    positions of the answer within the sequence.

    This class inherits from `XLMPreTrainedModel`, which provides the base functionality for loading and using
    pre-trained XLM models.

    Attributes:
        `transformer`: An instance of the `XLMModel` class, which is responsible for encoding the input sequences.
        `qa_outputs`: An instance of the `SQuADHead` class, which is responsible for predicting the start and end
        positions of the answer.

    Example:
        ```python
        >>> from transformers import AutoTokenizer, XLMForQuestionAnswering
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
        >>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
        ...
        >>> input_ids = mindspore.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
        >>> start_positions = mindspore.tensor([1])
        >>> end_positions = mindspore.tensor([3])
        ...
        >>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
        >>> loss = outputs.loss
        ```

    """
    def __init__(self, config):
        """
        Initializes an instance of the XLMForQuestionAnswering class.

        Args:
            self (XLMForQuestionAnswering): The current instance of the XLMForQuestionAnswering class.
            config: The configuration object containing settings for the XLMForQuestionAnswering model.

        Returns:
            None.

        Raises:
            TypeError: If the provided 'config' parameter is not of the expected type.
            ValueError: If there are issues during the initialization process of the XLMForQuestionAnswering instance.
        """
        super().__init__(config)

        self.transformer = XLMModel(config)
        self.qa_outputs = SQuADHead(config)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        is_impossible: Optional[mindspore.Tensor] = None,
        cls_index: Optional[mindspore.Tensor] = None,
        p_mask: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, XLMForQuestionAnsweringOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence
                are not taken into account for computing the loss.
            is_impossible (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels whether a question has an answer or no answer (SQuAD 2.0)
            cls_index (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the classification token to use as input for computing plausibility of the
                answer.
            p_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Optional mask of tokens which can't be in answers (e.g. [CLS], [PAD], ...). 1.0 means token should be
                masked. 0.0 mean token is not masked.

        Returns:
            `Union[Tuple, XLMForQuestionAnsweringOutput]`

        Example:
            ```python
            >>> from transformers import AutoTokenizer, XLMForQuestionAnswering
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
            >>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
            ...
            >>> input_ids = mindspore.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
            ...     0
            ... )  # Batch size 1
            >>> start_positions = mindspore.tensor([1])
            >>> end_positions = mindspore.tensor([3])
            ...
            >>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
            >>> loss = outputs.loss
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        output = transformer_outputs[0]

        outputs = self.qa_outputs(
            output,
            start_positions=start_positions,
            end_positions=end_positions,
            cls_index=cls_index,
            is_impossible=is_impossible,
            p_mask=p_mask,
            return_dict=return_dict,
        )

        if not return_dict:
            return outputs + transformer_outputs[1:]

        return XLMForQuestionAnsweringOutput(
            loss=outputs.loss,
            start_top_log_probs=outputs.start_top_log_probs,
            start_top_index=outputs.start_top_index,
            end_top_log_probs=outputs.end_top_log_probs,
            end_top_index=outputs.end_top_index,
            cls_logits=outputs.cls_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnswering.__init__(config)

Initializes an instance of the XLMForQuestionAnswering class.

PARAMETER DESCRIPTION
self

The current instance of the XLMForQuestionAnswering class.

TYPE: XLMForQuestionAnswering

config

The configuration object containing settings for the XLMForQuestionAnswering model.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the provided 'config' parameter is not of the expected type.

ValueError

If there are issues during the initialization process of the XLMForQuestionAnswering instance.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    Initializes an instance of the XLMForQuestionAnswering class.

    Args:
        self (XLMForQuestionAnswering): The current instance of the XLMForQuestionAnswering class.
        config: The configuration object containing settings for the XLMForQuestionAnswering model.

    Returns:
        None.

    Raises:
        TypeError: If the provided 'config' parameter is not of the expected type.
        ValueError: If there are issues during the initialization process of the XLMForQuestionAnswering instance.
    """
    super().__init__(config)

    self.transformer = XLMModel(config)
    self.qa_outputs = SQuADHead(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnswering.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, is_impossible=None, cls_index=None, p_mask=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

is_impossible

Labels whether a question has an answer or no answer (SQuAD 2.0)

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

cls_index

Labels for position (index) of the classification token to use as input for computing plausibility of the answer.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

p_mask

Optional mask of tokens which can't be in answers (e.g. [CLS], [PAD], ...). 1.0 means the token should be masked, 0.0 means the token is not masked.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, XLMForQuestionAnsweringOutput]

Union[Tuple, XLMForQuestionAnsweringOutput]

Example
>>> from transformers import AutoTokenizer, XLMForQuestionAnswering
...
>>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
>>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
...
>>> input_ids = mindspore.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
...     0
... )  # Batch size 1
>>> start_positions = mindspore.tensor([1])
>>> end_positions = mindspore.tensor([3])
...
>>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
>>> loss = outputs.loss
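
When start and end positions are not supplied (pure inference), the SQuAD head typically returns beam-search style span candidates instead of a loss. A hedged sketch of reading them, reusing `model` and `input_ids` from the example above (field names follow XLMForQuestionAnsweringOutput; shapes depend on the configuration's top-k settings):

>>> outputs = model(input_ids)               # no positions given
>>> outputs.start_top_log_probs              # log-probs of the top candidate start positions
>>> outputs.start_top_index                  # indices of those start positions
>>> outputs.end_top_log_probs                # log-probs of end positions per start candidate
>>> outputs.end_top_index                    # indices of those end positions
>>> outputs.cls_logits                       # answerability score (SQuAD 2.0 style)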
Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    is_impossible: Optional[mindspore.Tensor] = None,
    cls_index: Optional[mindspore.Tensor] = None,
    p_mask: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, XLMForQuestionAnsweringOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        is_impossible (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels whether a question has an answer or no answer (SQuAD 2.0)
        cls_index (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the classification token to use as input for computing plausibility of the
            answer.
        p_mask (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Optional mask of tokens which can't be in answers (e.g. [CLS], [PAD], ...). 1.0 means the token should be
            masked; 0.0 means the token is not masked.

    Returns:
        `Union[Tuple, XLMForQuestionAnsweringOutput]`

    Example:
        ```python
        >>> from transformers import AutoTokenizer, XLMForQuestionAnswering
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
        >>> model = XLMForQuestionAnswering.from_pretrained("xlm-mlm-en-2048")
        ...
        >>> input_ids = mindspore.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(
        ...     0
        ... )  # Batch size 1
        >>> start_positions = mindspore.tensor([1])
        >>> end_positions = mindspore.tensor([3])
        ...
        >>> outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
        >>> loss = outputs.loss
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    output = transformer_outputs[0]

    outputs = self.qa_outputs(
        output,
        start_positions=start_positions,
        end_positions=end_positions,
        cls_index=cls_index,
        is_impossible=is_impossible,
        p_mask=p_mask,
        return_dict=return_dict,
    )

    if not return_dict:
        return outputs + transformer_outputs[1:]

    return XLMForQuestionAnsweringOutput(
        loss=outputs.loss,
        start_top_log_probs=outputs.start_top_log_probs,
        start_top_index=outputs.start_top_index,
        end_top_log_probs=outputs.end_top_log_probs,
        end_top_index=outputs.end_top_index,
        cls_logits=outputs.cls_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringOutput dataclass

Bases: ModelOutput

Base class for outputs of question answering models using a SquadHead.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
@dataclass
class XLMForQuestionAnsweringOutput(ModelOutput):
    """
    Base class for outputs of question answering models using a `SquadHead`.

    Args:
        loss (`mindspore.Tensor` of shape `(1,)`, *optional*, returned if both `start_positions` and `end_positions`
            are provided):
            Classification loss as the sum of start token, end token (and is_impossible if provided) classification
            losses.
        start_top_log_probs (`mindspore.Tensor` of shape `(batch_size, config.start_n_top)`, *optional*, returned if
            `start_positions` or `end_positions` is not provided):
            Log probabilities for the top config.start_n_top start token possibilities (beam-search).
        start_top_index (`mindspore.Tensor` of shape `(batch_size, config.start_n_top)`, *optional*, returned if
            `start_positions` or `end_positions` is not provided):
            Indices for the top config.start_n_top start token possibilities (beam-search).
        end_top_log_probs (`mindspore.Tensor` of shape `(batch_size, config.start_n_top * config.end_n_top)`,
            *optional*, returned if `start_positions` or `end_positions` is not provided):
            Log probabilities for the top `config.start_n_top * config.end_n_top` end token possibilities
            (beam-search).
        end_top_index (`mindspore.Tensor` of shape `(batch_size, config.start_n_top * config.end_n_top)`,
            *optional*, returned if `start_positions` or `end_positions` is not provided):
            Indices for the top `config.start_n_top * config.end_n_top` end token possibilities (beam-search).
        cls_logits (`mindspore.Tensor` of shape `(batch_size,)`, *optional*, returned if `start_positions` or
            `end_positions` is not provided):
            Log probabilities for the `is_impossible` label of the answers.
        hidden_states (`tuple(mindspore.Tensor)`, *optional*, returned when `output_hidden_states=True`
            is passed or when `config.output_hidden_states=True`):
            Tuple of `mindspore.Tensor` (one for the output of the embeddings + one for the output of each layer) of
            shape `(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (`tuple(mindspore.Tensor)`, *optional*, returned when `output_attentions=True` is passed or
            when `config.output_attentions=True`):
            Tuple of `mindspore.Tensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
            sequence_length)`.

            Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
    """
    loss: Optional[mindspore.Tensor] = None
    start_top_log_probs: Optional[mindspore.Tensor] = None
    start_top_index: Optional[mindspore.Tensor] = None
    end_top_log_probs: Optional[mindspore.Tensor] = None
    end_top_index: Optional[mindspore.Tensor] = None
    cls_logits: Optional[mindspore.Tensor] = None
    hidden_states: Optional[Tuple[mindspore.Tensor]] = None
    attentions: Optional[Tuple[mindspore.Tensor]] = None
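
The four `*_top_*` fields are only populated when `start_positions`/`end_positions` are not given. Below is a minimal decoding sketch, not part of the library, assuming the usual `SquadHead` layout in which the end candidates for the j-th start candidate occupy the slice `j * end_n_top : (j + 1) * end_n_top` of the flattened end tensors:

```python
def best_span(outputs, start_n_top, end_n_top):
    """Pick the highest-scoring (start, end) pair for the first batch item."""
    start_lp = outputs.start_top_log_probs[0].asnumpy()    # (start_n_top,)
    start_idx = outputs.start_top_index[0].asnumpy()       # (start_n_top,)
    end_lp = outputs.end_top_log_probs[0].asnumpy()        # (start_n_top * end_n_top,)
    end_idx = outputs.end_top_index[0].asnumpy()           # (start_n_top * end_n_top,)

    best_score, best_start, best_end = float("-inf"), 0, 0
    for j in range(start_n_top):
        for k in range(end_n_top):
            score = start_lp[j] + end_lp[j * end_n_top + k]
            start, end = int(start_idx[j]), int(end_idx[j * end_n_top + k])
            if end >= start and score > best_score:
                best_score, best_start, best_end = score, start, end
    return best_start, best_end, best_score
```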

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringSimple

Bases: XLMPreTrainedModel

This class represents a simple XLM model for question answering. It inherits from XLMPreTrainedModel and includes methods for forwarding the model and handling question answering tasks.

ATTRIBUTE DESCRIPTION
transformer

The XLMModel instance for the transformer component of the model.

TYPE: XLMModel

qa_outputs

The output layer for question answering predictions.

TYPE: Linear

METHOD DESCRIPTION
forward

Construct the model for question answering tasks, with optional input parameters and return values. This method includes detailed descriptions of the input and output tensors, as well as the expected behavior of the model during inference.

Note

This class is intended for use with the MindSpore framework.
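
As a rough usage sketch (the import path, checkpoint name, and `mindspore.tensor`/`asnumpy` calls mirror the examples elsewhere on this page and are assumptions; a QA head loaded from a plain MLM checkpoint is randomly initialized, so the decoded span is only meaningful after fine-tuning):

```python
import mindspore
from mindnlp.transformers import AutoTokenizer, XLMForQuestionAnsweringSimple  # assumed export path

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMForQuestionAnsweringSimple.from_pretrained("xlm-mlm-en-2048")

input_ids = mindspore.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)  # batch size 1

outputs = model(input_ids)
start = int(outputs.start_logits[0].argmax().asnumpy())
end = int(outputs.end_logits[0].argmax().asnumpy())
answer_ids = input_ids[0][start : end + 1]      # empty if end < start
print(tokenizer.decode(answer_ids.asnumpy().tolist()))
```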

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMForQuestionAnsweringSimple(XLMPreTrainedModel):

    """
    This class represents a simple XLM model for question answering. It inherits from XLMPreTrainedModel
    and includes methods for forwarding the model and handling question answering tasks.

    Attributes:
        transformer (XLMModel): The XLMModel instance for the transformer component of the model.
        qa_outputs (nn.Linear): The output layer for question answering predictions.

    Methods:
        forward: Construct the model for question answering tasks, with optional input parameters and return values.
            This method includes detailed descriptions of the input and output tensors, as well as
            the expected behavior of the model during inference.

    Note:
        This class is intended for use with the MindSpore framework.
    """
    def __init__(self, config):
        """
        Initializes a new instance of the 'XLMForQuestionAnsweringSimple' class.

        Args:
            self: The object instance.
            config: An instance of the 'XLMConfig' class containing the configuration parameters for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)

        self.transformer = XLMModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
            end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = transformer_outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, the split adds a dimension
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            start_loss = ops.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
            end_loss = ops.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + transformer_outputs[1:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringSimple.__init__(config)

Initializes a new instance of the 'XLMForQuestionAnsweringSimple' class.

PARAMETER DESCRIPTION
self

The object instance.

config

An instance of the 'XLMConfig' class containing the configuration parameters for the model.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    Initializes a new instance of the 'XLMForQuestionAnsweringSimple' class.

    Args:
        self: The object instance.
        config: An instance of the 'XLMConfig' class containing the configuration parameters for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)

    self.transformer = XLMModel(config)
    self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMForQuestionAnsweringSimple.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
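
As the listing below shows, labels that fall outside the sequence are clamped to `sequence_length`, and that same value is passed as the cross-entropy `ignore_index`, so those examples contribute nothing to the loss. A small illustration with made-up values:

```python
import mindspore

seq_len = 7                                   # plays the role of ignored_index (start_logits.shape[1])
start_positions = mindspore.tensor([3, 50])   # 50 lies outside the 7-token sequence
clamped = start_positions.clamp(0, seq_len)   # -> [3, 7]; index 7 is then ignored by the loss
print(clamped)
```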

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = transformer_outputs[0]

    logits = self.qa_outputs(sequence_output)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1)
    end_logits = end_logits.squeeze(-1)

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, the split adds a dimension
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        start_loss = ops.cross_entropy(start_logits, start_positions, ignore_index=ignored_index)
        end_loss = ops.cross_entropy(end_logits, end_positions, ignore_index=ignored_index)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + transformer_outputs[1:]
        return ((total_loss,) + output) if total_loss is not None else output

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForSequenceClassification

Bases: XLMPreTrainedModel

XLMForSequenceClassification includes the logic to classify sequences using a transformer-based model. This class inherits from XLMPreTrainedModel and implements the specific logic for sequence classification using the XLM model.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for sequence classification.

TYPE: int

config

The configuration for the XLM model.

TYPE: XLMConfig

transformer

The transformer model used for sequence classification.

TYPE: XLMModel

sequence_summary

The sequence summarization layer.

TYPE: SequenceSummary

PARAMETER DESCRIPTION
config

The configuration object for the XLMForSequenceClassification model.

TYPE: XLMConfig

METHOD DESCRIPTION
forward

This method forwards the sequence classification model and returns the sequence classifier output.

RETURNS DESCRIPTION

Union[Tuple, SequenceClassifierOutput]: A tuple containing the loss and output if loss is not None, else the output.

RAISES DESCRIPTION
ValueError

If the number of labels is invalid or the problem type is not recognized.
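
A hedged usage sketch (the import path and the `num_labels` keyword to `from_pretrained` follow the usual transformers convention and are assumptions here; the classification head of an MLM checkpoint is randomly initialized):

```python
import mindspore
from mindnlp.transformers import AutoTokenizer, XLMForSequenceClassification  # assumed export path

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMForSequenceClassification.from_pretrained("xlm-mlm-en-2048", num_labels=2)

input_ids = mindspore.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)
labels = mindspore.tensor([1])   # integer labels -> single_label_classification

outputs = model(input_ids, labels=labels)
print(outputs.loss, outputs.logits.shape)   # scalar cross-entropy loss, logits of shape (1, 2)
```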

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMForSequenceClassification(XLMPreTrainedModel):

    """
    XLMForSequenceClassification includes the logic to classify sequences using a transformer-based model.
    This class inherits from XLMPreTrainedModel and implements the specific logic for sequence classification
    using the XLM model.

    Attributes:
        num_labels (int): The number of labels for sequence classification.
        config (XLMConfig): The configuration for the XLM model.
        transformer (XLMModel): The transformer model used for sequence classification.
        sequence_summary (SequenceSummary): The sequence summarization layer.

    Args:
        config (XLMConfig): The configuration object for the XLMForSequenceClassification model.

    Methods:
        forward:
            This method forwards the sequence classification model and returns the sequence classifier output.

    Returns:
        Union[Tuple, SequenceClassifierOutput]:
            A tuple containing the loss and output if loss is not None, else the output.

    Raises:
        ValueError: If the number of labels is invalid or the problem type is not recognized.
    """
    def __init__(self, config):
        """
        Initializes an instance of the XLMForSequenceClassification class.

        Args:
            self (XLMForSequenceClassification): The current instance of the XLMForSequenceClassification class.
            config (XLMConfig): The configuration object containing settings for the model initialization.
                It must include the number of labels 'num_labels' and other necessary configurations.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of type XLMConfig.
            ValueError: If the config object does not contain the required 'num_labels' attribute.
        """
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.transformer = XLMModel(config)
        self.sequence_summary = SequenceSummary(config)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SequenceClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (Mean-Square loss); if
                `config.num_labels > 1`, a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        output = transformer_outputs[0]
        logits = self.sequence_summary(output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                if self.num_labels == 1:
                    loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
                else:
                    loss = ops.mse_loss(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss = ops.binary_cross_entropy_with_logits(logits, labels)

        if not return_dict:
            output = (logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForSequenceClassification.__init__(config)

Initializes an instance of the XLMForSequenceClassification class.

PARAMETER DESCRIPTION
self

The current instance of the XLMForSequenceClassification class.

TYPE: XLMForSequenceClassification

config

The configuration object containing settings for the model initialization. It must include the number of labels 'num_labels' and other necessary configurations.

TYPE: XLMConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of type XLMConfig.

ValueError

If the config object does not contain the required 'num_labels' attribute.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    Initializes an instance of the XLMForSequenceClassification class.

    Args:
        self (XLMForSequenceClassification): The current instance of the XLMForSequenceClassification class.
        config (XLMConfig): The configuration object containing settings for the model initialization.
            It must include the number of labels 'num_labels' and other necessary configurations.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of type XLMConfig.
        ValueError: If the config object does not contain the required 'num_labels' attribute.
    """
    super().__init__(config)
    self.num_labels = config.num_labels
    self.config = config

    self.transformer = XLMModel(config)
    self.sequence_summary = SequenceSummary(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMForSequenceClassification.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss); if config.num_labels > 1, a classification loss is computed (Cross-Entropy).

TYPE: `mindspore.Tensor` of shape `(batch_size,)`, *optional* DEFAULT: None
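
When `config.problem_type` is unset, the forward pass infers it from `num_labels` and the label dtype (see the listing below). The label shapes expected by each branch, as an illustrative guide:

```python
import mindspore

# num_labels > 1 with integer class ids -> single_label_classification (cross-entropy)
labels_single = mindspore.tensor([0, 2])              # shape (batch_size,)

# float multi-hot targets -> multi_label_classification (BCE with logits)
labels_multi = mindspore.tensor([[1.0, 0.0, 1.0]])    # shape (batch_size, num_labels)

# num_labels == 1 -> regression (MSE on the squeezed logits)
labels_regression = mindspore.tensor([[0.7]])         # shape (batch_size, 1)
```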

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, SequenceClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1`, a regression loss is computed (Mean-Square loss); if
            `config.num_labels > 1`, a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    output = transformer_outputs[0]
    logits = self.sequence_summary(output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and labels.dtype in (mindspore.int64, mindspore.int32):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            if self.num_labels == 1:
                loss = ops.mse_loss(logits.squeeze(), labels.squeeze())
            else:
                loss = ops.mse_loss(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss = ops.binary_cross_entropy_with_logits(logits, labels)

    if not return_dict:
        output = (logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForTokenClassification

Bases: XLMPreTrainedModel

XLMForTokenClassification

This class is a token classification model based on the XLM architecture. It is designed for token-level classification tasks, such as named entity recognition or part-of-speech tagging. The model takes input sequences and predicts a label for each token in the sequence.

The XLMForTokenClassification class inherits from the XLMPreTrainedModel class, which provides the basic functionality for pre-training and fine-tuning XLM models.

ATTRIBUTE DESCRIPTION
num_labels

The number of labels for token classification.

TYPE: int

transformer

The XLMModel instance used for the transformer architecture.

TYPE: XLMModel

dropout

Dropout layer for regularization.

TYPE: Dropout

classifier

Linear layer for classification.

TYPE: Linear

METHOD DESCRIPTION
__init__

Initializes the XLMForTokenClassification instance.

forward

Constructs the XLMForTokenClassification model and performs token classification.
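
A brief usage sketch (the import path and the `num_labels` keyword are assumptions; per-token predictions are taken with an argmax over the label dimension):

```python
import mindspore
from mindnlp.transformers import AutoTokenizer, XLMForTokenClassification  # assumed export path

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=5)

input_ids = mindspore.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)

outputs = model(input_ids)
predicted_label_ids = outputs.logits.argmax(-1)   # shape (batch_size, sequence_length)
```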

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMForTokenClassification(XLMPreTrainedModel):

    """XLMForTokenClassification

    This class is a token classification model based on the XLM architecture. It is designed for token-level
    classification tasks, such as named entity recognition or part-of-speech tagging. The model takes input sequences
    and predicts a label for each token in the sequence.

    The XLMForTokenClassification class inherits from the XLMPreTrainedModel class, which provides the basic
    functionality for pre-training and fine-tuning XLM models.

    Attributes:
        num_labels (int): The number of labels for token classification.
        transformer (XLMModel): The XLMModel instance used for the transformer architecture.
        dropout (nn.Dropout): Dropout layer for regularization.
        classifier (nn.Linear): Linear layer for classification.

    Methods:
        __init__: Initializes the XLMForTokenClassification instance.
        forward: Constructs the XLMForTokenClassification model and performs token classification.

    """
    def __init__(self, config):
        """Initialize the XLMForTokenClassification class.

            Args:
                config (XLMConfig): The configuration object for the model.

            Returns:
                None

            Raises:
                None
            """
        super().__init__(config)
        self.num_labels = config.num_labels

        self.transformer = XLMModel(config)
        self.dropout = nn.Dropout(p=config.dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, TokenClassifierOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMForTokenClassification.__init__(config)

Initialize the XLMForTokenClassification class.

PARAMETER DESCRIPTION
config

The configuration object for the model.

TYPE: XLMConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """Initialize the XLMForTokenClassification class.

        Args:
            config (XLMConfig): The configuration object for the model.

        Returns:
            None

        Raises:
            None
        """
    super().__init__(config)
    self.num_labels = config.num_labels

    self.transformer = XLMModel(config)
    self.dropout = nn.Dropout(p=config.dropout)
    self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMForTokenClassification.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, TokenClassifierOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = outputs[0]

    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.xlm.modeling_xlm.XLMModel

Bases: XLMPreTrainedModel

XLMModel is a class representing a transformer model for cross-lingual language model pre-training based on the XLM architecture.

This class inherits from XLMPreTrainedModel and implements various methods for initializing the model, handling embeddings, pruning heads, and forwarding the model for inference.

The __init__ method initializes the model with configuration parameters and sets up the model's architecture. It handles encoder/decoder setup, embeddings, attention mechanisms, layer normalization, and other model components.

The get_input_embeddings method returns the input embeddings used in the model, while set_input_embeddings allows for updating the input embeddings.

The _prune_heads method prunes specific attention heads in the model based on the provided dictionary of {layer_num: list of heads}.

The forward method forwards the model for inference, taking input tensors for input_ids, attention_mask, langs, token_type_ids, position_ids, lengths, cache, head_mask, inputs_embeds, output settings, and returns the model output or a BaseModelOutput object depending on the return_dict setting.

Overall, XLMModel provides a comprehensive implementation of the XLM transformer model for cross-lingual language tasks.
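
A bare-encoder sketch (import path and checkpoint name are assumptions carried over from the examples above). When `lengths` and `attention_mask` are omitted they are derived from `pad_index`, and `langs` only matters for multilingual checkpoints with language embeddings enabled:

```python
import mindspore
from mindnlp.transformers import AutoTokenizer, XLMModel  # assumed export path

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMModel.from_pretrained("xlm-mlm-en-2048")

input_ids = mindspore.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)

outputs = model(input_ids)
print(outputs.last_hidden_state.shape)   # (batch_size, sequence_length, emb_dim)
```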

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMModel(XLMPreTrainedModel):

    """
    XLMModel is a class representing a transformer model for cross-lingual language model pre-training based on the
    XLM architecture.

    This class inherits from XLMPreTrainedModel and implements various methods for initializing the model, handling
    embeddings, pruning heads, and forwarding the model for inference.

    The __init__ method initializes the model with configuration parameters and sets up the model's architecture.
    It handles encoder-decoder setup, embeddings, attention mechanisms, layer normalization, and other model components.

    The get_input_embeddings method returns the input embeddings used in the model, while set_input_embeddings allows
    for updating the input embeddings.

    The _prune_heads method prunes specific attention heads in the model based on the provided dictionary of
    {layer_num: list of heads}.

    The forward method forwards the model for inference, taking input tensors for input_ids, attention_mask, langs,
    token_type_ids, position_ids, lengths, cache, head_mask, inputs_embeds, output settings, and returns the model
    output or a BaseModelOutput object depending on the return_dict setting.

    Overall, XLMModel provides a comprehensive implementation of the XLM transformer model for cross-lingual language
    tasks.
    """
    def __init__(self, config):
        """
        This method initializes an instance of the XLMModel class with the provided configuration.

        Args:
            self: The instance of the XLMModel class.
            config:
                An object containing configuration parameters for the XLMModel.

                - Type: object
                - Purpose: Specifies the configuration settings for the XLMModel.
                - Restrictions: Must be a valid configuration object.

        Returns:
            None.

        Raises:
            NotImplementedError: If the provided configuration indicates that the XLMModel is used as a decoder,
                since XLM can only be used as an encoder.
            AssertionError: If the transformer dimension is not a multiple of the number of heads.

        """
        super().__init__(config)

        # encoder / decoder, output layer
        self.is_encoder = config.is_encoder
        self.is_decoder = not config.is_encoder
        if self.is_decoder:
            raise NotImplementedError("Currently XLM can only be used as an encoder")
        # self.with_output = with_output
        self.causal = config.causal

        # dictionary / languages
        self.n_langs = config.n_langs
        self.use_lang_emb = config.use_lang_emb
        self.n_words = config.n_words
        self.eos_index = config.eos_index
        self.pad_index = config.pad_index
        # self.dico = dico
        # self.id2lang = config.id2lang
        # self.lang2id = config.lang2id
        # assert len(self.dico) == self.n_words
        # assert len(self.id2lang) == len(self.lang2id) == self.n_langs

        # model parameters
        self.dim = config.emb_dim  # 512 by default
        self.hidden_dim = self.dim * 4  # 2048 by default
        self.n_heads = config.n_heads  # 8 by default
        self.n_layers = config.n_layers
        self.dropout = config.dropout
        self.attention_dropout = config.attention_dropout
        assert self.dim % self.n_heads == 0, "transformer dim must be a multiple of n_heads"

        # embeddings
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, self.dim)
        if config.sinusoidal_embeddings:
            create_sinusoidal_embeddings(config.max_position_embeddings, self.dim, out=self.position_embeddings.weight)
        if config.n_langs > 1 and config.use_lang_emb:
            self.lang_embeddings = nn.Embedding(self.n_langs, self.dim)
        self.embeddings = nn.Embedding(self.n_words, self.dim, padding_idx=self.pad_index)
        self.layer_norm_emb = nn.LayerNorm([self.dim], eps=config.layer_norm_eps)

        # transformer layers
        attentions = []
        layer_norm1 = []
        ffns = []
        layer_norm2 = []
        # if self.is_decoder:
        #     self.layer_norm15 = nn.ModuleList()
        #     self.encoder_attn = nn.ModuleList()

        for _ in range(self.n_layers):
            attentions.append(MultiHeadAttention(self.n_heads, self.dim, config=config))
            layer_norm1.append(nn.LayerNorm([self.dim], eps=config.layer_norm_eps))
            # if self.is_decoder:
            #     self.layer_norm15.append(nn.LayerNorm(self.dim, eps=config.layer_norm_eps))
            #     self.encoder_attn.append(MultiHeadAttention(self.n_heads, self.dim, dropout=self.attention_dropout))
            ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
            layer_norm2.append(nn.LayerNorm([self.dim], eps=config.layer_norm_eps))

        self.attentions = nn.ModuleList(attentions)
        self.layer_norm1 = nn.ModuleList(layer_norm1)
        self.ffns = nn.ModuleList(ffns)
        self.layer_norm2 = nn.ModuleList(layer_norm2)

        if hasattr(config, "pruned_heads"):
            pruned_heads = config.pruned_heads.copy().items()
            config.pruned_heads = {}
            for layer, heads in pruned_heads:
                if self.attentions[int(layer)].n_heads == config.n_heads:
                    self.prune_heads({int(layer): list(map(int, heads))})

        # Initialize weights and apply final processing
        self.post_init()
        self.position_ids = ops.arange(config.max_position_embeddings).broadcast_to((1, -1))

    def get_input_embeddings(self):
        """
        Retrieve the input embeddings from the XLMModel.

        Args:
            self (XLMModel): An instance of the XLMModel class.

        Returns:
            None.

        Raises:
            None.
        """
        return self.embeddings

    def set_input_embeddings(self, new_embeddings):
        """Set the input embeddings for the XLMModel.

        This method sets the input embeddings for the XLMModel using the given new_embeddings.

        Args:
            self (XLMModel): The instance of the XLMModel class.
            new_embeddings (Any): The new embeddings to set for the XLMModel. It can be of any type.

        Returns:
            None.

        Raises:
            None.

        """
        self.embeddings = new_embeddings

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.attentions[layer].prune_heads(heads)

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutput]:
        '''
        Constructs the XLM model.

        Args:
            self: The object itself.
            input_ids (Optional[mindspore.Tensor]): The input tensor of shape (batch_size, sequence_length).
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
            langs (Optional[mindspore.Tensor]): The language tensor of shape (batch_size, sequence_length).
            token_type_ids (Optional[mindspore.Tensor]): The token type tensor of shape (batch_size, sequence_length).
            position_ids (Optional[mindspore.Tensor]): The position tensor of shape (batch_size, sequence_length).
            lengths (Optional[mindspore.Tensor]): The lengths tensor of shape (batch_size,).
            cache (Optional[Dict[str, mindspore.Tensor]]): The cache tensor.
            head_mask (Optional[mindspore.Tensor]): The head mask tensor.
            inputs_embeds (Optional[mindspore.Tensor]): The input embeddings tensor of shape
                (batch_size, sequence_length, embedding_size).
            output_attentions (Optional[bool]): Whether to output attentions.
            output_hidden_states (Optional[bool]): Whether to output hidden states.
            return_dict (Optional[bool]): Whether to return a dictionary.

        Returns:
            Union[Tuple, BaseModelOutput]: The model output, which can be a tuple of tensors or a BaseModelOutput object.

        Raises:
            AssertionError: If the lengths tensor shape does not match the batch size or if the maximum length in the
                lengths tensor exceeds the sequence length.
            AssertionError: If the position_ids tensor shape does not match the input tensor shape.
            AssertionError: If the langs tensor shape does not match the input tensor shape.
        '''
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None:
            bs, slen = input_ids.shape
        else:
            bs, slen = inputs_embeds.shape[:-1]

        if lengths is None:
            if input_ids is not None:
                lengths = (input_ids != self.pad_index).sum(axis=1).long()
            else:
                lengths = mindspore.tensor([slen] * bs)
        # mask = input_ids != self.pad_index

        # check inputs
        assert lengths.shape[0] == bs
        assert lengths.max().item() <= slen
        # input_ids = input_ids.swapaxes(0, 1)  # batch size as dimension 0
        # assert (src_enc is None) == (src_len is None)
        # if src_enc is not None:
        #     assert self.is_decoder
        #     assert src_enc.shape[0] == bs

        # generate masks
        mask, attn_mask = get_masks(slen, lengths, self.causal, padding_mask=attention_mask)
        # if self.is_decoder and src_enc is not None:
        #     src_mask = ops.arange(src_len.max(), dtype=mindspore.int64) < src_len[:, None]

        # position_ids
        if position_ids is None:
            position_ids = self.position_ids[:, :slen]
        else:
            assert position_ids.shape == (bs, slen)  # (slen, bs)
            # position_ids = position_ids.swapaxes(0, 1)

        # langs
        if langs is not None:
            assert langs.shape == (bs, slen)  # (slen, bs)
            # langs = langs.swapaxes(0, 1)

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.n_layers)

        # do not recompute cached elements
        if cache is not None and input_ids is not None:
            _slen = slen - cache["slen"]
            input_ids = input_ids[:, -_slen:]
            position_ids = position_ids[:, -_slen:]
            if langs is not None:
                langs = langs[:, -_slen:]
            mask = mask[:, -_slen:]
            attn_mask = attn_mask[:, -_slen:]

        # embeddings
        if inputs_embeds is None:
            inputs_embeds = self.embeddings(input_ids)

        tensor = inputs_embeds + self.position_embeddings(position_ids).expand_as(inputs_embeds)
        if langs is not None and self.use_lang_emb and self.n_langs > 1:
            tensor = tensor + self.lang_embeddings(langs)
        if token_type_ids is not None:
            tensor = tensor + self.embeddings(token_type_ids)
        tensor = self.layer_norm_emb(tensor)
        tensor = ops.dropout(tensor, p=self.dropout, training=self.training)
        tensor *= mask.unsqueeze(-1).to(tensor.dtype)

        # transformer layers
        hidden_states = () if output_hidden_states else None
        attentions = () if output_attentions else None
        for i in range(self.n_layers):
            if output_hidden_states:
                hidden_states = hidden_states + (tensor,)

            # self attention
            attn_outputs = self.attentions[i](
                tensor,
                attn_mask,
                cache=cache,
                head_mask=head_mask[i],
                output_attentions=output_attentions,
            )
            attn = attn_outputs[0]
            if output_attentions:
                attentions = attentions + (attn_outputs[1],)
            attn = ops.dropout(attn, p=self.dropout, training=self.training)
            tensor = tensor + attn
            tensor = self.layer_norm1[i](tensor)

            # encoder attention (for decoder only)
            # if self.is_decoder and src_enc is not None:
            #     attn = self.encoder_attn[i](tensor, src_mask, kv=src_enc, cache=cache)
            #     attn = ops.dropout(attn, p=self.dropout, training=self.training)
            #     tensor = tensor + attn
            #     tensor = self.layer_norm15[i](tensor)

            # FFN
            tensor = tensor + self.ffns[i](tensor)
            tensor = self.layer_norm2[i](tensor)
            tensor *= mask.unsqueeze(-1).to(tensor.dtype)

        # Add last hidden state
        if output_hidden_states:
            hidden_states = hidden_states + (tensor,)

        # update cache length
        if cache is not None:
            cache["slen"] += tensor.shape[1]

        # move back sequence length to dimension 0
        # tensor = tensor.swapaxes(0, 1)

        if not return_dict:
            return tuple(v for v in [tensor, hidden_states, attentions] if v is not None)
        return BaseModelOutput(last_hidden_state=tensor, hidden_states=hidden_states, attentions=attentions)

mindnlp.transformers.models.xlm.modeling_xlm.XLMModel.__init__(config)

This method initializes an instance of the XLMModel class with the provided configuration.

PARAMETER DESCRIPTION
self

The instance of the XLMModel class.

config

An object containing configuration parameters for the XLMModel.

  • Type: object
  • Purpose: Specifies the configuration settings for the XLMModel.
  • Restrictions: Must be a valid configuration object.

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
NotImplementedError

If the provided configuration indicates that the XLMModel is used as a decoder, since XLM can only be used as an encoder.

AssertionError

If the transformer dimension is not a multiple of the number of heads.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    This method initializes an instance of the XLMModel class with the provided configuration.

    Args:
        self: The instance of the XLMModel class.
        config:
            An object containing configuration parameters for the XLMModel.

            - Type: object
            - Purpose: Specifies the configuration settings for the XLMModel.
            - Restrictions: Must be a valid configuration object.

    Returns:
        None.

    Raises:
        NotImplementedError: If the provided configuration indicates that the XLMModel is used as a decoder,
            since XLM can only be used as an encoder.
        AssertionError: If the transformer dimension is not a multiple of the number of heads.

    """
    super().__init__(config)

    # encoder / decoder, output layer
    self.is_encoder = config.is_encoder
    self.is_decoder = not config.is_encoder
    if self.is_decoder:
        raise NotImplementedError("Currently XLM can only be used as an encoder")
    # self.with_output = with_output
    self.causal = config.causal

    # dictionary / languages
    self.n_langs = config.n_langs
    self.use_lang_emb = config.use_lang_emb
    self.n_words = config.n_words
    self.eos_index = config.eos_index
    self.pad_index = config.pad_index
    # self.dico = dico
    # self.id2lang = config.id2lang
    # self.lang2id = config.lang2id
    # assert len(self.dico) == self.n_words
    # assert len(self.id2lang) == len(self.lang2id) == self.n_langs

    # model parameters
    self.dim = config.emb_dim  # 512 by default
    self.hidden_dim = self.dim * 4  # 2048 by default
    self.n_heads = config.n_heads  # 8 by default
    self.n_layers = config.n_layers
    self.dropout = config.dropout
    self.attention_dropout = config.attention_dropout
    assert self.dim % self.n_heads == 0, "transformer dim must be a multiple of n_heads"

    # embeddings
    self.position_embeddings = nn.Embedding(config.max_position_embeddings, self.dim)
    if config.sinusoidal_embeddings:
        create_sinusoidal_embeddings(config.max_position_embeddings, self.dim, out=self.position_embeddings.weight)
    if config.n_langs > 1 and config.use_lang_emb:
        self.lang_embeddings = nn.Embedding(self.n_langs, self.dim)
    self.embeddings = nn.Embedding(self.n_words, self.dim, padding_idx=self.pad_index)
    self.layer_norm_emb = nn.LayerNorm([self.dim], eps=config.layer_norm_eps)

    # transformer layers
    attentions = []
    layer_norm1 = []
    ffns = []
    layer_norm2 = []
    # if self.is_decoder:
    #     self.layer_norm15 = nn.ModuleList()
    #     self.encoder_attn = nn.ModuleList()

    for _ in range(self.n_layers):
        attentions.append(MultiHeadAttention(self.n_heads, self.dim, config=config))
        layer_norm1.append(nn.LayerNorm([self.dim], eps=config.layer_norm_eps))
        # if self.is_decoder:
        #     self.layer_norm15.append(nn.LayerNorm(self.dim, eps=config.layer_norm_eps))
        #     self.encoder_attn.append(MultiHeadAttention(self.n_heads, self.dim, dropout=self.attention_dropout))
        ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
        layer_norm2.append(nn.LayerNorm([self.dim], eps=config.layer_norm_eps))

    self.attentions = nn.ModuleList(attentions)
    self.layer_norm1 = nn.ModuleList(layer_norm1)
    self.ffns = nn.ModuleList(ffns)
    self.layer_norm2 = nn.ModuleList(layer_norm2)

    if hasattr(config, "pruned_heads"):
        pruned_heads = config.pruned_heads.copy().items()
        config.pruned_heads = {}
        for layer, heads in pruned_heads:
            if self.attentions[int(layer)].n_heads == config.n_heads:
                self.prune_heads({int(layer): list(map(int, heads))})

    # Initialize weights and apply final processing
    self.post_init()
    self.position_ids = ops.arange(config.max_position_embeddings).broadcast_to((1, -1))

mindnlp.transformers.models.xlm.modeling_xlm.XLMModel.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the XLM model.

PARAMETER DESCRIPTION
self

The object itself.

input_ids

The input tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The attention mask tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

langs

The language tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

token_type_ids

The token type tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The position tensor of shape (batch_size, sequence_length).

TYPE: Optional[Tensor] DEFAULT: None

lengths

The lengths tensor of shape (batch_size,).

TYPE: Optional[Tensor] DEFAULT: None

cache

The cache tensor.

TYPE: Optional[Dict[str, Tensor]] DEFAULT: None

head_mask

The head mask tensor.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings tensor of shape (batch_size, sequence_length, embedding_size).

TYPE: Optional[Tensor] DEFAULT: None

output_attentions

Whether to output attentions.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output hidden states.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a dictionary.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, BaseModelOutput]

Union[Tuple, BaseModelOutput]: The model output, which can be a tuple of tensors or a BaseModelOutput object.

RAISES DESCRIPTION
AssertionError

If the lengths tensor shape does not match the batch size or if the maximum length in the lengths tensor exceeds the sequence length.

AssertionError

If the position_ids tensor shape does not match the input tensor shape.

AssertionError

If the langs tensor shape does not match the input tensor shape.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, BaseModelOutput]:
    '''
    Constructs the XLM model.

    Args:
        self: The object itself.
        input_ids (Optional[mindspore.Tensor]): The input tensor of shape (batch_size, sequence_length).
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor of shape (batch_size, sequence_length).
        langs (Optional[mindspore.Tensor]): The language tensor of shape (batch_size, sequence_length).
        token_type_ids (Optional[mindspore.Tensor]): The token type tensor of shape (batch_size, sequence_length).
        position_ids (Optional[mindspore.Tensor]): The position tensor of shape (batch_size, sequence_length).
        lengths (Optional[mindspore.Tensor]): The lengths tensor of shape (batch_size,).
        cache (Optional[Dict[str, mindspore.Tensor]]): The cache tensor.
        head_mask (Optional[mindspore.Tensor]): The head mask tensor.
        inputs_embeds (Optional[mindspore.Tensor]): The input embeddings tensor of shape
            (batch_size, sequence_length, embedding_size).
        output_attentions (Optional[bool]): Whether to output attentions.
        output_hidden_states (Optional[bool]): Whether to output hidden states.
        return_dict (Optional[bool]): Whether to return a dictionary.

    Returns:
        Union[Tuple, BaseModelOutput]: The model output, which can be a tuple of tensors or a BaseModelOutput object.

    Raises:
        AssertionError: If the lengths tensor shape does not match the batch size or if the maximum length in the
            lengths tensor exceeds the sequence length.
        AssertionError: If the position_ids tensor shape does not match the input tensor shape.
        AssertionError: If the langs tensor shape does not match the input tensor shape.
    '''
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if input_ids is not None:
        bs, slen = input_ids.shape
    else:
        bs, slen = inputs_embeds.shape[:-1]

    if lengths is None:
        if input_ids is not None:
            lengths = (input_ids != self.pad_index).sum(axis=1).long()
        else:
            lengths = mindspore.tensor([slen] * bs)
    # mask = input_ids != self.pad_index

    # check inputs
    assert lengths.shape[0] == bs
    assert lengths.max().item() <= slen
    # input_ids = input_ids.swapaxes(0, 1)  # batch size as dimension 0
    # assert (src_enc is None) == (src_len is None)
    # if src_enc is not None:
    #     assert self.is_decoder
    #     assert src_enc.shape[0] == bs

    # generate masks
    mask, attn_mask = get_masks(slen, lengths, self.causal, padding_mask=attention_mask)
    # if self.is_decoder and src_enc is not None:
    #     src_mask = ops.arange(src_len.max(), dtype=mindspore.int64) < src_len[:, None]

    # position_ids
    if position_ids is None:
        position_ids = self.position_ids[:, :slen]
    else:
        assert position_ids.shape == (bs, slen)  # (slen, bs)
        # position_ids = position_ids.swapaxes(0, 1)

    # langs
    if langs is not None:
        assert langs.shape == (bs, slen)  # (slen, bs)
        # langs = langs.swapaxes(0, 1)

    # Prepare head mask if needed
    head_mask = self.get_head_mask(head_mask, self.config.n_layers)

    # do not recompute cached elements
    if cache is not None and input_ids is not None:
        _slen = slen - cache["slen"]
        input_ids = input_ids[:, -_slen:]
        position_ids = position_ids[:, -_slen:]
        if langs is not None:
            langs = langs[:, -_slen:]
        mask = mask[:, -_slen:]
        attn_mask = attn_mask[:, -_slen:]

    # embeddings
    if inputs_embeds is None:
        inputs_embeds = self.embeddings(input_ids)

    tensor = inputs_embeds + self.position_embeddings(position_ids).expand_as(inputs_embeds)
    if langs is not None and self.use_lang_emb and self.n_langs > 1:
        tensor = tensor + self.lang_embeddings(langs)
    if token_type_ids is not None:
        tensor = tensor + self.embeddings(token_type_ids)
    tensor = self.layer_norm_emb(tensor)
    tensor = ops.dropout(tensor, p=self.dropout, training=self.training)
    tensor *= mask.unsqueeze(-1).to(tensor.dtype)

    # transformer layers
    hidden_states = () if output_hidden_states else None
    attentions = () if output_attentions else None
    for i in range(self.n_layers):
        if output_hidden_states:
            hidden_states = hidden_states + (tensor,)

        # self attention
        attn_outputs = self.attentions[i](
            tensor,
            attn_mask,
            cache=cache,
            head_mask=head_mask[i],
            output_attentions=output_attentions,
        )
        attn = attn_outputs[0]
        if output_attentions:
            attentions = attentions + (attn_outputs[1],)
        attn = ops.dropout(attn, p=self.dropout, training=self.training)
        tensor = tensor + attn
        tensor = self.layer_norm1[i](tensor)

        # encoder attention (for decoder only)
        # if self.is_decoder and src_enc is not None:
        #     attn = self.encoder_attn[i](tensor, src_mask, kv=src_enc, cache=cache)
        #     attn = ops.dropout(attn, p=self.dropout, training=self.training)
        #     tensor = tensor + attn
        #     tensor = self.layer_norm15[i](tensor)

        # FFN
        tensor = tensor + self.ffns[i](tensor)
        tensor = self.layer_norm2[i](tensor)
        tensor *= mask.unsqueeze(-1).to(tensor.dtype)

    # Add last hidden state
    if output_hidden_states:
        hidden_states = hidden_states + (tensor,)

    # update cache length
    if cache is not None:
        cache["slen"] += tensor.shape[1]

    # move back sequence length to dimension 0
    # tensor = tensor.swapaxes(0, 1)

    if not return_dict:
        return tuple(v for v in [tensor, hidden_states, attentions] if v is not None)
    return BaseModelOutput(last_hidden_state=tensor, hidden_states=hidden_states, attentions=attentions)
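
Example (a minimal usage sketch, not an official snippet: the import paths are the module paths documented on this page, the token ids are arbitrary with 2 being the default pad_index, and the shapes assume the default configuration with emb_dim=2048 and 12 layers):

>>> import mindspore
>>> from mindnlp.transformers.models.xlm.modeling_xlm import XLMModel
>>> from mindnlp.transformers.models.xlm.configuration_xlm import XLMConfig
...
>>> config = XLMConfig()
>>> model = XLMModel(config)
>>> input_ids = mindspore.tensor([[0, 7, 6, 1, 2], [0, 4, 5, 1, 2]])  # 2 == config.pad_index
>>> outputs = model(input_ids, output_hidden_states=True)
>>> outputs.last_hidden_state.shape  # (batch_size, sequence_length, emb_dim)
(2, 5, 2048)
>>> len(outputs.hidden_states)       # embedding output plus one entry per layer
13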

mindnlp.transformers.models.xlm.modeling_xlm.XLMModel.get_input_embeddings()

Retrieve the input embeddings from the XLMModel.

PARAMETER DESCRIPTION
self

An instance of the XLMModel class.

TYPE: XLMModel

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def get_input_embeddings(self):
    """
    Retrieve the input embeddings from the XLMModel.

    Args:
        self (XLMModel): An instance of the XLMModel class.

    Returns:
        None.

    Raises:
        None.
    """
    return self.embeddings

mindnlp.transformers.models.xlm.modeling_xlm.XLMModel.set_input_embeddings(new_embeddings)

Set the input embeddings for the XLMModel.

This method sets the input embeddings for the XLMModel using the given new_embeddings.

PARAMETER DESCRIPTION
self

The instance of the XLMModel class.

TYPE: XLMModel

new_embeddings

The new embeddings to set for the XLMModel. It can be of any type.

TYPE: Any

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def set_input_embeddings(self, new_embeddings):
    """Set the input embeddings for the XLMModel.

    This method sets the input embeddings for the XLMModel using the given new_embeddings.

    Args:
        self (XLMModel): The instance of the XLMModel class.
        new_embeddings (Any): The new embeddings to set for the XLMModel. It can be of any type.

    Returns:
        None.

    Raises:
        None.

    """
    self.embeddings = new_embeddings

mindnlp.transformers.models.xlm.modeling_xlm.XLMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = XLMConfig
    load_tf_weights = None
    base_model_prefix = "transformer"

    @property
    def dummy_inputs(self):
        """
        Generates dummy inputs for the XLMPreTrainedModel.

        Args:
            self: An instance of the XLMPreTrainedModel class.

        Returns:
            dict: A dictionary containing the dummy inputs for the model.
                The dictionary has the following keys:

                - 'input_ids': A tensor representing the input sequences. The shape of the tensor is
                (num_sequences, sequence_length), where num_sequences is the number of input sequences and
                sequence_length is the maximum length of any sequence.
                - 'attention_mask': A tensor representing the attention mask for the input sequences.
                The shape of the tensor is the same as 'input_ids' and contains 0s and 1s, where 0 indicates padding and
                1 indicates a valid token.
                - 'langs': A tensor representing the language embeddings for the input sequences.
                The shape of the tensor is the same as 'input_ids'. If the model is configured to use language
                embeddings and there are multiple languages,  the tensor contains language embeddings for each token.
                Otherwise, it is set to None.

        Raises:
            None.
        """
        inputs_list = mindspore.tensor([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0], [0, 0, 0, 4, 5]])
        attns_list = mindspore.tensor([[1, 1, 0, 0, 1], [1, 1, 1, 0, 0], [1, 0, 0, 1, 1]])
        if self.config.use_lang_emb and self.config.n_langs > 1:
            langs_list = mindspore.tensor([[1, 1, 0, 0, 1], [1, 1, 1, 0, 0], [1, 0, 0, 1, 1]])
        else:
            langs_list = None
        return {"input_ids": inputs_list, "attention_mask": attns_list, "langs": langs_list}

    def _init_weights(self, cell):
        """Initialize the weights."""
        if isinstance(cell, nn.Embedding):
            if self.config is not None and self.config.embed_init_std is not None:
                weight = np.random.normal(0.0, self.config.embed_init_std, cell.weight.shape)
                if cell.padding_idx:
                    weight[cell.padding_idx] = 0

                cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, nn.Linear):
            if self.config is not None and self.config.init_std is not None:
                cell.weight.set_data(initializer(Normal(self.config.init_std),
                                                        cell.weight.shape, cell.weight.dtype))
                if cell.bias:
                    cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

        if isinstance(cell, nn.LayerNorm):
            cell.weight.set_data(initializer('ones', cell.weight.shape, cell.weight.dtype))
            cell.bias.set_data(initializer('zeros', cell.bias.shape, cell.bias.dtype))

mindnlp.transformers.models.xlm.modeling_xlm.XLMPreTrainedModel.dummy_inputs property

Generates dummy inputs for the XLMPreTrainedModel.

PARAMETER DESCRIPTION
self

An instance of the XLMPreTrainedModel class.

RETURNS DESCRIPTION
dict

A dictionary containing the dummy inputs for the model. The dictionary has the following keys:

  • 'input_ids': A tensor representing the input sequences. The shape of the tensor is (num_sequences, sequence_length), where num_sequences is the number of input sequences and sequence_length is the maximum length of any sequence.
  • 'attention_mask': A tensor representing the attention mask for the input sequences. The shape of the tensor is the same as 'input_ids' and contains 0s and 1s, where 0 indicates padding and 1 indicates a valid token.
  • 'langs': A tensor representing the language embeddings for the input sequences. The shape of the tensor is the same as 'input_ids'. If the model is configured to use language embeddings and there are multiple languages, the tensor contains language embeddings for each token. Otherwise, it is set to None.

mindnlp.transformers.models.xlm.modeling_xlm.XLMPredLayer

Bases: Module

Prediction layer (cross_entropy or adaptive_softmax).

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMPredLayer(nn.Module):
    """
    Prediction layer (cross_entropy or adaptive_softmax).
    """
    def __init__(self, config):
        """
        Initialize the XLMPredLayer class.

        Args:
            self: The instance of the XLMPredLayer class.
            config:
                A configuration object containing the following attributes:

                - asm (bool): Indicates whether to use Adaptive Softmax. If False, a Dense layer will be used.
                - n_words (int): Number of words in the vocabulary.
                - pad_index (int): Index of the padding token.
                - emb_dim (int): Dimension of the embedding.
                - asm_cutoffs (list of int): Cutoffs for Adaptive Softmax.
                - asm_div_value (float): Divisor value for Adaptive Softmax.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.asm = config.asm
        self.n_words = config.n_words
        self.pad_index = config.pad_index
        dim = config.emb_dim

        if config.asm is False:
            self.proj = nn.Linear(dim, config.n_words, bias=True)
        else:
            self.proj = nn.AdaptiveLogSoftmaxWithLoss(
                in_features=dim,
                n_classes=config.n_words,
                cutoffs=config.asm_cutoffs,
                div_value=config.asm_div_value,
                head_bias=True,  # default is False
            )

    def forward(self, x, y=None):
        """Compute the loss, and optionally the scores."""
        outputs = ()
        if self.asm is False:
            scores = self.proj(x)
            outputs = (scores,) + outputs
            if y is not None:
                loss = ops.cross_entropy(scores.view(-1, self.n_words), y.view(-1), reduction="mean")
                outputs = (loss,) + outputs
        else:
            scores = self.proj.log_prob(x)
            outputs = (scores,) + outputs
            if y is not None:
                _, loss = self.proj(x, y)
                outputs = (loss,) + outputs

        return outputs

mindnlp.transformers.models.xlm.modeling_xlm.XLMPredLayer.__init__(config)

Initialize the XLMPredLayer class.

PARAMETER DESCRIPTION
self

The instance of the XLMPredLayer class.

config

A configuration object containing the following attributes:

  • asm (bool): Indicates whether to use Adaptive Softmax. If False, a Dense layer will be used.
  • n_words (int): Number of words in the vocabulary.
  • pad_index (int): Index of the padding token.
  • emb_dim (int): Dimension of the embedding.
  • asm_cutoffs (list of int): Cutoffs for Adaptive Softmax.
  • asm_div_value (float): Divisor value for Adaptive Softmax.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    Initialize the XLMPredLayer class.

    Args:
        self: The instance of the XLMPredLayer class.
        config:
            A configuration object containing the following attributes:

            - asm (bool): Indicates whether to use Adaptive Softmax. If False, a Dense layer will be used.
            - n_words (int): Number of words in the vocabulary.
            - pad_index (int): Index of the padding token.
            - emb_dim (int): Dimension of the embedding.
            - asm_cutoffs (list of int): Cutoffs for Adaptive Softmax.
            - asm_div_value (float): Divisor value for Adaptive Softmax.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.asm = config.asm
    self.n_words = config.n_words
    self.pad_index = config.pad_index
    dim = config.emb_dim

    if config.asm is False:
        self.proj = nn.Linear(dim, config.n_words, bias=True)
    else:
        self.proj = nn.AdaptiveLogSoftmaxWithLoss(
            in_features=dim,
            n_classes=config.n_words,
            cutoffs=config.asm_cutoffs,
            div_value=config.asm_div_value,
            head_bias=True,  # default is False
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMPredLayer.forward(x, y=None)

Compute the loss, and optionally the scores.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(self, x, y=None):
    """Compute the loss, and optionally the scores."""
    outputs = ()
    if self.asm is False:
        scores = self.proj(x)
        outputs = (scores,) + outputs
        if y is not None:
            loss = ops.cross_entropy(scores.view(-1, self.n_words), y.view(-1), reduction="mean")
            outputs = (loss,) + outputs
    else:
        scores = self.proj.log_prob(x)
        outputs = (scores,) + outputs
        if y is not None:
            _, loss = self.proj(x, y)
            outputs = (loss,) + outputs

    return outputs
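
The returned tuple can be unpacked as follows; pred_layer, hidden and labels are hypothetical placeholders for an XLMPredLayer instance, a (batch_size, seq_len, emb_dim) tensor of hidden states and a matching label tensor:

>>> (scores,) = pred_layer(hidden)             # without labels: only the scores
>>> loss, scores = pred_layer(hidden, labels)  # with labels: the loss comes first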

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel

Bases: XLMPreTrainedModel

XLMWithLMHeadModel represents a transformer model with a language modeling head based on the XLM (Cross-lingual Language Model) architecture.

This class inherits from XLMPreTrainedModel and provides methods for initializing the model, getting and setting output embeddings, preparing inputs for generation, and forwarding the model for language modeling tasks.

ATTRIBUTE DESCRIPTION
transformer

The XLMModel instance used for the transformer architecture.

TYPE: XLMModel

pred_layer

The XLMPredLayer instance used for the language modeling head.

TYPE: XLMPredLayer

METHOD DESCRIPTION
__init__

Initializes the XLMWithLMHeadModel instance with the given configuration.

get_output_embeddings

Returns the output embeddings from the language modeling head.

set_output_embeddings

Sets new output embeddings for the language modeling head.

prepare_inputs_for_generation

Prepares input tensors for language generation tasks.

forward

Constructs the model for language modeling tasks and returns the masked language model output.

Note

The forward method includes detailed documentation for its parameters and return value, including optional and shifted labels for language modeling.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
class XLMWithLMHeadModel(XLMPreTrainedModel):

    """
    XLMWithLMHeadModel represents a transformer model with a language modeling head based on the XLM
    (Cross-lingual Language Model) architecture.

    This class inherits from XLMPreTrainedModel and provides methods for initializing the model, getting and setting
    output embeddings, preparing inputs for generation, and forwarding the model for language modeling tasks.

    Attributes:
        transformer (XLMModel): The XLMModel instance used for the transformer architecture.
        pred_layer (XLMPredLayer): The XLMPredLayer instance used for the language modeling head.

    Methods:
        __init__: Initializes the XLMWithLMHeadModel instance with the given configuration.
        get_output_embeddings: Returns the output embeddings from the language modeling head.
        set_output_embeddings: Sets new output embeddings for the language modeling head.
        prepare_inputs_for_generation: Prepares input tensors for language generation tasks.
        forward: Constructs the model for language modeling tasks and returns the masked language model output.

    Note:
        The forward method includes detailed documentation for its parameters and return value, including optional
        and shifted labels for language modeling.
    """
    _tied_weights_keys = ["pred_layer.proj.weight"]

    def __init__(self, config):
        """
        Initializes a new instance of the XLMWithLMHeadModel class.

        Args:
            self (XLMWithLMHeadModel): The current instance of the XLMWithLMHeadModel class.
            config: The configuration object for the model.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.transformer = XLMModel(config)
        self.pred_layer = XLMPredLayer(config)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self):
        """
        Returns the output embeddings of the XLMWithLMHeadModel.

        Args:
            self (XLMWithLMHeadModel): The instance of the XLMWithLMHeadModel class.

        Returns:
            None.

        Raises:
            None.
        """
        return self.pred_layer.proj

    def set_output_embeddings(self, new_embeddings):
        """
        Method to set new output embeddings for the XLM model with a language modeling head.

        Args:
            self (XLMWithLMHeadModel): The instance of the XLMWithLMHeadModel class.
            new_embeddings (torch.nn.Embedding): The new embeddings to be set as the output embeddings.
                This parameter should be an instance of torch.nn.Embedding class representing the new embeddings.

        Returns:
            None:
                This method does not return any value explicitly but updates the output embeddings of the model in-place.

        Raises:
            TypeError: If the new_embeddings parameter is not an instance of torch.nn.Embedding.
            ValueError: If the shape or type of the new_embeddings parameter is not compatible with the model's
                requirements.
        """
        self.pred_layer.proj = new_embeddings

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        """
        Prepare the inputs for generation in XLMWithLMHeadModel.

        Args:
            self: The instance of the XLMWithLMHeadModel class.
            input_ids (Tensor): The input tensor containing token IDs. Shape (batch_size, sequence_length).

        Returns:
            dict:
                A dictionary containing the prepared inputs for generation.

                - 'input_ids' (Tensor): The input tensor with additional mask token appended.
                Shape (batch_size, sequence_length + 1).
                - 'langs' (Tensor or None): The tensor specifying the language IDs for each token,
                or None if lang_id is not provided.

        Raises:
            ValueError: If the input_ids tensor is not valid or if an error occurs during tensor operations.
            TypeError: If the input_ids tensor is not of type Tensor.
        """
        mask_token_id = self.config.mask_token_id
        lang_id = self.config.lang_id

        effective_batch_size = input_ids.shape[0]
        mask_token = ops.full((effective_batch_size, 1), mask_token_id, dtype=mindspore.int64)
        input_ids = ops.cat([input_ids, mask_token], axis=1)
        if lang_id is not None:
            langs = ops.full_like(input_ids, lang_id)
        else:
            langs = None
        return {"input_ids": input_ids, "langs": langs}

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        langs: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        lengths: Optional[mindspore.Tensor] = None,
        cache: Optional[Dict[str, mindspore.Tensor]] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, MaskedLMOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
                `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
                are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        transformer_outputs = self.transformer(
            input_ids,
            attention_mask=attention_mask,
            langs=langs,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            lengths=lengths,
            cache=cache,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        output = transformer_outputs[0]
        outputs = self.pred_layer(output, labels)  # (loss, logits) or (logits,) depending on if labels are provided.

        if not return_dict:
            return outputs + transformer_outputs[1:]

        return MaskedLMOutput(
            loss=outputs[0] if labels is not None else None,
            logits=outputs[0] if labels is None else outputs[1],
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel.__init__(config)

Initializes a new instance of the XLMWithLMHeadModel class.

PARAMETER DESCRIPTION
self

The current instance of the XLMWithLMHeadModel class.

TYPE: XLMWithLMHeadModel

config

The configuration object for the model.

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def __init__(self, config):
    """
    Initializes a new instance of the XLMWithLMHeadModel class.

    Args:
        self (XLMWithLMHeadModel): The current instance of the XLMWithLMHeadModel class.
        config: The configuration object for the model.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.transformer = XLMModel(config)
    self.pred_layer = XLMPredLayer(config)

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel.forward(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. Indices are selected in [-100, 0, ..., config.vocab_size]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., config.vocab_size].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    langs: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    lengths: Optional[mindspore.Tensor] = None,
    cache: Optional[Dict[str, mindspore.Tensor]] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, MaskedLMOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
            are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids,
        attention_mask=attention_mask,
        langs=langs,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        lengths=lengths,
        cache=cache,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    output = transformer_outputs[0]
    outputs = self.pred_layer(output, labels)  # (loss, logits) or (logits,) depending on if labels are provided.

    if not return_dict:
        return outputs + transformer_outputs[1:]

    return MaskedLMOutput(
        loss=outputs[0] if labels is not None else None,
        logits=outputs[0] if labels is None else outputs[1],
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )
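
Example (a hedged end-to-end sketch along the same lines; the import paths are the module paths documented on this page, the token ids are arbitrary, and the logits shape assumes the default vocabulary size of 30145):

>>> import mindspore
>>> from mindnlp.transformers.models.xlm.modeling_xlm import XLMWithLMHeadModel
>>> from mindnlp.transformers.models.xlm.configuration_xlm import XLMConfig
...
>>> model = XLMWithLMHeadModel(XLMConfig())
>>> input_ids = mindspore.tensor([[0, 7, 6, 1, 2]])
>>> outputs = model(input_ids, labels=input_ids)  # passing labels adds a loss to the output
>>> outputs.logits.shape  # (batch_size, sequence_length, vocab_size)
(1, 5, 30145)
>>> loss = outputs.loss   # cross-entropy (or adaptive-softmax) loss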

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel.get_output_embeddings()

Returns the output embeddings of the XLMWithLMHeadModel.

PARAMETER DESCRIPTION
self

The instance of the XLMWithLMHeadModel class.

TYPE: XLMWithLMHeadModel

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def get_output_embeddings(self):
    """
    Returns the output embeddings of the XLMWithLMHeadModel.

    Args:
        self (XLMWithLMHeadModel): The instance of the XLMWithLMHeadModel class.

    Returns:
        None.

    Raises:
        None.
    """
    return self.pred_layer.proj

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel.prepare_inputs_for_generation(input_ids, **kwargs)

Prepare the inputs for generation in XLMWithLMHeadModel.

PARAMETER DESCRIPTION
self

The instance of the XLMWithLMHeadModel class.

input_ids

The input tensor containing token IDs. Shape (batch_size, sequence_length).

TYPE: Tensor

RETURNS DESCRIPTION
dict

A dictionary containing the prepared inputs for generation.

  • 'input_ids' (Tensor): The input tensor with additional mask token appended. Shape (batch_size, sequence_length + 1).
  • 'langs' (Tensor or None): The tensor specifying the language IDs for each token, or None if lang_id is not provided.

RAISES DESCRIPTION
ValueError

If the input_ids tensor is not valid or if an error occurs during tensor operations.

TypeError

If the input_ids tensor is not of type Tensor.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def prepare_inputs_for_generation(self, input_ids, **kwargs):
    """
    Prepare the inputs for generation in XLMWithLMHeadModel.

    Args:
        self: The instance of the XLMWithLMHeadModel class.
        input_ids (Tensor): The input tensor containing token IDs. Shape (batch_size, sequence_length).

    Returns:
        dict:
            A dictionary containing the prepared inputs for generation.

            - 'input_ids' (Tensor): The input tensor with additional mask token appended.
            Shape (batch_size, sequence_length + 1).
            - 'langs' (Tensor or None): The tensor specifying the language IDs for each token,
            or None if lang_id is not provided.

    Raises:
        ValueError: If the input_ids tensor is not valid or if an error occurs during tensor operations.
        TypeError: If the input_ids tensor is not of type Tensor.
    """
    mask_token_id = self.config.mask_token_id
    lang_id = self.config.lang_id

    effective_batch_size = input_ids.shape[0]
    mask_token = ops.full((effective_batch_size, 1), mask_token_id, dtype=mindspore.int64)
    input_ids = ops.cat([input_ids, mask_token], axis=1)
    if lang_id is not None:
        langs = ops.full_like(input_ids, lang_id)
    else:
        langs = None
    return {"input_ids": input_ids, "langs": langs}

mindnlp.transformers.models.xlm.modeling_xlm.XLMWithLMHeadModel.set_output_embeddings(new_embeddings)

Method to set new output embeddings for the XLM model with a language modeling head.

PARAMETER DESCRIPTION
self

The instance of the XLMWithLMHeadModel class.

TYPE: XLMWithLMHeadModel

new_embeddings

The new embeddings to be set as the output embeddings. This parameter should be an instance of torch.nn.Embedding class representing the new embeddings.

TYPE: Embedding

RETURNS DESCRIPTION
None

This method does not return any value explicitly but updates the output embeddings of the model in-place.

RAISES DESCRIPTION
TypeError

If the new_embeddings parameter is not an instance of torch.nn.Embedding.

ValueError

If the shape or type of the new_embeddings parameter is not compatible with the model's requirements.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def set_output_embeddings(self, new_embeddings):
    """
    Method to set new output embeddings for the XLM model with a language modeling head.

    Args:
        self (XLMWithLMHeadModel): The instance of the XLMWithLMHeadModel class.
        new_embeddings (torch.nn.Embedding): The new embeddings to be set as the output embeddings.
            This parameter should be an instance of torch.nn.Embedding class representing the new embeddings.

    Returns:
        None:
            This method does not return any value explicitly but updates the output embeddings of the model in-place.

    Raises:
        TypeError: If the new_embeddings parameter is not an instance of torch.nn.Embedding.
        ValueError: If the shape or type of the new_embeddings parameter is not compatible with the model's
            requirements.
    """
    self.pred_layer.proj = new_embeddings

mindnlp.transformers.models.xlm.modeling_xlm.create_sinusoidal_embeddings(n_pos, dim, out)

Creates sinusoidal embeddings for positional encoding.

PARAMETER DESCRIPTION
n_pos

The number of positions to be encoded.

TYPE: int

dim

The dimension of the embeddings.

TYPE: int

out

The output tensor to store the sinusoidal embeddings.

TYPE: Tensor

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def create_sinusoidal_embeddings(n_pos, dim, out):
    """
    Creates sinusoidal embeddings for positional encoding.

    Args:
        n_pos (int): The number of positions to be encoded.
        dim (int): The dimension of the embeddings.
        out (Tensor): The output tensor to store the sinusoidal embeddings.

    Returns:
        None.

    Raises:
        None.
    """
    position_enc = np.array([[pos / np.power(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)])
    out[:, 0::2] = mindspore.Tensor(np.sin(position_enc[:, 0::2]))
    out[:, 1::2] = mindspore.Tensor(np.cos(position_enc[:, 1::2]))
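
The table written into out can be reproduced with plain NumPy; the following standalone sketch only restates the formula from the function body (even columns take the sine, odd columns the cosine):

>>> import numpy as np
>>> n_pos, dim = 6, 8
>>> position_enc = np.array(
...     [[pos / np.power(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)]
... )
>>> table = np.zeros((n_pos, dim))
>>> table[:, 0::2] = np.sin(position_enc[:, 0::2])  # even columns: sin
>>> table[:, 1::2] = np.cos(position_enc[:, 1::2])  # odd columns: cos
>>> table.shape
(6, 8)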

mindnlp.transformers.models.xlm.modeling_xlm.get_masks(slen, lengths, causal, padding_mask=None)

Generate hidden states mask, and optionally an attention mask.

Source code in mindnlp/transformers/models/xlm/modeling_xlm.py
def get_masks(slen, lengths, causal, padding_mask=None):
    """
    Generate hidden states mask, and optionally an attention mask.
    """
    alen = ops.arange(slen, dtype=mindspore.int64)
    if padding_mask is not None:
        mask = padding_mask
    else:
        assert lengths.max().item() <= slen
        mask = alen < lengths[:, None]

    # attention mask is the same as mask, or triangular inferior attention (causal)
    bs = lengths.shape[0]
    if causal:
        attn_mask = alen[None, None, :].tile((bs, slen, 1)) <= alen[None, :, None]
    else:
        attn_mask = mask

    # sanity check
    assert mask.shape == (bs, slen)
    assert causal is False or attn_mask.shape == (bs, slen, slen)

    return mask, attn_mask
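
Example (a small sketch of the returned shapes; the import path is the module path documented on this page, and the shapes follow directly from the code above):

>>> import mindspore
>>> from mindnlp.transformers.models.xlm.modeling_xlm import get_masks
>>> lengths = mindspore.tensor([3, 5])
>>> mask, attn_mask = get_masks(5, lengths, causal=False)
>>> mask.shape          # (bs, slen); True where the position is within the sequence length
(2, 5)
>>> _, causal_mask = get_masks(5, lengths, causal=True)
>>> causal_mask.shape   # (bs, slen, slen) lower-triangular attention mask
(2, 5, 5)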

mindnlp.transformers.models.xlm.configuration_xlm

XLM configuration

mindnlp.transformers.models.xlm.configuration_xlm.XLMConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [XLMModel] or a [TFXLMModel]. It is used to instantiate a XLM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the xlm-mlm-en-2048 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the XLM model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [XLMModel] or [TFXLMModel].

TYPE: `int`, *optional*, defaults to 30145 DEFAULT: 30145

emb_dim

Dimensionality of the encoder layers and the pooler layer.

TYPE: `int`, *optional*, defaults to 2048 DEFAULT: 2048

n_layer

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12

n_head

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 16

dropout

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_dropout

The dropout probability for the attention mechanism

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

gelu_activation

Whether or not to use gelu for the activations instead of relu.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

sinusoidal_embeddings

Whether or not to use sinusoidal positional embeddings instead of absolute positional embeddings.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

causal

Whether or not the model should behave in a causal manner. Causal models use a triangular attention mask in order to only attend to the left-side context instead of a bidirectional context.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

asm

Whether or not to use an adaptive log softmax projection layer instead of a linear layer for the prediction layer.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

n_langs

The number of languages the model handles. Set to 1 for monolingual models.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

max_position_embeddings

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

embed_init_std

The standard deviation of the truncated_normal_initializer for initializing the embedding matrices.

TYPE: `float`, *optional*, defaults to 2048^-0.5 DEFAULT: 2048 ** -0.5

init_std

The standard deviation of the truncated_normal_initializer for initializing all weight matrices except the embedding matrices.

TYPE: `float`, *optional*, defaults to 0.02 DEFAULT: 0.02

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-12 DEFAULT: 1e-12

bos_index

The index of the beginning of sentence token in the vocabulary.

TYPE: `int`, *optional*, defaults to 0 DEFAULT: 0

eos_index

The index of the end of sentence token in the vocabulary.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

pad_index

The index of the padding token in the vocabulary.

TYPE: `int`, *optional*, defaults to 2 DEFAULT: 2

unk_index

The index of the unknown token in the vocabulary.

TYPE: `int`, *optional*, defaults to 3 DEFAULT: 3

mask_index

The index of the masking token in the vocabulary.

TYPE: `int`, *optional*, defaults to 5 DEFAULT: 5

is_encoder

Whether or not the initialized model should be a transformer encoder or decoder as seen in Vaswani et al.

TYPE: `bool`, *optional*, defaults to `True`

summary_type

Argument used when doing sequence summary. Used in the sequence classification and multiple choice models. Has to be one of the following options:

  • "last": Take the last token hidden state (like XLNet).
  • "first": Take the first token hidden state (like BERT).
  • "mean": Take the mean of all tokens hidden states.
  • "cls_index": Supply a Tensor of classification token position (like GPT/GPT-2).
  • "attn": Not implemented now, use multi-head attention.

TYPE: `string`, *optional*, defaults to "first" DEFAULT: 'first'

summary_use_proj

Argument used when doing sequence summary. Used in the sequence classification and multiple choice models. Whether or not to add a projection after the vector extraction.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

summary_activation

Argument used when doing sequence summary. Used in the sequence classification and multiple choice models. Pass "tanh" for a tanh activation to the output, any other value will result in no activation.

TYPE: `str`, *optional* DEFAULT: None

summary_proj_to_labels

Used in the sequence classification and multiple choice models. Whether the projection outputs should have config.num_labels or config.hidden_size classes.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

summary_first_dropout

Used in the sequence classification and multiple choice models. The dropout ratio to be used after the projection and activation.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

start_n_top

Used in the SQuAD evaluation script.

TYPE: `int`, *optional*, defaults to 5 DEFAULT: 5

end_n_top

Used in the SQuAD evaluation script.

TYPE: `int`, *optional*, defaults to 5 DEFAULT: 5

mask_token_id

Model agnostic parameter to identify masked tokens when generating text in an MLM context.

TYPE: `int`, *optional*, defaults to 0 DEFAULT: 0

lang_id

The ID of the language used by the model. This parameter is used when generating text in a given language.

TYPE: `int`, *optional*, defaults to 0 DEFAULT: 0

Example
>>> from transformers import XLMConfig, XLMModel
...
>>> # Initializing a XLM configuration
>>> configuration = XLMConfig()
...
>>> # Initializing a model (with random weights) from the configuration
>>> model = XLMModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in mindnlp/transformers/models/xlm/configuration_xlm.py
class XLMConfig(PretrainedConfig):
    """
    This is the configuration class to store the configuration of a [`XLMModel`] or a [`TFXLMModel`]. It is used to
    instantiate a XLM model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the
    [xlm-mlm-en-2048](https://hf-mirror.com/xlm-mlm-en-2048) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30145):
            Vocabulary size of the XLM model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`XLMModel`] or [`TFXLMModel`].
        emb_dim (`int`, *optional*, defaults to 2048):
            Dimensionality of the encoder layers and the pooler layer.
        n_layer (`int`, *optional*, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        n_head (`int`, *optional*, defaults to 16):
            Number of attention heads for each attention layer in the Transformer encoder.
        dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for the attention mechanism
        gelu_activation (`bool`, *optional*, defaults to `True`):
            Whether or not to use *gelu* for the activations instead of *relu*.
        sinusoidal_embeddings (`bool`, *optional*, defaults to `False`):
            Whether or not to use sinusoidal positional embeddings instead of absolute positional embeddings.
        causal (`bool`, *optional*, defaults to `False`):
            Whether or not the model should behave in a causal manner. Causal models use a triangular attention mask in
            order to only attend to the left-side context instead of a bidirectional context.
        asm (`bool`, *optional*, defaults to `False`):
            Whether or not to use an adaptive log softmax projection layer instead of a linear layer for the prediction
            layer.
        n_langs (`int`, *optional*, defaults to 1):
            The number of languages the model handles. Set to 1 for monolingual models.
        use_lang_emb (`bool`, *optional*, defaults to `True`):
            Whether to use language embeddings. Some models use additional language embeddings, see [the multilingual
            models page](http://hf-mirror.com/transformers/multilingual.html#xlm-language-embeddings) for information
            on how to use them.
        max_position_embeddings (`int`, *optional*, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        embed_init_std (`float`, *optional*, defaults to 2048^-0.5):
            The standard deviation of the truncated_normal_initializer for initializing the embedding matrices.
        init_std (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices except the
            embedding matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        bos_index (`int`, *optional*, defaults to 0):
            The index of the beginning of sentence token in the vocabulary.
        eos_index (`int`, *optional*, defaults to 1):
            The index of the end of sentence token in the vocabulary.
        pad_index (`int`, *optional*, defaults to 2):
            The index of the padding token in the vocabulary.
        unk_index (`int`, *optional*, defaults to 3):
            The index of the unknown token in the vocabulary.
        mask_index (`int`, *optional*, defaults to 5):
            The index of the masking token in the vocabulary.
        is_encoder(`bool`, *optional*, defaults to `True`):
            Whether or not the initialized model should be a transformer encoder or decoder as seen in Vaswani et al.
        summary_type (`string`, *optional*, defaults to "first"):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
            Has to be one of the following options:

            - `"last"`: Take the last token hidden state (like XLNet).
            - `"first"`: Take the first token hidden state (like BERT).
            - `"mean"`: Take the mean of all tokens hidden states.
            - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
            - `"attn"`: Not implemented now, use multi-head attention.
        summary_use_proj (`bool`, *optional*, defaults to `True`):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
            Whether or not to add a projection after the vector extraction.
        summary_activation (`str`, *optional*):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
            Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
        summary_proj_to_labels (`bool`, *optional*, defaults to `True`):
            Used in the sequence classification and multiple choice models.
            Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
        summary_first_dropout (`float`, *optional*, defaults to 0.1):
            Used in the sequence classification and multiple choice models.
            The dropout ratio to be used after the projection and activation.
        start_n_top (`int`, *optional*, defaults to 5):
            Used in the SQuAD evaluation script.
        end_n_top (`int`, *optional*, defaults to 5):
            Used in the SQuAD evaluation script.
        mask_token_id (`int`, *optional*, defaults to 0):
            Model agnostic parameter to identify masked tokens when generating text in an MLM context.
        lang_id (`int`, *optional*, defaults to 0):
            The ID of the language used by the model. This parameter is used when generating text in a given language.

    Example:
        ```python
        >>> from transformers import XLMConfig, XLMModel
        ...
        >>> # Initializing a XLM configuration
        >>> configuration = XLMConfig()
        ...
        >>> # Initializing a model (with random weights) from the configuration
        >>> model = XLMModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "xlm"
    attribute_map = {
        "hidden_size": "emb_dim",
        "num_attention_heads": "n_heads",
        "num_hidden_layers": "n_layers",
        "n_words": "vocab_size",  # For backward compatibility
    }

    def __init__(
        self,
        vocab_size=30145,
        emb_dim=2048,
        n_layers=12,
        n_heads=16,
        dropout=0.1,
        attention_dropout=0.1,
        gelu_activation=True,
        sinusoidal_embeddings=False,
        causal=False,
        asm=False,
        n_langs=1,
        use_lang_emb=True,
        max_position_embeddings=512,
        embed_init_std=2048**-0.5,
        layer_norm_eps=1e-12,
        init_std=0.02,
        bos_index=0,
        eos_index=1,
        pad_index=2,
        unk_index=3,
        mask_index=5,
        is_encoder=True,
        summary_type="first",
        summary_use_proj=True,
        summary_activation=None,
        summary_proj_to_labels=True,
        summary_first_dropout=0.1,
        start_n_top=5,
        end_n_top=5,
        mask_token_id=0,
        lang_id=0,
        pad_token_id=2,
        bos_token_id=0,
        **kwargs,
    ):
        """Constructs XLMConfig."""
        self.vocab_size = vocab_size
        self.emb_dim = emb_dim
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.dropout = dropout
        self.attention_dropout = attention_dropout
        self.gelu_activation = gelu_activation
        self.sinusoidal_embeddings = sinusoidal_embeddings
        self.causal = causal
        self.asm = asm
        self.n_langs = n_langs
        self.use_lang_emb = use_lang_emb
        self.layer_norm_eps = layer_norm_eps
        self.bos_index = bos_index
        self.eos_index = eos_index
        self.pad_index = pad_index
        self.unk_index = unk_index
        self.mask_index = mask_index
        self.is_encoder = is_encoder
        self.max_position_embeddings = max_position_embeddings
        self.embed_init_std = embed_init_std
        self.init_std = init_std
        self.summary_type = summary_type
        self.summary_use_proj = summary_use_proj
        self.summary_activation = summary_activation
        self.summary_proj_to_labels = summary_proj_to_labels
        self.summary_first_dropout = summary_first_dropout
        self.start_n_top = start_n_top
        self.end_n_top = end_n_top
        self.mask_token_id = mask_token_id
        self.lang_id = lang_id

        if "n_words" in kwargs:
            self.n_words = kwargs["n_words"]

        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)
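
Note the attribute_map defined at the top of the class: it aliases the generic configuration names onto the XLM-specific attributes. A brief illustrative check, assuming the standard PretrainedConfig aliasing behaviour:

```python
>>> from transformers import XLMConfig
...
>>> config = XLMConfig()
>>> # The generic names resolve to the XLM-specific attributes via attribute_map
>>> config.hidden_size == config.emb_dim
True
>>> config.num_hidden_layers == config.n_layers
True
>>> config.num_attention_heads == config.n_heads
True
```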

mindnlp.transformers.models.xlm.configuration_xlm.XLMConfig.__init__(vocab_size=30145, emb_dim=2048, n_layers=12, n_heads=16, dropout=0.1, attention_dropout=0.1, gelu_activation=True, sinusoidal_embeddings=False, causal=False, asm=False, n_langs=1, use_lang_emb=True, max_position_embeddings=512, embed_init_std=2048 ** -0.5, layer_norm_eps=1e-12, init_std=0.02, bos_index=0, eos_index=1, pad_index=2, unk_index=3, mask_index=5, is_encoder=True, summary_type='first', summary_use_proj=True, summary_activation=None, summary_proj_to_labels=True, summary_first_dropout=0.1, start_n_top=5, end_n_top=5, mask_token_id=0, lang_id=0, pad_token_id=2, bos_token_id=0, **kwargs)

Constructs XLMConfig.

Source code in mindnlp/transformers/models/xlm/configuration_xlm.py
def __init__(
    self,
    vocab_size=30145,
    emb_dim=2048,
    n_layers=12,
    n_heads=16,
    dropout=0.1,
    attention_dropout=0.1,
    gelu_activation=True,
    sinusoidal_embeddings=False,
    causal=False,
    asm=False,
    n_langs=1,
    use_lang_emb=True,
    max_position_embeddings=512,
    embed_init_std=2048**-0.5,
    layer_norm_eps=1e-12,
    init_std=0.02,
    bos_index=0,
    eos_index=1,
    pad_index=2,
    unk_index=3,
    mask_index=5,
    is_encoder=True,
    summary_type="first",
    summary_use_proj=True,
    summary_activation=None,
    summary_proj_to_labels=True,
    summary_first_dropout=0.1,
    start_n_top=5,
    end_n_top=5,
    mask_token_id=0,
    lang_id=0,
    pad_token_id=2,
    bos_token_id=0,
    **kwargs,
):
    """Constructs XLMConfig."""
    self.vocab_size = vocab_size
    self.emb_dim = emb_dim
    self.n_layers = n_layers
    self.n_heads = n_heads
    self.dropout = dropout
    self.attention_dropout = attention_dropout
    self.gelu_activation = gelu_activation
    self.sinusoidal_embeddings = sinusoidal_embeddings
    self.causal = causal
    self.asm = asm
    self.n_langs = n_langs
    self.use_lang_emb = use_lang_emb
    self.layer_norm_eps = layer_norm_eps
    self.bos_index = bos_index
    self.eos_index = eos_index
    self.pad_index = pad_index
    self.unk_index = unk_index
    self.mask_index = mask_index
    self.is_encoder = is_encoder
    self.max_position_embeddings = max_position_embeddings
    self.embed_init_std = embed_init_std
    self.init_std = init_std
    self.summary_type = summary_type
    self.summary_use_proj = summary_use_proj
    self.summary_activation = summary_activation
    self.summary_proj_to_labels = summary_proj_to_labels
    self.summary_first_dropout = summary_first_dropout
    self.start_n_top = start_n_top
    self.end_n_top = end_n_top
    self.mask_token_id = mask_token_id
    self.lang_id = lang_id

    if "n_words" in kwargs:
        self.n_words = kwargs["n_words"]

    super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)

mindnlp.transformers.models.xlm.tokenization_xlm

Tokenization classes for XLM.

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer

Bases: PreTrainedTokenizer

Construct an XLM tokenizer. Based on Byte-Pair Encoding. The tokenization process is the following:

  • Moses preprocessing and tokenization for most supported languages.
  • Language specific tokenization for Chinese (Jieba), Japanese (KyTea) and Thai (PyThaiNLP).
  • Optionally lowercases and normalizes all input text.
  • The arguments special_tokens and the function set_special_tokens can be used to add additional symbols (like "__classify__") to the vocabulary.
  • The lang2id attribute maps the languages supported by the model to their IDs if provided (automatically set for pretrained vocabularies).
  • The id2lang attribute does the reverse mapping if provided (automatically set for pretrained vocabularies).

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

Vocabulary file.

TYPE: `str`

merges_file

Merges file.

TYPE: `str`

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

bos_token

The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token.

TYPE: `str`, *optional*, defaults to `"<s>"` DEFAULT: '<s>'

sep_token

The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

cls_token

The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

mask_token

The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.

TYPE: `str`, *optional*, defaults to `"<special1>"` DEFAULT: '<special1>'

lang2id

Dictionary mapping languages string identifiers to their IDs.

TYPE: `Dict[str, int]`, *optional* DEFAULT: None

id2lang

Dictionary mapping language IDs to their string identifiers.

TYPE: `Dict[int, str]`, *optional* DEFAULT: None

do_lowercase_and_remove_accent

Whether to lowercase and remove accents when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True
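
Before the full source below, a minimal usage sketch (not part of the original docstring). It assumes a pretrained XLM vocabulary is reachable under the hub name xlm-mlm-en-2048; any compatible vocab/merges pair works, and under mindnlp the import path is mindnlp.transformers.

```python
>>> from transformers import XLMTokenizer
...
>>> tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
...
>>> tokens = tokenizer.tokenize("Hello world!")                 # Moses preprocessing + BPE
>>> ids = tokenizer.convert_tokens_to_ids(tokens)
>>> inputs = tokenizer.build_inputs_with_special_tokens(ids)    # adds <s> ... </s>
```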

Source code in mindnlp/transformers/models/xlm/tokenization_xlm.py
class XLMTokenizer(PreTrainedTokenizer):
    """
    Construct an XLM tokenizer. Based on Byte-Pair Encoding. The tokenization process is the following:

    - Moses preprocessing and tokenization for most supported languages.
    - Language specific tokenization for Chinese (Jieba), Japanese (KyTea) and Thai (PyThaiNLP).
    - Optionally lowercases and normalizes all input text.
    - The arguments `special_tokens` and the function `set_special_tokens` can be used to add additional symbols (like
    "__classify__") to the vocabulary.
    - The `lang2id` attribute maps the languages supported by the model to their IDs if provided (automatically set
    for pretrained vocabularies).
    - The `id2lang` attribute does the reverse mapping if provided (automatically set for pretrained vocabularies).

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            Vocabulary file.
        merges_file (`str`):
            Merges file.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

            <Tip>

            When building a sequence using special tokens, this is not the token that is used for the beginning of
            sequence. The token used is the `cls_token`.

            </Tip>

        sep_token (`str`, *optional*, defaults to `"</s>"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (`str`, *optional*, defaults to `"</s>"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (`str`, *optional*, defaults to `"<special1>"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        additional_special_tokens (`List[str]`, *optional*, defaults to `['<special0>', '<special1>', '<special2>',
            '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>']`):
            List of additional special tokens.
        lang2id (`Dict[str, int]`, *optional*):
            Dictionary mapping languages string identifiers to their IDs.
        id2lang (`Dict[int, str]`, *optional*):
            Dictionary mapping language IDs to their string identifiers.
        do_lowercase_and_remove_accent (`bool`, *optional*, defaults to `True`):
            Whether to lowercase and remove accents when tokenizing.
    """
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

    def __init__(
        self,
        vocab_file,
        merges_file,
        unk_token="<unk>",
        bos_token="<s>",
        sep_token="</s>",
        pad_token="<pad>",
        cls_token="</s>",
        mask_token="<special1>",
        additional_special_tokens=[
            "<special0>",
            "<special1>",
            "<special2>",
            "<special3>",
            "<special4>",
            "<special5>",
            "<special6>",
            "<special7>",
            "<special8>",
            "<special9>",
        ],
        lang2id=None,
        id2lang=None,
        do_lowercase_and_remove_accent=True,
        **kwargs,
    ):
        '''
        Initializes an instance of XLMTokenizer.

        Args:
            self: The instance of the class.
            vocab_file (str): The file path to the vocabulary file.
            merges_file (str): The file path to the merges file.
            unk_token (str): The unknown token (default: '<unk>').
            bos_token (str): The beginning of sentence token (default: '<s>').
            sep_token (str): The separator token (default: '</s>').
            pad_token (str): The padding token (default: '<pad>').
            cls_token (str): The classification token (default: '</s>').
            mask_token (str): The masking token (default: '<special1>').
            additional_special_tokens (list): List of additional special tokens (default: ['<special0>', '<special1>',
                '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>',
                '<special9>']).
            lang2id (dict): A dictionary mapping languages to IDs.
            id2lang (dict): A dictionary mapping IDs to languages.
            do_lowercase_and_remove_accent (bool): A flag indicating whether to lowercase and remove accents
                (default: True).
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            ImportError: If the sacremoses library is not installed.

        '''
        try:
            import sacremoses
        except ImportError as exc:
            raise ImportError(
                "You need to install sacremoses to use XLMTokenizer. "
                "See https://pypi.org/project/sacremoses/ for installation."
            ) from exc

        self.sm = sacremoses

        # cache of sm.MosesPunctNormalizer instance
        self.cache_moses_punct_normalizer = {}
        # cache of sm.MosesTokenizer instance
        self.cache_moses_tokenizer = {}
        self.lang_with_custom_tokenizer = {"zh", "th", "ja"}
        # True for current supported model (v1.2.0), False for XLM-17 & 100
        self.do_lowercase_and_remove_accent = do_lowercase_and_remove_accent
        self.lang2id = lang2id
        self.id2lang = id2lang
        if lang2id is not None and id2lang is not None:
            assert len(lang2id) == len(id2lang)

        self.ja_word_tokenizer = None
        self.zh_word_tokenizer = None

        with open(vocab_file, encoding="utf-8") as vocab_handle:
            self.encoder = json.load(vocab_handle)
        self.decoder = {v: k for k, v in self.encoder.items()}
        with open(merges_file, encoding="utf-8") as merges_handle:
            merges = merges_handle.read().split("\n")[:-1]
        merges = [tuple(merge.split()[:2]) for merge in merges]
        self.bpe_ranks = dict(zip(merges, range(len(merges))))
        self.cache = {}
        super().__init__(
            unk_token=unk_token,
            bos_token=bos_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            additional_special_tokens=additional_special_tokens,
            lang2id=lang2id,
            id2lang=id2lang,
            do_lowercase_and_remove_accent=do_lowercase_and_remove_accent,
            **kwargs,
        )

    @property
    def do_lower_case(self):
        """
        Whether the tokenizer lowercases and strips accents, mirroring `do_lowercase_and_remove_accent`.

        Args:
            self: The instance of the 'XLMTokenizer' class.

        Returns:
            bool: The value of `do_lowercase_and_remove_accent`.

        Raises:
            None.
        """
        return self.do_lowercase_and_remove_accent

    def moses_punct_norm(self, text, lang):
        """
        The 'moses_punct_norm' method is a member of the 'XLMTokenizer' class. It normalizes punctuation in a given text
        based on the specified language using the MosesPunctNormalizer.

        Args:
            self (XLMTokenizer): An instance of the XLMTokenizer class.
            text (str): The input text to be normalized.
            lang (str): The language of the input text. The normalization is performed based on the rules specific to
                this language.

        Returns:
            str: The input text with punctuation normalized according to the rules for the given language.

        Raises:
            None.
        """
        if lang not in self.cache_moses_punct_normalizer:
            punct_normalizer = self.sm.MosesPunctNormalizer(lang=lang)
            self.cache_moses_punct_normalizer[lang] = punct_normalizer
        else:
            punct_normalizer = self.cache_moses_punct_normalizer[lang]
        return punct_normalizer.normalize(text)

    def moses_tokenize(self, text, lang):
        """
        Performs tokenization using the MosesTokenizer from the sacremoses library.

        Args:
            self: An instance of the XLMTokenizer class.
            text (str): The input text to be tokenized.
            lang (str): The language of the text.

        Returns:
            list: The list of tokens produced by the Moses tokenizer.

        Raises:
            None.

        Description:
            This method tokenizes the input text using the MosesTokenizer from the sacremoses library.
            It is specifically designed for the XLMTokenizer class. The tokenization process splits the text into
            individual tokens based on language-specific rules and returns the tokenized output.

            - The 'self' parameter is used to access the instance variables and methods of the XLMTokenizer class.
            - The 'text' parameter represents the text that needs to be tokenized.
            - The 'lang' parameter specifies the language of the text, which is used to determine the appropriate
            tokenizer.

            If the MosesTokenizer for the specified language is not already cached, it is instantiated and stored in
            the cache_moses_tokenizer dictionary of the XLMTokenizer instance. Subsequent invocations of the method with
            the same language will reuse the cached tokenizer. This caching mechanism optimizes performance by avoiding
            repeated instantiation of tokenizers.

            The method calls the 'tokenize' function of the MosesTokenizer object to perform the actual tokenization.
            The 'return_str' parameter is set to False, indicating that the method should return a list of tokens rather
            than a single string. The 'escape' parameter is set to False, indicating that no escaping of special
            characters should be performed during tokenization.

            Note:
                The MosesTokenizer relies on pre-trained language-specific models for accurate tokenization.
                Make sure to have these models available for the desired languages.

        Example:
            ```python
            >>> tokenizer = XLMTokenizer()
            >>> tokenized_text = tokenizer.moses_tokenize("Hello world!", "en")
            ```

            The above example tokenizes the input text "Hello world!" using the MosesTokenizer for English language.
            The resulting tokenized text is stored in the 'tokenized_text' variable.

            Note:
                The actual tokenization behavior may vary based on the language and the specific language models used
                by the MosesTokenizer.
        """
        if lang not in self.cache_moses_tokenizer:
            moses_tokenizer = self.sm.MosesTokenizer(lang=lang)
            self.cache_moses_tokenizer[lang] = moses_tokenizer
        else:
            moses_tokenizer = self.cache_moses_tokenizer[lang]
        return moses_tokenizer.tokenize(text, return_str=False, escape=False)

    def moses_pipeline(self, text, lang):
        """
        Applies the Moses pipeline to preprocess text.

        Args:
            self (XLMTokenizer): An instance of the XLMTokenizer class.
            text (str): The input text to be processed.
            lang (str): The language of the input text.

        Returns:
            str: The text after Unicode punctuation replacement, Moses punctuation normalization, and removal of non-printing characters.

        Raises:
            None.
        """
        text = replace_unicode_punct(text)
        text = self.moses_punct_norm(text, lang)
        text = remove_non_printing_char(text)
        return text

    def ja_tokenize(self, text):
        """
        Method to tokenize Japanese text using KyTea library.

        Args:
            self (object): Instance of the XLMTokenizer class.
            text (str): The Japanese text to be tokenized.

        Returns:
            list: The list of tokens produced by the KyTea word segmenter (`ja_word_tokenizer.getWS`).

        Raises:
            AttributeError: If an attribute error occurs during the execution of the method.
            ImportError: If an import error occurs, typically when the required KyTea library or its Python wrapper
                is not installed.
            Exception: Any other unexpected exception raised during the execution of the method.
        """
        if self.ja_word_tokenizer is None:
            try:
                import Mykytea

                self.ja_word_tokenizer = Mykytea.Mykytea(
                    f"-model {os.path.expanduser('~')}/local/share/kytea/model.bin"
                )
            except (AttributeError, ImportError):
                logger.error(
                    "Make sure you install KyTea (https://github.com/neubig/kytea) and it's python wrapper"
                    " (https://github.com/chezou/Mykytea-python) with the following steps"
                )
                logger.error("1. git clone git@github.com:neubig/kytea.git && cd kytea")
                logger.error("2. autoreconf -i")
                logger.error("3. ./configure --prefix=$HOME/local")
                logger.error("4. make && make install")
                logger.error("5. pip install kytea")
                raise
        return list(self.ja_word_tokenizer.getWS(text))

    @property
    def vocab_size(self):
        """
        Returns the size of the vocabulary used by the XLMTokenizer.

        Args:
            self: An instance of the XLMTokenizer class.

        Returns:
            int: The number of unique tokens in the tokenizer's encoder.

        Raises:
            None.

        Note:
            This method calculates the size of the vocabulary by obtaining the length of the tokenizer's encoder.
            The encoder is responsible for encoding and decoding the tokens used by the tokenizer.

        Example:
            ```python
            >>> tokenizer = XLMTokenizer()
            >>> tokenizer.vocab_size
            50000
            ```
        """
        return len(self.encoder)

    def get_vocab(self):
        """Return the vocabulary of the XLMTokenizer.

        Args:
            self (XLMTokenizer): An instance of the XLMTokenizer class.

        Returns:
            dict: A dictionary representing the vocabulary of the tokenizer. The keys are the tokens and the values
                are their corresponding IDs.

        Raises:
            None.

        Example:
            ```python
            >>> tokenizer = XLMTokenizer()
            >>> tokenizer.get_vocab()
            {'<s>': 0, '<pad>': 1, '</s>': 2, '<unk>': 3, '<mask>': 4, 'hello': 5, 'world': 6}
            ```

        Note:
            This method combines the encoder and added_tokens_encoder dictionaries to form the complete vocabulary.
        """
        return dict(self.encoder, **self.added_tokens_encoder)

    def bpe(self, token):
        """
        This method is part of the XLMTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

        Args:
            self: The instance of the XLMTokenizer class.
            token (str): The input token to be processed through BPE. It should be a string representing a word.

        Returns:
            str: The processed token after applying Byte Pair Encoding. The token may have undergone splitting or
                merging based on the rules of BPE.

        Raises:
            ValueError: If an error occurs during the processing of the token, such as an issue with indexing or
                comparison.
            KeyError: If the method encounters a key error while accessing data structures like dictionaries.
        """
        word = tuple(token[:-1]) + (token[-1] + "</w>",)
        if token in self.cache:
            return self.cache[token]
        pairs = get_pairs(word)

        if not pairs:
            return token + "</w>"

        while True:
            bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
            if bigram not in self.bpe_ranks:
                break
            first, second = bigram
            new_word = []
            i = 0
            while i < len(word):
                try:
                    j = word.index(first, i)
                except ValueError:
                    new_word.extend(word[i:])
                    break
                else:
                    new_word.extend(word[i:j])
                    i = j

                if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                    new_word.append(first + second)
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            new_word = tuple(new_word)
            word = new_word
            if len(word) == 1:
                break
            pairs = get_pairs(word)
        word = " ".join(word)
        if word == "\n  </w>":
            word = "\n</w>"
        self.cache[token] = word
        return word

    def _tokenize(self, text, lang="en", bypass_tokenizer=False):
        """
        Tokenize a string given language code. For Chinese, Japanese and Thai, we use a language specific tokenizer.
        Otherwise, we use Moses.

        Details of tokenization:

        - [sacremoses](https://github.com/alvations/sacremoses): port of Moses
        - Install with `pip install sacremoses`
        - [pythainlp](https://github.com/PyThaiNLP/pythainlp): Thai tokenizer
        - Install with `pip install pythainlp`
        - [kytea](https://github.com/chezou/Mykytea-python): Japanese tokenizer, wrapper of
        [KyTea](https://github.com/neubig/kytea)
        - Install with the following steps:
        ```
        git clone git@github.com:neubig/kytea.git && cd kytea
        autoreconf -i
        ./configure --prefix=$HOME/local
        make && make install
        pip install kytea
        ```

        - [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer (*)
        - Install with `pip install jieba`

        (*) The original XLM used [Stanford
        Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip). However, the wrapper
        (`nltk.tokenize.stanford_segmenter`) is slow due to JVM overhead, and it will be deprecated. Jieba is a lot
        faster and pip-installable. Note there is some mismatch with the Stanford Segmenter. It should be fine if you
        fine-tune the model with Chinese supervision. If you want the exact same behaviour, use the original XLM
        [preprocessing script](https://github.com/facebookresearch/XLM/tree/master/tools) to tokenize the sentence
        externally, and set `bypass_tokenizer=True` to bypass the tokenizer.

        Args:
            lang (string): ISO language code (default = 'en'). The language should be one of the languages supported
                by the model. However, we don't enforce it.
            bypass_tokenizer (bool): Allow users to preprocess and tokenize the sentences externally (default = False).
                If True, we only apply BPE.

        Returns:
            List of tokens.
        """
        if lang and self.lang2id and lang not in self.lang2id:
            logger.error(
                "Supplied language code not found in lang2id mapping. Please check that your language is supported by"
                " the loaded pretrained model."
            )
        if bypass_tokenizer:
            text = text.split()
        elif lang not in self.lang_with_custom_tokenizer:
            text = self.moses_pipeline(text, lang=lang)
            # TODO: make sure we are using `xlm-mlm-enro-1024`, since XLM-100 doesn't have this step
            if lang == "ro":
                text = romanian_preprocessing(text)
            text = self.moses_tokenize(text, lang=lang)
        elif lang == "th":
            text = self.moses_pipeline(text, lang=lang)
            try:
                if "pythainlp" not in sys.modules:
                    from pythainlp.tokenize import word_tokenize as th_word_tokenize
                else:
                    th_word_tokenize = sys.modules["pythainlp"].word_tokenize
            except (AttributeError, ImportError):
                logger.error(
                    "Make sure you install PyThaiNLP (https://github.com/PyThaiNLP/pythainlp) with the following steps"
                )
                logger.error("1. pip install pythainlp")
                raise
            text = th_word_tokenize(text)
        elif lang == "zh":
            try:
                if "jieba" not in sys.modules:
                    import jieba
                else:
                    jieba = sys.modules["jieba"]
            except (AttributeError, ImportError):
                logger.error("Make sure you install Jieba (https://github.com/fxsjy/jieba) with the following steps")
                logger.error("1. pip install jieba")
                raise
            text = " ".join(jieba.cut(text))
            text = self.moses_pipeline(text, lang=lang)
            text = text.split()
        elif lang == "ja":
            text = self.moses_pipeline(text, lang=lang)
            text = self.ja_tokenize(text)
        else:
            raise ValueError("It should not reach here")

        if self.do_lowercase_and_remove_accent and not bypass_tokenizer:
            text = lowercase_and_remove_accent(text)

        split_tokens = []
        for token in text:
            if token:
                split_tokens.extend(list(self.bpe(token).split(" ")))

        return split_tokens

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.encoder.get(token, self.encoder.get(self.unk_token))

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.decoder.get(index, self.unk_token)

    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        out_string = "".join(tokens).replace("</w>", " ").strip()
        return out_string

    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
        adding special tokens. An XLM sequence has the following format:

        - single sequence: `<s> X </s>`
        - pair of sequences: `<s> A </s> B </s>`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.

        """
        bos = [self.bos_token_id]
        sep = [self.sep_token_id]

        if token_ids_1 is None:
            return bos + token_ids_0 + sep
        return bos + token_ids_0 + sep + token_ids_1 + sep

    def get_special_tokens_mask(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """
        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. An XLM sequence
        pair mask has the following format:

        ```
        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |
        ```

        If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls + token_ids_0 + sep) * [0]
        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Save the vocabulary to the specified directory.

        Args:
            self: The instance of the XLMTokenizer class.
            save_directory (str): The directory path where the vocabulary files will be saved.
            filename_prefix (Optional[str]): An optional prefix to be added to the filename.
                Defaults to None.

        Returns:
            Tuple[str]: A tuple containing the paths of the saved vocabulary and merge files.

        Raises:
            OSError: If the save_directory does not exist.
            IOError: If an error occurs while writing the vocabulary or merge files.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
            return
        vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
        )
        merge_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["merges_file"]
        )

        with open(vocab_file, "w", encoding="utf-8") as f:
            f.write(json.dumps(self.encoder, indent=2, sort_keys=True, ensure_ascii=False) + "\n")

        index = 0
        with open(merge_file, "w", encoding="utf-8") as writer:
            for bpe_tokens, token_index in sorted(self.bpe_ranks.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {merge_file}: BPE merge indices are not consecutive."
                        " Please check that the tokenizer is not corrupted!"
                    )
                    index = token_index
                writer.write(" ".join(bpe_tokens) + "\n")
                index += 1

        return vocab_file, merge_file

    def __getstate__(self):
        """
        Method '__getstate__' in the class 'XLMTokenizer'.

        Args:
            self: XLMTokenizer object.
                Represents the instance of the XLMTokenizer class.
                No restrictions.

        Returns:
            dict:
                This method returns a dictionary representing the current state of the XLMTokenizer object with the
                'sm' attribute set to None.

        Raises:
            None.
        """
        state = self.__dict__.copy()
        state["sm"] = None
        return state

    def __setstate__(self, d):
        """
        Sets the state of the XLMTokenizer object.

        Args:
            self (XLMTokenizer): The XLMTokenizer object.
            d (dict): The dictionary containing the state to be set.
                The dictionary should have the following keys:

                - '__dict__': The dictionary representing the attributes of the object.

        Returns:
            None.

        Raises:
            ImportError: If the 'sacremoses' module is not installed, an ImportError is raised. The error message
                will provide instructions on how to install the module.
        """
        self.__dict__ = d

        try:
            import sacremoses
        except ImportError as exc:
            raise ImportError(
                "You need to install sacremoses to use XLMTokenizer. "
                "See https://pypi.org/project/sacremoses/ for installation."
            ) from exc

        self.sm = sacremoses
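
To make the special-token layout used by build_inputs_with_special_tokens, get_special_tokens_mask and create_token_type_ids_from_sequences concrete, here is a hedged sketch that assumes tokenizer was loaded as in the earlier example:

```python
>>> a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("how are you"))
>>> b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("fine thanks"))
...
>>> pair = tokenizer.build_inputs_with_special_tokens(a, b)           # <s> A </s> B </s>
>>> type_ids = tokenizer.create_token_type_ids_from_sequences(a, b)   # 0s over <s> A </s>, 1s over B </s>
>>> special = tokenizer.get_special_tokens_mask(a, b)                 # 1 at <s> and at each </s>, 0 elsewhere
```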

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.do_lower_case property

Whether the tokenizer lowercases and strips accents, mirroring the do_lowercase_and_remove_accent setting.

PARAMETER DESCRIPTION
self

The instance of the 'XLMTokenizer' class.

RETURNS DESCRIPTION
bool

The value of do_lowercase_and_remove_accent.

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.vocab_size property

Returns the size of the vocabulary used by the XLMTokenizer.

PARAMETER DESCRIPTION
self

An instance of the XLMTokenizer class.

RETURNS DESCRIPTION
int

The number of unique tokens in the tokenizer's encoder.

Note

This method calculates the size of the vocabulary by obtaining the length of the tokenizer's encoder. The encoder is responsible for encoding and decoding the tokens used by the tokenizer.

Example
>>> tokenizer = XLMTokenizer()
>>> tokenizer.vocab_size
50000

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.__getstate__()

Method '__getstate__' in the class 'XLMTokenizer'.

PARAMETER DESCRIPTION
self

XLMTokenizer object. Represents the instance of the XLMTokenizer class. No restrictions.

RETURNS DESCRIPTION
dict

This method returns a dictionary representing the current state of the XLMTokenizer object with the 'sm' attribute set to None.

Source code in mindnlp/transformers/models/xlm/tokenization_xlm.py
def __getstate__(self):
    """
    Method '__getstate__' in the class 'XLMTokenizer'.

    Args:
        self: XLMTokenizer object.
            Represents the instance of the XLMTokenizer class.
            No restrictions.

    Returns:
        dict:
            This method returns a dictionary representing the current state of the XLMTokenizer object with the
            'sm' attribute set to None.

    Raises:
        None.
    """
    state = self.__dict__.copy()
    state["sm"] = None
    return state

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.__init__(vocab_file, merges_file, unk_token='<unk>', bos_token='<s>', sep_token='</s>', pad_token='<pad>', cls_token='</s>', mask_token='<special1>', additional_special_tokens=['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>'], lang2id=None, id2lang=None, do_lowercase_and_remove_accent=True, **kwargs)

Initializes an instance of XLMTokenizer.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_file

The file path to the vocabulary file.

TYPE: str

merges_file

The file path to the merges file.

TYPE: str

unk_token

The unknown token (default: '<unk>').

TYPE: str DEFAULT: '<unk>'

bos_token

The beginning of sentence token (default: '<s>').

TYPE: str DEFAULT: '<s>'

sep_token

The separator token (default: '</s>').

TYPE: str DEFAULT: '</s>'

pad_token

The padding token (default: '<pad>').

TYPE: str DEFAULT: '<pad>'

cls_token

The classification token (default: '</s>').

TYPE: str DEFAULT: '</s>'

mask_token

The masking token (default: '<special1>').

TYPE: str DEFAULT: '<special1>'

additional_special_tokens

List of additional special tokens (default: ['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>']).

TYPE: list DEFAULT: ['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>']

lang2id

A dictionary mapping languages to IDs.

TYPE: dict DEFAULT: None

id2lang

A dictionary mapping IDs to languages.

TYPE: dict DEFAULT: None

do_lowercase_and_remove_accent

A flag indicating whether to lowercase and remove accents (default: True).

TYPE: bool DEFAULT: True

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ImportError

If the sacremoses library is not installed.
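
As an illustration (not from the original docs), the tokenizer can also be constructed directly from local files in the XLM format; the paths below are placeholders:

```python
>>> from transformers import XLMTokenizer
...
>>> tokenizer = XLMTokenizer(
...     vocab_file="path/to/vocab.json",    # JSON mapping from token to id
...     merges_file="path/to/merges.txt",   # one BPE merge per line
...     do_lowercase_and_remove_accent=True,
... )
```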

Source code in mindnlp/transformers/models/xlm/tokenization_xlm.py
def __init__(
    self,
    vocab_file,
    merges_file,
    unk_token="<unk>",
    bos_token="<s>",
    sep_token="</s>",
    pad_token="<pad>",
    cls_token="</s>",
    mask_token="<special1>",
    additional_special_tokens=[
        "<special0>",
        "<special1>",
        "<special2>",
        "<special3>",
        "<special4>",
        "<special5>",
        "<special6>",
        "<special7>",
        "<special8>",
        "<special9>",
    ],
    lang2id=None,
    id2lang=None,
    do_lowercase_and_remove_accent=True,
    **kwargs,
):
    '''
    Initializes an instance of XLMTokenizer.

    Args:
        self: The instance of the class.
        vocab_file (str): The file path to the vocabulary file.
        merges_file (str): The file path to the merges file.
        unk_token (str): The unknown token (default: '<unk>').
        bos_token (str): The beginning of sentence token (default: '<s>').
        sep_token (str): The separator token (default: '</s>').
        pad_token (str): The padding token (default: '<pad>').
        cls_token (str): The classification token (default: '</s>').
        mask_token (str): The masking token (default: '<special1>').
        additional_special_tokens (list): List of additional special tokens (default: ['<special0>', '<special1>',
            '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>',
            '<special9>']).
        lang2id (dict): A dictionary mapping languages to IDs.
        id2lang (dict): A dictionary mapping IDs to languages.
        do_lowercase_and_remove_accent (bool): A flag indicating whether to lowercase and remove accents
            (default: True).
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        ImportError: If the sacremoses library is not installed.

    '''
    try:
        import sacremoses
    except ImportError as exc:
        raise ImportError(
            "You need to install sacremoses to use XLMTokenizer. "
            "See https://pypi.org/project/sacremoses/ for installation."
        ) from exc

    self.sm = sacremoses

    # cache of sm.MosesPunctNormalizer instance
    self.cache_moses_punct_normalizer = {}
    # cache of sm.MosesTokenizer instance
    self.cache_moses_tokenizer = {}
    self.lang_with_custom_tokenizer = {"zh", "th", "ja"}
    # True for current supported model (v1.2.0), False for XLM-17 & 100
    self.do_lowercase_and_remove_accent = do_lowercase_and_remove_accent
    self.lang2id = lang2id
    self.id2lang = id2lang
    if lang2id is not None and id2lang is not None:
        assert len(lang2id) == len(id2lang)

    self.ja_word_tokenizer = None
    self.zh_word_tokenizer = None

    with open(vocab_file, encoding="utf-8") as vocab_handle:
        self.encoder = json.load(vocab_handle)
    self.decoder = {v: k for k, v in self.encoder.items()}
    with open(merges_file, encoding="utf-8") as merges_handle:
        merges = merges_handle.read().split("\n")[:-1]
    merges = [tuple(merge.split()[:2]) for merge in merges]
    self.bpe_ranks = dict(zip(merges, range(len(merges))))
    self.cache = {}
    super().__init__(
        unk_token=unk_token,
        bos_token=bos_token,
        sep_token=sep_token,
        pad_token=pad_token,
        cls_token=cls_token,
        mask_token=mask_token,
        additional_special_tokens=additional_special_tokens,
        lang2id=lang2id,
        id2lang=id2lang,
        do_lowercase_and_remove_accent=do_lowercase_and_remove_accent,
        **kwargs,
    )
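
A minimal usage sketch for the constructor above, assuming local vocabulary and merges files at hypothetical paths (in practice these files come from a pretrained XLM checkpoint):

from mindnlp.transformers.models.xlm.tokenization_xlm import XLMTokenizer

# Hypothetical local paths; any XLM-style vocab.json / merges.txt pair works here.
tokenizer = XLMTokenizer(
    vocab_file="vocab.json",
    merges_file="merges.txt",
    do_lowercase_and_remove_accent=True,
)
print(len(tokenizer.encoder))  # size of the vocabulary loaded from vocab_file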

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.__setstate__(d)

Sets the state of the XLMTokenizer object.

PARAMETER DESCRIPTION
self

The XLMTokenizer object.

TYPE: XLMTokenizer

d

The dictionary containing the state to be set. The dictionary should have the following keys:

  • '__dict__': The dictionary representing the attributes of the object.

TYPE: dict

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ImportError

If the 'sacremoses' module is not installed, an ImportError is raised. The error message will provide instructions on how to install the module.

Source code in mindnlp/transformers/models/xlm/tokenization_xlm.py
def __setstate__(self, d):
    """
    Sets the state of the XLMTokenizer object.

    Args:
        self (XLMTokenizer): The XLMTokenizer object.
        d (dict): The dictionary containing the state to be set.
            The dictionary should have the following keys:

            - '__dict__': The dictionary representing the attributes of the object.

    Returns:
        None.

    Raises:
        ImportError: If the 'sacremoses' module is not installed, an ImportError is raised. The error message
            will provide instructions on how to install the module.
    """
    self.__dict__ = d

    try:
        import sacremoses
    except ImportError as exc:
        raise ImportError(
            "You need to install sacremoses to use XLMTokenizer. "
            "See https://pypi.org/project/sacremoses/ for installation."
        ) from exc

    self.sm = sacremoses
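
A short round-trip sketch of why __setstate__ re-imports sacremoses: the module reference stored in self.sm does not survive pickling, so it is restored on deserialization. This assumes a companion __getstate__ (not shown in this section) drops the unpicklable module reference before serialization, and that tokenizer is an existing XLMTokenizer instance such as the one created above:

import pickle

# Serializing captures the tokenizer's state dict; deserializing calls
# __setstate__(d), which restores self.__dict__ and re-imports sacremoses.
data = pickle.dumps(tokenizer)
restored = pickle.loads(data)
assert restored.sm is not None  # the module handle is back after unpickling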

mindnlp.transformers.models.xlm.tokenization_xlm.XLMTokenizer.bpe(token)

This method is part of the XLMTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

PARAMETER DESCRIPTION
self

The instance of the XLMTokenizer class.

token

The input token to be processed through BPE. It should be a string representing a word.

TYPE: str

RETURNS DESCRIPTION
str

The processed token after applying Byte Pair Encoding. The token may have undergone splitting or merging based on the rules of BPE.

RAISES DESCRIPTION
ValueError

If an error occurs during the processing of the token, such as an issue with indexing or comparison.

KeyError

If the method encounters a key error while accessing data structures like dictionaries.

Source code in mindnlp/transformers/models/xlm/tokenization_xlm.py
def bpe(self, token):
    """
    This method is part of the XLMTokenizer class and performs Byte Pair Encoding (BPE) on a given token.

    Args:
        self: The instance of the XLMTokenizer class.
        token (str): The input token to be processed through BPE. It should be a string representing a word.

    Returns:
        str: The processed token after applying Byte Pair Encoding. The token may have undergone splitting or
            merging based on the rules of BPE.

    Raises:
        ValueError: If an error occurs during the processing of the token, such as an issue with indexing or
            comparison.
        KeyError: If the method encounters a key error while accessing data structures like dictionaries.
    """
    word = tuple(token[:-1]) + (token[-1] + "</w>",)
    if token in self.cache:
        return self.cache[token]
    pairs = get_pairs(word)

    if not pairs:
        return token + "</w>"

    while True:
        bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float("inf")))
        if bigram not in self.bpe_ranks:
            break
        first, second = bigram
        new_word = []
        i = 0
        while i < len(word):
            try:
                j = word.index(first, i)
            except ValueError:
                new_word.extend(word[i:])
                break
            else:
                new_word.extend(word[i:j])
                i = j

            if word[i] == first and i < len(word) - 1 and word[i + 1] == second:
                new_word.append(first + second)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        new_word = tuple(new_word)
        word = new_word
        if len(word) == 1:
            break
        pairs = get_pairs(word)
    word = " ".join(word)
    if word == "\n  </w>":
        word = "\n</w>"
    self.cache[token] = word
    return word
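
A self-contained sketch of the merge loop implemented above, using a tiny hypothetical merge table in place of the ranks loaded from merges_file. The local get_pairs helper mirrors what the real one returns (the set of adjacent symbol pairs), and the merge order follows the lowest-ranked pair first:

def get_pairs(word):
    """Return the set of adjacent symbol pairs in a word (tuple of symbols)."""
    return {(word[i], word[i + 1]) for i in range(len(word) - 1)}

bpe_ranks = {("l", "o"): 0, ("lo", "w</w>"): 1}  # hypothetical ranks for illustration

token = "low"
word = tuple(token[:-1]) + (token[-1] + "</w>",)  # ('l', 'o', 'w</w>')
pairs = get_pairs(word)
while pairs:
    # Pick the adjacent pair with the best (lowest) rank; stop if none is known.
    bigram = min(pairs, key=lambda pair: bpe_ranks.get(pair, float("inf")))
    if bigram not in bpe_ranks:
        break
    first, second = bigram
    merged = []
    i = 0
    while i < len(word):
        if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
            merged.append(first + second)  # merge the matched pair into one symbol
            i += 2
        else:
            merged.append(word[i])
            i += 1
    word = tuple(merged)
    if len(word) == 1:
        break
    pairs = get_pairs(word)

print(" ".join(word))  # "low</w>" after merging ('l', 'o'), then ('lo', 'w</w>')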