
funnel

mindnlp.transformers.models.funnel.configuration_funnel

Funnel Transformer model configuration

mindnlp.transformers.models.funnel.configuration_funnel.FunnelConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [FunnelModel] or a [TFFunnelModel]. It is used to instantiate a Funnel Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Funnel Transformer funnel-transformer/small architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the Funnel transformer. Defines the number of different tokens that can be represented by the input_ids passed when calling [FunnelModel] or [TFFunnelModel].

TYPE: `int`, *optional*, defaults to 30522 DEFAULT: 30522

block_sizes

The sizes of the blocks used in the model.

TYPE: `List[int]`, *optional*, defaults to `[4, 4, 4]` DEFAULT: [4, 4, 4]

block_repeats

If passed along, each layer of each block is repeated the number of times indicated.

TYPE: `List[int]`, *optional* DEFAULT: None

num_decoder_layers

The number of layers in the decoder (when not using the base model).

TYPE: `int`, *optional*, defaults to 2 DEFAULT: 2

d_model

Dimensionality of the model's hidden states.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

n_head

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 12 DEFAULT: 12

d_head

Dimensionality of the model's heads.

TYPE: `int`, *optional*, defaults to 64 DEFAULT: 64

d_inner

Inner dimension in the feed-forward blocks.

TYPE: `int`, *optional*, defaults to 3072 DEFAULT: 3072

hidden_act

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `callable`, *optional*, defaults to `"gelu_new"` DEFAULT: 'gelu_new'

hidden_dropout

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_dropout

The dropout probability for the attention probabilities.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

activation_dropout

The dropout probability used between the two layers of the feed-forward blocks.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

initializer_range

The upper bound of the uniform initializer for initializing all weight matrices in attention layers.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

initializer_std

The standard deviation of the normal initializer for initializing the embedding matrix and the weight of linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for linear layers.

TYPE: `float`, *optional* DEFAULT: None

layer_norm_eps

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-09 DEFAULT: 1e-09

pooling_type

Possible values are "mean" or "max". The way pooling is performed at the beginning of each block.

TYPE: `str`, *optional*, defaults to `"mean"` DEFAULT: 'mean'

attention_type

Possible values are "relative_shift" or "factorized". The former is faster on CPU/GPU while the latter is faster on TPU.

TYPE: `str`, *optional*, defaults to `"relative_shift"` DEFAULT: 'relative_shift'

separate_cls

Whether or not to separate the cls token when applying pooling.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

truncate_seq

When using separate_cls, whether or not to truncate the last token when pooling, to avoid getting a sequence length that is not a multiple of 2.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

pool_q_only

Whether or not to apply the pooling only to the query or to query, key and values for the attention layers.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True
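
A minimal usage sketch (assuming `FunnelConfig` and `FunnelModel` are re-exported from `mindnlp.transformers`, as other mindnlp model families are). It shows how `block_sizes` drives the derived `num_hidden_layers` and `num_blocks` properties:

from mindnlp.transformers import FunnelConfig, FunnelModel  # assumed top-level exports

# A configuration matching the funnel-transformer/small defaults
configuration = FunnelConfig(block_sizes=[4, 4, 4], d_model=768, n_head=12)

# Instantiating a model from that configuration (weights are randomly initialized)
model = FunnelModel(configuration)

print(configuration.num_hidden_layers)  # 12 == sum(block_sizes)
print(configuration.num_blocks)         # 3  == len(block_sizes)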

Source code in mindnlp/transformers/models/funnel/configuration_funnel.py
class FunnelConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`FunnelModel`] or a [`TFFunnelModel`]. It is used to
    instantiate a Funnel Transformer model according to the specified arguments, defining the model architecture.
    Instantiating a configuration with the defaults will yield a similar configuration to that of the Funnel
    Transformer [funnel-transformer/small](https://huggingface.co/funnel-transformer/small) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        vocab_size (`int`, *optional*, defaults to 30522):
            Vocabulary size of the Funnel transformer. Defines the number of different tokens that can be represented
            by the `input_ids` passed when calling [`FunnelModel`] or [`TFFunnelModel`].
        block_sizes (`List[int]`, *optional*, defaults to `[4, 4, 4]`):
            The sizes of the blocks used in the model.
        block_repeats (`List[int]`, *optional*):
            If passed along, each layer of each block is repeated the number of times indicated.
        num_decoder_layers (`int`, *optional*, defaults to 2):
            The number of layers in the decoder (when not using the base model).
        d_model (`int`, *optional*, defaults to 768):
            Dimensionality of the model's hidden states.
        n_head (`int`, *optional*, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        d_head (`int`, *optional*, defaults to 64):
            Dimensionality of the model's heads.
        d_inner (`int`, *optional*, defaults to 3072):
            Inner dimension in the feed-forward blocks.
        hidden_act (`str` or `callable`, *optional*, defaults to `"gelu_new"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        hidden_dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for the attention probabilities.
        activation_dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability used between the two layers of the feed-forward blocks.
        initializer_range (`float`, *optional*, defaults to 0.1):
            The upper bound of the *uniform initializer* for initializing all weight matrices in attention layers.
        initializer_std (`float`, *optional*):
            The standard deviation of the *normal initializer* for initializing the embedding matrix and the weight of
            linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for
            linear layers.
        layer_norm_eps (`float`, *optional*, defaults to 1e-09):
            The epsilon used by the layer normalization layers.
        pooling_type (`str`, *optional*, defaults to `"mean"`):
            Possible values are `"mean"` or `"max"`. The way pooling is performed at the beginning of each block.
        attention_type (`str`, *optional*, defaults to `"relative_shift"`):
            Possible values are `"relative_shift"` or `"factorized"`. The former is faster on CPU/GPU while the latter
            is faster on TPU.
        separate_cls (`bool`, *optional*, defaults to `True`):
            Whether or not to separate the cls token when applying pooling.
        truncate_seq (`bool`, *optional*, defaults to `True`):
            When using `separate_cls`, whether or not to truncate the last token when pooling, to avoid getting a
            sequence length that is not a multiple of 2.
        pool_q_only (`bool`, *optional*, defaults to `True`):
            Whether or not to apply the pooling only to the query or to query, key and values for the attention layers.
    """

    model_type = "funnel"
    attribute_map = {
        "hidden_size": "d_model",
        "num_attention_heads": "n_head",
    }

    def __init__(
        self,
        vocab_size=30522,
        block_sizes=[4, 4, 4],
        block_repeats=None,
        num_decoder_layers=2,
        d_model=768,
        n_head=12,
        d_head=64,
        d_inner=3072,
        hidden_act="gelu_new",
        hidden_dropout=0.1,
        attention_dropout=0.1,
        activation_dropout=0.0,
        initializer_range=0.1,
        initializer_std=None,
        layer_norm_eps=1e-9,
        pooling_type="mean",
        attention_type="relative_shift",
        separate_cls=True,
        truncate_seq=True,
        pool_q_only=True,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.block_sizes = block_sizes
        self.block_repeats = [1] * len(block_sizes) if block_repeats is None else block_repeats
        assert len(block_sizes) == len(
            self.block_repeats
        ), "`block_sizes` and `block_repeats` should have the same length."
        self.num_decoder_layers = num_decoder_layers
        self.d_model = d_model
        self.n_head = n_head
        self.d_head = d_head
        self.d_inner = d_inner
        self.hidden_act = hidden_act
        self.hidden_dropout = hidden_dropout
        self.attention_dropout = attention_dropout
        self.activation_dropout = activation_dropout
        self.initializer_range = initializer_range
        self.initializer_std = initializer_std
        self.layer_norm_eps = layer_norm_eps
        assert pooling_type in [
            "mean",
            "max",
        ], f"Got {pooling_type} for `pooling_type` but only 'mean' and 'max' are supported."
        self.pooling_type = pooling_type
        assert attention_type in [
            "relative_shift",
            "factorized",
        ], f"Got {attention_type} for `attention_type` but only 'relative_shift' and 'factorized' are supported."
        self.attention_type = attention_type
        self.separate_cls = separate_cls
        self.truncate_seq = truncate_seq
        self.pool_q_only = pool_q_only

        super().__init__(**kwargs)

    @property
    def num_hidden_layers(self):
        return sum(self.block_sizes)

    @num_hidden_layers.setter
    def num_hidden_layers(self, value):
        raise NotImplementedError(
            "This model does not support the setting of `num_hidden_layers`. Please set `block_sizes`."
        )

    @property
    def num_blocks(self):
        return len(self.block_sizes)

    @num_blocks.setter
    def num_blocks(self, value):
        raise NotImplementedError("This model does not support the setting of `num_blocks`. Please set `block_sizes`.")

mindnlp.transformers.models.funnel.modeling_funnel

MindSpore Funnel Transformer model.

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure

Bases: Module

Contains helpers for FunnelRelMultiheadAttention.

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelAttentionStructure(nn.Module):
    """
    Contains helpers for `FunnelRelMultiheadAttention`.
    """

    cls_token_type_id: int = 2

    def __init__(self, config: FunnelConfig) -> None:
        super().__init__()
        self.config = config
        self.sin_dropout = nn.Dropout(p=config.hidden_dropout)
        self.cos_dropout = nn.Dropout(p=config.hidden_dropout)
        # Track where we are at in terms of pooling from the original input, e.g., by how much the sequence length was
        # divided.
        self.pooling_mult = None

    def init_attention_inputs(
        self,
        inputs_embeds: mindspore.Tensor,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
    ) -> Tuple[mindspore.Tensor]:
        """Returns the attention inputs associated to the inputs of the model."""
        # inputs_embeds has shape batch_size x seq_len x d_model
        # attention_mask and token_type_ids have shape batch_size x seq_len
        self.pooling_mult = 1
        self.seq_len = seq_len = inputs_embeds.shape[1]
        position_embeds = self.get_position_embeds(seq_len, inputs_embeds.dtype)
        token_type_mat = self.token_type_ids_to_mat(token_type_ids) if token_type_ids is not None else None
        cls_mask = (
            ops.pad(inputs_embeds.new_ones([seq_len - 1, seq_len - 1]), (1, 0, 1, 0))
            if self.config.separate_cls
            else None
        )
        return (position_embeds, token_type_mat, attention_mask, cls_mask)

    def token_type_ids_to_mat(self, token_type_ids: mindspore.Tensor) -> mindspore.Tensor:
        """Convert `token_type_ids` to `token_type_mat`."""
        token_type_mat = token_type_ids[:, :, None] == token_type_ids[:, None]
        # Treat <cls> as in the same segment as both A & B
        cls_ids = token_type_ids == self.cls_token_type_id
        cls_mat = cls_ids[:, :, None] | cls_ids[:, None]
        return cls_mat | token_type_mat

    def get_position_embeds(
        self, seq_len: int, dtype: mindspore.dtype
    ) -> Union[Tuple[mindspore.Tensor], List[List[mindspore.Tensor]]]:
        """
        Create and cache inputs related to relative position encoding. Those are very different depending on whether we
        are using the factorized or the relative shift attention:

        For the factorized attention, it returns the matrices (phi, pi, psi, omega) used in the paper, appendix A.2.2,
        final formula.

        For the relative shift attention, it returns all possible vectors R used in the paper, appendix A.2.1, final
        formula.

        Paper link: https://arxiv.org/abs/2006.03236
        """
        d_model = self.config.d_model
        if self.config.attention_type == "factorized":
            # Notations from the paper, appendix A.2.2, final formula.
            # We need to create and return the matrices phi, psi, pi and omega.
            pos_seq = ops.arange(0, seq_len, 1.0, dtype=mindspore.int64).to(dtype)
            freq_seq = ops.arange(0, d_model // 2, 1.0, dtype=mindspore.int64).to(dtype)
            inv_freq = 1 / (10000 ** (freq_seq / (d_model // 2)))
            sinusoid = pos_seq[:, None] * inv_freq[None]
            sin_embed = ops.sin(sinusoid)
            sin_embed_d = self.sin_dropout(sin_embed)
            cos_embed = ops.cos(sinusoid)
            cos_embed_d = self.cos_dropout(cos_embed)
            # This is different from the formula on the paper...
            phi = ops.cat([sin_embed_d, sin_embed_d], axis=-1)
            psi = ops.cat([cos_embed, sin_embed], axis=-1)
            pi = ops.cat([cos_embed_d, cos_embed_d], axis=-1)
            omega = ops.cat([-sin_embed, cos_embed], axis=-1)
            return (phi, pi, psi, omega)
        else:
            # Notations from the paper, appendix A.2.1, final formula.
            # We need to create and return all the possible vectors R for all blocks and shifts.
            freq_seq = ops.arange(0, d_model // 2, 1.0, dtype=mindspore.int64).to(dtype)
            inv_freq = 1 / (10000 ** (freq_seq / (d_model // 2)))
            # Maximum relative positions for the first input
            rel_pos_id = ops.arange(-seq_len * 2, seq_len * 2, 1.0, dtype=mindspore.int64).to(dtype)
            zero_offset = seq_len * 2
            sinusoid = rel_pos_id[:, None] * inv_freq[None]
            sin_embed = self.sin_dropout(ops.sin(sinusoid))
            cos_embed = self.cos_dropout(ops.cos(sinusoid))
            pos_embed = ops.cat([sin_embed, cos_embed], axis=-1)

            pos = ops.arange(0, seq_len, dtype=mindspore.int64).to(dtype)
            pooled_pos = pos
            position_embeds_list = []
            for block_index in range(0, self.config.num_blocks):
                # For each block with block_index > 0, we need two types of position embeddings:
                #   - Attention(pooled-q, unpooled-kv)
                #   - Attention(pooled-q, pooled-kv)
                # For block_index = 0 we only need the second one and leave the first one as None.

                # First type
                if block_index == 0:
                    position_embeds_pooling = None
                else:
                    pooled_pos = self.stride_pool_pos(pos, block_index)

                    # forward rel_pos_id
                    stride = 2 ** (block_index - 1)
                    rel_pos = self.relative_pos(pos, stride, pooled_pos, shift=2)
                    rel_pos = rel_pos[:, None] + zero_offset
                    # rel_pos = rel_pos.expand(rel_pos.shape[0], d_model)
                    rel_pos = rel_pos.broadcast_to((rel_pos.shape[0], d_model))
                    position_embeds_pooling = ops.gather_elements(pos_embed, 0, rel_pos)

                # Second type
                pos = pooled_pos
                stride = 2**block_index
                rel_pos = self.relative_pos(pos, stride)

                rel_pos = rel_pos[:, None] + zero_offset
                # rel_pos = rel_pos.expand(rel_pos.shape[0], d_model)
                rel_pos = rel_pos.broadcast_to((rel_pos.shape[0], d_model))
                position_embeds_no_pooling = ops.gather_elements(pos_embed, 0, rel_pos)

                position_embeds_list.append([position_embeds_no_pooling, position_embeds_pooling])
            return position_embeds_list

    def stride_pool_pos(self, pos_id: mindspore.Tensor, block_index: int):
        """
        Pool `pos_id` while keeping the cls token separate (if `config.separate_cls=True`).
        """
        if self.config.separate_cls:
            # Under separate <cls>, we treat the <cls> as the first token in
            # the previous block of the 1st real block. Since the 1st real
            # block always has position 1, the position of the previous block
            # will be at `1 - 2 ** block_index`.
            # cls_pos = pos_id.new_tensor([-(2**block_index) + 1])
            cls_pos = mindspore.Tensor([-(2**block_index) + 1], dtype=pos_id.dtype)
            pooled_pos_id = pos_id[1:-1] if self.config.truncate_seq else pos_id[1:]
            return ops.cat([cls_pos, pooled_pos_id[::2]], axis=0)
        else:
            return pos_id[::2]

    def relative_pos(self, pos: mindspore.Tensor, stride: int, pooled_pos=None, shift: int = 1) -> mindspore.Tensor:
        """
        Build the relative positional vector between `pos` and `pooled_pos`.
        """
        if pooled_pos is None:
            pooled_pos = pos

        ref_point = pooled_pos[0] - pos[0]
        num_remove = shift * len(pooled_pos)
        max_dist = ref_point + num_remove * stride
        min_dist = pooled_pos[0] - pos[-1]

        return ops.arange(max_dist, min_dist - 1, -stride, dtype=mindspore.int64)

    def stride_pool(
        self,
        tensor: Union[mindspore.Tensor, Tuple[mindspore.Tensor], List[mindspore.Tensor]],
        axis: Union[int, Tuple[int], List[int]],
    ) -> mindspore.Tensor:
        """
        Perform pooling by stride slicing the tensor along the given axis.
        """
        if tensor is None:
            return None

        # Do the stride pool recursively if axis is a list or a tuple of ints.
        if isinstance(axis, (list, tuple)):
            for ax in axis:
                tensor = self.stride_pool(tensor, ax)
            return tensor

        # Do the stride pool recursively if tensor is a list or tuple of tensors.
        if isinstance(tensor, (tuple, list)):
            return type(tensor)(self.stride_pool(x, axis) for x in tensor)

        # Deal with negative axis
        axis %= tensor.ndim

        axis_slice = (
            slice(None, -1, 2) if self.config.separate_cls and self.config.truncate_seq else slice(None, None, 2)
        )
        enc_slice = [slice(None)] * axis + [axis_slice]
        if self.config.separate_cls:
            cls_slice = [slice(None)] * axis + [slice(None, 1)]
            tensor = ops.cat([tensor[cls_slice], tensor], axis=axis)
        return tensor[enc_slice]

    def pool_tensor(
        self, tensor: Union[mindspore.Tensor, Tuple[mindspore.Tensor], List[mindspore.Tensor]], mode: str = "mean", stride: int = 2
    ) -> mindspore.Tensor:
        """Apply 1D pooling to a tensor of size [B x T (x H)]."""
        if tensor is None:
            return None

        # Do the pool recursively if tensor is a list or tuple of tensors.
        if isinstance(tensor, (tuple, list)):
            return type(tensor)(self.pool_tensor(x, mode=mode, stride=stride) for x in tensor)

        if self.config.separate_cls:
            suffix = tensor[:, :-1] if self.config.truncate_seq else tensor
            tensor = ops.cat([tensor[:, :1], suffix], axis=1)

        ndim = tensor.ndim
        if ndim == 2:
            tensor = tensor[:, None, :, None]
        elif ndim == 3:
            tensor = tensor[:, None, :, :]
        # Stride is applied on the second-to-last dimension.
        stride = (stride, 1)

        if mode == "mean":
            tensor = ops.avg_pool2d(tensor, stride, stride=stride, ceil_mode=True)
        elif mode == "max":
            tensor = ops.max_pool2d(tensor, stride, stride=stride, ceil_mode=True)
        elif mode == "min":
            tensor = -ops.max_pool2d(-tensor, stride, stride=stride, ceil_mode=True)
        else:
            raise NotImplementedError("The supported modes are 'mean', 'max' and 'min'.")

        if ndim == 2:
            return tensor[:, 0, :, 0]
        elif ndim == 3:
            return tensor[:, 0]
        return tensor

    def pre_attention_pooling(
        self, output, attention_inputs: Tuple[mindspore.Tensor]
    ) -> Tuple[mindspore.Tensor, Tuple[mindspore.Tensor]]:
        """Pool `output` and the proper parts of `attention_inputs` before the attention layer."""
        position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
        if self.config.pool_q_only:
            if self.config.attention_type == "factorized":
                position_embeds = self.stride_pool(position_embeds[:2], 0) + position_embeds[2:]
            token_type_mat = self.stride_pool(token_type_mat, 1)
            cls_mask = self.stride_pool(cls_mask, 0)
            output = self.pool_tensor(output, mode=self.config.pooling_type)
        else:
            self.pooling_mult *= 2
            if self.config.attention_type == "factorized":
                position_embeds = self.stride_pool(position_embeds, 0)
            token_type_mat = self.stride_pool(token_type_mat, [1, 2])
            cls_mask = self.stride_pool(cls_mask, [1, 2])
            attention_mask = self.pool_tensor(attention_mask, mode="min")
            output = self.pool_tensor(output, mode=self.config.pooling_type)
        attention_inputs = (position_embeds, token_type_mat, attention_mask, cls_mask)
        return output, attention_inputs

    def post_attention_pooling(self, attention_inputs: Tuple[mindspore.Tensor]) -> Tuple[mindspore.Tensor]:
        """Pool the proper parts of `attention_inputs` after the attention layer."""
        position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
        if self.config.pool_q_only:
            self.pooling_mult *= 2
            if self.config.attention_type == "factorized":
                position_embeds = position_embeds[:2] + self.stride_pool(position_embeds[2:], 0)
            token_type_mat = self.stride_pool(token_type_mat, 2)
            cls_mask = self.stride_pool(cls_mask, 1)
            attention_mask = self.pool_tensor(attention_mask, mode="min")
        attention_inputs = (position_embeds, token_type_mat, attention_mask, cls_mask)
        return attention_inputs

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.get_position_embeds(seq_len, dtype)

Create and cache inputs related to relative position encoding. Those are very different depending on whether we are using the factorized or the relative shift attention:

For the factorized attention, it returns the matrices (phi, pi, psi, omega) used in the paper, appendix A.2.2, final formula.

For the relative shift attention, it returns all possible vectors R used in the paper, appendix A.2.1, final formula.

Paper link: https://arxiv.org/abs/2006.03236
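
An illustrative sketch (import paths follow the module headers on this page; shapes assume `attention_type="factorized"`):

import mindspore
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

config = FunnelConfig(attention_type="factorized")
structure = FunnelAttentionStructure(config)

# Factorized attention: four (seq_len, d_model) matrices phi, pi, psi, omega (appendix A.2.2)
phi, pi, psi, omega = structure.get_position_embeds(seq_len=8, dtype=mindspore.float32)

# With attention_type="relative_shift", the same call instead returns, per block, a pair
# [position_embeds_no_pooling, position_embeds_pooling].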

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def get_position_embeds(
    self, seq_len: int, dtype: mindspore.dtype
) -> Union[Tuple[mindspore.Tensor], List[List[mindspore.Tensor]]]:
    """
    Create and cache inputs related to relative position encoding. Those are very different depending on whether we
    are using the factorized or the relative shift attention:

    For the factorized attention, it returns the matrices (phi, pi, psi, omega) used in the paper, appendix A.2.2,
    final formula.

    For the relative shift attention, it returns all possible vectors R used in the paper, appendix A.2.1, final
    formula.

    Paper link: https://arxiv.org/abs/2006.03236
    """
    d_model = self.config.d_model
    if self.config.attention_type == "factorized":
        # Notations from the paper, appendix A.2.2, final formula.
        # We need to create and return the matrices phi, psi, pi and omega.
        pos_seq = ops.arange(0, seq_len, 1.0, dtype=mindspore.int64).to(dtype)
        freq_seq = ops.arange(0, d_model // 2, 1.0, dtype=mindspore.int64).to(dtype)
        inv_freq = 1 / (10000 ** (freq_seq / (d_model // 2)))
        sinusoid = pos_seq[:, None] * inv_freq[None]
        sin_embed = ops.sin(sinusoid)
        sin_embed_d = self.sin_dropout(sin_embed)
        cos_embed = ops.cos(sinusoid)
        cos_embed_d = self.cos_dropout(cos_embed)
        # This is different from the formula on the paper...
        phi = ops.cat([sin_embed_d, sin_embed_d], axis=-1)
        psi = ops.cat([cos_embed, sin_embed], axis=-1)
        pi = ops.cat([cos_embed_d, cos_embed_d], axis=-1)
        omega = ops.cat([-sin_embed, cos_embed], axis=-1)
        return (phi, pi, psi, omega)
    else:
        # Notations from the paper, appendix A.2.1, final formula.
        # We need to create and return all the possible vectors R for all blocks and shifts.
        freq_seq = ops.arange(0, d_model // 2, 1.0, dtype=mindspore.int64).to(dtype)
        inv_freq = 1 / (10000 ** (freq_seq / (d_model // 2)))
        # Maximum relative positions for the first input
        rel_pos_id = ops.arange(-seq_len * 2, seq_len * 2, 1.0, dtype=mindspore.int64).to(dtype)
        zero_offset = seq_len * 2
        sinusoid = rel_pos_id[:, None] * inv_freq[None]
        sin_embed = self.sin_dropout(ops.sin(sinusoid))
        cos_embed = self.cos_dropout(ops.cos(sinusoid))
        pos_embed = ops.cat([sin_embed, cos_embed], axis=-1)

        pos = ops.arange(0, seq_len, dtype=mindspore.int64).to(dtype)
        pooled_pos = pos
        position_embeds_list = []
        for block_index in range(0, self.config.num_blocks):
            # For each block with block_index > 0, we need two types of position embeddings:
            #   - Attention(pooled-q, unpooled-kv)
            #   - Attention(pooled-q, pooled-kv)
            # For block_index = 0 we only need the second one and leave the first one as None.

            # First type
            if block_index == 0:
                position_embeds_pooling = None
            else:
                pooled_pos = self.stride_pool_pos(pos, block_index)

                # forward rel_pos_id
                stride = 2 ** (block_index - 1)
                rel_pos = self.relative_pos(pos, stride, pooled_pos, shift=2)
                rel_pos = rel_pos[:, None] + zero_offset
                # rel_pos = rel_pos.expand(rel_pos.shape[0], d_model)
                rel_pos = rel_pos.broadcast_to((rel_pos.shape[0], d_model))
                position_embeds_pooling = ops.gather_elements(pos_embed, 0, rel_pos)

            # Second type
            pos = pooled_pos
            stride = 2**block_index
            rel_pos = self.relative_pos(pos, stride)

            rel_pos = rel_pos[:, None] + zero_offset
            # rel_pos = rel_pos.expand(rel_pos.shape[0], d_model)
            rel_pos = rel_pos.broadcast_to((rel_pos.shape[0], d_model))
            position_embeds_no_pooling = ops.gather_elements(pos_embed, 0, rel_pos)

            position_embeds_list.append([position_embeds_no_pooling, position_embeds_pooling])
        return position_embeds_list

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.init_attention_inputs(inputs_embeds, attention_mask=None, token_type_ids=None)

Returns the attention inputs associated with the inputs of the model.
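
A hedged sketch of how this helper is typically driven (imports as above; the zero tensor is a placeholder for real embeddings of shape batch_size x seq_len x d_model):

import mindspore
from mindspore import ops as msops
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

config = FunnelConfig()
structure = FunnelAttentionStructure(config)

inputs_embeds = msops.zeros((2, 8, config.d_model), mindspore.float32)
attention_mask = msops.ones((2, 8), mindspore.int64)

# Returns (position_embeds, token_type_mat, attention_mask, cls_mask);
# token_type_mat is None here because no token_type_ids were passed.
attention_inputs = structure.init_attention_inputs(inputs_embeds, attention_mask=attention_mask)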

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def init_attention_inputs(
    self,
    inputs_embeds: mindspore.Tensor,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
) -> Tuple[mindspore.Tensor]:
    """Returns the attention inputs associated to the inputs of the model."""
    # inputs_embeds has shape batch_size x seq_len x d_model
    # attention_mask and token_type_ids have shape batch_size x seq_len
    self.pooling_mult = 1
    self.seq_len = seq_len = inputs_embeds.shape[1]
    position_embeds = self.get_position_embeds(seq_len, inputs_embeds.dtype)
    token_type_mat = self.token_type_ids_to_mat(token_type_ids) if token_type_ids is not None else None
    cls_mask = (
        ops.pad(inputs_embeds.new_ones([seq_len - 1, seq_len - 1]), (1, 0, 1, 0))
        if self.config.separate_cls
        else None
    )
    return (position_embeds, token_type_mat, attention_mask, cls_mask)

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.pool_tensor(tensor, mode='mean', stride=2)

Apply 1D pooling to a tensor of size [B x T (x H)].
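
For intuition, a small sketch (imports as above; `separate_cls=False` so the <cls> position gets no special handling):

import mindspore
from mindspore import ops as msops
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

structure = FunnelAttentionStructure(FunnelConfig(separate_cls=False))

x = msops.arange(0, 8, dtype=mindspore.float32).reshape(1, 8)  # [B x T]
pooled = structure.pool_tensor(x, mode="mean", stride=2)
# Averages adjacent pairs along T: [[0.5, 2.5, 4.5, 6.5]]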

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def pool_tensor(
    self, tensor: Union[mindspore.Tensor, Tuple[mindspore.Tensor], List[mindspore.Tensor]], mode: str = "mean", stride: int = 2
) -> mindspore.Tensor:
    """Apply 1D pooling to a tensor of size [B x T (x H)]."""
    if tensor is None:
        return None

    # Do the pool recursively if tensor is a list or tuple of tensors.
    if isinstance(tensor, (tuple, list)):
        return type(tensor)(self.pool_tensor(x, mode=mode, stride=stride) for x in tensor)

    if self.config.separate_cls:
        suffix = tensor[:, :-1] if self.config.truncate_seq else tensor
        tensor = ops.cat([tensor[:, :1], suffix], axis=1)

    ndim = tensor.ndim
    if ndim == 2:
        tensor = tensor[:, None, :, None]
    elif ndim == 3:
        tensor = tensor[:, None, :, :]
    # Stride is applied on the second-to-last dimension.
    stride = (stride, 1)

    if mode == "mean":
        tensor = ops.avg_pool2d(tensor, stride, stride=stride, ceil_mode=True)
    elif mode == "max":
        tensor = ops.max_pool2d(tensor, stride, stride=stride, ceil_mode=True)
    elif mode == "min":
        tensor = -ops.max_pool2d(-tensor, stride, stride=stride, ceil_mode=True)
    else:
        raise NotImplementedError("The supported modes are 'mean', 'max' and 'min'.")

    if ndim == 2:
        return tensor[:, 0, :, 0]
    elif ndim == 3:
        return tensor[:, 0]
    return tensor

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.post_attention_pooling(attention_inputs)

Pool the proper parts of attention_inputs after the attention layer.

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def post_attention_pooling(self, attention_inputs: Tuple[mindspore.Tensor]) -> Tuple[mindspore.Tensor]:
    """Pool the proper parts of `attention_inputs` after the attention layer."""
    position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
    if self.config.pool_q_only:
        self.pooling_mult *= 2
        if self.config.attention_type == "factorized":
            position_embeds = position_embeds[:2] + self.stride_pool(position_embeds[2:], 0)
        token_type_mat = self.stride_pool(token_type_mat, 2)
        cls_mask = self.stride_pool(cls_mask, 1)
        attention_mask = self.pool_tensor(attention_mask, mode="min")
    attention_inputs = (position_embeds, token_type_mat, attention_mask, cls_mask)
    return attention_inputs

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.pre_attention_pooling(output, attention_inputs)

Pool output and the proper parts of attention_inputs before the attention layer.

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def pre_attention_pooling(
    self, output, attention_inputs: Tuple[mindspore.Tensor]
) -> Tuple[mindspore.Tensor, Tuple[mindspore.Tensor]]:
    """Pool `output` and the proper parts of `attention_inputs` before the attention layer."""
    position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs
    if self.config.pool_q_only:
        if self.config.attention_type == "factorized":
            position_embeds = self.stride_pool(position_embeds[:2], 0) + position_embeds[2:]
        token_type_mat = self.stride_pool(token_type_mat, 1)
        cls_mask = self.stride_pool(cls_mask, 0)
        output = self.pool_tensor(output, mode=self.config.pooling_type)
    else:
        self.pooling_mult *= 2
        if self.config.attention_type == "factorized":
            position_embeds = self.stride_pool(position_embeds, 0)
        token_type_mat = self.stride_pool(token_type_mat, [1, 2])
        cls_mask = self.stride_pool(cls_mask, [1, 2])
        attention_mask = self.pool_tensor(attention_mask, mode="min")
        output = self.pool_tensor(output, mode=self.config.pooling_type)
    attention_inputs = (position_embeds, token_type_mat, attention_mask, cls_mask)
    return output, attention_inputs

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.relative_pos(pos, stride, pooled_pos=None, shift=1)

Build the relative positional vector between pos and pooled_pos.
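
A small worked example (imports as above): with `pos = [0., 1., ..., 7.]`, `stride=1`, `shift=1` and `pooled_pos` defaulting to `pos`, we get `ref_point = 0`, `num_remove = 8`, `max_dist = 8` and `min_dist = -7`, so the result is the descending range `[8, 7, ..., -7]`:

import mindspore
from mindspore import ops as msops
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

structure = FunnelAttentionStructure(FunnelConfig())
pos = msops.arange(0, 8, dtype=mindspore.float32)
rel = structure.relative_pos(pos, stride=1)  # 16 values, from 8 down to -7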

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def relative_pos(self, pos: mindspore.Tensor, stride: int, pooled_pos=None, shift: int = 1) -> mindspore.Tensor:
    """
    Build the relative positional vector between `pos` and `pooled_pos`.
    """
    if pooled_pos is None:
        pooled_pos = pos

    ref_point = pooled_pos[0] - pos[0]
    num_remove = shift * len(pooled_pos)
    max_dist = ref_point + num_remove * stride
    min_dist = pooled_pos[0] - pos[-1]

    return ops.arange(max_dist, min_dist - 1, -stride, dtype=mindspore.int64)

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.stride_pool(tensor, axis)

Perform pooling by stride slicing the tensor along the given axis.
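
A quick sketch under the same assumptions (imports as above). With `separate_cls=False` the method simply keeps every second index along the requested axis; with `separate_cls=True` the first (<cls>) slice is prepended before slicing:

import mindspore
from mindspore import ops as msops
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

structure = FunnelAttentionStructure(FunnelConfig(separate_cls=False))

x = msops.arange(0, 12, dtype=mindspore.float32).reshape(2, 6)
y = structure.stride_pool(x, 1)  # keeps columns 0, 2, 4 -> shape (2, 3)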

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def stride_pool(
    self,
    tensor: Union[mindspore.Tensor, Tuple[mindspore.Tensor], List[mindspore.Tensor]],
    axis: Union[int, Tuple[int], List[int]],
) -> mindspore.Tensor:
    """
    Perform pooling by stride slicing the tensor along the given axis.
    """
    if tensor is None:
        return None

    # Do the stride pool recursively if axis is a list or a tuple of ints.
    if isinstance(axis, (list, tuple)):
        for ax in axis:
            tensor = self.stride_pool(tensor, ax)
        return tensor

    # Do the stride pool recursively if tensor is a list or tuple of tensors.
    if isinstance(tensor, (tuple, list)):
        return type(tensor)(self.stride_pool(x, axis) for x in tensor)

    # Deal with negative axis
    axis %= tensor.ndim

    axis_slice = (
        slice(None, -1, 2) if self.config.separate_cls and self.config.truncate_seq else slice(None, None, 2)
    )
    enc_slice = [slice(None)] * axis + [axis_slice]
    if self.config.separate_cls:
        cls_slice = [slice(None)] * axis + [slice(None, 1)]
        tensor = ops.cat([tensor[cls_slice], tensor], axis=axis)
    return tensor[enc_slice]

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.stride_pool_pos(pos_id, block_index)

Pool pos_id while keeping the cls token separate (if config.separate_cls=True).

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def stride_pool_pos(self, pos_id: mindspore.Tensor, block_index: int):
    """
    Pool `pos_id` while keeping the cls token separate (if `config.separate_cls=True`).
    """
    if self.config.separate_cls:
        # Under separate <cls>, we treat the <cls> as the first token in
        # the previous block of the 1st real block. Since the 1st real
        # block always has position 1, the position of the previous block
        # will be at `1 - 2 ** block_index`.
        # cls_pos = pos_id.new_tensor([-(2**block_index) + 1])
        cls_pos = mindspore.Tensor([-(2**block_index) + 1], dtype=pos_id.dtype)
        pooled_pos_id = pos_id[1:-1] if self.config.truncate_seq else pos_id[1:]
        return ops.cat([cls_pos, pooled_pos_id[::2]], axis=0)
    else:
        return pos_id[::2]

mindnlp.transformers.models.funnel.modeling_funnel.FunnelAttentionStructure.token_type_ids_to_mat(token_type_ids)

Convert token_type_ids to token_type_mat.
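
Sketch (imports as above): token type id 2 matches the class attribute `cls_token_type_id`, so the <cls> position is treated as belonging to both segments:

import mindspore
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelAttentionStructure

structure = FunnelAttentionStructure(FunnelConfig())

# <cls> (type 2), two segment-A tokens (type 0), two segment-B tokens (type 1)
token_type_ids = mindspore.Tensor([[2, 0, 0, 1, 1]], dtype=mindspore.int64)
mat = structure.token_type_ids_to_mat(token_type_ids)  # boolean, shape (1, 5, 5)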

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def token_type_ids_to_mat(self, token_type_ids: mindspore.Tensor) -> mindspore.Tensor:
    """Convert `token_type_ids` to `token_type_mat`."""
    token_type_mat = token_type_ids[:, :, None] == token_type_ids[:, None]
    # Treat <cls> as in the same segment as both A & B
    cls_ids = token_type_ids == self.cls_token_type_id
    cls_mat = cls_ids[:, :, None] | cls_ids[:, None]
    return cls_mat | token_type_mat

mindnlp.transformers.models.funnel.modeling_funnel.FunnelDiscriminatorPredictions

Bases: Module

Prediction module for the discriminator, made up of two dense layers.
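
A minimal sketch of applying this head to discriminator hidden states (imports follow this page's module headers; the zero tensor stands in for real encoder output):

import mindspore
from mindspore import ops as msops
from mindnlp.transformers.models.funnel.configuration_funnel import FunnelConfig
from mindnlp.transformers.models.funnel.modeling_funnel import FunnelDiscriminatorPredictions

config = FunnelConfig()
head = FunnelDiscriminatorPredictions(config)

hidden_states = msops.zeros((2, 8, config.d_model), mindspore.float32)
logits = head(hidden_states)  # shape (2, 8): one replaced-token logit per position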

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelDiscriminatorPredictions(nn.Module):
    """Prediction module for the discriminator, made up of two dense layers."""

    def __init__(self, config: FunnelConfig) -> None:
        super().__init__()
        self.config = config
        self.dense = nn.Linear(config.d_model, config.d_model)
        self.dense_prediction = nn.Linear(config.d_model, 1)

    def forward(self, discriminator_hidden_states: mindspore.Tensor) -> mindspore.Tensor:
        hidden_states = self.dense(discriminator_hidden_states)
        hidden_states = ACT2FN[self.config.hidden_act](hidden_states)
        logits = self.dense_prediction(hidden_states).squeeze(-1)
        return logits

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForMaskedLM

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForMaskedLM(FunnelPreTrainedModel):
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)

        self.funnel = FunnelModel(config)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size)

        # Initialize weights and apply final processing
        self.post_init()

    def get_output_embeddings(self) -> nn.Linear:
        return self.lm_head

    def set_output_embeddings(self, new_embeddings: nn.Embedding) -> None:
        self.lm_head = new_embeddings

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, MaskedLMOutput]:
        r"""
        Args:
            labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
                config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
                loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = outputs[0]
        prediction_logits = self.lm_head(last_hidden_state)

        masked_lm_loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()  # -100 index = padding token
            labels = labels.astype(mindspore.int32)
            masked_lm_loss = loss_fct(prediction_logits.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_logits,) + outputs[1:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

        return MaskedLMOutput(
            loss=masked_lm_loss,
            logits=prediction_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForMaskedLM.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring) Tokens with indices set to -100 are ignored (masked), the loss is only computed for the tokens with labels in [0, ..., config.vocab_size]

TYPE: `torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None
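
An illustrative sketch of the masked-LM loss path (assuming `AutoTokenizer` and `FunnelForMaskedLM` are exported from `mindnlp.transformers` and that the tokenizer accepts `return_tensors="ms"`, as elsewhere in mindnlp):

from mindnlp.transformers import AutoTokenizer, FunnelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/small")
model = FunnelForMaskedLM.from_pretrained("funnel-transformer/small")

inputs = tokenizer(f"The capital of France is {tokenizer.mask_token}.", return_tensors="ms")
# labels must align token-for-token with input_ids; positions set to -100 are ignored by the loss
labels = tokenizer("The capital of France is Paris.", return_tensors="ms")["input_ids"]

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, seq_len, vocab_size)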

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, MaskedLMOutput]:
    r"""
    Args:
        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
            loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = outputs[0]
    prediction_logits = self.lm_head(last_hidden_state)

    masked_lm_loss = None
    if labels is not None:
        loss_fct = nn.CrossEntropyLoss()  # -100 index = padding token
        labels = labels.astype(mindspore.int32)
        masked_lm_loss = loss_fct(prediction_logits.view(-1, self.config.vocab_size), labels.view(-1))

    if not return_dict:
        output = (prediction_logits,) + outputs[1:]
        return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

    return MaskedLMOutput(
        loss=masked_lm_loss,
        logits=prediction_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForMultipleChoice

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForMultipleChoice(FunnelPreTrainedModel):
    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)

        self.funnel = FunnelBaseModel(config)
        self.classifier = FunnelClassificationHead(config, 1)
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, MultipleChoiceModelOutput]:
        r"""
        Args:
            labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
                num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
                `input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
        token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
            if inputs_embeds is not None
            else None
        )

        outputs = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = outputs[0]
        pooled_output = last_hidden_state[:, 0]
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            labels = labels.astype(mindspore.int32)
            loss = loss_fct(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForMultipleChoice.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the multiple choice classification loss. Indices should be in [0, ..., num_choices-1] where num_choices is the size of the second dimension of the input tensors. (See input_ids above)

TYPE: `torch.LongTensor` of shape `(batch_size,)`, *optional* DEFAULT: None
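
A hedged usage sketch (same assumptions about the `mindnlp.transformers` exports and `return_tensors="ms"` as above); the model expects inputs of shape `(batch_size, num_choices, seq_len)`:

import mindspore
from mindnlp.transformers import AutoTokenizer, FunnelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/small")
model = FunnelForMultipleChoice.from_pretrained("funnel-transformer/small")

prompt = "In Italy, pizza served in formal settings is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="ms", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}  # add the batch dimension
labels = mindspore.Tensor([1], dtype=mindspore.int64)      # choice1 is the correct answer

outputs = model(**inputs, labels=labels)
print(outputs.logits.shape)  # (1, 2)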

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, MultipleChoiceModelOutput]:
    r"""
    Args:
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
            num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
            `input_ids` above)
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

    input_ids = input_ids.view(-1, input_ids.shape[-1]) if input_ids is not None else None
    attention_mask = attention_mask.view(-1, attention_mask.shape[-1]) if attention_mask is not None else None
    token_type_ids = token_type_ids.view(-1, token_type_ids.shape[-1]) if token_type_ids is not None else None
    inputs_embeds = (
        inputs_embeds.view(-1, inputs_embeds.shape[-2], inputs_embeds.shape[-1])
        if inputs_embeds is not None
        else None
    )

    outputs = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = outputs[0]
    pooled_output = last_hidden_state[:, 0]
    logits = self.classifier(pooled_output)
    reshaped_logits = logits.view(-1, num_choices)

    loss = None
    if labels is not None:
        loss_fct = nn.CrossEntropyLoss()
        labels = labels.astype(mindspore.int32)
        loss = loss_fct(reshaped_logits, labels)

    if not return_dict:
        output = (reshaped_logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return MultipleChoiceModelOutput(
        loss=loss,
        logits=reshaped_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForPreTraining

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForPreTraining(FunnelPreTrainedModel):
    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)

        self.funnel = FunnelModel(config)
        self.discriminator_predictions = FunnelDiscriminatorPredictions(config)
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, FunnelForPreTrainingOutput]:
        r"""
        Args:
            labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the ELECTRA-style loss. Input should be a sequence of tokens (see `input_ids`
                docstring) Indices should be in `[0, 1]`:

                - 0 indicates the token is an original token,
                - 1 indicates the token was replaced.

        Returns:
            `Union[Tuple, FunnelForPreTrainingOutput]`

        Example:
            ```python
            >>> from transformers import AutoTokenizer, FunnelForPreTraining
            >>> import torch
            ...
            >>> tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/small")
            >>> model = FunnelForPreTraining.from_pretrained("funnel-transformer/small")
            ...
            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
            >>> logits = model(**inputs).logits
            ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        discriminator_hidden_states = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        discriminator_sequence_output = discriminator_hidden_states[0]

        logits = self.discriminator_predictions(discriminator_sequence_output)

        loss = None
        if labels is not None:
            loss_fct = nn.BCEWithLogitsLoss()
            if attention_mask is not None:
                active_loss = attention_mask.view(-1, discriminator_sequence_output.shape[1]) == 1
                active_logits = logits.view(-1, discriminator_sequence_output.shape[1])[active_loss]
                active_labels = labels[active_loss]
                loss = loss_fct(active_logits, active_labels.float())
            else:
                loss = loss_fct(logits.view(-1, discriminator_sequence_output.shape[1]), labels.float())

        if not return_dict:
            output = (logits,) + discriminator_hidden_states[1:]
            return ((loss,) + output) if loss is not None else output

        return FunnelForPreTrainingOutput(
            loss=loss,
            logits=logits,
            hidden_states=discriminator_hidden_states.hidden_states,
            attentions=discriminator_hidden_states.attentions,
        )
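The ELECTRA-style objective above is a per-token binary cross-entropy over the discriminator logits, restricted to non-padded positions when an `attention_mask` is supplied. The sketch below computes an equivalent masked loss with `reduction="none"` and a mask product instead of the boolean indexing used in the source; all values are toy data, assuming MindSpore 2.x:

```python
import mindspore
from mindspore import nn, ops

logits = ops.randn(1, 6)                                            # per-token discriminator scores
labels = mindspore.Tensor([[0, 1, 0, 0, 1, 0]], mindspore.float32)  # 1 = replaced, 0 = original
attention_mask = mindspore.Tensor([[1, 1, 1, 1, 0, 0]], mindspore.float32)

# Per-token BCE, then average only over the real (non-padded) tokens.
loss_fct = nn.BCEWithLogitsLoss(reduction="none")
per_token_loss = loss_fct(logits, labels)
loss = (per_token_loss * attention_mask).sum() / attention_mask.sum()
print(loss)
```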

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForPreTraining.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the ELECTRA-style loss. Input should be a sequence of tokens (see input_ids docstring). Indices should be in [0, 1]:

  • 0 indicates the token is an original token,
  • 1 indicates the token was replaced.

TYPE: `torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple, FunnelForPreTrainingOutput]

Union[Tuple, FunnelForPreTrainingOutput]

Example
>>> from transformers import AutoTokenizer, FunnelForPreTraining
>>> import torch
...
>>> tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/small")
>>> model = FunnelForPreTraining.from_pretrained("funnel-transformer/small")
...
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> logits = model(**inputs).logits
Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, FunnelForPreTrainingOutput]:
    r"""
    Args:
        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the ELECTRA-style loss. Input should be a sequence of tokens (see `input_ids`
            docstring). Indices should be in `[0, 1]`:

            - 0 indicates the token is an original token,
            - 1 indicates the token was replaced.

    Returns:
        `Union[Tuple, FunnelForPreTrainingOutput]`

    Example:
        ```python
        >>> from transformers import AutoTokenizer, FunnelForPreTraining
        >>> import torch
        ...
        >>> tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/small")
        >>> model = FunnelForPreTraining.from_pretrained("funnel-transformer/small")
        ...
        >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
        >>> logits = model(**inputs).logits
        ```
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    discriminator_hidden_states = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    discriminator_sequence_output = discriminator_hidden_states[0]

    logits = self.discriminator_predictions(discriminator_sequence_output)

    loss = None
    if labels is not None:
        loss_fct = nn.BCEWithLogitsLoss()
        if attention_mask is not None:
            active_loss = attention_mask.view(-1, discriminator_sequence_output.shape[1]) == 1
            active_logits = logits.view(-1, discriminator_sequence_output.shape[1])[active_loss]
            active_labels = labels[active_loss]
            loss = loss_fct(active_logits, active_labels.float())
        else:
            loss = loss_fct(logits.view(-1, discriminator_sequence_output.shape[1]), labels.float())

    if not return_dict:
        output = (logits,) + discriminator_hidden_states[1:]
        return ((loss,) + output) if loss is not None else output

    return FunnelForPreTrainingOutput(
        loss=loss,
        logits=logits,
        hidden_states=discriminator_hidden_states.hidden_states,
        attentions=discriminator_hidden_states.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForQuestionAnswering

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForQuestionAnswering(FunnelPreTrainedModel):
    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)
        self.num_labels = config.num_labels

        self.funnel = FunnelModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        start_positions: Optional[mindspore.Tensor] = None,
        end_positions: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, QuestionAnsweringModelOutput]:
        r"""
        Args:
            start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the start of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
            end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
                Labels for position (index) of the end of the labelled span for computing the token classification loss.
                Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
                are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = outputs[0]

        logits = self.qa_outputs(last_hidden_state)
        start_logits, end_logits = logits.split(1, axis=-1)
        start_logits = start_logits.squeeze(-1).contiguous()
        end_logits = end_logits.squeeze(-1).contiguous()

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, split add a dimension
            if len(start_positions.shape) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.shape) > 1:
                end_positions = end_positions.squeeze(-1)
            # sometimes the start/end positions are outside our model inputs, we ignore these terms
            ignored_index = start_logits.shape[1]
            start_positions = start_positions.clamp(0, ignored_index)
            end_positions = end_positions.clamp(0, ignored_index)

            loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
            start_positions = start_positions.astype(mindspore.int32)
            start_loss = loss_fct(start_logits, start_positions)
            end_positions = end_positions.astype(mindspore.int32)
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[1:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
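The question-answering head above projects every token to two scores, splits them into start/end logits, and clamps out-of-range positions to `seq_len`, which is then used as `ignore_index` so those targets do not contribute to the loss. A toy sketch of that path with random logits, assuming MindSpore 2.x:

```python
import mindspore
from mindspore import nn, ops

batch_size, seq_len = 2, 10
logits = ops.randn(batch_size, seq_len, 2)        # stand-in for qa_outputs(last_hidden_state)
start_logits, end_logits = logits.split(1, axis=-1)
start_logits = start_logits.squeeze(-1)

start_positions = mindspore.Tensor([3, 12], mindspore.int32)  # 12 is deliberately out of range
ignored_index = start_logits.shape[1]                         # == seq_len
start_positions = start_positions.clamp(0, ignored_index)     # out-of-range target -> ignored_index

loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
start_loss = loss_fct(start_logits, start_positions)          # only the in-range target counts
print(start_loss)
```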

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForQuestionAnswering.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, start_positions=None, end_positions=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
start_positions

Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `torch.LongTensor` of shape `(batch_size,)`, *optional* DEFAULT: None

end_positions

Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.

TYPE: `torch.LongTensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    start_positions: Optional[mindspore.Tensor] = None,
    end_positions: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, QuestionAnsweringModelOutput]:
    r"""
    Args:
        start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
        end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
            are not taken into account for computing the loss.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = outputs[0]

    logits = self.qa_outputs(last_hidden_state)
    start_logits, end_logits = logits.split(1, axis=-1)
    start_logits = start_logits.squeeze(-1).contiguous()
    end_logits = end_logits.squeeze(-1).contiguous()

    total_loss = None
    if start_positions is not None and end_positions is not None:
        # If we are on multi-GPU, split add a dimension
        if len(start_positions.shape) > 1:
            start_positions = start_positions.squeeze(-1)
        if len(end_positions.shape) > 1:
            end_positions = end_positions.squeeze(-1)
        # sometimes the start/end positions are outside our model inputs, we ignore these terms
        ignored_index = start_logits.shape[1]
        start_positions = start_positions.clamp(0, ignored_index)
        end_positions = end_positions.clamp(0, ignored_index)

        loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
        start_positions = start_positions.astype(mindspore.int32)
        start_loss = loss_fct(start_logits, start_positions)
        end_positions = end_positions.astype(mindspore.int32)
        end_loss = loss_fct(end_logits, end_positions)
        total_loss = (start_loss + end_loss) / 2

    if not return_dict:
        output = (start_logits, end_logits) + outputs[1:]
        return ((total_loss,) + output) if total_loss is not None else output

    return QuestionAnsweringModelOutput(
        loss=total_loss,
        start_logits=start_logits,
        end_logits=end_logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForSequenceClassification

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForSequenceClassification(FunnelPreTrainedModel):
    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.funnel = FunnelBaseModel(config)
        self.classifier = FunnelClassificationHead(config, config.num_labels)
        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SequenceClassifierOutput]:
        r"""
        Args:
            labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
                Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
                config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
                `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = outputs[0]
        pooled_output = last_hidden_state[:, 0]
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and (labels.dtype in (mindspore.int64, mindspore.int32)):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = ops.MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = nn.CrossEntropyLoss()
                labels = labels.astype(mindspore.int32)
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = ops.BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
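When `config.problem_type` is unset, the sequence-classification head above infers the loss from `num_labels` and the label dtype. The helper below is a hypothetical stand-alone mirror of that heuristic (it is not part of the library API):

```python
import mindspore

def infer_problem_type(num_labels: int, labels: mindspore.Tensor) -> str:
    """Sketch of the problem_type heuristic in FunnelForSequenceClassification.forward."""
    if num_labels == 1:
        return "regression"
    if labels.dtype in (mindspore.int64, mindspore.int32):
        return "single_label_classification"
    return "multi_label_classification"

print(infer_problem_type(3, mindspore.Tensor([1, 0, 2], mindspore.int32)))  # single_label_classification
print(infer_problem_type(3, mindspore.Tensor([[1.0, 0.0, 1.0]])))           # multi_label_classification
print(infer_problem_type(1, mindspore.Tensor([0.7])))                       # regression
```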

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForSequenceClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss); if config.num_labels > 1 a classification loss is computed (Cross-Entropy).

TYPE: `torch.LongTensor` of shape `(batch_size,)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, SequenceClassifierOutput]:
    r"""
    Args:
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = outputs[0]
    pooled_output = last_hidden_state[:, 0]
    logits = self.classifier(pooled_output)

    loss = None
    if labels is not None:
        if self.config.problem_type is None:
            if self.num_labels == 1:
                self.config.problem_type = "regression"
            elif self.num_labels > 1 and (labels.dtype in (mindspore.int64, mindspore.int32)):
                self.config.problem_type = "single_label_classification"
            else:
                self.config.problem_type = "multi_label_classification"

        if self.config.problem_type == "regression":
            loss_fct = ops.MSELoss()
            if self.num_labels == 1:
                loss = loss_fct(logits.squeeze(), labels.squeeze())
            else:
                loss = loss_fct(logits, labels)
        elif self.config.problem_type == "single_label_classification":
            loss_fct = nn.CrossEntropyLoss()
            labels = labels.astype(mindspore.int32)
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        elif self.config.problem_type == "multi_label_classification":
            loss_fct = ops.BCEWithLogitsLoss()
            loss = loss_fct(logits, labels)

    if not return_dict:
        output = (logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return SequenceClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForTokenClassification

Bases: FunnelPreTrainedModel

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelForTokenClassification(FunnelPreTrainedModel):
    def __init__(self, config: FunnelConfig) -> None:
        super().__init__(config)
        self.num_labels = config.num_labels

        self.funnel = FunnelModel(config)
        self.dropout = nn.Dropout(p=config.hidden_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, TokenClassifierOutput]:
        r"""
        Args:
            labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.funnel(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        last_hidden_state = outputs[0]
        last_hidden_state = self.dropout(last_hidden_state)
        logits = self.classifier(last_hidden_state)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            labels = labels.astype(mindspore.int32)
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
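The token-classification loss above flattens logits and labels before the cross-entropy, so every token is treated as an independent classification sample. A shape-only sketch with random logits and all-zero labels, assuming MindSpore 2.x:

```python
import mindspore
from mindspore import nn, ops

num_labels = 5
logits = ops.randn(2, 7, num_labels)         # (batch_size, seq_len, num_labels)
labels = ops.zeros((2, 7), mindspore.int32)  # one class index per token

loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))  # (14, 5) scores vs (14,) targets
print(loss)
```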

mindnlp.transformers.models.funnel.modeling_funnel.FunnelForTokenClassification.forward(input_ids=None, attention_mask=None, token_type_ids=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the token classification loss. Indices should be in [0, ..., config.num_labels - 1].

TYPE: `torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, TokenClassifierOutput]:
    r"""
    Args:
        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    outputs = self.funnel(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        inputs_embeds=inputs_embeds,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    last_hidden_state = outputs[0]
    last_hidden_state = self.dropout(last_hidden_state)
    logits = self.classifier(last_hidden_state)

    loss = None
    if labels is not None:
        loss_fct = nn.CrossEntropyLoss()
        labels = labels.astype(mindspore.int32)
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

    if not return_dict:
        output = (logits,) + outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )

mindnlp.transformers.models.funnel.modeling_funnel.FunnelPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = FunnelConfig
    base_model_prefix = "funnel"

    def _init_weights(self, cell):
        classname = cell.__class__.__name__
        if classname.find("Dense") != -1:
            if getattr(cell, "weight", None) is not None:
                if self.config.initializer_std is None:
                    fan_out, fan_in = cell.weight.shape
                    std = np.sqrt(1.0 / float(fan_in + fan_out))
                else:
                    std = self.config.initializer_std
                cell.weight.set_data(initializer(Normal(std),
                                                    cell.weight.shape, cell.weight.dtype))
            if getattr(cell, "bias", None) is not None:
                cell.bias[:] = 0.0
        elif classname == "FunnelRelMultiheadAttention":
            minval = Tensor(0.0, mindspore.float32)
            maxval = Tensor(self.config.initializer_range, mindspore.float32)
            cell.r_w_bias = ops.uniform(cell.r_w_bias.shape, minval, maxval)
            cell.r_r_bias = ops.uniform(cell.r_r_bias.shape, minval, maxval)
            cell.r_kernel = ops.uniform(cell.r_kernel.shape, minval, maxval)
            cell.r_s_bias = ops.uniform(cell.r_s_bias.shape, minval, maxval)
            cell.seg_embed = ops.uniform(cell.seg_embed.shape, minval, maxval)
        elif classname == "FunnelEmbeddings":
            std = 1.0 if self.config.initializer_std is None else self.config.initializer_std
            # nn.init.normal_(cell.word_embeddings.weight, std=std)
            cell.word_embeddings.weight.set_data(initializer(Normal(std),
                                                    cell.word_embeddings.weight.shape, cell.word_embeddings.weight.dtype))
            if cell.word_embeddings.padding_idx is not None:
                cell.word_embeddings.weight.data[cell.word_embeddings.padding_idx].zero_()
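When `initializer_std` is None, `_init_weights` above derives the std for linear layers from the weight's fan-in and fan-out. A quick check of the resulting value for a 3072 x 768 feed-forward weight (plain NumPy, mirroring the formula in the code):

```python
import numpy as np

fan_out, fan_in = 768, 3072
std = np.sqrt(1.0 / float(fan_in + fan_out))
print(round(std, 5))  # 0.01614
```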

mindnlp.transformers.models.funnel.modeling_funnel.FunnelRelMultiheadAttention

Bases: Module

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
class FunnelRelMultiheadAttention(nn.Module):
    def __init__(self, config: FunnelConfig, block_index: int) -> None:
        super().__init__()
        self.config = config
        self.block_index = block_index
        d_model, n_head, d_head = config.d_model, config.n_head, config.d_head

        self.hidden_dropout = nn.Dropout(p=config.hidden_dropout)
        self.attention_dropout = nn.Dropout(p=config.attention_dropout)

        self.q_head = nn.Linear(d_model, n_head * d_head, bias=False)
        self.k_head = nn.Linear(d_model, n_head * d_head)
        self.v_head = nn.Linear(d_model, n_head * d_head)

        self.r_w_bias = mindspore.Parameter(ops.zeros([n_head, d_head]))
        self.r_r_bias = mindspore.Parameter(ops.zeros([n_head, d_head]))
        self.r_kernel = mindspore.Parameter(ops.zeros([d_model, n_head, d_head]))
        self.r_s_bias = mindspore.Parameter(ops.zeros([n_head, d_head]))
        self.seg_embed = mindspore.Parameter(ops.zeros([2, n_head, d_head]))

        self.post_proj = nn.Linear(n_head * d_head, d_model)
        self.layer_norm = nn.LayerNorm([d_model], eps=config.layer_norm_eps)
        self.scale = 1.0 / (d_head**0.5)

    def relative_positional_attention(self, position_embeds, q_head, context_len, cls_mask=None):
        """Relative attention score for the positional encodings"""
        # q_head has shape batch_size x seq_len x n_head x d_head
        if self.config.attention_type == "factorized":
            # Notations from the paper, appendix A.2.2, final formula (https://arxiv.org/abs/2006.03236)
            # phi and pi have shape seq_len x d_model, psi and omega have shape context_len x d_model
            phi, pi, psi, omega = position_embeds
            # Shape n_head x d_head
            u = self.r_r_bias * self.scale
            # Shape d_model x n_head x d_head
            w_r = self.r_kernel

            # Shape batch_size x seq_len x n_head x d_model
            q_r_attention = ops.einsum("binh,dnh->bind", q_head + u, w_r)
            q_r_attention_1 = q_r_attention * phi[:, None]
            q_r_attention_2 = q_r_attention * pi[:, None]

            # Shape batch_size x n_head x seq_len x context_len
            positional_attn = ops.einsum("bind,jd->bnij", q_r_attention_1, psi) + ops.einsum(
                "bind,jd->bnij", q_r_attention_2, omega
            )
        else:
            shift = 2 if q_head.shape[1] != context_len else 1
            # Notations from the paper, appendix A.2.1, final formula (https://arxiv.org/abs/2006.03236)
            # Grab the proper positional encoding, shape max_rel_len x d_model
            r = position_embeds[self.block_index][shift - 1]
            # Shape n_head x d_head
            v = self.r_r_bias * self.scale
            # Shape d_model x n_head x d_head
            w_r = self.r_kernel

            # Shape max_rel_len x n_head x d_model
            r_head = ops.einsum("td,dnh->tnh", r, w_r)
            # Shape batch_size x n_head x seq_len x max_rel_len
            positional_attn = ops.einsum("binh,tnh->bnit", q_head + v, r_head)
            # Shape batch_size x n_head x seq_len x context_len
            positional_attn = _relative_shift_gather(positional_attn, context_len, shift)

        if cls_mask is not None:
            positional_attn *= cls_mask
        return positional_attn

    def relative_token_type_attention(self, token_type_mat, q_head, cls_mask=None):
        """Relative attention score for the token_type_ids"""
        if token_type_mat is None:
            return 0
        batch_size, seq_len, context_len = token_type_mat.shape
        # q_head has shape batch_size x seq_len x n_head x d_head
        # Shape n_head x d_head
        r_s_bias = self.r_s_bias * self.scale

        # Shape batch_size x n_head x seq_len x 2
        token_type_bias = ops.einsum("bind,snd->bnis", q_head + r_s_bias, self.seg_embed)
        # Shape batch_size x n_head x seq_len x context_len
        token_type_mat = token_type_mat[:, None].broadcast_to((batch_size, q_head.shape[2], seq_len, context_len))
        # Shapes batch_size x n_head x seq_len
        diff_token_type, same_token_type = ops.split(token_type_bias, 1, axis=-1)
        # Shape batch_size x n_head x seq_len x context_len
        # ops.Print(token_type_mat)
        token_type_mat = token_type_mat.astype(mindspore.bool_)
        token_type_attn = ops.where(
            token_type_mat, same_token_type.broadcast_to(token_type_mat.shape), diff_token_type.broadcast_to(token_type_mat.shape)
        )

        if cls_mask is not None:
            token_type_attn *= cls_mask
        return token_type_attn

    def forward(
        self,
        query: mindspore.Tensor,
        key: mindspore.Tensor,
        value: mindspore.Tensor,
        attention_inputs: Tuple[mindspore.Tensor],
        output_attentions: bool = False,
    ) -> Tuple[mindspore.Tensor, ...]:
        # query has shape batch_size x seq_len x d_model
        # key and value have shapes batch_size x context_len x d_model
        position_embeds, token_type_mat, attention_mask, cls_mask = attention_inputs

        batch_size, seq_len, _ = query.shape
        context_len = key.shape[1]
        n_head, d_head = self.config.n_head, self.config.d_head

        # Shape batch_size x seq_len x n_head x d_head
        q_head = self.q_head(query).view(batch_size, seq_len, n_head, d_head)
        # Shapes batch_size x context_len x n_head x d_head
        k_head = self.k_head(key).view(batch_size, context_len, n_head, d_head)
        v_head = self.v_head(value).view(batch_size, context_len, n_head, d_head)

        q_head = q_head * self.scale
        # Shape n_head x d_head
        r_w_bias = self.r_w_bias * self.scale
        # Shapes batch_size x n_head x seq_len x context_len
        content_score = ops.einsum("bind,bjnd->bnij", q_head + r_w_bias, k_head)
        positional_attn = self.relative_positional_attention(position_embeds, q_head, context_len, cls_mask)
        token_type_attn = self.relative_token_type_attention(token_type_mat, q_head, cls_mask)

        # merge attention scores
        attn_score = content_score + positional_attn + token_type_attn

        # precision safe in case of mixed precision training
        dtype = attn_score.dtype
        attn_score = attn_score.float()
        # perform masking
        if attention_mask is not None:
            attn_score = attn_score - INF * (1 - attention_mask[:, None, None].float())
        # attention probability
        attn_prob = ops.softmax(attn_score, axis=-1, dtype=dtype)
        attn_prob = self.attention_dropout(attn_prob)

        # attention output, shape batch_size x seq_len x n_head x d_head
        attn_vec = ops.einsum("bnij,bjnd->bind", attn_prob, v_head)

        # Shape batch_size x seq_len x d_model
        attn_out = self.post_proj(attn_vec.reshape(batch_size, seq_len, n_head * d_head))
        attn_out = self.hidden_dropout(attn_out)

        output = self.layer_norm(query + attn_out)
        return (output, attn_prob) if output_attentions else (output,)
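The attention score in `forward` above is the sum of a content term, a relative-position term, and a token-type term, each built with an einsum contraction over the head dimension. A shape-only sketch of the content term (random tensors; `ops.einsum` availability may depend on the MindSpore backend, just as for the model code itself):

```python
import mindspore
from mindspore import ops

batch_size, seq_len, context_len, n_head, d_head = 2, 8, 8, 4, 16
q_head = ops.randn(batch_size, seq_len, n_head, d_head)
k_head = ops.randn(batch_size, context_len, n_head, d_head)

# Contract over d_head: (b, i, n, d) x (b, j, n, d) -> (b, n, i, j)
content_score = ops.einsum("bind,bjnd->bnij", q_head, k_head)
print(content_score.shape)  # (2, 4, 8, 8)
```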

mindnlp.transformers.models.funnel.modeling_funnel.FunnelRelMultiheadAttention.relative_positional_attention(position_embeds, q_head, context_len, cls_mask=None)

Relative attention score for the positional encodings

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def relative_positional_attention(self, position_embeds, q_head, context_len, cls_mask=None):
    """Relative attention score for the positional encodings"""
    # q_head has shape batch_size x seq_len x n_head x d_head
    if self.config.attention_type == "factorized":
        # Notations from the paper, appendix A.2.2, final formula (https://arxiv.org/abs/2006.03236)
        # phi and pi have shape seq_len x d_model, psi and omega have shape context_len x d_model
        phi, pi, psi, omega = position_embeds
        # Shape n_head x d_head
        u = self.r_r_bias * self.scale
        # Shape d_model x n_head x d_head
        w_r = self.r_kernel

        # Shape batch_size x seq_len x n_head x d_model
        q_r_attention = ops.einsum("binh,dnh->bind", q_head + u, w_r)
        q_r_attention_1 = q_r_attention * phi[:, None]
        q_r_attention_2 = q_r_attention * pi[:, None]

        # Shape batch_size x n_head x seq_len x context_len
        positional_attn = ops.einsum("bind,jd->bnij", q_r_attention_1, psi) + ops.einsum(
            "bind,jd->bnij", q_r_attention_2, omega
        )
    else:
        shift = 2 if q_head.shape[1] != context_len else 1
        # Notations from the paper, appendix A.2.1, final formula (https://arxiv.org/abs/2006.03236)
        # Grab the proper positional encoding, shape max_rel_len x d_model
        r = position_embeds[self.block_index][shift - 1]
        # Shape n_head x d_head
        v = self.r_r_bias * self.scale
        # Shape d_model x n_head x d_head
        w_r = self.r_kernel

        # Shape max_rel_len x n_head x d_model
        r_head = ops.einsum("td,dnh->tnh", r, w_r)
        # Shape batch_size x n_head x seq_len x max_rel_len
        positional_attn = ops.einsum("binh,tnh->bnit", q_head + v, r_head)
        # Shape batch_size x n_head x seq_len x context_len
        positional_attn = _relative_shift_gather(positional_attn, context_len, shift)

    if cls_mask is not None:
        positional_attn *= cls_mask
    return positional_attn

mindnlp.transformers.models.funnel.modeling_funnel.FunnelRelMultiheadAttention.relative_token_type_attention(token_type_mat, q_head, cls_mask=None)

Relative attention score for the token_type_ids

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def relative_token_type_attention(self, token_type_mat, q_head, cls_mask=None):
    """Relative attention score for the token_type_ids"""
    if token_type_mat is None:
        return 0
    batch_size, seq_len, context_len = token_type_mat.shape
    # q_head has shape batch_size x seq_len x n_head x d_head
    # Shape n_head x d_head
    r_s_bias = self.r_s_bias * self.scale

    # Shape batch_size x n_head x seq_len x 2
    token_type_bias = ops.einsum("bind,snd->bnis", q_head + r_s_bias, self.seg_embed)
    # Shape batch_size x n_head x seq_len x context_len
    token_type_mat = token_type_mat[:, None].broadcast_to((batch_size, q_head.shape[2], seq_len, context_len))
    # Shapes batch_size x n_head x seq_len
    diff_token_type, same_token_type = ops.split(token_type_bias, 1, axis=-1)
    # Shape batch_size x n_head x seq_len x context_len
    # ops.Print(token_type_mat)
    token_type_mat = token_type_mat.astype(mindspore.bool_)
    token_type_attn = ops.where(
        token_type_mat, same_token_type.broadcast_to(token_type_mat.shape), diff_token_type.broadcast_to(token_type_mat.shape)
    )

    if cls_mask is not None:
        token_type_attn *= cls_mask
    return token_type_attn
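The `ops.where` call above picks, per (query, key) pair, between the "same segment" and "different segment" bias depending on `token_type_mat`. A toy illustration with constant biases and a hand-written token-type match matrix:

```python
import mindspore
from mindspore import ops

same_bias = ops.ones((1, 1, 3, 4))   # used where query and key share a token type
diff_bias = ops.zeros((1, 1, 3, 4))  # used where they differ
token_type_mat = mindspore.Tensor([[[[True, False, True, True],
                                     [False, True, False, False],
                                     [True, True, True, False]]]])

token_type_attn = ops.where(token_type_mat, same_bias, diff_bias)
print(token_type_attn.shape)  # (1, 1, 3, 4)
```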

mindnlp.transformers.models.funnel.modeling_funnel.upsample(x, stride, target_len, separate_cls=True, truncate_seq=False)

Upsample tensor x to match target_len by repeating the tokens stride time on the sequence length dimension.

Source code in mindnlp/transformers/models/funnel/modeling_funnel.py
def upsample(
    x: mindspore.Tensor, stride: int, target_len: int, separate_cls: bool = True, truncate_seq: bool = False
) -> mindspore.Tensor:
    """
    Upsample tensor `x` to match `target_len` by repeating the tokens `stride` time on the sequence length dimension.
    """
    if stride == 1:
        return x
    if separate_cls:
        cls = x[:, :1]
        x = x[:, 1:]
    output = ops.repeat_interleave(x, repeats=stride, axis=1)
    if separate_cls:
        if truncate_seq:
            output = ops.pad(output, (0, 0, 0, stride - 1, 0, 0))
        output = output[:, : target_len - 1]
        output = ops.cat([cls, output], axis=1)
    else:
        output = output[:, :target_len]
    return output
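`upsample` reverses the pooling by repeating every token `stride` times along the sequence axis (with extra care for the `<cls>` token when `separate_cls=True`). The core of that operation on a toy tensor, assuming MindSpore 2.x:

```python
import mindspore
from mindspore import ops

x = mindspore.Tensor([[[1.0], [2.0], [3.0]]])        # (batch=1, seq_len=3, d_model=1)
out = ops.repeat_interleave(x, repeats=2, axis=1)    # each token repeated stride=2 times
print(out.squeeze(-1))  # [[1. 1. 2. 2. 3. 3.]]
```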

mindnlp.transformers.models.funnel.tokenization_funnel

Tokenization class for Funnel Transformer.

mindnlp.transformers.models.funnel.tokenization_funnel.BasicTokenizer

Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

PARAMETER DESCRIPTION
do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

never_split

Collection of tokens which will never be split during tokenization. Only has an effect when do_basic_tokenize=True

TYPE: `Iterable`, *optional* DEFAULT: None

tokenize_chinese_chars

Whether or not to tokenize Chinese characters.

This should likely be deactivated for Japanese (see this issue: https://github.com/huggingface/transformers/issues/328).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

strip_accents

Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for lowercase (as in the original BERT).

TYPE: `bool`, *optional* DEFAULT: None

do_split_on_punc

In some instances we want to skip the basic punctuation splitting so that later tokenization can capture the full context of the words, such as contractions.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
class BasicTokenizer():
    """
    Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).

    Args:
        do_lower_case (`bool`, *optional*, defaults to `True`):
            Whether or not to lowercase the input when tokenizing.
        never_split (`Iterable`, *optional*):
            Collection of tokens which will never be split during tokenization. Only has an effect when
            `do_basic_tokenize=True`
        tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
            Whether or not to tokenize Chinese characters.

            This should likely be deactivated for Japanese (see this
            [issue](https://github.com/huggingface/transformers/issues/328)).
        strip_accents (`bool`, *optional*):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for `lowercase` (as in the original BERT).
        do_split_on_punc (`bool`, *optional*, defaults to `True`):
            In some instances we want to skip the basic punctuation splitting so that later tokenization can capture
            the full context of the words, such as contractions.
    """

    def __init__(
        self,
        do_lower_case=True,
        never_split=None,
        tokenize_chinese_chars=True,
        strip_accents=None,
        do_split_on_punc=True,
    ):
        if never_split is None:
            never_split = []
        self.do_lower_case = do_lower_case
        self.never_split = set(never_split)
        self.tokenize_chinese_chars = tokenize_chinese_chars
        self.strip_accents = strip_accents
        self.do_split_on_punc = do_split_on_punc

    def tokenize(self, text, never_split=None):
        """
        Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

        Args:
            never_split (`List[str]`, *optional*):
                Kept for backward compatibility purposes. Now implemented directly at the base class level (see
                [`PreTrainedTokenizer.tokenize`]). List of tokens not to split.
        """
        # union() returns a new set by concatenating the two sets.
        never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        if self.tokenize_chinese_chars:
            text = self._tokenize_chinese_chars(text)
        # prevents treating the same character with different unicode codepoints as different characters
        unicode_normalized_text = unicodedata.normalize("NFC", text)
        orig_tokens = whitespace_tokenize(unicode_normalized_text)
        split_tokens = []
        for token in orig_tokens:
            if token not in never_split:
                if self.do_lower_case:
                    token = token.lower()
                    if self.strip_accents is not False:
                        token = self._run_strip_accents(token)
                elif self.strip_accents:
                    token = self._run_strip_accents(token)
            split_tokens.extend(self._run_split_on_punc(token, never_split))

        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize("NFD", text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == "Mn":
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text, never_split=None):
        """Splits punctuation on a piece of text."""
        if not self.do_split_on_punc or (never_split is not None and text in never_split):
            return [text]
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1

        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like all of the other languages.
        if (
            (cp >= 0x4E00 and cp <= 0x9FFF)
            or (cp >= 0x3400 and cp <= 0x4DBF)  #
            or (cp >= 0x20000 and cp <= 0x2A6DF)  #
            or (cp >= 0x2A700 and cp <= 0x2B73F)  #
            or (cp >= 0x2B740 and cp <= 0x2B81F)  #
            or (cp >= 0x2B820 and cp <= 0x2CEAF)  #
            or (cp >= 0xF900 and cp <= 0xFAFF)
            or (cp >= 0x2F800 and cp <= 0x2FA1F)  #
        ):  #
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xFFFD or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

mindnlp.transformers.models.funnel.tokenization_funnel.BasicTokenizer.tokenize(text, never_split=None)

Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def tokenize(self, text, never_split=None):
    """
    Basic Tokenization of a piece of text. For sub-word tokenization, see WordPieceTokenizer.

    Args:
        never_split (`List[str]`, *optional*):
            Kept for backward compatibility purposes. Now implemented directly at the base class level (see
            [`PreTrainedTokenizer.tokenize`]). List of tokens not to split.
    """
    # union() returns a new set by concatenating the two sets.
    never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
    text = self._clean_text(text)

    # This was added on November 1st, 2018 for the multilingual and Chinese
    # models. This is also applied to the English models now, but it doesn't
    # matter since the English models were not trained on any Chinese data
    # and generally don't have any Chinese data in them (there are Chinese
    # characters in the vocabulary because Wikipedia does have some Chinese
    # words in the English Wikipedia.).
    if self.tokenize_chinese_chars:
        text = self._tokenize_chinese_chars(text)
    # prevents treating the same character with different unicode codepoints as different characters
    unicode_normalized_text = unicodedata.normalize("NFC", text)
    orig_tokens = whitespace_tokenize(unicode_normalized_text)
    split_tokens = []
    for token in orig_tokens:
        if token not in never_split:
            if self.do_lower_case:
                token = token.lower()
                if self.strip_accents is not False:
                    token = self._run_strip_accents(token)
            elif self.strip_accents:
                token = self._run_strip_accents(token)
        split_tokens.extend(self._run_split_on_punc(token, never_split))

    output_tokens = whitespace_tokenize(" ".join(split_tokens))
    return output_tokens

mindnlp.transformers.models.funnel.tokenization_funnel.FunnelTokenizer

Bases: PreTrainedTokenizer

Construct a Funnel Transformer tokenizer. Based on WordPiece.

This tokenizer inherits from [PreTrainedTokenizer] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

File containing the vocabulary.

TYPE: `str`

do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

do_basic_tokenize

Whether or not to do basic tokenization before WordPiece.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

never_split

Collection of tokens which will never be split during tokenization. Only has an effect when do_basic_tokenize=True

TYPE: `Iterable`, *optional* DEFAULT: None

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

sep_token

The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.

TYPE: `str`, *optional*, defaults to `"<sep>"` DEFAULT: '<sep>'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

cls_token

The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.

TYPE: `str`, *optional*, defaults to `"<cls>"` DEFAULT: '<cls>'

mask_token

The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.

TYPE: `str`, *optional*, defaults to `"<mask>"` DEFAULT: '<mask>'

bos_token

The beginning of sentence token.

TYPE: `str`, *optional*, defaults to `"<s>"` DEFAULT: '<s>'

eos_token

The end of sentence token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

tokenize_chinese_chars

Whether or not to tokenize Chinese characters.

This should likely be deactivated for Japanese (see this issue).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

strip_accents

Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for lowercase (as in the original BERT).

TYPE: `bool`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
class FunnelTokenizer(PreTrainedTokenizer):
    r"""
    Construct a Funnel Transformer tokenizer. Based on WordPiece.

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            File containing the vocabulary.
        do_lower_case (`bool`, *optional*, defaults to `True`):
            Whether or not to lowercase the input when tokenizing.
        do_basic_tokenize (`bool`, *optional*, defaults to `True`):
            Whether or not to do basic tokenization before WordPiece.
        never_split (`Iterable`, *optional*):
            Collection of tokens which will never be split during tokenization. Only has an effect when
            `do_basic_tokenize=True`
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (`str`, *optional*, defaults to `"<sep>"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (`str`, *optional*, defaults to `"<cls>"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (`str`, *optional*, defaults to `"<mask>"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sentence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sentence token.
        tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
            Whether or not to tokenize Chinese characters.

            This should likely be deactivated for Japanese (see this
            [issue](https://github.com/huggingface/transformers/issues/328)).
        strip_accents (`bool`, *optional*):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for `lowercase` (as in the original BERT).
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    cls_token_type_id: int = 2

    def __init__(
        self,
        vocab_file,
        do_lower_case=True,
        do_basic_tokenize=True,
        never_split=None,
        unk_token="<unk>",
        sep_token="<sep>",
        pad_token="<pad>",
        cls_token="<cls>",
        mask_token="<mask>",
        bos_token="<s>",
        eos_token="</s>",
        tokenize_chinese_chars=True,
        strip_accents=None,
        **kwargs,
    ):
        if not os.path.isfile(vocab_file):
            raise ValueError(
                f"Can't find a vocabulary file at path '{vocab_file}'. To load the vocabulary from a Google pretrained"
                " model use `tokenizer = FunnelTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`"
            )
        self.vocab = load_vocab(vocab_file)
        self.ids_to_tokens = collections.OrderedDict([(ids, tok) for tok, ids in self.vocab.items()])
        self.do_basic_tokenize = do_basic_tokenize
        if do_basic_tokenize:
            self.basic_tokenizer = BasicTokenizer(
                do_lower_case=do_lower_case,
                never_split=never_split,
                tokenize_chinese_chars=tokenize_chinese_chars,
                strip_accents=strip_accents,
            )
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab, unk_token=str(unk_token))

        super().__init__(
            do_lower_case=do_lower_case,
            do_basic_tokenize=do_basic_tokenize,
            never_split=never_split,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            bos_token=bos_token,
            eos_token=eos_token,
            tokenize_chinese_chars=tokenize_chinese_chars,
            strip_accents=strip_accents,
            **kwargs,
        )

    @property
    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.do_lower_case
    def do_lower_case(self):
        return self.basic_tokenizer.do_lower_case

    @property
    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.vocab_size
    def vocab_size(self):
        return len(self.vocab)

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.get_vocab
    def get_vocab(self):
        return dict(self.vocab, **self.added_tokens_encoder)

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer._tokenize
    def _tokenize(self, text, split_special_tokens=False):
        split_tokens = []
        if self.do_basic_tokenize:
            for token in self.basic_tokenizer.tokenize(
                text, never_split=self.all_special_tokens if not split_special_tokens else None
            ):
                # If the token is part of the never_split set
                if token in self.basic_tokenizer.never_split:
                    split_tokens.append(token)
                else:
                    split_tokens += self.wordpiece_tokenizer.tokenize(token)
        else:
            split_tokens = self.wordpiece_tokenizer.tokenize(text)
        return split_tokens

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer._convert_token_to_id
    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        return self.vocab.get(token, self.vocab.get(self.unk_token))

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer._convert_id_to_token
    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.ids_to_tokens.get(index, self.unk_token)

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.convert_tokens_to_string
    def convert_tokens_to_string(self, tokens):
        """Converts a sequence of tokens (string) in a single string."""
        out_string = " ".join(tokens).replace(" ##", "").strip()
        return out_string

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.build_inputs_with_special_tokens
    def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
        adding special tokens. A BERT sequence has the following format:

        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] B [SEP]`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        if token_ids_1 is None:
            return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
        cls = [self.cls_token_id]
        sep = [self.sep_token_id]
        return cls + token_ids_0 + sep + token_ids_1 + sep

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.get_special_tokens_mask
    def get_special_tokens_mask(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
    ) -> List[int]:
        """
        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
        special tokens using the tokenizer `prepare_for_model` method.

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.
            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
                Whether or not the token list is already formatted with special tokens for the model.

        Returns:
            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
        """

        if already_has_special_tokens:
            return super().get_special_tokens_mask(
                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
            )

        if token_ids_1 is not None:
            return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]
        return [1] + ([0] * len(token_ids_0)) + [1]

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel
        Transformer sequence pair mask has the following format:

        ```
        2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |
        ```

        If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]

    # Copied from transformers.models.bert.tokenization_bert.BertTokenizer.save_vocabulary
    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        index = 0
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
            )
        else:
            vocab_file = (filename_prefix + "-" if filename_prefix else "") + save_directory
        with open(vocab_file, "w", encoding="utf-8") as writer:
            for token, token_index in sorted(self.vocab.items(), key=lambda kv: kv[1]):
                if index != token_index:
                    logger.warning(
                        f"Saving vocabulary to {vocab_file}: vocabulary indices are not consecutive."
                        " Please check that the vocabulary is not corrupted!"
                    )
                    index = token_index
                writer.write(token + "\n")
                index += 1
        return (vocab_file,)
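
A minimal usage sketch (not part of the original docstrings). It assumes that `FunnelTokenizer` is re-exported from `mindnlp.transformers` and that the `funnel-transformer/small` checkpoint can be downloaded, mirroring the Hugging Face layout:

```python
# Hedged sketch: the package-level import and checkpoint availability are assumptions.
from mindnlp.transformers import FunnelTokenizer

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
enc = tokenizer("Hello world", "How are you?")
print(enc["input_ids"])       # <cls> A <sep> B <sep>, as ids
print(enc["token_type_ids"])  # starts with 2 (cls_token_type_id), then 0s and 1s
```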

mindnlp.transformers.models.funnel.tokenization_funnel.FunnelTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

  • single sequence: [CLS] X [SEP]
  • pair of sequences: [CLS] A [SEP] B [SEP]
PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: List of input IDs with the appropriate special tokens.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def build_inputs_with_special_tokens(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
    adding special tokens. A BERT sequence has the following format:

    - single sequence: `[CLS] X [SEP]`
    - pair of sequences: `[CLS] A [SEP] B [SEP]`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    if token_ids_1 is None:
        return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
    cls = [self.cls_token_id]
    sep = [self.sep_token_id]
    return cls + token_ids_0 + sep + token_ids_1 + sep
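
To make the composition concrete, here is a self-contained toy illustration (made-up IDs, not real vocabulary entries) of what `build_inputs_with_special_tokens` returns:

```python
# Toy IDs for illustration only: pretend cls_token_id == 1 and sep_token_id == 2.
cls_id, sep_id = 1, 2
token_ids_0, token_ids_1 = [10, 11, 12], [20, 21]

single = [cls_id] + token_ids_0 + [sep_id]
pair = [cls_id] + token_ids_0 + [sep_id] + token_ids_1 + [sep_id]
print(single)  # [1, 10, 11, 12, 2]
print(pair)    # [1, 10, 11, 12, 2, 20, 21, 2]
```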

mindnlp.transformers.models.funnel.tokenization_funnel.FunnelTokenizer.convert_tokens_to_string(tokens)

Converts a sequence of tokens (strings) into a single string.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def convert_tokens_to_string(self, tokens):
    """Converts a sequence of tokens (string) in a single string."""
    out_string = " ".join(tokens).replace(" ##", "").strip()
    return out_string
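
The join rule is simple enough to reproduce standalone; the sketch below shows how the `##` continuation prefix is merged back into whole words:

```python
# Same join logic as convert_tokens_to_string, applied to example wordpieces.
tokens = ["un", "##aff", "##able", "words"]
out_string = " ".join(tokens).replace(" ##", "").strip()
print(out_string)  # "unaffable words"
```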

mindnlp.transformers.models.funnel.tokenization_funnel.FunnelTokenizer.create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None)

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel Transformer sequence pair mask has the following format:

2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

PARAMETER DESCRIPTION
token_ids_0

List of IDs.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: List of token type IDs according to the given sequence(s).

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel
    Transformer sequence pair mask has the following format:

    ```
    2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
    | first sequence    | second sequence |
    ```

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        token_ids_0 (`List[int]`):
            List of IDs.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
    """
    sep = [self.sep_token_id]
    cls = [self.cls_token_id]
    if token_ids_1 is None:
        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
    return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
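
A self-contained toy illustration of the resulting pattern (note the leading `2` from `cls_token_type_id`, which differs from BERT's all-zero first segment):

```python
# Lengths are all that matter here; the IDs themselves are made up.
cls_token_type_id = 2
token_ids_0, token_ids_1 = [10, 11, 12], [20, 21]
sep = [0]  # stand-in for [sep_token_id]; only len(sep) is used

type_ids = [cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
print(type_ids)  # [2, 0, 0, 0, 0, 1, 1, 1]
```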

mindnlp.transformers.models.funnel.tokenization_funnel.FunnelTokenizer.get_special_tokens_mask(token_ids_0, token_ids_1=None, already_has_special_tokens=False)

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

PARAMETER DESCRIPTION
token_ids_0

List of IDs.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

already_has_special_tokens

Whether or not the token list is already formatted with special tokens for the model.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

RETURNS DESCRIPTION
List[int]

List[int]: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def get_special_tokens_mask(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]:
    """
    Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
    special tokens using the tokenizer `prepare_for_model` method.

    Args:
        token_ids_0 (`List[int]`):
            List of IDs.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.
        already_has_special_tokens (`bool`, *optional*, defaults to `False`):
            Whether or not the token list is already formatted with special tokens for the model.

    Returns:
        `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
    """

    if already_has_special_tokens:
        return super().get_special_tokens_mask(
            token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
        )

    if token_ids_1 is not None:
        return [1] + ([0] * len(token_ids_0)) + [1] + ([0] * len(token_ids_1)) + [1]
    return [1] + ([0] * len(token_ids_0)) + [1]
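
As a quick sanity check, the mask marks only the <cls> and <sep> positions with 1:

```python
# Toy sequences; the mask depends only on lengths, not on actual IDs.
token_ids_0, token_ids_1 = [10, 11, 12], [20, 21]
print([1] + [0] * len(token_ids_0) + [1])                                 # [1, 0, 0, 0, 1]
print([1] + [0] * len(token_ids_0) + [1] + [0] * len(token_ids_1) + [1])  # [1, 0, 0, 0, 1, 0, 0, 1]
```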

mindnlp.transformers.models.funnel.tokenization_funnel.WordpieceTokenizer

Runs WordPiece tokenization.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
class WordpieceTokenizer():
    """Runs WordPiece tokenization."""

    def __init__(self, vocab, unk_token, max_input_chars_per_word=100):
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, text):
        """
        Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
        tokenization using the given vocabulary.

        For example, `input = "unaffable"` will return as output `["un", "##aff", "##able"]`.

        Args:
            text: A single token or whitespace separated tokens. This should have
                already been passed through *BasicTokenizer*.

        Returns:
            A list of wordpiece tokens.
        """

        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue

            is_bad = False
            start = 0
            sub_tokens = []
            while start < len(chars):
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end

            if is_bad:
                output_tokens.append(self.unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens

mindnlp.transformers.models.funnel.tokenization_funnel.WordpieceTokenizer.tokenize(text)

Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform tokenization using the given vocabulary.

For example, input = "unaffable" will return as output ["un", "##aff", "##able"].

PARAMETER DESCRIPTION
text

A single token or whitespace separated tokens. This should have already been passed through BasicTokenizer.

RETURNS DESCRIPTION

A list of wordpiece tokens.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def tokenize(self, text):
    """
    Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
    tokenization using the given vocabulary.

    For example, `input = "unaffable"` will return as output `["un", "##aff", "##able"]`.

    Args:
        text: A single token or whitespace separated tokens. This should have
            already been passed through *BasicTokenizer*.

    Returns:
        A list of wordpiece tokens.
    """

    output_tokens = []
    for token in whitespace_tokenize(text):
        chars = list(token)
        if len(chars) > self.max_input_chars_per_word:
            output_tokens.append(self.unk_token)
            continue

        is_bad = False
        start = 0
        sub_tokens = []
        while start < len(chars):
            end = len(chars)
            cur_substr = None
            while start < end:
                substr = "".join(chars[start:end])
                if start > 0:
                    substr = "##" + substr
                if substr in self.vocab:
                    cur_substr = substr
                    break
                end -= 1
            if cur_substr is None:
                is_bad = True
                break
            sub_tokens.append(cur_substr)
            start = end

        if is_bad:
            output_tokens.append(self.unk_token)
        else:
            output_tokens.extend(sub_tokens)
    return output_tokens
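
The greedy longest-match-first behaviour is easiest to see with a tiny vocabulary. The sketch below instantiates `WordpieceTokenizer` directly; the import path is an assumption based on this module's layout:

```python
# Hedged sketch with a toy vocabulary; real vocabularies come from load_vocab().
from mindnlp.transformers.models.funnel.tokenization_funnel import WordpieceTokenizer

toy_vocab = {"un": 0, "##aff": 1, "##able": 2, "<unk>": 3}
wp = WordpieceTokenizer(vocab=toy_vocab, unk_token="<unk>")
print(wp.tokenize("unaffable"))  # ['un', '##aff', '##able']
print(wp.tokenize("xyz"))        # ['<unk>'] -- no prefix of "xyz" is in the vocabulary
```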

mindnlp.transformers.models.funnel.tokenization_funnel.load_vocab(vocab_file)

Loads a vocabulary file into a dictionary.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        token = token.rstrip("\n")
        vocab[token] = index
    return vocab
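
The mapping is simply token to line index. A throwaway check (the temporary file is illustrative only):

```python
# Hedged sketch: writes a four-line vocab file and reads it back with load_vocab.
import os
import tempfile
from mindnlp.transformers.models.funnel.tokenization_funnel import load_vocab

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("<unk>\n<cls>\n<sep>\nhello\n")
    path = f.name
print(dict(load_vocab(path)))  # {'<unk>': 0, '<cls>': 1, '<sep>': 2, 'hello': 3}
os.remove(path)
```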

mindnlp.transformers.models.funnel.tokenization_funnel.whitespace_tokenize(text)

Runs basic whitespace cleaning and splitting on a piece of text.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel.py
def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens
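
For completeness, the function strips surrounding whitespace and collapses internal runs before splitting:

```python
from mindnlp.transformers.models.funnel.tokenization_funnel import whitespace_tokenize  # assumed import path

print(whitespace_tokenize("  Hello   world \n"))  # ['Hello', 'world']
print(whitespace_tokenize("   "))                 # []
```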

mindnlp.transformers.models.funnel.tokenization_funnel_fast

Tokenization class for Funnel Transformer.

mindnlp.transformers.models.funnel.tokenization_funnel_fast.FunnelTokenizerFast

Bases: PreTrainedTokenizerFast

Construct a "fast" Funnel Transformer tokenizer (backed by HuggingFace's tokenizers library). Based on WordPiece.

This tokenizer inherits from [PreTrainedTokenizerFast] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab_file

File containing the vocabulary.

TYPE: `str` DEFAULT: None

do_lower_case

Whether or not to lowercase the input when tokenizing.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

unk_token

The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

TYPE: `str`, *optional*, defaults to `"<unk>"` DEFAULT: '<unk>'

sep_token

The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.

TYPE: `str`, *optional*, defaults to `"<sep>"` DEFAULT: '<sep>'

pad_token

The token used for padding, for example when batching sequences of different lengths.

TYPE: `str`, *optional*, defaults to `"<pad>"` DEFAULT: '<pad>'

cls_token

The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens.

TYPE: `str`, *optional*, defaults to `"<cls>"` DEFAULT: '<cls>'

mask_token

The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict.

TYPE: `str`, *optional*, defaults to `"<mask>"` DEFAULT: '<mask>'

clean_text

Whether or not to clean the text before tokenization by removing any control characters and replacing all whitespaces by the classic one.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

tokenize_chinese_chars

Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see this issue).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

bos_token

The beginning of sentence token.

TYPE: `str`, *optional*, defaults to `"<s>"` DEFAULT: '<s>'

eos_token

The end of sentence token.

TYPE: `str`, *optional*, defaults to `"</s>"` DEFAULT: '</s>'

strip_accents

Whether or not to strip all accents. If this option is not specified, then it will be determined by the value for lowercase (as in the original BERT).

TYPE: `bool`, *optional* DEFAULT: None

wordpieces_prefix

The prefix for subwords.

TYPE: `str`, *optional*, defaults to `"##"` DEFAULT: '##'

Source code in mindnlp/transformers/models/funnel/tokenization_funnel_fast.py
class FunnelTokenizerFast(PreTrainedTokenizerFast):
    r"""
    Construct a "fast" Funnel Transformer tokenizer (backed by HuggingFace's *tokenizers* library). Based on WordPiece.

    This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
    refer to this superclass for more information regarding those methods.

    Args:
        vocab_file (`str`):
            File containing the vocabulary.
        do_lower_case (`bool`, *optional*, defaults to `True`):
            Whether or not to lowercase the input when tokenizing.
        unk_token (`str`, *optional*, defaults to `"<unk>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        sep_token (`str`, *optional*, defaults to `"<sep>"`):
            The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
            sequence classification or for a text and a question for question answering. It is also used as the last
            token of a sequence built with special tokens.
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        cls_token (`str`, *optional*, defaults to `"<cls>"`):
            The classifier token which is used when doing sequence classification (classification of the whole sequence
            instead of per-token classification). It is the first token of the sequence when built with special tokens.
        mask_token (`str`, *optional*, defaults to `"<mask>"`):
            The token used for masking values. This is the token used when training this model with masked language
            modeling. This is the token which the model will try to predict.
        clean_text (`bool`, *optional*, defaults to `True`):
            Whether or not to clean the text before tokenization by removing any control characters and replacing all
            whitespaces by the classic one.
        tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
            Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this
            issue](https://github.com/huggingface/transformers/issues/328)).
        bos_token (`str`, *optional*, defaults to `"<s>"`):
            The beginning of sentence token.
        eos_token (`str`, *optional*, defaults to `"</s>"`):
            The end of sentence token.
        strip_accents (`bool`, *optional*):
            Whether or not to strip all accents. If this option is not specified, then it will be determined by the
            value for `lowercase` (as in the original BERT).
        wordpieces_prefix (`str`, *optional*, defaults to `"##"`):
            The prefix for subwords.
    """

    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    slow_tokenizer_class = FunnelTokenizer
    cls_token_type_id: int = 2

    def __init__(
        self,
        vocab_file=None,
        tokenizer_file=None,
        do_lower_case=True,
        unk_token="<unk>",
        sep_token="<sep>",
        pad_token="<pad>",
        cls_token="<cls>",
        mask_token="<mask>",
        bos_token="<s>",
        eos_token="</s>",
        clean_text=True,
        tokenize_chinese_chars=True,
        strip_accents=None,
        wordpieces_prefix="##",
        **kwargs,
    ):
        super().__init__(
            vocab_file,
            tokenizer_file=tokenizer_file,
            do_lower_case=do_lower_case,
            unk_token=unk_token,
            sep_token=sep_token,
            pad_token=pad_token,
            cls_token=cls_token,
            mask_token=mask_token,
            bos_token=bos_token,
            eos_token=eos_token,
            clean_text=clean_text,
            tokenize_chinese_chars=tokenize_chinese_chars,
            strip_accents=strip_accents,
            wordpieces_prefix=wordpieces_prefix,
            **kwargs,
        )

        normalizer_state = json.loads(self.backend_tokenizer.normalizer.__getstate__())
        if (
            normalizer_state.get("lowercase", do_lower_case) != do_lower_case
            or normalizer_state.get("strip_accents", strip_accents) != strip_accents
            or normalizer_state.get("handle_chinese_chars", tokenize_chinese_chars) != tokenize_chinese_chars
        ):
            normalizer_class = getattr(normalizers, normalizer_state.pop("type"))
            normalizer_state["lowercase"] = do_lower_case
            normalizer_state["strip_accents"] = strip_accents
            normalizer_state["handle_chinese_chars"] = tokenize_chinese_chars
            self.backend_tokenizer.normalizer = normalizer_class(**normalizer_state)

        self.do_lower_case = do_lower_case

    # Copied from transformers.models.bert.tokenization_bert_fast.BertTokenizerFast.build_inputs_with_special_tokens with BERT->Funnel
    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
        """
        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
        adding special tokens. A Funnel sequence has the following format:

        - single sequence: `[CLS] X [SEP]`
        - pair of sequences: `[CLS] A [SEP] B [SEP]`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        output = [self.cls_token_id] + token_ids_0 + [self.sep_token_id]

        if token_ids_1 is not None:
            output += token_ids_1 + [self.sep_token_id]

        return output

    def create_token_type_ids_from_sequences(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel
        Transformer sequence pair mask has the following format:

        ```
        2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |
        ```

        If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

        Args:
            token_ids_0 (`List[int]`):
                List of IDs.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
        """
        sep = [self.sep_token_id]
        cls = [self.cls_token_id]
        if token_ids_1 is None:
            return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]

    # Copied from transformers.models.bert.tokenization_bert_fast.BertTokenizerFast.save_vocabulary
    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        files = self._tokenizer.model.save(save_directory, name=filename_prefix)
        return tuple(files)
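
A minimal usage sketch for the fast tokenizer, under the same assumptions as the slow-tokenizer example above (package-level export and downloadable checkpoint):

```python
# Hedged sketch; the fast tokenizer should produce the same special-token layout.
from mindnlp.transformers import FunnelTokenizerFast

fast_tokenizer = FunnelTokenizerFast.from_pretrained("funnel-transformer/small")
enc = fast_tokenizer("Hello world", "How are you?")
print(enc["token_type_ids"])  # begins with 2 for <cls>, then 0s / 1s per segment
```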

mindnlp.transformers.models.funnel.tokenization_funnel_fast.FunnelTokenizerFast.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A Funnel sequence has the following format:

  • single sequence: [CLS] X [SEP]
  • pair of sequences: [CLS] A [SEP] B [SEP]
PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION

List[int]: List of input IDs with the appropriate special tokens.

Source code in mindnlp/transformers/models/funnel/tokenization_funnel_fast.py
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    """
    Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
    adding special tokens. A Funnel sequence has the following format:

    - single sequence: `[CLS] X [SEP]`
    - pair of sequences: `[CLS] A [SEP] B [SEP]`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    output = [self.cls_token_id] + token_ids_0 + [self.sep_token_id]

    if token_ids_1 is not None:
        output += token_ids_1 + [self.sep_token_id]

    return output

mindnlp.transformers.models.funnel.tokenization_funnel_fast.FunnelTokenizerFast.create_token_type_ids_from_sequences(token_ids_0, token_ids_1=None)

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel Transformer sequence pair mask has the following format:

2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

PARAMETER DESCRIPTION
token_ids_0

List of IDs.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: List of token type IDs according to the given sequence(s).

Source code in mindnlp/transformers/models/funnel/tokenization_funnel_fast.py
def create_token_type_ids_from_sequences(
    self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Create a mask from the two sequences passed to be used in a sequence-pair classification task. A Funnel
    Transformer sequence pair mask has the following format:

    ```
    2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
    | first sequence    | second sequence |
    ```

    If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).

    Args:
        token_ids_0 (`List[int]`):
            List of IDs.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
    """
    sep = [self.sep_token_id]
    cls = [self.cls_token_id]
    if token_ids_1 is None:
        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
    return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]