
bge_m3

mindnlp.transformers.models.bge_m3.configuration_bge_m3.BgeM3Config

Bases: PretrainedConfig

A class representing the configuration for a BgeM3 model.

This class inherits from the PretrainedConfig class and defines the configuration parameters for a BgeM3 model, including vocabulary size, hidden size, number of hidden layers, number of attention heads, intermediate size, activation function, dropout probabilities, maximum position embeddings, type vocabulary size, initializer range, layer normalization epsilon, padding token ID, beginning of sequence token ID, end of sequence token ID, position embedding type, cache usage, classifier dropout, Colbert dimension, sentence pooling method, and unused tokens.

PARAMETER DESCRIPTION
vocab_size

The size of the vocabulary.

TYPE: int DEFAULT: 30522

hidden_size

The size of the hidden layers.

TYPE: int DEFAULT: 768

num_hidden_layers

The number of hidden layers in the model.

TYPE: int DEFAULT: 12

num_attention_heads

The number of attention heads in the model.

TYPE: int DEFAULT: 12

intermediate_size

The size of the intermediate layer in the model.

TYPE: int DEFAULT: 3072

hidden_act

The activation function used in the hidden layers.

TYPE: str DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for the hidden layers.

TYPE: float DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability for attention probabilities.

TYPE: float DEFAULT: 0.1

max_position_embeddings

The maximum position embeddings in the model.

TYPE: int DEFAULT: 512

type_vocab_size

The size of the type vocabulary.

TYPE: int DEFAULT: 2

initializer_range

The range for parameter initialization.

TYPE: float DEFAULT: 0.02

layer_norm_eps

The epsilon value for layer normalization.

TYPE: float DEFAULT: 1e-12

pad_token_id

The ID for padding tokens.

TYPE: int DEFAULT: 1

bos_token_id

The ID for the beginning of sequence tokens.

TYPE: int DEFAULT: 0

eos_token_id

The ID for the end of sequence tokens.

TYPE: int DEFAULT: 2

position_embedding_type

The type of position embedding used.

TYPE: str DEFAULT: 'absolute'

use_cache

Flag indicating whether caching is used.

TYPE: bool DEFAULT: True

classifier_dropout

The dropout rate for the classifier layer.

TYPE: float DEFAULT: None

colbert_dim

The output dimension of the ColBERT projection layer. If None, the hidden size is used.

TYPE: int DEFAULT: None

sentence_pooling_method

The method used for sentence pooling.

TYPE: str DEFAULT: 'cls'

unused_tokens

A list of token IDs whose weights are zeroed out in the sparse embedding output.

TYPE: list DEFAULT: None

ATTRIBUTE DESCRIPTION
vocab_size

The size of the vocabulary.

TYPE: int

hidden_size

The size of the hidden layers.

TYPE: int

num_hidden_layers

The number of hidden layers in the model.

TYPE: int

num_attention_heads

The number of attention heads in the model.

TYPE: int

hidden_act

The activation function used in the hidden layers.

TYPE: str

intermediate_size

The size of the intermediate layer in the model.

TYPE: int

hidden_dropout_prob

The dropout probability for the hidden layers.

TYPE: float

attention_probs_dropout_prob

The dropout probability for attention probabilities.

TYPE: float

max_position_embeddings

The maximum position embeddings in the model.

TYPE: int

type_vocab_size

The size of the type vocabulary.

TYPE: int

initializer_range

The range for parameter initialization.

TYPE: float

layer_norm_eps

The epsilon value for layer normalization.

TYPE: float

position_embedding_type

The type of position embedding used.

TYPE: str

use_cache

Flag indicating whether caching is used.

TYPE: bool

classifier_dropout

The dropout rate for the classifier layer.

TYPE: float

colbert_dim

The output dimension of the ColBERT projection layer. If None, the hidden size is used.

TYPE: int

sentence_pooling_method

The method used for sentence pooling.

TYPE: str

unused_tokens

A list of token IDs whose weights are zeroed out in the sparse embedding output.

TYPE: list
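
Example

A minimal usage sketch (not from the library; the values are illustrative and assume mindnlp and MindSpore are installed). It instantiates the configuration with a few overrides and reads back the stored attributes; unspecified parameters keep the defaults listed above, and extra keyword arguments are forwarded to PretrainedConfig.

from mindnlp.transformers.models.bge_m3.configuration_bge_m3 import BgeM3Config

config = BgeM3Config(
    hidden_size=384,                 # smaller than the 768 default
    colbert_dim=128,                 # project ColBERT vectors down to 128 dimensions
    sentence_pooling_method="mean",  # masked mean pooling instead of the CLS vector
    unused_tokens=[0, 1, 2, 3],      # token IDs zeroed out of the sparse output
)

print(config.model_type)               # "bge-m3"
print(config.hidden_size)              # 384
print(config.sentence_pooling_method)  # "mean"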

Source code in mindnlp/transformers/models/bge_m3/configuration_bge_m3.py
class BgeM3Config(PretrainedConfig):

    """
    A class representing the configuration for a BgeM3 model. 

    This class inherits from the PretrainedConfig class and defines the configuration parameters for a BgeM3 model,
    including vocabulary size, hidden size, number of hidden layers, number of attention heads, intermediate size,
    activation function, dropout probabilities, maximum position embeddings, type vocabulary size, initializer range,
    layer normalization epsilon, padding token ID, beginning of sequence token ID, end of sequence token ID,
    position embedding type, cache usage, classifier dropout, Colbert dimension, sentence pooling method, and unused tokens.

    Parameters:
        vocab_size (int): The size of the vocabulary.
        hidden_size (int): The size of the hidden layers.
        num_hidden_layers (int): The number of hidden layers in the model.
        num_attention_heads (int): The number of attention heads in the model.
        intermediate_size (int): The size of the intermediate layer in the model.
        hidden_act (str): The activation function used in the hidden layers.
        hidden_dropout_prob (float): The dropout probability for the hidden layers.
        attention_probs_dropout_prob (float): The dropout probability for attention probabilities.
        max_position_embeddings (int): The maximum position embeddings in the model.
        type_vocab_size (int): The size of the type vocabulary.
        initializer_range (float): The range for parameter initialization.
        layer_norm_eps (float): The epsilon value for layer normalization.
        pad_token_id (int): The ID for padding tokens.
        bos_token_id (int): The ID for the beginning of sequence tokens.
        eos_token_id (int): The ID for the end of sequence tokens.
        position_embedding_type (str): The type of position embedding used.
        use_cache (bool): Flag indicating whether caching is used.
        classifier_dropout (float): The dropout rate for the classifier layer.
        colbert_dim (int): The output dimension of the ColBERT projection layer. If None, the hidden size is used.
        sentence_pooling_method (str): The method used for sentence pooling.
        unused_tokens (list): A list of token IDs whose weights are zeroed out in the sparse embedding output.

    Attributes:
        vocab_size (int): The size of the vocabulary.
        hidden_size (int): The size of the hidden layers.
        num_hidden_layers (int): The number of hidden layers in the model.
        num_attention_heads (int): The number of attention heads in the model.
        hidden_act (str): The activation function used in the hidden layers.
        intermediate_size (int): The size of the intermediate layer in the model.
        hidden_dropout_prob (float): The dropout probability for the hidden layers.
        attention_probs_dropout_prob (float): The dropout probability for attention probabilities.
        max_position_embeddings (int): The maximum position embeddings in the model.
        type_vocab_size (int): The size of the type vocabulary.
        initializer_range (float): The range for parameter initialization.
        layer_norm_eps (float): The epsilon value for layer normalization.
        position_embedding_type (str): The type of position embedding used.
        use_cache (bool): Flag indicating whether caching is used.
        classifier_dropout (float): The dropout rate for the classifier layer.
        colbert_dim (int): The output dimension of the ColBERT projection layer. If None, the hidden size is used.
        sentence_pooling_method (str): The method used for sentence pooling.
        unused_tokens (list): A list of token IDs whose weights are zeroed out in the sparse embedding output.
    """
    model_type = "bge-m3"

    def __init__(
        self,
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        pad_token_id=1,
        bos_token_id=0,
        eos_token_id=2,
        position_embedding_type="absolute",
        use_cache=True,
        classifier_dropout=None,
        colbert_dim=None,
        sentence_pooling_method='cls',
        unused_tokens=None,
        **kwargs,
    ):
        """
        This method initializes an instance of the BgeM3Config class with the given parameters.

        Args:
            self: The instance of the class.
            vocab_size (int, optional): The size of the vocabulary. Default is 30522.
            hidden_size (int, optional): The size of the hidden layers. Default is 768.
            num_hidden_layers (int, optional): The number of hidden layers. Default is 12.
            num_attention_heads (int, optional): The number of attention heads. Default is 12.
            intermediate_size (int, optional): The size of the intermediate layer in the transformer encoder. Default is 3072.
            hidden_act (str, optional): The activation function for the hidden layers. Default is 'gelu'.
            hidden_dropout_prob (float, optional): The dropout probability for the hidden layers. Default is 0.1.
            attention_probs_dropout_prob (float, optional): The dropout probability for the attention probabilities. Default is 0.1.
            max_position_embeddings (int, optional): The maximum number of positions for positional embeddings. Default is 512.
            type_vocab_size (int, optional): The size of the type vocabulary. Default is 2.
            initializer_range (float, optional): The range for parameter initializers. Default is 0.02.
            layer_norm_eps (float, optional): The epsilon value for layer normalization. Default is 1e-12.
            pad_token_id (int, optional): The token id for padding. Default is 1.
            bos_token_id (int, optional): The token id for the beginning of sequence. Default is 0.
            eos_token_id (int, optional): The token id for the end of sequence. Default is 2.
            position_embedding_type (str, optional): The type of position embedding to use. Default is 'absolute'.
            use_cache (bool, optional): Whether to use caching during decoding. Default is True.
            classifier_dropout (float, optional): The dropout probability for the classifier layer. Default is None.
            colbert_dim (int, optional): The dimensionality of the colbert layer. Default is None.
            sentence_pooling_method (str, optional): The method for pooling sentence representations. Default is 'cls'.
            unused_tokens (list, optional): A list of unused tokens. Default is None.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            ValueError: If any of the parameters are invalid or out of range.
        """
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.position_embedding_type = position_embedding_type
        self.use_cache = use_cache
        self.classifier_dropout = classifier_dropout
        self.colbert_dim = colbert_dim
        self.sentence_pooling_method = sentence_pooling_method
        self.unused_tokens = unused_tokens

mindnlp.transformers.models.bge_m3.configuration_bge_m3.BgeM3Config.__init__(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=1, bos_token_id=0, eos_token_id=2, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, colbert_dim=None, sentence_pooling_method='cls', unused_tokens=None, **kwargs)

This method initializes an instance of the BgeM3Config class with the given parameters.

PARAMETER DESCRIPTION
self

The instance of the class.

vocab_size

The size of the vocabulary. Default is 30522.

TYPE: int DEFAULT: 30522

hidden_size

The size of the hidden layers. Default is 768.

TYPE: int DEFAULT: 768

num_hidden_layers

The number of hidden layers. Default is 12.

TYPE: int DEFAULT: 12

num_attention_heads

The number of attention heads. Default is 12.

TYPE: int DEFAULT: 12

intermediate_size

The size of the intermediate layer in the transformer encoder. Default is 3072.

TYPE: int DEFAULT: 3072

hidden_act

The activation function for the hidden layers. Default is 'gelu'.

TYPE: str DEFAULT: 'gelu'

hidden_dropout_prob

The dropout probability for the hidden layers. Default is 0.1.

TYPE: float DEFAULT: 0.1

attention_probs_dropout_prob

The dropout probability for the attention probabilities. Default is 0.1.

TYPE: float DEFAULT: 0.1

max_position_embeddings

The maximum number of positions for positional embeddings. Default is 512.

TYPE: int DEFAULT: 512

type_vocab_size

The size of the type vocabulary. Default is 2.

TYPE: int DEFAULT: 2

initializer_range

The range for parameter initializers. Default is 0.02.

TYPE: float DEFAULT: 0.02

layer_norm_eps

The epsilon value for layer normalization. Default is 1e-12.

TYPE: float DEFAULT: 1e-12

pad_token_id

The token id for padding. Default is 1.

TYPE: int DEFAULT: 1

bos_token_id

The token id for the beginning of sequence. Default is 0.

TYPE: int DEFAULT: 0

eos_token_id

The token id for the end of sequence. Default is 2.

TYPE: int DEFAULT: 2

position_embedding_type

The type of position embedding to use. Default is 'absolute'.

TYPE: str DEFAULT: 'absolute'

use_cache

Whether to use caching during decoding. Default is True.

TYPE: bool DEFAULT: True

classifier_dropout

The dropout probability for the classifier layer. Default is None.

TYPE: float DEFAULT: None

colbert_dim

The dimensionality of the colbert layer. Default is None.

TYPE: int DEFAULT: None

sentence_pooling_method

The method for pooling sentence representations. Default is 'cls'.

TYPE: str DEFAULT: 'cls'

unused_tokens

A list of unused tokens. Default is None.

TYPE: list DEFAULT: None

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If any of the parameters are invalid or out of range.

Source code in mindnlp/transformers/models/bge_m3/configuration_bge_m3.py
def __init__(
    self,
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    pad_token_id=1,
    bos_token_id=0,
    eos_token_id=2,
    position_embedding_type="absolute",
    use_cache=True,
    classifier_dropout=None,
    colbert_dim=None,
    sentence_pooling_method='cls',
    unused_tokens=None,
    **kwargs,
):
    """
    This method initializes an instance of the BgeM3Config class with the given parameters.

    Args:
        self: The instance of the class.
        vocab_size (int, optional): The size of the vocabulary. Default is 30522.
        hidden_size (int, optional): The size of the hidden layers. Default is 768.
        num_hidden_layers (int, optional): The number of hidden layers. Default is 12.
        num_attention_heads (int, optional): The number of attention heads. Default is 12.
        intermediate_size (int, optional): The size of the intermediate layer in the transformer encoder. Default is 3072.
        hidden_act (str, optional): The activation function for the hidden layers. Default is 'gelu'.
        hidden_dropout_prob (float, optional): The dropout probability for the hidden layers. Default is 0.1.
        attention_probs_dropout_prob (float, optional): The dropout probability for the attention probabilities. Default is 0.1.
        max_position_embeddings (int, optional): The maximum number of positions for positional embeddings. Default is 512.
        type_vocab_size (int, optional): The size of the type vocabulary. Default is 2.
        initializer_range (float, optional): The range for parameter initializers. Default is 0.02.
        layer_norm_eps (float, optional): The epsilon value for layer normalization. Default is 1e-12.
        pad_token_id (int, optional): The token id for padding. Default is 1.
        bos_token_id (int, optional): The token id for the beginning of sequence. Default is 0.
        eos_token_id (int, optional): The token id for the end of sequence. Default is 2.
        position_embedding_type (str, optional): The type of position embedding to use. Default is 'absolute'.
        use_cache (bool, optional): Whether to use caching during decoding. Default is True.
        classifier_dropout (float, optional): The dropout probability for the classifier layer. Default is None.
        colbert_dim (int, optional): The dimensionality of the colbert layer. Default is None.
        sentence_pooling_method (str, optional): The method for pooling sentence representations. Default is 'cls'.
        unused_tokens (list, optional): A list of unused tokens. Default is None.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        ValueError: If any of the parameters are invalid or out of range.
    """
    super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.hidden_act = hidden_act
    self.intermediate_size = intermediate_size
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.type_vocab_size = type_vocab_size
    self.initializer_range = initializer_range
    self.layer_norm_eps = layer_norm_eps
    self.position_embedding_type = position_embedding_type
    self.use_cache = use_cache
    self.classifier_dropout = classifier_dropout
    self.colbert_dim = colbert_dim
    self.sentence_pooling_method = sentence_pooling_method
    self.unused_tokens = unused_tokens

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model

Bases: XLMRobertaPreTrainedModel

The BgeM3Model class represents a model that extends XLMRobertaPreTrainedModel. It includes methods for dense embedding, sparse embedding, Colbert embedding, and processing token weights and Colbert vectors. The forward method processes input tensors to generate various outputs including last hidden state, dense output, pooler output, Colbert output, sparse output, hidden states, past key values, attentions, and cross attentions.
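
Example

A minimal construction sketch (illustrative, deliberately tiny configuration; assumes mindnlp and MindSpore are installed). It shows how the configuration drives the two extra heads that sit on top of the XLM-RoBERTa encoder: the ColBERT projection and the single-unit sparse (lexical-weight) projection. A full forward call is sketched in the forward section below.

from mindnlp.transformers.models.bge_m3.configuration_bge_m3 import BgeM3Config
from mindnlp.transformers.models.bge_m3.modeling_bge_m3 import BgeM3Model

config = BgeM3Config(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    max_position_embeddings=64,
    colbert_dim=32,           # colbert_linear maps hidden_size (64) down to 32
    unused_tokens=[0, 1, 2],  # needed by the sparse post-processing in forward
)
model = BgeM3Model(config)

print(model.colbert_linear)           # linear projection: hidden_size -> colbert_dim
print(model.sparse_linear)            # linear projection: hidden_size -> 1
print(model.sentence_pooling_method)  # 'cls' by default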

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
class BgeM3Model(XLMRobertaPreTrainedModel):

    """
    The BgeM3Model class represents a model that extends XLMRobertaPreTrainedModel.
    It includes methods for dense embedding, sparse embedding, Colbert embedding, and processing token weights
    and Colbert vectors.
    The forward method processes input tensors to generate various outputs including last hidden state, dense output,
    pooler output, Colbert output, sparse output, hidden states, past key values, attentions, and cross attentions.
    """
    config_class = BgeM3Config

    def __init__(self, config: BgeM3Config):
        """
            Initializes a new instance of the BgeM3Model class.

            Args:
                self: The current BgeM3Model instance.
                config (BgeM3Config): The configuration object for BgeM3Model.

            Returns:
                None

            Raises:
                None
            """
        super().__init__(config)
        self.roberta = XLMRobertaModel(config, add_pooling_layer=False)
        self.colbert_linear = nn.Linear(
            config.hidden_size,
            config.hidden_size if config.colbert_dim is None else config.colbert_dim,
        )
        self.sparse_linear = nn.Linear(config.hidden_size, 1)
        self.sentence_pooling_method = config.sentence_pooling_method

        self.init_weights()

    # Copied from FlagEmbedding
    def dense_embedding(self, hidden_state, mask):
        """
        This method calculates the dense embedding based on the provided hidden state and mask,
        using the specified sentence pooling method.

        Args:
            self (object): The instance of the BgeM3Model class.
            hidden_state (tensor): The hidden state tensor representing the input sequence.
            mask (tensor): The mask tensor indicating the presence of valid elements in the input sequence.
                Its shape should be compatible with hidden_state.

        Returns:
            mindspore.Tensor: The pooled sentence embedding of shape (batch_size, hidden_size).

        Raises:
            ValueError: If the sentence pooling method specified is not supported or recognized.
            RuntimeError: If there are issues with the tensor operations or calculations within the method.
        """
        if self.sentence_pooling_method == "cls":
            return hidden_state[:, 0]
        elif self.sentence_pooling_method == "mean":
            s = ops.sum(hidden_state * mask.unsqueeze(-1).float(), dim=1)
            d = mask.sum(axis=1, keepdim=True).float()
            return s / d

    # Copied from FlagEmbedding
    def sparse_embedding(self, hidden_state, input_ids, return_embedding: bool = False):
        """
        Sparse Embedding

        This method computes the sparse embedding for a given hidden state and input IDs.

        Args:
            self (BgeM3Model): The instance of the BgeM3Model class.
            hidden_state: The hidden state tensor.
            input_ids: The input IDs tensor.
            return_embedding (bool, optional): Whether to return the sparse embedding or token weights.
                Defaults to False.

        Returns:
            mindspore.Tensor: The per-token weights of shape (batch_size, sequence_length, 1), or the sparse embedding of shape (batch_size, vocab_size) if return_embedding is True.

        Raises:
            None
        """
        token_weights = ops.relu(self.sparse_linear(hidden_state))
        if not return_embedding:
            return token_weights

        sparse_embedding = ops.zeros(
            (input_ids.shape[0],
            input_ids.shape[1],
            self.config.vocab_size),
            dtype=token_weights.dtype,
        )
        sparse_embedding = ops.scatter(sparse_embedding, dim=-1, index=input_ids.unsqueeze(-1), src=token_weights)

        unused_tokens = self.config.unused_tokens
        sparse_embedding = ops.max(sparse_embedding, dim=1)[0]
        sparse_embedding[:, unused_tokens] *= 0.0
        return sparse_embedding

    # Copied from FlagEmbedding
    def colbert_embedding(self, last_hidden_state, mask):
        """
        Embeds the last hidden state of the BgeM3Model using the Colbert method.

        Args:
            self (BgeM3Model): The instance of the BgeM3Model class.
            last_hidden_state (mindspore.Tensor): The last hidden state of the model.
                Shape: (batch_size, sequence_length, hidden_size)
            mask (mindspore.Tensor): The mask specifying the valid positions in the last_hidden_state tensor.
                Shape: (batch_size, sequence_length)

        Returns:
            mindspore.Tensor: The embedded Colbert vectors.
                Shape: (batch_size, sequence_length-1, hidden_size)

        Raises:
            None
        """
        colbert_vecs = self.colbert_linear(last_hidden_state[:, 1:])
        colbert_vecs = colbert_vecs * mask[:, 1:][:, :, None].float()
        return colbert_vecs

    # Modified from FlagEmbedding
    def _process_token_weights(self, token_weights, input_ids, mask):
        """
        Process the token weights for the BgeM3Model.

        Args:
            self (BgeM3Model): An instance of the BgeM3Model class.
            token_weights (Tensor): A tensor containing the weights of each token.
            input_ids (Tensor): A tensor containing the input IDs.
            mask (Tensor): A tensor containing the mask.

        Returns:
            list[defaultdict]:
                A list of dictionaries, where each dictionary contains the maximum weight for each unique ID.

        Raises:
            None.

        This method processes the given token weights by removing unused tokens and filtering out invalid indices.
        It then computes the maximum weight for each unique ID and stores the results in a list of dictionaries.
        The resulting list is returned as the output of this method.
        """
        token_weights = token_weights.squeeze(-1)
        # convert to dict
        all_result = []
        unused_tokens = self.config.unused_tokens
        unused_tokens = mindspore.tensor(unused_tokens)

        # Get valid matrix
        valid_indices = ~mindspore.numpy.isin(input_ids, unused_tokens)
        # w>0
        valid_indices = (valid_indices & (token_weights > 0)).bool()
        valid_indices = (valid_indices & mask).bool()

        for i, valid in enumerate(valid_indices):
            result = defaultdict(int)

            # Get valid weight and ids
            valid_weights = token_weights[i][valid]
            valid_ids = input_ids[i][valid]

            # Get unique token
            unique_ids, inverse_indices = ops.unique(valid_ids)

            # Get max weight for each token
            for j in range(unique_ids.shape[0]):
                id_mask = inverse_indices == j
                result[str(unique_ids[j].item())] = valid_weights[id_mask].max().item()

            all_result.append(result)

        return all_result

    # Copied from FlagEmbedding
    def _process_colbert_vecs(self, colbert_vecs, tokens_num) -> List[mindspore.Tensor]:
        '''
        This method processes the Colbert vectors to extract a subset of vectors based on the tokens number.

        Args:
            self (BgeM3Model): The instance of the BgeM3Model class.
            colbert_vecs (Union[mindspore.Tensor, List[mindspore.Tensor]]): The Colbert vectors to be processed.
            tokens_num (List[int]): The list containing the number of tokens for each vector in colbert_vecs.

        Returns:
            List[mindspore.Tensor]: A list of processed vectors.

        Raises:
            ValueError: If the length of colbert_vecs and tokens_num does not match.
            IndexError: If the tokens_num contains an index that is out of range for colbert_vecs.
        '''
        # delete the vectors of padding tokens
        vecs = []
        for i in range(len(tokens_num)):
            vecs.append(colbert_vecs[i, : tokens_num[i] - 1])
        return vecs

    # Copied from transformers.models.bert.modeling_bert.BertModel.forward
    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        token_type_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        encoder_hidden_states: Optional[mindspore.Tensor] = None,
        encoder_attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[List[mindspore.Tensor]] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], BgeM3ModelOutput]:
        """
        Constructs the BgeM3Model.

        Args:
            self: The instance of the class.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length) containing the input IDs.
            attention_mask (Optional[mindspore.Tensor]):
                The attention mask tensor of shape (batch_size, sequence_length) containing attention masks
                for the input IDs.
            token_type_ids (Optional[mindspore.Tensor]):
                The token type IDs tensor of shape (batch_size, sequence_length) containing the token type IDs
                for the input IDs.
            position_ids (Optional[mindspore.Tensor]):
                The position IDs tensor of shape (batch_size, sequence_length) containing the position IDs
                for the input IDs.
            head_mask (Optional[mindspore.Tensor]):
                The head mask tensor of shape (num_heads,) or (num_layers, num_heads) containing the head mask
                for the transformer encoder.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input embeddings tensor of shape (batch_size, sequence_length, hidden_size) containing
                the embeddings for the input IDs.
            encoder_hidden_states (Optional[mindspore.Tensor]):
                The encoder hidden states tensor of shape (batch_size, encoder_sequence_length, hidden_size)
                containing the hidden states of the encoder.
            encoder_attention_mask (Optional[mindspore.Tensor]):
                The encoder attention mask tensor of shape (batch_size, encoder_sequence_length) containing
                attention masks for the encoder hidden states.
            past_key_values (Optional[List[mindspore.Tensor]]):
                The list of past key value tensors of shape (2, batch_size, num_heads, sequence_length,
                hidden_size//num_heads) containing the past key value states for the transformer decoder.
            use_cache (Optional[bool]): Whether to use cache for the transformer decoder.
            output_attentions (Optional[bool]): Whether to output attentions.
            output_hidden_states (Optional[bool]): Whether to output hidden states.
            return_dict (Optional[bool]): Whether to return a dictionary instead of a tuple.

        Returns:
            Union[Tuple[mindspore.Tensor], BgeM3ModelOutput]:
                If `return_dict` is set to False, returns a tuple containing the following elements:

                - last_hidden_state (mindspore.Tensor):
                The last hidden state tensor of shape (batch_size, sequence_length, hidden_size)
                containing the last hidden state of the transformer.
                - pooler_output (mindspore.Tensor):
                The pooler output tensor of shape (batch_size, hidden_size) containing the pooler output of
                the transformer.
                - dense_output (mindspore.Tensor):
                The dense embedding output tensor of shape (batch_size, sequence_length, dense_size) containing
                the dense embeddings.
                - colbert_output (mindspore.Tensor):
                The Colbert embedding output tensor of shape (batch_size, sequence_length, colbert_size) containing
                the Colbert embeddings.
                - sparse_output (mindspore.Tensor):
                The sparse embedding output tensor of shape (batch_size, sequence_length, sparse_size) containing
                the sparse embeddings.
                - hidden_states (Tuple[mindspore.Tensor]):
                The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size) containing
                the hidden states of the transformer.
                - past_key_values (Tuple[mindspore.Tensor]):
                The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads)
                containing the past key value states for the transformer decoder.
                - attentions (Tuple[mindspore.Tensor]):
                The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length)
                containing the attentions of the transformer.
                - cross_attentions (Tuple[mindspore.Tensor]):
                The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length)
                containing the cross attentions of the transformer.

            BgeM3ModelOutput:
                If `return_dict` is set to True, returns an instance of the BgeM3ModelOutput class containing the
                following elements:

                - last_hidden_state (mindspore.Tensor):
                The last hidden state tensor of shape (batch_size, sequence_length, hidden_size)
                containing the last hidden state of the transformer.
                - dense_output (mindspore.Tensor):
                The dense embedding output tensor of shape (batch_size, sequence_length, dense_size)
                containing the dense embeddings.
                - pooler_output (mindspore.Tensor):
                The pooler output tensor of shape (batch_size, hidden_size)
                containing the pooler output of the transformer.
                - colbert_output (mindspore.Tensor):
                The Colbert embedding output tensor of shape (batch_size, sequence_length, colbert_size)
                containing the Colbert embeddings.
                - sparse_output (mindspore.Tensor):
                The sparse embedding output tensor of shape (batch_size, sequence_length, sparse_size)
                containing the sparse embeddings.
                - hidden_states (Tuple[mindspore.Tensor]):
                The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size)
                containing the hidden states of the transformer.
                - past_key_values (Tuple[mindspore.Tensor]):
                The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads)
                containing the past key value states for the transformer decoder.
                - attentions (Tuple[mindspore.Tensor]):
                The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length)
                containing the attentions of the transformer.
                - cross_attentions (Tuple[mindspore.Tensor]):
                The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length)
                containing the cross attentions of the transformer.

        Raises:
            None.
        """
        roberta_output: BaseModelOutputWithPoolingAndCrossAttentions = self.roberta(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=True,
        )

        last_hidden_state = roberta_output.last_hidden_state
        dense_output = self.dense_embedding(last_hidden_state, attention_mask)

        tokens_num = attention_mask.sum(axis=1)
        colbert_output = self.colbert_embedding(last_hidden_state, attention_mask)
        colbert_output = self._process_colbert_vecs(colbert_output, tokens_num)

        sparse_output = self.sparse_embedding(last_hidden_state, input_ids)
        sparse_output = self._process_token_weights(sparse_output, input_ids, attention_mask)

        if not return_dict:
            return (
                last_hidden_state,
                roberta_output.pooler_output,
                dense_output,
                colbert_output,
                sparse_output,
                roberta_output.hidden_states,
                roberta_output.past_key_values,
                roberta_output.attentions,
                roberta_output.cross_attentions,
            )

        return BgeM3ModelOutput(
            last_hidden_state=last_hidden_state,
            dense_output=dense_output,
            pooler_output=roberta_output.pooler_output,
            colbert_output=colbert_output,
            sparse_output=sparse_output,
            hidden_states=roberta_output.hidden_states,
            past_key_values=roberta_output.past_key_values,
            attentions=roberta_output.attentions,
            cross_attentions=roberta_output.cross_attentions,
        )

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.__init__(config)

Initializes a new instance of the BgeM3Model class.

PARAMETER DESCRIPTION
self

The current BgeM3Model instance.

config

The configuration object for BgeM3Model.

TYPE: BgeM3Config

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
def __init__(self, config: BgeM3Config):
    """
        Initializes a new instance of the BgeM3Model class.

        Args:
            self: The current BgeM3Model instance.
            config (BgeM3Config): The configuration object for BgeM3Model.

        Returns:
            None

        Raises:
            None
        """
    super().__init__(config)
    self.roberta = XLMRobertaModel(config, add_pooling_layer=False)
    self.colbert_linear = nn.Linear(
        config.hidden_size,
        config.hidden_size if config.colbert_dim is None else config.colbert_dim,
    )
    self.sparse_linear = nn.Linear(config.hidden_size, 1)
    self.sentence_pooling_method = config.sentence_pooling_method

    self.init_weights()

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.colbert_embedding(last_hidden_state, mask)

Embeds the last hidden state of the BgeM3Model using the Colbert method.

PARAMETER DESCRIPTION
self

The instance of the BgeM3Model class.

TYPE: BgeM3Model

last_hidden_state

The last hidden state of the model. Shape: (batch_size, sequence_length, hidden_size)

TYPE: Tensor

mask

The mask specifying the valid positions in the last_hidden_state tensor. Shape: (batch_size, sequence_length)

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: The embedded Colbert vectors. Shape: (batch_size, sequence_length-1, hidden_size)

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
def colbert_embedding(self, last_hidden_state, mask):
    """
    Embeds the last hidden state of the BgeM3Model using the Colbert method.

    Args:
        self (BgeM3Model): The instance of the BgeM3Model class.
        last_hidden_state (mindspore.Tensor): The last hidden state of the model.
            Shape: (batch_size, sequence_length, hidden_size)
        mask (mindspore.Tensor): The mask specifying the valid positions in the last_hidden_state tensor.
            Shape: (batch_size, sequence_length)

    Returns:
        mindspore.Tensor: The embedded Colbert vectors.
            Shape: (batch_size, sequence_length-1, hidden_size)

    Raises:
        None
    """
    colbert_vecs = self.colbert_linear(last_hidden_state[:, 1:])
    colbert_vecs = colbert_vecs * mask[:, 1:][:, :, None].float()
    return colbert_vecs
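
Apart from the linear projection, the key behaviour is that the first position (the leading special token) is dropped and padded positions are zeroed by the mask. A shape sketch with made-up values (the colbert_linear projection is left out for brevity; numpy is only used to build dummy data):

import numpy as np
import mindspore

last_hidden_state = mindspore.Tensor(np.random.randn(2, 5, 8).astype("float32"))
mask = mindspore.Tensor([[1, 1, 1, 0, 0],
                         [1, 1, 1, 1, 1]], mindspore.int32)

# Same masking as colbert_embedding, without the colbert_linear projection.
vecs = last_hidden_state[:, 1:] * mask[:, 1:][:, :, None].astype(mindspore.float32)
print(vecs.shape)  # (2, 4, 8): one position shorter than the input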

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.dense_embedding(hidden_state, mask)

This method calculates the dense embedding based on the provided hidden state and mask, using the specified sentence pooling method.

PARAMETER DESCRIPTION
self

The instance of the BgeM3Model class.

TYPE: object

hidden_state

The hidden state tensor representing the input sequence.

TYPE: tensor

mask

The mask tensor indicating the presence of valid elements in the input sequence. Its shape should be compatible with hidden_state.

TYPE: tensor

RETURNS DESCRIPTION
Tensor

The pooled sentence embedding of shape (batch_size, hidden_size), computed with the configured sentence pooling method.

RAISES DESCRIPTION
ValueError

If the sentence pooling method specified is not supported or recognized.

RuntimeError

If there are issues with the tensor operations or calculations within the method.

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
def dense_embedding(self, hidden_state, mask):
    """
    This method calculates the dense embedding based on the provided hidden state and mask,
    using the specified sentence pooling method.

    Args:
        self (object): The instance of the BgeM3Model class.
        hidden_state (tensor): The hidden state tensor representing the input sequence.
        mask (tensor): The mask tensor indicating the presence of valid elements in the input sequence.
            Its shape should be compatible with hidden_state.

    Returns:
        mindspore.Tensor: The pooled sentence embedding of shape (batch_size, hidden_size).

    Raises:
        ValueError: If the sentence pooling method specified is not supported or recognized.
        RuntimeError: If there are issues with the tensor operations or calculations within the method.
    """
    if self.sentence_pooling_method == "cls":
        return hidden_state[:, 0]
    elif self.sentence_pooling_method == "mean":
        s = ops.sum(hidden_state * mask.unsqueeze(-1).float(), dim=1)
        d = mask.sum(axis=1, keepdim=True).float()
        return s / d
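
For the 'mean' branch, padded positions are excluded from both the numerator and the denominator. A toy check with made-up numbers that mirrors the two lines of the method body (ops is assumed to be the same torch-style mindnlp.core.ops namespace the modeling code imports):

import mindspore
from mindnlp.core import ops

hidden_state = mindspore.Tensor([[[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]]], mindspore.float32)  # (1, 3, 2)
mask = mindspore.Tensor([[1, 1, 0]], mindspore.int32)  # the last position is padding

s = ops.sum(hidden_state * mask.unsqueeze(-1).float(), dim=1)  # masked sum over tokens
d = mask.sum(axis=1, keepdim=True).float()                     # number of real tokens
print(s / d)  # [[2. 2.]] -- the padded position does not contribute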

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the BgeM3Model.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input tensor of shape (batch_size, sequence_length) containing the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The attention mask tensor of shape (batch_size, sequence_length) containing attention masks for the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

token_type_ids

The token type IDs tensor of shape (batch_size, sequence_length) containing the token type IDs for the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The position IDs tensor of shape (batch_size, sequence_length) containing the position IDs for the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

head_mask

The head mask tensor of shape (num_heads,) or (num_layers, num_heads) containing the head mask for the transformer encoder.

TYPE: Optional[Tensor] DEFAULT: None

inputs_embeds

The input embeddings tensor of shape (batch_size, sequence_length, hidden_size) containing the embeddings for the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

encoder_hidden_states

The encoder hidden states tensor of shape (batch_size, encoder_sequence_length, hidden_size) containing the hidden states of the encoder.

TYPE: Optional[Tensor] DEFAULT: None

encoder_attention_mask

The encoder attention mask tensor of shape (batch_size, encoder_sequence_length) containing attention masks for the encoder hidden states.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The list of past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads) containing the past key value states for the transformer decoder.

TYPE: Optional[List[Tensor]] DEFAULT: None

use_cache

Whether to use cache for the transformer decoder.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Whether to output attentions.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output hidden states.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a dictionary instead of a tuple.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BgeM3ModelOutput]

Union[Tuple[mindspore.Tensor], BgeM3ModelOutput]: If return_dict is set to False, returns a tuple containing the following elements:

  • last_hidden_state (mindspore.Tensor): The last hidden state tensor of shape (batch_size, sequence_length, hidden_size) containing the last hidden state of the transformer.
  • pooler_output (mindspore.Tensor): The pooler output tensor of shape (batch_size, hidden_size) containing the pooler output of the transformer.
  • dense_output (mindspore.Tensor): The pooled dense embedding of shape (batch_size, hidden_size).
  • colbert_output (List[mindspore.Tensor]): A list with one ColBERT embedding tensor per sequence, each of shape (num_tokens - 1, colbert_dim or hidden_size), with padding positions removed.
  • sparse_output (List[dict]): A list with one dictionary per sequence, mapping token IDs (as strings) to their maximum sparse (lexical) weights.
  • hidden_states (Tuple[mindspore.Tensor]): The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size) containing the hidden states of the transformer.
  • past_key_values (Tuple[mindspore.Tensor]): The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads) containing the past key value states for the transformer decoder.
  • attentions (Tuple[mindspore.Tensor]): The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length) containing the attentions of the transformer.
  • cross_attentions (Tuple[mindspore.Tensor]): The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length) containing the cross attentions of the transformer.
BgeM3ModelOutput

If return_dict is set to True, returns an instance of the BgeM3ModelOutput class containing the following elements:

  • last_hidden_state (mindspore.Tensor): The last hidden state tensor of shape (batch_size, sequence_length, hidden_size) containing the last hidden state of the transformer.
  • dense_output (mindspore.Tensor): The pooled dense embedding of shape (batch_size, hidden_size).
  • pooler_output (mindspore.Tensor): The pooler output tensor of shape (batch_size, hidden_size) containing the pooler output of the transformer.
  • colbert_output (List[mindspore.Tensor]): A list with one ColBERT embedding tensor per sequence, each of shape (num_tokens - 1, colbert_dim or hidden_size), with padding positions removed.
  • sparse_output (List[dict]): A list with one dictionary per sequence, mapping token IDs (as strings) to their maximum sparse (lexical) weights.
  • hidden_states (Tuple[mindspore.Tensor]): The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size) containing the hidden states of the transformer.
  • past_key_values (Tuple[mindspore.Tensor]): The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads) containing the past key value states for the transformer decoder.
  • attentions (Tuple[mindspore.Tensor]): The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length) containing the attentions of the transformer.
  • cross_attentions (Tuple[mindspore.Tensor]): The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length) containing the cross attentions of the transformer.

TYPE: Union[Tuple[Tensor], BgeM3ModelOutput]
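
Example

A minimal end-to-end sketch with an untrained, deliberately tiny model and dummy token IDs (assumes mindnlp and MindSpore are installed). Note that config.unused_tokens must be set, because the sparse post-processing in forward indexes into it.

import numpy as np
import mindspore

from mindnlp.transformers.models.bge_m3.configuration_bge_m3 import BgeM3Config
from mindnlp.transformers.models.bge_m3.modeling_bge_m3 import BgeM3Model

config = BgeM3Config(
    vocab_size=1000, hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
    intermediate_size=128, max_position_embeddings=64, unused_tokens=[0, 1, 2],
)
model = BgeM3Model(config)

input_ids = mindspore.Tensor(np.array([[5, 6, 7, 8, 1, 1],
                                       [9, 10, 11, 12, 13, 14]]), mindspore.int64)
attention_mask = mindspore.Tensor(np.array([[1, 1, 1, 1, 0, 0],
                                            [1, 1, 1, 1, 1, 1]]), mindspore.int64)

outputs = model(input_ids, attention_mask=attention_mask, return_dict=True)

print(outputs.last_hidden_state.shape)  # (2, 6, 64)
print(outputs.dense_output.shape)       # (2, 64): pooled sentence embedding
print(len(outputs.colbert_output))      # 2: one tensor per sequence, padding removed
print(outputs.sparse_output[0])         # dict mapping token IDs (as strings) to max weights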

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    token_type_ids: Optional[mindspore.Tensor] = None,
    position_ids: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    encoder_hidden_states: Optional[mindspore.Tensor] = None,
    encoder_attention_mask: Optional[mindspore.Tensor] = None,
    past_key_values: Optional[List[mindspore.Tensor]] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], BgeM3ModelOutput]:
    """
    Constructs the BgeM3Model.

    Args:
        self: The instance of the class.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length) containing the input IDs.
        attention_mask (Optional[mindspore.Tensor]):
            The attention mask tensor of shape (batch_size, sequence_length) containing attention masks
            for the input IDs.
        token_type_ids (Optional[mindspore.Tensor]):
            The token type IDs tensor of shape (batch_size, sequence_length) containing the token type IDs
            for the input IDs.
        position_ids (Optional[mindspore.Tensor]):
            The position IDs tensor of shape (batch_size, sequence_length) containing the position IDs
            for the input IDs.
        head_mask (Optional[mindspore.Tensor]):
            The head mask tensor of shape (num_heads,) or (num_layers, num_heads) containing the head mask
            for the transformer encoder.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input embeddings tensor of shape (batch_size, sequence_length, hidden_size) containing
            the embeddings for the input IDs.
        encoder_hidden_states (Optional[mindspore.Tensor]):
            The encoder hidden states tensor of shape (batch_size, encoder_sequence_length, hidden_size)
            containing the hidden states of the encoder.
        encoder_attention_mask (Optional[mindspore.Tensor]):
            The encoder attention mask tensor of shape (batch_size, encoder_sequence_length) containing
            attention masks for the encoder hidden states.
        past_key_values (Optional[List[mindspore.Tensor]]):
            The list of past key value tensors of shape (2, batch_size, num_heads, sequence_length,
            hidden_size//num_heads) containing the past key value states for the transformer decoder.
        use_cache (Optional[bool]): Whether to use cache for the transformer decoder.
        output_attentions (Optional[bool]): Whether to output attentions.
        output_hidden_states (Optional[bool]): Whether to output hidden states.
        return_dict (Optional[bool]): Whether to return a dictionary instead of a tuple.

    Returns:
        Union[Tuple[mindspore.Tensor], BgeM3ModelOutput]:
            If `return_dict` is set to False, returns a tuple containing the following elements:

            - last_hidden_state (mindspore.Tensor):
            The last hidden state tensor of shape (batch_size, sequence_length, hidden_size)
            containing the last hidden state of the transformer.
            - pooler_output (mindspore.Tensor):
            The pooler output tensor of shape (batch_size, hidden_size) containing the pooler output of
            the transformer.
            - dense_output (mindspore.Tensor):
            The dense embedding output tensor of shape (batch_size, sequence_length, dense_size) containing
            the dense embeddings.
            - colbert_output (mindspore.Tensor):
            The Colbert embedding output tensor of shape (batch_size, sequence_length, colbert_size) containing
            the Colbert embeddings.
            - sparse_output (mindspore.Tensor):
            The sparse embedding output tensor of shape (batch_size, sequence_length, sparse_size) containing
            the sparse embeddings.
            - hidden_states (Tuple[mindspore.Tensor]):
            The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size) containing
            the hidden states of the transformer.
            - past_key_values (Tuple[mindspore.Tensor]):
            The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads)
            containing the past key value states for the transformer decoder.
            - attentions (Tuple[mindspore.Tensor]):
            The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length)
            containing the attentions of the transformer.
            - cross_attentions (Tuple[mindspore.Tensor]):
            The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length)
            containing the cross attentions of the transformer.

        BgeM3ModelOutput:
            If `return_dict` is set to True, returns an instance of the BgeM3ModelOutput class containing the
            following elements:

            - last_hidden_state (mindspore.Tensor):
            The last hidden state tensor of shape (batch_size, sequence_length, hidden_size)
            containing the last hidden state of the transformer.
            - dense_output (mindspore.Tensor):
            The dense embedding output tensor of shape (batch_size, sequence_length, dense_size)
            containing the dense embeddings.
            - pooler_output (mindspore.Tensor):
            The pooler output tensor of shape (batch_size, hidden_size)
            containing the pooler output of the transformer.
            - colbert_output (mindspore.Tensor):
            The Colbert embedding output tensor of shape (batch_size, sequence_length, colbert_size)
            containing the Colbert embeddings.
            - sparse_output (mindspore.Tensor):
            The sparse embedding output tensor of shape (batch_size, sequence_length, sparse_size)
            containing the sparse embeddings.
            - hidden_states (Tuple[mindspore.Tensor]):
            The hidden states tensor of shape (num_layers, batch_size, sequence_length, hidden_size)
            containing the hidden states of the transformer.
            - past_key_values (Tuple[mindspore.Tensor]):
            The past key value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size//num_heads)
            containing the past key value states for the transformer decoder.
            - attentions (Tuple[mindspore.Tensor]):
            The attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, sequence_length)
            containing the attentions of the transformer.
            - cross_attentions (Tuple[mindspore.Tensor]):
            The cross attentions tensors of shape (num_layers, batch_size, num_heads, sequence_length, encoder_sequence_length)
            containing the cross attentions of the transformer.

    Raises:
        None.
    """
    roberta_output: BaseModelOutputWithPoolingAndCrossAttentions = self.roberta(
        input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
        head_mask=head_mask,
        inputs_embeds=inputs_embeds,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        past_key_values=past_key_values,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=True,
    )

    last_hidden_state = roberta_output.last_hidden_state
    dense_output = self.dense_embedding(last_hidden_state, attention_mask)

    tokens_num = attention_mask.sum(axis=1)
    colbert_output = self.colbert_embedding(last_hidden_state, attention_mask)
    colbert_output = self._process_colbert_vecs(colbert_output, tokens_num)

    sparse_output = self.sparse_embedding(last_hidden_state, input_ids)
    sparse_output = self._process_token_weights(sparse_output, input_ids, attention_mask)

    if not return_dict:
        return (
            last_hidden_state,
            roberta_output.pooler_output,
            dense_output,
            colbert_output,
            sparse_output,
            roberta_output.hidden_states,
            roberta_output.past_key_values,
            roberta_output.attentions,
            roberta_output.cross_attentions,
        )

    return BgeM3ModelOutput(
        last_hidden_state=last_hidden_state,
        dense_output=dense_output,
        pooler_output=roberta_output.pooler_output,
        colbert_output=colbert_output,
        sparse_output=sparse_output,
        hidden_states=roberta_output.hidden_states,
        past_key_values=roberta_output.past_key_values,
        attentions=roberta_output.attentions,
        cross_attentions=roberta_output.cross_attentions,
    )

mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.sparse_embedding(hidden_state, input_ids, return_embedding=False)

Sparse Embedding

This method computes the sparse embedding for a given hidden state and input IDs.

PARAMETER DESCRIPTION
self

The instance of the BgeM3Model class.

TYPE: BgeM3Model

hidden_state

The hidden state tensor.

input_ids

The input IDs tensor.

return_embedding

Whether to return the sparse embedding or token weights. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

The per-token weights of shape (batch_size, sequence_length, 1), or, if return_embedding is True, the sparse embedding of shape (batch_size, vocab_size).

Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py
def sparse_embedding(self, hidden_state, input_ids, return_embedding: bool = False):
    """
    Sparse Embedding

    This method computes the sparse embedding for a given hidden state and input IDs.

    Args:
        self (BgeM3Model): The instance of the BgeM3Model class.
        hidden_state: The hidden state tensor.
        input_ids: The input IDs tensor.
        return_embedding (bool, optional): Whether to return the sparse embedding or token weights.
            Defaults to False.

    Returns:
        mindspore.Tensor: The per-token weights of shape (batch_size, sequence_length, 1), or the sparse embedding of shape (batch_size, vocab_size) if return_embedding is True.

    Raises:
        None
    """
    token_weights = ops.relu(self.sparse_linear(hidden_state))
    if not return_embedding:
        return token_weights

    sparse_embedding = ops.zeros(
        (input_ids.shape[0],
        input_ids.shape[1],
        self.config.vocab_size),
        dtype=token_weights.dtype,
    )
    sparse_embedding = ops.scatter(sparse_embedding, dim=-1, index=input_ids.unsqueeze(-1), src=token_weights)

    unused_tokens = self.config.unused_tokens
    sparse_embedding = ops.max(sparse_embedding, dim=1)[0]
    sparse_embedding[:, unused_tokens] *= 0.0
    return sparse_embedding
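
When return_embedding is True, the per-token ReLU weights are scattered into a vocabulary-sized vector and max-pooled over the sequence, so every vocabulary ID keeps its largest weight, and the IDs listed in config.unused_tokens are then zeroed. A toy sketch with made-up weights (ops is assumed to be the same torch-style mindnlp.core.ops namespace the modeling code imports):

import mindspore
from mindnlp.core import ops

vocab_size = 10
input_ids = mindspore.Tensor([[4, 7, 4]], mindspore.int64)                    # (1, 3)
token_weights = mindspore.Tensor([[[0.2], [0.5], [0.9]]], mindspore.float32)  # (1, 3, 1)

sparse = ops.zeros((1, 3, vocab_size), dtype=token_weights.dtype)
sparse = ops.scatter(sparse, dim=-1, index=input_ids.unsqueeze(-1), src=token_weights)
sparse = ops.max(sparse, dim=1)[0]  # (1, vocab_size): max weight per vocabulary ID
# (the method additionally zeroes the columns listed in unused_tokens)
print(sparse[0, 4].item(), sparse[0, 7].item())  # 0.9 0.5 -- ID 4 keeps its larger weight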