bge_m3
mindnlp.transformers.models.bge_m3.configuration_bge_m3.BgeM3Config
Bases: PretrainedConfig
A class representing the configuration for a BgeM3 model.
This class inherits from PretrainedConfig and defines the configuration parameters of a BgeM3 model: vocabulary size, hidden size, number of hidden layers and attention heads, intermediate size, activation function, dropout probabilities, maximum position embeddings, type vocabulary size, initializer range, layer-normalization epsilon, special token IDs (padding, beginning of sequence, end of sequence), position embedding type, cache usage, classifier dropout, and the BGE-M3-specific ColBERT dimension, sentence pooling method, and unused tokens.
PARAMETER | DESCRIPTION |
---|---|
vocab_size | The size of the vocabulary. TYPE: int |
hidden_size | The size of the hidden layers. TYPE: int |
num_hidden_layers | The number of hidden layers in the model. TYPE: int |
num_attention_heads | The number of attention heads in the model. TYPE: int |
intermediate_size | The size of the intermediate layer in the model. TYPE: int |
hidden_act | The activation function used in the hidden layers. TYPE: str |
hidden_dropout_prob | The dropout probability for the hidden layers. TYPE: float |
attention_probs_dropout_prob | The dropout probability for attention probabilities. TYPE: float |
max_position_embeddings | The maximum number of position embeddings in the model. TYPE: int |
type_vocab_size | The size of the type vocabulary. TYPE: int |
initializer_range | The range for parameter initialization. TYPE: float |
layer_norm_eps | The epsilon value for layer normalization. TYPE: float |
pad_token_id | The ID of the padding token. TYPE: int |
bos_token_id | The ID of the beginning-of-sequence token. TYPE: int |
eos_token_id | The ID of the end-of-sequence token. TYPE: int |
position_embedding_type | The type of position embedding used. TYPE: str |
use_cache | Whether caching is used. TYPE: bool |
classifier_dropout | The dropout rate for the classifier layer. TYPE: float, optional |
colbert_dim | The dimension of the ColBERT vectors. TYPE: int, optional |
sentence_pooling_method | The method used for sentence pooling. TYPE: str |
unused_tokens | A list of unused tokens. TYPE: list, optional |
ATTRIBUTE | DESCRIPTION |
---|---|
vocab_size | The size of the vocabulary. TYPE: int |
hidden_size | The size of the hidden layers. TYPE: int |
num_hidden_layers | The number of hidden layers in the model. TYPE: int |
num_attention_heads | The number of attention heads in the model. TYPE: int |
hidden_act | The activation function used in the hidden layers. TYPE: str |
intermediate_size | The size of the intermediate layer in the model. TYPE: int |
hidden_dropout_prob | The dropout probability for the hidden layers. TYPE: float |
attention_probs_dropout_prob | The dropout probability for attention probabilities. TYPE: float |
max_position_embeddings | The maximum number of position embeddings in the model. TYPE: int |
type_vocab_size | The size of the type vocabulary. TYPE: int |
initializer_range | The range for parameter initialization. TYPE: float |
layer_norm_eps | The epsilon value for layer normalization. TYPE: float |
position_embedding_type | The type of position embedding used. TYPE: str |
use_cache | Whether caching is used. TYPE: bool |
classifier_dropout | The dropout rate for the classifier layer. TYPE: float, optional |
colbert_dim | The dimension of the ColBERT vectors. TYPE: int, optional |
sentence_pooling_method | The method used for sentence pooling. TYPE: str |
unused_tokens | A list of unused tokens. TYPE: list, optional |
Source code in mindnlp/transformers/models/bge_m3/configuration_bge_m3.py, lines 21-155.
mindnlp.transformers.models.bge_m3.configuration_bge_m3.BgeM3Config.__init__(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, hidden_act='gelu', hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1, max_position_embeddings=512, type_vocab_size=2, initializer_range=0.02, layer_norm_eps=1e-12, pad_token_id=1, bos_token_id=0, eos_token_id=2, position_embedding_type='absolute', use_cache=True, classifier_dropout=None, colbert_dim=None, sentence_pooling_method='cls', unused_tokens=None, **kwargs)
This method initializes an instance of the BgeM3Config class with the given parameters.
PARAMETER | DESCRIPTION |
---|---|
self | The instance of the class. |
vocab_size | The size of the vocabulary. Default is 30522. TYPE: int |
hidden_size | The size of the hidden layers. Default is 768. TYPE: int |
num_hidden_layers | The number of hidden layers. Default is 12. TYPE: int |
num_attention_heads | The number of attention heads. Default is 12. TYPE: int |
intermediate_size | The size of the intermediate layer in the transformer encoder. Default is 3072. TYPE: int |
hidden_act | The activation function for the hidden layers. Default is 'gelu'. TYPE: str |
hidden_dropout_prob | The dropout probability for the hidden layers. Default is 0.1. TYPE: float |
attention_probs_dropout_prob | The dropout probability for the attention probabilities. Default is 0.1. TYPE: float |
max_position_embeddings | The maximum number of positions for positional embeddings. Default is 512. TYPE: int |
type_vocab_size | The size of the type vocabulary. Default is 2. TYPE: int |
initializer_range | The range for parameter initializers. Default is 0.02. TYPE: float |
layer_norm_eps | The epsilon value for layer normalization. Default is 1e-12. TYPE: float |
pad_token_id | The token ID for padding. Default is 1. TYPE: int |
bos_token_id | The token ID for the beginning of sequence. Default is 0. TYPE: int |
eos_token_id | The token ID for the end of sequence. Default is 2. TYPE: int |
position_embedding_type | The type of position embedding to use. Default is 'absolute'. TYPE: str |
use_cache | Whether to use caching during decoding. Default is True. TYPE: bool |
classifier_dropout | The dropout probability for the classifier layer. Default is None. TYPE: float, optional |
colbert_dim | The dimensionality of the ColBERT layer. Default is None. TYPE: int, optional |
sentence_pooling_method | The method for pooling sentence representations. Default is 'cls'. TYPE: str |
unused_tokens | A list of unused tokens. Default is None. TYPE: list, optional |
**kwargs | Additional keyword arguments. |
RETURNS | DESCRIPTION |
---|---|
None | The constructor does not return a value. |
RAISES | DESCRIPTION |
---|---|
ValueError | If any of the parameters are invalid or out of range. |
Source code in mindnlp/transformers/models/bge_m3/configuration_bge_m3.py, lines 77-155.
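A minimal usage sketch of the constructor documented above. The override values below are illustrative only, and 'mean' is assumed to be an accepted sentence pooling method alongside the default 'cls':

```python
from mindnlp.transformers.models.bge_m3.configuration_bge_m3 import BgeM3Config

# Default configuration: 30522-token vocabulary, 12 layers, 768 hidden units.
config = BgeM3Config()

# Override a few fields; the values are illustrative, not recommended settings.
custom_config = BgeM3Config(
    colbert_dim=128,                 # project ColBERT vectors down to 128 dimensions
    sentence_pooling_method="mean",  # assumed alternative to the default "cls" pooling
    hidden_dropout_prob=0.2,
)

print(custom_config.colbert_dim, custom_config.sentence_pooling_method)
```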
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model
Bases: XLMRobertaPreTrainedModel
The BgeM3Model class extends XLMRobertaPreTrainedModel. It provides methods for dense, sparse, and ColBERT embeddings, along with helpers for processing token weights and ColBERT vectors. The forward method processes the input tensors and produces the last hidden state, dense output, pooler output, ColBERT output, sparse output, hidden states, past key values, attentions, and cross attentions.
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 54-421.
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.__init__(config)
Initializes a new instance of the BgeM3Model class.
PARAMETER | DESCRIPTION |
---|---|
self | The current BgeM3Model instance. |
config | The configuration object for the BgeM3Model. TYPE: BgeM3Config |
RETURNS | DESCRIPTION |
---|---|
None | The constructor does not return a value. |
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 65-88.
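A hedged sketch of building the model from a configuration object. The deliberately small sizes keep the randomly initialized model cheap to construct; loading pretrained weights is out of scope here:

```python
from mindnlp.transformers.models.bge_m3.configuration_bge_m3 import BgeM3Config
from mindnlp.transformers.models.bge_m3.modeling_bge_m3 import BgeM3Model

# A small toy configuration so the randomly initialized model builds quickly.
config = BgeM3Config(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=64,
)

model = BgeM3Model(config)  # weights are randomly initialized, not pretrained
```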
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.colbert_embedding(last_hidden_state, mask)
Embeds the last hidden state of the BgeM3Model using the ColBERT method.
PARAMETER | DESCRIPTION |
---|---|
self | The instance of the BgeM3Model class. TYPE: BgeM3Model |
last_hidden_state | The last hidden state of the model. Shape: (batch_size, sequence_length, hidden_size). TYPE: Tensor |
mask | The mask specifying the valid positions in the last_hidden_state tensor. Shape: (batch_size, sequence_length). TYPE: Tensor |
RETURNS | DESCRIPTION |
---|---|
mindspore.Tensor | The embedded ColBERT vectors. Shape: (batch_size, sequence_length - 1, hidden_size). |
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 154-174.
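An illustrative call with dummy tensors, reusing the toy config and model from the sketch above; the shapes follow the docstring:

```python
import mindspore
from mindspore import ops

batch_size, seq_len = 2, 8

# Dummy encoder output and attention mask (1 = valid token, 0 = padding).
last_hidden_state = ops.randn(batch_size, seq_len, config.hidden_size)
mask = ops.ones((batch_size, seq_len), mindspore.int64)

colbert_vecs = model.colbert_embedding(last_hidden_state, mask)
print(colbert_vecs.shape)  # (batch_size, seq_len - 1, hidden_size) per the docstring
```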
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.dense_embedding(hidden_state, mask)
This method calculates the dense embedding based on the provided hidden state and mask, using the specified sentence pooling method.
PARAMETER | DESCRIPTION |
---|---|
self | The instance of the BgeM3Model class. TYPE: BgeM3Model |
hidden_state | The hidden state tensor representing the input sequence. TYPE: Tensor |
mask | The mask tensor indicating the valid elements in the input sequence; its shape must be compatible with hidden_state. TYPE: Tensor |
RETURNS | DESCRIPTION |
---|---|
Tensor | The dense embedding computed from the hidden state using the configured sentence pooling method. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the specified sentence pooling method is not supported or recognized. |
RuntimeError | If there are issues with the tensor operations or calculations within the method. |
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 91-114.
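The configured sentence pooling method decides how token states are collapsed into a single vector. The snippet below is a standalone illustration of 'cls' and masked-mean pooling, not the library's exact implementation:

```python
import mindspore
from mindspore import ops

hidden_state = ops.randn(2, 8, 64)          # (batch, seq_len, hidden)
mask = ops.ones((2, 8), mindspore.float32)  # 1.0 = valid token, 0.0 = padding

# 'cls' pooling: use the hidden state of the first token as the sentence vector.
cls_embedding = hidden_state[:, 0]

# Masked mean pooling: average only over valid (non-padding) positions.
summed = (hidden_state * mask.unsqueeze(-1)).sum(axis=1)
counts = mask.sum(axis=1, keepdims=True)
mean_embedding = summed / counts

print(cls_embedding.shape, mean_embedding.shape)  # both (2, 64)
```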
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, encoder_hidden_states=None, encoder_attention_mask=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)
Runs the forward pass of the BgeM3Model.
PARAMETER | DESCRIPTION |
---|---|
self | The instance of the class. |
input_ids | The input tensor of shape (batch_size, sequence_length) containing the input IDs. TYPE: Tensor, optional |
attention_mask | The attention mask tensor of shape (batch_size, sequence_length) for the input IDs. TYPE: Tensor, optional |
token_type_ids | The token type IDs tensor of shape (batch_size, sequence_length) for the input IDs. TYPE: Tensor, optional |
position_ids | The position IDs tensor of shape (batch_size, sequence_length) for the input IDs. TYPE: Tensor, optional |
head_mask | The head mask tensor of shape (num_heads,) or (num_layers, num_heads) for the transformer encoder. TYPE: Tensor, optional |
inputs_embeds | The input embeddings tensor of shape (batch_size, sequence_length, hidden_size) containing embeddings for the input IDs. TYPE: Tensor, optional |
encoder_hidden_states | The encoder hidden states tensor of shape (batch_size, encoder_sequence_length, hidden_size). TYPE: Tensor, optional |
encoder_attention_mask | The encoder attention mask tensor of shape (batch_size, encoder_sequence_length) for the encoder hidden states. TYPE: Tensor, optional |
past_key_values | The list of past key-value tensors of shape (2, batch_size, num_heads, sequence_length, hidden_size // num_heads) containing cached key and value states for the transformer decoder. TYPE: List[Tensor], optional |
use_cache | Whether to use the cache for the transformer decoder. TYPE: bool, optional |
output_attentions | Whether to output attentions. TYPE: bool, optional |
output_hidden_states | Whether to output hidden states. TYPE: bool, optional |
return_dict | Whether to return a BgeM3ModelOutput instead of a tuple. TYPE: bool, optional |
RETURNS | DESCRIPTION |
---|---|
Union[Tuple[mindspore.Tensor], BgeM3ModelOutput] | A BgeM3ModelOutput if return_dict is True (or left to the configuration default), otherwise a tuple of mindspore.Tensor. |
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 253-421.
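A hedged end-to-end sketch of the forward pass with dummy token IDs, continuing the toy model above. The output attribute names dense_output, sparse_output, and colbert_output are taken from the class description and are assumptions about the BgeM3ModelOutput fields:

```python
import mindspore
from mindspore import ops

input_ids = ops.randint(3, 999, (2, 8))      # random IDs, avoiding special tokens 0-2
attention_mask = ops.ones((2, 8), mindspore.int64)

outputs = model.forward(
    input_ids=input_ids,
    attention_mask=attention_mask,
    return_dict=True,
)

print(outputs.last_hidden_state.shape)  # (2, 8, hidden_size)
print(outputs.dense_output.shape)       # pooled dense embedding (assumed field name)
print(outputs.sparse_output.shape)      # per-token lexical weights (assumed field name)
print(outputs.colbert_output.shape)     # multi-vector ColBERT output (assumed field name)
```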
mindnlp.transformers.models.bge_m3.modeling_bge_m3.BgeM3Model.sparse_embedding(hidden_state, input_ids, return_embedding=False)
Sparse Embedding
This method computes the sparse embedding for a given hidden state and input IDs.
PARAMETER | DESCRIPTION |
---|---|
self | The instance of the BgeM3Model class. TYPE: BgeM3Model |
hidden_state | The hidden state tensor. |
input_ids | The input IDs tensor. |
return_embedding | Whether to return the sparse embedding or only the token weights. Defaults to False. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
Tensor | The sparse embedding if return_embedding is True, otherwise the per-token weights. |
Source code in mindnlp/transformers/models/bge_m3/modeling_bge_m3.py, lines 117-151.
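An illustrative call that reuses the hidden state and input IDs from the forward sketch above; by default only the token weights are returned:

```python
# Per-token weights (the default, return_embedding=False).
token_weights = model.sparse_embedding(outputs.last_hidden_state, input_ids)
print(token_weights.shape)

# Passing return_embedding=True is documented to return the sparse embedding
# itself instead of the token weights.
```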