pop2piano

mindnlp.transformers.models.pop2piano.modeling_pop2piano

MindSpore Pop2Piano model.

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoAttention

Bases: Module

Attention layer for the Pop2Piano model, with an optional learned relative position bias. It subclasses nn.Module, performs self-attention by default and cross-attention when key_value_states is passed to forward, and supports pruning of attention heads.

ATTRIBUTE DESCRIPTION
is_decoder

bool, whether the layer belongs to a decoder (taken from config.is_decoder)

has_relative_attention_bias

bool, flag indicating whether relative attention bias is enabled

relative_attention_num_buckets

int, the number of buckets for relative attention

relative_attention_max_distance

int, the maximum distance for relative attention

d_model

int, the model dimension

key_value_proj_dim

int, the dimension of projected key and value

n_heads

int, the number of attention heads

dropout

float, dropout rate

inner_dim

int, the inner dimension for multi-head attention

q

nn.Linear, query projection layer

k

nn.Linear, key projection layer

v

nn.Linear, value projection layer

o

nn.Linear, output projection layer

relative_attention_bias

nn.Embedding, embedding layer for relative attention bias; only created when has_relative_attention_bias is True

pruned_heads

set, set of pruned attention heads

gradient_checkpointing

bool, flag for gradient checkpointing

METHOD DESCRIPTION
prune_heads

Prunes specified attention heads from the model

_relative_position_bucket

Computes relative position buckets

compute_bias

Computes binned relative position bias

forward

Computes self-attention or cross-attention over the input hidden states

Note

For detailed information on each method and attribute, refer to the method and attribute documentation in the class implementation.
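The bucketing rule behind _relative_position_bucket can be sketched for a single integer offset. This is a minimal pure-Python illustration that mirrors the tensor code in the source listing, not the library implementation; the function name relative_position_bucket and the scalar signature are assumptions made for the example.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
    """Scalar sketch of the bucketing rule (hypothetical helper, illustration only)."""
    bucket = 0
    if bidirectional:
        # Half of the buckets cover positive offsets, half cover the rest.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        relative_position = abs(relative_position)
    else:
        # Positive offsets (future positions) collapse to distance 0.
        relative_position = -min(relative_position, 0)

    # Small distances get one bucket each.
    max_exact = num_buckets // 2
    if relative_position < max_exact:
        return bucket + relative_position

    # Larger distances share logarithmically sized buckets up to max_distance.
    large = max_exact + int(
        math.log(relative_position / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


print(relative_position_bucket(3))     # 19
print(relative_position_bucket(-3))    # 3
print(relative_position_bucket(200))   # 31
print(relative_position_bucket(-200))  # 15
```

With the defaults, offsets beyond max_distance in either direction land in the last bucket of their half, which is what lets the bias generalize to sequences longer than those seen in training.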

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoAttention(nn.Module):

    """
    This class represents a self-attention mechanism with optional relative attention bias for the Pop2Piano model.
    It inherits from nn.Module and provides functionalities for attention computation and head pruning.

    Attributes:
        is_decoder: bool, whether the layer belongs to a decoder (taken from config.is_decoder)
        has_relative_attention_bias: bool, flag indicating whether relative attention bias is enabled
        relative_attention_num_buckets: int, the number of buckets for relative attention
        relative_attention_max_distance: int, the maximum distance for relative attention
        d_model: int, the model dimension
        key_value_proj_dim: int, the dimension of projected key and value
        n_heads: int, the number of attention heads
        dropout: float, dropout rate
        inner_dim: int, the inner dimension for multi-head attention
        q: nn.Linear, query projection layer
        k: nn.Linear, key projection layer
        v: nn.Linear, value projection layer
        o: nn.Linear, output projection layer
        relative_attention_bias: nn.Embedding, embedding layer for relative attention bias; only created when
            has_relative_attention_bias is True
        pruned_heads: set, set of pruned attention heads
        gradient_checkpointing: bool, flag for gradient checkpointing

    Methods:
        prune_heads: Prunes specified attention heads from the model
        _relative_position_bucket: Computes relative position buckets
        compute_bias: Computes binned relative position bias
        forward: Constructs attention mechanism

    Note:
        For detailed information on each method and attribute, refer to the method and attribute documentation in the
        class implementation.
    """
    def __init__(self, config: Pop2PianoConfig, has_relative_attention_bias=False):
        """
        Initializes an instance of the Pop2PianoAttention class.

        Args:
            self: The instance of the Pop2PianoAttention class.
            config (Pop2PianoConfig): An instance of Pop2PianoConfig containing the configuration parameters.
            has_relative_attention_bias (bool): A boolean indicating whether relative attention bias is enabled.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.is_decoder = config.is_decoder
        self.has_relative_attention_bias = has_relative_attention_bias
        self.relative_attention_num_buckets = config.relative_attention_num_buckets
        self.relative_attention_max_distance = config.relative_attention_max_distance
        self.d_model = config.d_model
        self.key_value_proj_dim = config.d_kv
        self.n_heads = config.num_heads
        self.dropout = config.dropout_rate
        self.inner_dim = self.n_heads * self.key_value_proj_dim

        # Mesh TensorFlow initialization to avoid scaling before softmax
        self.q = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.k = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.v = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.o = nn.Linear(self.inner_dim, self.d_model, bias=False)

        if self.has_relative_attention_bias:
            self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.n_heads)
        self.pruned_heads = set()
        self.gradient_checkpointing = False

    def prune_heads(self, heads):
        """
        Prunes the given attention heads from the layer.

        Args:
            self: The Pop2PianoAttention instance.

            heads: A list containing the indices of attention heads to be pruned.
                The indices should be within the range of the total number of attention heads.
                If the list is empty, no action is taken.

        Returns:
            None: The instance is modified in place. The q, k, v, and o projections are pruned,
                and n_heads, inner_dim, and pruned_heads are updated.

        Raises:
            Nothing directly. Exceptions raised by 'find_pruneable_heads_and_indices' or 'prune_linear_layer'
            propagate to the caller.
        """
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.n_heads, self.key_value_proj_dim, self.pruned_heads
        )
        # Prune linear layers
        self.q = prune_linear_layer(self.q, index)
        self.k = prune_linear_layer(self.k, index)
        self.v = prune_linear_layer(self.v, index)
        self.o = prune_linear_layer(self.o, index, dim=1)
        # Update hyper params
        self.n_heads = self.n_heads - len(heads)
        self.inner_dim = self.key_value_proj_dim * self.n_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    @staticmethod
    def _relative_position_bucket(relative_position, bidirectional=True, num_buckets=32, max_distance=128):
        """
        Adapted from Mesh Tensorflow:
        https://github.com/tensorflow/mesh/blob/0cb87fe07da627bf0b7e60475d59f95ed6b5be3d/mesh_tensorflow/transformer/transformer_layers.py#L593

        Translate relative position to a bucket number for relative attention. The relative position is defined as
        memory_position - query_position, i.e. the distance in tokens from the attending position to the attended-to
        position. If bidirectional=False, then positive relative positions are invalid. We use smaller buckets for
        small absolute relative_position and larger buckets for larger absolute relative_positions. All relative
        positions >=max_distance map to the same bucket. All relative positions <=-max_distance map to the same bucket.
        This should allow for more graceful generalization to longer sequences than the model has been trained on

        Args:
            relative_position: an int32 Tensor
            bidirectional: a boolean - whether the attention is bidirectional
            num_buckets: an integer
            max_distance: an integer

        Returns:
            a Tensor with the same shape as relative_position, containing int32 values in the range [0, num_buckets)
        """
        relative_buckets = 0
        if bidirectional:
            num_buckets //= 2
            relative_buckets += (relative_position > 0).to(mindspore.int64) * num_buckets
            relative_position = ops.abs(relative_position)
        else:
            relative_position = -ops.minimum(relative_position, ops.zeros_like(relative_position))
        # now relative_position is in the range [0, inf)

        # half of the buckets are for exact increments in positions
        max_exact = num_buckets // 2
        is_small = relative_position < max_exact

        # The other half of the buckets are for logarithmically bigger bins in positions up to max_distance
        relative_position_if_large = max_exact + (
            ops.log(relative_position.float() / max_exact)
            / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        ).to(mindspore.int64)
        relative_position_if_large = ops.minimum(
            relative_position_if_large, ops.full_like(relative_position_if_large, num_buckets - 1)
        )

        relative_buckets += ops.where(is_small, relative_position, relative_position_if_large)
        return relative_buckets

    def compute_bias(self, query_length, key_length):
        """Compute binned relative position bias"""
        # if device is None:
        #     device = self.relative_attention_bias.weight.device
        context_position = ops.arange(query_length, dtype=mindspore.int64)[:, None]
        memory_position = ops.arange(key_length, dtype=mindspore.int64)[None, :]
        relative_position = memory_position - context_position  # shape (query_length, key_length)
        relative_position_bucket = self._relative_position_bucket(
            relative_position,  # shape (query_length, key_length)
            bidirectional=(not self.is_decoder),
            num_buckets=self.relative_attention_num_buckets,
            max_distance=self.relative_attention_max_distance,
        )
        values = self.relative_attention_bias(relative_position_bucket)  # shape (query_length, key_length, num_heads)
        values = values.permute([2, 0, 1]).unsqueeze(0)  # shape (1, num_heads, query_length, key_length)
        return values

    def forward(
        self,
        hidden_states,
        mask=None,
        key_value_states=None,
        position_bias=None,
        past_key_value=None,
        layer_head_mask=None,
        query_length=None,
        use_cache=False,
        output_attentions=False,
    ):
        """
        Self-attention (if key_value_states is None) or attention over source sentence (provided by key_value_states).
        """
        # Input is (batch_size, seq_length, dim)
        # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length)
        # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head)
        batch_size, seq_length = hidden_states.shape[:2]

        real_seq_length = seq_length

        if past_key_value is not None:
            if len(past_key_value) != 2:
                raise ValueError(
                    f"past_key_value should have 2 past states: keys and values. Got {len(past_key_value)} past states"
                )
            real_seq_length += past_key_value[0].shape[2] if query_length is None else query_length

        key_length = real_seq_length if key_value_states is None else key_value_states.shape[1]

        def shape(states):
            """projection"""
            return states.view(batch_size, -1, self.n_heads, self.key_value_proj_dim).transpose(0, 2, 1, 3)

        def unshape(states):
            """reshape"""
            return states.transpose(0, 2, 1, 3).view(batch_size, -1, self.inner_dim)

        def project(hidden_states, proj_layer, key_value_states, past_key_value):
            """projects hidden states correctly to key/query states"""
            if key_value_states is None:
                # self-attn
                # (batch_size, n_heads, seq_length, dim_per_head)
                hidden_states = shape(proj_layer(hidden_states))
            elif past_key_value is None:
                # cross-attn
                # (batch_size, n_heads, seq_length, dim_per_head)
                hidden_states = shape(proj_layer(key_value_states))

            if past_key_value is not None:
                if key_value_states is None:
                    # self-attn
                    # (batch_size, n_heads, key_length, dim_per_head)
                    hidden_states = ops.cat([past_key_value, hidden_states], axis=2)
                elif past_key_value.shape[2] != key_value_states.shape[1]:
                    # checking that the `sequence_length` of the `past_key_value` is the same as
                    # the provided `key_value_states` to support prefix tuning
                    # cross-attn
                    # (batch_size, n_heads, seq_length, dim_per_head)
                    hidden_states = shape(proj_layer(key_value_states))
                else:
                    # cross-attn
                    hidden_states = past_key_value
            return hidden_states

        # get query states
        query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)

        # get key/value states
        key_states = project(
            hidden_states, self.k, key_value_states, past_key_value[0] if past_key_value is not None else None
        )
        value_states = project(
            hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
        )

        # compute scores
        scores = ops.matmul(
            query_states, key_states.transpose(0, 1, 3, 2)
        )  # equivalent of ops.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9

        if position_bias is None:
            if not self.has_relative_attention_bias:
                position_bias = ops.zeros(
                    (1, self.n_heads, real_seq_length, key_length), dtype=scores.dtype
                )
                if self.gradient_checkpointing and self.training:
                    position_bias.requires_grad = True
            else:
                position_bias = self.compute_bias(real_seq_length, key_length)

            # if key and values are already calculated
            # we want only the last query position bias
            if past_key_value is not None:
                position_bias = position_bias[:, :, -hidden_states.shape[1] :, :]

            if mask is not None:
                position_bias = position_bias + mask  # (batch_size, n_heads, seq_length, key_length)

        if self.pruned_heads:
            mask = ops.ones(position_bias.shape[1])
            mask[list(self.pruned_heads)] = 0
            position_bias_masked = position_bias[:, mask.bool()]
        else:
            position_bias_masked = position_bias

        scores += position_bias_masked
        attn_weights = ops.softmax(scores.float(), axis=-1).astype(
            scores.dtype
        )  # (batch_size, n_heads, seq_length, key_length)
        attn_weights = ops.dropout(
            attn_weights, p=self.dropout, training=self.training
        )  # (batch_size, n_heads, seq_length, key_length)

        # Mask heads if we want to
        if layer_head_mask is not None:
            attn_weights = attn_weights * layer_head_mask

        attn_output = unshape(ops.matmul(attn_weights, value_states))  # (batch_size, seq_length, dim)
        attn_output = self.o(attn_output)

        present_key_value_state = (key_states, value_states) if (self.is_decoder and use_cache) else None
        outputs = (attn_output,) + (present_key_value_state,) + (position_bias,)

        if output_attentions:
            outputs = outputs + (attn_weights,)
        return outputs

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoAttention.__init__(config, has_relative_attention_bias=False)

Initializes an instance of the Pop2PianoAttention class.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoAttention class.

config

An instance of Pop2PianoConfig containing the configuration parameters.

TYPE: Pop2PianoConfig

has_relative_attention_bias

A boolean indicating whether relative attention bias is enabled.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config: Pop2PianoConfig, has_relative_attention_bias=False):
    """
    Initializes an instance of the Pop2PianoAttention class.

    Args:
        self: The instance of the Pop2PianoAttention class.
        config (Pop2PianoConfig): An instance of Pop2PianoConfig containing the configuration parameters.
        has_relative_attention_bias (bool): A boolean indicating whether relative attention bias is enabled.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.is_decoder = config.is_decoder
    self.has_relative_attention_bias = has_relative_attention_bias
    self.relative_attention_num_buckets = config.relative_attention_num_buckets
    self.relative_attention_max_distance = config.relative_attention_max_distance
    self.d_model = config.d_model
    self.key_value_proj_dim = config.d_kv
    self.n_heads = config.num_heads
    self.dropout = config.dropout_rate
    self.inner_dim = self.n_heads * self.key_value_proj_dim

    # Mesh TensorFlow initialization to avoid scaling before softmax
    self.q = nn.Linear(self.d_model, self.inner_dim, bias=False)
    self.k = nn.Linear(self.d_model, self.inner_dim, bias=False)
    self.v = nn.Linear(self.d_model, self.inner_dim, bias=False)
    self.o = nn.Linear(self.inner_dim, self.d_model, bias=False)

    if self.has_relative_attention_bias:
        self.relative_attention_bias = nn.Embedding(self.relative_attention_num_buckets, self.n_heads)
    self.pruned_heads = set()
    self.gradient_checkpointing = False

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoAttention.compute_bias(query_length, key_length)

Computes the binned relative position bias and returns it as a tensor of shape (1, num_heads, query_length, key_length).
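The steps in compute_bias can be sketched with nested lists: build the matrix of relative positions, map each entry to a bucket, look up one bias value per head, and move the head axis to the front. This is a minimal sketch under stated assumptions; bucket_fn and table are hypothetical stand-ins for _relative_position_bucket and the relative_attention_bias embedding weights.

```python
def compute_bias(query_length, key_length, bucket_fn, table):
    """Pure-Python sketch; table[b][h] is the bias of bucket b for head h (hypothetical layout)."""
    n_heads = len(table[0])
    # relative_position[q][k] = memory_position - context_position
    relative_position = [[k - q for k in range(key_length)] for q in range(query_length)]
    buckets = [[bucket_fn(r) for r in row] for row in relative_position]
    # Look up the bias per head and put the head axis first.
    per_head = [[[table[b][h] for b in row] for row in buckets] for h in range(n_heads)]
    return [per_head]  # leading axis of size 1, broadcast over the batch


# Toy setup: two buckets (offset <= 0, offset > 0) and two heads.
toy_table = [[0.1, 0.2], [0.3, 0.4]]
bias = compute_bias(2, 2, lambda r: 0 if r <= 0 else 1, toy_table)
print(bias[0][0])  # [[0.1, 0.3], [0.1, 0.1]]
```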

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def compute_bias(self, query_length, key_length):
    """Compute binned relative position bias"""
    # if device is None:
    #     device = self.relative_attention_bias.weight.device
    context_position = ops.arange(query_length, dtype=mindspore.int64)[:, None]
    memory_position = ops.arange(key_length, dtype=mindspore.int64)[None, :]
    relative_position = memory_position - context_position  # shape (query_length, key_length)
    relative_position_bucket = self._relative_position_bucket(
        relative_position,  # shape (query_length, key_length)
        bidirectional=(not self.is_decoder),
        num_buckets=self.relative_attention_num_buckets,
        max_distance=self.relative_attention_max_distance,
    )
    values = self.relative_attention_bias(relative_position_bucket)  # shape (query_length, key_length, num_heads)
    values = values.permute([2, 0, 1]).unsqueeze(0)  # shape (1, num_heads, query_length, key_length)
    return values

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoAttention.forward(hidden_states, mask=None, key_value_states=None, position_bias=None, past_key_value=None, layer_head_mask=None, query_length=None, use_cache=False, output_attentions=False)

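Self-attention (if key_value_states is None) or attention over the source sequence (provided by key_value_states). Returns a tuple (attn_output, present_key_value_state, position_bias), with attn_weights appended when output_attentions is True.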
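The core computation in forward is softmax(Q K^T + bias) V, where the scores are not divided by the square root of the head dimension and the attention mask is folded into the additive position bias. A minimal single-head sketch in pure Python follows; the helper names softmax and attend are assumptions for illustration, and dropout, head masking, and caching are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, position_bias):
    """Single-head attention without 1/sqrt(d) scaling (illustrative sketch)."""
    dim = len(values[0])
    outputs = []
    for query, bias_row in zip(queries, position_bias):
        # Raw dot-product scores plus the additive bias (which may include the mask).
        scores = [sum(q * k for q, k in zip(query, key)) + b for key, b in zip(keys, bias_row)]
        weights = softmax(scores)
        outputs.append([sum(w * value[d] for w, value in zip(weights, values)) for d in range(dim)])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attend([[0.0, 0.0]], keys, values, [[0.0, 0.0]]))   # [[0.5, 0.5]]
print(attend([[0.0, 0.0]], keys, values, [[0.0, -1e9]]))  # [[1.0, 0.0]]
```

The second call shows how a large negative bias entry removes a key from the softmax, which is how a mask added to position_bias takes effect.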

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    hidden_states,
    mask=None,
    key_value_states=None,
    position_bias=None,
    past_key_value=None,
    layer_head_mask=None,
    query_length=None,
    use_cache=False,
    output_attentions=False,
):
    """
    Self-attention (if key_value_states is None) or attention over source sentence (provided by key_value_states).
    """
    # Input is (batch_size, seq_length, dim)
    # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length)
    # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head)
    batch_size, seq_length = hidden_states.shape[:2]

    real_seq_length = seq_length

    if past_key_value is not None:
        if len(past_key_value) != 2:
            raise ValueError(
                f"past_key_value should have 2 past states: keys and values. Got {len(past_key_value)} past states"
            )
        real_seq_length += past_key_value[0].shape[2] if query_length is None else query_length

    key_length = real_seq_length if key_value_states is None else key_value_states.shape[1]

    def shape(states):
        """projection"""
        return states.view(batch_size, -1, self.n_heads, self.key_value_proj_dim).transpose(0, 2, 1, 3)

    def unshape(states):
        """reshape"""
        return states.transpose(0, 2, 1, 3).view(batch_size, -1, self.inner_dim)

    def project(hidden_states, proj_layer, key_value_states, past_key_value):
        """projects hidden states correctly to key/query states"""
        if key_value_states is None:
            # self-attn
            # (batch_size, n_heads, seq_length, dim_per_head)
            hidden_states = shape(proj_layer(hidden_states))
        elif past_key_value is None:
            # cross-attn
            # (batch_size, n_heads, seq_length, dim_per_head)
            hidden_states = shape(proj_layer(key_value_states))

        if past_key_value is not None:
            if key_value_states is None:
                # self-attn
                # (batch_size, n_heads, key_length, dim_per_head)
                hidden_states = ops.cat([past_key_value, hidden_states], axis=2)
            elif past_key_value.shape[2] != key_value_states.shape[1]:
                # checking that the `sequence_length` of the `past_key_value` is the same as
                # the provided `key_value_states` to support prefix tuning
                # cross-attn
                # (batch_size, n_heads, seq_length, dim_per_head)
                hidden_states = shape(proj_layer(key_value_states))
            else:
                # cross-attn
                hidden_states = past_key_value
        return hidden_states

    # get query states
    query_states = shape(self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)

    # get key/value states
    key_states = project(
        hidden_states, self.k, key_value_states, past_key_value[0] if past_key_value is not None else None
    )
    value_states = project(
        hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
    )

    # compute scores
    scores = ops.matmul(
        query_states, key_states.transpose(0, 1, 3, 2)
    )  # equivalent of ops.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9

    if position_bias is None:
        if not self.has_relative_attention_bias:
            position_bias = ops.zeros(
                (1, self.n_heads, real_seq_length, key_length), dtype=scores.dtype
            )
            if self.gradient_checkpointing and self.training:
                position_bias.requires_grad = True
        else:
            position_bias = self.compute_bias(real_seq_length, key_length)

        # if key and values are already calculated
        # we want only the last query position bias
        if past_key_value is not None:
            position_bias = position_bias[:, :, -hidden_states.shape[1] :, :]

        if mask is not None:
            position_bias = position_bias + mask  # (batch_size, n_heads, seq_length, key_length)

    if self.pruned_heads:
        mask = ops.ones(position_bias.shape[1])
        mask[list(self.pruned_heads)] = 0
        position_bias_masked = position_bias[:, mask.bool()]
    else:
        position_bias_masked = position_bias

    scores += position_bias_masked
    attn_weights = ops.softmax(scores.float(), axis=-1).astype(
        scores.dtype
    )  # (batch_size, n_heads, seq_length, key_length)
    attn_weights = ops.dropout(
        attn_weights, p=self.dropout, training=self.training
    )  # (batch_size, n_heads, seq_length, key_length)

    # Mask heads if we want to
    if layer_head_mask is not None:
        attn_weights = attn_weights * layer_head_mask

    attn_output = unshape(ops.matmul(attn_weights, value_states))  # (batch_size, seq_length, dim)
    attn_output = self.o(attn_output)

    present_key_value_state = (key_states, value_states) if (self.is_decoder and use_cache) else None
    outputs = (attn_output,) + (present_key_value_state,) + (position_bias,)

    if output_attentions:
        outputs = outputs + (attn_weights,)
    return outputs

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoAttention.prune_heads(heads)

Prunes the given attention heads from the layer.

PARAMETER DESCRIPTION
self

Represents the instance of the class 'Pop2PianoAttention'. It is used to access the class attributes and methods.

heads

A list containing the indices of attention heads to be pruned. The indices should be within the range of the total number of attention heads. If the list is empty, no action will be taken.

RETURNS DESCRIPTION
None

The instance is modified in place: the q, k, v, and o projections are pruned, and n_heads, inner_dim, and pruned_heads are updated.
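Pruning a head removes the slice of size key_value_proj_dim that belongs to it from the output features of q, k, and v and from the input features of o. The index arithmetic can be sketched as below; kept_indices is a hypothetical helper written for this example, and unlike find_pruneable_heads_and_indices it does not account for heads that were pruned earlier.

```python
def kept_indices(heads_to_prune, n_heads, head_size):
    """Return the feature indices that survive pruning (illustrative sketch)."""
    heads_to_prune = set(heads_to_prune)
    return [
        head * head_size + offset
        for head in range(n_heads)
        if head not in heads_to_prune
        for offset in range(head_size)
    ]


# Three heads of size 2; pruning head 1 drops features 2 and 3.
index = kept_indices([1], n_heads=3, head_size=2)
print(index)  # [0, 1, 4, 5]

# The updated hyperparameters follow the same rule as prune_heads.
n_heads = 3 - 1
inner_dim = 2 * n_heads
print(n_heads, inner_dim)  # 2 4
```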

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def prune_heads(self, heads):
    """
    Prunes the given attention heads from the layer.

    Args:
        self: The Pop2PianoAttention instance.

        heads: A list containing the indices of attention heads to be pruned.
            The indices should be within the range of the total number of attention heads.
            If the list is empty, no action is taken.

    Returns:
        None: The instance is modified in place. The q, k, v, and o projections are pruned,
            and n_heads, inner_dim, and pruned_heads are updated.

    Raises:
        Nothing directly. Exceptions raised by 'find_pruneable_heads_and_indices' or 'prune_linear_layer'
        propagate to the caller.
    """
    if len(heads) == 0:
        return
    heads, index = find_pruneable_heads_and_indices(
        heads, self.n_heads, self.key_value_proj_dim, self.pruned_heads
    )
    # Prune linear layers
    self.q = prune_linear_layer(self.q, index)
    self.k = prune_linear_layer(self.k, index)
    self.v = prune_linear_layer(self.v, index)
    self.o = prune_linear_layer(self.o, index, dim=1)
    # Update hyper params
    self.n_heads = self.n_heads - len(heads)
    self.inner_dim = self.key_value_proj_dim * self.n_heads
    self.pruned_heads = self.pruned_heads.union(heads)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoBlock

Bases: Module

This class represents a block of the Pop2Piano model. It is a subclass of nn.Module and contains layers for self-attention, cross-attention (if applicable), and feed-forward processing.

ATTRIBUTE DESCRIPTION
is_decoder

Indicates whether the block is a decoder block or not.

TYPE: bool

layer

Sub-layers of the block, in order: self-attention, cross-attention (decoder blocks only), and feed-forward.

TYPE: ModuleList

METHOD DESCRIPTION
__init__

Initializes a new instance of the Pop2PianoBlock class.

forward

Runs the block by applying its sub-layers sequentially to the input hidden states.
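The composition rule in __init__ means an encoder block has two sub-layers and a decoder block has three. A small sketch of that ordering is given here; block_sublayers and the string labels are hypothetical and only illustrate which position each sub-layer takes in the layer list.

```python
def block_sublayers(is_decoder):
    """List the sub-layers in the order they are appended (illustrative sketch)."""
    names = ["self_attention"]
    if is_decoder:
        # Cross-attention over the encoder output exists only in decoder blocks.
        names.append("cross_attention")
    names.append("feed_forward")
    return names


print(block_sublayers(is_decoder=False))  # ['self_attention', 'feed_forward']
print(block_sublayers(is_decoder=True))   # ['self_attention', 'cross_attention', 'feed_forward']
```

In both cases the feed-forward layer is the last entry, so it can be addressed with index -1 regardless of whether the block is a decoder.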

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoBlock(nn.Module):

    """
    This class represents a block of the Pop2Piano model. It is a subclass of nn.Module and contains layers for
    self-attention, cross-attention (if applicable), and feed-forward processing.

    Attributes:
        is_decoder (bool): Indicates whether the block is a decoder block or not.
        layer (nn.ModuleList): List of layers in the block, including self-attention, cross-attention, and
            feed-forward layers.

    Methods:
        __init__: Initializes a new instance of the Pop2PianoBlock class.
        forward: Constructs the block by applying the layers sequentially to the input hidden states.

    """
    def __init__(self, config, has_relative_attention_bias=False):
        """
        Initializes a new instance of the Pop2PianoBlock class.

        Args:
            self: The class instance that the method operates on.
            config: An instance of the configuration class that contains the model configuration.
            has_relative_attention_bias: A boolean value indicating whether the model has relative attention bias.
                Defaults to False.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.is_decoder = config.is_decoder
        self.layer = nn.ModuleList()
        self.layer.append(Pop2PianoLayerSelfAttention(config, has_relative_attention_bias=has_relative_attention_bias))
        if self.is_decoder:
            self.layer.append(Pop2PianoLayerCrossAttention(config))

        self.layer.append(Pop2PianoLayerFF(config))

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        position_bias=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        encoder_decoder_position_bias=None,
        layer_head_mask=None,
        cross_attn_layer_head_mask=None,
        past_key_value=None,
        use_cache=False,
        output_attentions=False,
    ):
        """
        Runs the forward pass of the Pop2PianoBlock.

        Applies self-attention, then cross-attention (decoder blocks only, and only when encoder_hidden_states
        is given), then the feed-forward layer to the input hidden states.

        Args:
            self (Pop2PianoBlock): The instance of the Pop2PianoBlock class.
            hidden_states (Tensor): The input hidden states. It has shape (batch_size, sequence_length, hidden_size).
            attention_mask (Tensor, optional): The attention mask tensor. It has shape (batch_size, sequence_length)
                and each element is either 0 or 1. Defaults to None.
            position_bias (Tensor, optional): The position bias tensor.
                It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.
            encoder_hidden_states (Tensor, optional): The encoder hidden states tensor.
                It has shape (batch_size, sequence_length, hidden_size). Defaults to None.
            encoder_attention_mask (Tensor, optional): The encoder attention mask tensor.
                It has shape (batch_size, sequence_length) and each element is either 0 or 1. Defaults to None.
            encoder_decoder_position_bias (Tensor, optional): The encoder-decoder position bias tensor.
                It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.
            layer_head_mask (Tensor, optional): The layer head mask tensor.
                It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.
            cross_attn_layer_head_mask (Tensor, optional): The cross-attention layer head mask tensor.
                It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.
            past_key_value (Tuple[Tensor], optional): The cached key and value states from a previous step.
                The tuple holds two tensors (self-attention key and value) when encoder_hidden_states is None,
                and four tensors (self-attention key and value followed by cross-attention key and value)
                otherwise. Defaults to None.
            use_cache (bool, optional): Whether to use cache for the attention outputs. Defaults to False.
            output_attentions (bool, optional): Whether to output attentions. Defaults to False.

        Returns:
            Tuple[Tensor]: The output hidden states, followed by the present key-value states if use_cache is
                True, followed by the attention outputs (position biases, plus attention weights if
                output_attentions is True).

        Raises:
            ValueError: If the length of past_key_value is not equal to the expected number of past states.

        Note:
            A warning is logged rather than raised if past_key_value is passed to a non-decoder block.
        """
        if past_key_value is not None:
            if not self.is_decoder:
                logger.warning("`past_key_values` is passed to the encoder. Please make sure this is intended.")
            expected_num_past_key_values = 2 if encoder_hidden_states is None else 4

            if len(past_key_value) != expected_num_past_key_values:
                raise ValueError(
                    f"There should be {expected_num_past_key_values} past states. "
                    f"{'2 (past / key) for cross attention. ' if expected_num_past_key_values == 4 else ''}"
                    f"Got {len(past_key_value)} past key / value states"
                )

            self_attn_past_key_value = past_key_value[:2]
            cross_attn_past_key_value = past_key_value[2:]
        else:
            self_attn_past_key_value, cross_attn_past_key_value = None, None

        self_attention_outputs = self.layer[0](
            hidden_states,
            attention_mask=attention_mask,
            position_bias=position_bias,
            layer_head_mask=layer_head_mask,
            past_key_value=self_attn_past_key_value,
            use_cache=use_cache,
            output_attentions=output_attentions,
        )
        hidden_states, present_key_value_state = self_attention_outputs[:2]
        attention_outputs = self_attention_outputs[2:]  # Keep self-attention outputs and relative position weights

        # clamp inf values to enable fp16 training
        if hidden_states.dtype == mindspore.float16:
            clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
                finfo(hidden_states.dtype, 'max')
            hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

        do_cross_attention = self.is_decoder and encoder_hidden_states is not None
        if do_cross_attention:
            # the actual query length is unknown for cross attention
            # if using past key value states. Need to inject it here
            if present_key_value_state is not None:
                query_length = present_key_value_state[0].shape[2]
            else:
                query_length = None

            cross_attention_outputs = self.layer[1](
                hidden_states,
                key_value_states=encoder_hidden_states,
                attention_mask=encoder_attention_mask,
                position_bias=encoder_decoder_position_bias,
                layer_head_mask=cross_attn_layer_head_mask,
                past_key_value=cross_attn_past_key_value,
                query_length=query_length,
                use_cache=use_cache,
                output_attentions=output_attentions,
            )
            hidden_states = cross_attention_outputs[0]

            # clamp inf values to enable fp16 training
            if hidden_states.dtype == mindspore.float16:
                clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
                    finfo(hidden_states.dtype, 'max')
                hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

            # Combine self attn and cross attn key value states
            if present_key_value_state is not None:
                present_key_value_state = present_key_value_state + cross_attention_outputs[1]

            # Keep cross-attention outputs and relative position weights
            attention_outputs = attention_outputs + cross_attention_outputs[2:]

        # Apply Feed Forward layer
        hidden_states = self.layer[-1](hidden_states)

        # clamp inf values to enable fp16 training
        if hidden_states.dtype == mindspore.float16:
            clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
                    finfo(hidden_states.dtype, 'max')
            hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

        outputs = (hidden_states,)

        if use_cache:
            outputs = outputs + (present_key_value_state,) + attention_outputs
        else:
            outputs = outputs + attention_outputs

        return outputs  # hidden-states, present_key_value_states, (self-attention position bias), (self-attention weights), (cross-attention position bias), (cross-attention weights)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoBlock.__init__(config, has_relative_attention_bias=False)

Initializes a new instance of the Pop2PianoBlock class.

PARAMETER DESCRIPTION
self

The class instance that the method operates on.

config

An instance of the configuration class that contains the model configuration.

has_relative_attention_bias

A boolean value indicating whether the model has relative attention bias. Defaults to False.

DEFAULT: False

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config, has_relative_attention_bias=False):
    """
    Initializes a new instance of the Pop2PianoBlock class.

    Args:
        self: The class instance that the method operates on.
        config: An instance of the configuration class that contains the model configuration.
        has_relative_attention_bias: A boolean value indicating whether the model has relative attention bias.
            Defaults to False.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.is_decoder = config.is_decoder
    self.layer = nn.ModuleList()
    self.layer.append(Pop2PianoLayerSelfAttention(config, has_relative_attention_bias=has_relative_attention_bias))
    if self.is_decoder:
        self.layer.append(Pop2PianoLayerCrossAttention(config))

    self.layer.append(Pop2PianoLayerFF(config))

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoBlock.forward(hidden_states, attention_mask=None, position_bias=None, encoder_hidden_states=None, encoder_attention_mask=None, encoder_decoder_position_bias=None, layer_head_mask=None, cross_attn_layer_head_mask=None, past_key_value=None, use_cache=False, output_attentions=False)

Runs the forward pass of the Pop2PianoBlock.

Applies self-attention, then cross-attention (decoder blocks only, and only when encoder_hidden_states is given), then the feed-forward layer to the input hidden states.
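
After each sublayer, the source listing clamps float16 hidden states so that infinities do not propagate. A minimal sketch of that rule on plain Python floats, assuming the float16 maximum of 65504; `clamp_like_fp16` is a hypothetical helper, not part of the library:

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value


def clamp_like_fp16(values):
    """Clamp a list of floats the way the block clamps float16 hidden states."""
    has_inf = any(math.isinf(v) for v in values)
    # When an inf is present, the bound is pulled in by 1000 below the dtype maximum.
    clamp_value = FP16_MAX - 1000 if has_inf else FP16_MAX
    return [max(-clamp_value, min(clamp_value, v)) for v in values]


clamped = clamp_like_fp16([1.0, float("inf"), float("-inf")])
unchanged = clamp_like_fp16([1.0, -2.5])
print(clamped)
print(unchanged)
```

In the real method the same rule is applied with `ops.isinf` and `ops.clamp`, and only when the tensor dtype is `mindspore.float16`.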

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoBlock class.

TYPE: Pop2PianoBlock

hidden_states

The input hidden states. It has shape (batch_size, sequence_length, hidden_size).

TYPE: Tensor

attention_mask

The attention mask tensor. It has shape (batch_size, sequence_length) and each element is either 0 or 1. Defaults to None.

TYPE: Tensor DEFAULT: None

position_bias

The position bias tensor. It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

encoder_hidden_states

The encoder hidden states tensor. It has shape (batch_size, sequence_length, hidden_size). Defaults to None.

TYPE: Tensor DEFAULT: None

encoder_attention_mask

The encoder attention mask tensor. It has shape (batch_size, sequence_length) and each element is either 0 or 1. Defaults to None.

TYPE: Tensor DEFAULT: None

encoder_decoder_position_bias

The encoder-decoder position bias tensor. It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.

TYPE: Tensor DEFAULT: None

layer_head_mask

The layer head mask tensor. It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.

TYPE: Tensor DEFAULT: None

cross_attn_layer_head_mask

The cross-attention layer head mask tensor. It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.

TYPE: Tensor DEFAULT: None

past_key_value

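The cached key and value states from a previous step. The tuple holds two tensors (self-attention key and value) when encoder_hidden_states is None, and four tensors (self-attention key and value followed by cross-attention key and value) otherwise. Defaults to None.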

TYPE: Tuple[Tensor] DEFAULT: None

use_cache

Whether to use cache for the attention outputs. Defaults to False.

TYPE: bool DEFAULT: False

output_attentions

Whether to output attentions. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

Tuple[Tensor]: The output hidden states, followed by the present key-value states if use_cache is True, followed by the attention outputs (position biases, plus attention weights if output_attentions is True).
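
The way the returned tuple is assembled can be sketched as follows. This mirrors the last lines of the source listing; `assemble_outputs` is a hypothetical name, and strings stand in for tensors.

```python
def assemble_outputs(hidden_states, present_key_value_state, attention_outputs, use_cache):
    """Build the output tuple the same way the end of Pop2PianoBlock.forward does."""
    outputs = (hidden_states,)
    if use_cache:
        # The cache entry sits between the hidden states and the attention outputs.
        outputs = outputs + (present_key_value_state,) + attention_outputs
    else:
        outputs = outputs + attention_outputs
    return outputs


with_cache = assemble_outputs("h", ("k", "v"), ("self_bias",), use_cache=True)
without_cache = assemble_outputs("h", ("k", "v"), ("self_bias",), use_cache=False)
print(with_cache)
print(without_cache)
```

Callers that index into the result should therefore account for `use_cache`, since it shifts the position of every attention output by one.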

RAISES DESCRIPTION
ValueError

If the length of past_key_value is not equal to the expected number of past states.

Note

A warning is logged rather than raised if past_key_value is passed to a non-decoder block.
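
The length check and the split of `past_key_value` can be sketched in isolation. This is an illustration under the assumption that strings stand in for tensors; `split_past_key_value` is a hypothetical helper that follows the branching in the source listing.

```python
def split_past_key_value(past_key_value, has_encoder_hidden_states):
    """Validate the cache length and split it into self- and cross-attention parts."""
    if past_key_value is None:
        return None, None
    # Two states for self-attention only, four when cross-attention is also cached.
    expected = 4 if has_encoder_hidden_states else 2
    if len(past_key_value) != expected:
        raise ValueError(
            f"There should be {expected} past states. "
            f"Got {len(past_key_value)} past key / value states"
        )
    return past_key_value[:2], past_key_value[2:]


self_part, cross_part = split_past_key_value(("sk", "sv", "ck", "cv"), True)
print(self_part, cross_part)
```

With only two cached states and no encoder hidden states, the cross-attention part is an empty tuple rather than `None`.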

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    hidden_states,
    attention_mask=None,
    position_bias=None,
    encoder_hidden_states=None,
    encoder_attention_mask=None,
    encoder_decoder_position_bias=None,
    layer_head_mask=None,
    cross_attn_layer_head_mask=None,
    past_key_value=None,
    use_cache=False,
    output_attentions=False,
):
    """
    Runs the forward pass of the Pop2PianoBlock.

    Applies self-attention, then cross-attention (decoder blocks only, and only when encoder_hidden_states
    is given), then the feed-forward layer to the input hidden states.

    Args:
        self (Pop2PianoBlock): The instance of the Pop2PianoBlock class.
        hidden_states (Tensor): The input hidden states. It has shape (batch_size, sequence_length, hidden_size).
        attention_mask (Tensor, optional): The attention mask tensor. It has shape (batch_size, sequence_length)
            and each element is either 0 or 1. Defaults to None.
        position_bias (Tensor, optional): The position bias tensor.
            It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.
        encoder_hidden_states (Tensor, optional): The encoder hidden states tensor.
            It has shape (batch_size, sequence_length, hidden_size). Defaults to None.
        encoder_attention_mask (Tensor, optional): The encoder attention mask tensor.
            It has shape (batch_size, sequence_length) and each element is either 0 or 1. Defaults to None.
        encoder_decoder_position_bias (Tensor, optional): The encoder-decoder position bias tensor.
            It has shape (batch_size, num_heads, sequence_length, sequence_length). Defaults to None.
        layer_head_mask (Tensor, optional): The layer head mask tensor.
            It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.
        cross_attn_layer_head_mask (Tensor, optional): The cross-attention layer head mask tensor.
            It has shape (num_hidden_layers, num_heads) and each element is either 0 or 1. Defaults to None.
        past_key_value (Tuple[Tensor], optional): The cached key and value states from a previous step.
            The tuple holds two tensors (self-attention key and value) when encoder_hidden_states is None,
            and four tensors (self-attention key and value followed by cross-attention key and value)
            otherwise. Defaults to None.
        use_cache (bool, optional): Whether to use cache for the attention outputs. Defaults to False.
        output_attentions (bool, optional): Whether to output attentions. Defaults to False.

    Returns:
        Tuple[Tensor]: The output hidden states, followed by the present key-value states if use_cache is
            True, followed by the attention outputs (position biases, plus attention weights if
            output_attentions is True).

    Raises:
        ValueError: If the length of past_key_value is not equal to the expected number of past states.

    Note:
        A warning is logged rather than raised if past_key_value is passed to a non-decoder block.
    """
    if past_key_value is not None:
        if not self.is_decoder:
            logger.warning("`past_key_values` is passed to the encoder. Please make sure this is intended.")
        expected_num_past_key_values = 2 if encoder_hidden_states is None else 4

        if len(past_key_value) != expected_num_past_key_values:
            raise ValueError(
                f"There should be {expected_num_past_key_values} past states. "
                f"{'2 (past / key) for cross attention. ' if expected_num_past_key_values == 4 else ''}"
                f"Got {len(past_key_value)} past key / value states"
            )

        self_attn_past_key_value = past_key_value[:2]
        cross_attn_past_key_value = past_key_value[2:]
    else:
        self_attn_past_key_value, cross_attn_past_key_value = None, None

    self_attention_outputs = self.layer[0](
        hidden_states,
        attention_mask=attention_mask,
        position_bias=position_bias,
        layer_head_mask=layer_head_mask,
        past_key_value=self_attn_past_key_value,
        use_cache=use_cache,
        output_attentions=output_attentions,
    )
    hidden_states, present_key_value_state = self_attention_outputs[:2]
    attention_outputs = self_attention_outputs[2:]  # Keep self-attention outputs and relative position weights

    # clamp inf values to enable fp16 training
    if hidden_states.dtype == mindspore.float16:
        clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
            finfo(hidden_states.dtype, 'max')
        hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

    do_cross_attention = self.is_decoder and encoder_hidden_states is not None
    if do_cross_attention:
        # the actual query length is unknown for cross attention
        # if using past key value states. Need to inject it here
        if present_key_value_state is not None:
            query_length = present_key_value_state[0].shape[2]
        else:
            query_length = None

        cross_attention_outputs = self.layer[1](
            hidden_states,
            key_value_states=encoder_hidden_states,
            attention_mask=encoder_attention_mask,
            position_bias=encoder_decoder_position_bias,
            layer_head_mask=cross_attn_layer_head_mask,
            past_key_value=cross_attn_past_key_value,
            query_length=query_length,
            use_cache=use_cache,
            output_attentions=output_attentions,
        )
        hidden_states = cross_attention_outputs[0]

        # clamp inf values to enable fp16 training
        if hidden_states.dtype == mindspore.float16:
            clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
                finfo(hidden_states.dtype, 'max')
            hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

        # Combine self attn and cross attn key value states
        if present_key_value_state is not None:
            present_key_value_state = present_key_value_state + cross_attention_outputs[1]

        # Keep cross-attention outputs and relative position weights
        attention_outputs = attention_outputs + cross_attention_outputs[2:]

    # Apply Feed Forward layer
    hidden_states = self.layer[-1](hidden_states)

    # clamp inf values to enable fp16 training
    if hidden_states.dtype == mindspore.float16:
        clamp_value = finfo(hidden_states.dtype, 'max') - 1000 if ops.isinf(hidden_states).any() else \
                finfo(hidden_states.dtype, 'max')
        hidden_states = ops.clamp(hidden_states, min=-clamp_value, max=clamp_value)

    outputs = (hidden_states,)

    if use_cache:
        outputs = outputs + (present_key_value_state,) + attention_outputs
    else:
        outputs = outputs + attention_outputs

    return outputs  # hidden-states, present_key_value_states, (self-attention position bias), (self-attention weights), (cross-attention position bias), (cross-attention weights)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoConcatEmbeddingToMel

Bases: Module

Embedding Matrix for composer tokens.
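
The effect of this module's forward method (shift the index, look up one embedding per sample, prepend it to the features along axis 1) can be sketched with nested Python lists. This is a simplified illustration, not the MindSpore implementation: `concat_composer_embedding` and `embedding_table` are hypothetical names, and lists stand in for tensors.

```python
def concat_composer_embedding(feature, index_value, embedding_offset, embedding_table):
    """Prepend one composer embedding vector to each sample's feature sequence.

    feature: one list per sample, each a list of feature vectors.
    index_value: one composer token index per sample.
    """
    inputs_embeds = []
    for sample, idx in zip(feature, index_value):
        # Shift the token index into the range of the embedding table.
        composer_embedding = embedding_table[idx - embedding_offset]
        # The embedding becomes the first position of the sequence (axis 1).
        inputs_embeds.append([composer_embedding] + sample)
    return inputs_embeds


embedding_table = [[0.1, 0.2], [0.3, 0.4]]  # toy table: 2 composers, d_model = 2
feature = [[[1.0, 1.0], [2.0, 2.0]]]        # batch of 1, sequence length 2
inputs_embeds = concat_composer_embedding(feature, [5], 4, embedding_table)
print(inputs_embeds)
```

The output sequence is one position longer than the input features, which downstream attention masks need to reflect.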

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoConcatEmbeddingToMel(nn.Module):
    """Embedding Matrix for `composer` tokens."""
    def __init__(self, config):
        """
        Initializes the Pop2PianoConcatEmbeddingToMel class.

        Args:
            self: The instance of the Pop2PianoConcatEmbeddingToMel class.
            config:
                A configuration object containing parameters for the initialization.

                - Type: Config
                - Purpose: Specifies the configuration settings for the embedding layer.
                - Restrictions: Must contain the following attributes:

                    - composer_vocab_size: An integer specifying the vocabulary size for the composer.
                    - d_model: An integer specifying the dimension of the embedding.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        self.embedding = nn.Embedding(vocab_size=config.composer_vocab_size, embedding_size=config.d_model)

    def forward(self, feature, index_value, embedding_offset):
        """
        Prepends the composer embedding to the input features to build inputs_embeds.

        Args:
            self (Pop2PianoConcatEmbeddingToMel): The instance of the class.
            feature (Tensor): The input features, to which the composer embedding is prepended along axis 1.
            index_value (Tensor): The composer token indices used for the embedding lookup.
            embedding_offset (int): The offset subtracted from index_value before the embedding lookup.

        Returns:
            Tensor: The composer embedding concatenated with feature along axis 1.

        Raises:
            None
        """
        index_shifted = index_value - embedding_offset
        composer_embedding = self.embedding(index_shifted).unsqueeze(1)
        inputs_embeds = ops.cat([composer_embedding, feature], axis=1)
        return inputs_embeds

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoConcatEmbeddingToMel.__init__(config)

Initializes the Pop2PianoConcatEmbeddingToMel class.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoConcatEmbeddingToMel class.

config

A configuration object containing parameters for the initialization.

  • Type: Config
  • Purpose: Specifies the configuration settings for the embedding layer.
  • Restrictions: Must contain the following attributes:

    • composer_vocab_size: An integer specifying the vocabulary size for the composer.
    • d_model: An integer specifying the dimension of the embedding.

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config):
    """
    Initializes the Pop2PianoConcatEmbeddingToMel class.

    Args:
        self: The instance of the Pop2PianoConcatEmbeddingToMel class.
        config:
            A configuration object containing parameters for the initialization.

            - Type: Config
            - Purpose: Specifies the configuration settings for the embedding layer.
            - Restrictions: Must contain the following attributes:

                - composer_vocab_size: An integer specifying the vocabulary size for the composer.
                - d_model: An integer specifying the dimension of the embedding.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    self.embedding = nn.Embedding(vocab_size=config.composer_vocab_size, embedding_size=config.d_model)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoConcatEmbeddingToMel.forward(feature, index_value, embedding_offset)

Prepends the composer embedding to the input features to build inputs_embeds.

PARAMETER DESCRIPTION
self

The instance of the class.

TYPE: Pop2PianoConcatEmbeddingToMel

feature

The input features, to which the composer embedding is prepended along axis 1.

TYPE: Tensor

index_value

The composer token indices used for the embedding lookup.

TYPE: Tensor

embedding_offset

The offset subtracted from index_value before the embedding lookup.

TYPE: int

RETURNS DESCRIPTION

Tensor: The composer embedding concatenated with feature along axis 1.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(self, feature, index_value, embedding_offset):
    """
    Prepends the composer embedding to the input features to build inputs_embeds.

    Args:
        self (Pop2PianoConcatEmbeddingToMel): The instance of the class.
        feature (Tensor): The input features, to which the composer embedding is prepended along axis 1.
        index_value (Tensor): The composer token indices used for the embedding lookup.
        embedding_offset (int): The offset subtracted from index_value before the embedding lookup.

    Returns:
        Tensor: The composer embedding concatenated with feature along axis 1.

    Raises:
        None
    """
    index_shifted = index_value - embedding_offset
    composer_embedding = self.embedding(index_shifted).unsqueeze(1)
    inputs_embeds = ops.cat([composer_embedding, feature], axis=1)
    return inputs_embeds

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseActDense

Bases: Module

This class represents a Pop2PianoDenseActDense layer, which is used in neural network models. It inherits from the nn.Module class.

The Pop2PianoDenseActDense layer consists of two dense linear transformations (wi and wo), an activation function (act), and a dropout layer (dropout). The layer takes a tensor of hidden states as input and applies the following operations to the input:

  1. The input tensor is passed through the wi dense linear transformation.
  2. The result is then passed through the activation function specified by the Pop2PianoConfig's dense_act_fn attribute.
  3. The output of the activation function is then passed through the dropout layer, which randomly sets elements of the tensor to zero with a probability specified by the Pop2PianoConfig's dropout_rate attribute.
  4. If the weight of the wo dense linear transformation is a tensor and the input tensor's dtype is different from the weight's dtype, and the weight's dtype is not int8, the input tensor is converted to the same dtype as the weight.
  5. The converted input tensor is then passed through the wo dense linear transformation.
  6. The final output of the layer is returned.
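
The steps above can be sketched on a single vector in plain Python. This is a toy illustration under stated assumptions: the activation is taken to be ReLU (the real one comes from `dense_act_fn`), dropout is omitted because it is the identity at inference time, the dtype cast is skipped, and `linear`, `relu`, and `dense_act_dense` are hypothetical helpers.

```python
def linear(x, weight):
    """Bias-free linear layer: weight has d_out rows, each with d_in entries."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


def dense_act_dense(x, wi, wo):
    hidden = linear(x, wi)     # step 1: d_model -> d_ff
    hidden = relu(hidden)      # step 2: activation (assumed ReLU here)
    # step 3: dropout is skipped, as it does nothing at inference time
    return linear(hidden, wo)  # step 5: d_ff -> d_model


wi = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]  # d_ff = 3, d_model = 2
wo = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]     # d_model = 2, d_ff = 3
output = dense_act_dense([2.0, 3.0], wi, wo)
print(output)
```

The output has the same size as the input vector, which is what allows the layer to sit inside a residual connection.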

Please note that this class assumes the existence of the Pop2PianoConfig class, an instance of which should be passed as an argument to the class's constructor.

Example
>>> config = Pop2PianoConfig(...)
>>> layer = Pop2PianoDenseActDense(config)
>>> hidden_states = ...
>>> output = layer.forward(hidden_states)
Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoDenseActDense(nn.Module):

    """
    This class represents a Pop2PianoDenseActDense layer, which is used in neural network models.
    It inherits from the nn.Module class.

    The Pop2PianoDenseActDense layer consists of two dense linear transformations (wi and wo),
    an activation function (act), and a dropout layer (dropout). The layer takes a tensor of hidden states as input
    and applies the following operations to the input:

    1. The input tensor is passed through the wi dense linear transformation.
    2. The result is then passed through the activation function specified by the Pop2PianoConfig's dense_act_fn
    attribute.
    3. The output of the activation function is then passed through the dropout layer, which randomly sets elements
    of the tensor to zero with a probability specified by the Pop2PianoConfig's dropout_rate attribute.
    4. If the weight of the wo dense linear transformation is a tensor and the input tensor's dtype is different from
    the weight's dtype, and the weight's dtype is not int8, the input tensor is converted to the same dtype as the
    weight.
    5. The converted input tensor is then passed through the wo dense linear transformation.
    6. The final output of the layer is returned.

    Please note that this class assumes the existence of the Pop2PianoConfig class, an instance of which should be
    passed as an argument to the class's constructor.

    Example:
        ```python
        >>> config = Pop2PianoConfig(...)
        >>> layer = Pop2PianoDenseActDense(config)
        >>> hidden_states = ...
        >>> output = layer.forward(hidden_states)
        ```
    """
    def __init__(self, config: Pop2PianoConfig):
        """
        Initializes the Pop2PianoDenseActDense class.

        Args:
            self: The instance of the class.
            config (Pop2PianoConfig): An instance of the Pop2PianoConfig class containing the configuration parameters
                for the model. It specifies the model's dimensions and activation function for the dense layers.

        Returns:
            None.

        Raises:
            TypeError: If the 'config' parameter is not of type Pop2PianoConfig.
            ValueError: If the 'config' parameter does not contain valid configuration parameters.
        """
        super().__init__()
        self.wi = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
        self.dropout = nn.Dropout(p=config.dropout_rate)
        self.act = ACT2FN[config.dense_act_fn]

    def forward(self, hidden_states):
        """
        Applies the dense-activation-dense feed-forward transformation to the hidden states.

        Args:
            self: The instance of the Pop2PianoDenseActDense class.
            hidden_states (mindspore.Tensor): The hidden states to be processed.
                Its last dimension should equal config.d_model.

        Returns:
            mindspore.Tensor: The processed hidden states. It has the same shape as the input hidden_states.

        Raises:
            TypeError: If the hidden_states parameter is not of type mindspore.Tensor.
            ValueError: If the shape of the hidden_states parameter is not (batch_size, feature_size).
        """
        hidden_states = self.wi(hidden_states)
        hidden_states = self.act(hidden_states)
        hidden_states = self.dropout(hidden_states)
        if (
            isinstance(self.wo.weight, mindspore.Tensor)
            and hidden_states.dtype != self.wo.weight.dtype
            and self.wo.weight.dtype != mindspore.int8
        ):
            hidden_states = hidden_states.to(self.wo.weight.dtype)
        hidden_states = self.wo(hidden_states)
        return hidden_states

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseActDense.__init__(config)

Initializes the Pop2PianoDenseActDense class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of the Pop2PianoConfig class containing the configuration parameters for the model. It specifies the model's dimensions and activation function for the dense layers.

TYPE: Pop2PianoConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the 'config' parameter is not of type Pop2PianoConfig.

ValueError

If the 'config' parameter does not contain valid configuration parameters.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config: Pop2PianoConfig):
    """
    Initializes the Pop2PianoDenseActDense class.

    Args:
        self: The instance of the class.
        config (Pop2PianoConfig): An instance of the Pop2PianoConfig class containing the configuration parameters
            for the model. It specifies the model's dimensions and activation function for the dense layers.

    Returns:
        None.

    Raises:
        TypeError: If the 'config' parameter is not of type Pop2PianoConfig.
        ValueError: If the 'config' parameter does not contain valid configuration parameters.
    """
    super().__init__()
    self.wi = nn.Linear(config.d_model, config.d_ff, bias=False)
    self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
    self.dropout = nn.Dropout(p=config.dropout_rate)
    self.act = ACT2FN[config.dense_act_fn]

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseActDense.forward(hidden_states)

Applies the feed-forward block (wi, activation, dropout, wo) to the hidden states.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoDenseActDense class.

hidden_states

The hidden states to be processed. It should have a shape of (batch_size, feature_size).

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: The processed hidden states. It has the same shape as the input hidden_states.

RAISES DESCRIPTION
TypeError

If the hidden_states parameter is not of type mindspore.Tensor.

ValueError

If the shape of the hidden_states parameter is not (batch_size, feature_size).

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(self, hidden_states):
    """
    Applies the feed-forward block (wi, activation, dropout, wo) to the hidden states.

    Args:
        self: The instance of the Pop2PianoDenseActDense class.
        hidden_states (mindspore.Tensor): The hidden states to be processed.
            It should have a shape of (batch_size, feature_size).

    Returns:
        mindspore.Tensor: The processed hidden states. It has the same shape as the input hidden_states.

    Raises:
        TypeError: If the hidden_states parameter is not of type mindspore.Tensor.
        ValueError: If the shape of the hidden_states parameter is not (batch_size, feature_size).
    """
    hidden_states = self.wi(hidden_states)
    hidden_states = self.act(hidden_states)
    hidden_states = self.dropout(hidden_states)
    if (
        isinstance(self.wo.weight, mindspore.Tensor)
        and hidden_states.dtype != self.wo.weight.dtype
        and self.wo.weight.dtype != mindspore.int8
    ):
        hidden_states = hidden_states.to(self.wo.weight.dtype)
    hidden_states = self.wo(hidden_states)
    return hidden_states
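
The data flow of this block can be illustrated with a minimal pure-Python sketch. It is not the library implementation: the helper names `matvec`, `relu`, and `dense_act_dense` and the toy weights are hypothetical, the activation is fixed to ReLU for readability (the real one is looked up from `config.dense_act_fn`), and dropout is omitted because it acts as the identity at inference time.

```python
def matvec(weight, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def relu(values):
    """Element-wise ReLU, standing in for the configured activation."""
    return [max(0.0, v) for v in values]


def dense_act_dense(hidden, wi, wo, act=relu):
    """Project d_model -> d_ff, apply the activation, project d_ff -> d_model."""
    return matvec(wo, act(matvec(wi, hidden)))


# Hypothetical toy weights with d_model = 2 and d_ff = 3 (no bias terms).
wi = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # shape (d_ff, d_model)
wo = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]     # shape (d_model, d_ff)

output = dense_act_dense([2.0, 1.0], wi, wo)
print(output)  # the output has d_model entries, like the input
```

The output vector has the same length as the input, which matches the statement that the processed hidden states keep the input shape.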

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseGatedActDense

Bases: Module

A gated feed-forward block for Pop2Piano, configured by Pop2PianoConfig. The input is projected twice, by wi_0 and wi_1. The activated wi_0 projection gates the linear wi_1 projection element-wise, dropout is applied, and wo projects the result back to the model dimension. It inherits from nn.Module.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoDenseGatedActDense(nn.Module):

    """
    A gated feed-forward block for Pop2Piano, configured by Pop2PianoConfig.
    The input is projected twice, by `wi_0` and `wi_1`. The activated `wi_0` projection gates the linear `wi_1`
    projection element-wise, dropout is applied, and `wo` projects the result back to the model dimension.
    It inherits from nn.Module.
    """
    def __init__(self, config: Pop2PianoConfig):
        """
        Initializes a Pop2PianoDenseGatedActDense instance with the provided configuration.

        Args:
            self (Pop2PianoDenseGatedActDense): The instance of the Pop2PianoDenseGatedActDense class.
            config (Pop2PianoConfig):
                An instance of Pop2PianoConfig containing configuration parameters.

                - This parameter is used to configure the dense layers and activation functions.
                - It specifies the dimensions of the model, feed-forward layers, dropout rate, and activation
                function type.

        Returns:
            None.

        Raises:
            ValueError: If the configuration parameters are invalid or missing.
            TypeError: If the data types of the configuration parameters are incorrect.
            KeyError: If the activation function specified in the configuration is not supported.
        """
        super().__init__()
        self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
        self.dropout = nn.Dropout(p=config.dropout_rate)
        self.act = ACT2FN[config.dense_act_fn]

    def forward(self, hidden_states):
        """
        Applies the gated feed-forward transformation to the input hidden states.

        Args:
            self: Instance of the class Pop2PianoDenseGatedActDense.

            hidden_states: A tensor representing the input hidden states. Type: Tensor.

        Returns:
            Tensor: The transformed hidden states, with the same shape as the input hidden_states.

        Raises:
            TypeError: If the input parameters are not of the expected types.
            ValueError: If there are issues with the shapes or values of the tensors being manipulated.
            RuntimeError: If there are runtime issues during the execution of the method.
        """
        hidden_gelu = self.act(self.wi_0(hidden_states))
        hidden_linear = self.wi_1(hidden_states)
        hidden_states = hidden_gelu * hidden_linear
        hidden_states = self.dropout(hidden_states)

        # To make 8bit quantization work for google/flan-t5-xxl, self.wo is kept in float32.
        # See https://github.com/huggingface/transformers/issues/20287
        # We also make sure the weights are not in `int8` in case users force `_keep_in_fp32_modules` to be `None`.
        if (
            isinstance(self.wo.weight, mindspore.Tensor)
            and hidden_states.dtype != self.wo.weight.dtype
            and self.wo.weight.dtype != mindspore.int8
        ):
            hidden_states = hidden_states.to(self.wo.weight.dtype)

        hidden_states = self.wo(hidden_states)
        return hidden_states

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseGatedActDense.__init__(config)

Initializes a Pop2PianoDenseGatedActDense instance with the provided configuration.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoDenseGatedActDense class.

TYPE: Pop2PianoDenseGatedActDense

config

An instance of Pop2PianoConfig containing configuration parameters.

  • This parameter is used to configure the dense layers and activation functions.
  • It specifies the dimensions of the model, feed-forward layers, dropout rate, and activation function type.

TYPE: Pop2PianoConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If the configuration parameters are invalid or missing.

TypeError

If the data types of the configuration parameters are incorrect.

KeyError

If the activation function specified in the configuration is not supported.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config: Pop2PianoConfig):
    """
    Initializes a Pop2PianoDenseGatedActDense instance with the provided configuration.

    Args:
        self (Pop2PianoDenseGatedActDense): The instance of the Pop2PianoDenseGatedActDense class.
        config (Pop2PianoConfig):
            An instance of Pop2PianoConfig containing configuration parameters.

            - This parameter is used to configure the dense layers and activation functions.
            - It specifies the dimensions of the model, feed-forward layers, dropout rate, and activation
            function type.

    Returns:
        None.

    Raises:
        ValueError: If the configuration parameters are invalid or missing.
        TypeError: If the data types of the configuration parameters are incorrect.
        KeyError: If the activation function specified in the configuration is not supported.
    """
    super().__init__()
    self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
    self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
    self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
    self.dropout = nn.Dropout(p=config.dropout_rate)
    self.act = ACT2FN[config.dense_act_fn]

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoDenseGatedActDense.forward(hidden_states)

Applies the gated feed-forward transformation to the input hidden states.

PARAMETER DESCRIPTION
self

Instance of the class Pop2PianoDenseGatedActDense.

hidden_states

A tensor representing the input hidden states. Type: Tensor.

RETURNS DESCRIPTION
Tensor

The transformed hidden states, with the same shape as the input hidden_states.

RAISES DESCRIPTION
TypeError

If the input parameters are not of the expected types.

ValueError

If there are issues with the shapes or values of the tensors being manipulated.

RuntimeError

If there are runtime issues during the execution of the method.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(self, hidden_states):
    """
    Applies the gated feed-forward transformation to the input hidden states.

    Args:
        self: Instance of the class Pop2PianoDenseGatedActDense.

        hidden_states: A tensor representing the input hidden states. Type: Tensor.

    Returns:
        Tensor: The transformed hidden states, with the same shape as the input hidden_states.

    Raises:
        TypeError: If the input parameters are not of the expected types.
        ValueError: If there are issues with the shapes or values of the tensors being manipulated.
        RuntimeError: If there are runtime issues during the execution of the method.
    """
    hidden_gelu = self.act(self.wi_0(hidden_states))
    hidden_linear = self.wi_1(hidden_states)
    hidden_states = hidden_gelu * hidden_linear
    hidden_states = self.dropout(hidden_states)

    # To make 8bit quantization work for google/flan-t5-xxl, self.wo is kept in float32.
    # See https://github.com/huggingface/transformers/issues/20287
    # We also make sure the weights are not in `int8` in case users force `_keep_in_fp32_modules` to be `None`.
    if (
        isinstance(self.wo.weight, mindspore.Tensor)
        and hidden_states.dtype != self.wo.weight.dtype
        and self.wo.weight.dtype != mindspore.int8
    ):
        hidden_states = hidden_states.to(self.wo.weight.dtype)

    hidden_states = self.wo(hidden_states)
    return hidden_states
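
The gating step in the listing above can be sketched in pure Python. This is an illustration only: `matvec`, `gelu`, and `gated_dense` are hypothetical helper names, the weights are toy values, the exact GELU is assumed as the activation (the real one comes from `config.dense_act_fn`), and dropout and the dtype cast are left out.

```python
import math


def matvec(weight, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def gelu(values):
    """Exact GELU based on the Gaussian error function."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in values]


def gated_dense(hidden, wi_0, wi_1, wo, act=gelu):
    """Gate the linear wi_1 branch with the activated wi_0 branch, then project with wo."""
    gate = act(matvec(wi_0, hidden))           # activated branch
    linear = matvec(wi_1, hidden)              # linear branch
    gated = [g * l for g, l in zip(gate, linear)]
    return matvec(wo, gated)


# Hypothetical toy weights with d_model = 2 and d_ff = 3 (no bias terms).
wi_0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wi_1 = [[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]]
wo = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

print(gated_dense([2.0, 1.0], wi_0, wi_1, wo))
```

Passing `act=lambda v: v` turns the block into a plain element-wise product of the two projections, which is convenient for checking the arithmetic by hand.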

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration

Bases: Pop2PianoPreTrainedModel

The Pop2PianoForConditionalGeneration class is a subclass of Pop2PianoPreTrainedModel that represents a Pop2Piano model for conditional generation. It is specifically designed for generating MIDI token ids based on given input features.

Initialization

The class constructor __init__ takes a Pop2PianoConfig object as an argument and initializes the model. It sets up the shared embedding layer, the mel conditioner, the encoder, the decoder, and the LM head.
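
As a rough illustration of how the constructor derives separate encoder and decoder configurations from one config, the sketch below mirrors the attribute changes made in `__init__`. It uses a `SimpleNamespace` as a stand-in for `Pop2PianoConfig`, and the function name `derive_stack_configs` is hypothetical.

```python
import copy
from types import SimpleNamespace


def derive_stack_configs(config):
    """Return independent encoder and decoder configs copied from a shared config."""
    encoder_config = copy.deepcopy(config)
    encoder_config.is_decoder = False
    encoder_config.use_cache = False
    encoder_config.is_encoder_decoder = False

    decoder_config = copy.deepcopy(config)
    decoder_config.is_decoder = True
    decoder_config.is_encoder_decoder = False
    decoder_config.num_layers = config.num_decoder_layers
    return encoder_config, decoder_config


# Stand-in for Pop2PianoConfig with only the fields touched above.
config = SimpleNamespace(
    num_layers=6,
    num_decoder_layers=4,
    is_decoder=False,
    use_cache=True,
    is_encoder_decoder=True,
)
encoder_config, decoder_config = derive_stack_configs(config)
print(encoder_config.use_cache, decoder_config.num_layers, config.num_layers)
```

Because each stack receives a deep copy, changing `num_layers` or `use_cache` for one stack leaves the original config and the other stack untouched.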

Model Components
  • shared: An embedding layer that maps token ids to their corresponding embeddings.
  • encoder: The Pop2PianoStack module responsible for encoding the input features.
  • decoder: The Pop2PianoStack module responsible for decoding and generating the output sequence.
  • lm_head: A linear layer that maps the decoder output to the vocabulary space.
Getter and Setter Methods
  • get_input_embeddings: Returns the shared embedding layer.
  • set_input_embeddings: Sets the shared embedding layer to the provided new_embeddings.
  • set_output_embeddings: Sets the LM head to the provided new_embeddings.
  • get_output_embeddings: Returns the LM head.
  • get_encoder: Returns the encoder module.
  • get_decoder: Returns the decoder module.
Generation Methods
  • get_mel_conditioner_outputs(): Concatenates mel conditioner tokens to the front of the input features for controlling the type of MIDI token generated by the model. It takes the input features, composer name, generation config, and attention mask as inputs.
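  • forward(): Runs the encoder and decoder and projects the decoder output through the LM head. It takes inputs such as input ids or input features, attention mask, and decoder input ids, and returns the language-modeling logits (plus the loss when labels are given), either as a tuple or as a Seq2SeqLMOutput.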
  • generate(): Generates token ids for MIDI outputs. It takes input features, attention mask, composer name, generation config, and additional kwargs as inputs. It returns the generated MIDI token ids.
  • prepare_inputs_for_generation(): Prepares the inputs for generation. It takes input ids, past key values, attention mask, and various masks as inputs and returns a dictionary of prepared inputs.
  • prepare_decoder_input_ids_from_labels(): Prepares the decoder input ids from labels. It takes labels as input and returns the shifted right labels.
  • _reorder_cache(): Reorders the past key values according to the beam index.
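
Because the mel conditioner adds one position at the front of the input features, `get_mel_conditioner_outputs()` also has to lengthen the attention mask by one column. A minimal sketch of that step, using nested Python lists instead of tensors and a hypothetical helper name `extend_attention_mask`:

```python
def extend_attention_mask(attention_mask):
    """Prepend each row's first mask value so the mask covers the added conditioner position."""
    return [[row[0]] + list(row) for row in attention_mask]


# Two examples: the first has one padded position, the second is fully padded.
mask = [[1, 1, 0], [0, 0, 0]]
extended = extend_attention_mask(mask)
print(extended)
```

Reusing the first column means a fully padded example stays fully masked after the conditioner position is added.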

Please refer to the documentation of the parent class Pop2PianoPreTrainedModel for more details on other inherited methods and attributes.
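
The "shifted right labels" mentioned for `prepare_decoder_input_ids_from_labels()` can be sketched as follows. This assumes the usual T5-style convention (prepend the decoder start token, drop the last label, replace ignored positions with the pad token); the actual `_shift_right` implementation is not shown in this section, so the helper below is a hypothetical stand-in operating on plain lists.

```python
def shift_right(labels, decoder_start_token_id, pad_token_id, ignore_index=-100):
    """Build decoder input ids by shifting each label row one position to the right."""
    shifted = []
    for row in labels:
        # Start token first, then every label except the last one.
        new_row = [decoder_start_token_id] + list(row[:-1])
        # Positions that were masked in the labels become padding in the inputs.
        shifted.append([pad_token_id if tok == ignore_index else tok for tok in new_row])
    return shifted


labels = [[5, 6, 7, -100], [8, -100, -100, -100]]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
print(decoder_input_ids)
```

With this layout the decoder sees token t-1 when predicting token t, which is what teacher forcing requires.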

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoForConditionalGeneration(Pop2PianoPreTrainedModel):

    """
    The `Pop2PianoForConditionalGeneration` class is a subclass of `Pop2PianoPreTrainedModel` that represents a
    Pop2Piano model for conditional generation. It is specifically designed for generating MIDI token ids based on
    given input features.

    Initialization:
        The class constructor `__init__` takes a `Pop2PianoConfig` object as an argument and initializes the model.
        It sets up the necessary components like the shared embedding layer, encoder, decoder, and LM head.

    Model Components:
        - `shared`: An embedding layer that maps token ids to their corresponding embeddings.
        - `encoder`: The Pop2PianoStack module responsible for encoding the input features.
        - `decoder`: The Pop2PianoStack module responsible for decoding and generating the output sequence.
        - `lm_head`: A linear layer that maps the decoder output to the vocabulary space.

    Getter and Setter Methods:
        - `get_input_embeddings`: Returns the shared embedding layer.
        - `set_input_embeddings`: Sets the shared embedding layer to the provided `new_embeddings`.
        - `set_output_embeddings`: Sets the LM head to the provided `new_embeddings`.
        - `get_output_embeddings`: Returns the LM head.
        - `get_encoder`: Returns the encoder module.
        - `get_decoder`: Returns the decoder module.

    Generation Methods:
        - `get_mel_conditioner_outputs()`: Concatenates mel conditioner tokens to the front of the input features for
        controlling the type of MIDI token generated by the model. It takes the input features, composer name,
        generation config, and attention mask as inputs.
        - `forward()`: Runs the encoder and decoder and projects the decoder output through the LM head. It takes
        inputs such as input ids or input features, attention mask, and decoder input ids, and returns the
        language-modeling logits (plus the loss when labels are given).
        - `generate()`: Generates token ids for MIDI outputs. It takes input features, attention mask, composer name,
        generation config, and additional kwargs as inputs. It returns the generated MIDI token ids.
        - `prepare_inputs_for_generation()`: Prepares the inputs for generation. It takes input ids, past key values,
        attention mask, and various masks as inputs and returns a dictionary of prepared inputs.
        - `prepare_decoder_input_ids_from_labels()`: Prepares the decoder input ids from labels.
        It takes labels as input and returns the shifted right labels.
        - `_reorder_cache()`: Reorders the past key values according to the beam index.

    Please refer to the documentation of the parent class `Pop2PianoPreTrainedModel` for more details on other
    inherited methods and attributes.
    """
    _tied_weights_keys = ["encoder.embed_tokens.weight", "decoder.embed_tokens.weight", "lm_head.weight"]

    def __init__(self, config: Pop2PianoConfig):
        """
        Initializes an instance of the Pop2PianoForConditionalGeneration class.

        Args:
            self: The object instance.
            config (Pop2PianoConfig):
                The configuration object used for initializing the model.

                - The 'config' parameter is of type Pop2PianoConfig.
                - This parameter is required to create an instance of the model.
                - It contains various configuration settings for the model.
                - The 'config' parameter is used to set the attributes of the model object.
                - The 'config' parameter should not be None.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        self.config = config
        self.model_dim = config.d_model

        self.shared = nn.Embedding(config.vocab_size, config.d_model)

        self.mel_conditioner = Pop2PianoConcatEmbeddingToMel(config)

        encoder_config = copy.deepcopy(config)
        encoder_config.is_decoder = False
        encoder_config.use_cache = False
        encoder_config.is_encoder_decoder = False

        self.encoder = Pop2PianoStack(encoder_config, self.shared)

        decoder_config = copy.deepcopy(config)
        decoder_config.is_decoder = True
        decoder_config.is_encoder_decoder = False
        decoder_config.num_layers = config.num_decoder_layers
        self.decoder = Pop2PianoStack(decoder_config, self.shared)

        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        """
        This method, 'get_input_embeddings', is defined within the class 'Pop2PianoForConditionalGeneration' and
        is used to retrieve the input embeddings.

        Args:
            self (object):
                The instance of the class.

                - Purpose: Represents the current instance of the class.
                - Restrictions: Must be an instance of 'Pop2PianoForConditionalGeneration'.

        Returns:
            nn.Embedding: The shared token embedding layer (`self.shared`).

        Raises:
            None.
        """
        return self.shared

    def set_input_embeddings(self, new_embeddings):
        """
        Set the input embeddings for the Pop2PianoForConditionalGeneration model.

        Args:
            self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
            new_embeddings (object): The new input embeddings to be set for the model.
                Should be compatible with the model's encoder and decoder.

        Returns:
            None.

        Raises:
            None.
        """
        self.shared = new_embeddings
        self.encoder.set_input_embeddings(new_embeddings)
        self.decoder.set_input_embeddings(new_embeddings)

    def set_output_embeddings(self, new_embeddings):
        """
        Sets the output embeddings of the Pop2PianoForConditionalGeneration model.

        Args:
            self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
            new_embeddings (object): The new embeddings to be set as the output embeddings.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def get_output_embeddings(self):
        """
        Method to retrieve the output embeddings from the Pop2PianoForConditionalGeneration class.

        Args:
            self: Pop2PianoForConditionalGeneration object. Represents the instance of the class.

        Returns:
            lm_head: The method returns the output embeddings from the 'lm_head' attribute of the instance.

        Raises:
            None.
        """
        return self.lm_head

    def get_encoder(self):
        """
        Returns the encoder used for Pop2PianoForConditionalGeneration.

        Args:
            self: An instance of the Pop2PianoForConditionalGeneration class.

        Returns:
            Pop2PianoStack: The encoder module.

        Raises:
            None.
        """
        return self.encoder

    def get_decoder(self):
        """
        Returns the decoder model used for conditional generation in the Pop2PianoForConditionalGeneration class.

        Args:
            self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
                This parameter is required to access the decoder model.

        Returns:
            Pop2PianoStack: The decoder module.

        Raises:
            None.
        """
        return self.decoder

    def get_mel_conditioner_outputs(
        self,
        input_features: mindspore.Tensor,
        composer: str,
        generation_config: GenerationConfig,
        attention_mask: mindspore.Tensor = None,
    ):
        """
        This method is used to concatenate mel conditioner tokens at the front of the input_features in order to
        control the type of MIDI token generated by the model.

        Args:
            input_features (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`):
                input features extracted from the feature extractor.
            composer (`str`):
                composer token which determines the type of MIDI tokens to be generated.
            generation_config (`~generation.GenerationConfig`):
                The generation configuration, used to look up the composer to feature-token mapping.
            attention_mask (`mindspore.Tensor`, *optional*):
                For batched generation `input_features` are padded to have the same shape across all examples.
                `attention_mask` helps to determine which areas were padded and which were not.

                - 1 for tokens that are **not padded**,
                - 0 for tokens that are **padded**.
        """
        composer_to_feature_token = generation_config.composer_to_feature_token
        if composer not in composer_to_feature_token.keys():
            raise ValueError(
                f"Please choose a composer from {list(composer_to_feature_token.keys())}. Composer received - {composer}"
            )
        composer_value = composer_to_feature_token[composer]
        composer_value = mindspore.tensor(composer_value)
        composer_value = composer_value.repeat(input_features.shape[0])

        embedding_offset = min(composer_to_feature_token.values())

        input_features = self.mel_conditioner(
            feature=input_features,
            index_value=composer_value,
            embedding_offset=embedding_offset,
        )
        if attention_mask is not None:
            input_features[~attention_mask[:, 0].bool()] = 0.0

            # since self.mel_conditioner adds a new array at the front of inputs_embeds we need to do the same for attention_mask to keep the shapes same
            attention_mask = ops.cat([attention_mask[:, 0].view(-1, 1), attention_mask], axis=1)
            return input_features, attention_mask

        return input_features, None

    def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        decoder_input_ids: Optional[mindspore.Tensor] = None,
        decoder_attention_mask: Optional[mindspore.Tensor] = None,
        head_mask: Optional[mindspore.Tensor] = None,
        decoder_head_mask: Optional[mindspore.Tensor] = None,
        cross_attn_head_mask: Optional[mindspore.Tensor] = None,
        encoder_outputs: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        input_features: Optional[mindspore.Tensor] = None,
        decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor], Seq2SeqLMOutput]:
        r"""
        Args:
            labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Labels for computing the language modeling loss. Indices should be in `[-100, 0, ...,
                config.vocab_size - 1]`. All labels set to `-100` are ignored (masked), the loss is only computed for
                labels in `[0, ..., config.vocab_size - 1]`.

        Returns:
            Union[Tuple[mindspore.Tensor], Seq2SeqLMOutput]
        """
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if inputs_embeds is not None and input_features is not None:
            raise ValueError("Both `inputs_embeds` and `input_features` received! Please provide only one of them")
        if input_features is not None and inputs_embeds is None:
            inputs_embeds = input_features

        # Encode if needed (training, first prediction pass)
        if encoder_outputs is None:
            # Convert encoder inputs in embeddings if needed
            encoder_outputs = self.encoder(
                input_ids=input_ids,
                attention_mask=attention_mask,
                inputs_embeds=inputs_embeds,
                head_mask=head_mask,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )
        elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
            encoder_outputs = BaseModelOutput(
                last_hidden_state=encoder_outputs[0],
                hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
                attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
            )

        hidden_states = encoder_outputs[0]

        if labels is not None and decoder_input_ids is None and decoder_inputs_embeds is None:
            # get decoder inputs from shifting lm labels to the right
            decoder_input_ids = self._shift_right(labels)

        # Decode
        decoder_outputs = self.decoder(
            input_ids=decoder_input_ids,
            attention_mask=decoder_attention_mask,
            inputs_embeds=decoder_inputs_embeds,
            past_key_values=past_key_values,
            encoder_hidden_states=hidden_states,
            encoder_attention_mask=attention_mask,
            head_mask=decoder_head_mask,
            cross_attn_head_mask=cross_attn_head_mask,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = decoder_outputs[0]

        if self.config.tie_word_embeddings:
            # Rescale output before projecting on vocab
            # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/transformer/transformer.py#L586
            sequence_output = sequence_output * (self.model_dim**-0.5)

        lm_logits = self.lm_head(sequence_output)

        loss = None
        if labels is not None:
            loss = ops.cross_entropy(lm_logits.view(-1, lm_logits.shape[-1]), labels.view(-1), ignore_index=-100)

        if not return_dict:
            output = (lm_logits,) + decoder_outputs[1:] + encoder_outputs
            return ((loss,) + output) if loss is not None else output

        return Seq2SeqLMOutput(
            loss=loss,
            logits=lm_logits,
            past_key_values=decoder_outputs.past_key_values,
            decoder_hidden_states=decoder_outputs.hidden_states,
            decoder_attentions=decoder_outputs.attentions,
            cross_attentions=decoder_outputs.cross_attentions,
            encoder_last_hidden_state=encoder_outputs.last_hidden_state,
            encoder_hidden_states=encoder_outputs.hidden_states,
            encoder_attentions=encoder_outputs.attentions,
        )

    def generate(
        self,
        input_features,
        attention_mask=None,
        composer="composer1",
        generation_config=None,
        **kwargs,
    ):
        """
        Generates token ids for midi outputs.

        <Tip warning={true}>

        Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the
        model's default generation configuration. You can override any `generation_config` by passing the corresponding
        parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. For an overview of generation
        strategies and code examples, check out the [following guide](./generation_strategies).

        </Tip>

        Parameters:
            input_features (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`):
                This is the featurized version of audio generated by `Pop2PianoFeatureExtractor`.
            attention_mask:
                For batched generation `input_features` are padded to have the same shape across all examples.
                `attention_mask` helps to determine which areas were padded and which were not.

                - 1 for tokens that are **not padded**,
                - 0 for tokens that are **padded**.
            composer (`str`, *optional*, defaults to `"composer1"`):
                This value is passed to `Pop2PianoConcatEmbeddingToMel` to generate different embeddings for each
                `"composer"`. Please make sure that the composer value is present in `composer_to_feature_token` in
                `generation_config`. For an example please see
                https://hf-mirror.com/sweetcocoa/pop2piano/blob/main/generation_config.json .
            generation_config (`~generation.GenerationConfig`, *optional*):
                The generation configuration to be used as base parametrization for the generation call. `**kwargs`
                passed to generate matching the attributes of `generation_config` will override them.

                If `generation_config` is not provided, the default will be used, which has the following loading
                priority:

                1. from the `generation_config.json` model file, if it exists;
                2. from the model configuration. Please note that unspecified parameters will inherit
                [`~generation.GenerationConfig`]'s default values, whose documentation should be checked to parameterize
                generation.
            kwargs:
                Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
                forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
                specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.

        Returns:
            [`~utils.ModelOutput`] or `mindspore.Tensor`: A [`~utils.ModelOutput`] (if `return_dict_in_generate=True`
                or when `config.return_dict_in_generate=True`) or a `mindspore.Tensor`.

                Since Pop2Piano is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
                [`~utils.ModelOutput`] types are:

                - [`~generation.GenerateEncoderDecoderOutput`],
                - [`~generation.GenerateBeamEncoderDecoderOutput`]
        """
        if generation_config is None:
            generation_config = self.generation_config
        generation_config.update(**kwargs)

        # check for composer_to_feature_token
        if not hasattr(generation_config, "composer_to_feature_token"):
            raise ValueError(
                "`composer_to_feature_token` was not found! Please refer to "
                "https://hf-mirror.com/sweetcocoa/pop2piano/blob/main/generation_config.json "
                "and parse a dict like that."
            )

        if len(generation_config.composer_to_feature_token) != self.config.composer_vocab_size:
            raise ValueError(
                "config.composer_vocab_size must be same as the number of keys in "
                f"generation_config.composer_to_feature_token! "
                f"Found {self.config.composer_vocab_size} vs {len(generation_config.composer_to_feature_token)}."
            )

        # to control the variation of generated MIDI tokens we concatenate mel-conditioner tokens(which depends on composer_token)
        # at the front of input_features.
        input_features, attention_mask = self.get_mel_conditioner_outputs(
            input_features=input_features,
            attention_mask=attention_mask,
            composer=composer,
            generation_config=generation_config,
        )

        return super().generate(
            inputs=None,
            inputs_embeds=input_features,
            attention_mask=attention_mask,
            generation_config=generation_config,
            **kwargs,
        )

    def prepare_inputs_for_generation(
        self,
        input_ids,
        past_key_values=None,
        attention_mask=None,
        head_mask=None,
        decoder_head_mask=None,
        cross_attn_head_mask=None,
        use_cache=None,
        encoder_outputs=None,
        **kwargs,
    ):
        """
        This method prepares inputs for generation in the Pop2PianoForConditionalGeneration class.

        Args:
            self: The instance of the class.
            input_ids (Tensor): The input tensor containing the token ids for the input sequence.
            past_key_values (Tuple): A tuple of tensors containing the past key and value states for fast decoding.
                Defaults to None.
            attention_mask (Tensor): An optional tensor of the same size as input_ids, used to mask the input tokens.
                Defaults to None.
            head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the attention heads.
                Defaults to None.
            decoder_head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the decoder
                attention heads. Defaults to None.
            cross_attn_head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the
                cross-attention heads. Defaults to None.
            use_cache (bool): A flag indicating whether to use the cache for fast decoding. Defaults to None.
            encoder_outputs (Tuple): A tuple of tensors containing the encoder outputs, used in the cross-attention
                mechanism.

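        Returns:
            dict: Keyword arguments for the next `forward` call. `input_ids` is passed on as `decoder_input_ids`
                (only its last token when `past_key_values` is given), together with the cache, the encoder
                outputs, the masks, and `use_cache`.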

        Raises:
            None
        """
        # cut decoder_input_ids if past is used
        if past_key_values is not None:
            input_ids = input_ids[:, -1:]

        return {
            "decoder_input_ids": input_ids,
            "past_key_values": past_key_values,
            "encoder_outputs": encoder_outputs,
            "attention_mask": attention_mask,
            "head_mask": head_mask,
            "decoder_head_mask": decoder_head_mask,
            "cross_attn_head_mask": cross_attn_head_mask,
            "use_cache": use_cache,
        }

    def prepare_decoder_input_ids_from_labels(self, labels: mindspore.Tensor):
        """
        Prepare decoder input IDs from labels for conditional generation.

        Args:
            self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
            labels (mindspore.Tensor): The labels tensor representing the target sequence.
                The decoder input IDs are derived from it by shifting the labels one position to the right.

        Returns:
            mindspore.Tensor: The decoder input IDs, i.e. the result of `self._shift_right(labels)`.

        Raises:
            None: This method does not raise any exceptions.
        """
        return self._shift_right(labels)

    def _reorder_cache(self, past_key_values, beam_idx):
        """
        Reorders the cache for the Pop2PianoForConditionalGeneration class.

        Each cached state is re-indexed along its batch dimension so that it follows the beams selected by `beam_idx`.

        Args:
            self: An instance of the Pop2PianoForConditionalGeneration class.
            past_key_values: A tuple representing the past key values of the decoder.
                It contains the cached states for each layer of the decoder.
                If None, a warning will be logged and the method will return None.
            beam_idx: A tensor representing the indices of the beams. It is used to reorder the past key values.

        Returns:
            reordered_decoder_past: A tuple representing the reordered past key values.
                It contains the reordered states for each layer of the decoder.

        Raises:
            ValueError: If the shape of the reordered_layer_past_states[0] and layer_past_states[0] do not match,
                or if the length of reordered_layer_past_states and layer_past_states do not match.
        """
        # if decoder past is not included in output
        # speedy decoding is disabled and no need to reorder
        if past_key_values is None:
            logger.warning("You might want to consider setting `use_cache=True` to speed up decoding")
            return past_key_values

        reordered_decoder_past = ()
        for layer_past_states in past_key_values:
            # select the cached states of the chosen beams;
            # `index_select(0, ...)` below indexes the batch dim, which is dim 0 of each state
            reordered_layer_past_states = ()
            for layer_past_state in layer_past_states:
                # need to set correct `past` for each of the four key / value states
                reordered_layer_past_states = reordered_layer_past_states + (
                    layer_past_state.index_select(0, beam_idx.to(layer_past_state.device)),
                )

            if reordered_layer_past_states[0].shape != layer_past_states[0].shape:
                raise ValueError(
                    f"reordered_layer_past_states[0] shape {reordered_layer_past_states[0].shape} and layer_past_states[0] shape {layer_past_states[0].shape} mismatched"
                )
            if len(reordered_layer_past_states) != len(layer_past_states):
                raise ValueError(
                    f"length of reordered_layer_past_states {len(reordered_layer_past_states)} and length of layer_past_states {len(layer_past_states)} mismatched"
                )

            reordered_decoder_past = reordered_decoder_past + (reordered_layer_past_states,)
        return reordered_decoder_past
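
The reordering done by `_reorder_cache` can be illustrated without tensors. The sketch below is a simplified model under stated assumptions: each cached state is a plain Python list with one entry per beam, and `reorder_cache` is a hypothetical helper written for this example, not a library function.

```python
def reorder_cache(past_key_values, beam_idx):
    """Re-index every cached state so that row i holds the state of beam beam_idx[i].

    Hypothetical stand-in: each "state" is a list with one entry per beam,
    whereas the model uses tensors and `index_select` along the batch dim.
    """
    if past_key_values is None:
        # Nothing was cached, so there is nothing to reorder.
        return None
    reordered = ()
    for layer_states in past_key_values:
        # Apply the same beam permutation to every state of this layer.
        reordered_layer = tuple([state[i] for i in beam_idx] for state in layer_states)
        reordered += (reordered_layer,)
    return reordered


# One layer with two states (key, value) and three beams each.
past = ((["k0", "k1", "k2"], ["v0", "v1", "v2"]),)
# New beams 0 and 1 both continue old beam 2; new beam 2 continues old beam 0.
new_past = reorder_cache(past, beam_idx=[2, 2, 0])
print(new_past)
```

The nesting (layers, then states per layer) is preserved, which is what the two `ValueError` checks in the method guard against.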

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.__init__(config)

Initializes an instance of the Pop2PianoForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The object instance.

config

The configuration used to build the model. It supplies `vocab_size` and `d_model` for the shared embedding and the LM head, and `num_decoder_layers` for the decoder stack. It must not be None.

TYPE: Pop2PianoConfig

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config: Pop2PianoConfig):
    """
    Initializes an instance of the Pop2PianoForConditionalGeneration class.

    Args:
        self: The object instance.
        config (Pop2PianoConfig):
            The configuration used to build the model. It supplies `vocab_size` and `d_model` for the shared
            embedding and the LM head, and `num_decoder_layers` for the decoder stack. It must not be None.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    self.config = config
    self.model_dim = config.d_model

    self.shared = nn.Embedding(config.vocab_size, config.d_model)

    self.mel_conditioner = Pop2PianoConcatEmbeddingToMel(config)

    encoder_config = copy.deepcopy(config)
    encoder_config.is_decoder = False
    encoder_config.use_cache = False
    encoder_config.is_encoder_decoder = False

    self.encoder = Pop2PianoStack(encoder_config, self.shared)

    decoder_config = copy.deepcopy(config)
    decoder_config.is_decoder = True
    decoder_config.is_encoder_decoder = False
    decoder_config.num_layers = config.num_decoder_layers
    self.decoder = Pop2PianoStack(decoder_config, self.shared)

    self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    # Initialize weights and apply final processing
    self.post_init()
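
The constructor deep-copies the configuration before flipping encoder- and decoder-specific flags, so the two stacks never share (or mutate) the caller's config. A minimal sketch of that pattern follows; `SimpleNamespace` is a hypothetical stand-in for `Pop2PianoConfig`, and the field values are made up for illustration.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for Pop2PianoConfig with only the fields touched here.
config = SimpleNamespace(
    is_decoder=False,
    use_cache=True,
    is_encoder_decoder=True,
    num_layers=6,
    num_decoder_layers=4,
)

# Encoder copy: never caches and is not a decoder.
encoder_config = copy.deepcopy(config)
encoder_config.is_decoder = False
encoder_config.use_cache = False
encoder_config.is_encoder_decoder = False

# Decoder copy: decoder flag on, depth taken from num_decoder_layers.
decoder_config = copy.deepcopy(config)
decoder_config.is_decoder = True
decoder_config.is_encoder_decoder = False
decoder_config.num_layers = config.num_decoder_layers

# The original config is unchanged by either set of edits.
print(config.use_cache, encoder_config.use_cache)
print(config.num_layers, decoder_config.num_layers)
```

Without the copies, setting `use_cache = False` for the encoder would also disable caching for the decoder.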

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, encoder_outputs=None, past_key_values=None, inputs_embeds=None, input_features=None, decoder_inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

PARAMETER DESCRIPTION
labels

Labels for computing the language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size - 1]. All labels set to -100 are ignored (masked); the loss is only computed for labels in [0, ..., config.vocab_size - 1].

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional* DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], Seq2SeqLMOutput]

Union[Tuple[mindspore.Tensor], Seq2SeqLMOutput]
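
How `labels` drive both the decoder inputs and the loss can be sketched in plain Python. This is an illustration under assumptions, not the library code: `shift_right` assumes the usual T5-style convention (prepend a start token, drop the last label, replace `-100` with the pad id), since the body of `_shift_right` is not shown here, and both helper names and the token ids are hypothetical.

```python
import math


def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Prepend the start token, drop the last label, and replace -100 with the pad id."""
    shifted = [decoder_start_token_id] + list(labels[:-1])
    return [pad_token_id if token == -100 else token for token in shifted]


def masked_cross_entropy(logits, labels, ignore_index=-100):
    """Mean negative log-likelihood over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: contributes nothing to the loss
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else float("nan")


labels = [5, -100, 7]
decoder_input_ids = shift_right(labels)

# Two positions over a vocabulary of size 2; the second position is masked.
loss = masked_cross_entropy([[2.0, 0.0], [0.0, 0.0]], [0, -100])
print(decoder_input_ids, loss)
```

Only the first position contributes to `loss`, which mirrors the effect of `ignore_index=-100` in the `ops.cross_entropy` call.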

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    input_ids: Optional[mindspore.Tensor] = None,
    attention_mask: Optional[mindspore.Tensor] = None,
    decoder_input_ids: Optional[mindspore.Tensor] = None,
    decoder_attention_mask: Optional[mindspore.Tensor] = None,
    head_mask: Optional[mindspore.Tensor] = None,
    decoder_head_mask: Optional[mindspore.Tensor] = None,
    cross_attn_head_mask: Optional[mindspore.Tensor] = None,
    encoder_outputs: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    past_key_values: Optional[Tuple[Tuple[mindspore.Tensor]]] = None,
    inputs_embeds: Optional[mindspore.Tensor] = None,
    input_features: Optional[mindspore.Tensor] = None,
    decoder_inputs_embeds: Optional[mindspore.Tensor] = None,
    labels: Optional[mindspore.Tensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor], Seq2SeqLMOutput]:
    r"""
    Args:
        labels (`mindspore.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the language modeling loss. Indices should be in `[-100, 0, ...,
            config.vocab_size - 1]`. All labels set to `-100` are ignored (masked); the loss is only computed for
            labels in `[0, ..., config.vocab_size - 1]`.

    Returns:
        Union[Tuple[mindspore.Tensor], Seq2SeqLMOutput]
    """
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if inputs_embeds is not None and input_features is not None:
        raise ValueError("Both `inputs_embeds` and `input_features` received! Please provide only one of them")
    if input_features is not None and inputs_embeds is None:
        inputs_embeds = input_features

    # Encode if needed (training, first prediction pass)
    if encoder_outputs is None:
        # Convert encoder inputs in embeddings if needed
        encoder_outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
    elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
        encoder_outputs = BaseModelOutput(
            last_hidden_state=encoder_outputs[0],
            hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
            attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
        )

    hidden_states = encoder_outputs[0]

    if labels is not None and decoder_input_ids is None and decoder_inputs_embeds is None:
        # get decoder inputs from shifting lm labels to the right
        decoder_input_ids = self._shift_right(labels)

    # Decode
    decoder_outputs = self.decoder(
        input_ids=decoder_input_ids,
        attention_mask=decoder_attention_mask,
        inputs_embeds=decoder_inputs_embeds,
        past_key_values=past_key_values,
        encoder_hidden_states=hidden_states,
        encoder_attention_mask=attention_mask,
        head_mask=decoder_head_mask,
        cross_attn_head_mask=cross_attn_head_mask,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    sequence_output = decoder_outputs[0]

    if self.config.tie_word_embeddings:
        # Rescale output before projecting on vocab
        # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/transformer/transformer.py#L586
        sequence_output = sequence_output * (self.model_dim**-0.5)

    lm_logits = self.lm_head(sequence_output)

    loss = None
    if labels is not None:
        loss = ops.cross_entropy(lm_logits.view(-1, lm_logits.shape[-1]), labels.view(-1), ignore_index=-100)

    if not return_dict:
        output = (lm_logits,) + decoder_outputs[1:] + encoder_outputs
        return ((loss,) + output) if loss is not None else output

    return Seq2SeqLMOutput(
        loss=loss,
        logits=lm_logits,
        past_key_values=decoder_outputs.past_key_values,
        decoder_hidden_states=decoder_outputs.hidden_states,
        decoder_attentions=decoder_outputs.attentions,
        cross_attentions=decoder_outputs.cross_attentions,
        encoder_last_hidden_state=encoder_outputs.last_hidden_state,
        encoder_hidden_states=encoder_outputs.hidden_states,
        encoder_attentions=encoder_outputs.attentions,
    )
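
When `return_dict` is false, `forward` returns a flat tuple, and the position of each element depends on whether a loss was computed. The sketch below reproduces only that packing step with placeholder strings in place of tensors; `pack_outputs` is a hypothetical helper named for this example.

```python
def pack_outputs(lm_logits, decoder_outputs, encoder_outputs, loss=None):
    """Build the tuple returned when return_dict is False.

    The decoder's first element (its last hidden state) is dropped because the
    logits replace it; the encoder outputs are appended unchanged.
    """
    output = (lm_logits,) + tuple(decoder_outputs[1:]) + tuple(encoder_outputs)
    return ((loss,) + output) if loss is not None else output


decoder_outputs = ("decoder_last_hidden", "past_key_values")
encoder_outputs = ("encoder_last_hidden",)

without_loss = pack_outputs("logits", decoder_outputs, encoder_outputs)
with_loss = pack_outputs("logits", decoder_outputs, encoder_outputs, loss=0.5)
print(without_loss)
print(with_loss)
```

Callers that index the tuple should therefore shift every index by one when `labels` are passed.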

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.generate(input_features, attention_mask=None, composer='composer1', generation_config=None, **kwargs)

Generates token ids for midi outputs.

Most generation-controlling parameters are set in generation_config which, if not passed, will be set to the model's default generation configuration. You can override any generation_config by passing the corresponding parameters to generate(), e.g. .generate(inputs, num_beams=4, do_sample=True). For an overview of generation strategies and code examples, check out the following guide.

PARAMETER DESCRIPTION
input_features

This is the featurized version of audio generated by Pop2PianoFeatureExtractor.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`

attention_mask

For batched generation input_features are padded to have the same shape across all examples. attention_mask helps to determine which areas were padded and which were not.

  • 1 for tokens that are not padded,
  • 0 for tokens that are padded.

DEFAULT: None

composer

This value is passed to Pop2PianoConcatEmbeddingToMel to generate different embeddings for each "composer". Please make sure that the composer value is present in composer_to_feature_token in generation_config. For an example please see https://hf-mirror.com/sweetcocoa/pop2piano/blob/main/generation_config.json .

TYPE: `str`, *optional* DEFAULT: 'composer1'

generation_config

The generation configuration to be used as base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them.

If generation_config is not provided, the default will be used, which has the following loading priority:

  1. from the generation_config.json model file, if it exists;
  2. from the model configuration. Please note that unspecified parameters will inherit [~generation.GenerationConfig]'s default values, whose documentation should be checked to parameterize generation.

TYPE: `~generation.GenerationConfig`, *optional* DEFAULT: None

kwargs

Ad hoc parametrization of generation_config and/or additional model-specific kwargs that will be forwarded to the forward function of the model. If the model is an encoder-decoder model, encoder specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with decoder_.

DEFAULT: {}

RETURNS DESCRIPTION

[~utils.ModelOutput] or mindspore.Tensor: A [~utils.ModelOutput] (if return_dict_in_generate=True or when config.return_dict_in_generate=True) or a mindspore.Tensor.

Since Pop2Piano is an encoder-decoder model (model.config.is_encoder_decoder=True), the possible [~utils.ModelOutput] types are:

  • [~generation.GenerateEncoderDecoderOutput],
  • [~generation.GenerateBeamEncoderDecoderOutput]
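
The override rule for `generation_config` and `**kwargs` can be sketched with a small stand-in class. `TinyGenerationConfig` and `resolve` are hypothetical names used only here; the stand-in's `update` is assumed to overwrite matching attributes and hand back the rest. Note one deliberate difference: `generate` calls `update` directly on the object it receives (the model's own default when none is passed), whereas this sketch copies first so that per-call overrides do not persist.

```python
import copy


class TinyGenerationConfig:
    """Hypothetical stand-in that exposes only an `update` method."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def update(self, **kwargs):
        """Overwrite attributes that already exist; return the kwargs that did not match."""
        unused = {}
        for key, value in kwargs.items():
            if hasattr(self, key):
                setattr(self, key, value)
            else:
                unused[key] = value
        return unused


default_config = TinyGenerationConfig(num_beams=1, do_sample=False)


def resolve(generation_config=None, **kwargs):
    """Pick the base config, then let matching kwargs override it."""
    if generation_config is None:
        generation_config = default_config
    # Design choice of this sketch: copy so the default stays untouched.
    generation_config = copy.deepcopy(generation_config)
    unused = generation_config.update(**kwargs)
    return generation_config, unused


resolved, unused = resolve(num_beams=4, some_model_kwarg="x")
print(resolved.num_beams, default_config.num_beams, unused)
```

Keyword arguments that do not match a config attribute are left over and can be forwarded to the model.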
Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def generate(
    self,
    input_features,
    attention_mask=None,
    composer="composer1",
    generation_config=None,
    **kwargs,
):
    """
    Generates token ids for midi outputs.

    <Tip warning={true}>

    Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the
    model's default generation configuration. You can override any `generation_config` by passing the corresponding
    parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. For an overview of generation
    strategies and code examples, check out the [following guide](./generation_strategies).

    </Tip>

    Parameters:
        input_features (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`):
            This is the featurized version of audio generated by `Pop2PianoFeatureExtractor`.
        attention_mask:
            For batched generation `input_features` are padded to have the same shape across all examples.
            `attention_mask` helps to determine which areas were padded and which were not.

            - 1 for tokens that are **not padded**,
            - 0 for tokens that are **padded**.
        composer (`str`, *optional*, defaults to `"composer1"`):
            This value is passed to `Pop2PianoConcatEmbeddingToMel` to generate different embeddings for each
            `"composer"`. Please make sure that the composer value is present in `composer_to_feature_token` in
            `generation_config`. For an example please see
            https://hf-mirror.com/sweetcocoa/pop2piano/blob/main/generation_config.json .
        generation_config (`~generation.GenerationConfig`, *optional*):
            The generation configuration to be used as base parametrization for the generation call. `**kwargs`
            passed to generate matching the attributes of `generation_config` will override them.

            If `generation_config` is not provided, the default will be used, which has the following loading
            priority:

            1. from the `generation_config.json` model file, if it exists;
            2. from the model configuration. Please note that unspecified parameters will inherit
            [`~generation.GenerationConfig`]'s default values, whose documentation should be checked to parameterize
            generation.
        kwargs:
            Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
            forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
            specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.

    Returns:
        [`~utils.ModelOutput`] or `mindspore.Tensor`: A [`~utils.ModelOutput`] (if `return_dict_in_generate=True`
            or when `config.return_dict_in_generate=True`) or a `mindspore.Tensor`.

            Since Pop2Piano is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
            [`~utils.ModelOutput`] types are:

            - [`~generation.GenerateEncoderDecoderOutput`],
            - [`~generation.GenerateBeamEncoderDecoderOutput`]
    """
    if generation_config is None:
        generation_config = self.generation_config
    generation_config.update(**kwargs)

    # check for composer_to_feature_token
    if not hasattr(generation_config, "composer_to_feature_token"):
        raise ValueError(
            "`composer_to_feature_token` was not found! Please refer to "
            "https://hf-mirror.com/sweetcocoa/pop2piano/blob/main/generation_config.json "
            "and parse a dict like that."
        )

    if len(generation_config.composer_to_feature_token) != self.config.composer_vocab_size:
        raise ValueError(
            "config.composer_vocab_size must be same as the number of keys in "
            f"generation_config.composer_to_feature_token! "
            f"Found {self.config.composer_vocab_size} vs {len(generation_config.composer_to_feature_token)}."
        )

    # to control the variation of generated MIDI tokens we concatenate mel-conditioner tokens(which depends on composer_token)
    # at the front of input_features.
    input_features, attention_mask = self.get_mel_conditioner_outputs(
        input_features=input_features,
        attention_mask=attention_mask,
        composer=composer,
        generation_config=generation_config,
    )

    return super().generate(
        inputs=None,
        inputs_embeds=input_features,
        attention_mask=attention_mask,
        generation_config=generation_config,
        **kwargs,
    )
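
The two validation steps at the top of `generate` can be exercised in isolation. The sketch below is a hypothetical helper, `resolve_composer_token`, operating on a `SimpleNamespace` stand-in for the generation config; the token ids are made up, and the final membership check is an assumption added for the example rather than part of the `generate` body listed above.

```python
from types import SimpleNamespace


def resolve_composer_token(generation_config, composer_vocab_size, composer):
    """Validate the composer mapping and return the token for `composer`."""
    if not hasattr(generation_config, "composer_to_feature_token"):
        raise ValueError("`composer_to_feature_token` was not found in generation_config")
    mapping = generation_config.composer_to_feature_token
    if len(mapping) != composer_vocab_size:
        raise ValueError(
            f"composer_vocab_size is {composer_vocab_size} but the mapping has {len(mapping)} keys"
        )
    # Assumed extra check for this sketch: the requested composer must be a key.
    if composer not in mapping:
        raise ValueError(f"unknown composer {composer!r}; choose one of {sorted(mapping)}")
    return mapping[composer]


# Hypothetical mapping with made-up token ids.
gen_cfg = SimpleNamespace(composer_to_feature_token={"composer1": 100, "composer2": 101})
token = resolve_composer_token(gen_cfg, composer_vocab_size=2, composer="composer1")
print(token)
```

A mapping whose size differs from `composer_vocab_size` fails fast, before any features are processed.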

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.get_decoder()

Returns the decoder model used for conditional generation in the Pop2PianoForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoForConditionalGeneration class. This parameter is required to access the decoder model.

TYPE: Pop2PianoForConditionalGeneration

RETURNS DESCRIPTION

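The decoder stack (`self.decoder`), a Pop2PianoStack built from a copy of the config with `is_decoder=True`.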

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_decoder(self):
    """
    Returns the decoder model used for conditional generation in the Pop2PianoForConditionalGeneration class.

    Args:
        self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
            This parameter is required to access the decoder model.

    Returns:
        Pop2PianoStack: The decoder stack (`self.decoder`).

    Raises:
        None.
    """
    return self.decoder

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.get_encoder()

Returns the encoder used for Pop2PianoForConditionalGeneration.

PARAMETER DESCRIPTION
self

An instance of the Pop2PianoForConditionalGeneration class.

RETURNS DESCRIPTION

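The encoder stack (`self.encoder`), a Pop2PianoStack built from a copy of the config with `is_decoder=False`.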

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_encoder(self):
    """
    Returns the encoder used for Pop2PianoForConditionalGeneration.

    Args:
        self: An instance of the Pop2PianoForConditionalGeneration class.

    Returns:
        Pop2PianoStack: The encoder stack (`self.encoder`).

    Raises:
        None.
    """
    return self.encoder

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.get_input_embeddings()

Returns the input embedding layer, which is shared by the encoder and the decoder.

PARAMETER DESCRIPTION
self

The instance of the class.

  • Purpose: Represents the current instance of the class.
  • Restrictions: Must be an instance of 'Pop2PianoForConditionalGeneration'.

TYPE: object

RETURNS DESCRIPTION

The shared `nn.Embedding` layer (`self.shared`).

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_input_embeddings(self):
    """
    Returns the input embedding layer, which is shared by the encoder and the decoder.

    Args:
        self (object):
            The instance of the class.

            - Purpose: Represents the current instance of the class.
            - Restrictions: Must be an instance of 'Pop2PianoForConditionalGeneration'.

    Returns:
        nn.Embedding: The shared embedding layer (`self.shared`).

    Raises:
        None.
    """
    return self.shared

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.get_mel_conditioner_outputs(input_features, composer, generation_config, attention_mask=None)

This method is used to concatenate mel conditioner tokens at the front of the input_features in order to control the type of MIDI token generated by the model.

PARAMETER DESCRIPTION
input_features

input features extracted from the feature extractor.

TYPE: `mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`

composer

composer token which determines the type of MIDI tokens to be generated.

TYPE: `str`

generation_config

The generation configuration; its `composer_to_feature_token` mapping is used to look up the feature token for the composer.

TYPE: `~generation.GenerationConfig`

attention_mask

For batched generation input_features are padded to have the same shape across all examples. attention_mask helps to determine which areas were padded and which were not.

  • 1 for tokens that are not padded,
  • 0 for tokens that are padded.

TYPE: `mindspore.Tensor`, *optional* DEFAULT: None

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_mel_conditioner_outputs(
    self,
    input_features: mindspore.Tensor,
    composer: str,
    generation_config: GenerationConfig,
    attention_mask: mindspore.Tensor = None,
):
    """
    This method is used to concatenate mel conditioner tokens at the front of the input_features in order to
    control the type of MIDI token generated by the model.

    Args:
        input_features (`mindspore.Tensor` of shape `(batch_size, sequence_length, hidden_size)`):
            input features extracted from the feature extractor.
        composer (`str`):
            composer token which determines the type of MIDI tokens to be generated.
        generation_config (`~generation.GenerationConfig`):
            The generation configuration; its `composer_to_feature_token` mapping is used to look up the
            feature token for the composer.
        attention_mask (`mindspore.Tensor`, *optional*):
            For batched generation `input_features` are padded to have the same shape across all examples.
            `attention_mask` helps to determine which areas were padded and which were not.

            - 1 for tokens that are **not padded**,
            - 0 for tokens that are **padded**.
    """
    composer_to_feature_token = generation_config.composer_to_feature_token
    if composer not in composer_to_feature_token.keys():
        raise ValueError(
            f"Please choose a composer from {list(composer_to_feature_token.keys())}. Composer received - {composer}"
        )
    composer_value = composer_to_feature_token[composer]
    composer_value = mindspore.tensor(composer_value)
    composer_value = composer_value.repeat(input_features.shape[0])

    embedding_offset = min(composer_to_feature_token.values())

    input_features = self.mel_conditioner(
        feature=input_features,
        index_value=composer_value,
        embedding_offset=embedding_offset,
    )
    if attention_mask is not None:
        input_features[~attention_mask[:, 0].bool()] = 0.0

        # since self.mel_conditioner adds a new array at the front of inputs_embeds we need to do the same for attention_mask to keep the shapes same
        attention_mask = ops.cat([attention_mask[:, 0].view(-1, 1), attention_mask], axis=1)
        return input_features, attention_mask

    return input_features, None
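
The bookkeeping in this method can be sketched without MindSpore. The following is a minimal illustration that uses plain Python lists in place of tensors; `lookup_composer`, `extend_attention_mask`, and the example mapping are hypothetical names and values chosen for illustration, not part of the library.

```python
def lookup_composer(composer, composer_to_feature_token):
    """Return (token_id, embedding_offset) for a composer, mirroring the validation in the source."""
    if composer not in composer_to_feature_token:
        raise ValueError(
            f"Please choose a composer from {list(composer_to_feature_token)}. "
            f"Composer received - {composer}"
        )
    token_id = composer_to_feature_token[composer]
    # The smallest token id is what the source passes to the conditioner as `embedding_offset`.
    embedding_offset = min(composer_to_feature_token.values())
    return token_id, embedding_offset


def extend_attention_mask(attention_mask):
    """Prepend each row's first mask value so the mask covers the extra conditioner position."""
    return [[row[0]] + row for row in attention_mask]


# Hypothetical mapping; real values come from generation_config.composer_to_feature_token.
mapping = {"composer1": 2052, "composer2": 2053}
token_id, offset = lookup_composer("composer2", mapping)
mask = extend_attention_mask([[1, 1, 0], [0, 0, 0]])
```

Each mask row grows by one column, matching the extra position that the conditioner adds at the front of `input_features`.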

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.get_output_embeddings()

Method to retrieve the output embeddings from the Pop2PianoForConditionalGeneration class.

PARAMETER DESCRIPTION
self

Pop2PianoForConditionalGeneration object. Represents the instance of the class.

RETURNS DESCRIPTION
lm_head

The method returns the output embeddings from the 'lm_head' attribute of the instance.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_output_embeddings(self):
    """
    Method to retrieve the output embeddings from the Pop2PianoForConditionalGeneration class.

    Args:
        self: Pop2PianoForConditionalGeneration object. Represents the instance of the class.

    Returns:
        lm_head: The method returns the output embeddings from the 'lm_head' attribute of the instance.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.prepare_decoder_input_ids_from_labels(labels)

Prepare decoder input IDs from labels for conditional generation.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoForConditionalGeneration class.

TYPE: Pop2PianoForConditionalGeneration

labels

The labels tensor representing the target sequence. The decoder input IDs are built by shifting these labels one position to the right.

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

The decoder input IDs, obtained by shifting `labels` one position to the right.

RAISES DESCRIPTION
None

This method does not raise any exceptions.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def prepare_decoder_input_ids_from_labels(self, labels: mindspore.Tensor):
    """
    Prepare decoder input IDs from labels for conditional generation.

    Args:
        self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
        labels (mindspore.Tensor): The labels tensor representing the target sequence.
            The decoder input IDs are built by shifting these labels one position to the right.

    Returns:
        mindspore.Tensor: The decoder input IDs, obtained by shifting `labels` one position to the right.

    Raises:
        None: This method does not raise any exceptions.
    """
    return self._shift_right(labels)
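
`_shift_right` is not shown in this section. As a rough sketch, assuming it follows the usual T5-style convention (prepend a decoder start token, drop the last label, and replace the `-100` ignore index with the pad token), the transformation looks like this on plain Python lists. The function name `shift_right` and the token ids are illustrative assumptions.

```python
def shift_right(labels, decoder_start_token_id, pad_token_id, ignore_index=-100):
    """Build decoder inputs from labels (assumed T5-style behaviour, not the library code)."""
    shifted = []
    for row in labels:
        # Prepend the start token and drop the final label to keep the length unchanged.
        new_row = [decoder_start_token_id] + row[:-1]
        # Labels masked out of the loss with ignore_index become padding in the inputs.
        new_row = [pad_token_id if token == ignore_index else token for token in new_row]
        shifted.append(new_row)
    return shifted


labels = [[11, 12, 13, 1], [21, 1, -100, -100]]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
```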

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.prepare_inputs_for_generation(input_ids, past_key_values=None, attention_mask=None, head_mask=None, decoder_head_mask=None, cross_attn_head_mask=None, use_cache=None, encoder_outputs=None, **kwargs)

This method prepares inputs for generation in the Pop2PianoForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The instance of the class.

input_ids

The input tensor containing the token ids for the input sequence.

TYPE: Tensor

past_key_values

A tuple of tensors containing the past key and value states for fast decoding. Defaults to None.

TYPE: Tuple DEFAULT: None

attention_mask

An optional tensor of the same size as input_ids, used to mask the input tokens. Defaults to None.

TYPE: Tensor DEFAULT: None

head_mask

An optional tensor with shape (num_heads,) that is used to mask the attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

decoder_head_mask

An optional tensor with shape (num_heads,) that is used to mask the decoder attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

cross_attn_head_mask

An optional tensor with shape (num_heads,) that is used to mask the cross-attention heads. Defaults to None.

TYPE: Tensor DEFAULT: None

use_cache

A flag indicating whether to use the cache for fast decoding. Defaults to None.

TYPE: bool DEFAULT: None

encoder_outputs

A tuple of tensors containing the encoder outputs, used in the cross-attention mechanism.

TYPE: Tuple DEFAULT: None

RETURNS DESCRIPTION

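A dict of keyword arguments for the forward pass: `decoder_input_ids`, `past_key_values`, `encoder_outputs`, `attention_mask`, `head_mask`, `decoder_head_mask`, `cross_attn_head_mask`, and `use_cache`.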

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def prepare_inputs_for_generation(
    self,
    input_ids,
    past_key_values=None,
    attention_mask=None,
    head_mask=None,
    decoder_head_mask=None,
    cross_attn_head_mask=None,
    use_cache=None,
    encoder_outputs=None,
    **kwargs,
):
    """
    This method prepares inputs for generation in the Pop2PianoForConditionalGeneration class.

    Args:
        self: The instance of the class.
        input_ids (Tensor): The input tensor containing the token ids for the input sequence.
        past_key_values (Tuple): A tuple of tensors containing the past key and value states for fast decoding.
            Defaults to None.
        attention_mask (Tensor): An optional tensor of the same size as input_ids, used to mask the input tokens.
            Defaults to None.
        head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the attention heads.
            Defaults to None.
        decoder_head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the decoder
            attention heads. Defaults to None.
        cross_attn_head_mask (Tensor): An optional tensor with shape (num_heads,) that is used to mask the
            cross-attention heads. Defaults to None.
        use_cache (bool): A flag indicating whether to use the cache for fast decoding. Defaults to None.
        encoder_outputs (Tuple): A tuple of tensors containing the encoder outputs, used in the cross-attention
            mechanism.

    Returns:
        dict: Keyword arguments for the forward pass, with `input_ids` passed on as `decoder_input_ids`.

    Raises:
        None
    """
    # cut decoder_input_ids if past is used
    if past_key_values is not None:
        input_ids = input_ids[:, -1:]

    return {
        "decoder_input_ids": input_ids,
        "past_key_values": past_key_values,
        "encoder_outputs": encoder_outputs,
        "attention_mask": attention_mask,
        "head_mask": head_mask,
        "decoder_head_mask": decoder_head_mask,
        "cross_attn_head_mask": cross_attn_head_mask,
        "use_cache": use_cache,
    }
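
To make the cache handling concrete, here is a small sketch of the same logic on nested lists. `prepare_step_inputs` is a hypothetical stand-in for the method above, the pass-through arguments are collapsed into `**passthrough`, and the cache is a placeholder object rather than real key/value tensors.

```python
def prepare_step_inputs(input_ids, past_key_values=None, **passthrough):
    """Trim decoder inputs to the newest token once a cache is available."""
    # With a cache, earlier positions are already covered by past_key_values,
    # so only the last token of each sequence is fed to the decoder.
    if past_key_values is not None:
        input_ids = [row[-1:] for row in input_ids]
    return {"decoder_input_ids": input_ids, "past_key_values": past_key_values, **passthrough}


first_step = prepare_step_inputs([[0, 5, 9]], use_cache=True)
later_step = prepare_step_inputs([[0, 5, 9]], past_key_values=("placeholder",), use_cache=True)
```

On the first step the full sequence is kept; on later steps only the final token survives, while the other arguments pass through unchanged.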

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.set_input_embeddings(new_embeddings)

Set the input embeddings for the Pop2PianoForConditionalGeneration model.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoForConditionalGeneration class.

TYPE: Pop2PianoForConditionalGeneration

new_embeddings

The new input embeddings to be set for the model. Should be compatible with the model's encoder and decoder.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def set_input_embeddings(self, new_embeddings):
    """
    Set the input embeddings for the Pop2PianoForConditionalGeneration model.

    Args:
        self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
        new_embeddings (object): The new input embeddings to be set for the model.
            Should be compatible with the model's encoder and decoder.

    Returns:
        None.

    Raises:
        None.
    """
    self.shared = new_embeddings
    self.encoder.set_input_embeddings(new_embeddings)
    self.decoder.set_input_embeddings(new_embeddings)
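
The effect of the three assignments is that the model, its encoder, and its decoder all hold a reference to the same embedding object. A toy sketch with hypothetical stand-in classes (`ToyStack` and `ToyModel`, with an assumed `embed_tokens` attribute; these are not the library's classes) illustrates the aliasing:

```python
class ToyStack:
    """Stand-in for an encoder or decoder stack that owns an embedding reference."""

    def __init__(self):
        self.embed_tokens = None  # assumed attribute name, for illustration only

    def set_input_embeddings(self, new_embeddings):
        self.embed_tokens = new_embeddings


class ToyModel:
    """Stand-in for the full model; mirrors the three assignments in the source."""

    def __init__(self):
        self.shared = None
        self.encoder = ToyStack()
        self.decoder = ToyStack()

    def set_input_embeddings(self, new_embeddings):
        self.shared = new_embeddings
        self.encoder.set_input_embeddings(new_embeddings)
        self.decoder.set_input_embeddings(new_embeddings)


model = ToyModel()
embeddings = object()  # placeholder for an embedding layer
model.set_input_embeddings(embeddings)
```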

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoForConditionalGeneration.set_output_embeddings(new_embeddings)

Sets the output embeddings of the Pop2PianoForConditionalGeneration model.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoForConditionalGeneration class.

TYPE: Pop2PianoForConditionalGeneration

new_embeddings

The new embeddings to be set as the output embeddings.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def set_output_embeddings(self, new_embeddings):
    """
    Sets the output embeddings of the Pop2PianoForConditionalGeneration model.

    Args:
        self (Pop2PianoForConditionalGeneration): The instance of the Pop2PianoForConditionalGeneration class.
        new_embeddings (object): The new embeddings to be set as the output embeddings.

    Returns:
        None.

    Raises:
        None.
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerCrossAttention

Bases: Module

The Pop2PianoLayerCrossAttention class represents a layer that performs cross-attention within the Pop2Piano model architecture. This class inherits from nn.Module and contains methods for initializing the layer and running its forward pass.

ATTRIBUTE DESCRIPTION
EncDecAttention

Instance of Pop2PianoAttention for performing cross-attention.

layer_norm

Instance of Pop2PianoLayerNorm for layer normalization.

dropout

Dropout layer for regularization.

METHOD DESCRIPTION
__init__

Initializes the Pop2PianoLayerCrossAttention with the given configuration.

forward

Runs the cross-attention sub-layer: layer normalization, attention computation, dropout, and a residual connection.

RETURNS DESCRIPTION
outputs

Tuple containing the layer output and additional attention outputs.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoLayerCrossAttention(nn.Module):

    """
    The Pop2PianoLayerCrossAttention class represents a layer that performs cross-attention within the Pop2Piano model architecture.
    This class inherits from nn.Module and contains methods for initializing the layer and running its forward pass.

    Attributes:
        EncDecAttention: Instance of Pop2PianoAttention for performing cross-attention.
        layer_norm: Instance of Pop2PianoLayerNorm for layer normalization.
        dropout: Dropout layer for regularization.

    Methods:
        __init__: Initializes the Pop2PianoLayerCrossAttention with the given configuration.

        forward: Runs the cross-attention sub-layer: layer normalization, attention computation, dropout,
            and a residual connection.

    Returns:
        outputs: Tuple containing the layer output and additional attention outputs.

    """
    def __init__(self, config):
        """
        Initialize a Pop2PianoLayerCrossAttention object.

        Args:
            self (Pop2PianoLayerCrossAttention): The instance of the Pop2PianoLayerCrossAttention class.
            config (object):
                Configuration object containing necessary parameters for initialization.

                - Type: object
                - Purpose: Contains configuration settings for the attention layer.
                - Restrictions: Must be a valid configuration object.

        Returns:
            None.

        Raises:
            None
        """
        super().__init__()
        self.EncDecAttention = Pop2PianoAttention(config, has_relative_attention_bias=False)
        self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
        self.dropout = nn.Dropout(p=config.dropout_rate)

    def forward(
        self,
        hidden_states,
        key_value_states,
        attention_mask=None,
        position_bias=None,
        layer_head_mask=None,
        past_key_value=None,
        use_cache=False,
        query_length=None,
        output_attentions=False,
    ):
        """
        Runs the forward pass of the cross-attention layer.

        Args:
            self: The instance of the class.
            hidden_states (tensor): The input hidden states to the layer.
            key_value_states (tensor): The key-value states used in attention computation.
            attention_mask (tensor, optional): Mask to avoid attending to certain positions.
            position_bias (tensor, optional): Bias applied to positions for relative attention.
            layer_head_mask (tensor, optional): Mask applied to the heads in the layer.
            past_key_value (tuple, optional): Tuple containing past key and value tensors.
            use_cache (bool, optional): If True, cache the computed key-value states.
            query_length (int, optional): Length of the query sequence.
            output_attentions (bool, optional): If True, return attention weights.

        Returns:
            tuple: A tuple containing the layer output tensor and additional outputs from attention computation.

        Raises:
            None
        """
        normed_hidden_states = self.layer_norm(hidden_states)
        attention_output = self.EncDecAttention(
            normed_hidden_states,
            mask=attention_mask,
            key_value_states=key_value_states,
            position_bias=position_bias,
            layer_head_mask=layer_head_mask,
            past_key_value=past_key_value,
            use_cache=use_cache,
            query_length=query_length,
            output_attentions=output_attentions,
        )
        layer_output = hidden_states + self.dropout(attention_output[0])
        outputs = (layer_output,) + attention_output[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerCrossAttention.__init__(config)

Initialize a Pop2PianoLayerCrossAttention object.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoLayerCrossAttention class.

TYPE: Pop2PianoLayerCrossAttention

config

Configuration object containing necessary parameters for initialization.

  • Type: object
  • Purpose: Contains configuration settings for the attention layer.
  • Restrictions: Must be a valid configuration object.

TYPE: object

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config):
    """
    Initialize a Pop2PianoLayerCrossAttention object.

    Args:
        self (Pop2PianoLayerCrossAttention): The instance of the Pop2PianoLayerCrossAttention class.
        config (object):
            Configuration object containing necessary parameters for initialization.

            - Type: object
            - Purpose: Contains configuration settings for the attention layer.
            - Restrictions: Must be a valid configuration object.

    Returns:
        None.

    Raises:
        None
    """
    super().__init__()
    self.EncDecAttention = Pop2PianoAttention(config, has_relative_attention_bias=False)
    self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
    self.dropout = nn.Dropout(p=config.dropout_rate)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerCrossAttention.forward(hidden_states, key_value_states, attention_mask=None, position_bias=None, layer_head_mask=None, past_key_value=None, use_cache=False, query_length=None, output_attentions=False)

Runs the forward pass of the cross-attention layer.

PARAMETER DESCRIPTION
self

The instance of the class.

hidden_states

The input hidden states to the layer.

TYPE: tensor

key_value_states

The key-value states used in attention computation.

TYPE: tensor

attention_mask

Mask to avoid attending to certain positions.

TYPE: tensor DEFAULT: None

position_bias

Bias applied to positions for relative attention.

TYPE: tensor DEFAULT: None

layer_head_mask

Mask applied to the heads in the layer.

TYPE: tensor DEFAULT: None

past_key_value

Tuple containing past key and value tensors.

TYPE: tuple DEFAULT: None

use_cache

If True, cache the computed key-value states.

TYPE: bool DEFAULT: False

query_length

Length of the query sequence.

TYPE: int DEFAULT: None

output_attentions

If True, return attention weights.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
tuple

A tuple containing the layer output tensor and additional outputs from attention computation.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    hidden_states,
    key_value_states,
    attention_mask=None,
    position_bias=None,
    layer_head_mask=None,
    past_key_value=None,
    use_cache=False,
    query_length=None,
    output_attentions=False,
):
    """
    Runs the forward pass of the cross-attention layer.

    Args:
        self: The instance of the class.
        hidden_states (tensor): The input hidden states to the layer.
        key_value_states (tensor): The key-value states used in attention computation.
        attention_mask (tensor, optional): Mask to avoid attending to certain positions.
        position_bias (tensor, optional): Bias applied to positions for relative attention.
        layer_head_mask (tensor, optional): Mask applied to the heads in the layer.
        past_key_value (tuple, optional): Tuple containing past key and value tensors.
        use_cache (bool, optional): If True, cache the computed key-value states.
        query_length (int, optional): Length of the query sequence.
        output_attentions (bool, optional): If True, return attention weights.

    Returns:
        tuple: A tuple containing the layer output tensor and additional outputs from attention computation.

    Raises:
        None
    """
    normed_hidden_states = self.layer_norm(hidden_states)
    attention_output = self.EncDecAttention(
        normed_hidden_states,
        mask=attention_mask,
        key_value_states=key_value_states,
        position_bias=position_bias,
        layer_head_mask=layer_head_mask,
        past_key_value=past_key_value,
        use_cache=use_cache,
        query_length=query_length,
        output_attentions=output_attentions,
    )
    layer_output = hidden_states + self.dropout(attention_output[0])
    outputs = (layer_output,) + attention_output[1:]  # add attentions if we output them
    return outputs
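
The data flow of this forward pass can be sketched with plain Python callables. In the sketch below, `cross_attention_layer` and `toy_attention` are hypothetical stand-ins, and lists of floats take the place of tensors. It shows two details of the source: the residual is added to the un-normalized input, and any extra attention outputs are passed through after the layer output.

```python
def identity(values):
    return values


def cross_attention_layer(hidden_states, attention_fn, norm_fn=identity, dropout_fn=identity):
    """Pre-norm attention sub-layer: norm -> attention -> dropout -> residual."""
    normed = norm_fn(hidden_states)
    attention_output = attention_fn(normed)  # tuple: (context, *extras)
    dropped = dropout_fn(attention_output[0])
    # The residual uses the original hidden_states, not the normalized ones.
    layer_output = [h + d for h, d in zip(hidden_states, dropped)]
    # Extras (for example cached key/values or attention weights) are appended unchanged.
    return (layer_output,) + attention_output[1:]


def toy_attention(values):
    """Placeholder attention that doubles its input and returns two dummy extras."""
    return ([2.0 * v for v in values], "present_key_value", "attention_weights")


outputs = cross_attention_layer([1.0, 2.0], toy_attention)
```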

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerFF

Bases: Module

This class represents a feed-forward layer used in the Pop2Piano model. It is inherited from the nn.Module class.

ATTRIBUTE DESCRIPTION
DenseReluDense

A dense layer with gated activation function, if config.is_gated_act is True, otherwise a dense layer with regular activation function.

TYPE: Pop2PianoDenseGatedActDense or Pop2PianoDenseActDense

layer_norm

A layer normalization module.

TYPE: Pop2PianoLayerNorm

dropout

A dropout module.

TYPE: Dropout

METHOD DESCRIPTION
__init__

Initializes the Pop2PianoLayerFF instance with the provided configuration.

forward

Runs the feed-forward sub-layer: layer normalization, dense block, dropout, and residual connection.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoLayerFF(nn.Module):

    """
    This class represents a feed-forward layer used in the Pop2Piano model. It is inherited from the nn.Module class.

    Attributes:
        DenseReluDense (Pop2PianoDenseGatedActDense or Pop2PianoDenseActDense): A dense layer with gated activation
            function, if config.is_gated_act is True, otherwise a dense layer with regular activation function.
        layer_norm (Pop2PianoLayerNorm): A layer normalization module.
        dropout (nn.Dropout): A dropout module.

    Methods:
        __init__: Initializes the Pop2PianoLayerFF instance with the provided configuration.
        forward: Runs the feed-forward sub-layer: layer normalization, dense block, dropout,
            and residual connection.

    """
    def __init__(self, config: Pop2PianoConfig):
        """
        Initializes the Pop2PianoLayerFF class instance with the provided configuration.

        Args:
            self (Pop2PianoLayerFF): The instance of the Pop2PianoLayerFF class.
            config (Pop2PianoConfig): An instance of the Pop2PianoConfig class containing configuration parameters.
                This parameter is required for configuring the behavior of the Pop2PianoLayerFF instance.
                It should be of type Pop2PianoConfig and must not be None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__()
        if config.is_gated_act:
            self.DenseReluDense = Pop2PianoDenseGatedActDense(config)
        else:
            self.DenseReluDense = Pop2PianoDenseActDense(config)

        self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
        self.dropout = nn.Dropout(p=config.dropout_rate)

    def forward(self, hidden_states):
        """
        Runs the forward pass of the Pop2PianoLayerFF layer.

        Args:
            self (Pop2PianoLayerFF): An instance of the Pop2PianoLayerFF class.
            hidden_states (mindspore.Tensor): The input hidden states. A tensor of shape
                (batch_size, seq_length, hidden_size).

        Returns:
            mindspore.Tensor: The updated hidden states, with the same shape as the input.

        Raises:
            None.
        """
        forwarded_states = self.layer_norm(hidden_states)
        forwarded_states = self.DenseReluDense(forwarded_states)
        hidden_states = hidden_states + self.dropout(forwarded_states)
        return hidden_states

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerFF.__init__(config)

Initializes the Pop2PianoLayerFF class instance with the provided configuration.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoLayerFF class.

TYPE: Pop2PianoLayerFF

config

An instance of the Pop2PianoConfig class containing configuration parameters. This parameter is required for configuring the behavior of the Pop2PianoLayerFF instance. It should be of type Pop2PianoConfig and must not be None.

TYPE: Pop2PianoConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config: Pop2PianoConfig):
    """
    Initializes the Pop2PianoLayerFF class instance with the provided configuration.

    Args:
        self (Pop2PianoLayerFF): The instance of the Pop2PianoLayerFF class.
        config (Pop2PianoConfig): An instance of the Pop2PianoConfig class containing configuration parameters.
            This parameter is required for configuring the behavior of the Pop2PianoLayerFF instance.
            It should be of type Pop2PianoConfig and must not be None.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__()
    if config.is_gated_act:
        self.DenseReluDense = Pop2PianoDenseGatedActDense(config)
    else:
        self.DenseReluDense = Pop2PianoDenseActDense(config)

    self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
    self.dropout = nn.Dropout(p=config.dropout_rate)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerFF.forward(hidden_states)

Runs the forward pass of the Pop2PianoLayerFF layer.

PARAMETER DESCRIPTION
self

An instance of the Pop2PianoLayerFF class.

TYPE: Pop2PianoLayerFF

hidden_states

The input hidden states. A tensor of shape (batch_size, seq_length, hidden_size).

TYPE: Tensor

RETURNS DESCRIPTION

mindspore.Tensor: The updated hidden states, with the same shape as the input.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(self, hidden_states):
    """
    Runs the forward pass of the Pop2PianoLayerFF layer.

    Args:
        self (Pop2PianoLayerFF): An instance of the Pop2PianoLayerFF class.
        hidden_states (mindspore.Tensor): The input hidden states. A tensor of shape
            (batch_size, seq_length, hidden_size).

    Returns:
        mindspore.Tensor: The updated hidden states, with the same shape as the input.

    Raises:
        None.
    """
    forwarded_states = self.layer_norm(hidden_states)
    forwarded_states = self.DenseReluDense(forwarded_states)
    hidden_states = hidden_states + self.dropout(forwarded_states)
    return hidden_states
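
A minimal sketch of the same pre-norm residual pattern, using plain Python lists and hypothetical placeholder functions (`halve` for the norm, `negate` for the dense block; neither reflects the real computations), makes the order of operations explicit:

```python
def identity(values):
    return values


def feed_forward_layer(hidden_states, norm_fn, dense_fn, dropout_fn=identity):
    """Pre-norm feed-forward sub-layer: norm -> dense -> dropout -> residual."""
    forwarded = dense_fn(norm_fn(hidden_states))
    # The residual is taken from the un-normalized input.
    return [h + f for h, f in zip(hidden_states, dropout_fn(forwarded))]


def halve(values):
    """Placeholder 'norm' so its effect is visible in the result."""
    return [0.5 * v for v in values]


def negate(values):
    """Placeholder 'dense block'."""
    return [-v for v in values]


result = feed_forward_layer([4.0, -2.0], norm_fn=halve, dense_fn=negate)
```

Here the dense block sees the halved values, while the residual adds back the original ones, which is why the result differs from simply negating and re-adding the input.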

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerNorm

Bases: Module

Pop2PianoLayerNorm is a layer normalization module in the Pop2Piano style: it has no bias and does not subtract the mean. The class inherits from nn.Module and normalizes the hidden states of a neural network. It provides methods for initialization and for the forward computation that applies this normalization to the input hidden states.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoLayerNorm(nn.Module):

    """
    Pop2PianoLayerNorm is a layer normalization module in the Pop2Piano style: it has no bias and does not
    subtract the mean.
    The class inherits from nn.Module and normalizes the hidden states of a neural network.
    It provides methods for initialization and for the forward computation that applies this normalization to
    the input hidden states.
    """
    def __init__(self, hidden_size, eps=1e-6):
        """
        Construct a layernorm module in the Pop2Piano style. No bias and no subtraction of mean.
        """
        super().__init__()
        self.weight = Parameter(initializer('zeros', (hidden_size,), mindspore.float32), 'weight')
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        """
        Applies Pop2Piano-style (root mean square) layer normalization to the hidden states.

        Args:
            self: Represents the instance of the class Pop2PianoLayerNorm. It is used to access attributes and methods
                of the class.

                - Type: Pop2PianoLayerNorm object
                - Purpose: To operate on the instance of the class.
                - Restrictions: None

            hidden_states:
                Represents the hidden states input to the method.

                - Type: Tensor
                - Purpose: Input hidden states that need to be normalized.
                - Restrictions: Should be convertible to float32. Expected shape: (batch_size, seq_length, hidden_size).

        Returns:
            Tensor: The normalized hidden states, scaled by the learned weight.

        Raises:
            None.
        """
        # Pop2Piano uses a layer_norm which only scales and doesn't shift, which is also known as Root Mean
        # Square Layer Normalization https://arxiv.org/abs/1910.07467 thus variance is calculated
        # w/o mean and there is no bias. Additionally we want to make sure that the accumulation for
        # half-precision inputs is done in fp32

        variance = hidden_states.to(mindspore.float32).pow(2).mean(-1, keep_dims=True)
        hidden_states = hidden_states * ops.rsqrt(variance + self.variance_epsilon)

        # convert into half-precision if necessary
        if self.weight.dtype in [mindspore.float16, mindspore.bfloat16]:
            hidden_states = hidden_states.to(self.weight.dtype)

        return self.weight * hidden_states

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerNorm.__init__(hidden_size, eps=1e-06)

Construct a layernorm module in the Pop2Piano style. No bias and no subtraction of mean.
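
The "no bias and no subtraction of mean" behaviour can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name `rms_layer_norm` and the sample values are assumptions chosen for illustration, and the weight is set to ones so that only the normalization is visible.

```python
import numpy as np

def rms_layer_norm(hidden_states, weight, eps=1e-6):
    """Scale-only layer norm: divide by the root mean square of the last axis."""
    # Mean of squares over the feature axis, accumulated in float32.
    variance = np.mean(hidden_states.astype(np.float32) ** 2, axis=-1, keepdims=True)
    # No mean is subtracted and no bias is added; only a learned scale is applied.
    normed = hidden_states / np.sqrt(variance + eps)
    return weight * normed

# Hypothetical input with hidden_size = 2.
x = np.array([[3.0, 4.0]], dtype=np.float32)
w = np.ones(2, dtype=np.float32)
y = rms_layer_norm(x, w)
```

After normalization the mean of the squared features is close to 1, while the mean of the features themselves is generally not 0, which is what distinguishes this from a standard layer norm.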

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, hidden_size, eps=1e-6):
    """
    Construct a layernorm module in the Pop2Piano style. No bias and no subtraction of mean.
    """
    super().__init__()
    self.weight = Parameter(initializer('zeros', (hidden_size,), mindspore.float32), 'weight')
    self.variance_epsilon = eps

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerNorm.forward(hidden_states)

Applies root-mean-square layer normalization (scale only, no mean subtraction, no bias) to the hidden states.

PARAMETER DESCRIPTION
self

Represents the instance of the class Pop2PianoLayerNorm. It is used to access attributes and methods of the class.

  • Type: Pop2PianoLayerNorm object
  • Purpose: To operate on the instance of the class.
  • Restrictions: None

hidden_states

Represents the hidden states input to the method.

  • Type: Tensor
  • Purpose: Input hidden states that need to be normalized.
  • Restrictions: Should be convertible to float32. Expected shape: (batch_size, seq_length, hidden_size).

RETURNS DESCRIPTION
Tensor

The normalized hidden states multiplied by the learned weight. The input is not modified in-place.
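
The source comments note that the accumulation for half-precision inputs is done in fp32. A small NumPy sketch, using assumed sample values, suggests why: squaring moderately large float16 values overflows before the mean is even taken.

```python
import numpy as np

# Hypothetical half-precision activations; both values are exactly representable.
x = np.array([300.0, 400.0], dtype=np.float16)

# Squaring in float16 overflows: 300**2 = 90000 exceeds the float16 maximum (65504).
with np.errstate(over="ignore"):
    variance_fp16 = np.mean(x ** 2)

# Upcasting to float32 first keeps the mean of squares finite.
variance_fp32 = np.mean(x.astype(np.float32) ** 2)
```

Here `variance_fp16` is infinite, whereas `variance_fp32` is finite, so the reciprocal square root stays meaningful only in the upcast path.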

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(self, hidden_states):
    """
    Applies root-mean-square layer normalization (scale only, no mean subtraction, no bias) to the hidden states.

    Args:
        self: Represents the instance of the class Pop2PianoLayerNorm. It is used to access attributes and methods
            of the class.

            - Type: Pop2PianoLayerNorm object
            - Purpose: To operate on the instance of the class.
            - Restrictions: None

        hidden_states:
            Represents the hidden states input to the method.

            - Type: Tensor
            - Purpose: Input hidden states that need to be normalized.
            - Restrictions: Should be convertible to float32. Expected shape: (batch_size, seq_length, hidden_size).

    Returns:
        Tensor: The normalized hidden states multiplied by the learned weight. The input is not modified in-place.

    Raises:
        None.
    """
    # Pop2Piano uses a layer_norm which only scales and doesn't shift, which is also known as Root Mean
    # Square Layer Normalization https://arxiv.org/abs/1910.07467 thus variance is calculated
    # w/o mean and there is no bias. Additionally we want to make sure that the accumulation for
    # half-precision inputs is done in fp32

    variance = hidden_states.to(mindspore.float32).pow(2).mean(-1, keep_dims=True)
    hidden_states = hidden_states * ops.rsqrt(variance + self.variance_epsilon)

    # convert into half-precision if necessary
    if self.weight.dtype in [mindspore.float16, mindspore.bfloat16]:
        hidden_states = hidden_states.to(self.weight.dtype)

    return self.weight * hidden_states

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerSelfAttention

Bases: Module

This class represents a self-attention mechanism used in the Pop2PianoLayer model.

The Pop2PianoLayerSelfAttention class is a subclass of nn.Module. It applies layer normalization, self-attention, and dropout to the input hidden states, then adds the result back to the input as a residual connection.

ATTRIBUTE DESCRIPTION
SelfAttention

An instance of the Pop2PianoAttention class used for self-attention computation.

TYPE: Pop2PianoAttention

layer_norm

An instance of the Pop2PianoLayerNorm class used for layer normalization.

TYPE: Pop2PianoLayerNorm

dropout

An instance of the Dropout class used for dropout regularization.

TYPE: Dropout

METHOD DESCRIPTION
__init__

Constructs a new Pop2PianoLayerSelfAttention object.

forward

Performs self-attention on the input hidden states.
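
The data flow of this sub-layer (normalize, attend, add back to the un-normalized input) can be sketched in NumPy. Everything here is a simplified stand-in: `toy_attention` is a hypothetical single-head attention without projections, masks, or caching, and dropout is omitted.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """Scale-only normalization over the last axis (weight omitted)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def toy_attention(x):
    """Hypothetical stand-in for Pop2PianoAttention: returns (output, weights)."""
    scores = x @ x.swapaxes(-1, -2) / (x.shape[-1] ** 0.5)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

def self_attention_sublayer(hidden_states):
    normed = rms_norm(hidden_states)
    attention_output = toy_attention(normed)
    # The residual uses the original input, not the normalized one.
    hidden_states = hidden_states + attention_output[0]
    # Remaining attention outputs are passed through unchanged.
    return (hidden_states,) + attention_output[1:]

x = np.random.default_rng(0).normal(size=(1, 4, 8)).astype(np.float32)
out = self_attention_sublayer(x)
```

The first element of `out` has the same shape as the input; any further elements come straight from the attention module.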

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoLayerSelfAttention(nn.Module):

    """This class represents a self-attention mechanism used in the Pop2PianoLayer model.

    The Pop2PianoLayerSelfAttention class is a subclass of nn.Module. It applies layer normalization,
    self-attention, and dropout to the input hidden states, then adds the result back to the input as a residual.

    Attributes:
        SelfAttention (Pop2PianoAttention): An instance of the Pop2PianoAttention class used for self-attention
            computation.
        layer_norm (Pop2PianoLayerNorm): An instance of the Pop2PianoLayerNorm class used for layer normalization.
        dropout (nn.Dropout): An instance of the Dropout class used for dropout regularization.

    Methods:
        __init__: Constructs a new Pop2PianoLayerSelfAttention object.
        forward: Performs self-attention on the input hidden states.

    """
    def __init__(self, config, has_relative_attention_bias=False):
        """
        Initializes an instance of the Pop2PianoLayerSelfAttention class.

        Args:
            self: The instance of the class.
            config (object): An object containing configuration parameters for the attention layer.
            has_relative_attention_bias (bool, optional):
                Specifies whether the attention layer has relative attention bias. Defaults to False.

        Returns:
            None

        Raises:
            None
        """
        super().__init__()
        self.SelfAttention = Pop2PianoAttention(config, has_relative_attention_bias=has_relative_attention_bias)
        self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
        self.dropout = nn.Dropout(p=config.dropout_rate)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        position_bias=None,
        layer_head_mask=None,
        past_key_value=None,
        use_cache=False,
        output_attentions=False,
    ):
        """
        Runs the self-attention sub-layer of a Pop2Piano block.

        The hidden states are layer-normalized, passed through self-attention, and added back to the input
        through a residual connection with dropout.

        Args:
            self (Pop2PianoLayerSelfAttention): An instance of the Pop2PianoLayerSelfAttention class.
            hidden_states (Tensor): The input hidden states.
            attention_mask (Tensor, optional): An optional mask tensor. Default is None.
            position_bias (Tensor, optional): An optional tensor for position bias. Default is None.
            layer_head_mask (Tensor, optional): An optional tensor for layer head mask. Default is None.
            past_key_value (Tuple[Tensor], optional): An optional tuple of past key and value tensors. Default is None.
            use_cache (bool, optional): A flag indicating whether to use cache. Default is False.
            output_attentions (bool, optional): A flag indicating whether to output attentions. Default is False.

        Returns:
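            tuple: The updated hidden states, followed by the remaining outputs of the attention module
                (for example, the attention weights when output_attentions is True).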

        Raises:
            None
        """
        normed_hidden_states = self.layer_norm(hidden_states)
        attention_output = self.SelfAttention(
            normed_hidden_states,
            mask=attention_mask,
            position_bias=position_bias,
            layer_head_mask=layer_head_mask,
            past_key_value=past_key_value,
            use_cache=use_cache,
            output_attentions=output_attentions,
        )
        hidden_states = hidden_states + self.dropout(attention_output[0])
        outputs = (hidden_states,) + attention_output[1:]  # add attentions if we output them
        return outputs

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerSelfAttention.__init__(config, has_relative_attention_bias=False)

Initializes an instance of the Pop2PianoLayerSelfAttention class.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An object containing configuration parameters for the attention layer.

TYPE: object

has_relative_attention_bias

Specifies whether the attention layer has relative attention bias. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config, has_relative_attention_bias=False):
    """
    Initializes an instance of the Pop2PianoLayerSelfAttention class.

    Args:
        self: The instance of the class.
        config (object): An object containing configuration parameters for the attention layer.
        has_relative_attention_bias (bool, optional):
            Specifies whether the attention layer has relative attention bias. Defaults to False.

    Returns:
        None

    Raises:
        None
    """
    super().__init__()
    self.SelfAttention = Pop2PianoAttention(config, has_relative_attention_bias=has_relative_attention_bias)
    self.layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
    self.dropout = nn.Dropout(p=config.dropout_rate)

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoLayerSelfAttention.forward(hidden_states, attention_mask=None, position_bias=None, layer_head_mask=None, past_key_value=None, use_cache=False, output_attentions=False)

Runs the self-attention sub-layer of a Pop2Piano block.

The hidden states are layer-normalized, passed through self-attention, and added back to the input through a residual connection with dropout.

PARAMETER DESCRIPTION
self

An instance of the Pop2PianoLayerSelfAttention class.

TYPE: Pop2PianoLayerSelfAttention

hidden_states

The input hidden states.

TYPE: Tensor

attention_mask

An optional mask tensor. Default is None.

TYPE: Tensor DEFAULT: None

position_bias

An optional tensor for position bias. Default is None.

TYPE: Tensor DEFAULT: None

layer_head_mask

An optional tensor for layer head mask. Default is None.

TYPE: Tensor DEFAULT: None

past_key_value

An optional tuple of past key and value tensors. Default is None.

TYPE: Tuple[Tensor] DEFAULT: None

use_cache

A flag indicating whether to use cache. Default is False.

TYPE: bool DEFAULT: False

output_attentions

A flag indicating whether to output attentions. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

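tuple: The updated hidden states, followed by the remaining outputs of the attention module (for example, the attention weights when output_attentions is True).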

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    hidden_states,
    attention_mask=None,
    position_bias=None,
    layer_head_mask=None,
    past_key_value=None,
    use_cache=False,
    output_attentions=False,
):
    """
    Runs the self-attention sub-layer of a Pop2Piano block.

    The hidden states are layer-normalized, passed through self-attention, and added back to the input
    through a residual connection with dropout.

    Args:
        self (Pop2PianoLayerSelfAttention): An instance of the Pop2PianoLayerSelfAttention class.
        hidden_states (Tensor): The input hidden states.
        attention_mask (Tensor, optional): An optional mask tensor. Default is None.
        position_bias (Tensor, optional): An optional tensor for position bias. Default is None.
        layer_head_mask (Tensor, optional): An optional tensor for layer head mask. Default is None.
        past_key_value (Tuple[Tensor], optional): An optional tuple of past key and value tensors. Default is None.
        use_cache (bool, optional): A flag indicating whether to use cache. Default is False.
        output_attentions (bool, optional): A flag indicating whether to output attentions. Default is False.

    Returns:
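        tuple: The updated hidden states, followed by the remaining outputs of the attention module
            (for example, the attention weights when output_attentions is True).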

    Raises:
        None
    """
    normed_hidden_states = self.layer_norm(hidden_states)
    attention_output = self.SelfAttention(
        normed_hidden_states,
        mask=attention_mask,
        position_bias=position_bias,
        layer_head_mask=layer_head_mask,
        past_key_value=past_key_value,
        use_cache=use_cache,
        output_attentions=output_attentions,
    )
    hidden_states = hidden_states + self.dropout(attention_output[0])
    outputs = (hidden_states,) + attention_output[1:]  # add attentions if we output them
    return outputs

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
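
The weight initialization in `_init_weights` scales the standard deviation of each projection by the inverse square root of its fan-in. The sketch below only reproduces that arithmetic; the helper name `init_std`, the dictionary keys, and the configuration values (`d_model=512`, `d_kv=64`, `num_heads=8`, `d_ff=2048`) are assumptions for illustration and are not taken from any particular checkpoint.

```python
def init_std(factor, d_model, d_kv, num_heads, d_ff):
    """Standard deviations used for the Normal initializers, per projection."""
    return {
        # The query scale also folds in d_kv so no scaling is needed before softmax.
        "attention.q": factor * (d_model * d_kv) ** -0.5,
        "attention.k": factor * d_model ** -0.5,
        "attention.v": factor * d_model ** -0.5,
        "attention.o": factor * (num_heads * d_kv) ** -0.5,
        "ff.wi": factor * d_model ** -0.5,
        "ff.wo": factor * d_ff ** -0.5,
    }

# Hypothetical configuration values.
stds = init_std(factor=1.0, d_model=512, d_kv=64, num_heads=8, d_ff=2048)
for name, std in stds.items():
    print(f"{name}: {std:.6f}")
```

With these assumed sizes the query projection receives the smallest standard deviation, which is consistent with the source comment about avoiding scaling before the softmax.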

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = Pop2PianoConfig
    base_model_prefix = "transformer"
    is_parallelizable = False
    supports_gradient_checkpointing = True
    _no_split_modules = ["Pop2PianoBlock"]
    _keep_in_fp32_modules = ["wo"]

    def _init_weights(self, module):
        """Initialize the weights"""
        factor = self.config.initializer_factor  # Used for testing weights initialization
        if isinstance(module, Pop2PianoLayerNorm):
            module.weight.data.set_data(initializer(Normal(factor * 1.0), \
                                                    module.weight.data.shape, module.weight.data.dtype))
        elif isinstance(module, Pop2PianoConcatEmbeddingToMel):
            module.embedding.weight.data.set_data(initializer(Normal(factor * 1.0), \
                                                              module.embedding.weight.data.shape, \
                                                              module.embedding.weight.data.dtype))
        elif isinstance(module, Pop2PianoForConditionalGeneration):
            # Mesh TensorFlow embeddings initialization
            # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/layers.py#L1624
            module.shared.weight.data.set_data(initializer(Normal(factor * 1.0), \
                                               module.shared.weight.data.shape, \
                                               module.shared.weight.data.dtype))
            if hasattr(module, "lm_head") and not self.config.tie_word_embeddings:
                module.lm_head.weight.data.set_data(initializer(Normal(factor * 1.0), \
                                                    module.lm_head.weight.data.shape, \
                                                    module.lm_head.weight.data.dtype))
        elif isinstance(module, Pop2PianoDenseActDense):
            # Mesh TensorFlow FF initialization
            # See https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/transformer_layers.py#L56
            # and https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/layers.py#L89
            module.wi.weight.data.set_data(initializer(Normal(factor * ((self.config.d_model) ** -0.5)), \
                                           module.wi.weight.data.shape, \
                                           module.wi.weight.data.dtype))
            if hasattr(module.wi, "bias") and module.wi.bias is not None:
                module.wi.bias.data.set_data(initializer("zero", module.wi.bias.data.shape, \
                                                         module.wi.bias.data.dtype))
            module.wo.weight.data.set_data(initializer(Normal(factor * ((self.config.d_ff) ** -0.5)), \
                                           module.wo.weight.data.shape, \
                                           module.wo.weight.data.dtype))
            if hasattr(module.wo, "bias") and module.wo.bias is not None:
                module.wo.bias.data.set_data(initializer("zero", module.wo.bias.data.shape, \
                                                         module.wo.bias.data.dtype))
        elif isinstance(module, Pop2PianoDenseGatedActDense):
            module.wi_0.weight.data.set_data(initializer(Normal(factor * ((self.config.d_model) ** -0.5)), \
                                             module.wi_0.weight.data.shape, \
                                             module.wi_0.weight.data.dtype))
            if hasattr(module.wi_0, "bias") and module.wi_0.bias is not None:
                module.wi_0.bias.data.set_data(initializer("zero", module.wi_0.bias.data.shape, \
                                                           module.wi_0.bias.data.dtype))
            module.wi_1.weight.data.set_data(initializer(Normal(factor * ((self.config.d_model) ** -0.5)), \
                                             module.wi_1.weight.data.shape, \
                                             module.wi_1.weight.data.dtype))
            if hasattr(module.wi_1, "bias") and module.wi_1.bias is not None:
                module.wi_1.bias.data.set_data(initializer("zero", module.wi_1.bias.data.shape, \
                                               module.wi_1.bias.data.dtype))
            module.wo.weight.data.set_data(initializer(Normal(factor * ((self.config.d_ff) ** -0.5)), \
                                           module.wo.weight.data.shape, \
                                           module.wo.weight.data.dtype))
            if hasattr(module.wo, "bias") and module.wo.bias is not None:
                module.wo.bias.data.set_data(initializer("zero", module.wo.bias.data.shape, \
                                                         module.wo.bias.data.dtype))
        elif isinstance(module, Pop2PianoAttention):
            # Mesh TensorFlow attention initialization to avoid scaling before softmax
            # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/transformer/attention.py#L136
            d_model = self.config.d_model
            key_value_proj_dim = self.config.d_kv
            n_heads = self.config.num_heads
            module.q.weight.data.set_data(initializer(Normal(factor * ((d_model * key_value_proj_dim) ** -0.5)), \
                                          module.q.weight.data.shape, \
                                          module.q.weight.data.dtype))
            module.k.weight.data.set_data(initializer(Normal(factor * (d_model**-0.5)), \
                                          module.k.weight.data.shape, \
                                          module.k.weight.data.dtype))
            module.v.weight.data.set_data(initializer(Normal(factor * (d_model**-0.5)), \
                                          module.v.weight.data.shape, \
                                          module.v.weight.data.dtype))
            module.o.weight.data.set_data(initializer(Normal(factor * ((n_heads * key_value_proj_dim) ** -0.5)), \
                                          module.o.weight.data.shape, \
                                          module.o.weight.data.dtype))
            if module.has_relative_attention_bias:
                module.relative_attention_bias.weight.data.set_data(initializer(Normal(factor * ((d_model) ** -0.5)), \
                                                                    module.relative_attention_bias.weight.data.shape, \
                                                                    module.relative_attention_bias.weight.data.dtype))

    def _shift_right(self, input_ids):
        """
        Shifts the input sequence to the right by one position for decoding in the Pop2PianoPreTrainedModel class.

        Args:
            self (Pop2PianoPreTrainedModel): The instance of the Pop2PianoPreTrainedModel class.
            input_ids (mindspore.Tensor): The input tensor of shape [batch_size, sequence_length] containing the input IDs
                for each token in the sequence.

        Returns:
            mindspore.Tensor: The shifted input tensor of the same shape as input_ids, where the first token in
                each sequence is replaced with the decoder_start_token_id, and subsequent tokens are shifted one
                position to the right.

        Raises:
            ValueError: If self.config.decoder_start_token_id is None.
            ValueError: If self.config.pad_token_id is None.
        """
        decoder_start_token_id = self.config.decoder_start_token_id
        pad_token_id = self.config.pad_token_id

        if decoder_start_token_id is None:
            raise ValueError(
                "self.model.config.decoder_start_token_id has to be defined. In Pop2Piano it is usually set to the pad_token_id."
            )

        # shift inputs to the right
        # if is_torch_fx_proxy(input_ids):
        #     # Item assignment is not supported natively for proxies.
        #     shifted_input_ids = torch.full(input_ids.shape[:-1] + (1,), decoder_start_token_id)
        #     shifted_input_ids = torch.cat([shifted_input_ids, input_ids[..., :-1]], dim=-1)
        # else:
        shifted_input_ids = input_ids.new_zeros(input_ids.shape)
        shifted_input_ids[..., 1:] = input_ids[..., :-1].copy()
        shifted_input_ids[..., 0] = decoder_start_token_id

        if pad_token_id is None:
            raise ValueError("self.model.config.pad_token_id has to be defined.")
        # replace possible -100 values in labels by `pad_token_id`
        shifted_input_ids = shifted_input_ids.masked_fill(shifted_input_ids == -100, pad_token_id)

        return shifted_input_ids
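
The effect of `_shift_right` can be reproduced with a short NumPy sketch. The function below is a standalone illustration rather than the library method; the token IDs and the choice of `0` for both the start and pad token are assumed values.

```python
import numpy as np

def shift_right(input_ids, decoder_start_token_id, pad_token_id):
    """Shift token IDs one position to the right along the last axis."""
    shifted = np.zeros_like(input_ids)
    shifted[..., 1:] = input_ids[..., :-1]
    shifted[..., 0] = decoder_start_token_id
    # Labels use -100 for ignored positions; replace them with the pad token.
    shifted[shifted == -100] = pad_token_id
    return shifted

# Hypothetical labels: two real tokens followed by two ignored positions.
labels = np.array([[5, 6, -100, -100]])
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
print(decoder_input_ids)  # [[0 5 6 0]]
```

The last label is dropped by the shift, and the remaining `-100` entry becomes the pad token, so the decoder never sees the ignore index as an input ID.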

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoStack

Bases: Pop2PianoPreTrainedModel

This class represents a stack of Pop2Piano blocks that can be used for modeling and processing tasks in a Pop2Piano-based architecture. The class inherits from Pop2PianoPreTrainedModel and includes methods for initializing the model, getting and setting input embeddings, and running the forward pass with various input and output options.

The class includes methods for initializing the model with token embeddings, processing input data, and generating model outputs. It also supports features such as caching, attention masks, and output options for hidden states and attentions.

The Pop2PianoStack class is designed to handle multiple layers of Pop2Piano blocks and provides flexibility for customizing model behavior and output based on the input configurations.

For more detailed information on the methods and their parameters, refer to the method docstrings within the class implementation.
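
Two details of the stack are easy to miss in the source listing: only the first block is built with a relative attention bias, and the attention-mask length grows with the key/value cache. The sketch below restates both in plain Python; the helper names and the sizes used are hypothetical.

```python
def blocks_with_relative_bias(num_layers):
    """Mirror of `bool(i == 0)`: only block 0 owns a relative attention bias."""
    return [i == 0 for i in range(num_layers)]

def mask_seq_length(seq_length, past_length=0):
    """Mask length = positions already cached + positions in the current input."""
    return past_length + seq_length

# Hypothetical stack depth.
flags = blocks_with_relative_bias(6)

# Without a cache the mask covers only the current input.
full_pass = mask_seq_length(seq_length=12)

# During incremental decoding one new token attends to 10 cached positions.
incremental = mask_seq_length(seq_length=1, past_length=10)
```

In the stack itself the cached length is read from the shape of the first cached key tensor, `past_key_values[0][0].shape[2]`.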

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
class Pop2PianoStack(Pop2PianoPreTrainedModel):

    """
    This class represents a stack of Pop2Piano blocks that can be used for modeling and processing tasks in a
    Pop2Piano-based architecture. The class inherits from Pop2PianoPreTrainedModel and includes methods for initializing
    the model, setting input embeddings, and running the forward pass with various input and output options.

    The class includes methods for initializing the model with token embeddings, processing input data, and generating
    model outputs. It also supports features such as caching, attention masks, and output options for hidden states and
    attentions.

    The Pop2PianoStack class is designed to handle multiple layers of Pop2Piano blocks and provides flexibility for
    customizing model behavior and output based on the input configurations.

    For more detailed information on the methods and their parameters, refer to the method docstrings within the
    class implementation.
    """
    # Copied from transformers.models.t5.modeling_t5.T5Stack.__init__ with T5->Pop2Piano,t5->pop2piano
    def __init__(self, config, embed_tokens=None):
        """
        Initializes a Pop2PianoStack instance.

        Args:
            self: The instance of the Pop2PianoStack class.
            config:
                A configuration object containing parameters for the model.

                - Type: Any
                - Purpose: Specifies the configuration settings for the model.
            embed_tokens:
                Tokens used for embedding.

                - Type: Any
                - Purpose: Optional tokens for embedding.
                - Restrictions: Default value is None.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        self.embed_tokens = embed_tokens
        self.is_decoder = config.is_decoder

        self.block = nn.ModuleList(
            [Pop2PianoBlock(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
        )
        self.final_layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
        self.dropout = nn.Dropout(p=config.dropout_rate)

        # Initialize weights and apply final processing
        self.post_init()
        # Model parallel
        self.model_parallel = False
        self.device_map = None
        self.gradient_checkpointing = False

    # Copied from transformers.models.t5.modeling_t5.T5Stack.get_input_embeddings
    def get_input_embeddings(self):
        '''
        This method retrieves the input embeddings from the Pop2PianoStack class.

        Args:
            self: Pop2PianoStack instance. The self parameter is the instance of the Pop2PianoStack class.

        Returns:
            embed_tokens: This method returns the embed_tokens attribute of the Pop2PianoStack instance,
                which represents the input embeddings.

        Raises:
            This method does not raise any exceptions.
        '''
        return self.embed_tokens

    # Copied from transformers.models.t5.modeling_t5.T5Stack.set_input_embeddings
    def set_input_embeddings(self, new_embeddings):
        """
        Set the input embeddings for the Pop2PianoStack model.

        Args:
            self (Pop2PianoStack): The instance of the Pop2PianoStack class.
            new_embeddings (object): The new embeddings to be set for input.

        Returns:
            None: This method updates the embed_tokens attribute of the Pop2PianoStack instance.

        Raises:
            None.
        """
        self.embed_tokens = new_embeddings

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        inputs_embeds=None,
        head_mask=None,
        cross_attn_head_mask=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        """
        Runs the forward pass of the Pop2PianoStack with the specified inputs.

        Args:
            self: The instance of the Pop2PianoStack class.
            input_ids (optional): Tensor of shape (batch_size, sequence_length) representing input token IDs.
            attention_mask (optional): Tensor of shape (batch_size, sequence_length) representing attention mask.
            encoder_hidden_states (optional): Tensor representing hidden states from the encoder.
            encoder_attention_mask (optional): Tensor representing the attention mask for encoder_hidden_states.
            inputs_embeds (optional): Tensor representing the input embeddings.
            head_mask (optional): Tensor representing the head mask for self-attention.
            cross_attn_head_mask (optional): Tensor representing the head mask for cross-attention.
            past_key_values (optional): List of past key values for caching.
            use_cache (optional): Boolean indicating whether to use caching.
            output_attentions (optional): Boolean indicating whether to output attentions.
            output_hidden_states (optional): Boolean indicating whether to output hidden states.
            return_dict (optional): Boolean indicating whether to return a dictionary.

        Returns:
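            The final hidden states, optionally with the key/value cache, all hidden states, and attention weights,
                returned as a model output object when return_dict is True and as a tuple otherwise.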

        Raises:
            ValueError: If both input_ids and inputs_embeds are specified simultaneously.
            ValueError: If neither input_ids nor inputs_embeds are specified.
            ValueError: If model is not initialized with valid token embeddings.
            ValueError: If `use_cache` is set to True when model is not used as a decoder.
            Warning: If `use_cache=True` is incompatible with gradient checkpointing.

        Note: Detailed implementation logic is provided in the method's code.
        """
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            err_msg_prefix = "decoder_" if self.is_decoder else ""
            raise ValueError(
                f"You cannot specify both {err_msg_prefix}input_ids and {err_msg_prefix}inputs_embeds at the same time"
            )
        if input_ids is not None:
            input_shape = input_ids.shape
            input_ids = input_ids.view(-1, input_shape[-1])
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.shape[:-1]
        else:
            err_msg_prefix = "decoder_" if self.is_decoder else ""
            raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")

        if inputs_embeds is None:
            if self.embed_tokens is None:
                raise ValueError("You have to initialize the model with valid token embeddings")
            inputs_embeds = self.embed_tokens(input_ids)

        batch_size, seq_length = input_shape

        # required mask seq length can be calculated via length of past
        mask_seq_length = past_key_values[0][0].shape[2] + seq_length if past_key_values is not None else seq_length

        if use_cache is True:
            if not self.is_decoder:
                raise ValueError(f"`use_cache` can only be set to `True` if {self} is used as a decoder")

        if attention_mask is None:
            attention_mask = ops.ones((batch_size, mask_seq_length))
        if self.is_decoder and encoder_attention_mask is None and encoder_hidden_states is not None:
            encoder_seq_length = encoder_hidden_states.shape[1]
            encoder_attention_mask = ops.ones(
                (batch_size, encoder_seq_length), dtype=mindspore.int64
            )

        # initialize past_key_values with `None` if past does not exist
        if past_key_values is None:
            past_key_values = [None] * len(self.block)

        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
        # ourselves in which case we just need to make it broadcastable to all heads.
        extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape)

        # If a 2D or 3D attention mask is provided for the cross-attention
        # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
        if self.is_decoder and encoder_hidden_states is not None:
            encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.shape
            encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
            if encoder_attention_mask is None:
                encoder_attention_mask = ops.ones(encoder_hidden_shape)
            encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
        else:
            encoder_extended_attention_mask = None

        if self.gradient_checkpointing and self.training:
            if use_cache:
                logger.warning_once(
                    "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
                )
                use_cache = False

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.num_layers)
        cross_attn_head_mask = self.get_head_mask(cross_attn_head_mask, self.config.num_layers)
        present_key_value_states = () if use_cache else None
        all_hidden_states = () if output_hidden_states else None
        all_attentions = () if output_attentions else None
        all_cross_attentions = () if (output_attentions and self.is_decoder) else None
        position_bias = None
        encoder_decoder_position_bias = None

        hidden_states = self.dropout(inputs_embeds)

        for i, (layer_module, past_key_value) in enumerate(zip(self.block, past_key_values)):
            layer_head_mask = head_mask[i]
            cross_attn_layer_head_mask = cross_attn_head_mask[i]
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            if self.gradient_checkpointing and self.training:
                layer_outputs = self._gradient_checkpointing_func(
                    layer_module.forward,
                    hidden_states,
                    extended_attention_mask,
                    position_bias,
                    encoder_hidden_states,
                    encoder_extended_attention_mask,
                    encoder_decoder_position_bias,
                    layer_head_mask,
                    cross_attn_layer_head_mask,
                    None,  # past_key_value is always None with gradient checkpointing
                    use_cache,
                    output_attentions,
                )
            else:
                layer_outputs = layer_module(
                    hidden_states,
                    attention_mask=extended_attention_mask,
                    position_bias=position_bias,
                    encoder_hidden_states=encoder_hidden_states,
                    encoder_attention_mask=encoder_extended_attention_mask,
                    encoder_decoder_position_bias=encoder_decoder_position_bias,
                    layer_head_mask=layer_head_mask,
                    cross_attn_layer_head_mask=cross_attn_layer_head_mask,
                    past_key_value=past_key_value,
                    use_cache=use_cache,
                    output_attentions=output_attentions,
                )

            # layer_outputs is a tuple with:
            # hidden-states, key-value-states, (self-attention position bias), (self-attention weights), (cross-attention position bias), (cross-attention weights)
            if use_cache is False:
                layer_outputs = layer_outputs[:1] + (None,) + layer_outputs[1:]

            hidden_states, present_key_value_state = layer_outputs[:2]

            # We share the position biases between the layers - the first layer stores them
            # layer_outputs = hidden-states, key-value-states, (self-attention position bias), (self-attention weights),
            # (cross-attention position bias), (cross-attention weights)
            position_bias = layer_outputs[2]
            if self.is_decoder and encoder_hidden_states is not None:
                encoder_decoder_position_bias = layer_outputs[4 if output_attentions else 3]
            # append next layer key value states
            if use_cache:
                present_key_value_states = present_key_value_states + (present_key_value_state,)

            if output_attentions:
                all_attentions = all_attentions + (layer_outputs[3],)
                if self.is_decoder:
                    all_cross_attentions = all_cross_attentions + (layer_outputs[5],)

        hidden_states = self.final_layer_norm(hidden_states)
        hidden_states = self.dropout(hidden_states)

        # Add last layer
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(
                v
                for v in [
                    hidden_states,
                    present_key_value_states,
                    all_hidden_states,
                    all_attentions,
                    all_cross_attentions,
                ]
                if v is not None
            )
        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=hidden_states,
            past_key_values=present_key_value_states,
            hidden_states=all_hidden_states,
            attentions=all_attentions,
            cross_attentions=all_cross_attentions,
        )
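As a rough illustration of the cache handling in forward, the sketch below reproduces the mask-length arithmetic in plain Python. It assumes each cached key tensor has shape (batch_size, n_heads, past_length, head_dim), which is why index 2 is read as the cached length. The helper names `required_mask_length` and `default_attention_mask`, and the use of shape tuples and nested lists in place of tensors, are hypothetical and exist only for this example.

```python
def required_mask_length(seq_length, past_key_shape=None):
    """Length the self-attention mask must cover for the current step."""
    if past_key_shape is None:
        # No cache: the mask only spans the tokens passed in this call.
        return seq_length
    # With a cache: cached positions plus the new tokens.
    return past_key_shape[2] + seq_length


def default_attention_mask(batch_size, mask_seq_length):
    """Stand-in for ops.ones((batch_size, mask_seq_length)): attend everywhere."""
    return [[1] * mask_seq_length for _ in range(batch_size)]


# First decoding step: five tokens, nothing cached yet.
first_step = required_mask_length(seq_length=5)

# Later step: one new token, five positions already cached.
later_step = required_mask_length(seq_length=1, past_key_shape=(2, 8, 5, 64))

mask = default_attention_mask(batch_size=2, mask_seq_length=later_step)
```

The point of the sketch is that the default mask grows with the cache, so a caller who supplies their own attention_mask during incremental decoding would need to size it the same way.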

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoStack.__init__(config, embed_tokens=None)

Initializes a Pop2PianoStack instance.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoStack class.

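config

Configuration object for the stack. The constructor reads is_decoder, num_layers, d_model, layer_norm_epsilon and dropout_rate from it.

  • Type: Pop2PianoConfig
  • Purpose: Specifies the configuration settings for the model.

embed_tokens

Embedding layer applied to input_ids when inputs_embeds is not passed to forward.

  • Type: embedding module (for example nn.Embedding), optional
  • Purpose: Maps token IDs to input embeddings.
  • Restrictions: If left as None, forward must receive inputs_embeds, or the layer must be set later with set_input_embeddings.

DEFAULT: None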

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def __init__(self, config, embed_tokens=None):
    """
    Initializes a Pop2PianoStack instance.

    Args:
        self: The instance of the Pop2PianoStack class.
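        config:
            Configuration object for the stack. The constructor reads is_decoder, num_layers, d_model,
            layer_norm_epsilon and dropout_rate from it.

            - Type: Pop2PianoConfig
            - Purpose: Specifies the configuration settings for the model.
        embed_tokens:
            Embedding layer applied to input_ids when inputs_embeds is not passed to forward.

            - Type: embedding module (for example nn.Embedding), optional
            - Purpose: Maps token IDs to input embeddings.
            - Restrictions: Default value is None.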

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    self.embed_tokens = embed_tokens
    self.is_decoder = config.is_decoder

    self.block = nn.ModuleList(
        [Pop2PianoBlock(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
    )
    self.final_layer_norm = Pop2PianoLayerNorm(config.d_model, eps=config.layer_norm_epsilon)
    self.dropout = nn.Dropout(p=config.dropout_rate)

    # Initialize weights and apply final processing
    self.post_init()
    # Model parallel
    self.model_parallel = False
    self.device_map = None
    self.gradient_checkpointing = False
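The constructor above passes has_relative_attention_bias=True only to the first block, and forward then threads the position bias returned by one block into the next. The following is a minimal sketch of that sharing pattern in plain Python. `ToyBlock` is a hypothetical stand-in for Pop2PianoBlock, the string "shared-bias" stands in for a real bias tensor, and none of this is the library's implementation.

```python
class ToyBlock:
    """Hypothetical stand-in for a transformer block."""

    def __init__(self, has_relative_attention_bias):
        self.has_relative_attention_bias = has_relative_attention_bias
        self.bias_computed = False

    def __call__(self, hidden, position_bias=None):
        if position_bias is None:
            # Only reached when no earlier block has produced a bias.
            self.bias_computed = True
            position_bias = "shared-bias"
        # Pretend to transform the hidden state, and hand the bias onward.
        return hidden + 1, position_bias


num_layers = 3
blocks = [ToyBlock(has_relative_attention_bias=(i == 0)) for i in range(num_layers)]
flags = [block.has_relative_attention_bias for block in blocks]

hidden, position_bias = 0, None
for block in blocks:
    hidden, position_bias = block(hidden, position_bias)

computed = [block.bias_computed for block in blocks]
```

In this toy run only the first block computes a bias, and every later block reuses it, which mirrors the "first layer stores them" comment in the forward source.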

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoStack.forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None, inputs_embeds=None, head_mask=None, cross_attn_head_mask=None, past_key_values=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Runs the forward pass of the Pop2PianoStack model on the given inputs.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoStack class.

input_ids

Tensor of shape (batch_size, sequence_length) representing input token IDs.

TYPE: optional DEFAULT: None

attention_mask

Tensor of shape (batch_size, sequence_length) representing attention mask.

TYPE: optional DEFAULT: None

encoder_hidden_states

Tensor representing hidden states from the encoder.

TYPE: optional DEFAULT: None

encoder_attention_mask

Tensor representing the attention mask for encoder_hidden_states.

TYPE: optional DEFAULT: None

inputs_embeds

Tensor representing the input embeddings.

TYPE: optional DEFAULT: None

head_mask

Tensor representing the head mask for self-attention.

TYPE: optional DEFAULT: None

cross_attn_head_mask

Tensor representing the head mask for cross-attention.

TYPE: optional DEFAULT: None

past_key_values

List of past key values for caching.

TYPE: optional DEFAULT: None

use_cache

Boolean indicating whether to use caching.

TYPE: optional DEFAULT: None

output_attentions

Boolean indicating whether to output attentions.

TYPE: optional DEFAULT: None

output_hidden_states

Boolean indicating whether to output hidden states.

TYPE: optional DEFAULT: None

return_dict

Boolean indicating whether to return a dictionary.

TYPE: optional DEFAULT: None

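RETURNS DESCRIPTION

BaseModelOutputWithPastAndCrossAttentions if return_dict is True; otherwise a tuple that contains only the outputs that are not None.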

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified simultaneously.

ValueError

If neither input_ids nor inputs_embeds are specified.

ValueError

If model is not initialized with valid token embeddings.

ValueError

If use_cache is set to True when model is not used as a decoder.

Note

No exception is raised when use_cache=True is combined with gradient checkpointing during training. A warning is logged once and use_cache is set to False.
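The first two ValueError cases come from a mutual-exclusion check on input_ids and inputs_embeds. The sketch below mirrors that check using shape tuples instead of tensors; the function name `resolve_input_shape` and its shape-based parameters are hypothetical and chosen only to keep the example free of framework dependencies.

```python
def resolve_input_shape(input_ids_shape=None, inputs_embeds_shape=None, is_decoder=False):
    """Return (batch_size, seq_length) or raise, following the checks in forward."""
    prefix = "decoder_" if is_decoder else ""
    if input_ids_shape is not None and inputs_embeds_shape is not None:
        raise ValueError(
            f"You cannot specify both {prefix}input_ids and {prefix}inputs_embeds at the same time"
        )
    if input_ids_shape is not None:
        return tuple(input_ids_shape)
    if inputs_embeds_shape is not None:
        # Drop the trailing embedding dimension.
        return tuple(inputs_embeds_shape[:-1])
    raise ValueError(f"You have to specify either {prefix}input_ids or {prefix}inputs_embeds")


from_ids = resolve_input_shape(input_ids_shape=(2, 5))
from_embeds = resolve_input_shape(inputs_embeds_shape=(2, 5, 512))

try:
    resolve_input_shape(is_decoder=True)
    neither_message = None
except ValueError as err:
    neither_message = str(err)
```

Note that the error messages gain a "decoder_" prefix when the stack is configured as a decoder, which can help when tracing the failure back to the arguments of the enclosing model.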

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    encoder_hidden_states=None,
    encoder_attention_mask=None,
    inputs_embeds=None,
    head_mask=None,
    cross_attn_head_mask=None,
    past_key_values=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
):
    """
    Runs the forward pass of the Pop2PianoStack model on the given inputs.

    Args:
        self: The instance of the Pop2PianoStack class.
        input_ids (optional): Tensor of shape (batch_size, sequence_length) representing input token IDs.
        attention_mask (optional): Tensor of shape (batch_size, sequence_length) representing attention mask.
        encoder_hidden_states (optional): Tensor representing hidden states from the encoder.
        encoder_attention_mask (optional): Tensor representing the attention mask for encoder_hidden_states.
        inputs_embeds (optional): Tensor representing the input embeddings.
        head_mask (optional): Tensor representing the head mask for self-attention.
        cross_attn_head_mask (optional): Tensor representing the head mask for cross-attention.
        past_key_values (optional): List of past key values for caching.
        use_cache (optional): Boolean indicating whether to use caching.
        output_attentions (optional): Boolean indicating whether to output attentions.
        output_hidden_states (optional): Boolean indicating whether to output hidden states.
        return_dict (optional): Boolean indicating whether to return a dictionary.

    Returns:
        BaseModelOutputWithPastAndCrossAttentions or tuple: The stack outputs. When `return_dict` is False,
            a plain tuple is returned that contains only the entries that are not None.

    Raises:
        ValueError: If both input_ids and inputs_embeds are specified simultaneously.
        ValueError: If neither input_ids nor inputs_embeds are specified.
        ValueError: If model is not initialized with valid token embeddings.
        ValueError: If `use_cache` is set to True when model is not used as a decoder.

    Note: No exception is raised when `use_cache=True` is combined with gradient checkpointing during
        training. A warning is logged once and `use_cache` is set to False.
    """
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if input_ids is not None and inputs_embeds is not None:
        err_msg_prefix = "decoder_" if self.is_decoder else ""
        raise ValueError(
            f"You cannot specify both {err_msg_prefix}input_ids and {err_msg_prefix}inputs_embeds at the same time"
        )
    if input_ids is not None:
        input_shape = input_ids.shape
        input_ids = input_ids.view(-1, input_shape[-1])
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.shape[:-1]
    else:
        err_msg_prefix = "decoder_" if self.is_decoder else ""
        raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")

    if inputs_embeds is None:
        if self.embed_tokens is None:
            raise ValueError("You have to initialize the model with valid token embeddings")
        inputs_embeds = self.embed_tokens(input_ids)

    batch_size, seq_length = input_shape

    # required mask seq length can be calculated via length of past
    mask_seq_length = past_key_values[0][0].shape[2] + seq_length if past_key_values is not None else seq_length

    if use_cache is True:
        if not self.is_decoder:
            raise ValueError(f"`use_cache` can only be set to `True` if {self} is used as a decoder")

    if attention_mask is None:
        attention_mask = ops.ones((batch_size, mask_seq_length))
    if self.is_decoder and encoder_attention_mask is None and encoder_hidden_states is not None:
        encoder_seq_length = encoder_hidden_states.shape[1]
        encoder_attention_mask = ops.ones(
            (batch_size, encoder_seq_length), dtype=mindspore.int64
        )

    # initialize past_key_values with `None` if past does not exist
    if past_key_values is None:
        past_key_values = [None] * len(self.block)

    # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
    # ourselves in which case we just need to make it broadcastable to all heads.
    extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape)

    # If a 2D or 3D attention mask is provided for the cross-attention
    # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
    if self.is_decoder and encoder_hidden_states is not None:
        encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.shape
        encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
        if encoder_attention_mask is None:
            encoder_attention_mask = ops.ones(encoder_hidden_shape)
        encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
    else:
        encoder_extended_attention_mask = None

    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning_once(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
            use_cache = False

    # Prepare head mask if needed
    head_mask = self.get_head_mask(head_mask, self.config.num_layers)
    cross_attn_head_mask = self.get_head_mask(cross_attn_head_mask, self.config.num_layers)
    present_key_value_states = () if use_cache else None
    all_hidden_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None
    all_cross_attentions = () if (output_attentions and self.is_decoder) else None
    position_bias = None
    encoder_decoder_position_bias = None

    hidden_states = self.dropout(inputs_embeds)

    for i, (layer_module, past_key_value) in enumerate(zip(self.block, past_key_values)):
        layer_head_mask = head_mask[i]
        cross_attn_layer_head_mask = cross_attn_head_mask[i]
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if self.gradient_checkpointing and self.training:
            layer_outputs = self._gradient_checkpointing_func(
                layer_module.forward,
                hidden_states,
                extended_attention_mask,
                position_bias,
                encoder_hidden_states,
                encoder_extended_attention_mask,
                encoder_decoder_position_bias,
                layer_head_mask,
                cross_attn_layer_head_mask,
                None,  # past_key_value is always None with gradient checkpointing
                use_cache,
                output_attentions,
            )
        else:
            layer_outputs = layer_module(
                hidden_states,
                attention_mask=extended_attention_mask,
                position_bias=position_bias,
                encoder_hidden_states=encoder_hidden_states,
                encoder_attention_mask=encoder_extended_attention_mask,
                encoder_decoder_position_bias=encoder_decoder_position_bias,
                layer_head_mask=layer_head_mask,
                cross_attn_layer_head_mask=cross_attn_layer_head_mask,
                past_key_value=past_key_value,
                use_cache=use_cache,
                output_attentions=output_attentions,
            )

        # layer_outputs is a tuple with:
        # hidden-states, key-value-states, (self-attention position bias), (self-attention weights), (cross-attention position bias), (cross-attention weights)
        if use_cache is False:
            layer_outputs = layer_outputs[:1] + (None,) + layer_outputs[1:]

        hidden_states, present_key_value_state = layer_outputs[:2]

        # We share the position biases between the layers - the first layer stores them
        # layer_outputs = hidden-states, key-value-states, (self-attention position bias), (self-attention weights),
        # (cross-attention position bias), (cross-attention weights)
        position_bias = layer_outputs[2]
        if self.is_decoder and encoder_hidden_states is not None:
            encoder_decoder_position_bias = layer_outputs[4 if output_attentions else 3]
        # append next layer key value states
        if use_cache:
            present_key_value_states = present_key_value_states + (present_key_value_state,)

        if output_attentions:
            all_attentions = all_attentions + (layer_outputs[3],)
            if self.is_decoder:
                all_cross_attentions = all_cross_attentions + (layer_outputs[5],)

    hidden_states = self.final_layer_norm(hidden_states)
    hidden_states = self.dropout(hidden_states)

    # Add last layer
    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(
            v
            for v in [
                hidden_states,
                present_key_value_states,
                all_hidden_states,
                all_attentions,
                all_cross_attentions,
            ]
            if v is not None
        )
    return BaseModelOutputWithPastAndCrossAttentions(
        last_hidden_state=hidden_states,
        past_key_values=present_key_value_states,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
        cross_attentions=all_cross_attentions,
    )
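Two details of the listing above are easy to miss: a None placeholder is inserted into layer_outputs when use_cache is False so that later indices stay fixed, and the non-dict return value drops every None entry. The sketch below reproduces both steps with plain tuples. The helper names `normalize_layer_outputs` and `pack_outputs` and the string placeholders are hypothetical, used here only for illustration.

```python
def normalize_layer_outputs(layer_outputs, use_cache):
    """Insert a None key-value slot when the block returned no cache entry."""
    if use_cache is False:
        layer_outputs = layer_outputs[:1] + (None,) + layer_outputs[1:]
    return layer_outputs


def pack_outputs(hidden_states, present_key_value_states=None, all_hidden_states=None,
                 all_attentions=None, all_cross_attentions=None):
    """Tuple-style return: keep only the entries that are not None."""
    candidates = [
        hidden_states,
        present_key_value_states,
        all_hidden_states,
        all_attentions,
        all_cross_attentions,
    ]
    return tuple(v for v in candidates if v is not None)


# A decoder block called with use_cache=False and output_attentions=True.
raw = ("hidden", "self_bias", "self_attn", "cross_bias", "cross_attn")
normalized = normalize_layer_outputs(raw, use_cache=False)

# Only attentions requested, so the tuple has two entries.
packed = pack_outputs("hidden", all_attentions=("attn_layer_0",))
```

Because None entries are dropped, the position of a given output in the returned tuple depends on which flags were enabled. Passing return_dict=True avoids that ambiguity by giving each output a named field.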

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoStack.get_input_embeddings()

This method retrieves the input embeddings from the Pop2PianoStack class.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoStack class.

RETURNS DESCRIPTION
embed_tokens

This method returns the embed_tokens attribute of the Pop2PianoStack instance, which represents the input embeddings.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def get_input_embeddings(self):
    '''
    This method retrieves the input embeddings from the Pop2PianoStack class.

    Args:
        self: The instance of the Pop2PianoStack class.

    Returns:
        embed_tokens: This method returns the embed_tokens attribute of the Pop2PianoStack instance,
            which represents the input embeddings.

    Raises:
        This method does not raise any exceptions.
    '''
    return self.embed_tokens

mindnlp.transformers.models.pop2piano.modeling_pop2piano.Pop2PianoStack.set_input_embeddings(new_embeddings)

Set the input embeddings for the Pop2PianoStack model.

PARAMETER DESCRIPTION
self

The instance of the Pop2PianoStack class.

TYPE: Pop2PianoStack

new_embeddings

The new embeddings to be set for input.

TYPE: object

RETURNS DESCRIPTION
None

This method updates the embed_tokens attribute of the Pop2PianoStack instance.

Source code in mindnlp/transformers/models/pop2piano/modeling_pop2piano.py
def set_input_embeddings(self, new_embeddings):
    """
    Set the input embeddings for the Pop2PianoStack model.

    Args:
        self (Pop2PianoStack): The instance of the Pop2PianoStack class.
        new_embeddings (object): The new embeddings to be set for input.

    Returns:
        None: This method updates the embed_tokens attribute of the Pop2PianoStack instance.

    Raises:
        None.
    """
    self.embed_tokens = new_embeddings
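The two accessors matter mainly because forward raises a ValueError when it receives input_ids while embed_tokens is still None. The sketch below shows that interaction with a toy class. `ToyStack`, its `embed` method, and the dictionary used as an embedding table are hypothetical stand-ins and do not reflect the real tensor types.

```python
class ToyStack:
    """Hypothetical stand-in that keeps only the embedding accessors."""

    def __init__(self, embed_tokens=None):
        self.embed_tokens = embed_tokens

    def get_input_embeddings(self):
        return self.embed_tokens

    def set_input_embeddings(self, new_embeddings):
        self.embed_tokens = new_embeddings

    def embed(self, input_ids):
        # Mirrors the guard in forward before token IDs are embedded.
        if self.embed_tokens is None:
            raise ValueError("You have to initialize the model with valid token embeddings")
        return [self.embed_tokens[token_id] for token_id in input_ids]


stack = ToyStack()  # built without embeddings, like embed_tokens=None

try:
    stack.embed([0, 1])
    failed_before_set = False
except ValueError:
    failed_before_set = True

table = {0: [0.0, 0.0], 1: [0.5, -0.5]}  # toy embedding table
stack.set_input_embeddings(table)
vectors = stack.embed([1, 0])
```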

mindnlp.transformers.models.pop2piano.tokenization_pop2piano

Tokenization class for Pop2Piano.

mindnlp.transformers.models.pop2piano.tokenization_pop2piano.Pop2PianoTokenizer

Bases: PreTrainedTokenizer

Constructs a Pop2Piano tokenizer. This tokenizer does not require training.

This tokenizer inherits from PreTrainedTokenizer, which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

PARAMETER DESCRIPTION
vocab

Path to the vocab file which contains the vocabulary.

TYPE: `str`

default_velocity

Determines the default velocity to be used while creating midi Notes.

TYPE: `int`, *optional* DEFAULT: 77

num_bars

Determines cutoff_time_idx for each token.

TYPE: `int`, *optional* DEFAULT: 2
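The class source shows that the vocab file is loaded as a JSON mapping whose keys combine a value and a token type in the form "value_TOKENTYPE", and that unknown entries fall back to the unk token "-1". The sketch below imitates the two private conversion methods on a tiny in-memory vocabulary. The vocabulary entries, the IDs assigned to them, and the function names `token_to_id` and `id_to_token` are made up for illustration and are not taken from a real Pop2Piano vocab file.

```python
# Hypothetical miniature vocabulary; a real vocab file is much larger.
encoder = {
    "0_TOKEN_SPECIAL": 0,
    "1_TOKEN_SPECIAL": 1,
    "60_TOKEN_NOTE": 2,
    "77_TOKEN_VELOCITY": 3,
    "4_TOKEN_TIME": 4,
}
decoder = {token_id: key for key, token_id in encoder.items()}
UNK_TOKEN = "-1"


def token_to_id(token, token_type="TOKEN_TIME"):
    """Look up '<value>_<type>'; unknown pairs map to int(UNK_TOKEN)."""
    return encoder.get(f"{token}_{token_type}", int(UNK_TOKEN))


def id_to_token(token_id):
    """Split a vocab key back into [token_type, value]."""
    parts = decoder.get(token_id, f"{UNK_TOKEN}_TOKEN_TIME").split("_")
    return ["_".join(parts[1:]), int(parts[0])]


note_id = token_to_id(60, "TOKEN_NOTE")
missing_id = token_to_id(999)
note = id_to_token(note_id)
unknown = id_to_token(12345)
```

Splitting on the first underscore and re-joining the rest is what allows token type names such as TOKEN_NOTE to contain underscores themselves.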

Source code in mindnlp/transformers/models/pop2piano/tokenization_pop2piano.py
class Pop2PianoTokenizer(PreTrainedTokenizer):
    """
    Constructs a Pop2Piano tokenizer. This tokenizer does not require training.

    This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to
    this superclass for more information regarding those methods.

    Args:
        vocab (`str`):
            Path to the vocab file which contains the vocabulary.
        default_velocity (`int`, *optional*, defaults to 77):
            Determines the default velocity to be used while creating midi Notes.
        num_bars (`int`, *optional*, defaults to 2):
            Determines cutoff_time_idx for each token.
    """
    model_input_names = ["token_ids", "attention_mask"]
    vocab_files_names = VOCAB_FILES_NAMES
    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP

    def __init__(
        self,
        vocab,
        default_velocity=77,
        num_bars=2,
        unk_token="-1",
        eos_token="1",
        pad_token="0",
        bos_token="2",
        **kwargs,
    ):
        """
        This method initializes an instance of the Pop2PianoTokenizer class.

        Args:
            self: The instance of the Pop2PianoTokenizer class.
            vocab (str): The path to the vocabulary file.
            default_velocity (int): The default velocity for the tokenizer, default value is 77.
            num_bars (int): The number of bars.
            unk_token (str or AddedToken): The unknown token for the tokenizer.
                If str, it will be converted to an AddedToken.
            eos_token (str or AddedToken): The end-of-sequence token for the tokenizer.
                If str, it will be converted to an AddedToken.
            pad_token (str or AddedToken): The padding token for the tokenizer.
                If str, it will be converted to an AddedToken.
            bos_token (str or AddedToken): The beginning-of-sequence token for the tokenizer.
                If str, it will be converted to an AddedToken.

        Returns:
            None.

        Raises:
            FileNotFoundError: If the 'vocab' file is not found.
            JSONDecodeError: If there is an error decoding the JSON data from the 'vocab' file.
        """
        unk_token = AddedToken(unk_token, lstrip=False, rstrip=False) if isinstance(unk_token, str) else unk_token
        eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
        pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
        bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token

        self.default_velocity = default_velocity
        self.num_bars = num_bars

        # Load the vocab
        with open(vocab, "rb") as file:
            self.encoder = json.load(file)

        # create mappings for encoder
        self.decoder = {v: k for k, v in self.encoder.items()}

        super().__init__(
            unk_token=unk_token,
            eos_token=eos_token,
            pad_token=pad_token,
            bos_token=bos_token,
            **kwargs,
        )

    @property
    def vocab_size(self):
        """Returns the vocabulary size of the tokenizer."""
        return len(self.encoder)

    def get_vocab(self):
        """Returns the vocabulary of the tokenizer"""
        return dict(self.encoder, **self.added_tokens_encoder)

    def _convert_id_to_token(self, token_id: int) -> list:
        """
        Decodes the token ids generated by the transformer into notes.

        Args:
            token_id (`int`):
                This denotes the ids generated by the transformers to be converted to Midi tokens.

        Returns:
            `List`: A list consisting of token_type (`str`) and value (`int`).
        """
        token_type_value = self.decoder.get(token_id, f"{self.unk_token}_TOKEN_TIME")
        token_type_value = token_type_value.split("_")
        token_type, value = "_".join(token_type_value[1:]), int(token_type_value[0])

        return [token_type, value]

    def _convert_token_to_id(self, token, token_type="TOKEN_TIME") -> int:
        """
        Encodes the Midi tokens to transformer generated token ids.

        Args:
            token (`int`):
                This denotes the token value.
            token_type (`str`):
                This denotes the type of the token. There are four types of midi tokens such as "TOKEN_TIME",
                "TOKEN_VELOCITY", "TOKEN_NOTE" and "TOKEN_SPECIAL".

        Returns:
            `int`: returns the id of the token.
        """
        return self.encoder.get(f"{token}_{token_type}", int(self.unk_token))

    def relative_batch_tokens_ids_to_notes(
        self,
        tokens: np.ndarray,
        beat_offset_idx: int,
        bars_per_batch: int,
        cutoff_time_idx: int,
    ):
        """
        Converts relative tokens to notes which are then used to generate pretty midi object.

        Args:
            tokens (`numpy.ndarray`):
                Tokens to be converted to notes.
            beat_offset_idx (`int`):
                Denotes beat offset index for each note in generated Midi.
            bars_per_batch (`int`):
                Number of bars covered by each batch item; item `index` starts at
                `beat_offset_idx + index * bars_per_batch * 4`.
            cutoff_time_idx (`int`):
                Denotes the cutoff time index for each note in generated Midi.
        """
        notes = None

        for index in range(len(tokens)):
            _tokens = tokens[index]
            _start_idx = beat_offset_idx + index * bars_per_batch * 4
            _cutoff_time_idx = cutoff_time_idx + _start_idx
            _notes = self.relative_tokens_ids_to_notes(
                _tokens,
                start_idx=_start_idx,
                cutoff_time_idx=_cutoff_time_idx,
            )

            if len(_notes) == 0:
                pass
            elif notes is None:
                notes = _notes
            else:
                notes = np.concatenate((notes, _notes), axis=0)

        if notes is None:
            return []
        return notes

    def relative_batch_tokens_ids_to_midi(
        self,
        tokens: np.ndarray,
        beatstep: np.ndarray,
        beat_offset_idx: int = 0,
        bars_per_batch: int = 2,
        cutoff_time_idx: int = 12,
    ):
        """
        Converts tokens to Midi. This method calls `relative_batch_tokens_ids_to_notes` method to convert batch tokens
        to notes then uses `notes_to_midi` method to convert them to Midi.

        Args:
            tokens (`numpy.ndarray`):
                Denotes tokens which alongside beatstep will be converted to Midi.
            beatstep (`np.ndarray`):
                The beatstep array obtained from the feature extractor, used to convert note indices to times.
            beat_offset_idx (`int`, *optional*, defaults to 0):
                Denotes beat offset index for each note in generated Midi.
            bars_per_batch (`int`, *optional*, defaults to 2):
                Number of bars covered by each batch item; it sets the beat spacing between consecutive items.
            cutoff_time_idx (`int`, *optional*, defaults to 12):
                Denotes the cutoff time index for each note in generated Midi.
        """
        beat_offset_idx = 0 if beat_offset_idx is None else beat_offset_idx
        notes = self.relative_batch_tokens_ids_to_notes(
            tokens=tokens,
            beat_offset_idx=beat_offset_idx,
            bars_per_batch=bars_per_batch,
            cutoff_time_idx=cutoff_time_idx,
        )
        midi = self.notes_to_midi(notes, beatstep, offset_sec=beatstep[beat_offset_idx])
        return midi

    # Taken from the original code
    # Please see https://github.com/sweetcocoa/pop2piano/blob/fac11e8dcfc73487513f4588e8d0c22a22f2fdc5/midi_tokenizer.py#L257
    def relative_tokens_ids_to_notes(self, tokens: np.ndarray, start_idx: float, cutoff_time_idx: float = None):
        """
        Converts relative tokens to notes which will then be used to create Pretty Midi objects.

        Args:
            tokens (`numpy.ndarray`):
                Relative Tokens which will be converted to notes.
            start_idx (`float`):
                A parameter which denotes the starting index.
            cutoff_time_idx (`float`, *optional*):
                A parameter used while converting tokens to notes.
        """
        words = [self._convert_id_to_token(token) for token in tokens]

        current_idx = start_idx
        current_velocity = 0
        note_onsets_ready = [None for i in range(sum(k.endswith("NOTE") for k in self.encoder.keys()) + 1)]
        notes = []
        for token_type, number in words:
            if token_type == "TOKEN_SPECIAL":
                if number == 1:
                    break
            elif token_type == "TOKEN_TIME":
                current_idx = token_time_to_note(
                    number=number, cutoff_time_idx=cutoff_time_idx, current_idx=current_idx
                )
            elif token_type == "TOKEN_VELOCITY":
                current_velocity = number

            elif token_type == "TOKEN_NOTE":
                notes = token_note_to_note(
                    number=number,
                    current_velocity=current_velocity,
                    default_velocity=self.default_velocity,
                    note_onsets_ready=note_onsets_ready,
                    current_idx=current_idx,
                    notes=notes,
                )
            else:
                raise ValueError("Token type not understood!")

        for pitch, note_onset in enumerate(note_onsets_ready):
            # force offset if no offset for each pitch
            if note_onset is not None:
                if cutoff_time_idx is None:
                    cutoff = note_onset + 1
                else:
                    cutoff = max(cutoff_time_idx, note_onset + 1)

                offset_idx = max(current_idx, cutoff)
                notes.append([note_onset, offset_idx, pitch, self.default_velocity])

        if len(notes) == 0:
            return []

        notes = np.array(notes)
        note_order = notes[:, 0] * 128 + notes[:, 1]
        notes = notes[note_order.argsort()]
        return notes

    def notes_to_midi(self, notes: np.ndarray, beatstep: np.ndarray, offset_sec: float = 0.0):
        """
        Converts notes to Midi.

        Args:
            notes (`numpy.ndarray`):
                Array of `[onset_idx, offset_idx, pitch, velocity]` rows used to create the Pretty Midi notes.
            beatstep (`numpy.ndarray`):
                This is the extrapolated beatstep that we get from the feature extractor.
            offset_sec (`float`, *optional*, defaults to 0.0):
                Offset in seconds subtracted from the start and end time of each Pretty Midi Note.
        """
        requires_backends(self, ["pretty_midi"])

        new_pm = pretty_midi.PrettyMIDI(resolution=384, initial_tempo=120.0)
        new_inst = pretty_midi.Instrument(program=0)
        new_notes = []

        for onset_idx, offset_idx, pitch, velocity in notes:
            new_note = pretty_midi.Note(
                velocity=velocity,
                pitch=pitch,
                start=beatstep[onset_idx] - offset_sec,
                end=beatstep[offset_idx] - offset_sec,
            )
            new_notes.append(new_note)
        new_inst.notes = new_notes
        new_pm.instruments.append(new_inst)
        new_pm.remove_invalid_notes()
        return new_pm

    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
        """
        Saves the tokenizer's vocabulary dictionary to the provided save_directory.

        Args:
            save_directory (`str`):
                Path to an existing directory in which to save the vocabulary. If it is not a directory, an error
                is logged and `None` is returned.
            filename_prefix (`Optional[str]`, *optional*):
                A prefix to add to the names of the files saved by the tokenizer.
        """
        if not os.path.isdir(save_directory):
            logger.error(f"Vocabulary path ({save_directory}) should be a directory.")
            return None

        # Save the encoder.
        out_vocab_file = os.path.join(
            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab"]
        )
        with open(out_vocab_file, "w") as file:
            file.write(json.dumps(self.encoder))

        return (out_vocab_file,)

    def encode_plus(
        self,
        notes: Union[np.ndarray, List["pretty_midi.Note"]],
        truncation_strategy: Optional[TruncationStrategy] = None,
        max_length: Optional[int] = None,
        **kwargs,
    ) -> BatchEncoding:
        r"""
        This is the `encode_plus` method for `Pop2PianoTokenizer`. It converts the midi notes to the transformer
        generated token ids. It only works on a single batch; to process multiple batches, use
        `batch_encode_plus` or the `__call__` method.

        Args:
            notes (`numpy.ndarray` of shape `[sequence_length, 4]` or `list` of `pretty_midi.Note` objects):
                This represents the midi notes. If `notes` is a `numpy.ndarray`:

                - Each sequence must have 4 values, they are `onset idx`, `offset idx`, `pitch` and `velocity`.

                If `notes` is a `list` containing `pretty_midi.Note` objects:

                - Each sequence must have 4 attributes, they are `start`, `end`, `pitch` and `velocity`.
            truncation_strategy ([`~tokenization_utils_base.TruncationStrategy`], *optional*):
                Indicates the truncation strategy that is going to be used during truncation.
            max_length (`int`, *optional*):
                Maximum number of token ids to return. Longer sequences are truncated according to
                `truncation_strategy`.

        Returns:
            `BatchEncoding` containing the tokens ids.
        """
        requires_backends(self, ["pretty_midi"])

        # check if notes is a pretty_midi object or not, if yes then extract the attributes and put them into a numpy
        # array.
        if isinstance(notes[0], pretty_midi.Note):
            notes = np.array(
                [[each_note.start, each_note.end, each_note.pitch, each_note.velocity] for each_note in notes]
            ).reshape(-1, 4)

        # round all values to the nearest integer.
        notes = np.round(notes).astype(np.int32)
        max_time_idx = notes[:, :2].max()

        times = [[] for i in range((max_time_idx + 1))]
        for onset, offset, pitch, velocity in notes:
            times[onset].append([pitch, velocity])
            times[offset].append([pitch, 0])

        tokens = []
        current_velocity = 0
        for i, time in enumerate(times):
            if len(time) == 0:
                continue
            tokens.append(self._convert_token_to_id(i, "TOKEN_TIME"))
            for pitch, velocity in time:
                velocity = int(velocity > 0)
                if current_velocity != velocity:
                    current_velocity = velocity
                    tokens.append(self._convert_token_to_id(velocity, "TOKEN_VELOCITY"))
                tokens.append(self._convert_token_to_id(pitch, "TOKEN_NOTE"))

        total_len = len(tokens)

        # truncation
        if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE and max_length and total_len > max_length:
            tokens, _, _ = self.truncate_sequences(
                ids=tokens,
                num_tokens_to_remove=total_len - max_length,
                truncation_strategy=truncation_strategy,
                **kwargs,
            )

        return BatchEncoding({"token_ids": tokens})

    def batch_encode_plus(
        self,
        notes: Union[np.ndarray, List["pretty_midi.Note"]],
        truncation_strategy: Optional[TruncationStrategy] = None,
        max_length: Optional[int] = None,
        **kwargs,
    ) -> BatchEncoding:
        r"""
        This is the `batch_encode_plus` method for `Pop2PianoTokenizer`. It converts the midi notes to the transformer
        generated token ids. It works on multiple batches by calling `encode_plus` multiple times in a loop.

        Args:
            notes (`numpy.ndarray` of shape `[batch_size, sequence_length, 4]` or `list` of `pretty_midi.Note` objects):
                This represents the midi notes. If `notes` is a `numpy.ndarray`:

                - Each sequence must have 4 values, they are `onset idx`, `offset idx`, `pitch` and `velocity`.

                If `notes` is a `list` containing `pretty_midi.Note` objects:

                - Each sequence must have 4 attributes, they are `start`, `end`, `pitch` and `velocity`.
            truncation_strategy ([`~tokenization_utils_base.TruncationStrategy`], *optional*):
                Indicates the truncation strategy that is going to be used during truncation.
            max_length (`int`, *optional*):
                Maximum number of token ids per sequence. Longer sequences are truncated according to
                `truncation_strategy`.

        Returns:
            `BatchEncoding` containing the tokens ids.
        """
        encoded_batch_token_ids = []
        for i in range(len(notes)):
            encoded_batch_token_ids.append(
                self.encode_plus(
                    notes[i],
                    truncation_strategy=truncation_strategy,
                    max_length=max_length,
                    **kwargs,
                )["token_ids"]
            )

        return BatchEncoding({"token_ids": encoded_batch_token_ids})

    def __call__(
        self,
        notes: Union[
            np.ndarray,
            List["pretty_midi.Note"],
            List[List["pretty_midi.Note"]],
        ],
        padding: Union[bool, str, PaddingStrategy] = False,
        truncation: Union[bool, str, TruncationStrategy] = None,
        max_length: Optional[int] = None,
        pad_to_multiple_of: Optional[int] = None,
        return_attention_mask: Optional[bool] = None,
        return_tensors: Optional[Union[str, TensorType]] = None,
        verbose: bool = True,
        **kwargs,
    ) -> BatchEncoding:
        r"""
        This is the `__call__` method for `Pop2PianoTokenizer`. It converts the midi notes to the transformer generated
        token ids.

        Args:
            notes (`numpy.ndarray` of shape `[batch_size, max_sequence_length, 4]` or `list` of `pretty_midi.Note` objects):
                This represents the midi notes.

                If `notes` is a `numpy.ndarray`:

                - Each sequence must have 4 values, they are `onset idx`, `offset idx`, `pitch` and `velocity`.

                If `notes` is a `list` containing `pretty_midi.Note` objects:

                - Each sequence must have 4 attributes, they are `start`, `end`, `pitch` and `velocity`.
            padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `False`):
                Activates and controls padding. Accepts the following values:

                - `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
                sequence is provided).
                - `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
                acceptable input length for the model if that argument is not provided.
                - `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
                lengths).
            truncation (`bool`, `str` or [`~tokenization_utils_base.TruncationStrategy`], *optional*, defaults to `False`):
                Activates and controls truncation. Accepts the following values:

                - `True` or `'longest_first'`: Truncate to a maximum length specified with the argument `max_length` or
                to the maximum acceptable input length for the model if that argument is not provided. This will
                truncate token by token, removing a token from the longest sequence in the pair if a pair of
                sequences (or a batch of pairs) is provided.
                - `'only_first'`: Truncate to a maximum length specified with the argument `max_length` or to the
                maximum acceptable input length for the model if that argument is not provided. This will only
                truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
                - `'only_second'`: Truncate to a maximum length specified with the argument `max_length` or to the
                maximum acceptable input length for the model if that argument is not provided. This will only
                truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
                - `False` or `'do_not_truncate'` (default): No truncation (i.e., can output batch with sequence lengths
                greater than the model maximum admissible input size).
            max_length (`int`, *optional*):
                Controls the maximum length to use by one of the truncation/padding parameters. If left unset or set to
                `None`, this will use the predefined model maximum length if a maximum length is required by one of the
                truncation/padding parameters. If the model has no specific maximum input length (like XLNet)
                truncation/padding to a maximum length will be deactivated.
            pad_to_multiple_of (`int`, *optional*):
                If set, pads the sequence to a multiple of the provided value. This is especially useful to enable
                the use of Tensor Cores on NVIDIA hardware with compute capability `>= 7.0` (Volta).
            return_attention_mask (`bool`, *optional*):
                Whether to return the attention mask. If left to the default, will return the attention mask according
                to the specific tokenizer's default, defined by the `return_outputs` attribute.

                [What are attention masks?](../glossary#attention-mask)
            return_tensors (`str` or [`~file_utils.TensorType`], *optional*):
                If set, will return tensors instead of list of python integers. Acceptable values are:

                - `'tf'`: Return TensorFlow `tf.constant` objects.
                - `'pt'`: Return PyTorch `torch.Tensor` objects.
                - `'np'`: Return Numpy `np.ndarray` objects.
            verbose (`bool`, *optional*, defaults to `True`):
                Whether or not to print more information and warnings.

        Returns:
            `BatchEncoding` containing the token_ids.
        """
        # check if it is batched or not
        # it is batched if it's a list containing a list of `pretty_midi.Note` where the outer list contains all the
        # batches and the inner list contains all Notes for a single batch. Otherwise if np.ndarray is passed it will be
        # considered batched if it has shape of `[batch_size, sequence_length, 4]` or ndim=3.
        is_batched = notes.ndim == 3 if isinstance(notes, np.ndarray) else isinstance(notes[0], list)

        # get the truncation and padding strategy
        padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
            padding=padding,
            truncation=truncation,
            max_length=max_length,
            pad_to_multiple_of=pad_to_multiple_of,
            verbose=verbose,
            **kwargs,
        )

        if is_batched:
            # If the user has not explicitly mentioned `return_attention_mask` as False, we change it to True
            return_attention_mask = True if return_attention_mask is None else return_attention_mask
            token_ids = self.batch_encode_plus(
                notes=notes,
                truncation_strategy=truncation_strategy,
                max_length=max_length,
                **kwargs,
            )
        else:
            token_ids = self.encode_plus(
                notes=notes,
                truncation_strategy=truncation_strategy,
                max_length=max_length,
                **kwargs,
            )

        # since the sequences are already truncated, only padding remains
        token_ids = self.pad(
            token_ids,
            padding=padding_strategy,
            max_length=max_length,
            pad_to_multiple_of=pad_to_multiple_of,
            return_attention_mask=return_attention_mask,
            return_tensors=return_tensors,
            verbose=verbose,
        )

        return token_ids

    def batch_decode(
        self,
        token_ids,
        feature_extractor_output: BatchFeature,
        return_midi: bool = True,
    ):
        r"""
        This is the `batch_decode` method for `Pop2PianoTokenizer`. It converts the token_ids generated by the
        transformer to midi_notes and returns them.

        Args:
            token_ids (`Union[np.ndarray, torch.Tensor, tf.Tensor]`):
                Output token_ids of `Pop2PianoConditionalGeneration` model.
            feature_extractor_output (`BatchFeature`):
                Denotes the output of `Pop2PianoFeatureExtractor.__call__`. It must contain `"beatsteps"` and
                `"extrapolated_beatstep"`. `"attention_mask"`, `"attention_mask_beatsteps"` and
                `"attention_mask_extrapolated_beatstep"` should also be present if they were returned by the
                feature extractor.
            return_midi (`bool`, *optional*, defaults to `True`):
                Whether to return midi object or not.

        Returns:
            Conditional Return:
                If `return_midi` is True:

                - `BatchEncoding` containing both `notes` and `pretty_midi.pretty_midi.PrettyMIDI` objects.

                If `return_midi` is False:

                - `BatchEncoding` containing `notes`.
        """
        # check if they have attention_masks(attention_mask, attention_mask_beatsteps, attention_mask_extrapolated_beatstep) or not
        attention_masks_present = bool(
            hasattr(feature_extractor_output, "attention_mask")
            and hasattr(feature_extractor_output, "attention_mask_beatsteps")
            and hasattr(feature_extractor_output, "attention_mask_extrapolated_beatstep")
        )

        # batched inputs require attention masks
        if not attention_masks_present and feature_extractor_output["beatsteps"].shape[0] > 1:
            raise ValueError(
                "attention_mask, attention_mask_beatsteps and attention_mask_extrapolated_beatstep must be present "
                "for batched inputs, but at least one of them was not present."
            )

        # check for length mismatch between inputs_embeds, beatsteps and extrapolated_beatstep
        if attention_masks_present:
            # since we know about the number of examples in token_ids from attention_mask
            if (
                sum(feature_extractor_output["attention_mask"][:, 0] == 0)
                != feature_extractor_output["beatsteps"].shape[0]
                or feature_extractor_output["beatsteps"].shape[0]
                != feature_extractor_output["extrapolated_beatstep"].shape[0]
            ):
                raise ValueError(
                    "Length mismatch between token_ids, beatsteps and extrapolated_beatstep! Found "
                    f"token_ids length - {token_ids.shape[0]}, beatsteps shape - {feature_extractor_output['beatsteps'].shape[0]} "
                    f"and extrapolated_beatsteps shape - {feature_extractor_output['extrapolated_beatstep'].shape[0]}"
                )
            if feature_extractor_output["attention_mask"].shape[0] != token_ids.shape[0]:
                raise ValueError(
                    f"Found attention_mask of length - {feature_extractor_output['attention_mask'].shape[0]} but token_ids of length - {token_ids.shape[0]}"
                )
        else:
            # if there is no attention mask present then it's surely a single example
            if (
                feature_extractor_output["beatsteps"].shape[0] != 1
                or feature_extractor_output["extrapolated_beatstep"].shape[0] != 1
            ):
                raise ValueError(
                    "Length mismatch of beatsteps and extrapolated_beatstep! Since attention_mask is not present the number of examples must be 1, "
                    f"But found beatsteps length - {feature_extractor_output['beatsteps'].shape[0]}, extrapolated_beatsteps length - {feature_extractor_output['extrapolated_beatstep'].shape[0]}."
                )

        if attention_masks_present:
            # check for zeros (since token_ids are separated by zero arrays)
            batch_idx = np.where(feature_extractor_output["attention_mask"][:, 0] == 0)[0]
        else:
            batch_idx = [token_ids.shape[0]]

        notes_list = []
        pretty_midi_objects_list = []
        start_idx = 0
        for index, end_idx in enumerate(batch_idx):
            each_tokens_ids = token_ids[start_idx:end_idx]
            # check where the whole example ended by searching for eos_token_id and getting the upper bound
            each_tokens_ids = each_tokens_ids[:, : int(np.max(np.where(each_tokens_ids == int(self.eos_token))[1])) + 1]
            beatsteps = feature_extractor_output["beatsteps"][index]
            extrapolated_beatstep = feature_extractor_output["extrapolated_beatstep"][index]

            # if attention masks are present, use them to trim the padding from beatsteps and extrapolated_beatstep
            if attention_masks_present:
                attention_mask_beatsteps = feature_extractor_output["attention_mask_beatsteps"][index]
                attention_mask_extrapolated_beatstep = feature_extractor_output[
                    "attention_mask_extrapolated_beatstep"
                ][index]
                beatsteps = beatsteps[: int(np.max(np.where(attention_mask_beatsteps == 1)[0])) + 1]
                extrapolated_beatstep = extrapolated_beatstep[
                    : int(np.max(np.where(attention_mask_extrapolated_beatstep == 1)[0])) + 1
                ]

            each_tokens_ids = to_numpy(each_tokens_ids)
            beatsteps = to_numpy(beatsteps)
            extrapolated_beatstep = to_numpy(extrapolated_beatstep)

            pretty_midi_object = self.relative_batch_tokens_ids_to_midi(
                tokens=each_tokens_ids,
                beatstep=extrapolated_beatstep,
                bars_per_batch=self.num_bars,
                cutoff_time_idx=(self.num_bars + 1) * 4,
            )

            for note in pretty_midi_object.instruments[0].notes:
                note.start += beatsteps[0]
                note.end += beatsteps[0]
                notes_list.append(note)

            pretty_midi_objects_list.append(pretty_midi_object)
            start_idx = end_idx + 1  # skip the zero array that separates examples

        if return_midi:
            return BatchEncoding({"notes": notes_list, "pretty_midi_objects": pretty_midi_objects_list})

        return BatchEncoding({"notes": notes_list})
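
The row splitting done by `batch_decode` when attention masks are present can be sketched in plain Python. This is only an illustration: `split_examples` and the toy mask are hypothetical, and it assumes that each example's rows are followed by exactly one all-zero separator row, as the comments in the source indicate.

```python
def split_examples(first_mask_column):
    """Return (start, end) row ranges, one per example.

    `first_mask_column` is a hypothetical stand-in for `attention_mask[:, 0]`:
    1 for real rows, 0 for the separator row that follows each example.
    """
    ranges = []
    start = 0
    for row, value in enumerate(first_mask_column):
        if value == 0:  # a separator row closes the current example
            ranges.append((start, row))
            start = row + 1  # the next example begins after the separator
    return ranges


# Three examples of 3, 2 and 4 rows, each followed by a zero row.
mask = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
print(split_examples(mask))  # [(0, 3), (4, 6), (7, 11)]
```

Each `(start, end)` pair corresponds to one `token_ids[start:end]` slice that is then converted to notes.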

mindnlp.transformers.models.pop2piano.tokenization_pop2piano.Pop2PianoTokenizer.vocab_size property

Returns the vocabulary size of the tokenizer.
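
As a rough illustration of what this size counts, the sketch below builds a toy vocabulary in the `"{value}_{token_type}"` key format used by `_convert_token_to_id`. The entries and the helper `id_to_token` are hypothetical; the real vocabulary is loaded from the JSON vocab file.

```python
# Hypothetical toy vocabulary; keys follow the "{value}_{token_type}" format.
encoder = {
    "0_TOKEN_SPECIAL": 0,
    "1_TOKEN_SPECIAL": 1,
    "60_TOKEN_NOTE": 2,
    "1_TOKEN_VELOCITY": 3,
    "4_TOKEN_TIME": 4,
}

# Inverse mapping from id to key, as the tokenizer builds for decoding.
decoder = {token_id: name for name, token_id in encoder.items()}


def id_to_token(token_id):
    """Split a vocabulary key back into [token_type, value]."""
    value, token_type = decoder[token_id].split("_", 1)
    return [token_type, int(value)]


vocab_size = len(encoder)
print(vocab_size)      # 5
print(id_to_token(2))  # ['TOKEN_NOTE', 60]
```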

mindnlp.transformers.models.pop2piano.tokenization_pop2piano.Pop2PianoTokenizer.__call__(notes, padding=False, truncation=None, max_length=None, pad_to_multiple_of=None, return_attention_mask=None, return_tensors=None, verbose=True, **kwargs)

This is the __call__ method for Pop2PianoTokenizer. It converts the midi notes to the transformer generated token ids.

PARAMETER DESCRIPTION
notes

This represents the midi notes.

If notes is a numpy.ndarray:

  • Each sequence must have 4 values, they are onset idx, offset idx, pitch and velocity.

If notes is a list containing pretty_midi.Note objects:

  • Each sequence must have 4 attributes, they are start, end, pitch and velocity.

TYPE: `numpy.ndarray` of shape `[batch_size, max_sequence_length, 4]` or `list` of `pretty_midi.Note` objects

padding

Activates and controls padding. Accepts the following values:

  • True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
  • 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided.
  • False or 'do_not_pad' (default): No padding (i.e., can output a batch with sequences of different lengths).

TYPE: `bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `False` DEFAULT: False

truncation

Activates and controls truncation. Accepts the following values:

  • True or 'longest_first': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will truncate token by token, removing a token from the longest sequence in the pair if a pair of sequences (or a batch of pairs) is provided.
  • 'only_first': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will only truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
  • 'only_second': Truncate to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. This will only truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
  • False or 'do_not_truncate' (default): No truncation (i.e., can output batch with sequence lengths greater than the model maximum admissible input size).

TYPE: `bool`, `str` or [`~tokenization_utils_base.TruncationStrategy`], *optional*, defaults to `False` DEFAULT: None

max_length

Controls the maximum length to use by one of the truncation/padding parameters. If left unset or set to None, this will use the predefined model maximum length if a maximum length is required by one of the truncation/padding parameters. If the model has no specific maximum input length (like XLNet) truncation/padding to a maximum length will be deactivated.

TYPE: `int`, *optional* DEFAULT: None

pad_to_multiple_of

If set, pads the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta).

TYPE: `int`, *optional* DEFAULT: None

return_attention_mask

Whether to return the attention mask. If left to the default, will return the attention mask according to the specific tokenizer's default, defined by the return_outputs attribute.

What are attention masks?

TYPE: `bool`, *optional* DEFAULT: None

return_tensors

If set, will return tensors instead of list of python integers. Acceptable values are:

  • 'tf': Return TensorFlow tf.constant objects.
  • 'pt': Return PyTorch torch.Tensor objects.
  • 'np': Return Numpy np.ndarray objects.

TYPE: `str` or [`~file_utils.TensorType`], *optional* DEFAULT: None

verbose

Whether or not to print more information and warnings.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

RETURNS DESCRIPTION
BatchEncoding

BatchEncoding containing the token_ids.
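
For unbatched input, the conversion from notes to tokens follows the event ordering in `encode_plus`. The sketch below mirrors that ordering in plain Python and returns token names rather than ids. `notes_to_token_names` is a hypothetical helper, and rounding, vocabulary lookup, truncation and padding are left out.

```python
def notes_to_token_names(notes):
    """Return token names in the order `encode_plus` emits them.

    `notes` is a list of (onset_idx, offset_idx, pitch, velocity) integer tuples.
    """
    max_time_idx = max(max(onset, offset) for onset, offset, _, _ in notes)
    events = [[] for _ in range(max_time_idx + 1)]
    for onset, offset, pitch, velocity in notes:
        events[onset].append((pitch, velocity))
        events[offset].append((pitch, 0))  # velocity 0 marks a note-off

    names = []
    current_velocity = 0
    for time_idx, bucket in enumerate(events):
        if not bucket:
            continue  # no events at this time index
        names.append(f"{time_idx}_TOKEN_TIME")
        for pitch, velocity in bucket:
            velocity = int(velocity > 0)  # collapse velocity to on/off
            if velocity != current_velocity:
                current_velocity = velocity
                names.append(f"{velocity}_TOKEN_VELOCITY")
            names.append(f"{pitch}_TOKEN_NOTE")
    return names


tokens = notes_to_token_names([(0, 2, 60, 77), (1, 2, 64, 77)])
print(tokens)
# ['0_TOKEN_TIME', '1_TOKEN_VELOCITY', '60_TOKEN_NOTE', '1_TOKEN_TIME', '64_TOKEN_NOTE',
#  '2_TOKEN_TIME', '0_TOKEN_VELOCITY', '60_TOKEN_NOTE', '64_TOKEN_NOTE']
```

A velocity token is emitted only when the on/off state changes, so consecutive note-ons (or note-offs) share one velocity token.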

Source code in mindnlp/transformers/models/pop2piano/tokenization_pop2piano.py