
Prompt tuning

mindnlp.peft.tuners.prompt_tuning.config

Prompt tuning config.

mindnlp.peft.tuners.prompt_tuning.config.PromptTuningConfig dataclass

Bases: PromptLearningConfig

This is the configuration class to store the configuration of a [PromptEmbedding].

PARAMETER DESCRIPTION
prompt_tuning_init

The initialization of the prompt embedding.

TYPE: Union[[`PromptTuningInit`], `str`] DEFAULT: RANDOM

prompt_tuning_init_text

The text to initialize the prompt embedding. Only used if prompt_tuning_init is TEXT.

TYPE: `str`, *optional* DEFAULT: None

tokenizer_name_or_path

The name or path of the tokenizer. Only used if prompt_tuning_init is TEXT.

TYPE: `str`, *optional* DEFAULT: None

tokenizer_kwargs

The keyword arguments to pass to AutoTokenizer.from_pretrained. Only used if prompt_tuning_init is TEXT.

TYPE: `dict`, *optional* DEFAULT: None
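A minimal configuration sketch using TEXT initialization (assuming PromptTuningConfig and PromptTuningInit are re-exported from mindnlp.peft; the task_type, tokenizer, and tokenizer_kwargs values are illustrative):

>>> from mindnlp.peft import PromptTuningConfig, PromptTuningInit
>>> config = PromptTuningConfig(
...     task_type="SEQ_2_SEQ_LM",
...     num_virtual_tokens=20,
...     prompt_tuning_init=PromptTuningInit.TEXT,
...     prompt_tuning_init_text="Predict if sentiment of this review is positive, negative or neutral",
...     tokenizer_name_or_path="t5-base",
...     tokenizer_kwargs={"use_fast": True},
... )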

Source code in mindnlp/peft/tuners/prompt_tuning/config.py
@dataclass
class PromptTuningConfig(PromptLearningConfig):
    """
    This is the configuration class to store the configuration of a [`PromptEmbedding`].

    Args:
        prompt_tuning_init (Union[[`PromptTuningInit`], `str`]): The initialization of the prompt embedding.
        prompt_tuning_init_text (`str`, *optional*):
            The text to initialize the prompt embedding. Only used if `prompt_tuning_init` is `TEXT`.
        tokenizer_name_or_path (`str`, *optional*):
            The name or path of the tokenizer. Only used if `prompt_tuning_init` is `TEXT`.
        tokenizer_kwargs (`dict`, *optional*):
            The keyword arguments to pass to `AutoTokenizer.from_pretrained`. Only used if `prompt_tuning_init` is
            `TEXT`.
    """
    prompt_tuning_init: Union[PromptTuningInit, str] = field(
        default=PromptTuningInit.RANDOM,
        metadata={"help": "How to initialize the prompt tuning parameters"},
    )
    prompt_tuning_init_text: Optional[str] = field(
        default=None,
        metadata={
            "help": "The text to use for prompt tuning initialization. Only used if prompt_tuning_init is `TEXT`"
        },
    )
    tokenizer_name_or_path: Optional[str] = field(
        default=None,
        metadata={
            "help": "The tokenizer to use for prompt tuning initialization. Only used if prompt_tuning_init is `TEXT`"
        },
    )

    tokenizer_kwargs: Optional[dict] = field(
        default=None,
        metadata={
            "help": (
                "The keyword arguments to pass to `AutoTokenizer.from_pretrained`. Only used if prompt_tuning_init is "
                "`TEXT`"
            ),
        },
    )

    def __post_init__(self):
        r"""
        This method initializes the PromptTuningConfig object after its creation.

        Args:
            self: The instance of the PromptTuningConfig class.

        Returns:
            None. This method does not return any value.

        Raises:
            - ValueError: If the prompt_tuning_init is set to TEXT and tokenizer_name_or_path is not provided.
            - ValueError: If the prompt_tuning_init is set to TEXT and prompt_tuning_init_text is not provided.
            - ValueError: If tokenizer_kwargs is provided but prompt_tuning_init is not set to TEXT.
        """
        self.peft_type = PeftType.PROMPT_TUNING
        if (self.prompt_tuning_init == PromptTuningInit.TEXT) and not self.tokenizer_name_or_path:
            raise ValueError(
                f"When prompt_tuning_init='{PromptTuningInit.TEXT.value}', "
                f"tokenizer_name_or_path can't be {self.tokenizer_name_or_path}."
            )
        if (self.prompt_tuning_init == PromptTuningInit.TEXT) and self.prompt_tuning_init_text is None:
            raise ValueError(
                f"When prompt_tuning_init='{PromptTuningInit.TEXT.value}', "
                f"prompt_tuning_init_text can't be {self.prompt_tuning_init_text}."
            )
        if self.tokenizer_kwargs and (self.prompt_tuning_init != PromptTuningInit.TEXT):
            raise ValueError(
                f"tokenizer_kwargs only valid when using prompt_tuning_init='{PromptTuningInit.TEXT.value}'."
            )

mindnlp.peft.tuners.prompt_tuning.config.PromptTuningConfig.__post_init__()

Finalize and validate the configuration after creation.

Sets peft_type to PeftType.PROMPT_TUNING and validates the options used for TEXT initialization.

RAISES DESCRIPTION
ValueError

If prompt_tuning_init is TEXT and tokenizer_name_or_path is not provided.

ValueError

If prompt_tuning_init is TEXT and prompt_tuning_init_text is not provided.

ValueError

If tokenizer_kwargs is provided but prompt_tuning_init is not TEXT.
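Constructing a TEXT-initialized config without a tokenizer path fails immediately; a small illustration of the first check (assuming the same top-level re-export as above):

>>> from mindnlp.peft import PromptTuningConfig
>>> try:
...     PromptTuningConfig(prompt_tuning_init="TEXT", prompt_tuning_init_text="Classify:")
... except ValueError as err:
...     print(err)
When prompt_tuning_init='TEXT', tokenizer_name_or_path can't be None.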

Source code in mindnlp/peft/tuners/prompt_tuning/config.py
def __post_init__(self):
    r"""
    This method initializes the PromptTuningConfig object after its creation.

    Args:
        self: The instance of the PromptTuningConfig class.

    Returns:
        None. This method does not return any value.

    Raises:
        - ValueError: If the prompt_tuning_init is set to TEXT and tokenizer_name_or_path is not provided.
        - ValueError: If the prompt_tuning_init is set to TEXT and prompt_tuning_init_text is not provided.
        - ValueError: If tokenizer_kwargs is provided but prompt_tuning_init is not set to TEXT.
    """
    self.peft_type = PeftType.PROMPT_TUNING
    if (self.prompt_tuning_init == PromptTuningInit.TEXT) and not self.tokenizer_name_or_path:
        raise ValueError(
            f"When prompt_tuning_init='{PromptTuningInit.TEXT.value}', "
            f"tokenizer_name_or_path can't be {self.tokenizer_name_or_path}."
        )
    if (self.prompt_tuning_init == PromptTuningInit.TEXT) and self.prompt_tuning_init_text is None:
        raise ValueError(
            f"When prompt_tuning_init='{PromptTuningInit.TEXT.value}', "
            f"prompt_tuning_init_text can't be {self.prompt_tuning_init_text}."
        )
    if self.tokenizer_kwargs and (self.prompt_tuning_init != PromptTuningInit.TEXT):
        raise ValueError(
            f"tokenizer_kwargs only valid when using prompt_tuning_init='{PromptTuningInit.TEXT.value}'."
        )

mindnlp.peft.tuners.prompt_tuning.config.PromptTuningInit

Bases: str, Enum

Enum of the supported prompt-embedding initialization strategies.

Because PromptTuningInit inherits from both str and enum.Enum, its members compare equal to their string values, so either the enum member or the plain string ("TEXT" or "RANDOM") can be used for prompt_tuning_init.

Members: TEXT (initialize the prompt embedding from the token embeddings of a given text), RANDOM (initialize the prompt embedding randomly).
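A quick illustration of the string-enum behaviour:

>>> from mindnlp.peft.tuners.prompt_tuning.config import PromptTuningInit
>>> PromptTuningInit.TEXT == "TEXT"
True
>>> PromptTuningInit("RANDOM") is PromptTuningInit.RANDOM
True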

Source code in mindnlp/peft/tuners/prompt_tuning/config.py
class PromptTuningInit(str, enum.Enum):

    r"""
    Represents an initialization state for prompt tuning in a Python class named 'PromptTuningInit'. 
    This class inherits from the 'str' class and the 'enum.Enum' class.

    PromptTuningInit is used to define and manage the initialization state for prompt tuning. 
    It provides functionality to set and retrieve the initialization state, and inherits 
    all the methods and attributes of the 'str' class and the 'enum.Enum' class.

    Attributes:
        - None

    Methods:
        - None

    Inherited Attributes from the 'str' class:
        - capitalize()
        - casefold()
        - center()
        - count()
        - encode()
        - endswith()
        - expandtabs()
        - find()
        - format()
        - format_map()
        - index()
        - isalnum()
        - isalpha()
        - isascii()
        - isdecimal()
        - isdigit()
        - isidentifier()
        - islower()
        - isnumeric()
        - isprintable()
        - isspace()
        - istitle()
        - isupper()
        - join()
        - ljust()
        - lower()
        - lstrip()
        - maketrans()
        - partition()
        - replace()
        - rfind()
        - rindex()
        - rjust()
        - rpartition()
        - rsplit()
        - rstrip()
        - split()
        - splitlines()
        - startswith()
        - strip()
        - swapcase()
        - title()
        - translate()
        - upper()
        - zfill()

    Inherited Attributes from the 'enum.Enum' class:
        - name
        - value

    Inherited Methods from the 'enum.Enum' class:
        - __class__
        - __contains__
        - __delattr__
        - __dir__
        - __eq__
        - __format__
        - __ge__
        - __getattribute__
        - __getitem__
        - __gt__
        - __hash__
        - __init__
        - __init_subclass__
        - __iter__
        - __le__
        - __len__
        - __lt__
        - __members__
        - __module__
        - __ne__
        - __new__
        - __reduce__
        - __reduce_ex__
        - __repr__
        - __setattr__
        - __sizeof__
        - __str__
        - __subclasshook__

    """
    TEXT = "TEXT"
    RANDOM = "RANDOM"

mindnlp.peft.tuners.prompt_tuning.model

Prompt tuning model.

mindnlp.peft.tuners.prompt_tuning.model.PromptEmbedding

Bases: Module

The model to encode virtual tokens into prompt embeddings.

PARAMETER DESCRIPTION
config

The configuration of the prompt embedding.

TYPE: [`PromptTuningConfig`]

word_embeddings

The word embeddings of the base transformer model.

TYPE: `nn.Module`

ATTRIBUTE DESCRIPTION
embedding

The embedding layer of the prompt embedding.

TYPE: nn.Embedding

Example:

>>> from peft import PromptEmbedding, PromptTuningConfig

>>> config = PromptTuningConfig(
...     peft_type="PROMPT_TUNING",
...     task_type="SEQ_2_SEQ_LM",
...     num_virtual_tokens=20,
...     token_dim=768,
...     num_transformer_submodules=1,
...     num_attention_heads=12,
...     num_layers=12,
...     prompt_tuning_init="TEXT",
...     prompt_tuning_init_text="Predict if sentiment of this review is positive, negative or neutral",
...     tokenizer_name_or_path="t5-base",
... )

>>> # t5_model.shared is the word embeddings of the base model
>>> prompt_embedding = PromptEmbedding(config, t5_model.shared)

Input Shape: (batch_size, total_virtual_tokens)

Output Shape: (batch_size, total_virtual_tokens, token_dim)

Source code in mindnlp/peft/tuners/prompt_tuning/model.py
class PromptEmbedding(nn.Module):
    """
    The model to encode virtual tokens into prompt embeddings.

    Args:
        config ([`PromptTuningConfig`]): The configuration of the prompt embedding.
        word_embeddings (`nn.Module`): The word embeddings of the base transformer model.

    **Attributes**:
        - **embedding** (`nn.Embedding`) -- The embedding layer of the prompt embedding.

    Example:

    ```py
    >>> from peft import PromptEmbedding, PromptTuningConfig

    >>> config = PromptTuningConfig(
    ...     peft_type="PROMPT_TUNING",
    ...     task_type="SEQ_2_SEQ_LM",
    ...     num_virtual_tokens=20,
    ...     token_dim=768,
    ...     num_transformer_submodules=1,
    ...     num_attention_heads=12,
    ...     num_layers=12,
    ...     prompt_tuning_init="TEXT",
    ...     prompt_tuning_init_text="Predict if sentiment of this review is positive, negative or neutral",
    ...     tokenizer_name_or_path="t5-base",
    ... )

    >>> # t5_model.shared is the word embeddings of the base model
    >>> prompt_embedding = PromptEmbedding(config, t5_model.shared)
    ```

    Input Shape: (`batch_size`, `total_virtual_tokens`)

    Output Shape: (`batch_size`, `total_virtual_tokens`, `token_dim`)
    """
    def __init__(self, config, word_embeddings):
        r"""
        Initialize the PromptEmbedding class.

        Args:
            self: Reference to the current instance of the class.
            config (object): Configuration object containing various settings.
                - num_virtual_tokens (int): Number of virtual tokens.
                - num_transformer_submodules (int): Number of transformer submodules.
                - token_dim (int): Dimensionality of the token embeddings.
                - prompt_tuning_init (Enum): Specifies the type of prompt tuning initialization.
                - inference_mode (bool): Indicates if the model is in inference mode.
                - tokenizer_kwargs (dict, optional): Additional keyword arguments for the tokenizer.
                - tokenizer_name_or_path (str): Name or path of the pretrained tokenizer.
                - prompt_tuning_init_text (str): Text used for prompt tuning initialization.
            word_embeddings (object): Word embeddings for initializing the embedding layer.

        Returns:
            None. When `prompt_tuning_init` is `TEXT`, the embedding weights are initialized
            from the provided word embeddings; otherwise the random initialization is kept.

        Raises:
            ImportError: If the tokenizer required for `TEXT` initialization cannot be imported.
        """
        super().__init__()

        total_virtual_tokens = config.num_virtual_tokens * config.num_transformer_submodules
        self.embedding = nn.Embedding(total_virtual_tokens, config.token_dim)
        if config.prompt_tuning_init == PromptTuningInit.TEXT and not config.inference_mode:
            from ....transformers import AutoTokenizer

            tokenizer_kwargs = config.tokenizer_kwargs or {}
            tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name_or_path, **tokenizer_kwargs)
            init_text = config.prompt_tuning_init_text
            init_token_ids = tokenizer(init_text)["input_ids"]
            # Trim or iterate until num_text_tokens matches total_virtual_tokens
            num_text_tokens = len(init_token_ids)
            if num_text_tokens > total_virtual_tokens:
                init_token_ids = init_token_ids[:total_virtual_tokens]
            elif num_text_tokens < total_virtual_tokens:
                num_reps = math.ceil(total_virtual_tokens / num_text_tokens)
                init_token_ids = init_token_ids * num_reps
            init_token_ids = init_token_ids[:total_virtual_tokens]
            init_token_ids = mindspore.tensor(init_token_ids)
            word_embedding_weights = word_embeddings(init_token_ids).copy()
            word_embedding_weights = word_embedding_weights.to(mindspore.float32)
            self.embedding.weight = Parameter(word_embedding_weights)

    def forward(self, indices):
        r"""
        Construct the prompt embeddings based on the given indices.

        Args:
            self (PromptEmbedding): An instance of the PromptEmbedding class.
            indices (mindspore.Tensor): The virtual token indices used to look up the prompt embeddings.

        Returns:
            mindspore.Tensor: The prompt embeddings for the given indices, of shape
            (batch_size, total_virtual_tokens, token_dim).
        """
        # Just get embeddings
        prompt_embeddings = self.embedding(indices)
        return prompt_embeddings

mindnlp.peft.tuners.prompt_tuning.model.PromptEmbedding.__init__(config, word_embeddings)

Initialize the PromptEmbedding class.

PARAMETER DESCRIPTION
self

Reference to the current instance of the class.

config

Configuration object containing the prompt tuning settings:

- num_virtual_tokens (int): Number of virtual tokens.
- num_transformer_submodules (int): Number of transformer submodules.
- token_dim (int): Dimensionality of the token embeddings.
- prompt_tuning_init (Enum): Specifies the type of prompt tuning initialization.
- inference_mode (bool): Indicates if the model is in inference mode.
- tokenizer_kwargs (dict, optional): Additional keyword arguments for the tokenizer.
- tokenizer_name_or_path (str): Name or path of the pretrained tokenizer.
- prompt_tuning_init_text (str): Text used for prompt tuning initialization.

TYPE: object

word_embeddings

Word embeddings for initializing the embedding layer.

TYPE: object

RETURNS DESCRIPTION

None. When prompt_tuning_init is TEXT, the embedding weights are initialized from the provided word embeddings; otherwise the random initialization is kept.

RAISES DESCRIPTION
ImportError

If the tokenizer required for TEXT initialization cannot be imported.
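For TEXT initialization, the tokenized init text is repeated and/or trimmed so that it covers exactly total_virtual_tokens ids, as the source below shows. A standalone sketch of that length-fitting step (hypothetical helper, plain Python lists):

>>> import math
>>> def fit_token_ids(init_token_ids, total_virtual_tokens):
...     # Repeat the ids when the text is too short, then trim to the exact length.
...     if len(init_token_ids) < total_virtual_tokens:
...         init_token_ids = init_token_ids * math.ceil(total_virtual_tokens / len(init_token_ids))
...     return init_token_ids[:total_virtual_tokens]
>>> fit_token_ids([11, 12, 13], 8)
[11, 12, 13, 11, 12, 13, 11, 12]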

Source code in mindnlp/peft/tuners/prompt_tuning/model.py
def __init__(self, config, word_embeddings):
    r"""
    Initialize the PromptEmbedding class.

    Args:
        self: Reference to the current instance of the class.
        config (object): Configuration object containing various settings.
            - num_virtual_tokens (int): Number of virtual tokens.
            - num_transformer_submodules (int): Number of transformer submodules.
            - token_dim (int): Dimensionality of the token embeddings.
            - prompt_tuning_init (Enum): Specifies the type of prompt tuning initialization.
            - inference_mode (bool): Indicates if the model is in inference mode.
            - tokenizer_kwargs (dict, optional): Additional keyword arguments for the tokenizer.
            - tokenizer_name_or_path (str): Name or path of the pretrained tokenizer.
            - prompt_tuning_init_text (str): Text used for prompt tuning initialization.
        word_embeddings (object): Word embeddings for initializing the embedding layer.

    Returns:
        None. When `prompt_tuning_init` is `TEXT`, the embedding weights are initialized
        from the provided word embeddings; otherwise the random initialization is kept.

    Raises:
        ImportError: If the tokenizer required for `TEXT` initialization cannot be imported.
    """
    super().__init__()

    total_virtual_tokens = config.num_virtual_tokens * config.num_transformer_submodules
    self.embedding = nn.Embedding(total_virtual_tokens, config.token_dim)
    if config.prompt_tuning_init == PromptTuningInit.TEXT and not config.inference_mode:
        from ....transformers import AutoTokenizer

        tokenizer_kwargs = config.tokenizer_kwargs or {}
        tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name_or_path, **tokenizer_kwargs)
        init_text = config.prompt_tuning_init_text
        init_token_ids = tokenizer(init_text)["input_ids"]
        # Trim or iterate until num_text_tokens matches total_virtual_tokens
        num_text_tokens = len(init_token_ids)
        if num_text_tokens > total_virtual_tokens:
            init_token_ids = init_token_ids[:total_virtual_tokens]
        elif num_text_tokens < total_virtual_tokens:
            num_reps = math.ceil(total_virtual_tokens / num_text_tokens)
            init_token_ids = init_token_ids * num_reps
        init_token_ids = init_token_ids[:total_virtual_tokens]
        init_token_ids = mindspore.tensor(init_token_ids)
        word_embedding_weights = word_embeddings(init_token_ids).copy()
        word_embedding_weights = word_embedding_weights.to(mindspore.float32)
        self.embedding.weight = Parameter(word_embedding_weights)

mindnlp.peft.tuners.prompt_tuning.model.PromptEmbedding.forward(indices)

Construct the prompt embeddings based on the given indices.

PARAMETER DESCRIPTION
self

An instance of the PromptEmbedding class.

TYPE: PromptEmbedding

indices

The virtual token indices used to look up the prompt embeddings.

TYPE: mindspore.Tensor

RETURNS DESCRIPTION
Tensor

The prompt embeddings for the given indices, of shape (batch_size, total_virtual_tokens, token_dim).
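A shape sketch, assuming config is the example configuration above (num_virtual_tokens=20, num_transformer_submodules=1, token_dim=768) and word_embeddings is the base model's embedding module (e.g. t5_model.shared):

>>> from mindspore import ops
>>> prompt_embedding = PromptEmbedding(config, word_embeddings)
>>> indices = ops.broadcast_to(ops.arange(20), (4, 20))  # (batch_size=4, total_virtual_tokens=20)
>>> prompt_embedding(indices).shape
(4, 20, 768)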

Source code in mindnlp/peft/tuners/prompt_tuning/model.py
def forward(self, indices):
    r"""
    Construct the prompt embeddings based on the given indices.

    Args:
        self (PromptEmbedding): An instance of the PromptEmbedding class.
        indices (mindspore.Tensor): The virtual token indices used to look up the prompt embeddings.

    Returns:
        mindspore.Tensor: The prompt embeddings for the given indices, of shape
        (batch_size, total_virtual_tokens, token_dim).
    """
    # Just get embeddings
    prompt_embeddings = self.embedding(indices)
    return prompt_embeddings