
chatglm

mindnlp.transformers.models.chatglm.modeling_chatglm.CHATGLM_6B_PRETRAINED_MODEL_ARCHIVE_LIST = ['THUDM/chatglm-6b'] module-attribute

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel

Bases: ChatGLMPreTrainedModel

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as a decoder, the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to be initialized with both is_decoder and add_cross_attention set to True; an encoder_hidden_states input is then expected in the forward pass.
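
A minimal usage sketch is shown below. It assumes the mindnlp auto tokenizer mirrors the Hugging Face Transformers API (the checkpoint-to-tokenizer mapping and return_tensors="ms" are assumptions, not taken from this module):

```python
from mindnlp.transformers import AutoTokenizer
from mindnlp.transformers.models.chatglm.modeling_chatglm import ChatGLMModel

# Load the checkpoint listed in CHATGLM_6B_PRETRAINED_MODEL_ARCHIVE_LIST.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b")
model = ChatGLMModel.from_pretrained("THUDM/chatglm-6b")

# The ChatGLM tokenizer appends the [gMASK] and <bos> tokens that the
# mask and position-id logic of the model expects.
inputs = tokenizer("Hello", return_tensors="ms")
outputs = model(input_ids=inputs["input_ids"], return_dict=True)
print(outputs.last_hidden_state.shape)  # (seq_len, batch_size, hidden_size)
```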

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
class ChatGLMModel(ChatGLMPreTrainedModel):
    """

    The model can behave as an encoder (with only self-attention) as well
    as a decoder, in which case a layer of cross-attention is added between
    the self-attention layers, following the architecture described in [Attention is
    all you need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani,
    Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as a decoder the model needs to be initialized with the
    `is_decoder` argument of the configuration set to `True`.
    To be used in a Seq2Seq model, the model needs to be initialized with both
    `is_decoder` and `add_cross_attention` set to `True`; an
    `encoder_hidden_states` is then expected as an input to the forward pass.
    """
    def __init__(self, config: ChatGLMConfig):
        """
        Initializes a ChatGLMModel object with the provided configuration.

        Args:
            self: The instance of the ChatGLMModel class.
            config (ChatGLMConfig):
                An object containing configuration parameters for the model.

                - max_sequence_length (int): The maximum length of input sequences.
                - hidden_size (int): The size of the hidden layer.
                - num_attention_heads (int): The number of attention heads.
                - vocab_size (int): The size of the vocabulary.
                - num_layers (int): The number of layers in the model.
                - layernorm_epsilon (float): The epsilon value for layer normalization.
                - inner_hidden_size (int): The size of the inner hidden layer.
                - position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
                - pre_seq_len (int): The length of the prefix sequence.
                - prefix_projection (bool): Flag indicating whether to project the prefix or not.

        Returns:
            None.

        Raises:
            ValueError: If any of the configuration parameters are invalid or missing.
            TypeError: If the data types of the configuration parameters are incorrect.
            RuntimeError: If an error occurs during the initialization process.
        """
        super().__init__(config)
        # recording parameters
        self.max_sequence_length = config.max_sequence_length
        self.hidden_size = config.hidden_size
        self.params_dtype = mindspore.float16
        self.num_attention_heads = config.num_attention_heads
        self.vocab_size = config.vocab_size
        self.num_layers = config.num_layers
        self.layernorm_epsilon = config.layernorm_epsilon
        self.inner_hidden_size = config.inner_hidden_size
        self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
        self.position_encoding_2d = config.position_encoding_2d
        self.pre_seq_len = config.pre_seq_len
        self.prefix_projection = config.prefix_projection

        self.word_embeddings = nn.Embedding(
            self.vocab_size, self.hidden_size,
            dtype=self.params_dtype
        )

        def get_layer(layer_id):
            return GLMBlock(
                self.hidden_size,
                self.num_attention_heads,
                self.layernorm_epsilon,
                layer_id,
                inner_hidden_size=self.inner_hidden_size,
                hidden_size_per_attention_head=self.hidden_size_per_attention_head,
                layernorm=nn.LayerNorm,
                use_bias=True,
                params_dtype=self.params_dtype,
                position_encoding_2d=self.position_encoding_2d,
            )

        self.layers = nn.ModuleList(
            [get_layer(layer_id) for layer_id in range(self.num_layers)]
        )

        # Final layer norm before output.
        self.final_layernorm = nn.LayerNorm([self.hidden_size], eps=self.layernorm_epsilon)

        if self.pre_seq_len is not None:
            for param in self.parameters():
                param.requires_grad = False
            self.prefix_tokens = ops.arange(self.pre_seq_len).long()
            self.prefix_encoder = PrefixEncoder(config)
            self.dropout = nn.Dropout(p=0.1)

            # total_params = sum(p.numel() for p in self.parameters())
            # trainable_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
            # print("Using p-tuning v2: # trainable_params = {} / {}".format(trainable_params, total_params))

    def get_input_embeddings(self):
        """
        Returns the word embeddings for the input data.

        Args:
            self (ChatGLMModel): An instance of the ChatGLMModel class.

        Returns:
            nn.Embedding: The word embedding module used to embed the input token IDs.

        Raises:
            None

        This method retrieves the word embeddings used for the input data in the ChatGLMModel.
        The word embeddings are a numerical representation of words that capture semantic meaning.
        The embeddings are trained on a large corpus of text data to capture relationships between words.

        Note that this method does not modify the input embeddings. It simply returns the existing word embeddings that have been set for the model.

        Example:
            ```python
            >>> model = ChatGLMModel(config)
            >>> input_embeddings = model.get_input_embeddings()
            ...
            >>> # Perform operations on input_embeddings
            ...
            ```
        """
        return self.word_embeddings

    def set_input_embeddings(self, new_embeddings: mindspore.Tensor):
        """
        This method sets the input embeddings for the ChatGLMModel.

        Args:
            self (ChatGLMModel): The instance of the ChatGLMModel class.
            new_embeddings (mindspore.Tensor): The new embeddings to be set as input embeddings for the model.
                It should be a mindspore Tensor object.

        Returns:
            None.

        Raises:
            None.
        """
        self.word_embeddings = new_embeddings

    def get_prompt(self, batch_size, dtype=mindspore.float16):
        """
        This method retrieves the prompt for generating responses in the ChatGLMModel.

        Args:
            self (object): The instance of the ChatGLMModel class.
            batch_size (int): The number of prompt sequences to generate.
            dtype (mindspore.dtype, optional): The data type for the prompt key values. Default is mindspore.float16.

        Returns:
            A tuple with one stacked (key, value) block per layer, used as past_key_values.

        Raises:
            TypeError: If the batch_size is not an integer.
            ValueError: If the batch_size is less than or equal to 0.
            TypeError: If the dtype is not a valid mindspore data type.
        """
        prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1)
        past_key_values = self.prefix_encoder(prefix_tokens).astype(dtype)
        past_key_values = past_key_values.view(
            batch_size,
            self.pre_seq_len,
            self.num_layers * 2,
            self.num_attention_heads,
            self.hidden_size // self.num_attention_heads
        )
        # seq_len, b, nh, hidden_size
        past_key_values = self.dropout(past_key_values)
        past_key_values = past_key_values.permute([2, 1, 0, 3, 4]).split(2)
        # past_key_values = [(v[0], v[1]) for v in past_key_values]
        return past_key_values

    def forward(
            self,
            input_ids: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]] = None,
            inputs_embeds: Optional[mindspore.Tensor] = None,
            use_cache: Optional[bool] = None,
            output_attentions: Optional[bool] = None,
            output_hidden_states: Optional[bool] = None,
            return_dict: Optional[bool] = None,
    ) -> Union[Tuple[mindspore.Tensor, ...], BaseModelOutputWithPast]:
        '''
        Constructs the ChatGLMModel.

        Args:
            self: The object itself.
            input_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the tokens. Defaults to None.
            position_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the positions. Defaults to None.
            attention_mask (Optional[mindspore.Tensor]): The input tensor containing the attention mask. Defaults to None.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor]]]):
                The input tensor containing the past key values. Defaults to None.
            inputs_embeds (Optional[mindspore.Tensor]): The input tensor containing the embedded inputs. Defaults to None.
            use_cache (Optional[bool]): Specifies whether to use cache. Defaults to None.
            output_attentions (Optional[bool]): Specifies whether to output attentions. Defaults to None.
            output_hidden_states (Optional[bool]): Specifies whether to output hidden states. Defaults to None.
            return_dict (Optional[bool]): Specifies whether to return a dictionary. Defaults to None.

        Returns:
            Union[Tuple[mindspore.Tensor, ...], BaseModelOutputWithPast]: The output of the model.

        Raises:
            ValueError: If both input_ids and inputs_embeds are specified.
            ValueError: If neither input_ids nor inputs_embeds are specified.
        '''
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is not None:
            batch_size, _ = input_ids.shape[:2]
        elif inputs_embeds is not None:
            batch_size, _ = inputs_embeds.shape[:2]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)

        if past_key_values is None:
            if self.pre_seq_len is not None:
                past_key_values = self.get_prompt(batch_size=input_ids.shape[0], dtype=inputs_embeds.dtype)
            else:
                past_key_values = tuple([None] * len(self.layers))

            if attention_mask is None:
                attention_mask = self.get_masks(
                    input_ids,
                )

            if position_ids is None:
                MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
                seqs = input_ids.tolist()

                mask_positions, use_gmasks = [], []
                for seq in seqs:
                    mask_token = gMASK if gMASK in seq else MASK
                    use_gmask = mask_token == gMASK
                    mask_positions.append(seq.index(mask_token))
                    use_gmasks.append(use_gmask)

                position_ids = self.get_position_ids(
                    input_ids,
                    mask_positions=mask_positions,
                    use_gmasks=use_gmasks
                )

        if self.pre_seq_len is not None and attention_mask is not None:
            prefix_attention_mask = ops.ones((batch_size, 1, input_ids.shape[-1], self.pre_seq_len))
            prefix_attention_mask = (prefix_attention_mask < 0.5).bool()
            attention_mask = ops.cat((prefix_attention_mask, attention_mask), axis=3)

        # [seq_len, batch, hidden_size]
        hidden_states = inputs_embeds.swapaxes(0, 1)

        presents = () if use_cache else None
        all_self_attentions = () if output_attentions else None
        all_hidden_states = () if output_hidden_states else None

        if attention_mask is None:
            attention_mask = ops.zeros((1, 1)).bool()

        for i, layer in enumerate(self.layers):

            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)
            layer_past = past_key_values[i]

            layer_ret = layer(
                hidden_states,
                position_ids=position_ids,
                attention_mask=attention_mask,
                layer_id=mindspore.tensor(i),
                layer_past=layer_past,
                use_cache=use_cache,
                output_attentions=output_attentions
            )

            hidden_states = layer_ret[0]
            if use_cache:
                presents = presents + (layer_ret[1],)

            if output_attentions:
                all_self_attentions = all_self_attentions + (layer_ret[2 if use_cache else 1],)

        # Final layer norm.
        hidden_states = self.final_layernorm(hidden_states)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, presents, all_hidden_states, all_self_attentions] if v is not None)

        return BaseModelOutputWithPast(
            last_hidden_state=hidden_states,
            past_key_values=presents,
            hidden_states=all_hidden_states,
            attentions=all_self_attentions,
        )

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel.__init__(config)

Initializes a ChatGLMModel object with the provided configuration.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMModel class.

config

An object containing configuration parameters for the model.

  • max_sequence_length (int): The maximum length of input sequences.
  • hidden_size (int): The size of the hidden layer.
  • num_attention_heads (int): The number of attention heads.
  • vocab_size (int): The size of the vocabulary.
  • num_layers (int): The number of layers in the model.
  • layernorm_epsilon (float): The epsilon value for layer normalization.
  • inner_hidden_size (int): The size of the inner hidden layer.
  • position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
  • pre_seq_len (int): The length of the prefix sequence.
  • prefix_projection (bool): Flag indicating whether to project the prefix or not.

TYPE: ChatGLMConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ValueError

If any of the configuration parameters are invalid or missing.

TypeError

If the data types of the configuration parameters are incorrect.

RuntimeError

If an error occurs during the initialization process.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def __init__(self, config: ChatGLMConfig):
    """
    Initializes a ChatGLMModel object with the provided configuration.

    Args:
        self: The instance of the ChatGLMModel class.
        config (ChatGLMConfig):
            An object containing configuration parameters for the model.

            - max_sequence_length (int): The maximum length of input sequences.
            - hidden_size (int): The size of the hidden layer.
            - num_attention_heads (int): The number of attention heads.
            - vocab_size (int): The size of the vocabulary.
            - num_layers (int): The number of layers in the model.
            - layernorm_epsilon (float): The epsilon value for layer normalization.
            - inner_hidden_size (int): The size of the inner hidden layer.
            - position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
            - pre_seq_len (int): The length of the prefix sequence.
            - prefix_projection (bool): Flag indicating whether to project the prefix or not.

    Returns:
        None.

    Raises:
        ValueError: If any of the configuration parameters are invalid or missing.
        TypeError: If the data types of the configuration parameters are incorrect.
        RuntimeError: If an error occurs during the initialization process.
    """
    super().__init__(config)
    # recording parameters
    self.max_sequence_length = config.max_sequence_length
    self.hidden_size = config.hidden_size
    self.params_dtype = mindspore.float16
    self.num_attention_heads = config.num_attention_heads
    self.vocab_size = config.vocab_size
    self.num_layers = config.num_layers
    self.layernorm_epsilon = config.layernorm_epsilon
    self.inner_hidden_size = config.inner_hidden_size
    self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
    self.position_encoding_2d = config.position_encoding_2d
    self.pre_seq_len = config.pre_seq_len
    self.prefix_projection = config.prefix_projection

    self.word_embeddings = nn.Embedding(
        self.vocab_size, self.hidden_size,
        dtype=self.params_dtype
    )

    def get_layer(layer_id):
        return GLMBlock(
            self.hidden_size,
            self.num_attention_heads,
            self.layernorm_epsilon,
            layer_id,
            inner_hidden_size=self.inner_hidden_size,
            hidden_size_per_attention_head=self.hidden_size_per_attention_head,
            layernorm=nn.LayerNorm,
            use_bias=True,
            params_dtype=self.params_dtype,
            position_encoding_2d=self.position_encoding_2d,
        )

    self.layers = nn.ModuleList(
        [get_layer(layer_id) for layer_id in range(self.num_layers)]
    )

    # Final layer norm before output.
    self.final_layernorm = nn.LayerNorm([self.hidden_size], eps=self.layernorm_epsilon)

    if self.pre_seq_len is not None:
        for param in self.parameters():
            param.requires_grad = False
        self.prefix_tokens = ops.arange(self.pre_seq_len).long()
        self.prefix_encoder = PrefixEncoder(config)
        self.dropout = nn.Dropout(p=0.1)
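
When config.pre_seq_len is set, the constructor above freezes every base parameter and adds a PrefixEncoder, so only the prefix weights remain trainable (p-tuning v2). A quick check, mirroring the snippet commented out in the full class source; model is assumed to be a ChatGLMModel built from such a config:

```python
# `model` is a ChatGLMModel whose config sets `pre_seq_len` (assumed instance).
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"p-tuning v2: trainable params = {trainable_params} / {total_params}")
```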

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel.forward(input_ids=None, position_ids=None, attention_mask=None, past_key_values=None, inputs_embeds=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Constructs the ChatGLMModel.

PARAMETER DESCRIPTION
self

The object itself.

input_ids

The input tensor containing the IDs of the tokens. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The input tensor containing the IDs of the positions. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The input tensor containing the attention mask. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The input tensor containing the past key values. Defaults to None.

TYPE: Optional[Tuple[Tuple[Tensor, Tensor]]] DEFAULT: None

inputs_embeds

The input tensor containing the embedded inputs. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

use_cache

Specifies whether to use cache. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Specifies whether to output attentions. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Specifies whether to output hidden states. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

return_dict

Specifies whether to return a dictionary. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor, ...], BaseModelOutputWithPast]

Union[Tuple[mindspore.Tensor, ...], BaseModelOutputWithPast]: The output of the model.

RAISES DESCRIPTION
ValueError

If both input_ids and inputs_embeds are specified.

ValueError

If neither input_ids nor inputs_embeds are specified.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
) -> Union[Tuple[mindspore.Tensor, ...], BaseModelOutputWithPast]:
    '''
    Constructs the ChatGLMModel.

    Args:
        self: The object itself.
        input_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the tokens. Defaults to None.
        position_ids (Optional[mindspore.Tensor]): The input tensor containing the IDs of the positions. Defaults to None.
        attention_mask (Optional[mindspore.Tensor]): The input tensor containing the attention mask. Defaults to None.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor]]]):
            The input tensor containing the past key values. Defaults to None.
        inputs_embeds (Optional[mindspore.Tensor]): The input tensor containing the embedded inputs. Defaults to None.
        use_cache (Optional[bool]): Specifies whether to use cache. Defaults to None.
        output_attentions (Optional[bool]): Specifies whether to output attentions. Defaults to None.
        output_hidden_states (Optional[bool]): Specifies whether to output hidden states. Defaults to None.
        return_dict (Optional[bool]): Specifies whether to return a dictionary. Defaults to None.

    Returns:
        Union[Tuple[mindspore.Tensor, ...], BaseModelOutputWithPast]: The output of the model.

    Raises:
        ValueError: If both input_ids and inputs_embeds are specified.
        ValueError: If neither input_ids nor inputs_embeds are specified.
    '''
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is not None:
        batch_size, _ = input_ids.shape[:2]
    elif inputs_embeds is not None:
        batch_size, _ = inputs_embeds.shape[:2]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)

    if past_key_values is None:
        if self.pre_seq_len is not None:
            past_key_values = self.get_prompt(batch_size=input_ids.shape[0], dtype=inputs_embeds.dtype)
        else:
            past_key_values = tuple([None] * len(self.layers))

        if attention_mask is None:
            attention_mask = self.get_masks(
                input_ids,
            )

        if position_ids is None:
            MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
            seqs = input_ids.tolist()

            mask_positions, use_gmasks = [], []
            for seq in seqs:
                mask_token = gMASK if gMASK in seq else MASK
                use_gmask = mask_token == gMASK
                mask_positions.append(seq.index(mask_token))
                use_gmasks.append(use_gmask)

            position_ids = self.get_position_ids(
                input_ids,
                mask_positions=mask_positions,
                use_gmasks=use_gmasks
            )

    if self.pre_seq_len is not None and attention_mask is not None:
        prefix_attention_mask = ops.ones((batch_size, 1, input_ids.shape[-1], self.pre_seq_len))
        prefix_attention_mask = (prefix_attention_mask < 0.5).bool()
        attention_mask = ops.cat((prefix_attention_mask, attention_mask), axis=3)

    # [seq_len, batch, hidden_size]
    hidden_states = inputs_embeds.swapaxes(0, 1)

    presents = () if use_cache else None
    all_self_attentions = () if output_attentions else None
    all_hidden_states = () if output_hidden_states else None

    if attention_mask is None:
        attention_mask = ops.zeros((1, 1)).bool()

    for i, layer in enumerate(self.layers):

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)
        layer_past = past_key_values[i]

        layer_ret = layer(
            hidden_states,
            position_ids=position_ids,
            attention_mask=attention_mask,
            layer_id=mindspore.tensor(i),
            layer_past=layer_past,
            use_cache=use_cache,
            output_attentions=output_attentions
        )

        hidden_states = layer_ret[0]
        if use_cache:
            presents = presents + (layer_ret[1],)

        if output_attentions:
            all_self_attentions = all_self_attentions + (layer_ret[2 if use_cache else 1],)

    # Final layer norm.
    hidden_states = self.final_layernorm(hidden_states)

    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, presents, all_hidden_states, all_self_attentions] if v is not None)

    return BaseModelOutputWithPast(
        last_hidden_state=hidden_states,
        past_key_values=presents,
        hidden_states=all_hidden_states,
        attentions=all_self_attentions,
    )
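
A shape-level sketch of a direct forward pass with a toy configuration. The configuration fields follow the __init__ documentation above; the config import, the small vocabulary, and the special-token ids are assumptions chosen so the prompt ends with [gMASK] followed by <bos>, which is what the mask and position-id logic expects:

```python
import mindspore
from mindnlp.transformers.models.chatglm.modeling_chatglm import ChatGLMConfig, ChatGLMModel

# Toy configuration (hypothetical values); special-token ids are kept inside vocab_size.
config = ChatGLMConfig(
    vocab_size=128, hidden_size=32, num_layers=2, num_attention_heads=4,
    inner_hidden_size=64, max_sequence_length=64,
    bos_token_id=3, mask_token_id=1, gmask_token_id=2,
)
model = ChatGLMModel(config)  # parameters are created in float16, as in __init__

# A short prompt followed by [gMASK] and <bos>.
input_ids = mindspore.Tensor([[10, 11, 12, config.gmask_token_id, config.bos_token_id]], mindspore.int64)
outputs = model(input_ids=input_ids, return_dict=True)
print(outputs.last_hidden_state.shape)  # (seq_len, batch_size, hidden_size) == (5, 1, 32)
```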

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel.get_input_embeddings()

Returns the word embeddings for the input data.

PARAMETER DESCRIPTION
self

An instance of the ChatGLMModel class.

TYPE: ChatGLMModel

RETURNS DESCRIPTION

The word embedding module (nn.Embedding) used to embed the input token IDs.

This method retrieves the word embeddings used for the input data in the ChatGLMModel. The word embeddings are a numerical representation of words that capture semantic meaning. The embeddings are trained on a large corpus of text data to capture relationships between words.

Note that this method does not modify the input embeddings. It simply returns the existing word embeddings that have been set for the model.

Example
>>> model = ChatGLMModel(config)
>>> input_embeddings = model.get_input_embeddings()
...
>>> # Perform operations on input_embeddings
...
Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def get_input_embeddings(self):
    """
    Returns the word embeddings for the input data.

    Args:
        self (ChatGLMModel): An instance of the ChatGLMModel class.

    Returns:
        nn.Embedding: The word embedding module used to embed the input token IDs.

    Raises:
        None

    This method retrieves the word embeddings used for the input data in the ChatGLMModel.
    The word embeddings are a numerical representation of words that capture semantic meaning.
    The embeddings are trained on a large corpus of text data to capture relationships between words.

    Note that this method does not modify the input embeddings. It simply returns the existing word embeddings that have been set for the model.

    Example:
        ```python
        >>> model = ChatGLMModel(config)
        >>> input_embeddings = model.get_input_embeddings()
        ...
        >>> # Perform operations on input_embeddings
        ...
        ```
    """
    return self.word_embeddings

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel.get_prompt(batch_size, dtype=mindspore.float16)

Builds the prefix (p-tuning v2) key/value cache that is used as past_key_values when the model is configured with pre_seq_len.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMModel class.

TYPE: object

batch_size

The number of prompt sequences to generate.

TYPE: int

dtype

The data type for the prompt key values. Default is mindspore.float16.

TYPE: dtype DEFAULT: float16

RETURNS DESCRIPTION

A tuple with one stacked (key, value) block per layer, each of shape (2, pre_seq_len, batch_size, num_attention_heads, head_dim), used as past_key_values.

RAISES DESCRIPTION
TypeError

If the batch_size is not an integer.

ValueError

If the batch_size is less than or equal to 0.

TypeError

If the dtype is not a valid mindspore data type.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def get_prompt(self, batch_size, dtype=mindspore.float16):
    """
    This method retrieves the prompt for generating responses in the ChatGLMModel.

    Args:
        self (object): The instance of the ChatGLMModel class.
        batch_size (int): The number of prompt sequences to generate.
        dtype (mindspore.dtype, optional): The data type for the prompt key values. Default is mindspore.float16.

    Returns:
        A tuple with one stacked (key, value) block per layer, used as past_key_values.

    Raises:
        TypeError: If the batch_size is not an integer.
        ValueError: If the batch_size is less than or equal to 0.
        TypeError: If the dtype is not a valid mindspore data type.
    """
    prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1)
    past_key_values = self.prefix_encoder(prefix_tokens).astype(dtype)
    past_key_values = past_key_values.view(
        batch_size,
        self.pre_seq_len,
        self.num_layers * 2,
        self.num_attention_heads,
        self.hidden_size // self.num_attention_heads
    )
    # seq_len, b, nh, hidden_size
    past_key_values = self.dropout(past_key_values)
    past_key_values = past_key_values.permute([2, 1, 0, 3, 4]).split(2)
    # past_key_values = [(v[0], v[1]) for v in past_key_values]
    return past_key_values
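
The reshaping above turns the flat prefix-encoder output into one stacked (key, value) block per layer. A shape-only walk-through in NumPy with toy sizes (all numbers hypothetical):

```python
import numpy as np

batch, pre_seq_len, num_layers, num_heads, head_dim = 2, 3, 2, 4, 8
flat = np.zeros((batch, pre_seq_len, num_layers * 2 * num_heads * head_dim))

kv = flat.reshape(batch, pre_seq_len, num_layers * 2, num_heads, head_dim)
kv = kv.transpose(2, 1, 0, 3, 4)              # (num_layers * 2, pre_seq_len, batch, heads, head_dim)
per_layer = np.split(kv, num_layers, axis=0)  # chunks of size 2, i.e. one (key, value) block per layer

print(len(per_layer), per_layer[0].shape)     # 2 (2, 3, 2, 4, 8)
```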

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMModel.set_input_embeddings(new_embeddings)

This method sets the input embeddings for the ChatGLMModel.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMModel class.

TYPE: ChatGLMModel

new_embeddings

The new embeddings to be set as input embeddings for the model. It should be a mindspore Tensor object.

TYPE: Tensor

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def set_input_embeddings(self, new_embeddings: mindspore.Tensor):
    """
    This method sets the input embeddings for the ChatGLMModel.

    Args:
        self (ChatGLMModel): The instance of the ChatGLMModel class.
        new_embeddings (mindspore.Tensor): The new embeddings to be set as input embeddings for the model.
            It should be a mindspore Tensor object.

    Returns:
        None.

    Raises:
        None.
    """
    self.word_embeddings = new_embeddings

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
class ChatGLMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and
    a simple interface for downloading and loading pretrained models.
    """
    config_class = ChatGLMConfig
    base_model_prefix = "transformer"
    _no_split_modules = ["GLMBlock"]
    _keys_to_ignore_on_load_unexpected = [r'inv_freq']

    def _init_weights(self, cell: nn.Module):
        """Initialize the weights."""
        return

    def get_masks(self, input_ids):
        """
        This method named 'get_masks' is defined within the class 'ChatGLMPreTrainedModel'. It takes two parameters: self and input_ids.

        Args:
            self: A reference to the instance of the class.
            input_ids: A tensor representing the input sequence of token IDs.
                It has a shape of (batch_size, seq_length) where batch_size is the number of input sequences and
                seq_length is the length of each sequence.

        Returns:
            attention_mask: A boolean tensor of shape (batch_size, 1, seq_length, seq_length) where True marks blocked positions.

        Raises:
            None.
        """
        batch_size, seq_length = input_ids.shape
        context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
        attention_mask = ops.ones((batch_size, seq_length, seq_length))
        attention_mask = attention_mask.tril()
        for i, context_length in enumerate(context_lengths):
            attention_mask[i, :, :context_length] = 1
        attention_mask = attention_mask.unsqueeze(1)
        attention_mask = (attention_mask < 0.5).bool()

        return attention_mask

    def get_position_ids(self, input_ids, mask_positions, use_gmasks=None):
        '''
        This method calculates the position ids for the given input sequence.

        Args:
            self (ChatGLMPreTrainedModel): An instance of the ChatGLMPreTrainedModel class.
            input_ids (mindspore.Tensor): A 2D tensor of shape (batch_size, seq_length) containing input sequence ids.
            mask_positions (mindspore.Tensor): A 1D tensor of shape (batch_size,) containing mask positions.
            use_gmasks (List[bool], optional): A list of length batch_size indicating whether to use global masks for
                each input sequence. Defaults to None.

        Returns:
            position_ids (mindspore.Tensor): A 2D tensor of shape (batch_size, seq_length) containing the position ids.

        Raises:
            None
        '''
        batch_size, seq_length = input_ids.shape
        if use_gmasks is None:
            use_gmasks = [False] * batch_size
        context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
        if self.position_encoding_2d:
            position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
            for i, context_length in enumerate(context_lengths):
                position_ids[i, context_length:] = mask_positions[i]
            block_position_ids = [ops.cat((
                ops.zeros(context_length, dtype=mindspore.int64),
                ops.arange(seq_length - context_length, dtype=mindspore.int64) + 1
            )) for context_length in context_lengths]
            block_position_ids = ops.stack(block_position_ids, axis=0)
            position_ids = ops.stack((position_ids, block_position_ids), axis=1)
        else:
            position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
            for i, context_length in enumerate(context_lengths):
                if not use_gmasks[i]:
                    position_ids[i, context_length:] = mask_positions[i]

        return position_ids

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMPreTrainedModel.get_masks(input_ids)

Builds the ChatGLM attention mask for a batch of input sequences: every token attends bidirectionally to the prompt (all positions before the bos token) and causally to the positions after it.

PARAMETER DESCRIPTION
self

A reference to the instance of the class.

input_ids

A tensor representing the input sequence of token IDs. It has a shape of (batch_size, seq_length) where batch_size is the number of input sequences and seq_length is the length of each sequence.

RETURNS DESCRIPTION

attention_mask: A boolean tensor of shape (batch_size, 1, seq_length, seq_length) in which True marks positions that must not be attended to.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def get_masks(self, input_ids):
    """
    This method named 'get_masks' is defined within the class 'ChatGLMPreTrainedModel'. It takes two parameters: self and input_ids.

    Args:
        self: A reference to the instance of the class.
        input_ids: A tensor representing the input sequence of token IDs.
            It has a shape of (batch_size, seq_length) where batch_size is the number of input sequences and
            seq_length is the length of each sequence.

    Returns:
        attention_mask: A boolean tensor of shape (batch_size, 1, seq_length, seq_length) where True marks blocked positions.

    Raises:
        None.
    """
    batch_size, seq_length = input_ids.shape
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
    attention_mask = ops.ones((batch_size, seq_length, seq_length))
    attention_mask = attention_mask.tril()
    for i, context_length in enumerate(context_lengths):
        attention_mask[i, :, :context_length] = 1
    attention_mask = attention_mask.unsqueeze(1)
    attention_mask = (attention_mask < 0.5).bool()

    return attention_mask
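
The resulting mask lets every token attend bidirectionally to the prompt (all positions before the bos token) and causally to positions after it, with True marking blocked positions. A toy NumPy re-implementation of the same logic (sequence length 5, bos at index 3, both hypothetical):

```python
import numpy as np

seq_length, context_length = 5, 3                   # bos_token_id found at index 3
mask = np.tril(np.ones((seq_length, seq_length)))   # causal lower triangle
mask[:, :context_length] = 1                        # full attention over the prompt tokens
blocked = mask < 0.5                                # True = do not attend, as in get_masks
print(blocked.astype(int))
# [[0 0 0 1 1]
#  [0 0 0 1 1]
#  [0 0 0 1 1]
#  [0 0 0 0 1]
#  [0 0 0 0 0]]
```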

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMPreTrainedModel.get_position_ids(input_ids, mask_positions, use_gmasks=None)

This method calculates the position ids for the given input sequence.

PARAMETER DESCRIPTION
self

An instance of the ChatGLMPreTrainedModel class.

TYPE: ChatGLMPreTrainedModel

input_ids

A 2D tensor of shape (batch_size, seq_length) containing input sequence ids.

TYPE: Tensor

mask_positions

A 1D tensor of shape (batch_size,) containing mask positions.

TYPE: Tensor

use_gmasks

A list of length batch_size indicating whether to use global masks for each input sequence. Defaults to None.

TYPE: List[bool] DEFAULT: None

RETURNS DESCRIPTION
position_ids

A 2D tensor of shape (batch_size, seq_length) containing the position ids.

TYPE: Tensor

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def get_position_ids(self, input_ids, mask_positions, use_gmasks=None):
    '''
    This method calculates the position ids for the given input sequence.

    Args:
        self (ChatGLMPreTrainedModel): An instance of the ChatGLMPreTrainedModel class.
        input_ids (mindspore.Tensor): A 2D tensor of shape (batch_size, seq_length) containing input sequence ids.
        mask_positions (mindspore.Tensor): A 1D tensor of shape (batch_size,) containing mask positions.
        use_gmasks (List[bool], optional): A list of length batch_size indicating whether to use global masks for
            each input sequence. Defaults to None.

    Returns:
        position_ids (mindspore.Tensor): A 2D tensor of shape (batch_size, seq_length) containing the position ids.

    Raises:
        None
    '''
    batch_size, seq_length = input_ids.shape
    if use_gmasks is None:
        use_gmasks = [False] * batch_size
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
    if self.position_encoding_2d:
        position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
        for i, context_length in enumerate(context_lengths):
            position_ids[i, context_length:] = mask_positions[i]
        block_position_ids = [ops.cat((
            ops.zeros(context_length, dtype=mindspore.int64),
            ops.arange(seq_length - context_length, dtype=mindspore.int64) + 1
        )) for context_length in context_lengths]
        block_position_ids = ops.stack(block_position_ids, axis=0)
        position_ids = ops.stack((position_ids, block_position_ids), axis=1)
    else:
        position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
        for i, context_length in enumerate(context_lengths):
            if not use_gmasks[i]:
                position_ids[i, context_length:] = mask_positions[i]

    return position_ids
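
With position_encoding_2d enabled, every token receives two ids: the first channel counts prompt positions and then repeats the mask position, while the second (block) channel numbers generated tokens from 1. A NumPy illustration with hypothetical indices ([gMASK] at index 2, bos at index 3):

```python
import numpy as np

seq_length, context_length, mask_position = 5, 3, 2
position_ids = np.arange(seq_length, dtype=np.int64)
position_ids[context_length:] = mask_position        # tokens after <bos> reuse the mask position
block_position_ids = np.concatenate([
    np.zeros(context_length, dtype=np.int64),
    np.arange(seq_length - context_length, dtype=np.int64) + 1,
])
print(position_ids)        # [0 1 2 2 2]
print(block_position_ids)  # [0 0 0 1 2]
```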

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration

Bases: ChatGLMPreTrainedModel

This class represents a ChatGLM model for conditional generation, inheriting from ChatGLMPreTrainedModel.

The class includes methods for initializing the model, updating model keyword arguments for generation, preparing inputs for generation, running the forward pass, reordering the cache for beam search or beam sampling, processing model responses, and facilitating chat interactions. It also provides methods for streaming chat and streaming generation.

The model allows for quantization with a specified number of bits.

METHOD DESCRIPTION
__init__

Initializes the model with a ChatGLMConfig.

get_output_embeddings

Returns the output embeddings.

set_output_embeddings

Sets new output embeddings.

_update_model_kwargs_for_generation

Updates model keyword arguments for generation.

prepare_inputs_for_generation

Prepares inputs for model generation.

forward

Constructs the model for generation and computes the loss if labels are provided.

_reorder_cache

Reorders the past_key_values cache for beam search or beam sample.

process_response

Processes the model response by replacing tokens and punctuations.

chat

Conducts a chat interaction based on the query and history.

stream_chat

Conducts a streaming chat interaction for continuous conversations.

stream_generate

Generates text in a streaming fashion based on input ids and generation configuration.

quantize

Quantizes the model with a specified number of bits.

For a detailed understanding of the class functionality and methods, refer to the specific method descriptions.
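
A minimal chat sketch follows. It assumes chat keeps the original ChatGLM-6B interface (query and history in, response and updated history out), that the auto tokenizer resolves this checkpoint, and that quantize returns the quantized model; treat these signatures as assumptions:

```python
from mindnlp.transformers import AutoTokenizer
from mindnlp.transformers.models.chatglm.modeling_chatglm import ChatGLMForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b")
model = ChatGLMForConditionalGeneration.from_pretrained("THUDM/chatglm-6b")
# model = model.quantize(4)  # optional weight quantization (assumed usage)

response, history = model.chat(tokenizer, "Hello, who are you?", history=[])
response, history = model.chat(tokenizer, "Summarize that in one sentence.", history=history)
print(response)
```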

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):

    """
    This class represents a ChatGLM model for conditional generation, inheriting from ChatGLMPreTrainedModel.

    The class includes methods for initializing the model, updating model keyword arguments for generation,
    preparing inputs for generation, forwarding the model, reordering cache for beam search or beam sample,
    processing model responses, and facilitating chat interactions. It also provides methods for streaming chat and generation.

    The model allows for quantization with a specified number of bits.

    Methods:
        __init__: Initializes the model with a ChatGLMConfig.
        get_output_embeddings: Returns the output embeddings.
        set_output_embeddings: Sets new output embeddings.
        _update_model_kwargs_for_generation: Updates model keyword arguments for generation.
        prepare_inputs_for_generation: Prepares inputs for model generation.
        forward: Constructs the model for generation and computes the loss if labels are provided.
        _reorder_cache: Reorders the past_key_values cache for beam search or beam sample.
        process_response: Processes the model response by replacing tokens and punctuations.
        chat: Conducts a chat interaction based on the query and history.
        stream_chat: Conducts a streaming chat interaction for continuous conversations.
        stream_generate: Generates text in a streaming fashion based on input ids and generation configuration.
        quantize: Quantizes the model with a specified number of bits.

    For a detailed understanding of the class functionality and methods, refer to the specific method descriptions.
    """
    def __init__(self, config: ChatGLMConfig):
        """
        Initializes the ChatGLMForConditionalGeneration class.

        Args:
            self: The object instance itself.
            config (ChatGLMConfig): An instance of ChatGLMConfig containing configuration parameters for the model.

        Returns:
            None.

        Raises:
            TypeError: If the config parameter is not of type ChatGLMConfig.
            ValueError: If the config parameter is missing required attributes.
            AttributeError: If the config object does not have certain expected attributes.
        """
        super().__init__(config)
        # self.hidden_size = config.hidden_size
        # self.params_dtype = mindspore.float16
        # self.vocab_size = config.vocab_size
        self.max_sequence_length = config.max_sequence_length

        self.position_encoding_2d = config.position_encoding_2d

        self.transformer = ChatGLMModel(config)

        self.lm_head = nn.Linear(
            config.hidden_size,
            config.vocab_size,
            bias=False,
            dtype=mindspore.float16
        )

        self.config = config

        self.quantized = False

        if self.config.quantization_bit:
            self.quantize(self.config.quantization_bit)

    def get_output_embeddings(self):
        """
        Get the output embeddings for the ChatGLM model.

        Args:
            self: The instance of the ChatGLMForConditionalGeneration class.

        Returns:
            The output embeddings for the language model head.

        Raises:
            None.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Method to set new output embeddings for the ChatGLMForConditionalGeneration model.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            new_embeddings (Any): The new output embeddings to be set for the model. This can be of any type.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def _update_model_kwargs_for_generation(
        self,
        outputs: ModelOutput,
        model_kwargs: Dict[str, Any],
        is_encoder_decoder: bool = False,
        standardize_cache_format: bool = False,
    ) -> Dict[str, Any]:
        """
        Updates the model keyword arguments for generation.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            outputs (ModelOutput): The model output.
            model_kwargs (Dict[str, Any]): The keyword arguments for the model.
            is_encoder_decoder (bool, optional): Indicates if the model is an encoder-decoder model. Defaults to False.
            standardize_cache_format (bool, optional): Indicates if the cache format should be standardized. Defaults to False.

        Returns:
            Dict[str, Any]: The updated model keyword arguments.

        Raises:
            None.
        """
        # update past_key_values
        model_kwargs["past_key_values"] = self._extract_past_from_model_output(
            outputs, standardize_cache_format=standardize_cache_format
        )

        # update attention mask
        if "attention_mask" in model_kwargs:
            attention_mask = model_kwargs["attention_mask"]
            if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
                attention_mask = ops.cat(
                    [attention_mask, attention_mask.new_ones((*attention_mask.shape[:3], 1))], axis=3)
                new_attention_mask = attention_mask[:, :, -1:].copy()
                new_attention_mask[..., -1] = False
                model_kwargs["attention_mask"] = ops.cat(
                    [attention_mask, new_attention_mask], axis=2
                )

        # update position ids
        if "position_ids" in model_kwargs:
            position_ids = model_kwargs["position_ids"]
            new_position_id = position_ids[..., -1:].copy()
            new_position_id[:, 1, :] += 1
            model_kwargs["position_ids"] = ops.cat(
                [position_ids, new_position_id], axis=-1
            )

        return model_kwargs

    def prepare_inputs_for_generation(
            self,
            input_ids: mindspore.Tensor,
            past: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            **kwargs
    ) -> dict:
        """
        This method prepares inputs for generation in the ChatGLMForConditionalGeneration class.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            input_ids (mindspore.Tensor): The input tensor containing the token IDs for the model input.
            past (Optional[mindspore.Tensor]): Optional tensor containing the past states for autoregressive generation.
            past_key_values (Optional[mindspore.Tensor]): Optional tensor containing past key values for efficient decoding.
            attention_mask (Optional[mindspore.Tensor]): Optional tensor specifying which elements in the input should be attended to.
            position_ids (Optional[mindspore.Tensor]): Optional tensor specifying the position IDs for input tokens.

        Returns:
            dict: A dictionary containing the prepared inputs for generation including 'input_ids', 'past_key_values',
                'position_ids', and 'attention_mask'.

        Raises:
            TypeError: If the input arguments are of incorrect types.
            ValueError: If there are issues with the input data or configuration.
            IndexError: If there are indexing errors while processing the input data.
            Warning: If there are warnings related to the attention mask data type.
        """
        _, seq_length = input_ids.shape
        MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
        seqs = input_ids.tolist()
        mask_positions, use_gmasks = [], []
        for seq in seqs:
            mask_token = gMASK if gMASK in seq else MASK
            use_gmask = mask_token == gMASK
            mask_positions.append(seq.index(mask_token))
            use_gmasks.append(use_gmask)

        # only last token for input_ids if past is not None
        if past is not None or past_key_values is not None:
            last_token = input_ids[:, -1].unsqueeze(-1)
            if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
                attention_mask = attention_mask[:, :, -1:]
            else:
                attention_mask = None
            if position_ids is not None:
                position_ids = position_ids[..., -1:]
            else:
                context_lengths = [seq.index(self.config.bos_token_id) for seq in seqs]
                if self.position_encoding_2d:
                    position_ids = mindspore.tensor(
                        [[mask_position, seq_length - context_length] for mask_position, context_length in
                         zip(mask_positions, context_lengths)], dtype=mindspore.int64).unsqueeze(-1)
                else:
                    position_ids = mindspore.tensor(mask_positions, dtype=mindspore.int64).unsqueeze(-1)

            if past is None:
                past = past_key_values
            return {
                "input_ids": last_token,
                "past_key_values": past,
                "position_ids": position_ids,
                "attention_mask": attention_mask
            }
        else:
            if attention_mask is not None and attention_mask.dtype != mindspore.bool_:
                logger.warning_once(f"The dtype of attention mask ({attention_mask.dtype}) is not bool")
                attention_mask = None
            if attention_mask is None:
                attention_mask = self.get_masks(
                    input_ids,
                )
            if position_ids is None:
                position_ids = self.get_position_ids(
                    input_ids,
                    mask_positions=mask_positions,
                    use_gmasks=use_gmasks
                )

            return {
                "input_ids": input_ids,
                "past_key_values": past,
                "position_ids": position_ids,
                "attention_mask": attention_mask
            }

    def forward(
            self,
            input_ids: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[Tuple[mindspore.Tensor]] = None,
            inputs_embeds: Optional[mindspore.Tensor] = None,
            labels: Optional[mindspore.Tensor] = None,
            use_cache: Optional[bool] = None,
            output_attentions: Optional[bool] = None,
            output_hidden_states: Optional[bool] = None,
            return_dict: Optional[bool] = None,
    ):
        """
        Runs the forward pass of the ChatGLMForConditionalGeneration model and optionally computes the language modeling loss.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length) containing the input IDs.
            position_ids (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length) containing the position IDs.
            attention_mask (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length) containing the attention mask.
            past_key_values (Optional[Tuple[mindspore.Tensor]]):
                The input tensor of shape (batch_size, sequence_length) containing the past key values.
            inputs_embeds (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length, embedding_size) containing the embedded inputs.
            labels (Optional[mindspore.Tensor]):
                The input tensor of shape (batch_size, sequence_length) containing the labels.
            use_cache (Optional[bool]):
                Whether to use cache or not. If not provided, defaults to the value specified in the model's configuration.
            output_attentions (Optional[bool]): Whether to output attentions or not.
            output_hidden_states (Optional[bool]): Whether to output hidden states or not.
            return_dict (Optional[bool]):
                Whether to return a dictionary or not. If not provided, defaults to the value specified in the model's configuration.

        Returns:
            `CausalLMOutputWithPast` when `return_dict` is True; otherwise a tuple of the logits and the remaining
            transformer outputs, prefixed with the loss when `labels` is provided.

        Raises:
            None.
        """
        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        transformer_outputs = self.transformer(
            input_ids=input_ids,
            position_ids=position_ids,
            attention_mask=attention_mask,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        hidden_states = transformer_outputs[0]

        lm_logits = self.lm_head(hidden_states).permute(1, 0, 2)
        loss = None
        if labels is not None:
            lm_logits = lm_logits.to(mindspore.float32)

            # Shift so that tokens < n predict n
            shift_logits = lm_logits[..., :-1, :]
            shift_labels = labels[..., 1:]
            # Flatten the tokens
            loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1), ignore_index=-100)

            lm_logits = lm_logits.to(hidden_states.dtype)
            loss = loss.to(hidden_states.dtype)

        if not return_dict:
            output = (lm_logits,) + transformer_outputs[1:]
            return ((loss,) + output) if loss is not None else output

        return CausalLMOutputWithPast(
            loss=loss,
            logits=lm_logits,
            past_key_values=transformer_outputs.past_key_values,
            hidden_states=transformer_outputs.hidden_states,
            attentions=transformer_outputs.attentions,
        )

    @staticmethod
    def _reorder_cache(
            past: Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...], beam_idx: mindspore.Tensor
    ) -> Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]:
        """
        This function is used to re-order the `past_key_values` cache if [`~PreTrainedModel.beam_search`] or
        [`~PreTrainedModel.beam_sample`] is called. This is required to match `past_key_values` with the correct
        beam_idx at every generation step.

        Output shares the same memory storage as `past`.
        """
        return tuple(
            (
                layer_past[0].index_select(1, beam_idx),
                layer_past[1].index_select(1, beam_idx),
            )
            for layer_past in past
        )

    def process_response(self, response):
        """
        Processes the response received from the model.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            response (str): The response received from the model.

        Returns:
            str: The processed response, with the training-time placeholder replaced and half-width punctuation
                adjacent to Chinese characters converted to full-width.

        Raises:
            None.
        """
        response = response.strip()
        response = response.replace("[[训练时间]]", "2023年")
        punkts = [
            [",", ","],
            ["!", "!"],
            [":", ":"],
            [";", ";"],
            ["\?", "?"],
        ]
        for item in punkts:
            response = re.sub(r"([\u4e00-\u9fff])%s" % item[0], r"\1%s" % item[1], response)
            response = re.sub(r"%s([\u4e00-\u9fff])" % item[0], r"%s\1" % item[1], response)
        return response

    def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048, num_beams=1,
             do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
        """
        Generates a response to `query` with the conditional generation model, optionally conditioning on the
        previous (query, response) turns in `history`.

        Args:
            self: The instance of the class.
            tokenizer: An instance of a tokenizer that will be used to encode the prompt and decode the generated response.
            query (str): The input query for which a response needs to be generated.
            history (List[Tuple[str, str]], optional):
                A list of tuples containing the previous queries and their corresponding responses. Defaults to None.
            max_length (int, optional): The maximum length of the generated response. Defaults to 2048.
            num_beams (int, optional): Number of beams for beam search. Defaults to 1.
            do_sample (bool, optional): Flag indicating whether to use sampling for generating the response. Defaults to True.
            top_p (float, optional): The nucleus sampling top probability. Defaults to 0.7.
            temperature (float, optional): The temperature parameter for sampling. Defaults to 0.95.
            logits_processor (object, optional): An object for processing the logits. Defaults to None.
            **kwargs: Additional keyword arguments for model generation.

        Returns:
            Tuple[str, List[Tuple[str, str]]]:
                A tuple `(response, history)` containing the generated response and the conversation history
                extended with the new (query, response) pair.

        Raises:
            None:
                This method does not explicitly raise any exceptions.
                However, the behavior of the method may be influenced by exceptions raised by the tokenizer or
                the conditional generation model used within the method.
        """
        if history is None:
            history = []
        if logits_processor is None:
            logits_processor = LogitsProcessorList()
        logits_processor.append(InvalidScoreLogitsProcessor())
        gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample, "top_p": top_p,
                      "temperature": temperature, "logits_processor": logits_processor, **kwargs}
        if not history:
            prompt = query
        else:
            prompt = ""
            for i, (old_query, response) in enumerate(history):
                prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
            prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
        inputs = tokenizer([prompt], return_tensors="ms")
        outputs = self.generate(**inputs, **gen_kwargs)

        outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
        response = tokenizer.decode(outputs)
        response = self.process_response(response)
        history = history + [(query, response)]
        return response, history

    def stream_chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048,
                    do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
        """
        Stream chat method for generating responses based on a given query and history.

        Args:
            self (ChatGLMForConditionalGeneration): An instance of the ChatGLMForConditionalGeneration class.
            tokenizer: The tokenizer used for tokenizing the input text.
            query (str): The query string for which a response is generated.
            history (List[Tuple[str, str]], optional):
                A list of tuples containing the previous queries and their responses. Defaults to None.
            max_length (int, optional): The maximum length of the generated response. Defaults to 2048.
            do_sample (bool, optional): Whether to use sampling for generating response. Defaults to True.
            top_p (float, optional): The cumulative probability threshold for top-p sampling. Defaults to 0.7.
            temperature (float, optional): The temperature value used for sampling. Defaults to 0.95.
            logits_processor (object, optional):
                An object used for processing logits during response generation. Defaults to None.

        Returns:
            A generator yielding `(response, new_history)` tuples as the response is incrementally generated.

        Raises:
            None
        """
        if history is None:
            history = []
        if logits_processor is None:
            logits_processor = LogitsProcessorList()
        logits_processor.append(InvalidScoreLogitsProcessor())
        gen_kwargs = {"max_length": max_length, "do_sample": do_sample, "top_p": top_p,
                      "temperature": temperature, "logits_processor": logits_processor, **kwargs}
        if not history:
            prompt = query
        else:
            prompt = ""
            for i, (old_query, response) in enumerate(history):
                prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
            prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
        inputs = tokenizer([prompt], return_tensors="ms")
        for outputs in self.stream_generate(**inputs, **gen_kwargs):
            outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
            response = tokenizer.decode(outputs)
            response = self.process_response(response)
            new_history = history + [(query, response)]
            yield response, new_history

    def stream_generate(
            self,
            input_ids,
            generation_config: Optional[GenerationConfig] = None,
            logits_processor: Optional[LogitsProcessorList] = None,
            stopping_criteria: Optional[StoppingCriteriaList] = None,
            prefix_allowed_tokens_fn: Optional[Callable[[int, mindspore.Tensor], List[int]]] = None,
            **kwargs,
    ):
        """
        Generates text using the ChatGLM model.

        Args:
            self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
            input_ids (mindspore.Tensor): The input tensor containing the tokenized input sequence.
            generation_config (Optional[GenerationConfig], optional): The configuration for text generation. Defaults to None.
            logits_processor (Optional[LogitsProcessorList], optional): The processor for modifying the logits. Defaults to None.
            stopping_criteria (Optional[StoppingCriteriaList], optional): The criteria for stopping the generation. Defaults to None.
            prefix_allowed_tokens_fn (Optional[Callable[[int, mindspore.Tensor], List[int]]], optional):
                A function that returns the list of allowed tokens for each prefix. Defaults to None.

        Returns:
            A generator yielding the full `input_ids` tensor (prompt plus generated tokens) after each decoding step.

        Raises:
            UserWarning: If both `max_new_tokens` and `max_length` are set, `max_new_tokens` takes precedence.
            UserWarning: If the input length exceeds the `max_length` limit, it may cause unexpected behavior.
            Other exceptions: Any other exceptions that may occur during the execution of the method.
        """
        _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]

        if generation_config is None:
            generation_config = self.generation_config
        generation_config = copy.deepcopy(generation_config)
        model_kwargs = generation_config.update(**kwargs)
        _, eos_token_id = generation_config.bos_token_id, generation_config.eos_token_id

        if isinstance(eos_token_id, int):
            eos_token_id = [eos_token_id]

        has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
        if has_default_max_length and generation_config.max_new_tokens is None:
            warnings.warn(
                f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. "
                "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we"
                " recommend using `max_new_tokens` to control the maximum length of the generation.",
                UserWarning,
            )
        elif generation_config.max_new_tokens is not None:
            generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
            if not has_default_max_length:
                logger.warn(
                    f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
                    f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
                    "Please refer to the documentation for more information. "
                    "(https://hf-mirror.com/docs/transformers/main/en/main_classes/text_generation)",
                    UserWarning,
                )

        if input_ids_seq_length >= generation_config.max_length:
            input_ids_string = "decoder_input_ids" if self.config.is_encoder_decoder else "input_ids"
            logger.warning(
                f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
                f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
                " increasing `max_new_tokens`."
            )

        # 2. Set generation parameters if not already defined
        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

        logits_processor = self._get_logits_processor(
            generation_config=generation_config,
            input_ids_seq_length=input_ids_seq_length,
            encoder_input_ids=input_ids,
            prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
            logits_processor=logits_processor,
        )

        stopping_criteria = self._get_stopping_criteria(
            generation_config=generation_config, stopping_criteria=stopping_criteria
        )
        logits_warper = self._get_logits_warper(generation_config)

        unfinished_sequences = ops.ones(input_ids.shape[0], dtype=input_ids.dtype)
        scores = None
        while True:
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
            # forward pass to get next token
            outputs = self(
                **model_inputs,
                return_dict=True,
                output_attentions=False,
                output_hidden_states=False,
            )

            next_token_logits = outputs.logits[:, -1, :]

            # pre-process distribution
            next_token_scores = logits_processor(input_ids, next_token_logits)
            next_token_scores = logits_warper(input_ids, next_token_scores)

            # sample
            probs = ops.softmax(next_token_scores, axis=-1)
            if generation_config.do_sample:
                next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
            else:
                next_tokens = ops.argmax(probs, dim=-1)

            # update generated ids, model inputs, and length for next step
            input_ids = ops.cat([input_ids, next_tokens[:, None]], axis=-1)
            model_kwargs = self._update_model_kwargs_for_generation(
                outputs, model_kwargs, is_encoder_decoder=self.config.is_encoder_decoder
            )
            unfinished_sequences = unfinished_sequences.mul((sum(next_tokens != i for i in eos_token_id)).long())

            # stop when each sentence is finished, or if we exceed the maximum length
            if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
                break
            yield input_ids

    def quantize(self, bits: int, **kwargs):
        """
        Quantizes the model weights to the given bit width.

        Args:
            self (ChatGLMForConditionalGeneration): An instance of the ChatGLMForConditionalGeneration class.
            bits (int): The number of bits to quantize the data to. Must be a positive integer.

        Returns:
            None.

        Raises:
            None.
        """

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.__init__(config)

Initializes the ChatGLMForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The object instance itself.

config

An instance of ChatGLMConfig containing configuration parameters for the model.

TYPE: ChatGLMConfig

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
TypeError

If the config parameter is not of type ChatGLMConfig.

ValueError

If the config parameter is missing required attributes.

AttributeError

If the config object does not have certain expected attributes.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def __init__(self, config: ChatGLMConfig):
    """
    Initializes the ChatGLMForConditionalGeneration class.

    Args:
        self: The object instance itself.
        config (ChatGLMConfig): An instance of ChatGLMConfig containing configuration parameters for the model.

    Returns:
        None.

    Raises:
        TypeError: If the config parameter is not of type ChatGLMConfig.
        ValueError: If the config parameter is missing required attributes.
        AttributeError: If the config object does not have certain expected attributes.
    """
    super().__init__(config)
    # self.hidden_size = config.hidden_size
    # self.params_dtype = mindspore.float16
    # self.vocab_size = config.vocab_size
    self.max_sequence_length = config.max_sequence_length

    self.position_encoding_2d = config.position_encoding_2d

    self.transformer = ChatGLMModel(config)

    self.lm_head = nn.Linear(
        config.hidden_size,
        config.vocab_size,
        bias=False,
        dtype=mindspore.float16
    )

    self.config = config

    self.quantized = False

    if self.config.quantization_bit:
        self.quantize(self.config.quantization_bit)
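
For a randomly initialized model (for example, for testing), the class can also be built directly from a configuration object. The following sketch uses shrunken, made-up hyperparameters; the keyword names follow the `ChatGLMConfig` arguments documented later on this page and should be treated as assumptions.

```python
# Hedged sketch: tiny illustrative sizes; the defaults would build the full 6B-scale model.
from mindnlp.transformers.models.chatglm.configuration_chatglm import ChatGLMConfig
from mindnlp.transformers.models.chatglm.modeling_chatglm import ChatGLMForConditionalGeneration

config = ChatGLMConfig(
    vocab_size=1024,          # assumption: reduced from the 150528 default
    hidden_size=128,
    num_attention_heads=4,
    inner_hidden_size=512,
    max_sequence_length=128,
)
model = ChatGLMForConditionalGeneration(config)
print(model.lm_head)  # Linear(hidden_size -> vocab_size, bias=False, float16)
```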

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.chat(tokenizer, query, history=None, max_length=2048, num_beams=1, do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs)

Generates a response to a given query with the conditional generation model, optionally conditioning on the previous (query, response) turns in history.

PARAMETER DESCRIPTION
self

The instance of the class.

tokenizer

An instance of a tokenizer that will be used to encode the prompt and decode the generated response.

query

The input query for which a response needs to be generated.

TYPE: str

history

A list of tuples containing the previous queries and their corresponding responses. Defaults to None.

TYPE: List[Tuple[str, str]] DEFAULT: None

max_length

The maximum length of the generated response. Defaults to 2048.

TYPE: int DEFAULT: 2048

num_beams

Number of beams for beam search. Defaults to 1.

TYPE: int DEFAULT: 1

do_sample

Flag indicating whether to use sampling for generating the response. Defaults to True.

TYPE: bool DEFAULT: True

top_p

The nucleus sampling top probability. Defaults to 0.7.

TYPE: float DEFAULT: 0.7

temperature

The temperature parameter for sampling. Defaults to 0.95.

TYPE: float DEFAULT: 0.95

logits_processor

An object for processing the logits. Defaults to None.

TYPE: object DEFAULT: None

**kwargs

Additional keyword arguments for model generation.

DEFAULT: {}

RETURNS DESCRIPTION
Tuple[str, List[Tuple[str, str]]]

A tuple (response, history) containing the generated response and the conversation history extended with the new (query, response) pair.

RAISES DESCRIPTION
None

This method does not explicitly raise any exceptions. However, the behavior of the method may be influenced by exceptions raised by the tokenizer or the conditional generation model used within the method.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048, num_beams=1,
         do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
    """
    Generates a response to `query` with the conditional generation model, optionally conditioning on the
    previous (query, response) turns in `history`.

    Args:
        self: The instance of the class.
        tokenizer: An instance of a tokenizer that will be used to encode the prompt and decode the generated response.
        query (str): The input query for which a response needs to be generated.
        history (List[Tuple[str, str]], optional):
            A list of tuples containing the previous queries and their corresponding responses. Defaults to None.
        max_length (int, optional): The maximum length of the generated response. Defaults to 2048.
        num_beams (int, optional): Number of beams for beam search. Defaults to 1.
        do_sample (bool, optional): Flag indicating whether to use sampling for generating the response. Defaults to True.
        top_p (float, optional): The nucleus sampling top probability. Defaults to 0.7.
        temperature (float, optional): The temperature parameter for sampling. Defaults to 0.95.
        logits_processor (object, optional): An object for processing the logits. Defaults to None.
        **kwargs: Additional keyword arguments for model generation.

    Returns:
        Tuple[str, List[Tuple[str, str]]]:
            A tuple `(response, history)` containing the generated response and the conversation history
            extended with the new (query, response) pair.

    Raises:
        None:
            This method does not explicitly raise any exceptions.
            However, the behavior of the method may be influenced by exceptions raised by the tokenizer or
            the conditional generation model used within the method.
    """
    if history is None:
        history = []
    if logits_processor is None:
        logits_processor = LogitsProcessorList()
    logits_processor.append(InvalidScoreLogitsProcessor())
    gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample, "top_p": top_p,
                  "temperature": temperature, "logits_processor": logits_processor, **kwargs}
    if not history:
        prompt = query
    else:
        prompt = ""
        for i, (old_query, response) in enumerate(history):
            prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
        prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    inputs = tokenizer([prompt], return_tensors="ms")
    outputs = self.generate(**inputs, **gen_kwargs)

    outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
    response = tokenizer.decode(outputs)
    response = self.process_response(response)
    history = history + [(query, response)]
    return response, history
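
For reference, the multi-turn prompt that `chat()` assembles before calling `generate()` can be reproduced standalone. The helper below simply mirrors the formatting loop in the method above; the function name is ours, not part of the API.

```python
# Mirrors the prompt construction in chat(); no model or tokenizer required.
from typing import List, Tuple

def build_chatglm_prompt(query: str, history: List[Tuple[str, str]]) -> str:
    if not history:
        return query
    prompt = ""
    for i, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
    prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    return prompt

history = [("你好", "你好!有什么可以帮你?")]
print(build_chatglm_prompt("今天天气怎么样?", history))
# [Round 0]
# 问:你好
# 答:你好!有什么可以帮你?
# [Round 1]
# 问:今天天气怎么样?
# 答:
```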

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.forward(input_ids=None, position_ids=None, attention_mask=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None)

Runs the forward pass of the ChatGLMForConditionalGeneration model and optionally computes the language modeling loss.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

input_ids

The input tensor of shape (batch_size, sequence_length) containing the input IDs.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

The input tensor of shape (batch_size, sequence_length) containing the position IDs.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

The input tensor of shape (batch_size, sequence_length) containing the attention mask.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

The input tensor of shape (batch_size, sequence_length) containing the past key values.

TYPE: Optional[Tuple[Tensor]] DEFAULT: None

inputs_embeds

The input tensor of shape (batch_size, sequence_length, embedding_size) containing the embedded inputs.

TYPE: Optional[Tensor] DEFAULT: None

labels

The input tensor of shape (batch_size, sequence_length) containing the labels.

TYPE: Optional[Tensor] DEFAULT: None

use_cache

Whether to use cache or not. If not provided, defaults to the value specified in the model's configuration.

TYPE: Optional[bool] DEFAULT: None

output_attentions

Whether to output attentions or not.

TYPE: Optional[bool] DEFAULT: None

output_hidden_states

Whether to output hidden states or not.

TYPE: Optional[bool] DEFAULT: None

return_dict

Whether to return a dictionary or not. If not provided, defaults to the value specified in the model's configuration.

TYPE: Optional[bool] DEFAULT: None

RETURNS DESCRIPTION

CausalLMOutputWithPast when return_dict is True; otherwise a tuple of the logits and the remaining transformer outputs, prefixed with the loss when labels is provided.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[mindspore.Tensor]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
        labels: Optional[mindspore.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
):
    """
    Runs the forward pass of the ChatGLMForConditionalGeneration model and optionally computes the language modeling loss.

    Args:
        self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length) containing the input IDs.
        position_ids (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length) containing the position IDs.
        attention_mask (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length) containing the attention mask.
        past_key_values (Optional[Tuple[mindspore.Tensor]]):
            The input tensor of shape (batch_size, sequence_length) containing the past key values.
        inputs_embeds (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length, embedding_size) containing the embedded inputs.
        labels (Optional[mindspore.Tensor]):
            The input tensor of shape (batch_size, sequence_length) containing the labels.
        use_cache (Optional[bool]):
            Whether to use cache or not. If not provided, defaults to the value specified in the model's configuration.
        output_attentions (Optional[bool]): Whether to output attentions or not.
        output_hidden_states (Optional[bool]): Whether to output hidden states or not.
        return_dict (Optional[bool]):
            Whether to return a dictionary or not. If not provided, defaults to the value specified in the model's configuration.

    Returns:
        `CausalLMOutputWithPast` when `return_dict` is True; otherwise a tuple of the logits and the remaining
        transformer outputs, prefixed with the loss when `labels` is provided.

    Raises:
        None.
    """
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    transformer_outputs = self.transformer(
        input_ids=input_ids,
        position_ids=position_ids,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    hidden_states = transformer_outputs[0]

    lm_logits = self.lm_head(hidden_states).permute(1, 0, 2)
    loss = None
    if labels is not None:
        lm_logits = lm_logits.to(mindspore.float32)

        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        # Flatten the tokens
        loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1), ignore_index=-100)

        lm_logits = lm_logits.to(hidden_states.dtype)
        loss = loss.to(hidden_states.dtype)

    if not return_dict:
        output = (lm_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=lm_logits,
        past_key_values=transformer_outputs.past_key_values,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )
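
A hedged sketch of supervised use of `forward()`: when `labels` are provided, the method shifts logits and labels by one position and computes a cross-entropy that ignores positions labeled `-100`, so `outputs.loss` can be used directly for training. The tokenizer call and tensor contents below are illustrative, assuming the objects from the earlier quick-start sketch.

```python
# Assumes `tokenizer` and `model` from the quick-start sketch above.
inputs = tokenizer(["你好,很高兴认识你"], return_tensors="ms")
input_ids = inputs["input_ids"]

# Plain language modeling: labels are the input ids themselves; set any position
# that should not contribute to the loss (prompt/padding) to -100, matching the
# ignore_index used in forward().
labels = input_ids.copy()

outputs = model(input_ids=input_ids, labels=labels, return_dict=True)
print(outputs.loss)          # scalar LM loss
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```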

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.get_output_embeddings()

Get the output embeddings for the ChatGLM model.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

RETURNS DESCRIPTION

The output embeddings for the language model head.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def get_output_embeddings(self):
    """
    Get the output embeddings for the ChatGLM model.

    Args:
        self: The instance of the ChatGLMForConditionalGeneration class.

    Returns:
        The output embeddings for the language model head.

    Raises:
        None.
    """
    return self.lm_head

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.prepare_inputs_for_generation(input_ids, past=None, past_key_values=None, attention_mask=None, position_ids=None, **kwargs)

This method prepares inputs for generation in the ChatGLMForConditionalGeneration class.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

input_ids

The input tensor containing the token IDs for the model input.

TYPE: Tensor

past

Optional tensor containing the past states for autoregressive generation.

TYPE: Optional[Tensor] DEFAULT: None

past_key_values

Optional tensor containing past key values for efficient decoding.

TYPE: Optional[Tensor] DEFAULT: None

attention_mask

Optional tensor specifying which elements in the input should be attended to.

TYPE: Optional[Tensor] DEFAULT: None

position_ids

Optional tensor specifying the position IDs for input tokens.

TYPE: Optional[Tensor] DEFAULT: None

RETURNS DESCRIPTION
dict

A dictionary containing the prepared inputs for generation including 'input_ids', 'past_key_values', 'position_ids', and 'attention_mask'.

TYPE: dict

RAISES DESCRIPTION
TypeError

If the input arguments are of incorrect types.

ValueError

If there are issues with the input data or configuration.

IndexError

If there are indexing errors while processing the input data.

Warning

If there are warnings related to the attention mask data type.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def prepare_inputs_for_generation(
        self,
        input_ids: mindspore.Tensor,
        past: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        **kwargs
) -> dict:
    """
    This method prepares inputs for generation in the ChatGLMForConditionalGeneration class.

    Args:
        self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
        input_ids (mindspore.Tensor): The input tensor containing the token IDs for the model input.
        past (Optional[mindspore.Tensor]): Optional tensor containing the past states for autoregressive generation.
        past_key_values (Optional[mindspore.Tensor]): Optional tensor containing past key values for efficient decoding.
        attention_mask (Optional[mindspore.Tensor]): Optional tensor specifying which elements in the input should be attended to.
        position_ids (Optional[mindspore.Tensor]): Optional tensor specifying the position IDs for input tokens.

    Returns:
        dict: A dictionary containing the prepared inputs for generation including 'input_ids', 'past_key_values',
            'position_ids', and 'attention_mask'.

    Raises:
        TypeError: If the input arguments are of incorrect types.
        ValueError: If there are issues with the input data or configuration.
        IndexError: If there are indexing errors while processing the input data.
        Warning: If there are warnings related to the attention mask data type.
    """
    _, seq_length = input_ids.shape
    MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
    seqs = input_ids.tolist()
    mask_positions, use_gmasks = [], []
    for seq in seqs:
        mask_token = gMASK if gMASK in seq else MASK
        use_gmask = mask_token == gMASK
        mask_positions.append(seq.index(mask_token))
        use_gmasks.append(use_gmask)

    # only last token for input_ids if past is not None
    if past is not None or past_key_values is not None:
        last_token = input_ids[:, -1].unsqueeze(-1)
        if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
            attention_mask = attention_mask[:, :, -1:]
        else:
            attention_mask = None
        if position_ids is not None:
            position_ids = position_ids[..., -1:]
        else:
            context_lengths = [seq.index(self.config.bos_token_id) for seq in seqs]
            if self.position_encoding_2d:
                position_ids = mindspore.tensor(
                    [[mask_position, seq_length - context_length] for mask_position, context_length in
                     zip(mask_positions, context_lengths)], dtype=mindspore.int64).unsqueeze(-1)
            else:
                position_ids = mindspore.tensor(mask_positions, dtype=mindspore.int64).unsqueeze(-1)

        if past is None:
            past = past_key_values
        return {
            "input_ids": last_token,
            "past_key_values": past,
            "position_ids": position_ids,
            "attention_mask": attention_mask
        }
    else:
        if attention_mask is not None and attention_mask.dtype != mindspore.bool_:
            logger.warning_once(f"The dtype of attention mask ({attention_mask.dtype}) is not bool")
            attention_mask = None
        if attention_mask is None:
            attention_mask = self.get_masks(
                input_ids,
            )
        if position_ids is None:
            position_ids = self.get_position_ids(
                input_ids,
                mask_positions=mask_positions,
                use_gmasks=use_gmasks
            )

        return {
            "input_ids": input_ids,
            "past_key_values": past,
            "position_ids": position_ids,
            "attention_mask": attention_mask
        }
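
The position ids built in the cached-decoding branch above can be traced by hand. The toy token ids below are invented stand-ins for `config.gmask_token_id` and `config.bos_token_id`; only the index arithmetic follows the method.

```python
# Toy walk-through of the 2D position ids used once past key values exist.
gmask_token_id, bos_token_id = 130001, 130004   # assumed ids, illustration only

# Prompt tokens, then [gMASK], then <bos>, then one already generated token.
seq = [5, 6, 7, gmask_token_id, bos_token_id, 42]
seq_length = len(seq)

mask_position = seq.index(gmask_token_id)    # 3: where the mask token sits
context_length = seq.index(bos_token_id)     # 4: prompt length up to <bos>

# position_encoding_2d=True -> [absolute position, block position] for the next step.
position_ids = [mask_position, seq_length - context_length]
print(position_ids)  # [3, 2]: fixed at the mask position, block position keeps growing
```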

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.process_response(response)

Processes the response received from the model.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

response

The response received from the model.

TYPE: str

RETURNS DESCRIPTION

The processed response string, with the training-time placeholder replaced and half-width punctuation adjacent to Chinese characters converted to full-width.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def process_response(self, response):
    """
    Processes the response received from the model.

    Args:
        self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
        response (str): The response received from the model.

    Returns:
        str: The processed response, with the training-time placeholder replaced and half-width punctuation
            adjacent to Chinese characters converted to full-width.

    Raises:
        None.
    """
    response = response.strip()
    response = response.replace("[[训练时间]]", "2023年")
    punkts = [
        [",", ","],
        ["!", "!"],
        [":", ":"],
        [";", ";"],
        ["\?", "?"],
    ]
    for item in punkts:
        response = re.sub(r"([\u4e00-\u9fff])%s" % item[0], r"\1%s" % item[1], response)
        response = re.sub(r"%s([\u4e00-\u9fff])" % item[0], r"%s\1" % item[1], response)
    return response
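
To make the regex rules above concrete, a short hedged example. The expected output in the comment is derived from the substitution rules rather than an executed run; `model` is assumed from the quick-start sketch (the method only uses `re`, so any instance works).

```python
# Half-width punctuation next to Chinese characters becomes full-width; the
# training-time placeholder is substituted; ASCII-only punctuation is untouched.
text = "[[训练时间]]之前,我是一个语言模型!Version 1.0, beta."
print(model.process_response(text))
# Expected: 2023年之前,我是一个语言模型!Version 1.0, beta.
```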

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.quantize(bits, **kwargs)

Quantizes the model weights to the given bit width.

PARAMETER DESCRIPTION
self

An instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

bits

The number of bits to quantize the data to. Must be a positive integer.

TYPE: int

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def quantize(self, bits: int, **kwargs):
    """
    Quantizes the model weights to the given bit width.

    Args:
        self (ChatGLMForConditionalGeneration): An instance of the ChatGLMForConditionalGeneration class.
        bits (int): The number of bits to quantize the data to. Must be a positive integer.

    Returns:
        None.

    Raises:
        None.
    """

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.set_output_embeddings(new_embeddings)

Method to set new output embeddings for the ChatGLMForConditionalGeneration model.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

new_embeddings

The new output embeddings to be set for the model. This can be of any type.

TYPE: Any

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def set_output_embeddings(self, new_embeddings):
    """
    Method to set new output embeddings for the ChatGLMForConditionalGeneration model.

    Args:
        self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
        new_embeddings (Any): The new output embeddings to be set for the model. This can be of any type.

    Returns:
        None.

    Raises:
        None.
    """
    self.lm_head = new_embeddings

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.stream_chat(tokenizer, query, history=None, max_length=2048, do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs)

Stream chat method for generating responses based on a given query and history.

PARAMETER DESCRIPTION
self

An instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

tokenizer

The tokenizer used for tokenizing the input text.

query

The query string for which a response is generated.

TYPE: str

history

A list of tuples containing the previous queries and their responses. Defaults to None.

TYPE: List[Tuple[str, str]] DEFAULT: None

max_length

The maximum length of the generated response. Defaults to 2048.

TYPE: int DEFAULT: 2048

do_sample

Whether to use sampling for generating response. Defaults to True.

TYPE: bool DEFAULT: True

top_p

The cumulative probability threshold for top-p sampling. Defaults to 0.7.

TYPE: float DEFAULT: 0.7

temperature

The temperature value used for sampling. Defaults to 0.95.

TYPE: float DEFAULT: 0.95

logits_processor

An object used for processing logits during response generation. Defaults to None.

TYPE: object DEFAULT: None

RETURNS DESCRIPTION

A generator yielding (response, new_history) tuples as the response is incrementally generated.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def stream_chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048,
                do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
    """
    Stream chat method for generating responses based on a given query and history.

    Args:
        self (ChatGLMForConditionalGeneration): An instance of the ChatGLMForConditionalGeneration class.
        tokenizer: The tokenizer used for tokenizing the input text.
        query (str): The query string for which a response is generated.
        history (List[Tuple[str, str]], optional):
            A list of tuples containing the previous queries and their responses. Defaults to None.
        max_length (int, optional): The maximum length of the generated response. Defaults to 2048.
        do_sample (bool, optional): Whether to use sampling for generating response. Defaults to True.
        top_p (float, optional): The cumulative probability threshold for top-p sampling. Defaults to 0.7.
        temperature (float, optional): The temperature value used for sampling. Defaults to 0.95.
        logits_processor (object, optional):
            An object used for processing logits during response generation. Defaults to None.

    Returns:
        A generator yielding `(response, new_history)` tuples as the response is incrementally generated.

    Raises:
        None
    """
    if history is None:
        history = []
    if logits_processor is None:
        logits_processor = LogitsProcessorList()
    logits_processor.append(InvalidScoreLogitsProcessor())
    gen_kwargs = {"max_length": max_length, "do_sample": do_sample, "top_p": top_p,
                  "temperature": temperature, "logits_processor": logits_processor, **kwargs}
    if not history:
        prompt = query
    else:
        prompt = ""
        for i, (old_query, response) in enumerate(history):
            prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
        prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    inputs = tokenizer([prompt], return_tensors="ms")
    for outputs in self.stream_generate(**inputs, **gen_kwargs):
        outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
        response = tokenizer.decode(outputs)
        response = self.process_response(response)
        new_history = history + [(query, response)]
        yield response, new_history
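
A hedged sketch of incremental display with `stream_chat()`. Each yielded `response` is the full text decoded so far, so a console UI prints only the new suffix; `tokenizer` and `model` are assumed from the quick-start sketch.

```python
# Each iteration yields the full response so far plus the would-be updated history.
history = []
printed = 0
for response, new_history in model.stream_chat(tokenizer, "用一句话介绍一下你自己", history=history):
    print(response[printed:], end="", flush=True)   # print only the newly decoded suffix
    printed = len(response)
    history = new_history                            # always keep the latest history
print()
```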

mindnlp.transformers.models.chatglm.modeling_chatglm.ChatGLMForConditionalGeneration.stream_generate(input_ids, generation_config=None, logits_processor=None, stopping_criteria=None, prefix_allowed_tokens_fn=None, **kwargs)

Generates text using the ChatGLM model.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMForConditionalGeneration class.

TYPE: ChatGLMForConditionalGeneration

input_ids

The input tensor containing the tokenized input sequence.

TYPE: Tensor

generation_config

The configuration for text generation. Defaults to None.

TYPE: Optional[GenerationConfig] DEFAULT: None

logits_processor

The processor for modifying the logits. Defaults to None.

TYPE: Optional[LogitsProcessorList] DEFAULT: None

stopping_criteria

The criteria for stopping the generation. Defaults to None.

TYPE: Optional[StoppingCriteriaList] DEFAULT: None

prefix_allowed_tokens_fn

A function that returns the list of allowed tokens for each prefix. Defaults to None.

TYPE: Optional[Callable[[int, Tensor], List[int]]] DEFAULT: None

RETURNS DESCRIPTION

A generator yielding the full input_ids tensor (prompt plus generated tokens) after each decoding step.

RAISES DESCRIPTION
UserWarning

If both max_new_tokens and max_length are set, max_new_tokens takes precedence.

UserWarning

If the input length exceeds the max_length limit, it may cause unexpected behavior.

Other exceptions

Any other exceptions that may occur during the execution of the method.

Source code in mindnlp/transformers/models/chatglm/modeling_chatglm.py
def stream_generate(
        self,
        input_ids,
        generation_config: Optional[GenerationConfig] = None,
        logits_processor: Optional[LogitsProcessorList] = None,
        stopping_criteria: Optional[StoppingCriteriaList] = None,
        prefix_allowed_tokens_fn: Optional[Callable[[int, mindspore.Tensor], List[int]]] = None,
        **kwargs,
):
    """
    Generates text using the ChatGLM model.

    Args:
        self (ChatGLMForConditionalGeneration): The instance of the ChatGLMForConditionalGeneration class.
        input_ids (mindspore.Tensor): The input tensor containing the tokenized input sequence.
        generation_config (Optional[GenerationConfig], optional): The configuration for text generation. Defaults to None.
        logits_processor (Optional[LogitsProcessorList], optional): The processor for modifying the logits. Defaults to None.
        stopping_criteria (Optional[StoppingCriteriaList], optional): The criteria for stopping the generation. Defaults to None.
        prefix_allowed_tokens_fn (Optional[Callable[[int, mindspore.Tensor], List[int]]], optional):
            A function that returns the list of allowed tokens for each prefix. Defaults to None.

    Returns:
        A generator yielding the full `input_ids` tensor (prompt plus generated tokens) after each decoding step.

    Raises:
        UserWarning: If both `max_new_tokens` and `max_length` are set, `max_new_tokens` takes precedence.
        UserWarning: If the input length exceeds the `max_length` limit, it may cause unexpected behavior.
        Other exceptions: Any other exceptions that may occur during the execution of the method.
    """
    _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]

    if generation_config is None:
        generation_config = self.generation_config
    generation_config = copy.deepcopy(generation_config)
    model_kwargs = generation_config.update(**kwargs)
    _, eos_token_id = generation_config.bos_token_id, generation_config.eos_token_id

    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]

    has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
    if has_default_max_length and generation_config.max_new_tokens is None:
        warnings.warn(
            f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. "
            "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we"
            " recommend using `max_new_tokens` to control the maximum length of the generation.",
            UserWarning,
        )
    elif generation_config.max_new_tokens is not None:
        generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
        if not has_default_max_length:
            logger.warn(
                f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
                f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
                "Please refer to the documentation for more information. "
                "(https://hf-mirror.com/docs/transformers/main/en/main_classes/text_generation)",
                UserWarning,
            )

    if input_ids_seq_length >= generation_config.max_length:
        input_ids_string = "decoder_input_ids" if self.config.is_encoder_decoder else "input_ids"
        logger.warning(
            f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
            f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
            " increasing `max_new_tokens`."
        )

    # 2. Set generation parameters if not already defined
    logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
    stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

    logits_processor = self._get_logits_processor(
        generation_config=generation_config,
        input_ids_seq_length=input_ids_seq_length,
        encoder_input_ids=input_ids,
        prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
        logits_processor=logits_processor,
    )

    stopping_criteria = self._get_stopping_criteria(
        generation_config=generation_config, stopping_criteria=stopping_criteria
    )
    logits_warper = self._get_logits_warper(generation_config)

    unfinished_sequences = ops.ones(input_ids.shape[0], dtype=input_ids.dtype)
    scores = None
    while True:
        model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
        # forward pass to get next token
        outputs = self(
            **model_inputs,
            return_dict=True,
            output_attentions=False,
            output_hidden_states=False,
        )

        next_token_logits = outputs.logits[:, -1, :]

        # pre-process distribution
        next_token_scores = logits_processor(input_ids, next_token_logits)
        next_token_scores = logits_warper(input_ids, next_token_scores)

        # sample
        probs = ops.softmax(next_token_scores, axis=-1)
        if generation_config.do_sample:
            next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
        else:
            next_tokens = ops.argmax(probs, dim=-1)

        # update generated ids, model inputs, and length for next step
        input_ids = ops.cat([input_ids, next_tokens[:, None]], axis=-1)
        model_kwargs = self._update_model_kwargs_for_generation(
            outputs, model_kwargs, is_encoder_decoder=self.config.is_encoder_decoder
        )
        unfinished_sequences = unfinished_sequences.mul((sum(next_tokens != i for i in eos_token_id)).long())

        # stop when each sentence is finished, or if we exceed the maximum length
        if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
            break
        yield input_ids
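
`stream_generate()` can also be driven directly with tokenized inputs: it yields the whole token sequence (prompt plus generated tokens) after every step, so the continuation is the slice past the prompt length. A hedged sketch with greedy decoding (`do_sample=False`); the prompt string and generation settings are assumptions.

```python
# Low-level streaming, assuming `tokenizer` and `model` from the quick-start sketch.
inputs = tokenizer(["[Round 0]\n问:你好\n答:"], return_tensors="ms")
prompt_len = len(inputs["input_ids"][0])

text = ""
for output_ids in model.stream_generate(**inputs, do_sample=False, max_new_tokens=32):
    new_token_ids = output_ids.tolist()[0][prompt_len:]   # tokens generated so far
    text = tokenizer.decode(new_token_ids)
print(text)  # decoded continuation after streaming finishes
```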

mindnlp.transformers.models.chatglm.configuration_chatglm.ChatGLMConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [~ChatGLMModel]. It is used to instantiate an ChatGLM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the ChatGLM-6B THUDM/ChatGLM-6B architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
vocab_size

Vocabulary size of the ChatGLM-6B model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling [~ChatGLMModel] or [~TFChatGLMModel].

TYPE: `int`, *optional*, defaults to 150528 DEFAULT: 150528

hidden_size

Dimension of the encoder layers and the pooler layer.

TYPE: `int`, *optional*, defaults to 4096 DEFAULT: 4096

num_hidden_layers

Number of hidden layers in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 28

num_attention_heads

Number of attention heads for each attention layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

inner_hidden_size

Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.

TYPE: `int`, *optional*, defaults to 16384 DEFAULT: 16384

max_sequence_length

The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 2048

layernorm_epsilon

The epsilon used by the layer normalization layers.

TYPE: `float`, *optional*, defaults to 1e-5 DEFAULT: 1e-05

use_cache

Whether the model should return the last key/values attentions (not used by all models).

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

Example
>>> from configuration_chatglm import ChatGLMConfig
>>> from modeling_chatglm import ChatGLMModel
...
>>> # Initializing a ChatGLM-6B THUDM/ChatGLM-6B style configuration
>>> configuration = ChatGLMConfig()
...
>>> # Initializing a model from the THUDM/ChatGLM-6B style configuration
>>> model = ChatGLMModel(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
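
Individual fields can also be overridden at construction time; the keyword names match the parameters listed above (the values here are only illustrative):

>>> # Overriding selected defaults
>>> configuration = ChatGLMConfig(max_sequence_length=1024, use_cache=True)
>>> configuration.max_sequence_length
1024
>>> configuration.use_cache
True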
Source code in mindnlp/transformers/models/chatglm/configuration_chatglm.py
class ChatGLMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`~ChatGLMModel`].
    It is used to instantiate an ChatGLM model according to the specified arguments, defining the model
    architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
    the ChatGLM-6B [THUDM/ChatGLM-6B](https://hf-mirror.com/THUDM/chatglm-6b) architecture.

    Configuration objects inherit from  [`PretrainedConfig`] and can be used
    to control the model outputs. Read the documentation from  [`PretrainedConfig`]
    for more information.


    Args:
        vocab_size (`int`, *optional*, defaults to 150528):
            Vocabulary size of the ChatGLM-6B model. Defines the number of different tokens that can be represented by the
            `inputs_ids` passed when calling [`~ChatGLMModel`] or
            [`~TFChatGLMModel`].
        hidden_size (`int`, *optional*, defaults to 4096):
            Dimension of the encoder layers and the pooler layer.
        num_hidden_layers (`int`, *optional*, defaults to 28):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads for each attention layer in the Transformer encoder.
        inner_hidden_size (`int`, *optional*, defaults to 16384):
            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
        max_sequence_length (`int`, *optional*, defaults to 2048):
            The maximum sequence length that this model might ever be used with.
            Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
        layernorm_epsilon (`float`, *optional*, defaults to 1e-5):
            The epsilon used by the layer normalization layers.
        use_cache (`bool`, *optional*, defaults to `False`):
            Whether the model should return the last key/values attentions (not used by all models).

    Example:
        ```python
        >>> from configuration_chatglm import ChatGLMConfig
        >>> from modeling_chatglm import ChatGLMModel
        ...
        >>> # Initializing a ChatGLM-6B THUDM/ChatGLM-6B style configuration
        >>> configuration = ChatGLMConfig()
        ...
        >>> # Initializing a model from the THUDM/ChatGLM-6B style configuration
        >>> model = ChatGLMModel(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
"""
    model_type = "chatglm"

    def __init__(
            self,
            vocab_size=150528,
            hidden_size=4096,
            num_layers=28,
            num_attention_heads=32,
            layernorm_epsilon=1e-5,
            use_cache=False,
            bos_token_id=150004,
            eos_token_id=150005,
            mask_token_id=150000,
            gmask_token_id=150001,
            pad_token_id=0,
            max_sequence_length=2048,
            inner_hidden_size=16384,
            position_encoding_2d=True,
            quantization_bit=0,
            pre_seq_len=None,
            prefix_projection=False,
            **kwargs
    ):
        """
        Initializes a ChatGLMConfig object with the specified configuration parameters.

        Args:
            self (object): The instance of the ChatGLMConfig class.
            vocab_size (int, optional): The size of the vocabulary. Default is 150528.
            hidden_size (int, optional): The size of the hidden layers. Default is 4096.
            num_layers (int, optional): The number of layers in the model. Default is 28.
            num_attention_heads (int, optional): The number of attention heads. Default is 32.
            layernorm_epsilon (float, optional): The epsilon value for layer normalization. Default is 1e-05.
            use_cache (bool, optional): Whether to use cache during inference. Default is False.
            bos_token_id (int, optional): The ID of the beginning of sequence token. Default is 150004.
            eos_token_id (int, optional): The ID of the end of sequence token. Default is 150005.
            mask_token_id (int, optional): The ID of the mask token. Default is 150000.
            gmask_token_id (int, optional): The ID of the global mask token. Default is 150001.
            pad_token_id (int, optional): The ID of the padding token. Default is 0.
            max_sequence_length (int, optional): The maximum sequence length allowed. Default is 2048.
            inner_hidden_size (int, optional): The size of inner hidden layers. Default is 16384.
            position_encoding_2d (bool, optional): Whether to use 2D position encoding. Default is True.
            quantization_bit (int, optional): The number of bits for quantization. Default is 0.
            pre_seq_len (int, optional): The length of the preceding sequence. Default is None.
            prefix_projection (bool, optional): Whether to use prefix projection. Default is False.

        Returns:
            None.

        Raises:
            None.
        """
        self.num_layers = num_layers
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.max_sequence_length = max_sequence_length
        self.layernorm_epsilon = layernorm_epsilon
        self.inner_hidden_size = inner_hidden_size
        self.use_cache = use_cache
        self.bos_token_id = bos_token_id
        self.eos_token_id = eos_token_id
        self.pad_token_id = pad_token_id
        self.mask_token_id = mask_token_id
        self.gmask_token_id = gmask_token_id
        self.position_encoding_2d = position_encoding_2d
        self.quantization_bit = quantization_bit
        self.pre_seq_len = pre_seq_len
        self.prefix_projection = prefix_projection

        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            **kwargs
        )

mindnlp.transformers.models.chatglm.configuration_chatglm.ChatGLMConfig.__init__(vocab_size=150528, hidden_size=4096, num_layers=28, num_attention_heads=32, layernorm_epsilon=1e-05, use_cache=False, bos_token_id=150004, eos_token_id=150005, mask_token_id=150000, gmask_token_id=150001, pad_token_id=0, max_sequence_length=2048, inner_hidden_size=16384, position_encoding_2d=True, quantization_bit=0, pre_seq_len=None, prefix_projection=False, **kwargs)

Initializes a ChatGLMConfig object with the specified configuration parameters.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMConfig class.

TYPE: object

vocab_size

The size of the vocabulary. Default is 150528.

TYPE: int DEFAULT: 150528

hidden_size

The size of the hidden layers. Default is 4096.

TYPE: int DEFAULT: 4096

num_layers

The number of layers in the model. Default is 28.

TYPE: int DEFAULT: 28

num_attention_heads

The number of attention heads. Default is 32.

TYPE: int DEFAULT: 32

layernorm_epsilon

The epsilon value for layer normalization. Default is 1e-05.

TYPE: float DEFAULT: 1e-05

use_cache

Whether to use cache during inference. Default is False.

TYPE: bool DEFAULT: False

bos_token_id

The ID of the beginning of sequence token. Default is 150004.

TYPE: int DEFAULT: 150004

eos_token_id

The ID of the end of sequence token. Default is 150005.

TYPE: int DEFAULT: 150005

mask_token_id

The ID of the mask token. Default is 150000.

TYPE: int DEFAULT: 150000

gmask_token_id

The ID of the global mask token. Default is 150001.

TYPE: int DEFAULT: 150001

pad_token_id

The ID of the padding token. Default is 0.

TYPE: int DEFAULT: 0

max_sequence_length

The maximum sequence length allowed. Default is 2048.

TYPE: int DEFAULT: 2048

inner_hidden_size

The size of inner hidden layers. Default is 16384.

TYPE: int DEFAULT: 16384

position_encoding_2d

Whether to use 2D position encoding. Default is True.

TYPE: bool DEFAULT: True

quantization_bit

The number of bits for quantization. Default is 0.

TYPE: int DEFAULT: 0

pre_seq_len

The length of the preceding sequence. Default is None.

TYPE: int DEFAULT: None

prefix_projection

Whether to use prefix projection. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/chatglm/configuration_chatglm.py
def __init__(
        self,
        vocab_size=150528,
        hidden_size=4096,
        num_layers=28,
        num_attention_heads=32,
        layernorm_epsilon=1e-5,
        use_cache=False,
        bos_token_id=150004,
        eos_token_id=150005,
        mask_token_id=150000,
        gmask_token_id=150001,
        pad_token_id=0,
        max_sequence_length=2048,
        inner_hidden_size=16384,
        position_encoding_2d=True,
        quantization_bit=0,
        pre_seq_len=None,
        prefix_projection=False,
        **kwargs
):
    """
    Initializes a ChatGLMConfig object with the specified configuration parameters.

    Args:
        self (object): The instance of the ChatGLMConfig class.
        vocab_size (int, optional): The size of the vocabulary. Default is 150528.
        hidden_size (int, optional): The size of the hidden layers. Default is 4096.
        num_layers (int, optional): The number of layers in the model. Default is 28.
        num_attention_heads (int, optional): The number of attention heads. Default is 32.
        layernorm_epsilon (float, optional): The epsilon value for layer normalization. Default is 1e-05.
        use_cache (bool, optional): Whether to use cache during inference. Default is False.
        bos_token_id (int, optional): The ID of the beginning of sequence token. Default is 150004.
        eos_token_id (int, optional): The ID of the end of sequence token. Default is 150005.
        mask_token_id (int, optional): The ID of the mask token. Default is 150000.
        gmask_token_id (int, optional): The ID of the global mask token. Default is 150001.
        pad_token_id (int, optional): The ID of the padding token. Default is 0.
        max_sequence_length (int, optional): The maximum sequence length allowed. Default is 2048.
        inner_hidden_size (int, optional): The size of inner hidden layers. Default is 16384.
        position_encoding_2d (bool, optional): Whether to use 2D position encoding. Default is True.
        quantization_bit (int, optional): The number of bits for quantization. Default is 0.
        pre_seq_len (int, optional): The length of the preceding sequence. Default is None.
        prefix_projection (bool, optional): Whether to use prefix projection. Default is False.

    Returns:
        None.

    Raises:
        None.
    """
    self.num_layers = num_layers
    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_attention_heads = num_attention_heads
    self.max_sequence_length = max_sequence_length
    self.layernorm_epsilon = layernorm_epsilon
    self.inner_hidden_size = inner_hidden_size
    self.use_cache = use_cache
    self.bos_token_id = bos_token_id
    self.eos_token_id = eos_token_id
    self.pad_token_id = pad_token_id
    self.mask_token_id = mask_token_id
    self.gmask_token_id = gmask_token_id
    self.position_encoding_2d = position_encoding_2d
    self.quantization_bit = quantization_bit
    self.pre_seq_len = pre_seq_len
    self.prefix_projection = prefix_projection

    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        **kwargs
    )
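
As a quick sanity check of the token-id plumbing above, the special-token ids handed to `super().__init__` are exposed as plain attributes on the resulting configuration (a sketch using the defaults shown in the signature):

>>> config = ChatGLMConfig()
>>> (config.bos_token_id, config.eos_token_id, config.pad_token_id)
(150004, 150005, 0)
>>> config.gmask_token_id
150001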

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer

Bases: PreTrainedTokenizer

Construct a ChatGLM tokenizer. Based on byte-level Byte-Pair-Encoding.

PARAMETER DESCRIPTION
vocab_file

Path to the vocabulary file.

TYPE: `str`
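
In practice the tokenizer is usually loaded from the published checkpoint rather than built from a raw vocabulary file. A minimal sketch, assuming `ChatGLMTokenizer` is importable from `mindnlp.transformers` (otherwise use the full module path in the heading) and that the `THUDM/chatglm-6b` files are reachable:

>>> from mindnlp.transformers import ChatGLMTokenizer
>>> tokenizer = ChatGLMTokenizer.from_pretrained('THUDM/chatglm-6b')
>>> encoded = tokenizer("你好")
>>> encoded["input_ids"]  # ends with the [gMASK] and <sop> special token ids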

Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
class ChatGLMTokenizer(PreTrainedTokenizer):
    """
    Construct a ChatGLM tokenizer. Based on byte-level Byte-Pair-Encoding.

    Args:
        vocab_file (`str`):
            Path to the vocabulary file.
    """
    vocab_files_names = {"vocab_file": "ice_text.model"}
    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
    model_input_names = ["input_ids", "attention_mask", "position_ids"]

    def __init__(
            self,
            vocab_file,
            do_lower_case=False,
            remove_space=False,
            bos_token='<sop>',
            eos_token='<eop>',
            end_token='</s>',
            mask_token='[MASK]',
            gmask_token='[gMASK]',
            padding_side="left",
            pad_token="<pad>",
            unk_token="<unk>",
            num_image_tokens=20000,
            **kwargs
    ) -> None:
        """
        Initializes a ChatGLMTokenizer object.

        Args:
            vocab_file (str): The file path to the vocabulary file.
            do_lower_case (bool, optional): Flag indicating whether to convert all tokens to lowercase. Defaults to False.
            remove_space (bool, optional): Flag indicating whether to remove spaces from tokens. Defaults to False.
            bos_token (str, optional): The beginning of sentence token. Defaults to '<sop>'.
            eos_token (str, optional): The end of sentence token. Defaults to '<eop>'.
            end_token (str, optional): The end token. Defaults to '</s>'.
            mask_token (str, optional): The mask token. Defaults to '[MASK]'.
            gmask_token (str, optional): The global mask token. Defaults to '[gMASK]'.
            padding_side (str, optional): The side to pad tokens on. Defaults to 'left'.
            pad_token (str, optional): The padding token. Defaults to '<pad>'.
            unk_token (str, optional): The unknown token. Defaults to '<unk>'.
            num_image_tokens (int, optional): The number of image tokens. Defaults to 20000.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None: This method does not raise any exceptions.
        """
        self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
        super().__init__(
            do_lower_case=do_lower_case,
            remove_space=remove_space,
            padding_side=padding_side,
            bos_token=bos_token,
            eos_token=eos_token,
            end_token=end_token,
            mask_token=mask_token,
            gmask_token=gmask_token,
            pad_token=pad_token,
            unk_token=unk_token,
            num_image_tokens=num_image_tokens,
            **kwargs
        )

        self.do_lower_case = do_lower_case
        self.remove_space = remove_space
        self.vocab_file = vocab_file

        self.bos_token = bos_token
        self.eos_token = eos_token
        self.end_token = end_token
        self.mask_token = mask_token
        self.gmask_token = gmask_token
        """ Initialisation """

    @property
    def gmask_token_id(self) -> Optional[int]:
        """
        This method returns the token ID of the gmask token in the ChatGLMTokenizer.

        Args:
            self (ChatGLMTokenizer): The instance of the ChatGLMTokenizer class.

        Returns:
            Optional[int]: Returns the token ID of the gmask token if it exists, otherwise returns None.

        Raises:
            None
        """
        if self.gmask_token is None:
            return None
        return self.convert_tokens_to_ids(self.gmask_token)

    @property
    def end_token_id(self) -> Optional[int]:
        """
        Returns:
            `Optional[int]`:
                Id of the end of context token in the vocabulary. Returns `None` if the token has not been set.
        """
        if self.end_token is None:
            return None
        return self.convert_tokens_to_ids(self.end_token)

    @property
    def vocab_size(self):
        """ Returns vocab size """
        return self.sp_tokenizer.num_tokens

    def get_vocab(self):
        """ Returns vocab as a dict """
        vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def preprocess_text(self, inputs):
        """
        preprocess_text method in the ChatGLMTokenizer class preprocesses the input text based on the specified configuration.

        Args:
            self (ChatGLMTokenizer): The instance of the ChatGLMTokenizer class.
            inputs (str): The input text to be preprocessed.

        Returns:
            str:
                The preprocessed text based on the specified configuration.

                - If self.remove_space is True, leading and trailing spaces are removed,
                and consecutive spaces within the text are replaced with a  single space.
                - If self.do_lower_case is True, the text is converted to lowercase. The preprocessed text is returned.

        Raises:
            None
        """
        if self.remove_space:
            outputs = " ".join(inputs.strip().split())
        else:
            outputs = inputs

        if self.do_lower_case:
            outputs = outputs.lower()

        return outputs

    def _tokenize(self, text, **kwargs):
        """ Returns a tokenized string. """
        text = self.preprocess_text(text)

        seq = self.sp_tokenizer.tokenize(text)

        return seq

    def convert_tokens_to_string(self, tokens: List[str]) -> str:
        """
        Converts a list of tokens into a single string representation.

        Args:
            self (ChatGLMTokenizer): An instance of the ChatGLMTokenizer class.
            tokens (List[str]): A list of tokens to be converted into a string representation.

        Returns:
            str: The string representation of the given list of tokens.

        Raises:
            None.

        Note:
            - The tokens should be generated using the sp_tokenizer of the ChatGLMTokenizer instance.
            - The resulting string may contain whitespace and punctuation marks based on the original tokenization.

        Example:
            ```python
            >>> tokenizer = ChatGLMTokenizer.from_pretrained('THUDM/chatglm-6b')
            >>> tokens = ['Hello', ',', 'how', 'are', 'you', '?']
            >>> string_representation = tokenizer.convert_tokens_to_string(tokens)
            ```
        """
        return self.sp_tokenizer.decode_tokens(tokens)

    def _decode(
            self,
            token_ids: Union[int, List[int]],
            **kwargs
    ) -> str:
        """
        This method decodes the given token IDs into a string representation.

        Args:
            self (ChatGLMTokenizer): The instance of the ChatGLMTokenizer class.
            token_ids (Union[int, List[int]]): The token IDs to be decoded. It can be a single integer or a list of integers.

        Returns:
            str: The decoded string representation of the token IDs.

        Raises:
            None.
        """
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if len(token_ids) == 0:
            return ""
        if self.pad_token_id in token_ids:  # remove pad
            token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
        return super()._decode(token_ids, **kwargs)

    def _convert_token_to_id(self, token):
        """ Converts a token (str) in an id using the vocab. """
        return self.sp_tokenizer[token]

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        return self.sp_tokenizer[index]

    def save_vocabulary(self, save_directory, filename_prefix=None):
        """
        Save the vocabulary and special tokens file to a directory.

        Args:
            save_directory (`str`):
                The directory in which to save the vocabulary.
            filename_prefix (`str`, *optional*):
                An optional prefix to add to the name of the saved files.

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        if os.path.isdir(save_directory):
            vocab_file = os.path.join(
                save_directory, self.vocab_files_names["vocab_file"]
            )
        else:
            vocab_file = save_directory

        with open(self.vocab_file, 'rb') as fin:
            proto_str = fin.read()

        with open(vocab_file, "wb") as writer:
            writer.write(proto_str)

        return (vocab_file,)

    def build_inputs_with_special_tokens(
            self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
    ) -> List[int]:
        """
        Build model inputs from a sequence or a pair of sequences by concatenating and
        adding special tokens. A ChatGLM sequence has the following format:

        - single sequence: `A [gMASK] <sop>`
        - pair of sequences: `A [gMASK] <sop> B <eop>`

        Args:
            token_ids_0 (`List[int]`):
                List of IDs to which the special tokens will be added.
            token_ids_1 (`List[int]`, *optional*):
                Optional second list of IDs for sequence pairs.

        Returns:
            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
        """
        gmask_id = self.sp_tokenizer[self.gmask_token]
        eos_id = self.sp_tokenizer[self.eos_token]
        token_ids_0 = token_ids_0 + [gmask_id, self.sp_tokenizer[self.bos_token]]
        if token_ids_1 is not None:
            token_ids_0 = token_ids_0 + token_ids_1 + [eos_id]
        return token_ids_0

    def _pad(
            self,
            encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
            max_length: Optional[int] = None,
            padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
            pad_to_multiple_of: Optional[int] = None,
            return_attention_mask: Optional[bool] = None,
    ) -> dict:
        """
        Pad encoded inputs (on left/right and up to predefined length or max length in the batch)

        Args:
            encoded_inputs:
                Dictionary of tokenized inputs (`List[int]`) or batch of tokenized inputs (`List[List[int]]`).
            max_length: maximum length of the returned list and optionally padding length (see below).
                Will truncate by taking into account the special tokens.
            padding_strategy: PaddingStrategy to use for padding.

                - PaddingStrategy.LONGEST Pad to the longest sequence in the batch
                - PaddingStrategy.MAX_LENGTH: Pad to the max length (default)
                - PaddingStrategy.DO_NOT_PAD: Do not pad
                - The tokenizer padding sides are defined in self.padding_side:

                    - 'left': pads on the left of the sequences
                    - 'right': pads on the right of the sequences
            pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
                This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
                `>= 7.5` (Volta).
            return_attention_mask:
                (optional) Set to False to avoid returning attention mask (default: set to model specifics)
        """
        # Load from model defaults
        bos_token_id = self.sp_tokenizer[self.bos_token]
        mask_token_id = self.sp_tokenizer[self.mask_token]
        gmask_token_id = self.sp_tokenizer[self.gmask_token]
        assert self.padding_side == "left"

        required_input = encoded_inputs[self.model_input_names[0]]
        seq_length = len(required_input)

        if padding_strategy == PaddingStrategy.LONGEST:
            max_length = len(required_input)

        if max_length is not None and pad_to_multiple_of is not None and (max_length % pad_to_multiple_of != 0):
            max_length = ((max_length // pad_to_multiple_of) + 1) * pad_to_multiple_of

        needs_to_be_padded = padding_strategy != PaddingStrategy.DO_NOT_PAD and len(required_input) != max_length

        # Initialize attention mask if not present.
        if max_length is not None:
            if "attention_mask" not in encoded_inputs:
                if bos_token_id in required_input:
                    context_length = required_input.index(bos_token_id)
                else:
                    context_length = seq_length
                attention_mask = np.ones((1, seq_length, seq_length))
                attention_mask = np.tril(attention_mask)
                attention_mask[:, :, :context_length] = 1
                attention_mask = np.bool_(attention_mask < 0.5)
                encoded_inputs["attention_mask"] = attention_mask

            if "position_ids" not in encoded_inputs:
                if bos_token_id in required_input:
                    context_length = required_input.index(bos_token_id)
                else:
                    context_length = seq_length
                position_ids = np.arange(seq_length, dtype=np.int64)
                mask_token = mask_token_id if mask_token_id in required_input else gmask_token_id
                if mask_token in required_input:
                    mask_position = required_input.index(mask_token)
                    position_ids[context_length:] = mask_position
                block_position_ids = np.concatenate(
                    [np.zeros(context_length, dtype=np.int64),
                     np.arange(1, seq_length - context_length + 1, dtype=np.int64)])
                encoded_inputs["position_ids"] = np.stack([position_ids, block_position_ids], axis=0)

        if needs_to_be_padded:
            difference = max_length - len(required_input)

            if "attention_mask" in encoded_inputs:
                encoded_inputs["attention_mask"] = np.pad(encoded_inputs["attention_mask"],
                                                          pad_width=[(0, 0), (difference, 0), (difference, 0)],
                                                          mode='constant', constant_values=True)
            if "token_type_ids" in encoded_inputs:
                encoded_inputs["token_type_ids"] = [self.pad_token_type_id] * difference + encoded_inputs[
                    "token_type_ids"
                ]
            if "special_tokens_mask" in encoded_inputs:
                encoded_inputs["special_tokens_mask"] = [1] * difference + encoded_inputs["special_tokens_mask"]
            if "position_ids" in encoded_inputs:
                encoded_inputs["position_ids"] = np.pad(encoded_inputs["position_ids"],
                                                        pad_width=[(0, 0), (difference, 0)])
            encoded_inputs[self.model_input_names[0]] = [self.pad_token_id] * difference + required_input

        return encoded_inputs
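
The `_pad` override above is what makes padded calls return a 3-D boolean attention mask and 2-D position ids rather than the usual flat lists. A sketch, reusing the `tokenizer` loaded earlier and assuming left padding to a fixed length:

>>> import numpy as np
>>> batch = tokenizer("你好", padding="max_length", max_length=8)
>>> np.asarray(batch["attention_mask"]).shape   # (1, 8, 8) boolean mask
>>> np.asarray(batch["position_ids"]).shape     # (2, 8): position ids and block position ids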

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.end_token_id: Optional[int] property

RETURNS DESCRIPTION
Optional[int]

Optional[int]: Id of the end of context token in the vocabulary. Returns None if the token has not been set.

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.gmask_token = gmask_token instance-attribute

Initialisation

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.gmask_token_id: Optional[int] property

This method returns the token ID of the gmask token in the ChatGLMTokenizer.

PARAMETER DESCRIPTION
self

The instance of the ChatGLMTokenizer class.

TYPE: ChatGLMTokenizer

RETURNS DESCRIPTION
Optional[int]

Optional[int]: Returns the token ID of the gmask token if it exists, otherwise returns None.

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.vocab_size property

Returns vocab size

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.__init__(vocab_file, do_lower_case=False, remove_space=False, bos_token='<sop>', eos_token='<eop>', end_token='</s>', mask_token='[MASK]', gmask_token='[gMASK]', padding_side='left', pad_token='<pad>', unk_token='<unk>', num_image_tokens=20000, **kwargs)

Initializes a ChatGLMTokenizer object.

PARAMETER DESCRIPTION
vocab_file

The file path to the vocabulary file.

TYPE: str

do_lower_case

Flag indicating whether to convert all tokens to lowercase. Defaults to False.

TYPE: bool DEFAULT: False

remove_space

Flag indicating whether to remove spaces from tokens. Defaults to False.

TYPE: bool DEFAULT: False

bos_token

The beginning of sentence token. Defaults to '<sop>'.

TYPE: str DEFAULT: '<sop>'

eos_token

The end of sentence token. Defaults to '<eop>'.

TYPE: str DEFAULT: '<eop>'

end_token

The end token. Defaults to '</s>'.

TYPE: str DEFAULT: '</s>'

mask_token

The mask token. Defaults to '[MASK]'.

TYPE: str DEFAULT: '[MASK]'

gmask_token

The global mask token. Defaults to '[gMASK]'.

TYPE: str DEFAULT: '[gMASK]'

padding_side

The side to pad tokens on. Defaults to 'left'.

TYPE: str DEFAULT: 'left'

pad_token

The padding token. Defaults to '<pad>'.

TYPE: str DEFAULT: '<pad>'

unk_token

The unknown token. Defaults to '<unk>'.

TYPE: str DEFAULT: '<unk>'

num_image_tokens

The number of image tokens. Defaults to 20000.

TYPE: int DEFAULT: 20000

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION
None

None.

RAISES DESCRIPTION
None

This method does not raise any exceptions.

Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def __init__(
        self,
        vocab_file,
        do_lower_case=False,
        remove_space=False,
        bos_token='<sop>',
        eos_token='<eop>',
        end_token='</s>',
        mask_token='[MASK]',
        gmask_token='[gMASK]',
        padding_side="left",
        pad_token="<pad>",
        unk_token="<unk>",
        num_image_tokens=20000,
        **kwargs
) -> None:
    """
    Initializes a ChatGLMTokenizer object.

    Args:
        vocab_file (str): The file path to the vocabulary file.
        do_lower_case (bool, optional): Flag indicating whether to convert all tokens to lowercase. Defaults to False.
        remove_space (bool, optional): Flag indicating whether to remove spaces from tokens. Defaults to False.
        bos_token (str, optional): The beginning of sentence token. Defaults to '<sop>'.
        eos_token (str, optional): The end of sentence token. Defaults to '<eop>'.
        end_token (str, optional): The end token. Defaults to '</s>'.
        mask_token (str, optional): The mask token. Defaults to '[MASK]'.
        gmask_token (str, optional): The global mask token. Defaults to '[gMASK]'.
        padding_side (str, optional): The side to pad tokens on. Defaults to 'left'.
        pad_token (str, optional): The padding token. Defaults to '<pad>'.
        unk_token (str, optional): The unknown token. Defaults to '<unk>'.
        num_image_tokens (int, optional): The number of image tokens. Defaults to 20000.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None: This method does not raise any exceptions.
    """
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
    super().__init__(
        do_lower_case=do_lower_case,
        remove_space=remove_space,
        padding_side=padding_side,
        bos_token=bos_token,
        eos_token=eos_token,
        end_token=end_token,
        mask_token=mask_token,
        gmask_token=gmask_token,
        pad_token=pad_token,
        unk_token=unk_token,
        num_image_tokens=num_image_tokens,
        **kwargs
    )

    self.do_lower_case = do_lower_case
    self.remove_space = remove_space
    self.vocab_file = vocab_file

    self.bos_token = bos_token
    self.eos_token = eos_token
    self.end_token = end_token
    self.mask_token = mask_token
    self.gmask_token = gmask_token
    """ Initialisation """

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.build_inputs_with_special_tokens(token_ids_0, token_ids_1=None)

Build model inputs from a sequence or a pair of sequences by concatenating and adding special tokens. A ChatGLM sequence has the following format:

  • single sequence: A [gMASK] <sop>
  • pair of sequences: A [gMASK] <sop> B <eop>
PARAMETER DESCRIPTION
token_ids_0

List of IDs to which the special tokens will be added.

TYPE: `List[int]`

token_ids_1

Optional second list of IDs for sequence pairs.

TYPE: `List[int]`, *optional* DEFAULT: None

RETURNS DESCRIPTION
List[int]

List[int]: List of input IDs with the appropriate special tokens.
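
In terms of raw ids this means the first sequence is extended with the [gMASK] and <sop> ids, and a second sequence, if given, is appended together with the <eop> id. A sketch with placeholder ids (10, 11, 12 are not meaningful tokens):

>>> ids = tokenizer.build_inputs_with_special_tokens([10, 11, 12])
>>> ids[-2:] == [tokenizer.sp_tokenizer['[gMASK]'], tokenizer.sp_tokenizer['<sop>']]
True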

Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def build_inputs_with_special_tokens(
        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
    """
    Build model inputs from a sequence or a pair of sequences by concatenating and
    adding special tokens. A ChatGLM sequence has the following format:

    - single sequence: `A [gMASK] <sop>`
    - pair of sequences: `A [gMASK] <sop> B <eop>`

    Args:
        token_ids_0 (`List[int]`):
            List of IDs to which the special tokens will be added.
        token_ids_1 (`List[int]`, *optional*):
            Optional second list of IDs for sequence pairs.

    Returns:
        `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
    """
    gmask_id = self.sp_tokenizer[self.gmask_token]
    eos_id = self.sp_tokenizer[self.eos_token]
    token_ids_0 = token_ids_0 + [gmask_id, self.sp_tokenizer[self.bos_token]]
    if token_ids_1 is not None:
        token_ids_0 = token_ids_0 + token_ids_1 + [eos_id]
    return token_ids_0

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.convert_tokens_to_string(tokens)

Converts a list of tokens into a single string representation.

PARAMETER DESCRIPTION
self

An instance of the ChatGLMTokenizer class.

TYPE: ChatGLMTokenizer

tokens

A list of tokens to be converted into a string representation.

TYPE: List[str]

RETURNS DESCRIPTION
str

The string representation of the given list of tokens.

TYPE: str

Note
  • The tokens should be generated using the sp_tokenizer of the ChatGLMTokenizer instance.
  • The resulting string may contain whitespace and punctuation marks based on the original tokenization.
Example
>>> tokenizer = ChatGLMTokenizer.from_pretrained('THUDM/chatglm-6b')
>>> tokens = ['Hello', ',', 'how', 'are', 'you', '?']
>>> string_representation = tokenizer.convert_tokens_to_string(tokens)
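A fuller round trip first tokenizes with the same SentencePiece model and then reassembles the string (the exact token pieces depend on the vocabulary file):

>>> tokens = tokenizer.tokenize("Hello, how are you?")
>>> tokenizer.convert_tokens_to_string(tokens)  # reassembles the original text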
Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def convert_tokens_to_string(self, tokens: List[str]) -> str:
    """
    Converts a list of tokens into a single string representation.

    Args:
        self (ChatGLMTokenizer): An instance of the ChatGLMTokenizer class.
        tokens (List[str]): A list of tokens to be converted into a string representation.

    Returns:
        str: The string representation of the given list of tokens.

    Raises:
        None.

    Note:
        - The tokens should be generated using the sp_tokenizer of the ChatGLMTokenizer instance.
        - The resulting string may contain whitespace and punctuation marks based on the original tokenization.

    Example:
        ```python
        >>> tokenizer = ChatGLMTokenizer.from_pretrained('THUDM/chatglm-6b')
        >>> tokens = ['Hello', ',', 'how', 'are', 'you', '?']
        >>> string_representation = tokenizer.convert_tokens_to_string(tokens)
        ```
    """
    return self.sp_tokenizer.decode_tokens(tokens)

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.get_vocab()

Returns vocab as a dict

Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def get_vocab(self):
    """ Returns vocab as a dict """
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab
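
The returned mapping goes from token string to id, so it can be sized or inverted directly (the exact contents depend on the loaded SentencePiece model):

>>> vocab = tokenizer.get_vocab()
>>> id_to_token = {i: t for t, i in vocab.items()}  # invert to id -> token
>>> len(vocab)  # roughly tokenizer.vocab_size plus any added tokens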

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.preprocess_text(inputs)

Preprocesses the input text according to the tokenizer configuration (optional whitespace normalization and lower-casing).

PARAMETER DESCRIPTION
self

The instance of the ChatGLMTokenizer class.

TYPE: ChatGLMTokenizer

inputs

The input text to be preprocessed.

TYPE: str

RETURNS DESCRIPTION
str

The preprocessed text based on the specified configuration.

  • If self.remove_space is True, leading and trailing spaces are removed, and consecutive spaces within the text are replaced with a single space.
  • If self.do_lower_case is True, the text is converted to lowercase. The preprocessed text is returned.
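
The two flags act independently, and with the defaults (both False) the text passes through unchanged. A sketch with both enabled (the vocabulary path is illustrative):

>>> tok = ChatGLMTokenizer(vocab_file="path/to/ice_text.model", do_lower_case=True, remove_space=True)
>>> tok.preprocess_text("  Hello   World  ")
'hello world'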
Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def preprocess_text(self, inputs):
    """
    preprocess_text method in the ChatGLMTokenizer class preprocesses the input text based on the specified configuration.

    Args:
        self (ChatGLMTokenizer): The instance of the ChatGLMTokenizer class.
        inputs (str): The input text to be preprocessed.

    Returns:
        str:
            The preprocessed text based on the specified configuration.

            - If self.remove_space is True, leading and trailing spaces are removed,
            and consecutive spaces within the text are replaced with a  single space.
            - If self.do_lower_case is True, the text is converted to lowercase. The preprocessed text is returned.

    Raises:
        None
    """
    if self.remove_space:
        outputs = " ".join(inputs.strip().split())
    else:
        outputs = inputs

    if self.do_lower_case:
        outputs = outputs.lower()

    return outputs

mindnlp.transformers.models.chatglm.tokenization_chatglm.ChatGLMTokenizer.save_vocabulary(save_directory, filename_prefix=None)

Save the vocabulary and special tokens file to a directory.

PARAMETER DESCRIPTION
save_directory

The directory in which to save the vocabulary.

TYPE: `str`

filename_prefix

An optional prefix to add to the name of the saved files.

TYPE: `str`, *optional* DEFAULT: None

RETURNS DESCRIPTION

Tuple(str): Paths to the files saved.
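
In practice the method simply copies the underlying `ice_text.model` file into the target directory; note that `filename_prefix` is accepted but ignored by this implementation, and the directory must already exist (the path below is illustrative):

>>> paths = tokenizer.save_vocabulary("./exported")  # './exported' must be an existing directory
>>> paths
('./exported/ice_text.model',)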

Source code in mindnlp/transformers/models/chatglm/tokenization_chatglm.py
def save_vocabulary(self, save_directory, filename_prefix=None):
    """
    Save the vocabulary and special tokens file to a directory.

    Args:
        save_directory (`str`):
            The directory in which to save the vocabulary.
        filename_prefix (`str`, *optional*):
            An optional prefix to add to the name of the saved files.

    Returns:
        `Tuple(str)`: Paths to the files saved.
    """
    if os.path.isdir(save_directory):
        vocab_file = os.path.join(
            save_directory, self.vocab_files_names["vocab_file"]
        )
    else:
        vocab_file = save_directory

    with open(self.vocab_file, 'rb') as fin:
        proto_str = fin.read()

    with open(vocab_file, "wb") as writer:
        writer.write(proto_str)

    return (vocab_file,)

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel

Bases: MSChatGLMPreTrainedModel

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as a decoder, the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to be initialized with both the is_decoder argument and add_cross_attention set to True; an encoder_hidden_states is then expected as an input to the forward pass.
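
The graph-mode model is wired up from the same `ChatGLMConfig` described earlier. A minimal sketch with a deliberately tiny configuration (the full default configuration allocates the complete 6B-parameter weight set in float16, so the small values below are purely illustrative):

>>> from mindnlp.transformers.models.chatglm.configuration_chatglm import ChatGLMConfig
>>> from mindnlp.transformers.models.chatglm.modeling_graph_chatglm import MSChatGLMModel
...
>>> config = ChatGLMConfig(num_layers=2, hidden_size=128, num_attention_heads=4,
...                        inner_hidden_size=512, vocab_size=1000)
>>> model = MSChatGLMModel(config)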

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
class MSChatGLMModel(MSChatGLMPreTrainedModel):
    """

    The model can behave as an encoder (with only self-attention) as well
    as a decoder, in which case a layer of cross-attention is added between
    the self-attention layers, following the architecture described in [Attention is
    all you need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani,
    Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as a decoder the model needs to be initialized with the
    `is_decoder` argument of the configuration set to `True`.
    To be used in a Seq2Seq model, the model needs to be initialized with both the `is_decoder`
    argument and `add_cross_attention` set to `True`; an
    `encoder_hidden_states` is then expected as an input to the forward pass.
    """
    def __init__(self, config: ChatGLMConfig):
        """
        Initializes an instance of the MSChatGLMModel class with the provided configuration.

        Args:
            self: The instance of the MSChatGLMModel class.
            config (ChatGLMConfig):
                The configuration for the model.

                - max_sequence_length (int): The maximum sequence length for the input.
                - hidden_size (int): The size of the hidden layer.
                - num_attention_heads (int): The number of attention heads.
                - vocab_size (int): The size of the vocabulary.
                - num_layers (int): The number of layers for the model.
                - layernorm_epsilon (float): The epsilon value for the layer normalization.
                - inner_hidden_size (int): The size of the inner hidden layer.
                - position_encoding_2d (bool): Whether to use 2D position encoding.
                - pre_seq_len (int): The length of the prefix sequence.
                - prefix_projection (bool): Whether to use prefix projection.
                - use_cache (bool): Whether to use cache.
                - output_hidden_states (bool): Whether to output hidden states.
                - output_attentions (bool): Whether to output attentions.

        Returns:
            None

        Raises:
            None
        """
        super().__init__(config)
        # recording parameters
        self.max_sequence_length = config.max_sequence_length
        self.hidden_size = config.hidden_size
        self.params_dtype = mindspore.float16
        self.num_attention_heads = config.num_attention_heads
        self.vocab_size = config.vocab_size
        self.num_layers = config.num_layers
        self.layernorm_epsilon = config.layernorm_epsilon
        self.inner_hidden_size = config.inner_hidden_size
        self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
        self.position_encoding_2d = config.position_encoding_2d
        self.pre_seq_len = config.pre_seq_len
        self.prefix_projection = config.prefix_projection

        self.use_cache = config.use_cache
        self.output_hidden_states = config.output_hidden_states
        self.output_attentions = config.output_attentions

        self.word_embeddings = nn.Embedding(
            self.vocab_size, self.hidden_size,
            dtype=self.params_dtype
        )

        def get_layer(layer_id):
            return GLMBlock(
                config,
                self.hidden_size,
                self.num_attention_heads,
                self.layernorm_epsilon,
                layer_id,
                inner_hidden_size=self.inner_hidden_size,
                hidden_size_per_attention_head=self.hidden_size_per_attention_head,
                use_bias=True,
                params_dtype=self.params_dtype,
                position_encoding_2d=self.position_encoding_2d,
            )

        self.layers = nn.ModuleList(
            [get_layer(layer_id) for layer_id in range(self.num_layers)]
        )
        # Final layer norm before output.
        self.final_layernorm = nn.LayerNorm([self.hidden_size], eps=self.layernorm_epsilon)

        if self.pre_seq_len is not None:
            # for param in self.parameters():
            #     param.requires_grad = False
            self.prefix_tokens = Tensor(np.arange(self.pre_seq_len))
            self.prefix_encoder = PrefixEncoder(config)
            self.dropout = nn.Dropout(p=0.1)

            # total_params = sum(p.numel() for p in self.parameters())
            # trainable_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
            # print("Using p-tuning v2: # trainable_params = {} / {}".format(trainable_params, total_params))

    def get_input_embeddings(self):
        """
        Retrieve the input embeddings for the MSChatGLMModel.

        Args:
            self (MSChatGLMModel): An instance of the MSChatGLMModel class.

        Returns:
            None.

        Raises:
            None.
        """
        return self.word_embeddings

    def set_input_embeddings(self, new_embeddings: mindspore.Tensor):
        """
        Sets the input embeddings for the MSChatGLMModel.

        Args:
            self (MSChatGLMModel): The instance of the MSChatGLMModel class.
            new_embeddings (mindspore.Tensor): The new embeddings to be set as input.
                It should be a tensor object representing the word embeddings.

        Returns:
            None.

        Raises:
            None.

        Note:
            The input embeddings are used for representing words in the MSChatGLMModel.
            By setting new embeddings, the model can be fine-tuned or customized to use different word representations.
        """
        self.word_embeddings = new_embeddings

    def get_prompt(self, batch_size, dtype=mindspore.float16):
        """get prompt."""
        prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1)
        past_key_values = self.prefix_encoder(prefix_tokens).type(dtype)
        past_key_values = past_key_values.view(
            batch_size,
            self.pre_seq_len,
            self.num_layers * 2,
            self.num_attention_heads,
            self.hidden_size // self.num_attention_heads
        )
        # seq_len, b, nh, hidden_size
        past_key_values = self.dropout(past_key_values)
        past_key_values = past_key_values.permute([2, 1, 0, 3, 4]).split(2)
        # past_key_values = [(v[0], v[1]) for v in past_key_values]
        return past_key_values

    def forward(
            self,
            input_ids: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]] = None,
            inputs_embeds: Optional[mindspore.Tensor] = None,
    ) -> Tuple[mindspore.Tensor, ...]:
        """Constructs the MSChatGLMModel.

        This method is used to forward the MSChatGLMModel. It takes in several parameters and returns a tuple of tensors.

        Args:
            self (MSChatGLMModel): The instance of the MSChatGLMModel class.
            input_ids (Optional[mindspore.Tensor]): The input tensor representing the tokenized input sequences. Default is None.
            position_ids (Optional[mindspore.Tensor]): The input tensor representing the position ids of the tokens. Default is None.
            attention_mask (Optional[mindspore.Tensor]): The input tensor representing the attention mask. Default is None.
            past_key_values (Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]]):
                The input tensor representing the past key values. Default is None.
            inputs_embeds (Optional[mindspore.Tensor]): The input tensor representing the embedded input sequences. Default is None.

        Returns:
            Tuple[mindspore.Tensor, ...]: A tuple containing the hidden states, presents, all hidden states, and all self attentions.

        Raises:
            ValueError: If both input_ids and inputs_embeds are specified.
            ValueError: If neither input_ids nor inputs_embeds are specified.

        """
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is not None:
            batch_size, _ = input_ids.shape[:2]
        elif inputs_embeds is not None:
            batch_size, _ = inputs_embeds.shape[:2]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)

        if past_key_values is None:
            if self.pre_seq_len is not None:
                past_key_values = self.get_prompt(batch_size=input_ids.shape[0],
                                                  dtype=inputs_embeds.dtype)
            else:
                past_key_values = tuple([None] * len(self.layers))

            if attention_mask is None:
                attention_mask = self.get_masks(
                    input_ids,
                )

            if position_ids is None:
                MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
                seqs = input_ids.asnumpy().tolist()

                mask_positions, use_gmasks = [], []
                for seq in seqs:
                    mask_token = gMASK if gMASK in seq else MASK
                    use_gmask = mask_token == gMASK
                    mask_positions.append(seq.index(mask_token))
                    use_gmasks.append(use_gmask)

                position_ids = self.get_position_ids(
                    input_ids,
                    mask_positions=mask_positions,
                    use_gmasks=use_gmasks
                )

        if self.pre_seq_len is not None and attention_mask is not None:
            prefix_attention_mask = ops.ones((batch_size, 1, input_ids.shape[-1], self.pre_seq_len))
            prefix_attention_mask = (prefix_attention_mask < 0.5).bool()
            attention_mask = ops.cat((prefix_attention_mask, attention_mask), axis=3)

        # [seq_len, batch, hidden_size]
        hidden_states = inputs_embeds.swapaxes(0, 1)

        presents = ()
        all_self_attentions = ()
        all_hidden_states = ()

        if attention_mask is None:
            attention_mask = ops.zeros((1, 1)).bool()

        # past_key_values = past_key_values.chunk(self.num_layers, 0)
        for i, layer in enumerate(self.layers):
            if self.output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)
            layer_past = past_key_values[i]

            layer_ret = layer(
                hidden_states,
                position_ids=position_ids,
                attention_mask=attention_mask,
                layer_id=mindspore.Tensor(i),
                layer_past=layer_past,
                use_cache=self.use_cache,
                output_attentions=self.output_attentions
            )
            hidden_states = layer_ret[0]

            if self.use_cache:
                presents = presents + (layer_ret[1],)

            if self.output_attentions:
                idx = 2 if self.use_cache else 1
                all_self_attentions = all_self_attentions + (layer_ret[idx],)

        # Final layer norm.
        # return (hidden_states,)
        hidden_states = self.final_layernorm(hidden_states)

        if self.output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if self.use_cache:
            presents = ops.stack(presents)

        return (hidden_states, presents, all_hidden_states, all_self_attentions)
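
The tuple returned above always has four slots, with the optional entries left as empty tuples when hidden states or attentions are not requested. A heavily hedged sketch, assuming a `model` and a batch of `input_ids` containing a [gMASK] token (needed so position ids can be inferred) are already available:

>>> hidden_states, presents, all_hidden_states, all_attentions = model(input_ids)
>>> hidden_states.shape        # (seq_len, batch_size, hidden_size) after the final layer norm
>>> len(all_hidden_states)     # 0 unless config.output_hidden_states is True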

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel.__init__(config)

Initializes an instance of the MSChatGLMModel class with the provided configuration.

Args:

  • self: The instance of the MSChatGLMModel class.
  • config (ChatGLMConfig): The configuration for the model, including:

      • max_sequence_length (int): The maximum sequence length for the input.
      • hidden_size (int): The size of the hidden layer.
      • num_attention_heads (int): The number of attention heads.
      • vocab_size (int): The size of the vocabulary.
      • num_layers (int): The number of layers for the model.
      • layernorm_epsilon (float): The epsilon value for the layer normalization.
      • inner_hidden_size (int): The size of the inner hidden layer.
      • position_encoding_2d (bool): Whether to use 2D position encoding.
      • pre_seq_len (int): The length of the prefix sequence.
      • prefix_projection (bool): Whether to use prefix projection.
      • use_cache (bool): Whether to use cache.
      • output_hidden_states (bool): Whether to output hidden states.
      • output_attentions (bool): Whether to output attentions.

Returns:

  • None

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def __init__(self, config: ChatGLMConfig):
    """
    Initializes an instance of the MSChatGLMModel class with the provided configuration.

    Args:
        self: The instance of the MSChatGLMModel class.
        config (ChatGLMConfig):
            The configuration for the model.

            - max_sequence_length (int): The maximum sequence length for the input.
            - hidden_size (int): The size of the hidden layer.
            - num_attention_heads (int): The number of attention heads.
            - vocab_size (int): The size of the vocabulary.
            - num_layers (int): The number of layers for the model.
            - layernorm_epsilon (float): The epsilon value for the layer normalization.
            - inner_hidden_size (int): The size of the inner hidden layer.
            - position_encoding_2d (bool): Whether to use 2D position encoding.
            - pre_seq_len (int): The length of the prefix sequence.
            - prefix_projection (bool): Whether to use prefix projection.
            - use_cache (bool): Whether to use cache.
            - output_hidden_states (bool): Whether to output hidden states.
            - output_attentions (bool): Whether to output attentions.

    Returns:
        None

    Raises:
        None
    """
    super().__init__(config)
    # recording parameters
    self.max_sequence_length = config.max_sequence_length
    self.hidden_size = config.hidden_size
    self.params_dtype = mindspore.float16
    self.num_attention_heads = config.num_attention_heads
    self.vocab_size = config.vocab_size
    self.num_layers = config.num_layers
    self.layernorm_epsilon = config.layernorm_epsilon
    self.inner_hidden_size = config.inner_hidden_size
    self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
    self.position_encoding_2d = config.position_encoding_2d
    self.pre_seq_len = config.pre_seq_len
    self.prefix_projection = config.prefix_projection

    self.use_cache = config.use_cache
    self.output_hidden_states = config.output_hidden_states
    self.output_attentions = config.output_attentions

    self.word_embeddings = nn.Embedding(
        self.vocab_size, self.hidden_size,
        dtype=self.params_dtype
    )

    def get_layer(layer_id):
        return GLMBlock(
            config,
            self.hidden_size,
            self.num_attention_heads,
            self.layernorm_epsilon,
            layer_id,
            inner_hidden_size=self.inner_hidden_size,
            hidden_size_per_attention_head=self.hidden_size_per_attention_head,
            use_bias=True,
            params_dtype=self.params_dtype,
            position_encoding_2d=self.position_encoding_2d,
        )

    self.layers = nn.ModuleList(
        [get_layer(layer_id) for layer_id in range(self.num_layers)]
    )
    # Final layer norm before output.
    self.final_layernorm = nn.LayerNorm([self.hidden_size], eps=self.layernorm_epsilon)

    if self.pre_seq_len is not None:
        # for param in self.parameters():
        #     param.requires_grad = False
        self.prefix_tokens = Tensor(np.arange(self.pre_seq_len))
        self.prefix_encoder = PrefixEncoder(config)
        self.dropout = nn.Dropout(p=0.1)
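
The constructor above wires every submodule from the config and keeps parameters in float16. The sketch below is a minimal, illustrative instantiation; the tiny config values and the configuration_chatglm import path are assumptions, not ChatGLM-6B defaults.

from mindnlp.transformers.models.chatglm.configuration_chatglm import ChatGLMConfig  # assumed path
from mindnlp.transformers.models.chatglm.modeling_graph_chatglm import MSChatGLMModel

# Deliberately tiny, assumed hyper-parameters -- only enough to exercise the constructor.
config = ChatGLMConfig(
    vocab_size=130528,
    hidden_size=64,
    inner_hidden_size=256,
    num_attention_heads=4,
    num_layers=2,
)

model = MSChatGLMModel(config)
# word_embeddings, the GLMBlock stack and final_layernorm are now built;
# embeddings and blocks use params_dtype (float16) as set in __init__ above.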

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel.forward(input_ids=None, position_ids=None, attention_mask=None, past_key_values=None, inputs_embeds=None)

Runs the forward pass of the MSChatGLMModel.

This method takes token ids or precomputed embeddings, derives any missing attention mask and position ids, runs the sequence through every GLM block, and returns a tuple of tensors.

Args:

  • self (MSChatGLMModel): The instance of the MSChatGLMModel class.
  • input_ids (Optional[mindspore.Tensor], default: None): The input tensor representing the tokenized input sequences.
  • position_ids (Optional[mindspore.Tensor], default: None): The input tensor representing the position ids of the tokens.
  • attention_mask (Optional[mindspore.Tensor], default: None): The input tensor representing the attention mask.
  • past_key_values (Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]], default: None): The input tensor representing the past key values.
  • inputs_embeds (Optional[mindspore.Tensor], default: None): The input tensor representing the embedded input sequences.

Returns:

  • Tuple[mindspore.Tensor, ...]: A tuple containing the hidden states, presents, all hidden states, and all self attentions.

Raises:

  • ValueError: If both input_ids and inputs_embeds are specified.
  • ValueError: If neither input_ids nor inputs_embeds is specified.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]] = None,
        inputs_embeds: Optional[mindspore.Tensor] = None,
) -> Tuple[mindspore.Tensor, ...]:
    """Constructs the MSChatGLMModel.

    This method is used to forward the MSChatGLMModel. It takes in several parameters and returns a tuple of tensors.

    Args:
        self (MSChatGLMModel): The instance of the MSChatGLMModel class.
        input_ids (Optional[mindspore.Tensor]): The input tensor representing the tokenized input sequences. Default is None.
        position_ids (Optional[mindspore.Tensor]): The input tensor representing the position ids of the tokens. Default is None.
        attention_mask (Optional[mindspore.Tensor]): The input tensor representing the attention mask. Default is None.
        past_key_values (Optional[Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]]):
            The input tensor representing the past key values. Default is None.
        inputs_embeds (Optional[mindspore.Tensor]): The input tensor representing the embedded input sequences. Default is None.

    Returns:
        Tuple[mindspore.Tensor, ...]: A tuple containing the hidden states, presents, all hidden states, and all self attentions.

    Raises:
        ValueError: If both input_ids and inputs_embeds are specified.
        ValueError: If neither input_ids nor inputs_embeds are specified.

    """
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is not None:
        batch_size, _ = input_ids.shape[:2]
    elif inputs_embeds is not None:
        batch_size, _ = inputs_embeds.shape[:2]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)

    if past_key_values is None:
        if self.pre_seq_len is not None:
            past_key_values = self.get_prompt(batch_size=input_ids.shape[0],
                                              dtype=inputs_embeds.dtype)
        else:
            past_key_values = tuple([None] * len(self.layers))

        if attention_mask is None:
            attention_mask = self.get_masks(
                input_ids,
            )

        if position_ids is None:
            MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
            seqs = input_ids.asnumpy().tolist()

            mask_positions, use_gmasks = [], []
            for seq in seqs:
                mask_token = gMASK if gMASK in seq else MASK
                use_gmask = mask_token == gMASK
                mask_positions.append(seq.index(mask_token))
                use_gmasks.append(use_gmask)

            position_ids = self.get_position_ids(
                input_ids,
                mask_positions=mask_positions,
                use_gmasks=use_gmasks
            )

    if self.pre_seq_len is not None and attention_mask is not None:
        prefix_attention_mask = ops.ones((batch_size, 1, input_ids.shape[-1], self.pre_seq_len))
        prefix_attention_mask = (prefix_attention_mask < 0.5).bool()
        attention_mask = ops.cat((prefix_attention_mask, attention_mask), axis=3)

    # [seq_len, batch, hidden_size]
    hidden_states = inputs_embeds.swapaxes(0, 1)

    presents = ()
    all_self_attentions = ()
    all_hidden_states = ()

    if attention_mask is None:
        attention_mask = ops.zeros((1, 1)).bool()

    # past_key_values = past_key_values.chunk(self.num_layers, 0)
    for i, layer in enumerate(self.layers):
        if self.output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)
        layer_past = past_key_values[i]

        layer_ret = layer(
            hidden_states,
            position_ids=position_ids,
            attention_mask=attention_mask,
            layer_id=mindspore.Tensor(i),
            layer_past=layer_past,
            use_cache=self.use_cache,
            output_attentions=self.output_attentions
        )
        hidden_states = layer_ret[0]

        if self.use_cache:
            presents = presents + (layer_ret[1],)

        if self.output_attentions:
            idx = 2 if self.use_cache else 1
            all_self_attentions = all_self_attentions + (layer_ret[idx],)

    # Final layer norm.
    # return (hidden_states,)
    hidden_states = self.final_layernorm(hidden_states)

    if self.output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if self.use_cache:
        presents = ops.stack(presents)

    return (hidden_states, presents, all_hidden_states, all_self_attentions)
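
Because this forward always returns a plain 4-tuple, callers unpack it positionally. Below is a minimal sketch, assuming eager execution and that the model and config from the constructor example are available; the token ids are hypothetical but end with the [gMASK] and <bos> ids the ChatGLM tokenizer would append, since the mask and position ids are derived from them when not supplied.

import numpy as np
import mindspore

ids = np.array([[5, 6, config.gmask_token_id, config.bos_token_id]], dtype=np.int64)
input_ids = mindspore.Tensor(ids)

hidden_states, presents, all_hidden_states, all_self_attentions = model(input_ids)

# hidden_states is laid out [seq_len, batch, hidden_size] (note the swapaxes(0, 1) above).
# presents is a stacked key/value cache when config.use_cache is True, otherwise an empty tuple.
# all_hidden_states / all_self_attentions stay empty unless the corresponding
# config.output_hidden_states / config.output_attentions flags are enabled.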

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel.get_input_embeddings()

Retrieve the input embeddings for the MSChatGLMModel.

Args:

  • self (MSChatGLMModel): An instance of the MSChatGLMModel class.

Returns:

  • nn.Embedding: The word embedding layer used to embed input token ids.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def get_input_embeddings(self):
    """
    Retrieve the input embeddings for the MSChatGLMModel.

    Args:
        self (MSChatGLMModel): An instance of the MSChatGLMModel class.

    Returns:
        None.

    Raises:
        None.
    """
    return self.word_embeddings

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel.get_prompt(batch_size, dtype=mindspore.float16)

get prompt.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def get_prompt(self, batch_size, dtype=mindspore.float16):
    """get prompt."""
    prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1)
    past_key_values = self.prefix_encoder(prefix_tokens).type(dtype)
    past_key_values = past_key_values.view(
        batch_size,
        self.pre_seq_len,
        self.num_layers * 2,
        self.num_attention_heads,
        self.hidden_size // self.num_attention_heads
    )
    # seq_len, b, nh, hidden_size
    past_key_values = self.dropout(past_key_values)
    past_key_values = past_key_values.permute([2, 1, 0, 3, 4]).split(2)
    # past_key_values = [(v[0], v[1]) for v in past_key_values]
    return past_key_values
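
The method reshapes the prefix-encoder output into per-layer key/value blocks. The numpy walk-through below mirrors the view/permute/split sequence with small assumed sizes, purely to make the shape bookkeeping visible.

import numpy as np

# Assumed toy sizes: batch=2, pre_seq_len=4, num_layers=3, heads=2, head_dim=8.
batch, pre_seq_len, num_layers, heads, head_dim = 2, 4, 3, 2, 8

prefix = np.zeros((batch, pre_seq_len, num_layers * 2, heads, head_dim), dtype=np.float16)
# permute([2, 1, 0, 3, 4]) -> (num_layers * 2, pre_seq_len, batch, heads, head_dim)
prefix = prefix.transpose(2, 1, 0, 3, 4)
# split(2) along axis 0 -> one chunk of size 2 per layer, i.e. a stacked (key, value) pair
chunks = np.split(prefix, num_layers, axis=0)
print(chunks[0].shape)  # (2, 4, 2, 2, 8)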

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMModel.set_input_embeddings(new_embeddings)

Sets the input embeddings for the MSChatGLMModel.

Args:

  • self (MSChatGLMModel): The instance of the MSChatGLMModel class.
  • new_embeddings (mindspore.Tensor): The new embeddings to be set as input. It should be a tensor object representing the word embeddings.

Returns:

  • None.

Note:

The input embeddings are used for representing words in the MSChatGLMModel. By setting new embeddings, the model can be fine-tuned or customized to use different word representations.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def set_input_embeddings(self, new_embeddings: mindspore.Tensor):
    """
    Sets the input embeddings for the MSChatGLMModel.

    Args:
        self (MSChatGLMModel): The instance of the MSChatGLMModel class.
        new_embeddings (mindspore.Tensor): The new embeddings to be set as input.
            It should be a tensor object representing the word embeddings.

    Returns:
        None.

    Raises:
        None.

    Note:
        The input embeddings are used for representing words in the MSChatGLMModel.
        By setting new embeddings, the model can be fine-tuned or customized to use different word representations.
    """
    self.word_embeddings = new_embeddings
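
A typical reason to call this setter is replacing the embedding table, for example after growing the vocabulary for fine-tuning. A minimal sketch; the nn import path and the extra-token count are assumptions, and copying the old rows into the new table is omitted.

import mindspore
from mindnlp.core import nn  # assumption: the torch-like nn module used by this model file

old_embeddings = model.get_input_embeddings()
new_embeddings = nn.Embedding(config.vocab_size + 8, config.hidden_size, dtype=mindspore.float16)
# (copy old_embeddings' weights into new_embeddings here if they should be preserved)
model.set_input_embeddings(new_embeddings)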

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
class MSChatGLMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and
    a simple interface for downloading and loading pretrained models.
    """
    is_parallelizable = False
    config_class = ChatGLMConfig
    base_model_prefix = "transformer"
    _no_split_modules = ["GLMBlock"]
    _keys_to_ignore_on_load_unexpected = [r'inv_freq']

    def _init_weights(self, cell: nn.Module):
        """Initialize the weights."""
    def get_masks(self, input_ids):
        """get masks"""
        batch_size, seq_length = input_ids.shape
        context_lengths = [seq.asnumpy().tolist().index(self.config.bos_token_id) for seq in input_ids]
        attention_mask = ops.ones((batch_size, seq_length, seq_length))
        attention_mask = attention_mask.tril()
        for i, context_length in enumerate(context_lengths):
            attention_mask[i, :, :context_length] = 1
        attention_mask = attention_mask.unsqueeze(1)
        attention_mask = (attention_mask < 0.5).bool()

        return attention_mask

    def get_position_ids(self, input_ids, mask_positions, use_gmasks=None):
        """get position ids"""
        batch_size, seq_length = input_ids.shape
        if use_gmasks is None:
            use_gmasks = [False] * batch_size
        context_lengths = [seq.asnumpy().tolist().index(self.config.bos_token_id) for seq in input_ids]
        if self.position_encoding_2d:
            position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
            for i, context_length in enumerate(context_lengths):
                position_ids[i, context_length:] = mask_positions[i]
            block_position_ids = [ops.cat((
                ops.zeros(context_length, dtype=mindspore.int64),
                ops.arange(seq_length - context_length, dtype=mindspore.int64) + 1
            )) for context_length in context_lengths]
            block_position_ids = ops.stack(block_position_ids, axis=0)
            position_ids = ops.stack((position_ids, block_position_ids), axis=1)
        else:
            position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
            for i, context_length in enumerate(context_lengths):
                if not use_gmasks[i]:
                    position_ids[i, context_length:] = mask_positions[i]

        return position_ids

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMPreTrainedModel.get_masks(input_ids)

get masks

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def get_masks(self, input_ids):
    """get masks"""
    batch_size, seq_length = input_ids.shape
    context_lengths = [seq.asnumpy().tolist().index(self.config.bos_token_id) for seq in input_ids]
    attention_mask = ops.ones((batch_size, seq_length, seq_length))
    attention_mask = attention_mask.tril()
    for i, context_length in enumerate(context_lengths):
        attention_mask[i, :, :context_length] = 1
    attention_mask = attention_mask.unsqueeze(1)
    attention_mask = (attention_mask < 0.5).bool()

    return attention_mask
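
The returned mask is boolean with True marking positions a query may not attend to: tokens before <bos> see the whole context bidirectionally, tokens from <bos> onwards are causal. A small inspection sketch, assuming the same model/config as above and a hypothetical 5-token sequence whose fourth position is the BOS token:

import numpy as np
import mindspore

ids = np.array([[11, 12, config.gmask_token_id, config.bos_token_id, 13]], dtype=np.int64)
mask = model.get_masks(mindspore.Tensor(ids))

print(mask.shape)     # (batch, 1, seq_len, seq_len)
print(mask[0, 0, 0])  # [False, False, False, True, True]: the context is visible,
                      # positions from <bos> onwards are masked for this query.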

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMPreTrainedModel.get_position_ids(input_ids, mask_positions, use_gmasks=None)

get position ids

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def get_position_ids(self, input_ids, mask_positions, use_gmasks=None):
    """get position ids"""
    batch_size, seq_length = input_ids.shape
    if use_gmasks is None:
        use_gmasks = [False] * batch_size
    context_lengths = [seq.asnumpy().tolist().index(self.config.bos_token_id) for seq in input_ids]
    if self.position_encoding_2d:
        position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
        for i, context_length in enumerate(context_lengths):
            position_ids[i, context_length:] = mask_positions[i]
        block_position_ids = [ops.cat((
            ops.zeros(context_length, dtype=mindspore.int64),
            ops.arange(seq_length - context_length, dtype=mindspore.int64) + 1
        )) for context_length in context_lengths]
        block_position_ids = ops.stack(block_position_ids, axis=0)
        position_ids = ops.stack((position_ids, block_position_ids), axis=1)
    else:
        position_ids = ops.arange(seq_length, dtype=mindspore.int64).unsqueeze(0).tile((batch_size, 1))
        for i, context_length in enumerate(context_lengths):
            if not use_gmasks[i]:
                position_ids[i, context_length:] = mask_positions[i]

    return position_ids
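
With position_encoding_2d enabled (the ChatGLM default), two id streams are stacked per sequence: absolute positions that freeze at the [gMASK] position once the context ends, and block positions that count up from 1 after <bos>. A quick check under the same assumptions as the mask example:

import numpy as np
import mindspore

ids = np.array([[11, 12, config.gmask_token_id, config.bos_token_id, 13]], dtype=np.int64)
pos = model.get_position_ids(mindspore.Tensor(ids), mask_positions=[2], use_gmasks=[True])

print(pos.shape)   # (batch, 2, seq_len) in the 2D case
print(pos[0, 0])   # [0, 1, 2, 2, 2]  -- frozen at the mask position after the context
print(pos[0, 1])   # [0, 0, 0, 1, 2]  -- block positions start counting after <bos>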

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration

Bases: MSChatGLMPreTrainedModel

MSChatGLMForConditionalGeneration

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
class MSChatGLMForConditionalGeneration(MSChatGLMPreTrainedModel):
    """MSChatGLMForConditionalGeneration"""
    def __init__(self, config: ChatGLMConfig):
        """
        Initializes an instance of the MSChatGLMForConditionalGeneration class.

        Args:
            self: The instance of the MSChatGLMForConditionalGeneration class.
            config (ChatGLMConfig):
                An object of type ChatGLMConfig containing configuration parameters for the model.

                - max_sequence_length (int): The maximum length of input sequences.
                - position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
                - quantization_bit (int): Number of bits to use for quantization.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)

        self.max_sequence_length = config.max_sequence_length
        self.position_encoding_2d = config.position_encoding_2d
        self.transformer = MSChatGLMModel(config)
        self.lm_head = nn.Linear(
            config.hidden_size,
            config.vocab_size,
            bias=False,
            dtype=mindspore.float16
        )
        self.quantized = False

        if self.config.quantization_bit:
            self.quantize(self.config.quantization_bit, empty_init=True)

    def get_output_embeddings(self):
        """
        Returns the output embeddings of the MSChatGLMForConditionalGeneration model.

        Args:
            self: The instance of the MSChatGLMForConditionalGeneration class.

        Returns:
            returns the output embeddings of the model as a tensor.

        Raises:
            None.

        This method retrieves the output embeddings of the MSChatGLMForConditionalGeneration model.
        The output embeddings are the final representations of the input tokens after being processed by the model's
        language model head. The embeddings are returned as a tensor.
        """
        return self.lm_head

    def set_output_embeddings(self, new_embeddings):
        """
        Set the output embeddings for the MSChatGLMForConditionalGeneration model.

        Args:
            self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
            new_embeddings (object): The new embeddings to be set as the output embeddings for the model.
                It can be of any valid type.

        Returns:
            None.

        Raises:
            None.
        """
        self.lm_head = new_embeddings

    def _update_model_kwargs_for_generation(
        self,
        outputs,
        model_kwargs: Dict[str, Any],
        is_encoder_decoder: bool = False,
        standardize_cache_format: bool = False,
    ) -> Dict[str, Any]:
        """
        This method '_update_model_kwargs_for_generation' in the class 'MSChatGLMForConditionalGeneration' updates
        the model_kwargs for generation based on the provided outputs and other parameters.

        Args:
            self: The instance of the class.
            outputs: The model outputs that are used to update the model_kwargs.
            model_kwargs (Dict[str, Any]): A dictionary containing keyword arguments for the model.
            is_encoder_decoder (bool): A boolean indicating whether the model is an encoder-decoder model. Default is False.
            standardize_cache_format (bool): A boolean indicating whether to standardize the cache format. Default is False.

        Returns:
            Dict[str, Any]: A dictionary containing updated keyword arguments for the model.

        Raises:
            ValueError: If the provided attention_mask has an unsupported data type.
            IndexError: If there are issues with indexing while updating position_ids.
        """
        # update past_key_values
        model_kwargs["past_key_values"] = self._extract_past_from_model_output(
            outputs, standardize_cache_format=standardize_cache_format
        )

        # update attention mask
        if "attention_mask" in model_kwargs:
            attention_mask = model_kwargs["attention_mask"]
            if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
                attention_mask = ops.cat(
                    [attention_mask, attention_mask.new_ones((*attention_mask.shape[:3], 1))], axis=3)
                new_attention_mask = attention_mask[:, :, -1:].copy()
                new_attention_mask[..., -1] = False
                model_kwargs["attention_mask"] = ops.cat(
                    [attention_mask, new_attention_mask], axis=2
                )

        # update position ids
        if "position_ids" in model_kwargs:
            position_ids = model_kwargs["position_ids"]
            new_position_id = position_ids[..., -1:].copy()
            new_position_id[:, 1, :] += 1
            model_kwargs["position_ids"] = ops.cat(
                [position_ids, new_position_id], axis=-1
            )

        return model_kwargs

    def prepare_inputs_for_generation(
            self,
            input_ids: mindspore.Tensor,
            past: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            **kwargs
    ) -> dict:
        """
        This method prepares inputs for generation in the MSChatGLMForConditionalGeneration class.

        Args:
            self: The instance of the class.
            input_ids (mindspore.Tensor): The input tensor containing token ids.
            past (Optional[mindspore.Tensor]): The past state tensor (default is None).
            past_key_values (Optional[mindspore.Tensor]): The past key values tensor (default is None).
            attention_mask (Optional[mindspore.Tensor]): The attention mask tensor (default is None).
            position_ids (Optional[mindspore.Tensor]): The position ids tensor (default is None).
            **kwargs: Additional keyword arguments.

        Returns:
            dict: A dictionary containing the prepared inputs for generation including 'input_ids', 'past_key_values',
                'position_ids', and 'attention_mask'.

        Raises:
            TypeError: If the input arguments are not of the expected types.
            ValueError: If there are issues with the input data or configuration.
            IndexError: If there are index out of bounds errors during processing.
            Warning: If there are issues with the dtype of attention mask.
        """
        batch_size, seq_length = input_ids.shape

        if self.get_inputs() is None:
            self.set_inputs(
                Tensor(shape=[batch_size, None], dtype=mindspore.int64), # input_ids
                Tensor(shape=[batch_size, 2, None], dtype=mindspore.int64), # position_ids
                Tensor(shape=[batch_size, 1, None, None], dtype=mindspore.bool_), # attention_mask
                Tensor(shape=[self.config.num_layers, 2, None, batch_size, 32, 128], dtype=mindspore.float16) # past_key_values
            )
        MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
        seqs = input_ids.asnumpy().tolist()
        mask_positions, use_gmasks = [], []
        for seq in seqs:
            mask_token = gMASK if gMASK in seq else MASK
            use_gmask = mask_token == gMASK
            mask_positions.append(seq.index(mask_token))
            use_gmasks.append(use_gmask)

        # only last token for input_ids if past is not None
        if past is not None or past_key_values is not None:
            # past_key_values = ops.stack([ops.stack(past_key_values[i]) for i in range(self.config.num_layers)])
            last_token = input_ids[:, -1].unsqueeze(-1)
            if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
                attention_mask = attention_mask[:, :, -1:]
            else:
                attention_mask = None

            if attention_mask is None:
                attention_mask = ops.zeros((1, 1, 1, 1)).bool()

            if position_ids is not None:
                position_ids = position_ids[..., -1:]
            else:
                context_lengths = [seq.index(self.config.bos_token_id) for seq in seqs]
                if self.position_encoding_2d:
                    position_ids = mindspore.Tensor(
                        [[mask_position, seq_length - context_length] for mask_position, context_length in
                         zip(mask_positions, context_lengths)], dtype=mindspore.int64).unsqueeze(-1)
                else:
                    position_ids = mindspore.Tensor(mask_positions, dtype=mindspore.int64).unsqueeze(-1)

            if past is None:
                past = past_key_values
            return {
                "input_ids": last_token,
                "past_key_values": past,
                "position_ids": position_ids,
                "attention_mask": attention_mask
            }
        else:
            if attention_mask is not None and attention_mask.dtype != mindspore.bool_:
                logger.warning_once(f"The dtype of attention mask ({attention_mask.dtype}) is not bool")
                attention_mask = None
            if attention_mask is None:
                attention_mask = self.get_masks(input_ids)
            if position_ids is None:
                position_ids = self.get_position_ids(input_ids, mask_positions=mask_positions, use_gmasks=use_gmasks)

            past_key_values = ops.zeros((28, 2, input_ids.shape[1], 1, 32, 128), dtype=mindspore.float16)
            return {
                "input_ids": input_ids,
                "past_key_values": past_key_values,
                "position_ids": position_ids,
                "attention_mask": attention_mask
            }

    def forward(
            self,
            input_ids: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            past_key_values: Optional[Tuple[mindspore.Tensor]] = None,
            **kwargs
    ):
        """
        Constructs the MSChatGLMForConditionalGeneration model.

        Args:
            self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
            input_ids (Optional[mindspore.Tensor]):
                The input tensor containing the tokenized input sequence. Default is None.
            position_ids (Optional[mindspore.Tensor]):
                The tensor containing the position indices for each token in the input sequence. Default is None.
            attention_mask (Optional[mindspore.Tensor]):
                The mask tensor indicating which elements in the input sequence should be attended to. Default is None.
            past_key_values (Optional[Tuple[mindspore.Tensor]]):
                The tuple of tensors containing the key-value pairs from the previous attention pass. Default is None.
            **kwargs: Additional keyword arguments.

        Returns:
            dict:
                A dictionary containing the following keys:

                - 'loss' (None): The loss value. Always None.
                - 'logits' (mindspore.Tensor): The output logits tensor of shape (batch_size, sequence_length, vocab_size).
                - 'past_key_values' (Tuple[mindspore.Tensor]): The tuple of tensors containing the key-value pairs from the current attention pass.
                - 'hidden_states' (mindspore.Tensor): The hidden states tensor of shape (batch_size, sequence_length, hidden_size).
                - 'attentions' (mindspore.Tensor): The attention tensor of shape (batch_size, num_heads, sequence_length, sequence_length).

        Raises:
            None.
        """
        transformer_outputs = self.transformer(
            input_ids=input_ids,
            position_ids=position_ids,
            attention_mask=attention_mask,
            past_key_values=past_key_values,
            inputs_embeds=None,
        )

        hidden_states = transformer_outputs[0]

        # return (hidden_states,)
        lm_logits = self.lm_head(hidden_states).permute(1, 0, 2)

        loss = None

        return {'loss': loss, 'logits': lm_logits,
                'past_key_values': transformer_outputs[1],
                'hidden_states': transformer_outputs[2],
                'attentions': transformer_outputs[3]
            }

    @staticmethod
    def _reorder_cache(
            past: Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...], beam_idx: mindspore.Tensor
    ) -> Tuple[Tuple[mindspore.Tensor, mindspore.Tensor], ...]:
        """
        This function is used to re-order the `past_key_values` cache if [`~PreTrainedModel.beam_search`] or
        [`~PreTrainedModel.beam_sample`] is called. This is required to match `past_key_values` with the correct
        beam_idx at every generation step.

        Output shares the same memory storage as `past`.
        """
        return tuple(
            (
                layer_past[0].index_select(1, beam_idx),
                layer_past[1].index_select(1, beam_idx),
            )
            for layer_past in past
        )

    def process_response(self, response):
        """process_response"""
        response = response.strip()
        response = response.replace("[[训练时间]]", "2023年")
        punkts = [
            [",", ","],
            ["!", "!"],
            [":", ":"],
            [";", ";"],
            [r"\?", "?"],
        ]
        for item in punkts:
            response = re.sub(r"([\u4e00-\u9fff])%s" % item[0], r"\1%s" % item[1], response)
            response = re.sub(r"%s([\u4e00-\u9fff])" % item[0], r"%s\1" % item[1], response)
        return response

    def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048, num_beams=1,
             do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
        """chat."""
        if history is None:
            history = []
        if logits_processor is None:
            logits_processor = LogitsProcessorList()
        logits_processor.append(InvalidScoreLogitsProcessor())
        gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample, "top_p": top_p,
                      "temperature": temperature, "logits_processor": logits_processor, **kwargs}
        if not history:
            prompt = query
        else:
            prompt = ""
            for i, (old_query, response) in enumerate(history):
                prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
            prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
        inputs = tokenizer([prompt], return_tensors="ms")
        outputs = self.generate(**inputs, **gen_kwargs)
        outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
        response = tokenizer.decode(outputs)
        response = self.process_response(response)
        history = history + [(query, response)]
        return response, history

    def stream_chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048,
                    do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
        """stream chat"""
        if history is None:
            history = []
        if logits_processor is None:
            logits_processor = LogitsProcessorList()
        logits_processor.append(InvalidScoreLogitsProcessor())
        gen_kwargs = {"max_length": max_length, "do_sample": do_sample, "top_p": top_p,
                      "temperature": temperature, "logits_processor": logits_processor, **kwargs}
        if not history:
            prompt = query
        else:
            prompt = ""
            for i, (old_query, response) in enumerate(history):
                prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
            prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
        inputs = tokenizer([prompt], return_tensors="ms")
        for outputs in self.stream_generate(**inputs, **gen_kwargs):
            outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
            response = tokenizer.decode(outputs)
            response = self.process_response(response)
            new_history = history + [(query, response)]
            yield response, new_history

    def stream_generate(
            self,
            input_ids,
            generation_config: Optional[GenerationConfig] = None,
            logits_processor: Optional[LogitsProcessorList] = None,
            stopping_criteria: Optional[StoppingCriteriaList] = None,
            prefix_allowed_tokens_fn: Optional[Callable[[int, mindspore.Tensor], List[int]]] = None,
            **kwargs,
    ):
        """stream generate"""
        _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]

        if generation_config is None:
            generation_config = self.generation_config
        generation_config = copy.deepcopy(generation_config)
        model_kwargs = generation_config.update(**kwargs)
        _, eos_token_id = generation_config.bos_token_id, generation_config.eos_token_id

        if isinstance(eos_token_id, int):
            eos_token_id = [eos_token_id]

        has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
        if has_default_max_length and generation_config.max_new_tokens is None:
            warnings.warn(
                f"Using `max_length`'s default ({generation_config.max_length}) to control the generation length. "
                "This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we"
                " recommend using `max_new_tokens` to control the maximum length of the generation.",
                UserWarning,
            )
        elif generation_config.max_new_tokens is not None:
            generation_config.max_length = generation_config.max_new_tokens + input_ids_seq_length
            if not has_default_max_length:
                logger.warn(
                    f"Both `max_new_tokens` (={generation_config.max_new_tokens}) and `max_length`(="
                    f"{generation_config.max_length}) seem to have been set. `max_new_tokens` will take precedence. "
                    "Please refer to the documentation for more information. "
                    "(https://hf-mirror.com/docs/transformers/main/en/main_classes/text_generation)",
                    UserWarning,
                )

        if input_ids_seq_length >= generation_config.max_length:
            input_ids_string = "decoder_input_ids" if self.config.is_encoder_decoder else "input_ids"
            logger.warning(
                f"Input length of {input_ids_string} is {input_ids_seq_length}, but `max_length` is set to"
                f" {generation_config.max_length}. This can lead to unexpected behavior. You should consider"
                " increasing `max_new_tokens`."
            )

        # 2. Set generation parameters if not already defined
        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

        logits_processor = self._get_logits_processor(
            generation_config=generation_config,
            input_ids_seq_length=input_ids_seq_length,
            encoder_input_ids=input_ids,
            prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
            logits_processor=logits_processor,
        )

        stopping_criteria = self._get_stopping_criteria(
            generation_config=generation_config, stopping_criteria=stopping_criteria
        )
        logits_warper = self._get_logits_warper(generation_config)

        unfinished_sequences = input_ids.new(input_ids.shape[0]).fill(1)
        scores = None

        while True:
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
            # forward pass to get next token
            outputs = self(
                **model_inputs,
                return_dict=True,
                output_attentions=False,
                output_hidden_states=False,
            )

            next_token_logits = outputs.logits[:, -1, :]

            # pre-process distribution
            next_token_scores = logits_processor(input_ids, next_token_logits)
            next_token_scores = logits_warper(input_ids, next_token_scores)

            # sample
            probs = ops.softmax(next_token_scores, axis=-1)
            if generation_config.do_sample:
                next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
            else:
                next_tokens = ops.argmax(probs, dim=-1)

            # update generated ids, model inputs, and length for next step
            input_ids = ops.cat([input_ids, next_tokens[:, None]], axis=-1)
            model_kwargs = self._update_model_kwargs_for_generation(
                outputs, model_kwargs, is_encoder_decoder=self.config.is_encoder_decoder
            )
            unfinished_sequences = unfinished_sequences.mul((sum(next_tokens != i for i in eos_token_id)).long())

            # stop when each sentence is finished, or if we exceed the maximum length
            if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
                break
            yield input_ids

    def quantize(self, bits: int, empty_init=False, **kwargs):
        """TODO: support quantize"""

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.__init__(config)

Initializes an instance of the MSChatGLMForConditionalGeneration class.

Args:

  • self: The instance of the MSChatGLMForConditionalGeneration class.
  • config (ChatGLMConfig): An object of type ChatGLMConfig containing configuration parameters for the model, including:

      • max_sequence_length (int): The maximum length of input sequences.
      • position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
      • quantization_bit (int): Number of bits to use for quantization.

Returns:

  • None.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def __init__(self, config: ChatGLMConfig):
    """
    Initializes an instance of the MSChatGLMForConditionalGeneration class.

    Args:
        self: The instance of the MSChatGLMForConditionalGeneration class.
        config (ChatGLMConfig):
            An object of type ChatGLMConfig containing configuration parameters for the model.

            - max_sequence_length (int): The maximum length of input sequences.
            - position_encoding_2d (bool): Flag indicating whether to use 2D position encoding.
            - quantization_bit (int): Number of bits to use for quantization.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)

    self.max_sequence_length = config.max_sequence_length
    self.position_encoding_2d = config.position_encoding_2d
    self.transformer = MSChatGLMModel(config)
    self.lm_head = nn.Linear(
        config.hidden_size,
        config.vocab_size,
        bias=False,
        dtype=mindspore.float16
    )
    self.quantized = False

    if self.config.quantization_bit:
        self.quantize(self.config.quantization_bit, empty_init=True)

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.chat(tokenizer, query, history=None, max_length=2048, num_beams=1, do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs)

chat.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 2048, num_beams=1,
         do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None, **kwargs):
    """chat."""
    if history is None:
        history = []
    if logits_processor is None:
        logits_processor = LogitsProcessorList()
    logits_processor.append(InvalidScoreLogitsProcessor())
    gen_kwargs = {"max_length": max_length, "num_beams": num_beams, "do_sample": do_sample, "top_p": top_p,
                  "temperature": temperature, "logits_processor": logits_processor, **kwargs}
    if not history:
        prompt = query
    else:
        prompt = ""
        for i, (old_query, response) in enumerate(history):
            prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, response)
        prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    inputs = tokenizer([prompt], return_tensors="ms")
    outputs = self.generate(**inputs, **gen_kwargs)
    outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):]
    response = tokenizer.decode(outputs)
    response = self.process_response(response)
    history = history + [(query, response)]
    return response, history
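
In practice chat() only needs a tokenizer that returns MindSpore tensors (return_tensors="ms"), the query string, and the running history. A minimal usage sketch; the AutoTokenizer class and the checkpoint id are assumptions, and the full ChatGLM-6B weights are large.

from mindnlp.transformers import AutoTokenizer  # assumption: mindnlp's auto tokenizer class
from mindnlp.transformers.models.chatglm.modeling_graph_chatglm import MSChatGLMForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b")                  # assumed repo id
model = MSChatGLMForConditionalGeneration.from_pretrained("THUDM/chatglm-6b")  # assumed repo id

history = []
response, history = model.chat(tokenizer, "你好", history=history, max_length=256)  # "你好" = "Hello"
print(response)

# stream_chat yields (partial_response, updated_history) pairs while tokens are generated.
for partial, _ in model.stream_chat(tokenizer, "介绍一下你自己", history=history):  # "Introduce yourself"
    print(partial)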

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.forward(input_ids=None, position_ids=None, attention_mask=None, past_key_values=None, **kwargs)

Runs the forward pass of the MSChatGLMForConditionalGeneration model: the transformer produces hidden states and the language-model head projects them to vocabulary logits.

Args:

  • self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
  • input_ids (Optional[mindspore.Tensor], default: None): The input tensor containing the tokenized input sequence.
  • position_ids (Optional[mindspore.Tensor], default: None): The tensor containing the position indices for each token in the input sequence.
  • attention_mask (Optional[mindspore.Tensor], default: None): The mask tensor indicating which elements in the input sequence should be attended to.
  • past_key_values (Optional[Tuple[mindspore.Tensor]], default: None): The tuple of tensors containing the key-value pairs from the previous attention pass.
  • **kwargs: Additional keyword arguments.

Returns:

  • dict: A dictionary containing the following keys:

      • 'loss' (None): The loss value. Always None.
      • 'logits' (mindspore.Tensor): The output logits tensor of shape (batch_size, sequence_length, vocab_size).
      • 'past_key_values' (Tuple[mindspore.Tensor]): The tuple of tensors containing the key-value pairs from the current attention pass.
      • 'hidden_states' (mindspore.Tensor): The hidden states tensor of shape (batch_size, sequence_length, hidden_size).
      • 'attentions' (mindspore.Tensor): The attention tensor of shape (batch_size, num_heads, sequence_length, sequence_length).

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def forward(
        self,
        input_ids: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[Tuple[mindspore.Tensor]] = None,
        **kwargs
):
    """
    Constructs the MSChatGLMForConditionalGeneration model.

    Args:
        self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
        input_ids (Optional[mindspore.Tensor]):
            The input tensor containing the tokenized input sequence. Default is None.
        position_ids (Optional[mindspore.Tensor]):
            The tensor containing the position indices for each token in the input sequence. Default is None.
        attention_mask (Optional[mindspore.Tensor]):
            The mask tensor indicating which elements in the input sequence should be attended to. Default is None.
        past_key_values (Optional[Tuple[mindspore.Tensor]]):
            The tuple of tensors containing the key-value pairs from the previous attention pass. Default is None.
        **kwargs: Additional keyword arguments.

    Returns:
        dict:
            A dictionary containing the following keys:

            - 'loss' (None): The loss value. Always None.
            - 'logits' (mindspore.Tensor): The output logits tensor of shape (batch_size, sequence_length, vocab_size).
            - 'past_key_values' (Tuple[mindspore.Tensor]): The tuple of tensors containing the key-value pairs from the current attention pass.
            - 'hidden_states' (mindspore.Tensor): The hidden states tensor of shape (batch_size, sequence_length, hidden_size).
            - 'attentions' (mindspore.Tensor): The attention tensor of shape (batch_size, num_heads, sequence_length, sequence_length).

    Raises:
        None.
    """
    transformer_outputs = self.transformer(
        input_ids=input_ids,
        position_ids=position_ids,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        inputs_embeds=None,
    )

    hidden_states = transformer_outputs[0]

    # return (hidden_states,)
    lm_logits = self.lm_head(hidden_states).permute(1, 0, 2)

    loss = None

    return {'loss': loss, 'logits': lm_logits,
            'past_key_values': transformer_outputs[1],
            'hidden_states': transformer_outputs[2],
            'attentions': transformer_outputs[3]
        }
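
Since this forward returns a plain dict (with 'loss' always None; any loss is computed outside the model), downstream code indexes the result by key. A short sketch, assuming a model and a prepared inputs dict (for example from prepare_inputs_for_generation) are available:

outputs = model(
    input_ids=inputs["input_ids"],
    position_ids=inputs["position_ids"],
    attention_mask=inputs["attention_mask"],
    past_key_values=inputs["past_key_values"],
)

logits = outputs["logits"]            # (batch, seq_len, vocab_size) after the permute above
next_token_logits = logits[:, -1, :]  # scores for the next token of each sequence
cache = outputs["past_key_values"]    # stacked key/value cache for the next decoding step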

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.get_output_embeddings()

Returns the output embeddings of the MSChatGLMForConditionalGeneration model.

Args:

  • self: The instance of the MSChatGLMForConditionalGeneration class.

Returns:

  • nn.Linear: The language-model head whose weight matrix serves as the output embeddings.

This method retrieves the output embedding layer of the MSChatGLMForConditionalGeneration model, i.e. the final projection applied to the hidden states to produce vocabulary logits.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def get_output_embeddings(self):
    """
    Returns the output embeddings of the MSChatGLMForConditionalGeneration model.

    Args:
        self: The instance of the MSChatGLMForConditionalGeneration class.

    Returns:
        returns the output embeddings of the model as a tensor.

    Raises:
        None.

    This method retrieves the output embeddings of the MSChatGLMForConditionalGeneration model.
    The output embeddings are the final representations of the input tokens after being processed by the model's
    language model head. The embeddings are returned as a tensor.
    """
    return self.lm_head

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.prepare_inputs_for_generation(input_ids, past=None, past_key_values=None, attention_mask=None, position_ids=None, **kwargs)

This method prepares inputs for generation in the MSChatGLMForConditionalGeneration class.

Args:

  • self: The instance of the class.
  • input_ids (mindspore.Tensor): The input tensor containing token ids.
  • past (Optional[mindspore.Tensor], default: None): The past state tensor.
  • past_key_values (Optional[mindspore.Tensor], default: None): The past key values tensor.
  • attention_mask (Optional[mindspore.Tensor], default: None): The attention mask tensor.
  • position_ids (Optional[mindspore.Tensor], default: None): The position ids tensor.
  • **kwargs: Additional keyword arguments.

Returns:

  • dict: A dictionary containing the prepared inputs for generation, including 'input_ids', 'past_key_values', 'position_ids', and 'attention_mask'.

Raises:

  • TypeError: If the input arguments are not of the expected types.
  • ValueError: If there are issues with the input data or configuration.
  • IndexError: If there are index out of bounds errors during processing.
  • Warning: If there are issues with the dtype of the attention mask.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def prepare_inputs_for_generation(
        self,
        input_ids: mindspore.Tensor,
        past: Optional[mindspore.Tensor] = None,
        past_key_values: Optional[mindspore.Tensor] = None,
        attention_mask: Optional[mindspore.Tensor] = None,
        position_ids: Optional[mindspore.Tensor] = None,
        **kwargs
) -> dict:
    """
    This method prepares inputs for generation in the MSChatGLMForConditionalGeneration class.

    Args:
        self: The instance of the class.
        input_ids (mindspore.Tensor): The input tensor containing token ids.
        past (Optional[mindspore.Tensor]): The past state tensor (default is None).
        past_key_values (Optional[mindspore.Tensor]): The past key values tensor (default is None).
        attention_mask (Optional[mindspore.Tensor]): The attention mask tensor (default is None).
        position_ids (Optional[mindspore.Tensor]): The position ids tensor (default is None).
        **kwargs: Additional keyword arguments.

    Returns:
        dict: A dictionary containing the prepared inputs for generation including 'input_ids', 'past_key_values',
            'position_ids', and 'attention_mask'.

    Raises:
        TypeError: If the input arguments are not of the expected types.
        ValueError: If there are issues with the input data or configuration.
        IndexError: If there are index out of bounds errors during processing.
        Warning: If there are issues with the dtype of attention mask.
    """
    batch_size, seq_length = input_ids.shape

    if self.get_inputs() is None:
        self.set_inputs(
            Tensor(shape=[batch_size, None], dtype=mindspore.int64), # input_ids
            Tensor(shape=[batch_size, 2, None], dtype=mindspore.int64), # position_ids
            Tensor(shape=[batch_size, 1, None, None], dtype=mindspore.bool_), # attention_mask
            Tensor(shape=[self.config.num_layers, 2, None, batch_size, 32, 128], dtype=mindspore.float16) # past_key_values
        )
    MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
    seqs = input_ids.asnumpy().tolist()
    mask_positions, use_gmasks = [], []
    for seq in seqs:
        mask_token = gMASK if gMASK in seq else MASK
        use_gmask = mask_token == gMASK
        mask_positions.append(seq.index(mask_token))
        use_gmasks.append(use_gmask)

    # only last token for input_ids if past is not None
    if past is not None or past_key_values is not None:
        # past_key_values = ops.stack([ops.stack(past_key_values[i]) for i in range(self.config.num_layers)])
        last_token = input_ids[:, -1].unsqueeze(-1)
        if attention_mask is not None and attention_mask.dtype == mindspore.bool_:
            attention_mask = attention_mask[:, :, -1:]
        else:
            attention_mask = None

        if attention_mask is None:
            attention_mask = ops.zeros((1, 1, 1, 1)).bool()

        if position_ids is not None:
            position_ids = position_ids[..., -1:]
        else:
            context_lengths = [seq.index(self.config.bos_token_id) for seq in seqs]
            if self.position_encoding_2d:
                position_ids = mindspore.Tensor(
                    [[mask_position, seq_length - context_length] for mask_position, context_length in
                     zip(mask_positions, context_lengths)], dtype=mindspore.int64).unsqueeze(-1)
            else:
                position_ids = mindspore.Tensor(mask_positions, dtype=mindspore.int64).unsqueeze(-1)

        if past is None:
            past = past_key_values
        return {
            "input_ids": last_token,
            "past_key_values": past,
            "position_ids": position_ids,
            "attention_mask": attention_mask
        }
    else:
        if attention_mask is not None and attention_mask.dtype != mindspore.bool_:
            logger.warning_once(f"The dtype of attention mask ({attention_mask.dtype}) is not bool")
            attention_mask = None
        if attention_mask is None:
            attention_mask = self.get_masks(input_ids)
        if position_ids is None:
            position_ids = self.get_position_ids(input_ids, mask_positions=mask_positions, use_gmasks=use_gmasks)

        past_key_values = ops.zeros((28, 2, input_ids.shape[1], 1, 32, 128), dtype=mindspore.float16)
        return {
            "input_ids": input_ids,
            "past_key_values": past_key_values,
            "position_ids": position_ids,
            "attention_mask": attention_mask
        }
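
A minimal usage sketch of the prefill branch of this method is shown below. It assumes a locally available THUDM/chatglm-6b checkpoint, that ChatGLMTokenizer can be imported from mindnlp.transformers, and that the tokenizer accepts return_tensors="ms"; these usage details are assumptions for illustration, not part of the source above.

# Hedged usage sketch: inspect the dict produced for the first (prefill) decoding step.
# Checkpoint name, tokenizer import and return_tensors="ms" are assumptions; adjust to your setup.
from mindnlp.transformers import ChatGLMTokenizer
from mindnlp.transformers.models.chatglm.modeling_graph_chatglm import MSChatGLMForConditionalGeneration

tokenizer = ChatGLMTokenizer.from_pretrained("THUDM/chatglm-6b")
model = MSChatGLMForConditionalGeneration.from_pretrained("THUDM/chatglm-6b")

prompt = tokenizer("你好", return_tensors="ms")
step_inputs = model.prepare_inputs_for_generation(prompt["input_ids"])
# With no cache yet, the full prompt is kept, a zero-filled past_key_values placeholder is created,
# and 2D position ids plus a causal attention mask are derived from the prompt.
print(sorted(step_inputs))  # ['attention_mask', 'input_ids', 'past_key_values', 'position_ids']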

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.process_response(response)

Post-process a decoded response: strip surrounding whitespace, replace the [[训练时间]] placeholder with 2023年, and convert half-width punctuation that borders Chinese characters into its full-width counterpart.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def process_response(self, response):
    """process_response"""
    response = response.strip()
    response = response.replace("[[训练时间]]", "2023年")
    punkts = [
        [",", ","],
        ["!", "!"],
        [":", ":"],
        [";", ";"],
        [r"\?", "?"],
    ]
    for item in punkts:
        response = re.sub(r"([\u4e00-\u9fff])%s" % item[0], r"\1%s" % item[1], response)
        response = re.sub(r"%s([\u4e00-\u9fff])" % item[0], r"%s\1" % item[1], response)
    return response
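
To make the effect of this normalization concrete, the standalone sketch below mirrors the same substitutions using only the standard re module; the helper name normalize and the sample string are illustrative, not part of the source above.

# Illustrative mirror of process_response: half-width punctuation adjacent to CJK characters
# is converted to its full-width counterpart, and the training-time placeholder is filled in.
import re

def normalize(response: str) -> str:
    response = response.strip().replace("[[训练时间]]", "2023年")
    punkts = [[",", ","], ["!", "!"], [":", ":"], [";", ";"], [r"\?", "?"]]
    for half, full in punkts:
        response = re.sub(r"([\u4e00-\u9fff])%s" % half, r"\1%s" % full, response)
        response = re.sub(r"%s([\u4e00-\u9fff])" % half, r"%s\1" % full, response)
    return response

print(normalize("你好!我是[[训练时间]]训练的助手,有什么可以帮你?"))
# -> 你好!我是2023年训练的助手,有什么可以帮你?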

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.quantize(bits, empty_init=False, **kwargs)

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def quantize(self, bits: int, empty_init=False, **kwargs):
    """TODO: support quantize"""

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.set_output_embeddings(new_embeddings)

Set the output embeddings for the MSChatGLMForConditionalGeneration model.

PARAMETERS
    self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
    new_embeddings (object): The new embeddings to be set as the output embeddings for the model.
        It can be of any valid type.

RETURNS
    None.

Source code in mindnlp/transformers/models/chatglm/modeling_graph_chatglm.py
def set_output_embeddings(self, new_embeddings):
    """
    Set the output embeddings for the MSChatGLMForConditionalGeneration model.

    Args:
        self (MSChatGLMForConditionalGeneration): The instance of the MSChatGLMForConditionalGeneration class.
        new_embeddings (object): The new embeddings to be set as the output embeddings for the model.
            It can be of any valid type.

    Returns:
        None.

    Raises:
        None.
    """
    self.lm_head = new_embeddings
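
The sketch below shows one way this setter might be used, for example after extending the vocabulary. It assumes model is an already constructed MSChatGLMForConditionalGeneration and that the head is a bias-free Dense projection from hidden_size to the vocabulary size; the sizes and the assumption about the head's layer type are illustrative only.

# Hedged sketch: swap in a new output projection, e.g. after adding special tokens.
# `model` is assumed to be an MSChatGLMForConditionalGeneration instance; sizes are illustrative.
from mindspore import nn

new_vocab_size = model.config.vocab_size + 8  # e.g. a handful of added special tokens
new_head = nn.Dense(model.config.hidden_size, new_vocab_size, has_bias=False)
model.set_output_embeddings(new_head)
assert model.get_output_embeddings() is new_head
# Note: growing the vocabulary would also require resizing the input embeddings;
# this sketch only demonstrates the output-head setter.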

mindnlp.transformers.models.chatglm.modeling_graph_chatglm.MSChatGLMForConditionalGeneration.stream_chat(tokenizer, query, history=None, max_length=2048, do_sample=True, top_p=0.7, temperature=0.95, logits_processor=None