
graphormer

mindnlp.transformers.models.graphormer.configuration_graphormer.GraphormerConfig

Bases: PretrainedConfig

This is the configuration class to store the configuration of a [~GraphormerModel]. It is used to instantiate a Graphormer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Graphormer graphormer-base-pcqm4mv1 architecture.

Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.

PARAMETER DESCRIPTION
num_classes

Number of target classes or labels, set to n for binary classification of n tasks.

TYPE: `int`, *optional*, defaults to 1 DEFAULT: 1

num_atoms

Number of node types in the graphs.

TYPE: `int`, *optional*, defaults to 512*9 DEFAULT: 512 * 9

num_edges

Number of edge types in the graph.

TYPE: `int`, *optional*, defaults to 512*3 DEFAULT: 512 * 3

num_in_degree

Number of in-degree types in the input graphs.

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

num_out_degree

Number of out-degree types in the input graphs.

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

num_edge_dis

Number of edge distances in the input graphs.

TYPE: `int`, *optional*, defaults to 128 DEFAULT: 128

multi_hop_max_dist

Maximum distance of multi-hop edges between two nodes.

TYPE: `int`, *optional*, defaults to 5 DEFAULT: 5

spatial_pos_max

Maximum distance between nodes in the graph attention bias matrices, used during preprocessing and collation.

TYPE: `int`, *optional*, defaults to 1024 DEFAULT: 1024

edge_type

Type of edge relation chosen.

TYPE: `str`, *optional*, defaults to `multi_hop` DEFAULT: 'multi_hop'

max_nodes

Maximum number of nodes which can be parsed for the input graphs.

TYPE: `int`, *optional*, defaults to 512 DEFAULT: 512

share_input_output_embed

Shares the embedding layer between encoder and decoder - careful, True is not implemented.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

num_layers

Number of encoder layers (the corresponding `__init__` argument is `num_hidden_layers`).

TYPE: `int`, *optional*, defaults to 12

embedding_dim

Dimension of the embedding layer in the encoder.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

ffn_embedding_dim

Dimension of the "intermediate" (often named feed-forward) layer in encoder.

TYPE: `int`, *optional*, defaults to 768 DEFAULT: 768

num_attention_heads

Number of attention heads in the encoder.

TYPE: `int`, *optional*, defaults to 32 DEFAULT: 32

self_attention

Whether the model is self-attentive (`False` is not implemented).

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

activation_function

The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

TYPE: `str` or `function`, *optional*, defaults to `"gelu"`

dropout

The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

attention_dropout

The dropout probability for the attention weights.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

activation_dropout

The dropout probability for the activation of the linear transformer layer.

TYPE: `float`, *optional*, defaults to 0.1 DEFAULT: 0.1

layerdrop

The LayerDrop probability for the encoder. See the LayerDrop paper for more details.

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

bias

Uses bias in the attention module - unsupported at the moment.

TYPE: `bool`, *optional*, defaults to `True` DEFAULT: True

embed_scale

Scaling factor for the node embeddings.

TYPE: `float`, *optional*, defaults to None DEFAULT: None

num_trans_layers_to_freeze

Number of transformer layers to freeze.

TYPE: `int`, *optional*, defaults to 0 DEFAULT: 0

encoder_normalize_before

Normalize the features before encoding the graph, i.e. apply the layer norm before each encoder block.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

pre_layernorm

Apply layernorm before self attention and the feed forward network. Without this, post layernorm will be used.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

apply_graphormer_init

Apply a custom graphormer initialisation to the model before training.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

freeze_embeddings

Freeze the embedding layer, or train it along with the model.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

q_noise

Amount of quantization noise (see "Training with Quantization Noise for Extreme Model Compression"). (For more detail, see fairseq's documentation on quant_noise).

TYPE: `float`, *optional*, defaults to 0.0 DEFAULT: 0.0

qn_block_size

Size of the blocks for subsequent quantization with iPQ (see q_noise).

TYPE: `int`, *optional*, defaults to 8 DEFAULT: 8

kdim

Dimension of the key in the attention, if different from the other values.

TYPE: `int`, *optional*, defaults to None DEFAULT: None

vdim

Dimension of the value in the attention, if different from the other values.

TYPE: `int`, *optional*, defaults to None DEFAULT: None

use_cache

Whether or not the model should return the last key/values attentions (not used by all models).

TYPE: `bool`, *optional*, defaults to `True`

traceable

Changes return value of the encoder's inner_state to stacked tensors.

TYPE: `bool`, *optional*, defaults to `False` DEFAULT: False

Example
>>> from transformers import GraphormerForGraphClassification, GraphormerConfig
...
>>> # Initializing a Graphormer graphormer-base-pcqm4mv2 style configuration
>>> configuration = GraphormerConfig()
...
>>> # Initializing a model from the graphormer-base-pcqm4mv1 style configuration
>>> model = GraphormerForGraphClassification(configuration)
...
>>> # Accessing the model configuration
>>> configuration = model.config
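The defaults can also be overridden at construction time. The sketch below is illustrative only: it assumes the classes are importable from mindnlp.transformers (mirroring the module paths shown on this page), and the hyperparameter values are arbitrary examples rather than a recommended setup.

>>> from mindnlp.transformers import GraphormerConfig, GraphormerForGraphClassification
...
>>> # A smaller encoder for a 10-way single-task classification setup (illustrative values)
>>> small_configuration = GraphormerConfig(
...     num_classes=10,
...     num_hidden_layers=6,
...     embedding_dim=256,
...     ffn_embedding_dim=256,
...     num_attention_heads=8,  # heads should evenly divide embedding_dim
... )
>>> small_model = GraphormerForGraphClassification(small_configuration)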
Source code in mindnlp/transformers/models/graphormer/configuration_graphormer.py
class GraphormerConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`~GraphormerModel`]. It is used to instantiate an
    Graphormer model according to the specified arguments, defining the model architecture. Instantiating a
    configuration with the defaults will yield a similar configuration to that of the Graphormer
    [graphormer-base-pcqm4mv1](https://hf-mirror.com/graphormer-base-pcqm4mv1) architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.


    Args:
        num_classes (`int`, *optional*, defaults to 1):
            Number of target classes or labels, set to n for binary classification of n tasks.
        num_atoms (`int`, *optional*, defaults to 512*9):
            Number of node types in the graphs.
        num_edges (`int`, *optional*, defaults to 512*3):
            Number of edges types in the graph.
        num_in_degree (`int`, *optional*, defaults to 512):
            Number of in degrees types in the input graphs.
        num_out_degree (`int`, *optional*, defaults to 512):
            Number of out degrees types in the input graphs.
        num_edge_dis (`int`, *optional*, defaults to 128):
            Number of edge dis in the input graphs.
        multi_hop_max_dist (`int`, *optional*, defaults to 20):
            Maximum distance of multi hop edges between two nodes.
        spatial_pos_max (`int`, *optional*, defaults to 1024):
            Maximum distance between nodes in the graph attention bias matrices, used during preprocessing and
            collation.
        edge_type (`str`, *optional*, defaults to multihop):
            Type of edge relation chosen.
        max_nodes (`int`, *optional*, defaults to 512):
            Maximum number of nodes which can be parsed for the input graphs.
        share_input_output_embed (`bool`, *optional*, defaults to `False`):
            Shares the embedding layer between encoder and decoder - careful, True is not implemented.
        num_layers (`int`, *optional*, defaults to 12):
            Number of layers.
        embedding_dim (`int`, *optional*, defaults to 768):
            Dimension of the embedding layer in encoder.
        ffn_embedding_dim (`int`, *optional*, defaults to 768):
            Dimension of the "intermediate" (often named feed-forward) layer in encoder.
        num_attention_heads (`int`, *optional*, defaults to 32):
            Number of attention heads in the encoder.
        self_attention (`bool`, *optional*, defaults to `True`):
            Model is self attentive (False not implemented).
        activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
            `"relu"`, `"silu"` and `"gelu_new"` are supported.
        dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for the attention weights.
        activation_dropout (`float`, *optional*, defaults to 0.1):
            The dropout probability for the activation of the linear transformer layer.
        layerdrop (`float`, *optional*, defaults to 0.0):
            The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
            for more details.
        bias (`bool`, *optional*, defaults to `True`):
            Uses bias in the attention module - unsupported at the moment.
        embed_scale(`float`, *optional*, defaults to None):
            Scaling factor for the node embeddings.
        num_trans_layers_to_freeze (`int`, *optional*, defaults to 0):
            Number of transformer layers to freeze.
        encoder_normalize_before (`bool`, *optional*, defaults to `False`):
            Normalize features before encoding the graph.
        pre_layernorm (`bool`, *optional*, defaults to `False`):
            Apply layernorm before self attention and the feed forward network. Without this, post layernorm will be
            used.
        apply_graphormer_init (`bool`, *optional*, defaults to `False`):
            Apply a custom graphormer initialisation to the model before training.
        freeze_embeddings (`bool`, *optional*, defaults to `False`):
            Freeze the embedding layer, or train it along the model.
        encoder_normalize_before (`bool`, *optional*, defaults to `False`):
            Apply the layer norm before each encoder block.
        q_noise (`float`, *optional*, defaults to 0.0):
            Amount of quantization noise (see "Training with Quantization Noise for Extreme Model Compression"). (For
            more detail, see fairseq's documentation on quant_noise).
        qn_block_size (`int`, *optional*, defaults to 8):
            Size of the blocks for subsequent quantization with iPQ (see q_noise).
        kdim (`int`, *optional*, defaults to None):
            Dimension of the key in the attention, if different from the other values.
        vdim (`int`, *optional*, defaults to None):
            Dimension of the value in the attention, if different from the other values.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models).
        traceable (`bool`, *optional*, defaults to `False`):
            Changes return value of the encoder's inner_state to stacked tensors.

    Example:
        ```python
        >>> from transformers import GraphormerForGraphClassification, GraphormerConfig
        ...
        >>> # Initializing a Graphormer graphormer-base-pcqm4mv2 style configuration
        >>> configuration = GraphormerConfig()
        ...
        >>> # Initializing a model from the graphormer-base-pcqm4mv1 style configuration
        >>> model = GraphormerForGraphClassification(configuration)
        ...
        >>> # Accessing the model configuration
        >>> configuration = model.config
        ```
    """
    model_type = "graphormer"
    keys_to_ignore_at_inference = ["past_key_values"]

    def __init__(
        self,
        num_classes: int = 1,
        num_atoms: int = 512 * 9,
        num_edges: int = 512 * 3,
        num_in_degree: int = 512,
        num_out_degree: int = 512,
        num_spatial: int = 512,
        num_edge_dis: int = 128,
        multi_hop_max_dist: int = 5,  # sometimes is 20
        spatial_pos_max: int = 1024,
        edge_type: str = "multi_hop",
        max_nodes: int = 512,
        share_input_output_embed: bool = False,
        num_hidden_layers: int = 12,
        embedding_dim: int = 768,
        ffn_embedding_dim: int = 768,
        num_attention_heads: int = 32,
        dropout: float = 0.1,
        attention_dropout: float = 0.1,
        activation_dropout: float = 0.1,
        layerdrop: float = 0.0,
        encoder_normalize_before: bool = False,
        pre_layernorm: bool = False,
        apply_graphormer_init: bool = False,
        activation_fn: str = "gelu",
        embed_scale: float = None,
        freeze_embeddings: bool = False,
        num_trans_layers_to_freeze: int = 0,
        traceable: bool = False,
        q_noise: float = 0.0,
        qn_block_size: int = 8,
        kdim: int = None,
        vdim: int = None,
        bias: bool = True,
        self_attention: bool = True,
        pad_token_id=0,
        bos_token_id=1,
        eos_token_id=2,
        **kwargs,
    ):
        """
        Initialize a GraphormerConfig object with specified configuration parameters.

        Args:
            num_classes (int): Number of classes for classification task.
            num_atoms (int): Number of atoms in the graph.
            num_edges (int): Number of edges in the graph.
            num_in_degree (int): Number of incoming degrees for each node.
            num_out_degree (int): Number of outgoing degrees for each node.
            num_spatial (int): Number of spatial features.
            num_edge_dis (int): Number of edge distances.
            multi_hop_max_dist (int): Maximum distance for multi-hop attention.
            spatial_pos_max (int): Maximum spatial position value.
            edge_type (str): Type of edges in the graph.
            max_nodes (int): Maximum number of nodes in the graph.
            share_input_output_embed (bool): Flag to indicate sharing input and output embeddings.
            num_hidden_layers (int): Number of hidden layers.
            embedding_dim (int): Dimension of embeddings.
            ffn_embedding_dim (int): Dimension of feed-forward network embeddings.
            num_attention_heads (int): Number of attention heads.
            dropout (float): Dropout rate.
            attention_dropout (float): Dropout rate for attention layers.
            activation_dropout (float): Dropout rate for activation layers.
            layerdrop (float): Layer drop probability.
            encoder_normalize_before (bool): Flag to normalize before encoder layers.
            pre_layernorm (bool): Flag to apply pre-layer normalization.
            apply_graphormer_init (bool): Flag to apply Graphormer initialization.
            activation_fn (str): Activation function to use.
            embed_scale (float): Scaling factor for embeddings.
            freeze_embeddings (bool): Flag to freeze embeddings.
            num_trans_layers_to_freeze (int): Number of transformer layers to freeze.
            traceable (bool): Flag for traceability.
            q_noise (float): Quantum noise level.
            qn_block_size (int): Quantum noise block size.
            kdim (int): Key dimension.
            vdim (int): Value dimension.
            bias (bool): Flag to include bias terms.
            self_attention (bool): Flag to use self-attention mechanism.
            pad_token_id: ID for padding token.
            bos_token_id: ID for beginning-of-sequence token.
            eos_token_id: ID for end-of-sequence token.
            **kwargs: Additional keyword arguments.

        Returns:
            None.

        Raises:
            None.
        """
        self.num_classes = num_classes
        self.num_atoms = num_atoms
        self.num_in_degree = num_in_degree
        self.num_out_degree = num_out_degree
        self.num_edges = num_edges
        self.num_spatial = num_spatial
        self.num_edge_dis = num_edge_dis
        self.edge_type = edge_type
        self.multi_hop_max_dist = multi_hop_max_dist
        self.spatial_pos_max = spatial_pos_max
        self.max_nodes = max_nodes
        self.num_hidden_layers = num_hidden_layers
        self.embedding_dim = embedding_dim
        self.hidden_size = embedding_dim
        self.ffn_embedding_dim = ffn_embedding_dim
        self.num_attention_heads = num_attention_heads
        self.dropout = dropout
        self.attention_dropout = attention_dropout
        self.activation_dropout = activation_dropout
        self.layerdrop = layerdrop
        self.encoder_normalize_before = encoder_normalize_before
        self.pre_layernorm = pre_layernorm
        self.apply_graphormer_init = apply_graphormer_init
        self.activation_fn = activation_fn
        self.embed_scale = embed_scale
        self.freeze_embeddings = freeze_embeddings
        self.num_trans_layers_to_freeze = num_trans_layers_to_freeze
        self.share_input_output_embed = share_input_output_embed
        self.traceable = traceable
        self.q_noise = q_noise
        self.qn_block_size = qn_block_size

        # These parameters are here for future extensions
        # atm, the model only supports self attention
        self.kdim = kdim
        self.vdim = vdim
        self.self_attention = self_attention
        self.bias = bias

        super().__init__(
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            **kwargs,
        )

mindnlp.transformers.models.graphormer.configuration_graphormer.GraphormerConfig.__init__(num_classes=1, num_atoms=512 * 9, num_edges=512 * 3, num_in_degree=512, num_out_degree=512, num_spatial=512, num_edge_dis=128, multi_hop_max_dist=5, spatial_pos_max=1024, edge_type='multi_hop', max_nodes=512, share_input_output_embed=False, num_hidden_layers=12, embedding_dim=768, ffn_embedding_dim=768, num_attention_heads=32, dropout=0.1, attention_dropout=0.1, activation_dropout=0.1, layerdrop=0.0, encoder_normalize_before=False, pre_layernorm=False, apply_graphormer_init=False, activation_fn='gelu', embed_scale=None, freeze_embeddings=False, num_trans_layers_to_freeze=0, traceable=False, q_noise=0.0, qn_block_size=8, kdim=None, vdim=None, bias=True, self_attention=True, pad_token_id=0, bos_token_id=1, eos_token_id=2, **kwargs)

Initialize a GraphormerConfig object with specified configuration parameters.

PARAMETER DESCRIPTION
num_classes

Number of classes for classification task.

TYPE: int DEFAULT: 1

num_atoms

Number of atoms in the graph.

TYPE: int DEFAULT: 512 * 9

num_edges

Number of edges in the graph.

TYPE: int DEFAULT: 512 * 3

num_in_degree

Number of incoming degrees for each node.

TYPE: int DEFAULT: 512

num_out_degree

Number of outgoing degrees for each node.

TYPE: int DEFAULT: 512

num_spatial

Number of spatial features.

TYPE: int DEFAULT: 512

num_edge_dis

Number of edge distances.

TYPE: int DEFAULT: 128

multi_hop_max_dist

Maximum distance for multi-hop attention.

TYPE: int DEFAULT: 5

spatial_pos_max

Maximum spatial position value.

TYPE: int DEFAULT: 1024

edge_type

Type of edges in the graph.

TYPE: str DEFAULT: 'multi_hop'

max_nodes

Maximum number of nodes in the graph.

TYPE: int DEFAULT: 512

share_input_output_embed

Flag to indicate sharing input and output embeddings.

TYPE: bool DEFAULT: False

num_hidden_layers

Number of hidden layers.

TYPE: int DEFAULT: 12

embedding_dim

Dimension of embeddings.

TYPE: int DEFAULT: 768

ffn_embedding_dim

Dimension of feed-forward network embeddings.

TYPE: int DEFAULT: 768

num_attention_heads

Number of attention heads.

TYPE: int DEFAULT: 32

dropout

Dropout rate.

TYPE: float DEFAULT: 0.1

attention_dropout

Dropout rate for attention layers.

TYPE: float DEFAULT: 0.1

activation_dropout

Dropout rate for activation layers.

TYPE: float DEFAULT: 0.1

layerdrop

Layer drop probability.

TYPE: float DEFAULT: 0.0

encoder_normalize_before

Flag to normalize before encoder layers.

TYPE: bool DEFAULT: False

pre_layernorm

Flag to apply pre-layer normalization.

TYPE: bool DEFAULT: False

apply_graphormer_init

Flag to apply Graphormer initialization.

TYPE: bool DEFAULT: False

activation_fn

Activation function to use.

TYPE: str DEFAULT: 'gelu'

embed_scale

Scaling factor for embeddings.

TYPE: float DEFAULT: None

freeze_embeddings

Flag to freeze embeddings.

TYPE: bool DEFAULT: False

num_trans_layers_to_freeze

Number of transformer layers to freeze.

TYPE: int DEFAULT: 0

traceable

Flag for traceability.

TYPE: bool DEFAULT: False

q_noise

Amount of quantization noise (see fairseq's `quant_noise`).

TYPE: float DEFAULT: 0.0

qn_block_size

Block size used for subsequent iPQ quantization (see `q_noise`).

TYPE: int DEFAULT: 8

kdim

Key dimension.

TYPE: int DEFAULT: None

vdim

Value dimension.

TYPE: int DEFAULT: None

bias

Flag to include bias terms.

TYPE: bool DEFAULT: True

self_attention

Flag to use self-attention mechanism.

TYPE: bool DEFAULT: True

pad_token_id

ID for padding token.

DEFAULT: 0

bos_token_id

ID for beginning-of-sequence token.

DEFAULT: 1

eos_token_id

ID for end-of-sequence token.

DEFAULT: 2

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION

None.
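Note that __init__ also mirrors embedding_dim into hidden_size (see the source below), so downstream code that expects the common hidden_size attribute keeps working. A small illustrative check, assuming the import path used in the examples above:

>>> from mindnlp.transformers import GraphormerConfig
...
>>> config = GraphormerConfig(embedding_dim=512)
>>> config.hidden_size
512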

Source code in mindnlp/transformers/models/graphormer/configuration_graphormer.py
def __init__(
    self,
    num_classes: int = 1,
    num_atoms: int = 512 * 9,
    num_edges: int = 512 * 3,
    num_in_degree: int = 512,
    num_out_degree: int = 512,
    num_spatial: int = 512,
    num_edge_dis: int = 128,
    multi_hop_max_dist: int = 5,  # sometimes is 20
    spatial_pos_max: int = 1024,
    edge_type: str = "multi_hop",
    max_nodes: int = 512,
    share_input_output_embed: bool = False,
    num_hidden_layers: int = 12,
    embedding_dim: int = 768,
    ffn_embedding_dim: int = 768,
    num_attention_heads: int = 32,
    dropout: float = 0.1,
    attention_dropout: float = 0.1,
    activation_dropout: float = 0.1,
    layerdrop: float = 0.0,
    encoder_normalize_before: bool = False,
    pre_layernorm: bool = False,
    apply_graphormer_init: bool = False,
    activation_fn: str = "gelu",
    embed_scale: float = None,
    freeze_embeddings: bool = False,
    num_trans_layers_to_freeze: int = 0,
    traceable: bool = False,
    q_noise: float = 0.0,
    qn_block_size: int = 8,
    kdim: int = None,
    vdim: int = None,
    bias: bool = True,
    self_attention: bool = True,
    pad_token_id=0,
    bos_token_id=1,
    eos_token_id=2,
    **kwargs,
):
    """
    Initialize a GraphormerConfig object with specified configuration parameters.

    Args:
        num_classes (int): Number of classes for classification task.
        num_atoms (int): Number of atoms in the graph.
        num_edges (int): Number of edges in the graph.
        num_in_degree (int): Number of incoming degrees for each node.
        num_out_degree (int): Number of outgoing degrees for each node.
        num_spatial (int): Number of spatial features.
        num_edge_dis (int): Number of edge distances.
        multi_hop_max_dist (int): Maximum distance for multi-hop attention.
        spatial_pos_max (int): Maximum spatial position value.
        edge_type (str): Type of edges in the graph.
        max_nodes (int): Maximum number of nodes in the graph.
        share_input_output_embed (bool): Flag to indicate sharing input and output embeddings.
        num_hidden_layers (int): Number of hidden layers.
        embedding_dim (int): Dimension of embeddings.
        ffn_embedding_dim (int): Dimension of feed-forward network embeddings.
        num_attention_heads (int): Number of attention heads.
        dropout (float): Dropout rate.
        attention_dropout (float): Dropout rate for attention layers.
        activation_dropout (float): Dropout rate for activation layers.
        layerdrop (float): Layer drop probability.
        encoder_normalize_before (bool): Flag to normalize before encoder layers.
        pre_layernorm (bool): Flag to apply pre-layer normalization.
        apply_graphormer_init (bool): Flag to apply Graphormer initialization.
        activation_fn (str): Activation function to use.
        embed_scale (float): Scaling factor for embeddings.
        freeze_embeddings (bool): Flag to freeze embeddings.
        num_trans_layers_to_freeze (int): Number of transformer layers to freeze.
        traceable (bool): Flag for traceability.
        q_noise (float): Quantum noise level.
        qn_block_size (int): Quantum noise block size.
        kdim (int): Key dimension.
        vdim (int): Value dimension.
        bias (bool): Flag to include bias terms.
        self_attention (bool): Flag to use self-attention mechanism.
        pad_token_id: ID for padding token.
        bos_token_id: ID for beginning-of-sequence token.
        eos_token_id: ID for end-of-sequence token.
        **kwargs: Additional keyword arguments.

    Returns:
        None.

    Raises:
        None.
    """
    self.num_classes = num_classes
    self.num_atoms = num_atoms
    self.num_in_degree = num_in_degree
    self.num_out_degree = num_out_degree
    self.num_edges = num_edges
    self.num_spatial = num_spatial
    self.num_edge_dis = num_edge_dis
    self.edge_type = edge_type
    self.multi_hop_max_dist = multi_hop_max_dist
    self.spatial_pos_max = spatial_pos_max
    self.max_nodes = max_nodes
    self.num_hidden_layers = num_hidden_layers
    self.embedding_dim = embedding_dim
    self.hidden_size = embedding_dim
    self.ffn_embedding_dim = ffn_embedding_dim
    self.num_attention_heads = num_attention_heads
    self.dropout = dropout
    self.attention_dropout = attention_dropout
    self.activation_dropout = activation_dropout
    self.layerdrop = layerdrop
    self.encoder_normalize_before = encoder_normalize_before
    self.pre_layernorm = pre_layernorm
    self.apply_graphormer_init = apply_graphormer_init
    self.activation_fn = activation_fn
    self.embed_scale = embed_scale
    self.freeze_embeddings = freeze_embeddings
    self.num_trans_layers_to_freeze = num_trans_layers_to_freeze
    self.share_input_output_embed = share_input_output_embed
    self.traceable = traceable
    self.q_noise = q_noise
    self.qn_block_size = qn_block_size

    # These parameters are here for future extensions
    # atm, the model only supports self attention
    self.kdim = kdim
    self.vdim = vdim
    self.self_attention = self_attention
    self.bias = bias

    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        **kwargs,
    )

mindnlp.transformers.models.graphormer.collating_graphormer.GraphormerDataCollator

Graphormer data collator

Converts a graph dataset into the format accepted by the Graphormer model.
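A minimal sketch of plugging the collator into a mindspore.dataset pipeline is shown below. The data source, the column names, and the batch(...) keyword arguments are assumptions derived from the collator's __call__ signature (one list per input column plus batch info) and its output_columns attribute; adapt them to your dataset.

>>> import mindspore.dataset as ds
>>> from mindnlp.transformers.models.graphormer.collating_graphormer import GraphormerDataCollator
...
>>> collator = GraphormerDataCollator(spatial_pos_max=20, on_the_fly_processing=True)
>>> graph_columns = ["edge_index", "edge_attr", "y", "num_nodes", "node_feat"]
>>> # my_graph_source is a hypothetical iterable yielding one graph per row with the columns above
>>> dataset = ds.GeneratorDataset(my_graph_source, column_names=graph_columns)
>>> dataset = dataset.batch(
...     batch_size=16,
...     per_batch_map=collator,                  # called with one list per input column plus BatchInfo
...     input_columns=graph_columns,
...     output_columns=collator.output_columns,  # attn_bias, attn_edge_type, spatial_pos, ...
... )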

Source code in mindnlp/transformers/models/graphormer/collating_graphormer.py
class GraphormerDataCollator:
    """
    Graphormer data collator

    Converts graph dataset into the format accepted by Graphormer model
    """
    def __init__(self, spatial_pos_max=20, on_the_fly_processing=False):
        """
        Initializes a new instance of the GraphormerDataCollator class.

        Args:
            self: The object instance.
            spatial_pos_max (int): The maximum spatial position value. Defaults to 20.
            on_the_fly_processing (bool): Indicates whether on-the-fly processing is enabled or not. Defaults to False.

        Returns:
            None.

        Raises:
            ImportError: If the required Cython package (pyximport) is not available.

        """
        if not is_cython_available():
            raise ImportError("Graphormer preprocessing needs Cython (pyximport)")

        self.spatial_pos_max = spatial_pos_max
        self.on_the_fly_processing = on_the_fly_processing
        self.output_columns=["attn_bias",
                             "attn_edge_type",
                             "spatial_pos",
                             "in_degree",
                             "input_nodes",
                             "input_edges",
                             "out_degree",
                             "labels"]

    def __call__(self, edge_index, edge_attr, y, num_nodes, node_feat, batch_info):
        """
        This method, named '__call__', is defined within the class 'GraphormerDataCollator' and is used to process data
        for graph neural network models. It takes the following parameters:

        Args:
            self: The instance of the class.
            edge_index (List): A list of edge indices representing the connectivity of nodes in the graph.
            edge_attr (List): A list of edge attributes corresponding to the edges in the graph.
            y (List): A list of target values or labels associated with the graph data.
            num_nodes (List): A list containing the number of nodes in each graph.
            node_feat (List): A list of node features for each graph in the dataset.
            batch_info (Dict): A dictionary containing batch information for the graphs.

        Returns:
            None.

        Raises:
            TypeError: If the input parameters are not of the expected types.
            ValueError: If the input parameters do not meet specific requirements within the method logic.
            IndexError: If there are issues with index access during the processing of graph data.
        """
        features = []
        num_features = len(edge_index)
        for i in range(num_features):
            features.append({"edge_index": edge_index[i],
                             "edge_attr": edge_attr[i],
                             "y": y[i],
                             "num_nodes": num_nodes[i],
                             "node_feat": node_feat[i]})

        if self.on_the_fly_processing:
            features = [preprocess_item(i) for i in features]

        if not isinstance(features[0], Mapping):
            features = [vars(f) for f in features]
        batch = {}

        max_node_num = max(len(i["input_nodes"]) for i in features)
        node_feat_size = len(features[0]["input_nodes"][0])
        edge_feat_size = len(features[0]["attn_edge_type"][0][0])
        max_dist = max(len(i["input_edges"][0][0]) for i in features)
        edge_input_size = len(features[0]["input_edges"][0][0][0])
        batch_size = len(features)

        batch["attn_bias"] = np.zeros((batch_size, max_node_num + 1, max_node_num + 1),
                                       dtype=np.float32)
        batch["attn_edge_type"] = np.zeros((batch_size, max_node_num, max_node_num, edge_feat_size),
                                            dtype=np.int64)
        batch["spatial_pos"] = np.zeros((batch_size, max_node_num, max_node_num),
                                         dtype=np.int64)
        batch["in_degree"] = np.zeros((batch_size, max_node_num),
                                       dtype=np.int64)
        batch["input_nodes"] = np.zeros((batch_size, max_node_num, node_feat_size),
                                         dtype=np.int64)
        batch["input_edges"] = np.zeros(
            (batch_size, max_node_num, max_node_num, max_dist, edge_input_size),
            dtype=np.int64
        )

        for idx, ftr in enumerate(features):

            if len(ftr["attn_bias"][1:, 1:][ftr["spatial_pos"] >= self.spatial_pos_max]) > 0:
                ftr["attn_bias"][1:, 1:][ftr["spatial_pos"] >= self.spatial_pos_max] = float("-inf")

            batch["attn_bias"][idx, : ftr["attn_bias"].shape[0], : ftr["attn_bias"].shape[1]] = ftr["attn_bias"]
            batch["attn_edge_type"][idx, : ftr["attn_edge_type"].shape[0], : ftr["attn_edge_type"].shape[1], :] = ftr[
                "attn_edge_type"
            ]
            batch["spatial_pos"][idx, : ftr["spatial_pos"].shape[0], : ftr["spatial_pos"].shape[1]] = ftr["spatial_pos"]
            batch["in_degree"][idx, : ftr["in_degree"].shape[0]] = ftr["in_degree"]
            batch["input_nodes"][idx, : ftr["input_nodes"].shape[0], :] = ftr["input_nodes"]
            batch["input_edges"][
                idx, : ftr["input_edges"].shape[0], : ftr["input_edges"].shape[1], : ftr["input_edges"].shape[2], :
            ] = ftr["input_edges"]

        batch["out_degree"] = batch["in_degree"]

        sample = features[0]["labels"]
        if len(sample) == 1:  # one task
            if isinstance(sample[0], float):  # regression
                batch["labels"] = np.concatenate([i["labels"] for i in features])
            else:  # binary classification
                batch["labels"] = np.concatenate([i["labels"] for i in features])
        else:  # multi task classification, left to float to keep the NaNs
            batch["labels"] = np.stack([i["labels"] for i in features], axis=0)

        outputs = [batch[key] for key in self.output_columns]
        return tuple(outputs)

mindnlp.transformers.models.graphormer.collating_graphormer.GraphormerDataCollator.__call__(edge_index, edge_attr, y, num_nodes, node_feat, batch_info)

This method, `__call__`, processes a batch of graph data into the format accepted by the Graphormer model. It takes the following parameters:

PARAMETER DESCRIPTION
self

The instance of the class.

edge_index

A list of edge indices representing the connectivity of nodes in the graph.

TYPE: List

edge_attr

A list of edge attributes corresponding to the edges in the graph.

TYPE: List

y

A list of target values or labels associated with the graph data.

TYPE: List

num_nodes

A list containing the number of nodes in each graph.

TYPE: List

node_feat

A list of node features for each graph in the dataset.

TYPE: List

batch_info

A dictionary containing batch information for the graphs.

TYPE: Dict

RETURNS DESCRIPTION

A tuple of NumPy arrays ordered as in `output_columns`: attn_bias, attn_edge_type, spatial_pos, in_degree, input_nodes, input_edges, out_degree, labels.

RAISES DESCRIPTION
TypeError

If the input parameters are not of the expected types.

ValueError

If the input parameters do not meet specific requirements within the method logic.

IndexError

If there are issues with index access during the processing of graph data.

Source code in mindnlp/transformers/models/graphormer/collating_graphormer.py
def __call__(self, edge_index, edge_attr, y, num_nodes, node_feat, batch_info):
    """
    This method, named '__call__', is defined within the class 'GraphormerDataCollator' and is used to process data
    for graph neural network models. It takes the following parameters:

    Args:
        self: The instance of the class.
        edge_index (List): A list of edge indices representing the connectivity of nodes in the graph.
        edge_attr (List): A list of edge attributes corresponding to the edges in the graph.
        y (List): A list of target values or labels associated with the graph data.
        num_nodes (List): A list containing the number of nodes in each graph.
        node_feat (List): A list of node features for each graph in the dataset.
        batch_info (Dict): A dictionary containing batch information for the graphs.

    Returns:
        None.

    Raises:
        TypeError: If the input parameters are not of the expected types.
        ValueError: If the input parameters do not meet specific requirements within the method logic.
        IndexError: If there are issues with index access during the processing of graph data.
    """
    features = []
    num_features = len(edge_index)
    for i in range(num_features):
        features.append({"edge_index": edge_index[i],
                         "edge_attr": edge_attr[i],
                         "y": y[i],
                         "num_nodes": num_nodes[i],
                         "node_feat": node_feat[i]})

    if self.on_the_fly_processing:
        features = [preprocess_item(i) for i in features]

    if not isinstance(features[0], Mapping):
        features = [vars(f) for f in features]
    batch = {}

    max_node_num = max(len(i["input_nodes"]) for i in features)
    node_feat_size = len(features[0]["input_nodes"][0])
    edge_feat_size = len(features[0]["attn_edge_type"][0][0])
    max_dist = max(len(i["input_edges"][0][0]) for i in features)
    edge_input_size = len(features[0]["input_edges"][0][0][0])
    batch_size = len(features)

    batch["attn_bias"] = np.zeros((batch_size, max_node_num + 1, max_node_num + 1),
                                   dtype=np.float32)
    batch["attn_edge_type"] = np.zeros((batch_size, max_node_num, max_node_num, edge_feat_size),
                                        dtype=np.int64)
    batch["spatial_pos"] = np.zeros((batch_size, max_node_num, max_node_num),
                                     dtype=np.int64)
    batch["in_degree"] = np.zeros((batch_size, max_node_num),
                                   dtype=np.int64)
    batch["input_nodes"] = np.zeros((batch_size, max_node_num, node_feat_size),
                                     dtype=np.int64)
    batch["input_edges"] = np.zeros(
        (batch_size, max_node_num, max_node_num, max_dist, edge_input_size),
        dtype=np.int64
    )

    for idx, ftr in enumerate(features):

        if len(ftr["attn_bias"][1:, 1:][ftr["spatial_pos"] >= self.spatial_pos_max]) > 0:
            ftr["attn_bias"][1:, 1:][ftr["spatial_pos"] >= self.spatial_pos_max] = float("-inf")

        batch["attn_bias"][idx, : ftr["attn_bias"].shape[0], : ftr["attn_bias"].shape[1]] = ftr["attn_bias"]
        batch["attn_edge_type"][idx, : ftr["attn_edge_type"].shape[0], : ftr["attn_edge_type"].shape[1], :] = ftr[
            "attn_edge_type"
        ]
        batch["spatial_pos"][idx, : ftr["spatial_pos"].shape[0], : ftr["spatial_pos"].shape[1]] = ftr["spatial_pos"]
        batch["in_degree"][idx, : ftr["in_degree"].shape[0]] = ftr["in_degree"]
        batch["input_nodes"][idx, : ftr["input_nodes"].shape[0], :] = ftr["input_nodes"]
        batch["input_edges"][
            idx, : ftr["input_edges"].shape[0], : ftr["input_edges"].shape[1], : ftr["input_edges"].shape[2], :
        ] = ftr["input_edges"]

    batch["out_degree"] = batch["in_degree"]

    sample = features[0]["labels"]
    if len(sample) == 1:  # one task
        if isinstance(sample[0], float):  # regression
            batch["labels"] = np.concatenate([i["labels"] for i in features])
        else:  # binary classification
            batch["labels"] = np.concatenate([i["labels"] for i in features])
    else:  # multi task classification, left to float to keep the NaNs
        batch["labels"] = np.stack([i["labels"] for i in features], axis=0)

    outputs = [batch[key] for key in self.output_columns]
    return tuple(outputs)

mindnlp.transformers.models.graphormer.collating_graphormer.GraphormerDataCollator.__init__(spatial_pos_max=20, on_the_fly_processing=False)

Initializes a new instance of the GraphormerDataCollator class.

PARAMETER DESCRIPTION
self

The object instance.

spatial_pos_max

The maximum spatial position value. Defaults to 20.

TYPE: int DEFAULT: 20

on_the_fly_processing

Indicates whether on-the-fly processing is enabled or not. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION

None.

RAISES DESCRIPTION
ImportError

If the required Cython package (pyximport) is not available.

Source code in mindnlp/transformers/models/graphormer/collating_graphormer.py
def __init__(self, spatial_pos_max=20, on_the_fly_processing=False):
    """
    Initializes a new instance of the GraphormerDataCollator class.

    Args:
        self: The object instance.
        spatial_pos_max (int): The maximum spatial position value. Defaults to 20.
        on_the_fly_processing (bool): Indicates whether on-the-fly processing is enabled or not. Defaults to False.

    Returns:
        None.

    Raises:
        ImportError: If the required Cython package (pyximport) is not available.

    """
    if not is_cython_available():
        raise ImportError("Graphormer preprocessing needs Cython (pyximport)")

    self.spatial_pos_max = spatial_pos_max
    self.on_the_fly_processing = on_the_fly_processing
    self.output_columns=["attn_bias",
                         "attn_edge_type",
                         "spatial_pos",
                         "in_degree",
                         "input_nodes",
                         "input_edges",
                         "out_degree",
                         "labels"]

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerForGraphClassification

Bases: GraphormerPreTrainedModel

This model can be used for graph-level classification or regression tasks.

It can be trained on any of the following settings (see the sketch after this list):

  • regression (by setting config.num_classes to 1); there should be one float-type label per graph
  • one task classification (by setting config.num_classes to the number of classes); there should be one integer label per graph
  • binary multi-task classification (by setting config.num_classes to the number of labels); there should be a list of integer labels for each graph.
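As a sketch of how config.num_classes selects among these regimes (the import path is assumed from the module headings on this page; the expected label shapes follow the loss selection in forward below):

>>> from mindnlp.transformers import GraphormerConfig, GraphormerForGraphClassification
...
>>> # regression: one float label per graph            -> labels of shape [batch_size]
>>> regression_model = GraphormerForGraphClassification(GraphormerConfig(num_classes=1))
>>> # single-task classification over 4 classes        -> integer labels of shape [batch_size]
>>> single_task_model = GraphormerForGraphClassification(GraphormerConfig(num_classes=4))
>>> # binary multi-task classification with 12 labels  -> labels of shape [batch_size, 12], NaN allowed for missing tasks
>>> multi_task_model = GraphormerForGraphClassification(GraphormerConfig(num_classes=12))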
Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
class GraphormerForGraphClassification(GraphormerPreTrainedModel):
    """
    This model can be used for graph-level classification or regression tasks.

    It can be trained on

    - regression (by setting config.num_classes to 1); there should be one float-type label per graph
    - one task classification (by setting config.num_classes to the number of classes); there should be one integer
    label per graph
    - binary multi-task classification (by setting config.num_classes to the number of labels); there should be a list
    of integer labels for each graph.
    """
    def __init__(self, config: GraphormerConfig):
        """
        Initializes a new instance of GraphormerForGraphClassification.

        Args:
            self: The instance of the class.
            config (GraphormerConfig):
                An instance of GraphormerConfig containing the configuration settings for the Graphormer model.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.encoder = GraphormerModel(config)
        self.embedding_dim = config.embedding_dim
        self.num_classes = config.num_classes
        self.classifier = GraphormerDecoderHead(self.embedding_dim, self.num_classes)
        self.is_encoder_decoder = True

        # Initialize weights and apply final processing
        self.post_init()

    def forward(
        self,
        input_nodes: Tensor,
        input_edges: Tensor,
        attn_bias: Tensor,
        in_degree: Tensor,
        out_degree: Tensor,
        spatial_pos: Tensor,
        attn_edge_type: Tensor,
        labels: Optional[Tensor] = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ) -> Union[Tuple[Tensor], SequenceClassifierOutput]:
        """Constructs a Graphormer for graph classification.

        This method takes the following parameters:

        - self: The object instance.
        - input_nodes: A Tensor representing the input nodes.
        - input_edges: A Tensor representing the input edges.
        - attn_bias: A Tensor representing the attention bias.
        - in_degree: A Tensor representing the in-degree of the nodes.
        - out_degree: A Tensor representing the out-degree of the nodes.
        - spatial_pos: A Tensor representing the spatial positions of the nodes.
        - attn_edge_type: A Tensor representing the attention edge types.
        - labels: An optional Tensor representing the labels for classification. Defaults to None.
        - return_dict: An optional boolean indicating whether to return a dictionary.
        If not provided, it uses the value from the configuration. Defaults to None.
        - **kwargs: Additional keyword arguments.

        The method returns a value of type Union[Tuple[Tensor], SequenceClassifierOutput].

        Args:
            self: The object instance.
            input_nodes: A Tensor representing the input nodes. Shape: [batch_size, sequence_length, hidden_size].
            input_edges: A Tensor representing the input edges.
                Shape: [batch_size, sequence_length, sequence_length, hidden_size].
            attn_bias: A Tensor representing the attention bias. Shape: [batch_size, sequence_length, sequence_length].
            in_degree: A Tensor representing the in-degree of the nodes. Shape: [batch_size, sequence_length].
            out_degree: A Tensor representing the out-degree of the nodes. Shape: [batch_size, sequence_length].
            spatial_pos: A Tensor representing the spatial positions of the nodes.
                Shape: [batch_size, sequence_length, hidden_size].
            attn_edge_type: A Tensor representing the attention edge types.
                Shape: [batch_size, sequence_length, sequence_length].
            labels: An optional Tensor representing the labels for classification. Shape: [batch_size, num_classes].
                Defaults to None.
            return_dict: An optional boolean indicating whether to return a dictionary.
                If not provided, it uses the value from the configuration. Defaults to None.
            **kwargs: Additional keyword arguments.

        Returns:
            Conditional Return:

                - If 'return_dict' is False, the method returns a tuple containing the following elements (if not None):

                    - loss: A Tensor representing the calculated loss. Shape: [batch_size].
                    - logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
                    - hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape
                    [batch_size, sequence_length, hidden_size].

                - If 'return_dict' is True, the method returns a SequenceClassifierOutput object with the following
                attributes (if not None):

                    - loss: A Tensor representing the calculated loss. Shape: [batch_size].
                    - logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
                    - hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape
                    [batch_size, sequence_length, hidden_size].
                    - attentions: None.

        Raises:
            MSELossError: If 'labels' is not None and 'num_classes' is 1, but the shape of 'labels' is not compatible
                with logits.
            CrossEntropyLossError: If 'labels' is not None and 'num_classes' is greater than 1, but the shape of
                'labels' is not compatible with logits.
            BCEWithLogitsLossError: If 'labels' is not None and 'num_classes' is greater than 1, but the shape of
                'labels' is not compatible with logits.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        encoder_outputs = self.encoder(
            input_nodes,
            input_edges,
            attn_bias,
            in_degree,
            out_degree,
            spatial_pos,
            attn_edge_type,
            return_dict=True,
        )

        outputs, hidden_states = encoder_outputs["last_hidden_state"], encoder_outputs["hidden_states"]

        head_outputs = self.classifier(outputs)
        logits = head_outputs[:, 0, :]

        loss = None
        if labels is not None:
            mask = 1 - ops.isnan(labels) # invert True and False

            if self.num_classes == 1:  # regression
                loss_fct = MSELoss()
                loss = loss_fct(logits[mask].squeeze(), labels[mask].squeeze().float())
            elif self.num_classes > 1 and len(labels.shape) == 1:  # One task classification
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits[mask].view(-1, self.num_classes), labels[mask].view(-1))
            else:  # Binary multi-task classification
                loss_fct = BCEWithLogitsLoss(reduction="sum")
                loss = loss_fct(logits[mask], labels[mask])

        if not return_dict:
            return tuple(x for x in [loss, logits, hidden_states] if x is not None)
        return SequenceClassifierOutput(loss=loss, logits=logits, hidden_states=hidden_states, attentions=None)

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerForGraphClassification.__init__(config)

Initializes a new instance of GraphormerForGraphClassification.

PARAMETER DESCRIPTION
self

The instance of the class.

config

An instance of GraphormerConfig containing the configuration settings for the Graphormer model.

TYPE: GraphormerConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def __init__(self, config: GraphormerConfig):
    """
    Initializes a new instance of GraphormerForGraphClassification.

    Args:
        self: The instance of the class.
        config (GraphormerConfig):
            An instance of GraphormerConfig containing the configuration settings for the Graphormer model.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.encoder = GraphormerModel(config)
    self.embedding_dim = config.embedding_dim
    self.num_classes = config.num_classes
    self.classifier = GraphormerDecoderHead(self.embedding_dim, self.num_classes)
    self.is_encoder_decoder = True

    # Initialize weights and apply final processing
    self.post_init()

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerForGraphClassification.forward(input_nodes, input_edges, attn_bias, in_degree, out_degree, spatial_pos, attn_edge_type, labels=None, return_dict=None, **kwargs)

Runs the Graphormer encoder and the classification head for graph-level classification or regression.

The parameters are described below; the method returns a value of type Union[Tuple[Tensor], SequenceClassifierOutput].

PARAMETER DESCRIPTION
self

The object instance.

input_nodes

A Tensor representing the input nodes. Shape: [batch_size, sequence_length, hidden_size].

TYPE: Tensor

input_edges

A Tensor representing the input edges. Shape: [batch_size, sequence_length, sequence_length, hidden_size].

TYPE: Tensor

attn_bias

A Tensor representing the attention bias. Shape: [batch_size, sequence_length, sequence_length].

TYPE: Tensor

in_degree

A Tensor representing the in-degree of the nodes. Shape: [batch_size, sequence_length].

TYPE: Tensor

out_degree

A Tensor representing the out-degree of the nodes. Shape: [batch_size, sequence_length].

TYPE: Tensor

spatial_pos

A Tensor representing the spatial positions of the nodes. Shape: [batch_size, sequence_length, hidden_size].

TYPE: Tensor

attn_edge_type

A Tensor representing the attention edge types. Shape: [batch_size, sequence_length, sequence_length].

TYPE: Tensor

labels

An optional Tensor representing the labels for classification. Shape: [batch_size, num_classes]. Defaults to None.

TYPE: Optional[Tensor] DEFAULT: None

return_dict

An optional boolean indicating whether to return a dictionary. If not provided, it uses the value from the configuration. Defaults to None.

TYPE: Optional[bool] DEFAULT: None

**kwargs

Additional keyword arguments.

DEFAULT: {}

RETURNS DESCRIPTION
Union[Tuple[Tensor], SequenceClassifierOutput]

Conditional Return:

  • If 'return_dict' is False, the method returns a tuple containing the following elements (if not None):

    • loss: A Tensor representing the calculated loss. Shape: [batch_size].
    • logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
    • hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape [batch_size, sequence_length, hidden_size].
  • If 'return_dict' is True, the method returns a SequenceClassifierOutput object with the following attributes (if not None):

    • loss: A Tensor representing the calculated loss. Shape: [batch_size].
    • logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
    • hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape [batch_size, sequence_length, hidden_size].
    • attentions: None.

RAISES DESCRIPTION
MSELossError

If 'labels' is not None and 'num_classes' is 1, but the shape of 'labels' is not compatible with logits.

CrossEntropyLossError

If 'labels' is not None and 'num_classes' is greater than 1, but the shape of 'labels' is not compatible with logits.

BCEWithLogitsLossError

If 'labels' is not None and 'num_classes' is greater than 1, but the shape of 'labels' is not compatible with logits.
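
Which loss forward applies depends on num_classes and on the shape of labels (see the source below). The following is a minimal sketch of the three label layouts the method distinguishes; the concrete values are invented for illustration, and the use of NaN to mark a missing multi-task target mirrors the NaN mask in the source.

import numpy as np
from mindspore import Tensor

# Regression (num_classes == 1): one float target per graph, shape [batch_size].
regression_labels = Tensor(np.array([0.7, float("nan"), 1.3], dtype=np.float32))

# Single-task classification (num_classes > 1, 1-D labels): one class id per graph, shape [batch_size].
single_task_labels = Tensor(np.array([2, 0, 5], dtype=np.int32))

# Binary multi-task classification: one 0/1 target per task, shape [batch_size, num_classes];
# NaN marks a missing target and is filtered out by the NaN mask inside `forward`.
multi_task_labels = Tensor(np.array([[1.0, 0.0], [float("nan"), 1.0], [0.0, 0.0]], dtype=np.float32))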

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def forward(
    self,
    input_nodes: Tensor,
    input_edges: Tensor,
    attn_bias: Tensor,
    in_degree: Tensor,
    out_degree: Tensor,
    spatial_pos: Tensor,
    attn_edge_type: Tensor,
    labels: Optional[Tensor] = None,
    return_dict: Optional[bool] = None,
    **kwargs,
) -> Union[Tuple[Tensor], SequenceClassifierOutput]:
    """Constructs a Graphormer for graph classification.

    This method takes the following parameters:

    - self: The object instance.
    - input_nodes: A Tensor representing the input nodes.
    - input_edges: A Tensor representing the input edges.
    - attn_bias: A Tensor representing the attention bias.
    - in_degree: A Tensor representing the in-degree of the nodes.
    - out_degree: A Tensor representing the out-degree of the nodes.
    - spatial_pos: A Tensor representing the spatial positions of the nodes.
    - attn_edge_type: A Tensor representing the attention edge types.
    - labels: An optional Tensor representing the labels for classification. Defaults to None.
    - return_dict: An optional boolean indicating whether to return a dictionary.
    If not provided, it uses the value from the configuration. Defaults to None.
    - **kwargs: Additional keyword arguments.

    The method returns a value of type Union[Tuple[Tensor], SequenceClassifierOutput].

    Args:
        self: The object instance.
        input_nodes: A Tensor representing the input nodes. Shape: [batch_size, sequence_length, hidden_size].
        input_edges: A Tensor representing the input edges.
            Shape: [batch_size, sequence_length, sequence_length, hidden_size].
        attn_bias: A Tensor representing the attention bias. Shape: [batch_size, sequence_length, sequence_length].
        in_degree: A Tensor representing the in-degree of the nodes. Shape: [batch_size, sequence_length].
        out_degree: A Tensor representing the out-degree of the nodes. Shape: [batch_size, sequence_length].
        spatial_pos: A Tensor representing the spatial positions of the nodes.
            Shape: [batch_size, sequence_length, hidden_size].
        attn_edge_type: A Tensor representing the attention edge types.
            Shape: [batch_size, sequence_length, sequence_length].
        labels: An optional Tensor representing the labels for classification. Shape: [batch_size, num_classes].
            Defaults to None.
        return_dict: An optional boolean indicating whether to return a dictionary.
            If not provided, it uses the value from the configuration. Defaults to None.
        **kwargs: Additional keyword arguments.

    Returns:
        Conditional Return:

            - If 'return_dict' is False, the method returns a tuple containing the following elements (if not None):

                - loss: A Tensor representing the calculated loss. Shape: [batch_size].
                - logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
                - hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape
                [batch_size, sequence_length, hidden_size].

            - If 'return_dict' is True, the method returns a SequenceClassifierOutput object with the following
            attributes (if not None):

                - loss: A Tensor representing the calculated loss. Shape: [batch_size].
                - logits: A Tensor representing the output logits. Shape: [batch_size, num_classes].
                - hidden_states: A list of Tensors representing the hidden states. Each Tensor has shape
                [batch_size, sequence_length, hidden_size].
                - attentions: None.

    Raises:
        MSELossError: If 'labels' is not None and 'num_classes' is 1, but the shape of 'labels' is not compatible
            with logits.
        CrossEntropyLossError: If 'labels' is not None and 'num_classes' is greater than 1, but the shape of
            'labels' is not compatible with logits.
        BCEWithLogitsLossError: If 'labels' is not None and 'num_classes' is greater than 1, but the shape of
            'labels' is not compatible with logits.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    encoder_outputs = self.encoder(
        input_nodes,
        input_edges,
        attn_bias,
        in_degree,
        out_degree,
        spatial_pos,
        attn_edge_type,
        return_dict=True,
    )

    outputs, hidden_states = encoder_outputs["last_hidden_state"], encoder_outputs["hidden_states"]

    head_outputs = self.classifier(outputs)
    logits = head_outputs[:, 0, :]

    loss = None
    if labels is not None:
        mask = 1 - ops.isnan(labels) # invert True and False

        if self.num_classes == 1:  # regression
            loss_fct = MSELoss()
            loss = loss_fct(logits[mask].squeeze(), labels[mask].squeeze().float())
        elif self.num_classes > 1 and len(labels.shape) == 1:  # One task classification
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(logits[mask].view(-1, self.num_classes), labels[mask].view(-1))
        else:  # Binary multi-task classification
            loss_fct = BCEWithLogitsLoss(reduction="sum")
            loss = loss_fct(logits[mask], labels[mask])

    if not return_dict:
        return tuple(x for x in [loss, logits, hidden_states] if x is not None)
    return SequenceClassifierOutput(loss=loss, logits=logits, hidden_states=hidden_states, attentions=None)
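
As a usage sketch, the following builds the model from a default-sized configuration and runs a tiny random batch through it. The shapes and integer id ranges follow the upstream Graphormer input conventions (integer feature ids per node and edge, an extra graph-token slot in attn_bias) and are assumptions for illustration only; real inputs come from your own graph preprocessing and collation step.

import numpy as np
import mindspore as ms
from mindspore import Tensor
from mindnlp.transformers.models.graphormer.configuration_graphormer import GraphormerConfig
from mindnlp.transformers.models.graphormer.modeling_graphormer import GraphormerForGraphClassification

config = GraphormerConfig(num_classes=2)            # two-way single-task classification
model = GraphormerForGraphClassification(config)

batch_size, n_nodes, node_feat, edge_feat = 2, 4, 3, 3
max_dist = config.multi_hop_max_dist
rng = np.random.default_rng(0)

inputs = {
    # integer feature ids; 0 is reserved for padding
    "input_nodes": Tensor(rng.integers(1, 64, size=(batch_size, n_nodes, node_feat)), ms.int32),
    "input_edges": Tensor(rng.integers(1, 8, size=(batch_size, n_nodes, n_nodes, max_dist, edge_feat)), ms.int32),
    # the attention bias carries an extra slot for the virtual graph token
    "attn_bias": Tensor(np.zeros((batch_size, n_nodes + 1, n_nodes + 1), dtype=np.float32)),
    "in_degree": Tensor(rng.integers(1, n_nodes, size=(batch_size, n_nodes)), ms.int32),
    "out_degree": Tensor(rng.integers(1, n_nodes, size=(batch_size, n_nodes)), ms.int32),
    "spatial_pos": Tensor(rng.integers(1, 5, size=(batch_size, n_nodes, n_nodes)), ms.int32),
    "attn_edge_type": Tensor(rng.integers(0, 8, size=(batch_size, n_nodes, n_nodes, edge_feat)), ms.int32),
}

outputs = model(**inputs, return_dict=True)
print(outputs.logits.shape)                         # (batch_size, num_classes)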

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerModel

Bases: GraphormerPreTrainedModel

The Graphormer model is a graph-encoder model.

It goes from a graph to its representation. If you want to use the model for a downstream classification task, use GraphormerForGraphClassification instead. For any other downstream task, feel free to add a new class, or combine this model with a downstream model of your choice, following the example in GraphormerForGraphClassification.
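
As a starting point for such a combination, here is a minimal sketch that pools the virtual graph token (position 0 of the last hidden state, the same position GraphormerForGraphClassification classifies) into one embedding per graph. The encode_graphs helper and the batch dict are placeholders for whatever preprocessing pipeline and downstream head you actually use; the batch layout matches the inputs dict in the classification example above.

from mindnlp.transformers.models.graphormer.configuration_graphormer import GraphormerConfig
from mindnlp.transformers.models.graphormer.modeling_graphormer import GraphormerModel

def encode_graphs(model: GraphormerModel, batch: dict):
    """Hypothetical helper: one [embedding_dim] vector per graph from a collated batch."""
    outputs = model(
        input_nodes=batch["input_nodes"],
        input_edges=batch["input_edges"],
        attn_bias=batch["attn_bias"],
        in_degree=batch["in_degree"],
        out_degree=batch["out_degree"],
        spatial_pos=batch["spatial_pos"],
        attn_edge_type=batch["attn_edge_type"],
        return_dict=True,
    )
    # Position 0 is the virtual graph token, i.e. a whole-graph summary that a downstream
    # head (classifier, regressor, ranker, ...) can consume.
    return outputs["last_hidden_state"][:, 0, :]

model = GraphormerModel(GraphormerConfig())
# graph_repr = encode_graphs(model, batch)   # `batch` comes from your own collation step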

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
class GraphormerModel(GraphormerPreTrainedModel):
    """
    The Graphormer model is a graph-encoder model.

    It goes from a graph to its representation. If you want to use the model for a downstream classification task, use
    GraphormerForGraphClassification instead. For any other downstream task, feel free to add a new class, or combine
    this model with a downstream model of your choice, following the example in GraphormerForGraphClassification.
    """
    def __init__(self, config: GraphormerConfig):
        """
        Initializes a new instance of the GraphormerModel class.

        Args:
            self: The instance of the GraphormerModel class.
            config (GraphormerConfig):
                An object of type GraphormerConfig containing the configuration settings for the model.
                The config parameter is used to set various attributes of the GraphormerModel instance,
                such as max_nodes, graph_encoder, share_input_output_embed, lm_output_learned_bias, load_softmax,
                lm_head_transform_weight, activation_fn, and layer_norm.

        Returns:
            None.

        Raises:
            None.
        """
        super().__init__(config)
        self.max_nodes = config.max_nodes

        self.graph_encoder = GraphormerGraphEncoder(config)

        self.share_input_output_embed = config.share_input_output_embed
        self.lm_output_learned_bias = None

        # Remove head is set to true during fine-tuning
        self.load_softmax = not getattr(config, "remove_head", False)

        self.lm_head_transform_weight = nn.Linear(config.embedding_dim, config.embedding_dim)
        self.activation_fn = ACT2FN[config.activation_fn]
        self.layer_norm = nn.LayerNorm([config.embedding_dim])

        self.post_init()

    def reset_output_layer_parameters(self):
        """
        Reset output layer parameters
        """
        self.lm_output_learned_bias = Parameter(ops.zeros(1))

    def forward(
        self,
        input_nodes: Tensor,
        input_edges: Tensor,
        attn_bias: Tensor,
        in_degree: Tensor,
        out_degree: Tensor,
        spatial_pos: Tensor,
        attn_edge_type: Tensor,
        perturb: Optional[Tensor] = None,
        masked_tokens: None = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ) -> Union[Tuple[Tensor], BaseModelOutputWithNoAttention]:
        """
        Construct method in the GraphormerModel class.

        Args:
            self: The instance of the class.
            input_nodes (Tensor): The input nodes tensor for the graph.
            input_edges (Tensor): The input edges tensor for the graph.
            attn_bias (Tensor): The attention bias tensor.
            in_degree (Tensor): The in-degree tensor for nodes in the graph.
            out_degree (Tensor): The out-degree tensor for nodes in the graph.
            spatial_pos (Tensor): The spatial position tensor for nodes in the graph.
            attn_edge_type (Tensor): The attention edge type tensor.
            perturb (Optional[Tensor], default=None): A tensor for perturbation.
            masked_tokens (None): Not implemented; should be None.
            return_dict (Optional[bool], default=None): If True, returns a BaseModelOutputWithNoAttention object.

        Returns:
            Union[Tuple[Tensor], BaseModelOutputWithNoAttention]:
                Depending on the value of return_dict, either a tuple containing input_nodes and inner_states
                or a BaseModelOutputWithNoAttention object.

        Raises:
            NotImplementedError: If masked_tokens is not None, indicating that the functionality is not implemented.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        inner_states, _ = self.graph_encoder(
            input_nodes, input_edges, attn_bias, in_degree, out_degree, spatial_pos, attn_edge_type, perturb=perturb
        )

        # take the last inner state, then swap the batch and graph-length axes back
        input_nodes = inner_states[-1].swapaxes(0, 1)

        # project masked tokens only
        if masked_tokens is not None:
            raise NotImplementedError

        input_nodes = self.layer_norm(self.activation_fn(self.lm_head_transform_weight(input_nodes)))

        # project back to size of vocabulary
        if self.share_input_output_embed and hasattr(self.graph_encoder.embed_tokens, "weight"):
            input_nodes = ops.dense(input_nodes, self.graph_encoder.embed_tokens.weight)

        if not return_dict:
            return tuple(x for x in [input_nodes, inner_states] if x is not None)
        return BaseModelOutputWithNoAttention(last_hidden_state=input_nodes, hidden_states=inner_states)

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerModel.__init__(config)

Initializes a new instance of the GraphormerModel class.

PARAMETER DESCRIPTION
self

The instance of the GraphormerModel class.

config

An object of type GraphormerConfig containing the configuration settings for the model. The config parameter is used to set various attributes of the GraphormerModel instance, such as max_nodes, graph_encoder, share_input_output_embed, lm_output_learned_bias, load_softmax, lm_head_transform_weight, activation_fn, and layer_norm.

TYPE: GraphormerConfig

RETURNS DESCRIPTION

None.

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def __init__(self, config: GraphormerConfig):
    """
    Initializes a new instance of the GraphormerModel class.

    Args:
        self: The instance of the GraphormerModel class.
        config (GraphormerConfig):
            An object of type GraphormerConfig containing the configuration settings for the model.
            The config parameter is used to set various attributes of the GraphormerModel instance,
            such as max_nodes, graph_encoder, share_input_output_embed, lm_output_learned_bias, load_softmax,
            lm_head_transform_weight, activation_fn, and layer_norm.

    Returns:
        None.

    Raises:
        None.
    """
    super().__init__(config)
    self.max_nodes = config.max_nodes

    self.graph_encoder = GraphormerGraphEncoder(config)

    self.share_input_output_embed = config.share_input_output_embed
    self.lm_output_learned_bias = None

    # Remove head is set to true during fine-tuning
    self.load_softmax = not getattr(config, "remove_head", False)

    self.lm_head_transform_weight = nn.Linear(config.embedding_dim, config.embedding_dim)
    self.activation_fn = ACT2FN[config.activation_fn]
    self.layer_norm = nn.LayerNorm([config.embedding_dim])

    self.post_init()

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerModel.forward(input_nodes, input_edges, attn_bias, in_degree, out_degree, spatial_pos, attn_edge_type, perturb=None, masked_tokens=None, return_dict=None, **kwargs)

Construct method in the GraphormerModel class.

PARAMETER DESCRIPTION
self

The instance of the class.

input_nodes

The input nodes tensor for the graph.

TYPE: Tensor

input_edges

The input edges tensor for the graph.

TYPE: Tensor

attn_bias

The attention bias tensor.

TYPE: Tensor

in_degree

The in-degree tensor for nodes in the graph.

TYPE: Tensor

out_degree

The out-degree tensor for nodes in the graph.

TYPE: Tensor

spatial_pos

The spatial position tensor for nodes in the graph.

TYPE: Tensor

attn_edge_type

The attention edge type tensor.

TYPE: Tensor

perturb

A tensor for perturbation.

TYPE: Optional[Tensor], default=None DEFAULT: None

masked_tokens

Not implemented; should be None.

TYPE: None DEFAULT: None

return_dict

If True, returns a BaseModelOutputWithNoAttention object.

TYPE: Optional[bool], default=None DEFAULT: None

RETURNS DESCRIPTION
Union[Tuple[Tensor], BaseModelOutputWithNoAttention]

Union[Tuple[Tensor], BaseModelOutputWithNoAttention]: Depending on the value of return_dict, either a tuple containing input_nodes and inner_states or a BaseModelOutputWithNoAttention object.

RAISES DESCRIPTION
NotImplementedError

If masked_tokens is not None, indicating that the functionality is not implemented.

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def forward(
    self,
    input_nodes: Tensor,
    input_edges: Tensor,
    attn_bias: Tensor,
    in_degree: Tensor,
    out_degree: Tensor,
    spatial_pos: Tensor,
    attn_edge_type: Tensor,
    perturb: Optional[Tensor] = None,
    masked_tokens: None = None,
    return_dict: Optional[bool] = None,
    **kwargs,
) -> Union[Tuple[Tensor], BaseModelOutputWithNoAttention]:
    """
    Construct method in the GraphormerModel class.

    Args:
        self: The instance of the class.
        input_nodes (Tensor): The input nodes tensor for the graph.
        input_edges (Tensor): The input edges tensor for the graph.
        attn_bias (Tensor): The attention bias tensor.
        in_degree (Tensor): The in-degree tensor for nodes in the graph.
        out_degree (Tensor): The out-degree tensor for nodes in the graph.
        spatial_pos (Tensor): The spatial position tensor for nodes in the graph.
        attn_edge_type (Tensor): The attention edge type tensor.
        perturb (Optional[Tensor], default=None): A tensor for perturbation.
        masked_tokens (None): Not implemented; should be None.
        return_dict (Optional[bool], default=None): If True, returns a BaseModelOutputWithNoAttention object.

    Returns:
        Union[Tuple[Tensor], BaseModelOutputWithNoAttention]:
            Depending on the value of return_dict, either a tuple containing input_nodes and inner_states
            or a BaseModelOutputWithNoAttention object.

    Raises:
        NotImplementedError: If masked_tokens is not None, indicating that the functionality is not implemented.
    """
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    inner_states, _ = self.graph_encoder(
        input_nodes, input_edges, attn_bias, in_degree, out_degree, spatial_pos, attn_edge_type, perturb=perturb
    )

    # take the last inner state, then swap the batch and graph-length axes back
    input_nodes = inner_states[-1].swapaxes(0, 1)

    # project masked tokens only
    if masked_tokens is not None:
        raise NotImplementedError

    input_nodes = self.layer_norm(self.activation_fn(self.lm_head_transform_weight(input_nodes)))

    # project back to size of vocabulary
    if self.share_input_output_embed and hasattr(self.graph_encoder.embed_tokens, "weight"):
        input_nodes = ops.dense(input_nodes, self.graph_encoder.embed_tokens.weight)

    if not return_dict:
        return tuple(x for x in [input_nodes, inner_states] if x is not None)
    return BaseModelOutputWithNoAttention(last_hidden_state=input_nodes, hidden_states=inner_states)
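
For orientation, the two return modes of this method, assuming a GraphormerModel built as in the sketch under the class description and an inputs dict laid out like the classification example earlier:

outputs = model(**inputs, return_dict=True)
last_hidden = outputs["last_hidden_state"]   # [batch_size, 1 + n_nodes, embedding_dim] (graph token first)
inner_states = outputs["hidden_states"]      # per-layer encoder states

# With return_dict=False the same values come back as a plain tuple.
last_hidden, inner_states = model(**inputs, return_dict=False)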

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerModel.reset_output_layer_parameters()

Reset output layer parameters

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def reset_output_layer_parameters(self):
    """
    Reset output layer parameters
    """
    self.lm_output_learned_bias = Parameter(ops.zeros(1))

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
class GraphormerPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """
    config_class = GraphormerConfig
    base_model_prefix = "graphormer"
    supports_gradient_checkpointing = True
    main_input_name_nodes = "input_nodes"
    main_input_name_edges = "input_edges"

    def init_graphormer_params(self, module: Union[nn.Linear, nn.Embedding, GraphormerMultiheadAttention]):
        """
        Initialize the weights specific to the Graphormer Model.
        """
        if isinstance(module, nn.Linear):
            module.weight.set_data(init_normal(module.weight, sigma=0.02, mean=0.0))
            if module.bias:
                module.bias.set_data(init_zero(module.bias))
        if isinstance(module, nn.Embedding):
            weight = np.random.normal(loc=0.0, scale=0.02, size=module.weight.shape)
            if module.padding_idx:
                weight[module.padding_idx] = 0

            module.weight.set_data(Tensor(weight, module.weight.dtype))
        if isinstance(module, GraphormerMultiheadAttention):
            module.q_proj.weight.set_data(init_normal(module.q_proj.weight,
                                                      sigma=0.02, mean=0.0))
            module.k_proj.weight.set_data(init_normal(module.k_proj.weight,
                                                      sigma=0.02, mean=0.0))
            module.v_proj.weight.set_data(init_normal(module.v_proj.weight,
                                                      sigma=0.02, mean=0.0))

    def _init_weights(
        self,
        cell
    ):
        """
        Initialize the weights
        """
        if isinstance(cell, (nn.Linear, nn.Conv2d)):
            # We might be missing part of the Linear init, dependent on the layer num
            cell.weight.set_data(init_normal(cell.weight, sigma=0.02, mean=0.0))
            if cell.bias:
                cell.bias.set_data(init_zero(cell.bias))
        elif isinstance(cell, nn.Embedding):
            weight = np.random.normal(loc=0.0, scale=0.02, size=cell.weight.shape)
            if cell.padding_idx:
                weight[cell.padding_idx] = 0

            cell.weight.set_data(Tensor(weight, cell.weight.dtype))
        elif isinstance(cell, GraphormerMultiheadAttention):
            cell.q_proj.weight.set_data(init_normal(cell.q_proj.weight,
                                                      sigma=0.02, mean=0.0))
            cell.k_proj.weight.set_data(init_normal(cell.k_proj.weight,
                                                      sigma=0.02, mean=0.0))
            cell.v_proj.weight.set_data(init_normal(cell.v_proj.weight,
                                                      sigma=0.02, mean=0.0))
        elif isinstance(cell, GraphormerGraphEncoder):
            if cell.apply_graphormer_init:
                cell.apply(self.init_graphormer_params)

    def _set_gradient_checkpointing(self, module, value=False):
        """
        Set the gradient checkpointing option for a given module in a GraphormerPreTrainedModel.

        Args:
            self (GraphormerPreTrainedModel): The instance of the GraphormerPreTrainedModel class.
            module (GraphormerModel): The module for which the gradient checkpointing option is being set.
            value (bool): The value indicating whether gradient checkpointing is enabled or disabled.

        Returns:
            None.

        Raises:
            TypeError: If the provided module is not an instance of GraphormerModel.
        """
        if isinstance(module, GraphormerModel):
            module.gradient_checkpointing = value
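
For orientation, a minimal sketch of how these initialization hooks are exercised when a model is built from a config. apply_graphormer_init is assumed here to be a GraphormerConfig flag (it is referenced in _init_weights above) that makes the graph encoder re-initialize its Linear, Embedding, and attention submodules through init_graphormer_params.

from mindnlp.transformers.models.graphormer.configuration_graphormer import GraphormerConfig
from mindnlp.transformers.models.graphormer.modeling_graphormer import GraphormerModel

# post_init(), called at the end of GraphormerModel.__init__, runs the usual PreTrainedModel
# initialization flow, which routes each cell through _init_weights above; with
# apply_graphormer_init=True the GraphormerGraphEncoder branch additionally applies
# init_graphormer_params to its submodules via cell.apply(...).
model = GraphormerModel(GraphormerConfig(apply_graphormer_init=True))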

mindnlp.transformers.models.graphormer.modeling_graphormer.GraphormerPreTrainedModel.init_graphormer_params(module)

Initialize the weights specific to the Graphormer Model.

Source code in mindnlp/transformers/models/graphormer/modeling_graphormer.py
def init_graphormer_params(self, module: Union[nn.Linear, nn.Embedding, GraphormerMultiheadAttention]):
    """
    Initialize the weights specific to the Graphormer Model.
    """
    if isinstance(module, nn.Linear):
        module.weight.set_data(init_normal(module.weight, sigma=0.02, mean=0.0))
        if module.bias:
            module.bias.set_data(init_zero(module.bias))
    if isinstance(module, nn.Embedding):
        weight = np.random.normal(loc=0.0, scale=0.02, size=module.weight.shape)
        if module.padding_idx:
            weight[module.padding_idx] = 0

        module.weight.set_data(Tensor(weight, module.weight.dtype))
    if isinstance(module, GraphormerMultiheadAttention):
        module.q_proj.weight.set_data(init_normal(module.q_proj.weight,
                                                  sigma=0.02, mean=0.0))
        module.k_proj.weight.set_data(init_normal(module.k_proj.weight,
                                                  sigma=0.02, mean=0.0))
        module.v_proj.weight.set_data(init_normal(module.v_proj.weight,
                                                  sigma=0.02, mean=0.0))