Metrics

accuracy

Class for Metric Accuracy

class mindnlp.metrics.accuracy.Accuracy(name='Accuracy')[source]

Bases: Metric

Calculates accuracy. The function is shown as follows:

\[\text{ACC} =\frac{\text{TP} + \text{TN}} {\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]

where ACC is accuracy, TP is the number of true positive cases, TN is the number of true negative cases, FP is the number of false positive cases, and FN is the number of false negative cases.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, Tensor
>>> from mindnlp.common.metrics import Accuracy
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Accuracy()
>>> metric.update(preds, labels)
>>> acc = metric.eval()
>>> print(acc)
0.6666666666666666
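
The result above can be cross-checked with plain NumPy, assuming the predicted class is the argmax of preds along the last axis (a minimal sketch, not the library implementation):

>>> import numpy as np
>>> preds = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
>>> labels = np.array([1, 0, 1])
>>> # two of the three argmax predictions ([1, 0, 0]) match the labels ([1, 0, 1])
>>> print(float(np.mean(preds.argmax(axis=1) == labels)))
0.6666666666666666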
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the accuracy.

Returns:

  • acc (float) - The computed result.

Raises:

RuntimeError – If the number of samples is 0.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If the class number of the current predicted data does not match the class number of the previously updated predicted data.

mindnlp.metrics.accuracy.accuracy_fn(preds, labels)[source]

Calculates the accuracy. The function is shown as follows:

\[\text{ACC} =\frac{\text{TP} + \text{TN}} {\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]

where ACC is accuracy, TP is the number of true positive cases, TN is the number of true negative cases, FP is the number of false positive cases, and FN is the number of false negative cases.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Returns:

  • acc (float) - The computed result.

Raises:

RuntimeError – If the number of samples is 0.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import accuracy
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> acc = accuracy(preds, labels)
>>> print(acc)
0.6666666666666666

bleu

Class for Metric BleuScore

class mindnlp.metrics.bleu.BleuScore(n_size=4, weights=None, name='BleuScore')[source]

Bases: Metric

Calculates the BLEU score. BLEU (bilingual evaluation understudy) is a metric for evaluating the quality of text translated by machine. It uses a modified form of precision to compare a candidate translation against multiple reference translations. The function is shown as follows:

\[ \begin{aligned}BP & = \begin{cases} 1, & \text{if } c > r \\ e^{1-r/c}, & \text{if } c \leq r \end{cases}\\BLEU & = BP \cdot \exp\left(\sum_{n=1}^{N} w_{n} \log p_{n}\right)\end{aligned} \]

where c is the length of the candidate sentence, r is the length of the reference sentence, p_n is the modified n-gram precision, and w_n is the weight of the n-gram precision.

Parameters:
  • n_size (int) – N-gram size, which ranges from 1 to 4. Default: 4.

  • weights (Union[list, None]) – Weights for the precision of each n-gram. Default: None.

  • name (str) – Name of the metric.

Raises:
  • ValueError – If n_size is not in the range from 1 to 4.

  • ValueError – If the length of weights is not equal to n_size.

Example

>>> from mindnlp.common.metrics import BleuScore
>>> cand = [["The", "cat", "The", "cat", "on", "the", "mat"]]
>>> ref_list = [[["The", "cat", "is", "on", "the", "mat"],
...              ["There", "is", "a", "cat", "on", "the", "mat"]]]
>>> metric = BleuScore()
>>> metric.update(cand, ref_list)
>>> bleu_score = metric.eval()
>>> print(bleu_score)
0.46713797772820015
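
To illustrate the modified n-gram precision p_n that enters the formula above, the clipped unigram counts for this example can be computed by hand (a sketch of the general idea, not the library code; the full score also combines higher-order precisions and the brevity penalty):

>>> from collections import Counter
>>> cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> refs = [["The", "cat", "is", "on", "the", "mat"],
...         ["There", "is", "a", "cat", "on", "the", "mat"]]
>>> cand_counts = Counter(cand)
>>> # clip each candidate count by the maximum count seen in any single reference
>>> clipped = {w: min(c, max(Counter(r)[w] for r in refs)) for w, c in cand_counts.items()}
>>> print(sum(clipped.values()) / len(cand))
0.7142857142857143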
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the BLEU score.

Returns:

  • bleu_score (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input cand and ref_list.

  • cand (list): A list of tokenized candidate sentences.

  • ref_list (list): A list of lists of tokenized ground truth sentences.

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If the lengths of cand and ref_list are not equal.

mindnlp.metrics.bleu.bleu_fn(cand, ref_list, n_size=4, weights=None)[source]

Calculates the BLEU score. BLEU (bilingual evaluation understudy) is a metric for evaluating the quality of text translated by machine. It uses a modified form of precision to compare a candidate translation against multiple reference translations. The function is shown as follows:

\[ \begin{aligned}BP & = \begin{cases} 1, & \text{if } c > r \\ e^{1-r/c}, & \text{if } c \leq r \end{cases}\\BLEU & = BP \cdot \exp\left(\sum_{n=1}^{N} w_{n} \log p_{n}\right)\end{aligned} \]

where c is the length of the candidate sentence, r is the length of the reference sentence, p_n is the modified n-gram precision, and w_n is the weight of the n-gram precision.

Parameters:
  • cand (list) – A list of tokenized candidate sentences.

  • ref_list (list) – A list of lists of tokenized reference sentences.

  • n_size (int) – N-gram size, which ranges from 1 to 4. Default: 4.

  • weights (Union[list, None]) – Weights for the precision of each n-gram. Default: None.

Returns:

  • bleu_score (float) - The computed result.

Raises:
  • ValueError – If n_size is not in the range from 1 to 4.

  • ValueError – If the lengths of cand and ref_list are not equal.

  • ValueError – If the length of weights is not equal to n_size.

Example

>>> from mindnlp.common.metrics import bleu
>>> cand = [["The", "cat", "The", "cat", "on", "the", "mat"]]
>>> ref_list = [[["The", "cat", "is", "on", "the", "mat"],
...              ["There", "is", "a", "cat", "on", "the", "mat"]]]
>>> bleu_score = bleu(cand, ref_list)
>>> print(bleu_score)
0.46713797772820015
mindnlp.metrics.bleu.count_ngram(input_list, n_gram)[source]

Counts the n-grams of the given size in the input list.

confusion_matrix

Class for Metric ConfusionMatrix

class mindnlp.metrics.confusion_matrix.ConfusionMatrix(class_num=2, name='ConfusionMatrix')[source]

Bases: Metric

Calculates the confusion matrix. The confusion matrix is commonly used to evaluate the performance of classification models, including binary and multi-class classification.

Parameters:
  • class_num (int) – Number of classes in the dataset. Default: 2.

  • name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import ConfusionMatrix
>>> preds = Tensor(np.array([1, 0, 1, 0]))
>>> labels = Tensor(np.array([1, 0, 0, 1]))
>>> metric = ConfusionMatrix()
>>> metric.update(preds, labels)
>>> conf_mat = metric.eval()
>>> print(conf_mat)
[[1. 1.]
 [1. 1.]]
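
The matrix above can be reproduced with a few lines of NumPy; the convention that rows index the ground truth class and columns index the predicted class is an assumption of this sketch, not necessarily the library's layout:

>>> import numpy as np
>>> preds = np.array([1, 0, 1, 0])
>>> labels = np.array([1, 0, 0, 1])
>>> cm = np.zeros((2, 2))
>>> for p, t in zip(preds, labels):
...     cm[t, p] += 1  # count one (truth, prediction) pair per sample
>>> print(cm)
[[1. 1.]
 [1. 1.]]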
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the Confusion Matrix.

Returns:

  • conf_mat (np.ndarray) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, C)\) or \((N,)\).

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. The shape of labels is \((N,)\).

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If preds and labels do not have valid dimensions.

mindnlp.metrics.confusion_matrix.confusion_matrix_fn(preds, labels, class_num=2)[source]

Calculates the confusion matrix. The confusion matrix is commonly used to evaluate the performance of classification models, including binary and multi-class classification.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, C)\) or \((N,)\).

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. The shape of labels is \((N,)\).

  • class_num (int) – Number of classes in the dataset. Default: 2.

Returns:

  • conf_mat (np.ndarray) - The computed result.

Raises:

ValueError – If preds and labels do not have valid dimensions.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import confusion_matrix
>>> preds = Tensor(np.array([1, 0, 1, 0]))
>>> labels = Tensor(np.array([1, 0, 0, 1]))
>>> conf_mat = confusion_matrix(preds, labels)
>>> print(conf_mat)
[[1. 1.]
 [1. 1.]]

distinct

Class for Metric Distinct

class mindnlp.metrics.distinct.Distinct(n_size=2, name='Distinct')[source]

Bases: Metric

Calculates the Distinct-N. Distinct-N is a metric that measures the diversity of a sentence. It focuses on the number of distinct n-grams in a sentence: the larger the number of distinct n-grams, the higher the diversity of the text.

Parameters:
  • n_size (int) – N-gram value. Default: 2.

  • name (str) – Name of the metric.

Example

>>> from mindnlp.common.metrics import Distinct
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> metric = Distinct()
>>> metric.update(cand_list)
>>> distinct_score = metric.eval()
>>> print(distinct_score)
0.8333333333333334
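
The score above is consistent with reading Distinct-2 as the number of distinct bigrams divided by the total number of bigrams (an assumption of this sketch): the sentence has 6 bigrams, 5 of which are distinct.

>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> bigrams = list(zip(cand_list, cand_list[1:]))  # 6 consecutive token pairs
>>> print(len(set(bigrams)) / len(bigrams))
0.8333333333333334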
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the Distinct-N.

Returns:

  • distinct_score (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input cand_list.

  • cand_list (list): A list of tokens from the candidate sentence.

Raises:

ValueError – If the number of inputs is not 1.

mindnlp.metrics.distinct.distinct_fn(cand_list, n_size=2)[source]

Calculates the Distinct-N. Distinct-N is a metric that measures the diversity of a sentence. It focuses on the number of distinct n-grams in a sentence: the larger the number of distinct n-grams, the higher the diversity of the text.

Parameters:
  • cand_list (list) – A list of tokens from the candidate sentence.

  • n_size (int) – N-gram value. Default: 2.

Returns:

  • distinct_score (float) - The computed result.

Example

>>> from mindnlp.common.metrics import distinct
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> distinct_score = distinct(cand_list)
>>> print(distinct_score)
0.8333333333333334

em_score

Class for Metric EmScore

class mindnlp.metrics.em_score.EmScore(name='EmScore')[source]

Bases: Metric

Calculates the exact match (EM) score. This metric measures the percentage of predictions that match any one of the ground truth answers exactly.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import EmScore
>>> preds = "this is the best span"
>>> examples = ["this is a good span", "something irrelevant"]
>>> metric = EmScore()
>>> metric.update(preds, examples)
>>> em_score = metric.eval()
>>> print(em_score)
0.0
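
A minimal sketch of the underlying idea, assuming only whitespace stripping and lowercasing as normalization (real EM implementations often also remove punctuation and articles): the prediction matches none of the ground truth answers, so the score is 0.

>>> pred = "this is the best span"
>>> examples = ["this is a good span", "something irrelevant"]
>>> # exact match against any ground truth answer after light normalization
>>> print(float(any(pred.strip().lower() == ex.strip().lower() for ex in examples)))
0.0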
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the EM score.

Returns:

  • exact_match (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and examples.

  • preds (Union[str, list]): Predicted value.

  • examples (list): Ground truth.

Raises:
  • ValueError – If the number of inputs is not 2.

  • RuntimeError – If preds and examples have different lengths.

mindnlp.metrics.em_score.em_score_fn(preds, examples)[source]

Calculates the exact match (EM) score. This metric measures the percentage of predictions that match any one of the ground truth answers exactly.

Parameters:
  • preds (Union[str, list]) – Predicted value.

  • examples (list) – Ground truth.

Returns:

  • exact_match (float) - The computed result.

Raises:

RuntimeError – If preds and examples have different lengths.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import em_score
>>> preds = "this is the best span"
>>> examples = ["this is a good span", "something irrelevant"]
>>> exact_match = em_score(preds, examples)
>>> print(exact_match)
0.0

f1

Class for Metric F1Score

class mindnlp.metrics.f1.F1Score(name='F1Score')[source]

Bases: Metric

Calculates the F1 score. The Fbeta score is a weighted harmonic mean of precision and recall, and the F1 score is the special case of Fbeta with beta equal to 1. The function is shown as follows:

\[F_1=\frac{2\cdot TP}{2\cdot TP + FN + FP}\]

where TP is the number of true positive cases, FN is the number of false negative cases, and FP is the number of false positive cases.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import F1Score
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = F1Score()
>>> metric.update(preds, labels)
>>> f1_s = metric.eval()
>>> print(f1_s)
[0.6666666666666666 0.6666666666666666]
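
The per-class values above can be cross-checked with plain NumPy, assuming argmax decoding of preds (a minimal sketch, not the library implementation):

>>> import numpy as np
>>> preds = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
>>> labels = np.array([1, 0, 1])
>>> y_pred = preds.argmax(axis=1)
>>> f1 = []
>>> for c in range(preds.shape[1]):
...     tp = np.sum((y_pred == c) & (labels == c))  # true positives for class c
...     fp = np.sum((y_pred == c) & (labels != c))  # false positives for class c
...     fn = np.sum((y_pred != c) & (labels == c))  # false negatives for class c
...     f1.append(float(2 * tp / (2 * tp + fp + fn)))
>>> print(f1)
[0.6666666666666666, 0.6666666666666666]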
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the F1 score.

Returns:

  • f1_s (numpy.ndarray) - The computed result.

Raises:

RuntimeError – If the number of samples is 0.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If the class number of the current predicted data does not match the class number of the previously updated predicted data.

  • ValueError – If preds doesn’t have the same classes number as labels.

mindnlp.metrics.f1.f1_score_fn(preds, labels)[source]

Calculates the F1 score. The Fbeta score is a weighted harmonic mean of precision and recall, and the F1 score is the special case of Fbeta with beta equal to 1. The function is shown as follows:

\[F_1=\frac{2\cdot TP}{2\cdot TP + FN + FP}\]

where TP is the number of true positive cases, FN is the number of false negative cases, and FP is the number of false positive cases.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Returns:

  • f1_s (np.ndarray) - The computed result.

Raises:

ValueError – If preds doesn’t have the same classes number as labels.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import f1_score
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> f1_s = f1_score(preds, labels)
>>> print(f1_s)
[0.6666666666666666 0.6666666666666666]

matthews

Class for Metric MatthewsCorrelation

class mindnlp.metrics.matthews.MatthewsCorrelation(name='MatthewsCorrelation')[source]

Bases: Metric

Calculates the Matthews correlation coefficient (MCC). MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 represents a prediction no better than random, and −1 indicates total disagreement between prediction and observation. The function is shown as follows:

\[MCC=\frac{TP \times TN-FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]

where TP is the number of true positive cases, TN is the number of true negative cases, FN is the number of false negative cases, and FP is the number of false positive cases.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import MatthewsCorrelation
>>> preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
>>> labels = [0, 1, 0, 1, 0]
>>> metric = MatthewsCorrelation()
>>> metric.update(preds, labels)
>>> m_c_c = metric.eval()
>>> print(m_c_c)
0.16666666666666666
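
The value above can be reproduced from the binary confusion counts, assuming the predicted class is the argmax of each row of preds (a minimal sketch, not the library implementation):

>>> import math
>>> preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
>>> labels = [0, 1, 0, 1, 0]
>>> y_pred = [int(p[1] > p[0]) for p in preds]  # argmax over the two classes
>>> tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, labels))
>>> tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, labels))
>>> fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, labels))
>>> fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, labels))
>>> print((tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
0.16666666666666666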
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the MCC.

Returns:

  • m_c_c (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:

ValueError – If the number of inputs is not 2.

mindnlp.metrics.matthews.matthews_correlation_fn(preds, labels)[source]

Calculates the Matthews correlation coefficient (MCC). MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 represents a prediction no better than random, and −1 indicates total disagreement between prediction and observation. The function is shown as follows:

\[MCC=\frac{TP \times TN-FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]

where TP is the number of true positive cases, TN is the number of true negative cases, FN is the number of false negative cases, and FP is the number of false positive cases.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Returns:

  • m_c_c (float) - The computed result.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import matthews_correlation
>>> preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
>>> labels = [0, 1, 0, 1, 0]
>>> m_c_c = matthews_correlation(preds, labels)
>>> print(m_c_c)
0.16666666666666666

pearson

Class for Metric PearsonCorrelation

class mindnlp.metrics.pearson.PearsonCorrelation(name='PearsonCorrelation')[source]

Bases: Metric

Calculates the Pearson correlation coefficient (PCC). PCC is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import PearsonCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = PearsonCorrelation()
>>> metric.update(preds, labels)
>>> p_c_c = metric.eval()
>>> print(p_c_c)
0.9985229081857804
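
The same quantity can be checked against numpy.corrcoef (rounded here so the comparison does not depend on the exact last digits):

>>> import numpy as np
>>> x = np.array([0.1, 1.0, 2.4, 0.9])
>>> y = np.array([0.0, 1.0, 2.9, 1.0])
>>> # the off-diagonal entry of the 2x2 correlation matrix is the PCC
>>> print(round(float(np.corrcoef(x, y)[0, 1]), 4))
0.9985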
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the PCC.

Returns:

  • p_c_c (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, 1)\).

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. labels is a list of floating point numbers and the shape of labels is \((N, 1)\).

Raises:
  • ValueError – If the number of inputs is not 2.

  • RuntimeError – If preds and labels have different lengths.

mindnlp.metrics.pearson.pearson_correlation_fn(preds, labels)[source]

Calculates the Pearson correlation coefficient (PCC). PCC is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, 1)\).

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels is a list of floating point numbers and the shape of labels is \((N, 1)\).

Returns:

  • p_c_c (float) - The computed result.

Raises:

RuntimeError – If preds and labels have different lengths.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import pearson_correlation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> p_c_c = pearson_correlation(preds, labels)
>>> print(p_c_c)
0.9985229081857804

perplexity

Class for Metric Perplexity

class mindnlp.metrics.perplexity.Perplexity(ignore_label=None, name='Perplexity')[source]

Bases: Metric

Calculates the perplexity. Perplexity is a measure of how well a probability model predicts a sample. A low perplexity indicates the model is good at predicting the sample. The function is shown as follows:

\[PP(W)=P(w_{1}w_{2}...w_{N})^{-\frac{1}{N}}=\sqrt[N]{\frac{1}{P(w_{1}w_{2}...w_{N})}}\]

where \(w\) represents the words in the corpus.

Parameters:
  • ignore_label (Union[int, None]) – Index of an invalid label to be ignored when counting. If set to None, it means there’s no invalid label. Default: None.

  • name (str) – Name of the metric.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import Perplexity
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = Perplexity()
>>> metric.update(preds, labels)
>>> ppl = metric.eval()
>>> print(ppl)
2.231443166940565
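
The value above is consistent with the geometric-mean form of the formula, assuming preds[i][labels[i]] is taken as the probability of the true token and no label is ignored (a minimal sketch, not the library implementation):

>>> import math
>>> preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
>>> labels = [1, 0, 1]
>>> # average negative log-probability of the true classes, then exponentiate
>>> log_probs = [math.log(p[t]) for p, t in zip(preds, labels)]
>>> print(round(math.exp(-sum(log_probs) / len(labels)), 6))
2.231443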
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the perplexity.

Returns:

  • ppl (float) - The computed result.

Raises:

RuntimeError – If the sample size is 0.

get_metric_name()[source]

Return the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:
  • ValueError – If the number of inputs is not 2.

  • RuntimeError – If preds and labels have different lengths.

  • RuntimeError – If preds and labels have different shapes.

mindnlp.metrics.perplexity.perplexity_fn(preds, labels, ignore_label=None)[source]

Calculates the perplexity. Perplexity is a measure of how well a probability model predicts a sample. A low perplexity indicates the model is good at predicting the sample. The function is shown as follows:

\[PP(W)=P(w_{1}w_{2}...w_{N})^{-\frac{1}{N}}=\sqrt[N]{\frac{1}{P(w_{1}w_{2}...w_{N})}}\]

where \(w\) represents the words in the corpus.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

  • ignore_label (Union[int, None]) – Index of an invalid label to be ignored when counting. If set to None, it means there’s no invalid label. Default: None.

Returns:

  • ppl (float) - The computed result.

Raises:
  • RuntimeError – If preds and labels have different lengths.

  • RuntimeError – If preds and labels have different shapes.

  • RuntimeError – If the sample size is 0.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import perplexity
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> ppl = perplexity(preds, labels, ignore_label=None)
>>> print(ppl)
2.231443166940565

precision

Class for Metric Precision

class mindnlp.metrics.precision.Precision(name='Precision')[source]

Bases: Metric

Calculates the precision. Precision (also known as positive predictive value) is the proportion of predicted positive samples that are actually positive. It can only be used to evaluate the precision score of binary tasks. The function is shown as follows:

\[\text{Precision} =\frac{\text{TP}} {\text{TP} + \text{FP}}\]

where TP is the number of true positive cases and FP is the number of false positive cases.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import Precision
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Precision()
>>> metric.update(preds, labels)
>>> prec = metric.eval()
>>> print(prec)
[0.5 1. ]
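
The per-class values above can be cross-checked with plain NumPy, assuming argmax decoding of preds (a minimal sketch, not the library implementation): class 0 is predicted twice and correct once (0.5), class 1 is predicted once and correct once (1.0).

>>> import numpy as np
>>> preds = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
>>> labels = np.array([1, 0, 1])
>>> y_pred = preds.argmax(axis=1)
>>> # precision per class: correct predictions of class c / all predictions of class c
>>> print([float(np.sum((y_pred == c) & (labels == c)) / np.sum(y_pred == c))
...        for c in range(preds.shape[1])])
[0.5, 1.0]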
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the precision.

Returns:

  • prec (numpy.ndarray) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables. If the index of the maximum predicted value matches the label, the prediction is counted as correct.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If preds doesn’t have the same classes number as labels.

mindnlp.metrics.precision.precision_fn(preds, labels)[source]

Calculates the precision. Precision (also known as positive predictive value) is the proportion of predicted positive samples that are actually positive. It can only be used to evaluate the precision score of binary tasks. The function is shown as follows:

\[\text{Precision} =\frac{\text{TP}} {\text{TP} + \text{FP}}\]

where TP is the number of true positive cases and FP is the number of false positive cases.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Returns:

  • prec (np.ndarray) - The computed result.

Raises:

ValueError – If preds doesn’t have the same classes number as labels.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import precision
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> prec = precision(preds, labels)
>>> print(prec)
[0.5 1. ]

recall

Class for Metric Recall

class mindnlp.metrics.recall.Recall(name='Recall')[source]

Bases: Metric

Calculates the recall. Recall is also referred to as the true positive rate or sensitivity. The function is shown as follows:

\[\text{Recall} =\frac{\text{TP}} {\text{TP} + \text{FN}}\]

where TP is the number of true positive cases and FN is the number of false negative cases.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import Recall
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Recall()
>>> metric.update(preds, labels)
>>> rec = metric.eval()
>>> print(rec)
[1. 0.5]
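
Analogously to the precision sketch, the per-class recall can be cross-checked with plain NumPy, assuming argmax decoding of preds (a minimal sketch, not the library implementation):

>>> import numpy as np
>>> preds = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
>>> labels = np.array([1, 0, 1])
>>> y_pred = preds.argmax(axis=1)
>>> # recall per class: correct predictions of class c / all samples whose label is c
>>> print([float(np.sum((y_pred == c) & (labels == c)) / np.sum(labels == c))
...        for c in range(preds.shape[1])])
[1.0, 0.5]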
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the recall.

Returns:

  • rec (numpy.ndarray) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Raises:
  • ValueError – If the number of inputs is not 2.

  • ValueError – If preds doesn’t have the same classes number as labels.

mindnlp.metrics.recall.recall_fn(preds, labels)[source]

Calculates the recall. Recall is also referred to as the true positive rate or sensitivity. The function is shown as follows:

\[\text{Recall} =\frac{\text{TP}} {\text{TP} + \text{FN}}\]

where TP is the number of true positive cases and FN is the number of false negative cases.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers in range \([0, 1]\) and the shape of preds is \((N, C)\) in most cases (not strictly), where \(N\) is the number of cases and \(C\) is the number of categories.

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels must be in one-hot format with shape \((N, C)\), or in category index format with shape \((N,)\) that can be transformed to one-hot format.

Returns:

  • rec (np.ndarray) - The computed result.

Raises:

ValueError – If preds doesn’t have the same classes number as labels.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import recall
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> rec = recall(preds, labels)
>>> print(rec)
[1. 0.5]

rouge

Classes for Metrics RougeN and RougeL

class mindnlp.metrics.rouge.RougeL(beta=1.2, name='RougeL')[source]

Bases: Metric

Calculates the ROUGE-L score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-L is calculated based on Longest Common Subsequence (LCS). The function is shown as follows:

\[ \begin{aligned}R_{lcs} &= \frac{LCS(X, Y)}{m}\\P_{lcs} &= \frac{LCS(X, Y)}{n}\\F_{lcs} &= \frac{\left(1+\beta^{2}\right) R_{lcs} P_{lcs}}{R_{lcs}+\beta^{2} P_{lcs}}\end{aligned} \]

where X is the candidate sentence, Y is the reference sentence, m and n are the lengths of X and Y respectively, and LCS(X, Y) is the length of their longest common subsequence.

Parameters:
  • beta (float) – A hyperparameter that decides the weight of recall. Default: 1.2.

  • name (str) – Name of the metric.

Example

>>> from mindnlp.common.metrics import RougeL
>>> cand_list = ["The","cat","The","cat","on","the","mat"]
>>> ref_list = [["The","cat","is","on","the","mat"],
                ["There","is","a","cat","on","the","mat"]]
>>> metric = RougeL()
>>> metric.update(cand_list, ref_list)
>>> rougel_score = metric.eval()
>>> print(rougel_score)
0.7800511508951408
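
The core quantity in the formula above is LCS(X, Y), the length of the longest common subsequence. For this example, the LCS between the candidate and the first reference has length 5 ("The", "cat", "on", "the", "mat"); how the per-reference F scores are aggregated is left to the library. A minimal dynamic-programming sketch:

>>> def lcs_len(x, y):
...     # classic O(len(x) * len(y)) table for the LCS length
...     table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
...     for i, tok_x in enumerate(x, 1):
...         for j, tok_y in enumerate(y, 1):
...             if tok_x == tok_y:
...                 table[i][j] = table[i - 1][j - 1] + 1
...             else:
...                 table[i][j] = max(table[i - 1][j], table[i][j - 1])
...     return table[-1][-1]
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> print(lcs_len(cand_list, ["The", "cat", "is", "on", "the", "mat"]))
5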
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the Rouge-L score.

Returns:

  • rougel_score (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input cand_list and ref_list.

  • cand_list (list): A list of tokens from the candidate sentence.

  • ref_list (list): A list of tokenized reference sentences (each a list of tokens).

Raises:

ValueError – If the number of inputs is not 2.

class mindnlp.metrics.rouge.RougeN(n_size=1, name='RougeN')[source]

Bases: Metric

Calculates the ROUGE-N. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-N refers to the overlap of n-grams between candidates and reference summaries.

Parameters:
  • n_size (int) – N-gram value. Default: 1.

  • name (str) – Name of the metric.

Example

>>> from mindnlp.common.metrics import RougeN
>>> cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
>>> ref_list = [["the", "cat", "was", "under", "the", "bed"]]
>>> metric = RougeN(2)
>>> metric.update(cand_list, ref_list)
>>> rougen_score = metric.eval()
>>> print(rougen_score)
0.8
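
The score above is consistent with computing ROUGE-2 as the clipped bigram overlap divided by the number of reference bigrams (a recall-oriented reading assumed by this sketch): 4 of the 5 reference bigrams also occur in the candidate.

>>> from collections import Counter
>>> cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
>>> ref = ["the", "cat", "was", "under", "the", "bed"]
>>> cand_2grams = Counter(zip(cand_list, cand_list[1:]))
>>> ref_2grams = Counter(zip(ref, ref[1:]))
>>> # clip each overlapping bigram count by its count in the reference
>>> overlap = sum(min(c, ref_2grams[g]) for g, c in cand_2grams.items())
>>> print(overlap / sum(ref_2grams.values()))
0.8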
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the Rouge-N score.

Returns:

  • rougen_score (float) - The computed result.

Raises:

RuntimeError – If the reference size is 0.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input cand_list and ref_list.

  • cand_list (list): A list of tokens from the candidate sentence.

  • ref_list (list): A list of tokenized reference sentences (each a list of tokens).

Raises:

ValueError – If the number of inputs is not 2.

mindnlp.metrics.rouge.rouge_l_fn(cand_list, ref_list, beta=1.2)[source]

Calculates the ROUGE-L score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-L is calculated based on Longest Common Subsequence (LCS). The function is shown as follows:

\[ \begin{aligned}R_{lcs} &= \frac{LCS(X, Y)}{m}\\P_{lcs} &= \frac{LCS(X, Y)}{n}\\F_{lcs} &= \frac{\left(1+\beta^{2}\right) R_{lcs} P_{lcs}}{R_{lcs}+\beta^{2} P_{lcs}}\end{aligned} \]

where X is the candidate sentence, Y is the reference sentence, m and n are the lengths of X and Y respectively, and LCS(X, Y) is the length of their longest common subsequence.

Parameters:
  • cand_list (list) – A list of tokens from the candidate sentence.

  • ref_list (list) – A list of tokenized reference sentences (each a list of tokens).

  • beta (float) – A hyperparameter that decides the weight of recall. Default: 1.2.

Returns:

  • rougel_score (float) - The computed result.

Example

>>> from mindnlp.common.metrics import rouge_l
>>> cand_list = ["The","cat","The","cat","on","the","mat"]
>>> ref_list = [["The","cat","is","on","the","mat"],
                ["There","is","a","cat","on","the","mat"]]
>>> rougel_score = rouge_l(cand_list, ref_list)
>>> print(rougel_score)
0.7800511508951408
mindnlp.metrics.rouge.rouge_n_fn(cand_list, ref_list, n_size=1)[source]

Calculates the ROUGE-N score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-N refers to the overlap of n-grams between candidates and reference summaries.

Parameters:
  • cand_list (list) – A list of tokens from the candidate sentence.

  • ref_list (list) – A list of tokenized reference sentences (each a list of tokens).

  • n_size (int) – N-gram value. Default: 1.

Returns:

  • rougen_score (float) - The computed result.

Raises:

RuntimeError – If the reference size is 0.

Example

>>> from mindnlp.common.metrics import rouge_n
>>> cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
>>> ref_list = [["the", "cat", "was", "under", "the", "bed"]]
>>> rougen_score = rouge_n(cand_list, ref_list, 2)
>>> print(rougen_score)
0.8

spearman

Class for Metric Spearman

class mindnlp.metrics.spearman.SpearmanCorrelation(name='SpearmanCorrelation')[source]

Bases: Metric

Calculates the Spearman’s rank correlation coefficient (SRCC). It is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Parameters:

name (str) – Name of the metric.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import SpearmanCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = SpearmanCorrelation()
>>> metric.update(preds, labels)
>>> s_r_c_c = metric.eval()
>>> print(s_r_c_c)
1.0
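
For tie-free data, the SRCC reduces to the classic rank-difference formula. The sketch below uses hypothetical toy values with no repeated entries (it deliberately avoids the tie handling of the class above, which is implementation-specific):

>>> preds = [0.1, 1.0, 2.4, 0.9]
>>> labels = [0.2, 1.1, 3.0, 0.8]
>>> def ranks(values):
...     order = sorted(range(len(values)), key=lambda i: values[i])
...     r = [0] * len(values)
...     for rank, idx in enumerate(order, 1):
...         r[idx] = rank  # rank 1 for the smallest value, and so on
...     return r
>>> d2 = sum((a - b) ** 2 for a, b in zip(ranks(preds), ranks(labels)))
>>> n = len(preds)
>>> print(1 - 6 * d2 / (n * (n * n - 1)))
1.0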
clear()[source]

Clears the internal evaluation results.

eval()[source]

Computes and returns the SRCC.

Returns:

  • s_r_c_c (float) - The computed result.

get_metric_name()[source]

Returns the name of the metric.

update(*inputs)[source]

Updates local variables.

Parameters:

inputs

Input preds and labels.

  • preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, 1)\).

  • labels (Union[Tensor, list, np.ndarray]): Ground truth. labels is a list of floating point numbers and the shape of labels is \((N, 1)\).

Raises:
  • ValueError – If the number of inputs is not 2.

  • RuntimeError – If preds and labels have different lengths.

mindnlp.metrics.spearman.spearman_correlation_fn(preds, labels)[source]

Calculates the Spearman’s rank correlation coefficient (SRCC). It is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Parameters:
  • preds (Union[Tensor, list, np.ndarray]) – Predicted value. preds is a list of floating point numbers and the shape of preds is \((N, 1)\).

  • labels (Union[Tensor, list, np.ndarray]) – Ground truth. labels is a list of floating point numbers and the shape of labels is \((N, 1)\).

Returns:

  • s_r_c_c (float) - The computed result.

Raises:

RuntimeError – If preds and labels have different lengths.

Example

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.common.metrics import spearman_correlation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> s_r_c_c = spearman_correlation(preds, labels)
>>> print(s_r_c_c)
1.0