Use Trainer¶
In the Quick Start tutorial, we learned how to use the Trainer API to fine-tune a model.
This tutorial gives you a comprehensive description of how to configure the Trainer to train your model for optimal outcomes.
The TrainingArguments and Trainer classes from MindNLP streamline the process of training machine learning models. TrainingArguments allows you to easily configure essential training parameters. Trainer then leverages these configurations to efficiently handle the entire training loop. Together, these tools abstract away much of the complexity of the training task, enabling both novices and experts to effectively optimize their models.
Configure training parameters¶
By creating a TrainingArguments object, you can specify the desired configuration for the training process.
The following code snippet instantiates a TrainingArguments object:
from mindnlp.engine import TrainingArguments

training_args = TrainingArguments(
    output_dir="../../output",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=50
)
Let's break down the code to understand each argument in detail.
Basic parameters¶
output_dir
This parameter specifies the directory where the model's checkpoints and training outputs will be saved.
num_train_epochs
This parameter defines the total number of training cycles through the entire dataset that the model will undergo.
Adjusting the number of training epochs directly impacts how well the model learns from the dataset. A higher number of epochs allows the model to learn more from the training data and achieve better results. However, setting too many epochs can lead to overfitting, where the model performs well on training data but poorly on new, unseen data.
Optimizer parameters¶
TrainingArguments allows you to specify parameters of the optimizer. These include:
optim
This parameter specifies the optimizer to be used for training the model. So far, MindNLP supports AdamW and SGD. You can choose the optimizer by setting this parameter to "adamw" or "sgd". By default, TrainingArguments chooses AdamW.
learning_rate
This parameter sets the initial learning rate for the optimizer, determining the step size at each iteration during loss minimization.
This is one of the first parameters to check if your training process fails to converge properly.
A higher learning rate can converge faster, but if too high, it may overshoot the minimum, causing instability by bouncing around or diverging from the optimal weights. Conversely, a too-low learning rate can lead to slow convergence, potentially getting stuck in local minima.
- Advanced parameters for the optimizer
Using the default values of the following advanced parameters suffices for most training runs. Interested readers and experts can adjust them to achieve better training results (see the sketch after this list).
- `weight_decay`: This parameter helps prevent overfitting by penalizing large weights. Weight decay is a regularization term added to the loss function, which effectively reduces the magnitude of weights in the model.
- `adam_beta1` and `adam_beta2`: These parameters are specific to the AdamW optimizer. `adam_beta1` controls the exponential decay rate for the first moment estimates (similar to the momentum term), and `adam_beta2` controls the exponential decay rate for the second-moment estimates (related to the adaptive learning rate).
- `adam_epsilon`: This is a very small number to prevent any division by zero in the implementation of the Adam optimizer. It's used to improve numerical stability.
- `max_grad_norm`: This is used for gradient clipping, a technique to prevent exploding gradients in deep neural networks. Clipping the gradients at a specified norm helps in stabilizing the training process.
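The following is a minimal sketch of how these advanced parameters might be set on TrainingArguments. The values are purely illustrative, not recommendations:

from mindnlp.engine import TrainingArguments

training_args = TrainingArguments(
    output_dir="../../output",
    optim="adamw",       # or "sgd"
    learning_rate=5e-5,
    weight_decay=0.01,   # regularization strength penalizing large weights
    adam_beta1=0.9,      # decay rate for the first moment estimates
    adam_beta2=0.999,    # decay rate for the second moment estimates
    adam_epsilon=1e-8,   # small constant for numerical stability
    max_grad_norm=1.0,   # gradient clipping threshold
)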
Batch size parameters¶
Parameters related to batch size allow you to control how many examples are processed at a time during training and evaluation phases. Here's a summary of these parameters:
per_device_train_batch_size
This parameter sets the batch size for each training step.
A large batch size can speed up training and make updates more consistent, but it might need more memory and could potentially converge to suboptimal minima.
On the other hand, a smaller batch size requires less memory and might help the model learn better, though it could slow down the training process.
per_device_eval_batch_size
This parameter sets the batch size for each step in the evaluation.
Note: if you have already batched your dataset beforehand, for example by calling dataset.batch(), you should set the batch size in the TrainingArguments to 1, so the Trainer will not batch again on top of your already batched dataset.
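For example, here is a minimal sketch of this pre-batched case, assuming dataset_train and dataset_val are MindSpore datasets as in the Quick Start tutorial:

from mindnlp.engine import TrainingArguments

# The datasets are batched ahead of time ...
dataset_train = dataset_train.batch(16)
dataset_val = dataset_val.batch(16)

# ... so set the batch sizes to 1 to prevent the Trainer from batching again.
training_args = TrainingArguments(
    output_dir="../../output",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
)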
Strategies for evaluation, saving and logging¶
TrainingArguments allows you to define the strategies for evaluation, saving and logging during the training process.
Evaluation strategy¶
The evaluation_strategy parameter determines when the model should be evaluated during the training process. Evaluation is essential for monitoring model performance on a validation dataset, which is normally different from the training dataset.
The evaluation strategy can be:
- "no": No evaluation is performed.
- "steps": Evaluation occurs at specified intervals in terms of training steps.
If the "steps" strategy is chosen, eval_steps
needs to specified to control how many steps of training should occur between each evaluation.
- "epoch": Evaluation happens at the end of each epoch.
Save strategy¶
The save_strategy parameter controls when the model's state should be saved during the training process. Saving is crucial for preserving model checkpoints at different stages of training, which can be useful for recovery or further fine-tuning.
The strategy for saving can be:
- "no": No saving is performed.
- "steps": Saving occurs at specified intervals in terms of training steps.
If the "steps" strategy is chosen, save_steps
needs to be specified to control how many steps of training should occur between each saved checkpoint.
- "epoch": Saving happens at the end of each epoch.
Logging strategy¶
The logging_strategy parameter determines when the model's training metrics should be logged during the training process. Logging is important for tracking progress, understanding model behavior, and diagnosing issues during training.
The strategy for logging can be:
- "no": No logging is performed.
- "steps": Logging occurs at specified intervals in terms of training steps.
If the "steps" strategy is chosen, logging_steps needs to be specified to control how many steps of training should occur between each logging event.
- "epoch": Logging happens at the end of each epoch.
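As an illustrative combination of these strategies, the snippet below evaluates and saves a checkpoint every 500 training steps and logs metrics every 100 steps; the step counts are arbitrary and chosen only for the example:

from mindnlp.engine import TrainingArguments

training_args = TrainingArguments(
    output_dir="../../output",
    evaluation_strategy="steps",
    eval_steps=500,           # evaluate every 500 training steps
    save_strategy="steps",
    save_steps=500,           # save a checkpoint every 500 training steps
    logging_strategy="steps",
    logging_steps=100,        # log metrics every 100 training steps
)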
Create the trainer¶
The Trainer in MindNLP accepts configurations from a TrainingArguments object and handles the entire training loop.
Assuming you have defined model, dataset_train, dataset_val and a compute_metrics function, for example as in the Quick Start tutorial, a Trainer object can be created using the following code:
from mindnlp.engine import Trainer

trainer = Trainer(
    model=model,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    compute_metrics=compute_metrics,
    args=training_args,
)
Here's an explanation of key arguments used to customize training behavior:
- model: Pass the model instance you plan to train. This is the primary object that will undergo the training process.
- args: Your TrainingArguments object that sets up configurations for training.
- train_dataset, eval_dataset: These are the datasets on which the model will be trained or evaluated, respectively. Remember to preprocess the datasets as in the Data Preprocess tutorial.
- compute_metrics: A function that calculates specific performance metrics from the model's predictions. It takes a mindnlp.engine.utils.EvalPrediction object, which contains the predictions and labels, and returns the metric results.
An example compute_metrics function can be defined as follows:
import evaluate
import numpy as np
from mindnlp.engine.utils import EvalPrediction

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred: EvalPrediction):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
Note that at the moment we still need to load the accuracy metric from the Hugging Face evaluate module.
Once the trainer is created, run trainer.train() to start your training process.
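For example:

# Runs the full training loop; evaluation, checkpoint saving and logging
# happen according to the strategies configured in TrainingArguments.
trainer.train()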