3. In-Depth Guide: Model Training & Evaluation
This guide covers the model training process in detail, including dataset registration, configuration for single and multi-class problems, and how to evaluate model performance.
3.1. The Training Workflow
1. Registering Datasets
Before training can commence, you must register your tiled training and validation datasets with Detectron2.
For single-class datasets:
from detectree2.models.train import register_train_data

# val_fold selects which fold of the tiled data is held out for validation
train_location = "/path/to/Danum/tiles/train/"
register_train_data(train_location, 'Danum', val_fold=5)
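If you plan to train across multiple sites at once (the configuration example further below uses both Danum and Paracou), register each site's tiles separately under its own name. The Paracou path here is illustrative.
# Register a second site under its own name (path is illustrative)
register_train_data("/path/to/Paracou/tiles/train/", "Paracou", val_fold=5)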
For multi-class datasets, you must also provide the path to the class_to_idx.json
file you created during data preparation. This ensures the model knows about all possible classes.
from detectree2.models.train import register_train_data
train_dir = "/path/to/Danum_lianas/tiles/train"
class_mapping_file = "/path/to/Danum_lianas/tiles/class_to_idx.json"
data_name = "DanumLiana"
register_train_data(train_dir, data_name, val_fold=5, class_mapping_file=class_mapping_file)
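To confirm what the mapping contains, you can load the file directly. It is assumed here to map each class name to an integer index, as created during data preparation; the class names in the comment are purely illustrative.
import json

with open(class_mapping_file) as f:
    print(json.load(f))  # e.g. {"tree": 0, "liana": 1} (illustrative)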
The data will be registered as <name>_train and <name>_val (e.g., Danum_train and Danum_val).
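As an optional sanity check, you can list the registered datasets through Detectron2's DatasetCatalog; the name filter below is illustrative.
from detectron2.data import DatasetCatalog

# Should include e.g. 'Danum_train' and 'Danum_val' if registration succeeded
print([name for name in DatasetCatalog.list() if name.startswith("Danum")])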
2. Configuring the Model
We must supply a base_model from Detectron2's model_zoo. This loads a backbone that has been pre-trained, which saves time and improves performance.
For single-class training:
from detectree2.models.train import setup_cfg
# Set the base (pre-trained) model from the detectron2 model_zoo
base_model = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"
trains = ("Danum_train", "Paracou_train") # Registered train data
tests = ("Danum_val", "Paracou_val") # Registered validation data
out_dir = "./train_outputs"
cfg = setup_cfg(base_model, trains, tests, workers=4, eval_period=100, max_iter=3000, out_dir=out_dir)
Alternatively, it is possible to train from one of detectree2's pre-trained models. This is recommended if you have limited training data.
# Download a pre-trained model
# !wget https://zenodo.org/records/15863800/files/250312_flexi.pth
trained_model = "./250312_flexi.pth"
cfg = setup_cfg(base_model, trains, tests, trained_model, workers=4, eval_period=100, max_iter=3000, out_dir=out_dir)
For multi-class training, you must pass the class_mapping_file to the configuration setup. This automatically registers the correct number of classes with the model.
cfg = setup_cfg(
base_model="COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml",
trains=("DanumLiana_train",),
tests=("DanumLiana_val",),
max_iter=50000,
eval_period=50,
base_lr=0.003,
out_dir="./liana_outputs",
class_mapping_file=class_mapping_file
)
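A quick way to confirm the classes were picked up is to inspect the returned configuration; the number of classes should match the number of entries in class_to_idx.json.
# cfg is the configuration returned by setup_cfg above
print(cfg.MODEL.ROI_HEADS.NUM_CLASSES)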
3. Running the Trainer
Once configured, you can start training. The trainer includes “early stopping” via the patience parameter, which stops training if the validation metric does not improve over a set number of consecutive evaluation periods.
from detectree2.models.train import MyTrainer

trainer = MyTrainer(cfg, patience=5)  # stop early after 5 evaluation periods without improvement
trainer.resume_or_load(resume=False)  # start from the weights specified in cfg rather than resuming a previous run
trainer.train()
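The trainer writes its outputs to out_dir (cfg.OUTPUT_DIR): Detectron2's trainer typically saves model checkpoints (e.g. model_final.pth) and the metrics.json log used in the next section there. A minimal check, assuming training has completed:
import os

print(sorted(os.listdir(cfg.OUTPUT_DIR)))  # expect checkpoint .pth files and metrics.json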
3.2. Data Augmentation
Data augmentation artificially increases the size of the training dataset by applying random transformations to the input data, which helps improve model generalization.
By default, random rotations and flips will be performed on input images.
from detectron2.data import transforms as T

# Default augmentations: random rotations and flips
augmentations = [
    T.RandomRotation(angle=[0, 360], expand=False),
    T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
]
If the input data is RGB, additional augmentations will be applied to adjust the brightness, contrast, saturation, and lighting of the images.
# Additional augmentations for RGB images
if cfg.IMGMODE == "rgb":
    augmentations.extend([
        T.RandomBrightness(0.7, 1.5),
        T.RandomLighting(0.7),
        T.RandomContrast(0.6, 1.3),
        T.RandomSaturation(0.8, 1.4)
    ])
There are three resizing modes for the input data: fixed, random, and rand_fixed, set in the setup_cfg function (a configuration sketch follows this list).
fixed: Resizes images to a fixed width/height (e.g., 1000 pixels). Efficient but less flexible.
random: Randomly resizes images between 0.6x and 1.4x their original size. Helps the model learn to detect objects at different scales.
rand_fixed: Randomly resizes images but constrains them to a fixed pixel range (e.g., 600-1400 pixels). A good compromise between flexibility and memory usage.
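As a sketch, selecting the rand_fixed mode might look like the call below. The exact argument name may differ between versions; it is assumed here that setup_cfg accepts a resize keyword taking one of the three values above, so check your installed version's signature.
# 'resize' keyword is assumed; other arguments follow the earlier single-class example
cfg = setup_cfg(
    base_model,
    trains,
    tests,
    workers=4,
    eval_period=100,
    max_iter=3000,
    resize="rand_fixed",
    out_dir=out_dir,
)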
3.3. Post-Training Analysis
It is important to check that the model has converged and is not overfitting. You can do this by plotting the training and validation loss from the metrics.json file output by the trainer.
import json
import matplotlib.pyplot as plt
from detectree2.models.train import load_json_arr
experiment_folder = "./train_outputs"
experiment_metrics = load_json_arr(experiment_folder + '/metrics.json')
plt.plot(
[x['iteration'] for x in experiment_metrics if 'validation_loss' in x],
[x['validation_loss'] for x in experiment_metrics if 'validation_loss' in x], label='Total Validation Loss', color='red')
plt.plot(
[x['iteration'] for x in experiment_metrics if 'total_loss' in x],
[x['total_loss'] for x in experiment_metrics if 'total_loss' in x], label='Total Training Loss')
plt.legend(loc='upper right')
plt.title('Comparison of the training and validation loss of detectree2')
plt.ylabel('Total Loss')
plt.xlabel('Number of Iterations')
plt.show()

To understand how segmentation performance improves, you can also plot the AP50 score over iterations.
plt.plot(
[x['iteration'] for x in experiment_metrics if 'bbox/AP50' in x],
[x['bbox/AP50'] for x in experiment_metrics if 'bbox/AP50' in x], label='Validation AP50')
plt.legend(loc='lower right')
plt.title('Validation AP50 over training iterations')
plt.ylabel('AP50')
plt.xlabel('Number of Iterations')
plt.show()

3.4. Evaluating Model Performance
Coming soon! See the Colab notebook for an example routine (detectree2/notebooks/colab/evaluationJB.ipynb).
Performance Metrics Explained
In instance segmentation, AP50 refers to the Average Precision at an Intersection over Union (IoU) threshold of 50%.
IoU (Intersection over Union): IoU measures the overlap between the predicted segmentation mask and the ground truth mask. It is calculated as the area of overlap divided by the area of union.
AP50: A predicted object is considered a true positive if its IoU with a ground truth mask is >= 0.5 (50%). AP50 is the average precision calculated at this 50% threshold. It is a standard metric for evaluating how well a model detects objects.
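To make the IoU definition concrete, here is a minimal sketch that computes IoU for two small binary masks with NumPy; the masks and values are purely illustrative.
import numpy as np

# Toy 4x4 masks: True = pixel belongs to the crown, False = background
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]], dtype=bool)
truth = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=bool)

intersection = np.logical_and(pred, truth).sum()  # pixels in both masks
union = np.logical_or(pred, truth).sum()          # pixels in either mask
iou = intersection / union
print(iou)  # 2 / 6 ≈ 0.33, below the 0.5 threshold, so not a true positive under AP50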
