5. Advanced Topics & “Pro Tips”

This section contains a collection of advanced techniques and tips for expert users and special cases, such as training with multispectral data and handling class imbalance.

5.1. Advanced Multispectral Training

The process for training a multispectral model is similar to that for RGB data but there are some key steps that are different. Data will be read from .tif files of 4 or more bands instead of the 3-band .png files.

First, ensure your data is tiled with mode="ms" and registered correctly.

The number of bands can be checked with rasterio:

import rasterio
import glob

folder_path = "/path/to/tilesMS/"
img_paths = glob.glob(folder_path + "*.tif")
img_path = img_paths[0]

with rasterio.open(img_path) as dataset:
   num_bands = dataset.count
print(f'The raster has {num_bands} bands.')

Due to the additional bands, the weights of the first convolutional layer (conv1) are modified to accommodate a variable number of input channels. This is automatically done when imgmode is set to "ms". The setup_cfg function also automatically extends cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD to include the additional bands when num_bands is set to a value greater than 3.

from detectree2.models.train import MyTrainer, setup_cfg

trains = ("ParacouMS_train",)
tests = ("ParacouMS_val",)
out_dir = "./ms_outputs"

base_model = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"

# Set up the configuration for multispectral training
cfg = setup_cfg(base_model, trains, tests, workers=2, eval_period=50,
               max_iter=50000, out_dir=out_dir, imgmode="ms",
               num_bands=num_bands)

With additional bands, you may need to reduce the number of images per batch if you encounter memory errors (e.g., CUDA out of memory).

cfg.SOLVER.IMS_PER_BATCH = 1

trainer = MyTrainer(cfg, patience=5)
trainer.resume_or_load(resume=False)
trainer.train()

5.1.1. Pro Tip: Selective Band Usage

You may want to experiment with using only a subset of your available spectral bands without creating new image files. This can be achieved by creating a custom data mapper that reads only specific bands from your .tif files.

First, define a list of the 1-based band indices you wish to use. Then, define a custom FlexibleDatasetMapper that incorporates this logic, and pass it to the training loader.

import detectree2.models.train as t
import detectron2.data.transforms as T
import rasterio
import torch
import numpy as np

# Define which bands to read (1-based indices)
only_read_bands = [4, 5, 6, 7]

# You must update num_bands in the cfg to match the number of selected bands
cfg.INPUT.NUM_IN_CHANNELS = len(only_read_bands)

# Create a custom mapper class to read only specific bands
class CustomBandMapper(t.FlexibleDatasetMapper):
    def __call__(self, dataset_dict):
        try:
            with rasterio.open(dataset_dict["file_name"]) as src:
                # Read only the specified bands
                img = src.read(indexes=only_read_bands)

            # Transpose to (H, W, C)
            img = np.transpose(img, (1, 2, 0)).astype("float32")

            aug_input = T.AugInput(img)
            transforms = self.augmentations(aug_input)
            img = aug_input.image
            dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(img.transpose(2, 0, 1)))

            if "annotations" in dataset_dict:
                self._transform_annotations(dataset_dict, transforms, img.shape[:2])

            return dataset_dict
        except Exception as e:
            print(f"Error processing {dataset_dict.get('file_name', 'unknown')}: {e}")
            return None

# Override the default train loader with one that uses the custom mapper
t.FlexibleDatasetMapper = CustomBandMapper

# Now, when you run trainer.train(), it will use only the bands specified.

5.1.2. Pro Tip: Advanced Weight Initialization

The default method for adapting a 3-channel (RGB) pre-trained model to more input channels is to repeat the weights of the first three channels. The detectree2 library provides a utility function to perform this weight adaptation.

For developers who need to adapt an existing model to a different number of input bands (e.g., for 4-band imagery), the multiply_conv1_weights function located in detectree2.models.train automatically copies weights of an existing model round-robin style. Without the call to this method, the model’s weights would be initialized randomly across the whole input convolution layer.

5.2. Advanced Multi-Class Techniques

5.2.1. Handling Class Imbalance with Federated Loss

Real-world datasets for multi-class problems are often imbalanced. To counteract this, you can enable Federated Loss, a technique that gives more weight to rare classes during training. You can enable this by setting a few parameters on the cfg object after it has been created.

# After creating the cfg object with setup_cfg(...)

# Enable Federated Loss
cfg.MODEL.ROI_BOX_HEAD.USE_FED_LOSS = True
cfg.MODEL.ROI_BOX_HEAD.USE_SIGMOID_CE = True

# This power parameter controls how much to weight the rare classes
cfg.MODEL.ROI_BOX_HEAD.FED_LOSS_FREQ_WEIGHT_POWER = 0.7

# You can also set the number of classes for the federated loss
from detectree2.preprocessing.tiling import get_class_distribution
class_distribution = get_class_distribution(tiles_paths[0], 5)
cfg.MODEL.ROI_BOX_HEAD.FED_LOSS_NUM_CLASSES = int(len(class_distribution) * 0.75)

5.2.2. Pro Tip: Transfer Learning with a Different Number of Classes

A common scenario is fine-tuning a model that was pre-trained on a different number of classes than your new dataset. The resume_or_load method is designed to handle this automatically.

If a mismatch in the number of classes is detected between the checkpoint and the new model, Detectron2 will adapt the architecture by randomly initializing the affected parts of the model. This provides a reasonable starting point for the new classes and allows training to proceed without crashing.

5.3. Advanced Fine-Tuning

5.3.1. Pro Tip: Advanced Fine-grained Layer Freezing

When you load a pre-trained model, you may not want to retrain the entire network, especially if your own dataset is small. A powerful technique is to “freeze” parts of the network, making their weights non-trainable, and only fine-tune the higher-level layers. Here’s how you can apply this technique after creating the trainer object and before calling trainer.train():

trainer = MyTrainer(cfg, patience=10)
trainer.resume_or_load(resume=False)

# --- Advanced: Freeze layers of the pre-trained backbone ---

print("Applying custom layer freezing...")

# Freeze the initial convolutional stem
trainer.model.backbone.bottom_up.stem.freeze()

# Freeze the blocks within the first residual stage (res2)
for block in trainer.model.backbone.bottom_up.stages[0].children():
    block.freeze()

print("Starting training with custom frozen layers.")
trainer.train()

Why is this useful?

  • Prevents Overfitting: On small datasets, allowing the full network to train can cause it to “forget” the powerful general features it learned and instead memorize your small dataset. Freezing the early layers prevents this.

  • Faster Training: With fewer trainable parameters, each training iteration is faster.

  • Experimentation: It gives you a crucial tool for experimentation. If your new images are very different from the original training data, you might only freeze the stem. If they are very similar, you might freeze everything up to res4.