Training and Deploying Object Detection with YOLO

15. Training and Deploying Object Detection with YOLO

15.1. Overview

In this lesson, we will train an object detection model using YOLOv11. You can choose augmentations, batch size, image resolution, and other parameters to suit your system’s capabilities and available runtime. The dataset is already provided in YOLO format and will be used to train and evaluate the model.

15.2. Learning Objectives

By the end of this section, you will:

Understand the YOLO format and how to train a custom object detection model using YOLOv11.
Experiment with different augmentations and hyperparameters for object detection.
Evaluate the model’s performance and visualize the results.

15.3. Downloading the Dataset

The dataset for this lesson is already formatted in YOLO format. You can load it directly for training and evaluation. Ensure you have the dataset uploaded before proceeding.

15.4. Preparing the Environment

Let’s first install the required libraries and set up the environment to train our YOLOv11 model. Make sure you are in a GPU runtime by running the cell below; it should print the GPU your session is currently connected to.

!nvidia-smi

# Install the required dependencies
!pip install ultralytics

# Import required libraries
import os
import zipfile

from ultralytics import YOLO

15.5. Loading the Dataset

To download the dataset used in this tutorial, visit this link. Make sure to select “YOLOV11” as the format and choose the zip download option. Once the zip file is downloaded to your computer, upload it to your Colab runtime environment. You can do this by clicking the folder icon on the left sidebar and uploading the file there.

If you are using a different dataset or format, ensure it is structured in the YOLO format. For this tutorial, we assume that the dataset is organized into train/, val/, and test/ directories.
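
For reference, a YOLO-format label file has one line per object: a class index followed by the box center, width, and height, all normalized to the 0-1 range. The sketch below converts one such line to pixel corner coordinates. The function name and the sample values are illustrative only and are not part of the dataset or of Ultralytics.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert 'class xc yc w h' (normalized) to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2


# Hypothetical label line: class 0, centered box covering half of each dimension
print(yolo_line_to_pixels("0 0.5 0.5 0.5 0.5", img_w=640, img_h=480))
# (0, 160.0, 120.0, 480.0, 360.0)
```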

# Set paths to the dataset
# Replace with the path to your zip file, which you can copy by right-clicking
# on it in the file browser.

zip_file_path = '/content/ClassPlastics.v1i.yolov11.zip'

# Unzip the file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall('/content/Dataset')

# Set the dataset path
dataset_path = '/content/Dataset'

# Verify the dataset path
print(f"Dataset path is set to: {dataset_path}")
print(f"Files in dataset path: {os.listdir(dataset_path)}")

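# Set train and val paths
# Note: some exports (Roboflow, for example) name the validation folder 'valid'
# rather than 'val'; check the os.listdir() output above and adjust if needed.
train_path = os.path.join(dataset_path, 'train')
val_path = os.path.join(dataset_path, 'val')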
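
Before training, it can help to confirm that the extracted folder really has the expected layout. The sketch below is a hypothetical helper (not part of Ultralytics) that assumes each split contains an images/ and a labels/ subfolder, and reports how many images have a matching label file. An image without a label file is not necessarily an error, since YOLO treats it as a background image.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_split(split_dir):
    """Count images and how many of them have a matching .txt label file."""
    images_dir = os.path.join(split_dir, "images")
    labels_dir = os.path.join(split_dir, "labels")
    if not os.path.isdir(images_dir):
        return None  # this split is not present
    images = [f for f in os.listdir(images_dir)
              if os.path.splitext(f)[1].lower() in IMAGE_EXTS]
    labeled = sum(
        os.path.isfile(os.path.join(labels_dir, os.path.splitext(f)[0] + ".txt"))
        for f in images
    )
    return {"images": len(images), "labeled": labeled}


def summarize_dataset(root, splits=("train", "val", "valid", "test")):
    """Return a summary for each split folder that exists under root."""
    summary = {}
    for split in splits:
        info = summarize_split(os.path.join(root, split))
        if info is not None:
            summary[split] = info
    return summary


# Only runs if the dataset from this lesson has been extracted
if os.path.isdir("/content/Dataset"):
    print(summarize_dataset("/content/Dataset"))
```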

15.6. Initializing TensorBoard Before Training

TensorBoard is a visualization tool that shows your model’s training progress in real time. Starting it before training lets you monitor key metrics such as the loss components, mAP, precision, recall, and learning rate as they are logged, so you can spot problems like overfitting or underfitting early and adjust. Be sure to click the refresh button in the top right of TensorBoard often!

%load_ext tensorboard
%tensorboard --logdir /content/runs/detect/train

15.7. Training the YOLOv11 Model

Key Training Parameters: imgsz, batch, and epochs

imgsz (Image Size): This parameter defines the target size to which all training images are resized. A standard value is 640 pixels, but adjusting this can impact model accuracy and computational load. Larger sizes may improve accuracy but require more resources, while smaller sizes can speed up training at the cost of precision.

batch (Batch Size): This determines the number of images processed together in each training step. A batch that is too large can exhaust GPU memory, while one that is too small can make training noisy and slow. YOLOv11 lets you set a specific integer (e.g., batch=16), use auto mode for 60% GPU memory utilization (batch=-1), or specify a utilization fraction (batch=0.70).

epochs: This defines the number of complete passes through the training dataset. Choosing the right number of epochs is vital; too few may lead to underfitting, while too many can cause overfitting. Monitoring performance metrics during training can help determine the optimal number of epochs.
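
To see how batch and epochs interact, the short sketch below computes how many optimizer steps a run takes for a fixed integer batch size. The dataset size of 1,000 images is a made-up figure for illustration, not the size of the dataset used in this lesson.

```python
import math


def training_steps(num_images, batch, epochs):
    """Return (steps per epoch, total steps) for a fixed integer batch size."""
    steps_per_epoch = math.ceil(num_images / batch)
    return steps_per_epoch, steps_per_epoch * epochs


# Hypothetical dataset of 1,000 training images
for batch in (16, 64):
    per_epoch, total = training_steps(num_images=1000, batch=batch, epochs=100)
    print(f"batch={batch}: {per_epoch} steps/epoch, {total} total")
# batch=16: 63 steps/epoch, 6300 total
# batch=64: 16 steps/epoch, 1600 total
```

A larger batch means fewer, larger steps per epoch, which is why it usually finishes an epoch faster when it fits in GPU memory.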

from ultralytics import YOLO

# Load the YOLOv11 model (pretrained on COCO dataset)
model = YOLO("yolo11n.pt")

# Path to the dataset configuration YAML file
dataset_config = '/content/Dataset/data.yaml'  # Path to the YAML file

# Train the model
results = model.train(
    data=dataset_config,  # Path to the YAML file
    epochs=100,
    batch=64,  # Set a valid batch size (adjust as needed)
    imgsz=640,  # Image size for training
    plots=True,
    patience=50
)

# Optionally, you can print the results after training to inspect
print(results)
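
The overview mentions choosing augmentations. In Ultralytics these are passed as extra keyword arguments to model.train(). The sketch below collects a few of them in a dictionary. The values are illustrative assumptions rather than tuned recommendations, and the helper name train_with_augmentations is hypothetical.

```python
# Augmentation settings (argument names from the Ultralytics train config;
# the values are illustrative, not tuned for this dataset)
augmentations = {
    "hsv_h": 0.015,   # hue jitter (fraction)
    "hsv_s": 0.7,     # saturation jitter (fraction)
    "hsv_v": 0.4,     # brightness jitter (fraction)
    "degrees": 10.0,  # random rotation range in degrees
    "fliplr": 0.5,    # probability of a horizontal flip
    "mosaic": 1.0,    # probability of mosaic augmentation
}


def train_with_augmentations(data_yaml, epochs=100, batch=16, imgsz=640):
    """Hypothetical helper: train yolo11n with the settings above."""
    # Imported here so the dictionary can be inspected without Ultralytics installed
    from ultralytics import YOLO

    model = YOLO("yolo11n.pt")
    return model.train(data=data_yaml, epochs=epochs, batch=batch,
                       imgsz=imgsz, **augmentations)


# Example call (not run here):
# results = train_with_augmentations('/content/Dataset/data.yaml')
```

Changing one augmentation at a time and comparing validation mAP makes it easier to tell which setting actually helped.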

15.8. Loss

While you wait for your model to train, take note of the loss values. In YOLOv11, the loss function comprises three primary components: box loss, class loss, and Distribution Focal Loss (DFL). Each plays a distinct role in training the model effectively.

15.8.1. Box Loss

Box loss is responsible for optimizing the localization accuracy of predicted bounding boxes. It measures the discrepancy between the predicted boxes and the ground truth annotations. YOLOv11 employs the Complete Intersection over Union (CIoU) loss for this purpose, which considers:

Overlap Area: The intersection over union between the predicted and ground truth boxes.
Distance Between Centers: How far apart the centers of the two boxes are.
Aspect Ratio Consistency: Differences in the width and height ratios of the boxes.

By integrating these factors, CIoU provides a comprehensive measure for bounding box regression, leading to more precise localization.
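
The three CIoU terms can be written out in a few lines. The following is a minimal pure-Python sketch of the standard CIoU formula for a single pair of boxes in (x1, y1, x2, y2) form. It is meant to show how the terms combine and is not the batched Ultralytics implementation.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-7):
    """CIoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Overlap area: plain IoU
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Distance between centers, normalized by the enclosing box diagonal
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4

    # Aspect ratio consistency
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou


truth = (10, 10, 50, 50)
print(ciou_loss(truth, truth))               # close to 0 for a perfect match
print(ciou_loss((60, 60, 80, 100), truth))   # above 1 for a disjoint, differently shaped box
```

Unlike plain IoU, the loss keeps changing when the boxes do not overlap at all, because the center-distance term still tells the model which way to move the box.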

15.8.2. Class Loss

Class loss ensures that the model accurately classifies detected objects into their respective categories. In the Ultralytics implementation it is a binary cross-entropy (BCE) loss applied to each class score independently, which measures the difference between the predicted class probabilities and the target labels. Minimizing this loss helps the model improve its classification performance.
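
As a minimal sketch of a classification loss of this kind, the code below assumes one sigmoid output per class and computes binary cross-entropy against a one-hot target. The real training loss uses soft targets and a different normalization, so treat the numbers as illustrative only.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, each class scored independently."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)


target = [0.0, 1.0, 0.0]  # true class is index 1
print(bce_with_logits([-4.0, 4.0, -4.0], target))  # confident and correct: small loss
print(bce_with_logits([4.0, -4.0, -4.0], target))  # confident and wrong: large loss
```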

15.8.3. Distribution Focal Loss (DFL)

DFL refines bounding box regression rather than classification. Instead of predicting each box side as a single number, the model predicts a probability distribution over a set of discrete offsets for each of the four sides, and DFL pushes that distribution toward the two offsets nearest the true value. This helps the model place box edges more precisely, particularly when object boundaries are blurry, occluded, or otherwise ambiguous.

Each of these loss components contributes to the overall training objective by addressing a different aspect of the object detection task: box overlap, classification, and fine-grained edge placement. Balancing these losses appropriately is crucial for achieving optimal model performance; in Ultralytics, their relative weights are set by the box, cls, and dfl training arguments.
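
In the Generalized Focal Loss formulation that DFL comes from, each box side is predicted as a distribution over discrete offset bins, and the loss is a cross-entropy on the two bins that bracket the true offset. The sketch below shows that calculation for a single side. The bin count and logit values are made up for illustration.

```python
import math


def dfl_loss(logits, target):
    """Distribution Focal Loss for one box side.

    logits: raw scores over discrete offset bins 0..len(logits)-1
    target: true offset (float) with 0 <= target < len(logits) - 1
    """
    # Softmax turns the bin scores into a probability distribution
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    probs = [e / s for e in exps]

    # The two bins that bracket the target, weighted by proximity
    left = int(math.floor(target))
    right = left + 1
    w_left = right - target
    w_right = target - left

    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


target = 2.3  # true offset lies between bins 2 and 3
peaked = [0.0, 0.0, 5.0, 4.0, 0.0, 0.0]  # mass concentrated near the target
flat = [0.0] * 6                          # no preference among bins
print(dfl_loss(peaked, target), dfl_loss(flat, target))
```

The peaked distribution gives the lower loss, since most of its probability sits on the two bins nearest the true offset.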

15.9. Fitting

Monitoring loss metrics is crucial for assessing model performance and identifying signs of overfitting or underfitting. In YOLOv11, steadily decreasing box loss, class loss, and Distribution Focal Loss (DFL) on both the training and validation sets indicates effective learning. If the losses plateau, showing no significant improvement over successive epochs, the model may have converged, or it may be overfitting or underfitting; comparing the training and validation curves tells you which.

15.9.1. Overfitting

Overfitting occurs when the model performs well on training data but poorly on validation data, indicating it has memorized the training examples rather than generalizing from them. This is often observed when training loss continues to decrease while validation loss starts to increase. To mitigate overfitting, techniques such as early stopping can be employed. In YOLOv11, you can set the patience parameter in your training configuration to specify the number of epochs to wait for an improvement in validation metrics before stopping training. For example, setting patience=5 will halt training if there’s no improvement in validation metrics for five consecutive epochs.

15.9.2. Underfitting

Underfitting is characterized by poor performance on both training and validation datasets, suggesting the model is too simplistic to capture the underlying patterns in the data. This can be identified when both training and validation losses are high and show minimal improvement. To address underfitting, consider increasing the model’s complexity, providing more training data, or adjusting hyperparameters to better capture the data’s intricacies.

By closely monitoring these loss metrics and implementing strategies like early stopping with an appropriate patience parameter, you can ensure efficient training, prevent overfitting, and achieve optimal model performance.
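
The patience rule itself is simple to express. The sketch below is a simplified illustration of early stopping on a list of per-epoch validation scores. It is not the Ultralytics implementation, and the scores are invented for the example.

```python
def early_stop_epoch(val_scores, patience):
    """Return the 1-based epoch at which training stops, or None if it never does.

    val_scores: one validation score per epoch, higher is better (e.g. mAP)
    patience:   epochs to wait without a new best score before stopping
    """
    best = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None


# Hypothetical validation mAP per epoch: improves, then plateaus
scores = [0.30, 0.42, 0.51, 0.50, 0.49, 0.51, 0.48, 0.47]
print(early_stop_epoch(scores, patience=5))  # 8
print(early_stop_epoch(scores, patience=3))  # 6
```

A small patience stops sooner but risks quitting during a temporary plateau, while a large one spends more epochs before giving up.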

15.9.3. Evaluating the Model

After training, we evaluate the model on the validation data using metrics such as mean Average Precision (mAP). YOLOv11 runs validation automatically at the end of training. However, if you stopped early or have another reason to validate an already trained model, you can do so with the val mode. The cell below runs val, then runs predict on the test images so you can inspect the detections visually.

import os
from ultralytics import YOLO

model_path = '/content/runs/detect/train/weights/best.pt'
model = YOLO(model_path)

# Run val mode on the validation split defined in data.yaml
metrics = model.val(data='/content/Dataset/data.yaml')
print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"mAP50:    {metrics.box.map50:.3f}")

# Run predict mode on the test images and save annotated copies
test_images_dir = '/content/Dataset/test/images'
results = model.predict(source=test_images_dir, save=True, save_txt=True)

For now, we will assume that your model trained for its full number of epochs, so we will display the graphs directly from its train directory.

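from IPython.display import Image, display
import os

# Set the base directory
base_dir = "/content/runs/detect/train/"

# List of filenames to display.
# Plot names can differ between Ultralytics versions (for example, some
# releases prefix the curve files with "Box"), so adjust this list to match
# the contents of base_dir if a file is reported missing.
filenames = [
    "labels.jpg",
    "F1_curve.png",
    "PR_curve.png",
    "P_curve.png",
    "R_curve.png",
    "confusion_matrix.png",
    "confusion_matrix_normalized.png"
]

# Display each image, skipping any that were not generated
for filename in filenames:
    image_path = os.path.join(base_dir, filename)
    if os.path.exists(image_path):
        display(Image(image_path))
    else:
        print(f"Not found, skipping: {image_path}")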
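
The P, R, and F1 curves plot precision, recall, and their harmonic mean against the confidence threshold. As a reminder of how these quantities relate, the sketch below computes them from true positive, false positive, and false negative counts. The counts are hypothetical and only illustrate the typical trade-off.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts at two confidence thresholds
print(precision_recall_f1(tp=80, fp=40, fn=20))  # low threshold: more detections, more false positives
print(precision_recall_f1(tp=60, fp=5, fn=40))   # high threshold: fewer, cleaner detections
```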

15.10. Inference

After training your model and evaluating its performance, the next step is to run inference on a video to assess its real-world applicability. Within this code cell, you can adjust two inference parameters: confidence threshold (conf) and Intersection over Union threshold (iou).

15.10.1. Confidence Threshold (conf)

conf (default: 0.25): This parameter sets the minimum confidence score for detections. Detections scoring below this threshold are discarded. Raising it reduces false positives at the cost of missing some true objects; lowering it does the opposite.

15.10.2. Intersection over Union (IoU)

iou (default: 0.7): This parameter sets the IoU threshold for Non-Maximum Suppression (NMS). When two boxes predicted for the same class overlap by more than this threshold, only the higher-scoring one is kept. Lower values therefore suppress more overlapping boxes and yield fewer detections, which is useful for reducing duplicates.
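
To make the effect of the two thresholds concrete, here is a minimal, class-agnostic sketch of a confidence cut followed by greedy NMS. It is a simplified stand-in for what the library does internally, and the boxes and scores are invented for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf=0.25, iou_thres=0.7):
    """dets: list of (box, score). Apply the conf cut, then greedy NMS."""
    candidates = sorted((d for d in dets if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap any kept box too strongly
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: two near-duplicates, one weak box, one separate object
dets = [
    ((10, 10, 110, 110), 0.90),
    ((15, 15, 115, 115), 0.80),    # overlaps the first box (IoU about 0.82)
    ((200, 200, 260, 260), 0.10),  # below the confidence threshold
    ((300, 50, 380, 130), 0.60),
]
print(len(filter_detections(dets, conf=0.25, iou_thres=0.7)))  # 2
print(len(filter_detections(dets, conf=0.25, iou_thres=0.9)))  # 3
```

With iou_thres=0.7 the near-duplicate is suppressed, while at 0.9 it survives, which is the "lower values give fewer detections" behavior described above.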

Use the following code block to download a video to test your new model on:

!wget "https://huggingface.co/datasets/OceanCV/PlasticTank_Video/resolve/main/tankvid.mp4?download=true" -O tankvid.mp4

from ultralytics import YOLO

model_path = '/content/runs/detect/train/weights/best.pt'
model = YOLO(model_path)

video_path = 'tankvid.mp4'

# save=True writes an annotated copy of the video under runs/detect/
results = model.predict(source=video_path, conf=0.25, iou=0.7, save=True)
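
For a video, predict returns one result per frame. One quick way to summarize them is to tally detections by class name, as in the sketch below. It assumes each result exposes boxes.cls (class indices) and names (an index-to-name mapping), as Ultralytics Results objects do; the helper name count_detections is hypothetical.

```python
from collections import Counter


def count_detections(results):
    """Tally detections per class name across all frames."""
    counts = Counter()
    for result in results:
        for class_id in result.boxes.cls:
            counts[result.names[int(class_id)]] += 1
    return counts


# With the `results` list returned by model.predict above:
# print(count_detections(results))
```

Re-running the prediction with different conf and iou values and comparing these tallies gives a rough feel for how sensitive the detections are to each threshold.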

15.10.3. Reflecting on Results

Now that you’ve trained and evaluated your model, reflect on the following questions:

How might the parameters affect the model’s performance, and how would you design an experiment to find the best parameters for your use case?
Were there any significant differences in the validation metrics between classes? If so, why?
What visual observations can you make from the test results?