16. Instance Segmentation with YOLO
16.1. Overview
In this lesson, we will train an Instance Segmentation model using YOLOv11. You’ll be able to choose specific augmentations, batch size, resolution, and other parameters based on your system’s capabilities and runtime. The dataset is already provided in YOLO format and will be used to train and evaluate the model.
16.2. Learning Objectives
By the end of this section, you will:
Understand the extended YOLO format and how to train a custom instance segmentation model using YOLOv11.
Experiment with different augmentations and hyperparameters for instance segmentation.
Evaluate the model’s performance and visualize bounding boxes and masks.
16.3. Background
In YOLO instance segmentation, each object is described by a class and a segmentation mask, and the model predicts both a bounding box and a mask for it. In the Ultralytics label format, each line of a label file holds a class index followed by the normalized x y coordinates of the polygon that outlines the object; the bounding box is derived from that polygon rather than stored separately. At inference time, a single forward pass produces the bounding boxes for localization together with the masks for precise instance segmentation.
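To make the label layout concrete, here is a minimal sketch of parsing one label line, assuming the Ultralytics segmentation convention in which each line is a class index followed by normalized x y polygon vertices. The function name `parse_seg_label` and the sample line are illustrative and do not come from the dataset.

```python
def parse_seg_label(line, img_w, img_h):
    """Parse one YOLO segmentation label line.

    Returns (class_id, polygon_in_pixels, bounding_box), where the box is
    (x_min, y_min, x_max, y_max) derived from the polygon vertices.
    """
    parts = line.split()
    # One class index plus at least three (x, y) pairs
    if len(parts) < 7 or len(parts) % 2 == 0:
        raise ValueError("expected: class x1 y1 x2 y2 x3 y3 ...")
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # Coordinates are normalized to [0, 1]; scale them to pixels
    polygon = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    box = (min(xs), min(ys), max(xs), max(ys))
    return class_id, polygon, box


# Hypothetical label line: class 2 with a four-vertex polygon
sample = "2 0.25 0.25 0.75 0.25 0.75 0.5 0.25 0.5"
class_id, polygon, box = parse_seg_label(sample, img_w=640, img_h=640)
print(class_id, box)
```

The bounding box here is simply the extent of the polygon, which is why a segmentation label does not need separate box fields.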
16.4. Downloading the Dataset
The dataset for this lesson is already in the YOLO instance segmentation format, so you can use it directly for training and evaluation. Before proceeding, download it and upload the archive to your runtime as /content/dataset.zip, the path that the unzip command in the next section expects.
Access the dataset at the following link:
https://universe.roboflow.com/lini-foundation/lini-coral-forms-1.0/dataset/1
16.5. Preparing the Environment
Let’s first install the required libraries and set up the environment to train our YOLOv11 instance segmentation model. Before anything else, confirm that you are on a GPU runtime by running the cell below. It should list the GPU attached to your session.
!nvidia-smi
!unzip /content/dataset.zip -d dataset
# Install the required dependencies
!pip install ultralytics
# Import required libraries
import base64
import io
import json
import os
import zipfile

import cv2
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from PIL import Image
from ultralytics import YOLO
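Before training, it can help to confirm that the unzipped dataset has the layout YOLO expects. The following is a minimal sketch that lists images without a matching label file. The helper name `find_unlabeled_images` and the split names `train`, `valid`, and `test` are assumptions based on a typical Roboflow export; only `test/images` is confirmed by this lesson.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled_images(split_dir):
    """Return images in <split_dir>/images with no .txt file in <split_dir>/labels."""
    split = Path(split_dir)
    images = sorted(
        p for p in (split / "images").glob("*") if p.suffix.lower() in IMAGE_EXTS
    )
    # A label file shares the image's name with the final extension replaced by .txt
    return [p for p in images if not (split / "labels" / f"{p.stem}.txt").exists()]


# Assumed split names; splits that do not exist are skipped
for split_name in ("train", "valid", "test"):
    split_dir = Path("/content/dataset") / split_name
    if split_dir.is_dir():
        missing = find_unlabeled_images(split_dir)
        print(f"{split_name}: {len(missing)} image(s) without a label file")
```

An image without a label file is not necessarily an error, since it may be an intentional background example, but an unexpectedly large count usually points to a problem with the upload or the unzip step.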
16.6. Initializing TensorBoard Before Training
TensorBoard is a visualization tool that shows your model’s training progress in real time. By starting it before training, you can monitor key metrics such as the training and validation losses, precision, recall, mAP, and learning rates, and adjust the run if needed. Watching these curves helps you spot problems such as overfitting or underfitting early. Click the refresh button in the top right of TensorBoard regularly to load the latest values.
%load_ext tensorboard
%tensorboard --logdir /content/runs/segment/train
# Load a model
model = YOLO("yolo11n-seg.pt")

# Train the model
results = model.train(
    data="/content/dataset/data.yaml",
    epochs=100,
)
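The overview mentions choosing augmentations, batch size, and resolution. One way to do that is to collect the settings in a dictionary and pass them to `model.train`. The sketch below uses Ultralytics training argument names, but every value is illustrative rather than tuned for this dataset, and the wrapper `train_segmentation` is a hypothetical helper.

```python
# Illustrative values only; adjust them to your GPU memory and runtime budget.
train_overrides = {
    "data": "/content/dataset/data.yaml",
    "epochs": 100,
    "imgsz": 640,     # training resolution; lower it if you run out of GPU memory
    "batch": 16,      # batch size; lower it if you run out of GPU memory
    "patience": 20,   # stop early after this many epochs without improvement
    "fliplr": 0.5,    # probability of a horizontal flip
    "flipud": 0.5,    # probability of a vertical flip
    "degrees": 15.0,  # maximum random rotation in degrees
    "hsv_h": 0.015,   # hue jitter
    "hsv_s": 0.7,     # saturation jitter
    "hsv_v": 0.4,     # brightness jitter
    "mosaic": 1.0,    # probability of mosaic augmentation
}


def train_segmentation(overrides, weights="yolo11n-seg.pt"):
    """Hypothetical wrapper: load pretrained weights and train with the given arguments."""
    from ultralytics import YOLO  # imported here so the dictionary can be edited without it

    model = YOLO(weights)
    return model.train(**overrides)


# Uncomment to start training with the settings above:
# results = train_segmentation(train_overrides)
```

Keeping the settings in one dictionary makes it easy to compare runs: change one value, retrain, and compare the resulting curves in TensorBoard.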
# Set Plotly to use the Colab renderer
pio.renderers.default = "colab"

# Directories
test_image_dir = "/content/dataset/test/images"
# Fallback only: Ultralytics increments the folder name (predict, predict2, ...) on each run
saved_output_dir = "/content/runs/segment/predict"

# Load the best checkpoint from training and run inference on the test images
model = YOLO("/content/runs/segment/train/weights/best.pt")
results = model(test_image_dir, save=True)

# Iterate over the results for each image processed
for result in results:
    orig_path = result.path  # the path of the input image
    base_name = os.path.basename(orig_path)
    orig_img = cv2.imread(orig_path)
    if orig_img is None:
        print(f"Could not load image {orig_path}. Skipping.")
        continue
    h, w = orig_img.shape[:2]
    total_pixels = h * w
    mask = np.zeros((h, w), dtype=np.uint8)

    # If any segmentation masks were returned, combine them into one binary mask.
    if result.masks is not None and result.masks.data is not None:
        for m in result.masks.data:
            m_np = m.cpu().numpy() if hasattr(m, "cpu") else m
            m_bin = (m_np > 0.5).astype(np.uint8)
            # Masks are returned at inference resolution; resize if it differs from the image.
            # This is approximate when the image was letterbox-padded.
            if m_bin.shape != (h, w):
                m_bin = cv2.resize(m_bin, (w, h), interpolation=cv2.INTER_NEAREST)
            mask = np.maximum(mask, m_bin)

    # Calculate the number of pixels in the mask and its percentage coverage over the image.
    mask_pixel_count = int(np.sum(mask))
    coverage = (mask_pixel_count / total_pixels) * 100

    # Use the directory this run actually wrote to, falling back to the default above.
    run_dir = getattr(result, "save_dir", None) or saved_output_dir
    saved_img_path = os.path.join(run_dir, base_name)
    saved_img = cv2.imread(saved_img_path) if os.path.exists(saved_img_path) else None
    if saved_img is None:
        print(f"Saved image for {base_name} not found in {run_dir}.")
        continue
    saved_img = cv2.cvtColor(saved_img, cv2.COLOR_BGR2RGB)

    fig = px.imshow(saved_img, title=f"{base_name}: {coverage:.2f}% coverage, {mask_pixel_count} pixels")
    fig.update_xaxes(showticklabels=False)
    fig.update_yaxes(showticklabels=False)
    fig.show()
image 1/14 /content/dataset/test/images/20230704_174611_mp4-8_jpg.rf.e616459a2ba009da3e97af91ee6896e9.jpg: 640x640 1 phaceolid, 12.0ms
image 2/14 /content/dataset/test/images/20230829_123255_jpg.rf.83ef6c419fb9450cc30533be3a0f3152.jpg: 640x640 1 Parascolymia, 10.8ms
image 3/14 /content/dataset/test/images/20230829_123544_jpg.rf.68e223d29096ccf784464c9d4e08160f.jpg: 640x640 1 Parascolymia, 11.1ms
image 4/14 /content/dataset/test/images/20230829_123650_jpg.rf.4e4157deb8ec4d1e4574e9414e8f10ce.jpg: 640x640 1 Parascolymia, 10.8ms
image 5/14 /content/dataset/test/images/20230829_124126_jpg.rf.8e7d988292723a46eb3283dca055a962.jpg: 640x640 1 phaceolid, 10.8ms
image 6/14 /content/dataset/test/images/20230829_124409_jpg.rf.871e34845775219697ebe21e97cb13c3.jpg: 640x640 1 phaceolid, 10.7ms
image 7/14 /content/dataset/test/images/20230829_124510_jpg.rf.10f413fa7f82a5196905071454271061.jpg: 640x640 1 phaceolid, 11.0ms
image 8/14 /content/dataset/test/images/20230829_124619_jpg.rf.f539c120bb7a0c8568a490654b824394.jpg: 640x640 1 massive, 11.5ms
image 9/14 /content/dataset/test/images/20230829_124712_jpg.rf.cbcb6be5d2e9999b1aad9cd87fc43f11.jpg: 640x640 1 massive, 11.9ms
image 10/14 /content/dataset/test/images/20230829_124915_jpg.rf.7df15bfc9f192d8bf1811121154b0871.jpg: 640x640 1 submassive, 11.4ms
image 11/14 /content/dataset/test/images/20230829_125036_jpg.rf.fc23a71e88b2f989140870a79a9c2ea3.jpg: 640x640 1 Encrusting, 1 branching, 11.1ms
image 12/14 /content/dataset/test/images/20230829_125447_jpg.rf.7f40cac22f123bca6e074adc2c4f7a19.jpg: 640x640 1 branching, 11.2ms
image 13/14 /content/dataset/test/images/20230829_130442_jpg.rf.e8a337d3a2f754e0862ac1dd807302c4.jpg: 640x640 1 submassive, 11.0ms
image 14/14 /content/dataset/test/images/20230829_130539_jpg.rf.1b6b1a3dceee3da29739ade4bcca6969.jpg: 640x640 2 branchings, 11.2ms
Speed: 2.3ms preprocess, 11.2ms inference, 1.9ms postprocess per image at shape (1, 3, 640, 640)
Results saved to runs/segment/predict3
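The coverage percentage computed from the predicted masks in the loop above can be compared with the coverage implied by the ground-truth labels. Because label polygons are stored in normalized coordinates, the area of a polygon is already the fraction of the image it covers. The sketch below applies the shoelace formula to each polygon in a label file's text. The function names and the sample label are illustrative, and overlapping polygons are counted twice here, unlike the mask union used for the predictions.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) tuples, using the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def label_coverage_percent(label_text):
    """Sum the normalized polygon areas in a YOLO segmentation label file, as a percentage."""
    total = 0.0
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) < 7:  # need a class index plus at least three vertices
            continue
        coords = [float(v) for v in parts[1:]]
        points = list(zip(coords[0::2], coords[1::2]))
        total += polygon_area(points)
    # Overlaps are double-counted, so cap the result at the full image
    return min(total, 1.0) * 100.0


# Hypothetical label: one rectangle covering half the width and a quarter of the height
sample_label = "2 0.25 0.25 0.75 0.25 0.75 0.5 0.25 0.5"
print(f"{label_coverage_percent(sample_label):.2f}% ground-truth coverage")
```

A large gap between the predicted and ground-truth coverage for an image is a quick signal that its masks deserve a closer look.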