9. Image Classification with Keras#
9.1. Overview#
In this lesson, we will explore image classification using a Convolutional Neural Network (CNN) in Keras with TensorFlow. Keras is a high-level API built on top of TensorFlow, designed to simplify the process of building and training neural networks. TensorFlow, on the other hand, is a comprehensive open-source machine learning framework that provides powerful tools for building and deploying deep learning models, including neural networks.
We will use a dataset of benthic animals to classify images of crabs and rockfish, which were captured in 2011 by the ROV ROPOS at Southern Hydrate Ridge, as part of the Ocean Observatories Initiative Regional Cabled Array operated by the University of Washington and funded by the National Science Foundation (NSF).
9.1.1. Learning Objectives#
By the end of this section, you will:
Understand how to load, preprocess, and split image datasets for training a neural network.
Develop, compile, and train a Convolutional Neural Network (CNN) model for image classification using Keras.
Visualize and interpret training and validation accuracy and loss graphs, and evaluate model performance.
Make predictions on new, unseen images and assess the model’s confidence in classifying them.
9.1.2. Object Detection vs. Image Classification#
There are two common tasks in computer vision: object detection and image classification.
Image classification involves categorizing an entire image into a single class. In this case, we are classifying each image as either a “crab” or a “rockfish.”
Object detection, on the other hand, is more complex. It involves not only identifying the objects present in an image but also determining their exact location within the image. This requires the use of annotations (usually separate files) to provide bounding boxes around each object.
In this lesson, we focus on image classification. The images used in this dataset are not annotated with bounding boxes because they are assumed to contain only one of the two possible classes—either “crab” or “rockfish.” This simplifies the problem, allowing us to rely on the image file names and folder structure to determine the class labels. No separate annotation files are necessary.
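To make this concrete, here is a minimal sketch (standalone, using the folder name introduced later in this lesson) of how a class label can be read straight from an image's parent folder:
import pathlib
# With one folder per class, each image's label is simply the name of its
# parent directory, so no annotation files are needed.
data_dir = pathlib.Path("new_benthic_photo")
for img_path in sorted(data_dir.glob("*/*.png"))[:3]:
    print(img_path.name, "->", img_path.parent.name)  # e.g. 0001.png -> crab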
Note
The following activity requires a small dataset download. You can download it here: SHRCrabsandFishClassification.zip
Alternatively, you can use a dataset modified in the previous lesson!
9.2. Libraries Overview#
Here’s a quick overview of the libraries used in this activity:
matplotlib.pyplot: A plotting library used to visualize data. In this lesson, it helps display images and graphs of model performance.
numpy: A fundamental library for numerical operations in Python, often used for handling arrays and matrices.
PIL (Python Imaging Library): Provides functionality for opening, manipulating, and saving image files. In this lesson, it’s used to display sample images.
OpenCV (cv2): A computer vision library with advanced image-processing tools. In this lesson, it's used to load images and convert their color channels for display.
tensorflow: A popular machine learning framework used for building and training models. We use TensorFlow’s high-level Keras API to create and train the Convolutional Neural Network (CNN) in this lesson.
random: A standard Python module used here to randomly select and display images from the dataset.
pathlib: Used for handling and manipulating file system paths in an easy-to-use manner.
zipfile: A Python library for handling .zip files. Here, it's used to extract the dataset.
tensorflow.keras: A high-level neural networks API, included with TensorFlow, used to create layers and models for machine learning.
# Importing Required Libraries
import matplotlib.pyplot as plt
import numpy as np
import PIL
import cv2
import tensorflow as tf
import random
import pathlib
import zipfile
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
9.3. Dataset Description#
This tutorial uses a dataset of 800 photos of benthic animals, consisting of two classes:
crab
rockfish
These images are organized into two subdirectories within the dataset folder:
new_benthic_photo/
    crab/
    rockfish/
Now let's set up the downloaded dataset. To do this, upload the still-zipped file to Colab under the Files section on the left side of your screen.
dataset_path = "/content/SHRCrabsandFishClassification.zip"
with zipfile.ZipFile(dataset_path, 'r') as zip_ref:
    zip_ref.extractall("/content/SHRCrabsandFishClassification")
data_dir = pathlib.Path("/content/SHRCrabsandFishClassification")
image_count = len(list(data_dir.glob('*/*.png')))
print(f'Total number of images: {image_count}')
crabs = list(data_dir.glob('crab/*'))
rockfish = list(data_dir.glob('rockfish/*'))
from PIL import Image
Image.open(str(crabs[0])).show()
Image.open(str(crabs[1])).show()
Image.open(str(rockfish[0])).show()
Image.open(str(rockfish[1])).show()
# Define Data Directories
crab_dir = data_dir / 'crab'
rockfish_dir = data_dir / 'rockfish'
# List all images in the 'crab' folder
crab_images = list(crab_dir.glob('*'))
random_crab = random.randint(0, len(crab_images) - 1) # Randomly select from all crab images
# Load and display the crab image using OpenCV and Matplotlib
crab_image = cv2.imread(str(crab_images[random_crab]))
crab_image = cv2.cvtColor(crab_image, cv2.COLOR_BGR2RGB) # Convert BGR to RGB for correct color display
plt.imshow(crab_image)
plt.title("Random Crab Image")
plt.axis('off')
plt.show()
# List all images in the 'rockfish' folder
rockfish_images = list(rockfish_dir.glob('*'))
random_rockfish = random.randint(0, len(rockfish_images) - 1) # Randomly select from all rockfish images
# Load and display the rockfish image using OpenCV and Matplotlib
rockfish_image = cv2.imread(str(rockfish_images[random_rockfish]))
rockfish_image = cv2.cvtColor(rockfish_image, cv2.COLOR_BGR2RGB) # Convert BGR to RGB for correct color display
plt.imshow(rockfish_image)
plt.title("Random Rockfish Image")
plt.axis('off')
plt.show()
Note
By defining the data directories for each class (crab and rockfish), we make it easier to access and manage the images belonging to each category. This allows us to quickly refer to the directories when performing operations like random sampling, image preprocessing, or displaying specific examples from each class. This is a bit overkill with only 2 classes, but it's good practice because it ensures that our code is organized and can efficiently work with large datasets where images are separated into folders based on their classes.
9.4. Understanding Dataset Parameters#
Here, we set key parameters for handling the dataset:
batch_size = 32: This defines how many images will be processed in one pass through the model. Using a batch size of 32 means that during training, the model will look at 32 images before updating its internal parameters. A batch size of 32 is commonly used for balancing memory efficiency and training speed.
img_height = 180 and img_width = 180: These define the dimensions to which each image will be resized. By resizing all images to a uniform height and width of 180x180 pixels, we ensure consistency in input size, which is required for neural networks. Although resizing reduces detail, it also speeds up computation and simplifies model training.
# Set Parameters for the Dataset
batch_size = 32
img_height = 180
img_width = 180
9.5. Loading the Dataset#
We use the tf.keras.utils.image_dataset_from_directory function to load and preprocess the dataset. Here's what each argument does:
data_dir: This points to the directory where the dataset is stored (the root folder containing the crab and rockfish subdirectories).
validation_split=0.2: This splits the dataset into training and validation sets. In this case, 80% of the data will be used for training, and 20% for validation.
subset: We specify whether we're creating the training or validation dataset. Setting subset="training" creates the training set, and subset="validation" creates the validation set.
seed=123: A seed value ensures that the dataset split is reproducible, meaning that the same data will always be allocated to the training and validation sets when the code is rerun.
image_size=(img_height, img_width): This resizes each image to the predefined size of 180x180 pixels, ensuring all images are consistent when input to the model.
batch_size=32: This defines the number of images processed in each batch, ensuring efficient training and memory usage.
This setup loads the dataset in a format that’s ready for model training while automatically handling preprocessing like image resizing.
# Loading the Dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
# Display Class Names
class_names = train_ds.class_names
print(class_names)
9.6. Visualizing the Data#
This block of code is used to visualize a sample of images from the training dataset. Here’s what each part does:
plt.figure(figsize=(10, 10)): This sets up the figure size for displaying the images. The size (10, 10) ensures a large enough grid to comfortably view multiple images.
train_ds.take(1): This grabs a single batch of images and labels from the training dataset. Since the batch size is set to 32, this retrieves 32 images and their corresponding labels, but we're only displaying 9 of them.
for i in range(9): This loop goes through the first 9 images in the batch and plots them.
ax = plt.subplot(3, 3, i + 1): This creates a 3x3 grid of subplots to display 9 images.
plt.imshow(images[i].numpy().astype("uint8")): This converts each image tensor into a NumPy array and displays it as an image.
plt.title(class_names[labels[i]]): This adds the class name (either "crab" or "rockfish") as the title above each image, based on the label associated with the image.
plt.axis("off"): This hides the axes for a cleaner visualization of the images.
The purpose of this block is to quickly visualize how the dataset looks, allowing us to verify that the images and their labels are being loaded correctly.
# Visualize the Data
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
9.7. Verifying Dataset Structure#
In this block of code, we inspect the structure of the dataset by printing the shape of a batch of images and their corresponding labels. Before diving into the code, let’s briefly discuss tensors.
9.7.1. What Are Tensors?#
A tensor is a multi-dimensional array that generalizes matrices to higher dimensions. Tensors are the basic data structure in deep learning and are used to represent inputs, weights, and outputs of models. In our case, each image is represented as a 3D tensor (height, width, color channels), and the dataset is organized into batches of these tensors. The labels are 1D tensors representing the class of each image.
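As a quick illustration (a standalone sketch with dummy values, not part of the dataset pipeline), here are tensors with exactly the shapes described above:
# Dummy tensors matching the shapes described above (all zeros)
image = tf.zeros((180, 180, 3))             # one image: height, width, RGB channels -> 3D
batch = tf.zeros((32, 180, 180, 3))         # a batch of 32 such images -> 4D
labels = tf.zeros((32,), dtype=tf.int32)    # one integer class label per image -> 1D
print(image.ndim, batch.ndim, labels.ndim)  # 3 4 1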
9.7.2. Code Breakdown:#
for image_batch, labels_batch in train_ds: This loop retrieves a single batch of images and labels from the training dataset. Since our batch_size is 32, the batch will contain 32 images and their corresponding labels.
print(image_batch.shape): This prints the shape of the image_batch, which should be (32, 180, 180, 3): 32 is the batch size (32 images), 180, 180 are the dimensions of each image (resized to 180x180 pixels), and 3 is the number of color channels (RGB).
print(labels_batch.shape): This prints the shape of the labels_batch, which should be (32,) because there are 32 labels, one for each image in the batch.
break: This ensures that the loop runs only once, as we only need to check the structure of one batch.
9.7.3. Purpose:#
This step is important to confirm that the dataset is loaded correctly and the images and labels are properly batched and shaped, making them ready for input into the model.
# Verify Dataset Structure
for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
9.8. A Basic Keras Model#
In this section, we define the Convolutional Neural Network (CNN) architecture for image classification. Unlike the previous lesson where we used a single layer with an edge detection kernel, here we employ a deeper network with multiple layers. Each layer plays a crucial role in feature extraction and learning from the dataset. Let’s go through each part of this code in detail.
9.8.1. Dataset Preparation for Efficient Processing#
We use AUTOTUNE to optimize data loading and processing, and we configure the training and validation datasets for efficient shuffling, caching, and prefetching.
9.8.2. Normalization Layer#
The normalization layer rescales pixel values from the range [0, 255] to [0, 1]. Normalizing the input helps the model converge faster during training by ensuring that the pixel values are small and consistent.
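Here's a minimal sketch of what this rescaling does to a few sample pixel values (standalone illustration, reusing the tf and layers imports from above; the model below applies the same layer as its first step):
# Rescaling(1./255) maps uint8-style pixel values [0, 255] into [0, 1]
sample_pixels = tf.constant([[0.0, 127.5, 255.0]])
print(layers.Rescaling(1./255)(sample_pixels).numpy())  # [[0.  0.5 1. ]]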
9.8.3. Model Structure: Sequential API#
The Sequential API allows us to stack layers in a linear fashion to define a basic CNN architecture.
9.8.3.1. Input Layer#
The input layer first rescales the images and specifies that each input image has dimensions 180x180 and 3 color channels (RGB).
9.8.3.2. First Convolutional Block#
The first block consists of a Conv2D layer with 16 filters (or kernels) followed by MaxPooling. The convolutional layer applies filters to the input image to extract low-level features (like edges), while the pooling layer reduces the spatial dimensions of the feature maps, making the model less sensitive to small changes and reducing the computational cost.
9.8.3.3. Second Convolutional Block#
The second block has 32 filters, allowing the network to learn more complex features like textures and patterns. Max pooling again reduces the dimensions of the output.
9.8.3.4. Third Convolutional Block#
In the third block, 64 filters are applied, allowing the model to detect higher-level features. The deeper we go into the network, the more abstract the features become, allowing the model to make more complex distinctions between classes.
9.8.3.5. Flattening and Dense Layers#
The Flatten layer converts the 2D feature maps into a 1D vector that is fed into the dense layers. The Dense(128) layer learns to combine the features detected in the convolutional layers. The final Dense(num_classes) layer outputs the classification logits for the crab and rockfish classes.
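To see how these layers transform the data, here is a standalone sketch (reusing the tf and layers imports from above) that pushes a dummy one-image batch through the same sequence of layers; the commented shapes are what each step produces for a 180x180 input:
# Shape walkthrough with a dummy batch (illustration only)
x = tf.zeros((1, 180, 180, 3))               # (batch, height, width, channels)
x = layers.Conv2D(16, 3, padding='same')(x)  # -> (1, 180, 180, 16)
x = layers.MaxPooling2D()(x)                 # -> (1, 90, 90, 16)
x = layers.Conv2D(32, 3, padding='same')(x)  # -> (1, 90, 90, 32)
x = layers.MaxPooling2D()(x)                 # -> (1, 45, 45, 32)
x = layers.Conv2D(64, 3, padding='same')(x)  # -> (1, 45, 45, 64)
x = layers.MaxPooling2D()(x)                 # -> (1, 22, 22, 64)
x = layers.Flatten()(x)                      # -> (1, 30976) values for the Dense layers
print(x.shape)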
9.8.4. Layer Importance#
In the previous lesson, we used a single edge detection kernel, which was a manual, handcrafted filter. Here, the CNN automatically learns the best filters (or features) from the data. The layer structure is key to CNNs because each layer extracts more complex features, allowing the network to progressively understand the image in a more detailed and abstract way.
# A Basic Keras Model
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
normalization_layer = layers.Rescaling(1./255)
num_classes = len(class_names)
model = Sequential([
layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
9.9. Compiling the Model#
After defining the architecture of the CNN, the next step is to compile the model. Compiling involves specifying three key elements: the optimizer, the loss function, and the metrics used to evaluate the model’s performance.
9.9.1. Optimizer: Adam#
Adam (Adaptive Moment Estimation) is a popular optimizer that combines the advantages of two other methods: AdaGrad (which works well with sparse gradients) and RMSProp (which adapts the learning rate based on recent gradients). Adam adapts the learning rate throughout training, making it a good default choice for most models.
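If you want to control the learning rate yourself, the shorthand string 'adam' used below can be swapped for an explicit optimizer object. A minimal sketch (1e-3 is the Keras default, written out here only to make the knob visible):
# Equivalent to optimizer='adam', but with the learning rate spelled out
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)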
9.9.2. Loss Function: Sparse Categorical Crossentropy#
Sparse Categorical Crossentropy is the loss function used when we have multiple classes (in this case, crab and rockfish) and the labels are integer values. It measures how far the predicted probabilities are from the actual labels.
from_logits=True: This indicates that the output of the final layer is raw scores (logits) rather than probabilities. Since we haven’t applied a softmax activation in the final layer, logits will be converted to probabilities during the loss calculation.
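As a small illustration of what from_logits=True means (a standalone sketch with made-up numbers): softmax turns raw logits into probabilities that sum to 1, and the loss function applies this conversion internally.
# Made-up logits for one image: the model "prefers" class 0
logits = tf.constant([[2.0, 0.5]])
print(tf.nn.softmax(logits).numpy())  # approx. [[0.82 0.18]]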
9.9.3. Metrics: Accuracy#
Accuracy calculates the percentage of correct predictions made by the model. During training, both training accuracy and validation accuracy will be tracked to monitor how well the model is learning and generalizing.
By compiling the model, we set the stage for training, specifying how the model will optimize its weights, calculate loss, and measure success.
# Compile the Model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
9.10. Model Summary#
The model.summary() function provides a concise overview of the model's architecture. It displays the layer types, output shapes, and the number of parameters in each layer, giving you a quick understanding of the model's structure and complexity.
This is useful to verify that the model is constructed as intended and to check the total number of trainable parameters.
# Model Summary
model.summary()
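As a sanity check, you can reproduce a layer's parameter count by hand. A sketch for the Dense(128) layer, using the 22x22x64 output of the last pooling layer (the result should match the corresponding row of the summary):
# Dense layer parameters = inputs * units + biases
flat = 22 * 22 * 64      # 30,976 values coming out of Flatten
print(flat * 128 + 128)  # 3,965,056 trainable parameters in Dense(128)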
9.11. Training the Model#
In this step, we train the model using the model.fit() function.
epochs=10: This specifies that the model will go through the entire dataset 10 times (or 10 training cycles). Each epoch allows the model to learn more patterns and improve its predictions.
train_ds: The training dataset used for learning.
validation_data=val_ds: The validation dataset used to evaluate how well the model generalizes to unseen data.
The history object stores the training progress, including accuracy and loss, which we visualize in the next step.
epochs=10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
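Before plotting, you can peek at exactly what was recorded (a quick check; the key order may differ slightly between TensorFlow versions):
# Each key maps to a list with one value per epoch
print(history.history.keys())  # e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])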
9.12. Visualizing Training Results#
This block of code visualizes the model’s training and validation accuracy and loss over the training epochs. These plots are crucial for understanding how well the model is learning and generalizing.
Training Accuracy and Validation Accuracy: These graphs show how the model’s accuracy improves during training and how well it performs on unseen validation data.
Training Loss and Validation Loss: These graphs show how the model’s loss decreases during training. Loss is a measure of how well the model’s predictions match the true labels.
Interpreting these graphs is one of the most important skills in computer vision. They help you assess whether the model is overfitting (performing well on training data but poorly on validation data) or underfitting (performing poorly on both).
# Visualize Training Results
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
9.13. Making Predictions on a New Image#
After training the model, we can use it to make predictions on new, unseen images. In this example, we load a new image called “mystery.png”, preprocess it, and let the model predict whether it belongs to the “crab” or “rockfish” class.
Here’s a breakdown:
Load the image: The image is resized to the same dimensions (180x180) used during training.
Preprocess: The image is converted to an array and expanded to match the input shape expected by the model (batch format).
Make predictions: The model generates raw prediction scores for each class. We apply the softmax function to convert these scores into probabilities.
Print the result: We output the predicted class (either “crab” or “rockfish”) along with the confidence level.
from google.colab import files
uploaded = files.upload() # call the upload method on files
# Load the image
img_height = 180 # Set the appropriate target size
img_width = 180
for fn in uploaded.keys():
    image_path = fn  # get the filename from the uploaded dictionary

img = tf.keras.utils.load_img(image_path, target_size=(img_height, img_width))
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)  # Create a batch
# Make predictions
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
class_names = ['crab', 'rockfish']
# Print the result
print(
"This image most likely belongs to {} with a {:.2f} percent confidence."
.format(class_names[np.argmax(score)], 100 * np.max(score))
)
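If you're working outside of Colab, files.upload() won't be available. A minimal alternative is to point load_img at a local path instead (the filename below is a placeholder; substitute your own image):
# Outside Colab: skip the upload widget and load a local file directly.
# "mystery.png" is a placeholder path; substitute your own image.
image_path = "mystery.png"
img = tf.keras.utils.load_img(image_path, target_size=(img_height, img_width))
# ...then preprocess and predict exactly as above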