
Hyperparameter tuning with Keras and Keras Tuner for image classification

Aug 16, 2020 · 14 mins read

In this post I describe and review Keras Tuner, a library that can help you pick the optimal set of hyperparameters for your model. I used a small garbage-classification dataset and two different network architectures, training them with and without hyperparameter optimization. Will Keras Tuner make a difference?

In the section below I describe the idea behind hyperparameter tuning (grid search and random search). If you are already familiar with that, you can start reading from the Project outline section, in which I introduce the dataset and toy project I used for this article. If you came here to read specifically about Keras Tuner and the results of this experiment, feel free to scroll down even further.

What is hyperparameter tuning (optimization)?

Hyperparameter tuning (or optimization) is the process of choosing a set of optimal hyperparameters for a learning algorithm, e.g. a neural network or a decision tree. These algorithms have a lot of parameters that control the way they learn: a lower or higher learning rate, the number of units in the network layers, the dropout rate, or even a different optimizer. In the end, the goal is to find the tuple of hyperparameters that minimizes the loss function we use for training.

The most popular way of performing hyperparameter optimization is probably Grid Search. This brute-force (or exhaustive) method examines all combinations from a predefined subset of parameter values, so it is guaranteed to find the best solution within that subset. However, you can imagine how long it takes as the search space grows.

There’s an alternative approach called Random Search that examines only a few randomly chosen combinations, not all of them. This is an asset and a drawback at the same time: although you may not find the best set of parameters available, random search is a lot faster than searching the whole grid of values. This makes it the most common (I guess) option for hyperparameter optimization.

There are also other methods you can use for hyperparameter tuning, and both the grid and random search approaches deserve a more detailed explanation, but for the sake of brevity let’s skip that this time and jump right to the project and code.
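To make the difference concrete, here is a minimal, framework-agnostic sketch of both ideas (the parameter names and values are made up purely for illustration):

import itertools
import random

# A made-up search space, just for illustration.
search_space = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'dense_units': [128, 256, 512],
    'dropout': [0.1, 0.3, 0.5],
}

# Grid search: enumerate every combination (3 * 3 * 3 = 27 candidates here).
grid_candidates = [dict(zip(search_space, values))
                   for values in itertools.product(*search_space.values())]

# Random search: sample only a handful of random combinations.
random_candidates = [{name: random.choice(values)
                      for name, values in search_space.items()}
                     for _ in range(5)]

# In either case, each candidate would be trained and scored, and the best one kept.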


Figure 1. Sample images from Garbage Classification dataset

Project outline

If you are interested only in the review of Keras Tuner, no matter what the project is about, you can skip this section.

For this test I used the Split Garbage Dataset, a split version of another Garbage Classification dataset. It contains around 2000 images assigned to six categories (cardboard, glass, metal, paper, plastic and trash). Let me give a simple outline of the things I’ve done in this project; in the next sections I will cover the optimization part in more detail.

  1. Read and preprocess data:
    1.1. Load data from the train, valid and test subdirectories using the image_dataset_from_directory function (see the sketch after this list).
    1.2. Compute class weights (sklearn.utils.class_weight.compute_class_weight) for the training set.
  2. Define baseline models:
    2.1. VGG16 and ResNet50 with the top (classification) layers excluded and replaced with two dense layers + dropout.
    2.2. Leave a fixed number of units in the dense layers, dropout, learning rate and other parameters of the baseline models.
  3. Train baseline models:
    3.1. Train both for 50 epochs with validation data, EarlyStopping and computed class weights for data balancing.
  4. Define search space and run kerastuner.RandomSearch:
    4.1. Run 25 trials (with different combinations of parameters) and train each for 15 epochs with early stopping.
    4.2. List the best hyperparameters and build a new model with those to train it from scratch.
  5. Train the optimized model in the same way as the baselines (see 3.1).
  6. Display the training history (metrics), classification report and confusion matrix.
  7. Compare the training history of the baseline models and the one I built after hyperparameter tuning.
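For reference, step 1 might look roughly like the sketch below; the directory paths, image size and batch size are my assumptions for illustration, not values taken from the repository.

import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

# 1.1. Load the train/valid/test splits from their subdirectories.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'garbage/train', image_size=(224, 224), batch_size=32)
valid_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'garbage/valid', image_size=(224, 224), batch_size=32)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'garbage/test', image_size=(224, 224), batch_size=32)

# 1.2. Compute class weights from the training labels to counter class imbalance.
train_labels = np.concatenate([labels.numpy() for _, labels in train_ds])
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(train_labels),
                               y=train_labels)
class_weights = dict(enumerate(weights))  # e.g. {0: 1.2, 1: 0.8, ...}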

By the way, I trained all of that in Google Colab and the code is available in my public GitHub repository if you want to read it line by line and see how things are implemented. Now I’m going to describe steps 4-7 to show you how I used Keras Tuner to find a good model and whether it made any difference compared to the baselines.

Step 1: Defining an optimizable model

I started by implementing a class (extending kerastuner.HyperModel) that defines my optimizable model and its parameters. The core part is the build method, which accepts an hp parameter (that’s how Keras Tuner does it) used to define the possible values in the HPO search space. Note that the method returns an already compiled model.

import tensorflow as tf
import kerastuner as kt
...

class MyHyperModel(kt.HyperModel):

  def __init__(self, num_classes):
    self.num_classes = num_classes

  def build(self, hp):
    model = ...
    model.compile(...)
    return model

Let’s look closer at the build method line by line to understand how possible parameter values can be defined (there are a few methods for that) and used. I start with the hp.Choice function to state that I’m interested in two different models, namely ResNet50 or VGG16. When tuning parameters, Keras Tuner will choose one of these values for each run.

Unfortunately, complex types (like tf.keras.applications.* models) cannot be used in Choice directly, so I had to add an auxiliary function that returns one of the model classes based on a given string (“resnet50” or “vgg16”), which I then instantiate in build. Here’s how it looks:

def return_backbone(self, backbone):
  if backbone == 'resnet50':
    return tf.keras.applications.ResNet50
  elif backbone == 'vgg16':
    return tf.keras.applications.VGG16

def build(self, hp):
  backbone = hp.Choice('backbone', values=['resnet50', 'vgg16'])
  base_model = self.return_backbone(backbone)(weights='imagenet', include_top=False)

  ...

Other ways to define parameters are hp.Float and hp.Int, which you can see below. For these two, we set min_value and max_value to define the range from which Keras Tuner can pick values for tuning. Optionally, you may specify a step value, e.g. if you want to choose only values divisible by 64.

lr = hp.Choice('lr', values=[1e-2, 1e-3, 1e-4])
dropout_1 = hp.Float('dropout_1', 0.1, 0.5)
dropout_2 = hp.Float('dropout_2', 0.1, 0.2)
dense_units = hp.Int('units', min_value=128, max_value=512, step=64)
opt = hp.Choice('optimizer', values=['adam', 'sgd'])

Using these variables could not be easier, and you won’t even realize that they are defined by Keras Tuner. When defining the model and specifying parameters such as the dropout rate, layer size or learning rate, just use one of the objects created with the hp.* methods. Simple as that. Keras Tuner does the rest.

x = base_model.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(rate=dropout_1)(x)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(rate=dropout_2)(x)
x = tf.keras.layers.Dense(units=dense_units, activation='relu')(x)
predictions = tf.keras.layers.Dense(6, activation='softmax')(x)

After binding the optimizer and learning-rate variables to where they belong, finish the build method by creating and compiling the model. Remember that this method has to return a model that is compiled (using the Keras .compile method) and can be trained.
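A minimal sketch of how the end of build could look is shown below; mapping the tuned optimizer name to an optimizer instance and using sparse categorical cross-entropy (since image_dataset_from_directory returns integer labels by default) are my assumptions, not code copied from the repository.

# Map the tuned optimizer name to an optimizer instance with the tuned learning rate.
if opt == 'adam':
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
else:
    optimizer = tf.keras.optimizers.SGD(learning_rate=lr)

model = tf.keras.Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
return model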

Step 2: Optimize hyperparameters

So we’ve finally got to the optimization (or tuning) step. You now have the model, you’ve specified which parameters should be optimized and what their ranges of values are. Cool. The next step is to create a Tuner object, e.g. kt.RandomSearch, which accepts your model and implements a .search() method to finally start the main part of this experiment.

There are a couple of Tuner classes, such as BayesianOptimization, Hyperband, RandomSearch or Sklearn. They implement different algorithms for hyperparameter search and accept different parameters. In the case of RandomSearch, which I used in this experiment, only three parameters are necessary:

  • hypermodel is the instance of your model class (kt.HyperModel),
  • objective is the name of the metric to be optimized, e.g. val_loss or val_accuracy,
  • max_trials defines the total number of trials (runs) to run; the higher it is, the more configurations will be tested.

To start tuning parameters, call the .search() method of the Tuner class, which has a signature similar to the .fit() method you use to train a Keras model. After you execute it, you need to wait a bit for the results: somewhere between grabbing a cup of coffee and watching all seasons of Friends, depending on the size of the search space you’ve defined.

hypermodel = MyHyperModel(num_classes=6)
tuner = kt.RandomSearch(hypermodel, objective='val_loss', max_trials=25, overwrite=True)

callbacks = [tf.keras.callbacks.EarlyStopping(
  monitor='val_loss', mode='min', patience=5, restore_best_weights=True)]

tuner.search(train_ds, epochs=15, validation_data=valid_ds,
             callbacks=callbacks, class_weight=class_weights, verbose=0)

Step 3: Check optimization results

No matter how long you’ve been waiting, I bet you want to know what the best parameters found by Keras Tuner are. There’s a simple method for that, although I couldn’t find any way to print all of the defined parameters at once (I had to print them one by one, specifying each parameter name). The tuner has other useful functions to use after searching, e.g. one that returns the best models, which you can evaluate on a hold-out test set if you have one.

best_hp = tuner.get_best_hyperparameters()[0]

for hp in ['backbone', 'dropout_1', 'dropout_2', 'lr', 'units']:
    print(f'{hp} = {best_hp.get(hp)}')

models = tuner.get_best_models(num_models=2)

valid_loss, valid_acc = models[0].evaluate(valid_ds)
print(f'Loss = {valid_loss}, accuracy = {valid_acc} for validation set (best model)')

valid_loss, valid_acc = models[1].evaluate(valid_ds)
print(f'Loss = {valid_loss}, accuracy = {valid_acc} for validation set (2nd best model)')

Apart from evaluating the two best models once again on the validation set, I wanted to compare the best model with the two baseline networks I created and trained before. To obtain a training history for the optimizable model, I had to re-train it once again using the best set of parameters returned by the tuner. Unfortunately, I couldn’t find any easy way to get the history of a model returned after the searching step.
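A straightforward way to do that is to rebuild a fresh, compiled model from the best hyperparameters and call .fit() on it yourself. A rough sketch, with the training arguments mirroring the baseline setup from 3.1:

# Rebuild a fresh, compiled model from the best hyperparameters found by the tuner.
best_hp = tuner.get_best_hyperparameters()[0]
model = hypermodel.build(best_hp)

# Train it from scratch like the baselines to get a complete History object.
history = model.fit(train_ds, epochs=50, validation_data=valid_ds,
                    callbacks=callbacks, class_weight=class_weights)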


Figure 2. Metrics for HyperModel trained with best parameters
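For the classification report and confusion matrix mentioned in step 6, a minimal sketch could look like this; the class names and collecting predictions batch by batch (so labels and predictions stay aligned even if the dataset shuffles) are my choices, not code from the repository.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

class_names = ['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']

# Collect ground-truth labels and predicted classes batch by batch.
y_true, y_pred = [], []
for images, labels in test_ds:
    probs = model.predict(images)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(probs, axis=1))

print(classification_report(y_true, y_pred, target_names=class_names))
print(confusion_matrix(y_true, y_pred))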

Compare optimized model with baseline models

At this point, Keras Tuner’s job is done. What’s left to do is to check whether it has introduced any improvement compared to the baseline models. I used the hold-out test set to evaluate three different models: the two with default parameters and the one that uses the parameters returned by the tuner (HyperModel). Let’s see how they perform on data they haven’t seen before.

Found 431 files belonging to 6 classes.
[Baseline VGG16] Loss = 0.4932003617286682, accuracy = 0.8236659169197083 for test set
[Baseline ResNet50] Loss = 0.33364376425743103, accuracy = 0.8723897933959961 for test set
[HyperModel] Loss = 0.2778874635696411, accuracy = 0.9187934994697571 for test set

Thankfully, the HyperModel performed better than what I called the baseline models. The difference can be seen in both the loss and accuracy values. I also suspected that ResNet50 would outperform the VGG16 network in this classification problem, so these results seem legit to me. But I have something way more interesting: visuals.


Figure 3. Training (and validation) history for all of three models

I’ve got at least two observations here. First, the optimized model reached better loss and accuracy values than the baselines, which is good news. Second, training of the optimized model lasted only 16 epochs (compared to 20 for the VGG and almost 50 for the ResNet architecture), probably because it started to overfit and EarlyStopping shut it down. Nevertheless, within those 16 epochs it outperformed the other competitors.

Conclusion

It looks like Keras Tuner did make a difference here.

As for my impressions, I think it’s relatively simple to use Keras Tuner if you have a model that is already implemented in Keras. By adding only a few lines of code you may find a much better configuration than the default one and also compare the resulting models, just like I did. I think there’s no reason not to use HPO for your future models.

Apart from running into trouble when playing with the BatchDataset I got from image_dataset_from_directory (TensorFlow 2.3), I think it was quite a nice experience. To get a better understanding of all the HPO algorithms I need to run a few more experiments (e.g. using the BayesianOptimization tuner instead). I would also love to play with other tools like Optuna or hyperopt to be able to compare them. Well, soon.