One Step At A Time: Epoch In Machine Learning
https://enjoymachinelearning.com/blog/epoch-in-machine-learning/

Epochs in machine learning can be confusing for newcomers.

This guide will break down epochs and explain what they are, how they work, and why they’re important.

We’ll also dive deep into the relationship between epochs, batch size, and iterations.

Deep learning is a complex subject, but by understanding epochs, you’re on your way to mastering it!



What is an Epoch in Machine Learning?

When we train a neural network on our training dataset, we perform forward and back propagation using gradient descent to update the weights.

To perform forward and back propagation, we have to feed our data into the neural network.

There are three options to do this:

1.) One by One

2.) Mini Batches

3.) Entire Batch

Feeding your neural network data one by one will update the weights after every single row using gradient descent.

If you feed your neural network data with mini-batches, after every mini-batch, it’ll update the weights using gradient descent.

[Screenshot: mini-batch training in Keras, with batch size N/2 and steps_per_epoch set on the training data]


And finally, if you feed your neural network the entire dataset at once, it’ll update the weights once per full pass using gradient descent.

However, no matter how you feed the data to your neural network, once it’s seen the entire dataset, this is one Epoch.

 



Using one by one will take N updates (where N is the # of rows in your dataset) for one Epoch.

In contrast, using the entire dataset will only take one update for one Epoch.
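
To make the arithmetic concrete, here is a quick sketch (with a hypothetical dataset size) counting the weight updates in one epoch for each of the three feeding strategies:

import math

n_rows = 1000  # hypothetical dataset size (N)

# 1.) One by one: one weight update per row
print(n_rows)  # 1000 updates for one epoch

# 2.) Mini batches: one weight update per batch
batch_size = 32
print(math.ceil(n_rows / batch_size))  # 32 updates for one epoch

# 3.) Entire batch: a single weight update
print(1)  # 1 update for one epoch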


Why do we use more than one Epoch?

Since one Epoch only means our algorithm has seen the entire dataset a single time, one pass is rarely enough for gradient descent to learn the hidden trends within our dataset.

This is why we train for more than one Epoch: each additional pass over the data gives the algorithm another chance to refine its weights.


How to Choose The Right Number of Epochs

There’s no magic number when choosing the correct number of epochs for training your machine learning algorithm.

You’ll generally set a number high enough that your algorithm can learn your dataset, but not so high that you waste resources and overfit.

The best way to find the perfect balance is through trial and error.

Start with a relatively high number of epochs and gradually decrease until you find the sweet spot.

It might take a little time, but getting the best results is worth it.



What Is the Difference Between Epoch and Batch In Machine Learning?

An epoch is running through the entire dataset once, and batch size is just how many “chunks” we do it in.

If we have a dataset of 1000 points and a batch size of 10, we’re going to train our model on 10 points at a time, update our weights 10 points at a time, and do that 100 times.

That’s one Epoch. 

If we want to run more epochs, we keep going until we hit our stopping criterion.

Pretty simple, right?

We do this so we don’t process too much data at once and overload our RAM. If you’re only processing 10 points at a time, you’ll be safe from memory overload.
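
Here’s a minimal sketch of that chunking idea in plain NumPy; the dataset and batch size are hypothetical, and the actual weight update is left as a comment:

import numpy as np

X = np.random.rand(1000, 5)  # hypothetical dataset: 1,000 rows, 5 features
batch_size = 10

updates = 0
# One epoch: walk through the dataset one chunk at a time
for start in range(0, len(X), batch_size):
    batch = X[start:start + batch_size]  # only 10 rows in play per step
    # ... forward pass, loss, back propagation, weight update would go here ...
    updates += 1

print(updates)  # 100 weight updates = one epoch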

[Screenshot: setting the batch size in Keras]
There are other benefits, too – like stopping training if the validation loss isn’t improving after a certain number of epochs, or if the validation loss starts increasing while the training loss keeps falling (which would mean you’re overfitting).
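
In Keras, that kind of programmatic stop is handled by the EarlyStopping callback. Here’s a minimal sketch; the patience value is a placeholder, and the commented fit call assumes a compiled model plus X and y already exist:

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once the validation loss hasn't improved for 5 straight epochs,
# and roll the weights back to the best epoch seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# model.fit(X, y, epochs=500, batch_size=10,
#           validation_split=0.2, callbacks=[early_stop])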


Do Different Frameworks Have Different Meaning For Epoch?

Whether you’re using TensorFlow, PyTorch, or whatever new deep learning framework comes out in the future – an epoch is one run through the entire dataset.

The meaning of Epoch transcends any single library and is taught as a fundamental part of deep learning.

Whenever someone references an epoch, they’re talking about the model seeing the entire training set one time.



Does an Epoch Exist Outside of Machine Learning?

While the idea behind Epoch does exist outside of machine learning, you won’t hear it called “epoch.”

For example, if you’re working as a data scientist and are building a visualization for a chart – you’d use the entire dataset.

This would be an “epoch,” but your boss would never reference it that way, as it’s not standard jargon outside machine learning.

So an epoch does exist outside of machine learning, but the terminology would never be used outside of machine learning (and mostly deep learning contexts).


What is an iteration in machine learning?

An iteration is a single update of the model’s weights: one batch passing through forward and back propagation.

In other words, the number of iterations per Epoch equals the number of batches, and the total iteration count is the number of times the model weights are updated during training.

The term comes from the Latin word iterum, meaning “again.”

In my experience, more iterations per epoch (i.e., smaller batches) can improve accuracy, but training takes longer because the model has to update the weights much more often.

Iterations and batch size trade off against each other directly.

As you increase your batch size, the number of iterations per epoch drops.

For example, if you train a model on a small dataset, you might want to use fewer iterations (a larger batch size), as you’ll be able to fit most of the dataset into memory at once.

But if you’re training a model on a large dataset, you might want to increase iterations (which would lower the batch size), because larger batches would take too long to process and probably wouldn’t fit into RAM.

Experimenting with different values and seeing what works best for your situation is essential.
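
Here’s a quick sketch of that inverse relationship, using a hypothetical 60,000-row dataset:

import math

n_rows = 60000  # hypothetical dataset size

for batch_size in (1, 32, 256, 4096, n_rows):
    iterations = math.ceil(n_rows / batch_size)
    print(f'batch_size={batch_size:>5} -> {iterations} iterations per epoch')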



Do Many Epochs Help Our Gradient Descent Optimization Algorithm Converge?

Increasing the number of Epochs your algorithm sees will help until a certain point. Once that point is reached, you’ll start overfitting your dataset.

As a machine learning engineer, your job is to find that sweet spot where your algorithm is seeing enough data but not so much that it’s started focusing on the noise of the training data.

A validation dataset will help you track the loss and create programmatic stops (early stopping) if the validation loss starts to increase.


Loud and Proud: Verbose in Machine Learning
https://enjoymachinelearning.com/blog/verbose-in-machine-learning/

In machine learning, there are two types of people: those who like to keep things short and sweet and those who want to explain everything in detail.

I fall into the latter category – I love verbosity.

Some might call it overkill, but I see it as a way to ensure no stone is left unturned.

In this guide, we will explore the ins and outs of verbose in machine learning, including when it should be used and how to implement it correctly in your models.

By the end, you’ll know the following:

  • What Verbosity is
  • Understanding The Output From Verbose
  • Setting Up Verbose with Two Famous Machine Learning Algos
  • When You Should and Shouldn’t use Verbose Settings



What is Verbose in Machine Learning?

In machine learning, “verbose” refers to a particular setting used when training and validating models.

When verbose is turned on, the algorithm will provide more detailed information about its progress as your model iterates through the training process.

It’ll push this output right to your console!

This can be useful for:

  • Debugging
  • Error Finding
  • Understanding your model’s progression with offline metrics
  • Early Stopping In Deep Learning

As a warning, the verbose output can sometimes slow down the training process since printing output to your console is much slower than just running the model.

I like to run the algorithm without any verbose setting output and only use it if there are problems with the results.

Realize that most standard modeling packages have different “levels” of verbosity.



For example, here are the levels for the famous Sklearn package.

We will use the GridsearchCV for this example:

Setting Verbose = 0

Silent modeling!

Setting Verbose = 1

This will display a one-line summary: the number of folds and parameter candidates being fit.

Setting Verbose = 2

This will display everything from 1, plus the parameters and computation time for each fold and candidate.

Setting Verbose = 3

This will display everything from 1 and 2, and each candidate’s score will also be displayed. (Per the scikit-learn docs, values above 3 also show the fold and candidate parameter indexes, along with the starting time of each computation.)

This will be slightly different for each model we choose, but we get the general gist: as verbosity increases, we get more output information to our console.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html 
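
Here’s a minimal sketch of turning this on in GridSearchCV; the estimator, parameter grid, and cv value are just placeholders for whatever you’re tuning:

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_wine(return_X_y=True)

param_grid = {'n_estimators': [50, 100], 'max_depth': [3, 5]}

# verbose=2 prints each fold/candidate's parameters and fit time to the console
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=3, verbose=2)
search.fit(X, y)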


Understanding Verbose Output Within Data Science

One way to approach verbose output from your models is to break it down into smaller chunks.

If you’re looking at a massive block of text, try focusing on one section at a time.

Before you go through each word of your log, you should understand what value you’re trying to optimize (score, computation time, etc.) and focus mainly on that.

What I like to do is look for patterns in the output. Watching the numbers increase or decrease is usually much more helpful than the EXACT number at that EXACT time.

Verbosity settings are to be used as a guide, and if you come at your modeling process with a goal in mind, verbosity can help you get to the finish line.


Below, we’ll explain how to set up the verbose setting in Python with some famous models so you can use it in any situation.


How To Set Up Verbose in XGBoost Models

For our XGBoost model, we only changed the verbosity setting from 1 to 3.

In the other models below, this setting is called “verbose,” but in XGBoost, it’s called “verbosity.”

Here is the code we used:

from sklearn.datasets import load_wine
import xgboost as xgb
import numpy as np
from sklearn.metrics import mean_squared_error

# Load the wine dataset and pull out the features and target
wine_df = load_wine()

X = wine_df.data
y = wine_df.target

# verbosity controls how much XGBoost logs to the console (0 = silent up to 3 = debug)
xgb_model = xgb.XGBRegressor(objective="reg:squarederror", random_state=42, verbosity=1)

xgb_model.fit(X, y)

y_pred = xgb_model.predict(X)

mse = mean_squared_error(y, y_pred)

# np.sqrt(mse) turns the MSE into the root mean squared error (RMSE)
print(f'\n\nModel root mean squared error {np.sqrt(mse)}')

We can see that our model is effectively silent when verbosity is set to 1, since that level only prints warnings.

[Screenshot: XGBoost output with verbosity=1]

When increasing this to 2, we see much more output.

[Screenshot: XGBoost output with verbosity=2]

Finally, once we’ve passed 3 into our model, we see everything from 2, plus some final metrics and KPIs on model performance.

[Screenshot: XGBoost output with verbosity=3]


How To Set Up Verbose in Scikit Learn Models

Here is our code, this time using scikit-learn’s GradientBoostingRegressor; we only changed its verbose setting to get the images below.

from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
from sklearn.metrics import mean_squared_error

# Load the wine dataset and pull out the features and target
wine_df = load_wine()

X = wine_df.data
y = wine_df.target

# verbose controls the console output: 0 = silent, 1 = occasional progress
# updates, 2 or more = output for every tree
gb_model = GradientBoostingRegressor(random_state=42, verbose=1)

gb_model.fit(X, y)

y_pred = gb_model.predict(X)

mse = mean_squared_error(y, y_pred)

# np.sqrt(mse) turns the MSE into the root mean squared error (RMSE)
print(f'\n\nModel root mean squared error {np.sqrt(mse)}')

With Verbose set to 0, we see that our model is “silent.”

[Screenshot: scikit-learn output with verbose=0]

When we upgrade this to 1, we see progress printed only occasionally, with gaps between iterations.

[Screenshot: scikit-learn output with verbose=1]

Finally, when we push this to 2, we see a complete breakdown of each iteration of our model.

[Screenshot: scikit-learn output with verbose=2]

How To Set Up Verbose in Deep Learning (Keras) Models

Here is the code we used for our deep learning example, only changing the verbose setting in the .fit method:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.datasets import load_wine

# Load the wine dataset and pull out the features and target
wine_df = load_wine()

X = wine_df.data
y = wine_df.target

# A quick demo network (not tuned for this dataset)
model = Sequential()
model.add(Dense(12, input_shape=(X.shape[1],), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

# verbose controls Keras's console output: 0 = silent,
# 1 = progress bar per epoch, 2 = one line per epoch
model.fit(X, y, epochs=150, batch_size=10, verbose=1)

# Evaluate on the training data (just for the demo)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

When verbose is set to 0, our model is silent.

[Screenshot: Keras output with verbose=0]

When this is pushed up to 1, we get a live progress bar that updates for every batch within each epoch.

[Screenshot: Keras output with verbose=1]

Finally, when this is set to 2, we get a single summary line printed after each epoch instead of the progress bar.

[Screenshot: Keras output with verbose=2]


When Should You Use Verbose Settings in Machine Learning?

Verbosity will give you a ton of information while building out your models.

This can be super helpful when you’re in the fine-tuning stage or trying to dive deep into a problem you need help with.


Since most verbosity settings are in levels, you can choose a level that gives you the amount of output you need.


When Do We Turn Off Verbose?

When working in data science and on models, it’s critical to strike the right balance between training speed and model performance.

Your boss needs to see results, but they also need to see the right results.

If you build inaccurate models, nobody will want them – but if you never get models out the door, you’ll find yourself needing a new job.

One way to strike this balance is to be selective with the verbose parameter.

You can start your modeling without verbose turned on; if you don’t see the results you want, turn it on, as it can help you dive deeper into your model and find the areas that need improvement.

