Keras Shuffle: A full in-depth guide (Get THIS right)

Deep learning can be tricky, but we have some APIs that help us create wonderful models that can quickly converge to a great solution.

The Keras API, used for building neural networks, has risen in popularity as the go-to way to model with TensorFlow.

Keras Shuffle is easy to mess up and is essential for your success with modeling and data science.

What is Keras Shuffle?

Keras Shuffle is a modeling parameter asking you if you want to shuffle your training data before each epoch. This parameter should be set to false if your data is time-series and true anytime the training data points are independent.

A successful model starts well before you write any code.

Understanding how you want to set up your batching and epochs is crucial for your model’s success.

We go over Keras Shuffle, the different parameters of Keras Shuffle, when you should set it to True or False, and how to get the best usage out of it below.

Getting this parameter wrong can produce an overfit model whose performance won't hold up in the real world.

This is one of the last things we want as machine learning engineers, and we will show you how to avoid this.

What is Keras Shuffle?

In the most basic explanation, Keras Shuffle is a modeling parameter asking you if you want to shuffle your training data before each epoch.

To break this down a little further, if we have one dataset and the number of epochs is set to 5, the whole dataset will be used 5 times.

Many will set shuffle=True, so your model does not see the training data in the same order for each epoch.

This can improve the model’s accuracy and prevent the model from learning spurious patterns from the order of your data.

Note that this does not shuffle the validation or test set, so exact reproducibility of each training epoch is impossible; however, model runs can still be compared fairly because the validation set remains unshuffled and identical for every epoch.

model.fit(x, y, batch_size=400, epochs=5, shuffle=True)

In the above line, the dataset will be used five times and split into batches of 400 samples each epoch. Because shuffle=True, the data will be reshuffled before each of the five epochs.

model.fit(x, y, batch_size=400, epochs=5, shuffle=False)

In the above line, the dataset will be used five times and split into batches of 400 samples each epoch. Because shuffle=False, the data will be taken in sequential order for each of the five epochs.
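To make the batching behavior concrete, here is a minimal numpy sketch of the idea (an illustration, not Keras internals); the function name batch_order is our own:

```python
import numpy as np

def batch_order(n_samples, batch_size, shuffle, seed=None):
    """Return the sample indices that make up each batch of one epoch."""
    indices = np.arange(n_samples)
    if shuffle:
        # Reorder the samples before slicing them into batches,
        # which is conceptually what shuffle=True does each epoch.
        np.random.default_rng(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# shuffle=False: batches are always sequential -> [0,1,2,3] then [4,5,6,7]
sequential = batch_order(8, 4, shuffle=False)

# shuffle=True: a different order each epoch (seeded here for repeatability)
shuffled = batch_order(8, 4, shuffle=True, seed=0)
```

Every sample still appears exactly once per epoch; shuffling only changes the order in which the model sees them.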

More information about building out these models is in our article all about the Dense Layer.

When would you use Keras Shuffle?

Anytime you are modeling with Keras, you are making a Keras Shuffle decision, even if only implicitly.

The parameter is optional in the .fit method, but because it defaults to True, omitting it is itself a choice to shuffle.

More information on another .fit parameter can be found here at steps per epoch keras.

Let’s go over a couple of different instances of where you would set Keras Shuffle to True and when you would want to set it to False.

If you’re doing any classification, you’re going to want shuffle set to True.

Likewise, if you are dealing with any independent data, you’ll want shuffle set to True.

This is because shuffling has been shown to reduce overfitting to in-sample scores. (Source)

Your default should be shuffle set to True, as this will usually improve your model.

Sometimes, though, you cannot shuffle, and you will need to set it to False so your model does not shuffle the data.

Most machine learning algorithms have an underlying assumption.

This assumption is that each instance, or row, of your data is independent of the others.

We cannot shuffle time-series data because the data points are not independent of each other.

Think about the stock market; one of the most significant indicators of a stock’s current position is the previous one.

If that is true, how could the current instance be independent of the last one?

(They aren’t independent, and stock market data is time-series.)

Now think what would happen if you shuffle that data.

Let’s say your training data included the data points at t-2, t-1, and t+1 below, and the value at time t was put inside your testing set.

Time    Value
t-2     36
t-1     42
t       x   (held out in the test set)
t+1     58

We quickly see how unfair it is to possess data from both the past and the future in the training set, as predictions for t are now effectively capped between [42, 58].

We will see an incredibly high test accuracy when running our tests and validations.

However, once this model is deployed, the accuracy will quickly fall off.

That’s because in the real world we will never possess data from time t+1; the future hasn’t happened yet, so we won’t have a data point for it.

Without that upper bound on the current prediction, our real-world accuracy will quickly plummet.
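The safe alternative for time-series is a chronological split: train on the past, test on the future, never shuffling. A minimal numpy sketch (the 0.8 cutoff is an arbitrary choice for illustration):

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into a past (train) and future (test) segment."""
    cutoff = int(len(series) * train_fraction)
    return series[:cutoff], series[cutoff:]

prices = np.array([36, 42, 47, 51, 58, 63, 60, 66, 71, 75])
train, test = chronological_split(prices)
# Every training point precedes every test point in time,
# so the model can never peek at the future.
```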

Starting your model correctly is how you succeed during modeling; we will teach you how in Keras Input Shape.

Parameters of Keras Shuffle

Remember from earlier that Keras Shuffle is either True or False.

Since this parameter has a default, you can either pass True or False explicitly or leave it out entirely.

Keras Shuffle is set to True by default, so even if you do not provide it, your data will automatically be shuffled during training.

Keras Shuffle or Train Test Split?

This is a great question, but it is fundamentally wrong to compare them.

Keras Shuffle is an intra-training set decision, meaning that whatever you choose, this will only be applied to the training set. 

This does not affect validation or test sets, and only the trained model will be different based on this parameter.

The trained model will be different because it will see either shuffled or non-shuffled data.

Understanding features during modeling is important. We wrote Keras Feature Importance to give a good intro so you could understand your models better.

Train Test Split is much more about separating training and testing data sets. 

Whenever you apply train test split, you’re slicing your data into entirely different data sets.

These two work very well together.

And using train test split to create the training and validation sets for your deep learning model (Keras API) will enable you to test the accuracy during training quickly.

The same rules apply for shuffling during train test split as they do for Keras Shuffle.

Most of the time, you’ll want to shuffle while splitting your data, but if your data is not independent (time-series), you should not shuffle at any point in your pipeline.
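To show what the shuffle flag does during splitting, here is a simplified numpy stand-in for scikit-learn’s train_test_split (the real function lives in sklearn.model_selection; this sketch only mirrors its basic behavior):

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.25, shuffle=True, seed=None):
    """A simplified stand-in for sklearn's train_test_split."""
    indices = np.arange(len(X))
    if shuffle:
        # Shuffle before slicing, as sklearn does by default
        np.random.default_rng(seed).shuffle(indices)
    n_test = int(len(X) * test_size)
    train_idx, test_idx = indices[:len(X) - n_test], indices[len(X) - n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, seed=0)
# With shuffle=False the test set would simply be the last rows,
# which is exactly what you want for time-series data.
```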

So the question is not Keras Shuffle or Train Test Split?

It’s more: Keras Shuffle and Train Test Split.

Keras Pandas Example

Remember to import pandas and TensorFlow, with the assumption of import pandas as pd and import tensorflow as tf.

df = pd.read_csv(file)

You will now need to grab your target variable 

y = df['target']

Remove that target variable from your training set

df = df.drop(['target'], axis=1)

Convert the features to a tensor so Keras Shuffle is available; you may also need to convert your target variable.

df = tf.convert_to_tensor(df.values)

You will need to build out your model in a function (named define_some_model here) and assign the result to a variable (we use model).

model = define_some_model()

Finally, call your model on your dataset and target variable.

model.fit(df, y, batch_size=400, epochs=5, shuffle=True)
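Putting the steps above together, here is a hedged, runnable sketch: an in-memory DataFrame stands in for pd.read_csv, and the TensorFlow steps are left as comments since define_some_model is a hypothetical placeholder:

```python
import numpy as np
import pandas as pd

# Stand-in for df = pd.read_csv(file): a small synthetic dataset
df = pd.DataFrame({
    "feature_a": np.random.rand(100),
    "feature_b": np.random.rand(100),
    "target": np.random.randint(0, 2, size=100),
})

y = df["target"]                  # grab the target variable
df = df.drop(["target"], axis=1)  # remove it from the features

# With TensorFlow installed, the remaining steps from above would be:
# df = tf.convert_to_tensor(df.values)
# model = define_some_model()   # hypothetical model-building function
# model.fit(df, y, batch_size=400, epochs=5, shuffle=True)
```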

Frequently Asked Questions

Keras Databricks

Databricks runtime includes both TensorFlow and the Keras API. Databricks is a perfect pick for deep learning. Having access to distributed training will allow you to create deep learning models that wouldn’t be available on your computer due to resource limits.

Keras Softmax Loss

Softmax is the perfect last-layer activation for probabilistic models, typically paired with a cross-entropy loss. This is because softmax produces a vector of K probabilities that sums to 1, giving an output that indicates which class the model prefers. You will see this a lot in categorical models.
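For intuition, here is a minimal numpy sketch of the softmax function itself, showing that the outputs always sum to 1:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    exps = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs sums to 1, and the largest logit gets the largest probability
```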

Keras Pyspark

Pyspark and Keras are an incredible duo. Pyspark allows you access to distributed data, meaning you will have more data for modeling. Since Keras is an API that sits on TensorFlow, and deep learning networks are known for doing best with high quantities of data, combining these two is very harmonious. 

Dylan Kaplan