Heatmaps are great for quickly visualizing data that normally isn’t easy to ingest.
However, it sometimes feels impossible to find a coding resource that shows you how to code up these heatmaps in Python, and what they’ll look like when you’re done.
After this post, you’ll have both.
Heatmaps in this post:
- bqplot
- ggplot
- Lightning Viz
- Cufflinks
- MissingNo
- Matplotlib
- Seaborn
- Plotly
Example Data For Each Heatmap
For this tutorial, we’ll simply use the data frame below to implement these different heatmaps.
This lets you make a quick side-by-side comparison of both the visuals and the code for your project!
import pandas as pd
import numpy as np
# example data
# https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009
df = pd.read_csv('winequality-red.csv')
# keep the columns from the sixth one onward
df = df[df.columns[5:]]
# we will heatmap the correlations
corr = df.corr()
# display the correlation matrix (in a notebook)
corr
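Every heatmap below colors the entries of corr, so it helps to know what they are. By default, DataFrame.corr() computes the Pearson correlation coefficient (method="spearman" and method="kendall" give rank-based alternatives), which always lies between -1 and 1. The minimal sketch below recomputes one entry by hand on random stand-in data; the demo frame and the pearson helper are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; in the tutorial this would be the wine DataFrame df.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])


def pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    xc = x - x.mean()  # center each series on its mean
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


corr_demo = demo.corr()  # method="pearson" is the default
manual = pearson(demo["a"], demo["b"])
print(round(manual, 6), round(corr_demo.loc["a", "b"], 6))  # the two numbers agree
```

The diagonal of any correlation matrix is 1, which is why every heatmap in this post has a solid stripe running from corner to corner.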
How To Code A Heatmap In Seaborn
A standard in data science, Seaborn has one of the easiest-to-implement heatmaps.
This package is built on top of matplotlib and is one of my favorite packages for plotting distributions.
While I think this package struggles with customization, being able to get a heatmap out quickly with one line of code is extremely attractive.
Read more about Seaborn here or the initial paper here.
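import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(corr)  # one line draws the whole heatmap
plt.show()  # needed in a plain script; notebooks render it automatically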
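Because correlations are bounded and symmetric, a few sns.heatmap options go a long way: annot and fmt print each coefficient, vmin, vmax and center pin the diverging colormap so that 0 sits at its neutral midpoint, and mask hides the redundant upper triangle. This is a sketch under assumptions: the corr_heatmap helper, the coolwarm colormap and the random demo_corr matrix are our choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def corr_heatmap(corr: pd.DataFrame):
    """Annotated, lower-triangle correlation heatmap with a diverging colormap."""
    # True above the diagonal: those cells mirror the lower triangle, so hide them
    mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(
        corr,
        mask=mask,
        annot=True,                 # print each coefficient in its cell
        fmt=".2f",                  # with two decimal places
        cmap="coolwarm",            # diverging colormap
        vmin=-1, vmax=1, center=0,  # full correlation range, 0 at the neutral middle
        square=True,
        ax=ax,
    )
    return fig, ax, mask


# Hypothetical random stand-in data; pass the tutorial's corr instead.
rng = np.random.default_rng(0)
demo_corr = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("wxyz")).corr()
fig, ax, mask = corr_heatmap(demo_corr)
```

Call plt.show() afterwards to display the figure outside a notebook.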
How To Code A Heatmap In Plotly
Plotly is an interactive graphing library that has been on the rise for some time.
With mountains of examples and a community hangout where users exchange questions and code – you can’t really go wrong with Plotly.
Personally, I think its simple heatmap is one of the best-looking ones, and it’s the one I use when I don’t need special customization.
Read more about Plotly here.
import plotly.express as px
fig = px.imshow(corr)
fig.show()
How To Code A Heatmap In Matplotlib
The king of visualization in Python.
Matplotlib is one of the oldest and most stable libraries in Python for visualization.
My gripe with Matplotlib is that most of its graphs look a little ugly out of the box, although they can be made to look better with some customization.
Several libraries on this list, such as Seaborn, plotnine (ggplot) and MissingNo, are built on top of Matplotlib and leverage its strong code base.
Read more about matplotlib here.
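import matplotlib.pyplot as plt
# draw the matrix as an image; "nearest" keeps every cell a sharp square
plt.imshow(corr, cmap="viridis", interpolation="nearest")
plt.colorbar()  # legend mapping colors to correlation values
plt.show()
import matplotlib.pyplot as plt
# the same heatmap with the "hot" colormap
plt.imshow(corr, cmap="hot", interpolation="nearest")
plt.colorbar()
plt.show()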
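Unlike Seaborn, plain plt.imshow neither labels the axes with column names nor prints the values. One way to add both, sketched below with a hypothetical annotated_heatmap helper and random stand-in data, is to set the tick labels explicitly and place one ax.text call at the center of each cell; the text color flips between black and white depending on how light the cell is.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def annotated_heatmap(corr: pd.DataFrame):
    """imshow heatmap with column names on the axes and each value printed in its cell."""
    values = corr.to_numpy()
    n_rows, n_cols = values.shape
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(values, cmap="viridis", vmin=-1, vmax=1)
    fig.colorbar(im, ax=ax)

    # imshow draws cell (i, j) centered at x=j, y=i, so ticks go at 0..n-1
    ax.set_xticks(np.arange(n_cols))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(np.arange(n_rows))
    ax.set_yticklabels(corr.index)

    # one label per cell: black text on the lighter half of viridis, white on the darker half
    for i in range(n_rows):
        for j in range(n_cols):
            color = "black" if im.norm(values[i, j]) > 0.5 else "white"
            ax.text(j, i, f"{values[i, j]:.2f}", ha="center", va="center", color=color)
    fig.tight_layout()
    return fig, ax


# Hypothetical random stand-in data; pass the tutorial's corr instead.
rng = np.random.default_rng(0)
demo_corr = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("wxyz")).corr()
fig, ax = annotated_heatmap(demo_corr)
```

Call plt.show() afterwards to display the figure.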
How To Code A Heatmap In ggplot
ggplot2, a very well-known package in R, is now popping up in Python through plotnine, which implements the same grammar of graphics and is what the code below uses.
It’s simply a package for plotting in Python.
While ggplot2 dominates in R, its Python counterpart simply hasn’t reached the same level of adoption.
One of my favorite uses of ggplot is plotting text, which can be done in one line of code.
Read more about ggplot here.
from plotnine import ggplot, aes, geom_tile, geom_text
import pandas as pd
import numpy as np
# melt the correlation matrix into long format: one row per cell
melted_corr = corr.melt()
# corr.melt() stacks the columns one after another,
# so the row labels repeat in the same order for every column
a = np.array(list(corr.index))
melted_corr = melted_corr.assign(variable2=a[np.arange(len(melted_corr)) % len(a)])
# rounded values to print on each tile
melted_corr['label'] = melted_corr['value'].round(3)
# plot; we can add whatever layers we want,
# here tiles for the heatmap plus a text layer for the values
(
    ggplot(melted_corr, aes(x='variable', y='variable2', fill='value'))
    + geom_tile()
    + geom_text(aes(label='label'))
)
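The modulo indexing in the plotnine example works because corr.melt() stacks the columns one after another, so the row labels repeat in the same order for every column. A pandas-only alternative, assuming the row labels are what you want in variable2, names the index, turns it into a column and melts around it. The to_long helper is hypothetical, and the last lines rebuild the modulo version so the two tables can be compared.

```python
import numpy as np
import pandas as pd


def to_long(corr: pd.DataFrame) -> pd.DataFrame:
    """Long-format correlation table: one row per (column, row) pair."""
    return (
        corr.rename_axis("variable2")  # name the row labels
        .reset_index()                 # turn them into a regular column
        .melt(id_vars="variable2", var_name="variable", value_name="value")
    )


# Hypothetical random stand-in data; pass the tutorial's corr instead.
rng = np.random.default_rng(0)
demo_corr = pd.DataFrame(rng.normal(size=(50, 3)), columns=list("xyz")).corr()
long_df = to_long(demo_corr)

# the same table built with the modulo trick from the plotnine example
melted = demo_corr.melt()
labels = np.array(list(demo_corr.index))
melted = melted.assign(variable2=labels[np.arange(len(melted)) % len(labels)])
```

Either table can be passed straight to ggplot(..., aes(x='variable', y='variable2', fill='value')).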
How To Code A Heatmap In bqplot
bqplot is an interactive visualization tool for the Jupyter environment that brings stakeholder-ready visualizations to your audience with minimal code.
While most of bqplot’s visual features can also be found in Plotly, having interactive charts right in your Jupyter environment is a huge plus.
Since I do a lot of coding in Jupyter notebooks, I’ll leverage bqplot if I know I will have a stakeholder viewing my visualizations.
That way, they’ll be able to interact with the charts.
You can read more about bqplot here.
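from bqplot import pyplot as plt  # bqplot's pyplot; it shadows matplotlib's plt
from ipywidgets import Layout
# a square 425 x 425 px figure
fig = plt.figure(
    title="HeatMap",
    layout=Layout(width="425px", height="425px"),
    min_aspect_ratio=1,
    max_aspect_ratio=1,
    padding_y=0,
)
# vertical color bar on the right-hand side
axes_options = {'color': {'orientation': "vertical", "side": "right"}}
# color takes the 2-D array of correlation values
heatmap = plt.heatmap(color=corr.values, axes_options=axes_options)
plt.show()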
How To Code A Heatmap In Missingno
While all the packages above focus on visualizing your data, we need one to visualize what isn’t there.
MissingNo is a simple toolset that allows you to create visualizations and utilities to visualize your missing data quickly.
This is a no-brainer use in situations where you’re lacking data and wondering about the impact it will have on your analysis.
Read more about MissingNo here.
import missingno as msno
import numpy as np
import random
# Jupyter magic: render plots inline (remove this line in a plain .py script)
%matplotlib inline
# every (row, column) position in the data frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# set a random 20% of those positions to NaN
for row, col in random.sample(ix, int(round(.2 * len(ix)))):
    df.iat[row, col] = np.nan
# show our heatmap
msno.heatmap(df)
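As far as we understand it, msno.heatmap does not plot the data values themselves but the nullity correlation: how strongly a value being missing in one column goes together with a value being missing in another. The rough pandas equivalent below is a sketch under that assumption, not MissingNo’s own code (the nullity_correlation helper and the small demo frame are hypothetical). Columns that are never or always missing have a constant indicator, so their correlation is undefined and they are dropped first.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation between columns' missing-value indicators (True = missing)."""
    nulls = df.isnull()
    # keep only columns that are sometimes, but not always, missing
    varying = nulls.loc[:, nulls.any() & ~nulls.all()]
    return varying.astype(float).corr()


demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],  # missing exactly where "a" is
    "c": [np.nan, 2.0, 3.0, 4.0, np.nan, 6.0],
    "d": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],        # never missing, so dropped
})
result = nullity_correlation(demo)
print(result.round(2))
```

Here "a" and "b" get a nullity correlation of 1. Because the nulls in the wine example above were inserted at random, expect its nullity correlations to sit close to 0.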
How To Code A Heatmap in Cufflinks
Cufflinks is great for the average data scientist.
This package is built on top of Plotly and pandas, which suits data scientists well, since most of their data lives in DataFrames.
It’s one of my personal favorites, and it really starts to show its strength when it comes to “stacking” charts together.
While this package packs a lot of power, it does not seem to be actively maintained: no commits appear to have been merged within the last three years.
It’s still a powerful tool you should know; read more about it here.
import cufflinks as cf
from plotly.offline import iplot
cf.go_offline()  # run cufflinks in offline mode
cf.set_config_file(offline=False, world_readable=True)
# cufflinks adds .iplot() to pandas DataFrames
corr.iplot(kind='heatmap', colorscale='rdpu')
# some other colors that can be used for cufflinks heatmap
# dark2, dflt, ggplot, gnbu
# greens, greys, oranges
# original, orrd, paired
# pastel1, pastel2, piyg
# plotly, polar, prgn,
# pubu, pubugn, puor,
# purd, purples, rdbu
# rdgy, rdpu, rdylbu,
# rdylgn, reds, set1
# set2, set3, spectral
# ylgn, ylgnbu,
# ylorbr, ylorrd
How To Code A Heatmap in Lightning Viz
Finally, a Python-based graphing solution for your web apps.
Lightning provides API-based access to all your apps and is supported in Python, Javascript, Scala, and R.
I’m not sure this project is still maintained: its SSL certificate has lapsed, and its test server has been shut down.
That said, if you’re willing to run your own server, the code is still available and usable.
Read more about Lightning here.
from lightning import Lightning
# pass host=... to point at your own Lightning server
lgn = Lightning(ipython=True)
lgn.matrix(corr, colormap='BuPu',
           row_labels=list(corr.index.values),
           column_labels=list(corr.columns.values),
           width=500,
           description="HeatMap")
lgn.open()
Recapping Heatmaps In Python:
We’ve learned how to create heatmaps with the following eight Python packages:
- Lightning Viz
- Cufflinks
- MissingNo
- bqplot
- ggplot
- Matplotlib
- Seaborn
- Plotly
We’ve also learned when to use each one, some pros and cons of each package, and the full Python code to implement them on your own.
Now that you can visualize your data, get out there and make a heatmap!
Other Articles In Our Machine Learning 101 Series
We have many brief guides that go over some of the fundamental parts of machine learning.
Some of those guides include:
- Reverse Standardization: Another staple in our 101 series is an introductory article teaching you about scaling and standardization.
- Parameter Vs. Variable: Commonly misunderstood – these two aren’t the same thing. This article will break down the difference.
- Criterion Vs. Predictor: One of the first steps in statistical testing is identifying the independent and dependent variables.
- Welch’s T-Test: Do you know the difference between Student’s t-test and Welch’s t-test? Don’t worry, we explain it in-depth here.
- SelectKBest (Sklearn): Now that you know how to make a heatmap for your variables, make sure your feature selection is good!
- CountVectorizer vs. TfidfVectorizer: Interested in learning NLP? This is a great guide to jump into after learning about two famous distributions.
- Normal Distribution Vs. Uniform Distribution: Two key distributions will pop up everywhere in data science.
- Gini Index Vs. Entropy: Understanding these two fundamental parts of decision trees is key for data science!