
If I had to pick only one tool that makes machine learning easier for everyone...

Nov 04, 2020 · 5 mins read

If you read my posts or take a look at my projects, it comes as no surprise that I use Google Colab a lot in my machine learning projects. Each time I start a new project, the first thing I do is create a Colab notebook. Recently I was wondering why I keep doing that and what’s so special about it.

Let me just mention that it’s not an advertisement and nobody from Colab asked me for this, yet.

1. Free GPU and TPU

Colab gives you a GPU (Tesla K80) and a TPU that you can use for a grand total of $0. Once you start a Colab session, your notebook can use these resources for up to 12 hours, which is usually enough for short-term experiments. One limitation, which I already wrote about in another post, is that although the resources are free, they are neither guaranteed nor unlimited:

Colab is able to provide free resources in part by having dynamic usage limits that sometimes fluctuate, and by not providing guaranteed or unlimited resources. This means that overall usage limits as well as idle timeout periods, maximum VM lifetime, GPU types available, and other factors vary over time. Colab does not publish these limits, in part because they can (and sometimes do) vary quickly.
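Since the GPU type you get can change from session to session, it can be handy to check what you actually got before starting a long run. The snippet below is a minimal sketch, not an official Colab API: it assumes that Colab TPU runtimes set the `COLAB_TPU_ADDR` environment variable and that GPU runtimes ship NVIDIA’s `nvidia-smi` tool. `describe_accelerator` is a hypothetical helper name.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional


def describe_accelerator(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "TPU", "GPU: <model name>" or "CPU" for the current runtime.

    Assumptions (not an official Colab API): TPU runtimes expose the
    COLAB_TPU_ADDR environment variable, GPU runtimes provide nvidia-smi.
    """
    env = os.environ if env is None else env
    if "COLAB_TPU_ADDR" in env:
        return "TPU"
    if shutil.which("nvidia-smi"):
        # Ask the NVIDIA driver for the GPU model name, one line per GPU.
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        names = result.stdout.strip().splitlines()
        if result.returncode == 0 and names:
            return f"GPU: {names[0]}"
        # nvidia-smi exists but failed: treat the runtime as CPU-only.
    return "CPU"


print(describe_accelerator())
```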

Nevertheless, you can use a GPU or TPU free of charge without worrying about exceeding a budget or fiddling with CUDA drivers… which brings us to the next point.

2. No problems with environment

Figure 1.

Maybe it’s just me, but configuring environments for machine learning projects can be really irritating. Sure, it’s a piece of cake when I only have to recreate an environment from a requirements.txt file. When I have to install every single package manually and fight version conflicts, it can get out of hand. And it’s a real pain in the neck when I have to set up the whole environment from scratch, installing GPU drivers, CUDA and so on. Urgh.

In Google Colab notebooks you don’t need to think about any of this, because the environment is ready to use and the most popular frameworks (numpy, pandas, TF, PyTorch and whatnot) are already installed. Of course, you can still install additional packages or change their versions. But if you don’t need to, just start a session and code.
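If a project needs specific versions, you can compare what is preinstalled with what you want before reinstalling anything. Here is a minimal sketch using the standard library’s `importlib.metadata` (Python 3.8+); the helper name `missing_pins` and the pinned versions are hypothetical examples, not recommendations.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List


def missing_pins(wanted: Dict[str, str]) -> List[str]:
    """Return "name==version" specs for packages that are absent or differ."""
    specs = []
    for name, wanted_version in wanted.items():
        try:
            installed = version(name)  # version string of the installed distribution
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted_version:
            specs.append(f"{name}=={wanted_version}")
    return specs


# Hypothetical pins for a project; replace them with your own requirements.
pins = {"numpy": "1.19.5", "pandas": "1.1.5"}
todo = missing_pins(pins)
if todo:
    print("In a Colab cell run: !pip install " + " ".join(todo))
```

Colab cells accept shell commands prefixed with `!`, so the printed line can be pasted straight into a cell. If the package was already imported in the session, you will usually need to restart the runtime for the new version to take effect.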

3. Integration with Drive and GitHub

There are two tools I use together with Colab notebooks: Google Drive and GitHub.

Drive gives you enough space (15 GB for free) for your datasets and model checkpoints, which makes it a perfect place to store data for your Colab notebooks, unless you make that one silly mistake. Drive is also where your notebooks are saved by default, so you can easily share your code and collaborate with others.
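In Colab, mounting Drive takes two lines: `from google.colab import drive` and `drive.mount("/content/drive")`. The sketch below wraps them so the same notebook can also run outside Colab by falling back to a local folder. The helper names are hypothetical, and the `/content/drive/MyDrive` path is an assumption: the name of the Drive root folder has changed over time, so check it in Colab’s file browser.

```python
from pathlib import Path


def storage_root(local_fallback: Path = Path("colab_data")) -> Path:
    """Return a folder for datasets and checkpoints.

    Inside Colab this mounts Google Drive; anywhere else it falls back to a
    local directory so the same code still runs.
    """
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        root = local_fallback
    else:
        drive.mount("/content/drive")  # asks you to authorize access to Drive
        root = Path("/content/drive/MyDrive")  # assumption: folder name may differ
    root.mkdir(parents=True, exist_ok=True)
    return root


def save_checkpoint(root: Path, name: str, payload: bytes) -> Path:
    """Write raw checkpoint bytes to <root>/checkpoints/<name>."""
    target = root / "checkpoints" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target
```

In a notebook you would call something like `save_checkpoint(storage_root(), "epoch_01.bin", model_bytes)`, where `model_bytes` stands for whatever your framework serializes.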

Working alone, or do you prefer to have everything under (version) control? Then you will probably keep your notebooks in a GitHub repository. And that’s cool. You can connect Colab to GitHub and use code from your repositories (public or private). At the end of each session, you can save your changes back to GitHub with a single click.
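Besides the built-in GitHub dialog, Colab can open a notebook straight from a URL of the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`, which is handy for “Open in Colab” links in a README. Here is a small sketch that builds such a link; the owner, repository and path are placeholders, and the default branch name `main` is an assumption (older repositories often use `master`).

```python
from urllib.parse import quote


def colab_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Build a link that opens a GitHub-hosted notebook directly in Colab."""
    if not path.endswith(".ipynb"):
        raise ValueError("expected a path to an .ipynb notebook")
    # quote() escapes spaces and similar characters but keeps "/" separators.
    return (
        "https://colab.research.google.com/github/"
        f"{owner}/{repo}/blob/{branch}/{quote(path)}"
    )


# Hypothetical repository, used only for illustration.
print(colab_url("your-user", "your-repo", "notebooks/experiment.ipynb"))
```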

4. Jupyter!


Figure 2. Each Colab project works on top of a Jupyter notebook

Each Colab project works on top of a Jupyter notebook, and Jupyter notebooks are incredibly useful for beginners and experienced developers alike. Thanks to cell-based execution, you can split your code into several cells, visualize results and write Markdown documentation (with TeX formulas) all in the same place.
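Under the hood, a notebook is just a JSON document holding a list of cells. As a rough illustration of that cell-based structure, here is a sketch that builds a minimal nbformat 4 notebook with one Markdown cell (including a TeX formula) and one code cell, using only the standard library. The helper names are hypothetical, and Colab adds its own metadata on top of this bare skeleton.

```python
import json
from typing import Dict, List


def make_cell(kind: str, source: str) -> Dict:
    """Create one nbformat 4 cell of type "markdown" or "code"."""
    cell = {"cell_type": kind, "metadata": {}, "source": source.splitlines(keepends=True)}
    if kind == "code":
        cell["execution_count"] = None  # the cell has not been run yet
        cell["outputs"] = []  # no outputs until it is executed
    return cell


def make_notebook(cells: List[Dict]) -> str:
    """Serialize cells into a minimal .ipynb document (nbformat 4.4)."""
    notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    return json.dumps(notebook, indent=1)


nb = make_notebook([
    make_cell("markdown", "## Loss\nWe minimize $L = \\frac{1}{n}\\sum_i (y_i - \\hat{y}_i)^2$."),
    make_cell("code", "import numpy as np\nprint(np.mean([1, 2, 3]))"),
])
```

Writing the `nb` string to a file such as `sketch.ipynb` gives you something you can upload to Colab or open in Jupyter.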

Jupyter does a good job in any experiment where you need to explore data, compare models and produce lots of visualizations in one place. These notebooks are yet another advantage of using Colab for data science projects. Enough said.

5. Colab Pro for those who want more power

If the free version of Colab (and its resources) is not enough, try Colab Pro. For $10/month it gives you faster GPUs (T4 or P100), longer runtimes and more memory. However, just like in the free plan, resources aren’t guaranteed. This is what Colab says about it:

In order to offer faster GPUs, longer runtimes and more memory in Colab for a relatively low price, Colab needs to maintain the flexibility to adjust usage limits and hardware availability on the fly. Although we make no guarantees, we anticipate that most subscribers who use Colab Pro as it is intended to be used – for interactive computing – will experience few if any usage limits.

In addition, resources are prioritized for “subscribers who have recently used less resources”, so if you use Colab too extensively, you get fewer resources. So, is it worth it? It depends on your needs… and remember that there are also a few other services to consider.

Special feature: Corgi and Kitty modes


Figure 3. The most underestimated feature of Google Colab

Killer feature. Funky alternative to rubber duck debugging.

You don’t have to choose between dogs and cats; you can have both (Figure 3). They walk from left to right and back again while you’re working on your advanced project. Of course, you can ignore them or turn the feature off (if you are a grumpy guy), but why would you…

Conclusion

Although Colab is awesome for small projects, it may not be enough for bigger ones. Then you have to choose between Colab Pro ($10/month) and other platforms (GCP, AWS and so on). If you just want to play with machine learning and share code with your friends, I think Colab is the right tool for the job.

Let me know if you use other features or have a different opinion on Google Colab.