Organizing ML experiments with Neptune

Playing with Neptune - a tool for organizing, tracking and visualizing experiments

Whenever I start a new project that will require running hundreds of experiments, integrating an experiment tracking tool is the very first thing I add to my code. Such tools can do much more than visualize metrics and store the hyperparameters of each run. In this post I introduce and describe Neptune, the tool I have used for my recent projects.

Experiment “management” tools

Tools for experiment tracking give you a huge boost for your machine learning (and not only) projects. You can use them to track hyperparameters, visualize graphs (e.g. metrics) and images (confusion matrix or misclassified samples), store artifacts (model weights) and much more.

Fortunately, people have noticed that an experiment management tool is a must-have for well-organized projects. Hence, more and more tools and services are being created for this purpose: Comet, MLflow, Neptune and Weights & Biases, to name a few.

Which tool do I use?

I personally use Neptune for my projects. I tried it once and liked it, though frankly I have yet to find enough time to test the other tools. Neptune is “the most lightweight” experiment management tool, as stated by its authors. You can give it a try anytime: it is available as a service and is completely free to use for personal projects.

There are also other options available for companies or more demanding users.

The “Team” plan allows you to create 10x more users and gives you 10x more disk space than the Free plan, but it costs $79 for every user in your team. There’s no such thing as a free lunch… However, for education, research or non-profit purposes the Team plan is (at the moment) entirely free. Neptune can also be deployed on-premises (hosted within your organization) as part of the “Enterprise” plan, which is aimed mainly at large organizations that prefer not to rely on SaaS solutions.

Using Neptune in code

Integrating Neptune with your code is a piece of cake, really. It is basically a Python module with a simple API; for instance, to log a couple of parameters and a metric, all you have to do is shown below. Of course, instead of these magic numbers you will log values returned by your machine learning framework (such as PyTorch Lightning, which has great support for Neptune and other tools).

import neptune

# select your project first; replace with your own 'workspace/project'
neptune.init('my_workspace/my_project')
neptune.create_experiment(params={'lr': 0.1, 'dropout': 0.4})
neptune.log_metric('test_accuracy', 0.84)
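
Besides scalars, the client can also log images, e.g. a confusion matrix. The counting helper below is an illustrative, dependency-free sketch; only the `neptune.log_image` call at the end comes from the actual (legacy) API, and the plotting helper is hypothetical.

```python
# Plain-Python confusion matrix as nested lists; a stand-in for whatever
# your framework or sklearn would produce.
def confusion_counts(y_true, y_pred, n_classes):
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, pred in zip(y_true, y_pred):
        matrix[true][pred] += 1
    return matrix

# With Neptune and matplotlib you would then do something like:
#   fig = plot_confusion_matrix(confusion_counts(...))  # hypothetical helper
#   neptune.log_image('confusion_matrix', fig)
```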

Visit Neptune’s official page, which is full of examples of how it works and lists all the features you may be interested in. Below I show only the features I used and how they look in my project, but note that there are plenty of other functionalities your project may need.

A picture is worth a thousand words, so…


Figure 1. Dashboard: list of projects, disk usage and more.

Let’s start with the dashboard view (Figure 1.), which lists all your projects (including currently running experiments).

In the project view (Figure 2.), browse the list of experiments with their parameters and results (metrics, e.g. test_loss). You can modify the displayed columns, which include experiment information (author, running time, tags) and metrics (Neptune can even suggest adding a metric column based on what you log).

Figure 2. Experiments: recent runs, parameters and results.

Visualize metrics: by default each chart is displayed separately (Figure 3.), but you can create custom views as well and show multiple metrics in the same chart (Figure 4.).

Figure 3. Visualization of metrics and other values e.g. learning rate.
Figure 4. Custom charts: multiple metrics in the same chart.
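
Charts like these come from logging the same metric name repeatedly, one point per step or epoch. A minimal sketch, where the loop and the loss values are placeholders and `neptune.log_metric` (with its `x`/`y` arguments) is the legacy-client call:

```python
# Placeholder training loop that emits one point per metric per epoch;
# passing neptune.log_metric as `log_metric` produces charts like the above.
def run_training(n_epochs, log_metric):
    history = []
    for epoch in range(n_epochs):
        train_loss = 1.0 / (epoch + 1)   # stand-in for a real training loss
        val_loss = train_loss + 0.05     # stand-in for a validation loss
        log_metric('train_loss', x=epoch, y=train_loss)
        log_metric('val_loss', x=epoch, y=val_loss)
        history.append((train_loss, val_loss))
    return history

# With Neptune: run_training(10, neptune.log_metric)
```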

Save any file (e.g. weights, output files) as an artifact (Figure 5.). Review them in a preview window (if possible) or download them to use for inference.

Figure 5. Artifacts: save/download artifacts (e.g. model weights).

Not using these is a waste of time

Change my mind.

Connecting your code to a “management” tool is child’s play. On top of that, in PyTorch Lightning for example, once you decide which metrics should be logged, you can use loggers (such as TensorBoard, Neptune or Weights & Biases) interchangeably.
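
For instance, a small factory can make the backend a one-line switch. The logger class names follow PyTorch Lightning's documented loggers API; the factory itself and its arguments are my own sketch:

```python
# Sketch: pick a Lightning logger by name. The Trainer accepts any of them,
# so swapping backends does not touch the training code.
def make_logger(backend, project=None):
    if backend == 'tensorboard':
        from pytorch_lightning.loggers import TensorBoardLogger
        return TensorBoardLogger(save_dir='logs/')
    if backend == 'neptune':
        from pytorch_lightning.loggers import NeptuneLogger
        # API token is usually read from the NEPTUNE_API_TOKEN env variable
        return NeptuneLogger(project_name=project)
    raise ValueError(f'unknown backend: {backend}')

# trainer = pl.Trainer(logger=make_logger('neptune', 'me/my-project'))
```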

You can run experiments in the background; all logs, images and results will be saved for you to look over later.
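
One common way to do that (the `train.py` entrypoint is a placeholder for your own script) is to detach the run from your terminal:

```shell
# Launch the (placeholder) training script in the background; stdout and
# stderr go to a local file while metrics keep streaming to Neptune.
nohup python train.py > train.log 2>&1 &
TRAIN_PID=$!
echo "training started with PID ${TRAIN_PID}; logs in train.log"
```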

Play with different tools: some can be used as SaaS, others may run locally. They offer various features, certainly much more than what I described, and they complement each other, so give one a try, then check out another one. Find the one that you and your team like the most and keep your projects well-organized. Thank me later.