Year by year, machine learning models are getting more advanced and more accurate, and they keep breaking performance records. We collect more data and use more complex models. Although it’s a good trend, it makes things harder for people who are starting with ML on low-end desktops. But that doesn’t mean they can’t do cool things.
Does it make sense to use your low-end PC?
You can find numerous articles about recommended workstations for machine learning, or advice on building the perfect computer to train your models. Authors and commenters claim that running ML training on your notebook makes no sense and is just a waste of time.
It comes as no surprise that you can’t do deep learning research or run state-of-the-art models on a low-end computer. But what if you just want to get your hands dirty with machine learning, you’re not really sure if it’s for you and you don’t want to buy a new PC for that? Or if you simply can’t afford it?
You can play with toy examples
Unless your PC is literally 30 years old, you should still be able to process small datasets and train simple models on your hardware. Although this limits your machine learning experience, it may be enough to test popular frameworks or check your implementation of simple classifiers.
Datasets such as CIFAR-10 or MNIST can be used for classification tasks with low resources, and if you prefer regression problems, there are also a lot of small datasets publicly available. If you’re limited by a weak GPU rather than by data storage or I/O, use simpler models instead of recent state-of-the-art architectures.
If that’s not an option or you want to build and run something bigger, there is another way.
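To give a feel for how little hardware a toy example needs, here is a minimal sketch of a nearest-centroid classifier written in plain Python, with no framework at all. The synthetic two-cluster dataset and the helper names (`make_blobs`, `fit_centroids`, `predict`) are made up for illustration; treat it as a starting point for checking your own implementation rather than a reference one.

```python
import random

def make_blobs(n_per_class, centers, spread, seed=0):
    """Generate a toy 2D dataset: one Gaussian cluster per class."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels

def fit_centroids(points, labels):
    """Training: compute the mean point of each class."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, point):
    """Prediction: return the class whose centroid is closest to the point."""
    x, y = point
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - x) ** 2 + (centroids[c][1] - y) ** 2,
    )

# Hypothetical toy data: two well-separated clusters, different seeds for train/test.
train_x, train_y = make_blobs(200, centers=[(0, 0), (5, 5)], spread=1.0, seed=0)
test_x, test_y = make_blobs(50, centers=[(0, 0), (5, 5)], spread=1.0, seed=1)

centroids = fit_centroids(train_x, train_y)
correct = sum(predict(centroids, p) == y for p, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"accuracy: {accuracy:.2f}")
```

Something like this runs in well under a second on pretty much any machine, so the hardware is not what stops you from getting started.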
You can train your models online
When you keep hearing about cloud services, virtual machines, containers and other stuff, it may sound overwhelming. But within these unbelievably long lists of AWS or Google Cloud products, there are a few easy-to-use tools that allow you to make the most of machine learning even with your budget computer.
Of course, you have to pay for these online resources. The price depends mostly on the hardware tier, the amount of storage and the time you use them. In most cases, though, you will be able to use a free tier that gives you quality hardware as long as you stay within a limit on, e.g., time or disk space. There are even some tools that cost absolutely nothing - you can use them as long as you stick to, e.g., a weekly quota (like max. 20 hours per week).
Let me introduce a couple of options I use(d), so that you can see how easy it is to train and test your (more complex) machine learning models without investing any money in a new piece of hardware. If you already want to know what’s best for your case, just scroll down to the Conclusion section, I don’t mind 🙂
Although Kaggle is not primarily meant for renting computing resources, it has a great feature called kernels (or notebooks) in which you can execute your code on their servers. Kaggle supports both Jupyter Notebooks and pure Python or R scripts. The most common data science libraries are already installed in both the Python and R environments, so you don’t have to worry about setting them up.
There is no limit on running notebooks that use the CPU only, but GPU and TPU usage is limited to 40 and 30 hours per week respectively (at the time of writing). If that’s enough for you and you don’t have any high-end hardware to train your models locally, this may be the best option. An additional benefit is the great Kaggle community: a lot of public notebooks, tutorials and courses. This way you can not only run your code for free, but also accelerate your learning 📚
You can also use one of the public datasets or upload your own datasets to Kaggle (limited to 50GB) and make them either public or private. And once a dataset is uploaded, using it in kernels is really easy. Go to the dataset page, e.g. the Credit Card Fraud Detection dataset, and click the New Notebook button on the right side. Your notebook will be created and you can read the data and play with it immediately. That’s it!
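Once the notebook is open, the attached dataset shows up as regular files under `/kaggle/input`. Here is a small sketch of how you might look for them; the helper name `list_data_files` is my own, and the function simply returns an empty list when the folder doesn’t exist (for example, when you run it on your own machine).

```python
from pathlib import Path

def list_data_files(input_dir="/kaggle/input", pattern="*.csv"):
    """Return all files matching `pattern` below `input_dir`, sorted by path."""
    root = Path(input_dir)
    if not root.is_dir():
        return []  # not running on Kaggle, or no dataset attached
    return sorted(root.rglob(pattern))

# Print every CSV file that the notebook can read.
for path in list_data_files():
    print(path)
```

Each printed path can then be passed to a loader such as `pandas.read_csv`, since pandas is one of the libraries already installed in Kaggle notebooks.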
What’s the alternative?
Google Colaboratory (Colab) is similar to Kaggle notebooks in the sense that it allows you to execute Python in the browser. However, unlike Kaggle, this is Colab’s main functionality (there are no datasets, competitions etc.). This means that Colab focuses more on notebook features and has much more to offer there.
In both Kaggle and Colab you can use interactive Jupyter notebooks (Colab calls them “Colab notebooks”) that let you install packages in an isolated environment, load your data, add Markdown cells to document the code, and run code cells and see the results as soon as they finish.
With regard to data, there is no public dataset repository in Colab, but instead you can upload data files right from your desktop (cool if you need only like one CSV file) or you can use Google Drive that integrates well with Colaboratory. Thanks to that, you can also share documents with other people and work together on your machine learning models.
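Here is a minimal sketch of the Google Drive route. It uses `drive.mount` from the `google.colab` package, which is only importable inside Colab, so the helper falls back to a local folder everywhere else. The function name `get_data_dir` and the folder names `data` and `ml-data` are placeholders I made up; use whatever folder you keep your files in.

```python
from pathlib import Path

def get_data_dir(local_dir="data", drive_subdir="ml-data"):
    """Return a data folder on Google Drive in Colab, or a local folder elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path(local_dir)  # running locally: use a plain folder
    drive.mount("/content/drive")  # asks for permission in the browser
    return Path("/content/drive/MyDrive") / drive_subdir

data_dir = get_data_dir()
print(f"Reading data from: {data_dir}")
```

The nice side effect is that the same notebook runs unchanged on your budget PC and in Colab.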
Costs and limits? Colab is free, just like Kaggle, and has just one simple limitation: instead of a weekly quota, you are allowed to run your code for up to 12 hours. Then it’s terminated. And that’s it. Okay, there’s another thing - although your resources are free, they are not guaranteed or unlimited:
Colab is able to provide free resources in part by having dynamic usage limits that sometimes fluctuate, and by not providing guaranteed or unlimited resources. This means that overall usage limits as well as idle timeout periods, maximum VM lifetime, GPU types available, and other factors vary over time. Colab does not publish these limits, in part because they can (and sometimes do) vary quickly.
Google has also introduced Colab Pro, which offers faster GPUs, longer runtimes (up to 24h instead of 12h with free Colab) and more memory for your notebooks. It costs only $9.99/month, but is currently available only in the US.
Amazon EC2, GCP and Azure
I can’t possibly describe all of their features in this post. Amazon, Google and Microsoft are at the top of cloud services and I don’t think that will change anytime soon. They allow you to store data and run computations on virtual servers.
In their offers you may find free tiers with limited resources, or you may get a free starting budget ($50, $100 or $200 - it may vary, or you may get a special coupon from a workshop, workplace or whatnot). With a paid configuration, you get access to better GPUs, more memory and so on. Then you pay for it, usually at a per-hour price for the time you use the machine.
But Amazon, Google and Microsoft have much more to offer than just storage for your datasets and resources for your computation. All of them go one level higher and offer Machine Learning as a Service (MLaaS), which you may know as Azure ML Studio, AWS ML or Google Cloud ML Engine. These services take care of data processing and of training and evaluating models for you. You can also use their pre-trained models and call them directly from your application using a REST API.
Back to having your new PC for deep learning - when should you consider it?
Having your own deep learning rig
As I said at the beginning, there are different options to enjoy machine learning while still using your budget computer, which is useful when you only want to play with ML a bit or you can’t afford a new deep learning PC. But sometimes having one is a better and more cost-efficient option than renting GPU resources.
When does it pay off? It depends mostly on how much time you spend running your models and how much disk space you need (so whether you would also need to pay for cloud storage). If you use your computer for complex tasks for like 8 hours a day, it may be a better option to spend money on a decent PC once and not pay for every hour of GCP resources.
Not sure if this is your scenario? Unfortunately, you need to calculate it individually - with your average energy cost, the amount of time you spend training and the amount of disk space you need for your data. After you compare it with the different tiers from AWS, Azure or GCP, you’ll know what’s best for you.
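The calculation itself is simple enough to sketch in a few lines. Everything numeric below is a made-up placeholder (PC price, hourly cloud price, storage price, power draw and energy price), so plug in your own values; the function names are mine as well.

```python
def monthly_cloud_cost(hours_per_day, price_per_hour, storage_gb,
                       storage_price_per_gb, days=30):
    """Monthly cost of renting: compute time plus cloud storage."""
    compute = hours_per_day * days * price_per_hour
    storage = storage_gb * storage_price_per_gb
    return compute + storage

def monthly_energy_cost(hours_per_day, power_kw, price_per_kwh, days=30):
    """Monthly electricity cost of running your own PC."""
    return hours_per_day * days * power_kw * price_per_kwh

def break_even_months(pc_price, cloud_monthly, energy_monthly):
    """Months after which buying beats renting, or None if it never does."""
    savings = cloud_monthly - energy_monthly
    if savings <= 0:
        return None  # renting is cheaper than just the electricity bill
    return pc_price / savings

# Hypothetical numbers, not real prices from any provider.
cloud = monthly_cloud_cost(hours_per_day=8, price_per_hour=0.50,
                           storage_gb=100, storage_price_per_gb=0.02)
energy = monthly_energy_cost(hours_per_day=8, power_kw=0.4, price_per_kwh=0.25)
months = break_even_months(pc_price=2000, cloud_monthly=cloud, energy_monthly=energy)

print(f"cloud: {cloud:.2f}/month, energy: {energy:.2f}/month")
print("break-even after", None if months is None else round(months, 1), "months")
```

If the break-even point is longer than you expect the hardware to stay useful, renting is probably the safer bet.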
Remember: a low-end PC was once a high-end PC
Finally, there’s a small caveat to having your own hardware beast for deep learning. Remember that every PC considered low-end (yours as well) was a high-end PC just a few years ago. There’s no way to predict the future, but there’s a chance that a few years from now, you’ll find yourself wondering again if it’s time to get a better PC for the then state-of-the-art deep learning.
Conclusion
If you want to try out machine learning algorithms or you need to train a model on a small amount of data for your project - use Kaggle or Google Colab, as they are free. Kaggle has a lot of great resources, datasets and a community, whilst Colab has more features in its notebooks and integrates with your Google Drive.
If that’s not enough - consider signing up for Colab Pro or experimenting with virtual GPUs from Amazon, Google, Microsoft or any other provider of virtual machines that you can rent for the time you need them. You can cancel the subscription if you see that you are going to pay a lot of cash for these resources. Then you may decide to look for a deep learning PC that you can buy once and use locally whenever and for as long as you want.
Hope you make the right decision and you enjoy machine learning!