The Best OS For Data Science [From An Ex-Data Scientist]

I think people overcomplicate this.

With so many options available, people think the operating system question is up for debate.

It’s not.

There’s a definitive answer – and this blog post will give it to you (and a little more).

I will tell you why macOS is the best Operating System (OS) for data science; then, I’ll give you some other things to consider (if you want to go down that path).

This will be a quick one, so let’s jump right in.



Why Do Data Scientists Need An Operating System?

Choosing an operating system may seem like an unimportant detail for data scientists, but it’s weirdly essential.

I mean, it’s important for obvious reasons – without an operating system, it would be impossible to do anything on your computer.

In today’s age, where computers are on every desk in the world, everyone relies on some operating system – not just data scientists.

So while you need an operating system at the minimum – you can do yourself a favor by picking the right one.

Your selection criteria should come down to three questions:

  • What do other devs on my team use?
  • Which operating system makes my job the easiest?
  • Which operating system will make me better at my job?


Which OS Is Best For Data Science?

macOS is the best operating system for data scientists (and anyone who writes code) because it’s the best of both worlds.

The easiest way to understand this is by looking at the image below:

[Image: chart comparing Linux, Windows, and macOS]

While the Linux operating system mimics a production environment and is insanely easy to code on – it’s a brutal operating system to use.

At any moment, you can accidentally wipe your entire PC’s storage (speaking from experience).

Not only is it a high-risk OS, but it also comes with an insanely steep learning curve.

Most work is done through the terminal, and learning the terminal is the equivalent of learning another programming language (on top of the one you’re already learning).

Now, we contrast this to the other end of the spectrum.

The Windows operating system is straightforward to use; you could give it to anyone, and they’d be able to do almost everything.

The problem with Windows is that it’s not UNIX-based.

So, you’ll be writing and running all of this code, and once it’s done, you’ll put it on a production server (which will most likely be running Linux or another UNIX-like OS), and the code will start running into massive amounts of problems.

Errors will arise that don’t exist in your dev environment, so you’re put in a position where you’ll be trying to solve issues that only exist in one realm.

Trust me, this scenario is a nightmare – and it’s why many programmers don’t use Windows by default.
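To make that concrete, here is a minimal sketch of one common culprit: file paths. It is only an illustration, not the only way dev and production can drift apart, and the file name `data\raw\train.csv` is made up for the example. It uses Python's standard `pathlib` to show how the same hardcoded string is read on Windows versus on a UNIX-like server.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A path hardcoded the way it is often typed on a Windows dev machine.
# (Hypothetical file name, used only for illustration.)
hardcoded = r"data\raw\train.csv"

# How Windows reads the string: three separate components.
windows_parts = PureWindowsPath(hardcoded).parts

# How a UNIX-like production server reads the same string:
# backslashes are ordinary characters, so it is one odd file name.
posix_parts = PurePosixPath(hardcoded).parts

# Building the path from components avoids baking in a separator.
# PurePosixPath is used here to show what the server would see.
portable = PurePosixPath("data") / "raw" / "train.csv"

print(windows_parts)
print(posix_parts)
print(portable)
```

On Windows the string splits into a folder, a subfolder, and a file; on the server it is a single file name that almost certainly does not exist. In real code you would likely write `Path("data") / "raw" / "train.csv"` and let `pathlib` pick the right separator for whatever machine the code is running on.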

Okay – where does that leave us?


Windows is super easy to use but doesn’t give you a dev environment that matches production, and Linux is incredible for coding but has too steep a learning curve to make it worth it. So where do we go?

Right in the middle.

This is why macOS is perfect for data scientists.

You get a UNIX-based system that creates an incredible dev/production relationship while maintaining high ease of use.

And with Apple’s recent M1 and M2 processors, Mac laptops have only become more compelling for this kind of work.

I know some people say this debate is “close” – but it’s not; the answer to the best operating system for data scientists is macOS.

And if we go back to our original three criteria:

  • What do other devs on my team use?
  • Which operating system makes my job the easiest?
  • Which operating system will make me better at my job?

Most devs (at least on the teams I’ve worked on) have used macOS.

The ease of use over Linux will make your job easier, and the consistency from dev to production will make you a better data scientist than someone using Windows.

It really is the whole package.


Is Linux or Windows Better For Data Science?

If I had to choose between Linux and Windows for my main computer as a data scientist, I’d go with a Linux operating system.

Putting models into production is incredibly important for data scientists, and I don’t trust the Windows dev to Linux production pipeline as it always leads to problems.

While I’d wish it were macOS the whole time, I would enjoy that sweet, sweet Linux dev to Linux production pipeline.


Which Linux Distro is Best For Data Science?

The best Linux Distro for data scientists is the distro they will use in production.

Matching your production OS to your dev OS is a massive advantage in the long run – so I’d choose whatever operating system my production server uses.
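If you want a quick sanity check, something like the sketch below can flag a mismatch between your dev machine and production. It is a rough illustration using Python's standard `platform` module; the `PRODUCTION_SYSTEM` value and both helper functions are assumptions made up for this example, so swap in whatever your servers actually run.

```python
import platform

# Assumption: production runs Linux. Change this to match your own servers.
PRODUCTION_SYSTEM = "Linux"


def describe_environment() -> dict:
    """Collect basic facts about the machine this code is running on."""
    return {
        "system": platform.system(),    # e.g. "Linux", "Darwin" (macOS), "Windows"
        "release": platform.release(),
        "machine": platform.machine(),  # e.g. "x86_64", "arm64"
        "python": platform.python_version(),
    }


def matches_production(env: dict, production_system: str = PRODUCTION_SYSTEM) -> bool:
    """Return True when the dev machine's OS family matches production."""
    return env["system"] == production_system


env = describe_environment()
print(env)
if not matches_production(env):
    print(f"Heads up: developing on {env['system']}, deploying to {PRODUCTION_SYSTEM}.")
```

This only compares the OS family, not the distro or version, so treat a match as a starting point rather than a guarantee.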

However, if I HAD to choose, I’d pick Linux Mint since it’s similar to Windows.


Why Does Your OS Matter In Data Science?

Your OS matters in data science for the same reason that baseball players use gloves in the outfield – it’s easier to get the job done with the right tools.

Your operating system can be your best friend or your worst nightmare.

Instead of risking it – pick macOS and stop worrying about it.


Stewart Kaplan

One comment

  1. Thanks for the eloquent and concise post on the virtues of macOS for Data Science.

    How much more difficult (if at all) has working on macOS become for ML practitioners following nVidia’s withdrawal of CUDA support a few years ago?

    Cheers!
