Truths of a Data Scientist: How Much SQL is Needed For Data Science

There’s a lot of talk about data science and its prerequisites – what one needs to know to break in and become a data scientist.

And rightfully so, given that the field is exploding with opportunities (and high salaries).

But let’s be honest; there are thousands of guides out there telling you that you need to learn a thousand different things (to sell you something, probably).

It’s just not true.

In this post, we’ll focus on one piece of that puzzle: how much SQL you actually need to land your first data science job.

By the end of this post, you’ll know:

  • What SQL is
  • How much SQL is needed for data science
  • Why you’ll need to learn SQL
  • How long it takes to learn SQL
  • Whether you can “get by” as a data scientist without SQL
  • Whether you should spend more of your time on Python or SQL



What is SQL?

SQL is an abbreviation that stands for Structured Query Language. It is the go-to language for accessing, querying, and manipulating data stored in relational database systems.

As a tip, it’s commonly pronounced “sequel,” like a movie sequel, though spelling out the letters (“S-Q-L”) is also widely accepted.
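To make “querying” concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The `customers` table and its rows are made up for illustration; at work you would more likely connect to a server such as PostgreSQL, but the SELECT statement itself looks much the same.

```python
import sqlite3

# An in-memory SQLite database stands in for a real company database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ana", "Spain"), ("Ben", "USA"), ("Chen", "USA")],
)

# A basic query: pick a column, filter the rows, and sort the result.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = 'USA' ORDER BY name"
).fetchall()
print(rows)  # [('Ben',), ('Chen',)]
```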

SQL is used by many relational database management systems (RDBMSs), including MySQL, Microsoft SQL Server, Oracle, Vertica, and PostgreSQL.

Note that each of these systems implements its own dialect of SQL, so the syntax varies slightly from one to the next.

The differences are nowhere near as drastic as those between, say, Python and JavaScript; once you learn one dialect, picking up the others should only take a day or two.
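As one small illustration of how dialects differ, here is how “return only the first few rows” is commonly written in a few systems. Only the SQLite version is actually run below (using Python’s built-in sqlite3 module and a hypothetical `customers` table); the other strings are shown for comparison and are not executed.

```python
import sqlite3

# The same request, "the first 2 customers by id", in a few dialects.
# Only the SQLite entry is executed; the others are for comparison.
TOP_N_QUERIES = {
    "SQLite / PostgreSQL / MySQL": "SELECT name FROM customers ORDER BY id LIMIT 2",
    "Microsoft SQL Server": "SELECT TOP 2 name FROM customers ORDER BY id",
    "Oracle (12c and later)": "SELECT name FROM customers ORDER BY id FETCH FIRST 2 ROWS ONLY",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ana",), ("Ben",), ("Chen",)])

first_two = conn.execute(TOP_N_QUERIES["SQLite / PostgreSQL / MySQL"]).fetchall()
print(first_two)  # [('Ana',), ('Ben',)]
```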


Most popular programming languages have libraries for working with SQL databases (a popular one for Python is SQLAlchemy), so you can write SQL queries inside your applications.

This lets you select, insert, update, and delete data right from your application code.
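A minimal sketch of running SQL from application code. It uses Python’s built-in sqlite3 module rather than SQLAlchemy so that it runs with no extra installs; the `orders` table is hypothetical. The `?` placeholders pass values to the driver separately from the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# Values go in through "?" placeholders instead of being pasted into the SQL string.
conn.executemany("INSERT INTO orders (item, qty) VALUES (?, ?)", [("pen", 3), ("ink", 1), ("pad", 5)])
conn.execute("UPDATE orders SET qty = ? WHERE item = ?", (2, "ink"))  # ink: 1 -> 2
conn.execute("DELETE FROM orders WHERE qty > ?", (4,))                # removes pad (qty 5)
conn.commit()

remaining = conn.execute("SELECT item, qty FROM orders ORDER BY id").fetchall()
print(remaining)  # [('pen', 3), ('ink', 2)]
```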

SQL is also used well beyond applications: analysts and data scientists use it to write queries and pull insights out of data.
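For analysis work, much of the value comes from letting the database do the aggregation. Here is a sketch of a typical “insight” query (totals, averages, and counts per group), again on a hypothetical in-memory SQLite table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 250.0), ("North", 50.0), ("South", 30.0), ("West", 75.0)],
)

# Aggregate inside the database: total, average, and count per region, largest total first.
summary = conn.execute(
    """
    SELECT region, SUM(amount) AS total, AVG(amount) AS avg_sale, COUNT(*) AS n
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
for row in summary:
    print(row)
```

Pushing the GROUP BY to the database means only the small summary travels back to Python, which matters once tables grow to millions of rows.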

SQL may have a reputation as a boring language, but it remains the standard for interacting with structured data stored in databases and is an important skill for any full-stack developer, data analyst, and data scientist.


How Much SQL is Needed For Data Science?

There’s no one-size-fits-all answer to this question, as the amount of SQL needed for data science depends on the specific role you’re after.

If you’re interested in a more “Full-Stack” data science position, where you’ll be responsible for building ETL pipelines, you’ll need to be very proficient with SQL.

These roles are usually one-third data engineer, one-third data scientist, and one-third machine learning engineer.

You’ll be writing a TON of SQL to thrive in this role.
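To give a flavour of pipeline-style SQL, here is a toy transform-and-load step: a raw table (standing in for data already extracted from some source) is deduplicated and aggregated into a clean daily table. The table and column names are hypothetical, and a production pipeline would typically run this on a scheduler against a real warehouse rather than an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw events, as they might land from an upstream source (hypothetical schema).
conn.execute("CREATE TABLE raw_events (user_id INTEGER, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "click", "2024-01-01"),
        (1, "click", "2024-01-01"),  # exact duplicate, common in raw feeds
        (2, "purchase", "2024-01-01"),
        (1, "purchase", "2024-01-02"),
    ],
)

# Transform + load: drop exact duplicates, then aggregate per day and user into a new table.
conn.executescript(
    """
    DROP TABLE IF EXISTS daily_user_activity;
    CREATE TABLE daily_user_activity AS
    SELECT ts AS day,
           user_id,
           COUNT(*) AS events,
           SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchases
    FROM (SELECT DISTINCT user_id, event, ts FROM raw_events)
    GROUP BY ts, user_id;
    """
)

rows = conn.execute(
    "SELECT day, user_id, events, purchases FROM daily_user_activity ORDER BY day, user_id"
).fetchall()
print(rows)
```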


While this may scare you, these roles are usually much more lucrative, as people with the skills to handle MLOps, modeling, servers, and production environments are much harder to find.

Personally speaking, these are the roles I target (more on that in another post).

On the other hand, if your job will mostly involve data analytics and working with Python, you’ll only need to be able to query your data.

These roles are usually closer to analyst roles, and you can squeeze by with strong skills in Python (Pandas).

The easiest way to tell which kind of role it is: pay attention during the interview. If your interviewer emphasizes production environments and asks SQL questions, expect a LOT of SQL.

If your interviewer is not heavily concerned with the amount of SQL you know, you’ll probably be okay with basic knowledge of SQL and SQL commands.
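For a sense of what “basic knowledge” usually covers, here is a sketch of two joins on hypothetical `customers` and `orders` tables: an inner join with GROUP BY, and a LEFT JOIN that finds rows with no match, a pattern that shows up often in SQL interview questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders (customer_id, total) VALUES (1, 20.0), (1, 35.0), (3, 12.5);
    """
)

# INNER JOIN + GROUP BY: how much has each customer spent?
spend = conn.execute(
    """
    SELECT c.name, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY spent DESC
    """
).fetchall()

# LEFT JOIN + IS NULL: which customers have never placed an order?
no_orders = conn.execute(
    """
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
    """
).fetchall()

print(spend)      # [('Ana', 55.0), ('Chen', 12.5)]
print(no_orders)  # [('Ben',)]
```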

Determine what role works best for you, and work backward from that.

The only person who knows how much SQL YOU need is YOU.


Why do you need to learn SQL in data science?

There are two reasons why you’ll need to learn SQL for data science.

1.) You’ll never pass any interviews if you don’t know any SQL.

2.) Data is the engine of data science. You need to be able to get it out of databases.
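Getting data out of a database often means handing query results to another tool. Here is a minimal sketch that writes a query’s results, with column headers, to CSV using only Python’s standard library (the `signups` table is hypothetical, and an in-memory buffer stands in for a file):

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, users INTEGER)")
conn.executemany("INSERT INTO signups VALUES (?, ?)", [("2024-01-01", 12), ("2024-01-02", 18)])

cursor = conn.execute("SELECT day, users FROM signups ORDER BY day")
header = [col[0] for col in cursor.description]  # column names reported by the query

# Write to an in-memory buffer; open("signups.csv", "w", newline="") would give a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(cursor)  # iterating the cursor yields one tuple per row
print(buffer.getvalue())
```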

To succeed as a data scientist or in any data analysis role, you’ll need SQL.

If you’re unable to retrieve data from your company’s systems, you’re going to have a tough time.

Just as a car without an engine is just a hunk of metal, a data scientist without data is just someone with a lot of theoretical knowledge.

Data provides the raw material that data scientists use to build their models and make predictions.

It’s what they use to test their hypotheses and fine-tune their algorithms.

In short, data is crucial for any data scientist who wants to... do their job.

Imagine trying to play soccer without a soccer ball.

It would be pretty difficult, right?

That’s what it’s like for a data scientist without data.



What is the best way to learn SQL?

In my opinion, the best way to learn SQL is either through a book or an online guide.

When I was initially learning SQL, I needed a more hands-on guide that allowed me to dive deep into it.

Not loving any online guides, I purchased Sylvia Moestl Vasilik’s book “SQL Practice Problems.”

I felt like most guides online were way overpriced compared to this book, which was around $20 when I purchased it.

It gives you sample data and instructions for setting up a free database, then uses a “learn-by-doing” approach to teach about 60 techniques that anyone learning SQL for data science will need to know.

Here is a picture of me holding the copy I purchased back in 2020!

[Image: the author holding his copy of “SQL Practice Problems”]


Like I’ve said in my other guides, I enjoy having physical copies of the books I use because I can mark up the code, highlight sections, and take notes in the margins that will be there the next time I encounter the same problems.



How Long Does it Take to Learn SQL?

I was able to learn SQL using Vasilik’s book in three months by only studying for a couple of hours on the weekends.

The truth is, the answer depends on your experience level and commitment.

If you’ve never touched any code before, it may take upwards of 6 months to really understand what’s going on.


However, the learning curve will be much shorter if you’re already familiar with another programming language.

If this is you, expect a timeframe similar to mine, or shorter if you’re willing to study during the week (I was in grad school at the time).

Of course, your mileage may vary; the important thing is to get started and not be discouraged if it takes a little longer than you initially expected.



Can I get by as a data scientist without any SQL skills?

You could probably get by as a data scientist without SQL skills, but you won’t want to.

As a data scientist, you will inevitably have to work with data of some kind (as the name suggests).

And, as anyone who has worked with data knows, dealing with data can be a messy business.

SQL is the go-to tool for retrieving data from relational databases.

Without at least a basic understanding of SQL, you will be at a disadvantage when working with data.
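As an example of handling messy data, here is a sketch of common cleanup done directly in SQL: trimming and lowercasing text so duplicates match, collapsing them with GROUP BY, and filling missing values with COALESCE. The `contacts` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("  Ana@Example.com ", "Madrid"),
        ("ana@example.com", None),   # same person, different formatting, missing city
        ("ben@example.com", "Austin"),
        ("cara@example.com", None),  # city missing entirely
    ],
)

# TRIM/LOWER normalise emails so duplicates match; GROUP BY collapses them;
# MAX ignores NULLs, so a known city wins; COALESCE fills in when none exists.
clean = conn.execute(
    """
    SELECT LOWER(TRIM(email)) AS email,
           COALESCE(MAX(city), 'unknown') AS city
    FROM contacts
    GROUP BY LOWER(TRIM(email))
    ORDER BY email
    """
).fetchall()
print(clean)
```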

Moreover, if you can get a job as a data scientist without any SQL skills, you’re probably getting a data science job that isn’t doing much data science. These jobs should be avoided if your main goal is learning and career growth.

While they may offer some level of comfort in the short term, in the long run, they will likely prove to be more of a hindrance than a help.


Data science is one of the most lucrative fields to be a part of; if you’re serious about becoming a data scientist, make sure you spend some time learning at least the basics of SQL.


Should I Spend More Time Learning Python or SQL for Data Science?

I believe 50% of your “learning time” should be spent learning data science theory, and the other 50% should be spent on coding.

Of that second 50%, 80% should go to Python and 20% to SQL, which works out to 40% of your total learning time on Python and 10% on SQL.
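The split above is simple arithmetic; a tiny helper makes it concrete for a weekly study budget. The function name and the 10-hour example are only illustrations of the percentages stated above.

```python
def weekly_split(total_hours: float) -> dict:
    """Split study time: 50% theory, 50% coding; coding is then 80% Python, 20% SQL."""
    theory = total_hours * 0.50
    coding = total_hours * 0.50
    return {
        "theory": theory,
        "python": coding * 0.80,  # 40% of the total
        "sql": coding * 0.20,     # 10% of the total
    }


print(weekly_split(10))  # {'theory': 5.0, 'python': 4.0, 'sql': 1.0}
```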

I think this split is important because theory can help you understand the “why” behind everything, while code allows you to put that knowledge into practice (and build cool models).

By learning both, you’ll become a well-rounded data scientist who understands the complexities of code and can write efficient programs.

What’s more, by focusing on Python and SQL, you’ll be able to cover the two most popular programming languages in data science, increasing your chances of getting your first data science role.



Other Book Recommendations

At enjoymachinelearning.com, we only recommend six books.

Those six books are shown in the images below.

[Image: the author’s six favorite data science books]


Stewart Kaplan