Lasso Regression vs PCA [Use This Trick To Pick Right!!]

If you’re trying to understand the main differences between lasso regression and PCA – you’ve found the right place. In this article, we will go on a thrilling journey to learn about two cool data science techniques: Lasso Regression and PCA (Principal Component Analysis). While these two concepts may sound a bit complicated – don’t worry; we’ll break them down in a fun and easy way! 

The main difference between PCA and Lasso Regression is that Lasso Regression is a variable selection technique that works directly with the original variables of the dataset. In contrast, PCA (Principal Component Analysis) works with the eigenvectors derived from the covariance matrix of those variables.

While the above makes it seem pretty simple – there are a few nuances to this difference that we will drive home later in the article.

If you’re trying to learn about these two topics, when to use them, or what makes them different, this article is perfect for you.

Let’s jump in.

When You Should Use Lasso Regression

Lasso Regression is an essential variable selection technique for eliminating unnecessary variables from your model.

This method can be highly advantageous when some variables do not contribute any variance (predictive value) to the model. In situations like this, Lasso Regression will automatically set their coefficients to zero, excluding them from the analysis. For example, say you have a skiing dataset and are building a model to predict how fast someone goes down the mountain, and the dataset includes a variable measuring the skier's ability to make basketball shots. That variable obviously contributes nothing to the model, and Lasso Regression will quickly identify it and eliminate it.
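As a rough illustration (the data and feature names below are made up, not taken from a real skiing dataset), scikit-learn's Lasso will zero out a feature that carries no signal:

```python
# A minimal sketch: a synthetic "ski speed" dataset where one feature
# (basketball shooting skill) carries no signal. With enough regularization,
# Lasso drives that feature's coefficient to exactly zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

slope_steepness = rng.normal(size=n)    # informative
skier_experience = rng.normal(size=n)   # informative
basketball_skill = rng.normal(size=n)   # irrelevant to ski speed

ski_speed = 3.0 * slope_steepness + 1.5 * skier_experience + rng.normal(scale=0.5, size=n)

X = np.column_stack([slope_steepness, skier_experience, basketball_skill])
model = Lasso(alpha=0.1).fit(X, ski_speed)

for name, coef in zip(["slope_steepness", "skier_experience", "basketball_skill"], model.coef_):
    print(f"{name}: {coef:.3f}")
# basketball_skill's coefficient should come out as exactly 0.0, meaning
# Lasso has dropped it from the model.
```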

Since variables are being eliminated with Lasso Regression, the model becomes more interpretable and less complex.

Even more important than the model’s complexity is the reduction in the dimensionality of your dataset. Since these variables are eliminated, the dataset shrinks in dimension, which is hugely advantageous for most machine learning models and can improve accuracy for methods like ordinary least squares linear regression.

While Lasso Regression shares similarities with Ridge Regression, it is important to distinguish their differences.

Both methods apply a penalty to the coefficients to reduce overfitting; however, Lasso employs an absolute value penalty, while Ridge uses a squared penalty.

This distinction leads to Lasso’s unique variable elimination capability.
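Here is a minimal sketch of that distinction on synthetic data (the numbers and seeds are arbitrary): Lasso's absolute-value (L1) penalty pushes the useless coefficients to exactly zero, while Ridge's squared (L2) penalty only shrinks them toward zero.

```python
# Sketch: the same noisy data fit with Lasso (L1 penalty) and Ridge (L2 penalty).
# Lasso zeroes out the weak predictors; Ridge shrinks them but keeps them nonzero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
# Only the first two features actually matter.
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # trailing entries are exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # trailing entries are small but nonzero
```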

One crucial aspect to consider is that Lasso Regression does not handle multicollinearity well.

Multicollinearity occurs when two or more highly correlated predictor variables make it difficult to determine their individual contributions to the model.

In such cases, Lasso Regression might not be the best choice. 

Nonetheless, when working with data that has irrelevant or redundant variables, Lasso Regression can be a powerful and efficient technique to apply.


When You Should Use PCA

PCA is a powerful dimensionality reduction (feature extraction) technique, though it is one of the most unique of the bunch.

PCA is handy when dealing with many variables that exhibit high correlation or when the goal is to reduce the complexity of a dataset without losing important information.

While PCA does not eliminate variables like Lasso Regression, it does transform the original set of correlated variables into a new set of uncorrelated variables called principal components, each a linear combination of the original variables.

This transformation allows for preserving as much information as possible while reducing the number of dimensions in the data.

By extracting the most relevant patterns and trends from the data, PCA allows for more efficient analysis and interpretation. 

Since you’ll be modeling over the principal components, PCA gives you complete control (much like the lambda in Lasso) to decide how much of the variance you want to keep.

Usually, the eigenvectors (principal components) will contribute to the total variance with a breakdown something like this:

  • Eigenvector 1 (largest eigenvalue): 50.6% of the total variance
  • Eigenvector 2 (second-largest eigenvalue): 18.5% of the total variance
  • Eigenvector 3 (third-largest eigenvalue): 15% of the total variance
  • Eigenvector 4 (fourth-largest eigenvalue): 11% of the total variance
  • Eigenvector 5 (smallest eigenvalue): 4.9% of the total variance

Because the covariance matrix is square (one row and one column per variable), we’ll always have the same number of eigenvectors as variables.

However, as we can see from above, we can drop eigenvector 5 (a 20% reduction in data size!) while only losing out on 4.9% of the total variability of the dataset.

Without PCA, we would have had to drop one of the original variables outright, losing a full 20% of the variability for the same 20% reduction in the dataset (assuming all variables contributed equally).
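The percentages above are just illustrative. To see the equivalent numbers for your own data, a minimal sketch using scikit-learn (with the Iris dataset as a stand-in) might look like this:

```python
# Sketch: inspect how much variance each principal component explains, then
# keep only enough components to cover a chosen threshold (e.g., 95%).
import numpy as np
from sklearn.datasets import load_iris   # stand-in dataset; the article's numbers are illustrative
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_iris().data)

pca = PCA().fit(X)
print("Variance explained per component:", np.round(pca.explained_variance_ratio_, 3))

# Or let scikit-learn choose the number of components for you:
pca_95 = PCA(n_components=0.95).fit(X)
print("Components needed for 95% of the variance:", pca_95.n_components_)
```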

You should use PCA when you have many variables but don’t want to eliminate any of the original variables or discard their contribution to the model. This is common in fields like DNA sequencing, where thousands of variables each contribute a comparable amount of information.

Note: Since your model is trained on the principal components, you’ll have to apply the same PCA transformation to every new data point before predicting in production. While this may seem like a huge hassle, saving the fitted transformation and applying it within your pipeline is very easy, as sketched below.
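As a rough sketch of what that pipeline might look like (scikit-learn and joblib are assumed here, and the training data is synthetic), you can bundle the scaler, the PCA projection, and the model together and persist the whole thing:

```python
# Sketch: bundle the scaler, the PCA projection, and the model into one pipeline,
# save it, and reload it later. New data is automatically pushed through the
# exact same transformation before prediction.
import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 10))
y_train = X_train @ rng.normal(size=10) + rng.normal(scale=0.1, size=200)

pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95), LinearRegression())
pipeline.fit(X_train, y_train)

joblib.dump(pipeline, "pca_model.joblib")        # persist for production

loaded = joblib.load("pca_model.joblib")
print(loaded.predict(rng.normal(size=(1, 10))))  # same scaling + projection applied automatically
```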


PCA vs Lasso Regression

As we’ve seen above, both Lasso Regression and PCA hold their weight in dimensionality reduction. While PCA can seem a little confusing when discussing eigenvalues and orthogonal projections, data scientists and machine learning engineers use both of these techniques daily.

In short – use PCA when you have variables that all contribute equally to the variance within your data or your data has high amounts of multicollinearity. Use Lasso Regression whenever variables can be eliminated, and your dataset has already been cleansed of multicollinearity. 


Pros And Cons of Lasso Regression


Pros:

  • Variable selection: Lasso Regression automatically eliminates irrelevant or redundant variables, resulting in a more interpretable and less complex model.
  • Reduced overfitting: By applying a penalty to the coefficients, Lasso Regression helps prevent overfitting, leading to better generalization in the model.
  • Model simplicity: With fewer variables, Lasso Regression often results in more straightforward, more easily understood models.
  • Computationally efficient: Compared to other variable selection techniques, Lasso Regression can be more computationally efficient, making it suitable for large datasets.


Cons:

  • Inability to handle multicollinearity: Lasso Regression does not perform well with highly correlated variables, making it less suitable for datasets with multicollinearity.
  • Selection of only one variable in a group of correlated variables: Lasso Regression tends to select only one variable from a group of correlated variables, which may not always best capture the underlying relationships.
  • Bias in coefficient estimates: The L1 penalty used by Lasso Regression can introduce bias in the coefficient estimates, especially for small sample sizes or when the true coefficients are large.
  • Less stable than Ridge Regression: Lasso Regression can be more sensitive to small data changes than Ridge Regression, resulting in less stable estimates.


Pros And Cons of PCA


Pros:

  • Addresses multicollinearity: PCA effectively handles multicollinearity by transforming correlated variables into a new set of uncorrelated principal components.
  • Dimensionality reduction: PCA reduces data dimensions while retaining essential information, making it easier to analyze and visualize.
  • Improved model performance: By reducing noise and redundancy, PCA can lead to better model performance and more accurate predictions.
  • Computationally efficient: PCA can be an efficient technique for large datasets, as it reduces the complexity of the data without significant information loss.


Cons:

  • Loss of interpretability: PCA can result in a loss of interpretability, as the principal components may not have a clear or intuitive meaning compared to the original variables.
  • Sensitivity to scaling: PCA is sensitive to the scaling of variables, requiring careful preprocessing to ensure that the results are not influenced by the variables’ choice of units or magnitude.
  • Assumes linear relationships: PCA assumes linear relationships between variables and may not perform well with data that exhibits nonlinear relationships.
  • Information loss: Although PCA aims to retain as much information as possible, some information is inevitably lost during dimensionality reduction.