In a classification problem, labels will be given for the data points, giving you a reference point to create your accuracy measure.
What happens in unsupervised clustering algorithms when these reference data points and labels don’t exist?
(Full K-Means Clustering Algorithm in Python at the Bottom)
K-Means is an unsupervised clustering algorithm used to create clusters of data points from data without labels. These clusters are great for finding groups with similar attributes. If your data has labels, a supervised algorithm will provide better insights than K-Means.
In unsupervised learning, you won’t have target values to compare your predictions against. Instead, the quality of a K-Means clustering is measured with something called the Silhouette Score, which uses the distance from each point to its own cluster and to the nearest neighboring cluster.
In Classification Algorithms, you can use regular accuracy scores comparing predicted values to class labels. Since you will not have original class labels in clustering, you’ll need to use a distance metric to figure out how good your clustering is.
For this type of calculation, the distance metric you use doesn’t usually matter, as long as it’s consistent.
What will matter is the algorithm that you choose.
To find the optimal number of clusters K, we like to look at the Silhouette Score for each cluster and each point.
The Silhouette score in the K-Means clustering algorithm is between -1 and 1. This score represents how well the data point has been clustered, and scores above 0 are seen as good, while negative points mean your K-means algorithm has put that data point in the wrong cluster.
Think about it this way in the below example.
We have two clusters: Cluster Blue and Cluster Red.
If I were to ask you what cluster you think the green circle belonged to, you would answer red.
However, what if this circle was classified as blue?
This circle would have a negative Silhouette score, as it’s closer to a different cluster than the one it’s assigned.
Conversely, if this green circle were assigned to the red cluster, it would have a value nearing one, due to how close it is to the red centroid and how far it is from the blue centroid.
Conceptually, the Silhouette score uses a distance metric to measure how close a point is to the rest of its own cluster compared to the nearest other cluster. If this value is negative, the data point is closer to another cluster than the one it was assigned to.
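For reference, here is a tiny sketch of that calculation for a single point, with made-up distances (a is the average distance to the point's own cluster, b is the average distance to the nearest other cluster):
def silhouette_for_point(a, b):
    # a = average distance to the other points in the assigned cluster
    # b = average distance to the points in the nearest other cluster
    return (b - a) / max(a, b)
print(silhouette_for_point(a=1.2, b=4.5))  # well clustered, close to 1
print(silhouette_for_point(a=4.5, b=1.2))  # closer to another cluster, negative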
The Average Silhouette Method takes the average Silhouette score of each data point for each cluster. This value can be misleading, as some individual data points of a cluster can have negative values, but the overall cluster value is positive. However, it can act as an accuracy proxy.
While this method has its uses, as a data scientist I suggest you look at the individual silhouette score of each point instead of only the overall cluster average.
You want to find the cluster centers that minimize the number of negative silhouette values.
This strategy can change depending on what your end goal is.
The easiest way to find K in K Means is by using the elbow method. Plot the inertia at many different values of K. When the graph looks like an elbow, select that as an initial K value moving forward. This value K will need to be validated.
In machine learning, people often make the mistake of choosing K by optimizing for inertia alone.
While this will give decent cluster assignments and labels, your overall clustering performance will be weak.
The next step, validating these clusters and labels, will lead you to better insights for you and your customers.
Realize that in K Means the most crucial parameter is K, and you have to do whatever you can to ensure that you have chosen the correct value.
# calculate k using python, with the elbow method
inertia = []
# define our possible k values
possible_K_values = [i for i in range(2, 40)]
# we start with 2, as we can not have 0 clusters in k means, and 1 cluster is just the whole dataset
# iterate through each of our values
for each_value in possible_K_values:
    # build a model for this value of K
    model = KMeans(n_clusters=each_value, init='k-means++', random_state=32)
    # fit it on YOUR dataframe
    model.fit(df)
    # append the inertia to our array
    inertia.append(model.inertia_)
plt.plot(possible_K_values, inertia)
plt.title('The Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
# import our models
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import silhouette_samples, silhouette_score
# cleaning, plotting and dataframes
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# as always, we will use a publicly available dataset, from kaggle
# https://www.kaggle.com/datasets/arjunbhasin2013/ccdata
# put it in the same folder as this notebook
df = pd.read_csv("CC_DATA.csv")
# for this example, we're not going to be dealing with nans (different tutorial for that :P)
df = df.dropna()
# we drop cust_id, as a ID column doesn't help us specifically for clustering
df = df.drop(columns=['CUST_ID'])
# let's see a random sample
print(df.sample(n=5))
## lets apply min-max scaling to each column (feature)
# define our scaler
scaler = MinMaxScaler()
# scale down our data
df_scaled = scaler.fit_transform(df)
# see here four rows that are scaled
print(df_scaled[0:4])
# calculate k using python, with the elbow method
inertia = []
# define our possible k values
possible_K_values = [i for i in range(2, 40)]
# we start with 2, as we can not have 0 clusters in k means, and 1 cluster is just the whole dataset
# iterate through each of our values
for each_value in possible_K_values:
    # build a model for this value of K
    model = KMeans(n_clusters=each_value, init='k-means++', random_state=32)
    # fit it
    model.fit(df_scaled)
    # append the inertia to our array
    inertia.append(model.inertia_)
plt.plot(possible_K_values, inertia)
plt.title('The Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
# now we have a problem, which K do we choose? anything past 15 looks really good, let's test 25
# let's use silhouette_samples and silhouette_score to find out
# new model
model = KMeans(n_clusters=25, init='k-means++',random_state=32)
# re-fit our model
model.fit(df_scaled)
# compute the average silhouette score over all points
silhouette_score_average = silhouette_score(df_scaled, model.predict(df_scaled))
# lets see what that score is
print(silhouette_score_average)
#0.261149550725173
# while that's nice, what does that tell us? there could still be points with a negative value
# let's see the points
silhouette_score_individual = silhouette_samples(df_scaled, model.predict(df_scaled))
# iterate through to find any negative values
for each_value in silhouette_score_individual:
    if each_value < 0:
        print(f'We have found a negative silhouette score: {each_value}')
# wow, there is a ton!
# initially, 25 looked like a really good k value, now it's not seeming so!
# how can we find a value that optimizes this score?
# re-do our loop, try to find values with no negative scores, or one with the least!!
bad_k_values = {}
# remember, anything past 15 looked really good based on the inertia
possible_K_values = [i for i in range(15, 30)]
# so we search K values from 15 up to 29
# iterate through each of our values
for each_value in possible_K_values:
    # build a model for this value of K
    model = KMeans(n_clusters=each_value, init='k-means++', random_state=32)
    # fit it
    model.fit(df_scaled)
    # find each silhouette score
    silhouette_score_individual = silhouette_samples(df_scaled, model.predict(df_scaled))
    # iterate through to find any negative values
    for each_silhouette in silhouette_score_individual:
        # if we find a negative, lets start counting them
        if each_silhouette < 0:
            if each_value not in bad_k_values:
                bad_k_values[each_value] = 1
            else:
                bad_k_values[each_value] += 1
for key, val in bad_k_values.items():
    print(f'This Many Clusters: {key} | Number of Negative Values: {val}')
# as we can see, inertia showed us that our value needed to be bigger than 15.
# but how did we choose past that?
# we optimized our K value utilizing the silhouette score, choosing 16 as it has
# the lowest amount of negative values
This concept is a little dense, and we hope some of the answers below help you better understand the topic.
Once you’ve found your clusters, you should look to create specific strategies for each cluster. Since these clusters are mathematically very similar, optimizing decisions at the cluster level will lead to great results, and this could mean many different strategies for your dataset.
For clustering, you always want positive silhouette scores. A negative silhouette score symbolizes that a point is closer to the centroid of a different cluster than the cluster it’s currently assigned to. The maximum value of a silhouette score is 1.
For K-Means, a good result is one where the silhouette score of every data point is above 0. You should re-do your clustering with a different K value if many points have negative silhouette scores or the average score is low.
In clustering, you always want positive silhouette scores. The values of a silhouette score are between 1 and -1. A negative silhouette score symbolizes that a point is closer to the centroid of a different cluster than the cluster it’s currently assigned to.
These different articles will help you better understand machine learning and the different ways you can implement these algorithms in python.
Links to those articles are below:
This hypothesis test is commonly used to test three different things.
In a later post, we will dive into the goodness of fit test, but this one will focus primarily on independence and homogeneity.
(Full Chi-Square Test of Independence Python Code at the bottom)
The Chi-Square Test is a test to see whether or not two categorical variables are independent.
This is extremely valuable, as other things in machine learning are based on the assumption of independence.
Many people get confused about the different tests based on the chi-squared distribution.
While the three we will be focused on in this blog are very important, they are used in different situations.
I do want to make a distinction about independence.
Rows of data that are not independent of each other because of some underlying structure (like in a time series) involve a different kind of independence than the one we are testing here.
Think of chi-square (in data science terms) as column independence, where the type of independence referenced above is more focused on row independence. (Read More Here)
The chi-square test will tell whether two categorical variables are independent.
This is crucial as many other statistical tests cannot be applied if variables are not independent of one another.
For example, Naive Bayes assumes that all of your variables are independent.
The difference between the chi-square test of homogeneity and independence is that the test of independence tests whether two categorical variables are related to each other in a population, while the test of homogeneity tests whether two or more subgroups of the same population share the same distribution for a chosen categorical variable.
While these sound very similar, they are used quite differently in practice.
It’s easiest to see with an example.
Let’s say you have 1000 cats.
Like any great data science project, you have a ton of perfect clean data. (haha)
The first thing you are interested in is judging if the type of cat and the color of the cat is independent.
For this, we would use our chi-square test of independence.
We will utilize the full population, and our null hypothesis is that these variables are independent.
This is the standard null hypothesis in any chi-square test of independence.
You’d compute your test statistic and critical value at whatever alpha value you desire, and if your test statistic is higher than your critical value, we reject the null hypothesis.
This would mean that these categorical variables are not independent for your population.
In p-value terms, this would mean your p-value is very low.
Most use a p-value of .05, and we’d check if our computed p-value is below our threshold.
If our value is below .05, we can say this is statistically significant. Meaning we reject the null hypothesis.
Remember, the p-value is the probability of seeing data at least as extreme as yours if the null hypothesis were true; if your p-value is very small, the observed data would be very unlikely under independence, which is why we reject the null hypothesis.
Now let’s say you were interested in whether two different types of cats have the same frequency of colors.
This would compare two sub-groups (not variables!!) from the same population.
For this, we would use the chi-square test of homogeneity.
Many of the steps past this point are similar to the steps above.
You would set up your null hypothesis, assuming that the distribution for the categorical variable you are looking at is the same for both sub-groups.
You would build your contingency table and utilize observed and expected frequencies to compute the test statistic.
Depending on the chi-squared test statistic values obtained above, you can choose to either reject or not reject the null hypothesis.
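To make the mechanics concrete, here is a small sketch with made-up counts showing what the SciPy function we use later does under the hood: the expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected.
import numpy as np
from scipy.stats import chi2
# hypothetical 2x2 table of observed counts
observed = np.array([[30, 20],
                     [20, 30]])
# expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()
# the chi-square statistic, its degrees of freedom, and its p-value
chi_square = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(chi_square, chi2.sf(chi_square, dof))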
As we can see, while both are great statistical methods, they have different uses.
Make sure you understand your specific situation to utilize the correct statistical test.
The chi-square test of independence compares two categorical variables to see if they are independent or not. Independence of variables is essential for further testing, as some statistical methods can't be applied to data that is not independent.
You use categorical variables in the chi-square test because the test statistic is computed based on frequencies. If the data is not categorical, there will be no way to bucket the responses. We can create a contingency table with these frequencies and compute the test statistic.
While this may seem odd, think about a data set around different heights.
How would you put these height values into categories?
You may have some heights that ended up being near each other (or the same), but most will be spread over a wide range of values (depending on how precisely the heights were measured).
This makes it nearly impossible to create our contingency tables.
There is a strategy to “bucket” these numerical columns, to pseudo convert them into buckets that can be utilized as a categorical variable. (Read More Here)
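One way to do that bucketing (this snippet is our own illustration, not part of the original dataset work) is the pandas cut function:
import pandas as pd
# hypothetical heights in centimeters
heights = pd.Series([150, 162, 171, 168, 185, 193, 177])
# bucket the numeric column into labelled ranges so it behaves like a categorical variable
height_buckets = pd.cut(heights, bins=[0, 160, 175, 250], labels=['short', 'average', 'tall'])
print(height_buckets.value_counts())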
Now, consider if we had a categorical variable, either “Yes” or “No.”
It quickly becomes very easy to compute our contingency table, which looks something like the crosstab example in the code below.
We reject the null hypothesis with a low p-value because the p-value measures how likely data this extreme would be if the null hypothesis were true. As the p-value gets closer to zero, the observed data becomes harder and harder to explain under the null hypothesis, meaning we reject it.
The chi2_contingency function is a function in SciPy. This function, which takes in a contingency table created from categorical values, evaluates the chi-square statistic and the p-value for that contingency table.
We can use these values to decide whether to reject or not reject the null hypothesis that these variables are independent.
Making a contingency table in Python is straightforward. Import the Pandas package and utilize the crosstab function.
# contingency table in python with pandas
compare = pd.crosstab(df['color'],df['transmission'])
# an example contingency table from pandas crosstab
print(compare)
To do a chi-square test in pandas, you need to convert the data into a contingency table of frequencies. The easiest way to do this is to utilize the Pandas crosstab function. Once done, use the SciPy chi2_contingency function to compute the chi-square statistic and p-value.
# import your modules
import numpy as np
import pandas as pd
import scipy.stats as stats
# as always, we will use a publicly
# available dataset for the demo
# lets use https://www.kaggle.com/datasets/lepchenkov/usedcarscatalog
# put it in the same path as this notebook
df = pd.read_csv("cars.csv")
# for this example, let's focus on transmission, color, engine_type
df = df[['manufacturer_name','transmission','color','engine_type']]
print(df.sample(n=15))
# contingency table in python with pandas
compare = pd.crosstab(df['color'],df['transmission'])
# an example contingency table from pandas crosstab
print(compare)
# are these variables independent?
chi2, p, dof, ex = stats.chi2_contingency(compare)
print(f'Chi_square value {chi2}\n\np value {p}\n\ndegrees of freedom {dof}\n\n expected {ex}')
# looks like our p value is very low
# these categorical variables are not independent
These additional articles will help you better understand machine learning and the different ways you can implement these algorithms in Python.
Links to those articles are below:
For example, you could run into a situation where the data is not linear, you have more than one variable (multivariate), and you seem to have polynomial features.
You still want to ensure that your predicted values are correct, but a non-linear relationship is hard to accurately model with a linear regression model.
The data science toolbox is constantly expanding.
While most practitioners are equipped with a linear model and would use it in a linear scenario, what happens when the data has many independent variables that display non-linear behavior?
That's when we need to start looking for other models to use.
How would you handle this problem?
(Full Python Code with Example Data at Bottom)
Multivariate polynomial regression is used to model complex relationships with multiple variables. These complex relationships are usually non-linear and high in dimensions. Once an accurate equation (model) is created or found, this equation can be used for future accurate predictions.
Let’s say you are trying to determine the relationship between multiple variables (in this scenario, we can think of variables X, Y, and Z).
You are interested in determining the relationship between one variable and the other.
You first need to figure out if there is any relationship in the data, and if so, we want to find the equation or function that relates those variables together.
You spend some time doing EDA and other visual data science techniques.
The first thing you notice is your data is not linear.
Polynomial regression is a basic linear regression with a higher order degree. This higher-order degree allows our equation to fit advanced relationships, like curves and sudden jumps. As the order increases in polynomial regression, we increase the chances of overfitting and creating weak models.
While most machine learning engineers or data scientists won’t have the equation up-front, a polynomial equation is straightforward to spot.
The order of a polynomial regression model does not refer to the total number of terms; it refers to the largest exponent in any of them.
Below, we see what an order-n polynomial regression model looks like:
y = b0 + b1*x + b2*x^2 + ... + bn*x^n + e
As we can see from this example, this looks very similar to our simple linear regression model, now with order n.
Collinearity is a correlation between your predictor variables, and this correlation can be positive or negative. If collinearity in your quadratic polynomial regression model is a concern, fit the model with X and (X - sample mean)^2 instead of X and X^2; centering the squared term reduces its correlation with X.
This trick will work for any order; keep centering each higher-order term, raising (X - sample mean) to the power that matches the order of that term.
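Here is a minimal sketch of that centering idea with made-up data; the correlation between X and its squared term drops once the squared term is centered:
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# the plain squared term is strongly correlated with x
print(np.corrcoef(x, x ** 2)[0, 1])
# centering before squaring removes most of that correlation
x_centered_squared = (x - x.mean()) ** 2
print(np.corrcoef(x, x_centered_squared)[0, 1])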
Polynomial regression is used similarly to linear regression to predict a value at some point. However, unlike linear regression, polynomial regression can estimate curves and more advanced relationships between the independent and dependent variables presented in our data.
Performing polynomial regression usually means we’re interested in the relationships between different variables in our dataset and the relationships these variables create on our outputs.
Polynomial regression can be used for multiple independent variables, which is called multivariate polynomial regression. These equations are usually very complex but give us more flexibility and higher accuracy due to utilizing multiple variables in the same equation.
Many have moved on to more complex models in machine learning to understand these relationships.
However, most of these algorithms are black-box, meaning that once relationships are found, we will no longer understand the relationship. [1,2]
This equation can be extracted and understood if these complex equations are found utilizing multiple linear regression or polynomial regression.
Linear regression is a subset of polynomial regression, as linear regression is just polynomial regression in the first order. This means that linear regression is still polynomial regression. Once you get out of first order (like quadratic), these equations are no longer linear.
The main difference between linear regression and polynomial regression is that polynomial regression can model complex relationships, while linear regression can only model linear relationships. However, linear regression is a subset of polynomial regression with just order one.
Before jumping into the code, we need to understand when we would use each model.
To understand, see these different lines below.
While both are under the polynomial regression model umbrella, only one is a linear relationship.
As we can see from the blue line, our equation is 2x.
For every value that x increases, y will increase by two.
This linear relationship will hold for any value of x.
For our orange line, we quickly see that as our x values grow across the bottom, the gap between the two lines grows.
The equation for this line is 2^x.
While these lines are equal at x = 1 and x = 2, as x increases further, the deviation between them grows.
This is because the slope of 2^x keeps increasing as x grows (its derivative is ln(2)·2^x), while the slope of 2x stays constant at 2.
# make sure to import all of our modules
# sklearn package
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# dataframes
import pandas as pd
# computation
import numpy as np
# visualization
import matplotlib.pyplot as plt
# dataset
# https://www.kaggle.com/datasets/ciphernine/brooklyn-real-estate-listings
# place it in the same folder as this workbook
df = pd.read_csv('brooklyn_listings.csv')
# for this example, we're going to estimate the price with sqft, bathroom, and bedrooms
df = df[['price','bathrooms','sqft']].dropna()
# show some random lines from our data
print(df.sample(n=15))
# separate out our x and y values
x_values = df[['bathrooms','sqft']].values
y_values = df['price'].values
# visual
print(x_values[0], y_values[0])
#define our polynomial model, with whatever degree we want
degree=2
# PolynomialFeatures will create a new matrix consisting of all polynomial combinations
# of the features with a degree less than or equal to the degree we just gave the model (2)
poly_model = PolynomialFeatures(degree=degree)
# transform our polynomial features
poly_x_values = poly_model.fit_transform(x_values)
# should be in the form [1, a, b, a^2, ab, b^2]
print(f'initial values {x_values[0]}\nMapped to {poly_x_values[0]}')
# [1, a=5, b=2940, a^2=25, 5*2940=14700, b^2=8643600]
# note: the real model fit happens with the regression below; this extra PolynomialFeatures fit is redundant
poly_model.fit(poly_x_values, y_values)
# we use linear regression as a base!!! ** sometimes misunderstood **
regression_model = LinearRegression()
regression_model.fit(poly_x_values, y_values)
y_pred = regression_model.predict(poly_x_values)
regression_model.coef_
mean_squared_error(y_values, y_pred, squared=False)
# check our accuracy for each degree, the lower the error the better!
number_degrees = [1,2,3,4,5,6,7]
plt_mean_squared_error = []
for degree in number_degrees:
    poly_model = PolynomialFeatures(degree=degree)
    poly_x_values = poly_model.fit_transform(x_values)
    poly_model.fit(poly_x_values, y_values)
    regression_model = LinearRegression()
    regression_model.fit(poly_x_values, y_values)
    y_pred = regression_model.predict(poly_x_values)
    plt_mean_squared_error.append(mean_squared_error(y_values, y_pred, squared=False))
plt.scatter(number_degrees, plt_mean_squared_error, color="green")
plt.plot(number_degrees, plt_mean_squared_error, color="red")
From above, we see our model did best when our degree=3, meaning a cubic function helped us predict housing pricing most accurately.
We have a ton of additional machine learning python tutorials built just like this one.
This will help you better understand machine learning and the different ways you can implement these algorithms in python.
Links to those articles are below:
We understand that multivariate polynomial regression in python is complicated.
We hope the answers listed below will help clear up any difficulties you are having.
As always, be sure to send us an email if you still have any questions.
Polynomial regression can be used for multiple variables; this is called multivariate polynomial regression. These equations are usually very complicated but give us more flexibility and higher accuracy due to utilizing multiple variables in the same equation.
A polynomial can have any number of variables. The prefix poly means “many” and the suffix nomial comes from the Latin for “name” or “term.” A polynomial can have 3 variables or many more, and an expression with only one variable is still a polynomial.
Regression can be run with any number of variables; with multiple predictors this is called multiple linear regression or multivariate polynomial regression. However, to avoid the curse of dimensionality, the number of variables should never exceed the number of rows of data in the training set.
If your data follows a degree 2 (quadratic) relationship and you fit it with plain linear regression, you'll have an inaccurate model. This is because a quadratic relationship cannot be accurately modeled by a straight line; linear regression assumes degree 1 during fitting.
Getting the correct output shape starts with correctly defining the right input shape for your deep learning models.
If you mess this up, you’ll spend a ton of time googling around to figure out why your model will not run correctly.
The Keras input shape is a parameter for the input layer (InputLayer). You’ll use the input shape parameter to define a tensor for the first layer in your neural network. If your input is an array of n integers, then your input shape would be (n,).
When defining your input layer, you need to consider the specific Keras model you are building.
If your input data is an image and your model is a classification model, you’ll want to define the input shape by the number of pixels and channels.
For classification models, think about your dataset being constrained to some subset of values; for example, if you're trying to predict on the MNIST dataset (Source), your classification model will try to put each image into a group between [0,9].
A 250×250 pixel image with three channels will be
input_shape=(250,250,3)
However, if the model you are building is more regression-focused, your shape will be much different.
Let’s say your input will be an array of 600 values; this means you’ll need to define your input shape a bit differently.
You will usually see an array of inputs in supervised learning, where you’re trying to find patterns in a dataset that lead you to a specific target column that you will predict (regression).
A 600-value array would look something like this.
input_shape=(600,)
Many people will try to leave out the comma in the input_shape, but this comma is mandatory in Python.
This is because shapes are passed as tuples, and without the comma, Python reads (600) as just the integer 600 in parentheses rather than a tuple, so the Input layer cannot build the shape it needs. (Read More)
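A quick check in plain Python shows why the comma matters:
print(type((600)))   # <class 'int'> -- just a number in parentheses
print(type((600,)))  # <class 'tuple'> -- the shape Keras expects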
One of the most confusing aspects of the input shape when using Keras is understanding how batching works with this input tensor.
Along with batching, we get a ton of questions about how to know steps per epoch in Keras and we go over it in-depth in that linked article.
Since we’re defining only one instance of the training data, we may see None for the first dimension whenever we access our model during the training process.
In this Keras example
input_shape=(600,)
If you were to print this out in model.summary(), you would see this.
(None, 600)
This shape tuple shows None in the first position because that dimension is the batch size.
Remember, each array was 600 values long, but during training, we will probably be passing in batches that have a similar structure.
If we fixed the batch size at 30 (for example, by defining it on the input layer), checking our model.summary() would show
(30, 600)
as we now have batches of 30 tensors, each holding 600 values (the batch size).
Knowing how many dimensions you have is crucial for accurate modeling, and correctly orchestrating the next layer from the previous layer is how you create accurate models.
Some people will try defining the batch size in their models; however, this can prove problematic.
Allowing Keras to choose the batch size without user contributions will allow for a fluid input size, meaning the batch size can change at any time.
This is optimal and will allow flexibility in your sequential model and output shape.
In Keras, much of your modeling can be done with the Sequential parameter.
Some imports you may need for this modeling tutorial:
from tensorflow import keras
from tensorflow.keras import layers
Think of the sequential model as a one-way road, where the entrance will be your input layer, then go through some hidden layers to a single output layer.
The input tensor is fundamental; remember, we can define that in a couple of different ways.
In newer versions of TensorFlow (Keras integrated), you’ll see the layer input defined as the following, using Conv2D (dense layers require inputs also).
CNNModel.add(layers.Conv2D(32, (3, 3), activation='elu', input_shape=(32, 32, 3)))
In older versions, you’ll see the Input layer defined as we discussed earlier with something like
CNNModel.add(keras.Input(shape=(32, 32, 3)))
CNNModel.add(layers.Conv2D(32, 3, activation=”relu”))
Both of these will work the same way and have the same shape.
When using the model summary, you will be able to see the outline of the model.
If you do not define an input layer while defining your model, you will not be able to call the model summary method until your input shape is defined.
Using the model summary is one of the easiest ways to understand how your model will progress, as you’ll be able to see how the dense layers will be laid out, even if the layer is hidden.
In Keras, there is another type of modeling philosophy that you can use.
This is called the functional API and compared to the sequential model, it will allow for multiple inputs and outputs throughout the model.
Instead of having one input layer and one final output layer, you could have multiple input layers and multiple output layers.
This logic also follows for the different hidden layers within the model, as they can also have separate inputs and outputs.
Initially, this isn’t very clear but think of this as an ensemble method from regular machine learning.
Sometimes, you need a little more than just the training data to get to the accuracy or outcome you are looking for.
If you are having accuracy problems, Keras shuffle could help you figure it out.
Let’s say you wanted to classify images, but along with those images, you had some text input (like tags) that existed in a separate database.
While we know we can represent the tags in a one dimensional array, and we saw previously how we could classify images with a CNN, how would we use these together?
Understanding features during modeling is important. We wrote Keras Feature Importance to give a good intro so you could understand your models better.
What if we handled our tags on one side, our image on the other, and brought them together into a final softmax function for classification?
Instead of now just relying on the image data, utilizing the functional API gave us a bit more data to classify our images correctly.
This is a massive upgrade over our other sequential models, which could only handle images or the tags one at a time.
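Here is a rough sketch of what that image-plus-tags model could look like with the functional API; the input shapes and layer sizes are made up purely for illustration:
from tensorflow import keras
from tensorflow.keras import layers
# hypothetical inputs: 32x32 RGB images and a 50-value tag vector
image_input = keras.Input(shape=(32, 32, 3), name='image')
tag_input = keras.Input(shape=(50,), name='tags')
# image branch
x = layers.Conv2D(32, 3, activation='relu')(image_input)
x = layers.Flatten()(x)
# tag branch
t = layers.Dense(16, activation='relu')(tag_input)
# bring both branches together and classify with a final softmax
combined = layers.concatenate([x, t])
output = layers.Dense(10, activation='softmax')(combined)
model = keras.Model(inputs=[image_input, tag_input], outputs=output)
model.summary()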
The Keras Model output shapes depend entirely on the units defined in the previous layer.
If your previous dense layer was defined as something like
model.add(layers.Dense(units=4, activation='...', input_shape=(600,)))
You will quickly notice
Output Shape = (None, 4)
And we know from earlier that None signifies the batch size.
When building Keras models, you will quickly notice that your models will decrease in size as you move down throughout your model.
The reason for this is simply due to the nature of matrix multiplication.
For example
[Batch, 600] * [600, 4] = [Batch, 4] (output shape)
This brings us back to earlier, where we saw our (None, 4) output tensor.
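A quick sketch you can run to see this for yourself (the layer size and activation are arbitrary):
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Dense(units=4, activation='relu', input_shape=(600,)))
# the summary shows Output Shape (None, 4): None is the batch size, 4 is the units
model.summary()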
Deep learning can be tricky, but we have some APIs that help us create wonderful models that can quickly converge to a great solution.
The Keras API used for neural networks has risen in popularity for modeling with TensorFlow.
Keras Shuffle is easy to mess up and is essential for your success with modeling and data science.
What is Keras Shuffle?
Keras Shuffle is a modeling parameter asking you if you want to shuffle your training data before each epoch. This parameter should be set to false if your data is time-series and true anytime the training data points are independent.
A successful Model starts way before you start writing your code.
Understanding how you want to set up your batching and epochs is crucial for your model’s success.
We go over Keras Shuffle, the different parameters of Keras Shuffle, when you should set it to True or False, and how to get the best usage out of it below.
Messing up this model parameter will create an overfitting model that isn’t reproducible in the real world.
This is one of the last things we want as machine learning engineers, and we will show you how to avoid this.
In the most basic explanation, Keras Shuffle is a modeling parameter asking you if you want to shuffle your training data before each epoch.
To break this down a little further, if we have one dataset and the number of epochs is set to 5, the model would use the whole dataset 5 times.
Many will set shuffle=True, so your model does not see the training data in the same order for each epoch.
This can improve the model’s accuracy and potentially cover up some bias in your data.
Realize this does not shuffle the validation or test set, so exact reproducibility of each training epoch will be impossible; however, model runs can still be compared fairly, as the validation set stays unshuffled and identical for every epoch.
model.fit(x, y, batch_size=400, epochs=5, shuffle=True)
In the above line, the dataset will be used five times and split up into batches of 400 rows each. Because shuffle=True, the data will be shuffled differently for each of the five epochs.
model.fit(x, y, batch_size=400, epochs=5, shuffle=False)
In the above line, the dataset will be used five times and split up into batches of 400 rows each. Because shuffle=False, your data will be taken in sequential order for each of the five epochs.
More information about building out these models is in our article all about the Dense Layer
Anytime you are modeling with Keras, the shuffle decision is in play.
You cannot really skip it: shuffle is a parameter of the .fit method with a default value, so a choice is made even when you don't set it yourself.
More information on another .fit parameter can be found here at steps per epoch keras.
Let’s go over a couple of different instances of where you would set Keras Shuffle to true and when you would want to set it to false.
If you’re doing any classification, you’re going to want to have shuffle set to true.
Also, if you are dealing with any independent data, you’re going to want to set shuffle to true.
This is because shuffling has been shown to reduce overfitting to the training (in-sample) data. (Source)
Your goal is to have shuffle set to true, as this will improve your model.
Sometimes, though, you cannot shuffle, and you'll need to set shuffle to false so your model takes the data in its original order.
Most machine learning algorithms have an underlying assumption.
This assumption is that each instance or row of your data is independent of the others.
We cannot shuffle time-series data because the data are no longer independent from each other.
Think about the stock market; one of the most significant indicators of a stock’s current position is the previous one.
For that to be true, how could this current instance be independent of the last?
(They aren’t Independent, and stock market data is time-series)
Now think what would happen if you shuffle that data.
Let’s say your training data included the t-1 and t+1 rows from the table below, and the value at time t was put inside your testing set.
Time | Value
t-2  | 36
t-1  | 42
t    | x
t+1  | 58
Quickly we see how unfair it is to possess data from both the past and the future in the training set, as predictions for t are now capped between [42, 58].
We will see an incredibly high test accuracy when running our tests and validations.
However, once this model is deployed, the accuracy will quickly fall off.
Because in the real world, we will never possess t+1 time, as the future doesn’t exist (at least in Machine Learning), and we won’t have a data point on it.
Our real-world accuracy will quickly plummet without this bound on the current prediction.
Starting your model correctly is how you have success during modeling; we will teach you how in Keras Input Shape.
Parameters of Keras Shuffle?
Remember from earlier that Keras Shuffle is either true or false.
The parameter takes either True or False, and you can also just leave it out entirely.
Keras Shuffle is set to true by default, so even if you forget to provide it, your data will automatically be shuffled during training.
This is a great question, but it is fundamentally wrong to compare them.
Keras Shuffle is an intra-training set decision, meaning that whatever you choose, this will only be applied to the training set.
This does not affect validation or test sets, and only the trained model will be different based on this parameter.
The trained model will be different because it will see either shuffled or non-shuffled data.
Understanding features during modeling is important. We wrote Keras Feature Importance to give a good intro so you could understand your models better.
Train Test Split is much more about separating training and testing data sets.
Whenever you apply train test split, you’re slicing your data into entirely different data sets.
These two work very well together.
And using train test split to create the training and validation sets for your deep learning model (Keras API) will enable you to test the accuracy during training quickly.
The same rules apply for shuffling during train test split as they do for Keras Shuffle.
Most of the time, you’ll want to shuffle while splitting your data, but if your data is not independent (time-series), you will not be able to shuffle at any part of your pipeline.
So the question is not Keras Shuffle or Train Test Split?
It’s More Keras Shuffle and Train Test Split?
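For a rough idea of how the same shuffle decision looks at the splitting stage, here is a small sketch with made-up arrays:
import numpy as np
from sklearn.model_selection import train_test_split
# hypothetical feature matrix and target
x = np.arange(20).reshape(10, 2)
y = np.arange(10)
# independent rows: shuffle while splitting (the default behaviour)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=True)
# time-series rows: keep the original order here and in model.fit
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)
print(x_test)  # with shuffle=False, always the last 20% of the rows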
Remember to import pandas, with the assumption of import pandas as pd
df = pd.read_csv(file)
You will now need to grab your target variable
y = df['target']
df = df.drop(['target'], axis=1)
Convert this to a tensor (assuming import tensorflow as tf); that way, Keras Shuffle is available, and you may also need to convert your target variable.
df = tf.convert_to_tensor(df.values)
You will need to build out your model in a function (named define_some_model here) and define it with some name (we use model)
model = define_some_model()
Finally, call your model on your dataset and target variable.
model.fit(df, y, batch_size=400, epochs=5, shuffle=True)
Databricks runtime includes both TensorFlow and the Keras API. Databricks is a perfect pick for deep learning. Having access to distributed training will allow you to create deep learning models that wouldn’t be available on your computer due to resource limits.
Keras Softmax Loss is the perfect last layer of probabilistic models. This is because softmax will produce a vector of length K whose values sum to 1, giving an output that indicates which class the model prefers. You will see this a lot in categorical models.
Pyspark and Keras are an incredible duo. Pyspark allows you access to distributed data, meaning you will have more data for modeling. Since Keras is an API that sits on TensorFlow, and deep learning networks are known for doing best with high quantities of data, combining these two is very harmonious.
Knowing how to put your text into specific topics is crucial to understanding the different genres within your text.
Utilizing topic modeling also allows you to quickly ingest ideas from your text no matter the size of the text in your corpus.
In the beginning, we briefly introduce topic models, then a deep dive into what automatic labeling is and why you need it compared to previous approaches.
At the bottom is complete python code on automatic labeling.
Topic Modeling is a genre of techniques used to identify latent themes in a corpus.
In non-machine learning language, it’s strategies used to theme enormous groups of text data together that would be hard (or impossible) to do yourself.
Topic modeling first showed its head as Latent Dirichlet Allocation (LDA) from David Blei. (Paper)
David Blei was interested in determining if a machine learning algorithm could be trained (using some forms of Bayesian Learning) to detect themes in different scientific abstracts.
A powerful idea that came out of the LDA model was using mixed membership models on clusters to prevent information loss.
Think about it this way; this research effort allowed the word vectors to belong to multiple categories and not just a single topic.
For example, is a dog a pet or an animal? The answer is probably both.
You lose a lot of information when you programmatically force different topics into specific categories.
This is where the idea of using the Mixed Membership Model with the corpus began.
Every document will have a little resemblance to each topic, allowing us to think of each document as a mixture of topics.
The LDA model will return the top n words for each of the k topics. These top words can now be grouped under multiple topic labels, each with its respective topic distribution.
When your topic model returns the labels, you’ll want to see if the topic labeling answers make sense.
Do the responses for a given topic show any form of resemblance?
You’re probably done if you seem to have a topic distribution that makes a ton of sense.
If there seem to be multiple clusters based on one topic, you probably should reduce k, the number of topics.
If you have some clusters that feel like they’re being forced together, you’ll want to increase k to give the topic model the chance to spread these clusters out.
We quickly start to see a problem with the Latent Dirichlet Allocation Model, where we pick random K values and almost create the model we’re looking for.
In this scenario, we’re prone to overfitting, altering our dataset and model until we arrive at the desired outcome.
This leads us to need a way for the model to decide what is the best label candidate for our process.
If we are allowed to constantly edit which topics appear by hand, we create a situation where results are not repeatable and outputs are up for debate.
For example, if your colleague thinks that the words “[pirate, ship, cruise ship]” shouldn’t belong to the same topic, but you do – who is right?
Now that we know that some other techniques are not the answer, we dive deeper into different NLP algorithms.
This leads us to BERT (Bidirectional Encoder Representations from Transformers) (Source)
BERT works by converting the words in sentences to word vectors.
BERT improved on the earlier Word2Vec approach, which used a fixed embedding vector for each word.
The problem with Word2Vec is easy to understand from the words in the sentences below.
Watch out up there and take a left
Be careful out there, we almost got left
While both of these sentences are structurally very similar and nearly identical in sentiment, semantically they are very far apart.
One is about an event (turning left), and the other is about being left behind.
The word “left” in these sentences does not even mean the same thing.
So what do we do?
With contextualized embeddings, the word embeddings for “left” in these sentences will differ.
Since BERT is bidirectional, it reads all the words in a sentence at once and creates sentence vectors along with the individual word vectors.
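If you want to see that for yourself, here is a rough sketch using the Hugging Face transformers library (our own addition, not part of the BERTopic code below) that pulls the vector for "left" out of each sentence and compares them:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
def left_vector(sentence):
    # tokenize the sentence and run it through BERT
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    # grab the contextual vector for the token "left"
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
    return hidden[tokens.index('left')]
v1 = left_vector('watch out up there and take a left')
v2 = left_vector('be careful out there, we almost got left')
# the same word gets a noticeably different vector in each sentence
print(torch.cosine_similarity(v1, v2, dim=0))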
Let’s see how BERT would categorize some text data.
We have a lot of additional machine learning python tutorials built just like this one.
This will help you get a more in-depth understanding of machine learning techniques and the different ways you can implement these algorithms in python.
Links to those articles are below:
#Automatic Labeling in Python
# installs
#!pip install bertopic[visualization]
#!pip install numpy
#!pip install pandas
# import what we need the for the analysis
import numpy as np
import pandas as pd
from copy import deepcopy
from bertopic import BERTopic
import random
# we will need some data for this, i'm using pg10 from here
#https://www.kaggle.com/datasets/tentotheminus9/religious-and-philosophical-texts?resource=download
data = pd.read_csv('pg10.txt', delimiter='\t')
print(data.sample(n=15))
# we can clean these up a bit
# convert our dataframe into an array of text
corpus = data.iloc[:,0].values
# some light cleaning on the text for better outputs
final_corpus = []
# for each line in the corpus
for val in corpus:
    # if the line is longer than 10 characters
    if len(val) > 10:
        # lets rebuild it with clean text
        cleaning_array = ''
        # for each character in that line of text
        for character in val:
            # if its a letter or a space
            if character.isalpha() or character == ' ':
                # rebuild the sentence in lowercase
                cleaning_array += character.lower()
        # add it to our final corpus
        final_corpus.append(cleaning_array)
# 71533 lines to put into topics
print(final_corpus[25:40], len(final_corpus))
# initialize our BERTopic Model (instance of class)
model = BERTopic(language='english')
# lets extract the most different topics from a random 6000 line sample
#randomize the list
random.shuffle(final_corpus)
all_topics, all_probs = model.fit_transform(final_corpus[2000:8000])
# lets see the top frequencies 97 total
model.get_topic_freq()
# we see our -1 topic, where many words are stuffed (stop words)
model.get_topic(-1)
# see the contents of topic 8 etc..
model.get_topic(8)
# visual of our topics
model.visualize_topics()
Now all of our text is grouped into a category, spatially laid out to understand how these terms interact in some subspace.
Messing up steps_per_epoch while modeling with the .fit method in Keras can create a ton of problems.
This guide will show you what steps_per_epoch does, how to figure out the correct number of steps, and what happens if you choose steps_per_epoch wrong.
The best way to set steps per epoch in Keras is by monitoring your computer memory and validation scores. If your computer runs out of memory during training, increase the steps_per_epoch parameter. If your training score is high, but your validation score is low, you’ll want to decrease steps_per_epoch as you are overfitting.
Remember, in machine learning, an epoch is one forward pass and backward pass of all the available training data.
If you have a dataset with 2500 lines, once all 2500 lines have been through your neural network’s forward and backward pass, this will count as an epoch.
Continuing with our previous example, we still have 2500 lines in our dataset.
However, what happens if your computer cannot load 2500 lines into memory to train on?
This data needs to be split up.
We will need to split up our dataset into smaller chunks; that way, we can process our input.
Splitting our 2500 lines into two batches of 1250 would work for us, since our computer can handle 1250 lines at a time, and using more than one batch lets us update the weights twice per epoch instead of once.
Understanding features during modeling is important. We wrote Keras Feature Importance to give a good intro so you could understand your models better.
The above process worked great, but what if we don’t always know the size of our training data?
For example, if one epoch is 3000 lines, the next epoch is 3103 lines, and the third epoch is 3050 lines.
These extra 100 lines probably wouldn’t matter much for our model or computer memory, but how would you know what to set the batch_size to?
You could write a function or try some default argument to test if you could figure it out, but it would probably be a waste of development time.
Let’s continue with our example above, where one epoch is 3000 lines, the next epoch is 3103 lines, and the third epoch is 3050 lines.
We have a general idea of the maximum number of rows our machine can handle per batch, but it would be hard to know if the batch size should be 1500 or 1525.
This is where steps_per_epoch comes in.
Instead of hard-coding a batch size, what if we just set steps_per_epoch = 2?
Setting steps_per_epoch within your model allows you to handle any situation where the number of samples in your epoch is different.
We can see that the number of batches has not changed, and even though the batch size has gone up and down, we will still have the same number of batches per epoch.
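Here is a rough sketch of where the parameter goes; the data, model, and layer sizes are placeholders for your own:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# hypothetical training data: 2500 rows, 10 features
x = np.random.rand(2500, 10)
y = np.random.rand(2500)
model = keras.Sequential([layers.Dense(8, activation='relu', input_shape=(10,)),
                          layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
# wrap the arrays in a repeating tf.data pipeline and cap each epoch at 2 steps of 1250 rows
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(1250).repeat()
model.fit(dataset, epochs=5, steps_per_epoch=2)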
Now that we’ve reviewed steps per epoch, how does this affect our model outcomes?
During training, you’ll want to make sure your setup is correct, and one way to test this while you train your model is by using validation data.
Continuing with our example above, we know we want to increase the iterations in each epoch, as this increases the number of times we update the weights in our neural network.
Many newcomers will then ask why don’t we just set steps per epoch equal to the amount of data in our epoch?
This is a great thought, but doing this while training will result in some horrible accuracy problems.
Continuing on the thought above, why can’t we just set the batch size = 1, or steps per epoch equal to the amount of data in the epoch?
Doing this will result in a flawed model, as this model is trained incorrectly and will be overfitting the training data. (Read More)
So, we know we need multiple batches, but we can’t set the number of batches equal to the amount of data.
How do we know how big our batch size is supposed to be?
This is where we use our validation data.
Modeling in machine learning is an iterative process, and very rarely will you get it right on the first try.
One of the keys to modeling correctly is the different layers. Dense Layer is fundamental in machine learning and is something you should probably check out.
It doesn’t matter how long you’ve been at this or how much knowledge you have on the topic; machine learning is (literally) about trial and error.
The best way to find the steps_per_epoch hyperparameter is by testing.
I like to start with a value at around ten and go up or down based on the size of my training data and the accuracy of my validation dataset.
For example, if you start modeling and quickly run out of memory, you need to increase your steps per epoch. This will lower the amount of data being pushed into memory and will (theoretically) allow you to continue modeling.
Starting your model correctly is how you have success during modeling; we will teach you how in Keras Input Shape.
But remember, as we increase this number, we become susceptible to overfitting the training data.
In Keras, there is another parameter called validation_split.
Some others, like Keras Shuffle, are also super important for modeling accuracy.
This value is a decimal value that will tell your Keras model how much of the data to leave out to test against.
Your model will have never seen this data before, and after each epoch, Keras will test your trained model against this validation data.
Let’s say that we set validation_split = .2; this will hold out 20% of the data from our training.
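As a hypothetical call (model, x, and y stand in for your own model and training data), that looks like:
model.fit(x, y, batch_size=400, epochs=5, shuffle=True, validation_split=0.2)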
While modeling, we expect our training accuracy and validation accuracy to be pretty close. (Understanding accuracy)
What happens when our training accuracy is much higher than our scores on the validation data?
This sadly means we have overfit on our data and need to make changes in the model to combat this.
To make things clear, I do want to say that there are many reasons why you can overfit a model, and steps_per_epoch is just one of them.
But one of the first things that I do if I am overfitting during modeling is reduce the number of steps_per_epoch that I previously set.
This will increase the amount of data in each iteration, hopefully keeping the model from overlearning some of the noise in our data.
While the most straightforward layer, the dense layer is still vital in any neural network design and is one of the most commonly used layers.
Below we will be breaking down the output generated from a dense layer, input arrays, and the difference between a dense layer versus some other layers.
Layers are made of nodes, and the nodes provide an environment to perform computations on data.
In simpler terms, think of a neural network as a stadium, a layer as a row of seats in a stadium, and a node as each seat.
A node combines the inputs of a data set with a weighted coefficient which either increases or dampens inputs.
These rows and seats work together to get us to the final output layer, which will contain our final answers (based on how we defined the previous layer)
Keras is a Python API that runs on top of the Machine Learning Platform Tensorflow.
Keras enables users to add several prebuilt layers in different Neural network architectures.
When TensorFlow was initially released, it was pretty challenging to use.
Learning any Machine Learning framework will not be easy, and there will always be a learning curve, but early TensorFlow was pretty low-level and took a ton of time to learn.
Keras is a python library that builds on top of TensorFlow, with a user-friendly interface, faster production deployment, and faster initial development of machine learning models.
Using Keras makes the overall experience of TensorFlow easier.
Realize, before 2017, Keras was only a stand-alone API.
Now, TensorFlow has fully integrated Keras, but you can still use the Keras API by itself, and the stand-alone API usually is more up-to-date with newer features.
Understanding features during modeling is important. We wrote Keras Feature Importance to give a good intro so you could understand your models better.
Keras Layers are the building blocks of the whole API.
We will stack these layers together to create our models, but you could also have a single dense layer that acts as something as simple as a linear regression model or multiple dense layers (with a hidden layer) to create a neural network.
Changing one of the layers in a neural network will change the results in the final output arrays.
The core layers within the Keras API include the Input, Dense, Activation, Embedding, Masking, and Lambda layers.
The Dense Layer is the most commonly used, and there is some slight overlap in these Keras layers.
For example, a parameter passed within a dense layer can be the activation function, or you can pass an activation function as a layer in a sequential model.
In future posts, we will be going more in-depth into activation functions and other deep learning model features. More information on modeling can be found here at steps per epoch keras.
The dense layer performs the following calculation
outputs = activation(dot(input, kernel) + bias)
Let’s break this down a bit (from the inside out).
Your input data passed will be as a matrix into your dense layer.
If your input data, for example, was a data frame with m rows and n columns, your matrix will have the same m rows and n columns, just without the column identifiers.
We go over input data in-depth and much more about Keras in our other post, Keras Shuffle.
Each Kernel weight matrix is specific to that dense layer and node (think about row number and seat number).
The kernel weights matrix is the heart of the neural network; as the data progresses from dense to dense layers, these weights will be updated based on backpropagation. (Learn more)
The Kernel weights matrix is updated after every run, and the new weights matrix created will contain new weights to multiply the input data by.
The weight matrix is crucial to understand. Many newcomers to machine learning have trouble understanding the vector shape needed to do the dot product between the input data and weight matrix.
For a single input row and a single node, the output of the dot product between the input and that node's kernel weights will be a single scalar value.
This throws some people off who are expecting another matrix from the dot product and are unfamiliar with the differences. (See the difference).
The value received from this dot product of the Input and Kernel is the value that will be passed onward in your neural network before applying any bias to it.
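To see why the per-node dot product collapses to a single scalar, here is a quick NumPy sketch with made-up numbers for one input row and one node’s kernel weights.

```python
import numpy as np

# One input row with 4 features (hypothetical values).
input_row = np.array([0.5, -1.0, 2.0, 3.0])

# Kernel weights for a single node in the dense layer (hypothetical values).
kernel = np.array([0.1, 0.4, -0.2, 0.3])

# The dot product multiplies element-wise and sums, producing one scalar per node.
pre_bias = np.dot(input_row, kernel)
print(pre_bias)  # 0.05 - 0.4 - 0.4 + 0.9 = 0.15
```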
To understand the bias vector, let’s go back to one of the most simple fundamentals of mathematics.
y = mx + B
Now, I know it isn’t talked about much, but that B term is the bias of the line.
Understanding the effect of bias is much simpler when you can see it in action.
In our first picture, even though the slope is the same, the line never goes through (2, 2); if the function we’re trying to fit needs to predict the point (2, 2), that wouldn’t be possible.
However, once we add bias, our function goes right through the point (2, 2) and would give us that exact prediction for the input x = 2.
A bias vector follows the same logic; instead of a single bias term, there are n bias terms, one for each node in the layer.
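Here is a short sketch of that same idea at the layer level, again with hypothetical numbers: the bias vector has one entry per node and simply shifts each node’s pre-activation value, just as B shifts the line y = mx + B.

```python
import numpy as np

# Pre-activation values for a layer with 3 nodes (hypothetical dot-product results).
pre_activation = np.array([0.15, -0.6, 1.2])

# One bias term per node, learned alongside the kernel weights.
bias = np.array([0.05, 0.4, -0.2])

# Adding the bias shifts each node's value before the activation is applied.
shifted = pre_activation + bias
print(shifted)  # [ 0.2 -0.2  1. ]
```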
Here is a list of the different dense layer activation functions available in Keras, which include relu, sigmoid, softmax, softplus, softsign, tanh, selu, elu, and exponential.
We know how the inside of our dense layer formula works; the last part is the activation function.
Remember, after our bias is applied, we will have a vector.
So, we have
outputs = activation(vector)
Our activation can be any of the functions listed above; we will select the relu activation function for this example.
The relu function will take each value in the vector and keep it if it’s above zero or replace it with zero if not.
Starting your model correctly is how you have success during modeling; we will teach you how in Keras Input Shape.
For example, an input vector = [-1, 2, -4, 2, 4] (after our dot product and applying our bias vector) will become the output vector = [0, 2, 0, 2, 4], with the same output shape.
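You can verify that relu behavior with a quick sketch; tf.keras.activations.relu and a plain NumPy maximum give the same result.

```python
import numpy as np
import tensorflow as tf

vector = np.array([-1.0, 2.0, -4.0, 2.0, 4.0])

# relu keeps positive values and replaces negatives with zero.
output = tf.keras.activations.relu(vector).numpy()
print(output)  # [0. 2. 0. 2. 4.]

# Equivalent NumPy one-liner:
print(np.maximum(vector, 0))  # [0. 2. 0. 2. 4.]
```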
Dense layers are typically hidden because a neural network is initialized with an input layer and the outputs come from an output layer; the dense layers in the middle are not directly accessible, which is why they are called hidden.
A fully connected layer has weights connected to all of the output values from the previous layer, while a hidden layer is simply any layer that is neither the input nor the output layer. A fully connected layer can be a hidden layer, but the two can also exist separately.
A densely connected layer is another name for a dense layer. A dense layer is densely connected to the output of the layer before it, whether that is an input layer or another dense layer.
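To make the hidden versus input/output distinction concrete, here is a hedged sketch with hypothetical layer sizes; the Dense layers sitting between the Input and the output layer are the hidden ones.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(10,))                     # input layer: defines the incoming shape
x = layers.Dense(32, activation="relu")(inputs)       # hidden, fully connected dense layer
x = layers.Dense(16, activation="relu")(x)            # another hidden dense layer
outputs = layers.Dense(1, activation="sigmoid")(x)    # output layer (also a dense layer here)

model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()  # the Dense layers between the input and the output are the hidden ones
```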
You might be asking yourself, “Should I use a recruiter to hire software engineers?” We’re here to guide you through this critical decision-making process.
Feeling overwhelmed by the endless resumes, technical jargon, and candidate assessments? You’re not alone. The pain of sifting through countless applications only to find mismatched skills can be frustrating. Let us help alleviate your recruitment woes and streamline the process for you.
With years of experience in the tech industry, we’ve seen firsthand the challenges that come with hiring software engineers. Trust our expertise to provide you with useful insights and recommendations on whether using a recruiter is the right move for your company. Let’s dive into this recruitment journey together and find the best talent for your team.
When considering whether to use a recruiter in software engineering, it’s critical to understand the significant role they play in the hiring process. Recruiters specialize in sourcing, screening, and selecting top talent for companies, aligning candidates’ skills with the organization’s needs. Here are key points to consider:
Recruiters who specialize in software engineering can streamline the recruitment process, reducing the burden on internal teams and increasing the chances of finding top talent efficiently.
To dig deeper into the role of a recruiter in software engineering, you can refer to this insightful article on The Balance Careers.
When considering hiring a software engineer, using a recruiter can offer numerous advantages.
Here are some key benefits of using a recruiter for your software engineering hiring needs:
When weighing the benefits of using a recruiter to hire software engineers, it becomes clear that their expertise and resources can significantly improve your recruitment process.
When contemplating whether to use a recruiter for hiring software engineers, there are several considerations to keep in mind:
Before making a decision, it’s critical to carefully weigh these factors and determine if using a recruiter fits your hiring objectives and budget constraints.
Ultimately, making an informed choice ensures a successful recruitment process.
For more ideas on the considerations before using recruiter services, you can visit Recruiter.com for additional information and tips.
Feel confident in your decision-making by arming yourself with the necessary knowledge and considerations before enlisting the services of a recruiter.
When working with a recruiter to hire software engineers, there are some best practices that can improve the hiring process.
Here are a few tips to optimize collaboration and achieve successful outcomes:
By following these best practices, we can strengthen collaboration with recruiters and improve the recruitment of software engineers effectively.
For further ideas on optimizing recruitment strategies, we recommend visiting Recruiter.com for useful resources and tips.
When considering whether to use a recruiter for software engineer hiring, evaluating the Return on Investment (ROI) is critical.
Recruiter services can be a useful asset in finding top talent efficiently, but it’s super important to weigh the costs against the benefits.
Here are some key factors to consider when assessing the ROI of engaging a recruiter for software engineer recruitment:
When considering the ROI of recruiter services for software engineer hiring, thinking through these factors can help us make smart decisions that align with our hiring goals and budget.
For more information on optimizing recruitment strategies, visit Recruiter.com.
We understand the hustle of balancing a full-time job with the desire for additional earnings.
The good news is that there are lucrative opportunities waiting for you to explore.
Feeling the pinch of financial constraints despite your skills in software development? We get it. The struggle to make ends meet while honing your craft can be real. But fret not, as we’ve got ideas that can turn your spare time into a profitable venture. Our experience in the tech industry positions us to guide you through the process of using your coding prowess for extra income.
Imagine having the freedom to earn more on your own terms, without compromising your current job. As fellow software engineers, we know the value of maximizing our skills outside the 9-5 grind. Join us as we explore practical strategies tailored to your needs, paving the way for a successful side hustle.
When it comes to making extra money on the side as a software engineer, Exploring Freelancing Opportunities can be a lucrative avenue. Freelancing allows us to use our technical skills to work on projects outside our full-time job. Platforms like Upwork and Freelancer connect us with clients seeking programming assistance, offering a flexible way to earn additional income.
As software engineers, we can showcase our expertise in various areas such as web development, mobile app development, or software testing on these platforms.
By setting up a strong profile highlighting our skills and past projects, we can attract potential clients and secure freelance gigs.
It’s critical to deliver high-quality work consistently to build a good reputation and gain repeat business in the freelancing world.
Also, freelancing allows us to choose projects that align with our interests and schedule.
We have the freedom to negotiate rates based on the complexity of the task and our time commitment.
This flexibility enables us to manage our workload effectively, balancing our full-time job with side projects for a steady income stream.
By tapping into freelancing opportunities, we can monetize our coding skills and expand our professional network while maximizing our earning potential.
It’s a rewarding way for us as software engineers to turn our spare time into a profitable venture without added stress.
Remember, consistency is key in building a successful freelancing career.
With dedication and a strategic approach, we can unlock exciting opportunities beyond our regular job.
When looking to make extra income, Building and Selling Software Products can be a lucrative venture for software engineers.
Developing software products that solve specific problems or meet the needs of a target audience can generate passive income over time.
By identifying niche markets or gaps in existing software solutions, we can create products with high demand and profit potential.
Platforms like Amazon Web Services offer cost-effective hosting solutions for launching and scaling software products.
To increase visibility and drive sales, we can use social media platforms and digital marketing strategies to promote our software products.
Using search engine optimization (SEO) techniques can improve the discoverability of our products online.
We can also explore partnerships with influencers or collaborate with other developers to reach a wider audience.
Platforms like Product Hunt also provide a space to showcase our software products to a tech-savvy community and gather feedback for improvements.
By continuously iterating on our products based on user suggestions and market trends, we can improve the value proposition and attract more customers.
Remember, Building and Selling Software Products requires dedication and continuous effort to ensure the success and profitability of our creations.
When looking to make extra money on the side, Teaching Coding or Tech Courses can be a lucrative option for software engineers.
Sharing your skill with others not only helps them learn useful skills but also allows you to earn income in the process.
You can offer online courses on platforms like Udemy or Coursera, or even conduct in-person workshops and training sessions in your local area.
By teaching coding or tech courses, you can establish yourself as an authority in your field and build credibility within the tech community.
It also provides an opportunity to network with like-minded individuals and stay updated on the latest industry trends.
Also, teaching can be a rewarding experience, as you witness your students’ growth and success.
Consider creating specialized courses catering to specific tech skills or in-demand programming languages.
Promote your courses through social media channels, tech forums, and your personal network to reach a wider audience.
Don’t forget to ask your students for feedback so you can continually improve your courses and enhance the learning experience.
Taking the initiative to teach coding or tech courses not only helps you diversify your income streams but also allows you to give back to the tech community by empowering others with useful knowledge and skills.
Start sharing your skill today!
Engaging in hackathons and coding competitions is not simply a thrilling activity but also a fantastic way for software engineers to earn extra income.
These events challenge us to solve real-world problems under tight deadlines, showcasing our skills and creativity.
Winning a hackathon can lead to cash prizes, job offers, or even funding for our own projects.
Participating in these events not only boosts our technical skills but also expands our professional network.
We get the chance to collaborate with like-minded individuals, learn from others, and showcase our problem-solving abilities.
Also, some companies use hackathons as recruitment opportunities, giving participants the chance to secure freelance projects or full-time positions.
By actively engaging in hackathons and competitions, we not only improve our income opportunities but also gain useful experience, recognition, and connections in the tech industry.
So, keep an eye out for upcoming hackathons in your area and on online platforms, and get ready to take on exciting challenges!
For more information on upcoming hackathons and coding competitions, check out this hackathon calendar.
When looking to make extra money on the side as a software engineer, affiliate marketing in the tech niche can be a lucrative avenue to explore.
By promoting products or services relevant to the tech industry through affiliate links, we can earn commissions for every sale or lead generated.
To excel in affiliate marketing, it’s essential to select products that align with our expertise and our audience’s interests.
By using our knowledge in the tech field, we can effectively market products to our network.
Also, creating high-quality content such as reviews, tutorials, or comparisons can attract more clicks and conversions.
Building a strong online presence through blogs, social media, or YouTube channels can significantly improve our affiliate marketing success.
Engaging with our audience and providing useful ideas can help establish credibility and trust, leading to higher conversion rates.
Also, joining reputable affiliate programs like Amazon Associates, ClickBank, or Shareable can provide access to an abundance of tech products to promote.
Tracking performance metrics and optimizing strategies based on data analysis is critical to maximizing earnings in affiliate marketing.
It’s critical to stay updated on trends in the tech industry to identify new opportunities for affiliate partnerships and higher income potential.
For more information on successful affiliate marketing strategies, you can check out this informative guide on affiliate marketing tips.