This blog post will discuss the final portion of our first project in our Dataset2API series, where we will build a working API.
This project focuses on creating a salary API that will allow users to query for average salaries based on job titles, locations, years of experience, and other attributes.
We will use Python and Pandas for data analysis and manipulation, Sklearn and XGBoost for modeling, and Flask (Restful) for web development.
Complete code is available here in this GitHub repo, but each post will walk through the process I personally go through to take a dataset and a problem to an API.
This is part four of Project 1; you can view the other posts in this project here:
Building Our Data Science Salary API
Now that everything is in order, we can finally start building our API.
Before we start, here is the complete code in case you want to copy a piece of it.
We will break down each of the sections below.
from flask import Flask, jsonify, request, make_response
from flask_restful import Resource, Api
import joblib
import pandas as pd

# load in our model
loaded_model = joblib.load('finalized_model.sav')
# load in our encoder
encoder = joblib.load('OHEencoder.sav')

# creating the flask app
app = Flask(__name__)
# creating an API object
api = Api(app)

class Prediction(Resource):
    def get(self):
        '''
        returns the structure of the request needed for our
        post request
        '''
        # tell how to make requests to our API (during posts)
        return make_response(jsonify({
            'experience_level': ['SE', 'MI', 'EN', 'EX'],
            'employment_type': ['FT', 'PT', 'CT', 'FL'],
            'company_size': ['S', 'M', 'L'],
            'role': ['**Job Title**'],
            'residence': ['2-letter country code (US, GB, etc.)'],
            'remote%': ['0', '50', '100']
        }), 201)

    def post(self):
        '''
        retrieves payload
        sends back model prediction
        '''
        # grab the payload data sent
        data = request.get_json()
        # make sure we have all of our columns
        if 'experience_level' not in data \
                or 'employment_type' not in data \
                or 'company_size' not in data \
                or 'role' not in data \
                or 'residence' not in data \
                or 'remote%' not in data:
            return make_response(jsonify({'message': 'Missing A Category'}), 400)

        # convert the roles
        # the exact same way we did
        # in training
        def convertJob(text):
            '''
            converts job titles to a form the model can understand
            '''
            if 'lead' in text.lower() or 'manager' in text.lower() or 'director' in text.lower() or 'head' in text.lower():
                return 'LDR'
            if 'machine' in text.lower() or 'ai ' in text.lower() or 'vision' in text.lower():
                return 'ML'
            if 'scientist' in text.lower() or 'analytics' in text.lower() or 'science' in text.lower():
                return 'DS'
            if 'analyst' in text.lower():
                return 'AL'
            if 'engineer' in text.lower():
                return 'DE'
            return 'OTHR_ROLE'

        # convert residence
        # the exact same way we did
        # in training
        def convertResidence(text):
            '''
            converts user input of residence so the model can understand it
            '''
            if len(text) != 2:
                return 'OTHR_RES'
            approved = ['US', 'GB', 'IN', 'CA', 'DE', 'FR', 'ES', 'GR', 'JP']
            if text.upper() in approved:
                return text
            return 'OTHR_RES'

        # convert remote work
        # the exact same way we did
        # in training
        def ConvertRemote(percentage):
            if int(percentage) > 50:
                return 'Remote'
            if int(percentage) < 50:
                return 'Office'
            return 'Hybrid'

        # build out a prediction dictionary, using our functions
        # that we used during training
        user_dict = {
            'experience_level': data['experience_level'],
            'employment_type': data['employment_type'],
            'company_size': data['company_size'],
            'roles_converted': convertJob(data['role']),
            'residence_converted': convertResidence(data['residence']),
            'remote_converted': ConvertRemote(data['remote%'])
        }
        # convert our dictionary to a dataframe
        df = pd.DataFrame([user_dict])
        # use our encoder from training
        encoded_df = pd.DataFrame(encoder.transform(df).toarray())
        # now use our model from training for a prediction
        pred = loaded_model.predict(encoded_df)
        # return our prediction as JSON
        return make_response(jsonify({'prediction': str(pred[0])}), 201)

api.add_resource(Prediction, '/pred')

# driver function
if __name__ == '__main__':
    app.run(debug=True)
Correct Python Packages
You’ll want to create a virtual environment in the project directory for the packages.
Whatever package versions you used for training, install the identical versions for your API.
This will help avoid any weird interactions.
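One common way to keep those versions identical is to pin them in a requirements file. A sketch, assuming a Unix shell with Python 3 available (file and directory names here are just conventions, not something from the repo):

```shell
# inside the project directory, create and activate a virtual environment
python3 -m venv venv
. venv/bin/activate
# in the training environment, record the exact package versions...
pip freeze > requirements.txt
# ...then install those same versions wherever the API runs
pip install -r requirements.txt
```
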
The first part of our API, our imports, is here:
from flask import Flask, jsonify, request, make_response
from flask_restful import Resource, Api
import joblib
import pandas as pd
We use a very lean setup for this simple API.
Loading Our Model and Encoder, and Starting the API
You’ll have to get your encoder and model (from training) into the same folder you’re running your API from.
Usually, the easiest way to do this is through a GitHub repo.
Also, Flask apps are always started with the two lines shown below.
# load in our model
loaded_model = joblib.load('finalized_model.sav')
# load in our encoder
encoder = joblib.load('OHEencoder.sav')
# creating the flask app
app = Flask(__name__)
# creating an API object
api = Api(app)
REST API GET Request
One thing I like to do for my models is to create a GET request that returns the “structure” of our POST request.
This simple function lets users quickly see what data they need to provide in the POST request.
Here is the example script.
def get(self):
    '''
    returns the structure of the request needed for our
    post request
    '''
    # tell how to make requests to our API (during posts)
    return make_response(jsonify({
        'experience_level': ['SE', 'MI', 'EN', 'EX'],
        'employment_type': ['FT', 'PT', 'CT', 'FL'],
        'company_size': ['S', 'M', 'L'],
        'role': ['**Job Title**'],
        'residence': ['2-letter country code (US, GB, etc.)'],
        'remote%': ['0', '50', '100']
    }), 201)
If a user pings our GET request, they’ll immediately know how to talk to our API.
RESTful API POST Request
Next, we have our post request.
This will be the main piece of our project.
When the user sends the correct data in the payload, our model will return a prediction.
The API starts with some payload checking to ensure all the correct data is passed.
def post(self):
    '''
    retrieves payload
    sends back model prediction
    '''
    # grab the payload data sent
    data = request.get_json()
    # make sure we have all of our columns
    if 'experience_level' not in data \
            or 'employment_type' not in data \
            or 'company_size' not in data \
            or 'role' not in data \
            or 'residence' not in data \
            or 'remote%' not in data:
        return make_response(jsonify({'message': 'Missing A Category'}), 400)
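As an aside, the chain of `not in` checks can be written more compactly with a set difference. A sketch, using a hypothetical incomplete payload:

```python
# required payload fields, as a set
REQUIRED = {'experience_level', 'employment_type', 'company_size',
            'role', 'residence', 'remote%'}

# hypothetical payload missing several fields
data = {'experience_level': 'SE', 'role': 'Data Scientist'}

# set difference yields exactly the missing keys
missing = REQUIRED - data.keys()
print(sorted(missing))  # ['company_size', 'employment_type', 'remote%', 'residence']
```

Returning the missing keys in the 400 response would also tell the caller exactly which fields to add.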
Once this if statement passes, we continue on in our API.
We use the same functions as before to convert our data to a form our model can read.
If you’ve been following along, you’ll recognize the functions below.
# convert the roles
# the exact same way we did
# in training
def convertJob(text):
    '''
    converts job titles to a form the model can understand
    '''
    if 'lead' in text.lower() or 'manager' in text.lower() or 'director' in text.lower() or 'head' in text.lower():
        return 'LDR'
    if 'machine' in text.lower() or 'ai ' in text.lower() or 'vision' in text.lower():
        return 'ML'
    if 'scientist' in text.lower() or 'analytics' in text.lower() or 'science' in text.lower():
        return 'DS'
    if 'analyst' in text.lower():
        return 'AL'
    if 'engineer' in text.lower():
        return 'DE'
    return 'OTHR_ROLE'

# convert residence
# the exact same way we did
# in training
def convertResidence(text):
    '''
    converts user input of residence so the model can understand it
    '''
    if len(text) != 2:
        return 'OTHR_RES'
    approved = ['US', 'GB', 'IN', 'CA', 'DE', 'FR', 'ES', 'GR', 'JP']
    if text.upper() in approved:
        return text
    return 'OTHR_RES'

# convert remote work
# the exact same way we did
# in training
def ConvertRemote(percentage):
    if int(percentage) > 50:
        return 'Remote'
    if int(percentage) < 50:
        return 'Office'
    return 'Hybrid'
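One detail worth noting: `ConvertRemote` maps exactly 50% to 'Hybrid', anything above to 'Remote', and anything below to 'Office'. A quick standalone check (the function copied verbatim so the snippet runs on its own):

```python
def ConvertRemote(percentage):
    # >50 is remote, <50 is office, exactly 50 is hybrid
    if int(percentage) > 50:
        return 'Remote'
    if int(percentage) < 50:
        return 'Office'
    return 'Hybrid'

print(ConvertRemote('100'))  # Remote
print(ConvertRemote('0'))    # Office
print(ConvertRemote('50'))   # Hybrid
```
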
Now that we have these functions set up and ready to be used, we call them while building the dictionary that becomes our prediction input.
# build out a prediction dictionary, using our functions
# that we used during training
user_dict = {
    'experience_level': data['experience_level'],
    'employment_type': data['employment_type'],
    'company_size': data['company_size'],
    'roles_converted': convertJob(data['role']),
    'residence_converted': convertResidence(data['residence']),
    'remote_converted': ConvertRemote(data['remote%'])
}
We use the payload data (stored in the dictionary data from above) either directly or after passing it through a cleaning function.
Finally, we build out a data frame, use our encoder from training to build our array with the same columns as during training, get a prediction, and return it to the user.
# convert our dictionary to a dataframe
df = pd.DataFrame([user_dict])
# use our encoder from training
encoded_df = pd.DataFrame(encoder.transform(df).toarray())
# now use our model from training for a prediction
pred = loaded_model.predict(encoded_df)
# return our prediction as JSON
return make_response(jsonify({'prediction': str(pred[0])}), 201)
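If the saved encoder is a scikit-learn `OneHotEncoder` (which the `OHEencoder.sav` name suggests, though treat that as an assumption), the transform step behaves like this minimal sketch with made-up training data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# made-up two-row "training" frame standing in for the real training data
train = pd.DataFrame({'experience_level': ['SE', 'EN'],
                      'company_size': ['L', 'S']})
enc = OneHotEncoder(handle_unknown='ignore').fit(train)

# a single-row frame, just like the API builds from user_dict
row = pd.DataFrame([{'experience_level': 'SE', 'company_size': 'S'}])
encoded_df = pd.DataFrame(enc.transform(row).toarray())
print(encoded_df.shape)  # (1, 4): one column per category seen in training
```

Because the encoder was fitted during training, the transformed row always has exactly the columns the model expects, which is why we persist and reload it rather than fitting a new one.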
While this is the “meat” of the project, the rest just registers the resource with the API and runs the app.
api.add_resource(Prediction, '/pred')

# driver function
if __name__ == '__main__':
    app.run(debug=True)
Using Python Requests Module To Test Our API
You can’t call an API finished without testing!
Here is our test script:
import requests

def run_test():
    # test get
    _test_get = requests.get('http://127.0.0.1:5000/pred')
    # print our json
    print(_test_get.json())
    print('\n\n\n\n')

    # test post
    payload = {
        'experience_level': 'EX',
        'employment_type': 'PT',
        'company_size': 'S',
        'role': 'Financial Analyst',
        'residence': 'US',
        'remote%': '100'
    }
    r = requests.post('http://127.0.0.1:5000/pred', json=payload)
    print(r.json(), '\n\n\n')

if __name__ == '__main__':
    # example tests
    run_test()
For our first test, let’s see how much an entirely remote financial analyst gets paid.
We send this data within our payload:
payload = {
    'experience_level': 'EX',
    'employment_type': 'PT',
    'company_size': 'S',
    'role': 'Financial Analyst',
    'residence': 'US',
    'remote%': '100'
}
Quite low.
Let’s see how a full-time remote machine-learning engineer does.
payload = {
    'experience_level': 'EX',
    'employment_type': 'FT',
    'company_size': 'L',
    'role': 'Machine Learning Engineer',
    'residence': 'US',
    'remote%': '100'
}
Machine learning engineers are paid much closer to the top of the distribution!
Now, wouldn’t it be nice to know what salaries those normalized values correspond to?
It’s pretty straightforward; head over to our guide on reverse standardization, and we’ll show you how to turn those numbers back into salary data!
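For a quick flavor of what that guide covers: assuming the salary target was standardized with scikit-learn’s `StandardScaler` during training (an assumption; your pipeline may normalize differently), `inverse_transform` maps a normalized prediction back to the original scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up salary column standing in for the real training target
salaries = np.array([[50000.0], [100000.0], [150000.0]])
scaler = StandardScaler().fit(salaries)

# a "prediction" in normalized units...
normalized_pred = scaler.transform([[100000.0]])
# ...mapped back to dollars
print(scaler.inverse_transform(normalized_pred))  # [[100000.]]
```

The key is to save the fitted scaler alongside the model, just like we did with the encoder.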
Next Steps
This is, sadly, the last post in this project.
If you’re interested in a previous section, you can check those out here:
If you’re interested in doing more with this API, try adding security (authorization and authentication) and deploying it!