{"id":1120,"date":"2024-02-22T12:46:09","date_gmt":"2024-02-22T12:46:09","guid":{"rendered":"https:\/\/enjoymachinelearning.com\/?p=1120"},"modified":"2024-02-22T12:46:09","modified_gmt":"2024-02-22T12:46:09","slug":"criterion-vs-predictor","status":"publish","type":"post","link":"https:\/\/enjoymachinelearning.com\/blog\/criterion-vs-predictor\/","title":{"rendered":"Machine Learning 101: Criterion vs Predictor (With Coded Examples)"},"content":{"rendered":"
In data science, there are many different ways to slice the pie. While many refer to independent and dependent variables differently, they usually mean the same thing.<\/span><\/p>\n Your predictor variables are your independent variables, and with these, you’ll (hopefully) be able to predict your criterion variable (dependent variable).<\/span><\/p>\n In the rest of this 3-minute guide<\/strong>, we’ll go over a deep-dive into criterion vs. predictor variables<\/strong>, what each of these means, <\/strong>and supply you with some code <\/strong>at the bottom to show you how to split each of these out in the python coding language<\/strong>.<\/span><\/p>\n This one is embarrassing to mess up, but don’t worry; we’ve got your back.<\/span><\/p>\n Simply, a criterion variable is a variable we’re trying to predict. Many machine learning projects refer to this as Y or as our target variable<\/a>. <\/span><\/p>\n The best way to identify your criterion variable is to identify the variable that you care about.<\/span><\/p>\n In a business context, this will be the variable that most closely resembles the problem you’re trying to solve. <\/span><\/p>\n For example, if your boss wants to build a model that can predict future sales of your company’s product, the criterion variable is the variable that most closely resembles sales.<\/span><\/p>\n It’s worth noting that a criterion variable could be a relationship of multiple variables (Or something much more complicated<\/a>). <\/span><\/p>\n Let’s say your boss wants you to do a research study for your company and needs you to look at the average amount spent per stock in the last six days.<\/span><\/p>\n You open your data and have these variables:<\/span><\/p>\n This new variable will help us explain our solution, and we can describe the correlation<\/a> and relationship between the other predictor variables in-depth.<\/span><\/p>\n A predictor variable is a variable used to predict another variable’s value, and these values can be utilized in both classification and regression. In most machine learning and statistics projects, there will be many predictor variables, as accuracy usually increases with more data.<\/span><\/p>\n For example, if you want to predict the price of a house, the predictor variables might be the size of the house, the number of bedrooms, the number of bathrooms, and the location.<\/span><\/p>\n <\/p>\n Once our target (criterion variable<\/strong>) is split from our independent variables (predictor variables<\/strong>), many data scientists will refer to this batch of predictor variables as just variables.<\/span><\/p>\n The main difference between a criterion variable and a predictor variable is that a predictor variable is used to find the values of the criterion variable. While there can be many predictor variables in a project, there is usually only a single criterion variable.<\/span><\/p>\n One of the most critical steps in any project design or machine learning project is understanding the business context and how that relates to selecting the correct criterion variable.<\/span><\/p>\n Since we now know that predictor variables are variables used to predict the value of a criterion variable, we can discuss the different types of predictor variables that exist.<\/span><\/p>\n There are two main types of predictor variables: categorical and quantitative.<\/span><\/p>\n Categorical predictor variables are those that can be divided into groups or categories. For example, a categorical predictor variable could be color, with the categories: red, blue, and green.<\/span><\/p>\n Quantitative predictor variables are those that can be quantified or measured. For example, a quantitative predictor variable could be age, with the values being the ages of different individuals.<\/span><\/p>\n Your boss wants you to build the most accurate model you can to predict what someone’s salary (in dollars) will be.<\/span><\/p>\n Your boss has provided you with the dataset below:<\/span><\/p>\n You quickly notice that a good criterion variable would be salary_in_usd, and you split that out from the rest of the data.<\/span><\/p>\n Our Criterion Variable:<\/span><\/p>\n Our Predictor Variables (Mix of Quantitative and Categorical)<\/span><\/p>\n Now that you have your datasets, you’re ready to start modeling!<\/span><\/p>\n Criterion variables do not exist in unsupervised learning. Since unsupervised<\/a> learning does not have labeled data, we do not have a dependent variable (criterion variable). These projects will only have predictor variables that we use to try to draw insights.<\/span><\/p>\n We have many quick guides that go over some of the fundamental parts of machine learning. Some of those guides include:<\/span><\/p>\n In data science, there are many different ways to slice the pie. While many refer to independent and dependent variables differently, they usually mean the same thing. Your predictor variables are your independent variables, and with these, you’ll (hopefully) be able to predict your criterion variable (dependent variable). In the rest of this 3-minute guide, … <\/p>\n <\/picture><\/p>\n
\nWhat is a criterion variable?<\/span><\/h2>\n <\/picture><\/span><\/p>\n
\nIn this scenario, we’ll have to find a way to combine our two variables to get the outcome that we need.<\/span><\/p>\n <\/picture><\/span><\/p>\n
\nNow that we’ve combined our two variables, the result we get is our official criterion variable.<\/span><\/p>\n
\nWhat is a Predictor variable?<\/span><\/h2>\n <\/picture>
\nThe predictor variable sometimes is referred to as the independent variable. Since we know in machine learning projects, there is generally more than one predictor variable; these are sometimes referred to as “X.”<\/span><\/p>\n
\nWhat is the difference between a criterion variable and a predictor variable?<\/span><\/h2>\n
\nWhat do a Criterion Variable and Predictor Variable Look Like in a Machine Learning Project?<\/span><\/h2>\n
\nPython Example of Criterion and Predictor Variables<\/span><\/h3>\nimport pandas as pd\n\ndf = pd.read_csv('ds_salaries.csv')\n\n\ndf.head()\n<\/code><\/pre>\n
<\/picture><\/p>\n
## split our predictor and criterion variables\n\ncriterion_variable = df[['salary_in_usd']]\n\npredictor_variables = df[['experience_level','employment_type','job_title',\n 'salary_currency','employee_residence','remote_ratio',\n 'company_location','company_size']]\n<\/code><\/pre>\n
<\/picture><\/span><\/p>\n
<\/picture><\/p>\n
\nDo Criterion Variables Exist in Unsupervised Learning?<\/span><\/h2>\n
\nOther Articles in our Machine Learning 101 Series<\/span><\/h2>\n\n