{"id":1120,"date":"2024-02-22T12:46:09","date_gmt":"2024-02-22T12:46:09","guid":{"rendered":"https:\/\/enjoymachinelearning.com\/?p=1120"},"modified":"2024-02-22T12:46:09","modified_gmt":"2024-02-22T12:46:09","slug":"criterion-vs-predictor","status":"publish","type":"post","link":"https:\/\/enjoymachinelearning.com\/blog\/criterion-vs-predictor\/","title":{"rendered":"Machine Learning 101: Criterion vs Predictor (With Coded Examples)"},"content":{"rendered":"<p><span style=\"font-size: 18pt;\">In data science, there are many different ways to slice the pie. While many refer to independent and dependent variables differently, they usually mean the same thing.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Your predictor variables are your independent variables, and with these, you&#8217;ll (hopefully) be able to predict your criterion variable (dependent variable).<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">In the rest of this <strong>3-minute guide<\/strong>, we&#8217;ll go over a deep-dive into <strong>criterion vs. 
predictor variables<\/strong>, what <strong>each of these means, <\/strong>and supply you with some <strong>code <\/strong>at the bottom to show you <strong>how to split each of these out in Python<\/strong>.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">This one is embarrassing to mess up, but don&#8217;t worry; we&#8217;ve got your back.<\/span><\/p>\n<p><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/got-your-back-1024x641.webp \"  type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/got-your-back-1024x641.jpg\" height=\"347\" width=\"621\" class=\" wp-image-1133 aligncenter sp-no-webp\" alt=\"pointing at you\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nWhat is a criterion variable?<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">Put simply, a criterion variable is the variable we&#8217;re trying to predict. Many machine learning projects refer to this as Y or as our <a href=\"https:\/\/www.sciencedirect.com\/topics\/nursing-and-health-professions\/target-variable\" target=\"_blank\" rel=\"noopener\">target variable<\/a>. <\/span><\/p>\n<p><span style=\"font-size: 18pt;\">The best way to identify your criterion variable is to ask which variable you ultimately care about.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">In a business context, this will be the variable that most closely resembles the problem you&#8217;re trying to solve. 
<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">For example, if your boss wants to build a model that can predict future sales of your company&#8217;s product, the criterion variable is the variable that most closely resembles sales.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">It&#8217;s worth noting that a criterion variable could be a relationship of multiple variables (Or something <a href=\"https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.331.8019&amp;rep=rep1&amp;type=pdf\" target=\"_blank\" rel=\"noopener\">much more complicated<\/a>). <\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Let&#8217;s say your boss wants you to do a research study for your company and needs you to look at the average amount spent per stock in the last six days.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">You open your data and have these variables:<\/span><\/p>\n<p><span style=\"font-size: 18pt;\"><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic.webp 304w,https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic-300x142.webp 300w\" sizes=\"(max-width: 441px) 100vw, 441px\" type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic.jpg\" height=\"210\" width=\"441\" srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic.jpg 304w, https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic-300x142.jpg 300w\" sizes=\"(max-width: 441px) 100vw, 441px\" class=\"wp-image-1122 aligncenter sp-no-webp\" alt=\"money spent on stocks per day\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/span><\/p>\n<p><span style=\"font-size: 18pt;\"><br \/>\nIn this scenario, we&#8217;ll have to find a way to combine our two variables to get the outcome that we need.<\/span><\/p>\n<p><span style=\"font-size: 
18pt;\"><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic2.webp \"  type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_stockpricepic2.jpg\" height=\"238\" width=\"647\" class=\" wp-image-1123 aligncenter sp-no-webp\" alt=\"money spent on stocks each day with column breakout\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/span><\/p>\n<p><span style=\"font-size: 18pt;\"><br \/>\nNow that we&#8217;ve combined our two variables, the result is our criterion variable.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">This new variable is what our solution needs to explain, and we can <a href=\"https:\/\/enjoymachinelearning.com\/blog\/correlation-analysis-in-data-mining\/\">describe the correlation<\/a> and relationship between it and the predictor variables in depth.<\/span><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nWhat is a predictor variable?<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">A predictor variable is a variable used to predict another variable&#8217;s value, and predictors are used in both classification and regression. 
In most machine learning and statistics projects, there will be many predictor variables, since additional relevant predictors often improve accuracy.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">For example, if you want to predict the price of a house, the predictor variables might be the size of the house, the number of bedrooms, the number of bathrooms, and the location.<\/span><\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_1135\" aria-describedby=\"caption-attachment-1135\" style=\"width: 720px\" class=\"wp-caption aligncenter\"><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/47271420_young-parents-and-son-having-fun-during-moving-day-to-new-house-1024x683.webp \"  type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/47271420_young-parents-and-son-having-fun-during-moving-day-to-new-house-1024x683.jpg\" height=\"439\" width=\"720\" class=\" wp-image-1135 sp-no-webp\" alt=\"Young parents and son having fun during moving day to new house\" loading=\"lazy\" decoding=\"async\"  > <\/picture><figcaption id=\"caption-attachment-1135\" class=\"wp-caption-text\">Young parents and son having fun during moving day to a new house, portrait<\/figcaption><\/figure>\n<p><span style=\"font-size: 18pt;\"><br \/>\nThe predictor variable is sometimes referred to as the independent variable. 
Since machine learning projects generally have more than one predictor variable, the full set of predictors is often referred to as &#8220;X.&#8221;<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Once our target (<strong>criterion variable<\/strong>) is split from our independent variables (<strong>predictor variables<\/strong>), many data scientists will refer to this batch of predictor variables simply as variables.<\/span><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nWhat is the difference between a criterion variable and a predictor variable?<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">The main difference between a criterion variable and a predictor variable is that a predictor variable is used to estimate the values of the criterion variable. While there can be many predictor variables in a project, there is usually only a single criterion variable.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">One of the most critical steps in any project design or machine learning project is understanding the business context and how that relates to selecting the correct criterion variable.<\/span><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nWhat do a Criterion Variable and Predictor Variable Look Like in a Machine Learning Project?<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">Since we now know that predictor variables are variables used to predict the value of a criterion variable, we can discuss the different types of predictor variables that exist.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">There are two main types of predictor variables: categorical and quantitative.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Categorical predictor variables are those that can be divided into groups or categories. For example, a categorical predictor variable could be color, with the categories red, blue, and green.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Quantitative predictor variables are those that can be quantified or measured. 
For example, a quantitative predictor variable could be age, with the values being the ages of different individuals.<\/span><\/p>\n<h3><span style=\"font-size: 24pt;\"><br \/>\nPython Example of Criterion and Predictor Variables<\/span><\/h3>\n<p><span style=\"font-size: 18pt;\">Your boss wants you to build the most accurate model you can to predict what someone&#8217;s salary (in dollars) will be.<\/span><\/p>\n<p><span style=\"font-size: 18pt;\">Your boss has provided you with the dataset below:<\/span><\/p>\n<pre><code class=\"language-python\">import pandas as pd\n\ndf = pd.read_csv(&#039;ds_salaries.csv&#039;)\n\n\ndf.head()\n<\/code><\/pre>\n<p><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead.webp 974w,https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead-300x63.webp 300w,https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead-768x160.webp 768w\" sizes=\"(max-width: 1528px) 100vw, 1528px\" type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead.jpg\" height=\"319\" width=\"1528\" srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead.jpg 974w, https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead-300x63.jpg 300w, https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_dfhead-768x160.jpg 768w\" sizes=\"(max-width: 1528px) 100vw, 1528px\" class=\" wp-image-1128 aligncenter sp-no-webp\" alt=\"original dataset\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/p>\n<p><span style=\"font-size: 18pt;\">You quickly notice that a good criterion variable would be salary_in_usd, and you split that out from the rest of the data.<\/span><\/p>\n<pre><code class=\"language-python\">## split our predictor and criterion variables\n\ncriterion_variable = 
df[[&#039;salary_in_usd&#039;]]\n\npredictor_variables = df[[&#039;experience_level&#039;,&#039;employment_type&#039;,&#039;job_title&#039;,\n                          &#039;salary_currency&#039;,&#039;employee_residence&#039;,&#039;remote_ratio&#039;,\n                          &#039;company_location&#039;,&#039;company_size&#039;]]\n<\/code><\/pre>\n<p style=\"text-align: center;\"><span style=\"font-size: 18pt;\">Our Criterion Variable:<\/span><\/p>\n<p><span style=\"font-size: 18pt;\"><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_criterion.webp \"  type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_criterion.jpg\" height=\"539\" width=\"210\" class=\" wp-image-1127 aligncenter sp-no-webp\" alt=\"criterion variable in a machine learning project\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 18pt;\">Our Predictor Variables (Mix of Quantitative and Categorical)<\/span><\/p>\n<p><picture><source srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor.webp 920w,https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor-300x104.webp 300w,https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor-768x267.webp 768w\" sizes=\"(max-width: 920px) 100vw, 920px\" type=\"image\/webp\"><img src=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor.jpg\" height=\"320\" width=\"920\" srcset=\"https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor.jpg 920w, https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor-300x104.jpg 300w, https:\/\/enjoymachinelearning.com\/wp-content\/uploads\/2022\/10\/critvspred_predictor-768x267.jpg 768w\" sizes=\"(max-width: 920px) 100vw, 920px\" 
class=\"size-full wp-image-1129 aligncenter sp-no-webp\" alt=\"predictor variables in a machine learning project\" loading=\"lazy\" decoding=\"async\"  > <\/picture><\/p>\n<p><span style=\"font-size: 18pt;\">Now that you have your datasets, you&#8217;re ready to start modeling!<\/span><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nDo Criterion Variables Exist in Unsupervised Learning?<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">Criterion variables do not exist in unsupervised learning. Since <a href=\"https:\/\/enjoymachinelearning.com\/blog\/what-are-the-challenges-of-clustering\/\">unsupervised<\/a> learning does not have labeled data, we do not have a dependent variable (criterion variable). These projects will only have predictor variables that we use to try to draw insights.<\/span><\/p>\n<h2><span style=\"font-size: 36pt;\"><br \/>\nOther Articles in our Machine Learning 101 Series<\/span><\/h2>\n<p><span style=\"font-size: 18pt;\">We have many quick guides that go over some of the fundamental parts of machine learning. Some of those guides include:<\/span><\/p>\n<ul>\n<li><span style=\"font-size: 18pt;\"><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/reverse-standardization\/\">Reverse Standardization<\/a>: Now that you can split your data correctly, use this guide to build your first model.<\/span><\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><span style=\"font-size: 18pt;\"><a class=\"row-title\" href=\"https:\/\/enjoymachinelearning.com\/blog\/countvectorizer-vs-tfidfvectorizer\/\" aria-label=\"\u201cMachine Learning 101: CountVectorizer vs TFIDFVectorizer\u201d (Edit)\">CountVectorizer vs. 
TFIDFVectorizer<\/a>: Two classical NLP algorithms; you&#8217;ll need correct data splitting here to take these two on.<\/span><\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/welch-t-test-python\/\">Welch&#8217;s T-Test<\/a>: Do you know the difference between Student&#8217;s t-test and Welch&#8217;s t-test? Don&#8217;t worry, we explain it in depth here.<\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/parameter-versus-variable\/\">Parameter Versus Variable<\/a>: Commonly misunderstood &#8211; these two aren&#8217;t the same thing. This article will break down the difference.<\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/feature-selection-selectkbest-sklearn\/\">Feature Selection With SelectKBest Using Scikit-Learn<\/a>: Feature selection is tough; we make it easy for both regression and classification in this guide.<\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/normal-distribution-vs-uniform-distribution\/\">Normal Distribution vs. Uniform Distribution<\/a>: Now that you know the difference between your variables, you can start to understand the different distributions these variables can have.<\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/heatmap-python\/\">Heatmaps In Python<\/a>: Visualizing data is key in data science; this post covers eight different libraries for plotting heatmaps.<\/span><\/li>\n<li><span style=\"font-size: 18pt;\"><a href=\"https:\/\/enjoymachinelearning.com\/blog\/gini-index-vs-entropy\/\">Gini Index vs. Entropy<\/a>: Learn how decision trees make splitting decisions. 
These two are the workhorses of top-performing tree-based methods.<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In data science, there are many different ways to slice the pie. While many refer to independent and dependent variables differently, they usually mean the same thing. Your predictor variables are your independent variables, and with these, you&#8217;ll (hopefully) be able to predict your criterion variable (dependent variable). In the rest of this 3-minute guide, &#8230; <\/p>\n<p class=\"read-more-container\"><a title=\"Machine Learning 101: Criterion vs Predictor (With Coded Examples)\" class=\"read-more button\" href=\"https:\/\/enjoymachinelearning.com\/blog\/criterion-vs-predictor\/#more-1120\" aria-label=\"More on Machine Learning 101: Criterion vs Predictor (With Coded Examples)\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":1134,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,4],"tags":[],"table_tags":[],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/posts\/1120"}],"collection":[{"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/comments?post=1120"}],"version-history":[{"count":5,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/posts\/1120\/revisions"}],"predecessor-version":[{"id":1311,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/posts\/1120\/revisions\/1311"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/media\/1134"}],"wp:attachment":[{"href":"https:\/\/enjoymachinelearning.com\/wp-j
son\/wp\/v2\/media?parent=1120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/categories?post=1120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/tags?post=1120"},{"taxonomy":"table_tags","embeddable":true,"href":"https:\/\/enjoymachinelearning.com\/wp-json\/wp\/v2\/table_tags?post=1120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}