Crash-Course: Independent vs Dependent Variables


1. Independent or dependent?


If you are planning to become a data scientist or machine learning engineer, a strong understanding of independent and dependent variables is critical. Without it, you cannot build a good dataset tailored to your problem, and you won’t understand why your models are not giving the results you expect.

That is why I will teach you what each type of variable means, how to identify them, and how to decide which data to keep in your dataset and which to eliminate. Buckle up!

2. Types of variables


What are Variables?


To understand the different types of variables, you must first get a feel for what a variable really is. A variable is any characteristic, quantity, number, or data point that can be measured or counted. Age, height, salary, country of birth, eye color, and many other data points are examples of variables.

There are a few types of variables, depending on how they are measured and presented. Numeric variables have values that describe a measure or quantity as a number, answering “how many” or “how much”.

Numeric variables can be further divided into 2 types:
  • Continuous variable: a numeric variable that can take any value within a range of real numbers (0.45, 9.234, 10, etc.). The values usually have a lower and an upper limit, depending on what is being measured. For example, a temperature cannot be below 0 Kelvin (absolute zero), and about 1.4 × 10³² Kelvin (the Planck temperature) is often cited as the highest temperature that current physics can describe.
  • Discrete variable: a numeric variable whose values come from counting, so it can only take separate, whole values. A few examples of discrete variables are the number of humans, the number of cars, and the number of apples (you can’t have 0.45 of a human).
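To make the distinction concrete, here is a minimal plain-Python sketch, assuming made-up sample values and two hypothetical helper functions (`is_valid_temperature_k` and `looks_discrete`). It checks a continuous measurement against its physical lower bound and uses a rough heuristic to guess whether a column holds discrete values.

```python
import math

# Made-up sample columns, for illustration only.
temperatures_kelvin = [273.15, 295.4, 310.0]  # continuous: any real value in a range
apple_counts = [3, 0, 12]                     # discrete: whole-number counts

ABSOLUTE_ZERO_K = 0.0  # physical lower bound for a temperature in kelvin


def is_valid_temperature_k(value: float) -> bool:
    """A continuous measurement can be fractional, but it still has bounds."""
    return math.isfinite(value) and value >= ABSOLUTE_ZERO_K


def looks_discrete(values) -> bool:
    """Heuristic: True if every value is a whole number.

    Only a hint: a continuous quantity that was rounded
    (e.g. height to the nearest cm) would also pass this check.
    """
    return all(float(v).is_integer() for v in values)


print(all(is_valid_temperature_k(t) for t in temperatures_kelvin))  # True
print(looks_discrete(temperatures_kelvin))  # False (273.15 is fractional)
print(looks_discrete(apple_counts))         # True
```

Treat `looks_discrete` as a hint, not a rule: whether a variable is continuous or discrete depends on what it measures, not just on how its values happen to be stored.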

Categorical variables, whose values are labels or groups rather than quantities, can also be split into 2 types:
  • Ordinal variable: a categorical variable whose categories can be logically ordered or ranked. One category can be higher or lower than another, but the distance between categories is not necessarily equal or even measurable. Some examples are grades, clothing sizes, and levels of emotion (very sad, sad, happy).
  • Nominal variable: a categorical variable whose values have no natural order. Some examples include sex, eye color, and religion; you cannot say that one sex is greater than another (unless you are a misogynist), therefore sex is a nominal variable :)

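In practice, the ordinal/nominal distinction matters when you turn categories into numbers for a model. The sketch below is plain Python with no libraries and reuses the emotion and eye-color examples above; the category lists and function names are hypothetical. Ordinal labels are mapped to ranks, while nominal labels get one 0/1 indicator per category (commonly called one-hot encoding), so no false order is implied.

```python
# Ordinal: the categories have a meaningful order, so they can be mapped to ranks.
EMOTION_ORDER = ["very sad", "sad", "happy"]  # ordered from lowest to highest
EMOTION_RANK = {label: rank for rank, label in enumerate(EMOTION_ORDER)}


def encode_ordinal(label: str) -> int:
    """Map an ordinal label to its rank (0 = lowest)."""
    return EMOTION_RANK[label]


# Nominal: no order exists, so each category gets its own 0/1 indicator
# instead of a single rank that would imply an order.
EYE_COLORS = ["blue", "brown", "green"]


def encode_nominal(label: str) -> list[int]:
    """One-hot encode a nominal label."""
    return [1 if label == color else 0 for color in EYE_COLORS]


print(encode_ordinal("sad") < encode_ordinal("happy"))  # True: this comparison is meaningful
print(encode_nominal("brown"))  # [0, 1, 0]
```

Keep in mind that the ranks 0, 1, 2 only capture order; they do not mean that “happy” is exactly one step above “sad” in any measurable sense.
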
3. Cause and effect

Dependent Variables

Dependent variables, as the name implies, are the variables whose values “depend” on other variables; they are what your algorithm needs to predict. To predict a dependent variable, you need to use relevant independent variables.

For example, you may use independent variables like total space and proximity to the city center to determine the price of a building or apartment (the price is the dependent variable here).

Or, you can use the length and width of a flower’s petals and sepals to determine the species of that plant (the species is the dependent variable here).

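In code, a first practical step is usually to separate the independent variables from the dependent one. The sketch below assumes a tiny made-up apartment dataset stored as a list of Python dicts; the column names and the `split_features_target` helper are hypothetical, and the `X`/`y` naming simply follows a common machine learning convention.

```python
# Made-up apartment records, for illustration only (not real prices).
apartments = [
    {"total_space_m2": 50.0, "km_to_center": 2.0, "price": 200_000},
    {"total_space_m2": 80.0, "km_to_center": 5.0, "price": 260_000},
    {"total_space_m2": 65.0, "km_to_center": 1.0, "price": 280_000},
]

FEATURES = ["total_space_m2", "km_to_center"]  # independent variables (inputs)
TARGET = "price"                                # dependent variable (what we predict)


def split_features_target(rows, features, target):
    """Return (X, y): X holds one list of feature values per row, y the target values."""
    X = [[row[name] for name in features] for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(apartments, FEATURES, TARGET)
print(X)  # [[50.0, 2.0], [80.0, 5.0], [65.0, 1.0]]
print(y)  # [200000, 260000, 280000]
```
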
Independent Variables

If you are a smart lad, you may already have an idea of what an independent variable is from the explanation of dependent variables above.

Independent variables are the variables you use to explain or predict the dependent variable. In one of the examples above, proximity to the city center and the total space of an apartment directly influence the selling price or value of that apartment. Ideally, proximity to the city center and total space are not strongly correlated with each other. Note, however, that the name “independent” refers to their role as inputs to the model, not a guarantee that they are uncorrelated; in real data, independent variables are often correlated with one another.

During data preprocessing, you may also encounter variables that have no correlation with the dependent variable(s). Such independent variables are of no value in determining, for example, the price of an apartment.

There are a few ways to find out which independent variables have an effect on your dependent variable.

The first way is common sense. For example, an independent variable that tells whether the weather is sunny or cloudy would sure as hell not have any correlation with the price of the apartment. Nor would the price of ice cream. The first step in finding the right independent variables for your dependent variables is always common sense.

The second way is to measure correlation statistically. One of the easiest ways is to fit a linear regression and calculate the values R and R².
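Before fitting a full regression, a quick correlation check can flag independent variables that look unrelated to the dependent variable. Below is a minimal sketch using the Pearson correlation coefficient on made-up data; the feature names, the values, and the 0.3 threshold are all hypothetical. Keep in mind that Pearson correlation only detects linear relationships, so a low value does not prove a variable is useless.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: price (in thousands) is built to rise with space.
price = [200, 240, 280, 320, 360]
candidates = {
    "total_space_m2": [50, 60, 70, 80, 90],
    "ice_cream_price": [2.0, 2.5, 2.0, 2.5, 2.0],
}

THRESHOLD = 0.3  # hypothetical cut-off; choose one that suits your problem
for name, values in candidates.items():
    r = pearson_r(values, price)
    verdict = "keep" if abs(r) >= THRESHOLD else "candidate to drop"
    print(f"{name}: r = {r:.2f} -> {verdict}")
```

With this data, `total_space_m2` is perfectly correlated with price by construction, while `ice_cream_price` shows essentially no correlation and is flagged as a candidate to drop.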

The simple linear regression formula is:

y = b0 + b1 * x

In this simple linear regression equation, y is the dependent variable and x is the independent variable. In case you are curious about b0 and b1: b0 is the so-called y-intercept (the y-coordinate of the point where the line intersects the y-axis, i.e. the predicted value of y when x = 0), and b1 is the regression coefficient, or slope (the amount by which a change in x must be multiplied to give the corresponding average change in y).
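To see where b0 and b1 come from, here is a minimal sketch of ordinary least squares for the one-variable case, written in plain Python. The data are made up and `fit_simple_linear_regression` is a hypothetical helper; in practice you would normally use a library, but the formulas are the same.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x.

    b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up data: apartment size in m^2 (x) and price in thousands (y).
sizes = [50, 60, 70, 80, 90]
prices = [210, 235, 285, 310, 360]
b0, b1 = fit_simple_linear_regression(sizes, prices)
print(f"y = {b0:.1f} + {b1:.2f} * x")  # y = 17.5 + 3.75 * x
```

Here b1 = 3.75 means that, on this made-up data, each extra square meter adds about 3.75 (thousand) to the predicted price on average. b0 = 17.5 is the predicted price at x = 0, which is not meaningful for an apartment but is needed to position the line.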

The first value is R, which represents “the correlation between the observed values of the response variable and the predicted values of the response variable made by the model” (the response variable is another name for the dependent variable). If you want to find out how to calculate R, follow this link.

R² (R squared) represents the proportion of the variance in the response variable that can be explained by the predictor (independent) variable in the regression model. For a least-squares linear regression with an intercept, it can be calculated by simply multiplying R by itself: R × R = R².
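The sketch below, again on made-up data with hypothetical helper names, fits the line, computes R as the correlation between observed and predicted values of the dependent variable, and then computes R² two ways: by squaring R, and with the common definition 1 - SS_res / SS_tot (residual sum of squares divided by total sum of squares). For a least-squares fit with an intercept, the two agree.

```python
import math


def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    cov = sum((p - a_mean) * (q - b_mean) for p, q in zip(a, b))
    var_a = sum((p - a_mean) ** 2 for p in a)
    var_b = sum((q - b_mean) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up data: apartment size in m^2 (x) and price in thousands (y).
xs = [50, 60, 70, 80, 90]
ys = [210, 235, 285, 310, 360]

b0, b1 = fit_line(xs, ys)
predicted = [b0 + b1 * x for x in xs]

# R: correlation between observed and predicted values of the dependent variable.
r = correlation(ys, predicted)

# R^2 computed two ways: squaring R, and 1 - SS_res / SS_tot.
y_mean = sum(ys) / len(ys)
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_tot = sum((y - y_mean) ** 2 for y in ys)
r_squared = r * r
r_squared_direct = 1 - ss_res / ss_tot

print(f"R = {r:.4f}, R^2 = {r_squared:.4f}, 1 - SS_res/SS_tot = {r_squared_direct:.4f}")
```

On this data R² comes out at almost 0.99, meaning the size variable explains nearly 99% of the variance in the made-up prices.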

I will publish an article in the near future that covers linear regression and the R and R² calculations in more detail.


Conclusion

Understanding variables is the critical first step in starting your machine learning or data science career. Be sure to read this article carefully and do some more research, in case I forgot to write something important! Always do your own research and never blindly trust what someone says or writes. That is the only way you will find the truth!

I hope this article gave you a basic understanding of the types of variables and gets you interested in pursuing a data science career, if you haven’t already decided on one.

Happy learning, and stay tuned for more data science and machine learning articles in the near future!
