Machine Learning with Scikit-Learn - [Part 40]
This is the 40th video in the tutorial series for machine learning with scikit-learn. I can't believe we've come this far. The videos build up slowly, and there are still many concepts and topics to cover, so this may turn out to be a very long series.
For now, we're getting into One-Hot Encoding, a procedure for representing categorical variables with numeric ones. In short, a One-Hot encoder replaces a categorical variable with one new feature per category, each taking the value 0 or 1.
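To make the idea concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation; the helper `one_hot` and the sample `workclass` values are made up for illustration.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical sample of a categorical column.
workclass = ["Private", "State-gov", "Private", "Self-employed"]
categories, encoded = one_hot(workclass)

print(categories)
for value, row in zip(workclass, encoded):
    print(value, row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from.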
If you're into machine learning, you'll see that this increases data dimensionality, because one feature is turned into two or more features, and that can add unwanted complexity to our problem. However, it would not be used if it weren't useful: most learning algorithms expect numeric input, and 0/1 columns avoid implying an order among categories that plain integer codes would suggest. So, more often than not, this type of encoder helps the learning algorithm make better use of categorical data.
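The growth in dimensionality is easy to see on a small example. The sketch below uses scikit-learn's `OneHotEncoder` on a tiny hand-made array; the rows and column meanings are hypothetical and are not taken from the dataset used in the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows: (workclass, education, occupation).
X = np.array([
    ["Private", "Bachelors", "Sales"],
    ["State-gov", "Masters", "Tech-support"],
    ["Private", "HS-grad", "Sales"],
    ["Self-employed", "Bachelors", "Craft-repair"],
])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for printing.
X_encoded = encoder.fit_transform(X).toarray()

print("before:", X.shape)
print("after: ", X_encoded.shape)
# One list of categories per original column.
for column_categories in encoder.categories_:
    print(list(column_categories))
```

Here three categorical columns with three distinct values each become nine 0/1 columns, so the number of features depends on how many categories each column has, not on how many columns you started with.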
In this tutorial we're using the income dataset that's preloaded in scikit-learn; we select a few columns from the original dataset and apply a One-Hot encoder to them. We also look at how dimensionality increases as we apply the encoder. Please watch the video below for a complete walkthrough:
To stay in touch with me, follow @cristi
Cristi Vlad Self-Experimenter and Author