Machine Learning on a Cancer Dataset - Part 33
This is the second machine learning tutorial in which we're getting into uncertainty estimation in scikit-learn. In the previous video tutorial, we discussed and used 'decision_function', which is one method of uncertainty estimation.
Here we're going to learn about predicting probabilities with 'predict_proba', which is another method of uncertainty estimation in scikit-learn. Unlike the decision function, whose scores are on an arbitrary, model-dependent scale, 'predict_proba' returns plain probabilities, so I think it is much more straightforward and probably easier to understand.
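To make the contrast concrete, here is a minimal sketch, assuming scikit-learn's bundled breast cancer data (`load_breast_cancer`) and a `GradientBoostingClassifier`; the classifier and split used in the video may differ. It prints the range of `decision_function` scores next to the range of `predict_proba` values, and checks that for this binary model the positive-class probability is the logistic sigmoid (`scipy.special.expit`) of the decision score.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumption: bundled breast cancer data, default 75/25 split, gradient boosting.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

scores = clf.decision_function(X_test)  # unbounded real-valued scores, shape (n,)
proba = clf.predict_proba(X_test)       # probabilities in [0, 1], shape (n, 2)

print("decision_function range:", scores.min(), scores.max())
print("predict_proba range:", proba.min(), proba.max())

# For binary gradient boosting, P(class 1) is the sigmoid of the decision score.
print(np.allclose(proba[:, 1], expit(scores)))
```

The decision scores can be negative or larger than 1, so you have to know the model to read them, while the probabilities always sit between 0 and 1.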
In our case, we're applying 'predict_proba' to a binary classification problem. Remember, we're classifying tumor samples as either malignant or benign. So, what 'predict_proba' does is:
- assign each sample a probability of belonging to one class and a probability of belonging to the other class
- the two probabilities always add up to 1.0 (100%), as they must for a probability distribution over the two classes
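A minimal sketch of what that output looks like, assuming the same setup as the video is roughly reproduced with scikit-learn's `load_breast_cancer` and a `GradientBoostingClassifier` (both assumptions). `predict_proba` returns one row per sample and one column per class, and every row sums to 1.0.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumption: bundled breast cancer data and a gradient boosting classifier.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)
print("shape:", proba.shape)          # (number of test samples, 2 classes)
print(proba[:3])                      # first three rows: [P(class 0), P(class 1)]
print(np.allclose(proba.sum(axis=1), 1.0))  # each row sums to 1.0
```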
So, say we have a sample 'x'. Calling 'predict_proba' will tell us:
- probability of 'x' to be malignant: 0.43
- probability of 'x' to be benign: 0.57
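To read the columns for a single sample by name, a sketch like the following could work, again assuming `load_breast_cancer` and a `GradientBoostingClassifier` (assumptions, not necessarily the video's code). The column order follows `clf.classes_`, and in this dataset class 0 is "malignant" and class 1 is "benign" (`cancer.target_names`). The printed numbers depend on the model, so they won't necessarily match the 0.43/0.57 example above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumption: bundled breast cancer data and a gradient boosting classifier.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Probabilities for the first test sample only ("x" in the text).
sample_proba = clf.predict_proba(X_test[:1])[0]

# Columns are ordered like clf.classes_; map each class index to its name.
for class_index, p in zip(clf.classes_, sample_proba):
    print(f"probability of x to be {cancer.target_names[class_index]}: {p:.2f}")
```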
And it will do that for every sample (in our test subset). For each sample, the class whose probability is higher than 50%, that is, the more probable of the two, is the one returned as the prediction. Please see the video tutorial below for the implementation of this.
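As a rough check of that last point, the sketch below (assuming `load_breast_cancer` and a `GradientBoostingClassifier`, which may differ from the video) takes the most probable class for every test sample and compares it with what `predict` returns. In the binary case this is the same as asking whether the class-1 probability is above 0.5.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumption: bundled breast cancer data and a gradient boosting classifier.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)

# Most probable class per row, mapped back to the class labels.
predicted = clf.classes_[np.argmax(proba, axis=1)]
# Binary case: same thing expressed as a 50% threshold on the class-1 column.
threshold_pred = clf.classes_[(proba[:, 1] > 0.5).astype(int)]

print(np.array_equal(predicted, clf.predict(X_test)))
print(np.array_equal(threshold_pred, clf.predict(X_test)))
```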
Previous videos in this series:
- Machine Learning on a Cancer Dataset - Part 30
- Machine Learning on a Cancer Dataset - Part 31
- Machine Learning on a Cancer Dataset - Part 32
To stay in touch with me, follow @cristi
Cristi Vlad, Self-Experimenter and Author