Machine Learning on a Cancer Dataset - Part 33

cristi (70)in #programming • 7 years ago

This is the second machine learning tutorial in which we're getting into uncertainty estimation in scikit-learn. In the previous video tutorial, we discussed and implemented the 'decision_function', which is a method of uncertainty estimation.

Here we're going to learn about predicting probabilities and we're going to implement 'predict_proba', which is another method of uncertainty estimation in scikit-learn. Compared to the decision function, I think 'predict_proba' is much more straightforward, and probably easier to understand.

In our case, we're applying 'predict_proba' to a binary classification problem. Remember, we're classifying tumor samples as either malignant or benign. So, what 'predict_proba' does is:

assign each sample a probability of being in one class, and a probability of being in the other class
make sure the two probabilities add up to 1.0 (100%), as probabilities should

So, say we have a sample 'x'. Calling 'predict_proba' will tell us:

probability of 'x' being malignant: 0.44
probability of 'x' being benign: 0.56
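
To make that concrete, here's a minimal sketch of what calling 'predict_proba' looks like. The classifier (a random forest), the default train/test split and the variable names are my assumptions for illustration, not necessarily what's used in the video; any scikit-learn classifier that has 'predict_proba' should behave the same way.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Assumption: a random forest stands in for whatever model the video uses
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# One row per test sample, one column per class
proba = clf.predict_proba(X_test)

# Columns follow clf.classes_; in this dataset 0 = malignant, 1 = benign
print(cancer.target_names)
print(proba[:5])

# Every row should add up to 1.0
row_sums = proba.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

The first column is the probability for malignant and the second one for benign, so each row reads just like the example for 'x' above.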

And it will do that for every sample in our test subset. For each sample, the class whose probability is higher than 50% is the one that's actually returned as the prediction. Please see the video tutorial below for the implementation of this.
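
If you want to check the link between the probabilities and the predicted class yourself, a rough sketch could look like this. Again, the random forest and the names 'labels_from_proba' and 'agree' are just my assumptions for the example. It picks the class with the highest probability in each row and compares that with what 'predict' returns.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Assumption: same stand-in model as a random forest, for illustration only
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)

# Index of the larger probability in each row, mapped back to a class label
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# The labels that predict gives us directly
predicted = clf.predict(X_test)

# True when both ways of getting the label agree for every test sample
agree = np.array_equal(labels_from_proba, predicted)
print(agree)
```

With two classes, picking the highest probability in a row is the same thing as picking the class that's above 50%.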

Previous videos in this series: