Machine Learning on a Cancer Dataset - Part 33

in #programming, 7 years ago

This is the second machine learning tutorial in which we cover uncertainty estimation in scikit-learn. In the previous video tutorial, we discussed and implemented 'decision_function', which is one method of uncertainty estimation.

Here we're going to learn about predicting probabilities by implementing 'predict_proba', which is another method of uncertainty estimation in scikit-learn. Compared with the decision function, I think 'predict_proba' is more straightforward and easier to understand.

In our case, we're applying 'predict_proba' to a binary classification problem. Remember, we're classifying tumor samples as either malignant or benign. For each sample, 'predict_proba':

  • assigns a probability that the sample belongs to one class, and a probability that it belongs to the other
  • returns two probabilities that always add up to 1.0 (100%)
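
As a minimal sketch of the two points above, the snippet below calls 'predict_proba' on scikit-learn's built-in breast cancer dataset. The classifier (a scaled logistic regression), the default train/test split and 'random_state=0' are assumptions made for illustration only; they are not necessarily what the video uses, and any classifier that exposes 'predict_proba' could be swapped in.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the dataset and hold out a test subset.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0
)

# Assumption: a scaled logistic regression stands in for the tutorial's model.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# One row per test sample, one column per class.
# The column order follows model.classes_ (0 = malignant, 1 = benign).
proba = model.predict_proba(X_test)
print(cancer.target_names)
print(proba.shape)
print(proba[:5].round(3))

# Each row should sum to 1.0, up to floating-point error.
row_sums = proba.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

The result has one row per test sample and two columns, one for each class.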

So, say we have a sample 'x'. Calling 'predict_proba' will tell us:

  • probability that 'x' is malignant: 0.44
  • probability that 'x' is benign: 0.56

And it will do that for every sample in our test subset. For each sample, the class whose probability is higher than 0.5 (50%) is the one returned as the prediction. Please see the video tutorial below for the implementation of this.
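
To make the "higher than 50%" rule concrete, here is a small plain-Python sketch. The helper 'predicted_label' and the probability rows are made up for illustration; they are not output from the tutorial's model. With two classes, choosing the class with the larger probability is the same as checking which one exceeds 0.5.

```python
def predicted_label(probabilities, labels):
    """Return the label whose probability is the highest in the row."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index]


# Column order assumed to be [malignant, benign].
labels = ["malignant", "benign"]

# Hypothetical probability rows, one per sample (illustrative values only).
rows = [
    [0.90, 0.10],
    [0.25, 0.75],
    [0.44, 0.56],
]

predictions = [predicted_label(row, labels) for row in rows]
print(predictions)
```

With a fitted scikit-learn classifier, the same idea can be checked by comparing 'predict' against the column with the largest value in each row of 'predict_proba'.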


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 30
  2. Machine Learning on a Cancer Dataset - Part 31
  3. Machine Learning on a Cancer Dataset - Part 32


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author


Nice post. Very informative. Looking forward to the next class. Upvoted and resteemed
