Machine Learning on a Cancer Dataset - Part 12

in #machine-learning, 7 years ago

In the second video on Decision Trees, and the 12th in the machine learning series, we look at ways to improve the accuracy of the Decision Tree.

With the default parameters that we used in the previous video, our Decision Tree classifier is overfitting: we basically get 100% accuracy on the training subset. To reduce overfitting, we're gonna use a technique called pre-pruning, which stops the tree from growing before it fits the training data perfectly.

To be more specific, we're going to restrict the depth of the tree by setting the max_depth parameter to 4. For the complete walk-through, please see the video below.
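For readers who want to try this without the video, here is a minimal sketch of the comparison described above. It assumes the dataset is the one returned by scikit-learn's load_breast_cancer; the train/test split and the random_state values are my own choices for illustration and may differ from what the video uses, so the exact scores can differ too.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the "cancer dataset" is scikit-learn's built-in breast cancer dataset.
cancer = load_breast_cancer()

# Assumption: a stratified default split with a fixed seed, chosen only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Default parameters: the tree keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: stop growing the tree once it reaches a depth of 4.
pruned_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

full_train_acc = full_tree.score(X_train, y_train)
full_test_acc = full_tree.score(X_test, y_test)
pruned_train_acc = pruned_tree.score(X_train, y_train)
pruned_test_acc = pruned_tree.score(X_test, y_test)

print(f"default tree: depth={full_tree.get_depth()}, "
      f"train={full_train_acc:.3f}, test={full_test_acc:.3f}")
print(f"max_depth=4:  depth={pruned_tree.get_depth()}, "
      f"train={pruned_train_acc:.3f}, test={pruned_test_acc:.3f}")
```

The thing to look at is the gap between training and test accuracy. The unrestricted tree fits the training subset perfectly, while the depth-limited tree gives up a little training accuracy, which is the trade that pre-pruning makes in the hope of generalizing better.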


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
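As a quick check of what the dataset looks like, the sketch below loads it and counts the samples per label. It assumes the dataset in question is the one exposed by load_breast_cancer; the variable names are just for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: this is the pre-loaded cancer dataset the series refers to.
cancer = load_breast_cancer()

# data is a 2-D array: one row per tumor sample, one column per feature.
n_samples, n_features = cancer.data.shape

# target holds 0/1 labels; target_names maps them to "malignant"/"benign".
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}

print(f"{n_samples} samples, {n_features} features")
print(class_counts)
```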


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4
  5. Machine Learning on a Cancer Dataset - Part 5
  6. Machine Learning on a Cancer Dataset - Part 6
  7. Machine Learning on a Cancer Dataset - Part 7
  8. Machine Learning on a Cancer Dataset - Part 8
  9. Machine Learning on a Cancer Dataset - Part 9
  10. Machine Learning on a Cancer Dataset - Part 10
  11. Machine Learning on a Cancer Dataset - Part 11


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author
