Machine Learning on a Cancer Dataset - Part 18

cristi (70) in #machine-learning • 8 years ago

In the third and final video on Random Forest, we're looking at how much each feature of the cancer samples contributes to the decision-making process.

Recall from previous videos that we did something similar for Decision Trees. The feature importances array was skewed: 2-3 features (out of 30) carried most of the weight in decision making, while the rest of the features were close to zero. This may not be an accurate model of cancer, because it seems unlikely that a single feature like 'worst radius' plays such a 'heavy' role in predicting whether a tumor is malignant or benign.

When we compute the feature importances array for the Random Forest classifier, we see that it looks much more balanced than the one for the Decision Tree (DT): the importance is spread across many more features. Please see the full details and explanation in the video below.
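For readers who want to try the comparison outside the video, here is a minimal sketch, not the exact code from the video. It assumes a default train/test split, `random_state=0`, and `n_estimators=100`; those settings and the helper `top_features` are my own choices for illustration, so the exact numbers you get may differ from the ones shown on screen.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()

# Assumed split and seed; the video may use different settings.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Fit one single tree and one forest on the same training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)


def top_features(model, k=5):
    """Hypothetical helper: return the k (feature name, importance) pairs with the highest importance."""
    pairs = zip(cancer.feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


for label, model in [("Decision Tree", tree), ("Random Forest", forest)]:
    importances = model.feature_importances_
    # Count how many of the 30 features the model actually relies on.
    used = int((importances > 0).sum())
    print(f"{label}: {used} of {len(importances)} features have non-zero importance")
    for feature, importance in top_features(model):
        print(f"  {feature}: {importance:.3f}")
```

Each `feature_importances_` array has one entry per feature and sums to 1, so a skewed array means a handful of features hold most of that total, while a balanced one spreads it out.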

As a reminder:

In this series I'm exploring the breast cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
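As a quick sketch of that workflow, the snippet below loads the dataset, holds out part of it to stand in for "new" data, and predicts labels for the held-out samples. The split, the seed, and the choice of a Random Forest here are assumptions for illustration; any classifier from the series could take its place.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
print(cancer.data.shape)          # (569, 30): 569 samples, 30 features
print(list(cancer.target_names))  # ['malignant', 'benign']

# Hold out a test set that plays the role of unseen data (assumed default 75/25 split).
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Train on the labeled samples, then predict labels for the held-out ones.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
predictions = forest.predict(X_test)

# Map the numeric predictions (0 or 1) back to their class names.
print([str(cancer.target_names[p]) for p in predictions[:5]])
print(f"Accuracy on held-out data: {forest.score(X_test, y_test):.3f}")
```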

Previous videos in this series:

To stay in touch with me, follow @cristi

Cristi Vlad, Self-Experimenter and Author

#science #python
