Machine Learning on a Cancer Dataset - Part 14

cristi (70) in #machine-learning • 9 years ago

We are still working with Decision Trees, and in this video we're going to look specifically at an attribute of the trained classifier called feature_importances_ (note the trailing underscore).

We can read this attribute once the classifier (our Decision Tree) has been trained on the cancer dataset. It shows how much weight each feature carries in the decision-making process: one non-negative value per feature, with all values adding up to 1. This helps us understand what the tree is actually doing. However, even when some features weigh more than others in classifying tumors, the numbers should always be interpreted in context.
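Here is a minimal sketch of reading the importances after training. The train/test split, max_depth=4 and the random_state values are my own assumptions for illustration and are not necessarily the settings used in the video, so the exact ranking you get may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()

# Assumed split and seeds, chosen only to make the example reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Assumed depth limit; a different depth can change the importances.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# The attribute only exists after fit(): one value per feature.
importances = tree.feature_importances_

# Pair each value with its feature name and sort from most to least important.
ranked = sorted(
    zip(cancer.feature_names, importances),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked[:5]:
    print(f"{name:25s} {score:.3f}")
```

Features the tree never splits on get an importance of 0, which is why only a handful of the 30 features usually show up with a non-zero value.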

In this case, 'worst radius' carries the most weight in the decision making. But that alone does not tell us whether a larger value points towards 'malignant' or 'benign': the importance only says how much the tree relies on the feature, not in which direction. Please watch the video for a more detailed walk-through of this.
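Since an importance value carries no sign, one simple way to get a feel for the direction is to look at the data itself. The sketch below is my own addition and not something shown in the video: it compares the average 'worst radius' of the two classes. It is only a rough check and not a substitute for inspecting the tree's actual splits.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Find the column that holds 'worst radius'.
idx = list(cancer.feature_names).index("worst radius")
worst_radius = cancer.data[:, idx]

# Average value of the feature for each class label.
means = {}
for label, name in enumerate(cancer.target_names):
    means[str(name)] = float(worst_radius[cancer.target == label].mean())

for name, value in means.items():
    print(f"{name:10s} mean worst radius: {value:.2f}")
```

A difference in class averages hints at the direction, but the tree combines several features, so a single feature on its own still shouldn't be read as a rule.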

As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train the classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.
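For anyone following along without the earlier videos, this is a short sketch of loading the dataset and checking its size and class balance. The variable names are my own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Rows are tumor samples, columns are the measured features.
n_samples, n_features = cancer.data.shape
print(n_samples, "samples,", n_features, "features")

# Count how many samples fall into each class (label 0 and label 1).
counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}
print(counts)
```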

Previous videos in the series: