Machine Learning on a Cancer Dataset - Part 3

cristi (70) in #machine-learning • 9 years ago

In this third video of the series on machine learning, I discuss the dataset. In this case, we're gonna be working with the cancer dataset that comes preloaded with scikit-learn.

Basically, this dataset contains 569 samples, each derived from a digitized image of an FNA (fine needle aspirate) of a tumor mass. The data is labeled: each sample has 30 features, which describe the cell nuclei in the image (mean perimeter, mean area, mean smoothness, etc.), and a target, which is either malignant or benign.
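To make the description concrete, here is a minimal sketch of loading the dataset and inspecting it. It assumes scikit-learn and NumPy are installed; the variable names (`cancer`, `X`, `y`, `class_counts`) are just illustrative choices, not something taken from the video.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset bundled with scikit-learn.
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# One row per sample, one column per feature.
print("Shape of the data:", X.shape)

# A few of the feature names (scikit-learn writes them as "mean perimeter",
# "mean area", and so on).
print("First features:", list(cancer.feature_names[:4]))

# The target is encoded as integers; target_names maps them back to labels.
print("Classes:", list(cancer.target_names))

# How many samples fall into each class.
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(y))
}
print("Samples per class:", class_counts)
```

Printing `cancer.DESCR` gives the full description that ships with the dataset, which is a handy reference while exploring.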

We will feed this data into machine learning algorithms, train them (fit), check their accuracy, and improve or optimize them where needed. The ultimate purpose is to use the trained algorithm to classify new samples, that is, to predict whether a new sample is malignant or benign. More in the video...
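The workflow described above (fit, check accuracy, predict) can be sketched roughly as follows. This is only an illustration under a few assumptions: the k-nearest neighbors classifier, `n_neighbors=5`, the stratified split, and `random_state=0` are my picks for the example and not necessarily the algorithms or settings used later in the series.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Hold out part of the labeled data so accuracy is measured on samples
# the classifier has not seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Assumption: k-nearest neighbors is used here only as an example classifier.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)  # train (fit) the algorithm

# Check accuracy on the held-out samples.
test_accuracy = clf.score(X_test, y_test)
print("Test accuracy: {:.3f}".format(test_accuracy))

# Classify a "new" sample (here simply the first held-out row).
new_sample = X_test[:1]
predicted_label = str(cancer.target_names[clf.predict(new_sample)[0]])
print("Predicted class:", predicted_label)
```

Swapping in a different classifier only changes the line that builds `clf`; the `fit`, `score`, and `predict` calls stay the same across scikit-learn estimators.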

As a reminder:

In this series I'm going to explore the cancer dataset that comes preloaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.

Previous videos in this series: