Machine Learning on a Cancer Dataset - Part 2
In this second video, I go over the dependencies that we're gonna use in this machine learning series. These include:
- scikit-learn
- matplotlib
- and others that may come along as we progress through the series
The initial imports that we're doing:
- load_breast_cancer - this is the dataset that we're working on
- train_test_split - to split the dataset into training and test subsets
- KNeighborsClassifier - the first ML classifier that we're gonna use
- matplotlib
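For reference, the imports listed above might look roughly like this in code. This is a minimal sketch, assuming a recent scikit-learn version (where train_test_split lives in sklearn.model_selection) and the usual plt alias for matplotlib; the video may arrange them differently.

```python
# Dataset loader for the breast cancer dataset bundled with scikit-learn
from sklearn.datasets import load_breast_cancer
# Helper to split data into training and test subsets
from sklearn.model_selection import train_test_split
# k-nearest neighbors classifier, the first model in the series
from sklearn.neighbors import KNeighborsClassifier
# Plotting library (the plt alias is a common convention, assumed here)
import matplotlib.pyplot as plt
```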
I also look at what the description (DESCR) for this dataset looks like in scikit-learn. More in the video below.
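If you want to peek at the description yourself, here is a small sketch. The variable name cancer is just my choice for illustration; DESCR, data and target_names are attributes of the object that load_breast_cancer returns.

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset that ships with scikit-learn
cancer = load_breast_cancer()

# DESCR is a long text description of the dataset
print(cancer.DESCR)

# Shape of the feature matrix: (number of samples, number of features)
print(cancer.data.shape)

# Names of the two classes
print(cancer.target_names)
```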
As a reminder:
In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The dataset consists of labeled data: 569 tumor samples, each labeled malignant or benign. The purpose is to train classifiers on this labeled data and then use them on new, unlabeled data.
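To make that goal a bit more concrete, here is a rough sketch of the train-then-predict idea with the pieces imported so far. The settings are my assumptions for illustration, not necessarily what the later videos use: random_state=42, a stratified split, the default 75/25 split size, and n_neighbors=5.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Hold out part of the labeled data to stand in for "new" samples.
# stratify and random_state are assumed values, chosen for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Train the classifier on the training subset (n_neighbors=5 is an assumption)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Accuracy on data the model has seen vs. data it has not
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print("Training accuracy: {:.3f}".format(train_acc))
print("Test accuracy: {:.3f}".format(test_acc))

# Predict the class of one held-out sample, as we would for unlabeled data
predicted = knn.predict(X_test[:1])
print("Predicted class:", cancer.target_names[predicted[0]])
```

The test accuracy is the number to watch, since it estimates how the classifier would do on samples it was not trained on.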
Previous videos in this series:
To stay in touch with me, follow @cristi
#machine-learning #science #python
Cristi Vlad, Self-Experimenter and Author