Machine Learning on a Cancer Dataset - Part 5

in #machine-learning · 7 years ago

In this 5th video of the series, I'm going to provide a refresher on the KNN classifier.

KNN, or K-Nearest Neighbors, is an algorithm that classifies a data point based on its k nearest neighbors in the training data. More specifically, after we train (fit) the KNN algorithm on our cancer dataset, we can use it to classify new data. By default, scikit-learn's KNN classifier uses k=5 for decision making.
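As a rough sketch of the fit-then-predict workflow described above, the snippet below trains scikit-learn's `KNeighborsClassifier` with its default settings. The train/test split, the `random_state` value, and the variable names are my own assumptions for illustration; held-out test rows stand in for "new" data here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Hold out part of the labeled data to play the role of "new" samples
# (the split and random_state are illustrative assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

knn = KNeighborsClassifier()  # n_neighbors defaults to 5
knn.fit(X_train, y_train)     # "training" KNN mostly means storing the data

predictions = knn.predict(X_test)
test_accuracy = knn.score(X_test, y_test)

# target_names maps 0 -> 'malignant', 1 -> 'benign'
print(cancer.target_names[predictions[:5]])
print(f"Test accuracy: {test_accuracy:.3f}")
```

The exact accuracy depends on how the data is split, so treat the printed number as indicative only.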

So, if we provide a new sample (the measurements of a tumor), it will look at the 5 nearest points (in Euclidean space) and take a majority vote among their labels in order to classify the sample as benign or malignant. For a better understanding, see the video below, in which I explain KNN using a graphical (charted) representation.
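To make the neighbor lookup concrete, here is a minimal hand-rolled sketch. It assumes Euclidean distance and a majority vote among the neighbors' labels, which is how `KNeighborsClassifier` combines them with its default uniform weights. The helper `knn_predict_one` is a hypothetical name of mine, not part of scikit-learn, and the split is again just for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

def knn_predict_one(sample, X, y, k=5):
    """Hypothetical helper: classify one sample by majority vote of its k nearest points."""
    distances = np.linalg.norm(X - sample, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]             # indices of the k closest training points
    votes = np.bincount(y[nearest], minlength=2)    # count labels: index 0 = malignant, 1 = benign
    return int(np.argmax(votes))                    # the label with the most votes wins

new_sample = X_test[0]
manual_label = knn_predict_one(new_sample, X_train, y_train)

# Compare against scikit-learn's own classifier with the same k.
sk_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
sklearn_label = int(sk_knn.predict([new_sample])[0])

print(cancer.target_names[manual_label], cancer.target_names[sklearn_label])
```

With an odd k and only two classes, the vote can never end in a tie, which is one reason k=5 is a convenient default for this dataset.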


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train the classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.
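For reference, a short sketch of loading that dataset and checking its size and labels. It assumes the dataset in question is the one returned by scikit-learn's `load_breast_cancer`; the variable names are mine.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Rows are tumor samples, columns are numeric features.
print(cancer.data.shape)

# Count how many samples carry each label.
counts = {
    str(name): int(n)
    for name, n in zip(cancer.target_names, np.bincount(cancer.target))
}
print(counts)
```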


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author
