Machine Learning on a Cancer Dataset - Part 20

in #machine-learning, 7 years ago

In this video we train a neural network in scikit-learn on the cancer dataset.

In my last post I mistakenly said that we were training the neural network in video #19; in fact, that video only covered introductory concepts of neural nets in scikit-learn. Anyway, the type of neural network we train is a multi-layer perceptron. It is a basic classifier, and its implementation and training are quite simple.

Without modifying any of the parameters (so, just going with the defaults), the neural net performs worse than the other algorithms.
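As a rough sketch of what training with the defaults can look like, the snippet below fits scikit-learn's MLPClassifier on the raw, unscaled data. The train/test split and the random_state values are my assumptions for illustration and may differ from what is used in the video, so the exact scores may differ too.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

cancer = load_breast_cancer()

# Assumed split: default 75/25, stratified, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Multi-layer perceptron with default parameters (only the seed is set).
# It may emit a ConvergenceWarning on unscaled data.
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train, y_train)

train_acc = mlp.score(X_train, y_train)
test_acc = mlp.score(X_test, y_test)
print(f"Accuracy on the training subset: {train_acc:.3f}")
print(f"Accuracy on the test subset:     {test_acc:.3f}")
```

You can compare these two numbers against the scores of the classifiers from the earlier parts of the series.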

One important fact is that the features that go into training the algorithm are on different scales. More simply put, each cancer sample is described by 30 numeric features, and the minimum and maximum vary significantly from one feature to the next. This has an impact on the performance of the algorithm.
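One quick way to see the difference in scale is to print the minimum and maximum of every feature. This is only an illustrative sketch; the variable names are mine.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Column-wise minimum and maximum: one value per feature.
feature_min = cancer.data.min(axis=0)
feature_max = cancer.data.max(axis=0)

for name, lo, hi in zip(cancer.feature_names, feature_min, feature_max):
    print(f"{name:25s} min={lo:10.3f}  max={hi:10.3f}")
```

Some features (the area measurements, for example) reach into the thousands, while others never get above 1.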

We're going to try to remedy this by bringing the data to the same scale, between 0 and 1. This is the 'to-do' for the next tutorial.
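Until then, here is one possible way to do the rescaling, using scikit-learn's MinMaxScaler. This is a sketch under my own assumptions (same split and seeds as a typical setup), not necessarily the approach the next tutorial will take. The scaler is fitted on the training subset only and then applied to both subsets.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

cancer = load_breast_cancer()

# Assumed split and seed, for illustration only.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Learn each feature's min and max from the training data,
# then map the training data into the 0 to 1 range.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training min/max on the test data, so a few test
# values may land slightly outside 0 to 1.
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(random_state=42)
mlp.fit(X_train_scaled, y_train)

scaled_test_acc = mlp.score(X_test_scaled, y_test)
print(f"Accuracy on the scaled test subset: {scaled_test_acc:.3f}")
```

Fitting the scaler on the training subset alone keeps information from the test subset out of the preprocessing step.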


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train the classifiers on this dataset, which consists of labeled data: 569 tumor samples, each labeled malignant or benign, and then use them on new, unlabeled data.
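If you want to check these numbers yourself, a minimal sketch for loading the dataset and looking at its size and labels could be:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Rows are tumor samples, columns are numeric features.
n_samples, n_features = cancer.data.shape
print(f"Samples: {n_samples}, features: {n_features}")

# Label 0 is 'malignant' and label 1 is 'benign' in this dataset.
class_counts = dict(zip(cancer.target_names, np.bincount(cancer.target)))
print(class_counts)
```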


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 11
  2. Machine Learning on a Cancer Dataset - Part 12
  3. Machine Learning on a Cancer Dataset - Part 13
  4. Machine Learning on a Cancer Dataset - Part 14
  5. Machine Learning on a Cancer Dataset - Part 15
  6. Machine Learning on a Cancer Dataset - Part 16
  7. Machine Learning on a Cancer Dataset - Part 17
  8. Machine Learning on a Cancer Dataset - Part 18
  9. Machine Learning on a Cancer Dataset - Part 19


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author
