Machine Learning on a Cancer Dataset - Part 21

in #machine-learning · 7 years ago

In this video from the series on machine learning with scikit-learn, we look at an important aspect of data preprocessing: feature scaling.

We are dealing with a cancer dataset made of tumor samples characterized by numerical features, such as radius, perimeter, area, and so on. There are 30 features characterizing each tumor sample, and they are on different scales: the minimum and maximum values can differ quite significantly between any two features.
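To make the scale differences concrete, here is a minimal sketch that prints the minimum and maximum of every feature. It assumes the dataset is loaded with scikit-learn's `load_breast_cancer`; the variable names are mine, not necessarily the ones used in the video.

```python
from sklearn.datasets import load_breast_cancer

# Load the cancer dataset that ships with scikit-learn.
cancer = load_breast_cancer()
X = cancer.data  # shape (569, 30): one row per tumor sample

# Per-feature minimum and maximum across all samples.
feature_min = X.min(axis=0)
feature_max = X.max(axis=0)

for name, lo, hi in zip(cancer.feature_names, feature_min, feature_max):
    print(f"{name:<25s} min={lo:10.3f}  max={hi:10.3f}")
```

Area-type features reach into the thousands, while features such as smoothness or fractal dimension stay well below 1.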

This variability in scale affects how well our neural network learns, and therefore the accuracy of the predictions it can make. Here we learn how to bring the data to the same scale, retrain the network, and measure its accuracy. We then compare it with the accuracy score on the unscaled data. Please watch the video below for the full 'scoop'.
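The comparison described above might look roughly like the sketch below. The exact scaling method used in the video is not stated here, so `StandardScaler` (zero mean, unit variance per feature) is my assumption, as are the split, `max_iter`, and `random_state` values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)

# Baseline: train the network on the raw, unscaled features.
mlp_raw = MLPClassifier(max_iter=1000, random_state=42).fit(X_train, y_train)
acc_raw = mlp_raw.score(X_test, y_test)

# Fit the scaler on the training data only, then apply it to both sets,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Retrain the same network on the scaled features.
mlp_scaled = MLPClassifier(max_iter=1000, random_state=42).fit(X_train_scaled, y_train)
acc_scaled = mlp_scaled.score(X_test_scaled, y_test)

print(f"Test accuracy, unscaled data: {acc_raw:.3f}")
print(f"Test accuracy, scaled data:   {acc_scaled:.3f}")
```

The exact scores depend on the scikit-learn version and the random seeds, so treat any numbers you get as indicative rather than definitive.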


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The dataset consists of labeled data: 569 tumor samples, each labeled malignant or benign. The purpose is to train classifiers on this data and then use them on new, unlabeled data.
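As a quick refresher, the sketch below loads the dataset, counts the samples in each class, and sets aside a hold-out set as a stand-in for "new" data. The split parameters are illustrative assumptions, not necessarily those used in the series.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
print(cancer.data.shape)  # (samples, features)

# Count how many samples carry each label.
counts = np.bincount(cancer.target)
class_counts = {str(name): int(n) for name, n in zip(cancer.target_names, counts)}
print(class_counts)

# Hold out part of the data; the classifier never sees it during training,
# so it plays the role of new, unlabeled samples at evaluation time.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)
print(len(X_train), "training samples,", len(X_test), "held-out samples")
```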


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 11
  2. Machine Learning on a Cancer Dataset - Part 12
  3. Machine Learning on a Cancer Dataset - Part 13
  4. Machine Learning on a Cancer Dataset - Part 14
  5. Machine Learning on a Cancer Dataset - Part 15
  6. Machine Learning on a Cancer Dataset - Part 16
  7. Machine Learning on a Cancer Dataset - Part 17
  8. Machine Learning on a Cancer Dataset - Part 18
  9. Machine Learning on a Cancer Dataset - Part 19
  10. Machine Learning on a Cancer Dataset - Part 20


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author


any example with tensorflow?

hopefully, once I finish with sklearn :)

What activation function are you using? I have no experience with scikit-learn. Does it default to tanh or ReLU?

That's the default? I'll have to follow along, I just looked over this video. Somewhat familiar with machine learning techniques and mathematics.

You are doing something truly remarkable here that only real scientists can appreciate. Keep it up.

steem really needs to curate more of this type of material.

that means a lot to me. thank you!
