Machine Learning on a Cancer Dataset - Part 29

in #programming, 8 years ago

I can't believe there are already 29 video tutorials in this series, and I feel like I haven't even scratched the surface of one type of machine learning, using one library (scikit-learn) out of the many that are out there, on one dataset (Wisconsin Breast Cancer). Anyway, this is just to put things in perspective...

In this tutorial we scale our cancer dataset to see how the performance of our support vector machine classifier (SVC) differs from its performance on the unscaled data.

In scikit-learn it is relatively simple to scale the data; we could use classes like StandardScaler or MinMaxScaler from the preprocessing module and achieve our goal in one or two lines of code. But we're not going to do that here. Instead, we're going to learn how to scale the data 'by hand'; this is not complicated either, and it helps us better understand what 'data scaling' means.
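To make 'by hand' concrete, here is a minimal sketch of one common approach, min-max scaling, written with plain NumPy. This is an assumption on my part about which scaling is meant; the video may use a different formula. The function name `min_max_scale` and the tiny example arrays are hypothetical and only serve as illustration. The key idea is that the minimum and range are computed on the training set only and then reused for the test set.

```python
import numpy as np


def min_max_scale(X_train, X_test):
    """Rescale each feature to [0, 1] using statistics from the training set only.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # Per-feature minimum and range, computed on the training data
    min_train = X_train.min(axis=0)
    range_train = X_train.max(axis=0) - min_train

    # Guard against constant features, which would cause a division by zero
    range_train[range_train == 0] = 1.0

    # Apply the same shift and scale to both sets
    X_train_scaled = (X_train - min_train) / range_train
    X_test_scaled = (X_test - min_train) / range_train
    return X_train_scaled, X_test_scaled


# Made-up data: two features on very different scales
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 1000.0]])
X_test = np.array([[2.5, 600.0]])

X_train_scaled, X_test_scaled = min_max_scale(X_train, X_test)
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
print(X_test_scaled)
```

After scaling, every training feature spans exactly 0 to 1. Test values can fall slightly outside that interval, because they are scaled with the training statistics rather than their own.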

With the scaled data, we're going to re-train our algorithm and re-assess its performance. Please see the video below for a full walkthrough...
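For readers who prefer text to video, the re-training step could look roughly like the sketch below. It assumes min-max scaling, default `SVC` hyperparameters, and a `train_test_split` with `random_state=0`; none of these choices is confirmed by the post, so treat them as placeholders. The exact accuracies depend on your scikit-learn version and the split, so no numbers are quoted here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the dataset that ships with scikit-learn and split it
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0  # random_state is an assumption
)

# Baseline: SVC trained on the unscaled features
svc_raw = SVC().fit(X_train, y_train)
acc_raw = svc_raw.score(X_test, y_test)

# Scale 'by hand' with the training set's per-feature minimum and range
min_train = X_train.min(axis=0)
range_train = X_train.max(axis=0) - min_train
X_train_scaled = (X_train - min_train) / range_train
X_test_scaled = (X_test - min_train) / range_train

# Same classifier, re-trained on the scaled features
svc_scaled = SVC().fit(X_train_scaled, y_train)
acc_scaled = svc_scaled.score(X_test_scaled, y_test)

print(f"Test accuracy, unscaled: {acc_raw:.3f}")
print(f"Test accuracy, scaled:   {acc_scaled:.3f}")
```

Comparing the two printed accuracies shows how much the feature scales matter for this classifier on your setup.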


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 20
  2. Machine Learning on a Cancer Dataset - Part 21
  3. Machine Learning on a Cancer Dataset - Part 22
  4. Machine Learning on a Cancer Dataset - Part 23
  5. Machine Learning on a Cancer Dataset - Part 24
  6. Machine Learning on a Cancer Dataset - Part 25
  7. Machine Learning on a Cancer Dataset - Part 26
  8. Machine Learning on a Cancer Dataset - Part 27
  9. Machine Learning on a Cancer Dataset - Part 28


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author


You are persevering; you are already at 29. The program you are developing resembles Matlab.

It could be done in Matlab too, but that system is not open-source.

This is very interesting!

Hi Cristi. I have upvoted almost no one over the last few days because I must recharge my SP to over 90%. It went down to 14% a few days ago and I can't use the slider atm. I will upvote you and everyone else soon. I'm sorry about it.
