Machine Learning on a Cancer Dataset - Part 28

in #machine-learning, 7 years ago

This is the fourth video tutorial on support vector machines (SVMs) with scikit-learn on the cancer dataset. In the last video, we trained a support vector classifier (SVC) with an RBF kernel and all default parameters on our dataset.

We saw that it overfits the training data (scoring 100% on it) and performs poorly on the test subset. Several things could explain the poor performance, including the scale of the data and the choice of hyper-parameters.
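As a quick recap, the overfitting can be reproduced with a minimal sketch like the one below. It is not the exact code from the video: the split settings (`stratify`, `random_state=0`) are assumptions, and `gamma="auto"` is set explicitly because that was the default when this series was recorded (newer scikit-learn releases default to `gamma="scale"`, which behaves differently on unscaled data).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the cancer dataset that ships with scikit-learn.
cancer = load_breast_cancer()

# Assumed split settings; the video may use different ones.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# RBF kernel with otherwise default parameters.
# gamma="auto" (1 / n_features) mimics the older default.
svm = SVC(kernel="rbf", gamma="auto")
svm.fit(X_train, y_train)

train_acc = svm.score(X_train, y_train)
test_acc = svm.score(X_test, y_test)

print(f"Accuracy on the training subset: {train_acc:.3f}")
print(f"Accuracy on the test subset:     {test_acc:.3f}")
```

A large gap between the two numbers is the overfitting symptom described above.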

In this video we're going to use matplotlib to visualize our data and understand what it means for it to be unscaled: the values within a single feature can span a wide range, and different features can differ from one another by orders of magnitude.
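One way to see this, sketched here under my own assumptions rather than taken from the video, is to plot the minimum and maximum of every feature on a logarithmic axis. The helper name `plot_feature_ranges` and the plot styling are hypothetical choices for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X = cancer.data  # 569 samples x 30 features

# Per-feature minimum and maximum across all samples.
feature_min = X.min(axis=0)
feature_max = X.max(axis=0)


def plot_feature_ranges(feature_min, feature_max):
    """Plot each feature's min and max on a log-scaled y axis (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(feature_min, "o", label="min")
    ax.plot(feature_max, "v", label="max")
    # A log axis makes differences in orders of magnitude visible.
    # Note: a few features have a minimum of 0, which a log axis
    # cannot draw faithfully.
    ax.set_yscale("log")
    ax.set_xlabel("Feature index")
    ax.set_ylabel("Feature magnitude (log scale)")
    ax.legend(loc="upper right")
    return ax


if __name__ == "__main__":
    plot_feature_ranges(feature_min, feature_max)
    plt.show()
```

In the resulting plot, some features top out in the thousands while others never reach 1, which is the kind of mismatch an RBF kernel is sensitive to.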

Then, in the next tutorial, we're going to try to remedy this issue by scaling the data. Please watch the video below for the full scoop.


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 20
  2. Machine Learning on a Cancer Dataset - Part 21
  3. Machine Learning on a Cancer Dataset - Part 22
  4. Machine Learning on a Cancer Dataset - Part 23
  5. Machine Learning on a Cancer Dataset - Part 24
  6. Machine Learning on a Cancer Dataset - Part 25
  7. Machine Learning on a Cancer Dataset - Part 26
  8. Machine Learning on a Cancer Dataset - Part 27


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author


Very Cool Post!

thanks!

I'm discovering your series just now. Have you ever done any videos about logistic regression?

yes, there are 3 in the current series. for the first one, look here.

Extremely interesting @cristi! I looked into a summer research project modeling brain tumor growth. But I wasn't allowed to due to a lack of ethics training and obtainable data. I'm just wondering what your position on developing a machine-learning model to help predict tumor growth is?

I think it would be difficult, but approachable...

I agree, it is definitely approachable, I like to think very achievable. But the problem with coming up with a prediction for tumor growth for some patient 'A' is that your algorithm might be structured correctly and work great, but if the data you/I/we used is biased, 'A' wouldn't receive an accurate prediction as to how much their tumor might grow. Consequently, 'A' could be given treatment that is too heavy and unnecessary, or too light and ineffective.

nice project, I think that's very good

Great post. Check out my profile if you have time :)
