Machine Learning on a Cancer Dataset - Part 27

in #machine-learning, 7 years ago

In the third video on support vector machines (SVMs) we begin implementing an SVM on our cancer dataset in scikit-learn.

We're using a support vector classifier (SVC) with an RBF (radial basis function) kernel. For an overview of kernels and how they work conceptually, please look at the previous video in this series.

There are many parameters that can be adjusted for our classifier. The defaults are usually a good starting point. However, on our cancer dataset the classifier seems to overfit with the default parameters: it reaches 100% accuracy on the training subset.
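As a rough sketch of what this looks like in code (not necessarily the exact code from the video), the snippet below trains an RBF-kernel SVC on the unscaled data and compares training and test accuracy. The split settings (`stratify`, `random_state=0`) are my own assumptions. I also set `gamma="auto"` explicitly, on the assumption that this matches the default of the older scikit-learn releases; newer releases default to `gamma="scale"` and may not show the same overfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the cancer dataset that ships with scikit-learn.
cancer = load_breast_cancer()

# Hold out part of the data for testing (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# RBF-kernel SVC on unscaled features.
# gamma="auto" (1 / n_features) is assumed to mirror the older default.
svm = SVC(kernel="rbf", C=1.0, gamma="auto")
svm.fit(X_train, y_train)

train_acc = svm.score(X_train, y_train)
test_acc = svm.score(X_test, y_test)
print(f"Accuracy on the training subset: {train_acc:.3f}")
print(f"Accuracy on the test subset:     {test_acc:.3f}")
```

A large gap between the two numbers is the usual sign of overfitting.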

To fix this we could try adjusting parameters such as C and/or gamma, which control the regularization strength and the width of the Gaussian kernel, respectively. We could also look into the scaling of the data, which is currently unscaled. This is what we're going to work on in the next video. For now, see the current tutorial on how to implement SVMs in scikit-learn.
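To give an idea of what adjusting C and gamma could look like, here is a minimal sketch that tries a handful of values and prints the training and test accuracy for each pair. The specific values are illustrative assumptions, not tuned recommendations. In scikit-learn, a smaller C means stronger regularization, and a larger gamma means a narrower Gaussian.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Illustrative (assumed) values; the data is still unscaled here.
results = {}
for C in (0.1, 1, 10):
    for gamma in (1e-5, 1e-3, "auto"):
        svm = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        results[(C, gamma)] = (
            svm.score(X_train, y_train),  # training accuracy
            svm.score(X_test, y_test),    # test accuracy
        )

for (C, gamma), (train_acc, test_acc) in results.items():
    print(f"C={C:<5} gamma={gamma!s:<6} train={train_acc:.3f} test={test_acc:.3f}")
```

Comparing the rows gives a first feel for which settings narrow the gap between training and test accuracy.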


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.
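If you want to take a quick look at the dataset yourself, a short sketch like the one below loads it and prints its size and class balance. The variable names are just my own choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Number of samples and number of features per sample.
n_samples, n_features = cancer.data.shape
print("Samples:", n_samples, "Features:", n_features)

# Count how many samples fall into each class.
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}
print("Samples per class:", class_counts)
```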


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 20
  2. Machine Learning on a Cancer Dataset - Part 21
  3. Machine Learning on a Cancer Dataset - Part 22
  4. Machine Learning on a Cancer Dataset - Part 23
  5. Machine Learning on a Cancer Dataset - Part 24
  6. Machine Learning on a Cancer Dataset - Part 25
  7. Machine Learning on a Cancer Dataset - Part 26


To stay in touch with me, follow @cristi


Cristi Vlad, Self-Experimenter and Author


Machine Learning is so interesting.
Thank you for this Post!

good luck with your post)

Could you explain a bit what RVC and SBF are? (sorry if I got the acronyms wrong, I'm on mobile) XD

SVC is the support vector classifier and RBF is the radial basis function kernel, also known as the Gaussian kernel. Both have been explained in the previous video.
