Machine Learning on a Cancer Dataset - Part 25

cristi (70) in #machine-learning • 9 years ago

In this machine learning tutorial we introduce Support Vector Machines (SVMs). This is the first video on SVMs with scikit-learn and the 25th in this series, which is getting quite lengthy.

SVMs are similar to logistic regression in that both can learn a linear decision boundary between classes. Unlike logistic regression, an SVM looks for the boundary (a line in two dimensions, a hyperplane in general) that maximizes the distance to the nearest points of each class. This distance is called the margin, and the nearest points are called support vectors. So, more simply put, SVMs try to keep the decision boundary as far as possible from the closest points of both classes.
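To make the idea of "the largest distance to the nearest points" concrete, here is a minimal sketch on a tiny hand-made 2D dataset (the six points are made up for illustration and are not from the cancer dataset). It fits scikit-learn's `SVC` with a linear kernel and reads off the width of the gap between the classes, usually called the margin, as `2 / ||w||`. The very large `C` is an assumption used here to approximate a hard-margin SVM on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, hand-made data (an assumption for illustration): two small clusters in 2D.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM when the classes are separable.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the decision boundary
b = clf.intercept_[0]   # offset of the decision boundary

# Distance between the two margin lines w.x + b = -1 and w.x + b = +1.
margin_width = 2.0 / np.linalg.norm(w)

# The training points closest to the boundary; they alone determine it.
print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
```

Moving any point that is not a support vector (without crossing the margin) leaves the boundary unchanged, which is one way to see that only the nearest points matter.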

Because of this, they may do better than other algorithms in some cases: the boundary is chosen to leave as much room as possible on either side, not just to separate the training points. SVMs work on both linearly separable and non-linearly separable data. For the latter, they use a technique called the kernel trick, and this is what we're dealing with in the next video.
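As a small preview of the linear versus non-linear distinction, the sketch below compares a linear-kernel `SVC` with an RBF-kernel `SVC` on synthetic concentric circles generated by `make_circles`. The dataset, its parameters, and the choice of the RBF kernel are assumptions picked for illustration; they are not taken from the video.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): one class forms a small
# circle inside the other, so no straight line can separate them.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same algorithm, two kernels: a straight-line boundary vs. a curved one.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

On data like this, the linear kernel should do little better than guessing, while the RBF kernel can wrap the boundary around the inner circle.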

But until then, please watch the video for the visual introduction to SVMs and also to admire my lack of artistic skill.

As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.
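For reference, here is a minimal sketch of that workflow with an SVM: load the dataset with `load_breast_cancer`, hold out part of it to stand in for new data, and fit a classifier. The split settings (`stratify`, `random_state=42`), the feature scaling step, and the linear kernel are assumptions made for this example and may differ from what the videos use.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
print(cancer.data.shape)     # (number of samples, number of features)
print(cancer.target_names)   # the two class labels

# Hold out a test set to stand in for "new" data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# SVMs are sensitive to feature scale, so standardize before fitting.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```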

Previous videos in this series: