Machine Learning on a Cancer Dataset - Part 30

cristi (70) in #programming • 8 years ago

In this machine learning tutorial we're going to continue optimizing our support vector machine classifier (SVC) to improve its performance on the breast cancer dataset that ships with scikit-learn.

Remember from the previous video tutorial that after scaling the data, the performance improved significantly. In fact, we went from an overfitting scenario (on the unscaled data) to an underfitting scenario, where the training and test scores are close to each other but both leave room for improvement. So, we have to do something to fix the underfitting of the classifier.

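There are numerous hyper-parameters for SVMs that could be adjusted. However, we're only going to modify one of them here, and see how it changes the performance of the classifier. Specifically, we're going to adjust the C parameter, which controls the strength of the regularization: a small C means strong regularization (a simpler model), while a large C means weak regularization (a more flexible model).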
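To get a feel for how many knobs there are, here is a small sketch (not taken from the video) that lists the hyper-parameters of scikit-learn's SVC together with their default values, using the standard get_params() method. The exact set of names printed depends on your scikit-learn version.

```python
from sklearn.svm import SVC

# Create an SVC with all default settings and ask it for its hyper-parameters.
params = SVC().get_params()

# Print them alphabetically; C, kernel and gamma are the usual ones to tune.
for name in sorted(params):
    print(f"{name} = {params[name]!r}")
```

C is just one entry in that list; kernel and gamma are the other two that are commonly tuned alongside it.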

By default, C is equal to 1. We're going to set it to 1,000, which weakens the regularization and thereby increases the complexity of our model. Then, we're going to assess whether the performance of our SVC (notice I'm using SVM and SVC interchangeably here) has improved.
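If you'd like to try this before watching, below is a minimal sketch of the comparison. It is not necessarily the exact code from the video: the use of MinMaxScaler for the scaling step, random_state=0 for the split, and gamma="auto" (picked to mimic the default of older scikit-learn releases) are all my assumptions here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Load the breast cancer dataset bundled with scikit-learn.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0  # assumed seed
)

# Scale every feature to [0, 1]; fit the scaler on the training data only.
scaler = MinMaxScaler().fit(X_train)  # assumed scaling method
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train one SVC with the default C=1 and one with C=1000, then compare.
results = {}
for C in (1, 1000):
    svm = SVC(C=C, gamma="auto")  # gamma="auto" is an assumption
    svm.fit(X_train_scaled, y_train)
    results[C] = {
        "train": svm.score(X_train_scaled, y_train),
        "test": svm.score(X_test_scaled, y_test),
    }
    print(f"C={C}: train accuracy {results[C]['train']:.3f}, "
          f"test accuracy {results[C]['test']:.3f}")
```

If the model with C=1 was underfitting, you would expect the training accuracy to go up with C=1000. The thing to watch is whether the test accuracy follows, or whether the gap between the two starts to widen again.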

Please see the video below for the complete walk-through.

Previous videos in this series: