Machine Learning on a Cancer Dataset - Part 8
Now that we know a little bit about KNN, we'll move on to another machine learning algorithm used for classification: Logistic Regression. Despite its name, it is a classifier, not a regression model.
In this video, we'll train a Logistic Regression classifier on the cancer dataset in scikit-learn and see how it performs compared to KNN, which we covered in the previous videos.
The code below is also available on my GitHub:
# Using LogisticRegression on the cancer dataset. Inspired by the Müller and Guido ML book: (https://www.amazon.com/dp/1449369413/)
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# Jupyter-only magic; uncomment the next line when running in a notebook
# %matplotlib inline
# load the data and make a stratified train/test split (default: 75% train, 25% test)
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
# default settings; recent scikit-learn versions may print a ConvergenceWarning
# on these unscaled features (raising max_iter is one way to address it)
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
# displaying the accuracy
print('Accuracy on the training subset: {:.3f}'.format(log_reg.score(X_train, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(log_reg.score(X_test, y_test)))
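Since the goal is to see how Logistic Regression compares to KNN, here is a minimal sketch that fits both classifiers on the same split and prints their accuracies side by side. The `n_neighbors=5` and `max_iter=10000` values are my assumptions for illustration, not necessarily the settings used in the videos, and the `models` and `scores` names are just for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Assumed settings: n_neighbors=5 for KNN, and a raised max_iter so the
# solver has room to converge on the unscaled features.
models = {
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'Logistic Regression': LogisticRegression(max_iter=10000),
}

# Fit each model on the same training subset and record both accuracies.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print('{}: train {:.3f}, test {:.3f}'.format(name, *scores[name]))
```

Using the same `random_state` and `stratify` arguments for both models keeps the comparison fair, because each classifier sees exactly the same training and test samples.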
For a walkthrough of the code, please see the video below.
As a reminder:
In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
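To make the reminder concrete, here is a short sketch that inspects the dataset and then uses a trained classifier on samples it has not seen. The first five rows of the test subset stand in for "new, unlabeled data"; that choice, the `max_iter=10000` setting, and the `counts` and `new_samples` names are my assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# Rows are tumor samples, columns are features.
print('Data shape:', cancer.data.shape)

# Count how many samples carry each label (target 0 and target 1).
counts = {str(name): int(n)
          for name, n in zip(cancer.target_names, np.bincount(cancer.target))}
print('Samples per class:', counts)

X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Assumed setting: a raised max_iter so the solver can converge.
log_reg = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# Treat a few held-out rows as if they were new, unlabeled samples.
new_samples = X_test[:5]
predicted = cancer.target_names[log_reg.predict(new_samples)]
probabilities = log_reg.predict_proba(new_samples)

for label, proba in zip(predicted, probabilities):
    print('Predicted: {:9s} class probabilities: {}'.format(label, proba.round(3)))
```

`predict` returns the class index for each sample, which is mapped back to a readable label through `cancer.target_names`, while `predict_proba` shows how confident the model is in each class.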
Previous videos in this series:
- Machine Learning on a Cancer Dataset - Part 1
- Machine Learning on a Cancer Dataset - Part 2
- Machine Learning on a Cancer Dataset - Part 3
- Machine Learning on a Cancer Dataset - Part 4
- Machine Learning on a Cancer Dataset - Part 5
- Machine Learning on a Cancer Dataset - Part 6
- Machine Learning on a Cancer Dataset - Part 7
To stay in touch with me, follow @cristi
#machine-learning #science #python
Cristi Vlad, Self-Experimenter and Author