Machine Learning on a Cancer Dataset - Part 8


Now that we know a little bit about KNN, we'll move on to another machine learning algorithm used for classification: Logistic Regression.

In this video, we'll train a Logistic Regression classifier on the cancer dataset in scikit-learn and we'll see how it performs compared to KNN, which we went through in our previous videos.

The code below is also available on my GitHub:

# Using LogisticRegression on the cancer dataset. Inspired by the Müller and Guido ML book (https://www.amazon.com/dp/1449369413/)

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

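import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline in a notebook; remove this line in a plain .py script
%matplotlib inline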

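# load the dataset bundled with scikit-learn
cancer = load_breast_cancer()

# stratified split (default 75% train / 25% test) so both subsets keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# fit a logistic regression classifier with default settings
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)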

# displaying the accuracy
print('Accuracy on the training subset: {:.3f}'.format(log_reg.score(X_train, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(log_reg.score(X_test, y_test)))
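
To put the two classifiers side by side on the same split, a minimal sketch might look like the one below. It assumes scikit-learn's `KNeighborsClassifier` with `n_neighbors=5`, which may not match the settings used in the earlier videos. It also sets `max_iter=10000` for logistic regression. That value is an assumption added here to avoid convergence warnings on recent scikit-learn versions and is not part of the code above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Both models are trained and scored on the exact same split.
# n_neighbors=5 and max_iter=10000 are illustrative assumptions.
models = {
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'Logistic Regression': LogisticRegression(max_iter=10000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # store (training accuracy, test accuracy) for each model
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print('{}: train {:.3f}, test {:.3f}'.format(name, *scores[name]))
```

Using the same `random_state` and `stratify` arguments for both models keeps the comparison fair, since any difference in accuracy then comes from the classifier and not from the split.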

For a walkthrough of the code, please see the video below.


As a reminder:

In this series I'm exploring the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled data (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
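
The dataset's size and labels can be checked directly, and a fitted classifier can then label samples it has not seen during training. In the sketch below, the first few held-out test rows stand in for "new, unlabeled data", and `max_iter=10000` is an assumption added to avoid convergence warnings on recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# number of samples and features, and how many samples fall in each class
print(cancer.data.shape)
class_counts = dict(zip(cancer.target_names, np.bincount(cancer.target)))
print(class_counts)

X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

log_reg = LogisticRegression(max_iter=10000)  # max_iter is an assumption
log_reg.fit(X_train, y_train)

# treat a few held-out rows as new samples and predict their labels
new_samples = X_test[:5]
predicted = log_reg.predict(new_samples)
predicted_names = cancer.target_names[predicted]  # map 0/1 to class names
print(predicted_names)
```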


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4
  5. Machine Learning on a Cancer Dataset - Part 5
  6. Machine Learning on a Cancer Dataset - Part 6
  7. Machine Learning on a Cancer Dataset - Part 7


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author
