Machine Learning on a Cancer Dataset - Part 9


The 9th video in the series and the second video on Logistic Regression is all about optimization.

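One of the parameters of Logistic Regression we can tune to improve the accuracy of our classifier is the 'C' parameter. 'C' is the inverse of the regularization strength: smaller values mean stronger regularization (a simpler model), while larger values mean weaker regularization.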
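To make the effect of 'C' concrete, here is a minimal sketch (not the code from the video) that fits the model with three values of C and prints the size of the learned coefficients. Standardizing the features and raising max_iter are my assumptions, added only so the solver converges cleanly; the name coef_norms is just for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumption: standardize the features so the solver converges cleanly.
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}  # hypothetical helper dict: C -> L2 norm of the coefficients
for C in (0.01, 1, 100):
    model = LogisticRegression(C=C, max_iter=5000).fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print('C={:<6} L2 norm of coefficients: {:.2f}'.format(C, coef_norms[C]))
```

A small C should keep the coefficients close to zero, while a large C lets them grow, which is what "weaker regularization" means in practice.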

In this video we compare the default value, C=1, with the results and accuracy we get by setting C to 100 (100 times the default) and to 0.01 (100 times smaller than the default). The code snippets below can also be found on my GitHub:

from sklearn.linear_model import LogisticRegression

# C=100: weaker regularization than the default (C=1)
log_reg100 = LogisticRegression(C=100)
log_reg100.fit(X_train, y_train)
print('Accuracy on the training subset: {:.3f}'.format(log_reg100.score(X_train, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(log_reg100.score(X_test, y_test)))

Output:

Accuracy on the training subset: 0.972
Accuracy on the test subset: 0.965

# C=0.01: stronger regularization than the default (C=1)
log_reg001 = LogisticRegression(C=0.01)
log_reg001.fit(X_train, y_train)
print('Accuracy on the training subset: {:.3f}'.format(log_reg001.score(X_train, y_train)))
print('Accuracy on the test subset: {:.3f}'.format(log_reg001.score(X_test, y_test)))

Output:

Accuracy on the training subset: 0.934
Accuracy on the test subset: 0.930
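If you want to run the same comparison in one go, a rough sketch could look like the one below. It is self-contained, so it rebuilds the train/test split itself; stratify, random_state=42 and max_iter=10000 are my assumptions, and the results dict is a hypothetical name. The exact accuracies depend on the split, the solver and your scikit-learn version, so they may not match the numbers printed above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

# Assumption: a stratified split with a fixed seed, chosen for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

results = {}  # hypothetical helper dict: C -> (train accuracy, test accuracy)
for C in (0.01, 1, 100):
    # Assumption: a high max_iter so the solver has room to converge
    # on the unscaled features.
    log_reg = LogisticRegression(C=C, max_iter=10000)
    log_reg.fit(X_train, y_train)
    results[C] = (log_reg.score(X_train, y_train),
                  log_reg.score(X_test, y_test))
    print('C={:<6} train: {:.3f}  test: {:.3f}'.format(C, *results[C]))
```

Comparing the training and test accuracy side by side for each C makes it easier to spot underfitting (both low) or overfitting (training much higher than test).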

For a complete walkthrough of the code, please see the video below.


As a reminder:

In this series I'm exploring the breast cancer dataset that comes pre-loaded with scikit-learn. The goal is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
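For anyone joining at this part, here is a minimal sketch of that workflow: load the dataset, hold out part of it, train a classifier and predict labels for samples it has not seen. The split settings (stratify, random_state=42), max_iter=10000 and the names new_samples and predicted are my assumptions for illustration, not necessarily what the earlier videos use.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
print(cancer.data.shape)    # (569, 30): 569 samples, 30 features
print(cancer.target_names)  # class names for the 0/1 labels

# Assumption: hold out part of the data to stand in for "new" samples.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

clf = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# Treat the first few held-out rows as unlabeled data and predict their class.
new_samples = X_test[:5]
predicted = cancer.target_names[clf.predict(new_samples)]
print(predicted)
```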


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4
  5. Machine Learning on a Cancer Dataset - Part 5
  6. Machine Learning on a Cancer Dataset - Part 6
  7. Machine Learning on a Cancer Dataset - Part 7
  8. Machine Learning on a Cancer Dataset - Part 8


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author
