Machine Learning on a Cancer Dataset - Part 7

in #machine-learning, 7 years ago

After training our first machine learning algorithm, KNN, on the cancer dataset in the previous video, we want to see if it can do better.

After all, we used the algorithm with its default parameters. In this video we tune those parameters to see how we can improve the algorithm. More specifically, we loop through values of n_neighbors from 1 to 10, train a KNN classifier for each, and compare the training and test accuracy to see which value performs best. The code looks like this (and it's on my GitHub too):

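# Imports and dataset needed to run this snippet on its own
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Resplit the data with a different randomization (inspired by the Müller & Guido ML book - https://www.amazon.com/dp/1449369413/)
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=66)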

# Create two lists for training and test accuracies
training_accuracy = []
test_accuracy = []

# Define a range of 1 to 10 (included) neighbors to be tested
neighbors_settings = range(1,11)

# Train a KNN classifier for each number of neighbors and record its training and test accuracy
for n_neighbors in neighbors_settings:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    training_accuracy.append(clf.score(X_train, y_train))
    test_accuracy.append(clf.score(X_test, y_test))

# Visualize results - to help decide which n_neighbors yields the best results (n_neighbors=6, in this case)
plt.plot(neighbors_settings, training_accuracy, label='Accuracy of the training set')
plt.plot(neighbors_settings, test_accuracy, label='Accuracy of the test set')
plt.ylabel('Accuracy')
plt.xlabel('Number of Neighbors')
plt.legend()
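
Besides reading the best value off the plot, you can also pick it programmatically. Here is a minimal sketch that reuses the same split and loop as above; the names best_index and best_n_neighbors are my own for illustration, and breaking ties in favor of the smaller n_neighbors is an assumption, not something covered in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)

# Same loop as above, keeping only the test accuracies
neighbors_settings = range(1, 11)
test_accuracy = []
for n_neighbors in neighbors_settings:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    test_accuracy.append(clf.score(X_test, y_test))

# max() returns the first maximum, so ties go to the smaller n_neighbors (assumed tie-break)
best_index = max(range(len(test_accuracy)), key=test_accuracy.__getitem__)
best_n_neighbors = neighbors_settings[best_index]
print(f"Best n_neighbors: {best_n_neighbors} (test accuracy {test_accuracy[best_index]:.3f})")
```

The exact winner can shift with the scikit-learn version and with the random_state used for the split, so treat the printed value as specific to this split.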

For a walkthrough and for the plotting of the results, please see the video below.


As a reminder:

In this series I'm exploring the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
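
If you want to check those numbers yourself, here is a quick sketch. It assumes the dataset is loaded with scikit-learn's load_breast_cancer; the variable names n_samples, n_features and class_counts are just for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Rows are tumor samples, columns are the measured features
n_samples, n_features = cancer.data.shape

# Count how many samples carry each label (target 0 and 1 map onto target_names)
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}

print(f"{n_samples} samples, {n_features} features")
print(class_counts)
```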


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4
  5. Machine Learning on a Cancer Dataset - Part 5
  6. Machine Learning on a Cancer Dataset - Part 6


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author

good science..

It's going over my head
