Horror Movie Poster Classifier


HorrorMoviePosterClassifier

Background

Throughout the history of cinema, movie posters have been used to capture and sell the mood of a film. Whether it's an action-adventure, a romantic comedy or a sci-fi horror, movie genres have developed their own visual language to communicate with audiences. This project is my attempt to see whether, via simple machine learning, computers can tap into this language.

Method

From a list containing roughly equal numbers of horror and non-horror movies, we download each movie's details, such as the genres it belongs to, along with its poster image from RottenTomatoes. Using k-medoids clustering, we group the image pixels by colour value and calculate the fraction of the total number of pixels contained in each cluster. This gives us a reasonable idea of the colours used in each poster as well as their relative importance: is the poster mainly black with white highlights, or white with black text?
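The colour-fraction step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the names `assign` and `k_medoids` are hypothetical, the pixels are assumed to be a list of RGB tuples already read from a poster, and plain Euclidean distance in RGB space is assumed as the colour dissimilarity.

```python
import math
import random

def assign(pixels, medoids):
    """Group each pixel with its nearest medoid (Euclidean distance in RGB)."""
    clusters = [[] for _ in medoids]
    for pixel in pixels:
        nearest = min(range(len(medoids)),
                      key=lambda i: math.dist(pixel, medoids[i]))
        clusters[nearest].append(pixel)
    return clusters

def k_medoids(pixels, k, max_iterations=20, seed=0):
    """Return (medoids, fractions) for a list of RGB tuples.

    Raises ValueError if the image has fewer than k distinct colours.
    """
    rng = random.Random(seed)
    # Start from k distinct colours so that no two medoids coincide.
    medoids = rng.sample(sorted(set(pixels)), k)
    for _ in range(max_iterations):
        clusters = assign(pixels, medoids)
        new_medoids = []
        for medoid, members in zip(medoids, clusters):
            if not members:
                # Keep the old medoid if its cluster is empty.
                new_medoids.append(medoid)
                continue
            # The new medoid is the member colour with the smallest
            # total distance to every pixel in its cluster.
            best = min(sorted(set(members)),
                       key=lambda c: sum(math.dist(c, m) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    clusters = assign(pixels, medoids)
    fractions = [len(members) / len(pixels) for members in clusters]
    return medoids, fractions

# A toy "poster": mostly black, some white, a little red.
poster = [(0, 0, 0)] * 60 + [(255, 255, 255)] * 30 + [(200, 0, 0)] * 10
medoids, fractions = k_medoids(poster, k=3)
for colour, fraction in sorted(zip(medoids, fractions), key=lambda p: -p[1]):
    print(colour, fraction)
```

The total-distance update is quadratic in cluster size, so a real poster would probably need to be downsampled before clustering.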

To classify an unknown movie poster, we extract its colour-fraction vector using k-medoids and then find similar colour combinations in our database using the k-nearest neighbours algorithm. A poster is classified as horror or not depending on whether the majority of its k nearest neighbours are horror movies.

Additional details

Once downloaded, the details for each movie, along with the calculated colour-fraction vectors, are saved in a Redis database. This speeds up repeat calculations and minimises the number of API calls to RottenTomatoes. Posters are also saved to the hard disk so that no additional downloads are needed if the user wishes to change the number of colour clusters used to approximate each poster.

Running

To set up the database, first create a list of movie names.
movie_list = ["The Ring", "Poltergeist", "The Exorcist", "Forrest Gump", "The Sound of Music", "Casablanca"]
Larger lists will give more accurate results. An easy way to get large lists is to search for IMDb user lists and parse the resulting RSS feeds to extract the movie names.
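As a rough sketch of the RSS approach, the standard library's XML parser is enough to pull titles out of a feed. The feed below is a made-up example, and the assumption that each `<item>` carries the movie name in its `<title>` element may not match what a real IMDb list feed looks like; `titles_from_rss` is a hypothetical helper, not part of the project.

```python
import xml.etree.ElementTree as ET

def titles_from_rss(rss_text):
    """Collect the <title> of every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        if title:  # skip items without a usable title
            titles.append(title)
    return titles

# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example list</title>
    <item><title>The Ring</title></item>
    <item><title>Casablanca</title></item>
  </channel>
</rss>"""

movie_list = titles_from_rss(SAMPLE_FEED)
print(movie_list)
```

Real feeds may decorate titles with years or rankings, which would need stripping before the names are used for lookups.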

The database can then be initialised.
d = Database(movie_list)
This works through the list, checking the given Redis server for cached results first and querying RottenTomatoes only when nothing is found.
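The cache-first lookup might look something like the sketch below. To keep it runnable without a server, a plain dict stands in for Redis, and the `fetch` callable stands in for the RottenTomatoes request; `get_movie_details`, the `movie:` key prefix and the JSON encoding are all assumptions made for illustration.

```python
import json

def get_movie_details(title, cache, fetch):
    """Return details for a title, calling fetch only on a cache miss."""
    key = "movie:" + title
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no API call
    details = fetch(title)  # cache miss: query the remote API
    cache[key] = json.dumps(details)  # store for next time
    return details

# Stand-ins for the real services (hypothetical).
api_calls = []

def fake_fetch(title):
    api_calls.append(title)
    return {"title": title, "genres": ["Horror"]}

cache = {}
get_movie_details("The Ring", cache, fake_fetch)
get_movie_details("The Ring", cache, fake_fetch)
print(len(api_calls))  # the second lookup is served from the cache
```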

Suitable values for the two parameters, the number of clusters and the number of nearest neighbours, are then estimated, and the database is trained by labelling movies according to whether their genres match the keyword horror.
d.cross_validation("horror")
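How `cross_validation` works internally is not shown here. One common approach, sketched below as an assumption rather than as the project's actual method, is leave-one-out cross-validation: each movie is classified by a majority vote of its k nearest neighbours among the remaining movies, and the k with the best accuracy is kept. The function names and the toy two-colour vectors are hypothetical; the same loop could in principle be repeated for each candidate number of clusters.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training vectors closest to query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Fraction of items classified correctly when each is held out in turn."""
    correct = 0
    for i, (vector, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, vector, k) == label:
            correct += 1
    return correct / len(data)

def choose_k(data, candidates):
    """Pick the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: leave_one_out_accuracy(data, k))

# Toy colour-fraction vectors: (dark fraction, light fraction), is_horror.
data = [
    ((0.90, 0.10), True), ((0.80, 0.20), True), ((0.85, 0.15), True),
    ((0.20, 0.80), False), ((0.30, 0.70), False), ((0.25, 0.75), False),
]
for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(data, k))
print("best k:", choose_k(data, [1, 3, 5]))
```

With only three movies per class, k = 5 always outvotes the two remaining same-class neighbours, which is why larger, balanced lists matter.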
An estimate of the accuracy of the database can be obtained using:
d.test()
This also prints the confusion matrix.
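For readers unfamiliar with the term, a confusion matrix for a two-class problem simply counts how often each true label was paired with each predicted label. The sketch below is a generic illustration with made-up labels; it does not reproduce the output format of `d.test()`, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a horror / non-horror classifier."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],    # horror, predicted horror
        "false_negative": counts[(True, False)],  # horror, missed
        "false_positive": counts[(False, True)],  # non-horror, flagged
        "true_negative": counts[(False, False)],  # non-horror, predicted so
    }

def accuracy(matrix):
    """Fraction of all predictions that were correct."""
    total = sum(matrix.values())
    return (matrix["true_positive"] + matrix["true_negative"]) / total

# Made-up labels: True means horror.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]
matrix = confusion_matrix(actual, predicted)
print(matrix)
print(accuracy(matrix))
```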

Individual movies can be tested with a simple get call to the database.
d["Gladiator"]
d["Titanic"]
d["Girl with a Pearl Earring"]
Dependencies



Posted on Utopian.io - Rewarding Open Source Contributors


