
RE: Using Machine Learning to Fight Plagiarism

in #steemit, 8 years ago

There has been some research along these lines. Part of the problem with using machine learning to find someone's "writing fingerprint" is that you need a large enough corpus to train the classifier on. Shakespeare, Agatha Christie and Louis L'Amour (to name a few) have pretty big bodies of work to use. But a random Internet poster? That's hard.

It might be better to attack the problem the way financial fraud detection does. Develop a list of features common to posters who plagiarize and use those features to score new posts. High-scoring posts can then be followed up with more in-depth (human) study.
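To make the scoring idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `Post` fields, the three features, the weights and the cutoffs are all made up, since I don't actually know which features matter. In a real system the weights would be fit on posts already known to be plagiarized.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    reading_level: float   # hypothetical grade-level estimate of the text
    posts_per_day: float   # hypothetical posting rate of the author
    payout: float          # hypothetical payout of the post


# Made-up weights; placeholders for values learned from labeled cases.
WEIGHTS = {"reading_level_jump": 2.0, "high_frequency": 1.0, "high_payout": 0.5}


def risk_score(post, author_avg_reading_level):
    """Add up the weights of every (hypothetical) warning sign the post shows."""
    score = 0.0
    # A post far from the author's usual reading level may not be their own.
    if abs(post.reading_level - author_avg_reading_level) > 3.0:
        score += WEIGHTS["reading_level_jump"]
    if post.posts_per_day > 10:
        score += WEIGHTS["high_frequency"]
    if post.payout > 50:
        score += WEIGHTS["high_payout"]
    return score


def flag_for_review(posts, baselines, threshold=2.0):
    """Return the posts whose score is high enough to hand to a human."""
    return [p for p in posts if risk_score(p, baselines[p.author]) >= threshold]
```

The point is only the shape of the pipeline: cheap automatic scoring first, human review for whatever crosses the threshold.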

What features are those? I don't know. I would begin investigating it by clustering posts around reading level, frequency, topics and maybe payouts, just for starters. Then analyze each cluster to see how many plagiarized posts appear in it, and of what kind. If nothing turns up, vary the features and start again.
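A minimal sketch of that loop, assuming each post has already been turned into a tuple of numeric features and that some posts have been hand-labeled as plagiarized. I'm using plain k-means here only because it is simple; the function names and the choice of algorithm are my own assumptions, and any clustering method would do. Features should be scaled to comparable ranges first, or the largest one will dominate the distances.

```python
import random
from collections import Counter


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means over tuples of numbers; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def plagiarism_rate_by_cluster(labels, is_plagiarized):
    """Fraction of known-plagiarized posts in each cluster."""
    totals = Counter(labels)
    hits = Counter(label for label, flag in zip(labels, is_plagiarized) if flag)
    return {c: hits[c] / totals[c] for c in totals}
```

If one cluster shows a much higher rate than the others, its features are worth a closer look. If the rates are all about the same, swap in different features and run it again.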

It's an interesting problem that I think will take a long time to solve.
