Why do 20% of posts get 80% of the payout?

in #steemit · 8 years ago

Do you know what the most frequently used word in the English language is?

According to an analysis of the British National Corpus, a 100-million-word collection of samples of written and spoken language from a wide range of sources, the most frequently used word in English is "the".

The word "the" accounts for nearly 6% of everything we say, read or write.


Source: screenshot from this cool site: Wordcount

The top 20 words, in order, are: "the", "of", "and", "to", "a", "in", "is", "I", "that", "it", "for", "you", "was", "with", "on", "as", "have", "but", "be", "they".
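You can reproduce this kind of counting on any text. The minimal Python sketch below is not the BNC's or Wordcount's actual method. It lowercases the text, treats runs of letters and apostrophes as words, and ranks them by count. The regular expression and the function name `rank_words` are assumptions made for illustration.

```python
import re
from collections import Counter

def rank_words(text: str, top: int = 20) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) for the `top` most frequent words in `text`."""
    # Simplifying assumption: a "word" is a lowercase run of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(top), start=1)]

sample = "The cat sat on the mat and the dog sat on the log."
for rank, word, n in rank_words(sample, top=3):
    print(rank, word, n)  # "the" comes first, with 4 occurrences
```

Run this on a whole book instead of a toy sentence, and short function words like "the", "of" and "and" rise to the top.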

It seems like fun trivia, but is there something more to it?

It turns out that whether we analyse an entire language, a single book or just one post, the same interesting pattern emerges almost every time.

Zipf's law

When word frequency is plotted against rank on a log-log graph, the points follow a nice straight line: a power law.
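A straight line on a log-log plot is the signature of a power law. As a rough illustration, and not how the pictured chart was produced, you can fit a least-squares line to log(frequency) against log(rank). For a perfect Zipf distribution the slope comes out at -1. The helper name `loglog_slope` and the idealized counts are assumptions.

```python
import math

def loglog_slope(counts: list[float]) -> float:
    """Least-squares slope of log(frequency) vs log(rank); counts sorted descending."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Idealized Zipf counts: the word of rank r occurs 1000 / r times.
ideal = [1000 / r for r in range(1, 101)]
print(round(loglog_slope(ideal), 3))  # -1.0
```

Real word counts are noisier, so their fitted slope is only approximately -1.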


Image Source
This is called Zipf's law. It states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.

Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.
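Here is the same statement as a formula. The notation is introduced for illustration: f(r) is the frequency of the word of rank r. Taking logarithms shows why the log-log graph is a straight line with slope -1:

```latex
f(r) \approx \frac{f(1)}{r}
\quad\Longrightarrow\quad
\frac{f(1)}{f(2)} \approx 2, \qquad \frac{f(1)}{f(3)} \approx 3, \qquad
\log f(r) \approx \log f(1) - \log r
```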

Image source

The law is named after the American linguist George K. Zipf (1902–1950), who popularized it and sought to explain it, though he did not claim to have originated it.

Zipf's law isn't limited to English. It applies to other languages as well; in fact, it seems to apply to all of them.

Isn't it funny how something as complex and grandiose as language can be predicted in such a simple way?


Image source

Nor is it limited to language. The same Zipf's law pattern can also be found in:

  • citations of scientific papers
  • the cumulative distribution of the number of “hits” received by websites
  • the number of copies of books sold
  • magnitudes of earthquakes
  • intensity of solar flares
  • wealth of the richest people
  • protein sequences


Image source

80-20 Rule

Zipf's distribution is a discrete form of the continuous Pareto distribution.

The Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes.

Joseph M. Juran suggested the principle and named it after the Italian economist Vilfredo Pareto.

Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population.

Pareto also observed that 20% of the pea pods in his garden contained 80% of the peas.
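How lopsided a Pareto distribution is depends on its shape parameter, usually written alpha. A standard result is that the top fraction p of the population holds a share p^(1 - 1/alpha) of the total. The 80/20 split therefore corresponds to alpha = log 5 / log 4, about 1.16. The Python sketch below just evaluates that formula. The helper name `top_share` is hypothetical.

```python
import math

def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p of a Pareto-distributed
    population with shape alpha > 1 (from its Lorenz curve: p ** (1 - 1/alpha))."""
    return p ** (1 - 1 / alpha)

# The shape that yields exactly 80/20.
alpha_80_20 = math.log(5) / math.log(4)
print(round(alpha_80_20, 3))                  # 1.161
print(round(top_share(0.2, alpha_80_20), 3))  # 0.8
print(round(top_share(0.2, 2.0), 3))          # 0.447: a thinner tail, less concentration
```

As alpha approaches 1, the top 20% holds an ever larger share of the total.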

What does it mean today?

Pareto's Principle can also be observed in our daily lives.

The 80/20 rule should not be taken too literally; it is merely a symbol of the interesting disproportions between cause and effect in the world we create.

Examples of Pareto's Principle I've found interesting:

  • 80% of word occurrences come from 20% of the words
  • 80% of sales come from 20% of customers
  • 80% of complaints come from 20% of issues
  • 85% of Facebook’s visitors are looking at only 8% of overall images
  • Most people spend 80% of their time with 20% of their friends
  • 20% of activities produce 80% of results
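You can sanity-check the first example. Assume an idealized Zipf law (frequency proportional to 1/rank) and compute what share of all word occurrences comes from the top 20% of the vocabulary. The answer depends on vocabulary size, so "80%" is only approximate. `zipf_top_share` is a hypothetical helper.

```python
def zipf_top_share(n_words: int, top_fraction: float = 0.2) -> float:
    """Fraction of all occurrences contributed by the most frequent `top_fraction`
    of an n_words vocabulary, assuming an ideal Zipf law (frequency ∝ 1/rank)."""
    weights = [1 / r for r in range(1, n_words + 1)]
    k = round(n_words * top_fraction)
    return sum(weights[:k]) / sum(weights)

for n in (100, 10_000):
    print(n, round(zipf_top_share(n), 2))
# 100 0.69
# 10000 0.84
```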


Image source: Health care expenses by percentiles, U.S.

In 2002, Microsoft reported that 80% of crashes were caused by 20% of the bugs detected.

Possible Explanations

Although Zipf's law holds for most languages, we can't really tell why.

It may be explained to some extent by the statistical analysis of randomly generated texts.

The theory is that the rank distribution arises naturally from the fact that word length plays a part: long words tend not to be very common, whereas short words are.
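You can try the random-text argument directly. The post doesn't say which random-text model it means, so as an assumption this sketch uses the classic "monkey typing" setup. Characters, including the space, are drawn uniformly at random, and whatever falls between two spaces counts as a word. Shorter "words" come out far more frequent, which gives a roughly Zipf-like, stepwise rank-frequency curve. The function names, the five-letter alphabet and the seed are arbitrary choices.

```python
import random
from collections import Counter

def random_text_counts(n_chars: int, letters: str = "abcde", seed: int = 0) -> Counter:
    """'Monkey typing': draw n_chars characters uniformly from letters plus a space,
    then treat every run of letters between spaces as a 'word' and count them."""
    rng = random.Random(seed)
    alphabet = letters + " "
    text = "".join(rng.choice(alphabet) for _ in range(n_chars))
    return Counter(text.split())  # split() with no argument ignores repeated spaces

def mean_count_by_length(counts: Counter) -> dict[int, float]:
    """Average number of occurrences of a distinct word, grouped by word length."""
    totals, kinds = Counter(), Counter()
    for word, n in counts.items():
        totals[len(word)] += n
        kinds[len(word)] += 1
    return {length: totals[length] / kinds[length] for length in sorted(totals)}

means = mean_count_by_length(random_text_counts(200_000))
for length in (1, 2, 3):
    # Each extra letter makes any particular word several times rarer.
    print(length, round(means[length], 1))
```

With a space probability of 1/6, each specific word should be roughly six times rarer than each specific word one letter shorter. That is why the ranked frequencies fall off in steps.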

Still, some values don't fit this hypothesis. Take word frequencies, for example: taboo words like "sex", or the names of planets, days and chemical elements. Their use is highly constrained by the natural world.


Image source

Statistical analysis doesn't explain that.

The principle of least effort is another possible explanation. Zipf himself proposed that word frequencies in language could have something to do with the competing interests of speakers and listeners. Speakers prefer to use fewer distinct words when expressing their ideas, while listeners prefer more distinct words, which make the meaning less ambiguous. In this view, Zipf's law is the result of a compromise between speakers and listeners on the number of different words used.

Another approach is called preferential attachment.
For example, posts, videos or images that already have many views get even more views.

What happens is that some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not.

Once a word is used it is more likely to be used again.
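Preferential attachment is easy to simulate. This sketch follows Herbert Simon's classic word-reuse model, used here as one concrete version of the idea rather than as the post's own method. Each new token is either a brand-new word, with a small probability that is an assumed parameter, or a copy of a randomly chosen earlier token. A word is therefore reused in proportion to how often it has already been used, and Simon showed that this kind of process produces a Zipf-like distribution. The function name and parameter values are illustrative.

```python
import random
from collections import Counter

def simon_process(n_tokens: int, new_word_prob: float = 0.1, seed: int = 0) -> Counter:
    """Simon's 'rich get richer' model: each new token is a brand-new word with
    probability new_word_prob; otherwise it copies a uniformly chosen earlier token,
    so a word is reused in proportion to its current count."""
    rng = random.Random(seed)
    tokens = [0]  # word ids; the text starts with a single word
    next_id = 1
    for _ in range(n_tokens - 1):
        if rng.random() < new_word_prob:
            tokens.append(next_id)  # coin a new word
            next_id += 1
        else:
            tokens.append(rng.choice(tokens))  # reuse, weighted by past usage
    return Counter(tokens)

counts = simon_process(100_000)
print(len(counts), "distinct words")
print(counts.most_common(5))  # a handful of early words dominate
```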

But there doesn't need to be a conscious effort for this to happen; it also happens naturally.
Imagine having a pile of loose, unconnected chain links.

By picking two out of the pile and linking them together, you would create a longer chain. That chain is now more likely to be picked again at random, just because it is longer. Repeating this process would also end up producing chain lengths that follow Zipf's law.
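The chain-link thought experiment can be simulated in a few lines. This sketch reads "more likely to get picked just because it is longer" as picking each chain with probability proportional to its length. That reading, the function name `link_chains` and the parameter values are all assumptions. The code only reports the resulting chain lengths so you can inspect how unevenly they end up distributed. It does not by itself prove that the distribution is Zipfian.

```python
import random

def link_chains(n_links: int, n_joins: int, seed: int = 0) -> list[int]:
    """Start with n_links loose links (chains of length 1). At each step, pick one
    chain with probability proportional to its length, then a second, different
    chain the same way, and join them. Returns chain lengths, longest first."""
    if not 0 <= n_joins < n_links:
        raise ValueError("need 0 <= n_joins < n_links")
    rng = random.Random(seed)
    chains = [1] * n_links
    for _ in range(n_joins):
        # First pick: a chain's chance of being grabbed is proportional to its length.
        first = chains.pop(rng.choices(range(len(chains)), weights=chains)[0])
        # Second pick, among the remaining chains, weighted the same way.
        j = rng.choices(range(len(chains)), weights=chains)[0]
        chains[j] += first
    return sorted(chains, reverse=True)

lengths = link_chains(n_links=2000, n_joins=1500)
print(lengths[:10], "...", lengths[-5:])
```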

Conclusion

Zipf's law is one of those empirical rules that characterize a surprising range of real-world phenomena remarkably well. I found it interesting how many things follow it.


Source: Steem Whitepaper

To finish, I'll leave you with a Steemit payout distribution graph; you can guess which pattern it follows.

I hope you liked the topic.


lmao

And only 20% of what they do will have 80% of the effect.

I think there is no evidence for this. I think it has more to do with looks and the way you carry yourself as a man (your ambitions, etc.). Women marry up you know, but there is a limit to that also :) Nature has a way to balance everything out, otherwise we wouldn't be here.

It may also work the other way around, though.

In that day seven women will take hold of one man and say, "We will eat our own food and provide our own clothes; only let us be called by your name. Take away our disgrace!"

There's probably no peer-reviewed data, but OKCupid analyzed their membership.
http://blog.okcupid.com/index.php/your-looks-and-online-dating/

Really enjoyed reading, and I can see you have put in a lot of effort, so definitely an upvote. Great job!

Thanks, I tried to make it interesting.

Thanks for this post @eneismijmich. There have been some others that outline the way rewards are laid out but this info about Zipf’s Law certainly sheds a new light on it.

Thank you. I will check you out.

I was thinking about the 80/20 Principle over the last few days, especially with all of the complaints about the @dollarvigilante and how 'unfair' Steemit is. But yeah, the universe is unbalanced. Reading 'The 80/20 Principle' by Richard Koch taught me this. You see it EVERYWHERE once you train your mind. Anyways, great post!

Yes, it's unfair, but that seems to be the way nature works.

80% of whales upvote 20% of the krill

The problem with Steem's voting reward algorithm is not that 20% get 80% of the rewards, but it is that the selection of the 20% is done by 1%. And this is motivating the wrong behaviors and focus of content produced.

That would explain why 80 percent of the girls I talk to say "I have a boyfriend"

Haha, had to laugh hard at your comment @seasi06

What an interesting topic. Have heard and observed this phenomenon for years especially regarding volunteers in organizations like churches. Had no idea there was such a thing as Pareto's Principle or Zipf's Law. Thanks for the enlightenment. Have started following you also hoping for some more interesting reads like this.

There you go, even church volunteering follows this law.

Interesting, I had never thought about how easy it is to quantify language. This can help journalists.

How do you think it could help?
