Exploratory Keyword Analysis of the World Cup on the Steemit Blockchain

in #utopian-io

Repository:

https://github.com/steemit/steem

The World Cup is well under way. It started on 14th June, and we have now had 12 days of world-class soccer. With an event on such a global scale, you can hardly turn on the TV or read the news without it being mentioned somewhere.

Steemit is no exception to the current World Cup buzz. The World Cup has come up in conversation with many of the people I have engaged with recently. Personally, I am not a soccer fan and Ireland is not taking part, so I haven't been watching or paying any attention. But I did wonder how many accounts on Steemit are watching, and whether the World Cup has had any effect on posting activity on Steemit.

For this reason, I began an exploratory analysis of posts and comments on Steemit containing the keywords ‘football’, ‘soccer’, ‘worldcup’ and ‘world cup’.
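
To make the keyword filter concrete, here is a minimal Python sketch of testing a single post body for the four keywords. It is an approximation, not the SteemSQL query itself: SQL Server's full-text CONTAINS uses its own word breakers and rules, whereas this sketch assumes a simple case-insensitive whole-word regular expression. The name mentions_world_cup is hypothetical.

```python
import re

# "world\s*cup" covers both "worldcup" and "world cup"; the \b anchors
# require whole words, so "footballer" does not count as a match.
KEYWORD_PATTERN = re.compile(r"\b(?:world\s*cup|football|soccer)\b", re.IGNORECASE)


def mentions_world_cup(body: str) -> bool:
    """Return True if the text contains at least one of the keywords."""
    return KEYWORD_PATTERN.search(body) is not None


if __name__ == "__main__":
    samples = [
        "Who will win the World Cup?",
        "My #worldcup predictions",
        "Football is life",
        "Meet the footballer next door",
        "A post about baking bread",
    ]
    for text in samples:
        print(mentions_world_cup(text), "-", text)
```

A whole-word match is used because a plain substring test would also count words such as "footballer", which a phrase search for "football" would not.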

First, I had a look at Google Trends since the beginning of June.

 

Although the World Cup started on 14th June, searches on the topic did not peak until 17th June.

Next, I used data from the SteemSQL server. I have put the code at the end of this post so as not to distract from the analysis. However, as this is an exploratory analysis, I will describe the order in which I explored the data.

First, I looked at the number of posts and comments containing the words ‘football’, ‘soccer’, ‘worldcup’ and ‘world cup’ since the beginning of May.

 


Since the beginning of May, 113.19K posts and comments have been made by 25.08K authors.

In May, there were 11K posts and comments from 4.9K authors. That works out to an average of 356 a day.

For June so far (up to 26th June), there have been 62.71K posts and comments from 8.2K authors. That works out to an average of 2,550 a day.

That’s a massive month-on-month increase of 616% in the daily average.
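
The percentage increase is simply the change divided by the starting value. As a small Python sketch (the function name pct_change is illustrative), applied to the rounded figures in this post:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (positive means an increase)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


if __name__ == "__main__":
    # Average keyword posts and comments per day, May versus June.
    print(f"Daily average, May to June: {pct_change(356, 2550):.1f}%")
    # Level 1 keyword posts: 8.9K in May, 17.4K in June.
    print(f"Level 1 posts, May to June: {pct_change(8_900, 17_400):.1f}%")
```

This gives roughly 616.3% for the daily averages and 95.5% for the level 1 post counts. Because the inputs are rounded, the last digit can differ slightly from a calculation on the exact counts.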

The chart also shows (in red) that the number of unique authors posting on this topic increased from 4.9K in May to 8.2K in June.
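
The post does not show the DAX measures behind these monthly totals. As a sketch of the same idea in Python, assuming each row is one post or comment with an author and a created timestamp, this counts rows and distinct authors per calendar month. The function name and sample rows are hypothetical, not real Steemit data.

```python
from collections import defaultdict
from datetime import datetime


def monthly_summary(rows):
    """Count rows and distinct authors per calendar month.

    rows: iterable of (author, created) pairs, where created is a datetime.
    Returns {"YYYY-MM": (row_count, distinct_author_count)}.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, created in rows:
        key = created.strftime("%Y-%m")
        counts[key] += 1
        authors[key].add(author)  # a set keeps each author once per month
    return {k: (counts[k], len(authors[k])) for k in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    sample = [
        ("alice", datetime(2018, 5, 20, 9, 0)),
        ("bob", datetime(2018, 5, 21, 15, 30)),
        ("alice", datetime(2018, 6, 14, 15, 5)),
        ("alice", datetime(2018, 6, 15, 18, 45)),
        ("carol", datetime(2018, 6, 16, 10, 0)),
    ]
    print(monthly_summary(sample))
```

Counting distinct authors separately from rows is what allows posts to rise much faster than authors, as they do between May and June here.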

However, the World Cup did not kick off until 14th June. If we drill further into June, we can see that on the 13th the daily count jumped from 900 to 2,900 posts and comments, and it then jumped again on the 14th to 4.6K.

 

The above data covers both posts and comments. However, I also wanted to see how many level 1 posts (top-level posts, as opposed to comments) containing the keywords were made to Steemit.
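
Both queries select a depth column. In the Steem data model a top-level post has depth 0 and replies have depth 1 or more. Assuming the depth column follows that convention, separating level 1 posts from comments can be sketched in Python as follows. The function name and sample rows are illustrative.

```python
def split_by_depth(rows):
    """Split rows into top-level posts (depth 0) and comments (depth >= 1).

    rows: iterable of dicts with at least a "depth" key.
    """
    posts, comments = [], []
    for row in rows:
        (posts if row["depth"] == 0 else comments).append(row)
    return posts, comments


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = [
        {"permlink": "my-world-cup-predictions", "depth": 0},
        {"permlink": "re-my-world-cup-predictions", "depth": 1},
        {"permlink": "re-re-my-world-cup-predictions", "depth": 2},
    ]
    posts, comments = split_by_depth(sample)
    print(len(posts), "posts,", len(comments), "comments")
```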

 

Now we can see there were 8.9K level 1 posts in May, increasing to 17.4K in June, a 95% increase. We can deduce from this that most of the increase in activity was not in posts but in comments.

Let's take a look at the comments now.

   

Wow, 82% of all comments on the subject were made in June, a massive increase from May to June. It's also rather interesting that the number of authors doubled.

   

Above are ALL level 1 posts from May and June, not just those with the keywords. I took these values so I could do some quick calculations. From them, we can work out that in May 0.75% of all posts contained one or more of the keywords. In June, 2.26% of all posts made so far contain one or more of the keywords.

Next, I wanted to look at posting times, to see whether the World Cup was having an impact on when people post.

In the charts below, the top chart relates to ALL level 1 posts on Steemit and the bottom relates to posts with the keywords. Matches seem to begin as early as 10am UTC, with many starting at 3pm. In general, 3pm is the peak posting time on Steemit, and 3pm also shows up as a high posting time for posts that include the keywords. However, this snapshot covers the whole period from the start of the World Cup (14th June) to today (26th June).
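
The M queries split the created timestamp into a date and a time so posts can be grouped by hour. A minimal Python sketch of that hourly grouping follows, assuming the timestamps are already in UTC. The function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Count posts in each hour of the day (0-23).

    timestamps: iterable of naive datetimes, assumed to be in UTC.
    Hours with no posts are reported as 0 so all 24 buckets are present.
    """
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    sample = [datetime.fromisoformat(s) for s in (
        "2018-06-14T14:10:00",
        "2018-06-14T15:02:00",
        "2018-06-14T15:40:00",
    )]
    hourly = posts_per_hour(sample)
    print(hourly[14], hourly[15])  # 1 2
```

Filtering the timestamps to a single date first (for example, only the 14th) gives the per-day hourly profile used for the first match day.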

   

If we drill into the data for the 14th only, we can see that at 3pm the number of posts with the keywords dropped, but what is more interesting is that 3pm was not the peak posting time on this day. From the analysis I carry out each month (the post benchmarking report), I can confirm that 3pm has been the peak posting time every month since I started that analysis.

 

In general, Steemit sees a 5% increase in the number of posts between 3pm and 4pm compared with the previous hour. However, on the first day of the World Cup, Steemit saw a 4% reduction in posts for the same time period.
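
The comparison above is an hour-over-hour percentage change: posts in the 3pm to 4pm bucket against posts in the 2pm to 3pm bucket. A Python sketch, assuming a list of 24 hourly counts such as an hourly histogram would produce. The counts in the usage example are hypothetical, chosen only to reproduce a 5% rise and a 4% fall.

```python
def hour_over_hour_change(hourly, hour):
    """Percentage change in posts for `hour` versus the hour before it.

    hourly: list of 24 counts indexed by hour of day.
    hour: the hour of interest, 1-23 (15 means the 3pm-4pm bucket).
    """
    if not 1 <= hour <= 23:
        raise ValueError("hour must be between 1 and 23")
    previous, current = hourly[hour - 1], hourly[hour]
    if previous == 0:
        raise ValueError("previous hour has no posts; change is undefined")
    return (current - previous) / previous * 100


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    typical_day = [0] * 24
    typical_day[14], typical_day[15] = 1000, 1050
    match_day = [0] * 24
    match_day[14], match_day[15] = 1000, 960
    print(f"{hour_over_hour_change(typical_day, 15):+.1f}%")
    print(f"{hour_over_hour_change(match_day, 15):+.1f}%")
```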

The posts with the keywords also dropped when the match started.

Have a look at the charts below for the 15th, the top being Steemit in general and the bottom being posts with the keywords. Matches started on this day at 2pm, 3pm and 6pm. It's hard to tell from these visualisations whether these match times affected the number of posts on Steemit. However, on the 14th, when there was no match at 6pm, we can see an increase in the number of posts at that hour, whereas on the 15th there was a decrease.

 

I explored each of the other days as well (far too many charts to share, lol), and none showed the same pattern as the first World Cup match on the 14th.

I also wanted to see whether the World Cup has increased or decreased the overall number of posts made on the platform since it started.

Below, I have plotted the total number of level 1 posts per day since the start of May in grey, with a trend line showing that the average daily number of posts has been decreasing since May. In yellow, I have plotted the number of level 1 posts that contain one or more of the keywords, and its trend line is rising.

Looking at ALL posts below, we can see that from the 14th to the 19th the gap between the daily post count and the trend line is visibly larger than at any other point where the count falls below the line. This could indicate that the first five days of the World Cup distracted people from posting on Steemit. We can then see that on the 25th the number of daily posts broke above the trend line.
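
The trend lines in these charts were drawn in Power BI. As a sketch, assuming they are ordinary least-squares straight lines, the following Python fits such a line to a series of daily counts and reports how far each day sits from it. A negative residual means the day falls below the line, which is the comparison made for the 14th to the 19th. The function names are illustrative, and the usage series is hypothetical.

```python
def linear_trend(values):
    """Least-squares straight line through (day_index, value) points.

    day_index runs 0, 1, 2, ...; returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(values):
    """Distance of each day from the trend line (negative = below the line)."""
    slope, intercept = linear_trend(values)
    return [y - (slope * x + intercept) for x, y in enumerate(values)]


if __name__ == "__main__":
    # Hypothetical daily post counts for illustration only.
    daily_posts = [30, 29, 31, 28, 22, 21, 27, 28, 26]
    slope, _ = linear_trend(daily_posts)
    gaps = residuals(daily_posts)
    print(f"slope per day: {slope:.2f}")
    print("largest dip on day", gaps.index(min(gaps)))
```

A negative slope corresponds to the falling grey trend line and a positive slope to the rising yellow one. The size of each residual shows how unusual a given dip is compared with the rest of the period.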

     

Conclusion

It is very difficult to draw concrete conclusions from an exploratory analysis, as the aim is to highlight areas of interest that require further investigation. For example, it is impossible to tell whether the lower number of posts made between the 14th and the 19th had anything at all to do with the World Cup.

That said, I am fairly confident that the first World Cup match had a direct impact on posting activity on Steemit. With an event of such global interest, I would have been very surprised to find no impact and no additional interest in the keywords. And there certainly is interest, with a massive increase in posts using the keywords.

I found it very interesting that Google searches peaked on the 17th, whereas the number of posts on Steemit peaked on the 23rd.

Has the World Cup affected how often and when you post? Is your team winning? Please do comment below.

The Data and the queries

I used M (Power Query) in Power BI to connect to the data source, and then used DAX calculations to model the data in preparation for the visualisations above.

The M query used to get data on all posts and comments containing the keywords was:

let
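    // Posts and comments since 1 May 2018 containing any of the keywords.
    // The OR conditions are wrapped in parentheses so the date filter applies to
    // every keyword: AND binds tighter than OR, so without them only the first
    // CONTAINS would be restricted to dates from May onwards.
    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="Select Author, permlink, created, body_length, net_votes, children, depth#(lf)from Comments WITH (NOLOCK)#(lf)where Created >= CONVERT(DATE, '2018-05-01')#(lf)and (CONTAINS(body, '""worldcup""')#(lf)or CONTAINS(body, '""world cup""')#(lf)or CONTAINS(body, '""football""')#(lf)or CONTAINS(body, '""soccer""'))"]),
    // Convert created to text (en-IE culture) and split it at the space into date and time parts.
    #"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"created", type text}}, "en-IE"), "created", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"created.1", "created.2"}),
    // Type the two parts as date and time for the daily and hourly analysis.
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"created.1", type date}, {"created.2", type time}})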
in
    #"Changed Type"


The M query used to get data on all posts and comments (with or without the keywords) was:

let
    // All posts and comments since 1 May 2018, with no keyword filter.
    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="Select Author, permlink, created, body_length, net_votes, children, depth#(lf)from Comments WITH (NOLOCK)#(lf)where Created >= CONVERT(DATE, '2018-05-01')"]),
    #"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"created", type text}}, "en-IE"), "created", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"created.1", "created.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"created.2", type time}, {"created.1", type date}})
in
    #"Changed Type"


Hi @paulag, impressive work as always! I'm not a big soccer fan either, I haven't posted about it and it has no influence on my posting behavior. There are plenty of other things that keep me busy :)
It's quite interesting to see the total post counts in relation to the soccer related posts counts. I'm not fully sure if it's really the worldcup that affects the posting behavior or if there are other drivers. Posting count variations of that amplitude were there before as well. However the increase of posts about this topic cannot be denied. I guess we won't be able to know for sure.

From Utopian perspective, rewarding a Worldcup analysis is a bit difficult, because for the Steem open source project this represents a mostly social and behavioral analysis.

Your contribution has been evaluated according to Utopian policies and guidelines.


[utopian-moderator]

You have a minor misspelling in the following sentence:

Although the world cup started in the 14th June, search on the topic did not peak untill 17th June.
It should be until instead of untill.

🙄 well, nearly impossible to understand the sense of the sentence without your help... 😊 @peekbit

Amazing to see how well your knowledge of the English language is for someone from Germany!

😅 👍🏼

You wrote:
Since the beginning of May 113.19 posts
Should May not be replaced by 2018?

On Google people search before and after the match. On Steemit we mainly write posts after the match. And the number of comments might peak during and also after the match. Just guessing to explain the Google vs Steemit difference.....

Very interesting, and professional, I guess... how consequential however, I really dunno...
