New Steemit User Analysis December Report

in #utopian-io, 7 years ago (edited)

What a year 2017 was on Steemit. One heck of a journey that started for me in June. I guess this report ends the year on a high. I can’t wait to show you the data.

Before that, a reminder that this analysis is prepared on a monthly basis. If you missed last month’s report, you can find it here:
https://steemit.com/utopian-io/@paulag/new-steemit-users-analysis-oct-and-nov-report-steemit-business-intelligence

This month, however, I have taken the analysis a step further. This report also covers new accounts whose numbers suggest suspicious activity, and it looks at drop-off in voting.

Datasource and queries

I connected to SteemSQL, which is hosted and managed by @arcange, using Power BI.
The SQL query used for December was:

 -- New accounts created in December 2017 (style 101 = mm/dd/yyyy)
 SELECT *
 FROM Accounts WITH (NOLOCK)
 WHERE created >= CONVERT(datetime, '12/01/2017', 101)
   AND created <  CONVERT(datetime, '01/01/2018', 101)

After collecting the data, I used DAX (Data Analysis Expressions) to carry out calculations on it, and the results are presented in the visualizations below.

The analysis covers December, using November as the comparable period, and includes analysis not shown in previous months' reports.

November Report

4.png

December Report
5.png

Summary

In November, 40,788 new accounts were registered with Steemit. In December this value jumped 81% to 73,793. This is by far the largest jump since I began tracking this data. The total number of posts by new accounts also increased, as you would expect with a rise in the number of new accounts. It was up 57%.
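As a quick check on that headline figure, here is a minimal Python sketch of the month-on-month change, using only the two account totals quoted above. The function and variable names are illustrative and are not taken from the Power BI report.

```python
def pct_change(previous: int, current: int) -> float:
    """Percentage change from the previous period to the current one."""
    return (current - previous) / previous * 100


# Account totals quoted in the summary above
november_accounts = 40_788
december_accounts = 73_793

growth = pct_change(november_accounts, december_accounts)
print(f"New accounts grew by {growth:.1f}% from November to December")
```

Rounded to the nearest whole number, this matches the 81% reported.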

Other important stats include the % of accounts with an external link and the % with the ‘about’ section of the profile completed. Neither has changed much since November, although both percentages are slightly down.

What is also interesting is that only 39.81% of new accounts, so a minority, have posted. This leaves a substantial number of new accounts that have not yet posted. This can be seen more clearly in the pie chart below.

6.png

7K new accounts had only 1 post, 3.5K had 2 posts and 2.81K had 3 posts.
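To put the percentages into account numbers, the sketch below converts the 39.81% figure into approximate counts, and expresses the 1, 2 and 3 post groups as a share of all new December accounts. It assumes the rounded figures quoted in this report, so the results are approximate, and the variable names are illustrative only.

```python
december_accounts = 73_793   # new accounts in December (from the summary)
posted_share = 39.81         # % of new accounts that have posted

# Approximate number of accounts that have and have not posted
posted = round(december_accounts * posted_share / 100)
not_posted = december_accounts - posted
print(f"Posted: {posted:,}  Not yet posted: {not_posted:,}")

# Rounded group sizes quoted above (7K, 3.5K, 2.81K)
accounts_by_post_count = {1: 7_000, 2: 3_500, 3: 2_810}
for post_count, accounts in accounts_by_post_count.items():
    share = accounts / december_accounts * 100
    print(f"{post_count} post(s): {accounts:,} accounts, {share:.1f}% of new accounts")
```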

Strange Activity

16.82% of new accounts had 10 or more posts. It is really good to see new Steemians engaging with the platform and posting. However, when looking at the data I came across many accounts set up in December that have posted or commented more than 1000 times.

This seems rather excessive. I struggle to get 1 data post done a day, and I know from talking with writers that good form content can take a number of hours to write and post.

What I will say is that I have not checked the integrity of these accounts or posts, but I really don’t see how one account can have added 4416 good or valuable posts in 7 days, as @nijeah has. That is one of the accounts I actually did look at. I would suggest you take a look yourself and decide.
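The kind of filter described in this section can be sketched in Python as follows. This is only an illustration: the 1000 threshold and the 4416 posts in 7 days come from the text above, while the class, the function, and the two "hypothetical" accounts are made up for the example. The actual analysis was done in Power BI with DAX.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    name: str
    posts_and_comments: int
    days_active: int

    @property
    def per_day(self) -> float:
        """Average posts and comments per day since the account was created."""
        return self.posts_and_comments / self.days_active


def flag_high_volume(accounts, min_total=1000):
    """Return accounts with more than min_total posts and comments."""
    return [a for a in accounts if a.posts_and_comments > min_total]


accounts = [
    AccountActivity("nijeah", 4416, 7),            # figures quoted in this post
    AccountActivity("hypothetical-a", 12, 20),     # made-up example data
    AccountActivity("hypothetical-b", 1200, 25),   # made-up example data
]

flagged = flag_high_volume(accounts)
for account in flagged:
    print(f"{account.name}: {account.posts_and_comments} total, "
          f"about {round(account.per_day)} per day")
```

A high count on its own only marks an account for a closer look. It says nothing about the quality of the content.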

7.png

Having said that, I am tagging @steemcleaners so they can compare lists :-)

Additional Data

The visualization below shows the % of accounts registered in December that have not voted since the given date, and also the cumulative %.
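One plausible way to build a chart like this is sketched below in Python. It assumes each account has a date of last vote, groups accounts by that date, and then accumulates the percentages. The dates are hypothetical sample data, the function name is illustrative, and accounts that have never voted would need separate handling. The report itself was produced with DAX, so treat this only as an outline of the idea.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Hypothetical last-vote dates for four accounts (not real data)
last_vote_dates = [
    date(2017, 12, 5),
    date(2017, 12, 5),
    date(2017, 12, 20),
    date(2017, 12, 31),
]


def vote_drop_off(last_votes):
    """Return (date, % of accounts whose last vote was that day, cumulative %)."""
    total = len(last_votes)
    counts = Counter(last_votes)
    days = sorted(counts)
    pct = [counts[day] / total * 100 for day in days]
    cumulative = list(accumulate(pct))
    return list(zip(days, pct, cumulative))


rows = vote_drop_off(last_vote_dates)
for day, pct, cumulative in rows:
    print(f"{day}: {pct:.1f}% last voted this day, {cumulative:.1f}% cumulative")
```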

9.png

I am part of a Blockchain Business Intelligence community. We all post under the tag #BlockchainBI. If you have an analysis you would like carried out on Steemit or blockchain data, please do contact me or any of the #BlockchainBI team and we will do our best to help you.

You can find #BlockchainBI on discord https://discordapp.com/invite/JN7Yv7j



Posted on Utopian.io - Rewarding Open Source Contributors


As someone else pointed out, it seems fairly clear that some of these accounts with a high rate of posting are just bot front ends which are transferring posts that exist elsewhere onto the blockchain.

Now, 90% of the time that's going to be for illicit purposes: either duplicating the postings of another account on the steem blockchain to create confusion about who the real content creator is, or just creating/copying crap content from elsewhere in order to drive up saturation of visibility on things like New, etc.

Maybe more than 90%. Maybe even 99%.

But I can technically imagine that there are, or could be, good uses for a bot that does this sort of thing. For instance, if I wanted to migrate the entirety of my pre-existing blog to the steem blockchain because I believe that it's a long-term solution for online storage and blogging – that might be a defensible position. (One could immediately counter that dumping the entire content of said blog to the system in one go is incredibly wasteful and trickling it out over a few posts a day would be far more useful. And I would say that that person was exactly correct. Stupid use can still be legitimate use, however.)

To me, this seems like a fine use of @cheetah for determining whether legitimate content is being duplicated, and specifically targeting these kinds of accounts for examination might be ideal.

Which just fills me with a certain amount of amusement that the best tool to use against potential bots is, in fact, another bot.

Determining which, if any, of these high rate bots are theoretically legitimate… That's a much harder problem. Step one is just trying to differentiate and localize them.

Hi, interesting points - if it is bots just duplicating, wouldn't they be automatically caught by @cheetah?

I've had the same thought about migrating my blog over (I assume yours is WordPress too), but because I can't refine or structure through pages this platform isn't good for me on that front!

Nice comment on the irony, that's mainly why I gave you my 100% upvote, now worth twice as much as it was earlier this week, which is something I'm sure we're all enjoying!

I'm not sure that @paulag has time to reply to comments btw - not surprising given the amount of work that goes into producing these reports!

You would think they would be, but since we have empirical proof by observation that they aren't, evidently it can't catch them all.

In part that might just be because it would be computationally expensive to make that broad a set of comparisons. But I don't have enough of the inside skinny on how much in terms of resources something like @cheetah consumes on a regular basis. From my own experience with sliding-window N-gram lexicographical analysis, it can get surprisingly hefty in memory requirements really quickly.

My original blog platforms span a number of systems, which is one of the reasons that it might be a good idea to migrate things to a singular point at some point in the future. I'm not sure that any blockchain is a legitimate target of that; it's largely historical content and there aren't any blockchain-backed systems which really are interested in long-term document storage and discovery.

@paulag and I have exchanged pleasant words in the past, so I know she occasionally has time to get around to dealing with comments and enjoys doing so, so I'm not overly concerned on that point.

One day we will see a platform which is intended to reward evergreen content, have some sort of basal protection from content flooding, and go out of its way to cultivate creators in a relatively managed environment – but today is not that day, and we have what we have.

Yes, I do reply as much as I can.

When I started on Steemit I also copied one of my own blogs and was caught by cheetah. I would like to move my blog and community over, but that is only one of the reasons I have not yet done so.

I don't mind people using their own content if it is good content. That's just my opinion, but I didn't do it a second time.

But if you are taking content that is not yours then it's different. Also, if the content adds no value then there is a problem too.

My personal feeling is that it is probably better not to migrate a blog with content that people are already invested in, in part because that information is already out there in the wild – and in part because getting a community to move is really about providing them a reason to follow you to get new stuff.

If you stop blogging somewhere and start blogging somewhere else, people will follow you. If you move blog content from one place to another, the community that you already have has no reason to leave – you're not really producing more stuff they haven't seen, and you have problems building a new community because it's content that they could've already had and might have already found elsewhere.

The ideal situation for someone with an established community is to post wherever they find to be the best fit for what they want to achieve, and then link to that content on other social media platforms where people can find them.

In my case, it means that my digital existence on social media exists on Twitter, Facebook, Google+ – but all of those points of presence are connected to wherever I currently happen to be blogging, whether that be on Medium, Blogger, or Steemit. I think that might be the best approach both philosophically and mechanically to dealing with the problem, such as it is.

I basically agree with everything you say - or as they put it up north 'It is what it is'!

I think this might actually deserve a larger blog post for me talking about points of presence versus blogging platforms, and why you want to keep up connectivity with multiple social media networks.

Thanks! I think you actually gave me today's writing topic, and hopefully it turns out to be okay.

Glad to have been some use, I think it would be a good post! I can't wait to quit work in July and take everything online full-on, in the meantime my challenge is to avoid spreading myself too thin!

I'll look forward to reading that post.

I got caught out twice - the second time after I'd deleted all of the content from one of my posts, but not the title (keeping a link in the post to here) - now THAT annoyed me.

I've since concluded the material on my WP blog is much better off there, and I do quite well out of the ad revenue and resource sales, so it doesn't make sense to migrate... cheers for the reply!

Hello Paula.

I consider my content to be good, and I always write articles in excess of 600 words.

I am a fan of long-form content and I registered 21 days ago, so in December.

My current post count is well above 1000 by this point, and I am by no means a spammer.

So I think those numbers need to be raised a little before they count as suspicious.

Cheers

Just checked it again and in fact it is 1800, so closer to 2000

I am glad you agree, and that you took the time to back it up. Thank you.

Hi @paulag, great analysis. I want to learn how to start doing analysis using the Steem database and need your help. I'm also a contributor to utopian-io tutorials and video tutorials, and now I want to learn something new from you. I hope you will help me :)

You should join the BI community and the utopian community on discord. You will find me over there a lot.

Ok ma'am :) I joined the utopian community

Hey @paulag I am @utopian-io. I have just upvoted you!

Achievements

  • WOW WOW WOW People loved what you did here. GREAT JOB!
  • Seems like you contribute quite often. AMAZING!

Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. Participate on Discord. Let's GROW TOGETHER!


Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x

It's been a big year, and we're hoping for more in what lies ahead... full steem ahead @paulag

81% jump ! can't imagine all the good things 2018 will bring !

ty for your report

Thank you for the contribution. It has been approved.

The DAX work is really nice to have as part of this analysis.

Thanks!

You can contact us on Discord.
[utopian-moderator]

DAX rocks

Your analyses are brilliant. The future of crypto is bright. The good news is that STEEM went up by almost 72% in value by the start of the new year. STEEM has reached $6.50 and SBD has reached $10.50 currently.

...good form content can take a number of hours to write and post... yes, absolutely
