[Analysis] May need vote dust threshold again

in #utopian-io, 5 years ago (edited)

Repository

https://github.com/steemit/steem

Introduction

HF20 removed the so-called "vote dust threshold" for a better user experience.

"If a vote is placed that is below the required threshold, it will be rejected by the blockchain. This can create a bad user experience for new users, as their votes can fail for seemingly no reason. ...

In hardfork 20, this “vote dust threshold” will be removed. After this change users with any amount of SP will be able to cast votes so long as they have sufficient bandwidth. Votes that are below the threshold will be posted to the blockchain but will have no impact on rewards. This will allow users to have a better user experience on all Steem-based applications by enabling them to vote whenever they want to (as long as they don’t exceed their generous bandwidth allocation), without adding to the computational load on the blockchain by requiring that it calculate the impact of effectively powerless votes on the rewards pool. ..."
https://steemit.com/steem/@steemitblog/hardfork-20-velocity-development-update



But we now all know that the really bad user experience is not being able to write.

(A typical user with the generous starting 15 SP can make only 1 comment, but 20 votes.)


  • A comment requires far more RC (about 14 times) than a vote. So even if new users can vote, voting only makes them wait longer before they can write something.
  • If they figure out that their vote doesn't count at all, it may be an even worse user experience for both voters and authors.

(A post receiving many dust votes. Would you like this if you were the author? Probably not.)



Moreover, dust votes also put a burden on the Steem blockchain, and they can be a low-hanging attack vector. Suppose tons of accounts cast dust votes. (Currently even an account with 0 SP can vote.) Then an entire block could be filled with dust votes. The vote dust threshold may need to be reintroduced for both user experience and blockchain health.

Scope

  • I analyzed the fine details of votes on "almost" all posts from 8-15-2018 to 12-14-2018 (HF20 took place on Sep 25/26).
  • Total: 1.1 million posts
    • 675K posts after HF20 that have valid voting data (out of 957K posts in total, see below)
    • 434K posts before HF20 that have valid voting data (out of 806K posts in total, see below)
  • Total: 62 million votes (38 million after HF20)

Why "almost": this kind of analysis cannot (and, in my opinion, should not) be done with the Steem API. Unfortunately, the active_votes field in @steemsql's Comments table is sometimes still pending, which means the voting data is not finalized, so I filtered those posts out. (Note that TxVotes cannot be used since it doesn't have rshares information.) Since plenty of posts remain, the filtering effectively provided natural random sampling with 55~70% coverage, which is far more than enough. If you want to repeat this analysis, you'll definitely want a much smaller sample: parsing active_votes and categorizing each vote takes quite a while.

Technical Background

This analysis needs some technical background. Obviously, what we need to know is how to identify a dust vote. You may think you know how, but you may be missing something.

Dust vote: a vote with fewer than 50,000,000 rshares (defined as STEEM_VOTE_DUST_THRESHOLD in the code linked below)

How are dust votes handled?

The vote is still accepted, but the threshold is deducted from every vote's rshares. Thus, a vote below the threshold ends up with zero rshares.

Here is the code that removes dust votes:

https://github.com/steemit/steem/blob/c6b865b6f27999cba38f3840151c8306f14f3371/libraries/chain/steem_evaluator.cpp#L1419
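The deduction described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the actual C++ evaluator: the function name `effective_rshares` is hypothetical, and applying the same deduction to the magnitude of a downvote is my assumption.

```python
STEEM_VOTE_DUST_THRESHOLD = 50_000_000  # value quoted in this post


def effective_rshares(raw_rshares: int) -> int:
    """Rshares left after the dust deduction (post-HF20 rule, as a sketch)."""
    # Deduct the threshold from the vote's magnitude and floor at zero.
    magnitude = max(0, abs(raw_rshares) - STEEM_VOTE_DUST_THRESHOLD)
    # Assumption: the sign of the vote is restored after the deduction.
    return magnitude if raw_rshares >= 0 else -magnitude


print(effective_rshares(30_000_000))   # a dust vote: nothing is left
print(effective_rshares(200_000_000))  # a larger vote loses only the threshold
```

Note that under this rule every vote pays the threshold, not only the dust ones, which is why small votes shrink noticeably.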

How to find dust votes? Be careful. It's not just any vote that has 0 rshares.

It's tempting to look for all votes that have zero rshares, but this is wrong. Any late vote cast after payout also has zero rshares! So the two need to be separated. I also counted downvotes, which have negative rshares, separately. The entire process takes a lot of time, mainly because active_votes is a text field in steemsql. So you first need to parse it into JSON, and then, to determine whether a zero-rshares vote is dust or late, compare the post's creation time (created) with the voting time (time in active_votes), which is again a string.
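To make the separation concrete, here is a minimal sketch that labels a single vote for the post-HF20 period. The sample votes, the voter names, and the helper `classify_vote` are hypothetical, and the time format (`YYYY-MM-DDTHH:MM:SS` without a timezone) is an assumption about how active_votes stores timestamps.

```python
import json
from collections import Counter
from datetime import datetime, timedelta

PAYOUT_WINDOW = timedelta(days=7)  # payout period used in this analysis


def classify_vote(vote: dict, created: datetime) -> str:
    """Label one active_votes entry as 'late', 'down', 'dust', or 'normal'."""
    # Assumed time format: ISO-like string without a timezone.
    voted_at = datetime.strptime(vote["time"], "%Y-%m-%dT%H:%M:%S")
    if voted_at - created > PAYOUT_WINDOW:
        return "late"  # cast after payout, so it has zero rshares anyway
    rshares = int(vote["rshares"])
    if rshares < 0:
        return "down"
    if rshares == 0:
        return "dust"  # on time, yet nothing left after the threshold deduction
    return "normal"


# Hypothetical active_votes text for a post created on 2018-10-01 12:00:00.
created = datetime(2018, 10, 1, 12, 0, 0)
active_votes = json.dumps([
    {"voter": "alice", "rshares": "0", "time": "2018-10-02T10:00:00"},
    {"voter": "bob", "rshares": "0", "time": "2018-10-20T08:00:00"},
    {"voter": "carol", "rshares": "-1200000000", "time": "2018-10-03T09:30:00"},
    {"voter": "dave", "rshares": "350000000", "time": "2018-10-01T12:05:00"},
])

counts = Counter(classify_vote(v, created) for v in json.loads(active_votes))
print(counts)
```

The late check comes first on purpose: alice and bob both have zero rshares, and only the timestamps tell the dust vote from the late one.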

Results

Dust votes are increasing

  • Dust votes are increasing.
  • Late votes and down votes show no clear trend.

Dust votes are clearly increasing. You may think it's because the total number of votes is increasing. If you think so, you're not an active Steemit user :) Where have you been?

The ratio of dust votes is also increasing.

  • The dust vote ratio is increasing. Over the last 2.5 months, it increased by 150% (from 0.5% to 1.25%).
    The total number of votes (red, on the right axis) is unfortunately decreasing, as you can feel if you're an active user. As a result, the ratio of dust votes to total votes is increasing as well.
  • Right after HF20, the dust and late vote ratios are quite high.
    Note that the spikes in the dust and late vote ratios are not errors. Come on, have you already forgotten that interesting moment when even dolphins couldn't write and most users couldn't even vote? Even when they could vote, far more votes than usual were dust because of the voting power reset. And many people voted for posts they liked, but too late.
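The 150% figure is a relative change between the two ratios, which can be checked with a short calculation. The two ratios come from the text above; the vote counts passed to `dust_ratio` are made-up numbers chosen only to reproduce those ratios, and both helper names are hypothetical.

```python
def dust_ratio(num_dust: int, num_votes: int) -> float:
    """Share of dust votes among all votes (0.0 when there are no votes)."""
    return num_dust / num_votes if num_votes else 0.0


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, e.g. 1.5 means +150%."""
    return (new - old) / old


# Ratios quoted in the text: 0.5% at the start, 1.25% at the end.
change = relative_change(0.005, 0.0125)
print(f"{change:.0%}")

# Hypothetical counts: more dust votes and fewer total votes both push the ratio up.
print(dust_ratio(5_000, 1_000_000))
print(dust_ratio(7_500, 600_000))
```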

Wait a sec, how was it before HF20?

Very good question! First of all, the vote dust threshold was still in place then, so any vote below the threshold wasn't even accepted, i.e., it is not on the blockchain. Thus, strictly speaking, we cannot compare the two periods in the same way.

But don't give up. One thing we can do is to find votes whose rshares are below the threshold, STEEM_VOTE_DUST_THRESHOLD (50,000,000). What's the problem with this? Obviously we cannot help but underestimate dust votes, since many dust votes must have been rejected. But at least it helps us see the trend.
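As a rough sketch of this pre-HF20 proxy, the function below counts upvotes whose rshares do not exceed a given threshold, so the same data can be checked against several thresholds. The name `count_near_dust` and the sample rshares values are hypothetical.

```python
STEEM_VOTE_DUST_THRESHOLD = 50_000_000  # value quoted in this post


def count_near_dust(rshares_list, threshold=STEEM_VOTE_DUST_THRESHOLD):
    """Count non-negative votes whose rshares are at or below the threshold."""
    # Downvotes (negative rshares) are counted separately, so skip them here.
    return sum(1 for r in rshares_list if 0 <= r <= threshold)


# Hypothetical rshares of votes on pre-HF20 posts (late votes already removed).
sample = [0, 10_000_000, 50_000_000, 60_000_000, -5_000_000, 900_000_000]

for threshold in (50_000_000, 100_000_000, 200_000_000):
    print(threshold, count_near_dust(sample, threshold))
```

Keeping the threshold as a parameter makes it cheap to check whether a trend depends on the exact cut-off.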

  • The trend of dust votes clearly changed after HF20.

Again, the small number of dust votes itself could be mainly due to the threshold, so let me explain further with the ratio graph.

  • The dust vote ratio was quite stable before HF20 but increases after HF20.
    Although a fair comparison is impossible, one thing is for sure: the ratio was stable before HF20. How many votes were rejected due to the threshold? We don't know, since they are not on the blockchain, but I believe there were not that many. I also tried a higher threshold, but it doesn't change the trend of the ratio before HF20.

  • The down vote and late vote ratios are pretty stable before and after HF20.
    This is because there was no policy change affecting them.

  • The graph clearly shows the downtime at HF20 and another one before HF20.
    The absence of a dust vote ratio spike during the first downtime makes perfect sense, since, unlike at HF20, there was no voting power reset. So there is only a late vote ratio spike. The reasons behind the spikes were already explained with the post-HF20 graphs.

Conclusion

Contrary to its original intention, the removal of the vote dust threshold doesn't seem to make for a better user experience. Once users know that a dust vote doesn't increase any payout but still uses voting power and RC, both voters and authors will have a bad user experience. Dust votes have been increasing since HF20, and this may become a real concern for the Steem blockchain in the future. Therefore, the vote dust threshold may need to be reintroduced for both user experience and blockchain health.

"I can't write!"

"You can still vote."

"Thanks! Just voted your kind reply."

"Gotcha! You now wait even more to write."

"???"

Tools and Scripts

SELECT author, permlink, total_payout_value, CAST(active_votes AS TEXT) AS active_votes, created
FROM Comments
WHERE DEPTH = 0
    AND created < last_payout
    AND created BETWEEN '2018-9-26' AND '2018-12-14'
ORDER BY created ASC

You may want to split the time range into smaller pieces, query and process them separately, and then combine the results. (It's a huge amount of data.) Note that 2018-09-26 is the date of HF20.
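One way to split the range is to generate consecutive date windows and run the query once per window. The sketch below is only a suggestion: `date_chunks` and `where_clause` are hypothetical helpers, and the weekly step is an arbitrary choice.

```python
from datetime import date, timedelta


def date_chunks(start: date, end: date, days: int = 7):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    step = timedelta(days=days)
    current = start
    while current < end:
        nxt = min(current + step, end)  # the last chunk may be shorter
        yield current, nxt
        current = nxt


def where_clause(chunk_start: date, chunk_end: date) -> str:
    """Half-open date filter, so neighbouring chunks never share a row."""
    return f"created >= '{chunk_start}' AND created < '{chunk_end}'"


chunks = list(date_chunks(date(2018, 9, 26), date(2018, 12, 14)))
for chunk_start, chunk_end in chunks[:2]:
    print(where_clause(chunk_start, chunk_end))
```

The half-open filter is used instead of BETWEEN because BETWEEN includes both end points, which would count posts created exactly on a boundary twice when the chunks are combined.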

  • Python: analyze active_votes
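import json

import swifter  # registers the .swifter accessor on pandas DataFrames
from dateutil.parser import parse

PAYOUT_SECONDS = 7 * 24 * 60 * 60  # votes cast after the 7-day payout are "late"

def categorize_vote(x):
    # active_votes is stored as text, so parse it into a list of votes first
    votes = json.loads(x['active_votes'])

    x['num_votes'] = len(votes)
    num_dustvotes = 0
    num_downvotes = 0
    num_latevotes = 0

    for v in votes:
        # Late votes also have zero rshares, so separate them before the dust check
        ts_voted = parse(v['time'])
        if (ts_voted - x['created']).total_seconds() > PAYOUT_SECONDS:
            num_latevotes += 1
            continue

        rshares = int(v['rshares'])
        if rshares < 0:
            num_downvotes += 1
        # Pre-HF20 version. For post-HF20 data, change the condition to rshares == 0.
        # It is better to run the two periods separately; the data is too big otherwise.
        elif rshares <= 50000000:
            num_dustvotes += 1

    # Store the counts once per post, after the loop, so posts without votes get zeros
    x['num_dustvotes'] = num_dustvotes
    x['num_downvotes'] = num_downvotes
    x['num_latevotes'] = num_latevotes
    return x

# df is assumed to hold the query result, indexed by the post's created time
df['created'] = df.index
df = df.swifter.apply(categorize_vote, axis=1)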

Note that this is a large data set, so you may want to use some parallelization. In my case, swifter worked well. It's simple to use: just put swifter between df and apply, that's all :)

Relevant Links and Resources

Original intention of removing vote dust threshold

Two posts that are useful to understand rshares better (yes, they're mine :)

Proof of Authorship

While this is my first utopian article using large-scale data (I finally opened the gates of hell and got @steemsql access :), here are my other recent utopian articles in addition to the two others.

ps. The reviewer @abh12345 let me know there was a prior analysis of dust votes; as far as I can tell, it is this one:

I really like @crokkon's posts. My main contribution is showing the increasing trend and explaining it in terms of UX and vulnerability. I hope Steemit will think about this problem seriously when they have more resources :) Thanks.


Hi @blockchainstudio

Thank you for your contribution.

An interesting topic, and one covered a couple of times in recent months. This is another worthy addition, and I appreciate the time and additional steps undertaken to weed out late votes and down-votes.

The line relating to 'improved user experience' is an interesting one, and it's debatable as to which users are being discussed here. Clearly the change doesn't help new users interact, but existing users are finding far less spam comments.

Is the threshold too low / Is this stopping growth of active/engaged accounts on the Steem blockchain? Tough to say.

In light of these changes and the planned move to RocksDB, it's clear that growth/hardware is a factor, and perhaps Steemit inc decided they didn't want to waste resources on 'meaningless' votes/comments?

Thanks again

Asher [CM - Analysis]

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.


Hi @abh12345, thanks a lot for your review and comments. Alas, I didn't know that there was prior research on this topic. Thank you for letting me know. Is there a UI or an easy way to see the titles of, or search, utopian posts only? I started with Utopian very recently, and the weekly reports seem to be posted only on individual accounts, with varying authors, so sometimes it's hard to find even the weekly report.

Anyway, I googled it and found https://steemit.com/utopian-io/@crokkon/don-t-cast-worthless-votes-zero-value-votes-in-hf20-1540641228665 Haha, it's quite interesting that @crokkon was my usual utopian reviewer and I didn't know about this article. (Sorry @crokkon. You know I started very recently, and I really didn't expect anyone else to be interested in dust :)

But I'd like to emphasize that my main contribution is showing the trend that dust votes are increasing. Let me know if any article has already pointed this out. Since this is a very recent trend, a post older than a few months can't have covered it. To me it's worrisome, considering that there are many dormant accounts that could potentially be used for dust votes in the future.

Regarding your comments,

The line relating to 'improved user experience' is an interesting one, and it's debatable as to which users are being discussed here. Clearly the change doesn't help new users interact, but existing users are finding far less spam comments.

But "spam comments" are unrelated to dust votes. My point is more that we're getting dust vote spam :) I guess no user likes it. Maybe some are fine with it, since having too few votes doesn't look good. But I'm saying that there is now more and more of it.

Is the threshold too low / Is this stopping growth of active/engaged accounts on the Steem blockchain? Tough to say.

I'm not talking about the level of the threshold itself. I agree that it's very hard to determine (like what % of self-voting is proper, for instance). I'm talking about the way we should handle dust votes. I think they shouldn't be accepted on the blockchain, or at least we need to come up with another way. If there is no good idea, rejecting them is better than the current approach, in my opinion.

And I agree with you that Steemit should focus on cost reduction. You're right, they don't have resources to spend on dust :) Attackers also don't care for now, since Steemit is small and no short position is possible. I hope Steem will survive and prosper :)

Thank you so much for your feedback again. Have a nice day!

Thank you for your review, @abh12345! Keep up the good work!

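Summary (translated from Korean): Hmm, this is exactly why I didn't want to sign up for steemsql :( I didn't originally plan to start with this post, but it was a bit easier to write and I have material for two or three more posts in a series, so I went ahead. From an ordinary user's point of view this isn't something to worry much about, and you won't gain or lose much whether you know it or not, so unless you're really curious you can just quietly pass by :)

Since HF20, votes below a certain threshold silently disappear. This is a different concept from the Gomdori dust payout: with dust votes, each individual vote can vanish. (In fact, this is almost certainly also the reason the votes shrank little by little during Gomdori's chain reaction last time; I'll write about that separately when I get a chance.)

Anyway, the original purpose was to improve UX (since it feels bad when you can't even vote). But if you can't vote, you can write even less, and once you know the vote is actually worthless, neither the voter (whose voting power still drops, if only slightly; of course, with small SP the proportion isn't that small) nor the recipient has any reason to feel good. On top of that, while users, posts, and the total number of votes have all been decreasing lately, only these mechanical votes that are also meaningless in terms of rewards (mechanical votes that do matter for rewards are always welcome, haha) keep increasing, and in the end this could become a long-term threat to the blockchain as well. Steem should count itself lucky that short positions aren't possible. Steem depends so heavily on Steemit that it is vulnerable to attack, and dust votes like these could become a big long-term threat too (it's just that Steem isn't very popular right now and can't be shorted).

Isn't it worse not to be able to write? As in the English joke above: you can't write but you can vote, so you vote, and then you have to wait even longer to write. Does that improve the user experience? Haha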


@blockchainstudio, Gomdori is leaving you a 6.8x upvote~! Heave-ho~

Upvoted in response to the jjangjjangman call.

Hi @blockchainstudio!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

Congratulations! Your post has been selected as a daily Steemit truffle! It is listed on rank 3 of all contributions awarded today. You can find the TOP DAILY TRUFFLE PICKS HERE.

I upvoted your contribution because to my mind your post is at least 9 SBD worth and should receive 153 votes. It's now up to the lovely Steemit community to make this come true.

I am TrufflePig, an Artificial Intelligence Bot that helps minnows and content curators using Machine Learning. If you are curious how I select content, you can find an explanation here!

Have a nice day and sincerely yours,
trufflepig

Thanks :) Wow, rank 3 is quite generous. But I don't think I'll get 153 votes unless you do. As for 9 SBD, we'll see. You should train your algorithm more :) Anyway, this is very interesting, though I'm not sure if you check this comment.

Hey, @blockchainstudio!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!
