[Steem] Visualizing Vote Histories on the Blockchain

in #steem, 6 years ago (edited)

(Full Jupyter Notebook version available on GitHub.)

Okay, so it's not more role-playing games. It's more looking at the steem blockchain and building graphs.

I can't exactly tell you why I have been fascinated by this lately, except to say that it's been a long time since I cut any significant amount of code and I don't see a lot of people approaching the issue by looking at relationships versus hard numbers. That's what I find interesting.

This is another of those "you're going to see all the code" Jupyter Notebook deals, so if you're just here for the pretty pictures, brace yourself and prepare to hum through the code-bits.

Yesterday's analysis gave us a pretty interesting set of things to look at. We were just looking at literal financial transfers on the blockchain and purely the directionality of those things. Today I think we're going to look at a different kind of transaction: votes which involve the top 200 most powerful accounts by SP on the blockchain.

I keep going back to that set because they represent way more than 50% of all active power by SP. As such, collectively, they have the power to redirect more than 50% of the rewards pool. Even allowing for the fact that some of these accounts are sitting around doing nothing, the amount of pooled SP is enormous. Each of those theoretical votes is enormous.

Let's start by building a new database extraction of those top 200 accounts, and try to do so with a more efficient database query.

Building the New Query

# Setting up the imports for our basic query tools

from steemdata import SteemData
import datetime
from datetime import datetime as dt

from pprint import pprint
# Init connection to database

db = SteemData()
# Just a list of all accounts over a value of 10,000 vests. 
#   That should neatly cut out most of the results.

query = {'vesting_shares.amount': {'$gte': 10000}}
# Projections define what fields we'll be returning 
#   from the query. We want the total number of vests, 
#   descending. We DON'T want the hashed ID.

proj = {'name': 1,
        'vesting_shares.amount': 1,
        '_id': 0}
# This is a different looking piece of code than before 
#   because we're integrating the sorting from highest 
#   to lowest in the query itself. Then we're limiting 
#   it straight off the bat to the accounts we want.

result = db.Accounts.find(query,
                          projection=proj,
                          sort=[('vesting_shares.amount', -1)],
                          limit=50)

What's that look like, I wonder?

# We need to turn the result into a list so we can work
#   with it.

resL = list(result)
pprint(resL[:10])

len(resL)
[{'name': 'steemit', 'vesting_shares': {'amount': 90039851836.6897}},
 {'name': 'misterdelegation', 'vesting_shares': {'amount': 33854469950.665653}},
 {'name': 'steem', 'vesting_shares': {'amount': 21249773925.079193}},
 {'name': 'freedom', 'vesting_shares': {'amount': 15607962233.428}},
 {'name': 'blocktrades', 'vesting_shares': {'amount': 9497442946.221754}},
 {'name': 'ned', 'vesting_shares': {'amount': 7344140982.676874}},
 {'name': 'mottler', 'vesting_shares': {'amount': 4617011297.0}},
 {'name': 'databass', 'vesting_shares': {'amount': 3500010180.297931}},
 {'name': 'hendrikdegrote', 'vesting_shares': {'amount': 3298001762.871842}},
 {'name': 'jamesc', 'vesting_shares': {'amount': 3199868835.022211}}]





50

Perfect! That is a far faster and more efficient query than we have been using, which I am sure that @furion appreciates.

I appreciate it because it comes back with exactly what I want in a way that I can use it, right out of the database. No extra sorting algorithms or time invested in working them out necessary.

intL = [e['name'] for e in resL]

pprint(intL[:10])
['steemit',
 'misterdelegation',
 'steem',
 'freedom',
 'blocktrades',
 'ned',
 'mottler',
 'databass',
 'hendrikdegrote',
 'jamesc']

Now that we have our targets of interest, we need to look at the Operations Collection, which is where the transactions live. That's going to be our next target.

Here's the basic anatomy of a vote inside the Operations Collection in the database. It's interesting to note that we have the author, the voter, the date, the weight – but we would have to work backwards to figure out what that weight actually was in terms of vests.

I suppose I can thank normalized databases for that.

From my point of view, this provides a certain complication. I would really like to know how much SP that represented at the time the vote was made, but hacking that out is nontrivial.

The naïve and lazy version (which is inherently superior because it is naïve and lazy) would be to look up the current number of vests held by the voter and scale appropriately. Given that we are only going to be looking at votes from the last week as a first approximation, this may suffice.
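
As a rough sketch of that naive scaling (not what the notebook actually runs), the estimate is just the voter's current vests multiplied by the vote's share of a full-strength vote. The function name `naive_vote_vests` and the `vests_by_name` lookup are hypothetical, and the sketch deliberately ignores the voter's voting power at the time of the vote and any change in holdings since.

```python
# Vote weights are expressed in basis points: 10000 is a 100% vote.
FULL_WEIGHT = 10000


def naive_vote_vests(weight, current_vests):
    """Scale the voter's *current* vests by the weight of the vote.

    This ignores voting power at the time of the vote and any change in
    the voter's holdings since then, so it is only a first approximation.
    """
    return current_vests * (weight / FULL_WEIGHT)


# Hypothetical lookup table, as might be built from the account query results.
vests_by_name = {'blocktrades': 9497442946.221754}

# Illustrative vote record; the author name is made up.
vote = {'voter': 'blocktrades', 'author': 'someauthor', 'weight': 700}

estimate = naive_vote_vests(vote['weight'], vests_by_name[vote['voter']])
print(round(estimate))
```

A negative weight (a down vote) simply produces a negative estimate, which is convenient if the sign is ever needed.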

# We want only the vote transactions which have happened
#   in the last week. Luckily, we worked out how to do
#   date-range queries in an earlier bit.

query = {
    'type': 'vote',
    'timestamp': {'$gte': dt.now() - datetime.timedelta(days=7)}}

proj = {'voter': 1, 'author': 1, 'weight': 1}
result = db.Operations.find(query,
                            projection=proj)

voteL = list(result)
pprint(voteL[:5])

len(voteL)
[{'_id': 'd1596d456b6bcc0248ecc70ae8d07cfc1b92ad0a',
  'author': 'hodgetwins',
  'voter': 'nour-money31',
  'weight': 10000},
 {'_id': 'd1fcdba491a673a8c37103359e627e50d8990bf8',
  'author': 'marksheppard',
  'voter': 'shahidshah',
  'weight': 10000},
 {'_id': 'f3705f34765100827a6f7e275232b2452d64de9b',
  'author': 'syllem',
  'voter': 'thetiger',
  'weight': 10000},
 {'_id': 'f2c719d070164684cc1088765b8332dadc9673ef',
  'author': 'krguidedog',
  'voter': 'asbear',
  'weight': 700},
 {'_id': 'c9c7ceeb0587821b1112384fea99885067ba88a2',
  'author': 'mazzle',
  'voter': 'minnowsupport',
  'weight': 50}]





5494637

Five and a half million votes in the last week is nothing to sneeze at. In fact, that's a lot of data to juggle.

I have some ideas about ways to cut the number of results that we have going on here; we might as well try one!

We might try only looking for votes which involve one of our accounts of interest. If we are lucky, we can get the database side to do most of the filtering for us.
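
For reference, the test we want the database to apply is the same as this client-side check, shown here on toy data. The helper name `involves_any` and the sample accounts are made up for illustration; the point is only that a vote is kept when either its author or its voter is on the list.

```python
def involves_any(vote, accounts):
    """True if either side of the vote is one of the accounts of interest."""
    return vote['author'] in accounts or vote['voter'] in accounts


# Toy data for illustration only; these are not real query results.
accounts_of_interest = {'alice', 'bob'}
votes = [
    {'voter': 'alice', 'author': 'carol'},   # kept: voter is of interest
    {'voter': 'dave', 'author': 'erin'},     # dropped: neither side matches
    {'voter': 'frank', 'author': 'bob'},     # kept: author is of interest
]

kept = [v for v in votes if involves_any(v, accounts_of_interest)]
print(len(kept))  # prints 2
```

Doing this in the query rather than in Python means the rejected votes never have to cross the network at all.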

# We want only the vote transactions which have happened
#   in the last week AND involve only our accounts of interest. 

query = {
    'type' : 'vote',
    'timestamp' : {'$gte': dt.now() - datetime.timedelta(days=7)},
    '$or': [{'author': {'$in': intL}},
           {'voter': {'$in': intL}}] 
    }

proj = {'voter': 1, 'author': 1, 'weight': 1}
result = db.Operations.find(query,
                            projection=proj)

%time intVL = list(result)
Wall time: 49.5 s
pprint(intVL[:5])

len(intVL)
[{'_id': '6f0c88326295c5c6d67f58613e128429c39236b2',
  'author': 'steemcleaners',
  'voter': 'adm',
  'weight': 150},
 {'_id': '19a354e6e512608f621cbd5f11ba88233d054405',
  'author': 'tejma',
  'voter': 'wackou',
  'weight': 160},
 {'_id': '2116e93daadb674fddf693743c711a7bfceb5030',
  'author': 'goldmonhla',
  'voter': 'pharesim',
  'weight': 2},
 {'_id': '364c14f2bcfb029c7cdcd81f5cc8784d98a36f66',
  'author': 'skypointstudios',
  'voter': 'fulltimegeek',
  'weight': 300},
 {'_id': '2749931690d884af3be8202228de2c48f907d78b',
  'author': 'skycae',
  'voter': 'fulltimegeek',
  'weight': 600}]





42125

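The run shown above is limited to the top 50 accounts and comes back with about 42,000 votes. With the full top 200, the same query brings us from 5.5 million votes down to a collection of only 188,000. Which is better, don't get me wrong, but graphing 188,000 edges is going to be ugly.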

Maybe we can clean this up a little bit. Since we aren't going to use weight, let's drop that from what we are pulling back. And once we have the weight removed, we don't need duplicates of votes. That will leave us with a network of connected accounts, completely ignoring the articles which may connect them.

# We want only the vote transactions which have happened
#   in the last week AND involve only our accounts of interest,
#   AND we want unique results if we can arrange it.

query = {
    'type' : 'vote',
    'timestamp' : {'$gte': dt.now() - datetime.timedelta(days=7)},
    '$or': [{'author': {'$in': intL}},
           {'voter': {'$in': intL}}] 
    }

proj = {'voter': 1, 'author': 1, '_id': 0}
result = db.Operations.find(query,
                            projection=proj)

%time intVL = list(result)
Wall time: 13.5 s
pprint(intVL[:5])

len(intVL)
[{'author': 'steemcleaners', 'voter': 'adm'},
 {'author': 'fyrstikken', 'voter': 'jukian'},
 {'author': 'steemcleaners', 'voter': 'adm'},
 {'author': 'tejma', 'voter': 'wackou'},
 {'author': 'goldmonhla', 'voter': 'pharesim'}]





42126
intVLT = []

for e in intVL:
    intVLT.append((e['voter'], e['author']))
    
intVLT[:10]

[('adm', 'steemcleaners'),
 ('jukian', 'fyrstikken'),
 ('adm', 'steemcleaners'),
 ('wackou', 'tejma'),
 ('pharesim', 'goldmonhla'),
 ('fulltimegeek', 'skypointstudios'),
 ('fulltimegeek', 'skycae'),
 ('adm', 'steemcleaners'),
 ('adm', 'steemcleaners'),
 ('donkeypong', 'cryptogee')]
from more_itertools import unique_everseen

intUVL = list(unique_everseen(intVLT))
pprint(len(intUVL))

pprint(intUVL[:10])
19471
[('adm', 'steemcleaners'),
 ('jukian', 'fyrstikken'),
 ('wackou', 'tejma'),
 ('pharesim', 'goldmonhla'),
 ('fulltimegeek', 'skypointstudios'),
 ('fulltimegeek', 'skycae'),
 ('donkeypong', 'cryptogee'),
 ('pharesim', 'meesterboom'),
 ('fulltimegeek', 'abh12345'),
 ('adsactly', 'meesterboom')]
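
If you would rather not pull in `more_itertools`, a standard-library alternative is sketched below. It relies on dictionaries preserving insertion order (guaranteed from Python 3.7 on), so the first occurrence of each (voter, author) pair keeps its place. The function name `unique_in_order` is my own, not something from the notebook.

```python
def unique_in_order(pairs):
    """Drop repeated pairs while keeping the first occurrence of each."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(pairs))


sample = [
    ('adm', 'steemcleaners'),
    ('jukian', 'fyrstikken'),
    ('adm', 'steemcleaners'),
    ('wackou', 'tejma'),
]

print(unique_in_order(sample))
```

The pairs have to be hashable for this to work, which is why the votes were turned into tuples first.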

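For the top 200 accounts, that works out to about 84,000 unique edges (the 19,471 shown above is the count for the top 50), which is something more like reasonable for our purposes.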

It's still rather extreme, but it's far better than 5 million edges.

Because Python lists are iterable, we'll just be able to step down the collection and build some edges.

Time to break out the graphviz!

from graphviz import Digraph
dot = Digraph(comment="Steem Core Vote Relations", 
              format="png",
              engine="sfdp")
dot.attr('graph', overlap='false')
dot.attr('graph', ratio='auto')
dot.attr('graph', size='10000000,10000000')
dot.attr('graph', start='1.0')
dot.attr('graph', K='100000')
dot.attr('graph', margin='5')
dot.attr('node', shape='square', style='filled', color='black', fillcolor='green')
for a in intL:
    dot.node(a, a)
dot.attr('node', shape='oval', color='black', fillcolor='lightgrey')
# Let's try just the first 10,000

for e in intUVL[:10000]:
    if e[0] in intL:
        if e[1] in intL:
             dot.edge(e[0], e[1], color='black')
        else:
             dot.edge(e[0], e[1], color='green')
    else:
        dot.edge(e[0], e[1], color='red')
%time dot.render('steemVoteRelationships', view=True)
Wall time: 2min 18s





'steemVoteRelationships.png'
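
The coloring rule buried in that loop can be pulled out into a small function, sketched here with a set for the membership tests. The names `edge_color` and `core` are hypothetical; the colors and their meanings are read straight off the loop above. Using a set instead of the list is my substitution: it gives the same answers, but each lookup no longer scans the whole list.

```python
def edge_color(voter, author, core):
    """Pick an edge color from where the vote starts and ends.

    black: both voter and author are accounts of interest
    green: the voter is an account of interest, the author is not
    red:   the voter is an outside account
    """
    if voter in core:
        return 'black' if author in core else 'green'
    return 'red'


# Toy stand-in for set(intL); these are not the real account names.
core = {'whale_a', 'whale_b'}

print(edge_color('whale_a', 'whale_b', core))   # black
print(edge_color('whale_a', 'minnow', core))    # green
print(edge_color('minnow', 'whale_b', core))    # red
```

With that in hand, the loop body reduces to a single `dot.edge(voter, author, color=edge_color(voter, author, core))` call.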

The Big Reveal

Let's start with the top 200 accounts, pull all the vote transactions which involve them for the last week, and then realize that we have way too many edges to graph with the systems that we have at hand (and probably too many to actually rationally make an analysis of), and just take the first 10,000.

That's roughly 1/8 of the full load, or perhaps better visualized as slightly less than one day of votes involving any of the 200 accounts.
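
The back-of-the-envelope arithmetic behind that estimate looks like this. It assumes the unique edges are spread evenly across the week, which they need not be, so read the "days" figure as a loose equivalent rather than a measurement.

```python
plotted_edges = 10_000
total_edges = 84_000    # approximate figure quoted in the text
days_sampled = 7

# Share of the full edge list that makes it onto the graph.
fraction = plotted_edges / total_edges

# What that share would correspond to if votes were evenly spread in time.
equivalent_days = fraction * days_sampled

print(f"{fraction:.3f} of the edges, about {equivalent_days:.2f} days")
# prints: 0.119 of the edges, about 0.83 days
```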

Perhaps not surprisingly, this graph is super dense. It's also super large.

But here's the thing – it surprised me.

Pull back and look at this. Remember what we've said before about these graphs; the system tries to cluster things with more connections to one another closer to one another. Relationships roughly map to distance.

As on my previous graphs, the line colors show direction relative to the 200 accounts in question: green lines are energy flowing out of those accounts, red lines are energy flowing into them, and black lines connect two of the 200 to each other.

Imgur

(Click on the image to download the whole thing for your own pleasure.)

Notice the vast lack of overlap between the set of accounts in the top 200 which are involved in giving votes and the set in the top 200 which are involved in receiving votes.

"Never the twain shall meet" might be an exaggeration – but it's only a slight one.

As expected at some level, there are a fairly significant number of whales which don't involve themselves in up voting or down voting at all. For the sake of these diagrams, we don't distinguish the two. None of the SP in those accounts for that time slice was being exercised to redirect any of the rewards pool. They are acting purely as value repositories.

But check out the rest of that action! There are very clear, extreme loci of activity, and they are highly specialized.

There is one account with a relatively balanced degree of connection to both incoming votes and outgoing votes: @onceuponatime.

You'd probably expect @curie to be in either a much more central position than it occupies or much further out into the green zone, but it hangs at a strangely indecisive position. I'm chalking that up to the fact that they frequently solicit up votes on the comments which let people know they have received a @curie boost.

Similarly surprising, @ausbitbank has a huge number of green outgoing votes and almost nothing coming back in. I've passed words with him before so I'm aware that he actually exists as a thinking, speaking being and not just a bot, which makes it all the more shocking.

The area to the southeast is of particular interest to me because these are whale accounts which are creating content to get voted on and are receiving those votes in great number, far more than, at least at the time sampled by our system here, they are putting out.

But this is a really messy view. Surely we can do better!

Cut in Twain

How about we cut down the number of accounts we're interested in by half? Let's just look at the top 100 accounts and the votes that they engaged with over the last week.

Imgur

(Click on the image to download the whole thing for your own horror.)

We still have a bit of a problem. The number of votes going back and forth well exceeds 10,000. In fact, we are only down to 50,000 or so. Again, we'll just take the first 10,000 from the list, throw them on the graph, and see what we see.

And what we see is largely the same as what we saw before. Whales either receive massive amounts of votes or give massive amounts of votes, and almost never do they do both. (Okay, some hang out on the periphery doing neither and nothing.)

For me here, the most interesting thing is in the southeast where @virus707 is probably the most visible balanced whale in this network. Lots going in, lots going out. Just lots and lots, truthfully.

The more that I stare at these directed graphs of interactions, the more I get the feeling that it should be possible to describe bot-like behavior in terms of the graph of the interactions that an account has with other accounts. I can't prove it, and as for now it is just an inchoate sensation in the back of my brain, but these interaction maps are definitely interesting.
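
One very simple way to start putting numbers on that feeling is to count, for each account, how many distinct accounts it votes for and how many vote for it. The sketch below works on a list of (voter, author) pairs like the deduplicated edge list; the function name `degree_profile` and the "balance" score are my own invention, offered as a starting point and certainly not as a bot detector.

```python
from collections import Counter


def degree_profile(edges):
    """Summarize outgoing and incoming vote edges per account.

    balance is +1.0 for an account that only gives votes, -1.0 for one
    that only receives them, and 0.0 when the two are equal.
    """
    out_deg = Counter(voter for voter, _ in edges)
    in_deg = Counter(author for _, author in edges)

    profile = {}
    for account in set(out_deg) | set(in_deg):
        o, i = out_deg[account], in_deg[account]
        # Every account here has at least one edge, so o + i is never zero.
        profile[account] = {'out': o, 'in': i, 'balance': (o - i) / (o + i)}
    return profile


# Toy edges for illustration only.
toy_edges = [('a', 'b'), ('a', 'c'), ('b', 'a'), ('d', 'a')]

for name, stats in sorted(degree_profile(toy_edges).items()):
    print(name, stats)
```

On the real data, accounts sitting near +1.0 or -1.0 would be the pure curators and pure creators, and the rare ones near 0.0 would be the balanced whales.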

But this is still too much data. The patterns are clear, but do they hold true if we go from the top 100 accounts to the top 50 accounts? Surely we will be able to look at an entire week's worth of votes between the top 50 accounts and the rest of the blockchain, right?

The Final Countdown

At least we have moved to the closest yet. The number of votes involving the top 50 accounts on the blockchain over the last week is "only" about 20,000. That's about twice the limit that we are working with here.

(I want to reiterate, this is after I go through the lists and turn them into unique transactions. Two accounts which are related by multiple votes only get one edge.)

Imgur

(Click on the image to download the whole thing for your own self-gratification.)

So far – so much like we've already seen, in so many ways. The nodes which are involved in giving out votes are almost purely involved in curational activities, and the nodes which are involved in receiving votes are almost purely involved in posting activity, one must assume.

We can see a few isolated islands of interaction, but for representing roughly half a week, this is pretty clear.

At this level, a lot of whales are just sitting around doing very little. That's not really much of a surprise as they are just repositories of value waiting to be tapped when the time is right.

For the most part, there is very little overlap between whale accounts which curate and whale accounts which create. From my perspective, that is not a great thing, though I can absolutely see where the game mechanics of the blockchain reward that as a conscious choice. It's almost impossible to "curate" (in the sense that the steem blockchain means) and actually spend time creating content. In fact, I wouldn't really be surprised to know that almost all of the whale curational activity that we see going on is at least 80% driven by some sort of automated bot system. There is simply too much activity over too wide a spectrum to not be augmented.

It's also vaguely disappointing that so much of whale activity is devoted to curation rather than creation. It might be interesting to go back and specifically look at some of the clear patterns that we see centering on people who are receiving vast numbers of votes from outside the whale network. My suspicion is that most of them write about or do videos regarding cryptocurrency.

I'm not saying the steem blockchain does a lot of its typing one-handed, but the Michael Jackson glove is a little obvious.

But What Does It Mean!?

The short answer?

I don't know.

We can observe for ourselves, at least for these time periods, what kind of activity is going on at the top end of the SP pool, keeping in mind my previous analysis which demonstrated that more than 50% of SP is held by the top nine accounts on the blockchain, and after eliminating Steemit corporate, more than 50% of SP is held by the top 90.

Examining the traces that we have before us, which is something like looking through goose intestines for the future, we can see that most of that SP isn't actually being used to direct any portion of the rewards pool. Of the active whales on that list, activities break down very, very clearly between curation and creation, and as a result, knowing what we know about the relative value of both of those behaviors, the guys with all of the red lines coming in are doing quite well indeed.

I've seen discussions which effectively call for all whale activity in terms of voting to stop, because they are "unfairly advantaged" over the rest of the blockchain. These graphs give the lie to that claim. The vast number of whale votes are not going to other whales – they are going to non-whales.

If whales stopped voting, contrary to claims, the percentage of the reward pool which goes to minnows would not be increased. On the contrary, the already extant whales which are receiving an absolute butt load of up votes from minnows (and dolphins, and orcas, and even plankton) would see even more of the rewards pool go their way.

I think we can reasonably put that old chestnut to bed, now.

Epilogue

I really need to stop creating massive graphical displays of activity on the steem blockchain and get back to doing what I do best, writing about role-playing games, tabletop wargames, video games, and staying way the Hell away from crypto cultism and all that goes with it.

But there's something about dissecting activity in terms of relationships in social networks. Whenever there are tools to do so, I find it almost impossible to resist tinkering with visualizing that activity.

Relationships are all too often overlooked, whether those relationships be between accounts that send each other money, accounts which interact with one another through the mechanism of voting, or some other type of interaction. Visualizing those connections, those contacts, is compelling stuff.

I can't resist looking at this through a game designer's eyes, thinking about where the hotspots of activity are, assessing whether how people feel about the mechanics actually accords with what the mechanics actually are.

To me, that's a big deal.

In the meantime, hopefully in a couple of days I'll be back to write something about actual games and not burn your eyes out with red and green lines on a white background connecting green squares and gray ovals (and purple horseshoes).

We'll just have to see.

Tools

No real new tools leveraged this go-around. On the positive side, Imgur can be depended on to host at least a cut-down version of my larger images with Google Drive holding the straight-up files which it can't present but can download, so that's something.


Thanks for sharing a standard statistics.

It's a very nice idea.
I wish I could understand the code. Lol

The code is actually pretty straightforward. Actually, it's brute force at its worst.

Python is one of the easiest modern languages to get your hands on and really start learning. There are tons of resources online for doing so. I'm pretty sure there's even a @Utopian-io group for doing so.

Hmmmmm! I should give that some thoughts.

Those are some complex graphs! Glad you explained them and it was interesting to hear what your take on the data was. Also I didn’t know people thought @ausbitbank was a bot! For a bot he consumes an alarming amount of coffee LOL

I think we're going to have to create a new type of classification called "bot-like behavior" which consists of various activities which occur at too high a volume or too consistently to fully resemble human activity.

There are a few whale accounts which show that kind of pattern, along with more human-like interactions. Analytically, it's fairly clear that they are engaging in bot-augmented behavior, which is something that I've been thinking about in terms of tools to help me do some of the things I want as well.

I'm just not sure exactly what for, but something. :)

Highly rEsteemed!

Followed for the future analysis.

Imgur

I think that's a picture of my relationships for the last couple of decades.

There's been some ups and downs.

Great to share your analysis on steem boat.
Thanks for the information
Follow+upvote

I'm not sure all of this stuff counts as "information," so much as it's just "a bunch of data." Interpretation is hard.

What did you come away having learned from this?

Very good idea. Thanks for share this post.

It's a very nice idea.good job

the post you share is very good, very useful for me, after I read your article my thoughts and insights add, thank you @lextenebris
