DIY Steemit Statistics with Python: Part 5 - Reputation
Our next order of business: the distribution of reputation scores among active Steemit users.
Before we start, we prepare the workspace as usual (see the previous posts in the series for additional context: 1, 2, 3, 4):
%matplotlib inline
import sqlalchemy as sa, pandas as pd, seaborn as sns, matplotlib.pyplot as plt
sns.set_style()
e = sa.create_engine('mssql+pymssql://steemit:[email protected]/DBSteem')
def sql(query, index_col=None):
    return pd.read_sql(query, e, index_col=index_col)
 
The reputation of each user is updated dynamically with each vote they receive. In principle, we could recover it by processing the TxVotes blockchain table and accounting for all the votes and flags one by one. This would probably be computationally quite heavy, however. Instead, we can rely on the fact that SteemSQL is helpfully tracking various current user-related metrics in a dedicated table for us already.
Indeed, let us take a look at all the tables currently available to us at SteemSQL:
sql("select * from information_schema.tables")
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | |
|---|---|---|---|---|
| 0 | DBSteem | dbo | Reblogs | VIEW | 
| 1 | DBSteem | dbo | TxWithdraws | BASE TABLE | 
| 2 | DBSteem | dbo | TxWithdrawVestingRoutes | BASE TABLE | 
| 3 | DBSteem | dbo | TxWitnessUpdates | BASE TABLE | 
| 4 | DBSteem | dbo | VOCurationRewards | BASE TABLE | 
| 5 | DBSteem | dbo | Tokens | BASE TABLE | 
| 6 | DBSteem | dbo | Blocks | BASE TABLE | 
| 7 | DBSteem | dbo | Transactions | BASE TABLE | 
| 8 | DBSteem | dbo | TxCustomsReblogs | VIEW | 
| 9 | DBSteem | dbo | TxCustomsFollows | VIEW | 
| 10 | DBSteem | dbo | Comments | BASE TABLE | 
| 11 | DBSteem | dbo | VOInterests | BASE TABLE | 
| 12 | DBSteem | dbo | TxAccountCreates | BASE TABLE | 
| 13 | DBSteem | dbo | VOFillOrders | BASE TABLE | 
| 14 | DBSteem | dbo | Accounts | BASE TABLE | 
| 15 | DBSteem | dbo | TxAccountRecovers | BASE TABLE | 
| 16 | DBSteem | dbo | TxVotes | BASE TABLE | 
| 17 | DBSteem | dbo | TxAccountUpdates | BASE TABLE | 
| 18 | DBSteem | dbo | TxAccountWitnessProxies | BASE TABLE | 
| 19 | DBSteem | dbo | TxAccountWitnessVotes | BASE TABLE | 
| 20 | DBSteem | dbo | VOFillConvertRequest | BASE TABLE | 
| 21 | DBSteem | dbo | TxClaimRewardBalances | BASE TABLE | 
| 22 | DBSteem | dbo | TxComments | BASE TABLE | 
| 23 | DBSteem | dbo | TxCommentsOptions | BASE TABLE | 
| 24 | DBSteem | dbo | TxConverts | BASE TABLE | 
| 25 | DBSteem | dbo | TxCustoms | BASE TABLE | 
| 26 | DBSteem | dbo | VOAuthorRewards | BASE TABLE | 
| 27 | DBSteem | dbo | TxDelegateVestingShares | BASE TABLE | 
| 28 | DBSteem | dbo | TxDeleteComments | BASE TABLE | 
| 29 | DBSteem | dbo | TxEscrowApproves | BASE TABLE | 
| 30 | DBSteem | dbo | TxEscrowDisputes | BASE TABLE | 
| 31 | DBSteem | dbo | TxEscrowReleases | BASE TABLE | 
| 32 | DBSteem | dbo | VOShutdownWitnesses | BASE TABLE | 
| 33 | DBSteem | dbo | TxEscrowTransfers | BASE TABLE | 
| 34 | DBSteem | dbo | VOFillVestingWithdraws | BASE TABLE | 
| 35 | DBSteem | dbo | TxFeeds | BASE TABLE | 
| 36 | DBSteem | dbo | TxLimitOrders | BASE TABLE | 
| 37 | DBSteem | dbo | Followers | VIEW | 
| 38 | DBSteem | dbo | TxPows | BASE TABLE | 
| 39 | DBSteem | dbo | TxTransfers | BASE TABLE | 
Scanning through the list, we can guess that Accounts is most probably the table we are interested in:
sql("select top 3 * from Accounts")
 
From the results of this query (omitted) can see that there's a reputation field indeed. It is, however, given as a long, unfamiliar number - not the short value we are used to seeing in the profile page. It turns out we can convert this "raw" value to a "usual" reputation score using the following formula:

Let us test it:
sql("""
select 
    name, 
    reputation as raw_reputation,
    cast(log10(reputation)*9 - 56 as int) as reputation
from Accounts 
where name = 'konstantint'""")
| name | raw_reputation | reputation | |
|---|---|---|---|
| 0 | konstantint | 263717261138 | 46 | 
This worked as expected. Now let us finally count how many accounts are there per each reputation score (dropping the scores below 26, just like @arcange does it):
reputations = sql("""
with Data as 
    (select 
       cast(log10(isnull(reputation, 0))*9 - 56 as int) as Reputation
     from Accounts
     where reputation > 0)
select 
    Reputation, count(*) as Count
from Data 
group by Reputation
having Reputation > 25
order by Reputation desc""", "Reputation")
 
For the sake of variety, we made use of a Common Table Expression in our query this time.
Let us conclude by plotting the results:
reputations.plot.bar(figsize=(8, 4));

The source code of this post is also available as a Jupyter notebook.
Next time we will plot the voting power distribution and learn to insert fishy images into our barplots.