Calculating statistics on the number of Steemit users using Python
I wanted to experiment with the Steemit API, so I decided to write a small script that gets information about users from the blockchain.
To collect the statistics, I used Python, Jupyter Notebook, the steem-python library to access the Steemit blockchain API, and Matplotlib for plotting.
Installing the libraries is trivial with the standard Python pip utility (an example can be found here), so I will go straight to the code that collects the data.
To begin, we need to import the modules and create an instance of the Steem class.
from steem import Steem
from steem.instance import set_shared_steemd_instance

nodes = [
    'https://steemd.steemitstage.com/',  # working nodes may change in the future
]
mySteem = Steem(nodes=nodes)
set_shared_steemd_instance(mySteem.steemd)  # set the shared instance for later API calls
Next, we download the usernames of all Steemit accounts.
def downloadAllUsernames():
    batch = mySteem.lookup_accounts(-1, 1000)  # 1000 is the maximum number of names per request
    i = 0
    while len(batch) > 1:
        # the "data" directory must already exist
        with open("data/usernames%d.txt" % i, "wt") as f:
            f.writelines(name + "\n" for name in batch)
        i += 1
        # continue from the last name received (kept without the trailing newline)
        batch = mySteem.lookup_accounts(batch[-1], 1000)

downloadAllUsernames()  # actually call the function
We save the usernames to text files as intermediate results, so the script can be paused and resumed later.
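To make the "pause and continue" idea concrete, here is a minimal sketch of how the resume point could be recovered from the saved files. The helper name `find_resume_point` is hypothetical and not part of the original script; it only assumes the `data/usernamesN.txt` naming used above.

```python
import glob
import os
import re

def find_resume_point(data_dir="data"):
    """Return (next_file_index, last_saved_username) from the saved batch files.

    Returns (0, None) when nothing has been saved yet.
    """
    pattern = re.compile(r"usernames(\d+)\.txt$")
    indexed = []
    for path in glob.glob(os.path.join(data_dir, "usernames*.txt")):
        match = pattern.search(os.path.basename(path))
        if match:
            # compare file indexes as numbers, so that 10 comes after 9
            indexed.append((int(match.group(1)), path))
    if not indexed:
        return 0, None
    last_index, last_path = max(indexed)
    with open(last_path) as f:
        names = [line.strip() for line in f if line.strip()]
    return last_index + 1, (names[-1] if names else None)
```

The returned username could then be passed as the lower bound of the next `lookup_accounts` call, and the returned index used for the next file name.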
Once this step has finished, we can collect extra information for all users; for example, we can retrieve the creation date and post count of every user.
import time

def applyToAllAccounts(func, lastUser=-1):
    batch = mySteem.lookup_accounts(lastUser, 1000)
    while len(batch) > 1:
        func(batch)
        # pause between requests so the node is not overloaded;
        # otherwise the node returns fake accounts instead of real ones
        time.sleep(0.5)
        batch = mySteem.lookup_accounts(batch[-1], 1000)
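One detail of this paging loop is worth a sketch. Because each request starts from the last name of the previous batch, that name most likely shows up again as the first entry of the next batch (the `len(batch) != 1` stop condition suggests the lower bound is inclusive, but that is my assumption). The generator below, with the hypothetical name `iter_batches`, drops the repeated entry. It takes the lookup function as a parameter, so it can be tried without a node.

```python
def iter_batches(lookup, start="", limit=1000):
    """Yield batches of names from a paginated lookup(lower_bound, limit) call.

    Assumes the lower bound is inclusive, so every batch after the first
    begins with the last name of the previous batch; that repeat is dropped.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2 to make progress")
    batch = lookup(start, limit)
    first = True
    while batch:
        fresh = batch if first else batch[1:]
        if fresh:
            yield fresh
        if len(batch) < limit:
            break  # a short batch means the end of the list was reached
        first = False
        batch = lookup(batch[-1], limit)

# Stand-in for the real API call, only for trying the generator offline.
names = sorted("user%03d" % n for n in range(25))

def fake_lookup(lower, limit):
    return [n for n in names if n >= lower][:limit]

collected = [n for batch in iter_batches(fake_lookup, limit=10) for n in batch]
```

With the real client, `lookup` would presumably be `mySteem.lookup_accounts`; without the de-duplication, roughly one account per batch would be counted twice.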
Collecting all data
import dateutil.parser

def calcPerMonth(accList):
    def convertDateToYearMonthString(date):
        d = dateutil.parser.parse(date)
        return d.strftime('%Y/%m')

    accs = mySteem.get_accounts(accList)
    # debug log: print the first name of each batch to show progress
    print(accs[0]["name"])
    for acc in accs:
        created = acc["created"]
        isMiner = acc["mined"]
        isActive = acc["post_count"] > 0
        createdMonth = convertDateToYearMonthString(created)
        if createdMonth not in accsPerMonthCount:
            accsPerMonthCount[createdMonth] = 0
            accsPerMonthActive[createdMonth] = 0
            accsPerMonthMiners[createdMonth] = 0
        accsPerMonthCount[createdMonth] += 1
        if isMiner:
            accsPerMonthMiners[createdMonth] += 1
        if isActive:
            accsPerMonthActive[createdMonth] += 1

accsPerMonthCount = {}   # total accounts
accsPerMonthActive = {}  # accounts with at least one post
accsPerMonthMiners = {}  # accounts with the "mined" flag
applyToAllAccounts(calcPerMonth)
This step can take 30-60 minutes.
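Since the collection takes this long, it may be worth saving the counters to disk so that the plots can be redrawn without querying the node again. This is a minimal sketch using only the standard `json` module; the helper names and the file name are my own and not part of the original script.

```python
import json

def save_counters(path, **counters):
    """Write the named per-month dictionaries to one JSON file."""
    with open(path, "w") as f:
        json.dump(counters, f, indent=2, sort_keys=True)

def load_counters(path):
    """Read the dictionaries back as {name: {month: count}}."""
    with open(path) as f:
        return json.load(f)
```

For example, `save_counters("stats.json", total=accsPerMonthCount, active=accsPerMonthActive, miners=accsPerMonthMiners)` would store all three dictionaries, and `load_counters("stats.json")["total"]` would restore the first one in a later session.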
Finally, we can plot the cumulative number of accounts by month of creation.
import matplotlib.pyplot as plt

def drawNewUsersPlot(dictToPlot, title):
    # del dictToPlot["1970/01"]
    listItems = sorted(dictToPlot.items())
    dates, users = zip(*listItems)
    # running total of accounts up to and including each month
    totalUsers = [sum(users[:i]) for i in range(1, len(users) + 1)]
    plt.close('all')
    fig, ax = plt.subplots(1, figsize=(20, 10))
    plt.bar(range(len(totalUsers)), totalUsers, align='center', width=0.9)
    plt.xticks(range(len(dates)), dates)
    fig.autofmt_xdate()
    plt.title(title)
    plt.show()

drawNewUsersPlot(accsPerMonthCount, "Steemit total accounts count per month")
drawNewUsersPlot(accsPerMonthActive, "Steemit active accounts count per month")
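The plotting function turns per-month counts into running totals before drawing them. As a small sketch of that step on its own, the same totals can be computed with `itertools.accumulate`, and the share of active accounts can be derived from the two dictionaries. The function names and the toy numbers are illustrative only and are not real Steemit data.

```python
from itertools import accumulate

def cumulative_by_month(per_month):
    """Return (months, running_totals) in chronological order.

    Keys are assumed to look like "2016/03", so plain string sorting
    is also chronological sorting.
    """
    months = sorted(per_month)
    totals = list(accumulate(per_month[m] for m in months))
    return months, totals

def active_share(total_per_month, active_per_month):
    """Return the fraction of all accounts that have at least one post."""
    total = sum(total_per_month.values())
    return sum(active_per_month.values()) / total if total else 0.0

# toy numbers, not real Steemit data
example_total = {"2016/04": 3, "2016/03": 2, "2017/01": 5}
example_active = {"2016/04": 1, "2016/03": 1, "2017/01": 3}
example_months, example_totals = cumulative_by_month(example_total)
example_share = active_share(example_total, example_active)
```

The last value of the running totals is the overall number of accounts, which is the figure quoted in the conclusion.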
So, right now Steemit has 1003324 users, and half of them are active (with at least one post).
Thanks for sharing! It is a good idea to check how many of the 1 million accounts are really active. IMO, there are too many bot/dummy accounts on Steemit.
Indeed, there are too many bot/dummy accounts. It would be good if the majority of these accounts were hodling Steem (SP). It would be interesting to see monthly active users (i.e. at least one post a month).
I'll try to update my script to calculate it. I think one post a month is a good metric, thanks for the idea. Maybe I'll add some other checks for active users (2-3 posts in the last 3 months, for example).
Cool stuff. You should have submitted it through Utopian; I think they would have rated it well.
Thanks, I'll try Utopian next time. I have some more interesting metrics, and I may also publish a Russian version.