Getting started with SteemData

in #steemdata, 7 years ago (edited)

Getting Started

To use SteemData from your favorite language, just install the appropriate MongoDB library. You can find one for all major languages like JavaScript, Python, Go and others.

You can connect to the public SteemData server via the following URI:

mongodb://steemit:[email protected]:27017/SteemData

This tutorial uses Python, for which you can install a neat helper library that includes PyMongo and a few extra niceties.

pip install -U steemdata

Quick example:

> from steemdata import SteemData

> s = SteemData()

> s.info()
mongodb://steemit:[email protected]:27017/SteemData

> s.Accounts.find_one({'name':'furion'})['balances']
{'SAVINGS_SBD': 100.0,
 'SAVINGS_STEEM': 0.0,
 'SBD': 6.453,
 'STEEM': 66.157,
 'VESTS': 86491944.341744}

RoboMongo

I highly recommend RoboMongo as a cross-platform GUI utility for playing around with SteemData.

You can find sample queries from the video here.

Collections

Accounts

Accounts contains Steem Accounts and their:

  • account info / profile
  • balances
  • vesting routes
  • open conversion requests
  • voting history on posts
  • a list of followers and followings
  • witness votes
  • curation stats

Posts

Here you can find all top-level posts, with full-text search support for content bodies.
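As a rough sketch of what a full-text query could look like: MongoDB's $text operator searches whatever text index a collection defines. The helper name text_search_filter is hypothetical, and the commented query assumes the Posts text index covers post bodies (as described above) and that posts carry author and permlink fields.

```python
def text_search_filter(*terms):
    """Build a MongoDB $text filter matching documents that contain any term."""
    # $search treats space-separated words as an OR of terms.
    return {'$text': {'$search': ' '.join(terms)}}


query = text_search_filter('steemdata', 'mongodb')
print(query)

# With a live connection (field names in the projection are assumptions):
# s.Posts.find(query, projection={'author': 1, 'permlink': 1, '_id': 0}, limit=10)
```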

Operations

Operations contains all the events that happened on the blockchain so far.
You can query for operations in individual blocks, or by time, operation type (comment, transfer, vote...) or arbitrary properties. See Digging Deeper for examples.
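A minimal sketch of a filter that combines operation type with a time window. The helper operations_filter is hypothetical; it assumes the fields are named type and timestamp (both appear in the list of indexed fields further down) and that timestamps are stored as datetime values.

```python
from datetime import datetime, timedelta


def operations_filter(op_type, since, until=None):
    """Build a filter for operations of one type within a time window."""
    time_range = {'$gte': since}
    if until is not None:
        time_range['$lt'] = until
    return {'type': op_type, 'timestamp': time_range}


# All transfers in the 24 hours before a fixed reference time.
end = datetime(2017, 1, 2)
query = operations_filter('transfer', since=end - timedelta(days=1), until=end)
print(query)

# With a live connection: s.Operations.find(query, limit=10)
```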

AccountOperations

Same as Operations, but with account ownership attached for easy querying.

PriceHistory

Snapshots of Bitcoin, STEEM, SBD and USD implied prices.


You can access the collections easily via the SteemData helper.

> s = SteemData()
> [print(x) for x in s.__dict__.keys()]
db
Operations
AccountOperations
PriceHistory
Posts
Accounts

We can see a few properties whose names start with an uppercase letter. These give us easy access to the main SteemData collections. Alternatively, you can reach a collection via the db property.

s = SteemData()

# these two do the same thing
s.Accounts
s.db['Accounts']

Querying

If you're new to MongoDB, I highly recommend this querying guide.

I will only point out a few gotchas specific to SteemData.

Using Indexes

For best performance on your queries, make sure you're using indexed fields whenever possible. You can check out which fields are indexed by using index_information():

s = SteemData()
indexes = list(s.Operations.index_information())
print(indexes)

As you will find out, the most commonly queried fields are indexed: account/name, type, timestamp, identifier, permlink, author and memo, to name a few.
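Calling list() on the result of index_information() yields index names rather than field names. As a small sketch, the hypothetical helper below pulls the field names out of the dictionary PyMongo returns, where each index has a 'key' list of (field, direction) pairs. The sample data is made up for illustration and is not real SteemData output.

```python
def indexed_fields(index_info):
    """Return the sorted field names covered by any index in index_info."""
    fields = set()
    for spec in index_info.values():
        # Each 'key' entry is a list of (field_name, direction) pairs.
        for field_name, _direction in spec['key']:
            fields.add(field_name)
    return sorted(fields)


# Illustrative sample in the shape PyMongo returns.
sample = {
    '_id_': {'v': 2, 'key': [('_id', 1)]},
    'type_1_timestamp_-1': {'v': 2, 'key': [('type', 1), ('timestamp', -1)]},
}
print(indexed_fields(sample))  # ['_id', 'timestamp', 'type']
```

With a live connection this would be called as indexed_fields(s.Operations.index_information()).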

Using Projection

Projection returns only the fields you ask for, which makes queries a lot faster and saves bandwidth.

For example, if you're only interested in someone's followers, you can use projection to get only that field.

s.Accounts.find_one({'name': steemit_username},
                    projection={'followers': 1, '_id': 0})

This is similar to SELECT followers FROM accounts vs SELECT * FROM accounts in SQL.
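If you write many projections, a tiny helper can keep them consistent. This is only a sketch; make_projection is a hypothetical name, not part of the steemdata library. It excludes _id unless asked otherwise, because MongoDB returns _id by default.

```python
def make_projection(*fields, include_id=False):
    """Build a projection that returns only the named fields."""
    projection = {field: 1 for field in fields}
    if not include_id:
        # _id is returned by default unless explicitly excluded.
        projection['_id'] = 0
    return projection


print(make_projection('followers'))   # {'followers': 1, '_id': 0}
print(make_projection('name', 'sp'))  # {'name': 1, 'sp': 1, '_id': 0}

# With a live connection:
# s.Accounts.find_one({'name': 'furion'}, projection=make_projection('followers'))
```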

Using Limits

By default, a query returns every matching document. This makes queries run longer and wastes bandwidth, especially if you only need some results at a time (e.g. the top 100).

This is where limits come in. For example, to get the top 100 accounts by SteemPower:

q = s.Accounts.find({},
                    projection={'sp': 1, 'name': 1, '_id': 0},
                    sort=[('sp', -1)],
                    limit=100)
print(list(q))

Pagination

Following the above example, we can get the next 100 accounts (100-200) by using the skip argument:

q = s.Accounts.find({},
                    projection={'sp': 1, 'name': 1, '_id': 0},
                    sort=[('sp', -1)],
                    limit=100,
                    skip=100)
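The limit/skip arithmetic can be wrapped in a small function. The sketch below uses a hypothetical helper, page_args, that turns a zero-based page number into the keyword arguments shown above.

```python
def page_args(page, per_page=100):
    """Return limit/skip keyword arguments for a zero-based page number."""
    if page < 0 or per_page < 1:
        raise ValueError('page must be >= 0 and per_page must be >= 1')
    return {'limit': per_page, 'skip': page * per_page}


print(page_args(0))  # {'limit': 100, 'skip': 0}
print(page_args(1))  # {'limit': 100, 'skip': 100}

# With a live connection:
# s.Accounts.find({}, sort=[('sp', -1)], **page_args(1))
```

Keep in mind that the server still has to walk past every skipped document, so very deep pages are likely to get slower.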

Syntax Sugar

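If you'd like, you can also use method chaining instead of arguments for sorting, limiting and skipping. The projection still has to be passed to find(), since PyMongo cursors have no projection method. For example:

s.Accounts.find({}, projection=...).sort(...).limit(100).skip(100)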

Example

Let's wrap up with a practical example. The followers page on steemit is pretty bland: it only shows usernames. What if we could spice it up by displaying each user's profile picture, Steem Power, reputation and their own follower statistics? How would we obtain this data? Here is a function powering an API endpoint that does just that.

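from contextlib import suppress
from itertools import repeat

# `mongo`, `ParseError` and `NotFound` are defined elsewhere in the API application (not shown).

def busy_account_following(account_name, following):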
    """
    Fetch users followers or followings and their metadata.
    Returned list is ordered by follow time (newest followers first). \n
    Usage: `GET /busy/<string:account_name>/with_metadata/<string:following>`\n
    `following` must be 'following' or 'followers'.\n
    """
    if following not in ['following', 'followers']:
        raise ParseError(detail='Please specify following or followers.')

    acc = mongo.db['Accounts'].find_one({'name': account_name}, {following: 1, '_id': 0})
    if not acc:
        raise NotFound(detail='Could not find STEEM account %s' % account_name)

    # if follower list is empty
    if not acc[following]:
        return []

    allowed_fields = {
        '_id': 0, 'name': 1, 'sp': 1, 'rep': 1, 'profile.profile_image': 1,
        'followers_count': 1, 'following_count': 1, 'post_count': 1,
    }
    accounts_w_meta = list(mongo.db['Accounts'].find({'name': {'$in': acc[following]}}, allowed_fields))

    # return in LIFO order (last to follow is listed first)
    accounts_ordered = list(repeat('', len(acc[following])))
    for a in accounts_w_meta:
        with suppress(ValueError):
            accounts_ordered[acc[following].index(a.get('name', None))] = a
    return [x for x in accounts_ordered if x][::-1]
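The reordering step at the end of the function can also be written with a dictionary lookup, which avoids calling index() once per account. This is an alternative sketch, not the code running behind the endpoint; newest_first is a hypothetical name, and it assumes each name appears only once in the follow list.

```python
def newest_first(follow_list, accounts_w_meta):
    """Order account documents by position in follow_list, newest first."""
    by_name = {a['name']: a for a in accounts_w_meta if 'name' in a}
    # Walk the follow list backwards; skip names with no matching document.
    return [by_name[name] for name in reversed(follow_list) if name in by_name]


follows = ['alice', 'bob', 'carol']  # oldest follow first
docs = [{'name': 'carol', 'sp': 3}, {'name': 'alice', 'sp': 1}]
print(newest_first(follows, docs))
# [{'name': 'carol', 'sp': 3}, {'name': 'alice', 'sp': 1}]
```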

Digging Deeper

If you'd like to learn how SteemData Charts work behind the scenes, feel free to download and run this iPython Notebook.
It should give you some ideas of what SteemData can be used for, and it provides a quick way to start playing with the code and writing your own.


Exciting stuff. Thank you for all your work on this. An ER diagram would be very helpful. I would like to see the internal progress of the tables and the foreign keys.

Right now the structure is completely flat (as MongoDB is a document-based database, it is very flexible in structure and nesting). I will be adding links between collections in the future.

So basically, there are these collections, without relationships between them (yet):

Operations
AccountOperations
PriceHistory
Posts
Accounts

I'm trying to kickstart the use of the #steemdev tag. Your post would fit well in that category.

awesome, thank you. I've added the tag.

I am not sure I understand it correctly.. Is SteemData a db interface to steem blockchain? Can it read data from and write to locally running steem node?

if i understand correctly, it's just an independent database, where the data on blockchain is continuously being copied into it.

that is correct

Thanks for all your work.

Well, considering that I haven't done any real programming since the days of QBasic and DOS... I kind of follow this. Lol, I just might need to brush up my skillset to really understand it.

This is great stuff. Thanks @furion

Hey @furion, is there a way to query the mongoDB using SQL? I'm not familiar with the mongoDB shell or python, etc...

Hey,

I try to not be annoying but I'm not that smart so I need to ask questions.

Why is it that some "active_votes.rshares" are stored as strings and some as integers? Any way you could fix that? It really messes with the results when I try to get the most upvoted post etc.
