One thing I get asked for a lot on steemstats.com is historical data. The problem I ran into was in the way steemstats was designed - everything is loaded in real time from the blockchain. I needed a way to store and retrieve data based on time and date - steemdb.com (steem database) is the concept I ended up with, which will allow all of us to use this data.
Before I go further, this post will be split into two parts: the first part is for everyone and outlines the benefits steemdb.com brings to steem users; and the second part for the developers interested in learning and/or contributing.
Also - This is a super early alpha, it might crash :)
Introducing steemdb.com, the website
steemdb is about being able to look at data in different ways, as well as back historically in time, using techniques that aren't currently available through the steem blockchain. It also serves as a playground for new ideas and experiments. It's a block explorer at heart, but doesn't feel like your typical explorer.
I'd invite you to check out my account overview here:
or you can even continue reading this post via steemdb:
Currently steemdb allows you to do the following things:
- View detailed account information + historical charts
- View detailed information about posts
- View the current witnesses + mining queue
- Search for accounts using the box in the upper right
- (Experiment #1) Browse posts by creation date, filter by tag, sorted by either votes or payouts
You currently can't do anything that requires a login - steemdb is all about exploration for the time being.
This post is going to get long, so for those of you who need a TLDR, I'd invite you to scroll through the small gallery of images below that show the various pages available.
The account view breaks down the different components of an account into separate tabs and tries to make the information easy to digest. You can look at anyone's account with this detail. It also charts and shows historical data for some areas of information to track changes over time. It's not meant to be a replacement for steemstats, but a tool that steemstats could link to so you can learn about others.
Much of this data will be available on steemstats.com in the near future. I'll also hopefully be providing some public APIs of the same thing for everyone else to use. This is all dependent on how optimized I can make the data and how much the infrastructure would cost to scale.
This was my take on how I'd want to explore the data behind a post. It's not interactive at this point, but already provides a bit more insight than we have here on steemit. It features the content broken out into a tabular view based on data aspect, and a sidebar that shows you about the author, other things they've written, and other meta information.
I've also included buttons on each post to quickly view it on steemit.com, just in case you discover something you'd like to vote on or respond to.
This section is a mirror of the witness page on https://steemd.com/witnesses. It displays information important to the witnesses and miners on the steem network.
Posts by Date (Experiment #1)
Example: https://steemdb.com (the current homepage)
This section was designed as an experimental view of data on the blockchain. This was my first attempt at organizing the data on steem in a more "content silo'd" fashion to see how it could be browsed.
I plan on doing a number of experiments on different ways to view the content in an attempt to help determine the best way to present it. My thought is that I'll be able to move a lot faster (since it's a database, not a blockchain) and help prototype what sort of things we may see here someday on steemit.com.
Currently, the posts page is set to:
- Grouping: It's grouped by the creation date. You only view one day at a time, and can go forwards and backwards in time.
- Sorting: Posts are sorted by payouts by default, which can be changed to votes.
- First Tag Priority: If you click on a tag, it will only show you posts where that is the first tag.
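To make those three rules concrete, here's a minimal sketch in plain Python. It is not the code steemdb runs: the post records and their field names (`created`, `tags`, `votes`, `payout`) are made up for illustration, and the real site queries MongoDB rather than an in-memory list.

```python
from datetime import datetime

# Hypothetical post records; field names and values are illustrative only.
POSTS = [
    {"title": "A", "created": "2016-08-20T10:00:00", "tags": ["steem", "dev"], "votes": 12, "payout": 40.0},
    {"title": "B", "created": "2016-08-20T14:30:00", "tags": ["dev", "steem"], "votes": 30, "payout": 15.5},
    {"title": "C", "created": "2016-08-21T09:15:00", "tags": ["steem"], "votes": 5, "payout": 2.0},
    {"title": "D", "created": "2016-08-20T23:59:00", "tags": ["steem"], "votes": 3, "payout": 90.0},
]


def posts_for_day(posts, day, sort_by="payout", tag=None):
    """Return the posts created on `day` (YYYY-MM-DD), highest `sort_by` first.

    If `tag` is given, only posts whose FIRST tag matches are kept.
    """
    if sort_by not in ("payout", "votes"):
        raise ValueError("sort_by must be 'payout' or 'votes'")

    selected = []
    for post in posts:
        created = datetime.strptime(post["created"], "%Y-%m-%dT%H:%M:%S")
        # Grouping: one calendar day at a time.
        if created.strftime("%Y-%m-%d") != day:
            continue
        # First tag priority: later tags on a post are ignored.
        if tag is not None and (not post["tags"] or post["tags"][0] != tag):
            continue
        selected.append(post)

    # Sorting: payouts by default, votes on request.
    return sorted(selected, key=lambda p: p[sort_by], reverse=True)


if __name__ == "__main__":
    for post in posts_for_day(POSTS, "2016-08-20", tag="steem"):
        print(post["title"], post["payout"])
```

Note that post B is tagged "steem" but is left out of the "steem" view, because "steem" is its second tag.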
It's pretty interesting to go back in time and view what was popular on specific dates. It's even more interesting when you pick a tag, and then paginate by date. I've found a few posts that I never would have seen by doing this.
What's next for steemdb.com (the website)?
The short version is more historical data, charts, APIs and ways to view the data that exists on the blockchain.
One of the biggest goals with steemdb is to help people learn about all the various aspects of the steem platform. steemstats.com has the same goal, but just from a very different angle. steemstats.com is about someone specific, and steemdb.com is about everyone in general.
Over time, I think that these two projects will integrate with each other more and more.
The steemdb platform will also be used to show the community alternate ways that the content here on steem can be presented. I plan on trying out a bunch of interfaces you're likely familiar with to see what works best and what the community would like.
I feel the best way to actually garner feedback is with a prototype, so I plan on delivering them, hopefully with the help of others!
Speaking of which...
Introducing aaroncox/steemdb - the code
Let me start by saying I most definitely followed the mantra of "do things that don't scale" for this prototype. If I never updated this code ever again, it wouldn't be scalable to the size that I believe steem can achieve. In fact I don't even know how the servers will hold up with the announcement and general usage of the site. Gotta love these alpha versions!
It's also not really that easy to use yet! The reasons for this are:
- It's 2 weeks old
- I used technologies I didn't have to learn
- I'm learning the best ways to retrieve the data from the blockchain
Just like steem itself, if you're going to get involved, you're going to have to be able to figure out how things work without documentation (for now). Learn to ask questions - lots of them.
I'd invite you to join the #steemdb channel on steemit.chat, as I'll try to focus any interested parties there. If you join to promote your posts... you're gonna get the evil eye.
The technologies behind steemdb currently
The technology stack was chosen based on technologies I've used recently, not what's best. I'm still working full time on other projects and steem is still only a portion of my time each week. So if I needed a hammer, and all I could find was a rock, I used it.
That doesn't mean these technologies can't change and be adapted. Remember, it's an alpha and an experiment...
Currently these are the components that make up the project:
- Data Storage: MongoDB. Say what you will about it, but I was able to throw blocks of json at it, index it, and then begin querying/aggregating it.
- The Website: Powered by PHP7/nginx, using the Phalcon 3 framework, and a small collection of composer packages. On the frontend it uses semantic-ui for UI and plottable.js for charts. It connects to MongoDB as well as directly to the steem blockchain (very lightly).
- The Services: Written in Python3, and powered by Piston, these services act as the synchronization tools between the blockchain and the database.
All of this is encapsulated in a variety of docker containers that's controlled through
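For anyone wondering what "synchronization" means in practice, here's a rough sketch of the idea in plain Python. It is an assumption-heavy illustration, not the actual service code: `get_head_height`, `get_block` and the dict-based `store` are hypothetical stand-ins for the blockchain client and the database, and no Piston or MongoDB calls are shown.

```python
def sync_blocks(get_head_height, get_block, store, batch_limit=1000):
    """Copy blocks that are missing from `store` up to the current chain head.

    `store` maps block height -> block data (a stand-in for the database).
    At most `batch_limit` blocks are copied per call.
    Returns the number of blocks copied.
    """
    last_synced = max(store) if store else 0
    head = get_head_height()

    # Never go past the head, and never go backwards.
    target = max(last_synced, min(head, last_synced + batch_limit))

    for height in range(last_synced + 1, target + 1):
        store[height] = get_block(height)

    return target - last_synced


# A fake five-block chain so the sketch can run without a steemd node.
FAKE_CHAIN = {n: {"height": n, "transactions": []} for n in range(1, 6)}

if __name__ == "__main__":
    database = {}
    while sync_blocks(lambda: len(FAKE_CHAIN), FAKE_CHAIN.__getitem__, database, batch_limit=2):
        print("synced up to block", max(database))
```

Resuming from the highest stored block means the service can be stopped and restarted without starting over, which matters when the first sync takes hours.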
Future direction of the project
Since this is the first release, a firm direction hasn't been set in stone. I'd like to see a number of improvements, as well as some optimizations, to really firm up its core.
Right now off the top of my head, I believe its immediate future includes:
- Some build processes to help manage js/css dependencies. It's very manual right now. You'll actually find a nice symlink from the public folder into the bower components.
- Cleaning up the synchronization scripts so they don't store quite so much data. Right now there's a lot of duplicate data, including posts, so the database is huge. Luckily disk space is cheap right now, and the project is still rather limited.
- The charts aren't in the best of places. The JS is actually stored in a volt file simply because docker was fighting me by corrupting my JS files.
- Most of the aggregation queries should be moved out of the controllers and into the appropriate models for reusability. There's very little duplicate code right now, but it might get out of hand if that pattern continues.
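As a rough idea of what that last item could look like, here's a sketch of a model that owns its aggregation pipeline so any controller can reuse it. The site itself is PHP, so this Python version only illustrates the pattern. The class name, collection name and field names (`author`, `created`, `payout`) are hypothetical, and it assumes `created` is stored as a date.

```python
class PostModel:
    """Hypothetical model that keeps post aggregation queries in one place."""

    collection_name = "posts"  # assumed collection name

    @staticmethod
    def daily_payouts_pipeline(author, limit=30):
        """Build a MongoDB aggregation pipeline: payout totals per day for one author."""
        return [
            # Only this author's posts.
            {"$match": {"author": author}},
            # Bucket by calendar day and total up the payouts.
            {"$group": {
                "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created"}},
                "total_payout": {"$sum": "$payout"},
                "posts": {"$sum": 1},
            }},
            # Newest day first, capped at `limit` days.
            {"$sort": {"_id": -1}},
            {"$limit": limit},
        ]


if __name__ == "__main__":
    for stage in PostModel.daily_payouts_pipeline("some-author", limit=7):
        print(stage)
```

With a driver such as pymongo, the returned list could be passed straight to the collection's `aggregate()` method. Keeping the pipeline in the model means a chart and an API endpoint can share the same query.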
There are also some loftier goals of removing the frontend from PHP, and using something like react. This would make the PHP layer purely an API for the frontend to interact with.
How do I run the damn thing?
I will get a full development environment guide up in the coming weeks. It's going to take a LOT of disk space and some patience for the setup. You're going to have to build your own entire database version of the 4+ million blocks in the blockchain. Currently, completely unoptimized, it's consuming 21.943GB of disk space on my server.
If this is something you're interested in - the first step I'd take in getting started is to set up a local steemd instance and synchronize the entire blockchain to a local computer/server. It just so happens I wrote a guide on a web instance of steemd a few weeks ago. I'd start there.
The first step of running steemdb is going to be letting the services run for like 6-12 hours as it creates your database.
Wrapping things up
I'm pretty excited about this project. Being able to query the database for pretty much any bit of information is like having a super power. I will do my best to keep the community up to date on changes, digest feedback, and help shape it into another great tool for the community to use.
Philosophically, steemdb is what I'd consider to be the 3rd and final part of the trinity of projects I've been working on:
- steemstats: All about you and your data
- steemdb: Aggregate information about everyone and everything
- steempress: Taking your data and letting you start your own website
These projects combined will hopefully form a powerful open source combination to help contribute to the overall steem ecosystem. steempress is up next in the list of priorities, with two new themes to implement and a few projects that have expressed interest in using it to power their ideas.
As a current developer, backup witness, writer and normal guy - I hope to continue being a meaningful part of this community far into the future. Thank you all for your support!