Block.ops - An Analysis Tool

For the last year I have carried out a monthly analysis of the Steem blockchain activity by application (i.e. by the different websites and applications through which you can post to the Steem blockchain - often termed dApps).

My aim is now to build a tool that can automate such complex analyses of Steem data, providing both historic time series and rapidly updatable real-time results. In addition to the dApp analysis there are many other projects for which such a system could be useful.

This tool is Block.ops.

[Image: gears_blockops.jpg]

Repository

https://github.com/miniature-tiger/block.ops

New Project - Block.ops

What is the project about?
  • Block.ops is a tool that will allow automation of complex analyses of Steem data.
  • The data will be sourced directly from the Steem blockchain through the AppBase API. This data will be filtered and formatted into the desired structure and then stored in a MongoDB database.
  • Initially the Block.ops project will focus on historic analysis but the roadmap also includes the build of real-time results, updating as new Steem blocks are created.
  • New features from this initial contribution are described in the section below.
Technology Stack
  • Block.ops is being built with JavaScript, Node.js, the AppBase API and MongoDB.
  • This is my first project with the latter three of these technologies so it will be a learning experience!
Roadmap
  • The short-term roadmap is:
    (1) Build Block.ops to automatically produce the monthly analysis of the Steem blockchain activity by application, including author, post, and payout statistics and rankings.
    (2) It should be possible to generate analyses for any chosen date range, not just monthly.
    (3) Consider, map and build more granular analyses by application and by user, including time-series.
    (4) Build automated charts and graphs to illustrate the data.
    (5) Consider which other data from the block operations data are to be stored (votes, follower operations, transfers etc) and which other analyses are to be built.
    (6) Once the mappings are defined, fill the database. This will take some time but can be done gradually if necessary; once the data has been loaded for an analysis of one month, it is then available for all other historic analyses, assuming the data is stored in the required format.

  • The longer-term roadmap is still to be defined but is likely to include:
    (1) A front-end UI for more manageable launching of analyses and reading of results.
    (2) Build for production of real-time results.
    (3) Production of API for particular analyses.

How to contribute?
  • If you are interested in this project you can contact me as miniature-tiger on discord.

New Features - Initial contribution

Creation of blockDates index

The Steem blockchain operations data (posting data, rewards, voting etc) can be extracted from each block of the blockchain. However, analyses are typically carried out by date rather than for a certain number of blocks. The first task is therefore to create an index of blockDates by finding the first block of each day (in UTC time).

The index is created starting from the first block. It then moves forward a day at a time, estimating where the first block of each day should be (based on three second blocks), adjusting and re-estimating (due to dropped blocks) and validating by comparing the timestamp of the chosen block and the immediately prior block. There is a workaround for dates with lots of blocks missing.
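The estimate-and-validate step above can be sketched as follows. This is a simplified illustration, not the actual Block.ops code: it assumes a hypothetical helper that can fetch a block's timestamp, and only shows the pure estimation and validation logic.

```javascript
// Sketch of the day-boundary search. Block numbers here are
// illustrative; in practice a helper would fetch each candidate
// block's timestamp from the chain and the loop would re-estimate
// until validation passes.

const BLOCK_SECONDS = 3;

// Estimate the block number of the next UTC midnight from a known
// (blockNumber, timestamp) pair, assuming no dropped blocks.
function estimateNextMidnightBlock(blockNumber, timestamp) {
  const nextMidnight = new Date(Date.UTC(
    timestamp.getUTCFullYear(),
    timestamp.getUTCMonth(),
    timestamp.getUTCDate() + 1)); // Date.UTC rolls over month ends
  const secondsToGo = (nextMidnight - timestamp) / 1000;
  return blockNumber + Math.round(secondsToGo / BLOCK_SECONDS);
}

// A candidate block is validated as the first block of its day when
// its timestamp and the immediately prior block's timestamp fall on
// different UTC days.
function isFirstBlockOfDay(candidateTs, priorTs) {
  return candidateTs.getUTCDate() !== priorTs.getUTCDate()
      || candidateTs.getUTCMonth() !== priorTs.getUTCMonth();
}
```

Because of dropped blocks the first estimate usually lands slightly late, so the real search re-estimates from the fetched timestamp and repeats until the validation check passes.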

This commit also includes the construction of various steem API functions for accessing AppBase. I have toyed with both Steem-js and dSteem but in the end I have built my own functions (from scratch and then using generic npm modules request and request-promise-native). There's no overwhelming reason for this choice, other than this being my first project of this type and I like to learn from the ground up.
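A minimal sketch of such a hand-rolled AppBase call is below. This is my interpretation rather than the exact Block.ops code: the node URL is an assumption (any public AppBase node works), and the payload follows the JSON-RPC 2.0 convention that AppBase uses.

```javascript
// Build a JSON-RPC 2.0 payload for an AppBase condenser_api method.
function buildRpcPayload(method, params, id = 1) {
  return { jsonrpc: '2.0', method: `condenser_api.${method}`, params, id };
}

// Fetch a single block via request-promise-native (the npm module the
// post settled on); resolves with the block object, or undefined if
// the block does not exist yet.
async function getBlock(blockNumber) {
  const rp = require('request-promise-native'); // npm module, per the post
  const response = await rp({
    method: 'POST',
    uri: 'https://api.steemit.com', // assumed public AppBase node
    body: buildRpcPayload('get_block', [blockNumber]),
    json: true, // serialise the body and parse the response as JSON
  });
  return response.result;
}
```

One nice property of this approach is exactly the transparency mentioned below: the full request body and endpoint are visible in one place, with no library indirection.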

There is also a report that checks whether the blockDates setup has completed as desired. It looks like this:

[Image: blockDates report.png]

As might be expected with 3 second blocks, virtually all the first blocks of each day are at midnight. The two exceptions are the first block (24 March 2016 16h05) and 1 June 2017 (3 seconds past midnight).

Almost all days have some dropped / missed blocks in comparison to the expected number based on 3 second intervals. I'd be interested in understanding how this happens if anyone can point me to a good explanation. Generally the number is small but occasionally there are days with significant outages. Well, you can't make omelettes without breaking a few eggs.

The code is here:
https://github.com/miniature-tiger/block.ops/commit/adea42f91017cde8270d4f4c3db8457b0515ad5d

Block Operations Loop and Comment Analysis

The second main commit covers:
(1) The creation of the loop to pull consecutive blocks of operations from the blockchain based on date parameters;
(2) The analysis and formatting of comment operations data from each block, including separating out the dApp information, and inserting the individual records into the MongoDB;
(3) A report on the market share of each application by comment numbers.

Aside from comments (which include posts), all other block operations are currently ignored. More functionality will be added in the next contribution.
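The per-block comment processing can be sketched like this. It assumes each block's transactions carry operations shaped like `['comment', {...}]` and that the dApp name lives in the comment's json_metadata "app" field (e.g. "steemit/0.1"); the record field names are illustrative, not the actual Block.ops schema.

```javascript
// Pull the application name out of a comment operation's json_metadata.
function extractApp(jsonMetadata) {
  try {
    const meta = JSON.parse(jsonMetadata);
    if (typeof meta.app === 'string') return meta.app.split('/')[0];
  } catch (e) { /* malformed metadata is common on-chain */ }
  return 'unknown';
}

// Reduce one block to an array of comment records ready for insertion
// into MongoDB (e.g. via collection.insertMany(records)).
function commentRecordsFromBlock(block, blockNumber) {
  const records = [];
  for (const tx of block.transactions) {
    for (const [opName, op] of tx.operations) {
      if (opName !== 'comment') continue; // all other operations ignored
      records.push({
        blockNumber,
        author: op.author,
        permlink: op.permlink,
        application: extractApp(op.json_metadata),
        timestamp: block.timestamp,
      });
    }
  }
  return records;
}
```

The market-share report is then just an aggregation over the `application` field of the stored records.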

The report looks like this. Already looking good! More work will be required to classify applications between dApps that people can actually use, bots, and libraries.

[Image: MarketshareComments.png]

The code is here:
https://github.com/miniature-tiger/block.ops/commit/70641d4048dbfc2578fea3125d1870b7338cbdd8

Capturing blocks processed / Date range parameters

The third main commit provides two main additions:

  • Work on parameters allowing the date range to be defined either by two dates or by a single date plus a number of blocks to process (the latter option is for testing, where you only want to run a small number of blocks).
  • The capture of which blocks have been processed, with status "OK" or "error", and the reporting of blocks processed between date ranges. This will be used in future to allow blocks dropped in error to be reprocessed and to prevent reprocessing of previously analysed blocks when longer date ranges are chosen.
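The bookkeeping described in the second bullet can be sketched as follows: each processed block gets a status record, and a report over a block range then shows what succeeded, what errored, and what was never attempted. The field names are illustrative, not the actual Block.ops schema.

```javascript
// Create a status record for one processed block.
function statusRecord(blockNumber, ok) {
  return { blockNumber, status: ok ? 'OK' : 'error' };
}

// Summarise a contiguous block range against stored status records,
// so that errored or never-attempted blocks can be found and
// reprocessed later without touching blocks already marked "OK".
function reportRange(firstBlock, lastBlock, records) {
  const byNumber = new Map(records.map(r => [r.blockNumber, r.status]));
  const summary = { OK: 0, error: 0, missing: 0 };
  for (let b = firstBlock; b <= lastBlock; b += 1) {
    summary[byNumber.get(b) || 'missing'] += 1;
  }
  return summary;
}
```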

The report looks like this:
[Image: Reportblocksprocessed.png]

GitHub Account

My account on github is here:
https://github.com/miniature-tiger


That's a great kickoff. Good to see the magic inside your analysis posts.

I see that you don't use dsteem or steemjs and create rpc requests yourself. Is there a specific reason for that? Do they create any overhead?



Thanks Emre!

For the rpc requests I played with http from node and also dsteem and steemjs before settling on request and request-promise-native. There's no overwhelming technological reason, but as it's the first project I've done with Node.js and Steem, I wanted to understand the basics and try building things from scratch.

That said, I do like the transparency of request - it's pretty clear what each call is doing and where it's pointing. But I'm sure I'd get comfortable with the others if I used them more often.

Thank you for your review, @emrebeyler!


It occurs to me that this is exactly the sort of project that I would find extremely useful for a number of analyses that I would like to be doing but can't now that SteemData has gone the way of all flesh. I much prefer working with MongoDB than any of the alternatives currently accessible.

It also occurs to me that this project might profit substantially from real bifurcation: first focusing on a simple, straightforward way to pull content from the stream into a MongoDB and having that working as an easily deployable standalone application, followed by analysis tools which speak to the MongoDB established by that first application.

For myself, I don't really need a set of tools implemented for me which can dig around in and analyze the database contents; I wrote a pile of Python code publicly a few months ago which would do that well enough, at least for my needs.

Getting blockchain contents into a database, on the other hand – that's more than slightly challenging. The method used by SteemData was open source and I appreciate that fact, but it was nontrivial.

This commit also includes the construction of various steem API functions for accessing AppBase. I have toyed with both Steem-js and dSteem but in the end I have built my own functions (from scratch and then using generic npm modules request and request-promise-native). There's no overwhelming reason for this choice, other than this being my first project of this type and I like to learn from the ground up.

Have you looked at beem? @holger80 has done an amazing job of building a very solid interface to the steem blockchain, accessible through Python. The Discord channel associated with the project is extremely well attended and I've seen some very insightful analysis of what are effectively some bugs in the methodologies implemented in the working specification of the blockchain itself come to light because of the way that he is working with unit tests and because of the projects that are already being made with beem.

Almost all days have some dropped / missed blocks in comparison to the expected number based on 3 second intervals. I'd be interested in understanding how this happens if anyone can point me to a good explanation. Generally the number is small but occasionally there are days with significant outages.

If anyone can answer this question, it's those guys.

I'm looking forward to seeing where the project goes, purely for my own self-interest, and I think there's a lot of potential for this to become very interesting.

Thanks @lextenebris, that's some thoughtful advice!

My own interest in carrying out the project is mainly for the production and automation of heavy / complex analyses that are difficult with the existing systems. A bespoke tool rather than an all-encompassing archive like steemsql or steemdata. That said, I do plan for it to be pretty flexible so that people can use it for different purposes. But there are serious volume considerations now with the full Steem data so I'm looking more along streamlined lines.

It also occurs to me that this project might profit substantially from real bifurcation: first focusing on a simple, straightforward way to pull content from the stream into a MongoDB and having that working as an easily deployable standalone application, followed by analysis tools which speak to the MongoDB established by that first application.

The coding does naturally split along those lines so it should be possible to separate off the first requirement as a standalone tool without difficulty. I don't expect it will take too long to build overall so hopefully that could be ready fairly soon. I will need to take a holistic approach to including the various parts as I build it though, simply because it's my first time using node.js, steem and Mongo so I'll be working out what works and what doesn't as I go along. But it will be modular, so splitting should be simple.

Have you looked at beem?

I have heard good things of beem! But I understand that it's Python-based so not something I can work with currently. Next year! I will check in with them on the dropped blocks though. Thanks for the hint!


I would like to give a suggestion on including package.json file in your repository, and you can declare node version in package.json also.

{ "engines" : { "node" : ">=10.9.0" } }

Thanks @superoo7. That's a great tip!

I'll admit to still being a beginner with node.js and npm and whilst I have a basic grasp of package.json it's an area I need to research to fully understand it. Lots still to work on!

Sure, everything has a start. Good luck with that :)

This isn't my area at all but I like getting the numbers in and anything that makes it easier for those who explore to do their job is a good idea :)

I think that the more we understand the data and how people are actually using the blockchain, the better the chance to make it all work and grow. But the analyses necessary can be complex since there are so many factors. The more tools we have in this area the better!

From my understanding, Hivemind brings new possibilities too, so I am expecting 'you blockchain snoops' to bring us a whole range of new views. =)
