Fixing Steem With Baby Steps Part 10: Database API Necessities

in utopian-io •  16 days ago



I've been putting a lot of hours into figuring out the Steem API and JavaScript. I've started a lot of projects but haven't finished anything, which is pretty annoying. Oftentimes I get stuck, and when I can't find the answer I jump somewhere else. In the process of doing something else I come across information that solves the original problem. I bounce back and forth a lot. Good for learning; terrible for completion. Once I learn more it should get better.

In any case, I've learned some things about the Steem API, and I am thoroughly unimpressed. The functionality is sub-par and the documentation is even worse. Oftentimes I run into a problem that I think should have an easy solution, so I spend a lot of time looking for that answer. More often than not the functionality doesn't exist, or the documentation is so bad that I can't find it. That's the worst part: not knowing.

I was patient and I allowed HF20 to go through. I was told there would be improvements to documentation. If there are any, I'm unaware of them. The API docs page looks almost exactly the same. This is really frustrating because I know that I'm just gonna have to sit on my hands and wait until SMTs get finished. Only then will Steemit Inc care about their API, because they are expecting all the SMT projects to use it.

However, I'd like to get a head start and offer some suggestions for the API. There are a lot of basic things that could be changed that would offer a lot of functionality.

Proposed Changes

First off, I'd like to point everyone's attention to the Discussion Object. A Discussion is a blog post or a comment; clearly the foundation of the blockchain. When you query https://api.steemit.com for a particular Discussion this is what you get:

Okay... so this is a blockchain... do you see the word block anywhere? No. So how am I supposed to figure out which block this discussion was posted/rewarded on? There are some dates:

  • active
  • cashout_time
  • created
  • last_payout
  • last_update
  • max_cashout_time

This would lead one to believe that there is a function that allows you to retrieve a block using a date. Spoiler alert! There isn't one!

So now I feel compelled to create a hack that downloads the latest block, records the date, and then extrapolates from that data to find the block I'm looking for. Just the thought of doing this enrages me because that is a basic function that should be displayed front and center for any programmer to see. Take a look at the function list. How is anyone supposed to navigate that garbage cluster?
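The extrapolation itself is simple arithmetic. Here is a minimal sketch, assuming a steady 3-second block interval and no missed blocks, so the result is only a first guess. The function name and its arguments are hypothetical, not part of the Steem API:

```python
from datetime import datetime

BLOCK_INTERVAL_SECONDS = 3  # assumed constant block time


def estimate_block_num(head_num: int, head_time: datetime, target_time: datetime) -> int:
    """Guess which block was produced at target_time, given the latest block."""
    seconds_back = (head_time - target_time).total_seconds()
    blocks_back = round(seconds_back / BLOCK_INTERVAL_SECONDS)
    # Clamp so the guess never falls below the first block.
    return max(1, head_num - blocks_back)
```

Any skipped block slots would throw the guess off, so in practice the estimate would still need to be checked against the timestamp of the block it points to.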

I would make the argument that recording all these dates is ridiculous. It's a blockchain! Dates don't matter! It's the blocks that matter. Put the date information with the blocks, not the discussions... duh.

Slightly less embarrassing are the naming conventions. When you have objects with this many properties you need to sort them alphabetically.

Old Name            New Name
active              date_active
cashout_time        date_cashout
created             date_created
last_payout         date_last_payout
last_update         date_updated
max_cashout_time    date_max_cashout_time (why does this even exist?)

Again, properties like last_update shouldn't even exist. It should be a property called blocks_updated, and it should be an array of every block_num in which the Discussion was modified, starting with the block it was created on. This, in turn, also eliminates the need for block_created because that info is already stored in blocks_updated[0].


Naming convention problems don't end there. All properties that refer to vests and voting are cluttered as well.

Old Name               New Name
abs_rshares            rshares_abs
children_abs_rshares   rshares_abs_children
net_rshares            rshares_net
vote_rshares           rshares_gross

I just flagged my own post to confirm that vote_rshares is indeed the gross value of reward shares received. Does this make logical sense to anyone? Adding together flag rshares with regular upvote rshares? Why not just have rshares_upvote and rshares_flag??? In what universe should those two numbers be added together? I'm suddenly reminded of an old bug I read about that counted large flags as upvotes... Wow, the foundation of bugs like that is rooted in the nonsensical organization we see here.
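To illustrate the proposed split, here is a rough sketch that derives separate upvote and flag totals from a Discussion's active_votes array. It assumes each vote entry carries a signed rshares value (negative for flags), possibly encoded as a string. The names rshares_upvote and rshares_flag are the ones proposed above, not fields the API currently returns:

```python
def split_rshares(active_votes):
    """Total upvote and flag rshares separately instead of mixing them."""
    upvote = 0
    flag = 0
    for vote in active_votes:
        rshares = int(vote["rshares"])  # assumption: signed, may arrive as a string
        if rshares >= 0:
            upvote += rshares
        else:
            flag += -rshares
    return {
        "rshares_upvote": upvote,
        "rshares_flag": flag,
        "rshares_net": upvote - flag,
        "rshares_gross": upvote + flag,
    }
```

With the two totals kept apart, the net and gross figures become derived values rather than separate properties that have to be reverse-engineered.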


Other random names I would change:

Old Name            New Name
parent_author       author_parent
parent_permlink     permlink_parent
total_vote_weight   vote_weight_total
root_title          title_root
active_votes        votes

There comes a point when you have so many properties in a single object that the goal should be to group like attributes rather than make the names more readable in the English language. I'm constantly scanning the objects of the Steem API wondering if I missed something because it's so disorganized.
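Since these objects come from a database layer and not from the chain itself, a client can apply the renames on its own side today. The following is a minimal sketch that uses the old-to-new names from the tables in this post and sorts the result alphabetically. The function name rename_discussion is hypothetical:

```python
# Old name -> proposed new name, taken from the tables in this post.
RENAMES = {
    "active": "date_active",
    "cashout_time": "date_cashout",
    "created": "date_created",
    "last_payout": "date_last_payout",
    "last_update": "date_updated",
    "max_cashout_time": "date_max_cashout_time",
    "abs_rshares": "rshares_abs",
    "children_abs_rshares": "rshares_abs_children",
    "net_rshares": "rshares_net",
    "vote_rshares": "rshares_gross",
    "parent_author": "author_parent",
    "parent_permlink": "permlink_parent",
    "total_vote_weight": "vote_weight_total",
    "root_title": "title_root",
    "active_votes": "votes",
}


def rename_discussion(discussion: dict) -> dict:
    """Apply the proposed names and return the properties in alphabetical order."""
    renamed = {RENAMES.get(key, key): value for key, value in discussion.items()}
    return dict(sorted(renamed.items()))
```

After the rename, every date sorts under date_, every reward-share value under rshares_, and so on, which is the grouping the paragraph above argues for.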

Sadly, I'm not even done with Discussions yet, because there is still a problem with the active_votes array. Besides the fact that it should obviously just be called votes (inactive votes aren't even sent to you, and if they were, they could go in votes_inactive), active_votes doesn't tell you where curation got rewarded.

This is a blockchain, and the very least Steemit Inc could do is make it easy to track where money on the blockchain is going. This is one of the most unacceptable oversights I've come across in my travels with Steemit Inc's blockchain database. I wanted to make a quick script to track curation rewards and realized I couldn't do it with Discussions.

No problem right? I'll just download the block that the Discussion was rewarded on. Oh wait! I can't do that either. This incompetence is downright embarrassing. For hours and hours I thought I was simply missing something. I was diving deep and finding all kinds of near-useless functionality while looking for this blatant oversight. Now I'm nearly convinced that it doesn't exist.

Blocks are pretty easy to navigate, but again, the documentation is bad. I was trying to match my account history to the block it was published on, and I couldn't figure it out. That's because the get_block() function I was using wasn't showing me the virtual transactions (mostly reward oriented). I eventually found the get_ops_in_block() function, which does give you all the operations (if you tell it to), but again, this is a failed naming convention issue. I would have found this function a lot faster if it were called get_block_ops() or get_block_operations().
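For reference, here is a rough sketch of what a get_ops_in_block request body might look like over JSON-RPC, plus a filter for curation rewards. The method name condenser_api.get_ops_in_block and the [block_num, only_virtual] parameter order reflect my understanding of the public API and should be checked against the docs. The assumed response shape (each entry holding an "op" pair of name and payload) is likewise an assumption, and both helper names are hypothetical:

```python
import json


def ops_in_block_request(block_num: int, only_virtual: bool = False, request_id: int = 1) -> str:
    """Build the JSON-RPC body for fetching every operation in one block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "condenser_api.get_ops_in_block",  # assumed method name
        "params": [block_num, only_virtual],          # assumed parameter order
        "id": request_id,
    })


def curation_rewards(ops):
    """Keep only the payloads of curation_reward operations."""
    # Assumption: each entry looks like {"op": ["curation_reward", {...}], ...}
    return [entry["op"][1] for entry in ops if entry["op"][0] == "curation_reward"]
```

The body would be POSTed to a node such as https://api.steemit.com. Sending the request is left out so the sketch stays free of network code.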

I want to say you should be able to request blocks (or block operations) with other information besides the block number. You should also be able to do it with the date or even the Discussion ID. However, this functionality would be completely unnecessary in the first place if there was simply the obvious solution of putting the block numbers in the Discussion object... seriously... mind blowing that it's not there.

There are other problems with the Account Object, but they are completely overshadowed by the ones rooted in the Discussion Object. Mostly just more naming convention atrocities and useless information. The nice thing is that these changes can be made without any change to the blockchain. The objects I'm referring to exist on Steemit Inc's centralized database so changes can be made without a fork or anything like that.

I feel like this is a very important thing to point out because even I was having trouble understanding what's going on. The Steem blockchain is actually very simple. Because we have a block every 3 seconds each one is pretty small. Mostly a block simply consists of a list of transactions that occurred on that block. Posts, comments, upvotes, rewards, and a bunch of random stuff you'll never have to worry about.

However, the real complexity comes into play when you try to interpret the blockchain. There is no such thing as a Discussion on the blockchain. You can only know how many upvotes a post got if you have access to every block on the chain. The way to get around this fact is to request the information from the Steemit Inc database or from some other source that has access to every block. Therefore, I don't have a problem with the Steem blockchain; I have a problem with how Steemit Inc is interpreting the blockchain and how they choose to organize that information when they send it out into the world.
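To make "interpreting the blockchain" concrete, here is a minimal sketch of counting a post's votes by replaying operations in chain order. It assumes the operations have already been extracted as (name, payload) pairs and that vote payloads carry voter, author, permlink, and weight fields. The function name tally_votes is hypothetical:

```python
def tally_votes(operations, author, permlink):
    """Replay vote operations and count the current upvotes and flags on one post."""
    latest_weight = {}
    for name, payload in operations:
        if name == "vote" and payload["author"] == author and payload["permlink"] == permlink:
            # A later vote by the same voter replaces the earlier one.
            latest_weight[payload["voter"]] = payload["weight"]
    upvotes = sum(1 for weight in latest_weight.values() if weight > 0)
    flags = sum(1 for weight in latest_weight.values() if weight < 0)
    return upvotes, flags
```

Getting the right answer requires feeding in every operation since the post was created, which is exactly why a full node or a database that has already indexed the chain is needed.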

This has some interesting implications. Have you ever wondered why you're allowed to edit a post? Blockchains are immutable, after all. If you're like me, you wondered why an edit costs just as much RC as the actual post. That's because an edit IS an entirely new post. Every time you edit something Steemit Inc ignores the old info and only displays the new, but the old info is still there, forever immutable. This means we can go back and look at comments people edited and see what they wrote on every iteration.
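Recovering those iterations could look roughly like the sketch below. It assumes operations are available as (name, payload) pairs in chain order and that posts and edits both appear as comment operations with author, permlink, and body fields. I believe some front ends submit an edit as a patch against the previous text instead of the full body; this sketch does not handle that case. The function name edit_history is hypothetical:

```python
def edit_history(operations, author, permlink):
    """Return every body written for one post or comment, oldest first."""
    return [
        payload["body"]
        for name, payload in operations
        if name == "comment"
        and payload["author"] == author
        and payload["permlink"] == permlink
    ]
```

The first entry is the original text and the last entry is what a front end would normally display.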

If I was so inclined, I could create my own Node and API, but I think @fulltimegeek is more qualified in that area and has a head start. Besides, I'd rather be working on my other projects.

In the near future I'll be creating that hack that lets one retrieve a block using a date; a function that should clearly already exist. It's going to be inefficient because I don't have my own node, but it will work. It will simply require 2 or more calls to the Steem API instead of one.
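One possible shape for that hack is sketched below, under the assumption of a 3-second block interval. Here get_block_time is a hypothetical callable that would wrap a get_block() request and return the block's timestamp, so each loop iteration costs one API call. The function name and parameters are illustrative only:

```python
from datetime import datetime, timedelta


def find_block_by_date(target, head_num, get_block_time, interval=3, max_calls=20):
    """Home in on the block produced at `target` by repeated extrapolation."""
    guess = head_num
    for _ in range(max_calls):
        guess_time = get_block_time(guess)  # one API call per iteration
        offset = round((target - guess_time).total_seconds() / interval)
        if offset == 0:
            return guess
        # Stay within the range of blocks that actually exist.
        guess = min(head_num, max(1, guess + offset))
    # Give up after max_calls and return the last guess.
    return guess
```

When no slots were missed between the guess and the target, the second call already lands on the right block. Each missed stretch costs extra calls, and a target that falls inside a gap with no block can bounce between neighbours until max_calls runs out.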

Steem on Steemians. The blockchain might be a primitive underdeveloped unprofessional nightmare, but it's our primitive underdeveloped unprofessional nightmare.

Ownership = Potential


Steemit Virtual Government
Part 1: The Vault
Part 2: Improved Filters
Part 3: Resteeming
Part 4: Hot and Trending Tabs
Part 5: Permanent Vesting
Part 6: Post Order Priority
Part 7: Upgrade @null
Part 8: Official Android Support
Part 9: SBD


SteemSQL may be of help in tracking down what happens in the blockchain. It is not very well documented, either. But finding out what has happened in the blockchain is a breeze with it compared to the API. It might be easier to cross-reference and check things using SteemSQL than to conduct experiments like flagging your own posts.


I tried to use SteemSQL and it broke my computer. On a side note: fuck Microsoft.

I don't understand why no one has created a MySQL database yet. I was actually thinking of doing this myself. I've already put a lot of time into creating my own Texas Holdem Poker database, so I have a little experience.


My biggest problem with SteemSQL is that my password stopped working in the middle of a paid subscription.


Oh... drag.


Sorry to say, but I do not see any subscription to SteemSQL with your account!


How could SteemSQL break your computer? It's a cloud service.

"I don't understand why no one has created a mySQL database yet"

There were some initiatives to do so, but they all gave up.

SteemSQL runs on high-end, enterprise-class infrastructure, and the database is close to 1TB, mainly because of the numerous indexes you need to create if you want good performance when you are hammered with thousands of queries every day.


I downloaded Microsoft SQL and it busted me pretty hard. I forget exactly what happened, but I assume the issue was due to me already having installed MySQL or because I have a Linux / Windows 10 dual-boot setup.


If you need a light but still powerful client for MS-SQL, give HeidiSQL a try. I often use the portable version to do some tests and really love it.