Bug on Hivemind’s following data
Project Information
- Repository: https://github.com/steemit/hivemind
- Project Name: Hivemind
- Publisher: Steemit inc.
- Related issue at Github: https://github.com/steemit/hivemind/issues/191
Problem
The Hivemind-backed api.steemit.com reports invalid or missing following data for some accounts when compared with a full node.
How to reproduce
- Query the user curbot's following list (condenser_api.get_following):
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://api.steemit.com
- Do the same query on a full node: (https://rpc.usesteem.com)
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://rpc.usesteem.com
The response from api.steemit.com is different and incomplete.
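The same comparison can be scripted without any Steem library. The sketch below is only an illustration and not part of the original investigation: it posts the same JSON-RPC payload as the curl commands above and diffs the returned names. The helper names `fetch_following` and `diff_following` are made up for this example, and it assumes each result entry is an object with a `following` key.

```python
import json
from urllib.request import Request, urlopen


def fetch_following(node_url, account, limit=100):
    """POST the same condenser_api.get_following call as the curl commands."""
    payload = {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_following",
        "params": [account, None, "blog", limit],
        "id": 1,
    }
    request = Request(
        node_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("result", [])


def diff_following(hivemind_result, full_node_result):
    """Return (missing_on_hivemind, missing_on_full_node) as sets of names."""
    # Assumption: every entry is a dict with a "following" key.
    on_hivemind = {entry["following"] for entry in hivemind_result}
    on_full_node = {entry["following"] for entry in full_node_result}
    return on_full_node - on_hivemind, on_hivemind - on_full_node
```

Passing the results of `fetch_following("https://api.steemit.com", "curbot")` and `fetch_following("https://rpc.usesteem.com", "curbot")` to `diff_following` gives the names that only one of the two nodes reports.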
A Python script to detect discrepancies
I believe this is not an exceptional case. I have seen more discrepancies like this while testing and benchmarking the tower's new endpoints.
This Python script detects discrepancies in follower lists.
from steem import Steem
from steem.account import Account


def get_diff(account):
    followers_on_hivemind = Account(
        account,
        steemd_instance=Steem(
            nodes=["https://api.steemit.com"])
    ).get_followers()
    followers_on_full_node = Account(
        account,
        steemd_instance=Steem(
            nodes=["https://rpc.usesteem.com"])
    ).get_followers()
    print(
        "Accounts listed on api.steemit.com but not in the rpc.usesteem.com")
    print(set(followers_on_hivemind).difference(set(followers_on_full_node)))
    print("*" * 42)
    print(
        "Accounts listed on rpc.usesteem.com but not in the api.steemit.com")
    print(set(followers_on_full_node).difference(set(followers_on_hivemind)))
The result for @emrebeyler's followers:
Accounts listed on api.steemit.com but not in the rpc.usesteem.com
set()
******************************************
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'hariyati.amin', 'curbot', 'kenzyobiadi', 'erhanbute'}
After some digging, I found a rare case involving a differently formatted custom_json.
For example, I checked curbot's account history to find exactly when it followed my account, and found this transaction:
Transaction ID: aaccccb73b6dfcb4bbf95f6d2dcb76e1c87137e9
It looks like curbot was bundling follow operations into one transaction, and steemd picked these up and registered them as valid follow actions.
However, hive's indexer ignores the custom_json op if the length of the loaded JSON is greater than 2.
In this case it is greater than 2 because the format is like:
[
    ['follow', {
        'follower': 'curbot',
        'following': 'kevinwong',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'nothingismagick',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'simnrodrguez',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'steem-ua',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'decentraland',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'mikepm74',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'empath',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'emrebeyler',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'eroche',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'ervinneb',
        'what': ['blog']
    }]
]
This explains curbot.
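To make the two payload shapes concrete, here is a minimal sketch of how a parser could accept both the single form and the bundled form shown above. It is not hivemind's code: `extract_follow_ops` is a hypothetical helper, and it assumes the custom_json payload has already been decoded into Python lists and dicts.

```python
def _is_follow_pair(item):
    """True for a two-element list shaped like ['follow', {...}]."""
    return (
        isinstance(item, list)
        and len(item) == 2
        and item[0] == "follow"
        and isinstance(item[1], dict)
    )


def extract_follow_ops(payload):
    """Return the follow dicts found in a decoded custom_json payload.

    Hypothetical helper, not hivemind's implementation. Accepts:
      single:  ['follow', {...}]
      bundled: [['follow', {...}], ['follow', {...}], ...]
    Anything else yields an empty list.
    """
    if not isinstance(payload, list):
        return []
    if _is_follow_pair(payload):
        return [payload[1]]
    # Bundled form: keep only the items that are well-formed follow pairs.
    return [item[1] for item in payload if _is_follow_pair(item)]
```

With curbot's payload above, this would return ten follow dicts, while a check on the top-level length alone sees a list of ten items and skips it.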
Regarding my other 3 missing followers:
Follower | Following | Tx ID | Block num | Timestamp |
---|---|---|---|---|
erhanbute | emrebeyler | d10dcd1bdb661fc4e63f2464fa2262624db5d003 | 26710986 | 2018-10-11T09:55:21 |
kenzyobiadi | emrebeyler | 9ef235eb36aac5e466b97ad3e459b7eb9495f898 | 26492393 | 2018-10-03T19:38:45 |
hariyati.amin | emrebeyler | 383a36f7aa65724eb634ebdae141366674dc1df8 | 26450469 | 2018-10-02T08:41:33 |
The timestamps show that these follows happened between 2018-10-02 and 2018-10-11. The transactions don't involve anything unusual.
Additionally, I checked roadscape's followers on Steem and got these discrepancies:
{'curbot', 'kamvreto', 'msutyler'}
We already know the problem with curbot, so I checked the other accounts.
kamvreto followed roadscape at 2016-07-25T22:35:12.
Here is the account history output:
{
'trx_id': '2b7595b1f3e0e0105156d518b83d7eeaa19b6070',
'block': 3514062,
'trx_in_block': 3,
'op_in_trx': 0,
'virtual_op': 0,
'timestamp': '2016-07-25T22:35:12',
'op': ['custom_json', {
'required_auths': [],
'required_posting_auths': ['kamvreto'],
'id': 'follow',
'json': '{"follower":"kamvreto","following":"roadscape","what":["posts","blog"]}'
}]
}
It was a legacy custom_json transaction. The tricky part is that the transaction's what property includes two elements.
You can see that the Follow constructor expects one element:
https://github.com/steemit/hivemind/blob/60dc61ee4bbde2080421a3fdf10c5b83be840e8b/hive/indexer/follow.py#L71
For this reason, Hive ignores this operation as well.
The problem is the same for roadscape's other missing follower:
{
'trx_id': 'c7694ff17ba7ba3fbe1740f05c2727ecbd98cd62',
'block': 3409232,
'trx_in_block': 1,
'op_in_trx': 0,
'virtual_op': 0,
'timestamp': '2016-07-22T06:18:27',
'op': ['custom_json', {
'required_auths': [],
'required_posting_auths': ['msutyler'],
'id': 'follow',
'json': '{"follower":"msutyler","following":"roadscape","what":["posts","blog"]}'
}]
}
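As an illustration of the two legacy operations above, the sketch below parses the raw `json` string and collapses a multi-element `what` list into a single state. The names `normalize_what` and `parse_legacy_follow` are hypothetical, and the policy (drop unknown values such as `posts`, keep the first known one, treat an empty result as "not following") is an assumption made for this example, not a description of what hivemind or steemd actually does.

```python
import json

# Assumption for this sketch: these are the only states an indexer tracks.
KNOWN_STATES = ("blog", "ignore")


def normalize_what(what):
    """Collapse a `what` list into 'blog', 'ignore', or '' (no follow)."""
    known = [value for value in what if value in KNOWN_STATES]
    return known[0] if known else ""


def parse_legacy_follow(raw_json):
    """Return (follower, following, state) from a legacy follow payload."""
    op = json.loads(raw_json)
    return op["follower"], op["following"], normalize_what(op.get("what", []))
```

Under this policy, kamvreto's operation would be read as a `blog` follow of roadscape instead of being dropped.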
Expanding the sample size:
Discrepancies on @utopian-io's followers:
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'qawazd', 'steemgems', 'curbot'}
Follower | Following | Tx ID | Block num | Timestamp |
---|---|---|---|---|
steemgems | utopian-io | 25e9c3d8e625e634b68bd5e16e99327fd37174ae | 26722368 | 2018-10-11T19:25:27 |
qawazd | utopian-io | 8de43899a8ad84b8bd65a896e71e3e0eafda0757 | 26838941 | 2018-10-15T20:37:51 |
The follow operations are valid. Their dates, 2018-10-11 and 2018-10-15, are close to the ones missing on @emrebeyler's account.
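The clustering can be checked with a few lines of Python. This is only a sketch over the five rows from the two tables above; the function name `count_by_month` is made up for the example.

```python
from collections import Counter
from datetime import datetime

# (follower, following, timestamp) rows copied from the two tables above.
MISSING_FOLLOWS = [
    ("erhanbute", "emrebeyler", "2018-10-11T09:55:21"),
    ("kenzyobiadi", "emrebeyler", "2018-10-03T19:38:45"),
    ("hariyati.amin", "emrebeyler", "2018-10-02T08:41:33"),
    ("steemgems", "utopian-io", "2018-10-11T19:25:27"),
    ("qawazd", "utopian-io", "2018-10-15T20:37:51"),
]


def count_by_month(rows):
    """Count missing follow ops per calendar month (YYYY-MM)."""
    months = Counter()
    for _follower, _following, timestamp in rows:
        parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
        months[parsed.strftime("%Y-%m")] += 1
    return dict(months)


print(count_by_month(MISSING_FOLLOWS))  # {'2018-10': 5}
```

All five ordinary follow ops fall in October 2018, unlike the curbot and legacy cases, which are explained by their payload format.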
TL;DR
- Follow ops are missing on api.steemit.com's hive instance (generally clustered around the month 2018-10).
- Hive ignores a follow operation if it includes multiple follows, although steemd accepts it (the case with @curbot).
- Hive ignores some legacy follow operations because their what property may include two elements (e.g. ["posts", "blog"]).
Thanks for your contribution.
Apologies for the delay in review.
Your contribution is well detailed and the steps were very easy to follow. Overall I really like the amount of detail you put into the investigation, both within this contribution and inside the GitHub issue. This really is great!
I can see that a collaborator has acknowledged the issue which is also great to see.
Although there is no potential fix provided, the level of detail you have added will reduce the level of investigation required by any developer looking into this considerably.
Overall, great work and once again, thanks for your contribution.
[utopian-moderator]
Thank you for your review, @tobias-g! Keep up the good work!
ǝɹǝɥ sɐʍ ɹoʇɐɹnƆ pɐW ǝɥ┴
Great pickup! There are obviously still some teething issues with Hivemind, and there will always be a need for some full steemd nodes to enable this sort of check. The question is: who pays for them?
Witnesses! :-)
I am planning to fire up a full node. Just waiting for the top20. 🎉
Do you think it's realistic to expect all Top 20 witnesses to run full steemd nodes (with 512 GB RAM instances and the cost that goes with this)?
I think it's reasonable to expect them all to run Hive-based full nodes (2x64 GB + 32 GB instances), but a smaller subset will still need to run full steemd nodes. The question is how they are compensated for the extra cost.
I think it's reasonable to establish your own expectations for what witnesses at each level should be doing to deserve your vote, and it seems entirely reasonable for those expectations to depend on level.
(Cloud-based server fine under 50, physical server above 50, full node in top 20, for example)
2018-10-11 to 2018-10-15
Remind me again, what were the release dates for #hf20? Is this potentially related to one of the hotfixes applied at that time?
I don't think so. There is no problem on full nodes; they're returning the correct data.
It might be a hiccup or a past bug in that timeframe. Hard to say before a full reindex on a fresh hivemind.
Would there be any way to detect this bug without a reference API node?
I don't think so. We need something to double check/cross reference. :)
Actually, I would love to learn this side of computing that deals with coding and things like this, but I fear I will not be able to cope because of how complex it is.
This post has been included in the latest edition of SoS Daily News - a digest of all you need to know about the State of Steem.
Hey, @emrebeyler!
Thanks for contributing on Utopian.
Congratulations! Your contribution was Staff Picked to receive a maximum vote for the bug-hunting category on Utopian for being of significant value to the project and the open source community.