Do forks with fallback - my HF21 wish

sircork (64) 8 years ago (edited)

You overlooked that we had to do exactly a "fallback" when we were still on 19.2 and some 20.2 boxes inadvertently started running code prematurely just a week before, causing an unintended early fork.

Which in turn caused all of us who had successfully prepared for this fork to have to roll back our already updated and properly prepared boxes to 19.2 again. (Those of us with some wisdom and experience had our backups in reserve on 19.2, so we only needed to swap and reset to the last correct headblock checkpoint, but it was still a clusterfuck of bugs and errors that led to a two-day outage.)

Your post isn't wrong, but the problem remains that the very creators of the code that has failed us twice now, cannot even explain what to expect it to do, have not documented anything in any clear and accessible way, and often do not even comment the places they change in the code, that in turn they cannot explain or predict to others. But sure, tell the "witnesses" to "read the code" - easy for arm chair observers to say. Impossible in practice in this context.

Then we don't test. Clones of our chain and front ends are popping up all over, like weku, smoke.io and others. But for some reason, even though those small-time outfits can clone our entire infrastructure, we can't seem to have a testnet that doesn't apparently need "frequent restarts and wasn't running mixed version nodes" according to one insider developer on that team I've spoken to, but won't indict here since he is new to that team and I doubt it's his fault.

So yes, roll back is a thing. But WAY before that, lack of basic coding standards, peer review BEFORE dev commits, documentation, testing, experience and competence are bigger things.

$0.37

4 votes

tarazkp (80) 8 years ago

So yes, roll back is a thing. But WAY before that, lack of basic coding standards, peer review BEFORE dev commits, documentation, testing, experience and competence are bigger things.

Yep. Rollback should 'not be seen as an option' in testing and prep but, should be available if absolutely necessary, so it doesn't become a crutch.

$0.06

3 votes

sircork (64) 8 years ago (edited)

There is no way to prevent bugs in production systems. If there was, cars and planes would never crash. That said, we definitely don't have safeguards in place here to mitigate them to the levels we should have them in place.

Fundamental errors were made here. Fundamental checks were not done. Fundamental understanding of the changes being made en masse here was not achieved. Fundamental documentation was not provided PRIOR to the testing and release phases. Fundamental testing environments did NOT exist in a proper fashion for anyone to use, least of all the average witness.

Someone will definitely try to refute this in a follow-up comment here, I'd bet on it, and they will be taking advantage of doublespeak to try and throw shade and fool the public to discredit those who stand up with this claim, because we don't have to worry about losing our income by falling out of the top 20 or our stinc employment, trust, fam. Cui bono - who benefits?

Roll backs are an emergency exit. Before you leap through them, 100 other things should have been done before the plane left the ground.

"The witnesses are at fault, and should have read the code" is all at once, a truth, and a gross red herring at the same time. For reasons the average non-technical reader would even understand if we found the right simple metaphors to explain them.

Maybe this metaphor will work. The plane crashes. The engineers made a change to it that they did not create proper tests for, didn't document well, and did not publish about in advance. They had no one check their work before bolting it in the plane and when queried, said, well we can't really articulate it in english, but fly the plane awhile and we'll see how it works out.

After the crash they say, well, it's the job of the fuel guys, the ground crew, the flight attendants and the pilot and co-pilot to make sure our random undocumented changes worked, right?

And the public doesn't know they are wrong, so they get away with such remarks.

And that's why there's a lot of FUD and pissed off people pointing fingers right now.

But it all comes back down to the fail of the engineers who set up the failure. And those who allowed it to become this way.

$0.12

personz (66) 8 years ago

I broadly agree with you but I have to say no to this:

Roll backs are an emergency exit. Before you leap through them, 100 other things should have been done before the plane left the ground.

No way. It's not the first thing but patch after patch to a live system - no. We're talking vaguely here, your 100 things might be 5 of mine, but as it sounds no. You've got to know when to say, it's not actually ready, let users come first and let's take our time to get it right.

Ask yourself again, what's the rush for HF20? The sun will rise again.

$0.00

sircork (64) 8 years ago

This reply confused me, it's like you are stating you want to disagree but then you sort of go ahead and agree? No snark, you lost me here.

$0.00

personz (66) 8 years ago

Haha! Okay I see that. What I'm saying is yes, for emergencies but it's theoretically always available, even now, but it is unthinkable to many witnesses. So no, not the 100th thing you try, the 5th thing.

The difference of attitude I'm talking about is perhaps more important. Repeat after me: We can go back. No one believes that.

$0.00

sircork (64) 8 years ago (edited)

$0.00

personz (66) 8 years ago

Thanks. Like I'm saying elsewhere it's both, it's not either / or. What happened last week isn't what I'm talking about. In the diagrams you see a for real plan B as fallback. We haven't ever had that.

Yes to testing, one hundred times yes. I had an idea before about changes as a matter of course in a defined time period, say 1 month. After that the test coins (TESTS I believe is the convention) are worth something on the main chain, at least something. The idea was not taken seriously but this may be the time to advance it again, or at least start looking outside the box at such solutions.

$0.00

sircork (64) 8 years ago

Hear! hear!

$0.00

bobinson (63) 8 years ago

Rollback will not be possible if there are changes to the blockchain (database schema); that will require a complete replay.

$0.00

sircork (64) 8 years ago

Not entirely. We did it just last week, and only had to go back to the last good headblock.

$0.00

bobinson (63) 8 years ago

I was not able to reply because of lack of "MANA"

Not entirely. We did it just last week, and only had to go back to the last good headblock.

I was not around, but from what I understand it was from a minor version to another minor version. It was a soft fork and not a hardfork, which often includes changes to the "consensus" logic and also to the format/schema in which blockchain snapshots are stored. Part of the blockchain state from many plugins is now stored in RocksDB. So a restart is possible without replay in cases where the schema and the consensus are not changed.

I believe eventually STEEM is heading to a model where only the consensus related data will be on the immutable blockchain and rest will be in various databases.

Also, I need to elaborate on the "roll back" - generally roll back means going back to the earlier state. So what I meant to say is that a complete re-index will be needed if there are changes to the consensus state. Roll back will be against the "immutable nature" of the blockchains. TheDAO attack on the Ethereum chain is probably the best example where the immutability was not touched and forks were brought in to fix the issues (with the smart contract): https://ethereum.github.io/blog/2016/06/17/critical-update-re-dao-vulnerability/

go back to the last good headblock.

I am not sure how this was done - blocks after the last-good-head-block were ignored?

$0.00

sircork (64) 8 years ago

You aren't wrong, at all about any of your assessment, past or present.

Except, there was a fork from mixed node versions, leading to a split chain, aka an actual unintended fork. We DID roll back to a checkpoint block and restarted the chain, quickly, and lost transactions (reversed, as if they never happened), so if we do it fast enough (too late now, because there is way too much to undo), it is not entirely impossible.
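
For readers who want to picture what that kind of rollback amounts to, here is a toy sketch. It is not how steemd works internally; the `Block` class and `roll_back_to_checkpoint` function are hypothetical names made up for illustration, and the chain is just an in-memory list. It only shows the idea described above: keep everything up to the checkpoint block and treat later transactions as if they never happened.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical, heavily simplified block: a height and some transactions."""
    height: int
    transactions: List[str] = field(default_factory=list)


def roll_back_to_checkpoint(
    chain: List[Block], checkpoint_height: int
) -> Tuple[List[Block], List[str]]:
    """Keep blocks up to the checkpoint; return them plus the reversed transactions."""
    if not any(b.height == checkpoint_height for b in chain):
        raise ValueError("checkpoint block not found in chain")
    kept = [b for b in chain if b.height <= checkpoint_height]
    # Everything after the checkpoint is discarded, so these transactions are lost.
    dropped = [tx for b in chain if b.height > checkpoint_height for tx in b.transactions]
    return kept, dropped


chain = [
    Block(1, ["alice->bob 5"]),
    Block(2, ["bob->carol 2"]),    # last good headblock (the checkpoint)
    Block(3, ["carol->dave 1"]),   # produced after the unintended split
    Block(4, ["dave->alice 3"]),
]
kept, dropped = roll_back_to_checkpoint(chain, checkpoint_height=2)
print([b.height for b in kept])  # [1, 2]
print(dropped)                   # ['carol->dave 1', 'dave->alice 3']
```

The list of dropped transactions grows with every block produced after the checkpoint, which is why this only works if it is done quickly.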

$0.00

bobinson (63) 8 years ago

oh, I was not aware of this - interesting scenario. Thank you for explaining.... This sounds like a classic Byzantine generals scenario.

$0.00

sircork (64) 8 years ago

Careful, we might sound smart and able to code and NEVER make the top 20...

$0.08

bobinson (63) 8 years ago

Oh, everything I said above was bluffing ... There are infinite parallel blockchains and infinite number of top 20s ... as people from this chain, which has done more hard freezes, I mean forks, than every other blockchain in the known universe of blockchains put together, the immutable genesis blocks of all the chains will bless us with infinite amounts of mana ... the super intellectual state machines using probabilistic methods to maintain inter galactic consensus will help us with intelligence to even understand the meaning of 42 .... believe in Satoshi .. don't fear .. Amen! Aham Brahamasmi.

When I am not bluffing I speak like this. Will this help ?

PS: 42 is the "Answer to the Ultimate Question of Life, the Universe, and Everything" in The Hitchhiker's Guide to the Galaxy books.

$0.00

tarazkp (80) 8 years ago

I understand that the witnesses are auditing the code themselves but perhaps there should be some professional auditors that independently do that part of the job as well as professional testers on the testnet. In my limited experience testing for Nokia, there was a reason they outsourced it to us and it wasn't price.

There is a certain confidence blindness in coders as well as when it is a tight knit group, a certain amount of social consensus even though there might be misgivings from individuals. This also could highlight a need to spread the top 20 to the top 20 core and the next 20 with all responsible for audit.

It would be more expensive but less so than continually being forced to rollback because of oversights.

$0.12

whatsup (75) 8 years ago

I would hope our witnesses would be professionals; after all, it is a paying job, so by that definition they are professionals or should conduct themselves as such.

$0.00

tarazkp (80) 8 years ago

there is a difference between professional coder and code auditor though and often a difference in the way they look at the code. Having it independent also means that they aren't coloured by social dynamics or any particular outcomes. It is their job to find errors, not make sure it works and that often takes a different set of eyes.

$0.07

3 votes

whatsup (75) 8 years ago

Interesting. Do you know if other blockchains use this "service"?

$0.00

tarazkp (80) 8 years ago

No idea but auditing code is common practice in most tech industries (it is a boring job) as it is like looking for spelling mistakes in a text. The testing I did was localisation for languages and it doubled as test service that ran specific test cases to look for errors.

Here, it might not even have to be as formal but I wonder how many witnesses fully audited it considering it had so many massive issues and then, in a couple weeks, how much of the testnet could be thoroughly tested. From my limited understanding, there is a fair bit of complexity and a lot of things that can easily be overlooked so, having fresh 'less' biased eyes limits risk a bit further.

$0.04

3 votes

whatsup (75) 8 years ago

Yeah, I hear you. I've heard many witnesses suggest they don't even try to audit the code. It is frightening. :)

$0.00

indigoocean (67) 8 years ago

Quantstamp audits blockchain code for tokens. Presumably they have some clients. They might not be able to do it for STEEM since we aren't an ERC20, but where there is one, there may be others who are more broadly focused.

$0.03

personz (66) 8 years ago

Interesting and interesting further discussion. I like this idea. The problem is in payments I think. Witnesses are incentivized to witness, but who incentivizes the testers?

Maybe Utopian could step forward to lead that, in collaboration with another group, even Stinc.

$0.00

fknmayhem (70) 8 years ago

It was a clusterfork and while bugs happen, this also happened because the top 20 is slightly too stale, too settled.

There are many issues in this HF and several which should already have been covered before testnet even.

It is indeed easy to bash, but the process pre-testnet has been lacking. Testnet should be considered an actual release candidate and should also have specific guidance about what to test/validate.

Meanwhile biggest exchange wallets have been down for “scheduled maintenance” since last freeze.

Let’s not beat around the bush here - and this comes from someone who generally thinks Steemit Inc has the right vision: in any other company heads would roll over this release. Even more so since Velocity was in the works for more than a year. It would be either bye Ned or bye Vanderberg.

Simple as.

Lessons need to be learned from this.

By everyone.

By Steemit Inc, by the governance, by the wider witness community, and by us voters.

$0.07

4 votes

whatsup (75) 8 years ago

Agreed, the stone throwing isn't helping and I see the witnesses as our fail-safe. But they weren't.

What about hostile code? Do we have any faith that anyone is checking?

$0.00

personz (66) 8 years ago

I hope it doesn't look like I'm throwing stones. I'm trying to offer solutions to head off something like this next time.

$0.03

whatsup (75) 8 years ago

No, I didn't think you were throwing stones.

$0.00

personz (66) 8 years ago

Testnet should be considered an actual release candidate and also should have specific guidances about what to test/validate.

This is an important main step. I think @sircork is right in priorities, it comes before my idea here, but the idea here is relatively cheap so I see no reason to not integrate true safety.

You brought up exchanges. Can you fathom how many folks are hurting right now because of that? While I've said before it's unwise to rely on Steem for your bread and butter, it's fair to expect some degree of consistency and witnesses should do their best to maintain that. The exchanges are just reading the writing on the wall.

$0.00

indigoocean (67) 8 years ago

Here's another witness post from last week saying not to fork now, this one by drakos.

When I read it then, I had completely drunk the Kool-Aid and was wondering why he was such a worry-wart. Bwahahaha!

$0.04

anthonyadavisii (70) 8 years ago

@personz... or should I say The Devil! 😈 jk jk

Other than that, you nailed it. Needs to be a coherent rollback plan. I hit on a test plan idea a little on my post but apparently the testnet operates differently so that is kind of wonky.

Posted using Partiko Android

$0.04

personz (66) 8 years ago

Can you go into some details on the wonkiness? Very curious.

$0.00

anthonyadavisii (70) 8 years ago

Yes, I learned the below from @inertia's post.

A testnet doesn't have the same number of tokens as mainnet, so we have to adjust the actual tokens for alice and bob, yet maintain proportionality.

In the initial version of Tinman, this was accomplished by creating accounts with an account creation fee above the recommended fee. The fee was then automatically applied to the account as STEEM Power, on the testnet. It's a "fee shortcut" that allows us to avoid extra steps.
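
A rough sketch of what "maintain proportionality" could look like in practice. This is not Tinman's actual code; the function name `scale_balances`, the account balances, and the testnet supply figure are all made up for illustration, and integer division is an assumption about rounding.

```python
from typing import Dict


def scale_balances(mainnet_balances: Dict[str, int], testnet_supply: int) -> Dict[str, int]:
    """Scale each balance so the shares of the total stay the same on the testnet."""
    total = sum(mainnet_balances.values())
    if total <= 0:
        raise ValueError("mainnet balances must sum to a positive amount")
    # Integer (floor) division: any rounding remainder is simply left unallocated.
    return {name: balance * testnet_supply // total
            for name, balance in mainnet_balances.items()}


# Hypothetical numbers: alice holds 75% and bob 25% on mainnet.
mainnet = {"alice": 750_000, "bob": 250_000}
scaled = scale_balances(mainnet, testnet_supply=1_000)
print(scaled)  # {'alice': 750, 'bob': 250}
```

alice still holds three times what bob holds, just with far fewer tokens in circulation.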

Posted using Partiko Android

$0.05

justyy (84) 8 years ago

Yes, what we need is actually git revert

$0.00

dominion01 (60) 8 years ago

I agree. Good to have a backup.

Posted using Partiko Android

$0.00

oups (63) 8 years ago

As far as I know, there was a fallback solution.

Impact on User Experience
"By measuring more of the critical resource types the blockchain will more accurately price operations in RCs, but that also means that as of right now, resources are not being accurately priced. So after the RC system goes live, the user experience will have to change and the new system will need time to reach a new equilibrium. Due to this uncertainty, we added a “fail safe” to the code that will enable witnesses to revert from the RC system back to the old bandwidth system if absolutely necessary."
(source)

It seems like it wasn't absolutely necessary.

I totally agree to support the guys who know what they do.

And the sad part is, I'm not sure whether it's still in the equilibrium(?) phase, but one of HF20's goals was to enable more people to sign up through dApps or high-level stakeholders. However, since it became freemium there is no other option than powering up the newly created account. This is my alt account with 3 SP.

$0.00

sircork (64) 8 years ago (edited)

It was always freemium. You've always had a cost to operate. That is literally the root of a dPOS system. The revised math here simply attempts to enforce it more accurately.

In the old days, you just had somebody giving you a handout to get you started, and look at the noise and spam that created. Your remark betrays that you do not fully understand what proof of stake means. Not a snarky jab, just a simple fact. It's well documented, beginning with the white paper and 1000s of posts since then. Please investigate, you'll find it quite enlightening.

$0.00

oups (63) 8 years ago

Dang, I shouldn't press enter so fast. Now I'm left with 13 comments with 1200 SP. I'm a little bit tired so I forgot to check what I wrote in the first place, and I kind of repeated myself. Sorry about that. In the meantime I was looking for games to play because I can't afford to be active enough on the platform.

$0.00

sircork (64) 8 years ago

Yes, the equilibrium phase will help you, in a few days' time, but also, a person with less money is not entitled to have equal signage space in Times Square with a person who can afford to put a sign on every building, and that's probably okay, because that same town square DOES allow the poor person to at least stand there with a sign of his own making, and if his message is solid, someone will help him be seen.... If it's just spam, he will not be annoying everyone with giant blinking signs. And that's okay.

$0.03

valued-customer (67) 8 years ago

I don't think 'equal signage space' is the goal. If new users can't post, comment, and vote, they'll leave.

Whales can post dictionaries. New users don't need that, but they need to be able to engage effectively for Steem to survive.

Bandwidth - the ability to speak - shouldn't be a barrier to nominal engagement. If the result of HF20 is such a barrier, we're about to see the definition of 'death spiral'.

$0.00

sircork (64) 8 years ago

I don't think you understood my comment accurately. But no matter, neither of us can afford this comment thread till the patch and it will all be different then anyway. :D

$0.00

oups (63) 8 years ago

Thanks for the information, sir. Yes, I'm a bit ignorant about crypto and its terms like dPOS. As a regular social network user I feel this freemium thing after HF20, when my interaction capabilities are down to some numbers. For example, we lose that great argument "How much did you earn from Facebook so far?" or "3-second transactions (but only once per day if you are new)". Facebook wouldn't mind me spamming. I agree it may help to solve spam, but it shouldn't turn this place into one where only the rich are able to talk. I think we are still in the equilibration phase; otherwise a newly created account doesn't have any option but powering up, which kind of hurts the aim of HF20. AFAIK it's supposed to increase sign-ups via dApps using RCs. However, I don't know who would want to put money in on day one.

$0.00

felix.herrmann (65) 8 years ago

Can I see my current resource tokens somewhere?

$0.00

sircork (64) 8 years ago (edited)

steemd.com, top left

[edit], well you could yesterday, looks like it's being updated and that feature is now gone again

$0.00