Witness Configuration Leads to Missed Blocks

in #helpie, 5 years ago (edited)


😍Greetings, fine friends!😍


Blocks Missed

It pains us to say it, but @helpie missed quite a few blocks yesterday due to a misconfiguration. It was slightly tricky to figure out because the misconfiguration actually happened much earlier than the trigger.

The timeline of the event is below, but the gist is that the server was up and running and had finished replay, and we thought we needed to change a key and activate. It turns out the key change did not take effect at the time; the problem only appeared at the next restart. Incidentally, that restart happened when v20.9 was announced and we performed the switch.

Timeline

Times are in UTC

2019-01-20 - Beginning: Proper setup. Witness replaying, etc.

2019-01-20 - Right before the witness was relaunched, its configuration was modified, replacing the brain private key with something else.

2019-01-24 03:35 - Last block produced before the incident.

2019-01-24 03:52 - Update to v20.9. Server restarted.

2019-01-24 ??:?? - Missed block. Investigated, made a few tweaks to the build, and restarted the server.

Missed 6 blocks.

2019-01-24 13:30 - Brought docker back up to date, restarted. Still missed the block.

2019-01-24 14:00 - Reverted key. Server restarted.

2019-01-24 16:17 - Back in business.

Reflection

The server looked like it was running just fine in the logs. One of the complications was not having logs readily available for the time of the miss. Now we know we can run something like this:

docker logs seed --since 2019-01-24T15:30:00Z --until 2019-01-24T16:30:00Z -t

(We are using Steem-in-a-box here, and seed is the docker container name shown in the run.sh script.) It would have shown the Cannot produce block, no private key for STM... error. If the server had shut down, we think the logs might still be available via the docker ID, but we haven't checked this yet. Disabling the witness would have kept the server running so that the above command works, though.
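To make that check repeatable, here is a minimal Python sketch that builds the same docker logs call for a UTC time window and keeps only the lines containing the missing-key error. The function names are hypothetical, the container name seed comes from our setup, and the marker string is just the beginning of the error message quoted above, so treat this as a starting point and not a finished tool.

```python
import subprocess

# Start of the error message quoted in this post; the key after "STM" varies.
ERROR_MARKER = "Cannot produce block, no private key"


def build_logs_command(container, since, until):
    """Return the docker logs argv for a time window, with timestamps (-t)."""
    return ["docker", "logs", "--since", since, "--until", until, "-t", container]


def find_key_errors(log_text):
    """Return the log lines that contain the missing-private-key error."""
    return [line for line in log_text.splitlines() if ERROR_MARKER in line]


def scan_container(container, since, until):
    """Run docker logs and return matching lines (needs a local docker daemon)."""
    result = subprocess.run(
        build_logs_command(container, since, until),
        capture_output=True,
        text=True,
        check=True,
    )
    # docker logs replays the container's stdout and stderr as separate
    # streams, so both are searched.
    return find_key_errors(result.stdout + "\n" + result.stderr)
```

Calling scan_container("seed", "2019-01-24T15:30:00Z", "2019-01-24T16:30:00Z") on the witness host would mirror the command above; an empty list means the error did not show up in that window.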

The other big miss here was how long it took us to mitigate (i.e. to get to the point where no more missed blocks were being reported). In hindsight, we should have disabled the witness while diagnosing the issue. We thought our actions had fixed the problem, but that was just a guess.

Lessons learned, that's for sure. We will take measures to prevent such problems, and be forward-thinking to handle other potential issues.

Action Items Taken

  1. GINAbot notifications (at least for silly @eonwarped). Now done.
  2. Set up monitoring/pagers, e.g. using Witness Essentials. This looks to have paging and automatic witness shutoff, which seems to be exactly what we want.
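For readers wondering what "paging and auto-witness-shutoff" boils down to, here is a rough Python sketch of the decision logic. It is not how Witness Essentials is implemented. It assumes a Steem API node that answers the condenser_api.get_witness_by_account JSON-RPC call with a total_missed counter in the result (as we understand the API; verify against your node), and the thresholds and function names are made up for illustration.

```python
import json
import urllib.request


def witness_request(account):
    """Build a JSON-RPC body asking an API node for one witness record."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_witness_by_account",  # assumed method name
        "params": [account],
        "id": 1,
    }


def fetch_total_missed(api_url, account):
    """Return the witness's cumulative missed-block counter (network call)."""
    body = json.dumps(witness_request(account)).encode()
    req = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Assumes the witness object carries a "total_missed" field.
        return json.load(resp)["result"]["total_missed"]


def decide(baseline_missed, current_missed, page_after=1, disable_after=3):
    """Map newly missed blocks to an action: "ok", "page", or "disable"."""
    new_misses = current_missed - baseline_missed
    if new_misses >= disable_after:
        return "disable"
    if new_misses >= page_after:
        return "page"
    return "ok"
```

A loop would record total_missed at startup as the baseline, poll fetch_total_missed every minute or so, and act on decide(...). With the hypothetical defaults, the first missed block pages someone and the third one disables the witness, which is what we wish had happened here while we were still guessing at the cause.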

Witness Specs

I don't believe we've mentioned this yet.

We have the following specs:

  • 64 GB DDR4 RAM
  • Intel i7
  • 2 x 250 GB SSD
  • 1 Gbit/s network
  • Finland Dedicated Server

- The Helpie Witness Team



Helpie is an invite-only community, which focuses on community, content and the STEEM blockchain. We are always seeking new members who are creating quality content but aren't getting the reach or support they could be.
Once on our radar, our scouts will reach out with more information.


There are times when we lose, sometimes time, sometimes money, but if our flight plan is steady and our goal is clear, we can overcome the adversities we meet along the way.

This post has been included in the latest edition of SoS Daily News - a digest of all you need to know about the State of Steem.


