The blockchain as a serialisation format

ivoras (45) in #blockchain • 6 years ago (edited)

In the beginning there was Bitcoin, and its blockchain was (and is) thought of as a database. And fair enough, it was built this way: to find out anything about participants in transactions (addresses) or the details of the transactions, you need to inspect the blockchain data. Sure, you can (and usually do) cache this information in a proper, fast database if you're building a platform like a blockchain explorer or an exchange, but the point still remains: the blockchain is authoritative in its entirety, and only when taken as a whole does it hold the complete picture.

Which is good.

Then came Ethereum and something interesting happened: the data (the "state" of all accounts) is deliberately and purposefully kept separate from the blockchain itself. The state trie (which is just a database of sorts) is its own entity, documented and integrated into the idea of Ethereum and its protocols. The blockchain itself (i.e. its transactions) is now only a carrier of instructions on how to change the state.

Instead of the point of transactions being that they *are* the data queried in the blockchain, like in the UTXO system of Bitcoin, now the transactions are "diffs" to the state which is kept off-chain, while still referencing the relevant history of the state changes within the structure of the blockchain. Thus, the blockchain has become a carrier of state diffs, a serialisation format for instructions on how to change the relevant state, instead of being the state itself.
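To make the "transactions as diffs" idea concrete, here is a minimal sketch in Python. It is not how Ethereum actually encodes anything: the `Transfer` type, the `apply_transfer` and `replay` functions, and the balance-only state are all assumptions made up for illustration. The point is only that the state lives outside the chain, and the chain is a list of instructions that gets replayed onto it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A hypothetical transaction: an instruction to change the state."""
    sender: str
    recipient: str
    amount: int


def apply_transfer(state: dict[str, int], tx: Transfer) -> dict[str, int]:
    """Return a new state with the diff applied; the input state is not mutated."""
    if state.get(tx.sender, 0) < tx.amount:
        raise ValueError(f"{tx.sender} cannot afford to send {tx.amount}")
    new_state = dict(state)
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.amount
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state


def replay(genesis: dict[str, int], blocks: list[list[Transfer]]) -> dict[str, int]:
    """Deserialise the chain: apply every block's diffs, in order, to the genesis state."""
    state = dict(genesis)
    for block in blocks:
        for tx in block:
            state = apply_transfer(state, tx)
    return state
```

Calling `replay({"alice": 50}, [[Transfer("alice", "bob", 20)]])` gives a state in which alice holds 30 and bob holds 20. The chain never stored either balance, only the instruction that produced them.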

One could argue that the UTXO system is also about diffs to the state, and it is a valid point: each transaction to or from an address can be interpreted as an instruction to change the address' state, i.e. its balance (which is basically the only notable state data for Bitcoin). But it is far less explicit than in Ethereum, and it doesn't allow for the next part of this discussion: state pruning.

In Bitcoin, if someone wants to calculate the balance for an old address (e.g. created in 2009), they literally must download and inspect the whole blockchain.
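As a rough illustration of why the whole chain has to be read, here is a simplified UTXO scan in Python. The `Tx` and `TxOut` types and the `balance_by_scanning` function are hypothetical names for this sketch, and real Bitcoin transactions carry scripts rather than plain addresses, so treat it as a toy model under those assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOut:
    address: str
    value: int


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # (txid, output_index) pairs naming the outputs being spent
    outputs: tuple  # TxOut instances created by this transaction


def balance_by_scanning(chain: list[list[Tx]], address: str) -> int:
    """Walk every block from the first one, tracking which outputs are still unspent."""
    unspent: dict[tuple[str, int], TxOut] = {}
    for block in chain:
        for tx in block:
            for ref in tx.inputs:
                unspent.pop(ref, None)             # a spent output stops counting
            for index, out in enumerate(tx.outputs):
                unspent[(tx.txid, index)] = out    # a new output starts counting
    return sum(out.value for out in unspent.values() if out.address == address)
```

There is no shortcut in this model: an output created in the very first block may still be unspent, so the scan cannot start anywhere but the beginning.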

In Ethereum, if someone wants to find out the state (which is a richer concept than just the balance) of an old address, they will get it along with every other state by downloading a few dozen MB of state data from a full node.

Since the blockchain contains only instructions on how to change the state, not the state itself (the Merkle root of the state trie makes it cryptographically certain that no monkey business is happening), old instructions, meaning old blocks, can be safely removed.
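The role of the root hash can be sketched in a few lines of Python. Ethereum commits to the state with a Merkle Patricia trie and Keccak-256; the sketch below swaps in a much simpler stand-in, a single SHA-256 hash over a canonical JSON encoding of the whole state, and the names `state_root` and `verify_snapshot` are made up for illustration. What it shows is only the principle: a state downloaded from anyone can be checked against a hash recorded in a block header, without replaying the old blocks.

```python
import hashlib
import json


def state_root(state: dict[str, int]) -> str:
    """Commit to the whole state with one hash.

    Assumption: a flat hash over sorted JSON stands in for the real trie root.
    Sorting the keys makes the encoding independent of insertion order.
    """
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify_snapshot(snapshot: dict[str, int], header_root: str) -> bool:
    """Check a downloaded state snapshot against the root recorded in a block header."""
    return state_root(snapshot) == header_root
```

A flat hash like this cannot prove a single account without the full state, which is one reason a real design uses a tree, but it is enough to show why a tampered snapshot would be noticed.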

What counts as "old" depends mostly on the risk sensitivity of the entity processing the transactions. I think there are basically two risks if all the block history up to some block, call it block X, is discarded. The first is the astronomically unlikely one (and I mean it, because if it were likely, the idea of the blockchain itself would be useless in cryptographic terms): someone modifies the state in a roundabout way and then injects data into new blocks which reference the tampered state, in a way which can't be noticed because the old blocks were discarded. The second is a rollback of orphaned blocks which reaches back to blocks earlier than X. In other words, if a huge number of blocks suddenly gets orphaned, the state needs to be rolled back past block X, which is technically impossible for a node that has discarded those blocks, so full (or full-er) nodes need to be consulted for the missing blocks.
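The second risk can be shown with a toy pruned node in Python. Everything here is an assumption made for the sketch: the `PrunedNode` class, the `(sender, recipient, amount)` tuples used as transactions, and the choice to keep one state snapshot at block X plus the blocks after it. Balance checks are left out to keep it short.

```python
class PrunedNode:
    """Keeps the state as of block X (base_height) and only the blocks after X."""

    def __init__(self, base_height: int, base_state: dict[str, int]):
        self.base_height = base_height
        self.base_state = dict(base_state)
        self.blocks: list[list[tuple[str, str, int]]] = []

    @property
    def height(self) -> int:
        return self.base_height + len(self.blocks)

    def append(self, block: list[tuple[str, str, int]]) -> None:
        self.blocks.append(list(block))

    def state_at(self, height: int) -> dict[str, int]:
        """Rebuild the state at a given height by replaying the retained blocks."""
        if height < self.base_height:
            raise LookupError("blocks before X were discarded; ask an archival node")
        if height > self.height:
            raise ValueError("height is beyond the tip of the chain")
        state = dict(self.base_state)
        for block in self.blocks[: height - self.base_height]:
            for sender, recipient, amount in block:
                state[sender] = state.get(sender, 0) - amount
                state[recipient] = state.get(recipient, 0) + amount
        return state

    def prune(self, new_base_height: int) -> None:
        """Move X forward: snapshot the state there, then drop the older blocks."""
        self.base_state = self.state_at(new_base_height)
        del self.blocks[: new_base_height - self.base_height]
        self.base_height = new_base_height

    def rollback(self, fork_height: int) -> dict[str, int]:
        """Drop orphaned blocks above fork_height and return the state at the fork."""
        state = self.state_at(fork_height)
        del self.blocks[fork_height - self.base_height:]
        return state
```

A rollback to any height at or above X works from local data alone. A rollback below X raises `LookupError`, which is the point where the node would have to fetch the missing blocks from someone who kept them.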

I believe the second risk will create a type of archival service business which will record all the blockchain data "since the beginning of time", while full nodes will start keeping only the "recent" blocks (e.g. a fixed period of time: let's say 5 years of blockchain data for the really diligent ones, more likely 1 week for all the others).

I am writing this because I'm researching blockchains. I've been working on the tech side of blockchains since 2014, and it looks like, though I didn't plan for it to happen this way, I'm working on a sort-of trilogy of blockchains. My first public, non-NDA'd blockchain is Daisy, where I pushed the concept of what a blockchain is by making individual blocks SQL databases, and the whole blockchain queryable with SQL. I've currently left it as a general-purpose project. The second one, WOT, implements a variant of the state-keeping described above, though not, as in Ethereum, with a single global state (the hash of a Merkle tree of the whole state), but with partial state, referencing only the states of accounts affected by a transaction. In addition to other things, like being thoroughly JSON-oriented, it is currently used for a specific purpose, implementing on-line trust (or "truthiness" if you will), but is easily adapted to other areas. The experience of designing those two is inspiring me towards the third one, of which I have a general idea and a glimpse, and which would not be a single-ledger one. We'll see how it goes.

#ethereum #bitcoin #tech