WTF is a blockchain?
(This will be part one of a series because there are many aspects of "blockchain" to consider. Here I will start with "blockchain as a data structure" in the context of Bitcoin. The next post will consider "blockchain as a technology".)
A blockchain is a chain of blocks.
💯A blockchain is a type of data structure uses a chain of blocks to manage circulation of digital currencies such as Bitcoin, Litecoin, and Ethereum (to name 3 of thousands).
WTF is a data structure?
💯A data structure in computer science refers to the format that we use to hold data.
For example, let's say that our data is names of people. We can hold the data in a list format using a data structure called an ArrayList:
[Alice, Bob, India, Grayson, Latoya, Ivan, Stacey, Treyvon]
Imagine that this list of names actually represents a genealogy. We could use a different data structure to indicate the relationships between each of the names. A good way to do that would be using a LinkedList:
Now we have a data structure to which we can apply some meaning--"Alice was the child of Bob, Bob was the child of India, India was the child of Grayson, Grayson was the child of Latoya, Latoya was the child of Ivan, Ivan was the child of Stacey, and Stacey was the child of Treyvon."
💯A blockchain is very similar to a LinkedList. Each item in the the list is called a block and the blocks are linked together to form a chain.
WTF is a block?
💯A block is a data structure that acts as a container to hold transactions. Every block has two parts:
- The block header
- A Merkle tree
WTF is the block header?
💯The block header contains metadata (aka summarizing information) about the block. The most pertinent metadata are:
- the hash of the previous block that this block is linked to
- the "Merkle root", which is the name for the label that references this block's Merkle tree
WTF is a hash?
💯A hash is what we call the output of a hash function. Blockchains use cryptographic hash functions to produce hashes.
WTF is a cryptographic hash function?
💯A cryptographic hash function takes some input and produces an output that is a unique identifier for the input. If we use the sha256 hash function, every time we hash the name Alice as an input, we will get 3bc51062973c458d5a6f2d8d64a023246354ad7e064b1e4e009ec8a0699a3043 as an output.
hash_sha256("Alice") = '3bc51062973c458d5a6f2d8d64a023246354ad7e064b1e4e009ec8a0699a3043'
Hashes are useful because they allow us to efficiently identify data of any type or size.
In a blockchain, every block is hashed and the hash output is placed in the header of the block that comes after it. This defines our linkage relationship between blocks. Now we can add more detail to our blockchain visual:
block4 block3 block2 block1 |header4 | |header3 | |header2 | |header1 | | hash(block3)| -> | hash(block2)| -> | hash(block1)| -> | hash(nothing)| | Merkle root4| | Merkle root3| | Merkle root2| | Merkle root1 | |Merkle tree4 | |Merkle tree3 | |Merkle tree2 | |Merkle tree1 |
Note: This is similar to our genealogy example, in the sense that the left-most block is the "youngest" (i.e. was added to the blockchain most recently) and the right-most block is the "oldest".
WTF is a Merkle tree?
💯A Merkle tree (also called hash tree) is a binary tree data structure where all of the leaves are transactions and everything else in the tree (the non-leaves) are hashes of the child items below them. The top-most item in the Merkle tree is called the Merkle tree root or "Merkle root".
WTF is a binary tree?
💯A binary tree is a data structure where each item, called a node, can have no more than two direct descendants (aka children) that link to it. The linkages are very similar to the ones we saw with the LinkedList. An example of a binary tree could look like this:
[Treyvon] ^ ^ / \ [Stacey] [Sean] ^ ^ ^ ^ / \ / \ [Ivan] [Imani] [Maisie] [Matthew]
We can apply meaning to this binary tree and say "Treyvon had two children, Stacey and Sean. Stacey had two children, Ivan and Imani. Sean had two children, Maisie and Matthew." The leaves are the nodes at the very bottom of the tree (e.g. Ivan, Imani, Maisie and Matthew). Everything else above we call non-leaves (e.g. Treyvon, Stacey, and Sean). The top-most node in the tree is called the root.
A blockchain Merkle tree has transactions (we use "tx" to represent "transaction") as the leaves and hashes as the non-leaves. The top-most hash is the Merkle root:
[hash(children)] ^ ^ / \ [hash(children)] [hash(children)] ^ ^ ^ ^ / \ / \ [tx1] [tx2] [tx3] [tx4]
So what you're saying is...
To create our blockchain we first produce hashes of transactions. Then we hash those hashes, and hash those hashes until we get a single hash that is the Merkle root. We then store this information in a block along with the hash of the preceding block, which effectively links our newest block to the rest of the chain.
This is what enables a blockchain to act like a ledger of transactions. Transactions are constantly being included in blocks that are constantly being added to the same chain. So we have every transaction recorded in one place.