Blockchain for the Layman
During my usual morning browse of Facebook before having my shower, I came across a request for a blockchain expert from an acquaintance who has a podcast. While I don't feel I'm an expert, I did feel the need to write something about it for my own reference as well as that of others. Sure, there are many other explanations covering blockchains out there, but perhaps my own may present the subject in a way some may understand. Forget about Bitcoin, cryptocurrencies, smart contracts, and all that because we won't be covering those just yet.
This is by no means an exhaustive summary of the blockchain, but is meant to serve as a very basic introduction for those unfamiliar with cryptography, software engineering, databases, and so on.
Before moving to more technical details about what a blockchain is and how it functions, let's discover why it's called a "blockchain". The term encompasses two separate but interlinked concepts: a block of data and a link connecting that block to other blocks. These blocks can be thought of as forming a "chain" of data blocks, which could loosely be considered a form of linked list. In computer science we typically refer to data blocks of many types as records. For our purposes the term "record" will be used, and it is synonymous with the term "block" with respect to blockchains.
What is a Blockchain?
A blockchain is simply a form of database in which each new record is stored atop the record prior to it. This doesn't differ much from other methods of storing data; however, what makes the blockchain unique and very useful can be found in how the links connecting records operate.
Simplified diagram of a blockchain
In numerous other storage methods resembling linked lists, the link serves as a pointer, or a "street address" of sorts, telling the computer where to find the next record. The records within a blockchain never change their "street address": Record 11 is always after Record 10, which is always after Record 9, and so on. Because of this, the link is used instead to store a representation of the data within the previous record.
Digesting Messages, Yummy!
The representation of a record's data is called a cryptographic hash, and is calculated by applying a cryptographic hashing function to the complete contents of the previous record (that record's data and its own link). The nature of this hashing function ensures that a change in one tiny piece of information within the record will cause the hashing function to produce a dramatically different output (thanks to the avalanche effect). This output, or result, is known as a digest and is typically much smaller than the input data.
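To make the avalanche effect concrete, here is a minimal sketch. It assumes SHA-256 as the hashing function (the text above does not name one), and the helper names `digest` and `differing_bits` are made up for illustration. It hashes two messages that differ by a single character and counts how many bits of the two digests differ.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-256 digest of text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Two hypothetical record contents that differ by one character.
original = digest("Pay Alice 10 coins")
altered = digest("Pay Alice 11 coins")

print(original)
print(altered)
print(differing_bits(original, altered), "of 256 bits differ")
```

With a well-behaved hashing function, roughly half of the bits should differ even though only one character of the input changed.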
Cryptographic hashing and data integrity
Cryptographic hashing has numerous uses, two of which apply to blockchains. The first is to reduce large and arbitrarily sized records to much smaller pieces of information of a fixed size. This can be seen in the above diagram in the first and second entries; the second entry contains more characters than the first, yet both digests contain the same number of characters. This first application forms the basis for the second: checking the data integrity of each record. We don't care what information a particular record contains, only that we can detect records whose contents have been changed or tampered with.
Let's assume we have a record containing the data found in the first entry of the diagram above; we perform a cryptographic hash on it, which results in the corresponding digest, colored in green. Now let's assume that we throw our copy of the record away, never to be found again. Let's also assume that we occasionally receive records from other computers that claim these records are identical to the one we threw away. How will we know if these records are indeed identical to the one we threw away? We use the digest as a fingerprint to identify records whose contents have been changed. Notice in the above diagram that hashing entries two, three, and four yields a different digest for each, because no entry is identical to any of the others. We cannot deduce what has changed about a record, only that it has changed. If we know a record has changed we assume that it is invalid and ignore it.
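The fingerprint check described above can be sketched in a few lines. This is an illustration only: SHA-256 is an assumed choice of hashing function, and the names `fingerprint` and `is_unchanged` as well as the sample text are hypothetical.

```python
import hashlib


def fingerprint(record: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()


def is_unchanged(candidate: bytes, known_digest: str) -> bool:
    """True if the candidate hashes to the digest we kept."""
    return fingerprint(candidate) == known_digest


# We keep only the digest and throw the record itself away.
kept_digest = fingerprint(b"The quick brown fox jumps over the lazy dog")

# Records later received from other computers.
candidates = [
    b"The quick brown fox jumps over the lazy dog",   # identical
    b"The quick brown fox jumps over the lazy dog.",  # one character added
    b"the quick brown fox jumps over the lazy dog",   # one letter changed
]

for candidate in candidates:
    verdict = "valid" if is_unchanged(candidate, kept_digest) else "ignored"
    print(verdict, candidate)
```

Note that the check only says whether a candidate matches; it gives no hint about which part of a rejected record was changed.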
Each record's link is a digest of the data and link from the previous record
Employing cryptographic hashes in this way prevents tampering with existing records by making it more computationally difficult to do so with each new record added on top of the one we wish to alter. For example, we cannot merely change the contents of Record 2 and assume everything is well. Changing Record 2 would create a different hash for the record, which in turn results in a different hash for the record above it. This would cascade up the remainder of the blockchain and result in all records above Record 2 becoming invalid.
To further aid in detecting records which have been altered, each also contains a timestamp, essentially indicating the date and time it was created. This timestamp is included, along with the record's data and link, in what the cryptographic hash function processes to form the link for a new record.
These features lead to the characteristic that a record may not be altered once a new record is placed atop it, even if such alteration is desired and legitimate.
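The structure described in this section can be sketched as follows. This is a simplified illustration rather than how any particular blockchain is implemented: SHA-256, the JSON serialization, the all-zero link for the first record, and the names `Record`, `append`, and `first_invalid` are all assumptions made for the example.

```python
import hashlib
import json
import time
from dataclasses import dataclass

GENESIS_LINK = "0" * 64  # assumed placeholder link for the very first record


@dataclass(frozen=True)
class Record:
    data: str
    timestamp: float
    link: str  # digest of the previous record (its data, timestamp, and link)

    def digest(self) -> str:
        """Hash the record's data, timestamp, and link together."""
        payload = json.dumps([self.data, self.timestamp, self.link])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, data: str) -> None:
    """Place a new record atop the chain, linked to the current top record."""
    link = chain[-1].digest() if chain else GENESIS_LINK
    chain.append(Record(data, time.time(), link))


def first_invalid(chain: list):
    """Return the index of the first record whose link does not match
    the digest of the record below it, or None if the chain is intact."""
    expected = GENESIS_LINK
    for index, record in enumerate(chain):
        if record.link != expected:
            return index
        expected = record.digest()
    return None


chain = []
for text in ["first entry", "second entry", "third entry"]:
    append(chain, text)
print(first_invalid(chain))  # None: every link matches

# Tamper with the middle record while keeping its timestamp and link.
chain[1] = Record("altered entry", chain[1].timestamp, chain[1].link)
print(first_invalid(chain))  # 2: the record above the altered one no longer matches
```

The altered record itself still carries a valid link to the record below it; it is the record above that exposes the change, which is why tampering becomes harder with every record stacked on top.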
Taking a Step Back
To this point we have been focused on the very basic structure of a blockchain, but let's take a step back to view a little more of the picture: how a blockchain is used.
Blockchains are intended to maintain identical copies of data across multiple computers
As previously discussed, blockchains are merely methods to store information; however, they are intended to be used specifically as distributed databases. This means that information is stored on many computers, potentially thousands or more, and is therefore resistant to corruption, loss, accessibility issues, and other problems. In addition, blockchains operate in a decentralized manner, with no one computer in charge of the blockchain. Each computer in a blockchain-based network maintains a copy of the blockchain which is identical to that on every other computer in the network. For the purposes of this discussion we will refer to these computers as hosts.
Propagation of data from multiple clients across multiple hosts
Each record consists of data that individual clients or users submit for inclusion in said record. Because most clients are not connected to every host in the network, it is the responsibility of each host to relay (or forward) any new data it receives to all other hosts it is connected to. In this way data submitted by clients quickly propagates through the network to all hosts. This process of collecting data is how an individual record is built.
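As a rough illustration of this relaying, the sketch below floods a piece of data through a small made-up network in which each host forwards what it receives to the hosts it is connected to. The host names, the connection map, and the function `propagate` are hypothetical; real networks also deal with delays, duplicates, and hosts that go offline.

```python
from collections import deque


def propagate(connections: dict, origin: str) -> dict:
    """Flood data from origin through the network.

    Returns, for each host reached, the number of relay hops
    after which it first receives the data.
    """
    received = {origin: 0}
    queue = deque([origin])
    while queue:
        host = queue.popleft()
        for neighbour in connections[host]:
            if neighbour not in received:  # ignore data already seen
                received[neighbour] = received[host] + 1
                queue.append(neighbour)
    return received


# A hypothetical network: each host lists the hosts it is connected to.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

hops = propagate(network, "A")
print(hops)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```

Even though the client only reached host A, every host ends up with the data after a few relays.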
Without a mechanism to control the amount of data within a record, several problems would surface: a record might never be written to the blockchain, or it could become so large that it burdens the system. To prevent this, the element of time is often used as a control; for example, the Bitcoin network aims to place a new record into the blockchain approximately every 10 minutes. Once a record has been placed into the blockchain, or committed, it effectively becomes unalterable. Although there is not yet a record on top of the newly committed record, the majority of hosts consider it complete and have no reason to alter it, instead moving on to gathering data for and constructing a new record. We will next discover why hosts prefer to move on to constructing new records rather than trying to modify the previous record.
Making it Pay
The blockchain we have discussed so far may be suitable for distributed databases operated by a business for its own purposes or by individuals out of the kindness of their hearts, but it has no incentive for anonymous parties to maintain the blockchain. How does database maintenance become profitable?
The incentive lies in awarding some form of currency to hosts, but how does the system determine which host receives the award? Many blockchain systems award currency to the first host to find a particular number, called a nonce, that when hashed with record contents results in a digest that is numerically smaller than a set threshold.
Hashing record data with a nonce for proof-of-work
As hosts gather data to build records they perform a cryptographic hash function on the record's data and a selected nonce. If the resulting digest is less than a predetermined difficulty threshold the host assembles the record and nonce and broadcasts them to other hosts as a proof-of-work as well as writing the record to its own copy of the blockchain. However, if the resulting digest is not smaller than the threshold, the host repeats the process using a different nonce.
When one host receives a proof-of-work from another host it verifies the record and nonce against the difficulty threshold. If the result is verified as smaller than the threshold, the receiving host commits the included record to its copy of the blockchain. If the verification fails, the receiving host ignores the proof-of-work and continues trying to claim the reward itself.
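The search for a nonce and its verification by other hosts can be sketched as below. This is a simplified illustration, not Bitcoin's actual procedure: SHA-256, the 8-byte nonce encoding, the threshold value, and the names `pow_digest`, `mine`, and `verify` are assumptions chosen so the example finishes quickly.

```python
import hashlib


def pow_digest(record: bytes, nonce: int) -> int:
    """Hash the record together with a nonce; return the digest as a number."""
    data = record + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def mine(record: bytes, threshold: int) -> int:
    """Try nonces in order until the digest is smaller than the threshold."""
    nonce = 0
    while pow_digest(record, nonce) >= threshold:
        nonce += 1
    return nonce


def verify(record: bytes, nonce: int, threshold: int) -> bool:
    """What a receiving host does: one hash and one comparison."""
    return pow_digest(record, nonce) < threshold


# Assumed difficulty: the digest must start with 12 zero bits,
# so about 1 in 4096 nonces succeeds.
THRESHOLD = 1 << (256 - 12)

record = b"data gathered from clients"
nonce = mine(record, THRESHOLD)
print("nonce found:", nonce)
print("verified:", verify(record, nonce, THRESHOLD))
```

Finding the nonce takes many attempts, while checking it takes a single hash. That asymmetry is what lets every host cheaply confirm another host's claim.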
Rewards are claimed via proof-of-work
The winning host claims the currency prize by entering the transaction into the blockchain as part of the new record. The amount of currency awarded with each proof-of-work is set by an algorithm based on the current record number among several other parameters. In this way the reward is standard and can be easily calculated across all hosts.
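As one example of such an algorithm, the sketch below follows Bitcoin's published schedule, in which the reward started at 50 bitcoin and halves every 210,000 records. Other systems use different rules, and the function name `block_reward` is made up for illustration.

```python
SATOSHI_PER_BITCOIN = 100_000_000  # smallest unit of Bitcoin
INITIAL_REWARD = 50 * SATOSHI_PER_BITCOIN
HALVING_INTERVAL = 210_000  # records between each halving


def block_reward(record_number: int) -> int:
    """Return the reward, in satoshis, for the record at this position."""
    halvings = record_number // HALVING_INTERVAL
    if halvings >= 64:
        return 0  # the reward has been halved away entirely
    return INITIAL_REWARD >> halvings  # integer halving, rounding down


for number in (0, 210_000, 420_000, 840_000):
    print(number, block_reward(number) / SATOSHI_PER_BITCOIN)
```

Because the reward depends only on the record number, every host can compute it independently and reject a record that claims too much.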
What's it Good For?
We have not yet discussed what kind of data the entire system is designed to maintain. Put simply, the data may consist of anything clients wish to place into it. In the case of Bitcoin this data is primarily transactions which transfer Bitcoin from one user to another. In other systems the data includes smart contracts. The possibilities for blockchain applications are limited only by the imagination.