Hashing functions - What are they and why are they so important to cryptocurrency
Hashing functions are used in virtually every cryptocurrency and are central to securing web and software applications. I strongly believe that even a shallow understanding of the technology that drives some of the most innovative financial applications, such as Ethereum, can lead to more informed crypto investments. Without further ado, here is my post on hashing functions.
What is a hash function?
A hash function is any function that takes an input of any size and returns an output of a fixed size. The output, called a hash, is usually displayed as a string of numbers and letters (hexadecimal digits). This has a lot of benefits, some of which are:
- Store a compact fingerprint of data instead of the data itself. For example, websites typically store hashes of user passwords rather than the passwords themselves.
- Easily verify the integrity of data. If the hash published on the website for the software I downloaded matches the hash I compute locally (for example with a command such as sha256sum), then I know the download has not been corrupted or tampered with.
- Easily generate identifiers. For example, you can hash the current time down to the millisecond and use the result as a fixed-length value to mark a piece of data. Keep in mind that identical inputs always produce identical hashes, so two items created in the same millisecond would get the same identifier.
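As a rough illustration of the integrity check described above, here is a minimal Python sketch using the standard library's hashlib. It assumes the vendor publishes a SHA-256 hex digest (many do, though some use other algorithms); the function names are hypothetical. The digest is always 64 hex characters no matter how large the file is, which is the fixed-size property in action.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a hex string, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the locally computed hash with the hash published by the vendor."""
    # Normalize the published value: vendors sometimes print it in upper case.
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, `matches_published_hash("installer.tar.gz", published)` would be called with a hypothetical file name and the hash copied from the download page; any change to the file, even a single byte, makes it return False.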
Hashing is not the same as encryption/decryption
One thing to note is that a hash function is a one-way function. Data that has been hashed cannot be decrypted, or otherwise worked backwards, to recover the original data; this property of hash functions is called pre-image resistance. The only way to check whether a hash matches a particular piece of data is to hash that data again and see if the two hashes match. Let's go through an example to make this concept clearer:
Take for example the plaintext: password
Let's hash password using the MD5 hashing algorithm and see what we get. Using the website www.md5.cz to hash the plaintext password, I get: 5f4dcc3b5aa765d61d8327deb882cf99
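The same MD5 hash can be reproduced locally. The sketch below uses Python's hashlib instead of the website and shows that checking a guess means hashing it again and comparing the results. Note that MD5 is considered broken for security purposes today; it appears here only because the example uses it.

```python
import hashlib


def md5_hex(plaintext: str) -> str:
    """Hash a UTF-8 string with MD5 and return the 32-character hex digest."""
    return hashlib.md5(plaintext.encode("utf-8")).hexdigest()


def matches(candidate: str, stored_hash: str) -> bool:
    """There is no way to 'decrypt' the hash: hash the guess and compare."""
    return md5_hex(candidate) == stored_hash


stored = md5_hex("password")
print(stored)                       # 5f4dcc3b5aa765d61d8327deb882cf99
print(matches("password", stored))  # True
print(matches("Password", stored))  # False
```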
Anyone can go to this website and run the algorithm on password to get this hash, so storing it as-is in a database would be a bad idea, right? Couldn't someone just go through the English dictionary, run the MD5 algorithm on every word and combination of words, and recover the password whenever a result matched the hash stored in our database? Definitely. This is called a dictionary attack, and it is very commonly used to gain access to user accounts.
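A dictionary attack is really just a loop. This minimal sketch assumes a hypothetical five-word list and tries every word, then every two-word combination, against a leaked MD5 hash.

```python
import hashlib
import itertools
from typing import Iterable, Iterator, List, Optional


def candidates(words: List[str]) -> Iterator[str]:
    """Yield every word, then every two-word combination (e.g. "dragonsunshine")."""
    yield from words
    for first, second in itertools.product(words, repeat=2):
        yield first + second


def dictionary_attack(target_md5: str, guesses: Iterable[str]) -> Optional[str]:
    """Hash each guess with MD5 and return the first one that matches the target hash."""
    for guess in guesses:
        if hashlib.md5(guess.encode("utf-8")).hexdigest() == target_md5:
            return guess
    return None  # the wordlist did not contain the password


# Hypothetical tiny wordlist; real attack wordlists contain millions of entries.
words = ["letmein", "qwerty", "dragon", "password", "sunshine"]
leaked_hash = "5f4dcc3b5aa765d61d8327deb882cf99"  # the MD5 hash of "password"
print(dictionary_attack(leaked_hash, candidates(words)))  # password
```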
It turns out people actually build tables that pair plaintexts with their corresponding hash values, so the hashing only has to be done once.
Precomputed tables like this are known as lookup tables, and rainbow tables are a more space-efficient refinement of the same idea; both are widely used by attackers. Kali Linux, a Linux distribution built for security testing, even ships with tools for carrying out rainbow table attacks.
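To show the precomputation idea, here is a sketch of a plain lookup table. It is not a true rainbow table, which stores chains of hashes and reduction steps to save space at the cost of some extra work per lookup. The word list is hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def build_lookup_table(words: Iterable[str]) -> Dict[str, str]:
    """Hash every word once, up front, and index the plaintexts by their hash."""
    return {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in words}


# Hypothetical list of common passwords to precompute.
table = build_lookup_table(["123456", "password", "iloveyou", "monkey"])

# Reversing a leaked, unsalted hash is now a single lookup, with no hashing at attack time.
print(table.get("5f4dcc3b5aa765d61d8327deb882cf99"))  # password
print(table.get("0" * 32))                            # None
```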
Protecting against dictionary attacks
To protect against a dictionary attack, a unique string called a salt can be appended to the plaintext before hashing. Instead of hashing password we can hash password133423423, which produces a completely different hash, with 133423423 being the salt. The salt is stored alongside the hash (it is not a secret), but because every user gets a different salt, precomputed tables become useless and an attacker has to redo the dictionary attack separately for each salt. A salt should be unpredictable and unique, so in practice it is drawn from a cryptographically secure random number generator rather than derived from something guessable like the current time.
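A minimal sketch of salting, assuming a 16-byte random salt appended to the password as in the example above. It swaps MD5 for SHA-256; real systems should go further and use a deliberately slow password hashing function such as PBKDF2 (`hashlib.pbkdf2_hmac`), bcrypt, scrypt or Argon2. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Append a salt to the password and hash the result; returns (salt, hex digest)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # 16 random bytes from the OS's secure RNG
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare the two hashes."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("password")
salt_b, hash_b = hash_password("password")
print(hash_a == hash_b)                             # False (the random salts differ)
print(verify_password("password", salt_a, hash_a))  # True
```

Both the salt and the digest would be stored in the user's database row; only the digest comparison uses `hmac.compare_digest`, which avoids leaking timing information.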
Hashing in cryptocurrency
To put it bluntly, hashing is used a lot in cryptocurrency. In Bitcoin, an address is generated by hashing a public key with the SHA-256 algorithm and then hashing the result with RIPEMD-160. In SHA-256, the 256 refers to the number of bits in the output hash. Perhaps in a future post I can go over the SHA (Secure Hash Algorithm) family in more detail. Both of these hash functions are designed to be collision resistant, meaning it is computationally infeasible to find two different inputs that produce the same hash. Collisions must exist, since there are infinitely many possible inputs and only a fixed number of outputs, but nobody should be able to find one. If two public keys ever hashed to the same value, it would be a total disaster: whoever held either key could spend the money sent to that shared address. Hashes are also used to identify and track transactions.
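The sketch below shows the order of the two hashes (SHA-256 first, then RIPEMD-160, a combination often called HASH160). It assumes a made-up public key. A legacy (P2PKH) address additionally prepends a version byte, appends a 4-byte checksum and is Base58Check-encoded, none of which is shown. Python gets RIPEMD-160 from OpenSSL, so some builds lack it; the sketch reports that instead of crashing.

```python
import hashlib


def hash160(public_key: bytes) -> bytes:
    """SHA-256, then RIPEMD-160: the 20-byte value legacy Bitcoin addresses encode."""
    sha = hashlib.sha256(public_key).digest()  # 32 bytes = 256 bits
    ripemd = hashlib.new("ripemd160")          # raises ValueError if OpenSSL lacks it
    ripemd.update(sha)
    return ripemd.digest()                     # 20 bytes = 160 bits


# Hypothetical 33-byte "compressed public key" (0x02 prefix + 32 bytes); not a real key.
example_key = bytes([0x02]) + bytes(range(32))
print(len(hashlib.sha256(example_key).digest()) * 8)  # 256

try:
    print(hash160(example_key).hex())
except ValueError:
    print("RIPEMD-160 is not available in this Python/OpenSSL build")
```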
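If you head over to the popular block explorer etherscan.io, you can see hashes used in transactions firsthand: every transaction is listed under its transaction hash, a 32-byte value written as 0x followed by 64 hexadecimal characters.
Identifying transactions this way is very convenient, since anyone can look up a transaction by its hash and track it without much difficulty.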
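A toy version of hash-indexed transactions, as a sketch only. The JSON encoding and field names are hypothetical stand-ins: Bitcoin applies double SHA-256 to its own binary transaction format, while Ethereum uses Keccak-256 (which is not the same as Python's `hashlib.sha3_256`) over its own serialized transaction format.

```python
import hashlib
import json
from typing import Dict


def tx_hash(tx: Dict[str, object]) -> str:
    """Double SHA-256 over a canonical JSON encoding of a toy transaction."""
    # sort_keys + compact separators: the same transaction always encodes to the same bytes
    encoded = json.dumps(tx, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(hashlib.sha256(encoded).digest()).hexdigest()


ledger: Dict[str, Dict[str, object]] = {}


def record(tx: Dict[str, object]) -> str:
    """Store a transaction under its hash and return that hash as its ID."""
    tx_id = tx_hash(tx)
    ledger[tx_id] = tx
    return tx_id


tx_id = record({"from": "alice", "to": "bob", "amount": 5})
print(tx_id)          # 64 hex characters
print(ledger[tx_id])  # {'from': 'alice', 'to': 'bob', 'amount': 5}
```

Because the ID is derived from the contents, changing any field (say, the amount) produces a different ID, so a recorded transaction cannot be silently altered under the same hash.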
Hashes used in Proof of Work based blockchains
Hashing is also used in proof of work based blockchains. To mine Bitcoin, you are essentially searching for what's called a valid nonce. Let's break that down: a nonce is a number that miners vary in order to change the output of the hashing algorithm. The Bitcoin protocol defines a numeric limit called the target, and a miner who finds a nonce that makes the hash of the block header, read as a number, fall at or below this target is rewarded a certain amount of Bitcoin. Because it is infeasible to work out the input of a cryptographic hash function from its output, the only way to find a valid nonce is to try one nonce after another, which is what makes hash functions such an ingenious fit for this type of system. The Bitcoin protocol makes mining harder or easier by setting the target to a lower or higher value, recalculating it every 2,016 blocks so that new blocks keep arriving roughly every ten minutes.
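A toy proof-of-work loop, assuming a made-up header and a deliberately easy target so it finishes almost instantly. It follows Bitcoin in using double SHA-256 and reading the hash as a little-endian number, but a real header is an 80-byte structure with the nonce at a fixed position, not this simplified layout.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes block headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def hash_as_int(header: bytes, nonce: int) -> int:
    # Toy layout: header bytes followed by a 4-byte little-endian nonce.
    digest = double_sha256(header + nonce.to_bytes(4, "little"))
    # Bitcoin reads the 32-byte hash as a little-endian 256-bit number.
    return int.from_bytes(digest, "little")


def mine(header: bytes, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash is at or below the target."""
    for nonce in range(2**32):  # the real header's nonce field is 32 bits wide
        if hash_as_int(header, nonce) <= target:
            return nonce
    raise RuntimeError("nonce space exhausted; real miners then change other header fields")


# A lower target means fewer qualifying hashes, i.e. a harder puzzle.
easy_target = 2 ** (256 - 12)  # roughly 1 in 4,096 hashes qualifies
header = b"hypothetical block header"
nonce = mine(header, easy_target)
print(nonce, hash_as_int(header, nonce) <= easy_target)
```

Verifying the result takes a single hash, while finding it took thousands; halving `easy_target` would double the expected number of attempts.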
I hope that you have learned something about hashing functions and where they fit into the world of cryptography, blockchains and cryptocurrency. If you have any suggestions or corrections, please feel free to leave them in the comments below. Perhaps in another post I will go into more depth on a specific hashing algorithm, or talk about Proof of Stake (PoS) and Proof of Importance (PoI).