Difference between revisions of "Hash"
(updated the page and explained how bitcoin uses hashes.)
m (1 revision imported)
Revision as of 21:39, 24 October 2017
For example hashing the sentence "the quick brown fox jumps over the lazy dog" with the popular MD5 hash function yields 1e280e1713df124d35709cf6138d9f91 (hashes are usually shown in hex format (base 16) rather than binary to save space), but if we make just a tiny change to the input, such as using a capital "T" instead of lower-case, the hash is 37c4b87edffc5d198ff5a185cee7ee09 - almost completely different, but always exactly the same size.
The idea is to map input data (a file, password, etc.) into fixed size output data. With only the output data the input cannot be derived.
For a simple way to think of this: remainder functions have a similar mechanism. If one has the remainder 1 from dividing 7 by 3 no one can determine what the inputs where because other functions provide the same result. Seven divided by three will always have the remainder of 1.
The same data always produces the same hash, and different data a different hash, but because the hash value is very short compared to the data being hashed, it's possible for different data to yield the same hash value. This problem is called a hash collision. However, in a typical hash of over a hundred bits in length, the chance of a collision is astronomically small, like picking atoms completely at random out of the entire universe and happening to pick the same one twice.
In an ideal hashing function there is no predictable correspondence between the initial data and the output data such as two similar inputs resulting in closer outputs, which means that a hash of a piece of data can act as a kind if finger print that identifies it specifically from all other data, and there is no way of obtaining the original data or anything about it from its hash value.
Bitcoin uses cryptographic hashes.
Some of the most common uses for hashing
It is useful to know if data has changed.
Hashing can be used to create a checksum. If the checksum of a file matches then the file is identical. This can save a lot of data transfer and can be used to detect file tampering.
Hashes allow data to be referred to by it's hash value which is much more economical on resources than transferring and storing the entire content. When a client needs the content they can request it and then hash it themselves to ensure they get the same result so they can be sure that it's exactly the same data as they requested. This is why download sites often supply the MD5 hash along with each downloadable item so the client can easily get the hash of their file locally and ensure it matches the one specified on the original site so they know that the file hasn't been corrupted or truncated in transit.
Another very simple but useful way of using hashes is for storing user passwords, if somebody hacks the system and obtains all the user data, all they have is hash values which they can't obtain the original password from without trying every possible combination. Actually trying every possible combination isn't that hard if you have a simple password, which is why sites always recommend you use upper and lower case, numbers and symbols in your passwords. When a user logs in, the server simply hashes the password they enter and checks if the hash matches the stored hash, if it does the server can be sure the user has entered the same value. If a user forgets their password, they'll need to create a new one, because the server cannot tell them what their password is since it only has hash values stored.
Hash lists and trees
In many systems, the content changes over time, and in this case the hash verification process can be a hash of not only the current state of the content, but also of the hash of the previous state, which in turn contains the hash of the state before, and so on. In this way it's impossible to modify any of the previous states without creating a totally different hash in the current state. The same applies to trees (called Merkle trees) where each node contains a hash of the child node's hashes so that any change of data of any node in the tree will cause different hash values to propagate up the tree all the way to the root node.
Bitcoin Uses for Hashes
Bitcoin blocks are designed to be discovered every ten minutes. To achieve this gap in time, mining is designed to be computationally expensive through a process called proof of work.
Bitcoin's proof of work algorithm, called Hashcash, uses the SHA-256 hash algorithm to generate verifiably "random" numbers in a way that requires a predictable amount of CPU effort. Generating a SHA-256 hash with a value less than the current target solves a block and entitles the miner to the block reward.
Bitcoin uses hashes of bitcoin public keys for payment addresses.
Instead of bitcoin users sending payments directly to public keys, the payments are sent to hashes of public keys.
Using a hash of a public key provides three benefits:
- Shorter addresses (20 byte hashes vs. 65 byte public keys)
- More secure - attackers can't being attacking a public key until it is used to sign a transaction. Today's computers cannot successfully attack public keys, but this feature along with using a new address for each transaction make bitcoin public addresses secure from quantum computers.