Bitcoin bit extension

Bitcoin uses two hash functions: SHA-256 and RIPEMD-160. A good way to understand how hash functions work is to experiment with them interactively. One resource for doing so is the SHA Online calculator.

An attacker able to generate a new document with the same hash value as an old one could replace confirmed transactions and existing blocks. Several other attacks would also become possible.
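For readers who prefer to experiment locally rather than in a browser, the following is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex and the sample messages are illustrative choices, not anything defined by Bitcoin.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 hash value of a message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


# Two messages that differ in a single character give unrelated hash values.
print(sha256_hex(b"abc"))
print(sha256_hex(b"abd"))
```

Changing even one character of the input should produce a completely different 64-digit output, while hashing the same input twice always gives the same result.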

The security of a hash function depends on two properties of the output: range and uniformity. Range refers to the largest value that a hash function can produce, measured in bits. For example, a hash function producing 16-bit output can produce at most 65,536 (2^16) hash values. Although widening the output range can decrease the collision rate, adding bits increases storage and transmission costs.
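The effect of a narrow range can be seen by experiment. The sketch below imitates a 16-bit hash function by keeping only the first 16 bits of a SHA-256 hash value; this truncation is an assumption made purely for illustration and is not something Bitcoin does. Because only 65,536 outputs exist, hashing the messages "0", "1", "2", and so on must eventually repeat a value.

```python
import hashlib


def truncated_hash(message: bytes, bits: int = 16) -> int:
    """Keep only the top `bits` bits of SHA-256 to mimic a narrow-range hash."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash b"0", b"1", ... until two messages share a truncated hash value."""
    seen = {}  # truncated hash value -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message
        seen[value] = message
        counter += 1


first, second = find_collision(16)
print(2 ** 16)  # number of distinct 16-bit hash values
print(first, second, truncated_hash(first) == truncated_hash(second))
```

With a 16-bit range the search finishes almost instantly. Each additional output bit doubles the number of possible values.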

Uniformity refers to how evenly distributed hash values are. For example, a hash function with a wide output that consistently produced a single value would have very poor uniformity despite its large range. To take full advantage of its output range, a good hash function ensures the widest possible distribution of values.

No matter how well-designed, the security of any hash function can in principle be broken in two ways: a preimage attack or a collision attack. In a preimage attack, a user attempts to find a new document whose hash value matches a predefined target.

For example, a Bitcoin user seeking to replace an existing block with one of her own choosing would generate variations until a match was found. The number of attempts she can expect grows exponentially with the length of the output: on the order of 2^n for an n-bit hash value. A collision attack, in contrast, attempts to generate two messages with identical hash values.

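Certain kinds of smart contracts can be attacked in this way. A collision can be found in far fewer attempts than a preimage, roughly 2^(n/2) for an n-bit hash value, because of the effect illustrated by the birthday problem. The birthday problem asks for the probability that at least two people in a randomly-selected group share a birthday. A preimage attack is not subject to this effect.

Even so, the number of attempts needed to break a hash function with 256-bit output is enormous. It helps to consider the magnitude of this number in relation to a familiar reference point. This number is so vast that just counting that high with an extremely efficient computer would consume the combined energy output of the sun for many centuries.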
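The birthday problem mentioned above is easy to work through numerically. The following sketch computes the probability of at least one shared birthday, assuming 365 equally likely birthdays and ignoring leap years; the function name is an illustrative choice.

```python
def birthday_collision_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for size in (10, 23, 50):
    print(size, round(birthday_collision_probability(size), 3))
```

A group of only 23 people already has a better than even chance of a shared birthday, even though there are 365 possible days. The same effect is why collisions in a hash function turn up much sooner than the size of the output range alone would suggest.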

Working with long sequences of ones and zeros is unwieldy, so Bitcoin uses a more compact notation known as hexadecimal. Hexadecimal notation is a number system based on powers of 16, and uses the digits 0-9 and a-f. A binary (zero and one) representation of a hash value can be converted into a hexadecimal representation by breaking it up into groups of four digits and replacing each one with the corresponding hexadecimal digit. For example, the binary sequence 1010 1111 0000 0001 becomes af01.
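The grouping procedure can be written out in a few lines. This is a minimal sketch; the function name and the sample input are illustrative and are not taken from any Bitcoin data.

```python
HEX_DIGITS = "0123456789abcdef"


def binary_to_hex(bits: str) -> str:
    """Convert a string of ones and zeros into hexadecimal, four bits at a time."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("expected a multiple of four binary digits")
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each four-bit group is a number from 0 to 15, which indexes one hex digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(binary_to_hex("1010 1111 0000 0001"))  # af01
```

A 256-bit hash value shrinks from 256 binary digits to 64 hexadecimal digits under this conversion.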

Blocks and transactions are identified by their SHA-256 hash values, expressed in hexadecimal form. For reasons that remain unclear to this day, Satoshi Nakamoto designed Bitcoin to use double hashes to derive transaction and block identifiers. In a double hash operation, the hash function is applied once, and then once again to the resulting hash value.
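A double hash takes one extra line of code. The sketch below uses Python's hashlib; the function name double_sha256 and the sample input are illustrative, and the input is not a real transaction or block.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 to the data, then apply SHA-256 again to the 32-byte result."""
    first_pass = hashlib.sha256(data).digest()
    return hashlib.sha256(first_pass).digest()


print(double_sha256(b"hello").hex())
```

Note that the second pass hashes the raw 32 bytes of the first hash value, not its hexadecimal text.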

The most likely reason for doing so is to protect against a length extension attack. Here, an attacker who knows the hash value and length of an original document can compute the hash value of that document with extra data appended, without knowing the document itself.

The Bitcoin network only works if the rate of block generation stays constant. This problem is solved through proof-of-work.

Proof-of-work is a method for restricting access to a valuable resource by forcing computational work as a condition of use. For example, a message recipient could agree to read only those messages to which sufficient proof of computational work had been attached. Putting proof-of-work into practice requires a proof-of-work function. An essential quality of such a function is asymmetry.

This means that verifying a proof-of-work should be fast, but generating it should be slow. With a little creativity, a hash function can serve double-duty as a proof-of-work function. Recall that a hash function accepts a message as input, reproducibly returning a hash value as output. A hash function can be transformed into a proof-of-work function through the use of a nonce. A nonce, or number used once, is content embedded into a message that changes the output of a hash function.

For example, a simple proof-of-work function might append an integer to a message, then return the hash value obtained from the result. The output of a hash-based proof-of-work function is unpredictable, but the same nonce and message will always yield the same hash value.
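The simple function described here might look like the following sketch. The choice of SHA-256 and of appending the nonce as decimal text are assumptions made for illustration; Bitcoin's actual block header layout is different.

```python
import hashlib


def proof_of_work_hash(message: bytes, nonce: int) -> str:
    """Append the nonce to the message and return the SHA-256 hash value in hex."""
    return hashlib.sha256(message + str(nonce).encode()).hexdigest()


# Each nonce gives an unpredictable but fully reproducible hash value.
for nonce in range(3):
    print(nonce, proof_of_work_hash(b"hello", nonce))
```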

In this way, a proof-of-work can be both easy to verify and difficult to produce. A proof-of-work function can serve as the basis for a proof-of-work puzzle. Such a puzzle asks for a nonce that when combined with a message gives a hash value less than or equal to a threshold value.

Recall that secure hash functions resist preimage attacks. This leaves trial-and-error as the only winning strategy to find a valid proof-of-work. Raising the target value widens the range of acceptable hash values, and therefore reduces the number of guesses and time needed to find a valid solution. Lowering the target value narrows the range of acceptable hash values, decreasing the speed with which a winning nonce can be found. By revealing a suitable nonce, a user proves that sufficient computational work has been performed to gain access to a communal resource.
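A toy version of such a puzzle shows the asymmetry directly. In this sketch the message, the target, and all function names are illustrative assumptions. The target is set so that roughly one hash value in 4,096 qualifies, which keeps the search short enough to run in a moment.

```python
import hashlib


def hash_as_int(message: bytes, nonce: int) -> int:
    """Hash the message with the nonce appended and read the result as an integer."""
    digest = hashlib.sha256(message + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big")


def solve_puzzle(message: bytes, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash value is at or below the target."""
    nonce = 0
    while hash_as_int(message, nonce) > target:
        nonce += 1
    return nonce


def verify(message: bytes, nonce: int, target: int) -> bool:
    """Checking a claimed solution takes a single hash operation."""
    return hash_as_int(message, nonce) <= target


easy_target = 2 ** 244  # about 1 in 2**12 = 4096 hash values fall at or below this
nonce = solve_puzzle(b"hello", easy_target)
print(nonce, verify(b"hello", nonce, easy_target))
```

Halving the target should roughly double the expected number of guesses, while the cost of verification stays at one hash.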

Others can easily pass the original message and nonce into a hash function and verify that the output falls below the required threshold. In other words, a message, nonce, and target threshold prove that enough computational work was expended to unlock access to a resource.

An address is a specially-formatted hash value. All three forms include additional data along with the hash value.

Secure hash functions are resistant to preimage attacks. In other words, a hash value can be published without risk that the original message will be guessed. However, anyone receiving the message can easily verify that the previously-published name matches by simply running it through the hash function.

Many applications for preimage resistance in smart contracts are possible. The examples in this section use a visual language designed to simplify discussion of smart contracts. Taking advantage of preimage attack resistance, Alice can run a primitive contest secured by a hash function. To do so, she locks a coin to the hash value h of a secret message m, her last name. Anyone who can supply a message that hashes to h can claim the coin. Unlike the preimage attack discussed above, this one is easier because the search space is much smaller (fewer than seven billion).
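Alice's contest can be modeled in a few lines. This sketch does not use Bitcoin's actual scripting system; the function names, the candidate list, and the secret itself are all hypothetical stand-ins chosen for illustration.

```python
import hashlib


def lock(secret: str) -> str:
    """Return the hash value h that a coin would be locked to."""
    return hashlib.sha256(secret.encode()).hexdigest()


def can_unlock(guess: str, h: str) -> bool:
    """A guess claims the coin only if it hashes to the published value h."""
    return lock(guess) == h


def guess_secret(h: str, candidates):
    """Work through a small search space of candidate messages."""
    for candidate in candidates:
        if can_unlock(candidate, h):
            return candidate
    return None


h = lock("Garcia")  # hypothetical secret message m
print(guess_secret(h, ["Smith", "Jones", "Garcia", "Lee"]))
```

Because the list of plausible last names is short, trying every candidate is practical. Against a message chosen from the full range of possible inputs, the same loop would never finish.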

A similar principle can be used to mathematically link two otherwise unrelated payments together.

The second problem means that those using a document can never be sure that its name will remain constant over time. The Bitcoin network manages two kinds of documents that require permanent, unique names issued without a centralized authority: transactions and blocks. To know when my payment was confirmed, I need to refer to its containing block by name.

More than this, transactions and blocks also refer to each other. Bitcoin needs to provide its users with a system for naming transactions and blocks so that they can later be accessed and linked together. Hash functions solve this problem.

This can be accomplished with the help of an imaginary invention, a random oracle. To the outside world, a random oracle looks like a black box with two slots cut into it. Anyone can slide a message written on an index card into the input slot.

The box responds by pushing a new card from the output slot. On the card is written a name, represented as a sequence of ones and zeros. The length of this name is adjustable, but constant for all documents at a given setting. Re-submitting a message always yields the same name. If two message texts differ, they will be assigned different names.

There are many ways to implement such a black box, especially if imaginary creatures are allowed. Imagine the box contains a gremlin, a book, a pencil, a stack of index cards, and a metal coin. Messages are inserted into the input slot. When one arrives, the gremlin scans the book for it. If the message is found, the gremlin writes the corresponding name on an index card. If it is not found, the gremlin generates a new name by tossing the coin. Each time heads comes up, the gremlin writes a one on the card. Each time tails comes up, the gremlin writes a zero. Enough coin tosses are made to fulfill the name length quota used by the black box. The gremlin then records the message and its new name in the book and pushes the card through the output slot.

This kind of random oracle solves the problem of assigning unique, permanent names to digital messages, but it scales poorly. Fortunately, our random oracle can be replaced for all practical purposes with a hash function.
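The gremlin's routine can be simulated directly. In this sketch a dictionary plays the role of the book and a random bit plays the role of the coin; the class name, method name, and default name length are illustrative assumptions, as is the step of recording each new name in the book so that it can be looked up later.

```python
import secrets


class GremlinOracle:
    """A toy random oracle: look the message up in the book, else toss coins."""

    def __init__(self, name_length: int = 16):
        self.name_length = name_length  # number of ones and zeros in each name
        self.book = {}                  # message -> name already assigned

    def name_for(self, message: str) -> str:
        if message not in self.book:
            # One coin toss per digit: heads writes a one, tails writes a zero.
            tosses = (secrets.randbits(1) for _ in range(self.name_length))
            self.book[message] = "".join(str(bit) for bit in tosses)
        return self.book[message]


oracle = GremlinOracle(name_length=16)
print(oracle.name_for("pay Bob 1 coin"))
print(oracle.name_for("pay Bob 1 coin"))  # same name as the line above
```

The book grows with every new message, which is one way to see why this design scales poorly. The sketch also does nothing to stop two messages from receiving the same coin tosses by chance, which becomes less likely as the name length increases.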

Digitally-encoded messages enter the hash function and unique, permanent names exit. These names are called hash values.
