How blockchain works.
A lot of this stuff relies on secure hashes. A secure hash is a function that takes an arbitrary amount of data and reduces it to a single (very large) number, which is known as the hash of the data. A good hash function will (a) make it practically impossible to find two different inputs that produce the same number, and (b) allow no practical way of deriving the input from the hash. Secure hash functions have been used for digital signatures for decades.
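To make that concrete, here is a minimal sketch using SHA-256 from Python's standard library. The choice of SHA-256 and the helper name `secure_hash` are mine, purely for illustration.

```python
import hashlib

def secure_hash(data: bytes) -> int:
    """Return the SHA-256 hash of `data` as a single (very large) integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Two inputs that differ by one character give unrelated-looking hashes.
h1 = secure_hash(b"pay Alice 5 coins")
h2 = secure_hash(b"pay Alice 6 coins")
print(hex(h1))
print(hex(h2))
print(h1 != h2)
```

The same input always gives the same number, and the number always fits in 256 bits no matter how much data goes in.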
Confusingly enough, “blockchain” refers to two different things: (1) a data structure, and (2) a “distributed ledger” based on the data structure. To avoid confusion, I’ll refer to these two things explicitly as either “the blockchain data structure” or “the distributed ledger”.
The blockchain data structure consists, unsurprisingly enough, of a chain of blocks of data. Each block refers to the block prior to it in the chain via the prior block’s secure hash. That means each block in the chain authenticates the previous block, going all the way back to the first block, aka “the genesis block”. There’s nothing particularly magic about this. In fact Git uses essentially the same kind of hash-linked data structure (usually described as a Merkle tree) to track all changes to a Git repository going back to the original commit.
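A minimal sketch of such a hash-linked chain might look like the following. The block layout (`prev_hash`, `data`) and the function names are hypothetical, chosen only to show the idea, and are not any real cryptocurrency's format.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block by serializing it deterministically and applying SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain: list, data: str) -> None:
    """Add a block that points at the previous block via its hash."""
    prev_hash = block_hash(chain[-1]) if chain else None  # genesis has no parent
    chain.append({"prev_hash": prev_hash, "data": data})

def chain_is_valid(chain: list) -> bool:
    """Check that every block's prev_hash matches the block before it."""
    for i, block in enumerate(chain):
        expected = block_hash(chain[i - 1]) if i > 0 else None
        if block["prev_hash"] != expected:
            return False
    return True

chain = []
append_block(chain, "genesis")
append_block(chain, "second block")
append_block(chain, "third block")
print(chain_is_valid(chain))   # True

chain[1]["data"] = "tampered"  # change history in the middle of the chain
print(chain_is_valid(chain))   # False: block 3 no longer matches block 2
```

Note that this check only catches changes to blocks that have a successor. To trust the newest block as well, you need to know its hash from somewhere else.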
The distributed ledger is how cryptocurrencies use the blockchain data structure to support transactions without any central authority. For Bitcoin, the blockchain data structure contains a list of every Bitcoin transaction ever. About every 10 minutes, millions of servers each gather all new transactions and place them in a block. Then the servers all compete to produce a hash of the new block with a large number of leading zeros. The way they do this is to put a random number in a special field (called the “nonce”) in the block, compute the hash of the block with the nonce, and stop if the hash has the required number of leading zeros. Otherwise they try again with a different nonce. Eventually, somebody comes up with a hash with the requisite number of leading zeros, and publishes their new block to the network. This process is called “mining”.
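The nonce search described above can be sketched in a few lines. This is a toy version under my own assumptions: it counts leading zero hex digits of a single SHA-256 hash, steps the nonce upward from zero instead of picking it at random, and uses a made-up block payload. Real Bitcoin mining differs in the details.

```python
import hashlib

def mine(block_data: bytes, difficulty: int):
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode("ascii")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest   # found a winning nonce
        nonce += 1                 # otherwise try again with a different nonce

nonce, digest = mine(b"block with new transactions", difficulty=4)
print(nonce)
print(digest)
```

Each extra zero digit required multiplies the expected number of attempts by 16, which is why finding a winning hash is expensive while checking one takes a single hash computation.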
Why do people spend huge amounts of money mining Bitcoin? Because the winning miner’s block includes one additional transaction paying 12.5 Bitcoin (approx $83,000) to the miner’s Bitcoin account.
Ethereum works like Bitcoin with the addition of something called “smart contracts”. These are small programs of the basic form “if A is true, pay B from account C to account D”. To keep from wasting CPU time on all the miners’ machines, contracts pay for CPU time with “gas”, a payment to the winning miner for a certain amount of compute time. If the contract doesn’t complete within the amount of gas allocated, it terminates. That’s pretty much it. There are lots more details, but those are the fundamentals.
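Here is a toy illustration of the gas idea. It is not how Ethereum's virtual machine is actually implemented: the flat cost of one gas unit per step, the `run_contract` function, and the account names are all assumptions made for the sketch.

```python
class OutOfGas(Exception):
    """Raised when a contract uses up its gas allowance before finishing."""

def run_contract(steps, state, gas_limit):
    """Run each step against a copy of `state`, charging 1 gas per step."""
    working = dict(state)   # work on a copy so a failed run changes nothing
    gas_left = gas_limit
    for step in steps:
        if gas_left <= 0:
            raise OutOfGas("contract terminated: gas exhausted")
        gas_left -= 1       # flat per-step cost (an assumption of this sketch)
        step(working)
    return working, gas_left

def pay_if_delivered(state):
    # "if A is true, pay B from account C to account D"
    if state["delivered"]:
        state["buyer"] -= 10
        state["seller"] += 10

start = {"delivered": True, "buyer": 100, "seller": 0}
end, gas_left = run_contract([pay_if_delivered], start, gas_limit=5)
print(end["buyer"], end["seller"], gas_left)   # 90 10 4

try:
    run_contract([pay_if_delivered] * 3, start, gas_limit=2)
except OutOfGas as err:
    print(err)              # the third step never runs
```

The point of the limit is that a contract stuck in a long or endless computation gets cut off after a bounded amount of work.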
Why blockchain won’t solve your problem.
1. Blockchains are a horribly inefficient way to store and retrieve data. We already have good, efficient, auditable ways of storing and retrieving data. They’re called databases. If you ran Bitcoin on a database instead of a distributed ledger, you could run Bitcoin’s transaction volume on a $50 Raspberry Pi. (But you’d probably want to use a few more for backup.)
2. The magical distributed nature of blockchain only works in the presence of huge payments and an associated huge waste of energy. Bitcoin’s network currently depends on an influx of about $12,000,000 per day to pay miners. The size of that reward has attracted a flood of miners. As a result, Bitcoin mining currently uses about as much electricity per day as the Republic of Ireland.
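The daily figure follows from numbers given earlier: one block roughly every 10 minutes, each paying the winning miner about $83,000. A quick back-of-the-envelope check, counting only the block reward and assuming exactly one block every 10 minutes:

```python
MINUTES_PER_DAY = 24 * 60
BLOCK_INTERVAL_MINUTES = 10       # one block roughly every 10 minutes
REWARD_USD_PER_BLOCK = 83_000     # 12.5 Bitcoin at the price quoted above

blocks_per_day = MINUTES_PER_DAY // BLOCK_INTERVAL_MINUTES
daily_reward_usd = blocks_per_day * REWARD_USD_PER_BLOCK

print(blocks_per_day)     # 144
print(daily_reward_usd)   # 11952000, i.e. about $12 million per day
```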
If you piggyback your solution on a cryptocurrency-based distributed ledger, you’re betting that greater fools will continue to buy cryptocurrency at a rate that allows miners to pay their hardware and electricity bills. If that ever stops happening, cryptocurrencies will collapse, and the miners will stop mining.
There’s also a thing called “private blockchain” which some people are touting. This means running a distributed ledger on your own servers. If you’re doing that, why not just run a database on your own servers? Sometimes a third party runs a private blockchain, possibly for parties who don’t trust each other to run it. In that case, why not have the third party run a real database?
3. Smart contracts are a train wreck. Smart contracts are live code that’s been signed and placed on the blockchain. That means that code that handles money is immutable, and can’t be updated.
The first massive smart contract fail was The Decentralized Autonomous Organization, or The DAO. A bunch of libertarians decided to create a venture capital fund from a smart contract. The DAO raised $150 million in 2016. Then somebody found a bug in the code, and used it to transfer a large chunk of that money to his account. (When that happened, the Ethereum developers rewrote the server code to back out that transaction. So much for decentralization and immutability.)
If we put US healthcare on the blockchain as smart contracts, we’ve just created a trillion-dollar bug bounty.
References
Nicholas Weaver’s excellent talk “Cryptocurrencies and Blockchains: Burn It With Fire” https://m.youtube.com/watch?v=xCHab0dNnj4
David Gerard’s book “Attack of the 50 Foot Blockchain”: an excellent, well-footnoted layman’s introduction to the topic.