![](https://www.snoyman.com/img/profile.jpg)
@ Michael Snoyman
2024-12-16 15:40:10
We all know and love our 12 (or 24) word seed phrases. They’re the basis of how most of us secure our Bitcoin (or other crypto holdings, more on that in a bit). Following up on a discussion of a recent Ledger hardware wallet phishing attack, I wanted to put together a quick post explaining how seed phrases work and the connection with other chains. Since most of the content is chain-agnostic, I'll mostly be talking about "blockchains" instead of Bitcoin.
This will be semi-technical, caveat emptor!
## Public key cryptographic primitives
Public key cryptography allows for two pairs of operations:
* Encrypting a message using a public key, and decrypting it using the corresponding private key.
* Signing a message using a private key, and validating the signature using the corresponding public key.
Encryption is very rarely used in blockchains. Instead, signing of messages is the primary feature we use from public key cryptography. There are lots of different algorithms out there, but a bit of pseudocode will explain the high level concepts pretty well:
```
myPrivateKey := something // we'll explain where this comes from below
myPublicKey := derivePublicKey(myPrivateKey)
someMessage := b"deadbeef" // any arbitrary binary payload
signature := signMessage(someMessage, myPrivateKey)
isValid := validateSignature(someMessage, signature, myPublicKey)
```
The point here is that I can safely share my public key with the rest of the world, and the rest of the world cannot figure out what the corresponding private key is (at least before the heat death of the universe, assuming the cryptography is well designed). Then, using my private key, I can prove that a message was sent by someone who controls that private key. And anyone in the world can validate it.
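To make the pseudocode concrete, here is a minimal sketch of the same operations using ECDSA over secp256k1, the curve Bitcoin uses. It relies only on the Python standard library, the function names simply mirror the pseudocode, and it is a toy: it is not constant-time or otherwise hardened, so treat it as an illustration rather than something to protect real funds with.

```python
import hashlib
import secrets

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over the prime field P).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        slope = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        slope = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (slope * slope - a[0] - b[0]) % P
    y = (slope * (a[0] - x) - a[1]) % P
    return (x, y)


def point_mul(k, point):
    """Multiply a point by an integer using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def message_hash(message):
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def derive_public_key(private_key):
    return point_mul(private_key, G)


def sign_message(message, private_key):
    z = message_hash(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
        r = point_mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return (r, s)


def validate_signature(message, signature, public_key):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = message_hash(message)
    w = pow(s, -1, N)
    point = point_add(point_mul(z * w % N, G), point_mul(r * w % N, public_key))
    return point is not None and point[0] % N == r


my_private_key = secrets.randbelow(N - 1) + 1  # just a random number below N
my_public_key = derive_public_key(my_private_key)
some_message = b"deadbeef"
signature = sign_message(some_message, my_private_key)
is_valid = validate_signature(some_message, signature, my_public_key)
print(is_valid)
```

Notice that the private key is nothing more than a big random number, which is exactly why the next question (where that number comes from) matters so much.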
For the blockchain, this is the basis for signing transactions and sending funds. If I have 2 BTC, and I want to send 1 BTC to Alice, I can sign a transaction using my private key, and the network will know that the owner of the 2 BTC created that transaction.
You might challenge this, and say that the private key could be hacked. This is true, and people lose funds like this all the time. This is another "feature" of the blockchain: whoever controls the keys controls the funds. Thus the phrase: not your keys, not your coins. This is a big difference between blockchain and TradFi (traditional finance). If someone hacks into my bank account and sends all my money to someone else, I'll probably be able to get the money back. On the blockchain, if they have your keys, your money is gone.
The question is: where do the private keys come from? One possibility is to generate and store individual private keys for each and every wallet on each and every chain you ever use. But this would be a pain, and the worst case would be Bitcoin itself, where (for privacy reasons) we generally like to generate separate receiving and change addresses each time we receive and send funds. Writing down each of those private keys separately isn't practical. We need a better system.
## Seed phrases
A seed phrase is usually a set of 12 or 24 English words, taken from a dictionary of 2,048 words available (so each word encodes 11 bits). Seed phrases are specified by BIP-39, if you want to look up more details. Using that set of words, you can generate a large number. We call this large number the _entropy_, or the source of random data, that we use in the rest of our operations. In particular, this large number can be used to derive a private key using _derivation paths_.
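As a rough sketch of the "words to large number" step: strictly speaking, BIP-39 distinguishes the entropy the words encode (128 bits for 12 words, 256 bits for 24, plus a short checksum) from the 512-bit seed that wallets derive keys from. The seed comes from stretching the phrase with PBKDF2-HMAC-SHA512. The helper names below are made up for illustration, and the sketch skips the wordlist and checksum validation a real wallet performs.

```python
import hashlib
import unicodedata


def entropy_bits(word_count):
    """Each word carries 11 bits; 1 bit in every 33 is checksum, the rest entropy."""
    total_bits = word_count * 11
    checksum_bits = total_bits // 33
    return total_bits - checksum_bits


def mnemonic_to_seed(mnemonic, passphrase=""):
    """BIP-39 seed: PBKDF2-HMAC-SHA512, 2048 rounds, salt "mnemonic" + passphrase."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# A widely published test phrase. Never use it to hold real funds.
phrase = "abandon " * 11 + "about"
seed = mnemonic_to_seed(phrase)

print(entropy_bits(12), entropy_bits(24))
print(len(seed) * 8, "bit seed:", seed.hex())
```

The optional passphrase is sometimes called the "25th word": the same 12 or 24 words with a different passphrase give a completely unrelated seed.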
Derivation paths come from a different standard, BIP-32, which describes how to derive a whole tree of keys from a single seed. These look like `m/44'/118'/0'/0/0`. You can see more information on this particular layout [in BIP-44](https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki). But the basic idea is that you can take that big number from the seed phrase and generate a large number of different private keys.
The different numbers in the derivation path indicate what path to use to produce a private key from this large number. A `'` after a number marks that component as _hardened_. The first number, 44, is the _purpose_, and indicates which standard the path layout follows (here, BIP-44). The next number is the _coin type_. In the example above, I used `118`, which is the default coin type for the Cosmos ecosystem. `60` is used in the Ethereum space, by contrast. And as the granddaddy of them all, Bitcoin holds the spot of honor at coin type 0. For a full list, check [SLIP-44](https://github.com/satoshilabs/slips/blob/master/slip-0044.md).
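To show how a path breaks down, here is a small sketch that splits a path into its components and labels them with the BIP-44 field names. The function names are hypothetical. The one real rule it encodes is from BIP-32: a hardened component is represented by adding 2^31 to the number.

```python
HARDENED_OFFSET = 2**31  # BIP-32 marks hardened indexes by setting the top bit

# Field names for the five levels of a BIP-44 path.
FIELD_NAMES = ["purpose", "coin type", "account", "change", "index"]


def parse_derivation_path(path):
    """Turn "m/44'/118'/0'/0/0" into a list of (number, hardened) pairs."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start with 'm'")
    components = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = int(part[:-1] if hardened else part)
        if not 0 <= number < HARDENED_OFFSET:
            raise ValueError(f"component out of range: {part}")
        components.append((number, hardened))
    return components


def to_bip32_index(number, hardened):
    """The 32-bit index that is actually fed into key derivation."""
    return number + HARDENED_OFFSET if hardened else number


for name, (number, hardened) in zip(FIELD_NAMES, parse_derivation_path("m/44'/118'/0'/0/0")):
    kind = "hardened" if hardened else "normal"
    print(f"{name}: {number} ({kind}), raw index {to_bip32_index(number, hardened)}")
```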
Other numbers in that list can be used for deriving a wider range of wallets within the same coin type. In particular, the last `0` is typically called the _index_, and can be used for creating _numbered accounts_. Some common use cases:
* Having a single seed phrase for bots, but allowing the bots to manage multiple wallets.
* Using a single hardware wallet (like a [Ledger](https://www.ledger.com/)) but managing multiple accounts.
* Generating a sequence of receiving and change addresses when interacting with the Bitcoin blockchain, in order to maintain some privacy.
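Here is a sketch of how one seed fans out into several unrelated keys, following the BIP-32 rules for the master key and for hardened children. To stay short it only walks hardened steps (down to the account level, `m/44'/118'/N'`), because the non-hardened steps at the end of a path also need elliptic curve math. It also ignores the astronomically unlikely invalid-key cases the spec tells you to skip. The function names are illustrative.

```python
import hashlib
import hmac

# Order of the secp256k1 group; private keys are numbers modulo N.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED_OFFSET = 2**31


def master_key(seed):
    """BIP-32 master key and chain code: HMAC-SHA512 keyed with "Bitcoin seed"."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def hardened_child(parent_key, chain_code, number):
    """Derive hardened child number' from a parent private key."""
    index = number + HARDENED_OFFSET
    data = b"\x00" + parent_key.to_bytes(32, "big") + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    child_key = (int.from_bytes(digest[:32], "big") + parent_key) % N
    return child_key, digest[32:]


# Seed from the first BIP-32 test vector; a real wallet would use its BIP-39 seed.
seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
key, chain_code = master_key(seed)

# Walk m/44'/118' and then derive two sibling accounts underneath it.
for number in (44, 118):
    key, chain_code = hardened_child(key, chain_code, number)
account_0, _ = hardened_child(key, chain_code, 0)
account_1, _ = hardened_child(key, chain_code, 1)

print(f"m/44'/118'/0' -> {account_0:064x}")
print(f"m/44'/118'/1' -> {account_1:064x}")
```

Changing any number in the path changes the HMAC input, so the two accounts above share nothing visible beyond having come from the same seed.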
## bech32
bech32 is a standard for representing binary data. It's not worth going into the details of how bech32 works here, but the important point is that it provides for a _human readable part_ (HRP) and a payload, and that the payload includes a checksum to avoid transcription errors. bech32 is used throughout the Cosmos ecosystem for encoding addresses.
You can [use an online encoder](https://slowli.github.io/bech32-buffer/) to get a feel for this. As an example, if I specify an HRP of `snoy` and a payload of `deadbeef`, I get the wallet address `snoy1m6kmamcdn8yec`. Note that the `1` here represents the separation between the HRP and the encoded data.
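If you'd rather see the encoding than trust a web page, here is a compact sketch of bech32 encoding, adapted from the reference algorithm in BIP-173 (the original bech32 checksum, not the later bech32m variant). The payload bytes are regrouped into 5-bit values, a 6-character checksum covering both the HRP and the data is appended, and everything is mapped onto a 32-character alphabet.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    """The BCH checksum computation from BIP-173."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp, data):
    mod = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(mod >> 5 * (5 - i)) & 31 for i in range(6)]


def convert_bits(data, from_bits, to_bits):
    """Regroup values into to_bits-sized chunks, zero-padding the tail."""
    acc, bits, out = 0, 0, []
    max_value = (1 << to_bits) - 1
    for value in data:
        acc = (acc << from_bits) | value
        bits += from_bits
        while bits >= to_bits:
            bits -= to_bits
            out.append((acc >> bits) & max_value)
    if bits:
        out.append((acc << (to_bits - bits)) & max_value)
    return out


def bech32_encode(hrp, five_bit_values):
    combined = five_bit_values + create_checksum(hrp, five_bit_values)
    return hrp + "1" + "".join(CHARSET[v] for v in combined)


payload = bytes.fromhex("deadbeef")
print(bech32_encode("snoy", convert_bits(payload, 8, 5)))
```

In the output, everything before the `1` is the HRP, the next seven characters are `deadbeef` regrouped into 5-bit values (`m6kmamc`), and the final six characters are the checksum.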
The question is: what data should we use for the payload? The obvious answer there would be the public key. Unfortunately public keys are large, and embedding an entire public key in the wallet address wouldn't scale well. Instead, we use a more complicated algorithm in most blockchains, including Bitcoin. You can [see more information on the bech32 Bitcoin wiki page](https://en.bitcoin.it/wiki/Bech32), which involves compressing keys and hashing with both the SHA256 and RIPEMD-160 algorithms.
## Bitcoin and other chains
If you use most blockchain software (hardware wallets, software wallets, etc.), they'll use the standard coin types for the chain in question. You can always override that, and you can also choose to modify the index or any of the other numbers in the derivation path. The result will be a completely different private key, and therefore a completely different public key. If you use hardened components in your derivation path, it should be impossible for someone to find a connection between two generated public keys, thus giving you the privacy guarantees most Bitcoin wallet software provides with extra receiving and change addresses.
This is also where xpub comes in with Bitcoin. For non-hardened subpaths, you can generate the full sequence of public keys from the xpub value. This allows you to get a read-only wallet that can generate all of the receiving and change addresses, without having to compromise your seed phrase or any private keys.
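Here is a sketch of why that works, following the non-hardened derivation rules from BIP-32. The wallet holding the private key and a watch-only wallet holding just the parent public key and chain code arrive at the same child public keys. A real xpub is a Base58Check string bundling that public key and chain code with some metadata; this sketch uses the raw pair instead, and the function names are illustrative.

```python
import hashlib
import hmac

P = 2**256 - 2**32 - 977  # secp256k1 field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        slope = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        slope = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (slope * slope - a[0] - b[0]) % P
    return (x, (slope * (a[0] - x) - a[1]) % P)


def point_mul(k, point):
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def serialize_point(point):
    """Compressed public key: parity byte followed by the x coordinate."""
    prefix = b"\x02" if point[1] % 2 == 0 else b"\x03"
    return prefix + point[0].to_bytes(32, "big")


def tweak(parent_pub, chain_code, index):
    """The HMAC step shared by both sides; it needs only public data."""
    data = serialize_point(parent_pub) + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def child_private(parent_priv, chain_code, index):
    t, child_chain = tweak(point_mul(parent_priv, G), chain_code, index)
    return (parent_priv + t) % N, child_chain


def child_public(parent_pub, chain_code, index):
    t, child_chain = tweak(parent_pub, chain_code, index)
    return point_add(point_mul(t, G), parent_pub), child_chain


# Parent key material, taken here from the first BIP-32 test vector seed.
digest = hmac.new(b"Bitcoin seed", bytes.fromhex("000102030405060708090a0b0c0d0e0f"), hashlib.sha512).digest()
parent_priv, chain_code = int.from_bytes(digest[:32], "big"), digest[32:]
parent_pub = point_mul(parent_priv, G)  # this plus chain_code is the "xpub" side

for index in range(3):
    from_private = point_mul(child_private(parent_priv, chain_code, index)[0], G)
    from_public = child_public(parent_pub, chain_code, index)[0]
    print(index, serialize_point(from_public).hex(), from_private == from_public)
```

Hardened children can't be derived this way because their HMAC input includes the parent private key, which the watch-only side never sees.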
## The Ledger phishing attack
Now that we've seen all that, let's come back to the Ledger phishing attack I linked at the beginning of this article. The way a Ledger hardware device is supposed to work is that the seed phrase and any generated private keys never leave the device. Instead, the private key is generated on the device and used to sign messages your computer sends over. If that's true, even if you are scammed out of some funds by mistakenly signing a message you shouldn't have, the damage is limited to that one message. Your seed phrase and private key should be safe.
We also know that the Ledger devices don't fully work this way. The firmware on the device does in fact allow exporting the seed phrases for Ledger's recovery system. However, that should be a feature that needs explicit buy-in from the device's owner to activate.
So how exactly did the user in question lose funds on the Bitcoin chain because of a phishing attack on Ethereum? I personally can only see three choices:
1. The user is lying and stole the funds himself
2. The user screwed up and lost their seed phrase and doesn't realize it
3. There's a fundamental security flaw in Ledger which allows the private keys and/or seed phrases to be taken off the device
My bet is on (2), simply because it's simpler to assume that someone made a mistake than that a device used to secure likely billions of dollars in funds has been compromised without massive reports of scams. But that's little more than conjecture on my part.
So should you ditch your Ledger and move your funds to a different hardware wallet? I wouldn't do that yet. What I _would_ recommend, however, would be to upgrade your Bitcoin storage from a single hardware wallet to a multisig for better security. I'll put out another article on that process another time.