Storing a key

  • Input password and name from user. The name is a combination of their own name and a more obscure name (such as the name of their high-school sweetheart).
  • Get the keyid of the key. (This can be any public value unique to the private key, eg a gpg keyid. It's only used to allow storing multiple keys under a given name. If a gpg public key is not on the keyservers, or the key is not a gpg key, "" can be used for the keyid.)
  • Generate N by argon2(name, salt=keyid), tuned to take 10 minutes.
  • Generate N1-N3 by sha256 of N concatenated with 1, 2, and 3 respectively.
  • Generate decryption puzzle P, a byte chosen at random.
  • Generate K by argon2(password, salt=name+P), tuned to take 0.195 minutes.
  • AES encrypt (data + checksum) with K as the key.
  • Shamir the encrypted key with N=2, M=3, yielding S1-S3.
  • Servers reject attempts to store an object under a name that is already in use.
  • Servers do not allow enumerating all objects stored, and require a proof of work to handle any request.
  • Upload S1-S3 to separate servers under the names N1-N3. If any of the uploads is rejected as name already in use, ask user to enter a different name or password.

So, storing a key takes 10 minutes.
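
As a concrete illustration, here is a minimal Python sketch of the storing flow, using the argon2-cffi and cryptography libraries. The shamir_split helper and the upload step are hypothetical stand-ins, and the argon2 time_cost values are placeholders, not the actual 10 minute / 0.195 minute calibrations described above.

    import os
    import hashlib
    from argon2.low_level import hash_secret_raw, Type
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def argon2_hash(secret, salt, time_cost):
        # argon2 requires a salt of at least 8 bytes, so a real
        # implementation would pad or hash short salts (eg an empty keyid).
        return hash_secret_raw(secret, salt.ljust(8, b'\0'),
                               time_cost=time_cost, memory_cost=65536,
                               parallelism=4, hash_len=32, type=Type.ID)

    def store_key(password, name, keyid, data):
        n = argon2_hash(name, keyid, time_cost=1000)       # the "10 minute" hash
        names = [hashlib.sha256(n + str(i).encode()).hexdigest()
                 for i in (1, 2, 3)]                       # N1-N3
        p = os.urandom(1)                                  # decryption puzzle P
        k = argon2_hash(password, name + p, time_cost=20)  # the "0.195 minute" hash
        checksum = hashlib.sha256(data).digest()
        # K is single-use, and no nonce or header may be stored alongside
        # the ciphertext, so a fixed zero nonce is used.
        enc = Cipher(algorithms.AES(k), modes.CTR(b'\0' * 16)).encryptor()
        ciphertext = enc.update(data + checksum) + enc.finalize()
        shards = shamir_split(ciphertext, needed=2, total=3)  # hypothetical helper
        # Upload each shard to a separate server under its name (not shown);
        # if a name is already in use, prompt for a different name or password.
        return list(zip(names, shards))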

Recovering a key

  • Input password and name from user.
  • Calculate N and N1-N3.
  • Request N1-N3 from servers until two objects are available.
  • Shamir recombine the objects.
  • Guess a value for P.
  • Generate K by argon2(password, salt=name+P)
  • AES decrypt
  • Repeat with new P until checksum verifies.

This takes 10 minutes to calculate N, plus on average 128 guesses of P. Total recovery time varies from 10 minutes to 60 minutes, with an average of 35 minutes.
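
A matching sketch of recovery, reusing argon2_hash from the storing sketch above; download_object and shamir_combine are hypothetical stand-ins:

    import hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def recover_key(password, name, keyid):
        n = argon2_hash(name, keyid, time_cost=1000)       # the 10 minute step
        names = [hashlib.sha256(n + str(i).encode()).hexdigest()
                 for i in (1, 2, 3)]
        found = [s for s in map(download_object, names) if s is not None]
        ciphertext = shamir_combine(found[:2])             # hypothetical helper
        # Guess the puzzle byte P until the embedded checksum verifies.
        for p in range(256):
            k = argon2_hash(password, name + bytes([p]), time_cost=20)
            dec = Cipher(algorithms.AES(k), modes.CTR(b'\0' * 16)).decryptor()
            plaintext = dec.update(ciphertext) + dec.finalize()
            data, checksum = plaintext[:-32], plaintext[-32:]
            if hashlib.sha256(data).digest() == checksum:
                return data
        raise ValueError("wrong password, or corrupt shards")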

Difficulty of brute forcing a single encrypted key

  • Assume we know the name and keyid, or have otherwise found a way to determine the shards of a key. Download and recombine the shards.
  • Guess a password.
  • For each possible value of P, AES decrypt with K = argon2(password, salt=name+P), and check if checksum verifies.
    This takes 0.195 minutes * 256 = 50 minutes total.
  • Repeat for next password.

So, for a password with N bits of entropy, the number of CPU-years of work to crack it is: 2^(N-1)*50/60/24/365

  • Strong password (50 entropy): 53553077761 CPU-years
  • Weak password (30 entropy): 51072 CPU-years
  • Super-weak password (19 entropy): 25 CPU-years
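
These figures can be reproduced with a quick calculation, using the round figure of 50 minutes per password from above:

    # Average CPU-years to crack a password of a given entropy,
    # at 50 minutes (0.195 * 256) per password tried.
    for bits in (50, 30, 19):
        guesses = 2 ** (bits - 1)              # average number of guesses
        cpu_years = guesses * 50 / 60 / 24 / 365
        print(bits, cpu_years)                 # ~5.4e10, ~5.1e4, ~25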

So, if an attacker is able to find the right shards for a secret key, it's feasible for them to crack super-weak and weak passwords, assuming the secret key is worth the cost of doing so. Stronger passwords quickly become infeasible to crack.

Attack methods

An attacker who wants to target a particular person can guess the name they used, derive N1-N3, download two of S1-S3 and start brute forcing the password soon after the object is stored. This is the most likely attack method, so any user who could potentially be targeted like this should choose a strong password. A name that attackers are unlikely to guess prevents this attack, which is why keysafe prompts for not only the user's name, but also a more obscure name. Each name guess that the attacker makes takes 10 minutes of CPU time to generate N, as well as whatever proof of work the servers require.

The sharding prevents a single malicious server from blindly guessing weak passwords across its entire collection of objects. It takes two servers colluding to try to recombine their shards.

If recombining two shards yielded data which could be checked to see if it's valid, then it would become fairly inexpensive to try all combinations of shards, and obtain all the encrypted keys for further cracking. So it's important that there not be any checksum or header in addition to the AES encrypted data. (AES encrypted data cannot be distinguished from random garbage except by block size.) Get that right, and with N keysafe users, an attacker would need to try on average 2^(N-1) combinations of shards to find one, and would need to brute force the password of each combination. With only 20 keysafe users, assuming all users have super-weak passwords, this attack takes 13107200 years of CPU work (2^19*25) to crack one. With 50 users, this jumps to quadrillions of years of CPU work.

Colluding servers can try to correlate related objects based on access patterns and recombine pairs of those, and then brute force the password. The correlation needs to be very fine-grained for this to work. If 15 users' objects are all bucketed together by the correlation, then the attacker has 16384 combinations to try on average before finding a correct combination. Multiply by 5 years CPU work for cracking only super-weak passwords.

Colluding servers who want to target a particular person can guess their N1-N3, check if those objects exist on the server, and begin brute-forcing the password. This is not much cheaper than the same attack performed by a third-party attacker, except that the attacker doesn't need to wait for an object to download from the server to check if it exists.

A state-level entity may try to subpoena the entire contents of keysafe servers. Once 2 servers are compromised, the state-level entity can try the same attacks that colluding servers can use (but probably cannot use attacks involving correlation, because the server operators should not be retaining the data needed for correlation). Of course, they probably have many more resources to throw at the problem. But with only 50 keysafe users, recombining shards and trying super-weak passwords would be prohibitively expensive, as detailed above. So, a state-level entity will probably only find it feasible to target particular people. Since such an attack can be performed by anyone as discussed above, there seems to actually be no incentive for a state-level entity to subpoena data.

A state-level entity's best bet at getting lots of keys is probably to use their resources to compromise keysafe servers, and modify them to log data. Then a correlation attack can be done as discussed above.

A different kind of attack: Legal/extralegal action to get a particular person's key removed from storage servers to try to deny them access to it. Or, to entirely take down storage servers.

Malicious data attack

Two servers could collude to serve up malicious data to try to exploit the user's system.

For example, if the user is using their gpg key to encrypt emails, and they restore a different gpg key, they might encrypt with it, and then what they said could be decrypted by the attacker.

To perform this attack, the attacker first has to manage to crack the user's password. Then they can replace the objects with malicious versions, encrypted with the same password.

So, this is not too useful for gpg key replacement, since the attacker must already know the secret key. However, perhaps they could exploit bugs in gpg to compromise the user's system.

Server list

There's a server list shipped with the client, giving each server's tor onion address and the organization providing it.

Three of the servers in the list are recommended servers. Shards are stored on these unless overridden by other configuration.

When recovering a key, the client tries the recommended servers first. But, if it fails to recover the key using those, it goes on to try other servers on the list. This way we don't need to remember which servers the shards were stored on, and can change the recommended servers when necessary.

See servers for more on the server list.

Servers

Servers run exclusively as tor hidden services. This prevents them from finding out the IP addresses of clients, as well as providing transport level encryption.

Servers should avoid keeping any logs, and should sanitize the timestamps of any files stored on them. (Eg, set timestamps back to the epoch and store on a filesystem mounted with noatime.)

Only small objects are accepted for storage. This prevents keysafe servers from being used as a general purpose data storage system; they are only useful for storing keys.

However, gpg --export-secret-key can be hundreds of KB in size when a key has a lot of signatures etc. While this size can be cut down to 4 KB using paperkey to only include the secret key, it's desirable to store the public and secret key together. This way, a user does not have to publish the key to keyservers, which makes some attack methods for cracking their key much less likely to be tried.

So, the object size should be at least a few tens of KB. 64 KB seems reasonable. If keysafe needs to store a larger key, it can chunk it up.

Objects should be padded to a fixed size before they are encrypted, to prevent attackers from correlating and targeting particular objects based on size.
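
One possible padding scheme is sketched below; the 4-byte length prefix and random fill are illustrative choices, not a specified format. Since padding happens before encryption, the prefix is never visible to the server.

    import os
    import struct

    OBJECT_SIZE = 64 * 1024

    def pad(data):
        # Length prefix, then the data, then random fill up to the fixed size.
        assert len(data) + 4 <= OBJECT_SIZE
        return (struct.pack('>I', len(data)) + data
                + os.urandom(OBJECT_SIZE - 4 - len(data)))

    def unpad(obj):
        (length,) = struct.unpack('>I', obj[:4])
        return obj[4:4 + length]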

Client-server Proof of Work

The Proof of Work prevents servers being flooded with requests. This is particularly important to prevent misuse of keysafe servers to store large data on them. It's also somewhat useful to prevent attackers guessing the name someone used to store a key; but the cost of generating N from a name makes the server's proof of work only a secondary line of defense against such an attack. Finally, PoW is useful to protect against DOS attacks.

Assuming that the client communicates with the server over http:

PUT /keysafe/objects/N
GET /keysafe/objects/N

The server's response can be either the result of the request, or a proof of work requirement, which specifies the difficulty D (number of 0's needed), random salt RS, and the number of argon2 iterations. The client provides the proof of work in a query parameter, which is a string S such that argon2(N,S+RS) starts with a given number of 0's.

(The server should only be able to control the number of iterations, not other argon2 parameters, to avoid forcing the client to use too much memory. Normally, the server will keep iterations small, probably 1, since it does need to calculate the argon2 hash once itself.)
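
A sketch of both sides of the proof of work, interpreting "number of 0's" as leading zero bits of the argon2 hash; the encoding and parameter choices here are assumptions:

    import os
    from argon2.low_level import hash_secret_raw, Type

    def pow_hash(n, s, rs, iterations):
        return hash_secret_raw(n, s + rs, time_cost=iterations,
                               memory_cost=65536, parallelism=4,
                               hash_len=32, type=Type.ID)

    def leading_zero_bits(h):
        bits = 0
        for byte in h:
            if byte == 0:
                bits += 8
            else:
                bits += 8 - byte.bit_length()
                break
        return bits

    def solve(n, rs, d, iterations):
        # Client: search for S such that argon2(N, S+RS) has D leading zero bits.
        while True:
            s = os.urandom(8)
            if leading_zero_bits(pow_hash(n, s, rs, iterations)) >= d:
                return s

    def verify(n, s, rs, d, iterations):
        # Server: a single argon2 hash checks the client's work.
        return leading_zero_bits(pow_hash(n, s, rs, iterations)) >= d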

The server can use a token bucket to throttle requests to a given rate. In fact, there can be a whole series of token buckets B0,B1.., for increasing difficulty proofs of work.

A request without a proof of work is checked in B0. If that bucket is empty, the server responds with a proof of work requirement D=1, and the client makes a new request whose proof of work allows access to B1. If the server is under load, B1 might also be empty, and so the client will have to try again with D=2 to access B2, and so on.

If there are 4 buckets, and each bucket refills at the rate of a token every minute, then the maximum allowed throughput is 4 requests per minute. If calculating a proof of work of difficulty D takes 2^D minutes on average, then it will take on average 16 minutes of work to access B4.
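
A sketch of the escalating buckets, with the refill rate from the example above:

    import time

    class Bucket:
        def __init__(self, capacity=1, refill_per_minute=1):
            self.capacity = capacity
            self.tokens = float(capacity)
            self.rate = refill_per_minute / 60.0
            self.last = time.monotonic()

        def take(self):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    buckets = [Bucket() for _ in range(4)]   # B0, B1, B2, B3

    def admit(difficulty):
        # A proof of work at difficulty D grants access to bucket B_D;
        # if that bucket is empty too, ask for difficulty D+1.
        return difficulty < len(buckets) and buckets[difficulty].take()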

The server can generate a different RS for each request, and can insert them into a bloom filter to keep track of the ones it has given out. Bloom filter false positives are not a problem here, because they are quite rare, so it is not efficient for a client to make up an RS in the hope of hitting a false positive.

To guard against DOS attacks that reuse proofs of work, the server can maintain a second bloom filter, putting each RS into it once it's used, and rejecting attempts that reuse an RS. Since the bloom filter will (with a low probability) yield a false positive, the server should respond to a rejected attempt by generating a new RS' and requesting a new proof of work from the client.
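
A sketch of the two bloom filters; the filter implementation here is a naive stand-in (a real server would size the filters for its request rate, and rotate them periodically):

    import hashlib

    class Bloom:
        def __init__(self, bits=2**20, hashes=4):
            self.m = bits
            self.k = hashes
            self.data = bytearray(bits // 8)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(bytes([i]) + item).digest()
                yield int.from_bytes(h[:8], 'big') % self.m

        def add(self, item):
            for p in self._positions(item):
                self.data[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.data[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

    issued = Bloom()   # RS values the server has handed out
    used = Bloom()     # RS values already spent

    def accept_rs(rs):
        # Reject an unknown or reused RS; on a (rare) false positive,
        # the server just issues a fresh RS' and a new challenge.
        if rs not in issued or rs in used:
            return False
        used.add(rs)
        return True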

Avoiding correlations

As noted above, the more objects that are stored, the more secure keysafe becomes, because it becomes harder to correlate objects that belong to a user.

Ways the server could draw correlations include:

  • Time that the data was stored.
    (Avoid by waiting some period of time between uploading the various objects, probably on the order of days. And if every keysafe client uploads its objects at midnight GMT, the server will find it hard to correlate them.)
  • IP address used to store the data.
    (Using tor avoids this.)
  • When the data was retrieved.
    (Avoid by waiting some period of time between retrieving shards. Although this makes key recovery take longer, which could be frustrating.)
  • A user ID associated with the data.
    (Avoid by allowing anyone to store data, and not associating data with any user ID.)
  • Same sized objects may be related shares.
    (Avoid by making all stored objects a standard size.)

Detecting corrupt data

ori> if a single server is compromised, it can return bogus data when you request its fragment of the shared secret
ori> if you only have three servers, you can't know which two to trust, short of just trying
ori> you end up with three possible ways to reconstruct the secrets, A-B, A-C, B-C. only one is legit.

This could also happen due to error, not compromise. Or even due to asking the wrong server for an object and getting back a dummy response.

To guard against this, include a sha256sum of the secret key in the data that is sharded. This way the validity of the restored key can be verified.

Note that this may make it marginally easier for brute force attacks, since they can use that checksum to detect when the attack is successful. Only marginally easier because it's not hard to fingerprint a gpg secret key. Even the minimised raw key output by paperkey can be recognised by a parser.
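
A sketch of how recovery can route around one bad shard, trying each pair in turn; shamir_combine is the hypothetical helper from earlier, and try_decrypt stands for the puzzle-guessing decryption loop, returning the data only if the embedded checksum verifies:

    from itertools import combinations

    def recover_from_shards(shards, try_decrypt):
        # shards: the three downloaded objects (one possibly corrupt).
        for pair in combinations(shards, 2):
            data = try_decrypt(shamir_combine(list(pair)))
            if data is not None:
                return data
        raise ValueError("no pair of shards yielded a valid key")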

Versioning

The algorithms and parameters chosen by keysafe could turn out to be wrong, or need adjusting to keep up with technology. While we'd like to avoid this by getting the design right, it's important to have a plan in case it becomes necessary.

The simplest approach would be to include a version number with each shard. But, this causes a problem: When switching to a new version, the early adopters of that version are a small group, and so it becomes easier to correlate their shards.

The version number could instead be included in the data that is sharded. This avoids the above problem, but leads to an even worse problem: An attacker can look for the version number after combining some random shards, and if they see it, they know they picked shards that are actually related. As described in "Attack methods", if an attacker can do that, it becomes easy for two colluding servers to find the right combinations of all shards.

A safe alternative to embedding a version number anywhere in the data is to adjust the parameters of the argon2 hash used to generate the shard names. Maintain a list of versions and their shard name argon2 parameters (as well as other parameters etc). During key recovery, keysafe can try different versions in turn, use argon2 to generate the shard names, and see if those shards can be downloaded. Once it finds shards, it knows the version that created them, and the other parameters of how the data is encoded.

The only downside to this method is that each new version added to the list makes recovering data from all older versions take 10 minutes longer, as the argon2 hash has to be run an additional time.

Note that the argon2 hash parameters to vary between versions should not be merely the number of rounds, as that would allow an attacker to hash for each version's number of rounds in one pass. Instead, vary the hash memory parameter.
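
A sketch of version probing along these lines; the version table and its memory_cost values are placeholders, and download_object is the hypothetical helper from earlier:

    import hashlib
    from argon2.low_level import hash_secret_raw, Type

    VERSIONS = [
        # (version, argon2 memory_cost used for shard name derivation)
        (2, 262144),
        (1, 131072),
    ]

    def find_version(name, keyid):
        for version, memory_cost in VERSIONS:
            n = hash_secret_raw(name, keyid.ljust(8, b'\0'),
                                time_cost=1000, memory_cost=memory_cost,
                                parallelism=4, hash_len=32, type=Type.ID)
            names = [hashlib.sha256(n + str(i).encode()).hexdigest()
                     for i in (1, 2, 3)]
            shards = [download_object(nm) for nm in names]
            if sum(s is not None for s in shards) >= 2:
                return version, shards
        return None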

Ideas

Some ideas for improvements to keysafe.

Assisted Password-based Key Derivation

An idea from Nigori:

If the user has an account at some server, the server can contribute part of the data used to generate the key encryption key K. So not only is the user's keysafe password needed for key recovery, but the user has to authenticate to the server. As well as adding entropy to K, the server can do rate limiting to prevent password guessing.

Risks include:

  • The server becomes a target to be compromised.
  • If the server goes down, the user loses the ability to recover their gpg key from keysafe.

Entangled objects

An idea from Anthony Towns:

Two or more objects stored on a keysafe server can be entangled. When one of them is requested, the server should delete the others.

This could be used in several ways:

  • Keysafe could upload random objects entangled with a key's object, and keep local records of the names of the random objects. Then if the user wants to stop storing a key on keysafe servers, keysafe can request the entangled objects to (hopefully) delete the real objects from the server.
  • Keysafe could pick or prompt for some tripwire names that an attacker might try if they were looking for a user's data. Then an attacker risks deleting the data stored on the server. Although this also has DOS potential.
  • The user could pick a real name and fake name and keysafe uploads objects for both. If the user is being forced to give up their keysafe name and password, they could provide the fake name, and if it were used, their data would get deleted from the keysafe servers.

Better object-id derivation

An idea from Ben M:

I was the fellow who mentioned using an HMAC instead of append-index-and-hash to generate the object-ids in keysafe.

That's probably an okay approach if you need to bind the output to a particular input string, but on reflection (unless I missed something) it would be equivalent for keysafe to take a stream and chop it up, then just "number" the chunks sequentially.

In that case, the "most correct" choice would probably be HKDF (RFC5869 [1]). Specifically, the second part of HKDF -- "HKDF-Expand".

(The first part, HKDF-Extract, is appropriate to apply /before/ key stretching, but stretching itself serves much the same purpose -- removing "structure" from the input key. Especially given that Argon2 is designed specifically to handle user passwords, I expect that HKDF-Extract is entirely unnecessary here.)

HKDF is what TLS 1.3 will use to expand its per-session master keys into individual keys for encryption and MACing [2], and AFAIK is generally considered The Right Way to generate a stream of distinct keys from a master key, where the compromise of any key should not permit derivation of the others.

So, um. Pretend I never mentioned HMAC, but spruiked HKDF instead :)

(Of course, this is pretty much bikeshedding. A first pre-image attack on SHA-2 in the near term would be a rude shock, and a full break would break HKDF too. But HKDF may prove more robust in the face of partial breaks, giving more time to move everyone to a new hash or scheme.)
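
For concreteness, a sketch of deriving shard names with HKDF-Expand as Ben suggests, using the cryptography library and treating the argon2 output N as the already-stretched master key; the info labels are illustrative:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDFExpand

    def shard_names(n, count=3):
        # "Number" the shards via the info parameter; compromising one
        # derived id does not reveal N or the other ids.
        return [HKDFExpand(algorithm=hashes.SHA256(), length=32,
                           info=b"keysafe shard %d" % i).derive(n)
                for i in range(1, count + 1)]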