Upspin Security
Introduction
The security design of Upspin has been sketched in the accompanying
Upspin overview document.
Here we dive into the deeper security issues.
Some of the discussion may be of interest only to experts, but the general
design should be understandable by anyone given background provided in the
referenced links.
When running the directory and storage servers on public cloud infrastructure
Upspin attempts to provide:
- confidentiality and integrity protection of content even against advanced
attackers, and
- protection of metadata against network attackers, but not against
due legal process upon the cloud provider.
Concerned users would run their directory server in a private cloud.
Upspin’s security model assumes that the Client endpoint platform is secure.
Additional trust discussion is in the Server Management section below.
Upspin-specific Storage
In our design, Alice (which is to say, Upspin client software run by Alice) shares a
file with Bob by picking a new random
symmetric key,
encrypting a file, wrapping the symmetric encryption key with Bob’s
public key,
signing the file using her
own elliptic curve
private key, and sending the ciphertext to a storage server and metadata to a
directory server.
The specific ciphersuite used is selectable and defaults to P-256 for the
elliptic curve algorithm, AES-256 for data encryption, and SHA-256 for
checksums.
The entire system is written in Go, is open-source, and uses Go’s standard
cryptographic packages.
The basic idea is to choose a random number as an encryption key K, use AES to
encrypt the data, and store the encrypted data in the storage server.
Then, for each potential reader of the file, we encrypt K using that user’s public key.
We store the set of encrypted keys in the DirEntry of the item along with a digital
signature of the data. To read the data, the reader looks in the DirEntry for the
appearance of K that was encrypted with their public key, decrypts it to recover K, and
then uses K to decrypt the data.
The next few paragraphs explain this process in detail for security experts and
can be skipped by less dedicated readers.
To store a file “pathname”, Alice obtains a fresh 256 bit random “dkey” and
XORs the file with an AES-CTR bitstream with IV=0.
The ciphertext is sent to the storage server.
The storage server returns a cryptographic location string, called a reference,
that we assume may safely be given to anyone and used to retrieve the
ciphertext.
A username list {U} is assembled including Bob, Alice, and any others granted
read access to items in the path name’s directory.
Alice looks up each of the username’s public key P(U) from (a local cache of) a
centralized KeyServer
running at key.upspin.io
.
Alice wraps dkey for each reader, annotated with a hash of that user’s public
key.
(Alice shares with others using ciphersuites she considers adequate, say
{p256,p384,p521}, though her own key may be p384.
If Bob picks an RSA 1024 key, she’ll decline to wrap for him.)
Keys are wrapped as in NIST 800-56A rev2 and RFC6637 §8 using ECDH.
Specifically, Alice creates an ephemeral key pair v, V=vG based on the agreed
elliptic curve point G and random v.
Using Bob’s public key R, Alice computes the shared point S = vR.
A shared secret “strong” is constructed by HKDF of S and a string composed of
the ciphersuite, the hash of Bob’s public key, and a nonce.
Next, dkey is encrypted by AES-GCM using the key strong.
This yields a wrapping
W(dkey,U) = {sha256(P(U)), nonce, V, aes(dkey,strong)}
which Bob can unwrap by looking through the list for his public key hash, then
computing S = rV using his private key r, then reconstructing the strong shared
secret via HKDF, and finally AES-GCM decrypting to recover dkey.
Using her private key, Alice signs
sha256("ciphersuite:pathname:time:dkey:ciphertext")
By signing, Alice ensures that even a reader colluding with upspin servers
cannot change the file contents undetected.
Alice is only claiming that she intended to save those contents with that
path name, not that she necessarily is the original author or even that the
contents are harmless; in this regard, we’re adopting the same semantics as
“owner” in a classic Unix filesystem.
We do not insist that Alice bind her name inside the file contents,
only inside the directory entry.
It is cryptographically possible that two authors of a file could each have
their own equally valid directory entries pointing to the same storage blob.
However, unlike with some content-addressable storage systems, if two
individuals write the same cleartext, it will almost certainly be encrypted
with different keys and thus be stored twice, once for each encryption.
The list of readers for key wrapping is taken from the read access list
described in the Access Control document.
When that list changes, wrapped keys should be removed for the dropped readers
and extra wrapped keys made for the added readers.
The directory server assists with this work queue, but needs cooperation of the
owner’s Client to do the actual wrapping for new readers.
This lazy update process can also handle readers’ public keys changing over
time, which helps users who have lost old keys.
It is inherent in the notion of a file archive that there is no perfect forward
secrecy.
However, a somewhat similar effect is achieved by this update process.
The path name, revision number, encrypted content location, signature, and
wrapped keys are the primary metadata about a file stored by the directory
server.
Thus Alice reveals information to the directory server, particularly the
cleartext path names and the (public keys of the) people she is sharing with.
Also, to the extent that elliptic curves might be cryptographically weaker over
time than AES, Alice also depends on the directory server being unwilling to
distribute data to unauthorized people.
The random bit generation, file encryption, and signing/key-wrapping all are
done on the Client, not on any of the servers.
We intend that this
system provides end-to-end encryption verifiably under the exclusive control of
the end users.
This discussion is about a data-encrypting method, or in Upspin terminology, a packing,
that is called ee.
It uses NIST elliptic curves for end-to-end encryption, and is the default.
There are other packings available, notably eeintegrity
which is useful when one is willing to store signed cleartext.
The directory server needs to store its hierarchy of directory entries
somewhere.
(It is represented as a
Merkle tree, a tree of hash
values.)
The server uses the encryption scheme described above to store its data in the
storage server.
Key Management
An Upspin user joins the system by publishing a key to a central key server.
We believe a global collection of public key bindings is the best way to
promote easy sharing between strangers, and we think this need extends
beyond Upspin.
We’re running our own key server for the moment but anticipate converting to
Key Transparency
or whatever other strong system becomes most popular.
Our key server enables detection of tampering by
publishing a full, incrementally hashed transaction log at
https://key.upspin.io/log.
If you can confirm a friend’s public key some other way, compare it
to what is stored at key.upspin.io and
report to us and the public if you ever find a mismatch.
Compare the key.upspin.io/log hash you see with what your friend sees,
and report any discrepancy.
Watch for your own key in the log and report if there’s ever a change, even
momentary, that you did not initiate yourself.
You’ll be giving the rest of our users herd immunity.
As far as Upspin is concerned, a user is an email address, authenticated by an
elliptic curve key pair used for signing and encrypting.
We anticipate that the user will rotate keys over time, but we also assume that
they will retain all old key pairs for use in decrypting old content, and will
accept losing that access to that content if they lose all copies of their keys.
To generate a new key pair, a user executes keygen
and copies on paper the
128 bit seed as backup.
This seed is expressed as a proquint.
The keygen program saves the elliptic curve public and private keys, as decimal
integers in plain text files in the user’s home .ssh directory.
A user may “restore” keys to multiple devices including smartphones.
The public part of the key pair is stored in a file public.upspinkey,
conventionally in the directory $HOME/.ssh/ along with the user’s other keys.
The SHA-256 hash of that file is called the keyHash
and is used to identify
which readers have cryptographic access to data contents via encryption key
wrapping.
This file can safely be given to anyone, and is the material registered at the
key server.
The private part of the key pair is stored in a file secret.upspinkey,
also
in ~/.ssh/, and is read-protected to the user by normal file permissions (but
no extra passphrase).
Eventually, we envision that such secrets will be protected by hardware but
we’re starting with local file as more portable for initial deployment.
If you want some amount of hardware protection, use an encrypted filesystem or
Ironkey for ~/.ssh.
Older key pairs, both public and private parts, are stored in a file
secret2.upspinkey
.
Based on past experience with PGP, our choice of filenames is intended to help
the average user avoid the common mistake of confusing which information can be
freely shared and which needs to be carefully protected.
Key rotation happens in the following sequence of operations:
upspin cmd operation |
public,secret.upspinkey |
secret2.upspinkey |
keyserver |
signatures |
wraps |
initial key |
k1 |
- |
k1 |
k1, - |
k1 |
new key |
k2 |
k1 |
k1 |
k1, - |
k1 |
countersign |
k2 |
k1 |
k1 |
k2, k1 |
k1 |
rotate |
k2 |
k1 |
k2 |
k2, k1 |
k1 |
share -fix |
k2 |
k1 |
k2 |
k2, k1 |
k2 |
We do not anticipate that the keys used here will be used for any other
purpose, and we’ve chosen proquint as an obscure technology to promote that
independence.
We therefore do not think there are any viable protocol interleaving attacks.
With secret.upspinkey,
we follow Chrome’s password-manager reasoning that if
the user does not have encrypted disk storage or is not in exclusive control of
their home directory, they have lost the security game anyway and there is
nothing meaningful we can do to protect them.
As with Chrome, we realize this will be a controversial position.
We look forward to adopting some Security Key or other hardware-protected
private key storage.
There are no passwords in our system and we don’t intend to have any.
Key pairs have three representations:
1. string, used for storage and between programs like User.Lookup
2. ecdsa, internal binary format for computation
3. a secret seed sufficient to reconstruct the key pair
In form 1, the first bytes describe the packing name, e.g. “p256”.
In form 2, there is a Curve field in the struct that plays that role.
Form 3, used only in cmd/upspin/keygen.go, is simply 128 bits of entropy
expressed as proquints.
Although we’re using AES 256 for bulk encryption to promote long-term
interoperability, the default client uses only 128 bits of entropy in
generating the elliptic curve key pair.
That bit length was chosen to make the secret seed small enough for
ordinary people to be willing to write down.
Safe backup of the key is a long-term risk of all archival encryption.
We’ll see if the mental model of protecting a secret on paper
works in real life.
It seems 128 bits of entropy is good enough, at least until practical
implementations of Grover’s algorithm come along,
and by then we’ll have to replace elliptic curves anyway.
By collecting all the private key operations into the factotum package, we are
providing for an isolated implementation, as in qubes-split-gpg or ssh-agent.
Server Management
We’re currently running our storage server (for encrypted bulk file content),
directory server (for metadata), and key server (for keys and location of
directory server) on Google Cloud Platform at domain name upspin.io
.
A user connects to these servers by HTTPS, implicitly using TLS 1.2.
To identify the user accessing any Upspin server, the RPC framework presents an
authentication request signed with the user’s private key.
This protocol guarantees that only registered Upspin users can access Upspin
services.
(Reads from the key server do not require authentication.)
Administrators of storage and directory servers can use the authenticated user
name to restrict write access to a subset of all Upspin users.
An instance of the default storage server maintains a list of
users permitted to store blocks on the server.
The upspin.io
servers use certificates from LetsEncrypt.
You may use the default system Root CA list, or specify tlscerts
in your
~/upspin/config
pointing to a directory with just DST_Root_CA_X3.pem
.
Implicit in the cryptographic discussion earlier is the fact that a directory
server administrator can read any file name, the writer, and the list of readers.
This is roughly equivalent to using PGP inside
an email system like Gmail:
very few attackers can reach the metadata,
but a rogue insider or law enforcement with judicial oversight would be
able to.
As mentioned in the introduction, a concerned user could choose
to run the directory server on their own machine.
For brevity, let us say a “bad directory server” is one that has been
compromised or is malicious or is compelled under legal process or
simply has bugs.
Besides observing metadata, a bad directory server can cause harm
by returning an incorrect Access file to the client.
Access files are signed by the owner, but replay is possible;
this might yield a stale list of readers or other permissions.
(Similarly, the directory server could serve a stale signed DirEntry.)
In addition to checking signatures, the client confirms an Access file
is in the path from the current directory up to the root to limit the
damage of a malicious directory server returning the wrong result from
a call to WhichAccess
.
A cautious owner should not place private directories inside public directories.
To prevent a bad directory server from returning fraudulent
directory entries that would be undetectable by upspinfs,
all the packings at a minimum include a signature by the writer
of the path name, packing, and timestamp.
Plain packing does only this minimum,
with no signature or encryption on the content,
to simplify implementation of lightweight dynamic file systems
as might be associated with devices such as cameras.
Finally, while the backup properties of Upspin improve on most people’s file
systems today, a bad directory or storage server can
certainly wreak havoc through deletion.
Writing a file to a storage server reveals the creation time and the file size,
but nothing else.
Thus we expect even very cautious users can enjoy
the availability advantages of public cloud storage.
If they prefer,
they can run the Upspin storage server code off their own local disk.
Alternative Designs
The design space has many choices, offering different protections.
Some ask why the directory server has access to cleartext filenames.
It looked complicated to provide the API we do while somehow wrapping
encryption keys for filenames that could be extracted by each client
that needed them.
Glob then has to then be implemented on the client, which adds even
more complexity when done not by the file tree owner but by a client
with permission only to parts of the tree.
Homomorphic encryption approaches either don’t support full glob or
are very complicated themselves.
In practice, the user can pick obscure filenames for special cases.
The more challenging information leak from a bad
directory server is the list of reader
accounts that you’ve shared your file with.
There are also some things that could be done about that,
adding cost and complexity.
Google Cloud has decent security and also pushes back against
overly broad warrants, so we believe running the directory server
in the cloud is an acceptable risk.
If you worry about this, run the directory server on
your own well-protected system.
Others ask why we depend on a central key server rather than some
distributed or federated system.
As mentioned before, we are not adamant about using our current
key server forever;
if a better solution comes along we would consider switching.
Any better system has to have at least the resistance our current
one does to undetected tampering or inconsistent responses.
Most of all, we welcome suggestions for how to make our system simpler.
For us, complexity bugs are a bigger fear than warrants.