couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "SignedDocuments" by JensAlfke
Date Mon, 09 Mar 2009 17:22:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The following page has been changed by JensAlfke:
http://wiki.apache.org/couchdb/SignedDocuments

The comment on the change is:
More detail on canonicalization and signature verification.

------------------------------------------------------------------------------
  
  The fields of `signature` are:
  
-  * `signed`: The digital signature itself (the output of the RSA algorithm, in this example),
encoded in Base64.
+  * `signed`: The digital signature itself (the output of the RSA algorithm, in this example),
encoded in base64.
-  * `digest`: A SHA-1 digest of the data being signed.
+  * `digest`: A SHA-1 digest of the data being signed, also encoded in base64.
   * `date`: The time the signature was generated.
   * `expires`: The number of seconds the signature remains valid after being generated.
   * `signer`: A nested object describing the "identity" (aka "principal" or "signer") that
generated the signature:
@@ -40, +40 @@

  
  == Generating and Checking Digests ==
  
- The tricky bit I glossed over is how to generate the `digest`. To do this we take a stream
of bytes and run it through an algorithm like SHA-1. What's the stream of bytes? The JSON
of the document, of course. But that's not including the nested `signature` object, since
the digest is generated ''before'' the signature. So anyone validating the signature has to
strip the `signature` block out of the document first.
+ The tricky bit I glossed over is how to generate the `digest`. To do this we take a stream
of bytes and run it through an algorithm like SHA-1. What's the stream of bytes? The JSON
of the document, of course. But that's not including the nested `signature` object, since
the digest is generated ''before'' the signature. So anyone validating the signature has to
strip the `signature` block out of the document first. And since the CouchDB server may add
metadata to the already-signed document when the creator uploads it, top-level keys prefixed
with "_" should also be ignored.
  
- Moreover, the same JSON object can be represented by different sequences of bytes, since
key/value pairs may be rearranged, whitespace added or removed, and different encodings used.
It's possible for the byte representation to change in transit, if the document is parsed
into a data structure and then re-serialized. This would prevent the recipient from being
able to verify the signature. So the signature has to be generated from a ''canonical representation''
of the JSON, which we can define as:
+ So the process of verifying the digest looks like this:
  
-  * UTF-8 encoding
+  # Remove the `signature` property from the document.
+  # Remove all other properties whose keys begin with "_".
+  # Serialize the result as canonical JSON (q.v.)
+  # Compute a SHA-1 digest of the resulting byte stream.
+  # Compare this with `signature.digest`.
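
The digest check above can be sketched in Python as follows. This is a minimal illustration, not the wiki author's implementation: the function names are hypothetical, and `canonical_json` is a simplified stand-in built on `json.dumps` that does not enforce every canonicalization rule (for example, it still escapes control characters).

```python
import base64
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Simplified stand-in (assumption): sorted keys, no whitespace, UTF-8.
    # It does not enforce the stricter escaping and number rules.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def strip_for_digest(doc: dict) -> dict:
    # Steps 1 and 2: drop `signature` and every top-level key starting with "_".
    return {k: v for k, v in doc.items()
            if k != "signature" and not k.startswith("_")}


def compute_digest(doc: dict) -> str:
    # Steps 3 and 4: canonical serialization, then SHA-1, encoded in base64.
    payload = canonical_json(strip_for_digest(doc))
    return base64.b64encode(hashlib.sha1(payload).digest()).decode("ascii")


def verify_digest(doc: dict) -> bool:
    # Step 5: compare the recomputed digest with `signature.digest`.
    signature = doc.get("signature")
    if not isinstance(signature, dict):
        return False
    return compute_digest(doc) == signature.get("digest")
```

Because "_"-prefixed keys are stripped before hashing, metadata added by the server after signing does not invalidate the digest in this sketch, while any change to the document's own fields does.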
+ 
+ If the digest is valid, the digital signature itself is verified using a similar technique:
+ 
+  # Start with the `signature` object.
+  # Remove the `signed` property.
+  # Serialize the result as canonical JSON (q.v.)
+  # Perform digital-signature verification on the resulting byte stream, using the `signed`
field and the public key.
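
A sketch of these signature steps in Python is shown below, under stated assumptions. The Python standard library has no RSA verification, so the cryptographic check is delegated to a caller-supplied `verify(payload, raw_signature)` callable, which is hypothetical and would wrap whatever crypto library and public key are in use. `canonical_json` is again a simplified stand-in based on `json.dumps`.

```python
import base64
import json
from typing import Callable


def canonical_json(obj) -> bytes:
    # Simplified stand-in (assumption): sorted keys, no whitespace, UTF-8.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def signed_payload(signature: dict) -> bytes:
    # Steps 1-3: take the `signature` object, remove `signed`, canonicalize.
    unsigned = {k: v for k, v in signature.items() if k != "signed"}
    return canonical_json(unsigned)


def verify_signature(signature: dict,
                     verify: Callable[[bytes, bytes], bool]) -> bool:
    # Step 4: decode the base64 `signed` field and hand both the payload and
    # the raw signature bytes to the caller's public-key verifier.
    try:
        raw_signature = base64.b64decode(signature["signed"], validate=True)
    except (KeyError, TypeError, ValueError):
        return False
    return verify(signed_payload(signature), raw_signature)
```

Passing the verifier in as a callable keeps the sketch independent of any particular RSA library; the bytes it receives cover `digest`, `date`, `expires` and `signer`, but not `signed` itself.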
+ 
+ Note that the key does not directly sign the document. This is so that the signature can
also encompass metadata like the creation and expiration dates. Also, it would be feasible
to separate the signature from the document entirely, and store it elsewhere, since its `digest`
field uniquely identifies the document it signs.
+ 
+ === Canonicalizing JSON ===
+ 
+ A single object can be represented by multiple different JSON strings, with different sequences
of bytes, since key/value pairs may be rearranged, whitespace added or removed, and different
Unicode encodings used. It's possible for the byte representation to change in transit, if
the document is parsed into a data structure and then re-serialized. This would prevent the
recipient from being able to verify the signature.
+ 
+ So the signature has to be generated from a ''canonical representation'' of the JSON. There
is no standard for this yet, but [http://www.unicode.org/reports/tr15/ the OLPC group has
documented one] that's pretty reasonable:
+ 
-  * No whitespace
+  * No whitespace.
-  * Object keys sorted lexicographically by Unicode character values
-  * Floating-point numbers in a canonical representation (does the IEEE standard define one?)
+  * No escape sequences in strings other than `\"` and `\\`. All other characters must be
represented literally, including control characters.
+  * No trailing commas.
+  * Object keys sorted by Unicode character values (code points). The sorting occurs ''before''
escape sequences are added.
+  * No decimal points in numbers (i.e. only integers allowed) or leading zeros. "-0" is not
allowed.
+  * UTF-8 encoding of Unicode Normalization Form C.
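
The rules listed above can be sketched as a small Python serializer. This is an illustrative reading of the list, not a reference implementation of the OLPC format; the function names are hypothetical. One assumption goes beyond the text: keys are normalized to NFC before they are sorted.

```python
import unicodedata


def _encode_string(s: str) -> str:
    # Only `\"` and `\\` are escaped; every other character, including
    # control characters, is emitted literally.
    s = unicodedata.normalize("NFC", s)
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _serialize(value) -> str:
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, int):
        # str() of an int never has a decimal point, leading zeros, or "-0".
        return str(value)
    if isinstance(value, float):
        raise TypeError("only integers are allowed in canonical JSON")
    if isinstance(value, str):
        return _encode_string(value)
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(_serialize(v) for v in value) + "]"
    if isinstance(value, dict):
        normalized = {}
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("object keys must be strings")
            # Assumption: normalize keys to NFC before sorting.
            normalized[unicodedata.normalize("NFC", key)] = item
        # Python orders str values by code point; sorting happens on the
        # unescaped keys, before escape sequences are added.
        members = (_encode_string(k) + ":" + _serialize(normalized[k])
                   for k in sorted(normalized))
        return "{" + ",".join(members) + "}"
    raise TypeError(f"cannot canonicalize {type(value).__name__}")


def canonical_json(value) -> bytes:
    # No whitespace anywhere; the final byte stream is UTF-8.
    return _serialize(value).encode("utf-8")
```

For example, `canonical_json({"b": 1, "a": [True, None]})` yields the bytes `{"a":[true,null],"b":1}` regardless of the order in which the keys were inserted.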
+ 
+ Note: The OLPC spec allows arbitrary byte sequences in strings, for easy storage of binary
data. But this contradicts the [http://www.ietf.org/rfc/rfc4627.txt JSON specification], which
clearly states that "a string is a sequence of zero or more Unicode characters".
  
  == A Digression On Identities ==
  
