incubator-couchdb-user mailing list archives

From Dave Cottlehuber <...@jsonified.com>
Subject Re: Duplicate fields in documents
Date Wed, 19 Feb 2014 14:07:37 GMT
On 19 February 2014 at 13:55:43, Suraj Kumar (suraj.kumar@inmobi.com) wrote:
> Hi,
>  
> If we put documents with same field name twice, we see both keys together
> in the document.
>  
> suraj@laptop:~ $ curl -d '{"_id":"doc","key":"value","key":"value"}' -X PUT
> http://mydbhost:5984/test/doc
> {"ok":true,"id":"doc","rev":"1-49200ce1b14d686a961d10af01026cf8"}
> suraj@laptop:~ $ curl http://mydbhost:5984/test/doc
> {"_id":"doc","_rev":"1-49200ce1b14d686a961d10af01026cf8","key":"value","key":"value"}
 
>  
> This seems wrong. While JS engines (node as well as mozjs) seem to be
> correctly 'overwriting' the key, why is couch storing everything? Is this a
> bug?
>  
> Or am I wrong? (I'm using version 1.4.0)
>  
> Regards,
>  
> -Suraj

TL;DR: the appropriately named ECMA-404 JSON spec [1] is broken or, more politely, insufficiently
specific.

This and other edge cases are not even mentioned there. The RFC is marginally better [2] (see below),
but even Crockford isn’t sure what should happen [3]. The more recent ECMAScript 5.1 [4] says:
“NOTE In the case where there are duplicate name Strings within an object, lexically preceding
values for the same key shall be overwritten”.

    “The nice thing about standards is that there are so many of them to choose from.”
	— Andrew S. Tanenbaum

A JSON object is typically parsed into a dictionary or hash map, but there’s no particular reason
for the underlying data structure to enforce uniqueness of keys. For example, Erlang has both
unique-key and repeated-key data structures available. JavaScript presumably only has the unique flavour.
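
To make the "unique vs. repeated keys" point concrete, here is a minimal sketch in Python (not Erlang or JavaScript, and not anything CouchDB itself runs). It assumes only the standard-library json module; the sample document and variable names are made up for illustration. The same text is parsed once into a dict, where a later duplicate replaces the earlier one, and once into a list of pairs, where nothing is dropped.

```python
import json

# Illustrative document with a repeated name (values differ so the
# difference between the two parses is visible).
raw = '{"_id": "doc", "key": "first", "key": "second"}'

# Default parse: objects become dicts, so the later "key" overwrites
# the earlier one.
as_dict = json.loads(raw)

# object_pairs_hook is called with every (name, value) pair in document
# order; passing `list` keeps the pairs as they are, duplicates included.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)
print(as_pairs)
```

Which of the two you get is a choice made by the parser and its target data structure, not something the JSON grammar decides for you.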

From the IETF RFC:

	“The names within an object SHOULD be unique.”
	“A JSON parser MUST accept all texts that conform to the JSON grammar.”

Now you *could* have a JSON parser that arbitrarily decides to delete some of your data before
passing it to the storage engine to save on disk. Personally I’d rather CouchDB kept the
duplicates, but until we see a content-type:application/json2 that specifies how to handle
these important edge cases, I guess the status quo is not unreasonable? The alternative is
to return an invalid_json error, which would be incorrect, since the text conforms to the JSON grammar.
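
For comparison, this is roughly what the "reject it" alternative could look like. Again a hedged Python sketch using the standard json module; reject_duplicates and strict_loads are hypothetical names invented here, and nothing below reflects how CouchDB behaves.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with repeated names."""
    seen = set()
    for name, _value in pairs:
        if name in seen:
            raise ValueError(f"duplicate name in object: {name!r}")
        seen.add(name)
    return dict(pairs)


def strict_loads(text):
    """Parse JSON, raising ValueError if any object repeats a name."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


try:
    strict_loads('{"_id": "doc", "key": "value", "key": "value"}')
except ValueError as err:
    print("rejected:", err)
```

Note that a parser like this refuses a text that conforms to the grammar, which is exactly the objection raised above.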

The waters are muddied further because the conversion between JSON docs and CouchDB’s on-disk
format is handled in Erlang/OTP, with a parser that preserves duplicate keys, but the view engine
runs in JavaScript and will presumably do some cleaning up depending on which spec that
JS engine supports… YMMV.
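
A toy model of that mismatch, sketched in Python under loud assumptions: Pairs and dumps_pairs are hypothetical helpers written for this mail, the "storage" side is just a list of pairs, and the "view" side is just a plain dict parse. It is not CouchDB's code, only an illustration of how one layer can echo duplicates back while another silently collapses them.

```python
import json


class Pairs(list):
    """A JSON object kept as an ordered list of (name, value) pairs."""


def dumps_pairs(value):
    """Serialize back to JSON text, writing every stored pair."""
    if isinstance(value, Pairs):
        body = ",".join(
            f"{json.dumps(name)}:{dumps_pairs(item)}" for name, item in value
        )
        return "{" + body + "}"
    if isinstance(value, list):
        return "[" + ",".join(dumps_pairs(item) for item in value) + "]"
    return json.dumps(value)


raw = '{"_id":"doc","key":"value","key":"value"}'

# "Storage" side: keep every pair, so a later read echoes both keys.
stored = json.loads(raw, object_pairs_hook=Pairs)
echoed = dumps_pairs(stored)

# "View" side: a map-based parser keeps a single entry per name.
seen_by_view = json.loads(echoed)

print(echoed)
print(seen_by_view)
```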

--  
Dave Cottlehuber
Sent from my PDP11

[1]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
[2]: http://tools.ietf.org/html/rfc4627#section-2.2
[3]: http://esdiscuss.org/topic/json-duplicate-keys
[4]: http://es5.github.io/x15.12.html


