couchdb-user mailing list archives

From Dave Cottlehuber <...@jsonified.com>
Subject Re: Duplicate fields in documents
Date Thu, 20 Feb 2014 09:37:01 GMT
> On Feb 19, 2014, at 6:07 AM, Dave Cottlehuber wrote:
>
> > TL;DR the appropriately named ECMA 404 JSON spec [1] is broken or more
> > politely, insufficiently specific.

On Wed, Feb 19, 2014 at 8:30 PM, Jens Alfke wrote:

> This seems to fall into the category of "things so obvious that the people
> who wrote the spec didn't realize they had to mention them." I.e. "You
> can't have duplicate keys."

When I was in France, I learned that anything not explicitly forbidden is
therefore permitted. And while in Germany, I learned that anything not
explicitly permitted is forbidden :-). Luckily in New Zealand, it only matters
if you get caught ;-).

On 20 February 2014 at 06:14:29, Mikael Söderström (vimpyboy@msn.com) wrote:
>
> If there are duplicate keys, it should absolutely fail.

I think returning an error on receiving duplicate keys would be a sensible
change to CouchDB, albeit a relatively minor breaking one. See below.

> > JSON is typically based on a dictionary or hash map, and there's no
> > particular reason for that data structure to enforce uniqueness of keys.
>
> I disagree. Mathematically, a dictionary/map object is a function: it maps
> from a set of keys to a set of values, with each key mapping to exactly one
> value. (That's basically the definition of 'function'.) It's certainly
> possible to create implementations that map a key to _multiple_ values, but
> that's something different: it's a mapping from a key to a set. (For
> example, it's not from string-->int, it's now from string-->set.) The
> JSON spec does not include this kind of mapping -- an object value in JSON
> can be a number, but not a set of numbers.

I was referring to the fact that hash tables usually have some form of collision
detection internally, when multiple distinct keys map to the same hash bucket. There are
perfect hash functions & algorithms that avoid collisions, but that’s getting
a bit off track. And it’s a moot point, as we all agree that duplicate keys are not
what most people expect, including Crockford.
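
To make that distinction concrete, here’s a throwaway shell snippet (plain Erlang,
nothing to do with jiffy’s internals): distinct keys routinely land in the same
bucket of a small table, and the table keeps them apart by comparing the keys
themselves. Duplicate keys are a separate, deliberate design decision.

    %% With only 4 buckets, some of these 5 distinct keys must collide
    %% (pigeonhole); the table still tells them apart by comparing full keys.
    Buckets = 4,
    Keys = [<<"foo">>, <<"bar">>, <<"baz">>, <<"qux">>, <<"quux">>],
    Grouped = lists:foldl(
        fun(K, Acc) -> orddict:append(erlang:phash2(K, Buckets), K, Acc) end,
        orddict:new(), Keys),
    %% Buckets holding more than one distinct key, i.e. genuine collisions:
    [{B, Ks} || {B, Ks} <- orddict:to_list(Grouped), length(Ks) > 1].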

> IMHO the reasonable thing for a JSON parser to do if it encounters a
> duplicate key is to fail with a clear error. Failing that, the only other
> reasonable option is to discard one or the other value (I don't have an
> opinion which.) But keeping both is unreasonable.

If somebody wants to sort this out, I’d suggest implementing the fix (in C) in
Paul Davis’ jiffy library, which has been on the list of things to import
for a while.

I added https://github.com/davisp/jiffy/issues/54 and updated
https://issues.apache.org/jira/browse/COUCHDB-1294

jiffy:decode(<< "{\"foo\":\"bar\", \"foo\":\"bar\"}" >>) should return an error,
something like invalid_json_duplicate_name_in_object.
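
In the meantime, here’s a rough sketch of the equivalent check in plain Erlang,
run over jiffy’s default {[{Key, Value}, ...]} decode output. It assumes the
decoder keeps both pairs in the proplist (which is how the duplicates get into
documents in the first place); the module and function names are made up for
illustration.

    -module(dup_keys).
    -export([validate/1]).

    %% Walk a decoded EJSON term and reject any object that repeats a key.
    validate({Props}) when is_list(Props) ->
        Keys = [K || {K, _V} <- Props],
        case length(Keys) =:= length(lists:usort(Keys)) of
            true  -> validate_all([V || {_K, V} <- Props]);
            false -> {error, invalid_json_duplicate_name_in_object}
        end;
    validate(List) when is_list(List) ->
        validate_all(List);
    validate(_Scalar) ->
        ok.

    validate_all([]) -> ok;
    validate_all([H | T]) ->
        case validate(H) of
            ok    -> validate_all(T);
            Error -> Error
        end.

Something like dup_keys:validate(jiffy:decode(Body)) could guard the HTTP layer
until the parser itself refuses such input, though doing it inside the C decoder
would avoid a second pass over the term.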

CC’d dev@.

A+
-- 
Dave Cottlehuber
Sent from my PDP11


