couchdb-dev mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: [VOTE] Apache CouchDB 1.2.0 release, first round
Date Mon, 13 Feb 2012 08:09:02 GMT
Hi, Paul. Thank you very much for this!

IMHO, in future, Couch might use a BigDecimal type when working with
documents' numbers. It never *computes* anything with those numbers,
so performance should be okay; it's just to preserve the user's data
through a decode/encode round-trip.
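To make the BigDecimal idea concrete, here is a minimal sketch in Python (not Erlang, and not anything CouchDB ships), assuming the goal is only to carry number literals through unchanged. The standard json module's parse_float hook decodes fractional and exponent literals into decimal.Decimal instead of IEEE-754 doubles, and a small hypothetical encoder, encode_exact, writes them back with str(). The value and its significant digits survive, including the trailing zero in 1.0, though the spelling of exponents may still be normalized (Decimal prints 1E1 as 1E+1).

```python
import json
from decimal import Decimal


def decode_exact(text):
    # parse_float turns every JSON literal with a fraction or exponent into a
    # Decimal, so no digits are lost to binary floating point. Integers are
    # already arbitrary-precision Python ints.
    return json.loads(text, parse_float=Decimal)


def encode_exact(value):
    # Hypothetical minimal encoder: Decimals are written with str(), which
    # keeps their digits (Decimal("1.0") -> "1.0"); everything else is
    # delegated to the standard encoder.
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, dict):
        members = (json.dumps(k) + ":" + encode_exact(v) for k, v in value.items())
        return "{" + ",".join(members) + "}"
    if isinstance(value, list):
        return "[" + ",".join(encode_exact(v) for v in value) + "]"
    return json.dumps(value)


doc = '{"a":1.0,"b":[0.1000000000000000055511151231257827,42]}'
print(encode_exact(decode_exact(doc)))  # same text as doc
print(json.dumps(json.loads(doc)))      # {"a": 1.0, "b": [0.1, 42]}
```

With plain floats the long literal collapses to the shortest string for the nearest double, which is exactly the kind of change a decode/encode round-trip should not make to user data.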

For now, you make a persuasive point about the practical matter (not
that you require my approval!)

My earlier tangent was to argue that Couch should be allowed to trim
leading or trailing digits if it doesn't change the "real" value, for
which I dragged a comic book into the discussion. Sorry for that.
Dirkjan and Benoit changed my mind.
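A rough sketch of the %g problem and the fix described further down (format with 20 significant digits, then append ".0" when the formatter has dropped the decimal point), written in Python rather than Jiffy's C because Python's % operator follows printf semantics for %g and %f. The function name encode_double is hypothetical, and the exponent check is an assumption of this sketch rather than something stated in the thread: without it, 1e20 would come out as "1e+20.0", which is not valid JSON.

```python
import math


def encode_double(value):
    # JSON has no spelling for NaN or the infinities.
    if not math.isfinite(value):
        raise ValueError("cannot encode %r as JSON" % value)
    # 20 significant digits is more than the 17 a double needs to round-trip,
    # so no precision is lost (unlike printf's default precision of 6).
    text = "%0.20g" % value
    # %g strips trailing zeros and a bare decimal point (1.0 -> "1").
    # Put ".0" back so the value still reads as a float, but leave exponent
    # forms alone, because "1e+20.0" would not be valid JSON.
    if "." not in text and "e" not in text:
        text += ".0"
    return text


print("%g" % 1.0)            # 1
print("%f" % 0.123456789)    # 0.123457 (six decimal places)
print(encode_double(1.0))    # 1.0
print(encode_double(0.1))    # 0.10000000000000000555
print(encode_double(1e20))   # 1e+20
```

The 20-digit output for 0.1 is the eyesore mentioned in the thread: it parses back to the same double, but it is not the shortest string that does.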

On Mon, Feb 13, 2012 at 6:55 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> So yeah. Numbers are hard.
>
> Firstly, anyone that mentioned RFC 4627 or JavaScript behavior is
> walking down a path entirely orthogonal to the issue at hand. Jason
> almost had it when he talked about them being different but then he
> went off on some weird tangent and lost me.
>
> In a nutshell, the issue is this:
>
> CPUs work with bits. Humans (and JSON, sorta) work with numbers as a
> string of numerals with some punctuation. Converting between the two is lossy.
>
> So, back to the details.
>
> COUCHDB-1407 reports that ejson now encodes the value "1.0" as "1".
> While we can wax philosophical about this, the bottom line is that
> this breaks a level of equality. Specifically:
>
> 1> ejson:decode(ejson:encode(1.0)) =:= 1.0.
> false
>
> This is precisely because the %g formatting used underneath removes
> trailing zeros and decimal points.
>
> On the face of it, this is bad. And I agree. There's a simple enough
> fix (and it's not what Bob Newson suggested, but I'm going to leave him
> hanging for a bit).
>
> But, before we get all crazy, we should contemplate a few other fun cases:
>
> 1. Both mochijson2 and ejson change some number representations
>
> 5> mochijson2:encode(mochijson2:decode("1E1")) =:= "1E1".
> false
> 6> ejson:encode(ejson:decode("1E1")) =:= "1E1".
> false
>
> 2. Both mochijson2 and ejson turn numbers with exponents into IEEE-754
> doubles internally
>
> 7> ejson:decode("1E1") =:= 10.
> false
> 8> mochijson2:decode("1E1") =:= 10.
> false
>
> 3. Others but I'm tired from staring at math.
>
> Basically, the end result is that we can match mochijson2's decoder
> damn near identically (at least, I know of no differences in
> decoding in Jiffy). But now we get to the hard part.
>
> Mochijson2 does some fancy ass magic for encoding IEEE-754 values. And
> when I say fancy, I mean, implements an algorithm published in some
> random paper from 1996 based on the paper's author's Scheme
> implementation. I spent about twelve hours today trying to duplicate it
> before I realized that it depends on having an integral type that can
> represent values with more than 64 bits (which made me sad).
>
> Either way, this is dark voodoo. Anyone that's interested can check out
> mochinum:digits/1 and the supporting functions for some mind-bending
> looks into IEEE-754 representations.
>
> Anyway, the bottom line is that 1.0 should be encoded as "1.0". The fix is
> simply to check for a decimal point and append ".0" if it's not
> there. This is what Yajl does, and Python appears to behave similarly.
> The patch for Jiffy is at [1] and shows the general idea.
>
> Also, for those still wondering why %f is not a valid fix, the
> reason is the same as why %g is wrong (and why it needs to be %0.20g):
> printf and friends default to a precision of six, so %f rounds to six
> decimal places and %g to six significant digits. So, 0.123456789 would
> get encoded as "0.123457", which loses precision.
>
> Also, with that patch for Jiffy we never lose precision, but the
> eyesore is that 0.1 gets encoded as something like "0.10000000000000000555". Some
> people find that offensive but I don't really care enough to learn
> arbitrary precision math routines so people can have slightly prettier
> JSON. And I say that after having spent all day trying to make it
> work.
>
> So, yeah. Fix is simple enough.
>
> Also, food for thought: A JSON parser/serializer pair that converts
> all numbers to 42 is technically compliant with the JSON spec.
>
> [1] https://github.com/davisp/jiffy/commit/5042cc946008ee413cc66b9b0addcf33ecd2fd93
>
> On Sat, Feb 11, 2012 at 8:32 AM, Robert Newson <rnewson@apache.org> wrote:
>> I'd like some opinions on whether COUCHDB-1407 constitutes a release
>> blocking issue. Yes, I understand that the JSON spec is very weak on
>> numbers, blah blah boo splat. Is this because of the switch to ejson?
>> Is jiffy more compatible on this score?
>>
>> For my part, I'm close to considering it a release-blocking
>> regression. At the very least this change should be included at
>> http://wiki.apache.org/couchdb/Breaking_changes#Changes_Between_1.1.0_and_1.2.0
>> but I'd rather it was fixed.
>>
>> B.
>>
>> On 11 February 2012 10:44, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>> On Sat, Feb 11, 2012 at 4:00 AM, Jason Smith <jhs@iriscouch.com> wrote:
>>>> On Sat, Feb 11, 2012 at 3:06 AM, Randall Leeds <randall.leeds@gmail.com> wrote:
>>>>> On Feb 9, 2012 6:09 PM, "Randall Leeds" <randall.leeds@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Feb 9, 2012 at 17:48, Jason Smith <jhs@iriscouch.com> wrote:
>>>>>> > Hi, Noah. When I saw it hit Git, I realized it was a breaking change,
>>>>>> > and I asked around. If memory serves, Randall happened to be on at the
>>>>>> > time and he asked me the same question you just did. I said I never
>>>>>> > saw an RFC email and that's when he realized it was not done publicly.
>>>>>>
>>>>>> I was aware the entire time, but I think the motivation is sound
and
>>>>>> it needed to be done. A couple committers spoke up to say we didn't
>>>>>> think it was sensitive enough to warrant the private discussion but
>>>>>> ultimately there was broad consensus on the implementation and the
>>>>>> change itself. One of those (let us all celebrate) extremely rare
>>>>>> times where there wasn't opportunity for broad community input.
>>>>>>
>>>>>> Creating a view on _users that pulls the relevant parts of a user
>>>>>> document out is the way forward for public profiles, I think.
>>>>>> If someone would write a blog post showing how that works it'd be
>>>>>> great. In retrospect this would have been a great thing to do weeks
>>>>>> ago. Lesson learned.
>>>>>
>>>>> Just to be clear, I don't want to dismiss your concerns. If you believe this
>>>>> needs a config option rather than just documentation, now is a good time to
>>>>> speak up loudly, since the vote was aborted.
>>>>
>>>> Thanks. I am concerned. To me, the change is noteworthy but not a showstopper.
>>>>
>>>> I tested your suggestion; however, I do not think it is possible.
>>>> Non-admins cannot access a view in _users.
>>>>
>>>> $ curlp http://admin:admin@localhost:5984/_users/_design/public \
>>>>     -d '{"views":{"all":{"map":"function(doc) { emit(doc._id, doc) }"}}}'
>>>> {"ok":true,"id":"_design/public","rev":"1-f605d1ea7825645132f54a91a76a1ddc"}
>>>>
>>>> $ curl -i http://user:user@localhost:5984/_users/_design/public/_view/all
>>>> HTTP/1.1 403 Forbidden
>>>> Server: CouchDB/1.2.0 (Erlang OTP/R15B)
>>>> Date: Sat, 11 Feb 2012 02:57:43 GMT
>>>> Content-Type: text/plain; charset=utf-8
>>>> Content-Length: 102
>>>> Cache-Control: must-revalidate
>>>>
>>>> {"error":"forbidden","reason":"Only admins can access design document actions for system databases."}
>>>>
>>> Yes, that's by design.
>>>
>>> - benoît



-- 
Iris Couch
