asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: json vs. JSON
Date Thu, 06 Aug 2015 00:45:10 GMT
I think that one of the questions is whether we want to map the atomic ADM
types to a single value in clean JSON or to structured types (objects,
arrays).

My first reaction would have been to use the structured types for
lossless JSON and only single values for clean JSON. If you look, e.g., at
the datetime types, those would also naturally lend themselves to a
structured representation, but we seem to agree that the XML Schema
string version is a good choice.
Staying consistent with that choice, I would try to find a simple string 
representation for the spatial types - ideally one that is easily parsed 
and widely accepted :)
Looking for something that might be accepted, I stumbled upon
https://en.wikipedia.org/wiki/Well-known_text , but I'm actually not
sure if that's a good fit (even if it seems to be supported by a number
of DBMSs ...).
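To make the WKT idea concrete, here is a minimal Python sketch that renders the spatial values used in this thread as WKT strings. It is not AsterixDB code: the function names are hypothetical, points are assumed to be coordinate tuples, rectangles (which have no WKT type of their own) are written as closed polygons, and circles are left out because WKT has no plain circle geometry.

```python
# Hypothetical helpers (not part of AsterixDB): render ADM spatial values as
# Well-Known Text. Points are tuples of floats; other shapes take points.

def _coords(point):
    # WKT separates the coordinates of one point with spaces.
    return " ".join(repr(float(c)) for c in point)


def point_wkt(point):
    # Two coordinates -> POINT, three -> POINT Z (ISO WKT form for 3D points).
    tag = "POINT Z" if len(point) == 3 else "POINT"
    return f"{tag} ({_coords(point)})"


def line_wkt(start, end):
    # An ADM line is a single segment, i.e. a two-point LINESTRING.
    return f"LINESTRING ({_coords(start)}, {_coords(end)})"


def polygon_wkt(vertices):
    # WKT polygon rings must be closed, so the first vertex is repeated.
    ring = list(vertices) + [vertices[0]]
    return "POLYGON ((" + ", ".join(_coords(p) for p in ring) + "))"


def rectangle_wkt(corner1, corner2):
    # WKT has no rectangle type; emit the axis-aligned box as a polygon.
    (x1, y1), (x2, y2) = corner1, corner2
    return polygon_wkt([(x1, y1), (x2, y1), (x2, y2), (x1, y2)])


print(point_wkt((41.0, 44.0)))               # POINT (41.0 44.0)
print(line_wkt((10.1, 11.1), (10.2, 11.2)))  # LINESTRING (10.1 11.1, 10.2 11.2)
```

A consumer reading this WKT back would see a generic polygon for a rectangle, so the rectangle type itself would be lost; that is one concrete trade-off of this choice.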

Thoughts?

Thanks,
Till

On 5 Aug 2015, at 1:30, Chris Hillery wrote:

> Sure, I think that shouldn't be too hard, given some help with the
> questions I raised.
>
> To start the discussion, I wrote a query that outputs all ADM types to
> show how they are serialized to JSON (except the interval types, which
> currently throw a NotImplementedException if you try to serialize them
> to JSON):
>
> { "string": string("Nancy"),
> "float": 32.5f,
> "double" : double("-2013.5938237483274"),
> "boolean" : true,
> "int8": int8("125"),
> "int16": int16("32765"),
> "int32": int32("294967295"),
> "int64": int64("1700000000000000000"),
> "unorderedList": {{"reading","writing"}},
> "orderedList": ["Brad","Scott"],
> "record": { "number": 8389, "street": "Hill St.", "city": "Mountain View" },
> "date": date("-2011-01-27"),
> "time": time("12:20:30Z"),
> "datetime": datetime("-1951-12-27T12:20:30"),
> "duration": duration("P10Y11M12DT10H50M30S"),
> "location2d": point("41.00,44.00"),
> "location3d": point3d("44.00,13.00,41.00"),
> "line" : line("10.1,11.1 10.2,11.2"),
> "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
> "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
> "circle" : circle("10.1,11.1 10.2"),
> "binary" : hex("ABCDEF0123456789"),
> "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
>
> And here is how that gets serialized in "lossless JSON":
>
> { "string": "Nancy",
> "float": 32.5,
> "double": -2013.5938237483274,
> "boolean": true,
> "int8": { "int8": 125 },
> "int16": { "int16": 32765 },
> "int32": { "int32": 294967295 },
> "int64": { "int64": 1700000000000000000 },
> "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
> "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
> "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city": "Mountain View" },
> "date": { "date": -125625945600000},
> "time": { "time": 44430000},
> "datetime": { "datetime": -123703587570000},
> "duration": { "duration": { "months": 131, "millis": 1075830000} },
> "location2d": { "point": [41.0, 44.0] },
> "location3d": { "point3d": [44.0, 13.0, 41.0] },
> "line": { "line": [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] },
> "rectangle": { "rectangle": [ { "point": [5.1, 11.8] }, { "point": [87.6, 15.6548] } ] },
> "polygon": { "polygon": [ { "point": [1.2, 1.3] }, { "point": [2.1, 2.5] }, { "point": [3.5, 3.6] }, { "point": [4.6, 4.8] } ] },
> "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
> "binary": hex("ABCDEF0123456789"),
> "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
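The numeric date, time, datetime, and duration values in the lossless output above can be checked against the query input. Assuming (inferred from these sample values, not taken from the AsterixDB source) that date and datetime count milliseconds from 1970-01-01T00:00:00 in the proleptic Gregorian calendar with astronomical year numbering, that time counts milliseconds since midnight, and that duration is a (months, millis) pair, the following Python sketch decodes them back into XML Schema-style strings. The day-to-date step follows Howard Hinnant's civil_from_days algorithm; all function names are hypothetical.

```python
# Sketch under assumptions inferred from the sample output (not from the
# AsterixDB source): date/datetime = ms since 1970-01-01T00:00:00 in the
# proleptic Gregorian calendar with astronomical years (year 0 exists),
# time = ms since midnight, duration = (months, millis).

MS_PER_DAY = 86_400_000


def civil_from_days(z):
    """Days since 1970-01-01 -> (year, month, day), after H. Hinnant."""
    z += 719468  # shift the epoch to 0000-03-01
    era = z // 146097  # 400-year eras; floor division handles negatives
    doe = z - era * 146097  # day of era, 0..146096
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of year from March 1
    mp = (5 * doy + 2) // 153  # month index, March = 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def _clock(ms):
    # hh:mm:ss with an optional .fff part; no zone designator is written.
    s, frac = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}" + (f".{frac:03d}" if frac else "")


def date_str(ms):
    y, m, d = civil_from_days(ms // MS_PER_DAY)
    sign = "-" if y < 0 else ""
    return f"{sign}{abs(y):04d}-{m:02d}-{d:02d}"


def time_str(ms):
    return _clock(ms)


def datetime_str(ms):
    days, rest = divmod(ms, MS_PER_DAY)
    return date_str(days * MS_PER_DAY) + "T" + _clock(rest)


def duration_str(months, millis):
    # Assumes non-negative parts and writes every field, so zero fields are
    # not dropped the way a minimal canonical form would drop them.
    years, months = divmod(months, 12)
    secs, frac = divmod(millis, 1000)
    days, secs = divmod(secs, 86400)
    hours, secs = divmod(secs, 3600)
    mins, secs = divmod(secs, 60)
    sec_text = f"{secs}.{frac:03d}" if frac else f"{secs}"
    return f"P{years}Y{months}M{days}DT{hours}H{mins}M{sec_text}S"


print(date_str(-125625945600000))      # -2011-01-27
print(time_str(44430000))              # 12:20:30
print(datetime_str(-123703587570000))  # -1951-12-27T12:20:30
print(duration_str(131, 1075830000))   # P10Y11M12DT10H50M30S
```

Under these assumptions the round trip reproduces the query literals, apart from the "Z" on the time value, which this sketch does not emit.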
>
> Some observations and proposals:
>
> 1. The "JSON" serializations of the hex() and uuid() types are still
> broken (they are not even valid JSON).
>
> 2. IMHO the string, float, double, boolean, and record types are already
> serialized the way you would want in "clean JSON".
>
> 3. IMHO orderedList and unorderedList should be serialized as simple JSON
> arrays in "clean JSON".
>
> 4. The serializations of date, time, datetime, and duration, while valid
> JSON, are not very useful. It would be better if they were serialized as
> the canonical date, time, dateTime, or duration forms from XML Schema. In
> "clean JSON" they would be serialized simply as strings with that value.
> In "lossless JSON" they would be serialized as records as shown here, but
> with a string value, e.g. { "date" : "-2011-01-27" }.
>
> 5. The int8/int16/int32/int64 types should be serialized as plain JSON
> numbers in "clean JSON".
>
> 6. Interval types should be supported. I am open to suggestions as to how
> best to represent them in both "clean JSON" and "lossless JSON".
>
> 7. I'm really not sure what the best serialization of the spatial types
> would be in "clean JSON", but as a strawman, how about serializing all
> points as simple arrays of JSON numbers? Then line, rectangle, and polygon
> could either be arrays of arrays, or else objects with names like
> "start"/"end" for line and rectangle and "point1", "point2", etc. for
> polygon. Circle, I think, should always be an object with the names
> "center" and "radius". So, in "clean JSON", the last few lines of the
> above query results would look like this:
>
> "location2d" : [41.0, 44.0],
> "location3d" : [44.0, 13.0, 41.0],
> "line" : [ [10.1, 11.1], [10.2, 11.2] ],
> "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
> "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
> "circle" : { "radius" : 10.2, "center" : [ 10.1, 11.1 ] },
>
> or like this:
>
> "location2d" : [41.0, 44.0],
> "location3d" : [44.0, 13.0, 41.0],
> "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
> "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
> "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" : [3.5, 3.6], "point4" : [4.6, 4.8] },
> "circle" : { "radius" : 10.2, "center" : [ 10.1, 11.1 ] },
>
> My preference would probably be the latter, just so that "circle" doesn't
> seem like such an odd duck and "line" and "rectangle" don't become
> ambiguous.
>
> (Aside: I think the current serialization of "circle" is broken; it seems
> to be scrambling the radius and the point values.)
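A rough sketch of the proposed "clean JSON" mapping, written as a post-processing step over the lossless output, may help compare the options. It is a Python illustration only (to_clean and the tag sets are hypothetical names, not AsterixDB code). It assumes that any single-key object whose key is an ADM type tag is a tagged value, unwraps integers and lists, renders points as arrays, and uses the start/end and point1..pointN object form preferred above. Circles are deliberately not handled because the current lossless circle output looks scrambled; temporal, binary, and uuid values are left as they are.

```python
import json

INTEGER_TAGS = {"int8", "int16", "int32", "int64"}
LIST_TAGS = {"orderedlist", "unorderedlist"}
POINT_TAGS = {"point", "point3d"}
SEGMENT_TAGS = {"line", "rectangle"}


def to_clean(value):
    """Map a parsed lossless-JSON value to the proposed clean form."""
    if isinstance(value, list):
        return [to_clean(v) for v in value]
    if not isinstance(value, dict):
        return value  # strings, numbers, booleans and null are already clean
    if len(value) == 1:
        tag, inner = next(iter(value.items()))
        if tag in INTEGER_TAGS or tag in POINT_TAGS:
            return inner  # a plain JSON number, or an array of coordinates
        if tag in LIST_TAGS:
            return [to_clean(v) for v in inner]
        if tag in SEGMENT_TAGS:
            start, end = (to_clean(p) for p in inner)
            return {"start": start, "end": end}
        if tag == "polygon":
            return {f"point{i}": to_clean(p) for i, p in enumerate(inner, start=1)}
    # Anything else is treated as a record: clean each field value.
    return {k: to_clean(v) for k, v in value.items()}


lossless = json.loads("""
{ "int8": { "int8": 125 },
  "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
  "record": { "number": { "int64": 8389 }, "street": "Hill St." },
  "location2d": { "point": [41.0, 44.0] },
  "line": { "line": [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] } }
""")
print(json.dumps(to_clean(lossless)))
```

The single-key heuristic is a real limitation: a genuine record whose only field happens to be named, say, "point" would be misread, which is one argument for producing clean JSON directly in the serializer rather than by post-processing.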
>
> So there are a number of actions here even with the existing code, in
> addition to supporting the new clean JSON output. I also found some issues
> with the current AQL implementation and doc:
>
> A. ADM allows numeric serializations like 5550d for double and 12i8 for
> int8, but it seems those are not valid in AQL.
>
> B. AQL doesn't seem to have any constructors for intervals; you can only
> create them via functions like interval-from-date().
>
> (A) and (B) both basically mean that not all valid ADM can be read as AQL,
> even though being able to do so seems like a desirable goal.
>
> C. The ADM doc doesn't mention the "point3d" type.
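To illustrate point (A), here is a small Python sketch of a lexer for suffixed numeric literals such as 5550d and 12i8. Only those two suffixes and the f in 32.5f appear in this thread; the full suffix set (i8, i16, i32, i64, f, d), the defaults for unsuffixed literals, and the name parse_adm_number are assumptions for illustration, not a description of the actual ADM grammar.

```python
import re

# Assumed suffix set; the thread itself only shows "d", "i8" and "f".
_NUMBER = re.compile(
    r"([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)(i8|i16|i32|i64|f|d)?"
)
_INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def parse_adm_number(text):
    """Return (type name, value) for a possibly suffixed numeric literal."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    digits, suffix = match.groups()
    if suffix is None:
        # Assumption: bare integers are int64 (as 8389 is above), others double.
        suffix = "i64" if re.fullmatch(r"[+-]?\d+", digits) else "d"
    if suffix in _INT_BITS:
        bits = _INT_BITS[suffix]
        value = int(digits)  # raises ValueError for inputs like "1.5i8"
        if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            raise ValueError(f"{text!r} is out of range for int{bits}")
        return f"int{bits}", value
    return ("float" if suffix == "f" else "double"), float(digits)


print(parse_adm_number("5550d"))  # ('double', 5550.0)
print(parse_adm_number("12i8"))   # ('int8', 12)
```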
>
>
> Now accepting any input on the above, as well as on the other issue of how
> to select this form of output via the HTTP interface!
>
> Ceej
> aka Chris Hillery
>
> On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com> wrote:
>
>> Hi Chris,
>>
>> Actually, it would be great if you could fix this since, as you
>> mentioned, you have touched this part of the code.
>> Please confirm.
>>
>> Cheers,
>> Sattam
>>
>> On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu> wrote:
>>
>>> I could take a look at this as well; it would be a natural extension of
>>> the work I did earlier to clean up the existing JSON output. It probably
>>> wouldn't be very difficult to do this in a relatively "dumb" way, but there
>>> is also some amount of duplicated code between the various output formats,
>>> and it would be tempting to try and tidy that up a bit as well.
>>>
>>> Three issues need to be addressed regardless of who does it or how:
>>>
>>> 1. We'd need to decide how to "strip down" all ADM types. In most numeric
>>> cases it's pretty clear. For spatial types, it deserves a little bit of
>>> thought. (It may be that the current "lossless" form is concise enough. For
>>> example, the ADM instance { "foo" : point("5,5") } gets rendered in JSON as
>>> { "foo" : { "point" : [5.0, 5.0] } }. Is there something that would be
>>> better?)
>>>
>>> 2. How would the user select this format vs. the current JSON form? When
>>> using the HTTP interface, the main way to select the returned
>>> serialization is via the HTTP Accept: header, and you select the
>>> "lossless JSON" form with the MIME type application/json. If we have two
>>> different JSON serializations, we'd need to invent a new MIME type,
>>> introduce some kind of additional flag, or find some other mechanism.
>>>
>>> 3. When using the HTTP interface, the current lossless JSON is in fact the
>>> default output type. Should that remain the case, or should the "lossy"
>>> JSON type be preferred?
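One way to picture the "additional flag" option from point 2 is a media-type parameter on application/json. The Python sketch below assumes a hypothetical lossless=true|false parameter (not an existing AsterixDB or registered MIME parameter) and keeps lossless JSON as the default, matching the current behavior described in point 3. It honors q-values in a simplified way (q is read from the same parameter list) and ignores media types other than JSON and wildcards.

```python
def choose_json_form(accept, default="lossless"):
    """Pick "lossless" or "clean" JSON from an HTTP Accept header value.

    Hypothetical convention: 'application/json; lossless=false' asks for
    clean JSON and 'lossless=true' for lossless JSON; no parameter, or a
    wildcard, falls back to the server default.
    """
    candidates = []
    for index, item in enumerate(accept.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        params = {}
        for part in parts[1:]:
            if "=" in part:
                key, val = part.split("=", 1)
                params[key.strip().lower()] = val.strip().strip('"').lower()
        try:
            q = float(params.pop("q", "1"))
        except ValueError:
            q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0 and media_type in ("application/json", "application/*", "*/*"):
            # Highest q wins; ties go to the entry listed first.
            candidates.append((-q, index, media_type, params))
    if not candidates:
        return default
    _, _, media_type, params = min(candidates)
    if media_type == "application/json" and "lossless" in params:
        return "lossless" if params["lossless"] == "true" else "clean"
    return default
```

With this convention, existing clients that send plain application/json keep getting lossless JSON, so changing the default (point 3) would remain a separate decision.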
>>>
>>> Ceej
>>> aka Chris Hillery
>>>
>>> On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> Cool.  Sattam + Wail are going to sign up to do this, I believe! (They
>>>> want/need it first....)
>>>>
>>>>
>>>> On 8/1/15 9:38 AM, Till Westmann wrote:
>>>>
>>>>> Only a few thoughts:
>>>>> 1) Yes, we should definitely have that!
>>>>> 2) For the non-numeric extended atomic types we should find a reasonable
>>>>> string serialization, and we need to provide functions to parse that
>>>>> serialization back to the extended atomic type (I think that we already
>>>>> have that, e.g., for the datetime types).
>>>>> 3) I think that we already had that discussion a few times (I remember
>>>>> arguing for it when I first joined the project) and it’s time to do it
>>>>> :)
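Point 2 asks for functions that parse the string serializations back into the extended atomic types. As a hedged illustration for the temporal case, the Python sketch below parses XML Schema-style date and time strings into millisecond counts, assuming the epoch-based representation that the lossless JSON sample in Chris's message above appears to use (inferred from its values, not from the AsterixDB source). It follows Howard Hinnant's days_from_civil algorithm, accepts negative years with astronomical numbering, and only accepts a "Z" zone designator; parse_date and parse_time are hypothetical names.

```python
import re

MS_PER_DAY = 86_400_000
_DATE = re.compile(r"(-?)(\d{4,})-(\d{2})-(\d{2})")
_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?Z?")


def days_from_civil(y, m, d):
    """Proleptic Gregorian (year, month, day) -> days since 1970-01-01."""
    if m <= 2:
        y -= 1  # treat January and February as the end of the prior year
    era = y // 400  # floor division also handles negative years
    yoe = y - era * 400
    doy = (153 * (m - 3 if m > 2 else m + 9) + 2) // 5 + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468


def parse_date(text):
    # No range checks on month or day; this is only a sketch.
    match = _DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    sign, year, month, day = match.groups()
    y = -int(year) if sign else int(year)
    return days_from_civil(y, int(month), int(day)) * MS_PER_DAY


def parse_time(text):
    match = _TIME.fullmatch(text)
    if match is None:
        raise ValueError(f"not a time: {text!r}")
    h, mi, s, frac = match.groups()
    millis = int((frac or "0").ljust(3, "0"))  # ".5" means 500 ms
    return ((int(h) * 60 + int(mi)) * 60 + int(s)) * 1000 + millis


print(parse_date("-2011-01-27"))  # -125625945600000
print(parse_time("12:20:30Z"))    # 44430000
```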
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>
>>>>>> Hey - our JSON output format is currently designed to be non-lossy, in
>>>>>> the sense that it encodes all the details of the source types (since
>>>>>> ADM is JSON++ and there's quite a bit in that ++ section).  We really
>>>>>> also need an option for "normal application users" that's lossy but
>>>>>> produces the kind of JSON that would be expected by consuming
>>>>>> applications that "don't appreciate" the many different kinds of
>>>>>> numeric data, the existence of spatial data, etc.  I.e., it'd be nice
>>>>>> to have a default lossy serialization into JSON as well....  (Note that
>>>>>> if someone doesn't want to suffer the loss, they can always do their own
>>>>>> out-conversions of the data in the return section of their AQL query to
>>>>>> bridge the gap.)  Thoughts?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
