asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: json vs. JSON
Date Wed, 05 Aug 2015 22:40:43 GMT
Agreed on all points (so to speak) - though for the spatial JSON types I 
wonder if 
http://blogs.esri.com/esri/arcgis/2014/12/16/arcgis-online-geojson/ 
should be our guide?

On 8/5/15 1:30 AM, Chris Hillery wrote:
> Sure, I think that shouldn't be too hard, given some help with the
> questions I raised.
>
> To start the discussion, I wrote a query that outputs all ADM types to show
> how they are serialized to JSON (except the interval types, which throw a
> NotImplementedException if you try to serialize them to JSON currently) :
>
> { "string": string("Nancy"),
>    "float": 32.5f,
>    "double" : double("-2013.5938237483274"),
>    "boolean" : true,
>    "int8": int8("125"),
>    "int16": int16("32765"),
>    "int32": int32("294967295"),
>    "int64": int64("1700000000000000000"),
>    "unorderedList": {{"reading","writing"}},
>    "orderedList": ["Brad","Scott"],
>    "record": {  "number": 8389, "street": "Hill St.", "city": "Mountain View" },
>    "date": date("-2011-01-27"),
>    "time": time("12:20:30Z"),
>    "datetime": datetime("-1951-12-27T12:20:30"),
>    "duration": duration("P10Y11M12DT10H50M30S"),
>    "location2d": point("41.00,44.00"),
>    "location3d": point3d("44.00,13.00,41.00"),
>    "line" : line("10.1,11.1 10.2,11.2"),
>    "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
>    "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
>    "circle" : circle("10.1,11.1 10.2"),
>    "binary" : hex("ABCDEF0123456789"),
>   "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
>
> And here is how that gets serialized in "lossless JSON":
>
> { "string": "Nancy",
>    "float": 32.5,
>    "double": -2013.5938237483274,
>    "boolean": true,
>    "int8": { "int8": 125 },
>    "int16": { "int16": 32765 },
>    "int32": { "int32": 294967295 },
>    "int64": { "int64": 1700000000000000000 },
>    "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
>    "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
>    "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city":
> "Mountain View" },
>    "date": { "date": -125625945600000},
>    "time": { "time": 44430000},
>    "datetime": { "datetime": -123703587570000},
>    "duration": { "duration": { "months": 131, "millis": 1075830000} },
>    "location2d": { "point": [41.0, 44.0] },
>    "location3d": { "point3d": [44.0, 13.0, 41.0] },
>    "line": { "line":  [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] }
> ] },
>    "rectangle": { "rectangle": [{ "point": [5.1, 11.8] }, { "point": [87.6,
> 15.6548] } ] },
>    "polygon": { "polygon": [{ "point": [1.2, 1.3] },{ "point": [2.1, 2.5]
> },{ "point": [3.5, 3.6] },{ "point": [4.6, 4.8] }] },
>    "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
>    "binary": hex("ABCDEF0123456789"),
>    "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
>
> Some observations and proposals:
>
> 1. The "JSON" serializations of the hex() and uuid() types are still broken
> (not even valid JSON).
>
> 2. IMHO the string, float, double, boolean, and record types are already
> serialized the way you would want in "clean JSON".
>
> 3. IMHO orderedList and unorderedList should be serialized as simple JSON
> arrays in "clean JSON".
>
> 4. The serializations of date, time, datetime, and duration, while valid
> JSON, are not very useful. It would be better if they were serialized as
> canonical date, time, or dateTime forms from XML Schema. In "clean JSON"
> they would be serialized simply as strings with that value. In "lossless
> JSON" they would be serialized as records as shown here, but with a string
> value, e.g. { "date" : "-2011-01-27" }.
>
> 5. The int8/int16/int32/int64 types should be serialized as
> straight JSON numbers in "clean JSON".
>
> 6. Interval types should be supported. I am open to suggestions as to how
> best to represent them in both "clean JSON" and "lossless JSON".
>
> 7. I'm really not sure what the best serialization of the spatial types
> would be in "clean JSON", but as a strawman, how about serializing all
> points as simple arrays of JSON numbers? Then line, rectangle, and polygon
> could either be an array of arrays, or else objects with names like
> "start"/"end" for line and rectangle and "point1", "point2", etc. for
> polygon. Circle, I think, should always be an object with the names
> "center" and "radius". So, in "clean JSON", the last few lines of the above
> query results would look like this:
>
>    "location2d" : [41.0, 44.0],
>    "location3d" : [44.0, 13.0, 41.0],
>    "line" : [ [10.1, 11.1], [10.2, 11.2] ],
>    "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
>    "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
>    "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
>
> or like this:
>
>    "location2d" : [41.0, 44.0],
>    "location3d" : [44.0, 13.0, 41.0],
>    "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
>    "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
>    "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" :
> [3.5, 3.6], "point4" : [4.6, 4.8] },
>    "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
>
> My preference would probably be the latter, just so that "circle" doesn't
> seem like such an odd duck and "line" and "rectangle" don't become
> ambiguous.
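
A rough Python sketch of how proposals 3, 5 and 7 above could be applied to an already-decoded "lossless JSON" value, using the second (named-field) spatial layout. The function name to_clean and the type checks used to tell a wrapper from an ordinary one-field record are assumptions made for illustration; nothing here is taken from the AsterixDB code. Circle is deliberately not special-cased, since its current serialization is suspected to be scrambled.

```python
INT_TAGS = {"int8", "int16", "int32", "int64"}
LIST_TAGS = {"orderedlist", "unorderedlist"}


def to_clean(value):
    """Recursively strip lossless type wrappers from a decoded JSON value."""
    if isinstance(value, list):
        return [to_clean(v) for v in value]
    if not isinstance(value, dict):
        return value  # strings, floats, booleans and null pass through

    if len(value) == 1:
        (tag, inner), = value.items()
        # Heuristic: a wrapper holds a raw number or raw list, whereas a
        # user record with the same field name would hold a wrapped value.
        if tag in INT_TAGS and type(inner) is int:
            return inner
        if tag in LIST_TAGS and isinstance(inner, list):
            return [to_clean(v) for v in inner]
        if tag in ("point", "point3d") and isinstance(inner, list):
            return list(inner)
        if tag in ("line", "rectangle") and isinstance(inner, list) and len(inner) == 2:
            start, end = (to_clean(p) for p in inner)
            return {"start": start, "end": end}
        if tag == "polygon" and isinstance(inner, list):
            return {f"point{i}": to_clean(p) for i, p in enumerate(inner, 1)}

    # Ordinary record (or an unhandled wrapper such as circle): recurse.
    return {k: to_clean(v) for k, v in value.items()}
```

The temporal, binary and uuid types are left out here because their target string forms are still under discussion in the thread.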
>
> (Aside: I think the current serialization of "circle" is broken; it seems
> to be scrambling the radius and the point values.)
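
For the temporal types in item 4, the numbers in the current output can be turned back into readable strings with plain arithmetic. The sketch below covers only time and duration, assumes non-negative values, and assumes the time is in UTC (hence the trailing "Z"); the function names are made up for illustration. Dates and datetimes with negative years, as in the sample query, need more care and are not attempted.

```python
def format_time(millis):
    """Milliseconds since midnight -> 'hh:mm:ss[.mmm]Z' (UTC assumed)."""
    seconds, ms = divmod(millis, 1000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    text = f"{h:02d}:{m:02d}:{s:02d}"
    if ms:
        text += f".{ms:03d}"
    return text + "Z"


def format_duration(months, millis):
    """(months, millis) pair -> ISO 8601 style duration string."""
    years, mon = divmod(months, 12)
    seconds, ms = divmod(millis, 1000)
    minutes, s = divmod(seconds, 60)
    hours, mi = divmod(minutes, 60)
    days, h = divmod(hours, 24)

    date_part = ""
    if years:
        date_part += f"{years}Y"
    if mon:
        date_part += f"{mon}M"
    if days:
        date_part += f"{days}D"

    time_part = ""
    if h:
        time_part += f"{h}H"
    if mi:
        time_part += f"{mi}M"
    if s or ms:
        time_part += f"{s}.{ms:03d}S" if ms else f"{s}S"

    if not date_part and not time_part:
        return "PT0S"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")
```

With the values from the output above, format_time(44430000) gives "12:20:30Z" and format_duration(131, 1075830000) gives "P10Y11M12DT10H50M30S", which match the constructor arguments in the query.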
>
> So there are a number of actions here even with the existing code, in
> addition to supporting the new clean JSON output. I also found some issues
> with the current AQL implementation and doc:
>
> A. ADM allows numeric serializations like 5550d for double and 12i8 for
> int8, but those are not valid in AQL it seems.
>
> B. AQL doesn't seem to have any constructors for intervals; you can only
> create them via functions like interval-from-date().
>
> (A) and (B) both basically mean that not all valid ADM can be read as AQL,
> which seems like it would be a desirable goal.
>
> C. The ADM doc doesn't mention the "point3d" type.
>
>
> Now accepting any input on the above, as well as the other issue about how
> to select this form of output via the HTTP interface!
>
> Ceej
> aka Chris Hillery
>
> On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com>
> wrote:
>
>> Hi Chris,
>>
>> Actually, it would be great if you could fix this since, as you mentioned,
>> you have touched this part of the code.
>> Please confirm.
>>
>> Cheers,
>> Sattam
>>
>> On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu> wrote:
>>
>>> I could take a look at this as well - it would be a natural extension of
>>> the work I did earlier to clean up the existing JSON output. It probably
>>> wouldn't be very difficult to do this in a relatively "dumb" way, but there
>>> also is some amount of duplicated code between the various output formats
>>> and it would be tempting to try and tidy that up a bit as well.
>>>
>>> Three issues need to be addressed regardless of who does it or how:
>>>
>>> 1. We'd need to decide how to "strip down" all ADM types. In most numeric
>>> cases it's pretty clear. For spatial types, it deserves a little bit of
>>> thought. (It may be that the current "lossless" form is concise enough. For
>>> example, the ADM instance { "foo" : point("5,5") } gets rendered in JSON as
>>> { "foo" : { "point" : [5.0, 5.0] } } . Is there something that would be
>>> better?)
>>>
>>> 2. How would the user select this format vs. the current JSON form? When
>>> using the HTTP interface, the main way to select the returned serialization
>>> is via the HTTP Accept: header, and you select the "lossless JSON" form
>>> with the MIME type application/json. If we have two different JSON
>>> serializations, we'd need to invent a new MIME type, or introduce some kind
>>> of additional flag, or something.
>>>
>>> 3. When using the HTTP interface, the current lossless JSON is in fact the
>>> default output type. Should that remain the case, or should the "lossy"
>>> JSON type be preferred?
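
One way the "additional flag" idea in issue 2 could look is a parameter on the existing MIME type. The Python sketch below is purely illustrative: the parameter name "lossless" and the function choose_json_flavor are hypothetical, nothing like this exists in the HTTP interface as far as this thread shows, and q-values in the Accept header are ignored for brevity. The default argument is where the answer to issue 3 would plug in.

```python
def choose_json_flavor(accept_header, default="lossless"):
    """Pick 'lossless' or 'clean' from an Accept header, or None if
    application/json was not requested at all.

    The 'lossless' media-type parameter is a hypothetical convention.
    """
    for item in accept_header.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        if media_type.lower() != "application/json":
            continue
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "lossless":
                return "lossless" if val.strip().lower() == "true" else "clean"
        return default  # plain application/json: fall back to the default
    return None
```

For example, "application/json; lossless=false" would select the clean form, while a bare "application/json" keeps whatever default is agreed on.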
>>>
>>> Ceej
>>> aka Chris Hillery
>>>
>>> On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> Cool.  Sattam + Wail are going to sign up to do this, I believe!  (They
>>>> want/need it first....)
>>>>
>>>>
>>>> On 8/1/15 9:38 AM, Till Westmann wrote:
>>>>
>>>>> Only a few thoughts:
>>>>> 1) Yes, we should definitely have that!
>>>>> 2) For the non-numeric extended atomic types we should find a reasonable
>>>>> string serialization and we need to provide functions to parse that
>>>>> serialization back to the extended atomic type (and I think that we already
>>>>> have that e.g. for the datetime types).
>>>>> 3) I think that we already had that discussion a few times (I remember
>>>>> arguing for it when I first joined the project) and it’s time to do it :)
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>> Hey - our JSON output format is currently designed to be non-lossy, in
>>>>>> the sense that it encodes all the details of the source types (since ADM is
>>>>>> JSON++ and there's quite a bit in that ++ section).  We really also need an
>>>>>> option for "normal application users" that's lossy but produces the kind of
>>>>>> JSON that would be expected by consuming applications that "don't
>>>>>> appreciate" the many different kinds of numeric data, the existence of
>>>>>> spatial data, etc.  I.e., it'd be nice to have a default lossy
>>>>>> serialization into JSON as well....  (Note that if someone doesn't want to
>>>>>> suffer the loss, they can always do their own out-conversions of the data
>>>>>> in the return section of their AQL query to bridge the gap.)  Thoughts?
>>>>>

