asterixdb-dev mailing list archives

From Chris Hillery <chill...@hillery.land>
Subject Re: json vs. JSON
Date Wed, 05 Aug 2015 23:10:21 GMT
Took a quick look at the relevant spec (http://geojson.org/geojson-spec.html).
GeoJSON appears to have Point and Polygon, which map directly to ADM (its
Point is N-dimensional, so it would work for both point and point3d). It has
LineString, of which ADM's line is a special (two-point) case. It has no
counterpart for rectangle - I guess you could implement rectangle as a
Polygon, but our rectangle is different and simpler. GeoJSON also has no
circle.

So, while I think we could come up with a JSON representation that is
"inspired by" GeoJSON, we can't use just GeoJSON. I'm not sure it's a close
enough analogue to be of value, either for us or for anyone actually working
with GeoJSON. It would probably be better for us to have a library of
functions to translate in/out of GeoJSON, using ADM native types where
possible.
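To make the translation-library idea concrete, here is a minimal Python sketch of the "out" direction for the spatial types. It assumes clean-JSON inputs shaped like the strawman later in this thread (points as number arrays, line and rectangle as start/end objects, polygon as a list of vertex arrays, circle as center/radius); the function names are hypothetical and not part of AsterixDB. Rectangle becomes an equivalent Polygon, and circle is rejected because GeoJSON has no circle.

```python
# Hypothetical sketch: translate "clean JSON" spatial values into GeoJSON
# geometry objects (RFC 7946). The input shapes are assumptions based on the
# strawman clean-JSON proposal, not an existing AsterixDB API.

def point_to_geojson(coords):
    # GeoJSON positions have 2 or 3 numbers, so point and point3d both fit.
    return {"type": "Point", "coordinates": list(coords)}


def line_to_geojson(line):
    # An ADM line is a LineString with exactly two positions.
    return {"type": "LineString",
            "coordinates": [list(line["start"]), list(line["end"])]}


def polygon_to_geojson(vertices):
    ring = [list(v) for v in vertices]
    # GeoJSON linear rings must be closed: the last position repeats the first.
    # Winding order is not checked or corrected here.
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {"type": "Polygon", "coordinates": [ring]}


def rectangle_to_geojson(rect):
    # GeoJSON has no rectangle type, so emit the equivalent axis-aligned
    # Polygon, assuming "start" and "end" are opposite corners.
    (xa, ya), (xb, yb) = rect["start"], rect["end"]
    x1, x2 = min(xa, xb), max(xa, xb)
    y1, y2 = min(ya, yb), max(ya, yb)
    # Counterclockwise exterior ring, as RFC 7946 specifies.
    ring = [[x1, y1], [x2, y1], [x2, y2], [x1, y2], [x1, y1]]
    return {"type": "Polygon", "coordinates": [ring]}


def circle_to_geojson(circle):
    # GeoJSON has no circle geometry; a real library would have to pick an
    # approximation (e.g., a polygon) or keep the ADM type.
    raise ValueError("GeoJSON has no circle geometry")
```

The "in" direction would be the mirror image, except that a GeoJSON Polygon with holes or a LineString with more than two positions has no exact ADM type.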

Ceej
aka Chris Hillery

On Wed, Aug 5, 2015 at 3:40 PM, Mike Carey <dtabass@gmail.com> wrote:

> Agreed on all points (so to speak) - though for the spatial JSON types I
> wonder if
> http://blogs.esri.com/esri/arcgis/2014/12/16/arcgis-online-geojson/
> should be our guide?
>
>
> On 8/5/15 1:30 AM, Chris Hillery wrote:
>
>> Sure, I think that shouldn't be too hard, given some help with the
>> questions I raised.
>>
>> To start the discussion, I wrote a query that outputs all ADM types to show
>> how they are serialized to JSON (except the interval types, which currently
>> throw a NotImplementedException if you try to serialize them to JSON):
>>
>> { "string": string("Nancy"),
>>    "float": 32.5f,
>>    "double" : double("-2013.5938237483274"),
>>    "boolean" : true,
>>    "int8": int8("125"),
>>    "int16": int16("32765"),
>>    "int32": int32("294967295"),
>>    "int64": int64("1700000000000000000"),
>>    "unorderedList": {{"reading","writing"}},
>>    "orderedList": ["Brad","Scott"],
>>    "record": {  "number": 8389, "street": "Hill St.", "city": "Mountain View" },
>>    "date": date("-2011-01-27"),
>>    "time": time("12:20:30Z"),
>>    "datetime": datetime("-1951-12-27T12:20:30"),
>>    "duration": duration("P10Y11M12DT10H50M30S"),
>>    "location2d": point("41.00,44.00"),
>>    "location3d": point3d("44.00,13.00,41.00"),
>>    "line" : line("10.1,11.1 10.2,11.2"),
>>    "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
>>    "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
>>    "circle" : circle("10.1,11.1 10.2"),
>>    "binary" : hex("ABCDEF0123456789"),
>>    "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
>> }
>>
>> And here is how that gets serialized in "lossless JSON":
>>
>> { "string": "Nancy",
>>    "float": 32.5,
>>    "double": -2013.5938237483274,
>>    "boolean": true,
>>    "int8": { "int8": 125 },
>>    "int16": { "int16": 32765 },
>>    "int32": { "int32": 294967295 },
>>    "int64": { "int64": 1700000000000000000 },
>>    "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
>>    "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
>>    "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city": "Mountain View" },
>>    "date": { "date": -125625945600000},
>>    "time": { "time": 44430000},
>>    "datetime": { "datetime": -123703587570000},
>>    "duration": { "duration": { "months": 131, "millis": 1075830000} },
>>    "location2d": { "point": [41.0, 44.0] },
>>    "location3d": { "point3d": [44.0, 13.0, 41.0] },
>>    "line": { "line": [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] },
>>    "rectangle": { "rectangle": [ { "point": [5.1, 11.8] }, { "point": [87.6, 15.6548] } ] },
>>    "polygon": { "polygon": [ { "point": [1.2, 1.3] }, { "point": [2.1, 2.5] },
>>                              { "point": [3.5, 3.6] }, { "point": [4.6, 4.8] } ] },
>>    "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
>>    "binary": hex("ABCDEF0123456789"),
>>    "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
>> }
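The lossless numbers do carry the original values: 44430000 ms is 12:20:30, 131 months is 10 years 11 months, and 1075830000 ms is 12 days 10 hours 50 minutes 30 seconds. A minimal Python sketch of turning them back into the string forms proposed in item 4 below might look like this. It assumes time is milliseconds since midnight and that durations are non-negative, ignores time zones, and the function names are hypothetical.

```python
# Sketch: render the lossless "time" and "duration" numbers as ISO 8601 /
# XML Schema style strings. Assumes time = milliseconds since midnight and
# duration = (months, millis) with non-negative parts; time zones are ignored.

def time_to_string(millis):
    seconds, ms = divmod(millis, 1000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    text = f"{h:02d}:{m:02d}:{s:02d}"
    if ms:
        # Fractional seconds without trailing zeros, e.g. ".5" for 500 ms.
        text += f".{ms:03d}".rstrip("0")
    return text


def duration_to_string(months, millis):
    years, months = divmod(months, 12)
    seconds, ms = divmod(millis, 1000)
    minutes, s = divmod(seconds, 60)
    hours, mi = divmod(minutes, 60)
    days, h = divmod(hours, 24)
    # Only non-zero fields are emitted, e.g. P1M rather than P0Y1M0D.
    date_part = "".join(f"{v}{u}" for v, u in ((years, "Y"), (months, "M"), (days, "D")) if v)
    time_part = "".join(f"{v}{u}" for v, u in ((h, "H"), (mi, "M")) if v)
    if ms:
        time_part += f"{s}.{ms:03d}".rstrip("0") + "S"
    elif s:
        time_part += f"{s}S"
    text = "P" + date_part + ("T" + time_part if time_part else "")
    # An all-zero duration still needs at least one field.
    return text if text != "P" else "PT0S"
```

With these, { "time": 44430000 } renders as "12:20:30" and the duration above renders as "P10Y11M12DT10H50M30S", matching the constructor arguments in the query.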
>>
>> Some observations and proposals:
>>
>> 1. The "JSON" serialization of the hex() and uuid() types is still broken
>> (it is not even valid JSON).
>>
>> 2. IMHO the string, float, double, boolean, and record types are already
>> serialized the way you would want in "clean JSON".
>>
>> 3. IMHO orderedList and unorderedList should be serialized as simple JSON
>> arrays in "clean JSON".
>>
>> 4. The serializations of date, time, datetime, and duration, while valid
>> JSON, are not very useful. It would be better if they were serialized in
>> the canonical date, time, dateTime, or duration forms from XML Schema. In
>> "clean JSON" they would be serialized simply as strings with that value. In
>> "lossless JSON" they would be serialized as records as shown here, but with
>> a string value, e.g. { "date" : "-2011-01-27" }.
>>
>> 5. The int8/int16/int32/int64 values should be serialized as straight JSON
>> numbers in "clean JSON".
>>
>> 6. Interval types should be supported. I am open to suggestions as to how
>> best to represent them in both "clean JSON" and "lossless JSON".
>>
>> 7. I'm really not sure what the best serialization of the spatial types
>> would be in "clean JSON", but as a strawman, how about serializing all
>> points as simple arrays of JSON numbers? Then line, rectangle, and polygon
>> could either be an array of arrays, or else objects with names like
>> "start"/"end" for line and rectangle and "point1", "point2", etc. for
>> polygon. Circle, I think, should always be an object with the names
>> "center" and "radius". So, in "clean JSON", the last few lines of the
>> above query results would look like this:
>>
>>    "location2d" : [41.0, 44.0],
>>    "location3d" : [44.0, 13.0, 41.0],
>>    "line" : [ [10.1, 11.1], [10.2, 11.2] ],
>>    "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
>>    "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
>>    "circle" : { "radius" : 10.2, "center" : [ 10.1, 11.1 ] },
>>
>> or like this:
>>
>>    "location2d" : [41.0, 44.0],
>>    "location3d" : [44.0, 13.0, 41.0],
>>    "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
>>    "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
>>    "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5],
>>                  "point3" : [3.5, 3.6], "point4" : [4.6, 4.8] },
>>    "circle" : { "radius" : 10.2, "center" : [ 10.1, 11.1 ] },
>>
>> My preference would probably be the latter, just so that "circle" doesn't
>> seem like such an odd duck and "line" and "rectangle" don't become
>> ambiguous.
>>
>> (Aside: I think the current serialization of "circle" is broken; it seems
>> to be scrambling the radius and the point values.)
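A rough Python sketch of a lossless-to-clean conversion along the lines of items 2, 3, 5, and 7 (using the start/end and point1, point2, ... form) follows. The tag names come from the lossless output above; the function name is hypothetical, temporal values are passed through untouched, and the sketch cannot tell a tagged value from a one-field record whose field name happens to match a tag. Circle values will only come out right once the serialization bug mentioned above is fixed.

```python
import json

# Hypothetical lossless-to-clean JSON conversion. Tag names are taken from
# the lossless output in this thread; temporal tags are passed through.
INT_TAGS = {"int8", "int16", "int32", "int64"}
LIST_TAGS = {"orderedlist", "unorderedlist"}


def to_clean(value):
    if isinstance(value, list):
        return [to_clean(v) for v in value]
    if not isinstance(value, dict):
        return value  # strings, numbers, booleans, null are already plain JSON
    if len(value) == 1:
        tag, inner = next(iter(value.items()))
        if tag in INT_TAGS:
            return inner
        if tag in LIST_TAGS:
            return [to_clean(v) for v in inner]
        if tag in ("point", "point3d"):
            return list(inner)
        if tag in ("line", "rectangle"):
            start, end = inner
            return {"start": to_clean(start), "end": to_clean(end)}
        if tag == "polygon":
            return {f"point{i}": to_clean(p) for i, p in enumerate(inner, start=1)}
        if tag == "circle":
            # Find the center by type rather than position, since the
            # lossless circle layout is suspected to be broken.
            center = next(x for x in inner if isinstance(x, dict))
            radius = next(x for x in inner if not isinstance(x, dict))
            return {"radius": radius, "center": to_clean(center)}
    # Anything else is an ordinary record: clean each field value.
    return {k: to_clean(v) for k, v in value.items()}


if __name__ == "__main__":
    lossless = json.loads(
        '{"n": {"int8": 125}, "tags": {"unorderedlist": ["reading", "writing"]}, '
        '"line": {"line": [{"point": [10.1, 11.1]}, {"point": [10.2, 11.2]}]}}')
    print(json.dumps(to_clean(lossless)))
```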
>>
>> So there are a number of actions here even with the existing code, in
>> addition to supporting the new clean JSON output. I also found some issues
>> with the current AQL implementation and doc:
>>
>> A. ADM allows numeric serializations like 5550d for double and 12i8 for
>> int8, but those do not seem to be valid in AQL.
>>
>> B. AQL doesn't seem to have any constructors for intervals; you can only
>> create them via functions like interval-from-date().
>>
>> (A) and (B) both basically mean that not all valid ADM can be read as AQL,
>> even though that seems like a desirable goal.
>>
>> C. The ADM doc doesn't mention the "point3d" type.
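For (A), a reader that accepts the ADM numeric forms has to map a type suffix to a type and range-check the value. Here is a minimal Python sketch, assuming the suffixes i8/i16/i32/i64/f/d, treating unsuffixed integers as int64 (as the record's "number" field in the lossless output suggests) and other unsuffixed numbers as double. The function name is hypothetical, and this is an illustration, not the AQL grammar.

```python
import re

# Hypothetical recognizer for ADM-style numeric literals with type suffixes
# (e.g. 12i8, 5550d, 32.5f). Suffix set and unsuffixed defaults are assumptions.
LITERAL = re.compile(r"(-?\d+)(\.\d+)?([eE][+-]?\d+)?(i8|i16|i32|i64|f|d)?")
INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}
FLOAT_TYPES = {"f": "float", "d": "double"}


def parse_adm_number(text):
    m = LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    whole, frac, exp, suffix = m.groups()
    is_integral = frac is None and exp is None
    if suffix in INT_BITS or (suffix is None and is_integral):
        if not is_integral:
            raise ValueError(f"integer suffix on non-integer literal: {text!r}")
        bits = INT_BITS.get(suffix, 64)
        value = int(whole)
        # Two's-complement range check for the target width.
        if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            raise ValueError(f"{text!r} does not fit in int{bits}")
        return f"int{bits}", value
    # Float suffixes, or unsuffixed non-integers (assumed to be double).
    type_name = FLOAT_TYPES.get(suffix, "double")
    return type_name, float(whole + (frac or "") + (exp or ""))
```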
>>
>>
>> Now accepting any input on the above, as well as the other issue about how
>> to select this form of output via the HTTP interface!
>>
>> Ceej
>> aka Chris Hillery
>>
>> On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com>
>> wrote:
>>
>>> Hi Chris,
>>>
>>> Actually, it would be great if you could fix this since, as you mentioned,
>>> you have touched this part of the code.
>>> Please confirm.
>>>
>>> Cheers,
>>> Sattam
>>>
>>> On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu>
>>> wrote:
>>>
>>>> I could take a look at this as well - it would be a natural extension of
>>>> the work I did earlier to clean up the existing JSON output. It probably
>>>> wouldn't be very difficult to do this in a relatively "dumb" way, but there
>>>> is also some amount of duplicated code between the various output formats,
>>>> and it would be tempting to try and tidy that up a bit as well.
>>>>
>>>> Three issues need to be addressed regardless of who does it or how:
>>>>
>>>> 1. We'd need to decide how to "strip down" all ADM types. In most numeric
>>>> cases it's pretty clear. For spatial types, it deserves a little bit of
>>>> thought. (It may be that the current "lossless" form is concise enough.
>>>> For example, the ADM instance { "foo" : point("5,5") } gets rendered in
>>>> JSON as { "foo" : { "point" : [5.0, 5.0] } }. Is there something that
>>>> would be better?)
>>>>
>>>> 2. How would the user select this format vs. the current JSON form? When
>>>> using the HTTP interface, the main way to select the returned
>>>> serialization is via the HTTP Accept: header, and you select the
>>>> "lossless JSON" form with the MIME type application/json. If we have two
>>>> different JSON serializations, we'd need to invent a new MIME type,
>>>> introduce some kind of additional flag, or something similar.
>>>>
>>>> 3. When using the HTTP interface, the current lossless JSON is in fact the
>>>> default output type. Should that remain the case, or should the "lossy"
>>>> JSON type be preferred?
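One possible shape for the "additional flag" in issue 2 is a media-type parameter on application/json. The Python sketch below assumes a parameter named lossless, which is invented here for illustration; it picks the highest-q application/json entry, ignores wildcards such as */*, and leaves the default flavor (issue 3) as an argument. The function name is hypothetical.

```python
# Hypothetical: choose between "lossless" and "clean" JSON from an HTTP
# Accept header, using an invented "lossless" media-type parameter.
# Returns None when application/json is not acceptable at all.

def choose_json_flavor(accept_header, default="lossless"):
    best_q, best_flavor = 0.0, None
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/json":
            continue  # wildcards and other media types are ignored here
        params = {}
        for p in parts[1:]:
            key, sep, val = p.partition("=")
            if sep:
                params[key.strip().lower()] = val.strip().strip('"').lower()
        q = float(params.get("q", "1"))
        if "lossless" in params:
            flavor = "lossless" if params["lossless"] == "true" else "clean"
        else:
            # Issue 3 is exactly the question of what this default should be.
            flavor = default
        # q=0 means "not acceptable"; ties keep the first listed entry.
        if q > best_q:
            best_q, best_flavor = q, flavor
    return best_flavor
```

Keeping a bare application/json mapped to the current lossless output would leave existing clients unaffected; passing default="clean" models the alternative raised in issue 3.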
>>>>
>>>> Ceej
>>>> aka Chris Hillery
>>>>
>>>> On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>
>>>>> Cool.  Sattam + Wail are going to sign up to do this, I believe! (They
>>>>> want/need it first....)
>>>>>
>>>>>
>>>>> On 8/1/15 9:38 AM, Till Westmann wrote:
>>>>>
>>>>>> Only a few thoughts:
>>>>>> 1) Yes, we should definitely have that!
>>>>>> 2) For the non-numeric extended atomic types we should find a reasonable
>>>>>> string serialization, and we need to provide functions to parse that
>>>>>> serialization back to the extended atomic type (I think we already have
>>>>>> that, e.g., for the datetime types).
>>>>>> 3) I think that we have already had this discussion a few times (I
>>>>>> remember arguing for it when I first joined the project), and it’s time
>>>>>> to do it :)
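As an example of the parse-back functions in point 2, here is a minimal Python sketch that turns an XML Schema style duration string into the (months, millis) pair used by the lossless output. It is not AsterixDB's parser and the function name is hypothetical; the regex handles at most millisecond precision and does not accept week-based forms such as P2W.

```python
import re

# Sketch: parse "P10Y11M12DT10H50M30S"-style durations into (months, millis).
DURATION = re.compile(
    r"(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)(?:\.(\d{1,3}))?S)?)?")


def parse_duration(text):
    m = DURATION.fullmatch(text)
    # Reject strings with no fields at all, e.g. "P", or a dangling "T".
    if m is None or text.lstrip("-") == "P" or text.endswith("T"):
        raise ValueError(f"not a duration: {text!r}")
    sign, y, mo, d, h, mi, s, frac = m.groups()
    # Years and months are kept apart from day/time fields, whose length in
    # milliseconds is fixed.
    months = int(y or 0) * 12 + int(mo or 0)
    seconds = ((int(d or 0) * 24 + int(h or 0)) * 60 + int(mi or 0)) * 60 + int(s or 0)
    millis = seconds * 1000 + int((frac or "0").ljust(3, "0"))
    if sign:
        months, millis = -months, -millis
    return months, millis
```

For the duration in the query earlier in this thread, parse_duration("P10Y11M12DT10H50M30S") gives (131, 1075830000), the same numbers the lossless output carries.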
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>
>>>>>>> Hey - our JSON output format is currently designed to be non-lossy, in
>>>>>>> the sense that it encodes all the details of the source types (since
>>>>>>> ADM is JSON++ and there's quite a bit in that ++ section).  We really
>>>>>>> also need an option for "normal application users" that's lossy but
>>>>>>> produces the kind of JSON that would be expected by consuming
>>>>>>> applications that "don't appreciate" the many different kinds of
>>>>>>> numeric data, the existence of spatial data, etc.  I.e., it'd be nice
>>>>>>> to have a default lossy serialization into JSON as well....  (Note that
>>>>>>> if someone doesn't want to suffer the loss, they can always do their
>>>>>>> own out-conversions of the data in the return section of their AQL
>>>>>>> query to bridge the gap.)
>>>>>>>
>>>>>>> Thoughts?
>
