asterixdb-dev mailing list archives

From Sattam Alsubaiee <salsuba...@gmail.com>
Subject Re: json vs. JSON
Date Thu, 06 Aug 2015 04:19:52 GMT
Agreed on your points as well.
For spatial types, it looks like GeoJSON is a good reference. We can pretty
much use their style for all of our spatial types:

"geometry": { "type": "Point", "coordinates": [104.0, 39.0] }
"geometry": { "type": "Line", "coordinates": [[104.0, 39.0], [110.0, 44.0]] }
"geometry": { "type": "Rectangle", "coordinates": [[104.0, 39.0], [110.0, 44.0]] }
"geometry": { "type": "Polygon", "coordinates": [[104.0, 39.0], [110.0, 44.0], [120.0, 46.0], [110.0, 44.0]] }
"geometry": { "type": "Circle", "coordinates": [104.0, 39.0], "radius": 5.0 }
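
A minimal sketch of how geometry objects in the style above could be built
from ADM constructor strings such as "10.1,11.1 10.2,11.2". The helper names
(parse_points, to_geometry) are hypothetical and not part of AsterixDB, and
the circle string is assumed to be "x,y r" (center point, then radius).

```python
import json


def parse_points(text):
    """Parse an ADM-style coordinate string into a list of [x, y] pairs."""
    return [[float(c) for c in pair.split(",")] for pair in text.split()]


def to_geometry(adm_type, text):
    """Build a GeoJSON-inspired geometry object (hypothetical helper)."""
    if adm_type == "point":
        return {"type": "Point", "coordinates": parse_points(text)[0]}
    if adm_type == "circle":
        # Assumption: the string is "x,y r", i.e. center first, radius second.
        center, radius = text.split()
        return {"type": "Circle",
                "coordinates": parse_points(center)[0],
                "radius": float(radius)}
    # Line, rectangle, and polygon are all plain lists of points here.
    names = {"line": "Line", "rectangle": "Rectangle", "polygon": "Polygon"}
    return {"type": names[adm_type], "coordinates": parse_points(text)}


print(json.dumps({"geometry": to_geometry("circle", "10.1,11.1 10.2")}))
```

Note that "Line", "Rectangle", and "Circle" are not GeoJSON types, so a plain
GeoJSON consumer would not recognize these objects without translation.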

The point3d type has existed from the beginning, but we don't mention it in the
supported data types, and there are currently no supported operations for 3D.
So I would suggest we remove it completely.

Sattam

On Thu, Aug 6, 2015 at 2:10 AM, Chris Hillery <chillery@hillery.land> wrote:

> Took a quick look at the relevant spec (
> http://geojson.org/geojson-spec.html).
> GeoJSON appears to have point and polygon which directly map to ADM (their
> Point is N-dimensional so it would work for both point and point3d). It has
> LineString, of which ADM's line is a degenerate case. It has no counterpart
> for rectangle - I guess you could implement rectangle as a Polygon, but our
> rectangle is kind of different and simplified. GeoJSON also has no circle.
>
> So, while I think we could come up with a JSON representation that is
> "inspired by" GeoJSON, we can't use just GeoJSON. I'm not sure it's a close
> enough analogue that there would be value there, for either us or for anyone
> actually working with GeoJSON. It would probably be better for us to have a
> library of functions to translate in/out of GeoJSON, using ADM native types
> where possible.
>
> Ceej
> aka Chris Hillery
>
> On Wed, Aug 5, 2015 at 3:40 PM, Mike Carey <dtabass@gmail.com> wrote:
>
> > Agreed on all points (so to speak) - though for the spatial JSON types I
> > wonder if
> > http://blogs.esri.com/esri/arcgis/2014/12/16/arcgis-online-geojson/
> > should be our guide?
> >
> >
> > On 8/5/15 1:30 AM, Chris Hillery wrote:
> >
> >> Sure, I think that shouldn't be too hard, given some help with the
> >> questions I raised.
> >>
> >> To start the discussion, I wrote a query that outputs all ADM types to show
> >> how they are serialized to JSON (except the interval types, which currently
> >> throw a NotImplementedException if you try to serialize them to JSON):
> >>
> >> { "string": string("Nancy"),
> >>    "float": 32.5f,
> >>    "double" : double("-2013.5938237483274"),
> >>    "boolean" : true,
> >>    "int8": int8("125"),
> >>    "int16": int16("32765"),
> >>    "int32": int32("294967295"),
> >>    "int64": int64("1700000000000000000"),
> >>    "unorderedList": {{"reading","writing"}},
> >>    "orderedList": ["Brad","Scott"],
> >>    "record": { "number": 8389, "street": "Hill St.", "city": "Mountain View" },
> >>    "date": date("-2011-01-27"),
> >>    "time": time("12:20:30Z"),
> >>    "datetime": datetime("-1951-12-27T12:20:30"),
> >>    "duration": duration("P10Y11M12DT10H50M30S"),
> >>    "location2d": point("41.00,44.00"),
> >>    "location3d": point3d("44.00,13.00,41.00"),
> >>    "line" : line("10.1,11.1 10.2,11.2"),
> >>    "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
> >>    "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
> >>    "circle" : circle("10.1,11.1 10.2"),
> >>    "binary" : hex("ABCDEF0123456789"),
> >>   "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> >> }
> >>
> >> And here is how that gets serialized in "lossless JSON":
> >>
> >> { "string": "Nancy",
> >>    "float": 32.5,
> >>    "double": -2013.5938237483274,
> >>    "boolean": true,
> >>    "int8": { "int8": 125 },
> >>    "int16": { "int16": 32765 },
> >>    "int32": { "int32": 294967295 },
> >>    "int64": { "int64": 1700000000000000000 },
> >>    "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
> >>    "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
> >>    "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city": "Mountain View" },
> >>    "date": { "date": -125625945600000},
> >>    "time": { "time": 44430000},
> >>    "datetime": { "datetime": -123703587570000},
> >>    "duration": { "duration": { "months": 131, "millis": 1075830000} },
> >>    "location2d": { "point": [41.0, 44.0] },
> >>    "location3d": { "point3d": [44.0, 13.0, 41.0] },
> >>    "line": { "line": [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] },
> >>    "rectangle": { "rectangle": [ { "point": [5.1, 11.8] }, { "point": [87.6, 15.6548] } ] },
> >>    "polygon": { "polygon": [ { "point": [1.2, 1.3] }, { "point": [2.1, 2.5] }, { "point": [3.5, 3.6] }, { "point": [4.6, 4.8] } ] },
> >>    "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
> >>    "binary": hex("ABCDEF0123456789"),
> >>    "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> >> }
> >>
> >> Some observations and proposals:
> >>
> >> 1. The "JSON" serializations of the hex() and uuid() types are still
> >> broken (not even valid JSON).
> >>
> >> 2. IMHO the string, float, double, boolean, and record types are already
> >> serialized the way you would want in "clean JSON".
> >>
> >> 3. IMHO orderedList and unorderedList should be serialized as simple JSON
> >> arrays in "clean JSON".
> >>
> >> 4. The serializations of date, time, datetime, and duration, while valid
> >> JSON, are not very useful. It would be better if they were serialized as
> >> canonical date, time, or dateTime forms from XML Schema. In "clean JSON"
> >> they would be serialized simply as strings with that value. In "lossless
> >> JSON" they would be serialized as records as shown here, but with a string
> >> value, e.g. { "date" : "-2011-01-27" }.
> >>
> >> 5. Values of int8/int16/int32/int64 should be serialized as straight JSON
> >> numbers in "clean JSON".
> >>
> >> 6. Interval types should be supported. I am open to suggestions as to how
> >> best to represent them in both "clean JSON" and "lossless JSON".
> >>
> >> 7. I'm really not sure what the best serialization of the spatial types
> >> would be in "clean JSON", but as a strawman, how about serializing all
> >> points as simple arrays of JSON numbers? Then line, rectangle, and polygon
> >> could either be an array of arrays, or else objects with names like
> >> "start"/"end" for line and rectangle and "point1", "point2", etc. for
> >> polygon. Circle, I think, should always be an object with the names
> >> "center" and "radius". So, in "clean JSON", the last few lines of the above
> >> query results would look like this:
> >>
> >>    "location2d" : [41.0, 44.0],
> >>    "location3d" : [44.0, 13.0, 41.0],
> >>    "line" : [ [10.1, 11.1], [10.2, 11.2] ],
> >>    "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
> >>    "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
> >>    "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
> >>
> >> or like this:
> >>
> >>    "location2d" : [41.0, 44.0],
> >>    "location3d" : [44.0, 13.0, 41.0],
> >>    "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
> >>    "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
> >>    "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" : [3.5, 3.6], "point4" : [4.6, 4.8] },
> >>    "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
> >>
> >> My preference would probably be the latter, just so that "circle" doesn't
> >> seem like such an odd duck and "line" and "rectangle" don't become
> >> ambiguous.
> >>
> >> (Aside: I think the current serialization of "circle" is broken; it seems
> >> to be scrambling the radius and the point values.)
> >>
> >> So there are a number of actions here even with the existing code, in
> >> addition to supporting the new clean JSON output. I also found some issues
> >> with the current AQL implementation and doc:
> >>
> >> A. ADM allows numeric serializations like 5550d for double and 12i8 for
> >> int8, but those are not valid in AQL it seems.
> >>
> >> B. AQL doesn't seem to have any constructors for intervals; you can only
> >> create them via functions like interval-from-date().
> >>
> >> (A) and (B) both basically mean that not all valid ADM can be read as AQL,
> >> although being able to do so seems like it would be a desirable goal.
> >>
> >> C. The ADM doc doesn't mention the "point3d" type.
> >>
> >>
> >> Now accepting any input on the above, as well as the other issue about how
> >> to select this form of output via the HTTP interface!
> >>
> >> Ceej
> >> aka Chris Hillery
> >>
> >> On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com>
> >> wrote:
> >>
> >>> Hi Chris,
> >>>
> >>> Actually, it would be great if you could fix this since, as you mentioned,
> >>> you have touched this part of the code.
> >>> Please confirm.
> >>>
> >>> Cheers,
> >>> Sattam
> >>>
> >>> On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu>
> >>> wrote:
> >>>
> >>>> I could take a look at this as well - it would be a natural extension of
> >>>> the work I did earlier to clean up the existing JSON output. It probably
> >>>> wouldn't be very difficult to do this in a relatively "dumb" way, but there
> >>>> also is some amount of duplicated code between the various output formats
> >>>> and it would be tempting to try and tidy that up a bit as well.
> >>>>
> >>>> Three issues need to be addressed regardless of who does it or how:
> >>>>
> >>>> 1. We'd need to decide how to "strip down" all ADM types. In most numeric
> >>>> cases it's pretty clear. For spatial types, it deserves a little bit of
> >>>> thought. (It may be that the current "lossless" form is concise enough. For
> >>>> example, the ADM instance { "foo" : point("5,5") } gets rendered in JSON as
> >>>> { "foo" : { "point" : [5.0, 5.0] } } . Is there something that would be
> >>>> better?)
> >>>>
> >>>> 2. How would the user select this format vs. the current JSON form? When
> >>>> using the HTTP interface, the main way to select the returned serialization
> >>>> is via the HTTP Accept: header, and you select the "lossless JSON" form
> >>>> with the MIME type application/json. If we have two different JSON
> >>>> serializations, we'd need to invent a new MIME type, or introduce some kind
> >>>> of additional flag, or something.
> >>>>
> >>>> 3. When using the HTTP interface, the current lossless JSON is in fact the
> >>>> default output type. Should that remain the case, or should the "lossy"
> >>>> JSON type be preferred?
> >>>>
> >>>> Ceej
> >>>> aka Chris Hillery
> >>>>
> >>>> On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
> >>>>
> >>>>> Cool.  Sattam + Wail are going to sign up to do this, I believe!  (They
> >>>>> want/need it first....)
> >>>>>
> >>>>>
> >>>>> On 8/1/15 9:38 AM, Till Westmann wrote:
> >>>>>
> >>>>>> Only a few thoughts:
> >>>>>> 1) Yes, we should definitely have that!
> >>>>>> 2) For the non-numeric extended atomic types we should find a reasonable
> >>>>>> string serialization and we need to provide functions to parse that
> >>>>>> serialization back to the extended atomic type (and I think that we
> >>>>>> already have that e.g. for the datetime types).
> >>>>>> 3) I think that we already had that discussion a few times (I remember
> >>>>>> arguing for it when I first joined the project) and it’s time to do it :)
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>>
> >>>>>> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hey - our JSON output format is currently designed to be non-lossy, in
> >>>>>>> the sense that it encodes all the details of the source types (since
> >>>>>>> ADM is JSON++ and there's quite a bit in that ++ section).  We really
> >>>>>>> also need an option for "normal application users" that's lossy but
> >>>>>>> produces the kind of JSON that would be expected by consuming
> >>>>>>> applications that "don't appreciate" the many different kinds of
> >>>>>>> numeric data, the existence of spatial data, etc.  I.e., it'd be nice
> >>>>>>> to have a default lossy serialization into JSON as well....  (Note that
> >>>>>>> if someone doesn't want to suffer the loss, they can always do their
> >>>>>>> own out-conversions of the data in the return section of their AQL
> >>>>>>> query to bridge the gap.)
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >
>
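
The "lossless" versus "clean" JSON distinction discussed in the quoted thread
can be sketched as a post-processing step that strips the single-key type
wrappers. This is only an illustration under stated assumptions: to_clean and
WRAPPERS are hypothetical names, the wrapper keys are taken from the sample
output quoted above, and temporal, circle, binary, and uuid values are left
untouched because their clean forms were still under discussion.

```python
import json

# Wrapper keys seen in the "lossless JSON" sample in the quoted thread.
WRAPPERS = {"int8", "int16", "int32", "int64",
            "orderedlist", "unorderedlist",
            "point", "point3d", "line", "rectangle", "polygon"}


def to_clean(value):
    """Recursively replace {"<type>": payload} wrappers with the payload."""
    if isinstance(value, list):
        return [to_clean(v) for v in value]
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in WRAPPERS:
                return to_clean(inner)
        return {k: to_clean(v) for k, v in value.items()}
    return value


lossless = {
    "int8": {"int8": 125},
    "orderedList": {"orderedlist": ["Brad", "Scott"]},
    "record": {"number": {"int64": 8389}, "street": "Hill St."},
    "line": {"line": [{"point": [10.1, 11.1]}, {"point": [10.2, 11.2]}]},
}
print(json.dumps(to_clean(lossless)))
```

One caveat of this approach: a user record with a single field that happens to
be named like a wrapper (for example { "int64": 5 }) is indistinguishable from
a wrapped value, so a real implementation would need to produce the clean form
directly in the serializer rather than by post-processing.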
