asterixdb-dev mailing list archives

From Wail Alkowaileet <wael....@gmail.com>
Subject Re: json vs. JSON
Date Wed, 05 Aug 2015 19:23:01 GMT
Hey Chris,

I agree with most of what you suggested, but I have some points on the
spatial part:
1- A line with *start* and *end* attributes gives it a sense of "direction",
which means {"start": [x1,y1], "end": [x2,y2]} != {"start": [x2,y2], "end":
[x1,y1]}.

2- Similarly for rectangles, it seems there is some direction on the
lines forming the rectangle.

* I would suggest something similar to JTS (Java Topology Suite), where
there are some generalizations; for instance, a *rectangle* is-a *polygon*
defined by two points.
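
A minimal sketch of both points, in plain Python rather than the JTS API. The
helper names (same_undirected_line, rectangle_to_polygon) are hypothetical and
only illustrate the idea; the coordinates are the ones from the example query
quoted below.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def same_undirected_line(a: Dict[str, list], b: Dict[str, list]) -> bool:
    """Compare two {"start", "end"} lines while ignoring their direction."""
    return {tuple(a["start"]), tuple(a["end"])} == {tuple(b["start"]), tuple(b["end"])}


def rectangle_to_polygon(p1: Point, p2: Point) -> List[Point]:
    """Expand two opposite corners into the four corner points of a polygon.

    Corners are listed counter-clockwise starting at the lower-left, so the
    result does not depend on which corner was given first.
    """
    (x1, y1), (x2, y2) = p1, p2
    min_x, max_x = min(x1, x2), max(x1, x2)
    min_y, max_y = min(y1, y2), max(y1, y2)
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]


# The same segment written in both directions.
line_ab = {"start": [10.1, 11.1], "end": [10.2, 11.2]}
line_ba = {"start": [10.2, 11.2], "end": [10.1, 11.1]}

# Plain object comparison treats the two as different lines.
directed_equal = line_ab == line_ba

# An order-insensitive comparison treats them as the same line.
undirected_equal = same_undirected_line(line_ab, line_ba)

# The rectangle from the example query, generalized to a polygon.
rect_polygon = rectangle_to_polygon((5.1, 11.8), (87.6, 15.6548))
```

With named "start"/"end" keys, a consumer comparing the JSON objects directly
would see the two lines as different, whereas a rectangle expressed as a
polygon carries no such ordering between its two defining corners.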


On Wed, Aug 5, 2015 at 4:30 AM, Chris Hillery <chillery@hillery.land> wrote:

> Sure, I think that shouldn't be too hard, given some help with the
> questions I raised.
>
> To start the discussion, I wrote a query that outputs all ADM types to show
> how they are serialized to JSON (except the interval types, which throw a
> NotImplementedException if you try to serialize them to JSON currently) :
>
> { "string": string("Nancy"),
>   "float": 32.5f,
>   "double" : double("-2013.5938237483274"),
>   "boolean" : true,
>   "int8": int8("125"),
>   "int16": int16("32765"),
>   "int32": int32("294967295"),
>   "int64": int64("1700000000000000000"),
>   "unorderedList": {{"reading","writing"}},
>   "orderedList": ["Brad","Scott"],
>   "record": {  "number": 8389, "street": "Hill St.", "city": "Mountain View" },
>   "date": date("-2011-01-27"),
>   "time": time("12:20:30Z"),
>   "datetime": datetime("-1951-12-27T12:20:30"),
>   "duration": duration("P10Y11M12DT10H50M30S"),
>   "location2d": point("41.00,44.00"),
>   "location3d": point3d("44.00,13.00,41.00"),
>   "line" : line("10.1,11.1 10.2,11.2"),
>   "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
>   "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
>   "circle" : circle("10.1,11.1 10.2"),
>   "binary" : hex("ABCDEF0123456789"),
>   "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
>
> And here is how that gets serialized in "lossless JSON":
>
> { "string": "Nancy",
>   "float": 32.5,
>   "double": -2013.5938237483274,
>   "boolean": true,
>   "int8": { "int8": 125 },
>   "int16": { "int16": 32765 },
>   "int32": { "int32": 294967295 },
>   "int64": { "int64": 1700000000000000000 },
>   "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
>   "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
>   "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city": "Mountain View" },
>   "date": { "date": -125625945600000},
>   "time": { "time": 44430000},
>   "datetime": { "datetime": -123703587570000},
>   "duration": { "duration": { "months": 131, "millis": 1075830000} },
>   "location2d": { "point": [41.0, 44.0] },
>   "location3d": { "point3d": [44.0, 13.0, 41.0] },
>   "line": { "line":  [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] } ] },
>   "rectangle": { "rectangle": [{ "point": [5.1, 11.8] }, { "point": [87.6, 15.6548] } ] },
>   "polygon": { "polygon": [{ "point": [1.2, 1.3] },{ "point": [2.1, 2.5] },{ "point": [3.5, 3.6] },{ "point": [4.6, 4.8] }] },
>   "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
>   "binary": hex("ABCDEF0123456789"),
>   "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
> }
>
> Some observations and proposals:
>
> 1. The "JSON" serializations of the hex() and uuid() types are still broken
> (not even valid JSON).
>
> 2. IMHO the string, float, double, boolean, and record types are already
> serialized the way you would want in "clean JSON".
>
> 3. IMHO orderedList and unorderedList should be serialized as simple JSON
> arrays in "clean JSON".
>
> 4. The serializations of date, time, datetime, and duration, while valid
> JSON, are not very useful. It would be better if they were serialized as
> canonical date, time, or dateTime forms from XML Schema. In "clean JSON"
> they would be serialized simply as strings with that value. In "lossless
> JSON" they would be serialized as records as shown here, but with a string
> value, eg. { "date" : "-2011-01-27" }.
>
> 5. The int8/int16/int32/int64 types should be serialized as
> straight JSON numbers in "clean JSON".
>
> 6. Interval types should be supported. I am open to suggestions as to how
> best to represent them in both "clean JSON" and "lossless JSON".
>
> 7. I'm really not sure what the best serialization of the spatial types
> would be in "clean JSON", but as a strawman, how about serializing all
> points as simple arrays of JSON numbers? Then line, rectangle, and polygon
> could either be an array of arrays, or else objects with names like
> "start"/"end" for line and rectangle and "point1", "point2", etc. for
> polygon. Circle, I think, should always be an object with the names
> "center" and "radius". So, in "clean JSON", the last few lines of the above
> query results would look like this:
>
>   "location2d" : [41.0, 44.0],
>   "location3d" : [44.0, 13.0, 41.0],
>   "line" : [ [10.1, 11.1], [10.2, 11.2] ],
>   "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
>   "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
>   "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
>
> or like this:
>
>   "location2d" : [41.0, 44.0],
>   "location3d" : [44.0, 13.0, 41.0],
>   "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
>   "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
>   "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" : [3.5, 3.6], "point4" : [4.6, 4.8] },
>   "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
>
> My preference would probably be the latter, just so that "circle" doesn't
> seem like such an odd duck and "line" and "rectangle" don't become
> ambiguous.
>
> (Aside: I think the current serialization of "circle" is broken; it seems
> to be scrambling the radius and the point values.)
>
> So there are a number of actions here even with the existing code, in
> addition to supporting the new clean JSON output. I also found some issues
> with the current AQL implementation and doc:
>
> A. ADM allows numeric serializations like 5550d for double and 12i8 for
> int8, but those are not valid in AQL it seems.
>
> B. AQL doesn't seem to have any constructors for intervals; you can only
> create them via functions like interval-from-date().
>
> (A) and (B) both basically mean that not all valid ADM can be read as AQL,
> which seems like it would be a desirable goal.
>
> C. The ADM doc doesn't mention the "point3d" type.
>
>
> Now accepting any input on the above, as well as the other issue about how
> to select this form of output via the HTTP interface!
>
> Ceej
> aka Chris Hillery
>
> On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com>
> wrote:
>
> > Hi Chris,
> >
> > Actually, it would be great if you can fix this since, as you mentioned,
> > you have touched this part of the code.
> > Please confirm.
> >
> > Cheers,
> > Sattam
> >
> > On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu> wrote:
> >
> > > I could take a look at this as well - it would be a natural extension of
> > > the work I did earlier to clean up the existing JSON output. It probably
> > > wouldn't be very difficult to do this in a relatively "dumb" way, but there
> > > also is some amount of duplicated code between the various output formats
> > > and it would be tempting to try and tidy that up a bit as well.
> > >
> > > Three issues need to be addressed regardless of who does it or how:
> > >
> > > 1. We'd need to decide how to "strip down" all ADM types. In most numeric
> > > cases it's pretty clear. For spatial types, it deserves a little bit of
> > > thought. (It may be that the current "lossless" form is concise enough. For
> > > example, the ADM instance { "foo" : point("5,5") } gets rendered in JSON as
> > > { "foo" : { "point" : [5.0, 5.0] } } . Is there something that would be
> > > better?)
> > >
> > > 2. How would the user select this format vs. the current JSON form? When
> > > using the HTTP interface, the main way to select the returned serialization
> > > is via the HTTP Accept: header, and you select the "lossless JSON" form
> > > with the MIME type application/json. If we have two different JSON
> > > serializations, we'd need to invent a new MIME type, or introduce some kind
> > > of additional flag, or something.
> > >
> > > 3. When using the HTTP interface, the current lossless JSON is in fact the
> > > default output type. Should that remain the case, or should the "lossy"
> > > JSON type be preferred?
> > >
> > > Ceej
> > > aka Chris Hillery
> > >
> > > On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
> > >
> > > > Cool.  Sattam + Wail are going to sign up to do this, I believe!  (They
> > > > want/need it first....)
> > > >
> > > >
> > > > On 8/1/15 9:38 AM, Till Westmann wrote:
> > > >
> > > >> Only a few thoughts:
> > > >> 1) Yes, we should definitely have that!
> > > > >> 2) For the non-numeric extended atomic types we should find a reasonable
> > > > >> string serialization and we need to provide functions to parse that
> > > > >> serialization back to the extended atomic type (and I think that we already
> > > > >> have that e.g. for the datetime types).
> > > > >> 3) I think that we already had that discussion a few times (I remember
> > > > >> arguing for it when I first joined the project) and it's time to do it :)
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
> > > >>>
> > > > >>> Hey - our JSON output format is currently designed to be non-lossy, in
> > > > >>> the sense that it encodes all the details of the source types (since ADM is
> > > > >>> JSON++ and there's quite a bit in that ++ section).  We really also need an
> > > > >>> option for "normal application users" that's lossy but produces the kind of
> > > > >>> JSON that would be expected by consuming applications that "don't
> > > > >>> appreciate" the many different kinds of numeric data, the existence of
> > > > >>> spatial data, etc.  I.e., it'd be nice to have a default lossy
> > > > >>> serialization into JSON as well....  (Note that if someone doesn't want to
> > > > >>> suffer the loss, they can always do their own out-conversions of the data
> > > > >>> in the return section of their AQL query to bridge the gap.)  Thoughts?
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>



-- 

*Regards,*
Wail Alkowaileet
