asterixdb-dev mailing list archives

From Chris Hillery <chill...@hillery.land>
Subject Re: json vs. JSON
Date Wed, 05 Aug 2015 08:30:35 GMT
Sure, I think that shouldn't be too hard, given some help with the
questions I raised.

To start the discussion, I wrote a query that outputs all ADM types to show
how they are serialized to JSON (except the interval types, which currently
throw a NotImplementedException when you try to serialize them to JSON):

{ "string": string("Nancy"),
  "float": 32.5f,
  "double" : double("-2013.5938237483274"),
  "boolean" : true,
  "int8": int8("125"),
  "int16": int16("32765"),
  "int32": int32("294967295"),
  "int64": int64("1700000000000000000"),
  "unorderedList": {{"reading","writing"}},
  "orderedList": ["Brad","Scott"],
  "record": {  "number": 8389, "street": "Hill St.", "city": "Mountain
View" },
  "date": date("-2011-01-27"),
  "time": time("12:20:30Z"),
  "datetime": datetime("-1951-12-27T12:20:30"),
  "duration": duration("P10Y11M12DT10H50M30S"),
  "location2d": point("41.00,44.00"),
  "location3d": point3d("44.00,13.00,41.00"),
  "line" : line("10.1,11.1 10.2,11.2"),
  "rectangle" : rectangle("5.1,11.8 87.6,15.6548"),
  "polygon" : polygon("1.2,1.3 2.1,2.5 3.5,3.6 4.6,4.8"),
  "circle" : circle("10.1,11.1 10.2"),
  "binary" : hex("ABCDEF0123456789"),
 "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
}

And here is how that gets serialized in "lossless JSON":

{ "string": "Nancy",
  "float": 32.5,
  "double": -2013.5938237483274,
  "boolean": true,
  "int8": { "int8": 125 },
  "int16": { "int16": 32765 },
  "int32": { "int32": 294967295 },
  "int64": { "int64": 1700000000000000000 },
  "unorderedList": { "unorderedlist": [ "reading", "writing" ] },
  "orderedList": { "orderedlist": [ "Brad", "Scott" ] },
  "record": { "number": { "int64": 8389 }, "street": "Hill St.", "city":
"Mountain View" },
  "date": { "date": -125625945600000},
  "time": { "time": 44430000},
  "datetime": { "datetime": -123703587570000},
  "duration": { "duration": { "months": 131, "millis": 1075830000} },
  "location2d": { "point": [41.0, 44.0] },
  "location3d": { "point3d": [44.0, 13.0, 41.0] },
  "line": { "line":  [ { "point": [10.1, 11.1] }, { "point": [10.2, 11.2] }
] },
  "rectangle": { "rectangle": [{ "point": [5.1, 11.8] }, { "point": [87.6,
15.6548] } ] },
  "polygon": { "polygon": [{ "point": [1.2, 1.3] },{ "point": [2.1, 2.5]
},{ "point": [3.5, 3.6] },{ "point": [4.6, 4.8] }] },
  "circle": { "circle": [10.1, { "point": [11.1, 10.2] } ] },
  "binary": hex("ABCDEF0123456789"),
  "uuid": uuid("5c848e5c-6b6a-498f-8452-8847a2957421")
}
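As a cross-check on the temporal values above, here is a minimal Python sketch that decodes them back into canonical strings. It rests on an assumption I inferred from the numbers, not from the AsterixDB source: date and datetime are milliseconds since 1970-01-01T00:00:00Z, time is milliseconds since midnight (treated as UTC, hence the trailing Z), and duration is a (months, millis) pair. Under those assumptions every value decodes back to the query input. The helper names are hypothetical. civil_from_days is Howard Hinnant's proleptic Gregorian day-count algorithm, which also handles negative years such as -2011.

```python
# Sketch only: decode the millisecond-based temporal values shown in the
# "lossless JSON" output into canonical strings. Assumes (inferred from the
# example numbers) date/datetime = ms since 1970-01-01T00:00:00Z, time = ms
# since midnight UTC, duration = (months, millis); helper names are hypothetical.
from typing import Tuple

MS_PER_DAY = 86_400_000


def civil_from_days(days: int) -> Tuple[int, int, int]:
    """(year, month, day) in the proleptic Gregorian calendar for a day count
    relative to 1970-01-01 (Howard Hinnant's algorithm; negative years OK)."""
    z = days + 719_468                      # shift epoch to 0000-03-01
    era = z // 146_097                      # 400-year eras (floor division)
    doe = z - era * 146_097                 # day of era, 0..146096
    yoe = (doe - doe // 1460 + doe // 36_524 - doe // 146_096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)   # day of March-based year
    mp = (5 * doy + 2) // 153               # month index, March = 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def _ymd(days: int) -> str:
    y, m, d = civil_from_days(days)
    year = f"-{-y:04d}" if y < 0 else f"{y:04d}"    # ISO 8601 signed year
    return f"{year}-{m:02d}-{d:02d}"


def _clock(ms: int) -> str:
    secs, frac = divmod(ms, 1000)
    mins, s = divmod(secs, 60)
    h, m = divmod(mins, 60)
    return f"{h:02d}:{m:02d}:{s:02d}" + (f".{frac:03d}" if frac else "")


def date_str(ms: int) -> str:
    return _ymd(ms // MS_PER_DAY)


def time_str(ms: int) -> str:
    return _clock(ms) + "Z"


def datetime_str(ms: int) -> str:
    days, rem = divmod(ms, MS_PER_DAY)      # floor: rem is always >= 0
    return f"{_ymd(days)}T{_clock(rem)}Z"


def duration_str(months: int, millis: int) -> str:
    """ISO 8601 duration; assumes both components are non-negative."""
    y, mo = divmod(months, 12)
    d, rem = divmod(millis, MS_PER_DAY)
    secs, frac = divmod(rem, 1000)
    mins, s = divmod(secs, 60)
    h, m = divmod(mins, 60)
    date_part = "".join(f"{v}{u}" for v, u in ((y, "Y"), (mo, "M"), (d, "D")) if v)
    time_part = "".join(f"{v}{u}" for v, u in ((h, "H"), (m, "M")) if v)
    if frac:
        time_part += f"{s}.{frac:03d}S"
    elif s:
        time_part += f"{s}S"
    if not (date_part or time_part):
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


print(date_str(-125625945600000))
print(time_str(44430000))
print(datetime_str(-123703587570000))
print(duration_str(131, 1075830000))
```

Against the output above, these print -2011-01-27, 12:20:30Z, -1951-12-27T12:20:30Z, and P10Y11M12DT10H50M30S. Those match the query input, with UTC marked explicitly.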

Some observations and proposals:

1. The "JSON" serializations of the hex() and uuid() types are still broken
(they are not even valid JSON).

2. IMHO the string, float, double, boolean, and record types are already
serialized the way you would want in "clean JSON".

3. IMHO orderedList and unorderedList should be serialized as simple JSON
arrays in "clean JSON".

4. The serializations of date, time, datetime, and duration, while valid
JSON, are not very useful: they are raw millisecond counts (plus a month
count for duration) that a consumer has to decode. It would be better if
they were serialized as the canonical date, time, dateTime, or duration
forms from XML Schema. In "clean JSON" they would be serialized simply as
strings with that value. In "lossless JSON" they would be serialized as
records as shown here, but with a string value, e.g.
{ "date" : "-2011-01-27" }.

5. Values of the int8/int16/int32/int64 types should be serialized as plain
JSON numbers in "clean JSON".

6. Interval types should be supported. I am open to suggestions as to how
best to represent them in both "clean JSON" and "lossless JSON".

7. I'm really not sure what the best serialization of the spatial types
would be in "clean JSON", but as a strawman, how about serializing all
points as simple arrays of JSON numbers? Then line, rectangle, and polygon
could either be an array of arrays, or else objects with names like
"start"/"end" for line and rectangle and "point1", "point2", etc. for
polygon. Circle, I think, should always be an object with the names
"center" and "radius". So, in "clean JSON", the last few lines of the above
query results would look like this:

  "location2d" : [41.0, 44.0],
  "location3d" : [44.0, 13.0, 41.0],
  "line" : [ [10.1, 11.1], [10.2, 11.2] ],
  "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
  "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
  "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },

or like this:

  "location2d" : [41.0, 44.0],
  "location3d" : [44.0, 13.0, 41.0],
  "line" : { "start" : [10.1, 11.1], "end" : [10.2, 11.2] },
  "rectangle" : { "start" : [5.1, 11.8], "end" : [87.6, 15.6548] },
  "polygon" : { "point1" : [1.2, 1.3], "point2" : [2.1, 2.5], "point3" :
[3.5, 3.6], "point4" : [4.6, 4.8] },
  "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },

My preference would probably be the latter, just so that "circle" doesn't
seem like such an odd duck and "line" and "rectangle" don't become
ambiguous.

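(Aside: I think the current serialization of "circle" is broken; it seems
to scramble the radius and the point values. circle("10.1,11.1 10.2")
should have center (10.1, 11.1) and radius 10.2, but the output above reads
as radius 10.1 and center (11.1, 10.2).)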
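Proposals 3, 5, and 7 (using the second spatial variant) can be sketched as a single recursive conversion. The Python below is only an illustration. The classes are hypothetical stand-ins for ADM values, not AsterixDB's Java types. The circle is built as center (10.1, 11.1) with radius 10.2, which is how circle("10.1,11.1 10.2") reads in ADM.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Tuple

# Hypothetical stand-ins for ADM values that plain JSON cannot express.


@dataclass
class TypedInt:          # int8 / int16 / int32 / int64
    adm_type: str
    value: int


@dataclass
class UnorderedList:     # {{ ... }}
    items: List[Any]


@dataclass
class Point:             # point (2 coordinates) or point3d (3 coordinates)
    coords: Tuple[float, ...]


@dataclass
class Line:
    start: Point
    end: Point


@dataclass
class Rectangle:
    start: Point
    end: Point


@dataclass
class Polygon:
    points: List[Point]


@dataclass
class Circle:
    center: Point
    radius: float


def clean_json(value: Any) -> Any:
    """Lossy conversion to plain JSON-compatible Python values."""
    if isinstance(value, TypedInt):
        return value.value                             # proposal 5
    if isinstance(value, UnorderedList):
        return [clean_json(v) for v in value.items]    # proposal 3
    if isinstance(value, list):                        # ordered list
        return [clean_json(v) for v in value]
    if isinstance(value, dict):                        # record
        return {k: clean_json(v) for k, v in value.items()}
    if isinstance(value, Point):
        return list(value.coords)
    if isinstance(value, (Line, Rectangle)):
        return {"start": clean_json(value.start), "end": clean_json(value.end)}
    if isinstance(value, Polygon):
        return {f"point{i}": clean_json(p) for i, p in enumerate(value.points, 1)}
    if isinstance(value, Circle):
        return {"radius": value.radius, "center": clean_json(value.center)}
    return value          # string, float, double, boolean pass through


record = {
    "int8": TypedInt("int8", 125),
    "unorderedList": UnorderedList(["reading", "writing"]),
    "location3d": Point((44.0, 13.0, 41.0)),
    "line": Line(Point((10.1, 11.1)), Point((10.2, 11.2))),
    "circle": Circle(Point((10.1, 11.1)), 10.2),
}
print(json.dumps(clean_json(record)))
```

Polygon keys are generated from vertex position, so the same branch handles any number of points. Strings, floats, doubles, booleans, and records pass through unchanged, per point 2.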

So there are a number of things to fix in the existing code, in addition
to supporting the new clean JSON output. I also found some issues with the
current AQL implementation and docs:

A. ADM allows numeric serializations like 5550d for double and 12i8 for
int8, but those do not seem to be valid in AQL.

B. AQL doesn't seem to have any constructors for intervals; you can only
create them via functions like interval-from-date().

(A) and (B) both basically mean that not all valid ADM can be read as AQL,
even though that seems like a desirable goal.

C. The ADM doc doesn't mention the "point3d" type.
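A small literal classifier makes the gap in point (A) concrete. This Python sketch is hypothetical. The i8 and d suffixes come from the examples above, and f comes from the query itself. The i16/i32/i64 suffixes and the defaults for unsuffixed literals are assumptions: integers default to int64, mirroring how 8389 is tagged in the lossless output, and everything else defaults to double.

```python
import re
from typing import Tuple, Union

# Assumed suffix set: i8/i16/i32/i64 for integers, f for float, d for double.
_LITERAL = re.compile(
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(i8|i16|i32|i64|f|d)?")
_INTEGER = re.compile(r"[+-]?\d+")
_INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def parse_adm_number(text: str) -> Tuple[str, Union[int, float]]:
    """Return (adm_type, value) for a literal such as '12i8' or '5550d'."""
    match = _LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    digits, suffix = match.groups()

    if suffix in _INT_BITS:
        if not _INTEGER.fullmatch(digits):
            raise ValueError(f"{suffix} needs an integer: {text!r}")
        bits = _INT_BITS[suffix]
        value = int(digits)
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if not low <= value <= high:
            raise ValueError(f"{value} out of range for int{bits}")
        return f"int{bits}", value
    if suffix == "f":
        return "float", float(digits)
    if suffix == "d":
        return "double", float(digits)
    # Unsuffixed (assumption): integers become int64, anything else double.
    if _INTEGER.fullmatch(digits):
        return "int64", int(digits)
    return "double", float(digits)


for literal in ["12i8", "5550d", "32.5f", "8389", "-2013.5938237483274"]:
    print(literal, "->", parse_adm_number(literal))
```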


Now accepting any input on the above, as well as on the other issue of how
to select this form of output via the HTTP interface!
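One option for the HTTP question, floated in the quoted thread, is an additional flag on the existing MIME type. The simplified Python Accept-header parser below sketches how the server side might negotiate that. The lossless=true/false media-type parameter is purely hypothetical and is not something AsterixDB supports. Falling back to lossless mirrors the current default. A real implementation would follow the full RFC 7231 rules, for example for wildcard specificity.

```python
from typing import Dict, List, Optional, Tuple

JSON_RANGES = ("application/json", "application/*", "*/*")


def parse_accept(header: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Split an Accept header into (media_range, params, q), best q first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";") if p.strip()]
        if not pieces:
            continue
        params: Dict[str, str] = {}
        q = 1.0
        for piece in pieces[1:]:
            key, _, val = piece.partition("=")
            key, val = key.strip().lower(), val.strip().strip('"').lower()
            if key == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
            else:
                params[key] = val
        entries.append((pieces[0].lower(), params, q))
    entries.sort(key=lambda e: -e[2])   # stable: ties keep header order
    return entries


def choose_json_form(accept: Optional[str], default: str = "lossless") -> Optional[str]:
    """Return "lossless", "clean", or None if the client accepts no JSON."""
    if not accept:
        return default
    for media, params, q in parse_accept(accept):
        if q <= 0 or media not in JSON_RANGES:
            continue
        flag = params.get("lossless")          # hypothetical parameter
        if flag == "true":
            return "lossless"
        if flag == "false":
            return "clean"
        return default
    return None


print(choose_json_form("application/json; lossless=false"))
print(choose_json_form("text/html;q=0.9, application/json;q=0.5"))
```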

Ceej
aka Chris Hillery

On Wed, Aug 5, 2015 at 12:28 AM, Sattam Alsubaiee <salsubaiee@gmail.com> wrote:

> Hi Chris,
>
> Actually, it would be great if you could fix this since, as you mentioned,
> you have touched this part of the code.
> Please confirm.
>
> Cheers,
> Sattam
>
> On Wed, Aug 5, 2015 at 10:23 AM, Chris Hillery <chillery@lambda.nu> wrote:
>
> > I could take a look at this as well - it would be a natural extension of
> > the work I did earlier to clean up the existing JSON output. It probably
> > wouldn't be very difficult to do this in a relatively "dumb" way, but
> > there also is some amount of duplicated code between the various output
> > formats and it would be tempting to try and tidy that up a bit as well.
> >
> > Three issues need to be addressed regardless of who does it or how:
> >
> > 1. We'd need to decide how to "strip down" all ADM types. In most numeric
> > cases it's pretty clear. For spatial types, it deserves a little bit of
> > thought. (It may be that the current "lossless" form is concise enough.
> > For example, the ADM instance { "foo" : point("5,5") } gets rendered in
> > JSON as { "foo" : { "point" : [5.0, 5.0] } }. Is there something that
> > would be better?)
> >
> > 2. How would the user select this format vs. the current JSON form? When
> > using the HTTP interface, the main way to select the returned
> > serialization is via the HTTP Accept: header, and you select the
> > "lossless JSON" form with the MIME type application/json. If we have two
> > different JSON serializations, we'd need to invent a new MIME type, or
> > introduce some kind of additional flag, or something.
> >
> > 3. When using the HTTP interface, the current lossless JSON is in fact
> > the default output type. Should that remain the case, or should the
> > "lossy" JSON type be preferred?
> >
> > Ceej
> > aka Chris Hillery
> >
> > On Wed, Aug 5, 2015 at 12:05 AM, Mike Carey <dtabass@gmail.com> wrote:
> >
> > > Cool.  Sattam + Wail are going to sign up to do this, I believe!
> > > (They want/need it first....)
> > >
> > >
> > > On 8/1/15 9:38 AM, Till Westmann wrote:
> > >
> > >> Only a few thoughts:
> > >> 1) Yes, we should definitely have that!
> > >> 2) For the non-numeric extended atomic types we should find a
> > >> reasonable string serialization and we need to provide functions to
> > >> parse that serialization back to the extended atomic type (and I think
> > >> that we already have that e.g. for the datetime types).
> > >> 3) I think that we already had that discussion a few times (I remember
> > >> arguing for it when I first joined the project) and it’s time to do
> > >> it :)
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Aug 1, 2015, at 9:17 AM, Mike Carey <dtabass@gmail.com> wrote:
> > >>>
> > >>> Hey - our JSON output format is currently designed to be non-lossy,
> > >>> in the sense that it encodes all the details of the source types
> > >>> (since ADM is JSON++ and there's quite a bit in that ++ section).  We
> > >>> really also need an option for "normal application users" that's
> > >>> lossy but produces the kind of JSON that would be expected by
> > >>> consuming applications that "don't appreciate" the many different
> > >>> kinds of numeric data, the existence of spatial data, etc.  I.e.,
> > >>> it'd be nice to have a default lossy serialization into JSON as
> > >>> well....  (Note that if someone doesn't want to suffer the loss, they
> > >>> can always do their own out-conversions of the data in the return
> > >>> section of their AQL query to bridge the gap.)  Thoughts?
> > >>>
> > >>
> > >>
> > >
> >
>
