asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: json vs. JSON
Date Thu, 13 Aug 2015 05:26:50 GMT

On 12 Aug 2015, at 10:15, Chris Hillery wrote:

> So I don't think we've reached anything like a consensus here regarding
> spatial types. I'll restate my opinion that coercing them into any of the
> complex specifications that have been mentioned (GeoJSON, arcgis, even
> Well-Known Text) is inappropriate for serialization. Also, most of those
> specifications are even more complex than the "lossless JSON" serialization
> we already have, so would be doubly inappropriate for our "clean JSON"
> variant.
>
> That's what I don't think should be done, so what do I think should be
> done? Well, the whole purpose of this exercise as initially suggested by
> Mike was to allow a form of JSON output that was more like what someone
> consuming this *as JSON* would expect. To me that means that the format
> should be something that (a) is useful for downstream JSON tools, while (b)
> being as simply-structured as possible. Also, a non-goal in my mind is that
> this output format be able to be returned to its original ADM form; it is
> explicitly "lossy" in that sense.
>
> Till suggested that the rule should be that all atomic ADM types got
> serialized as atomic JSON, generally by creating a string representation of
> the data. That works nicely for numerics as well as things that are
> basically strings anyway, such as UUID and Hex. It also suggests an obvious
> way to handle date/time/duration types since there is something of a global
> standard string representation for those.
>
> However, upon thinking about it, I don't think that's the simplest nor the
> most useful way for us to represent spatial types. The best we could do
> there would be something not entirely unlike a dramatic subset of
> Well-Known Text, eg. "POINT (30 10)". While that arguably meets criterion
> (b) above, it definitely doesn't meet (a) since any downstream
> JSON-accepting tool is going to have to do non-JSON string processing to
> extract the actual meaning. I also come back again to the problem that
> Circle cannot be represented unless we create a non-standard extension to
> WKT.

I really would like to get to a consistent set of rules on how we 
serialize ADM instances to JSON.
My proposal for those rules is:

1) structures are represented by JSON structures (objects and arrays)
2) values are represented by JSON values (string, number)
3) types that are not numeric are represented by a widely supported 
string representation.
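
The three rules above can be sketched as a small recursive converter. This
is only an illustration, not AsterixDB code: the function name
to_clean_json, the sample record, and the choice of ISO 8601 and the
hyphenated UUID form as the "widely supported" string representations are
all assumptions.

```python
import datetime
import json
import uuid

def to_clean_json(value):
    """Hypothetical helper: map a Python value to JSON-ready data."""
    # Rule 1: structures become JSON structures (objects and arrays).
    if isinstance(value, dict):
        return {str(k): to_clean_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_clean_json(v) for v in value]
    # Rule 2: numbers and strings are already JSON values.
    # (Passing None through as null is an assumption of this sketch.)
    if value is None or isinstance(value, (int, float, str)):
        return value
    # Rule 3: other types become a widely supported string form.
    if isinstance(value, (datetime.date, datetime.time)):
        return value.isoformat()  # ISO 8601; also covers datetime.datetime
    return str(value)  # e.g. uuid.UUID -> hyphenated hex string

record = {
    "id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "when": datetime.date(2015, 8, 13),
    "n": 3,
    "tags": ["a", 1.5],
}
clean = to_clean_json(record)
print(json.dumps(clean))
```
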

And I think that those rules make sense. When consuming some JSON, the 
JSON parser natively supports the JSON structures and values. And if 
someone works in a specific domain (e.g. spatial) they probably have a 
parser for the widely supported string representation that they can use 
to parse the string value that they got from the JSON parser.
If we invent our own structured representation, we might make things a 
little easier for people who manually craft their application for the 
first time, but we make it harder for people who are already working in 
the domain and want to use AsterixDB to store their data.
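
A minimal sketch of that two-step consumption, assuming the spatial value
arrives as a Well-Known Text string. The record, the field name "location"
and the helper parse_wkt_point are hypothetical; someone working in the
spatial domain would normally hand the string to their existing WKT parser
rather than to a hand-written regular expression like this one.

```python
import json
import re

# Hypothetical record; the field name and WKT encoding are assumptions,
# not actual AsterixDB output.
doc = '{"id": 7, "location": "POINT (30 10)"}'

_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)

def parse_wkt_point(text):
    """Stand-in for a real WKT parser; handles 2D POINT only."""
    match = _POINT.match(text)
    if match is None:
        raise ValueError("not a 2D WKT POINT: %r" % text)
    return float(match.group(1)), float(match.group(2))

# Step 1: the JSON parser handles structure and plain values.
record = json.loads(doc)
# Step 2: the domain parser handles the string it got from step 1.
x, y = parse_wkt_point(record["location"])
print(x, y)
```
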

Also, if our support for spatial types differs significantly from the 
"usual" support, we should consider whether we are doing the right thing. 
I think that we don't want to tell people dealing with spatial data how 
to do it. I'd like to support them by providing the right infrastructure.

Unfortunately I don't really have the right expertise on the subject and 
if nobody else in the project has it, I think that we should at least 
try to find it somewhere else.
Maybe we can find someone in the Apache SIS project 
(http://sis.apache.org).
Looking at their PMC, Chris Mattmann is on the roster, so he might be 
able to tell us or to point us to the right people.

> After considering the various things that have been discussed, I've gotta
> be honest: I still like my original proposal the best. It's a concise but
> usable consolidation of the data represented in ADM, which best I can tell
> is what we're looking to implement.
>
> "location2d" : [41.0, 44.0],
> "location3d" : [44.0, 13.0, 41.0],
> "line" : [ [10.1, 11.1], [10.2, 11.2] ],
> "rectangle" : [ [5.1, 11.8], [87.6, 15.6548] ],
> "polygon" : [ [1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8] ],
> "circle" : { "radius" : 10.1, "center" : [ 11.1, 10.2 ] },
>
> I'm not entirely happy that circle gets rendered as an object; something
> like "circle": [ [11.1, 10.2], 10.1 ] could work too. Or, if necessary,
> all shapes (not points) could be rendered as objects as per my secondary
> proposal.

The thing about this format is that it's really difficult to see (for 
humans or parsers) what spatial types are represented by these nested 
arrays.
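
A short sketch of that ambiguity, using the "line", "rectangle" and
"polygon" values from the quoted proposal. The signature helper is
hypothetical; it only records the nesting and the Python type of each leaf,
which is all a generic consumer can see without the field name.

```python
import json

doc = json.loads("""
{
  "line":      [[10.1, 11.1], [10.2, 11.2]],
  "rectangle": [[5.1, 11.8], [87.6, 15.6548]],
  "polygon":   [[1.2, 1.3], [2.1, 2.5], [3.5, 3.6], [4.6, 4.8]]
}
""")

def signature(value):
    """Hypothetical helper: describe only the shape of a JSON value."""
    if isinstance(value, list):
        return [signature(v) for v in value]
    return type(value).__name__

# A line and a rectangle are both "two pairs of numbers", so nothing in
# the value itself says which spatial type was meant.
same = signature(doc["line"]) == signature(doc["rectangle"])
print(same)
```
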

My 2c,
Till

>
> On Fri, Aug 7, 2015 at 11:08 PM, Mike Carey <dtabass@gmail.com> wrote:
>
>> I am willing to retract my proposal... :-)
>> (Consider it retracted; I agree with Ted Dunning's comment, and similar
>> comments by others.)
>>
>>
>> On 8/7/15 10:35 PM, Chen Li wrote:
>>
>>> In today's weekly meeting Mike mentioned the idea of getting rid of
>>> the "circle" data type.  It will be good to have a F2F discussion
>>> before we make the final decision.
>>>
>>> Chen
>>>
>>
>>
