incubator-drill-user mailing list archives

From Luca Morandini <lmorand...@ieee.org>
Subject Re: Querying GeoJSON
Date Mon, 11 Aug 2014 02:16:01 GMT
On 11/08/14 02:52, Timothy Chen wrote:
> Hi Luca,
>
> Currently Drill supports the same format MongoDB uses to import JSON records,
> in which each JSON object is separated by a newline,
>
> { "type": "Feature", "geometry".....}
> { "type": "Feature", "geometry".....}

I see: this is one of the ways mongoimport works (the other being with an array of 
Objects, hence proper JSON).
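
To make the two layouts concrete, here is a minimal Python sketch that accepts either one. The function name load_records and the sample records are my own illustration; this is not how Drill or mongoimport actually read files.

```python
import json


def load_records(text):
    """Return a list of objects from a JSON array or from newline-delimited JSON.

    Hypothetical helper, for illustration only.
    """
    stripped = text.strip()
    if stripped.startswith("["):
        # Proper JSON: a single top-level array of objects.
        return json.loads(stripped)
    # One JSON object per line; blank lines are skipped.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


# The same two (made-up) records in both layouts.
ndjson_text = '{"type": "Feature", "id": 1}\n{"type": "Feature", "id": 2}\n'
array_text = '[{"type": "Feature", "id": 1}, {"type": "Feature", "id": 2}]'

assert load_records(ndjson_text) == load_records(array_text)
```

Note that the sketch assumes each object fits on a single line in the newline-delimited case; a pretty-printed object spanning several lines would need a streaming parser instead.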


> It's possible we can extend our options like MongoDB does
> (http://zaiste.net/2012/08/importing_json_into_mongodb/)
>
> either expecting it in an array or comma separated and read the JSON
> files through some added options.

Indeed.


> I see Yash already filed a JIRA, feel free to contribute as well.

I think it is worth considering the wider picture here.

A JSON document, at its top level, is either an Array or an Object (well, 
according to http://tools.ietf.org/html/rfc7158 it can be any other value as well, 
but this is beside the point I suppose); I think it would be safe to assume Drill 
equates an Object (not an Array) with a tuple, and an Array with a vector of 
elements having the same type.

The problem with this is defining what a tuple is:
1) Shall {"total_rows":2,"offset":0,"rows":[{"id":1}, {"id":2}]} be considered a 
tuple, or a table-like structure containing 2 tuples (incidentally, this is what 
a query to CouchDB would return)?
2) Can Arrays be heterogeneous (in JSON nothing prevents that)?
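
The ambiguity in point 1 can be sketched in a few lines of Python. Both helper names, and the choice of "rows" as the key holding the tuples, are assumptions made for illustration; nothing here reflects what Drill does.

```python
import json

doc = json.loads('{"total_rows":2,"offset":0,"rows":[{"id":1}, {"id":2}]}')


def as_single_tuple(document):
    # Interpretation A: the whole Object is one tuple with three fields.
    return [document]


def as_table(document, rows_key="rows"):
    # Interpretation B: the Object is an envelope and the Array under
    # rows_key (an assumed convention) holds the actual tuples.
    return list(document[rows_key])


one_tuple = as_single_tuple(doc)
two_tuples = as_table(doc)
```

Under interpretation A a query sees a single row with the fields total_rows, offset and rows; under interpretation B it sees two rows, each with a single id field. The reader cannot choose between them without being told which key, if any, carries the rows.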

To simplify things, Drill may adopt a subset of JSON, with homogeneous Arrays, and 
accept a finite number of input file formats... but this has to be explicitly stated.
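
As a rough sketch of what a "homogeneous Arrays" restriction could look like, the Python below classifies values by JSON type and rejects mixed Arrays. The function names are hypothetical, and treating all numbers as one type is my assumption; a real subset definition would have to decide that explicitly.

```python
def json_type(value):
    """Map a decoded Python value to the name of its JSON type."""
    if value is None:
        return "null"
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError("not a JSON value: %r" % (value,))


def is_homogeneous(array):
    # An Array is homogeneous when all its elements share one JSON type;
    # an empty Array is accepted trivially.
    return len({json_type(element) for element in array}) <= 1
```

This only checks the top-level type of each element: two Objects with different fields still count as the same type here, which is one more thing the subset definition would need to state.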

Regards,

Luca Morandini
Data Architect - AURIN project
Melbourne eResearch Group
Department of Computing and Information Systems
University of Melbourne
Tel. +61 03 903 58 380
Skype: lmorandini

