drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Querying relatively big GeoJSON data
Date Mon, 11 Aug 2014 02:04:22 GMT
On Sun, Aug 10, 2014 at 5:29 PM, Luca Morandini <lmorandini@ieee.org> wrote:

> On 11/08/14 09:21, Ted Dunning wrote:
>
>> This looks like a single JSON object with lots of internal structure on a
>> single 250MB line.
>>
>
> As it should be: by definition, a JSON document is one Object (or an
> Array, which is an Object too).
>

Well, it is very common to have an envelope format which is not JSON.  This
allows many independent JSON objects to be stored in a file.
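
As a rough illustration of such an envelope, here is a minimal Python sketch that assumes the simplest convention, one JSON object per line. The function name iter_records and the sample data are made up for the example; they are not taken from Drill or from the data set under discussion.

```python
import io
import json

def iter_records(stream):
    """Yield one parsed JSON value per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)

# Hypothetical sample: two independent JSON objects, one per line.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
records = list(iter_records(sample))
```

The file as a whole is not a valid JSON document, but each line is, so a reader can process records one at a time.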

Typically, there are different technologies used to handle objects at
different levels of nesting.  It is not useful to store a database as a
single object ... better to have records which are themselves composed of
fields which can be objects.  How you get to that point is a bit of an open
question, but it is clear that the world consists of many objects and there
is little mileage in thinking of the universe using the same representation
as the molecule.


>
>> Without some external un-nesting Drill in the 0.4 preliminary release isn't
>> going to be able to do much with this.
>>
>
> From another thread I gathered Drill's JSON support is ongoing: no
> reproach meant.
>

Indeed.

But I expect that the data that you have here can be profitably analyzed by
Drill already by allowing this bit of un-nesting and not being quite so
doctrinaire about JSON purity.

What kind and degree of un-nesting makes sense for your data is something
that you really would have to be the judge of.
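
One possible form of un-nesting, sketched in Python under the assumption that the file is a GeoJSON FeatureCollection whose top-level "features" array holds the records of interest (the actual layout of your data may differ). The function name unnest_features and the sample collection are hypothetical.

```python
import json

def unnest_features(collection):
    """Yield one compact JSON string per feature of a FeatureCollection."""
    for feature in collection.get("features", []):
        # json.dumps escapes newlines inside strings, so each record
        # occupies exactly one line when written out.
        yield json.dumps(feature, separators=(",", ":"))

# Hypothetical sample standing in for the real 250MB document.
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "a"},
         "geometry": {"type": "Point", "coordinates": [144.9, -37.8]}},
        {"type": "Feature",
         "properties": {"name": "b"},
         "geometry": {"type": "Point", "coordinates": [151.2, -33.9]}},
    ],
}
lines = list(unnest_features(collection))
```

Writing "\n".join(lines) to a file gives one feature per line. For the real file, json.load would first have to parse the whole document in memory, which may or may not be acceptable at 250MB.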
