drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject RE: Convert CSV to nested JSON
Date Tue, 19 Sep 2017 00:14:24 GMT
What we really need is a list aggregator.

That would make this a snap.
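
To illustrate what a list aggregator would do here, the following is a minimal sketch in plain Python, not Drill SQL and not an existing Drill feature. It takes a few rows from the quoted query result below, already ordered by region, and collects the nations of each region into a list. The function name `nest_by_region` and the output field `nations` are hypothetical names chosen for illustration.

```python
import json
from itertools import groupby
from operator import itemgetter

# A few rows copied from the quoted query output, ordered by r_NAME, n_NAME.
rows = [
    {"r_NAME": "AFRICA", "n_NAME": "ALGERIA", "r_REGIONKEY": 0, "n_NATIONKEY": 0},
    {"r_NAME": "AFRICA", "n_NAME": "ETHIOPIA", "r_REGIONKEY": 0, "n_NATIONKEY": 5},
    {"r_NAME": "AMERICA", "n_NAME": "ARGENTINA", "r_REGIONKEY": 1, "n_NATIONKEY": 1},
    {"r_NAME": "AMERICA", "n_NAME": "BRAZIL", "r_REGIONKEY": 1, "n_NATIONKEY": 2},
]


def nest_by_region(ordered_rows):
    """Yield one document per region (hypothetical helper).

    Assumes the rows arrive sorted by region, as the ORDER BY in the
    quoted query guarantees, so each region can be emitted as soon as
    its last row has been seen.
    """
    region_of = itemgetter("r_REGIONKEY", "r_NAME")
    for (region_key, region_name), group in groupby(ordered_rows, key=region_of):
        yield {
            "r_REGIONKEY": region_key,
            "r_NAME": region_name,
            # "nations" is an illustrative field name, not from the thread.
            "nations": [
                {"n_NATIONKEY": row["n_NATIONKEY"], "n_NAME": row["n_NAME"]}
                for row in group
            ],
        }


docs = list(nest_by_region(rows))
for doc in docs:
    print(json.dumps(doc))
```

Because `groupby` only merges adjacent rows with the same key, the sketch depends on the input being ordered by region; on unordered input the same region would be emitted more than once.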

On Sep 18, 2017 4:38 PM, "Kunal Khatua" <kkhatua@mapr.com> wrote:

> I've been looking at a way to use existing benchmarks converted into a
> complex json document.
>
> Take, for example, the TPC-H benchmark, which has PKey-FKey relations.
>
> So for a JSON output for a query like this:
> 0: jdbc:drill:schme=dfs.tpchDri1000> select  r.r_NAME, n.n_NAME
> . . . . . . . . . . . . . . . . . .>   , r.r_REGIONKEY
> . . . . . . . . . . . . . . . . . .>   , n.n_NATIONKEY
> . . . . . . . . . . . . . . . . . .>   , n.n_REGIONKEY
> . . . . . . . . . . . . . . . . . .> from nation n,region r
> . . . . . . . . . . . . . . . . . .> where n.n_regionkey = r.r_regionkey
> . . . . . . . . . . . . . . . . . .> order by r.r_NAME, n.n_NAME;
> +--------------+-----------------+--------------+--------------+--------------+
> |    r_NAME    |     n_NAME      | r_REGIONKEY  | n_NATIONKEY  | n_REGIONKEY  |
> +--------------+-----------------+--------------+--------------+--------------+
> | AFRICA       | ALGERIA         | 0            | 0            | 0            |
> | AFRICA       | ETHIOPIA        | 0            | 5            | 0            |
> | AFRICA       | KENYA           | 0            | 14           | 0            |
> | AFRICA       | MOROCCO         | 0            | 15           | 0            |
> | AFRICA       | MOZAMBIQUE      | 0            | 16           | 0            |
> | AMERICA      | ARGENTINA       | 1            | 1            | 1            |
> | AMERICA      | BRAZIL          | 1            | 2            | 1            |
> | AMERICA      | CANADA          | 1            | 3            | 1            |
> | AMERICA      | PERU            | 1            | 17           | 1            |
> | AMERICA      | UNITED STATES   | 1            | 24           | 1            |
> | ASIA         | CHINA           | 2            | 18           | 2            |
> | ASIA         | INDIA           | 2            | 8            | 2            |
> | ASIA         | INDONESIA       | 2            | 9            | 2            |
> | ASIA         | JAPAN           | 2            | 12           | 2            |
> | ASIA         | VIETNAM         | 2            | 21           | 2            |
> | EUROPE       | FRANCE          | 3            | 6            | 3            |
> | EUROPE       | GERMANY         | 3            | 7            | 3            |
> | EUROPE       | ROMANIA         | 3            | 19           | 3            |
> | EUROPE       | RUSSIA          | 3            | 22           | 3            |
> | EUROPE       | UNITED KINGDOM  | 3            | 23           | 3            |
> | MIDDLE EAST  | EGYPT           | 4            | 4            | 4            |
> | MIDDLE EAST  | IRAN            | 4            | 10           | 4            |
> | MIDDLE EAST  | IRAQ            | 4            | 11           | 4            |
> | MIDDLE EAST  | JORDAN          | 4            | 13           | 4            |
> | MIDDLE EAST  | SAUDI ARABIA    | 4            | 20           | 4            |
> +--------------+-----------------+--------------+--------------+--------------+
> 25 rows selected (0.519 seconds)
>
> I'm wondering if I could get, say, 5 documents representing the 5 regions
> and the nested structure within that representing the countries.
>
> Not the best use case, I agree... but to distil it down to a simple
> question: is there value in having a series of simple steps that would
> reverse the way a JSON doc can be "flattened" to a CSV format?
>
> It can't be as simple as just using an un-flatten operator, but close
> enough. For example, I could define the nesting based on the ORDER BY
> operator, so that the final writer can stream through the output and
> create the nested document accordingly.
>
> Just wondering about the value of something like this.
>
>
> -----Original Message-----
> From: rahul challapalli [mailto:challapallirahul@gmail.com]
> Sent: Monday, September 18, 2017 4:02 PM
> To: dev <dev@drill.apache.org>
> Subject: Re: Convert CSV to nested JSON
>
> Can you give an example? Converting CSV into nested JSON does not make
> sense to me.
>
> On Mon, Sep 18, 2017 at 3:54 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > What is the ultimate purpose here?
> >
> >
> >
> > On Mon, Sep 18, 2017 at 3:21 PM, Kunal Khatua <kkhatua@mapr.com> wrote:
> >
> > > I'm curious about whether there are any implementations of
> > > converting CSV to a nested JSON format  "automagically".
> > >
> > > Within Drill, I know that the CTAS route will basically convert each
> > > row into a JSON document with depth=1, which is pretty much an obese
> > > CSV data format.
> > >
> > > Is it worth having something like this, or is it too hard a problem
> > > that it's best that users explicitly define and write the documents?
> > >
> > > ~ Kunal
> > >
> > >
> >
>
