drill-dev mailing list archives

From Kunal Khatua <kkha...@mapr.com>
Subject RE: Convert CSV to nested JSON
Date Mon, 18 Sep 2017 23:37:53 GMT
I've been looking for a way to convert existing benchmark data into complex JSON documents.

Take, for example, the TPC-H benchmark, whose tables have primary-key/foreign-key (PKey-FKey) relations.

Consider the JSON output for a query like this one:
0: jdbc:drill:schme=dfs.tpchDri1000> select  r.r_NAME, n.n_NAME 
. . . . . . . . . . . . . . . . . .>   , r.r_REGIONKEY 
. . . . . . . . . . . . . . . . . .>   , n.n_NATIONKEY 
. . . . . . . . . . . . . . . . . .>   , n.n_REGIONKEY 
. . . . . . . . . . . . . . . . . .> from nation n,region r 
. . . . . . . . . . . . . . . . . .> where n.n_regionkey = r.r_regionkey
. . . . . . . . . . . . . . . . . .> order by r.r_NAME, n.n_NAME;
+--------------+-----------------+--------------+--------------+--------------+
|    r_NAME    |     n_NAME      | r_REGIONKEY  | n_NATIONKEY  | n_REGIONKEY  |
+--------------+-----------------+--------------+--------------+--------------+
| AFRICA       | ALGERIA         | 0            | 0            | 0            |
| AFRICA       | ETHIOPIA        | 0            | 5            | 0            |
| AFRICA       | KENYA           | 0            | 14           | 0            |
| AFRICA       | MOROCCO         | 0            | 15           | 0            |
| AFRICA       | MOZAMBIQUE      | 0            | 16           | 0            |
| AMERICA      | ARGENTINA       | 1            | 1            | 1            |
| AMERICA      | BRAZIL          | 1            | 2            | 1            |
| AMERICA      | CANADA          | 1            | 3            | 1            |
| AMERICA      | PERU            | 1            | 17           | 1            |
| AMERICA      | UNITED STATES   | 1            | 24           | 1            |
| ASIA         | CHINA           | 2            | 18           | 2            |
| ASIA         | INDIA           | 2            | 8            | 2            |
| ASIA         | INDONESIA       | 2            | 9            | 2            |
| ASIA         | JAPAN           | 2            | 12           | 2            |
| ASIA         | VIETNAM         | 2            | 21           | 2            |
| EUROPE       | FRANCE          | 3            | 6            | 3            |
| EUROPE       | GERMANY         | 3            | 7            | 3            |
| EUROPE       | ROMANIA         | 3            | 19           | 3            |
| EUROPE       | RUSSIA          | 3            | 22           | 3            |
| EUROPE       | UNITED KINGDOM  | 3            | 23           | 3            |
| MIDDLE EAST  | EGYPT           | 4            | 4            | 4            |
| MIDDLE EAST  | IRAN            | 4            | 10           | 4            |
| MIDDLE EAST  | IRAQ            | 4            | 11           | 4            |
| MIDDLE EAST  | JORDAN          | 4            | 13           | 4            |
| MIDDLE EAST  | SAUDI ARABIA    | 4            | 20           | 4            |
+--------------+-----------------+--------------+--------------+--------------+
25 rows selected (0.519 seconds)

I'm wondering whether I could get, say, 5 documents representing the 5 regions, each containing
a nested structure that represents the nations in that region.

Not the best use case, I agree, but to distill it down to a simple question: is there value
in having a series of simple steps that reverses the way a JSON document can be "flattened"
into CSV format?
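To make the "flattening" direction concrete, here is a minimal Python sketch, loosely analogous to what Drill's FLATTEN does to a repeated field but not Drill's implementation. It takes hypothetical region documents with a `nations` array and emits one flat row per nation, copying the region fields onto every row. That is the same redundancy the 25-row result above carries. The function name `flatten_nested` and the `nations` key are assumptions for illustration.

```python
from typing import Any


def flatten_nested(docs: list[dict[str, Any]], child_key: str) -> list[dict[str, Any]]:
    """Expand each parent document into one flat row per element of doc[child_key].

    Parent fields are repeated on every row, which is the redundancy a
    CSV representation of nested data carries. A parent with no children
    produces no rows (like an inner join).
    """
    rows = []
    for doc in docs:
        # All fields of the parent except the nested array.
        parent = {k: v for k, v in doc.items() if k != child_key}
        for child in doc.get(child_key, []):
            rows.append({**parent, **child})
    return rows


# Hypothetical nested documents built from a few rows of the result above.
region_docs = [
    {"r_NAME": "AFRICA", "r_REGIONKEY": 0,
     "nations": [{"n_NAME": "ALGERIA", "n_NATIONKEY": 0},
                 {"n_NAME": "ETHIOPIA", "n_NATIONKEY": 5}]},
    {"r_NAME": "AMERICA", "r_REGIONKEY": 1,
     "nations": [{"n_NAME": "ARGENTINA", "n_NATIONKEY": 1}]},
]

for row in flatten_nested(region_docs, "nations"):
    print(row)
```

The question above amounts to asking for the inverse of this function.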

It can't be as simple as applying an un-flatten operator, but it could come close. For example,
I could define the nesting through the ORDER BY clause, so that the final writer can stream
through the sorted output and build each nested document accordingly.
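The ORDER BY idea can be sketched as a streaming group-by. If rows arrive sorted on the parent columns, the writer only has to notice when the grouping key changes. At that point it closes the current nested document and starts the next one, so it never holds more than one region in memory. Below is a minimal Python sketch under those assumptions. `unflatten_sorted`, its parameters, and the `nations` key are hypothetical names, not a Drill API. The input rows are the first three from the result above.

```python
import json
from collections.abc import Iterable, Iterator
from itertools import groupby
from operator import itemgetter
from typing import Any


def unflatten_sorted(rows: Iterable[dict[str, Any]], parent_keys: list[str],
                     child_key: str) -> Iterator[dict[str, Any]]:
    """Stream rows sorted on parent_keys and yield one nested document per group.

    parent_keys must be non-empty, and the input must already be ordered so
    that rows sharing the same parent values are adjacent.
    """
    key_fn = itemgetter(*parent_keys)
    for _, group in groupby(rows, key=key_fn):
        members = list(group)  # only the current group is materialized
        # Parent fields are identical across the group; take them from the first row.
        doc = {k: members[0][k] for k in parent_keys}
        # Every remaining column becomes a field of a nested child object.
        doc[child_key] = [
            {k: v for k, v in row.items() if k not in parent_keys} for row in members
        ]
        yield doc


rows = [
    {"r_NAME": "AFRICA", "n_NAME": "ALGERIA", "r_REGIONKEY": 0, "n_NATIONKEY": 0, "n_REGIONKEY": 0},
    {"r_NAME": "AFRICA", "n_NAME": "ETHIOPIA", "r_REGIONKEY": 0, "n_NATIONKEY": 5, "n_REGIONKEY": 0},
    {"r_NAME": "AMERICA", "n_NAME": "ARGENTINA", "r_REGIONKEY": 1, "n_NATIONKEY": 1, "n_REGIONKEY": 1},
]

for doc in unflatten_sorted(rows, ["r_NAME", "r_REGIONKEY"], "nations"):
    print(json.dumps(doc))
```

The design depends on the sort. `itertools.groupby` only merges adjacent rows with equal keys, so the parent columns must lead the ORDER BY. Here `r_NAME` determines `r_REGIONKEY`, so ordering by `r_NAME, n_NAME` is enough. Unsorted input would silently produce several documents for the same region.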


Just wondering about the value of something like this.


-----Original Message-----
From: rahul challapalli [mailto:challapallirahul@gmail.com] 
Sent: Monday, September 18, 2017 4:02 PM
To: dev <dev@drill.apache.org>
Subject: Re: Convert CSV to nested JSON

Can you give an example? Converting CSV into nested JSON does not make sense to me.

On Mon, Sep 18, 2017 at 3:54 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> What is the ultimate purpose here?
>
>
>
> On Mon, Sep 18, 2017 at 3:21 PM, Kunal Khatua <kkhatua@mapr.com> wrote:
>
> > I'm curious whether there are any implementations that
> > convert CSV to a nested JSON format "automagically".
> >
> > Within Drill, I know that the CTAS route will basically convert each 
> > row into a JSON document with depth=1, which is pretty much an obese 
> > CSV data format.
> >
> > Is it worth having something like this, or is it so hard a problem
> > that it's best left to users to explicitly define and write the documents?
> >
> > ~ Kunal
> >
> >
>