drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2783) Abbreviated Flatten syntax for multi-level nested elements
Date Tue, 05 May 2015 01:02:06 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jason Altekruse updated DRILL-2783:
-----------------------------------
    Description: 
The flatten function in Drill works great for any level of nesting, however the applying nested
flatten can complicate query syntax as the # of nested levels increase due to the need to
use subqueries, especially when the structure contains maps of repeated lists.
{code}
Consider JSON below
 {
   "Id":0,
   "level1":[
      {
         "Id1":1,
         "level2":[
            {
               "Id2":2,
               "level3":[
                  {
                     "Id3":3,
                     "Num":1
                  }
               ]
            }
         ]
      }
   ]
}
{code}

Below is a query we need to write to flatten the 3 levels.

select count(t2.td.id3) from 
         (select flatten(t1.tp.level3) as td from 
                  (select flatten(t.ts.level2) as tp from 
                      (select flatten(level1) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`)
as t) as t1) as t2;

An abbreviated/intuitive syntax to apply flatten on maps with arrays would make working with
this kind of data much easier. 
For ex: 
The above query could potentially be rewritten as below.
select flatten(flatten(flatten(level1).level2).level3) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`

  was:
The flatten function in Drill works great for any level of nesting, however the applying nested
flatten can complicate query syntax as the # of nested levels increase due to the need to
use subqueries, especially when the structure contains maps of repeated lists.

Consider JSON below
 {
   "Id":0,
   "level1":[
      {
         "Id1":1,
         "level2":[
            {
               "Id2":2,
               "level3":[
                  {
                     "Id3":3,
                     "Num":1
                  }
               ]
            }
         ]
      }
   ]
}

Below is a query we need to write to flatten the 3 levels.

select count(t2.td.id3) from 
         (select flatten(t1.tp.level3) as td from 
                  (select flatten(t.ts.level2) as tp from 
                      (select flatten(level1) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`)
as t) as t1) as t2;

An abbreviated/intuitive syntax to apply flatten on maps with arrays would make working with
this kind of data much easier. 
For ex: 
The above query could potentially be rewritten as below.
select flatten(flatten(flatten(level1).level2).level3) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`


> Abbreviated Flatten syntax for multi-level nested elements
> ----------------------------------------------------------
>
>                 Key: DRILL-2783
>                 URL: https://issues.apache.org/jira/browse/DRILL-2783
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 0.7.0
>            Reporter: Neeraja
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> The flatten function in Drill works great for any level of nesting, however the applying
nested flatten can complicate query syntax as the # of nested levels increase due to the need
to use subqueries, especially when the structure contains maps of repeated lists.
> {code}
> Consider JSON below
>  {
>    "Id":0,
>    "level1":[
>       {
>          "Id1":1,
>          "level2":[
>             {
>                "Id2":2,
>                "level3":[
>                   {
>                      "Id3":3,
>                      "Num":1
>                   }
>                ]
>             }
>          ]
>       }
>    ]
> }
> {code}
> Below is a query we need to write to flatten the 3 levels.
> select count(t2.td.id3) from 
>          (select flatten(t1.tp.level3) as td from 
>                   (select flatten(t.ts.level2) as tp from 
>                       (select flatten(level1) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`)
as t) as t1) as t2;
> An abbreviated/intuitive syntax to apply flatten on maps with arrays would make working
with this kind of data much easier. 
> For ex: 
> The above query could potentially be rewritten as below.
> select flatten(flatten(flatten(level1).level2).level3) as ts from    dfs.`default`.`/Users/nrentachintala/Downloads/flattenex.json`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message