drill-issues mailing list archives

From "Sudheesh Katkam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4102) Only one row found in a JSON document that contains multiple items.
Date Wed, 18 Nov 2015 05:31:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010276#comment-15010276 ]

Sudheesh Katkam commented on DRILL-4102:
----------------------------------------

Drill supports JSON files of a [certain format|https://drill.apache.org/docs/json-data-model/#reading-json].
Nesting the existing top-level object under a single key (here, "keys") allows for the queries that you might be interested in:
{code}
{
  "keys" :
  {
    "Key1":
      {
        "htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
      },
    "Key2":
      {
        "htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"
      },
    "Key3":
      {
        "htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"
      }
  }
}
{code}
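
The change to the file can also be scripted. The following is a minimal sketch in Python, not part of Drill; the helper name `wrap_under_key` is hypothetical, and the sample values are shortened stand-ins for the reporter's data:

```python
import json


def wrap_under_key(document: dict, key: str = "keys") -> dict:
    """Return a new object that holds the original top-level map under one key."""
    return {key: document}


# Shortened stand-in for the reporter's file (assumption: same shape, less text).
original = {
    "Key1": {"htmltags": "<htmltag attr1='bravo' />"},
    "Key2": {"htmltags": "<htmltag attr1='kilo' />"},
    "Key3": {"htmltags": "<htmltag attr1='november' />"},
}

wrapped = wrap_under_key(original)

# Serialize the wrapped document; this text is what would be written to the file.
print(json.dumps(wrapped, indent=2))
```

For a file on disk, the same helper would sit between `json.load` on the original file and `json.dump` to the new one.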
Queries:
{code}
> select kvgen(keys) from dfs.`/root/data.json`;
+--------+
| EXPR$0 |
+--------+
| [{"key":"Key1","value":{"htmltags":"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"}},{"key":"Key2","value":{"htmltags":"<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"}},{"key":"Key3","value":{"htmltags":"<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}}] |
+--------+

> select flatten(kvgen(keys)) from dfs.`/root/data.json`;
+--------------------------------------------------------------------------------------------------------------------------+
|                                                          EXPR$0                                                          |
+--------------------------------------------------------------------------------------------------------------------------+
| {"key":"Key1","value":{"htmltags":"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"}}      |
| {"key":"Key2","value":{"htmltags":"<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"}}           |
| {"key":"Key3","value":{"htmltags":"<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}}  |
+--------------------------------------------------------------------------------------------------------------------------+

> select t.r.key, t.r.`value` from (select flatten(kvgen(keys)) as r from dfs.`/root/data.json`) t;
+---------+---------------------------------------------------------------------------------------------------+
| EXPR$0  |                                              EXPR$1                                               |
+---------+---------------------------------------------------------------------------------------------------+
| Key1    | {"htmltags":"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"}      |
| Key2    | {"htmltags":"<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"}           |
| Key3    | {"htmltags":"<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}  |
+---------+---------------------------------------------------------------------------------------------------+
{code}
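
To make the row counts in the output above easier to follow, here is a rough sketch in plain Python of what the two functions appear to do to this data. It is an illustration only, not Drill's implementation; the Python functions `kvgen` and `flatten` are stand-ins named after the SQL functions, and the sample values are shortened:

```python
def kvgen(mapping):
    """Turn a map into a list of {"key": ..., "value": ...} pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows, column):
    """Emit one output row per element of the list stored in `column`."""
    out = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)        # copy the other columns unchanged
            new_row[column] = element  # replace the list with a single element
            out.append(new_row)
    return out


# Shortened stand-in for the wrapped file (assumption: same shape, less text).
record = {
    "keys": {
        "Key1": {"htmltags": "<htmltag attr1='bravo' />"},
        "Key2": {"htmltags": "<htmltag attr1='kilo' />"},
        "Key3": {"htmltags": "<htmltag attr1='november' />"},
    }
}

rows = [{"r": kvgen(record["keys"])}]  # one row holding a list of three pairs
flat = flatten(rows, "r")              # three rows, one pair each
pairs = [(row["r"]["key"], row["r"]["value"]) for row in flat]

print(len(rows), len(flat))
for key, value in pairs:
    print(key, value)
```

The single record becomes one row with a list of pairs after the `kvgen` step, and the `flatten` step splits that list into one row per key, which matches the progression of the three queries.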

> Only one row found in a JSON document that contains multiple items.
> -------------------------------------------------------------------
>
>                 Key: DRILL-4102
>                 URL: https://issues.apache.org/jira/browse/DRILL-4102
>             Project: Apache Drill
>          Issue Type: Bug
>         Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
>            Reporter: aditya menon
>
> I tried to analyse a JSON file that had the following (sample) structure:
> {code:json}
> {
>     "Key1": {
>       "htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
>     },
>     "Key2": {
>       "htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"
>     },
>     "Key3": {
>       "htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"
>     }
> }
> {code}
> (Apologies for the obfuscation, I am unable to publish the original dataset. But the structure is exactly the same. Note especially how the keys and other data points *differ* in some places, and remain identical in others.)
> When I run a {code:sql}SELECT * FROM DataFile.json{code} what I get is a single row listed under three columns: {code:html}"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"{code} [i.e., only the entry `Key1.htmltags`].
> Ideally, I should see three rows, each with entries from Key1..Key3, listed under the correct respective column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
