drill-dev mailing list archives

From "aditya menon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4102) Only one row found in a JSON document that contains multiple items.
Date Tue, 17 Nov 2015 12:44:11 GMT
aditya menon created DRILL-4102:
-----------------------------------

             Summary: Only one row found in a JSON document that contains multiple items.
                 Key: DRILL-4102
                 URL: https://issues.apache.org/jira/browse/DRILL-4102
             Project: Apache Drill
          Issue Type: Bug
         Environment: OS X, Drill embedded, v1.1.0 installed via HomeBrew
            Reporter: aditya menon


I tried to analyse a JSON file that had the following (sample) structure:

```
{
    "Key1": {
      "htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
    },
    "Key2": {
      "htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"
    },
    "Key3": {
      "htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"
    }
}
```

(Apologies for the obfuscation; I am unable to publish the original dataset, but the structure
is exactly the same. Note especially how the keys and other data points *differ* in some places
and remain identical in others.)

When I run `SELECT * FROM DataFile.json`, I get a single row listed under three columns:
`"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"`
(i.e., only the entry `Key1.htmltags`).

Ideally, I should see three rows, one for each of Key1 through Key3, with every value listed
under its corresponding column.
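
To make the expected shape concrete, here is a minimal sketch in Python, run outside Drill, that reshapes the sample document into one record per top-level key. It is an illustration only, not a fix for the behaviour reported here. The helper names `to_records` and `to_json_lines` and the added `key` field are hypothetical. The sketch also assumes that a file with one JSON object per line would be read as one row per object, which has not been verified against v1.1.0.

```python
import json

# The obfuscated sample from this report, as JSON text.
SAMPLE = """
{
    "Key1": {"htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"},
    "Key2": {"htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"},
    "Key3": {"htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}
}
"""


def to_records(document):
    """Yield one flat record per top-level key of the document.

    The "key" field name is an assumption made for this sketch.
    """
    for key, value in document.items():
        record = {"key": key}
        record.update(value)  # copy the nested fields, e.g. "htmltags"
        yield record


def to_json_lines(document):
    """Serialise the records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in to_records(document))


if __name__ == "__main__":
    print(to_json_lines(json.loads(SAMPLE)))
```

Each output line then carries a `key` and an `htmltags` field, which is the three-row, two-column layout described above.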



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
