drill-issues mailing list archives

From "Luca Morandini (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1279) Drill does not provide a way to unflatten a single large json record into sub records
Date Mon, 11 Aug 2014 22:10:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093418#comment-14093418 ]

Luca Morandini commented on DRILL-1279:
---------------------------------------

As a user, my main sources of JSON data come in the following formats:
# GeoJSON (http://geojson.org/geojson-spec.html), which is a single Object with some metadata properties and all the tuples in an Array:
{code}
{ "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
        "properties": {"prop0": "value0"}
        },
      { "type": "Feature",
        "geometry": {
          "type": "LineString",
          "coordinates": [
            [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
            ]
          },
        "properties": {
          "prop0": "value0",
          "prop1": 0.0
          }
        },
      { "type": "Feature",
         "geometry": {
           "type": "Polygon",
           "coordinates": [
             [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
               [100.0, 1.0], [100.0, 0.0] ]
             ]
         },
         "properties": {
           "prop0": "value0",
           "prop1": {"this": "that"}
           }
         }
       ]
     }
{code}
# The output of CouchDB views (again, one big Object containing both metadata and an Array of tuples):
{code}
{"total_rows":5,"offset":0,"rows":[
{"id":"22222","key":["hello",0],"value":null,
  "doc":{"_id":"22222","_rev":"1-0eee81fecb5aa4f51e285c621271ff02","ancestors":["11111"],"value":"hello"}},
{"id":"22222","key":["hello",1],"value":{"_id":"11111"},
  "doc":{"_id":"11111","_rev":"1-967a00dff5e02add41819138abb3284d"}},
{"id":"33333","key":["world",0],"value":null,
  "doc":{"_id":"33333","_rev":"1-11e42b44fdb3d3784602eca7c0332a43","ancestors":["22222","11111"],"value":"world"}},
{"id":"33333","key":["world",1],"value":{"_id":"22222"},
  "doc":{"_id":"22222","_rev":"1-0eee81fecb5aa4f51e285c621271ff02","ancestors":["11111"],"value":"hello"}},
{"id":"33333","key":["world",2],"value":{"_id":"11111"},
  "doc":{"_id":"11111","_rev":"1-967a00dff5e02add41819138abb3284d"}}
]}
{code}
Please note the heterogeneous "key" Array, which mixes strings and numbers.
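
To make the requested "unflattening" concrete, here is a minimal Python sketch of the transformation as a preprocessing step outside Drill. It is not a Drill feature or API. The function names `unflatten` and `to_json_lines` and the `_meta` key are hypothetical, chosen only for illustration, and the sample is a shortened form of the CouchDB output above.

```python
import json


def unflatten(document, array_key):
    """Yield one sub-record per element of document[array_key].

    Assumes every element of the array is a JSON Object. The remaining
    top-level properties (the metadata) are attached to each sub-record
    under the hypothetical key "_meta".
    """
    meta = {k: v for k, v in document.items() if k != array_key}
    for element in document[array_key]:
        record = dict(element)   # shallow copy, so the input is not modified
        record["_meta"] = meta
        yield record


def to_json_lines(document, array_key):
    """Serialize the sub-records as newline-delimited JSON, one per line."""
    return "\n".join(json.dumps(r) for r in unflatten(document, array_key))


# Shortened version of the CouchDB view output shown above.
couch_view = json.loads(
    '{"total_rows":2,"offset":0,"rows":['
    '{"id":"22222","key":["hello",0],"value":null},'
    '{"id":"22222","key":["hello",1],"value":{"_id":"11111"}}'
    ']}'
)

records = list(unflatten(couch_view, "rows"))
print(to_json_lines(couch_view, "rows"))
```

The same call with `array_key="features"` would split a GeoJSON FeatureCollection into one record per Feature. Note that `json.loads` holds the whole document in memory, so a very large single-line file would likely need a streaming parser instead.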

> Drill does not provide a way to unflatten a single large json record into sub records
> -------------------------------------------------------------------------------------
>
>                 Key: DRILL-1279
>                 URL: https://issues.apache.org/jira/browse/DRILL-1279
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>            Reporter: Yash Sharma
>            Assignee: Neeraja
>             Fix For: Future
>
>
> Error while executing a query on Geo JSON data.
> {quote}
> select t.features[0].properties.name from dfs.`/opt/drill/sample-data/geo1.json` t;
> Query failed: Failure while running fragment. Resetting to invalid mark [41a66a4a-b8c2-4fc6-a7fc-f1c76e312f32]
> Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)
> {quote}
> Data can be located at:
> https://drive.google.com/file/d/0B7bWVX3BL3wrUGRKaHRyRTFLV2c/edit?usp=sharing
> The data is valid JSON (~250 MB) on a single line.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
