drill-issues mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2241) CTAS fails when writing a repeated list
Date Mon, 16 Mar 2015 23:34:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364216#comment-14364216 ]

Jason Altekruse commented on DRILL-2241:

I did not realize this when working on the Parquet reader/writer: Parquet doesn't support
a list nested directly inside another list. We need to change the schema to add a column
between the two lists (even if it is always called something like "inner_list").

In the Parquet file, the records would have to look something like this:

{ "a" : null, "b" : [ { "inner_list" : ["B1", "B2"] } ] }

When reading, we would have to translate this format back into a plain nested list, but only
if we know that we wrote the transformed data ourselves.
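
As a rough illustration of the round trip described above, here is a minimal Python sketch
that works on plain JSON-like values rather than on Drill's or Parquet's actual classes. The
function names and the INNER constant are hypothetical; only the "inner_list" field name comes
from the example in this comment.

```python
# Hypothetical wrapper field name, taken from the example record in this comment.
INNER = "inner_list"


def wrap_nested_lists(value):
    """Writer side: put a one-field record between a list and any list directly inside it."""
    if isinstance(value, dict):
        return {key: wrap_nested_lists(val) for key, val in value.items()}
    if isinstance(value, list):
        out = []
        for item in value:
            wrapped = wrap_nested_lists(item)
            if isinstance(item, list):
                # A list directly inside a list: insert the intermediate record.
                wrapped = {INNER: wrapped}
            out.append(wrapped)
        return out
    return value


def unwrap_nested_lists(value):
    """Reader side: undo wrap_nested_lists. Only safe on data known to have been wrapped."""
    if isinstance(value, dict):
        return {key: unwrap_nested_lists(val) for key, val in value.items()}
    if isinstance(value, list):
        out = []
        for item in value:
            if isinstance(item, dict) and set(item) == {INNER}:
                # A record holding only the wrapper field is treated as a wrapped list.
                out.append(unwrap_nested_lists(item[INNER]))
            else:
                out.append(unwrap_nested_lists(item))
        return out
    return value


record = {"a": None, "b": [["B1", "B2"]]}
stored = wrap_nested_lists(record)
restored = unwrap_nested_lists(stored)
```

Note that the reader side cannot tell a wrapper record from user data that genuinely contains a
list of records with a single "inner_list" field, which is why the reverse translation only
applies when we know the file was written with this transformation.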

To make this work seamlessly with JSON, this transformation would need to happen in the
background when writing and reading. Unfortunately, that would put us in the same position
as all of the other object models that have to map between their own representations and
the Parquet object model, so we will need to discuss the priority of this. For now, we may
need to just say this is unsupported in Parquet.

> CTAS fails when writing a repeated list
> ---------------------------------------
>                 Key: DRILL-2241
>                 URL: https://issues.apache.org/jira/browse/DRILL-2241
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.8.0
>            Reporter: Abhishek Girish
>            Assignee: Jason Altekruse
>            Priority: Blocker
>             Fix For: 0.9.0
>         Attachments: drillbit_replist.log
> Drill can read the following JSON file with a repeated list:
> {
>   "a" : null,
>   "b" : [ ["B1", "B2"] ]
> }
> Writing this to Parquet via a simple CTAS fails. 
> > create table temp as select * from `replist.json`;
> Log indicates this to be unsupported (UnsupportedOperationException: Unsupported type
> Log attached. 
