drill-issues mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1559) Writing to JSON from Parquet throws error when the Parquet file is created from JSON
Date Wed, 12 Nov 2014 00:51:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207439#comment-14207439 ]

Jason Altekruse commented on DRILL-1559:
----------------------------------------

https://reviews.apache.org/r/27891/

> Writing to JSON from Parquet throws error when the Parquet file is created from JSON
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-1559
>                 URL: https://issues.apache.org/jira/browse/DRILL-1559
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 0.6.0
>            Reporter: Abhishek Girish
>            Assignee: Chris Westin
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1559.patch
>
>
> Succeeds: 
> > alter session set `store.format` = 'parquet';
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | store.format updated. |
> +------------+------------+
> 1 row selected (0.038 seconds)
> > create table `yelp_academic_dataset_review_parquet` as select * from `yelp_academic_dataset_review.json`;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 1125458                   |
> +------------+---------------------------+
> 1 row selected (163.893 seconds)
> $ hadoop fs -ls /jsondata/yelp_academic_dataset_review_parquet
> Found 2 items
> -rwxr-xr-x   3 mapr mapr  535544902 2014-10-20 17:08 /jsondata/yelp_academic_dataset_review_parquet/0_0_0.parquet
> -rwxr-xr-x   3 mapr mapr   29696406 2014-10-20 17:09 /jsondata/yelp_academic_dataset_review_parquet/0_0_1.parquet
> Fails:
> > alter session set `store.format` = 'json';
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | store.format updated. |
> +------------+------------+
> 1 row selected (0.033 seconds)
> > create table `yelp_academic_dataset_review_json` as select * from yelp_academic_dataset_review_parquet;
> Query failed: Failure while running fragment. Schema is currently null.  You must call buildSchema(SelectionVectorMode) before this container can return a schema. [b96dc570-77f2-46db-b9e6-8215e2062b15]
> $ hadoop fs -ls /jsondata/yelp_academic_dataset_review_json
> Found 2 items
> -rwxr-xr-x   3 root root   54493549 2014-10-20 17:10 /jsondata/yelp_academic_dataset_review_json/1_0_0.json
> -rwxr-xr-x   3 mapr mapr   37305528 2014-10-20 17:10 /jsondata/yelp_academic_dataset_review_json/1_1_0.json
> Querying the newly created JSON file succeeds:
> > select * from yelp_academic_dataset_review_json limit 1;
> +------------+------------+------------+------------+------------+------------+------------+-------------+
> |   votes    |  user_id   | review_id  |   stars    |    date    |    text    |    type    | business_id |
> +------------+------------+------------+------------+------------+------------+------------+-------------+
> | {"funny":0,"useful":2,"cool":1} | Xqd0DzHaiyRqVH3WRG7hzg | 15SdjuK7DmYqUAj6rjGowg | 5          | 2007-05-17 | dr. goldberg offers everything i look for in a general practitioner |
> +------------+------------+------------+------------+------------+------------+------------+-------------+
> 1 row selected (0.078 seconds)
> LOG entry:
> 2014-10-20 17:10:47,785 [cbccfeb9-a235-4ea7-9bcc-56d35daf4827:frag:1:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error de3eb523-3924-4941-8cf4-eb7a71a2df2d: Failure while running fragment.
> java.lang.NullPointerException: Schema is currently null.  You must call buildSchema(SelectionVectorMode) before this container can return a schema.
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) ~[guava-14.0.1.jar:na]
>         at org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:220) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:115) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.getSchema(IteratorValidatorBatchIterator.java:74) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:101) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:104) ~[drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-SNAPSHOT-rebuffed.jar:0.6.0-incubating-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
