Date: Thu, 24 Jul 2014 18:16:39 +0000 (UTC)
From: "Amit Katti (JIRA)"
To: issues@drill.incubator.apache.org
Subject: [jira] [Updated] (DRILL-1058) Unable to read or write nested/repeated data in PARQUET format

[ https://issues.apache.org/jira/browse/DRILL-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Katti updated DRILL-1058:
------------------------------
Description:
=================================================
DRILL WRITING A PARQUET TABLE WITH NESTED DATA
=================================================
I have a JSON file with nested data (sample record below):
{"rownum":1,"name":"fred ovid","age":76,"gpa":1.55,"studentnum":692315658449,"create_time":"2014-05-27 00:26:07", "interests": [ "Reading", "Mountain Biking", "Hacking" ]}
I can read this JSON file from Drill and access the nested values. However, when I try to import this data and create a table in Parquet format, the query fails:
QUERY: create table test as select * from `/user/root/sample-data/nested_student.json`;
ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "3ce3dc1e-d920-4262-ae2d-28bd2d034597"
endpoint {
  address: "perfnode154.perf.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < ParquetEncodingException:[ error starting field interests at 6 ] < ClassCastException:[ parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO ]"
]
Error: exception while executing query (state=,code=0)
{code}
2014-06-24 00:41:18,646 [b10db58d-8d4d-4d02-9fb5-a5081e5cb254:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 48602de2-8306-47d2-875f-8ad2cd2e964a: Failure while running fragment.
java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.startField(MessageColumnIO.java:171) ~[parquet-column-1.5.0-20140513.004024-1.jar:na] at org.apache.drill.exec.store.ParquetOutputRecordWriter.addRepeatedVarCharHolder(ParquetOutputRecordWriter.java:761) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.EventBasedRecordWriter$RepeatedVarCharFieldWriter.writeField(EventBasedRecordWriter.java:1156) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:150) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:111) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:72) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:65) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:45) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:94) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:56) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:85) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:46) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:100) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
{code}
=================================================
DRILL READING A PARQUET TABLE WITH NESTED DATA
=================================================
I generated a Parquet file by loading the JSON file below into Pig and storing it in Parquet format:
{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
When I try to read this Parquet table in Drill, the query fails:
QUERY: Select * from `/user/root/complex.parquet`;
ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "c2e735f4-e11c-4e10-a410-959b3880dce0"
endpoint {
  address: "perfnode154.perf.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment.
< UnsupportedOperationException:[ unsupported type: BINARY LIST ]"
]
Error: exception while executing query (state=,code=0)
{code}
2014-07-23 22:16:45,239 [d106ad59-595f-42e7-880a-ef9f6bff1ff0:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Failure while initializing operator tree
java.lang.UnsupportedOperationException: unsupported type: BINARY LIST at org.apache.drill.exec.store.parquet.ParquetRecordReader.toMajorType(ParquetRecordReader.java:446) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRecordReader.setup(ParquetRecordReader.java:219) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:93) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:126) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:47) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:113)
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProducerConsumer(AbstractPhysicalVisitor.java:191) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.ProducerConsumer.accept(ProducerConsumer.java:42) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:59) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitStore(AbstractPhysicalVisitor.java:118) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:176) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:95) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:81) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:242) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
{code}
I verified that the file contains repeated data by dumping it with parquet-tools:
{code}
./parquet-tools dump badpigparquet row group 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- recipe: BINARY UNCOMPRESSED DO:0 FPO:4 SZ:85/85/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY ingredients: .bag: ..name: BINARY UNCOMPRESSED DO:0 FPO:89 SZ:120/120/1.00 VC:15 ENC:RLE,PLAIN_DICTIONARY inventor: .name: BINARY UNCOMPRESSED DO:0 FPO:209 SZ:74/74/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY .age: INT32 UNCOMPRESSED DO:0 FPO:283 SZ:64/64/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY recipe TV=6 RL=0 DL=1 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:9 VC:6 ingredients.bag.name TV=15 RL=1 DL=3 DS: 5 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:RLE
VLE:PLAIN_DICTIONARY SZ:21 VC:15 inventor.name TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 inventor.age TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 BINARY recipe ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:1 V:Tacos value 2: R:0 D:1 V:TomatoSoup value 3: R:0 D:1 V:Tacos value 4: R:0 D:1 V:TomatoSoup value 5: R:0 D:1 V:Tacos value 6: R:0 D:1 V:TomatoSoup BINARY ingredients.bag.name ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 15 *** value 1: R:0 D:3 V:Beef value 2: R:1 D:3 V:Lettuce value 3: R:1 D:3 V:Cheese value 4: R:0 D:3 V:Tomatoes value 5: R:1 D:3 V:Milk value 6: R:0 D:3 V:Beef value 7: R:1 D:3 V:Lettuce value 8: R:1 D:3 V:Cheese value 9: R:0 D:3 V:Tomatoes value 10: R:1 D:3 V:Milk value 11: R:0 D:3 V:Beef value 12: R:1 D:3 V:Lettuce value 13: R:1 D:3 V:Cheese value 14: R:0 D:3 V:Tomatoes value 15: R:1 D:3 V:Milk BINARY inventor.name ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:2 V:Alex value 2: R:0 D:2 V:Steve value 3: R:0 D:2 
V:Alex value 4: R:0 D:2 V:Steve value 5: R:0 D:2 V:Alex value 6: R:0 D:2 V:Steve INT32 inventor.age ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:2 V:25 value 2: R:0 D:2 V:23 value 3: R:0 D:2 V:25 value 4: R:0 D:2 V:23 value 5: R:0 D:2 V:25 value 6: R:0 D:2 V:23
{code}

was:
=================================================
DRILL WRITING A PARQUET TABLE WITH NESTED DATA
=================================================
I have a JSON file with nested data (sample record below):
{"rownum":1,"name":"fred ovid","age":76,"gpa":1.55,"studentnum":692315658449,"create_time":"2014-05-27 00:26:07", "interests": [ "Reading", "Mountain Biking", "Hacking" ]}
I can read this JSON file from Drill and access the nested values. However, when I try to import this data and create a table in Parquet format, the query fails:
QUERY: create table test as select * from `/user/root/sample-data/nested_student.json`;
ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "3ce3dc1e-d920-4262-ae2d-28bd2d034597"
endpoint {
  address: "perfnode154.perf.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < ParquetEncodingException:[ error starting field interests at 6 ] < ClassCastException:[ parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO ]"
]
Error: exception while executing query (state=,code=0)
{code}
2014-06-24 00:41:18,646 [b10db58d-8d4d-4d02-9fb5-a5081e5cb254:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 48602de2-8306-47d2-875f-8ad2cd2e964a: Failure while running fragment.
java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.startField(MessageColumnIO.java:171) ~[parquet-column-1.5.0-20140513.004024-1.jar:na] at org.apache.drill.exec.store.ParquetOutputRecordWriter.addRepeatedVarCharHolder(ParquetOutputRecordWriter.java:761) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.EventBasedRecordWriter$RepeatedVarCharFieldWriter.writeField(EventBasedRecordWriter.java:1156) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:150) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:111) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:72) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:65) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:45) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:94) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:56) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:85) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:46) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:100) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
{code}
=================================================
DRILL READING A PARQUET TABLE WITH NESTED DATA
=================================================
I generated a Parquet file by loading the JSON file below into Pig and storing it in Parquet format:
{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
{"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
When I try to read this Parquet table in Drill, the query fails:
QUERY: Select * from `/user/root/complex.parquet`;
ERROR: Query failed:
org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "c2e735f4-e11c-4e10-a410-959b3880dce0"
endpoint {
  address: "perfnode154.perf.lab"
  user_port: 31010
  control_port: 31011
  data_port: 31012
}
error_type: 0
message: "Failure while running fragment. < UnsupportedOperationException:[ unsupported type: BINARY LIST ]"
]
Error: exception while executing query (state=,code=0)
{code}
2014-07-23 22:16:45,239 [d106ad59-595f-42e7-880a-ef9f6bff1ff0:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Failure while initializing operator tree
java.lang.UnsupportedOperationException: unsupported type: BINARY LIST at org.apache.drill.exec.store.parquet.ParquetRecordReader.toMajorType(ParquetRecordReader.java:446) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRecordReader.setup(ParquetRecordReader.java:219) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:93) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:126) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:47) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at
org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProducerConsumer(AbstractPhysicalVisitor.java:191) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.ProducerConsumer.accept(ProducerConsumer.java:42) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:59) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitStore(AbstractPhysicalVisitor.java:118) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:176) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:95) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:81) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:242) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
{code}
I verified that the file contains repeated data by dumping it with parquet-tools:
{code}
./parquet-tools dump badpigparquet row group 0 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- recipe: BINARY UNCOMPRESSED DO:0 FPO:4 SZ:85/85/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY ingredients: .bag: ..name: BINARY UNCOMPRESSED DO:0 FPO:89 SZ:120/120/1.00 VC:15 ENC:RLE,PLAIN_DICTIONARY inventor: .name: BINARY UNCOMPRESSED DO:0 FPO:209 SZ:74/74/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY .age: INT32 UNCOMPRESSED DO:0 FPO:283 SZ:64/64/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY recipe TV=6 RL=0 DL=1 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0:
DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:9 VC:6 ingredients.bag.name TV=15 RL=1 DL=3 DS: 5 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY SZ:21 VC:15 inventor.name TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 inventor.age TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 BINARY recipe ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:1 V:Tacos value 2: R:0 D:1 V:TomatoSoup value 3: R:0 D:1 V:Tacos value 4: R:0 D:1 V:TomatoSoup value 5: R:0 D:1 V:Tacos value 6: R:0 D:1 V:TomatoSoup BINARY ingredients.bag.name ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 15 *** value 1: R:0 D:3 V:Beef value 2: R:1 D:3 V:Lettuce value 3: R:1 D:3 V:Cheese value 4: R:0 D:3 V:Tomatoes value 5: R:1 D:3 V:Milk value 6: R:0 D:3 V:Beef value 7: R:1 D:3 V:Lettuce value 8: R:1 D:3 V:Cheese value 9: R:0 D:3 V:Tomatoes value 10: R:1 D:3 V:Milk value 11: R:0 D:3 V:Beef value 12: R:1 D:3 V:Lettuce value 13: R:1 D:3 V:Cheese value 14: R:0 D:3 V:Tomatoes value 15: R:1 
D:3 V:Milk BINARY inventor.name ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:2 V:Alex value 2: R:0 D:2 V:Steve value 3: R:0 D:2 V:Alex value 4: R:0 D:2 V:Steve value 5: R:0 D:2 V:Alex value 6: R:0 D:2 V:Steve INT32 inventor.age ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- *** row group 1 of 1, values 1 to 6 *** value 1: R:0 D:2 V:25 value 2: R:0 D:2 V:23 value 3: R:0 D:2 V:25 value 4: R:0 D:2 V:23 value 5: R:0 D:2 V:25 value 6: R:0 D:2 V:23 {code} > Unable to read or write nested/repeated data in PARQUET format > -------------------------------------------------------------- > > Key: DRILL-1058 > URL: https://issues.apache.org/jira/browse/DRILL-1058 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Writer > Environment: CentOS release 6.5 > Reporter: Amit Katti > Assignee: Parth Chandra > Attachments: complex.parquet > > > ================================================= > DRILL WRITING A PARQUET TABLE WITH NESTED DATA > ================================================= > I have a JSON file with nested data (schema present below): > {"rownum":1,"name":"fred ovid","age":76,"gpa":1.55,"studentnum":692315658449,"create_time":"2014-05-27 00:26:07", "interests": [ "Reading", "Mountain Biking", "Hacking" ]} > I am able to read this JSON file successfully from drill and access nested values. 
However when I try to import this data and create a table in PARQUET format, it errors: > QUERY: create table test as select * from `/user/root/sample-data/nested_student.json`; > ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "3ce3dc1e-d920-4262-ae2d-28bd2d034597" > endpoint { > address: "perfnode154.perf.lab" > user_port: 31010 > control_port: 31011 > data_port: 31012 > } > error_type: 0 > message: "Failure while running fragment. < ParquetEncodingException:[ error starting field interests at 6 ] < ClassCastException:[ parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO ]" > ] > Error: exception while executing query (state=,code=0) > {code} > 2014-06-24 00:41:18,646 [b10db58d-8d4d-4d02-9fb5-a5081e5cb254:frag:0:0] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error 48602de2-8306-47d2-875f-8ad2cd2e964a: Failure while running fragment. > java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO > at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.startField(MessageColumnIO.java:171) ~[parquet-column-1.5.0-20140513.004024-1.jar:na] > at org.apache.drill.exec.store.ParquetOutputRecordWriter.addRepeatedVarCharHolder(ParquetOutputRecordWriter.java:761) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.EventBasedRecordWriter$RepeatedVarCharFieldWriter.writeField(EventBasedRecordWriter.java:1156) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:150) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:111) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:72) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:65) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:45) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:94) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:56) ~[drill-java-exec-1.0.0-m2-incubat > ing-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:85) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:46) ~[drill-java-exec-1.0.0-m2-incubat > ing-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:100) ~[drill-java-exec-1.0.0-m2 > -incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > {code} > ================================================= > DRILL READING A PARQUET TABLE WITH NESTED DATA > ================================================= > I generated a 
parquet file by reading the below Json file into pig and storing it in a parquet format: > {"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}} > {"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}} > When I try to read this parquet table in Drill, it errors: > QUERY: Select * from `/user/root/complex.parquet`; > ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "c2e735f4-e11c-4e10-a410-959b3880dce0" > endpoint { > address: "perfnode154.perf.lab" > user_port: 31010 > control_port: 31011 > data_port: 31012 > } > error_type: 0 > message: "Failure while running fragment. < UnsupportedOperationException:[ unsupported type: BINARY LIST ]" > ] > Error: exception while executing query (state=,code=0) > {code} > 2014-07-23 22:16:45,239 [d106ad59-595f-42e7-880a-ef9f6bff1ff0:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Failure while initializing operator tree > java.lang.UnsupportedOperationException: unsupported type: BINARY LIST > at org.apache.drill.exec.store.parquet.ParquetRecordReader.toMajorType(ParquetRecordReader.java:446) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.parquet.ParquetRecordReader.setup(ParquetRecordReader.java:219) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ScanBatch.(ScanBatch.java:93) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:126) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:47) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:113) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) 
~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProducerConsumer(AbstractPhysicalVisitor.java:191) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.config.ProducerConsumer.accept(ProducerConsumer.java:42) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:59) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at 
org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitStore(AbstractPhysicalVisitor.java:118) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:176) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:95) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:81) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:242) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_60] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60] > {code} > I am able to verify that it has repeated data by dumping the parquet file using parquet-tools > {code} > ./parquet-tools dump badpigparquet > row group 0 > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > recipe: BINARY UNCOMPRESSED DO:0 FPO:4 SZ:85/85/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY > ingredients: > .bag: > ..name: BINARY 
UNCOMPRESSED DO:0 FPO:89 SZ:120/120/1.00 VC:15 ENC:RLE,PLAIN_DICTIONARY > inventor: > .name: BINARY UNCOMPRESSED DO:0 FPO:209 SZ:74/74/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY > .age: INT32 UNCOMPRESSED DO:0 FPO:283 SZ:64/64/1.00 VC:6 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY > recipe TV=6 RL=0 DL=1 DS: 2 DE:PLAIN_DICTIONARY > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:9 VC:6 > ingredients.bag.name TV=15 RL=1 DL=3 DS: 5 DE:PLAIN_DICTIONARY > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > page 0: DLE:RLE RLE:RLE VLE:PLAIN_DICTIONARY SZ:21 VC:15 > inventor.name TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 > inventor.age TV=6 RL=0 DL=2 DS: 2 DE:PLAIN_DICTIONARY > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > page 0: DLE:RLE RLE:BIT_PACKED VLE:PLAIN_DICTIONARY SZ:10 VC:6 > BINARY recipe > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > *** row group 1 of 1, values 1 to 6 *** > value 1: R:0 D:1 V:Tacos > value 2: R:0 D:1 V:TomatoSoup > value 3: R:0 D:1 V:Tacos > value 4: R:0 D:1 V:TomatoSoup > value 5: R:0 D:1 V:Tacos > value 6: R:0 D:1 V:TomatoSoup > BINARY ingredients.bag.name > 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > *** row group 1 of 1, values 1 to 15 *** > value 1: R:0 D:3 V:Beef > value 2: R:1 D:3 V:Lettuce > value 3: R:1 D:3 V:Cheese > value 4: R:0 D:3 V:Tomatoes > value 5: R:1 D:3 V:Milk > value 6: R:0 D:3 V:Beef > value 7: R:1 D:3 V:Lettuce > value 8: R:1 D:3 V:Cheese > value 9: R:0 D:3 V:Tomatoes > value 10: R:1 D:3 V:Milk > value 11: R:0 D:3 V:Beef > value 12: R:1 D:3 V:Lettuce > value 13: R:1 D:3 V:Cheese > value 14: R:0 D:3 V:Tomatoes > value 15: R:1 D:3 V:Milk > BINARY inventor.name > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > *** row group 1 of 1, values 1 to 6 *** > value 1: R:0 D:2 V:Alex > value 2: R:0 D:2 V:Steve > value 3: R:0 D:2 V:Alex > value 4: R:0 D:2 V:Steve > value 5: R:0 D:2 V:Alex > value 6: R:0 D:2 V:Steve > INT32 inventor.age > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > *** row group 1 of 1, values 1 to 6 *** > value 1: R:0 D:2 V:25 > value 2: R:0 D:2 V:23 > value 3: R:0 D:2 V:25 > value 4: R:0 D:2 V:23 > value 5: R:0 D:2 V:25 > value 6: R:0 D:2 V:23 > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
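The sketch below shows what the `R:` (repetition level) values in the parquet-tools dump above mean. It is a minimal Java illustration, not Drill or parquet-mr code. It regroups the 15 flat values of `ingredients.bag.name` into one list per record. The class and method names (`RepetitionLevelDemo`, `assemble`) are hypothetical. The sketch assumes the column's maximum repetition level is 1 (RL=1 in the dump). It ignores definition levels, because every value shown has D:3, which is the column's maximum (DL=3).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Drill or parquet-mr code): rebuilds per-record
 * lists from the (repetition level, value) pairs that parquet-tools prints for
 * a column whose maximum repetition level is 1.
 */
final class RepetitionLevelDemo {

    static List<List<String>> assemble(int[] repLevels, String[] values) {
        if (repLevels.length != values.length) {
            throw new IllegalArgumentException("one repetition level per value is required");
        }
        List<List<String>> records = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            int r = repLevels[i];
            if (r == 0) {
                // R:0 marks the first value of a new record
                records.add(new ArrayList<>());
            } else if (r != 1 || records.isEmpty()) {
                // With max repetition level 1, only 0 and 1 are valid,
                // and the column must begin with a 0
                throw new IllegalArgumentException(
                        "unexpected repetition level " + r + " at value " + (i + 1));
            }
            // R:1 (or the value that opened the record) goes into the current record's list.
            // Definition levels are ignored: every value in the dump is fully defined (D:3).
            records.get(records.size() - 1).add(values[i]);
        }
        return records;
    }

    public static void main(String[] args) {
        // R: and V: columns of "BINARY ingredients.bag.name" from the dump
        int[] r = {0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1};
        String[] v = {"Beef", "Lettuce", "Cheese", "Tomatoes", "Milk",
                      "Beef", "Lettuce", "Cheese", "Tomatoes", "Milk",
                      "Beef", "Lettuce", "Cheese", "Tomatoes", "Milk"};
        List<List<String>> records = assemble(r, v);
        for (int i = 0; i < records.size(); i++) {
            System.out.println("record " + (i + 1) + ": " + records.get(i));
        }
    }
}
```

Run as-is, it prints six records that alternate between `[Beef, Lettuce, Cheese]` and `[Tomatoes, Milk]`. This lines up with the six alternating Tacos/TomatoSoup values in the `recipe` column. The file therefore does encode the per-recipe ingredient lists from the source JSON, even though Drill rejects the column type when reading it.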