Date: Thu, 28 Mar 2019 09:29:00 +0000 (UTC)
From: "Ganesha Shreedhara (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-21492) VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool

    [ https://issues.apache.org/jira/browse/HIVE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803742#comment-16803742 ]

Ganesha Shreedhara commented on HIVE-21492:
-------------------------------------------

Please review the patch.

> VectorizedParquetRecordReader can't to read parquet file generated using thrift/custom tool
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21492
>                 URL: https://issues.apache.org/jira/browse/HIVE-21492
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ganesha Shreedhara
>            Assignee: Ganesha Shreedhara
>            Priority: Major
>         Attachments: HIVE-21492.patch
>
>
> Take, for example, a Parquet table with an array-of-integers column:
> {code:java}
> CREATE EXTERNAL TABLE ( `list_of_ints` array<int>)
> STORED AS PARQUET
> LOCATION '{location}';
> {code}
> A Parquet file generated by Hive has the following schema for this column:
> {code:java}
> group list_of_ints (LIST) {
>   repeated group bag {
>     optional int32 array;
>   };
> }
> {code}
> A Parquet file generated by Thrift or any custom tool (using org.apache.parquet.io.api.RecordConsumer)
> may instead have the following schema for this column:
> {code:java}
> required group list_of_ints (LIST) {
>   repeated int32 list_of_ints_tuple;
> }
> {code}
> VectorizedParquetRecordReader handles only Parquet files generated by Hive. Because of the changes made as part of HIVE-18553, it throws the following exception when a Parquet file generated by Thrift is read:
> {code:java}
> Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is not a group
>  at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
>  at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
>  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> {code}
>
> I have made a small change to handle the case where the child type of a group type is a PrimitiveType.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
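
For readers without the attachment at hand: the patch itself is not reproduced in this message, so the following is only a rough sketch of the kind of check described in the issue, not the actual change. The classes `Type`, `GroupType` and `PrimitiveType` below are simplified, hypothetical stand-ins for the Parquet schema classes (only `isPrimitive()`, `asGroupType()` and `getType(int)` are mimicked), and this `getElementType` is an assumed shape, not the Hive method body. It also ignores the other legacy list layouts that real Parquet readers have to consider.

```java
import java.util.Arrays;
import java.util.List;

public class ElementTypeSketch {

    // Hypothetical stand-in for a Parquet schema node; NOT the real class.
    static abstract class Type {
        final String name;

        Type(String name) {
            this.name = name;
        }

        abstract boolean isPrimitive();

        // Mirrors the failure in the stack trace: a primitive is not a group.
        GroupType asGroupType() {
            if (isPrimitive()) {
                throw new ClassCastException(name + " is not a group");
            }
            return (GroupType) this;
        }
    }

    static final class PrimitiveType extends Type {
        PrimitiveType(String name) {
            super(name);
        }

        @Override
        boolean isPrimitive() {
            return true;
        }
    }

    static final class GroupType extends Type {
        final List<Type> fields;

        GroupType(String name, Type... fields) {
            super(name);
            this.fields = Arrays.asList(fields);
        }

        @Override
        boolean isPrimitive() {
            return false;
        }

        Type getType(int index) {
            return fields.get(index);
        }
    }

    // Assumed shape of the fix: look at the repeated child of the list group
    // and only descend one more level when that child is itself a group.
    static Type getElementType(Type listType) {
        Type repeated = listType.asGroupType().getType(0);
        if (repeated.isPrimitive()) {
            // Thrift/custom layout: the repeated primitive is the element.
            return repeated;
        }
        // Hive layout: the element is the single field inside the repeated group.
        return repeated.asGroupType().getType(0);
    }

    public static void main(String[] args) {
        Type hiveList = new GroupType("list_of_ints",
                new GroupType("bag", new PrimitiveType("array")));
        Type thriftList = new GroupType("list_of_ints",
                new PrimitiveType("list_of_ints_tuple"));

        System.out.println(getElementType(hiveList).name);   // prints "array"
        System.out.println(getElementType(thriftList).name); // prints "list_of_ints_tuple"
    }
}
```

Without the `isPrimitive()` check, the second call would go through `asGroupType()` on the repeated int32 field and fail with the same "is not a group" ClassCastException shown in the stack trace.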