Date: Tue, 4 Mar 2014 21:31:43 +0000 (UTC)
From: "Szehon Ho (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors

    [ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920048#comment-13920048 ]

Szehon Ho commented on HIVE-6414:
---------------------------------

Hi Justin, thanks for taking care of it. Do you want to resubmit the patch for testing for this issue? There was an issue where the pre-commit test queue got lost.

> ParquetInputFormat provides data values that do not match the object inspectors
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-6414
>                 URL: https://issues.apache.org/jira/browse/HIVE-6414
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Remus Rusanu
>            Assignee: Justin Coffey
>              Labels: Parquet
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6414.2.patch, HIVE-6414.3.patch, HIVE-6414.patch
>
>
> While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable for all 'int like' types, which disagrees with the row object inspectors. I thought that was fine and worked my way around it. But I see now that the issue triggers failures in other places, e.g. in aggregates:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"}
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
> 	... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
> 	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
> 	... 9 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41)
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671)
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
> 	... 15 more
> {noformat}
> My test is (I'm writing a test .q from HIVE-5998, but the repro does not involve vectorization):
> {noformat}
> create table if not exists alltypes_parquet (
>   cint int,
>   ctinyint tinyint,
>   csmallint smallint,
>   cfloat float,
>   cdouble double,
>   cstring1 string) stored as parquet;
>
> insert overwrite table alltypes_parquet
>   select cint,
>     ctinyint,
>     csmallint,
>     cfloat,
>     cdouble,
>     cstring1
>   from alltypesorc;
>
> explain select * from alltypes_parquet limit 10;
> select * from alltypes_parquet limit 10;
>
> explain select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
>
> select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
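
For readers following the thread, the type mismatch behind the quoted stack trace can be illustrated without Hive on the classpath. The sketch below is only an illustration under stated assumptions: `IntBox` is a hypothetical stand-in for `org.apache.hadoop.io.IntWritable`, `inspectShort` is a hypothetical stand-in for the cast that `JavaShortObjectInspector.get` performs in the trace, and `convertForSmallint` is one possible way to reconcile the two. It is not necessarily what the attached patches do.

```java
public class WritableMismatchSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.IntWritable. */
    public static final class IntBox {
        private final int value;

        public IntBox(int value) {
            this.value = value;
        }

        public int get() {
            return value;
        }
    }

    /**
     * Hypothetical stand-in for an inspector that assumes the row value is a
     * java.lang.Short. The cast fails at runtime for any other object type.
     */
    public static short inspectShort(Object o) {
        return (Short) o;
    }

    /** Returns true if handing the object to the short inspector throws ClassCastException. */
    public static boolean failsWithClassCast(Object o) {
        try {
            inspectShort(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /**
     * Assumed conversion step: narrow the int-backed value to the type that
     * the declared smallint column's inspector expects, rejecting values
     * that do not fit in a short.
     */
    public static Object convertForSmallint(IntBox box) {
        int v = box.get();
        if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
            throw new IllegalArgumentException("value out of smallint range: " + v);
        }
        return Short.valueOf((short) v);
    }

    public static void main(String[] args) {
        // csmallint value taken from the failing row in the stack trace.
        IntBox fromReader = new IntBox(4963);

        // The reader's int-backed object does not match the short inspector.
        System.out.println("raw value fails: " + failsWithClassCast(fromReader));

        // After conversion the inspector can read the value.
        Object converted = convertForSmallint(fromReader);
        System.out.println("converted value: " + inspectShort(converted));
    }
}
```

The same mismatch could instead be resolved on the inspector side, by having the inspector for the column accept the int-backed object; which side changes is a design choice this sketch does not settle.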