hive-dev mailing list archives

From "Szehon Ho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6414) ParquetInputFormat provides data values that do not match the object inspectors
Date Mon, 24 Feb 2014 21:07:23 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910810#comment-13910810 ]

Szehon Ho commented on HIVE-6414:
---------------------------------

Hi Justin, I think this version of the fix looks more complete than mine; let's go with this
one if it works.

Just a couple of comments. Which branch did you create this patch from? HIVE-5958 recently
added new output to every q.out file, so the q.out in this patch may need to be regenerated
on top of that change.

Also, does the query need a "sort by" after the "group by" to guarantee a deterministic
result in the q.out file? Thanks.
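
For reference, the kind of mapping the fix ultimately needs to enforce is sketched below (the
class and method names are hypothetical and not taken from the attached patch): each int-like
Hive type should come back from the reader in its own writable so the object inspectors line up.

{noformat}
import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.io.ShortWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

// Hypothetical helper, for illustration only: wrap a raw int read from
// Parquet in the Writable that matches the declared Hive column type.
public class IntLikeWrapper {
  public static Writable wrap(int rawValue, String hiveTypeName) {
    if ("tinyint".equals(hiveTypeName)) {
      return new ByteWritable((byte) rawValue);
    } else if ("smallint".equals(hiveTypeName)) {
      return new ShortWritable((short) rawValue);
    } else {
      // "int" and the remaining int-like declarations keep IntWritable.
      return new IntWritable(rawValue);
    }
  }
}
{noformat}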

> ParquetInputFormat provides data values that do not match the object inspectors
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-6414
>                 URL: https://issues.apache.org/jira/browse/HIVE-6414
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Remus Rusanu
>            Assignee: Justin Coffey
>              Labels: Parquet
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6414.patch
>
>
> While working on HIVE-5998 I noticed that the ParquetRecordReader returns IntWritable
> for all 'int like' types, which does not match the row object inspectors. I thought that
> was fine and worked my way around it, but I now see that the issue triggers failures in
> other places, e.g. in aggregates:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"}
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>         ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
>         ... 9 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>         at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
>         ... 15 more
> {noformat}
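> For illustration, the mismatch reduces to the minimal sketch below (the class name is made up
> for the example; the inspector and writable are the real Hive/Hadoop ones from the trace above):
> {noformat}
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.ShortObjectInspector;
> import org.apache.hadoop.io.IntWritable;
>
> public class ParquetInspectorMismatch {
>   public static void main(String[] args) {
>     // Inspector the query plan uses for the smallint column (same one as in the trace).
>     ShortObjectInspector shortOI = PrimitiveObjectInspectorFactory.javaShortObjectInspector;
>     // Value shape the Parquet record reader currently hands back for every
>     // 'int like' column, regardless of the declared Hive type.
>     Object fromReader = new IntWritable(4963);
>     // Throws ClassCastException: IntWritable cannot be cast to java.lang.Short,
>     // which is what GenericUDAFMin hits via ObjectInspectorUtils.compare.
>     System.out.println(shortOI.get(fromReader));
>   }
> }
> {noformat}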
> My test is as follows (I'm writing a test .q file for HIVE-5998, but the repro does not involve vectorization):
> {noformat}
> create table if not exists alltypes_parquet (
>   cint int,
>   ctinyint tinyint,
>   csmallint smallint,
>   cfloat float,
>   cdouble double,
>   cstring1 string) stored as parquet;
> insert overwrite table alltypes_parquet
>   select cint,
>     ctinyint,
>     csmallint,
>     cfloat,
>     cdouble,
>     cstring1
>   from alltypesorc;
> explain select * from alltypes_parquet limit 10;
> select * from alltypes_parquet limit 10;
> explain select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
> select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
