hive-dev mailing list archives

From "Chao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7569) Make sure multi-MR queries work
Date Wed, 06 Aug 2014 20:08:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088162#comment-14088162
] 

Chao commented on HIVE-7569:
----------------------------

(Not sure whether this is related.)
Sometimes when I run a multi-insert job in Spark, I get an exception like the following.
If I run the SAME query in MR mode AND THEN in Spark mode, the query succeeds and produces
the correct result.

{{code}}
2014-08-06 12:58:53,168 INFO  [Executor task launch worker-0]: exec.GroupByOperator (Operator.java:initialize(389)) - Initialization Done 35 GBY
2014-08-06 12:58:53,169 ERROR [Executor task launch worker-0]: ExecReducer (ExecReducer.java:reduce(272)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255 with properties {columns=_col0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=map<string,string>}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:212)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:60)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:210)
        ... 15 more
Caused by: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:491)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
        ... 16 more
{{code}}



> Make sure multi-MR queries work
> -------------------------------
>
>                 Key: HIVE-7569
>                 URL: https://issues.apache.org/jira/browse/HIVE-7569
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> With the latest dev effort, queries that would involve multiple MR jobs should now be supported
> by Spark, except for sorting, multi-insert, union, and join (map join and SMB join might just
> work). However, this hasn't been verified or tested. This task is to ensure that this is the case.
> Please create JIRAs for any problems found.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
