hive-dev mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-7431) When run on spark cluster, some spark tasks may fail
Date Fri, 18 Jul 2014 06:49:05 GMT


Rui Li commented on HIVE-7431:

[~xuefuz] This failure happens when I run select count(*) or max/min queries on a table.
The Spark cluster is deployed in standalone mode.

I added some logging to debug the issue. I found that the problem is caused by setting the parent of a
TS more than once. Take the malformed operator tree I mentioned earlier as an example: when MAP (69)
is created, we set TS (65) as its child and set TS (65)'s parent to MAP (69). But later, when
MAP (71) is created, TS (65) is set as its child again. As a result, TS (65)'s parent list no longer
contains MAP (69), which triggers the exception.
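Here's a minimal, self-contained Java sketch of the suspected mechanism. It does not use the actual Hive Operator API; the Op class and attach() helper are hypothetical stand-ins that mimic "setting the parent of a TS more than once", where the second attach replaces the TS's parent list and drops the link back to the first MAP, so a later parent-membership check fails.

{code:java}
import java.util.ArrayList;
import java.util.List;

class Op {
    final String name;
    final List<Op> parents = new ArrayList<>();
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }

    // Hypothetical helper: links child under parent but REPLACES the child's
    // parents, mirroring the suspected "set parent more than once" behavior.
    static void attach(Op parent, Op child) {
        parent.children.add(child);
        child.parents.clear();   // second attach silently drops the earlier parent
        child.parents.add(parent);
    }

    public static void main(String[] args) {
        Op ts65  = new Op("TS[65]");
        Op map69 = new Op("MAP[69]");
        Op map71 = new Op("MAP[71]");

        attach(map69, ts65);     // TS[65].parents == [MAP[69]]
        attach(map71, ts65);     // TS[65].parents == [MAP[71]]; MAP[69] is lost

        // A consistency check analogous to the one that triggers the exception:
        if (!ts65.parents.contains(map69)) {
            throw new IllegalStateException(
                "TS[65] no longer lists MAP[69] as a parent");
        }
    }
}
{code}

If this is indeed what happens, the fix would presumably be either to append to the parent list instead of replacing it, or to avoid attaching the same TS to more than one MAP in the first place.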

I'm not familiar with how the map operator is structured, but I suppose a TS shouldn't be assigned
to multiple MAPs, right? Please note that the successful tasks don't have such a malformed tree.

I'm still working to find the root cause. Any thoughts on this issue?

> When run on spark cluster, some spark tasks may fail
> ----------------------------------------------------
>                 Key: HIVE-7431
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
> When running queries on spark, some spark tasks fail (usually the first couple of tasks)
> with the following stack trace:
> {quote}
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> ...
> {quote}
> Observed for spark standalone cluster. Not verified for spark on yarn or mesos.
> NO PRECOMMIT TESTS. This is for spark branch only.
