hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
Date Wed, 17 Dec 2014 23:52:13 GMT


Xuefu Zhang commented on HIVE-8920:

Even with HIVE-9041, a query like this:
{code}
from (select * from dec union all select * from dec2) s
insert overwrite table dec3 select, sum(s.value) group by
insert overwrite table dec4 select, s.value order by s.value;
{code}
apparently finishes successfully even though the query actually failed. (This is a separate
issue, though.) The exception is seen in hive.log at WARN level (again, a separate problem):
{noformat}
2014-12-17 15:35:53,741 WARN  [task-result-getter-2]: scheduler.TaskSetManager (Logging.scala:logWarning(71))
- Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.RuntimeException: java.lang.IllegalStateException:
Invalid input path hdfs://localhost:8020/user/hive/warehouse/dec2/dec.txt
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(
        at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.executor.Executor$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.IllegalStateException: Invalid input path hdfs://localhost:8020/user/hive/warehouse/dec2/dec.txt
        at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(
        at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(
        ... 13 more
{noformat}
The exception is suspected to be caused by an IOContext initialization problem.

> SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
> -----------------------------------------------------------------
>                 Key: HIVE-8920
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Xuefu Zhang
> The following query will not work:
> {code}
> from (select * from table0 union all select * from table1) s
> insert overwrite table table3 select s.x, count(1) group by s.x
> insert overwrite table table4 select s.y, count(1) group by s.y;
> {code}
> Currently, the plan for this query, before SplitSparkWorkResolver, looks like below:
> {noformat}
>    M1    M2
>      \  / \
>       U3   R5
>       |
>       R4
> {noformat}
> In {{SplitSparkWorkResolver#splitBaseWork}}, it assumes that the {{childWork}} is a ReduceWork,
> but for this case, you can see that for M2 the childWork could be UnionWork U3. Thus, the
> code will fail.
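The failure described above is an unconditional cast: {{splitBaseWork}} assumes every child of the work being split is a {{ReduceWork}}, while in the M2 -> U3 edge of the plan the child is a {{UnionWork}}. A minimal sketch of the missing type guard, using hypothetical stand-in classes rather than the real {{org.apache.hadoop.hive.ql.plan}} hierarchy (which carries far more state), might look like this:

```java
// Stand-in class hierarchy loosely mirroring Hive's BaseWork subclasses.
// These are simplified stubs for illustration, not the real Hive classes.
abstract class BaseWork {
    private final String name;
    BaseWork(String name) { this.name = name; }
    String getName() { return name; }
}

class ReduceWork extends BaseWork {
    ReduceWork(String name) { super(name); }
}

class UnionWork extends BaseWork {
    UnionWork(String name) { super(name); }
}

public class SplitSketch {
    // Sketch of the guard splitBaseWork would need: only cast to
    // ReduceWork when the child actually is one, and treat UnionWork
    // as its own case instead of letting the blind cast fail.
    static String describeChild(BaseWork childWork) {
        if (childWork instanceof ReduceWork) {
            ReduceWork rw = (ReduceWork) childWork;
            return "reduce:" + rw.getName();
        } else if (childWork instanceof UnionWork) {
            // For the M2 -> U3 edge in the example plan, the child is a
            // UnionWork; this is the branch the current code is missing.
            return "union:" + childWork.getName();
        }
        return "other:" + childWork.getName();
    }

    public static void main(String[] args) {
        System.out.println(describeChild(new ReduceWork("R4")));
        System.out.println(describeChild(new UnionWork("U3")));
    }
}
```

This only illustrates the shape of the fix; the actual patch would also have to decide how the split works are re-wired into the union, which is the substance of this JIRA.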

This message was sent by Atlassian JIRA
