hive-dev mailing list archives

From "liyunzhang_intel (JIRA)" <>
Subject [jira] [Created] (HIVE-17407) TPC-DS/query65 hangs on HoS in certain settings
Date Tue, 29 Aug 2017 08:45:00 GMT
liyunzhang_intel created HIVE-17407:

             Summary: TPC-DS/query65 hangs on HoS in certain settings
                 Key: HIVE-17407
             Project: Hive
          Issue Type: Bug
            Reporter: liyunzhang_intel

The query hangs when using the following settings on 3TB scale.
The explain plan is attached in explain65. The screenshot shows that it hung in Stage 5.

Let's explain why it hangs. The reduce-side edges in the Spark plan are:
        Reducer 10 <- Map 9 (GROUP, 1009)
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1),
        Reducer 3 <- Reducer 10 (PARTITION-LEVEL SORT, 1009), Reducer 2 (PARTITION-LEVEL SORT, 1009)
        Reducer 4 <- Reducer 3 (SORT, 1)
        Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1009)

The numPartitions of the SparkEdgeProperty that connects Reducer 2 and Reducer 3 is 1. This is decided in createReduceWork:
public ReduceWork createReduceWork(GenSparkProcContext context, Operator<?> root,
    SparkWork sparkWork) throws SemanticException {
  ...
  int maxExecutors = -1;
  for (Operator<? extends OperatorDesc> parentOfRoot : root.getParentOperators()) {
    Preconditions.checkArgument(parentOfRoot instanceof ReduceSinkOperator,
        "AssertionError: expected parentOfRoot to be an "
            + "instance of ReduceSinkOperator, but was "
            + parentOfRoot.getClass().getName());
    ReduceSinkOperator reduceSink = (ReduceSinkOperator) parentOfRoot;
    maxExecutors = Math.max(maxExecutors, reduceSink.getConf().getNumReducers());
  }
  ...

Here the numReducers of every parentOfRoot is 1 (in the explain plan, the parallelism of Map 1, Map 5, and Reducer 7 is 1), so the numPartitions of the SparkEdgeProperty that connects Reducer 2 and Reducer 3 is 1.
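The effect of the loop above can be sketched as follows. This is a minimal standalone illustration, not Hive's actual classes: edgeNumPartitions stands in for the max-fold over the parents' numReducers inside createReduceWork. When every parent reduce sink reports numReducers = 1, the maximum is 1, and that value becomes the edge's numPartitions.

```java
import java.util.Arrays;
import java.util.List;

public class EdgeParallelism {
    // Mimics the max-fold in createReduceWork: the edge's numPartitions
    // is the maximum numReducers among the parent reduce sinks.
    static int edgeNumPartitions(List<Integer> parentNumReducers) {
        int maxExecutors = -1;
        for (int n : parentNumReducers) {
            maxExecutors = Math.max(maxExecutors, n);
        }
        return maxExecutors;
    }

    public static void main(String[] args) {
        // All parents report parallelism 1, as Map 1, Map 5 and Reducer 7
        // do here, so the downstream edge gets numPartitions = 1.
        System.out.println(edgeNumPartitions(Arrays.asList(1, 1, 1)));   // prints 1
        // If any parent reported more reducers, the edge would widen.
        System.out.println(edgeNumPartitions(Arrays.asList(1, 1009)));   // prints 1009
    }
}
```

This makes the hang mechanism concrete: more than 30GB of intermediate data is shuffled into a single reduce task, because every input to the max is 1.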
More explanation of why the parallelism of Map 1, Map 5, and Reducer 7 is 1: in the physical plan of the query, the reduce sinks related to Map 1, Map 5, and Reducer 7 are RS[31], RS[32], RS[33], and their parallelism is set by SemanticAnalyzer#genJoinReduceSinkChild.
There seems to be no logical error in the code, but it is not reasonable to use 1 task to deal with such a large amount of data (more than 30GB). Is there any way to make the query pass in this situation? (The reason why I set it as 3000000 is that if the join is converted to a map join, it throws a disk error.)

This message was sent by Atlassian JIRA
