hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8840) Print prettier Spark work graph after HIVE-8793 [Spark Branch]
Date Wed, 12 Nov 2014 18:36:35 GMT
Xuefu Zhang created HIVE-8840:
---------------------------------

             Summary: Print prettier Spark work graph after HIVE-8793 [Spark Branch]
                 Key: HIVE-8840
                 URL: https://issues.apache.org/jira/browse/HIVE-8840
             Project: Hive
          Issue Type: Improvement
          Components: Spark
            Reporter: Xuefu Zhang


Because of HIVE-8793, the work graph for Spark may be modified by SplitSparkWorkResolver.
 Original:
{code}
    Spark
      Edges:
        Reducer 2 <- Map 1 (SORT, 1)
        Reducer 3 <- Reducer 2 (GROUP, 1)
        Reducer 4 <- Reducer 2 (GROUP, 1)
{code}
New graph:
{code}
    Spark
      Edges:
        Reducer 3 <- Reducer 5 (GROUP, 1)
        Reducer 4 <- Reducer 6 (GROUP, 1)
        Reducer 5 <- Map 1 (SORT, 1)
        Reducer 6 <- Map 1 (SORT, 1)
{code}
where Reducer 2 was split into Reducer 5 and Reducer 6.

Two types of ordering can be considered:
1. Topological order
{code}
    Spark
      Edges:
        Reducer 5 <- Map 1 (SORT, 1)
        Reducer 6 <- Map 1 (SORT, 1)
        Reducer 3 <- Reducer 5 (GROUP, 1)
        Reducer 4 <- Reducer 6 (GROUP, 1)
{code}
2. DFS (depth-first) order
{code}
    Spark
      Edges:
        Reducer 5 <- Map 1 (SORT, 1)
        Reducer 3 <- Reducer 5 (GROUP, 1)
        Reducer 6 <- Map 1 (SORT, 1)
        Reducer 4 <- Reducer 6 (GROUP, 1)
{code}

Both seem better than the current output, though topological order seems more suitable for a graph. Please feel free to create
a patch on trunk if needed.
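
To make the two orderings above concrete, here is a minimal Python sketch that reproduces both listings from the edges of the new graph. It is not Hive's code: the function names (build, line, topological, depth_first), the tuple layout of NEW_GRAPH, and the choice to break ties by sorting work names alphabetically are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Edges of the "new graph" from this issue, as
# (child, parent, edge type, parallelism). The tuple layout is an assumption.
NEW_GRAPH = [
    ("Reducer 3", "Reducer 5", "GROUP", 1),
    ("Reducer 4", "Reducer 6", "GROUP", 1),
    ("Reducer 5", "Map 1", "SORT", 1),
    ("Reducer 6", "Map 1", "SORT", 1),
]


def build(edges):
    """Index the edges by child and by parent, and find the root works."""
    parents = defaultdict(list)   # child -> [(parent, type, parallelism)]
    children = defaultdict(list)  # parent -> [child]
    for child, parent, kind, n in edges:
        parents[child].append((parent, kind, n))
        children[parent].append(child)
    nodes = set(parents) | set(children)
    roots = sorted(node for node in nodes if node not in parents)
    return parents, children, roots


def line(child, parents):
    """Format one 'Edges:' line, listing every parent of the child."""
    sources = ", ".join(f"{p} ({kind}, {n})" for p, kind, n in parents[child])
    return f"{child} <- {sources}"


def topological(edges):
    """Kahn's algorithm: a work is printed only after all of its parents."""
    parents, children, roots = build(edges)
    pending = {child: len(ps) for child, ps in parents.items()}
    queue = deque(roots)
    out = []
    while queue:
        node = queue.popleft()
        if node in parents:  # roots have no incoming edge to print
            out.append(line(node, parents))
        for child in sorted(children.get(node, [])):
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return out


def depth_first(edges):
    """Pre-order DFS from each root: a branch is followed to its end first."""
    parents, children, roots = build(edges)
    seen = set()
    out = []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        if node in parents:
            out.append(line(node, parents))
        for child in sorted(children.get(node, [])):
            visit(child)

    for root in roots:
        visit(root)
    return out
```

Calling topological(NEW_GRAPH) yields the lines of option 1 and depth_first(NEW_GRAPH) yields the lines of option 2. One difference worth noting: in the depth-first variant, a work with several parents is printed when it is first reached, which may be before some of its other parents appear in the listing.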



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
