tez-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-1649) ShuffleVertexManager auto reduce parallelism can cause jobs to hang indefinitely (with ScatterGather edges)
Date Thu, 09 Oct 2014 09:12:34 GMT
Rajesh Balamohan created TEZ-1649:
-------------------------------------

             Summary: ShuffleVertexManager auto reduce parallelism can cause jobs to hang
indefinitely (with ScatterGather edges)
                 Key: TEZ-1649
                 URL: https://issues.apache.org/jira/browse/TEZ-1649
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan


Consider the following DAG:
 M1, M2 --> R1
 M2, M3 --> R2
 R1 --> R2

All edges are Scatter-Gather.
 1. Set the min/max settings of R1 (parallelism 1000) to 0.25f and 0.5f
 2. Set the min/max settings of R2 (parallelism 21) to 0.2f and 0.3f
 3. Let M1 send some data from HDFS (test.txt)
 4. Let M2 (50 parallelism) generate some data and send it to R2
 5. Let M3 (500 parallelism) generate some data and send it to R2
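
The setup above can be written down as plain data to see how many completed source tasks R2's thresholds correspond to. This is only a sketch. It assumes the min/max values are fractions of the total source task count summed over all incoming Scatter-Gather edges, and the names (PARALLELISM, SOURCES, thresholds) are illustrative, not Tez APIs. M1's parallelism is not given in this report, so R1's thresholds are left out.

```python
# Parallelism of each vertex as stated in the report (M1 is not given).
PARALLELISM = {"M2": 50, "M3": 500, "R1": 1000, "R2": 21}

# Incoming Scatter-Gather edges of R2, taken from the DAG above.
SOURCES = {"R2": ["M2", "M3", "R1"]}


def thresholds(vertex, min_fraction, max_fraction):
    """Return (total, min_tasks, max_tasks) of source tasks for a vertex.

    Assumption: the fractions apply to the sum of all source tasks.
    """
    total = sum(PARALLELISM[s] for s in SOURCES[vertex])
    return total, min_fraction * total, max_fraction * total


total, min_tasks, max_tasks = thresholds("R2", 0.2, 0.3)
print(total, min_tasks, max_tasks)

# True if M3's tasks alone are enough to reach R2's max threshold.
print(PARALLELISM["M3"] >= max_tasks)
```

Under this assumption R2 needs roughly 310 to 465 of its 1550 source tasks to complete, which M3's 500 tasks can supply on their own.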

- Since R2's min/max can be satisfied by events from M3 alone, R2 changes its
parallelism sooner than R1 does.
- In the meantime, R1 changes its parallelism from 1000 to 20.  This change is not propagated
to R2, which keeps waiting.
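
The hang can be illustrated with a toy model. It is not Tez code, and the mechanism is an assumption: a downstream consumer records how many source tasks to wait for when it is configured and is never told about a later change upstream. The class and method names are hypothetical.

```python
class Consumer:
    """Toy stand-in for an R2 task waiting on inputs from R1 (hypothetical)."""

    def __init__(self, expected_sources):
        # Snapshot of the upstream task count, taken at configuration time.
        self.expected_sources = expected_sources
        self.received = set()

    def on_source_done(self, task_index):
        # Record that one upstream task has delivered its output.
        self.received.add(task_index)

    def can_finish(self):
        # The consumer may proceed only once every expected source reported.
        return len(self.received) >= self.expected_sources


r1_parallelism = 1000
consumer = Consumer(expected_sources=r1_parallelism)  # R2 is configured first

r1_parallelism = 20  # R1 then reduces its parallelism; the consumer is not told

for task_index in range(r1_parallelism):  # every remaining R1 task completes
    consumer.on_source_done(task_index)

# Only 20 of the 1000 expected inputs ever arrive, so this stays False.
print(consumer.can_finish())

# If the new count were propagated, the same inputs would be sufficient.
consumer.expected_sources = r1_parallelism
print(consumer.can_finish())
```

In this model the consumer never finishes unless the updated source count reaches it, which matches the symptom described above.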

Tested this on a small-scale (20-node) cluster; the hang occurs consistently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
