tez-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TEZ-1635) Dag gets stuck intermittently
Date Sun, 05 Oct 2014 15:24:33 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159558#comment-14159558 ]

Rajesh Balamohan edited comment on TEZ-1635 at 10/5/14 3:23 PM:
----------------------------------------------------------------

Attaching the successful and hung job details for tez_smb_1.q with additional logs in org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.

tez_smb_1.q: (DAG snapshot is already attached)
- Map_1 [Map_2] - MultiMRInput, initializer=MRInputAMSplitGenerator
- Map_1 [s1] - MRInputLegacy, initializer=MRInputAMSplitGenerator
- (Map_1 [Map_2], Map_1 [s1]) --> Map_1[MapTezProcessor]
- Map_1[MapTezProcessor] --> Map_1[out_Map_1] MROutput

Map_1 vertexManager is org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex

Successful job:
===============
- When CustomPartitionVertex.onRootVertexInitialized() gets called for "s1", CustomPartitionVertex.processAllEvents()
is invoked, which internally populates the bucketToTaskMap data structure.
- When CustomPartitionVertex.onRootVertexInitialized() then gets called for "Map_2", CustomPartitionVertex.processAllSideEvents()
is invoked, which depends on bucketToTaskMap to generate the InputDataInformationEvents.

Failure/hung job:
===============
- CustomPartitionVertex.onRootVertexInitialized() gets called for "Map_2" first. This ends up calling CustomPartitionVertex.processAllSideEvents().
 Since bucketToTaskMap is still empty at that point, it does *not* generate any InputDataInformationEvent.
- CustomPartitionVertex.onRootVertexInitialized() gets called for "s1" later, after the side events for "Map_2" have already been processed.
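
To make the ordering dependency concrete, here is a toy model of the two call orders. It is only a sketch under assumptions: the method names mirror the ones mentioned above, but the class CpvOrderingSketch, the method bodies, the bucket count, and the event strings are hypothetical and are not the Hive or Tez implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names mirror the description above, bodies are assumptions.
public class CpvOrderingSketch {
    // Filled by the main input ("s1"); read by the side input ("Map_2").
    private final Map<Integer, Integer> bucketToTaskMap = new HashMap<>();
    // Stand-in for the InputDataInformationEvents handed back to Tez.
    private final List<String> sentEvents = new ArrayList<>();

    // Hypothetical dispatch: "s1" is treated as the main input,
    // any other input name is treated as a side input.
    void onRootVertexInitialized(String inputName, int numBuckets) {
        if (inputName.equals("s1")) {
            processAllEvents(numBuckets);
        } else {
            processAllSideEvents(inputName);
        }
    }

    // Populates bucketToTaskMap (here: bucket i maps to task i).
    void processAllEvents(int numBuckets) {
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            bucketToTaskMap.put(bucket, bucket);
        }
    }

    // Emits one event per known bucket; emits nothing if the map is empty.
    void processAllSideEvents(String inputName) {
        for (Map.Entry<Integer, Integer> e : bucketToTaskMap.entrySet()) {
            sentEvents.add(inputName + ":bucket" + e.getKey() + "->task" + e.getValue());
        }
    }

    // Runs the two callbacks in the given order and counts the side events sent.
    public static int eventsSentFor(String first, String second) {
        CpvOrderingSketch cpv = new CpvOrderingSketch();
        cpv.onRootVertexInitialized(first, 2);
        cpv.onRootVertexInitialized(second, 2);
        return cpv.sentEvents.size();
    }

    public static void main(String[] args) {
        System.out.println("s1 then Map_2: " + eventsSentFor("s1", "Map_2"));
        System.out.println("Map_2 then s1: " + eventsSentFor("Map_2", "s1"));
    }
}
```

In this model the "s1 then Map_2" order sends one event per bucket, while the "Map_2 then s1" order sends none, which matches the hung case described above.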

In this case, the events pertaining to MultiMRInput (Map_2) are never sent to Tez from CustomPartitionVertex.
[~hagleitn] - Is this the expected behavior of CustomPartitionVertex?





> Dag gets stuck intermittently
> -----------------------------
>
>                 Key: TEZ-1635
>                 URL: https://issues.apache.org/jira/browse/TEZ-1635
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Vikram Dixit K
>            Priority: Blocker
>         Attachments: Screen Shot 2014-10-05 at 9.46.31 AM.png, syslog_dag_1412109415326_0002_10.gz, tez_smb_1_hung_job.log, tez_smb_1_successful_job.log
>
>
> Attaching logs for the dag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
