tez-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-1470) TaskAttemptFinishedEvent is recorded multiple times for the same task
Date Wed, 08 Oct 2014 01:32:34 GMT

     [ https://issues.apache.org/jira/browse/TEZ-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jeff Zhang updated TEZ-1470:
----------------------------
    Summary: TaskAttemptFinishedEvent is recorded multiple times for the same task  (was:
TaskFinishedEvent is recorded multiple times)

> TaskAttemptFinishedEvent is recorded multiple times for the same task
> ---------------------------------------------------------------------
>
>                 Key: TEZ-1470
>                 URL: https://issues.apache.org/jira/browse/TEZ-1470
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Minor
>
> When AM is in attempt 2, sometimes TaskFinishedEvent may be recorded multiple times,
although it won't affect the recovery behavior now, track the issue here. 
> Here's one example ( task_1408499461970_0035_1_00_000000 is finished 2 times here, one
is in the attempt 1 and the other is in the attempt 2 )
> {code}
> 15:59:00,671 - Thread( main) - (RecoveryParser.java:332) - Parsing DAG recovery file
hdfs://0.0.0.0:9000/user/jzhang/tez/staging/.tez/application_1408499461970_0035/recovery/2/dag_1408499461970_0035_1.recovery
> 15:59:00,716 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=DAG_SUBMITTED, event=dagID=dag_1408499461970_0035_1, submitTime=1408515526009
> 15:59:00,718 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=DAG_INITIALIZED, event=dagID=dag_1408499461970_0035_1, initTime=1408515526187
> 15:59:00,719 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=DAG_STARTED, event=dagID=dag_1408499461970_0035_1, startTime=1408515526263
> 15:59:00,722 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_INITIALIZED, event=vertexName=reduce, vertexId=vertex_1408499461970_0035_1_01,
initRequestedTime=1408515526282, initedTime=1408515526325, numTasks=1, processorName=null,
additionalInputsCount=0
> 15:59:00,725 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_INITIALIZED, event=vertexName=map, vertexId=vertex_1408499461970_0035_1_00,
initRequestedTime=1408515526265, initedTime=1408515526331, numTasks=1, processorName=null,
additionalInputsCount=1
> 15:59:00,736 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_DATA_MOVEMENT_EVENTS_GENERATED, event=vertexId=vertex_1408499461970_0035_1_00,
eventCount=1
> 15:59:00,738 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_DATA_MOVEMENT_EVENTS_GENERATED, event=vertexId=vertex_1408499461970_0035_1_00,
eventCount=1
> 15:59:00,739 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_STARTED, event=vertexId=vertex_1408499461970_0035_1_00, startRequestedTime=1408515526332,
startedTime=1408515526332
> 15:59:00,742 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_STARTED, event=vertexName=null, taskId=task_1408499461970_0035_1_00_000000,
scheduledTime=1408515526345, launchTime=1408515526345
> 15:59:00,742 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_STARTED, event=vertexId=vertex_1408499461970_0035_1_01, startRequestedTime=1408515526350,
startedTime=1408515526350
> 15:59:00,750 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_ATTEMPT_STARTED, event=vertexName=null, taskAttemptId=attempt_1408499461970_0035_1_00_000000_0,
startTime=1408515529190, containerId=container_1408499461970_0035_01_000002, nodeId=192.168.3.4:50974,
inProgressLogs=null, completedLogs=null
> 15:59:00,750 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_DATA_MOVEMENT_EVENTS_GENERATED, event=vertexId=vertex_1408499461970_0035_1_00,
eventCount=1
> 15:59:00,768 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_ATTEMPT_FINISHED, event=vertexName=null, taskAttemptId=attempt_1408499461970_0035_1_00_000000_0,
startTime=0, finishTime=1408515534680, timeTaken=1408515534680, status=SUCCEEDED, diagnostics=,
counters=Counters: 16, File System Counters, FILE: BYTES_READ=32, FILE: BYTES_WRITTEN=1042,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter,
SPILLED_RECORDS=100, GC_TIME_MILLIS=9, COMMITTED_HEAP_BYTES=257425408, INPUT_RECORDS_PROCESSED=50,
OUTPUT_RECORDS=100, OUTPUT_BYTES=800, OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0
> 15:59:00,772 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_FINISHED, event=vertexName=null, taskId=task_1408499461970_0035_1_00_000000,
startTime=0, finishTime=1408515534688, timeTaken=1408515534688, status=SUCCEEDED, successfulAttemptID=null,
diagnostics=, counters=Counters: 16, File System Counters, FILE: BYTES_READ=32, FILE: BYTES_WRITTEN=1042,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter,
SPILLED_RECORDS=100, GC_TIME_MILLIS=9, COMMITTED_HEAP_BYTES=257425408, INPUT_RECORDS_PROCESSED=50,
OUTPUT_RECORDS=100, OUTPUT_BYTES=800, OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0
> 15:59:00,774 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_FINISHED, event=vertexName=map, vertexId=vertex_1408499461970_0035_1_00,
initRequestedTime=0, initedTime=0, startRequestedTime=0, startedTime=0, finishTime=1408515534690,
timeTaken=1408515534690, status=SUCCEEDED, diagnostics=, counters=Counters: 16, File System
Counters, FILE: BYTES_READ=32, FILE: BYTES_WRITTEN=1042, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0,
FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=100, GC_TIME_MILLIS=9,
COMMITTED_HEAP_BYTES=257425408, INPUT_RECORDS_PROCESSED=50, OUTPUT_RECORDS=100, OUTPUT_BYTES=800,
OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010, ADDITIONAL_SPILLS_BYTES_WRITTEN=0,
ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, vertexStats=null
> 15:59:00,775 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_STARTED, event=vertexName=null, taskId=task_1408499461970_0035_1_01_000000,
scheduledTime=1408515534698, launchTime=1408515534698
> 15:59:00,775 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_ATTEMPT_STARTED, event=vertexName=null, taskAttemptId=attempt_1408499461970_0035_1_01_000000_0,
startTime=1408515534718, containerId=container_1408499461970_0035_01_000002, nodeId=192.168.3.4:50974,
inProgressLogs=null, completedLogs=null
> 15:59:00,777 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_FINISHED, event=vertexName=null, taskId=task_1408499461970_0035_1_00_000000,
startTime=0, finishTime=1408515540309, timeTaken=1408515540309, status=SUCCEEDED, successfulAttemptID=null,
diagnostics=, counters=Counters: 16, File System Counters, FILE: BYTES_READ=32, FILE: BYTES_WRITTEN=1042,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter,
SPILLED_RECORDS=100, GC_TIME_MILLIS=9, COMMITTED_HEAP_BYTES=257425408, INPUT_RECORDS_PROCESSED=50,
OUTPUT_RECORDS=100, OUTPUT_BYTES=800, OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0
> 15:59:00,778 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_FINISHED, event=vertexName=map, vertexId=vertex_1408499461970_0035_1_00,
initRequestedTime=0, initedTime=0, startRequestedTime=0, startedTime=0, finishTime=1408515540344,
timeTaken=1408515540344, status=SUCCEEDED, diagnostics=, counters=Counters: 16, File System
Counters, FILE: BYTES_READ=32, FILE: BYTES_WRITTEN=1042, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0,
FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=100, GC_TIME_MILLIS=9,
COMMITTED_HEAP_BYTES=257425408, INPUT_RECORDS_PROCESSED=50, OUTPUT_RECORDS=100, OUTPUT_BYTES=800,
OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010, ADDITIONAL_SPILLS_BYTES_WRITTEN=0,
ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, vertexStats=null
> 15:59:00,779 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_ATTEMPT_STARTED, event=vertexName=null, taskAttemptId=attempt_1408499461970_0035_1_01_000000_1,
startTime=1408515542968, containerId=container_1408499461970_0035_02_000002, nodeId=192.168.3.4:50974,
inProgressLogs=null, completedLogs=null
> 15:59:00,801 - Thread( main) - (Limits.java:59) - Counter limits initialized with parameters:
 GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
> 15:59:00,802 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=TASK_ATTEMPT_FINISHED, event=vertexName=null, taskAttemptId=attempt_1408499461970_0035_1_01_000000_1,
startTime=0, finishTime=1408515553528, timeTaken=1408515553528, status=SUCCEEDED, diagnostics=,
counters=Counters: 30, File System Counters, FILE: BYTES_READ=1010, FILE: BYTES_WRITTEN=1010,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter,
REDUCE_INPUT_GROUPS=100, REDUCE_INPUT_RECORDS=100, COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=100,
NUM_SHUFFLED_INPUTS=1, NUM_SKIPPED_INPUTS=0, NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=1,
GC_TIME_MILLIS=11, COMMITTED_HEAP_BYTES=257425408, OUTPUT_RECORDS=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=1010,
ADDITIONAL_SPILLS_BYTES_READ=1010, SHUFFLE_BYTES=1010, SHUFFLE_BYTES_DECOMPRESSED=1006, SHUFFLE_BYTES_TO_MEM=1010,
SHUFFLE_BYTES_TO_DISK=0, NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, Shuffle Errors,
BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0
> 15:59:00,804 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=VERTEX_FINISHED, event=vertexName=reduce, vertexId=vertex_1408499461970_0035_1_01,
initRequestedTime=0, initedTime=0, startRequestedTime=0, startedTime=0, finishTime=1408515553533,
timeTaken=1408515553533, status=SUCCEEDED, diagnostics=, counters=Counters: 30, File System
Counters, FILE: BYTES_READ=1010, FILE: BYTES_WRITTEN=1010, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0,
FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter, REDUCE_INPUT_GROUPS=100, REDUCE_INPUT_RECORDS=100,
COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=100, NUM_SHUFFLED_INPUTS=1, NUM_SKIPPED_INPUTS=0,
NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=1, GC_TIME_MILLIS=11, COMMITTED_HEAP_BYTES=257425408,
OUTPUT_RECORDS=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=1010, ADDITIONAL_SPILLS_BYTES_READ=1010,
SHUFFLE_BYTES=1010, SHUFFLE_BYTES_DECOMPRESSED=1006, SHUFFLE_BYTES_TO_MEM=1010, SHUFFLE_BYTES_TO_DISK=0,
NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0, Shuffle Errors, BAD_ID=0, CONNECTION=0,
IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0, vertexStats=null
> 15:59:00,805 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=DAG_COMMIT_STARTED, event=dagID=dag_1408499461970_0035_1
> 15:59:00,810 - Thread( main) - (RecoveryParser.java:312) - Parsed event from recovery
stream, eventType=DAG_FINISHED, event=dagId=dag_1408499461970_0035_1, startTime=0, finishTime=1408515553540,
timeTaken=1408515553540, status=SUCCEEDED, diagnostics=, counters=Counters: 36, org.apache.tez.common.counters.DAGCounter,
TOTAL_LAUNCHED_TASKS=3, File System Counters, FILE: BYTES_READ=1042, FILE: BYTES_WRITTEN=2052,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter,
REDUCE_INPUT_GROUPS=100, REDUCE_INPUT_RECORDS=100, COMBINE_INPUT_RECORDS=0, SPILLED_RECORDS=200,
NUM_SHUFFLED_INPUTS=1, NUM_SKIPPED_INPUTS=0, NUM_FAILED_SHUFFLE_INPUTS=0, MERGED_MAP_OUTPUTS=1,
GC_TIME_MILLIS=20, COMMITTED_HEAP_BYTES=514850816, INPUT_RECORDS_PROCESSED=50, OUTPUT_RECORDS=100,
OUTPUT_BYTES=800, OUTPUT_BYTES_WITH_OVERHEAD=1006, OUTPUT_BYTES_PHYSICAL=1010, ADDITIONAL_SPILLS_BYTES_WRITTEN=1010,
ADDITIONAL_SPILLS_BYTES_READ=1010, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_BYTES=1010, SHUFFLE_BYTES_DECOMPRESSED=1006,
SHUFFLE_BYTES_TO_MEM=1010, SHUFFLE_BYTES_TO_DISK=0, NUM_MEM_TO_DISK_MERGES=0, NUM_DISK_TO_DISK_MERGES=0,
Shuffle Errors, BAD_ID=0, CONNECTION=0, IO_ERROR=0, WRONG_LENGTH=0, WRONG_MAP=0, WRONG_REDUCE=0
> 15:59:00,810 - Thread( main) - (RecoveryParser.java:309) - Reached end of stream
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message