hadoop-mapreduce-issues mailing list archives

From "Sharad Agarwal (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes
Date Wed, 08 Feb 2012 07:01:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203325#comment-13203325 ]

Sharad Agarwal commented on MAPREDUCE-3802:
-------------------------------------------

bq. need to understand a little bit better how these names are determined
The task attemptIds are unique across all generations of the AM. This prevents a remote
task attempt from a previous AM generation from joining the current AM. The assumption is
that there won't be more than 1000 attempts of a task in a single AM run. The suffix part
of the task attemptId is determined as follows:
 _(AMGeneration-1)*1000
For the first AM it will start from 0, for the second from 1000, for the third from 2000,
and so on.
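
As a rough illustration of the numbering rule above, here is a minimal Java sketch. It is not the actual MR AM code: the class name, method names, and the constant are hypothetical and chosen only for this example. It assumes AM generations are numbered from 1 and that each generation gets a block of 1000 attempt suffixes.

```java
// Hypothetical sketch of the attempt-suffix rule described in the comment.
// None of these names come from the Hadoop code base.
public class AttemptIdSketch {

    // Assumed upper bound on attempts of one task within a single AM run.
    static final int MAX_ATTEMPTS_PER_GENERATION = 1000;

    // First attempt suffix handed out by the given AM generation (1-based).
    static int firstAttemptSuffix(int amGeneration) {
        if (amGeneration < 1) {
            throw new IllegalArgumentException("AM generation must be >= 1");
        }
        return (amGeneration - 1) * MAX_ATTEMPTS_PER_GENERATION;
    }

    // Inverse mapping: which AM generation issued a given attempt suffix.
    static int generationOf(int attemptSuffix) {
        if (attemptSuffix < 0) {
            throw new IllegalArgumentException("attempt suffix must be >= 0");
        }
        return attemptSuffix / MAX_ATTEMPTS_PER_GENERATION + 1;
    }

    // A restarted AM could use the inverse mapping to reject stale attempts.
    static boolean belongsToGeneration(int attemptSuffix, int amGeneration) {
        return generationOf(attemptSuffix) == amGeneration;
    }

    public static void main(String[] args) {
        for (int gen = 1; gen <= 3; gen++) {
            System.out.println("generation " + gen
                + " -> first attempt suffix " + firstAttemptSuffix(gen));
        }
    }
}
```

Under these assumptions, an attempt with suffix 1003 maps to the second AM generation, so a third-generation AM would treat it as stale.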


                
> If an MR AM dies twice it looks like the process freezes
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3802
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>         Attachments: syslog
>
>
> It looks like recovering from an MR AM dying works very well on a single failure.  But
> if it fails multiple times we appear to get into a livelock situation.
> {noformat}
> yarn jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 input output
> 12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
> 12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> 12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 17
> 12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
> 12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
> 12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application application_1328302034486_0003 to ResourceManager at HOST/IP:8040
> 12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: http://HOST:8088/proxy/application_1328302034486_0003/
> 12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
> 12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in uber mode : false
> 12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
> 12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
> 12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
> #KILLED AM with kill -9 here
> 12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
> 12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
> 12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
> 12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
> 12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
> 12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
> 12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
> 12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
> 12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
> 12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
> 12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
> 12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
> 12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
> 12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
> 12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
> #killed AM with kill -9 here
> 12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 0 time(s).
> 12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 1 time(s).
> 12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. Already tried 2 time(s).
> 12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
> #It never makes any more progress...
> {noformat}
