hadoop-mapreduce-user mailing list archives

From Sergey <ser...@gmail.com>
Subject Re: Re-execution of map task
Date Wed, 11 Nov 2015 19:02:27 GMT
Hi Varun,
thank you!

But the main question is still the reason...

I did check the log files with yarn logs, as well as the local logs on the nodes,
and I don't see any reason why Hadoop would consider the nodes unusable.
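
(For reference, I pulled the aggregated container logs with the usual command,
roughly:

    yarn logs -applicationId application_1447029285980_0001

where the application id is the one embedded in the attempt ids quoted below.)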


Regards,
Sergey.




2015-11-11 21:23 GMT+03:00 Varun Saxena <vsaxena.varun@gmail.com>:

> Hi Sergey,
>
> This indicates that one or more of your Node Managers may have gone down.
> The RM indicates this to the AM in the allocate response.
> If a map task ran on such a node, its output is considered unusable, even
> though the map task was previously marked as successful.
> Such a map task is then KILLED and a new attempt is launched.
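>
> As a rough illustration (this is not the actual MR ApplicationMaster code,
> just a sketch of the public API involved), an AM can observe these node
> updates on the allocate response along these lines:
>
>     // Sketch only: the class and method names are made up.
>     import java.util.List;
>     import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
>     import org.apache.hadoop.yarn.api.records.NodeReport;
>     import org.apache.hadoop.yarn.api.records.NodeState;
>
>     public class UpdatedNodeCheck {
>       static void logUnusableNodes(AllocateResponse response) {
>         List<NodeReport> updated = response.getUpdatedNodes();
>         for (NodeReport node : updated) {
>           NodeState state = node.getNodeState();
>           if (state == NodeState.UNHEALTHY || state == NodeState.LOST) {
>             // Map output stored on this node is treated as lost, so even
>             // previously SUCCEEDED map attempts from it get rescheduled.
>             System.out.println("Node became unusable: " + node.getNodeId());
>           }
>         }
>       }
>     }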
>
> Regards,
> Varun Saxena.
>
> On Wed, Nov 11, 2015 at 11:44 PM, Sergey <sergun@gmail.com> wrote:
>
>> Hi,
>>
>> yes, there are several "failed" map attempts, caused by the 600-second time-out.
>>
>> I also found a lot of messages like this in the log:
>>
>> 2015-11-09 22:00:35,882 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node 10.0.0.5:30050. AttemptId:attempt_1447029285980_0001_m_000043_1
>>
>> Different nodes get marked as unusable very often.
>>
>> Do you know anything about a possible reason? Maybe changing some time-out
>> parameters for the communication between the nodes could help (see the sketch
>> at the end of this message for the settings I have in mind)?
>>
>> As I already said, I am working in the cloud, on Azure HDInsight.
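>>
>> For concreteness, a rough sketch of the two 600-second defaults I have in
>> mind (the class name is made up; the property names and the 600000 ms values
>> are the stock Hadoop 2.x defaults):
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.mapreduce.Job;
>>
>>     public class TimeoutKnobs {
>>       public static void main(String[] args) throws Exception {
>>         Configuration conf = new Configuration();
>>
>>         // Per-job: an attempt fails if it reports no progress for this long
>>         // (default 600000 ms, i.e. the 600 sec time-out mentioned above).
>>         Job job = Job.getInstance(conf, "timeout-example");
>>         job.getConfiguration().setLong("mapreduce.task.timeout", 600000L);
>>
>>         // Cluster-side (yarn-site.xml on the RM): how long the RM waits for
>>         // NodeManager heartbeats before marking the node as lost/unusable
>>         // (default 600000 ms). Printed here only to show the property name.
>>         System.out.println(conf.getLong(
>>             "yarn.nm.liveness-monitor.expiry-interval-ms", 600000L));
>>       }
>>     }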
>>
>>
>>
>>
>>
>>
>> 2015-11-11 20:33 GMT+03:00 Namikaze Minato <lloydsensei@gmail.com>:
>>
>>> Hi.
>>>
>>> Do you also have "failed" map attempts?
>>> Killed map attempts won't help us understand why your job is failing.
>>>
>>> Regards,
>>> LLoyd
>>>
>>>
>>> On 11 November 2015 at 16:37, Sergey <sergun@gmail.com> wrote:
>>> >
>>> > Hi experts!
>>> >
>>> > I am seeing strange behaviour from Hadoop while executing my tasks.
>>> > It re-runs a task attempt which has already completed with SUCCEEDED status
>>> > (see the log below for attempt_1447029285980_0001_m_000012_0).
>>> >
>>> > I don't know why, but this task repeats with attempt numbers 0, 1, 2, 3, 4
>>> > and then 2000.
>>> >
>>> > The same happens with some other tasks.
>>> > I also see on screen, after a task has executed, that the map progress
>>> > sometimes decreases...
>>> >
>>> > I don't use preemption or speculative execution, and I don't see any
>>> > exceptions or time-outs in the YARN log (except the last line,
>>> > "Container killed on request. Exit code is 143").
>>> >
>>> > How can I catch the reason?
>>> >
>>> > I use version 2.6.0 in the Azure cloud (HDInsight).
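>>> >
>>> > (For completeness, a rough sketch of what I mean by not using speculative
>>> > execution; the class name is just an example, the two property names are
>>> > the standard ones:)
>>> >
>>> >     import org.apache.hadoop.conf.Configuration;
>>> >     import org.apache.hadoop.mapreduce.Job;
>>> >
>>> >     public class NoSpeculationJob {
>>> >       public static void main(String[] args) throws Exception {
>>> >         Job job = Job.getInstance(new Configuration(), "no-speculation");
>>> >         // Switch speculative execution off for both map and reduce phases.
>>> >         job.getConfiguration().setBoolean("mapreduce.map.speculative", false);
>>> >         job.getConfiguration().setBoolean("mapreduce.reduce.speculative", false);
>>> >       }
>>> >     }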
>>> >
>>> >
>>> > 2015-11-09 19:57:45,584 INFO [IPC Server handler 17 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1447029285980_0001_m_000012_0 is : 1.0
>>> > 2015-11-09 19:57:45,592 INFO [IPC Server handler 12 on 53153] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,592 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e04_1447029285980_0001_01_002951 taskAttempt attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,593 INFO [ContainerLauncher #4] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : 10.0.0.8:30050
>>> > 2015-11-09 19:57:45,906 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1447029285980_0001_m_000012_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1447029285980_0001_m_000012_0
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1447029285980_0001_m_000012 Task Transitioned from RUNNING to SUCCEEDED
>>> > 2015-11-09 19:57:45,907 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 4
>>> > 2015-11-09 19:57:46,553 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:35 ScheduledReds:1 AssignedMaps:8 AssignedReds:0 CompletedMaps:4 CompletedReds:0 ContAlloc:16 ContRel:0 HostLocal:0 RackLocal:16
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e04_1447029285980_0001_01_002951
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
>>> > 2015-11-09 19:57:48,575 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
>>> > 2015-11-09 19:57:48,575 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1447029285980_0001_m_000012_0: Container killed by the ApplicationMaster.
>>> > Container killed on request. Exit code is 143
>>> > Container exited with a non-zero exit code 143
>>>
>>
>>
>
