hadoop-yarn-dev mailing list archives

From Rohith Sharma K S <rohithsharm...@huawei.com>
Subject RE: Reducers are launched after jobClient is exited.
Date Thu, 30 Jan 2014 02:47:46 GMT
Thank you, Vinod, for your reply.


	I used FileOutputCommitter (the default) to commit files to the job output directory. I noticed
that no commit/abort happens for the Reducer task when the Reducer is killed by the NodeManager
(stopContainer request). To reproduce this, I manually killed the Reducer task (with both "kill"
and "kill -9") and ended up with the same issue.

I walked through the YarnChild class and found that no shutdownHook is registered.

Why is there no shutdownHook for YarnChild? Is this intentional?
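
For reference, here is a minimal sketch of registering a JVM shutdown hook in plain Java. This is not YarnChild code: the class name, the cleanup() method and its body are hypothetical stand-ins for whatever task-level abort would be needed. Note that a hook only runs on a normal exit or a catchable signal such as SIGTERM (plain "kill"). SIGKILL ("kill -9") terminates the JVM without running any hooks, so a hook could at best help in the first case.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownHookSketch {
    // Flipped to true once the (hypothetical) cleanup has run.
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Hypothetical cleanup; a real task would abort its uncommitted output here.
    static void cleanup() {
        cleanedUp.set(true);
        System.err.println("shutdown hook: aborting task output");
    }

    // Registers the hook with the JVM and returns it so callers can remove it again.
    static Thread registerCleanupHook() {
        Thread hook = new Thread(ShutdownHookSketch::cleanup, "task-cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerCleanupHook();
        System.out.println("sleeping; send SIGTERM or SIGKILL to this JVM to compare");
        Thread.sleep(60_000);
    }
}
```

Running main and sending plain "kill" to the process should print the hook message before the JVM exits; with "kill -9" nothing is printed.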



-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com] 
Sent: 29 January 2014 12:17
To: yarn-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; mapreduce-user@hadoop.apache.org; yarn-user@hadoop.apache.org
Subject: Re: Reducers are launched after jobClient is exited.

MapReduce AppMaster and YARN at large use asynchronous event handling inside the JVM and so
you may run into race conditions like this.
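
A toy model of this kind of race, for illustration only. It does not claim to reflect how RMContainerAllocator or JobImpl are actually implemented; every name in it (AllocatorRaceSketch, heartbeat, run) is hypothetical. The sketch assumes a reduce request was queued before the job failure was handled, and shows that an allocator heartbeat which does not re-check the job state will still launch it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AllocatorRaceSketch {
    enum JobState { RUNNING, FAILED }

    // Written by the "dispatcher" side, read by the "allocator" side.
    volatile JobState jobState = JobState.RUNNING;
    // Reduce requests queued before the failure was known.
    final Queue<String> pendingReduces = new ArrayDeque<>();
    final List<String> launched = new ArrayList<>();

    // One allocator heartbeat. With checkState=false it never looks at
    // jobState, so already-queued reduces are launched after the failure.
    void heartbeat(boolean checkState) {
        if (checkState && jobState != JobState.RUNNING) {
            pendingReduces.clear();   // drop work for a finished job
            return;
        }
        String reduce;
        while ((reduce = pendingReduces.poll()) != null) {
            launched.add(reduce);
        }
    }

    // Replays the ordering seen in the log: the job fails first,
    // the allocator heartbeat runs afterwards.
    static List<String> run(boolean checkState) {
        AllocatorRaceSketch s = new AllocatorRaceSketch();
        s.pendingReduces.add("r_000000");
        s.jobState = JobState.FAILED;
        s.heartbeat(checkState);
        return s.launched;
    }
}
```

In this model run(false) launches the queued reduce even though the job has already failed, while run(true) launches nothing.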

Even otherwise, doing this in a deterministic manner is better achieved by overriding your
OutputCommitter. Job output commit/abort happens only once.

+Vinod

On Jan 28, 2014, at 7:06 PM, Rohith Sharma K S <rohithsharmaks@huawei.com> wrote:

> Hi All ,
> 
> I ran a job with 1 Map and 1 Reducer
> (mapreduce.job.reduce.slowstart.completedmaps=1). The Map failed
> (because of an error in the Mapper implementation), but the Reducer was
> still launched by the ApplicationMaster. The Reducer was then killed by
> the ApplicationMaster while stopping the RMCommunicator service.
> 
> 
> 1. Why are Reducers launched after the job has finished? (Is this a bug in MR?)
> 
> 
> 
> Our use case: when the job finishes (succeeded/failed), the client
> program deletes the job output directory. Here, the job client exits
> immediately after the job status is set (in the log below, at
> 2014-01-23 07:34:43,166).
> 
> 
> 
> But, as the log below shows, the Reducers are launched later, and the
> Reducer's temporary directory and files (_temporary) are created. These
> files are left in HDFS undeleted forever.
>
> Kindly suggest how we can handle this situation.
> 
> 
> 
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1389970937094_0047_m_000000 Task Transitioned from RUNNING to 
> FAILED
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed 
> Tasks: 1
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as 
> tasks failed. failedMaps:1 failedReduces:0
> 2014-01-23 07:34:43,153 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1389970937094_0047Job Transitioned from RUNNING to FAIL_ABORT
> 2014-01-23 07:34:43,153 INFO [CommitterEvent Processor #0] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: 
> Processing the event EventType: JOB_ABORT
> 2014-01-23 07:34:43,166 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
> job_1389970937094_0047Job Transitioned from FAIL_ABORT to FAILED
> ...............
> ...............
> 2014-01-23 07:34:43,707 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 
> AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 
> ContAlloc:4 ContRel:0 HostLocal:1 RackLocal:0
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
> Recalculating schedule, headroom=12288
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
> Reduce slow start threshold reached. Scheduling reduces.
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
> All maps assigned. Ramping up all remaining reduces:1
> ...............
> ...............
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got 
> allocated containers 1
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> to reduce
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1389970937094_0047_01_000006 to 
> attempt_1389970937094_0047_r_000000_0
> ...............
> ...............
> 2014-01-23 07:34:45,724 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from 
> UNASSIGNED to ASSIGNED
> 2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
> Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container 
> container_1389970937094_0047_01_000006 taskAttempt 
> attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
> Launching attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:45,727 INFO [ContainerLauncher #8] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
> Shuffle port returned by ContainerManager for 
> attempt_1389970937094_0047_r_000000_0 : 11234
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> TaskAttempt: [attempt_1389970937094_0047_r_000000_0] using 
> containerId: [container_1389970937094_0047_01_000006 on NM: 
> [linux85:11232]
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from 
> ASSIGNED to RUNNING
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
> task_1389970937094_0047_r_000000 Task Transitioned from SCHEDULED to
> RUNNING
> ...............
> .............
> 2014-01-23 07:34:48,178 INFO [Thread-59] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: 
> KILLING attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:48,180 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from
> RUNNING to KILLED
> ...............
> .............
> 
> 
> Thanks & Regards
> Rohith


