From: Rohith Sharma K S <rohithsharmaks@huawei.com>
To: "yarn-dev@hadoop.apache.org" <yarn-dev@hadoop.apache.org>
CC: "mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>,
        "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>,
        "yarn-user@hadoop.apache.org" <yarn-user@hadoop.apache.org>
Subject: RE: Reducers are launched after jobClient is exited.
Date: Thu, 30 Jan 2014 02:47:46 +0000

Thank you, Vinod, for your reply.


I used FileOutputCommitter (the default) to commit files to the job output
directory. I noticed that no commit/abort happens for the Reducer task when
the Reducer is killed by the NodeManager (stopContainer request). To
reproduce this, I manually killed the Reducer task (with "kill" and
"kill -9") and ended up with the same issue.

I walked through the YarnChild class and found that no shutdownHook is
registered.

Why is there no shutdownHook for YarnChild? Is this intentional?
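
For reference, here is a minimal plain-JVM sketch of the kind of shutdown hook I have in mind. It is not YarnChild code: the class name CleanupHookSketch, the helper names, and the local temp directory standing in for the task's _temporary directory are all made up for illustration. It also shows why a hook alone could not cover my "kill -9" test: the JVM runs shutdown hooks on a normal exit or SIGTERM ("kill"), but SIGKILL ("kill -9") cannot be caught, so no hook runs in that case.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class CleanupHookSketch {

    /** Deletes dir and everything under it; does nothing if dir is absent. */
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder())
                .map(Path::toFile)
                .forEach(File::delete);
        }
    }

    /** Registers a JVM shutdown hook that removes dir, and returns the hook thread. */
    public static Thread registerCleanupHook(Path dir) {
        Thread hook = new Thread(() -> {
            try {
                deleteRecursively(dir);
            } catch (IOException e) {
                System.err.println("Cleanup failed for " + dir + ": " + e);
            }
        }, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in for a task's temporary output directory (assumption).
        Path tmp = Files.createTempDirectory("_temporary");
        Files.createFile(tmp.resolve("part-r-00000"));
        registerCleanupHook(tmp);
        System.out.println("Created " + tmp + "; now kill this process.");
        Thread.sleep(60_000); // time window for sending a signal
    }
}
```

Running main and sending "kill <pid>" should remove the directory, while "kill -9 <pid>" leaves it behind, which matches what I saw with the Reducer.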


-----Original Message-----
From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: 29 January 2014 12:17
To: yarn-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; mapreduce-user@hadoop.apache.org; yarn-user@hadoop.apache.org
Subject: Re: Reducers are launched after jobClient is exited.

MapReduce AppMaster and YARN at large use asynchronous event handling inside
the JVM and so you may run into race conditions like this.

Even otherwise, doing this in a deterministic manner is better achieved by
overriding your OutputCommitter. Job output commit/abort happens only once.

+Vinod

On Jan 28, 2014, at 7:06 PM, Rohith Sharma K S <rohithsharmaks@huawei.com> wrote:

> Hi All,
>
> I ran a job with 1 Map and 1 Reducer
> (mapreduce.job.reduce.slowstart.completedmaps=1). The Map failed
> (because of an error in the Mapper implementation), but the Reducer was
> still launched by the ApplicationMaster. The Reducer was then killed by
> the ApplicationMaster while stopping the RMCommunicator service.
>
> 1. Why are Reducers launched after the job has finished? (Is this a bug
>    in MR?)
>
> Our use case is that when the job finishes (succeeded/failed), the client
> program deletes the job output directory. Here, the job client exits
> immediately after the job status is set (in the log below, at
> 2014-01-23 07:34:43,166).
>
> But, as the log below shows, the Reducer is launched later, and the
> Reducer's temporary directory and files (_temporary) are created. These
> files are left in HDFS undeleted forever.
>
> Kindly share your thoughts on how we can handle this situation.
>
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1389970937094_0047_m_000000 Task Transitioned from RUNNING to FAILED
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
> 2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as tasks failed. failedMaps:1 failedReduces:0
> 2014-01-23 07:34:43,153 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1389970937094_0047Job Transitioned from RUNNING to FAIL_ABORT
> 2014-01-23 07:34:43,153 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_ABORT
> 2014-01-23 07:34:43,166 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1389970937094_0047Job Transitioned from FAIL_ABORT to FAILED
> ...............
> ...............
> 2014-01-23 07:34:43,707 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:4 ContRel:0 HostLocal:1 RackLocal:0
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=12288
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold reached. Scheduling reduces.
> 2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. Ramping up all remaining reduces:1
> ...............
> ...............
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1389970937094_0047_01_000006 to attempt_1389970937094_0047_r_000000_0
> ...............
> ...............
> 2014-01-23 07:34:45,724 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
> 2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1389970937094_0047_01_000006 taskAttempt attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:45,727 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port returned by ContainerManager for attempt_1389970937094_0047_r_000000_0 : 11234
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1389970937094_0047_r_000000_0] using containerId: [container_1389970937094_0047_01_000006 on NM: [linux85:11232]
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> 2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1389970937094_0047_r_000000 Task Transitioned from SCHEDULED to RUNNING
> ...............
> .............
> 2014-01-23 07:34:48,178 INFO [Thread-59] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1389970937094_0047_r_000000_0
> 2014-01-23 07:34:48,180 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned from RUNNING to KILLED
> ...............
> .............
>
>
> Thanks & Regards
> Rohith
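
The ordering in the quoted AppMaster log can be checked mechanically. The following is a rough sketch, using only the JDK, that scans "Transitioned from X to Y" lines and reports how long after the job reached FAILED the first task attempt reached RUNNING. The class and method names are hypothetical, and the regular expression only assumes the log layout visible in the excerpt above; it is not a general Hadoop log parser.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log excerpt; not part of Hadoop.
public class TransitionOrderCheck {

    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Groups: 1 = timestamp, 2 = entity id, 3 = old state, 4 = new state.
    private static final Pattern TRANSITION = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*: "
        + "(\\S+)(?: \\w+)? Transitioned from (\\w+) to (\\w+)");

    /**
     * Returns the time between the job reaching FAILED and the first task
     * attempt reaching RUNNING after that, or null if no such pair exists.
     */
    public static Duration gapAfterJobFailed(List<String> lines) {
        LocalDateTime jobFailedAt = null;
        for (String line : lines) {
            Matcher m = TRANSITION.matcher(line);
            if (!m.find()) {
                continue; // not a state-transition line
            }
            LocalDateTime at = LocalDateTime.parse(m.group(1), TS);
            String entity = m.group(2);
            String newState = m.group(4);
            if (entity.startsWith("job_") && newState.equals("FAILED")) {
                jobFailedAt = at;
            } else if (jobFailedAt != null
                    && entity.startsWith("attempt_")
                    && newState.equals("RUNNING")) {
                return Duration.between(jobFailedAt, at);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two lines copied from the excerpt above, with the leading "> " removed.
        List<String> lines = Arrays.asList(
            "2014-01-23 07:34:43,166 INFO [AsyncDispatcher event handler] "
                + "org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: "
                + "job_1389970937094_0047Job Transitioned from FAIL_ABORT to FAILED",
            "2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] "
                + "org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: "
                + "attempt_1389970937094_0047_r_000000_0 TaskAttempt Transitioned "
                + "from ASSIGNED to RUNNING");
        Duration gap = gapAfterJobFailed(lines);
        System.out.println(gap == null
            ? "No attempt started after job FAILED"
            : "Attempt RUNNING " + gap.toMillis() + " ms after job FAILED");
    }
}
```

For the two lines shown, the gap is about 2.5 seconds, which is the window in which the job client has already exited and the Reducer's _temporary files get created.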

