Return-Path: X-Original-To: apmail-hadoop-hdfs-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3D32717B45 for ; Wed, 25 Feb 2015 14:22:21 +0000 (UTC) Received: (qmail 61473 invoked by uid 500); 25 Feb 2015 14:22:15 -0000 Delivered-To: apmail-hadoop-hdfs-user-archive@hadoop.apache.org Received: (qmail 61348 invoked by uid 500); 25 Feb 2015 14:22:15 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 61337 invoked by uid 99); 25 Feb 2015 14:22:14 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Feb 2015 14:22:14 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of igor.bogomolov@gmail.com designates 209.85.192.42 as permitted sender) Received: from [209.85.192.42] (HELO mail-qg0-f42.google.com) (209.85.192.42) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Feb 2015 14:21:49 +0000 Received: by mail-qg0-f42.google.com with SMTP id z107so3156348qgd.1 for ; Wed, 25 Feb 2015 06:21:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=YCtc+fLYUq5i/Qflfg/Es6GASDc9UCl0L2cW+3gTc2E=; b=zO+YwBO9iGmH2PRcq+2vkIePEH0KgcSog5C3zvOireSPiL768iWb6kjs0BIMkfPzGP ssi4WoWgFpeKTB1sXRBJhNwNwaSc7SRfIHe3yMdbjEtO0VBdin0fLmE/JASBaXRKUZ5j dSQSri2yW40gbuSgQXdAIfcpdLQ5o98vYo0ureob0mefG6PBTQGbeXW6gl4Osp08oDzx QcqRM+9xA9UpxyJrlU96I+I45yWLoeXYRDKlIG6tN3d/0fVDQzezHoKqknj0Tqqv1hOf 
cLNYMt5E2YfUfGYf1pkuob30TRGqMV7E+2evKTpEXlYPPGCW6tGrOS4aJBaV39Y130eR uKOQ== MIME-Version: 1.0 X-Received: by 10.140.93.73 with SMTP id c67mr6921862qge.53.1424874108175; Wed, 25 Feb 2015 06:21:48 -0800 (PST) Received: by 10.140.41.40 with HTTP; Wed, 25 Feb 2015 06:21:47 -0800 (PST) In-Reply-To: References: Date: Wed, 25 Feb 2015 15:21:47 +0100 Message-ID: Subject: Re: tracking remote reads in datanode logs From: Igor Bogomolov To: =?UTF-8?B?RHJha2Xrr7zsmIHqt7w=?= Cc: "user@hadoop.apache.org" Content-Type: multipart/alternative; boundary=001a1139b1c0116278050fea5b72 X-Virus-Checked: Checked by ClamAV on apache.org --001a1139b1c0116278050fea5b72 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Thanks a lot! Igor On Tue, Feb 24, 2015 at 11:46 PM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC wrote: > Hi, Igor > > The AM logs are in the Hdfs if you set log aggregation property. > Otherwise, they are in the container log directory. See this: > http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-acces= s-in-yarn/ > > Thanks > > 2015=EB=85=84 2=EC=9B=94 25=EC=9D=BC =EC=88=98=EC=9A=94=EC=9D=BC, Igor Bo= gomolov=EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1=ED= =95=9C =EB=A9=94=EC=8B=9C=EC=A7=80: > > Hi Drake, >> >> Thanks for a pointer. AM log indeed have information about remote map >> tasks. But I'd like to have more low level details. Like on which node e= ach >> map task was scheduled and how many bytes was read. That should be exact= ly >> in datanode log and I saw it for another job. But after I reinstall the >> cluster it's not there anymore :( >> >> Could you please tell the path where AM log is located (from which you >> copied the lines)? I found it in web interface but not as file on a disk= . >> And nothing in /var/log/hadoop-* >> >> Thanks, >> Igor >> >> On Tue, Feb 24, 2015 at 1:51 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC wrote: >> >>> I found this in the mapreduce am log. 
>>> >>> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator] >>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before >>> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:= 0 >>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 >>> HostLocal:0 RackLocal:0 >>> .. >>> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] >>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After >>> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:= 5 >>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 >>> HostLocal:3 RackLocal:2 >>> .. >>> >>> The first line says Map tasks are 5 and second says HostLocal 3 and Rac= k >>> Local 2. I think the Rack Local 2 are the remote map tasks as you menti= oned >>> before. >>> >>> >>> Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D >>> kt NexR >>> >>> On Tue, Feb 24, 2015 at 9:45 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC wrote: >>> >>>> Hi, Igor >>>> >>>> Did you look at the mapreduce application master log? I think the loca= l >>>> or rack local map tasks are logged in the MapReduce AM log. >>>> >>>> Good luck. >>>> >>>> Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D >>>> kt NexR >>>> >>>> On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov < >>>> igor.bogomolov@gmail.com> wrote: >>>> >>>>> Hi all, >>>>> >>>>> In a small cluster of 5 nodes that run CDH 5.3.0 (Hadoop 2.5.0) I >>>>> want to know how many remote map tasks (ones that read input data fro= m >>>>> remote nodes) there are in a mapreduce job. For this purpose I took l= ogs of >>>>> each datanode an looked for lines with "op: HDFS_READ" and cliID >>>>> field that contains map task id. >>>>> >>>>> Surprisingly, 4 datanode logs does not contain lines with "op: HDFS_R= EAD". >>>>> Another 1 has many lines with "op: HDFS_READ" but all cliID look like >>>>> DFSClient_NONMAPREDUCE_* and does not contain any map task id. 
>>>>> >>>>> I concluded there are no remote map tasks but that does not look >>>>> correct. Also even local reads are not logged (because there is no li= ne >>>>> where cliID field contains some map task id). Could anyone please >>>>> explain what's wrong? Why logging is not working? (I use default sett= ings). >>>>> >>>>> Chris, >>>>> >>>>> Found HADOOP-3062 >>>>> that you have implemented. Thought you might have an explanation. >>>>> >>>>> Best, >>>>> Igor >>>>> >>>>> >>>> >>> >> > > -- > Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D > kt NexR > > --001a1139b1c0116278050fea5b72 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:= 46 PM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC <drake.min@nexr.com> wrote:
Hi, Igor

The AM logs are in HDFS if you set the log aggregation property. Otherwise, they are in the container log directory. See this: http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
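For the aggregated case, a minimal sketch of pulling the logs back out with the `yarn logs -applicationId` command, wrapped in Python. It assumes the `yarn` CLI is on the PATH, that log aggregation is enabled (`yarn.log-aggregation-enable` set to `true`), and that the application has already finished. The helper names and the application id in the usage note are hypothetical.

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the `yarn logs` command line for one application."""
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_aggregated_logs(application_id):
    """Run `yarn logs` and return everything it prints.

    Assumes the `yarn` CLI is installed and log aggregation is enabled;
    raises CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        yarn_logs_command(application_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `fetch_aggregated_logs("application_1424700000000_0001")` would return the logs of all containers of that (made-up) application as one string, which can then be searched for the AM's RMContainerAllocator lines.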

Thanks

On Wednesday, February 25, 2015, Igor Bogomolov <igor.bogomolov@gmail.com> wrote:

Hi Drake,

Thanks for the pointer. The AM log does indeed have information about remote map tasks, but I'd like more low-level details, such as which node each map task was scheduled on and how many bytes were read. That should be exactly what the datanode log records, and I saw it there for another job. But after I reinstalled the cluster it's not there anymore :(

Could you please tell me the path where the AM log is located (the one you copied the lines from)? I found it in the web interface but not as a file on disk, and there is nothing in /var/log/hadoop-*
Thanks,
Igor

On Tue, Feb 24, 2015 at 1:51 AM, Drake=EB=AF=BC=EC=98= =81=EA=B7=BC <drake.min@nexr.com> wro= te:
I found this in the MapReduce AM log.

2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
..
2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 HostLocal:3 RackLocal:2
..

The first line says there are 5 map tasks, and the second says HostLocal is 3 and RackLocal is 2. I think the 2 RackLocal tasks are the remote map tasks you mentioned before.
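To read these counters without eyeballing the log, here is a small sketch that parses one such line into a dictionary. It assumes the `Name:number` layout of the two lines quoted above; the function name is hypothetical, and the example input is the "After Scheduling" line from this message.

```python
import re

# One "Name:number" counter, e.g. "HostLocal:3".
COUNTER = re.compile(r"(\w+):(\d+)")


def parse_scheduling_line(line):
    """Return the counters after 'Scheduling:' as a dict, or None if absent."""
    marker = "Scheduling:"
    if marker not in line:
        return None
    # Only look at the text after the marker so the timestamp is ignored.
    tail = line.split(marker, 1)[1]
    return {name: int(value) for name, value in COUNTER.findall(tail)}


line = (
    "2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
    "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 "
    "AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 "
    "HostLocal:3 RackLocal:2"
)
counters = parse_scheduling_line(line)
print(counters["HostLocal"], counters["RackLocal"])
```

This only gives per-job totals; it does not say which node a given map task ran on.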


Drake 민영근 Ph.D
kt NexR
On Tue, Feb 24, 2015 at 9:4= 5 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC <drake.min@n= exr.com> wrote:
Hi, Igor

Did you look at the MapReduce application master log? I think the local and rack-local map tasks are logged in the MapReduce AM log.

Good luck.

Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov <igor.bogomolov@gmail.com> wrote:
Hi all,

In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to know how many remote map tasks (ones that read input data from remote nodes) there are in a MapReduce job. For this purpose I took the logs of each datanode and looked for lines with "op: HDFS_READ" and a cliID field that contains a map task id.

Surprisingly, 4 of the datanode logs do not contain any lines with "op: HDFS_READ". The remaining one has many lines with "op: HDFS_READ", but every cliID looks like DFSClient_NONMAPREDUCE_* and does not contain a map task id.
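The search described above can be sketched roughly as follows. The sample lines are made up for illustration and are not taken from a real datanode log; the sketch assumes that each entry carries a `cliID: <value>` field terminated by a comma or whitespace, and that a map task's client id embeds an attempt id of the form `attempt_<cluster ts>_<job>_m_<task>_<try>`. Both the function name and the category labels are hypothetical.

```python
import re
from collections import Counter

# Assumed: the client id follows "cliID: " and ends at a comma or whitespace.
CLI_ID = re.compile(r"cliID: ([^,\s]+)")
# Assumed: map attempt ids contain "_m_", e.g. attempt_1424700000000_0001_m_000002_0.
MAP_ATTEMPT = re.compile(r"attempt_\d+_\d+_m_\d+_\d+")


def classify_reads(lines):
    """Count HDFS_READ entries by the kind of client that issued them."""
    counts = Counter()
    for line in lines:
        if "op: HDFS_READ" not in line:
            continue  # skip writes and unrelated log lines
        match = CLI_ID.search(line)
        cli_id = match.group(1) if match else ""
        if MAP_ATTEMPT.search(cli_id):
            counts["map_task"] += 1
        elif cli_id.startswith("DFSClient_NONMAPREDUCE_"):
            counts["non_mapreduce"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical entries, shaped like the fields mentioned in this thread.
sample = [
    "bytes: 1024, op: HDFS_READ, cliID: DFSClient_attempt_1424700000000_0001_m_000002_0_-123456_1, offset: 0",
    "bytes: 2048, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-987654_1, offset: 0",
    "bytes: 4096, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-987654_1, offset: 0",
]
counts = classify_reads(sample)
print(dict(counts))
```

On the logs described here, such a count would show only `non_mapreduce` entries on one node and nothing on the other four.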

I concluded there are no remote map tasks, but that does not look correct. Also, even local reads are not logged (there is no line where the cliID field contains a map task id). Could anyone please explain what's wrong? Why is logging not working? (I use the default settings.)

Chris,

Found HADOOP-3062 that you have implemented. Thought you might have an explanation.
Best,
Igor






--
Drake 민영근 Ph.D
kt NexR

