From: Igor Bogomolov
To: user@hadoop.apache.org
Date: Tue, 24 Feb 2015 19:34:47 +0100
Subject: Re: tracking remote reads in datanode logs

Hi Drake,

Thanks for the pointer. The AM log does indeed have information about remote map tasks, but I'd like more low-level details, such as which node each map task was scheduled on and how many bytes were read. That should be exactly what the datanode log records, and I saw it there for another job. But after I reinstalled the cluster it's not there anymore :(

Could you please tell me the path where the AM log is located (the one you copied the lines from)? I found it in the web interface but not as a file on disk, and there is nothing in /var/log/hadoop-*

Thanks,
Igor

On Tue, Feb 24, 2015 at 1:51 AM, Drake 민영근 wrote:

> I found this in the MapReduce AM log.
>
> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
> HostLocal:0 RackLocal:0
> ..
> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
> HostLocal:3 RackLocal:2
> ..
>
> The first line says there are 5 map tasks, and the second says HostLocal 3
> and RackLocal 2. I think the 2 rack-local tasks are the remote map tasks
> you mentioned before.
>
> Drake 민영근 Ph.D
> kt NexR
>
> On Tue, Feb 24, 2015 at 9:45 AM, Drake 민영근 wrote:
>
>> Hi, Igor
>>
>> Did you look at the MapReduce application master log? I think the local
>> and rack-local map tasks are logged in the MapReduce AM log.
>>
>> Good luck.
>>
>> Drake 민영근 Ph.D
>> kt NexR
>>
>> On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov wrote:
>>
>>> Hi all,
>>>
>>> In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0) I want
>>> to know how many remote map tasks (ones that read input data from
>>> remote nodes) there are in a MapReduce job. For this purpose I took the
>>> logs of each datanode and looked for lines with "op: HDFS_READ" and a
>>> cliID field that contains a map task id.
>>>
>>> Surprisingly, 4 datanode logs do not contain any lines with
>>> "op: HDFS_READ". The remaining one has many lines with "op: HDFS_READ",
>>> but every cliID looks like DFSClient_NONMAPREDUCE_* and does not
>>> contain a map task id.
>>>
>>> I concluded that there are no remote map tasks, but that does not look
>>> correct. Also, even local reads are not logged (because there is no
>>> line where the cliID field contains a map task id). Could anyone please
>>> explain what's wrong? Why is logging not working? (I use default
>>> settings.)
>>>
>>> Chris,
>>>
>>> I found HADOOP-3062, which you implemented, and thought you might have
>>> an explanation.
>>>
>>> Best,
>>> Igor
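
For anyone who wants to pull the locality numbers out of the AM log automatically, here is a minimal Python sketch. It only assumes the "Name:number" counter layout visible in the log lines quoted in this thread; the function name parse_locality is made up for illustration. Treating every non-host-local map as a remote read is an interpretation, not something the log states.

```python
import re

# Counters appear as Name:number pairs after the word "Scheduling:".
COUNTER_RE = re.compile(r"(\w+):(\d+)")


def parse_locality(line):
    """Return the counters of a Before/After Scheduling AM log line as a dict.

    Hypothetical helper: returns an empty dict if the line has no
    'Scheduling:' marker.
    """
    _, _, tail = line.partition("Scheduling:")
    return {name: int(value) for name, value in COUNTER_RE.findall(tail)}


# The "After Scheduling" line quoted in this thread.
after = (
    "2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
    "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 "
    "AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 "
    "HostLocal:3 RackLocal:2"
)

counters = parse_locality(after)
# Maps that were assigned but did not run on a host holding their input.
not_host_local = counters["AssignedMaps"] - counters["HostLocal"]
print(counters["HostLocal"], counters["RackLocal"], not_host_local)
```

For the quoted line this gives 3 host-local maps and 2 that were not host-local, which matches the RackLocal count.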
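
The datanode-log search described in the original question could be scripted roughly as below. This is a sketch under assumptions: the thread only confirms the "op: HDFS_READ" marker, the cliID field and the DFSClient_NONMAPREDUCE_* pattern. The position of the "bytes:" field, the shape of a map attempt id inside cliID (attempt_..._m_...) and the three sample lines are assumptions made for illustration and should be checked against a real datanode log.

```python
import re
from collections import Counter

# Assumed field order: "bytes: N, op: HDFS_READ, cliID: X" (verify on a real log).
READ_RE = re.compile(r"bytes: (\d+), op: HDFS_READ, cliID: ([^,\s]+)")
# Assumed shape of a map attempt id embedded in cliID.
MAP_ATTEMPT_RE = re.compile(r"attempt_\d+_\d+_m_\d+_\d+")


def summarize_reads(lines):
    """Sum the bytes of HDFS_READ lines per client kind: 'map' or 'other'."""
    totals = Counter()
    for line in lines:
        match = READ_RE.search(line)
        if not match:
            continue  # not a read trace line
        nbytes, cli_id = int(match.group(1)), match.group(2)
        kind = "map" if MAP_ATTEMPT_RE.search(cli_id) else "other"
        totals[kind] += nbytes
    return totals


# Hypothetical sample lines, not taken from any real cluster.
sample = [
    "src: /10.0.0.1:50010, dest: /10.0.0.2:40001, bytes: 1024, op: HDFS_READ, "
    "cliID: DFSClient_attempt_1424700000000_0001_m_000000_0_1_1, offset: 0",
    "src: /10.0.0.1:50010, dest: /10.0.0.3:40002, bytes: 512, op: HDFS_READ, "
    "cliID: DFSClient_NONMAPREDUCE_123_1, offset: 0",
    "src: /10.0.0.2:40003, dest: /10.0.0.1:50010, bytes: 2048, op: HDFS_WRITE, "
    "cliID: DFSClient_attempt_1424700000000_0001_m_000001_0_1_1, offset: 0",
]

totals = summarize_reads(sample)
print(dict(totals))
```

If a log yields only the "other" bucket, that is the situation described in the question: reads are being traced, but none of them carry a map task id.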