Return-Path: X-Original-To: apmail-hadoop-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1A80A17AAD for ; Tue, 24 Feb 2015 22:48:10 +0000 (UTC) Received: (qmail 92330 invoked by uid 500); 24 Feb 2015 22:47:30 -0000 Delivered-To: apmail-hadoop-user-archive@hadoop.apache.org Received: (qmail 92233 invoked by uid 500); 24 Feb 2015 22:47:30 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 92223 invoked by uid 99); 24 Feb 2015 22:47:30 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Feb 2015 22:47:30 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW X-Spam-Check-By: apache.org Received-SPF: error (nike.apache.org: local policy) Received: from [209.85.218.49] (HELO mail-oi0-f49.google.com) (209.85.218.49) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Feb 2015 22:47:05 +0000 Received: by mail-oi0-f49.google.com with SMTP id v63so70684oia.8 for ; Tue, 24 Feb 2015 14:46:41 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=nMWClWORUwwIZLl0Qj45WAjmJAujniqgzYqOh2klICg=; b=OO5N/3JA72ZZUo64O0B7CSXmUoUiACyqkwVDrkPew7pqWOGfoD7ssfDOhehcqnrp3x lyJi8YP1W2zKoOf0q4mXqkrvj/waFwZwIE1xaDyjpLYsc5e9e9LKSFJTv27lLfV5iEoJ PCx+BkwgRv5ZP8hztXZgJ4/wBFPZ6VwjyAE1LYrnmL3uw8lTdoESFQB0Elx/eoG4u3Ij lenHxLum9USSyHtgtCFHereEn3Y1j9c10057MLVM1j2zlsCtfH6OdkTY8ozEi0zc7CrY WuxeejHhdpeXoGqbynj6bjZMJPreuUxpi006r0z9k1BEC8pfy6zLtJZoXb9o1Fcg00GS qwaw== X-Gm-Message-State: 
ALoCoQlZHHMJq2bklAovz10YuTJc5zaOtf68FA5vJSmyTOKAjgiBVmq8mi5i50f6T6bqZYB6uITy MIME-Version: 1.0 X-Received: by 10.202.74.70 with SMTP id x67mr186968oia.36.1424818001389; Tue, 24 Feb 2015 14:46:41 -0800 (PST) Received: by 10.182.38.132 with HTTP; Tue, 24 Feb 2015 14:46:41 -0800 (PST) In-Reply-To: References: Date: Wed, 25 Feb 2015 07:46:41 +0900 Message-ID: Subject: Re: tracking remote reads in datanode logs From: =?UTF-8?B?RHJha2Xrr7zsmIHqt7w=?= To: Igor Bogomolov Cc: "user@hadoop.apache.org" Content-Type: multipart/alternative; boundary=001a11352e18d7c25e050fdd4a16 X-Virus-Checked: Checked by ClamAV on apache.org --001a11352e18d7c25e050fdd4a16 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, Igor The AM logs are in the Hdfs if you set log aggregation property. Otherwise, they are in the container log directory. See this: http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-= in-yarn/ Thanks 2015=EB=85=84 2=EC=9B=94 25=EC=9D=BC =EC=88=98=EC=9A=94=EC=9D=BC, Igor Bogo= molov=EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1=ED=95= =9C =EB=A9=94=EC=8B=9C=EC=A7=80: > Hi Drake, > > Thanks for a pointer. AM log indeed have information about remote map > tasks. But I'd like to have more low level details. Like on which node ea= ch > map task was scheduled and how many bytes was read. That should be exactl= y > in datanode log and I saw it for another job. But after I reinstall the > cluster it's not there anymore :( > > Could you please tell the path where AM log is located (from which you > copied the lines)? I found it in web interface but not as file on a disk. > And nothing in /var/log/hadoop-* > > Thanks, > Igor > > On Tue, Feb 24, 2015 at 1:51 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC > wrote: > >> I found this in the mapreduce am log. 
>> >> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator] >> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before >> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0 >> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 >> HostLocal:0 RackLocal:0 >> .. >> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] >> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After >> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 >> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 >> HostLocal:3 RackLocal:2 >> .. >> >> The first line says Map tasks are 5 and second says HostLocal 3 and Rack >> Local 2. I think the Rack Local 2 are the remote map tasks as you mentio= ned >> before. >> >> >> Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D >> kt NexR >> >> On Tue, Feb 24, 2015 at 9:45 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC > > wrote: >> >>> Hi, Igor >>> >>> Did you look at the mapreduce application master log? I think the local >>> or rack local map tasks are logged in the MapReduce AM log. >>> >>> Good luck. >>> >>> Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D >>> kt NexR >>> >>> On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov < >>> igor.bogomolov@gmail.com >>> > wrote: >>> >>>> Hi all, >>>> >>>> In a small cluster of 5 nodes that run CDH 5.3.0 (Hadoop 2.5.0) I want >>>> to know how many remote map tasks (ones that read input data from remo= te >>>> nodes) there are in a mapreduce job. For this purpose I took logs of e= ach >>>> datanode an looked for lines with "op: HDFS_READ" and cliID field that >>>> contains map task id. >>>> >>>> Surprisingly, 4 datanode logs does not contain lines with "op: HDFS_RE= AD". >>>> Another 1 has many lines with "op: HDFS_READ" but all cliID look like >>>> DFSClient_NONMAPREDUCE_* and does not contain any map task id. >>>> >>>> I concluded there are no remote map tasks but that does not look >>>> correct. 
Also even local reads are not logged (because there is no lin= e >>>> where cliID field contains some map task id). Could anyone please >>>> explain what's wrong? Why logging is not working? (I use default setti= ngs). >>>> >>>> Chris, >>>> >>>> Found HADOOP-3062 >>>> that you have implemented. Thought you might have an explanation. >>>> >>>> Best, >>>> Igor >>>> >>>> >>> >> > --=20 Drake =EB=AF=BC=EC=98=81=EA=B7=BC Ph.D kt NexR --001a11352e18d7c25e050fdd4a16 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, Igor

The AM logs are in HDFS if you have enabled the log aggregation property. Otherwise, they are in the container log directory. See this: http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
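For reference, here is a minimal sketch of pulling aggregated logs from a script. It assumes the `yarn` command-line client is on the PATH and that log aggregation (the `yarn.log-aggregation-enable` property) is turned on; the helper names and any application id you pass in are hypothetical, not something from this thread.

```python
import re
import subprocess

# YARN application ids have the form application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"^application_\d+_\d+$")


def build_yarn_logs_command(app_id):
    """Return the argv that dumps an application's aggregated logs."""
    if not APP_ID_RE.match(app_id):
        raise ValueError("not a YARN application id: %r" % app_id)
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_aggregated_logs(app_id):
    """Run `yarn logs` and return its output as text.

    Needs a working YARN client; it only returns something useful once
    the application has finished and its logs have been aggregated.
    """
    raw = subprocess.check_output(build_yarn_logs_command(app_id))
    return raw.decode("utf-8", "replace")
```

The command is built separately from the call so that it can be inspected or logged before anything is executed.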

Thanks

On Wednesday, February 25, 2015, Igor Bogomolov <igor.bogomolov@gmail.com> wrote:
Hi Drake,

Thanks for the pointer. The AM log does indeed have information about remote map tasks, but I'd like more low-level details, such as which node each map task was scheduled on and how many bytes were read. That should be exactly what the datanode log contains, and I saw it there for another job. But after I reinstalled the cluster, it's not there anymore :(

Could you please tell me the path where the AM log is located (from which you copied the lines)? I found it in the web interface but not as a file on disk, and there is nothing in /var/log/hadoop-*

Thanks,
Igor

On Tue,= Feb 24, 2015 at 1:51 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC <drake.min@nexr.com> wrote:
I found this in the mapreduce = am log.

2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
..
2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 HostLocal:3 RackLocal:2
..

The first line says there are 5 map tasks to schedule, and the second says HostLocal is 3 and RackLocal is 2. I think the 2 RackLocal tasks are the remote map tasks you mentioned before.
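If it helps, the counters can be pulled out of such a line mechanically. This is only a sketch: it assumes the counters always follow the "Scheduling:" marker as Key:Number pairs, as in the two lines above, and the function name is made up. Note that AssignedMaps minus HostLocal counts every map that was not host-local, which in general may include more than the rack-local ones.

```python
import re

# Matches "Key:123" counter pairs such as "HostLocal:3".
COUNTER_RE = re.compile(r"(\w+):(\d+)")


def parse_allocator_counters(line):
    """Return the counters that follow 'Before/After Scheduling:' as a dict."""
    marker = "Scheduling:"
    idx = line.find(marker)
    if idx < 0:
        return {}
    tail = line[idx + len(marker):]
    return {key: int(value) for key, value in COUNTER_RE.findall(tail)}


after = ("2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
         "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
         "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 "
         "AssignedMaps:5 AssignedReds:0 CompletedMaps:0 CompletedReds:0 "
         "ContAlloc:5 ContRel:0 HostLocal:3 RackLocal:2")

counters = parse_allocator_counters(after)
not_host_local = counters["AssignedMaps"] - counters["HostLocal"]
print(counters["HostLocal"], counters["RackLocal"], not_host_local)  # prints: 3 2 2
```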


Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 9:4= 5 AM, Drake=EB=AF=BC=EC=98=81=EA=B7=BC <drake.min@nexr.com> wrote:
Hi, Igor

Did you look at the MapReduce application master log? I think the host-local and rack-local map tasks are logged in the MapReduce AM log.

Good luck.

Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov <igor.bogomolov@gmail.com> wrote:
Hi all,

In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want to know how many remote map tasks (ones that read input data from remote nodes) there are in a MapReduce job. For this purpose I took the logs of each datanode and looked for lines with "op: HDFS_READ" and a cliID field that contains a map task id.

Surprisingly, 4 of the datanode logs do not contain any lines with "op: HDFS_READ". The remaining one has many lines with "op: HDFS_READ", but every cliID looks like DFSClient_NONMAPREDUCE_* and does not contain a map task id.

I concluded that there are no remote map tasks, but that does not look correct. Even local reads are not logged (because there is no line where the cliID field contains a map task id). Could anyone please explain what's wrong? Why is logging not working? (I use the default settings.)
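The scan described above could be sketched roughly as follows. It assumes each client-trace line carries comma-separated "bytes: N", "op: ..." and "cliID: ..." fields, and that a map task's client id embeds an attempt id containing "_m_"; both are assumptions about the log layout, and the sample lines are invented for illustration, not copied from a real datanode log.

```python
import re
from collections import defaultdict

BYTES_RE = re.compile(r"bytes: (\d+)")
CLI_ID_RE = re.compile(r"cliID: ([^,\s]+)")


def summarize_hdfs_reads(lines):
    """Sum the bytes of HDFS_READ entries, grouped by kind of client id."""
    totals = defaultdict(int)
    for line in lines:
        if "op: HDFS_READ" not in line:
            continue
        cli = CLI_ID_RE.search(line)
        size = BYTES_RE.search(line)
        cli_id = cli.group(1) if cli else ""
        if "NONMAPREDUCE" in cli_id:
            kind = "non-mapreduce"
        elif "_m_" in cli_id:
            kind = "map-task"
        else:
            kind = "other"
        totals[kind] += int(size.group(1)) if size else 0
    return dict(totals)


# Invented sample lines, only to show the expected shape of the input.
sample = [
    "src: /10.0.0.1:50010, dest: /10.0.0.2:40000, bytes: 100, "
    "op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1_1, offset: 0",
    "src: /10.0.0.1:50010, dest: /10.0.0.3:40001, bytes: 200, "
    "op: HDFS_READ, cliID: DFSClient_attempt_1_0001_m_000000_0_2_1, offset: 0",
]
print(summarize_hdfs_reads(sample))
```

Grouping by client id kind makes it easy to see at a glance whether any map-task reads were logged at all on a given datanode.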

Chris,

I found HADOOP-3062, which you implemented, and thought you might have an explanation.

Best,
Igor

--
Drake 민영근 Ph.D
kt NexR