From: arun k
Date: Sat, 3 Dec 2011 20:00:01 +0530
Subject: Re: Capturing Map/reduce task run times and bytes read
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
In-Reply-To: <84C0C20E-1523-49D4-BBFD-BFF549FB4B5F@cloudera.com>
References: <84C0C20E-1523-49D4-BBFD-BFF549FB4B5F@cloudera.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf300e4f07fabbe404b330ec2b

--20cf300e4f07fabbe404b330ec2b
Content-Type: text/plain; charset=ISO-8859-1

Harsh,

I wanted to confirm this because, if the run time is not captured accurately, I want to write code to capture it.

Does it make sense to classify a map/reduce task as I/O bound or CPU bound based on its I/O rate?

Arun

On Sat, Dec 3, 2011 at 2:43 PM, Harsh J wrote:

> Arun,
>
> Inline again.
>
> On 03-Dec-2011, at 12:39 PM, arun k wrote:
>
> > Q> Is the map/reduce task run time displayed in the web GUI
> > accurate enough?
>
> Don't see why not. We only display what's been genuinely collected. What
> you get out of an API or on the CLI is absolutely the same thing. Or perhaps I
> do not understand your question completely here - what's led you to ask
> this?
>
> > Q> If I want to find the I/O rate of a task, will the total number of
> > FILE bytes and HDFS bytes read/written divided by the task run time
> > give it approximately?
>
> Yes, that should give you a stop-watch measure. Task start -> Task end,
> and the counters the task puts up for itself.
>
> > Q> Does the FILE bytes read counter for the reduce task include the map
> > output record bytes read non-locally over the network, or the bytes read
> > locally from the map output records after they are copied locally?
>
> FILE counters are from whatever is read off a local filesystem (file:///),
> so would mean the latter. If you look again, you will notice another
> counter named "Reduce shuffle bytes" that gives you the former count -
> separately.
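
The stop-watch measure discussed above can be sketched roughly as follows. This is only an illustration, assuming the task counters and the start/finish timestamps (in milliseconds) have already been pulled out of the job history or web UI by some other means. The function name io_rate_bytes_per_sec and the dict layout are hypothetical; the counter names are the FILE and HDFS byte counters the thread refers to.

```python
# Counter names for the FILE and HDFS byte counters discussed in this thread.
# How the values are fetched from Hadoop is out of scope for this sketch.
IO_COUNTERS = (
    "FILE_BYTES_READ",
    "FILE_BYTES_WRITTEN",
    "HDFS_BYTES_READ",
    "HDFS_BYTES_WRITTEN",
)


def io_rate_bytes_per_sec(counters, start_ms, finish_ms):
    """Return total FILE + HDFS bytes moved per second of task wall-clock time.

    counters  -- dict mapping counter name to byte count (hypothetical layout)
    start_ms  -- task start time in milliseconds
    finish_ms -- task finish time in milliseconds
    """
    duration_ms = finish_ms - start_ms
    if duration_ms <= 0:
        raise ValueError("finish time must be after start time")
    # Missing counters are treated as zero bytes.
    total_bytes = sum(counters.get(name, 0) for name in IO_COUNTERS)
    return total_bytes / (duration_ms / 1000.0)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = {
        "FILE_BYTES_READ": 1000,
        "FILE_BYTES_WRITTEN": 2000,
        "HDFS_BYTES_READ": 3000,
        "HDFS_BYTES_WRITTEN": 4000,
    }
    print(io_rate_bytes_per_sec(sample, start_ms=0, finish_ms=2000))
```

Note that this is bytes divided by time, so it covers the whole task wall-clock window, including time spent on CPU work. For a reduce task, the bytes fetched over the network are reported separately under "Reduce shuffle bytes" and are not part of this sum.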
--20cf300e4f07fabbe404b330ec2b--