Subject: Re: Increasing map reduce tasks will increase the CPU time, does this seem to be correct
From: Mohammad Tariq <dontariq@gmail.com>
To: user@hive.apache.org
Date: Thu, 13 Dec 2012 16:20:32 +0530

Hello Imen,

If you have a huge number of tasks, the overhead of creating and managing all those map and reduce tasks begins to dominate the total job execution time. Also, more tasks means you need more free CPU slots. If no slot is free on the node that holds a given data block, the task is scheduled on some other node where free slots are available and the block has to be pulled over the network. That costs time, and it also works against the most basic principle of Hadoop, i.e. data locality. So the number of maps and reduces should be raised keeping all these factors in mind, otherwise you may face performance issues.
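For what it's worth, here is a minimal sketch of the knobs in question, using the classic org.apache.hadoop.mapred API of that era (the class name and argument handling are made up for illustration; the map count is only a hint to the framework, while the reduce count is honored exactly):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TaskCountSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TaskCountSketch.class);
        conf.setJobName("task-count-test");
        // Mapper/reducer classes and output types elided -- set them as usual.

        // Only a hint: the framework computes the actual map count from the
        // input splits; this value merely influences the split size calculation.
        conf.setNumMapTasks(8);

        // Honored exactly: every extra reducer is an extra task (and JVM)
        // that has to be scheduled into a free slot, adding overhead.
        conf.setNumReduceTasks(4);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

In practice the effective map count ends up close to input size divided by the split size, so tuning the reduce count (and keeping it within your cluster's slot capacity) is usually where these overheads show up first.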
HTH

Regards,
Mohammad Tariq

On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> If the number of maps or reducers your job launches is greater than the
> job queue/cluster capacity, CPU time will increase.
>
> On Dec 13, 2012 4:02 PM, "imen Megdiche" <imen.megdiche@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to increase the number of map and reduce tasks for a job,
>> and even for the same data size, I noticed that the total CPU time
>> increases, but I thought it would decrease. MapReduce is known for its
>> computational performance, but I do not see this when I do these small
>> tests.
>>
>> What do you think about this issue?
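A quick way to see the effect Nitin describes is to compare the job's aggregate CPU counter with its wall-clock runtime: the counter is summed over every task attempt, so per-task startup cost makes it grow with the task count even when the elapsed time drops. A minimal sketch, assuming a Hadoop 1.x cluster that exposes the CPU_MILLISECONDS task counter (the job setup itself is elided):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CpuVsWallClock {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CpuVsWallClock.class);
        // ... the usual job setup (input/output paths, mapper, reducer) ...

        long start = System.currentTimeMillis();
        RunningJob job = JobClient.runJob(conf); // blocks until the job finishes
        long wallClockMs = System.currentTimeMillis() - start;

        // CPU_MILLISECONDS is summed over every task attempt, so it keeps
        // climbing as tasks are added even when the wall-clock time shrinks.
        // Group/counter names are as reported by Hadoop 1.x; adjust for
        // your version.
        Counters counters = job.getCounters();
        long cpuMs = counters
                .findCounter("org.apache.hadoop.mapred.Task$Counter",
                             "CPU_MILLISECONDS")
                .getCounter();

        System.out.println("wall-clock ms: " + wallClockMs);
        System.out.println("total CPU ms : " + cpuMs);
    }
}

Running this with different task counts should show wall-clock time improving (up to the cluster's slot capacity) while total CPU time keeps climbing, which is exactly the behavior observed in the original question.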