Subject: Re: wordcount getting slower with more mappers and reducers?
From: Sandy <snickerdoodle08@gmail.com>
To: core-user@hadoop.apache.org
Date: Thu, 5 Mar 2009 17:10:22 -0600

I was trying to control the maximum number of tasks per tasktracker by
using the mapred.tasktracker.tasks.maximum parameter. I am interpreting
your comment to mean that this parameter name is malformed and should
instead read:

mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 8

I did that, reran on a 428 MB input, and got the same results as before. I
also ran it on a 3.3 GB dataset and saw the same pattern. I am still trying
to run it on a 20 GB input; that should confirm whether the filesystem-cache
explanation is correct.
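For reference, these per-tasktracker slot limits live in conf/hadoop-site.xml
on 0.18.x and only take effect after the tasktracker is restarted. A minimal
sketch (the value 8 is purely illustrative, matching the 8 cores on this
machine):

    <!-- conf/hadoop-site.xml: per-tasktracker task slot limits (0.18.x).
         Values are illustrative; restart the tasktracker to apply them. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>8</value>
    </property>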
-SM

On Thu, Mar 5, 2009 at 12:22 PM, Sandy wrote:

> Arun,
>
> How can I check the number of slots per tasktracker? Which parameter
> controls that?
>
> Thanks,
> -SM
>
>
> On Thu, Mar 5, 2009 at 12:14 PM, Arun C Murthy wrote:
>
>> I assume you have only 2 map and 2 reduce slots per tasktracker, which
>> totals 2 maps/reduces for your cluster. This means that with more
>> maps/reduces they are serialized to 2 at a time.
>>
>> Also, the -m is only a hint to the JobTracker; you might see fewer or
>> more maps than the number you specified on the command line. The -r,
>> however, is followed faithfully.
>>
>> Arun
>>
>>
>> On Mar 4, 2009, at 2:46 PM, Sandy wrote:
>>
>>> Hello all,
>>>
>>> For the sake of benchmarking, I ran the standard hadoop wordcount
>>> example on an input file using 2, 4, and 8 mappers and reducers for my
>>> job. In other words, I do:
>>>
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2
>>> sample.txt output
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4
>>> sample.txt output2
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8
>>> sample.txt output3
>>>
>>> Strangely enough, this increase in mappers and reducers results in
>>> slower running times!
>>> - On 2 mappers and reducers it ran for 40 seconds
>>> - On 4 mappers and reducers it ran for 60 seconds
>>> - On 8 mappers and reducers it ran for 90 seconds!
>>>
>>> Please note that the "sample.txt" file is identical in each of these
>>> runs.
>>>
>>> I have the following questions:
>>> - Shouldn't wordcount get -faster- with additional mappers and
>>> reducers, instead of slower?
>>> - If it does get faster for other people, why does it become slower
>>> for me? I am running hadoop in pseudo-distributed mode on a single
>>> 64-bit Mac Pro with 2 quad-core processors, 16 GB of RAM, and four
>>> 1 TB HDs.
>>>
>>> I would greatly appreciate it if someone could explain this behavior
>>> to me, and tell me if I'm running this wrong. How can I change my
>>> settings (if at all) to get wordcount running faster when I increase
>>> the number of maps and reduces?
>>>
>>> Thanks,
>>> -SM
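A note on the -m vs. -r behavior described above: in the 0.18.x examples
the wordcount driver turns -m and -r into JobConf calls, roughly as in the
sketch below (old org.apache.hadoop.mapred API; the class name here is a
placeholder, not the actual example code). The map count is only a hint
because the real number of map tasks is derived from the InputFormat's
splits, roughly one per HDFS block, whereas the reduce count is used
exactly.

    import org.apache.hadoop.mapred.JobConf;

    public class SlotHintSketch {
      public static void main(String[] args) {
        // Roughly what "wordcount -m 8 -r 8" sets on its JobConf.
        JobConf conf = new JobConf(SlotHintSketch.class);

        // -m 8: only a hint to the JobTracker -- the actual number of
        // map tasks comes from the InputFormat's splits (about one per
        // HDFS block), so a small input can still yield fewer maps.
        conf.setNumMapTasks(8);

        // -r 8: followed faithfully -- the job runs exactly 8 reduces.
        conf.setNumReduceTasks(8);
      }
    }

As for checking the number of slots: the JobTracker web UI (port 50030 by
default in a pseudo-distributed setup) reports the cluster's map and reduce
task capacity, which is the slot total across all tasktrackers.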