hadoop-common-user mailing list archives

From Sandy <snickerdoodl...@gmail.com>
Subject Re: wordcount getting slower with more mappers and reducers?
Date Fri, 06 Mar 2009 18:34:18 GMT
I'm still trying to make sense of the results, but running it like this is
working at least a little better:

map4 reduce1
map4 reduce2
map4 reduce4
map4 reduce8

I tried keeping the reduces constant while varying the maps; this resulted
in an increase in running time.

When I tried keeping the maps constant and varying the reduces, I got
something better, though when it hit something like map4 reduce4 the
running time shot up, even though it had previously been decreasing.

This has been very helpful... though I am very curious: is the reason one
worked better than the other a function of the input only? Or is there
something about pseudo-distributed mode that makes one way work better than
the other?
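
In case it helps anyone reproduce this, here is roughly how I am scripting
the sweep now (a minimal sketch: it assumes the 0.18.3 examples jar from the
runs below, that sample.txt is already in HDFS, and it writes each run to
its own output directory, since Hadoop will not overwrite an existing one):

# hold maps at 4 and vary reduces; time each run
for r in 1 2 4 8; do
  time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount \
    -m 4 -r $r sample.txt out-m4-r$r
done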

Thanks again!

-SM



On Thu, Mar 5, 2009 at 9:04 PM, haizhou zhao <randomtea@gmail.com> wrote:

> As I mentioned above, you should at least try something like this:
> map2 reduce1
> map4 reduce1
> map8 reduce1
>
> map4 reduce1
> map4 reduce2
> map4 reduce4
>
> instead of:
> map2 reduce2
> map4 reduce4
> map8 reduce8
>
> 2009/3/6 Sandy <snickerdoodle08@gmail.com>
>
> > I was trying to control the maximum number of tasks per tasktracker by
> > using the mapred.tasktracker.tasks.maximum parameter.
> >
> > I am interpreting your comment to mean that maybe this parameter is
> > malformed and should read:
> > mapred.tasktracker.map.tasks.maximum = 8
> > mapred.tasktracker.reduce.tasks.maximum = 8
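> >
> > For reference, here is roughly how I have them now in
> > conf/hadoop-site.xml (a minimal sketch; these property names are the
> > 0.18-era ones):
> >
> > <property>
> >   <name>mapred.tasktracker.map.tasks.maximum</name>
> >   <value>8</value>
> > </property>
> > <property>
> >   <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >   <value>8</value>
> > </property>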
> >
> > I did that, and reran on a 428 MB input, and got the same results as
> > before. I also ran it on a 3.3 GB dataset, and got the same pattern.
> >
> > I am still trying to run it on a 20 GB input. This should confirm
> > whether the filesystem cache explanation is right.
> >
> > -SM
> >
> > On Thu, Mar 5, 2009 at 12:22 PM, Sandy <snickerdoodle08@gmail.com> wrote:
> >
> > > Arun,
> > >
> > > How can I check the number of slots per tasktracker? Which parameter
> > > controls that?
> > >
> > > Thanks,
> > > -SM
> > >
> > >
> > > On Thu, Mar 5, 2009 at 12:14 PM, Arun C Murthy <acm@yahoo-inc.com> wrote:
> > >
> > >> I assume you have only 2 map and 2 reduce slots per tasktracker -
> > >> which totals to 2 maps/reduces for your cluster. This means that with
> > >> more maps/reduces, they are serialized, running 2 at a time.
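> > >>
> > >> As a rough worked example: with -m 8 against 2 map slots, the 8 maps
> > >> run in about 4 waves of 2, so you pay per-task startup overhead 4
> > >> times over without gaining any extra parallelism - which matches the
> > >> pattern of 40s, 60s and 90s you reported.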
> > >>
> > >> Also, the -m is only a hint to the JobTracker; you might see fewer or
> > >> more maps than the number you specified on the command line.
> > >> The -r, however, is followed faithfully.
> > >>
> > >> Arun
> > >>
> > >>
> > >> On Mar 4, 2009, at 2:46 PM, Sandy wrote:
> > >>
> > >>  Hello all,
> > >>>
> > >>> For the sake of benchmarking, I ran the standard hadoop wordcount
> > >>> example on an input file using 2, 4, and 8 mappers and reducers for
> > >>> my job. In other words, I do:
> > >>>
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2 sample.txt output
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4 sample.txt output2
> > >>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8 sample.txt output3
> > >>>
> > >>> Strangely enough, this increase in mappers and reducers results in
> > >>> slower running times!
> > >>> - On 2 mappers and reducers it ran for 40 seconds
> > >>> - On 4 mappers and reducers it ran for 60 seconds
> > >>> - On 8 mappers and reducers it ran for 90 seconds!
> > >>>
> > >>> Please note that the "sample.txt" file is identical in each of these
> > >>> runs.
> > >>>
> > >>> I have the following questions:
> > >>> - Shouldn't wordcount get -faster- with additional mappers and
> > >>> reducers, instead of slower?
> > >>> - If it does get faster for other people, why does it become slower
> > >>> for me?
> > >>> I am running Hadoop in pseudo-distributed mode on a single 64-bit
> > >>> Mac Pro with 2 quad-core processors, 16 GB of RAM, and 4 1 TB HDs.
> > >>>
> > >>> I would greatly appreciate it if someone could explain this behavior
> > >>> to me, and tell me if I'm running this wrong. How can I change my
> > >>> settings (if at all) to get wordcount running faster when I increase
> > >>> the number of maps and reduces?
> > >>>
> > >>> Thanks,
> > >>> -SM
> > >>>
> > >>
> > >>
> > >
> >
>
