hadoop-common-user mailing list archives

From Cornelio Iñigo <cornelio.ini...@gmail.com>
Subject Re: program running faster on single node than cluster
Date Thu, 18 Nov 2010 06:19:13 GMT
Hi,

The cluster has 12 nodes plus the master node. I ran a new test after increasing
the child nodes' memory to 2000m and HADOOP_HEAP_SIZE to 2000m, with
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum both set to 2 (the default). The job now
takes 6 minutes, but I think that is still too long compared with the
single-node run (7 to 8 minutes).

It seems to be a configuration issue, but I'm not sure what values I should
set for the 12-node cluster.
The literature suggests either

  mapred.tasktracker.map.tasks.maximum    = between 10 and 100 maps/node
  mapred.tasktracker.reduce.tasks.maximum = 1.75 * nodes

or

  mapred.tasktracker.map.tasks.maximum    = 10 * #slaves
  mapred.tasktracker.reduce.tasks.maximum = 2 * #slaves processors
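
To make the arithmetic behind these two rules of thumb concrete for a 12-node cluster, here is a minimal Python sketch. It only evaluates the formulas as quoted above and does not claim that either one is the right setting. The function names and the cores_per_slave parameter are hypothetical, and reading "2 * #slaves processors" as 2 * (slaves * cores per slave) is an assumption.

```python
def rule_one(nodes):
    """First rule of thumb as quoted: 10-100 maps per node, 1.75 * nodes reduces."""
    return {
        "maps_per_node_range": (10, 100),
        "reduces": 1.75 * nodes,
    }


def rule_two(slaves, cores_per_slave):
    """Second rule of thumb as quoted.

    Assumption: "2 * #slaves processors" means twice the total number of
    processors across all slave nodes.
    """
    return {
        "maps": 10 * slaves,
        "reduces": 2 * slaves * cores_per_slave,
    }


if __name__ == "__main__":
    nodes = 12  # cluster size stated in this message
    print(rule_one(nodes))
    # cores_per_slave=2 is a made-up value for illustration only;
    # the message does not say how many processors each node has.
    print(rule_two(nodes, cores_per_slave=2))
```

With 12 nodes the first rule gives 21 reduces, and the second gives 120 maps; the reduce count under the second rule depends on the per-node processor count, which is not stated here.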

Thanks



2010/11/17 Alex Baranau <alex.baranov.v@gmail.com>

> How many nodes do you use for your "fully distributed" cluster?
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, Nov 17, 2010 at 5:44 AM, Cornelio Iñigo
> <cornelio.inigof@gmail.com> wrote:
>
> > Hi
> >
> > I have a question for you:
> >
> > I developed a program using Hadoop. It has one map function and one reduce
> > function (like WordCount), and the map function does all the processing of
> > my data.
> > When I run this program on a single-node machine it takes about 7 minutes
> > (it's a small dataset), and on a pseudo-distributed machine it takes about
> > 7 minutes too, but when I run it on a
> > fully distributed cluster (12 nodes) it takes much longer, about an hour!
> >
> > I tried changing the mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum variables (2 and 2, the default;
> > 10 and 2; 2 and 10; 5 and 5) and the results are the same.
> > Am I missing something?
> > Is this a cluster configuration issue, or is it in my program?
> >
> > Thanks
> >
> > --
> > *Cornelio*
> >
>



-- 
*Cornelio*
