hadoop-common-user mailing list archives

From "Shengkai Zhu" <geniusj...@gmail.com>
Subject Re: parallel mapping on single server
Date Fri, 11 Jul 2008 04:02:54 GMT
Is this data-local dispatching still just a design, or is it already implemented?
If it is implemented, in which version? I could not find its implementation in
0.16.0.

Thanks


On 7/11/08, Joman Chu <jomanc@andrew.cmu.edu> wrote:
>
> Hadoop will try to split the file according to how it is split up in
> HDFS. For example, if an input file has three blocks with a replication
> factor of two, there are six block replicas in total. Say there are six
> machines, each holding a single replica: block 1 is on machines 1 and 2,
> block 2 is on machines 3 and 4, and block 3 is on machines 5 and 6. Hadoop
> will create three map tasks. Each task is assigned to a machine and
> processes the block that is stored locally on that machine. If that isn't
> possible, the block data is read first from another machine in the same
> rack and then, failing that, from machines elsewhere in the cluster,
> further away.
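>
> To make that concrete, here is a minimal sketch (not from this thread; it
> assumes a running HDFS and a somewhat newer FileSystem API than 0.16, and
> the path /user/demo/input.txt is just a placeholder) that prints which
> hosts hold each block of a file. This is the same placement information
> the JobTracker consults when it tries to run a map task on a node that
> already stores the task's block:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.BlockLocation;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ShowBlockLocations {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();   // reads hadoop-site.xml from the classpath
>     FileSystem fs = FileSystem.get(conf);
>     FileStatus status = fs.getFileStatus(new Path("/user/demo/input.txt"));
>     BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
>     for (int i = 0; i < blocks.length; i++) {
>       // each entry lists the datanodes that hold one replica of this block
>       System.out.println("block " + i
>           + " offset=" + blocks[i].getOffset()
>           + " hosts=" + java.util.Arrays.toString(blocks[i].getHosts()));
>     }
>   }
> }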
>
> Joman Chu
> AIM: ARcanUSNUMquam
> IRC: irc.liquid-silver.net
>
>
> On Thu, Jul 10, 2008 at 10:40 AM, hong <minghong.zhou@163.com> wrote:
> > Hi
> >
> > Following up on Cao Haijun's reply below:
> >
> > Suppose we have set 8 map tasks. How does each map know which part of the
> > input file it should process?
> >
> > On 2008-7-10, at 2:33 AM, Haijun Cao wrote:
> >
> >> Set the number of map slots per tasktracker to 8 in order to run 8 map
> >> tasks at the same time on one machine (assuming one tasktracker per machine):
> >>
> >>
> >> <property>
> >>  <name>mapred.tasktracker.map.tasks.maximum</name>
> >>  <value>8</value>
> >>  <description>The maximum number of map tasks that will be run
> >>  simultaneously by a task tracker.
> >>  </description>
> >> </property>
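> >>
> >> On the job side there is a complementary knob. Purely as an illustration
> >> (old "mapred" API, roughly Hadoop 0.17 and later; the class name and paths
> >> here are placeholders, not from this thread): setNumMapTasks() is only a
> >> hint, because the actual number of map tasks is the number of input splits
> >> produced by the InputFormat, typically one per HDFS block, which is also
> >> how each map task knows which part of the input file it should process.
> >>
> >> import org.apache.hadoop.fs.Path;
> >> import org.apache.hadoop.mapred.FileInputFormat;
> >> import org.apache.hadoop.mapred.FileOutputFormat;
> >> import org.apache.hadoop.mapred.JobClient;
> >> import org.apache.hadoop.mapred.JobConf;
> >>
> >> public class SubmitJob {
> >>   public static void main(String[] args) throws Exception {
> >>     JobConf conf = new JobConf(SubmitJob.class);
> >>     conf.setJobName("wordcount-8-maps");
> >>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
> >>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> >>     conf.setNumMapTasks(8);  // a hint; the real count equals the number of input splits
> >>     JobClient.runJob(conf);  // falls back to the default identity mapper/reducer here
> >>   }
> >> }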
> >>
> >>
> >> -----Original Message-----
> >> From: Deepak Diwakar [mailto:ddeepak4u@gmail.com]
> >> Sent: Monday, July 07, 2008 1:29 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: parallel mapping on single server
> >>
> >> Hi,
> >>
> >> I am pretty new to Hadoop. I ran a modification of WordCount on almost a
> >> TB of data on a single server, but found that it takes too much time. I
> >> noticed that only one core is utilized at a time, even though my server
> >> has 8 cores. I read that Hadoop speeds up computation in DFS mode, but how
> >> do I make full use of a single server with multicore processors? Is there
> >> a pseudo DFS mode in Hadoop? What changes are required in the config
> >> files? Please let me know in detail. Is there anything to do with
> >> hadoop-site.xml and mapred-default.xml?
> >>
> >> Thanks in advance.
> >> --
> >> - Deepak Diwakar,
> >> Associate Software Eng.,
> >> Pubmatic, pune
> >> Contact: +919960930405
> >
> >
> >
> >
>