hadoop-mapreduce-user mailing list archives

From shashwat shriparv <dwivedishash...@gmail.com>
Subject Re: 2 Map tasks running for a small input file
Date Thu, 26 Sep 2013 12:37:08 GMT
Just try passing -Dmapred.tasktracker.map.tasks.maximum=1 on the command
line and check how many map tasks it runs. Also set this in
mapred-site.xml and check.

Thanks & Regards,
Shashwat Shriparv


On Thu, Sep 26, 2013 at 5:24 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi Sai,
>
> What Viji indicated is that the default Apache Hadoop setting for any
> input is 2 maps. If the input is larger than one block, regular
> policies of splitting such as those stated by Shekhar would apply. But
> for smaller inputs, just for an out-of-box "parallelism experience",
> Hadoop ships with a 2-maps forced splitting default
> (mapred.map.tasks=2).
>
> This means your 5 lines are probably divided 2:3, or in some other ratio,
> and are processed by 2 different tasks. As Viji also indicated, to turn off
> this behavior, you can set mapred.map.tasks to 1 in your configs,
> and then you'll see only one map task process all 5 lines.
>
> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <saigraph@yahoo.in> wrote:
> > Thanks Viji.
> > I am a little confused: when the data is small, why would there be 2 tasks?
> > You would use a minimum of 2 if you need it, but in this case it is not
> > needed because the data is small, so why would 2 map tasks execute?
> > Since the input results in 1 block with 5 lines of data in it,
> > I am assuming this results in 5 map computations, 1 per line,
> > all of them in 1 process/node since I am using a pseudo-distributed VM.
> > Where is the second task coming from?
> > The 5 map computations, one on each line, make up 1 task.
> > Is this right?
> > Please help.
> > Thanks
> >
> >
> > ________________________________
> > From: Viji R <viji@cloudera.com>
> > To: user@hadoop.apache.org; Sai Sai <saigraph@yahoo.in>
> > Sent: Thursday, 26 September 2013 5:09 PM
> > Subject: Re: 2 Map tasks running for a small input file
> >
> > Hi,
> >
> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> > avoid this.
> >
> > Regards,
> > Viji
> >
> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <saigraph@yahoo.in> wrote:
> >> Hi
> >> Here is the input file for the wordcount job:
> >> ******************
> >> Hi This is a simple test.
> >> Hi Hadoop how r u.
> >> Hello Hello.
> >> Hi Hi.
> >> Hadoop Hadoop Welcome.
> >> ******************
> >>
> >> After running the wordcount job successfully,
> >> here are the counters:
> >>
> >> ***************
> >> Job Counters                                                        Map  Reduce  Total
> >> SLOTS_MILLIS_MAPS                                                     0       0  8,386
> >> Launched reduce tasks                                                 0       0      1
> >> Total time spent by all reduces waiting after reserving slots (ms)    0       0      0
> >> Total time spent by all maps waiting after reserving slots (ms)       0       0      0
> >> Launched map tasks                                                    0       0      2
> >> Data-local map tasks                                                  0       0      2
> >> SLOTS_MILLIS_REDUCES                                                  0       0  9,199
> >> ***************
> >> My question: why are there 2 launched map tasks when I have only a small
> >> file?
> >> Per my understanding it is only 1 block
> >> and should be only 1 split.
> >> Then a map computation should occur for each line,
> >> but it shows 2 map tasks.
> >> Please let me know.
> >> Thanks
> >> Sai
> >>
> >
> >
>
>
>
> --
> Harsh J
>
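
Harsh's explanation of the forced 2-way split can be illustrated with a
small sketch of the split arithmetic. This is not Hadoop code: it is a
simplified Python model of how the old-API FileInputFormat appears to size
splits, where the mapred.map.tasks value acts as a hint that sets a "goal"
split size. The function names, the 64 MB block size, the 1.1 slack factor,
and the 88-byte file size (the 5 input lines with one newline each) are
assumptions made for illustration.

```python
# Simplified model of split planning for a single file (illustrative only).
SPLIT_SLOP = 1.1  # assumed slack factor: a tail up to 10% over is not split off


def compute_split_size(goal_size, min_size, block_size):
    # The split size is capped by the block size and floored by the minimum.
    return max(min_size, min(goal_size, block_size))


def plan_splits(file_size, num_maps_hint, block_size=64 * 1024 * 1024, min_size=1):
    """Return a list of (offset, length) splits for one file."""
    # The map-count hint (mapred.map.tasks) turns into a goal size per split.
    goal_size = file_size // (num_maps_hint or 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = []
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining:
        splits.append((file_size - remaining, remaining))
    return splits


if __name__ == "__main__":
    # Assumed size of the 5-line wordcount input: 88 bytes.
    print(plan_splits(88, num_maps_hint=2))  # default hint of 2 maps
    print(plan_splits(88, num_maps_hint=1))  # hint lowered to 1
```

Under these assumptions, a hint of 2 halves the goal size to 44 bytes, so
even a file far smaller than one block yields two splits and therefore two
map tasks. With the hint set to 1, the whole file fits in a single split.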
