hadoop-mapreduce-user mailing list archives

From Robert Hafner <ted...@tedivm.com>
Subject Re: No Mapper but Reducer
Date Wed, 07 Sep 2011 16:33:44 GMT

You could just have a mapper that emits exactly the values it takes in (i.e., outputs k1,v1
as k2,v2). I think that's the best you'll be able to do here.



On Sep 7, 2011, at 4:21 AM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:

> Thank you all. Even I noticed this strange behavior some time back.
> Now my initial concern still remains. If I provide an empty input directory, then yes, the map tasks won't be executed. But my reducer needs input to do the processing/aggregation. In such a scenario, is there an option to provide input just to the reducer?
> 
> Regards
> Bejoy.K.S
> 
> On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <sudhan65@gmail.com> wrote:
> This is true, and it took us by surprise in the recent past. It also had quite some impact on our job cycles, where the size of the input is totally random and can be zero at times.

> 
> In one of our cycles, we run a lot of jobs. Say we configure X as the number of reducers for a job that does not have any input.
> 
> Y -> No of tasktrackers in the cluster
> 
> H -> Time Interval for Heartbeat response
> 
> With the cdh2 version, the job takes
> 
> (X / Y) * H seconds to complete without doing any work, since only one reduce task is assigned per heartbeat.
> 
> 
> If the cycle contains many such jobs, the total time that the cluster spends doing nothing accumulates.
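
A minimal sketch of the (X / Y) * H estimate quoted above, in plain Java with no Hadoop dependency. The class name, method name, and the sample numbers (100 reducers, 10 tasktrackers, a 3-second heartbeat) are hypothetical and chosen only for illustration; the formula itself is the approximation given in the message, not a measured result.

```java
// Illustrative only: evaluates the thread's (X / Y) * H approximation.
final class IdleJobEstimate {

    // reducers        = X, number of reduce tasks configured for the job
    // taskTrackers    = Y, number of tasktrackers in the cluster
    // heartbeatSecs   = H, heartbeat response interval in seconds
    static double idleSeconds(int reducers, int taskTrackers, double heartbeatSecs) {
        if (taskTrackers <= 0) {
            throw new IllegalArgumentException("taskTrackers must be positive");
        }
        // One reduce task is handed out per heartbeat per tasktracker,
        // so each tracker needs roughly X / Y heartbeats.
        return ((double) reducers / taskTrackers) * heartbeatSecs;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 reducers, 10 tasktrackers, 3 s heartbeat.
        System.out.println(idleSeconds(100, 10, 3.0) + " seconds spent on an empty job");
    }
}
```

Multiplying the result by the number of empty-input jobs in a cycle gives a rough figure for the accumulated idle time described here.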
> 
> I was thinking of raising this as a JIRA but am not sure. Should we raise and fix this as a JIRA request? Could the number of reducers set by the client be overridden if the number of mappers is 0?
> 
> We have a workaround: we verify the existence of the input path to the Map phase ourselves. But I thought it would be more intuitive for the framework to handle this itself.
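
The client-side check mentioned here could look roughly like the sketch below. It is an assumption-laden stand-in: it uses `java.nio.file` against a local directory so that it runs without a cluster, whereas a real job driver would query HDFS through Hadoop's own filesystem API. The class and method names (`InputGuard`, `hasInput`) are hypothetical, and treating "no non-empty regular file" as "no input" is one possible policy, not something the thread specifies.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: decide before submission whether a job has any input.
final class InputGuard {

    // Returns true if inputDir exists, is a directory, and holds at least
    // one non-empty regular file. Subdirectories are not descended into.
    static boolean hasInput(Path inputDir) throws IOException {
        if (!Files.isDirectory(inputDir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && Files.size(entry) > 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A driver would call `hasInput` first and skip submitting the job when it returns false, which avoids scheduling reducers that have nothing to do.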
> 
> -Sudhan S
> 
> On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <harsh@cloudera.com> wrote:
> Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-)
> 
> /me puts his troll-mask on.
> 
> ➜  ~HADOOP_HOME  hadoop fs -mkdir abc
> ➜  ~HADOOP_HOME  hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
> 11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
> 11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
> 11/09/07 14:24:15 INFO mapred.JobClient:  map 0% reduce 0%
> 11/09/07 14:24:21 INFO mapred.JobClient:  map 0% reduce 100%
> 11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
> 11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
> 11/09/07 14:24:22 INFO mapred.JobClient:   Job Counters
> 11/09/07 14:24:22 INFO mapred.JobClient:     Launched reduce tasks=1
> 11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2209
> 11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all
> maps waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=3113
> 11/09/07 14:24:22 INFO mapred.JobClient:   FileSystemCounters
> 11/09/07 14:24:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59220
> 11/09/07 14:24:22 INFO mapred.JobClient:   Map-Reduce Framework
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input groups=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Combine output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Spilled Records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Combine input records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input records=0
> 
> /me takes off troll mask.
> 
> On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> > Thanks Sonal. I was just thinking of some weird design and wanted to make
> > sure whether there is a possibility like that- no maps and all reducers.
> >
> > On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> >>
> >> I don't think that is possible. Can you explain in what scenario you want
> >> to have no mappers, only reducers?
> >> Best Regards,
> >> Sonal
> >> Crux: Reporting for HBase
> >> Nube Technologies
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Sep 7, 2011 at 1:18 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> >>>
> >>> Hi
> >>>           I have a query here. Is it possible to have no mappers but
> >>> reducers alone? AFAIK, if we need to avoid the triggering of reducers we can
> >>> set numReduceTasks to zero, but such a setting on the mapper won't work. So how
> >>> can it be achieved, if possible?
> >>>
> >>> Thank You
> >>>
> >>> Regards
> >>> Bejoy.K.S
> >>
> >
> >
> 
> 
> 
> --
> Harsh J
> 
> 
