hadoop-mapreduce-user mailing list archives

From Bejoy KS <bejoy.had...@gmail.com>
Subject Re: No Mapper but Reducer
Date Thu, 08 Sep 2011 06:09:35 GMT
Exactly, Matthew. The weird thought was in that direction. Basically, I
have tilde-separated input that has to undergo an aggregation operation,
so I was checking whether it is possible to go straight into the
sort/shuffle phase and then the reducer, without a mapper. I understand
that I need to depend on at least an IdentityMapper.

A small query on top of this: if we take a basic MapReduce job, say word
count without a combiner, what would the percentage distribution of
execution time be across the map, sort/shuffle and reduce phases?
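
To make the identity-mapper route concrete, here is a minimal plain-Java sketch of the dataflow: split each tilde-separated line into a key and a value, pass the pair through unchanged, group by key in sorted order, then aggregate. It only models the idea and uses no Hadoop API. The class name IdentityMapSketch, the "key~number" record layout and the summing reducer are all assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of identity map -> sort/shuffle -> reduce. Not Hadoop code.
class IdentityMapSketch {

    // Stand-in for the input format: split at the first tilde into (key, value).
    // Assumes every line contains at least one tilde.
    static Map.Entry<String, String> parse(String line) {
        int i = line.indexOf('~');
        return new SimpleEntry<>(line.substring(0, i), line.substring(i + 1));
    }

    // Identity map: emit the pair exactly as received (k1,v1 as k2,v2).
    static Map.Entry<String, String> identityMap(Map.Entry<String, String> in) {
        return in;
    }

    // Stand-in for sort/shuffle: group values by key, keys in sorted order.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: sum the numeric values of one key (an assumed aggregation).
    static long reduce(List<String> values) {
        long sum = 0;
        for (String v : values) {
            sum += Long.parseLong(v.trim());
        }
        return sum;
    }

    // Run the three steps end to end over a list of input lines.
    static TreeMap<String, Long> run(List<String> lines) {
        List<Map.Entry<String, String>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.add(identityMap(parse(line)));
        }
        TreeMap<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffle(mapped).entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample records.
        System.out.println(run(Arrays.asList("b~2", "a~1", "b~5")));
    }
}
```

In a real job the split into key and value would be the input format's responsibility and the grouping would be done by the framework, so only the reduce logic would be user code.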


On Wed, Sep 7, 2011 at 10:30 PM, GOEKE, MATTHEW (AG/1000) <
matthew.goeke@monsanto.com> wrote:

> Bejoy,
>
> What exactly is your use case? I know down below you said you were just
> thinking of a weird design but it would really help if we knew exactly what
> you were shooting for because we might be able to refactor it.
>
> I have a job that I developed that still required the input to be sorted
> for the reduce, but I did not need to do any transformation or filtering on
> the map side, so I just used an identity mapper, as Robert mentions below,
> and it works perfectly. I do not think that there is any way to pass
> data directly into the S/S phase without going through the map phase (if
> that is what you were hinting at), and if you don't require the data to go
> through S/S then you can make it a map-only job.
>
> Matt
>
> *From:* Robert Hafner [mailto:tedivm@tedivm.com]
> *Sent:* Wednesday, September 07, 2011 11:34 AM
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* Re: No Mapper but Reducer
>
> You could just have a mapper which sent off the exact values it took in
> (i.e., output k1,v1 as k2,v2). I think that's the best you'll be able to do
> here.
>
> On Sep 7, 2011, at 4:21 AM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
>
> Thank you all. I too noticed this strange behavior some time back.
> My initial concern still remains, though. If I point the job at an empty
> input directory then yes, the map tasks won't be executed. But my reducer
> needs input to do the processing/aggregation. In such a scenario, is there
> an option to provide input just to the reducer?
>
> Regards
> Bejoy.K.S
>
> On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <sudhan65@gmail.com>
> wrote:
>
> This is true, and it took us by surprise in the recent past. It also had
> quite some impact on our job cycles, where the size of the input is totally
> random and can be zero at times.
>
> In one of our cycles, we run a lot of jobs. Say we configure X as the number
> of reducers for a job which does not have any input.
>
> Y -> Number of tasktrackers in the cluster
> H -> Time interval for heartbeat response
>
> With the cdh2 version, the job takes
>
> (X / Y) * H seconds to complete without doing any work, since we assign
> only one reduce task per heartbeat.
>
> If the number of such jobs in the cycle is large, the total time that
> the cluster spends doing nothing accumulates.
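
As a rough illustration of the (X / Y) * H estimate above, here is a small Java sketch. It assumes, as the message says, that each tasktracker is handed one reduce task per heartbeat. The class and method names and the sample numbers are hypothetical.

```java
// Estimate of the time an input-less job spends launching empty reducers.
class IdleReducerEstimate {

    // (X / Y) * H: X reducers, Y tasktrackers, H seconds between heartbeats.
    static double idleSeconds(int reducers, int taskTrackers, double heartbeatSeconds) {
        if (taskTrackers <= 0) {
            throw new IllegalArgumentException("taskTrackers must be positive");
        }
        return ((double) reducers / taskTrackers) * heartbeatSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 reducers, 10 tasktrackers, 3-second heartbeat.
        System.out.println(idleSeconds(100, 10, 3.0));
    }
}
```

With these sample numbers the estimate is 30 seconds per empty job, which is why the cost adds up across a cycle that contains many such jobs.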
>
> I was thinking of raising this as a JIRA but was not sure. Should we raise
> and fix this as a JIRA request? Could the number of reducers set by the
> client be overridden if the number of mappers is 0?
>
> We have a workaround, verifying the existence of the input path to the
> map phase ourselves, but thought it would be more intuitive for the
> framework to handle this itself.
>
> -Sudhan S
>
> On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <harsh@cloudera.com> wrote:
>
> Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a
> job ;-)
>
> /me puts his troll-mask on.
>
> ➜  ~HADOOP_HOME  hadoop fs -mkdir abc
> ➜  ~HADOOP_HOME  hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
> 11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
> 11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
> 11/09/07 14:24:15 INFO mapred.JobClient:  map 0% reduce 0%
> 11/09/07 14:24:21 INFO mapred.JobClient:  map 0% reduce 100%
> 11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
> 11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
> 11/09/07 14:24:22 INFO mapred.JobClient:   Job Counters
> 11/09/07 14:24:22 INFO mapred.JobClient:     Launched reduce tasks=1
> 11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2209
> 11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=3113
> 11/09/07 14:24:22 INFO mapred.JobClient:   FileSystemCounters
> 11/09/07 14:24:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59220
> 11/09/07 14:24:22 INFO mapred.JobClient:   Map-Reduce Framework
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input groups=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Combine output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce output records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Spilled Records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Combine input records=0
> 11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input records=0
>
> /me takes off troll mask.
>
>
> On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> > Thanks, Sonal. I was just thinking of some weird design and wanted to make
> > sure whether there is a possibility like that: no maps and all reducers.
> >
> > On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> >>
> >> I don't think that is possible. Can you explain in what scenario you want
> >> to have no mappers, only reducers?
> >> Best Regards,
> >> Sonal
> >> Crux: Reporting for HBase
> >> Nube Technologies
> >>
> >> On Wed, Sep 7, 2011 at 1:18 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> >>>
> >>> Hi
> >>>           I have a query. Is it possible to have no mappers but
> >>> reducers alone? AFAIK, if we need to avoid the triggering of reducers we
> >>> can set numReduceTasks to zero, but such a setting for mappers won't work.
> >>> So how can it be achieved, if possible?
> >>>
> >>> Thank You
> >>>
> >>> Regards
> >>> Bejoy.K.S
> >>
> >
>
> --
> Harsh J
