hadoop-mapreduce-user mailing list archives

From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: No Mapper but Reducer
Date Wed, 07 Sep 2011 17:00:16 GMT

What exactly is your use case? I know you said below that you were just thinking about a weird
design, but it would really help to know exactly what you are shooting for, because we might
be able to refactor it.

I developed a job that still required sorted input for the reduce but needed no transformation
or filtering on the map side, so I just used an identity mapper, as Robert mentions below, and
it works perfectly. I do not think there is any way to pass data directly into the shuffle/sort
(S/S) phase without going through the map phase (if that is what you were hinting at), and if
you don’t require the data to go through S/S then you can make it a map-only job.
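
To make the identity-mapper idea concrete, here is a toy model in plain Python. It is only a sketch and not Hadoop API code: `identity_map`, `sum_reduce`, and `run_job` are hypothetical names, and the in-memory sort merely stands in for the real shuffle/sort. It shows that the mapper passes records through unchanged while the reducer still receives its input sorted and grouped by key.

```python
from itertools import groupby
from operator import itemgetter


def identity_map(key, value):
    """Emit the input record unchanged: (k1, v1) becomes (k2, v2)."""
    yield key, value


def sum_reduce(key, values):
    """Illustrative reducer (an assumption): total the values for one key."""
    yield key, sum(values)


def run_job(records, mapper, reducer):
    # Map phase: every input record passes through the mapper.
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    # Shuffle/sort stand-in: order by key, then group equal keys together.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output


# Unsorted input; the reducer still sees keys in sorted order.
records = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
result = run_job(records, identity_map, sum_reduce)
print(result)
```

With no input records the map phase emits nothing, so the reducer in this model is never called and the output is empty.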


From: Robert Hafner [mailto:tedivm@tedivm.com]
Sent: Wednesday, September 07, 2011 11:34 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: No Mapper but Reducer

You could just have a mapper that emits exactly the values it takes in (i.e., outputs k1,v1
as k2,v2). I think that's the best you'll be able to do here.

On Sep 7, 2011, at 4:21 AM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
Thank you all. I too noticed this strange behavior some time back.
My initial concern still remains, though. If I provide an empty input directory then yes, the
map tasks won't be executed. But my reducer needs input to do the processing/aggregation.
In such a scenario, is there an option to provide input just to the reducer?

On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <sudhan65@gmail.com> wrote:
This is true, and it took us by surprise in the recent past. It also had quite an impact
on our job cycles, where the size of the input is totally random and can be zero at times.

In one of our cycles, we run a lot of jobs. Say we configure X as the number of reducers for
a job that does not have any input, and let

Y -> number of tasktrackers in the cluster

H -> time interval for the heartbeat response, in seconds

With the cdh2 version, the job takes

( X / Y) * H seconds to complete without doing any work, since only one reduce task is assigned
per heartbeat.

If a cycle contains many such jobs, the total time that the cluster spends doing nothing
adds up.
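
As a rough worked example of the (X / Y) * H estimate, the sketch below plugs in made-up numbers. The values for X, Y, H and the job count are illustrative assumptions, not measurements from any cluster, and the total assumes the empty jobs run one after another.

```python
def idle_seconds(num_reducers, num_tasktrackers, heartbeat_seconds):
    """Estimated time an input-less job takes: (X / Y) * H seconds."""
    return (num_reducers / num_tasktrackers) * heartbeat_seconds


# Assumed values: X = 100 reducers, Y = 10 tasktrackers, H = 3 seconds.
per_job = idle_seconds(100, 10, 3)

# Assumed: 20 such empty jobs in one cycle, run back to back.
total = 20 * per_job

print(per_job, total)
```

Under these assumptions each empty job costs about 30 seconds, and 20 of them cost about 10 minutes per cycle.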

I was thinking of raising this as a JIRA but was not sure. Should we raise and fix this as a
JIRA request? Could the number of reducers set by the client be overridden when the number of
mappers is 0?

We have a workaround: we verify the existence of the input path to the map phase ourselves.
But I thought it would be more intuitive for the framework to handle this itself.
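
A client-side check of this kind might look like the sketch below. It is a stand-in only: it inspects the local filesystem rather than HDFS, `should_submit` is a hypothetical helper name, and treating a directory with no non-empty files as "no input" is an assumption that goes slightly beyond a bare existence check.

```python
import os
import tempfile


def should_submit(input_path):
    """Return True only if input_path holds at least one non-empty file."""
    if os.path.isfile(input_path):
        return os.path.getsize(input_path) > 0
    if not os.path.isdir(input_path):
        return False  # the path does not exist at all
    for root, _dirs, files in os.walk(input_path):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) > 0:
                return True
    return False


# Demonstration on a temporary local directory (standing in for the job input).
with tempfile.TemporaryDirectory() as input_dir:
    empty_result = should_submit(input_dir)   # nothing to process yet
    with open(os.path.join(input_dir, "part-00000"), "w") as f:
        f.write("some record\n")
    filled_result = should_submit(input_dir)  # now there is input

print(empty_result, filled_result)
```

A driver script could call such a check before submitting each job and skip the submission when it returns False, so that no reduce tasks are scheduled for an empty input.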

-Sudhan S

On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <harsh@cloudera.com> wrote:
Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-)

/me puts his troll-mask on.

➜  ~HADOOP_HOME  hadoop fs -mkdir abc
➜  ~HADOOP_HOME  hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
11/09/07 14:24:15 INFO mapred.JobClient:  map 0% reduce 0%
11/09/07 14:24:21 INFO mapred.JobClient:  map 0% reduce 100%
11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
11/09/07 14:24:22 INFO mapred.JobClient:   Job Counters
11/09/07 14:24:22 INFO mapred.JobClient:     Launched reduce tasks=1
11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2209
11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=3113
11/09/07 14:24:22 INFO mapred.JobClient:   FileSystemCounters
11/09/07 14:24:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59220
11/09/07 14:24:22 INFO mapred.JobClient:   Map-Reduce Framework
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input groups=0
11/09/07 14:24:22 INFO mapred.JobClient:     Combine output records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce output records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Spilled Records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Combine input records=0
11/09/07 14:24:22 INFO mapred.JobClient:     Reduce input records=0

/me takes off troll mask.

On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> Thanks Sonal. I was just thinking of some weird design and wanted to make
> sure whether something like that is possible: no maps and all reducers.
> On Wed, Sep 7, 2011 at 1:22 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
>> I don't think that is possible. Can you explain in what scenario you want
>> to have no mappers, only reducers?
>> Best Regards,
>> Sonal
>> Crux: Reporting for HBase
>> Nube Technologies
>> On Wed, Sep 7, 2011 at 1:18 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
>>> Hi
>>> I have a question. Is it possible to have no mappers but
>>> reducers alone? AFAIK, if we need to avoid triggering reducers we can
>>> set numReduceTasks to zero, but no such setting works for mappers. So how
>>> can it be achieved, if it is possible at all?
>>> Thank You
>>> Regards
>>> Bejoy.K.S

Harsh J
