hadoop-common-user mailing list archives

From Himanshu Vashishtha <vashishth...@gmail.com>
Subject Re: Is it possible to use NullWritable in combiner? + general question about combining output from many small maps
Date Wed, 21 Jul 2010 09:57:59 GMT
Please see my comments inline, based on my understanding of Hadoop and your
problems. See if they are helpful.

Cheers,
Himanshu

On Wed, Jul 21, 2010 at 2:59 AM, Leo Alekseyev <dnquark@gmail.com> wrote:

> Hi All,
> I have a job where all processing is done by the mappers, but each
> mapper produces a small file, which I want to combine into 3-4 large
> ones.  In addition, I only care about the values, not the keys, so
> NullWritable key is in order.  I tried using the default reducer
> (which according to the docs is identity) by setting
> job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class) and
> using a NullWritable key on the mapper output.  However, this seems to
> concentrate the work on one reducer only.


NullWritable is a singleton class, so every record keyed on it hashes to the
same partition, and the entire map output goes to a single reducer.
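
For reference, this is essentially what the default HashPartitioner does
(a sketch of org.apache.hadoop.mapreduce.lib.partition.HashPartitioner):

    import org.apache.hadoop.mapreduce.Partitioner;

    // The partition is derived from the key's hashCode(), so a
    // singleton key such as NullWritable always maps to the same
    // reducer, no matter how many reduce tasks you configure.
    public class HashPartitioner<K, V> extends Partitioner<K, V> {
      @Override
      public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }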


> I then tried to output
> LongWritable as the mapper key, and write a combiner to output
> NullWritable (i.e. class GenerateLogLineProtoCombiner extends
> Reducer<LongWritable, ProtobufLineMsgWritable, NullWritable,
> ProtobufLineMsgWritable>); still using the default reducer.  This gave
> me the following error thrown by the combiner:
>
> 10/07/21 01:21:38 INFO mapred.JobClient: Task Id :
> attempt_201007122205_1058_m_000104_2, Status : FAILED
> java.io.IOException: wrong key class: class
> org.apache.hadoop.io.NullWritable is not class
> org.apache.hadoop.io.LongWritable
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
>          .........
>
A combiner's goal is to lessen the reducer's workload. Its output key/value
types must be the same as the mapper's output key/value types, because
combiner output is fed back into the shuffle as if it were map output; hence
the error.
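
For example, a type-correct combiner here keeps LongWritable as its output
key (a minimal sketch, reusing the ProtobufLineMsgWritable value type from
your code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Combiner output types must match the map output types, because
    // combiner output re-enters the shuffle as if it were map output.
    public static class LineCombiner
        extends Reducer<LongWritable, ProtobufLineMsgWritable,
                        LongWritable, ProtobufLineMsgWritable> {
      @Override
      protected void reduce(LongWritable key,
                            Iterable<ProtobufLineMsgWritable> values,
                            Context context)
          throws IOException, InterruptedException {
        // Pass records through unchanged; drop the key only in the
        // reducer, where emitting NullWritable is fine.
        for (ProtobufLineMsgWritable value : values) {
          context.write(key, value);
        }
      }
    }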

> I was able to get things working by explicitly putting in an identity
> reducer that takes (LongWritable key, value) and outputs
> (NullWritable, value).  However, now most of my processing is in the
> reduce phase, which seems like a waste -- it's copying and sorting
> data, but all I really need is to "glue" together the small map
> outputs.
>
> Thus, my questions are: I don't really understand why the combiner is
> throwing an error here.  Does it simply not allow NullWritables on the
> output?...
> The second question is -- is there a standard strategy for quickly
> combining the many small map outputs?  Is it worth, perhaps, to look
> into adjusting the min split size for the mappers?.. (can this value
> be adjusted dynamically based on the input file size?..)
>
I don't know of any such strategy. How about defining a smaller number of
reducers? I am also not able to understand the problem fully. It would be
great if you could be a bit more specific (in terms of map input and output
sizes, and reduce output size).
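
If a handful of reducers works for you, the driver setup could look
something like this (a sketch against the new org.apache.hadoop.mapreduce
API you are already using; the job name is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = new Job(conf, "merge-small-map-outputs");  // hypothetical name
    // One output file per reducer: 4 reducers -> 4 merged files.
    job.setNumReduceTasks(4);
    // Identity reducer: copies (key, value) pairs straight through.
    job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);
    // Keep LongWritable map keys so records spread across all reducers;
    // ProtobufLineMsgWritable is the value type from your post.
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(ProtobufLineMsgWritable.class);

Each reducer then writes a single part file, so you end up with exactly as
many merged files as reduce tasks.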


> Thanks to anyone who can give me some pointers :)
> --Leo
>
