hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: Is it possible to use NullWritable in combiner? + general question about combining output from many small maps
Date Wed, 21 Jul 2010 18:20:46 GMT
Hi Leo,

I am confused: how do you want to partition the work between multiple
reducers if the map-emitted key is NULL?  If you don't (say you want to
reduce everything in one reducer), then the key type and value should not
matter: just emit a constant key of any type and discard it later on.

Alex K
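Alex's constant-key trick can be sketched in Hadoop Streaming terms. This is a Python sketch of the idea, not code from the thread: the tab-separated key/value lines are the Streaming convention, and the constant "0" is an arbitrary choice.

```python
import sys

CONSTANT_KEY = "0"  # any fixed key; it exists only to route all values to one reducer

def mapper(lines, out=sys.stdout):
    """Emit every input value under a single constant key."""
    for line in lines:
        value = line.rstrip("\n")
        out.write(f"{CONSTANT_KEY}\t{value}\n")

def reducer(lines, out=sys.stdout):
    """Discard the constant key and write only the values."""
    for line in lines:
        _key, _, value = line.rstrip("\n").partition("\t")
        out.write(value + "\n")
```

Because every record shares one key, the default hash partitioner sends everything to a single reducer, which is exactly the "reduce everything in one reducer" case described above.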

On Wed, Jul 21, 2010 at 1:59 AM, Leo Alekseyev <dnquark@gmail.com> wrote:

> Hi All,
> I have a job where all processing is done by the mappers, but each
> mapper produces a small file, which I want to combine into 3-4 large
> ones.  In addition, I only care about the values, not the keys, so
> NullWritable key is in order.  I tried using the default reducer
> (which according to the docs is identity) by setting
> job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class) and
> using a NullWritable key on the mapper output.  However, this seems to
> concentrate the work on one reducer only.  I then tried to output
> LongWritable as the mapper key, and write a combiner to output
> NullWritable (i.e. class GenerateLogLineProtoCombiner extends
> Reducer<LongWritable, ProtobufLineMsgWritable, NullWritable,
> ProtobufLineMsgWritable>); still using the default reducer.  This gave
> me the following error thrown by the combiner:
> 10/07/21 01:21:38 INFO mapred.JobClient: Task Id :
> attempt_201007122205_1058_m_000104_2, Status : FAILED
> java.io.IOException: wrong key class: class
> org.apache.hadoop.io.NullWritable is not class
> org.apache.hadoop.io.LongWritable
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
>          .........
> I was able to get things working by explicitly putting in an identity
> reducer that takes (LongWritable key, value) and outputs
> (NullWritable, value).  However, now most of my processing is in the
> reduce phase, which seems like a waste -- it's copying and sorting
> data, but all I really need is to "glue" together the small map
> outputs.
> Thus, my questions are: first, I don't really understand why the
> combiner is throwing an error here.  Does it simply not allow
> NullWritables on the output?...
> The second question is: is there a standard strategy for quickly
> combining the many small map outputs?  Is it worth, perhaps, looking
> into adjusting the min split size for the mappers?.. (can this value
> be adjusted dynamically based on the input file size?..)
> Thanks to anyone who can give me some pointers :)
> --Leo
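On the combiner error: a combiner's output is written back into the same map-side spill files as the map output, so Hadoop requires the combiner's output key/value classes to equal the declared map output classes; that check (in IFile.Writer.append, the frame in Leo's stack trace) is what rejects the NullWritable key. A toy Python analogue of the check, with placeholder classes standing in for the Writables (this is an illustration, not Hadoop code):

```python
class LongWritable:
    """Stand-in for org.apache.hadoop.io.LongWritable."""

class NullWritable:
    """Stand-in for org.apache.hadoop.io.NullWritable."""

def append(declared_key_class, key):
    """Mimic the spill writer's key-class check: because combiner output
    re-enters the map-side sort/spill path, its key class must equal the
    declared map output key class."""
    if type(key) is not declared_key_class:
        raise IOError(
            f"wrong key class: {type(key).__name__} "
            f"is not {declared_key_class.__name__}"
        )

# A combiner emitting NullWritable keys while the map output key class is
# LongWritable trips the same check Leo hit:
# append(LongWritable, NullWritable())  -> IOError: wrong key class ...
```

This is why the NullWritable conversion has to happen in the reducer (whose output types are independent of the map output types), not in the combiner.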
