hadoop-common-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject manipulating key in combine phase
Date Sun, 12 Jan 2014 16:25:58 GMT
Hi all,

I was wondering whether it is possible to manipulate the key during the combine phase.

Say I have a MapReduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more
than, say, 100 qualifiers.
In the combiner class I would do something like:

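// Inside the combiner's reduce(key, values, context); assumes Text keys
Text newKey = new Text(key.toString() + "_" + UUID.randomUUID());
int count = 0;
for (Writable value : values) {
  // more than 100 values: send the rest to the derived key
  if (++count > 100) {
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}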

where newKey is something like the original key plus a random UUID.
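To check the splitting rule in isolation, here is a minimal sketch outside Hadoop, assuming String keys and values. `KeySplitSketch`, `split` and the `limit` parameter are hypothetical names, not Hadoop API. It keeps the first `limit` values under the original key and sends the rest to one derived key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class KeySplitSketch {
    // Hypothetical helper: the first `limit` values keep `key`; every value
    // after that goes to a single derived key (key + "_" + random UUID).
    public static List<Map.Entry<String, String>> split(String key, Iterable<String> values, int limit) {
        String newKey = key + "_" + UUID.randomUUID();
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int count = 0;
        for (String value : values) {
            if (++count > limit) {
                out.add(new SimpleEntry<>(newKey, value));
            } else {
                out.add(new SimpleEntry<>(key, value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 150; i++) values.add("q" + i);
        List<Map.Entry<String, String>> out = split("row1", values, 100);
        long kept = out.stream().filter(e -> e.getKey().equals("row1")).count();
        System.out.println(kept + " kept, " + (out.size() - kept) + " moved"); // 100 kept, 50 moved
    }
}
```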

I know that the combiner can be called "zero, one or more times", and I'm
getting strange results (the same key written more than once), so I would be
glad to get some deeper insight into how the combiner works.
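A small simulation may help show why the results vary, assuming (hypothetically) that the framework runs the combiner once per spill and each spill holds only part of the values for a key. `CombinerPerSpillSketch` and its methods are illustrative names, not Hadoop classes. Because the counter restarts on every combiner call and each call draws a fresh UUID, the output depends on how the values happen to be batched.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class CombinerPerSpillSketch {
    // Same hypothetical splitting rule as in the question: the first `limit`
    // values keep `key`, the rest go to key + "_" + a fresh random UUID.
    public static List<Map.Entry<String, String>> combine(String key, List<String> values, int limit) {
        String newKey = key + "_" + UUID.randomUUID();
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int count = 0;
        for (String value : values) {
            out.add(new SimpleEntry<>(++count > limit ? newKey : key, value));
        }
        return out;
    }

    // Assumed scheduling: one combiner call per spill, outputs concatenated.
    public static List<Map.Entry<String, String>> combinePerSpill(String key, List<List<String>> spills, int limit) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (List<String> spill : spills) {
            out.addAll(combine(key, spill, limit));
        }
        return out;
    }

    public static long countUnder(List<Map.Entry<String, String>> records, String key) {
        return records.stream().filter(e -> e.getKey().equals(key)).count();
    }

    public static long distinctKeys(List<Map.Entry<String, String>> records) {
        return records.stream().map(Map.Entry::getKey).distinct().count();
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 250; i++) all.add("q" + i);
        // The same 250 values, split into two spills of 120 and 130.
        List<List<String>> spills = List.of(all.subList(0, 120), all.subList(120, 250));

        List<Map.Entry<String, String>> once = combine("row1", all, 100);
        List<Map.Entry<String, String>> perSpill = combinePerSpill("row1", spills, 100);

        System.out.println(countUnder(once, "row1") + " " + distinctKeys(once));         // 100 2
        System.out.println(countUnder(perSpill, "row1") + " " + distinctKeys(perSpill)); // 200 3
    }
}
```

With one call over all 250 values, 100 stay under `row1` and one derived key appears; with one call per spill, 200 stay under `row1` and two different derived keys appear, so under this assumption the 100-value limit only holds per combiner call.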

Thanks,

Amit.
