hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: manipulating key in combine phase
Date Mon, 13 Jan 2014 00:28:55 GMT
Isn't this what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a "mapper-side pre-reducer" and operates
on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem*
like a good idea.
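The mapper-side split suggested above could look roughly like the following. This is a minimal plain-Java sketch, not the Hadoop API: `splitKey`, `MAX_QUALIFIERS`, and the `-partN` suffix scheme are all hypothetical illustrations (the original poster proposed `key + randomUUID`; a deterministic chunk index is used here only so the result is easy to inspect).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a key's qualifiers into chunks of at most
// 100 in the *map* phase, so the shuffle already treats the chunks as
// distinct reduce keys and no key rewriting happens in the combiner.
public class KeySplitter {
    static final int MAX_QUALIFIERS = 100;

    // Returns one output key per chunk of up to MAX_QUALIFIERS qualifiers.
    // The first chunk keeps the original key; later chunks get a suffix.
    public static List<String> splitKey(String key, List<String> qualifiers) {
        List<String> outKeys = new ArrayList<>();
        for (int start = 0; start < qualifiers.size(); start += MAX_QUALIFIERS) {
            outKeys.add(start == 0 ? key : key + "-part" + (start / MAX_QUALIFIERS));
        }
        return outKeys;
    }

    public static void main(String[] args) {
        List<String> quals = new ArrayList<>();
        for (int i = 0; i < 250; i++) quals.add("q" + i);
        // 250 qualifiers -> 3 output keys: rowA, rowA-part1, rowA-part2
        System.out.println(splitKey("rowA", quals));
    }
}
```

Because the split happens before the shuffle, each chunk is sorted and partitioned under its own key, which is exactly the invariant the combiner is not allowed to break.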

From: Amit Sela [mailto:amits@infolinks.com]
Sent: Sunday, January 12, 2014 9:26 AM
To: user@hadoop.apache.org
Subject: manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more than, say 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value : values) {
  if (++count >= 100) {
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}
where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." times, and I'm getting strange
results (the same key written more than once), so I would be glad to get some deeper insight
into how the combiner works.
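The "same key written more than once" result is what you'd expect from this logic, since Hadoop may invoke the combiner separately on partial batches of a key's values (e.g. one call per map-side spill), and each invocation restarts the counter. A plain-Java simulation (no Hadoop APIs; `combine` and the `#`-suffix are hypothetical stand-ins for the code above) makes this visible:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Simulation of the combiner logic from the question: each combiner
// invocation counts values from zero, so when the framework combines a
// key's values in several partial calls, every call re-emits the
// original, unmodified key for its first 99 values.
public class CombinerSimulation {

    // One combiner invocation over one batch of values for `key`.
    static List<String> combine(String key, List<String> values) {
        List<String> outputKeys = new ArrayList<>();
        int count = 0;
        for (String value : values) {
            if (++count >= 100) {
                outputKeys.add(key + "#" + UUID.randomUUID()); // the "newKey"
            } else {
                outputKeys.add(key);
            }
        }
        return outputKeys;
    }

    public static void main(String[] args) {
        // 300 values for one key, delivered as two combiner calls of 150
        // each, as can happen with multiple map-side spills.
        int callsEmittingPlainKey = 0;
        for (int call = 0; call < 2; call++) {
            List<String> out = combine("rowA", Collections.nCopies(150, "v"));
            if (out.contains("rowA")) callsEmittingPlainKey++;
        }
        // Both invocations emit the unmodified key "rowA": the same key is
        // written more than once, and the split point is per-call, not global.
        System.out.println(callsEmittingPlainKey); // 2
    }
}
```

The counter state is per invocation, so the 100-value threshold is never measured against the key's full value set, only against whatever batch the framework happened to hand that one combiner call.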


