hadoop-common-user mailing list archives

From Devin Suiter RDX <dsui...@rdx.com>
Subject Re: manipulating key in combine phase
Date Mon, 13 Jan 2014 13:06:02 GMT
Amit,

Have you explored the ChainMapper class?

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <john.lilley@redpoint.net> wrote:

> Isn’t this what you’d normally do in the Mapper?
>
> My understanding of the combiner is that it is like a “mapper-side
> pre-reducer” and operates on blocks of data that have already been sorted
> by key, so mucking with the keys doesn’t **seem** like a good idea.
>
> john
>
>
>
> *From:* Amit Sela [mailto:amits@infolinks.com]
> *Sent:* Sunday, January 12, 2014 9:26 AM
> *To:* user@hadoop.apache.org
> *Subject:* manipulating key in combine phase
>
>
>
> Hi all,
>
>
>
> I was wondering if it is possible to manipulate the key during combine:
>
>
>
> Say I have a mapreduce job where the key has many qualifiers.
>
> I would like to "split" the key into two (or more) keys if it has more
> than, say, 100 qualifiers.
>
> In the combiner class I would do something like:
>
>
>
> int count = 0;
> for (Writable value : values) {
>   if (++count >= 100) {
>     context.write(newKey, value);
>   } else {
>     context.write(key, value);
>   }
> }
>
>
>
> where newKey is something like key+randomUUID
>
>
>
> I know that the combiner can be called "zero, once or more..." and I'm
> getting strange results (same key written more than once), so I would be
> glad to get some deeper insight into how the combiner works.
>
>
>
> Thanks,
>
>
>
> Amit.
>
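
To illustrate John's point about doing the split in the Mapper: because the combiner may run zero, one, or several times over sorted blocks, a random suffix added there is not stable across runs. A minimal sketch of a deterministic alternative follows, in plain Java with no Hadoop dependency. The KeySalter class, the "#" separator, and the per-key value index are all assumptions made for illustration; they are not part of the Hadoop API, and a real mapper would need its own way of counting values per key within a task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: derives sub-keys deterministically
// so the split can happen before the shuffle rather than in the combiner.
class KeySalter {
    // Assumed separator; choose one that cannot occur in real keys.
    static final String SEPARATOR = "#";

    // Returns the sub-key for the n-th (0-based) value seen for `key`.
    // The first `maxPerKey` values keep the original key; later values
    // get a numeric bucket suffix instead of a random UUID.
    static String saltedKey(String key, int valueIndex, int maxPerKey) {
        int bucket = valueIndex / maxPerKey;
        return bucket == 0 ? key : key + SEPARATOR + bucket;
    }

    // Simulates the (sub-key, value) pairs a mapper would emit for one key,
    // grouped by sub-key in emission order.
    static Map<String, List<String>> split(String key, List<String> values, int maxPerKey) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            String subKey = saltedKey(key, i, maxPerKey);
            out.computeIfAbsent(subKey, k -> new ArrayList<>()).add(values.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            values.add("q" + i);
        }
        // Prints each sub-key with the number of values assigned to it.
        split("row1", values, 100)
            .forEach((k, v) -> System.out.println(k + " -> " + v.size()));
    }
}
```

With 250 values and a limit of 100, this yields three sub-keys holding 100, 100, and 50 values. Since the suffix depends only on the key and the value's position, the same input always maps to the same sub-key, which a combiner can then aggregate without changing keys.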
