hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Basic Question
Date Tue, 07 Aug 2012 18:33:55 GMT
Each write call registers (writes) a KV pair to the output. The output
collector does not look for similar keys or values, nor does it try to
de-duplicate them. Even if you pass the same object each time, its
contents are copied at write time, so reusing the object doesn't matter.
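
To make the copy-on-write behaviour concrete, here is a minimal sketch
that does not use Hadoop at all. MutableText and Collector are
hypothetical stand-ins for Text and the output collector; the only
thing they are assumed to share with the real classes is that the
bytes are copied at the moment write() is called.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
    // Hypothetical stand-in for Hadoop's Text: a mutable, reusable holder.
    static final class MutableText {
        private byte[] bytes = new byte[0];

        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        byte[] copyBytes() {
            return bytes.clone();
        }
    }

    // Hypothetical stand-in for the output collector. It snapshots the
    // key and value on every write and never compares or de-duplicates.
    static final class Collector {
        final List<String[]> records = new ArrayList<>();

        void write(MutableText key, MutableText value) {
            records.add(new String[] {
                new String(key.copyBytes(), StandardCharsets.UTF_8),
                new String(value.copyBytes(), StandardCharsets.UTF_8)
            });
        }
    }

    // Reuses the same two objects for both writes, as in the question.
    static List<String[]> demo() {
        Collector collector = new Collector();
        MutableText zip = new MutableText();
        MutableText value = new MutableText();

        zip.set("9099");
        value.set("first");
        collector.write(zip, value);

        zip.set("9099");
        value.set("second");
        collector.write(zip, value);

        return collector.records;
    }

    public static void main(String[] args) {
        for (String[] r : demo()) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

Both records survive with their own values, because each write took
its own copy before the objects were changed again.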

So you will get two KV pairs in your output, since duplication is
allowed and is normal in several MR cases. Think of wordcount, where a
map() call may emit lots of ("is", 1) pairs if there are multiple "is"
in the line it processes, and can reuse a single object via set() calls
to avoid creating too many objects.
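
The wordcount case can be simulated in plain Java as a rough sketch.
The map() and shuffle() methods below are illustrative helpers, not
Hadoop APIs: map() emits one (word, 1) pair per token without
de-duplicating, and shuffle() groups all emitted values under their
key, which approximates what a reducer would be handed.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Simulated map(): one (word, 1) pair per token, duplicates included.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringBuilder word = new StringBuilder(); // reused for every token
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            word.setLength(0);
            word.append(token);
            // toString() copies the current contents, so later reuse of
            // the builder cannot change pairs that were already emitted.
            out.add(new AbstractMap.SimpleEntry<>(word.toString(), 1));
        }
        return out;
    }

    // Simulated shuffle: collects every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = shuffle(map("it is what it is"));
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the line "it is what it is", the key "is" arrives with two values
rather than one, which is what lets a reducer sum them into a count.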

On Tue, Aug 7, 2012 at 11:56 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> In Mapper I often use a Global Text object and throughout the map processing
> I just call "set" on it. My question is, what happens if collector receives
> similar byte array value. Does the last one overwrite the value in
> collector? So if I did
> Text zip = new Text();
> zip.set("9099");
> collector.write(zip,value);
> zip.set("9099");
> collector.write(zip,value1);
> Should I expect to receive both values in reducer or just one?

Harsh J
