hadoop-common-user mailing list archives

From Matthew John <tmatthewjohn1...@gmail.com>
Subject Reduce side join
Date Mon, 18 Oct 2010 08:16:43 GMT
Hi all,

   I am working on a join operation using Hadoop. I came across the
reduce-side join in "Hadoop: The Definitive Guide". As far as I understand,
the technique works as follows:

1) Read the two inputs using separate mappers and tag each input with a
different value, so that in the sort/shuffle phase the primary-key record
(there is only one record with that key) comes before the records that carry
the same key as a foreign key.

2) In the reduce phase, read the required portion of the first record into a
variable and append it to each of the records that follow.
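
To check that I have the two steps right, here is a minimal local sketch in
plain Python (it is not Hadoop code). The sample records, the 0/1 tags, and
the names map_primary, map_foreign, reduce_join and run_join are all made up
for illustration, and the sort call only stands in for the sort/shuffle phase.

```python
from itertools import groupby

# Hypothetical sample inputs: (station_id, name) is the primary-key side,
# (station_id, reading) is the foreign-key side.
stations = [("s1", "Oslo"), ("s2", "Bergen")]
readings = [("s2", 7), ("s1", 3), ("s1", 5)]


def map_primary(record):
    key, value = record
    # Tag 0 sorts before tag 1, so the primary record leads its group.
    return ((key, 0), value)


def map_foreign(record):
    key, value = record
    return ((key, 1), value)


def reduce_join(join_key, values):
    # Assumes the first value is the primary record, i.e. every join key
    # has exactly one primary record in the input.
    values = iter(values)
    primary = next(values)
    return [(join_key, primary, foreign) for foreign in values]


def run_join(primary_input, foreign_input):
    mapped = [map_primary(r) for r in primary_input]
    mapped += [map_foreign(r) for r in foreign_input]
    # Stand-in for sort/shuffle: order by (join key, tag) only.
    mapped.sort(key=lambda kv: kv[0])
    joined = []
    # Group on the join key alone, ignoring the tag.
    for join_key, group in groupby(mapped, key=lambda kv: kv[0][0]):
        joined.extend(reduce_join(join_key, (v for _, v in group)))
    return joined


result = run_join(stations, readings)
```

Here result holds one joined tuple per foreign record, each carrying the
name taken from the primary record of the same key.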

My question is:
Is it fine to have more than one set of input records (a primary record
followed by its foreign records) in the same reduce phase?
For example, will this technique work if I have just one reducer running?
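
To make the question concrete, here is a small local simulation in plain
Python (again, not Hadoop code). It assumes that records are partitioned on
the join key only and that each simulated reducer groups its sorted input by
join key. The names partition, run_reducer and join_with, the crc32-based
partitioner, and the sample data are all hypothetical. Under those
assumptions one reducer and three reducers give the same join, but this does
not prove that a real Hadoop job behaves the same way.

```python
from itertools import groupby
import zlib


def partition(join_key, num_reducers):
    # Deterministic stand-in for a hash partitioner on the join key only.
    return zlib.crc32(join_key.encode("utf-8")) % num_reducers


def run_reducer(tagged_records):
    # tagged_records: list of ((join_key, tag), value),
    # where tag 0 marks the primary record and tag 1 a foreign record.
    out = []
    ordered = sorted(tagged_records, key=lambda kv: kv[0])
    # One pass over the sorted input, one group per join key.
    for join_key, group in groupby(ordered, key=lambda kv: kv[0][0]):
        values = [v for _, v in group]
        primary, foreign = values[0], values[1:]
        out.extend((join_key, primary, f) for f in foreign)
    return out


def join_with(num_reducers, tagged_records):
    buckets = [[] for _ in range(num_reducers)]
    for record in tagged_records:
        buckets[partition(record[0][0], num_reducers)].append(record)
    joined = []
    for bucket in buckets:
        joined.extend(run_reducer(bucket))
    # Sort the combined output so runs are comparable across reducer counts.
    return sorted(joined)


tagged = [
    (("s1", 0), "Oslo"), (("s2", 0), "Bergen"), (("s3", 0), "Tromso"),
    (("s1", 1), 3), (("s3", 1), 9), (("s2", 1), 7), (("s1", 1), 5),
]
one = join_with(1, tagged)
three = join_with(3, tagged)
```

With num_reducers set to 1, all three key groups pass through the same
run_reducer call, which is the situation I am asking about.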


Matthew John
