hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From abc xyz <fabc_xyz...@yahoo.com>
Subject Hashing two relations
Date Sat, 03 Jul 2010 06:10:14 GMT
Hey Folks,

I have to mess around with hashing. I want to take two input sources, partition 
them using hash function, then make the in-memory hash table for each partition 
of one sources, and compare the hash of each record of the same partition of the 
other table against it for joining these two. 

I know that map-side join does this (on pre-partitioned data), but I want to do 
it on reduce side. Using job-chaining, I can output (hash(key), value) by two 
map tasks on the two input files, but when it comes to the reduce stage, i have 
to take the same partition from both the hash tables. I am not sure how can I 
accomplish this. Any guidance in this regards would be appreciated.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message