hadoop-general mailing list archives

From abc xyz <fabc_xyz...@yahoo.com>
Subject Re: Partitioned Datasets Map/Reduce
Date Tue, 06 Jul 2010 08:50:14 GMT

Well, I want to do some experimentation with Hadoop. I need to partition two
datasets using the same partitioning function and then, in the next job, take the
same partition from both datasets and apply some operation in the mapper. But
how do I ensure that one mapper gets the same partition from both sources?
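
The idea in the question can be sketched outside Hadoop as follows. This is only an illustration in plain Python, not Hadoop API code: the names `partition`, `partition_dataset`, `join_partition` and `co_partitioned_join`, the partition count of 4, and the sample data are all assumptions made up for the example. It shows that if both datasets go through the same deterministic partition function with the same number of partitions, then partition i of one dataset can only match keys in partition i of the other, so each pair can be processed independently.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: both datasets use the same partition count


def partition(key, num_partitions=NUM_PARTITIONS):
    # Deterministic hash (crc32), so both datasets agree on where a key goes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_dataset(records, num_partitions=NUM_PARTITIONS):
    # Split (key, value) records into one list per partition.
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[partition(key, num_partitions)].append((key, value))
    return parts


def join_partition(left_part, right_part):
    # Stand-in for the per-mapper work: join one pair of matching partitions.
    index = defaultdict(list)
    for key, value in left_part:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right_part for lv in index.get(key, [])]


def co_partitioned_join(left, right):
    left_parts = partition_dataset(left)
    right_parts = partition_dataset(right)
    joined = []
    # Partition i of the left dataset is paired only with partition i of the right.
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(join_partition(left_part, right_part))
    return joined


# Hypothetical sample data.
users = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]
orders = [("u1", "book"), ("u3", "pen"), ("u1", "lamp"), ("u4", "ink")]
result = co_partitioned_join(users, orders)
```

In this sketch the pairing is done by list position. In an actual job the equivalent step would be matching the corresponding output files of the two partitioning jobs, which presumably requires both jobs to run with the same partitioner and the same number of reducers.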

From: Hemanth Yamijala <yhemanth@gmail.com>
To: general@hadoop.apache.org
Sent: Tue, July 6, 2010 5:40:49 AM
Subject: Re: Partitioned Datasets Map/Reduce


> I have written a custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then, in the next
> MapReduce job, I want each mapper to handle the same partition from the two
> sources and perform some function such as a join. How can I ensure that
> one mapper gets the split that corresponds to the same partition from both
> sources?

Not really an answer to your specific question, but have you taken a
look at Pig (http://hadoop.apache.org/pig) which is suitable for
operations like joining data sets?
