hadoop-common-user mailing list archives

From "Rahul Sood" <rs...@yahoo-inc.com>
Subject contrib join package
Date Fri, 05 Sep 2008 12:35:21 GMT
Hi,

Is there any detailed documentation on the
org.apache.hadoop.contrib.utils.join package? I have a simple join task
consisting of two input datasets, each containing tab-separated records.

 

Set1: Record format = field1\tfield2\tfield3\tfield4\tfield5
Set2: Record format = field1\tfield2\tfield3

Join criterion: Set1.field1 = Set2.field1

Output: Set2.field2\tSet1.field2\tSet1.field3\tSet1.field4
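
To make the intended result concrete, here is a minimal in-memory sketch of the join in plain Java. It does not use Hadoop or the contrib join package at all. The class name SimpleJoin and the method innerJoin are hypothetical names chosen for illustration, and the sketch assumes both datasets fit in memory and that every record has the stated number of fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: an in-memory inner join that
// mirrors the record layouts and output format described above.
class SimpleJoin {
    static List<String> innerJoin(List<String> set1, List<String> set2) {
        // Index Set2 by field1 so each Set1 record can look up its matches.
        Map<String, List<String[]>> set2ByKey = new HashMap<>();
        for (String line : set2) {
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
            set2ByKey.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields);
        }

        List<String> out = new ArrayList<>();
        for (String line : set1) {
            String[] a = line.split("\t", -1);
            List<String[]> matches = set2ByKey.get(a[0]);
            if (matches == null) {
                continue; // inner join: Set1 records without a match are dropped
            }
            for (String[] b : matches) {
                // Output: Set2.field2, Set1.field2, Set1.field3, Set1.field4
                out.add(String.join("\t", b[1], a[1], a[2], a[3]));
            }
        }
        return out;
    }
}
```

For example, joining the Set1 record "k1\ta2\ta3\ta4\ta5" with the Set2 record "k1\tx2\tx3" would yield "x2\ta2\ta3\ta4", while records whose field1 has no counterpart in the other set produce no output.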

 

The org.apache.hadoop.contrib.utils.join package contains DataJoinMapperBase
and DataJoinReducerBase abstract classes, and a TaggedMapOutput class which
should be the base class for the mapper output values. But there aren't any
examples showing how these classes should be used to implement inner or
outer joins in a generic manner.
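
For reference, the general tag / group / combine pattern that such classes are usually built around can be sketched in plain Java as follows. This is only an illustration of the idea and not the contrib package's API: TaggedJoinSketch, Tagged, group and join are hypothetical names, the grouping step stands in for what the shuffle would do, and the left-outer behaviour (an empty Set2.field2 for unmatched Set1 records) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the tag / group / combine pattern. None of these
// names are Hadoop classes; they only mimic the roles described above.
class TaggedJoinSketch {
    // A record plus the name of the dataset it came from (its "tag").
    static final class Tagged {
        final String tag;
        final String[] fields;

        Tagged(String tag, String line) {
            this.tag = tag;
            this.fields = line.split("\t", -1);
        }
    }

    // "Map" side: tag every record and group it under its join key (field1).
    static Map<String, List<Tagged>> group(List<String> set1, List<String> set2) {
        Map<String, List<Tagged>> groups = new TreeMap<>(); // sorted keys, stable output
        for (String line : set1) add(groups, new Tagged("Set1", line));
        for (String line : set2) add(groups, new Tagged("Set2", line));
        return groups;
    }

    private static void add(Map<String, List<Tagged>> groups, Tagged t) {
        groups.computeIfAbsent(t.fields[0], k -> new ArrayList<>()).add(t);
    }

    // "Reduce" side: split each group by tag, then combine the two sides.
    static List<String> join(Map<String, List<Tagged>> groups, boolean leftOuter) {
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : groups.values()) {
            List<Tagged> left = new ArrayList<>();
            List<Tagged> right = new ArrayList<>();
            for (Tagged t : group) {
                if (t.tag.equals("Set1")) left.add(t); else right.add(t);
            }
            for (Tagged a : left) {
                if (right.isEmpty() && leftOuter) {
                    // Assumed convention: empty Set2.field2 when there is no match.
                    out.add(String.join("\t", "", a.fields[1], a.fields[2], a.fields[3]));
                }
                for (Tagged b : right) {
                    out.add(String.join("\t", b.fields[1], a.fields[1], a.fields[2], a.fields[3]));
                }
            }
        }
        return out;
    }
}
```

Calling join(group(set1, set2), false) gives an inner join, and passing true keeps unmatched Set1 records as well. In a real job the tagging would happen in the mapper and the per-key combination in the reducer.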

 

If anybody has used this package and would like to share their experience,
please let me know.

 

Thanks,
Rahul Sood
rsood@yahoo-inc.com

 

