hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: JOIN-type operations with Hadoop...
Date Thu, 13 Sep 2007 14:43:00 GMT
We use the directory namespace to distinguish different types of files.
We wrote a simple wrapper around TextInputFormat/SequenceFileInputFormat
such that the key returned is the pathname (or some component of the
pathname). That way you can look at the key, decide what kind of
record structure the value encodes, and take the proper action.

Ping me if you want an example; I will be happy to share.
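
For the A B C / B D E case, a rough sketch of the idea in plain Python
follows. This is only a simulation of the map, shuffle, and reduce steps,
not the actual wrapper and not the Hadoop API. The directory names "left/"
and "right/" and the function names are made up for illustration; the only
thing taken from the approach above is that the map step sees the pathname
as its key and uses it to decide how to parse the value.

```python
from collections import defaultdict


def map_record(path, line):
    """Map step: the input key is the file path; it selects the record layout."""
    fields = line.split()
    if path.startswith("left/"):       # hypothetical directory holding A B C files
        a, b, c = fields
        return b, ("left", (a, c))
    if path.startswith("right/"):      # hypothetical directory holding B D E files
        b, d, e = fields
        return b, ("right", (d, e))
    raise ValueError(f"unrecognized input path: {path}")


def reduce_join(b, tagged_values):
    """Reduce step: pair each left record with the right record(s) sharing key b."""
    lefts = [v for tag, v in tagged_values if tag == "left"]
    rights = [v for tag, v in tagged_values if tag == "right"]
    for d, e in rights:
        for a, c in lefts:
            yield (a, b, c, d, e)


def run_join(inputs):
    """Stand-in for the shuffle: group mapped values by B, then reduce each group."""
    groups = defaultdict(list)
    for path, line in inputs:
        key, value = map_record(path, line)
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


inputs = [
    ("left/part-00000", "a1 b1 c1"),
    ("left/part-00000", "a2 b1 c2"),
    ("left/part-00001", "a3 b2 c3"),
    ("right/part-00000", "b1 d1 e1"),
    ("right/part-00000", "b2 d2 e2"),
    ("right/part-00000", "b3 d3 e3"),
]
joined = run_join(inputs)
for row in joined:
    print(" ".join(row))
```

Each output row has the form A B C D E. A value of B that shows up in only
one of the two directories (b3 here) produces no output, so this sketch
behaves like an inner join.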


-----Original Message-----
From: C G [mailto:parallelguy@yahoo.com] 
Sent: Thursday, September 13, 2007 7:11 AM
To: hadoop-user@lucene.apache.org
Subject: JOIN-type operations with Hadoop...

Consider two row-based files.  The first has fields:
   
      A B C
   
  the second has fields:
   
     B D E 
   
  I want to join these files on the key B, to create records of the
form:
   
    A B C D E
   
  So B can be thought of as a primary key, and the second file will only
contain distinct values of B, i.e. no repeats.
   
  I'm trying to reason through how to do this type of join operation in
Hadoop but am unsure how to proceed with different "types" of files.  
   
  Does the community have any wisdom to share?
   
  Thanks,
  C G

       