hadoop-common-user mailing list archives

From PeterAtReunion <pet...@mylife.com>
Subject Re: MultiFilterRecordReader
Date Thu, 19 Aug 2010 18:37:35 GMT
Lance -

Fun to see you on a mailing list.
How are things?


On 08/18/10 22:11, Lance Norskog wrote:
> Hadoop has a toolkit called 'map-side joins' which requires sorted
> input tables.  org.apache.hadoop.examples.Join.java shows how. Good
> luck decoding it!
> Could you use chained mapper tasks to sort each input set before using
> the join framework?
> On Wed, Aug 18, 2010 at 10:10 AM, y l <unoptenium@gmx.com> wrote:
>> Hi,
>> This is my first email on the list, and I'm pretty new to Hadoop
>> overall, so I'm hoping to find some help with a new task I have to do
>> for work.
>> I need to do a join between 2 sets of files. One is a bunch of csv
>> files and the other set is sequence files.
>> I was told MultiFilterRecordReader could help me do the join, but I
>> haven't been able to find a good example of where and how to use that
>> class to do the join.
>> I have found a good example using CompositeInputFormat here: http://www.congiu.com/node/5
>> But it assumes that the input is sorted, and I can't guarantee that it
>> will be, at least for the csv files.
>> Does anyone know what I need to do with MultiFilterRecordReader?
>> Inherit from it in the mapper? I'm a little confused... Please let me
>> know if you have any pointers on that one.
>> Thanks.
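
For anyone reading this thread later: a minimal sketch of why the map-side join approach discussed above wants sorted input. This is not Hadoop's join code, and the class and method names (MergeJoinSketch, mergeJoin) are made up for illustration. It assumes both inputs are sorted ascending by key and that each key appears at most once per input, and it walks the two lists in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {
    // Each row is a two-element array: {key, value}. Hypothetical layout,
    // chosen only to keep the sketch free of extra types.
    public static List<String> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Assumes both lists are sorted ascending by key, with unique keys.
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp == 0) {
                // Keys match: emit the joined row and advance both sides.
                out.add(left.get(i)[0] + ":" + left.get(i)[1] + ":" + right.get(j)[1]);
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, so it has no partner on the right
            } else {
                j++; // right key is smaller, so it has no partner on the left
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> csvRows = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"d", "4"});
        List<String[]> seqRows = Arrays.asList(
                new String[] {"b", "x"}, new String[] {"c", "y"}, new String[] {"d", "z"});
        System.out.println(mergeJoin(csvRows, seqRows));
    }
}
```

With the sample rows in main, the join produces one row each for keys b and d. If one side is not sorted, the single pass can skip past a key before its partner shows up and silently drop the match, which is the reason for sorting the csv side first.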
