hadoop-common-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: MultiFilterRecordReader
Date Thu, 19 Aug 2010 05:11:41 GMT
Hadoop has a toolkit for 'map-side joins', which requires sorted
input tables. The example class org.apache.hadoop.examples.Join
(Join.java) shows how to use it. Good luck decoding it!
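
To illustrate why a map-side join wants sorted input, here is a minimal
sketch of a merge join in plain Java. It is not Hadoop's implementation and
uses no Hadoop classes; the names MergeJoinSketch and Rec are made up for
this example. The point is only that when both inputs are sorted by key, a
single forward pass over each one is enough to pair matching records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {

    /** Hypothetical key/value record; both inputs must be sorted by key. */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Inner join of two key-sorted lists in one forward pass over each. */
    static List<String> join(List<Rec> left, List<Rec> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++;                       // left key is smaller: no match, advance left
            } else if (cmp > 0) {
                j++;                       // right key is smaller: no match, advance right
            } else {
                String key = left.get(i).key;
                int runStart = j;          // start of the matching run on the right
                // Emit every left/right pair that shares this key.
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (j = runStart;
                         j < right.size() && right.get(j).key.equals(key);
                         j++) {
                        out.add(key + "," + left.get(i).value + "," + right.get(j).value);
                    }
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the CSV records and the sequence-file records.
        List<Rec> csv = Arrays.asList(new Rec("a", "1"), new Rec("b", "2"), new Rec("d", "4"));
        List<Rec> seq = Arrays.asList(new Rec("b", "x"), new Rec("c", "y"), new Rec("d", "z"));
        for (String row : join(csv, seq)) {
            System.out.println(row);
        }
    }
}
```

If either list were out of order, the pass would step past keys it could
never revisit and silently drop matches, which is why unsorted input has to
be sorted first.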

Could you use chained mapper tasks to sort each input set before using
the join framework?

On Wed, Aug 18, 2010 at 10:10 AM, y l <unoptenium@gmx.com> wrote:
> Hi,
> This is my first email to the list, and I'm fairly new to Hadoop, so I'm hoping to find some help with a new task I have to do for work.
> I need to do a join between 2 sets of files. One is a bunch of CSV files and the other set is sequence files.
> I was told MultiFilterRecordReader could help me do the join, but I haven't been able to find a good example of where and how to use that class to do the join.
> I have found a good example using CompositeInputFormat here: http://www.congiu.com/node/5
> But it assumes that the input is sorted, and I can't guarantee that it will be, at least for the CSV files.
> Does anyone know what I need to do with MultiFilterRecordReader? Inherit from it in the mapper? I'm a little confused... Please let me know if you have any pointers on that one.
> Thanks.

Lance Norskog
