hadoop-common-user mailing list archives

From "Theocharis Ian Athanasakis" <tha...@thatha.org>
Subject Re: Reducer with two sets of inputs
Date Wed, 06 Aug 2008 02:48:07 GMT
Apologies for misphrasing my question.

Let me rephrase it: using the Hadoop Java APIs, is there a suggested
way of doing a pair-wise comparison between all LineRecords in a file?

More generically: is there a Hadoop Java API design pattern for a
reducer to iterate through all the records in another file stored on
HDFS?

I'm currently using the DistributedCache class to cache the reference
file locally. The shard a reducer is examining is always a part of the
reference file. My reducer, then, ends up doing all the comparisons
between its shard and the reference file.

When all of these get combined, I have my pair-wise comparison between
all records.
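
A minimal sketch of that approach, in plain Java with no Hadoop classes,
just to make the comparison logic concrete. The names PairwiseSketch,
compareShard and compare are hypothetical, and the comparison itself is a
placeholder. In a real job the reference list would presumably be read
inside the reducer from the locally cached copy of the file. The sketch
also assumes records are distinct and emits a pair only when the shard
record sorts before the reference record, so that each unordered pair
appears once after the shard outputs are combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairwiseSketch {

    /** Placeholder comparison; stands in for whatever the real job computes. */
    static int compare(String a, String b) {
        return Math.abs(a.length() - b.length());
    }

    /**
     * Compares every record in one shard against every record in the full
     * reference list. A pair is kept only when the shard record sorts
     * strictly before the reference record, which skips self-comparisons
     * and avoids emitting both (a, b) and (b, a).
     */
    static List<String> compareShard(List<String> shard, List<String> reference) {
        List<String> out = new ArrayList<>();
        for (String s : shard) {
            for (String r : reference) {
                if (s.compareTo(r) < 0) {
                    out.add(s + "\t" + r + "\t" + compare(s, r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList("a", "bb", "ccc", "dddd");
        // Two shards that together cover the whole reference file.
        List<List<String>> shards = Arrays.asList(
                reference.subList(0, 2), reference.subList(2, 4));

        // Combining the per-shard outputs yields all n*(n-1)/2 pairs.
        List<String> all = new ArrayList<>();
        for (List<String> shard : shards) {
            all.addAll(compareShard(shard, reference));
        }
        System.out.println(all.size() + " pairs");
        for (String line : all) {
            System.out.println(line);
        }
    }
}
```

With four distinct records this produces 6 pairs. Every shard still scans
the whole reference file, so the total work stays quadratic in the number
of records; the sharding only spreads it across reducers.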

Any better ways?

On Tue, Aug 5, 2008 at 11:20, Theocharis Ian Athanasakis
<thatha@thatha.org> wrote:
> What's the proposed design pattern for a reducer that needs two
> sets of inputs?
> Are there any source code examples?
>
> Thanks :)
>
