hadoop-common-user mailing list archives

From "Kylie McCormick" <kyliemccorm...@gmail.com>
Subject Hadoop Architecture Question: Distributed Information Retrieval
Date Thu, 10 Jul 2008 20:43:26 GMT
My name is Kylie McCormick, and I'm currently working on a distributed
information retrieval package for Hadoop, based on my previous work with
other middleware such as OGSA-DAI. I've been developing a design that fits
the structures of the other distributed IR systems I have put together.

Essentially, each service (search) returns a ResultSet, which is then merged
into a single FinalSet object as soon as it is returned to the main program.
Merging a ResultSet generally entails rescoring the documents and putting
them in the same OrderedList as documents from other services that have also
been rescored.
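
A minimal sketch of the merge step described above may help make the question concrete. The ScoredDoc class, the fields of ResultSet and FinalSet, and the rescoring rule (dividing each score by the service's top score) are all assumptions made for illustration; the message does not show the real classes or say how documents are rescored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for the types named in the message; the real
// ResultSet and FinalSet classes are not shown there.
class ScoredDoc {
    final String docId;
    final double score;
    ScoredDoc(String docId, double score) { this.docId = docId; this.score = score; }
}

class ResultSet {
    final String service;
    final List<ScoredDoc> docs;
    ResultSet(String service, List<ScoredDoc> docs) { this.service = service; this.docs = docs; }
}

class FinalSet {
    private final List<ScoredDoc> ordered = new ArrayList<>();

    // Rescore one service's documents (assumed rule: divide by that service's
    // top score), add them, and keep the merged list in descending score order.
    void merge(ResultSet rs) {
        double max = 0.0;
        for (ScoredDoc d : rs.docs) max = Math.max(max, d.score);
        for (ScoredDoc d : rs.docs) {
            double rescored = max > 0.0 ? d.score / max : 0.0;
            ordered.add(new ScoredDoc(d.docId, rescored));
        }
        // List.sort is stable, so ties keep their arrival order.
        ordered.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
    }

    List<ScoredDoc> docs() { return ordered; }
}
```

Each call to merge can happen as soon as a service returns, which matches the "merged as soon as it is returned" behaviour; only the rescoring rule would need to change for a different scoring model.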

I have redesigned this so that, in the Map phase, each service is invoked and
its ResultSet is collected by the OutputCollector. In the Reduce phase, I hope
to merge all the results together. Is it possible to have Reduce produce one
(and only one) output object?
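
One common way to get a single merged output is to have every map call emit under the same key, so that reduce is invoked exactly once and sees every ResultSet. The sketch below simulates that flow in plain Java without any Hadoop classes; the class name, the MERGE_KEY constant, and the use of bare score lists in place of ResultSet objects are assumptions for illustration. In Hadoop itself the job would presumably also be limited to one reduce task, for example with JobConf.setNumReduceTasks(1), so that a single output file is written.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the map -> shuffle -> reduce flow; no Hadoop classes are used.
class SingleKeyReduceSketch {
    // Hypothetical constant key shared by every map output.
    static final String MERGE_KEY = "merged";

    // Map: stands in for invoking one search service; emits its scores under MERGE_KEY.
    static SimpleEntry<String, List<Double>> map(List<Double> serviceScores) {
        return new SimpleEntry<>(MERGE_KEY, serviceScores);
    }

    // Reduce: invoked once per distinct key; merges every result list into one ordered list.
    static List<Double> reduce(String key, List<List<Double>> values) {
        List<Double> merged = new ArrayList<>();
        for (List<Double> v : values) merged.addAll(v);
        merged.sort(Collections.reverseOrder());
        return merged;
    }

    // Driver: groups map output by key (the shuffle) and calls reduce once per key.
    static List<List<Double>> run(List<List<Double>> services) {
        Map<String, List<List<Double>>> grouped = new LinkedHashMap<>();
        for (List<Double> s : services) {
            SimpleEntry<String, List<Double>> kv = map(s);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<List<Double>> outputs = new ArrayList<>();
        for (Map.Entry<String, List<List<Double>>> e : grouped.entrySet()) {
            outputs.add(reduce(e.getKey(), e.getValue()));
        }
        return outputs;
    }
}
```

Because there is only one distinct key, run returns a list with exactly one element, however many services were queried. The trade-off is that the whole merge runs in a single reduce call, so it does not parallelize.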

Thank you,

The Circle of the Dragon -- unlock the mystery that is the dragon.

"Light, seeking light, doth the light of light beguile!"
-- William Shakespeare's Love's Labor's Lost
