hadoop-common-user mailing list archives

From "Miles Osborne" <mi...@inf.ed.ac.uk>
Subject Re: Hadoop Architecture Question: Distributed Information Retrieval
Date Thu, 10 Jul 2008 20:54:40 GMT
If you tell Hadoop to use a single reducer, it should produce a single file
of output.
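Why one reducer means one file, and why that alone is not quite "one object": by default Hadoop's HashPartitioner routes each map output key to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Each reduce task writes its own output file, and within a task reduce() is called once per distinct key. The plain-Java simulation below has no Hadoop dependency. `ShuffleSim` and the keys are hypothetical. It mirrors that routing: with one reduce task everything lands in partition 0, but a single reduce() call (and so a single merged object) only happens if every value also shares one key. In a real job the reducer count is set with `JobConf.setNumReduceTasks(1)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of Hadoop's shuffle step (no Hadoop dependency).
class ShuffleSim {
    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns partition -> (key -> values). Each partition stands for one reduce
    // task (one output file); each inner key stands for one reduce() call.
    static Map<Integer, Map<String, List<String>>> shuffle(
            List<Map.Entry<String, String>> mapOutput, int numReduceTasks) {
        Map<Integer, Map<String, List<String>>> tasks = new TreeMap<>();
        for (int p = 0; p < numReduceTasks; p++) {
            tasks.put(p, new TreeMap<>()); // tasks with no input stay empty
        }
        for (Map.Entry<String, String> kv : mapOutput) {
            tasks.get(partitionFor(kv.getKey(), numReduceTasks))
                 .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                 .add(kv.getValue());
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Map output keyed by service name (hypothetical keys and values).
        List<Map.Entry<String, String>> byService = List.of(
                Map.entry("serviceA", "rsA"), Map.entry("serviceB", "rsB"), Map.entry("serviceC", "rsC"));
        // The same ResultSets emitted under one constant key.
        List<Map.Entry<String, String>> sameKey = List.of(
                Map.entry("all", "rsA"), Map.entry("all", "rsB"), Map.entry("all", "rsC"));
        // One reduce task: one output file, but still one reduce() call per service key.
        System.out.println(shuffle(byService, 1));
        // One reduce task and one key: a single reduce() call sees every ResultSet.
        System.out.println(shuffle(sameKey, 1));
    }
}
```

With a constant key such as the hypothetical `"all"`, every value reaches the same reduce task even if more reducers are configured. The other tasks simply receive no input.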

btw, you do know about Nutch I presume?


This is a distributed IR system built using Hadoop.

2008/7/10 Kylie McCormick <kyliemccormick@gmail.com>:

> Hello!
> My name is Kylie McCormick, and I'm currently working on creating a
> distributed information retrieval package with Hadoop based on my previous
> work with other middleware such as OGSA-DAI. I've been developing a design
> that works with the structures of the other systems I have put together for
> distributed IR.
> Essentially, each service (search) returns a ResultSet, which is then merged
> into a single FinalSet object as soon as it is returned to the main program.
> Merging a ResultSet generally entails rescoring the documents and putting
> them in the same OrderedList as documents from other services that have also
> been rescored.
> I have re-designed this so at the Map phase a service is invoked and the
> ResultSet is collected by the OutputCollector. In the Reduce phase, I hoped
> to merge all the results together. Is it possible to have reduce produce one
> (and only one) object output?
> Thank you,
> Kylie
> --
> The Circle of the Dragon -- unlock the mystery that is the dragon.
> http://www.blackdrago.com/index.html
> "Light, seeking light, doth the light of light beguile!"
> -- William Shakespeare's Love's Labor's Lost
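Once every ResultSet reaches a single reduce() call, the merge described in the question can run in memory: rescore each service's documents, then combine them into one ordered list. The sketch below is hypothetical. `ScoredDoc` and `ResultMerger` stand in for the ResultSet, FinalSet and OrderedList classes, which are not shown. Min-max normalisation is an assumed rescoring scheme, not necessarily the one the original system uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one scored document in a service's ResultSet.
class ScoredDoc {
    final String docId;
    final double score;

    ScoredDoc(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }

    @Override
    public String toString() {
        return docId + "=" + score;
    }
}

class ResultMerger {
    // Assumed rescoring scheme: min-max normalise one service's scores to [0, 1]
    // so that scores from different services become comparable.
    static List<ScoredDoc> rescore(List<ScoredDoc> resultSet) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (ScoredDoc d : resultSet) {
            min = Math.min(min, d.score);
            max = Math.max(max, d.score);
        }
        double range = max - min;
        List<ScoredDoc> rescored = new ArrayList<>();
        for (ScoredDoc d : resultSet) {
            // If every score is equal (range == 0), give each document 1.0.
            double norm = (range == 0) ? 1.0 : (d.score - min) / range;
            rescored.add(new ScoredDoc(d.docId, norm));
        }
        return rescored;
    }

    // Logic a single reduce() call could run once it receives every ResultSet:
    // rescore each set, then merge them into one list, best score first
    // (ties broken by document id so the order is deterministic).
    static List<ScoredDoc> merge(List<List<ScoredDoc>> resultSets) {
        List<ScoredDoc> finalSet = new ArrayList<>();
        for (List<ScoredDoc> resultSet : resultSets) {
            finalSet.addAll(rescore(resultSet));
        }
        finalSet.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed()
                .thenComparing(Comparator.comparing((ScoredDoc d) -> d.docId)));
        return finalSet;
    }

    public static void main(String[] args) {
        List<ScoredDoc> serviceA = List.of(
                new ScoredDoc("a1", 10), new ScoredDoc("a2", 5), new ScoredDoc("a3", 0));
        List<ScoredDoc> serviceB = List.of(
                new ScoredDoc("b1", 0.9), new ScoredDoc("b2", 0.3));
        // prints [a1=1.0, b1=1.0, a2=0.5, a3=0.0, b2=0.0]
        System.out.println(merge(List.of(serviceA, serviceB)));
    }
}
```

In the job itself, the mapper would emit each ResultSet under the same key and the job would run one reduce task, so this merge runs once and produces one FinalSet. Holding the merged list in memory is reasonable for top-k result lists. However, a single reducer is also a serialisation point, so this approach does not scale to very large merged outputs.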

The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
