hadoop-common-user mailing list archives

From "Kylie McCormick" <kyliemccorm...@gmail.com>
Subject Re: Hadoop Architecture Question: Distributed Information Retrieval
Date Thu, 10 Jul 2008 22:14:26 GMT
Thanks for the replies! If I use a single reducer, however, would it be
possible to have only one object (FinalSet) into which the Reduce function
merges its results? If not, I could redo the structure of the program, but
I was hoping to keep it as intact as possible.
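
To make the question concrete, here is a minimal sketch in plain Java (no
Hadoop classes) of the merge I have in mind. The shapes of Hit and FinalSet,
the score-based ranking, and the reduceAll helper are all assumptions made up
for illustration, not the real classes from my package. The idea is that one
reducer holds a single FinalSet and folds every partial result set into it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FinalSetSketch {

    /** One scored hit returned by a search service (hypothetical shape). */
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for FinalSet: one merged list, best score first. */
    static final class FinalSet {
        private final List<Hit> hits = new ArrayList<>();

        /** Folds one service's partial results into the merged list. */
        void merge(List<Hit> partial) {
            hits.addAll(partial);
            hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        }

        /** Returns a copy of the k best hits (fewer if the set is smaller). */
        List<Hit> top(int k) {
            return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
        }
    }

    /** Plays the role of the lone reducer: every partial set goes into one FinalSet. */
    static FinalSet reduceAll(List<List<Hit>> partials) {
        FinalSet finalSet = new FinalSet();
        for (List<Hit> partial : partials) {
            finalSet.merge(partial);
        }
        return finalSet;
    }

    public static void main(String[] args) {
        List<List<Hit>> partials = Arrays.asList(
                Arrays.asList(new Hit("docA", 0.9), new Hit("docB", 0.4)),
                Arrays.asList(new Hit("docC", 0.7)));
        for (Hit h : reduceAll(partials).top(2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

In a real job I assume the FinalSet would have to live as a field of the
reducer, so that it survives across reduce calls when the job is limited to
one reduce task. Whether that is the right pattern is what I am asking.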

Yes, I am aware of Nutch, and I've been using some of the documentation to
help with my new design. It's quite exciting! I'm hoping to have another
Java package with which to continue work on large TREC tracks.

My work with OGSA-DAI can be seen at
http://snowy.arsc.alaska.edu:8080/edu/arsc/multisearch/ if you're
interested, and by the end of the summer I hope to have a write-up that
discusses the differences (especially in performance) between the two. The
system from last year was used on this year's TREC collection (with 1,000
services and 10,000 queries) and performed fairly well. I'm hoping Hadoop
will make more sense and run faster.

Thank you,

On Thu, Jul 10, 2008 at 1:47 PM, Steve Loughran <stevel@apache.org> wrote:

> Kylie McCormick wrote:
>> Hello!
>> My name is Kylie McCormick, and I'm currently working on creating a
>> distributed information retrieval package with Hadoop based on my previous
>> work with other middlewares like OGSA-DAI. I've been developing a design
>> that works with the structures of the other systems I have put together
>> for distributed IR.
> It would be interesting to see your write up of the different experiences
> that OGSA-DAI's storage model offers versus that of hadoop.
> -steve

The Circle of the Dragon -- unlock the mystery that is the dragon.

"Light, seeking light, doth the light of light beguile!"
-- William Shakespeare's Love's Labor's Lost
