hadoop-common-user mailing list archives

From maar...@sherpa-consulting.be
Subject Re: Hadoop + Lucene integration: possible? how?
Date Mon, 15 Jan 2007 13:09:34 GMT
Thanks Andrzej,

Let me quickly explain my situation:

I'm developing an application that is partially based on 'tags'
(the new hype, lol). Instead of using an RDBMS for full-text searching
of the tag list / item, I'll be using Lucene. The application will have
about 100,000 visitors per day, who will mostly (or only) be searching
and not adding content. At the moment I have no idea how Lucene will
perform when all these users are hitting it, which is why I was
looking for a distributed solution and found Hadoop. I'll be adding
and removing indexes; is removing possible on Hadoop, given that you
mentioned read-only?

Do you have any idea whether the scenario above can easily be handled by
Lucene (best guess), or whether I will indeed need some kind of DFS? And
if so, do you have any suggestions?
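
For a rough sense of scale, here is a back-of-the-envelope sketch of the
query load implied by the visitor count above. Only the 100,000 visitors
per day comes from the description; the searches-per-visitor and peak-factor
values are assumed placeholders, not measurements, and the class and method
names are purely illustrative.

```java
// Back-of-the-envelope load estimate; not a benchmark.
class LoadEstimate {

    // Average queries per second, spreading the daily traffic evenly over 24 hours.
    static double averageQps(long visitorsPerDay, double searchesPerVisitor) {
        final double secondsPerDay = 24 * 60 * 60; // 86,400 seconds
        return visitorsPerDay * searchesPerVisitor / secondsPerDay;
    }

    // Peak queries per second, scaling the average by an assumed burst factor.
    static double peakQps(double averageQps, double peakFactor) {
        return averageQps * peakFactor;
    }

    public static void main(String[] args) {
        long visitorsPerDay = 100_000;   // from the description above
        double searchesPerVisitor = 5.0; // ASSUMPTION: placeholder value
        double peakFactor = 10.0;        // ASSUMPTION: placeholder value

        double avg = averageQps(visitorsPerDay, searchesPerVisitor);
        double peak = peakQps(avg, peakFactor);

        System.out.println("average queries/s: " + avg);
        System.out.println("peak queries/s:    " + peak);
    }
}
```

With these placeholder values the estimate comes out at roughly 6 queries
per second on average and roughly 58 at peak; real numbers would have to
come from measuring the actual application.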

Thanks in advance!


Quoting Andrzej Bialecki <ab@getopt.org>:

> maarten@sherpa-consulting.be wrote:
>> I'm new to lucene and Hadoop but what I can't seem to find in the   
>> docs, internet... is how (and if possible?) to use Hadoop as the   
>> underlying FS for Lucene?
>> Could anyone explain me how these can be tied together? Some small   
>> code/configuration example would be nice :-)
> It's possible to use Hadoop DFS to host a read-only Lucene index and
> use it for searching (Nutch has an implementation of FSDirectory for
> this purpose), but the performance is not stellar ... Currently it's
> not (yet) possible to use HDFS for creating Lucene indexes, a minor
> change to Lucene index format would be required.
> -- 
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
