hadoop-common-user mailing list archives

From maar...@sherpa-consulting.be
Subject Re: Hadoop + Lucene integration: possible? how?
Date Tue, 16 Jan 2007 09:18:22 GMT
Thanks for all the info!

I've also been looking at combining Lucene with Terracotta (a  
clustering service); they seem to have a way to cluster the Directory  
instances, but I haven't tested it yet. Does anyone have experience  
with clustering services like Terracotta or Tangosol Coherence? Or is  
this not the right way to use Lucene?

Another question: my indexing engine, i.e. Lucene, should run on  
dedicated machines. My app is a web app with little business logic;  
about 95% of its traffic will be retrieving data. I'm not familiar  
with load balancing, but I was thinking of a load balancer with  
multiple web servers behind it, and behind those a dedicated indexing  
engine. I'm not sure what the architecture would look like, though:  
how would you connect the web servers to the indexing servers? Web  
services seem rather heavyweight.

Quoting Doug Cutting <cutting@apache.org>:

> Andrzej Bialecki wrote:
>> It's possible to use Hadoop DFS to host a read-only Lucene index   
>> and use it for searching (Nutch has an implementation of   
>> FSDirectory for this purpose), but the performance is not stellar ...
> Right, the "best practice" is to copy Lucene indexes to local drives in
> order to search them.  Solr uses rsync to efficiently replicate an
> index.  If, however, you have lots of small indexes, it can make sense
> to keep them in HDFS and copy them to local drives as they're deployed.
> Then, when a box fails, one can quickly re-deploy its index to its
> replacement.
> Doug
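
The deploy step Doug describes can be sketched with the Hadoop FileSystem API: pull the index out of HDFS onto local disk, then open it with a plain local FSDirectory. This is only an illustrative sketch; the paths, the HDFS URI, and the query field are hypothetical, and the Lucene calls follow the 2.x-era API.

```java
// Sketch (assumptions: index stored at /indexes/part-00000 in HDFS,
// hdfs://namenode:9000 is the filesystem URI, documents have a "content"
// field). Copy the index to a local drive, then search it locally.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class DeployAndSearch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);

    // Re-deploy: copy the index directory from HDFS to the local disk.
    Path remote = new Path("/indexes/part-00000");
    Path local = new Path("/var/lucene/index");
    hdfs.copyToLocalFile(remote, local);

    // Search the local copy; avoids the slow HDFS-backed Directory.
    IndexSearcher searcher = new IndexSearcher("/var/lucene/index");
    Hits hits = searcher.search(new TermQuery(new Term("content", "hadoop")));
    System.out.println(hits.length() + " hits");
    searcher.close();
  }
}
```

If a box fails, re-running the copy step against its replacement is all that's needed, which is why keeping the master copies in HDFS works well for many small indexes.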
