lucene-java-user mailing list archives

From Zach Bailey
Subject Re: Clustered Indexing on common network filesystem
Date Thu, 02 Aug 2007 15:33:38 GMT
Thanks for your response --

Based on my understanding, Hadoop and Nutch are essentially the same 
thing, with Nutch being derived from Hadoop, and are primarily intended 
to be standalone applications.

We are not looking for a standalone application; rather, we need a 
framework to implement search inside our current content management 
application. The application's search functionality is currently designed 
and built around Lucene, so migrating frameworks at this point is not 
an option.

We are currently re-working our back-end to support clustering (in 
Tomcat), and we are looking for information on migrating Lucene 
from a single-node filesystem index (which is what we use now and hope 
to continue to use for clients with a single-node deployment) to a 
shared filesystem index on a mounted network share.

We prefer this strategy because it means we do not need to maintain 
two disparate methods of managing indexes for clients who run in a 
single-node, non-clustered environment versus clients who run in a 
multiple-node, clustered environment.
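
To illustrate the "one method for both deployments" idea, here is a minimal sketch in which the only difference between a single-node and a clustered installation is a configured directory. The property name `search.index.dir`, the default path, and the class `IndexLocation` are all hypothetical; nothing here comes from Lucene or from the application itself. It also says nothing about whether Lucene's locking behaves correctly on a network share, which is the open question of this message.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Resolves the index directory; the same code serves both deployment types. */
final class IndexLocation {
    // Hypothetical property name, chosen for this sketch only.
    static final String INDEX_DIR_PROPERTY = "search.index.dir";
    // Hypothetical local default used when nothing is configured (single node).
    static final String DEFAULT_LOCAL_DIR = "data/lucene-index";

    private IndexLocation() {}

    /**
     * Returns the configured directory (for example a mounted network share
     * in a clustered deployment) or the local default if none is set.
     */
    static Path resolve(Properties config) {
        String configured = config.getProperty(INDEX_DIR_PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            return Paths.get(DEFAULT_LOCAL_DIR);
        }
        return Paths.get(configured.trim());
    }
}
```

A clustered client would set `search.index.dir` to the mount point of the share on every node; a single-node client would leave it unset. The resolved path would then be handed to whatever code opens the Lucene index.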

So, here are some hopefully easy questions that someone could shed some light on:

Is this not a recommended method of managing indexes across multiple nodes?

At this point would people recommend storing an individual index on each 
node and propagating index updates via a JMS framework rather than 
attempting to handle it transparently with a single shared index?

Is the Lucene index code so intimately tied to filesystem semantics that 
using a shared/networked file system is infeasible at this point in time?

Which of these strategies (JMS vs. shared FS) would be quickest to 
implement? Which would be the most robust and least error-prone?
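
For comparison, the per-node alternative raised in the questions above could be sketched roughly as follows. This is an in-process simulation only: `UpdateBroadcaster` is a hypothetical stand-in for a JMS topic, a `HashMap` stands in for each node's Lucene index, and delivery is synchronous, whereas a real deployment would deliver asynchronously through a message broker. All class names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A single index change; null content means "delete this document". */
final class IndexUpdate {
    final String docId;
    final String content;

    private IndexUpdate(String docId, String content) {
        this.docId = docId;
        this.content = content;
    }

    static IndexUpdate upsert(String docId, String content) {
        return new IndexUpdate(docId, content);
    }

    static IndexUpdate delete(String docId) {
        return new IndexUpdate(docId, null);
    }
}

/** One cluster node with its own private index (a map stands in for Lucene). */
final class NodeIndex {
    private final Map<String, String> documents = new HashMap<>();

    /** Applies an update received from the broadcaster to the local index. */
    void apply(IndexUpdate update) {
        if (update.content == null) {
            documents.remove(update.docId);
        } else {
            documents.put(update.docId, update.content);
        }
    }

    String get(String docId) {
        return documents.get(docId);
    }
}

/** Hypothetical stand-in for a JMS topic: every subscriber sees every update. */
final class UpdateBroadcaster {
    private final List<NodeIndex> subscribers = new ArrayList<>();

    void subscribe(NodeIndex node) {
        subscribers.add(node);
    }

    void publish(IndexUpdate update) {
        for (NodeIndex node : subscribers) {
            node.apply(update);
        }
    }
}
```

Whichever node handles a content change publishes one `IndexUpdate`, and every node (including the publisher) applies it to its own local index, so no index files are shared between machines.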

I would really appreciate any insight or response anyone can provide, even 
if it is a short answer to any of the related topics, e.g. "we implemented 
clustered search using per-node indexing with JMS update propagation and 
it works great", or even something as simple as "don't use a shared 
filesystem at this point".


testn wrote:
> Why don't you check out Hadoop and Nutch? It should provide what you are
> looking for.
