lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Best Practices for Distributing Lucene Indexing and Searching
Date Thu, 14 Jul 2005 23:28:33 GMT
Interesting.  I'm planning on doing something similar for some new
Simpy features.  Why are your worker bees sending whole indices to the
Queen bee?  Wouldn't it be easier to send in Documents and have the
Queen index them in the same index?  Maybe you need those individual,
smaller indices to be separate....

How do you deal with the possibility of the same Document being present
in multiple indices?

Otis


--- Paul Smith <psmith@aconex.com> wrote:

> I had a crack at whipping up something along this lines during a 1  
> day hackathon we held here at work, using ActiveMQ as the bus between
>  
> the 'co-ordinator' (Queen bee) and the 'worker" bees.  The index work
>  
> was segmented as jobs on a work queue, and the workers feed the  
> relatively smal index chunks back along a result queue, which the co-
> 
> ordinator then merged in.
> 
> The tough part from observing the outcome is knowing what the chunk  
> size should be, because in the end the co-ordinator needs to merge  
> all the sub-indexes together into 1 and for a large index that's not 
> 
> an insignificant time.  You also have to use bookkeeping to work out 
> 
> if a 'job' has not been completed in time (maybe failure by the  
> worker) and decide whether the job should be resubmitted (in theory  
> JMS with transactions would help there, but then you have a  
> throughput problem on that too).
> 
> Would love to see something like this work really well, and perhaps  
> generalize it a bit more.  I do like the simplicity of the SEDA  
> principles.
> 
> cheers,
> 
> Paul Smith
> 
> 
> On 14/07/2005, at 11:50 PM, Peter Gelderbloem wrote:
> 
> > I am currently looking into building a similar system and came
> across
> > this architecture:
> > http://www.eecs.harvard.edu/~mdw/proj/seda/
> >
> > I am just reading up on it now. Does anyone have experience
> building a
> > lucene system based on this architecture? Any advice would be
> greatly
> > appreciated.
> >
> > Peter Gelderbloem
> >
> >    Registered in England 3186704
> > -----Original Message-----
> > From: Luke Francl [mailto:luke.francl@stellent.com]
> > Sent: 13 May 2005 22:04
> > To: java-user@lucene.apache.org
> > Subject: Re: Best Practices for Distributing Lucene Indexing and
> > Searching
> >
> > On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote:
> >
> >
> >> I don't really consider reading/writing to an NFS mounted
> FSDirectory
> >>
> > to
> >
> >> be viable for the very reasons you listed; but I haven't really
> found
> >>
> > any
> >
> >> evidence of problems if you take they approach that a single
> "writer"
> >> node indexes to local disk, which is NFS mounted by all of your
> other
> >> nodes for doing queries.  concurent updates/queries may still not
> be
> >>
> > safe
> >
> >> (i'm not sure) but you could have the writer node "clone" the
> entire
> >>
> > index
> >
> >> into a new directory, apply the updates and then signal the other
> >>
> > nodes to
> >
> >> stop using the old FSDirectory and start using the new one.
> >>
> >
> > Thanks to everyone who contributed advice to my question about how
> to
> > distribute a Lucene index across a cluster.
> >
> > I'm about to start on the implementation and I wanted to clarify
> > something about using NFS that Chris wrote about above.
> >
> > There are many warnings about indexing on an NFS file system, but  
> > is it
> > safe to have a single node index, while the other nodes use the
> file
> > system in read-only mode?
> >
> > On a related note, our software is cross-platform and needs to work
> on
> > Windows as well. Are there any problems known problems having a
> > read-only index shared over SMB?
> >
> > Using a shared file system is preferable to me because it's easier,
>  
> > but
> > if it's necessary I will write the code to copy the index to each  
> > node.
> >
> > Thanks,
> > Luke Francl
> >
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message