lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Very large number of indices in distributed environment
Date Thu, 04 Aug 2005 16:17:44 GMT
In my experience, searching a read-only index mounted via NFS is fine.
The NFS-related issues are with locking.

I'd agree with Chris that you should try to avoid very big indexes.


--
Ian.


On 04/08/05, Chris Lu <chris.lu@gmail.com> wrote:
> A big index is slow to merge, slow to search, and as you mentioned,
> it's slow to sync. A 1 GB index took me several hours to merge on a
> 1 GHz P3 with 512 MB of RAM.
> Mounting Lucene on NFS is also a "no go".
> 
> I feel your choice 2 may be feasible, although complicated. That's your
> job, right? :)
> 
> BTW: I guess your app is a web-hosted app for different users, isn't it?
> 
> --
> Chris Lu
> ------------
> Lucene Search RAD on Any Database
> http://www.dbsight.net
> 
> 
> On 8/4/05, Benjamin Reitzammer <breitzammer@gmail.com> wrote:
> > Hi,
> > we are in the process of planning a search feature of a product and we
> > are having quite a hard time figuring out the "right" way to do it.
> >
> > The requirements for our app are the following:
> > 1) Large number of indices (at _least_ 10000)
> > 2) The amount of data involved per index is not very high, but because
> > of the number of indices involved the total data set will be around
> > 500 - 1000 GB
> > 3) The searching capabilities must be fail-safe, while it's acceptable
> > if deletes/updates take some time.
> > 4) The majority of operations will be searching the indices.
> >
> > I've followed the mailing list closely over the last month, and
> > especially the "Best Practices for Distributing Lucene Indexing and
> > Searching" (http://marc.theaimsgroup.com/?l=lucene-user&m=110971318020691&w=2)
> > and "Real time indexing and distribution to lucene on separate boxes (long)"
> > (http://marc.theaimsgroup.com/?l=lucene-user&m=107900097217474&w=2)
> > threads provided some interesting insight.
> >
> > But still our requirements are a bit different.
> >
> > My thoughts so far on how the above could be handled are:
> >
> > 1) Have one *really big* "master" which handles all tasks related to
> > index manipulation. Sync the indices according to Doug's tips
> > http://marc.theaimsgroup.com/?l=lucene-user&m=110973989200204&w=2 out
> > to a cluster of slaves that are responsible for searching.
> > Problem: How to make sure that indices across slaves are in sync.
> > Big Problem: Syncing this large number of indices will cause a lot
> > of traffic and already put quite a load on the slaves (not to speak
> > of the master)
> >
> > 1.1) Is it safe to _search_ (only) an index mounted via NFS? If yes,
> > then the search boxes could mount the indices on the master box. But
> > this solution would probably lead to some serious performance issues
> > because of the needed disk I/O on the master.
> > Though I'd love to be proven wrong on this one.
> >
> > 2) Split up index collection into smaller portions and distribute a
> > certain number of indices (~ up to 1000 indices) into smaller
> > autonomous clusters, that are completely responsible for their
> > collection of indices.
> > Problem: How do I keep index distribution dynamic so I don't have to
> > hardcode where to look for a certain index (that's not really a Lucene
> > issue, but more one of distributed computing; nevertheless I
> > thought you guys might know a way to solve it).
> >
> > Any ideas on this?
> > Has anyone ever worked with such a large number of Lucene indices (and
> > the amount of data it involves)?
> >
> > I appreciate your help very much.
> >
> > Cheers
> >
> > Benjamin
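
On the "keep index distribution dynamic" problem in option 2 above: one
common approach, which nobody in this thread proposed, is a consistent-hash
ring that maps an index name to the cluster responsible for it. The sketch
below is plain Java with no Lucene dependency. The class name IndexRouter,
the cluster names, and the choice of 100 virtual nodes per cluster are all
hypothetical and only there for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Maps index names to cluster names with a consistent-hash ring (illustrative only). */
public class IndexRouter {
    // Ring points per cluster; an arbitrary value chosen for this sketch.
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Registers a cluster by placing its virtual nodes on the ring. */
    public void addCluster(String cluster) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(cluster + "#" + i), cluster);
        }
    }

    /** Removes a cluster; only the indices it owned move to other clusters. */
    public void removeCluster(String cluster) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(cluster + "#" + i));
        }
    }

    /** Returns the cluster responsible for the given index name. */
    public String clusterFor(String indexName) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no clusters registered");
        }
        // First ring point at or after the key's hash, wrapping around to the start.
        SortedMap<Long, String> tail = ring.tailMap(hash(indexName));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    /** Uses the first 8 bytes of an MD5 digest as the ring position. */
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A front end would call clusterFor("some-index-name") and forward the query
to that cluster. Adding or removing a cluster moves only a fraction of the
indices, so nothing has to be hardcoded, but every front end still needs the
same list of clusters, and the index files still have to be copied when
ownership changes.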
