lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Lu" <chris...@gmail.com>
Subject Re: Indexing large index with Lucene
Date Thu, 28 Sep 2006 22:25:44 GMT
I like the approach in your second point. But I have doubt on the first point.

For a production level index, usually pretty big, freqent close/reopen
the searcher may not be fast enough, especially when you want to cache
sorting. It's better to keep the searchers open. But when the indexing
process is going on, the files are changing. The searcher's segment
information will be outdated and read EOF exceptions will happen.

So for a big index, it's better to keep two copies of index, one for
searching, one for indexing. And hot-swapping them when indexing is
done. This is what we did in DBSight. No read EOF exceptions or
corrupted indexes any more.

Chris Lu
---------------------------
Full-Text Search on Any Applications/Databases
http://www.dbsight.net

On 9/28/06, Erick Erickson <erickerickson@gmail.com> wrote:
> Two things come to mind...
>
> First, you can freely write to an index while searching it, the search is
> always available. I'm pretty sure this includes deleting/readding documents.
> However, you won't be able to search on the changes in your index until you
> close/reopen the *searcher*.
>
> Second, depending on how quickly you need updates, you could always make a
> *copy* of your index, update that and then move it back to where your
> searcher looks for it, sort of a batch process really. It all depends upon
> how quickly you require seeing the changes.
>
> Hope this helps
> Erick
>
> On 9/28/06, Eric Louvard <eric.louvard@hauk-sasko.de> wrote:
> >
> > I'm using Lucene since several year. We had to index allways more
> > documents.
> >
> > I'm now trying to optimise the index process with more than 1.000.000
> > documents and I can see that the performance will decrease when the
> > index size is greater.
> > I would like to know if someone as allready studied this case.
> >
> > It's interactively maintained index and the fisrt index process is my
> > biggest Problem.
> >
> > - A document contains several attributs.
> > - I can't block the index during the index process (the search must
> > allways be availlable).
> > - I need to delete the older version of document if I become an newer.
> >
> > Thank you to tell me about you personnal experience.
> >
> > Éric Louvard.
> >
> > --
> > Mit freundlichen Grüßen
> >
> > i. A. Éric Louvard
> > HAUK & SASKO Ingenieurgesellschaft mbH
> > Zettachring 2
> > D-70567 Stuttgart
> >
> > Phone: +49 7 11 7 25 89 - 19
> > Fax: +49 7 11 7 25 89 - 50
> > E-Mail: eric.louvard@hauk-sasko.de
> > www: www.hauk-sasko.de
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message