lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: I need your opinion about working with big index and frecuently updates
Date Tue, 03 Oct 2006 11:42:03 GMT
Think about IndexModifier to change your index, although the documentation
does state that it's better to batch your deletions together and batch your
additions together if possible.

100Mb is not, in my experience, a very big index, so I really don't
anticipate many problems. Do note that you can't modify a document in place.
That is, to update a document, you need to delete it from the index and
re-add it.

I believe, if you just write some code to take some timings, your questions
will be answered to your satisfaction. I also believe that if you open your
index, change it, and close it for each document, your performance will look
much worse (on a per-document basis) than if you open your index, do a bunch
of modifications, and *then* close it.

Also note that you MUST close and re-open your IndexSearcher for any changes
in your index to be visible. A Searcher will only show the documents in the
index when it was opened (don't worry, it works just fine as you modify your
index, it just doesn't show those modifications). However, you want to keep
your single instance of a searcher open as long as possible since opening a
Searcher is a very expensive operation. This really leaves you with the
question "How quickly to you need to see the results of your index
modifications?" If the answer is that an hour's delay is good enough, you
can just close/re-open your searcher every hour (perhaps just after doing
your updates).

Finally, it would be helpful if we knew how frequently is "frequently" when
you talk about updates. Are you talking hundreds of documents a minute?
tens? a document an hour? Again, I think you'll have a better idea of
whether updating is acceptable if you try an experiment and just take some
timings. I've noticed that when creating an index, the optimize step is a
long one.....

Hope this helps
Erick

P.S. Your English is waaaaaaay better than my ... well... any other language
<G>.....

On 10/3/06, Enrique Lamas <enrique.lamas@corp.ya.com> wrote:
>
> Hi,
> I'm working with a 100Mb length index. By application requirements, the
> information indexed is frecuently updated, with plenty of modifications,
> deletions and additions.
>
> I think Lucene is a very powerful searching tool once the index is already
> created, but I'm not sure if update index frecuently is also efficient. Even
> when index is quite long.
>
> Previosly, I regenerated all index from zero once at the day, but now I
> need changes to be noticed earlier.
>
> ┬┐How would you implement this?
>
> Sorry for my bad english.
>
> Thanks
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message