lucene-java-user mailing list archives

Subject Re: Updating the index and searching
Date Thu, 08 Sep 2005 09:04:15 GMT

Hello Brian,

Updating an index is very straightforward. Simply open the index writer for
your existing index and add the new documents. The issue is that if you
need to search the updated index you need to open a new index reader in
order to see the new documents. This is the time-consuming bit. It also
seems that there's a lot of caching going on in the index reader that would
be lost by opening a new one. Ideally, if you can, batch up the documents
that need to be added to the index and add them in one hit. If you can't
batch up the documents then you're likely to suffer the same problem as me.
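
To make the batching idea concrete, here is a minimal sketch in plain Java.
It does not use the Lucene API at all: BatchingIndexer and the indexBatch
callback are hypothetical names, and the callback stands in for "open the
index writer, add every document in the batch, close it". The point is only
that the expensive reader reopen is then needed once per batch rather than
once per document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: collects documents and hands them over in batches.
 * Documents are plain strings here; the callback is an assumption standing
 * in for the code that would write one batch to the real index.
 */
class BatchingIndexer {
    private final int batchSize;
    private final Consumer<List<String>> indexBatch;
    private final List<String> pending = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<String>> indexBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.indexBatch = indexBatch;
    }

    /** Queues a document and writes the batch once it is full. */
    synchronized void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is queued; does nothing if the queue is empty. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the callback never sees the list being cleared.
        indexBatch.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

A caller would also invoke flush() on a timer or at shutdown so that a
partly filled batch does not sit in memory indefinitely.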

Currently I have a single index writer and a single index reader per index.
When I add a document I open the index writer and then add the document. I
then leave the index writer open in case any other documents come along
that need to be added. I have a timeout on this that closes the index
writer if nothing happens for a while, so that the data is flushed to disk.
If a query request comes along I close the index writer, open an index
reader and perform the query. I then leave the reader open in case any more
queries come along. I only close the index reader when I need to add
documents to the index, in which case I close the reader and open the
writer. The idea is to keep either the index writer or the reader open as
long as possible to minimise the hit of opening a new index reader/writer.
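
The swapping scheme described above could be sketched roughly as follows.
This is a model in plain Java, not Lucene code: IndexHandle, HandleFactory
and IndexAccessManager are hypothetical names, and a real factory would
open an actual index writer or reader. The time is passed in explicitly so
the idle timeout is easy to follow.

```java
/** Hypothetical stand-in for an open index writer or index reader. */
interface IndexHandle {
    void close();
}

/** Hypothetical factory; a real one would open the writer/reader on disk. */
interface HandleFactory {
    IndexHandle openWriter();
    IndexHandle openReader();
}

/** Keeps at most one handle open and swaps only when the request type changes. */
class IndexAccessManager {
    enum Mode { CLOSED, WRITING, READING }

    private final HandleFactory factory;
    private final long writerIdleTimeoutMillis;
    private Mode mode = Mode.CLOSED;
    private IndexHandle current;
    private long lastWriteMillis;

    IndexAccessManager(HandleFactory factory, long writerIdleTimeoutMillis) {
        this.factory = factory;
        this.writerIdleTimeoutMillis = writerIdleTimeoutMillis;
    }

    /** Returns the open writer, closing the reader first if necessary. */
    synchronized IndexHandle writerForAdd(long nowMillis) {
        switchTo(Mode.WRITING);
        lastWriteMillis = nowMillis;
        return current;
    }

    /** Returns the open reader, closing the writer first if necessary. */
    synchronized IndexHandle readerForQuery() {
        switchTo(Mode.READING);
        return current;
    }

    /** Call periodically: closes an idle writer so its data reaches disk. */
    synchronized void closeWriterIfIdle(long nowMillis) {
        if (mode == Mode.WRITING
                && nowMillis - lastWriteMillis >= writerIdleTimeoutMillis) {
            closeCurrent();
        }
    }

    synchronized Mode mode() {
        return mode;
    }

    private void switchTo(Mode wanted) {
        if (mode == wanted) {
            return; // reuse the handle that is already open
        }
        closeCurrent();
        current = (wanted == Mode.WRITING)
                ? factory.openWriter()
                : factory.openReader();
        mode = wanted;
    }

    private void closeCurrent() {
        if (current != null) {
            current.close();
            current = null;
        }
        mode = Mode.CLOSED;
    }
}
```

Consecutive adds or consecutive queries reuse the open handle; only a
change from adding to querying (or back) pays the cost of a close and an
open.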

Some ideas :-
1) Like I said in the previous mail, an idea I have had is to have two
indexes: one for today's documents that has the constant swapping of index
reader/writers, and another bigger index that holds all the documents prior
to today's and only ever gets opened for reading. At the end of the day the
indexes would be merged ready for tomorrow. This means only the small index
ever gets opened/closed repeatedly.
2) Something else I found out about yesterday that looks useful (to me at
least) is the ParallelReader, which allows the fields for a particular
document to be split across indexes. This isn't in Lucene 1.4.3 but I think
it is in the source repository. This allows, for example, updates to be
carried out on metadata fields (held in one index), avoiding re-analysing
and re-indexing the text of a document held in another index.
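
As a rough illustration of idea 1, the sketch below models the two indexes
as in-memory lists in plain Java. TwoTierIndex and its methods are
hypothetical names, documents are plain strings, and "search" is a simple
substring match. In Lucene itself I would expect something like a
MultiSearcher over both indexes and IndexWriter.addIndexes for the nightly
merge to play these roles, but that is an assumption and not something
tried here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a large read-only index plus a small daily index. */
class TwoTierIndex {
    // Large index: only read during the day, so its reader can stay open.
    private final List<String> archive = new ArrayList<>();
    // Small index: takes all of today's adds and the reader/writer swapping.
    private final List<String> today = new ArrayList<>();

    /** New documents only ever go into the small index. */
    void add(String doc) {
        today.add(doc);
    }

    /** A query consults both indexes and combines the hits. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : archive) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        for (String doc : today) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** End of day: fold the small index into the large one and start afresh. */
    void mergeAtEndOfDay() {
        archive.addAll(today);
        today.clear();
    }

    int archiveSize() {
        return archive.size();
    }

    int todaySize() {
        return today.size();
    }
}
```

The cost of reopening then depends on the size of one day's documents
rather than on the size of the whole collection.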

Unless anything else crops up I'll probably give the two-index approach a
go in a week or two.


Paul I.

Brian wrote on 07/09/2005 20:11:26:

> Paul,
>      Is there a way just to update an existing index?
> Meaning I have ~20,000 documents a day that I need to
> append to my index. I don't need to delete anything,
> just add the new documents. Is there an easy way to
> get that done?
> Thanks for any thoughts.
> Brian