lucene-java-user mailing list archives

Subject RE: Insert new records into index
Date Fri, 11 Nov 2005 16:48:56 GMT

I queue up all my index operations. If the app stops the queue gets saved
to disk. When the app restarts the queue is loaded and everything carries
on. I haven't looked at the app failing just yet. I know the JVM has hooks
that can be used to ensure clean up code gets called when the JVM exits.
I don't think this copes with every exit condition though.
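
For anyone wanting to try the same thing, here is a minimal sketch of a queue that is saved on exit and reloaded on startup. It is not my actual code: the class name PersistentOpQueue, the one-operation-per-line file format and the method names are all made up for illustration. The shutdown hook runs on a normal exit or Ctrl-C, but not if the process is killed outright or the JVM crashes, so it does not cover every exit condition.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: keeps pending index operations
// as plain strings and stores them one per line in a text file.
final class PersistentOpQueue {
    private final Path file;
    private final List<String> pending = new ArrayList<>();

    // Loads any operations left over from a previous run.
    PersistentOpQueue(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            pending.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    // Assumes the operation text contains no line breaks.
    synchronized void enqueue(String op) {
        pending.add(op);
    }

    // Returns a copy so callers cannot modify the queue directly.
    synchronized List<String> snapshot() {
        return new ArrayList<>(pending);
    }

    // Overwrites the file with the current queue contents.
    synchronized void save() throws IOException {
        Files.write(file, pending, StandardCharsets.UTF_8);
    }

    // Registers a JVM shutdown hook that saves the queue on orderly exit.
    void saveOnExit() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                save();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }));
    }
}
```

Calling save() after each enqueue as well as from the hook would narrow the window in which a hard kill loses work, at the cost of more disk writes.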

When my app starts I check for the existence of Lucene lock files - if they
are there then it either means another process is using the index (not
allowed on my system) or they were not removed when the app closed down
last time. In the latter case it probably means the index is suspect. In
this case I can reproduce the index from the source documents - not
something I particularly want to do but it can be done.
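
The startup check can be sketched roughly as below. The class name StaleLockCheck is made up, and both the directory and the file-name pattern are passed in as parameters because where Lucene puts its lock files, and what it calls them, depends on the version and configuration. Treat the "*.lock" pattern in the usage note as an assumption to verify against your own setup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class StaleLockCheck {
    // Returns every file in lockDir whose name matches the glob pattern.
    // An empty result means no lock files were left behind.
    static List<Path> findLockFiles(Path lockDir, String glob) throws IOException {
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(lockDir)) {
            return found;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(lockDir, glob)) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        return found;
    }
}
```

At startup, something like findLockFiles(dir, "*.lock") returning a non-empty list would be the signal to stop, or to treat the index as suspect and rebuild it from the source documents.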

Hope this helps.

Paul I.

"Aigner, Thomas" wrote on 11/11/2005 16:32:
Subject: RE: Insert new records into index

Thanks for the advice Paul,
I thought about doing two passes.. Delete all and then insert all, but
the problem with that approach is if my program fails somewhere in
between start and end.. I may end up with many deleted records and none
changed.  The same could happen with a batch build.  How are you
handling that possible scenario?

-----Original Message-----
Sent: Friday, November 11, 2005 11:22 AM
Subject: Re: Insert new records into index


You really do need to batch up your deletes and inserts, otherwise it
will take a long time. If you can, do all your deletes and then all of
your inserts. I have gone to the trouble of queueing index operations,
and when a new operation comes along I reorder the job queue to ensure
deletes and indexing jobs are grouped together.
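
The reordering described above can be sketched as follows. This is an illustration only, not my real queue: IndexOp, OpBatcher and the DELETE/ADD kinds are invented names. It assumes every update is a delete of the old document followed by an add of the new one, so moving all deletes ahead of all adds gives the same end result. If an add can be followed by a delete of the same document, this simple grouping would be wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, not part of Lucene.
final class IndexOp {
    enum Kind { DELETE, ADD }

    final Kind kind;
    final String docId;

    IndexOp(Kind kind, String docId) {
        this.kind = kind;
        this.docId = docId;
    }
}

final class OpBatcher {
    // Returns the pending operations with every delete ahead of every add,
    // keeping arrival order within each group. The input list is not modified.
    static List<IndexOp> groupDeletesFirst(List<IndexOp> pending) {
        List<IndexOp> deletes = new ArrayList<>();
        List<IndexOp> adds = new ArrayList<>();
        for (IndexOp op : pending) {
            if (op.kind == IndexOp.Kind.DELETE) {
                deletes.add(op);
            } else {
                adds.add(op);
            }
        }
        List<IndexOp> ordered = new ArrayList<>(deletes);
        ordered.addAll(adds);
        return ordered;
    }
}
```

With the queue in this order, the reader only has to be opened once for the whole run of deletes and the writer once for the whole run of adds, instead of once per record.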

If your system doesn't allow you to batch together deletes and writes,
something I have found useful is to split the index into two. I have an
"old" index and a "new" index. I add documents to the "new" index and
periodically merge it into the "old" and clear out the "new". To delete I
have to delete from both indexes as I don't know where my documents are.
This means that you only ever have to open the "old" index with an
IndexReader (except when merging of course). The "new" index can be
opened with a writer for writes and swapped to a reader for reads.
Keeping this index small speeds up the constant opening and closing.
Searching is straightforward using the MultiSearcher.

I don't know anything about the lock problem - it's not something I've
seen (I'm using 1.4.3).


Paul I.

"Aigner, Thomas" wrote on 11/11/2005 14:55:
Subject: Insert new records into index

Howdy all,
I am having a problem with inserting/updating records in the
index.  I have approximately 1.5M records in the index taking about 2.5G
of space when optimized.

If I want to update 1000 records, I delete the old item and insert the
new one.  This is taking a LONG time to accomplish.  I believe this is
taking time due to the fact that I have to close the writer to delete
from the reader, then open the writer to insert the new record.  I have
to do this 1 time for each item that needs to be inserted.

I tried to not optimize the index, thinking that opening the
index/closing it was taking the big time, but the time seems to be the
same when I have many files (I had to increase the ulimit by quite a
few to avoid the too many files error as well).

Snippets of code..
To delete a record, I am:
             //Close the index

             //Instantiate the reader object for deletion
             IndexReader reader =;

             reader.delete(new Term("simm",simm));

             //Get directory again and create a new writer to open for insert

Then insert the record.. and move to the next record

I can't keep the writer open to delete an item cause I get this error:
             Lock obtain timed out: Lock@

Anyone have any ideas on how to speed this process up?

To unsubscribe, e-mail:
For additional commands, e-mail:
