lucene-java-user mailing list archives

From "Aigner, Thomas" <TAig...@WescoDist.com>
Subject RE: Insert new records into index
Date Fri, 11 Nov 2005 16:32:42 GMT
Thanks for the advice, Paul.
I thought about doing two passes (delete all, then insert all), but the
problem with that approach is that if my program fails somewhere between
start and end, I may end up with many records deleted and none
re-inserted.  The same could happen with a batch build.  How are you
handling that possible scenario?

-----Original Message-----
From: Paul.Illingworth@saaconsultants.com
[mailto:Paul.Illingworth@saaconsultants.com] 
Sent: Friday, November 11, 2005 11:22 AM
To: java-user@lucene.apache.org
Subject: Re: Insert new records into index






Hello,

You really do need to batch up your deletes and inserts otherwise it will
take a long time. If you can, do all your deletes and then all of your
inserts. I have gone to the trouble of queueing index operations and when
a new operation comes along I reorder the job queue to ensure deletes and
indexing jobs are grouped together.
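
As a rough illustration of the queue reordering described above, here is a minimal sketch in plain Java. It is not Paul's actual code and uses no Lucene classes: the names IndexJobQueue, Job, submit and drain are hypothetical. It also assumes each update arrives as a delete followed by an add of the same key, so running all deletes before all adds preserves the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical job queue: collects index operations and hands them back
// grouped, all deletes first and then all adds, so a reader and a writer
// each only need to be opened once per batch.
final class IndexJobQueue {
    enum Kind { DELETE, ADD }

    static final class Job {
        final Kind kind;
        final String key; // e.g. the value of the unique "simm" field

        Job(Kind kind, String key) {
            this.kind = kind;
            this.key = key;
        }

        @Override
        public String toString() {
            return kind + ":" + key;
        }
    }

    private final List<Job> deletes = new ArrayList<>();
    private final List<Job> adds = new ArrayList<>();

    // Queue an operation; it is filed with others of the same kind.
    void submit(Kind kind, String key) {
        (kind == Kind.DELETE ? deletes : adds).add(new Job(kind, key));
    }

    // Return the pending batch (deletes, then adds) and empty the queue.
    List<Job> drain() {
        List<Job> batch = new ArrayList<>(deletes);
        batch.addAll(adds);
        deletes.clear();
        adds.clear();
        return batch;
    }
}
```

The caller would process the first part of the drained batch with a single reader and the rest with a single writer, instead of reopening both for every record.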

If your system doesn't allow you to batch together deletes and writes,
then something I have found useful is to split the index into two. I have
an "old" index and a "new" index. I add documents to the "new" index and
then periodically merge it into the "old" and clear out the "new". To
delete, I have to delete from both indexes, as I don't know where my
documents are. This means that you only ever have to open the "old" index
with an IndexReader (except when merging, of course). The "new" index can
be opened with a writer for writes and swapped to a reader for reads.
Keeping this index small speeds up the constant opening and closing.
Searching is straightforward using the MultiSearcher.
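
The bookkeeping of the old/new split can be sketched as follows. This is only a model under stated assumptions: two in-memory maps stand in for the two Lucene indexes, and the class and method names (TwoTierIndex, add, delete, search, merge) are hypothetical, not Lucene API. In a real setup the lookup over both parts would be done by a MultiSearcher.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the "old"/"new" split. Maps stand in for indexes.
final class TwoTierIndex {
    private final Map<String, String> oldIndex = new HashMap<>(); // large, rarely written
    private final Map<String, String> newIndex = new HashMap<>(); // small, cheap to reopen

    // New documents only ever go into the small index.
    void add(String key, String doc) {
        newIndex.put(key, doc);
    }

    // We do not know which index holds the document, so delete from both.
    void delete(String key) {
        oldIndex.remove(key);
        newIndex.remove(key);
    }

    // Search across both indexes, checking the small one first.
    String search(String key) {
        String hit = newIndex.get(key);
        return hit != null ? hit : oldIndex.get(key);
    }

    // Periodically fold the small index into the large one and clear it.
    void merge() {
        oldIndex.putAll(newIndex);
        newIndex.clear();
    }

    int oldSize() { return oldIndex.size(); }
    int newSize() { return newIndex.size(); }
}
```

An update is then a delete(key) followed by an add(key, doc); only the small index is touched by the write, and the large one is written only by deletes and the periodic merge.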

I don't know anything about the lock problem - it's not something I've
ever seen (I'm using 1.4.3).

Regards

Paul I.





 

From: "Aigner, Thomas" <TAigner@WescoDist.com>
Sent: 11/11/2005 14:55
To: <java-user@lucene.apache.org>
cc:
Please respond to: java-user@lucene.apache.org
Subject: Insert new records into index

Howdy all,
I am having a problem with inserting/updating records into my index.  I
have approximately 1.5M records in the index taking about 2.5G space when
optimized.

If I want to update 1000 records, I delete the old item and insert the
new one.  This is taking a LONG time to accomplish.  I believe this is
taking time due to the fact that I have to close the writer to delete
from the reader, then open the writer to insert the new record.  I have
to do this 1 time for each item that needs to be inserted.

I tried not optimizing the index, thinking that opening and closing the
index was what took most of the time, but the time seems to be the same
when I have many files (I had to increase the ulimit by quite a bit to
avoid the too many files error as well).

Snippets of code..
To delete a record, I am:
    // Close the index
    writer.close();

    // Instantiate the reader object for deletion
    IndexReader reader = IndexReader.open(dir);

    reader.delete(new Term("simm", simm));
    reader.close();

    // Get directory again and create a new writer to open for insert

Then insert the record.. and move to the next record


I can't keep the writer open while deleting an item because I get this
error:
    Lock obtain timed out: Lock@

Anyone have any ideas on how to speed this process up?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





