lucene-general mailing list archives

From "Leimbach, Johannes" <JLeimb...@CONET.DE>
Subject AW: Need advice for doing incremental Index updates
Date Wed, 09 Aug 2006 06:22:16 GMT
Good morning Chris,

Thank you for your answer. 

I have thought about using an external file table to solve my problem, but I don't like this
idea very much either.

The problem is that the Lucene index and the external file list can very easily get out of sync.
Imagine the external list gets lost, or writing to it is aborted while the index is still
being updated. It seems very easy to end up with inconsistencies, doesn't it?
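
One possible way to limit the risk of a lost or half-written file list, sketched in Python under stated assumptions: write the list to a temporary file and rename it into place only after the index update has finished, and treat a missing or unreadable list as a signal to fall back to the slow full check. The names `save_manifest` and `load_manifest` and the JSON format are hypothetical choices for illustration; nothing here comes from Lucene.

```python
import json
import os
import tempfile


def save_manifest(paths, manifest_file):
    """Write the file list to a temp file, then rename it over the old one.

    Assumption: this is called only after the index update has completed,
    so an aborted run leaves the previous manifest untouched.
    """
    directory = os.path.dirname(os.path.abspath(manifest_file))
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(sorted(paths), tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, manifest_file)  # swap the new list into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_manifest(manifest_file):
    """Return the saved set of paths, or None if it is missing or unreadable."""
    try:
        with open(manifest_file, encoding="utf-8") as fh:
            return set(json.load(fh))
    except (OSError, ValueError, TypeError):
        return None
```

If `load_manifest` returns `None`, the indexer could fall back to checking every path stored in the index against the disk, so a lost list costs one slow run rather than a permanently inconsistent index.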

Still, this is probably the way I'll go. Touching every document in the index might
be truly atomic, but it is too slow.

To John:
I don't understand your question, can you post it again?

Bye,
Johannes

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
Sent: Tuesday, August 8, 2006 23:32
To: general@lucene.apache.org
Subject: Re: Need advice for doing incremental Index updates


i would solve your problem external to the index ... every time you run
your incremental process, as you walk your directory tree of files (adding
the new ones, deleting/re-adding the modified ones) record every file and
save that somewhere.  when you are all done, compare the list from this
run with the list from the last run -- any file in the old list and not in
the new list is a document to be deleted.
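
As a rough illustration of the comparison described above, here is a minimal sketch in Python (chosen only for brevity; nothing here touches the Lucene API). The function names `scan_tree` and `diff_runs` are hypothetical, and using the modification time to spot changed files is an assumption, since the message does not say how modifications are detected.

```python
import os


def scan_tree(root):
    """Return {path: mtime} for every file found under root."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            seen[path] = os.path.getmtime(path)
    return seen


def diff_runs(previous, current):
    """Compare the snapshot from the last run with the one from this run."""
    added = sorted(p for p in current if p not in previous)
    # Assumption: a different mtime means the file was modified.
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    # In the old list but not in the new list: delete from the index.
    deleted = sorted(p for p in previous if p not in current)
    return added, modified, deleted
```

Paths in `deleted` are the documents to remove from the index; `added` and `modified` correspond to the files that get added or deleted and re-added during the walk.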


: Date: Tue, 8 Aug 2006 15:48:16 +0200
: From: "Leimbach, Johannes" <JLeimbach@CONET.DE>
: Reply-To: general@lucene.apache.org
: To: general@lucene.apache.org
: Subject: Need advice for doing incremental Index updates
:
: Hello,
:
:
:
: I need some advice regarding incremental index updates.
:
:
:
: There are three cases I need to handle when iterating over the
: source files (files that need to be indexed):
:
: 1.	A file did not change since the last update
: 2.	A file did change since the last update
: 3.	A file was removed since the last update
:
:
:
: Case 1. is easy...
:
: Case 2. as well.. just remove the old file and add the new one
:
: Case 3. is bugging me..
:
:
:
: How can I find out whether a file that is listed in the index no longer
: exists?
:
:
:
: The blunt solution would be to retrieve *all* file paths from the index
: and check whether each one exists. If it does, go on; if the file does not
: exist on disk, remove it from the index. The problem I have with this is
: that I am possibly pulling a lot of data from the Lucene index. I
: will also do a lot of local filesystem checks. Sloooow?!
:
:
:
: Another idea I had is about introducing an "index version" integer. This
: number will be unique for each start of the parsing process. So each
: time my indexer program is started a new "index version" is created. Now
: each file which exists in the index and gets processed will have the
: "index version" number stored as a document field.
:
: This way all newly added and modified documents will have an up to date
: "index version" flag after indexing is complete.
:
: Now, to remove all physically deleted files from the index, I would
: select all documents which have an old "index version" flag stored
: inside them. Every document with such an old number can be safely
: removed.
:
: The problem with this solution is that *every* document in the index will
: get updated: first the old index version field is removed, then the new
: field is added.
:
: On the plus side, removing deleted files will be very fast.
:
:
:
:
:
: What would you recommend for doing incremental updates?
:
: I fear the first version will be utterly slow for small updates whereas
: the second version will be a lot faster - though adding stuff is slower
: because of the additional field update for every document.
:
:
:
: Thanks for your advice,
:
: Johannes :-)
:
:
:
:
:
:
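
The "index version" idea quoted above can be pictured as a mark-and-sweep pass. The sketch below uses a plain Python dict as a stand-in for the index, so it only illustrates the bookkeeping and makes no Lucene calls; `sweep_stale` and the `version` key are hypothetical names.

```python
def sweep_stale(index, live_paths, run_version):
    """Stamp every file seen in this run, then drop everything else.

    index: dict mapping path -> document dict (hypothetical stand-in
    for the real index). Returns the sorted list of removed paths.
    """
    # Mark: every file found on disk gets the current run's version.
    # This is the per-document update cost mentioned in the message.
    for path in live_paths:
        index.setdefault(path, {})["version"] = run_version

    # Sweep: anything still carrying an older version was not seen.
    stale = [p for p, doc in index.items() if doc.get("version") != run_version]
    for path in stale:
        del index[path]
    return sorted(stale)
```

The sweep step is a single lookup by version, which is why removal is fast, while the mark step touches every document that still exists on disk.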



-Hoss

