lucene-java-user mailing list archives

From Karl Øie <k...@gan.no>
Subject Re: Crash / Recovery Scenario
Date Tue, 09 Jul 2002 09:48:31 GMT
> only deletes the old one while it's working on the new one, so is there a
> way of checking for the .lock files in case
> of a crash and rolling back to the old index image?
>
> Nader Henein

I have some thoughts about crash/recovery/rollback, a problem I haven't found 
any good solution for.

If a crash happens during writing, there is no good way to know whether the 
index is intact. Removing the lock files doesn't change that, because we 
really don't know what state the index is in. So providing rollback 
functionality is a good but expensive way of compensating for the lack of 
recovery.

To provide rollback I have used a RAMDirectory and serialized it to a SQL 
table. By doing this I can catch any exceptions and ask the database to 
roll back if required. This works great for small indexes, but as the index 
grows performance suffers, because the whole RAMDirectory has to be 
serialized/deserialized to and from the BLOB every time.
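
A minimal sketch of the snapshot-and-rollback pattern described above, in 
plain Java. It is not Lucene or JDBC code: a HashMap stands in for the 
RAMDirectory, a byte array field stands in for the BLOB column, and the names 
WholeIndexBlobStore and IndexJob are made up for illustration. It only shows 
the shape of the idea: the whole index is serialized as one unit, and a failed 
run leaves the last committed copy untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical stand-in for a one-row SQL table that holds the whole index
// (file name -> file bytes) as a single serialized BLOB.
class WholeIndexBlobStore {
    // A unit of indexing work; it may fail partway through.
    interface IndexJob {
        void run(HashMap<String, byte[]> files) throws Exception;
    }

    // Last successfully committed snapshot (starts as an empty index).
    private byte[] committedBlob = serialize(new HashMap<String, byte[]>());

    // Deserialize the last committed snapshot into a fresh working copy.
    HashMap<String, byte[]> load() {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(committedBlob))) {
            @SuppressWarnings("unchecked")
            HashMap<String, byte[]> files = (HashMap<String, byte[]>) in.readObject();
            return files;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("stored snapshot is unreadable", e);
        }
    }

    // Run the job on a working copy. Returns true if the result was stored
    // (commit), false if the job threw and the old snapshot was kept (rollback).
    boolean update(IndexJob job) {
        HashMap<String, byte[]> working = load();
        try {
            job.run(working);
        } catch (Exception e) {
            return false; // committedBlob is untouched
        }
        committedBlob = serialize(working); // the WHOLE index is rewritten
        return true;
    }

    // Size of the stored snapshot; grows with the total index size.
    int blobSize() {
        return committedBlob.length;
    }

    private static byte[] serialize(HashMap<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(files);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize snapshot", e);
        }
        return bytes.toByteArray();
    }
}
```

Every call to update() deserializes and reserializes everything, however 
small the change, which is the performance problem with large indexes.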

A better solution would be to hack the FSDirectory so that each file it would 
store in a filesystem directory is instead stored as a serialized byte array 
in a BLOB of a SQL table. This would increase performance because the whole 
Directory doesn't have to change each time, and it doesn't have to read the 
whole directory into memory. I also suspect Lucene sorts its records into 
these different files for increased performance (like: I KNOW that record 
will be in segment "xxx" if it is there at all).
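
A rough sketch of the one-blob-per-file idea, again in plain Java with no 
Lucene or JDBC involved. The class PerFileBlobStore and its methods are 
hypothetical; the `committed` map stands in for a table with one row per index 
file, and the pending maps stand in for an open database transaction. The 
point is only that commit() touches the files that changed, not the whole 
index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a SQL table with one row (name, BLOB) per file.
class PerFileBlobStore {
    private final Map<String, byte[]> committed = new HashMap<>();
    // Uncommitted changes, playing the role of an open transaction.
    private final Map<String, byte[]> pendingWrites = new HashMap<>();
    private final Set<String> pendingDeletes = new HashSet<>();

    // Stage a new or replaced file.
    void write(String name, byte[] data) {
        pendingDeletes.remove(name);
        pendingWrites.put(name, data.clone());
    }

    // Stage a file deletion.
    void delete(String name) {
        pendingWrites.remove(name);
        pendingDeletes.add(name);
    }

    // Read one file, seeing this transaction's own staged changes.
    // Returns null if the file does not exist.
    byte[] read(String name) {
        if (pendingDeletes.contains(name)) {
            return null;
        }
        byte[] data = pendingWrites.containsKey(name)
                ? pendingWrites.get(name)
                : committed.get(name);
        return data == null ? null : data.clone();
    }

    // Apply the staged changes. Returns the number of bytes written, which
    // depends only on the files that changed.
    long commit() {
        long bytesWritten = 0;
        for (String name : pendingDeletes) {
            committed.remove(name);
        }
        for (Map.Entry<String, byte[]> entry : pendingWrites.entrySet()) {
            committed.put(entry.getKey(), entry.getValue());
            bytesWritten += entry.getValue().length;
        }
        pendingWrites.clear();
        pendingDeletes.clear();
        return bytesWritten;
    }

    // Discard the staged changes; the committed files stay as they were.
    void rollback() {
        pendingWrites.clear();
        pendingDeletes.clear();
    }
}
```

With a real database the staging would be done by the transaction itself, and 
rollback() would map onto the connection's rollback.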

I have looked at the source for the RAMDirectory and the FSDirectory and they 
could both be altered to store their internal buffers into a BLOB, but I 
haven't managed to do this successfully. The problem I have been pounding on 
is the lucene.InputStream's seek() function. This really requires the 
underlying impl to be either a file or an array in memory. For a BLOB this 
would mean that the blob has to be fetched, then read/seek-ed/written, then 
stored back again. (Is this correct?!? And if so, is there a way to know WHEN 
it is required to fetch/store the array?)
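
One possible answer to the WHEN question for the read side, sketched under 
assumptions: if the store can return a byte range of a blob (JDBC's 
java.sql.Blob.getBytes(pos, length) does this, with a 1-based pos), then 
seek() only needs to move a position, and a fetch is needed only when a read 
lands outside the currently buffered range. The names RangeSource, 
ByteArraySource and BufferedRangeReader are invented for this sketch; it does 
not use or reproduce Lucene's InputStream, and it does not cover writes.

```java
import java.util.Arrays;

// Hypothetical range-read interface over one stored blob.
interface RangeSource {
    long length();
    // Returns the bytes in [offset, offset + length).
    byte[] fetch(long offset, int length);
}

// In-memory stand-in for a blob; counts fetches so the I/O is visible.
class ByteArraySource implements RangeSource {
    private final byte[] data;
    int fetchCount = 0;

    ByteArraySource(byte[] data) {
        this.data = data.clone();
    }

    public long length() {
        return data.length;
    }

    public byte[] fetch(long offset, int length) {
        fetchCount++;
        return Arrays.copyOfRange(data, (int) offset, (int) offset + length);
    }
}

// Seekable reader that fetches one buffer-sized range at a time.
class BufferedRangeReader {
    private final RangeSource source;
    private final int bufferSize;
    private byte[] buffer = new byte[0]; // currently buffered range
    private long bufferStart = 0;        // source offset of buffer[0]
    private long position = 0;           // next byte to read

    BufferedRangeReader(RangeSource source, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.source = source;
        this.bufferSize = bufferSize;
    }

    // Moving the position does no I/O at all.
    void seek(long newPosition) {
        if (newPosition < 0 || newPosition > source.length()) {
            throw new IllegalArgumentException("position out of range: " + newPosition);
        }
        position = newPosition;
    }

    // Returns the next byte as 0..255, or -1 at the end of the blob.
    int read() {
        if (position >= source.length()) {
            return -1;
        }
        boolean outsideBuffer =
                position < bufferStart || position >= bufferStart + buffer.length;
        if (outsideBuffer) {
            refill(); // the only place a fetch happens
        }
        int value = buffer[(int) (position - bufferStart)] & 0xFF;
        position++;
        return value;
    }

    private void refill() {
        int count = (int) Math.min(bufferSize, source.length() - position);
        buffer = source.fetch(position, count);
        bufferStart = position;
    }
}
```

The write side is harder, since a changed range has to be stored back before 
anyone else reads it; this sketch leaves that open.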

I would really appreciate any tips on this, as I think 
crash/recovery/rollback functionality would benefit Lucene greatly.

I have indexes that take 5 days to build, and it's really bad to receive 
exceptions during a long index run with no recovery/rollback functionality.

Mvh Karl Øie
