lucene-java-user mailing list archives

From "Nader S. Henein" <...@bayt.net>
Subject RE: Crash / Recovery Scenario
Date Tue, 09 Jul 2002 10:51:14 GMT
Karl, what if I copy the index into memory or into another directory prior to
indexing, thereby assuring a working index in the case of a crash? I want to
stay away from DB interaction, as I am trying to move away from an Oracle
Intermedia search solution (if you saw the Oracle price list you would too).
I have a backup process which:
1) Checks if the index is being updated
2) Does a small trial search (to ensure that the index is not corrupt)
3) Tars the index and moves the archive to another disk
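A minimal sketch of steps 1 and 3, assuming the lock-file name ("write.lock") and omitting the trial search, which would run a small real Lucene query in practice:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of a pre-update backup step for an index directory.
// Assumption: the lock file name "write.lock" is a placeholder for
// whatever lock files the index writer actually creates.
public class IndexBackup {

    // Step 1: the index looks idle if no lock file is present.
    static boolean indexIsIdle(Path indexDir) {
        return !Files.exists(indexDir.resolve("write.lock"));
    }

    // Step 3 (simplified): copy every index file into a backup
    // directory on another disk, creating it if needed.
    static void snapshot(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                Files.copy(f, backupDir.resolve(f.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```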

I'm thinking of writing a full backup/restore add-on to Lucene so all of
this can be jarred together as part of the package.

Nader

-----Original Message-----
From: Karl Øie [mailto:karl@gan.no]
Sent: Tuesday, July 09, 2002 1:49 PM
To: Lucene Users List
Subject: Re: Crash / Recovery Scenario


> only deletes the old one while it's working on the new one, so is there a
> way of checking for the .lock files in case
> of a crash and rolling back to the old index image?
>
> Nader Henein

I have some thoughts about crash/recovery/rollback that I haven't found any
good solutions for.

If a crash happens during writing, there is no good way to know if the
index is intact; removing lock files doesn't change this fact, as we really
don't know. So providing rollback functionality is a good, but expensive,
way of compensating for the lack of recovery.

To provide rollback I have used a RAMDirectory and serialized it to a SQL
table. By doing this I can catch any exceptions and ask the database to
roll back if required. This works great for small indexes, but as the index
grows you will have performance problems, because the whole RAMDirectory has
to be serialized/deserialized into the BLOB every time.
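The round trip can be sketched like this; a Map<String, byte[]> stands in for the RAMDirectory's files and a byte[] stands in for the BLOB column, both of which are assumptions of the sketch, not Lucene or JDBC API:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Illustration of the whole-directory-in-one-BLOB approach.
// Every commit serializes the entire map, so write cost grows with
// total index size -- the performance problem described above.
public class BlobRoundTrip {

    // Serialize the whole "directory" into one blob.
    static byte[] toBlob(Map<String, byte[]> dir) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new HashMap<>(dir));
        }
        return bos.toByteArray();
    }

    // Deserialize the blob back into a "directory" (the rollback path).
    @SuppressWarnings("unchecked")
    static Map<String, byte[]> fromBlob(byte[] blob)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(blob))) {
            return (Map<String, byte[]>) ois.readObject();
        }
    }
}
```

On rollback you simply discard the in-memory copy and call fromBlob on the last committed blob.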

A better solution would be to hack FSDirectory to store each file it would
normally write to a filesystem directory as a serialized byte array in a
BLOB of a SQL table. This would increase performance, because the whole
Directory doesn't have to change each time, and it doesn't have to read the
whole directory into memory. I also suspect Lucene sorts its records into
these different files for increased performance (like: I KNOW that a record
will be in segment "xxx" if it is there at all).
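The per-file idea can be sketched as one row per index file, so a commit only rewrites the files that actually changed; here a HashMap stands in for the SQL table and the row counter exists only to make the saving visible (both are assumptions of the sketch):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-file variant: each index file becomes its own
// "row" (filename -> bytes). Updating one file rewrites one row,
// not the whole directory.
public class PerFileStore {
    private final Map<String, byte[]> table = new HashMap<>();
    private int rowsWritten = 0;

    // Store (or overwrite) one file's bytes: one row write per file.
    void writeFile(String name, byte[] data) {
        table.put(name, data.clone());
        rowsWritten++;
    }

    byte[] readFile(String name) {
        byte[] d = table.get(name);
        return d == null ? null : d.clone();
    }

    // How many row writes have happened in total.
    int rowsWritten() { return rowsWritten; }
}
```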

I have looked at the source for RAMDirectory and FSDirectory and they
could both be altered to store their internal buffers in a BLOB, but I
haven't managed to do this successfully. The problem I have been pounding on
is the Lucene InputStream's seek() function. This really requires the
underlying implementation to be either a file or an array in memory. For a
BLOB this would mean that the blob has to be fetched, then
read/seeked/written, then stored back again. (Is this correct? And if so, is
there a way to know WHEN it is required to fetch/store the array?)
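One possible answer, sketched under assumptions: fetch the blob once on first access, let seek/read/write work on the in-memory copy, and store it back only on close, and only if something was written. The BlobStore interface below is a hypothetical stand-in for the SQL table, not a real Lucene or JDBC type:

```java
import java.util.Arrays;

// Sketch of a seekable view over a BLOB with lazy fetch and
// write-back-on-close, to show WHEN fetch/store is required.
public class SeekableBlob {
    interface BlobStore {
        byte[] fetch();           // hypothetical: SELECT the blob
        void store(byte[] data);  // hypothetical: UPDATE the blob
    }

    private final BlobStore store;
    private byte[] buf;           // lazily fetched in-memory copy
    private int pos = 0;
    private boolean dirty = false;

    SeekableBlob(BlobStore store) { this.store = store; }

    // Fetch is required exactly once: on the first read or write.
    private void ensureFetched() {
        if (buf == null) buf = store.fetch();
    }

    void seek(int p) { pos = p; }   // seeking alone never touches the DB

    byte readByte() {
        ensureFetched();
        return buf[pos++];
    }

    void writeByte(byte b) {
        ensureFetched();
        if (pos >= buf.length) buf = Arrays.copyOf(buf, pos + 1);
        buf[pos++] = b;
        dirty = true;
    }

    // Store is required only on close, and only if we wrote something.
    void close() {
        if (dirty) store.store(buf);
    }
}
```

A read-only session then costs one fetch and no store, while a writing session costs one fetch and one store, however many seeks happen in between.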

I would really appreciate any tips on this, as I think
crash/recovery/rollback functionality would benefit Lucene greatly.

I have indexes that take 5 days to build, and it's really bad to receive
exceptions during a long index run with no recovery/rollback functionality.

Mvh Karl Øie

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>



