From: "Nader S. Henein"
To: "Lucene Users List"
Subject: RE: Crash / Recovery Scenario
Date: Tue, 9 Jul 2002 14:51:14 +0400
In-Reply-To: <200207091148.31931.karl@gan.no>

Karl, what if I copy the index into memory or into another directory before indexing, thereby assuring a working index in case of a crash? I want to stay away from DB interaction, as I am trying to move off an Oracle Intermedia search solution (if you saw the Oracle price list, you would too).
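
The copy-before-indexing idea can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not Lucene code: `index_with_snapshot` and `run_indexing` are hypothetical names, and the index is assumed to be a plain directory of files that nothing else is writing to during the copy.

```python
import shutil
import tempfile
from pathlib import Path


def index_with_snapshot(index_dir, run_indexing):
    """Run run_indexing(index_dir); restore the pre-run copy if it raises."""
    index_dir = Path(index_dir)
    # Keep the snapshot next to the index so the restore stays on one filesystem.
    holder = Path(tempfile.mkdtemp(prefix="index-snapshot-", dir=index_dir.parent))
    snapshot = holder / "copy"
    shutil.copytree(index_dir, snapshot)
    try:
        run_indexing(index_dir)
    except Exception:
        # Indexing failed part-way: drop the half-written index, restore the copy.
        if index_dir.exists():
            shutil.rmtree(index_dir)
        shutil.move(str(snapshot), str(index_dir))
        raise
    finally:
        # Remove the snapshot holder (empty after a restore, full after success).
        shutil.rmtree(holder, ignore_errors=True)
```

This only covers failures that surface as exceptions. If the process is killed outright, the `index-snapshot-*` directory is left on disk, and a start-up check would have to notice it and restore from it.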

I have a backup process which:
1) Checks whether the index is being updated
2) Does a small trial search (to ensure that the index is not corrupt)
3) Tars the index and moves the file to another disk

I'm thinking of writing a full backup/restore add-on to Lucene so all of this can be jarred together as part of the package.

Nader

-----Original Message-----
From: Karl Øie [mailto:karl@gan.no]
Sent: Tuesday, July 09, 2002 1:49 PM
To: Lucene Users List
Subject: Re: Crash / Recovery Scenario

> only deletes the old one while it's working on the new one, so is there a
> way of checking for the .lock files in case
> of a crash and rolling back to the old index image?
>
> Nader Henein

I have some thoughts about crash/recovery/rollback that I haven't found any good solutions for. If a crash happens during writing, there is no good way to know whether the index is intact. Removing lock files doesn't help, as we really don't know. So providing rollback functionality is a good but expensive way of compensating for the lack of recovery.

To provide rollback I have used a RAMDirectory and serialized it to a SQL table. By doing this I can catch any exceptions and ask the database to roll back if required. This works great for small indexes, but as the index grows you will have performance problems, because the whole RAMDirectory has to be serialized/deserialized into the BLOB every time.

A better solution would be to hack the FSDirectory so that each file it would store in a file-system directory is instead stored as a serialized byte array in a BLOB of a SQL table. This would increase performance because the whole Directory doesn't have to change each time, and it doesn't have to read the whole directory into memory. I also suspect Lucene sorts its records into these different files for increased performance (like: I KNOW that record will be in segment "xxx" if it is there at all).
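
A rough sketch of the one-BLOB-per-file idea with database rollback might look like the following. It is written in Python against SQLite purely for illustration: the `BlobFileStore` class, the `index_files` table, and `update_index` are hypothetical names, SQLite stands in for whatever SQL database is actually used, and this is not a Lucene Directory implementation (it reads and writes whole files and does nothing about seek()).

```python
import sqlite3


class BlobFileStore:
    """Stores each index file as its own BLOB row (hypothetical schema)."""

    def __init__(self, conn):
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS index_files "
                "(name TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )

    def write_file(self, name, data):
        # Only this file's row changes; the other rows are left untouched.
        self.conn.execute(
            "INSERT OR REPLACE INTO index_files (name, data) VALUES (?, ?)",
            (name, data),
        )

    def read_file(self, name):
        row = self.conn.execute(
            "SELECT data FROM index_files WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return bytes(row[0])

    def list_files(self):
        cursor = self.conn.execute("SELECT name FROM index_files ORDER BY name")
        return [r[0] for r in cursor]


def update_index(store, changes):
    """Apply all file writes in one transaction; an exception rolls them back."""
    with store.conn:  # commits on success, rolls back if the block raises
        for name, data in changes:
            store.write_file(name, data)
```

Because each file is a separate row, an update rewrites only the files that changed, and a failure part-way through leaves the previously committed set of files in place.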

I have looked at the source for the RAMDirectory and the FSDirectory, and they could both be altered to store their internal buffers in a BLOB, but I haven't managed to do this successfully. The problem I have been pounding on is the lucene.InputStream's seek() function. This really requires the underlying implementation to be either a file or an array in memory. For a BLOB this would mean that the blob has to be fetched, then read/seek-ed/written, then stored back again. (Is this correct?!? And if so, is there a way to know WHEN it is required to fetch/store the array?)

I would really appreciate any tips on this, as I think crash/recovery/rollback functionality would benefit Lucene greatly. I have indexes that take 5 days to build, and it's really bad to receive exceptions during a long index run with no recovery/rollback functionality.

Regards,
Karl Øie