From: Paul Elschot
To: java-dev@lucene.apache.org
Subject: Re: After kill -9 index was corrupt
Date: Mon, 11 Sep 2006 19:04:50 +0200

On Monday 11 September 2006 09:50, Chuck Williams wrote:
>
> Paul Elschot wrote on 09/10/2006 09:15 PM:
> > On Monday 11 September 2006 02:24, Chuck Williams wrote:
> >
> >> Hi All,
> >>
> >> An application of ours under development had a memory leak that caused
> >> it to slow interminably. On Linux, the application did not respond to
> >> kill -15 in a reasonable time, so kill -9 was used to forcibly terminate
> >> it. After this the segments file contained a reference to a segment
> >> whose index files were not present. I.e., the index was corrupt and
> >> Lucene could not open it.
> >>
> >> A thread dump at the time of the kill -9 shows that Lucene was merging
> >> segments inside IndexWriter.close(). Since segment merging only commits
> >> (updates the segments file) after the newly merged segment(s) are
> >> complete, I expect this is not the actual problem.
> >>
> >> Could a kill -9 prevent data from reaching disk for files that were
> >> previously closed? If so, then Lucene's index can become corrupt after
> >> kill -9. In this case, it is possible that a prior merge created new
> >> segment index files, updated the segments file, closed everything, the
> >> segments file made it to disk, but the index data files and/or their
> >> directory entries did not.
> >>
> >> If this is the case, it seems to me that flush() and
> >> FileDescriptor.sync() are required on each index file prior to close()
> >> to guarantee no corruption. Additionally a FileDescriptor.sync() is
> >> also probably required on the index directory to ensure the directory
> >> entries have been persisted.
> >>
> >
> > Shouldn't the sync be done after closing the files? I'm using sync in a
> > (un*x) shell script after merges before backups.
> > I'd prefer to have some
> > more of this syncing built into Lucene because the shell sync syncs all
> > disks which might be more than needed. So far I've had no problems,
> > so there was no need to investigate further.
> >
> I believe FileDescriptor.sync() uses fsync and not sync on Linux. A
> FileDescriptor is no longer valid after the stream is closed, so sync()
> could not be done on a closed stream. I think the correct protocol is
> flush() the stream, sync() its FD, then close() it.

From Sun's javadocs: flush(), fsync(), close() is indeed the right order
for a single file.

> Paul, do you know if kill -9 can create the situation where bytes from a
> closed file never make it to disk in Linux? I think Lucene needs sync()

What do you mean by "never"?
The problem with not using flush() is that the JVM
simply does _not_ guarantee that data will ever end up on disk,
which is why I added the sync in the shell script after the document merging.
With flush() and sync the guarantee is only given as far as the OS can use
the disk driver; if the disk actually does not write, there is nothing to be
done about that, see the link in the other post.

> in any event to be robust with respect to OS crashes, but am wondering
> if this explains my kill -9 problem as well. It seems bogus to me that

This can explain your problem: data will eventually be written to the disk
by the OS, but when?

> a closed file's bytes would fail to be persisted unless the OS crashed,
> but I can't find any other explanation and I can't find any definitive
> information to affirm or refute this possible side effect of kill -9.
>
> The issue I've got is that my index can never lose documents. So I've
> implemented journaling on top of Lucene where only the last
> maxBufferedDocs documents are journaled and the whole journal is reset
> after close().
> My application has no way to know when the bytes make it
> to disk, and so cannot manage its journal properly unless Lucene ensures
> index integrity with sync()'s.

Do you also flush/sync the journal to disk? If you need to recover from
the journal, it has to be written to disk before doing "transactions"
(adding docs) in Lucene.

Regards,
Paul Elschot
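
For illustration, the flush(), sync(), close() order discussed in this thread
could be sketched in Java roughly as below. This is a minimal sketch, not
Lucene code: the class DurableFiles and its methods writeDurably and
appendDurably are hypothetical names invented for the example. It assumes the
guarantee extends only as far as the OS and the disk actually honor the sync,
and it does not attempt to sync the containing directory.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper class, named here only for illustration.
final class DurableFiles {

    private DurableFiles() {}

    /** Overwrites the file and returns only after sync() has completed. */
    static void writeDurably(File file, byte[] data) throws IOException {
        write(file, data, false);
    }

    /** Appends to the file (e.g. a journal) and syncs before returning. */
    static void appendDurably(File file, byte[] data) throws IOException {
        write(file, data, true);
    }

    private static void write(File file, byte[] data, boolean append)
            throws IOException {
        try (FileOutputStream out = new FileOutputStream(file, append)) {
            out.write(data);
            // A no-op for a bare FileOutputStream, but needed if the stream
            // is ever wrapped in a buffered stream.
            out.flush();
            // Must happen while the stream is still open: the descriptor
            // is no longer valid after close().
            out.getFD().sync();
        } // close() runs here, after sync()
    }
}
```

A journal entry would be written with appendDurably before the corresponding
document is added to the index, so that the entry is already on disk if the
process is killed in between.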