Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 83622 invoked from network); 5 Jun 2006 15:18:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 5 Jun 2006 15:18:16 -0000 Received: (qmail 67540 invoked by uid 500); 5 Jun 2006 15:17:33 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 67471 invoked by uid 500); 5 Jun 2006 15:17:33 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 67418 invoked by uid 99); 5 Jun 2006 15:17:33 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 05 Jun 2006 08:17:33 -0700 X-ASF-Spam-Status: No, hits=0.6 required=10.0 tests=NO_REAL_NAME X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [204.90.187.18] (HELO smtp-gw4.us.gxs.com) (204.90.187.18) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 05 Jun 2006 08:17:31 -0700 Received: from unknown (HELO bromail01.internal.gxs.com) ([10.160.0.166]) by smtp-gw4.us.gxs.com with ESMTP; 05 Jun 2006 11:24:25 -0400 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Subject: RE: Compound / non-compound index files and SIGKILL Date: Mon, 5 Jun 2006 11:17:10 -0400 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Compound / non-compound index files and SIGKILL Thread-Index: AcaIQNnMM65L/fuoTZCjizCT3xD93AAN3bGwAA5zkuA= From: To: X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Rob, Not sure I have the answer to your question, but thought I would add my observations. According to the book "Lucene In Action", the default index type is compound, although I did not find this mentioned in the API docs. However, my observations have been that this is not true. At least on my system it appears to build a multifile index at times. Which has lead to other problems similar to what you are reporting. I found the solution to my problem was to declare my index as compound. IndexWriter writer =3D new IndexWriter(indexDir, new StandardAnalyzer(), false); writer.setUseCompoundFile(true);=20 Now my index is built as compound and I no longer have "orphaned" index files. Hope this helps someone. Charles -----Original Message----- From: Rob Staveley (Tom) [mailto:rstaveley@seseit.com]=20 Sent: Monday, June 05, 2006 6:17 AM To: java-user@lucene.apache.org Subject: Compound / non-compound index files and SIGKILL I've been indexing live data into a compound index from an MTA. I'm resolving a bunch of problems unrelated to Lucene (disparate hangs in my content handlers). When I get a hang, I typically need to kill my daemon, alas more often than not using kill -9 (SIGKILL). However, these SIGKILLs are leaving large temporary(?) files, which I guess are non-compound index files transiently extracted from the working .cfs files: -rw-r--r-- 1 373138432 Jun 2 13:42 _18hup.fdt -rw-r--r-- 1 5054464 Jun 2 13:42 _18hup.fdx -rw-r--r-- 1 426 Jun 2 13:42 _18hup.fnm -rw-r--r-- 1 457253888 Jun 2 09:22 _15djq.fdt -rw-r--r-- 1 6205440 Jun 2 09:22 _15djq.fdx -rw-r--r-- 1 426 Jun 2 09:21 _15djq.fnm They are left intact after restarting my daemon. Presumably they are not treated as being part of the compound index. I see no corresponding .cfs file for them.=20 As a consequence of these - I suspect - I am getting a very large overall disk requirement for my index, presumably because of replicated field data. My guess is that the field data in the orphaned .fdt files needs to be regenerated. In another index directory from a previous test run (again with SIGKILLs), I have 98 GB of index files, with only 12 BG devoted to compound files for the field index (.cfs). The rest of the disk space is used by orphaned uncompounded index files; I see 51 GB devoted to uncompounded field data (.fdt), 13 BG devoted to term positions (.prx) and 13 BG devoted to term frequencies (.frq). Here's my question: How can I attempt to merge these orphaned into the compound index, using IndexWriter.addIndexes(), or would I be foolish attempting this? --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org