lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Staveley (Tom)" <rstave...@seseit.com>
Subject RE: Compound / non-compound index files and SIGKILL
Date Mon, 05 Jun 2006 15:45:04 GMT
Thanks for the suggestion, Charles. I've also been consistently explicitly
setting writer.setUseCompoundFile(true) too. When I started the project with
Lucene 1.4.3 and ran out of file handles without it I found that it was
needed. I'm fairly certain from timestamps on the orphaned files that the
orphaning is coincidental with my SIGKILLs.

-----Original Message-----
From: Charles.Sanders@gxs.com [mailto:Charles.Sanders@gxs.com] 
Sent: 05 June 2006 16:17
To: java-user@lucene.apache.org
Subject: RE: Compound / non-compound index files and SIGKILL

Rob,
Not sure I have the answer to your question, but thought I would add my
observations.  According to the book "Lucene In Action", the default index
type is compound, although I did not find this mentioned in the API docs.
However, my observations have been that this is not true.  At least on my
system it appears to build a multifile index at times.
Which has lead to other problems similar to what you are reporting.

I found the solution to my problem was to declare my index as compound.
	IndexWriter writer = new IndexWriter(indexDir, new
StandardAnalyzer(), false);
      writer.setUseCompoundFile(true); 

Now my index is built as compound and I no longer have "orphaned" index
files.

Hope this helps someone.
Charles




-----Original Message-----
From: Rob Staveley (Tom) [mailto:rstaveley@seseit.com]
Sent: Monday, June 05, 2006 6:17 AM
To: java-user@lucene.apache.org
Subject: Compound / non-compound index files and SIGKILL

I've been indexing live data into a compound index from an MTA. I'm
resolving a bunch of problems unrelated to Lucene (disparate hangs in my
content handlers). When I get a hang, I typically need to kill my daemon,
alas more often than not using kill -9 (SIGKILL).

However, these SIGKILLs are leaving large temporary(?) files, which I guess
are non-compound index files transiently extracted from the working .cfs
files:

-rw-r--r--    1  373138432 Jun  2 13:42 _18hup.fdt
-rw-r--r--    1      5054464 Jun  2 13:42 _18hup.fdx
-rw-r--r--    1              426 Jun  2 13:42 _18hup.fnm

-rw-r--r--    1  457253888 Jun  2 09:22 _15djq.fdt
-rw-r--r--    1      6205440 Jun  2 09:22 _15djq.fdx
-rw-r--r--    1              426 Jun  2 09:21 _15djq.fnm

They are left intact after restarting my daemon. Presumably they are not
treated as being part of the compound index. I see no corresponding .cfs
file for them. 

As a consequence of these - I suspect - I am getting a very large overall
disk requirement for my index, presumably because of replicated field data.
My guess is that the field data in the orphaned .fdt files needs to be
regenerated.

In another index directory from a previous test run (again with SIGKILLs), I
have 98 GB of index files, with only 12 BG devoted to compound files for the
field index (.cfs). The rest of the disk space is used by orphaned
uncompounded index files; I see 51 GB devoted to uncompounded field data
(.fdt), 13 BG devoted to term positions (.prx) and 13 BG devoted to term
frequencies (.frq).

Here's my question:

How can I attempt to merge these orphaned into the compound index, using
IndexWriter.addIndexes(), or would I be foolish attempting this?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Mime
View raw message