lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Staveley (Tom)" <rstave...@seseit.com>
Subject RE: Seeing what's occupying all the space in the index
Date Fri, 26 May 2006 19:12:55 GMT
> Note that IndexReader has a main() that will list the contents of compound
index files.

It looks like some of my index is compound and some isn't. My not very well
informed guess is that an optimize() got interrupted somewhere along the
line.

If I try to optimize the index now, it throws exceptions.

lucli> optimize
Starting to optimize index.
java.io.IOException: Cannot overwrite: /mnt/sdb1/index/index-1/_2lhqi.fnm
        at
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:322)
        at org.apache.lucene.index.FieldInfos.write(FieldInfos.java:255)
        at
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:176)
        at
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
        at
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:707)
        at
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:684)
        at
org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:543)
        at lucli.LuceneMethods.optimize(LuceneMethods.java:192)
        at lucli.Lucli.handleCommand(Lucli.java:223)
        at lucli.Lucli.<init>(Lucli.java:148)
        at lucli.Lucli.main(Lucli.java:175)

Would the smart thing to do at this point be to use IndexReader's main()
method to extract the contents of the compound files and then to attempt to
merge them with the unmerged indexes? [I'll need to delve further into
Doug's excellent LIA to figure out how to do this.]

To recap, I have 98 GB of files in my index, with: 

	51 GB devoted to field data (.fdt), 
	13 GB devoted to term positions (.prx)
	13 GB devoted to term frequencies (.frq)
	12 GB devoted to compound files for the field index (.cfs)

I seem to have the same mess on a parallel system too - I'm indexing in two
data centres.

Mime
View raw message