lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Staveley (Tom)" <>
Subject RE: Merging "orphaned" segments into a composite index
Date Sat, 16 Sep 2006 08:31:58 GMT
It looks like my segments file only contains information for the .cfs
segments. So this approach doesn't work. I was wondering if I could use
IndexWriter.addIndexes(IndexReader[]) instead. Can I open an IndexReader
without a corresponding segments file? In notice that
always operates on directories, which I guess means that it uses the
segments file. 

Is a segments file something that can be easily bodged for a bunch of index
files which aren't referenced by the segments file?

This probably all seems like a foolish errand, but my two indexes are > 300G
each and regenerating them is something I'd like to avoid.

-----Original Message-----
From: Rob Staveley (Tom) [] 
Sent: 15 September 2006 18:18
Subject: Merging "orphaned" segments into a composite index

I have had some badly behaved Lucene indexing software crash on me several
times and have been left with an index directory with lots of non-composite
files in, when all I ought to be getting is the compound files .cfs  files
plus deletable  and segments. 

Re-indexing everything doesn't bear thinking about. I was wondering if I'd
be able to merge these non-compound files into the composite index, and if
so... how? [I appreciate that there is some risk in doing this, bearing in
mind software crashed when the orphaned index files were created.]

If you'll excuse the Perl gibber, this gives a sense of what's in the index

$ find . | perl -n -e 'if (/\..+.(\..+)/) {print "$1\n"}' | sort | uniq -c
     15 .cfs
      4 .f0
     40 .fdt
     36 .fdx
     40 .fnm
     16 .frq
      1 .log
     16 .prx
     15 .tii
     16 .tis
      5 .tmp

Here's my thinking:

(1) I stop my indexer
(2) I create a temp directory and move everything other than the .cfs files,
deletable and segments into it
(3) I open an IndexWriter to my composite index and use
IndexWriter.addIndexes(Directory[]) on the temp directory

Assuming the files aren't corrupt, should that do the job to create a nicely
merged composite index, or is this a foolish undertaking?

View raw message