lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Index corruption with lucene 3.0.3
Date Wed, 17 Dec 2014 21:28:38 GMT
Hi,

 

the error message is “java.io.IOException: No sub-file with id _xv.fnm found”, produced
by CompoundFileReader. This means that the corresponding compound file does not contain the
missing file _xv.fnm. Because that file should be inside the CFS file, it is of course not
present as a separate file in the directory. Lucene never keeps separate copies of the same
data, except during merging or when commit points are kept for later use.

 

The CFS file seems to be corrupt. Back in the days of Lucene 3, CFS files had their “index”
(the dictionary of files inside) at the end of the file, because it was written last
(and then the offset of the dictionary was written at the beginning of the file). You mentioned
that you had disk-full issues, so it is almost certain that the CFS file is incomplete and the
dictionary is completely missing. It is very unlikely that you can recover from that situation
unless you have very deep knowledge on

 

Some Lucene JAR files contain an additional tool to “extract” CFS files (like unzip). You
may try to use it, but I am not sure whether it already existed in Lucene 3.0.3 (you will
need to search the Javadocs to look it up). Without the dictionary at the end of the file,
however, it will not work either.
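
Before trying any extraction tool, it may help to check whether the entry table of the .cfs file is readable at all. The following is a minimal plain-Java sketch, not Lucene code: the class CfsEntryDump and its methods are hypothetical names. It assumes the encoding described in the Lucene 3.0 file-format documentation, read sequentially from the first byte of the file: a VInt entry count, then for each entry a long data offset and a length-prefixed UTF-8 name. Treat that layout as an assumption and verify it against the documentation for your exact version.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): prints the entry table of a .cfs file.
public class CfsEntryDump {

    // Sanity limits (arbitrary assumptions) so a corrupt header fails fast.
    static final int MAX_ENTRIES = 1000;
    static final int MAX_NAME_BYTES = 256;

    // Lucene-style VInt: 7 data bits per byte, low-order group first;
    // a set high bit means another byte follows.
    static int readVInt(DataInputStream in) throws IOException {
        int b = in.readByte(); // throws EOFException if the stream ends here
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift > 28) {
                throw new IOException("VInt longer than 5 bytes");
            }
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Reads the assumed table: count, then (offset, name) per entry.
    static Map<String, Long> readEntries(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int count = readVInt(in);
        if (count < 0 || count > MAX_ENTRIES) {
            throw new IOException("Implausible entry count: " + count);
        }
        Map<String, Long> entries = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            long offset = in.readLong(); // big-endian 8-byte value
            int nameLength = readVInt(in);
            if (nameLength < 0 || nameLength > MAX_NAME_BYTES) {
                throw new IOException("Implausible name length: " + nameLength);
            }
            byte[] name = new byte[nameLength];
            in.readFully(name); // throws EOFException on a truncated file
            entries.put(new String(name, StandardCharsets.UTF_8), offset);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CfsEntryDump <file.cfs>");
            return;
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            Map<String, Long> entries = readEntries(in);
            System.out.println(entries.size() + " entries");
            for (Map.Entry<String, Long> e : entries.entrySet()) {
                System.out.println(e.getKey() + " @ offset " + e.getValue());
            }
        }
    }
}
```

Run it as "java CfsEntryDump /path/to/_xv.cfs" (the path is illustrative). If _xv.fnm is absent from the printed table, or the count is zero or implausible, that is consistent with the “No sub-file with id” error. The sketch only reads names and offsets; it does not extract or repair anything.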

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Shlomit Rosen [mailto:SHLOMITR@il.ibm.com] 
Sent: Wednesday, December 17, 2014 8:04 PM
To: <java-user@lucene.apache.org>
Subject: Index corruption with lucene 3.0.3

 

Hello, 

We have a client that is using lucene 3.0.3. 
They are working with a NAS storage device that recently had permission issues,
which might have caused some "out of disk space" exceptions during indexing.
We are uncertain whether they also suffered JDK crashes in the past few months, as we
discovered dmp files and javacores on their system.

Consequently, they now have 3 corrupted indices. 
All of them show a similar issue: 

java.io.IOException: No sub-file with id _xv.fnm found
        at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:137)
        at org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:125)
        at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:68)
        at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:120)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:583)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:470)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:883)


Looking at the file listing of the indices, I see that this file (i.e. _xv.fnm) is really missing,
but I also see that a compound file with the same name exists on disk (i.e. _xv.cfs).

My question is:
        Is there a way to "save" the collection by re-creating the fnm file from the cfs file
(or in any other way)?
        Or does our client need to re-index the entire collection? (Assuming the CheckIndex
-fix option is no good, because we cannot know which documents are lost.)

I'm attaching the checkIndex output as reference 

Thanks in advance! 
Shlomit 



