lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Index Corruption, lucene 5.0 codec, term posting.
Date Wed, 06 Jan 2016 00:01:39 GMT
Hi,

> We are seeing index corruption on one of our indexes. The index is 30GB with
> 169 fields, created using Lucene 5.2.1 and the Lucene 5.0 codec, has no
> deletes, and is optimized.
> 
> 
> We see an "Invalid vLong detected (negative values disallowed)" error while
> checking the field postings. However, the checksum checks on the index are
> not failing.
> 
> If there was some data corruption on the physical disk, wouldn't the
> checksums differ (the one stored on disk and the one computed while reading)?
> Could there be a bug in the BlockTreeTermsWriter while
> encoding/writing the block term information?

Yes. If the checksum is correct, the index should be fine, provided the data that was written
was correct in the first place (see below).

> Exception stack trace from CheckIndex tool,

This looks like a well-known sign-flip bug in your JVM, which is antique. Broken data was
already written to disk, and the checksum was calculated over that wrong data.

> Exception in thread "main" java.lang.RuntimeException: Invalid vLong detected (negative values disallowed)
> at org.apache.lucene.store.ByteArrayDataInput.readVLong(ByteArrayDataInput.java:153)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.decodeMetaData(SegmentTermsEnumFrame.java:464)
> at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docFreq(SegmentTermsEnum.java:983)
> at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1269)
> at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1716)
> 
> 
> Unfortunately I am not able to reproduce this on a smaller scale, and I have
> seen it a couple of other times as well. Let me know if you need more data.
> 
> 
> Java version: java version "1.7.0_04"
> 
> Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
> Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)

This Java version is "antique". Those old versions had serious bugs that most likely caused
the index corruption (while writing and merging data). The checksum is correct because
Lucene "thinks" that the data was written correctly. In reality, the VLong is not correct (its
sign was flipped by one of the well-known sign-flipping bugs in those antique Java versions).
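
To illustrate why the checksum still verifies, here is a minimal, self-contained sketch. It is
not Lucene's code: the class name VLongChecksumSketch, the helper methods, and the sample value
are made up for illustration, and the negation only stands in for whatever a miscompiled
computation might produce. The encoding follows the general vLong idea (7 payload bits per
byte, high bit meaning "more bytes follow"), and the decoder gives up after 9 bytes, which is
the situation the exception message describes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;

class VLongChecksumSketch {

    // Variable-length encoding: 7 payload bits per byte, high bit set means "more bytes follow".
    static byte[] encodeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decoder that accepts at most 9 bytes (63 payload bits), so a negative value is rejected.
    static long decodeVLong(byte[] bytes) {
        long result = 0L;
        for (int i = 0; i < 9; i++) {
            byte b = bytes[i];
            result |= (b & 0x7FL) << (7 * i);
            if (b >= 0) {
                return result; // high bit clear: this was the last byte
            }
        }
        throw new RuntimeException("Invalid vLong detected (negative values disallowed)");
    }

    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        long intended = 12345L;
        long flipped = -intended; // hypothetical stand-in for a sign flipped by a JVM bug

        // The writer encodes the already-wrong value and checksums exactly those bytes.
        byte[] written = encodeVLong(flipped);
        long storedChecksum = crc32(written);

        // Later, the reader recomputes the checksum over the same bytes: it matches.
        System.out.println("checksum matches: " + (crc32(written) == storedChecksum));

        // Decoding is where the problem finally shows up.
        try {
            decodeVLong(written);
        } catch (RuntimeException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

The checksum only proves that the bytes on disk are the bytes that were written; it cannot tell
whether those bytes were computed correctly before they were written.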

Java 7 is only safe to use after 7u55; 7u4 is now more than 4 years old.
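
If you want to catch this before indexing, a startup check along the following lines could
help. This is only a sketch: the class and method names are hypothetical, it only understands
legacy version strings of the form "1.7.0_04", and treating 7u55 itself as acceptable is an
assumption on my part.

```java
class JavaVersionCheckSketch {

    // Returns the update number from a legacy version string such as "1.7.0_04",
    // or -1 if the string carries no update suffix.
    static int updateNumber(String version) {
        int idx = version.indexOf('_');
        if (idx < 0) {
            return -1;
        }
        int end = idx + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == idx + 1) {
            return -1;
        }
        return Integer.parseInt(version.substring(idx + 1, end));
    }

    // True for Java 7 builds older than update 55 (assumed threshold, see above).
    static boolean isOldJava7(String version) {
        return version.startsWith("1.7.") && updateNumber(version) < 55;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (isOldJava7(version)) {
            System.err.println("WARNING: Java " + version + " is too old for safe indexing");
        } else {
            System.out.println("Java " + version + " passed the version check");
        }
    }
}
```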

Uwe



