lucene-java-user mailing list archives

From Manuel Le Normand <manuel.lenorm...@gmail.com>
Subject segment corruption - ArrayIndexOutOfBoundsException
Date Tue, 22 Oct 2013 11:36:40 GMT
Hello,

My Lucene index contains 46 segments with a total of 4M docs. Lately, while
running queries, I have occasionally been getting the following exception from this index:

java.lang.ArrayIndexOutOfBoundsException
    at org.apache.lucene.codecs.lucene41.ForUtil.readBlock(ForUtil.java:196)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.refillPositions(Lucene41PostingsReader.java:796)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:961)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:988)
    at org.apache.lucene.search.ExactPhraseScorer.phraseFreq(ExactPhraseScorer.java:213)
    at …


Looking at the code, the exception comes from this line:

final int encodedSize = encodedSizes[numBits];
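The failing line uses numBits, which in Lucene 4.x appears to be a per-block header byte read from the index file, directly as an array index. The sketch below is a hypothetical, self-contained illustration, not Lucene's actual ForUtil: it assumes blocks of 128 values and a lookup table of encoded sizes for 0 to 32 bits per value (the names BlockHeaderSketch, ENCODED_SIZES and both methods are made up). It shows how a corrupted header byte outside that range surfaces as an ArrayIndexOutOfBoundsException rather than as an explicit corruption error.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Lucene's ForUtil): a corrupted bits-per-value
 * header byte, used directly as an index into a size table, throws
 * ArrayIndexOutOfBoundsException.
 */
final class BlockHeaderSketch {
    // Assumption: 128 values per packed block, as in the Lucene41 postings format.
    static final int BLOCK_SIZE = 128;
    // Assumption: packed byte size of one block for 0..32 bits per value.
    static final int[] ENCODED_SIZES = new int[33];

    static {
        for (int bits = 0; bits <= 32; bits++) {
            // Round up to whole bytes (exact here, since 128 * bits is a multiple of 8).
            ENCODED_SIZES[bits] = (BLOCK_SIZE * bits + 7) / 8;
        }
    }

    /** Trusts the header byte, like the failing line; a corrupt byte overflows the table. */
    static int encodedSizeUnchecked(ByteBuffer in) {
        int numBits = in.get(); // signed byte: negative or > 32 if the file is corrupt
        return ENCODED_SIZES[numBits];
    }

    /** Same read, but reports the corruption explicitly instead of an AIOOBE. */
    static int encodedSizeChecked(ByteBuffer in) {
        int numBits = in.get();
        if (numBits < 0 || numBits >= ENCODED_SIZES.length) {
            throw new IllegalStateException("corrupt block header: numBits=" + numBits);
        }
        return ENCODED_SIZES[numBits];
    }

    public static void main(String[] args) {
        // A valid header: 7 bits per value -> 128 * 7 / 8 = 112 bytes.
        System.out.println(encodedSizeUnchecked(ByteBuffer.wrap(new byte[] {7})));
        // A garbage header byte (0xC8 is -56 as a signed byte).
        try {
            encodedSizeUnchecked(ByteBuffer.wrap(new byte[] {(byte) 0xC8}));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException for numBits=-56");
        }
    }
}
```

Under these assumptions, an exception that only shows up for some phrase queries would be consistent with damaged bytes in the positions data of a few terms rather than a failure in the query code itself.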


These exceptions cause about 5% of queries to fail; I have not been able to
find a pattern in which ones.

I ran CheckIndex on this index and got the following output for one of the
segments:


Segments file=segments_4k4 numSegments=46 version=4.3 format=
userData={commitTimeMSec=1382425789488}

1 of 46: name=_1ye docCount=67529
    Codec=Lucene42
    Compound=false
    numFiles=15
    size (MB)=4,922.155
    diagnostics = {timestamp=1371533248779, os=Linux,
      os.version=2.6.32-279.el6.x86_64, mergeFactor=15, source=merge,
      lucene.version=4.3.0 1477023 - simonw - 2013-04-29 14:55:14, os.arch=amd64,
      mergeMaxNumSegments=-1, java.version=1.7.0_11, java.vendor=Oracle
      Corporation}
    has deletions [delGen=417]
    test: open reader.........FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.AssertionError: liveDocs.count()=40242 info.docCount=67529 info.getDelCount()=27193
    at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:92)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:61)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1854)
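The assertion message appears to encode a simple invariant: the number of live (non-deleted) documents in the segment's deletions bitset should equal docCount minus delCount from the segment metadata. Here 67529 - 27193 = 40336, while the bitset reports 40242 live documents, so the two disagree by 94. The sketch below restates that check with java.util.BitSet; the class and method names are hypothetical and this is not Lucene's Lucene40LiveDocsFormat code.

```java
import java.util.BitSet;

/**
 * Hypothetical restatement of the invariant behind the CheckIndex failure:
 * live docs counted from the deletions bitset must equal
 * docCount - delCount recorded in the segment metadata.
 */
final class LiveDocsConsistency {
    /** Live docs implied by the segment metadata. */
    static int expectedLiveDocs(int docCount, int delCount) {
        return docCount - delCount;
    }

    /** Counts set (live) bits among the first docCount documents. */
    static int countLive(BitSet liveDocs, int docCount) {
        return liveDocs.get(0, docCount).cardinality();
    }

    /** Throws if the bitset and the metadata disagree. */
    static void check(BitSet liveDocs, int docCount, int delCount) {
        int actual = countLive(liveDocs, docCount);
        int expected = expectedLiveDocs(docCount, delCount);
        if (actual != expected) {
            throw new IllegalStateException("liveDocs.count()=" + actual
                + " docCount=" + docCount + " delCount=" + delCount
                + " (expected " + expected + " live docs, off by " + (expected - actual) + ")");
        }
    }

    public static void main(String[] args) {
        int docCount = 67529;
        int delCount = 27193;
        BitSet live = new BitSet(docCount);
        live.set(0, 40242); // mimic the live-doc count reported for segment _1ye
        try {
            check(live, docCount, delCount);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the deletions file for _1ye and the delete count recorded in the segments file no longer describe the same set of deleted documents.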



Other segments, by contrast, look fine:


2 of 46: name=_3wz docCount=130918
    Codec=Lucene42
    Compound=false
    numFiles=15
    size (MB)=4,982.155
    diagnostics = {timestamp=1372838229010, os=Linux,
      os.version=2.6.32-279.el6.x86_64, mergeFactor=15, source=merge,
      lucene.version=4.3.0 1477023 - simonw - 2013-04-29 14:55:14, os.arch=amd64,
      mergeMaxNumSegments=-1, java.version=1.7.0_11, java.vendor=Oracle
      Corporation}
    has deletions [delGen=552]
    test: open reader.........OK [24610 deleted docs]
    test: fields..............OK [235 fields]
    test: field norms.........OK [29 fields]
    test: terms, freq, prox...OK [45127880 terms; 25487529 terms/docs pairs; 854489030 tokens]
    test (ignoring deletes): terms, freq, prox...OK [50784030; 305300244 terms/docs pairs; 854489030 tokens]
    test: stored fields.......OK [41472391 total field count; avg 390 fields per doc]
    test: term vectors........OK [268790 total vector count; avg 3.035 term/freq vector fields per doc]
    test: docvalues...........OK [0 total doc count; 1 docvalues fields]



Does anyone know what kind of corruption might throw this exception on
opening a reader?


By the way, this index is one of several shards in the same collection
(managed by Solr). The other shards are in good shape.


Thanks in advance,


Manuel
