lucene-java-user mailing list archives

From "Bogdan Ghidireac" <bog...@ecstend.com>
Subject CheckIndex tool issues
Date Tue, 27 Nov 2007 14:44:15 GMT
Hi,

I tried the CheckIndex tool (built from the latest svn code) and was
surprised to see that all of my production indexes (around 30) are reported
as corrupt. This seems highly unlikely: they have been running for about a
year and I have not seen a single exception during search so far.

One recurring pattern is that the tool reports segments with deleted docs as
corrupt, while segments without deleted docs are fine. Here is some sample
output.

index 1

  6 of 7: name=_wxlp docCount=1001
    compound=true
    numFiles=1
    size (MB)=0.213
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [12 fields]
    test: terms, freq, prox...OK [4142 terms; 8004 terms/docs pairs; 8006 tokens]
    test: stored fields.......OK [12012 total field count; avg 12 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

  7 of 7: name=_wxqg docCount=178
    compound=true
    numFiles=1
    size (MB)=0.039
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [12 fields]
    test: terms, freq, prox...OK [819 terms; 1417 terms/docs pairs; 1417 tokens]
    test: stored fields.......OK [2136 total field count; avg 12 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

index 2

  6 of 7: name=_10hr docCount=1978
    compound=true
    numFiles=2
    size (MB)=3.601
    has deletions [delFileName=_10hr_5.del]
    test: open reader.........OK [17 deleted docs]
    test: fields, norms.......OK [10 fields]
    test: terms, freq, prox...FAILED
    WARNING: would remove reference to this segment (-fix was not specified); full exception:
java.lang.RuntimeException: term ASIN:342678033X docFreq=5 != num docs seen 4
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:217)

  7 of 7: name=_10i0 docCount=196
    compound=true
    numFiles=1
    size (MB)=0.44
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [10 fields]
    test: terms, freq, prox...OK [8611 terms; 24307 terms/docs pairs; 32841 tokens]
    test: stored fields.......OK [1960 total field count; avg 10 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
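
One possible reading of the docFreq mismatch in index 2, offered as an
assumption rather than something the tool output confirms: the docFreq
stored for a term keeps counting documents that were deleted but not yet
merged away, while an enumeration of the term's documents skips deleted
ones. The sketch below does not call Lucene at all. It models a postings
list and a deletion bit set in plain Java, and the class name, method name
and doc ids are all hypothetical.

```java
import java.util.BitSet;

public class DocFreqSketch {

    // Counts the postings whose document is not marked deleted, the way an
    // enumeration that skips deleted docs would.
    public static int countLiveDocs(int[] postings, BitSet deleted) {
        int seen = 0;
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids for one term inside a 1978-doc segment.
        int[] postings = {3, 41, 97, 512, 1400};

        // Assumption: the stored docFreq was written when the segment was
        // created and is not rewritten when a document is deleted.
        int storedDocFreq = postings.length;

        // Mark one of the five documents as deleted.
        BitSet deleted = new BitSet(1978);
        deleted.set(97);

        int seen = countLiveDocs(postings, deleted);
        System.out.println("docFreq=" + storedDocFreq + " num docs seen " + seen);
    }
}
```

With one of the five documents deleted, the stored count and the enumerated
count differ by one, which has the same shape as the failure reported for
segment _10hr. If this is what is happening, the check would need to count
deleted documents too before comparing against docFreq.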


Is this a known issue, or are my indexes really corrupt?

Regards,
Bogdan
