lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-4455) CheckIndex shows wrong segment size in 4.0 because SegmentInfoPerCommit.sizeInBytes counts every file 2 times; check for deletions is negated and results in wrong output
Date Mon, 01 Oct 2012 16:59:07 GMT
Uwe Schindler created LUCENE-4455:
-------------------------------------

             Summary: CheckIndex shows wrong segment size in 4.0 because SegmentInfoPerCommit.sizeInBytes
counts every file 2 times; check for deletions is negated and results in wrong output
                 Key: LUCENE-4455
                 URL: https://issues.apache.org/jira/browse/LUCENE-4455
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.0-BETA
            Reporter: Uwe Schindler
            Priority: Blocker
             Fix For: 4.0


I found this bug in 4.0-RC1 when I compared the CheckIndex output for 4.0 and 3.6.1:
- The reported segment size is twice the size shown by "ls -lh". The reason is that
SegmentInfoPerCommit.sizeInBytes counts every file 2 times. Although this is just a statistic,
it looks serious, because MergePolicy chooses merges based on it. On the other hand, if all
segments are reported twice as big, it should not affect merging behaviour (unless absolute
sizes in megabytes are used). So we should really fix this - sorry for investigating this so late!
- The deletions reported for the segments are inverted. Segments that have no deletions are
reported as having deletions (but with delGen=-1), and those with deletions show "no deletions".
This is not serious, but should be fixed, too.
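
To illustrate the two points above, here is a minimal Java sketch. It is not Lucene code: the class, the method names, and the numbers are hypothetical, and it assumes only what the report states (every size is doubled, and delGen=-1 marks a segment without deletions). It shows why a relative size comparison is unaffected by a uniform doubling while an absolute limit is, and what a negated deletions check looks like.

```java
public final class SegmentStatsSketch {

    /** Relative test: is segment a at least `factor` times as large as segment b? */
    static boolean isMuchLarger(long a, long b, long factor) {
        return a >= factor * b;
    }

    /** Absolute test: would the merged size exceed a fixed limit (same unit as a and b)? */
    static boolean exceedsLimit(long a, long b, long limit) {
        return a + b > limit;
    }

    /** Negated check, as described in the report: delGen == -1 is shown as "has deletions". */
    static String describeDeletionsInverted(long delGen) {
        return delGen == -1 ? "has deletions" : "no deletions";
    }

    /** Intended check: delGen == -1 means the segment has no deletions. */
    static String describeDeletions(long delGen) {
        return delGen == -1 ? "no deletions" : "has deletions";
    }

    public static void main(String[] args) {
        long a = 300, b = 100;   // hypothetical true sizes in MB
        long limit = 500;        // hypothetical absolute limit in MB

        // Doubling both sizes leaves the relative comparison unchanged...
        System.out.println(isMuchLarger(a, b, 3) == isMuchLarger(2 * a, 2 * b, 3));
        // ...but can flip a comparison against an absolute limit.
        System.out.println(exceedsLimit(a, b, limit) + " vs " + exceedsLimit(2 * a, 2 * b, limit));

        // The negated check reports the opposite of the intended one.
        System.out.println(describeDeletionsInverted(-1) + " / " + describeDeletions(-1));
    }
}
```

The point of the comparison is only that a uniform factor cancels out of ratios but not out of fixed thresholds; how a concrete MergePolicy uses the size is not covered here.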

There is one "bug" in sizeInBytes that we should NOT fix: for 3.x indexes that originate
from 3.0 and have shared doc stores, the size is overestimated. But that's fine. In this
case, the index consisted of a 3.6.1 segment and a 4.0 segment, and both showed double size.
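
The comparison against "ls -lh" can also be done programmatically. The following is a rough sketch that uses only the JDK, not the Lucene API; the class and method names are hypothetical. It assumes that a segment's files are named after the segment (for example "_0.fdt" or "_0_1.del"), so it sums the on-disk size of every file whose name starts with the segment name followed by "." or "_". The result can then be compared by hand with the size printed by CheckIndex.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public final class OnDiskSegmentSize {

    /** Assumed naming rule: "_0.fdt" and "_0_1.del" belong to "_0", but "_01.fdt" does not. */
    static boolean belongsTo(String fileName, String segmentName) {
        return fileName.startsWith(segmentName + ".") || fileName.startsWith(segmentName + "_");
    }

    /** Sums the sizes of all regular files in indexDir that belong to the given segment. */
    static long sizeOnDisk(Path indexDir, String segmentName) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> belongsTo(p.getFileName().toString(), segmentName))
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OnDiskSegmentSize <indexDir> <segmentName>
        System.out.println(sizeOnDisk(Paths.get(args[0]), args[1]) + " bytes");
    }
}
```

If the size reported by CheckIndex is about twice this sum for every segment, that matches the double counting described above.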



