lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4485) CheckIndex's term stats should not include deleted docs
Date Tue, 16 Oct 2012 15:51:03 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4485:
---------------------------------------

    Attachment: LUCENE-4485.patch

Simple patch ...
                
> CheckIndex's term stats should not include deleted docs
> -------------------------------------------------------
>
>                 Key: LUCENE-4485
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4485
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4485.patch
>
>
> I was looking at the CheckIndex output on an index that has deletions, e.g.:
> {noformat}
>   4 of 30: name=_90 docCount=588408
>     codec=Lucene41
>     compound=false
>     numFiles=14
>     size (MB)=265.318
>     diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10, source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
>     has deletions [delGen=1]
>     test: open reader.........OK [39351 deleted docs]
>     test: fields..............OK [8 fields]
>     test: field norms.........OK [2 fields]
>     test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 65597188 tokens]
>     test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs; 70293065 tokens]
>     test: stored fields.......OK [1647171 total field count; avg 3 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 total doc count; 1 docvalues fields]
> {noformat}
> If you compare the {{test: terms, freq, prox}} line (which takes deletions into account)
> with the next line (which ignores deletions), it's confusing because only the third number
> (tokens) reflects deletions.  I think the first two numbers (terms and terms/docs pairs)
> should also reflect deletions.  This way an app could get a sense of how much "deadweight"
> is in the index due to un-reclaimed deletions...
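
As a rough illustration of the "deadweight" idea, the sketch below takes the two token counts and the doc counts from the CheckIndex output quoted above and computes how much of the segment belongs to deleted docs. The class name DeadweightEstimate and its helper methods are hypothetical and are not part of Lucene or of the attached patch; only the input numbers come from the report. The same arithmetic would apply to the terms and terms/docs pairs counts if they reflected deletions.

```java
// Hypothetical helper, not part of Lucene: estimates "deadweight" from a pair
// of CheckIndex statistics (one respecting deletions, one ignoring them).
public class DeadweightEstimate {

    /** Number of items that exist only because of un-reclaimed deleted docs. */
    static long deadCount(long liveOnly, long ignoringDeletes) {
        if (liveOnly < 0 || ignoringDeletes < liveOnly) {
            throw new IllegalArgumentException("expected 0 <= liveOnly <= ignoringDeletes");
        }
        return ignoringDeletes - liveOnly;
    }

    /** Fraction (0..1) of the total that is attributable to deleted docs. */
    static double deadFraction(long liveOnly, long ignoringDeletes) {
        if (ignoringDeletes == 0) {
            return 0.0; // empty segment: nothing to reclaim
        }
        return (double) deadCount(liveOnly, ignoringDeletes) / ignoringDeletes;
    }

    public static void main(String[] args) {
        // Values copied from the CheckIndex output for segment _90.
        long docCount = 588408L;
        long deletedDocs = 39351L;
        long liveTokens = 65597188L; // "test: terms, freq, prox"
        long allTokens = 70293065L;  // "test (ignoring deletes): terms, freq, prox"

        System.out.printf("deleted docs: %.2f%%%n", 100.0 * deletedDocs / docCount);
        System.out.printf("dead tokens:  %d (%.2f%%)%n",
                deadCount(liveTokens, allTokens),
                100.0 * deadFraction(liveTokens, allTokens));
    }
}
```

For this segment both ratios come out at roughly 6.7%, so the token count tracks the share of deleted docs closely. The terms and terms/docs pairs counts are identical on both lines, which is why they currently give no such signal.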


