lucene-dev mailing list archives

From "Robert Muir (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
Date Sun, 23 Oct 2011 23:56:32 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3473:
--------------------------------

    Attachment: LUCENE-3473.patch

Patch adding the checks to CheckIndex.

There were some problems:
* IndexReader.getUniqueTermCount doesn't work in trunk, but works fine in 3.x. This is because
it sums per-field across the Terms API, but the PreFlex codec doesn't know this information per-field.
* If a field has no postings (but exists in FieldInfos), then IR.getUniqueTermCount hits an
NPE (ant test-core -Dtestcase=TestNorms -Dtestmethod=testCustomEncoder -Dtests.seed=-6a2248fc7313e45:c41a685f840f6ed:-5a3fd5b8ec315508)
* MemoryCodec didn't implement Terms.getUniqueTermCount, probably just forgotten because
it's not abstract (instead throwing UOE by default).

So I fixed MemoryCodec to implement Terms.getUniqueTermCount, changed Terms.getUniqueTermCount
to be abstract (return -1 if you cannot implement it), and added Fields.getUniqueTermCount,
called by IR.getUniqueTermCount: the default implementation sums across fields, but PreFlex
overrides it so that its IR.getUniqueTermCount works again.
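
To illustrate the shape of this change, here is a minimal sketch. It does not use the actual Lucene classes: FieldTerms, SegmentFields and LegacySegmentFields are hypothetical stand-ins for Terms, Fields and the PreFlex override. Skipping fields with no postings and propagating -1 through the sum are assumptions for the sketch, not necessarily what the patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UniqueTermCountSketch {

    /** Hypothetical stand-in for a per-field terms dictionary. */
    public interface FieldTerms {
        /** Number of unique terms in this field, or -1 if it cannot be computed. */
        long getUniqueTermCount();
    }

    /** Hypothetical stand-in for the collection of all fields in a reader. */
    public static class SegmentFields {
        private final Map<String, FieldTerms> fields = new LinkedHashMap<>();

        public void add(String name, FieldTerms terms) {
            fields.put(name, terms);
        }

        /** Default behaviour: sum the per-field counts. */
        public long getUniqueTermCount() {
            long total = 0;
            for (FieldTerms terms : fields.values()) {
                if (terms == null) {
                    continue; // assumption: a field without postings contributes 0
                }
                long count = terms.getUniqueTermCount();
                if (count < 0) {
                    return -1; // assumption: one unknown field makes the total unknown
                }
                total += count;
            }
            return total;
        }
    }

    /** Stand-in for a codec that only knows the total across all fields. */
    public static class LegacySegmentFields extends SegmentFields {
        private final long total;

        public LegacySegmentFields(long total) {
            this.total = total;
        }

        @Override
        public long getUniqueTermCount() {
            return total; // no per-field information, so bypass the summing
        }
    }

    public static void main(String[] args) {
        SegmentFields fields = new SegmentFields();
        fields.add("title", () -> 3L);
        fields.add("body", () -> 5L);
        fields.add("empty", null); // exists in field infos, but has no postings
        System.out.println(fields.getUniqueTermCount());
        System.out.println(new LegacySegmentFields(42L).getUniqueTermCount());
    }
}
```

The point of the override is that callers only ever ask the all-fields object for the total, so a codec without per-field counts can still answer.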

We might want to deprecate the latter method when 3.x indexes no longer need to be supported,
or maybe it's just fine as-is (you have to do the summing somewhere).
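
For reference, the check the issue asks for boils down to something like the sketch below: enumerate the terms of a field, count them, and compare with the count the terms dictionary reports. This is not the CheckIndex code from the patch. FieldTerms, fieldTerms and checkField are hypothetical names, and treating -1 as "unknown, skip the comparison" is an assumption based on the -1 convention described above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TermCountCheckSketch {

    /** Hypothetical terms dictionary: enumerates each unique term once and reports a count. */
    public interface FieldTerms extends Iterable<String> {
        /** Reported number of unique terms, or -1 if unknown. */
        long getUniqueTermCount();
    }

    /** Helper for the sketch: wraps a list of terms and a reported count. */
    public static FieldTerms fieldTerms(final List<String> terms, final long reported) {
        return new FieldTerms() {
            @Override
            public Iterator<String> iterator() {
                return terms.iterator();
            }

            @Override
            public long getUniqueTermCount() {
                return reported;
            }
        };
    }

    /** Recounts the terms and fails if the reported count disagrees. */
    public static long checkField(String field, FieldTerms terms) {
        long recomputed = 0;
        for (String ignored : terms) {
            recomputed++;
        }
        long reported = terms.getUniqueTermCount();
        // Assumption: -1 means the count is unavailable, so there is nothing to compare.
        if (reported != -1 && reported != recomputed) {
            throw new IllegalStateException("field " + field + ": reported " + reported
                    + " unique terms but recomputed " + recomputed);
        }
        return recomputed;
    }

    public static void main(String[] args) {
        System.out.println(checkField("title", fieldTerms(Arrays.asList("apache", "lucene"), 2)));
        System.out.println(checkField("body", fieldTerms(Arrays.asList("a", "b", "c"), -1)));
    }
}
```

Because the recount only needs term enumeration, it works the same for terms dictionaries that don't support ord.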
                
> CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3473
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.4, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3473.patch
>
>
> Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true case maybe (which seems to be testing something else)?
> It would be nice to verify this also for terms dicts that don't support ord.
> We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex.


