lucene-dev mailing list archives

From "Robert Muir (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
Date Mon, 24 Oct 2011 02:02:32 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3473:
--------------------------------

    Attachment: LUCENE-3473.patch

Hmm, I noticed I left a System.out.println in the previous patch for preflex, but it wasn't being called
from CheckIndex.

This is because we always wrap PreFlex inside PerFieldCodecWrapper... even if it's a 3.x index!
This is a problem as it still perpetuates the loss of IR.numUniqueTerms.

So in this patch we no longer do this, which means I'm able to remove the assume as well.

But, now that preflex is being tested I think I've found an off-by-one with this statistic
when the field name is the empty string. 

I'm gonna see if I can make a testcase/issue against 3.x separately for this... because this
patch is already too big. 

NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testEmptyFieldName -Dtests.seed=57fd2807ecfb5a2b:5556d32d3a1f68b7:469f7ed779c63825 -Dtests.codec=PreFlex
                
> CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3473
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.4, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch
>
>
> Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true
> case maybe (which seems to be testing something else)?
> It would be nice to verify this also for terms dicts that don't support ord.
> We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex.
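
A minimal sketch of the check the issue asks for, written in plain Java
rather than against the Lucene API: walk a field's term enumeration, count
the terms, and compare that count with the statistic the index reports. The
class name UniqueTermCountCheck, its methods, and the sample terms are all
hypothetical and only illustrate the idea; this is not the code in the
attached patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class UniqueTermCountCheck {

    /**
     * Recompute the unique term count by exhausting the enumeration.
     * Assumes the enumeration yields each term exactly once, as a
     * terms dictionary would.
     */
    static long recomputeUniqueTerms(Iterator<String> terms) {
        long count = 0;
        while (terms.hasNext()) {
            terms.next();
            count++;
        }
        return count;
    }

    /** Throw if the reported statistic disagrees with the recomputed one. */
    static void verify(String field, long reported, Iterator<String> terms) {
        long recomputed = recomputeUniqueTerms(terms);
        if (reported != recomputed) {
            throw new IllegalStateException(
                "field \"" + field + "\": numUniqueTerms=" + reported
                    + " but recomputed=" + recomputed);
        }
    }

    public static void main(String[] args) {
        // Hypothetical terms for one field.
        List<String> terms = Arrays.asList("apache", "index", "lucene");

        // Reported value matches: no exception.
        verify("body", 3, terms.iterator());

        // Reported value is off by one: the check fails.
        try {
            verify("", 4, terms.iterator());
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

The sketch does not depend on ord support because it only iterates, which
is the case the description above says is currently not covered.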

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

