lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
Date Fri, 26 Nov 2010 10:54:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935970#action_12935970 ]

Michael McCandless commented on LUCENE-2348:
--------------------------------------------

bq. if it's not doing the work in getDocIdSet, it shouldn't extend Filter!

I don't think we can or should dictate that.

I think it's fair game for a Filter to compute/cache whatever it
wants.  The only requirement for Filter is that it implement
getDocIdSet.  Where it does its work, what it's storing in its
instance, etc., is up to it.

Sure, we strive for a strong separation of "computing the bits" vs
"caching them", but for some cases that ideal is not feasible.

In fact, in this case the filter is so costly to build that no
realistic app can rely on it without first wrapping it in
CachingWrapperFilter.  So I see no harm in conflating caching with
this.  We could rename it to CachingDuplicateFilter.  We could also
factor out the FilterCache utility class that now lives inside
CachingWrapperFilter, so that other filters like this one, which need
to compute & cache right off, can easily reuse it.
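
To make the "compute & cache" idea concrete, here is a minimal sketch
of a per-reader cache.  It is not Lucene's FilterCache: the class name
PerReaderCache, its methods, and the choice of a WeakHashMap keyed by
the reader are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical helper, not Lucene's FilterCache: stores one computed value
// per reader key. Weak keys let an entry be collected once the reader
// itself is no longer referenced anywhere else.
final class PerReaderCache<K, V> {
    private final Map<K, V> cache = new WeakHashMap<>();
    private int computeCount = 0;

    // Returns the cached value for readerKey, computing it on first use.
    synchronized V get(K readerKey, Function<K, V> compute) {
        V value = cache.get(readerKey);
        if (value == null) {
            value = compute.apply(readerKey);
            computeCount++;
            cache.put(readerKey, value);
        }
        return value;
    }

    // Number of times the expensive computation actually ran.
    synchronized int computeCount() {
        return computeCount;
    }
}
```

A filter built this way would call get(reader, ...) from getDocIdSet,
so the costly computation runs once per reader and later calls return
the stored bits.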

This would also be cleaner if we change the filter API so getDocIdSet
receives the top reader and docBase in addition to the sub; this way a
CachingDuplicateFilter instance could be reused across reopened top
readers.
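
As a rough illustration of why the top reader and docBase matter, the
sketch below computes the "keep" bits once over global doc IDs and
then slices them per segment.  It uses plain arrays and
java.util.BitSet rather than Lucene classes; the names
DuplicateSlices, keepFirstPerKey and sliceForSegment, and the
keep-first policy, are assumptions chosen for the example.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Lucene code: duplicates are resolved across the
// whole (top-level) index, then mapped to segment-local doc IDs.
final class DuplicateSlices {
    // Marks the first document seen for each key, in global doc ID order.
    static BitSet keepFirstPerKey(String[] keysByGlobalDoc) {
        BitSet keep = new BitSet(keysByGlobalDoc.length);
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < keysByGlobalDoc.length; doc++) {
            if (seen.add(keysByGlobalDoc[doc])) {
                keep.set(doc);
            }
        }
        return keep;
    }

    // Copies the bits in [docBase, docBase + maxDoc) into a new BitSet
    // whose indices are relative to the segment.
    static BitSet sliceForSegment(BitSet global, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        int end = docBase + maxDoc;
        for (int doc = global.nextSetBit(docBase);
             doc >= 0 && doc < end;
             doc = global.nextSetBit(doc + 1)) {
            local.set(doc - docBase);
        }
        return local;
    }
}
```

With keys a, b, a, c, b, d split into two segments of three docs each,
the second segment only learns that its "b" is a duplicate because the
bits were computed over the whole index; working from that segment's
local reader alone would wrongly keep it.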

{quote}
If someone wants to make a "DuplicateBitSetBuilder" that is a factory for creating a BitSet,
to me that is more natural and obvious as to what is going on.
{quote}

That sounds good... but how would it work?  I.e., how would an app tie
that into a Filter?


> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch, LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without taking into
> account that getDocIdSet() will be called once per segment and only with each segment's local
> reader.


