lucene-dev mailing list archives

From "Uwe Schindler (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3593) Add a filter returning all document without a value in a field
Date Thu, 24 Nov 2011 22:22:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156917#comment-13156917 ]

Uwe Schindler commented on LUCENE-3593:
---------------------------------------

Hi Simon,

The first patch looks exactly like what I described when we first exchanged ideas!

There are some smaller (but solvable) problems and one optimization (a tricky one).
FieldCacheDocIdSet has some special cases that work with the implementation here but are
unclean, should violate some assertions, and should be fixed:

- FieldCacheDocIdSet expects the match() method to throw ArrayIndexOutOfBoundsException
when the FieldCache array is accessed out of bounds. With the FixedBitSet behind this
FieldCache implementation, that basically works, but it should violate some code assertions
added by Mike McCandless. (I am not sure why the test case does not hit this. I assume it
does not because in trunk bits() on the DocIdSet intercepts this: our filter is not sparse,
so it switches to random access.)
- FieldCacheDocIdSet should maybe be made non-private and refactored out of
FieldCacheRangeFilter.
- The positive case could be optimized: an instanceof check in the getDocIdSet() method
could detect that FieldCacheImpl itself already returns a FixedBitSet/DocIdSet and return
it directly:

{code:java}
final Bits docsWithField = FieldCache.DEFAULT.getDocsWithField(context.reader, field);
// This is always the case for our current impl - but who knows :-)
if (negate && docsWithField instanceof DocIdSet) {
  return (DocIdSet) docsWithField;
}
{code}

In general, the other cases can easily be handled by the default stupid implementation, as
you did (stupid in that it iterates slowly by doc++, and in trunk uses the Bits directly).
But once FieldCacheRangeFilter.FieldCacheDocIdSet is factored out, we could optimize this
and maybe get a better negation.
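
To make the "default" doc++ iteration concrete, here is a minimal, self-contained sketch. It is
not Lucene code: FieldValueSketch, DocIterator and the matchDocsWithField flag are hypothetical
names, and java.util.BitSet stands in for the Bits returned by getDocsWithField(). It uses an
explicit maxDoc bound instead of relying on an out-of-bounds exception to end the iteration,
and a single positive flag instead of a negate flag.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a per-segment "docs with/without field" filter; not Lucene API.
final class FieldValueSketch {

    /** Minimal doc-id iterator, loosely modeled on the idea of a doc-id set iterator. */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Returns the next matching doc id, or NO_MORE_DOCS when exhausted. */
        int nextDoc();
    }

    /**
     * Slow default path: walk doc ids one by one (doc++) and test each bit.
     * matchDocsWithField == true  -> docs that have a value in the field
     * matchDocsWithField == false -> docs that have no value in the field
     */
    static DocIterator iterator(final BitSet docsWithField, final int maxDoc,
                                final boolean matchDocsWithField) {
        return new DocIterator() {
            private int doc = -1;

            @Override
            public int nextDoc() {
                // Explicit bound check: never read past maxDoc - 1.
                while (doc + 1 < maxDoc) {
                    doc++;
                    if (docsWithField.get(doc) == matchDocsWithField) {
                        return doc;
                    }
                }
                return NO_MORE_DOCS;
            }
        };
    }

    /** Convenience helper that drains an iterator into a list of doc ids. */
    static List<Integer> collect(DocIterator it) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) {
            docs.add(d);
        }
        return docs;
    }
}
```

With bits set for docs 1 and 3 and maxDoc = 5, collecting the iterator for matchDocsWithField = true
yields docs 1 and 3, and for false it yields docs 0, 2 and 4. The instanceof shortcut shown earlier
would skip this loop entirely whenever the cached bits can be returned directly.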

In any case, I don't like the double negation in this Filter.

I'll work on the problems and make this filter work better. Should I take this issue and solve
the problems first? I also want to backport the FieldCacheTermsFilter code-duplication removal
from trunk to 3.x, so some cleanup is really needed!

I will come up with a patch addressing those problems later today or tomorrow.
                
> Add a filter returning all document without a value in a field
> --------------------------------------------------------------
>
>                 Key: LUCENE-3593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3593
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3593.patch
>
>
> In some situations it would be useful to have a Filter that simply returns all documents
> that either have at least one or no value in a certain field. We don't have something like
> that out of the box and adding it seems straightforward.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
