From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
Date Wed, 22 Sep 2010 16:49:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913659#action_12913659 ]

Uwe Schindler commented on LUCENE-2649:
---------------------------------------

I am also strongly +1 for the additional Bits interface (as Ryan did it: it does not always need
to be a real OpenBitSet, so when there are no deletions and all bits are set, we can use a dummy one).
I have often had use cases where I needed to know whether a document really has a value
set or not, and I don't use Solr much.
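
A minimal sketch of the "dummy Bits" idea, for illustration only. The Bits interface below is a
simplified stand-in (assumed to have just get and length), java.util.BitSet stands in for
OpenBitSet, and the names AllValidBits, BitSetBits and ValidDocs are hypothetical, not taken
from the patch:

```java
import java.util.BitSet;

// Simplified stand-in for Lucene's Bits interface (assumed shape, not the patch's code).
interface Bits {
    boolean get(int index);
    int length();
}

// Dummy implementation: every document has a value, so no per-document storage is needed.
final class AllValidBits implements Bits {
    private final int maxDoc;

    AllValidBits(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    @Override
    public boolean get(int index) {
        return true;
    }

    @Override
    public int length() {
        return maxDoc;
    }
}

// Backed by a real bit set, used only when some documents have no value for the field.
final class BitSetBits implements Bits {
    private final BitSet bits;
    private final int maxDoc;

    BitSetBits(BitSet bits, int maxDoc) {
        this.bits = bits;
        this.maxDoc = maxDoc;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }

    @Override
    public int length() {
        return maxDoc;
    }
}

// Hypothetical helper: pick the cheap dummy when every document has a value.
final class ValidDocs {
    static Bits of(BitSet valid, int maxDoc) {
        return valid.cardinality() == maxDoc
                ? new AllValidBits(maxDoc)
                : new BitSetBits(valid, maxDoc);
    }
}
```

Callers only ever see Bits, so they do not need to know which of the two implementations was chosen.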

{quote}
And being able to distinguish missing values, eg to sort them last, or
to do something else, is useful. Once we do this we should also
[eventually] move "sort missing last" capability into Lucene's
comparators.
{quote}

+1

{quote}
I think this is the right approach - expecting FC's valid bits to
take deletions into account is too much. We have IR.getDeletedDocs
for this.
{quote}

We don't need to AND them together; maybe simply wrap the OpenBitSet in a custom Bits impl
that ANDs in the getter? But as deletions are kept separately in IndexReader, and the cache entry
can be reused even when new deletions are added, I think keeping them separate is fine.
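
A rough sketch of such a wrapper, under the same assumptions as any illustration here: Bits is a
simplified stand-in interface, java.util.BitSet replaces OpenBitSet and the reader's deleted
docs, and LiveValidBits is a hypothetical name. Nothing is combined up front; the AND happens
on each lookup:

```java
import java.util.BitSet;

// Simplified stand-in for Lucene's Bits interface (assumed shape).
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical wrapper: "has a value" AND "not deleted", evaluated lazily in the getter.
final class LiveValidBits implements Bits {
    private final BitSet valid;   // cached per field, independent of deletions
    private final BitSet deleted; // owned by the reader, may gain bits over time
    private final int maxDoc;

    LiveValidBits(BitSet valid, BitSet deleted, int maxDoc) {
        this.valid = valid;
        this.deleted = deleted;
        this.maxDoc = maxDoc;
    }

    @Override
    public boolean get(int index) {
        // No merged bit set is ever built; both sources are consulted per call.
        return valid.get(index) && !deleted.get(index);
    }

    @Override
    public int length() {
        return maxDoc;
    }
}
```

Because the cached valid bits are never modified, the same cache entry stays usable when
further deletions arrive; only the deleted set changes.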

About the bit set as a whole: do we really need to couple the Bits interface to the type? Because
if you exchange the parser/native type (e.g. parse ints as bytes), the valid docs are still
the same; only the native type representation is different. So how about we add a getBits(field)
method to FieldCache that returns the valid docs? If the field was not yet retrieved as a native
type, it could throw IllegalStateException; otherwise it would return the Bits instance (global,
per field, but not per parser/datatype) created during the last FC population run. We would
then also have the possibility to disable the default generation of Bits and do it lazily (which
should run faster, as it does not need to parse the values, only enumerate terms and termdocs).
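
To make the getBits(field) proposal concrete, here is a toy sketch. It is not the real FieldCache
API: SketchFieldCache and its method signatures are hypothetical, the String[] argument (null
meaning "no value") stands in for enumerating terms and termdocs, and java.util.BitSet stands
in for the Bits/OpenBitSet result:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified cache: valid docs are stored per field, not per parser.
final class SketchFieldCache {
    private final Map<String, BitSet> validDocs = new HashMap<>();

    // Populate as ints; rawValues[doc] == null means the doc has no value.
    int[] getInts(String field, String[] rawValues) {
        int[] values = new int[rawValues.length];
        BitSet valid = new BitSet(rawValues.length);
        for (int doc = 0; doc < rawValues.length; doc++) {
            if (rawValues[doc] != null) {
                values[doc] = Integer.parseInt(rawValues[doc]);
                valid.set(doc);
            }
        }
        validDocs.put(field, valid); // remembered from the last population run
        return values;
    }

    // Same field parsed as bytes: different native type, identical valid docs.
    byte[] getBytes(String field, String[] rawValues) {
        byte[] values = new byte[rawValues.length];
        BitSet valid = new BitSet(rawValues.length);
        for (int doc = 0; doc < rawValues.length; doc++) {
            if (rawValues[doc] != null) {
                values[doc] = Byte.parseByte(rawValues[doc]);
                valid.set(doc);
            }
        }
        validDocs.put(field, valid);
        return values;
    }

    // Keyed by field only; fails if the field was never populated.
    BitSet getBits(String field) {
        BitSet valid = validDocs.get(field);
        if (valid == null) {
            throw new IllegalStateException("field not yet populated: " + field);
        }
        return valid;
    }
}
```

The lazy variant mentioned above would replace the exception with a pass that only marks which
docs have a term, without parsing any values; that part is not shown here.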

{quote}
Really, "in general" we need a better way for the query execution path
to enforce deleted docs. Eg if the FCRF will be AND'd w/ a query
that's already excluding del docs then it need not be careful about
deletions...
{quote}

That's another thing, but maybe we could remove deleted docs completely from query processing and
simply apply them like a filter before the collector. I am not sure about the implications and performance.

> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch,
LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  However there
> is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a BitSet for
> all valid docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


