lucene-dev mailing list archives

From "Ryan McKinley (JIRA)" <>
Subject [jira] Updated: (LUCENE-2649) FieldCache should include a BitSet for matching docs
Date Wed, 22 Sep 2010 15:37:33 GMT


Ryan McKinley updated LUCENE-2649:

    Attachment: LUCENE-2649-FieldCacheWithBitSet.patch

Here is a new patch that removes the static config. Rather than putting a property on the Parser
class, I added a class:
  public abstract static class CacheConfig {
    public abstract boolean cacheValidBits();
  }
and this gets passed to the getXXXValues functions:
  ByteValues getByteValues(IndexReader reader, String field, ByteParser parser, CacheConfig

I think this is a better option than adding a parameter to Parser, since it gives us an easy
upgrade path. Parser is an interface, so we cannot add to it without breaking compatibility.
To change things in 4.x, 3.x should have an upgrade path.
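The compatibility argument can be sketched in plain Java. Before Java 8, an interface could not gain a method without breaking every existing implementation, whereas an abstract class can gain a concrete method that existing subclasses simply inherit. This is a minimal sketch assuming the CacheConfig shape shown above; `someFutureOption()` and `ValidBitsConfig` are hypothetical names for illustration, not part of the patch.

```java
// Mirrors the CacheConfig shape from the patch (top-level here for brevity).
abstract class CacheConfig {
  public abstract boolean cacheValidBits();

  // Hypothetical option added in a later release. Because it has a body,
  // subclasses written against the original class still compile unchanged.
  public boolean someFutureOption() {
    return false;
  }
}

// A caller-side config written before someFutureOption() existed.
class ValidBitsConfig extends CacheConfig {
  @Override
  public boolean cacheValidBits() {
    return true;
  }
}
```

Had the same new method been added to the Parser interface instead, every existing Parser implementation would have stopped compiling.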

I took Mike's suggestion and included the CacheConfig hashcode in the cache key. However,
I don't cache the Bits separately, since this is an edge case that *should* be avoided; at
least it does not fail if you are not consistent.
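Folding the config into the cache key means a request with a different config misses the cache and loads its own entry rather than returning values built under other settings. A minimal sketch of that keying, assuming a config class with value-based `hashCode`/`equals`; `Config`, `CacheKey`, and `ValuesCache` are hypothetical names, and the `byte[]` allocation stands in for actually uninverting the field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical config with value semantics, so equal settings hash equally.
final class Config {
  final boolean cacheValidBits;

  Config(boolean cacheValidBits) {
    this.cacheValidBits = cacheValidBits;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof Config && ((Config) o).cacheValidBits == cacheValidBits;
  }

  @Override
  public int hashCode() {
    return Boolean.hashCode(cacheValidBits);
  }
}

// Cache key: field name plus the config's hash code.
final class CacheKey {
  final String field;
  final int configHash;

  CacheKey(String field, Config config) {
    this.field = field;
    this.configHash = config == null ? 0 : config.hashCode();
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof CacheKey)) return false;
    CacheKey k = (CacheKey) o;
    return field.equals(k.field) && configHash == k.configHash;
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, configHash);
  }
}

final class ValuesCache {
  private final Map<CacheKey, byte[]> entries = new HashMap<>();
  int loads = 0; // how many times values were (re)computed

  byte[] get(String field, Config config, int maxDoc) {
    return entries.computeIfAbsent(new CacheKey(field, config), k -> {
      loads++;
      return new byte[maxDoc]; // stand-in for uninverting the field
    });
  }
}
```

Keying on the hash code alone, as sketched, means two configs whose hash codes collide would share an entry; storing the config object itself in the key and comparing with `equals` would avoid that.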

This does cache a MatchAllBits even when 'cacheValidBits' is false, since it is cheap (a
small class holding one int).
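That optimization can be sketched as follows: when every document has a value, the full bitset is replaced by an object that answers true for any doc and stores only the doc count. A minimal sketch; the `Bits` interface here is a local stand-in modeled on the `get`/`length` shape of Lucene's `Bits`, and `ValidBitsFactory` is a hypothetical helper, not code from the patch.

```java
import java.util.BitSet;

// Local stand-in modeled on Lucene's random-access Bits interface.
interface Bits {
  boolean get(int index);
  int length();
}

// Every doc in [0, length) has a value; the only state is one int.
final class MatchAllBits implements Bits {
  private final int len;

  MatchAllBits(int len) {
    this.len = len;
  }

  @Override
  public boolean get(int index) {
    return true;
  }

  @Override
  public int length() {
    return len;
  }
}

final class ValidBitsFactory {
  // If every doc turned out to have a value, swap the full bitset for the
  // tiny MatchAllBits; otherwise wrap the BitSet as-is.
  static Bits of(BitSet set, int maxDoc) {
    if (set.cardinality() == maxDoc) {
      return new MatchAllBits(maxDoc);
    }
    return new Bits() {
      @Override
      public boolean get(int index) {
        return set.get(index);
      }

      @Override
      public int length() {
        return maxDoc;
      }
    };
  }
}
```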


bq.     *  We don't have to @Deprecate for 4.0 - just remove it, and note this in MIGRATE.txt.
(Though for 3.x we need the deprecation, so maybe do 3.x patch first, then remove deprecations
for 4.0?).

My plan was to apply it with deprecations to 4.x, then merge with 3.x, then replace the calls
in 4.x, and finally remove the old functions. Does this sound reasonable?
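In code, the first step of that plan usually means keeping the old method as a thin, deprecated wrapper over the new one, so 3.x callers keep compiling while 4.x call sites are migrated; once they are, the wrapper is deleted. A minimal sketch; `FieldCacheSketch`, the simplified signatures (no `IndexReader`), and the `ByteValues` holder are assumptions for illustration, not the patch's actual classes.

```java
// Hypothetical holder for the new-style result; the real class may differ.
class ByteValues {
  final byte[] values;

  ByteValues(byte[] values) {
    this.values = values;
  }
}

// Illustrative API surface during the transition (IndexReader omitted).
abstract class FieldCacheSketch {
  /** New-style entry point, in the getXXXValues style. */
  abstract ByteValues getByteValues(String field);

  /**
   * Old entry point kept for existing callers; it only delegates.
   *
   * @deprecated use {@link #getByteValues(String)} instead
   */
  @Deprecated
  byte[] getBytes(String field) {
    return getByteValues(field).values;
  }
}
```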

I would like this to get into 3.x, since we could then remove many Solr types in 4.x and still
have a 3.x migration path.

bq.  * FieldCache.EntryCreator looks orphan'd?

dooh, thanks

bq. It looks like the valid bits will not reflect deletions (by design), right? I.e., the caller
cannot rely on valid always incorporating deleted docs. (E.g., the MatchAll opto disregards deletions,
and a reopened segment can have new deletions yet share the FC entry.)

Right, the ValidBits are only checked for docs that exist (and the FC values are only set
for docs that exist; this has not changed), and they may contain false positives for deleted
docs. I think this is OK, since most use cases I can think of deal with deletions anyway.
Any ideas how, or whether, we should change this? (I did not realize that the FC is reused after
deletions. So clever.)
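Since the valid bits can report true for deleted docs, a caller that needs "live and has a value" has to intersect them with the deletions itself. A minimal sketch using `java.util.BitSet` for both sets; the class and method names are hypothetical, and in Lucene the deletions would come from the reader rather than from a plain BitSet.

```java
import java.util.BitSet;

final class LiveValues {
  // True only if the doc is not deleted and has a value in the field.
  static boolean hasLiveValue(int doc, BitSet deleted, BitSet valid) {
    return !deleted.get(doc) && valid.get(doc);
  }

  // Number of live docs with a value: valid AND NOT deleted.
  static int countLiveValues(BitSet deleted, BitSet valid) {
    BitSet live = (BitSet) valid.clone(); // leave the cached bits untouched
    live.andNot(deleted);                 // drop false positives for deleted docs
    return live.cardinality();
  }
}
```

Cloning before `andNot` matters here: the cached valid bits may be shared across reopened readers, so they should not be mutated in place.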


bq. I'm having trouble understanding the use case for this bitset.

My motivation is to support the sortMissingLast feature in Solr sorting (which could
now be pushed down to Lucene). For example, if I have a bunch of documents and only some have the
field "bytes", sorting 'bytes desc' works great, but sorting 'bytes asc' puts all the documents
that do not have the field 'bytes' first, since the FieldCache thinks their values are all zero.
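With valid bits available, an ascending sort can place docs that lack the field after all docs that have it, instead of treating them as 0. A minimal sketch over a plain `byte[]` and `BitSet`; `MissingLastSort` is a hypothetical helper, not Solr's or Lucene's comparator, and a real implementation would plug into a `FieldComparator` rather than sorting doc ids directly.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

final class MissingLastSort {
  // Ascending sort of doc ids by value; docs whose valid bit is clear go last.
  static Integer[] sortAscMissingLast(byte[] values, BitSet valid) {
    Integer[] docs = new Integer[values.length];
    for (int i = 0; i < docs.length; i++) docs[i] = i;
    Comparator<Integer> cmp = (a, b) -> {
      boolean va = valid.get(a);
      boolean vb = valid.get(b);
      if (va != vb) return va ? -1 : 1; // a present value sorts before a missing one
      if (!va) return 0;                // both missing: keep doc order (stable sort)
      return Byte.compare(values[a], values[b]);
    };
    Arrays.sort(docs, cmp); // object sort is stable, so ties keep doc-id order
    return docs;
  }
}
```

For values {5, 0, 3, 0} where doc 1 has no value, this yields docs 3, 2, 0, 1; without the bits, docs 1 and 3 would tie at 0 and doc 1 would sort first.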

If we get this working in Solr, we can deprecate and delete all the "sortable" number fields
and get the same functionality on Trie* fields.

> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>                 Key: LUCENE-2649
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch,
>                      LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
> The FieldCache returns an array representing the values for each doc. However, there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a BitSet for all valid docs.
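The proposal amounts to pairing the existing value array with a bitset of docs that actually have a value, so a stored 0 can be told apart from a missing value. A minimal sketch for an int field; `IntValuesWithBits`, `fromSparse`, and `valueOrNull` are hypothetical names, not the FieldCache API.

```java
import java.util.BitSet;
import java.util.Map;

// Values for every doc plus a BitSet of the docs that really have a value.
final class IntValuesWithBits {
  final int[] values; // 0 for docs without a value, as the plain array gives today
  final BitSet valid; // bit set means the doc has a value

  private IntValuesWithBits(int[] values, BitSet valid) {
    this.values = values;
    this.valid = valid;
  }

  // Builds both structures from a sparse doc -> value mapping, much as
  // uninverting a field only visits docs that contain a term.
  static IntValuesWithBits fromSparse(int maxDoc, Map<Integer, Integer> docToValue) {
    int[] values = new int[maxDoc];
    BitSet valid = new BitSet(maxDoc);
    for (Map.Entry<Integer, Integer> e : docToValue.entrySet()) {
      values[e.getKey()] = e.getValue();
      valid.set(e.getKey());
    }
    return new IntValuesWithBits(values, valid);
  }

  // Distinguishes "missing" (null) from a stored 0.
  Integer valueOrNull(int doc) {
    return valid.get(doc) ? Integer.valueOf(values[doc]) : null;
  }
}
```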

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
