lucene-dev mailing list archives

From "Ryan McKinley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
Date Thu, 23 Sep 2010 00:02:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913857#action_12913857 ]

Ryan McKinley commented on LUCENE-2649:
---------------------------------------

bq. Is the one-time-per-IndexReader-lifecycle cost of multiplying the cache load time by some factor < 2.0 ... really so terrible

It can be. On a big index, just iterating over all the terms/docs takes a long time. Try the LukeRequestHandler
on an index with a million+ docs!

-------------

Here is a different variation. It changes *lots*, but if we are talking about changing Parser
from an interface to a class, then I guess the cat is already out of the bag.

What about something like: 
{code:java|title=FieldCache.java}
  ...

  
  public class EntryConfig implements Serializable 
  {
    public Parser getParser() {
      return null;
    }
    public boolean cacheValidBits() {
      return false;
    }
    public boolean cacheValues() {
      return true;
    }
    
    /**
     * The HashCode is used as part of the Cache Key (along with the field name).  
     * To allow multiple calls with different parameters, make sure the hashCode 
     * does not include the specific instance and parameters.
     */
    public int hashCode()
    {
      return EntryConfig.class.hashCode();
    }
  }
  
  
  public abstract class CachePopulator 
  {
    public abstract void fillValidBits(  CachedArray vals, IndexReader reader, String field, EntryConfig creator ) throws IOException;
    public abstract void fillByteValues( CachedArray vals, IndexReader reader, String field, EntryConfig creator ) throws IOException;
    ...
  }

  public abstract CachePopulator getCachePopulator();

...

  public ByteValues getByteValues(IndexReader reader, String field, EntryConfig creator )

...

{code}
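
To make the hashCode comment above concrete, here is a minimal, self-contained sketch (not Lucene code) of a cache key built from the field name plus the config. SketchConfig, SketchEntry and CacheKeyDemo are hypothetical stand-ins invented for illustration. The sketch also assumes that equals() has to be overridden the same way as hashCode(), since a HashMap consults both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for EntryConfig: the flags vary, the cache identity does not.
class SketchConfig {
  final boolean cacheValues;
  final boolean cacheValidBits;

  SketchConfig(boolean cacheValues, boolean cacheValidBits) {
    this.cacheValues = cacheValues;
    this.cacheValidBits = cacheValidBits;
  }

  // Deliberately ignores the flags so every config of this class hashes alike.
  @Override
  public int hashCode() {
    return SketchConfig.class.hashCode();
  }

  // Assumption: equals must agree with hashCode, or a HashMap still sees distinct keys.
  @Override
  public boolean equals(Object other) {
    return other != null && other.getClass() == getClass();
  }
}

// Hypothetical stand-in for the cache Entry: field name plus config.
class SketchEntry {
  final String field;
  final SketchConfig config;

  SketchEntry(String field, SketchConfig config) {
    this.field = field;
    this.config = config;
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, config);
  }

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof SketchEntry)) {
      return false;
    }
    SketchEntry that = (SketchEntry) other;
    return field.equals(that.field) && config.equals(that.config);
  }
}

class CacheKeyDemo {
  // Returns how many cache entries two differently-flagged requests for one field create.
  static int entriesFor(String field) {
    Map<SketchEntry, String> cache = new HashMap<>();
    cache.put(new SketchEntry(field, new SketchConfig(true, false)), "values only");
    cache.put(new SketchEntry(field, new SketchConfig(false, true)), "bits only");
    return cache.size();
  }

  public static void main(String[] args) {
    System.out.println(CacheKeyDemo.entriesFor("price"));
  }
}
```

With this key, a request for values only and a later request for bits only land on the same cached object, which is what lets the second call fill in the missing part instead of creating a new entry.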


The field cache implementation would make sure that whatever you asked for is filled in before passing
it back (though I think this has a concurrency issue):
{code:java}

  public ByteValues getByteValues(IndexReader reader, String field, EntryConfig config) throws IOException
  {
    ByteValues vals = (ByteValues) caches.get(Byte.TYPE).get(reader, new Entry(field, config));
    if( vals.values == null && config.cacheValues() ) {
      populator.fillByteValues(vals, reader, field, config);
    }
    if( vals.valid == null && config.cacheValidBits() ) {
      populator.fillValidBits(vals, reader, field, config);
    }
    return vals;
  }
{code}
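
One possible way to deal with the concurrency concern is to do the "fill what is missing" step while holding a lock on the cached object. The sketch below is only an illustration under that assumption, not a proposal for the actual patch. SketchByteValues, LazyFillDemo and fillCount are hypothetical names, and the array and BitSet are allocated empty where real code would read the index.

```java
import java.util.BitSet;

// Hypothetical stand-in for ByteValues: both parts start out unfilled.
class SketchByteValues {
  byte[] values;   // null until filled
  BitSet valid;    // null until filled
  int fillCount;   // counts fills, only for the demo
}

class LazyFillDemo {
  // Fills whatever part is missing while holding the holder's monitor,
  // so two threads cannot both see null and fill the same part twice.
  static SketchByteValues ensureFilled(SketchByteValues vals, int maxDoc,
                                       boolean wantValues, boolean wantBits) {
    synchronized (vals) {
      if (vals.values == null && wantValues) {
        vals.values = new byte[maxDoc];   // real code would read the index here
        vals.fillCount++;
      }
      if (vals.valid == null && wantBits) {
        vals.valid = new BitSet(maxDoc);  // real code would mark docs that have a value
        vals.fillCount++;
      }
    }
    return vals;
  }

  // Issues the same request from several threads and returns the total number of fills.
  static int concurrentFills(int threads) {
    SketchByteValues vals = new SketchByteValues();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> ensureFilled(vals, 16, true, true));
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return vals.fillCount;
  }

  public static void main(String[] args) {
    System.out.println(concurrentFills(8));
  }
}
```

However many threads ask, each part is filled once. The trade-off is that every caller takes the lock, even when everything is already filled; whether that cost matters here is an open question.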

The Cache would then delegate the creation to the populator:
{code:java}

    @Override
    protected final ByteValues createValue(IndexReader reader, Entry entry, CachePopulator populator) throws IOException {
      String field = entry.field;
      EntryConfig config = (EntryConfig)entry.custom;
      if (config == null) {
        return wrapper.getByteValues(reader, field, new EntryConfig() );
      }
      ByteValues vals = new ByteValues();
      if( config.cacheValues() ) {
        populator.fillByteValues(vals, reader, field, config);
      }
      else if( config.cacheValidBits() ) {
        populator.fillValidBits(vals, reader, field, config);
      }
      else {
        throw new RuntimeException( "the config must cache values and/or bits" );
      }
      return vals;
    }
{code}

The fillByteValues code would be the same as always, but I think the CachedArray should make
sure the same parser is used every time:
{code:java}

    @Override
    public void fillByteValues( CachedArray vals, IndexReader reader, String field, EntryConfig config ) throws IOException
    {
      ByteParser parser = (ByteParser) config.getParser();
      if( parser == null ) {
        parser = FieldCache.DEFAULT_BYTE_PARSER;
      }
      // Make sure it is the same parser
      int parserHashCode = parser.hashCode();
      if( vals.parserHashCode != null && vals.parserHashCode != parserHashCode ) {
        throw new RuntimeException( "Subsequent calls with different parser!" );
      }
      vals.parserHashCode = parserHashCode;
     ...
{code}

This differs from the current code, where asking for the cached values with two different
parsers (that return different hash codes) creates two entries in the cache.
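
The parser check can be tried in isolation with a small, self-contained sketch. SketchCachedArray and ParserCheckDemo are hypothetical stand-ins, plain strings stand in for parser instances, and the parserHashCode field is assumed to be an Integer that stays null until the first fill.

```java
// Hypothetical stand-in for CachedArray: remembers which parser filled it.
class SketchCachedArray {
  Integer parserHashCode;  // null until the first fill
}

class ParserCheckDemo {
  // Records the parser's hash code on first use and rejects a different one afterwards.
  static void checkParser(SketchCachedArray vals, Object parser) {
    int hash = parser.hashCode();
    // Integer compared with int is unboxed, so this is a numeric comparison.
    if (vals.parserHashCode != null && vals.parserHashCode != hash) {
      throw new IllegalStateException("Subsequent calls with different parser!");
    }
    vals.parserHashCode = hash;
  }

  // Returns true if a second, different parser is rejected for the same cached array.
  static boolean rejectsSecondParser() {
    SketchCachedArray vals = new SketchCachedArray();
    checkParser(vals, "parserA");  // strings stand in for parser instances
    checkParser(vals, "parserA");  // same parser again: accepted
    try {
      checkParser(vals, "parserB");
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejectsSecondParser());
  }
}
```

One caveat of comparing hash codes rather than the parsers themselves: two different parsers that happen to share a hash code would pass the check unnoticed.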

This approach would let us:
* cache values and bits independently or together
* handle subsequent calls with different parameters reasonably
* make some other issues easier, if CachePopulator is pluggable/extendable
* use CachePopulator outside of the cache context (perhaps useful)





> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch,
LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a BitSet for all valid docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

