lucene-dev mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: auto-filters?
Date Mon, 03 Jan 2005 20:30:29 GMT
It looks like Lucene does not use any of the BitSet boolean logic 
operators (and, or, etc.); it just seems to use the "get" method to 
test set membership for individual docs.
If this is true, the DocIdSet would look like this:
  public interface DocIdSet
  {
      public abstract boolean contains(int docId);
  }
And Filter would become:
  public interface Filter
  {
      public abstract DocIdSet getDocIdSet(IndexReader reader) throws IOException;
  }
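
To make the membership-only idea concrete, here is a minimal sketch of two possible implementations of the proposed DocIdSet. BitSetDocIdSet and RangeDocIdSet are hypothetical names used only for illustration; neither is part of Lucene. The sketch leaves out Filter and IndexReader so that it compiles on its own.

```java
import java.util.BitSet;

// The interface proposed above: a membership test and nothing else.
interface DocIdSet {
    boolean contains(int docId);
}

// Hypothetical adapter: keeps the existing BitSet representation
// but exposes only the "get" style lookup.
final class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean contains(int docId) {
        // BitSet.get throws on a negative index, so guard against it.
        return docId >= 0 && bits.get(docId);
    }
}

// Hypothetical compact alternative: a contiguous run of doc ids
// needs two ints instead of one bit per document in the index.
final class RangeDocIdSet implements DocIdSet {
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeDocIdSet(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public boolean contains(int docId) {
        return docId >= from && docId < to;
    }
}
```

Because callers only ever ask "is this doc in the set?", either class could be handed back by a filter without the caller knowing which representation is behind it.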

As you suggest, the DocIdSet would be cached, and the policy for evicting 
DocIdSets from the cache would have to balance these factors for each DocIdSet:
1) Cache "hit rate" on the set
2) Cost of recreating the set (i.e. computational cost / disk access)
3) Memory used by the set

We can compute #1 easily enough; #2 may prove hard to quantify, but we 
could ensure we have #3 by insisting that the DocIdSet include this method:
    public abstract int getCachedSizeInBytes();
We could also consider the option of allowing DocIdSets to implement 
"Serializable", in which case the cache manager would be able to 
serialize DocIdSets to temporary storage.
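
One way the three factors might be combined is sketched below. Everything here is an assumption made for illustration: the names SizedDocIdSet and DocIdSetCache, the String cache key, and the scoring formula (hits times rebuild cost, divided by size in bytes). Since #2 is hard to measure, the sketch simply takes a caller-supplied estimate. Serializing evicted sets to temporary storage is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of DocIdSet that also reports its memory use (#3).
interface SizedDocIdSet {
    boolean contains(int docId);
    int getCachedSizeInBytes();
}

// Hypothetical cache: when over budget, evicts the entry with the lowest score.
final class DocIdSetCache {
    private static final class Entry {
        final SizedDocIdSet set;
        final double rebuildCost; // caller's estimate of factor #2
        long hits;                // factor #1, counted on each lookup

        Entry(SizedDocIdSet set, double rebuildCost) {
            this.set = set;
            this.rebuildCost = rebuildCost;
        }

        // Higher score means more worth keeping. The +1 stops a
        // never-hit entry from scoring zero regardless of its cost.
        double score() {
            return (hits + 1) * rebuildCost / Math.max(1, set.getCachedSizeInBytes());
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxBytes;
    private long usedBytes;

    DocIdSetCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    SizedDocIdSet get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        e.hits++;
        return e.set;
    }

    void put(String key, SizedDocIdSet set, double rebuildCost) {
        Entry old = entries.remove(key);
        if (old != null) {
            usedBytes -= old.set.getCachedSizeInBytes();
        }
        entries.put(key, new Entry(set, rebuildCost));
        usedBytes += set.getCachedSizeInBytes();
        // Evict until back under budget; the entry just added is never the victim.
        while (usedBytes > maxBytes && evictLowestScore(key)) {
            // loop body intentionally empty
        }
    }

    boolean isCached(String key) {
        return entries.containsKey(key);
    }

    // Removes the lowest-scoring entry other than 'keep'; false if none was found.
    private boolean evictLowestScore(String keep) {
        String victim = null;
        double worst = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Entry> me : entries.entrySet()) {
            if (me.getKey().equals(keep)) {
                continue;
            }
            double s = me.getValue().score();
            if (s < worst) {
                worst = s;
                victim = me.getKey();
            }
        }
        if (victim == null) {
            return false;
        }
        usedBytes -= entries.remove(victim).set.getCachedSizeInBytes();
        return true;
    }
}
```

With this scoring, a small set that is hit often and is expensive to rebuild stays cached longest, while a large, rarely used, cheap set is the first to go. Other weightings are equally plausible; the point is only that all three inputs are available to the policy once the set reports its own size.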

I'm not sure how you would want to handle the versioning issues around a 
change to the Filter interface though.


