lucene-java-user mailing list archives

From "Johan Stuyts" <j.stu...@hippo.nl>
Subject Method to speed up caching for faceted navigation
Date Wed, 26 Jul 2006 09:50:58 GMT
Hi,

I am working on faceted navigation. This is nothing new, but I am
anticipating an index that changes very frequently (every couple of
seconds). After each index update, I need to cache the bit sets
of the facet values so I can count matches during later searches.
Because I need to fetch many bit sets often, this must be as fast
as possible.
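
To illustrate the counting step these cached bit sets are meant for, here is a minimal sketch. It assumes the cached facet values and the search result are both plain 'java.util.BitSet' instances indexed by document ID; the class 'FacetCountSketch' and the method 'countFacets' are hypothetical names, not part of Lucene.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts facet values within a search result.
final class FacetCountSketch {

    /** For each cached facet value, counts the documents that are also in the search result. */
    static Map<String, Integer> countFacets(Map<String, BitSet> cachedFacetBits,
                                            BitSet searchResult) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : cachedFacetBits.entrySet()) {
            // Clone first so the cached bit set is not modified by and().
            BitSet overlap = (BitSet) entry.getValue().clone();
            overlap.and(searchResult);
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, BitSet> cache = new LinkedHashMap<>();
        BitSet red = new BitSet();
        red.set(0); red.set(2); red.set(5);
        cache.put("red", red);

        BitSet hits = new BitSet();
        hits.set(2); hits.set(5); hits.set(9);

        System.out.println(countFacets(cache, hits));
    }
}
```

Because every search intersects the result with every cached facet bit set, the cache has to be rebuilt after each index update, which is why filling the bit sets quickly matters.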

I did the following:
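  IndexReader ir = ...;
  BitSet bitSet = new BitSet(ir.maxDoc());  // one bit per document ID
  TermDocs td = ir.termDocs(new Term("facet name", "facet value"));
  try {
    while (td.next())
    {
      bitSet.set(td.doc());
    }
  } finally {
    td.close();  // release the underlying stream
  }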

The problem with this code is that it gets the document IDs one by one.
I tried to optimize the loop by reading blocks of IDs with
'read(int[], int[])', but this had no noticeable effect.

I looked at the implementation of 'read(int[], int[])' in
'SegmentTermDocs' and saw that it did the following things:
- check if the document has a frequency higher than 1, and if so read
it;
- check if the document has been deleted, and if so don't add it to the
result;
- store the document IDs, counts and frequencies in attributes instead of
local variables.

Given that the following preconditions hold in my situation:
- all documents have a frequency of 1 for the term;
- I never delete documents using the 'IndexReader' from which I get the
'TermDocs' object;
- I am only interested in the document IDs.

I made 'SegmentTermDocs' a public class and added the following method,
which eliminates the overhead of the 'read(int[], int[])' method:
  public void readDocsWithoutFreqsAssumingNoDeletions(final BitSet destination)
      throws IOException {
    int count = this.count;
    final int df = this.df;
    int doc = this.doc;
    while (count < df) {
      doc += freqStream.readVInt() >>> 1;
      count++;

      destination.set(doc);
    }
    // Leave a consistent state
    this.doc = doc;
    freq = 1;
    this.count = df;
  }

By using the method above I gained a speed improvement of over 20%.
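
To make the encoding assumption behind 'readVInt() >>> 1' explicit, here is a self-contained sketch that does not use Lucene at all. It assumes the frequency stream stores each entry as a VInt doc code equal to (delta << 1) | 1, where the set low bit means "frequency is 1" and no separate frequency value follows. The class 'FreqStreamSketch' and its methods are hypothetical and only simulate that layout in memory.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

// Hypothetical in-memory simulation; not Lucene code.
final class FreqStreamSketch {

    /** Writes a non-negative int as a VInt: 7 bits per byte, low-order bits first, high bit = more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes ascending doc IDs as doc codes (delta << 1) | 1, i.e. every entry has frequency 1. */
    static byte[] encode(int[] ascendingDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : ascendingDocIds) {
            writeVInt(out, ((doc - previous) << 1) | 1);
            previous = doc;
        }
        return out.toByteArray();
    }

    /** Decodes df entries into a BitSet, ignoring frequencies and deletions. */
    static BitSet decodeDocsOnly(byte[] data, int df) {
        BitSet destination = new BitSet();
        int pos = 0;
        int doc = 0;
        for (int count = 0; count < df; count++) {
            // Read one VInt doc code.
            int b = data[pos++];
            int code = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = data[pos++];
                code |= (b & 0x7F) << shift;
            }
            // Drop the frequency flag bit and apply the delta.
            doc += code >>> 1;
            destination.set(doc);
        }
        return destination;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 300};
        System.out.println(decodeDocsOnly(encode(docs), docs.length));
    }
}
```

The decoder is only valid for streams where every doc code has its low bit set. If an entry had a frequency above 1, an extra VInt would follow its doc code, and this loop would misread that value as the next delta.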

Will this method always work correctly given the preconditions?

Kind regards,

Johan Stuyts
Hippo
