lucene-java-user mailing list archives

From James Pine <>
Subject Re: BitSet in a HitCollector
Date Thu, 06 Jul 2006 15:32:42 GMT

Sorry, I will explain a bit more about my collect
method. Currently my collect method is executing
IndexSearcher.doc(id) and storing some stuff in a Map
which I can then retrieve from the HitCollector (much
like the example in the Lucene In Action book). Of
course that's somewhat expensive, so I'd like to do
some statistical sampling based on the result set size
to try and speed things up.

The way I was thinking about doing this was, during
the collect method only executing
IndexSearcher.doc(id) on every Nth document, where N
is calculated dynamically based on a minimum number X.
The rule would be:

N = Max(1,(numResults / X))
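
As a rough sketch of that rule in Java (the class and
method names are made up for illustration, and integer
division is an assumption about how numResults / X
should round):

```java
// Hypothetical helper for illustration; not part of Lucene.
final class SamplingStride {
    private SamplingStride() {}

    /**
     * Returns N = max(1, numResults / minSamples) using integer division,
     * where minSamples plays the role of X in the rule above.
     */
    static int stride(int numResults, int minSamples) {
        if (minSamples <= 0) {
            throw new IllegalArgumentException("minSamples must be positive");
        }
        return Math.max(1, numResults / minSamples);
    }
}
```

With X = 100, a result set of 1000 gives N = 10, while a
result set of 50 gives N = 1, so every document is loaded.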

In order to do this in the collect method, I need to
know the total number of results before ever invoking
the collect method, right? That seemed to make a case
for the BitSet/QueryFilter in the constructor.

In addition, someone else on the list mentioned that
one reason to avoid calling IndexSearcher.doc(id) in
the collect method is that it causes the disk to do a
lot of seeking. Maybe that's a moot point if one is
using a RAMDirectory or an FSDirectory small enough
that it gets cached by the OS anyway, but if it's not,
then I thought it might be faster to have the
HitCollector set the bits in the collect method and
then do another pass to do the statistical sampling.
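
A minimal sketch of that two-pass idea, assuming modern
Java. The TwoPassSampler class is hypothetical, and the
IntFunction loader is only a stand-in for
IndexSearcher.doc(id), so the sketch runs without Lucene:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch; not a Lucene class.
final class TwoPassSampler {
    private final BitSet matches = new BitSet();

    // First pass: record each matching doc id, e.g. when called
    // from HitCollector.collect(doc, score). No document is loaded here.
    void collect(int doc) {
        matches.set(doc);
    }

    // Total number of matching documents recorded so far.
    int numResults() {
        return matches.cardinality();
    }

    // Second pass: load every Nth matching document in ascending id order,
    // where N = max(1, numResults / minSamples). The loader stands in for
    // an expensive call such as IndexSearcher.doc(id).
    <T> List<T> sample(int minSamples, IntFunction<T> loader) {
        if (minSamples <= 0) {
            throw new IllegalArgumentException("minSamples must be positive");
        }
        int n = Math.max(1, numResults() / minSamples);
        List<T> sampled = new ArrayList<>();
        int seen = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            if (seen % n == 0) {
                sampled.add(loader.apply(doc));
            }
            seen++;
        }
        return sampled;
    }
}
```

Because the second pass walks the BitSet in ascending
document id order, the expensive loads happen in id
order, which may or may not reduce seeking depending on
how the index is laid out on disk.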

Either way it seems that to do the statistical
sampling that I envision I either need to calculate
the total result count/document id set in the
constructor, before calling the collect method, or
calculate the total result count/document id set in
the collect method and then execute some sort of
post-collect method, right? So I was just wondering
which method was better/faster. Thanx.


--- Chris Hostetter <> wrote:

> : I'm using a HitCollector and would like to know the
> : total number of results that matched a given query.
> : Based on the JavaDoc, I think this will do the trick:
> you don't need a BitSet in that case, you could find
> that out just using an int...
>     public class CountingCollector extends HitCollector {
>       public int count = 0;
>       public void collect(int doc, float score) {
>         count++;
>       }
>     }
>     CountingCollector c = new CountingCollector();
>     searcher.search(query, c);
>     int numResults = c.count;
> : If I want to know the total number of results inside
> : of the HitCollector, i.e. before the collect method
> : has ever been called, I think I could pass the Query
> : and Searcher objects into the HitCollector and do this
> : in its constructor:
> :
> : BitSet bits = (new QueryFilter(query)).bits(searcher.getIndexReader());
> : int numResults = bits.cardinality();
> This question doesn't make a lot of sense to me, why
> do you need to know the total number of results before
> the collect method is called? .. what you are
> suggesting here (using QueryFilter in this way) is
> perfectly legal, but it's going to do just as much work
> as using a HitCollector will (possibly more, i can't
> remember).
> : Is Lucene executing another pass over the index in
> : order to populate the BitSet and then doing another
> : pass while calling the collect method? Thanx.
> in your last example, you never use your HitCollector,
> so i'm not sure what you mean, but assuming you are
> asking about combining those examples into something
> like this....
>   IndexSearcher searcher = new IndexSearcher(indexReader);
>   BitSet bits = (new QueryFilter(query)).bits(searcher.getIndexReader());
>   final int numResults = bits.cardinality();
>   searcher.search(query, new HitCollector() {
>        public void collect(int doc, float score) {
>           /* do something with numResults and doc and score */
>        }
>   });
> ...then yes, you are most definitely making two
> passes to do that.
> -Hoss