lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: What would be the fastest BooleanQuery possible?
Date Wed, 16 Sep 2009 14:16:52 GMT
You could get the Scorer and call next() yourself; this would avoid
scoring.  E.g., something like this:

      // Walk the matching docs directly; score() is never called.
      Weight weight = query.weight(searcher);
      Scorer scorer = weight.scorer(searcher.getIndexReader());
      while (scorer.next()) {
        final int docID = scorer.doc();
        /* do something w/ docID */
      }

But note that this code is not generally recommended in 2.9 (since
it's not operating at the segment level).
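
To illustrate what "operating at the segment level" means, here is a toy
sketch in plain Java with no Lucene dependency. SegmentMatches,
MatchOnlyCollector, and collectAll are hypothetical stand-ins invented for
this illustration; they are not Lucene classes. The assumption is that each
segment reports segment-relative doc IDs and the collector adds that
segment's docBase to get an index-wide ID, without computing any score:

```java
import java.util.BitSet;

public class MatchOnlyDemo {
    /** Hypothetical stand-in for one segment: its first global doc ID plus its matching local IDs. */
    static final class SegmentMatches {
        final int docBase;
        final int[] localDocIds;

        SegmentMatches(int docBase, int[] localDocIds) {
            this.docBase = docBase;
            this.localDocIds = localDocIds;
        }
    }

    /** Hypothetical collector: records which docs matched and never asks for a score. */
    static final class MatchOnlyCollector {
        private final BitSet hits = new BitSet();
        private int docBase;

        /** Called once per segment, before any of that segment's docs are collected. */
        void setNextSegment(int docBase) {
            this.docBase = docBase;
        }

        /** Rebase the segment-relative ID to an index-wide ID and remember it. */
        void collect(int localDocId) {
            hits.set(docBase + localDocId);
        }

        BitSet hits() {
            return hits;
        }
    }

    /** Visit the segments one at a time, collecting every match. */
    static BitSet collectAll(SegmentMatches[] segments) {
        MatchOnlyCollector collector = new MatchOnlyCollector();
        for (SegmentMatches segment : segments) {
            collector.setNextSegment(segment.docBase);
            for (int localDocId : segment.localDocIds) {
                collector.collect(localDocId);
            }
        }
        return collector.hits();
    }

    public static void main(String[] args) {
        SegmentMatches[] segments = {
            new SegmentMatches(0, new int[] {1, 4}),
            new SegmentMatches(10, new int[] {0, 3})
        };
        System.out.println(collectAll(segments)); // prints {1, 4, 10, 13}
    }
}
```

The loop over a single top-level Scorer shown earlier skips the per-segment
step entirely, which is the difference the 2.9 note refers to.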

If your queries contain only SHOULD clauses and up to 32 MUST_NOT clauses,
then calling BooleanQuery.setAllowDocsOutOfOrder(true) should improve
performance, since internally Lucene will then use BooleanScorer instead
of BooleanScorer2.

Mike

On Wed, Sep 16, 2009 at 9:14 AM, Benjamin Pasero
<benjamin.pasero@gmail.com> wrote:
> Ah wow, that sounds great. I am using 2.3.2 though (and have to use it
> for now). Anything in that version that could speed things up?
>
> On Wed, Sep 16, 2009 at 6:48 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> With the new Collector API in Lucene 2.9, you no longer have to compute the
>> score.
>>
>> Now a Collector is passed a Scorer in case it wants to use it, but you
>> can just ignore it.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>> Benjamin Pasero wrote:
>>> Hi,
>>>
>>> I am using Lucene not only for smart fulltext searches but also for
>>> getting the results for a DB-like query, where I am not tokenizing the
>>> terms at all. For this query, I am interested in all results and for
>>> that I am using my own HitCollector.
>>>
>>> Now, while profiling I noticed that quite some time is spent in
>>> methods like TermQuery.weight() or BooleanScorer2.score(). Given that
>>> I am interested in all results, I am not interested in any score for
>>> the results.
>>>
>>> Is it possible to run a query where Lucene simply checks if a Document
>>> is a hit or not and completely ignore weighting and scoring? Or is that
>>> an integrated part of the search used to determine if a Document
>>> is a hit or not?
>>>
>>> Thanks for helping,
>>> Ben
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org