lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aleksey <bitterc...@gmail.com>
Subject Re: TermRangeQuery performance oddness
Date Tue, 07 May 2013 09:31:51 GMT
Thank you for the detailed answer. I'll look into the FieldCacheRangeFilter.
Is there another way of getting a page of results starting with some
term that does not have similar issue? Basically I want to implement
pagination of docs sorted by title, next page starting from last doc
of the previous, and while, typically, getting the first page is
fastest, in my case it ends up being slowest. Is it possible to
disable counting hits and just return top 50 results that qualify the
query?

Also, if the hit count is a direct influence on the query speed, how
come some other queries that span a larger or similar set are faster?
Say, a TermQuery for a term that happens to be the same for majority
of docs and spans more docs than the above termrange.

Aleksey




On Tue, May 7, 2013 at 12:13 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
> The problem is by design: Lucene is an inverted index, so lookups can only be done by
single terms and find the documents related to every single term. To execute a range, the
query first have to position the terms enum on the first term and then iterate over all *terms*
in the index (not documents) until the last term is reached. If the number of terms in the
field is large (because you have many distinct values), this takes some time. For every term
in the enumeration that matches the range, Lucene has to look up all matching documents in
the posting list and report them as hits (using a bitset). The latter (looking up the posting
lists involves lots of work), so ranges with thousands of terms will get slow.
>
> So the time depends: How many terms are in your term dictionary between the lower bound
and the higher bound of your range, not really the size of the index (although this is quite
often directly related).
>
> If you want faster range queries, use maybe NumericRangeQuery, because this has some
optimizations on the cost of a large index size. But if you are stuck with text, you may also
review FieldCacheRangeFilter (which only works for untokenized fields, but I assume from your
example "title" is not tokenized).
>
> The order of results of a range query is in "index order", because there is no TF-IDF
ranking involved (all hits have the same score of 1). Index order means the order in which
they were indexed.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Aleksey [mailto:bittercold@gmail.com]
>> Sent: Tuesday, May 07, 2013 4:15 AM
>> To: java-user@lucene.apache.org
>> Subject: TermRangeQuery performance oddness
>>
>> Hi guys,
>>
>> If I run 2 term range queries:
>>
>> new TermRangeQuery("title", new BytesRef("A"), null, true, true); and new
>> TermRangeQuery("title", new BytesRef("Z"), null, true, true);
>>
>> The one that starts with "Z" is several times faster (I make 1000 queries in a
>> loop to measure). I understand that the first one has much larger hit number,
>> but if the query is bounded to 50 results, why does that matter?
>> At first I thought that it grabs all hits and sorts them, but then it doesn't seem
>> to make any difference whether or not I pass sort by "title" parameter to the
>> searcher. Results are either sorted or kind of random, but speed is the same.
>> Why is that?
>>
>> Thank you in advance,
>>
>> Aleksey
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message