lucene-java-user mailing list archives

From Yonik Seeley <>
Subject Re: Query with many clauses
Date Wed, 29 Oct 2014 14:00:10 GMT
For queries with many terms, where each term matches few documents
(actually a single document for "ID filters" in my tests), I saw
speedups between 4x and 8x  (the 3rd chart)

-Yonik - native code faceting, facet functions,
sub-facets, off-heap data

On Wed, Oct 29, 2014 at 9:42 AM, Michael McCandless
<> wrote:
> I suggested TermsFilter, not TermFilter :)  Note the sneaky extra s ....
> Mike McCandless
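To illustrate why TermsFilter might help with the seekExact/loadBlock cost reported in the original message, here is a toy sketch in plain Java with no Lucene dependency. It rests on an assumption about the mechanism: that a BooleanQuery-style lookup uses a fresh terms enum per clause, while a TermsFilter-style lookup sorts the terms and reuses one enum, so neighbouring terms can share a loaded block. The class SharedCursorSketch, its Cursor, BLOCK_SIZE, and the blockLoads counter are all hypothetical names for this model and are not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one segment's term dictionary, split into fixed-size blocks. */
final class SharedCursorSketch {
    static final int BLOCK_SIZE = 4;                 // terms per toy block (arbitrary)

    final List<String> terms = new ArrayList<>();    // sorted term list
    final List<int[]> postings = new ArrayList<>();  // doc IDs, parallel to terms
    int blockLoads = 0;                              // how often a cursor had to load a block

    SharedCursorSketch(TreeMap<String, int[]> dict) {
        for (Map.Entry<String, int[]> e : dict.entrySet()) {
            terms.add(e.getKey());
            postings.add(e.getValue());
        }
    }

    /** Stand-in for a terms enum: remembers the last block it loaded. */
    final class Cursor {
        private int loadedBlock = -1;

        int[] seekExact(String term) {
            int idx = Collections.binarySearch(terms, term);
            int pos = idx >= 0 ? idx : -idx - 1;     // slot where the term is or would be
            int block = pos / BLOCK_SIZE;
            if (block != loadedBlock) {              // count a load only when the block changes
                loadedBlock = block;
                blockLoads++;
            }
            return idx >= 0 ? postings.get(idx) : null;
        }
    }

    /** One fresh cursor per clause, terms in caller order (assumed BooleanQuery-like). */
    BitSet perClause(List<String> query) {
        BitSet hits = new BitSet();
        for (String term : query) {
            addAll(hits, new Cursor().seekExact(term));
        }
        return hits;
    }

    /** One shared cursor over sorted terms (assumed TermsFilter-like). */
    BitSet sharedSorted(List<String> query) {
        List<String> sorted = new ArrayList<>(query);
        Collections.sort(sorted);
        Cursor cursor = new Cursor();
        BitSet hits = new BitSet();
        for (String term : sorted) {
            addAll(hits, cursor.seekExact(term));
        }
        return hits;
    }

    private static void addAll(BitSet hits, int[] docs) {
        if (docs == null) return;                    // term not in the dictionary
        for (int doc : docs) hits.set(doc);
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> dict = new TreeMap<>();
        for (int i = 0; i < 8; i++) dict.put("id" + i, new int[] {i});
        SharedCursorSketch index = new SharedCursorSketch(dict);
        List<String> query = Arrays.asList("id5", "id0", "id6", "id1");

        BitSet a = index.perClause(query);
        int perClauseLoads = index.blockLoads;
        index.blockLoads = 0;
        BitSet b = index.sharedSorted(query);
        System.out.println(a.equals(b) + " " + perClauseLoads + " " + index.blockLoads);
    }
}
```

In this toy setup both strategies return the same documents, but the per-clause variant counts four block loads while the shared, sorted variant counts two. Real numbers depend on the index and on how the terms are distributed across blocks, so this only shows the shape of the effect and is not a benchmark.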
> On Wed, Oct 29, 2014 at 8:20 AM, Pawel Rog <> wrote:
>> Hi,
>> I already tried transforming the queries into filters (TermQuery -> TermFilter)
>> but didn't see much of a speedup. As I wrote, I wrapped this filter in a
>> ConstantScoreQuery, and in another test I used FilteredQuery with
>> MatchAllDocsQuery and BooleanFilter. Both cases seem to perform about the
>> same as a simple BooleanQuery.
>> But of course I'll also try TermsFilter. Maybe it will speed up the
>> filters.
>> Michael Sokolov, I haven't prepared any statistics on the number of
>> BooleanClauses used or on whether some sets of terms repeat. I think
>> I have to collect some stats to better understand what can be improved.
>> --
>> Paweł Róg
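The statistics mentioned above (clause counts and repeated terms) could be gathered from a query log with something like the following minimal sketch. It assumes each logged query has already been reduced to the set of its term strings. QueryLogStats and its fields are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Summary numbers for a log of queries, each given as its set of terms. */
final class QueryLogStats {
    int queryCount;       // number of queries seen
    int maxClauses;       // largest number of terms in one query
    double meanClauses;   // average number of terms per query
    int distinctTerms;    // different terms across the whole log
    int sharedTerms;      // terms that occur in more than one query

    static QueryLogStats of(List<Set<String>> queries) {
        QueryLogStats stats = new QueryLogStats();
        Map<String, Integer> queriesPerTerm = new HashMap<>();
        long totalClauses = 0;
        for (Set<String> query : queries) {
            stats.queryCount++;
            stats.maxClauses = Math.max(stats.maxClauses, query.size());
            totalClauses += query.size();
            for (String term : query) {
                queriesPerTerm.merge(term, 1, Integer::sum);
            }
        }
        stats.meanClauses = stats.queryCount == 0
                ? 0.0
                : (double) totalClauses / stats.queryCount;
        stats.distinctTerms = queriesPerTerm.size();
        for (int count : queriesPerTerm.values()) {
            if (count > 1) stats.sharedTerms++;
        }
        return stats;
    }
}
```

A high ratio of sharedTerms to distinctTerms would suggest that the queries overlap enough for caching or term grouping to be worth a closer look.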
>> On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov <> wrote:
>>> I'm curious to know more about your use case, because I have an idea for
>>> something that addresses this, but haven't found the opportunity to develop
>>> it yet - maybe somebody else wants to :).  The basic idea is to reduce the
>>> number of terms needed to be looked up by collapsing commonly-occurring
>>> collections of terms into synthetic "tiles".  If your queries have a lot of
>>> overlap, this could greatly reduce the number of terms in a query rewritten
>>> to use tiles. It's sort of complex, requires indexing support, or a filter
>>> cache, and there's no working implementation as yet, so this is probably
>>> not really going to be helpful for you in the short term, but if you can
>>> share some information I'd love to know:
>>> what kind of things are you searching?
>>> how many terms do your larger queries have?
>>> do the query terms overlap among your queries?
>>> -Mike Sokolov
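The query-rewriting half of the "tiles" idea above can be sketched as follows. This is only a guess at one possible shape, since the message says there is no working implementation. It assumes a purely disjunctive (SHOULD-only, unscored) query, and it assumes each tile has already been made searchable as a synthetic term matching every document that contains any of the tile's terms. TileRewriteSketch, rewrite, and the "tile:" naming are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class TileRewriteSketch {
    /**
     * Greedily replaces groups of query terms with tile names.
     * A tile is used only if every one of its terms is still present in the
     * query, so the rewritten disjunction matches the same documents.
     */
    static Set<String> rewrite(Set<String> queryTerms, Map<String, Set<String>> tiles) {
        Set<String> remaining = new LinkedHashSet<>(queryTerms);
        Set<String> rewritten = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> tile : tiles.entrySet()) {
            Set<String> tileTerms = tile.getValue();
            if (!tileTerms.isEmpty() && remaining.containsAll(tileTerms)) {
                remaining.removeAll(tileTerms);   // these terms are now covered by the tile
                rewritten.add(tile.getKey());
            }
        }
        rewritten.addAll(remaining);              // terms no tile could cover
        return rewritten;
    }
}
```

With a tile "tile:A" covering {a, b, c}, the query {a, b, c, d} would shrink to {tile:A, d}, while {a, b, d} would stay unchanged because the tile would also match documents containing only c. Choosing which tiles to build, and keeping them in sync with the index, is the hard part and is not addressed here.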
>>> On 10/28/14 9:40 PM, Pawel Rog wrote:
>>>> Hi,
>>>> I have to run a query with a lot of boolean SHOULD clauses. Queries like
>>>> these were of course slow, so I decided to change the query to a filter
>>>> wrapped by ConstantScoreQuery, but that didn't help either.
>>>> The profiler shows that most of the time is spent in seekExact in
>>>> BlockTreeTermsReader$FieldReader$SegmentTermsEnum.
>>>> When I go deeper into the trace, I see that inside seekExact most of the
>>>> time is spent in loadBlock and, even deeper, in ByteBufferIndexInput.clone.
>>>> Do you have any ideas how I can make it faster, or is it not possible
>>>> and I have to live with it?
>>>> --
>>>> Paweł Róg
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: