Message-ID: <5450FB32.8040800@safaribooksonline.com>
Date: Wed, 29 Oct 2014 10:35:30 -0400
From: Michael Sokolov <msokolov@safaribooksonline.com>
To: java-user@lucene.apache.org
Subject: Re: Query with many clauses

I did some analysis with access-control lists and found that our customers have significant overlap in the documents they have access to, so we would be able to realize very nice compression in the number of terms in access-control queries by indexing overlapping subsets. However, this is a fair amount of effort, since it requires analyzing all the access lists periodically and re-indexing some set of documents when that changes.
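(The overlap analysis Mike describes can be estimated with a quick back-of-the-envelope calculation. The sketch below is purely illustrative, not from the thread: it assumes each customer's ACL is a set of document ids, and estimates how many query terms two customers would save if their shared documents were indexed under a single synthetic "group" term. `termsSaved` is a hypothetical helper name.)

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AclOverlap {
    // Estimate the term savings if the documents shared by two customers'
    // ACLs were indexed under one synthetic "group" term instead of being
    // enumerated as per-document clauses in each customer's query.
    static int termsSaved(Set<String> aclA, Set<String> aclB) {
        Set<String> shared = new HashSet<>(aclA);
        shared.retainAll(aclB);          // documents both customers can see
        // Each customer replaces |shared| per-document clauses with a single
        // group clause, so each saves (|shared| - 1) terms.
        return shared.size() <= 1 ? 0 : 2 * (shared.size() - 1);
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        Set<String> b = new HashSet<>(Arrays.asList("d2", "d3", "d4", "d5"));
        System.out.println(termsSaved(a, b)); // 3 shared docs -> prints 4
    }
}
```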
We're able to achieve good-enough performance by simply caching a filter we generate when a session starts. Even though the initial query may be somewhat slow, we only run it once, and the user is largely unaffected. Maybe you can play some trick like that?

-Mike

On 10/29/2014 08:20 AM, Pawel Rog wrote:
> Hi,
> I already tried transforming queries to filters (TermQuery -> TermFilter),
> but didn't see much speedup. I wrapped this filter in a
> ConstantScoreQuery, and in another test I used FilteredQuery with
> MatchAllDocsQuery and a BooleanFilter. Both cases seem to perform about
> the same as a simple BooleanQuery.
> But of course I'll also try TermsFilter. Maybe it will speed up the
> filters.
>
> Michael Sokolov: I haven't prepared any statistics about the number of
> BooleanClauses used, or whether there are repeating sets of terms. I
> think I have to collect some stats to better understand what can be
> improved.
>
> --
> Paweł Róg
>
>
> On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov <
> msokolov@safaribooksonline.com> wrote:
>
>> I'm curious to know more about your use case, because I have an idea
>> for something that addresses this, but haven't found the opportunity to
>> develop it yet - maybe somebody else wants to :). The basic idea is to
>> reduce the number of terms that need to be looked up by collapsing
>> commonly occurring collections of terms into synthetic "tiles". If your
>> queries have a lot of overlap, this could greatly reduce the number of
>> terms in a query rewritten to use tiles. It's somewhat complex,
>> requires indexing support or a filter cache, and there's no working
>> implementation as yet, so this is probably not really going to be
>> helpful for you in the short term, but if you can share some
>> information I'd love to know:
>>
>> what kind of things are you searching?
>> how many terms do your larger queries have?
>> do the query terms overlap among your queries?
>>
>> -Mike Sokolov
>>
>>
>> On 10/28/14 9:40 PM, Pawel Rog wrote:
>>
>>> Hi,
>>> I have to run queries with a lot of boolean "should" clauses. Queries
>>> like these were of course slow, so I decided to change the query to a
>>> filter wrapped in a ConstantScoreQuery, but that didn't help either.
>>>
>>> A profiler shows that most of the time is spent in seekExact in
>>> BlockTreeTermsReader$FieldReader$SegmentTermsEnum.
>>>
>>> Going deeper in the trace, I see that inside seekExact most of the
>>> time is spent in loadBlock and, deeper still, in
>>> ByteBufferIndexInput.clone.
>>>
>>> Do you have any ideas how I can make this faster, or is it not
>>> possible and I just have to live with it?
>>>
>>> --
>>> Paweł Róg
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
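(Postscript sketch, not from the thread: the session-scoped filter cache Mike describes - build the expensive access-control filter once when a session starts, then reuse it on every query - follows a simple memoization pattern. In Lucene 4.x the cached value would typically be a Filter wrapped in CachingWrapperFilter; in this Lucene-agnostic sketch a Set of allowed document ids stands in for the real filter, and `SessionFilterCache` is a hypothetical name.)

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SessionFilterCache {
    // One cached "filter" (here: a set of allowed doc ids) per session.
    private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
    // The expensive step: resolving a session's ACL into a filter.
    private final Function<String, Set<String>> buildFilter;

    public SessionFilterCache(Function<String, Set<String>> buildFilter) {
        this.buildFilter = buildFilter;
    }

    // The first call for a session pays the full build cost;
    // every later call for the same session is a cache hit.
    public Set<String> filterFor(String sessionId) {
        return cache.computeIfAbsent(sessionId, buildFilter);
    }

    public static void main(String[] args) {
        SessionFilterCache cache = new SessionFilterCache(
                session -> Set.of("doc1", "doc42")); // stand-in for a slow ACL query
        System.out.println(cache.filterFor("alice"));
    }
}
```

The same shape applies when the cached value is a real Lucene filter: the point is that the slow query runs once per session, not once per search.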