lucene-java-user mailing list archives

From SUJIT PAL <sujit....@comcast.net>
Subject Re: Statically store sub-collections for search (faceted search?)
Date Mon, 15 Apr 2013 20:29:11 GMT
Hi Uwe,

I see, makes sense, thanks very much for the info. Sorry about giving you the wrong info, Carsten.

-sujit

On Apr 15, 2013, at 1:06 PM, Uwe Schindler wrote:

> Hi,
> 
> -----Original Message-----
>> From: Sujit Pal [mailto:sujitatgtalk@gmail.com] On Behalf Of SUJIT PAL
>> Sent: Monday, April 15, 2013 9:43 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Statically store sub-collections for search (faceted search?)
>> 
>> Hi Uwe,
>> 
>> Thanks for the info, I was under the impression that it didn't... I got this info
>> (that filters don't have a limit because they are not scoring) from a document
>> like the one below. I can't say this is the exact doc, because it's been a while
>> since I saw it.
>> 
>> http://searchhub.org/2009/06/08/bringing-the-highlighter-back-to-wildcard-queries-in-solr-14/
>> 
>> """
>> As a response to this performance pitfall on very large indices (and the
>> infamous TooManyClauses exception), new queries were developed that
>> relied on a new Query class called ConstantScoreQuery.
>> ConstantScoreQuerys accept a filter of matching documents and then score
>> with a constant value equal to the boost. Depending on the qualities of your
>> index, this method can be faster than the Boolean expansion method, and
>> more importantly, does not suffer from TooManyClauses exceptions. Rather
>> than matching and scoring n BooleanQuery clauses (potentially thousands of
>> clauses), a single filter is enumerated and then traveled for scoring. On the
>> other hand, constructing and scoring with a BooleanQuery containing a few
>> clauses is likely to be much faster than constructing and traveling a Filter.
>> """
> 
> This is true, but you misunderstood it: it is about MultiTermQueries (the
> superclass of WildcardQuery, FuzzyQuery, and range queries). Those are not native
> Lucene queries, so they rewrite to basic/native queries. In earlier Lucene versions,
> wildcards were always rewritten to BooleanQueries with many TermQueries (one for
> each term that matches the wildcard), leading to the problem with too many terms.
> This is still the case, but only within limits: this mode is used only if the
> wildcard expands to a few terms. Those BooleanQueries are then wrapped in
> ConstantScoreQuery(Query).
> 
> The above text talks about the other mode (which is used for many terms today): *no*
> BooleanQuery is built at all; instead, the documents of all matching terms are
> marked in a BitSet, and this BitSet is used as a Filter to construct a different
> Query type: ConstantScoreQuery(Filter). The BooleanQuery max clause count does not
> apply, because no BooleanQuery is involved anywhere in the process. If you use
> ConstantScoreQuery(BooleanQuery), the limit still applies, but not for
> ConstantScoreQuery(internalWildcardFilter).
> 
> Uwe
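The two rewrite strategies Uwe contrasts can be illustrated with a small, self-contained toy model. This is a sketch, not Lucene code: `RewriteModes`, its in-memory term dictionary and `TooManyClausesException` are hypothetical, and a prefix match stands in for wildcard expansion. In Lucene 4.x the corresponding strategies are, as far as we know, `MultiTermQuery.CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE` and `MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE`, and the real exception is `BooleanQuery.TooManyClauses`, thrown once a BooleanQuery exceeds `BooleanQuery.getMaxClauseCount()` (1024 by default).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

// Toy model, NOT Lucene's API: contrasts the two rewrite strategies for a
// prefix query (standing in for a wildcard query).
public class RewriteModes {

    // Hypothetical stand-in for Lucene's BooleanQuery.TooManyClauses.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("maxClauseCount is set to " + max);
        }
    }

    // Hypothetical in-memory term dictionary: term -> ids of docs containing it.
    private final TreeMap<String, int[]> postings = new TreeMap<>();

    void addTerm(String term, int... docs) {
        postings.put(term, docs);
    }

    // Terms are sorted, so all terms with the prefix form one contiguous run.
    private List<String> matchingTerms(String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : postings.tailMap(prefix, true).keySet()) {
            if (!t.startsWith(prefix)) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    // "Boolean" mode: one clause per matching term, capped at maxClauses.
    List<String> booleanRewrite(String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String t : matchingTerms(prefix)) {
            if (clauses.size() == maxClauses) {
                throw new TooManyClausesException(maxClauses);
            }
            clauses.add(t);
        }
        return clauses;
    }

    // "Filter" mode: no clauses at all; OR every matching term's docs into a BitSet.
    BitSet filterRewrite(String prefix) {
        BitSet docs = new BitSet();
        for (String t : matchingTerms(prefix)) {
            for (int doc : postings.get(t)) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        RewriteModes index = new RewriteModes();
        // 2000 terms "term0000".."term1999"; term i occurs only in doc i % 10.
        for (int i = 0; i < 2000; i++) {
            index.addTerm(String.format("term%04d", i), i % 10);
        }
        System.out.println(index.filterRewrite("term").cardinality());
        try {
            index.booleanRewrite("term", 1024); // Lucene's default limit
        } catch (TooManyClausesException e) {
            System.out.println("boolean mode failed: " + e.getMessage());
        }
    }
}
```

Run as-is, `main` prints `10` and then `boolean mode failed: maxClauseCount is set to 1024`: the 2000 matching terms overflow the clause limit, while the BitSet only ever holds the 10 documents that actually match, which is why the filter mode has no clause limit.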
> 
>> On Apr 15, 2013, at 1:04 AM, Uwe Schindler wrote:
>> 
>>> The limit also applies to filters. If you have a list of terms ORed together,
>>> the fastest way is not to use a BooleanQuery at all, but instead a TermsFilter
>>> (which has no limits).
>>> 
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Carsten Schnober [mailto:schnober@ids-mannheim.de]
>>>> Sent: Monday, April 15, 2013 9:53 AM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: Statically store sub-collections for search (faceted search?)
>>>> 
>>>> On 12.04.2013 20:08, SUJIT PAL wrote:
>>>>> Hi Carsten,
>>>>> 
>>>>> Why not use your idea of the BooleanQuery but wrap it in a Filter instead?
>>>>> Since you are not doing any scoring (only filtering), the max boolean
>>>>> clauses limit should not apply to a filter.
>>>> 
>>>> Hi Sujit,
>>>> thanks for your suggestion! I wasn't aware that the max clause limit
>>>> does not apply to a BooleanQuery wrapped in a filter. I suppose the
>>>> ideal way would be to use a BooleanFilter rather than a QueryWrapperFilter,
>>>> right?
>>>> 
>>>> However, I am also not sure how to apply a filter in my use case,
>>>> because I run a SpanQuery. Although SpanQuery#getSpans() does
>>>> take a Bits object as an argument (acceptDocs), I haven't been able
>>>> to figure out how to generate this Bits object correctly from a Filter
>>>> object.
>>>> 
>>>> Best,
>>>> Carsten
>>>> 
>>>> --
>>>> Institut für Deutsche Sprache | http://www.ids-mannheim.de
>>>> Projekt KorAP                 | http://korap.ids-mannheim.de
>>>> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
>>>> Korpusanalyseplattform der nächsten Generation
>>>> Next Generation Corpus Analysis Platform
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
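Carsten's question, how to turn a filter into the acceptDocs bits that a span search takes, can be sketched with another toy model. Everything here (`AcceptDocsDemo`, its `Bits` interface and `Span` class) is a hypothetical stand-in, and spans are filtered after the fact rather than skipped during enumeration as Lucene would do. In Lucene 4.x the usual route is, to our understanding, per segment: call `Filter#getDocIdSet(AtomicReaderContext, Bits)` with the segment reader's live docs (a null result meaning no matches), use `DocIdSet#bits()` when it is non-null, and otherwise copy `DocIdSet#iterator()` into a `FixedBitSet` (which implements `Bits`) before passing it as `acceptDocs` to `SpanQuery#getSpans`. These signatures should be checked against the javadocs of the exact version in use.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Toy model, NOT Lucene's API: turn a filter's ascending doc-id iterator into
// random-access "acceptDocs" and use it to restrict span matches.
public class AcceptDocsDemo {

    // Hypothetical stand-in for Lucene's Bits: random-access doc membership.
    interface Bits {
        boolean get(int doc);
    }

    // Hypothetical span match inside one document.
    static final class Span {
        final int doc, start, end;

        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Copy the filter's matches into a BitSet; liveDocs == null means "no deletions".
    static Bits toBits(PrimitiveIterator.OfInt filterDocs, Bits liveDocs) {
        BitSet matched = new BitSet();
        while (filterDocs.hasNext()) {
            matched.set(filterDocs.nextInt());
        }
        return doc -> matched.get(doc) && (liveDocs == null || liveDocs.get(doc));
    }

    // Keep only spans in accepted docs; null acceptDocs accepts everything.
    static List<Span> filterSpans(List<Span> spans, Bits acceptDocs) {
        List<Span> kept = new ArrayList<>();
        for (Span s : spans) {
            if (acceptDocs == null || acceptDocs.get(s.doc)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Suppose the filter (e.g. one stored sub-collection) matched docs 1, 3 and 4.
        Bits accept = toBits(IntStream.of(1, 3, 4).iterator(), null);
        List<Span> spans = List.of(new Span(0, 2, 4), new Span(3, 0, 1), new Span(5, 1, 3));
        for (Span s : filterSpans(spans, accept)) {
            System.out.println(s.doc + " " + s.start + "-" + s.end);
        }
    }
}
```

Running `main` prints `3 0-1`: only the span in document 3 lies inside the filtered sub-collection.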