lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Performance measurements
Date Thu, 25 Jul 2013 20:56:50 GMT
In addition, although I am a bit beyond my expertise here, I believe you 
should be able to take any query object, including one returned from a query 
parser, wrap it with a ConstantScoreQuery, and then search on the CSQ to 
avoid all the scoring overhead.

For example, "*:*" is super fast even though it matches everything - no 
scoring.
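
The wrapping idea can be sketched in plain Java as a decorator that delegates matching and skips scoring. This is a toy model under stated assumptions, not Lucene's ConstantScoreQuery: the ScoredMatcher interface and both classes are hypothetical, and the call counter only stands in for the cost of computing a score.

```java
/** Toy model of a constant-score wrapper; NOT Lucene's ConstantScoreQuery. */
class ConstantScoreSketch {
    // Hypothetical interface: a query both matches and scores a document.
    interface ScoredMatcher {
        boolean matches(int doc);
        float score(int doc);
    }

    // Stand-in for a scored query; counts how often the scoring path runs.
    static class CountingMatcher implements ScoredMatcher {
        int scoreCalls = 0;
        public boolean matches(int doc) { return doc % 2 == 0; }
        public float score(int doc) { scoreCalls++; return 1.0f / (1 + doc); }
    }

    // Delegates matching, but never asks the wrapped matcher for a score.
    static class ConstantScore implements ScoredMatcher {
        private final ScoredMatcher inner;
        ConstantScore(ScoredMatcher inner) { this.inner = inner; }
        public boolean matches(int doc) { return inner.matches(doc); }
        public float score(int doc) { return 1.0f; }
    }

    // Visits docs [0, maxDoc) and requests a score for every hit,
    // the way a collecting search loop would.
    static int countHits(ScoredMatcher matcher, int maxDoc) {
        int hits = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matcher.matches(doc)) {
                matcher.score(doc);
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingMatcher inner = new CountingMatcher();
        int hits = countHits(new ConstantScore(inner), 10);
        System.out.println(hits + " hits, inner score calls: " + inner.scoreCalls);
    }
}
```

In this model the wrapped search finds the same hits as the unwrapped one, while the inner score() method is never invoked.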

-- Jack Krupansky

-----Original Message----- 
From: Arjen van der Meijden
Sent: Thursday, July 25, 2013 3:06 PM
To: java-user@lucene.apache.org
Subject: Re: Performance measurements

Hi Sriram,

I don't see any obvious mistakes, although you don't need to create a
FilteredQuery: There are plenty of search-methods on the IndexSearcher
that accept both a query (your TermQuery) and a filter (your TermsFilter).

The way I understand Filters (but I have no advanced in-depth knowledge
of them) is that they are very similar to Queries.
Queries are used for two tasks: matching an item and giving some measure
of how "well" it matched (i.e. the score).
Filters are used only for matching, but I doubt there is very much
difference from a technical point of view between the two ways of
matching items.
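
To make the "(query) AND (filter)" picture concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the FilterSketch class, its in-memory postings map, and the sample doc ids are assumptions made for illustration. Only the field names "name" and "conn" are taken from the code posted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "(query) AND (filter)"; NOT Lucene's implementation. */
class FilterSketch {
    // Hypothetical inverted index: "field:value" -> sorted doc ids.
    private final Map<String, int[]> postings = new HashMap<>();

    void index(String term, int... docIds) {
        postings.put(term, docIds);
    }

    // Filter side: OR the postings of every term into one bit set,
    // much like an IN list in SQL. No scores are involved.
    BitSet termsFilter(List<String> terms) {
        BitSet bits = new BitSet();
        for (String term : terms) {
            for (int doc : postings.getOrDefault(term, new int[0])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Query side: walk the query term's postings and keep only the docs
    // the filter accepts. Any scoring would depend on this term alone.
    List<Integer> search(String queryTerm, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings.getOrDefault(queryTerm, new int[0])) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    static List<Integer> demo() {
        FilterSketch idx = new FilterSketch();
        idx.index("name:alice", 1, 3, 5, 8);
        idx.index("conn:10", 2, 3);
        idx.index("conn:20", 5, 9);
        // "conn:30" has no postings and simply contributes nothing.
        BitSet conns = idx.termsFilter(Arrays.asList("conn:10", "conn:20", "conn:30"));
        return idx.search("name:alice", conns);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the sample data, demo() returns [3, 5]: the docs that contain the name term and at least one of the conn terms.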

I'll leave more detailed explanations to others, as I might make too
many mistakes or just assume I know something I actually don't :)

Best regards,

Arjen

On 25-7-2013 19:56 Sriram Sankar wrote:
> Thanks everyone.  I'm trying this out:
>
>> So searching would become:
>> - Create a Query with only your termA
>> - Create a TermsFilter with all your termB's
>> - execute your preferred search-method with both the query and the filter
>
> I don't get the same results as before - and am still debugging.  But
> I'm including before and after code in case someone is able to see a
> problem with what I'm doing.
>
> I'm also looking for docs on how filters work (or will read the code).
> But at a high level, is the filter fully created when the Filter object is
> created?  Or is it incrementally built during traversal (when next() and
> advance() are called on the filters)?  The reason for this question is
> related to early termination.
>
> The two versions of code (query based and filter based) are shown below -
> let me know if you see a problem with either.  Ignore any minor syntactic
> errors that may have got introduced as I simplified my code for inclusion
> here.
>
> Thanks,
>
> Sriram.
>
>
> QUERY APPROACH:
>
> BooleanQuery orTerms = new BooleanQuery();
> for (int i = 0; i < orCount; ++i) {
>      TermQuery orArg = new TermQuery(new Term("conn",
>       Integer.toString(connection[i])));
>      BooleanClause cl = new BooleanClause(orArg,
>       BooleanClause.Occur.SHOULD);
>      orTerms.add(cl);
> }
> TermQuery tq = new TermQuery(new Term("name", name));
> BooleanQuery query = new BooleanQuery();
> query.add(new BooleanClause(tq, BooleanClause.Occur.MUST));
> query.add(new BooleanClause(orTerms, BooleanClause.Occur.MUST));
>
> FILTER APPROACH:
>
> List<Term> terms = new ArrayList<Term>();
> for (int i = 0; i < orCount; ++i) {
>      terms.add(new Term("conn",
>         Integer.toString(connection[i])));
> }
> TermsFilter conns = new TermsFilter(terms);
> TermQuery tq = new TermQuery(new Term("name", name));
> FilteredQuery query = new FilteredQuery(tq, conns);
>
>
>
> On Thu, Jul 25, 2013 at 12:14 AM, Arjen van der Meijden <
> acmmailing@tweakers.net> wrote:
>
>> On 24-7-2013 21:58 Sriram Sankar wrote:
>>
>>> On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky
>>> <jack@basetechnology.com> wrote:
>>>
>>>> Scoring has been a major focus of Lucene. Non-scored filters are also
>>>> available, but the query parsers are focused (exclusively) on
>>>> scored-search.
>>>>
>>>>
>>> When you say "filter" do you mean a step performed after retrieval?  Or
>>> is it yet another retrieval operation?
>>>
>>
>> He is really referring to the Filters available as an addition to
>> retrieval. The ones you supply with the search-method:
>> http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/IndexSearcher.html#search%28org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int%29
>>
>> Unfortunately the documentation of Lucene is a bit fragmented, but
>> basically filters limit the scope of your search domain (i.e. reduce the
>> available set of documents) during the processing of a query. So it
>> basically becomes (query) AND (filters).
>>
>> There are several useful implementations available for the filters. But
>> in your case you can just create a single TermsFilter (it's in the queries
>> module/package), which is simply an OR-list like the one in your example
>> (similar to a basic IN in SQL):
>>
>> http://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/TermsFilter.html
>>
>> So searching would become:
>> - Create a Query with only your termA
>> - Create a TermsFilter with all your termB's
>> - execute your preferred search-method with both the query and the filter
>>
>> If you were interested in the scores of each result, this would not work
>> too well, since all scores will be based only on the query that contains
>> termA... But since you don't care about that, this should get you a big
>> performance gain.
>>
>> Best regards,
>>
>> Arjen
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 

