lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Date Fri, 23 Oct 2009 17:53:15 GMT
Nice! I like it. Even if its not much faster (havn't checked either), I
can't see it being much slower and its cleaner code.

I'd be happy to do some quick perf tests when I get a chance, but I'm +1
on it.

Uwe Schindler wrote:
> Mark,
>
> when removing may comment (as I now understand the whole
> FieldDocSortedHitQueue), I found the following as a optimization of the
> whole hq:
>
> All FieldDoc values are Compareables (also the score or docid, if they
> appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code
> of lessThan seems very ineffective, as it has a big switch statement on the
> SortField type, then casts the value to the underlying numeric type Object,
> calls Number.xxxValue() & co for it and then compares manually. As
> j.l.Number is itself Comparable, I see no reason to do this. Just call
> compareTo on the Comparable interface and we are happy. The big deal is that
> it prevents casting and the two method calls xxxValue(), as Number.compareTo
> works more efficient internally.
>
> The only special cases are String sort, where the Locale may be used and the
> score sorting which is backwards. But these are two if statements instead of
> the whole switch.
>
> I had not tested it now for performance, but in my opinion it should be
> faster for MultiSearchers. All tests still pass (because they should).
>
> Attached patch applies to (current) trunk.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: Mark Miller (JIRA) [mailto:jira@apache.org]
>> Sent: Friday, October 23, 2009 3:33 PM
>> To: java-dev@lucene.apache.org
>> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of
>> multi-PQ vs single-PQ sorting API
>>
>>
>>     [ https://issues.apache.org/jira/browse/LUCENE-
>> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
>> tabpanel&focusedCommentId=12769221#action_12769221 ]
>>
>> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM:
>> ---------------------------------------------------------------
>>
>> bq. but how does this fit together.
>>
>> Thats what Comparable FieldComparator#value is for - fillFields will grab
>> all those and load up FieldDoc fields - so the custom FieldComparator is
>> tied into it - it creates Comparable objects that can be compared by the
>> native compareTos. (the old API did the same thing)
>>
>> {code}
>>   /**
>>    * Given a queue Entry, creates a corresponding FieldDoc
>>    * that contains the values used to sort the given document.
>>    * These values are not the raw values out of the index, but the
>> internal
>>    * representation of them. This is so the given search hit can be
>> collated by
>>    * a MultiSearcher with other search hits.
>>    *
>>    * @param entry The Entry used to create a FieldDoc
>>    * @return The newly created FieldDoc
>>    * @see Searchable#search(Weight,Filter,int,Sort)
>>    */
>>   FieldDoc fillFields(final Entry entry) {
>>     final int n = comparators.length;
>>     final Comparable[] fields = new Comparable[n];
>>     for (int i = 0; i < n; ++i) {
>>       fields[i] = comparators[i].value(entry.slot);
>>     }
>>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
>>     return new FieldDoc(entry.docID, entry.score, fields);
>>   }
>> {code}
>>
>>       was (Author: markrmiller@gmail.com):
>>     bq. but how does this fit together.
>>
>> Thats what Comparable FieldComparator#value is for - fillFields will grab
>> all those and load up FieldDoc fields - so the custom FieldComparator is
>> tied into it - it creates Comparable objects that can be compared by the
>> native compareTos.
>>
>> {code}
>>   /**
>>    * Given a queue Entry, creates a corresponding FieldDoc
>>    * that contains the values used to sort the given document.
>>    * These values are not the raw values out of the index, but the
>> internal
>>    * representation of them. This is so the given search hit can be
>> collated by
>>    * a MultiSearcher with other search hits.
>>    *
>>    * @param entry The Entry used to create a FieldDoc
>>    * @return The newly created FieldDoc
>>    * @see Searchable#search(Weight,Filter,int,Sort)
>>    */
>>   FieldDoc fillFields(final Entry entry) {
>>     final int n = comparators.length;
>>     final Comparable[] fields = new Comparable[n];
>>     for (int i = 0; i < n; ++i) {
>>       fields[i] = comparators[i].value(entry.slot);
>>     }
>>     //if (maxscore > 1.0f) doc.score /= maxscore;   // normalize scores
>>     return new FieldDoc(entry.docID, entry.score, fields);
>>   }
>> {code}
>>
>>     
>>> Explore performance of multi-PQ vs single-PQ sorting API
>>> --------------------------------------------------------
>>>
>>>                 Key: LUCENE-1997
>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1997
>>>             Project: Lucene - Java
>>>          Issue Type: Improvement
>>>          Components: Search
>>>    Affects Versions: 2.9
>>>            Reporter: Michael McCandless
>>>            Assignee: Michael McCandless
>>>         Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>>>
>>>
>>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
>>> where a simpler (non-segment-based) comparator API is proposed that
>>> gathers results into multiple PQs (one per segment) and then merges
>>> them in the end.
>>> I started from John's multi-PQ code and worked it into
>>> contrib/benchmark so that we could run perf tests.  Then I generified
>>> the Python script I use for running search benchmarks (in
>>> contrib/benchmark/sortBench.py).
>>> The script first creates indexes with 1M docs (based on
>>> SortableSingleDocSource, and based on wikipedia, if available).  Then
>>> it runs various combinations:
>>>   * Index with 20 balanced segments vs index with the "normal" log
>>>     segment size
>>>   * Queries with different numbers of hits (only for wikipedia index)
>>>   * Different top N
>>>   * Different sorts (by title, for wikipedia, and by random string,
>>>     random int, and country for the random index)
>>> For each test, 7 search rounds are run and the best QPS is kept.  The
>>> script runs singlePQ then multiPQ, and records the resulting best QPS
>>> for each and produces table (in Jira format) as output.
>>>       
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     
>
>   
> ------------------------------------------------------------------------
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message