lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Date Mon, 02 Nov 2009 23:34:19 GMT

Yeah, I was half joking about the Latin - I'm sure you do know what it  
means. But you still chose to reword it as "the" concern. And so I  
chose to flaunt my ridiculously miniscule grasp of high school Latin :)

- Mark

http://www.lucidimagination.com (mobile)

On Nov 2, 2009, at 6:27 PM, Jake Mannix <jake.mannix@gmail.com> wrote:

> Right, that's my common sense saying that I'm guessing that worrying  
> about memory usage of the order of numSegments * numResultsReturned  
> is rarely going to be an issue.
>
> I think I actually took latin once, so I may have even known what  
> "eg" literally meant back at some point in prehistory.  I just have  
> seen how these discussions can catch on one particular thing that is  
> brought up, and then derailed into long banterings back and forth on  
> that topic like it was the most important in the world for ever (for  
> example, how I'm highlighting this one parenthetical remark which  
> Mike made), losing what the actual consensus and disagreement was.
>
> As far as I could see, in general, for most common use cases, the  
> performance was so comparable as to be changeable by wind patterns,  
> and ram usage is such that in comparison to how much memory a lucene  
> system which is asking for thousands of hits to be returned across  
> hundred-segment indices (for typical LogMergePolicy, this would only  
> happen once you get up toward a billion documents) is probably  
> already so large that worrying about another 10KB here and there is  
> really going to get washed away in the same way that the leading  
> order complexity is totally dominating the CPU time.
>
> The real thing to focus on is the API itself, for maximal ease of  
> use for the next possibly 3 years (if the timeframe of 2.x is any  
> indicator).
>
>   -jake
>
> On Mon, Nov 2, 2009 at 3:16 PM, Mark Miller <markrmiller@gmail.com>  
> wrote:
> That I don't know. Probably fewer than more, but that's just me  
> guessing based on common sense :)
>
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
> On Nov 2, 2009, at 6:13 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
>> Ok, and how many of those users are also running on indices with  
>> hundreds of segments?
>>
>>   -jake
>>
>> On Mon, Nov 2, 2009 at 3:10 PM, Mark Miller <markrmiller@gmail.com>  
>> wrote:
>> There are plenty of Lucene users that do go 1000 in. We've been  
>> calling it "deep paging" at LI. I like that name :)
>>
>> - Mark
>>
>> http://www.lucidimagination.com (mobile)
>>
>>
>> On Nov 2, 2009, at 6:04 PM, "Jake Mannix (JIRA)" <jira@apache.org>  
>> wrote:
>>
>>
>>   [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741

>>  ]
>>
>> Jake Mannix commented on LUCENE-1997:
>> -------------------------------------
>>
>> The current concern is to do with the memory?  I'm more concerned  
>> with the weird "java ghosts" that are flying around, sometimes  
>> swaying results by 20-40%...  the memory could only be an issue on  
>> a setup with hundreds of segments and sorting the top 1000 values  
>> (do we really try to optimize for this performance case?).  In the  
>> normal case (no more than tens of segments, and the top 10 or 100  
>> hits), we're talking about what, 100-1000 PQ entries?
>>
>> Explore performance of multi-PQ vs single-PQ sorting API
>> --------------------------------------------------------
>>
>>               Key: LUCENE-1997
>>               URL: https://issues.apache.org/jira/browse/LUCENE-1997
>>           Project: Lucene - Java
>>        Issue Type: Improvement
>>        Components: Search
>>  Affects Versions: 2.9
>>          Reporter: Michael McCandless
>>          Assignee: Michael McCandless
>>       Attachments: LUCENE-1997.patch, LUCENE-1997.patch,  
>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,  
>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,  
>> LUCENE-1997.patch
>>
>>
>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java- 
>> dev,
>> where a simpler (non-segment-based) comparator API is proposed that
>> gathers results into multiple PQs (one per segment) and then merges
>> them in the end.
>> I started from John's multi-PQ code and worked it into
>> contrib/benchmark so that we could run perf tests.  Then I generified
>> the Python script I use for running search benchmarks (in
>> contrib/benchmark/sortBench.py).
>> The script first creates indexes with 1M docs (based on
>> SortableSingleDocSource, and based on wikipedia, if available).  Then
>> it runs various combinations:
>>  * Index with 20 balanced segments vs index with the "normal" log
>>   segment size
>>  * Queries with different numbers of hits (only for wikipedia index)
>>  * Different top N
>>  * Different sorts (by title, for wikipedia, and by random string,
>>   random int, and country for the random index)
>> For each test, 7 search rounds are run and the best QPS is kept.  The
>> script runs singlePQ then multiPQ, and records the resulting best QPS
>> for each and produces table (in Jira format) as output.
>>
>> -- 
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>

Mime
View raw message