lucene-dev mailing list archives

From Mike Klaas
Subject Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Date Sun, 10 Feb 2008 23:48:13 GMT
While I agree in general that excessive optimization at the expense
of code clarity is undesirable, you are overstating the point.  2X is
a ridiculous threshold to apply to something as performance critical
as a full text search engine.  If search were twice as slow, Lucene
would be utterly unusable for me.  Indexing is less important than
search, of course, but a 2X slowdown would be quite painful there.

I don't have an opinion in this case: I believe that there is a  
tradeoff but that it is the responsibility of the committer(s) to
achieve the correct balance--they are the ones who will be  
maintaining the code, after all.  I find your persistence surprising  
and your tone dangerously near condescending.  Telling the guy who  
has spent hundreds of hours carefully optimizing this code that  
"Almost always there is a better bottleneck somewhere" shows an  
astonishing lack of perspective and respect.


On 10-Feb-08, at 12:15 PM, robert engels wrote:

> I am not sure these numbers matter. I think they are skewed because  
> you are probably running too short a test, and the index is in  
> memory (or OS cache).
> Once you use a real index that needs to read/write from the disk,  
> the percentage change will be negligible.
> This is the problem with many of these "performance changes" - they  
> just aren't real world enough.  Even if they were, I would argue  
> that code simplicity/maintainability is worth more than 6 seconds  
> on an operation that takes 4 minutes to run...
> There are many people that believe micro benchmarks are next to  
> worthless. A good rule of thumb is that if the optimization doesn't  
> result in 2x speedup, it probably shouldn't be done. In most cases  
> any efficiency gains are later lost in maintainability issues.
> See
> Almost always there is a better bottleneck somewhere.
> On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote:
>> Yonik Seeley wrote:
>>> I wonder how well a single generic quickSort(Object[] arr, int low,
>>> int high) would perform vs the type-specific ones?  I guess the main
>>> overhead would be a cast from Object to the specific class to do the
>>> compare?  Too bad Java doesn't have true generics/templates.
>> OK I tested this.
>> Starting from the patch on LUCENE-1172, which has 3 quickSort methods
>> (one per type), I created a single quickSort method on Object[] that
>> takes a Comparator, and made 3 Comparators instead.
>> Mac OS X 10.4 (JVM 1.5):
>>     original patch --> 247.1
>>   simplified patch --> 254.9 (3.2% slower)
>> Windows Server 2003 R64 (JVM 1.6):
>>     original patch --> 440.6
>>   simplified patch --> 452.7 (2.7% slower)
>> The times are best in 10 runs.  I'm running all tests with these JVM
>> args:
>>   -Xms1024M -Xmx1024M -Xbatch -server
>> I think this is a big enough difference in performance that it's
>> worth keeping 3 separate quickSorts in DocumentsWriter.
>> Mike
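
For readers who want to see the two approaches compared in the quoted message side by side, here is a minimal sketch. It is not the code from the LUCENE-1172 patch: the Entry class, its key field, the BY_KEY comparator and both method names are hypothetical stand-ins, and the partitioning scheme is a plain textbook quicksort chosen for brevity. The only point is where the cost differs: the type-specific version compares fields directly, while the generic version pays for a Comparator call and a cast on every comparison.

```java
import java.util.Comparator;

public class QuickSortSketch {

    /** Hypothetical element type; stands in for whatever is actually sorted. */
    static final class Entry {
        final int key;
        Entry(int key) { this.key = key; }
    }

    /** Type-specific variant: direct field comparison, no cast, no Comparator call. */
    static void quickSortEntries(Entry[] arr, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = arr[(lo + hi) >>> 1].key;
        int i = lo, j = hi;
        while (i <= j) {
            while (arr[i].key < pivot) i++;
            while (arr[j].key > pivot) j--;
            if (i <= j) {
                Entry tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++; j--;
            }
        }
        quickSortEntries(arr, lo, j);
        quickSortEntries(arr, i, hi);
    }

    /** Generic variant: one method for all types, ordering supplied by a Comparator. */
    static void quickSortGeneric(Object[] arr, int lo, int hi, Comparator<Object> cmp) {
        if (lo >= hi) return;
        Object pivot = arr[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (cmp.compare(arr[i], pivot) < 0) i++;
            while (cmp.compare(arr[j], pivot) > 0) j--;
            if (i <= j) {
                Object tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++; j--;
            }
        }
        quickSortGeneric(arr, lo, j, cmp);
        quickSortGeneric(arr, i, hi, cmp);
    }

    /** Hypothetical comparator; the casts are the overhead Yonik asks about. */
    static final Comparator<Object> BY_KEY = new Comparator<Object>() {
        public int compare(Object a, Object b) {
            int x = ((Entry) a).key;
            int y = ((Entry) b).key;
            return x < y ? -1 : (x == y ? 0 : 1);
        }
    };

    public static void main(String[] args) {
        Entry[] a = { new Entry(5), new Entry(1), new Entry(4), new Entry(2) };
        Entry[] b = a.clone();
        quickSortEntries(a, 0, a.length - 1);
        quickSortGeneric(b, 0, b.length - 1, BY_KEY);
        for (int i = 0; i < a.length; i++) {
            System.out.println(a[i].key + " " + b[i].key);
        }
    }
}
```

Both methods produce the same ordering; any timing difference between them will depend on the JVM and on how well it inlines the Comparator call, so the figures quoted above should not be assumed to carry over to this sketch.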