lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Date Sat, 09 Feb 2008 15:14:50 GMT
I agree, there comes a point where the cost of added complexity is not
worth the gains, on balance.  Making that tradeoff is not easy.

I don't think the patch in LUCENE-1172 crosses that line: a 1.6% (4.1%
on small docs) top line gain is still a sizable gain.

The profiler points to many other smaller things which I think are
below the line, that I didn't pursue.

I also agree that DocumentsWriter is complex now, and I'd definitely
like to simplify it with time, hopefully without losing too much

Believe it or not, earlier versions (on LUCENE-843) were more complex,
and I pared it down before committing it.  At one point I had a
specialized segment merger that would much more efficiently merge
"partial" segments flushed from RAM.  This was actually a fairly
sizable gain (maybe ~15% overall) when building large indices.  But it
also added sizable complexity, so I took it out.  I still think this
is eventually worthwhile (especially when autoCommit=false), but it
belongs with segment merging instead (this is why I opened


Grant Ingersoll wrote:

> I also agree w/ Robert and Michael, here.  While DocsWriter is  
> really effective, it is very complicated to follow and it makes  
> debugging and maintenance much harder.
> -Grant
> On Feb 9, 2008, at 5:03 AM, Michael Busch wrote:
>> robert engels wrote:
>>> Curious... on things like this, is it really worth adding (and
>>> maintaining) Lucene's own sort, just to achieve a 1.5 % performance
>>> increase. It is almost doubtful that you can even measure an  
>>> improvement
>>> at that level, given all of the variables you can't control.
>> I somewhat agree with Robert here. The DocumentsWriter is a quite
>> complicated class which has already two quicksort implementations and
>> this patch adds even a third one. Is it really so much more  
>> expensive to
>> e. g. sort on an Object[] array and pass in a Comparator?
>> Don't get me wrong, I think this is very sophisticated code and it's
>> super fast as the performance test and also the user experiences with
>> 2.3 proof. However, I think especially in the Open Source world  
>> one of
>> our goals should be to write code that is easy to understand, so that
>> it's easier for new people to get on board. To find a good balance  
>> and
>> trade-off between simplicity, functionality and performance is not
>> always easy. Of course, if a patch improves performance by say 15%, I
>> wouldn't hesitate to commit it. But if it's just 1% but makes the  
>> code
>> more complicated I'm not so sure if it's worth it.
>> That being said, I wouldn't vote -1 against a patch like this one to
>> prevent someone from committing it, but I don't think I would
>> write/commit it myself. I'd just like to encourage everyone to also
>> think about code simplicity and readability before writing and
>> committing new code.
>> -Michael
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message