lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Mon, 08 May 2006 21:21:37 GMT
--- "Marvin Humphrey (JIRA)" <> wrote:

> It also slows Lucene down -- indexing takes around a
> 20% speed hit.  It would be possible to submit a
> patch which had a smaller impact on performance, but
> this one is already over 700 lines long, and its
> goal is to achieve standard UTF-8 compliance and
> modify the definition of Lucene strings as simply
> and reliably as possible.  Optimization patches can
> now be submitted which build upon this one.

I'm fairly sure the UTF-8 decoding loop can be improved
quite a bit once the patch is merged, so the eventual
performance hit is probably lower (assuming decoding is
a hot spot). Using a tighter inner loop for single-byte
values can give a significant boost (up to a 50% speedup
compared to the default UTF-8 decoder that JDK 1.5
ships with).
When working on this part, it's probably best to isolate
the hot spot and measure each change against it first,
since the direct impact may be hard to see otherwise.
Then measure the total effect when integrating the
change.
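
To make the "tighter inner loop" idea concrete, here is a
minimal sketch, not code from the patch or from Lucene. It
assumes the input is well-formed standard UTF-8 and does no
validation at all; the class name AsciiFastPathUtf8 and the
method decode are hypothetical names chosen for illustration.

```java
// Hypothetical illustration only: not part of Lucene or the LUCENE-510 patch.
final class AsciiFastPathUtf8 {
    private AsciiFastPathUtf8() {}

    /**
     * Decodes len bytes of well-formed standard UTF-8 starting at off.
     * Assumption: input is valid; malformed input is NOT detected and may
     * produce wrong characters or an ArrayIndexOutOfBoundsException.
     */
    static String decode(byte[] in, int off, int len) {
        // A UTF-8 sequence never produces more UTF-16 chars than it has bytes.
        char[] out = new char[len];
        int end = off + len;
        int i = off;
        int o = 0;
        while (i < end) {
            // Tight inner loop: single-byte (ASCII) values are the only
            // non-negative Java bytes, and each maps directly to one char.
            while (i < end && in[i] >= 0) {
                out[o++] = (char) in[i++];
            }
            if (i >= end) {
                break;
            }
            // Slow path: one multi-byte sequence, then back to the fast loop.
            int b = in[i++] & 0xFF;
            if (b < 0xE0) {
                // 2 bytes: 110xxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x1F) << 6) | (in[i++] & 0x3F));
            } else if (b < 0xF0) {
                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x0F) << 12)
                        | ((in[i++] & 0x3F) << 6)
                        | (in[i++] & 0x3F));
            } else {
                // 4 bytes: code point above U+FFFF, emitted as a surrogate pair
                int cp = ((b & 0x07) << 18)
                        | ((in[i++] & 0x3F) << 12)
                        | ((in[i++] & 0x3F) << 6)
                        | (in[i++] & 0x3F);
                cp -= 0x10000;
                out[o++] = (char) (0xD800 + (cp >> 10));
                out[o++] = (char) (0xDC00 + (cp & 0x3FF));
            }
        }
        return new String(out, 0, o);
    }
}
```

To measure it in isolation, one option is to time repeated
calls to decode(bytes, 0, bytes.length) against
new String(bytes, "UTF-8") on the same mostly-ASCII buffer,
and only then check the effect on overall indexing time.
Any speedup depends on the data and the JVM, so the numbers
need to be measured rather than assumed.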

That is to say, I wouldn't worry too much about the
initial hit; much or most of it can be optimized away
quite soon, just as you suggested.

-+ Tatu +-
