lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34930] - [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources
Date Wed, 25 May 2005 04:40:55 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34930>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34930





------- Additional Comments From whoschek@lbl.gov  2005-05-25 06:40 -------
Yep, throwing and catching exceptions in the critical path is always a
performance gotcha, common case or not. See any VM implementation performance
paper, such as those in the IBM Systems Journal some years ago, and others.

No idea why the javacc folks didn't come up with an API that does not involve
exceptions for *normal* control flow. Well, javacc has probably been
unmaintained dead code for some time now. [Even Xerces has such gotchas deep
inside its low-level native API - I chatted about this some time ago with a
Sun engineer].

Anyway, you can preallocate the IOException in FastCharStream in a private
static final var, and then throw the same exception object again and again on
EOS. That gives roughly a factor of 2x the cheap way, because the stack trace
does not have to be generated and filled in repeatedly. (Same for the
QueryParser copy of FastCharStream.)
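
A minimal sketch of the preallocation idea, for illustration only. The class
PreallocatedEosStream, its fields, and the countChars helper are hypothetical
stand-ins, not the actual FastCharStream code; only the "private static final
IOException, thrown repeatedly" pattern comes from the description above.

```java
import java.io.IOException;

// Hypothetical stand-in for a FastCharStream-like class (not the Lucene source).
final class PreallocatedEosStream {
    // Allocated once. Its stack trace is filled in here, during class
    // initialization, so later throws skip that cost. The trade-off is that
    // the trace no longer points at the place where the throw happened.
    private static final IOException EOS = new IOException("read past eof");

    private final char[] buffer;
    private int pos = 0;

    PreallocatedEosStream(String text) {
        this.buffer = text.toCharArray();
    }

    // Returns the next char, or throws the shared exception at end of stream.
    char readChar() throws IOException {
        if (pos >= buffer.length) {
            throw EOS; // same object every time, no new allocation
        }
        return buffer[pos++];
    }

    // Hypothetical caller: reads until EOS and counts the chars seen.
    static int countChars(String text) {
        PreallocatedEosStream in = new PreallocatedEosStream(text);
        int n = 0;
        try {
            while (true) {
                in.readChar();
                n++;
            }
        } catch (IOException e) {
            if (e != EOS) {
                // A genuine I/O failure must not be mistaken for end of stream.
                throw new IllegalStateException(e);
            }
            return n;
        }
    }
}
```

The identity check in the catch block is one way to keep real I/O errors
distinguishable from the shared EOS object.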

The other additional 5x comes from getting rid of the exception completely -
catching exceptions is expensive. This is done by dirty-patching the
javacc-generated code so that it does not require EOS exceptions at all.
Instead you can return 0xFFFF as an EOS marker, or some other unused Unicode
value. Look at the javacc-generated code and see where it catches the EOS
exception. That's where you'd need to fiddle around, making sure true abnormal
exceptions are still handled properly. It's really an awkward maintenance
nightmare because it interferes with generated code, so I don't really
recommend this.
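
To illustrate the sentinel approach in isolation, here is a rough sketch. The
SentinelCharStream class and its countChars helper are hypothetical and say
nothing about what the patched javacc-generated code would actually look like;
the sketch also assumes the input text never contains U+FFFF itself.

```java
// Hypothetical char stream that signals end of stream with a sentinel value
// instead of an exception. Not the Lucene or javacc-generated code.
final class SentinelCharStream {
    // U+FFFF is a Unicode noncharacter, so well-formed text should not
    // contain it. Assumption: callers never feed it in as real input.
    static final char EOS = '\uFFFF';

    private final char[] buffer;
    private int pos = 0;

    SentinelCharStream(String text) {
        this.buffer = text.toCharArray();
    }

    // Returns the next char, or EOS once the buffer is exhausted.
    char readChar() {
        return pos < buffer.length ? buffer[pos++] : EOS;
    }

    // Hypothetical caller: a plain comparison replaces the try/catch.
    static int countChars(String text) {
        SentinelCharStream in = new SentinelCharStream(text);
        int n = 0;
        while (in.readChar() != EOS) {
            n++;
        }
        return n;
    }
}
```

Because end of stream is now an ordinary return value, any exception that does
escape readChar-style code can be treated as a true abnormal condition.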

Still, StandardAnalyzer eats CPU (and buffer memory) like there's no tomorrow.
Instead, I'd recommend giving PatternAnalyzer (from the "memory" SVN contrib
area) a try. The functionality is almost the same as StandardAnalyzer, but it
can be many times faster, especially when using it with a String rather than
a Reader, and you don't have to wait indefinitely for Lucene to get fixed.
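
For a feel of why tokenizing an in-memory String with a regular expression can
be cheap, here is a plain-JDK sketch. It does not use or reproduce the
PatternAnalyzer API; the class RegexTokenizerSketch, the non-word split
pattern, and the lowercasing step are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer over a String; no Lucene classes involved.
final class RegexTokenizerSketch {
    // Assumption: tokens are separated by runs of non-word characters.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Splits the text on non-word runs and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : NON_WORD.split(text)) {
            if (!token.isEmpty()) { // skip the empty piece before a leading separator
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

The whole input is already in memory, so there is no Reader, no buffer
refilling, and no end-of-stream exception in the loop.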

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

