lucene-java-user mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Re: Performance question
Date Fri, 09 Jan 2004 01:42:02 GMT
On Wednesday 07 January 2004 20:48, Dror Matalon wrote:
> On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote:
...
> > Thanks for the suggestions.  I wonder how much faster I can go if I
> > implement some of those?
>
> 25 msecs to insert a document is on the high side, but it depends of
> course on the size of your document. You're probably spending 90% of
> your time in the XML parsing. I believe that there are other parsers
> that are faster than xerces, you might want to look at these. You might
> want to look at http://dom4j.org/.

I think a more significant question than which DOM (or other full-document,
in-memory) parser to use is whether to use a streaming (usually event-based)
parser instead, such as one using SAX. These are generally an order of
magnitude faster, at least for bigger documents. Fortunately, many standard
XML parsers can work as both DOM and SAX parsers (I believe Xerces, at least,
does).
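
To make the streaming idea concrete, here is a minimal sketch using the
standard JAXP SAX API. It gathers all character data from a document (for
example, to feed into a single indexed field) without building a tree. The
class name SaxTextExtractor and its extract() method are made up for
illustration; only the javax.xml.parsers and org.xml.sax calls are real API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text content of an XML document
// as the parser streams through it; no in-memory tree is built.
class SaxTextExtractor extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks,
        // so always append rather than assign.
        text.append(ch, start, length);
    }

    static String extract(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxTextExtractor handler = new SaxTextExtractor();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }
}
```

For real files you would pass a File or InputStream to parse() instead of
wrapping a String; the handler stays the same.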

It's a bit more cumbersome to use event-based parsers (push vs. pull; you
need to explicitly keep track of the current subtree if parent tag order
matters), but from a performance perspective (memory usage, speed) it may be
worth it.
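
As a rough illustration of that bookkeeping, the sketch below keeps the path
of currently open elements in a string and only collects text from <title>
elements whose direct parent is <article>. The element names and the
ArticleTitleHandler class are invented for the example, and it assumes the
titles contain plain text only (no child elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks where we are in the document by hand,
// since SAX only reports start/end events and keeps no tree for us.
class ArticleTitleHandler extends DefaultHandler {
    private String path = "";                       // e.g. "/feed/article/title"
    private final StringBuilder buf = new StringBuilder();
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        path = path + "/" + qName;                  // descend into the element
        buf.setLength(0);                           // start a fresh text buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep text when the parent/child order is article -> title.
        if (path.endsWith("/article/title")) {
            buf.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (path.endsWith("/article/title")) {
            titles.add(buf.toString());
        }
        path = path.substring(0, path.lastIndexOf('/'));  // climb back up
    }

    static List<String> parse(String xml) throws Exception {
        ArticleTitleHandler handler = new ArticleTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }
}
```

A stack of element names would work just as well as the path string; the
point is only that this state has to be maintained by your own code.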

-+ Tatu +-

